IEEE Geoscience and Remote Sensing. Volume 9. No. 4 (December 2021)

ISSN: 2168-6831

Year: 2021



DECEMBER 2021
VOLUME 9, NUMBER 4

[Cover figure residue (PG. 191): map panels (a)–(c) labeled “Spatial–Temporal” and “I (x)”, with sizes 1 × 1 × 46 and 7 × 7 × 1; axes are Longitude (°) and Latitude (°).]

WWW.GRSS-IEEE.ORG

FEATURES

Methods for Small, Weak Object Detection in Optical High-Resolution Remote Sensing Images
by Wei Han, Jia Chen, Lizhe Wang, Ruyi Feng, Fengpeng Li, Lin Wu, Tian Tian, and Jining Yan

Hyperspectral Image Clustering
by Han Zhai, Hongyan Zhang, Pingxiang Li, and Liangpei Zhang

Change Detection From Very-High-Spatial-Resolution Optical Remote Sensing Images
by Dawei Wen, Xin Huang, Francesca Bovolo, Jiayi Li, Xinli Ke, Anlu Zhang, and Jón Atli Benediktsson

ON THE COVER:
The cover on this issue illustrates the development
trend of high-resolution remote sensing (HRRS) data
sets over the last decade. The feature by Han, et al.,
on page 8, reviews the use of these data sets in the
development, verification, and evaluation of new algorithms for detection of objects in HRRS images.

NASA/JPL

102
The CCSDS 123.0-B-2 “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” Standard
by Miguel Hernández-Cabronero, Aaron B. Kiely, Matthew Klimesh, Ian Blanes, Jonathan Ligo, Enrico Magli, and Joan Serra-Sagristà

120
Advances and Opportunities in Remote Sensing Image Geometric Registration
by Ruitao Feng, Huanfeng Shen, Jianjun Bai, and Xinghua Li

143
Deep Learning Meets SAR
by Xiao Xiang Zhu, Sina Montazeri, Mohsin Ali, Yuansheng Hua, Yuanyuan Wang, Lichao Mou, Yilei Shi, Feng Xu, and Richard Bamler

173
Forward-Looking Ground-Penetrating Radar
by Davide Comite, Fauzia Ahmad, Moeness G. Amin, and Traian Dogaru

191
Gaussianizing the Earth
by J. Emmanuel Johnson, Valero Laparra, María Piles, and Gustau Camps-Valls

SCOPE
IEEE Geoscience and Remote Sensing Magazine (GRSM) will
inform readers of activities in the IEEE Geoscience and
Remote Sensing Society, its technical committees,
and chapters. GRSM will also inform and educate
readers via technical papers, provide information on
international remote sensing activities and new satellite
missions, publish contributions on education activities,
industrial and university profiles, conference news, book
reviews, and a calendar of important events.

Digital Object Identifier 10.1109/MGRS.2021.3120176

DECEMBER 2021
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

FEATURES (CONTINUED)

209
Wireless Sensor Networks Applied to Precision Agriculture
by Mónica Karel Huerta, Andrea García-Cedeño, Juan Carlos Guillermo, and Roger Clotet

223
Spectral Variability in Hyperspectral Data Unmixing
by Ricardo Augusto Borsoi, Tales Imbiriba, José Carlos Moreira Bermudez, Cédric Richard, Jocelyn Chanussot, Lucas Drumetz, Jean-Yves Tourneret, Alina Zare, and Christian Jutten

COLUMNS & DEPARTMENTS
3 FROM THE EDITOR
6 PRESIDENT’S MESSAGE
271 WOMEN IN GRSS
274 TECHNICAL COMMITTEES
284 CHAPTERS
289 EDUCATION
293 IN MEMORIAM

EDITORIAL BOARD
Dr. James L. Garrison
Editor-in-Chief
School of Aeronautics and Astronautics
Purdue University
West Lafayette, Indiana 47907
USA
Email: jlg@ieee.org
Dr. Paolo Gamba
University of Pavia, Italy
Dr. Linda Hayden
Center of Excellence in Remote Sensing Education and Research
Elizabeth City State University, USA
Email: haydenl@mindspring.com
Dr. Irena Hajnsek
ETH Zürich, Switzerland, and DLR, Germany
Email: Irena.Hajnsek@dlr.de
Dr. Michael Inggs
University of Cape Town, South Africa
Email: mikings@gmail.com
Dr. John Kerekes
Cochair, Conference Advisory Committee
Rochester Institute of Technology, USA
Email: kerekes@cis.rit.edu
Dr. David M. Le Vine
NASA Goddard Space Flight Center, USA
Email: David.M.LeVine@nasa.gov
Dr. Gail Skofronick Jackson
NASA Goddard Space Flight Center, USA
Email: Gail.S.Jackson@nasa.gov
Dr. Marwan Younis
DLR, Germany
Email: marwan.younis@dlr.de

MISSION STATEMENT
The IEEE Geoscience and Remote Sensing Society of the IEEE seeks to advance science and
technology in geoscience, remote sensing and
related fields using conferences, education, and
other resources.

IEEE Geoscience and Remote Sensing Magazine (ISSN 2168-6831) is published
quarterly by The Institute of Electrical and Electronics Engineers, Inc., IEEE
Headquarters: 3 Park Ave., 17th Floor, New York, NY 10016-5997, +1 212 419 7900.
Responsibility for the contents rests upon the authors and not upon the
IEEE, the Society, or its members. IEEE Service Center (for orders, subscriptions,
address changes): 445 Hoes Lane, Piscataway, NJ 08854, +1 732 981 0060.
Individual copies: IEEE members US$20.00 (first copy only), nonmembers
US$110.00 per copy. Subscription rates: included in Society fee for each
member of the IEEE Geoscience and Remote Sensing Society. Nonmember
subscription prices available on request. Copyright and Reprint Permissions:
Abstracting is permitted with credit to the source. Libraries are permitted to
photocopy beyond the limits of U.S. Copyright Law for private use of patrons:
1) those post-1977 articles that carry a code at the bottom of the first page,
provided the per-copy fee indicated in the code is paid through the Copyright
Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles
without fee. For all other copying, reprint, or republication information, write to:
Copyrights and Permission Department, IEEE Publishing Services, 445 Hoes Lane,
Piscataway, NJ 08854 USA. Copyright © 2021 by the Institute of Electrical and
Electronics Engineers, Inc. All rights reserved. Application to Mail at Periodicals
Postage Prices is pending at New York, New York, and at additional mailing offices.
Canadian GST #125634188. Canada Post Corporation (Canadian distribution)
publications mail agreement number 40013885. Return undeliverable Canadian addresses
to PO Box 122, Niagara Falls, ON L2E 6S8 Canada. Printed in USA.
IEEE prohibits discrimination, harassment, and bullying. For more information,
visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

GRS OFFICERS
President
Dr. David Kunkee
The Aerospace Corporation, USA
Executive Vice President
Dr. Mariko Burgin
Jet Propulsion Laboratory (JPL), USA
Vice President of Publications
Dr. William Emery
University of Colorado, USA
Vice President of Information Resources
Dr. Sidharth Misra
Jet Propulsion Laboratory (JPL), USA
Vice President of Professional Activities
Dr. Lorenzo Bruzzone
University of Trento, Italy
Vice President of Meetings and Symposia
Dr. Saibun Tjuatja
The University of Texas at Arlington
Vice President of Technical Activities
Dr. Fabio Pacifici
Maxar, USA
Secretary
Dr. Steven C. Reising
Colorado State University, USA
Chief Financial Officer
Dr. John Kerekes
Rochester Institute of Technology, USA

IEEE PERIODICALS MAGAZINES DEPARTMENT
Journals Production Manager
Sara T. Scudder
Senior Managing Editor
Geraldine Krolin-Taylor
Senior Art Director
Janet Dudar
Associate Art Director
Gail A. Schnitzer
Production Coordinator
Theresa L. Smith
Director, Business Development–Media & Advertising
Mark David
+1 732 465 6473
m.david@ieee.org
Fax: +1 732 981 1855
Advertising Production Manager
Felicia Spagnoli
Production Director
Peter M. Tuohy
Editorial Services Director
Kevin Lisankie
Senior Director, Publishing Operations
Dawn M. Melley

Digital Object Identifier 10.1109/MGRS.2021.3120166


FROM THE EDITOR
BY JAMES L. GARRISON

Introducing the December Issue

Welcome to the December 2021 Issue of IEEE
Geoscience and Remote Sensing Magazine! Our
theme in this issue is “innovative methods for different
modalities.” With this in mind, we have 10 feature
articles covering a variety of different processing and
analysis techniques with applications across a range
of remote sensing modalities.
Our first five features lie in the optical spectrum. We
start off with the problem of identifying small, weak,
and typically anthropogenic objects in high-resolution
remote sensing images. Applications such as urban
monitoring, military reconnaissance, and national security all make use of this capability. Han et al. review
the challenges of this problem.
A broad range of object-detection frameworks are described, including template matching, object-based image analysis, classical machine learning, and deep learning (DL). These are applied to 13 widely used data sets
and evaluated for detection speed and accuracy. Recent
advances to improve performance in the presence of image degradation, sensor limitations, object variation,
and insufficient training data as well as improvements
in suppressing background information and incorporating related context information are presented. The article concludes by identifying promising
future research directions, including the use of multisource
data fusion, weakly supervised detection, automatic
neural architecture search, and a universal object framework. This issue’s cover image was
taken from Figure 6 of the article.
Hyperspectral images (HSIs) are high-dimensional
data sets, which can be characterized as having a “cube”
structure with thousands of spectral bands forming the
third dimension. The interpretation of HSI data using
supervised methods requires a large amount of high-quality
labeled data for training. Collecting and processing a sufficiently large training set is very labor and time
intensive, but there is a risk of underfitting if an insufficient number of examples are provided. Unsupervised
classification methods can expand the use and interpretation of HSIs in some applications.

Digital Object Identifier 10.1109/MGRS.2021.3129109
Date of current version: 14 January 2022

Zhai et al. review the current status of clustering, a
widely used unsupervised method that groups similar
pixels and separates dissimilar pixels into different classes
based only upon the properties of the hyperspectral data
themselves, obviating the need for labeled samples. They
group clustering methods into nine main categories: centroid based, density based, probability based, bionics
based, intelligent computing based, graph based, subspace clustering, DL based, and hybrid-mechanism based.
Several popular clustering methods are then evaluated on
two widely used images: Indian Pines and the University
of Houston. Quantitative measures of clustering performance (e.g., overall accuracy and purity), along with running time, are compared across these methods.
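As a brief illustration of one of the measures named above, purity can be computed by assigning each cluster the reference class held by most of its pixels and counting the fraction of pixels that agree. The sketch below is a minimal example under stated assumptions, not the evaluation code used in the article: the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of samples that belong to the majority class of their cluster.

    cluster_ids  -- cluster index assigned to each pixel by a clustering method
    class_labels -- reference (ground-truth) class of each pixel
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each reference class occurs inside each cluster.
    class_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        class_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in class_counts.values())
    return majority_total / len(cluster_ids)


if __name__ == "__main__":
    # Toy example (hypothetical labels): six pixels, two clusters, two classes.
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["corn", "corn", "grass", "grass", "grass", "corn"]
    print(purity(clusters, classes))
```

Purity rewards over-segmentation (one cluster per pixel gives a purity of one), which is one reason it is usually reported alongside other measures such as overall accuracy.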
Spectral–spatial methods were generally found to
outperform spectral-based approaches, suggesting the
value that spatial information adds to improve clustering. Centroid-, density-, and probability-based methods
generally did not perform well because HSIs often do
not meet their basic assumptions, but they have low
complexity and are efficient with large data sets. Two
examples of recently developed subspace clustering
methods were found to show good potential for use
with HSIs, but at a large computational cost.
The article concludes by identifying several HSI clustering challenges and possible future research lines, including the tradeoff between accuracy and efficiency,
pointing toward hybrid approaches and the integration
with high-performance computing. Multifeature methods and object or subpixel-based methods are also identified, along with DL, as future research directions. Finally,
automatic estimation of the number of clusters is an important research problem that has not, thus far, received
much attention.
The feature by Wen et al. is concerned with change
detection, an important technique in remote sensing. This
becomes a particularly challenging problem with the advent of
very high resolution (VHR) images. A comprehensive review
of the research on VHR change detection is provided covering
methods, applications, and discussion of future directions.
Moving on, the next article addresses compression, a necessity for handling an increasing amount of data while being
limited by communication bandwidth or power. As indicated
in the article on HSI clustering, HSI generates a substantially larger
volume of data than other imagers (up to 5 TB per day in the
case of HyspIRI), so effective compression is required. “Near-lossless” algorithms can provide a balance between reduction
in data volume and error by allowing the user to specify a
bound on the maximum error introduced by compression.
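The idea of a user-specified error bound can be sketched in a few lines. The example below is only an illustration under stated assumptions and is not the CCSDS 123.0-B-2 algorithm: it uses a simple previous-sample predictor (an assumption made here for brevity) and quantizes each prediction residual uniformly with step size 2m + 1, which guarantees that every reconstructed sample differs from the original by at most m. The function names are hypothetical.

```python
def quantize_residual(residual, max_error):
    """Uniformly quantize an integer residual with step 2 * max_error + 1."""
    step = 2 * max_error + 1
    magnitude = (abs(residual) + max_error) // step
    return magnitude if residual >= 0 else -magnitude


def encode(samples, max_error):
    """Return quantizer indices for a sequence of integer samples.

    Prediction uses the previous *reconstructed* sample (closed loop), so the
    decoder can repeat the same prediction and errors do not accumulate.
    """
    step = 2 * max_error + 1
    indices = []
    previous_reconstruction = 0
    for sample in samples:
        index = quantize_residual(sample - previous_reconstruction, max_error)
        indices.append(index)
        previous_reconstruction += index * step
    return indices


def decode(indices, max_error):
    """Rebuild the samples from quantizer indices."""
    step = 2 * max_error + 1
    reconstructed = []
    value = 0
    for index in indices:
        value += index * step
        reconstructed.append(value)
    return reconstructed


if __name__ == "__main__":
    pixels = [10, 12, 15, 11, 200, 198, 0, -7]  # hypothetical sample values
    for bound in (0, 2):
        restored = decode(encode(pixels, bound), bound)
        worst = max(abs(a - b) for a, b in zip(pixels, restored))
        print(f"bound={bound}: worst-case error {worst}")
```

With a bound of zero the step size is one and the scheme is lossless; larger bounds produce smaller index magnitudes, which an entropy coder (omitted here) can represent with fewer bits.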
In our fourth feature, Hernández-Cabronero et al. present a comprehensive review of the Consultative Committee
for Space Data Systems (CCSDS) 123.0-B-2 Standard,
“Low-Complexity Lossless and Near-Lossless Multispectral
and Hyperspectral Image Compression,” the latest in a series of standards developed by the CCSDS.
CCSDS 123.0-B-2 incorporates support for near-lossless
compression to achieve significantly better results. It has
a number of novel features, including enhanced performance on low-entropy data, modes to facilitate efficient
hardware implementation, and support for ancillary information. Decompression is backward compatible with data
generated by CCSDS 123.0-B-1.
Compression performance was demonstrated using
mostly public data consisting of 17 multispectral images,
38 HSIs and two sounder data samples, produced from 14
different instruments. Generally, the new standard was
able to meet state-of-the-art performance specifications in
absolute or relative error measurements.
Our fifth feature, by Feng et al., addresses systematic geometric distortions in attempting to align two or more remote
sensing images, collected at different times, with different
viewing angles, or from different instruments. Registration
techniques have been developed to perform this alignment
using information in the images themselves. This is often a
required preprocessing step for advanced methods such as
image mosaicking or image fusion.
A review of intensity-based, feature-based, and combined
approaches to registration is presented, along with evaluation methods for registration performance (tie-point accuracy, transformation model performance, and alignment
error). Some future trends include acceleration of the registration process, the use of compressed sensing methods,
and frame-by-frame alignment. A combination of different
advanced methods and high-performance computing may
be necessary to meet future requirements for high-resolution, heterogeneous, and cross-scale remote sensing images.
The next feature, by Zhu et al., marks a good transition
from optical to microwave modalities, describing the largely
unrealized potential to apply DL methods (which have a long
history in optical remote sensing) to synthetic aperture radar
(SAR) data. DL models seek to encode input data into effective feature representations for target tasks. Common methods
include convolutional neural networks, recurrent neural
networks, and generative adversarial networks. Most of the
DL approaches are supervised, however, and the existence of
high-quality benchmark data for training is important.
Although DL has proven quite effective in extracting
data from optical images, its application to SAR has been
quite limited mostly due to the lack of these large and representative benchmark data sets. In addition, some of the
specific characteristics of SAR signals have made the direct
application of DL models more difficult. These characteristics include their larger dynamic range, signal statistics,
and imaging geometry, as well as the fact that native SAR data are complex valued,
with much of the information content in the phase.
This article reviews six typical applications of DL to
SAR: terrain surface classification, object detection, parameter inversion, despeckling, interferometric SAR, and the
data fusion of SAR with optical images. The generation of
representative training data sets, unsupervised DL, interferometric data processing, quantification of uncertainty,
large-scale nonlinear optimization problems, and cognitive sensors are identified as promising future trends in this
area. Several spaceborne SAR missions are expected to be
launched in the upcoming years. Hopefully, this article will
encourage more joint initiatives in this area.
Forward-looking ground-penetrating radar (FL-GPR) has
found important applications in real-time security, military
situational awareness, and humanitarian demining. Typically mounted on a vehicle, FL-GPR can provide target detection from a standoff distance. Comite et al. review methods
of detecting, locating, and imaging surface targets from array-based FL-GPR systems, considering aspects of both the electromagnetic modeling and signal processing in the problem
formulation and solutions. These are challenging problems as
the signal return is strongly influenced by soil conditions and
surface roughness. Furthermore, the target signature can be
quite weak because most of the transmitted energy is forward
scattered, and returns from the ground interface can dominate the radar measurements and obscure the target.
Electromagnetic modeling and image-formation methods applied to this problem are introduced. The article also
reviews migration approaches adapted from seismology,
microwave tomography, and data-adaptive/compressive
sensing. The use of FL-GPR from unmanned aerial vehicles
is a promising future research area with a number of challenges, such as antenna design. Other open issues concern
the detection of nonmetallic targets and real-time operation
under realistic conditions. As in many other remote sensing
fields, machine learning is attracting interest relevant to FL-GPR. Multiplatform data fusion under communication and
computation constraints is another important research area.
Copious amounts of data do not necessarily mean large
quantities of information. Quantifying the information
content in Earth science and climate data can be difficult as
the application of information theory requires a good estimation of the probability densities. For many types of remote
sensing data, producing the density estimate is problematic.
Johnson et al. review work on “Gaussianization” methods to
produce statistics that can be used to estimate information-theoretic measures (e.g., entropy, total correlation, and mutual information). This methodology scales to high dimensions, uses a simple orthogonal transform, and does
not assume any parametric form for the density.
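A rough sense of how such a transform can work is given by the sketch below, which alternates two steps on two-dimensional data: each variable is mapped to a standard normal marginal through its empirical ranks, and the data are then rotated by an orthogonal transform. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed 45° rotation, the number of layers, and all function names are choices made here for brevity (rotations in practice may be, for example, random or derived from principal component analysis).

```python
import math
from statistics import NormalDist


def marginal_gaussianize(values):
    """Map one variable to a standard normal marginal using its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    transformed = [0.0] * n
    for rank, index in enumerate(order):
        # Mid-rank probability keeps the argument strictly inside (0, 1).
        probability = (rank + 0.5) / n
        transformed[index] = standard_normal.inv_cdf(probability)
    return transformed


def rotate_2d(xs, ys, angle):
    """Apply an orthogonal transform (a plane rotation) to paired samples."""
    c, s = math.cos(angle), math.sin(angle)
    rotated_x = [c * x - s * y for x, y in zip(xs, ys)]
    rotated_y = [s * x + c * y for x, y in zip(xs, ys)]
    return rotated_x, rotated_y


def gaussianize_2d(xs, ys, n_layers=5, angle=math.pi / 4):
    """Alternate marginal Gaussianization and rotation for n_layers iterations."""
    for _ in range(n_layers):
        xs = marginal_gaussianize(xs)
        ys = marginal_gaussianize(ys)
        xs, ys = rotate_2d(xs, ys, angle)
    return xs, ys


if __name__ == "__main__":
    # Hypothetical, strongly dependent toy data: y is a nonlinear function of x.
    xs = [i / 10 for i in range(-20, 21)]
    ys = [x * x for x in xs]
    gx, gy = gaussianize_2d(xs, ys)
    print(len(gx), len(gy))
```

Because every layer is invertible given the stored ranks and rotation, the change in marginal statistics across layers can be tracked, which is the kind of bookkeeping that information-theoretic estimates build upon.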
This approach is demonstrated on several distinct
types of data, including radar backscattering intensities,
hyperspectral data, and aerial optical images. It is also applied to quantify the information content of soil–vegetation status in agroecosystems. Code and demonstrations of
the implemented algorithms and information-theoretic measures are provided.
Next we have a literature review on the use of wireless
sensor networks for precision agriculture, focusing on Latin
America. Huerta et al. describe how these networks have
been applied to improve traditional agricultural processes
in the region by monitoring the weather and environment
in a noninvasive manner. They document the growth and
global distribution of publications on this topic and the
benefit of this technology to the agricultural industry in
terms of time, production, and environmental factors.
Our last feature, by Borsoi et al., concerns spectral variability in hyperspectral images. Bayesian, parametric, and local endmember (EM)
techniques have been developed to address this problem. A
literature review covers both classic and recent approaches
and provides a new taxonomy to organize these methods
from the perspective of the user, based upon the necessary
amount of supervision and the computational cost. The article concludes with an outline of future research directions.
“Women in GRSS” reports on the IGARSS GRSS Diversity Fireside Chat, in conjunction with the 2021 IEEE International Geoscience and Remote Sensing Symposium
(IGARSS) conference, and the Women in Engineering
(WIE) International Leadership Conference, both held virtually this year.
This issue contains two Technical Committee (TC) columns. The first, from the Information Analysis and Data Fusion TC, presents results from this year’s data fusion contest
with the theme of “Geospatial AI for Social Good.” The second
one is from the Frequency Allocations in Remote Sensing TC,
which reviews items relevant to microwave remote sensing on
the World Radiocommunication Conference agenda for 2023.
The student branch of the University of Chinese Academy of Sciences, established in 2013, is featured in our
Chapters column. The “Education” column reports on the
“Green in the City” high school program targeting 16- and
17-year-old pupils in Flanders, Belgium, and held in conjunction with IGARSS.
Lastly, I am sorry to report on the loss of two very active members of the geoscience and remote sensing community: Tom von Deak, who worked to ensure that radiofrequency spectrum needs for Earth science remote sensing
(continued on p. 7)


PRESIDENT’S MESSAGE
BY DAVID KUNKEE

GRSS Accomplishments in 2021:
Success and Unexpected Turns

In this last message of the year, I think it is important
to summarize accomplishments of the Society in 2021
and describe our preparations for success in 2022. By the
numbers, our Society continues to grow. We added two
student branch Chapters in September with another one
expected in November. When this one receives its final
approval, the IEEE Geoscience and Remote Sensing Society (GRSS) community will total 70 Chapters, 28 student branch Chapters, and 10 ambassadors engaged with
communities in locations where we hope GRSS Chapters
will form in the near future. This means that the GRSS
community now consists of 98 combined Chapters with
membership in the Society surpassing 5,300 members
and submissions to our journals surpassing expectations.
The numbers confirm continued success for GRSS in
2021, but this year also brought some unexpected twists
and turns after beginning with high expectations of an immediate return to in-person meetings.
In the spring, GRSS completed the transition of its website
to a new provider and updated its appearance and structure. I hope that it offers more content and is more straightforward to navigate. The process of
improving the website is ongoing, and we are continuing
to transition and add material. It is great to see this result
from many past planning sessions and discussions.
Extensive preparation by the 2021 IEEE International
Geoscience and Remote Sensing Symposium (IGARSS)
team to address numerous possible outcomes this summer
about the state of COVID-19 worldwide resulted in a decision to pivot from the planned hybrid meeting format at
the beginning of the year to a fully virtual meeting. IGARSS
2021 again enjoyed record attendance and continued success overall, including an in-person drone workshop and
an evolved meeting format. We are leveraging lessons
learned from the past two IGARSSs to provide the best
experience for the upcoming IGARSS 2022, which is currently expected to have both online and in-person content
for those who can attend the meeting in Kuala Lumpur.

Digital Object Identifier 10.1109/MGRS.2021.3129110
Date of current version: 14 January 2022

This past year, GRSS education and outreach activities also expanded to include schools offered in all seasons, not just summer, and we expanded GRSS course
offerings through the IEEE Learning Network. Don’t
forget our cool videos and, now in its second season,
our “Down to Earth” podcast. Before the
end of the year, we also plan to reinstate in-person engagements with our booth at the upcoming American
Geophysical Union Fall Meeting in December.
Also underway is the third GRSS Student Grand
Challenge. This activity is a collaboration between the
Van Allen Foundation of the University of Montpellier
and IEEE GRSS. The combined activity consists of four
projects overall: REmote Sensing detection of Plastic
POllution in the Gulf of LIons, optiCal floAt for PlasTic
quAntIficatioN, Remote Identification of Microplastics
using Ocean Surface Anomalies, and Micro-PLAStic in
the SEA Detection experiment, with GRSS facilitating
the latter three projects. The project kickoff meetings,
with three of the four participants of this collaboration,
were held in October, with the fourth project kickoff
meeting anticipated for November. Tracking plastics
and debris in our oceans is an important topic requiring
a multidisciplinary approach. The value of a wide variety of data, both remotely and in situ sensed, needs to be
assessed. New sensors may be needed to supply data to
better understand the problem, assessments for decision
makers, and design support to better control the problem and its impacts. The four projects underway focus
on different approaches to the overall problem and possible mitigations. It is exciting to see the enthusiasm and
value of these different approaches coming together.
In November, GRSS cosponsored the 2021 Asia Pacific Conference on Synthetic Aperture Radar, which was
held in Bali, Indonesia, thanks to conference organizers
Josaphat Sumantyo, the GRSS Instrumentation and Future
Technologies Technical Committee, and Arifin Nugroho,
chair of the GRSS Indonesian Chapter. Thanks also for keynote presentations provided by GRSS Administrative Committee (AdCom) members Alberto Moreira and Paul Rosen.
Next, I would like to announce that in September, GRSS
selected the proposing team from Brisbane, Australia, to
host IGARSS 2025, which is now planned for early August 2025. Congratulations to Prof. Xiuping Jia, Prof. Jeffrey Walker, and Prof. Jocelyn Chanussot, general cochairs
of IGARSS 2025. I would also like to thank all the teams
that participated in the competition for their hard work
and preparation. We recognize and appreciate your efforts,
and we hope that you will continue to support the long-term success of IGARSS. The call for proposals for IGARSS
2026 from IEEE Regions 1–7 and 9 has now been posted on
the GRSS website. Interested groups should submit a letter of intent and a preliminary proposal (preproposal) to
the vice president of meetings and symposia of the GRSS
at vp_meetings_symposia@grss-ieee.org by 1 March 2022.
I am also happy to report that GRSS now has a published
standard (IEEE 4003-2021) on IEEE Xplore describing global navigation satellite systems reflectometry data sets. This
standard is notable not only because it was developed almost
independent of industry representatives but also because a
draft for balloting was produced in two years despite changes
in leadership. The GRSS Standards Committee has several
more IEEE standards projects ongoing. In future AdCom
meetings, GRSS may consider further defining the role of
standards activity as it relates to the Society’s core mission.
In 2021, GRSS leadership held four additional executive
sessions of AdCom meetings. These online sessions provided
some extra time for discussion on important topics, which
has been difficult due to the inability to hold in-person meetings. This year, all AdCom meetings were again held virtually
due to the continually changing nature of the global pandemic, although we are now planning to restart in-person meetings, beginning with our spring AdCom meeting in March.
From our recent November AdCom meeting, some key
decisions include the adoption of changes to our Bylaws and
Operations and Procedures (OPs) Manual that better reflect
the Society’s practice and help ensure transparency in our
future operations. The scope of these changes included the
addition of required clauses and conditions for our GRSS
awards committees, changes to the conference advisory
committee charter, and reduction of the GRSS past-president
term of office from three to two years. The roles of social media chair and social media ambassadors were also codified
in our documents. Finally, additions to the OPs Manual in
November defined the terms of our associate and topical associate editors. Please look for these updates and additions to
the GRSS Bylaws and OPs Manual on our website. There is a
requirement from IEEE to allow a 30-day review period for
changes to the Bylaws before they become effective.
Considering the scope of the November meeting, I would
like to thank the AdCom for their many contributions to
GRSS activities throughout the year as well as their time
spent preparing and reporting at all of the meetings. Of note, the November AdCom meeting included 15 portfolio topics with 52 live presentations and 20 consent agenda presentations. The live meeting was held in short
sessions spread over three days to cover the scope of activities
within the Society. It was clear from listening to the many
speakers at the November meeting that the level of activity is
continuing to grow with our membership.
To conclude my December letter, it is with a very
heavy heart that I forward news of the passing of Dr. Gail
Skofronick-Jackson due to an accident while she was on
the island of St. Croix in the U.S. Virgin Islands. Gail was
a close friend and colleague to many of us within GRSS,
NASA, and the international Earth science community.
Within GRSS, she served as a member of the AdCom from
2012 to 2016, was a member of the organizing committee
of IGARSS 2020, and for several years organized and led
GRSS Women in Engineering activities. Gail was a brilliant
scientist, continually enthusiastic to learn about the world
around us, and always very thoughtful of others. I am grateful for all of the times we were able to share with her at our
various meetings and activities.
GRS

FROM THE EDITOR
(continued from p. 5)

were represented in international proceedings, and Dr. Gail
Skofronick-Jackson, NASA program manager, IEEE Fellow,
former Administrative Committee member, and leader of
WIE activities. Their memorials begin on page 289.
As I have mentioned in the past few issues, IEEE Geoscience and Remote Sensing Magazine has now implemented
a two-stage review process to give more timely feedback
to potential authors. Short (five pages or fewer, excluding
references) white papers will be submitted first. These will
then be reviewed by associate editors or members of the
editorial board. Following a positive review of the white
paper, authors may be invited to submit a full manuscript,
which will then undergo a complete peer review.
Contributions to our regular columns (“Chapters,”
“Space Agencies,” “Women in GRSS,” “Education,” “Software and Data Sets,” and “Conference Reports”) are always welcome. White papers, columns, and invited manuscripts should be submitted through Manuscript Central
at http://mc.manuscriptcentral.com/grsm. Proposals for special issues should be sent to me directly at jlg@ieee.org.
Please continue to stay safe!
GRS

Methods for Small,
Weak Object Detection
in Optical High-Resolution
Remote Sensing Images
A survey of advances and challenges
WEI HAN, JIA CHEN, LIZHE WANG, RUYI FENG, FENGPENG LI,
LIN WU, TIAN TIAN, AND JINING YAN

Object detection, which focuses on locating objects of interest and categorizing them, has long played a critical role
in the development of remote sensing imagery. Following
significant improvements in Earth observation technologies, the objects in high-resolution remote sensing (HRRS)
images show additional detailed information and more
complex patterns. Some applications, such as urban monitoring, military reconnaissance, and national security, have
urgent needs in terms of identifying small-scale (small)
and weak-feature-response (weak) objects. However, these
kinds of objects usually occupy only a small proportion of an
image that itself varies considerably in color, shape,
and texture, so the objects’ features are easily affected
by weather, illumination, and occlusion. These characteristics of small, weak objects make their detection a more
challenging task than generic object detection. This article
comprehensively reviews the existing challenges and corresponding technologies for addressing that task and its
specific problems.

INTRODUCTION
Object detection in remote sensing images aims at locating objects of interest on the ground and categorizing them.
The term object generally refers to man-made or highly
structured bodies (vehicles, buildings, ships, and so forth)
that are independent of complex background environments
as well as landscapes. As a fundamental task in the field of
satellite and aerial image analysis, object detection plays
an important role in a wide range of applications, such as
urban planning, geographic information processing, precision agriculture, and environmental monitoring.
Digital Object Identifier 10.1109/MGRS.2020.3041450
Date of current version: 25 January 2021


In the past 20 years, the increasing image interpretation
accuracy of these applications has enabled them to meet
the requirements of actual scenarios and has thus significantly promoted the development of Earth observation
technologies and object-detection approaches. The spatial, temporal, and spectral resolutions of Earth observation sensors have also been greatly improved [1]–[3]. For
instance, the images from Google Earth (Google Inc.) [4]
have resolutions of up to approximately 0.5 m. WorldView-3
(DigitalGlobe, Inc.) [5] provides a 0.31-m panchromatic resolution and a 1.24-m multispectral resolution. These HRRS
images show more texture and shape and additional detailed information about geospatial objects as well as complex spatial patterns. The data volume of HRRS images has
also dramatically increased, and a massive number of images is now accessible. The advantage of HRRS images is
that they can offer the most economical and efficient way
to achieve full-time, high-precision Earth surface monitoring with global coverage. The fast detection of small-scale
(small), weak-feature-response (weak), and nonuniformly
(sparsely or densely) distributed objects is of great significance when meeting the requirements of real scenarios in
many special applications, such as military reconnaissance,
national security, urban monitoring, and geological disaster monitoring.
Unlike natural images, which are often clearer and contain several categories of objects, HRRS images cover an
extensive range of the Earth’s surface and involve a massive
number of objects. The objects vary in scale, color, shape,
and texture; their features are easily affected by weather,
illumination, occlusion, and imaging conditions. In addition, the great distance between the sensor and targets
means that some kinds of targets occupy only a few to
dozens of pixels in the imaging plane and are presented as
small objects that can easily be overwhelmed by a bright
background [6]. Objects of this kind are usually characterized by a low signal-to-noise ratio (SNR) and inadequate
structure information, which is presented as a weak feature
response. These characteristics make the detection of small,
weak targets a more challenging task in remote sensing.
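To make the link between sensor resolution and object size concrete, the following minimal sketch estimates how many pixels an object spans at a given ground sampling distance. The function names and the 4.5 m × 1.8 m car dimensions are illustrative assumptions, not values from the cited works; only the 0.31-m and 1.24-m WorldView-3 resolutions come from the text.

```python
def pixel_extent(object_size_m: float, gsd_m: float) -> float:
    """Approximate number of pixels an object spans along one axis.

    object_size_m: physical size of the object in meters (assumed value).
    gsd_m: ground sampling distance (meters per pixel).
    """
    if gsd_m <= 0:
        raise ValueError("gsd_m must be positive")
    return object_size_m / gsd_m


def pixel_footprint(length_m: float, width_m: float, gsd_m: float) -> float:
    """Approximate pixel area covered by a rectangular object."""
    return pixel_extent(length_m, gsd_m) * pixel_extent(width_m, gsd_m)


if __name__ == "__main__":
    # Hypothetical car size; only the resolutions are taken from the text.
    car_length_m, car_width_m = 4.5, 1.8
    for name, gsd in [("panchromatic", 0.31), ("multispectral", 1.24)]:
        length_px = pixel_extent(car_length_m, gsd)
        area_px = pixel_footprint(car_length_m, car_width_m, gsd)
        print(f"{name}: about {length_px:.1f} px long, {area_px:.1f} px in area")
```

Under these assumptions, the same car shrinks from roughly a dozen pixels in length to only a few when moving from the panchromatic to the multispectral band, which illustrates why such objects carry little structure information.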
The past decade has witnessed major advances in object
detection in remote sensing images. At an early stage, various models based on prior knowledge [7]–[10] were proposed for target detection in satellite images. As image resolution increases, prior-knowledge-based models become less
reliable because the high complexity of HRRS images
limits their detection accuracy. More recently,
various forms of machine learning (ML) approaches [11]
have played a critical role in object detection. With the increasing availability of big data and remarkable advances in
data mining, novel methods have come into use for HRRS
image processing.
Deep learning (DL) models [12]–[15] have attracted serious attention and become dominant tools for processing
large-scale, high-dimension data; they have achieved satisfactory accuracy for several tasks in the field of remote sensing. By stacking multiple nonlinear layers, DL models extract
semantic information about objects as well as the context
relationships among them and the background. DL models demonstrate superiority in the extractions and fusions
of multiscale features and have therefore outperformed the
early models, with significant developments in remote sensing object representations. In recent decades, many works
have presented ML- and DL-based models, leading to the
creation of a series of benchmark data sets for promoting
remote sensing and small, weak object detection [16]–[19].
Although several survey papers on object detection have
been published, they have focused mainly on detection
technologies from the image-processing aspect [20], [21] or
on reviewing some categories of approaches, such as ML- [11] and DL-based methods [19], or some specific detection
problems and tasks, including vehicle detection [22] and salient methods [23]–[25] for remote sensing object detection.
There is still no comprehensive review of existing
works that addresses the problems of small, weak object detection. Based on the aforementioned analysis, this article
concentrates on the challenges of and recent advances in addressing these problems. Its contributions can be summarized as follows:
◗◗ This article systematically analyzes the challenges of
small, weak target detection. According to their causes,
the challenges have been divided into three aspects: image quality, object variations, and complex context.
◗◗ The technical evolution of object detection, including the main
developments in the fields of computer vision and remote
sensing, is comprehensively reviewed; the existing benchmark data sets and their contributions to small, weak object
detection are introduced and analyzed.
◗◗ The existing works that address the various challenges
are also summarized, and some promising research directions into further improvements to small, weak object detection are discussed.
DIFFICULTIES AND CHALLENGES IN REMOTE
SENSING SMALL, WEAK OBJECT DETECTION
GENERIC OBJECT DETECTION IN REMOTE SENSING
Object detection, a fundamental and essential task, has attracted broad attention over the past decades. The task is
defined as follows: given a remote sensing image, determine
whether it includes instances of objects from predefined
categories, and, if it does, predict the spatial location and
the extent of each instance [27]. Although thousands of
geospatial objects occupy optical remote sensing images,
research scholars interested in this topic use the term objects
to refer to human-made or highly structured bodies (e.g.,
ships, vehicles, and airplanes) that have shape boundaries
and are independent of the background environment and
landscape items [11] rather than unstructured bodies or
scenes, such as the sky or clouds.
Generally, the spatial location and extent of an object
can be defined using a bounding box (BB) (a horizontal
or oriented rectangle tightly bounding the object) or
a precise pixelwise segmentation mask, as shown in Figure 1. Over the past several years, BB annotation has become the most widely used method for evaluating detection performance in remote sensing images; it defines
the location of an object by the corner coordinates of a
rectangle. The main advantage of this type of annotation
is that it focuses on locating only objects of interest, ignoring the context. Therefore, it greatly reduces labor costs
in labeling data and makes it possible to quickly create large-scale object-detection data sets for specific applications.
The precise pixelwise segmentation mask is an annotation
method wherein each pixel in the image is assigned a category label, such as forest, farmland, road, or background.
This type can be applied in scenarios in which the environmental context is important, but it
requires more expert knowledge and labor to be successful. Due to the massive number of object categories and
instances, complex backgrounds, and the large data volume
of HRRS images, precise pixelwise segmentation mask
annotation is rarely used in large-scale remote sensing target detection.

FIGURE 1. The different annotation types in the HRSC2016 [26] data set. (a) The original image, (b) the HBBs’ annotation, (c) the OBBs’ annotation, and (d) the pixelwise segmentation mask.

There are two types of widely used BB annotation methods: horizontal BBs (HBBs) and oriented (rotated) BBs
(OBBs). As shown in Figure 1, HBBs (axis-aligned rectangles) were
first used to localize objects. However, objects in HRRS images often appear in arbitrary orientations and may be densely distributed. In some extreme but common scenarios, an HBB
encloses both the background and the targets of interest; it cannot accurately or compactly outline the
locations of objects and may decrease detector performance.
The OBB annotation method, which can be regarded
as adding an angle to the HBB, is used to obtain a tight bound for rotated objects. In this article, we review mainly
methods that use these two types of BB annotations.
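The difference between the two annotation types can be illustrated with a short sketch. It assumes, purely for illustration, that an OBB is parameterized as a center, a width, a height, and a rotation angle (data sets differ in their actual conventions), and it computes the axis-aligned box that encloses a rotated one together with the share of that enclosing box actually covered by the object.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def obb_corners(cx: float, cy: float, w: float, h: float,
                angle_deg: float) -> List[Point]:
    """Corner points of an oriented box given as center, size, and angle.

    The (cx, cy, w, h, angle) parameterization is an assumption made for
    this sketch; it is not prescribed by the text.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in half]


def enclosing_hbb(corners: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def hbb_fill_ratio(w: float, h: float, angle_deg: float) -> float:
    """Fraction of the enclosing HBB that is covered by the oriented box."""
    x0, y0, x1, y1 = enclosing_hbb(obb_corners(0.0, 0.0, w, h, angle_deg))
    return (w * h) / ((x1 - x0) * (y1 - y0))


if __name__ == "__main__":
    # Hypothetical elongated object (e.g., a ship) of 100 x 20 pixels.
    for angle in (0, 15, 45):
        print(angle, round(hbb_fill_ratio(100, 20, angle), 3))
```

For an elongated object rotated by 45°, most of the enclosing HBB is background under these assumptions, which is the compactness problem that OBB annotation is meant to avoid.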
DIFFICULTIES AND CHALLENGES IN SMALL,
WEAK OBJECT DETECTION
Relevant works on small, weak object detection in infrared
images started to appear long ago, when the spatial resolution of remote sensing images was relatively low and infrared images were the main data source for object detection.
Many works have addressed solutions to such problems
[28]–[32]. Related works covering the analysis of object
detection in infrared images [33]–[35] originally defined a
small object as one with a total spatial extent of fewer than
80 pixels (a width of fewer than nine pixels), which is less
than 0.2% in an image of 256 × 256 pixels. As shown in
Figure 2, the long distance of imaging means that the target takes up only a few dozen pixels in the imaging plane,
presenting as a small target. Objects of this kind are basically shapeless and have no usable texture features. Small
objects are usually characterized by a low SNR, small
size, and inadequate structure information because of undulating clutter and the imaging distance. These characteristics make
small targets very difficult to detect, and small targets are
easily overwhelmed by a bright background [6]. Therefore, a
small, weak object is more formally defined as 1) small: the
scale of the target is small, or the target’s proportion of the
total images is low; and 2) weak: the features of the target are
insufficient and easily affected by its background.
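The numeric part of the infrared definition quoted above can be checked with a few lines of arithmetic. This is only a sketch; the function names are hypothetical, and the thresholds are the ones stated in the text (fewer than 80 pixels in a 256 × 256 image).

```python
def area_fraction(area_px: int, width: int = 256, height: int = 256) -> float:
    """Share of the image occupied by an object of the given pixel area."""
    return area_px / (width * height)


def is_small_infrared_target(area_px: int, max_area_px: int = 80) -> bool:
    """True if the object's total spatial extent is below the stated limit."""
    return area_px < max_area_px


if __name__ == "__main__":
    # 80 / 65,536 is about 0.12%, which is indeed below the quoted 0.2%.
    print(f"{area_fraction(80):.4%}")
    print(is_small_infrared_target(64), is_small_infrared_target(81))
```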
Thanks to the acquisition of HRRS images and the requirements of the related applications, small, weak object
detection has attracted increasing attention. Although numerous efforts have been made to develop detectors and
benchmark data sets for promoting the development of
small, weak object detection in HRRS images, there is still
no consensus on the definition of small, weak object detection. Kang et al. [16] proposed a complex background
benchmark wherein vehicles were set as the small objects.
VEDAI (for “vehicle detection in aerial imagery”) [17] is a
data set created for small target detection, but the authors
did not propose a specific definition for a small object. Xia
et al. [18] proposed a new large-scale benchmark for HRRS
image detection, dividing the object instances into three
classes according to the pixel width of their BBs: small for
a width from 10 to 50, middle for a width from 50 to 300,
and large for a width of greater than 300. This was the first
work to clearly define a scale for small objects in HRRS
images. As for the weak feature response of objects, no related work has offered a sufficient discussion or
drawn a clear conclusion.
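The width-based scale classes of Xia et al. [18] described above can be sketched as a simple lookup. The text gives overlapping boundaries (50 and 300 appear in two ranges) and says nothing about widths below 10 pixels, so this sketch assumes that boundary values fall into the smaller class and labels narrower boxes as "below-range"; both choices, and the function name, are assumptions.

```python
def size_class(bb_width_px: float) -> str:
    """Assign a BB to a scale class by its pixel width.

    Thresholds follow the ranges quoted in the text (10-50, 50-300, >300).
    Boundary handling and the "below-range" label are assumptions.
    """
    if bb_width_px < 10:
        return "below-range"
    if bb_width_px <= 50:
        return "small"
    if bb_width_px <= 300:
        return "middle"
    return "large"


if __name__ == "__main__":
    for width in (8, 25, 50, 120, 450):
        print(width, size_class(width))
```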
In this section, we comprehensively consider the factors
that affect detection performance and then summarize the
difficulties of and challenges to small, weak object detection in HRRS images. In Table 1, each influencing factor is
examined from the three aspects of image quality, object
variations, and complex context as follows.
FIGURE 2. Small, weak object detection in infrared images.

TABLE 1. THE CHALLENGES AFFECTING SMALL, WEAK OBJECT DETECTION.
THREE ASPECTS | SPECIFIC CONTENT
Image quality | Mixed noise, patch missing, occlusion caused by cloud, fuzzy, shadow, and multisource data
Object variations | Small size, high intraclass variations, a change of the object features caused by illumination and background, antagonism of the background and the target, a lack of annotation samples, nonuniform distribution, and an imbalance of positive and negative training examples
Complex context | Many types and quantities of background targets and complex distribution patterns

1) Image quality: In the process of HRRS image acquisition, the
imaging environment, satellite platform, optical system,
and electronic equipment may affect the image quality,
which leads to a certain degree of degradation of the acquired images. As presented in Figure 3, these images
cannot fully meet the requirements of precise interpretation in real-world applications. There are two main
categories of factors that degrade the image quality. The
first comprises factors that may arise in the imaging
process, such as noise, blurring, cloud occlusion, missing
information, and shadow. These factors
are the main cause of remote sensing image degradation. The second category arises from the limitations of sensor production technologies and application
scenarios. Because spectral, spatial, and temporal resolutions are often mutually restricted, imaging sensors can
achieve high resolution in only one of these three aspects.
For these kinds of low-quality images, some methods for
improving the image quality should be applied, and multisource satellite data with different resolutions should be
combined in a complementary way to obtain the required data.
2) Object variations: An HRRS image can cover an extensive area of the Earth’s surface and contain many kinds
of objects. The scale variations of object instances in
HRRS images are great, and some objects are very
small. As depicted in Figure 4, some objects occupy only
a small proportion of a total image and show a weak feature response; for example, the width of a small ship
can be fewer than 25 pixels. Changes in resolution, scale,
color, shape, and texture within a
single category create high intraclass variations for objects. These kinds of small-object instances are also likely to
crowd into a specific region of aerial images. Additionally, HRRS images are noisy, and the features of objects
easily change when affected by weather, illumination,
and occlusions. Some specific targets are adversarial
and camouflaged, making them difficult to identify
effectively. Another critical problem is that the annotation samples may be insufficient. At present, there are
more than 2,000 satellites in orbit around the world;
they generate more than a petabyte of data every day.
However, there are roughly only 100 GB of annotated
data for target detection.
3) Complex context: Generally, the background and context
of objects of interest are complex and crowded with other
types of objects, as displayed in Figure 5. Natural images
are often taken from horizontal perspectives, while HRRS
images are typically taken as bird’s-eye views; this implies
that many objects of interest form complex spatial patterns with the background. The intricate patterns increase
the difficulty of object detection in HRRS images.
Considering the three aspects of the challenges, small, weak object detection in remote
sensing can be characterized by 1) data quality,
i.e., HRRS images for small, weak object detection may be
of low quality due to the noise, illumination change, occlusion, and so forth introduced in the imaging process; 2) objects, i.e., they are of small scale, have a weak feature response
with many categories showing high intraclass variations and
a nonuniform distribution, and may lack annotation samples; and 3) context, i.e., the context is complex and changeable,
and targets are easily hidden in the background. All of these
characteristics make small, weak object detection a more
challenging task than generic object detection. To promote
its development, more work is needed to address these different aspects of the challenges and their difficulties.

FIGURE 3. Some problems caused by image quality: (a) blur, (b) noise, (c) missing information, (d) cloud, (e) shadow, and (f) multisource data [IKONOS (0.8–1 m) and WorldView-3 (0.31 m)].

FIGURE 4. Some problems caused by object variations: (a) small-scale weak feature; (b) high intraclass variations; (c) multiclass densely distributed instances; (d) occlusions; and (e) camouflage and adversariness.

A REVIEW OF HRRS OBJECT-DETECTION
BENCHMARK DATA SETS AND
PERFORMANCE EVALUATION
HRRS OBJECT-DETECTION BENCHMARK DATA SETS
Throughout the development of object detection and recognition, data sets have played a critical role not only as
common resources for the evaluation and verification of
algorithm performance but also in pushing research into
increasingly complex and challenging problems [20]. Over
the past decade, in particular, detection and recognition
methods based on DL have achieved tremendous success
in addressing visual-understanding problems in the computer vision community; large amounts of annotation data,
including Pascal visual object classes [44], ImageNet [45],
Microsoft common objects in context (COCO) [46], and
Open Images [47], have played a key role in this success.
The development of Earth observation technologies and
access to a large number of HRRS images make it possible
to build large-scale data sets for capturing the vast richness
and diversity of objects, promoting unprecedented performance in remote sensing object detection.
In past decades, research groups in remote sensing have
released many public data sets with different characteristics
for solving different problems. There are 13 widely used data
sets: the Institute for Computer Science and Control in Hungary-Inria (SZTAKI-Inria) [36], Northwestern Polytechnical
University very high resolution (NWPU VHR)-10 [37], Chinese Academy of Sciences UCAS-AOD [38], road scene object
detection (RSOD) [40], data set for object detection (DOTA),
object detection in aerial images (ODAI) [18], VEDAI [17],
high-resolution ship collection 2016 (HRSC2016) [26], 3K
vehicle [16], cars overhead with context (COWC) [39], xView
[41], HRRS detection (HRRSD) [42], [43], and detection in
optical remote sensing images (DIOR) [19]. The attributes of
these data sets are listed in Table 2 for comparison. The development trend and some representative small, weak targets
of the data sets are displayed in Figures 6 and 7, respectively.
Each data set is introduced in this section. The things-and-stuff data set [48] is excluded from this discussion because of
its relatively low spatial resolution.

FIGURE 5. The problems introduced by complex contexts. (a) A complex context and (b) massive background objects.

SZTAKI-INRIA
This benchmark data set, from SZTAKI and the Inria Sophia
Antipolis-Méditerranée Research Center in France [36], was
created for building detection and is a multisensor aerial
set from QuickBird, IKONOS, and Google Earth [4]. It contains nine images and 665 building instances, annotated
with OBBs. The images of the data set have three
bands: red, green, and blue (RGB).
NWPU VHR-10
This publicly available 10-class geospatial object-detection data
set from NWPU in Xi’an, China, is used for research purposes [37]. The object classes are airplane, ship, storage
tank, baseball diamond, tennis court, basketball court,
ground track field, harbor, bridge, and vehicle. The data
set contains 800 VHR remote sensing images cropped
from the Google Earth and Vaihingen data sets, which are
then manually annotated by experts into 3,775 instances
with HBBs. The image resolutions range from 0.5 to 2 m.

TABLE 2. COMPARISONS OF THE AVAILABLE BENCHMARK DATA SETS IN THE EARTH OBSERVATION COMMUNITY.
DATA SET NAME | TOTAL CATEGORIES | IMAGES | INSTANCES | IMAGE WIDTH | DATA SOURCE | RESOLUTION | ANNOTATION | YEAR | CHARACTERISTICS
SZTAKI-INRIA [36] | 1 | | 665 | ~800 | QuickBird, IKONOS, and Google Earth | 0.5–1 m | OBBs | 2012 | Single category, high-resolution satellite images, multiple sensors
NWPU VHR-10 [37] | | 800 | 3,775 | ~1,000 | Google Earth | 0.3–2 m | HBBs | 2014 | Multiple categories, clean background
UCAS-AOD [38] | | 910 | 6,029 | 1,280 | Google Earth | 0.3–2 m | HBBs | 2015 | Airplane and vehicle detection
VEDAI [17] | | 1,210 | 3,640 | 1,024 | Utah AGRC | 0.125 m | OBBs | 2015 | Small-scale objects, multispectral and multiresolution images, illumination changes
3K vehicle [16] | | | 14,235 | 5,616 | DLR 3K camera system | 0.13 m | OBBs | 2015 | Small-scale objects, VHR images
COWC [39] | | | 32,716 | 2,000–19,000 | Six sources | 0.15 m | Dot | 2016 | Small-scale objects, multisensor images
HRSC2016 [26] | | 1,061 | 2,976 | ~1,000 | Google Earth | 0.4–2 m | Three types | 2016 | Sufficient object variations, complex background
RSOD [40] | | 976 | 6,950 | ~1,000 | Google Earth, Tianditu | 0.3–3 m | HBBs | 2017 | Multisensor and multiresolution images
DOTA [18] | | 2,806 | 188,282 | 800–4,000 | Google Earth, JL-1, and GF-2 | 0.3–1 m | HBBs and OBBs | 2018 | Multisensor and multiresolution images, nonuniform distribution, many object categories, sufficient object variations
ODAI [18] | | 2,806 | ~400,000 | 800–4,000 | Google Earth, JL-1, and GF-2 | 0.3–1 m | HBBs and OBBs | 2019 | Improved version of DOTA, more instances and categories, especially for small, weak objects
xView [41] | | 1,128 | ~1,000,000 | 2,000–4,000 | WorldView-3 | 0.3 m | HBBs | 2018 | Complex background, many categories, massive instances, dense distribution, noise, blur, occlusion
HRRSD [42], [43] | | 21,761 | 55,740 | ~11,000 | Google Earth | 0.15–1.2 m | HBBs | 2019 | Many categories, many instances, sufficient variations
DIOR [19] | | 23,463 | 192,472 | 800 | Google Earth | 0.5–30 m | HBBs | 2019 | Complex background, many categories, noise, blur, occlusion
DLR: German Aerospace Center; COWC: cars overhead with context; AGRC: Automated Geographic Reference Center.

UCAS-AOD
This UCAS data set, collected from Google Earth, contains
two detection classes: airplane and vehicle [38]. The airplane category has 600 images with 3,210 instances, while
the vehicle category has 310 images with 2,819 vehicles.
VEDAI
This data set was created for the task of multiclass vehicle
detection in satellite images [17]. It consists of nine categories with a total of 3,640 instances, including boat, car,
camping car, plane, pickup, tractor, truck, van, and a category labeled “other.” The data set has 1,210 images, each
of which is 1,024 × 1,024 pixels with VHR (12.5 cm). VEDAI
is provided as a tool to benchmark automatic target-recognition algorithms in unconstrained environments. The vehicles contained in the database, in addition to being small,
exhibit different characteristics, such as multiple orientations, illumination/shadowing changes, peculiarities, and
occlusions. Furthermore, each image is available in several
spectral bands and resolutions.
3K VEHICLE
This data set is another of those used for vehicle detection
[16]. It has 20 images with 5,616 × 3,744 pixels and a spatial
resolution of 13 cm. It contains 14,235 vehicles with OBBs.
The images were captured by the German Aerospace Center
3K camera system (a real-time airborne digital monitoring system) at a height of 1 km above the ground.

FIGURE 6. The development trend of existing HRRS data sets. [Timeline from 2012 to 2019, in order of appearance: SZTAKI-INRIA, NWPU VHR-10, UCAS-AOD, 3K Vehicle Detection, VEDAI, COWC, HRSC2016, RSOD, DOTA and ODAI, xView, HRRSD, and DIOR. Trend labels: multiple class, high resolution; small scale, illumination/shadow, higher resolution; complex background, multiple scales; massive instance, more categories, dense distribution, multisensor data, sufficient variations.]

CARS OVERHEAD WITH CONTEXT
Also created for vehicle detection, the cars overhead with context (COWC) data set images are standardized to 15 cm per
pixel at ground level from their original resolutions [39]. The
set contains 32,716 unique cars from six sources: Toronto,
Canada; Selwyn, New Zealand; Potsdam and Vaihingen, Germany; Columbus, Ohio; and Utah, the United States, covering different geographical locations and produced by different imaging sensors. The car sizes range from 24 to 48 pixels.
Two of the sets (Vaihingen and Columbus) are in gray scale;
the others are in RGB color. It should be noted that each car
in the annotated images has a dot placed on its center.
HRSC2016
HRSC2016 is a benchmark data set for boat detection [26];
it has 1,070 images and 2,976 instances from Google Earth
with HBB annotations. The image sizes vary from 300 ×
300 to 1,500 × 900 pixels. The images contain large variations of scale, position, shape, and appearance.

RSOD
RSOD consists of 976 images and 6,950 object instances
involving four categories [40]. The data set was collected
from Google Earth and Tianditu, and its resolutions range
from 0.3 to 3 m.

DOTA AND ODAI
DOTA is a large-scale data set with HBB and OBB annotations [18]. It contains 2,806 large images and classifies objects into 15 categories, including baseball diamond, ground
track field, small and large vehicles, tennis court, basketball
court, storage tank, soccer field, roundabout, swimming
pool, helicopter, bridge, harbor, ship, and plane. The fully annotated DOTA contains 188,282 object instances, which vary
greatly in scale, orientation, and aspect ratio; the resolutions
range from 0.3 to 1 m. The images are collected mainly from
Google Earth [4], but some are taken from JL-1 and the rest
from GF-2 of the China Center for Resources Satellite Data
and Application. ODAI is an updated version of the DOTA
data set and contains 0.4 million annotated object instances
in 16 categories. Both DOTA and ODAI use the same aerial
images, but ODAI has revised and updated the annotation of
objects, adding many small-object instances (approximately
10 pixels or fewer) that were missed in DOTA and extending
the categories by adding a new one: a container crane.

XVIEW
This is one of the largest published aerial data sets, covering 60 object classes [41]. It contains images from complex
scenes and more than one million object instances with
HBB annotations. Compared with images in existing HRRS
data sets, xView images are high resolution, multispectral,
and labeled with a greater variety of objects. The images collected from WorldView-3 have a resolution of up to 0.3 m.
The variations of scale, color, shape, and texture make the
data set more challenging to the remote sensing community.

FIGURE 7. Small, weak object examples of existing HRRS data sets. (a) Small, weak objects in large-scale data sets (ship, windmill, and ground track field) and (b) small, weak objects in small-scale data sets for vehicle detection.

small objects. They can supplement the large-scale data
sets mentioned previously. However, some critical limitations in these data sets are that the object categories are few,
including only cars or airplanes, and the image quality is
high, which is not consistent with actual scenarios. VHR
data sets with more categories, variations, and challenges
need to be developed further.

HRRSD
The HRRSD data set is a large-scale benchmark with 21,758
RBG images extracted from Google Earth and has spatial
resolutions ranging from 0.15 to 1.2 m [42], [43]. There are
13 categories of objects, which allows this to be considered
an extended version of the NWPU VHR-10 data set with additional classes, such as crossroads, parking lots, and T junctions. This data set is class balanced, and each category has
3,700–5,000 instances.

EVALUATION METRICS
There are two categories of metrics for evaluating detector performance: detection speed in frames per second (FPS) and detection accuracy in precision, recall,
and average precision (AP). FPS is a metric used to express how fast the detector is; it means the number of
image FPS that the detector can process. For example,
if the time needed for a detector to analyze a standardscale image is 0.04 s, its detection speed is a frame rate
of 25 FPS.
For a given input image I, the outputs of a detector are
the predicted results {(b j, c j, p j)} Mj = 1 (indexed by the object
order j; M is the number of predicted detections) of the BB
b j, predicted label c j, and confidence score p j. The groundtruth boxes are {(B k, C k)} kN= 1 (indexed by the order k; N is
the number of ground-truth boxes) of the BB B k and label
C k. {(b j, c j, p j)} Mj = 1 are greedily matched to {(B k, C k)} kN= 1.
For given a confidence threshold t and a intersection over
union (IoU) threshold e, a predicted result (b j, c j, p j) is set
as a true positive (TP) if the following criteria are met:
◗◗ The predicted label c j is equal to the label C k of a
ground-truth box (B k, C k), and p j is greater than t.
◗◗ The IoU value between the predicted BB b j and the
ground-truth BB B k, IoU (b j, B k), is larger than e, where
IoU (b j, B k) is computed as

DIOR
The DIOR data set is a recently released aerial DOTA [19]. It
contains 23,463 images with 800 × 800 pixels and 192,472
instances labeled with HBBs. The images were collected
from Google Earth and have resolutions ranging from 0.5
to 30 m. This data set has sufficient variations of scale,
weather, seasons, imaging conditions, and quality as well
as high interclass similarity and intraclass diversity. It is
also one of the larger-scale data sets, with a large number of images
and object instances.
COMPARISON
As shown in Figure 6, early HRRS data sets, such as SZTAKI-INRIA [36] and NWPU VHR-10 [37], contained a small number of categories and instances for the detection of large or
easily recognized objects. After several years, scholars have
forged ahead to introduce massive numbers of instances,
many categories, multisensor data, complex context,
and low-quality images to create large-scale, challenging
data sets, such as xView [41], DIOR [19], HRRSD [42], [43],
and DOTA [18], which are increasingly in
line with the conditions of actual applications. These four
satellite data sets contain at least 13 object categories
and more than 50,000 object instances, with resolutions
ranging from 0.3 to 30 m, and all are available for the development
of detectors that adapt to large-scale object detection. Some
representative small, weak examples from the aforementioned data sets are collected in Figure 7(a). It can be seen
that the objects in the large-scale data sets, such as ships
and windmills, are very small in scale, weak in features, and
easily affected by the context and low-quality data.
There are also data sets that are very challenging and suitable for developing detectors for small, weak object detection. VEDAI [17], 3K vehicle [16], and COWC [39] are three
relatively small-scale data sets used for vehicle detection. As
shown in Figure 7, their images have VHR (up to about
12.5 cm), and their objects, whose sizes fall within a fixed range, are
beneficial for developing and testing a model to detect

$$\mathrm{IoU}(b_j, B_k) = \frac{\mathrm{area}(b_j \cap B_k)}{\mathrm{area}(b_j \cup B_k)}, \tag{1}$$

and the symbols $\cap$ and $\cup$ denote the intersection and
union, respectively. The value of $e$ is generally set to 0.5.
Otherwise, the predicted result is regarded as a false-positive (FP) sample.
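The IoU in (1) can be sketched for axis-aligned (horizontal) BBs as follows. This is a minimal illustration that assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the box format and the function name `iou` are assumptions of this sketch, not part of the article.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, as in (1)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2 x 2 boxes that share a 1 x 1 corner: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```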
Precision is the proportion of correct detections
out of all the detection results predicted by the detector.
Based on the counts of the TP and FP results, it can be
computed by

$$P(t) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}. \tag{2}$$

Recall is defined as the proportion of all ground-truth positive instances that are correctly detected by a detector. It can be formulated by

$$R(t) = \frac{\mathrm{TP}}{N}, \tag{3}$$

where $N$ is the number of ground-truth boxes.
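The matching rule and the definitions in (2) and (3) can be sketched as follows. The article states only that predictions are greedily matched; processing predictions in descending score order and counting a second prediction on an already matched ground-truth box as an FP are common conventions assumed here. The helper `iou` and the function name `precision_recall` are illustrative.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, t=0.5, e=0.5):
    """preds: list of (box, label, score); gts: list of (box, label).

    Returns (P(t), R(t)) for confidence threshold t and IoU threshold e.
    """
    # Keep predictions above the confidence threshold, highest score first.
    kept = sorted((p for p in preds if p[2] > t), key=lambda p: p[2], reverse=True)
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    for box, label, _ in kept:
        best_k, best_iou = None, e
        for k, (gt_box, gt_label) in enumerate(gts):
            if k in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:  # strictly larger than e, as in the text
                best_k, best_iou = k, overlap
        if best_k is None:
            fp += 1
        else:
            matched.add(best_k)
            tp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


gts = [((0, 0, 10, 10), "ship"), ((20, 20, 30, 30), "ship")]
preds = [
    ((0, 0, 10, 10), "ship", 0.9),    # TP
    ((0, 0, 10, 10), "ship", 0.8),    # duplicate of a matched box: FP
    ((50, 50, 60, 60), "ship", 0.7),  # no overlapping ground truth: FP
    ((20, 20, 30, 30), "ship", 0.3),  # below the confidence threshold: ignored
]
print(precision_recall(preds, gts))  # precision 1/3, recall 1/2
```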
Precision and recall can be used to derive AP, which is the metric
most used in recent works. AP is usually computed for each
class separately. The precision, $P(t)$, and the recall, $R(t)$, can

TABLE 3. THE COMPARISONS OF FIVE CATEGORIES OF DETECTION METHODS.

| METHOD NAME | MAIN CATEGORIES/MILESTONES | HIGHLIGHTS | LIMITATIONS |
|---|---|---|---|
| Template matching | Rigid template matching [7], [50], [51], deformable template [52]–[54] | Simple and fast to implement, no training samples required | Limited to the variations of object appearances, consumes more prior knowledge |
| Knowledge | Geometric information [8], [55], context knowledge [9], [56] | Detects objects from coarse-to-fine hierarchical architecture, combines more prior information | Defining the detection rules and knowledge is subjective, labor consuming |
| OBIA | Multiresolution segmentation [57]–[59] | Flexible incorporation of different features, GIS-like functionality and expert knowledge | Lacks generic solutions to the full automation of segmentation process, defining the classification rules is subjective and not robust |
| Classical ML | Features: HoGs [10], BoWs [60], texture features [61], [62], and so on; classifiers: SVM [63], [64], AdaBoost [65], [66], kNN [67], and CRF [68], [69] | Automatically establishes and learns object feature representations, better scalability and compatibility | Requires many labeled training samples, detection accuracy depends on the training samples and the feature extractor |
| DL | Two stage: RCNN [70], SPPNet [71], fast RCNN [72], faster RCNN [73], RFCN [74], and so forth; one stage: YOLO [75], SSD [76], RetinaNet [77], and CornerNet [78] | End-to-end framework without manual intervention, automatically learns high-level features, adapts to large-scale complex image processing | Requires a large number of labeled samples, consumes massive computing resources |

OBIA: object-based image analysis; GIS: geographic information system; HoGs: histogram of oriented gradients; BoWs: bag of words; SVM: support vector machine; kNN: k-nearest
neighbor; CRF: conditional random field; RCNN: region-based convolutional neural network; SPPNet: spatial pyramid pooling network; RFCN: region-based fully convolutional network;
YOLO: you only look once; SSD: single-shot multibox detector.

be computed as a function of the confidence threshold t; by
varying the confidence threshold, t, different pairs (P, R) can
be obtained; in principle, this allows precision to be considered as a function of recall from which the AP value can
be found. The mean AP, the average of the AP values of all
the object categories, has therefore been adopted as the final
measure to evaluate the overall accuracy [44], [45], [49].
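The step from precision-recall pairs to AP and mean AP can be sketched as follows. The article does not state which interpolation scheme is used; this sketch assumes the all-point interpolation familiar from later PASCAL VOC evaluations, in which precision is made monotonically nonincreasing before the area under the curve is summed. The function names and the `(score, is_true_positive)` input format are illustrative.

```python
def average_precision(scored_hits, num_gt):
    """AP for one class.

    scored_hits: list of (score, is_true_positive), one entry per detection.
    num_gt: number of ground-truth boxes of this class (N in the text).
    """
    if num_gt == 0 or not scored_hits:
        return 0.0
    # Lowering the confidence threshold t admits detections in score order.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Assumed interpolation: precision at recall r is the best precision
    # obtained at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean AP: the average of the AP values of all object categories."""
    values = list(ap_per_class.values())
    return sum(values) / len(values) if values else 0.0


hits = [(0.9, True), (0.8, False), (0.7, True)]
print(average_precision(hits, num_gt=2))  # 0.5 * 1.0 + 0.5 * (2/3)
```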
A BRIEF REVIEW OF OBJECT-DETECTION
FRAMEWORKS
Incredible progress has been made in feature representations and classifiers for object detection. In terms of feature
representation and recognition, an impressive change is the
shift from handcrafted features to DL features. In terms of
localization, the sliding-window strategy is mainstream. However, the number of windows is extensive and increases dramatically with the number of image pixels, especially when
processing remote sensing images. Therefore, scholars focus mainly on the design of effective and efficient object-detection strategies; these include sharing feature computations, cascading, reducing per-window computations, the
fast localization of objects of interest, and the reduction of
computational costs. In the following, we briefly review
milestone works (see Table 3 and Figure 8).

FIGURE 8. A road map of object-detection frameworks. SVM: support vector machine; BoWs: bag-of-words; HoGs: histogram of oriented
gradients; RCNN: region-based convolutional neural network; SPPNet: spatial pyramid pooling network; FPN: feature pyramid network;
SSD: single-shot multibox detector; YOLO: you only look once.

OBJECT-DETECTION METHODS BASED
ON TEMPLATE MATCHING
Methods based on template matching [11] are one kind of
simple approach to object detection; they find matches in
an input image, basing them on a series of predefined templates. The two main steps are 1) template generation, in
which a template for each object category should be generated by manual design or learning from the training set, and 2)
similarity measurement, in which, given an input image, the
template is used to match the entire image at each possible
position to find the matches. The methods have been classified into two groups: rigid template matching and deformable template matching. Early research concentrated mainly
on rigid template matching, applying it to detect specific objects with simple appearances and small variations [7], [50],
[51]. Because of its ability to both impose geometrical constraints on the shape and integrate the local image
evidence, deformable template matching is more powerful
and flexible than rigid template matching in handling shape
deformations and intraclass variations [52]–[54]. Object-detection methods based on template matching are simple
and easy to implement for a specific task; expert knowledge is needed only to design the templates, and no
training samples are required. However, designing the templates
calls for considerable prior knowledge and extensive computations, and the templates cope poorly with changes in the scale, rotation,
shape, and viewpoint of objects.
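The similarity-measurement step of rigid template matching can be sketched as follows. The article does not name a similarity measure; zero-mean normalized cross-correlation is assumed here as one common choice, and images are plain nested lists of gray values to keep the sketch free of dependencies. The function names are illustrative.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size 2D lists."""
    p = [v for row in patch for v in row]
    q = [v for row in template for v in row]
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) * sum((b - mean_q) ** 2 for b in q))
    return num / den if den > 0 else 0.0


def match_template(image, template):
    """Slide the template over every position; return the best (row, col) and score."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = float("-inf"), (0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


template = [[1, 2], [3, 4]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
print(match_template(image, template))  # best position is (2, 1)
```

Because the template is compared at a single scale and orientation, a rotated or rescaled object would score poorly, which illustrates the limitation noted in the text.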
OBJECT-DETECTION METHODS
BASED ON KNOWLEDGE
Object-detection methods based on knowledge transform object detection into a hypothesis-testing problem by establishing various types of knowledge and rules. The establishment of knowledge and rules is the most important step. The two
most widely used kinds are geometric and context
knowledge. Geometric information is the most
important and was widely used in early object detection; users can encode prior knowledge through parametric, specific, or generic shape models [8], [55]. Context
knowledge is also crucial; it most often describes the relationship between an object and its background and the relationships
among objects and their surrounding regions or other objects [9], [56].
Methods of this kind enable users to perform the detection process through a coarse-to-fine hierarchical structure.
However, decisions on how to define the prior knowledge and
detection rules are subjective, which poses a critical challenge to these methods. Rules that are too loose cause
false positives; rules that are too tight cause false negatives.
OBJECT-DETECTION METHODS USING
OBJECT-BASED IMAGE ANALYSIS
With the increasing availability of submeter images, object-based image analysis (OBIA) has been proposed for classifying or mapping HRRS imagery into meaningful objects [57]–[59]. It contains two steps: image segmentation
and object classification. First, imagery is segmented into
homogeneous regions (also called objects), each representing a
relatively homogeneous group of pixels; this is achieved by
selecting the desired scale, shape, and compactness criteria. In the second step, a classification process is applied to
these objects.
An advantage of OBIA-based methods is that they exploit the knowledge of geographic information systems to
overcome the limitations of pixel-based image-classification
methods. The real challenges to the satisfactory performance
of OBIA methods lie in defining appropriate segmentation
parameters for objects of varying size, shape, and spatial distribution. In addition, accuracy assessments of OBIA are difficult, although many efforts have been made to address the
problem. The technique’s advantages lie in its flexible incorporation of shape, texture, geometry, and contextual semantic features as well as expert knowledge, making it context aware and multisource capable. However, generic solutions to the
full automation of the segmentation process are still missing,
and the expert knowledge needed to define
the classification rules is still subjective; these problems limit
the technique’s adaptability to different tasks.
OBJECT-DETECTION METHODS
BASED ON CLASSICAL ML
Due to the remarkable advances of ML techniques, especially their impressive feature representations and powerful classifiers, many approaches have treated object
detection as a classification problem, achieving significant improvements. ML object detection can be performed by training a classifier that captures the variations
in object appearances and views from a set of training
data. The classifier takes a set of regions (object proposals or image patches) with their feature representations
as the input; the output consists of their corresponding
predicted labels. The most important components in the
process of object detection are feature extraction, feature
fusion, and classifier training; dimension reduction
is an optional step. Common features include the histogram of oriented gradients (HoGs) [10], bag-of-words (BoWs)
[60], texture [61], [62], sparse representation-based [79],
and Haar-like features [80]. The classifiers
include support vector machines (SVMs) [63], [64], AdaBoost [65], [66], k-nearest neighbors [67], and conditional
random fields [68], [69].
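The feature-extraction and classifier steps can be illustrated with a deliberately reduced sketch. The feature below is a single-cell histogram of unsigned gradient orientations, which is only a simplified stand-in for a full HoG descriptor (no cells, blocks, or block normalization), and the classifier is a one-nearest-neighbor rule rather than an SVM or AdaBoost. All names and the toy patches are assumptions made for illustration.

```python
import math


def orientation_histogram(patch, bins=9):
    """Simplified HoG-like feature: one cell, unsigned gradients, L2 normalized."""
    hist = [0.0] * bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # central differences
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[min(int(angle / (180.0 / bins)), bins - 1)] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def nearest_neighbor_label(feature, training_set):
    """training_set: list of (feature, label); returns the label of the closest feature."""
    return min(training_set, key=lambda item: math.dist(feature, item[0]))[1]


# Toy training patches: intensity changing along columns vs. along rows.
vertical_edge = [[0, 0, 1, 1, 1] for _ in range(5)]
horizontal_edge = [[0] * 5, [0] * 5, [1] * 5, [1] * 5, [1] * 5]
training_set = [
    (orientation_histogram(vertical_edge), "vertical edge"),
    (orientation_histogram(horizontal_edge), "horizontal edge"),
]

query = [[0, 0, 0, 2, 2] for _ in range(5)]  # a shifted, stronger vertical edge
print(nearest_neighbor_label(orientation_histogram(query), training_set))
```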
With methods based on ML, detectors can be established automatically from training data. The scalability and compatibility are both greatly improved, but these methods need
a large number of training samples to learn classifiers and
are not suitable for large-scale data sets. In addition, the
representation ability of the learned features is not sufficiently robust to deal with variations in an object’s appearance.
DL-BASED DETECTION FRAMEWORKS
We discuss DL detectors separately from the ML methods described previously because of the great success
of DL-based techniques in recent years. Deep convolutional neural networks (CNNs) can extract high-level
feature representations of an input image and improve
classification performance. Girshick et al. [70] took the
lead, applying CNNs to object detection by developing
region-based CNN (RCNN) features. Since then, many
milestones have marked the unprecedented speed of the
development of object detection. The main milestone approaches are reviewed in the following sections; they can
be categorized into two classes according to the presence
or absence of a proposal generation stage: two-stage and one-stage detection frameworks. In the next sections, existing
milestones of the two categories of detection frameworks
are introduced first, and then the advances of DL-based
detectors in small, weak object detection are reviewed.
TWO-STAGE DETECTION FRAMEWORKS
As depicted in Figure 9(a), for an input image, a two-stage
detector first extracts DL features using a pretrained
CNN architecture. Then, in the region proposal step, many
regions of interest (RoIs), i.e., regions where a target is
likely to exist, are generated. Finally, a detection head
with a classifier and a regressor simultaneously predicts the location and category of a target for each RoI. The
critical characteristic of two-stage detection frameworks is
that they contain a preprocessing component for generating
object proposals. These kinds of detectors have dominated
object recognition since the creation of RCNNs [70] due
to their remarkable detection performance on benchmark
data sets.
REGIONS WITH CNN FEATURES
The main principle of RCNNs [70] is that they first extract
a set of object proposals (candidate boxes) using a selective
search. The proposals are resized to a fixed scale and fed
into a CNN model pretrained on ImageNet [12], for example, a Visual Geometry Group network
[81], a residual neural network (ResNet) [13], or ResNeXt
[82], to extract high-level features. Then, a linear SVM classifier is used to predict the presence of an object and the object category for each proposed
region. RCNNs have achieved remarkable improvement in
natural image object detection, but they have obvious drawbacks; for example, the selective search strategy may generate more than 2,000 proposal candidates for one image,
significantly increasing the computation cost and slowing the detection speed.

FIGURE 9. The main structures of mainstream frameworks. (a) An illustration of two-stage detection frameworks (using a faster RCNN as
an example). (b) An illustration of one-stage detection frameworks (using YOLO as an example). RPN: region proposal network;
RoI: region of interest.
SPATIAL PYRAMID POOLING NETWORK
To reduce the computational costs incurred by an RCNN, He
et al. [71], [83] proposed the spatial pyramid pooling network
(SPPNet), wherein the SPP layer is the main improvement.
Instead of requiring an input image of fixed size, the SPP layer
can generate a fixed-length feature representation regardless
of the size of the input proposals. During the detection process, the feature maps need only be computed once from the
entire image. The SPP layer can then extract the corresponding
region of the feature maps and generate a fixed-size feature
representation for each region proposal. This significantly
speeds up detection by avoiding repeated computations of
the feature maps. SPPNet achieved speeds more than 20 times
faster than those of RCNNs. However, it is not an end-to-end
framework and can fine-tune only its fully connected layers,
thus limiting the efficiency and performance of the model.
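The fixed-length property of the SPP layer can be illustrated with a minimal sketch. It max-pools a single-channel feature map (a nested list) into 1 × 1, 2 × 2, and 4 × 4 grids of bins and concatenates the results, so the output always has 1 + 4 + 16 = 21 values regardless of the input size. The pyramid levels, the single channel, and the bin-boundary rule (floor for the start, ceiling for the end) are assumptions of this sketch.

```python
def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    For each pyramid level n, the map is divided into an n x n grid of bins,
    and the maximum of each bin is kept.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0 = (i * h) // n                    # floor(i * h / n)
            r1 = ((i + 1) * h + n - 1) // n      # ceil((i + 1) * h / n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = ((j + 1) * w + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Region features of different sizes yield vectors of the same length.
small = [[r * 8 + c for c in range(8)] for r in range(6)]         # 6 x 8
large = [[float(r + c) for c in range(5)] for r in range(13)]     # 13 x 5
print(len(spp(small)), len(spp(large)))  # 21 21
```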
FAST RCNN AND FASTER RCNN
In 2015, Girshick [72] proposed the fast RCNN detection framework, which uses a unified neural module to
localize and recognize targets. It increases detection precision and accelerates detection speed because it can train a
classifier and a BB regressor simultaneously. Although fast
RCNN outperforms RCNNs and SPPNet, it is restricted by
the proposal-generation strategy.
The faster RCNN framework presented by Ren et al. [73]
is a fully end-to-end framework. It breaks through the speed
bottleneck of fast RCNN by introducing a region proposal
network (RPN) that generates object proposals using a CNN model. It achieved a near-real-time detection
speed and state-of-the-art accuracy. From RCNNs to faster
RCNN, the building blocks of a detector, including region
proposal generation, feature extraction, and BB regression,
have been gradually improved and unified into an effective
learning framework.
REGION-BASED FULLY CONVOLUTIONAL NETWORK
The regionwise subnetwork for localizing and recognizing
an object in faster RCNN still needs to be applied per region
proposal (several hundred proposals per image). To address
this problem in faster RCNN, Dai et al. [74] proposed the
region-based fully convolutional network (RFCN), a fully
convolutional architecture with most of the computations
shared over the entire image. Dai et al. constructed a set of
position-sensitive score maps by using a bank of specialized convolutional layers as the FCN output and adding a
position-sensitive RoI pooling (RoIPool) layer on top. An
RFCN with ResNet101 could achieve an accuracy comparable to faster RCNN (often with faster running times).
MASK RCNN
Mask RCNN was presented by He et al. [84], [85] to tackle
pixelwise object-instance segmentation by extending faster
RCNN. Mask RCNN adopts the same two-stage pipeline
with an identical first stage (RPN). In the second stage, mask
RCNN adds a branch that outputs a binary mask for each
RoI in parallel with the class prediction and box offset. The
new branch is an FCN [86], [87] on top of a CNN feature
map. To avoid the misalignments caused by the original
RoIPool layer, an RoI alignment layer was proposed to preserve the pixel-level spatial correspondence. With a ResNeXt101-feature pyramid network
(FPN) backbone, mask RCNN achieved the top results for COCO object-instance segmentation and BB object detection [46].
FPN
The previous detectors detect objects on only the top layer
of the feature-extraction network. In some cases, this is not
suitable for localizing objects, especially small ones.
Lin et al. [88] proposed an FPN whose top-down architecture with skip connections builds feature
maps at all scales. It shows great advances for detecting objects with a
wide variety of scales and aspect ratios and has become
a basic building block in many recent detectors.
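The top-down pathway with skip connections can be sketched on toy data as follows. This is a strong simplification: feature maps are single-channel nested lists, each level is assumed to be exactly half the size of the previous one, upsampling is nearest neighbor, and the convolutions that a real FPN applies before and after merging are omitted. The function names are illustrative.

```python
def upsample2x(feature_map):
    """Nearest-neighbor upsampling of a 2D list by a factor of two."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down(features):
    """features: 2D maps ordered fine -> coarse, each half the size of the previous.

    Returns merged maps in the same order; each one is the sum of the
    backbone map at that level and the upsampled, already merged coarser map.
    """
    merged = [features[-1]]  # the coarsest level is used as is
    for lateral in reversed(features[:-1]):
        up = upsample2x(merged[0])
        fused = [[a + b for a, b in zip(lateral_row, up_row)]
                 for lateral_row, up_row in zip(lateral, up)]
        merged.insert(0, fused)
    return merged


c3 = [[1] * 4 for _ in range(4)]   # fine level, weak semantics
c4 = [[10] * 2 for _ in range(2)]
c5 = [[100]]                       # coarse level, strong semantics
p3, p4, p5 = top_down([c3, c4, c5])
print(p3[0], p4[0], p5[0])  # [111, 111, 111, 111] [110, 110] [100]
```

In this toy setting, the finest output map carries contributions from every coarser level, which is the property that makes such pyramids useful for small objects.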
CHAINED CASCADE NETWORK AND CASCADE RCNN
Two-stage object detection can be considered a cascade
structure; the first stage removes large amounts of background, and the second stage classifies the remaining regions. Recently, the end-to-end learning of more
than two cascaded classifiers and regressors for generic object detection was proposed in the chained cascade network [89], extended in cascade RCNN [90], and later applied
to simultaneous object detection and instance segmentation [91]. These models have a sequence of detection heads
trained with increasing IoU thresholds. The subsequent
heads, with their increasing IoU thresholds, train on
more abundant positive samples to conduct accurate detection and avoid the problem of overfitting.
ONE-STAGE DETECTION FRAMEWORKS
Although two-stage detectors perform satisfactorily, they
are computation intensive and therefore unsuitable for scenarios with limited storage and computational capability.
Research scholars have therefore started to design one-stage
unified detection approaches to accelerate detection speed.
As displayed in Figure 9(b), a one-stage detector directly
predicts the locations of the BB and the class probabilities
in an entire image by using a single CNN. It does not involve the steps of region proposal generation, feature resampling, and postclassification; instead, it encapsulates all
of the computations in a single network [20].
YOU ONLY LOOK ONCE
You only look once (YOLO), presented by Redmon et al. [75],
is considered the first one-stage detector in the DL era. The
model divides the entire image into many regions and then predicts the category probabilities and BB offsets for each region simultaneously. Two improved versions, YOLO v2 and
v3, were proposed later [92], [93]; these further improve
detection precision while retaining high detection speed.
Although they have obvious speed advantages, these models have a lower localization accuracy than do the two-stage
models, especially for small-scale objects.
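The division of the image into regions can be illustrated with a small sketch of how an object is assigned to a grid cell. It assumes a regular S × S grid in which the cell containing the center of a BB is responsible for that object, as in the original YOLO design; the grid size of 7, the 448 × 448 image, and the function name are assumptions of this sketch.

```python
def responsible_cell(box, image_w, image_h, s=7):
    """Return (row, col) of the grid cell that contains the center of the box.

    box: (x_min, y_min, x_max, y_max) in pixels; s: number of cells per side.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Clamp so that a center exactly on the right or bottom border stays in the grid.
    col = min(int(cx / image_w * s), s - 1)
    row = min(int(cy / image_h * s), s - 1)
    return row, col


# A 40 x 20 pixel object centered at (120, 110) in a 448 x 448 image:
# each cell is 64 pixels wide, so the center falls in row 1, column 1.
print(responsible_cell((100, 100, 140, 120), 448, 448))  # (1, 1)
```

With 64-pixel cells, several small objects can fall into the same cell, which is one intuition for the weaker performance on small-scale objects mentioned above.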
SINGLE-SHOT MULTIBOX DETECTOR
To further boost the localization accuracy of a one-stage detector, Liu et al. developed a single-shot multibox detector
(SSD) [76], which is faster than YOLO and achieves better
detection accuracy. The main idea of SSD is to combine the idea of the RPN in faster RCNN with multiscale
feature maps, thus achieving high detection accuracy while
keeping a fast detection speed. Unlike two-stage detectors,
an SSD predicts only a fixed number of BBs, followed
by a nonmaximum suppression (NMS) operation to obtain
the final results. The network architecture of an SSD uses
FCNs. It carries out detection on multiple feature maps, each of which predicts a category score and location offset for each box of an appropriate size.
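The NMS operation mentioned above can be sketched in its standard greedy form: the highest-scoring box is kept, every remaining box that overlaps it by more than an IoU threshold is discarded, and the process repeats. The threshold of 0.5, the box format, and the function names are assumptions of this sketch.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS for one class.

    detections: list of (box, score); returns the kept detections,
    ordered by descending score.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every box that overlaps the kept box too strongly.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # IoU with the first box is 81/119, so it is suppressed
    ((50, 50, 60, 60), 0.7),  # far away, so it is kept
]
print(nms(detections))
```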
RETINANET
For years, there has been a large gap between the accuracies of one- and two-stage detectors. Lin et al. [77] claimed
that the central cause of this gap is the extreme foreground–
background class imbalance encountered during the training of dense detectors. To counter this, a new loss function,
focal loss, was proposed in RetinaNet as a modification of the
standard cross-entropy loss. Focal loss makes the detector
focus more on hard-to-classify examples during training.
It enables one-stage detectors to achieve detection performance comparable to that of two-stage detectors while
maintaining a high detection speed.
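The way focal loss shifts attention to hard examples can be sketched for a single binary prediction. The formula $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$ and the defaults $\gamma = 2$ and $\alpha = 0.25$ are taken from the RetinaNet paper rather than from this article, and the function name and the clamping constant are illustrative.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p: predicted probability of the foreground class; y: 1 (foreground) or 0 (background).
    With gamma = 0, this reduces to alpha-weighted cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)         # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified example (p_t = 0.9) is down-weighted by (1 - 0.9)^2 = 0.01,
# while a hard example (p_t = 0.1) keeps 81% of its cross-entropy loss.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
print(easy < hard)  # True
```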
CORNERNET
Law et al. [78], [94], arguing that the anchor boxes used for regressing the location of objects could cause a huge imbalance between positive and negative examples, proposed
CornerNet. This formulates BB object detection as the identification of paired top-left and bottom-right key points. In
CornerNet, the backbone network consists of two stacked
hourglass networks [95], with a simple corner pooling approach to better localize corners. Its accuracy, although improved, was clearly lower than that of SSD and YOLO.
CornerNet may generate incorrect BBs because it is difficult
to decide which pairs of key points belong to the same objects. Duan et al. [96] addressed the problem by detecting
each object as a triplet of key points, introducing an extra
point at the center of a proposal.
DL FRAMEWORKS FOR SMALL, WEAK OBJECT DETECTION
Although there is no clear definition of small, weak object
detection in the field of remote sensing, some excellent DL-based works have addressed the related challenges. Data augmentation is a straightforward and simple
technique used to improve the detection accuracy of small
objects. Kisantal et al. [97] simply oversampled images with
small objects and augmented each of those by copying and
pasting the small objects many times. Features of different levels in DL models can effectively retain
the location and semantic information of targets with different scales.
The development of multiscale detection, that is, detecting objects at an appropriate feature level, is marked by
many milestones, such as the FPN [88], path aggregation [98], and extended [99], multilevel [100], and multiscale
FPNs [101]. These models have proved their superiority and
achieved satisfactory performance, especially for small-scale
object detection. Despite the success of multiscale detection, some objects lack the discriminative features necessary for recognition. Deng et al. [102] developed
a feature-level superresolution method that enhances the
features of small RoIs. Li et al. [103] proposed a perceptual
generative adversarial network (GAN) that brings the representations of tiny objects closer to those of large objects with similar characteristics for more precise detection. Visual attention is an
effective method for highlighting objects of interest, so it is
also used to detect small and dim objects.
Yang et al. [104] developed a multicategory rotation detector for small, cluttered, and rotated objects wherein a supervised pixel-attention network and a channel-attention
network are jointly used for highlighting small and cluttered objects. Lim et al. [105] combined context information with the objects of interest to compensate for the limited
information of small objects. To address the nonuniform
distribution of objects, Yang et al. [106] presented a clustered detection network wherein a cluster proposal subnetwork
produces object cluster regions and a scale-estimation subnetwork estimates object scales for each region. The cluster-based scale estimation is more accurate than estimation based
on single objects, and the clustered regions implicitly model the prior context information. The detailed techniques
and approaches for addressing small, weak target detection
are summarized in the next section.
ADVANCES FOR ADDRESSING DIFFERENT
CHALLENGES IN SMALL, WEAK OBJECT DETECTION
Inspired by the significant progress of object-detection
methods and technologies, extensive studies have been
devoted to object detection in remote sensing. Having
thoroughly reviewed the recent progress of representative
methods for remote sensing object detection, we introduce
some critical technologies and methods that address the
challenges to small, weak object detection. All of the mentioned approaches are divided into three aspects for solving
the challenges discussed in the “Difficulties and Challenges
in Remote Sensing Small, Weak Object Detection” section.
HANDLING THE CHALLENGES
INVOLVED IN IMAGE QUALITY
In remote sensing image acquisition, there are various
kinds of uncertain factors, such as noise, blurring, thin
clouds, missing information, and shadows, which may cause
some degree of image degradation. In addition, due to
the limitations of manufacturing technologies and the
characteristics of imaging sensors, remote sensing images
can reach a high resolution in only one aspect of spectral,
spatial, and temporal resolution. These low-quality images cause missed or false detections of small, weak
objects. Therefore, improving the quality of remote sensing images is of great significance for small, weak object
detection. In the following, the problems to be solved
by the current methods for improving image quality are
summarized from two aspects: image degradation and
imaging sensor limitations.
HANDLING IMAGE DEGRADATION
The factors that cause image degradation can be divided
into two categories: 1) the atmospheric influence on the
waves reflected by ground objects and 2) the loss of information caused by damaged components of the imaging sensors. These factors produce a variety of degradation types,
such as noise, blurring, thin clouds, missing information,
and shadows. Over the past few years,
many approaches have been developed for addressing these
different types of degradation.
In general, noise cannot be entirely avoided while acquiring remote sensing images. The most common types are additive, multiplicative speckle, and stripe noises. Some classical denoising methods are described in [107]–[109].
The causes of blurring in remote sensing images are optical blurring, mainly caused by imaging components; motion
blurring, caused by relative motion between the target and
sensor; and atmospheric blurring, caused by atmospheric
turbulence. Most deblurring models use regularization terms
to keep the solution stable and suppress the corresponding noise interference. In general, existing works for image
deblurring can be divided into 1) image restoration with a
known blur kernel function and 2) blind image restoration
with an unknown blur kernel function [110], [111].
A large number of remote sensing images are likely covered
by clouds, which can be characterized as thin and thick
clouds. Thin clouds lead to the color fading of objects and
reduce the contrast of objects in the images, making them
difficult to recognize. In recent years, many approaches
[112]–[114] have been proposed for thin cloud removal.
Thick clouds and damaged sensors cause the loss of some
image regions. In this case, the surface information of Earth
obtained from the images is incomplete and difficult to use
in real-world applications. Some representative methods
[115]–[117] have been developed to restore the missing
parts of remote sensing images.
Because of the imaging angle of sensors, shadows are
one of the basic characteristics of remote sensing images.
Tall trees, scattered buildings, mountains, and so on may
cause shadows. Many small, weak objects in shadows are
more difficult to recognize. Some effective methods for removing shadows are introduced in [118]–[121].
HANDLING SENSOR LIMITATIONS
Due to the limitations of sensors, remote sensing images
achieve high performance in only one aspect of spatial,
spectral, and temporal resolution, which cannot meet the requirements for some specific tasks. Additionally, when processing remote sensing images, it is necessary to discretize
the time, space, spectrum, and observation angle information from original images to save them in the form of digital
images. The process of discretization often means downsampling data, which inevitably leads to a loss of information. To
some extent, image-fusion models that fuse single or multisource images with different resolutions can remedy the
degradation of remote sensing images and improve the data
quality. Information-complementary fusion methods include spatial and spectral [122], temporal and spatial [123],
and multispectral and hyperspectral fusion [124].
HANDLING THE CHALLENGES INVOLVED
WITH OBJECT VARIATIONS
HRRS images always contain massive numbers of object categories and
instances, which vary in scale, appearance, and distribution. Their features change easily, as they are affected
by weather, illumination, and occlusions. Additionally,
due to large image sizes, the problem of unbalanced positive and negative training examples is quite serious, and
high-quality training instances are relatively few. Obtaining large-scale annotated data sets is another critical problem for achieving satisfactory detection performance. The
aforementioned challenges of object variations are divided
into four types in this article: scale variations, high intraclass variations, the imbalance of positive and negative examples, and a lack of annotated data sets. Strictly speaking, the scale problem belongs to high intraclass variations; however,
because of its importance in remote sensing object detection, we list it separately and summarize the corresponding
methods. The methods used to address the challenges of
these four aspects are introduced in the following sections.
HANDLING SCALE VARIATIONS
In the remote sensing community, scale variations, overlarge
images, complex image backgrounds, and the nonuniform
distribution of training samples make detection tasks more
challenging, especially for small and cluttered objects. Some
targets, such as football fields and harbors, are wider than
150 m and occupy 300 pixels in an image, while the widths
of some other targets, such as vehicles, are less than 3 m
and can occupy only 10 pixels in an image. The multiscale
detection of objects with different sizes and aspect ratios is
one of the main challenges in remote sensing object detection. Many scholars have further improved the model and
achieved better results for robust multiscale detection.
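The pixel counts quoted above follow from dividing an object's ground size by the ground sampling distance (GSD) of the image. The following minimal sketch illustrates that arithmetic; the function name and the GSD values of 0.5 m and 0.3 m per pixel are assumptions chosen to reproduce the figures in the text, not values stated by the cited works.

```python
def footprint_pixels(object_size_m: float, gsd_m_per_px: float) -> int:
    """Approximate number of pixels an object spans along one dimension."""
    if gsd_m_per_px <= 0:
        raise ValueError("ground sampling distance must be positive")
    return round(object_size_m / gsd_m_per_px)


# Assumed GSDs: 0.5 m/pixel for the harbor case, 0.3 m/pixel for the vehicle case.
harbor_px = footprint_pixels(150.0, 0.5)
vehicle_px = footprint_pixels(3.0, 0.3)
print(harbor_px, vehicle_px)
```

Under these assumptions, the two object classes differ by a factor of about 30 in pixel extent, which is the gap a multiscale detector has to bridge.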
There are three main categories of detection methods
used in Earth observation. The first category uses image
or sliding-window pyramids as the input. Zhang et al. [125],
[126] resized the input image to different scales and extracted image features on each scale. Yao et al. [127]–[129] used
multiscale sliding windows with different step sizes to conduct training with images for generating potential candidate
boxes. This approach, however, is too time-consuming and computationally expensive
to meet the requirements of practical applications. The second category is based mainly on manually designed multiscale features, such as the scale-invariant
feature transform (SIFT) [130], HoG [10], and BoW [60].
Beril et al. [131] utilized the SIFT feature and graph theory
to detect buildings and urban areas. Shi et al. [40], [132]
combined both circle-frequency and HoG features to learn
the appearances and shapes of objects. Sun et al. [134] developed a spatial sparse-coding BoW model to build the visual vocabulary by clustering local features; it can effectively
fuse local and global features. However, these first two categories
of methods struggle to achieve satisfactory performance
in remote sensing target detection
because they depend on handcrafted features, which are extracted
according to expert experience and are not robust enough
to process complex remote sensing images.
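To make the cost of the first category concrete, the sketch below enumerates the candidate boxes produced by sliding a fixed-size window over an image pyramid. It is an illustration under stated assumptions, not the procedure of [125]–[129]: the function names, the 64-pixel window, the 32-pixel step, and the halving scale factor are all hypothetical choices.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in original-image pixels


def pyramid_scales(min_scale: float = 0.25, factor: float = 0.5) -> List[float]:
    """Scales 1.0, factor, factor**2, ... down to min_scale (inclusive)."""
    scales, s = [], 1.0
    while s >= min_scale:
        scales.append(s)
        s *= factor
    return scales


def sliding_windows(width: int, height: int, win: int, step: int) -> Iterator[Tuple[int, int]]:
    """Top-left corners of win x win windows that fit inside a width x height image."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield x, y


def pyramid_windows(width: int, height: int, win: int = 64, step: int = 32,
                    min_scale: float = 0.25, factor: float = 0.5) -> List[Window]:
    """Run a fixed-size window over every pyramid level and map each window
    back to original-image coordinates."""
    boxes = []
    for s in pyramid_scales(min_scale, factor):
        w, h = int(width * s), int(height * s)
        for x, y in sliding_windows(w, h, win, step):
            boxes.append((round(x / s), round(y / s), round(win / s), round(win / s)))
    return boxes


boxes = pyramid_windows(256, 256)
print(len(boxes))
```

Even this small 256 x 256 example yields dozens of windows, and the count grows roughly with the image area, which is why exhaustive pyramids become impractical for large remote sensing scenes.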
The third category is based on DL. Since 2014, many learning-based detectors that incorporate the object proposal strategy, coupled with the remarkable performance of DL-based features [13], [14], [81],
[135], have enabled significant improvements in the performance of object localization and recognition [136]–[138].
Multireference and multiresolution detection, developed on
this basis, have become the two most widely used fundamental blocks in the task of object detection [21]. The main
idea of multireference detection is to predefine a set of reference boxes (anchor boxes) with different sizes and aspect
ratios and then to predict the detection box based on those
references. The milestone models are Faster RCNN [73], RetinaNet [77], and Mask RCNN [84], [85].
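The idea of predefined reference boxes can be sketched as follows. This is a minimal illustration that generates anchors of several scales and aspect ratios around one location; the function name, the scale set (32, 64, 128), and the ratio set (0.5, 1, 2) are assumptions for illustration and are not taken from any specific detector cited here.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def make_anchors(cx: float, cy: float,
                 scales: Sequence[float] = (32, 64, 128),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Anchor boxes centered at (cx, cy), one per (scale, ratio) pair.

    Each anchor keeps an area of scale**2 while its height/width equals ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(100.0, 100.0)
print(len(anchors))
```

A detector then regresses offsets relative to whichever anchor best matches an object, so the scale and ratio sets should roughly cover the object sizes expected in the data.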
Multiresolution detection detects objects with different
scales by constructing a feature pyramid at different layers
of the network. The shallow layers hold information about
small objects, while the deep layers contain information
about large objects. The main improvements are in the FPN
[88]. To detect multiscale objects, especially small ones, in
HRRS images, Guo et al. [139] and Zhang et al. [140] designed unified multiscale detection frameworks; they used
a modified FPN as well as anchors with different scales and
aspect ratios.
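One way to see how a feature pyramid routes objects of different sizes to different layers is the level-assignment heuristic commonly used with FPN-style detectors, in which a box of width w and height h is assigned to level k = floor(k0 + log2(sqrt(w h) / 224)). The sketch below assumes the usual defaults (k0 = 4, canonical size 224, levels 2 to 5); these constants and the function name are assumptions here, and the frameworks in [139] and [140] may use different rules.

```python
import math


def fpn_level(box_w: float, box_h: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    """Pyramid level for a box: small boxes map to fine (shallow) levels,
    large boxes to coarse (deep) levels."""
    k = math.floor(k0 + math.log2(math.sqrt(box_w * box_h) / canonical))
    return max(k_min, min(k_max, k))


for size in (10, 112, 224, 448):
    print(size, fpn_level(size, size))
```

With these defaults, a 10-pixel vehicle is clamped to the finest level, which is why the quality of shallow, high-resolution features matters so much for small objects.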
Qiu et al. [141] developed an adaptive aspect ratio multiscale network, which utilizes a multiscale feature gate-fusion subnetwork and an aspect ratio attention network
to learn the weights of different feature maps and automatically select the appropriate aspect ratios in accordance
with the aspect ratios of objects. Wu et al. [142] introduced
multiscale and rotation-insensitive convolutional channel
features by involving two modules, the rotation-insensitive
descriptor and the multiscale aggregated descriptor. Al-Alimi et al. [143] designed a unique shallow-deep feature
extraction method that employs a squeeze-and-excitation network
and ResNet to obtain feature maps. Deng et al. [144] addressed the problem of scale variation by applying different
filters to several intermediate layers. Li et al. [145] proposed
multiscale convolutional feature fusion to detect multisensor HRRS images using a symmetric encoder–decoder
module to extract and fuse multiscale and high-level spatial features.
Some scholars have focused their research work on segmentation methods. Dong and You [146], [147] utilized a
graph-segmentation algorithm, constructed on multiscale saliency maps, to overcome the problem
of ship scale change and accurately locate candidate regions.
Kang et al. [148] designed an FCN with dense SPP for building detection that can extract dense and multiscale features
simultaneously. Mo et al. [149] focused on generating an
anchor of the most suitable scale for each category and developed a class-specific anchor block, which provides better initial values for an RPN. Xie et al. [150] used multidetectors with different sensitivities and accessed the fused
features to finish the task of target detection. Superresolution [102] and GANs [103] have also been used to restore or
enhance the feature response of small targets during the
detection process.
HANDLING INTRACLASS VARIATIONS
Objects in HRRS images vary in color, texture, and shape
because of the vast number of object instances and
categories as well as the influences of weather, illumination,
imaging conditions, and occlusion. For real-scenario HRRS
image object detection, powerful object representations
that are robust and discriminative should be extracted.
Many recent works have been devoted to handling
object variations by applying DL models to remote sensing object detection. However, CNN models are not inherently
invariant to geometric transformations of the
input data. In processing HRRS images, the performance
of these models is limited due to the intraclass variations
of objects.
Data augmentation, including rotation and
resizing, is the most straightforward method used to address intraclass variations.
To some extent, these operations help detectors learn robustness with regard to rotation and scale, although they can involve expensive training and
a massive number of model parameters. Therefore, many
attempts have been made to learn invariant CNN representations with respect to different transformations, including
scale [151]–[153], rotation [151], [154]–[156], or both [157].
Early deformable part-based models (DPMs) [157], which
represent objects by components arranged in a deformable
configuration, were successful for generic object detection,
but these models are less sensitive to object variations in
both pose and viewpoint. Many scholars have attempted
to combine DPMs with CNNs, aiming to realize the advantages of both [159]–[161]. To address the problem of occlusions, deformable RoIPool [161]–[163] and deformable
convolution have been proposed to achieve more flexibility
in fixed geometric structures [27]. Another method, the application of GANs [164], [165] to generate missing parts of
objects and context, is promising.
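As a concrete instance of the rotation augmentation mentioned at the start of this discussion, the sketch below rotates an image by 90° clockwise and transforms its axis-aligned boxes consistently. It is a minimal, library-free illustration; the function name and the (x_min, y_min, x_max, y_max) pixel-corner box convention are assumptions, and arbitrary-angle rotation would additionally require interpolation and rotated boxes.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), pixel-corner coordinates


def rotate90_cw(image: Sequence[Sequence[int]], boxes: Sequence[Box]):
    """Rotate an image (list of rows) 90 degrees clockwise together with its boxes."""
    height = len(image)
    # Reversing the row order and transposing gives a clockwise rotation.
    rotated = [list(row) for row in zip(*image[::-1])]
    # A point (x, y) moves to (height - y, x); box corners swap roles accordingly.
    new_boxes = [(height - y1, x0, height - y0, x1) for (x0, y0, x1, y1) in boxes]
    return rotated, new_boxes


image = [[0, 0, 1],
         [0, 0, 0]]          # one bright pixel at row 0, column 2
boxes = [(2, 0, 3, 1)]       # box around that pixel
rotated, rotated_boxes = rotate90_cw(image, boxes)
print(rotated, rotated_boxes)
```

Applying the function four times returns the original image and boxes, which is a convenient sanity check when adding such transforms to a training pipeline.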
HANDLING THE IMBALANCE OF POSITIVE
AND NEGATIVE EXAMPLES
In essence, training a detector is a problem in imbalanced data learning. For detectors based on a sliding
window, the imbalance between objects of interest and
backgrounds may be as extreme as 10^4–10^5 background
windows for each object [21]. For a modern detection task
with a prediction of the object aspect ratio, the imbalance ratio increases to greater than 10^6. In this case, a
vast number of negative and easy samples would dominate the
training process, and the detector would achieve poor performance for hard-to-recognize objects, especially small,
weak objects. Hard negative mining focuses on solving
the problem of imbalanced data during the training process. Bootstrapping was a milestone technique used for
addressing the problem of a training data imbalance in
object detection, in which the training starts with a small
number of background samples to which new misclassified backgrounds are added iteratively during the training
process [166].
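The bootstrapping loop described above can be sketched generically as follows. This is an illustrative outline, not the procedure of [166]: the function name, the `train` and `score` callables, the 0.5 decision threshold, and the toy one-dimensional classifier are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def bootstrap_negatives(train: Callable, score: Callable,
                        positives: Sequence[float], negatives: Sequence[float],
                        initial: int = 2, rounds: int = 3,
                        threshold: float = 0.5) -> Tuple[object, List[float]]:
    """Start from a few background samples and iteratively add the
    backgrounds that the current model misclassifies as objects."""
    active = list(negatives[:initial])
    pool = list(negatives[initial:])
    model = train(positives, active)
    for _ in range(rounds):
        false_pos = [n for n in pool if score(model, n) >= threshold]
        if not false_pos:
            break  # no misclassified backgrounds remain in the pool
        active.extend(false_pos)
        pool = [n for n in pool if n not in false_pos]
        model = train(positives, active)
    return model, active


# Toy 1D example: the "model" is a decision boundary halfway between the
# lowest positive and the highest negative seen so far.
def toy_train(pos, neg):
    return (min(pos) + max(neg)) / 2


def toy_score(boundary, x):
    return 1.0 if x >= boundary else 0.0


model, used = bootstrap_negatives(toy_train, toy_score,
                                  positives=[0.9, 0.8],
                                  negatives=[0.1, 0.2, 0.6, 0.3, 0.7])
print(model, used)
```

In the toy run, only the two backgrounds that fool the initial model are added, while the easy background at 0.3 never enters training, which mirrors how bootstrapping keeps the training set small and focused.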
Later in the DL era, detectors such as Faster RCNN [73]
and YOLO [75] balanced the weights of
positive and negative samples. However, that method
cannot completely address the imbalanced data problem.
Bootstrapping was therefore reintroduced in DL-based detectors [76], [167].
In RefineDet [168], an anchor-refinement module is designed to filter easy negatives. An alternative improvement
is to design new loss functions [77], [170] by reshaping the
standard cross-entropy loss to put more focus on difficult,
misclassified examples. The recent A-Fast-RCNN detection
model [164], which utilizes GANs to handle occlusion and
deformation samples, is also regarded as an example of the hard-mining approach. Pang et al. [172] proposed an IoU-balanced sampling method to adaptively select high-quality
negative examples in the proposal candidates for stabilizing the training process.
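A well-known example of reshaping the cross-entropy loss is the focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below compares it with plain cross-entropy for a single binary prediction; the defaults γ = 2 and α = 0.25 are commonly used values and are assumptions here, as are the function names.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for predicted object probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss: cross-entropy scaled by (1 - p_t)**gamma and a class weight."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
for p in (0.9, 0.1):
    print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

The modulating factor shrinks the loss of the easy example far more than that of the hard one, so the many easy backgrounds contribute little to the gradient.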
In Earth observation literature, recent research works reveal that detection data sets contain an overwhelming number of easy examples and only a few difficult examples. Many
scholars have therefore tried to mine the more representative
difficult examples to balance the proportion of foreground–
background class examples. Traditional methods usually
freeze the model to mine negative examples; however, positive sample mining is also essential to avoid missed detections. Moreover, freezing the model to collect difficult examples
would dramatically slow training progress.
Cheng et al. [173] developed a two-step iterative training strategy, which alternates between updating the detection model given the training set and adaptively
selecting the difficult negative examples for updating
the detection model. Focusing on airport detection, Cai
et al. [174] and Xu et al. [175] applied cascade strategies
to automatically select difficult examples according to
the loss values of proposals. The cascade strategies significantly reduced the false alarms in airport detection.
HANDLING INSUFFICIENT TRAINING DATA
The difficulty of acquiring annotation samples means that
the training data are not usually sufficient for obtaining
ideal models, and data augmentation is the most straightforward method for increasing training data. In addition,
research scholars have developed many methods to address
the problem; these can be divided into three categories:
transfer learning (TL), active learning (AL), and weakly supervised learning (WSL). TL can effectively transfer well-trained knowledge from one or more source tasks to another task; this needs only a small amount of labeled data
and eliminates the drudgery of preliminary learning [176]–
[179]. Dong et al. [180] proposed a Sig-NMS-based Faster
RCNN with TL; this can annotate not only the class of an
object but also its location. Chan-Hon-Tong et al. [181] and
Kellenberger et al. [182] exploited an AL-based strategy to
find very confident samples for the quick retrieval of TPs in
the target data set.
Another method, WSL, addresses the data insufficiency problem by training detectors using image-level labels
only. Recently, research works on WSL have followed different branches. Some scholars have utilized multi-instance
learning for WSL [183]–[185]. In this formulation, the object candidates in an image
are treated as a bag of instances, and the image-level
annotation acts as the label of the bag. The object detector is then obtained by alternating between training the detector and using it
to select the most likely object instances in positive images.
Research works on CNN visualization have demonstrated
that the convolution layer of a CNN model behaves as a
target detector even though there is no supervision of the
object’s location. Therefore, class-activation mapping sheds
light on a way to give a CNN model localization ability by
training it on image-level labels [186]–[188]. Some scholars
automatically select the most informative regions and train
them with image-level annotation [189]. Another method
masks out different regions of the image to localize the object [190]. Interactive annotation [184] and generative adversarial training have also been used for WSL [191].
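The class-activation mapping idea mentioned above can be written as M_c(x, y) = Σ_k w_k^c f_k(x, y), where f_k is the k-th feature map of the last convolutional layer and w_k^c is the classifier weight linking that map to class c. The following library-free sketch computes such a map and its peak; the function names and the tiny 2 x 2 feature maps are hypothetical, and real implementations operate on tensors from a trained network.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def class_activation_map(feature_maps: Sequence[Sequence[Sequence[float]]],
                         class_weights: Sequence[float]) -> Grid:
    """Weighted sum of K feature maps (each H x W) using one class's K weights."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam


def peak_location(cam: Grid) -> Tuple[int, int]:
    """(row, column) of the strongest response, a crude localization cue."""
    cells = ((i, j) for i in range(len(cam)) for j in range(len(cam[0])))
    return max(cells, key=lambda ij: cam[ij[0]][ij[1]])


feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],   # map 0 fires at the top-left
    [[0.0, 0.0], [0.0, 2.0]],   # map 1 fires at the bottom-right
]
cam = class_activation_map(feature_maps, class_weights=[0.5, 1.0])
print(cam, peak_location(cam))
```

Thresholding such a map and taking the bounding box of the surviving region is one simple way to turn image-level supervision into a coarse object location.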
To address the problem of a lack of annotated HRRS
data sets, Zhang et al. [192] employed an iterative, weakly
supervised learning framework to automatically mine and
augment a training data set from the original images.
Cao et al. [193] proposed a novel multi-instance-detection
algorithm based on learning, using it to learn instancewise
detectors from such a “weak annotation.” In the algorithm,
a density estimator is adopted to estimate the density map
of vehicle instances from the positive regions; a multi-instance SVM is then trained to classify and locate vehicle
instances from this map. Although existing WSL methods
take scenes as being isolated and ignore the mutual cues
between scene pairs when optimizing deep networks,
Li et al. [194] exploited both the separate scene category
information and the mutual cues between scene pairs to
train deep networks well enough to pursue superior object-detection performance.
HANDLING COMPLEX CONTEXT
Objects of interest are always embedded in a typical context
with surrounding environments and objects. An HRRS image usually involves a broad range of space and contains
many kinds of objects that form an intricate spatial pattern.
The complex background of the objects of interest increases
the difficulty of highly accurate detection; however, many
existing works have demonstrated that the proper use of
context information can improve the performance of detectors. Current works on the adaptation of complex backgrounds have been divided into two categories: 1) detection with a suppressing background and 2) detection with
related context information.
DETECTION WITH SUPPRESSING BACKGROUND
Many early works, taking advantage of the remarkable
feature-extraction ability of the CNN model, directly applied the models to adapt to the complex, changeable background and learn discriminative features for HRRS image
detection [126], [195]. To effectively distinguish between
the target and background information, Xiao et al. [196] designed an encoder–decoder network to perform paired semantic segmentation for per-pixel prediction. The top-left
and bottom-right parts of the objects of interest are then
predicted, and the rotated minimum BB is generated as the
rotated anchor. Compared to the presented methods, this
method is more robust across different data sets.
DETECTION WITH RELATED CONTEXT INFORMATION
The remote sensing community has long acknowledged that
context information benefits the improvement of object detection. Therefore, more work has been done to explore how
to make good use of that information. Context information
can be placed into two categories: local and global context
[21]. Local context refers to visual information such as the texture, color, and objects in the region that surrounds the targets to be detected. In contrast, global context employs scene
semantics as the additional information for target detection.
Existing methods focus mainly on fusing local contexts
to improve detection performance. Gong et al. [197] integrated the context RoIs’ mining layer into the detector. The
layer can extract local context features by mapping context
RoIs to multilevel feature maps. Considering the limited label information provided by objects—especially small objects—in the feature map, Mo et al. [149] doubled the size
of the region proposal box, with the center in the predicted
box, to incorporate the local context information and thus
improve the discriminative ability of features in recognizing the objects. Ma et al. proposed a multimodel decision fusion network [198], based on gated recurrent units
(GRUs) [199], in which one of the subnetworks is designed
to learn the local context of objects of interest and the object–object relationships. GRUs are used to merge all of the
features and form discriminative-feature representation.
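The box-enlargement strategy attributed to Mo et al. [149] above can be illustrated with a short sketch that scales a box about its center to take in surrounding context. Doubling both width and height (rather than the area) and clipping to the image bounds are assumptions made here for illustration, as is the function name.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def expand_box(box: Box, factor: float = 2.0,
               image_size: Optional[Tuple[int, int]] = None) -> Box:
    """Scale a box about its center; optionally clip to (width, height)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * factor / 2
    half_h = (y1 - y0) * factor / 2
    out = (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
    if image_size is not None:
        width, height = image_size
        out = (max(0.0, out[0]), max(0.0, out[1]),
               min(float(width), out[2]), min(float(height), out[3]))
    return out


print(expand_box((10, 10, 20, 20)))
print(expand_box((0, 0, 10, 10), image_size=(12, 12)))
```

For a 10-pixel object, the enlarged region covers four times the original area, so most of the pooled features then describe the surroundings rather than the object itself.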
Bell et al. [200] developed the inside–outside network
(ION) to exploit information both inside and outside the
RoIs; it integrates the contextual information outside the
RoIs by using spatial recurrent neural networks. Xiao et al.
[129] fused auxiliary features within and around the RoIs
to represent the complementary information of each region
proposal for airport detection, effectively alleviating detection problems caused by the diversity of illumination intensities in remote sensing images. To generate accurate rotated
BBs in large-scale aerial images, Feng et al. [202] proposed
a detection network that introduced a novel sequence local
context module. It can extract local context features, thus
making the rotated BB fit the ship tightly. The accurate BB
can include the discriminative parts, such as the prow, and
exclude noise information, such as the background.
Other works have promoted the global context as additional information. Focusing on the task of vehicle detection, Tao et al. [158] proposed a vehicle-detection method
driven by scene context. It first classifies the input image into different scene categories (e.g., road, parking lot,
and others) and then detects vehicles in the different scenes
separately, using the contextual information provided by
the scene prior. Incorporating the scene prior before vehicle detection can effectively confine the region where vehicles
may be present and allow a more flexible postprocessing
strategy according to different scene types. By analyzing the
relationship of objects and scenes in remote sensing images, Chen et al. [133] found that most of the objects appeared in their relevant scenes. The objects have a strong
correlation with the contextual information of their scene.
Chen et al. [133] proposed a scene-contextual FPN that fuses
the global scene features into region proposal features for
training the classifier.
Both global and local contextual information is valuable, so the fusion of the two may achieve a better performance. Relevant work has been carried out on this approach. Zhang and Liu et al. [169] proposed a context-aware
detection network to improve the accuracy of target detection; this can learn the correlations of the global information (at the scene level) and the local neighboring objects
or features (at the object level). Li and Gong [156] used a
double-channel network to fuse the local and global features and thereby enhance the discriminative power of the features.
FUTURE RESEARCH DIRECTIONS
Despite tremendous recent progress in small, weak object
detection, the main technologies are still primitive and
cannot satisfactorily address all the difficulties and challenges. Our analysis shows that future research may focus
on (but should not be limited to) the following areas.
DETECTION WITH MULTISOURCE DATA FUSION
Detectors for small, weak object detection may not be
stable. The fusion of multiple sources/modalities of data,
such as 3D point clouds, lidar, and Internet data, is of great
importance for improving detection accuracy. Two critical
problems should be addressed: how to encode multisource
or multimodal data into a unified input for the detectors
and how to transfer well-trained detectors to different modalities of data.
WEAKLY SUPERVISED DETECTION
Recent state-of-the-art approaches require many samples
with accurate annotation, in the manner of fully supervised
learning. However, labeling samples is labor intensive and
time consuming. Meanwhile, weakly/partially annotated
or unlabeled samples are easily accessible and sufficient.
Therefore, it is essential to leverage DL-based models to
learn from these samples to boost detection ability.
LIGHTWEIGHT OBJECT MODEL
The number of layers in existing CNN models for extracting
features has dramatically increased from several [14] to hundreds of layers [13], [206]. They have millions of parameters
and need massive computation resources and training data
to obtain an ideal model. To train the CNN models effectively,
much work has been done to develop a series of lightweight
and compact models. However, a significant gap in the efficiency between detectors and the human eye remains.
AUTOMATIC NEURAL ARCHITECTURE SEARCH
Most existing target detectors are based on manual design.
Meeting problems of ever-increasing complexity requires increasing domain knowledge and expertise. Recently, a natural
research direction has been to automatically select and build
a detector that balances performance against the number of parameters, for example, through automated ML [201]. Related work
should be carried out for small, weak object detection.
IMPROVEMENT OF IMAGE QUALITY
Affected by imaging conditions such as weather, light, and
the resolution of sensors, remote sensing images may not be
able to meet the requirements of usage, as they are blurred
or noisy or have low resolution. Algorithms, such as those
undertaken in image fusion, image denoising, and superresolution, have been developed to address these problems.
These should be combined with detection methods to improve detection performance.
UNIVERSAL OBJECT FRAMEWORK
Recently, increasing efforts have been made in learning universal representations, reinforcement learning, and lifelong
learning; these are effective in learning, transferring, and
reasoning knowledge from massive data. It is meaningful to
design a universal object framework based on state-of-the-art advances, which can gradually self-evolve and improve
detection performance.
CONCLUSIONS
To meet the requirements of some applications, the task of
small, weak object detection, which is more challenging
than generic object detection, has become increasingly important and attracted much attention. During
the last several years, considerable efforts have been made
to develop various methods that address small, weak object detection. This article presented a systematic review of
the advances of small, weak object detection in the remote
sensing community. Having analyzed the challenges and
difficulties of small, weak target detection, we discussed
the technical evolution of object detection and benchmark
data sets. Finally, we categorized the existing works that address different challenges and outlined some promising research directions for the further improvement of small, weak object detection. Research on small,
weak object detection is still far from complete, but given
the breakthroughs over the past several years, we are optimistic about future developments.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China under grants U1711266 and 41925007 and
the Fundamental Research Funds for the Central Universities,
China University of Geosciences, Wuhan (no. 162301212697).
Lizhe Wang and Ruyi Feng are the corresponding authors.
AUTHOR INFORMATION
Wei Han (weihan@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan,
430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of
Geosciences, Wuhan, 430078, China.
Jia Chen (chen_jia@cug.edu.cn) is with the School of
Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of
Intelligent Geo-Information Processing, China University
of Geosciences, Wuhan, 430078, China.
Lizhe Wang (lizhe.wang@foxmail.com) is with the
School of Computer Science, China University of Geosciences, Wuhan, 430078, China, the Hubei Key Laboratory
of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China, and the Key
Laboratory of Geological Survey and Evaluation of the
Ministry of Education, China University of Geosciences,
Wuhan, 430078, China.
Ruyi Feng (fengry@cug.edu.cn) is with the School of
Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of
Intelligent Geo-Information Processing, China University
of Geosciences, Wuhan, 430078, China.
Fengpeng Li (li_feng_peng@cug.edu.cn) is with the
School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China
University of Geosciences, Wuhan, 430078, China.
Lin Wu (wulin@cug.edu.cn) is with the Key Laboratory of
Geological Survey and Evaluation of the Ministry of Education,
China University of Geosciences, Wuhan, 430074, China.
Tian Tian (tiantian@cug.edu.cn) is with the School of
Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of
Intelligent Geo-Information Processing, China University
of Geosciences, Wuhan, 430078, China.
Jining Yan (yanjn@cug.edu.cn) is with the School of
Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of
Intelligent Geo-Information Processing, China University
of Geosciences, Wuhan, 430078, China.
REFERENCES
[1] Z. Lin et al., “A contextual and multitemporal active-fire detection algorithm based on FengYun-2G S-VISSR data,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8840–8852,
2019. doi: 10.1109/TGRS.2019.2923248.
[2] Z. Lin et al., “An active fire detection algorithm based on multitemporal FengYun-3C VIRR data,” Remote Sens. Environ., vol.
211, pp. 376–387, June 2018. doi: 10.1016/j.rse.2018.04.027.
[3] N. Wang, F. Chen, B. Yu, and Y. Qin, “Segmentation of large-scale remotely sensed images on a spark platform: A strategy
for handling massive image tiles with the MapReduce model,”
ISPRS J. Photogram. Remote Sens., vol. 162, pp. 137–147, Apr.
2020. doi: 10.1016/j.isprsjprs.2020.02.012.
[4] N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau,
and R. Moore, “Google earth engine: Planetary-scale geospatial analysis for everyone,” Remote Sens. Environ., vol. 202, pp.
18–27, Dec. 2017. doi: 10.1016/j.rse.2017.06.031.
[5] D. Li, Y. Ke, H. Gong, and X. Li, “Object-based urban tree species classification using bi-temporal worldview-2 and worldview-3 images,” Remote Sens., vol. 7, no. 12, pp. 16,917–16,937,
2015. doi: 10.3390/rs71215861.
[6] K. Huang and X. Mao, “Detectability of infrared small targets,”
Infrared Phys. Techn., vol. 53, no. 3, pp. 208–217, 2010. doi:
10.1016/j.infrared.2009.12.001.
[7] D. M. McKeown Jr. and J. L. Denlinger, “Cooperative methods
for road tracking in aerial imagery,” in Proc. IEEE Comput. Soc.
Conf. Comput. Vis. Pattern Recognit., 1988, pp. 662–672. doi:
10.1109/CVPR.1988.196307.
[8] S. Leninisha and K. Vani, “Water flow based geometric active
deformable model for road network,” ISPRS J. Photogram. Remote Sens., vol. 102, pp. 140–147, Apr. 2015. doi: 10.1016/j.isprsjprs.2015.01.013.
[9] J. Peng and Y. Liu, “Model and context-driven building extraction in dense urban aerial images,” Int. J. Remote Sens., vol. 26, no.
7, pp. 1289–1307, 2005. doi: 10.1080/01431160512331326675.
[10] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 886–893. doi: 10.1109/CVPR.2005.177.
[11] G. Cheng and J. Han, “A survey on object detection in optical
remote sensing images,” ISPRS J. Photogram. Remote Sens., vol.
117, pp. 11–28, July 2016. doi: 10.1016/j.isprsjprs.2016.03.014.
[12] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, “ImageNet:
A large-scale hierarchical image database,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2009,
pp. 248–255.
[13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning
for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc.
Adv. Neural Inf. Process. Syst. 25: 26th Annu. Conf. Neural Inf. Process. Syst., 2012, pp. 1106–1114.
[15] F. Li, R. Feng, W. Han, and L. Wang, “High-resolution remote
sensing image scene classification via key filter bank based
on convolutional neural network,” IEEE Trans. Geosci. Remote
Sens., vol. 58, no. 11, pp. 8077–8092, 2020. doi: 10.1109/TGRS.2020.2987060.
[16] K. Liu and G. Mattyus, “Fast multiclass vehicle detection on
aerial images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 9, pp.
1938–1942, 2015. doi: 10.1109/LGRS.2015.2439517.
[17] S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun. Image
R, vol. 34, pp. 187–203, 2016. doi: 10.1016/j.jvcir.2015.11.002.
[18] G. Xia et al., “DOTA: A large-scale dataset for object detection
in aerial images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3974–3983. doi: 10.1109/CVPR.2018.00418.
[19] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing images: A survey and a new
benchmark,” ISPRS J. Photogram. Remote Sens., vol. 159, pp.
296–307, Jan. 2020. doi: 10.1016/j.isprsjprs.2019.11.023.
[20] L. Liu et al., “Deep learning for generic object detection: A survey,” Int. J. Comput. Vis., vol. 128, no. 2, pp. 261–318, 2020. doi:
10.1007/s11263-019-01247-4.
[21] Z. Zou, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20
years: A survey,” 2019. [Online]. Available: http://arxiv.org/abs/1905.05055
[22] M. Manana, C. Tu, and P. A. Owolawi, “A survey on vehicle
detection based on convolution neural networks,” in Proc. 3rd
IEEE Int. Conf. Comput. Commun. (ICCC), 2017, pp. 1751–1755.
doi: 10.1109/CompComm.2017.8322840.
[23] A. Borji, M.-M. Cheng, Q. Hou, H. Jiang, and J. Li, “Salient object detection: A survey,” Comput. Vis. Media, vol. 1411, no. 7,
pp. 1–34, 2014. doi: 10.1007/s41095-019-0149-9.
[24] W. Wang, Q. Lai, H. Fu, J. Shen, and H. Ling, “Salient object
detection in the deep learning era: An in-depth survey,” 2019,
arXiv:1904.09146.
[25] J. Han, D. Zhang, G. Cheng, N. Liu, and D. Xu, “Advanced
deep-learning techniques for salient and category-specific object detection: A survey,” IEEE Signal Process. Mag., vol. 35, no.
1, pp. 84–100, 2018. doi: 10.1109/MSP.2017.2749125.
[26] Z. Liu, H. Wang, L. Weng, and Y. Yang, “Ship rotated bounding
box space for ship extraction from high-resolution optical satellite images with complex backgrounds,” IEEE Geosci. Remote
Sens. Lett., vol. 13, no. 8, pp. 1074–1078, 2016. doi: 10.1109/LGRS.2016.2565705.
[27] X. Zhang, Y. Yang, Z. Han, H. Wang, and C. Gao, “Object class
detection: A survey,” ACM Comput. Surv., vol. 46, no. 1, pp.
10:1–10:53, 2013. doi: 10.1145/2522968.2522978.
[28] G.-D. Wang, C.-Y. Chen, and X.-B. Shen, “Facet-based infrared
small target detection method,” Electron. Lett., vol. 41, no. 22,
pp. 1244–1246, 2005. doi: 10.1049/el:20052289.
[29] G. J. Klinker, S. A. Shafer, and T. Kanade, “Image segmentation
and reflection analysis through color,” in Proc. Appl. Artific. Intell. VI, vol. 937, 1988, pp. 229–244. doi: 10.1117/12.946980.
[30] P. W. Kruse, “Principles of uncooled infrared focal plane arrays,” in Semiconductors Semimetals, vol. 47, P. W. Kruse and D.
D. Skatrud, Amsterdam, The Netherlands: Elsevier, 1997,
pp. 17–42.
[31] J. Han, Y. Ma, B. Zhou, F. Fan, K. Liang, and Y. Fang, “A robust
infrared small target detection algorithm based on human visual system,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 12, pp.
2168–2172, 2014. doi: 10.1109/LGRS.2014.2323236.
[32] B. Lei, B. Wang, G. Sun, Y. Xu, P. Hong, C. Liu, and S. Yue, “A
fast detection method for small weak infrared target in complex
background,” in Proc. Infrared, Millimeter-Wave, Terahertz Technol. IV, vol. 10030, 2016, p. 100301V. doi: 10.1117/12.2245912.
[33] A. G. Tartakovsky, S. Kligys, and A. Petrov, “Adaptive sequential algorithms for detecting targets in a heavy IR clutter,” in
Proc. Signal Data Process. Small Targets 1999, vol. 3809, pp. 119–
130. doi: 10.1117/12.364013.
[34] A. G. Tartakovsky and R. B. Blazek, “Effective adaptive spatial-temporal technique for clutter rejection in IRST,” in Proc. Signal Data Process. Small Targets 2000, vol. 4048, pp. 85–95. doi:
10.1117/12.392023.
[35] B. L. Rozovskii, A. Petrov, and R. B. Blazek, “Interactive banks
of Bayesian matched filters,” in Proc. Signal Data Process. Small
Targets 2000, vol. 4048, pp. 122–133. doi: 10.1117/12.391972.
[36] C. Benedek, X. Descombes, and J. Zerubia, “Building development monitoring in multitemporal remotely sensed image
pairs with stochastic birth-death dynamics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 1, pp. 33–50, 2012. doi:
10.1109/TPAMI.2011.94.
[37] G. Cheng, J. Han, P. Zhou, and L. Guo, “Multi-class geospatial
object detection and geographic image classification based on
collection of part detectors,” ISPRS J. Photogram. Remote Sens.,
vol. 98, pp. 119–132, 2014. doi: 10.1016/j.isprsjprs.2014.10.002.
[38] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao, “Orientation
robust object detection in aerial images using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process.,
2015, pp. 3735–3739. doi: 10.1109/ICIP.2015.7351502.
[39] T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye, “A
large contextual dataset for classification, detection and counting of cars with deep learning,” in Proc. Comput. Vis. - ECCV
2016 - 14th Euro. Conf., Amsterdam, The Netherlands, pp. 785–
800. doi: 10.1007/978-3-319-46487-9_48.
[40] Z. Xiao, Q. Liu, G. Tang, and X. Zhai, “Elliptic Fourier transformation-based histograms of oriented gradients for rotationally invariant object detection in remote-sensing images,”
Int. J. Remote Sens., vol. 36, no. 2, pp. 618–644, 2015. doi:
10.1080/01431161.2014.999881.
[41] D. Lam et al., “xView: Objects in context in overhead imagery,”
2018, arXiv:1802.07856.
[42] Y. Zhang, Y. Yuan, Y. Feng, and X. Lu, “Hierarchical and robust
convolutional neural network for very high-resolution remote
sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 57,
no. 8, pp. 5535–5548, 2019. doi: 10.1109/TGRS.2019.2900302.
[43] X. Lu, Y. Zhang, Y. Yuan, and Y. Feng, “Gated and axis-concentrated localization network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp.
179–192, 2020. doi: 10.1109/TGRS.2019.293517.
[44] M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams,
J. M. Winn, and A. Zisserman, “The Pascal visual object classes
challenge: A retrospective,” Int. J. Comput. Vis., vol. 111, no. 1,
pp. 98–136, 2015. doi: 10.1007/s11263-014-0733-5.
[45] O. Russakovsky et al., “ImageNet large scale visual recognition
challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252,
2015. doi: 10.1007/s11263-015-0816-y.
[46] T. Lin et al., “Microsoft COCO: Common objects in context,”
in Proc. Comput. Vis. - ECCV 2014 - 13th Euro. Conf., D. J. Fleet,
T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., in Lecture Notes
in Computer Science, vol. 8693, 2014, pp. 740–755. doi:
10.1007/978-3-319-10602-1_48.
[47] A. Kuznetsova et al., “The open images dataset V4: Unified
image classification, object detection, and visual relationship
detection at scale,” 2018. [Online]. Available: http://arxiv.org/
abs/1811.00982
[48] G. Heitz and D. Koller, “Learning spatial context: Using stuff to
find things,” in ECCV 2008: Proc. 10th Euro. Conf. Comput. Vis.,
Part I, pp. 30–43. doi: 10.1007/978-3-540-88682-2_4.
[49] M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, and
A. Zisserman, “The Pascal visual object classes (VOC) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, 2010.
doi: 10.1007/s11263-009-0275-4.
[50] J. Zhang, X. Lin, Z. Liu, and J. Shen, “Semi-automatic road
tracking by template matching and distance transformation in
urban areas,” Int. J. Remote Sens., vol. 32, no. 23, pp. 8331–8347,
2011. doi: 10.1080/01431161.2010.540587.
[51] J. Zhou, W. F. Bischof, and T. Caelli, “Road tracking in aerial
images based on human–computer interaction and Bayesian
filtering,” ISPRS J. Photogram. Remote Sens., vol. 61, no. 2, pp.
108–124, 2006. doi: 10.1016/j.isprsjprs.2006.09.002.
[52] M. A. Fischler and R. A. Elschlager, “The representation
and matching of pictorial structures,” IEEE Trans. Comput., vol. C-22, no. 1, pp. 67–92, 1973. doi: 10.1109/
T-C.1973.223602.
[53] A. K. Jain, Y. Zhong, and M. Dubuisson-Jolly, “Deformable
template models: A review,” Signal Process., vol. 71, no. 2, pp.
109–129, 1998. doi: 10.1016/S0165-1684(98)00139-X.
[54] C. Xu and H. Duan, “Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target
recognition for low-altitude aircraft,” Pattern Recognit.
Lett., vol. 31, no. 13, pp. 1759–1772, 2010. doi: 10.1016/j.
patrec.2009.11.018.
[55] A. Huertas and R. Nevatia, “Detecting buildings in aerial images,” Comput. Vis. Graph. Image Process., vol. 41, no. 2, pp. 131–
152, 1988. doi: 10.1016/0734-189X(88)90016-3.
[56] R. B. Irvin and D. M. McKeown, “Methods for exploiting the
relationship between buildings and their shadows in aerial
imagery,” IEEE Trans. Syst., Man, Cybern., vol. 19, no. 6, pp.
1564–1575, 1989. doi: 10.1109/21.44071.
[57] T. Blaschke, “Object based image analysis for remote sensing,”
ISPRS J. Photogram. Remote Sens., vol. 65, no. 1, pp. 2–16, 2010.
doi: 10.1016/j.isprsjprs.2009.06.004.
[58] T. Blaschke et al., “Geographic object-based image analysis–Towards a new paradigm,” ISPRS J. Photogram. Remote Sens., vol.
87, pp. 180–191, Jan. 2014. doi: 10.1016/j.isprsjprs.2013.09.014.

[59] T. Blaschke, S. Lang, and G. Hay, Object-Based Image Analysis:
Spatial Concepts for Knowledge-Driven Remote Sensing Applications. Berlin: Springer-Verlag, 2008.
[60] F. Li and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in Proc. IEEE Comput. Soc.
Conf. Comput. Vis. Pattern Recognit. CVPR, pp. 524–531. doi:
10.1109/CVPR.2005.16.
[61] Ö. Aytekin, U. Zöngür, and U. Halici, “Texture-based airport
runway detection,” IEEE Geosci. Remote Sens. Lett., vol. 10, no.
3, pp. 471–475, 2013. doi: 10.1109/LGRS.2012.2210189.
[62] C. Senaras, M. Ozay, and F. T. Yarman-Vural, “Building detection with decision fusion,” IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., vol. 6, no. 3, pp. 1295–1304, 2013. doi: 10.1109/
JSTARS.2013.2249498.
[63] V. Vapnik, Statistical Learning Theory. Hoboken, NJ: Wiley, 1998.
[64] J. Inglada, “Automatic recognition of man-made objects in
high resolution optical remote sensing images by SVM classification of geometric image features,” ISPRS J. Photogram. Remote Sens., vol. 62, no. 3, pp. 236–248, 2007. doi: 10.1016/j.
isprsjprs.2007.05.011.
[65] Y. Freund and R. E. Schapire, “Experiments with a new boosting
algorithm,” in Proc. 13th Int. Conf. Machine Learn. (ICML ’96),
Bari, Italy, 1996, pp. 148–156. doi: 10.5555/3091696.3091715.
[66] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997. doi: 10.1006/
jcss.1997.1504.
[67] E. Blanzieri and F. Melgani, “Nearest neighbor classification of
remote sensing images with the maximal margin principle,”
IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1804–1811,
2008. doi: 10.1109/TGRS.2008.916090.
[68] E. Li, J. Femiani, S. Xu, X. Zhang, and P. Wonka, “Robust rooftop extraction from visible band images using higher order
CRF,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4483–
4495, 2015. doi: 10.1109/TGRS.2015.2400462.
[69] P. Zhong and R. Wang, “A multiple conditional random fields
ensemble model for urban area detection in remote sensing
optical images,” IEEE Trans. Geosci. Remote Sens., vol. 45, no.
12, pp. 3978–3988, 2007. doi: 10.1109/TGRS.2007.907109.
[70] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), 2014, pp. 580–587. doi: 10.1109/CVPR.2014.81.
[71] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling
in deep convolutional networks for visual recognition,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916,
2015. doi: 10.1109/TPAMI.2015.2389824.
[72] R. B. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, 2015, pp. 1440–1448. doi:
10.1109/ICCV.2015.169.
[73] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” in
Proc. Annu. Conf. Neural Inf. Process. Syst., 2015, pp. 91–99.
[74] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Proc. Annu. Conf.
Neural Inf. Process. Syst., 2016, pp. 379–387.

[75] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You
only look once: Unified, real-time object detection,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp.
779–788.
[76] W. Liu et al., “SSD: Single shot multibox detector,” in Proc. 14th
Euro. Conf. Comput. Vis., Amsterdam, The Netherlands, 2016,
pp. 21–37. doi: 10.1007/978-3-319-46448-0_2.
[77] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 42, no. 2, pp. 318–327, 2020. doi: 10.1109/
TPAMI.2018.2858826.
[78] H. Law and J. Deng, “Cornernet: Detecting objects as paired
keypoints,” Int. J. Comput. Vis., vol. 128, no. 3, pp. 642–656,
2020. doi: 10.1007/s11263-019-01204-1.
[79] Y. Zhao and J. Yang, “Hyperspectral image denoising via sparse
representation and low-rank constraint,” IEEE Trans. Geosci.
Remote Sens., vol. 53, no. 1, pp. 296–308, 2015. doi: 10.1109/
TGRS.2014.2321557.
[80] P. A. Viola and M. J. Jones, “Rapid object detection using a
boosted cascade of simple features,” in Proc. 2001 IEEE Comput.
Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 511–518.
doi: 10.1109/CVPR.2001.990517.
[81] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
[82] S. Xie, R. B. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated
residual transformations for deep neural networks,” in Proc.
2017 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp.
5987–5995. doi: 10.1109/CVPR.2017.634.
[83] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,”
in Proc. 13th Euro. Conf. Comput. Vis., 2014, pp. 346–361. doi:
10.1007/978-3-319-10578-9_23.
[84] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,”
in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2980–
2988. doi: 10.1109/ICCV.2017.324.
[85] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–
397, 2020. doi: 10.1109/TPAMI.2018.2844175.
[86] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional
networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 3431–3440. doi:
10.1109/CVPR.2015.7298965.
[87] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional
networks for semantic segmentation,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017. doi: 10.1109/
TPAMI.2016.2572683.
[88] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J.
Belongie, “Feature pyramid networks for object detection,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp.
936–944. doi: 10.1109/CVPR.2017.106.
[89] W. Ouyang, K. Wang, X. Zhu, and X. Wang, “Chained cascade
network for object detection,” in Proc. IEEE Int. Conf. Comput.
Vis. (ICCV), 2017, pp. 1956–1964. doi: 10.1109/ICCV.2017.214.
[90] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into
high quality object detection,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), 2018, pp. 6154–6162. doi: 10.1109/
CVPR.2018.00644.
[91] K. Chen et al., “Hybrid task cascade for instance segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
2019, pp. 4974–4983. doi: 10.1109/CVPR.2019.00511.
[92] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
2017, pp. 6517–6525. doi: 10.1109/CVPR.2017.690.
[93] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018. [Online]. Available: http://arxiv.org/abs/1804.02767
[94] H. Law and J. Deng, “CornerNet: Detecting objects as paired
keypoints,” in Proc. 15th Euro. Conf. Comput. Vis., 2018, pp.
765–781.
[95] A. Newell, K. Yang, and J. Deng, “Stacked hourglass networks
for human pose estimation,” in Proc. 14th Euro. Conf. Comput.
Vis., Amsterdam, The Netherlands, 2016, pp. 483–499.
[96] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “Centernet: Keypoint triplets for object detection,” in Proc. IEEE/CVF
Int. Conf. Comput. Vis. (ICCV), 2019, pp. 6568–6577.
[97] M. Kisantal, Z. Wojna, J. Murawski, J. Naruniec, and K. Cho,
“Augmentation for small object detection,” 2019. [Online].
Available: http://arxiv.org/abs/1902.07296
[98] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit. (CVPR), 2018, pp. 8759–8768.
[99] C. Deng, M. Wang, L. Liu, and Y. Liu, “Extended feature pyramid network for small object detection,” 2020. [Online]. Available: https://arxiv.org/abs/2003.07021
[100] Q. Zhao et al., “M2Det: A single-shot object detector based
on multi-level feature pyramid network,” in Proc. 33rd AAAI
Conf. Artific. Intell., 2019, pp. 9259–9266. doi: 10.1609/aaai.
v33i01.33019259.
[101] Z. Liu, G. Gao, L. Sun, and Z. Fang, “HRDNet: High-resolution
detection network for small objects,” 2020. [Online]. Available:
https://arxiv.org/abs/2006.07607
[102] J. Noh, W. Bae, W. Lee, J. Seo, and G. Kim, “Better to follow, follow to be better: Towards precise supervision of feature superresolution for small object detection,” in Proc. IEEE/CVF Int.
Conf. Comput. Vis. (ICCV), 2019, pp. 9724–9733.
[103] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, “Perceptual
generative adversarial networks for small object detection,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp.
1951–1959. doi: 10.1109/CVPR.2017.211.
[104] X. Yang et al., “SCRDet: Towards more robust detection for
small, cluttered and rotated objects,” in Proc. IEEE/CVF Int.
Conf. Comput. Vis. (ICCV), 2019, pp. 8231–8240. doi: 10.1109/
ICCV.2019.00832.
[105] J. Lim, M. Astrid, H. Yoon, and S. Lee, “Small object detection
using context and attention,” 2019. [Online]. Available: http://
arxiv.org/abs/1912.06319
[106] F. Yang, H. Fan, P. Chu, E. Blasch, and H. Ling, “Clustered object detection in aerial images,” in Proc. IEEE/CVF Int. Conf.
Comput. Vis. (ICCV), 2019, pp. 8310–8319.
[107] V. S. Frost, J. A. Stiles, K. S. Shanmugan, and J. C. Holtzman,
“A model for radar images and its application to adaptive
digital filtering of multiplicative noise,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. PAMI-4, no. 2, pp. 157–166, 1982. doi:
10.1109/TPAMI.1982.4767223.
[108] D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel, “Adaptive noise smoothing filter for images with signal-dependent
noise,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, no.
2, pp. 165–177, 1985. doi: 10.1109/TPAMI.1985.4767641.
[109] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation
based noise removal algorithms,” Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992. doi: 10.1016/0167-2789(92)90242-F.
[110] C. R. Vogel and M. E. Oman, “Fast, robust total variation-based reconstruction of noisy, blurred images,” IEEE Trans.
Image Process., vol. 7, no. 6, pp. 813–824, 1998. doi: 10.1109/
83.679423.
[111] J. Cai, H. Ji, C. Liu, and Z. Shen, “Framelet-based blind motion
deblurring from a single image,” IEEE Trans. Image Process., vol.
21, no. 2, pp. 562–572, 2012. doi: 10.1109/TIP.2011.2164413.
[112] M. Xu, M. R. Pickering, A. J. Plaza, and X. Jia, “Thin cloud
removal based on signal transmission principles and spectral mixture analysis,” IEEE Trans. Geosci. Remote Sens.,
vol. 54, no. 3, pp. 1659–1669, 2016. doi: 10.1109/TGRS.2015.
2486780.
[113] Y. Zhang, B. Guindon, and J. Cihlar, “An image transform to
characterize and compensate for spatial variations in thin
cloud contamination of Landsat images,” Remote Sens. Environ., vol. 82, nos. 2–3, pp. 173–187, 2002. doi: 10.1016/S0034-4257(02)00034-2.
[114] S. Le Hégarat-Mascle and C. André, “Use of Markov random
fields for automatic cloud/shadow detection on high resolution optical images,” ISPRS J. Photogram. Remote Sens., vol.
64, no. 4, pp. 351–366, 2009. doi: 10.1016/j.isprsjprs.2008.
12.007.
[115] J. Zhang, M. K. Clayton, and P. A. Townsend, “Missing data
and regression models for spatial images,” IEEE Trans. Geosci.
Remote Sens., vol. 53, no. 3, pp. 1574–1582, 2015. doi: 10.1109/
TGRS.2014.2345513.
[116] C. Zeng, H. Shen, and L. Zhang, “Recovering missing pixels for
Landsat ETM+ SLC-off imagery using multi-temporal regression analysis and a regularization method,” Remote Sens. Environ., vol. 131, pp. 182–194, Apr. 2013. doi: 10.1016/j.rse.2012.
12.012.
[117] X. Li, H. Shen, L. Zhang, and H. Li, “Sparse-based reconstruction of missing information in remote sensing images
from spectral/temporal complementary information,” ISPRS
J. Photogram. Remote Sens., vol. 106, pp. 1–15, Aug. 2015. doi:
10.1016/j.isprsjprs.2015.03.009.
[118] H. Li, L. Zhang, and H. Shen, “An adaptive nonlocal regularized shadow removal method for aerial remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 106–120,
2014. doi: 10.1109/TGRS.2012.2236562.
[119] G. D. Finlayson, S. D. Hordley, and M. S. Drew, “Removing
shadows from images using retinex,” in Proc. 10th Color Imag.
Conf., Color Sci. Eng. Syst., Technol., Appl., 2002, pp. 73–79.
[120] A. Suzuki, A. Shio, H. Arai, and S. Ohtsuka, “Dynamic shadow
compensation of aerial images based on color and spatial
analysis,” in Proc. 15th Int. Conf. Pattern Recognit. (ICPR’00),
Barcelona, Spain, 2000, pp. 1317–1320. doi: 10.1109/ICPR.2000.
905339.
[121] H. Song, B. Huang, and K. Zhang, “Shadow detection and
reconstruction in high-resolution satellite images via morphological filtering and example-based learning,” IEEE Trans.
Geosci. Remote Sens., vol. 52, no. 5, pp. 2545–2554, 2014. doi:
10.1109/TGRS.2013.2262722.
[122] H. Li, B. S. Manjunath, and S. K. Mitra, “Multi-sensor image fusion using the wavelet transform,” in Proc. 1994 Int. Conf. Image
Process., pp. 51–55. doi: 10.1109/ICIP.1994.413273.
[123] F. Gao, J. G. Masek, M. R. Schwaller, and F. G. Hall, “On the
blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance,” IEEE Trans. Geosci.
Remote Sens., vol. 44, no. 8, pp. 2207–2218, 2006. doi: 10.1109/
TGRS.2006.872081.
[124] Q. Wei, J. M. Bioucas-Dias, N. Dobigeon, and J. Tourneret, “Hyperspectral and multispectral image fusion based on a sparse
representation,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7,
pp. 3658–3668, 2015. doi: 10.1109/TGRS.2014.2381272.
[125] L. Zhang and Y. Zhang, “Airport detection and aircraft recognition based on two-layer saliency model in high spatial resolution remote-sensing images,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 10, no. 4, pp. 1511–1524, 2017. doi:
10.1109/JSTARS.2016.2620900.
[126] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, “Accurate object localization in remote sensing images based on convolutional neural
networks,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp.
2486–2498, 2017. doi: 10.1109/TGRS.2016.2645610.
[127] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, “A coarse-to-fine model
for airport detection from remote sensing images using
target-oriented visual saliency and CRF,” Neurocomputing,
vol. 164, pp. 162–172, Sept. 2015. doi: 10.1016/j.neucom.2015.
02.073.
[128] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, “Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,” IEEE Trans.
Geosci. Remote Sens., vol. 53, no. 6, pp. 3325–3337, 2015. doi:
10.1109/TGRS.2014.2374218.
[129] Z. Xiao, Y. Gong, Y. Long, D. Li, X. Wang, and H. Liu, “Airport detection based on a multiscale fusion feature for optical remote sensing images,” IEEE Geosci. Remote Sens. Lett.,
vol. 14, no. 9, pp. 1469–1473, 2017. doi: 10.1109/LGRS.2017.
2712638.
[130] D. G. Lowe, “Distinctive image features from scale-invariant
keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
doi: 10.1023/B:VISI.0000029664.99615.94.
[131] B. Sirmaçek and C. Ünsalan, “Urban-area and building detection using SIFT keypoints and graph theory,” IEEE Trans. Geosci.
Remote Sens., vol. 47, no. 4, pp. 1156–1167, 2009. doi: 10.1109/
TGRS.2008.2008440.
[132] Z. Shi, X. Yu, Z. Jiang, and B. Li, “Ship detection in high-resolution optical imagery based on anomaly detector and local
shape feature,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8,
pp. 4511–4523, 2014. doi: 10.1109/TGRS.2013.2282355.
[133] C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao, and J. Zhang, “Scene context-driven vehicle detection in high-resolution aerial images,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7339–7351,
2019. doi: 10.1109/TGRS.2019.2912985.
[134] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, “Automatic target
detection in high-resolution remote sensing images using
spatial sparse coding bag-of-words model,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 1, pp. 109–113, 2011. doi: 10.1109/
LGRS.2011.2161569.
[135] C. Szegedy et al., “Going deeper with convolutions,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 1–9.
doi: 10.1109/CVPR.2015.7298594.
[136] S. Zhuang, P. Wang, B. Jiang, G. Wang, and C. Wang, “A single
shot framework with multi-scale feature fusion for geospatial
object detection,” Remote Sens., vol. 11, no. 5, p. 594, 2019. doi:
10.3390/rs11050594.
[137] S. Chen, R. Zhan, and J. Zhang, “Geospatial object detection in
remote sensing imagery based on multiscale single-shot detector with activated semantics,” Remote Sens., vol. 10, no. 6, p.
820, 2018. doi: 10.3390/rs10060.
[138] W. Li, R. Dong, H. Fu, and L. Yu, “Large-scale oil palm tree
detection from high-resolution satellite images using two-stage
convolutional neural networks,” Remote Sens., vol. 11, no. 1, p.
11, 2019. doi: 10.3390/rs11010011.
[139] W. Guo, W. Yang, H. Zhang, and G. Hua, “Geospatial object
detection in high resolution satellite images based on multiscale convolutional neural network,” Remote Sens., vol. 10, no.
1, p. 131, 2018. doi: 10.3390/rs10010131.
[140] X. Zhang et al., “Geospatial object detection on high resolution
remote sensing imagery based on Double multi-scale feature
Pyramid Network,” Remote Sens., vol. 11, no. 7, p. 755, 2019.
doi: 10.3390/rs11070755.
[141] H. Qiu, H. Li, Q. Wu, F. Meng, K. N. Ngan, and H. Shi, “A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for object
detection in remote sensing images,” Remote Sens., vol. 11, no.
13, pp. 1–23, 2019. doi: 10.3390/rs11131594.
[142] X. Wu, D. Hong, P. Ghamisi, W. Li, and R. Tao, “MsRi-CCF:
Multi-scale and rotation-insensitive convolutional channel
features for geospatial object detection,” Remote Sens., vol. 10,
no. 12, p. 1990, 2018. doi: 10.3390/rs10121990.
[143] D. AL-Alimi, Y. Shao, R. Feng, M. A. Al-Qaness, M. A. Elaziz,
and S. Kim, “Multi-scale geospatial object detection based
on shallow-deep feature extraction,” Remote Sens., vol. 11,
no. 21, 2019.
[144] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, “Multiscale object detection in remote sensing imagery with convolutional neural networks,” ISPRS J. Photogram. Remote
Sens., vol. 145, pp. 3–22, Nov. 2018. doi: 10.1016/j.isprsjprs.
2018.04.003.
[145] Z. Li, H. Shen, Q. Cheng, Y. Liu, S. You, and Z. He, “Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors,” ISPRS J. Photogram.
Remote Sens., vol. 150, pp. 197–212, Mar. 2019. doi: 10.1016/j.
isprsjprs.2019.02.017.
[146] C. Dong, J. Liu, F. Xu, and C. Liu, “Ship detection from optical
remote sensing images using multi-scale analysis and Fourier
HOG descriptor,” Remote Sens., vol. 11, no. 13, p. 1529, 2019.
doi: 10.3390/rs11131529.
[147] Y. You, Z. Li, B. Ran, J. Cao, S. Lv, and F. Liu, “Broad area target search system for ship detection via deep convolutional
neural network,” Remote Sens., vol. 11, no. 17, p. 1965, 2019.
doi: 10.3390/rs11171965.
[148] W. Kang, Y. Xiang, F. Wang, and H. You, “EU-Net: An efficient
fully convolutional network for building extraction from optical remote sensing images,” Remote Sens., vol. 11, no. 23,
p. 2813, 2019. doi: 10.3390/rs11232813.
[149] N. Mo, L. Yan, R. Zhu, and H. Xie, “Class-specific anchor
based and context-guided multi-class object detection in High
Resolution Remote Sensing Imagery with a convolutional
neural network,” Remote Sens., vol. 11, no. 3, p. 272, 2019. doi:
10.3390/rs11030272.
[150] W. Xie, H. Qin, Y. Li, Z. Wang, and J. Lei, “A novel effectively
optimized one-stage network for object detection in remote
sensing imagery,” Remote Sens., vol. 11, no. 11, p. 1376, 2019.
doi: 10.3390/rs11111376.
[151] G. Cheng, P. Zhou, and J. Han, “RIFD-CNN: Rotation-invariant and Fisher discriminative convolutional neural networks for object detection,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), 2016, pp. 2884–2893. doi: 10.1109/
CVPR.2016.315.
[152] J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp.
1872–1886, 2013. doi: 10.1109/TPAMI.2012.230.
[153] H. He, Y. Lin, F. Chen, H. Tai, and Z. Yin, “Inshore ship detection in remote sensing images via weighted pose voting,” IEEE
Trans. Geosci. Remote Sens., vol. 55, no. 6, pp. 3091–3107, 2017.
doi: 10.1109/TGRS.2017.2658950.
[154] G. Cheng, J. Han, P. Zhou, and D. Xu, “Learning rotation-invariant and Fisher discriminative convolutional neural
networks for object detection,” IEEE Trans. Image Process.,
vol. 28, no. 1, pp. 265–278, 2019. doi: 10.1109/TIP.2018.
2867198.
[155] Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, “Oriented response networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), 2017, pp. 4961–4970.
[156] K. Li, G. Cheng, S. Bu, and X. You, “Rotation-insensitive and
context-augmented object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2337–
2348, 2018. doi: 10.1109/TGRS.2017.2778300.
[157] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Proc. Annu. Conf. Neural
Inf. Process. Syst., 2015, pp. 2017–2025.
[158] C. Chen, W. Gong, Y. Chen, and W. Li, “Object detection in
remote sensing images based on a scene-contextual feature
pyramid network,” Remote Sens., vol. 11, no. 3, p. 339, 2019.
doi: 10.3390/rs11030339.
[159] R. B. Girshick, F. N. Iandola, T. Darrell, and J. Malik, “Deformable part models are convolutional neural networks,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 437–
446. doi: 10.1109/CVPR.2015.7298641.
[160] W. Ouyang et al., “DeepID-Net: Deformable deep convolutional neural networks for object detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 2403–2412.
doi: 10.1109/CVPR.2015.7298854.
[161] J. Dai et al., “Deformable convolutional networks,” in Proc.
IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 764–773.
[162] T. Mordan, N. Thome, G. Hénaff, and M. Cord, “End-to-end
learning of latent deformable part-based representations for
object detection,” Int. J. Comput. Vis., vol. 127, nos. 11–12, pp.
1659–1679, 2019. doi: 10.1007/s11263-018-1109-z.
[163] W. Ouyang and X. Wang, “Joint deep learning for pedestrian
detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2013,
pp. 2056–2063. doi: 10.1109/ICCV.2013.257.
[164] X. Wang, A. Shrivastava, and A. Gupta, “A-fast-RCNN: Hard
positive generation via adversary for object detection,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp.
3039–3048. doi: 10.1109/CVPR.2017.324.
[165] S. Zhang, J. Yang, and B. Schiele, “Occluded pedestrian detection through guided attention in CNNs,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6995–7003.
doi: 10.1109/CVPR.2018.00731.
[166] N. Dalal and B. Triggs, “Histograms of oriented gradients for
human detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2005, pp. 886–893. doi:
10.1109/CVPR.2005.177.
[167] A. Shrivastava, A. Gupta, and R. B. Girshick, “Training region-based object detectors with online hard example mining,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp.
761–769. doi: 10.1109/CVPR.2016.89.
[168] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, “Single-shot refinement neural network for object detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 4203–4212.
[169] G. Zhang, S. Lu, and W. Zhang, “CAD-net: A context-aware detection network for objects in remote sensing imagery,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,015–10,024,
2019. doi: 10.1109/TGRS.2019.2930982.
[170] J. Jin, K. Fu, and C. Zhang, “Traffic sign recognition with hinge
loss trained convolutional neural networks,” IEEE Trans. Intell.
Transp. Syst., vol. 15, no. 5, pp. 1991–2000, 2014. doi: 10.1109/
TITS.2014.2308281.
[171] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger,
“Densely connected convolutional networks,” in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2261–
2269. doi: 10.1109/CVPR.2017.243.
[172] J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, and D. Lin, “Libra
R-CNN: Towards balanced learning for object detection,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp.
821–830. doi: 10.1109/CVPR.2019.00091.
[173] G. Cheng et al., “Object detection in remote sensing imagery using a discriminatively trained mixture model,” ISPRS
J. Photogram. Remote Sens., vol. 85, pp. 32–43, Nov. 2013. doi:
10.1016/j.isprsjprs.2013.08.001.
[174] B. Cai, Z. Jiang, H. Zhang, D. Zhao, and Y. Yao, “Airport detection using end-to-end convolutional neural network with hard
example mining,” Remote Sens., vol. 9, no. 11, pp. 1–20, 2017.
doi: 10.3390/rs9111198.
[175] Y. Xu, M. Zhu, S. Li, H. Feng, S. Ma, and J. Che, “End-to-end airport
detection in remote sensing images combining cascade region proposal networks and multi-threshold detection networks,” Remote
Sens., vol. 10, no. 10, pp. 1–17, 2018. doi: 10.3390/rs10101516.

[176] M. Zhu, Y. Xu, S. Ma, S. Li, H. Ma, and Y. Han, “Effective airplane detection in remote sensing images based on multilayer
feature fusion and improved nonmaximal suppression algorithm,” Remote Sens., vol. 11, no. 9, p. 1062, 2019. doi: 10.3390/
rs11091062.
[177] G. Zhou and Y. Zhang, “Transfer and association: A novel detection method for targets without prior homogeneous samples,” Remote Sens., vol. 11, no. 12, p. 1492, 2019. doi: 10.3390/
rs11121492.
[178] Z. Chen, T. Zhang, and C. Ouyang, “End-to-end airplane detection using transfer learning in remote sensing images,” Remote
Sens., vol. 10, no. 1, pp. 1–15, 2018. doi: 10.3390/rs10010139.
[179] C. Liu, S. Li, F. Chang, and W. Dong, “Supplemental boosting
and cascaded ConvNet based transfer learning structure for
fast traffic sign detection in unknown application scenes,” Sensors, vol. 18, no. 7, p. 2386, 2018. doi: 10.3390/s18072386.
[180] R. Dong, D. Xu, J. Zhao, L. Jiao, and J. An, “Sig-NMS-based RCNN combining transfer learning for small target detection in
VHR optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8534–8545, 2019. doi: 10.1109/
TGRS.2019.2921396.
[181] A. Chan-Hon-Tong and N. Audebert, “Object detection in remote
sensing images with center only,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2018, pp. 7054–7057. doi: 10.1109/
IGARSS.2018.8517860.
[182] B. Kellenberger, D. Marcos, S. Lobry, and D. Tuia, “Half a percent of labels is enough: Efficient animal detection in UAV
imagery using deep CNNs and active learning,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 12, pp. 9524–9533, 2019. doi:
10.1109/TGRS.2019.2927393.
[183] R. G. Cinbis, J. J. Verbeek, and C. Schmid, “Weakly supervised
object localization with multi-fold multiple instance learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 1, pp.
189–203, 2017. doi: 10.1109/TPAMI.2016.2535231.
[184] D. P. Papadopoulos, J. R. R. Uijlings, F. Keller, and V. Ferrari, “We don’t need no bounding-boxes: Training object class
detectors using only human verification,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 854–863. doi:
10.1109/CVPR.2016.99.
[185] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, “Solving
the multiple instance problem with axis-parallel rectangles,”
Artif. Intell., vol. 89, nos. 1–2, pp. 31–71, 1997. doi: 10.1016/
S0004-3702(96)00034-3.
[186] Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, “Soft proposal networks for weakly supervised object localization,” in Proc. IEEE
Int. Conf. Comput. Vis. (ICCV), 2017, pp. 1859–1868.
[187] A. Diba, V. Sharma, A. M. Pazandeh, H. Pirsiavash, and L. V.
Gool, “Weakly supervised cascaded convolutional networks,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017,
pp. 5131–5139.
[188] B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba,
“Learning deep features for discriminative localization,” in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp.
2921–2929. doi: 10.1109/CVPR.2016.319.
[189] H. Bilen and A. Vedaldi, “Weakly supervised deep detection networks,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), 2016, pp. 2846–2854. doi: 10.1109/CVPR.2016.311.
[190] L. Bazzani, A. Bergamo, D. Anguelov, and L. Torresani, “Self-taught object localization with deep networks,” in Proc. IEEE
Winter Conf. Appl. Comput. Vis. (WACV), 2016, pp. 1–9. doi:
10.1109/WACV.2016.7477688.
[191] Y. Shen, R. Ji, S. Zhang, W. Zuo, and Y. Wang, “Generative adversarial learning towards fast weakly supervised detection,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018,
pp. 5764–5773. doi: 10.1109/CVPR.2018.00604.
[192] F. Zhang, B. Du, L. Zhang, and M. Xu, “Weakly supervised
learning based on coupled convolutional neural networks
for aircraft detection,” IEEE Trans. Geosci. Remote Sens., vol.
54, no. 9, pp. 5553–5563, 2016. doi: 10.1109/TGRS.2016.
2569141.
[193] L. Cao et al., “Weakly supervised vehicle detection in satellite
images via multi-instance discriminative learning,” Pattern
Recognit., vol. 64, pp. 417–424, Apr. 2017. doi: 10.1016/j.patcog.2016.10.033.
[194] Y. Li, Y. Zhang, X. Huang, and A. L. Yuille, “Deep networks
under scene-level supervision for multi-class geospatial object
detection from remote sensing images,” ISPRS J. Photogram.
Remote Sens., vol. 146, pp. 182–196, Sept. 2018. doi: 10.1016/j.
isprsjprs.2018.09.014.
[195] Y. Ren, C. Zhu, and S. Xiao, “Small object detection in optical
remote sensing images via modified Faster R-CNN,” Appl. Sci.,
vol. 8, no. 5, p. 813, 2018. doi: 10.3390/app8050813.
[196] X. Xiao, Z. Zhou, B. Wang, L. Li, and L. Miao, “Ship detection
under complex backgrounds based on accurate rotated anchor
boxes from paired semantic segmentation,” Remote Sens., vol.
11, no. 21, pp. 1–18, 2019. doi: 10.3390/rs11212506.
[197] Y. Gong et al., “Context-aware convolutional neural network
for object detection in VHR remote sensing imagery,” IEEE
Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 34–44, 2020. doi:
10.1109/TGRS.2019.2930246.
[198] W. Ma, Q. Guo, Y. Wu, W. Zhao, X. Zhang, and L. Jiao, “A novel
multi-model decision fusion network for object detection in
remote sensing images,” Remote Sens., vol. 11, no. 7, pp. 1–18,
2019. doi: 10.3390/rs11070737.
[199] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proc. 2014
Conf. Empir. Methods Nat. Lang. Process. (EMNLP), pp. 1724–1734.
doi: 10.3115/v1/D14-1179.
[200] S. Bell, C. L. Zitnick, K. Bala, and R. B. Girshick, “Inside-outside
net: Detecting objects in context with skip pooling and recurrent neural networks,” in Proc. 2016 IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), 2016, pp. 2874–2883. doi: 10.1109/
CVPR.2016.314.
[201] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” in Proc. 5th Int. Conf. Learn. Represent.
(ICLR), 2017.
[202] Y. Feng, W. Diao, X. Sun, M. Yan, and X. Gao, “Towards automated ship detection and category recognition from high-resolution aerial images,” Remote Sens., vol. 11, no. 16, pp.
1–23, 2019.
GRS
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

Hyperspectral Image Clustering
Current achievements and future lines

HAN ZHAI, HONGYAN ZHANG, PINGXIANG LI, AND LIANGPEI ZHANG

Hyperspectral remote sensing combines traditional space imaging with
advanced spectral measurement technologies, delivering advantages stemming
from continuous spectral data and rich spatial information. This development
of hyperspectral technology takes remote sensing into a brand-new phase,
making the technology widely applicable in various fields. Hyperspectral
clustering analysis, which reveals the natural partition pattern of pixels in
an unsupervised way, is widely used in hyperspectral image (HSI)
interpretation and information extraction. In this article, current
hyperspectral clustering algorithms are systematically reviewed and
summarized in nine main categories: centroid-based, density-based,
probability-based, bionics-based, intelligent computing-based, graph-based,
subspace clustering, deep learning-based, and hybrid mechanism-based. The
performance of several popular hyperspectral clustering methods is
demonstrated on two widely used data sets. HSI clustering challenges and
possible future research lines are identified.

Digital Object Identifier 10.1109/MGRS.2020.3032575
Date of current version: 19 January 2021

THE NECESSITY FOR HSI CLUSTERING
Hyperspectral sensors can image an area of interest at a
nanometer spectral resolution and collect rich spectral
information to capture subtle differences among various
ground objects [1]–[3]. An HSI has a 3D cube structure,
containing tens and up to hundreds of bands, as shown in
Figure 1, to support the fine recognition of ground objects
[4]–[9]. This is good news in numerous applications, such
as mineral exploration [10], [11], vegetation monitoring
[12], [13], the quantitative inversion of physical and biological parameters [14], [15], military reconnaissance [16],
[17], and so forth. However, with such high-dimensional
data, the interpretation of HSIs commonly relies on a
great quantity of high-quality labeled samples to avoid the
Hughes phenomenon caused by having insufficient training examples and the underfitting problem that results
from the inadequate training of the classifiers [18]–[20].
Unfortunately, in practice, sample collection is commonly
time consuming, labor intensive, expensive, and inefficient, and, in some remote and uninhabited areas, training
samples can be unavailable, which greatly limits the applications of hyperspectral remote sensing. Therefore, it is necessary to develop unsupervised ground object recognition
theory and methods to overcome the restrictions related to
labeled samples and prior knowledge.
Clustering is an effective unsupervised pattern recognition and information
extraction technique, and it is a common means for HSI interpretation
[21]–[25]. Hyperspectral clustering groups similar pixels and separates
dissimilar pixels, with each group corresponding to a certain class, by fully
mining the structural properties of hyperspectral data according to a
similarity criterion, such as distance [26], [27], correlation [28], spectral
angle [29], and pair-wise pixel metrics [30]. Because no labeled samples are
required, clustering is more attractive than supervised classification in many
applications. Especially when no labeled samples are available, clustering can
be an effective approach for ground object recognition, improving the
application potential of hyperspectral remote sensing to a large degree.
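As a rough illustration of one such similarity criterion, the spectral angle
between two pixel spectra can be computed as sketched below. This is a minimal
example in plain Python; the function name `spectral_angle` and the sample
spectra are illustrative assumptions and are not taken from the cited work.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors.

    Hypothetical helper for illustration; assumes neither spectrum is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

bright = [0.2, 0.4, 0.6]
dark = [0.1, 0.2, 0.3]   # same spectral shape, half the brightness
other = [0.6, 0.4, 0.2]  # different spectral shape

angle_same_shape = spectral_angle(bright, dark)
angle_other = spectral_angle(bright, other)
```

Because the angle ignores vector length, it is insensitive to an overall
brightness scaling, so the two same-shaped spectra above yield an angle close
to zero.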
HSIs have a much more complex internal structure than
handwritten figures, text, natural pictures, and multispectral images. In addition, there is a large spectral variability in HSIs, as pixels from the same class have different
spectra, given the complexity of the imaging environment.
Generally, in the high-dimensional feature space, the distribution of pixels is relatively sparse and uniform, with no
clear rules to follow. Accordingly, hyperspectral clustering
is commonly a more challenging task.
Hyperspectral clustering has experienced decades of development, and a large
number of methods have been proposed. However, to the best of our knowledge,
very few studies have systematically and comprehensively reviewed the current
research status of hyperspectral clustering. Therefore, in this article, we
fill this gap and investigate the current hyperspectral clustering methods in
the literature to provide a detailed summary and analysis of various
clustering methods, and we discuss challenges and possible future directions.

FIGURE 1. The 3D cube structure of an HSI.

REVIEW OF CURRENT HYPERSPECTRAL CLUSTERING METHODS
Hyperspectral clustering generally includes two major tasks, i.e., estimating
the number of clusters and constructing the proper clustering model. However,
studies of the first task are relatively few in the hyperspectral clustering
field. In [31]–[33], the number of clusters is automatically estimated by
evolutionary algorithms and by using statistical histograms. However, these
methods are generally bound to specific clustering models, such as the fuzzy
c-means (FCM) model [34], and are not universally applicable. In addition,
many density-based models can automatically estimate the number of clusters
[35]–[37]. However, due to the inherent defects of density-based clustering,
such techniques are generally less effective when applied to HSIs, which will
be discussed in a later section. In some studies [38], [39], the optimal
number of clusters is determined by a series of experiments. However, this
strategy is time consuming and impractical in many use cases. In many cases,
the number of clusters is regarded as a manually input parameter [21], [22],
[40]–[44]. This number can be determined by visually interpreting the original
HSIs [21], [41], which is simple and convenient but subjective and not fully
automated. More often, in practice, this quantity is set as the number of
classes in the ground truth [22], [42]–[44].
Generally speaking, cluster number estimation has always been an important
topic in hyperspectral clustering research, while clustering model
construction is the core of hyperspectral clustering, whose reasonability and
effectiveness have a direct influence on the final clustering accuracy. Thus,
the clustering methodology/model has always been a focus in the HSI processing
field, and most of the existing work concentrates on clustering methodologies.
In this article, we also focus on clustering models and methods.
On the basis of the principle and the working mechanism, the current
hyperspectral clustering methods can be classified into nine main types:
1) centroid-based methods, 2) density-based methods, 3) probability-based
methods, 4) bionics-based methods, 5) intelligent computing-based methods,
6) graph-based methods, 7) subspace clustering methods, 8) deep learning-based
methods, and 9) hybrid mechanism-based methods.
In practice, an HSI can be expressed as a 2D data matrix, i.e.,
$Y = [Y_1, Y_2, \ldots, Y_{MN}] \in \mathbb{R}^{D \times MN}$, with each column
denoting a pixel, where $D$ and $MN$ represent the number of bands and the
number of pixels, respectively. For hyperspectral clustering, in the case of
c different classes, the core task is to partition the pixels into c different
groups based on a certain clustering model, with each group corresponding to a
certain class. Different methods deal with the internal structure and the
complexity of HSIs with various model assumptions, which determine their
clustering effect to a large degree. A taxonomy of the hyperspectral
clustering methods considered in this article appears in Table 1.
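The matrix form above can be illustrated with a short sketch that flattens a
small M x N x D cube into a D x MN matrix. The helper name `cube_to_matrix`,
the nested-list representation, and the row-by-row pixel ordering are
assumptions made for this example only.

```python
def cube_to_matrix(cube):
    """Flatten an M x N x D cube (nested lists) into a D x MN matrix.

    Column j of the result holds the spectrum of pixel j, with pixels
    enumerated row by row (an assumed ordering).
    """
    pixels = [spectrum for row in cube for spectrum in row]  # MN spectra
    num_bands = len(pixels[0])
    # Transpose so that rows index bands and columns index pixels.
    return [[pixel[b] for pixel in pixels] for b in range(num_bands)]

# Toy cube with M = 2 rows, N = 2 columns, and D = 3 bands.
cube = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
Y = cube_to_matrix(cube)  # 3 rows (bands) by 4 columns (pixels)
```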
TABLE 1. THE TAXONOMY OF HYPERSPECTRAL CLUSTERING METHODS.

| CATEGORY | MECHANISM | SUBCATEGORY | REPRESENTATIVE METHODS |
|---|---|---|---|
| Centroid | Assumes the cluster has a ball-like structure in the feature space; clusters HSIs by iteratively minimizing the overall partition error | Hard partition | k-means [47], ISODATA [49], NC-k-means [52] |
| | | Soft partition | FCM [45], FCM-S1 [64], FLDNICM [69] |
| Density | Assumes clusters are dense point sets separated by sparse areas in the feature space; clusters HSIs based on the local density and relative distances of pixels | - | CFSFDP [71], DAE [72], SSDL [77] |
| Probability | Assumes pixels from the same class satisfy a probability distribution model; clusters HSIs based on a probability rule | - | GMM [79], ICAMM [80], CLDD [86] |
| Bionics | Simulates the complex internal structure of HSIs with a certain biological model; clusters HSIs through a biological evolution algorithm | - | SOM [88], UAIC [42], UADSM [39] |
| Intelligent computing | Based on other clustering models; utilizes advanced intelligent computing algorithms to search for the global optimal solution to the clustering model | Single objective | FCIDE [92], MoDEFC [31], PSO-GMM [93] |
| | | Multiple objective | AFCMDE [32], AFCMOMA [94], MOPSO [38] |
| Graph | Models the similarity among pixels with an adjacency matrix; clusters HSIs with a graph cut algorithm | Complete graph | SC [105], SENP [106], NLTV [107] |
| | | Bipartite graph | SSCC-BG [115], S-SC [116], BGP-CJS [117] |
| | | Abbreviated graph | FSCAG [43], SGCNR [121] |
| Subspace clustering | Models the internal complex structure of HSIs via the union of subspaces; explores the underlying adjacency between pixels through self-representation learning; groups HSIs by applying spectral clustering (SC) to the adjacency matrix induced by the coefficient matrix | Spectral–spatial subspace clustering | S4C [40], L2-SSC [41], SSC-3DEPF [128] |
| | | Multiple-view subspace clustering | SSMLC [139], k-SSMLC [140], p-SSMLC [141] |
| | | Kernel subspace clustering | KSSC-SMP [142], KSLRSC [143] |
| Deep learning | Relies on deep neural networks to learn more discriminative features for clustering and more accurately simulate the nonlinearity of data | Autoencoder | DCN [147], DMC [148], DSCNet [151] |
| | | Separated network | CCNN [155], DBNC [156], JSL [159] |
| | | Generative network | CatGAN [162], DAGMC [164], VaDE [166] |
| Hybrid mechanism | Deals with the clustering task by combining two or more clustering models | - | k-GMM [168], k-FDPC [169], SDCR [174] |

ISODATA: iterative self-organizing data analysis technique algorithm; NC-k-means: neighborhood-constrained k-means; FCM-S1: FCM with mean filtered spatial information; FLDNICM:
fuzzy local double neighborhood information c-means; CFSFDP: clustering by the fast searching and finding of density peaks; DAE: density analysis ensemble; SSDL: spectral–spatial
(SS) diffusion learning; GMM: Gaussian mixture model (MM); ICAMM: independent component analysis MM; CLDD: clustering based on the latent Dirichlet distribution; SOM: self-organizing map; UAIC: unsupervised artificial immune classifier; UADSM: unsupervised spectral matching classifier based on artificial deoxyribonucleic acid (DNA) computing; FCIDE:
fuzzy clustering (FC) using improved differential evolution (DE); MoDEFC: modified DE FC; PSO-GMM: particle swarm optimization-based GMM; AFCMDE: automatic FC based on
multiple-objective DE; AFCMOMA: adaptive multiple-objective memetic FC algorithm; MOPSO: multiple-objective PSO; SC: spectral clustering; SENP: Schroedinger Eigenmap with
nondiagonal potentials; NLTV: graph-based nonlocal total variation; SSCC-BG: SS coclustering based on a bipartite graph (BG); S-SC: sequential SC; BGP-CJS: BG partition-based
coclustering with joint sparsity; FSCAG: fast SC with anchor graph; SGCNR: scalable graph-based clustering with nonnegative relaxation; S4C: SS sparse subspace clustering; L2-SSC:
$\ell_2$-norm regularized sparse subspace clustering; SSC-3DEPF: SSC based on 3D edge-preserving filtering; SSMLC: SS-based multiple-view low-rank SSC; p-SSMLC: parallel SSMLC; DCN:
deep clustering network; DMC: deep multiple-manifold clustering; DSCNet: deep SSC based on an autoencoder network; CCNN: clustering based on a convolutional neural network;
DBNC: deep belief network nonparametric clustering; JSL: joint unsupervised learning; CatGAN: categorical generative adversarial network; DAGMC: deep adversarial Gaussian mixture
autoencoder clustering; VaDE: variational deep embedding; k-GMM: hybridization of the k-means and the GMM; k-FDPC: hybridization of the k-means and fast finding of density peaks
clustering; SDCR: sparse dictionary-based anchor regression.

CENTROID-BASED CLUSTERING METHODS
Centroid-based methods are the most classical and representative clustering
approach, and they were also the earliest to be introduced to HSI analysis
[45], [46]. Such techniques are based on the assumption that a cluster has a
“ball-like” structure in the feature space. Starting with random
initializations, such methods iteratively update the centroids and their
associated pixel partitions until the overall partition error meets the
tolerance requirement or the number of iterations reaches the predefined
maximum value, as illustrated in Figure 2. Partition error is generally
defined as the sum of squared distances between the assigned pixels and the
corresponding centroids across all classes. Centroid-based clustering methods
mainly include two types, i.e., hard partition clustering and soft partition
clustering, based on whether a pixel belongs to multiple classes or not.

FIGURE 2. The centroid-based clustering mechanism. (a) The original pixel points. (b) The initialization (randomly selecting the centroids).
(c) The pixel assignment. (d) Updating the centroids. (e) Updating the pixel assignment. (f) The clustering result.

HARD PARTITION-BASED CLUSTERING
Hard partition-based methods allow each pixel to belong to only one class and
assign each pixel to the nearest cluster. A typical example is the k-means
[47], commonly considered the originator of clustering analysis and one of the
earliest clustering methods applied to HSIs [46]. The principle of the k-means
is simple: it segments HSIs by minimizing the partition error across all
c classes, as in (1):

$$\min \sum_{i=1}^{c} \sum_{j=1}^{n_i} \| Y_j - \mu_i \|_2^2, \tag{1a}$$

$$\mu_i = \frac{1}{n_i} \sum_{l=1}^{n_i} Y_l, \quad \| Y_j - \mu_{i^*} \|_2^2 \le \| Y_j - \mu_i \|_2^2, \quad i \in \{1, \ldots, c\}, \tag{1b}$$

where $\mu_i$ denotes the centroid of the ith cluster and $n_i$ represents the
number of pixels in the ith cluster, so that the inner sum in (1a) runs over
the pixels assigned to that cluster. Specifically, the k-means starts with
randomly selected centroids and then iteratively updates the cluster
centroids, with each pixel $Y_j$ assigned to the nearest cluster centroid
$\mu_{i^*}$ based on the distance metric, according to (1b) [48], until the
cluster centroids do not change or the total partition error in (1a) does not
significantly vary.
Based on the k-means, numerous improved methods were developed. For example,
the iterative self-organizing data analysis technique algorithm (ISODATA) was
proposed to improve the clustering effect by integrating the dynamic
adjustment mechanism of clusters into the clustering process, and it was
successfully applied to HSIs [49]. In [50], a distributed k-means clustering
method was developed for HSIs to further improve efficiency and practicability
by employing the parallel computing technique. In [51], a kernel k-means was
used for HSI feature extraction, which conducts clustering in the
much-higher-dimensional kernel space to relieve the nonlinearity of HSIs. In
addition, a neighborhood-constrained k-means (NC-k-means) approach was put
forward, inspired by the clearly evident spatial correlation among neighboring
pixels [52]. With a pure neighborhood index integrated into (1), the spatial
information of HSIs is incorporated to help with the spectral analysis, and a
much better clustering result is obtained.
Furthermore, a two-stage k-means clustering technique combined with a
neighboring union histogram (k-NUH) was developed, integrating the spatial
information by the NUH [53]. It divides HSIs into several uncorrelated groups
and computes the NUH of each collection based on the first few principal
components. Then, it employs a two-stage k-means model to cluster HSIs from
rough to fine. Moreover, an improved k-means (I-means) algorithm was proposed
for HSI mineral mapping. It takes the spectral information divergence as the
similarity measurement and initializes the centroids via three different
strategies [54].
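The basic k-means iteration described in this subsection can be sketched as
follows. This is a minimal, standard-library illustration under stated
assumptions rather than the implementation used in any of the cited studies:
the function names, the `tol` and `seed` parameters, the handling of empty
clusters, and the toy two-band pixels are all hypothetical choices.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(pixels, c, max_iter=100, tol=1e-9, seed=0):
    """Plain k-means on a list of pixel spectra (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initialization: pick c distinct pixels as starting centroids.
    centroids = [list(p) for p in rng.sample(pixels, c)]
    previous_error = float("inf")
    for _ in range(max_iter):
        # Assignment step, as in (1b): each pixel goes to its nearest centroid.
        labels = [
            min(range(c), key=lambda i: squared_distance(p, centroids[i]))
            for p in pixels
        ]
        # Update step: each centroid becomes the mean of its assigned pixels.
        for i in range(c):
            members = [p for p, lab in zip(pixels, labels) if lab == i]
            if members:  # assumption: keep the old centroid if a cluster is empty
                centroids[i] = [sum(band) / len(members) for band in zip(*members)]
        # Partition error, as in (1a), for the current labels and centroids.
        error = sum(
            squared_distance(p, centroids[lab]) for p, lab in zip(pixels, labels)
        )
        if previous_error - error <= tol:  # error no longer varies significantly
            break
        previous_error = error
    return labels, centroids, error

# Toy data: two well-separated groups of two-band "pixels".
pixels = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids, error = kmeans(pixels, c=2)
```

On data with less clearly separated groups, the result depends on the random
initialization, which is the sensitivity to local optima discussed later for
centroid-based methods.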
SOFT PARTITION-BASED CLUSTERING
Differing from hard partition-based approaches, soft
partition-based methods consider the uncertainty of the
pixel partitioning during the clustering process, allowing
each pixel to belong to multiple classes, which may be more
suitable for HSIs, due to the mixed pixel problem. Such
techniques assign a fuzzy membership to each pixel in the
range of [0, 1], with the sum of the memberships across all
c classes being equal to one. The most representative soft
partition-based clustering method is the FCM model [34],
[45], which can be formulated as in (2):
$$\min \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \| Y_j - \mu_i \|_2^2, \tag{2a}$$

$$\mu_i = \frac{\sum_{j=1}^{MN} U_{i,j}^{m} Y_j}{\sum_{j=1}^{MN} U_{i,j}^{m}}, \quad U_{i,j} = \frac{\| Y_j - \mu_i \|_2^{-2/(m-1)}}{\sum_{l=1}^{c} \| Y_j - \mu_l \|_2^{-2/(m-1)}}, \tag{2b}$$

where $U$ denotes the fuzzy membership matrix, with each element $U_{i,j}$
standing for the fuzzy membership of the jth pixel belonging to the ith
centroid; $\mu_i$ represents the ith centroid, which can be updated according
to (2b); and $m$ is the fuzzy exponent.
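The two updates in (2b) can be sketched in a few lines of plain Python. This
is an illustrative sketch only: the function names, the default m = 2, and the
crisp-membership rule for a pixel that coincides with a centroid are
assumptions introduced here.

```python
import math

def fcm_memberships(pixels, centroids, m=2.0):
    """Membership update in (2b); returns U[i][j] for centroid i and pixel j."""
    exponent = -2.0 / (m - 1.0)
    U = [[0.0] * len(pixels) for _ in centroids]
    for j, pixel in enumerate(pixels):
        distances = [math.dist(pixel, centroid) for centroid in centroids]
        if any(d == 0.0 for d in distances):
            # Assumption: a pixel lying exactly on a centroid gets crisp membership.
            weights = [1.0 if d == 0.0 else 0.0 for d in distances]
        else:
            weights = [d ** exponent for d in distances]
        total = sum(weights)
        for i, weight in enumerate(weights):
            U[i][j] = weight / total  # memberships of pixel j sum to one
    return U

def fcm_centroids(pixels, U, m=2.0):
    """Centroid update in (2b): membership-weighted mean of the pixels."""
    num_bands = len(pixels[0])
    centroids = []
    for row in U:
        weights = [u ** m for u in row]
        total = sum(weights)
        centroids.append(
            [sum(w * p[b] for w, p in zip(weights, pixels)) / total
             for b in range(num_bands)]
        )
    return centroids

# Toy single-band example with two centroids.
pixels = [(0.0,), (1.0,), (3.0,)]
centroids = [(0.0,), (4.0,)]
U = fcm_memberships(pixels, centroids)
new_centroids = fcm_centroids(pixels, U)
```

Alternating these two functions until the memberships stabilize gives the
usual FCM iteration; a stopping rule is left out to keep the sketch short.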
Based on the FCM, many enhanced methods were successively proposed. In [55]
and [56], two weighted FCM models were developed, i.e., fuzzy weighted c-means
(FWCM) and new weighted FCM (NW-FCM). These two approaches weight the
similarity between neighboring pixels and the center pixel, which effectively
improves the clustering performance. In [57], an uncertainty analysis-based
FCM (UAFCM) algorithm was introduced. It detects pixels that have a large
uncertainty through entropy and squared error-based criteria and reclassifies
those pixels to refine the clustering results. In addition, to address the
nonlinearity of HSIs, a kernel FCM was used in HSI semisupervised
classification [58]. To overcome the sensitivity of the FCM to initialization,
an improved FCM algorithm based on the support vector domain description
(SVDD) was proposed for HSIs [59]. It estimates the cluster centroids based on
the SVDD to reduce the influence of noise and outliers on the centroids.
Furthermore, in [33], an automatic histogram-based FCM (AHFCM) algorithm was
developed. It obtains the initializations and the number of clusters for the
FCM through two steps, clustering each band by calculating the slopes in the
histogram and automatically fusing the labeled images.
However, these techniques take only the spectral information into account, so
they are susceptible to noise and singular points, and the spatial homogeneity
of the clustering result is difficult to guarantee. To overcome these
obstacles, a large number of enhanced FCM models that incorporate spatial
information were developed. A representative example can be found in the
spatial model for fuzzy clustering (SMFC) [60], with the formulation shown in
(3):
$$\min \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \| Y_j - \mu_i \|_2^2 + \frac{\beta}{2} \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \sum_{p \in M_i} \sum_{q \in N_j} U_{p,q}^{m}, \tag{3a}$$

$$\mu_i = \frac{\sum_{j=1}^{MN} U_{i,j}^{m} Y_j}{\sum_{j=1}^{MN} U_{i,j}^{m}}, \quad U_{i,j} = \frac{\left( \| Y_j - \mu_i \|_2^2 + \beta \sum_{p \in M_i} \sum_{q \in N_j} U_{p,q}^{m} \right)^{-1/(m-1)}}{\sum_{l=1}^{c} \left( \| Y_j - \mu_l \|_2^2 + \beta \sum_{p \in M_l} \sum_{q \in N_j} U_{p,q}^{m} \right)^{-1/(m-1)}}, \tag{3b}$$

where $N_j$ represents the neighbors of pixel $j$,
$M_i = \{1, 2, \ldots, c\} \setminus \{i\}$, and $\beta$ is a tradeoff
parameter. Through the spatial penalty term in (3a), the spatial neighborhood
information is integrated to smooth the membership matrix, which leads to a
more accurate result.
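To make the effect of the spatial penalty concrete, one membership update in
the spirit of (3b) is sketched below. The function name, the precomputed
`sq_dist` table, the neighbor lists, and the toy values are assumptions for
illustration; the sketch also assumes every bracketed term in (3b) is strictly
positive.

```python
def smfc_memberships(sq_dist, U, neighbors, beta, m=2.0):
    """One spatially penalized membership update following (3b).

    sq_dist[i][j]: squared distance from pixel j to centroid i.
    U[i][j]: current membership of pixel j in class i.
    neighbors[j]: indices of the pixels in the neighborhood N_j.
    """
    c = len(U)
    num_pixels = len(U[0])
    power = -1.0 / (m - 1.0)
    new_U = [[0.0] * num_pixels for _ in range(c)]
    for j in range(num_pixels):
        # For every class p, the sum of U[p][q]**m over the neighbors q of j.
        neighbor_mass = [sum(U[p][q] ** m for q in neighbors[j]) for p in range(c)]
        total_mass = sum(neighbor_mass)
        # The penalty for class i sums over the other classes, i.e., over M_i.
        weights = [
            (sq_dist[i][j] + beta * (total_mass - neighbor_mass[i])) ** power
            for i in range(c)
        ]
        norm = sum(weights)
        for i in range(c):
            new_U[i][j] = weights[i] / norm
    return new_U

# Three pixels on a line; the middle pixel is spectrally ambiguous, while its
# two neighbors mostly belong to class 0.
U = [[0.9, 0.5, 0.9],
     [0.1, 0.5, 0.1]]
sq_dist = [[1.0, 1.0, 1.0],
           [9.0, 1.0, 9.0]]
neighbors = [[1], [0, 2], [1]]

without_penalty = smfc_memberships(sq_dist, U, neighbors, beta=0.0)
with_penalty = smfc_memberships(sq_dist, U, neighbors, beta=1.0)
```

With beta set to zero, the update reduces to the FCM membership rule in (2b),
and the ambiguous middle pixel stays at 0.5; with a positive beta, its
membership is pulled toward the class favored by its neighbors.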
In addition, in [61], a conditional FCM (C-FCM) algorithm was proposed. It
simultaneously makes use of spectral–spatial information via the generalized
multiplication of the spatial information and the spectral information. These
methods have been successfully applied to HSIs [62]. To better utilize the
spatial information, a neighborhood constraint clustering (NCC) algorithm was
put forward [62]. It exploits the local spatial information via a neighborhood
homogeneity index and obtains smoother clustering results with a higher
accuracy for HSIs. In addition, by adding a spatial constraint term to (2), an
FCM with spatial information (FCM-S) algorithm was proposed [63]. It explores
the spatial neighborhood information through a local window opened for each
target pixel and performs much better than the FCM. However, the FCM-S is
computationally complex. To tackle this problem, two improved versions were
developed, i.e., the FCM-S1 and FCM-S2, which, respectively, employ the mean
filtered result and the median filtered result to simplify the spatial
information calculation [64]. These techniques were then successfully applied
to HSIs [65]. However, the spatial regularization parameters are difficult to
determine, and the global information is poorly utilized.
To overcome the drawbacks of these models, an adaptive memetic fuzzy
clustering algorithm with spatial information (AMASFC) was proposed [65]. By
adaptively determining the spatial regularization parameters based on the
information entropy and by simultaneously exploring the local information and
the global information via the memetic algorithm, the clustering accuracy is
further improved. Furthermore, a fuzzy approach with the spatial membership
relations (FASMR) algorithm was proposed [66]. It incorporates the spatial
information via a Gaussian filter and explores the membership relations among
pixels in a local neighborhood. Moreover, by defining a fuzzy factor to
integrate the spectral information and the local spatial information and to
avoid parameter determination, a new fuzzy local information c-means
clustering model (FLICM) was developed [67]. It was then applied to HSIs [68].
However, the FLICM has drawbacks, such as fuzzy edges and poor maintenance of
spatial details.
Faced with these obstacles, an adaptive FLICM (ADFLICM) algorithm was put
forward [68]. It constructs a pixel spatial attraction model to adaptively
measure the effects of neighboring pixels through weighting, which better
recognizes the boundaries among different classes and maintains the details.
Then, by flexibly exploiting the local spatial information and the spectral
information, an improved version, the fuzzy local double neighborhood
information c-means clustering (FLDNICM) algorithm, was introduced [69]. A
fuzzy prior probability function is constructed based on the mutual dependent
information between neighboring pixels and the center pixel to accurately
model the spatial contextual information of HSIs. In this way, the clustering
accuracy is further improved.
Generally speaking, due to their simplicity and efficiency, centroid-based
methods are very popular in many practical applications. However,
centroid-based methods are, in essence, “hill-climbing” algorithms, which are
prone to becoming trapped in local optima [65], [70]. Worse still, the
“ball-like” structure assumption generally cannot be satisfied by HSIs, due to
a complex internal structure and a large spectral variability, which limits
the approaches’ clustering performance to a large degree.


DENSITY-BASED CLUSTERING METHODS
Density-based clustering methods partition pixels according to density
criteria, under the basic assumption that clusters are generally dense point
sets separated by sparse areas in the feature space. Such methods cluster HSIs
based on the local density and the relative distances of pixels, as detailed
in Figure 3. A typical example is the clustering by the fast searching and
finding of density peaks (CFSFDP) algorithm [71]. It assumes that cluster
centroids are surrounded by pixel points that have a lower density and are
relatively far from pixel points with a higher density, computing two
quantities for each pixel, i.e., local density $\rho_i$ and relative distance
$\delta_i$, as in (4), to search the optimal centroids:

$$\rho_i = \sum_{j} \chi(d_{ij} - d_c), \quad \chi(x) = \begin{cases} 1, & \text{if } x < 0 \\ 0, & \text{otherwise} \end{cases}, \tag{4a}$$

$$\delta_i = \begin{cases} \min(d_{ij}), & j : \rho_j > \rho_i \\ \max(d_{ij}), & \rho_i = \max(\rho) \end{cases}, \tag{4b}$$

where $d_{ij}$ denotes the distance between pixels $i$ and $j$, and $d_c$
represents the cutoff distance. Cluster centroids can be found by constructing
a decision graph, i.e., a $\delta$-$\rho$ graph, or determined by the
measurement $\gamma_i = \rho_i \times \delta_i$. Pixels with significantly
large $\delta$ and relatively large $\rho$ values in the decision graph or
with a significantly large $\gamma$ are considered to be centroids. Then, by
assigning each pixel to the nearest cluster centroid, the final clustering
result can be obtained.

FIGURE 3. The density-based clustering mechanism. (a) The HSI. (b) The density assumption for the clusters. (c) Searching the cluster
centroids based on the local density and the relative distances. (d) The clustering result.
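The quantities in (4) can be sketched directly in plain Python. The function
name `density_peaks`, the choice to exclude a pixel from its own density
count, and the toy single-band pixels are assumptions made for this
illustration, not details taken from [71].

```python
import math

def density_peaks(pixels, cutoff):
    """Compute local density rho, relative distance delta, and gamma = rho * delta."""
    n = len(pixels)
    dist = [[math.dist(a, b) for b in pixels] for a in pixels]
    # (4a): count the other pixels closer than the cutoff distance
    # (assumption: a pixel does not count itself).
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        # (4b): distance to the nearest pixel of higher density, or the
        # largest distance if no pixel has a higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma

# Two groups of single-band pixels, centered near 1 and near 11.
pixels = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
rho, delta, gamma = density_peaks(pixels, cutoff=1.5)

# Take the two pixels with the largest gamma as candidate cluster centroids.
candidates = sorted(sorted(range(len(pixels)), key=lambda i: gamma[i])[-2:])
```

In this toy case, the middle pixel of each group has both the highest local
density and a large relative distance, so it stands out in gamma, which is the
behavior the decision graph is meant to expose.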
To further improve the efficiency and accuracy of CFSFDP, an enhanced version,
i.e., density analysis ensemble (DAE) clustering, was developed for HSIs [72].
The DAE uses a random subspace ensemble to establish a series of clustering
systems, with each individual system corresponding to a density analysis.
Subsequently, the final clustering result is obtained by majority voting.
Another representative method is the density-based spatial clustering of
applications with noise (DBSCAN) [73]. The core idea of DBSCAN is to find
pixels that have a higher density and connect them to generate clusters. The
approach was utilized for HSI band selection and obtained good results [74].
In addition, the mean shift (MS) is also a typical density-based model, based
on the rule of density gradient ascent [75]. In [76], an adaptive MS algorithm
was put forward by integrating nonnegative matrix factorization (NMF) and
bandwidth selection, which better segments HSIs.
In addition, in recent years, a series of nearest-neighbor
density-based clustering methods were developed for HSIs.
For example, the k-nearest-neighbor density-based clustering (KNNCLUST) method was proposed by extending
the k-nearest-neighbor (KNN) model to an iterative procedure to automatically estimate the number of clusters [35].
Each pixel is assigned based on its KNNs and the distances
to those neighbors by using the Bayes decision rule. Then,
KNNCLUST was applied to HSIs, and a stochastic extended
version, i.e., the kernel stochastic expectation maximum
(KSEM), was developed for HSIs [36]. The KSEM employs
KNNs to estimate the contextual class conditional distribution, which it iteratively updates with the posterior probability to account for the current clustering result. Then, the
KSEM defines the stopping criterion based on the clustering
entropy to make the conditional distribution converge to a
stationary clustering result. As a result, the KSEM outperforms KNNCLUST.
Moreover, a graph watershed clustering based on nearest
neighbors (GWNN) algorithm was introduced for HSIs to
alleviate the quadratic complexity of KNN estimation [37].
GWNN utilizes a labeling rule similar to KNNCLUST to account for the local density values and introduces a coarse-to-fine multiresolution scheme, instead of a full KNN graph
computation with all pixels. Consequently, GWNN effectively enhances the efficiency of the model and obtains a
high clustering accuracy.
Furthermore, in [77], an unsupervised spectral–spatial
diffusion learning (SSDL)-based clustering algorithm was
proposed for HSIs. SSDL takes advantage of geometrical

estimation and diffusion-inspired labeling to excavate the
spectral–spatial duality of HSIs, based on the diffusion
distances. SSDL includes two main steps, i.e., finding the
cluster modes through density estimation and geometric
analysis and assigning pixels to the corresponding modes
based on the spectral–spatial proximity. In addition, based
on SSDL, an enhanced spectral–spatial diffusion geometry
(SSDG)-based clustering method was developed [78]. SSDG
introduces the spatially regularized random walk strategy
to the diffusion construction, regularizes neighboring pixels by Markov diffusion, searches cluster modes via kernel
density estimation and the diffusion distance, and assigns
pixels based on the selected modes. As a result, SSDG further improves the clustering accuracy.
In short, density-based methods are relatively robust to
noise and to the shapes of clusters. In addition, many density-based methods can automatically estimate the number of
clusters. However, pixels are distributed relatively sparsely and uniformly
in the high-dimensional feature space of HSIs, so
the density assumption of these methods is
not fully satisfied, which considerably degrades the clustering
performance.
PROBABILITY-BASED CLUSTERING METHODS
Probability-based clustering methods partition pixels based
on certain likelihood criteria. Such methods assume that
pixels from the same class generally obey a certain probability distribution, with each cluster modeled by a multivariate
conditional distribution with specific parameters and the
HSIs modeled by the joint probability distribution, as in Figure 4. Then, the final clustering result can be obtained by
maximizing the likelihood function based on a certain probability stipulation, such as expectation maximization (EM),
the maximum posterior probability, and the Bayesian rule.
A representative probability-based clustering method is the
Gaussian mixture model (GMM) [79]. The GMM is based
on the assumption that hyperspectral pixels generally satisfy
the Gaussian distribution, and it models each cluster with
a multivariate Gaussian conditional distribution, as in (5):
$$p(Y_i) = \sum_{j=1}^{c} P_j \, g(Y_i \mid m_j, C_j), \tag{5a}$$

$$p(Y) = \prod_{i=1}^{MN} p(Y_i), \tag{5b}$$

where $g$ is a Gaussian probability density function (pdf)
and $P_j$ is the prior probability of the $j$th cluster, with $m_j$
and $C_j$ denoting the mean vector and the covariance matrix of the $j$th cluster. Then, according to a certain probability rule, such as EM, the GMM partitions pixels into $c$ different clusters to obtain the final clustering result.
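As an illustration of how the mixture density and EM fit together, the sketch below alternates between computing posterior cluster probabilities and re-estimating the priors, means, and covariances. It is a minimal example under stated assumptions: the names `gaussian_pdf` and `gmm_em`, the caller-supplied initial means, the fixed iteration count, and the small diagonal regularizer `reg` are choices made here for a stable demonstration and are not taken from [79].

```python
import numpy as np

def gaussian_pdf(Y, mean, cov):
    """Multivariate Gaussian density evaluated at every row of Y."""
    D = Y.shape[1]
    diff = Y - mean
    maha = np.einsum("nd,de,ne->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** D * np.linalg.det(cov))

def gmm_em(Y, init_means, n_iter=30, reg=1e-6):
    """Fit a Gaussian mixture to the rows of Y (needs at least two features)."""
    n, D = Y.shape
    means = np.array(init_means, dtype=float)
    c = len(means)
    # Start every component from the overall data covariance and equal priors.
    covs = np.array([np.cov(Y.T) + reg * np.eye(D) for _ in range(c)])
    priors = np.full(c, 1.0 / c)
    for _ in range(n_iter):
        # E-step: prior-weighted densities and posterior probabilities.
        weighted = np.stack(
            [priors[j] * gaussian_pdf(Y, means[j], covs[j]) for j in range(c)],
            axis=1,
        )
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: update priors, means, and covariances from the posteriors.
        Nj = resp.sum(axis=0)
        priors = Nj / n
        means = (resp.T @ Y) / Nj[:, None]
        for j in range(c):
            diff = Y - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / Nj[j] + reg * np.eye(D)
    # Log-likelihood under the parameters used in the last E-step.
    log_lik = np.log(weighted.sum(axis=1)).sum()
    # Hard labels by the maximum posterior probability.
    return resp.argmax(axis=1), priors, means, covs, log_lik
```

For real hyperspectral pixels with hundreds of bands, the densities would normally be evaluated in the log domain to avoid underflow; that refinement is omitted here for brevity.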
Considering that hyperspectral pixels often do
not strictly follow a Gaussian distribution, an independent
component analysis mixture model (ICAMM) was constructed for HSIs [80], [81]. The ICAMM represents each
cluster as a non-Gaussian distribution, as in (6):
$$p(Y_i \mid \Theta) = \sum_{j=1}^{c} p(Y_i \mid \omega_j, \theta_j) \, P(\omega_j), \tag{6a}$$

$$p(Y \mid \Theta) = \prod_{i=1}^{MN} p(Y_i \mid \Theta), \tag{6b}$$

where $\Theta = [\theta_1, \theta_2, \ldots, \theta_c]$ is the class parameter set and $P(\omega_j)$ is
the prior probability of the $j$th class $\omega_j$. Then, the independent
components and the mixing matrix of each class are estimated
based on the modified information maximum model, and
the probability that each pixel belongs to each class is computed. Based on the maximum membership
probability rule, the pixel partition result can be obtained
based on the ICA model. A weighted principal component
analysis ICA (WPCA-ICA) method was developed to extract
the independent features based on second- and higher-order
statistics, which performs better for HSIs [82].
Furthermore, in [83], a nonparametric stochastic expectation maximum (NPSEM) algorithm was proposed, which
extends stochastic EM to a nonparametric representation
to further improve the model’s practicability. The NPSEM
was then introduced to HSIs and performed well [36]. In
[84], a pairwise Markov field (PMF) model was constructed
to segment noisy and blurred astronomical HSIs. It integrates the PMF model into the Bayesian framework to optimize the probability model, and it segments HSIs based
on faint signals. In addition, to better learn the similarity among hyperspectral pixels, a layered sparse adaptive
possibility c-means clustering (LSAPCM) approach was
developed [85]. It integrates the layered possibility into the
FCM framework to extend the architecture to a probability
optimization model, and it produces good clustering results.
In [86], a novel clustering model based on the latent Dirichlet distribution (CLDD) was constructed by introducing the
topic model to simulate the structure of HSIs, with each topic
modeled by the LDD.
Moreover, considering that the mixed pixels of HSIs generally degrade the GMM performance, a Bayesian clustering method based on the spectral mixture model (SMM)

and the Markov random field (MRF) was put forward for
HSIs [87]. The Bayesian SMM-MRF utilizes the SMM to obtain the end-member abundance for each mixed pixel, and
it assigns the mixed pixel according to the dominant endmember. Subsequently, this method integrates the SMM
into the Bayesian framework to construct a conditional distribution of the mixed pixels to search for the dominant
end-member, with the MRF utilized to optimize the label
prior. Last, by solving the maximum posterior probability
problem based on the EM rule, the pixel partition result
is obtained. By considering the mixed pixel problem and
comprehensively utilizing spectral–spatial information,
the Bayesian SMM-MRF achieves good performance.
As a whole, probability-based clustering methods have
rigorous mathematical foundations and employ various probability theories to optimize the clustering model. However, because of the
complex internal structure and large spectral variability of HSIs,
hyperspectral pixels do not strictly obey specific probability
distributions, which is inconsistent with the assumptions of such methods. As a result, probability-based clustering
methods may fail to obtain good performance for HSIs.
BIONICS-BASED CLUSTERING METHODS
Bionics-based clustering methods employ certain biological models, such as artificial neural networks (NNs), to
simulate the complex internal structure of HSIs and partition pixels based on certain biological evolution algorithms, as described in Figure 5. A typical example is the
self-organizing map (SOM) model, which is an unsupervised learning method based on the Kohonen NN and
has been successfully applied to HSIs [42], [88]. The SOM
automatically learns the underlying similarity among the
input pixels and then puts similar pixels close together in
the network. The SOM generally consists of an input layer
and a competitive layer, with a learning stage and a clustering stage. In the learning stage, the winning neurons are
selected based on Euclidean distance, and then the weights
of the winning neurons and the neighboring neurons are

FIGURE 4. The probability-based clustering mechanism. (a) The HSI. (b) The probability model construction and optimization. (c) The clustering result.

updated, as in (7). In the clustering stage, similar pixels are
mapped to the neighboring neurons:


FIGURE 5. The bionics-based clustering mechanism. (a) The HSI. (b) The biological model construction. (c) The biological evolution optimization. (d) The clustering result.

[Figure 5 labels: Artificial DNA Model, Artificial Immune Network, Evolution, Affinity Threshold, Fitness Value, Operators (Reproduction, Crossover, Mutation), Population, Memory Cell, Antigen, New Memory Cell.]

In (7), $W_{ij}^{k}$ denotes the weight between neurons $i$ and $j$ in
the $k$th iteration, $\Delta W_{ij}$ stands for the weight gains, $d$ means
the Euclidean distance, $I(\cdot)$ represents the activated neuron, $\eta$ is the learning rate, and $\sigma$ is the kernel parameter.
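The learning and clustering stages of the SOM can be sketched as follows. The example is illustrative only: the function names, the rectangular neuron grid, the initialization from randomly chosen pixels, and the use of grid coordinates to measure the distance between a neuron and the winning neuron are assumptions made here, and the learning rate and kernel parameter are kept constant for simplicity.

```python
import numpy as np

def train_som(Y, grid_shape=(1, 2), eta=0.5, sigma=0.3, n_epochs=20, seed=0):
    """Learning stage: returns one weight vector per competitive-layer neuron."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates of the neurons (assumed layout for this sketch).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the weights from randomly chosen input pixels.
    W = Y[rng.choice(len(Y), size=rows * cols, replace=True)].astype(float)
    for _ in range(n_epochs):
        for y in Y[rng.permutation(len(Y))]:
            # Winning neuron: smallest Euclidean distance to the input pixel.
            winner = np.argmin(((W - y) ** 2).sum(axis=1))
            # Gaussian neighborhood around the winner on the neuron grid.
            d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            # Move the winner and its neighbors toward the input pixel.
            W += eta * h[:, None] * (y - W)
    return W

def som_assign(Y, W):
    """Clustering stage: map each pixel to its closest neuron."""
    d2 = ((Y[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

In practice the learning rate and the neighborhood width are usually decreased over the epochs so that the map stabilizes; the constant values above keep the example short.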
To better simulate the complexity of HSIs, many advanced biological models have been constructed. For example, in [42], an unsupervised artificial immune classifier
(UAIC) was proposed. The UAIC utilizes an artificial immune system to simulate the complex internal structure of
HSIs and employs a series of biological computation techniques, such as clonal selection, immune network, and immune memory, to partition pixels. Specifically, cluster centroids are randomly selected, and each pixel is assigned to
a cluster with the maximum affinity between antigens and
antibodies. An immune evolution algorithm is utilized to
update the antibody population and the memory cell (MC)
pooling until convergence. As a result, the UAIC obtains
a relatively good result for HSIs. Then, an enhanced version of the UAIC, i.e., an unsupervised artificial immune
network for remote sensing classification (RSUAIN) was
constructed to further improve the clustering performance
[89]. Instead of utilizing the distance threshold scalar to update the MC pooling and constrain the number of MCs, the
RSUAIN introduces two immunological parameters, i.e.,
the death rate and the suppression rate, to update the MC
matrix and determine the structure of the network by controlling the connection of network cells. Then, the RSUAIN
forces each class to have an inner network connection and
enhances the diversity of the MC population via a suppression rate to improve the evolution quality.
In addition, considering the large volume, high dimension, and spectral diversity of HSIs, an unsupervised
spectral-matching classifier based on artificial deoxyribonucleic acid (DNA) computing (UADSM) was put forward [39]. The UADSM employs an artificial DNA model
to simulate the complexity of HSIs, and it clusters pixels
through a series of artificial DNA computing techniques,
including DNA spectral coding, optimization, and matching. The UADSM extracts multiple spectral features,
such as the shape, amplitude, and slope, to enhance the
discriminability of the features and optimizes clusters by
recombining DNA strands. Based on the normalized DNA
spectral similarity, the spectral signature of each pixel is
assigned to the corresponding cluster to obtain the clustering result.
$$W_{ij}^{k+1} = W_{ij}^{k} + \Delta W_{ij}, \quad \Delta W_{ij} = \eta \exp\left(-\frac{d_{j, I(Y_i)}^{2}}{2\sigma^{2}}\right)(Y_j - W_{ij}). \tag{7}$$

Moreover, in [90], a novel context-aware unsupervised
discriminative extreme learning machine (CUDELM) algorithm was developed for HSIs. The CUDELM introduces the
extended NN, i.e., the ELM, to efficiently learn the structural information. Then, local spectral–spatial information
is incorporated into the hidden layer features via a context-aware propagation filter, and the local and global structural
information is integrated through regularization to learn
more discriminative features. Consequently, the CUDELM
yields accurate clustering results for HSIs. Besides, in [91],
a new weighted incremental NN (WINN) method was developed for HSI segmentation. The WINN models the topology of pixels by using a set of weighted nodes, with the
weights determined by the local density, and clusters the
net through a watershed-like procedure to obtain the final
clustering result.
On the whole, bionics-based clustering methods can effectively simulate the internal complexity of HSIs to some
degree, and they may produce accurate clustering results
by employing advanced biological evolution algorithms.
However, these methods still face obstacles. For example,
the complex structure of HSIs cannot always be well fitted by specific biological models, in practice, and the large
spectral variability further reduces the modeling accuracy,
which limits the clustering performance.
INTELLIGENT COMPUTING-BASED
CLUSTERING METHODS
Intelligent computing-based clustering methods are generally founded on other clustering models, such as the centroid-based clustering model, and utilize some advanced
intelligent computing algorithms, such as genetic evolution, differential evolution, and particle swarm optimization (PSO), to search for the global optimal solution of the
clustering model and further improve the clustering performance, as presented in Figure 6. According to the number
of objective functions in the optimization problem, intelligent computing-based clustering methods can be further
divided into two types: 1) single-objective-based clustering
and 2) multiobjective-based clustering.

SINGLE-OBJECTIVE-BASED CLUSTERING
The single-objective-based clustering method has only a
single objective function in the optimization problem, with
an intelligent computing technique utilized to search for
the global optimal solution. A representative single-objective-based clustering method is fuzzy clustering using an
improved differential evolution (FCIDE) algorithm [92]. It
introduces a certain validation index as the fitness function
and searches for the optimal solution based on the differential evolution algorithm. Specifically, FCIDE utilizes the
clustering separation (CS) measure or the Davies–Bouldin
(DB) measure as the validation index to define the fitness
function, as in (8):
$$f = \frac{1}{\mathrm{CS}_i(K) + \mathrm{eps}} \quad \text{or} \quad f = \frac{1}{\mathrm{DB}_i(K) + \mathrm{eps}}, \tag{8}$$

where $K$ denotes the number of clusters and eps is an adjustment factor. The definitions of CS and DB are given in [92].
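To make the fitness function concrete, the sketch below scores a candidate partition with the reciprocal of a validity index. It assumes the standard textbook form of the Davies–Bouldin index (mean distance to the centroid as the within-cluster scatter, Euclidean distance between centroids as the separation); the exact definitions used by FCIDE are those in [92] and may differ. The names `davies_bouldin` and `fitness` are hypothetical.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Standard Davies-Bouldin index of a hard partition (lower is better)."""
    ids = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Within-cluster scatter: mean distance of the members to their centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
        for i, k in enumerate(ids)
    ])
    K = len(ids)
    worst = np.zeros(K)
    for i in range(K):
        for j in range(K):
            if i != j:
                sep = np.linalg.norm(cents[i] - cents[j])
                worst[i] = max(worst[i], (scatter[i] + scatter[j]) / sep)
    return worst.mean()

def fitness(X, labels, eps=1e-6):
    """Fitness of a candidate partition: reciprocal of the index plus eps."""
    return 1.0 / (davies_bouldin(X, labels) + eps)
```

An evolutionary search would evaluate `fitness` for every individual in the population and favor the individuals with larger values, since a compact, well-separated partition has a small index.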
Based on FCIDE, in [31], a modified differential evolution fuzzy clustering (MoDEFC) algorithm was put forward
to further improve the clustering performance. MoDEFC
constructs a model using the Xie–Beni index as a validation
index. FCIDE and MoDEFC were then introduced to HSIs,
delivering good performance [32]. In addition, the AMASFC method employs the memetic algorithm to combine local and global information to search for the optimal solution, and it further improves the clustering accuracy [65].
Moreover, considering that GMM–EM is easily trapped in
a locally optimal solution, a novel PSO-based GMM clustering (PSO-GMM) method was developed for HSIs [93]. It
uses the advanced PSO algorithm instead of EM to search
for a global optimal solution and improves the parameterization and parameter updating approaches to overcome
the degeneracy problem. Consequently, the clustering accuracy is effectively improved.

[Figure 6 labels: Single-/Multiobjective Model, Evolution Algorithm, Particle Swarm Algorithm, Individuals, Pareto Front, Global Optimum; axes Z1 and Z2.]

FIGURE 6. The intelligent computing-based clustering mechanism. (a) The HSI. (b) The clustering model construction. (c) Intelligent computing. (d) The clustering result.

MULTIOBJECTIVE-BASED CLUSTERING
Multiobjective-based clustering methods generally optimize
more than one objective function and simultaneously
search for optimal solutions based on certain intelligent computing techniques. Compared with the single-objective-based clustering methods, multiobjective-based
clustering approaches are more popular and generally perform better, as they consider several factors at the same
time, e.g., spectral and spatial information or local and global information. A representative example is the automatic
fuzzy clustering based on the multiobjective differential
evolution (AFCMDE) algorithm [32]. It extends the MoDEFC model to a multiobjective version for an improved
ability to learn the complexity of remote sensing images,
with two objective functions included, i.e., the partition error and the Xie–Beni index, as in (9):
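$$\min f(Y) = [f_1(Y), f_2(Y)], \tag{9a}$$

$$f_1 = \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \|Y_j - \mu_i\|_2^2, \quad f_2 = \frac{\sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \|Y_j - \mu_i\|_2^2}{MN \min_{i \neq k} \|\mu_i - \mu_k\|_2^2}, \tag{9b}$$

$$\mu_i = \frac{\sum_{j=1}^{MN} U_{i,j}^{m} Y_j}{\sum_{j=1}^{MN} U_{i,j}^{m}}, \quad U_{i,j} = \frac{\|Y_j - \mu_i\|^{-2/(m-1)}}{\sum_{l=1}^{c} \|Y_j - \mu_l\|^{-2/(m-1)}}. \tag{9c}$$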

Specifically, AFCMDE consists of two layers, i.e., optimization and clustering. In the optimization layer, a feasible number of clusters is obtained by minimizing these two objective
functions. In the clustering layer, a nondominated sorting
method is utilized to update the population and search the
Pareto front to obtain the final clustering result. Through
multiobjective optimization, AFCMDE outperforms MoDEFC. Then, based on AFCMDE, a multiobjective memetic FCM
algorithm (AFCMOMA) was presented to further improve
the optimization capability of the model [94]. The approach
introduces the memetic algorithm to balance the local and
global search ability and adds a new population-updating
strategy to obtain more high-quality individual samples. As a
result, the clustering accuracy is further improved.
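The two objectives of the multiobjective fuzzy model, the partition error and the Xie–Beni index, can be evaluated as in the sketch below, together with the fuzzy membership and centroid updates and a Pareto-dominance test of the kind used in nondominated sorting. This is an illustration under assumptions, not the AFCMDE implementation: the function names, the default fuzzifier `m=2.0`, and the small constant `tiny` that guards against a zero distance are choices made for the example.

```python
import numpy as np

def fcm_update(Y, centroids, m=2.0, tiny=1e-12):
    """One fuzzy update: memberships U (clusters x pixels), then centroids."""
    dist = np.linalg.norm(Y[None, :, :] - centroids[:, None, :], axis=2) + tiny
    inv = dist ** (-2.0 / (m - 1.0))
    U = inv / inv.sum(axis=0, keepdims=True)  # each column sums to one
    Um = U ** m
    new_centroids = (Um @ Y) / Um.sum(axis=1, keepdims=True)
    return U, new_centroids

def objectives(Y, U, centroids, m=2.0):
    """Partition error f1 and Xie-Beni index f2 of a fuzzy partition."""
    d2 = ((Y[None, :, :] - centroids[:, None, :]) ** 2).sum(axis=2)
    f1 = ((U ** m) * d2).sum()
    c = len(centroids)
    # Smallest squared distance between two different centroids.
    sep = min(((centroids[i] - centroids[k]) ** 2).sum()
              for i in range(c) for k in range(c) if i != k)
    f2 = f1 / (len(Y) * sep)
    return f1, f2

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))
```

A candidate solution belongs to the Pareto front when no other candidate dominates it, which is the property a nondominated sorting step relies on.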
In addition, a novel social recognition-based multiobjective gravitational algorithm (SMGSA) was developed for
HSIs to learn the similarity relationships among pixels [95].
The SMGSA algorithm searches individual pixels among the
elite ones obtained by the gravitational force and the general
ones learned from the social recognition model, based on
the whole population, to generate an outstanding exploitation ability. Furthermore, in [38], a novel multiobjective
PSO (MOPSO) method was proposed for HSIs to simultaneously solve three problems, i.e., estimating the statistical
parameters of the clusters, searching for the best discriminative
bands, and estimating the number of clusters, using three
different optimization criteria. Moreover, based on the advanced sparse subspace clustering (SSC) model, a multiobjective SSC (MOSSC) method was put forward for HSIs. It

treats the sparse constraint term and the data fidelity term as
two objective functions to avoid the manual determination
of the regularization parameter, as in the SSC model [96].
In general, with the help of advanced intelligent optimization algorithms that search for the global optimal solution to the clustering model, intelligent computing-based
clustering methods may perform better than traditional
clustering approaches. However, such techniques still have
several disadvantages that limit their practical applications
to some degree. For example, the principles of such methods are relatively complex, which raises the barrier to practical use.
In addition, such techniques are generally based on other
clustering models, and their performance is limited by the
inherent defects of the underlying clustering models, such
as FCM and GMM.
GRAPH-BASED CLUSTERING METHODS
Graph-based clustering is one of the recently developed advanced hyperspectral clustering approaches that evolved
from graph theory. Such methods generally model the relationships among hyperspectral pixels with an adjacency
matrix $\tilde{W} \in \mathbb{R}^{MN \times MN}$, also known as a similarity graph,
whose elements represent the similarity between the corresponding pair of pixels or the penalty factor when separating the corresponding two pixels into different subgraphs.
The adjacency matrix is the basis of graph clustering. The
quality of the matrix directly affects the final clustering
accuracy. In practice, it is generally constructed by the
$\varepsilon$-ball strategy [97], the KNN strategy [98], or the full connection strategy [99]. Then, by applying a certain graph cut
algorithm to minimize the total cutting cost of the adjacency matrix, the final clustering result can be obtained, as
shown in Figure 7.
Specifically, graph cut is a very important part of graph
theory. It aims to segment a graph into several disjoint and
distinctive subgraphs by maximizing the intrasubgraph
similarity and minimizing the intersubgraph similarity,
with each subgraph denoting a specific class. With decades
of development, many graph cut algorithms have been developed, including minimum cut [100], ratio cut [101], normalized cut [102], average cut [103], minimum–maximum
cut [104], and so on. Among them, the normalized cut algorithm is the most widely used. According to the differences
among the constructed graphs, graph-based clustering
methods can be coarsely divided into three main kinds: 1)
complete graph-based clustering, 2) bipartite graph-based
clustering, and 3) abbreviated graph-based clustering.
COMPLETE GRAPH-BASED CLUSTERING
Complete graph-based clustering methods group HSIs based
on an adjacency matrix $\tilde{W}$ that consists of all pixels; the matrix
contains the similarity between any pair of pixels, at a size
of $MN \times MN$. A typical example is spectral clustering (SC)
[102], [105], which generally employs the normalized cut
algorithm to conduct graph cutting, with a spectral analysis
model formulated as the following optimization problem:
$$\min_{F^{T} F = I} \operatorname{Tr}(F^{T} L F), \tag{10}$$

where $L$ is the graph Laplacian matrix, i.e., $L = D - \tilde{W}$, and
$D$ is the degree matrix, which is a diagonal matrix with the
diagonal element being $D_{ii} = \sum_j \tilde{W}_{ij}$. SC commonly solves
the optimization problem via singular value decomposition (SVD). By extracting the $c$ eigenvectors corresponding to the $c$ smallest eigenvalues, an optimal $F$ can be obtained, where $c$ is the number of clusters. Then, by applying
k-means to $F$, the final clustering result can be obtained.

FIGURE 7. The graph-based clustering mechanism. (a) The HSI. (b) The adjacency matrix. (c) The graph cut. (d) The clustering result.

[Figure 7 panel content: a six-vertex example graph (v1 to v6) with weighted edges, its 6 × 6 adjacency matrix, and the cut edges.]
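The spectral analysis step can be sketched as follows: build the Laplacian from a similarity matrix, take the eigenvectors of the smallest eigenvalues, and run k-means on the rows of the resulting embedding. The example is a sketch under assumptions: it uses a symmetric eigendecomposition (`numpy.linalg.eigh`) of the unnormalized Laplacian, a deliberately simple k-means with a deterministic farthest-point initialization, and a small hypothetical six-node similarity matrix rather than a real HSI graph.

```python
import numpy as np

def spectral_embedding(W, c):
    """Eigenvectors of L = D - W for the c smallest eigenvalues."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, :c]

def kmeans(F, c, n_iter=50):
    """Plain k-means on the rows of F with farthest-point initialization."""
    centers = [F[0]]
    for _ in range(1, c):
        d2 = np.min([((F - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(F[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(c):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = F[labels == k].mean(axis=0)
    return labels

# Hypothetical six-node similarity matrix with two tightly connected groups.
W_demo = np.array([
    [1.0, 0.8, 0.6, 0.0, 0.2, 0.0],
    [0.8, 1.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 1.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.3, 1.0, 0.8, 0.7],
    [0.2, 0.0, 0.0, 0.8, 1.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 1.0],
])
labels_demo = kmeans(spectral_embedding(W_demo, c=2), c=2)
```

For a full image the similarity matrix has one row per pixel, so the dense eigendecomposition used here quickly becomes the bottleneck, which is the motivation for the bipartite and anchor-based variants discussed later.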
In [106], a novel Schroedinger eigenmap with nondiagonal potentials for a spectral–spatial clustering (SENP)
algorithm was proposed for HSIs. The approach employs a
Schroedinger eigenmap, which is an extension of the graph
Laplacian matrix, to integrate barrier and cluster potentials
to accurately model the similarity between pixels. Then,
different kinds of nondiagonal potentials are explored
within the model to encode the spatial proximity and integrate the spectral proximity through manifold learning.
As a result, the graph discriminability is enhanced, and a
more accurate clustering result is obtained. In addition,
in [107], a graph-based nonlocal total variation (NLTV)
method was developed. It explores the spatial information
of HSIs with an NLTV constraint to construct a more accurate similarity graph, and it introduces the primal–dual
hybrid gradient algorithm to efficiently solve the graph cut
problem. Consequently, NLTV obtains accurate clustering
results for HSIs.
Furthermore, a joint spectral–spatial clustering with a
block-diagonal amplified affinity matrix (JC-BAAM) algorithm was proposed. It considers the size and shape differences of the spatial neighborhoods of different hyperspectral pixels to promote the block-diagonal property of
the affinity matrix and increase the separability between
different classes [108]. Besides, by paying special attention
to small variations in data density and scaling the clusters
based on the latent structure, a novel graph-based clustering (GC) algorithm was developed for HSIs. It performs better on small classes that have few pixels [109]. In addition, a graph clustering-based method was put forward to
solve semisupervised and unsupervised classification problems for HSIs [110]. It constructs a pairwise pixel similarity
graph and develops a parallel Nyström extension model
that randomly samples the graph to obtain a low-rank approximation of the graph Laplacian for SC.
Moreover, some other extended graph-based clustering
models, i.e., manifold-based models, were developed for
HSIs. For example, in [111], a multimanifold SC (MMSC)
algorithm was proposed for HSIs that constructs a nearest-neighbor connectivity model based on the shared nearest neighborhood and estimates the tangent space with
a weighted principal component analysis (PCA). Then,
an enhanced MMSC, i.e., contractive autoencoder-based
MMSC (CA-MMSC), was developed for HSIs to estimate
the tangent space via a contractive autoencoder and obtain

better performance [112]. In [113], a rank 2 NMF-based hierarchical clustering (H2NMF) algorithm was developed
for HSIs. It first treats all pixels as a cluster and then splits
one cluster into two disjoint clusters using a rank 2 NMF
model until obtaining stable results.
In addition, in [114], an orthogonal graph-regularized
NMF (OGNMF) method was introduced. It combines the
orthogonal graph constraints with the NMF model to learn
the local structure information of HSIs and achieves a relatively good clustering effect. Moreover, a robust manifold
factorization-based clustering (RMFC) algorithm was proposed for HSIs [22]. It employs a low-rank matrix factorization framework to simultaneously deal with the dimension
reduction (DR) task and the clustering task, with manifold
regularization to enhance the robustness of the clustering
model. With the help of the out-of-sample extension trick,
it can be extended to large HSIs.
BIPARTITE GRAPH-BASED CLUSTERING
Bipartite graph-based clustering is an extended version of
complete graph-based clustering, and it has been successfully applied to HSIs with good results. In contrast to
the complete graph, the bipartite graph models the relationships between two different sets, i.e., the anchor set
and the pixel set, to obtain a structured similarity matrix
$\hat{W} \in \mathbb{R}^{(MN+n) \times (MN+n)}$ at a larger size, as in (11):

$$\hat{W} = \begin{pmatrix} 0 & A \\ A^{T} & 0 \end{pmatrix}. \tag{11}$$
A 0
Here, $A$ is generally constructed based on Gaussian kernel
distances, with the KNN strategy utilized, as in (12):

$$A_{ij} = \begin{cases} \exp(-d \, \|Y_i - \hat{Y}_j\|^2), & Y_i \in M_k(\hat{Y}_j) \text{ or } \hat{Y}_j \in M_k(Y_i) \\ 0, & \text{otherwise} \end{cases} \tag{12}$$

where $\hat{Y} \in \mathbb{R}^{D \times n}$ is the anchor matrix derived from the HSI
matrix and $d$ is the kernel parameter.
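A bipartite similarity matrix of this form can be assembled as in the sketch below. It is illustrative only: pixels and anchors are stored as rows rather than columns, the neighbor sets are taken to be the k nearest anchors of a pixel and the k nearest pixels of an anchor, and the names `anchor_affinity`, `bipartite_graph`, and `kernel` are hypothetical.

```python
import numpy as np

def anchor_affinity(Y, anchors, k=2, kernel=1.0):
    """Pixel-to-anchor affinities with a Gaussian kernel and a KNN mask."""
    d2 = ((Y[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)  # (MN, n)
    # Anchor j is among the k nearest anchors of pixel i.
    near_anchor = np.zeros_like(d2, dtype=bool)
    idx = np.argsort(d2, axis=1)[:, :k]
    near_anchor[np.arange(len(Y))[:, None], idx] = True
    # Pixel i is among the k nearest pixels of anchor j.
    near_pixel = np.zeros_like(d2, dtype=bool)
    idx = np.argsort(d2, axis=0)[:k, :]
    near_pixel[idx, np.arange(len(anchors))[None, :]] = True
    return np.where(near_anchor | near_pixel, np.exp(-kernel * d2), 0.0)

def bipartite_graph(A):
    """Stack A into the (MN + n) x (MN + n) bipartite similarity matrix."""
    MN, n = A.shape
    return np.block([[np.zeros((MN, MN)), A],
                     [A.T, np.zeros((n, n))]])
```

Only the off-diagonal blocks are nonzero, so in practice the matrix would be stored in sparse form or never formed explicitly; the dense version is shown for clarity.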
A representative bipartite graph-based clustering method
is spectral–spatial coclustering based on a bipartite graph
(SSCC-BG) [115]. It extracts anchors from the cluster centroids of k-means and then constructs a bipartite graph
between centroids and pixels. SSCC-BG obtains good clustering results for HSIs by fusing spectral information and
spatial information into the graph. In addition, in [116], a
sequential SC (S-SC) method was developed to efficiently
cluster HSIs. It employs the minibatch k-means to determine the anchors and conduct cluster assignments. Then,
based on the bipartite graph, S-SC utilizes the sequential
SVD for the product of the rows and columns of $A^{T}A$, instead of directly decomposing it, which effectively reduces
the computational complexity and improves the efficiency
of the model.
Furthermore, in [117], a novel bipartite graph partition-based coclustering with joint sparsity (BGP-CJS) was put
forward for HSIs. The technique builds a more informative

bipartite graph with a learned A from the joint-sparsity-constrained optimization problem. Then, an efficient spectral
graph-based normalized cut method is proposed to simultaneously cluster the rows and columns of the similarity matrix. Consequently, the BGP-CJS further improves the clustering accuracy.
ABBREVIATED GRAPH-BASED CLUSTERING
To overcome the large computational complexity of complete graph-based clustering, efficient abbreviated graph-based clustering has been developed, which selects only a
few important and representative points, i.e., key points, to
construct a similarity graph at a much smaller size of $n \times n$,
where $n$ is the number of key points. In practice, the abbreviated graph is generally induced by the anchor graph.
Hence, anchor graph-based clustering methods are the
most representative abbreviated graph-based clustering
models. In many recent studies, the anchor graph is utilized
to evaluate the similarity among pixels, instead of the complete graph [118], [119].
Such methods generally include two main steps, i.e.,
anchor selection and relation matrix construction. The representative pixels or cluster centroids are considered as anchors, which are commonly obtained via random selection
or preclustering. The relation matrix $A \in \mathbb{R}^{MN \times n}$ models the
relationships among anchors and pixels, and it is generally constructed based on certain similarity measurements,
such as the Gaussian kernel distance. In [120], $A$ is taken as
a variable and obtained by learning, as in (13), which leads
to a more accurate $A$:
$$\min_{A\mathbf{1} = \mathbf{1},\; A_{ij} \geq 0} \; \sum_{i=1}^{MN} \sum_{j=1}^{n} \|Y_i - \hat{Y}_j\|_2^2 \, A_{ij} + \gamma \|A\|_F^2, \tag{13}$$

where $\gamma$ is the regularization parameter and $\mathbf{1} \in \mathbb{R}^{n \times 1}$
is a vector whose elements are all ones. With the obtained $A$, the adjacency matrix $\tilde{W}$ can be constructed as
$\tilde{W} = A \Lambda^{-1} A^{T}$, where $\Lambda \in \mathbb{R}^{n \times n}$ is a diagonal matrix with
the diagonal element being $\Lambda_{jj} = \sum_{i=1}^{MN} A_{ij}$. Such methods
are generally much more efficient, and they are more scalable to large HSIs, given their small computing demands.
However, because only a few key points are utilized to approximate the structure information of HSIs, the underlying adjacency among pixels cannot be accurately mined.
Consequently, the clustering accuracy of such methods is
generally reduced.
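Because the learning problem for the relation matrix separates over pixels, one way to solve it is row by row: completing the square shows that each row is the Euclidean projection of the negated, scaled squared distances onto the probability simplex. The sketch below follows that route and then forms the induced adjacency matrix. It is an assumption-laden illustration rather than the solver of [120]: the names `project_simplex`, `learn_anchor_relation`, and `anchor_adjacency` are hypothetical, pixels and anchors are stored as rows, and the dense product at the end is only sensible for small examples.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = ks[u - css / ks > 0][-1]
    theta = css[rho - 1] / rho
    return np.maximum(v - theta, 0.0)

def learn_anchor_relation(Y, anchors, gamma=1.0):
    """Row-wise minimizer of sum_j d2_ij * a_j + gamma * a_j**2 on the simplex."""
    d2 = ((Y[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return np.array([project_simplex(-row / (2.0 * gamma)) for row in d2])

def anchor_adjacency(A, tiny=1e-12):
    """Adjacency induced by A: A times inverse column sums times A transposed."""
    col_sums = np.maximum(A.sum(axis=0), tiny)  # guard against unused anchors
    return (A / col_sums) @ A.T
```

A larger `gamma` spreads each pixel's weight over more anchors, while a small `gamma` drives each row toward a one-hot assignment to the nearest anchor. When every anchor receives some weight, the rows of the induced adjacency matrix sum to one.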
A typical example of the abbreviated graph-based clustering method is the fast SC with an anchor graph (FSCAG)
algorithm [43]. To ensure the clustering efficiency, FSCAG
randomly selects the anchors from the original hyperspectral pixels. Then, the relation matrix $A$ is learned from (13),
with a spatial constraint based on the mean filtered results
of HSIs, inspired by FCM-S1, to incorporate spatial information into the anchor graph. Last, through spectral analysis
of the adjacency matrix $\tilde{W}$ induced by $A$, as in (10),
the final clustering result is yielded. In addition, in [121],
a scalable graph-based clustering with nonnegative relaxation (SGCNR) algorithm was proposed. It learns $A$ from
(13) to construct the adjacency matrix $\tilde{W}$. Then, through
adding an additional nonnegative constraint to the spectral
analysis model to more accurately relax it from the discrete
case to the continuous case, improved clustering results can
be obtained.
In summary, because of the flexible graph construction
means, powerful structure information mining ability,
and relatively robust clustering performance, graph-based
clustering methods have drawn wide attention and become
one of the research hot spots in the hyperspectral clustering field. However, they are generally restricted by computational complexity, and they need to strike a compromise
between accuracy and efficiency, as in abbreviated graph-based clustering. In addition, due to the inadequate consideration of the interactions among pixels during the graph
construction process and the influences from the large
spectral variability and high correlations among hyperspectral pixels, such techniques generally cannot accurately
mine the underlying adjacency among pixels, which limits
their clustering performance to a certain degree.
SUBSPACE CLUSTERING METHODS
Subspace clustering is another recently developed advanced
hyperspectral clustering approach founded on graph-based
clustering models. Such methods generally model same-class
pixels that have various spectral signatures with a subspace
and approximate the complex internal structure of HSIs by
a union of subspaces, as detailed in Figure 8, which may mitigate the large spectral variability and improve the modeling
accuracy. Then, such methods explore the underlying adjacency among pixels through self-representation learning via
an overcomplete dictionary derived from the HSI data, with
a prior structural constraint imposed on the representation coefficient matrix to obtain stable solutions, as in (14). By
fully exploring the interactions among pixels and the contribution of each atom to the target pixel, the learned adjacency
matrix may be more accurate and informative, and it may
guarantee that pixels are segmented into the correct subspaces:
$$\min_{C} G(C) \quad \text{subject to } Y = YC + N, \tag{14}$$
where $C \in \mathbb{R}^{MN \times MN}$ is the representation coefficient matrix,
which contains pairwise-pixel similarity and reveals the latent partition pattern of pixels to a certain degree. Here, $G(\cdot)$
denotes the prior structural constraint for $C$, including the sparse constraint [122], low-rank constraint [123],
energy constraint [124], and so on. Then, the adjacency matrix $\tilde{W}$ can be induced by the coefficient matrix $C$, such as
$\tilde{W} = C + C^T$. Last, by applying SC to the adjacency matrix,
the final clustering result can be obtained [125], [126].
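As a minimal sketch of how an adjacency matrix can be induced from a learned coefficient matrix, the following example symmetrizes a small matrix. Taking absolute values before summing is an assumption here (a common practice, so that negative coefficients still count as similarity); the text above states the plain sum. The function name and the toy coefficients are hypothetical.

```python
def symmetric_adjacency(C):
    """Build W = |C| + |C|^T from a square coefficient matrix C."""
    n = len(C)
    # Assumption: absolute values are used so that all weights are nonnegative.
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]


# Toy coefficients (assumed values) for three pixels; the diagonal is zero.
C = [[0.0, 0.7, 0.0],
     [0.5, 0.0, -0.1],
     [0.0, 0.2, 0.0]]
W = symmetric_adjacency(C)
```

The resulting matrix is symmetric and nonnegative, which is what a spectral clustering step expects as its input affinity.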
The most representative subspace clustering model is
arguably SSC [122], [127], which exploits the underlying adjacency among hyperspectral pixels by solving the following
sparsity-promoting optimization problem, based on the
basic assumption that each target pixel can be recovered by
only a few atoms from its own subspace in the HSI self-dictionary. The SSC model can be formulated as in (15):
$$\min_{C} \|C\|_1 + \frac{m}{2} \|Y - YC\|_F^2 \quad \text{subject to } \operatorname{diag}(C) = 0,\ C^T \mathbf{1} = \mathbf{1}, \tag{15}$$

[Figure 8 graphic omitted. Recoverable labels: a plot of DN value against band number illustrating spectral variability; a subspace; a union of subspaces; the hyperspectral 2D matrix Y; the dictionary Y; the coefficient matrix C with a structural prior constraint and spatial regularization; spectral clustering.]
FIGURE 8. The subspace clustering mechanism. (a) The HSI. (b) Subspace modeling. (c) Self-representation learning. (d) The similarity graph. (e) The clustering result. DN: digital number.

where $m$ is a regularization parameter to balance the sparsity
term and the data fidelity term, $\operatorname{diag}(C) = 0$ is to avoid the
trivial solution caused by representing each pixel by itself,
and $C^T \mathbf{1} = \mathbf{1}$ means that the affine subspace model is adopted.
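To make the two terms and the two constraints of (15) concrete, the following minimal sketch evaluates the objective and checks feasibility for a given coefficient matrix. It does not solve the optimization problem; the function names and the toy data (two pairs of identical pixels) are assumptions for illustration only.

```python
def ssc_objective(Y, C, m):
    """Return ||C||_1 + (m / 2) * ||Y - Y C||_F^2.

    Y is bands x pixels (columns are pixels); C is pixels x pixels.
    """
    n_bands, n_pixels = len(Y), len(C)
    l1 = sum(abs(v) for row in C for v in row)
    residual = 0.0
    for d in range(n_bands):
        for j in range(n_pixels):
            recon = sum(Y[d][k] * C[k][j] for k in range(n_pixels))
            residual += (Y[d][j] - recon) ** 2
    return l1 + 0.5 * m * residual


def is_feasible(C, tol=1e-9):
    """Check diag(C) = 0 and C^T 1 = 1 (every column of C sums to one)."""
    n = len(C)
    zero_diag = all(abs(C[i][i]) <= tol for i in range(n))
    unit_cols = all(abs(sum(C[i][j] for i in range(n)) - 1.0) <= tol
                    for j in range(n))
    return zero_diag and unit_cols


# Toy data (assumed): pixels 0 and 1 are identical, as are pixels 2 and 3.
Y = [[1, 1, 3, 3],
     [2, 2, 1, 1]]
# Each pixel is represented by its twin, never by itself.
C = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
value = ssc_objective(Y, C, m=10.0)
```

Here the data fidelity term vanishes, so the objective reduces to the sparsity term. The identity matrix would also reconstruct the data perfectly, but it violates the zero-diagonal constraint, which is exactly the trivial solution that the constraint rules out.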
Although SSC has shown significant potential in hyperspectral clustering, due to some shortcomings, e.g., ignoring
the importance of the spatial information and the nonlinearity of HSIs, the clustering performance is still limited. Based
on this fact, in recent years, many enhanced subspace clustering algorithms have been proposed to further improve the
clustering performance and exploit the potential of subspace
clustering. On the basis of the working mechanism, such
methods can be coarsely summarized into three main categories: 1) spectral–spatial subspace clustering, 2) multiview
subspace clustering, and 3) kernel subspace clustering.
SPECTRAL–SPATIAL SUBSPACE CLUSTERING
Spectral–spatial subspace clustering methods focus on exploring the spectral–spatial duality of HSIs within the self-representation framework to reduce the influence of salt-and-pepper noise and enhance the spatial homogeneity of
the clustering result. By incorporating spatial information
to help spectral analysis in the representation domain, the
piecewise smoothness of the representation coefficient matrix can be effectively enhanced, and the representation bias
can be reduced to some degree. As a result, the clustering
performance can be effectively improved. In general, with a
certain spatial constraint, the spectral–spatial subspace clustering model can be formulated as follows:
$$\min_{C} G(C) + a R(C) \quad \text{subject to } Y = YC + N, \tag{16}$$

where $R(\cdot)$ denotes the spatial regularization term and $a$
is a regularization parameter to trade off the importance
between the spectral term and the spatial term.
A typical example is the spectral–spatial SSC ($S^4C$) algorithm [40]. It promotes the target pixels to be represented
by highly related atoms via a weighting strategy and incorporates the spatial neighborhood information to generate
an integrated self-representation model by constructing an
eight-neighborhood local average spatial regularization,
based on the assumption that the average coefficient in the
local small window should be close to the coefficient of the
center pixel. Considering that the assumption of the local
average constraint cannot be satisfied in areas with a complex land cover distribution, a new $\ell_2$-norm regularized SSC
(L2-SSC) algorithm was proposed [41]. It incorporates spatial information in a more refined way by constructing an
efficient four-neighborhood $\ell_2$-norm spatial regularization,
which further improves the clustering performance.
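The local average assumption described above can be sketched on a single map of coefficient values laid out on the image grid. This is a simplified illustration, not the regularizer of [40] or [41]: the penalty form (squared deviation from the neighborhood mean), the border handling (only in-bounds neighbors are averaged), and all names are assumptions. The four-neighborhood offsets are included only to show how the window can be swapped.

```python
EIGHT = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def local_average_penalty(grid, offsets=EIGHT):
    """Sum over pixels of (value - mean of in-bounds neighbours)^2."""
    rows, cols = len(grid), len(grid[0])
    total = 0.0
    for r in range(rows):
        for col in range(cols):
            neigh = [grid[r + dr][col + dc] for dr, dc in offsets
                     if 0 <= r + dr < rows and 0 <= col + dc < cols]
            if neigh:
                total += (grid[r][col] - sum(neigh) / len(neigh)) ** 2
    return total


# A spatially homogeneous map and the same map with one isolated outlier.
smooth = [[1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0]]
noisy = [[1.0, 1.0, 1.0],
         [1.0, 5.0, 1.0],
         [1.0, 1.0, 1.0]]
```

A homogeneous map incurs no penalty, whereas an isolated outlier (the salt-and-pepper case) is penalized, which is the behavior a spatial regularization term is meant to encourage.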
In addition, in [128], a spectral–spatial SSC based on 3D
edge-preserving filtering (SSC-3DEPF) algorithm was put forward. It utilizes 3D edge-preserving filtering for the sparse coefficient matrix obtained by SSC to extract the spectral–spatial
information to generate a more accurate coefficient matrix,
which is favorable for clustering. In [129], a joint SSC (JSSC)
method was proposed to make use of spatial information
through joint sparse representation. It forces the pixels in a
spatial neighborhood to share the same sparse basis. In [44],
based on the sparse coefficient matrix learned by SSC, two enhanced methods were put forward to construct a more accurate
adjacency matrix, i.e., cosine–Euclidean (CE) and CE dynamic
weighting (CEDW). These two methods simultaneously utilize
the spectral and spatial information, with the cosine similarity
exploited to measure the spectral similarity and Euclidean distances utilized to incorporate the spatial information.
Moreover, in [130], a Laplacian-regularized low-rank
subspace clustering (LLRSC) algorithm was proposed. It
incorporates three different Laplacian regularizations into
the low-rank subspace clustering (LRSC) model to explore
the importance of the correlation information of HSIs, and it
achieves good performance in HSI band selection. In [131],
a spectral–spatial LRSC (SS-LRSC) model was developed. It
utilizes a new modulation strategy to incorporate the correlations into the low-rank representation matrix through weighting and local spatial bilateral filtering, which performs well for
HSIs. Furthermore, in [132], a Gaussian kernel dynamic similarity matrix-based SSC (GKD-SSC) method was introduced.
It improves the quality of the adjacency matrix by simultaneously utilizing the sparse coefficient matrix obtained by SSC
and the Gaussian kernel similarity based on the distances
between pixels after PCA processing.
Considering the large computational complexity of
sparse recovery-based methods, a novel total variation
(TV)-regularized collaborative representation clustering algorithm with a locally adaptive dictionary (TV-CRC-LAD)
was proposed for HSIs [21]. This approach exploits the collaborative and competitive relationships among pixels from
all classes in the self-representation process and deals with
the clustering task within the collaborative representation
framework, with less complexity. Then, it reduces the serious
interferences from unrelated atoms in the whole dictionary
by constructing a locally adaptive dictionary for each target pixel and integrates spatial information to enhance the
piecewise smoothness of the coefficient matrix via the TV
regularization. As a result, the TV-CRC-LAD may perform
better for HSIs.
In addition, to overcome the large computational complexity and the time and memory cost of the self-dictionary-based methods, a sketched subspace clustering method was
developed [133]. It conducts the self-representation learning
under a sketched dictionary with much fewer atoms obtained
by random projection, which reduces the computational
complexity and enhances the scalability of the model to a
large degree. Then, the sketched subspace clustering method
was introduced to HSIs, and based on it, through the TV constraint to incorporate spatial information, an enhanced method was proposed, i.e., TV sketched subspace clustering [134].
Furthermore, considering that pixel-based clustering
methods generally encounter several obstacles, i.e., they
are easily affected by salt-and-pepper noise and cannot
accurately model the spatial neighborhoods of hyperspectral pixels with various shapes, several object/superpixel-based
SSC methods were developed for HSIs. A typical example is the mass center-reweighted object-oriented SSC
(MCR-OOSSC) algorithm [135]. It flexibly models spatial
neighborhoods with various shapes via objects obtained
from oversegmentation and extracts more representative
and discriminative object mass centers as features to construct the object sparse representation model, as in (17):
$$\min_{\tilde{C}} \|\tilde{C}\|_1 + \frac{m}{2} \|\tilde{Y} - \tilde{Y}\tilde{C}\|_F^2 \quad \text{subject to } \operatorname{diag}(\tilde{C}) = 0,\ \tilde{C}^T \mathbf{1} = \mathbf{1}. \tag{17}$$

Here, $\tilde{Y} \in \mathbb{R}^{D \times G}$ is the object mass center data matrix and
$\tilde{C} \in \mathbb{R}^{G \times G}$ is the associated sparse coefficient matrix, with $G$
denoting the number of objects. Based on the MCR-OOSSC
approach, in [136], a higher-order superpixel-based SSC algorithm with a conditional random field (SP-SSC-CRF) was
proposed. It integrates the advantages of the $S^4C$ and OOSSC
methods to generate an enhanced model and utilizes the conditional random field to further smooth the within-class noise.
In general, these object/superpixel-based methods improve the
clustering performance to a certain degree and greatly reduce
the time cost by converting pixel clustering to object clustering, which significantly increases the attractiveness of subspace
clustering in practical applications.
Moreover, to better evaluate the discriminative information and more accurately learn the nonlinear structure of
HSIs, a Laplacian-regularized deep subspace clustering (LRDSC) algorithm was proposed [137]. It combines subspace
clustering with the deep convolutional autoencoder network
to learn the nonlinearity of HSIs and extracts spectral–spatial
information through 3D convolutions and deconvolutions
with skip connections to fully exploit multilevel features.
Consequently, LRDSC obtains highly competitive clustering
performance for HSIs.
MULTIVIEW SUBSPACE CLUSTERING
Multiview subspace clustering methods take full advantage
of complementary information found in different domains
of HSIs to further improve the clustering performance. Generally, each view corresponds to a specific feature domain,
such as the spectral feature domain, the contexture feature domain, the shape feature domain, and so on. Such techniques
generally construct a unified model to integrate multiview
feature self-representation problems. A typical example is
the spectral–spatial-based multiview low-rank SSC (SSMLC)
method [138], which has been applied to HSIs [139]. Specifically, it generates the spectral view by spectral partitioning to
obtain correlated bands and creates the spatial view by morphological processing. In addition, another view is generated
by PCA to remove the serious noise in HSIs. By integrating
different views within the SSC framework, SSMLC can be
modeled as in (18):

$$\min_{C_1, C_2, \ldots, C_m} \sum_{i=1}^{m} \left( b_1 \|C_i\|_* + b_2 \|C_i\|_1 \right) + c \sum_{1 \le i, j \le m,\ i \ne j} \|C_i - C_j\|_F^2 \quad \text{subject to } Y_i = Y_i C_i,\ \operatorname{diag}(C_i) = 0, \tag{18}$$

where $m$ denotes the number of views and $Y_i$ indicates the
feature matrix of the ith view, with $C_i$ representing the associated coefficient matrix. The terms $b_1$, $b_2$, and $c$ are three
regularization parameters. The second term is utilized to
force the coefficient matrices learned from different views
to share the same pattern. Through multiview learning,
the complementary information of HSIs can be effectively
integrated, and the discriminability of the representation
coefficients can be enhanced to some degree, which leads
to more accurate clustering results.
Considering the nonlinearity of HSIs, the SSMLC model
was extended to a kernel version, i.e., k-SSMLC [140]. It further improves the clustering accuracy by introducing the kernel technique to address the nonlinearly separable problem
of HSIs in the multiview subspace clustering framework. In
addition, to overcome the large computational burden of
multiview subspace clustering, a parallel SSMLC (p-SSMLC)
method was put forward [141]. It adopts a simple parallel
strategy to reduce the time cost of SSMLC. Specifically, given
the large size of remote sensing images, the HSI is first partitioned into many nonoverlapping 3D blocks. Then, the
SSMLC method is applied to each 3D block to obtain the local clustering results. Last, by merging these local clustering
outcomes, the final clustering result is obtained. By employing the advanced parallel computing technique, the overall
time cost is significantly reduced, which further improves the
practicability of the computationally expensive multiview
subspace clustering models.
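The partitioning step of such a parallel strategy can be sketched as follows. The sketch tiles the spatial grid into nonoverlapping windows, each of which would keep all spectral bands and thus form a 3D block; the block sizes and the function name are hypothetical, and the per-block clustering and the merging rule are left out because the description above does not specify them.

```python
def partition_blocks(rows, cols, block_rows, block_cols):
    """Return half-open windows (r0, r1, c0, c1) that tile a rows x cols image."""
    blocks = []
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            # Windows on the bottom and right borders may be smaller.
            blocks.append((r0, min(r0 + block_rows, rows),
                           c0, min(c0 + block_cols, cols)))
    return blocks


# Toy case (assumed sizes): a 5 x 7 image split into windows of at most 2 x 3.
blocks = partition_blocks(5, 7, 2, 3)
```

Because the windows are disjoint and cover every pixel exactly once, each block can be processed independently, which is what makes the strategy straightforward to parallelize.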
KERNEL SUBSPACE CLUSTERING
Due to the complex imaging environment and serious
interference from various nonlinear factors in the imaging process, HSIs generally have an obvious nonlinear
structure, and different classes are generally not linearly
separable. However, most subspace clustering methods are
based on the linear subspace assumption, which utilizes
the union of linear subspaces to approximate the complex
nonlinear internal structure of HSIs, leading to a large systematic error and poor separability among different classes.
As a result, the clustering performance is degraded to some
degree. Based on this fact, kernel subspace clustering methods have been developed to relieve the nonlinearity of HSIs
to further improve the clustering performance through a
kernel self-representation model, instead of a linear model,
to more accurately mine the latent adjacency among pixels. Such methods first map pixels from the original feature
space into a much higher dimensional kernel space to approximately transform the nonlinearly separable problem
into a linearly separable one. Then, the self-representation
property of the mapped features in the reproducing kernel
space is exploited to construct a kernel self-representation
model, as in (19), which generally leads to a more accurate
coefficient matrix:
$$\min_{C} H(C) + \frac{m}{2} \|K(Y) - K(Y) C\|_F^2 \quad \text{subject to } \operatorname{diag}(C) = 0, \tag{19}$$

where $K(\cdot)$ denotes the kernelized data matrix, with the
Gaussian radial basis function commonly utilized.
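As a minimal sketch of how a Gaussian radial basis function enters such a model, the following example builds the Gram matrix of pairwise kernel values and evaluates the data fidelity term through the standard kernel-trick identity $\|\Phi(Y) - \Phi(Y)C\|_F^2 = \operatorname{tr}(K - 2KC + C^T K C)$, where $\Phi$ is the implicit feature map and $K$ here denotes the Gram matrix. The bandwidth parametrization, the function names, and the toy pixels are assumptions; whether the cited methods compute the term in exactly this way is not stated in the text.

```python
import math


def rbf_gram_matrix(pixels, sigma):
    """K[i][j] = exp(-||y_i - y_j||^2 / (2 * sigma^2)) for all pixel pairs."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-sqdist(a, b) / (2.0 * sigma ** 2)) for b in pixels]
            for a in pixels]


def kernel_residual(K, C):
    """tr(K - 2 K C + C^T K C), computed from the Gram matrix K only."""
    n = len(K)
    KC = [[sum(K[i][k] * C[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    tr_K = sum(K[i][i] for i in range(n))
    tr_KC = sum(KC[i][i] for i in range(n))
    tr_CKC = sum(C[i][j] * KC[i][j] for i in range(n) for j in range(n))
    return tr_K - 2.0 * tr_KC + tr_CKC


# Toy pixels (assumed): two pairs of identical spectra.
pixels = [(1.0, 2.0), (1.0, 2.0), (3.0, 1.0), (3.0, 1.0)]
K = rbf_gram_matrix(pixels, sigma=1.0)
# Each pixel is represented by its twin, so the kernel-space residual vanishes.
C = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
```

Working with the Gram matrix avoids ever forming the high-dimensional mapped features, at the price of storing one value per pixel pair.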
A typical kernel subspace clustering method is the kernel
SSC algorithm with a spatial maximum pooling operation
(KSSC-SMP) [142]. The KSSC-SMP extends SSC to nonlinear manifolds to construct the KSSC model to relieve the
nonlinearly separable problem of HSIs to some degree. Then,
it incorporates spatial neighborhood information through
spatial maximum pooling to generate more discriminative
features. Consequently, the KSSC-SMP may outperform linear SSC methods. In addition, in [143], a kernel sparse and
LRSC (KSLRSC) algorithm was proposed. It utilizes sparse
and low-rank constraints to simultaneously explore the local
and global structure information of HSIs. Accordingly, the
underlying adjacency among pixels can be more accurately
learned. Then, the KSLRSC method is extended to semisupervised classification for HSIs. Furthermore, by adding a
TV denoising constraint into the KSSC model to enhance
the similarity among pixels from the same subspace, a KSSC
with TV denoising (KSSC-TVD) algorithm was put forward
for HSIs [144]. In addition, the k-SSMLC method is also a
typical kernel subspace clustering model [140], which extends the linear multiview subspace clustering model to a
kernel version to further improve the clustering accuracy.
In general, because of accurate modeling and powerful information extraction capability, subspace clustering methods
have shown great potential for HSI clustering and achieved
very competitive performance. In recent years, subspace clustering has gained progressively more attention. However,
such methods are generally accompanied by a large computational complexity and massive time and memory consumption, which limits their applications to some degree.

DEEP LEARNING-BASED CLUSTERING METHODS
Deep learning-based clustering methods have been recently
developed and are one of the most advanced clustering techniques [145]. These approaches rely on deep NNs (DNNs),
such as fully connected networks (FCNs) and convolutional
NNs (CNNs), to learn more discriminative features for clustering and to more accurately simulate the nonlinearity of
data, as shown in Figure 9. Such methods generally deploy
two components, i.e., the network and the clustering model.
Since there are no available labeled samples, these models
are generally optimized in an unsupervised way. According to the basic architecture, deep learning-based clustering
methods can be further divided into three main categories:
1) autoencoder-based clustering, 2) separated network-based
clustering, and 3) generative network-based clustering [146].
AUTOENCODER-BASED CLUSTERING
Autoencoder-based clustering methods are the earliest and
most representative deep clustering approaches. An autoencoder is an unsupervised NN with the advantages of
simplicity and effectiveness. It generally consists of an encoder for data representation and a decoder for data reconstruction, and it self-trains by minimizing the reconstruction error. A typical example is the deep clustering network
(DCN) [147]. It implements DR via a deep autoencoder
network to learn more k-means-friendly features and optimizes the DR and clustering tasks in a unified framework,
as shown in (20):
$$\min_{I, \{s_i\}} \sum_{i=1}^{MN} \left( \ell\left(g(f(Y_i)), Y_i\right) + \frac{c}{2} \|f(Y_i) - I s_i\|_2^2 \right) \quad \text{subject to } s_{j,i} \in \{0, 1\},\ \mathbf{1}^T s_i = 1,\ \forall i, j, \tag{20}$$

[Figure 9 graphic omitted. Recoverable labels: autoencoder (encoder, features, decoder, reconstruction loss, clustering loss, joint training); separated network (features, clustering model, fine-tuning the network); generative network (generator G(z), discriminator D(x), x_real, D(G(z))).]
FIGURE 9. The deep learning-based clustering mechanism. (a) The HSI. (b) Deep learning. (c) The clustering result.

where $f(\cdot)$ and $g(\cdot)$ denote the nonlinear mapping function of the encoder and the decoder, respectively; $\ell(\cdot)$
stands for the reconstruction loss function, defined as
$\ell(Y_i, X_i) = \|Y_i - X_i\|_2^2$, with $X_i$ being the reconstructed sample; $I$ represents the centroid matrix, with its ith column
referring to the ith cluster centroid; and $s_i$ means the assignment vector with only one nonzero element. The first term
is the network loss, and the second term is the clustering
assignment loss, with $c$ being a tradeoff parameter. Then,
the DCN solves this problem with an alternating stochastic
algorithm. Through learning more discriminative features
via a deep autoencoder network, the DCN outperforms traditional clustering methods.
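The two terms of (20) can be made concrete with a minimal sketch that evaluates the objective for fixed network outputs. The encoder outputs and reconstructions are passed in as precomputed lists rather than produced by a network, the one-hot assignment vector is stored as a cluster index, and the nearest-centroid rule shown for updating the assignments is an assumption about one step of an alternating scheme; all names and values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_assignment(latents, centroids):
    """With the network fixed, assign each latent vector to its closest centroid."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(z, centroids[k]))
            for z in latents]


def dcn_objective(samples, latents, recons, centroids, assign, c):
    """Sum_i ||Y_i - X_i||^2 + (c / 2) * ||f(Y_i) - I s_i||^2."""
    total = 0.0
    for y, z, x, k in zip(samples, latents, recons, assign):
        total += sq_dist(y, x) + 0.5 * c * sq_dist(z, centroids[k])
    return total


# Toy values (assumed): two samples, a 1D latent space, two centroids.
samples = [(1.0, 0.0), (0.0, 1.0)]
recons = [(1.0, 0.0), (0.0, 0.5)]      # stands in for g(f(Y_i))
latents = [(0.0,), (2.0,)]             # stands in for f(Y_i)
centroids = [(0.0,), (1.0,)]           # columns of the centroid matrix
assign = nearest_assignment(latents, centroids)
loss = dcn_objective(samples, latents, recons, centroids, assign, c=2.0)
```

In this toy case the first sample is reconstructed perfectly and sits on its centroid, so only the second sample contributes to the loss, through both the reconstruction term and the clustering term.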
A set of deep clustering models has been developed
based on an autoencoder approach. In [148], a deep multimanifold clustering (DMC) method was proposed. It integrates a locality-preserving constraint into the deep autoencoder network to learn the latent embedded manifolds
and more informative features, using both the reconstruction loss and the locality-preserving loss. Then, the proximity
of the representations to the centroids is employed as the
penalty to enhance the representations’ clustering friendliness. Furthermore, a dual autoencoder-based deep SC
(DAE-DSC) method was proposed that jointly optimizes
the deep autoencoder and the deep SC networks [149]. It
employs a dual autoencoder network to learn more robust reduced representations and uses mutual information
to more effectively preserve discriminative information.
Moreover, a general deep clustering model was developed
by integrating the traditional clustering models, e.g., the
k-means and the GMM, into the deep networks [150]. It
yields a much higher accuracy than traditional methods.
The autoencoder was successfully introduced to subspace
clustering and delivered competitive performance by employing deep networks to more effectively deal with feature
extraction and data nonlinearity simulation tasks. For example, in [151], a deep subspace clustering algorithm based on
an autoencoder network (DSCNet) was proposed. It inserts
a self-expressive layer between the encoder and the decoder
to learn the pairwise adjacency between data points via back
propagation, and it integrates the reconstruction loss and the
self-representation loss to learn more discriminative representations. In addition, a structured autoencoder-based subspace clustering (StructAE) method was developed, which
constructs a structured autoencoder network to more effectively preserve the local and global structure information of
data [152].
Furthermore, a self-supervised convolutional subspace
clustering network (S2ConvSCN) was put forward [153].
It employs a convolutional autoencoder network to fully
explore the spatial information of an image, and it adds a
self-expression module and an SC module into the network
to generate a trainable end-to-end model. Then, it jointly
optimizes the feature extraction and subspace clustering in
a self-supervised way. Moreover, the LRDSC algorithm introduced previously obtains a higher clustering accuracy for
HSIs by introducing a convolutional autoencoder to subspace clustering [137]. In addition, in [154], a deep subspace
clustering band selection model was developed for HSIs,
combining a convolutional autoencoder network with the
SSC model and producing a good band selection effect.
SEPARATED NETWORK-BASED CLUSTERING
Separated network-based clustering methods generally optimize a deep network only by the clustering loss, with the
network and the clustering model separated. Although the
basic network can be very deep, these methods may fail to
learn informative features for clustering, due to the absence
of a network constraint, such as the reconstruction loss.
Therefore, the initialization of the network seems crucial
for these methods. Generally, the network is pretrained or
randomly initialized. A typical example is clustering based
on CNN (CCNN) [155]. It deals with the feature extraction
and clustering tasks within the CNN framework in an iterative way. First, it employs a CNN pretrained on ImageNet
to extract features for initial clustering, with c randomly selected centroids and the minibatch k-means utilized. Then,
it exploits the difference between the label predicted by the
CNN and the minibatch k-means to fine-tune the network,
based on the stochastic gradient descent algorithm, and simultaneously updates the cluster centroids, as in (21). The
CCNN employs feature drift compensation to relieve mismatching to further improve the clustering accuracy:
$$\mathrm{SSE} = \frac{1}{2} \sum_{j=1} (y_j - t_j)^2, \tag{21a}$$

$$n_j^{(k)} = (1 - c_j)\, n_j^{(k-1)} + c_j\, h_{\mathrm{new}}^{(k)}, \tag{21b}$$

where SSE denotes the sum of the squared error; $y_j$ and $t_j$
stand for the label predicted by the CNN and the minibatch
k-means, respectively; $n_j^{(k)}$ represents the jth centroid in the
kth iteration; $h_{\mathrm{new}}^{(k)}$ indicates the extracted features from a
minibatch assigned to $n_j$; and $c_j$ is the learning rate of the
jth centroid, which is defined as the reciprocal of the number of samples in the jth cluster.
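The centroid update in (21b) can be sketched as a running average. The sketch assumes that features arrive one at a time and that the sample count of the cluster is incremented before the learning rate is computed, which is one common way of realizing a minibatch k-means update; the function name and the toy features are hypothetical.

```python
def update_centroid(centroid, count, feature):
    """Apply n_j <- (1 - c_j) * n_j + c_j * h, with c_j = 1 / (updated count)."""
    count += 1
    rate = 1.0 / count  # learning rate: reciprocal of the cluster's sample count
    updated = [(1.0 - rate) * n + rate * h for n, h in zip(centroid, feature)]
    return updated, count


# Toy features (assumed) assigned to one cluster; the centroid starts at the first.
features = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
centroid, count = list(features[0]), 1
for h in features[1:]:
    centroid, count = update_centroid(centroid, count, h)
```

With this choice of learning rate, the centroid equals the mean of all features assigned to the cluster so far, so early assignments are gradually outweighed as the cluster grows.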
Unsupervised pretraining is widely utilized for separated network-based clustering methods. In [156], a deep
belief network (DBN) nonparametric clustering (DBNC) algorithm was proposed that relies on an unsupervised pretrained DBN. It learns the reduced representations of data
through the pretrained DBN and employs nonparametric
clustering with the maximum margin to perform clustering, with the parameters of the top layer of the DBN fine-tuned subsequently. In addition, a deep embedded clustering (DEC) method was developed [157]. It first pretrains a
stacked autoencoder network, based on the reconstruction
loss in an unsupervised way, and drops the decoder part.
Then, it fine-tunes the network with the clustering loss and
refines the clustering result based on the Kullback–Leibler
divergence between the soft assignment and the auxiliary
distribution in an iterative way. Based on the DEC model, an improved version was put forward, by employing a
convolutional autoencoder network to learn more informative features for clustering [158].
Random initialization is also often utilized for separated network-based clustering models. For instance, in
[159], a CNN-based joint unsupervised learning (JSL) algorithm was proposed. It starts with a random initialization
and formulates a recurrent framework to jointly update the
representations and clusters during the training process,
with clustering as the forward pass and representation as
the backward pass. In addition, a deep SSC (DSSC) method
was developed, which combines a DNN with SSC [160].
It randomly initializes the network and iteratively refines
the sparse coding and the clustering results in the forward
propagation stage, with the parameters of the DNN updated in the backward propagation stage. Furthermore, a
DNN-based SC (SpectralNet) method was proposed [161].
It randomly initializes the parameters of the network and
considers three terms in the unsupervised training process: 1) affinity learning based on a Siamese network, 2)
embedding learning under an orthogonality constraint to
map the data into eigenspace, and 3) the clustering assignment. As a result, it significantly outperforms traditional
SC methods.
GENERATIVE NETWORK-BASED CLUSTERING
Differing from autoencoder- and separated network-based
deep clustering methods, generative network-based clustering approaches simultaneously perform clustering and
uncover the underlying structure of data to generate new
samples. These methods generally aim at learning the real
structure of data as accurately as possible to create high-quality samples. Therefore, they can more effectively guarantee the discriminability of the extracted features. The
most representative generative deep clustering methods
are arguably the generative adversarial network (GAN)-based
approaches. These methods commonly include two parts,
i.e., the generator and the discriminator, and improve the
quality of the extracted features through the antagonism
between the generator and the discriminator. A typical
example is the categorical GAN (CatGAN) [162]. It plays
a minimum–maximum adversarial game to learn a discriminative classifier in an unsupervised way, by trading
off mutual information between the observations and the
predicted class distribution. Its discriminator and generator
are defined as in (22):
$$L_D = \max_{D}\ \mathcal{G}\left[p(y \mid D)\right] - E_{Y_i}\left[\mathcal{G}\left[p(y \mid Y_i, D)\right]\right] + E_{z}\left[\mathcal{G}\left[p(y \mid G(z), D)\right]\right], \tag{22a}$$

$$L_G = \min_{G}\ -\mathcal{G}\left[p(y \mid D)\right] + E_{z}\left[\mathcal{G}\left[p(y \mid G(z), D)\right]\right], \tag{22b}$$

where $\mathcal{G}[\cdot]$ denotes the empirical entropy, $y$ is the predicted label of a given example $Y_i$, and $z$ is a generated noise
vector from a prior distribution $P(z)$, with $D$ and $G$ representing the discriminator and the generator, respectively. Through adversarial learning, the CatGAN effectively
enhances the discriminability and robustness of the classifier and yields a high clustering accuracy.
Based on the CatGAN, in [163], an information-maximizing GAN (InfoGAN) was proposed. It enhances the
clustering performance by exploiting the mutual information between a fixed small subset of latent variables and
the observations. Furthermore, a deep adversarial GMM
autoencoder clustering (DAGMC) algorithm was developed
[164]. It uses an adversarial autoencoder network to learn
the reduced representations and employs a tunable GMM
for clustering. It simultaneously considers the autoencoder,
the GMM, and the adversarial losses in its objectives, which
are optimized by the stochastic gradient descent algorithm.
Moreover, a deep adversarial subspace clustering (DASC)
method was put forward [165]. It is also based on an adversarial autoencoder network with a generator for subspace
estimation and the clustering assignment and a discriminator for clustering performance evaluation. Then, it progressively learns more informative representations, with the self-representation and subspace clustering tasks supervised by
adversarial learning.
In addition to GAN-based models, a set of variational
autoencoder (VAE)-based generative deep clustering methods has been developed, integrating certain probability
models into a deep network to learn the distribution of data
for sample generation. For example, in [166], a variational
deep embedding (VaDE) algorithm was proposed. It integrates GMM into the VAE network for sample generation
and optimizes the clustering problem by maximizing the
evidence lower bound via the stochastic gradient variational
Bayes estimator. In addition, a VAE with Gaussian mixture
(VAE-GM) method was put forward [167]. It generates samples from a prior distribution, i.e., Gaussian mixture, and
introduces the minimum information constraint to relieve
the over-regularization of VAE. As a result, it yields a high
clustering accuracy.
Overall, deep learning-based clustering methods can
bring about a higher clustering accuracy due to their powerful feature learning and nonlinearity-fitting capabilities.
Accordingly, they have become a research hot spot in the
clustering field. However, most deep clustering methods are
concentrated in the computer vision field, with few trials in
the hyperspectral remote sensing arena. Hence, more deep
learning-based hyperspectral clustering methods should
be developed to promote the development of this field. In
addition, most of the relevant works focus on improving
the clustering performance but ignore the theoretical exploration behind the performance, which leads to the poor
interpretability of these methods and limits their popularization and application to a certain degree.

HYBRID MECHANISM-BASED CLUSTERING MODELS
Hybrid mechanism-based methods deal with the clustering task by combining two or more models, as presented in
Figure 10. Considering that a single clustering model generally has certain shortcomings, such techniques integrate

model, a graph-based k-means (G-k-means) technique
was developed. It utilizes the graph model to estimate the
parameters and initializations for the k-means, which effectively improves the clustering performance to obtain
an accurate segmentation result for HSIs for Mars exploration. In addition, by combining anchor graph-based clustering with subspace clustering, a sparse dictionary-based
anchor regression (SDCR) algorithm was introduced for
HSIs [174]. It constructs a more representative dictionary
through dictionary learning with double sparsity constraints, and it utilizes the anchor subspace regression to
efficiently evaluate the similarity between hyperspectral
pixels. With the help of SC, the final clustering result is obtained. By integrating the advantages of the anchor graph
and the subspace, SDCR achieves good performance.
Generally speaking, by comprehensively taking advantage of two or more different clustering schemes, hybrid
mechanism-based clustering methods can overcome the
defects of both techniques and may bring about better
clustering performance. In theory, hybridization can be
extended to any two or more clustering schemes, and progressively more attractive hybrid clustering methods may
be developed through future research.

the advantages of different schemes to further improve the
clustering performance. A typical example is the combination of the centroid-based clustering scheme with other
approaches, such as the k-GMM [168]. The k-GMM combines centroid- and probability-based clustering. Taking
advantage of the k-means and the GMM, k-GMM obtains
better clustering accuracy for HSIs. In addition, in [169],
an improved fast density peaks-based clustering (k-F DPC)
algorithm was proposed, which is a hybridization of the
k-means and the CFSFDP. Based on the CFSFDP approach, this algorithm calculates the local density based
on an adaptive bandwidth pdf, and it searches cluster centroids by fitting the density and distance decision graph.
Subsequently, it infers the pixel assignment through the
k-means. As a result, k-FDPC outperforms both k-means
and CFSFDP.
In addition, bionics-based clustering can also be combined with other approaches. For example, in [170], a
fuzzy Kohonen local information c-means clustering
(FKLICM) method was put forward. It employs the Kohonen NN to model the complexity of remote sensing images
and integrates the discriminative rules of the FLICM to
enhance the discriminability of the model. Consequently,
more accurate clustering results are obtained. In addition,
by combining the advanced artificial bee colony (ABC)
model with MRF, a novel ABC–MRF clustering algorithm
was developed for HSIs [171]. The ABC model is utilized
to better search cluster centroids and optimize the objective function, with the MRF utilized to incorporate spatial
neighborhood information to further improve the clustering accuracy.
Moreover, the graph-based clustering scheme can be
flexibly combined with other clustering schemes as well.
For example, in [172], a Gaussian SC model (GSC) was
constructed by integrating the powerful information extraction ability of the graph model into the GMM framework. In [173], by combining the k-means with the graph
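The hybridization idea described for k-FDPC (density peaks to pick the centroids, then k-means to assign the pixels) can be sketched as follows. This is a simplified stand-in, not the algorithm of [169]: it assumes a fixed Gaussian kernel width `dc` instead of an adaptive bandwidth pdf, it picks the `k` points with the largest density-times-distance score instead of fitting the decision graph, and all function names and the toy data are hypothetical.

```python
import math

def density_peak_centers(points, k, dc):
    """Pick k seeds with the largest rho * delta (density-peaks heuristic)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density with a Gaussian kernel of fixed width dc (an assumption).
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density; the
        # densest point gets its largest distance to any other point.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in ranked[:k]]

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans_refine(points, centers, iterations=10):
    """Plain Lloyd iterations started from the given seeds."""
    centers = list(centers)
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign(points, centers), centers

# Two toy "spectral" groups, each with one dense point in the middle.
pixels = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (10, 10), (11, 10), (9, 10), (10, 11)]
seeds = density_peak_centers(pixels, k=2, dc=1.0)
labels, centers = kmeans_refine(pixels, seeds)
```

Seeding k-means from density peaks removes the dependence on random initialization, which is one plausible reason such a hybrid can improve on either component alone.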

EXPERIMENTS
In this section, the performance of some popular and representative hyperspectral clustering algorithms is evaluated, including FCM (https://github.com/wwwwwwzj/fcm) [45], FCM-S1 [64], CFSFDP (https://github.com/DesperadoZ/Density_Peak_Clustering) [71], GMM (https://github.com/AdamaTG/Matlab_GMM) [79], SC (https://github.com/jhliu17/SpectralClustering) [105], FSCAG [43], SGCNR [121], SSC
(http://vision.jhu.edu/code/) [40], [122], and L2-SSC [41].
Specifically, FCM is one of the most representative centroid-based clustering methods, while FCM-S1 is a classical improved version of FCM, achieved by incorporating spatial
information. CFSFDP is a representative density-based
clustering approach. GMM is a typical probability-based
clustering method. SC is a complete graph-based clustering technique, while FSCAG and SGCNR are two recently
developed state-of-the-art abbreviated graph-based clustering approaches. SSC is a representative subspace clustering
method, and L2-SSC is a very competitive spectral–spatial
subspace clustering technique for HSIs.

FIGURE 10. The hybrid mechanism-based clustering scheme. (a) The HSI. (b) The hybrid clustering model construction. (c) The hybrid model optimization. (d) The clustering result. [Panel labels: centroid-based model, density-based model, probability-based model, graph-based model; combination; optimization.]
These clustering methods were tested on two well-known HSIs, i.e., the Indian Pines image and the University of Houston image, with both cluster maps and quantitative assessments provided for comprehensive evaluation
and comparison. Specifically, the producer’s accuracy
(PA), user’s accuracy (UA), overall accuracy (OA), kappa,
and purity were utilized for quantitative analysis. In addition, the running time of each method is also reported.
The parameters of each approach were manually tuned
for the best performance. In the experiments, the clusters’ thematic
information was automatically determined by the widely
used Hungarian algorithm [175], [176], with the number
of clusters set as the quantity of the classes in the ground
truth [22], [43], [121].
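The evaluation protocol above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' evaluation code: the cluster-to-class matching is found by exhaustive search over permutations, which returns an optimal one-to-one assignment just as the Hungarian algorithm would but is only practical for a handful of clusters, and the function names and toy labels are hypothetical.

```python
from itertools import permutations

def confusion_matrix(truth, pred, k):
    """Rows are true classes, columns are predicted (or cluster) labels."""
    m = [[0] * k for _ in range(k)]
    for t, p in zip(truth, pred):
        m[t][p] += 1
    return m

def best_mapping(truth, clusters, k):
    """Tuple whose entry c is the class assigned to cluster c.

    Exhaustive search stands in for the Hungarian algorithm here.
    """
    m = confusion_matrix(truth, clusters, k)
    return max(permutations(range(k)),
               key=lambda perm: sum(m[perm[c]][c] for c in range(k)))

def evaluate(truth, clusters, k):
    """PA, UA, OA, kappa, and purity after the one-to-one matching."""
    n = len(truth)
    mapping = best_mapping(truth, clusters, k)
    pred = [mapping[c] for c in clusters]
    m = confusion_matrix(truth, pred, k)
    row = [sum(m[i]) for i in range(k)]                       # true-class totals
    col = [sum(m[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    diag = [m[i][i] for i in range(k)]
    oa = sum(diag) / n
    pe = sum(r * c for r, c in zip(row, col)) / (n * n)       # chance agreement
    raw = confusion_matrix(truth, clusters, k)
    purity = sum(max(raw[i][c] for i in range(k)) for c in range(k)) / n
    return {
        "mapping": mapping,
        "PA": [d / r if r else 0.0 for d, r in zip(diag, row)],
        "UA": [d / c if c else 0.0 for d, c in zip(diag, col)],
        "OA": oa,
        "kappa": (oa - pe) / (1 - pe),
        "purity": purity,
    }

# Toy example: nine labeled pixels, three classes, arbitrary cluster ids.
result = evaluate([0, 0, 0, 1, 1, 1, 2, 2, 2],
                  [1, 1, 0, 0, 0, 0, 2, 2, 2], k=3)
```

Purity is computed on the raw clusters and does not depend on the matching, whereas PA, UA, OA, and kappa are only meaningful after the cluster ids have been mapped to class labels.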
Until now, there has been no unified standard for the
utilization of unlabeled pixels in the ground truth in the hyperspectral clustering field. Some studies utilize all
the pixels and give the cluster map of the whole image [21],
[39], [68], while other works give only the cluster map of the
labeled pixels in the ground truth [22], [77], [117]. Generally speaking, each of these strategies has advantages. The
former seems more in line with the working mechanism
of unsupervised clustering, as there is no available prior
knowledge. The latter can more clearly present the differences between the clustering results of different algorithms.
In this article, the latter is utilized.
The Indian Pines image was collected by the Airborne Visible/Infrared Imaging Spectrometer sensor
over northwestern Indiana on 12 June 1992. This image
has a size of 145 × 145 pixels and 220 spectral bands,
with a spatial resolution of 20 m. In the experiments,
only 200 bands were utilized for analysis, with 20 bad-quality bands removed. This scene covers an agricultural area and has a relatively concentrated land cover
distribution. It contains 16 different classes, with many
subclasses of vegetation. As in [21] and [177], nine main
classes are utilized for clustering. The false-color image
and the ground truth are shown in Figure 11(a) and (b).
Figure 11(c) displays the mean spectra of the nine classes,
with the t-distributed stochastic neighbor embedding
(t-SNE) graph of labeled samples of the nine classes given
in Figure 11(d) [178], [179], from which it can be found
that different classes are mixed together and are difficult
to separate, leading to a very challenging clustering task.
The University of Houston image is a relatively new HSI
data set, provided by the 2013 IEEE Geoscience and Remote
Sensing Society Data Fusion Competition. It was obtained
above the University of Houston by the National Center for
Airborne Laser Mapping sensor on 23 June 2012. Different
from the Indian Pines image, this scene mainly covers an
urban area with a relatively complex land cover distribution. The image has a size of 349 × 1,905 pixels, with 144
spectral bands. In the experiment, a typical subset with a size
of 160 × 150 × 144 was utilized, with seven main classes
included [21]. The false-color image, the ground truth, the
mean spectra, and the t-SNE graph of labeled samples of
the seven classes appear in Figure 12.
Cluster maps of different methods are provided in Figures 13 and 14, with the quantitative evaluations given in
Tables 2 and 3, respectively. Comprehensively analyzing
the experimental results, it can be seen that, in general,
the spectral–spatial methods outperform the spectral-based approaches by taking full advantage of the spectral–spatial duality of HSIs to obtain smoother clustering
results with a higher accuracy, which suggests that the
spatial information is informative and favorable for clustering. Specifically, it can be seen that FCM and FCM-S1
fail to perform well for both HSI data sets, with a large
number of misclassifications, a significant amount of
within-class noise in the cluster map, and relatively lower
clustering accuracy.
Generally speaking, centroid-based methods are more
suitable for data that have a well-separated and near-spherical geometric structure [180]. However, this condition
is generally not met by HSIs. Comparatively
speaking, these methods perform better on the second image scene, where the classes are better separated, as shown in
Figures 11 and 12. Similarly, GMM also performs poorly
for both image scenes because its assumption that samples
from different classes obey the union of Gaussian distributions cannot be fully satisfied by HSIs. Although CFSFDP
obtains relatively smooth clustering results for both scenes,
there are several important classes that are not effectively
recognized, especially for the Indian Pines image. This is
because density-based methods commonly have strong assumptions about the distribution of the feature space and
are suitable for data with a multimodal distribution and
nonlinear structure [180]. Unfortunately, the complexity of
HSIs generally conflicts with the performance guarantee of
these methods.
By comparison, SC performs better, as it more accurately
exploits the similarity among pixels by means of the graph.
It obtains the second-best clustering accuracy for the Indian
Pines image and the fourth-best clustering accuracy for the
University of Houston image. Since the abbreviated graph
cannot accurately model the relationships among pixels,
SGCNR and FSCAG fail to obtain good performance for
both scenes, although they are very efficient. There are a large
number of misclassifications and a notable amount of noise
in the cluster map for both scenes. In general, graph-based
methods also need certain performance guarantees and are
more suitable for data in which samples
from different classes are almost orthogonal or where the
overlap between classes is small relative to their separability
[180]–[182], conditions that cannot be well satisfied by HSIs.
Relative to these methods, the recently developed subspace clustering techniques, i.e., SSC and L2-SSC, may
better model the complex structure of HSIs and relieve
the large spectral variability with the subspace model.
Through self-representation learning, interactions among
pixels can be more effectively exploited, and the underlying adjacency between pixels can be more accurately
learned, which might guarantee that pixels are partitioned
into the correct groups. As a result, SSC and L2-SSC have
a relatively good performance and show significant potential for HSIs. L2-SSC achieves the best clustering results,
with smoother cluster maps and higher clustering accuracy
for both scenes. However, behind the good performance,
some restrictive assumptions are needed to guarantee their
effectiveness, i.e., a tolerable noise level to support a strict
subspace model, enough samples for each subspace, and a
low affinity between different subspaces, which has been
theoretically proved [183].
It can be found that subspace clustering methods fail
to obtain a satisfactory accuracy for the Indian Pines image, due to the strong noise and serious overlap between
different classes, while the approaches perform well for
the noiseless University of Houston image, with larger distances between different classes. In addition, it should be
noted that SSC and L2-SSC are troubled by the large computational complexity and are time consuming compared
with the other clustering methods, which is a shortcoming
of such approaches that needs to be solved.

FIGURE 11. The Indian Pines data set. (a) The original image (red: 40; green: 30; blue: 20). (b) The ground truth. (c) The mean spectra of the
nine classes. (d) The t-SNE graph of labeled samples of the nine classes. [Axes: (c) DN value versus band number; (d) dimension two versus dimension one after DR. Classes: Corn-Notill, Corn-Minimum-Till, Grass/Pasture, Grass/Trees, Hay-Windrowed, Soybeans-Notill, Soybeans-Minimum-Till, Soybeans-Clean, Woods.]

SUMMARY AND DISCUSSION
Hyperspectral remote sensing images provide a wealth of
spectral information and show subtle differences between
various classes to support fine land cover classification, and
they have been an important data resource in various applications. Because HSIs are typical high-dimensional data, their interpretation relies on a large number of labeled samples.
However, it is very difficult to acquire high-quality samples
in practice. Therefore, during recent decades, many clustering methods have been developed for HSIs to deal with the
interpretation task in an unsupervised way. In this article,
we systematically reviewed the existing hyperspectral clustering methods in the literature and summarized them into
nine main kinds, i.e., centroid-based, density-based, probability-based, bionics-based, intelligent computing-based,
graph-based, subspace clustering, deep learning-based, and
hybrid mechanism-based. In addition, we introduced the
principle and mechanism of each type of clustering method
and reviewed the representative approaches in detail, with
the advantages and disadvantages briefly summarized.
From this research, we find that the development of hyperspectral clustering is not balanced. The development of the
centroid-, density-, and probability-based clustering methods is more mature, especially
for the former two approaches. The achievements of these two
kinds of clustering methods are relatively more abundant.
Research on bionics-based clustering is relatively scarce, which demands more attention in
future work. In addition, deep learning has obtained remarkable achievements in the computer vision field, but it has few
applications in the hyperspectral clustering arena. More effort
and trials are needed in the future. Recently, graph-based clustering and subspace clustering have gained increasing attention due to their relatively good clustering performance,
and more and more algorithms have been proposed.

FIGURE 12. The University of Houston data set. (a) The original image (red: 110; green: 40; blue: 10). (b) The ground truth. (c) The mean
spectra of the seven classes. (d) The t-SNE graph of labeled samples of the seven classes. [Axes: (c) DN value (× 10^4) versus band number; (d) dimension two versus dimension one after DR. Classes: Building 1, Building 2, Grass, Trees, Grass-Synthetic, Running Track, Bare Soil.]

FIGURE 13. Cluster maps of the different methods for the Indian Pines image. (a) The ground truth. (b) FCM. (c) FCM-S1. (d) CFSFDP.
(e) GMM. (f) SC. (g) SGCNR. (h) FSCAG. (i) SSC. (j) L2-SSC. [Legend: Corn-No-Till, Corn-Minimum-Till, Grass/Pasture, Grass/Trees, Hay-Windrowed, Soybeans-No-Till, Soybeans-Minimum-Till, Soybeans-Clean, Woods, Unlabeled.]

FIGURE 14. Cluster maps of the different methods for the University of Houston image. (a) The ground truth. (b) FCM. (c) FCM-S1.
(d) CFSFDP. (e) GMM. (f) SC. (g) SGCNR. (h) FSCAG. (i) SSC. (j) L2-SSC. [Legend: Building 1, Building 2, Grass, Trees, Grass-Synthetic, Running Track, Bare Soil, Unlabeled.]

TABLE 2. QUANTITATIVE EVALUATIONS OF THE DIFFERENT METHODS FOR THE INDIAN PINES IMAGE.
METHOD: FCM, FCM-S1, CFSFDP, GMM, SC, SGCNR, FSCAG, SSC, L2-SSC
PA (%)
Cluster 1: 35.78, 37.9, 58.33, 25.59, 30.21, 39.27, 28.6, 48.39, 44.33
Cluster 2: 3.86, 17.54, 0.96, 3.01, 8.07
Cluster 3: 56.73, 60.25, 36.65, 57.56, 39.09, 51.76, 59.42, 65.01
Cluster 4: 52.19, 54.79, 66.71, 67.97, 77.4, 55.21, 64.96, 67.67, 76.71
Cluster 5: 98.95, 100, 84.35, 99.58, 90.79, 99.71, 88.08, 100
Cluster 6: 49.38, 49.59, 59.57, 30.31, 53.5, 40.64, 55.95, 34.36, 45.68
Cluster 7: 42.93, 42.62, 15.72, 56.43, 47.98, 50.26, 40.17, 48.92, 67.09
Cluster 8: 33.73, 34.57, 15.08, 28.84, 16.16, 27.52, 2.87, 2.53
Cluster 9: 47.43, 45.68, 99.68, 64.43, 55.73, 48.08, 58.04, 60.16, 49.96
UA (%)
Cluster 1: 52.95, 53.34, 28.1, 37.96, 63.78, 44.81, 65.96, 45.22, 48.62
Cluster 2: 8.74, 7.42, 18.18, 3.14, 7.55
Cluster 3: 30.34, 30.04, 27.75, 33.03, 36.78, 33.73, 34.41, 34.97
Cluster 4: 94.54, 93.9, 53.22, 75.45, 91.87, 67.05, 82.96, 96.67, 98.94
Cluster 5: 69.56, 68.92, 80.28, 60.79, 62.64, 94.51, 98.14, 93.91
Cluster 6: 20.78, 21.01, 23.22, 28.48, 21.54, 23.05, 22.09, 28.64, 23.97
Cluster 7: 51.92, 52.89, 58.13, 45.27, 49.41, 45.9, 47.04, 45.03, 51.78
Cluster 8: 21.93, 22.76, 25.19, 23.42, 20.82, 20.45, 4.05, 9.74
Cluster 9: 88.76, 91.02, 77.08, 84.16, 94.5, 77.11, 92.73, 96.09, 98.29
OA (%): 43.03, 43.55, 38.75, 45.18, 46.92, 42.45, 43.99, 46.27, 51.15
Kappa: 0.3427, 0.3497, 0.2927, 0.3486, 0.3839, 0.3227, 0.353, 0.3676, 0.419
Purity: 0.5528, 0.5596, 0.473, 0.5205, 0.5498, 0.4843, 0.5642, 0.5489, 0.5647
Time (s): 381, 497, 30.68, 5409, 7.42, 1.44, 32764, 13532
Cluster 1: corn-notill; cluster 2: corn-minimum-till; cluster 3: grass/pasture; cluster 4: grass/trees; cluster 5: hay-windrowed; cluster 6: soybeans-no-till; cluster 7: soybeans-minimum-till;
cluster 8: soybeans-clean; cluster 9: woods.

Moreover, we comprehensively compared and analyzed
the performance of several popular hyperspectral clustering
methods on two well-known HSIs. From the experimental
results, we find that, in general, spectral–spatial methods
outperform spectral-based methods, which indicates the
importance of spatial information. Centroid-, density-, and
probability-based methods, e.g., FCM, FCM-S1, CFSFDP,
and GMM, do not perform well because their assumptions
cannot be fully satisfied by HSIs. FCM and FCM-S1 have
a low complexity of $O(MNDct)$, where $t$ denotes the number of iterations, and are relatively efficient;
thus, they are suitable for large hyperspectral data sets.
CFSFDP has a
higher complexity of approximately $O((MN)^2)$ and requires a relatively large memory to store a sizeable pairwise
pixel distance matrix, limiting its suitability for large HSIs.
GMM has a relatively large complexity of $O((MN)^2 ct)$,
which degrades its suitability for large HSIs to some degree.
By comparison, complete graph-based methods, e.g.,
SC, may perform well, but they are troubled by a large
computational cost. Specifically, SC has a large complexity of $O((MN)^2 Dt)$ and is time consuming, which reduces
its practicability to a large degree. Comparatively speaking,
abbreviated graph-based methods, e.g., FSCAG and SGCNR, are very efficient and suitable for large HSIs because
of their lower complexity. The complexities of FSCAG and
SGCNR are $O(MNDu)$ and $O(MND \log u + MNc^2 + MNcv + c^3)$,
respectively, where $u$ is the number of anchors and
$v$ is the number of nearest neighbors, with $u, v \ll MN$ [43],
[121]. However, their clustering accuracy cannot be guaranteed. Relative to the above methods, subspace clustering approaches, e.g., SSC and L2-SSC, may deal better with
the clustering task for HSIs and bring about a competitive
clustering performance. However, such methods generally have a very large computational complexity of about
$O((MN)^3 t)$ and are very time and memory consuming,
which degrades their attractiveness in real applications and
hinders their application to large hyperspectral data sets
to a large degree.

TABLE 3. THE QUANTITATIVE EVALUATION OF THE DIFFERENT METHODS FOR THE UNIVERSITY OF HOUSTON IMAGE.
METHOD: FCM, FCM-S1, CFSFDP, GMM, SC, SGCNR, FSCAG, SSC, L2-SSC
PA (%)
Cluster 1: 51.73, 51.12, 99.82, 68.37, 88.72, 66.67, 74.59, 79.53, 91.8
Cluster 2: 95.16, 97.25, 93.96, 39.52, 96.52, 38.94, 38.68, 58.42, 99.08
Cluster 3: 94.45, 95.51, 99.94, 95.01, 93.48, 59.78, 76.09, 95.25, 96.61
Cluster 4: 49.44, 68.5, 2.07, 28.42, 73.69, 37.58, 87.66, 68.67, 64.32
Cluster 5: 100, 99.76, 99.95, 100, 76.66, 96.71, 98.9, 100
Cluster 6: 16.37, 98.42, 41.21, 0.39, 79.16, 58.82, 94.87, 98.42
Cluster 7: 67.32, 72.32, 94.91, 73.48, 54.54, 48.92, 82.81, 69.07, 89.09
UA (%)
Cluster 1: 100, 98.57, 98.72, 99.75, 100, 65.93, 98.59, 99.77, 99.27
Cluster 2: 47.51, 49.54, 99.81, 23.66, 50.87, 13.73, 20.18, 51.04, 68.48
Cluster 3: 90.32, 92.21, 76.6, 82.6, 96.08, 81.05, 95.58, 96.44, 92.69
Cluster 4: 43.89, 79.82, 30.1, 37.13, 78.47, 39.32, 66.49, 88.45, 86.65
Cluster 5: 82.65, 96.24, 88.93, 75.16, 59.88, 92.81, 74.28
Cluster 6: 17.28, 99.87, 22.04, 0.5, 43.67, 100, 99.07
Cluster 7: 69.65, 67.34, 98.38, 80.43, 60.1, 79.02, 79.16, 78.45, 99.03
OA (%): 73.99, 76.62, 86.69, 74.01, 78.27, 57.45, 77.14, 83.82, 91.13
Kappa: 0.6614, 0.6968, 0.8199, 0.6556, 0.7209, 0.4756, 0.7131, 0.7921, 0.8849
Purity: 0.8083, 0.8301, 0.87, 0.7903, 0.8153, 0.6931, 0.8479, 0.8382, 0.9113
Time (s): 391, 507, 24382, 7.47, 1.28, 24382, 53281
Cluster 1: building 1; cluster 2: building 2; cluster 3: grass; cluster 4: trees; cluster 5: grass-synthetic; cluster 6: running track; cluster 7: bare soil.
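To make the complexity orders quoted in this section more concrete, the short sketch below turns two of them into rough numbers for the scene sizes given in the experiments. It is only back-of-the-envelope arithmetic under stated assumptions: every pixel of the scene is counted (the experiments cluster only the labeled pixels, so the real figures are smaller), distances are stored as 8-byte values, the 100 iterations passed to the FCM estimate are an arbitrary choice, and the function names are hypothetical.

```python
def pixel_count(rows, cols):
    """MN: number of pixels in a scene."""
    return rows * cols

def pairwise_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Memory for a dense (MN) x (MN) pairwise matrix, as kept by CFSFDP or SC."""
    n = pixel_count(rows, cols)
    return n * n * bytes_per_entry

def fcm_operation_order(rows, cols, bands, clusters, iterations):
    """MNDct: order-of-magnitude operation count for FCM."""
    return pixel_count(rows, cols) * bands * clusters * iterations

scenes = {
    "Indian Pines (145 x 145)": (145, 145),
    "University of Houston subset (160 x 150)": (160, 150),
    "University of Houston full scene (349 x 1,905)": (349, 1905),
}
for name, (rows, cols) in scenes.items():
    gib = pairwise_matrix_bytes(rows, cols) / 2 ** 30
    print(f"{name}: {pixel_count(rows, cols)} pixels, "
          f"{gib:.1f} GiB for a dense pairwise matrix")
```

The quadratic term is what separates the methods in practice: going from the 160 × 150 subset to the full 349 × 1,905 scene multiplies the pixel count by about 28 but the pairwise matrix by about 770, whereas a linear-cost method such as FCM grows only by the factor of 28.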
In general, clustering is an important and necessary
technique for HSI interpretation, but it has much room for
improvement. Accuracy, efficiency, and intelligence may be
the major lines of development for hyperspectral clustering
in the future. Based on the research status of hyperspectral
clustering, the challenges and possible future research lines
are pointed out as follows.
DEVELOPING EFFECTIVE AND EFFICIENT MODELS
Accuracy and efficiency are both very important in practical applications. However, most current hyperspectral
clustering algorithms cannot simultaneously consider
these two aspects very well. For example, subspace clustering may bring about a higher clustering accuracy, but
significant computational complexity generally follows,
which degrades the technique’s scalability to large scenes
and limits its practical applications. Centroid-, density-,
and probability-based methods are generally efficient but
with limited clustering accuracy for HSIs. Hence, how to
develop more effective and efficient hyperspectral clustering models with a high accuracy and a low time cost
is an interesting and attractive topic. Generally speaking,
combining the advantages of different clustering models,
such as hybrid mechanism-based clustering, may be an effective way to overcome these obstacles. In addition, combining advanced clustering models with high-performance
computing techniques, such as parallel computing, may
greatly enhance the efficiency while guaranteeing a high
clustering accuracy.
DEVELOPING MULTIFEATURE-BASED METHODS
Hyperspectral remote sensing images generally come with
the serious problem that pixels from the same class have
different spectra, while pixels from different classes have
similar spectra, which greatly degrades the separability
among different classes. Multiple features from different
views/domains, e.g., spectrum, texture, and geometry, describe ground objects from different views and can provide
complementary information to effectively enhance the
discriminative capability of a clustering model to improve
the clustering accuracy. However, most existing clustering methods integrate the spatial information by means
of regularization to simply explore the discriminability of
the spectral–spatial information or simply fuse multiple
features through concatenation, which does not fully exploit the potential of the multidomain information in HSIs.
Therefore, more advanced multifeature-based clustering methods should be developed to further improve the
clustering accuracy.
DEVELOPING OBJECT- OR
SUPERPIXEL-BASED METHODS
Hyperspectral remote sensing images contain abundant and
complex spatial neighborhood information; however, they
are seriously influenced by noise during the imaging process.
Most existing clustering methods are pixel-based methods,
which have several inherent shortcomings. First, pixel-based
methods are easily affected by salt-and-pepper noise, resulting in fragmented cluster maps. Second, pixel-based methods cannot flexibly model the spatial neighborhoods with
various shapes, which leads to an inadequate exploitation of
the spatial information of HSIs. Last, due to the large number of pixels, pixel-based methods may be troubled by a
large computational cost, especially for graph-based clustering and subspace clustering methods. In this respect, object-/superpixel-based
clustering techniques can effectively overcome these obstacles. Thus, more object-/superpixel-based
clustering methods should be developed for HSIs to further
improve the clustering performance.
PUSHING DEEP LEARNING INTO THE
HYPERSPECTRAL CLUSTERING FIELD
Hyperspectral remote sensing images generally have a
typical nonlinear structure due to the complex imaging
environment and the influences of many nonlinear factors. Thus, pixels from different classes are commonly not
linearly separable. On the other hand, low-level spectral
or spatial features have a limited discriminability and
cannot distinguish well between various classes with high similarity. However, most existing clustering methods are based
on linear models to approximate the nonlinearity of HSIs,
which leads to a large systematic error, or simply deal
with the nonlinearity of HSIs through the kernel technique. It should be noted that the kernel approach is, in
essence, a template-based model, which results in a large
computational cost and can only alleviate the nonlinear
separability problem to a certain extent, which restricts the
technique’s practical applications.
Many deep learning-based methods have been developed in the computer vision field and shown powerful capabilities for nonlinear fitting and feature extraction, but
successful deployments in hyperspectral clustering are very
rare. Generally speaking, current hyperspectral clustering
methods remain at the stage of shallow learning and only
utilize the low-level features of HSIs, which yields a limited
clustering accuracy. Due to the huge differences between
hyperspectral data and natural images, directly applying
deep models from the computer vision field to HSIs generally
fails to obtain a satisfactory effect. Therefore, how to adjust/modify
deep models to better learn the intrinsic structure
of HSIs and extract more informative and discriminative features to further improve the clustering performance would
be a very promising research line.
AUTOMATICALLY ESTIMATING
THE NUMBER OF CLUSTERS
Automatically and accurately estimating the number of clusters is very important for hyperspectral clustering, as it
makes clustering more intelligent and
attractive in practical applications. However, most current
studies focus on improving the clustering models, and studies on the automatic estimation of the number of clusters are
relatively scarce. Although some methods can automatically
estimate the number of clusters for HSIs, they are generally
bound to specific clustering models, e.g., FCM, and have
a limited universality. Hence, finding a technique to automatically and accurately estimate the number of clusters in
a more generic way will be an interesting and important research direction in the future.
ACKNOWLEDGMENTS
This work was funded, in part, by the Special Foundation
for National Science and Technology Basic Research Program of China, under grant 2019FY202503; the National
Key Research and Development Program of China, under grant 2018YFB0504500; the National Natural Science
Foundation of China, under grants 42001313, 61871298,
and 42071322; and the Fundamental Research Funds for
Central Universities, under grant G1323520273. Readers
who have questions about the article are encouraged to directly contact the corresponding author, Hongyan Zhang
(zhanghongyan@whu.edu.cn).
AUTHOR INFORMATION
Han Zhai (zhaihan@cug.edu.cn) is with the School of
Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China. He is a
Member of IEEE.
Hongyan Zhang (zhanghongyan@whu.edu.cn) is with
the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University,
Wuhan, 430079, China. He is a Senior Member of IEEE.
Pingxiang Li (pxLi@whu.edu.cn) is with the State Key
Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan,
430079, China. He is a Member of IEEE.
Liangpei Zhang (zlp62@whu.edu.cn) is with the State
Key Laboratory of Information Engineering in Surveying,
Mapping, and Remote Sensing, Wuhan University, Wuhan,
430079, China. He is a Fellow of IEEE.

REFERENCES
[1]

A. Plaza et al., “Recent advances in techniques for hyperspectral
image processing,” Remote Sens. Environ., vol. 113, pp. S110–
S122, Sept. 2009. doi: 10.1016/j.rse.2007.07.028.
[2] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson, “Advances in hyperspectral image classification: Earth
monitoring with statistical learning methods,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 45–54, Jan. 2014. doi: 10.1109/
MSP.2013.2279179.
[3] M. Imani and H. Ghassemian, “An overview on spectral and
spatial information fusion for hyperspectral image classification: Current trends and challenges,” Inf. Fusion, vol. 59, pp.
59–83, July 2020. doi: 10.1016/j.inffus.2020.01.007.
[4] P. Duan, X. Kang, S. Li, P. Ghamisi, and J. A. Benediktsson, “Fusion of multiple edge-preserving operations for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12,
pp. 10,336–10,349, 2019. doi: 10.1109/TGRS.2019.2933588.
[5] H. Zhang, Y. Song, C. Han, and L. Zhang, “Remote sensing image spatiotemporal fusion using a generative adversarial network,” IEEE Trans. Geosci. Remote Sens., early access, 2020. doi:
10.1109/TGRS.2020.3010530.
[6] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for
hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, 2005. doi: 10.1109/
TGRS.2005.846154.
[7] H. Zhang, L. Liu, W. He, and L. Zhang, “Hyperspectral image denoising with total variation regularization and nonlocal low-rank tensor decomposition,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3071–3084, 2019. doi: 10.1109/
TGRS.2019.2947333.
[8] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Cloud/shadow detection based on spectral indices for multi/hyperspectral
optical remote sensing imagery,” ISPRS J. Photogram. Remote
Sens., vol. 144, pp. 235–253, Oct. 2018. doi: 10.1016/j.isprsjprs.2018.07.006.
[9] W. He, H. Zhang, and L. Zhang, “Total variation regularized reweighted sparse nonegative matrix factorization for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7,
pp. 3909–3921, 2017. doi: 10.1109/TGRS.2017.2683719.
[10] F. A. Kruse, J. W. Boardman, and J. F. Huntington, “Comparison of airborne hyperspectral data and EO-1 Hyperion for mineral mapping,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 6,
pp. 1388–1400, 2003. doi: 10.1109/TGRS.2003.812908.

[11] L. Tusa et al., “Mineral mapping and vein detection in hyperspectral drill-core scans: Application to porphyry-type mineralization,” Minerals, vol. 9, no. 2, p. 122, 2019. doi: 10.3390/
min9020122.
[12] U. Bradter, J. O’Connell, W. E. Kunin, C. W. Boffey, R. J. Ellis, and T. G. Benton, “Classifying grass-dominated habitats from remotely sensed data: The influence of spectral
resolution, acquisition time and the vegetation classification
system on accuracy and thematic resolution,” Sci. Total Environ., vol. 711, p. 134,584, Apr. 2020. doi: 10.1016/j.scitotenv.2019.134584.
[13] H. Zhang, J. Kang, X. Xu, and L. Zhang, “Accessing the temporal
and spectral features in crop type mapping using multi-temporal Sentinel-2 imagery: A case study of Yi’an County, Heilongjiang province, China,” Comput. Electron. Agricul., vol. 176,
p. 105618, Sept. 2020. doi: 10.1016/j.compag.2020.105618.
[14] R. Darvishzadeh, C. Atzberger, A. Skidmore, and M. Schlerf,
“Mapping grassland leaf area index with airborne hyperspectral
imagery: A comparison study of statistical approaches and inversion of radiative transfer models,” ISPRS J. Photogram. Remote
Sens., vol. 66, no. 6, pp. 894–906, 2011. doi: 10.1016/j.isprsjprs
.2011.09.013.
[15] B. Kong, H. Yu, R. Du, and Q. Wang, “Quantitative estimation
of biomass of alpine grasslands using hyperspectral remote
sensing,” Rangeland Ecol. Manage, vol. 72, no. 2, pp. 336–346,
2019. doi: 10.1016/j.rama.2018.10.005.
[16] K. C. Tiwari, M. K. Arora, and D. Singh, “An assessment of independent component analysis for detection of military targets
from hyperspectral images,” Int. J. Appl. Earth Observ. Geoinf.,
vol. 13, no. 5, pp. 730–740, 2011. doi: 10.1016/j.jag.2011.03.007.
[17] M. Shimoni, R. Haelterman, and C. Perneel, “Hyperspectral
imaging for military and security applications: Combining
myriad processing and sensing techniques,” IEEE Geosci. Remote Sens. Mag., vol. 7, no. 2, pp. 101–117, 2019. doi: 10.1109/
MGRS.2019.2902525.
[18] A. Plaza, P. Martínez, J. Plaza, and R. Pérez, “Dimensionality
reduction and classification of hyperspectral image data using
sequences of extended morphological transformations,” IEEE
Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466–479, 2005.
doi: 10.1109/TGRS.2004.841417.
[19] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, “Locality-preserving dimensionality reduction and classification for hyperspectral image analysis,” IEEE Trans. Geosci. Remote Sens., vol. 50,
no. 4, pp. 1185–1198, 2012. doi: 10.1109/TGRS.2011.2165957.
[20] W. Li, F. Feng, H. Li, and Q. Du, “Discriminant analysis-based dimension reduction for hyperspectral image classification: A survey of the most recent advances and an experimental comparison
of different techniques,” IEEE Geosci. Remote Sens. Mag., vol. 6,
no. 1, pp. 15–34, 2018. doi: 10.1109/MGRS.2018.2793873.
[21] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Total variation regularized collaborative representation clustering with a locally adaptive dictionary for hyperspectral imagery,” IEEE Trans. Geosci.
Remote Sens., vol. 57, no. 1, pp. 166–180, 2019. doi: 10.1109/
TGRS.2018.2852708.
[22] L. Zhang, L. Zhang, B. Du, J. You, and D. Tao, “Hyperspectral
image unsupervised classification by robust manifold matrix
factorization,” Inf. Sci., vol. 485, pp. 154–169, June 2019. doi:
10.1016/j.ins.2019.02.008.
[23] Y. Kong, Y. Cheng, C. P. Chen, and X. Wang, “Hyperspectral
image clustering based on unsupervised broad learning,” IEEE
Geosci. Remote Sens. Lett., vol. 16, no. 11, pp. 1741–1745, 2019.
doi: 10.1109/LGRS.2019.2907598.
[24] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Nonlocal means regularized sketched reweighted sparse and low-rank subspace clustering for large hyperspectral images,” IEEE Trans. Geosci. Remote
Sens., early access, 2020. doi: 10.1109/TGRS.2020.3023418.
[25] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Sparsity-based clustering for large hyperspectral remote sensing images,” IEEE
Trans. Geosci. Remote Sens., early access, 2020. doi: 10.1109/
TGRS.2020.3032427.
[26] H. Kashima, J. Hu, B. Ray, and M. Singh, “K-means clustering of proportional data using L1 distance,” in Proc. IEEE
Int. Conf. Pattern Recognit., Dec. 2008, pp. 1–4. doi: 10.1109/
ICPR.2008.4760982.
[27] J. Mao and A. K. Jain, “A self-organizing network for hyperellipsoidal clustering (HEC),” IEEE Trans. Neural Netw., vol. 7, no.
1, pp. 16–29, 1996. doi: 10.1109/72.478389.
[28] Y. Ma, S. Lao, E. Takikawa, and M. Kawade, “Discriminant analysis
in correlation similarity measure space,” in Proc. Int. Conf. Mach.
Learn., June 2007, pp. 577–584. doi: 10.1145/1273496.1273569.
[29] J. Chen, X. Jia, W. Yang, and B. Matsushita, “Generalization of subpixel analysis for hyperspectral data with flexibility in spectral similarity measures,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2165–2171, 2009. doi: 10.1109/
TGRS.2008.2011432.
[30] C. Rohkohl and K. Engel, “Efficient image segmentation using
pairwise pixel similarities,” in Proc. Joint Pattern Recognit. Symp.
(JPRS), Berlin: Springer-Verlag, Sept. 2007, pp. 254–263. doi:
10.1007/978-3-540-74936-3_26.
[31] U. Maulik and I. Saha, “Automatic fuzzy clustering using modified differential evolution for image classification,” IEEE Trans.
Geosci. Remote Sens., vol. 48, no. 9, pp. 3503–3510, 2010. doi:
10.1109/TGRS.2010.2047020.
[32] Y. Zhong, S. Zhang, and L. Zhang, “Automatic fuzzy clustering based on adaptive multi-objective differential evolution for
remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 6, no. 5, pp. 2290–2301, 2013. doi: 10.1109/
JSTARS.2013.2240655.
[33] S. Ghaffarian and S. Ghaffarian, “Automatic histogram-based
fuzzy C-means clustering for remote sensing imagery,” ISPRS
J. Photogram. Remote Sens., vol. 97, pp. 46–57, Nov. 2014. doi:
10.1016/j.isprsjprs.2014.08.006.
[34] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function
Algorithms. New York: Plenum, 1981.
[35] T. N. Tran, R. Wehrens, and L. M. Buydens, “KNN-kernel density-based clustering for high-dimensional multivariate data,”
Comput. Statist. Data Anal., vol. 51, no. 2, pp. 513–525, 2006.
doi: 10.1016/j.csda.2005.10.001.
[36] C. Cariou and K. Chehdi, “Nearest neighbor-density-based
clustering methods for large hyperspectral images,” in Proc. Image Signal Process. Remote Sens. XXIII. Int. Soc. Opt. Photon., Oct.
2017, vol. 10427, p. 104270I. doi: 10.1117/12.2278221.

[37] C. Cariou and K. Chehdi, “Unsupervised nearest neighbors
clustering with application to hyperspectral images,” IEEE J.
Sel. Topics Signal Process., vol. 9, no. 6, pp. 1105–1116, 2015. doi:
10.1109/JSTSP.2015.2413371.
[38] A. Paoli, F. Melgani, and E. Pasolli, “Clustering of hyperspectral
images based on multiobjective particle swarm optimization,”
IEEE Trans. Geosci. Remote Sens., vol. 47, no. 12, pp. 4175–4188,
2009. doi: 10.1109/TGRS.2009.2023666.
[39] H. Jiao, Y. Zhong, and L. Zhang, “An unsupervised spectral
matching classifier based on artificial DNA computing for
hyperspectral remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4524–4538, 2013. doi: 10.1109/
TGRS.2013.2282356.
[40] H. Zhang, H. Zhai, L. Zhang, and P. Li, “Spectral-spatial sparse
subspace clustering for hyperspectral images,” IEEE Trans.
Geosci. Remote Sens., vol. 54, no. 6, pp. 3672–3684, 2016. doi:
10.1109/TGRS.2016.2524557.
[41] H. Zhai, H. Zhang, L. Zhang, P. Li, and A. Plaza, “A new sparse
subspace clustering algorithm for hyperspectral remote sensing
imagery,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 1, pp. 43–
47, 2017. doi: 10.1109/LGRS.2016.2625200.
[42] Y. Zhong, L. Zhang, B. Huang, and P. Li, “An unsupervised artificial immune classifier for multi/hyperspectral remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 2,
pp. 420–431, 2006. doi: 10.1109/TGRS.2005.861548.
[43] R. Wang, F. Nie, and W. Yu, “Fast spectral clustering with anchor graph for large hyperspectral images,” IEEE Geosci. Remote
Sens. Lett., vol. 14, no. 11, pp. 2003–2007, 2017. doi: 10.1109/
LGRS.2017.2746625.
[44] Q. Yan, Y. Ding, J. J. Zhang, Y. Xia, and C. H. Zheng, “A discriminated similarity matrix construction based on sparse subspace
clustering algorithm for hyperspectral imagery,” Cognit. Syst. Res.,
vol. 53, pp. 98–110, Jan. 2019. doi: 10.1016/j.cogsys.2018.01.003.
[45] C. W. Ahn, M. F. Baumgardner, and L. L. Biehl, “Delineation of
soil variability using geostatistics and fuzzy clustering analyses of
hyperspectral data,” Soil Sci. Soc. Amer. J., vol. 63, no. 1, pp. 142–
150, 1999. doi: 10.2136/sssaj1999.03615995006300010021x.
[46] D. Lavenier, “FPGA implementation of the k-means clustering
algorithm for hyperspectral images,” Los Alamos National Lab,
LAUR, Los Alamos, NM, 2000. [Online]. Available: https://
www.researchgate.net/publication/2582177_FPGA_imple
mentation_of_the_k-means_clustering_algorithm_for_hyper
spectral_images
[47] S. Lloyd, “Least squares quantization in PCM,” IEEE Trans.
Inf. Theory, vol. 28, no. 2, pp. 129–137, 1982. doi: 10.1109/
TIT.1982.1056489.
[48] K. Alsabti, S. Ranka, and V. Singh, “An efficient k-means clustering algorithm,” Elect. Eng. Comput. Sci., vol. 43, 1997.
[49] S. A. El Rahman, “Hyperspectral imaging classification using ISODATA algorithm: Big data challenge,” in Proc. IEEE Int.
Conf. E-Learn. (ECOF), Oct. 2015, pp. 247–250. doi: 10.1109/
ECONF.2015.39.
[50] J. M. Haut, M. Paoletti, J. Plaza, and A. Plaza, “Cloud implementation of the K-means algorithm for hyperspectral image
analysis,” J. Supercomput., vol. 73, no. 1, pp. 514–529, 2017. doi:
10.1007/s11227-016-1896-3.
[51] B. Zhao, L. Gao, W. Liao, and B. Zhang, “A new kernel method for hyperspectral image feature extraction,” Geo-spat. Inf.
Sci., vol. 20, no. 4, pp. 309–318, 2017. doi: 10.1080/10095020.
2017.1403088.
[52] B. Zhang, S. Li, C. Wu, L. Gao, W. Zhang, and M. Peng, “A
neighbourhood-constrained k-means approach to classify very
high spatial resolution hyperspectral imagery,” Remote Sens.
Lett., vol. 4, no. 2, pp. 161–170, 2013. doi: 10.1080/2150704X.
2012.713139.
[53] W. Yang, K. Hou, B. Liu, F. Yu, and L. Lin, “Two-stage clustering technique based on the neighboring union histogram
for hyperspectral remote sensing images,” IEEE Access, vol. 5,
pp. 5640–5647, Apr. 2017. doi: 10.1109/ACCESS.2017.2695616.
[54] Z. Ren, L. Sun, Q. Zhai, and X. Liu, “Mineral mapping with hyperspectral image based on an improved k-means clustering algorithm,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS),
July 2019, pp. 2989–2992.
[55] B. C. Kuo and D. A. Landgrebe, “Nonparametric weighted feature extraction for classification,” IEEE Trans. Geosci. Remote
Sens., vol. 42, no. 5, pp. 1096–1105, May 2004. doi: 10.1109/
TGRS.2004.825578.
[56] C. C. Hung, S. Kulkarni, and B. C. Kuo, “A new weighted fuzzy
c-means clustering algorithm for remotely sensed image classification,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 543–
553, 2010. doi: 10.1109/JSTSP.2010.2096797.
[57] Q. Wang and W. Shi, “Unsupervised classification based on fuzzy
c-means with uncertainty analysis,” Remote Sens. Lett., vol. 4, no. 11,
pp. 1087–1096, 2013. doi: 10.1080/2150704X.2013.832842.
[58] X. Liu, B. He, and X. Li, “Semi-supervised classification for
hyperspectral remote sensing image based on PCA and kernel
FCM algorithm,” in Proc. GeoInformatics Joint Conf. GIS Built Environ., Classif. Remote Sens. Images. Int. Soc. Opt. Photon., Nov.
2008, vol. 7147, p. 71471I. doi: 10.1117/12.813255.
[59] S. Niazmardi, S. Homayouni, and A. Safari, “An improved FCM
algorithm based on the SVDD for unsupervised hyperspectral data classification,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 6, no. 2, pp. 831–839, 2013. doi: 10.1109/
JSTARS.2013.2244851.
[60] D. L. Pham, “Spatial models for fuzzy clustering,” Comput.
Vis. Image Understand, vol. 84, no. 2, pp. 285–297, 2001. doi:
10.1006/cviu.2001.0951.
[61] W. Pedrycz, “Conditional fuzzy c-means,” Pattern Recog.
Lett., vol. 17, no. 6, pp. 625–631, 1996. doi: 10.1016/0167-8655(96)00027-X.
[62] S. Li, B. Zhang, A. Li, X. Jia, L. Gao, and M. Peng, “Hyperspectral imagery clustering with neighborhood constraints,” IEEE
Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 588–592, 2012. doi:
10.1109/LGRS.2012.2215005.
[63] X. Y. Wang and J. Bu, “A fast and robust image segmentation using FCM with spatial information,” Dig. Signal Process., vol. 20,
no. 4, pp. 1173–1182, 2010. doi: 10.1016/j.dsp.2009.11.007.
[64] S. Chen and D. Zhang, “Robust image segmentation using FCM
with spatial constraints based on new kernel-induced distance
measure,” IEEE Trans. Syst., Man, Cybern. B Cybern., vol. 34,
no. 4, pp. 1907–1916, 2004. doi: 10.1109/TSMCB.2004.
831165.

[65] Y. Zhong, A. Ma, and L. Zhang, “An adaptive memetic fuzzy
clustering algorithm with spatial information for remote sensing
imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7,
no. 4, pp. 1235–1248, 2014. doi: 10.1109/JSTARS.2014.2303634.
[66] G. Bilgin, S. Erturk, and T. Yildirim, “Unsupervised classification of hyperspectral-image data using fuzzy approaches
that spatially exploit membership relations,” IEEE Geosci. Remote Sens. Lett., vol. 5, no. 4, pp. 673–677, 2008. doi: 10.1109/
LGRS.2008.2002319.
[67] S. Krinidis and V. Chatzis, “A robust fuzzy local information C-means clustering algorithm,” IEEE Trans. Image Process., vol. 19,
no. 5, pp. 1328–1337, 2010. doi: 10.1109/TIP.2010.2040763.
[68] H. Zhang, Q. Wang, W. Shi, and M. Hao, “A novel adaptive
fuzzy local information c-means clustering algorithm for remotely sensed imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 5057–5068, 2017. doi: 10.1109/
TGRS.2017.2702061.
[69] H. Zhang, L. Bruzzone, W. Shi, M. Hao, and Y. Wang, “Enhanced
spatially constrained remotely sensed imagery classification
using a fuzzy local double neighborhood information c-means
clustering algorithm,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 8, pp. 2896–2910, 2018. doi: 10.1109/
JSTARS.2018.2846603.
[70] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern
Recognit. Lett., vol. 31, no. 8, pp. 651–666, 2010. doi: 10.1016/j.
patrec.2009.09.011.
[71] A. Rodriguez and A. Laio, “Clustering by fast search and find
of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496,
2014. doi: 10.1126/science.1242072.
[72] Y. Chen, S. Ma, X. Chen, and P. Ghamisi, “Hyperspectral data clustering based on density analysis ensemble,” Remote Sens. Lett., vol. 8,
no. 2, pp. 194–203, 2017. doi: 10.1080/2150704X.2016.1249295.
[73] H. Bäcklund, A. Hedblom, and N. Neijman, “A density-based
spatial clustering of application with noise,” Data Min. TNM,
vol. 33, pp. 11–30, Nov. 2011.
[74] S. Jia, G. Tang, J. Zhu, and Q. Li, “A novel ranking-based clustering approach for hyperspectral band selection,” IEEE Trans.
Geosci. Remote Sens., vol. 54, no. 1, pp. 88–102, 2015. doi:
10.1109/TGRS.2015.2450759.
[75] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, 2002. doi: 10.1109/34.1000236.
[76] X. Huang and L. Zhang, “An adaptive mean-shift analysis approach for object extraction and classification from urban hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 46,
no. 12, pp. 4173–4185, 2008. doi: 10.1109/TGRS.2008.2002577.
[77] J. M. Murphy and M. Maggioni, “Unsupervised clustering and
active learning of hyperspectral images with nonlinear diffusion,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 3, pp. 1829–
1845, 2018. doi: 10.1109/TGRS.2018.2869723.
[78] J. M. Murphy and M. Maggioni, “Spectral-spatial diffusion geometry for hyperspectral image clustering,” IEEE Geosci. Remote
Sens. Lett., vol. 17, no. 7, pp. 1243–1247, 2020. doi: 10.1109/
LGRS.2019.2943001.
[79] N. Acito, G. Corsini, and M. Diani, “An unsupervised algorithm for hyperspectral image segmentation based on the
Gaussian mixture model,” in Proc. IEEE Int. Geosci. Remote Sens.
Symp. (IGARSS), July 2003, vol. 6, pp. 3745–3747.
[80] C. A. Shah, M. K. Arora, and P. K. Varshney, “Unsupervised
classification of hyperspectral data: An ICA mixture model
based approach,” Int. J. Remote Sens., vol. 25, no. 2, pp. 481–
487, 2004. doi: 10.1080/01431160310001618040.
[81] C. A. Shah, P. K. Varshney, and M. K. Arora, “ICA mixture model algorithm for unsupervised classification of remote sensing
imagery,” Int. J. Remote Sens., vol. 28, no. 8, pp. 1711–1731,
2007. doi: 10.1080/01431160500462121.
[82] C. F. Li, L. Liu, Y. M. Lei, J. Y. Yin, J. J. Zhao, and X. K. Sun,
“Clustering for HSI hyperspectral image with weighted PCA
and ICA,” J. Intell. Fuzzy Syst., vol. 32, no. 5, pp. 3729–3737,
2017. doi: 10.3233/JIFS-169305.
[83] G. Celeux, “The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem,” Comput. Statist. Quart., vol. 2, pp. 73–82, Jan. 1985.
[84] J. B. Courbot, V. Mazet, E. Monfrini, and C. Collet, “Pairwise
Markov fields for segmentation in astronomical hyperspectral
images,” Signal Process., vol. 163, pp. 41–48, Oct. 2019. doi:
10.1016/j.sigpro.2019.05.005.
[85] S. D. Xenaki, K. D. Koutroumbas, A. A. Rontogiannis, and O.
A. Sykioti, “A layered sparse adaptive possibilistic approach for
hyperspectral image clustering,” in Proc. IEEE Geosci. Remote
Sens. Symp. (IGARSS), July 2014, pp. 2890–2893. doi: 10.1109/
IGARSS.2014.6947080.
[86] C. Teodor, B. Alzenk, R. Constantinescu, and M. Datcu, “Unsupervised classification of EO-1 Hyperion hyperspectral
data using Latent Dirichlet allocation,” in Proc. IEEE Int. Symp.
Signals Circuits Syst. (ISSCS), July 2013, pp. 1–4. doi: 10.1109/
ISSCS.2013.6651211.
[87] Y. Fang, L. Xu, J. Peng, H. Yang, A. Wong, and D. A. Clausi,
“Unsupervised Bayesian classification of a hyperspectral image
based on the spectral mixture model and Markov random field,”
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 9,
pp. 3325–3337, 2018. doi: 10.1109/JSTARS.2018.2858008.
[88] A. Baraldi and F. Parmiggiani, “A neural network for unsupervised categorization of multivalued input patterns: An
application to satellite image clustering,” IEEE Trans. Geosci. Remote Sens., vol. 33, no. 2, pp. 305–316, Mar. 1995. doi:
10.1109/36.377930.
[89] Y. Zhong, L. Zhang, and W. Gong, “Unsupervised remote sensing image classification using an artificial immune network,”
Int. J. Remote Sens., vol. 32, no. 19, pp. 5461–5483, 2011. doi:
10.1080/01431161.2010.502155.
[90] J. Xu, H. Li, P. Liu, and L. Xiao, “A novel hyperspectral image clustering method with context-aware unsupervised discriminative extreme learning machine,” IEEE Access, vol. 6, pp. 16,176–
16,188, Mar. 2018. doi: 10.1109/ACCESS.2018.2813988.
[91] H. H. Muhammed, “Unsupervised hyperspectral image segmentation using a new class of neuro-fuzzy systems based on
weighted incremental neural networks,” in Proc. IEEE Applied
Imagery Pattern Recognit. Workshop (AIPR), Oct. 2002, pp. 171–
177. doi: 10.1109/AIPR.2002.1182272.
[92] S. Das, A. Abraham, and A. Konar, “Automatic clustering using an improved differential evolution algorithm,” IEEE Trans.
Syst., Man, Cybern. A, Syst. Humans, vol. 38, no. 1, pp. 218–237,
Jan. 2008. doi: 10.1109/TSMCA.2007.909595.
[93] Ç. Ari and S. Aksoy, “Unsupervised classification of remotely
sensed images using Gaussian mixture models and particle
swarm optimization,” in Proc. IEEE Int. Geosci. Remote Sens.
Symp. (IGARSS), July 2010, pp. 1859–1862. doi: 10.1109/
IGARSS.2010.5653855.
[94] A. Ma, Y. Zhong, and L. Zhang, “Adaptive multiobjective memetic fuzzy clustering algorithm for remote sensing imagery,”
IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4202–4217,
2015. doi: 10.1109/TGRS.2015.2393357.
[95] A. Zhang et al., “Clustering of remote sensing imagery using
a social recognition-based multi-objective gravitational search
algorithm,” Cognit. Comput., vol. 11, no. 6, pp. 789–798, 2019.
doi: 10.1007/s12559-018-9582-9.
[96] Y. Wan, Y. Zhong, A. Ma, and L. Zhang, “Multi-objective sparse
subspace clustering for hyperspectral imagery,” IEEE Trans.
Geosci. Remote Sens., vol. 58, no. 4, pp. 2290–2307, 2019. doi:
10.1109/TGRS.2019.2947253.
[97] R. S. Zemel and M. Á. Carreira-Perpiñán, “Proximity graphs
for clustering and manifold learning,” in Proc. Adv. Neural Inf.
Process. Syst. (NIPS), 2005, pp. 225–232.
[98] X. Zhu, C. Change Loy, and S. Gong, “Constructing robust affinity graphs for spectral clustering,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2014, pp. 1450–1457.
[99] S. Liu, S. De Mello, J. Gu, G. Zhong, M. H. Yang, and J. Kautz,
“Learning affinity via spatial propagation networks,” in Proc.
Adv. Neural Inf. Process. Syst. (NIPS), 2017, pp. 1520–1530.
[100] D. R. Karger and C. Stein, “A new approach to the minimum
cut problem,” J. ACM, vol. 43, no. 4, pp. 601–640, 1996. doi:
10.1145/234533.234534.
[101] S. Wang and J. M. Siskind, “Image segmentation with ratio cut,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 6, pp. 675–
690, 2003. doi: 10.1109/TPAMI.2003.1201819.
[102] J. Shi and J. Malik, “Normalized cuts and image segmentation,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–
905, Aug. 2000. doi: 10.1109/34.868688.
[103] P. Soundararajan and S. Sarkar, “Analysis of mincut, average
cut, and normalized cut measures,” in Proc. Workshop Percept.
Organiz. Comput. Vision (POCV), July 2001, pp. 1–4.
[104] C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon, “A min-max
cut algorithm for graph partitioning and data clustering,” in
Proc. IEEE Int. Conf. Data Min. (ICDM), Nov. 2001, pp. 107–114.
[105] U. Von Luxburg, “A tutorial on spectral clustering,” Statist.
Comput., vol. 17, no. 4, pp. 395–416, 2007. doi: 10.1007/s11222-007-9033-z.
[106] N. D. Cahill, W. Czaja, and D. W. Messinger, “Schroedinger eigenmaps with nondiagonal potentials for spatial-spectral clustering of hyperspectral imagery,” in Proc. Alg. Tech. Multispe.
Hyperspe. Ultraspe. Image. XX. Int. Soc. Opt. Photon. (ISOP), June
2014, vol. 9088, p. 908804.
[107] W. Zhu et al., “Unsupervised classification in hyperspectral
imagery with nonlocal total variation and primal-dual hybrid gradient algorithm,” IEEE Trans. Geosci. Remote Sens.,
vol. 55, no. 5, pp. 2786–2798, 2017. doi: 10.1109/TGRS.
2017.2654486.
[108] L. Fan and D. W. Messinger, “Joint spatial–spectral hyperspectral image clustering using block-diagonal amplified affinity
matrix,” Opt. Eng., vol. 57, no. 3, p. 033107, 2018. doi: 10.1117/1.
OE.57.3.033107.
[109] B. Hufnagl and H. Lohninger, “A graph-based clustering
method with special focus on hyperspectral imaging,” Anal.
Chimica Acta, vol. 1097, pp. 37–48, Feb. 2020. doi: 10.1016/j.
aca.2019.10.071.
[110] Z. Meng, E. Merkurjev, A. Koniges, and A. L. Bertozzi, “Hyperspectral image classification using graph clustering methods,”
Image Process. Line, vol. 7, pp. 218–245, Aug. 2017. doi: 10.5201/
ipol.2017.204.
[111] A. Hassanzadeh, T. Kauranne, and A. Kaarna, “A multi-manifold clustering algorithm for hyperspectral remote sensing imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS),
July 2016, pp. 3326–3329.
[112] A. Hassanzadeh, A. Kaarna, and T. Kauranne, “Unsupervised
multi-manifold classification of hyperspectral remote sensing images with contractive autoencoder,” in Proc. Scandinavian Conf. Image Anal. (SCIA), Cham: Springer-Verlag, June 2017, pp. 169–180.
[113] N. Gillis, D. Kuang, and H. Park, “Hierarchical clustering of
hyperspectral images using rank-two nonnegative matrix
factorization,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 4,
pp. 2066–2078, 2014. doi: 10.1109/TGRS.2014.2352857.
[114] L. Tian, Q. Du, I. Kopriva, and N. Younan, “Orthogonal graph-regularized non-negative matrix factorization for hyperspectral image clustering,” in Proc. IEEE Int. Geosci. Remote Sens.
Symp. (IGARSS), July 2019, pp. 795–798.
[115] W. Liu, S. Li, X. Lin, Y. Wu, and R. Ji, “Spectral–spatial co-clustering of hyperspectral image data based on bipartite graph,”
Multimedia Syst., vol. 22, no. 3, pp. 355–366, 2016. doi: 10.1007/
s00530-015-0450-0.
[116] A. Hassanzadeh, A. Kaarna, and T. Kauranne, “Sequential
spectral clustering of hyperspectral remote sensing image over
bipartite graph,” Appl. Soft Comput., vol. 73, pp. 727–734, Dec.
2018. doi: 10.1016/j.asoc.2018.09.015.
[117] N. Huang, L. Xiao, and Y. Xu, “Bipartite graph partition based
coclustering with joint sparsity for hyperspectral images,” IEEE
J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 12,
pp. 4698–4711, 2019. doi: 10.1109/JSTARS.2019.2953378.
[118] W. Liu, J. He, and S.-F. Chang, “Large graph construction for
scalable semi-supervised learning,” in Proc. Int. Conf. Mach.
Learn. (ICML), 2010, pp. 679–686.
[119] D. Cai and X. Chen, “Large scale spectral clustering via landmark-based sparse representation,” IEEE Trans. Cybern., vol. 45,
no. 8, pp. 1669–1680, Aug. 2015.
[120] F. Nie, W. Zhu, and X. Li, “Unsupervised large graph embedding,”
in Proc. 31st Conf. Artif. Intell. (AAAI), 2017, pp. 2422–2428.
[121] R. Wang, F. Nie, Z. Wang, F. He, and X. Li, “Scalable graph-based clustering with nonnegative relaxation for large hyperspectral image,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10,
pp. 7352–7364, 2019. doi: 10.1109/TGRS.2019.2913004.
[122] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 35, no. 11, pp. 2765–2781, 2013. doi: 10.1109/
TPAMI.2013.57.

[123] R. Vidal and P. Favaro, “Low rank subspace clustering (LRSC),”
Pattern Recognit. Lett., vol. 43, pp. 47–61, July 2014. doi:
10.1016/j.patrec.2013.08.006.
[124] Z. Wu, M. Yin, Y. Zhou, X. Fang, and S. Xie, “Robust spectral
subspace clustering based on least square regression,” Neural
Process. Lett., vol. 48, no. 3, pp. 1359–1372, 2018. doi: 10.1007/
s11063-017-9726-z.
[125] V. M. Patel, H. Van Nguyen, and R. Vidal, “Latent space sparse
subspace clustering,” in Proc. IEEE Int. Conf. Comput. Vision
(ICCV), 2013, pp. 225–232.
[126] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery
of subspace structures by low-rank representation,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171–184, 2012. doi:
10.1109/TPAMI.2012.88.
[127] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in Proc.
IEEE Conf. Comput. Vision Pattern Recog. (CVPR), June 2009,
pp. 2790–2797.
[128] A. Li, A. Qin, Z. Shang, and Y. Y. Tang, “Spectral-spatial sparse
subspace clustering based on three-dimensional edge-preserving filtering for hyperspectral image,” Int. J. Pattern Recognit.
Artif. Intell., vol. 33, no. 3, p. 1955003, 2019. doi: 10.1142/
S0218001419550036.
[129] S. Huang, H. Zhang, and A. Pižurica, “Semisupervised sparse
subspace clustering method with a joint sparsity constraint for
hyperspectral remote sensing images,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 12, no. 3, pp. 989–999, 2019.
doi: 10.1109/JSTARS.2019.2895508.
[130] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Laplacian-regularized low-rank subspace clustering for hyperspectral image
band selection,” IEEE Trans. Geosci. Remote Sens., vol. 57,
no. 3, pp. 1723–1740, 2019. doi: 10.1109/TGRS.2018.
2868796.
[131] J. Xu, N. Huang, and L. Xiao, “Spectral-spatial subspace clustering for hyperspectral images via modulated low-rank representation,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS),
July 2017, pp. 3202–3205.
[132] Y. Long, X. Deng, G. Zhong, J. Fan, and F. Liu, “Gaussian kernel
dynamic similarity matrix based sparse subspace clustering for
hyperspectral images,” in Proc. Int. Conf. Comput. Intell. Security
(CIS), Dec. 2019, pp. 211–215.
[133] P. A. Traganitis and G. B. Giannakis, “Sketched subspace clustering,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1663–1675,
2017. doi: 10.1109/TSP.2017.2781649.
[134] S. Huang, H. Zhang, Q. Du, and A. Pižurica, “Sketch-based
subspace clustering of hyperspectral images,” Remote Sens.,
vol. 12, no. 5, p. 775, 2020. doi: 10.3390/rs12050775.
[135] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images,” J. Appl. Remote Sens., vol. 10, no. 4, p. 046014,
2016. doi: 10.1117/1.JRS.10.046014.
[136] L. Wang et al., “Fast high-order sparse subspace clustering
with cumulative MRF for hyperspectral images,” IEEE Geosci. Remote Sens. Lett., early access, 2020. doi: 10.1109/LGRS.
2020.2968350.
[137] M. Zeng, Y. Cai, X. Liu, Z. Cai, and X. Li, “Spectral-spatial clustering of hyperspectral image based on Laplacian regularized
deep subspace clustering,” in Proc. IEEE Int. Geosci. Remote Sens.
Symp. (IGARSS), July 2019, pp. 2694–2697.
[138] M. Brbić and I. Kopriva, “Multi-view low-rank sparse subspace
clustering,” Pattern Recognit., vol. 73, pp. 247–258, Jan. 2018.
doi: 10.1016/j.patcog.2017.08.024.
[139] L. Tian, Q. Du, I. Kopriva, and N. Younan, “Spatial-spectral
based multi-view low-rank sparse subspace clustering for hyperspectral imagery,” in Proc. IEEE Int. Geosci. Remote Sens.
Symp. (IGARSS), July 2018, pp. 8488–8491.
[140] L. Tian, Q. Du, I. Kopriva, and N. Younan, “Kernel spatial-spectral
based multi-view low-rank sparse subspace clustering for
hyperspectral imagery,” in Proc. IEEE Workshop Hyperspec. Image
Signal Process. Evol. Remote Sens. (WHISPERS), Sept. 2018, pp. 1–4.
[141] L. Tian and Q. Du, “Parallel multi-view low-rank and sparse
subspace clustering for unsupervised hyperspectral image
classification,” in Proc. IEEE Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf. (APSIPA ASC), Nov. 2018, pp. 618–621.
[142] H. Zhai, H. Zhang, X. Xu, L. Zhang, and P. Li, “Kernel sparse
subspace clustering with a spatial max pooling operation for
hyperspectral remote sensing data interpretation,” Remote
Sens., vol. 9, no. 4, p. 335, 2017. doi: 10.3390/rs9040335.
[143] F. De Morsier, M. Borgeaud, V. Gass, J. P. Thiran, and D. Tuia,
“Kernel low-rank and sparse graph for unsupervised and semisupervised classification of hyperspectral images,” IEEE Trans.
Geosci. Remote Sens., vol. 54, no. 6, pp. 3410–3420, 2016. doi:
10.1109/TGRS.2016.2517242.
[144] J. Bacca, C. A. Hinojosa, and H. Arguello, “Kernel sparse subspace clustering with total variation denoising for hyperspectral remote sensing images,” in Proc. Math. Imag. Opt. Soc. Amer.
(MIOSA), June 2017, pp. MTu4C–MTu45.
[145] M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza, “Deep learning
classifiers for hyperspectral imaging: A review,” ISPRS J. Photogram. Remote Sens., vol. 158, pp. 279–317, 2019. doi: 10.1016/j.
isprsjprs.2019.09.006.
[146] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long, “A survey
of clustering with deep learning: From the perspective of network architecture,” IEEE Access, vol. 6, pp. 39,501–39,514, July
2018. doi: 10.1109/ACCESS.2018.2855437.
[147] B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, “Towards
k-means-friendly spaces: Simultaneous deep learning and
clustering,” in Proc. Int. Conf. Mach. Learn. (ICML), July 2017,
pp. 3861–3870.
[148] D. Chen, J. Lv, and Y. Zhang, “Unsupervised multi-manifold
clustering by learning deep representation,” in Proc. Workshop
31st AAAI Conf. Artif. Intell. (AAAI), Mar. 2017, pp. 385–391.
[149] X. Yang, C. Deng, F. Zheng, J. Yan, and W. Liu, “Deep spectral clustering using dual autoencoder network,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp.
4066–4075.
[150] K. Tian, S. Zhou, and J. Guan, “Deepcluster: A general clustering framework based on deep learning,” in Proc. Joint Eur. Conf.
Mach. Learn. Knowl. Discovery Databases, Cham: Springer-Verlag, Sept. 2017, pp. 809–825.
[151] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid, “Deep subspace clustering networks,” in Proc. Adv. Neural Inf. Process. Syst.
(NIPS), 2017, pp. 24–33.
[152] X. Peng, J. Feng, S. Xiao, W. Y. Yau, J. T. Zhou, and S. Yang,
“Structured autoencoders for subspace clustering,” IEEE
Trans. Image Process., vol. 27, no. 10, pp. 5076–5086, 2018. doi:
10.1109/TIP.2018.2848470.
[153] J. Zhang et al., “Self-supervised convolutional subspace clustering network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), 2019, pp. 5473–5482.
[154] M. Zeng, Y. Cai, Z. Cai, X. Liu, P. Hu, and J. Ku, “Unsupervised
hyperspectral image band selection based on deep subspace
clustering,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 12,
pp. 1889–1893, 2019. doi: 10.1109/LGRS.2019.2912170.
[155] C. C. Hsu and C. W. Lin, “CNN-based joint clustering and
representation learning with feature drift compensation for
large-scale image data,” IEEE Trans. Multimedia, vol. 20, no. 2,
pp. 421–429, 2017. doi: 10.1109/TMM.2017.2745702.
[156] G. Chen, “Deep learning with nonparametric clustering,”
2015. [Online]. Available: http://arxiv.org/abs/1501.03084
[157] J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” in Proc. Int. Conf. Mach. Learn.
(ICML), June 2016, pp. 478–487.
[158] F. Li, H. Qiao, and B. Zhang, “Discriminatively boosted image clustering with fully convolutional auto-encoders,” Pattern
Recognit., vol. 83, pp. 161–173, Nov. 2018. doi: 10.1016/j.patcog
.2018.05.019.
[159] J. Yang, D. Parikh, and D. Batra, “Joint unsupervised learning
of deep representations and image clusters,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 5147–5156.
[160] X. Peng, J. Feng, S. Xiao, J. Lu, Z. Yi, and S. Yan, “Deep sparse
subspace clustering,” 2017. [Online]. Available: http://arxiv.org/
abs/1709.08374
[161] U. Shaham, K. Stanton, H. Li, B. Nadler, R. Basri, and Y. Kluger,
“Spectralnet: Spectral clustering using deep neural networks,”
2018. [Online]. Available: http://arxiv.org/abs/1801.01587
[162] J. T. Springenberg, “Unsupervised and semi-supervised learning with categorical generative adversarial networks,” 2015.
[Online]. Available: http://arxiv.org/abs/1511.06390
[163] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and
P. Abbeel, “Infogan: Interpretable representation learning by
information maximizing generative adversarial nets,” in Proc.
Adv. Neural Inf. Process. Syst. (NIPS), 2016, pp. 2172–2180.
[164] W. Harchaoui, P. A. Mattei, and C. Bouveyron, “Deep adversarial Gaussian mixture auto-encoder for clustering,” in Proc.
Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–5.
[165] P. Zhou, Y. Hou, and J. Feng, “Deep adversarial subspace clustering,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
2018, pp. 1596–1604.
[166] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, “Variational
deep embedding: An unsupervised and generative approach
to clustering,” 2016. [Online]. Available: http://arxiv.org/
abs/1611.05148
[167] N. Dilokthanakul et al., “Deep unsupervised clustering with
Gaussian mixture variational autoencoders,” 2016. [Online].
Available: http://arxiv.org/abs/1611.02648

[168] V. E. Neagoe and V. Chirila-Berbentea, “Improved Gaussian
mixture model with expectation-maximization for clustering
of remote sensing imagery,” in Proc. IEEE Int. Geosci. Remote
Sens. Symp. (IGARSS), 2016, pp. 3063–3065.
[169] H. Xie et al., “Unsupervised hyperspectral remote sensing image clustering based on adaptive density,” IEEE Geosci. Remote
Sens. Lett., vol. 15, no. 4, pp. 632–636, 2018. doi: 10.1109/
LGRS.2017.2786732.
[170] K. K. Singh, M. J. Nigam, K. Pal, and A. Mehrotra, “A fuzzy Kohonen local information c-means clustering for remote sensing
imagery,” IETE Tech. Rev., vol. 31, no. 1, pp. 75–81, 2014. doi:
10.1080/02564602.2014.891375.
[171] X. Sun, L. Yang, L. Gao, B. Zhang, S. Li, and J. Li, “Hyperspectral image clustering method based on artificial bee colony
algorithm and Markov random fields,” J. Appl. Remote Sens.,
vol. 9, no. 1, p. 095047, 2015. doi: 10.1117/1.JRS.9.095047.
[172] S. G. Beaven, G. G. Hazel, and A. D. Stocker, “Automated
Gaussian spectral clustering of hyperspectral data,” in Proc. Alg.
Tech. Multispec. Hyperspec. Ultraspec. Image. VIII. Int. Soc. Opt.
Photo. (ISOP), 2002, vol. 4725, pp. 254–267.
[173] L. Galluccio, O. Michel, P. Comon, and A. O. Hero, III, “Graph
based k-means clustering,” Signal Process., vol. 92, no. 9, pp. 1970–
1984, 2012. doi: 10.1016/j.sigpro.2011.12.009.
[174] N. Huang and L. Xiao, “Hyperspectral image clustering via
sparse dictionary-based anchored regression,” IET Image Process., vol. 13, no. 2, pp. 261–269, 2018. doi: 10.1049/iet-ipr
.2018.5421.
[175] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Res. Logist. Quart., vol. 2, no. 1–2, pp. 83–97, 1955.
doi: 10.1002/nav.3800020109.
[176] G. Carpaneto and P. Toth, “Algorithm 548: Solution of the
assignment problem,” ACM Trans. Math. Softw., vol. 6, no. 1,
pp. 104–111, 1980. doi: 10.1145/355873.355883.
[177] H. Yuan and Y. Y. Tang, “A novel sparsity-based framework using
max pooling operation for hyperspectral image classification,”
‘‘ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 8,
pp. 3570–3576, Aug. 2014. doi: 10.1109/JSTARS.2014.2339298.
[178] L. V. D. Maaten and G. Hinton, “Visualizing data using t-SNE,”
J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[179] L. Van Der Maaten, “Fast optimization for t-SNE,” in Proc. Adv.
Neural Inf. Process. Syst. (NIPS), Sept. 2010, vol. 100, pp. 1–5.
[180] M. Maggioni and J. M. Murphy, “Learning by unsupervised
nonlinear diffusion,” J. Mach. Learn. Res., vol. 20, no. 160,
pp. 1–56, 2019.
[181] W. Czaja and M. Ehler, “Schroedinger eigenmaps for the analysis
of biomedical data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35,
no. 5, pp. 1274–1280, 2012. doi: 10.1109/TPAMI.2012.270.
[182] G. Schiebinger, M. J. Wainwright, and B. Yu, “The geometry
of kernelized spectral clustering,” Ann. Statist., vol. 43, no. 2,
pp. 819–846, 2015. doi: 10.1214/14-AOS1283.
[183] M. Soltanolkotabi, E. Elhamifar, and E. J. Candes, “Robust
subspace clustering,” Ann. Statist., vol. 42, no. 2, pp. 669–699,
2014. doi: 10.1214/13-AOS1199.
GRS

DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

Change Detection From Very-High-Spatial-Resolution Optical Remote Sensing Images
Methods, applications, and future directions

DAWEI WEN, XIN HUANG, FRANCESCA BOVOLO, JIAYI LI, XINLI KE, ANLU ZHANG, AND JÓN ATLI BENEDIKTSSON

Digital Object Identifier 10.1109/MGRS.2021.3063465
Date of current version: 5 April 2021

Change detection is a vibrant area of research in remote
sensing. Thanks to increases in the spatial resolution of
remote sensing images, subtle changes at a finer geometrical scale can now be effectively detected. However, change
detection from very-high-spatial-resolution (VHR) (≤5 m)
remote sensing images is challenging due to limited spectral
information, spectral variability, geometric distortion, and
information loss. To address these challenges, many change
detection algorithms have been developed. However, a
comprehensive review of change detection in VHR images
is lacking in the existing literature. This review aims to fill
the gap and mainly includes three aspects: methods, applications, and future directions.
FIGURE 1. An outline of this review, including (a) applications, (b) methods, and (c) future directions.

BACKGROUND
Change detection is a vibrant area of research with wide-ranging applicability, including damage assessment, land
management, and environmental monitoring. Due to the
revisit property of Earth observation sensors, multitemporal remote sensing images at a large geographical scale
can be acquired easily and conveniently. Due to their extensive availability, optical images have become the main data
sources for change detection [1]. Since these satellite sensors are able to acquire images with meter and submeter
spatial resolutions, ground objects in fine spatial detail can
be investigated [2]. Subtle change detection using these
VHR images has drawn great interest in both the academic
and industrial communities. However, multitemporal VHR
images exhibit unique properties, such as limited spectral
information, intrinsic spectral variability, spatial displacement, and information loss, that limit the usefulness of
traditional change detection methods. Therefore, a great
number of studies have been carried out on VHR change
detection, and a series of new research topics has emerged
along with advances in remote sensing technology and data
computing methods. In this regard, a timely overview of
VHR change detection is required to summarize the new
techniques and applications.
Although a number of reviews about change detection
using remote sensing data [3]–[10] exist in the literature,
these publications discuss general change detection methods and do not focus on high-spatial-resolution images.
Only a few available works involve VHR images, e.g., the
reviews in [6] and [7]. However, those two works concern
object-based change detection methods for VHR data and neglect other aspects, e.g., recent technological advances
in deep learning and in multiview and 3D change detection.
Moreover, specific applications of VHR change detection
have rarely been summarized and discussed in the currently available literature. Therefore, this article presents a comprehensive review
of change detection from VHR remote sensing images,
covering methods, applications, and future directions
(Figure 1).

ISSUES RELATED TO VHR IMAGES
AND THEIR CHANGE DETECTION
With the ongoing development of remote sensing imaging techniques, an increasing number of VHR sensors are
available, and many new sensors are being planned and
launched [11]. New platforms, such as unmanned aerial
vehicles (UAVs) and remotely piloted aircraft systems, have
grown in popularity [12] and are now providing a large
amount of VHR remote sensing data. As seen in Table 1,
the imaging capabilities of VHR platforms and sensors
are continually being improved with higher spatial resolutions, more spectral bands, and higher temporal revisit
frequencies. In addition, most VHR sensors provide an
along-track and across-track pair for stereo capture [12],
[13]. With the improved capability of VHR remote sensing
equipment, it is now becoming possible to achieve subtle,
detailed, and frequent 3D change detection. Although
change detection using VHR images is advantageous, from
a technological point of view, it remains a challenge due to
1) limited spectral information, 2) intrinsic spectral variability, 3) spatial displacement, and 4) information loss, as
discussed in the following.

TABLE 1. THE MAIN PARAMETERS OF SOME VHR SENSORS.

SENSOR      | SPATIAL RESOLUTION (M) | NUMBER OF BANDS | REVISIT TIME (DAYS) | LAUNCH YEAR
IKONOS      | -                      | 4               | 1–3                 | 1999
QuickBird   | 0.61                   | 4               | 1.5–2.5             | 2001
SPOT-5      | 2.5                    | 4               | -                   | 2002
OrbView-3   | -                      | 4               | 3                   | 2003
Cartosat-2  | 0.8                    | 1               | 4                   | 2007
WorldView-1 | 0.5                    | 1               | 1.7                 | 2007
GeoEye-1    | 0.41                   | 4               | <3                  | 2008
WorldView-2 | 0.46                   | 4               | 1.1                 | 2009
KOMPSAT-3   | 0.7                    | 4               | 3                   | 2012
Ziyuan-3    | 2.1                    | 4               | 4–5                 | 2012
SPOT-6/7    | -                      | 4               | 1                   | 2012/2014
Gaofen-1    | -                      | 4               | <4                  | 2013
Gaofen-2    | 0.8                    | 4               | -                   | 2014
Planet Labs | -                      | 4               | 1 or 2              | 2014
Deimos-2    | -                      | 4               | 1 or 2              | 2014
WorldView-3 | 0.31                   | -               | <1                  | 2014
DMC-3       | -                      | 4               | 1                   | 2015
WorldView-4 | 0.31                   | 4               | <1                  | 2016

SPOT: Satellite Pour l’Observation de la Terre; KOMPSAT: Korean Multipurpose Satellite;
DMC: Disaster Monitoring Constellation.

1) Limited spectral information: Compared to coarse- and medium-resolution sensors, images captured by VHR sensors usually provide a smaller number of bands. Although
WorldView-3, one of the most advanced VHR sensors, can
provide images with 16 spectral bands, most VHR images, e.g., from IKONOS, QuickBird, WorldView-2, and
Ziyuan-3, cover only four bands (blue, green, red, and
near-infrared) [14]. With limited spectral information, it
is difficult to separate classes that have similar spectral
signatures because of the low between-class variance
[15]–[18]. Researchers have also pointed out that it is difficult to achieve high-accuracy change detection with the
limited spectral information [5], [15], [19]–[21] of VHR
images. This may inhibit the direct use of traditional spectral-based change detection methods, e.g., change vector
analysis (CVA) [22]. Therefore, other categories of features
are often adopted to augment the spectral information for
VHR change detection.
2) Intrinsic spectral variability: There exists a high degree of spectral
variability in VHR images. Buildings, for example, have
complicated appearances, with various roof superstructures, such as chimneys, water tanks, and pipelines; this
leads to significantly heterogeneous spectral characteristics in VHR images [23], [24]. High spectral variability within geographic objects increases the within-class
variance, which inevitably increases the uncertainty of
spectral-based image interpretation methods. External
factors, such as atmospheric conditions, phenological
stages, sun angles, soil moisture, tidal stages, and water turbidity, may make unchanged objects temporally
variant in their spectral features and hence result in
them being incorrectly identified as changed ones [25],
[26]. In addition, temporary objects, such as cars on a
road, visible in VHR images can also affect the performance of traditional spectral-based change detection
methods.

FIGURE 2. The spatial displacement in multispectral data acquired
with different viewing geometries in an unchanged urban scene
[21]: (a) image (t1), with a satellite angle zenith of 153°, and (b)
image (t2), with a satellite angle zenith of 129°12´. (c) The result of
traditional spectral-based CVA shows a high number of false alarms
(black and white indicate unchanged and changed areas, respectively) [31].

3) Spatial displacement: The VHR imaging systems on optical satellites are highly agile platforms and can operate
as constellations [27] that support rapid retargeting,
short revisit times (for instance, <1 day for WorldView-3
and WorldView-4), and stereoscopic coverage for rapid
disaster response and 3D change detection [28]. However, this imaging mode makes it extremely difficult to
acquire multitemporal images with the same or close
viewing angles for accurate change detection [29], [30].
As such, multitemporal VHR images may suffer from
apparent spatial displacement due to the parallax distortion of land cover objects, especially for high-rise
buildings [31]. Specifically, a building may display distinct spatial morphologies (e.g., roofs and facades) in
multitemporal VHR images due to different viewing angles (Figure 2). This may lead to a large number of commission errors if traditional spectral and pixel-based
change detection methods are adopted. Precise orthorectification using VHR digital
surface models (DSMs) is a feasible solution to this problem. In particular, sensors equipped with multiview imaging systems,
for instance, the three-line array of Ziyuan-3 and the
two cameras of Cartosat-2, are preferred: they collect multiview images nearly simultaneously,
so their stereo pairs are acquired under similar atmospheric conditions, and multitemporal data
can be collected conveniently.
4) Information loss: VHR images suffer from serious information loss owing to the presence of clouds/haze, cloud
shadows, and shadows cast by terrain, buildings, and
trees. The problem of cloud and cloud shadow contamination can be avoided by selecting cloud-free observations [32]. However, shadows cast by terrain, buildings,
and trees seem unavoidable in VHR imagery, especially
in urban areas [33]. Although shadow information is
useful in building detection and height estimation [34]–
[36], it becomes a problem for change detection in wider
areas [37]. Since the direction and length of shadows
depend on the sun’s azimuth and elevation angle at
the time of image acquisition, shadow-affected areas
differ between multitemporal images. In addition, in the case
of occlusions by vertical structures (e.g., high-rise buildings and trees), the problem of information loss can be
more complicated. With different viewing geometries in
multitemporal images, the size and direction of the tilting effect can vary, as shown in Figure 2. Overall, the
regions affected by shadow and occlusions may become
invisible, and they may differ across multitemporal VHR images.
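
The traditional spectral-based CVA referred to in item 1 can be illustrated with a minimal sketch. It assumes two co-registered images stored as nested lists of band tuples; the function names, the toy pixel values, and the fixed threshold are hypothetical choices made for illustration and are not taken from [22].

```python
import math


def cva_magnitude(img_t1, img_t2):
    """Per-pixel magnitude of the spectral change vector.

    img_t1, img_t2: lists of rows; each pixel is a sequence of band values.
    The two images are assumed to be co-registered and of equal size.
    """
    magnitude = []
    for row1, row2 in zip(img_t1, img_t2):
        out_row = []
        for px1, px2 in zip(row1, row2):
            # The change vector is the band-wise difference between the dates.
            squared = sum((b2 - b1) ** 2 for b1, b2 in zip(px1, px2))
            out_row.append(math.sqrt(squared))
        magnitude.append(out_row)
    return magnitude


def threshold_changes(magnitude, threshold):
    """Binary change map: 1 where the magnitude exceeds the threshold."""
    return [[1 if m > threshold else 0 for m in row] for row in magnitude]


# Toy 2 x 2 scene with four bands (blue, green, red, near-infrared).
t1 = [[(10, 20, 30, 40), (10, 20, 30, 40)],
      [(50, 50, 50, 50), (12, 22, 28, 41)]]
t2 = [[(10, 20, 30, 40), (13, 24, 30, 40)],
      [(80, 90, 50, 50), (12, 22, 28, 41)]]

mag = cva_magnitude(t1, t2)
# The threshold value is an arbitrary assumption for this toy example.
change_map = threshold_changes(mag, threshold=10.0)
print(mag)         # [[0.0, 5.0], [50.0, 0.0]]
print(change_map)  # [[0, 0], [1, 0]]
```

Because the decision rests on only four band differences per pixel, any spectral variation that is unrelated to real change (illumination, parallax, and shadows) feeds directly into the magnitude, which is the weakness the four issues above describe.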

METHODS
Change detection methods for VHR images are commonly
based on two steps: 1) feature extraction and 2) change detection (see Figure 1).

FEATURE EXTRACTION
Change detection methods rely on effective multitemporal
feature representation to indicate whether and what changes
have occurred. It is widely agreed that purely spectral-based methods are ineffective in dealing with the challenges facing VHR change detection. During the past decades, a large
number of image features have been proposed that
compensate for the limited spectral information contained
in VHR images and improve the discriminative capability
of image change information. In this review, image features
designed for VHR change detection are divided into the following categories: textural, deep, object based, and angular (Figure 3). These are potentially useful for dealing with
the challenges of limited spectral information and intrinsic
spectral variability. A summary of the major features used
for VHR change detection, including categories, subcategories, descriptions, characteristics, most-used sensors, and
corresponding references, is presented in Table 2.

FIGURE 3. Features for change detection using high-spatial-resolution remote sensing images. (a) Textural features. (b) Deep features. (c) Object-based features. (d) Angular features.

TABLE 2. A SUMMARY OF THE FEATURES USED FOR VHR IMAGE CHANGE DETECTION.

CATEGORY | SUBCATEGORY | DESCRIPTION | CHARACTERISTICS | SENSOR | REFERENCES
Textural features | Statistical | Describe the relationships among the gray levels of local windows | Edge effect, difficulty of identifying parameters | QuickBird [48]–[53] | [43]–[47]
 | Structural | Investigate the geometry, shapes, and edges of regions | | | [48]–[53]
 | Model based | Obtain coefficients from the model describing the relationships among the local image neighborhood | | | [56]–[61]
 | Transform based | Capture local structures in a transformed space | | | [63], [64]
Deep features | Autoencoders | Learn efficient encoding through the optimization of a series of criteria | Complex training and parameter tuning, “black-box” nature, high computational burden, overfitting, and so on | Gaofen-2 [66], [77] and Google Earth images [66], [76] | [70]–[73]
 | Convolutional neural networks | Extract mid- and high-level abstract features by interleaving convolutional and pooling layers | | | [66], [67], [75]–[78]
Object-based features | First level | Radiometry, geometry, and texture for each image object | Determination of appropriate segmentation parameters and uncertainties of the segmentation results | QuickBird [88], [89] | [85], [88], [89]
 | Second level | Relationships between two image objects, e.g., adjacency and proximity, and relationships with neighboring objects | | | [91], [92]
 | Third level | Spatial arrangements of multiple objects | | | [95]
Angular features | Implicit | Orthographic images and DSMs | Availability of multiangle images | Ziyuan-3 [2], [21] | [21], [98], [99]
 | Explicit | Quantify the differences contained in multiangle images, such as angular difference features | | | [2]

TEXTURAL FEATURES
Textural features depict contextual and structural information by using a moving window or kernel, where the parameters of size, direction, and distance must be appropriately
determined [5], [38]. Textural features for VHR change detection can be categorized as statistical, structural, model
based, and transform based. Statistical textures describe
the relationships between the gray levels of local windows,
e.g., the gray-level cooccurrence matrix (GLCM), local binary patterns (LBPs), and the pixel shape index (PSI). The GLCM,
the most popular statistical texture, measures the contrast
(e.g., dissimilarity and homogeneity), orderliness (e.g., the
angular second moment and entropy), and statistical (e.g.,
the mean, variance, and correlation) attributes within local windows [39], [40]. The LBP, an ordered set of binary
comparisons of pixel values between the central pixel and
its neighboring ones, is invariant to monotonic grayscale
change [41]. The PSI aims to measure the length of direction lines, which are extended based on gray-level similarity along a series of directions [42]. Some representative examples for VHR change detection using statistical textures
are briefly introduced in the following.
Tan et al. [43] adopted the GLCM in an automatic change
detection method to consider the variation information of
direction, distance, and amplitude in images. Li et al. [44] applied the local similarity of GLCM textures to detect changes
and demonstrated that this kind of feature was robust against
both noise and spectral similarity. Peng and Zhang [45] used
the LBP for change detection from Gaofen-1 imagery, and
both qualitative and quantitative analyses demonstrated
the effectiveness of the proposed approach. Zhang et al. [46]
identified building change types, i.e., new construction,
demolition, and reconstruction, by using LBP features
and obtained satisfactory change detection results with a
high detection accuracy and precise structure boundaries.
Liu et al. [47] proposed a line-constrained shape feature, a
modified version of the PSI, for building change detection,
and the results showed the approach’s advantage in individual building change detection in a lightly populated region.
Structural textures, e.g., morphological profiles (MPs)
and attribute profiles (APs), facilitate the investigation
of the geometries, shapes, and edges of regions, with the
convex and concave components being erased so that the
geometric information of relevant structures is preserved
and unimportant details are attenuated [48], [49]. MPs
and APs have proved to be effective in VHR change detection since they can simplify results and reduce noise components (e.g., spectral variations) [48], [49]. For instance,
Liu et al. [50] took the geometrical structure of change targets into account using MPs. In addition, the morphological
building index (MBI) [36], which is defined as differential
MPs with linear structural elements, has been extensively
used in VHR change detection in urban areas since it can
highlight bright and high-contrast structures, mostly consisting of buildings, in remote sensing images. For example,
Huang et al. [51] proposed an automatic building change
detection framework based on the MBI. Experimental results showed that the proposed method outperformed supervised classification via a support vector machine (SVM).
In addition, point and line features, for instance, the Harris detector
[52] and the scale-invariant feature transform (SIFT) [53], can
improve the discriminability of man-made objects, such as
buildings, roads, and cars, by describing corners and edges,
therefore improving results.
Model-based textures, e.g., Markov random fields (MRFs)
and fractal models, aim to represent textures through stochastic processes [54]. MRF models present spatial context through
a graph-based image representation, where the nodes and
edges of the graph express pixels and their relationship with
connected nodes, respectively. Fractal models can depict texture roughness and complexity by capturing self-similar and
self-affine patterns [55]. A number of MRF-based methods
have been proposed to deal with VHR image change detection [56]–[60] because of their ability to describe local spatial
relationships. Specifically, Bruzzone and Prieto [57] introduced a change detection method based on an MRF to model
prior class probabilities by interpixel dependence, which increased the accuracy and reliability of the change detection results. In [60], spatial constraints between neighboring samples
were formulated using an MRF in an active learning process
for change detection. Multifractal features were applied to
change detection in [61], and experiments on a complex landscape that included urban areas, agricultural fields, trees, and
an unregulated river indicated that the features were tolerant
to some degree to multitemporal differences caused by the
viewing geometry and illumination angles.
Transform-based textures, e.g., Gabor, wavelets, and
contourlets (CTs), aim to convert images into a new space
to capture local structures corresponding to scale, localization, and orientation [62]. For example, Li et al. [63] used
a Gabor-based approach to improve the change detection
performance since the technique can capture contextual
information at different scales and orientations. Wei et al.
[64] introduced wavelet pyramid decomposition features to
VHR change detection. Thus, in VHR images, the complexity of homogeneous regions can be reduced in low-scale features, and details and edge information can be retained in
high-scale ones [64]. In a comparative study conducted by Li
et al. [65], a number of representative textural features were
selected for change detection using VHR images, and it was
shown that texture-based change detection methods can
obtain better performance than spectral-based pixel ones.
Texture change detection results are demonstrated in Figure 4, and it can be seen that, compared to using individual
textures, combining multiple textures can improve change
detection accuracy.

FIGURE 4. Change detection results based on textures: (a) image (t1), (b) image (t2), (c) the reference change map, (d) the GLCM, (e) APs,
(f) a 2D wavelet transform (WT), (g) a fractal, (h) a fuzzy set (APs plus a 2D WT plus a 3D WT), (i) a fuzzy set (all textures), (j) a random
forest (APs plus a 2D WT plus a 3D WT), and (k) a random forest (all textures) [65].
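
Two of the statistical textures described in this section, the GLCM and the LBP, can be sketched in a few lines of standard-library Python. This is only an illustration under simplifying assumptions: the images are small nested lists of integer gray levels, a single pixel offset is used for the GLCM, and the function names (`glcm`, `glcm_measures`, `lbp_code`) and the toy patches are hypothetical. It is not the implementation used in [43]–[46].

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric GLCM for one pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_measures(p):
    """Contrast, angular second moment (ASM), and entropy of a GLCM."""
    n = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    asm = sum(v * v for row in p for v in row)
    entropy = -sum(v * math.log(v) for row in p for v in row if v > 0)
    return contrast, asm, entropy


def lbp_code(image, r, c):
    """8-neighbor LBP code of the interior pixel (r, c)."""
    center = image[r][c]
    # Neighbors are visited clockwise, starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


# A homogeneous patch at t1 and a checkerboard patch at t2 (two gray levels).
patch_t1 = [[1, 1], [1, 1]]
patch_t2 = [[0, 1], [1, 0]]
contrast_t1, asm_t1, entropy_t1 = glcm_measures(glcm(patch_t1, levels=2))
contrast_t2, asm_t2, entropy_t2 = glcm_measures(glcm(patch_t2, levels=2))
texture_change = abs(contrast_t2 - contrast_t1)

# The LBP code is unchanged by a monotonic gray-level transform (2 * v + 10).
window = [[5, 9, 1], [4, 6, 7], [2, 6, 3]]
brighter = [[2 * v + 10 for v in row] for row in window]
code = lbp_code(window, 1, 1)
code_brighter = lbp_code(brighter, 1, 1)
```

In this toy case the GLCM contrast rises from 0 to 1 between the two dates, while the LBP code of the window stays the same after the brightness transform, which illustrates the invariance to monotonic grayscale change noted for the LBP.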


DEEP FEATURES
Deep feature representation based on the layer-wise learning of image patterns is a very promising research direction
for change detection in VHR images [66], [67]. Differing
from traditional handcrafted features, higher-level abstractions (both linear and nonlinear features) can be automatically extracted and optimized by multilayer neural
networks, which can retain crucial variations and discard
uncorrelated differences for change detection tasks [68]. In
recent years, many deep learning methods have been developed, such as autoencoder (AE) models and convolutional
neural networks (CNNs), for deep feature extraction in
change detection with VHR images.
The AE is an unsupervised feature learning model that is
constructed by minimizing the reconstruction error. However, it may learn a useless feature representation, such as a
simple copy of the input [69]. To overcome that issue, variant
models, e.g., the denoising AE (DAE) [70], sparse AE (SAE)
[71], and Fisher AE (FAE) [72], have been employed for VHR
change detection, with denoising, sparsity, and Fisher discriminant criteria, respectively. Specifically, a stacked DAE
was used to learn high-level features from the local neighborhood [70]. In [70], it was found that the filters learned by a
stacked DAE have a stronger representation capability than
existing explicit ones. Based on the SAE, Su et al. [71] transformed a difference image into a suitable feature space for
suppressing noise and extracting key change information in
the change detection framework. Liu et al. [72] used the FAE
for unsupervised layer-wise feature learning and showed that
the model can generate more discriminative features than
the original AE. In addition to unsupervised feature learning through the optimization of certain criteria, AE-based
models can learn effective features in a supervised way by
considering label consistency, e.g., the contractive AE [73].
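
The criteria mentioned above can be written out in their common textbook forms. The notation here is an assumption introduced for illustration (encoder $f_\theta$, decoder $g_\phi$, training patches $x_i$, corrupted inputs $\tilde{x}_i$, $H$ hidden units with mean activation $\hat{\rho}_j$, sparsity target $\rho$, and weight $\beta$); the exact objectives used in [70]–[72] may differ, and the Fisher criterion of [72] is not reproduced.

```latex
\begin{align}
% Plain AE: minimize the reconstruction error over N training patches.
\mathcal{L}_{\mathrm{AE}}
  &= \frac{1}{N}\sum_{i=1}^{N}
     \bigl\lVert x_i - g_\phi\bigl(f_\theta(x_i)\bigr) \bigr\rVert_2^2 \\
% Denoising AE: reconstruct the clean x_i from a corrupted copy \tilde{x}_i.
\mathcal{L}_{\mathrm{DAE}}
  &= \frac{1}{N}\sum_{i=1}^{N}
     \bigl\lVert x_i - g_\phi\bigl(f_\theta(\tilde{x}_i)\bigr) \bigr\rVert_2^2 \\
% Sparse AE: penalize mean activations that deviate from the target \rho.
\mathcal{L}_{\mathrm{SAE}}
  &= \mathcal{L}_{\mathrm{AE}}
     + \beta \sum_{j=1}^{H} \mathrm{KL}\bigl(\rho \,\Vert\, \hat{\rho}_j\bigr)
\end{align}
```

In the plain objective, an identity mapping drives the loss to zero, which is the "simple copy of the input" problem. The denoising and sparsity terms make that trivial solution unattainable or costly, so the encoder has to capture structure in the input.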
It is well recognized that CNNs are effective in extracting mid- and high-level abstract features by interleaving
convolutional and pooling layers [74]. According to the
feature learning strategy, CNNs can be categorized as unsupervised [67], [75], [76], supervised [77], fine-tuning based
[66], and transfer learning based [78]. For example, Zhan
et al. [75] used a pretrained CNN to automatically extract
deep spatial–spectral features for change detection in VHR
satellite images. Saha et al. [67] developed unsupervised
deep CVA for change detection, and a network trained on
remote sensing aerial images for semantic labeling by Volpi and Tuia [79] was adopted for deep feature extraction.
As detailed in Figure 5, the experimental results demonstrated that, compared to object-based methods, deep features are effective for capturing change information and
are promising for distinguishing multiclass change information. Wang et al. [77] trained a model through manually selected samples, where the parameters of the shared
convolutional layers were initialized by the pretrained
ResNet-50 model, and the others were randomly initialized. Hou et al. [66] chose to extract CNN-based deep
features through a fine-tuned Visual Geometry Group
(VGG)-16 network by transferring a model pretrained on large-scale natural images to the remote sensing domain via an
aerial image data set. Liu et al. [78] proposed a CNN-based
transfer learning method for change detection. In particular, the loss function was designed by combining high-level
features extracted from a pretrained model (i.e., the U-net
model trained on an open source data set) and semantic
information contained in change detection data sets.
Notably, deep learning methods depend on an enormous amount of training data, which may not be available
for multitemporal VHR remote sensing imagery [74]. Meanwhile, because spectral properties and image
contexts differ greatly between natural red–green–blue (RGB) images and
remote sensing data, deep features extracted by fine-tuned models do not fully represent the essential characteristics of remote sensing images. As a result, the imbalance
between the small number of remote sensing data sets and
the large number of natural images used during model learning
may hamper the further improvement of VHR change detection using deep features. In recent years, large multitemporal data sets have been released, such as 86 image pairs
from the DigitalGlobe satellite constellation (i.e., QuickBird,
WorldView-1, WorldView-2, and GeoEye-1) [80], 291 pairs of
multitemporal aerial images [81], and more than 700,000
labeled instances for building damage assessment [82]. It
can be anticipated that more and larger multitemporal VHR
remote sensing data sets with diverse image characteristics
and various acquisition conditions will appear in the near
future. In this case, the essential change features for VHR
remote sensing images can be effectively extracted by a deep
network specialized for multitemporal remote sensing data.
OBJECT-BASED FEATURES
Object-based features refer to spectral, geometric, textural,
extent, and contextual information at the object scale rather than at the scale of single pixels or groups of pixels within a kernel
filter/moving window. In this way, an image object is viewed
as the processing unit for change detection. An object is a
set of spatially adjacent pixels that are spectrally similar and
that can be extracted through image segmentation. Overall,
object-based features are effective in VHR change detection
since they mitigate radiometric differences, spectral variability, and misregistration errors [38], [83]. However, appropriate segmentation parameters, which are often dependent
on subjective and laborious trial-and-error experiments,
need to be determined [84]. Furthermore, shortcomings and
problems in different multitemporal image segmentation
strategies, e.g., 1) the segmentation of only one monotemporal image, 2) the segmentation of stacked multitemporal images, and 3) the independent segmentation of multitemporal
images, should be carefully considered and tackled [5], [85].
Specifically, geometric changes (e.g., the size and shape)
cannot be captured by 1) and 2) [85]. Moreover, strategy
2) may also result in “sliver objects” caused by image misregistration. As for strategy 3), spatial correspondence
between multitemporal objects needs to be established.
Object-based CVA results [85] derived from different multitemporal segmentation strategies are presented in Figure 6,
where it can be observed that the choice of strategy can significantly affect change detection results.
Generally speaking, three levels of object-based features
can be used for change detection [86]. In the first, the object-based features include the radiometry, geometry, and texture
for each image object [87]. For instance, in [88], key points of
each object were extracted for change detection, and the approach was successfully applied to three landslide scenes and one view that
examined land use changes. Bovolo [89] computed the mean
values of texture measures in separate parcels for change
detection, and better accuracy with high fidelity in the homogeneous and border regions was achieved by the object-based method than with the pixel-based one. However, in
these studies, texture is still extracted in a pixel-based manner and depends on the size of a moving window (or kernel).
More importantly, kernel- and window-based texture can
create between-class texture, leading to an edge effect [87].
Therefore, object-oriented texture computed within the
boundary of an object is recommended, such as object-wise
GLCM texture measures [87] and object-based MPs [90].
The second-level object-based features exploit relationships between two image objects, e.g., adjacency, proximity,
and relations between neighboring objects [87]. For example, Liang et al. [91] considered the relations of neighboring objects in feature extraction for object-oriented change
detection. Yu et al. [92] combined a relative border with a
“forest with no change” and the normalized difference vegetation index (NDVI) to identify the category of “change
from forest to developed land.” The third-level features refer
to spatial arrangements among multiple objects [87]. Third-level object-based features have been used in image classification, such as urban functional zone extraction [93] and
urban village detection [94]. Nevertheless, such features
have rarely been used in VHR image change detection. In
[95], spatial dependency and sharing boundaries among
multiple objects are considered to reduce spurious errors
caused by shadow in urban vegetation change detection.
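As a minimal sketch of treating image objects as the processing unit, the following Python snippet averages a per-pixel feature within the boundary of each segment and then compares the object means of two dates. The function names (`object_means`, `object_change_magnitude`), the single shared segmentation, and the toy values are illustrative assumptions, not the procedure of any cited study; plain lists are used to keep the example dependency-free.

```python
from collections import defaultdict


def object_means(feature, labels):
    """Average a per-pixel feature over each labeled segment (image object)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for feat_row, label_row in zip(feature, labels):
        for value, obj_id in zip(feat_row, label_row):
            sums[obj_id] += value
            counts[obj_id] += 1
    return {obj_id: sums[obj_id] / counts[obj_id] for obj_id in sums}


def object_change_magnitude(feature_t1, feature_t2, labels):
    """Absolute difference of object-wise means, using one shared segmentation."""
    means_t1 = object_means(feature_t1, labels)
    means_t2 = object_means(feature_t2, labels)
    return {obj_id: abs(means_t2[obj_id] - means_t1[obj_id]) for obj_id in means_t1}


# Toy 2 x 4 scene with two objects (ids 0 and 1); all values are hypothetical.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
t1 = [[10, 12, 50, 52],
      [11, 11, 51, 51]]
t2 = [[10, 11, 90, 92],
      [12, 11, 91, 91]]
change = object_change_magnitude(t1, t2, labels)
```

Small pixel-level fluctuations inside object 0 cancel out in the object mean, whereas object 1 shows a clear change. Because a single segmentation is shared by both dates, this sketch corresponds to strategy 1) or 2) and therefore cannot capture geometric changes.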

FIGURE 5. Change detection results for QuickBird bi-temporal images: (a) image (t1), (b) image (t2), (c) the reference change map, (d) multiclass deep CVA, (e) binary change deep CVA, and (f) object-based CVA [67]. Legend: ωnc, ωc1, ωc2, and ωc3; bounding boxes denote changes.

ANGULAR FEATURES
Multiangle satellite images can be acquired by WorldView-2,
IKONOS, Cartosat-1, and Ziyuan-3 through across-track and
along-track stereoscopy [96]. Spatial and spectral variations
encoded in multiangle images can be extracted as new information sources for change detection. To be specific, multiangle observations can capture information about bidirectional reflectance signatures and vertical structures (e.g.,
trees and buildings) and hence complement conventional
spectral and spatial features [27]. In this article, angular features are categorized as 1) implicit ones that are generated
by stereo photogrammetry, such as orthographic images
and DSMs, and 2) explicit ones that capture angular variations, such as angular difference features [97].
Most existing change detection studies based on multiangle VHR imagery adopt implicit angular features. For example, Chaabouni-Chouayakh et al. [98] presented a fully
automatic change detection method for urban monitoring
using IKONOS stereo data, and their experimental results
verified the effectiveness of the joint use of multispectral
and DSM features. Tian et al. [99] investigated building and
forest change detection using panchromatic Cartosat-1 stereo imagery, and they found that extracted height values
from DSMs can greatly improve change detection accuracy.
Huang et al. [21] used photogrammetrically derived orthographic images from multiangle Ziyuan-3 data to monitor
subtle changes across urban areas, and it was shown that the
use of orthographic images can minimize the influence of
spatial inconsistency among multitemporal data, e.g., misregistration and parallax distortion for high-rise buildings.
On the other hand, explicit angular features aim at describing the differences contained in multiangle images,
e.g., the angular difference feature [100], multiangular built-up index (MABI) [101], multiangle spectral variation feature
[27], stacked multiangle spectral feature [102], and bidirectional reflectance distribution function-based index [103].
Benefiting from these explicit angular features, detailed urban and vegetation classifications were achieved using multiangle VHR images. Nevertheless, in the current literature,
the previously mentioned explicit angular features have seldom been employed for change detection. One exception is
a recent study presented in [2]. In it, the MABI, which indicates spectral and structural variations in multiview images,
was used. Specifically, Huang et al. [2] integrated planar (i.e.,
MBI, Harris, and PanTex) and vertical [multispectral image
(MSI), normalized DSM (nDSM), and MABI] features to detect newly constructed buildings and identify their change
timing by using time-series, multiview Ziyuan-3 imagery.
Figure 7 gives an example of change results from different
feature combinations. It shows that the joint use of planar
and vertical features can generate more accurate results in
terms of change extents and timings.
FIGURE 6. Object-based CVA results from different multitemporal segmentation strategies: (a) image (t1), (b) image (t2), (c) the reference change map, (d) the segmentation of image (t1), (e) the segmentation of stacked multitemporal images, and (f) the separate segmentation of each monotemporal image [85].
To better evaluate the different kinds of features, we create a Ziyuan-3 multiview change detection (MVCD) data
set, which is available at http://irsip.whu.edu.cn/resources/
resources_en_v2.php. It includes both urban and rural scenes
with diverse and complex change types, and, moreover,
it considers seasonal and illumination influences. These
characteristics enable the MVCD to function as a challenging change detection data set. A comparative analysis
between different attributes, including the GLCM [39], AP
[49], CT [62], MABI [101], object-wise GLCM (GLCM-Obj)
[87], and deep features [67], has been carried out. Specifically, the change intensity map was obtained by CVA, and
the threshold for each feature was determined based on receiver operating characteristic curves to achieve a balance
between commission and omission errors [65]. Qualitative and quantitative experimental results are provided in
Figure 8 and Table 3, respectively. The spectral feature fails
to detect changes between spectrally similar classes (e.g.,
bare soil and buildings), and unchanged objects with spectral variation are incorrectly detected as changed ones. The
GLCM, AP, and CT can depict textural changes, e.g., the
spatial distribution of the gray value, geometry, and local
details. Among them, the CT gives more complete changed
regions, and the AP produces more false alarms. The MABI
emphasizes building changes, but it is not sensitive to other
variations (e.g., soil, vegetation, and roads), which therefore leads to a large omission error. The GLCM-Obj generates smoother results with smaller omissions but larger
commission errors than its pixel-wise version. Deep CVA
outperforms the other methods, but false alarms caused by
shadows and seasonal effects can still be observed.
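The error measures and the threshold selection used in this comparison can be sketched as follows. The definitions are the common ones and are assumed here rather than taken from [65]: correctness = TP/(TP + FN), omission error = FN/(TP + FN), and commission error = FP/(TP + FP), which is at least consistent with Table 3, where correctness and omission error sum to 100%. Choosing the candidate threshold that minimizes the gap between commission and omission is one simple reading of "balance"; the function names and toy data are hypothetical.

```python
def change_errors(predicted, reference):
    """Return (correctness, commission, omission) in percent for binary maps."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    correctness = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    omission = 100.0 * fn / (tp + fn) if tp + fn else 0.0
    commission = 100.0 * fp / (tp + fp) if tp + fp else 0.0
    return correctness, commission, omission


def balanced_threshold(intensity, reference, candidates):
    """Pick the candidate threshold whose commission and omission are closest."""
    def gap(threshold):
        predicted = [value > threshold for value in intensity]
        _, commission, omission = change_errors(predicted, reference)
        return abs(commission - omission)
    return min(candidates, key=gap)


# Hypothetical change intensities (flattened image) and reference labels.
intensity = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
reference = [False, False, True, False, True, True, False, True]
best = balanced_threshold(intensity, reference, [0.15, 0.3, 0.5, 0.75])
errors = change_errors([value > best for value in intensity], reference)
```

Low thresholds detect every changed pixel at the cost of many false alarms, and high thresholds do the opposite; the selected threshold is the one at which the two error types meet.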
CHANGE DETECTORS
VHR change detectors can be categorized as algebra-, transform-, and machine learning-based indicators. CVA is one of
the most widely used algebraic approaches, and it is carried
out by measuring the difference between bi-temporal multifeature vectors to derive a change vector for VHR images
[67], [104], [105]. Transform-based methods, such as principal component analysis [106] and multivariate alteration
detection [107], attempt to suppress no-change areas and
emphasize change information in the transformed feature
space. In the machine learning community, change detection is often viewed as a classification problem. In conventional classification-based VHR change detection, spectral–spatial
feature extraction and detectors (e.g., SVMs [108] and
the random forest [65]) are separately implemented.
Deep learning, a recent research hot spot, can integrate these two
operations in a joint learning framework and is therefore
very promising for VHR change detection [109], [110].
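To make the algebraic family concrete, the snippet below is a minimal sketch of CVA: for every pixel, the change vector is the difference between its feature vectors at the two dates, and the vector length serves as the change intensity. The feature values and the fixed threshold are assumptions for illustration; in practice the threshold would be derived from the data.

```python
import math


def change_vector_magnitude(features_t1, features_t2):
    """Euclidean length of the per-pixel change vector between two dates."""
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(pixel_t1, pixel_t2)))
        for pixel_t1, pixel_t2 in zip(features_t1, features_t2)
    ]


# Each pixel is a feature vector (e.g., spectral bands plus texture); values are hypothetical.
t1 = [(0.20, 0.30, 0.10), (0.50, 0.40, 0.30), (0.10, 0.10, 0.10)]
t2 = [(0.20, 0.30, 0.10), (0.80, 0.80, 0.30), (0.10, 0.13, 0.14)]

magnitude = change_vector_magnitude(t1, t2)
threshold = 0.25  # assumed value for this toy example
change_map = [m > threshold for m in magnitude]
```

Only the second pixel exceeds the threshold. The magnitude alone yields a binary change map; the direction of the change vector can additionally be analyzed to separate different kinds of change.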
Deep learning-based change detectors can be grouped
in terms of different criteria, including learning and fusion
strategies, network models, and processing units (Table 4).
FIGURE 7. Experimental results for the automatic monitoring of newly constructed building areas (NCBAs) using planar (i.e., MBI, Harris, and PanTex) and vertical (MSI, nDSM, and MABI) features [2]. (a) Multitemporal Ziyuan-3 images (2012–2018). (b) NCBAs and their change timing (2013–2018, and non-NCBAs) for the reference data, fused planar features, fused vertical features, and planar and vertical features.
FIGURE 8. A comparison of different features for the MVCD data set: (a) image (t1), (b) image (t2), (c) the ground reference, (d) spectral features, (e) the GLCM, (f) APs, (g) CTs, (h) the MABI, (i) the GLCM-Obj, and (j) deep features. The legend marks commission and omission errors.

TABLE 3. THE CONSIDERED METHODS’ CHANGE DETECTION ACCURACY WHEN USING THE MVCD DATA (%).
METHOD   | CORRECTNESS | COMMISSION ERROR | OMISSION ERROR | OVERALL ERROR
Spectral | 70.88       | 24.51            | 29.12          | 26.62
GLCM     | 65.05       | 16.1             | 34.95          | 22.04
AP       | 71.75       | 40.2             | 28.25          | 33.18
CT       | 75.13       | 33.84            | 24.87          | 28.67
MABI     | 57.87       | 28.76            | 42.13          | 34.18
GLCM-Obj | 74.51       | 30.47            | 25.49          | 27.76
Deep     | 79.98       | 25.46            | 20.02          | 22.41

We first discuss learning strategies. On the basis of a
large amount of annotated data, supervised deep learning
methods can capture semantic changes, and hence they
are sensitive to actual variations of interest and tolerant
to “pseudo changes” (such as geometric deformation and
radiation distortions caused by spatial displacement and
phenology variation, respectively) [110]–[116]. However, it
is difficult to learn a deep model only from the training
samples of a study area since the proportion of the change
area is usually very small. To tackle this problem, on the one
hand, transfer learning [117] and meta-learning [118] are
considered to leverage knowledge from other data sources.
Transfer learning strategies focus on fine-tuning pretrained
models that are designed for different but related tasks.
Meta-learning learns not only from data but also how to
learn, by utilizing previous experience [119]. Given the
huge difference between VHR remote sensing images and
data from other fields (e.g., natural RGB images) in terms
of the image modality, spectral bands, spatial resolution,
viewing angle, and so on, large amounts of publicly available multitemporal VHR remote sensing data are required
to construct a robust VHR deep change detector. On the
other hand, semisupervised deep learning methods, with
the consideration of unlabeled samples [120], can relieve
the burdensome labeling process, although the effects of
unlabeled samples as well as the complexity of the semisupervised model should be further investigated.
With regard to the fusion strategy, according to how
bi-temporal images are dealt with, deep learning-based change
detectors can be classified as early fusion and late fusion. Early
fusion methods concatenate multitemporal images as a whole
input into a deep network [110]. Early fusion is able to capture
the hierarchical difference representation, i.e., from low-level
grayscale differences in shallow layers to high-level semantic
changes in deep layers, while grayscale differences that are
not relevant to semantic changes, e.g., spatial misalignment and the internal variability of objects, may propagate
to deeper layers and therefore lead to false alarms [113].
In contrast to early fusion, late fusion methods separately
learn monotemporal features and concatenate them later
as an input to the change detection layers [121]. This kind
of network architecture may lead to insufficient learning;
e.g., during network training, gradients in higher layers have
difficulty flowing backward to lower ones [122], which affects the change detection performance. Thus, in [113],
early and late fusion networks were combined to
complement one another.
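The structural difference between the two fusion strategies can be illustrated with a toy example. In the sketch below, `encode` is a hypothetical stand-in for a learned feature extractor (a single linear layer with ReLU and hand-set weights); it is not the architecture of any cited network and only shows where the two dates are brought together.

```python
def encode(pixel, weights):
    """Toy stand-in for a learned feature extractor: one linear layer with ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, pixel))) for row in weights]


def early_fusion(pixel_t1, pixel_t2, weights_joint):
    """Concatenate the two dates first, then extract features from the stack."""
    return encode(list(pixel_t1) + list(pixel_t2), weights_joint)


def late_fusion(pixel_t1, pixel_t2, weights_branch):
    """Extract features per date with a shared branch, then concatenate them."""
    return encode(pixel_t1, weights_branch) + encode(pixel_t2, weights_branch)


# Hypothetical 2-band pixels and hand-set weights (a real network would learn them).
p1, p2 = [0.2, 0.6], [0.7, 0.1]
w_joint = [[-1.0, 0.0, 1.0, 0.0],   # responds to the band-1 increase (t2 - t1)
           [0.0, 1.0, 0.0, -1.0]]   # responds to the band-2 decrease (t1 - t2)
w_branch = [[1.0, 0.0],
            [0.0, 1.0]]

early = early_fusion(p1, p2, w_joint)   # difference-like features from the first layer
late = late_fusion(p1, p2, w_branch)    # per-date features, compared only afterward
```

With early fusion, the first layer already sees both dates and can respond directly to low-level differences, including irrelevant ones. With late fusion, each date is described separately and the comparison is deferred to the subsequent change detection layers.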
As for network models, AE [123], [124], deep belief networks [125], CNNs [110], [112], [113], [115], [120], [126],
recurrent neural networks (RNNs) [127]–[129], generative
adversarial networks (GANs) [130], [131], and graph neural
networks [132] have been adopted for end-to-end change
detection. The CNN is one of the most widely used methods, and mainstream CNN architectures, such as AlexNet
[133], VGGNet [134], GoogleNet [135], ResNet [136], and
DenseNet [137] as well as their variants, have been considered [138]. RNNs with modules such as long short-term
memory and gated recurrent units, as well as their variants,
are also widely employed to model the phenological process of multitemporal VHR images, due to the superiority of
recurrent layers in processing sequential data and modeling
time-series dependence. In addition, the U-net and its variants, which are composed of an encoder to hierarchically
extract semantic information and a counterpart decoder to
delineate spatial details, can be viewed as AE architectures
for VHR change detection. They receive much attention due
to their ability to maintain change object spatial details.
Recently, some studies proposed hybrid models, such
as those in [111] and [127]. For instance, as illustrated in
Figure 9, a CNN and an RNN are combined in one end-to-end network to extract joint spectral–spatial–temporal
features [111]. In [139], difference-based methods using
edge-based level set evolution (ELSE), region-based level
set evolution (RLSE), MRFs, and fully convolutional networks (FCNs) as well as postclassification-based methods
with SVMs, CNNs, GANs, Siamese convolutional networks
(SCNs), and end-to-end GAN-based Siamese frameworks
(GSFs) are compared for landslide detection (Figure 10).
This kind of change detection is challenging because landslides
must be distinguished from both unchanged regions and other changed
regions. As can be seen, the four difference-based methods lead to more false alarms. As for the
five postclassification methods, deep learning techniques
generally outperformed SVMs, due to their explorative capabilities in representing related changes and suppressing
irrelevant variations.

TABLE 4. A SUMMARY OF DEEP LEARNING-BASED CHANGE DETECTORS.
CRITERIA | CATEGORY | DESCRIPTION | REFERENCES
Learning strategy | Supervised | Based on a large number of labeled samples | [110]–[116], [121], [124]–[130], [139], [140]
Learning strategy | Transfer learning | Fine-tunes pretrained models that are designed for different but related tasks | [117], [131]
Learning strategy | Meta-learning | Learns from little labeled data and learns how to learn | [118]
Learning strategy | Semisupervised | Joint use of labeled and unlabeled data | [120], [132]
Fusion strategy | Early fusion | Uses concatenated multitemporal images as input | [110], [114], [115], [125]–[129], [131], [132], [139]
Fusion strategy | Late fusion | Learns monotemporal features separately and then concatenates them as a whole input | [111], [112], [116], [117], [121], [130], [140]
Network model | CNN | Stacked convolutional, pooling, and fully connected layers | [110], [112]–[116], [120], [126]
Network model | Recurrent neural network | Models with a recurrent hidden state, e.g., gated recurrent units and long short-term memory | [127]–[129]
Network model | AE | Reconstructs the input with an encoder–decoder structure | [123], [124]
Network model | Deep belief network | Composed of layer-wise restricted Boltzmann machines | [125]
Network model | Graph neural network | Learns graph structure, e.g., relationships between features of pixels/objects | [132]
Network model | Generative adversarial network | Generator and discriminator that are adversarially trained | [130], [131], [139]
Processing unit | Patch | Assigns a label to each patch | [111], [115]–[117], [120], [121], [128]–[130]
Processing unit | Pixel | Predicts change labels for each pixel | [110], [113], [114], [126], [131], [139]
Processing unit | Object | Incorporation of segments/superpixels | [124], [125], [127], [132], [140]
FIGURE 9. An end-to-end architecture composed of a CNN, RNN, and fully connected network for change detection [111]. (a) Image (t1) (top) and image (t2). (b) The convolutional subnetwork. (c) The recurrent subnetwork. (d) The fully convolutional layers. (e) The binary change detection (top) and multiclass change detection.
FIGURE 10. Landslide detection results from different methods: (a) image (t1), (b) image (t2), (c) ELSE, (d) RLSE, (e) an MRF, (f) an FCN, (g) an SVM, (h) a CNN, (i) a GAN, (j) an SCN, (k) a GSF, and (l) the ground truth. White and black indicate areas where landslides are detected and not detected, respectively. Red and blue circles represent landslide pixels that are wrongly detected and omitted [139].

According to the processing unit, deep learning-based detectors are divided into patch- [116], [130], pixel- [110], [117],
and object-based [127], [140] varieties. For a patch-based
change detection task, a sliding window with a fixed size is
used to divide the study area into a series of patches, and each
patch is assigned a label by the detector. In this way, each pixel
in the patch is assigned the same label. Consequently, only rough
location information, rather than fine-grained change boundaries, is obtained. However, patch-based change detection can
reduce the influence of spatial misalignment to some extent
in VHR change detection. Since patch-based deep learning
networks view each patch as the change detection unit and
encode each patch as a set of feature maps with coarser spatial
resolutions, the spatial misalignment of these feature maps
becomes smaller, and some errors of spatial alignment are
therefore avoided in a change detection task. In other words,
when regarding a patch as the change analysis unit, only a
very large misalignment can cause an unchanged image
patch to be identified as a changed one, and a small misalignment can be tolerated. Several important issues should be noted for the patch-based method, such as the oversmoothing
of results and the selection of the patch size.
The multiscale strategy [135] may be appropriate for addressing these issues, but it inevitably leads to larger computation burdens. Pixel-based methods usually employ
semantic segmentation architectures to predict pixel-wise
change detection results [33]. Specifically, in semantic segmentation architectures, after extracting abstract semantic
information through multilayer encoding (e.g., convolution layers), a series of operations, e.g., interpolation, deconvolution, and upsampling, is used to progressively decode semantic information into feature maps that have the
same spatial resolution as the input images. Unlike traditional pixel-based change detectors that suffer from misregistration, viewing angle differences, and occlusions, deep
learning methods can predict pixel-wise change detection
with a highly semantic abstraction of the spatial context.
However, object boundaries are often blurred in the change
detection results, as upsampling layers reconstruct the appearance but not the shape of objects. To cope with this
issue, improved networks have been designed. UNet++, for example,
combines nested features to preserve change region boundaries, considering that shallow layers are better able to capture spatial details [110].
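The trade-off between patch- and pixel-level processing units can be reproduced with a toy example. In the sketch below, simple differencing of means stands in for a deep detector, which is an assumption made purely for illustration; the images, patch size, and thresholds are hypothetical. The left half of the scene contains an unchanged object that is shifted by one pixel between the two dates (misregistration), and the right half contains a genuinely new object.

```python
def pixel_change(img_t1, img_t2, threshold):
    """Pixel-wise differencing: every pixel receives its own label."""
    return [[abs(b - a) > threshold for a, b in zip(row_t1, row_t2)]
            for row_t1, row_t2 in zip(img_t1, img_t2)]


def patch_change(img_t1, img_t2, patch, threshold):
    """Patch-wise labeling: compare patch means and give all pixels one label."""
    rows, cols = len(img_t1), len(img_t1[0])
    out = [[False] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            cells = [(r, c)
                     for r in range(r0, min(r0 + patch, rows))
                     for c in range(c0, min(c0 + patch, cols))]
            mean_t1 = sum(img_t1[r][c] for r, c in cells) / len(cells)
            mean_t2 = sum(img_t2[r][c] for r, c in cells) / len(cells)
            label = abs(mean_t2 - mean_t1) > threshold
            for r, c in cells:
                out[r][c] = label
    return out


t1 = [[0, 0, 0, 0, 0, 0, 0, 0],
      [0, 1, 1, 0, 0, 0, 0, 0],
      [0, 1, 1, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 0]]
t2 = [[0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 1, 1, 0, 1, 1, 0],   # left object shifted by one pixel; right object is new
      [0, 0, 1, 1, 0, 1, 1, 0],
      [0, 0, 0, 0, 0, 0, 0, 0]]

pixel_map = pixel_change(t1, t2, threshold=0.5)
patch_map = patch_change(t1, t2, patch=4, threshold=0.2)
```

The pixel-wise map flags the edges of the shifted object as false changes, whereas the patch-wise map tolerates the shift. In exchange, the patch-wise map marks the entire right patch as changed, although only four of its pixels differ, which illustrates the coarse boundaries described above.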
Object-based deep learning methods are also considered for change detection [127], [140]. A simple approach is
to adopt object-based segmentation in the pre/postprocessing step, as shown in [140]. On the other hand, object information can also be considered during the training process
by adding object-wise loss terms [127]. However, issues related to conventional multitemporal image segmentation,
such as oversegmentation, undersegmentation, and “sliver
objects” caused by misregistration, remain unsolved. In the
future, object-based detectors need to generate semantic
segments and establish spatial correspondence between
multitemporal segments.
The types of characteristics most often used for each criterion (i.e., the learning strategy, fusion strategy, network
model, and processing unit) in VHR change detection are
summarized in the following:
1) For the learning strategy, supervised learning is the most
widely used method for VHR change detection. However, the great amount of labor required to collect a large
number of training samples becomes a bottleneck, especially for deep network models, which leads to increasing attention for other learning strategies.
2) Late and early fusion strategies have their own strengths
and weaknesses in representing multitemporal features
and their differences, and hence hybrid fusion is sometimes chosen.
3) Among various network models, CNNs are the most commonly considered, and they can be coupled with other networks to form hybrid models, e.g., CNN–RNNs [111].
4) As for the processing unit, most studies consider patch- and pixel-level models. Patch-level detectors are more
tolerant to spatial misalignment, but pixel-based ones are
more appropriate for identifying fine-grained changes.
APPLICATIONS OF VHR CHANGE DETECTION
VHR image change detection is widely used in a large number of practical scenarios. A series of representative applications is the focus of this review, including the monitoring and change detection of 1) land cover and land use,
2) buildings, 3) vegetation, 4) crops, 5) lakes and wetlands,
6) ecosystem services, and 7) impervious surfaces.
LAND COVER AND LAND USE CHANGE DETECTION
Compared to coarse- and medium-resolution images, VHR
images can reveal detailed and subtle intraurban change
information [141]. Specifically, urban change detection by
combining multiple features (e.g., object-based spectral,
shape, and texture attributes) was presented in [142], where
changes to detailed urban objects, e.g., buildings, roads, and
playgrounds, can be detected. Huang et al. [21] identified
pixel-level change transitions in 2012–2013 using Ziyuan-3
orthographic images, and the experimental result is presented in Figure 11. It can be seen that, even in the one-year period, small-scale changes extensively occurred in the urban
area of Wuhan, China. For instance, fine-scale urban land
cover transitions caused by pond infilling, building demolitions, building construction, weed growth, and site preparation can be observed. In [143], changes in detailed land
cover classes, including bright roofs, gray roofs, tile roofs,
brown fields, dark asphalt, light asphalt, and so on, were
analyzed using IKONOS and GeoEye-1 images.
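Change transitions of the "from–to" kind can be tabulated once a class label is available for every pixel at both dates. The snippet below is a generic post-classification comparison sketch, not necessarily the procedure used in [21] or [143]; the class names are borrowed from the transitions discussed above, and the two small maps are hypothetical.

```python
from collections import Counter


def transition_counts(classes_t1, classes_t2):
    """Count pixels for every (from, to) class pair of two co-registered maps."""
    counts = Counter()
    for row_t1, row_t2 in zip(classes_t1, classes_t2):
        counts.update(zip(row_t1, row_t2))
    return counts


map_2012 = [["soil", "soil", "water"],
            ["grass", "roof", "water"]]
map_2013 = [["roof", "soil", "grass"],
            ["soil", "soil", "water"]]

transitions = transition_counts(map_2012, map_2013)
# Keep only pairs whose class differs between the two dates.
changed = {pair: n for pair, n in transitions.items() if pair[0] != pair[1]}
```

Each key of `changed` is one transition type, such as `("soil", "roof")` for building construction, and its value is the number of affected pixels. Multiplying the counts by the pixel area would give the area per transition.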
As for land use change detection, Wu et al. [108] interpreted change transitions, e.g., from sparse housing to industrial
areas, by combining spectral and SIFT features. In [144], land
use maps of Shenzhen (a highly dynamic and developed
megacity in China) were generated in 2005 and 2017 based
on VHR satellite data. As demonstrated in Figure 12, detailed land use categories, including residential, commercial,
industrial, infrastructure, grassland, farmland, woodland,
water, breeding surfaces, and unused land, were monitored.
In addition, the performance of different features, i.e., color
histograms (CHs), LBPs, SIFTs, and deep features, was compared, and the best accuracies of 96.9% and 97.1% were obtained by the deep learning method [Figure 12(b)].
BUILDING CHANGE DETECTION
Buildings are one of the most dynamic artificial structures,
and building change detection is important for urban development monitoring (e.g., building demolition and construction) and disaster management (e.g., building damage
caused by natural hazards). Numerous methods for building change detection have been proposed [19], [51]–[53],
[85], [145]–[157]. Some studies focus on multitemporal
building observation and subsequent change analysis, where
descriptors for building detection in VHR images are a critical issue. The descriptors can be categorized as template
matching (e.g., the snake model) [158], knowledge based
(e.g., shadow evidence and the MBI) [36], [159], and machine
learning [148], [160]. For example, in [52], the MBI and the
Harris detector were used to identify building areas, and
then building change detection was conducted through
interest point matching. Other types of methods directly
explore changes in shapes, colors, and textural properties
that are highly related to characteristics of buildings. For
example, in [51], multitemporal variations in the MBI and
spectral information were used to identify altered buildings. Likewise, in [85], the change feature generated by the
MBI and spectral features was considered the indicator of
building change. In [161], building changes were detected
through the aggregation of spectral and textural features.
Figure 13 provides building change detection results
from different methods, including SVMs based on MBI
features (MBI–SVM), building interest point detection
using the MBI and the Harris detector, MBI-based CVA
(MBI–CVA), the fusion of the MBI and spectral and shape
features, CVA using morphological features, and object-based CVA. It can be seen that automatic methods can
achieve performance comparable to or better than supervised ones, i.e., the MBI–SVM [Figure 13(d)]. Meanwhile, the results of the MBI–CVA [Figure 13(g)] show
more small false alarms. The fusion of the MBI and
other features, e.g., the Harris detector [Figure 13(f)] and
spectral and shape features [Figure 13(h)], can reduce
these errors. These results illustrate that effective feature
representation is the key to achieving good performance
for VHR change detection.
FIGURE 11. Land cover change detection using Ziyuan-3 satellite imagery from 2012 and 2013. (a) The change detection result of the study area in Wuhan. (b)–(f) Five example cases of the change detection result and corresponding bi-temporal images [21]. Legend: soil to roof, roof to soil, soil to grass, grass to soil, water to grass, and no change.
Apart from 2D characteristics, 3D information has
been exploited for building change detection in recent

studies. With easier access to 3D data, such as multiview
images, 3D information indicated by angular features can
be conveniently used. More importantly, misregistration
caused by spatial displacement is minimized [162]. Turker
and Cetinkaya [163] detected damaged buildings by calculating the difference between digital elevation maps
derived from pre- and postearthquake stereo images. In
[157], multichannel indicators, such as height differences
and texture similarities, are fused to monitor building
changes. The incorporation of angular features is effective
in improving the performance of building change detection, and it has potential for quantifying 3D dynamic processes in urban renewal and development. However, due
to the relatively high cost of 3D data acquisition, such as
lidar and multiview UAV images, only a few studies investigate detailed building change processes in 3D space.
Benefiting from time-series, multiview satellite imagery,
Wen et al. [155] analyzed 3D annual building changes
in inner city areas of four Chinese megacities (Beijing,
Shanghai, Xi’an, and Wuhan). Their results characterized
changes in the horizontal direction, such as construction
and demolition, and quantified changes in the vertical direction, i.e., height and volume (Figure 14).
FIGURE 12. Land use change detection in the city of Shenzhen using high-spatial-resolution satellite imagery from 2005 to 2017, including (a) land use maps and (b) an accuracy assessment (overall accuracy, %) with different features (CH, LBP, SIFT, CH + LBP + SIFT, and deep learning) [144].
FIGURE 13. Building change detection maps obtained by different algorithms: (a) image (t1), (b) image (t2), (c) the reference change map, (d) the MBI–SVM, (e) object-based CVA, (f) the MBI and the Harris detector, (g) the MBI–CVA, (h) the fusion of the MBI with spectral and shape features, and (i) CVA using morphological features [51], [52].
FIGURE 14. The annual 3D building change in subset areas of Shanghai that was achieved using multiview satellite imagery. (a) Subset area 1. (b) Subset area 2 [155]. Legend: constructed (2013–2017), demolished (2013–2017), and unchanged; building heights in 2012 and 2017 range from 3 m to 60 m.
It should be noted that uncertainty and the cost of 3D
data can present a bottleneck for the development and application of 3D building change detection. Specifically, on
the one hand, lidar data are relatively accurate but not recurrently acquired. On the other hand, photogrammetrically
derived 3D data from multiview images are a sufficiently
cost-effective alternative to lidar, but their 3D reconstruction qualities depend on metaparameters of stereo pairs
(e.g., intersections, off-nadir angles, sun elevations, azimuth
angles, completeness, and time differences) [164]. Therefore,
successful 3D building change detection relies on more advanced models that can produce accurate multitemporal 3D
data in an economical and effective way. Very recently, deep
learning has been explored for 3D reconstruction from multiview images. For example, a CNN-based method was proposed for dense image matching in [165]. This technique may provide a new research direction for 3D urban
change detection when vertical and height information can
be accurately derived from multiview satellite images.
VEGETATION CHANGE DETECTION
Analysis of vegetation change is important to understanding ecological transitions [166]. Using VHR imagery, vegetation change can be investigated at a much finer scale, e.g.,
from forest stands to individual trees. In general, there are
three types of vegetation changes: 1) seasonal, caused by
plant phenology; 2) gradual, caused by interannual climate
variability, land management, and land degradation; and 3)
abrupt, caused by disturbances, e.g., urbanization, deforestation, and fires [167]. In [168], to assess seasonal changes,
both spectral and textural information extracted from multiseasonal Pléiades imagery (2 m) was used for multiseasonal
leaf area index (LAI) mapping. The results showed that the
highest LAI occurred in midsummer, followed by late spring,
autumn, and winter, and the observed seasonal change trend
was similar to that based on the in situ measured LAI. Seasonal changes at the crown scale in an Amazon tropical evergreen forest were assessed by Wang et al. [169] using Planet
constellation imagery with a spatial resolution of 3 m. The
crown scale fraction of nonphotosynthetic vegetation showed
large seasonal trend variability from June to November.
As for gradual changes, Gärtner et al. [170] used QuickBird
and WorldView-2 imagery to quantify tree crown diameter changes in a degraded riparian tugai forest in northwestern
China, and their results indicated that the diameter increased
by 1.14 m, on average, during 2005–2011. Tian et al. [171] explored DSMs from satellite stereo sensors to monitor vertical
tree growth and found that periodic annual increments at the
study sites were in the range of 0.3–0.5 m. In the case of abrupt
change, Dalagnol et al. [172] quantified tree canopy loss and
gap recovery in tropical forests where there was low-intensity
logging by using WorldView-2 and GeoEye-1 images. Their study
showed that VHR satellite imagery has potential for tracking
small-scale human disturbances. Ardila et al. [173] identified
bi-temporal tree crown elliptical objects through the iterative
surface fitting of a Gaussian model to crown membership in
two urban residential areas in The Netherlands using QuickBird and aerial images. A detection rate of 77% was reported
for both removed and planted trees.
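The growth figures quoted above are easier to compare as annual rates. The sketch below is plain arithmetic under stated assumptions: the helper name `mean_annual_increment` is hypothetical, 2005–2011 is treated as a six-year interval, and the canopy heights in the second call are made-up values chosen to fall in the reported 0.3–0.5 m range.

```python
def mean_annual_increment(value_start, value_end, year_start, year_end):
    """Average change per year between two observations."""
    years = year_end - year_start
    if years <= 0:
        raise ValueError("year_end must be later than year_start")
    return (value_end - value_start) / years


# Average crown diameter increase of 1.14 m during 2005-2011, as reported in [170].
crown_rate = mean_annual_increment(0.0, 1.14, 2005, 2011)

# Hypothetical canopy heights (m) read from two stereo-derived DSMs.
height_rate = mean_annual_increment(12.0, 14.0, 2012, 2017)
```

Under these assumptions the crown diameter grows by about 0.19 m per year and the canopy height by 0.4 m per year.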
In addition to coverage, tree crown diameters, and
canopy heights, species types are an essential parameter of
vegetation community structures. In particular, VHR imagery is able to identify small and highly mixed species. Since
different vegetation types exhibit similar spectral characteristics, textures are often used to identify various species.
For instance, Lu and He [174] investigated seasonal species
variations in a tall grassland in Ontario, Canada, during
the growing season (from April to December) in 2015 using
UAV images. The reflectance value, vegetation indices, and
GLCM textures were used in the classification, and temporal change analysis revealed the growing process and succession of different species. Notably, some advanced methods, e.g., deep features [175], photogrammetrically derived
DSMs from stereo images [176], phenological characteristics [177], and data fusion (e.g., lidar and airborne hyperspectral images) [178], have been considered for the change
analysis of vegetation species. Moreover, some researchers
attempted to discriminate vegetation function types, e.g.,
park, roadside, and residential–industrial trees in urban areas [179]. However, vegetation function-type change monitoring, although of great significance, has not been addressed in
the current research.
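Because texture carries much of the discriminative power when species look alike spectrally, it may help to see how a GLCM texture measure is computed. The following is a minimal sketch, assuming an image already quantized to a small number of gray levels and a horizontal one-pixel offset; the function names and the two toy images are illustrative and are not taken from [174].

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Toy 3 x 3 patches with gray levels in 0..3.
smooth = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
rough = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]

smooth_contrast = glcm_contrast(glcm(smooth, levels=4))
rough_contrast = glcm_contrast(glcm(rough, levels=4))
```

A uniform patch yields zero contrast, whereas the checkerboard-like patch yields a high value, which is the kind of difference a classifier can exploit.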
MONITORING CROP CHANGES
Information about agricultural land changes, crop type
conversions, and crop growth, critical for precision agriculture, can be effectively captured using VHR images. In
[180], land cover data for Guanlin, Yixing City, China, in
2006, 2009, 2012, and 2015 were generated using QuickBird
images, and they showed that the agricultural land area first
decreased and then increased. Malinverni
et al. [181] quantified the temporal variation of main crop
rotations on the Capitanata plain of Southern Italy using
WorldView-2 images, and the textural features (e.g., the
GLCM and the Gabor wavelet) were employed to improve
the classification accuracy. The study suggests that multitemporal classification is preferred in crop mapping because it
captures rich phenological characteristics. Furthermore, frequent
crop growth monitoring is extremely important for timely
decision making in precision agriculture. Therefore, time-series data are recommended, although dense time series of
VHR images are relatively difficult to acquire.
Recently, new generation micro-/nanosatellites (e.g.,
Planet) and UAV systems have become available and are
able to obtain time-series VHR images, which has potential for agricultural applications. For example, Sadeh et al.
[182] detected sowing dates using dense time-series Planet
CubeSat data with an interval of two days. As shown in
Figure 15, a partly sown field was successfully detected,
implying that detailed processes on a near daily basis can
be monitored by dense time series of VHR data. Likewise,
Bendig et al. [183] monitored plant growth based on crop
surface models using stereo UAV images. Notably, height
differences between cultivars and their increasing trend during the growing season can be observed.
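Dense time series make it possible to date a change rather than merely detect it. As a rough sketch (not the procedure of [182]), the snippet below scans a per-field series of mean reflectance values and returns the first acquisition date at which the value jumps by more than a threshold. The function name `first_change_date`, the dates, the reflectance values, and the threshold are all hypothetical.

```python
def first_change_date(dates, values, threshold):
    """Return the first date whose value differs from the previous one by more than threshold."""
    for previous, current, date in zip(values, values[1:], dates[1:]):
        if abs(current - previous) > threshold:
            return date
    return None  # no step change found in the series


# Hypothetical two-day series of mean field reflectance.
dates = ["2018-05-01", "2018-05-03", "2018-05-05", "2018-05-07"]
values = [0.21, 0.22, 0.12, 0.11]
detected = first_change_date(dates, values, threshold=0.05)
```

With a two-day revisit, the detected date brackets the event to within one acquisition interval; noise and clouds would call for more robust rules than a single-step threshold.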
Crop change caused by disease and insect damage can
also be located. VHR images are able to identify small-extent disease and insect damage, which is beneficial for controlling problems at early stages. Generally, diseases and insects can result in various kinds of harm to crop canopies,
such as the removal of leaves, skeletonizing of leaf tissue,
and discoloration of leaves, and these effects vary depending on the type of disease, insect, and crop [184]. Therefore,
different damage shows various spectral and structural
characteristics in remote sensing images, which makes the
identification of disease and insect problems via VHR images a challenging task. One of the successful applications
was presented by Johansen et al. [185], where GeoEye-1 images acquired in 2012, 2013, and 2014 were used to detect
canegrub damage in sugarcane fields. In the study, objects
with low NDVI values and rough textures were identified
as likely to be damaged, and they were further classified as
low, medium, and high likelihood. Franke and Menz [186]
observed different levels of disease severity in a plot of winter wheat using multitemporal QuickBird images acquired
in April, May, and June.
The experimental results show that VHR multispectral
data are only moderately suitable for damage detection at
an early growth stage, a fact attributed to the subtle spectral and textural differences between damaged and healthy
crops [187], [188]. However, VHR hyperspectral sensors
seem to have potential to address this issue. For example, in
[189], spectral and spatial features were extracted by a CNN
from UAV hyperspectral images for the detection of yellow
rust across a whole crop cycle of winter wheat. Satisfactory
accuracy was achieved through all growing stages, due to
the detailed spectral information and rich spatial details in
VHR hyperspectral images.
MONITORING LAKES AND WETLANDS
Lakes and wetlands, which play a critical role in biodiversity,
ecosystems, hydrology, and climate regulation, are highly
dynamic due to various natural and anthropogenic factors,
such as climate change, farming, urbanization, floods, and
hydrological interventions [190]. Therefore, accurate and
timely monitoring of lakes and wetlands is important for
management, restoration, and protection. Many studies
have used remote sensing data for monitoring lakes, from a
local to a global scale. They include lake changes between
1975 and 2015 across the Yangtze floodplain in China via
Landsat images [191], water clarity changes in lakes and reservoirs across China that were observed using Moderate Resolution Imaging Spectroradiometer (MODIS) data [192] from
2000 to 2017, and global surface water changes between
1984 and 2015 acquired through Landsat images [193]. In
these studies, which were subject to relatively low spatial
resolution, lakes with large areas were targeted. However,
more than 303.6 million of the 304 million lakes at the
global scale are smaller than 1 km² [194]. Therefore, VHR
remote sensing images are required for observing them. To
our knowledge, however, only a few studies have focused
on lake monitoring using VHR images.
Cooley et al. [195] tracked water changes in the 470 lakes
(0.0025–1.23 km²) in the Yukon Flats of north-central
Alaska during mid to late summer (23 June to 1 October)
in 2016, using Planet CubeSat images with a spatial resolution of 3 m. A time-series analysis revealed that the area
of 83% of the studied lakes had decreased and that 22%
of the lakes had lost more than half their surface. Notably,
more applications of advanced methods of water detection
through VHR images, e.g., deep learning [196] and physical
approaches [197], are needed. Furthermore, information
about black and odorous water [198] and water types (e.g.,
rivers, lakes, canals, and ponds) [199] is of increasing interest, and multitemporal monitoring is imperative.
In addition to lakes, VHR images have potential for
monitoring detailed changes in wetland ecosystems. In
[200], the results of five-level mangrove features, including vegetation boundaries, mangrove stands, mangrove
zonations, individual tree crowns, and species communities, using different data sets [Landsat (30 m), Advanced Land

[Figure 15 panel labels: Sown Field, Unsown Field, Change, No Change, Noise, Sown Area.]

FIGURE 15. A sowing detection result obtained using time-series Planet CubeSat images [182]. (a) RGB satellite imagery. (b) The change
result. (c) The sowing detection result.

Observing Satellite Advanced Visible and Near-Infrared Radiometer 2 (10 m), pan-sharpened WorldView-2 (0.5 m), and lidar]
were generated and compared. As shown in Figure 16,
the Landsat image cannot accurately discriminate the mangrove extent, due to the mixed-pixel problem [Figure 16(e)],
and finer-scale mangrove features, i.e., tree-crown-level species, can be captured only by pan-sharpened WorldView-2 imagery [Figure 16(l)–(p)]. A survey of the current literature shows that most studies focus on
detecting the extent of wetland change but ignore species
change. For instance, Hu et al. [201] monitored land cover
changes in the Hangzhou Xixi wetland from 2000 to 2013
using IKONOS, QuickBird, and WorldView-2 images. It was
shown that the nonwetland area increased by approximately 100%, mostly in the form of herbaceous zones, followed
by forests, ponds, cropland, marshes, and rivers. Wu et al.
[202] integrated lidar data and multitemporal aerial imagery (1 m) to map wetland inundation dynamics in the Prairie Pothole region of North America, which is characterized
by millions of small depressional wetlands.
The difficulties of species change detection in wetlands lie
in the following aspects. On the one hand, tidal and phenological changes make different plant species highly dynamic
on daily and seasonal frequencies, respectively. On the other
hand, many species have a similar spectral reflectance during the peak biomass in complex wetland landscapes [203],
and the spectral signature of the same species can be influenced by many complex factors, such as the off-nadir angle,
sun-viewing geometry, crown porosity, leaf clumping, and
ground surface scattering [204]. For instance, in [200], mangrove species were categorized from WorldView-2 images
using a nearest-neighbor classifier applied to object-based
spectral and textural features within tree crowns, but a low
overall accuracy of around 54% was reported. As demonstrated in Figure 16(p), misclassified open scrub Avicennia
marina can be clearly observed. To improve the discriminative power among various species, the potential of VHR hyperspectral images, dense time-series data, and vertical information for characterizing detailed spectral, phenological,
and height attributes needs to be explored.
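To make the nearest-neighbor step and the overall accuracy figure concrete, the sketch below classifies tree-crown objects by their closest training sample in feature space and then scores the result. It is an assumption-laden toy, not the workflow of [200]: each object is reduced to two hypothetical features (a mean reflectance and a texture value), and the class names are used only as labels.

```python
def nearest_neighbor_predict(train_features, train_labels, sample):
    """Return the label of the training sample closest to `sample` (squared Euclidean distance)."""
    best_label, best_distance = None, float("inf")
    for features, label in zip(train_features, train_labels):
        distance = sum((a - b) ** 2 for a, b in zip(features, sample))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def overall_accuracy(predicted, reference):
    """Fraction of objects whose predicted label matches the reference label."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical training objects: (mean reflectance, texture value) per crown.
train_features = [(0.60, 0.10), (0.45, 0.25), (0.30, 0.40)]
train_labels = ["closed forest", "low-closed forest", "open scrub"]

# Hypothetical validation objects with reference labels.
test_features = [(0.58, 0.12), (0.33, 0.37), (0.40, 0.33)]
reference = ["closed forest", "open scrub", "open scrub"]

predicted = [
    nearest_neighbor_predict(train_features, train_labels, sample)
    for sample in test_features
]
accuracy = overall_accuracy(predicted, reference)
```

The third object sits between two classes in feature space and is mislabeled, which mirrors how spectrally similar species depress the overall accuracy.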
ECOSYSTEM SERVICES MONITORING
Ecosystem services link ecosystems to human welfare by
regarding nature as a stock providing a flow of services
(e.g., local climate regulation and water purification) [205].
Monitoring urban ecosystem services is of great value for
investigating ecological function changes and can help improve the understanding of urbanization impacts on local
ecological benefits. VHR satellite data can monitor spatially
explicit ecosystem services at fine scales. Generally speaking, there are two categories of methods to derive ecosystem
services: 1) statistical regression and radiative transfer models and 2) land use/cover-based methods [206]. Since in situ
observations are not always available and the validity of
statistical regression and radiative transfer models is affected by time inconsistencies between ground and remotely
sensed measurements, land use/cover-based methods are
often preferred. For example, in [207], land use/cover maps
of Shanghai’s urban core from 2000 to 2009 were classified using IKONOS and GeoEye-1 images, and the classes
were then transformed into ecosystem service supply and
demand budgets, including regulating, provisioning and
cultural services, and ecological integrity. An increase of
at least 20% in ecosystem service supply budgets was observed, which was mainly attributed to the replacement of
continuous urban fabric and industrial areas by high-rise
commercial/residential areas despite a slight increase in urban green sites.
Huang et al. [144] assessed ecosystem service change
in Shenzhen from 2005 to 2017 using Gaofen-2 (4-m) and
QuickBird (2.4-m) images. In the study, multitemporal land
use maps were generated by a transferred deep CNN (as
shown in Figure 12), based on which ecosystem service
supply and demand values were estimated. It was found
that supply capacity had decreased by 13.7% due to a reduction in woodlands, water, farmland, and so on, but, on
the other hand, demand values had grown by 23.5% because urban expansion and redevelopment had increased
the amount of residential, commercial, and infrastructure
land. The results clearly demonstrated the ecosystem degradation of Shenzhen over the study period. Ren et al.
[208] evaluated the ecosystem services of Guyuan City in
2003, 2009, and 2014 via VHR satellite imagery (e.g.,
QuickBird and Gaofen-1) and showed that VHR images were
advantageous in the dynamic, quantitative, and visual examination of ecological changes. With VHR remote sensing
images, fine-scale ecosystem services within urban areas
can be effectively quantified. However, most of the current
works focus on urban areas and ignore the ecosystem services of natural scenes, such as forests and wetlands. Moreover, these works present only case studies, and large-scale
examinations are still lacking.
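Land use/cover-based methods of this kind essentially convert a class map into a score. A minimal sketch of that conversion follows, assuming each land use class carries a fixed supply-capacity score and the budget is the area-weighted mean of those scores. The scores, class areas, and function names are hypothetical and do not reproduce the values used in [144], [207], or [208].

```python
def service_budget(area_by_class, score_by_class):
    """Area-weighted mean score over all land use classes."""
    total_area = sum(area_by_class.values())
    weighted = sum(area * score_by_class[name] for name, area in area_by_class.items())
    return weighted / total_area


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical supply-capacity scores (0 = none, 5 = very high).
supply_score = {"woodland": 5, "water": 4, "farmland": 3, "built-up": 0}

# Hypothetical class areas (km^2) from two land use maps of the same region.
areas_early = {"woodland": 40, "water": 10, "farmland": 20, "built-up": 30}
areas_late = {"woodland": 35, "water": 8, "farmland": 12, "built-up": 45}

supply_early = service_budget(areas_early, supply_score)
supply_late = service_budget(areas_late, supply_score)
supply_change = percent_change(supply_early, supply_late)
```

Because the budget depends entirely on the class map, any classification error in the multitemporal land use maps propagates directly into the estimated service change.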
IMPERVIOUS-SURFACE CHANGE DETECTION
The change detection of impervious surfaces is important
in monitoring and understanding urban development and
has been extensively studied in the remote sensing literature. However, most of the existing studies monitor the
change of impervious surfaces based on coarse- and medium-spatial-resolution satellite imagery, such as MODIS and
Landsat [209], [210], which have difficulty dealing with areas that have low impervious-surface
intensities and mixed pixels [211]. During recent decades,
images with high spatial resolution have provided new opportunities for subtle impervious-surface monitoring at
very fine scales. However, impervious-surface monitoring
using VHR imagery is a challenging task. VHR multitemporal images exhibit a large number of details (e.g., buildings,
roads, driveways, and sidewalks), greater spatial heterogeneity (e.g., different viewing geometries), and occlusion by
urban trees, shadow, and vertical structure layover [212].
To address the problem caused by shadow, Li et al. [213]

[Figure 16 panels (a)–(q), arranged by mapping level: Level 1, local vegetation cover (Vegetation, Not Vegetation); Level 2, local vegetation community (Mangroves, Not Mangroves); Level 3, mangrove zonations (Zones 1–4); Level 4, tree crowns (Tree Crowns, Canopy Gaps); Level 5, species community (Closed Forest, Low-Closed Forest, and Open Scrub of Avicennia marina). Map extent: approximately 153°10′E–153°10′15″E, 27°24′15″S–27°24′30″S.]

FIGURE 16. Five-level mangrove features generated using different data sets [200]. (a) Level 1 TM, (b) level 1 AVNIR-2, (c) level 1 WorldView-2, (d) WorldView-2 RGB image, (e) level 2 TM, (f) level 2 AVNIR-2, (g) level 2 WorldView-2, (h) level 2 WorldView-2+LiDAR, (i) level 3
AVNIR-2, (j) level 3 WorldView-2, (k) level 3 WorldView-2+LiDAR, (l) level 4 pan-sharpened WorldView-2, (m) level 4 pan-sharpened
WorldView-2+LiDAR, (n) WorldView-2 PC1,2,1, (o) level 5 pan-sharpened WorldView-2, (p) level 5 pan-sharpened WorldView-2+LiDAR,
and (q) aerial photograph.

extracted multiscale object features and further classified
shaded areas to extract impervious surfaces using QuickBird
and IKONOS imagery. More recently, Zhang and Huang
[214] developed a two-stage object-based classification
method based on multilevel features (i.e., spectral, textural,
shape, and class related) for time-series impervious-surface
change detection in Shenzhen in 2003–2017, including the
impervious-surface mapping of both nonshaded and shaded areas. As can be seen in Figure 17, in addition to single
changes across the studied period (i.e., cases 1 and 2), some
regions (e.g., case 3) experienced multiple changes.
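The distinction between single and multiple changes can be expressed as a simple rule on a per-object time series. The sketch below is an illustration rather than the two-stage method of [214]: it assumes each object already has a binary impervious/pervious label for every acquisition year, and the two example profiles are hypothetical.

```python
def summarize_change_profile(years, impervious):
    """Return the years at which the label flips and a summary of the profile.

    `impervious` holds one 0/1 label per acquisition year
    (1 = impervious surface, 0 = pervious surface).
    """
    change_years = [
        year
        for previous, current, year in zip(impervious, impervious[1:], years[1:])
        if current != previous
    ]
    if not change_years:
        summary = "unchanged"
    elif len(change_years) == 1:
        summary = "single change"
    else:
        summary = "multiple changes"
    return change_years, summary


# Acquisition years shown in Figure 17.
years = [2003, 2005, 2007, 2010, 2012, 2015, 2017]

# Hypothetical label sequences for two objects.
single = summarize_change_profile(years, [0, 0, 0, 1, 1, 1, 1])
multiple = summarize_change_profile(years, [0, 1, 1, 0, 0, 1, 1])
```

A rule this simple treats every label flip as a real change, so per-date classification errors would show up as spurious multiple changes.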

SUMMARY OF VHR CHANGE DETECTION DIMENSIONS
As suggested in [10], remote sensing change detection can
be categorized according to different dimensions, e.g., input
data, temporal resolutions, change categories, targets, and
analysis units. Since this research focuses on VHR optical
images, the input data are discussed in terms of spatial resolutions. Therefore, we divide VHR change detection studies
by considering the following five categorization schemes:
1) spatial resolution: HR (2–5 m), VHR (1–2 m), and ultra-HR (UHR) (<1 m)
2) temporal resolution: bi-temporal and multitemporal
3) analysis unit: pixel, object, and patch
4) change category: binary change (BC), multiple change (MC), and directional change (DC) categories
5) targets.
In terms of the previously mentioned categorization
schemes, a distribution of the literature reviewed in this study
appears in Figure 18. Most articles use only bi-temporal images (78.12%) and concern binary change (66.32%). With regard to spatial resolution, 43.75% of the papers use UHR images, followed by VHR (33.33%) and HR (22.92%) images.
As for analysis units, pixels and objects have almost the same
number of articles, but patch-based change detection is rarely reported. Of the studies reviewed in this research, more
than half involve land cover and land use change detection
with multiple targets considered, followed by a series of specific targets, including buildings (20%), vegetation (10.53%),
crops (8.42%), lakes and wetlands (5.26%), ecosystem services (3.16%), and impervious surfaces (2.1%).
[Figure 17 panels: (a) image chips from 2003, 2005, 2007, 2010, 2012, 2015, and 2017 for the numbered cases; (b) change map with legend: Unchanged, 2005, 2007, 2010, 2012, 2015, 2017, Multiple Times.]
FIGURE 17. Impervious-surface monitoring results from Shenzhen during 2003–2017. (a) Some typical cases of change profiles and (b)
change detection results [214]. Red borders represent corresponding change times.

RECOMMENDATIONS FOR FUTURE WORK

FROM CHANGE DETECTION TO TRACKING
Most VHR change detection studies focus on bi-temporal
images and multiple time series. However, change events,
such as phenology and urban development, cannot be well
characterized by coarse temporal observations. Frequent HR
monitoring of both human and natural activities deserves
much attention, especially when small satellite constellation
(e.g., Planet) images become available. With time-series VHR
images, change detection is advanced from simply locating
variations via bi-temporal data to dense time-series monitoring [215]. There have been attempts at time-series monitoring using VHR images of buildings [155], crops [216], water
[195], impervious surfaces [214], newly constructed building
areas [2], forests [217], and landslides [218]. However, most
of these methods are merely an extension of bi-temporal
techniques by multiple pair comparisons, which is not sufficient to capture the temporal context and semantics and to
support time series analysis.
Recently, VHR videos acquired by SkySat-1, Jilin-1,
and the UrtheCast Iris camera have shown great potential for near-real-time target tracking from space. Most of
the current change detection studies have focused on the
appearance/disappearance and shape changes of objects,
but studies related to tracking moving objects (e.g., ships,
planes, trains, and vehicles) in VHR sequential videos are
limited. In [219], the automatic detection and tracking
of moving ships using satellite video was achieved based
on multiscale saliency and surrounding contrast analysis.
Wang et al. [220] presented a UAV-based vehicle detecting
and tracking system, which jointly considered edges, optical flows, and local feature points. The first-ranked team
at the 2016 IEEE Geoscience and Remote Sensing Society
Data Fusion Contest designed an innovative deep neural
network with an MSI and spaceborne video as input, and
object activity was analyzed using the Kanade–Lucas–Tomasi key point tracker [221], [222]. During the coming
years, space videos are likely to be a very important data
source for Earth monitoring, and more promising studies
based on VHR sequential videos can be expected, while a
new era in VHR change detection that shifts from conventional multitemporal change detection to video sequential
tracking may dawn. Despite the preceding attempts, change
tracking using VHR videos is still in its early stage and
needs to be further explored. Notably, unlike conventional
videos, challenges related to satellite video processing may
include the small size of moving objects (e.g., vehicles),
complex backgrounds (e.g., building relief displacement in
urban scenes), camera movements, and low frame rates.

[Figure 18 charts: (a) temporal resolution: bi-temporal 78.12%, multitemporal 21.88%; (b) spatial resolution: UHR 43.75%, VHR 33.33%, HR 22.92%; (c) change categories: BC 66.32%, MC 17.89%, DC 15.79%; (d) analysis units: object 46.31%, pixel 43.16%, patch 10.53%; (e) targets: land cover and land use 50.53%, buildings 20%, vegetation 10.53%, crops 8.42%, lakes and wetlands 5.26%, ecosystem services 3.16%.]

FIGURE 18. The distribution of different dimensions for the studies reviewed in this research: (a) temporal resolution, (b) spatial resolution,
(c) change categories, (d) analysis units, and (e) targets.

HR GLOBAL CHANGE DETECTION
Remote sensing imagery has long been considered an effective data source for global change detection, due to its
large coverage area, convenient access, and frequent revisits. Previous multitemporal global maps of land cover and
thematic change detection are often generated at a relatively coarse resolution (i.e., >300 m), e.g., 8-km-resolution
global forest change based on Advanced Very High Resolution
Radiometer data for 1982–1999 [223], 500-m resolution
mapping of the global urban extent from MODIS data from
2005 and 2009 [224], [225], and the 300-m resolution annual Climate Change Initiative Land Cover maps from 1992
to 2015 [226]. More recently, global-scale change detection
with fine spatial resolution (around 30 m) has been attempted with open source Landsat imagery. Notable examples include the Global Forest Cover database [227], GlobeLand30 global land cover product [228], Global Artificial
Impervious Area annual maps [229], Global Surface Water
data sets by the European Commission Joint Research Centre [230], and Global Human Settlement Layer framework
[231]. Note that 30 m is not a high spatial resolution
in the usual sense, but it should be regarded as high in
the case of intercontinental and global mapping. Recently,
Gong et al. [232] developed a 10-m resolution global land
cover map through Sentinel-2 images acquired in 2017.
Global products are increasingly being developed at
finer spatial and temporal resolutions that can characterize heterogeneous and mixed areas more accurately. For
instance, the Planet CubeSats are able to acquire images at
a 3–5-m spatial resolution with near-real-time daily global
coverage [233], which has potential for VHR global change
detection in the future. In addition, cloud computing platforms, such as Google Earth Engine and Amazon Web
Services, can facilitate the processing of large volumes of
satellite images and speed the development of VHR global
mapping [234].
HYPERSPECTRAL CHANGE DETECTION
Hyperspectral data can distinguish more detailed land cover types due to their rich spectral information. For a long
time, the limited availability of hyperspectral images constrained real applications in precise change detection.
Recently, however, the development of hyperspectral satellites with a relatively fine spatial resolution, e.g., Gaofen-5
(30 m, with 330 spectral bands), Tiangong-1 (10 m, with
128 spectral bands), and Zhuhai-1 (10 m, with 32 spectral
bands), and airborne hyperspectral sensors, e.g., HyMap
(3 m, with 126 spectral bands) and the Reflective Optics System Imaging Spectrometer (ROSIS) (1.3 m, with 115 spectral
bands), has significantly increased the availability of multitemporal hyperspectral images. However, studies related to
VHR hyperspectral change detection are very limited, and
even the existing methodologies were developed based on
synthetic data [235]. Moreover, advances in hyperspectral
image classification benefit from a set of widely used public
benchmark data sets, e.g., the ROSIS Pavia University and
Airborne Visible/Infrared Imaging Spectrometer Salinas
data sets [236]. Therefore, there is an urgent need for public
hyperspectral change detection data sets to promote the development of the related research fields.
URBAN FUNCTIONAL ZONE CHANGE DETECTION
Currently, the classification of urban functional zones is one
of the important research areas in interpreting VHR remote
sensing images, as the urban functional zones can bridge
the semantic gap between land cover and human socioeconomic activities. Current urban functional zone mapping
not only involves various image features, e.g., deep [237],
[238], angular [97], object based [239], and textural [240],
but it also refers to multisource geographic information,
such as points of interest (POIs) [241], social media [242],
and mobile phone positioning [100]. In rapidly urbanizing
regions, the timely and accurate monitoring of urban functional zones is crucial for planning and management. However, studies on change detection in urban functional zones
are lacking. Indeed, urban functional zone change detection
is a difficult task since land cover change does not necessarily
signify the conversion of a functional zone type. Meanwhile,
multisource geographic data, e.g., POIs, are widely used for
functional zone classification [230], but these data do not
provide a time tag, which hampers the dynamic monitoring
of urban functional zones. These issues should be overcome
to effectively monitor changes in cities.
CONCLUSIONS
With the increasing availability of VHR remote sensing
images, precise, frequent, and stereo change detection becomes possible. To the best of our knowledge, a comprehensive review of VHR change detection is lacking in the
current literature. Therefore, this article aimed to summarize recent advances in VHR remote sensing image change
detection, including methods and applications. The review
of methods focused on feature extraction and change detectors for multitemporal VHR images. Applications including
change detection for land cover and land use, impervious
surfaces, buildings, crops, vegetation, lakes and wetlands,
and ecosystem services were reviewed. Finally, some future
directions were suggested and discussed for this important
research area. Recommendations for future work include
focusing on change tracking, global change detection, hyperspectral change detection, and urban functional zone
change detection to generate frequent and detailed semantic change information on a global scale.
ACKNOWLEDGMENTS
The authors are grateful to the editor-in-chief, associate editor, and reviewers for their insightful comments and suggestions. This research was supported by the National Natural Science Foundation of China, under grants 41901279,
41771360, and 41971295, and the Chinese Academy of
Sciences Interdisciplinary Innovation Team, under grant
JCTD-2019-04. (Corresponding author: Xin Huang.)

AUTHOR INFORMATION
Dawei Wen (daweiwen@mail.hzau.edu.cn) received the
B.E. degree in surveying and mapping and the Ph.D. degree in photogrammetry and remote sensing from Wuhan
University, Wuhan, China, in 2013 and 2018, respectively.
She is a postdoctoral researcher in the College of Public Administration, Huazhong Agricultural University, Wuhan,
430070, China. Her research interests include the change
analysis of multitemporal remote sensing images and remote sensing applications.
Xin Huang (xhuang@whu.edu.cn) received the Ph.D. degree in photogrammetry and remote sensing in 2009 from
Wuhan University, Wuhan, China. He is a Luojia Distinguished Professor at Wuhan University, Wuhan, 430079,
China, where he teaches remote sensing, photogrammetry,
and image interpretation. He is the founder and director
of the Institute of Remote Sensing Information Processing,
School of Remote Sensing and Information Engineering,
Wuhan University. He has published more than 150 peer-reviewed articles (Science Citation Index papers) in international journals. His research interests include remote
sensing image processing methods and applications. He
was the recipient of the Boeing Award for the Best Paper
in Image Analysis and Interpretation from the American
Society for Photogrammetry and Remote Sensing (ASPRS)
in 2010, the second-place recipient of the John I. Davidson
President’s Award from ASPRS in 2018, and the winner
of the IEEE Geoscience and Remote Sensing Society 2014
Data Fusion Contest. He was an associate editor of Photogrammetric Engineering and Remote Sensing (2016–2019) and
of IEEE Geoscience and Remote Sensing Letters (2014–2020),
and he now serves as an associate editor of IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing (since 2018). He is also an editorial board member of
Remote Sensing of Environment (since 2019), Science of Remote
Sensing (since 2020), and Remote Sensing (since 2018). He is
a Senior Member of IEEE.
Francesca Bovolo (bovolo@fbk.eu) received the B.S.
and M.S. degrees in telecommunication engineering (summa
cum laude) and the Ph.D. degree in communication and information technologies from the University of Trento, Italy,
in 2001, 2003, and 2006, respectively, where she remained
as a research fellow until June 2013. She is the founder and
head of the Remote Sensing for Digital Earth unit at Fondazione Bruno Kessler, Trento, 38123, Italy, and a member
of the Remote Sensing Laboratory, Trento. Her research interests include multitemporal remote sensing image analysis; change detection in multispectral, hyperspectral, and
synthetic aperture radar images and VHR images; time series analysis; content-based time series retrieval; domain
adaptation; and lidar and radar sounders. She was the publication chair for the 2015 IEEE International Geoscience
and Remote Sensing Symposium. She is the cochair of the
Society of Photographic Instrumentation Engineers International Conference on Signal and Image Processing for
Remote Sensing. She is a Senior Member of IEEE.
Jiayi Li (zjjerica@whu.edu.cn) received the B.S. degree
from Central South University, Changsha, China, in 2011
and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2016. She
is currently an assistant professor in the School of Remote
Sensing and Information Engineering, Wuhan University,
Wuhan, 430079, China. She has authored more than 30
peer-reviewed articles (Science Citation Index papers) in
international journals. Her research interests include hyperspectral imagery, sparse representation, computer vision and pattern recognition, and remote sensing images.
She is a reviewer for more than 10 international journals,
including IEEE Transactions on Geoscience and Remote Sensing, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Geoscience and Remote Sensing
Letters, IEEE Signal Processing Letters, and International Journal
of Remote Sensing. She is the guest editor of the special issue “Change Detection Using Multisource Remotely Sensed
Imagery” of Remote Sensing (an open-access journal of the
Multidisciplinary Digital Publishing Institute). She is a
Member of IEEE.
Xinli Ke (kexl@mail.hzau.edu.cn) received the B.S. degree in land planning and utilization from Huazhong Agricultural University, Wuhan, China, in 2001 and the M.S.
degree in cartography and geographical information systems
and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2006 and
2009, respectively. He is a professor in the College of Public
Administration, Huazhong Agricultural University, Wuhan,
430070, China.
Anlu Zhang (zhanganlu@mail.hzau.edu.cn) received the
Ph.D. degree in 1999 from Huazhong Agricultural University, Wuhan, China. He has been a professor at Huazhong
Agricultural University, Wuhan, 430070, China, since 2000.
He is an executive director of the China Land Society; deputy director of the Academic Committee, China Land Society;
deputy director of the Youth Working Committee, China
Land Society; and a member of the Expert Committee, Land
Remediation Center, Ministry of Land and Resources.
Jón Atli Benediktsson (benedikt@hi.is) received the
Cand.Sci. degree in electrical engineering from the University of Iceland, Reykjavik, Iceland, in 1984, and the
M.S.E.E. and Ph.D. degrees in electrical engineering from
Purdue University, West Lafayette, Indiana, USA, in 1987
and 1990, respectively. He is with the Faculty of Electrical
and Computer Engineering, University of Iceland, Reykjavik, IS 107, Iceland. From 2009 to 2015, he was the prorector of science and academic affairs and a professor of
electrical and computer engineering at the University of
Iceland. In 2015, he was the rector of the University of Iceland. He is a cofounder of Oxymap, Reykjavik, a biomedical start-up company. He has authored and coauthored
extensively in his fields of interest. His research interests
include remote sensing, image analysis, pattern recognition, biomedical analysis of signals, and signal processing.
He is a Fellow of IEEE.

REFERENCES
[1] S. Liu, Q. Du, X. Tong, A. Samat, and L. Bruzzone, “Unsupervised change detection in multispectral remote sensing images
via spectral-spatial band expansion,” IEEE J. Select. Topics Appl.
Earth Observat. Remote Sens., vol. 12, no. 9, pp. 3578–3587, 2019.
doi: 10.1109/JSTARS.2019.2929514.
[2] X. Huang, Y. Cao, and J. Li, “An automatic change detection
method for monitoring newly constructed building areas using time-series multi-view high-resolution optical satellite images,” Remote Sens. Environ., vol. 244, p. 111802, 2020.
doi: 10.1016/j.rse.2020.111802.
[3] I. Jonckheere, P. Coppin, K. Nackaerts, B. Muys, and E. Lambin, “Digital
change detection methods in ecosystem monitoring: A review,”
Int. J. Remote Sens., vol. 25, no. 9, pp. 1565–1596, 2004.
doi: 10.1080/0143116031000101675.
[4] D. Lu, P. Mausel, E. Brondizio, and E. Moran, “Change detection techniques,” Int. J. Remote Sens., vol. 25, no. 12, pp. 2365–2401,
2004. doi: 10.1080/0143116031000139863.
[5] A. P. Tewkesbury, A. J. Comber, N. J. Tate, A. Lamb, and P. F.
Fisher, “A critical synthesis of remotely sensed optical image
change detection techniques,” Remote Sens. Environ., vol. 160,
pp. 1–14, 2015. doi: 10.1016/j.rse.2015.01.006.
[6] M. Hussain, D. Chen, A. Cheng, H. Wei, and D. Stanley,
“Change detection from remotely sensed images: From pixel-based
to object-based approaches,” ISPRS J. Photogrammetry
Remote Sens., vol. 80, pp. 91–106, June 2013. doi: 10.1016/j.isprsjprs.2013.03.006.
[7] G. Chen, G. J. Hay, L. M. T. Carvalho, and M. A. Wulder, “Object-based change detection,” Int. J. Remote Sens., vol. 33, no. 14,
pp. 4434–4457, 2012. doi: 10.1080/01431161.2011.648285.
[8] S. Liu, D. Marinelli, L. Bruzzone, and F. Bovolo, “A review of
change detection in multitemporal hyperspectral images: Current techniques, applications, and challenges,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 140–158,
2019. doi: 10.1109/MGRS.2019.2898520.
[9] F. Bovolo and L. Bruzzone, “The time variable in data fusion:
A change detection perspective,” IEEE Geosci. Remote Sens. Mag.
(replaces Newslett.), vol. 3, no. 3, pp. 8–26, 2015.
doi: 10.1109/MGRS.2015.2443494.
[10] H. Si Salah, S. E. Goldin, A. Rezgui, B. Nour El Islam, and S. Ait-Aoudia, “What is a remote sensing change detection technique?
Towards a conceptual framework,” Int. J. Remote Sens., vol. 41, no. 5,
pp. 1788–1812, 2020. doi: 10.1080/01431161.2019.1674463.
[11] H. Han et al., “A mixed property-based automatic shadow detection approach for VHR multispectral remote sensing images,” Appl. Sci., vol. 8, no. 10, p. 1883, 2018.
doi: 10.3390/app8101883.
[12] C. Toth and G. Jóźków, “Remote sensing platforms and sensors: A survey,” ISPRS J. Photogrammetry Remote Sens., vol. 115,
pp. 22–36, May 2016. doi: 10.1016/j.isprsjprs.2015.10.004.
[13] D. Poli and T. Toutin, “Review of developments in geometric
modelling for high resolution satellite pushbroom sensors,”
Photogrammetric Rec., vol. 27, no. 137, pp. 58–73, 2012.
doi: 10.1111/j.1477-9730.2011.00665.x.
[14] M. Dalla Mura, S. Prasad, F. Pacifici, P. Gamba, J. Chanussot, and J. A. Benediktsson, “Challenges and opportunities
of multimodality and data fusion in remote sensing,”
Proc. IEEE, vol. 103, no. 9, pp. 1585–1601, 2015.
doi: 10.1109/JPROC.2015.2462751.
[15] R. Momeni, P. Aplin, and D. Boyd, “Mapping complex urban
land cover from spaceborne imagery: The influence of spatial
resolution, spectral band set and classification approach,” Remote Sens., vol. 8, no. 2, p. 88, 2016. doi: 10.3390/rs8020088.
[16] M. Volpi, D. Tuia, F. Bovolo, M. Kanevski, and L. Bruzzone,
“Supervised change detection in VHR images using contextual
information and support vector machines,” Int. J. Appl. Earth
Observat. Geoinf., vol. 20, pp. 77–85, Feb. 2013.
doi: 10.1016/j.jag.2011.10.013.
[17] J. P. Ardila, W. Bijker, V. A. Tolpekin, and A. Stein, “Multitemporal change detection of urban trees using localized region-based
active contours in VHR images,” Remote Sens. Environ.,
vol. 124, pp. 413–426, 2012. doi: 10.1016/j.rse.2012.05.027.
[18] J. Gong, C. Liu, and X. Huang, “Advances in urban information
extraction from high-resolution remote sensing imagery,” Sci.
China Earth Sci., vol. 63, no. 4, pp. 463–475, 2020.
doi: 10.1007/s11430-019-9547-x.
[19] R. Qin, X. Huang, A. Gruen, and G. Schmitt, “Object-based 3-D
building change detection on multitemporal stereo images,”
IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 8, no. 5,
pp. 2125–2137, 2015. doi: 10.1109/JSTARS.2015.2424275.
[20] D. Liu et al., “Integration of historical map and aerial imagery to characterize long-term land-use change and landscape
dynamics: An object-based analysis via Random Forests,”
Ecol. Indicators, vol. 95, pp. 595–605, Dec. 2018.
doi: 10.1016/j.ecolind.2018.08.004.
[21] X. Huang, D. Wen, J. Li, and R. Qin, “Multi-level monitoring of
subtle urban changes for the megacities of China using high-resolution
multi-view satellite imagery,” Remote Sens. Environ.,
vol. 196, pp. 56–75, July 2017. doi: 10.1016/j.rse.2017.05.001.
[22] G. Xian and C. Homer, “Updating the 2001 National Land
Cover Database impervious surface products to 2006 using
Landsat imagery change detection methods,” Remote Sens.
Environ., vol. 114, no. 8, pp. 1676–1686, 2010.
doi: 10.1016/j.rse.2010.02.018.
[23] M. Pesaresi et al., “A global human settlement layer from optical HR/VHR RS data: Concept and first results,” IEEE J. Select.
Topics Appl. Earth Observat. Remote Sens., vol. 6, no. 5, pp. 2102–2131,
2013. doi: 10.1109/JSTARS.2013.2271445.
[24] L. Bruzzone and F. Bovolo, “A novel framework for the design
of change-detection systems for very-high-resolution remote
sensing images,” Proc. IEEE, vol. 101, no. 3, pp. 609–630, 2012.
doi: 10.1109/JPROC.2012.2197169.
[25] M. Lu, J. Chen, H. Tang, Y. Rao, P. Yang, and W. Wu, “Land cover change detection by integrating object-based data blending
model of Landsat and MODIS,” Remote Sens. Environ., vol. 184,
pp. 374–386, Oct. 2016. doi: 10.1016/j.rse.2016.07.028.
[26] S. Ye, D. Chen, and J. Yu, “A targeted change-detection procedure by combining change vector analysis and post-classification approach,” ISPRS J. Photogrammetry Remote Sens., vol. 114,
pp. 115–124, Apr. 2016. doi: 10.1016/j.isprsjprs.2016.01.018.
[27] N. Longbotham, C. Chaapel, L. Bleiler, C. Padwick, W. J. Emery, and F. Pacifici, “Very high resolution multiangle urban
classification analysis,” IEEE Trans. Geosci. Remote Sens., vol. 50,
no. 4, pp. 1155–1170, 2012. doi: 10.1109/TGRS.2011.2165548.
[28] D. Poli, F. Remondino, E. Angiuli, and G. Agugiaro, “Radiometric and geometric evaluation of GeoEye-1, WorldView-2
and Pléiades-1A stereo images for 3D information extraction,”
ISPRS J. Photogrammetry Remote Sens., vol. 100, pp. 35–47,
Feb. 2015. doi: 10.1016/j.isprsjprs.2014.04.007.
[29] F. Pacifici, N. Longbotham, and W. J. Emery, “The importance
of physical quantities for the analysis of multitemporal and
multiangular optical very high spatial resolution images,” IEEE
Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6241–6256, 2014.
doi: 10.1109/TGRS.2013.2295819.
[30] K. Jacobsen, “High resolution satellite imaging systems – An
overview,” Photogrammetrie Fernerkundung Geoinf., vol. 2005,
pp. 487–496, Jan. 2005.
[31] D. Wen, X. Huang, L. Zhang, and J. A. Benediktsson, “A novel
automatic change detection method for urban high-resolution
remotely sensed imagery based on multiindex scene representation,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 1, pp. 609–625,
2016. doi: 10.1109/TGRS.2015.2463075.
[32] N. Tatar, M. Saadatseresht, H. Arefi, and A. Hadavand, “A robust object-based shadow detection method for cloud-free high
resolution satellite images over urban areas and water bodies,” Adv. Space Res., vol. 61, no. 11, pp. 2787–2800, 2018.
doi: 10.1016/j.asr.2018.03.011.
[33] A. Movia, A. Beinat, and F. Crosilla, “Shadow detection and removal in RGB VHR images for land use unsupervised classification,” ISPRS J. Photogrammetry Remote Sens., vol. 119, pp. 485–495,
Sept. 2016. doi: 10.1016/j.isprsjprs.2016.05.004.
[34] G. Liasis and S. Stavrou, “Satellite images analysis for shadow
detection and building height estimation,” ISPRS J. Photogrammetry Remote Sens., vol. 119, pp. 437–450, Sept. 2016.
doi: 10.1016/j.isprsjprs.2016.07.006.
[35] N. Kadhim and M. Mourshed, “A shadow-overlapping algorithm for estimating building heights from VHR satellite images,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 1, pp. 8–12,
2018. doi: 10.1109/LGRS.2017.2762424.
[36] X. Huang and L. Zhang, “A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery,” Photogrammetric Eng. Remote Sens.,
vol. 77, no. 7, pp. 721–732, 2011. doi: 10.14358/PERS.77.7.721.
[37] H. Song, B. Huang, and K. Zhang, “Shadow detection and
reconstruction in high-resolution satellite images via morphological filtering and example-based learning,” IEEE Trans.
Geosci. Remote Sens., vol. 52, no. 5, pp. 2545–2554, 2014.
doi: 10.1109/TGRS.2013.2262722.
[38] T. Blaschke et al., “Geographic object-based image analysis – Towards a new paradigm,” ISPRS J. Photogrammetry Remote Sens.,
vol. 87, pp. 180–191, Jan. 2014. doi: 10.1016/j.isprsjprs.2013.09.014.
[39] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, “Textural
features for image classification,” IEEE Trans. Syst., Man,
Cybern., vol. SMC-3, no. 6, pp. 610–621, 1973.
doi: 10.1109/TSMC.1973.4309314.
[40] M. Hall-Beyer, “GLCM texture: A tutorial, version v3.0,” Univ.
of Calgary, 2007. [Online]. Available: http://www.fp.ucalgary.ca/mhallbey/tutorial.htm

[41] S. Yao, S. Pan, T. Wang, C. Zheng, W. Shen, and Y. Chong, “A
new pedestrian detection method based on combined HOG
and LSS features,” Neurocomputing, vol. 151, pp. 1006–1014,
Mar. 2015. doi: 10.1016/j.neucom.2014.08.080.
[42] L. Zhang, X. Huang, B. Huang, and P. Li, “A pixel shape index
coupled with spectral information for classification of high
spatial resolution remotely sensed imagery,” IEEE Trans. Geosci.
Remote Sens., vol. 44, no. 10, pp. 2950–2961, 2006.
[43] K. Tan, X. Jin, A. Plaza, X. Wang, L. Xiao, and P. Du, “Automatic change detection in high-resolution remote sensing
images by using a multiple classifier system and spectral–
spatial features,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 9, no. 8, pp. 3439–3451, 2016. doi: 10.1109/
JSTARS.2016.2541678.
[44] Z. Li, W. Shi, M. Hao, and H. Zhang, “Unsupervised change detection using spectral features and a texture difference measure for
VHR remote-sensing images,” Int. J. Remote Sens., vol. 38, no. 23,
pp. 7302–7315, 2017. doi: 10.1080/01431161.2017.1375616.
[45] D. Peng and Y. Zhang, “Object-based change detection from
satellite imagery by segmentation optimization and multifeatures fusion,” Int. J. Remote Sens., vol. 38, no. 13, pp. 3886–
3905, 2017. doi: 10.1080/01431161.2017.1308033.
[46] L. Zhang, B. Zhong, and A. Yang, “Building change detection
using object-oriented LBP feature map in very high spatial resolution imagery,” in Proc. 10th Int. Workshop on the Anal. Multitemporal Remote Sens. Images (MultiTemp), 2019, pp. 1–4. doi:
10.1109/Multi-Temp.2019.8866919.
[47] H. Liu, M. Yang, J. Chen, J. Hou, and M. Deng, “Line-constrained shape feature for building change detection in VHR
remote sensing imagery,” ISPRS Int. J. Geo-Inform., vol. 7, no. 10,
p. 410, 2018. doi: 10.3390/ijgi7100410.
[48] M. Dalla Mura, J. A. Benediktsson, F. Bovolo, and L. Bruzzone,
“An unsupervised technique based on morphological filters
for change detection in very high resolution images,” IEEE
Geosci. Remote Sens. Lett., vol. 5, no. 3, pp. 433–437, 2008. doi:
10.1109/LGRS.2008.917726.
[49] N. Falco, M. Dalla Mura, F. Bovolo, J. A. Benediktsson, and L.
Bruzzone, “Change detection in VHR images based on morphological attribute profiles,” IEEE Geosci. Remote Sens. Lett., vol. 10,
no. 3, pp. 636–640, 2013. doi: 10.1109/LGRS.2012.2222340.
[50] S. Liu, Q. Du, X. Tong, A. Samat, L. Bruzzone, and F. Bovolo,
“Multiscale morphological compressed change vector analysis
for unsupervised multiple change detection,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 10, no. 9, pp. 4124–
4137, 2017. doi: 10.1109/JSTARS.2017.2712119.
[51] X. Huang, L. Zhang, and T. Zhu, “Building change detection
from multitemporal high-resolution remotely sensed images
based on a morphological building index,” IEEE J. Select. Topics
Appl. Earth Observat. Remote Sens., vol. 7, no. 1, pp. 105–115, 2014.
[52] Y. Tang, X. Huang, and L. Zhang, “Fault-tolerant building change
detection from urban high-resolution remote sensing imagery,”
IEEE Geosci. Remote Sens. Lett., vol. 10, no. 5, pp. 1060–1064, 2013.
[53] X. Huang, T. Zhu, L. Zhang, and Y. Tang, “A novel building change
index for automatic building change detection from high-resolution remote sensing imagery,” Remote Sens. Lett., vol. 5, no.
8, pp. 713–722, 2014. doi: 10.1080/2150704X.2014.963732.
[54] G. R. Cross and A. K. Jain, “Markov random field texture models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-5, no. 1,
pp. 25–39, 1983. doi: 10.1109/TPAMI.1983.4767341.
[55] T. Xu, I. D. Moore, and J. C. Gallant, “Fractals, fractal dimensions and landscapes—A review,” Geomorphology, vol. 8, no. 4,
pp. 245–262, 1993. doi: 10.1016/0169-555X(93)90022-T.
[56] C. Benedek, M. Shadaydeh, Z. Kato, T. Szirányi, and J. Zerubia,
“Multilayer Markov Random Field models for change detection in optical remote sensing images,” ISPRS J. Photogrammetry
Remote Sensing, vol. 107, pp. 22–37, Sept. 2015. doi: 10.1016/j.
isprsjprs.2015.02.006.
[57] L. Bruzzone and D. F. Prieto, “An adaptive semiparametric and
context-based approach to unsupervised change detection in multitemporal remote-sensing images,” IEEE Trans. Image Process., vol.
11, no. 4, pp. 452–466, 2002. doi: 10.1109/TIP.2002.999678.
[58] A. Ghosh, B. N. Subudhi, and L. Bruzzone, “Integration of
Gibbs Markov random field and Hopfield-type neural networks for unsupervised change detection in remotely sensed
multitemporal images,” IEEE Trans. Image Process., vol. 22,
no. 8, pp. 3087–3096, 2013. doi: 10.1109/TIP.2013.2259833.
[59] B. N. Subudhi, F. Bovolo, A. Ghosh, and L. Bruzzone, “Spatio-contextual fuzzy clustering with Markov random field model
for change detection in remotely sensed images,” Optics Laser
Technol., vol. 57, pp. 284–292, Apr. 2014. doi: 10.1016/j.optlastec.2013.10.003.
[60] H. Yu, W. Yang, G. Hua, H. Ru, and P. Huang, “Change detection using high resolution remote sensing images based on active learning and Markov random fields,” Remote Sens., vol. 9,
no. 12, p. 1233, 2017. doi: 10.3390/rs9121233.
[61] S. Aleksandrowicz, A. Wawrzaszek, W. Drzewiecki, and M.
Krupiński, “Change detection using global and local multifractal description,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 8,
pp. 1183–1187, 2016. doi: 10.1109/LGRS.2016.2574940.
[62] S. Luan, C. Chen, B. Zhang, J. Han, and J. Liu, “Gabor convolutional networks,” IEEE Trans. Image Process., vol. 27, no. 9,
pp. 4357–4366, 2017.
[63] Z. Li, W. Shi, H. Zhang, and M. Hao, “Change detection based
on Gabor wavelet features for very high resolution remote
sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 5,
pp. 783–787, 2017. doi: 10.1109/LGRS.2017.2681198.
[64] C. Wei, P. Zhao, X. Li, Y. Wang, and F. Liu, “Unsupervised
change detection of VHR remote sensing images based on
multi-resolution Markov Random Field in wavelet domain,”
Int. J. Remote Sens., vol. 40, no. 20, pp. 7750–7766, 2019. doi:
10.1080/01431161.2019.1602792.
[65] Q. Li, X. Huang, D. Wen, and H. Liu, “Integrating multiple textural features for remote sensing image change detection,” Photogrammetric Eng. Remote Sens., vol. 83, no. 2, pp. 109–121, 2017.
doi: 10.14358/PERS.83.2.109.
[66] B. Hou, Y. Wang, and Q. Liu, “Change detection based on deep
features and low rank,” IEEE Geosci. Remote Sens. Lett., vol. 14,
no. 12, pp. 2418–2422, 2017. doi: 10.1109/LGRS.2017.2766840.
[67] S. Saha, F. Bovolo, and L. Bruzzone, “Unsupervised deep change
vector analysis for multiple-change detection in VHR images,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3677–3693,
2019. doi: 10.1109/TGRS.2018.2886643.
[68] T. Zhan, M. Gong, J. Liu, and P. Zhang, “Iterative feature mapping network for detecting multiple changes in multi-source remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol.
146, pp. 38–51, Dec. 2018. doi: 10.1016/j.isprsjprs.2018.09.002.
[69] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,”
J. Mach. Learning Res., vol. 11, pp. 3371–3408, Dec. 2010.
[70] P. Zhang, M. Gong, L. Su, J. Liu, and Z. Li, “Change detection
based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images,” ISPRS
J. Photogrammetry Remote Sens., vol. 116, pp. 24–41, June 2016.
doi: 10.1016/j.isprsjprs.2016.02.013.
[71] L. Su, M. Gong, P. Zhang, M. Zhang, J. Liu, and H. Yang,
“Deep learning and mapping based ternary change detection
for information unbalanced images,” Pattern Recogn., vol. 66,
pp. 213–228, June 2017. doi: 10.1016/j.patcog.2017.01.002.
[72] G. Liu, L. Li, L. Jiao, Y. Dong, and X. Li, “Stacked Fisher autoencoder for SAR change detection,” Pattern Recogn., vol. 96,
p. 106971, Dec. 2019. doi: 10.1016/j.patcog.2019.106971.
[73] N. Lv, C. Chen, T. Qiu, and A. K. Sangaiah, “Deep learning and
superpixel feature extraction based on contractive autoencoder
for change detection in SAR images,” IEEE Trans. Ind. Inf., vol. 14,
no. 12, pp. 5530–5538, 2018. doi: 10.1109/TII.2018.2873492.
[74] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote
Sens. Mag. (replaces Newslett.), vol. 5, no. 4, pp. 8–36, 2017. doi:
10.1109/MGRS.2017.2762307.
[75] T. Zhan, M. Gong, X. Jiang, and M. Zhang, “Unsupervised
scale-driven change detection with deep spatial-spectral features for VHR images,” IEEE Trans. Geosci. Remote Sens., vol. 58,
no. 8, pp. 1–13, 2020. doi: 10.1109/TGRS.2020.2968098.
[76] S. Saha, L. Mou, C. Qiu, X. X. Zhu, F. Bovolo, and L. Bruzzone,
“Unsupervised deep joint segmentation of multitemporal high-resolution images,” IEEE Trans. Geosci. Remote Sens., vol. 58,
no. 12, pp. 1–13, 2020. doi: 10.1109/TGRS.2020.2990640.
[77] Q. Wang, X. Zhang, G. Chen, F. Dai, Y. Gong, and K. Zhu,
“Change detection based on Faster R-CNN for high-resolution
remote sensing images,” Remote Sens. Lett., vol. 9, no. 10,
pp. 923–932, 2018. doi: 10.1080/2150704X.2018.1492172.
[78] J. Liu et al., “Convolutional neural network-based transfer
learning for optical aerial images change detection,” IEEE
Geosci. Remote Sens. Lett., vol. 17, no. 1, pp. 127–131, 2019. doi:
10.1109/LGRS.2019.2916601.
[79] M. Volpi and D. Tuia, “Dense semantic labeling of subdecimeter resolution images with convolutional neural networks,”
IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 881–893,
2016. doi: 10.1109/TGRS.2016.2616585.
[80] L. Gueguen and R. Hamid, “Toward a generalizable image representation for large-scale change detection: Application to generic damage analysis,” IEEE Trans. Geosci. Remote Sens., vol. 54,
no. 6, pp. 3378–3387, 2016. doi: 10.1109/TGRS.2016.2516402.
[81] R. Caye Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, “Multitask learning for large-scale semantic change detection,” Comput. Vision Image Understanding, vol. 187, p. 102783, 2019. doi:
10.1016/j.cviu.2019.07.003.

[82] R. Gupta et al., “Creating xBD: A dataset for assessing building
damage from satellite imagery,” in Proc. IEEE Conf. Comput. Vision and Pattern Recogn. Workshops, 2019, pp. 10–17.
[83] J. Zhu, Y. Su, Q. Guo, and T. C. Harmon, “Unsupervised object-based differencing for land-cover change detection,” Photogrammetric Eng. Remote Sens., vol. 83, no. 3, pp. 225–236, 2017.
doi: 10.14358/PERS.83.3.225.
[84] D. Ming, J. Li, J. Wang, and M. Zhang, “Scale parameter selection by spatial statistics for GeOBIA: Using mean-shift based
multi-scale segmentation as an example,” ISPRS J. Photogrammetry Remote Sens., vol. 106, pp. 28–41, Aug. 2015. doi: 10.1016/
j.isprsjprs.2015.04.010.
[85] P. Xiao, M. Yuan, X. Zhang, X. Feng, and Y. Guo, “Cosegmentation for object-based building change detection from high-resolution remotely sensed images,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 3, pp. 1587–1603, 2017. doi: 10.1109/
TGRS.2016.2627638.
[86] Y. Liu, Q. Guo, and M. Kelly, “A framework of region-based
spatial relations for non-overlapping features and its application in object based image analysis,” ISPRS J. Photogrammetry
Remote Sens., vol. 63, no. 4, pp. 461–475, 2008. doi: 10.1016/
j.isprsjprs.2008.01.007.
[87] M. Kim, M. Madden, and B. Xu, “GEOBIA vegetation mapping
in Great Smoky Mountains National Park with spectral and non-spectral ancillary information,” Photogrammetric Eng. Remote Sens., vol. 76, no. 2, pp. 137–149, 2010. doi: 10.14358/PERS.76.2.137.
[88] Z. Lv, T. Liu, and J. A. Benediktsson, “Object-oriented key
point vector distance for binary land cover change detection
using VHR remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 6524–6533, 2020. doi: 10.1109/
TGRS.2020.2977248.
[89] F. Bovolo, “A multilevel parcel-based approach to change detection in very high resolution multitemporal images,” IEEE
Geosci. Remote Sens. Lett., vol. 6, no. 1, pp. 33–37, 2009. doi:
10.1109/LGRS.2008.2007429.
[90] C. Geiß, M. Klotz, A. Schmitt, and H. Taubenböck, “Object-based morphological profiles for classification of remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10,
pp. 5952–5963, 2016. doi: 10.1109/TGRS.2016.2576978.
[91] J. Liang et al., “A comparison of two object-oriented methods
for land-use/cover change detection with SPOT 5 imagery,”
Sensor Lett., vol. 10, no. 1, pp. 415–424, 2012. doi: 10.1166/
sl.2012.1865.
[92] W. Yu, W. Zhou, Y. Qian, and J. Yan, “A new approach for land
cover classification and change analysis: Integrating backdating and an object-based method,” Remote Sens. Environ.,
vol. 177, pp. 37–47, May 2016. doi: 10.1016/j.rse.2016.02.030.
[93] X. Zhang, S. Du, Q. Wang, and W. Zhou, “Multiscale geoscene
segmentation for extracting urban functional zones from VHR
satellite images,” Remote Sens., vol. 10, no. 2, p. 281, 2018. doi:
10.3390/rs10020281.
[94] H. Liu, X. Huang, D. Wen, and J. Li, “The use of landscape metrics and transfer learning to explore urban villages in China,”
Remote Sens., vol. 9, no. 4, p. 365, 2017. doi: 10.3390/rs9040365.
[95] J. Zhou, B. Yu, and J. Qin, “Multi-level spatial analysis for
change detection of urban vegetation at individual tree scale,”
Remote Sens., vol. 6, no. 9, pp. 9086–9103, 2014. doi: 10.3390/rs6099086.
[96] M. A. Aguilar, M. D. M. Saldana, and F. J. Aguilar, “Generation and
quality assessment of stereo-extracted DSM from GeoEye-1 and
WorldView-2 imagery,” IEEE Trans. Geosci. Remote Sens., vol. 52,
no. 2, pp. 1259–1271, 2013. doi: 10.1109/TGRS.2013.2249521.
[97] X. Huang, H. Chen, and J. Gong, “Angular difference feature
extraction for urban scene classification using ZY-3 multi-angle high-resolution satellite imagery,” ISPRS J. Photogrammetry
Remote Sens., vol. 135, pp. 127–141, Jan. 2018. doi: 10.1016/j.
isprsjprs.2017.11.017.
[98] H. Chaabouni-Chouayakh, I. R. Arnau, and P. Reinartz,
“Towards automatic 3-D change detection through multispectral and digital elevation model information fusion,”
Int. J. Image Data Fusion, vol. 4, no. 1, pp. 89–101, 2013. doi:
10.1080/19479832.2012.739577.
[99] J. Tian, P. Reinartz, P. d’Angelo, and M. Ehlers, “Region-based
automatic building and forest change detection on Cartosat-1
stereo imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 79,
pp. 226–239, May 2013. doi: 10.1016/j.isprsjprs.2013.02.017.
[100] W. Tu et al., “Portraying urban functional zones by coupling remote sensing imagery and human sensing data,” Remote Sens.,
vol. 10, no. 1, p. 141, 2018. doi: 10.3390/rs10010141.
[101] C. Liu, X. Huang, Z. Zhu, H. Chen, X. Tang, and J. Gong, “Automatic extraction of built-up area from ZY3 multi-view satellite
imagery: Analysis of 45 global cities,” Remote Sens. Environ., vol.
226, pp. 51–73, June 2019. doi: 10.1016/j.rse.2019.03.033.
[102] R. Duca and F. D. Frate, “Hyperspectral and multiangle
CHRIS–PROBA images for the generation of land cover maps,”
IEEE Trans. Geosci. Remote Sens., vol. 46, no. 10, pp. 2857–2866,
2008. doi: 10.1109/TGRS.2008.2000741.
[103] Y. Yan, L. Deng, X. Liu, and L. Zhu, “Application of UAV-based
multi-angle hyperspectral remote sensing in fine vegetation
classification,” Remote Sens., vol. 11, no. 23, p. 2753, 2019. doi:
10.3390/rs11232753.
[104] M. Zanetti and L. Bruzzone, “A theoretical framework for
change detection based on a compound multiclass statistical model of the difference image,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 2, pp. 1129–1143, 2017. doi: 10.1109/
TGRS.2017.2759663.
[105] Y. T. Solano-Correa, F. Bovolo, and L. Bruzzone, “An approach
to multiple change detection in VHR optical images based on
iterative clustering and adaptive thresholding,” IEEE Geosci.
Remote Sens. Lett., vol. 16, no. 8, pp. 1–5, 2019. doi: 10.1109/
LGRS.2019.2896385.
[106] J. S. Deng, K. Wang, Y. H. Deng, and G. J. Qi, “PCA‐based land‐
use change detection and analysis using multitemporal and
multisensor satellite data,” Int. J. Remote Sens., vol. 29, no. 16,
pp. 4823–4838, 2008. doi: 10.1080/01431160801950162.
[107] A. Tahraoui, R. Kheddam, A. Bouakache, and A. Belhadj-Aissa,
“Land change detection using multivariate alteration detection
and Chi squared test thresholding,” in Proc. 4th Int. Conf. Adv.
Technol. Signal and Image Process. (ATSIP), 2018, pp. 1–6. doi:
10.1109/ATSIP.2018.8364501.
[108] C. Wu, L. Zhang, and L. Zhang, “A scene change detection
framework for multi-temporal very high resolution remote
sensing images,” Signal Process., vol. 124, pp. 184–197, July
2016. doi: 10.1016/j.sigpro.2015.09.020.
[109] X. Zhang, R. Fan, L. Ma, X. Liao, and X. Chen, “Change detection in very high-resolution images based on ensemble CNNs,”
Int. J. Remote Sens., vol. 41, no. 12, pp. 4757–4779, 2020. doi:
10.1080/01431161.2020.1723818.
[110] D. Peng, Y. Zhang, and H. Guan, “End-to-end change detection
for high resolution satellite images using improved UNet++,” Remote Sens., vol. 11, no. 11, p. 1382, 2019. doi: 10.3390/rs11111382.
[111] L. Mou, L. Bruzzone, and X. X. Zhu, “Learning spectral-spatial-temporal features via a recurrent convolutional neural network
for change detection in multispectral imagery,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 2, pp. 924–935, 2019. doi:
10.1109/TGRS.2018.2863224.
[112] T. Bao, C. Fu, T. Fang, and H. Huo, “PPCNET: A combined patch-level
and pixel-level end-to-end deep network for high-resolution remote
sensing image change detection,” IEEE Geosci. Remote Sens. Lett., vol.
17, no. 10, pp. 1–5, 2020. doi: 10.1109/LGRS.2019.2955309.
[113] C. Zhang et al., “A deeply supervised image fusion network for
change detection in high resolution bi-temporal remote sensing images,” ISPRS J. Photogrammetry Remote Sensing, vol. 166,
pp. 183–200, Aug. 2020. doi: 10.1016/j.isprsjprs.2020.06.003.
[114] T. Lei, Y. Zhang, Z. Lv, S. Li, S. Liu, and A. K. Nandi, “Landslide
inventory mapping from bitemporal images using deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 16,
no. 6, pp. 982–986, 2019. doi: 10.1109/LGRS.2018.2889307.
[115] W. Wiratama, J. Lee, S.-E. Park, and D. Sim, “Dual-dense convolution network for change detection of high-resolution panchromatic imagery,” Appl. Sci., vol. 8, no. 10, p. 1785, 2018. doi:
10.3390/app8101785.
[116] W. Zhang and X. Lu, “The spectral-spatial joint learning for
change detection in multispectral imagery,” Remote Sens.,
vol. 11, no. 3, p. 240, 2019. doi: 10.3390/rs11030240.
[117] A. Song and J. Choi, “Fully convolutional networks with multiscale 3D filters and transfer learning for change detection in
high spatial resolution satellite images,” Remote Sens., vol. 12,
no. 5, p. 799, 2020. doi: 10.3390/rs12050799.
[118] M. Zhai, H. Liu, and F. Sun, “Lifelong learning for scene recognition in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol.
16, no. 9, pp. 1472–1476, 2019. doi: 10.1109/LGRS.2019.2897652.
[119] M. Rußwurm, S. Wang, M. Korner, and D. Lobell, “Meta-learning
for few-shot land cover classification,” in Proc. IEEE/CVF Conf.
Comput. Vision Pattern Recogn. Workshops, 2020, pp. 200–201.
[120] R. Hedjam, A. Abdesselam, and F. Melgani, “Change detection
in unlabeled optical remote sensing data using Siamese CNN,”
IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 13,
pp. 4178–4187, July 2020. doi: 10.1109/JSTARS.2020.3009116.
[121] H. Chen, C. Wu, B. Du, L. Zhang, and L. Wang, “Change detection in multisource VHR images via deep siamese convolutional multiple-layers recurrent neural network,” IEEE Trans. Geosci.
Remote Sens., vol. 58, no. 4, pp. 2848–2864, 2020. doi: 10.1109/
TGRS.2019.2956756.
[122] J. Liu, M. Gong, A. K. Qin, and K. C. Tan, “Bipartite differential neural network for unsupervised image change detection,”
IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 3, pp. 876–890,
2020. doi: 10.1109/TNNLS.2019.2910571.
DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[123] X. Junfeng, Z. Baoming, G. Haitao, L. Jun, and L. Yuzhun,
“Combining iterative slow feature analysis and deep feature
learning for change detection in high-resolution remote sensing images,” J. Appl. Remote Sens., vol. 13, no. 2, pp. 1–16, 2019.
doi: 10.1117/1.JRS.13.024506.
[124] J. Fan, K. Lin, and M. Han, “A novel joint change detection approach based on weight-clustering sparse autoencoders,” IEEE
J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 2,
pp. 685–699, 2019. doi: 10.1109/JSTARS.2019.2892951.
[125] A. Argyridis and D. P. Argialas, “Building change detection
through multi-scale GEOBIA approach by integrating deep belief networks with fuzzy ontologies,” Int. J. Image Data Fusion,
vol. 7, no. 2, pp. 148–171, 2016.
[126] P. F. Alcantarilla, S. Stent, G. Ros, R. Arroyo, and R. Gherardi,
“Street-view change detection with deconvolutional networks,”
Autonom. Robots, vol. 42, no. 7, pp. 1301–1322, 2018. doi:
10.1007/s10514-018-9734-5.
[127] R. Jing et al., “Object-based change detection for VHR remote sensing images based on a Trisiamese-LSTM,” Int. J. Remote Sens., vol. 41,
no. 16, pp. 6209–6231, 2020. doi: 10.1080/01431161.2020.1734253.
[128] J. Geng, J. Fan, H. Wang, and X. Ma, “Change detection of
marine reclamation using multispectral images via patchbased recurrent neural network,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2017, pp. 612–615. doi: 10.1109/
IGARSS.2017.8127028.
[129] H. Lyu, H. Lu, and L. Mou, “Learning a transferable change rule
from a recurrent neural network for land cover change detection,” Remote Sens., vol. 8, no. 6, p. 506, 2016. doi: 10.3390/
rs8060506.
[130] M. Gong, X. Niu, P. Zhang, and Z. Li, “Generative adversarial
networks for change detection in multispectral imagery,” IEEE
Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2310–2314, 2017.
doi: 10.1109/LGRS.2017.2762694.
[131] M. Gong, Y. Yang, T. Zhan, X. Niu, and S. Li, “A generative discriminatory classified network for change detection in multispectral imagery,” IEEE J. Select. Topics Appl. Earth Observat.
Remote Sens., vol. 12, no. 1, pp. 321–333, 2019. doi: 10.1109/
JSTARS.2018.2887108.
[132] S. Saha, L. Mou, X. X. Zhu, F. Bovolo, and L. Bruzzone, “Semisupervised change detection using graph convolutional network,” IEEE Geosci. Remote Sens. Lett., 2020.
[133] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc.
Adv. Neural Information Process. Syst., 2012, pp. 1097–1105.
[134] K. Simonyan and A. Zisserman, “Ver y deep convolutional
n e t works for large-scale image recognition,” 2014, arXiv:
1409.1556.
[135] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inceptionv4, inception-resnet and the impact of residual connections on
learning,” 2016, arXiv:1602.07261.
[136] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern
Recogn., 2016, pp. 770–778.
[137] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger,
“Densely connected convolutional networks,” in Proc. IEEE
Conf. Comput. Vision Pattern Recogn., 2017, pp. 4700–4708.

[138] Y. Wu, Z. Bai, Q. Miao, W. Ma, Y. Yang, and M. Gong, “A classified adversarial network for multi-spectral remote sensing image change detection,” Remote Sensing, vol. 12, no. 13, p. 2098,
2020. doi: 10.3390/rs12132098.
[139] B. Fang, G. Chen, L. Pan, R. Kou, and L. Wang, “GAN-based
Siamese framework for landslide inventory mapping using
bi-temporal optical remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 11, pp. 1–5, 2020. doi: 10.3390/
rs11111292.
[140] A. Song, Y. Kim, and Y. Han, “Uncertainty analysis for objectbased change detection in very high-resolution satellite images
using deep learning network,” Remote Sensing, vol. 12, no. 15,
p. 2345, 2020. doi: 10.3390/rs12152345.
[141] S. I. Toure, D. A. Stow, H-c Shih, J. Weeks, and D. Lopez-Carr, “Land
cover and land use change analysis using multi-spatial resolution
data and object-based image analysis,” Remote Sens. Environ., vol.
210, pp. 259–268, June 2018. doi: 10.1016/j.rse.2018.03.023.
[142] X. Wang, S. Liu, P. Du, H. Liang, J. Xia, and Y. Li, “Object-based
change detection in urban areas from high spatial resolution
images based on multiple features and ensemble learning,” Remote Sens., vol. 10, no. 2, 2018. doi: 10.3390/rs10020276.
[143] M. Chini, C. Bignami, A. Chiancone, and S. Stramondo, “Classification of VHR optical data for land use change analysis by
scale object seletion (SOS) algorithm,” in Proc. IEEE Geosci. Remote Sens. Symp., 2014, pp. 2834–2837.
[144] X. Huang, X. Han, S. Ma, T. Lin, and J. Gong, “Monitoring ecosystem service change in the City of Shenzhen by the use of high‐
resolution remotely sensed imagery and deep learning,” Land
Degradation Develop., vol. 30, no. 12, 2019. doi: 10.1002/ldr.3337.
[145] G. Doxani, K. Karantzalos, and M. Tsakiri-Strati, “Monitoring
urban changes based on scale-space filtering and object-oriented classification,” Int. J. Appl. Earth Observat. Geoinf., vol. 15, pp.
38–48, Apr. 2012. doi: 10.1016/j.jag.2011.07.002.
[146] Z. Guo and S. Du, “Mining parameter information for building
extraction and change detection with very high-resolution imagery and GIS data,” GISci. Remote Sens., vol. 54, no. 1, pp. 38–
63, 2017. doi: 10.1080/15481603.2016.1250328.
[147] B. Hou, Y. Wang, and Q. Liu, “A saliency guided semi-supervised building change detection method for high resolution
remote sensing images,” Sensors, vol. 16, no. 9, p. 1377, 2016.
doi: 10.3390/s16091377.
[148] X. Huang, H. Liu, and L. Zhang, “Spatiotemporal detection and
analysis of urban villages in mega city regions of China using
high-resolution remotely sensed imagery,” IEEE Trans. Geosci.
Remote Sens., vol. 53, no. 7, pp. 3639–3657, 2015. doi: 10.1109/
TGRS.2014.2380779.
[149] M. Janalipour and A. Mohammadzadeh, “Building damage
detection using object-based image analysis and ANFIS from
high-resolution image (case study: BAM earthquake, Iran),”
IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 9, no.
5, pp. 1937–1945, 2016. doi: 10.1109/JSTARS.2015.2458582.
[150] T. Leichtle, C. Geiß, M. Wurm, T. Lakes, and H. Taubenböck,
“Unsupervised change detection in VHR remote sensing imagery–an object-based clustering approach in a dynamic urban environment,” Int. J. Appl. Earth Observat. Geoinf., vol. 54,
pp. 15–27, 2017. doi: 10.1016/j.jag.2016.08.010.

[151] Y. Li, X. Huang, and H. Liu, “Unsupervised deep feature learning for urban village detection from high-resolution remote
sensing images,” Photogrammetric Eng. Remote Sensing, vol. 83,
no. 8, pp. 567–579, 2017. doi: 10.14358/PERS.83.8.567.
[152] S. Radhika, Y. Tamura, and M. Matsui, “Cyclone damage detection on building structures from pre-and post-satellite images using wavelet based pattern recognition,” J. Wind Eng.
Ind. Aerodynamics, vol. 136, pp. 23–33, 2015. doi: 10.1016/j.
jweia.2014.10.018.
[153] N. Sofina and M. Ehlers, “Building change detection using high
resolution remotely sensed data and GIS,” IEEE J. Select. Topics
Appl. Earth Observat. Remote Sens., vol. 9, no. 8, pp. 3430–3438,
2016. doi: 10.1109/JSTARS.2016.2542074.
[154] X. Tong et al., “Use of shadows for detection of earthquakeinduced collapsed buildings in high-resolution satellite imagery,” ISPRS J. Photogrammetry Remote Sensing, vol. 79, pp. 53–67,
2013. doi: 10.1016/j.isprsjprs.2013.01.012.
[155] D. Wen, X. Huang, A. Zhang, and X. Ke, “Monitoring 3D building change and urban redevelopment patterns in inner city areas
of Chinese megacities using multi-view satellite imagery,” Remote Sens., vol. 11, no. 7, p. 763, 2019. doi: 10.3390/rs11070763.
[156] J. Tian, S. Cui, and P. Reinartz, “Building change detection
based on satellite stereo imagery and digital surface models,”
IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 406–417,
2014. doi: 10.1109/TGRS.2013.2240692.
[157] R. Qin, “Change detection on LOD 2 building models with
very high resolution spaceborne stereo imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 96, pp. 179–192, Oct. 2014. doi:
10.1016/j.isprsjprs.2014.07.007.
[158] A. Kovacs and T. Sziranyi, “Orientation based building outline
extraction in aerial images,” ISPRS Ann. Photogrammetry, Remote Sens. Spatial Inf. Sci., vol. I-7, pp. 141–146, July 2012. doi:
10.5194/isprsannals-I-7-141-2012.
[159] A. O. Ok, “Automated detection of buildings from single VHR
multispectral images using shadow information and graph
cuts,” ISPRS J. Photogrammetry Remote Sensing, vol. 86, pp. 21–
40, Dec. 2013. doi: 10.1016/j.isprsjprs.2013.09.004.
[160] M. Vakalopoulou, K. Karantzalos, N. Komodakis, and N. Paragios, “Building detection in very high resolution multispectral
data with deep learning features,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2015, pp. 1873–1876.
[161] M. Janalipour and M. Taleai, “Building change detection after
earthquake using multi-criteria decision analysis based on extracted information from high spatial resolution satellite images,” Int. J. Remote Sens., vol. 38, no. 1, pp. 82–99, 2017. doi:
10.1080/01431161.2016.1259673.
[162] X. Huang and Y. Wang, “Investigating the effects of 3D urban
morphology on the surface urban heat island effect in urban
functional zones by using high-resolution remote sensing data:
A case study of Wuhan, Central China,” ISPRS J. Photogrammetry
Remote Sens., vol. 152, pp. 119–131, June 2019. doi: 10.1016/j.
isprsjprs.2019.04.010.
[163] M. Turker and B. Cetinkaya, “Automatic detection of earthquake‐
damaged buildings using DEMs created from pre‐ and post‐earthquake stereo aerial photographs,” Int. J. Remote Sens., vol. 26, no. 4,
pp. 823–832, 2005. doi: 10.1080/01431160512331316810.
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

[164] R. Qin, “A critical analysis of satellite stereo pairs for digital
surface model generation and a matching quality prediction model,” ISPRS J. Photogrammetry Remote Sensing, vol. 154,
pp. 139–150, Aug. 2019. doi: 10.1016/j.isprsjprs.2019.06.005.
[165] S. Ji, J. Liu, and M. Lu, “CNN-based dense image matching
for aerial remote sensing images,” Photogrammetric Eng. Remote Sens., vol. 85, no. 6, pp. 415–424, 2019. doi: 10.14358/
PERS.85.6.415.
[166] Y. Lü et al., “Recent ecological transitions in China: Greening, browning and influential factors,” Sci. Rep., vol. 5, no. 1,
p. 8732, 2015. doi: 10.1038/srep08732.
[167] J. Verbesselt, R. Hyndman, G. Newnham, and D. Culvenor,
“Detecting trend and seasonal changes in satellite image time
series,” Remote Sens. Environ., vol. 114, no. 1, pp. 106–115, 2010.
doi: 10.1016/j.rse.2009.08.014.
[168] R. Pu and S. Landry, “Evaluating seasonal effect on forest leaf
area index mapping using multi-seasonal high resolution satellite pléiades imagery,” Int. J. Appl. Earth Observat. Geoinf.,
vol. 80, pp. 268–279, Aug. 2019. doi: 10.1016/j.jag.2019.04.020.
[169] J. Wang, D. Yang, M. Detto, B. W. Nelson, M. Chen, K. Guan,
et al. “Multi-scale integration of satellite remote sensing improves characterization of dry-season green-up in an Amazon tropical evergreen forest,” Remote Sens. Environ., vol. 246,
p. 111,865, 2020. doi: 10.1016/j.rse.2020.111865.
[170] P. Gärtner, M. Förster, A. Kurban, and B. Kleinschmit, “Object based change detection of Central Asian Tugai vegetation
with very high spatial resolution satellite imagery,” Int. J. Appl.
Earth Observat. Geoinf., vol. 31, pp. 110–121, Sept. 2014. doi:
10.1016/j.jag.2014.03.004.
[171] J. Tian, T. Schneider, C. Straub, F. Kugler, and P. Reinartz, “Exploring digital surface models from nine different sensors for
forest monitoring and change detection,” Remote Sens., vol. 9,
no. 3, p. 287, 2017. doi: 10.3390/rs9030287.
[172] R. Dalagnol et al., “Quantifying canopy tree loss and gap recovery in tropical forests under low-intensity logging using VHR
satellite imagery and airborne LiDAR,” Remote Sensing, vol. 11,
no. 7, p. 817, 2019. doi: 10.3390/rs11070817.
[173] J. P. Ardila, W. Bijker, V. A. Tolpekin, and A. Stein, “Quantification of crown changes and change uncertainty of trees in
an urban environment,” ISPRS J. Photogrammetry Remote Sens.,
vol. 74, pp. 41–55, 2012. doi: 10.1016/j.isprsjprs.2012.08.007.
[174] B. Lu and Y. He, “Species classification using Unmanned Aerial
Vehicle (UAV)-acquired high spatial resolution imagery in a
heterogeneous grassland,” ISPRS J. Photogrammetry Remote Sens.,
vol. 128, pp. 73–85, 2017. doi: 10.1016/j.isprsjprs.2017.03.011.
[175] Y. Sun, Q. Xin, J. Huang, B. Huang, and H. Zhang, “Characterizing tree species of a tropical wetland in southern China at the individual tree level based on convolutional neural network,” IEEE
J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 11,
pp. 4415–4425, 2019. doi: 10.1109/JSTARS.2019.2950721.
[176] Z. Xie, Y. Chen, D. Lu, G. Li, and E. Chen, “Classification of
land cover, forest, and tree species classes with ZiYuan-3 multispectral and stereo data,” Remote Sens., vol. 11, no. 2, p. 164,
2019. doi: 10.3390/rs11020164.
[177] R. Pu, S. Landry, and Q. Yu, “Assessing the potential of multiseasonal high resolution Pléiades satellite imagery for mapping
DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

urban tree species,” Int. J. Appl. Earth Observat. Geoinf., vol. 71,
pp. 144–158, Sept. 2018. doi: 10.1016/j.jag.2018.05.005.
[178] S. Hartling, V. Sagan, P. Sidike, M. Maimaitijiang, and J. Carron, “Urban tree species classification using a WorldView-2/3
and LiDAR data fusion approach and deep learning,” Sensors,
vol. 19, no. 6, p. 1284, 2019. doi: 10.3390/s19061284.
[179] D. Wen, X. Huang, H. Liu, W. Liao, and L. Zhang, “Semantic
classification of urban trees using very high resolution satellite imagery,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 10, no. 4, pp. 1413–1424, 2017. doi: 10.1109/
JSTARS.2016.2645798.
[180] M. Xia, Y. Zhang, Z. Zhang, J. Liu, W. Ou, and W. Zou, “Modeling agricultural land use change in a rapid urbanizing town:
Linking the decisions of government, peasant households and
enterprises,” Land Use Policy, vol. 90, pp. 104266, 2020. doi:
10.1016/j.landusepol.2019.104266.
[181] E. S. Malinverni, M. Rinaldi, and S. Ruggieri, “Agricultural crop
change detection by means of hybrid classification and high
resolution images,” EARSeL eProc., vol. 11, no. 2, pp. 132–154,
2012.
[182] Y. Sadeh, X. Zhu, K. Chenu, and D. Dunkerley, “Sowing date
detection at the field scale using CubeSats remote sensing,”
Comput. Electron. Agriculture, vol. 157, pp. 568–580, 2019. doi:
10.1016/j.compag.2019.01.042.
[183] J. Bendig, A. Bolten, and G. Bareth, “UAV-based imaging for
multi-temporal, very high resolution crop surface models to
monitor crop growth variability monitoring des Pflanzenwachstums mit Hilfe multitemporaler und hoch auflösender
Oberflächenmodelle von Getreidebeständen auf Basis von Bildern aus UAV-Befliegungen,” Photogrammetrie-FernerkundungGeoinf., vol. 2013, pp. 551–562, Dec. 2013.
[184] P. L. Hatfield and P. J. Pinter, “Remote sensing for crop protection,” Crop Protection, vol. 12, no. 6, pp. 403–413, 1993. doi:
10.1016/0261-2194(93)90001-Y.
[185] K. Johansen et al., “Using GeoEye-1 imagery for multi-temporal
object-based detection of canegrub damage in sugarcane fields
in Queensland, Australia,” GISci. Remote Sensing, vol. 55, no. 2,
pp. 285–305, 2018. doi: 10.1080/15481603.2017.1417691.
[186] J. Franke and G. Menz, “Multi-temporal wheat disease detection by multi-spectral remote sensing,” Precision Agriculture, vol.
8, no. 3, pp. 161–172, 2007. doi: 10.1007/s11119-007-9036-y.
[187] L. Yuan, Y. Huang, R. W. Loraamm, C. Nie, J. Wang, and J.
Zhang, “Spectral analysis of winter wheat leaves for detection
and differentiation of diseases and insects,” Field Crops Res., vol.
156, pp. 199–207, 2014. doi: 10.1016/j.fcr.2013.11.012.
[188] A. M. Mouazen et al., “Chapter 2—Monitoring,” “ in Agricultural
Internet of Things and Decision Support for Precision Smart Farming,
A. Castrignanò, G. Buttafuoco, R. Khosla, A. M. Mouazen, D.
Moshou, and O. Naud, Eds. New York: Academic Press, 2020,
pp. 35–138.
[189] X. Zhang et al., “A deep learning-based approach for automated
yellow rust disease detection from high-resolution hyperspectral UAV images,” Remote Sens., vol. 11, no. 13, p. 1554, 2019.
doi: 10.3390/rs11131554.
[190] Y. Wang and H. Yésou, “Remote sensing of floodpath lakes and
wetlands: A challenging frontier in the monitoring of changing

environments,” Remote Sens., vol. 10, no. 12, p. 1955, 2018. doi:
10.3390/rs10121955.
[191] C. Xie, X. Huang, H. Mu, and W. Yin, “Impacts of land-use
changes on the lakes across the Yangtze floodplain in China,”
Environ. Sci. Technol., vol. 51, no. 7, pp. 3669–3677, 2017. doi:
10.1021/acs.est.6b04260.
[192] S. Wang et al., “Changes of water clarity in large lakes and reservoirs across China observed from long-term MODIS,” Remote Sens. Environ., vol. 247, pp. 111949, 2020. doi: 10.1016/j.
rse.2020.111949.
[193] J.-F. Pekel, A. Cottam, N. Gorelick, and A. S. Belward, “Highresolution mapping of global surface water and its long-term
changes,” Nature, vol. 540, no. 7633, pp. 418–422, 2016. doi:
10.1038/nature20584.
[194] J. A. Downing et al., “The global abundance and size distribution of lakes, ponds, and impoundments,” Limnol.
Oceanogr., vol. 51, no. 5, pp. 2388–2397, 2006. doi: 10.4319/
lo.2006.51.5.2388.
[195] S. W. Cooley, L. C. Smith, L. Stepan, and J. Mascaro, “Tracking
dynamic northern surface water changes with high-frequency
planet CubeSat imagery,” Remote Sens., vol. 9, no. 12, p. 1306,
2017. doi: 10.3390/rs9121306.
[196] W. Feng, H. Sui, W. Huang, C. Xu, and K. An, “Water body
extraction from very high-resolution remote sensing imagery
using deep U-Net and a superpixel-based conditional random
field model,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 4,
pp. 618–622, 2019. doi: 10.1109/LGRS.2018.2879492.
[197] F. Chen, X. Chen, T. Van de Voorde, D. Roberts, H. Jiang, and
W. Xu, “Open water detection in urban environments using
high spatial resolution remote sensing imagery,” Remote Sens.
Environ., vol. 242, p. 11,1706, June 2020.
[198] Q. Shen et al., “A CIE color purity algorithm to detect black
and odorous water in urban rivers using high-resolution
multispectral remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 6577–6590, 2019. doi: 10.1109/
TGRS.2019.2907283.
[199] X. Huang, C. Xie, X. Fang, and L. Zhang, “Combining pixel- and object-based machine learning for identification of
water-body types from urban high-resolution remote-sensing
imagery,” IEEE J. Select. Topics Appl. Earth Observat. Remote
Sens., vol. 8, no. 5, pp. 2097–2110, 2015. doi: 10.1109/JSTARS.
2015.2420713.
[200] M. Kamal, S. Phinn, and K. Johansen, “Object-based approach
for multi-scale mangrove composition mapping using multiresolution image datasets,” Remote Sens., vol. 7, no. 4, pp. 4753–
4783, 2015. doi: 10.3390/rs70404753.
[201] T. Hu, J. Liu, G. Zheng, Y. Li, and B. Xie, “Quantitative assessment of urban wetland dynamics using high spatial resolution
satellite imagery between 2000 and 2013,” Sci. Rep., vol. 8,
no. 1, p. 7409, 2018. doi: 10.1038/s41598-018-25823-9.
[202] Q. Wu et al., “Integrating LiDAR data and multi-temporal aerial
imagery to map wetland inundation dynamics using Google
Earth Engine,” Remote Sens. Environ., vol. 228, pp. 1–13, July
2019. doi: 10.1016/j.rse.2019.04.015.
[203] K. S. Schmidt and A. K. Skidmore, “Spectral discrimination
of vegetation types in a coastal wetland,” Remote Sens. En-

100

viron., vol. 85, no. 1, pp. 92–108, 2003. doi: 10.1016/S00344257(02)00196-7.
[204] G. Viennois, C. Proisy, J. Féret, J. Prosperi, F. Sidik, Suhardjono,
et al. “Multitemporal analysis of high-spatial-resolution optical
satellite imagery for mangrove species mapping in Bali, Indonesia,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol.
9, pp. 3680–3686, 2016. doi: 10.1109/JSTARS.2016.2553170.
[205] R. B. Norgaard, “Ecosystem services: From eye-opening
metaphor to complexity blinder,” Ecol. Econ., vol. 69, no. 6,
pp. 1219–1227, 2010. doi: 10.1016/j.ecolecon.2009.11.009.
[206] Y. Z. Ayanu, C. Conrad, T. Nauss, M. Wegmann, and T. Koellner, “Quantifying and mapping ecosystem services supplies
and demands: A review of remote sensing applications,” Environ.
Sci. Technol., vol. 46, no. 16, pp. 8529–8541, 2012. doi: 10.1021/
es300157u.
[207] J. Haas and Y. Ban, “Mapping and monitoring urban ecosystem
services using multitemporal high-resolution satellite data,”
IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 10,
no. 2, pp. 669–680, 2016. doi: 10.1109/JSTARS.2016.2586582.
[208] X. Ren, X. Chen, and Q. Ma, “Urban spatial ecological performance based on the data of remote sensing of Guyuan,” Int.
Arch. Photogrammetry, Remote Sens. Spatial Inf. Sci., vol. 42, p. 3,
Apr. 2018.
[209] C. R. Hakkenberg, M. P. Dannenberg, C. Song, and G. Vinci,
“Automated continuous fields prediction from landsat time
series: application to fractional impervious cover,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 1, pp. 132–136, 2019. doi:
10.1109/LGRS.2019.2915320.
[210] L. Zhang, Q. Weng, and Z. Shao, “An evaluation of monthly impervious surface dynamics by fusing Landsat and MODIS time
series in the Pearl River Delta, China, from 2000 to 2015,” Remote Sens. Environ., vol. 201, pp. 99–114, Nov. 2017. doi: 10.1016/j.
rse.2017.08.036.
[211] G. Xian, H. Shi, J. Dewitz, and Z. Wu, “Performances of WorldView 3, Sentinel 2, and Landsat 8 data in mapping impervious
surface,” Remote Sens. Appl., Soc. Environ., vol. 15, p. 100,246,
2019. doi: 10.1016/j.rsase.2019.100246.
[212] W. Zhou, G. Huang, A. Troy, and M. L. Cadenasso, “Objectbased land cover classification of shaded areas in high spatial
resolution imagery of urban areas: A comparison study,” Remote Sens. Environ., vol. 113, no. 8, pp. 1769–1777, 2009. doi:
10.1016/j.rse.2009.04.007.
[213] P. Li, J. Guo, B. Song, and X. Xiao, “A multilevel hierarchical image segmentation method for urban impervious surface mapping using very high resolution imagery,” IEEE J. Select. Topics
Appl. Earth Observat. Remote Sens., vol. 4, no. 1, pp. 103–116,
2011. doi: 10.1109/JSTARS.2010.2074186.
[214] T. Zhang and X. Huang, “Monitoring of urban impervious surfaces using time series of high-resolution remote sensing images in rapidly urbanized areas: A case study of Shenzhen,” IEEE J.
of Select. Topics Appl. Earth Observat. Remote Sens., vol. 11, no. 8,
pp. 2692–2708, 2018. doi: 10.1109/JSTARS.2018.2804440.
[215] C. E. Woodcock, T. R. Loveland, M. Herold, and M. E. Bauer,
“Transitioning from change detection to monitoring with remote sensing: A paradigm shift,” Remote Sens. Environ., vol. 238,
p. 111,558, 2020. doi: 10.1016/j.rse.2019.111558.
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

[216] D. Helman et al., “Using time series of high-resolution planet
satellite images to monitor grapevine stem water potential in
commercial vineyards,” Remote Sens., vol. 10, no. 10, p. 1615,
2018. doi: 10.3390/rs10101615.
[217] M. A. Wulder, J. C. White, N. C. Coops, and C. R. Butson,
“Multi-temporal analysis of high spatial resolution imagery for
disturbance monitoring,” Remote Sens. Environ., vol. 112, no. 6,
pp. 2729–2740, 2008. doi: 10.1016/j.rse.2008.01.010.
[218] D. Turner, A. Lucieer, and S. M. De Jong, “Time series analysis of landslide dynamics using an Unmanned Aerial Vehicle
(UAV),” Remote Sens., vol. 7, no. 2, pp. 1736–1757, 2015. doi:
10.3390/rs70201736.
[219] H. Li, L. Chen, F. Li, and M. Huang, “Ship detection and tracking method for satellite video based on multiscale saliency and
surrounding contrast analysis,” J. Appl. Remote Sens., vol. 13, no.
2, p. 026511, 2019. doi: 10.1117/1.JRS.13.026511.
[220] L. Wang, F. Chen, and H. Yin, “Detecting and tracking vehicles in traffic by unmanned aerial vehicles,” Automat. Construct., vol. 72, pp. 294–308, Dec. 2016. doi: 10.1016/j.autcon.2016.05.008.
[221] L. Mou et al., “Multitemporal very high resolution from space:
Outcome of the 2016 IEEE GRSS data fusion contest,” IEEE J.
Select. Topics Appl. Earth Observat. Remote Sens., vol. 10, no. 8,
pp. 3435–3447, 2017. doi: 10.1109/JSTARS.2017.2696823.
[222] L. Mou and X. X. Zhu, “Spatiotemporal scene interpretation of
space videos via deep neural network and tracklet analysis,”
in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2016,
pp. 1823–1826.
[223] M. C. Hansen and R. S. DeFries, “Detecting long-term global
forest change using continuous fields of tree-cover maps from
8-km advanced very high resolution radiometer (AVHRR) data
for the years 1982–99,” Ecosystems, vol. 7, no. 7, pp. 695–716,
2004. doi: 10.1007/s10021-004-0243-3.
[224] A. Schneider, M. A. Friedl, and D. Potere, “A new map of global
urban extent from MODIS satellite data,” Environ. Res. Lett., vol.
4, no. 4, p. 044003, 2009. doi: 10.1088/1748-9326/4/4/044003.
[225] M. A. Friedl et al., “MODIS collection 5 global land cover: Algorithm refinements and characterization of new datasets,”
Remote Sens. Environ., vol. 114, no. 1, pp. 168–182, 2010. doi:
10.1016/j.rse.2009.08.016.
[226] ESA. CCI-LC Product User Guide v2.4 [Online]. Available:
Http://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC
-PUG-v2.4.pdf
[227] M. C. Hansen et al., “High-resolution global maps of 21st-century forest cover change,” Science, vol. 342, no. 6160, pp. 850–
853, 2013. doi: 10.1126/science.1244693.
[228] J. Chen et al., “Global land cover mapping at 30m resolution:
A POK-based operational approach,” ISPRS J. Photogrammetry
Remote Sens., vol. 103, pp. 7–27, May 2015. doi: 10.1016/j.isprsjprs.2014.09.002.
[229] P. Gong et al., “Annual maps of global artificial impervious
area (GAIA) between 1985 and 2018,” Remote Sens. Environ.,
vol. 236, p. 111,510, Jan. 2020. doi: 10.1016/j.rse.2019.111510.

DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[230] P. Gong et al., “Mapping essential urban land use categories
in China (EULUC-China): Preliminary results for 2018,”
Sci. Bull., vol. 65, no. 3, pp. 182–187, 2020. doi: 10.1016/j.
scib.2019.12.007.
[231] M. Pesaresi, D. Ehrilch, A. J. Florczyk, S. Freire, A. Julea, T. Kemper, et al. “GHS built-up grid, derived from Landsat, multitemporal (1975, 1990, 2000, 2014),” European Commission, Joint Res.
Centre, JRC Data Catalogue, 2015.
[232] P. Gong, H. Liu, M. Zhang, C. Li, J. Wang, H. Huang, et al.
“Stable classification with limited sample: Transferring a 30-m
resolution sample set collected in 2015 to mapping 10-m
resolution global land cover in 2017,” Sci. Bull., vol. 64, no. 6,
pp. 370–373, 2019. doi: 10.1016/j.scib.2019.03.002.
[233] R. Houborg and M. McCabe, “High-resolution NDVI from
Planet’s constellation of earth observing nano-satellites: A new
data source for precision agriculture,” Remote Sens., vol. 8, no. 9,
p. 768, 2016. doi: 10.3390/rs8090768.
[234] L. Wang, M. Jia, D. Yin, and J. Tian, “A review of remote sensing for mangrove forests: 1956–2018,” Remote Sens. Environ.,
vol. 231, p. 111,223, 2019. doi: 10.1016/j.rse.2019.111223.
[235] A. Ertürk, M. Iordache, and A. Plaza, “Sparse unmixing with
dictionary pruning for hyperspectral change detection,” IEEE
J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 10, no. 1,
pp. 321–330, 2017. doi: 10.1109/JSTARS.2016.2606514.
[236] M. V. M Graña, B Ayerdi. “Hyperspectral remote sensing scenes.”
Grupo de Inteligencia Computacional (GIC). http://www.ehu.es/
ccwintco/index.php?title=Hyperspectral_Remote_Sensing
_Scenes&redirect=no (accessed 2012).
[237] K. Nogueira, O. A. B. Penatti, and J. A. dos Santos, “Towards
better exploiting convolutional neural networks for remote
sensing scene classification,” Pattern Recognition, vol. 61,
pp. 539–556, Jan. 2017. doi: 10.1016/j.patcog.2016.07.001.
[238] W. Zhou, D. Ming, X. Lv, K. Zhou, H. Bao, and Z. Hong, “SO–
CNN based urban functional zone fine division with VHR remote sensing image,” Remote Sens. Environ., vol. 236, p. 111,458,
2020. doi: 10.1016/j.rse.2019.111458.
[239] M. Li, K. M. d Beurs, A. Stein, and W. Bijker, “Incorporating
open source data for Bayesian classification of urban land use
from VHR stereo images,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 10, no. 11, pp. 4930–4943, 2017. doi:
10.1109/JSTARS.2017.2737702.
[240] S. Du, S. Du, B. Liu, and X. Zhang, “Context-enabled extraction of large-scale urban functional zones from very-high-resolution images: A multiscale segmentation approach,” Remote
Sens., vol. 11, no. 16, p. 1902, 2019. doi: 10.3390/rs11161902.
[241] X. Zhang, S. Du, and Q. Wang, “Hierarchical semantic cognition for urban functional zones with VHR satellite images and
POI data,” ISPRS J. Photogrammetry Remote Sens., vol. 132, pp.
170–184, Oct. 2017. doi: 10.1016/j.isprsjprs.2017.09.007.
[242] X. Liu et al., “Classifying urban land use by integrating remote sensing and social media data,” Int. J. Geographical Inform. Sci., vol. 31,
no. 8, pp. 1675–1696, 2017. doi: 10.1080/13658816.2017.1324976.
GRS

101

The CCSDS 123.0-B-2 “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” Standard
A comprehensive review

MIGUEL HERNÁNDEZ-CABRONERO, AARON B. KIELY, MATTHEW KLIMESH, IAN BLANES, JONATHAN LIGO, ENRICO MAGLI, AND JOAN SERRA-SAGRISTÀ


The Consultative Committee for Space Data Systems
(CCSDS) published the CCSDS 123.0-B-2, “Low-Complexity
Lossless and Near-Lossless Multispectral and
Hyperspectral Image Compression” standard. This standard extends the previous issue, CCSDS 123.0-B-1, which
supported only lossless compression, while maintaining
backward compatibility. The main novelty of the new
issue is support for near-lossless compression, i.e., lossy
compression with user-defined absolute and/or relative
error limits in the reconstructed images. This new feature
is achieved via closed-loop quantization of prediction
errors. Two further additions arise from the new near-lossless
support: first, the calculation of predicted sample values
using sample representatives that may not be
equal to the reconstructed sample values, and, second, a
new hybrid entropy coder designed to provide enhanced
compression performance for low-entropy data, prevalent when nonlossless compression is used. These new
features enable significantly smaller compressed data volumes than those achievable with CCSDS 123.0-B-1 while
controlling the quality of the decompressed images. As
a result, larger amounts of valuable information can be
retrieved given a set of bandwidth and energy consumption constraints.

Digital Object Identifier 10.1109/MGRS.2020.3048443
Date of current version: 10 February 2021

BACKGROUND
During the past 30 years, multispectral imaging and hyperspectral imaging (HSI) have become a staple tool used
for geoscience remote sensing and Earth observation [1], [2]. This type of imagery enables the simultaneous registration of multiple parts of the electromagnetic spectrum,
providing invaluable information for many detection, classification, and unmixing problems [3]. As a result, today,
remote sensing HSI is used in many commercial, scientific,
and defense areas, including precision agriculture, mining,
forestry, coastal and oceanic observation, intelligence, and
disaster monitoring [3]–[6]. Due to the growing quantity
of deployed sensors [7], the number of public and private
remote sensing stakeholders [4], and the ongoing effort to
improve the analysis of retrieved images [8]–[19], the importance of HSI is likely to increase in the future.
Images produced by multispectral and hyperspectral
sensors consist of multiple spectral bands, instead of the
three—red, green, and blue—present in traditional color
images. Depending on the application and the available
hardware, the number of registered bands can be on the order of tens, hundreds, and even thousands [20]. Thus, HSI
generates significantly larger volumes of data compared
to traditional imagers. Moreover, the spatial resolution of
the deployed sensors also follows a rising trend, further
increasing the amount of data produced. For instance, the
HyspIRI sensor developed by NASA can produce up to 5 TB
of data per day [21]. However, the downlink channel capacity between the remote sensing devices and the ground stations is constrained, which limits the amount and quality
of the retrieved data [22].
Data compression is typically applied to reduce the
amount of data to be downloaded, hence improving effective transmission capacity [23]–[27]. Due to hardware and
energy constraints, employed algorithms must be tailored
to attain a beneficial tradeoff between complexity and efficiency [22], [28]. When lossless compression is applied
to images, the resulting compressed data suffice to reconstruct identical copies of the originals. On the other hand,
lossy compression enables the transmission of even smaller data volumes at the cost of the reconstructed images not
being identical to the originals. Among lossy compression
algorithms, those that provide user-controlled bounds on
the maximum error introduced in any sample are referred
to as near lossless.
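The distinction between the three regimes can be made concrete with a short check on the peak absolute error of a reconstruction. The following is only an illustrative sketch: the function names and the flat-list image representation are assumptions of this example and are not part of any CCSDS standard.

```python
def peak_absolute_error(original, reconstructed):
    """Return the largest absolute sample-wise difference between two images."""
    if len(original) != len(reconstructed):
        raise ValueError("images must contain the same number of samples")
    return max((abs(o - r) for o, r in zip(original, reconstructed)), default=0)


def classify_regime(original, reconstructed, error_limit):
    """Label a reconstruction according to the user-selected error limit."""
    pae = peak_absolute_error(original, reconstructed)
    if pae == 0:
        return "lossless"
    if pae <= error_limit:
        return "near-lossless"
    return "lossy (limit exceeded)"
```

A near-lossless compressor configured with a given limit is expected to always fall in one of the first two cases, whereas a generic lossy compressor offers no such per-sample guarantee.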
In spite of the distortion introduced by lossy and near-lossless methods, several studies have concluded that reconstructed images can be successfully used for the intended analysis tasks [29]. This is sometimes observed for
compressed images up to 25 times smaller than the original ones [30]. Nevertheless, a successful analysis can
be performed only when the amount of loss is adequate
for the type of images and the task at hand [29], [31]. One
of the main advantages of near-lossless compressors is
that they guarantee the accuracy of all the reconstructed
samples in an image. This is in contrast to regular lossy
compression approaches, which typically provide competitive average distortion results but no assurance about
the fidelity of any given set of samples. Regardless of the
employed compression regime, compression algorithms
must meet very stringent limitations in terms of complexity and required computational resources [32]. This
constraint is particularly relevant for small satellites and
CubeSats, which have attracted much scientific and industrial interest recently [4], [33].
The CCSDS, founded in 1982, publishes the standards
for spaceflight communication used in more than 900
space missions to date. (An updated list of space missions
using CCSDS standards can be found at https://public.ccsds.org/implementations/missions.aspx.)
CCSDS standards enable cooperation among space agencies and with
industrial associates, seeking enhanced interoperability,
reliability, and cost-effectiveness. The latest CCSDS compression standard (CCSDS 123.0-B-2, “Low-Complexity
Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” [34]), the central topic of this
article, supersedes CCSDS 123.0-B-1 [35] while maintaining backward compatibility. In the CCSDS naming
convention, suffixes “-1” and “-2” denote the first and second issues, respectively, of a standard. Hereafter, CCSDS
123.0-B-1 and CCSDS 123.0-B-2 are also denoted as Issue
1 and Issue 2, respectively.
Perhaps the most relevant novel feature of Issue 2 is
a new near-lossless compression regime, enabled by a
closed-loop scalar quantizer in the prediction stage [36].
Note that this in-loop quantization approach enables a
higher compression performance than quantization of
input samples before prediction [26]. With this new feature, users can specify the maximum error limits—absolute and/or relative—introduced in the decompressed
images. Fidelity settings can vary from band to band and
can be periodically updated within an image. Another
new feature of Issue 2 is a hybrid entropy coder option. It
is specifically designed to provide improved performance
on low-entropy data, i.e., for the case when prediction errors tend to be small compared to the quantizer step size.
The hybrid coder extends the sample-adaptive codes of
CCSDS 123.0-B-1 with 16 additional variable-to-variable-length codes, which can represent multiple input symbols
using a single codeword. To guarantee backward compatibility, both lossless and near-lossless compression can be
performed with either of CCSDS 123.0-B-1’s original entropy coders or with the new hybrid option. A third novelty in the new standard is a mode within the predictor stage called narrow local sums, which is designed
to facilitate efficient hardware implementations. Yet another change introduced in the new standard
is added support for optional supplementary information
tables, which can provide ancillary image or instrument
information, e.g., to identify the wavelengths associated
with each spectral band.
This article provides a comprehensive overview of Issue 2,
paying special attention to the new concepts and capabilities not present in Issue 1. The content presented hereafter
extends that of a previous conference paper [36]:
the overview is more in depth, it assumes no
previous knowledge of Issue 1, and it includes a performance evaluation. Furthermore, the experimental results
discussed here complement those in [37] by providing both
a quantitative and a qualitative comparison to other relevant
compression methods.
THE NEW CCSDS 123.0-B-2 STANDARD
PREVIOUS WORK
The CCSDS Data Compression Working Group (1995–2007;
2020–present) and the Multispectral and Hyperspectral
Data Compression (MHDC) Working Group (2007–2020)
have developed and maintained several compression standards applicable to remote sensing HSI, listed chronologically in Table 1. The CCSDS 121.0-B-1 standard describes a
general-purpose adaptive entropy coder. In CCSDS 121.0-B-2, the efficiency and flexibility of this entropy coder were
enhanced by allowing larger block sizes and the possibility
of using a restricted set of codewords. (As this entropy coder
is available in the new CCSDS 123.0-B-2 standard, an overview is provided later in the “Block-Adaptive Coder” section.)
The CCSDS 122.0-B-1 standard was designed specifically for
image data and supports both lossless and lossy regimes. It
consists of a spatial discrete wavelet transform followed by a bit-plane coder. The CCSDS 122.1-B-1 standard extends CCSDS 122.0-B-1 by allowing the application
of spectral decorrelation transforms. To provide compatibility between the 122.0 and 122.1 standards, a second issue
of 122.0 (CCSDS 122.0-B-2) was also published. Finally, the
CCSDS 123.0-B-1 standard formalizes a predictive coding
scheme for multispectral and hyperspectral data. This standard is the immediate predecessor of the one addressed in
this article, and their functional blocks are described in subsequent subsections.

TABLE 1. A CHRONOLOGY OF CCSDS DATA-COMPRESSION STANDARDS.
NAME            RELEASE         STATUS   REGIME  MULTISPECTRAL
121.0-B-1 [38]  May 1997        Retired
122.0-B-1 [39]  May 2005        Retired  LL, LS
121.0-B-2 [40]  April 2012      Retired
123.0-B-1 [35]  May 2012        Retired          Yes
122.0-B-2 [41]  September 2017  Active   LL, LS
122.1-B-1 [42]  September 2017  Active   LL, LS  Yes
123.0-B-2 [34]  February 2019   Active   LL, NL  Yes
121.0-B-3 [43]  August 2020     Active
The active recommendations (blue books) are shown in blue while retired (superseded) standards (silver books) are presented in gray. Lossless, lossy, and near-lossless compression regimes are denoted as LL, LS, and NL, respectively. The “multispectral” column indicates whether or not several bands can be compressed simultaneously.

Several hardware implementations of the CCSDS 123.0-B-1 standard can be found in the
literature. In [44], a parallelization technique is described that achieves from 31 to 123
megasamples per second (Ms/s), respectively, on the Xilinx
V-7 XC7VX690T and V-5QV FX130T field-programmable
gate arrays (FPGAs). In [45], parallelization using C-slow
retiming is proposed, which achieves a throughput of up to
213 Ms/s on a space-grade Virtex-5QV FPGA. In [46], another
implementation, this one with a throughput of 147 Ms/s on a
Xilinx Zynq-7020 FPGA, is described. The FPGA design discussed in [47] allows parallel processing of any number of
samples, provided that resource constraints are met. This enables configurable tradeoffs between throughput and power
consumption. In [48], a low-cost FPGA design is described
for the prediction block of CCSDS 123.0-B-1, with a throughput as high as 20 Ms/s on a Xilinx Zynq-7000 FPGA. In [49]–
[51], low-complexity and low-occupancy FPGA designs are
proposed. These implementations are designed to be independent and combinable in a plug-and-play fashion. The latest version of this system, referred to as SHyLoC 2.0, yields
a throughput of 150 Ms/s on a Xilinx Virtex XQR5VFX130
FPGA. Hardware designs for CCSDS 123.0-B-2 are currently in progress, with the European Commission funding
two research projects within the framework of the Horizon
2020 (H2020) program [52], [53] and with NASA and the
European Space Agency funding other research projects [54],
[55]. To the best of our knowledge, there are no public implementations of Issue 2 available.
Extensions to CCSDS compression algorithms have
been published as well. In [56], a method to extend lossless
predictive coding schemes, in particular CCSDS 123.0-B-1, was proposed. This method enables compression in a
lossy regime, producing constant signal-to-noise ratio (SNR)
and accurate rate control. In [57], a lightweight arithmetic
coder was proposed as a possible replacement for the entropy coder of CCSDS 123.0-B-1. Some algorithms have been
suggested related to the prediction stage of Issue 2, based
on recursive least-squares theory. These algorithms describe
more adaptive prediction methods at the cost of increased
computational complexity. In [58], the inverse correlation
matrix of the local differences is used to update the prediction weights. In [59], this predictor is enhanced by adaptively
selecting the number of local differences to be used. In [60],
two prediction modes are described: the first uses only spectral neighbors in the weight update process; the second also
employs spatial neighbors. The best of the two for each band
in terms of mean absolute error is selected for coding. In [61],
the image is divided into nonoverlapping regions, which allows for parallel application of the methods described in [59]
and [60].
OVERVIEW OF THE NEW STANDARD
The CCSDS 123.0-B-2 standard is based on the fast lossless
extended (FLEX) compressor [62]. In turn, FLEX is based on
the fast lossless (FL) compressor [63], which was formalized
as CCSDS 123.0-B-1. FLEX improves upon FL by adding
adjustable lossy compression capabilities while maintaining the option to perform lossless compression. The latest
CCSDS compression standard extends FLEX by adding new
features, such as relative error limits, periodic error limit updating, and new prediction modes that facilitate hardware
implementations. Very importantly, Issue 2 has been designed to retain many of FL’s desirable properties, including low computational complexity; single-pass compression
and decompression; automatic adaptation to the data being
compressed; and the ability to operate requiring a constant,
reasonably sized memory space. Moreover, Issue 2 inherits
all the capabilities of CCSDS 123.0-B-1, allowing decompression of the data output by the latter. These features make
both issues of the CCSDS 123.0 standard suitable for use
onboard spaceborne systems, including small satellite missions. Note that compressed images do not include synchronization markers or any other similar scheme. It is assumed
that the transport layer will provide the ability to locate the
next image in the event of a bit error or data loss.
The general structure of the Issue 2 compressor is shown in
Figure 1. Similar to CCSDS 123.0-B-1, the input data—signed
or unsigned integers—go through a predictor stage in which
previously coded information is employed to predict the value of the next sample to be compressed. As a main novelty
of Issue 2, prediction errors are uniformly quantized. The
quantization bin sizes are determined by the user’s choice
of absolute error limit (i.e., the maximum allowed absolute
difference between the original and reconstructed sample
values) and/or the relative error limit, which controls the
maximum ratio of the error to the sample’s predicted value.
Quantized data are then mapped to nonnegative integers,
which then are input to the entropy coder.
When nonzero error limits are selected, quantizer indices represent approximations of the aforementioned prediction errors, instead of the actual values. In this case, data
output by the predictor stage typically exhibit lower entropy rates, which allows the coder to produce smaller compressed files. To make decompression possible, the decoder
must be able to make the same predictions as the encoder.
To guarantee this, when nonzero error limits are selected,
prediction is done using so-called sample representatives
instead of the original samples.
The following sections provide an informative description
of the aforementioned functional blocks. For the sake of readability, some definitions in this description are simplified and
do not cover boundary cases, e.g., the image edges
when neighboring samples are involved. Interested readers
are referred to [34] for complete, normative definitions. A list
of the symbols employed hereafter is available in Table 2 for
ease of reference.
PREDICTOR STAGE
The predictor stage is designed to process input samples sequentially in a single pass, producing one mapped quantizer index per input sample. Although CCSDS 123.0-B-1 was
designed to accept input samples of, at most, 16 bits, Issue 2
accepts bit depths, $D$, up to 32 bits. Hereafter, $s_z(t)$ denotes
the $t$th sample of the $z$th spectral band in raster scan order,
and $\delta_z(t)$ is its corresponding mapped quantizer index. To
obtain $\delta_z(t)$, a prediction of the sample’s original value, denoted as $\hat{s}_z(t)$, is computed as described in the “Prediction”
section, and the prediction error is computed as
$$\Delta_z(t) = s_z(t) - \hat{s}_z(t). \tag{1}$$
This prediction error is then quantized, as discussed in the
“Quantization” section, to produce a quantizer index $q_z(t)$.
This index is mapped to a nonnegative value, $\delta_z(t)$, the output of the predictor stage, as described in the “Quantizer
Index Mapping” section.
The quantizer index is also transformed into its corresponding sample representative $s''_z(t)$, as described in the
“Sample Representatives” section. These representatives are
then used to obtain the predicted values $\hat{s}_z(t)$ used in (1). As
mentioned in the previous section, the sample value prediction must be based on $s''_z(t)$ instead of $s_z(t)$ to avoid compressor–decompressor prediction differences when compression is not lossless.
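The closed-loop principle can be illustrated with a deliberately simplified sketch. Here the adaptive predictor of the standard is replaced by a toy previous-sample predictor, and the sample representative is taken to be the clipped bin center; both are assumptions of this example, as are the function names `encode` and `decode`. The quantizer uses the uniform bin size $2m + 1$ detailed in the “Quantization” section.

```python
def sgn(x):
    """Sign of x: 1, 0, or -1."""
    return (x > 0) - (x < 0)


def quantize(delta, m):
    """Uniform quantizer with bin size 2m + 1."""
    return sgn(delta) * ((abs(delta) + m) // (2 * m + 1))


def encode(samples, m, s_min=0, s_max=255):
    """Closed-loop encoder: predictions use reconstructed values, not originals."""
    indices = []
    representative = (s_min + s_max + 1) // 2  # initial prediction: midrange value
    for s in samples:
        prediction = representative             # toy predictor (assumption)
        q = quantize(s - prediction, m)
        indices.append(q)
        # Reconstruct exactly as the decoder will, so both stay synchronized.
        representative = min(s_max, max(s_min, prediction + q * (2 * m + 1)))
    return indices


def decode(indices, m, s_min=0, s_max=255):
    """Decoder: repeats the encoder's predictions from the quantizer indices."""
    reconstructed = []
    representative = (s_min + s_max + 1) // 2
    for q in indices:
        prediction = representative
        representative = min(s_max, max(s_min, prediction + q * (2 * m + 1)))
        reconstructed.append(representative)
    return reconstructed
```

Because the encoder predicts from the same values that the decoder can compute, the reconstruction error of every sample stays within $m$, and setting $m = 0$ yields lossless operation.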
[Figure 1: block diagram. Predictor: the predicted sample values $\hat{s}_z(t)$ are subtracted from the input image samples $s_z(t)$ to obtain the prediction errors $\Delta_z(t)$, which are quantized into $q_z(t)$ and then mapped to the mapped quantizer indices $\delta_z(t)$; the sample representatives $s''_z(t)$ are calculated from the quantized prediction errors and feed the prediction. Encoder: a coder selected once per image (block-adaptive, sample-adaptive, or hybrid) produces the encoded bitstream.]
FIGURE 1. A structure overview of the CCSDS 123.0-B-2 compressor. The new functional blocks with respect to CCSDS 123.0-B-1 are highlighted in blue while the modified blocks are shown in green.

TABLE 2. A LIST OF SYMBOLS REFERENCED IN THIS ARTICLE.
SYMBOL                                                     MEANING
GENERAL
$s_z(t)$                                                   Original sample value ($t$th sample of spectral band $z$)
$D$                                                        Dynamic range in bits
$s_{\min}, s_{\max}$                                       Minimum and maximum allowed sample values
$s_{\text{mid}}$                                           Midrange sample value
$N_X, N_Y$                                                 Horizontal and vertical spatial dimensions of the image
$N_Z$                                                      Number of spectral bands of the image
SAMPLE REPRESENTATIVE CALCULATION
$s''_z(t)$                                                 Sample representative for $s_z(t)$
$\Theta$                                                   Sample representative resolution
$\phi_z$                                                   Sample representative damping for band $z$
$\psi_z$                                                   Sample representative offset for band $z$
PREDICTION
$\hat{s}_z(t)$                                             Predicted value for $s_z(t)$
$\Omega$                                                   Prediction weight arithmetic resolution
$P$                                                        Number of previous bands used for the prediction
$s_{z,y,x}$                                                Alternative notation for $s_z(t)$
$\sigma_{z,y,x}$                                           Local sum for $s_{z,y,x}$
$d_{z,y,x}, d^{N}_{z,y,x}, d^{W}_{z,y,x}, d^{NW}_{z,y,x}$  Local differences for $s_{z,y,x}$
$U_{z,y,x}$                                                Local difference vector for $s_{z,y,x}$
$W_{z,y,x}$                                                Prediction weight vector for $s_{z,y,x}$
$\hat{d}_{z,y,x}$                                          Predicted central local difference for $s_{z,y,x}$
$\tilde{s}_z(t)$                                           Double-resolution predicted value for $s_z(t)$
$\nu_{\min}, \nu_{\max}, t_{\text{inc}}, \varsigma^{(i)}_z, \varsigma^{*}_z$  User-specified weight update parameters
QUANTIZATION
$\Delta_z(t)$                                              Prediction error for $s_z(t)$
$q_z(t)$                                                   Quantizer index of $\Delta_z(t)$
$s'_z(t)$                                                  Clipped quantizer bin center for $\Delta_z(t)$
$a_z$                                                      Maximum absolute error in the spectral band $z$
$r_z$                                                      Maximum relative error in the spectral band $z$
$m_z(t)$                                                   Maximum reconstruction error $|s_z(t) - s'_z(t)|$
QUANTIZER INDEX MAPPING
$\delta_z(t)$                                              Mapped quantizer index for $q_z(t)$
$\theta_z(t)$                                              Scaled difference between $\hat{s}_z(t)$ and the closest of $s_{\min}$ and $s_{\max}$
ENTROPY CODING
$U_{\max}$                                                 Golomb-power-of-2 (GPO2) length limit
$\Sigma_z(t)$                                              Accumulator value for $\delta_z(t)$
$\Gamma(t)$                                                Counter value for $\delta_z(t)$
$\gamma^{*}$                                               Sample-adaptive rescaling counter size
$k_z(t)$                                                   GPO2 code index for $\delta_z(t)$
$\tilde{\Sigma}_z(t)$                                      High-resolution counter value for $\delta_z(t)$
(symbol missing in source)                                 Hybrid code index
(symbol missing in source)                                 Hybrid code entropy-threshold constants
(symbol missing in source)                                 Hybrid code symbol-limit constants
(symbol missing in source)                                 Hybrid code escape symbol

QUANTIZATION
The CCSDS 123.0-B-2 standard allows for quantization of each prediction error $\Delta_z(t)$ into a quantizer index $q_z(t)$ so that $\Delta_z(t)$ (and, thus, also the input sample $s_z(t)$) can be reconstructed with maximum error $m_z(t)$. A quantizer with uniform bin size $2m_z(t) + 1$ is used, i.e.,
$$q_z(t) = \operatorname{sgn}(\Delta_z(t)) \cdot \left\lfloor \frac{|\Delta_z(t)| + m_z(t)}{2m_z(t) + 1} \right\rfloor, \tag{2}$$
where the sgn function is defined as
$$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0. \end{cases} \tag{3}$$
Users control $m_z(t)$ indirectly by selecting the maximum absolute error $a_z$, the maximum relative error $r_z$, or both for each spectral band $z$. When only absolute error limits are specified,
$$m_z(t) = a_z. \tag{4}$$
When only relative error limits are set,
$$m_z(t) = \left\lfloor \frac{r_z \, |\hat{s}_z(t)|}{2^{D}} \right\rfloor, \tag{5}$$
where $\hat{s}_z(t)$ is the predicted value for the original sample $s_z(t)$. Setting relative error limits allows for the reconstruction of different samples with dissimilar degrees of precision. More specifically, the samples predicted to have a smaller magnitude are reconstructed with lower error. Note that predicted, rather than actual, sample values are used in (5) to keep the encoder and the decoder synchronized. Thus, absolute error bounds are not guaranteed when only a relative error limit $r_z > 0$ is specified. When both the absolute and relative error limits are used, $m_z(t)$ is set to the minimum of (4) and (5). When lossless compression is desired in band $z$, users may set $a_z = 0$ or $r_z = 0$ so that $m_z(t) = 0$. This guarantees that $q_z(t) = \Delta_z(t)$, i.e., that the original samples can be reconstructed exactly.
It is worth emphasizing that error limits can be set individually for each spectral band. With this mechanism, higher-importance bands can be reconstructed with greater fidelity (even perfect fidelity), while lesser-priority bands can be represented with lower fidelity using smaller compressed data volumes [56], [64]–[6]. Furthermore, the periodic error limit update option can be activated so that different fidelity choices can be adapted within a band. This option is useful to meet a given downlink transmission rate constraint and/or to better preserve the image regions expected to contain features of interest. It should be highlighted that the standard does not define a specific method for selecting error limit values, e.g., to meet a given downlink rate. This is because error limit values are encoded in the bitstream, and thus, the decoder does not need to know how those error limits were selected.
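As an illustration of (2), (4), and (5), the following sketch computes the maximum error $m_z(t)$ from the user-selected limits and applies the uniform quantizer. The function names and the convention of passing `None` for an unused limit are assumptions of this example; boundary handling and the way limits are signaled in the bitstream are not covered.

```python
def sgn(x):
    """Sign function of (3)."""
    return (x > 0) - (x < 0)


def max_error(predicted, bit_depth, abs_limit=None, rel_limit=None):
    """Maximum error m_z(t): (4), (5), or their minimum when both limits are set."""
    candidates = []
    if abs_limit is not None:
        candidates.append(abs_limit)                                   # (4)
    if rel_limit is not None:
        candidates.append((rel_limit * abs(predicted)) >> bit_depth)   # (5)
    return min(candidates) if candidates else 0                        # 0 means lossless


def quantize(delta, m):
    """Quantizer index of (2) for prediction error delta."""
    return sgn(delta) * ((abs(delta) + m) // (2 * m + 1))


def reconstruct(predicted, q, m):
    """Center of the quantizer bin identified by q (before clipping)."""
    return predicted + q * (2 * m + 1)
```

For example, with $D = 16$ and $r_z = 512$, a sample predicted as 30,000 may be reconstructed with an error of up to 234, whereas a sample predicted as 1,000 is reconstructed with an error of, at most, 7, which illustrates how relative limits assign higher precision to samples of smaller predicted magnitude.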
SAMPLE REPRESENTATIVES
The decompressor must duplicate the prediction operation performed by the compressor, but, in general, the original image samples $s_z(t)$ cannot be perfectly reconstructed from the compressed bitstream because of information lost during the quantization stage. Consequently, the prediction calculation (in both the compressor and decompressor) is performed using sample representatives $s''_z(t)$ in place of the original samples $s_z(t)$.
A naive solution to this problem is to use the central point $s'_z(t)$ of the quantizer bin whose index $q_z(t)$ is transmitted to the decoder. The quantizer bin center $s'_z(t)$ can be calculated as
$$s'_z(t) = \operatorname{clip}\big(\hat{s}_z(t) + q_z(t) \cdot (2m_z(t) + 1),\; s_{\min},\; s_{\max}\big), \tag{6}$$
where $s_{\min}$ and $s_{\max}$ are the minimum and maximum values, respectively, allowed for an input sample and
$$\operatorname{clip}(x, a, b) = \min(b, \max(a, x)) \tag{7}$$
guarantees that $s'_z(t)$ falls within the allowed value range. However, using the quantizer bin center $s'_z(t)$ as the sample representative $s''_z(t)$ for prediction does not always minimize compressed data volume [37]. This is true even for $m_z(t) = 0$, i.e., lossless compression.
In the CCSDS 123.0-B-2 standard, three user-specified parameters can be used to adjust the choice of $s''_z(t)$. These are the sample representative resolution ($\Theta$), damping ($\phi_z$), and offset ($\psi_z$) parameters. Based on them, sample representatives $s''_z(t)$ are defined as an integer approximation to
$$\frac{\phi_z}{2^{\Theta}} \hat{s}_z(t) + \left(1 - \frac{\phi_z}{2^{\Theta}}\right) \left(s'_z(t) - \frac{\psi_z}{2^{\Theta}} \operatorname{sgn}(q_z(t))\, m_z(t)\right). \tag{8}$$
Regardless of the parameter choice, the sample representatives always fall between $s'_z(t)$ and $\hat{s}_z(t)$. Parameter $\Theta$ determines the precision with which representatives are computed. Parameter $\phi_z$ limits the effect of noisy samples in the representative calculation. In turn, parameter $\psi_z$ establishes a bias toward $s'_z(t)$ or $\hat{s}_z(t)$, depending on its value. Although $\Theta$ is defined for the whole image, $\phi_z$ and $\psi_z$ can be chosen on a band-by-band basis. Setting $\phi_z = \psi_z = 0$ causes the sample representatives to be equal to $s'_z(t)$; larger values of $\phi_z$ and/or $\psi_z$ produce representatives closer to $\hat{s}_z(t)$. Note that, depending on the parameter choice, $s''_z(t)$ may not be contained in the quantizer bin identified by $q_z(t)$. The empirical results indicate that setting the damping and offset parameters to values other than zero tends to provide larger benefits to compression performance when spectral bands are closer in wavelength and for images with larger noise prevalence [37].
PREDICTION
The predicted sample value $\hat{s}_z(t)$ for an input sample $s_z(t)$ is computed causally using sample representatives from spectral bands $z - P, \ldots, z$, where $P \geq 0$ is a user-defined parameter. Within each band, previous sample representatives are used to compute local sums. These can be regarded as preliminary, scaled estimates of the actual sample value. Local sums, in combination with the sample representatives, are used to compute local differences. The predicted value $\hat{s}_z(t)$ is then calculated using the local sum in the current band $z$ as well as a weighted sum of local differences from the current and previous bands. Local sums can be understood as a local mean subtraction, and prediction as being made in the mean-subtracted domain. Figure 2 displays an overview of the prediction process. Its stages are more precisely described in the following.

[Figure 2: block diagram. For each of the bands $z - P, \ldots, z - 1$, the sample representatives and their local sums yield central local differences; for the current band $z$, they yield directional local differences (full prediction only). These form the local difference vector $U_{z,y,x}$, whose inner product with the prediction weight vector $W_{z,y,x}$ gives the predicted central local difference $\hat{d}_{z,y,x}$, which is combined with the local sum $\sigma_{z,y,x}$ to obtain the predicted value.]
FIGURE 2. An overview of the prediction block in CCSDS 123.0-B-2.

Local sums are computed from previous sample representatives using one of the four available modes. Similar to CCSDS 123.0-B-1, each mode is either neighbor- or column-oriented. As a novelty of Issue 2, modes can now be narrow instead of wide. The sample representatives used to calculate the local sums depend on the selected mode, as depicted in Figure 3. In the figure and hereafter, $s_{z,y,x}$ is used to denote the current sample $s_z(t)$, which makes explicit the band index $z$ as well as the spatial coordinates $(x, y)$ within the band. In all of the modes, the highlighted sample representatives are multiplied by the factor indicated in Figure 3 and added together to obtain the local sum $\sigma_{z,y,x}$ corresponding to $s_{z,y,x}$. For instance, the narrow neighbor-oriented local sums are computed as
$$\sigma_{z,y,x} = s''_{z,y-1,x-1} + 2 s''_{z,y-1,x} + s''_{z,y-1,x+1}. \tag{9}$$
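The sample representative calculation of (6)–(8) and the narrow neighbor-oriented local sum of (9), which operates on those representatives, can be sketched as follows. This is a hedged illustration rather than the normative definition: the function names are hypothetical, (8) is evaluated with round-to-nearest integer arithmetic (the standard specifies its own integer formula), and image edges are not handled.

```python
def sgn(x):
    """Sign function of (3)."""
    return (x > 0) - (x < 0)


def clip(x, lo, hi):
    """Clipping function of (7)."""
    return min(hi, max(lo, x))


def bin_center(predicted, q, m, s_min, s_max):
    """Clipped quantizer bin center of (6)."""
    return clip(predicted + q * (2 * m + 1), s_min, s_max)


def sample_representative(predicted, q, m, s_min, s_max, theta=0, phi=0, psi=0):
    """Round-to-nearest integer approximation of (8).

    Expression (8) is scaled by 2^(2*theta) so that only integers are
    involved; the rounding rule is an assumption of this sketch.
    """
    center = bin_center(predicted, q, m, s_min, s_max)
    scale = 1 << theta
    numerator = (phi * scale * predicted
                 + (scale - phi) * (scale * center - psi * sgn(q) * m))
    denominator = scale * scale
    return (2 * numerator + denominator) // (2 * denominator)


def narrow_neighbor_local_sum(representatives, y, x):
    """Narrow neighbor-oriented local sum of (9) for one band.

    `representatives` is a list of rows of sample representatives.
    """
    north = representatives[y - 1]
    return north[x - 1] + 2 * north[x] + north[x + 1]
```

For instance, with a predicted value of 100, $q_z(t) = 2$, and $m_z(t) = 3$, the bin center is 114; choosing $\Theta = 2$ and $\phi_z = \psi_z = 1$ moves the representative to 110, i.e., toward the predicted value.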

As can be observed in Figure 3, column-oriented local sums employ sample representatives at the same $x$ coordinate, whereas neighbor-oriented sums also use sample representatives at contiguous $x$ coordinates. In turn, the new narrow option removes the dependency on $s''_{z,y,x-1}$, which facilitates pipelining in a hardware implementation at the cost of some compression performance loss [37]. Note that wide and narrow column-oriented modes are identical in the general case. Notwithstanding, only the wide column-oriented mode uses $s''_{z,y,x-1}$ for calculating local sums at the first row, i.e., $y = 0$, of each spectral band.
Local differences are computed based on the sample representatives and the local sums. For an input sample $s_{z,y,x}$, up to four local difference types are computed: the central difference ($d_{z,y,x}$) and three directional differences, i.e., north ($d^{N}_{z,y,x}$), west ($d^{W}_{z,y,x}$), and northwest ($d^{NW}_{z,y,x}$). They are defined as follows:
$$\begin{aligned} d_{z,y,x} &= 4 s''_{z,y,x} - \sigma_{z,y,x}, \\ d^{N}_{z,y,x} &= 4 s''_{z,y-1,x} - \sigma_{z,y,x}, \\ d^{W}_{z,y,x} &= 4 s''_{z,y,x-1} - \sigma_{z,y,x}, \\ d^{NW}_{z,y,x} &= 4 s''_{z,y-1,x-1} - \sigma_{z,y,x}. \end{aligned} \tag{10}$$
The predicted sample value is then computed using either the full or reduced prediction modes. In the full prediction mode, the local difference vector $U_{z,y,x}$ is defined using directional differences from the current spectral band and central differences from the previous bands:
$$U_{z,y,x} = \left[ d^{N}_{z,y,x},\; d^{W}_{z,y,x},\; d^{NW}_{z,y,x},\; d_{z-1,y,x},\; \ldots,\; d_{z-P,y,x} \right]. \tag{11}$$
In the reduced prediction mode, the local difference vector uses only central differences from previous bands:
$$U_{z,y,x} = \left[ d_{z-1,y,x},\; \ldots,\; d_{z-P,y,x} \right]. \tag{12}$$
In both modes, a prediction weight vector $W_{z,y,x}$ is used to obtain a weighted sum of local differences, called the predicted central local difference, as
$$\hat{d}_{z,y,x} = W^{T}_{z,y,x} U_{z,y,x}. \tag{13}$$
The predicted sample is then calculated as an integer approximation to
$$\hat{s}_{z,y,x} \approx \left\lfloor \frac{\hat{d}_{z,y,x} + 2^{\Omega} \sigma_{z,y,x}}{2^{\Omega + 2}} \right\rfloor, \tag{14}$$
where $\Omega$ is a parameter that controls arithmetic precision.
The initial prediction weight vector for each band, $W_{z,0,0}$, can be defined based on default or user-provided values. In either case, vector elements are updated after processing each input sample $s_z(t)$. The updates are based on the obtained prediction error and several user-defined parameters, namely, $\nu_{\min}$, $\nu_{\max}$, $\varsigma^{(i)}_z$, $\varsigma^{*}_z$, and $t_{\text{inc}}$, which control the rate at which weights are adapted to the original image statistics. More precisely, the smaller values of $\varsigma^{(i)}_z$, $\varsigma^{*}_z$, $\nu_{\min}$, $\nu_{\max}$, and $1/t_{\text{inc}}$ typically produce larger weight updates. This results in a faster adaptation to the source statistics at the cost of worse steady-state compression performance [37].
It is important to highlight that two prediction modes (full and reduced) as well as two different local sum types (column and neighbor oriented) are available in Issue 2 so that prediction is effective for the image data produced by different types of instruments. For instance, when streaking artifacts are present in the images, reduced column-oriented prediction tends to produce the best results [37].

[Figure 3: three panels showing the neighbors of the current sample $s_{z,y,x}$ and their multiplying factors. (a) 1× each of $s''_{z,y-1,x-1}$, $s''_{z,y-1,x}$, $s''_{z,y-1,x+1}$, and $s''_{z,y,x-1}$; (b) 1× $s''_{z,y-1,x-1}$, 2× $s''_{z,y-1,x}$, and 1× $s''_{z,y-1,x+1}$; (c) 4× $s''_{z,y-1,x}$.]
FIGURE 3. The local sum calculation modes available in Issue 2. The current sample position is highlighted with a blue border. The sample representatives employed for the corresponding local sum are shown in orange. (a) Wide neighbor-oriented, (b) narrow neighbor-oriented, and (c) column-oriented.

QUANTIZER INDEX MAPPING
The prediction errors $\Delta_z(t)$ obtained in (1) as well as their corresponding quantizer indices $q_z(t)$ defined in (2) may be negative. However, the entropy coders available in CCSDS 123.0-B are defined for nonnegative input values. The quantizer index mapping stage depicted in Figure 1 provides a one-to-one mapping between valid quantizer indices and nonnegative values, referred to as mapped quantizer indices and denoted as $\delta_z(t)$.
This functional block remains unaltered with respect to the previous issue of the standard [35]. A key property

of this mapping is that indices can be represented using the same number of bits as in the original image. This is true because predicted values are guaranteed to satisfy $\hat{s}_z(t) \in [s_{\min}, s_{\max}]$; i.e., predictions do not exceed the range of allowed sample input values given bit depth $D$. Thus, the number of possible prediction errors equals the number of elements in the aforementioned interval. Based on this, the mapping is defined as
$$\delta_z(t) = \begin{cases} |q_z(t)| + \theta_z(t), & |q_z(t)| > \theta_z(t) \\ 2|q_z(t)|, & 0 \leq (-1)^{\tilde{s}_z(t)} q_z(t) \leq \theta_z(t) \\ 2|q_z(t)| - 1, & \text{otherwise,} \end{cases} \tag{15}$$
where $\tilde{s}_z(t)$ is a double-resolution version of the predicted sample value defined in the “Prediction” section, and $\theta_z(t)$ is the difference between the predicted value and the nearest interval endpoint, i.e.,
$$\theta_z(t) = \min\left( \left\lfloor \frac{\hat{s}_z(t) - s_{\min} + m_z(t)}{2m_z(t) + 1} \right\rfloor,\; \left\lfloor \frac{s_{\max} - \hat{s}_z(t) + m_z(t)}{2m_z(t) + 1} \right\rfloor \right). \tag{16, 17}$$
ENCODER STAGE
The encoder stage compresses the sequence of mapped quantizer indices $\delta_z(t)$ produced by the predictor stage into a variable-length bitstream. This operation is reversible, meaning that an identical sequence of mapped quantizer indices can be recovered from the bitstream. These indices allow for an exact or approximate reconstruction of the input image, depending on the error limits set in the predictor stage.
In Issue 2, three coders are available for this purpose: sample-adaptive, block-adaptive, and hybrid. The user must select one of them to code all the mapped quantizer indices for an image. The first two encoding options were already present in the previous issue of the standard [35], while the hybrid coder is new in Issue 2. The hybrid coder tends to provide better compression performance than the other two options, but the benefit may be small when compression is lossless. An overview of the three available coders is provided in the following sections.
BLOCK-ADAPTIVE CODER
The block-adaptive coder is defined in a separate CCSDS standard, originally specified in [38] and later extended in [40], and is based on Rice coding. In this coder, the samples are partitioned into disjoint blocks of a fixed length of between eight and 64 samples. Each block is encoded using the most effective of five available coding methods: zero block, second extension, fundamental sequence, sample splitting, and no compression. A simplified diagram of this process is shown in Figure 4. Interested readers are referred to [67] for a summary of key operational concepts and a detailed performance analysis of this coder.
Zero Block

Mapped
Quantizer
Indices
δz(t)

Mapped
Quantizer
Index
Block
Block
Splitting

Fundamental
Sequence
Second
Extension
Sample
Splitting k = 1
Sample
Splitting k = 2
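Because all three coders consume the mapped quantizer indices, it may help to see the mapping of (15)–(17) written out. The following Python sketch is an illustration under stated assumptions, not the normative procedure: the function and argument names are hypothetical, all quantities are plain integers, and the caller is assumed to supply the quantizer index $q_z(t)$, the predicted value $\hat{s}_z(t)$, its double-resolution version $\tilde{s}_z(t)$, and the value $m_z(t)$ that appears in (16)–(17).

```python
def theta(s_hat, m, s_min, s_max):
    """Distance to the nearest interval endpoint, following (16)-(17)."""
    step = 2 * m + 1
    return min((s_hat - s_min + m) // step, (s_max - s_hat + m) // step)


def map_quantizer_index(q, s_tilde, th):
    """Mapped quantizer index delta_z(t), following (15)."""
    sign = -1 if s_tilde % 2 else 1          # (-1) ** s_tilde
    if abs(q) > th:
        return abs(q) + th
    if 0 <= sign * q <= th:
        return 2 * abs(q)
    return 2 * abs(q) - 1


# Hypothetical toy case: 4-bit unsigned samples (0..15), m = 0, s_hat = 3.
th = theta(s_hat=3, m=0, s_min=0, s_max=15)
mapped = [map_quantizer_index(q, s_tilde=6, th=th) for q in range(-3, 13)]
print(th, sorted(mapped))
```

In this toy case, the 16 possible quantizer indices (from −3 to 12) are mapped one-to-one onto the values 0 to 15, so the mapped indices fit in the same 4 bits as the input samples.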

FIGURE 4. An overview of the block-adaptive entropy coder. The coding options executed in parallel for each block are highlighted in orange.

SAMPLE-ADAPTIVE CODER
In the sample-adaptive coder, each mapped quantizer index $\delta_z(t)$ is compressed using a variable-length codeword from a family of length-limited Golomb-power-of-2 (GPO2) codes. Each GPO2 code is identified by an index $k$, which is selected based on the statistics of previously coded samples. Given $k$ and $\delta_z(t)$, the selected codeword is denoted as $\mathfrak{R}_k(\delta_z(t))$ and defined as follows:
◗◗ If $\lfloor \delta_z(t)/2^k \rfloor < U_{\max}$, $\mathfrak{R}_k(\delta_z(t))$ consists of $\lfloor \delta_z(t)/2^k \rfloor$ zeros, followed by a one, followed by the $k$ least-significant bits of the binary representation of $\delta_z(t)$.
◗◗ Otherwise, $\mathfrak{R}_k(\delta_z(t))$ consists of $U_{\max}$ zeros, followed by the binary representation of $\delta_z(t)$ using $D$ bits.
Here, $U_{\max}$ is a user-specified parameter utilized to limit the maximum codeword length, and $D$ is the image’s bit depth.
Two variables are used to keep track of the input data statistics and to choose the GPO2 family’s index $k_z(t)$ to code $\delta_z(t)$: an accumulator $\Sigma_z(t)$ and a counter $\Gamma(t)$. The ratio of these two variables determines $k_z(t)$:
◗◗ If $2\Gamma(t) > \Sigma_z(t) + \lfloor 49\,\Gamma(t)/2^7 \rfloor$, then $k_z(t) = 0$.
◗◗ Otherwise, $k_z(t)$ is the largest positive integer such that
$$k_z(t) \le D - 2, \qquad \Gamma(t)\, 2^{k_z(t)} \le \Sigma_z(t) + \left\lfloor \frac{49\,\Gamma(t)}{2^7} \right\rfloor. \tag{18}$$
Mapped quantizer indices typically follow a nonstationary geometric distribution, for which $k_z(t)$ is a good parameter estimator. Note that the counter and accumulator variables are initialized based on user-specified parameters.
The values of the counter and the accumulator variables are updated after coding each input sample $\delta_z(t - 1)$. More specifically, $\Gamma$ is increased by one, and $\Sigma$ is increased by $\delta_z(t - 1)$. In addition, both $\Gamma$ and $\Sigma$ are periodically divided by two (rounding down) to enable calculation using finite-precision arithmetic. This division is hereafter referred to as renormalization.
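The codeword construction and the choice of the code index in (18) can be sketched in a few lines of Python. This is a simplified illustration, not an implementation of the standard: the function names are hypothetical, codewords are returned as strings of '0' and '1' for readability, and the initialization and renormalization of the counter and accumulator are left out.

```python
def gpo2_codeword(delta, k, u_max, depth):
    """Length-limited GPO2 codeword for delta, as a '0'/'1' string."""
    quotient = delta >> k                      # floor(delta / 2**k)
    if quotient < u_max:
        # k least-significant bits of delta (empty string when k == 0)
        low_bits = format(delta & ((1 << k) - 1), "b").zfill(k) if k else ""
        return "0" * quotient + "1" + low_bits
    # Long-codeword case: u_max zeros, then delta written with D bits
    return "0" * u_max + format(delta, "b").zfill(depth)


def select_k(counter, accumulator, depth):
    """Code index k_z(t) from the counter and accumulator, following (18)."""
    bound = accumulator + ((49 * counter) >> 7)
    if 2 * counter > bound:
        return 0
    k = 1  # assumes depth >= 3, so that k = 1 is an allowed index
    while k + 1 <= depth - 2 and (counter << (k + 1)) <= bound:
        k += 1
    return k


print(gpo2_codeword(5, k=1, u_max=8, depth=4))          # two zeros, a one, one low bit
print(gpo2_codeword(9, k=0, u_max=8, depth=4))          # eight zeros, then 9 in 4 bits
print(select_k(counter=16, accumulator=100, depth=12))
```

With the hypothetical values above, the index 5 coded with $k = 1$ yields the 4-bit codeword 0011, whereas the index 9 coded with $k = 0$ and $U_{\max} = 8$ falls into the second case and costs $U_{\max} + D = 12$ bits.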
HYBRID CODER
The hybrid coder uses the statistics of previously encoded data to classify each input mapped quantizer index as either a high- or low-entropy sample. The high-entropy samples are compressed using a variation of the length-limited GPO2 code family described in the “Sample-Adaptive Coder” section. The low-entropy samples are coded using another family of 16 variable-to-variable-length codes, i.e., several input samples can be encoded with a single codeword. A detailed description of these variable-to-variable-length codes can be found in [68]. The ability to adaptively switch between GPO2 and variable-to-variable-length codes gives this coder the name hybrid.
Variable-to-variable-length codes enable very efficient compression of highly predictable (low-entropy) samples, which become more prevalent when near-lossless error limits are used in the predictor stage. Meanwhile, variable-to-variable-length codes introduce variability in the latency between the arrival of a low-entropy mapped quantizer index and the output of the codeword that encodes it. To accommodate this, codewords emitted by the hybrid coder are designed so that they can be decoded in reverse order. This is possible thanks to two main properties of the coder. First, output codewords are suffix-free rather than prefix-free. Second, the compressed image ends with a specification of the final state of the coder. A set of flush tables is provided in the standard to signal the code states in an unambiguous and compact manner. Reverse decoding allows for simpler and more memory-efficient implementations than does FLEX’s original hybrid entropy coder [62]. The remainder of this section describes Issue 2’s hybrid coder. A flow diagram of this coder’s logic is provided in Figure 5 to support this description.

FIGURE 5. A flow diagram of CCSDS 123.0-B-2’s hybrid coder. The logical decisions are highlighted in orange, the processes that update the codes’ internal state are shown in green, and the processes that emit codewords are shown in purple.

The classification of samples as high or low entropy is performed using a similar statistical approach to that of the sample-adaptive coder. Two variables are used to keep track of these statistics: a counter $\Gamma(t)$ and a high-resolution accumulator $\tilde{\Sigma}_z(t)$. These variables are updated the same way as in the sample-adaptive coder, with two main differences. First, variables are updated before coding the input sample; this is done so that decoding can proceed in reverse order. To this effect, the least-significant bit of the accumulator variable is output before renormalization so that the decoder can invert this process. Second, $\tilde{\Sigma}_z(t)$ is increased by $4\delta_z(t)$ instead of $\delta_z(t)$ to enable a more precise estimation of the input data statistics. The ratio $\tilde{\Sigma}_z(t)/\Gamma(t)$ determines whether a sample is a high- or low-entropy symbol. More specifically, $\delta_z(t)$ is defined as high entropy if and only if
$$\tilde{\Sigma}_z(t) \cdot 2^{14} \ge T_0 \cdot \Gamma(t), \tag{19}$$
where $T_0$ is a constant provided in the standard. This definition allows well-predicted image regions to be coded with the low-entropy codes, while the high-entropy mode is used otherwise.
Each high-entropy sample is encoded using a family of reversed, length-limited GPO2 codes. As in the sample-adaptive case, each code is identified by an index, $k_z(t)$. For the hybrid coder, $k_z(t)$ is the largest positive integer that satisfies
$$k_z(t) \le \max(D - 2,\, 2), \qquad \Gamma(t)\, 2^{k_z(t) + 2} \le \tilde{\Sigma}_z(t) + \left\lfloor \frac{49\,\Gamma(t)}{2^5} \right\rfloor. \tag{20}$$
The codeword emitted for the high-entropy sample $\delta_z(t)$, $\mathfrak{R}'_{k_z(t)}(\delta_z(t))$, is defined as follows:
◗◗ If $\lfloor \delta_z(t)/2^{k_z(t)} \rfloor < U_{\max}$, then $\mathfrak{R}'_{k_z(t)}(\delta_z(t))$ consists of the $k_z(t)$ least-significant bits of the binary representation of $\delta_z(t)$, followed by a one, followed by $\lfloor \delta_z(t)/2^{k_z(t)} \rfloor$ zeros.
◗◗ Otherwise, $\mathfrak{R}'_{k_z(t)}(\delta_z(t))$ consists of the $D$-bit binary representation of $\delta_z(t)$ followed by $U_{\max}$ zeros.
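The high-entropy path described by (19), (20), and the two codeword cases above can be sketched as follows. The Python code is a hedged illustration only: function names are hypothetical, codewords are returned as '0'/'1' strings, and the threshold passed as `t0` in the example is a placeholder rather than the constant $T_0$ tabulated in the standard.

```python
def is_high_entropy(acc_hr, counter, t0):
    """Classification test of (19); t0 stands for the constant T_0."""
    return (acc_hr << 14) >= t0 * counter


def select_k_hybrid(acc_hr, counter, depth):
    """Largest k satisfying (20); assumes k = 1 already satisfies it."""
    bound = acc_hr + ((49 * counter) >> 5)
    limit = max(depth - 2, 2)
    k = 1
    while k + 1 <= limit and (counter << (k + 3)) <= bound:
        k += 1
    return k


def reversed_gpo2_codeword(delta, k, u_max, depth):
    """Reversed length-limited GPO2 codeword, as a '0'/'1' string."""
    quotient = delta >> k
    if quotient < u_max:
        low_bits = format(delta & ((1 << k) - 1), "b").zfill(k) if k else ""
        return low_bits + "1" + "0" * quotient
    return format(delta, "b").zfill(depth) + "0" * u_max


# Hypothetical state: high-resolution accumulator 400, counter 16, D = 12.
print(is_high_entropy(400, 16, t0=300000))      # t0 is a placeholder value
print(select_k_hybrid(400, 16, depth=12))
print(reversed_gpo2_codeword(5, k=1, u_max=8, depth=4))
```

Compared with the sample-adaptive codeword for the same inputs, the three fields appear in the opposite order (low bits, then the one, then the zeros), which is what allows a decoder to parse the bitstream from its end.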
The low-entropy samples are processed with one of 16 available variable-to-variable-length codes. The code index used to process a low-entropy sample $\delta_z(t)$ is the largest $i$ satisfying
$$\tilde{\Sigma}_z(t) \cdot 2^{14} < \Gamma(t) \cdot T_i, \qquad 0 \le i \le 15, \tag{21}$$
where $T_0, \ldots, T_{15}$ are constants provided in the standard, and $T_0$ is used in (19). This definition allows for the magnitude of recent prediction errors to determine the next variable-to-variable-length code to be used.
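The selection rule in (21) amounts to a threshold search, as the following Python sketch shows. The function name is hypothetical, and the four thresholds used in the example are toy values chosen for illustration; the 16 constants $T_0, \ldots, T_{15}$ of the standard are not reproduced here.

```python
def low_entropy_code_index(acc_hr, counter, thresholds):
    """Largest i with acc_hr * 2**14 < counter * thresholds[i], as in (21).

    Returns None when not even i = 0 qualifies, which by (19) corresponds
    to a high-entropy sample.
    """
    scaled = acc_hr << 14
    candidates = [i for i, t in enumerate(thresholds) if scaled < counter * t]
    return max(candidates) if candidates else None


toy_thresholds = [1000, 500, 250, 100]   # placeholders, not from the standard
print(low_entropy_code_index(acc_hr=1, counter=64, thresholds=toy_thresholds))
print(low_entropy_code_index(acc_hr=10, counter=1, thresholds=toy_thresholds))
```

With decreasing thresholds, smaller accumulator-to-counter ratios (i.e., better-predicted data) select larger code indices.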
Each code $i$ has a prefix of previously input samples. When a sample is processed, a symbol is added to the corresponding code’s prefix. The standard defines a list of complete prefixes for each code. At this point, if code $i$’s prefix matches any of those complete prefixes, a codeword that uniquely identifies that prefix and its associated sequence of input samples is emitted. After that, the prefix for that code is cleared.
It is worth noting that the complete prefixes defined for code $i$ cannot contain sample values satisfying $\delta_z(t) > L_i$, where $L_0, \ldots, L_{15}$ are constants defined in the standard. When such a sample, referred to as an unlikely sample, is processed, $\mathfrak{R}'_0(\delta_z(t) - L_i - 1)$ is emitted, and an escape symbol X is added to the prefix instead of $\delta_z(t)$. Adding X to any code’s prefix is guaranteed to make it complete and trigger the emission of an output codeword. The input symbol limit $L_i$ limits the size of the input alphabet in the low-entropy codes by treating all of the unlikely symbols in the same way. This makes it possible to reduce the number of codewords in a code. As escape symbols occur with low probability, the efficiency with which these residual values are encoded has only a small impact on the overall coding effectiveness.
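The prefix and escape mechanism can be made concrete with a toy example. The Python sketch below handles a single low-entropy code and is an illustration only: the table of complete prefixes, its codewords, and the symbol limit are hypothetical and far smaller than the tables in the standard, and the escape codeword ignores the $U_{\max}$ length limit.

```python
# Hypothetical complete prefixes for one toy code with symbol limit L = 1.
# Keys are sequences over {0, 1, X}; values are made-up output codewords.
TOY_COMPLETE_PREFIXES = {"00": "11", "01": "01", "0X": "010", "1": "100", "X": "000"}


def escape_codeword(value):
    """Reversed GPO2 codeword with k = 0 (length limit ignored here)."""
    return "1" + "0" * value


def encode_low_entropy(samples, limit, complete_prefixes):
    """Return (emitted bit strings, pending prefix) for one toy code."""
    emitted, prefix = [], ""
    for delta in samples:
        if delta > limit:                         # unlikely sample
            emitted.append(escape_codeword(delta - limit - 1))
            prefix += "X"                         # escape symbol replaces delta
        else:
            prefix += str(delta)
        if prefix in complete_prefixes:           # prefix complete: emit, clear
            emitted.append(complete_prefixes[prefix])
            prefix = ""
    return emitted, prefix


print(encode_low_entropy([0, 0, 1, 0, 3, 0], limit=1,
                         complete_prefixes=TOY_COMPLETE_PREFIXES))
```

For this input, the sketch emits 11 (for the pair 0, 0), 100 (for 1), the escape codeword 10 (for the residual 3 − 1 − 1 = 1), and 010 (for the prefix 0X), and it leaves the final 0 pending in the prefix. In the actual coder, such pending prefixes are signaled at the end of the compressed image by means of the flush tables mentioned earlier.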
COMPRESSION PERFORMANCE
EXPERIMENTAL SETUP
The lossless and near-lossless compression performances
of Issue 2 are explained in this section. The results are
provided for both the block- and sample-adaptive entropy
coders already present in Issue 1 and are compared to those
of the new hybrid coder defined in Issue 2. The hybrid
coder’s computational complexity is comprehensively addressed in [69], so the execution time results are not presented here. The empirical results were obtained using a
varied corpus of 17 multispectral images, 38 hyperspectral
images, and two sounder data samples. These were generated by 14 different instruments deployed in real missions,
except for the Pleiades images, which are simulated. Most
of the images included are raw, giving more weight to the
direct compression of images as they are acquired, while
the nonraw instances processed after acquisition are also
included to represent some possible onboard calibration.
Both pushbroom and whiskbroom sensors are covered
in the corpus and include the streaking artifacts that are
characteristic of pushbroom instruments (such as Hyperion) in uncalibrated images. A diverse range of spectral
separations is considered, and examples of images with
significant noise levels (the Moon Mineralogy Mapper) or
that are acquired with airborne instruments [the Compact
Airborne Spectrographic Imager (CASI)] are included as
well. Regarding the dynamic range, all the hyperspectral
and sounding instruments produce data with bit depths
of at least 11 bits, whereas, for multispectral instruments,
samples of lower bit depths are available, too.
A summary of this corpus, produced by the CCSDS MHDC
Working Group, is provided in Table 3. All of the images are
publicly available, except for those produced by the Infrared
Atmospheric Sounding Interferometer (IASI) and Meteosat
Second Generation instruments, due to licensing restrictions. (The download links for the test images can be found
at http://cwe.ccsds.org/sls/docs/sls-dc/123.0-B-Info/TestData.)
The “Entropy” column in the table represents the zero-order
entropy of the images. Note that this is not a strict bound on
compression efficiency and should be regarded as only an assessment of the difficulty of compressing the images.
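For reference, the zero-order entropy of a set of samples is the Shannon entropy of their empirical histogram, in bits per sample. The following Python sketch shows one way to compute it; the function name is hypothetical, and the input is assumed to be a flat sequence of integer sample values.

```python
from collections import Counter
from math import log2


def zero_order_entropy(samples):
    """Empirical zero-order entropy, in bits per sample."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


print(zero_order_entropy([0, 0, 1, 1]))      # two equally likely values
print(zero_order_entropy([0, 1, 2, 3]))      # four equally likely values
```

Because this measure ignores all dependencies between neighboring samples and bands, a predictive compressor can produce fewer bits per sample than the zero-order entropy, which is why the value is not a strict bound.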
The performance results are obtained by invoking Issue 2’s compressor with the default set of parameters described in [37], except for the Hyperion, IASI, Moderate Resolution Imaging Spectroradiometer (MODIS), and Système Pour l’Observation de la Terre 5 (SPOT5) instruments. For these, the following parameters are modified to enhance compression performance: $t_{inc} = 2^9$, $\nu_{\min} = \nu_{\max} = 0$, $U_{\max} = 32$, $\gamma^* = 11$, and $\gamma_0 = 4$. A full prediction with wide, neighbor-oriented local sums is used in most of the images, including the four aforementioned instruments. The column-oriented local sums are employed for images that present streaking artifacts, i.e., when the average sample values exhibit strong differences for contiguous x positions. A full analysis of the impact on performance of parameter tuning as well as an identification of images with streaking artifacts can be found in [37].
To provide a comparison baseline, the authors’ implementation of CCSDS 122.1-B-1, the reference implementation of the JPEG-LS standard, and the original authors’ implementation of multiband context-based adaptive lossless image coding (M-CALIC) [70] are included in the comparison as well. (Note that the employed JPEG-LS implementation is available at https://github.com/thorfdbg/libjpeg; to attain lossless and near-lossless compression, this compressor was invoked with parameter −ls 0.) For CCSDS 122.1-B-1, the best-performing configuration in terms of rate distortion, i.e., float discrete wavelet transform (DWT) and spectral pairwise-orthogonal transform (POT), is used. JPEG-LS is arguably the best-known compression standard; it offers low complexity and supports both lossless and near-lossless regimes. In turn, M-CALIC is another low-complexity algorithm well known for its competitive compression performance. Note that, because JPEG-LS does not admit an arbitrary number of spectral bands, images are reshaped by concatenating the bands along the y-axis. More specifically, an image with a width, height, and number of bands equal to $N_X$, $N_Y$, and $N_Z$, respectively, is transformed into a one-band image of width $N_X$ and height $N_Y \cdot N_Z$. No attempt is made to perform decorrelation across spectral bands for JPEG-LS. In contrast, M-CALIC is designed specifically to exploit spectral redundancy in hyperspectral images.
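The band concatenation applied before invoking JPEG-LS can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the image is taken to be a nested list indexed as `cube[z][y][x]`, and the actual tooling used for the experiments is not described in the text.

```python
def stack_bands_vertically(cube):
    """Concatenate the bands of cube[z][y][x] along the y-axis.

    The result is a single-band image with the original width and with
    N_Y * N_Z rows: all rows of band 0, then all rows of band 1, and so on.
    """
    return [row for band in cube for row in band]


# Toy cube with N_Z = 2 bands, N_Y = 2 rows, and N_X = 3 columns.
cube = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
image = stack_bands_vertically(cube)
print(len(image), len(image[0]))   # rows, columns
```

In this layout, the sample at position $(z, y, x)$ of the cube ends up at row $z \cdot N_Y + y$ and column $x$ of the single-band image, so vertical neighbors at band boundaries belong to different bands.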

TABLE 3. A SUMMARY OF THE EMPLOYED CORPUS PROPERTIES. THE ENTROPY (IN BITS) IS AVERAGED FOR ALL OF THE IMAGES IN EACH ROW.
INSTRUMENT | ACRONYM | IMAGE TYPE | BIT DEPTH D | ENTROPY | NUMBER OF BANDS | WIDTH | HEIGHT | NUMBER OF IMAGES
Atmospheric Infrared Sounder | AIRS | Raw | 11.2 | 1,501 | 135
Airborne Visible/Infrared Imaging Spectrometer | AVIRIS | Raw | 12.6 | 224 | 680 | 512
— | Raw | 8.6 | 224 | 614 | 512
— | Calibrated | 10.3 | 224 | 677 | 512
Compact Airborne Spectrographic Imager | CASI | Raw | 12, 13, and 15 | 11.6 | 406 | 1,225
Compact Reconnaissance Imaging Spectrometer for Mars | CRISM | FRT, raw | 10.1 | 107 | 640 | 510
— | FRT, raw | 12, 13 | 10.4 | 438 | 640 | 510
— | FRT, raw | 12, 13 | 10.6 | 545 | 640 | 510
— | HRL, raw | 12, 13 | 11.2 | 545 | 320 | 450
— | MSP, raw | 9.8 | 2,700
Hyperion | Raw | 8.5 | 242 | 256 | 1,024
Infrared Atmospheric Sounding Interferometer | IASI | Calibrated | 8,461
Landsat | Raw | 6.6 | 1,024
Moon Mineralogy Mapper | Target, raw | 9.7 | 260 | 640 | 512
— | Global, raw | 11, 12 | 9.4 | 320 | 512
Moderate Resolution Imaging Spectroradiometer | MODIS | Night, raw | 10.8 | 1,354 | 2,030
— | Day, raw | 12, 13 | 8.6 | 1,354 | 2,030
— | 500 m, raw | 12, 13 | 11.1 | 2,708 | 4,060
— | 250 m, raw | 10.4 | 5,416 | 8,120
Meteosat Second Generation | MSG | Calibrated | 8.2 | 3,712
Pleiades High Resolution | Pleiades | High resolution, simulated | 10.8 | 224 | 2,465
— | High resolution, simulated | 10.2 | 224 | 2,448
SWIR Full Spectrum Imager | SFSI | Calibrated | 9.9 | 240 | 452 | 140
— | Raw | 9, 11 | 7.4 | 240 | 496 | 140
Système Pour l’Observation de la Terre 5 High Resolution Geometric | SPOT5 | HRG, processed | 6.8 | 1,024
Vegetation | Raw | 9.4 | 1,728 | 10,080

FRT: full-resolution target; HRL: half-resolution long; MSP: multispectral survey; HRG: high-resolution geometric.


LOSSLESS COMPRESSION RESULTS
Lossless compression results are obtained for all of Issue 2’s entropy coders, for JPEG-LS, and for M-CALIC by setting the absolute error limit to zero. For each image $I$ in the test corpus, the compression ratio is defined as
$$\mathrm{CR}(I) = \frac{N_X \cdot N_Y \cdot N_Z \cdot D}{\text{compressed data size (bits)}}. \tag{22}$$
Based on this definition, higher compression ratio values indicate better compression. A distribution of the obtained compression ratios for each compressor is shown in Figure 6. Vertical bar heights indicate the relative frequency of each range of compression ratios. The average compression ratio, plus/minus one standard deviation, is denoted with a dot and two horizontal bars. Note that the aggregated results presented here and in the “Near-Lossless Compression Results” section are not necessarily representative of any particular image or instrument. This is due to their different statistical properties and the fact that a different number of images is available for each instrument.
FIGURE 6. A distribution of lossless compression ratios.

As can be observed, all three entropy coders in Issue 2 yield similar compression ratio distributions and average values. In turn, JPEG-LS and M-CALIC produce average compression ratios 25 and 13% lower, respectively, than those of Issue 2. These differences can be explained by the more advanced predictor stage used in Issue 2.
To provide further insight, the average compression ratios grouped by instrument are shown in the “Lossless” columns of Table 4 for Issue 2 using the hybrid coder, for JPEG-LS, and for M-CALIC. Consistent with the previous discussion, the CCSDS compressor yields higher compression efficiency than do JPEG-LS and M-CALIC for most instruments. Improvements of up to 63.7 and 63.4%, respectively, can be observed. Only for the MODIS instrument does JPEG-LS perform better, yielding an average compression ratio 7.7% higher than Issue 2’s with the hybrid coder. In turn, M-CALIC improves upon JPEG-LS in all cases and is able to yield results between 0.3 and 8.9% better than Issue 2 for five of the tested instruments. These differences can be explained by the fact that M-CALIC employs an arithmetic entropy coder, which enables better modeling of the source’s statistics, although at the cost of higher computational complexity.
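The compression ratio of (22) and the relative differences quoted above can be reproduced with a short Python sketch. The function names are hypothetical, and the image dimensions in the first call are made-up values; the ratios in the last two calls are the average lossless results listed for AVIRIS and MODIS in Table 4.

```python
def compression_ratio(nx, ny, nz, depth, compressed_bits):
    """Compression ratio of (22): original size over compressed size."""
    return nx * ny * nz * depth / compressed_bits


def percent_gain(ratio_a, ratio_b):
    """Relative improvement of ratio_a over ratio_b, in percent."""
    return (ratio_a / ratio_b - 1.0) * 100.0


# Hypothetical 100 x 100 x 10 image with 16-bit samples.
print(compression_ratio(100, 100, 10, 16, compressed_bits=500_000))

# Average lossless ratios from Table 4.
print(round(percent_gain(3.11, 1.90), 1))   # AVIRIS: Issue 2 (hybrid) vs. JPEG-LS
print(round(percent_gain(2.09, 1.94), 1))   # MODIS: JPEG-LS vs. Issue 2 (hybrid)
```

The last two calls return 63.7 and 7.7, which match the percentages reported in the text for these two instruments.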
NEAR-LOSSLESS COMPRESSION RESULTS
Near-lossless compression results are obtained for all three entropy coders in CCSDS 123.0-B-2 as well as for JPEG-LS and M-CALIC by limiting the maximum absolute error in any pixel of the reconstructed images. This error is hereafter denoted as peak absolute error (PAE). Two illustrative examples of near-lossless compression using Issue 2 and JPEG-LS are provided in Figure 7. In the top row, it can be observed that Issue 2’s hybrid coder enables higher image quality, i.e., a lower PAE, at similar, albeit smaller, compressed data sizes. Furthermore, for sufficiently low PAEs, reconstructed images are hardly distinguishable from the originals. In turn, the bottom row illustrates how moderately larger PAEs introduce some texture artifacts, but retain the image’s structure, and so might not hinder analysis tasks performed on it [30], [31]. A visual inspection of this row also reveals that Issue 2 introduces distortion patterns similar to those of JPEG-LS. This is expected because both algorithms apply quantization after prediction. It is worth noting that the choice of entropy coder in CCSDS 123.0-B does not affect the obtained reconstructed image, only the compressed data size. Compressed data rate differences aside, a similar discussion regarding visual quality applies for M-CALIC, too. It is omitted here for space constraints.

TABLE 4. THE AVERAGE COMPRESSION RATIO RESULTS GROUPED BY INSTRUMENT.
(For each compressor, the five values are LOSSLESS, PAE 1, PAE 2, PAE 5, and PAE 16.)
INSTRUMENT | CCSDS 123.0-B-2 (HYBRID CODER) | JPEG-LS | M-CALIC
AIRS | 2.86, 4.56, 6.09, 10.76, 35.74 | 1.89, 2.51, 2.95, 3.97, 6.68 | 2.87, 4.51, 5.92, 9.95, 27.51
AVIRIS | 3.11, 5.28, 7.66, 15.29, 37.52 | 1.90, 2.56, 3.03, 4.09, 7.06 | 3.01, 4.82, 6.31, 10.23, 21.36
CASI | 2.29, 3.22, 3.96, 5.91, 12.21 | 1.66, 2.08, 2.36, 2.96, 4.38 | 2.27, 3.17, 3.87, 5.63, 11.09
CRISM | 3.10, 5.05, 6.87, 11.15, 22.93 | 2.20, 3.08, 3.71, 5.14, 8.17 | 2.21, 3.21, 4.01, 6.08, 13.43
Hyperion | 2.86, 4.57, 6.09, 10.80, 44.75 | 2.44, 3.56, 4.48, 6.76, 13.22 | 2.79, 4.36, 5.72, 9.59, 28.38
IASI | 2.53, 3.75, 4.70, 7.17, 14.96 | 1.92, 2.56, 3.01, 4.05, 7.12 | 2.48, 3.64, 4.55, 6.94, 15.75
Landsat | 2.35, 4.12, 6.24, 12.8, 41.88 | 2.13, 3.68, 5.09, 8.46, 20.33 | 2.37, 3.97, 5.4, 9.25, 19.51
Moon Mineralogy Mapper | 4.38, 7.44, 9.61, 14.27, 24.28 | 2.72, 4.15, 5.29, 7.27, 10.33 | 2.68, 4.17, 5.42, 8.86, 22.49
MODIS | 1.94, 2.60, 3.07, 4.12, 7.35 | 2.09, 2.77, 3.24, 4.27, 6.95 | 2.13, 2.72, 3.22, 4.35, 7.39
MSG | 2.77, 4.49, 6.06, 10.01, 24.18 | 2.64, 4.20, 5.39, 8.08, 14.78 | 2.73, 4.12, 5.31, 8.22, 17.45
Pleiades | 1.66, 2.12, 2.43, 3.11, 5.04 | 1.62, 2.06, 2.36, 3.01, 4.64 | 1.68, 2.16, 2.49, 3.23, 5.18
SFSI | 3.07, 5.18, 7.02, 11.97, 53.21 | 2.58, 3.75, 4.65, 6.99, 16.5 | 2.91, 4.39, 5.65, 9.13, 30.05
SPOT5 | 1.55, 2.21, 2.74, 4.22, 10.00 | 1.45, 2.03, 2.48, 3.63, 6.69 | 1.54, 2.22, 2.74, 4.07, 8.90
Vegetation | 1.95, 2.77, 3.40, 5.04, 10.54 | 1.87, 2.61, 3.16, 4.42, 7.78 | 2.03, 2.86, 3.51, 5.08, 10.05
All | 2.67, 4.20, 5.55, 9.07, 22.98 | 2.12, 3.00, 3.67, 5.17, 9.20 | 2.35, 3.44, 4.34, 6.7, 15.43
PAE: peak absolute error.

The remainder of this section provides a quantitative discussion of the compression performance of the aforementioned algorithms in relation to the fidelity of the reconstructed images. For each compressor, PAE, and input image $I$, the compressed data rate expressed in bits per sample (bps) is computed as
$$\text{compressed data rate} = \frac{\text{compressed data size (bits)}}{N_X \cdot N_Y \cdot N_Z}. \tag{23}$$
In turn, the peak SNR (PSNR) between $I$ and its reconstructed counterpart $\hat{I}$ is defined as
$$\mathrm{PSNR}(I, \hat{I}) = 10 \cdot \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}(I, \hat{I})} \right) \ \text{(dB)}. \tag{24}$$
Here, $\mathrm{MAX}_I$ denotes the dynamic range of an image, i.e., $2^D - 1$, where $D$ is $I$’s bit depth, and $\mathrm{MSE}(I, \hat{I})$ is the mean squared error (MSE) between $I$ and $\hat{I}$, i.e.,
$$\mathrm{MSE}(I, \hat{I}) = \frac{\sum_{x}^{N_X} \sum_{y}^{N_Y} \sum_{z}^{N_Z} \left( I_{z,y,x} - \hat{I}_{z,y,x} \right)^2}{N_X \cdot N_Y \cdot N_Z}. \tag{25}$$

FIGURE 7. (a) A crop (256 × 256) of Band 220 of an original AVIRIS f060925t01p00r12_sc00 image (calibrated, 16 bit); (b) and (c) the colocated crops of the same AVIRIS image after reconstruction with CCSDS 123.0-B-2’s hybrid coder [compressed at 2.4 bits per sample (bps)] and JPEG-LS (2.9 bps) with absolute error limits of 2 and 16, respectively; (d) a crop (128 × 128) of an original SPOT5 toulouse_spot5_xs_extract1 image (processed, 8 bit); and (e) and (f) the colocated crops of the same SPOT5 image after reconstruction with CCSDS 123.0-B-2’s hybrid coder (1.1 bps) and JPEG-LS (1.4 bps) with an absolute error limit of 12. The brightness and magnification have been adjusted in all of the images to facilitate a comparison. The SPOT5 images are presented using false color.
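The rate and distortion measures of (23)–(25), together with the PAE, can be computed as in the following Python sketch. The function names are hypothetical, and images are assumed to be given as flat sequences of integer samples of equal length; infinite PSNR is returned for identical images, which is a convention chosen here rather than one stated in the text.

```python
from math import log10


def compressed_data_rate(compressed_bits, nx, ny, nz):
    """Compressed data rate of (23), in bits per sample."""
    return compressed_bits / (nx * ny * nz)


def peak_absolute_error(original, reconstructed):
    """Largest absolute sample difference (PAE)."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


def mse(original, reconstructed):
    """Mean squared error of (25) over all samples."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, depth):
    """PSNR of (24), in dB, for images with bit depth D = depth."""
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")
    return 10 * log10((2 ** depth - 1) ** 2 / error)


original = [0, 10, 20, 30]
reconstructed = [1, 9, 20, 30]
print(peak_absolute_error(original, reconstructed))
print(mse(original, reconstructed))
print(round(psnr(original, reconstructed, depth=8), 2))
```

Note that (22) and (23) are directly related: for a given image, the compression ratio equals the bit depth $D$ divided by the compressed data rate.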

The spectral angle is computed at each $(x, y)$ position for each original and reconstructed image pair, defined in [71] as
$$\alpha(x, y) = \cos^{-1} \left( \frac{\sum_{z}^{N_Z} I_{z,y,x} \cdot \hat{I}_{z,y,x}}{\sqrt{\sum_{z}^{N_Z} I_{z,y,x}^2 \cdot \sum_{z}^{N_Z} \hat{I}_{z,y,x}^2}} \right). \tag{26}$$
The mean spectral angle and maximum spectral angle metrics are defined as the average and maximum spectral angle, respectively, for all $(x, y)$ positions in the image.
Figure 8 provides near-lossless compressed data rate results for the three entropy coders of Issue 2, for JPEG-LS, and for M-CALIC, setting PAE limits between 0 (lossless) and 32. For each coder and PAE value, the plotted value is the mean compressed data rate for all the images in the corpus. Markers have been included in the figure at the integer PAE values for which data have been obtained, and linear interpolation is used between them for the sake of readability. The results indicate that, for larger PAE values, the differences between Issue 2’s coders become more apparent than for the lossless case. When compared to the block- and sample-adaptive coders, the hybrid coder yields compressed data rates up to 0.2 and 0.6 bps better, respectively. For PAE values up to 5, both JPEG-LS and M-CALIC are outperformed by all entropy coders of Issue 2. For PAE values from 20 onward, M-CALIC improves upon the block-adaptive coder. For PAEs larger than 25, JPEG-LS produces results better than the sample-adaptive coder. Nevertheless, for PAE values of 2 and above, the hybrid coder’s average results are consistently better than those of all other compressors.

FIGURE 8. The average compressed data rate in bps as a function of the maximum absolute error.

The global results presented in Figure 8 are complemented by Table 4, which reports the average compression ratios for each instrument at several PAE values. It can be observed that the per-instrument results are generally consistent with global averages, with similar exceptions as for the lossless case.
These behaviors are explained by the different predictor stages and by the way in which each coder handles the low-entropy data prevalent in near-lossless compression. The sample-adaptive coder does not have a mode in which multiple input symbols are compressed in a single codeword. Therefore, the minimum length of any sample-adaptive codeword sets a lower bound for the compression rates achievable by this coder. Both JPEG-LS and the block-adaptive coder have run-length modes that allow coding of consecutive zeros in a single codeword. Thus, their compression performance is increased as the prevalence of such runs is increased. In turn, the 16 stateful codes featured in the hybrid coder enable a more efficient processing of low-entropy data, including inputs that are not sequences of only zeros. Finally, M-CALIC’s performance improvement for higher PAEs is due to its arithmetic entropy coder, which is close to optimal for many data distributions.

FIGURE 9. The average PSNR results as a function of the average compressed data rate.

In addition to considering the compressed data rates and PAE of the reconstructed images, it is useful to consider other distortion metrics to better understand the efficiency of each coder. To complete the rate-distortion compression performance comparison, the average PSNR as a function of the average compressed data rate is plotted in Figure 9. The mean spectral angle and maximum spectral angle metrics are plotted in Figure 10(a) and (b), respectively. All of the metrics are computed for each coder, PAE value (or target bitrate, for CCSDS 122.1-B-1), and test image, and the mean values are used in the plots. Markers are placed at the obtained data points, and linear interpolation is used between them to enhance readability. As in the previous case, the hybrid coder yields better fidelity results than do the other near-lossless coders for all the metrics, especially at low compressed data rates. This and other differences among compressors are comparable to those shown in Figure 8, for similar reasons as mentioned previously in this section.
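The spectral angle metrics used in this comparison can be sketched in Python as follows. The code is illustrative only: the function names are hypothetical, images are assumed to be nested lists indexed as `image[z][y][x]`, angles are reported in degrees, and an all-zero pixel is assigned an angle of zero, which is a convention adopted here rather than one taken from [71].

```python
from math import acos, degrees, sqrt


def spectral_angle(pixel, reconstructed_pixel):
    """Angle, in degrees, between two spectra (sequences over the bands)."""
    dot = sum(a * b for a, b in zip(pixel, reconstructed_pixel))
    norm = sqrt(sum(a * a for a in pixel) * sum(b * b for b in reconstructed_pixel))
    if norm == 0:
        return 0.0                        # convention for all-zero spectra
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding
    return degrees(acos(cosine))


def spectral_angle_stats(original, reconstructed):
    """Mean and maximum spectral angle over all (x, y) positions."""
    nz, ny, nx = len(original), len(original[0]), len(original[0][0])
    angles = [
        spectral_angle(
            [original[z][y][x] for z in range(nz)],
            [reconstructed[z][y][x] for z in range(nz)],
        )
        for y in range(ny)
        for x in range(nx)
    ]
    return sum(angles) / len(angles), max(angles)


# Toy images with two bands, one row, and two columns.
original = [[[1, 1]], [[0, 1]]]
reconstructed = [[[0, 1]], [[1, 1]]]
print(spectral_angle_stats(original, reconstructed))
```

In the toy example, the first pixel changes from the spectrum (1, 0) to (0, 1), an angle of 90°, while the second pixel is unchanged, so the mean and maximum spectral angles are 45° and 90°, respectively. The maximum is driven by the single worst pixel, which is why a compressor that bounds the per-sample error tends to fare well on that metric.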
When compared to CCSDS 122.1-B-1, all of the near-lossless codecs yield significantly better PAE results. This is expected, as the CCSDS 122.1-B-1 standard is not designed to bound the maximum introduced error, but rather to minimize MSE. At low bitrates, i.e., below 1.25 bps, CCSDS 122.1-B-1 yields the best PSNR results of all the tested codecs. Again, this can be explained by the minimization goal of the standard. At higher bitrates, the hybrid coder of Issue 2 produces the best PSNR results, which illustrates the competitive performance of CCSDS 123.0-B-2. When spectral angles are considered, the relative performance of the near-lossless coders is very similar to the PAE and PSNR cases. In turn, for the mean spectral angle metric, CCSDS 122.1-B-1 improves upon all the other coders for bitrates up to 2 bps. This can be explained by the fact that CCSDS 122.1-B-1 applies a spectral transform across all bands, instead of predicting pixel values using a local spatial and spectral neighborhood. Interestingly, when the maximum spectral angle is considered, CCSDS 123.0-B-2 yields better results than does CCSDS 122.1-B-1, except for low bitrates, i.e., below 0.75 bps. This can be explained by the fact that CCSDS 123.0-B-2 is near lossless, i.e., it bounds the maximum error introduced in any pixel of the image.
CONCLUSIONS
Multispectral imaging and HSI have become invaluable tools
for many commercial, scientific, and defense applications

of remote sensing. With the advent of sensors allowing enhanced spatial and spectral resolution, data compression is
paramount to maximize the amount of valuable information retrieved from spaceborne systems. In particular, nearlossless compression can significantly improve the effective
capacity of transmission channels while providing strict
control of the distortion introduced in the images. Even
if rate-control strategies are possible, strong quality guarantees are prioritized over obtaining constant data rates in
near-real time transmission.
The CCSDS 123.0-B-2 compression standard published
by the CCSDS enables the specification of absolute and/
or relative error limits at the image or band level. This is
achieved via the uniform, in-loop quantization of prediction errors, obtaining higher performance at the expense
of a simpler implementation. As the decompressor does not
have access to the original image samples, sample representatives are used instead in the predictor stage. To fully exploit the lower entropy rates exhibited by quantized data, a
new hybrid entropy coder is defined for Issue 2. This coder
includes 16 variable-to-variable-length codes selected on
a sample-by-sample basis depending on the statistics of
previously coded information. One last improvement over
CCSDS 123.0-B-1 is the definition of narrow local sums
that facilitate the design of highly efficient hardware implementations. Experimental results with a comprehensive
corpus of test images indicate that the new hybrid coder
yields competitive compression performance results, measurably improving upon the other coding modes of Issue 2
as well as upon the JPEG-LS compression standard and the
M-CALIC algorithm. The standard obtains state-of-the-art performance in absolute or relative error
measurements, while other approaches may provide better performance in terms of quadratic error at
very low rates.
Regarding future developments related to this standard, it is unlikely that major changes will be
introduced soon.

FIGURE 10. The spectral angle metrics as a function of the compressed data rate. (a) The mean spectral angle and (b) the maximum spectral angle. [Plot: mean and maximum spectral angle (°) versus compressed data rate (bps); curves: Block Adaptive, Sample Adaptive, Hybrid, JPEG-LS, M-CALIC, and CCSDS 122.1 (POT).]
ACKNOWLEDGMENTS
Miguel Hernández-Cabronero, Ian Blanes, and Joan Serra-Sagristà received partial funding from the postdoctoral
fellowship program Beatriu de Pinós, reference 2018-BP00008, funded by the Secretary of Universities and Research (Government of Catalonia) and by the H2020 Programme of Research and Innovation of the European Union
(EU) under Marie Skłodowska-Curie grant agreement
801370; from the EU’s H2020 program under grant agreement 776151; from the Spanish Government under grant
RTI2018-095287-B-I00; and from the Catalan Government
under grant 2017SGR-463. The research conducted at the Jet
Propulsion Laboratory at the California Institute of Technology was performed under a contract with NASA. Miguel
Hernández-Cabronero is the corresponding author.
AUTHOR INFORMATION

Miguel Hernández-Cabronero (miguel.hernandez@uab.cat) is with the Department of Information and Communications Engineering, Universitat Autònoma de Barcelona,
Barcelona, 08193, Spain.
Aaron B. Kiely (aaron.b.kiely@jpl.nasa.gov) is with the
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, 91109, USA. He is a Senior Member of IEEE.
Matthew Klimesh (matthew.a.klimesh@jpl.nasa.gov) is
with the Jet Propulsion Laboratory, California Institute of
Technology, Pasadena, California, 91109, USA. He is a Senior Member of IEEE.
Ian Blanes (ian.blanes@uab.cat) is with the Department
of Information and Communications Engineering, Universitat Autònoma de Barcelona, Barcelona, 08193, Spain. He
is a Senior Member of IEEE.
Jonathan Ligo (jonathan.ligo@jhuapl.edu) is with the
Applied Physics Laboratory, Johns Hopkins University, Baltimore, Maryland, 20723, USA. He is a Member of IEEE.
Enrico Magli (enrico.magli@polito.it) is with the Department of Electronics and Telecommunications, Politecnico di Torino, Turin, 10129, Italy. He is a Fellow of IEEE.
Joan Serra-Sagristà (joan.serra@uab.cat) is with the
Department of Information and Communications Engineering, Universitat Autònoma de Barcelona, Barcelona,
08193, Spain. He is a Senior Member of IEEE.

[8]

[9]

[10]

[11]

[12]

[13]

[14]

REFERENCES
[1]

[2]

M. Parente, J. Kerekes, and R. Heylen, “A special issue on hyperspectral imaging [from the guest editors],” IEEE Geosci. Remote
Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 6–7, June 2019.
doi: 10.1109/MGRS.2019.2912617.
E. J. Ientilucci and S. Adler-Golden, “Atmospheric compensation of hyperspectral data: An overview and review of in-scene

DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[15]

and physics-based approaches,” IEEE Geosci. Remote Sens. Mag.
(replaces Newslett.), vol. 7, no. 2, pp. 31–50, June 2019. doi:
10.1109/MGRS.2019.2904706.
M. J. Khan, H. S. Khan, A. Yousaf, K. Khurshid, and A. Abbas,
“Modern trends in hyperspectral image analysis: A review,” IEEE
Access, vol. 6, pp. 14,118–14,129, Mar. 2018. doi: 10.1109/ACCESS.2018.2812999.
M. Malyy, Z. Tekic, and A. Golkar, “What drives technology innovation in new space? A preliminary analysis of venture capital investments in earth observation start-ups,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 1, pp. 59–73, Mar.
2019. doi: 10.1109/MGRS.2018.2886999.
J. Theiler, A. Ziemann, S. Matteoli, and M. Diani, “Spectral variability of remotely sensed target materials: Causes, models, and
strategies for mitigation and robust exploitation,” IEEE Geosci.
Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 8–30,
June 2019. doi: 10.1109/MGRS.2019.2890997.
Y. Zhong et al., “Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications,” IEEE
Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 6, no. 4, pp.
46–62, Dec. 2018. doi: 10.1109/MGRS.2018.2867592.
G. Denis et al., “Towards disruptions in Earth observation?
New Earth Observation systems and markets evolution:
Possible scenarios and impacts,” Acta Astronaut. (U.K.), vol. 137,
pp. 415–433, Aug. 2017. doi: 10.1016/j.actaastro.2017.04.034.
W. Sun and Q. Du, “Hyperspectral Band Selection: A Review,”
IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2,
pp. 118–139, June 2019. doi: 10.1109/MGRS.2019.2911100.
S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Deep learning for hyperspectral image classification: An
overview,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp.
6690–6709, 2019. doi: 10.1109/TGRS.2019.2907932.
P. Duan, X. Kang, S. Li, P. Ghamisi, and J. A. Benediktsson, “Fusion of multiple edge-preserving operations for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12,
pp. 10,336–10,349, 2019. doi: 10.1109/TGRS.2019.2933588.
Y. Su, J. Li, A. Plaza, A. Marinoni, P. Gamba, and S. Chakravortty, “DAEN: Deep autoencoder networks for hyperspectral
unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp.
4309–4321, 2019. doi: 10.1109/TGRS.2018.2890633.
Y. Chen, K. Zhu, L. Zhu, X. He, P. Ghamisi, and J. A. Benediktsson, “Automatic design of convolutional neural network
for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 7048–7066, 2019. doi: 10.1109/
TGRS.2019.2910603.
J. M. Haut et al., “Cloud deep networks for hyperspectral image
analysis,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp.
9832–9848, 2019. doi: 10.1109/TGRS.2019.2929731.
B. Tu, X. Zhang, X. Kang, J. Wang, and J. A. Benediktsson, “Spatial density peak clustering for hyperspectral image classification with noisy labels,” IEEE Trans. Geosci. Remote Sens., vol. 57,
no. 7, pp. 5085–5097, 2019. doi: 10.1109/TGRS.2019.2896471.
K. Bhardwaj, S. Patra, and L. Bruzzone, “Threshold-free attribute profile for classification of hyperspectral images,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7731–7742, 2019.
doi: 10.1109/TGRS.2019.2916169.

117

[16] X. Lu, L. Dong, and Y. Yuan, “Subspace clustering constrained
sparse NMF for hyperspectral unmixing,” IEEE Trans. Geosci.
Remote Sens., vol. 58, no. 5, pp. 3007–3019, 2020. doi: 10.1109/
TGRS.2019.2946751.
[17] C. J. Della Porta, A. A. Bekit, B. H. Lampe, and C. Chang, “Hyperspectral image classification via compressive sensing,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 8290–8303, 2019.
doi: 10.1109/TGRS.2019.2920112.
[18] J. Nalepa, M. Myller, and M. Kawulok, “Validating hyperspectral image segmentation,” IEEE Geosci. Remote Sens. Lett.,
vol. 16, no. 8, pp. 1264–1268, 2019. doi: 10.1109/LGRS.2019.
2895697.
[19] D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, and X.
X. Zhu, “Invariant attribute profiles: A spatial-frequency joint
feature extractor for hyperspectral image classification,” IEEE
Trans. Geosci. Remote Sens., 2020, pp. 1–18.
[20] “IASI level 1: Product guide,” EUMETSAT, Tech. Rep. EUM/
OPS-EPS/MAN/04/0032, Darmstadt, Germany, Sept. 2019.
[21] K. Turpie, S. Veraverbeke, R. Wright, M. Anderson, and D. Quattrochi, “NASA 2014 The Hyperspectral Infrared Imager (HyspIRI) – Science impact of deploying instruments on separate platforms,” Jet Propulsion Lab., Tech. Rep. JPL-Publ-14-13, July 2014.
[Online]. Available: http://hdl.handle.net/2060/20160001776
[22] S.-E. Qian, Optical Satellite Data Compression and Implementation.
Bellingham, WA: SPIE, 2013.
[23] B. Huang, Satellite Data Compression. Berlin, Germany: Springer
Science & Business Media, 2011.
[24] K. Sayood, Introduction to Data Compression, 5th ed. San Mateo,
CA: Morgan Kaufmann, 2017.
[25] S. Álvarez-Cortés, J. Serra-Sagristà, J. Bartrina-Rapesta, and
M. W. Marcellin, “Regression wavelet analysis for near-lossless remote sensing data compression,” IEEE Trans. Geosci.
Remote Sens., vol. 58, no. 2, pp. 790–798, 2020. doi: 10.1109/
TGRS.2019.2940553.
[26] D. Valsesia and E. Magli, “High-throughput onboard hyperspectral image compression with ground-based CNN reconstruction,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp.
9544–9553, Dec. 2019. doi: 10.1109/TGRS.2019.2927434.
[27] M. Díaz et al., “Real-time hyperspectral image compression
onto embedded GPUs,” IEEE J. Select. Topics Appl. Earth Observat.
Remote Sens., vol. 12, no. 8, pp. 2792–2809, 2019. doi: 10.1109/
JSTARS.2019.2917088.
[28] S.-E. Qian, Optical Satellite Signal Processing and Enhancement.
Bellingham, WA: SPIE, 2013.
[29] Z. Chen, Y. Hu, and Y. Zhang, “Effects of compression on remote sensing image classification based on fractal analysis,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4577–4590,
July 2019. doi: 10.1109/TGRS.2019.2891679.
[30] J. García-Sobrino, J. Serra-Sagristà, and A. J. Pinho, “Competitive
segmentation performance on near-lossless and lossy compressed
remote sensing images,” IEEE Geosci. Remote Sens. Lett, vol. 17,
no. 5 , pp. 834–838, 2020. doi: 10.1109/LGRS.2019.2934997.
[31] F. Garcia-Vilchez et al., “On the impact of lossy compression on
hyperspectral image classification and unmixing,” IEEE Geosci.
Remote Sens. Lett., vol. 8, no. 2, pp. 253–257, 2010. doi: 10.1109/
LGRS.2010.2062484.

118

[32] I. Blanes, E. Magli, and J. Serra-Sagrista, “A tutorial on image
compression for optical space imaging systems,” IEEE Geosci.
Remote Sens. Mag. (replaces Newslett.), vol. 2, no. 3, pp. 8–26,
Sept. 2014. doi: 10.1109/MGRS.2014.2352465.
[33] A. D. George and C. M. Wilson, “Onboard processing with hybrid and reconfigurable computing on small satellites,” Proc.
IEEE, vol. 106, no. 3, pp. 458–470, 2018. doi: 10.1109/JPROC.
2018.2802438.
[34] Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression, Consultative Committee for Space
Data Systems (CCSDS) Standard CCSDS 123.0-B-2, Feb. 2019.
[Online]. Available: https://public.ccsds.org/Pubs/123x0b2c1.pdf
[35] Lossless Multispectral & Hyperspectral Image Compression. Silver
Book, Consultative Committee for Space Data Systems (CCSDS)
Standard CCSDS 123.0-B-1-S, May 2012. [Online]. Available:
https://public.ccsds.org/Pubs/123x0b1ec1s.pdf
[36] A. Kiely et al., “The new CCSDS Standard for low-complexity
lossless and near-lossless multispectral and hyperspectral image compression,” in Proc. 6th Int. Workshop on On-Board Payload
Data Compression (OBPDC), 2018, pp. 1–6.
[37] I. Blanes, A. Kiely, M. Hernández-Cabronero, and J. Serra-Sagristà,
“Performance impact of parameter tuning on the CCSDS-123.0B-2 low-complexity lossless and near-lossless multispectral and
hyperspectral image compression standard,” MDPI Remote Sens.,
vol. 11, no. 11, p. 1390, 2019. doi: 10.3390/rs11111390.
[38] Lossless Data Compression, Consultative Committee for Space
Data Systems (CCSDS) Standard CCSDS 121.0-B-1-S, May 1997.
[Online]. Available: https://public.ccsds.org/Pubs/121x0b1sc2.pdf
[39] Image Data Compression, Consultative Committee for Space Data
Systems (CCSDS) Standard CCSDS 122.0-B-1-S, May 2005.
[Online]. Available: https://public.ccsds.org/Pubs/122x0b1c3s
.pdf
[40] Lossless Data Compression, Consultative Committee for Space
Data Systems (CCSDS) Standard CCSDS 121.0-B-2, Apr. 2012.
[Online]. Available: https://public.ccsds.org/Pubs/121x0b2ec1s
.pdf
[41] Image Data Compression, Consultative Committee for
Space Data Systems (CCSDS) Standard CCSDS 122.0-B-2, Sept.
2017. [Online]. Available: https://public.ccsds.org/Pubs/
122x0b2.pdf
[42] Spectral Preprocessing Transform for Multispectral and Hyperspectral
Image Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 122.1-B-1, Sept. 2017. [Online].
Available: https://public.ccsds.org/Pubs/122x1b1.pdf
[43] Lossless Data Compression, Consultative Committee for Space
Data Systems (CCSDS) Standard CCSDS 121.0-B-3, Aug. 2020.
[Online]. Available: https://public.ccsds.org/Pubs/121x0b3.pdf
[44] D. Báscones, C. González, and D. Mozos, “Parallel implementation of the CCSDS 1.2.3 standard for hyperspectral lossless
compression,” MDPI Remote Sens., vol. 9, no. 10, p. 973, 2017.
doi: 10.3390/rs9100973.
[45] A. Tsigkanos, N. Kranitis, G. A. Theodorou, and A. Paschalis,
“A 3.3 Gbps CCSDS 123.0-B-1 multispectral hyperspectral image compression hardware accelerator on a space-grade SRAM
FPGA,” IEEE Trans. Emerg. Topics Comput., early access, July 12,
2018.
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

[46] J. Fjeldtvedt, M. Orlandić, and T. A. Johansen, “An efficient real-time FPGA Implementation of the CCSDS-123 compression
standard for hyperspectral images,” IEEE J. Select. Topics Appl.
Earth Observat. Remote Sens., vol. 11, no. 10, pp. 3841–3852,
2018. doi: 10.1109/JSTARS.2018.2869697.
[47] M. Orlandić, J. Fjeldtvedt, and T. A. Johansen, “A parallel FPGA
implementation of the CCSDS-123 compression algorithm,”
MDPI Remote Sens., vol. 11, no. 6, p. 673, 2019. doi: 10.3390/
rs11060673.
[48] L. M. V. Pereira, D. A. Santos, C. A. Zeferino, and D. R. Melo,
“A low-cost hardware accelerator for CCSDS 123 predictor in
FPGA,” in Proc. IEEE Int. Symp. Circuits and Syst. (ISCAS), 2019,
pp. 1–5. doi: 10.1109/ISCAS.2019.8702428.
[49] L. Santos, L. Berrojo, J. Moreno, J. F. López, and R. Sarmiento,
“Multispectral and hyperspectral lossless compressor for space
applications (HyLoC): A low-complexity FPGA implementation of the CCSDS 123 standard,” IEEE J. Select. Topics Appl.
Earth Observat. Remote Sens., vol. 9, no. 2, pp. 757–770, 2016.
doi: 10.1109/JSTARS.2015.2497163.
[50] L. Santos, A. J. Gomez, and R. Sarmiento, “Implementation
of CCSDS standards for lossless multispectral and hyperspectral satellite image compression,” IEEE Trans. Aerosp. Electron. Syst., vol. 56, no. 2, pp. 1120–1138, 2020. doi: 10.1109/
TAES.2019.2929971.
[51] Y. Barrios, A. J. Sánchez, L. Santos, and R. Sarmiento, “Shyloc 2.0: A versatile hardware solution for on-board data and
hyperspectral image compression on future space missions,”
IEEE Access, vol. 8, pp. 54,269–54,287, 2020. doi: 10.1109/ACCESS.2020.2980767.
[52] “High-speed integrated satellite data systems for leading EU industry,” European Commission, Hi-SIDE Project,
H2020-COMPET-3-2017 (RIA): High speed data chain, Gemany, 2018–2021.
[53] “Next generation satellite processing chain for rapid civil
alerts,” European Commission, EO-ALERT Project, H2020COMPET-3-2017 (RIA): High speed data chain Spain, 2018–
2021.
[54] D. Keymeulen et al., “High performance space data acquisition,
clouds screening and data compression with modified COTS
embedded system-on-chip instrument avionics for space-based
next generation imaging spectrometers (NGIS),” in Proc. 6th Int.
Workshop on On-Board Payload Data Compression (OBPDC), 2018,
pp. 7–15.
[55] “Copernicus Hyperspectral Imaging Mission for the Environment, mission requirements document.” European Space
Agency, France, 2018. http://esamultimedia.esa.int/docs/EarthObservation/Copernicus_CHIME_MRD_v2.1_Issued20190723.pdf
[56] M. Conoscenti, R. Coppola, and E. Magli, “Constant SNR, rate
control, and entropy coding for predictive lossy hyperspectral
image compression,” IEEE Trans. Geosci. Remote Sens., vol. 54,
no. 12, pp. 7431–7441, 2016. doi: 10.1109/TGRS.2016.2603998.
[57] J. Bartrina-Rapesta, I. Blanes, F. Aulí-Llinàs, J. Serra-Sagristà, V.
Sanchez, and M. W. Marcellin, “A lightweight contextual arithmetic coder for on-board remote sensing data compression,”
IEEE Trans. Geosci. Remote Sens., vol. 55, no. 8, pp. 4825–4835,
2017. doi: 10.1109/TGRS.2017.2701837.
DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[58] J. Song, Z. Zhang, and X. Chen, “Lossless compression of hyperspectral imagery via RLS filter,” Electron. Lett., vol. 49, no. 16,
pp. 992–994, 2013. doi: 10.1049/el.2013.1315.
[59] F. Gao and S. Guo, “Lossless compression of hyperspectral images using conventional recursive least-squares predictor with
adaptive prediction bands,” J. Appl. Remote Sens., vol. 10, no. 1,
p. 015010, 2016. doi: 10.1117/1.JRS.10.015010.
[60] A. C. Karaca and M. K. Güllü, “Lossless hyperspectral image
compression using bimodal conventional recursive leastsquares,” Remote Sens. Lett., vol. 9, no. 1, pp. 31–40, 2018. doi:
10.1080/2150704X.2017.1375612.
[61] A. C. Karaca and M. K. Güllü, “Superpixel based recursive leastsquares method for lossless compression of hyperspectral images,”
Multidimensional Syst. Signal Process., vol. 30, no. 2, pp. 903–919, 2019.
[62] D. Keymeulen et al., “High performance space computing with
system-on-chip instrument avionics for space-based Next Generation Imaging Spectrometers (NGIS),” in Proc. NASA/ESA
Conf. Adaptive Hardware and Syst. (AHS), Aug. 2018, pp. 33–36.
doi: 10.1109/AHS.2018.8541473.
[63] M. Klimesh, “Low-complexity lossless compression of hyperspectral imagery via adaptive filtering,” Jet Propulsion Lab.,
NASA, Pasadena, CA, Tech. Rep., 2005. [Online]. Available:
http://ipnpr.jpl.nasa.gov/progress_report/42-163/163H.pdf
[64] D. Valsesia and E. Magli, “A novel rate control algorithm for
onboard predictive coding of multispectral and hyperspectral
images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp.
6341–6355, 2014. doi: 10.1109/TGRS.2013.2296329.
[65] D. Valsesia and E. Magli, “Fast and lightweight rate control for
onboard predictive coding of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 3, pp. 394–398, 2017.
[66] R. Guerra, Y. Barrios, M. Díaz, A. Baez, S. López, and R. Sarmiento, “A hardware-friendly hyperspectral lossy compressor for nextgeneration space-grade field programmable gate arrays,” IEEE J.
Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 12, pp.
4813–4828, 2019. doi: 10.1109/JSTARS.2019.2919791.
[67] Lossless Data Compression, Green Book, no. 3, Consultative
Committee for Space Data Systems (CCSDS), Washington,
D.C., 2013.
[68] I. Blanes, A. Kiely, L. Santos, M. Hernández-Cabronero, and J.
Serra-Sagristà, “The hybrid entropy encoder of CCSDS 123.0B-2: Insights and decoding process,” in Proc. 7th Int. Workshop
on On-Board Payload Data Compression (OBPDC), Sept. 2020,
pp. 1–10.
[69] M. Hernández-Cabronero, J. Portell, I. Blanes, and J. SerraSagristà, “High-performance lossless compression of hyperspectral remote sensing scenes based on spectral decorrelation,” MDPI Remote Sens., vol. 12, no. 18, p. 2955, 2020. doi:
10.3390/rs12182955.
[70] E. Magli, G. Olmo, and E. Quacchio, “Optimized onboard lossless and near-lossless compression of hyperspectral data using
CALIC,” IEEE Geosci. Remote Sens. Lett., vol. 1, no. 1, pp. 21–25,
2004. doi: 10.1109/LGRS.2003.822312.
[71] F. A. Kruse et al., “The spectral image processing system (SIPS)interactive visualization and analysis of imaging spectrometer
data,” AIP Conf. Proc., vol. 283, no. 1, pp. 192–201, 1993.
GRS

119


Advances and Opportunities in Remote Sensing Image Geometric Registration
A systematic review of state-of-the-art approaches and future research directions
RUITAO FENG, HUANFENG SHEN, JIANJUN BAI, AND XINGHUA LI

Digital Object Identifier 10.1109/MGRS.2021.3081763
Date of current version: 28 June 2021

Geometric registration is often an accuracy assurance for most remote sensing image processing and
analysis, such as image mosaicking, image fusion, and time-series analysis. In recent decades,
geometric registration has attracted considerable attention in the remote sensing community,
leading to a large amount of research on the subject. However, few studies have systematically
reviewed its current status and deeply investigated its development trends. Moreover, new
approaches are constantly emerging, and some issues still need to be solved. Thus, this article
presents a survey of state-of-the-art approaches for remote sensing image registration in terms of
intensity-based, feature-based, and combination techniques. Optical flow estimation and deep
learning-based methods are summarized, and software-operated registration and registration
evaluation are introduced. Building on recent advances, promising opportunities are explored.

OVERVIEW
Remote sensing images from various sensors, periods, and viewpoints can provide complementary
information about regions of interest (ROIs) and Earth surface observation. Owing to various
factors, such as Earth’s rotation and curvature and variations in platform altitudes, remote
sensing images contain systematic geometric distortions that cannot be thoroughly corrected without
high-precision elevation data [through the digital elevation model (DEM) or the digital surface
model (DSM)] and control points on the ground. Although the true digital orthophoto map (TDOM)
promises accurate spatial positions, it has high production costs and is difficult for general
users to obtain. Therefore, most available remote sensing images retain small geometrical
distortions after systematic correction, resulting in objects in one image not spatially
corresponding to those in another image, as in Figure 1.

FIGURE 1. Multitemporal optical image geometrical dislocation. (a) A reference image taken by Landsat 5 on 15 October 1990. (b) A sensed image taken by Landsat 5 on 15 September 1993. (c) The overlapping images of (a) and (b).

Furthermore, topographical fluctuations in mountainous regions, differences in imaging viewpoints
(shown in Figure 2), and spatial resolutions cause dislocation in two
images covering the same scene. Thus, geometrical registration techniques are implemented to align two or more
images from the image-to-image perspective rather than
the imaging mechanism. Consequently, geometrical registration is an image-processing technique that aligns different images of the same scene acquired at various times
and viewing angles and with multiple sensors [1]. As a fundamental task in remote sensing information processing,
it is a prerequisite for many practical applications, such as
image mosaicking [2], image fusion [3], land cover change
detection [4], [5], and disaster evaluation [6], [7].
It is worth noting that there is a technical term, coregistration, that is similar to but not
exactly the same as image registration. It is now commonly used in aerial and unmanned aerial
vehicle image registration, generally including multimode registration and alignment with the aid
of auxiliary data. When the registration is conducted with a GPS/inertial measurement unit, it
usually establishes a connection between an image and the simulated or real ground [8]. In such
cases, registration technology is used to generate the tie points from which these relationships
are constructed. With real ground control points (GCPs), the tie points between the reference and
sensed bands are produced to register different bands of hyperspectral images [9]. Additionally,
when the orientation of the reference image is determined, the coregistration of multitemporal
high-resolution image blocks can be achieved automatically without GCPs [10]. Although a growing
number of papers focus on coregistration techniques that use positioning data as auxiliary
information, the core of the process, in our view, remains image registration. Therefore, this
article emphasizes the opportunities and challenges of geometrical registration in remote sensing
fields.
Geometrical registration can be traced to the 1970s,
when the United States proposed image registration to analyze target objects in aircraft-aided navigation and weapons systems. Since then, it has rapidly developed, particularly in the domains of remote sensing, computer vision,
and medical image processing. Some conclusive studies of
computer vision and medical image processing have been
published [11]–[16]. Building on a widespread survey of image registration, published in 1992 by Brown [15], a 2003
review [16] comprehensively summarized the subsequent
research. In recent years, several overviews of image registration have focused on newly developed approaches inspired by extant versions [17]–[19]. However, these surveys
are limited to analyzing and drawing conclusions based on
conventional approaches [20]–[22]. Since the first study
of multispectral and multitemporal digital imagery registration in 1970 [23], an increasing number of papers have
contributed to the field. A total of 140,983 related studies with the keywords image registration or image matching
were retrieved, from 1979 to January 2021, from Web of
Science (WoS). When screening again using the keyword
remote sensing, 46,141 articles were found, as plotted, based
on their publication year, in Figure 3. The respective proportions of the total number of papers on WoS per year are
also presented. It can be seen that a small number of papers were presented early in the field’s
development, with remote sensing image registration accounting for a minimal percentage of annual
WoS publications. More recently, a considerable number of studies have been published, peaking in
2019. Thus, comprehensive analysis is necessary to identify unsolved problems for the rapid
development of this field.

FIGURE 2. The angle difference from multitemporal images in a mountainous region.

FIGURE 3. The number of papers about remote sensing image registration on WoS, per year. [Plot: published papers and proportion (%) versus year, 1979–2021; annotated peak, 2019: 3,154.]

In this article, we summarize various classical approaches to remote sensing image registration as
well as recent methods based on deep learning, optical flow estimation, and image registration
software. We also point out interesting aspects and analyze development trends from our
perspective, without describing specific approaches in detail. Concretely, the registration
approaches can be classified into three categories, namely, intensity-based, feature-based, and
combination registration, as detailed in Figure 4. The intensity-based technique directly uses
pixel intensity information to register images, including the conventional area-based approach and
optical flow estimation. The geometrical and advanced features used to register images instead of
intensity information are defined as feature-based approaches. Combination registration mainly
consists of the integration of feature- and area-based methods as well as two geometric
feature-based techniques. Many detailed classifications are presented in each category.

FIGURE 4. The remote sensing image registration algorithms. GAN: generative adversarial network. [Diagram: intensity-based method (area based: frequency domain, spatial domain; optical flow: dense optical flow, sparse optical flow); feature-based method (geometrical feature based: points, lines, and polygons, feature matching...; deep learning: Siamese network, GAN); combination method (two geometric feature-based methods; feature- and area-based methods).]

All registration approaches must undergo coordinate transformation and resampling to ultimately
acquire the aligned image, as demonstrated in Figure 5. Before this step, transformation models for
coordinate recalculation other than optical flow estimation should be constructed. In general,
transformation models, such as the affine, projective, piecewise linear, and thin spline models,
are derived from global or local parametric models. To calculate these models, images are
preprocessed to extract representative features through techniques including geometrical- and
advanced-feature extraction and matching. Given that intensity information is directly utilized in
area-based registration, feature extraction is omitted, and the transformation model is constructed
when matching the intensity information. Since most approaches prefer to contribute to the
preliminary steps (e.g., feature extraction, feature matching, and mismatched feature elimination)
rather than designing new transformation models and presenting novel resampling techniques, this
article emphasizes the previous steps, as well, comprehensively summarizing
studies and further predicting development trends.
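As a rough illustration of the coordinate transformation and resampling step, the sketch below applies an affine model with nearest-neighbor resampling. It is a minimal example under stated assumptions, not a method from the surveyed literature: the six-parameter layout, the function names, and the choice of inverse mapping (reference coordinates mapped into the sensed image) are all hypothetical.

```python
def apply_affine(params, x, y):
    """Map (x, y) through x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, tx, c, d, ty = params
    return a * x + b * y + tx, c * x + d * y + ty


def resample_nearest(sensed, params, out_rows, out_cols, fill=0):
    """Build the aligned image by inverse mapping.

    For every pixel of the output (reference) grid, the affine model gives
    the corresponding location in the sensed image; the nearest sensed pixel
    is copied, and locations outside the sensed image receive `fill`.
    """
    rows, cols = len(sensed), len(sensed[0])
    aligned = [[fill] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            xs, ys = apply_affine(params, c, r)
            ci, ri = int(round(xs)), int(round(ys))
            if 0 <= ri < rows and 0 <= ci < cols:
                aligned[r][c] = sensed[ri][ci]
    return aligned


sensed = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
identity = (1, 0, 0, 0, 1, 0)
shift_x = (1, 0, 1, 0, 1, 0)  # sensed x = reference x + 1
aligned_identity = resample_nearest(sensed, identity, 3, 3)
aligned_shifted = resample_nearest(sensed, shift_x, 3, 3)
```

Inverse mapping is used here so that every output pixel receives exactly one value; the unfilled pixels correspond to the black border regions that appear in aligned images.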
INTENSITY-BASED REGISTRATION
Intensity-based registration directly employs original or extended intensity information, such as gradients, for registering remote sensing images. In addition to the traditional
area-based approach, we also classify optical flow estimation,
which directly calculates the displacement of corresponding pixels from intensity information, as intensity-based registration.
AREA-BASED METHOD
In general, area-based registration defines a similarity criterion in advance and adopts an
optimal search strategy to iteratively find the transformation model parameters that maximize (or
minimize) the similarity measure, thereby achieving the spatial registration of images, as
illustrated in Figure 6. As the transformation model is progressively optimized, the aligned image
changes gradually, which is mainly reflected in the growing black area in the lower- and
upper-left-hand corners of the aligned image. This approach differs from image matching, which is
generally understood as template matching. Although both methods directly employ intensity
information, template matching aims to extract the centroids of matched windows as feature points.
That process is not true geometric registration, but it constitutes an important step. Here, we
introduce area-based registration. The core of this technique is the similarity metric, which has
been researched in terms of spatial- and frequency-domain approaches [16], [24], [25].
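The search loop described above can be sketched for the simplest case. The example below is only an illustration under assumptions that are not in the article: the transformation model is restricted to integer translations, the similarity criterion is the mean squared difference over the overlap, and the search strategy is exhaustive rather than an iterative optimizer. All function names are hypothetical.

```python
def mean_squared_difference(reference, sensed, dx, dy):
    """Compare reference pixel (r, c) with sensed pixel (r + dy, c + dx)
    over the overlapping area only."""
    total, count = 0.0, 0
    for r in range(len(reference)):
        for c in range(len(reference[0])):
            rs, cs = r + dy, c + dx
            if 0 <= rs < len(sensed) and 0 <= cs < len(sensed[0]):
                diff = reference[r][c] - sensed[rs][cs]
                total += diff * diff
                count += 1
    return total / count if count else float("inf")


def register_translation(reference, sensed, max_shift):
    """Exhaustively test integer shifts and keep the one with minimum cost."""
    best_cost, best_dx, best_dy = float("inf"), 0, 0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = mean_squared_difference(reference, sensed, dx, dy)
            if cost < best_cost:
                best_cost, best_dx, best_dy = cost, dx, dy
    return best_dx, best_dy


# Synthetic test scene: the sensed image is the reference moved by
# 2 pixels in x and 1 pixel in y, with zeros where no data exist.
size = 8
reference = [[r * r + 3 * c * c + r * c for c in range(size)] for r in range(size)]
sensed = [[reference[r - 1][c - 2] if r >= 1 and c >= 2 else 0
           for c in range(size)] for r in range(size)]
estimated_shift = register_translation(reference, sensed, max_shift=3)
```

A practical implementation would replace the exhaustive loop with a coarse-to-fine or gradient-based optimizer and a richer transformation model, but the structure (evaluate the similarity, update the parameters, repeat) stays the same.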
SPATIAL-DOMAIN APPROACH
Spatial-domain techniques directly employ intensity difference
and statistical information of all pixels, without any image
transformation. These methods generally come at the problem from one of two perspectives, namely, the correlation-like
technique or the mutual information (MI) algorithm.
CORRELATION-LIKE SIMILARITY METRIC
This technique determines the spatial alignment of images by
directly comparing the similarity of corresponding pixels. It is
vulnerable to intensity changes, which may be introduced, for
instance, by noise, thick or thin clouds, and differences in the
photosensitive components of various sensors. As a fundamental similarity metric, the cross-correlation (CC) algorithm
directly calculates the difference between corresponding pixels to iteratively register images until they have the largest CC,
which is useful for small rigid-body and affine transformation
[26], [27]. Many other correlation-like similarity metrics are
available, including the sequential similarity detection algorithm [28], correlation coefficient [29], [30], normalized CC
(NCC) [31]–[33], sum of squared differences [34], Hausdorff
distance [35], and other minimum distance criteria. NCC, in
particular, is very popular and widely applied due to its invariance to linear intensity variations [31], [36], [37]. Recently, the
centers of windows well-matched by NCC have been used as
feature points to solve transformation model parameters [38],
namely, image matching. Supposing $\rho(R, S)$ to be the NCC
coefficient of matched windows, we calculate NCC as follows:

$$\rho(R, S) = \frac{\sum_{i=1}^{m \times n} (R(i) - \mu_R)(S(i) - \mu_S)}{\sqrt{\sum_{i=1}^{m \times n} (R(i) - \mu_R)^2 \, \sum_{i=1}^{m \times n} (S(i) - \mu_S)^2}}, \qquad (1)$$

where the predefined window consists of $m \times n$ pixels, $R(i)$
and $S(i)$ denote specified positions in the windows of the
reference and sensed images, and $\mu_R$ and $\mu_S$ are the average
intensity values of a specified window. The algorithm was developed to generate tie points that resist complicated geometric deformation [31], [38], [39]; it has recently been integrated
with a novel feature descriptor [e.g., the local self-similarity
(LSS) descriptor] for robust feature extraction in multimodal
remote sensing image registration [36]. Although NCC is superior to the traditional correlation-like similarity metric, it is
unable to handle the nonlinear radiometric difference, which
is a common problem for correlation-like similarity metrics.
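The behavior of (1) can be illustrated with a minimal, dependency-free sketch. The function name `ncc` and the tiny 2 x 3 windows are hypothetical and chosen only for illustration; a practical implementation would operate on image arrays. The example suggests why NCC tolerates a linear gain and offset change but not a nonlinear remapping of intensities.

```python
import math

def ncc(ref_window, sen_window):
    """Normalized cross-correlation of two equally sized windows (lists of rows)."""
    r = [p for row in ref_window for p in row]
    s = [p for row in sen_window for p in row]
    if not r or len(r) != len(s):
        raise ValueError("windows must be non-empty and equally sized")
    mu_r = sum(r) / len(r)          # mean intensity of the reference window
    mu_s = sum(s) / len(s)          # mean intensity of the sensed window
    num = sum((a - mu_r) * (b - mu_s) for a, b in zip(r, s))
    den = math.sqrt(sum((a - mu_r) ** 2 for a in r) *
                    sum((b - mu_s) ** 2 for b in s))
    if den == 0.0:
        raise ValueError("NCC is undefined for a constant window")
    return num / den

# Hypothetical 2 x 3 windows, used only for illustration.
ref = [[10, 20, 30], [40, 50, 60]]
linear = [[2 * p + 5 for p in row] for row in ref]          # gain and offset change
nonlinear = [[(p - 35) ** 2 for p in row] for row in ref]   # nonlinear remapping

ncc_linear = ncc(ref, linear)        # close to 1: the linear change is absorbed
ncc_nonlinear = ncc(ref, nonlinear)  # close to 0: the correspondence is missed
```

In this toy case the nonlinear remapping is symmetric about the window mean, so the correlation vanishes even though the two windows are perfectly dependent on each other.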
MI APPROACH
MI was introduced more recently than correlation-like
techniques; it has been successfully applied to multispectral and multisensor image registration due to its robustness
against nonlinear radiometric differences [40]–[43], and it is
usually calculated by (2). The normalized MI (NMI) method is a measure that is independent of changes in the marginal entropies of two images in their region of overlap [44],
[45]. MI and NMI are the same type of statistical similarity
measurement, and both are prone to registration errors. Inspired by these approaches, the region–MI approach was developed [46] with consideration of structural information.
Furthermore, rotationally invariant regional MI considers not
only the spatial information but also the influence that local
gray variations and rotation changes have on the computation of the probability density function [45]:

$$\begin{aligned}
\mathrm{MI}(R, S) &= H(R) + H(S) - H(R, S),\\
H(R) &= -\sum_{r \in R} P(r) \log_2 P(r),\\
H(S) &= -\sum_{s \in S} P(s) \log_2 P(s),\\
H(R, S) &= -\sum_{r \in R,\, s \in S} P(r, s) \log_2 P(r, s),
\end{aligned} \qquad (2)$$

where $H(R)$ and $H(S)$ are the Shannon entropies of the reference and sensed images, respectively; $H(R, S)$ represents the
joint entropy; $P(r)$ and $P(s)$ are the marginal probability
distributions of $R$ and $S$; and $P(r, s)$ is the joint probability
distribution that is estimated, in practice, by 2D histogram
binning, with the intensities treated as discrete random variables. Additionally, there
is an MI registration based on displacement maps, which is
similar to optical flow estimation. In this variational framework, MI is employed as the similarity metric for displacement calculation [47]. Overall, the MI-like algorithms originating from information theory are a measure of the statistical
dependence between two data sets and are particularly suitable
for registering images acquired with different imaging mechanisms. However,
they are computationally expensive, which may be restrictive,
as remote sensing images are typically large.

FIGURE 5. General geometrical registration. [Flowchart: reference image and sensed image; image preprocessing; intensity information, gradient information, geometrical feature, and advanced feature; transformation model construction or displacement field; coordinate transformation/resampling; aligned image.]

FIGURE 6. Conventional area-based registration. Pay attention to how the black-edge region changes in the lower- and upper-left corners
of the aligned image. (a) The aligned images overlapping. (b) The sensed image. (c) The original images overlapping. (d) The fifth iteration.
(e) The reference image. (f) The first iteration. (g) The fourth iteration. (h) The third iteration. (i) The second iteration.
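The MI definition in (2) can be sketched with a joint histogram as follows. This is a minimal illustration under stated assumptions: the function name `mutual_information`, the choice of four bins, and the 2 x 2 test images are hypothetical, and integer intensities in [0, 256) are assumed. It is not the implementation used in the cited works.

```python
import math
from collections import Counter

def mutual_information(ref, sen, bins=4, max_val=256):
    """MI(R, S) = H(R) + H(S) - H(R, S) in bits, from histogram estimates."""
    # Quantize integer intensities in [0, max_val) into `bins` histogram bins.
    r = [p * bins // max_val for row in ref for p in row]
    s = [p * bins // max_val for row in sen for p in row]
    n = len(r)

    def entropy(labels):
        # Shannon entropy of the empirical distribution of `labels`.
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    # The (r, s) pairs play the role of the 2D (joint) histogram.
    return entropy(r) + entropy(s) - entropy(list(zip(r, s)))

# Hypothetical 2 x 2 images, used only for illustration.
ref = [[0, 64], [128, 192]]
inverted = [[255 - p for p in row] for row in ref]   # contrast reversal
flat = [[100, 100], [100, 100]]                      # carries no information

mi_inverted = mutual_information(ref, inverted)   # equals H(R) = 2 bits
mi_flat = mutual_information(ref, flat)           # equals 0 bits
```

The contrast-reversed image keeps the full 2 bits of shared information, which hints at why MI copes with intensity relationships that correlation-like metrics miss.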

FREQUENCY-DOMAIN APPROACHES
Frequency-domain approaches indirectly utilize intensity information, transforming an image and exploiting its frequency-domain features for registration. By so doing, they speed up the computation for relatively small geometric dislocations. Fourier techniques are typical representatives of frequency-domain registration, and they were first used to register images with translational changes [48]. Phase-based correlation approaches [23], [49]–[51] exploit the Fourier transform to register images by searching for global optimal matching [53]; they compute the cross-power spectra of the sensed and reference images and seek the location of the peak. The translational and rotational properties of the Fourier transform are employed to calculate the transformation parameters [53]. Frequency-domain approaches are robust against frequency-dependent noise and illumination changes. They also contribute to computational efficiency [54] since they neither involve feature extraction, as feature-based approaches do, nor require an optimization approach in the spatial domain, which would increase their computational complexity [53]. However, given that the Fourier transform offers poor spatial localization, the operation can be replaced by a wavelet transform with strong spatial and frequency localization [55], which can be applied to remote sensing image registration [56]. Recently, phase congruency (PC) has been used to represent structural information in remote sensing images; it is similar to the image gradient but is invariant in terms of image contrast and brightness variations [57], [58].
In short, most correlation-like approaches are statistical similarity metrics that do not exploit structural information and carry high computational complexity. Owing to their easy hardware implementations, they remain in frequent use for registration evaluation [59]. Fourier techniques have some advantages in terms of computational efficiency, and they are robust against frequency-dependent noise. However, they have limitations in the case of image pairs with significantly different spectral content. Although MI methods offer outstanding performance compared with the two aforementioned algorithms, they do not always provide a global maximum of the entire search space for the transformation, as insufficient image information or limited overlap between two scenes inevitably reduces their robustness [16], [25]. Overall, intensity-based approaches directly use the pixel values of an image, without error accumulation, offering high-precision registration. However, these algorithms have limitations in terms of large rotations, translations, scale differences, and so on and are quite time-consuming.

OPTICAL FLOW ESTIMATION
Similar to the area-based approaches, optical flow estimation calculates object motions with direct and indirect consistency constraints based on pixel intensity. This technique is popular in computer vision for motion estimation. Owing to the similarity between the displacements of corresponding pixels under the same coordinate system and the optical flow of an object, some studies have utilized optical flow estimation to register remote sensing images [60], [61]. Unlike area-based approaches, optical flow estimation calculates pixel displacement based on intensity and gradient consistency constraints for coordinate recalculation. After resampling, the intensity value is assigned to the new noninteger position, and the aligned image is acquired [62], as summarized in Figure 7.

FIGURE 7. Optical flow estimation for remote sensing image registration. [$(u_i, v_j)$ indicates the pixel coordinates in the reference image, and $(u'_i, v'_j)$ indicates the coordinates of the corresponding pixel in the sensed image. The coordinate difference, which we called the displacement, is depicted as $(\Delta u_i, \Delta v_j)$.] [Flowchart: reference image and sensed image; displacement calculation; pixels (assumption); coordinate transformation with $u = u' + \Delta u$ and $v = v' + \Delta v$.]

Optical flow is a 2D displacement field that describes the apparent motion of brightness patterns between two successive images [63], and its concept was proposed by Gibson [64]. Horn and Schunck (HS) [63] and Lucas and Kanade (LK) [65] proposed differential approaches for optical flow calculation in 1981. Since then, many extensions and modifications have been proposed for video image processing [66]–[68]. Given that the process is at the initial stage of development in the remote sensing field and that many studies have focused on differential techniques, the following
aspects are generally emphasized in research on remote sensing image registration.
DENSE OPTICAL FLOW ESTIMATION
The differential method for dense optical flow calculation
proposed by HS is generally called the typical global approach
[63]. Dense optical flow calculates each pixel’s motion in a
scene, as in Figure 8. The regular grid represents image pixels,
and the displacement is displayed at equal intervals, where
only the displacement directions and magnitudes of the green
pixels are marked, for brevity. The HS optical flow integrates
the brightness constancy assumption and the global smoothness constraint to separately estimate the pixel motion in the
x and y directions. The intensity value constancy assumption
is markedly susceptible to slight brightness changes [69],
which are inevitable for remote sensing images. Applying the
spatial gradient constancy assumption to the HS equation [as
in (3)] is popular in research on multitemporal remote sensing image registration [62], [69]:
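$$E(u, v) = \int_{\Omega} \psi\left(|I(\mathbf{x} + \mathbf{w}) - I(\mathbf{x})|^2 + \gamma\,|\nabla I(\mathbf{x} + \mathbf{w}) - \nabla I(\mathbf{x})|^2\right) d\mathbf{x} + \alpha \int_{\Omega} \psi\left(|\nabla_3 u|^2 + |\nabla_3 v|^2\right) d\mathbf{x}, \qquad (3)$$

where $\mathbf{w} = (u, v, 1)^T$ is the pixel displacement to be solved,
$\mathbf{x} = (x, y, t)^T$ is a pixel coordinate in the image domain $\Omega$, $\psi(s^2) = \sqrt{s^2 + \epsilon^2}$ is an
increasing concave function, and $\epsilon$ is a fixed value. Here, $\gamma$
and $\alpha$ are the weights for the gradient and smoothness terms,
respectively, and $\nabla_3 = (\partial_x, \partial_y, \partial_t)^T$ is the spatiotemporal gradient, which expresses a spatiotemporal
smoothness assumption and is often replaced by the spatial
gradient when used for remote sensing image registration.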
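To make the global approach concrete, the following is a minimal sketch of the classic HS iteration, which is a simplification of (3): it uses only the brightness constancy term with quadratic penalties and omits the gradient constancy term and the robust function. The function name `horn_schunck`, the four-neighbor averaging, the parameter defaults, and the ramp test images are assumptions made for illustration.

```python
def horn_schunck(img1, img2, alpha=1.0, iterations=100):
    """Classic Horn-Schunck flow (brightness constancy + quadratic smoothness)."""
    h, w = len(img1), len(img1[0])

    def clamp(i, n):
        return max(0, min(n - 1, i))

    def ddx(img, y, x):
        # Central difference inside the image, one-sided at the borders.
        x0, x1 = clamp(x - 1, w), clamp(x + 1, w)
        return (img[y][x1] - img[y][x0]) / (x1 - x0)

    def ddy(img, y, x):
        y0, y1 = clamp(y - 1, h), clamp(y + 1, h)
        return (img[y1][x] - img[y0][x]) / (y1 - y0)

    ix = [[ddx(img1, y, x) for x in range(w)] for y in range(h)]
    iy = [[ddy(img1, y, x) for x in range(w)] for y in range(h)]
    it = [[img2[y][x] - img1[y][x] for x in range(w)] for y in range(h)]
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]

    def neighbor_mean(f, y, x):
        # Four-neighbor average with replicated borders (a simplification).
        return (f[clamp(y - 1, h)][x] + f[clamp(y + 1, h)][x]
                + f[y][clamp(x - 1, w)] + f[y][clamp(x + 1, w)]) / 4.0

    for _ in range(iterations):
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                ub, vb = neighbor_mean(u, y, x), neighbor_mean(v, y, x)
                resid = ix[y][x] * ub + iy[y][x] * vb + it[y][x]
                denom = alpha ** 2 + ix[y][x] ** 2 + iy[y][x] ** 2
                new_u[y][x] = ub - ix[y][x] * resid / denom
                new_v[y][x] = vb - iy[y][x] * resid / denom
        u, v = new_u, new_v
    return u, v

# Hypothetical test pair: a horizontal ramp moved by +1 pixel in x.
img1 = [[float(x) for x in range(6)] for _ in range(5)]
img2 = [[float(x) - 1.0 for x in range(6)] for _ in range(5)]
u, v = horn_schunck(img1, img2)   # u approaches 1 everywhere, v stays 0
```

Every pixel receives a displacement, which illustrates both the per-pixel precision of the dense approach and the cost that grows with image size.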
FIGURE 8. Dense optical flow. [Schematic: a regular grid of pixels with the optical flow drawn at the marked pixels.]

Owing to the per-pixel computation of
optical flow estimation, very local deformation due to terrain
elevations can be eliminated. Occlusion remains a challenge
for accurate dense optical flow calculation [66]; it is analogous to land use (LU) and land cover (LC) changes in remote
sensing images [62]. Under this circumstance, an object in the
reference (sensed) image cannot be found in the sensed (reference) image. For example, in the yellow, rounded rectangles
in Figure 9(a) and (b), a road disappears in the sensed image. This leads to abnormal pixel displacement, in
Figure 9(c), where the magnitudes and directions of the displacements are inconsistent with the neighborhood. The successive abnormal displacements further change the content of
the aligned image, although it is highly geometrically aligned
with the reference image in Figure 9(d). This change violates
the principle that image registration should not alter the
image content but only spatially align the sensed and reference
images. After the abnormal displacement correction, the recalculated displacement is similar to that of the surrounding
region, as in Figure 9(e). Furthermore, the aligned image is
similar to the corresponding region in the sensed image in
Figure 9(b), and the two are spatially aligned with the reference image, as in Figure 9(f).
For large-scale movements, which are another concern
when applying optical flow for remote sensing image registration, an improved approach was proposed in [70]. The pixel
displacement calculated by the extended phase correlation
technique is used as the initial motion estimator for
the global optical flow to achieve general remote sensing image registration, especially for large-scale movement deformation [70]. However, given that dense optical flow estimation
calculates the displacement for each pixel, it is impractical for
the real-time registration of large images, although it provides
a high-precision result.
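The idea of using phase correlation as an initial motion estimator can be sketched as follows. This is plain translational phase correlation, not the extended technique of [70]. The naive DFT, the function names `dft2` and `phase_correlation`, and the 8 x 8 test image are assumptions made to keep the example dependency-free; a practical implementation would use a fast Fourier transform library.

```python
import cmath

def dft2(img, inverse=False):
    """Naive separable 2D DFT of a list-of-rows image (fine for tiny examples)."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1

    def dft1(vec):
        n = len(vec)
        return [sum(vec[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                    for k in range(n)) for m in range(n)]

    rows = [dft1(row) for row in img]                                # along x
    cols = [dft1([rows[y][x] for y in range(h)]) for x in range(w)]  # along y
    scale = 1.0 / (h * w) if inverse else 1.0
    return [[cols[x][y] * scale for x in range(w)] for y in range(h)]

def phase_correlation(ref, sen, eps=1e-9):
    """Return the circular shift (dy, dx) with sen[y][x] = ref[y - dy][x - dx]."""
    h, w = len(ref), len(ref[0])
    f_ref, f_sen = dft2(ref), dft2(sen)
    cross = [[f_sen[y][x] * f_ref[y][x].conjugate() for x in range(w)]
             for y in range(h)]
    # Keep only the phase of the cross-power spectrum.
    norm = [[c / abs(c) if abs(c) > eps else 0j for c in row] for row in cross]
    surface = dft2(norm, inverse=True)
    # The peak of the correlation surface marks the translation.
    _, dy, dx = max((surface[y][x].real, y, x)
                    for y in range(h) for x in range(w))
    # Map peak indices to signed shifts.
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)

# Hypothetical 8 x 8 image with three bright pixels, circularly shifted by (2, 3).
ref = [[0.0] * 8 for _ in range(8)]
ref[1][2], ref[2][2], ref[4][5] = 5.0, 3.0, 7.0
sen = [[ref[(y - 2) % 8][(x - 3) % 8] for x in range(8)] for y in range(8)]
shift = phase_correlation(ref, sen)
```

A coarse shift recovered this way could then seed the flow field before the differential refinement, which is the role the text assigns to the phase correlation step.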
FIGURE 9. Abnormal displacement detection and correction. (a) The reference image. (b) The sensed image. (c) The displacement field
estimated by (3). (d) The aligned image overlapping (a). (e) The corrected displacement field. (f) The aligned image formed by overlapping
the corrected optical flow with (a). The highlighted road in (a) disappears in (b), producing an occlusion-like effect.

SPARSE OPTICAL FLOW ESTIMATION
Sparse optical flow estimation is more popular for remote
sensing image registration than its dense counterpart is. The
sparse optical flow, represented by the local differential approach, may
be computed in a specified local region, such as the position
of the feature points extracted by popular extractors, including the scale-invariant feature transform (SIFT), as shown
in Figure 10. This approach assumes that pixel motions are
identical within a local neighborhood and estimates the optical flow by performing least-squares regression with a set of
similar equations [66]. The original LK gradient-based approach [65]
is widely used to estimate the motion of video
images, on an equal footing with the HS model. The GeFOLKI algorithm was developed from LK and implemented
on a graphics processing unit to achieve real-time and robust
optical flow estimation [60], [71]. Furthermore, the GeFOLKI
algorithm is adopted for the coregistration of heterogeneous
data, such as synthetic aperture radar (SAR)–lidar images and
SAR–optical images [61]. Subsequently, given the different
imaging mechanisms of SAR and high-resolution optical images, which benefit from the high registration precision of
optical flow estimation, two dense feature descriptors replace
raw intensities when aligning images by an optical-to-SAR
flow; this combines the global and local optical flow estimation approaches [72]. Sparse optical flow based on specified
and distinct pixels is computationally efficient, whereas
its accuracy for remote sensing image registration is relatively
low compared with the dense optical flow approach. In addition, it is not vulnerable to LU–LC changes because no similar
features are available in the changed region for sparse optical flow estimation.
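The local least-squares idea behind LK can be sketched for a single feature point as follows. The function name `lucas_kanade_at`, the window size, and the synthetic image pair are hypothetical; the sketch assumes that the window plus a one-pixel margin lies inside the image and omits the pyramids and iterations that practical implementations add.

```python
def lucas_kanade_at(img1, img2, cy, cx, half=2):
    """Least-squares flow (u, v) for the (2*half+1)^2 window centered at (cy, cx)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0   # spatial gradient in x
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0   # spatial gradient in y
            it = img2[y][x] - img1[y][x]                   # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve the 2 x 2 normal equations [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window lacks gradient variety (aperture problem)")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Hypothetical image pair: I(x, y) = x * y moved by +1 pixel in x.
size = 11
img1 = [[float(x * y) for x in range(size)] for y in range(size)]
img2 = [[float((x - 1) * y) for x in range(size)] for y in range(size)]
u, v = lucas_kanade_at(img1, img2, cy=5, cx=5)   # approximately (1, 0)
```

Only the chosen window is processed, which reflects the computational saving of the sparse approach over per-pixel estimation.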
In summary, optical flow estimation has been developed
in computer vision for motion estimation in superresolution
reconstruction for several decades, whereas it is in the initial
stage of use in remote sensing image registration. Optical
flow estimation is a superior pixel displacement calculation
approach that is particularly interesting in the case of very local deformation due to, for example, terrain elevation, which
has considerable influence on high-resolution image registration [61]. The efficiency of optical flow estimation should be
considered when applying it to remote sensing because a wide
field of view (WFV) is a characteristic of remote sensing images.
Moreover, due to social development and seasonal changes,
LU–LC changes are frequent phenomena in multitemporal
remote sensing images. The dense optical flow approach is
sensitive to such changes, leading to abnormal displacement
and the alteration of the content of an aligned image. Therefore, efficient and accurate correction should be integrated into
the initial optical flow estimation when used for registration.

FIGURE 10. Sparse optical flow. [Schematic: pixels, feature points, and the optical flow computed at the feature points.]

FEATURE-BASED REGISTRATION
The feature-based approach directly exploits the abstract
features of an image, rather than the pixel intensity, for registration. A feature refers to a distinct geometrical or advanced
characteristic extracted by a specified approach. Geometrical
features are distinct points, line segments, and closed boundary regions in a remote sensing image that can be detected
or extracted by extant or novel approaches. Advanced features
are abstract descriptions of local regions, which are extracted
by a neural network (NN) (especially in the deep learning approach) to represent the original image. Registration with geometric features
is regarded as conventional feature-based registration, and the use of advanced features is defined as novel
feature-based registration.
CONVENTIONAL FEATURE-BASED METHOD
In general, salient and distinctive features, such as points,
line segments, and closed boundary regions, are manually
or automatically detected to represent the original remote
sensing image. The feature correspondence is then established
between the reference and sensed images by a similarity comparison of the feature descriptors. The geometric relationship
is calculated and used to spatially align the sensed image
with the reference. Ultimately, coordinates in the sensed image are transformed. The transformed coordinates are usually noninteger, so their intensity values are obtained by interpolation,
as demonstrated in Figure 11. In
the following, we summarize geometrical feature extraction
and matching because research into this subject has been at
the core of the traditional feature-based approach.
FEATURE EXTRACTION
The feature extraction mentioned here covers both
feature detection and feature description. Detection aims to locate distinctive features in an image and determine their positions.
In the feature-extraction stage, a recognizable descriptor is
uniquely constructed, identifying the detected feature. Formerly, features were manually selected. This approach is still
in use today, as in the "image-to-image registration" module
in Environment for Visualizing Images (ENVI) software. Experts require a considerable amount of time for this approach,
especially for large remote sensing images. At present, many
methods have been proposed to automatically acquire representative features. Common geometrical features, including
salient points (line intersections, corners, points on curves
with high curvature, and road crossings) [73], [74], polylines
(roads, contours, and edges) [41], [75], and polygons (closed
boundary regions and lakes) [76], are selected by the specified approach. As shown in Figure 11, the yellow points, line
segments, and regions are detected to abstractly describe the
original image.
FEATURE POINTS
The local points at which the gray value varies dramatically in
all directions are feature points, including corner points, inflection points, and T-intersection points. Many attempts have
been made to extract them in computer vision, inspiring the
development of feature point extraction in remote sensing. The
first corner detection approach was proposed by Moravec in
1977 [77]. This algorithm has fast computation but is sensitive
to noise and vulnerable to image rotation, leading to its rare
use in the remote sensing field. The Harris corner detector was
proposed in 1988 [78]. This algorithm is invariant under grayscale and rotational changes. It and improved Harris algorithms
are applied to remote sensing image processing [38], [74], [79],
[80], mainly with respect to multiscale corner detection.
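As a rough illustration of how a Harris-type detector separates corners from edges and flat regions, the following sketch computes the corner response from the local structure tensor. It is a simplification under stated assumptions: an unweighted 3 x 3 window replaces the Gaussian weighting, the constant `k = 0.04` is a commonly used value rather than one taken from the text, and the function name `harris_response` and the synthetic image are hypothetical.

```python
def harris_response(img, cy, cx, half=1, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 at pixel (cy, cx)."""
    sxx = sxy = syy = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0   # gradient in x
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0   # gradient in y
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # M = [[sxx, sxy], [sxy, syy]] is the structure tensor of the window.
    return (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2

# Hypothetical 9 x 9 image: a bright block occupying x >= 4 and y >= 4.
img = [[1.0 if x >= 4 and y >= 4 else 0.0 for x in range(9)] for y in range(9)]
corner = harris_response(img, 4, 4)   # positive: gradients in both directions
edge = harris_response(img, 6, 4)     # negative: gradient in one direction only
flat = harris_response(img, 2, 2)     # zero: no gradient
```

The sign of the response distinguishes the three cases, and the dependence on the eigenvalues of the structure tensor, rather than on a fixed direction, is what makes the measure insensitive to rotation.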
Smith and Brady presented the smallest univalue segment
assimilating nucleus (SUSAN) operator [81], which is insensitive to local noise and has high anti-interference ability [82]. However,
it is not widely used in remote sensing image registration [83],
whereas the SIFT algorithm is [45], [58], [74], [84]–[90]. The
SIFT was developed by Lowe [92] and is invariant under rotation, scale, and translational changes [93]. It has been followed
by many improved versions, such as principal component
analysis SIFT [94], scale-restriction (SR) SIFT [36], [95], affine
SIFT [96], and uniform robust SIFT [97], [98]. Moreover, the
speeded-up robust features (SURF) [99] algorithm was proposed by Bay et al. to overcome the time-consuming nature
of the SIFT for large-scale remote sensing images [100]–[102].
SURF applies an integral image to compute image derivatives
and quantizes the gradient orientations in a small number of
histogram bins [103]. Additionally, the features from accelerated segment test (FAST) [104]; binary, robust, independent
elementary features (BRIEF) [105]; oriented FAST and rotated
BRIEF (ORB) [106], [107]; KAZE [108]; and accelerated KAZE [109] algorithms are fast tools for descriptor construction but are less
widely utilized in remote sensing.
In addition, a novel key point detector combining corners
and blobs for remote sensing image registration is under development to increase the number of correctly matched features
[110]. Recently, to cope with intensity differences in multimodal
remote sensing images, robust and novel feature descriptors
have been adopted to depict detected feature points; these include the LSS descriptor, which accommodates effects such
as nonlinear intensity differences [36]; the histogram of oriented PC, based on structural similarity measures [57]; and
maximally stable PC, representing a novel affine and contrast-invariant descriptor [111]. All these coincidentally absorb PC
information. PC is similar to the image gradient, presenting
structural information with resistance to variations in illumination [112]. Therefore, the use of PC information is a trend in the construction of robust feature descriptors
for multimodal remote sensing images.

FEATURE LINES
A feature line is also known as a line feature; it is a generalization of feature points that includes general line segments [113],
object contours [75], roads, coast lines [114], and rivers [115].
Given that feature lines have more attributes than feature
points as control features [116], they
have been gradually developed for use
in image registration [117] as well as
remote sensing image registration
[116], [118], [119]. Standard edge detectors, such as the Canny detector
[120], [121] and detectors based on
the Laplacian of Gaussian [122], are
conventional feature line detection
approaches [16]. Recently, some excellent detectors generating precise
and robust line segments have been
proposed [123], [124], and they are
suitable for line detection in remote
sensing images. Feature lines are comparatively less utilized in the remote
sensing field than are feature points
because matching them is an obstacle. They are often reduced to
corners, midpoints, and endpoints
as final features [16], thereby losing
their geometric value.

FIGURE 11. The geometrical feature-based registration algorithm. [Flowchart: feature point, feature line, and feature region extraction; feature matching; mismatched features elimination; transformation model construction; coordinates transformation/resampling; aligned image.]

FEATURE REGION
Feature region is a general term for all closed boundary regions
of appropriate size, e.g., lakes [125], forests [126], buildings
[113], urban areas [127], and so on. Before the robust feature
point extraction approach was developed, the feature region
was used to indirectly extract feature points. Regions with high
contrast were extracted by filtering [128] and image segmentation [129] and described with moment-invariant descriptors [130], [131]. They are often abstracted by their centers of
gravity [128], [132]–[135], which are invariant with respect to
rotation, scaling, and skewing and are stable under random
noise and gray-level variation [16]. Compared with feature
points and lines, the extraction and description of feature regions were relatively early foci of research, and they have been
used less for recent feature-based registration.
FEATURE MATCHING AND MISMATCHED
FEATURE ELIMINATION
The correspondence relationship between reference and
sensed images can be established based on detected feature
points, lines, and regions, exploiting various descriptors of
features [16], [136], [137]. Mismatched features are an inevitable byproduct of general feature matching; eliminating
them purifies the correspondences so that the resulting transformation models are as accurate as possible. A pair of features
with similar attributes is considered a candidate match despite radiometric differences, noise, image distortion, and so
forth. Under these circumstances, a robust matching measurement is essential. Feature matching approaches can be generally classified into two categories, namely, feature similarity
and spatial relations.
FEATURE SIMILARITY
The constructed feature descriptors are used to establish the
correspondence between extracted features in the reference
and sensed images through feature similarity comparison.
Feature similarity is commonly evaluated in the feature space by using
the ratio of the Euclidean distances to the first and second
nearest neighbors [92]. For efficiency, the k-dimensional tree
and the best-bin-first algorithms are employed for feature
similarity determination [93], [138]. The clustering technique
[140], chamfer matching [141], and PC models are also frequently
used matching approaches, and they are invariant under intensity changes during matching [1].
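The nearest-neighbor distance ratio described above can be sketched with a brute-force search as follows. The function name `ratio_test_matches`, the threshold of 0.8, and the two-dimensional toy descriptors are assumptions for illustration; real descriptors are high dimensional, and the k-dimensional tree or best-bin-first search mentioned in the text would replace the exhaustive loop.

```python
import math

def ratio_test_matches(desc_ref, desc_sen, ratio=0.8):
    """Keep a match only if the nearest neighbor is clearly closer than the second."""
    matches = []
    for i, d in enumerate(desc_ref):
        # Brute-force distances from one reference descriptor to all sensed ones.
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_sen))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Hypothetical 2D descriptors, used only for illustration.
desc_ref = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
desc_sen = [(0.1, 0.0), (10.0, 10.2), (4.0, 4.0), (6.0, 6.0)]
matches = ratio_test_matches(desc_ref, desc_sen)
```

The third reference descriptor is equally close to two candidates, so it is rejected as ambiguous, whereas the first two are accepted.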
SPATIAL RELATIONS
Aimed at tie point matching in poor textural regions, approaches based on spatial relations have been developed.
Representative of these, graph-based feature point matching
considers feature points as graph nodes. Feature matching is
then transformed into a node-correspondence problem and
solved by graph matching [125], [142]. Graph matching is
applied to image feature correspondences, although it is not
affine invariant [143]. By finding a consensus nearest-neighbor graph from candidate matches, a graph-transformation
matching approach is developed [144]. Targeting the problem
in [143], a similar graph matching for tie point matching in
poor textural images is proposed [101]. Furthermore, Xiong
and Zhang introduced a novel interest point matching approach for
high-resolution satellite images [145]. In this approach, the relative
position and angle are used to reduce ambiguity and to avoid
false matching; the approach is suitable for image shifting
and rotation, but affine and large-scale transformations are not
considered [144].
MISMATCHED FEATURE ELIMINATION
Although the extracted features in a reference image have
been matched with the corresponding ones in the sensed
image via the aforementioned approach, some mismatched
feature points are inevitable, further affecting the transformation model estimation [32], [76]. Therefore, eliminating
mismatched features with a specified approach is necessary
[146], [147]. Generally, based on the initial matching result,
random sample consensus (RANSAC) is used to remove
mismatched points. This method randomly selects a minimal sample
of the candidate matches in each iteration, fits a model to it, and keeps the largest
consensus set to calculate the final model parameters [33],
[148]. RANSAC performs well and robustly when there are no
more than 50% outliers [144], [149], [150]. Combining the
local structure with global information, a restricted spatial order constraints algorithm was developed to find exactly matched
feature points in reference and sensed images [144].
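The RANSAC loop can be sketched as follows for the simplest possible model. This is an illustration under stated assumptions: a pure 2D translation stands in for the transformation model so that a single match forms the minimal sample, whereas registration pipelines usually estimate affine or projective models. The function name `ransac_translation`, the tolerance, the iteration count, and the synthetic matches are hypothetical.

```python
import random

def ransac_translation(matches, tol=1.0, iterations=50, seed=0):
    """matches: list of ((x_ref, y_ref), (x_sen, y_sen)) candidate pairs.

    Returns the estimated shift and the indices of the consensus set.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        # Minimal sample: one match fully determines a translation.
        (xr, yr), (xs, ys) = rng.choice(matches)
        dx, dy = xs - xr, ys - yr
        inliers = [i for i, ((a, b), (c, d)) in enumerate(matches)
                   if abs(c - a - dx) <= tol and abs(d - b - dy) <= tol]
        if len(inliers) > len(best):
            best = inliers
    # Refit the model on the largest consensus set.
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best) / len(best)
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best) / len(best)
    return (dx, dy), best

# Hypothetical data: six matches consistent with a shift of (5, -3), two mismatches.
ref_pts = [(0, 0), (10, 0), (0, 10), (10, 10), (3, 7), (8, 2)]
matches = [((x, y), (x + 5, y - 3)) for x, y in ref_pts]
matches += [((1, 1), (40, 40)), ((2, 9), (-20, 5))]
shift, inliers = ransac_translation(matches)
```

With 25% mismatches, almost any draw that hits a correct match recovers the full consensus set, so the two mismatched pairs are excluded from the final estimate.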
Based on the affine-invariance property of the triangle-area
representation (TAR), a robust sample consensus judging algorithm is proposed to efficiently identify bad samples and
ensure accuracy with a light computational load [151]. For
images with simple patterns, large affine transformations, and
low overlapping areas, a mismatch-removal principle based
on the TAR value of the k-nearest neighbors is proposed and
referred to as k-nearest neighbors–TAR [149]. Furthermore,
an improved RANSAC approach called fast sample consensus
is developed to obtain correct matching in a few iterations
[150], [152]. Thus, most of the reserved feature points in
the reference image accurately correspond to the specified
feature points in the sensed image, as shown by the feature points
connected by the yellow lines in Figure 12, which adds precision to the transformation model estimation in the following step. The geometrical feature-based approach abstracts
an original remote sensing image with distinct features
instead of its intensity information, which is efficient and
can easily process large rotations, translations, and scale
differences between reference and sensed images. However,
position errors in the automatically extracted features are
inevitable, and a few mismatched features cannot be eliminated. This leads to a relatively low registration precision
compared with the intensity-based approach.
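The affine-invariance property that underlies the TAR-based mismatch removal discussed above can be checked numerically. The sketch below only demonstrates the property (an affine map scales all triangle areas by the same factor, so area ratios are preserved); it is not the sample-judging algorithm of [151] or the k-nearest neighbors–TAR method of [149]. The function names, the affine coefficients, and the triangles are hypothetical.

```python
def triangle_area(p, q, r):
    """Area of the triangle pqr from the 2D cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def affine(pt, a=(2.0, 1.0, 0.5, 3.0), t=(4.0, -1.0)):
    """Hypothetical affine map with matrix [[a0, a1], [a2, a3]] and offset t."""
    x, y = pt
    return (a[0] * x + a[1] * y + t[0], a[2] * x + a[3] * y + t[1])

# Two hypothetical triangles formed by feature points in the reference image.
tri1 = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
tri2 = [(1.0, 1.0), (5.0, 2.0), (2.0, 6.0)]

area1, area2 = triangle_area(*tri1), triangle_area(*tri2)
warped1 = triangle_area(*[affine(p) for p in tri1])
warped2 = triangle_area(*[affine(p) for p in tri2])

ratio_before = area1 / area2
ratio_after = warped1 / warped2   # unchanged by the affine map
scale = warped1 / area1           # equals |det A| = 5.5 for this map
```

A matched point set whose triangle-area ratios disagree between the two images therefore cannot be related by a single affine transformation, which is the cue such methods use to flag bad samples.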
NOVEL FEATURE-BASED REGISTRATION
BY DEEP LEARNING
Deep learning provides a new concept for remote sensing
image registration. It essentially refers to image registration
based on advanced feature extraction [153]. Deep learning
originated in computer vision and has a long history [154].
In recent years, it has gradually entered use in remote sensing image applications, such as image fusion [155], [156],
LC classification [157], [158], and segmentation [159]. The
framework is data driven and can generate image features
by learning from many training data sets with a specified
principle [158]. Therefore, it is suitable for remote sensing
image registration.
Some studies have focused on feature matching for this
purpose [158], [160]. Most utilize a Siamese network consisting of two parts to train a deep NN (DNN) [161]–[164].
One part extracts features from image patch pairs by training
a Siamese, pseudo-Siamese, or improved Siamese network
[165]; the other part measures the similarity between these
features for image matching. In [164], the DNN inspired the
construction of a deep learning framework for remote sensing image registration. In addition, generative adversarial networks (GANs) are applied to image matching and registration
[166], [167]. These approaches first translate an image into
another one by training the GANs, enabling two images to
have similar intensities and feature information [166], [168].
Feature extraction and matching are subsequently performed
between two artificially generated images, effectively improving the performance of image matching. To address the deficiencies
of specified-scale NNs, multitask learning is introduced to
improve the registration precision [169]. Wang et al. break
through the limitations of the traditional deep learning approach, which extracts image features in one network and
matches them with another NN. They design an end-to-end network using forward propagation and backward feedback to learn the mapping functions of the patches and their
matching labels for remote sensing image registration [164].
Recently, Li et al. paired image blocks from sensed and reference images and directly learned the displacement parameters
of the four corners of the sensed block relative to the reference image with a deep learning regression network, which differs from
the traditional deep learning method [170].
Deep learning has advantages over the traditional registration approach. It is completely data driven and has strong
flexibility, enabling it to theoretically fit any complex mapping function, whereas the traditional registration method
can deal only with fixed pattern registration. Moreover, deep
learning extracts abstract and high-level semantic information. Compared with low-level gray and gradient data, deep
semantic information is more consistent with the way humans understand images. Therefore, deep learning methods
can extract robust features. However, deep learning has challenges. It depends heavily on image samples; when there is a
lack of data or the data quality is poor, deep learning methods have difficulty ensuring the effectiveness of the registration results. Although remote sensing images are now easy to
acquire, the shortage of manually annotated and standard data sets
remains severe. Deep learning, in essence, learns the statistical characteristics of a large number of similar images, but
its input–output process is a complex, nonlinear mapping
without clear physical significance. Additionally, deep learning requires high computing power and has major hardware
requirements, limiting its applicability.

FIGURE 12. Feature matching examples.
In short, remote sensing image registration based on deep
learning is still in its infancy, and its registration framework is
not mature. However, many studies have demonstrated that
deep learning methods can achieve or even surpass the optimal
level of traditional registration approaches in terms of accuracy
and efficiency. We predict that deep learning-based methods
will become important solutions to the problem of real-time
and high-precision remote sensing image registration.
REGISTRATION BASED ON THE COMBINATION METHOD
As mentioned, feature- and intensity-based approaches have
their own advantages. Different feature extractors also have
various precisions. To integrate these strengths as fully as possible, combination techniques have been developed. Typically, popular combinations consist of two aspects, namely,
feature- and area-based approaches; however, some integrate
two geometric feature-based approaches, such as the SIFT and
Harris detectors.
COMBINATIONS OF FEATURE- AND AREA-BASED ALGORITHMS
Feature-based approaches are typically suitable for images
with more significant structural data than intensity information. However, they are restricted by the distribution and
accuracy of the features. On the other hand, area-based approaches are appropriate for images with more distinctive
intensity information; however, they require the intensity
information of the reference and sensed images to be correlated. Thus, the two methods have complementary pros and
cons. To further improve registration accuracy and robustness,
some studies focus on a combination of geometric feature- and area-based techniques [171]. Huang et al. [172] proposed
a hybrid approach to aligning images by intensities within
a scale-invariant feature region. Elsewhere, a wavelet-based
feature extraction technique and an area-based method with
NCC were combined to reduce the local distortion caused by
terrain relief [173]. In a wavelet-based hierarchical pyramid
framework, Mekky et al. [174] proposed a hybrid approach
using MI and the SIFT; employing the rough registration parameters of the area-based approach for MI, the number of
false alarms obtained by the SIFT was reduced. In addition,
Gong et al. employed the robustness of the SIFT and the accuracy of MI, proposing a novel coarse-to-fine registration
framework aimed at registering optical and SAR remote sensing images [90]. For multisensor SAR image registration, Suri
et al. proposed a multistage registration strategy. The rough
parameters of the transformation model are estimated by MI,
and this model is introduced during the SIFT matching phase
to increase the number of tie points [175]. Under the SIFT
and MI combinations, Heo et al. introduced a stereo matching method that produces accurate depth maps [176]. All
these approaches can be considered coarse-to-fine-processing
chains. The basic idea is to improve the result of the feature-based approach by adopting an optimization process from an
area-based technique [90], [171].
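The coarse-to-fine idea can be sketched as follows. This is a simplified illustration, not any of the cited pipelines: the coarse shift is simply assumed to come from a feature-based stage, normalized cross-correlation (NCC) is used as the area-based measure, and the function names, window size, and search radius are hypothetical choices.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def refine_shift(reference, sensed, center, coarse_shift, half=8, radius=3):
    """Search +/- radius pixels around coarse_shift for the shift with the highest NCC."""
    r, c = center
    template = reference[r - half:r + half + 1, c - half:c + half + 1]
    best_score, best_shift = -2.0, tuple(coarse_shift)
    for dr in range(coarse_shift[0] - radius, coarse_shift[0] + radius + 1):
        for dc in range(coarse_shift[1] - radius, coarse_shift[1] + radius + 1):
            rr, cc = r + dr, c + dc
            if rr - half < 0 or cc - half < 0:
                continue  # window would start outside the image
            window = sensed[rr - half:rr + half + 1, cc - half:cc + half + 1]
            if window.shape != template.shape:
                continue  # window would end outside the image
            score = ncc(template, window)
            if score > best_score:
                best_score, best_shift = score, (dr, dc)
    return best_shift, best_score

# Synthetic test: the sensed image is the reference shifted by (5, -4) pixels.
rng = np.random.default_rng(0)
reference = rng.random((80, 80))
sensed = np.roll(reference, (5, -4), axis=(0, 1))
# (4, -2) plays the role of an imprecise feature-based estimate.
shift, score = refine_shift(reference, sensed, center=(40, 40), coarse_shift=(4, -2))
print(shift, round(score, 3))
```

The coarse estimate keeps the search window small, which is what makes the exhaustive area-based refinement affordable.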
The combined methods integrate the robustness of the
feature-based algorithm with the accuracy of the area-based
approach. They are relatively few compared with individual
methods, but their combination will be the focus in the near
future, from our point of view. To deal with the possible accumulation of errors, bundle block adjustment is usually needed [178], [179] to register sequential images. Moreover, the
integration of different geometric feature-based approaches is
being developed, as well, for ever-increasing transformation
model estimation accuracy, generating precise registration results to the greatest extent possible.
INTEGRATION OF TWO GEOMETRIC FEATURE-BASED APPROACHES
In addition to combinations of feature- and area-based techniques, the integration of two geometric feature-based approaches is a developing trend for high-precision registration.
In particular, the feature points extracted by different methods
are used to register images in two stages. Yu et al. proposed
to extract feature points using the SIFT for the preregistration
of Satellite Pour l’Observation de la Terre-5/Thematic Mapper/
Quickbird images from different sensors [74]. In the fine registration stage, the Harris corner detector is
applied to detect distinct corners, and the extracted points
are matched by the NCC algorithm. Similarly, Lee used SURF
to extract the feature points of a low-resolution image after
Haar wavelet transformation, which is defined as rough registration [180]. Fine registration is the same as the approach
proposed by Yu et al. Recently, Ye et al. utilized SR–SIFT to
extract feature points in the preregistration stage to eliminate distinct
translation, rotation, and scale differences.
To further optimize registration, the Harris algorithm was
employed to detect feature points in the reference and prealigned images and describe them by LSS for matching [36].
To register large, high-resolution remote sensing images, a
coarse-to-fine strategy combining the Harris–Laplace detector with the SIFT descriptor has been proposed. After rough
registration, a large image is divided into small, processable
blocks for fine alignment [181]. Additionally, in a new two-step registration, the approximate spatial relationship is calculated with the deep features using a convolutional NN in the
first step. Then, the previous result is adjusted based on the
extracted local features [182]. Another technique combines
feature point and feature line methods for the registration
of images covering low-texture scenes in the computer vision field [183]. Since low- and repeated-texture regions are
common in remote sensing images, feature lines can be employed to supplement the number of feature points. Therefore, besides the combination of two geometric feature-based
methods, the integration of different geometrical features has
great potential for the high-precision alignment of remote
sensing images [22].
Since combination schemes integrate the advantages of
two or more registration approaches, they offer remarkable
precision. Moreover, in general, preregistration provides a
rough result that approximates the final alignment. With fine-tuning in the optimized registration stage, a high-precision
registration result is finally acquired. This strategy is suitable for remote sensing image registration with large spatial
position differences. However, it is as time-consuming as running two or more
alignment strategies.
SOFTWARE-BASED REGISTRATION
Most reviews emphasize the ever-increasing number of image registration approaches that are improved on the basis
of existing methods for registering larger and more complicated images [16], [184]. Few studies have evaluated the
performance of software-embedded image registration modules and the packages/tools for image geometric registration
[185]. Thus, in this section, we present some examples.
The Earth Resources Data Analysis System (ERDAS), ENVI,
PCI Geomatica, ER Mapper, and Arc Geographic Information
System (GIS) are well-known software packages for remote
sensing image processing that include registration modules.
ER Mapper was acquired by ERDAS a few years ago. They
integrate conventional manual and automatic registration
programs. Concretely, ENVI could register two remote sensing images or align one image with a map covering the same
scene. A user can extract tie points by observing similar objects
lying on two images, such as corners of buildings, road intersections, inflection points of rivers, and so on. With a uniform
point distribution, the parameters of a specified transformation model can be estimated. There are some general geometric mapping functions, including affine, polynomial, and
triangulation transformation models. Geometric mapping is
generally conducted by an expert and is time-consuming and
tedious. It is difficult to avoid subjective factors while extracting tie points, especially when registering WFV images that
require more time than general image registration. To reduce
manual labor and improve registration efficiency,
an automatic alignment technique is also built into ENVI.
The user specifies the reference and sensed images. After the area-based matching parameters are set,
the tie points for transformation model construction are automatically extracted, and the aligned image is then obtained. Neither the manually extracted tie points nor the automatically acquired points in ENVI are sufficiently accurate. For example, the
coordinates of the extracted feature point are (157.05, 171),
which may suggest the neighborhood of the real corner.
Under this circumstance, the calculated geometric spatial relationship is not as precise as it could be. The obtained registration result is usually worse than expected, especially for
high-resolution remote sensing images with inconsistent local deformation.
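The dependence of the estimated mapping on tie-point precision can be illustrated with a small numerical sketch. It assumes an affine model fitted by least squares; the affine parameters, point count, and function names are invented for the example, and rounding to whole pixels is only a crude stand-in for coarse tie-point localization, not a description of how ENVI works.

```python
import numpy as np

def fit_affine(sensed_pts, reference_pts):
    """Least-squares affine model mapping sensed (x', y') to reference (x, y)."""
    sensed_pts = np.asarray(sensed_pts, dtype=float)
    reference_pts = np.asarray(reference_pts, dtype=float)
    design = np.hstack([sensed_pts, np.ones((len(sensed_pts), 1))])  # N x 3
    params, *_ = np.linalg.lstsq(design, reference_pts, rcond=None)  # 3 x 2
    return params

def apply_affine(params, pts):
    """Apply the 3 x 2 affine parameter matrix to an array of 2D points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ params

# Made-up ground truth: slight rotation/scale plus a sub-pixel translation.
rng = np.random.default_rng(0)
sensed = rng.uniform(0, 500, size=(12, 2))
true_params = np.array([[0.999, 0.012], [-0.012, 0.999], [3.4, -1.7]])
reference = apply_affine(true_params, sensed)

exact = fit_affine(sensed, reference)                        # sub-pixel tie points
rounded = fit_affine(np.round(sensed), np.round(reference))  # whole-pixel tie points
err_exact = np.abs(exact - true_params).max()
err_rounded = np.abs(rounded - true_params).max()
print(err_exact, err_rounded)
```

With noise-free sub-pixel coordinates the parameters are recovered almost exactly, whereas the rounded coordinates leave a visible parameter error.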
ERDAS was developed by the ERDAS Corporation, in the
United States. Compared with ENVI, it can produce tie points
with higher location accuracies [for instance, the coordinate
of the extracted feature point is (385.776, 75.161), which has
more decimal places] to generate precise mapping functions
between reference and sensed images that approximate real
geometric relations. Additionally, there are abundant transformation models, such as linear rubber sheeting, nonlinear
rubber sheeting, and the direct linear transform. Elevation
data are introduced into the registration to generate the high-precision alignment of mountainous remote sensing images,
even using the digital terrain model (DTM). Furthermore, the
region and interval of the selected tie point can be set manually in the “AutoSync” module. To acquire a high-precision
registration result, the elevation data (DEM or DTM) should
be input at the same time as the image to be registered. If
higher-spatial-resolution elevation data were included in ERDAS, the corresponding information would be automatically
extracted when an image’s geographic information was identified to register the input image.
Image registration can also be conducted in ArcGIS, although most researchers would probably utilize the software
to solve GIS problems, such as spatial analysis. PCI
Geomatica is oriented more toward producing orthophoto and fusion images than toward registering remote sensing images. However,
both ArcGIS and PCI Geomatica contain an image registration module. The steps for alignment processing are similar
to those for the aforementioned software, including manual
registration and automatic operation. Some different transformation models, such as spline, similarity polynomial, and
projective transformations, are used to achieve the high-precision registration of complicated remote sensing images. However, sometimes the result is unsatisfactory for further applications, as the tie points are not uniformly distributed and their
number is small. Pixel Information Expert is a new generation
of remote sensing image processing software that was developed by Beijing Aerospace Hongtu Information Technology.
It can handle the dislocation of multisource, heterogeneous
remote sensing images since it integrates a novel algorithm
with a focus on multimodal remote sensing image registration. It can be tested free for 30 days. In addition, copyrighted
geometric registration software, such as the Hyperspectral Image Processing and Analysis System, GeoImager, Titan Image,
and so forth, has been developed by the Institute of Remote Sensing, Chinese Academy of Sciences.
Because high-resolution image registration is an important
task in remote sensing image processing, much emphasis has
been placed on it. To extract dense tie points representing local geometric relationships, SURF and an adaptive binning
SIFT descriptor have been combined [186]. With the guidance
of the local transformation model, an accurate registration result is obtained. The MATLAB code for the algorithm is provided, with experimental data, at https://www.researchgate.net/publication/320354469_HRImReg. The code is encrypted, and the parameters cannot be adjusted. It can be used
only for comparative experiments to evaluate a proposed
approach. When doing simulation experiments to assess a
feature point detector or to evaluate a mismatch elimination approach with real data, the progressive sparse spatial
consensus algorithm can be employed [187]. The code, with
experimental data, is publicly available at https://github.com/jiayi-ma?tab=repositories. It has been tested on photographs
from the computer vision field. To apply it to remote sensing
images, some improvements are needed. Beyond these, there
are many commercial and open-source software packages/
tools for geometric registration. There are also different points
of view, which should be discussed in depth in the future as
more resources become available. However, an evaluation of
registration approaches should be conducted, as well, whenever an aligned image is generated from software or a proposed method.
EVALUATION OF IMAGE REGISTRATION ACCURACY
For the spatial alignment of remote sensing images, it is highly desirable to provide users with an estimate of how accurate
the registration actually is. Accuracy evaluation is a nontrivial
problem that is present in all literature on remote sensing image registration. We have identified three aspects to measuring the registration accuracy on the basis of different considerations, including tie point identification, the transformation
model performance, and the alignment error. In this section,
we review basic approaches for alignment assessment.
ACCURACY OF TIE POINTS
The quality and quantity of tie points are important to guarantee high-precision image registration. The number of redundant tie points, in addition to the elementary computation of
the specified transformation model, is essential information
since we generally use as many tie points as possible to calculate the parameters of the mapping function for alignment.
Furthermore, we must allow for a residual (Tx_i, Ty_i) for the
ith extracted feature point compared with the origin of the
image [188]. If there are N tie points, the root-mean-square
error (RMSE) can be estimated as follows:

\[
\mathrm{RMS}_{tp} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( (Tx_i)^2 + (Ty_i)^2 \right)}. \tag{4}
\]

To enable general comparison, the RMSE should be computed across the normalized (to the pixel size) residuals. Additionally, the bad point proportion should be calculated to
evaluate the extracted feature point. This is the number of residuals that lie above a certain threshold multiplied by the ellipse formed by the pixel size. Besides the mentioned criteria,
the distribution of tie points is attracting increased attention.
To design a uniform distribution of tie points, some papers
have proposed to extract feature points within a specified
subregion [30]. A detection approach is employed to extract
the specified number of feature points. Tie points affect the
registration accuracy but are not the sole influencer.
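The tie-point measures above can be computed as in the following sketch. Two simplifications are assumptions of this example rather than statements from [188]: the RMSE is taken as a root mean square, as its name implies, and the bad point proportion is reduced to the fraction of residuals whose magnitude exceeds a threshold. The residual values and the 10-m pixel size are made up.

```python
import numpy as np

def tie_point_rmse(residuals, pixel_size=1.0):
    """RMSE of (Tx_i, Ty_i) residuals after normalizing to the pixel size."""
    res = np.asarray(residuals, dtype=float) / pixel_size
    return float(np.sqrt(np.mean(np.sum(res ** 2, axis=1))))

def bad_point_proportion(residuals, threshold, pixel_size=1.0):
    """Fraction of normalized residuals whose magnitude exceeds the threshold."""
    res = np.asarray(residuals, dtype=float) / pixel_size
    magnitudes = np.hypot(res[:, 0], res[:, 1])
    return float(np.mean(magnitudes > threshold))

# Made-up residuals in meters for an image with a hypothetical 10-m pixel size.
residuals_m = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0], [0.0, 0.0]])
rmse_px = tie_point_rmse(residuals_m, pixel_size=10.0)
bpp = bad_point_proportion(residuals_m, threshold=0.75, pixel_size=10.0)
print(rmse_px, bpp)
```

Normalizing by the pixel size before averaging is what allows results from sensors with different resolutions to be compared.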
TRANSFORMATION MODEL PERFORMANCE
The transformation model abstractly represents the geometric
mapping function from a sensed image to a reference image.
The actual between-image geometric distortion is difficult to
obtain without prior information, and the estimated transformation approximates the real geometric relationship between images. Assuming N matched feature points, T pairs of tie points are used
to estimate the mapping function through the least-squares
method. The remaining N − T points
in the sensed image are employed as test points to be transformed into the reference image system [188]. The distance
between each transformed coordinate and the corresponding
point in the reference image is calculated as the residual, and the
root mean square of these residuals characterizes the estimated transformation model:
\[
\mathrm{RMS}_{N-te} = \sqrt{\frac{1}{N-T} \sum_{j=1}^{N-T} \left( (x - Hx')^2 + (y - Hy')^2 \right)}, \tag{5}
\]

where H denotes the transformation model estimated from T
pairs of tie points, and (x, y) and (x', y') represent the corresponding points in the reference and sensed images, respectively. Furthermore, a χ² goodness-of-fit test may be applied [188]
to analyze whether the residuals are equally distributed across
all quadrants. However, “overfitting” may yield zero error for
a mapping model with sufficient degrees of freedom; this is a
well-known phenomenon in numerical analysis. Under this
circumstance, the registration results may not be optimal.
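The hold-out evaluation and the overfitting caveat can be illustrated with a short sketch. Everything here is assumed for illustration: synthetic tie points in normalized coordinates, an affine ground truth with small noise, and bivariate polynomial models fitted by least squares. A third-order polynomial has 10 coefficients per coordinate, so with T = 10 fitting pairs it can interpolate them exactly.

```python
import numpy as np

def poly_design(pts, degree):
    """Design matrix with all monomials x^i * y^j of total order <= degree."""
    x, y = np.asarray(pts, dtype=float).T
    cols = [x ** i * y ** j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

def fit_poly(sensed, reference, degree):
    """Least-squares polynomial mapping from sensed to reference coordinates."""
    coef, *_ = np.linalg.lstsq(poly_design(sensed, degree), reference, rcond=None)
    return coef

def rms_residual(coef, sensed, reference, degree):
    """Root mean square of the distances between mapped and reference points."""
    diff = np.asarray(reference, dtype=float) - poly_design(sensed, degree) @ coef
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

rng = np.random.default_rng(1)
N, T = 30, 10                                    # N matched points, T used for fitting
sensed = rng.uniform(0, 1, size=(N, 2))          # normalized image coordinates
A = np.array([[1.02, 0.03], [-0.04, 0.98]])      # made-up affine ground truth
reference = sensed @ A + np.array([0.1, -0.2]) + rng.normal(0, 0.002, size=(N, 2))

results = {}
for degree in (1, 3):
    coef = fit_poly(sensed[:T], reference[:T], degree)
    results[degree] = (
        rms_residual(coef, sensed[:T], reference[:T], degree),   # fitting pairs
        rms_residual(coef, sensed[T:], reference[T:], degree),   # N - T test pairs
    )
print(results)
```

The near-zero residual on the fitting pairs for the third-order model says nothing about its quality; only the residual on the held-out N − T pairs is informative.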
ALIGNMENT ERROR
The oldest method for estimating registration accuracy is visual assessment by a domain expert, which is still in use and
remains the most effective technique, although it cannot be
quantified [16], [188]. At present, this is performed using
professional software, such as ENVI and ArcGIS, with shutter tools. Similarity metrics in area-based registration, such as
MI, NMI, CC, and so on, are frequently employed to evaluate
alignment accuracy [59]. These indicators are easily influenced
by changes in image content over time and by differences
in radiation. To quantitatively present the alignment error, the
RMSE is calculated using feature points manually extracted by
a specialist employing (4) [85]. Since image registration aims
to achieve the relative spatial alignment of two different images, there is no gold standard reference image with which to
evaluate the registration accuracy. When evaluating outcomes
according to at least three criteria, the most indicative results
point to the best registration, as different assessments have
their own advantages and disadvantages.
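As an example of a similarity-based indicator, the sketch below estimates MI from a joint histogram. The bin count, the synthetic random image, and the function name are assumptions of this illustration; the inverted copy mimics a radiometric difference between sensors, and the shifted copy mimics a misalignment.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Mutual information (in nats) estimated from the joint gray-level histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of img_a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of img_b, shape (1, bins)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
aligned = 1.0 - reference                  # same geometry, inverted radiometry
misaligned = np.roll(aligned, 7, axis=1)   # same content, shifted by 7 columns
mi_aligned = mutual_information(reference, aligned)
mi_misaligned = mutual_information(reference, misaligned)
print(mi_aligned, mi_misaligned)
```

MI stays high for the aligned pair despite the inverted intensities, which is why it is popular for multimodal data; it drops sharply once the images are shifted.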
FUTURE TRENDS
There has been a large number of independent studies on remote sensing image registration, and much effort has been
put into constructing robust feature descriptors and eliminating mismatched features. With the development of sensor technology and application requirements, some novel
opportunities and challenges must be addressed for remote sensing image registration. To us, it seems likely that
the future of this field will include accelerated, combined,
heterogenous, cross-scale, and smart remote sensing image registration techniques, which are introduced in detail
in the following.
ACCELERATED REMOTE SENSING IMAGE REGISTRATION
With the ongoing development of sensor technology, the
spatial resolution of remote sensing images increases, resulting in a growing number of features with distinctive
details. The huge number of features puts the real-time registration of remote sensing images further out of reach, causing inefficiency when aligning large-scale images.
Thus, constructing descriptors and matching the detected
features is time-consuming for general images, especially
WFV ones. As proposed in [52], to achieve real-time registration to the greatest extent possible, remote sensing image registration can be operated on a cloud platform based
on finite-state chaotic compressed sensing theory. Similarly,
cloud computing [91] and some hardware systems may also
be effective for accelerating image registration. At present,
parallel computing [139] is the easiest path to implementation. Here, an image is divided into several subregions,
and the image features in each one are simultaneously extracted, based on the same principles, on different parallel
processors, as is the transformation model construction.
The parallel commands are easy to implement on MATLAB
and other platforms.
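The subregion-based parallel scheme might look like the following sketch, written in Python rather than MATLAB. The tile size, the worker count, and the toy gradient-magnitude detector are placeholders chosen for brevity, and a thread pool is used only to keep the example self-contained; a process pool or GPU implementation would likely be preferred for a real detector.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def split_tiles(shape, tile):
    """Return (row0, row1, col0, col1) bounds that cover the image with tiles."""
    rows, cols = shape
    return [(r, min(r + tile, rows), c, min(c + tile, cols))
            for r in range(0, rows, tile) for c in range(0, cols, tile)]

def strongest_points(image, bounds, k=5):
    """Toy detector: the k pixels with the largest gradient magnitude in one tile."""
    r0, r1, c0, c1 = bounds
    gy, gx = np.gradient(image[r0:r1, c0:c1].astype(float))
    mag = np.hypot(gx, gy)
    idx = np.argsort(mag, axis=None)[-k:]
    rr, cc = np.unravel_index(idx, mag.shape)
    return np.stack([rr + r0, cc + c0], axis=1)  # back to full-image coordinates

def extract_parallel(image, tile=64, workers=4):
    """Run the same detector on every tile concurrently and merge the results."""
    tiles = split_tiles(image.shape, tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: strongest_points(image, b), tiles))
    return np.vstack(results)

rng = np.random.default_rng(0)
image = rng.random((128, 128))
points = extract_parallel(image, tile=64, workers=4)
print(points.shape)  # 4 tiles x 5 points each
```

Because every tile is processed with the same rule and no shared state, the per-tile results can be merged by simple concatenation.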
COMBINED APPROACHES FOR IMAGE REGISTRATION
With the development of imaging sensors, the resolution
of remote sensing images has increased, and local deformation has become obvious. For example, the geometric
distortion caused by terrain relief and high-rise buildings
leads to inaccurate registration [36], introducing difficulties for remote sensing image applications. The reference
and sensed images cover the plain and mountainous regions simultaneously in Figure 13(c). Calculating the displacements of corresponding pixels for spatial registration,
the enlarged displacements in the specified rectangular regions are shown in Figure 13(d) and (e). The magnitude
and direction of the displacements in the plain region are
similar, but they differ in the mountainous region. Here,
multistage registration with a global mapping function
cannot exactly describe the spatial relationship between
the reference and sensed images, and neither can the local
transformation model.
Given that displacements vary in different terrain regions,
dividing images into a series of regions and registering with
a specified approach may yield a high-precision alignment,
indicating a combination of different techniques. Concretely,
[Figure 13 panels (a)-(e). Topographic panel (c): 108° 45′ 0″ E to 108° 50′ 0″ E, 34° 0′ 0″ N; elevation 400 to 1,722 m.]

FIGURE 13. The spatial position of corresponding pixels in a remote sensing image of complex terrain. (a) The reference image. (b) The
sensed image. (c) The topographic image. (d) The displacements in the mountainous region marked with a yellow rectangle in (a) and (b).
(e) The displacements in the plain region marked with a red rectangle in (a) and (b).

a transformation model is calculated with distinct tie features in the plain region. With the transformation model,
rather than directly obtaining the aligned plain region, the displacement guiding pixels to alignment is estimated. In mountainous regions, the dense optical flow estimation borrowed
from computer vision is utilized to acquire the displacement
of each corresponding pixel. Then, the displacement fields
from different terrain regions are mosaicked (e.g., using the
inverse distance weighted function for uniform transitions in
image stitching) to obtain a seamless displacement field of
the entire image [177]. This is a creative combination of different registration approaches in a coordinated way, differing
from the combined approaches mentioned in the “Registration Based on the Combination Method” section with the
serial mode. Therefore, regional registration accommodating
complex geometric relationships that vary with terrain differences may become a significant trend in remote sensing image registration, giving full play to the registration advantages
of different approaches in various terrain regions.
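The mosaicking step can be sketched with inverse distance weights across a transition band. This is a deliberately simplified, one-dimensional illustration and not the procedure of [177]: the two regions are assumed to be separated by a vertical band of columns, both displacement fields are assumed to be available over the whole grid, and all names and values are hypothetical.

```python
import numpy as np

def blend_fields(field_plain, field_mountain, col_plain_end, col_mountain_start, power=1.0):
    """Blend two (rows, cols, 2) displacement fields across a vertical transition band."""
    cols = field_plain.shape[1]
    col = np.arange(cols, dtype=float)
    # Distance of each column to the nearest column of each region (0 inside the region).
    d_plain = np.clip(col - col_plain_end, 0, None)
    d_mountain = np.clip(col_mountain_start - col, 0, None)
    eps = 1e-9  # avoids division by zero inside a region
    w_plain = 1.0 / (d_plain + eps) ** power
    w_mountain = 1.0 / (d_mountain + eps) ** power
    w = (w_plain / (w_plain + w_mountain))[None, :, None]
    return w * field_plain + (1.0 - w) * field_mountain

# Made-up constant fields: (dx, dy) = (1, 0) in the plain and (3, 2) in the mountains.
rows, cols = 4, 10
field_plain = np.tile(np.array([1.0, 0.0]), (rows, cols, 1))
field_mountain = np.tile(np.array([3.0, 2.0]), (rows, cols, 1))
blended = blend_fields(field_plain, field_mountain, col_plain_end=3, col_mountain_start=7)
print(blended[0, :, 0])  # dx varies smoothly from 1 to 3 across columns 3..7
```

Inside each region the corresponding field is kept essentially unchanged, and the weights vary smoothly in between, which avoids a visible seam in the mosaicked displacement field.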
HETEROGENOUS AND CROSS-SCALE IMAGE REGISTRATION
Heterogenous and cross-scale images collected all at once
and at different times provide complementary information to
improve our understanding of an entire scene during Earth
observation or even during disaster rescues. However, such
data usually have dramatically different spatial resolutions,
intensities, noise, geometries, and so on, owing to different
imaging principles. Some studies have focused on spatial registration, including optical image and SAR registration, optical image and infrared image registration, and satellite image
and map registration [36], [57]. These works emphasized
the robust construction of descriptors to resist intensity and
noise differences and other influential factors. Large-scale
differences between cross-scale images (which are much
greater than four times the resolution difference between
the panchromatic and multispectral images) introduce difficulties for extracting geometrical features from low-resolution images that are similar to those from high-resolution
images. Thus, generating the tie features of cross-scale images for transformation model construction, even during
high-precision registration, is difficult. Additionally, high-efficiency heterogenous and cross-scale image registration
remains an open problem that is worth researching in the
near future. For a concrete example, the approximately real-time registration of optical and SAR images may offer an approach for analyzing disaster regions as quickly as possible
for rescue purposes by means of registering and comparing
images before and after an event. These applications are vital
for rescue operations. Precise and efficient heterogeneous and
cross-scale image registration is a mandatory prerequisite for
high-precision, real-time applications.
SMART REMOTE SENSING IMAGE REGISTRATION
To register multiple remote sensing images, one simple and
conventional idea is to align them frame by frame, namely,
by converting multiple image registration into pair-to-pair
alignment. This process, learning from the simultaneous mosaicking of multiframe images, specifies a reference image
connected to others and stitches other images to the reference
one. Therefore, when images to be registered are read into the
program, the coordinates of the four corners in each image
are extracted. The reference image is determined by comparing these coordinates. As presented in Figure 14, images A,
B, C, and D are simultaneously aligned with the reference
image (marked in green) according to a general registration
strategy, as there is overlap between two images. Unlike frame-to-frame approaches, this technique needs to specify only the
reference image, and the intermediate results need not be written out and read back
many times, which saves memory and improves
computational efficiency. From our point of view, this is
smart registration, which is particularly useful for WFV-image
generation. However, when images overlap, a more intelligent
approach needs to be developed.
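One possible way to determine the reference image from corner coordinates is sketched below. The selection rule (prefer the image whose bounding box overlaps the most other images, then the largest total overlap area) is an assumption of this example, as are the image names and coordinates; the text states only that the reference is determined by comparing the corner coordinates.

```python
import numpy as np

def rect(x0, y0, x1, y1):
    """Four corner coordinates of an axis-aligned footprint (illustrative helper)."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

def bbox(corners):
    """Bounding box (xmin, ymin, xmax, ymax) of four corner coordinates."""
    c = np.asarray(corners, dtype=float)
    return c[:, 0].min(), c[:, 1].min(), c[:, 0].max(), c[:, 1].max()

def overlap_area(a, b):
    """Area of the intersection of two bounding boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def choose_reference(corner_sets):
    """Pick the image overlapping the most others, breaking ties by total overlap area."""
    boxes = {name: bbox(c) for name, c in corner_sets.items()}

    def score(name):
        overlaps = [overlap_area(boxes[name], boxes[o]) for o in boxes if o != name]
        return (sum(a > 0 for a in overlaps), sum(overlaps))

    return max(boxes, key=score)

# Hypothetical layout: a central image R surrounded by images A, B, C, and D.
corner_sets = {
    "A": rect(0, 0, 6, 6),
    "B": rect(8, 0, 14, 6),
    "C": rect(0, 8, 6, 14),
    "D": rect(8, 8, 14, 14),
    "R": rect(4, 4, 10, 10),
}
reference_name = choose_reference(corner_sets)
print(reference_name)
```

Only the corner coordinates are read at this stage, so the reference can be chosen before any pixel data is loaded.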
Moreover, images to be registered may have small overlapping areas. This overlap presents a challenge for high-accuracy
alignment because a small number of geometric and intensity features is available for constructing the transformation
model. This problem should be intelligently solved to register images with a low ratio of overlapping regions. Typically,
these images are used to produce WFV images by means of
stitching. Further solutions should be provided in the future.
Therefore, the large-scale, complex distortion of high-resolution, heterogenous, and cross-scale remote sensing images must
be a focus of future research. In this situation, the traditional
single-registration approach may not meet requirements. For
real-time, high-precision registration, a combination of alignment approaches and high-performance computing is considered very promising.
CONCLUSIONS
In this article, we presented a comprehensive and quantitative
summary of intensity-based, feature-based, and combined approaches to remote sensing image registration. Conventional
methods and new applications of deep learning and optical

FIGURE 14. The spatial position of multiple images to be registered.

flow techniques were included. The performance of registration software packages and tools was analyzed. Additionally,
novel registration evaluations were presented to support an
effective assessment. The development of any approach aims
to improve registration accuracy as much as possible because
registration is an important step for preprocessing remote
sensing images. Several such techniques have been developed,
as recounted in this article.
However, as resolutions increase, the problem of inconsistent local distortion caused by high-rise buildings and
topographic relief has become apparent; this cannot be exactly described by the transformation model. Moreover, WFV
images are an emerging trend in satellite image production,
enabling a whole ROI to be contained within one image.
This poses a challenge for real-time registration and memory
for registration processing. Therefore, we believe that future
research on remote sensing image registration will use accelerated registration, combined approaches for remote sensing image registration, heterogeneous and cross-scale image
registration, and smart registration. Challenges remain, and
considerable additional research is required. We perform this
research with the advantage of lower entrance barriers than
the TDOM generation.
ACKNOWLEDGMENTS
The work was supported by the National Natural Science
Foundation of China (grants 41971303 and 41701394), Key
Research and Development Program of Shaanxi Province
(grant 2020NY-166), and Fundamental Research Funds for
the Central Universities (grant GK202103143). The authors
thank the editor-in-chief and associate editor of IEEE Geoscience and Remote Sensing Magazine as well as four anonymous
reviewers for their advice for strengthening their manuscript.
AUTHOR INFORMATION
Ruitao Feng (feng-rt@snnu.edu.cn) is with the School of
Geography and Tourism, Shaanxi Normal University, Xi’an,
710062, China.
Huanfeng Shen (shenhf@whu.edu.cn) is with the School
of Resource and Environment Science and the Collaborative
Innovation Center for Geospatial Technology, Wuhan University, Wuhan, 430072, China. He is a Senior Member of IEEE.
Jianjun Bai (bjj@snnu.edu.cn) is with the School of Geography and Tourism, Shaanxi Normal University, Xi’an,
710062, China.
Xinghua Li (lixinghua5540@whu.edu.cn) is with the
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430072, China. He is a Senior Member of IEEE.

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

REFERENCES
[1]

[2]

136

A. Wong and D. A. Clausi, “ARRSI: Automatic registration of
remote-sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 45,
no. 5, pp. 1483–1493, 2007. doi: 10.1109/TGRS.2007.892601.
X. Li, N. Hui, H. Shen, Y. Fu, and L. Zhang, “A robust mosaicking
procedure for high spatial resolution remote sensing images,”

[16]

[17]

ISPRS J. Photogram. Remote Sens., vol. 109, pp. 108–125, Nov.
2015. doi: 10.1016/j.isprsjprs.2015.09.009.
[3] H. Shen, X. Meng, and L. Zhang, “An integrated framework
for the spatio-temporal-spectral fusion of remote sensing
images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp.
7135–7148, 2016. doi: 10.1109/TGRS.2016.2596290.
[4] Y. Lu, P. Wu, X. Ma, and X. Li, “Detection and prediction of land
use/land cover change using spatiotemporal data fusion and the
Cellular Automata–Markov model,” Environ. Monitoring Assessment, vol. 191, no. 2, p. 68, 2019. doi: 10.1007/s10661-019-7200-2.
[5] Z. Lv, T. Liu, C. Shi, J. A. Benediktsson, and H. Du, “Novel land
cover change detection method based on k-means clustering
and adaptive majority voting using bitemporal remote sensing
images,” IEEE Access, vol. 7, pp. 34,425–34,437, Jan. 2019. doi:
10.1109/ACCESS.2019.2892648.
[6] C. Yuan, F. Wang, S. Wang, and Y. Zhou, “Accuracy evaluation of flood monitoring based on multiscale remote sensing
for different landscapes,” Geomatics, Natural Hazards Risk, vol.
10, no. 1, pp. 1389–1411, 2019. doi: 10.1080/19475705.2019.1580224.
[7] L. Yang and G. Cervone, “Analysis of remote sensing imagery for disaster assessment using deep learning: A case study
of flooding event,” Soft Comput., vol. 23, no. 24, pp. 13,393–
13,408, 2019. doi: 10.1007/s00500-019-03878-8.
[8] K. Barbieux, “Pushbroom hyperspectral data orientation by
combining feature-based and area-based co-registration techniques,” Remote Sens., vol. 10, no. 4, p. 645, 2018. doi: 10.3390/rs10040645.
[9] Y. Jiang, J. Wang, L. Zhang, G. Zhang, X. Li, and J. Wu, “Geometric processing and accuracy verification of Zhuhai-1 hyperspectral satellites,” Remote Sens., vol. 11, no. 9, p. 996, 2019. doi:
10.3390/rs11090996.
[10] I. Aicardi, F. Nex, M. Gerke, and A. M. Lingua, “An image-based approach for the co-registration of multi-temporal UAV image datasets,” Remote Sens., vol. 8, no. 9, p. 779, 2016. doi: 10.3390/rs8090779.
[11] F. P. M. Oliveira and J. M. R. S. Tavares, “Medical image registration: A review,” Comput. Methods Biomech. Biomed. Eng., vol. 17,
no. 2, pp. 73–93, 2014. doi: 10.1080/10255842.2012.670855.
[12] A. Sotiras, C. Davatzikos, and N. Paragios, “Deformable medical image registration: A survey,” IEEE Trans. Med. Imag., vol. 32,
no. 7, pp. 1153–1190, 2013. doi: 10.1109/TMI.2013.2265603.
[13] M. A. Viergever, J. B. A. Maintz, S. Klein, K. Murphy, M. Staring, and J. P. W. Pluim, “A survey of medical image registration,” Med. Image Anal., vol. 33, pp. 140–144, Oct. 2016. doi:
10.1016/j.media.2016.06.030.
[14] G. Haskins, U. Kruger, and P. Yan, “Deep learning in medical
image registration: A survey,” Mach. Vis. Appl., vol. 31, nos. 1–2,
p. 8, 2020. doi: 10.1007/s00138-020-01060-x.
[15] L. G. Brown, “A survey of image registration techniques,”
ACM Comput. Surv., vol. 24, no. 4, pp. 325–376, 1992. doi:
10.1145/146370.146374.
[16] B. Zitová and J. Flusser, “Image registration methods: A survey,”
Image Vis. Comput., vol. 21, no. 11, pp. 977–1000, 2003. doi:
10.1016/S0262-8856(03)00137-9.
[17] M. Deshmukh and U. Bhosle, “A survey of image registration,”
Int. J. Image Process., vol. 5, no. 3, p. 245, 2011.

[18] Z. Xiong and Y. Zhang, “A critical review of image registration
methods,” Int. J. Image Data Fusion, vol. 1, no. 2, pp. 137–158,
2010. doi: 10.1080/19479831003802790.
[19] M. V. Wyawahare, P. M. Patil, and H. K. Abhyankar, “Image registration techniques: An overview,” Int. J. Signal Process., Image
Process. Pattern Recognit., vol. 2, no. 3, pp. 11–28, 2009.
[20] C. Dalmiya and V. Dharun, “A survey of registration techniques
in remote sensing images,” Indian J. Sci. Technol., vol. 8, no. 26,
pp. 1–7, 2015. doi: 10.17485/ijst/2015/v8i26/81048.
[21] R. M. Ezzeldeen, H. H. Ramadan, T. M. Nazmy, M. A. Yehia,
and M. S. Abdel-Wahab, “Comparative study for image registration techniques of remote sensing images,” Egyptian J. Remote
Sens. Space Sci., vol. 13, no. 1, pp. 31–36, 2010. doi: 10.1016/j.ejrs.2010.07.004.
[22] M. P. S. Tondewad and M. M. P. Dale, “Remote sensing image
registration methodology: Review and discussion,” Proc. Comput. Sci., vol. 171, pp. 2390–2399, June 2020. doi: 10.1016/j.procs.2020.04.259.
[23] P. E. Anuta, “Spatial registration of multispectral and multitemporal digital imagery using fast Fourier transform techniques,” IEEE Trans. Geosci. Electron., vol. 8, no. 4, pp. 353–368,
1970. doi: 10.1109/TGE.1970.271435.
[24] X. Xu, X. Li, X. Liu, H. Shen, and Q. Shi, “Multimodal registration of remotely sensed images based on Jeffrey’s divergence,”
ISPRS J. Photogram. Remote Sens., vol. 122, pp. 97–115, Dec.
2016. doi: 10.1016/j.isprsjprs.2016.10.005.
[25] J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, “Robust feature matching for remote sensing image registration
via locally linear transforming,” IEEE Trans. Geosci. Remote
Sens., vol. 53, no. 12, pp. 6469–6481, 2015. doi: 10.1109/TGRS.2015.2441954.
[26] N. Hanaizumi and S. Fujimura, “An automated method for registration of satellite remote sensing images,” in Proc. IEEE Int.
Geosci. Remote Sens. Symp. (IGARSS), 1993, pp. 1348–1350. doi:
10.1109/IGARSS.1993.322087.
[27] W. F. Webber, “Techniques for image registration,” in Proc.
LARS Symp., West Lafayette, IN, 1973, pp. 1–7.
[28] D. I. Barnea and H. F. Silverman, “A class of algorithms for fast
digital image registration,” IEEE Trans. Comput., vol. C-21, no. 2,
pp. 179–186, 1972. doi: 10.1109/TC.1972.5008923.
[29] S.-I. Kaneko, Y. Satoh, and S. Igarashi, “Using selective correlation coefficient for robust image registration,” Pattern Recognit., vol. 36, no. 5, pp. 1165–1173, 2003. doi: 10.1016/S0031-3203(02)00081-X.
[30] H. Gonçalves, J. A. Gonçalves, L. Corte-Real, and A. C. Teodoro,
“CHAIR: Automatic image registration based on correlation and
Hough transform,” Int. J. Remote Sens., vol. 33, no. 24, pp. 7936–
7968, Dec. 20, 2012. doi: 10.1080/01431161.2012.701345.
[31] J. Inglada and A. Giros, “On the possibility of automatic multisensor image registration,” IEEE Trans. Geosci. Remote Sens.,
vol. 42, no. 10, pp. 2104–2120, 2004. doi: 10.1109/TGRS.2004.835294.
[32] J. Ma, J. C. Chan, and F. Canters, “Fully automatic subpixel image registration of multiangle CHRIS/Proba data,” IEEE Trans.
Geosci. Remote Sens., vol. 48, no. 7, pp. 2829–2839, 2010. doi:
10.1109/TGRS.2010.2042813.
[33] Y. Wu, W. Ma, Q. Su, S. Liu, and Y. Ge, “Remote sensing image
registration based on local structural information and global
constraint,” J. Appl. Remote Sens., vol. 13, no. 1, p. 1, 2019. doi:
10.1117/1.JRS.13.016518.
[34] G. Wolberg and S. Zokai, “Image registration for perspective
deformation recovery,” in Proc. SPIE, Automatic Target Recognit.
X, Orlando, FL, 2000, vol. 4050, pp. 259–270.
[35] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge,
“Comparing images using the Hausdorff distance,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 15, no. 9, pp. 850–863, 1993. doi:
10.1109/34.232073.
[36] Y. Ye and J. Shan, “A local descriptor based registration method
for multispectral remote sensing images with non-linear intensity differences,” ISPRS J. Photogram. Remote Sens., vol. 90, pp.
83–95, 2014. doi: 10.1016/j.isprsjprs.2014.01.009.
[37] Y. Hel-Or, H. Hel-Or, and E. David, “Fast template matching
in non-linear tone-mapped images,” in Proc. Int. Conf. Comput.
Vision (ICCV), Barcelona, Spain, 2011, pp. 1355–1362. doi:
10.1109/ICCV.2011.6126389.
[38] Y. Bentoutou, N. Taleb, K. Kpalma, and J. Ronsin, “An automatic image registration for applications in remote sensing,” IEEE
Trans. Geosci. Remote Sens., vol. 43, no. 9, pp. 2127–2137, 2005.
doi: 10.1109/TGRS.2005.853187.
[39] K. Taejung and I. Yong-Jo, “Automatic satellite image registration by combination of matching and random sample consensus,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 5, pp. 1111–
1117, 2003. doi: 10.1109/TGRS.2003.811994.
[40] J. P. Kern and M. S. Pattichis, “Robust multispectral image registration using mutual-information models,” IEEE Trans. Geosci.
Remote Sens., vol. 45, no. 5, pp. 1494–1505, 2007. doi: 10.1109/TGRS.2007.892599.
[41] H.-M. Chen, M. K. Arora, and P. K. Varshney, “Mutual information-based image registration for remote sensing data,”
Int. J. Remote Sens., vol. 24, no. 18, pp. 3701–3706, 2003. doi:
10.1080/0143116031000117047.
[42] A. A. Cole-Rhodes, K. L. Johnson, J. LeMoigne, and I. Zavorin,
“Multiresolution registration of remote sensing imagery by optimization of mutual information using a stochastic gradient,”
IEEE Trans. Image Process., vol. 12, no. 12, pp. 1495–1511, 2003.
doi: 10.1109/TIP.2003.819237.
[43] D. Brunner, G. Lemoine, and L. Bruzzone, “Earthquake damage assessment of buildings using VHR optical and SAR imagery,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2403–
2420, 2010. doi: 10.1109/TGRS.2009.2038274.
[44] X. Wang, W. Yang, A. Wheaton, N. Cooley, and B. Moran,
“Efficient registration of optical and IR images for automatic
plant water stress assessment,” Comput. Electron. Agriculture,
vol. 74, no. 2, pp. 230–237, 2010. doi: 10.1016/j.compag.2010.08.004.
[45] S. Chen, X. Li, L. Zhao, and H. Yang, “Medium-low resolution
multisource remote sensing image registration based on SIFT
and robust regional mutual information,” Int. J. Remote Sens.,
vol. 39, no. 10, pp. 3215–3242, 2018. doi: 10.1080/01431161.2018.1437295.
[46] L. Y. Zhao, B. Y. Lü, X. R. Li, and S. H. Chen, “Multi-source
remote sensing image registration based on scale-invariant
feature transform and optimization of regional mutual information,” Acta Phys. Sin., vol. 64, no. 12, p. 124204 (pp. 1–11),
2015.
[47] G. Hermosillo, C. Chefd’Hotel, and O. Faugeras, “Variational
methods for multimodal image matching,” Int. J. Comput.
Vis., vol. 50, no. 3, pp. 329–343, Dec. 1, 2002. doi: 10.1023/A:1020830525823.
[48] R. N. Bracewell, The Fourier Transform and Its
Applications. New York: McGraw-Hill, 1986.
[49] H. Foroosh, J. B. Zerubia, and M. Berthod, “Extension of phase
correlation to subpixel registration,” IEEE Trans. Image Process.,
vol. 11, no. 3, pp. 188–200, Mar. 2002. doi: 10.1109/83.988953.
[50] X. Wan, J. G. Liu, and H. Yan, “The illumination robustness
of phase correlation for image alignment,” IEEE Trans. Geosci.
Remote Sens., vol. 53, no. 10, pp. 5746–5759, 2015. doi: 10.1109/TGRS.2015.2429740.
[51] X. Wan, J. Liu, H. Yan, and G. L. K. Morgan, “Illumination-invariant image matching for autonomous UAV localisation based
on optical sensing,” ISPRS J. Photogram. Remote Sens., vol. 119,
pp. 198–213, Sept. 2016. doi: 10.1016/j.isprsjprs.2016.05.016.
[52] Z. Liu, L. Wang, X. Wang, X. Shen, and L. Li, “Secure remote
sensing image registration based on compressed sensing in
cloud setting,” IEEE Access, vol. 7, pp. 36,516–36,526, Mar.
2019. doi: 10.1109/ACCESS.2019.2903826.
[53] M. Xu and P. K. Varshney, “A subspace method for Fourier-based image registration,” IEEE Geosci. Remote Sens. Lett., vol. 6,
no. 3, pp. 491–494, 2009. doi: 10.1109/LGRS.2009.2018705.
[54] L. Lucchese, S. Leorin, and G. M. Cortelazzo, “Estimation of
two-dimensional affine transformations through polar curve
matching and its application to image mosaicking and remote-sensing data registration,” IEEE Trans. Image Process., vol. 15, no.
10, pp. 3008–3019, 2006. doi: 10.1109/TIP.2006.877519.
[55] P. Bao and D. Xu, “Complex wavelet-based image mosaics using edge-preserving visual perception modeling,” Comput.
Graph., vol. 23, no. 3, pp. 309–321, 1999. doi: 10.1016/S0097-8493(99)00040-0.
[56] H. Gang and Z. Yun, “Combination of feature-based and
area-based image registration technique for high resolution remote sensing image,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2007, pp. 377–380. doi: 10.1109/IGARSS.2007.4422809.
[57] Y. Ye, J. Shan, L. Bruzzone, and L. Shen, “Robust registration
of multimodal remote sensing images based on structural
similarity,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp.
2941–2958, 2017. doi: 10.1109/TGRS.2017.2656380.
[58] H. Yang, X. Li, L. Zhao, and S. Chen, “A novel coarse-to-fine
scheme for remote sensing image registration based on SIFT
and phase correlation,” Remote Sens., vol. 11, no. 15, p. 1833,
2019. doi: 10.3390/rs11151833.
[59] Y. Han, F. Bovolo, and L. Bruzzone, “An approach to fine coregistration between very high resolution multispectral images
based on registration noise distribution,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6650–6662, 2015. doi: 10.1109/TGRS.2015.2445632.
[60] A. Plyer, E. Colin-Koeniguer, and F. Weissgerber, “A new
coregistration algorithm for recent applications on urban
SAR images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11,
pp. 2198–2202, 2015. doi: 10.1109/LGRS.2015.2455071.
[61] G. Brigot, E. Colin-Koeniguer, A. Plyer, and F. Janez, “Adaptation and evaluation of an optical flow method applied to coregistration of forest remote sensing images,”
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9,
no. 7, pp. 2923–2939, 2016. doi: 10.1109/JSTARS.2016.2578362.
[62] R. Feng, X. Li, and H. Shen, “Mountainous remote sensing images registration based on improved optical flow estimation,”
ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., vol. IV-2/
W5, pp. 479–484, June 2019. doi: 10.5194/isprs-annals-IV-2-W5-479-2019.
[63] B. K. P. Horn and B. G. Schunck, “Determining optical
flow,” Artif. Intell., vol. 17, nos. 1–3, pp. 185–203, 1981. doi:
10.1016/0004-3702(81)90024-2.
[64] J. J. Gibson, The Perception of the Visual World. Oxford: Houghton Mifflin, 1950.
[65] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. Imag. Understanding Workshop, 1981, pp. 121–130.
[66] Z. Tu et al., “A survey of variational and CNN-based optical
flow techniques,” Signal Processing: Image Commun., vol. 72, pp.
9–24, Mar. 2019. doi: 10.1016/j.image.2018.12.002.
[67] M. J. Black and P. Anandan, “The robust estimation of multiple
motions: Parametric and piecewise-smooth flow fields,” Comput. Vision Image Understand., vol. 63, no. 1, pp. 75–104, 1996.
doi: 10.1006/cviu.1996.0006.
[68] C. Liu, J. Yuen, and A. Torralba, “SIFT Flow: Dense correspondence across scenes and its applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 978–994, 2011. doi:
10.1109/TPAMI.2010.147.
[69] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” in
Proc. Eur. Conf. Comput. Vision (ECCV), 2004, pp. 25–36. doi:
10.1007/978-3-540-24673-2_3.
[70] J.-Y. Xiong, Y.-P. Luo, and G.-R. Tang, “An improved optical
flow method for image registration with large-scale movements,” Acta Autom. Sin., vol. 34, no. 7, pp. 760–764, 2008. doi:
10.3724/SP.J.1004.2008.00760.
[71] A. Plyer, G. Le Besnerais, and F. Champagnat, “Massively parallel Lucas Kanade optical flow for real-time video processing
applications,” J. Real-Time Image Process., vol. 11, no. 4, pp. 713–
730, 2016. doi: 10.1007/s11554-014-0423-0.
[72] Y. Xiang, F. Wang, L. Wan, N. Jiao, and H. You, “OS-Flow: A robust algorithm for dense optical and SAR image registration,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 1–20, 2019.
doi: 10.1109/TGRS.2019.2905585.
[73] C. Huo, C. Pan, L. Huo, and Z. Zhou, “Multilevel SIFT matching for large-size VHR image registration,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 2, pp. 171–175, 2012. doi: 10.1109/LGRS.2011.2163491.
[74] L. Yu, D. Zhang, and E.-J. Holden, “A fast and fully automatic registration approach based on point features for multisource remote-sensing images,” Comput. Geosci., vol. 34, no. 7,
pp. 838–848, 2008. doi: 10.1016/j.cageo.2007.10.005.

[75] L. Hui, B. S. Manjunath, and S. K. Mitra, “A contour-based approach to multisensor image registration,” IEEE Trans. Image Process., vol. 4, no. 3, pp. 320–334, 1995. doi: 10.1109/83.366480.
[76] H. Goncalves, L. Corte-Real, and J. A. Goncalves, “Automatic
image registration through image segmentation and SIFT,”
IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2589–2600,
2011. doi: 10.1109/TGRS.2011.2109389.
[77] H. P. Moravec, “Techniques towards automatic visual obstacle
avoidance,” no. 2, p. 584, 1977. [Online]. Available: https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/1977/aip.txt
[78] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proc. Alvey Vision Conf., Manchester, U.K., 1988, vol. 15,
pp. 147–151.
[79] Y. Xiang, F. Wang, and H. You, “OS-SIFT: A robust SIFT-like algorithm for high-resolution optical-to-SAR image registration
in suburban areas,” IEEE Trans. Geosci. Remote Sens., vol. 56, no.
6, pp. 3078–3090, 2018. doi: 10.1109/TGRS.2018.2790483.
[80] I. Misra, S. M. Moorthi, D. Dhar, and R. Ramakrishnan, “An automatic satellite image registration technique based on Harris
corner detection and Random Sample Consensus (RANSAC)
outlier rejection model,” in 1st Int. Conf. on Recent Advances in
Information Technology (RAIT), 2012, pp. 68–73.
[81] S. M. Smith and J. M. Brady, “SUSAN–A new approach to low
level image processing,” Int. J. Comput. Vis., vol. 23, no. 1, pp.
45–78, 1997. doi: 10.1023/A:1007963824710.
[82] C. Leng, H. Zhang, B. Li, G. Cai, Z. Pei, and L. He, “Local feature descriptor for image matching: A survey,” IEEE Access, vol.
7, pp. 6424–6434, 2019. doi: 10.1109/ACCESS.2018.2888856.
[83] W. He and X. Deng, “A modified SUSAN corner detection algorithm based on adaptive gradient threshold for remote sensing
image,” in Proc. Int. Conf. Optoelectron. Image Process., 2010, vol.
1, pp. 40–43.
[84] R. Feng, X. Li, W. Zou, and H. Shen, “Registration of multitemporal GF-1 remote sensing images with weighting
perspective transformation model,” in Proc. IEEE Int. Conf.
Image Process. (ICIP), 2017, pp. 2264–2268. doi: 10.1109/ICIP.2017.8296685.
[85] R. Feng, Q. Du, X. Li, and H. Shen, “Robust registration
for remote sensing images by combining and localizing
feature- and area-based methods,” ISPRS J. Photogram. Remote Sens., vol. 151, pp. 15–26, May 2019. doi: 10.1016/j.isprsjprs.2019.03.002.
[86] Y. Duan, X. Huang, J. Xiong, Y. Zhang, and B. Wang, “A combined image matching method for Chinese optical satellite imagery,” Int. J. Digital Earth, vol. 9, no. 9, pp. 851–872, 2016. doi:
10.1080/17538947.2016.1151955.
[87] P. K. Konugurthi, R. Kune, R. Nooka, and V. Sarma, “Autonomous ortho-rectification of very high resolution imagery using
SIFT and genetic algorithm,” Photogram. Eng. Remote Sens., vol.
82, no. 5, pp. 377–388, 2016. doi: 10.14358/PERS.82.5.377.
[88] Q. Li, G. Wang, J. Liu, and S. Chen, “Robust scale-invariant
feature matching for remote sensing image registration,” IEEE
Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 287–291, 2009. doi:
10.1109/LGRS.2008.2011751.
[89] W. Ma et al., “Remote sensing image registration with modified SIFT and enhanced feature matching,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 1, pp. 3–7, 2017. doi: 10.1109/LGRS.2016.2600858.
[90] M. Gong, S. Zhao, L. Jiao, D. Tian, and S. Wang, “A novel
coarse-to-fine scheme for automatic image registration based
on SIFT and mutual information,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7, pp. 4328–4338, 2014. doi: 10.1109/TGRS.2013.2281391.
[91] C. A. Lee, S. D. Gasster, A. Plaza, C. Chang, and B. Huang,
“Recent developments in high performance computing for remote sensing: A review,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 4, no. 3, pp. 508–527, 2011. doi: 10.1109/JSTARS.2011.2162643.
[92] D. G. Lowe, “Distinctive image features from scale-invariant
keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
doi: 10.1023/B:VISI.0000029664.99615.94.
[93] K. Mikolajczyk and C. Schmid, “A performance evaluation of
local descriptors,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005. doi: 10.1109/TPAMI.2005.188.
[94] K. Yan and R. Sukthankar, “PCA-SIFT: A more distinctive representation for local image descriptors,” in Proc. IEEE Comput. Soc.
Conf. Comput. Vision Pattern Recognit. (CVPR), Washington, D. C.,
2004, vol. 2, pp. 506–513. doi: 10.1109/CVPR.2004.1315206.
[95] Y. Zheng, Z. Cao, and Y. Xiao, “Multi-spectral remote image registration based on SIFT,” Electron. Lett., vol. 44, no. 2, pp. 107–108,
2008.
[96] J. Morel and G. Yu, “ASIFT: A new framework for fully affine
invariant image comparison,” SIAM J. Imag. Sci., vol. 2, no. 2,
pp. 438–469, 2009. doi: 10.1137/080732730.
[97] A. Sedaghat, M. Mokhtarzade, and H. Ebadi, “Uniform robust
scale-invariant feature matching for optical remote sensing
images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp.
4516–4527, 2011. doi: 10.1109/TGRS.2011.2144607.
[98] A. Sedaghat and H. Ebadi, “Distinctive order based self-similarity descriptor for multi-sensor remote sensing image matching,” ISPRS J. Photogram. Remote Sens., vol. 108, pp. 62–71, Oct.
2015. doi: 10.1016/j.isprsjprs.2015.06.003.
[99] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Proc. Eur. Conf. Comput. Vision (ECCV), Graz,
Austria, 2006, pp. 404–417.
[100] W. Yan, H. She, and Z. Yuan, “Robust registration of remote sensing image based on SURF and KCCA,” J. Indian Soc. Remote Sens.,
vol. 42, no. 2, pp. 291–299, 2014. doi: 10.1007/s12524-013-0324-x.
[101] X. Yuan, S. Chen, W. Yuan, and Y. Cai, “Poor textural image tie point matching via graph theory,” ISPRS J. Photogram.
Remote Sens., vol. 129, pp. 21–31, July 2017. doi: 10.1016/j.isprsjprs.2017.04.015.
[102] R. Bouchiha and K. Besbes, “Automatic remote-sensing image
registration using SURF,” Int. J. Comput. Theory Eng., vol. 5, no.
1, pp. 88–92, 2013. doi: 10.7763/IJCTE.2013.V5.653.
[103] J. Chen et al., “WLD: A robust local image descriptor,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1705–1720,
2010. doi: 10.1109/TPAMI.2009.155.
[104] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in Proc. Eur. Conf. Comput. Vision
(ECCV), Graz, Austria, 2006, pp. 430–443.

[105] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary
robust independent elementary features,” in Proc. Eur. Conf.
Comput. Vision (ECCV), Crete, Greece, 2010, pp. 778–792.
[106] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An
efficient alternative to SIFT or SURF,” in Proc. Int. Conf. Comput.
Vision (ICCV), Barcelona, Spain, 2011, pp. 2564–2571.
[107] D. Ma and H. Lai, “Remote sensing image matching based improved ORB in NSCT domain,” J. Indian Soc. Remote Sens., vol.
47, no. 5, pp. 801–807, 2019. doi: 10.1007/s12524-019-00958-y.
[108] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, “KAZE Features,” in Proc. Eur. Conf. Comput. Vision (ECCV), Florence, Italy,
2012, pp. 214–227.
[109] P. Alcantarilla, J. Nuevo, and A. Bartoli, “Fast explicit diffusion
for accelerated features in nonlinear scale spaces,” in Proc. Brit.
Mach. Vision Conf. (BMVC), Bristol, U.K., 2013, pp. 1–11.
[110] Y. Ye, M. Wang, S. Hao, and Q. Zhu, “A novel keypoint detector combining corners and blobs for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 18, no.
3, pp. 451–455, Mar. 31, 2020. doi: 10.1109/LGRS.2020.2980620.
[111] X. Liu, Y. Ai, J. Zhang, and Z. Wang, “A novel affine and contrast invariant descriptor for infrared and visible image registration,” Remote Sens., vol. 10, no. 4, p. 658, 2018. doi: 10.3390/rs10040658.
[112] Z. Ye et al., “Robust fine registration of multisensor remote
sensing images based on enhanced subpixel phase correlation,”
Sensors, vol. 20, no. 15, p. 4338, Aug. 4, 2020. doi: 10.3390/s20154338.
[113] Y. C. Hsieh, D. M. McKeown, and F. P. Perlant, “Performance evaluation of scene registration and stereo matching for
cartographic feature extraction,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 14, no. 2, pp. 214–238, 1992. doi:
10.1109/34.121790.
[114] S. Dongseok, J. K. Pollard, and J. Muller, “Accurate geometric
correction of ATSR images,” IEEE Trans. Geosci. Remote Sens.,
vol. 35, no. 4, pp. 997–1006, 1997. doi: 10.1109/36.602542.
[115] J. Inglada and F. Adragna, “Automatic multi-sensor image registration by edge matching using genetic algorithms,” in Proc. Int.
Geosci. Remote Sens. Symp. (IGARSS), Sydney, NSW, Australia,
2001, vol. 5, pp. 2313–2315.
[116] W. Shi and A. Shaker, “The Line‐Based Transformation Model
(LBTM) for image‐to‐image registration of high‐resolution satellite image data,” Int. J. Remote Sens., vol. 27, no. 14, pp. 3001–
3012, 2006. doi: 10.1080/01431160500486716.
[117] T.-Z. Xiang, G.-S. Xia, X. Bai, and L. Zhang, “Image stitching
by line-guided local warping with global similarity constraint,”
Pattern Recognit., vol. 83, pp. 481–497, Nov. 2018. doi: 10.1016/j.patcog.2018.06.013.
[118] C. Zhao and A. A. Goshtasby, “Registration of multitemporal
aerial optical images using line features,” ISPRS J. Photogram.
Remote Sens., vol. 117, pp. 149–160, July 2016. doi: 10.1016/j.isprsjprs.2016.04.002.
[119] C. Li and W. Shi, “The generalized-line-based iterative transformation model for imagery registration and rectification,” IEEE
Geosci. Remote Sens. Lett., vol. 11, no. 8, pp. 1394–1398, 2014.
doi: 10.1109/LGRS.2013.2293844.

[120] A. O. Ok, J. D. Wegner, C. Heipke, F. Rottensteiner, U. Soergel, and V. Toprak, “Matching of straight line segments from
aerial stereo images of urban areas,” ISPRS J. Photogram. Remote Sens., vol. 74, pp. 133–152, Nov. 2012. doi: 10.1016/j.isprsjprs.2012.09.003.
[121] J. Canny, “A computational approach to edge detection,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–
698, 1986. doi: 10.1109/TPAMI.1986.4767851.
[122] D. Marr and E. Hildreth, “Theory of edge detection,” Proc. Roy.
Soc. Ser. B-Biol. Sci., vol. 207, no. 1167, pp. 187–217, 1980. doi:
10.1098/rspb.1980.0020.
[123] R. G. von Gioi, J. Jakubowicz, J. Morel, and G. Randall, “LSD: A
fast line segment detector with a false detection control,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 32, no. 4, pp. 722–732,
2010. doi: 10.1109/TPAMI.2008.300.
[124] C. Akinlar and C. Topal, “EDLines: A real-time line segment detector with a false detection control,” Pattern Recog. Lett., vol. 32,
no. 13, pp. 1633–1642, 2011. doi: 10.1016/j.patrec.2011.06.001.
[125] A. Goshtasby and G. C. Stockman, “Point pattern matching using convex hull edges,” IEEE Trans. Syst., Man, Cybern., vol. SMC-15, no. 5, pp. 631–637, 1985. doi: 10.1109/TSMC.1985.6313439.
[126] W. Dorigo, M. Hollaus, W. Wagner, and K. Schadauer, “An
application-oriented automated approach for co-registration of forest inventory and airborne laser scanning data,”
Int. J. Remote Sens., vol. 31, no. 5, pp. 1133–1153, 2010. doi:
10.1080/01431160903380581.
[127] B. Sirmacek and C. Unsalan, “Urban-area and building detection using SIFT keypoints and graph theory,” IEEE Trans. Geosci.
Remote Sens., vol. 47, no. 4, pp. 1156–1167, 2009. doi: 10.1109/TGRS.2008.2008440.
[128] J. Flusser and T. Suk, “A moment-based approach to registration of images with affine geometric distortion,” IEEE Trans.
Geosci. Remote Sens., vol. 32, no. 2, pp. 382–387, 1994. doi:
10.1109/36.295052.
[129] N. R. Pal and S. K. Pal, “A review on image segmentation techniques,” Pattern Recognit., vol. 26, no. 9, pp. 1277–1294, 1993.
doi: 10.1016/0031-3203(93)90135-J.
[130] D. Xiaolong and S. Khorram, “Development of a feature-based
approach to automated image registration for multitemporal
and multisensor remotely sensed imagery,” in Proc. IEEE Int.
Geosci. Remote Sens. Symp. Proc. Remote Sens.-A Sci. Vision Sustainable Develop. (IGARSS), 1997, vol. 1, pp. 243–245.
[131] L. M. Fonseca and B. Manjunath, “Registration techniques for
multisensor remotely sensed imagery,” Photogram. Eng. Remote
Sensing (PERS), vol. 62, no. 9, pp. 1049–1056, 1996.
[132] A. Goshtasby, G. C. Stockman, and C. V. Page, “A region-based
approach to digital image registration with subpixel accuracy,”
IEEE Trans. Geosci. Remote Sens., vol. GE-24, no. 3, pp. 390–399,
1986. doi: 10.1109/TGRS.1986.289597.
[133] J. Ton and A. K. Jain, “Registering Landsat images by point
matching,” IEEE Trans. Geosci. Remote Sens., vol. 27, no. 5, pp.
642–651, 1989. doi: 10.1109/TGRS.1989.35948.
[134] Y. Chen, X. Zhang, Y. Zhang, S. J. Maybank, and Z. Fu, “Visible
and infrared image registration based on region features and
edginess,” Mach. Vis. Appl., vol. 29, no. 1, pp. 113–123, 2018.
doi: 10.1007/s00138-017-0879-6.
[135] A. Irani Rahaghi, U. Lemmin, D. Sage, and D. A. Barry, “Achieving high-resolution thermal imagery in low-contrast lake surface waters by aerial remote sensing and image registration,”
Remote Sens. Environ., vol. 221, pp. 773–783, Feb. 2019. doi:
10.1016/j.rse.2018.12.018.
[136] A. Li, X. Cheng, H. Guan, T. Feng, and Z. Guan, “Novel image
registration method based on local structure constraints,” IEEE
Geosci. Remote Sens. Lett., vol. 11, no. 9, pp. 1584–1588, 2014.
doi: 10.1109/LGRS.2014.2305982.
[137] S. Jiang and W. Jiang, “Hierarchical motion consistency
constraint for efficient geometrical verification in UAV stereo image matching,” ISPRS J. Photogram. Remote Sens., vol.
142, pp. 222–242, Aug. 2018. doi: 10.1016/j.isprsjprs.2018.06.009.
[138] J. S. Beis and D. G. Lowe, “Shape indexing using approximate nearest-neighbour search in high-dimensional spaces,”
in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 1997, vol. 97, pp. 1000–1006. doi: 10.1109/CVPR.1997.609451.
[139] Y. Ma et al., “Remote sensing big data computing: Challenges
and opportunities,” Future Gener. Comput. Syst., vol. 51, pp. 47–
60, Oct. 2015. doi: 10.1016/j.future.2014.10.029.
[140] G. Stockman, S. Kopstein, and S. Benett, “Matching images to
models for registration and object detection via clustering,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-4, no. 3, pp.
229–241, 1982. doi: 10.1109/TPAMI.1982.4767240.
[141] G. Borgefors, “Hierarchical chamfer matching: A parametric edge matching algorithm,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 10, no. 6, pp. 849–865, 1988. doi: 10.1109/34.9107.
[142] L. Livi and A. Rizzi, “The graph matching problem,” Pattern
Anal. Appl., vol. 16, no. 3, pp. 253–283, 2013. doi: 10.1007/s10044-012-0284-8.
[143] L. Torresani, V. Kolmogorov, and C. Rother, “Feature correspondence via graph matching: models and global optimization,” in Proc. Eur. Conf. Comput. Vision (ECCV), Berlin,
Heidelberg, 2008, pp. 596–609.
[144] Z. Liu, J. An, and Y. Jing, “A simple and robust feature point
matching algorithm based on restricted spatial order constraints for aerial image registration,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 514–527, 2012. doi: 10.1109/TGRS.2011.2160645.
[145] Z. Xiong and Y. Zhang, “A novel interest-point-matching algorithm for high-resolution satellite images,” IEEE Trans. Geosci.
Remote Sens., vol. 47, no. 12, pp. 4189–4200, 2009. doi: 10.1109/TGRS.2009.2023794.
[146] H. Chang, G. Wu, and M. Chiang, “Remote sensing image registration based on modified SIFT and feature slope grouping,”
IEEE Geosci. Remote Sens. Lett., vol. 16, no. 9, pp. 1363–1367,
2019. doi: 10.1109/LGRS.2019.2899123.
[147] S. Zhili and Z. Jiaqi, “Image registration approach with scale-invariant feature transform algorithm and tangent-crossing-point feature,” J. Electron. Imag., vol. 29, no. 2, pp. 1–14, Mar.
2020. doi: 10.1117/1.JEI.29.2.023010.
[148] M. A. Fischler and R. C. Bolles, “Random sample consensus: A
paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6,
pp. 381–395, 1981. doi: 10.1145/358669.358692.
[149] K. Zhang, X. Li, and J. Zhang, “A robust point-matching algorithm for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 2, pp. 469–473, 2014. doi: 10.1109/LGRS.2013.2267771.
[150] Y. Wu, W. Ma, M. Gong, L. Su, and L. Jiao, “A novel point-matching algorithm based on fast sample consensus for image
registration,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 1, pp.
43–47, 2015. doi: 10.1109/LGRS.2014.2325970.
[151] B. Li and H. Ye, “RSCJ: Robust sample consensus judging
algorithm for remote sensing image registration,” IEEE Geosci.
Remote Sens. Lett., vol. 9, no. 4, pp. 574–578, 2012. doi: 10.1109/LGRS.2011.2175434.
[152] H. Zhang et al., “Remote sensing image registration based
on local affine constraint with circle descriptor,” IEEE
Geosci. Remote Sens. Lett., early access, 2020. doi: 10.1109/LGRS.2020.3027096.
[153] F. Ye, Y. Su, H. Xiao, X. Zhao, and W. Min, “Remote sensing image registration using convolutional neural network features,”
IEEE Geosci. Remote Sens. Lett., vol. 15, no. 2, pp. 232–236, 2018.
doi: 10.1109/LGRS.2017.2781741.
[154] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 35, no. 8, pp. 1915–1929, 2013. doi: 10.1109/TPAMI.2012.231.
[155] W. Huang, L. Xiao, Z. Wei, H. Liu, and S. Tang, “A new pansharpening method with deep neural networks,” IEEE Geosci.
Remote Sens. Lett., vol. 12, no. 5, pp. 1037–1041, 2015. doi:
10.1109/LGRS.2014.2376034.
[156] Y. Xing, M. Wang, S. Yang, and L. Jiao, “Pan-sharpening via
deep metric learning,” ISPRS J. Photogram. Remote Sens., vol.
145, pp. 165–183, Nov. 2018. doi: 10.1016/j.isprsjprs.2018.01.016.
[157] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and
C. H. Davis, “Training deep convolutional neural networks
for land–cover classification of high-resolution imagery,” IEEE
Geosci. Remote Sens. Lett., vol. 14, no. 4, pp. 549–553, 2017. doi:
10.1109/LGRS.2017.2657778.
[158] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, “Deep
learning in remote sensing applications: A meta-analysis and
review,” ISPRS J. Photogram. Remote Sens., vol. 152, pp. 166–177,
June 2019. doi: 10.1016/j.isprsjprs.2019.04.015.
[159] Y. Liu, D. Minh Nguyen, N. Deligiannis, W. Ding, and A.
Munteanu, “Hourglass-shapenetwork based semantic segmentation for high resolution aerial imagery,” Remote Sens., vol. 9,
no. 6, p. 522, 2017. doi: 10.3390/rs9060522.
[160] H. Zhang et al., “Registration of multimodal remote sensing
image based on deep fully convolutional neural network,”
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12,
no. 8, pp. 3028–3042, 2019. doi: 10.1109/JSTARS.2019.
2916560.
[161] N. Merkle, W. Luo, S. Auer, R. Müller, and R. Urtasun, “Exploiting deep matching and SAR data for the geo-localization accuracy improvement of optical satellite images,” Remote Sensing,
vol. 9, no. 6, p. 586, 2017. doi: 10.3390/rs9060586.

141

[162] H. He, M. Chen, T. Chen, and D. Li, “Matching of remote sensing images with complex background variations via Siamese
convolutional neural network,” Remote Sens., vol. 10, no. 3,
p. 355, 2018. doi: 10.3390/rs10020355.
[163] L. H. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu,
“Identifying corresponding patches in SAR and optical images
with a pseudo-Siamese CNN,” IEEE Geosci. Remote Sens. Lett.,
vol. 15, no. 5, pp. 784–788, 2018. doi: 10.1109/LGRS.2018.
2799232.
[164] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, “A
deep learning framework for remote sensing image registration,” ISPRS J. Photogram. Remote Sens., vol. 145, pp. 148–164,
Nov. 2018. doi: 10.1016/j.isprsjprs.2017.12.012.
[165] R. Fan, B. Hou, J. Liu, J. Yang, and Z. Hong, “Registration of
multi-resolution remote sensing images based on L2-Siamese
model,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens.,
vol. 14, pp. 1–1, Nov. 19, 2020. doi: 10.1109/JSTARS.2020.
3038922.
[166] N. Merkle, S. Auer, R. Müller, and P. Reinartz, “Exploring the
potential of conditional adversarial networks for optical and
SAR image matching,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 6, pp. 1811–1820, 2018. doi: 10.1109/
JSTARS.2018.2803212.
[167] H. L. Hughes, M. Schmitt, and X. X. Zhu, “Mining hard negative samples for SAR-optical image matching using generative
adversarial networks,” Remote Sens., vol. 10, no. 10, p. 1552,
2018. doi: 10.3390/rs10101552.
[168] J. Zhang, W. Ma, Y. Wu, and L. Jiao, “Multimodal remote sensing image registration based on image transfer and local features,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 8, pp. 1210–
1214, 2019. doi: 10.1109/LGRS.2019.2896341.
[169] N. Girard, G. Charpiat, and Y. Tarabalka, “Aligning and updating cadaster maps with aerial images by multi-task, multi-resolution deep learning,” in Proc. Asian Conf. Comput. Vision (ACCV
2018), Cham, 2019, pp. 675–690.
[170] L. Li, L. Han, M. Ding, Z. Liu, and H. Cao, “Remote sensing
image registration based on deep learning regression model,”
IEEE Geosci. Remote Sens. Lett., early access, 2020. doi: 10.1109/
LGRS.2020.3032439.
[171] F. Liu, F. Bi, L. Chen, H. Shi, and W. Liu, “Feature-area optimization: A novel SAR image registration method,” IEEE
Geosci. Remote Sens. Lett., vol. 13, no. 2, pp. 242–246, 2016. doi:
10.1109/LGRS.2015.2507982.
[172] X. Huang, Y. Sun, D. Metaxas, F. Sauer, and C. Xu, “Hybrid image registration based on configural matching of scale-invariant salient region features,” in Proc. IEEE Comput. Society Conf.
Comput. Vis. Pattern Recognit. (CVPR), Washington, D. C., 2004,
pp. 167–167. doi: 10.1109/CVPR.2004.362.
[173] G. Hong and Y. Zhang, “Combination of feature-based and
area-based image registration technique for high resolution
remote sensing image,” in Proc. Int. Geosci. Remote Sens. Symp.
(IGARSS), Barcelona, Spain, 2007, pp. 377–380.
[174] N. E. Mekky, F. E.-Z. Abou-Chadi, and S. Kishk, “Waveletbased image registration techniques: A study of performance,”
Int. J. Comput. Sci. Netw. Security, vol. 11, no. 2, pp. 188–196,
2011.

142

[175] S. Suri, P. Schwind, P. Reinartz, and J. Uhl, “Combining mutual
information and scale invariant feature transform for fast and
robust multisensor SAR image registration,” in Proc. 75th Annu.
ASPRS Conf., 2009.
[176] Y. S. Heo, K. M. Lee, and S. U. Lee, “Joint depth map and
color consistency estimation for stereo images with different
illuminations and cameras,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 35, no. 5, pp. 1094–1106, 2013. doi: 10.1109/
TPAMI.2012.167.
[177] R. Feng, Q. Du, H. Shen, and X. Li, “Region-by-region registration combining feature-based and optical flow methods for
remote sensing images,” Remote Sens., vol. 13, no. 8, p. 1475,
2021. doi: 10.3390/rs13081475.
[178] C. Xing, J. Wang, and Y. Xu, “A method for building a mosaic
with UAV images,” Int. J. Inform. Eng. Electron. Bus., vol. 2, no. 1,
pp. 9–15, 2010. doi: 10.5815/ijieeb.2010.01.02.
[179] Z. Kang, L. Zhang, S. Zlatanova, and J. Li, “An automatic mosaicking method for building facade texture mapping using a
monocular close-range image sequence,” ISPRS J. Photogram.
Remote Sens., vol. 65, no. 3, pp. 282–293, 2010. doi: 10.1016/j.
isprsjprs.2009.11.003.
[180] S. R. Lee, “A coarse-to-fine approach for remote-sensing image
registration based on a local method,” Int. J. Smart Sens. Intell.
Syst., vol. 3, no. 4, 2010.
[181] K. Sharma and A. Goyal, “Very high resolution image registration based on two step Harris-Laplace detector and SIFT descriptor,” in 2013 4th Int. Conf. Comput., Commun. Netw. Technol.
(ICCCNT), pp. 1–5. doi: 10.1109/ICCCNT.2013.6726632.
[182] W. Ma, J. Zhang, Y. Wu, L. Jiao, H. Zhu, and W. Zhao, “A
novel two-step registration method for remote sensing
images based on deep and local features,” IEEE Trans. Geosci.
Remote Sens., vol. 57, no. 7, pp. 4834–4843, 2019. doi: 10.1109/
TGRS.2019.2893310.
[183] S. Li, L. Yuan, J. Sun, and L. Quan, “Dual-feature warpingbased motion model estimation,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2015, pp. 4283–4291.
[184] S. Nag, “Image registration techniques: A survey,” Nov. 28,
2017, arXiv:1712.07540.
[185] J. S. Bhatt and N. Padmanabhan, “Image Registration for meteorological applications: Development of a generalized software for sensor data registration at ISRO,” IEEE Geosci. Remote
Sens. Mag. (replaces Newslett.), vol. 8, no. 4, pp. 23–37, 2020. doi:
10.1109/MGRS.2019.2949382.
[186] A. Sedaghat and N. Mohammadi, “High-resolution image
registration based on improved SURF detector and localized
GTM,” Int. J. Remote Sens., vol. 40, no. 7, pp. 2576–2601, Apr.
2019. doi: 10.1080/01431161.2018.1528402.
[187] Y. Ma, J. Wang, H. Xu, S. Zhang, X. Mei, and J. Ma, “Robust image feature matching via progressive sparse spatial consensus,”
IEEE Access, vol. 5, pp. 24,568–24,579, Oct. 2017. doi: 10.1109/
ACCESS.2017.2768078.
[188] H. Goncalves, J. A. Goncalves, and L. Corte-Real, “Measures
for an objective evaluation of the geometric correction process
quality,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 292–
296, 2009. doi: 10.1109/LGRS.2008.2012441.
GRS
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

Deep Learning Meets SAR
Concepts, models, pitfalls, and perspectives
XIAO XIANG ZHU, SINA MONTAZERI, MOHSIN ALI, YUANSHENG HUA,
YUANYUAN WANG, LICHAO MOU, YILEI SHI, FENG XU, AND RICHARD BAMLER

Deep learning in remote sensing has received considerable international hype, but it is mostly limited to the
evaluation of optical data. Although deep learning has been introduced in synthetic aperture radar (SAR) data
processing, and first attempts have been successful, its huge potential remains locked. In this article, we
provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by
analyzing special characteristics of SAR data, review the state of the art of deep learning applied to SAR,
summarize available benchmarks, and recommend some important future research directions. With this effort, we
hope to stimulate more research in this interesting yet underexploited field and to pave the way for the use of
deep learning in big SAR data processing workflows.

Digital Object Identifier 10.1109/MGRS.2020.3046356
Date of current version: 9 February 2021

MOTIVATION
In recent years, deep learning [1] has been developed at a dramatic pace, achieving great success in many
fields. Unlike conventional algorithms, deep learning-based methods commonly employ hierarchical architectures,
such as deep neural networks, to extract feature representations of raw data for numerous tasks. For instance,
convolutional neural networks (CNNs) are capable of learning low- and high-level features from raw images with
stacks of convolutional and pooling layers and then applying the extracted features to various computer vision
tasks, such as large-scale image recognition [2], object detection [3], and semantic segmentation [4]. Inspired
by numerous successful applications in the computer vision community, the use of deep learning in remote
sensing is now receiving significant attention [5]. As first attempts at SAR applications, deep learning-based
methods have been adopted for a variety of tasks, including terrain surface classification [6], object
detection [7], parameter inversion [8], despeckling [9], specific functions in interferometric SAR (InSAR)
[10], and SAR–optical data fusion [11].

For terrain surface classification from SAR and polarimetric SAR (PolSAR) images, effective feature extraction
is essential. Conventionally, these features are extracted based on expert domain knowledge and are usually
applicable to only a small number of cases and data sets. Deep learning feature extraction has, however, proved
to overcome, to some degree, both of the aforementioned issues [6]. For SAR target detection, conventional
approaches mainly rely on template matching, where specific templates are manually created [12] to classify
different categories, and the use of traditional machine learning (ML) methods, such as support vector machines
(SVMs) [13], [14]; in contrast, modern deep learning algorithms aim at applying deep CNNs to automatically
extract discriminative features for target recognition [7]. For parameter inversion, deep learning models are
employed to learn the latent mapping function from SAR images to estimated parameters, e.g., sea ice
concentration [8]. Regarding despeckling, conventional methods often rely on artificial filters and may suffer
from the improper elimination of sharp features when denoising. Furthermore, the development of SAR and optical
image joint analysis has been motivated by the capacities of extracting features from both types of images. For
applications in InSAR, only a few studies have been carried out, such as the work described in [10]. However,
these algorithms neglect the special characteristics of phase and simply use an out-of-the-box deep
learning-based model.

Despite initial successes, and unlike the evaluation of optical data, the huge potential of deep learning in
SAR and InSAR remains locked. For example, to the best knowledge of the authors, there is no example of deep
learning in SAR that has been developed for the operational processing of big data or integrated into the
production chain of any satellite mission. This article aims at stimulating more research in this interesting
yet underexploited research field.

INTRODUCTION TO RELEVANT DEEP
LEARNING MODELS AND CONCEPTS
In this section, we briefly review relevant deep learning algorithms that were originally proposed for visual data processing and that are widely used for state-of-the-art research
into deep learning in SAR. In addition, we mention the latest deep learning developments, which are not yet widely
applied to SAR but may help create the next generation of
its algorithms. Figure 1 gives an overview of the deep learning models we discuss in this section.

FIGURE 1. A selection of relevant deep learning models. (a) The Visual Geometry Group Network. (Source: [15].) (b) The residual neural
network (ResNet) block. (Source: [16].) (c) The U-Net. (Source: [17].) (d) The long short-term memory unit. (Source: [18].) (e) The variational
autoencoder. (Source: [19].) (f) The recurrent neural network (RNN). (Source: [20].) (g) The generative adversarial network. (Source: [21].)
(h) The convolutional graph neural network (GNN). (Source: [22].) (i) The recurrent GNN. (Source: [23].) (j) Neural architecture search using
deep reinforcement learning (RL). (Source: [24].) ReLU: rectified linear unit; GRU: gated recurrent unit.

Before discussing deep learning algorithms, we would
like to stress that the importance of high-quality benchmark data sets in deep learning research cannot be overstated. Especially in supervised learning, the knowledge
that can be gained by a model is bounded by the information present in the training data set. For example, the
Modified National Institute of Standards and Technology
[25] data set played a key role in LeCun’s seminal paper
about CNNs and gradient-based learning [26]. Similarly,
there would be no AlexNet [27], the network that kickstarted the current deep learning renaissance, without
the ImageNet [28] data set, which contains more than
14 million images and 22,000 classes. ImageNet has been
such an important part of deep learning research that,
more than 10 years after it was published, it is still used
as a standard benchmark to evaluate the performance of
CNNs for image classification.
DEEP LEARNING MODELS
The main principle of deep learning models is to encode
input data into effective feature representations for target
tasks. To exemplify how a deep learning framework works,
we take the autoencoder as an example: it first maps input data to a latent representation via a trainable nonlinear mapping and then reconstructs inputs through reverse
mapping. The reconstruction error is usually defined as
the Euclidean distance between inputs and reconstructed
inputs. Parameters of autoencoders are optimized by gradient descent-based optimizers, such as stochastic gradient descent, root mean square propagation [29], and Adam
[30], during the backpropagation step.
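The autoencoder description above can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the setup of any cited work: the function name `train_autoencoder`, the tanh encoder, the linear decoder, and the default values of `latent_dim`, `lr`, and `steps` are all hypothetical choices, and plain full-batch gradient descent stands in for the optimizers named in the text.

```python
import numpy as np

def train_autoencoder(x, latent_dim=2, lr=0.01, steps=2000, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    x has shape (n_samples, n_features). All names and defaults are
    illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    losses = []
    for _ in range(steps):
        # Forward pass: nonlinear mapping to the latent code, then reverse mapping.
        z = np.tanh(x @ w_enc)
        x_rec = z @ w_dec
        err = x_rec - x
        # Reconstruction error: mean squared Euclidean distance.
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        # Backward pass (backpropagation through decoder and encoder).
        g_rec = 2.0 * err / n
        g_dec = z.T @ g_rec
        g_z = g_rec @ w_dec.T
        g_enc = x.T @ (g_z * (1.0 - z ** 2))  # derivative of tanh is 1 - tanh^2
        # Gradient descent update.
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec, losses
```

Calling `train_autoencoder(x)` on data that lie close to a low-dimensional subspace should return a list of losses that decreases over the iterations; swapping the update lines for a stochastic or adaptive rule would correspond to the optimizers mentioned above.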
CNNs
With the success of AlexNet in the 2012 ImageNet Large
Scale Visual Recognition Challenge (ILSVRC), where the
network scored a top-five test error of 15.3%, compared to
the second-best test error of 26.2%, CNNs have attracted
worldwide attention and are now used for many image understanding tasks, such as image classification, object detection, and semantic segmentation. AlexNet consists of five
convolutional layers, three maximum pooling layers, and
three fully connected layers. One of the key AlexNet innovations was the use of graphics processing units (GPUs), which
made it possible to train such large networks with huge
data sets without using supercomputers. In just two years,
the Visual Geometry Group Network [2] overtook AlexNet
in performance by achieving a 6.8% top-five test error at
ILSVRC-2014; the main difference was that it used only
3 × 3-sized convolutional kernels, which enabled it to have
more channels and, in turn, capture more diverse features.
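One way to see why small kernels leave room for more channels is to count weights. The sketch below compares three stacked 3 × 3 layers with a single 7 × 7 layer that covers the same receptive field. The helper names `conv_weights` and `receptive_field` and the channel count of 64 are illustrative assumptions; biases are ignored, and stride-1 convolutions with equal input and output channel counts are assumed.

```python
def conv_weights(kernel, channels, layers=1):
    """Weights in `layers` stacked kernel x kernel convolutions (no biases),
    each mapping `channels` input channels to `channels` output channels."""
    return layers * kernel * kernel * channels * channels

def receptive_field(kernel, layers):
    """Receptive field (in pixels per side) of stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)

channels = 64  # illustrative value
stacked = conv_weights(3, channels, layers=3)  # three 3 x 3 layers
single = conv_weights(7, channels, layers=1)   # one 7 x 7 layer

# Both configurations see a 7 x 7 neighborhood ...
print(receptive_field(3, 3), receptive_field(7, 1))
# ... but the stacked small kernels need 110592 weights instead of 200704.
print(stacked, single)
```

Under these assumptions, the weights saved by the small kernels can be spent on additional channels or layers at a comparable budget.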
The residual neural network (ResNet) [31], U-Net [32],
and DenseNet [33] were the next major CNN architectures.
Their main feature concerned the idea of connecting not
only neighboring layers but any two layers in the network
by using skip connections. This helped reduce information
loss across networks, mitigated the problem of vanishing
gradients, and facilitated the design of deeper networks.
U-Net is one of the most commonly used image segmentation networks. It has an autoencoder-based architecture
that uses skip connections to concatenate features from the
first layer to the last, the second to the second last, and so
on; this way, it can get fine-grained information from the
initial layers to the end layers. U-Net was initially proposed
for medical image segmentation, where data labeling is a
big problem. The authors employed heavy data augmentation techniques on input data, making it possible to learn
from only a few hundred annotated samples.
In ResNet, skip connections were incorporated within
individual blocks, not across the whole network. Since its
initial proposal, ResNet has undergone many architectural
tweaks, and, even after five years, its variants are always
among the top scorers on ImageNet. In DenseNet, all the
layers were attached to all preceding layers, reducing the
size of the network, albeit at the cost of memory usage.
For a more detailed explanation of different CNN models,
interested readers are referred to [34]. These CNN models
have also proved their worth in SAR processing tasks; e.g.,
see [35]–[37]. For more examples and details of CNNs in
SAR, see the “Recent Advances in Deep Learning Applied
to SAR” section.
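The two kinds of skip connection described above differ in how earlier features are reused: ResNet adds the block input to the block output, whereas DenseNet concatenates all preceding outputs. The following NumPy sketch illustrates only that difference. A dense matrix product with a ReLU stands in for a convolutional layer, and the names `layer`, `residual_block`, and `dense_block` are hypothetical.

```python
import numpy as np

def layer(x, w):
    """Stand-in for a convolutional layer: linear map followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def residual_block(x, w1, w2):
    """ResNet-style skip connection: the block input is added to its output."""
    return x + layer(layer(x, w1), w2)

def dense_block(x, weights):
    """DenseNet-style connectivity: every layer receives the concatenation
    of the block input and all earlier layer outputs."""
    features = [x]
    for w in weights:
        features.append(layer(np.concatenate(features, axis=1), w))
    return np.concatenate(features, axis=1)
```

In the residual block, the feature width stays constant and the identity path lets gradients bypass the layers. In the dense block, each layer can be narrow, but the concatenated feature map grows with depth, which is where the memory cost mentioned above comes from.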
RECURRENT NEURAL NETWORKS
Besides CNNs, recurrent neural networks (RNNs) [38] are
a major class of deep networks. Their main building blocks
are recurrent units, which take the current input and output
of the previous state as input. They provide state-of-the-art
results for processing data of variable lengths, including text
and time-series information. Their weights can be replaced
with convolutional kernels for visual processing tasks, such
as image captioning and predicting future frames/points in
visual time series data. Long short-term memory (LSTM)
[39] is one of the most popular RNN architectures: its
cells can store values from any past instances and are not
severely affected by the vanishing gradient problem.
As with any other time-series data, RNNs are the natural choice from
the deep learning toolkit for processing SAR time-series
information; e.g., see [40].
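A minimal sketch of the recurrent unit described above, assuming the simplest ("vanilla") form with a tanh nonlinearity; LSTM cells add gating on top of this. The function name `rnn_forward` and the weight shapes are illustrative assumptions.

```python
import numpy as np

def rnn_forward(sequence, w_x, w_h, b):
    """Run a vanilla recurrent unit over a sequence of shape (T, input_dim).

    w_x: (input_dim, hidden_dim), w_h: (hidden_dim, hidden_dim), b: (hidden_dim,).
    Returns the final hidden state.
    """
    h = np.zeros(w_h.shape[0])
    for x_t in sequence:
        # The unit takes the current input and the previous state as input.
        h = np.tanh(x_t @ w_x + h @ w_h + b)
    return h
```

Because the same weights are applied at every step, sequences of different lengths T are mapped to a state of the same size, which is what makes the unit suitable for variable-length time series.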
GENERATIVE ADVERSARIAL NETWORKS
Proposed by Ian Goodfellow et al. [41], generative adversarial networks (GANs) are among the most popular and exciting inventions in the field of deep learning. Based on game-theoretic principles, they consist of two networks called a
generator and a discriminator. The generator’s objective is to
learn a latent space through which it can create samples
from the same distribution as the training data, while the
discriminator tries to learn to distinguish whether a sample
is from the generator or the training data. This very simple
mechanism is responsible for most cutting-edge algorithms
for various applications, e.g., generating artificial photorealistic images and videos, superresolution, and text-to-image synthesis. For example, in the SAR domain, GANs
have already been successfully used in cloud removal applications [42], [43]. See the “Recent Advances in Deep Learning Applied to SAR” section for more examples.
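The opposing objectives of the generator and the discriminator can be written as two loss functions evaluated on the discriminator outputs. The sketch below assumes that the discriminator returns probabilities in (0, 1) and uses the common non-saturating form of the generator loss; the function names and the `eps` safeguard are illustrative assumptions.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: training samples are labeled 1, generated samples 0.

    d_real: discriminator outputs on training data.
    d_fake: discriminator outputs on generated samples.
    """
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: push the discriminator output on
    generated samples toward 1."""
    return -np.mean(np.log(d_fake + eps))
```

When the discriminator separates the two sources confidently, its loss is close to zero and the generator loss is large. When it can no longer tell them apart and outputs 0.5 everywhere, the discriminator loss equals 2 ln 2.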
SUPERVISED, UNSUPERVISED,
AND REINFORCEMENT LEARNING

SUPERVISED LEARNING
Most popular deep learning models fall under the category
of supervised deep learning; i.e., they need labeled data sets
to learn objective functions. One big challenge of supervised
learning is generalization, i.e., how well a trained model
performs on test data. Therefore, it is vital that training data
truly represent the actual data distribution so that the trained
model can handle unseen data. If a model fits the training data
well but fails on test data, overfitting has occurred. In the
deep learning literature, there are several techniques that
can be used to avoid overfitting, e.g., dropout [44].
UNSUPERVISED LEARNING
Unsupervised learning refers to the class of algorithms where
the training data do not contain labels. For instance, in classical data analysis, principal component analysis [45] can
be used to reduce the data dimension, followed by a clustering algorithm to group similar data points. In deep learning,
generative models such as autoencoders, variational autoencoders
(VAEs) [46], and GANs [41] are some of the popular techniques that can be used for unsupervised learning. Their
primary goal is to generate output data from the same distribution as the input data. Autoencoders consist of an encoder that finds a compressed, latent representation of the
input and a decoder that translates that representation back
to the original input. VAEs take autoencoders to the next
level by learning a whole distribution instead of just a single
representation at the end of the encoder; this, in turn, can
be used by the decoder to generate the whole distribution
of outputs. The trick to learning this distribution is to also
acquire the variance along with the mean of the latent representation at the encoder–decoder meeting point and to
add a Kullback–Leibler divergence-based loss term to the
standard reconstruction loss function of the autoencoders.
DEEP REINFORCEMENT LEARNING
Reinforcement learning (RL) tries to mimic human learning behavior, i.e., taking actions and then adjusting them
for the future, according to feedback from the environment. For example, young children learn to repeat or not
repeat their actions based on the reaction of their parents.
The RL model consists of an environment with states, actions to transition between those states, and a reward system for ending up in different states. The objective of the
algorithm is to learn the best actions for given states using a
feedback–reward system. In classical RL algorithms, function approximators are used to calculate the probability of
different actions in different states. Deep RL employs different types of neural networks to create these functions
[47], [48]. Recently, deep RL received particular attention
and achieved popularity due to the success of Google
DeepMind’s AlphaGo [49], which defeated the Go board game
world champion. This task was considered impossible for
computers until a few years ago.
RELEVANT DEEP LEARNING CONCEPTS

AUTOMATIC ML
Deep networks have many hyperparameters to choose from,
for example, the number of layers, kernel sizes, types of optimizers, skip connections, and the like. There are billions of
possible combinations of these parameters, and, given high
computational time and energy costs, it is hard to find the
best-performing network, even from among a few hundred
candidates. In the case of deep learning, the objective of automatic ML (AutoML) is to find the most efficient and high-performing deep network for a given data set and task. The
first major attempt in this field was by Zoph et al. [24], who
used deep RL to find the optimum CNN for image classification. In the system, an RNN creates CNN architectures
and, based on their classification results, proposes changes
to them. This process continues to loop until the optimum
architecture is found. This algorithm was able to find
networks competitive with the state of the art, but it took
more than 800 GPUs, which is unrealistic for practical application. Recently, there have been many developments in
the AutoML field, and they have made it possible to perform
such tasks in more intelligent and efficient ways. More details
about the field of network architectural search can be found
in [51]. Furthermore, AutoML has already been successfully
applied to SAR for PolSAR classification [52]. The method
shows great potential for segmentation and classification
tasks, in particular.
GEOMETRIC DEEP LEARNING:
GRAPH NEURAL NETWORKS
Apart from well-structured image data, there is a large
amount of unstructured data in real life, e.g., knowledge
graphs and social networks, that cannot be directly processed by a deep CNN. Usually, these data are represented
in the form of graphs, where each node indicates an entity
and edges delineate mutual relations. As an approach to
learning from unstructured data, geometric deep learning
has been attracting increasing attention; the most common
architecture is the graph neural network (GNN), which
has also proved to be successful in dealing with structured
data. Specifically, using the terminology of graphs, nodes of
a graph can be regarded as feature descriptions of entities,
and their edges are established by measuring their relations
and distances and encoded in an adjacency matrix. Once a
graph is constructed, messages can be propagated among
nodes by simply performing matrix multiplication. Accordingly, [53] proposed graph convolutional networks (GCNs),
which are characterized by utilizing graph convolutions;
the authors of [45] accelerated the process. Moreover, the
units in recurrent GNNs (RGNNs) [23], [55] have been
shown to be effective in learning from graphs.
The usefulness of GNNs in SAR is still to be properly explored, and [56] is one of the few attempts to do so.
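The statement that messages are propagated among nodes by matrix multiplication can be illustrated with a single graph convolution. The sketch below assumes the widely used form with self-loops and symmetric degree normalization followed by a ReLU; the text does not spell out the propagation rule, so this specific normalization and the name `gcn_layer` are assumptions.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution: normalize, propagate, transform, apply ReLU.

    adjacency: (N, N) symmetric 0/1 matrix, features: (N, F), weights: (F, F_out).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt        # symmetric normalization
    # Multiplying by a_norm mixes each node's features with its neighbors'.
    return np.maximum(a_norm @ features @ weights, 0.0)
```

On a path graph with three nodes, one layer leaves the two end nodes unaware of each other, while stacking two layers lets information travel across the middle node; each layer thus extends the neighborhood a node can see by one hop.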
POSSIBLE PITFALLS
To develop tailored deep learning architectures and prepare
suitable training data sets for SAR and InSAR tasks, it is
important to understand that SAR data are different from
optical remote sensing data, not to mention images downloaded from the Internet. In this section, we discuss the
special characteristics (and possible pitfalls) encountered
while applying deep learning to SAR. What makes SAR
data and SAR data processing by neural networks unique?
SAR data are substantially different from optical imagery in
many respects. The following points should be considered
when transferring CNN experience and expertise from optical to SAR data:
◗◗ Dynamic range: Depending on the spatial resolution, the
dynamic range of SAR images can be up to 90 dB (TerraSAR-X high-resolution spotlight data with a resolution
of roughly 1 m). Moreover, the distribution is extremely
asymmetric, with the majority of the pixels in the low-amplitude range (distributed scatterers) and a long tail
representing bright discrete scatterers, in particular, in
urban areas. Standard CNNs are not able to handle such
dynamic ranges, and hence most approaches feature
dynamic compression as a preprocessing step. In [57],
the authors first take only amplitude values from zero
to 255 and then subtract the mean values of each image. In [11] and [58], normalization is performed as a
preprocessing step, which significantly compresses the
dynamic range.
◗◗ Signal statistics: To retrieve features from SAR (amplitude and intensity) images, speckle statistics must be
considered. Speckle is a multiplicative, rather than an
additive, phenomenon. This has consequences: while
the optimum estimator of the radar brightness of a homogeneous image patch under speckle is a simple moving-average operation (i.e., a convolution, such as in the
additive noise case), other optimum detectors of edges
and low-level features under additive Gaussian noise
may no longer be optimum in the case of SAR. A popular example is Touzi’s constant false alarm rate edge
detector [59] for SAR images, which uses the ratio of
two spatial averages across adjacent windows. This operation cannot be emulated by the first layer of a standard CNN. Some studies use a logarithmic mapping of
the SAR images prior to feeding them into a CNN [9],
[60]. This turns speckle into an additive random variable and, as a side effect, reduces the dynamic range.
But, still, a single convolutional layer can emulate only
approximations to optimum SAR feature estimators. It
could be valuable to supplement the original logarithmic SAR image by a few low-pass-filtered and logarithmized versions as input to a CNN. Another approach is
to apply a sophisticated speckle reduction filter before
entering a CNN, e.g., nonlocal averaging [61]–[63].
◗◗ Imaging geometry: The SAR image coordinates, range and azimuth, are not arbitrary coordinates such as east and north or x and y,
but, rather, reflect the peculiarities of the image generation process. Layover always occurs at the near range of
an object, and shadow always results at the far range.
That means data augmentation by SAR image rotation
would lead to nonsense imagery that would never be
generated by a SAR.
◗◗ The complex nature of SAR data: The most valuable SAR
data information lies in its phase. This applies to SAR image formation, which takes place in the complex signal
domain, as well as for PolSAR, InSAR, and tomographic
SAR data processing, meaning that an entire CNN must
be able to handle complex numbers. For the convolution operation, this is trivial. The nonlinear activation
function and the loss function, however, require thorough consideration. Depending on whether the activation function independently acts on the real and imaginary parts of the signal, or only on its magnitude, and
where bias is added, the phase will be distorted to different degrees.
If we use PolSAR data for land cover and target classification, a nonlinear processing of the phase is even desirable because the phase between different polarimetric
channels has physical meaning and hence contributes
to the classification process. In SAR interferometry and
tomography, however, the absolute phase has no meaning; i.e., the CNN must be invariant to an arbitrary phase
offset. Assume some interferometric input signal x to a
CNN and the output signal CNN(x) with phase
$$\hat{\phi} = \angle\,\mathrm{CNN}(x). \tag{1}$$
Any constant phase offset $\phi_0$ does not change the meaning of the interferogram. Thus, we require an invariance
that we refer to as phase linearity (which is valid at least
in the expectation):
$$\mathrm{CNN}(x e^{j\phi_0}) = \mathrm{CNN}(x)\, e^{j\phi_0}. \tag{2}$$
This linearity is violated, for example, if the activation
function is separately applied to real and imaginary
parts and if a bias is added to the complex numbers.
Another point to consider in regression-type InSAR
CNN processing (e.g., for noise reduction) is the loss
function. If the quantity of interest is not the complex
number itself but its phase, the loss function must be
able to handle the cyclic nature of phases. It may also be
advantageous that the loss function is independent, at
least to a certain degree, of the signal magnitude to relieve a CNN from modeling the magnitude. A loss function that meets these requirements is, for example,
$$L = E\left[e^{j(\angle\mathrm{CNN}(x) - \angle y)}\right], \tag{3}$$
where y is the reference signal. Some authors use the
magnitude and phase, rather than real and imaginary
parts, as input to a CNN. This approach is not invariant to phase offset, either. The interpretation of a phase
function as a real-valued function forces a CNN to disregard the sharp discontinuities at the $\pm\pi$ transitions,
whose positions are inconsequential. A standard CNN
would pounce on these, interpreting them as edges.
◗◗ Simulation-based training and validation data: The prevailing lack of ground truth for regression-type tasks, such
as speckle reduction and InSAR denoising, might tempt
us to use simulated SAR data for the training and validation of neural networks. However, this bears the risk
that our networks will learn models that are far too simplified. Unlike optical imaging, where highly realistic
scenes can be simulated, e.g., by PC games, the simulation of SAR data is more of a scientific topic that lacks the
power of commercial companies and a huge market. SAR
simulators focus on specific scenarios, e.g., vegetation
(only distributed scatterers are considered) and persistent (point) scatterers. The most advanced simulators are
probably the ones for computing the radar backscatter signatures of single military objects, such as vessels. To our
knowledge, though, there is no simulator available that
can, e.g., generate realistic interferometric data of rugged
terrain with layover, spatially varying coherence, and diverse scattering mechanisms. Often, simplified scattering
assumptions are made, e.g., that speckle is multiplicative. Even this is not true; pure Gaussian scattering can
be found only for quite homogeneous surfaces and low-resolution SARs. As soon as the resolution increases, the
chances of having a few dominating scatterers in a resolution cell increase, and the statistics become substantially
different from those of fully developed speckle.
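Returning to the phase-linearity requirement (2) and the phase-based loss (3) from the list above, both can be checked numerically. The sketch below is an illustration only: `split_relu`, `magnitude_relu`, `phase_equivariance_error`, and `mean_phasor` are hypothetical names, the magnitude-based activation with a real bias is one possible design rather than a recommendation from the text, and `mean_phasor` evaluates the expectation in (3) as a sample mean without turning it into a scalar to be minimized.

```python
import numpy as np

def split_relu(z):
    """ReLU applied separately to the real and imaginary parts."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def magnitude_relu(z, bias=-0.1):
    """Activation acting on the magnitude only (with a real bias added to
    the magnitude); the phase of z is passed through unchanged."""
    mag = np.abs(z)
    scale = np.maximum(mag + bias, 0.0) / np.maximum(mag, 1e-12)
    return z * scale

def phase_equivariance_error(f, x, phi0):
    """Largest deviation from f(x * exp(j*phi0)) == f(x) * exp(j*phi0)."""
    rot = np.exp(1j * phi0)
    return np.max(np.abs(f(x * rot) - f(x) * rot))

def mean_phasor(pred, ref):
    """Sample mean of exp(j * (angle(pred) - angle(ref))).

    Depends only on phases, so it ignores magnitudes and is insensitive
    to 2*pi wraps of the phase difference.
    """
    diff = np.angle(pred) - np.angle(ref)
    return np.mean(np.exp(1j * diff))
```

For a few complex test values, `phase_equivariance_error(split_relu, x, 0.7)` is clearly nonzero, whereas the magnitude-based activation satisfies (2) up to rounding error. Likewise, `mean_phasor` stays close to 1 when the predicted and reference phases sit just on either side of the $\pm\pi$ transition, which a loss on the raw real-valued phase would treat as a large error.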
RECENT ADVANCES IN DEEP LEARNING APPLIED TO SAR
In this section, we provide an in-depth review of deep
learning methods applied to SAR data from six perspectives: terrain surface classification, object detection, parameter inversion, despeckling, InSAR, and SAR–optical
data fusion. For each, we state notable developments in
chronological order and report their advantages and disadvantages. Finally, each section concludes with a brief summary. It is worth mentioning that the application of deep
learning to SAR image formation is not explicitly treated
here. For SAR focusing, we have to distinguish between
general-purpose focusing and the imaging of objects with a
priori known properties, such as sparsity. General-purpose
algorithms produce data for applications including land use
and land cover classification, glacier monitoring, biomass
estimation, and interferometry. These are complex-valued,
focused data that retain all the information contained in
the raw data.
General-purpose focusing has a well-defined system
model and requires a sequence of fast Fourier transforms
(FFTs) and phasor multiplications, i.e., linear operations,
such as matrix–vector multiplications. For decades, optimal algorithms have been developed to perform these
operations at the highest possible speeds and with diffraction-limited accuracy. There is no reason that deep neural
networks should perform better or faster than this gold
standard. If we want to introduce prior knowledge about
imaged objects, however, specialized focusing algorithms
may be beneficially learned by neural networks. But, even
then, it might make sense to focus raw data first through a
standard algorithm and apply deep learning for postprocessing. In [64], a CNN is trained to focus sparse military
targets. Nevertheless, in this approach, the raw data are partially focused by an FFT before entering the CNN.
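Because general-purpose focusing reduces to FFTs and phasor multiplications, its core step is easy to write down. The sketch below shows only 1D range compression of a single point target by matched filtering in the frequency domain. The chirp parameters, the array sizes, and the function name `range_compress` are assumptions chosen for illustration; azimuth processing and range cell migration are ignored.

```python
import numpy as np


def range_compress(raw, chirp):
    """Matched filtering via FFT: multiply spectra, then inverse FFT."""
    n = len(raw)
    spectrum = np.fft.fft(raw, n)
    # Conjugate reference spectrum: the phasor multiplication of the text.
    reference = np.conj(np.fft.fft(chirp, n))
    return np.fft.ifft(spectrum * reference)


# Hypothetical toy parameters (in samples).
n_samples, pulse_len, delay = 512, 64, 200
t = np.arange(pulse_len)
chirp = np.exp(1j * np.pi * (0.5 / pulse_len) * (t - pulse_len / 2) ** 2)

raw = np.zeros(n_samples, dtype=complex)
raw[delay:delay + pulse_len] = chirp  # echo of a single point target

focused = range_compress(raw, chirp)
peak = int(np.argmax(np.abs(focused)))
print(peak, round(float(np.abs(focused[peak])), 1))
```

Every step is linear and has a closed form, which is why a learned replacement has little to gain here.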
TERRAIN SURFACE CLASSIFICATION
As an important direction for SAR applications, terrain surface classification using PolSAR images is rapidly advancing
with the help of deep learning. Regarding feature extraction, most conventional methods rely on exploring physical scattering properties [65] and texture information [66]
in SAR images. However, these features are mainly human
designed based on specific problems and characteristics of
data sources. Compared to conventional methods, deep
learning is superior in terrain surface classification due to
its capability of automatically learning discriminative features. Moreover, deep learning approaches, such as CNNs,
can effectively extract not only polarimetric characteristics
but also spatial patterns of PolSAR images [6]. Some of the
most notable deep learning techniques for PolSAR image
classification are reviewed in the following.
Xie et al. [67] first applied deep learning to terrain surface classification using PolSAR images. They employed a
stacked autoencoder (SAE) to automatically learn deep features from PolSAR data and then fed these features to a Softmax
classifier. Remarkable improvements in both the classification accuracy and the visual effect proved that this method
could effectively learn a comprehensive feature representation for classification purposes. Instead of simply applying
an SAE, Geng et al. [70] proposed a deep convolutional
autoencoder (DCAE) for automatically extracting features
and performing classification. The first layer of the DCAE
is a handcrafted convolutional layer, where filters are predefined, such as gray-level co-occurrence matrices and Gabor filters. The second layer performs a scale transformation, which integrates correlated neighbor pixels to reduce
speckle. Following these two layers, a trained SAE, which
is similar to [67], is attached for learning more abstract
features. Tested on high-resolution, single-polarization
TerraSAR-X images, the method achieved remarkable classification accuracy.
Based on a DCAE, for SAR image classification, Geng
et al. [68] proposed a framework, called the deep supervised
and contractive neural network (DSCNN), which introduced
a histogram of oriented gradient descriptors. In addition,
a supervised penalty was designed to capture relevant
information between features and labels, and a contractive
restriction, which can enhance the local invariance, was employed in the following trainable autoencoder layers. An example of applying the DSCNN to TerraSAR-X data from a small
area in Norway appears in Figure 2. Compared to other algorithms, the ability of the DSCNN to achieve a highly accurate
and noise-free classification map is observed.
In addition to the aforementioned methods, many studies integrate SAE models with conventional classification
algorithms for terrain surface classification. Hou et al. [73]
proposed an SAE combined with superpixels for PolSAR
image classification. Multiple layers of the SAE are trained
on a pixel-by-pixel basis. Superpixels are formed based
on Pauli-decomposed pseudocolor images. Outputs of
the SAE are used as features in the final step of k-nearest-neighbor superpixel clustering. Zhang et al. [74] applied a
sparse SAE to PolSAR image classification by taking into
account local spatial information. Qin et al. [75] applied
adaptive restricted Boltzmann machine boosting to PolSAR image classification. Zhao et al. [76] proposed a discriminant deep belief network for SAR image classification,
in which discriminant features are gleaned by combining
ensemble learning with a deep belief network in an unsupervised manner. Moreover, taking into account that most
current deep learning methods aim at exploiting features
from PolSAR image polarization information and spatial
information, Gao et al. [72] proposed a dual-branch CNN
to learn features from both perspectives for terrain surface
classification. This method is built on two feature extraction channels: one to extract polarization features from the
six-channel real matrix and the other to extract the spatial
features of a Pauli decomposition. Next, the extracted features are combined using two parallel, fully connected layers, and they are finally fed to a Softmax layer for classification. The detailed architecture of this network is illustrated
in Figure 3.
Different variations of CNNs have been used for terrain surface classification, as well. In [77], Zhou et al. first
extracted a six-channel covariance matrix and then fed it
to a trainable CNN for PolSAR image classification. Wang
et al. [78] proposed a fully convolutional network (FCN) integrated with sparse and low-rank subspace representations
for classifying PolSAR images. Chen et al. [79] improved
CNN performance by incorporating expert knowledge of
target scattering mechanism interpretation and polarimetric feature mining. In a more recent work [80], He et al. proposed the combination of features learned from nonlinear
manifold embedding and applying an FCN to input PolSAR images; the final classification was carried out in an
ensemble approach by an SVM. In [81], the authors focused
on the computational efficiency of deep learning methods,
proposing the use of lightweight 3D CNNs. They showed
that a classification accuracy comparable to other CNN
methods was achievable while significantly reducing the
number of learned parameters and therefore gaining computational efficiency.

FIGURE 2. Classification maps obtained from a TerraSAR-X image of a small area in Norway [68]. (a)–(f) depict the results of classification using (a) an SVM (accuracy = 78.42%), (b) a sparse representation classifier (SRC) (accuracy = 85.61%), (c) a random forest (accuracy = 82.20%) [69], (d) an SAE (accuracy = 87.26%) [67], (e) a DCAE (accuracy = 94.57%) [70], and (f) a contractive autoencoder (accuracy = 88.74%). (g)–(i) show the combination of a DSCNN with (g) an SVM (accuracy = 96.98%), (h) an SRC (accuracy = 92.51%) [71], and (i) a random forest (accuracy = 96.87%). (j) and (k) represent the classification results of (j) a DSCNN (accuracy = 97.09%) and (k) a DSCNN followed by spatial regularization (accuracy = 97.53%), which achieves higher accuracy than the other methods.

Apart from these single-image classification schemes using CNNs, the use of SAR image time series for crop classification has been shown in [40] and [82]. The authors of both
papers experimented with using RNN-based architectures
to exploit the temporal dependency of multitemporal SAR
images to improve classification accuracy. A unique approach for tackling PolSAR classification was recently proposed in [52], where, for the first time, the authors utilized
an AutoML technique to find the optimum CNN architecture for each data set. The approach takes into account
the complex nature of PolSAR images, is cost effective, and
achieves high classification accuracy [52].
Most of the aforementioned methods rely primarily on
preprocessing and transforming raw, complex-valued data
into features in the real domain and then inputting the
data in a common CNN, which constrains the possibility of
directly learning features from raw information. To tackle
this problem, Zhang et al. [83] proposed a novel complex-valued CNN (CV-CNN) specifically designed to process
complex values in PolSAR data, i.e., the off-diagonal elements of a coherency or covariance matrix. The CV-CNN
not only takes complex numbers as inputs but also employs complex weights and complex operations throughout different layers. A complex-valued backpropagation
algorithm was also developed for CV-CNN training. Other
notable complex-valued deep learning approaches for classification using PolSAR images can be found in [84]–[86].
Differing from the previously mentioned works, which exploit the complex-valued nature of SAR images in PolSAR
image classification, Huang et al. [87] recently proposed
a novel deep learning framework called the Deep SAR-Net
for land use classification focusing on feature extraction
from single-polarimetric complex SAR images. The authors
perform a feature fusion based on spatial features learned
from intensity images and time–frequency features extracted from the spectral analysis of complex SAR images. Since
the time–frequency features are highly relevant for distinguishing different backscattering mechanisms within SAR
images, they gain accuracy in classifying man-made objects
compared to the use of typical CNNs, which focus only on
spatial information.
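The idea of complex weights and complex operations mentioned above for the CV-CNN can be illustrated with a single layer. The sketch below is an assumption-laden toy, not the architecture of [83]: it writes a complex matrix-vector product with real-valued arrays only, which is how a complex layer can be built on top of real-valued tooling. The function name `complex_dense`, the sizes, and the random weights are hypothetical; activations and the complex backpropagation rule are not shown.

```python
import numpy as np


def complex_dense(x_re, x_im, w_re, w_im):
    """Complex matrix-vector product written with real-valued operations.

    (W_re + j W_im)(x_re + j x_im)
        = (W_re x_re - W_im x_im) + j (W_re x_im + W_im x_re)
    """
    out_re = w_re @ x_re - w_im @ x_im
    out_im = w_re @ x_im + w_im @ x_re
    return out_re, out_im


rng = np.random.default_rng(0)
x = rng.normal(size=6) + 1j * rng.normal(size=6)            # complex input features
w = rng.normal(size=(4, 6)) + 1j * rng.normal(size=(4, 6))  # hypothetical weights

out_re, out_im = complex_dense(x.real, x.imag, w.real, w.imag)
print(np.allclose(out_re + 1j * out_im, w @ x))
```

Keeping the real and imaginary parts coupled in this way preserves phase relationships that are lost when the data are first converted to real-valued features.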
Although not completely related to terrain surface classification, it is also worth mentioning that the combination
of SAR and PolSAR images with feed-forward neural networks has been extensively used for sea ice classification.
This topic is not treated any further in this section, and the
interested reader is referred to [88]–[92] for more information. Similar to the polarimetric signature, InSAR coherence provides information about physical scattering properties. In [35], interferometric volume decorrelation is used
as a feature for forest/nonforest mapping together with radar backscatter and the incidence angle. The authors used
bistatic TerraSAR-X Add-On for Digital Elevation Measurement
data, where temporal decorrelation can be neglected. They
compared different architectures and concluded that CNNs
outperformed the random forest and that the U-Net [32]
proved best for this segmentation task.
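Features of this kind start from the interferometric coherence. As a hedged illustration, the sketch below implements the standard sample coherence magnitude over a boxcar window; it is not taken from [35], and the window size, the function name `coherence`, and the synthetic test images are assumptions.

```python
import numpy as np


def coherence(s1, s2, win=5):
    """Sample coherence magnitude of two complex images over win x win windows."""

    def boxcar(a):
        windows = np.lib.stride_tricks.sliding_window_view(a, (win, win))
        return windows.mean(axis=(-2, -1))

    numerator = np.abs(boxcar(s1 * np.conj(s2)))
    denominator = np.sqrt(boxcar(np.abs(s1) ** 2) * boxcar(np.abs(s2) ** 2))
    return numerator / np.maximum(denominator, 1e-12)


rng = np.random.default_rng(0)
shape = (32, 32)
master = rng.normal(size=shape) + 1j * rng.normal(size=shape)
slave_same = master * np.exp(1j * 0.7)  # identical up to a constant phase
slave_indep = rng.normal(size=shape) + 1j * rng.normal(size=shape)

coh_high = coherence(master, slave_same)
coh_low = coherence(master, slave_indep)
print(float(coh_high.mean()), float(coh_low.mean()))
```

The output is smaller than the input by `win - 1` pixels per axis because only fully covered windows are evaluated. Note that the estimate for fully decorrelated data is biased above zero for small windows.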
To summarize, it is apparent that deep learning-based
SAR and PolSAR classification algorithms have advanced
considerably in the past few years. Although, at first, the
emphasis was on low-rank representation learning using SAEs [67] and their modifications [70], later research
focused on a multitude of issues relevant to SAR imagery,
such as taking into account speckle [68], [70], preserving
spatial structures [72], and exploiting the complex-valued nature of the data [83]–[85],
[87]. It can also be seen that the labeled data scarcity challenge has driven researchers to use semisupervised learning algorithms [86], although weakly supervised methods
for semantic annotation, which have been proposed for
high-resolution optical data [93], have not been explicitly explored for classification tasks using SAR data. Furthermore, specific metric-learning approaches to enhance
class separability [94] can be adopted for SAR imagery to
improve overall classification accuracy. Finally, one of ML’s
important fields, AutoML, which had not been extensively
exploited by the remote sensing community, has found an
application in PolSAR image classification [52].

FIGURE 3. The architecture of the dual-branch deep CNN (the Dual-CNN) for PolSAR image classification proposed in [72]. FC: fully connected; RGB: red–green–blue.
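Several of the methods in this section, including the Dual-CNN of Figure 3, start from a Pauli decomposition or a coherency matrix. The sketch below shows the standard definitions for a monostatic scattering matrix under reciprocity. The function names and the random test data are assumptions, the RGB channel assignment is one common convention, and the exact construction of the six-channel real input used in [72] and [77] is not reproduced here.

```python
import numpy as np


def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli scattering vector k = [HH + VV, HH - VV, 2 HV] / sqrt(2)."""
    return np.stack([s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv]) / np.sqrt(2.0)


def pauli_rgb(k):
    """Pseudocolor image: R = |HH - VV|, G = |HV| term, B = |HH + VV|."""
    return np.stack([np.abs(k[1]), np.abs(k[2]), np.abs(k[0])], axis=-1)


def coherency_matrix(k):
    """Single-look coherency matrix T = k k^H per pixel, shape (3, 3, H, W)."""
    return np.einsum("i...,j...->ij...", k, np.conj(k))


rng = np.random.default_rng(0)
hh, hv, vv = (
    rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)) for _ in range(3)
)
k = pauli_vector(hh, hv, vv)
rgb = pauli_rgb(k)
T = coherency_matrix(k)
span = np.abs(hh) ** 2 + 2.0 * np.abs(hv) ** 2 + np.abs(vv) ** 2
print(rgb.shape, T.shape, np.allclose(np.trace(T).real, span))
```

In practice, T is averaged over neighboring pixels (multilooking) before its elements are used as features; the single-look version is kept here for brevity.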
OBJECT DETECTION
Although various characteristics distinguish SAR images
from optical red–green–blue (RGB) images, the SAR object detection problem is still analogous to optical image
classification and segmentation in the sense that feature
extraction from raw data is always a prior and crucial step.
Hence, given the success in the optical domain, there is
no doubt that deep learning is one of the most promising
ways to develop state-of-the-art SAR object detection algorithms. The majority of the earlier work related to SAR
object detection using deep learning consists of taking
successful deep learning methods for optical object detection and applying them with minor tweaks to military
vehicle detection [the Moving and Stationary Target Acquisition and Recognition (MSTAR) data set] and ship detection with custom data sets. Even small networks are easily
able to achieve more than 90% test accuracy for most of
these tasks.
The first attempt at military vehicle detection can be
found in [7], where Chen et al. used an unsupervised sparse
autoencoder to generate convolution kernels from random
patches of a given input for a single-layer CNN, which generated features to train a Softmax classifier for categorizing military targets in the MSTAR data set [96]. The experiments in [7] showed great potential for applying CNNs to
SAR target recognition. With this discovery, Chen et al. [97]
proposed A-ConvNets, a simple five-layer CNN that was
able to achieve state-of-the-art accuracy of approximately
99% on MSTAR. Following this trend, more and more authors applied CNNs to MSTAR [37], [98], [99]. Morgan [37]
successfully applied a modestly sized, three-layer CNN to
MSTAR, and, building on that work, Wilmanski et al. [100]
investigated the effects that initialization and optimizer selection had on the final results. Ding et al. [98] investigated
the capabilities of a CNN model combined with domain-specific data augmentation techniques (e.g., pose synthesis
and speckle adding) in SAR object detection. Furthermore,
Du et al. [99] proposed a displacement- and rotation-insensitive CNN and claimed that data augmentation using
training samples is necessary and critical during the preprocessing stage.
On the same data set, instead of treating a CNN as an
end-to-end model, Wagner [101] and, similarly, Gao [102]
integrated a CNN and an SVM by first using a CNN to
extract features and then feeding the features to an SVM
for final prediction. Specifically, Gao et al. [103] added
class separation information to the cross-entropy cost
function as a regularization term, which they showed explicitly facilitated intraclass compactness and interclass separability
and improved the quality of the extracted features. More
recently, Furukawa [104] proposed VersNet, an encoder–
decoder-style segmentation network, to not only identify
but localize multiple objects in an input SAR image. Moreover, Zhang et al. [95] proposed an approach based on multiaspect image sequences as a preprocessing step. They accounted for backscattering signals from different viewing
geometries, followed by feature extraction through Gabor
filters and dimensionality reduction; they eventually fed
the results to a bidirectional LSTM model for the joint recognition of targets. This SAR automatic target recognition framework is presented in Figure 4.
Ship detection is another SAR task. Early studies of applying deep learning models to ship detection [105]–[109]
mainly consisted of two stages: first, cropping patches from
the whole SAR image and then identifying whether cropped
patches belonged to target objects by using a CNN. Because
of fixed patch sizes, these methods were not robust enough
to accommodate variations in ship geometry, such as size and
shape. This problem was overcome by using region-based
CNNs [110], [111], with the creative use of skip connections
and feature fusion techniques in later literature. For example,
Li et al. [112] fused features of the last three convolution layers before feeding them to a region proposal network (RPN).
Kang et al. [113] introduced a contextual region-based network that fused features from different levels. Meanwhile, to
make the most of features at different resolutions, Jiao
et al. [114] densely connected each layer to subsequent ones
and fed features from all the layers to a separate RPN to generate proposals; in the end, the best proposal was chosen
based on an intersection-over-union score.
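For reference, the intersection-over-union score used to rank proposals can be sketched as follows. The box format `(x_min, y_min, x_max, y_max)`, the helper `best_proposal`, and the example boxes are assumptions made for illustration; how exactly [114] applies the score is not detailed here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def best_proposal(proposals, reference):
    """Return the proposal with the highest IoU against a reference box."""
    return max(proposals, key=lambda box: iou(box, reference))


reference = (10, 10, 50, 30)  # hypothetical ship bounding box
proposals = [(0, 0, 20, 20), (12, 8, 48, 32), (30, 15, 80, 40)]
best = best_proposal(proposals, reference)
print(best, round(iou(best, reference), 3))
```

Because the score is normalized by the union, it is insensitive to the absolute size of the ship, which matters when targets vary strongly in scale.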
In more recent works on SAR object detection, scientists have tried to explore many other interesting ideas to
complement current efforts. Dechesne et al. [115] proposed
a multitask network that simultaneously learned to detect,
classify, and estimate the length of ships. Mullissa et al. [84]
showed that CNNs can be trained directly with complex-valued SAR data; Kazemi et al. [117] performed object classification using an RNN-based architecture directly on received SAR signals instead of processed SAR images; and
Rostami et al. [118] and Huang et al. [119] explored knowledge transfers and transfer learning from other domains to
the SAR arena for object detection.

FIGURE 4. A flowchart of the multiaspect-aware bidirectional approach for SAR automatic target recognition proposed in [95]. Stages shown: original images; multiaspect image sequence sample construction; feature detection (Gabor filter, TPLBP); dimensionality reduction; multiaspect feature learning (LSTM); and classification (Softmax). TPLBP: three-patch local binary pattern.

Perhaps one of the more interesting recent works in
this application area relates to building detection, by
Shahzad et al. [120]. The authors tackle the problem of
very-high-resolution (VHR) SAR building detection using
an FCN [121] architecture for feature extraction, followed
by a conditional random fields RNN [122], which helps
give similar weights to neighboring pixels. This architecture produced building segmentation masks with up to
93% accuracy. An example of the detected buildings can
be seen in Figure 5, where Figure 5(a) is the amplitude of
the input TerraSAR-X image of Berlin and Figure 5(b) is
the predicted building mask. Another major contribution
made in that paper addresses the lack of training data by
introducing an automatic annotation technique, which
annotates the SAR tomography data using Open Street
Map (OSM) data.
As an extension of the preceding work, Sun et al. [123]
tackled the problem of individual building segmentation in
large-scale urban areas. They proposed a conditional geographic information system (GIS)-aware network (CG-Net)
that learns multilevel visual features and employs building
footprint data to normalize these features for predicting
building masks. Thanks to the novel network architecture
and the large number of building labels automatically generated from accurate digital elevation model (DEM) and
GIS building footprints, this network achieves an F1 score
of 75.08% for individual building segmentation. With
the predicted building masks, large-scale level-of-detail 1
building models are reconstructed, with a mean height error of 2.39 m.
Overall, deep learning has shown very good performance in existing SAR object detection tasks. There are two
main challenges that the algorithm designer needs to keep
in mind when tackling any SAR object detection task. The
first relates to identifying characteristics of SAR imagery,
such as imaging geometry, the size of objects, and speckle
noise. The second and bigger difficulty concerns the lack of
good quality standardized data sets. As we observed, the
most popular data set, MSTAR, is too easy for deep nets,
and, for ship detection, the majority of authors create their
own data sets, which makes it very hard to judge the quality of
the proposed algorithms and even harder to compare different algorithms. An example of a difficult-to-create data
set can be found in global building detection. The shape,
size, and style of buildings change quite drastically from
region to region, and so a good data set for this purpose requires
training examples taken from buildings from around
the world, a task that requires significant effort to produce
high-quality annotations of enough structures that deep
nets can learn from them.
PARAMETER INVERSION
Parameter inversion from SAR images is a challenging field in
SAR applications. As one important branch, ice concentration estimation is now attracting great attention due to its
importance to ice monitoring and climate research [124].
Since there are complex interactions between SAR signals
and sea ice [125], empirical algorithms face difficulties
with interpreting SAR images for accurate ice concentration estimation. Wang et al. [8] resorted to a CNN for
generating ice concentration maps from dual-polarized
SAR images. Their method takes image patches of intensity-scaled dual-band SAR images as inputs and directly
outputs ice concentrations. In [126] and [127], Wang et al.
employed various CNN models to estimate ice concentrations from SAR images during the melt season. Labels
were produced by ice experts via visual interpretation. The
algorithm was tested on dual-polarization RadarSat-2 data.
Since the problem under consideration concerns the regression of a continuous value, the mean square error is selected as the loss function. Experimental results demonstrate
that CNNs can offer a more accurate result than comparative operational products.
In a different application, Song et al. [130] used a deep
CNN, including five pairs of convolutional and maximum
pooling layers followed by two fully connected layers, for inverting rough surface parameters from SAR images. The network training was based on simulated data, due solely to the
scarcity of real training material. The method was able to invert the desired parameters with a reasonable accuracy, and
the authors showed that training a CNN for parameter inversion purposes could be done quite efficiently. Furthermore,
Zhao et al. [131] designed a CV-CNN to directly learn physical scattering signatures from PolSAR images. The authors
notably proposed a framework to automatically generate
labeled data, which led to a supervised learning algorithm
for the aforementioned parameter inversion. The approach
is similar to the study presented in [132], where the authors
used deep learning for SAR image colorization and for learning a full PolSAR image from single-polarization data. Another interesting application of deep learning in parameter
inversion was recently published in [133]. The authors propose a deep neural network architecture containing a CNN
and a GAN to automatically learn SAR image simulation
parameters from a small number of real SAR images. They
later feed the learned parameters to a SAR simulator, such as
RaySAR [134], to generate a wide variety of simulated SAR
images, which can increase training data production and
improve the interpretation of SAR images that have complex
backscattering scenarios.
On the whole, deep learning-based parameter estimation for SAR applications has not yet been fully exploited.
Unfortunately, most of the remote sensing community’s
focus has been devoted to classical problems, which overlap with computer vision tasks, such as classification, object detection, segmentation, and denoising. One reason
for this might be that, since parameter estimation usually
requires the incorporation of appropriate physical models
and tackles the problem at hand as regression rather than
classification, domain knowledge is quite essential for
applying deep learning for such tasks, especially for SAR
images, with their peculiar physical characteristics. One
interesting study [87], described in detail in the “Terrain
Surface Classification” section, designs discriminative features through the spectral analysis of complex-valued
SAR data and is an important work toward including deep
learning in parameter inversion studies using SAR data. We
hope that, in the future, more studies will be carried out in
this direction.
DESPECKLING
Speckle, which is caused by the coherent interaction among
scattered signals from subresolution objects, often makes
processing and interpreting SAR images difficult. Therefore, despeckling is a crucial procedure before applying SAR
images to various tasks. Conventional methods aim at removing speckle either spatially, where local spatial filters,
such as Lee [135], Kuan [136], and Frost filters [137], are
employed, or by using wavelet-based methods [138]–[140].
For a full overview of these techniques, the reader is referred
to [141]. During the past decade, patch-based methods for
speckle reduction have gained popularity due to their ability
to preserve spatial features while not sacrificing image resolution [142]. Deledalle et al. [143] proposed one of the first
nonlocal patch-based methods applied to speckle reduction
by taking into account the statistical properties of speckle,
combined with the original nonlocal image denoising algorithm introduced in [144]. A vast number of variations of the
nonlocal method for SAR despeckling have been proposed,
with the most notable ones included in [145] and [146].
However, on the one hand, the manual selection of appropriate parameters for conventional algorithms is not easy
and is sensitive to reference images. On the other hand, it
is difficult to achieve a balance between preserving distinct
image features and removing artifacts through empirical
despeckling methods. To overcome these limitations, methods
based on deep learning have been developed.

FIGURE 5. (a) A VHR TerraSAR-X image of Berlin and (b) the predicted building mask [120].
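As a point of reference for the conventional spatial filters mentioned in this section, the sketch below implements a commonly used simplified form of the Lee filter, in which each pixel is pulled toward its local mean with a weight derived from the local coefficient of variation. The specific weight formula, the window size, the number of looks, and the function name `lee_filter` are assumptions for illustration and are not taken from [135]. The two parameters `win` and `looks` are examples of the manually selected settings discussed above.

```python
import numpy as np


def lee_filter(intensity, win=7, looks=1.0):
    """Simplified Lee-type filter for an intensity image with unit-mean speckle."""
    pad = win // 2
    padded = np.pad(intensity, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (win, win))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))

    cu2 = 1.0 / looks  # squared coefficient of variation of the speckle
    ci2 = local_var / np.maximum(local_mean ** 2, 1e-12)
    # Weight near 0 in homogeneous areas (strong smoothing), near 1 at edges.
    weight = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-12), 0.0, 1.0)
    return local_mean + weight * (intensity - local_mean)


rng = np.random.default_rng(0)
clean = np.full((64, 64), 1.0)
noisy = clean * rng.exponential(1.0, size=clean.shape)  # single-look speckle
filtered = lee_filter(noisy, win=7, looks=1.0)
print(float(noisy.var()), float(filtered.var()))
```

A larger window smooths homogeneous areas more strongly but blurs fine structures, which illustrates the balance between speckle removal and feature preservation.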
Inspired by the success of image denoising using a residual
learning network architecture in the computer vision community [147], Chierchia et al. [60] first introduced a residual learning CNN for SAR image despeckling by presenting
a 17-layer CNN for learning to subtract speckle components
from noisy images. Considering that speckle noise is assumed
to be multiplicative, a homomorphic approach with coupled
logarithmic and exponential transformations is performed
before and after feeding images to the network. In this case,
multiplicative speckle noise is transformed into an additive
form and can be recovered by residual learning, where logarithmic speckle noise is regarded as residual. As shown in Figure 6, an input logarithmic noisy image is identically mapped
to a fusion layer via a shortcut connection and then added
elementwise with the learned residual image to produce a
logarithmic clean image. Afterward, denoised images can be
obtained by an exponential transformation.
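The homomorphic processing chain described above can be sketched in a few lines. In this toy version, the trained CNN is replaced by a hypothetical oracle that returns the true log-speckle, so the example only demonstrates the logarithm, residual, and exponent steps; the sign convention (the residual is removed from the log-domain input), the function names, and the gamma-distributed test speckle are assumptions.

```python
import numpy as np


def despeckle_homomorphic(noisy, predict_log_speckle):
    """Log transform, residual removal, and exponential transform."""
    log_noisy = np.log(noisy)                  # multiplicative -> additive noise
    residual = predict_log_speckle(log_noisy)  # stand-in for the trained CNN
    log_clean = log_noisy - residual           # shortcut input minus residual
    return np.exp(log_clean)


rng = np.random.default_rng(0)
clean = rng.uniform(0.5, 2.0, size=(16, 16))
speckle = rng.gamma(shape=4.0, scale=0.25, size=clean.shape)  # four looks, unit mean
noisy = clean * speckle


def oracle(log_image):
    """Hypothetical perfect predictor, used only to check the data flow."""
    return np.log(speckle)


restored = despeckle_homomorphic(noisy, oracle)
print(np.allclose(restored, clean))
```

With a real network, the residual is only an estimate, and any bias it carries in the log domain is amplified by the final exponential.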
Wang et al. [9] proposed a CNN, called the image despeckling CNN (ID-CNN), that can directly
learn denoised images via a componentwise division-residual layer with skip connections. In other words, homomorphic processing is not introduced for transforming multiplicative noise into additive noise, and, at a final stage, the
noisy image is divided by the learned noise to yield a clean
image. As a step forward with respect to the two aforementioned residual-based learning methods, Zhang et al. [148]
employed a dilated residual network (DRN), SAR–DRN,
instead of simply stacking convolutional layers. Unlike [60]
and similar to [9], SAR–DRN is trained in an end-to-end
fashion using a combination of dilated convolutions and
skip connections with a residual learning structure, which
indicates that prior knowledge, such as a noise description
model, is not required in the workflow.
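The benefit of dilated convolutions over simply stacking layers is a faster growth of the receptive field at the same number of weights. The 1D sketch below illustrates this with all-ones kernels; the dilation rates `(1, 2, 4)` and the function names are hypothetical and do not reproduce the SAR–DRN configuration of [148]. As in most deep learning frameworks, the kernel is not flipped.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D convolution with (dilation - 1) skipped samples between taps."""
    span = (len(kernel) - 1) * dilation
    out = np.zeros(len(signal) - span)
    for i, weight in enumerate(kernel):
        out += weight * signal[i * dilation : i * dilation + len(out)]
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated layers with stride 1."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = (1, 2, 4)  # hypothetical rates, chosen for illustration
impulse = np.zeros(65)
impulse[32] = 1.0

response = impulse
for d in dilations:
    response = dilated_conv1d(response, np.ones(3), d)

footprint = int(np.count_nonzero(response))
print(footprint, receptive_field(3, dilations), receptive_field(3, (1, 1, 1)))
```

Three undilated layers of the same kernel size would cover only seven samples, whereas the dilated stack covers 15 with the same number of weights.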
In [149], Yue et al. proposed a novel deep neural network
architecture specifically designed for SAR despeckling. It
used a CNN to extract image features and reconstruct a
discrete radar cross section (RCS) probability density function (PDF). It was trained by a hybrid loss function that
measured the distance between the actual SAR image intensity PDF and the estimated one derived from convolution
between the reconstructed RCS PDF and a prior speckle
PDF. Experimental results demonstrated that the proposed
despeckling neural network could achieve performance
comparable to nonlearning state-of-the-art methods. The
unique distribution of SAR intensity images was also taken
into account in [150]. The authors proposed a different loss
function, which contained three terms between the true
and reconstructed images: the common L2 loss, the L2 difference between the gradient of the two images, and the
Kullback–Leibler divergence between the distribution of
the two images. The three terms are designed to emphasize
spatial details, the identification of strong scatterers, and
speckle statistics, respectively. Experiments in [150] show
improved performance compared to the SAR block-matching 3D (SAR–BM3D) algorithm [128] and SAR–DRN [148].
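A loss with the three terms described for [150] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [150]: the term weights `w_grad` and `w_kl`, the finite-difference gradient, the histogram-based estimate of the Kullback–Leibler divergence, and its direction are all choices made here for the example.

```python
import numpy as np


def kl_divergence(p_img, q_img, bins=32, eps=1e-8):
    """Histogram-based KL divergence between the value distributions of two images."""
    lo = float(min(p_img.min(), q_img.min()))
    hi = float(max(p_img.max(), q_img.max()))
    p, _ = np.histogram(p_img, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_img, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))


def hybrid_loss(true, recon, w_grad=1.0, w_kl=1.0):
    """L2 term + L2 difference of image gradients + KL term (weights assumed)."""
    l2 = np.mean((true - recon) ** 2)
    grad_rows = np.mean((np.diff(true, axis=0) - np.diff(recon, axis=0)) ** 2)
    grad_cols = np.mean((np.diff(true, axis=1) - np.diff(recon, axis=1)) ** 2)
    return float(l2 + w_grad * (grad_rows + grad_cols) + w_kl * kl_divergence(true, recon))


rng = np.random.default_rng(0)
true_img = rng.gamma(2.0, 0.5, size=(32, 32))
recon_img = true_img + 0.1 * rng.normal(size=true_img.shape)

loss_same = hybrid_loss(true_img, true_img)
loss_diff = hybrid_loss(true_img, recon_img)
print(loss_same, loss_diff)
```

A histogram is not differentiable, so a trainable version would need a smooth density estimate; the sketch only shows how the three terms combine.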
In [57], the problem of despeckling was tackled using
a time series of images. Employing a stack of images for
despeckling is not unique to deep learning-based methods,
as recently demonstrated in [151]. In [57], the authors utilized a multilayer perceptron with several hidden layers to
learn nonlinear intensity characteristics of training image
patches. This approach showed promising results and performance comparable to state-of-the-art despeckling
algorithms. Again using single images instead of time series, in [36], the authors proposed a deep encoder–decoder
CNN architecture with a focus on feature preservation,
which is a weakness of CNNs. They modified the U-Net
[32] to accommodate speckle statistical features. Another
notable CNN approach was introduced in [129], where the
authors employed a nonlocal structure, while the weights
for pixelwise similarity measures were assigned using a
CNN. The results of this approach, called CNN–nonlocal
means (NLM), are reported in Figure 7, where the superiority of the method with respect to both feature preservation
and speckle reduction is clearly observed.
FIGURE 6. The CNN architecture for SAR image despeckling [60]. The diagram comprises the following blocks: noisy image, logarithm, CNN (convolution + ReLU, convolution + BN + ReLU, and convolution layers), residual image, subtraction, exponent, and filtered image. BN: batch normalization.

One of the drawbacks of the aforementioned algorithms
is the requirement of noise-free and noisy image pairs for
training. Often, those training data are simulated using optical images with multiplicative noise. This is, of course, not
ideal for real SAR images. Therefore, one elegant solution is
the noise-to-noise framework [152], where the network requires only two noisy images of the same area. The authors
of [152] prove that the network is able to learn a clean representation of the image, given that the noise distributions
of the two noisy images are independent and identical. This
idea was employed in SAR despeckling in [153]. The authors made use of multitemporal SAR images of the same
area as the input to the noise-to-noise network. To mitigate
the effect of the temporal change between the input SAR
image pairs, they multiplied the original loss function by
a patch similarity term.
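The principle behind the noise-to-noise framework can be illustrated with a toy experiment: an L2 loss whose targets are only ever noisy images is minimized, on average, by the clean image, provided the noise is independent and has unit mean. The sketch below is not the method of [152] or [153]. The "network" is replaced by a free per-pixel estimate, the speckle follows an assumed unit-mean gamma model, and the names `speckle` and `fit_against_noisy_targets` are hypothetical. The patch similarity term of [153] is not modeled.

```python
import numpy as np

def speckle(clean, looks, rng):
    """Multiply a clean intensity image by unit-mean gamma speckle (L-look model)."""
    return clean * rng.gamma(shape=looks, scale=1.0 / looks, size=clean.shape)

def fit_against_noisy_targets(clean, looks=1, steps=4000, lr=0.01, seed=0):
    """Minimize an L2 loss whose targets are independent noisy realizations.

    The free per-pixel estimate is a deliberate simplification that stands
    in for a CNN; the clean image is used only to simulate the noisy data.
    """
    rng = np.random.default_rng(seed)
    estimate = speckle(clean, looks, rng)            # start from a noisy image
    for _ in range(steps):
        target = speckle(clean, looks, rng)          # new independent noisy target
        estimate -= lr * 2.0 * (estimate - target)   # gradient of (estimate - target)^2
    return estimate
```

With a constant step size, the estimate fluctuates around the clean image rather than converging exactly; a decaying step size would reduce the residual fluctuation.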
FIGURE 7. A comparison of speckle reduction among SAR–BM3D [128], SAR–CNN [60], and CNN–NLM applied to a small strip of Constellation of Small Satellites for Mediterranean Basin Observation–SkyMed data above Caserta, Italy, where the reference clean image has been obtained by temporal multilooking applied to a stack of SAR images [129]. (a) The clean image. (b) The noisy image. (c) SAR–BM3D is applied. (d) SAR–CNN is applied. (e) CNN–NLM is applied.

From the deep learning-based despeckling methods reviewed in this section, it can be observed that most methods
employ CNN-based architectures with single images of a
scene for training; they either output clean images in an
end-to-end fashion or propose residual-based techniques
to learn underlying noise models. With the availability of
large archives of time series thanks to the Sentinel-1 mission,
an interesting direction is to exploit the temporal correlation of speckle characteristics for despeckling applications.
One critical issue is oversmoothing in despeckling, and it
needs to be addressed. Many of the CNN-based methods
perform well in terms of speckle removal but are not able to
preserve sharp edges. This is quite problematic in despeckling high-resolution SAR images of urban areas, in particular. Another problem in supervised deep learning-based
despeckling techniques concerns the lack of ground truth
data. In many studies, the training data set is built by corrupting optical images through multiplicative noise. This is
far from realistic for despeckling applied to real SAR data.
Therefore, despeckling in an unsupervised manner would
be highly desirable and worth attention.
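The simulated training pairs mentioned above can be sketched as follows. The code corrupts a clean intensity image with multiplicative noise and checks the result through the equivalent number of looks (ENL), i.e., the squared mean divided by the variance over a homogeneous area. The unit-mean gamma speckle model, the function names, and the default number of looks are assumptions for illustration; they also show why such pairs are idealized, since the simulation ignores the scattering behavior of real SAR scenes.

```python
import numpy as np

def simulate_speckled_pair(clean_intensity, looks=4, seed=0):
    """Return a (noisy, clean) training pair under a fully developed speckle model."""
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean_intensity, dtype=float)
    # Unit-mean gamma noise with shape L: the usual L-look intensity speckle model.
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=clean.shape)
    return clean * noise, clean

def equivalent_number_of_looks(homogeneous_region):
    """ENL = mean^2 / variance, evaluated over a homogeneous image region."""
    region = np.asarray(homogeneous_region, dtype=float)
    return region.mean() ** 2 / region.var()
```

On a flat synthetic image, the estimated ENL should be close to the number of looks used in the simulation.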
InSAR
InSAR is one of the most important SAR techniques, and
it is widely used in reconstructing the topography of the
Earth’s surface, i.e., DEM generation [65], [154], [155], and
detecting topographical displacements, e.g., monitoring
volcanic eruptions [156]–[158], earthquakes [159], [160],
land subsidence [161], and urban areas by using time-series
methods [162]–[164]. The principle of InSAR is to first measure the interferometric phase between signals received by
two antennas located at different positions and then extract
topographic information from the obtained interferogram
by unwrapping and converting the absolute phase to height.
However, an actual interferogram often suffers from a large
number of singular points, which originate from the interference distortion and noise in radar measurements. These
points result in unwrapping errors and, consequently, low-quality DEMs.
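As an illustration of the quantities involved, the sketch below forms a wrapped interferometric phase from two coregistered complex images and locates singular points (often called residues) by summing wrapped phase differences around every 2 × 2 pixel loop. A nonzero sum marks a point where phase unwrapping becomes path dependent. The function names are hypothetical, and the residue test is the standard loop-integral check rather than anything specific to the works cited here.

```python
import numpy as np

def wrap(phase):
    """Wrap phase values into [-pi, pi)."""
    return (phase + np.pi) % (2.0 * np.pi) - np.pi

def interferometric_phase(slc_1, slc_2):
    """Wrapped interferometric phase of two coregistered complex images."""
    return np.angle(slc_1 * np.conj(slc_2))

def residue_map(wrapped):
    """Charge (+1, 0, or -1) of each 2 x 2 pixel loop in a wrapped phase image."""
    d_top = wrap(wrapped[:-1, 1:] - wrapped[:-1, :-1])    # along the top edge
    d_right = wrap(wrapped[1:, 1:] - wrapped[:-1, 1:])    # down the right edge
    d_bottom = wrap(wrapped[1:, :-1] - wrapped[1:, 1:])   # back along the bottom edge
    d_left = wrap(wrapped[:-1, :-1] - wrapped[1:, :-1])   # up the left edge
    loop_sum = d_top + d_right + d_bottom + d_left
    return np.rint(loop_sum / (2.0 * np.pi)).astype(int)
```

A smooth phase ramp produces no residues, whereas a phase vortex produces exactly one.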
To tackle this problem, Ichikawa and Hirose [165] applied a complex-valued neural network (CV-NN) in the
spectral domain to restore singular points. With the help of
the complex Markov random field filter [166], they aimed
at learning ideal relationships between the spectrum of
neighboring pixels and that of the center pixels via a one-hidden-layer CV-NN. Notably, the center pixels of each
training sample are supposed to be ideal points, which indicates that singular points are not fed to the network during
the training procedure. Similarly, Oyama and Hirose [167]
restored singular points with a CV-NN in the spectral
domain. Related to topography extraction, Costante et al.
[169] proposed a full CNN encoder–decoder architecture
for estimating DEMs from single-pass image acquisitions.
They demonstrated that this model was capable of extracting high-level features from input radar images using an
encoder section and then reconstructing full-resolution
DEMs via a decoder section. Moreover, the network can potentially solve the layover phenomenon in one single-look
SAR image that has contextual features.
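For readers unfamiliar with complex-valued networks, the forward pass of a one-hidden-layer CV-NN can be sketched in a few lines. This is a generic illustration, not the architecture of [165] or [167]: the amplitude-phase activation (a tanh applied to the magnitude while the phase is kept), the absence of bias terms, and the names `amplitude_phase_activation` and `cvnn_forward` are assumptions.

```python
import numpy as np

def amplitude_phase_activation(z):
    """Squash the magnitude with tanh and keep the phase unchanged."""
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

def cvnn_forward(x, w_hidden, w_out):
    """Forward pass of a one-hidden-layer complex-valued network (no biases).

    x: complex input vector, e.g., spectral values of neighboring pixels.
    w_hidden, w_out: complex weight matrices.
    """
    hidden = amplitude_phase_activation(w_hidden @ x)
    return amplitude_phase_activation(w_out @ hidden)
```

Because both the weights and the activation respect phase, rotating the input by a global phase rotates the output by the same phase, which is one reason such networks suit interferometric data.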
In addition to reconstructing DEMs, Schwegmann et al.
[170] presented a CNN-based technique to detect subsidence deformations from interferograms. They employed
a nine-layer network to extract salient information from
interferograms and displacement maps for discriminating deformation targets from deformation-like targets.
Furthermore, Anantrasirichai et al. [10], [171], [172] used
a pretrained CNN to automatically detect volcanic ground
deformation through InSAR images. They divided each
image into patches, relabeled them with binary labels, i.e.,
“background” and “volcano,” and finally fed them to the network to predict volcano deformation. In [173], they further
improved their method to be able to detect slowly deforming
volcanoes using a time series of interferograms. In another
study related to automatic volcanic deformation detection, Valade et al. [168] designed and trained a CNN from scratch to
learn a decorrelation mask from input wrapped interferograms; the CNN then was used to detect volcanic ground
deformation. A flowchart of this approach appears in Figure 8. The training in both [168] and [173] was based on
simulated data.
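The score computation in the workflow of Figure 8 is simple enough to sketch directly: the unwrapped phase W is scaled to a deformation map Wm = W · λ/(4π), and the deformation score DEF is the standard deviation of Wm, with an alert raised when DEF exceeds 0.001. In the code below, the function names are hypothetical, the wavelength of 0.0555 m in the usage is an assumed C-band value, and the unit of the alert threshold is taken to be that of the deformation map, which the figure does not state.

```python
import numpy as np

def deformation_score(unwrapped_phase, wavelength_m):
    """Scale unwrapped phase (rad) to a deformation map and return its standard deviation."""
    phase = np.asarray(unwrapped_phase, dtype=float)
    deformation_map = phase * wavelength_m / (4.0 * np.pi)   # Wm = W * lambda / (4 * pi)
    return deformation_map, float(np.std(deformation_map))   # DEF = std-dev(Wm)

def needs_alert(score, threshold=0.001):
    """Alert rule from Figure 8: DEF > 0.001 (unit assumed equal to that of Wm)."""
    return score > threshold
```

One full fringe (a phase change of 2π) corresponds to a line-of-sight displacement of half a wavelength, so even a single fringe across the scene exceeds the threshold under these assumptions.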
Another geophysics-motivated example of using deep
learning on InSAR data, which was actually proposed earlier than the previously mentioned CNN-based studies,
can be found in [174]–[176], where the authors used simple
feed-forward shallow neural networks for seismic event
characterization and automatic seismic source parameter
inversion by exploiting the power of neural networks in
solving nonlinear problems. Recently, deep learning has
been utilized for tomographic processing, as well. An unfolded deep network that involves vector-approximate
message-passing algorithms was proposed in [177]. Experiments with simulated and real data were performed, showing the spectral estimation gains and achieving competitive
performance. In [178], a real-valued deep neural network
was applied for multiple-input, multiple-output SAR 3D
imaging. It displayed a better superresolution power compared with other compressive sensing-based methods.
In summary, it can be concluded that the use of deep
learning methods in InSAR is still at a very early stage. Although deep learning has been incorporated in different
applications combined with InSAR, the potential of
interferograms has not been fully exploited, except in the
pioneering work of Hirose [179]. Many applications treat
interferograms and deformation maps obtained from interferograms as images similar to RGB and gray-scale ones,
and therefore the complex nature of interferograms has remained unnoticed. Apart from this issue, as with deep
learning-based SAR despeckling, the lack of
ground truth data for detection and image restoration problems provides motivation to focus on developing semisupervised and unsupervised algorithms that combine deep
learning and InSAR. Alternatively, a training database consisting of interferograms for different scenarios and different phase contributions could be beneficial for supervised
learning applications. Simulation-based interferogram generation for the latter was recently proposed in [180].
SAR–OPTICAL DATA FUSION
The fusion of SAR and optical images can provide complementary information about targets. However, considering
the two different sensing modalities, the prior identification and the coregistration of corresponding images are
challenging [181] but compulsory. For the purpose of identifying and matching SAR and optical images, many current methods resort to deep learning, given its powerful
capabilities of extracting effective features from complex
images. In [58], the authors proposed a CNN for identifying corresponding image patches of VHR optical and SAR
imagery of complex urban scenes. Their network consists
of two streams: one designed for extracting features from
optical images and one responsible for learning features
from SAR images. Next, the extracted features are fused
via a concatenation layer for further binary prediction of
their correspondence. A selection of true positives, false
positives, false negatives, and true negatives of SAR–optical
image patches from [58] is presented in Figure 9. Similarly,
Hughes et al. [11] proposed a pseudo-Siamese CNN for
learning a multisensor correspondence predictor for SAR
and optical image patches. Notably, the networks in [11]
and [58] are trained and validated on the SARptical data set
[182], [183], which is specifically built for the joint analysis
of VHR SAR and optical images in dense urban areas.

FIGURE 8. The workflow of the volcano deformation (DEF) detection proposed in [168]. The CNN is trained on simulated data and later used to estimate phase gradients and a decorrelation mask from the input wrapped interferograms to locate ground deformation caused by volcanoes. (a) The CNN training: synthetic wrapped interferograms are the input, and synthetic phase gradients (in x and y) and a synthetic decorrelation mask are the desired outputs. (b) The phase gradient detection: the trained CNN estimates the phase gradients and the decorrelation mask from real wrapped interferograms. (c) The phase unwrapping and score computation: the estimated unwrapped phase W is converted to a deformation map Wm = W · λ/(4π), and the deformation score is DEF = std-dev(Wm). (d) The dissemination: time series and deformation maps are published on a public website, and an e-mail alert is sent to a private list if DEF > 0.001.

In [184], the authors proposed a deep learning framework that can obtain an end-to-end mapping between image
patch pairs and their matching labels. An image pair is first
transformed into two 1D vectors and then concatenated to
build a large 1D vector as the network input. Then, hidden
layers are stacked for learning the mapping between input
vectors and output binary labels, which indicate their correspondence. For the purpose of matching SAR and optical images, Merkle et al. [185] presented a CNN that consists of a
feature extraction stage (a Siamese network) and a similarity
measure stage (a dot product layer). Specifically, features of
input optical and SAR images are extracted via two separate
nine-layer branches and then fed to a dot product layer for
predicting the shift of the optical image within the large SAR
reference patch. Experimental results indicate that this deep
learning-based method outperforms state-of-the-art matching approaches [186], [187]. Furthermore, Abulkhanov et al.
[188] successfully trained a neural network to build feature
point descriptors to identify corresponding patches among
SAR and optical images and match the detected descriptors
using the random sample consensus algorithm [189]. In
contrast to training a model to identify corresponding image patches, Merkle et al. [190] first employed a conditional
GAN (cGAN) to generate artificial SAR-like images from optical images and then matched them with real SAR images.
The authors demonstrated that the matching accuracy and
precision improved through the proposed strategy. Inspired
by that study, more researchers resorted to using GANs for
the purpose of SAR–optical image matching (see [191] and
[192] for a review).
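The role of a dot product layer in predicting the shift of a small patch within a larger reference patch can be sketched without any learning. In the code below, plain arrays stand in for the feature maps that the two network branches would produce, and every placement of the small patch is scored by a dot product with the underlying window. The mean removal and normalization (which turn the score into a normalized cross-correlation) and the function name `shift_by_dot_product` are assumptions added for robustness, not details taken from [185].

```python
import numpy as np

def shift_by_dot_product(reference_features, template_features):
    """Score every placement of a small feature patch inside a larger one.

    Returns the (row, col) offset with the highest score and the score map.
    """
    ref = np.asarray(reference_features, dtype=float)
    tpl = np.asarray(template_features, dtype=float)
    tpl = tpl - tpl.mean()
    tpl = tpl / (np.linalg.norm(tpl) + 1e-12)

    rows = ref.shape[0] - tpl.shape[0] + 1
    cols = ref.shape[1] - tpl.shape[1] + 1
    scores = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            window = ref[r:r + tpl.shape[0], c:c + tpl.shape[1]]
            window = window - window.mean()
            # Dot product between the normalized template and the window.
            scores[r, c] = np.sum(window * tpl) / (np.linalg.norm(window) + 1e-12)

    best = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(best[0]), int(best[1])), scores
```

In a learned matcher, the quality of the score map depends on how well the two branches map SAR and optical patches into a common feature space; the search over shifts itself stays this simple.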
With respect to applications of SAR and optical image
matching, Yao et al. [193] aimed at applying SAR and optical
images to semantic segmentation with deep neural networks.
They collected optical patches from Google
Earth that corresponded to TerraSAR-X patches and built ground
truths using data from OSM. Then, SAR and optical images
were separately fed to different CNNs to predict semantic labels (buildings, natural areas, land use, and water). Despite
the fact that their experimental results did not outperform the
state of the art [194], likely because of the network design or
the training strategy, they deduced that introducing advanced
models and simultaneously using both data sources can greatly
improve the performance of semantic segmentation. Another
application, mentioned in [195], demonstrated that standard
fusion techniques for SAR and optical images require data
from both sources, which indicates that it is still not easy to
interpret SAR images without the support of optical ones. To
address this issue, Schmitt et al. [195] proposed an automatic
colorization network composed of a VAE and a mixture density
network [196] to predict artificially colored SAR images (i.e.,
Sentinel-1 images). These images proved to disclose more information to human interpreters than the original SAR data did.
In [42], the authors tackled the problem of cloud removal from optical imagery. They introduced a cGAN architecture to fuse SAR and cloud-corrupted multispectral data for
generating cloud- and haze-free multispectral optical information. Experiments proved the effectiveness of the proposed network for removing clouds from multispectral data
with auxiliary SAR data. Extending previous multimodal
networks for cloud removal, the authors of [43] proposed a
cycle-consistent GAN architecture [197] that utilizes an image forward–backward translation consistency loss. Cloud-covered optical information is reconstructed via SAR data
fusion, while changes to cloud-free areas are minimized
through the cycle consistency loss. The cycle-consistent architecture facilitates training without pixelwise correspondences between cloudy input and cloud-free target optical
imagery, relaxing requirements for the training data set.
In summary, it can be seen that the utilization of deep
learning methods for SAR–optical data fusion has been
a hot topic in the remote sensing community. Although
a handful of data sets consisting of optical and SAR corresponding image patches is available for different terrain
types and applications, one of the biggest problems remains
the scarcity of high-quality training data. Semisupervised
methods, as proposed in [198], seem to be a viable option
to tackle the issue. A great challenge in SAR–optical image
matching concerns the extreme difference between the two
sensors’ viewing geometries. For this, it is important to exploit auxiliary 3D data to assist the training data generation.

FIGURE 9. Randomly selected patches obtained from the testing phase of the network for SAR–optical image patch correspondence detection proposed in [11]. (a) True positives. (b) False positives. (c) False negatives. (d) True negatives.

EXISTING BENCHMARK DATA SETS
AND THEIR LIMITATIONS
To train and evaluate deep learning models, large data sets
are indispensable. Unlike RGB images in the computer vision community, which can be easily collected and interpreted, SAR images are much more difficult to annotate due to
their complex properties. Our research shows that big SAR
data sets created for the primary purpose of deep learning
investigations are nearly nonexistent in the community. In
recent years, only a few SAR data sets have been made public
for training and assessing deep learning models. In the following, we categorize those data sets according to their best-suited deep learning problem and focus on openly accessible
and well-curated large data sets (see Table 1 for summaries
of the open SAR data sets). In particular, we consider the following categories of deep learning problems in SAR:
◗◗ Image classification: Each pixel or patch in one image is
classified into a single label. This is often the case in typical land use/land cover classification problems.
◗◗ Scene classification: Similar to image classification, one
image or patch is classified into a single label. However,
one scene is usually much larger than an image patch.
Hence, a different network architecture is required.
◗◗ Semantic segmentation: One image or patch is segmented to
a classification map of the same dimension. Training such
neural networks also requires densely annotated data.
◗◗ Object detection: This is much like scene classification.
However, detection often requires the estimation of the
object location.
◗◗ Registration/matching: This provides binary classification
(matched and unmatched) and estimates the translation
between two image patches. Such tasks require that pairs
of two different image patches be matched as training data.

TABLE 1. AVAILABLE OPEN SAR DATA SETS.
NAME | DESCRIPTION | SUITABLE TASKS | RELATED WORK
So2Sat LCZ42¹ [200], TensorFlow² | This data set contains 400,673 pairs of corresponding Sentinel-1 dual-polarity image patches, Sentinel-2 multispectral image patches, and manually labeled LCZ classes across 42 urban agglomerations (plus 10 additional smaller areas) around the globe. It is the first Earth observation data set that provides a quantitative measure of the label uncertainty, achieved by having a group of domain experts cast 10 independent votes for 19 cities in the data set. | Image classification; data fusion; quantification of uncertainties | [201]
OpenSARUrban³ [199] | This data set includes 33,358 Sentinel-1 dual-polarity image patches covering 21 major cities in China, labeled with 10 classes of urban scenes. | Image classification |
SEN12MS⁴ [202] | In this data set, there are 180,662 corresponding image triplets containing Sentinel-1 dual-polarity SAR data, Sentinel-2 multispectral imagery, and MODIS-derived land cover maps, covering all inhabited continents during all meteorological seasons. | Image classification; semantic segmentation; data fusion | [203]
MSAW⁵ [204] | This data set contains quad-polarity X-band SAR imagery from Capella Space, with a 0.5-m spatial resolution, which covers 120 km² in the area of Rotterdam. A total of 48,000 unique building footprints are labeled with associated height information curated from the 3D Basisregistratie Adressen en Gebouwen data set. | Semantic segmentation |
PolSAR Image Data Set on San Francisco⁶, Label⁷ [205] | The data set includes PolSAR images of San Francisco from five different sensors. Each image was densely labeled to five or six classes, such as mountain, water, high-density urban, low-density urban, vegetation, developed, and bare soil. | Image classification; semantic segmentation; data fusion | [206]
MSTAR⁸ [207] | This data set contains 17,658 X-band VHR SAR image chips (patches) of 10 classes of different vehicles plus one class of a simple geometric-shaped target. SAR images of pure clutter are also included. | Object detection; scene classification | [97], [98], [208]
OpenSARShip 2.0⁹ [209] | This data set includes 34,528 Sentinel-1 SAR image chips of ships, with ship geometric information, types, and corresponding AIS information. | Object detection; scene classification | [210]
SAR-Ship data set¹⁰ [211] | Here, there are 43,819 Gaofen-3 and Sentinel-1 image chips of different ships. Each image chip has a dimension of 256 × 256 pixels in range and azimuth. | Object detection; scene classification |
SARptical¹¹ [212] | The SARptical data set includes 10,108 coregistered pairs of TerraSAR-X VHR spotlight image patches and UltraCam aerial RGB image patches for Berlin. The coregistration is defined by the matching of the 3D position of the center of the image pair. | Image matching | [11], [183]
SEN1-2¹² [203] | This data set contains 282,384 pairs of corresponding Sentinel-1 single-polarization-intensity and Sentinel-2 RGB image patches collected across the globe. The patches are 256 × 256 pixels. | Image matching; data fusion | [202]

¹ https://doi.org/10.14459/2018mp1483140.
² https://www.tensorflow.org/datasets/catalog/so2sat.
³ https://doi.org/10.21227/3sz0-dp26.
⁴ https://mediatum.ub.tum.de/1474000.
⁵ https://spacenet.ai/sn6-challenge/.
⁶ https://www.ietr.fr/polsarpro-bio/san-francisco/.
⁷ https://github.com/liuxuvip/PolSF.
⁸ https://www.sdms.afrl.af.mil/index.php?collection=mstar.
⁹ http://opensar.sjtu.edu.cn/Data/Search?key=OpenSARShip.
¹⁰ https://github.com/CAESAR-Radi/SAR-Ship-Dataset.
¹¹ https://syncandshare.lrz.de/getlink/figixjRV9idETzPgG689dGB/SARptical_data.zip.
¹² https://mediatum.ub.tum.de/1436631.

IMAGE/SCENE CLASSIFICATION
So2Sat LCZ42
So2Sat LCZ42 [200] follows the local climate zones (LCZs)
classification scheme. The data set consists of 400,673
pairs of dual-polarity Sentinel-1 and multispectral Sentinel-2 image patches from 42 urban agglomerations, plus
10 additional smaller areas, across five continents. The image patches are hand labeled into one of the 17 LCZ classes
[213]. The Sentinel-1 image patches contain a geocoded,
single-look complex image as well as a despeckled Lee-filtered variant. In particular, it is the first Earth observation
data set that provides a quantitative measure of the label
uncertainty, achieved by letting a group of domain experts
cast 10 independent votes covering 19 cities. It therefore
can be considered a large-scale data fusion and classification benchmark data set for cutting-edge ML methodological developments, such as automatic topology learning,
data fusion, and the quantification of uncertainties.
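One simple way to turn a set of independent expert votes into a quantitative uncertainty measure is the entropy of the vote distribution. The sketch below is an illustration under stated assumptions, not the procedure used to build So2Sat LCZ42: the function name `label_vote_summary` is hypothetical, classes are encoded as integers 0 to 16, and the entropy is measured in bits.

```python
import numpy as np

def label_vote_summary(votes, num_classes=17):
    """Summarize independent votes for one patch.

    Returns the majority class, the empirical class probabilities, and the
    entropy (in bits) of the vote distribution; 0 means full agreement.
    """
    votes = np.asarray(votes, dtype=int)
    counts = np.bincount(votes, minlength=num_classes)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    entropy = float(-np.sum(nonzero * np.log2(nonzero)))
    return int(np.argmax(counts)), probs, entropy
```

Ten unanimous votes give an entropy of zero, whereas an even split between two classes gives one bit. The empirical probabilities can also serve as soft labels when training a classifier.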
OpenSARUrban
OpenSARUrban [199] consists of 33,358 patches of Sentinel-1
dual-polarity images covering 21 major cities in China. The
data set was manually annotated according to a hierarchical
classification scheme, with 10 classes of urban scenes at its finest level. Each image patch has a dimension of 100 × 100 pixels,
with a pixel spacing of 10 m [the Sentinel-1 ground-range-detected (GRD) product]. This data set can support deep learning
studies of urban target characterization and content-based SAR
image queries. Figure 10 shows samples.
SEMANTIC SEGMENTATION/CLASSIFICATION
SEN12MS
SEN12MS [202] was created based on its previous version SEN1-2 [203]. It consists of 180,662 triplets of dual-polarity Sentinel-1 image patches, multispectral Sentinel-2
image patches, and Moderate Resolution Imaging Spectroradiometer (MODIS) land cover maps. The patches are
georeferenced, with a ground sampling distance of 10 m.
Each image patch has a dimension of 256 × 256 pixels. We
expect this data set to support the community in developing sophisticated deep learning-based approaches for
common tasks, such as scene classification and semantic
segmentation for land cover mapping.
MULTISENSOR ALL-WEATHER MAPPING
The Multisensor All-Weather Mapping (MSAW) [204] data
set includes high-resolution SAR data, which covers 120 km²
in the area of Rotterdam, The Netherlands. The quad-polarized X-band SAR imagery from Capella Space, with a 0.5-m
spatial resolution, was used for the SpaceNet 6 Challenge. In
total, 48,000 unique building footprints have been labeled
with additional building heights.
PolSAR IMAGE DATA SET ON SAN FRANCISCO
This data set [205] consists of PolSAR images of San Francisco from eight different sensors, including Airborne SAR,
Advanced Land Observing Satellite (ALOS)-1, ALOS-2, RadarSat-2, Sentinel-1A, Sentinel-1B, Gaofen-3, and Radar Imaging Satellite (data compiled by E. Pottier of the Institute of
Electronics and Telecommunications of Rennes). Five of the
eight images were densely labeled to five or six land use/land
cover classes in [205]. These densely annotated images correspond to roughly 3,000 training patches of 128 × 128 pixels.
Although the data volume is relatively low for deep learning
research, this is the only annotated multisensory PolSAR data
set, to the best of our knowledge. Therefore, we suggest that
its creator increase the number of annotated images to enable
its greater potential use.
OBJECT DETECTION
MSTAR
MSTAR [207] is one of the earliest data sets for SAR target
recognition. It consists of 17,658 X-band SAR image chips
(patches) of 10 classes of vehicles plus one class of simple
geometric-shaped targets. The collected SAR image patches
are 128 × 128 pixels, with a resolution of 1 ft in the range
and azimuth. In addition, 100 SAR images of clutter are provided. In our opinion, the number of image patches is relatively low for deep learning models, especially considering
the number of classes. In addition, this data set represents a
rather ideal and unrealistic scenario: vehicles are centered
in the patch, and the clutter is quite homogeneous, without
disturbing signals. However, considering the scarcity of such
data sets, MSTAR is a valuable source for target recognition.
OpenSARShip 2.0
This data set [209] is based on its previous version, OpenSARShip [210]. It contains 34,528 Sentinel-1 SAR image
patches of different ships, with automatic identification
system (AIS) information. For each SAR image patch, the
creators manually extracted the ship length, width, and
direction as well as the vessel type by verifying the data
on the Marine Traffic website [209]. Roughly one-third of
the patches are extracted from Sentinel-1 GRD products,
and the other two-thirds are from Sentinel-1 single-look
complex products. OpenSARShip 2.0 is one of the handful of SAR data sets suitable for object detection.
SAR-SHIP-DATA SET
This data set [211] was created using 102 Gaofen-3 and 108 Sentinel-1 images. It consists of 43,819 ship chips of 256 pixels in both
the range and azimuth. The ships mainly have distinct
scales and backgrounds. Therefore, this data set can be employed for developing multiscale object detection models.
FUSAR–SHIP
The FUSAR–Ship data set [214] was created using space–time
matched-up data sets of Gaofen-3 SAR images and ship AIS
messages. It consists of more than 5,000 ship chips with corresponding vessel information extracted from AIS messages,
which can be used to trace back to each unique ship of any
particular chip.
AIR–SARShip 1.0/2.0
The AIR–SARShip data set [215] has 31 (version 1.0) or 300 (version 2.0) SAR images
from the Gaofen-3 satellite, including 1- and 3-m-resolution
imagery with different imaging modes, such as spotlight and
stripmap. There are more than 10 object categories, including
ships, tankers, fishing boats, and others. The scene types include ports, islands, reefs, and sea surfaces of different levels.
REGISTRATION/MATCHING
SARptical
The SARptical data set [183], [212] was designed for interpreting VHR spaceborne SAR images of dense urban areas.
It consists of 10,108 pairs of corresponding VHR SAR and
optical image patches whose locations are precisely coregistered in 3D. The patches are extracted from TerraSAR-X VHR
spotlight images with a resolution better than 1 m and from
UltraCam aerial optical images with a 20-cm pixel spacing,
respectively. Unlike low- and medium-resolution images,
high-resolution SAR and optical images in dense urban areas have very distinct geometries. Therefore, in the SARptical
data set, the center points of each image pair are matched in
3D space via sophisticated 3D reconstruction and matching
algorithms. The universal transverse Mercator coordinates
of the center pixel of each pair are also made publicly available. This data set contributes to applications of multimodal
data classification and SAR–optical image coregistration.
However, we believe more training samples are required for
learning complicated SAR–optical image-to-image mapping.

FIGURE 10. Samples of the OpenSARUrban data set [199]. Six classes are shown from top to bottom: dense and low-rise residential buildings, a general residential area, high-rise buildings, villas, an industrial storage area, and vegetation.

SEN1-2
The SEN1-2 data set [203] includes 282,384 pairs of corresponding Sentinel-1 single-polarization-intensity and Sentinel-2 RGB image patches collected from across the globe
and in all meteorological seasons. The patches are 256 ×
256 pixels. Their distribution through the four seasons is
roughly even. SEN1-2 is the first large open data set of this
kind. We believe it will support further developments in
the field of deep learning for remote sensing as well as multisensor data fusion, such as SAR image colorization and
SAR–optical image matching.

OTHER DATA SETS
SAMPLE PolSAR IMAGES FROM
THE EUROPEAN SPACE AGENCY
These data sets (https://earth.esa.int/web/polsarpro/data-sources/sample-datasets) include, for example, the Flevoland PolSAR data set, which several works use for agricultural land use/land cover classification. The authors of
[216]–[218] manually labeled it according to different classification schemes.
SAR IMAGE LAND COVER
This data set [219] is not publicly available. Readers should
contact the creator.
AIRBUS SHIP DETECTION CHALLENGE
This data set can be accessed at https://www.kaggle.com/c/airbus-ship-detection.
CONCLUSION AND FUTURE TRENDS
This article reviewed the state of the art of an important
and underexploited research field: deep learning in SAR.
Relevant deep learning models were introduced, and their
applications in six application fields—terrain surface classification, object detection, parameter inversion, despeckling, InSAR, and SAR–optical data fusion—were analyzed
in depth. Existing benchmark data sets and their limitations
were discussed. In summary, despite early successes, the full
exploitation of deep learning in SAR is mostly limited by 1)
the lack of large and representative benchmark data sets and
2) the lack of tailored deep learning models that take
full account of SAR signal characteristics.
Looking forward, the years ahead will be exciting. Next-generation spaceborne SAR missions will simultaneously
provide high-resolution and global coverage, which will
enable novel applications, such as monitoring the dynamic
Earth. To retrieve geoparameters from these data, the development of new analytics methods is warranted. Deep
learning is among the most promising methods. To fully
unlock its potential in SAR/InSAR applications in this big
SAR data era, there are several promising future directions,
including the following:
◗◗ Large and representative benchmark data sets: As summarized in this article, there is only a handful of SAR
benchmarks, in particular, when multimodal ones are
excluded. For instance, in SAR target detection, methods are mainly tested on a single benchmark data set,
MSTAR, where only several thousand target samples
(several hundred for each class) are provided for training.
With respect to InSAR, due to the lack of ground truth,
data sets are extremely deficient or nearly nonexistent.
Large and representative expert-annotated benchmark
data sets are in high demand in the SAR community and
deserve more attention.
◗◗ Unsupervised deep learning: To bypass the deficiencies in
annotated data in SAR, unsupervised deep learning is
a promising direction. These algorithms derive insights
directly from the data themselves and work as feature
learning, representation learning, and clustering, which
could be further used for data-driven analytics. Autoencoders and their extensions, such as VAEs and deep embedded clustering algorithms, are popular choices. With
respect to denoising, in despeckling, the high complexity of SAR images and the lack of ground truth make it
infeasible to produce appropriate benchmarks from real
data. Noise to noise [152] is an elegant example of unsupervised denoising, where the authors of [152] learn denoised data without clean data. Despite the nice visual
appearance of the results, preserving details is a must for
SAR applications.
◗◗ Interferometric data processing: Since deep learning methods were initially applied to perception tasks in computer vision, many methods resort to transforming SAR
images, e.g., PolSAR images, into RGB-like images in advance, or they focus only on intensities. In other words,
the most essential component of an SAR measurement—
the phase information—is not appropriately considered.
Although CV-CNNs are capable of learning phase information and show great potential in processing CV-SAR
images, only a few such attempts have been made [83].
Extending CNNs to the complex domain, while preserving precious phase information, would enable networks
to directly learn features from raw data and would open
up a wide range of SAR/InSAR applications.
◗◗ Quantification of uncertainties: Generally speaking, geoparameter estimates without uncertainty measures are considered invalid in remote sensing. Appropriately trained
deep learning models can achieve highly accurate predictions, yet they fail to quantify the uncertainty of these
predictions. Here, giving a statement about the predictive
uncertainty, while considering both aleatoric uncertainty and epistemic uncertainty, is of crucial importance.
The Bayesian deep learning community has developed
a model-agnostic and easy-to-implement methodology
to estimate both the data uncertainty and model uncertainty within deep learning models [54], which is awaiting exploration by the SAR community.
◗◗ Large-scale nonlinear optimization problems: The development of inversion algorithms should keep pace
with data growth. Fast solvers are needed for many
advanced parameter inversion models, which often involve nonconvex, nonlinear, and complex-valued optimization problems, such as compressive sensing-based
tomographic inversion and low-rank complex tensor decomposition for InSAR time series data analysis. In some
cases, the iterations of the optimization algorithms perform computations similar to those in the layers of neural
networks, that is, a linear step followed by a nonlinear
activation (see, for example, the iteratively reweighted
least-squares approach). It is thus meaningful to
replace computationally expensive optimization algorithms with unrolled deep architectures that could be
trained from simulated data [50].
◗◗ Cognitive sensors: Radars—and SARs, in particular—are
very complex and versatile imaging machines. A variety
of modes (stripmap, spotlight, ScanSAR, terrain observation with progressive scans, and so on), swath widths,
incidence angles, and polarizations can be programmed
in near real time. Cognitive radars go a giant step further: they autonomously adapt their operational modes
to the environment to be imaged through an intelligent
interplay of transmit waveforms, adaptive signal processing on the receiver side, and learning. Cognitive SARs
are still in their conceptual and experimental phase and
are often justified by the stunning capabilities of the
echolocation system of bats. In his pioneering article
[116], Haykin defines three ingredients of a cognitive
radar: “1) intelligent signal processing, which builds on
learning through interactions of the radar with the surrounding environment; 2) feedback from the receiver to
the transmitter, which is a facilitator of intelligence; and
3) preservation of the information content of radar returns, which is realized by the Bayesian approach to target detection through tracking.” Such a SAR could, e.g.,
perform low-resolution yet wide-swath surveillance of
a coastal area and, in a first step, detect objects of interest, such as ships, in real time. Based on such detection,
the transmit waveform could be modified, for instance,
by zooming into the region of interest and enabling a
close-up look at an object and possibly classifying or
even identifying it. Reinforcement (online) learning is
part of the concept, as are fast and reliable detectors and
classifiers (trained offline), e.g., based on deep learning.
All this is edge computing; the learning algorithms have
to perform in real time and with the limited compute
resources onboard the satellite or airplane.
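The remark above that an optimization iteration resembles a network layer (a linear step followed by a nonlinear activation) can be illustrated with a minimal sketch. It unrolls the iterative shrinkage-thresholding algorithm (ISTA) for a real-valued sparse recovery problem, min_x 0.5‖Ax − y‖² + λ‖x‖₁, into a fixed number of layers. The function names, the real-valued setting, and the Frobenius-norm step size are assumptions made for illustration only; actual SAR inversion problems are complex-valued, and in a learned variant the per-layer weights and thresholds would be trained from simulated data rather than fixed as they are here.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def soft_threshold(v, theta):
    """Elementwise soft thresholding: the nonlinear 'activation' of a layer."""
    return [max(abs(x) - theta, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def ista_layer_weights(A, lam):
    """Fixed (untrained) layer parameters derived from plain ISTA.

    W_y = A^T / L, W_x = I - A^T A / L, theta = lam / L, where L is an
    upper bound on the largest eigenvalue of A^T A (assumption: we use
    the squared Frobenius norm of A, which is always a valid bound).
    """
    m, n = len(A), len(A[0])
    L = sum(a * a for row in A for a in row)
    At = [[A[i][j] for i in range(m)] for j in range(n)]
    W_y = [[a / L for a in row] for row in At]
    # Row j below is column j of A^T A; the matrix is symmetric.
    AtA = [matvec(At, [A[i][j] for i in range(m)]) for j in range(n)]
    W_x = [[(1.0 if i == j else 0.0) - AtA[i][j] / L for j in range(n)]
           for i in range(n)]
    return W_y, W_x, lam / L


def unrolled_ista(A, y, lam, n_layers):
    """Run n_layers ISTA iterations written as network layers."""
    W_y, W_x, theta = ista_layer_weights(A, lam)
    b = matvec(W_y, y)          # input injection, identical in every layer
    x = [0.0] * len(A[0])       # start from the zero vector
    for _ in range(n_layers):
        # Linear step: W_y y + W_x x
        pre = [b_i + w_i for b_i, w_i in zip(b, matvec(W_x, x))]
        # Nonlinear step: shrinkage
        x = soft_threshold(pre, theta)
    return x


if __name__ == "__main__":
    # Toy problem with an identity forward operator (hypothetical data).
    print(unrolled_ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], 1.0, 30))
```

For the identity operator in the toy call, the minimizer has the closed form of soft thresholding y by λ, i.e., (2, 0), and the unrolled iterations approach it. Every layer shares the same weights in this sketch; untying them and learning them per layer is what turns the fixed algorithm into a trainable architecture.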
Last but not least, technological advances in deep learning
in remote sensing will be possible only if experts in remote
sensing and ML work closely together. This is particularly true
when it comes to SAR. Thus, we encourage more joint initiatives to work collaboratively toward deep learning-powered,
explainable, and reproducible big SAR data analytics.
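As one possible reading of the uncertainty quantification direction listed above, the following sketch separates aleatoric (data) and epistemic (model) uncertainty for a single regression output, in the spirit of [54]. It assumes a network that predicts both a mean and a variance and that is evaluated several times with stochastic forward passes (for example, with dropout kept active at test time). The function name and the toy numbers are hypothetical, and no actual network is involved.

```python
def decompose_uncertainty(means, variances):
    """Split predictive variance into aleatoric and epistemic parts.

    means[t] and variances[t] are the mean and variance predicted by the
    t-th stochastic forward pass for the same input (assumed inputs).
    Returns (predictive mean, aleatoric variance, epistemic variance).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need equally many means and variances")
    t = len(means)
    mean_pred = sum(means) / t
    # Epistemic: spread of the predicted means across forward passes.
    epistemic = sum((m - mean_pred) ** 2 for m in means) / t
    # Aleatoric: average of the variances the network itself predicts.
    aleatoric = sum(variances) / t
    return mean_pred, aleatoric, epistemic


if __name__ == "__main__":
    # Two hypothetical forward passes for one pixel of a geoparameter map.
    mean_pred, aleatoric, epistemic = decompose_uncertainty(
        [1.0, 3.0], [0.5, 1.5]
    )
    print(mean_pred, aleatoric, epistemic, aleatoric + epistemic)
```

The total predictive variance is taken as the sum of the two terms. Collecting more training data can reduce the epistemic term, whereas the aleatoric term reflects noise inherent in the observations.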
ACKNOWLEDGMENTS
The work of Xiao Xiang Zhu is jointly supported by the
European Research Council, under the European Union’s
Horizon 2020 research and innovation program (grant
ERC-2016-StG-714087); the Helmholtz Association, through
Helmholtz AI, Munich Unit at Aeronautics, Space, and Transport, and through the Helmholtz Excellent Professorship
Data Science in Earth Observation: Big Data Fusion for Urban Research; and the German Federal Ministry of Education and Research, through the international Future AI Lab
AI4EO (grant 01DD20001).
AUTHOR INFORMATION
Xiao Xiang Zhu (xiaoxiang.zhu@dlr.de) received her M.Sc.,
Dr.-Ing., and habilitation degrees in signal processing from
the Technical University of Munich (TUM), Munich, Germany, in 2008, 2011, and 2013, respectively. She is a professor of data science in Earth observation at TUM and the
head of the Department of Earth Observation Data Science,
Remote Sensing Technology Institute, German Aerospace
Center, Wessling, 82234, Germany. Since 2019, she has
been a co-coordinator of the Munich Data Science Research
School and the head of the aeronautics, space, and transport research field at the Helmholtz Association, Bonn,
Germany. She has directed the Future Lab AI4EO: Artificial
Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond, Munich, since 2020 and she serves
on the board of directors of the Munich Data Science Institute, TUM. She was a guest scientist or visiting professor at
the Italian National Research Council, Naples, Italy; Fudan
University, Shanghai, China; the University of Tokyo, Tokyo, Japan; and the University of California, Los Angeles,
Los Angeles, California, USA, in 2009, 2014, 2015, and
2016, respectively. Her research interests include remote
sensing and Earth observation, signal processing, machine
learning, and data science, with a special focus on global
urban mapping. She is a member of the Junge Akademie/
Junges Kolleg, Berlin–Brandenburg Academy of Sciences
and Humanities; the German National Academy of Sciences Leopoldina; and the Bavarian Academy of Sciences and
Humanities. She is an associate editor of IEEE Transactions
on Geoscience and Remote Sensing and a Fellow of IEEE.
Sina Montazeri (sina.montazeri@dlr.de) received his
B.Sc. degree in geodetic engineering from the University of
Isfahan, Isfahan, Iran, in 2011; his M.Sc. degree in geomatics from Delft University of Technology, Delft, The Netherlands, in 2014; and his Ph.D. degree in radar remote sensing
from the Technical University of Munich (TUM), Munich,
Germany, in 2019, with a dissertation on geodetic synthetic aperture radar (SAR) interferometry. In 2012, he spent
two weeks with the Laboratoire des Sciences de l’Image, de
l’Informatique et de la Télédétection, University of Strasbourg, Strasbourg, France, as a junior researcher working
on thermal remote sensing. From 2013 to 2015, he was a research assistant at the Remote Sensing Technology Institute
(IMF), German Aerospace Center (DLR), Wessling, 82234,
Germany, where he was involved in the absolute localization of point clouds obtained from SAR tomography. From
2015 to 2019, he was a research associate with the Signal
Processing in Earth Observation research group, TUM, and
IMF–DLR, working on the automatic positioning of ground
control points from multiview radar images. He is currently
a senior researcher in the Department of Earth Observation
Data Science, IMF–DLR, focused on developing machine
learning algorithms applied to radar imagery. His research
interests include advanced interferometric SAR techniques
for the deformation monitoring of urban infrastructure,
image and signal processing relevant to radar imagery, and
applied machine learning. He received the DLR Science
Award and the IEEE Geoscience and Remote Sensing Society Transactions Prize Paper Award, in 2016 and 2017, respectively, for his work on geodetic SAR tomography.
Mohsin Ali (syed.ali@dlr.de) received his B.S. degree
in computer engineering from the National University of
Science and Technology, Islamabad, Pakistan, in 2013 and
his M.S. degree in computer science from the University of
Freiburg, Freiburg, Germany, in 2018. He is a Ph.D. degree
candidate at the Earth Observation Center, German Aerospace Center, Wessling, 82234, Germany, supervised by
Prof. Xiao Xiang Zhu. His research interests include uncertainty estimation in deep learning models for remote sensing applications.
Yuansheng Hua (yuansheng.hua@dlr.de) received his
B.S. degree in remote sensing science and technology from
Wuhan University, Wuhan, China, in 2014 and his M.S. degree in Earth-oriented space science and technology from
the Technical University of Munich (TUM), Munich, Germany, in 2018. He is pursuing his Ph.D. degree at the German Aerospace Center, Wessling, 82234, Germany, and at
TUM. In 2019, he was a visiting researcher at Wageningen
University and Research, Wageningen, The Netherlands.
His research interests include remote sensing, computer
vision, and deep learning, especially their applications in
remote sensing. He is a Student Member of IEEE.
Yuanyuan Wang (y.wang@tum.de) received his B.Eng.
degree, with honors, in electrical engineering from Hong
Kong Polytechnic University, Hong Kong, China, in 2008,
and his M.Sc. and Dr.-Ing. degrees from the Technical University of Munich (TUM), Munich, Germany, in 2010 and
2015, respectively. In June and July 2014, he was a guest
scientist at the Institute of Visual Computing, ETH Zürich,
Zürich, Switzerland. He is currently with the Department of
Earth Observation Data Science, Remote Sensing Technology Institute, German Aerospace Center, Wessling, 82234,
Germany, where he leads the Big SAR Data working group.
He is also a guest member of the Professorship of Data Science in Earth Observation, TUM, where he supports the scientific management of European Research Council projects
So2Sat and AI4SmartCities. His research interests include
optimal and robust parameter estimation in multibaseline
interferometric synthetic aperture radar (SAR), multisensor
fusion algorithms of SAR and optical data, nonlinear optimization with complex numbers, machine learning in SAR,
and high-performance computing for big data. He serves
as a reviewer for multiple IEEE Geoscience and Remote
Sensing Society and other remote sensing journals, and he
was named one of the best reviewers of IEEE Transactions on
Geoscience and Remote Sensing, in 2016. He is an associate
editor of the Royal Meteorological Society’s Geoscience Data
Journal. He is a Member of IEEE.
Lichao Mou (lichao.mou@dlr.de) received his B.S. degree
in automation from the Xi’an University of Posts and Telecommunications, Xi’an, China, in 2012; his M.S. degree in
signal and information processing from the University of the
Chinese Academy of Sciences, Beijing, China, in 2015; and
his Dr.-Ing. degree from the Technical University of Munich
(TUM), Munich, Germany, in 2020. He is a guest professor at
the Munich AI Future Lab AI4EO, TUM, and the head of the
Visual Learning and Reasoning team, Department of Earth
Observation Data Science, Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling,
82234, Germany. Since 2019, he has been an artificial intelligence consultant for the Helmholtz Artificial Intelligence
Cooperation Unit of the Helmholtz Association of Germany. In 2015, he spent six months at the Computer Vision
Group, University of Freiburg, Freiburg, Germany. In 2019,
he was a visiting researcher at the Cambridge Image Analysis Group, University of Cambridge, Cambridge, U.K. From
2019 to 2020, he was a research scientist at IMF–DLR. He
was the first-place winner of the 2016 IEEE GRSS Data Fusion Contest and a finalist for the Best Student Paper Award
at the Joint Urban Remote Sensing Event, in 2017 and 2019.
He is a Member of IEEE.
Yilei Shi (yilei.shi@tum.de) received his Dipl.-Ing. degree
in mechanical engineering and his Dr.-Ing. degree in engineering from the Technical University of Munich (TUM),
Germany. In April and May 2019, he was a guest scientist
in the Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, U.K. He is
currently a senior scientist with the Chair of Remote Sensing
Technology, TUM, Munich, 82024, Germany. His research
interests include computational intelligence; fast-solver and
parallel computing for large-scale problems; advanced methods for synthetic aperture radar (SAR) and interferometric
SAR processing; machine learning and deep learning for a
variety of data sources, such as SAR, optical images, medical
images, and so on; and partial differential equation-related
numerical modeling and computing. He is a Member of IEEE.
Feng Xu (fengxu@fudan.edu.cn) received his B.E. degree,
with honors, in information engineering from Southeast
University, Nanjing, China, in 2003 and his Ph.D. degree, with honors, in electronic engineering from Fudan
University, China, in 2008. From 2008 to 2010, he was a
postdoctoral fellow at the National Oceanic and Atmospheric Administration Center for Satellite Application
and Research, Camp Springs, Maryland, USA. From 2010
to 2013, he was with Intelligent Automation, Rockville,
Maryland, USA, and with the NASA Goddard Space Flight
Center, Greenbelt, Maryland, USA, as a research scientist.
In 2012, he was selected for China’s Global Experts Recruitment Program and subsequently returned to Fudan
University, Shanghai, 200433, China, in 2013, where he
is currently a professor in the School of Information Science and Technology and the vice director of the Ministry
of Education Key Laboratory for Information Science of
Electromagnetic Waves. He has authored more than 30 papers in peer-reviewed journals, coauthored two books, and
written many conference papers, and he holds two patents.
His research interests include electromagnetic scattering
modeling, synthetic aperture radar information retrieval,
and radar system development. He was a recipient of the
second-class National Nature Science Award and the 2014
SUMMA graduate fellowship in advanced electromagnetics. He serves as an associate editor of IEEE Geoscience and
Remote Sensing Letters. He is the founding chair of the IEEE
Geoscience and Remote Sensing Society Shanghai Chapter
and a Senior Member of IEEE.
Richard Bamler (richard.bamler@dlr.de) received his
Dipl.-Ing. degree in electrical engineering, Dr.-Ing. degree
in engineering, and habilitation degree in signal and systems theory, in 1980, 1986, and 1988, respectively, from
the Technical University of Munich, Germany. He worked
at the university, from 1981 to 1989, on optical signal processing, holography, wave propagation, and tomography.
He joined the German Aerospace Center (DLR), Wessling,
82234, Germany, in 1989, where he is currently the director of the Remote Sensing Technology Institute. In early
1994, he was a visiting scientist at the NASA Jet Propulsion Laboratory in preparation of the Spaceborne Imaging
Radar-C/X-band Synthetic Aperture Radar (SIR-C/X-SAR)
missions, and, in 1996, he was a guest professor at the University of Innsbruck. Since 2003, he has held a full professorship in remote sensing technology at the Technical University of Munich, Munich, 80333, Germany, as a double
appointment with his DLR position. His teaching activities include university lectures and courses covering signal
processing, estimation theory, and synthetic aperture radar
(SAR). Since he joined the DLR, his team has worked on
SAR and optical remote sensing, image analysis and understanding, stereo reconstruction, computer vision, ocean
color, passive and active atmospheric sounding, and laboratory spectrometry. His team is responsible for the development of the operational processors for SIR-C/X-SAR, the
Shuttle Radar Topography Mission, TerraSAR-X, TerraSAR-X
Add-On for Digital Elevation Measurement, the Tandem-L
mission, the Second European Remote Sensing Satellite Global Ozone Monitoring Experiment (GOME), Environmental
Satellite Scanning Imaging Absorption Spectrometer for Atmospheric Cartography, Meteorological Operational Satellite/
GOME-2, Sentinel-5 Precursor, Sentinel-4, DLR Earth Sensing Imaging Spectrometer, the Environmental Mapping and
Analysis Program mission, and others. His research interests include algorithms for optimum information extraction from remote sensing data, with an emphasis on SAR.
This involves new estimation algorithms, such as sparse reconstruction, compressive sensing, and deep learning. He
is a Fellow of IEEE.
REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. doi: 10.1038/nature14539.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, arXiv:1409.1556.
[3] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3212–3232, 2019. doi: 10.1109/TNNLS.2018.2876865.
[4] Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, “A review of semantic segmentation using deep neural networks,” Int. J. Multimedia Inf. Retrieval, vol. 7, no. 2, pp. 87–93, 2018. doi: 10.1007/s13735-017-0141-z.
[5] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, 2017. doi: 10.1109/MGRS.2017.2762307.
[6] H. Parikh, S. Patel, and V. Patel, “Classification of SAR and PolSAR images using deep learning: A review,” Int. J. Image Data Fusion, vol. 11, no. 1, pp. 1–32, 2020. doi: 10.1080/19479832.2019.1655489.
[7] S. Chen and H. Wang, “SAR target recognition based on deep learning,” in Proc. Int. Conf. Data Sci. Adv. Anal. (DSAA), 2014, pp. 541–547. doi: 10.1109/DSAA.2014.7058124.
[8] L. Wang, A. Scott, L. Xu, and D. Clausi, “Ice concentration estimation from dual-polarized SAR images using deep convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 11, no. 1, pp. 1–32, 2014. doi: 10.1109/TGRS.2016.2543660.
[9] P. Wang, H. Zhang, and V. Patel, “SAR image despeckling using a convolutional neural network,” IEEE Signal Process. Lett., vol. 24, no. 12, pp. 1763–1767, 2017. doi: 10.1109/LSP.2017.2758203.
[10] N. Anantrasirichai, J. Biggs, F. Albino, P. Hill, and D. Bull, “Application of machine learning to classification of volcanic deformation in routinely generated InSAR data,” JGR, Solid Earth, vol. 123, no. 8, pp. 6592–6606, 2018. doi: 10.1029/2018JB015911.
[11] L. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu, “Identifying corresponding patches in SAR and optical images with a pseudo-Siamese CNN,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 784–788, 2018. doi: 10.1109/LGRS.2018.2799232.
[12] K. Ikeuchi, T. Shakunaga, M. Wheeler, and T. Yamazaki, “Invariant histograms and deformable template matching for SAR target recognition,” in Proc. CVPR IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 1996, pp. 100–105. doi: 10.1109/CVPR.1996.517060.
[13] Q. Zhao and J. Principe, “Support vector machines for SAR automatic target recognition,” IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 2, pp. 643–654, 2001. doi: 10.1109/7.937475.
[14] M. Bryant and F. Garber, “SVM classifier applied to the MSTAR public data set,” in Proc. Algorithms Synth. Aperture Radar Imag., 1999, pp. 355–360. doi: 10.1117/12.357652.
[15] M. Ferguson, R. Ak, Y.-T. T. Lee, and K. H. Law, “Automatic localization of casting defects with convolutional neural networks,” in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2017, pp. 1726–1735. doi: 10.1109/BigData.2017.8258115.
[16] K. Chen, K. Chen, Q. Wang, Z. He, J. Hu, and J. He, “Short-term load forecasting with deep residual networks,” IEEE Trans. Smart Grid, vol. 10, no. 4, pp. 3943–3952, July 2019. doi: 10.1109/TSG.2018.2844307.
[17] Y. Han and J. C. Ye, “Framing U-Net via deep convolutional framelets: Application to sparse-view CT,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1418–1429, Jun. 2018. doi: 10.1109/TMI.2018.2823768.
[18] “Long short-term memory.” Wikimedia. https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/The_LSTM_cell.png/1280px-The_LSTM_cell.png (accessed May 27, 2020).
[19] Y. Yang, K. Zheng, C. Wu, and Y. Yang, “Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network,” Sensors, vol. 19, no. 11, p. 2528, Jun. 2019. doi: 10.3390/s19112528.
[20] W. Feng, N. Guan, Y. Li, X. Zhang, and Z. Luo, “Audio visual
speech recognition with multimodal recurrent neural networks,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017,
pp. 681–688. doi: 10.1109/IJCNN.2017.7965918.
[21] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, “Generative adversarial networks: An overview,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 53–65, Jan. 2018. doi: 10.1109/MSP.2017.2765202.
[22] M. Zitnik, M. Agrawal, and J. Leskovec, “Modeling polypharmacy side effects with graph convolutional networks,” Bioinformatics, vol. 34, no. 13, pp. 457–466, 2018. doi: 10.1093/bioinformatics/bty294.
[23] B. Huang and K. M. Carley, “Residual or gate? Towards deeper graph neural networks for inductive graph representation learning,” Aug. 2019, arXiv:1904.08035.
[24] M. Alioscha-Perez, A. D. Berenguer, E. Pei, M. C. Oveneke, and H. Sahli, “Neural architecture search under black-box objectives with deep reinforcement learning and increasingly-sparse rewards,” in 2020 Int. Conf. Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, Feb. 2020, pp. 276–281. doi: 10.1109/ICAIIC48513.2020.9065031.
[25] Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database,” 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[26] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
learning applied to document recognition,” Proc. IEEE, vol. 86,
no. 11, pp. 2278–2324, 1998. doi: 10.1109/5.726791.
[27] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv.
Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255. doi:
10.1109/CVPR.2009.5206848.
[29] T. Tieleman and G. Hinton, “Lecture 6.5-Rmsprop: Divide the
gradient by a running average of its recent magnitude,” COURSERA: Neural Netw. Machine Learn., vol. 4, no. 2, pp. 26–31, 2012.
[30] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern
Recognit. (CVPR), 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90.
[32] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, 2015, pp. 234–241. doi: 10.1007/978-3-319-24574-4_28.
[33] G. Huang, Z. Liu, K. Weinberger, and L. Maaten, “Densely connected convolutional networks,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2261–2269. doi: 10.1109/CVPR.2017.243.
[34] T. Hoeser and C. Kuenzer, “Object detection and image segmentation with deep learning on earth observation data: A review-Part I: evolution and recent trends,” Remote Sens., vol. 12, no. 10, p. 1667, 2020. doi: 10.3390/rs12101667.
[35] A. Mazza, F. Sica, P. Rizzoli, and G. Scarpa, “TanDEM-X forest mapping using convolutional neural networks,” Remote Sens., vol. 11, no. 24, p. 2980, Jan. 2019. doi: 10.3390/rs11242980.
[36] F. Lattari, B. Gonzalez Leon, F. Asaro, A. Rucci, C. Prati, and M. Matteucci, “Deep learning for SAR image despeckling,” Remote Sens., vol. 11, no. 13, p. 1532, 2019. doi: 10.3390/rs11131532.
[37] D. Morgan, “Deep convolutional neural networks for ATR from SAR imagery,” in Proc. SPIE, vol. 9475, May 13, 2015. doi: 10.1117/12.2176558.
[38] B. A. Pearlmutter, “Learning state space trajectories in recurrent neural networks,” Neural Computat., vol. 1, no. 2, pp. 263–269, 1989. doi: 10.1162/neco.1989.1.2.263.
[39] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computat., vol. 9, no. 8, pp. 1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735.
[40] E. Ndikumana, D. Ho Tong Minh, N. Baghdadi, D. Courault, and L. Hossard, “Deep recurrent neural network for agricultural classification using multitemporal SAR Sentinel-1 for Camargue, France,” Remote Sens., vol. 10, no. 8, p. 1217, 2018. doi: 10.3390/rs10081217.
[41] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
[42] C. Grohnfeld, M. Schmitt, and X. X. Zhu, “A conditional generative adversarial network to fuse SAR and multispectral optical data for cloud removal from Sentinel-2 images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2018, pp. 1726–1729. doi: 10.1109/IGARSS.2018.8519215.
[43] P. Ebel, M. Schmitt, and X. Zhu, “Cloud removal in unpaired Sentinel-2 imagery using cycle-consistent GAN and SAR-optical data fusion,” in Proc. IGARSS 2020 IEEE Int. Geosci. Remote Sens. Symp.
[44] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014. doi: 10.5555/2627435.2670313.
[45] K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” London, Edinburgh, Dublin Philosoph. Mag. J. Sci., vol. 2, no. 11, pp. 559–572, 1901. doi: 10.1080/14786440109462720.
[46] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” 2013, arXiv:1312.6114.
[47] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236.
[48] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource management with deep reinforcement learning,” in Proc. 15th ACM Workshop Hot Topics Netw., 2016, pp. 50–56. doi: 10.1145/3005745.3005750.
[49] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, p. 484, 2016.
doi: 10.1038/nature16961.
[50] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,”
2018.
[51] T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture
search: A survey,” 2018, arXiv:1808.05377.
[52] H. Dong, B. Zou, L. Zhang, and S. Zhang, “Automatic design of
CNNs via differentiable neural architecture search for PolSAR
image classification,” IEEE Trans. Geosci. Remote Sens., vol. 58,
no. 9, pp. 1–14, 2020. doi: 10.1109/TGRS.2020.2976694.
[53] T. N. Kipf and M. Welling, “Semi-supervised classification with
graph convolutional networks,” 2016, arXiv:1609.02907.
[54] A. Kendall and Y. Gal, “What uncertainties do we need in
Bayesian deep learning for computer vision?” in Proc. 31st
Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5580–5590. doi:
10.5555/3295222.3295309.
[55] Y. Shi, Q. Li, and X. X. Zhu, “Building segmentation through
a gated graph convolutional neural network with deep structured feature embedding,” ISPRS J. Photogram. Remote Sens.,
vol. 159, pp. 184–197, Jan. 2020. doi: 10.1016/j.isprsjprs.2019.11.004.
[56] F. Ma, F. Gao, J. Sun, H. Zhou, and A. Hussain, “Attention graph
convolution network for image segmentation in big SAR imagery data,” Remote Sens., vol. 11, no. 21, p. 2586, 2019. doi:
10.3390/rs11212586.
[57] X. Tang, L. Zhang, and X. Ding, “SAR image despeckling with
a multilayer perceptron neural network,” Int. J. Digit. Earth,
vol. 12, no. 3, pp. 1–21, 2018. doi: 10.1080/17538947.2018.1447032.
[58] L. Mou, M. Schmitt, Y. Wang, and X. X. Zhu, “A CNN for the
identification of corresponding patches in SAR and optical
imagery of urban scenes,” in Proc. Urban Remote Sens. Event
(JURSE), 2017, pp. 1–4. doi: 10.1109/JURSE.2017.7924548.
[59] R. Touzi, A. Lopes, and P. Bousquet, “A statistical and geometrical edge detector for SAR images,” IEEE Trans. Geosci. Remote
Sens., vol. 26, no. 6, pp. 764–773, 1988. doi: 10.1109/36.7708.
[60] G. Chierchia, D. Cozzolino, G. Poggi, and L. Verdoliva, “SAR
image despeckling through convolutional neural networks,”
2017, arXiv:1704.00275.
[61] Y. Shi, X. X. Zhu, and R. Bamler, “Optimized parallelization of
non-local means filter for image noise reduction of InSAR image,” in Proc. IEEE Int. Conf. Inf. Automat., 2015, pp. 1515–1518.
doi: 10.1109/ICInfA.2015.7279525.
[62] X. X. Zhu, R. Bamler, M. Lachaise, F. Adam, Y. Shi, and M.
Eineder, “Improving TanDEM-X DEMs by non-local InSAR filtering,” in Proc. Euro. Conf. Synth. Aperture Radar (EUSAR), 2014,
pp. 1–4.
[63] L. Denis, C.-A. Deledalle, and F. Tupin, “From patches to deep
learning: Combining self-similarity and neural networks for
SAR image despeckling,” in Proc. IGARSS 2019 - 2019 IEEE
Int. Geosci. Remote Sens. Symp., pp. 5113–5116. doi: 10.1109/IGARSS.2019.8898473.
[64] J. Gao, B. Deng, Y. Qin, H. Wang, and X. Li, “Enhanced radar imaging using a complex-valued convolutional neural network,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 1, pp. 35–39, 2019. doi: 10.1109/LGRS.2018.2866567.
[65] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou, “A tutorial on synthetic aperture radar,” IEEE Geosci. Remote Sens. Mag., vol. 1, no. 1, pp. 6–43, 2013. doi: 10.1109/MGRS.2013.2248301.
[66] C. He, S. Li, Z. Liao, and M. Liao, “Texture classification of PolSAR data based on sparse coding of wavelet polarization textons,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 8, pp. 4576–4590, 2013. doi: 10.1109/TGRS.2012.2236338.
[67] H. Xie, S. Wang, K. Liu, S. Lin, and B. Hou, “Multilayer feature learning for polarimetric synthetic radar data classification,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2014, pp. 2818–2821. doi: 10.1109/IGARSS.2014.6947062.
[68] J. Geng, H. Wang, J. Fan, and X. Ma, “Deep supervised and contractive neural network for SAR image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 4, pp. 2442–2459, 2017. doi: 10.1109/TGRS.2016.2645226.
[69] S. Uhlmann and S. Kiranyaz, “Integrating color features in polarimetric SAR image classification,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 4, pp. 2197–2216, 2014. doi: 10.1109/TGRS.2013.2258675.
[70] J. Geng, J. Fan, H. Wang, X. Ma, B. Li, and F. Chen, “High-resolution SAR image classification via deep convolutional autoencoders,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2351–2355, 2015. doi: 10.1109/LGRS.2015.2478256.
[71] B. Hou, B. Ren, G. Ju, H. Li, L. Jiao, and J. Zhao, “SAR image classification via hierarchical sparse representation and multisize patch features,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 1, pp. 33–37, 2016. doi: 10.1109/LGRS.2015.2493242.
[72] F. Gao, T. Huang, J. Wang, J. Sun, A. Hussain, and E. Yang, “Dual-branch deep convolution neural network for polarimetric SAR image classification,” Appl. Sci., vol. 7, no. 5, p. 447, 2017. doi: 10.3390/app7050447.
[73] B. Hou, H. Kou, and L. Jiao, “Classification of polarimetric SAR images using multilayer autoencoders and superpixels,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 7, pp. 3072–3081, 2016. doi: 10.1109/JSTARS.2016.2553104.
[74] L. Zhang, W. Ma, and D. Zhang, “Stacked sparse autoencoder in PolSAR data classification using local spatial information,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 9, pp. 1359–1363, 2016. doi: 10.1109/LGRS.2016.2586109.
[75] F. Qin, J. Guo, and W. Sun, “Object-oriented ensemble classification for polarimetric SAR imagery using restricted Boltzmann machines,” Remote Sens. Lett., vol. 8, no. 3, pp. 204–213, 2017. doi: 10.1080/2150704X.2016.1258128.
[76] Z. Zhao, L. Jiao, J. Zhao, J. Gu, and J. Zhao, “Discriminant deep belief network for high-resolution SAR image classification,” Pattern Recognit., vol. 61, pp. 686–701, 2017. doi: 10.1016/j.patcog.2016.05.028.
[77] Y. Zhou, H. Wang, F. Xu, and Y. Jin, “Polarimetric SAR image classification using deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 12, pp. 1935–1939, 2016. doi: 10.1109/LGRS.2016.2618840.
Y. Wang, C. He, X. Liu, and M. Liao, “A hierarchical fully convolutional network integrated with sparse and low-rank subspace

167

[79]

[80]

[81]

[82]

[83]

[84]

[85]

[86]

[87]

[88]

[89]

[90]

[91]

168

representations for PolSAR imagery classification,” Remote
Sens., vol. 10, no. 2, p. 342, 2018. doi: 10.3390/rs10020342.
S. Chen and C. Tao, “PolSAR image classification using polarimetric-feature-driven deep convolutional neural network,”
IEEE Geosci. Remote Sens. Lett., vol. 15, no. 4, pp. 627–631, 2018.
doi: 10.1109/LGRS.2018.2799877.
C. He, M. Tu, D. Xiong, and M. Liao, “Nonlinear manifold
learning integrated with fully convolutional networks for PolSAR image classification,” Remote Sens., vol. 12, no. 4, p. 655,
2020. doi: 10.3390/rs12040655.
H. Dong, L. Zhang, and B. Zou, “PolSAR image classification
with lightweight 3D convolutional networks,” Remote Sens., vol.
12, no. 3, p. 396, 2020. doi: 10.3390/rs12030396.
N. Teimouri, M. Dyrmann, and R. N. Jørgensen, “A novel spatio-temporal FCN-LSTM network for recognizing various crop
types using multi-temporal radar images,” Remote Sens., vol. 11,
no. 8, p. 990, 2019. doi: 10.3390/rs11080990.
Z. Zhang, H. Wang, F. Xu, and Y. Jin, “Complex-valued convolutional neural network and its application in polarimetric SAR
image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55,
no. 12, pp. 7177–7188, 2017. doi: 10.1109/TGRS.2017.2743222.
A. G. Mullissa, C. Persello, and A. Stein, “PolSARNet: A deep
fully convolutional network for polarimetric SAR image classification,” IEEE J. Select. Topics Appl. Earth Observ. Remote
Sens., vol. 12, no. 12, pp. 5300–5309, 2019. doi: 10.1109/
JSTARS.2019.2956650.
L. Li, L. Ma, L. Jiao, F. Liu, Q. Sun, and J. Zhao, “Complex
contourlet-CNN for polarimetric SAR image classification,”
Pattern Recognit., vol. 100, p. 107,110, Apr. 2020. doi: 10.1016/j.
patcog.2019.107110.
W. Xie, G. Ma, F. Zhao, H. Liu, and L. Zhang, “PolSAR image classification via a novel semi-supervised recurrent
complex-valued convolution neural network,” Neurocomputing, vol. 388, pp. 255–268, May 2020. doi: 10.1016/j.neucom.
2020.01.020.
Z. Huang, M. Datcu, Z. Pan, and B. Lei, “Deep SAR-Net: Learning
objects from signals,” ISPRS J. Photogram. Remote Sens., vol. 161,
pp. 179–193, Mar. 2020. doi: 10.1016/j.isprsjprs.2020.01.016.
R. Ressel, A. Frost, and S. Lehner, “A neural network-based
classification for sea ice types on x-band SAR images,” IEEE J.
Select. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 7, pp.
3672–3680, 2015. doi: 10.1109/JSTARS.2015.2436993.
R. Ressel, S. Singha, and S. Lehner, “Neural network based automatic sea ice classification for CL-pol RISAT-1 imagery,” in
Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2016, pp.
4835–4838. doi: 10.1109/IGARSS.2016.7730261.
R. Ressel, S. Singha, S. Lehner, A. Rosel, and G. Spreen, “Investigation into different polarimetric features for sea ice classification using x-band synthetic aperture radar,” IEEE J. Select. Topics
Appl. Earth Observ. Remote Sens., vol. 9, no. 7, pp. 3131–3143,
2016. doi: 10.1109/JSTARS.2016.2539501.
S. Singha, M. Johansson, N. Hughes, S. M. Hvidegaard, and H.
Skourup, “Arctic sea ice characterization using spaceborne fully
polarimetric L-, C-, and X-band SAR with validation by airborne measurements,” IEEE Trans. Geosci. Remote Sens., vol. 56,
no. 7, pp. 3715–3734, 2018. doi: 10.1109/TGRS.2018.2809504.

[92] N. Zakhvatkina, V. Smirnov, and I. Bychkova, “Satellite SAR
data-based sea ice classification: An overview,” Geosciences, vol.
9, no. 4, p. 152, 2019. doi: 10.3390/geosciences9040152.
[93] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, “Semantic annotation of high-resolution satellite images via weakly supervised
learning,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp.
3660–3671, 2016. doi: 10.1109/TGRS.2016.2523563.
[94] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, “When deep
learning meets metric learning: Remote sensing image scene
classification via learning discriminative CNNs,” IEEE Trans.
Geosci. Remote Sens., vol. 56, no. 5, pp. 2811–2821, 2018. doi:
10.1109/TGRS.2017.2783902.
[95] F. Zhang, C. Hu, Q. Yin, W. Li, H. Li, and W. Hong, “SAR target
recognition using the multi-aspect-aware bidirectional LSTM
recurrent neural networks,” 2017, arXiv:1707.09875.
[96] E. Keydel, S. Lee, and J. Moore, “MSTAR extended operating
conditions: A tutorial,” in Proc. SPIE, vol. 2757, pp. 228–242,
1996. doi: 10.1117/12.242059.
[97] S. Chen, H. Wang, F. Xu, and Y. Jin, “Target classification using
the deep convolutional networks for SAR images,” IEEE Trans.
Geosci. Remote Sens., vol. 54, no. 8, pp. 4806–4817, 2016. doi:
10.1109/TGRS.2016.2551720.
[98] J. Ding, B. Chen, H. Liu, and M. Huang, “Convolutional neural
network with data augmentation for SAR target recognition,”
IEEE Geosci. Remote Sens. Lett., vol. 13, no. 3, pp. 364–368, 2016.
doi: 10.1109/LGRS.2015.2513754.
[99] K. Du, Y. Deng, R. Wang, T. Zhao, and N. Li, “SAR ATR based on displacement-and rotation-insensitive CNN,” Remote Sens. Lett., vol.
7, no. 9, pp. 895–904, 2016. doi: 10.1080/2150704X.2016.1196837.
[100] M. Wilmanski, C. Kreucher, and J. Lauer, “Modern approaches
in deep learning for SAR ATR,” in Proc. SPIE 9843, Algorithms for
Synthetic Aperture Radar Imagery XXIII, vol. 9843, May 14, 2016,
p. 98430N. doi: 10.1117/12.2220290.
[101] S. Wagner, “SAR ATR by a combination of convolutional neural network and support vector machines,” IEEE Trans. Aerosp.
Electron. Syst., vol. 52, no. 6, pp. 2861–2872, 2016. doi: 10.1109/
TAES.2016.160061.
[102] F. Gao, T. Huang, J. Sun, J. Wang, A. Hussain, and E. Yang, “A new
algorithm for SAR image target recognition based on an improved
deep convolutional neural network,” Cogn. Computat., vol. 11, no.
6, pp. 809–824, 2019. doi: 10.1007/s12559-018-9563-z.
[103] F. Gao, T. Huang, J. Wang, J. Sun, E. Yang, and A. Hussain,
“Combining deep convolutional neural network and SVM to
SAR image target recognition,” in Proc. IEEE Int. Conf. Internet of
Things (iThings) IEEE Green Comput. Commun. (GreenCom) IEEE
Cyber, Phys. Soc. Comput. (CPSCom) IEEE Smart Data (SmartData), 2017, pp. 1082–1085. doi: 10.1109/iThings-GreenComCPSCom-SmartData.2017.165.
[104] H. Furukawa, “Deep learning for end-to-end automatic target
recognition from synthetic aperture radar imagery,” 2018, arXiv:1801.08558.
[105] D. Cozzolino, G. D Martino, G. Poggi, and L. Verdoliva, “A
fully convolutional neural network for low-complexity singlestage ship detection in Sentinel-1 SAR images,” in Proc. IEEE Int.
Geosci. Remote Sens. Symp. (IGARSS), 2017, pp. 886–889. doi:
10.1109/IGARSS.2017.8127094.
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

[106] C. Schwegmann, W. Kleynhans, B. Salmon, L. Mdakane, and
R. Meyer, “Very deep learning for ship discrimination in synthetic aperture radar imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2016, pp. 104–107. doi: 10.1109/
IGARSS.2016.7729017.
[107] C. Bentes, A. Frost, D. Velotto, and B. Tings, “Ship-iceberg discrimination with convolutional neural networks in high resolution SAR images,” in Proc. Euro. Conf. Synth. Aperture Radar
(EUSAR), 2016, pp. 1–4.
[108] N. Ødegaard, A. Knapskog, C. Cochin, and J. Louvigne, “Classification of ships using real and simulated data in a convolutional neural network,” in Proc. IEEE Radar Conf. (RadarConf),
2016, pp. 1–6. doi: 10.1109/RADAR.2016.7485270.
[109] Y. Liu, M. Zhang, P. Xu, and Z. Guo, “SAR ship detection using
sea-land segmentation-based convolutional neural network,”
in Proc. Int. Workshop Remote Sens. Intell. Process. (RSIP), 2017,
pp. 1–4. doi: 10.1109/RSIP.2017.7958806.
[110] R. Girshick, “Fast R-CNN,” 2015, arXiv:1504.08083.
[111] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149,
2017. doi: 10.1109/TPAMI.2016.2577031.
[112] J. Li, C. Qu, and J. Shao, “Ship detection in SAR images based
on an improved faster R-CNN,” in Proc. SAR Big Data Era: Models, Methods Appl. (BIGSARDATA), 2017, pp. 1–6, doi: 10.1109/
BIGSARDATA.2017.8124934.
[113] M. Kang, K. Ji, X. Leng, and Z. Lin, “Contextual region-based
convolutional neural network with multilayer fusion for SAR
ship detection,” Remote Sens., vol. 9, no. 8, p. 860, 2017. doi:
10.3390/rs9080860.
[114] J. Jiao et al., “A densely connected end-to-end neural network
for multiscale and multiscene SAR ship detection,” IEEE Access, vol. 6, pp. 20,881–20,892, Apr. 2018. doi: 10.1109/ACCESS.2018.2825376.
[115] C. Dechesne, S. Lefèvre, R. Vadaine, G. Hajduch, and R. Fablet,
“Multi-task deep learning from sentinel-1 SAR: Ship detection,
classification and length estimation,” presented at the Conf. Big
Data from Space, 2019.
[116] S. Haykin, “Cognitive radar: A way of the future,” IEEE Signal
Process. Mag., vol. 23, no. 1, pp. 30–40, 2006. doi: 10.1109/
MSP.2006.1593335.
[117] S. Kazemi, B. Yonel, and B. Yazici, “Deep learning for direct automatic target recognition from SAR data,” in Proc. IEEE Radar
Conf. (RadarConf), 2019, pp. 1–6. doi: 10.1109/RADAR.2019.
8835492.
[118] M. Rostami, S. Kolouri, E. Eaton, and K. Kim, “Deep transfer
learning for few-shot SAR image classification,” Remote Sens.,
vol. 11, no. 11, p. 1374, 2019. doi: 10.3390/rs11111374.
[119] Z. Huang, Z. Pan, and B. Lei, “What, where, and how to transfer in SAR target recognition based on deep CNNs,” IEEE
Trans. Geosci. Remote Sens., vol. 58, no. 4, 2019. doi: 10.1109/
TGRS.2019.2947634.
[120] M. Shahzad, M. Maurer, F. Fraundorfer, Y. Wang, and X. X. Zhu,
“Buildings detection in VHR SAR images using fully convolution neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 57,
no. 2, pp. 1100–1116, 2019. doi: 10.1109/TGRS.2018.2864716.
DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[121] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional
networks for semantic segmentation,” in Proc. IEEE Int. Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 3431–3440. doi:
10.1109/CVPR.2015.7298965.
[122] S. Zheng et al., “Conditional random fields as recurrent
neural networks,” in Proc. IEEE Int. Conf. Comput. Vis., 2015,
pp. 1529–1537. doi: 10.1109/ICCV.2015.179.
[123] Y. Sun, Y. Hua, L. Mou, and X. X. Zhu, “CG-net: Conditional
GIS-aware network for individual building segmentation in
VHR SAR images,” 2020, arXiv:2011.08362.
[124] F. Radar and J. Falkingham. “Global satellite observation requirements for floating ice.” World Meteorological Organization.
https://globalcryospherewatch.org/satellites/docs/PSTG-4_
Doc_08-04_GlobSatObsReq-FloatingIce.pdf (accessed Jan. 25,
2021).
[125] W. Dierking, “Sea ice monitoring by synthetic aperture radar,”
Oceanography, vol. 26, no. 2, pp. 100–111, 2013. doi: 10.5670/
oceanog.2013.33.
[126] L. Wang, K. Scott, L. Xu, and D. Clausi, “Sea ice concentration estimation during melt from dual-pol SAR scenes using
deep convolutional neural networks: A case study,” IEEE Trans.
Geosci. Remote Sens., vol. 54, no. 8, pp. 4524–4533, 2016. doi:
10.1109/TGRS.2016.2543660.
[127] L. Wang, “Learning to estimate sea ice concentration from SAR
imagery,” Ph.D. dissertation, Univ. Waterloo, 2016. [Online].
Available: http://hdl.handle.net/10012/10954
[128] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva, “A
nonlocal SAR image denoising algorithm based on LLMMSE
wavelet shrinkage,” IEEE Trans. Geosci. Remote Sens., vol. 50,
no. 2, pp. 606–616, 2012. doi: 10.1109/TGRS.2011.2161586.
[129] D. Cozzolino, L. Verdoliva, G. Scarpa, and G. Poggi, “Nonlocal CNN SAR image despeckling,” Remote Sens., vol. 12, no. 6,
p. 1006, 2020. doi: 10.3390/rs12061006.
[130] T. Song, L. Kuang, L. Han, Y. Wang, and Q. H. Liu, “Inversion of
rough surface parameters from SAR images using simulationtrained convolutional neural networks,” IEEE Geosci. Remote
Sens. Lett., vol. 15, no. 7, pp. 1130–1134, 2018. doi: 10.1109/
LGRS.2018.2822821.
[131] J. Zhao, M. Datcu, Z. Zhang, H. Xiong, and W. Yu, “Contrastive-regulated CNN in the complex domain: A method to learn
physical scattering signatures from flexible polsar images,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,116–10,135,
2019. doi: 10.1109/TGRS.2019.2931620.
[132] Q. Song, F. Xu, and Y.-Q. Jin, “Radar image colorization: converting single-polarization to fully polarimetric using deep
neural networks,” IEEE Access, vol. 6, pp. 1647–1661, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8141881
doi: 10.1109/ACCESS.2017.2779875.
[133] S. Niu, X. Qiu, B. Lei, C. Ding, and K. Fu, “Parameter extraction
based on deep neural network for SAR target simulation,” IEEE
Trans. Geosci. Remote Sens., vol. 58, no. 7, pp. 4901–4914, 2020.
doi: 10.1109/TGRS.2020.2968493.
[134] S. Auer, R. Bamler, and P. Reinartz, “RaySAR - 3D SAR simulator: Now open source,” in Proc. IEEE Int. Geosci. Remote Sens.
Symp. (IGARSS), Beijing, 2016, pp. 6730–6733. doi: 10.1109/
IGARSS.2016.7730757.

169

[135] J. Lee, “Digital image enhancement and noise filtering by use of
local statistics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-2,
no. 2, pp. 165–168, 1980. doi: 10.1109/TPAMI.1980.4766994.
[136] D. Kuan, A. Sawchuk, T. Strand, and P. Chavel, “Adaptive noise
smoothing filter for images with signal-dependent noise,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, no. 2, pp. 165–177,
1985. doi: 10.1109/TPAMI.1985.4767641.
[137] V. Frost, J. Stiles, K. Shanmugan, and J. Holtzman, “A model
for radar images and its application to adaptive digital filtering of multiplicative noise,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. PAMI-4, no. 2, pp. 157–166, 1982. doi: 10.1109/
TPAMI.1982.4767223.
[138] H. Xie, L. Pierce, and F. Ulaby, “SAR speckle reduction using
wavelet denoising and Markov random field modeling,” IEEE
Trans. Geosci. Remote Sens., vol. 40, no. 10, pp. 2196–2212, 2002.
doi: 10.1109/TGRS.2002.802473.
[139] F. Argenti and L. Alparone, “Speckle removal from SAR images
in the undecimated wavelet domain,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 11, pp. 2363–2374, 2002. doi: 10.1109/
TGRS.2002.805083.
[140] A. Achim, P. Tsakalides, and A. Bezerianos, “SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed
modeling,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 8, pp.
1773–1784, 2003. doi: 10.1109/TGRS.2003.813488.
[141] F. Argenti, A. Lapini, T. Bianchi, and L. Alparone, “A tutorial
on speckle reduction in synthetic aperture radar images,” IEEE
Geosci. Remote Sens. Mag., vol. 1, no. 3, pp. 6–35, 2013. doi:
10.1109/MGRS.2013.2277512.
[142] F. Tupin, L. Denis, C.-A. Deledalle, and G. Ferraioli, “Ten years
of patch-based approaches for SAR imaging: A review,” in Proc.
IGARSS 2019–2019 IEEE Int. Geosci. Remote Sens. Symp., pp.
5105–5108. doi: 10.1109/IGARSS.2019.8900596.
[143] C.-A. Deledalle, L. Denis, and F. Tupin, “Iterative weighted
maximum likelihood denoising with probabilistic patch-based
weights,” IEEE Trans. Image Process., vol. 18, no. 12, pp. 2661–
2672, 2009. doi: 10.1109/TIP.2009.2029593.
[144] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm
for image denoising,” in Proc. IEEE Comput. Soc. Conf. Comput.
Vis. Pattern Recognit. (CVPR’05), 2005, vol. 2, pp. 60–65. doi:
10.1109/CVPR.2005.38.
[145] X. Su, C.-A. Deledalle, F. Tupin, and H. Sun, “Two-step multitemporal nonlocal means for synthetic aperture radar images,”
IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6181–6196,
2014. doi: 10.1109/TGRS.2013.2295431.
[146] C.-A. Deledalle, L. Denis, F. Tupin, A. Reigber, and M. Jager,
“NL-SAR: A unified nonlocal framework for resolutionpreserving (pol)(in)SAR denoising,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 4, pp. 2021–2038, 2015. doi: 10.1109/
TGRS.2014.2352555.
[147] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a
Gaussian denoiser: Residual learning of deep CNN for image
denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–
3155, 2017. doi: 10.1109/TIP.2017.2662206.
[148] Q. Zhang, Q. Yuan, J. Li, Z. Yang, and X. Ma, “Learning a dilated
residual network for SAR image despeckling,” Remote Sens., vol.
10, no. 2, p. 196, 2018. doi: 10.3390/rs10020196.

170

[149] D.-X. Yue, F. Xu, and Y.-Q. Jin, “SAR despeckling neural network with logarithmic convolutional product model,” Int.
J. Remote Sens., vol. 39, no. 21, pp. 7483–7505, 2018. doi:
10.1080/01431161.2018.1471539.
[150] S. Vitale, G. Ferraioli, and V. Pascazio, “Multi-objective CNN
based algorithm for SAR despeckling,” Aug. 2020, arXiv:
2006.09050v4.
[151] G. Baier, W. He, and N. Yokoya, “Robust nonlocal low-rank
SAR time series despeckling considering speckle correlation
by total variation regularization,” IEEE Trans. Geosci. Remote
Sens., vol. 58, no. 11, pp. 1–13, 2020. doi: 10.1109/TGRS.
2020.2985400.
[152] J. Lehtinen et al., “Noise2noise: Learning image restoration
without clean data,” 2018, arXiv:1803.04189.
[153] X. Ma, C. Wang, Z. Yin, and P. Wu, “SAR image despeckling by
noisy reference-based deep learning method,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 1–12, 2020. doi: 10.1109/
TGRS.2020.2990978.
[154] H. Zebker, C. Werner, P. Rosen, and S. Hensley, “Accuracy of
topographic maps derived from ERS-1 interferometric radar,”
IEEE Trans. Geosci. Remote Sens., vol. 32, no. 4, pp. 823–836,
1994. doi: 10.1109/36.298010.
[155] R. Abdelfattah and J. Nicolas, “Topographic SAR interferometry
formulation for high-precision DEM generation,” IEEE Trans.
Geosci. Remote Sens., vol. 40, no. 11, pp. 2415–2426, 2002. doi:
10.1109/TGRS.2002.805071.
[156] D. Massonnet, P. Briole, and A. Arnaud, “Deflation of mount
Etna monitored by spaceborne radar interferometry,” Nature,
vol. 375, no. 6532, p. 567, 1995. doi: 10.1038/375567a0.
[157] J. Ruch, J. Anderssohn, T. Walter, and M. Motagh, “Calderascale inflation of the Lazufre volcanic area, South America:
Evidence from InSAR,” J. Volcanol. Geotherm. Res., vol. 174,
no. 4, pp. 337–344, 2008. doi: 10.1016/j.jvolgeores.2008.
03.009.
[158] E. Trasatti et al., “The 2004–2006 uplift episode at Campi
Flegrei caldera (Italy): Constraints from SBAS-DInSAR ENVISAT data and Bayesian source inference,” Geophys. Res. Lett.,
vol. 35, no. 7, pp. 1–6, 2008. doi: 10.1029/2007GL033091.
[159] D. Massonnet et al., “The displacement field of the landers
earthquake mapped by radar interferometry,” Nature, vol. 364,
no. 6433, p. 138, 1993. doi: 10.1038/364138a0.
[160] G. Peltzer and P. Rosen, “Surface displacement of the 17 May
1993 Eureka valley, California, earthquake observed by SAR interferometry,” Science, vol. 268, no. 5215, pp. 1333–1336, 1995.
doi: 10.1126/science.268.5215.1333.
[161] V. B. H. (Gini) Ketelaar, Satellite Radar Interferometry (Remote
Sensing and Digital Image Processing), vol. 14. The Netherlands: Springer-Verlag, 2009.
[162] X. X. Zhu and R. Bamler, “Let’s do the time warp: Multicomponent nonlinear motion estimation in differential SAR tomography,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 4, pp. 735–739,
2011. doi: 10.1109/LGRS.2010.2103298.
[163] S. Gernhardt and R. Bamler, “Deformation monitoring of single buildings using meter-resolution SAR data in PSI,” ISPRS
J. Photogram. Remote Sens., vol. 73, pp. 68–79, Sept. 2012. doi:
10.1016/j.isprsjprs.2012.06.009.
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

[164] S. Montazeri, X. X. Zhu, M. Eineder, and R. Bamler, “Threedimensional deformation monitoring of urban infrastructure
by tomographic SAR using multitrack TerraSAR-x data stacks,”
IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 6868–6878,
2016. doi: 10.1109/TGRS.2016.2585741.
[165] K. Ichikawa and A. Hirose, “Singular unit restoration in InSAR
using complex-valued neural networks in the spectral domain,”
IEEE Trans. Geosci. Remote Sens., vol. 55, no. 3, pp. 1717–1723,
2017. doi: 10.1109/TGRS.2016.2630719.
[166] R. Yamaki and A. Hirose, “Singular unit restoration in interferograms based on complex-valued markov random field
model for phase unwrapping,” IEEE Geosci. Remote Sens.
Lett., vol. 6, no. 1, pp. 18–22, 2009. doi: 10.1109/LGRS.2008.
2005588.
[167] K. Oyama and A. Hirose, “Adaptive phase-singular-unit restoration with entire-spectrum-processing complex-valued neural
networks in interferometric SAR,” Electron. Lett., vol. 54, no. 1,
pp. 43–44, 2018. doi: 10.1049/el.2017.2680.
[168] S. Valade et al., “Towards global volcano monitoring using
multisensor sentinel missions and artificial intelligence: The
MOUNTS monitoring system,” Remote Sens., vol. 11, no. 13, pp.
1528, 2019. doi: 10.3390/rs11131528.
[169] G. Costante, T. Ciarfuglia, and F. Biondi, “Towards monocular digital elevation model (DEM) estimation by convolutional
neural networks-application on synthetic aperture radar images,” 2018, arXiv:1803.05387.
[170] C. Schwegmann, W. Kleynhans, J. Engelbrecht, L. M
dakane,
and R. Meyer, “Subsidence feature discrimination using deep convolutional neural networks in synthetic aperture radar imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2017,
pp. 4626–4629. doi: 10.1109/IGARSS.2017.8128031.
[171] N. Anantrasirichai, F. Albino, P. Hill, D. Bull, and J. Biggs, “Detecting volcano deformation in InSAR using deep learning,”
2018, arXiv:1803.00380.
[172] N. Anantrasirichai, J. Biggs, F. Albino, and D. Bull, “A deep
learning approach to detecting volcano deformation from
satellite imager y using synthetic datasets,” Remote Sens.
Environ., vol. 230, p. 111,179, Sept. 2019. doi: 10.1016/j.rse.
2019.04.032.
[173] N. Anantrasirichai, J. Biggs, F. Albino, and D. Bull, “The application of convolutional neural networks to detect slow, sustained deformation in InSAR time series,” Geophys. Res. Lett.,
vol. 46, no. 21, pp. 11,850–11,858, 2019.
[174] F. Del Frate, M. Picchiani, G. Schiavon, and S. Stramondo,
“Neural networks and SAR interferometry for the characterization of seismic events,” in Proc. SPIE, 2010, p. 78290J. doi:
10.1117/12.867915.
[175] M. Picchiani, F. Del Frate, G. Schiavon, S. Stramondo, M. Chini,
and C. Bignami, “Neural networks for automatic seismic source
analysis from DInSAR data,” in Proc. SPIE, 2011, p. 81790K. doi:
10.1117/12.898575.
[176] S. Stramondo, F. Del Frate, M. Picchiani, and G. Schiavon,
“Seismic source quantitative parameters retrieval from InSAR data and neural networks,” IEEE Trans. Geosci. Remote
Sens., vol. 49, no. 1, pp. 96–104, 2011. doi: 10.1109/TGRS.
2010.2050776.
DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[177] J. Gao, Y. Ye, S. Li, Y. Qin, X. Gao, and X. Li, “Fast super-resolution 3D SAR imaging using an unfolded deep network,” in Proc.
IEEE Int. Conf. Signal, Inf. Data Process. (ICSIDP), 2019, pp. 1–5.
doi: 10.1109/ICSIDP47821.2019.9173392.
[178] C. Wu, Z. Zhang, L. Chen, and W. Yu, “Super-resolution for MIMO
array SAR 3-D imaging based on compressive sensing and deep
neural network,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens.,
vol. 13, pp. 3109–3124, 2020. doi: 10.1109/JSTARS.2020.3000760.
[179] A. Hirose, Complex-Valued Neural Networks (Studies in Computational Intelligence). Berlin: Springer-Verlag, 2012, vol. 400.
[180] G. Rongier, C. Rude, T. Herring, and V. Pankratius, “Generative
Modeling of InSAR Interferograms,” Earth Space Sci., vol. 6, no.
12, pp. 2671–2683, 2019. doi: 10.1029/2018EA000533.
[181] M. Schmitt and X. X. Zhu, “On the challenges in stereogrammetric fusion of SAR and optical imagery for urban areas,” Int. Arch.
Photogram. Remote Sens. Spatial Inf. Sci., vol. XLI-B7, pp. 719–722,
June 2016. doi: 10.5194/isprs-archives-XLI-B7-719-2016.
[182] Y. Wang, X. X. Zhu, S. Montazeri, J. Kang, L. Mou, and M.
Schmitt, “Potential of the ‘SARptical’ system,” presented at the
FRINGE, 2017.
[183] Y. Wang and X. X. Zhu, “The SARptical dataset for joint analysis
of SAR and optical image in dense urban area,” in Proc. IEEE Int.
Geosci. Remote Sens. Symp. (IGARSS), 2018, pp. 6840–6843. doi:
10.1109/IGARSS.2018.8518298.
[184] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, “A deep
learning framework for remote sensing image registration,” ISPRS J. Photogram. Remote Sens., vol. 145, pp. 148–164, Nov. 2018.
doi: 10.1016/j.isprsjprs.2017.12.012.
[185] N. Merkle, W. Luo, S. Auer, R. Müller, and R. Urtasun, “Exploiting deep matching and SAR data for the geo-localization accuracy improvement of optical satellite images,” Remote Sens., vol.
9, no. 6, p. 586, 2017. doi: 10.3390/rs9060586.
[186] S. Suri and P. Reinartz, “Mutual-information-based registration of TerraSAR-X and Ikonos imagery in urban areas,” IEEE
Trans. Geosci. Remote Sens., vol. 48, no. 2, pp. 939–949, 2010.
doi: 10.1109/TGRS.2009.2034842.
[187] F. Dellinger, J. Delon, Y. Gousseau, J. Michel, and F. Tupin, “SARSIFT: A SIFT-like algorithm for SAR images,” IEEE Trans. Geosci.
Remote Sens., vol. 53, no. 1, pp. 453–466, 2015. doi: 10.1109/
TGRS.2014.2323552.
[188] D. Abulkhanov, I. Konovalenko, D. Nikolaev, A. Savchik, E.
Shvets, and D. Sidorchuk, “Neural network-based feature point
descriptors for registration of optical and SAR images,” in Proc.
SPIE 10696, Tenth Int. Conf. Machine Vision (ICMV 2017), vol.
10696 2017, pp. 106960L. doi: 10.1117/12.2310085.
[189] M. A. Fischler and R. C. Bolles, “Random sample consensus: A
paradigm for model fitting with applications to image analysis
and automated cartography,” Commun. ACM, vol. 24, no. 6, pp.
381–395, 1981. doi: 10.1145/358669.358692.
[190] N. Merkle, S. Auer, R. Müller, and P. Reinartz, “Exploring the
potential of conditional adversarial networks for optical and
SAR image matching,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 6, pp. 1–10, 2018. doi: 10.1109/
JSTARS.2018.2803212.
[191] L. H. Hughes, N. Merkle, T. Burgmann, S. Auer, and M. Schmitt,
“Deep learning for SAR-optical image matching,” in Proc.

171

IGARSS 2019 – 2019 IEEE Int. Geosci. Remote Sens. Symp., pp.
4877–4880. doi: 10.1109/IGARSS.2019.8898635.
[192] M. Fuentes Reyes, S. Auer, N. Merkle, C. Henry, and M. Schmitt,
“SAR-to-optical image translation based on conditional generative adversarial networks-optimization, opportunities and
limits,” Remote Sens., vol. 11, no. 17, p. 2067, 2019. doi: 10.3390/
rs11172067.
[193] W. Yao, D. Marmanis, and M. Datcu, “Semantic segmentation
using deep neural networks for SAR and optical image pairs,”
presented at the Big Data from Space, 2017.
[194] N. Audebert, B. Le Saux, and S. Lefevre, “Semantic segmentation
of earth observation data using multimodal and multi-scale
deep networks,” in Computer Vision–ACCV 2016 (Lecture Notes
in Computer Science), vol. 10111, S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, Eds. Cham: Springer-Verlag, 2017, pp. 180–196.
[195] M. Schmitt, L. Hughes, M. Körner, and X. X. Zhu, “Colorizing Sentinel-1 SAR images using a variational autoencoder conditioned
on Sentinel-2 imagery,” Int. Arch. Photogram. Remote Sens. Spatial
Inform. Sci., vol. 42, no. 2, pp. 1045–1051, 2018. doi: 10.5194/isprsarchives-XLII-2-1045-2018.
[196] C. Bishop, “Mixture density networks,” Citeseer, Tech. Rep., 1994.
Accessed: Jan. 25, 2020. [Online]. Available: https://publications.
aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf
[197] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-toimage translation using cycle-consistent adversarial networks,”
in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2242–
2251. doi: 10.1109/ICCV.2017.244.
[198] L. H. Hughes and M. Schmitt, “A semi-supervised approach to
SAR-optical image matching,” ISPRS Ann. Photogram. Remote
Sens. Spatial Inform. Sci., vol. IV-2/W7, pp. 71–78, Sept. 2019.
doi: 10.5194/isprs-annals-IV-2-W7-71-2019.
[199] J. Zhao, Z. Zhang, W. Yao, M. Datcu, H. Xiong, and W. Yu, “OpenSARUrban: A Sentinel-1 SAR image dataset for urban interpretation,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens., vol.
13, pp. 187–203, 2020. doi: 10.1109/JSTARS.2019.2954850.
[200] X. Zhu et al., “So2Sat LCZ42: A benchmark dataset for global
local climate zones classification,” IEEE Geosci. Remote Sens.
Mag., vol. 8, no. 3, pp. 187–203, 2020. doi: 10.1109/MGRS.2020.
2964708.
[201] M. Neumann, A. S. Pinto, X. Zhai, and N. Houlsby, “In-domain
representation learning for remote sensing,” Nov. 2019, arXiv:
1911.06721.
[202] M. Schmitt, L. H. Hughes, C. Qiu, and X. X. Zhu, “SEN12MS A curated dataset of georeferenced multi-spectral Sentinel-1/2
imagery for deep learning and data fusion,” ISPRS Ann. Photogram. Remote Sens. Spatial Inform. Sci., vol. IV-2/W7, pp. 153–
160, Sept. 2019. doi: 10.5194/isprs-annals-IV-2-W7-153-2019.
[203] M. Schmitt, L. H. Hughes, and X. X. Zhu, “The SEN1-2 dataset
for deep learning in SAR-Optical data fusion,” in Proc. ISPRS
Ann. Photogram. Remote Sens. Spatial Inf. Sci., pp. 141–146, 2018.
[204] J. Shermeyer et al., “Spacenet 6: Multi-sensor all weather mapping dataset,” 2020, arXiv:2004.06500.
[205] X. Liu, L. Jiao, and F. Liu, “PolSF: Polsar image dataset on San
Francisco,” 2019, arXiv:1912.07259.
[206] Y. Cao, Y. Wu, P. Zhang, W. Liang, and M. Li, “Pixel-wise Polsar
image classification via a novel complex-valued deep fully con-

172

volutional network,” Remote Sens., vol. 11, no. 22, p. 2653, 2019.
doi: 10.3390/rs11222653.
[207] T. Ross, S. Worrell, V. Velten, J. Mossing, and M. Bryant, “Standard SAR ATR evaluation experiments using the MSTAR public
release data set,” in Proc. Algorithms Synth. Aperture Radar Imag.,
1998. doi: 10.1117/12.321859.
[208] F. Gao, Y. Yang, J. Wang, J. Sun, E. Yang, and H. Zhou, “A deep
convolutional generative adversarial networks (DCGANS)based semi-supervised method for object recognition in synthetic aperture radar (SAR) images,” Remote Sens., vol. 10, no. 6,
p. 846, 2018. doi: 10.3390/rs10060846.
[209] B. Li, B. Liu, L. Huang, W. Guo, Z. Zhang, and W. Yu, “OpenSARShip 2.0: A large-volume dataset for deeper interpretation
of ship targets in Sentinel-1 imagery,” in Proc. SAR Big Data Era:
Models, Methods Appl. (BIGSARDATA), Nov. 2017, pp. 1–5. doi:
10.1109/BIGSARDATA.2017.8124929.
[210] L. Huang et al., “OpenSARShip: A dataset dedicated to Sentinel-1 ship interpretation,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 1, pp. 195–208, Jan. 2018. doi:
10.1109/JSTARS.2017.2755672.
[211] Y. Wang, C. Wang, H. Zhang, Y. Dong, and S. Wei, “A SAR dataset of ship detection for deep learning under complex backgrounds,” Remote Sens., vol. 11, no. 7, p. 765, Mar. 2019. doi:
10.3390/rs11070765.
[212] Y. Wang, X. X. Zhu, B. Zeisl, and M. Pollefeys, “Fusing meterresolution 4-D InSAR point clouds and optical images for semantic urban infrastructure monitoring,” IEEE Trans. Geosci.
Remote Sens., vol. 55, no. 1, pp. 14–26, Jan. 2017. doi: 10.1109/
TGRS.2016.2554563.
[213] I. D. Stewart and T. R. Oke, “Local climate zones for urban temperature studies,” Bull. Amer. Meterol. Soc., vol. 93, no. 12, pp.
1879–1900, 2012. doi: 10.1175/BAMS-D-11-00019.1.
[214] H. Xiyue, W. Ao, Q. Song, J. Lai, H. Wang, and F. Xu, “FUSARship: A high-resolution SAR-AIS matchup dataset of Gaofen-3
for ship detection and recognition,” Sci. China Inf. Sci., vol. 68,
2020, Art. no. 140303. doi: 10.1007/s11432-019-2772-5.
[215] S. Xian, W. Zhirui, S. Yuanrui, D. Wenhui, Z. Yue, and F. Kun,
“Air-sarship–1.0: High resolution SAR ship detection dataset,”
J. Radars, vol. 8, no. 6, pp. 852–862, 2019.
[216] P. Yu, A. Qin, and D. Clausi, “Unsupervised polarimetric SAR
image segmentation and classification using region growing
with edge penalty,” IEEE Trans. Geosci. Remote Sens., vol. 50, no.
4, pp. 1302–1317, 2012. doi: 10.1109/TGRS.2011.2164085.
[217] D. Hoekman and M. Vissers, “A new polarimetric classification
approach evaluated for agricultural crops,” IEEE Trans. Geosci.
Remote Sens., vol. 41, no. 12, pp. 2881–2889, 2003. doi: 10.1109/
TGRS.2003.817795.
[218] W. Yang, D. Dai, J. Wu, and C. He, “Weakly supervised polarimetric SAR image classification with multi-modal Markov aspect model,” in Proc. ISPRS, 2010.
[219] C. O. Dumitru, G. Schwarz, and M. Datcu, “SAR image land
cover datasets for classification benchmarking of temporal changes,” IEEE J. Select. Topics Appl. Earth Observ. Remote
Sens., vol. 11, no. 5, pp. 1571–1592, May 2018. doi: 10.1109/
JSTARS.2018.2803260.
GRS
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021


Forward-Looking Ground-Penetrating Radar

Subsurface target imaging and detection: A review
DAVIDE COMITE, FAUZIA AHMAD, MOENESS G. AMIN, AND TRAIAN DOGARU

Detection of shallow-buried, in-road threats using a
forward-looking (FL) ground-penetrating radar (GPR)
system has attracted significant research interest in the last
decade. An FL-GPR mounted on a moving platform can
provide standoff target detection and imaging. This enables
real-time sensing and situation awareness over large ground
areas. The main challenge facing this sensing technology
is high false-alarm rates due to scattering arising from air–
ground interface roughness and subsurface clutter.
In this article, we present a comprehensive review of the
state-of-the-art techniques that address the unique challenges associated with FL-GPR technology. Specifically, we
focus on array-based FL-GPR systems and consider both
electromagnetic modeling and signal processing for problem formulation and solutions. Image formation methods
and target detection approaches are discussed, highlighting
their offerings and shortcomings in providing reliable system performance.
Digital Object Identifier 10.1109/MGRS.2020.3048368
Date of current version: 9 February 2021


THE CHALLENGES OF FORWARD-LOOKING GROUND-PENETRATING RADAR
In recent years, radar imaging and detection of shallow-buried targets have garnered much interest due to the need for
reliable subsurface investigations in a variety of applications,
including real-time security, military situational awareness,
and humanitarian demining of unexploded ordnance over
large areas [1]–[8]. Although a broad class of sensing modalities, including seismic and radiometric, have been proposed
in the literature for the detection of buried targets [9]–[11],
electromagnetic waves remain a viable option (see, e.g., [12])
owing to their various attributes, such as superior ground
penetration, sensitivity to arbitrarily shaped plastic targets,
and robustness to different soil conditions. In particular, the
FL-GPR technology is gaining impetus as it enables sensing
from a standoff distance.
A major motivation for the development of early FL-GPR systems has been their terrain-mapping capabilities, used to clear roads of explosive hazards. Vehicle-borne, down-looking radar systems previously employed in this application lacked the standoff detection range that would enable spotting of the hazard before the vehicle drove over it. By pointing the antenna array to look ahead of the vehicle, FL-GPR systems are able to achieve a reasonable lead detection time before reaching the actual explosive hazard location.
However, performance of an FL-GPR system is highly
impacted by rough surface clutter (see, e.g., [13]). Depending on the soil conditions and degree of surface roughness,
the returns from the ground interface can dominate the
radar measurements and obscure the target response. This
leads to significant uncertainty in the assessment and interpretation of the attained radar images.
Compared to its downward-looking counterpart, wherein the antennas are either coupled or very close to the ground surface (see, e.g., [3], [4], and [14]–[16]), an FL-GPR system employs oblique and near-grazing incidence sensing to enable target detection from a safe standoff distance. In this case and depending on the roughness profile of the illuminated surface, most of the energy would be forward-scattered along the specular direction, yielding reduced returns from the air–ground interface.
In practice, however, even if the backscattered echo from the rough surface is relatively weak, the intensity of the signal returns from concealed targets can also be quite low. This renders target detection and localization challenging, especially in the case of nonmetallic objects. Therefore, proper design of both imaging and detection approaches becomes fundamental to improving the performance of the FL configuration. To compensate for the loss of energy due to the signal bounce at the ground interface, synthetic aperture radar (SAR)-based focusing is typically employed [4], [17]–[21], wherein coherently combining the returns at multiple antenna positions focuses the energy to an image pixel, thereby improving weak target representations.
Several approaches based on electromagnetic modeling
and statistical detection analysis have been proposed for
FL-GPR (see, e.g., [19], [22], and [23] and the references
therein). In this article, we focus on array-based FL-GPR systems and provide a comprehensive review of the state-of-the-art radar imaging and detection methods, highlighting their
advantages and limitations. We attempt to group advances
in FL-GPR based on the nature of the data employed, system
prototyping, properties of the imaging scene assumed, and
principal signal processing algorithms undertaken.
SOLUTION TO THE FORWARD PROBLEM
A controlled solution for the forward-scattering problem can
be a powerful tool to assess and characterize the ground interface contributions, predict the target signature, and design
and validate image formation methods, including clutter
mitigation approaches. This would require determining the
scattered field from the illuminated scene that essentially
comprises the targets buried in a dielectric half-space with
a rough surface profile. For simplicity and without loss of
generality, the involved media are assumed homogeneous.
Under known materials and imaging geometry, in the
presence of a flat ground interface and considering targets
represented by canonical simple shapes, the underlying scattering problem can be analytically characterized by solving
Maxwell’s equations (see, e.g., [12], [24], and [25]). However, in most cases of practical interest, those assumptions
and prior information do not hold, and more realistic and
flexible approaches are needed. These approaches call for
implementing a full-wave solution of the scattering problem numerically (see, e.g., [26] and the references therein) or
collecting experimental data. In the next sections, we summarize key FL-GPR systems used in data collection and also
discuss the numerical approaches used for data modeling.
FORWARD-LOOKING GROUND-PENETRATING RADAR PROTOTYPES AND EXPERIMENTAL APPROACHES
Approaches based on controlled experiments, though complicated and costly, are valuable in understanding phenomenology and can provide real scattering data. This, however,
requires the availability of specific facilities and preparation of experimental campaigns. Toward this end, different
prototypes and radar systems have been developed for data
measurements, which are summarized as follows.
In [27]–[29], a prototype of a high-resolution GPR system was designed and deployed by SRI International. This
system is a stepped-frequency, fully polarimetric radar, operating over the 0.3–3-GHz frequency band. The prototype
was conceived to operate as an FL SAR system, providing a
ground–surface resolution of about 5 cm. The experimental
activity was originally designed to define optimal FL-GPR parameters and support image processing for the standoff target
detection of concealed antitank mines. Reference [27] was the
first publication reporting experimental data collection and
processing with an FL SAR system for GPR applications. It
provided insights into the signal-to-clutter ratio (SCR) of shallow-buried targets as well as key features of clutter statistics.
Time-frequency analysis was applied in [30] using an
FL-GPR system to detect plastic targets buried under a
rough ground surface. Different quadratic time-frequency
distributions were considered to characterize and interpret
the scattering from both the targets and the rough surface.
This work employed experimental data described in [27]
and proposed a target detector based on the signal ambiguity function, which showed superior detection performance
over a conventional detector.
In [31], an FL-GPR operating from 0.76 to 3.8 GHz was developed by Planning Systems Incorporated (PSI). This FL-GPR system is a broadband, stepped-frequency, continuous-wave (CW) system that performs digital phase detection of the CW echoes on a fixed number of receiving channels. Experimental campaigns were carried out to collect data in the field, accounting for both metallic and plastic objects. A near-field delay-and-sum beamforming algorithm (more details are provided in the “Image Formation” section) was implemented to provide focused images of the considered area. To meet the system bandwidth requirements, Archimedean spiral antennas were used, each housed in a cavity-backed structure.
The U.S. Army Combat Capabilities Development Command Army Research Laboratory (ARL) FL-GPR prototype,
called the synchronous impulse reconstruction (SIRE) radar [see
Figure 1(a)], is an ultrawideband (UWB) radar based on
the transmission of short pulses [32]. For the imaging and
detection of buried targets, the system employs a physical
array of 16 receiving antennas, which provide a long aperture for high cross-range resolution. The transmitted pulse
has a 0.3–3-GHz bandwidth, which represents a tradeoff
between fine down-range resolution and the ability to penetrate soil depths of a few centimeters.
To increase the signal-to-noise ratio, the baseband receiver integrates radar returns from multiple pulses prior to
processing for target detection. The system hardware was
based on commercially available integrated circuits, which
provided a low-cost and lightweight digitizing scheme. In
[32], both simulations and measurements in the field were
conducted considering on-surface metallic targets; the possibility of penetrating foliage and weather was experimentally assessed.
Following the design and testing of SIRE, ARL researchers proposed a new UWB radar system, called the spectrally agile frequency-incrementing reconfigurable (SAFIRE) radar system [34]. SAFIRE was designed to provide an unprecedented capability of adapting the operating frequency to the surrounding electromagnetic environment, thereby lowering the susceptibility of the system to radio-frequency (RF) interference. To this end, SAFIRE employed stepped-frequency waveforms and sought to eliminate system transmissions that are likely to interfere with nearby RF sources [35]. Currently, such a feature is considered essential for FL-GPRs operating in congested RF environments.
The SAFIRE operating band ranges from 300 to 2,000 MHz,
with a minimum frequency step-size of 1 MHz. The SAFIRE
system can be configured in either an FL or side-looking
orientation and is equipped with a uniform linear array
made of 16 Vivaldi receiving antennas and two quad-ridge
horn transmit antennas. The latter are placed above the
ends of the receiver array. The sequential firing of the two
transmitters provided orthogonal waveforms, which established a multiple-input, multiple-output (MIMO) configuration with an extended virtual aperture for improved
cross-range resolution.
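To make the notion of an extended virtual aperture concrete, the following minimal Python sketch forms the sum co-array of a two-transmitter, 16-receiver MIMO layout. The element spacing (0.125 m), the placement of the transmitters exactly at the two ends of the receive array, and the helper name `virtual_array` are illustrative assumptions, not SAFIRE design values.

```python
import numpy as np

def virtual_array(tx_positions, rx_positions):
    """Return the unique sum co-array positions x_tx + x_rx (hypothetical helper)."""
    tx = np.asarray(tx_positions, dtype=float)
    rx = np.asarray(rx_positions, dtype=float)
    sums = (tx[:, None] + rx[None, :]).ravel()   # one entry per Tx-Rx pair
    return np.unique(np.round(sums, 9))          # merge overlapping pairs

# Assumed geometry: 16 receivers on a uniform line, transmitters at both ends.
rx = np.arange(16) * 0.125            # receiver x-coordinates in meters (assumed spacing)
tx = np.array([rx[0], rx[-1]])        # two transmitters at the array ends (assumed)

virt = virtual_array(tx, rx)
physical_span = rx[-1] - rx[0]
virtual_span = virt[-1] - virt[0]
print(len(virt), physical_span, virtual_span)
```

Under these assumptions, the 32 transmit-receive pairs collapse to 31 distinct virtual positions (one pair overlaps), and the virtual span is twice the physical receive aperture in the sum co-array convention.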
Experimental FL-GPR data, collected by the Army Look Ahead Radar Impulse (for) Countermine (ALARIC) vehicle-borne UWB impulse radar system, were used in [36] to provide the first assessment of coherent integration through exploitation of the platform movement. The system employs an impulse generator at approximately 950 MHz and has a 300–3,000-MHz bandwidth (down-range resolution of approximately 5 cm). A pair of transverse electromagnetic horn transmit antennas, placed at two ends of a 2-m-wide receiver array, was considered to provide good pulse fidelity while minimizing the reflected power of the transmitter. The receiver array comprised 16 identical Vivaldi notch antennas, which were selected because of their compact size and low cross coupling between elements.
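The link between bandwidth and down-range resolution can be checked with the standard relation δ = c / (2B√ε_r). The short sketch below evaluates it for the 300–3,000-MHz band quoted above; the function name and the optional relative-permittivity argument are illustrative additions, and free-space propagation is assumed by default.

```python
C0 = 299_792_458.0  # speed of light in vacuum (m/s)

def down_range_resolution(bandwidth_hz, rel_permittivity=1.0):
    """Nominal down-range resolution c / (2 B sqrt(eps_r)) in meters."""
    return C0 / (2.0 * bandwidth_hz * rel_permittivity ** 0.5)

# ALARIC band quoted in the text: 300 MHz to 3,000 MHz.
bandwidth = 3.0e9 - 0.3e9
res_free_space = down_range_resolution(bandwidth)
print(f"{100.0 * res_free_space:.2f} cm")
```

For a 2.7-GHz bandwidth this gives a value between 5 and 6 cm in free space, the same order as the resolution figure quoted for the system.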
Using physical array measurements from multiple platform positions, it was shown that conventional synthetic
aperture processing can be used to form FL-GPR images of
good quality, though at the expense of the lateral position
estimates of the targets within the illuminated scene. The
preliminary results also demonstrated the possibility of successfully detecting metallic targets buried near the surface.
More recently, the authors in [33] proposed an experimental test for the assessment of imaging and detection performance by means of an FL-GPR under realistic conditions. Test facilities at Ingegneria dei Sistemi S.p.A., headquartered in Italy [see Figure 1(b)], are equipped with a moving platform that can support two or more antennas, pointing toward a test site that comprises several resolution cells of the FL-GPR system. The test field allows the inclusion of heterogeneous soils. Data were gathered in the frequency range from 0.4 to 2 GHz using a transmit and a receive horn antenna, both connected to a network analyzer.
The antennas, spaced 93.5 cm apart and tilted at a 45° angle, were mounted on the moving platform at a distance of 1.42 m from the air–soil interface. The platform was moved along straight tracks with a constant spatial step of size 0.02 m. Scanning lines of about 8 m were used to collect data over a sandy portion of the test site. The experiments were performed after intense rainfall, which reproduced challenging operational conditions. This resulted in a nonhomogeneous background medium that consisted of two layers: the upper layer, with thickness of a few centimeters, was dry sand while the deeper layer comprised wet sand. This work also discussed the performance achievable with a conventional microwave tomographic approach to focus FL-GPR data.

FIGURE 1. (a) The system and field test of the U.S. ARL. (Source: [32].) (b) The test facilities at Ingegneria dei Sistemi S.p.A., Italy. (Source: [33].)

NUMERICAL APPROACHES
Numerical data obtained by means of a finite-difference time-domain (FDTD) method, modeling an FL-GPR system with two transmitters and 16 receivers, were presented in [18]. The modeled system has a frequency bandwidth from 0.3 to 1.5 GHz. In particular, a near-field army FDTD software package was developed at the ARL for synthesizing FL-GPR numerical data accounting for realistic sensing conditions.
More information on the modeling approach and targets used for the analysis can be found in [37] and [38]. An example of FL-GPR-focused numerical data from [38] is reproduced in Figure 2; the images are formed over a horizontal plane in front of the transmitting and receiving antennas considering both metallic and plastic targets whose locations are specified in Figure 2(a). Both the flat ground interface [Figure 2(a)] and the rough surface [Figure 2(b)] are simulated. The latter was generated by assuming a random process model described by Gaussian statistics.
A 3D full-wave approach, based on a finite-difference frequency-domain method and optimized to provide real-time solutions, was proposed in [20] to model an FL-GPR on a moving platform and calculate the scattering from rough terrains located at large electrical distances from the antennas. For a synthetic aperture, the computational domain was reduced to a small subset of the observed region, and the surface clutter was determined by performing the simple multiplication of a precomputed impulse response matrix of the rough profile with a matrix characterizing the FL-GPR transmitted signal.
This approach significantly reduced the complexity through an efficient use of computational resources, thereby permitting the representation of lossy/frequency-dispersive soils and target-detection processing in real time. This is especially useful in scenarios where an experimental performance validation may incur a high cost and/or require significantly more resources.
The authors in [39] extended the real-time 3D simulation to the multiview case, considering a realistic velocity of the moving platform. The matrix-multiplication-based surface clutter computation in this case required an additional precomputed correction matrix of the moving platform measurement steps along the direction of motion. The method was tested via Monte Carlo simulations. In practice, the proposed simulation-based approach can be used to estimate the scattering from the rough surface profile, which can then be subtracted from the actual FL-GPR measurements. The resulting difference signals can then be processed for image formation and target detection.

FIGURE 2. The focused numerical data (in decibels) for a scene with size equal to 9 × 19 m: ground with (a) a flat surface and (b) a randomly rough surface characterized by a root mean square (rms) surface height equal to 0.8 cm and correlation length of 14.93 cm. Further details can be found in [38]. Rx: receive; Tx: transmit. (Source: [38].)
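The simulation-based clutter estimate described for [20] and [39], a precomputed surface response matrix multiplied by a matrix characterizing the transmitted signal and then subtracted from the measurements, can be illustrated with a toy frequency-domain sketch. All sizes, the random stand-in for the surface response, the Gaussian pulse spectrum, and the function names are hypothetical; the surface response is assumed to be known perfectly, which is an idealization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_freq = 16, 64   # hypothetical number of receivers and frequency steps

# Stand-in for the precomputed rough-surface response (receiver x frequency).
# In practice this would come from a full-wave solver, not random numbers.
H_surface = rng.standard_normal((n_rx, n_freq)) + 1j * rng.standard_normal((n_rx, n_freq))

# Matrix characterizing the transmitted signal: diagonal matrix of an assumed pulse spectrum.
spectrum = np.exp(-np.linspace(-2.0, 2.0, n_freq) ** 2)
S_tx = np.diag(spectrum)

def predict_clutter(h_surface, s_tx):
    """Surface clutter as a single matrix product (no solver run at measurement time)."""
    return h_surface @ s_tx

def subtract_clutter(measured, clutter_estimate):
    """Difference signals passed on to image formation and detection."""
    return measured - clutter_estimate

# Toy measurement: weak target response plus surface clutter.
target_response = 0.01 * (rng.standard_normal((n_rx, n_freq)) + 0j)
clutter = predict_clutter(H_surface, S_tx)
measured = target_response + clutter
difference = subtract_clutter(measured, clutter)
```

In this idealized setting the difference signals equal the target response; with an imperfect surface estimate, a clutter residual would remain.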


IMAGE FORMATION
Once the scattering data have been collected or generated
by solving the forward problem, a postprocessing procedure is needed to produce an image of the illuminated
scene [12], [40]. In most cases, when the detection of concealed targets is of interest, the image is 2D and formed
over a horizontal plane within an area ahead of the moving
FL system [see Figures 2(a) and (b)]. Although the height of
the 2D image can be arbitrarily chosen, the capability of
penetrating lossy soils at microwave frequencies is on the
order of 2–10 cm, which is comparable to the achievable
resolution. Therefore, varying the height by a few centimeters will not significantly affect the image quality and
target-detection capability.
Several image formation approaches have been proposed
in the literature, with a majority being simple adaptations of
conventional algorithms used for focusing SAR data. More
involved strategies based on electromagnetic formulations
of the problem have been presented to account for the presence of the dielectric interface and near-field conditions
arising due to shorter distances between the antennas and
imaging region of interest. In the following sections, we give
an overview of these methods.
MIGRATION
Among the most well-known image formation algorithms,
migration has been broadly used to focus GPR data. Migration is a family of imaging techniques that originate from
the seismic literature [41], [42]. Over the years, this class of
algorithms has been adapted within radar imaging frameworks, including SAR and GPR (see, e.g., [43] and [44]).
From a practical viewpoint, the algorithm essentially operates on the scattered field at the receiver to compensate for the different delays encountered by the signal generated by point-like scatterers, which are illuminated within a certain time interval during the movement of the system (the FL-GPR platform in this case).
In radar imaging, the migration algorithm is sometimes likened to beamforming approaches since both essentially compensate for the hyperbolic patterns representing
raw data in a time-range scattering diagram. Image reconstruction in terms of migrated data can be achieved by numerically implementing a double integral function of time
and range, which includes the scattered field and migration
operator (see, e.g., [45] and [46]). A number of contributions on the application of migration and beamforming
algorithms to GPR data have been proposed. A comprehensive review of these approaches can be found in [17].
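The delay-compensation idea shared by migration and delay-and-sum beamforming can be sketched in a few lines of Python. The example below is a simplified time-domain illustration, not the algorithm of any cited work: it assumes monostatic antennas, free-space propagation (no refraction at the air–ground interface), an ideal point scatterer, and a Gaussian pulse, and all geometry values and function names are made up for the demonstration.

```python
import numpy as np

C0 = 299_792_458.0  # propagation speed, free space assumed (m/s)

def simulate_point_target(ant_xy, target_xy, t, pulse_width):
    """Monostatic echoes of an ideal point scatterer (Gaussian pulse, unit amplitude)."""
    ranges = np.linalg.norm(ant_xy - target_xy, axis=1)
    delays = 2.0 * ranges / C0
    return np.exp(-((t[None, :] - delays[:, None]) / pulse_width) ** 2)

def delay_and_sum(data, ant_xy, t, grid_x, grid_y):
    """Sum each trace at the two-way delay from its antenna to every pixel."""
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="xy")
    image = np.zeros(gx.shape)
    for trace, (ax, ay) in zip(data, ant_xy):
        delays = 2.0 * np.hypot(gx - ax, gy - ay) / C0
        samples = np.interp(delays.ravel(), t, trace, left=0.0, right=0.0)
        image += samples.reshape(gx.shape)
    return image

# Assumed geometry: 21 antenna positions on a 2-m line, target 5 m ahead.
ant_xy = np.column_stack([np.linspace(-1.0, 1.0, 21), np.zeros(21)])
target_xy = np.array([0.3, 5.0])
t = np.arange(0.0, 60e-9, 0.02e-9)          # fast-time axis (s)
data = simulate_point_target(ant_xy, target_xy, t, pulse_width=0.3e-9)

grid_x = np.linspace(-1.0, 1.0, 41)         # cross-range pixels (m)
grid_y = np.linspace(4.0, 6.0, 41)          # down-range pixels (m)
image = delay_and_sum(data, ant_xy, t, grid_x, grid_y)

iy, ix = np.unravel_index(np.argmax(image), image.shape)
peak_xy = (grid_x[ix], grid_y[iy])
print(peak_xy)
```

The hyperbolic delay pattern of the target across antenna positions is aligned only at the true pixel, so the summed image peaks there; other pixels accumulate misaligned samples.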
MICROWAVE TOMOGRAPHY
GPR imaging methods based on an electromagnetic formulation constitute a so-called inverse problem [12], [47]. Mathematically, a solution to the direct problem exists that is unique and has a continuous dependence on the data (see, e.g., [40] and [48]). The problem becomes ill posed when the uniqueness of the solution and/or continuity of its data dependence do not hold. The latter implies that even a small error in the scattered field (e.g., the presence of additive thermal noise) can cause a considerable error in the reconstruction of the background dielectric characteristics. In practice, a regularization is applied to solve the inverse problem [48]. The main objective of the numerous methods that exist to regularize the inverse problem is to give up an ideal solution in exchange for suitable robustness in the results.
Imaging procedures based on a linear solution of the scattering equation have been shown to be simple and particularly suitable for the processing of GPR data [4], [49]–[51], including FL configurations [23], [33]. These procedures are mainly based on the Born approximation (BA) (see, e.g., [40] and [49] and the references therein), which essentially approximates the internal field of a dielectric object with the incident field, the latter being a known term.
By suitably defining the Green’s function of the problem
[12], the electromagnetic formulation of the scattering field
based on a linear solution allows near-field consideration
consistent with the nature of the illumination. We can also
describe in the formulation a flat air–soil interface by defining a Green’s function for multilayered media.
Methods based on the inversion of the linear scattering
equation are often referred to as microwave tomography approaches [49], which essentially consist of retrieving the unknown profile of a dielectric object, i.e., the contrast function, from the knowledge of the scattered field collected at
the receiving antenna. The contrast function is defined as
the relative difference between the (complex) permittivity
of the target and that of the reference propagation scenario
(free space, in the case at hand).
By modeling the transmitting antennas as vertically oriented Hertzian dipoles and measuring only the VV-polarization scattered field from the investigation domain D, the linear relationship under BA for shallow-buried targets can be expressed as [12], [23]

$$E_s(\mathbf{r}_r,\omega) = -j k_b^2\,\omega\mu_0\,\hat{\mathbf{z}}_0 \cdot \iint_D \mathbf{G}(\mathbf{r},\mathbf{r}_r,\omega) \cdot \left[\mathbf{G}(\mathbf{r},\mathbf{r}_t,\omega) \cdot \hat{\mathbf{z}}_0\right] \chi(\mathbf{r})\,d\mathbf{r}, \qquad (1)$$

where $E_s$ is the VV-polarized scattered field corresponding to angular frequency $\omega$ collected at point $\mathbf{r}_r$, $\chi$ is the unknown scene reflectivity, $\mathbf{G}$ is the free-space dyadic Green’s function, $k_b = \sqrt{\varepsilon_r}\,k_0$ is the wavenumber in the medium, and $k_0 = \omega\sqrt{\varepsilon_0\mu_0}$ is the free-space wavenumber. The vectors $\mathbf{r}_r$ and $\mathbf{r}_t$ represent the positions of the receive and transmit antennas, respectively; $\mathbf{r}$ denotes a generic point in the image area; and $\hat{\mathbf{z}}_0$ is the unit vector along the vertical direction. The operator “$\cdot$” in (1) represents the dyadic product and is implemented as the usual product between a 3 × 3 matrix and a 3 × 1 vector.
To generate the image, (1) is discretized by means of a conventional method-of-moments approach (i.e., implementing a point-matching procedure) [12]. To limit the computational burden, the linear problem can be simply solved by applying the adjoint operator, which is also known as the backpropagation algorithm (BPA) [52], and solving for the unknown scene reflectivity. That is,

$$\boldsymbol{\chi} = \mathbf{L}_{zz}^{*}\,\mathbf{E}_s^{z}, \qquad (2)$$

where $\mathbf{L}_{zz}^{*}$ is the adjoint of the discretized linear operator in (1), and $\mathbf{E}_s^{z}$ and $\boldsymbol{\chi}$ are stacked vectors representing the collected scattered field data and discretized version of the unknown scene reflectivity, respectively. The spatial map defined by the magnitude of $\boldsymbol{\chi}$ is the tomographic image of D.
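The adjoint (backpropagation) step can be illustrated numerically with a simplified operator. The sketch below is an assumption-laden stand-in for the discretized model: it replaces the dyadic Green’s function with a scalar free-space one, uses a 2D monostatic geometry with point-matched pixels, and ignores the air–ground interface. The function name `born_operator` and all geometry values are hypothetical.

```python
import numpy as np

C0 = 299_792_458.0  # free-space propagation speed (m/s)

def born_operator(ant_xy, freqs, pix_xy):
    """Scalar Born operator: rows index (antenna, frequency), columns index pixels."""
    dist = np.linalg.norm(ant_xy[:, None, :] - pix_xy[None, :, :], axis=2)
    k = 2.0 * np.pi * freqs / C0
    r = dist[:, None, :]                                   # (n_ant, 1, n_pix)
    green = np.exp(-1j * k[None, :, None] * r) / (4.0 * np.pi * r)
    # Monostatic case: the receive and transmit Green's functions coincide.
    op = (k[None, :, None] ** 2) * green ** 2
    return op.reshape(len(ant_xy) * len(freqs), len(pix_xy))

# Assumed setup: 16 antenna positions, stepped frequencies from 0.3 to 3 GHz.
ant_xy = np.column_stack([np.linspace(-1.0, 1.0, 16), np.zeros(16)])
freqs = np.linspace(0.3e9, 3.0e9, 55)
gx, gy = np.meshgrid(np.linspace(-1.0, 1.0, 41), np.linspace(4.5, 5.5, 21), indexing="xy")
pix_xy = np.column_stack([gx.ravel(), gy.ravel()])

L = born_operator(ant_xy, freqs, pix_xy)

# Noise-free data from a single point target placed on the pixel nearest (0, 5) m.
target_idx = int(np.argmin(np.linalg.norm(pix_xy - np.array([0.0, 5.0]), axis=1)))
chi_true = np.zeros(pix_xy.shape[0], dtype=complex)
chi_true[target_idx] = 1.0
E_s = L @ chi_true

# Backpropagation: apply the adjoint (conjugate transpose) of the operator to the data.
chi_bpa = L.conj().T @ E_s
peak_idx = int(np.argmax(np.abs(chi_bpa)))
print(pix_xy[peak_idx])
```

Applying the adjoint requires only a matrix-vector product, which is why it scales well to large investigation domains; the price is that the result is a matched-filter image rather than a true inverse.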
An alternative, computationally more demanding approach to regularize and solve (1) can be implemented using truncated singular value decomposition (TSVD) [48], [54]. To achieve robustness of the solution against noise and the uncertainties of the parameters of the reference scenario, the inversion is performed by implementing the following equation [23], [48]:

$$\boldsymbol{\chi} = \sum_{n=1}^{N} \frac{1}{\sigma_n} \left\langle \mathbf{E}_s^{z}, \mathbf{u}_n \right\rangle \mathbf{v}_n, \qquad (3)$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product, $\sigma_n$ denotes the singular values (sorted in decreasing order) of the linear operator $\mathbf{L}_{zz}$, $\mathbf{u}_n$ and $\mathbf{v}_n$ are the singular vectors of $\mathbf{L}_{zz}$, and N is the truncation index, whose choice should ensure a compromise between resolution and smoothness of the reconstruction and the stability of the solution against noise.
TSVD belongs to the class of inverse filtering methods
[54] and has frequently been applied to process GPR data
[33], [49]. A performance comparison between TSVD and
BPA for an FL-GPR was conducted in [23]. In Figure 3, we
reproduce the images generated by the two methods using
the near-field numerical data in [23]. The reconstruction
capabilities of both schemes were investigated by analyzing
the achievable resolution limits and considering the impact
of rough surface clutter on image quality. It was shown that
the two methods provide comparable imaging capabilities
with few differences.
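The role of the truncation index in TSVD can be seen on a small synthetic problem. The sketch below is not tied to any FL-GPR data set: it builds a hypothetical 40 × 40 operator with singular values decaying from 1 to 1e-8, adds weak noise to the data, and compares a truncated inversion with an untruncated one. The function name `tsvd_inverse` and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def tsvd_inverse(L, E, N):
    """Truncated SVD solution: sum over the N largest singular values of <E, u_n> v_n / sigma_n."""
    U, s, Vh = np.linalg.svd(L, full_matrices=False)
    coeffs = U[:, :N].conj().T @ E            # inner products <E, u_n>
    return Vh[:N].conj().T @ (coeffs / s[:N])

rng = np.random.default_rng(1)
n = 40

# Hypothetical ill-conditioned operator with prescribed singular values.
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
sigma = np.logspace(0, -8, n)
L = (Q1 * sigma) @ Q2.conj().T                # Q1 @ diag(sigma) @ Q2^H

# Unknown lying in the span of the leading right singular vectors, plus noisy data.
chi_true = Q2[:, :5] @ (rng.standard_normal(5) + 1j * rng.standard_normal(5))
noise = 1e-4 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
E = L @ chi_true + noise

chi_tsvd = tsvd_inverse(L, E, N=10)           # truncated: small singular values discarded
chi_full = tsvd_inverse(L, E, N=n)            # untruncated: noise divided by tiny sigma_n

err_tsvd = np.linalg.norm(chi_tsvd - chi_true) / np.linalg.norm(chi_true)
err_full = np.linalg.norm(chi_full - chi_true) / np.linalg.norm(chi_true)
print(err_tsvd, err_full)
```

Keeping all singular values amplifies the noise by the reciprocal of the smallest ones, whereas truncation trades some resolution for stability, which is the compromise the truncation index controls.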
More specifically, a microwave inverse imaging approach can provide improved target reconstructions over BPA, in particular enhancing the response of weak targets. On the other hand, BPA provides smoother and cleaner images that are less affected by environmental clutter. In general, BPA is preferred in the case of a large investigation domain and when implementing multiview and multiaperture strategies [19], [36], [53] or multilook incoherent processing [44]. An example of an FL-GPR image achieved with BPA based on a multiaperture strategy, i.e., integration of a certain number of FL-GPR scans selected along the track of the sensor platform [53], is shown in Figure 4. An FL-GPR image based on real data, described in [28] and [29], is depicted in Figure 5. The crosses with label P10 point to the nominal locations of plastic mines buried at a depth of 10 cm. The image is strongly cluttered with contributions from the rough surface.

FIGURE 3. Reconstructed images for plastic and metallic targets, both on top of and buried below a rough surface with an rms height equal to 0.8 cm and correlation length of 14.93 cm [23]. The amplitude is normalized to the maximum and expressed in decibels over the interval [–50, 0]. (a) TSVD inversion with a truncation index of N = 180. (b) Adjoint inversion. The range between the strongest and weakest target is around 45 dB for the TSVD and nearly 48 dB for the adjoint method. (Source: [23].)

FIGURE 4. The normalized BPA tomographic reconstruction, on the decibel scale, achieved through the integration of sets of eight FL-GPR apertures. The processed numerical data are described in [37]. The true target positions are indicated with red crosses. (Source: [53].)

FIGURE 5. An FL-GPR image of plastic mines. The crosses point to the nominal buried locations of the targets. The label P10 denotes a plastic mine buried at a depth of 10 cm. As expected, the image is strongly cluttered by the contribution from the rough surface. The blank region indicates where a strong stake (fiducial) return has been masked from the image to make the mine returns more visible. (Source: [30].)

The aforementioned tomographic imaging methods are based on free-space approximation, neglecting the presence of the air-to-ground interface and assuming the propagation as occurring in a homogeneous dielectric medium. The performance of the approximate free-space tomographic imaging was contrasted in [55] with that of a tomographic algorithm that accounts for the presence of the actual half-space geometry. The latter implements the spectral representation of the dyadic Green’s function. Using numerical electromagnetic FL-GPR data, the authors in [55] demonstrated that a free-space approximation can lead to a loss of imaging resolution and degradation in the SCR, as compared to its half-space counterpart. The impact of the lower resolution was also observed in the estimated target statistics [53].

DATA-ADAPTIVE AND COMPRESSIVE SENSING METHODS
A data-adaptive approach for FL-GPR image formation was proposed in [44] (Figure 6). It is based on amplitude and phase estimation and rank-deficient robust Capon beamforming. There were 12 evenly spaced scans (each scan covering 2 m in the down range) used to form the entire image, covering 24 m in total. The amplitude- and phase-estimation algorithm in conjunction with the robust Capon beamformer provided a significantly enhanced image quality compared to BPA.

FIGURE 6. Real-data-based, single-look imaging results: (a) a BPA imaging result and (b) the results of a hybrid of amplitude- and phase-estimation algorithm and robust Capon beamformer. Axes: down range (m) and cross range (m), amplitude in decibels. (Source: [44].)

Compressive sensing (CS) methods can also be applied to exploit the intrinsic sparsity of the illuminated scene in terms of the number of buried targets. A CS approach was employed in [56] for scene reconstruction using measurements from a MIMO FL-GPR system. Assuming a linear model relating the measured data and the unknown scene reflectivity, the image formation can be posed as a solution to an inverse problem regularized by a sparsity-inducing norm. This framework permits scene reconstruction with spatial and temporal sampling at sub-Nyquist rates.
In real environments, even with few targets, there exists strong clutter that populates and subsequently degrades the quality of the reconstructed image. This is because the rough surface clutter in the FL-GPR can be distributed over the entire region. An FL-GPR image from [56], generated by processing real data of a shallow-buried metallic antitank landmine using the CS technique, is depicted in Figure 7. Clearly, without clutter suppression, it

FIGURE 7. A CS image of sparse data without clutter suppression. Axes: range (m) and azimuth (m). (Source: [56].)

Additionally, reflections generated by rocks and other
objects lying on the surface above the targets can be the
source of strong clutter or false alarms (see, e.g., [53], [65],
and [66]). Since the illuminated area in the FL-GPR usually
CLUTTER-MITIGATION STRATEGIES
extends beyond the image region where the targets reside,
The standoff-sensing capability of FL-GPR comes at the
strong clutter can also derive from nearby shrubbery, rocks,
expense of the energy backscattered by the illuminated
and other objects lying on the surface. Because of these factargets. The weak target responses are vulnerable to intertors, clutter-suppression approaches devised for the DL conference scattering arising from the air–ground interface
figuration may not directly apply to FL-GPR.
roughness and subsurface clutter. Therefore, it is imperative
Figure 8 shows an image from [56] obtained by applying
to eliminate or significantly reduce the clutter for effective
BPA to real data corresponding to a shallow-buried landand reliable target detection.
mine in a road 6 m wide, with rocks and shrubs populating
Over many years, considerable attention has been dethe roadside. The various types of clutter are clearly visible
voted to the suppression of clutter generated by the ground
in the image. More specifically, in addition to the clutter in
bounce in the down-looking
the image region, strong azimuth clutter and short-range
(DL) configuration, wherein
clutter are also visible. The former is due to large shrubs
the detection of objects burTHE STANDOFF-SENSING
and on-surface rocks on the side of the road, while the latter
ied at large depths (on the orCAPABILITY OF FL-GPR
is associated with ranges adjacent to the radar system that
der of tens of centimeters) is
cause returns with small propagation delays.
possible
[54],
[57]–[63].
Since
COMES AT THE EXPENSE
Some research efforts have been devoted to rough surthe
ground
bounce
in
DL
is
OF THE ENERGY
face
clutter characterization and reduction in FL-GPR (see,
typically
from
a
fixed
range
BACKSCATTERED BY THE
e.g., [20], [56], and [67] and the references therein). One of
and has the highest strength,
ILLUMINATED TARGETS.
the first attempts to characterize rough surface clutter in
it is conventionally removed
FL-GPR is documented in [68], where plane-wave timeby estimating and subtracting
domain scattering from a fixed target in the presence of
the ground return from the
a rough surface was numerically solved by means of an
measured signals or via time gating [64]. In FL-GPR sensFDTD algorithm. The authors examined the statistics of the
ing, however, the rough air–ground interface creates clutter
pulse scattered from the surface and applied conventional
that is essentially distributed over the entire area illumimatched filtering for target detection.
nated by the sensor.
A method based on the scattering
solution through physical optics was
proposed in [69]. The authors demonstrated that, by analyzing both
Clutter in
Buried
Reconstruction
scattering amplitude and phase as
Short-Range Clutter
Reconstruction Region
Landmine
Region
well as employing time-frequency
signal representations, it is possible
1
to suppress clutter and improve
0.9
–6
target-detection performance over
0.8
conventional approaches based on
–4
background subtraction or param0.7
eter analysis [70].
–2
0.6
An analytical approach was devel0
0.5
oped in [71] to examine the impact
0.4
of the rough surface on the detection
2
of buried targets in FL-GPR. This ap0.3
4
proach quantified the coherent and
0.2
incoherent components of the cross
6
0.1
section of buried targets using phys8
ical-optics approximation. The total
6
8
10
12
14
16
18
20
received signal from the targets and
Range (m)
surrounding clutter was determined
Azimuth Clutter
to consist of three components: a
coherent signal (whose phase is well
defined and can be tracked) correFIGURE 8. An FL-GPR image showing different types of clutter. The data are acquired by a
sponding to the target, an incohervehicle-mounted stepped-frequency FL-GPR virtual aperture radar, and BPA is used to generent signal (whose phase is random)
ate the image. (Source: [56].)
Azimuth (m)

is difficult to distinguish the target from the clutter in the
reconstructed image.

generated by the target, and an incoherent clutter contribution. As such, the problem of subsurface target detection can rely on the identification of a partially coherent broadband signal in the presence of noise.
This approach, however, would require the design of a coherent system, which is complicated and expensive. Further, it could fail not only in the presence of strong surface roughness profiles or inhomogeneities but also under weak target response (i.e., dielectric) when the useful signal can lose its partially coherent nature. Nonetheless, the main analytical approaches are based on physical-optics scattering and a Gaussian representation of the correlation function of the rough soil; these assumptions, however, may not represent all possible realistic conditions.
To overcome some intrinsic limitations of the analytical approaches and provide a more realistic prediction of back-scattering in FL-GPR systems, in both the presence and absence of buried targets, a full-wave solution based on an FDTD modeling of dispersive soil (i.e., described by a frequency-dependent permittivity) was proposed in [72]. This work also developed a statistical analysis of the rough-surface scattering, constituting one of the first attempts at the application of optimum hypothesis testing to solve the problem of the detection of radar returns from buried mines in the presence of rough surface clutter.
The effects of surface clutter on time-reversal-based FL-GPR imaging were investigated numerically in [73], where a large realistic scene consisting of landmines buried under a rough surface was considered. This work emphasized the role of the polarization of the incidence wave and impact of the surface parameters on the dynamic range of the radar images comprising both clutter and metallic/dielectric targets. The impact of target orientation was also considered therein.
Following a similar full-wave approach, the authors in [74] characterized clutter in the image domain and proposed a statistical polarimetric approach for the reduction of the rough surface clutter to improve the signal-to-background ratio. Specifically, the method was based on the analysis of the polarimetric coherence of the backscattered signal, which is assumed to be zero for the rough surface clutter and nonzero for human-made discrete targets.
A synthetic aperture near-field beamforming approach was used to reduce clutter for antitank mine detection in [31], which also proposed a statistical analysis of the signal and clutter contributions based on real data from metallic and plastic landmines. This work provided useful insights into the relative intensity of clutter and targeted echoes, discussing the challenging nature of the detection of plastic materials.
In [36], [37], [44], and [53], authors demonstrated the advantage of coherently integrating measurements corresponding to multiple consecutive platform positions for rough surface clutter reduction. The approach simply takes advantage of the coherent and static nature (when observed from different spatial positions) of the scattering generated by human-made targets with respect to the rough surface contribution.
In [67] and [75], an alternative approach, based on the coherence factor (CF), was proposed for clutter reduction. The performance of the CF-based approach was quantified in terms of the SCR in the image domain. The approach leveraged the matched filtering formulation of microwave near-field tomographic imaging to define the CF for a multiantenna FL-GPR system. The CF was used to generate a coherence map of the region of interest, which was then applied as a mask to the original tomographic image. Since the CF map assumes small values for low-coherence image regions, which correspond to strong rough surface clutter contributions, the final image has significantly reduced clutter and is more amenable to the implementation of a subsequent target-detection procedure. A comparative example is shown in Figure 9.

FIGURE 9. The CF-based imaging results using FL-GPR numerical data of plastic and metallic landmines: images (a) before clutter suppression and (b) after CF-based enhancement. (Source: [67].)

In [56], a clutter-suppression method, in conjunction with CS imaging, described in the “Image Formation” section, was designed for a MIMO, array-based FL-GPR. A preprocessing method was proposed for reducing the azimuth and short-range clutter localized in specific regions outside of the image area, as depicted in Figure 8. This was achieved by implementing azimuth filtering on sparse-array data and range-profile domain suppression via an inverse Fourier transform.
The clutter-suppressed version of the CS-based image in Figure 7 is depicted in Figure 10, where the impact of the clutter reduction method is clearly visible. The clutter in the region containing the targets, on the other hand, can be reduced by fine-tuning the regularization parameter associated with the sparsity-based inverse problem. Toward this end, an iterative procedure was implemented in [56] to estimate an optimum regularization parameter in the presence of rough surface clutter, based on the ratio of clean areas within the image with respect to cluttered regions.
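The coherence-factor masking attributed to [67] and [75] above can be illustrated with a short numerical sketch. The exact CF definition used in those works is not given here, so the code assumes the common array-imaging form, the ratio of coherently summed power to N times the incoherently summed power across N channels. The function names, the synthetic per-channel images, and the array sizes are all hypothetical.

```python
import numpy as np

def coherence_factor(channels, eps=1e-12):
    """Coherence map for per-channel focused images.

    channels: complex array of shape (N, H, W), one focused image per
    antenna channel (assumed input format, not taken from [67] or [75]).
    Returns values between 0 (incoherent) and 1 (fully coherent).
    """
    n = channels.shape[0]
    coherent = np.abs(channels.sum(axis=0)) ** 2
    incoherent = n * (np.abs(channels) ** 2).sum(axis=0)
    return coherent / (incoherent + eps)

def cf_masked_image(channels):
    """Apply the coherence map as a mask to the coherently summed image."""
    image = np.abs(channels.sum(axis=0))
    return coherence_factor(channels) * image

# Synthetic example: unit-amplitude, random-phase "clutter" everywhere,
# plus one pixel where all channels add in phase (a stand-in for a target).
rng = np.random.default_rng(0)
n_ch, h, w = 16, 32, 32
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_ch, h, w))
channels = np.exp(1j * phases)
channels[:, 10, 20] = 1.0 + 0.0j

cf = coherence_factor(channels)
masked = cf_masked_image(channels)
```

In this toy scene the in-phase pixel keeps a coherence factor close to one, while random-phase pixels average about 1/N, so multiplying by the map lowers the clutter relative to the target.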
FIGURE 10. A CS image of Figure 7 after clutter suppression. (Source: [56].)

TARGET DETECTION
The presence of rough surface clutter in FL-GPR imagery renders the detection of on-surface and buried targets challenging. Owing to the oblique illumination in the FL configuration, only a small fraction of the transmitted energy is backscattered from the target and collected by the radar receiver. The deeper the burial depth of the target, the weaker the signal return. More importantly, due to the similar dielectric features of plastic targets and the surrounding soil (permittivity on the order of 3–4 for a dry background medium), the scatterers cannot be easily differentiated from clutter in both the spectral and image domains [11]. To this end, innovative statistical and spectral approaches have been devised in the literature to offer reliable target detection in FL-GPR applications. In the following sections, we group these approaches based on statistical and spectral methods.
STATISTICAL DETECTORS
In [28] and [29], the effectiveness of two statistical signal-processing techniques, namely, the polarimetric whitening filter and generalized likelihood ratio test (LRT), was investigated for different types of targets buried at various depths below the interface. The capability of these methods to detect metallic targets with high confidence was illustrated. However, an unsatisfactory detection performance for plastic mines was observed due to 1) a mismatch between the ground truth and assumed target and clutter statistics as well as 2) an incomplete exploitation of the target signatures.
A locally adaptive detection method that adjusted the
detection criteria automatically and dynamically across different spatial regions of the FL-GPR image was proposed in
[76]. In this work, an FL-GPR image was processed with a locally adaptive standard deviation filter to compute the standard deviation of a small neighborhood around each pixel
of interest in the image. More specifically, prior to performing target detection, each image pixel value was replaced by
the maximum pixel value within a rectangular neighborhood of dimensions equal to 3 m in the cross range and
1.5 m in the down range. Potential targets within the image
were identified by performing the following operation:
$$A = \arg_{u,v} \{ G_f(u,v) \ge \min\{ O_f(u,v),\, -60 \} \}, \tag{4}$$
where $O_f(u,v)$ denotes the filtered image, $A$ is the set of local-maxima locations, and $G_f$ denotes the FL-GPR image.
An empirical value of –60 dB was selected as the threshold.
An example from [76] is depicted in Figure 11, where the
associated false-alarm locations are indicated with white
crosses. Expectedly, because of the nonoptimal choice of the
threshold, the processed image still exhibits a considerable
number of false alarms.
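The candidate-selection rule in (4) can be sketched as follows. This is a literal reading of the test as printed, not the implementation of [76]: a pixel is kept when its value reaches the smaller of its neighborhood maximum and the –60-dB level. The window size in pixels, the synthetic image, and the function names are assumptions, since the pixel spacing that maps 3 m by 1.5 m onto a pixel grid is not stated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_filter(image_db, size):
    """Replace each pixel by the maximum over a window centred on it.

    size = (rows, cols) must be odd in both dimensions so that the
    output has the same shape as the input.
    """
    ph, pw = size[0] // 2, size[1] // 2
    padded = np.pad(image_db, ((ph, ph), (pw, pw)),
                    mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, size)  # shape (H, W, rows, cols)
    return windows.max(axis=(2, 3))

def candidate_alarms(image_db, size=(5, 9), floor_db=-60.0):
    """Literal reading of (4): keep pixels with G >= min(O_f, floor_db)."""
    filtered = max_filter(image_db, size)               # O_f(u, v)
    keep = image_db >= np.minimum(filtered, floor_db)   # test in (4)
    return {tuple(int(k) for k in idx) for idx in np.argwhere(keep)}

# Synthetic image in dB: noisy background near -90 dB and one strong pixel.
rng = np.random.default_rng(1)
image_db = -90.0 + rng.normal(0.0, 1.0, size=(24, 32))
image_db[8, 12] = -40.0

alarms = candidate_alarms(image_db)
```

Under this reading the strong pixel is retained and its immediate neighbors are rejected, but every local maximum of the noisy background is retained as well, so a later false-alarm rejection stage is still needed.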
An image-domain LRT-based detection strategy was proposed in [53], which exploits the multiview intrinsic nature
of the FL-GPR configuration. The multiple views of the scene
correspond to measurements from different positions along
the platform trajectory. For an LRT detector, the exact statistics of the targets and clutter in the FL-GPR images need to
be known a priori. To this end, clutter and target pixel sets,
obtained from the training data, were used to determine the
target and clutter statistics. The targets were represented by
a three-component Gaussian mixture model, whereas the
clutter was found to be Rayleigh distributed.
Two different LRT detection strategies were employed for
fusion of the multiview images. The first performed simultaneous detection and fusion within the LRT framework under
the assumption of independent target and clutter statistics
from one viewpoint to another. Mathematically, the pixelwise LRT applied on $N_{\mathrm{im}}$ images, $\{X_n(i,j);\ n = 1, 2, \ldots, N_{\mathrm{im}}\}$, is given by

$$\mathrm{LR}(i,j) = \prod_{n=1}^{N_{\mathrm{im}}} \frac{p(X_n(i,j) \mid H_1)}{p(X_n(i,j) \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} c, \tag{5}$$

where $p(X_n(i,j) \mid H_0)$ and $p(X_n(i,j) \mid H_1)$ are the conditional probability density functions of the nth image under the null (target absent) and alternative (target present) hypotheses, respectively, and data independence across the multiple views is assumed. By comparing the likelihood ratio with a threshold $c$ determined using the Neyman–Pearson theorem [77], a fused binary image $F_f$ can be defined as

$$F_f(i,j) = \begin{cases} 1 & \text{if } \mathrm{LR}(i,j) > c \\ 0 & \text{if } \mathrm{LR}(i,j) \le c. \end{cases} \tag{6}$$


The second method applied the LRT detector to individual images, followed by fusion of the detected binary
images through a pixel-by-pixel multiplication. Since the
clutter generates different image-domain signatures when
observed from different viewpoints, both strategies take
advantage of the clutter diversity provided by the multiple
views, though the latter scheme does not require the data
independence assumption across the multiple views.
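The two fusion strategies can be compared in a minimal sketch. It assumes the model families named above (Rayleigh clutter and a three-component Gaussian mixture for targets), but every parameter value, the threshold, and the synthetic views are hypothetical rather than taken from [53]. The product in (5) is evaluated as a sum of log-likelihood ratios for numerical stability.

```python
import numpy as np

def rayleigh_pdf(x, sigma):
    """Rayleigh density, used here as the clutter model under H0."""
    return (x / sigma**2) * np.exp(-x**2 / (2.0 * sigma**2))

def gmm_pdf(x, weights, means, stds):
    """Gaussian-mixture density, used here as the target model under H1."""
    x = np.asarray(x, dtype=float)[..., None]
    comps = weights * np.exp(-0.5 * ((x - means) / stds) ** 2)
    comps = comps / (stds * np.sqrt(2.0 * np.pi))
    return comps.sum(axis=-1)

def log_lr(x, model, tiny=1e-300):
    """Pixelwise log-likelihood ratio for one view."""
    p1 = gmm_pdf(x, model["weights"], model["means"], model["stds"])
    p0 = rayleigh_pdf(x, model["sigma"])
    return np.log(p1 + tiny) - np.log(p0 + tiny)

def fuse_joint(views, model, log_c):
    """First strategy, (5) and (6): threshold the product of the ratios."""
    total = sum(log_lr(v, model) for v in views)
    return (total > log_c).astype(np.uint8)

def fuse_product(views, model, log_c):
    """Second strategy: detect per view, then multiply the binary images."""
    fused = np.ones(views[0].shape, dtype=np.uint8)
    for v in views:
        fused *= (log_lr(v, model) > log_c).astype(np.uint8)
    return fused

# Hypothetical model parameters and four synthetic views of one scene.
MODEL = {
    "sigma": 1.0,
    "weights": np.array([1.0, 1.0, 1.0]) / 3.0,
    "means": np.array([4.0, 5.0, 6.0]),
    "stds": np.array([0.5, 0.5, 0.5]),
}
rng = np.random.default_rng(0)
views = rng.rayleigh(scale=1.0, size=(4, 16, 16))
views[:, 5, 7] = 5.0  # a persistent target pixel seen in every view

joint = fuse_joint(views, MODEL, log_c=0.0)
product = fuse_product(views, MODEL, log_c=0.0)
```

The multiplication-based fusion can only remove detections relative to any single view, which is why it needs no independence assumption, whereas the joint test lets strong evidence in some views compensate for weak evidence in others.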
An adaptive version of the LRT detector of [53] was proposed in [78] to allow enhanced multiview detection of low-signature targets in a rough surface clutter environment. To achieve a more accurate estimation of the image-domain statistics, the target and clutter distributions were iteratively adjusted by means of a two-step procedure. The first step aimed at separating the image into target and clutter regions, whereas the second step used the extracted target and clutter regions to update the target and clutter statistics. This process was repeated until convergence was achieved. A binary image from [78], corresponding to the image presented in Figure 4, is reported in Figure 12. In [78], it was shown that an adaptive detector can outperform its nonadaptive counterpart in terms of the false-alarm rates while providing comparable detection performance.
A robust LRT detector, under the assumption of independent viewpoints, was proposed in [79] for multiview FL-GPR imaging. Instead of modeling the distributions of the
target and clutter pixels with parametric families, a band
of feasible probability densities under each hypothesis was
constructed using training data. The detector was then designed such that it minimized the maximum error probability for all feasible density pairs within the two bands.
This relaxed the strong assumptions about the clutter and
noise distributions, rendering the detector robust against
statistical model deviations. The minimax approach is critical in cases where accurate estimation of the distribution of
the background clutter may be challenging.
A binary image from [79], corresponding to the image of Figure 4, is depicted in Figure 13. It was demonstrated that, compared to detectors based on parametric models, robust detectors can lead to significantly reduced false-alarm rates, particularly in cases where there is a mismatch between the assumed model and true distributions. Both the robust and parametric detectors were extended to incorporate statistical dependence between multiview images via a copula-based function in [80].

FIGURE 11. An FL-GPR-processed image in [76]. The × symbol indicates false alarms, + indicates fiducial alarm, and a circle indicates a target. The panels correspond to two different regions along a track that have slightly different lengths: (a) –285 to –275 m and (b) –216 to –206 m. (Source: [76].)

FIGURE 12. The detection result for the image presented in Figure 4 obtained through the adaptive procedure [78]. The red areas indicate the detected target regions, while the black areas represent false alarms. (Source: [78].)

FIGURE 13. The detection result for the image presented in Figure 4 obtained using the robust LRT detector. (Source: [79].)

SPECTRAL APPROACHES
To improve the detection performance of plastic objects, a time-frequency approach was proposed in [30]. The detection problem was conventionally formulated by considering a signal corrupted by interference. To deal with the nonstationary nature of both the signal and clutter, the authors employed time-frequency distribution to provide temporal localization of the signal spectral components. The detector considered the signal time-frequency representation based on the Choi–Williams distribution or, equivalently, the ambiguity function and applied discriminant features extracted using principal component analysis plus the linear discriminant method [81], [82]. The effectiveness of this approach and the employed detector was demonstrated using experimental results.
Frequency subband processing was used in [83], together with co- and cross-polarized signals, for enhanced target-detection performance in FL-GPR sensing. Images were formed using one wide subband and four narrow-frequency subbands within a 2.5-GHz signal bandwidth to analyze the frequency dependency of landmines and clutter. An FL-GPR image, corresponding to the copolarized (VV) signal over multiple subbands, is shown in Figure 14. It is evident that the clutter is particularly strong, but its distribution changes over the frequency bands considered. On this basis, a number of features, including the magnitude, local contrast, ratio between copolarized and cross-polarized signals, and features of polarimetric decompositions, were extracted at each narrow-frequency subband and employed to assess their role in target detection. For this purpose, the authors considered a Fisher’s linear discriminant (FLD)-based classifier, which can be mathematically described as follows. With $C_1$ and $C_2$ representing the “false-alarm” and “target” classes, respectively, and given $N$ training feature vectors $\{x_n,\ n = 1, \ldots, N\}$ that have been labeled as $C_1$ and $C_2$, FLD finds the projection direction in the feature space that maximizes the class separation as

$$y_{\mathrm{FLD}} = (m_2 - m_1)\, S_w^{-1}, \tag{7}$$

where $m_i$, $i = 1, 2$, is the mean feature vector of the ith class and $S_w$ is the within-class scatter matrix, defined as

$$S_w = \frac{1}{N} \left[ \sum_{x_n \in C_1} (x_n - m_1)(x_n - m_1)^T + \sum_{x_n \in C_2} (x_n - m_2)(x_n - m_2)^T \right]. \tag{8}$$

When an unlabeled testing data point is collected, its confidence value is determined by the projection of its feature vector on $y_{\mathrm{FLD}}$, and it is classified by means of simple thresholding.

FIGURE 14. An example of images achieved with the FL-GPR system in [83] keeping unchanged the illuminated scene and polarization (VV) but exploiting different frequency subbands: (a) 0.8–2.8, (b) 0.75–1.35, (c) 1.25–1.85, (d) 1.75–2.35, and (e) 2.25–2.85 GHz. The black circle indicates the true target location. (Source: [83].)

In [84], the authors performed target detection using space–wavenumber processing and a feature-based method, employing data measured by means of a vehicle-mounted FL-GPR equipped with a MIMO array. The approach was applied in the image domain and relied on the definition of a bistatic scattering function associated with selected pixels. A set of images achieved with different incident and scattering angles was used to estimate the bistatic scattering function. Experimental results demonstrated that the proposed method can offer an efficient feature vector for landmine discrimination.
An original approach to process measured data collected at the U.S. Army test site using the radar system developed by PSI was proposed in [85]. The method relied on the definition of a set of spatial lanes in the radar image. The identification of potential targets was first independently performed in each subregion, and they were then tracked across the subregions. Weighted averages of the corresponding geometrical features were evaluated, and the target persistence across the spatial regions was used to reduce the false-alarm rates. Targets appearing in a limited number of lanes were removed as part of the detection scheme.
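The FLD classifier of (7) and (8), used in [83], can be sketched numerically as follows. The feature vectors here are synthetic Gaussian clusters rather than radar features, and the function names, the class means, and the midpoint threshold are assumptions made only for illustration.

```python
import numpy as np

def fld_direction(x1, x2):
    """Projection direction of (7) from labeled training features.

    x1: (N1, d) feature vectors of class C1 (false alarms).
    x2: (N2, d) feature vectors of class C2 (targets).
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    n = len(x1) + len(x2)
    d1, d2 = x1 - m1, x2 - m2
    s_w = (d1.T @ d1 + d2.T @ d2) / n  # within-class scatter matrix, (8)
    # S_w is symmetric, so solving S_w y = (m2 - m1) yields the same
    # vector as the product (m2 - m1) S_w^{-1} written in (7).
    return np.linalg.solve(s_w, m2 - m1)

def confidence(y_fld, x):
    """Confidence value: projection of feature vectors onto y_FLD."""
    return x @ y_fld

# Synthetic training data with hypothetical class means.
rng = np.random.default_rng(0)
false_alarms = rng.normal(0.0, 1.0, size=(200, 3))
targets = rng.normal(0.0, 1.0, size=(200, 3)) + np.array([3.0, 3.0, 0.0])

y_fld = fld_direction(false_alarms, targets)
conf_fa = confidence(y_fld, false_alarms)
conf_tg = confidence(y_fld, targets)

# Simple thresholding halfway between the projected class means.
threshold = 0.5 * (conf_fa.mean() + conf_tg.mean())
accuracy = 0.5 * ((conf_tg > threshold).mean() + (conf_fa <= threshold).mean())
```

Solving the linear system avoids forming the matrix inverse explicitly, which is the usual choice when the scatter matrix is poorly conditioned.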
An analysis of the spectral features extracted from the
scattered signal, with the goal to improve the performance
of buried explosive hazard detection, was provided in [86].
Natural resonant frequency and polarization features of
improvised explosive devices were examined in [87] for FL-GPR. In [76], a spectrum-based classifier was proposed that
rejected false alarms by classifying each potential target
based on its spatial frequency spectrum.
A method based on the use of narrow-band and full-band radar processing, coupled with a classifier exploiting
complex-valued Gabor filter responses, was proposed in
[66]. Full-band radar images yielded high spatial resolution, while narrow-band images provided the means to detect targets with unique signatures. A composite confidence
map was implemented to detect local maxima and isolate
potential target pixels.
FUTURE TRENDS
A completely different radar-based approach from FL-GPR
to road mapping and clearing is to employ a traditional airborne, side-looking SAR system flying on a track parallel to
the road and imaging the ground area of interest. This approach has the advantages of a high coverage rate as well as
the fact that the platform does not come in contact with the
in-road hazard. Nevertheless, these radar systems operate
at relatively long ranges (at least 1 km) and, consequently,
require larger transmitted power and longer coherent integration intervals to achieve high image resolution. Both
modeling and experimental studies have demonstrated the
difficulty of detecting weak buried targets (such as plastic
landmines) by side-looking GPR systems, even in mild clutter conditions [21], [88].
Recent advances in radar sensors based on unmanned
aerial vehicle (UAV) platforms promise to bring together
the advantages of both types of aforementioned imaging systems [89]–[96]. Thus, a UAV-based SAR system can
operate at small elevations and ranges, requiring a small
amount of power and a short synthetic aperture length.
At the same time, as the flying platform does not come in
contact with the ground, the standoff range requirement
relevant to ground-vehicle-borne systems does not apply
in this case.
A UAV-mounted radar system is likely to be significantly less costly to build and operate than any of the current
airborne or FL-GPR systems, while the excellent control of
UAV flight trajectories makes motion compensation easier
to accomplish. The radar antenna can be readily configured as down-looking, side-looking, or FL, depending on the imaging application, while antenna arrays can be combined with the synthetic aperture created by platform motion. Another possible scenario is using a network of distributed UAV-based SAR imaging sensors working cooperatively to map a ground area. Vehicle-borne radar systems will still have a role in explosive hazard detection; however, we envision that future trends will favor employing unmanned ground vehicle platforms for this application.
One potential capability
of FL-GPR systems that has
rarely been exploited in practice thus far is the creation of
3D images of the scene under investigation. By adding the
height dimension to a radar map, one can infer the depth
of a buried target and partially mitigate the surface clutter,
which now affects only a limited part of the image volume.
Additional target features inferred from the 3D map can
also be useful in target-classification applications.
One example of a GPR system operating on the FL SAR
principle and capable of creating 3D images is the iRadar,
developed by the Lawrence Livermore National Laboratory [97]. While the iRadar antenna array is mounted close
to the ground and provides only a modest standoff range,
one can envision a system equipped with a similar array
mounted on a UAV flying at a height of 1–2 m over the
road and mapping the area of interest, including the underground volume [96]. Although the surface clutter becomes less of an issue in the detection of buried targets in
3D images, underground inhomogeneities created by different soil layers, rocks, roots, and so on represent a new
source of clutter that may degrade the detection and classification performance.
In addition to buried object detection, FL radar technology is finding use in other emerging applications. One example is attempting to exploit the 3D imaging capability of
an FL radar to assist helicopter landing in degraded visual
environments (DVEs), such as those created by brownout
conditions. A prototype radar system based on this principle is currently being developed at ARL [98].
To achieve an angular resolution of 0.1–0.2°, comparable to optical sensors such as lidar, this radar system must
operate in the millimeter-wave regime (Ka band). The wave
attenuation through dust, sand storms, or other DVE conditions at these frequencies (less than 1 dB/km one way) is
still low enough to provide a see-through capability, which
is not available in infrared, optical, or lidar sensors. The 3D
map of the landing zone obtained by the FL SAR would be
interpreted in terms of natural or human-made terrain features, and this information would be passed to the pilot via
a helmet-mounted display to assist in deciding whether the
landing zone is safe.
The principle of the helicopter-mounted FL SAR system
for 3D landing zone mapping is explained in Figure 15.
The system is equipped with a 2-m-wide front-bumper-type linear antenna array, which provides resolution in the azimuth direction, while the forward motion of the platform at constant height creates a synthetic aperture with sufficient elevation look-angle diversity to offer resolution in the vertical direction. The radar waveform bandwidth (between 0.5 and 1 GHz) provides resolution in the down-range direction.
To date, several studies based
on computer simulations have demonstrated the feasibility
of this concept and emphasized some of the major challenges associated with it.
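The resolution figures quoted above can be checked with back-of-the-envelope arithmetic. The sketch below uses the standard down-range relation c/(2B) and, as an assumption not stated in the text, the rule of thumb λ/(2D) for the angular resolution of a two-way aperture of length D. The 35-GHz carrier is a hypothetical choice within the Ka band.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def range_resolution(bandwidth_hz):
    """Down-range resolution c / (2B) for a waveform of bandwidth B."""
    return C / (2.0 * bandwidth_hz)

def angular_resolution_deg(freq_hz, aperture_m, two_way=True):
    """Rule-of-thumb angular resolution of an aperture of length D.

    Uses lambda / (2 D) for a two-way aperture and lambda / D for a
    one-way aperture; both are approximations, not design values.
    """
    wavelength = C / freq_hz
    factor = 2.0 if two_way else 1.0
    return math.degrees(wavelength / (factor * aperture_m))

res_500mhz = range_resolution(0.5e9)          # about 0.30 m
res_1ghz = range_resolution(1.0e9)            # about 0.15 m
ang_two_way = angular_resolution_deg(35e9, 2.0)                  # about 0.12 deg
ang_one_way = angular_resolution_deg(35e9, 2.0, two_way=False)   # about 0.25 deg
```

Under these assumptions, a 2-m array at 35 GHz falls within or near the 0.1–0.2° range quoted above, and the 0.5–1-GHz bandwidth corresponds to a down-range resolution of roughly 0.15–0.30 m.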
FIGURE 15. (a) A schematic representation of the helicopter-borne FL SAR system for 3D landing-zone imaging, showing the proposed configuration involving a linear antenna array. (b) The equivalence of this imaging system with a 2D antenna array.

CONCLUSIONS
In this article, we presented an overview of image formation and subsurface target-detection techniques using FL-GPR. This emerging technology has gained an increasing interest due to its humanitarian and military applications while maintaining operator safety. We provided a balanced account of existing methods and discussed their respective advantages and limitations. The presented image formation approaches included conventional back-projection, microwave tomographic techniques, and CS-based methods, with the last of these assuming the underlying scene to be sparse. We also outlined different approaches to deal with clutter arising from the rough ground interface. Finally, we detailed statistical and spectral techniques for landmine detection in FL-GPR applications.
While a broad range of imaging, target-detection, and
clutter-suppression techniques have been proposed in the
literature, there are still open issues, particularly associated
with the detection of plastic targets and real-time operation
under various challenging realistic conditions, that require
further investigation. New machine learning algorithms
could also be devised for target classification, especially in
the presence of strong clutter.
The future trend of radar deployment on unmanned
platforms (both aerial and ground based) brings forth
new challenges. From an implementation perspective,
the antenna design is a critical issue, especially when using antenna arrays with the limited space available on an
unmanned aerial platform. At the preferred operational
frequency range of 0.3–3 GHz, depending on the radiation
performance, the antennas can be quite bulky and heavy.
Compact designs using metamaterial-based UWB conformal antenna technology are a promising potential solution
to the implementation challenges.
From an algorithmic perspective, multiplatform data
fusion strategies under both communication and computation constraints could be devised to achieve enhanced performance using a distributed network of unmanned platforms. In short, research into devising effective solutions
for addressing the aforementioned challenges is critical to
providing performance guarantees.
AUTHOR INFORMATION
Davide Comite (davide.comite@uniroma1.it) received his
master’s degree (cum laude) in communications engineering and Ph.D. degree in electromagnetics and mathematical models for engineering from the Sapienza University
of Rome, Rome, Italy, in 2011 and 2015, respectively. He
was a visiting Ph.D. student with the Institute of Electronics
and Telecommunications of Rennes, University of Rennes
1, Rennes, France, from March to June 2014 and a postdoctoral researcher with the Center of Advanced Communications, Villanova University, Villanova, Pennsylvania, USA,
in 2015. He is currently a post-doctoral researcher with the
Sapienza University of Rome, Rome, 00184, Italy. His research interests include the study of scattering from natural
surfaces as well as GNSS reflectometry over the land, microwave imaging and object detection performed through
ground-penetrating radar, modeling of the radar signature
in forward-scatter radar systems, study and design of leakywave antennas, and generation of nondiffracting waves and
pulses. He has been a recipient of a number of awards at
national and international conferences. Most recently, he
received a Young Scientist Award for the General Assembly and Scientific Symposium of the International Union
of Radio Science (URSI) 2020. In 2019 and 2020, the IEEE
Antennas and Propagation Society recognized him as an
Outstanding Reviewer for IEEE Transactions on Antennas and
Propagation, and he was honored as the best reviewer for
IEEE Journal of Selected Topics in Applied Earth Observation and
Remote Sensing in 2020. He is an associate editor of Journal of
Engineering and Microwaves, Antennas, and Propagation, both
by the Institution of Engineering and Technology, and IEEE
Access. He is a Senior Member of IEEE and of URSI.
Fauzia Ahmad (fauzia.ahmad@temple.edu) received
her Ph.D. degree in electrical engineering from the University of Pennsylvania in 1997. She is an associate professor
in the Department of Electrical and Computer Engineering, Temple University, Philadelphia, Pennsylvania, 19122,
USA. Prior to joining Temple University in 2016, she was
a research professor and the director of the Radar Imaging Lab at the Center for Advanced Communications, Villanova University. She has more than 250 publications in
the areas of array and statistical signal processing, computational imaging with applications in radar and ultrasonics,
compressive sensing, machine learning, radar signal processing, and structural health monitoring. She is a Fellow of
IEEE and of the Society of Photo-Optical Instrumentation
Engineers (SPIE). She is the past chair of the IEEE Dennis
J. Picard Medal for Radar Technologies and Applications
Committee and SPIE Compressive-Sensing Conference series. She currently chairs the SPIE Big Data Conference series. She is a member of the Sensor Array and Multichannel
Technical Committee of the IEEE Signal Processing Society,
member of the Computational Imaging Technical Committee of the IEEE Signal Processing Society, and member
of the Electrical Cluster of the Franklin Institute Committee on Science and the Arts. She also serves as an associate
editor of IEEE Transactions on Computational Imaging and IET
Radar, Sonar, & Navigation.
Moeness G. Amin (moeness.amin@villanova.edu)
received his Ph.D. degree in 1984 from the University of
Colorado, Boulder. Since 1985, he has been on the faculty
of the Department of Electrical and Computer Engineering
at Villanova University, Villanova, Pennsylvania, 19085,
USA, where he is now a professor and director of the Center
for Advanced Communications. He is a Fellow of IEEE, the
Society of Photo-Optical Instrumentation Engineers, the
Institution of Engineering and Technology (IET), and the European Association for Signal Processing (EURASIP). He is
a recipient of the U.S. Fulbright Distinguished Chair in Advanced Science and Technology, Alexander von Humboldt
Research Award, IET Achievement Medal, IEEE Warren D.
White Award for Excellence in Radar Engineering, IEEE
Signal Processing Society Technical Achievement Award,
NATO Scientific Achievement Award, EURASIP Technical
Achievement Award, and IEEE Third Millennium Medal.
He was a Distinguished Lecturer of the IEEE Signal Processing Society. He has more than 850 journal and conference publications in the areas of wireless communications,
time–frequency analysis, sensor array processing, satellite
navigation, ultrasound imaging, and radar signal processing. He is a recipient of 12 best paper awards. He is the editor of three books from CRC Press: Through-the-Wall Radar
Imaging (2011), Compressive Sensing for Urban Radar (2014),
and Radar for Indoor Monitoring (2017). He serves on the editorial board of Proceedings of the IEEE.
Traian Dogaru (traian.v.dogaru.civ@mail.mil) received
his degree in electrical engineering from the Polytechnic
University of Bucharest, Bucharest, Romania, in 1990 and
his M.S. and Ph.D. degrees in electrical engineering from
Duke University, Durham, North Carolina, USA, in 1997
and 1999, respectively. He was a research associate with
Duke University, developing algorithms for electromagnetic field modeling, between 1999 and 2001. He has been
with the U.S. Army Research Laboratory, Adelphi, Maryland, 20783, USA, since 2001. His research interests include
radar signature modeling, computational electromagnetics,
signal processing, radar imaging and detection of concealed
targets, sensing through the wall, foliage penetration, and
ground-penetrating radar, as well as applying advanced
computational modeling techniques to the analysis of complex sensing scenarios. He is a Member of IEEE.
REFERENCES
[1] A. P. Annan, “GPR—History, trends, and future developments,”
Subsurface Sens. Technol. Appl., vol. 3, no. 4, pp. 253–270, 2002.
doi: 10.1023/A:1020657129590.
[2] D. J. Daniels, “A review of GPR for landmine detection,” Sens.
Imag., vol. 7, no. 3, p. 90, 2006. doi: 10.1007/s11220-006-0024-5.
[3] M. Sato, “Principles of mine detection by ground-penetrating
radar,” in Anti-personnel Landmine Detection for Humanitarian Demining. Berlin: Springer-Verlag, 2009, pp. 19–26.
[4] I. Catapano, G. Gennarelli, G. Ludeno, F. Soldovieri, and R. Persico, “Ground-penetrating radar: Operation principle and data
processing,” in Wiley Encyclopedia Elect. Electron. Eng. Hoboken,
NJ: Wiley, 2019, pp. 1–23.
[5] L. Robledo, M. Carrasco, and D. Mery, “A survey of land mine
detection technology,” Int. J. Remote Sens., vol. 30, no. 9, pp.
2399–2410, 2009. doi: 10.1080/01431160802549435.
[6] W. R. Scott, K. Kim, G. D. Larson, A. C. Gurbuz, and J. H. McClellan, “Combined seismic, radar, and induction sensor for
landmine detection,” J. Acoust. Soc. Amer., vol. 123, no. 5, pp.
3042–3042, 2008. doi: 10.1121/1.2932726.
[7] C. R. Ratto, P. A. Torrione, and L. M. Collins, “Exploiting ground-penetrating radar phenomenology in a context-dependent
framework for landmine detection and discrimination,” IEEE
Trans. Geosci. Remote Sens., vol. 49, no. 5, pp. 1689–1700, 2010.
doi: 10.1109/TGRS.2010.2084093.
[8] M. G. Fernández et al., “Synthetic aperture radar imaging system for landmine detection using a ground penetrating radar
on board a unmanned aerial vehicle,” IEEE Access, vol. 6,
pp. 45,100–45,112, 2018.
[9] S. Vitebskiy and L. Carin, “Resonances of perfectly conducting
wires and bodies of revolution buried in a lossy dispersive half-space,” IEEE Trans. Antennas Propag., vol. 44, no. 12, pp. 1575–
1583, 1996. doi: 10.1109/8.546243.

[10] I. J. Gupta, A. van der Merwe, and C.-C. Chen, “Extraction of
complex resonances associated with buried targets,” in Proc. SPIE
Detection Remediation Technol. Mines Minelike Targets III, 1998, vol.
3392, pp. 1022–1032. doi: 10.1117/12.324149.
[11] L. Carin, N. Geng, M. McClure, J. Sichina, and L. Nguyen, “Ultrawide-band synthetic-aperture radar for mine-field detection,”
IEEE Antennas Propag. Mag., vol. 41, no. 1, pp. 18–33, 1999. doi:
10.1109/74.755021.
[12] R. Persico, Introduction to Ground Penetrating Radar: Inverse Scattering and Data Processing. Hoboken, NJ: Wiley, 2014.
[13] M. El-Shenawee and C. M. Rappaport, “Monte Carlo simulations
for clutter statistics in minefields: AP-mine-like-target buried
near a dielectric object beneath 2-D random rough ground surfaces,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 6, pp. 1416–
1426, 2002. doi: 10.1109/TGRS.2002.800275.
[14] A. C. Gurbuz, J. H. McClellan, and W. R. Scott, “A compressive sensing data acquisition and imaging method for stepped
frequency GPRs,” IEEE Trans. Signal Process., vol. 57, no. 7, pp.
2640–2650, 2009. doi: 10.1109/TSP.2009.2016270.
[15] D. Comite, A. Galli, I. Catapano, and F. Soldovieri, “Advanced
imaging for down-looking contactless GPR systems,” Appl. Comput. Electromagn. Soc. J., vol. 33, no. 7, pp. 1–4, 2017.
[16] G. Ludeno, G. Gennarelli, S. Lambot, F. Soldovieri, and I. Catapano, “A comparison of linear inverse scattering models for contactless GPR imaging,” IEEE Trans. Geosci. Remote Sens., vol. 58,
no. 10, pp. 7305–7316, 2020. doi: 10.1109/TGRS.2020.2981884.
[17] R. Solimene, I. Catapano, G. Gennarelli, A. Cuccaro, A.
Dell’Aversano, and F. Soldovieri, “SAR imaging algorithms and
some unconventional applications: A unified mathematical
overview,” IEEE Signal Process. Mag., vol. 31, no. 4, pp. 90–98,
2014. doi: 10.1109/MSP.2014.2311271.
[18] T. Dogaru, “NAFDTD—A near-field finite difference time domain solver,” Army Research Lab., Sensors and Electronic Devices Directorate, Adelphi, MD, Tech. Rep. ARL-TR-6110, 2012.
[19] D. Comite, F. Ahmad, M. Amin, and T. Dogaru, “Multi-aperture
processing for improved target detection in forward-looking GPR
applications,” in Proc. Eur. Conf. Antennas Propag., 2016, pp. 1–3.
[20] M. M. Tajdini, B. Gonzalez-Valdes, J. A. Martinez-Lorenzo, A.
W. Morgenthaler, and C. M. Rappaport, “Real-time modeling
of forward-looking synthetic aperture ground penetrating radar scattering from rough terrain,” IEEE Trans. Geosci. Remote
Sens., vol. 57, no. 5, pp. 2754–2765, May 2019. doi: 10.1109/
TGRS.2018.2876808.
[21] L. Nguyen, K. Ranney, K. Sherbondy, and A. Sullivan, “Detection of buried in-road IED targets using airborne ultra-wideband
(UWB) low-frequency SAR,” in Proc. 60th MSS Tri-Service Radar
Symp., 2014.
[22] J. A. Camilo, J. M. Malof, P. A. Torrione, L. M. Collins, and K.
D. Morton Jr., “Clutter and target discrimination in forward-looking ground penetrating radar using sparse structured basis pursuits,” in Proc. SPIE Detection Sens. Mines, Explosive Objects, and Obscured Targets XX, 2015, vol. 9454, p. 94540V. doi:
10.1117/12.2176491.
[23] F. Soldovieri, G. Gennarelli, I. Catapano, D. Liao, and T. Dogaru, “Forward-looking radar imaging: A comparison of two data
processing strategies,” IEEE J. Sel. Topics Appl. Earth Observ. Remote
Sens., vol. 10, no. 2, pp. 562–571, 2016. doi: 10.1109/
JSTARS.2016.2543840.
[24] A. W. Morgenthaler and C. M. Rappaport, “Scattering from lossy
dielectric objects buried beneath randomly rough ground: Validating the semi-analytic mode matching algorithm with 2-D
FDFD,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 11, pp. 2421–
2428, 2001. doi: 10.1109/36.964978.
[25] J. T. Johnson and R. J. Burkholder, “Coupled canonical grid/
discrete dipole approach for computing scattering from objects
above or below a rough interface,” IEEE Trans. Geosci. Remote Sens.,
vol. 39, no. 6, pp. 1214–1220, 2001. doi: 10.1109/36.927443.
[26] D. Comite, A. Galli, I. Catapano, and F. Soldovieri, “The role of
the antenna radiation pattern in the performance of a microwave
tomographic approach for GPR imaging,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 10, no. 10, pp. 4337–4347, 2017.
doi: 10.1109/JSTARS.2016.2636833.
[27] J. Kositsky and P. Milanfar, “Forward-looking high-resolution
GPR system,” in Proc. SPIE Detection and Remediation Technol.
Mines Minelike Targets IV, 1999, vol. 3710, pp. 1052–1062.
[28] J. Kositsky and C. A. Amazeen, “Results from a forward-looking
GPR mine detection system,” in Proc. SPIE Detection and Remediation Technol. Mines and Minelike Targets VI, 2001, vol. 4394,
pp. 700–711.
[29] J. Kositsky, R. Cosgrove, C. A. Amazeen, and P. Milanfar, “Results from a forward-looking GPR mine detection system,” in
Proc. SPIE Detection and Remediation Technol. Mines Minelike Targets VII, 2002, vol. 4742, pp. 206–217.
[30] Y. Sun and J. Li, “Time–frequency analysis for plastic landmine detection via forward-looking ground penetrating radar,” Inst. Elect.
Eng. Proc. Radar, Sonar, Navigation, vol. 150, no. 4, pp. 253–261,
2003.
[31] M. R. Bradley, T. R. Witten, M. Duncan, and R. McCummins,
“Anti-tank and side-attack mine detection with a forward-looking GPR,” in Proc. SPIE Detection and Remediation Technol. Mines
Minelike Targets IX, 2004, vol. 5415, pp. 421–432.
[32] M. Ressler, L. Nguyen, F. Koenig, D. Wong, and G. Smith, “The
Army Research Laboratory (ARL) synchronous impulse reconstruction (SIRE) forward-looking radar,” in Proc. SPIE Unmanned
Systems Technology IX, 2007, vol. 6561, pp. 35–46.
[33] I. Catapano, A. Affinito, A. Del Moro, G. Alli, and F. Soldovieri,
“Forward-looking ground-penetrating radar via a linear inverse
scattering approach,” IEEE Trans. Geosci. Remote Sens., vol. 53, no.
10, pp. 5624–5633, 2015. doi: 10.1109/TGRS.2015.2426502.
[34] B. R. Phelan, K. D. Sherbondy, K. I. Ranney, and R. M. Narayanan, “Design and performance of an ultra-wideband
stepped-frequency radar with precise frequency control for landmine and IED detection,” in Proc. SPIE Radar Sensor Technology XVIII,
2014, vol. 9077, pp. 53–64.
[35] B. R. Phelan, K. I. Ranney, K. A. Gallagher, J. T. Clark, K. D.
Sherbondy, and R. M. Narayanan, “Design of ultrawideband
stepped-frequency radar for imaging of obscured targets,” IEEE
Sensors J., vol. 17, no. 14, pp. 4435–4446, 2017. doi: 10.1109/
JSEN.2017.2707340.
[36] T. Ton, D. Wong, and M. Soumekh, “ALARIC forward-looking
ground penetrating radar system with standoff capability,” in
IEEE Int. Conf. Wireless Information Technol. Syst., 2010, pp. 1–4.
[37] D. Liao, T. Dogaru, and A. Sullivan, “Large-scale, full-wave-based
emulation of step-frequency forward-looking radar imaging in
rough terrain environments,” Sens. Imag., vol. 15, no. 1, p. 88, 2014.
[38] D. Liao and T. Dogaru, “Full-wave characterization of rough terrain surface scattering for forward-looking radar applications,”
IEEE Trans. Antennas Propag., vol. 60, no. 8, pp. 3853–3866, 2012.
doi: 10.1109/TAP.2012.2201076.
[39] M. M. Tajdini, A. W. Morgenthaler, and C. M. Rappaport, “Multiview synthetic aperture ground-penetrating radar detection
in rough terrain environment: A real-time 3-d forward model,”
IEEE Trans. Geosci. Remote Sens., vol. 57, no. 5, pp. 3400–3410,
2019. doi: 10.1109/TGRS.2019.2954776.
[40] M. Pastorino, Microwave Imaging, vol. 208. Hoboken, NJ: Wiley, 2010.
[41] G. A. McMechan, “A review of seismic acoustic imaging by reverse-time migration,” Int. J. Imag. Syst. Technol., vol. 1, no. 1, pp.
18–21, 1989. doi: 10.1002/ima.1850010104.
[42] C. Özdemir, Ş. Demirci, E. Yiğit, and B. Yilmaz, “A review on migration
methods in b-scan ground penetrating radar imaging,” Math. Problems Eng., vol. 2014, pp. 1–17, June 2014. doi: 10.1155/2014/280738.
[43] J. M. Lopez-Sanchez and J. Fortuny-Guasch, “3-D radar imaging
using range migration techniques,” IEEE Trans. Antennas Propag.,
vol. 48, no. 5, pp. 728–737, 2000. doi: 10.1109/8.855491.
[44] Y. Wang, Y. Sun, J. Li, and P. Stoica, “Adaptive imaging for forward-looking ground penetrating radar,” IEEE Trans. Aerosp.
Electron. Syst., vol. 41, no. 3, pp. 922–936, 2005. doi: 10.1109/
TAES.2005.1541439.
[45] J. Gazdag, “Wave equation migration with the phase-shift method,”
Geophysics, vol. 43, no. 7, pp. 1342–1351, 1978. doi: 10.1190/1.1440899.
[46] I. Catapano, F. Soldovieri, G. Alli, G. Mollo, and L. A. Forte, “On
the reconstruction capabilities of beamforming and a microwave
tomographic approach,” IEEE Geosci. Remote Sens. Lett., vol. 12,
no. 12, pp. 2369–2373, 2015. doi: 10.1109/LGRS.2015.2476514.
[47] W. C. Chew, Waves and Fields in Inhomogeneous Media. Piscataway,
NJ: IEEE Press, 1995.
[48] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging. Boca Raton, FL: CRC Press, 1998.
[49] G. Leone and F. Soldovieri, “Analysis of the distorted Born approximation for subsurface reconstruction: Truncation and uncertainties effects,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 1,
pp. 66–74, 2003. doi: 10.1109/TGRS.2002.806999.
[50] P. Meincke, “Linear GPR inversion for lossy soil and a planar air-soil interface,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 12, pp.
2713–2721, 2001. doi: 10.1109/36.975005.
[51] T. B. Hansen and P. M. Johansen, “Inversion scheme for ground
penetrating radar that takes into account the planar air-soil interface,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 1, pp. 496–506,
2000. doi: 10.1109/36.823944.
[52] A. Ben-Israel and T. N. Greville, Generalized Inverses: Theory and
Applications, vol. 15. New York: Springer Science & Business Media, 2003.
[53] D. Comite, F. Ahmad, D. Liao, T. Dogaru, and M. G. Amin, “Multiview imaging for low-signature target detection in rough-surface
clutter environment,” IEEE Trans. Geosci. Remote Sens., vol. 55, no.
9, pp. 5220–5229, 2017. doi: 10.1109/TGRS.2017.2703820.
[54] R. Solimene, A. Cuccaro, A. Dell’Aversano, I. Catapano, and F.
Soldovieri, “Ground clutter removal in GPR surveys,” IEEE J. Sel.
Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 3, pp. 792–798,
2013. doi: 10.1109/JSTARS.2013.2287016.
[55] D. Comite, F. Ahmad, and T. Dogaru, “Performance of free-space
tomographic imaging approximation for shallow-buried target
detection,” in Proc. IEEE 7th Int. Workshop on Comput. Adv. Multi-Sensor Adaptive Process., 2017, pp. 1–4.
[56] J. Yang, T. Jin, X. Huang, J. Thompson, and Z. Zhou, “Sparse
MIMO array forward-looking GPR imaging based on compressed sensing in clutter environment,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7, pp. 4480–4494, 2013. doi: 10.1109/
TGRS.2013.2282308.
[57] L. M. van Kempen, H. Sahli, J. Brooks, and J. P. Cornelis, “New
results on clutter reduction and parameter estimation for land
mine detection using GPR,” in Proc. 8th Int. Conf. Ground Penetrating Radar, 2000, vol. 4084, pp. 872–879.
[58] F. Abujarad, A. Jostingmeier, and A. Omar, “Clutter removal for
landmine using different signal processing techniques,” in Proc.
Int. Conf. Ground Penetrating Radar, 2004, pp. 697–700.
[59] R. Wu et al., “Adaptive ground bounce removal,” Electron. Lett.,
vol. 37, no. 20, pp. 1250–1252, 2001. doi: 10.1049/el:20010855.
[60] R. Wu, J. Liu, Q. Gao, H. Li, and B. Zhang, “Progress in the research of ground bounce removal for landmine detection with
ground penetrating radar,” PIERS Online, vol. 1, no. 3, pp. 336–
340, 2005. doi: 10.2529/PIERS041130195615.
[61] G. Nadim, “Clutter reduction and detection of landmine objects in
ground penetrating radar data using likelihood method,” in Proc.
IEEE Int. Symp. Commun., Control Signal Process., 2008, pp. 98–106.
[62] F. Abujarad, G. Nadim, and A. Omar, “Clutter reduction and detection of landmine objects in ground penetrating radar data using singular value decomposition (SVD),” in Proc. Int. Workshop
on Adv. Ground Penetrating Radar, 2005, pp. 37–42.
[63] O. Lopera, N. Milisavljević, and S. Lambot, “Clutter reduction
in GPR measurements for detecting shallow buried landmines:
A Colombian case study,” Near Surface Geophys., vol. 5, no. 1, pp.
57–64, 2007. doi: 10.3997/1873-0604.2006018.
[64] D. J. Daniels, “Ground penetrating radar,” in Encyclopedia of RF
and Microwave Engineering. Hoboken, NJ: Wiley, 2005.
[65] T. C. Havens et al., “Improved detection and false alarm rejection
using FLGPR and color imagery in a forward-looking system,” in
Proc. SPIE Detection and Sensing Mines, Explosive Objects, and Obscured
Targets XV, 2010, vol. 7664, p. 76641U. doi: 10.1117/12.852274.
[66] T. C. Havens, J. M. Keller, K. Ho, T. T. Ton, D. C. Wong, and M.
Soumekh, “Narrow-band processing and fusion approach for explosive hazard detection in FLGPR,” in Proc. SPIE Detection and
Sensing Mines, Explosive Objects, and Obscured Targets XVI, 2011,
vol. 8017, p. 80171F. doi: 10.1117/12.884610.
[67] D. Comite, F. Ahmad, T. Dogaru, and M. Amin, “Coherence-factor-based rough surface clutter suppression for forward-looking
GPR imaging,” Remote Sens., vol. 12, no. 5, p. 857, 2020. doi:
10.3390/rs12050857.
[68] T. Dogaru and L. Carin, “Time-domain sensing of targets buried
under a rough air-ground interface,” IEEE Trans. Antennas Propag.,
vol. 46, no. 3, pp. 360–372, 1998. doi: 10.1109/8.662655.
[69] H. Jin-feng and Z. Zheng-ou, “A novel method for clutter reduction in the FLGPR measurements,” in Proc. IEEE Int. Conf. Commun., Circuits Syst., 2004, vol. 2, pp. 896–900.

[70] L. Van Kempen and H. Sahli, “Signal processing techniques for
clutter parameters estimation and clutter removal in GPR data
for landmine detection,” in Proc. IEEE Signal Process. Workshop on
Stat. Signal Process. (Cat. No. 01TH8563), 2001, pp. 158–161.
[71] K. F. Casey, “Rough-surface effects on subsurface target detection,” in Proc. SPIE Detection and Remediation Technol. Mines Minelike Targets VI, 2001, vol. 4394, pp. 754–763.
[72] G. A. Tsihrintzis, C. M. Rappaport, S. C. Winton, and P. M. Johansen, “Statistical modeling of rough surface scattering for
ground-penetrating radar applications,” in Proc. SPIE Detection
and Remediation Technol. Mines Minelike Targets III, 1998, vol.
3392, pp. 735–744.
[73] D. Liao and T. Dogaru, “Full-wave-based emulation of forward-looking radar target imaging in rough terrain environment,” in
Proc. IEEE Int. Symp. Antennas Propag., 2011, pp. 2107–2110.
[74] D. Liao, “Ground surface scattering and clutter suppression in
ground-penetrating radar applications,” in Proc. IEEE Int. Symp.
Antennas Propag., 2012, pp. 1–2.
[75] D. Comite, F. Ahmad, T. Dogaru, and M. G. Amin, “Coherence
factor for rough surface clutter mitigation in forward-looking
GPR,” in Proc. IEEE Radar Conf., 2017, pp. 1803–1806.
[76] T. C. Havens et al., “Locally adaptive detection algorithm for forward-looking ground-penetrating radar,” in Proc. SPIE Detection
and Sensing Mines, Explosive Objects, and Obscured Targets XV, 2010,
vol. 7664, p. 76642E. doi: 10.1117/12.851512.
[77] S. M. Kay, Fundamentals of Statistical Signal Processing. Englewood
Cliffs, NJ: Prentice Hall, 1993.
[78] D. Comite, F. Ahmad, T. Dogaru, and M. Amin, “Adaptive detection of low-signature targets in forward-looking GPR imagery,” IEEE
Geosci. Remote Sens. Lett., vol. 15, no. 10, pp. 1520–1524, Oct. 2018.
[79] A. D. Pambudi, M. Fauß, F. Ahmad, and A. M. Zoubir, “Minimax
robust landmine detection using forward-looking ground-penetrating radar,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 7, pp.
1–10, 2020. doi: 10.1109/TGRS.2020.2971956.
[80] A. D. Pambudi, F. Ahmad, and A. M. Zoubir, “Copula-based
robust landmine detection in multi-view forward-looking GPR
imagery,” in Proc. IEEE Radar Conf., 2020, pp. 1–6.
[81] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification.
Hoboken, NJ: Wiley, 2001.
[82] C. M. Bishop, Pattern Recognition and Machine Learning. Berlin:
Springer-Verlag, 2006.
[83] T. Wang, J. M. Keller, P. D. Gader, and O. Sjahputera, “Frequency subband processing and feature analysis of forward-looking
ground-penetrating radar signals for land-mine detection,” IEEE
Trans. Geosci. Remote Sens., vol. 45, no. 3, pp. 718–729, 2007. doi:
10.1109/TGRS.2006.888142.
[84] T. Jin, J. Lou, and Z. Zhou, “Extraction of landmine features using
a forward-looking ground-penetrating radar with MIMO array,”
IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 4135–4144,
2012. doi: 10.1109/TGRS.2012.2188803.
[85] T. Wang, O. Sjahputera, J. M. Keller, and P. D. Gader, “Landmine
detection using forward-looking GPR with object tracking,” in
Proc. SPIE Detection and Remediation Technol. Mines Minelike Targets X, 2005, vol. 5794, pp. 1080–1088.
[86] J. Farrell et al., “Evaluation and improvement of spectral features
for the detection of buried explosive hazards using forward-looking ground-penetrating radar,” in Proc. SPIE Detection and
Sensing Mines, Explosive Objects, and Obscured Targets XVII, 2012,
vol. 8357, p. 83571C. doi: 10.1117/12.918779.
[87] H.-S. Youn et al., “Feasibility study for IED detection using forward-looking ground penetrating radar integrated with target
features classification,” in Proc. IEEE Antennas Propag. Soc. Int.
Symp., 2010, pp. 1–4.
[88] T. Dogaru and C. Le, “Polarization differences in airborne
ground penetrating radar performance for landmine detection,”
in Proc. SPIE Radar Sensor Technology XX, 2016, vol. 9829, pp.
85–97.
[89] M. Garcia-Fernandez, Y. Alvarez-Lopez, and F. Las Heras, “Autonomous airborne 3D SAR imaging system for subsurface sensing: UWB-GPR on board a UAV for landmine and IED detection,” Remote Sens., vol. 11, no. 20, p. 2357, 2019. doi: 10.3390/
rs11202357.
[90] A. Alzeyadi, J. Hu, and T. Yu, “Electromagnetic sensing of a
subsurface metallic object at different depths,” in Proc. SPIE
Nondestructive Characterization and Monitoring Adv. Mater.,
Aerosp., Civil Infrastructure, and Transp. XIII, 2019, vol. 10971, p.
1097105.
[91] M. González-Díaz, M. García-Fernández, Y. Álvarez-López,
and F. Las-Heras, “Improvement of GPR SAR-based techniques
for accurate detection and imaging of buried objects,” IEEE Trans.
Instrum. Meas., vol. 69, no. 6, pp. 3126–3138, 2019. doi: 10.1109/
TIM.2019.2930159.
[92] D. Šipoš and D. Gleich, “A lightweight and low-power UAV-borne ground penetrating radar design for landmine detection,”
Sensors, vol. 20, no. 8, p. 2234, 2020. doi: 10.3390/s20082234.
[93] I. Catapano et al., “Small multicopter-UAV-based radar imaging:
Performance assessment for a single flight track,” Remote Sens.,
vol. 12, no. 5, p. 774, 2020. doi: 10.3390/rs12050774.
[94] T. Dogaru, “Imaging study for small unmanned aerial vehicle
(UAV)-mounted ground-penetrating radar: Part I – Methodology and analytic formulation,” Army Res. Lab., Sensors and
Electronic Devices Directorate, Adelphi, MD, Tech. Rep. ARL-TR-8645, 2019.
[95] T. Dogaru, “Imaging study for small unmanned aerial vehicle
(UAV)-mounted ground-penetrating radar: Part II – Numeric examples and performance analysis,” Army Res. Lab., Sensors and
Electronic Devices Directorate, Adelphi, MD, Tech. Rep. ARL-TR-8725, 2019.
[96] T. Dogaru, “Imaging study for small unmanned aerial vehicle
(UAV)-mounted ground-penetrating radar: Part III – A multistatic approach,” Army Res. Lab., Sensors and Electronic Devices
Directorate, Adelphi, MD, Tech. Rep. ARL-TR-8773, 2019.
[97] D. W. Paglieroni, D. H. Chambers, J. E. Mast, S. W. Bond, and
N. Reginald Beer, “Imaging modes for ground penetrating radar
and their relation to detection performance,” IEEE J. Sel. Topics
Appl. Earth Observ. Remote Sens., vol. 8, no. 3, pp. 1132–1144,
2015. doi: 10.1109/JSTARS.2014.2357718.
[98] T. Dogaru, “Synthetic aperture radar for helicopter landing in
degraded visual environments,” Army Res. Lab., Sensors and
Electronic Devices Directorate, Adelphi, MD, Tech. Rep. ARL-TR-8595, 2018.
Gaussianizing the Earth
Multidimensional information measures
for Earth data analysis
J. EMMANUEL JOHNSON, VALERO LAPARRA,
MARÍA PILES, AND GUSTAU CAMPS-VALLS

Digital Object Identifier 10.1109/MGRS.2021.3066260
Date of current version: 6 May 2021

Information theory (IT) is an excellent framework for analyzing Earth system data because it enables us to characterize uncertainty and redundancy and is universally
interpretable. However, accurately estimating information
content is challenging because spatiotemporal data are
high-dimensional and heterogeneous and have nonlinear
characteristics. In this article, we apply multivariate Gaussianization for probability density estimation, which is robust to dimensionality, comes with statistical guarantees,
and is easy to apply. In addition, this methodology enables
us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual information (MI). We demonstrate how
IT measures can be applied in various Earth system data
analysis problems.
First, we show how the method can be used to jointly
Gaussianize radar backscattering intensities, synthesize hyperspectral data, and quantify information content in aerial optical images. We also quantify the information content
of several variables that describe the soil–vegetation status
in agroecosystems and investigate the temporal scales that
maximize their shared information under extreme events,
such as droughts. Finally, we measure the relative information content of space and time dimensions in remote sensing products and model simulations involving long records
of key variables, such as precipitation, sensible heat (SH),
and evaporation. Results confirm the validity of the method, for which we anticipate wide use and adoption. Code
and demonstrations of the implemented algorithms and IT
measures are provided.
EARTH DATA AND INFORMATION DELUGE
Understanding the spatiotemporal dynamics of Earth system
models and observation data is fundamental to monitoring our planet and understanding climate change [1]–[4].
We now face an information deluge from remote sensing
platforms that continuously increase the spatial, temporal,
and spectral resolution of data sources. Earth system data
come in high volumes, are heterogeneous, and are riddled
with uncertainty [5], which poses important challenges
in analysis, modeling, and understanding. The statistical
analysis of remote sensing data and model simulations requires dealing with this large amount of heterogeneous,
multivariate, and spatiotemporal material. Copious amounts
of data do not necessarily mean large quantities of information. For example, it is now widely acknowledged that
models are often correlated and share common traits,
features, and information content. Which features are the
most appropriate and representative? How can we best
quantify their information content in meaningful units?
Essential Earth variables and data products exhibit high
levels of redundancy in space and time. So, what is the
most appropriate space, time, or spatiotemporal scale one
should look at? The same questions arise when trying to assess and choose the most adequate observational variable
and biogeophysical parameter for Earth monitoring.
From a purely statistical standpoint, information quantification for Earth and climate data is difficult. IT is the appropriate framework to study information content, uncertainty, and redundancy [6]. The estimation of entropy and
MI for discrete and continuous random variables has been
addressed through different approaches in the statistics literature [7]–[10]. But the IT measure estimation of multivariate data is problematic. Some methods, such as using histograms [6], [11] and nearest neighbors [8]–[10], can be very
limiting, as they do not scale well, do not converge to the
true measure, and show a high estimation bias [12]. However, in the remote sensing and geosciences community, there
have been many successful application-driven approaches to
overcome this challenge. Examples include studying feature
redundancy in image classifiers [13], assessing the maximum number of parameters that can be estimated given a set
of observations [14], remote sensing feature extraction and
weighting [15], [16], data fusion [17], image registration [18]–
[20], synthetic aperture radar (SAR) data characterization
[21], [22], and quantifying uncertainty in models and observations [23]. However, again, these methods are application-driven, and none have been tested in very-high-dimensional
scenarios, which is crucial for data characterization.
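As a rough illustration of the histogram-based estimators mentioned above, the following sketch computes a plug-in estimate of differential entropy for a one-dimensional Gaussian sample and compares it with the closed-form value. The function name `histogram_entropy`, the bin count, and the toy data are assumptions chosen for illustration; they are not taken from the cited works. The sketch covers only the one-dimensional case, where a fixed grid of bins is still cheap to fill.

```python
import numpy as np


def histogram_entropy(x, bins=50):
    """Plug-in differential entropy estimate (in nats) from a 1D histogram.

    Hypothetical helper for illustration only.
    """
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    probs = counts / counts.sum()
    nonzero = probs > 0  # skip empty bins so log() stays finite
    # Entropy of the piecewise-constant density probs / widths.
    return -np.sum(probs[nonzero] * np.log(probs[nonzero] / widths[nonzero]))


rng = np.random.default_rng(0)
sigma = 2.0
samples = rng.normal(0.0, sigma, size=100_000)

h_est = histogram_entropy(samples)
# Closed-form differential entropy of a Gaussian with variance sigma**2.
h_true = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)
print(f"histogram estimate: {h_est:.3f} nats, analytic value: {h_true:.3f} nats")
```

With a regular grid, the number of bins grows exponentially with the number of dimensions, which is one way to see why such estimators scale poorly.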
All information quantification metrics require a
good multivariate density estimator. This is especially
problematic in Earth observation (EO) data with moderate- to high-dimensional problems and nonlinear feature
relations. These issues affect the classic parametric density estimators based on the exponential family of solutions and mixture distributions as well as nonparametric
methods based on histograms, kernel density estimation
(KDE), and k-nearest neighbors (kNNs). As an alternative
to these traditional methods, there is a new class of techniques called neural density estimators [24], which are parameterized neural networks that approximate densities.
They use the “change-of-variables” formula to estimate
the densities of inputs and enable one to draw samples
of input data. They have promise, as they have been successfully used in applications related to Earth system sciences, including inverse problems [25] and density estimation [26].
In this article, we look at a particular class of models
in the neural density estimation family. In particular, we
introduce the Gaussianization method [27] and a generalized algorithm called rotation-based iterative Gaussianization (RBIG) [28]. This uses a repeated sequence of simpler
feature-wise Gaussian transformations and orthogonal
rotations until convergence. In each iteration, the total
correlation and the non-Gaussianity are reduced and
converge toward zero, that is, toward full independence.
The learned transformation toward the Gaussian domain
is invertible, which enables us to easily synthesize data
by inverting samples drawn from the Gaussian domain.
The approach is also advantageous because it enables us
to estimate IT measures, such as entropy, total correlation,
non-Gaussianity, and MI in high-dimensional data. It is
fast and easy to apply and has links to deep neural networks [28]–[30].
MULTIVARIATE GAUSSIANIZATION
PROBABILITY DENSITY FUNCTION ESTIMATION
Most problems in signal and image processing, IT, and
machine learning involve the challenging task of multidimensional probability density function (PDF) estimation. A PDF, or simply a density $p(\cdot)$, takes an input
$x \in \mathcal{X}$ and outputs a density following the properties 1)
that $p(x) \geq 0, \forall x \in \mathbb{R}^D$ and 2) that it has to integrate to one,
$\int_{\mathcal{X}} p(x)\,dx = 1$. In practice, we usually do not have access to
the PDF $p(\cdot)$, but we do have a set of (multivariate) samples
drawn from the generating process, $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$, to
estimate the PDF from. Accurate PDF estimation is important because it enables us to 1) calculate the probability of
any arbitrary input data point, which accounts for the relative likelihood that the value of the random variable will
equal the sample; 2) generate samples $x' \sim p(x)$ from this
distribution, thus facilitating data synthesis, background
and support estimation, and anomaly detection; and 3)
calculate expectations for functions (or transformations)
of arbitrary form $f(x)$ given $p(x)$, i.e., $\mathbb{E}_x[f(x)]$, which enables us to, e.g., characterize a system.
Having access to all these properties gives us the ability to tackle long-standing problems in machine learning
and statistics. With accurate PDF estimates, one can model
the conditional densities of data generated from a prior
distribution, develop accurate and efficient compression
schemes, and use principled objective functions, such as
the maximum likelihood. In addition, having access to
an accurate density estimator can be useful in many hybrid applications to deal with out-of-sample and out-of-distribution problems [31]. The problem is, therefore, to
estimate the density p (x) given a set of samples from X.
The simplest approach to PDF estimation assumes that the
density has a parametric functional form defined by a fixed
number of tunable parameters. The Gaussian assumption
is the most widely adopted for unimodal distributions,
which come parameterized by a mean $\mu$ and a covariance
matrix $\Sigma$. If more than one mode is assumed, a mixture
of Gaussians (MoG) generally leads to better fits. However, finding a parametric form for the distribution that fits
properly to particular data is very difficult in most cases.
The alternative technique comes from nonparametric models, which do not assume a specific form for the
distribution and are learned from data. The simplest nonparametric method estimates the PDF by partitioning the
data space into nonoverlapping bins, where the density is
estimated as the fraction of data points in the bin divided
by the volume of the bin. This estimator runs the risk of
overfitting or underfitting, depending upon how the bins
are selected. Thus, there are several rule-of-thumb estimators with a wide range of guidelines for choosing the most
appropriate bin size: 1) an overall good estimator using
Sturges’s Rule [32], 2) an estimator that is better for a larger
number of samples and is more robust to outliers by using
the Freedman–Diaconis method [33], and 3) more Bayesian
approaches [34]. However, histogram-based PDF estimation methods are affected by the curse of dimensionality,
so they cannot be applied to a large number of features.
Alternative parametric estimates that follow probability
estimation schemes for the optimal bin width determined
by the maximum likelihood have been introduced [24].
However, they are very rigid and lead to extremely rough
density functions.
To achieve smoother PDF estimates, KDE is popular. It
places a nonlinear kernel function with a varying bandwidth parameter to control the degree of smoothness on
top of each example. Unfortunately, a bias–variance tradeoff will result in over/underfitting the PDF, especially in
moderate- to high-dimensional problems. In the previous
approaches, the bandwidth is typically fixed a priori following heuristics in the literature [35], and it rarely accounts for the concentration of points, i.e., that smaller bins
should be placed in regions with a higher concentration of
points, in the form of an adaptive bit allocation scheme.
This can be addressed by using kNNs, which have one
adaptive bandwidth per location and depend on the number of available training points. However, all the preceding
density estimators suffer from the curse of dimensionality:
as the dimensionality increases, the space becomes sparser,
and density estimates are unreliable.
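As a small illustration of the histogram estimator and the bin-size rules mentioned above, the following sketch builds a 1D histogram density in plain Python. The function names, the synthetic Gaussian samples, and the choice of a fixed seed are assumptions made for illustration only; they are not taken from the article or from any RBIG implementation.

```python
import math
import random
import statistics


def sturges_bins(n):
    """Number of bins suggested by Sturges's rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_width(samples):
    """Bin width suggested by the Freedman-Diaconis rule: 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return 2.0 * (q3 - q1) / len(samples) ** (1.0 / 3.0)


def histogram_density(samples, n_bins):
    """Piecewise-constant density: (fraction of points in a bin) / (bin width)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        # The maximum sample would fall just outside the last bin, so clamp it.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    density = [c / (len(samples) * width) for c in counts]
    return density, width


# Synthetic 1D data (assumption: standard Gaussian samples, fixed seed).
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
n_bins = sturges_bins(len(samples))
density, width = histogram_density(samples, n_bins)
fd_width = freedman_diaconis_width(samples)
```

In one dimension the estimate is cheap and integrates to one by construction. The number of bins grows exponentially with the number of features, which is the curse of dimensionality discussed in the text.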
GAUSSIANIZATION FOR PDF ESTIMATION
An alternative way to estimate a PDF from observational
data is to employ a data transformation to a convenient domain instead of working explicitly in the high-dimensional
input domain. The question of what constitutes a convenient domain is a long-standing one. Ideally, the domain
should have independent components so that one can work
in each dimension independently to get rid of the curse of
dimensionality. It should enable one to perform operations
and compute quantities therein, and it should be invertible so that one can express these quantities in meaningful
units of the input domain.
The Gaussian distribution has the desirable properties
of showing independent components and being mathematically tractable and is thus a good candidate for density estimation. A class of Gaussianization methods [28],
[30] looks for transforms to a multivariate Gaussian domain. These transforms are related to projection pursuit
transformations introduced in [42] and seek to transform
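a multivariate distribution $p(x)$, where $x \sim \mathcal{X} \in \mathbb{R}^d$, into a
standardized multivariate Gaussian distribution [27], [28]:
$$G_\theta:\ x \in \mathbb{R}^d \mapsto z \in \mathbb{R}^d, \qquad x \sim p(x), \quad z \sim \mathcal{N}(0, I_d), \tag{1}$$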

where $\theta$ are the parameters learned to Gaussianize the
data x, 0 is a vector of zeros (for the means), and $I_d$ is
the identity matrix (for the covariance). By construction, the Gaussianization transform is a parameterized
function $G_\theta$ consisting of a sequence of L iterations (or
layers), each performing an orthogonal rotation of the
data and a marginal Gaussianization transformation to
every feature.
The transformation $G_\theta$ in each iteration $\ell$ is defined as
$$G_\theta:\ x^{(\ell+1)} = R_\ell\, \Psi_\ell\big(x^{(\ell)}\big), \qquad \ell = 0, 1, \ldots, L,$$
where $x^{(0)}$ corresponds to the original data x, $\Psi_\ell$ is the marginal Gaussianization of each dimension of $x^{(\ell)}$ for the iteration $\ell$, and $R_\ell$ is a rotation matrix for the marginally Gaussianized variable $\Psi_\ell(x^{(\ell)})$. After convergence in L iterations,
the transformation contains all the needed information to
convert data coming from the original density into a multivariate Gaussian. Here, $\theta$ collectively groups all parameters: those from the rotation matrix R and the marginal
transformation $\Psi$. For example, one could use a principal
component analysis (PCA) transformation for the rotation
matrix R and a histogram transformation for the marginal
Gaussianization transformation $\Psi$. Then, the eigenvectors
obtained from PCA describing R and the parameterizations
of $\Psi$ would define $\theta$. See Table 1 for more details on the
decomposition of this formula and Figure 1 for a full decomposition of a toy data set.
TABLE 1. A SUMMARY OF THE COMPONENTS OF THE GAUSSIANIZATION ALGORITHM.

| DESCRIPTION | NOTATION | TRANSFORMATION | DOMAIN |
| --- | --- | --- | --- |
| Marginal uniformization | $U$ | Histogram [28], mixture CDF [36], KDE [30], Lambert [37], spline [38], Box–Cox [39] | $\mathbb{R} \to [0, 1]$ |
| Inverse CDF | $\mathrm{CDF}^{-1}$ | Inverse Gaussian CDF, logistic, inverse Cauchy CDF | $[0, 1] \to \mathbb{R}$ |
| Marginal Gaussianization | $\Psi = \mathrm{CDF}^{-1} \circ U$ | Marginal uniformization + inverse CDF | $\mathbb{R} \to \mathbb{R}$ |
| Rotation | $R$ | PCA [28], independent component analysis [27], random rotations [28], Householder transformations [40], [41] | $\mathbb{R}^d \to \mathbb{R}^d$ |
| Gaussianization block | $G_\ell = R\,[\Psi_1 \cdots \Psi_d]$ | Composition of rotation + marginal Gaussianization | $\mathbb{R}^d \to \mathbb{R}^d$ |
| Gaussianization transform | $G = [G_1 \circ \cdots \circ G_L]$ | Composition of Gaussianization blocks | $\mathbb{R}^d \to \mathbb{R}^d$ |

CDF: cumulative distribution function.
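The marginal Gaussianization row of Table 1 (marginal uniformization followed by an inverse Gaussian CDF) can be sketched for a single feature as follows. This is a minimal sketch, assuming a rank-based empirical CDF for the uniformization step rather than the histogram estimator cited in the table; the function names and the exponential toy data are hypothetical.

```python
import random
from statistics import NormalDist


def marginal_uniformization(values):
    """U step: map each value to its empirical CDF in (0, 1) using ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n  # the 0.5 offset keeps u strictly inside (0, 1)
    return u


def marginal_gaussianization(values):
    """Composition CDF^{-1} o U, using the standard Gaussian inverse CDF."""
    std_normal = NormalDist(0.0, 1.0)
    return [std_normal.inv_cdf(u) for u in marginal_uniformization(values)]


# Skewed toy feature (assumption: exponential samples, fixed seed).
random.seed(0)
skewed = [random.expovariate(1.0) for _ in range(1000)]
z = marginal_gaussianization(skewed)
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
```

The transform is monotonic, so the ordering of the samples is preserved, while the output feature has approximately zero mean and unit variance.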

We can use the change-of-variables formula to calculate
the PDF of x as
$$p_x(x) = p_z\big(G_\theta(x)\big)\,\left|\nabla_x G_\theta(x)\right|, \tag{2}$$
where $|\nabla_x G_\theta(x)|$ is the determinant of the Jacobian of $G_\theta$
with respect to x. Generally, any unknown PDF of x can be estimated as long as we have the transformation $G_\theta$ along with
its Jacobian. Intuitively, this transformation essentially converts the density of X into unstructured noise (often Gaussian or normal) [24], [26], [43]. There is no limit to the number of composite transformations $G_\Theta = G_{\theta_1} \circ G_{\theta_2} \circ \cdots \circ G_{\theta_L}$
that can be used to sufficiently converge to the Gaussian distribution. In addition, because $G_\Theta$ is invertible,
we can sample points in the original domain $x' \in \mathcal{X}$ by
generating samples in the transformed Gaussian domain
and propagating these through the inverse transformation $G_\Theta^{-1}$. Because the transform is a product of linear and
marginal operations, the Jacobian and the inverse transform can be easily computed [28], [44].
The original Gaussianization algorithm [27] worked
by applying an orthogonal rotation matrix via independent component analysis and an MoG for the marginal
Gaussian transformation. After enough repetitions L, it
was shown that this converged to a multivariate Gaussian
distribution [27]. In [28], we extended Gaussianization by
realizing that the method will converge with any orthogonal rotation matrix R, and we named the algorithm RBIG.

FIGURE 1. A complete Gaussianization of a noisy sine wave to a marginally and jointly Gaussian distributed one. We use PCA for the
rotation matrix and a histogram cumulative distribution function estimator for the marginal transformation. Plots were generated with
seaborn [73]. (Panel sequence: $x_0$, $G_1$, $x_1$, $G_2$, $x_2$, $G_3$, ..., $G_L$.)

This facilitated simpler and faster algorithms, such as
PCA, and even randomly generated orthogonal rotation
matrices. In addition, much simpler univariate estimators, such as histograms, were used to significantly speed
up the algorithm. Meng et al. [30] coined the term Gaussianization flows and extended the iterative algorithm to be
fully parameterized and trainable by incorporating a mixture of logistics as the marginal Gaussianization layer and
a sequence of Householder flows [40], [41] as the rotation
layer. They also proved that this is a universal approximator and showed convincing results that Gaussianization
is comparable to other classes of methods specifically designed for density estimation and sampling [30]. All transformations and example variants can be found in Table 1.
For details about the theoretical convergence properties of
Gaussianization flows, see [27], [28], and [30].
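The iteration described above (a marginal Gaussianization of every feature followed by an orthogonal rotation, repeated for L layers) can be sketched in two dimensions with the standard library only. This is a hedged toy version, not the RBIG reference implementation: it assumes rank-based marginal Gaussianization, random rotation angles, and a fixed number of layers instead of a convergence criterion, and it does not store the parameters needed to invert the transform. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def gaussianize_column(column):
    """Rank-based marginal Gaussianization of one feature."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return out


def rotate(points, angle):
    """Apply a 2D orthogonal rotation to every point."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * a - s * b, s * a + c * b) for a, b in points]


def gaussianization_layer(points, angle):
    """One layer: marginal Gaussianization of each feature, then a rotation."""
    col0 = gaussianize_column([p[0] for p in points])
    col1 = gaussianize_column([p[1] for p in points])
    return rotate(list(zip(col0, col1)), angle)


def rbig_like(points, n_layers, rng):
    """Stack n_layers layers with random rotation angles (assumption)."""
    for _ in range(n_layers):
        points = gaussianization_layer(points, rng.uniform(0.0, math.pi))
    return points


# Noisy sine wave toy data, loosely following the Figure 1 example.
rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(500)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.1)) for x in xs]
transformed = rbig_like(data, n_layers=10, rng=rng)
mean_sq_radius = sum(a * a + b * b for a, b in transformed) / len(transformed)
```

Because each layer ends with a rotation, the mean squared radius of the output stays close to 2, the value expected for a 2D standard Gaussian. A practical implementation would also keep the per-layer marginal CDFs and rotation matrices so that the transform can be inverted and its Jacobian evaluated.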
Regardless of the chosen method, to find the parameters
$\theta$ for the transformation $G_\theta$, we minimize the following
cost function with respect to $\theta$:
$$\mathcal{L}(\theta) = D_{\mathrm{KL}}\left[\,p_z\big(G_\theta(x)\big)\,\big\|\,\mathcal{N}(0, I_D)\right], \tag{3}$$
which is the Kullback–Leibler (KL) divergence between
the estimated Gaussian distribution and the true multivariate Gaussian distribution of mean 0 and covariance
I; in other words, this is a measure of how non-Gaussian
our distribution is after transformation. This reveals a direct relationship with information-theoretic concepts and
measures. Chen [27], [46] showed that (3) can be decomposed as
$$\mathcal{L}(\theta) = T(x) + J_m(x), \tag{4}$$
where $T(x)$ is the total correlation (T) (as well as multi-information and multivariate MI) between all the marginal
distributions and $J_m(x)$ is the KL divergence between the
marginal distributions and the standard Gaussian normal distribution. Intuitively, this cost function is trying to
minimize the information shared among each of the marginal distributions and ensure that they follow a normal
Gaussian distribution. We want to highlight the fact that
RBIG vastly transforms and simplifies the PDF estimation
problem, from directly estimating the density of the high-dimensional multivariate distribution in X to doing it indirectly through a transformation to a Gaussian domain,
all by using a series of marginal transformations, which are
straightforward and fast.
An example of how RBIG works on a simple 2D toy data
set is provided in Figure 2. We transform a non-Gaussian 2D
data set into a 2D marginal and jointly Gaussian distribution along with the inverse transformation (first row). The
second row demonstrates how we can use RBIG to synthesize points in the data domain by using the inverse transformation. Figure 2(f) shows evolution through iterations
of the final total correlation (as a measure of redundancy)
and the non-Gaussianity (as a measure of the distance to a
Gaussian). See the RBIG site, https://ipl-uv.github.io/rbig_jax/, for a working implementation of the RBIG algorithm in
Python and MATLAB.

FIGURE 2. The density estimation of a sinusoid with heteroscedastic noise, using RBIG. The original data distribution X is mapped
to a Gaussian domain Z, with transform $G_\theta$ parameterized by a
set of rotations and marginal Gaussianizations collectively denoted
as $\theta$, which has an analytic inverse transformation, $x = G_\theta^{-1}(\hat{z})$, to
recover the original data. One can sample random data from the
Gaussian in domain Z and use the inverse transformation of z to
$\hat{x}$ for data synthesis. We also demonstrate the losses: the equivalence of the change in the total correlation between layers $\Delta T$ and
the KL divergence between transformed data and a multivariate
Gaussian (non-Gaussianity). (a) X. (b) $\hat{z} = G_\theta(x)$. (c) $x = G_\theta^{-1}(\hat{z})$.
(d) Z. (e) $\hat{x} = G_\theta^{-1}(z)$. (f) $\Delta T$ and non-Gaussianity (loss versus iterations). Plots were
generated with matplotlib [74].

IT MEASURES USING THE RBIG TRANSFORM
RBIG was designed for density estimation but was inspired by, and had connections to, IT [6]. The series of
transformations learned by RBIG converts data from the
original domain to a standard multivariate Gaussian
one. The features are marginally independent, which is
important for determining information-theoretic measures using the Gaussianization scheme. This reduction
in redundancy is iteratively achieved and can be explicitly
computed by summing up all the layer redundancy reductions. This metric is known as the total correlation, and
computing it enables us to derive information-theoretic
measures from data.
INFORMATION
Shannon information I [47] is based on the idea that a sample, $x_i$, is more interesting (it carries more information) if
it is less probable. The formal definition of information is
$$I(x_i) = -\log\big(p_x(x_i)\big). \tag{5}$$

It can be used, for instance, to highlight regions of more
interest in a data set. Information can be computed for each
sample in our data set by using RBIG and (2). The expected
value of the information provided by a complete data set, x,
is called entropy:
$$H(x) = \mathbb{E}_x\big[-\log\big(p_x(x)\big)\big]. \tag{6}$$
While entropy could be computed by estimating the information of each sample in a data set using (5) and averaging,
computing it using the ability of RBIG to calculate the total
correlation is more convenient, as we will see in the following section.
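To make the chain from (2) to (5) and (6) concrete, the sketch below uses a 1D case in which the Gaussianizing transform is known in closed form: for log-normal data, G(x) = log(x) maps the samples to a standard Gaussian. This closed-form transform stands in for a learned RBIG transform and is purely an assumption for illustration; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def gaussianize(x):
    """Known Gaussianizing transform for log-normal data: G(x) = log(x), x > 0."""
    return math.log(x)


def abs_jacobian(x):
    """|dG/dx| = 1 / x for G(x) = log(x)."""
    return 1.0 / x


def density(x):
    """Change of variables, as in (2): p_x(x) = p_z(G(x)) * |dG/dx|."""
    return STD_NORMAL.pdf(gaussianize(x)) * abs_jacobian(x)


def information(x):
    """Shannon information of one sample, as in (5), in nats."""
    return -math.log(density(x))


# Entropy as the average information over samples, as in (6).
random.seed(0)
samples = [math.exp(random.gauss(0.0, 1.0)) for _ in range(20000)]
entropy_estimate = sum(information(x) for x in samples) / len(samples)

# Closed-form entropy of a log-normal with mu = 0 and sigma = 1, for comparison.
entropy_exact = 0.5 * math.log(2.0 * math.pi * math.e)
```

The sample average of the information approaches the closed-form entropy as the number of samples grows. Natural logarithms are used here, so the values are in nats rather than bits.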
TOTAL CORRELATION
The total correlation, T, accounts for the information shared
among the dimensions of a multidimensional random variable [48], [49]. Details of how to compute T using RBIG can
be found in [28]; here, we sketch the main idea. Given data
$x \in \mathbb{R}^D$, we first learn the Gaussianization transform with
L iterations and compute the cumulative reduction in the
total correlation in each iteration as
$$T(x) = \sum_{\ell=1}^{L} \left( D\, H\big(\mathcal{N}(0, 1)\big) - \sum_{d=1}^{D} H\big(x_d^{(\ell)}\big) \right). \tag{7}$$

The number of layers L will be determined by the reduction in the total correlation with each transformation.
If there is no change in the total correlation after some
threshold number of layers, we can assume that the features $x_d$ are
completely independent. It is important to note that all
entropy calculations involve only marginal operations,
which are simple and fast, enabling RBIG to be used on
large data sets that have a high number of dimensions.
JOINT ENTROPY
While the concept of information is attached to a particular
sample, entropy is used in different fields to characterize
how unpredictable a complete process is. Entropy can be
easily computed from the learned RBIG transformation by
$$H(x) = \sum_{d=1}^{D} H(x_d) - T(x), \tag{8}$$
where $\sum_{d=1}^{D} H(x_d)$ are marginal entropy estimations and
$T(x)$ also involves marginal estimations [see (7)].
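As a worked check of (8), consider a case where every term is available in closed form. This example is an illustration added here, not a result from the article: a bivariate Gaussian $x = [x_1, x_2]$ with variances $\sigma_1^2$, $\sigma_2^2$ and correlation coefficient $\rho$. The marginal and joint entropies are

$$H(x_d) = \tfrac{1}{2}\log\big(2\pi e\,\sigma_d^2\big), \qquad H(x) = \tfrac{1}{2}\log\big((2\pi e)^2\,\sigma_1^2\sigma_2^2\,(1-\rho^2)\big),$$

so rearranging (8) gives

$$T(x) = \sum_{d=1}^{2} H(x_d) - H(x) = -\tfrac{1}{2}\log\big(1-\rho^2\big).$$

The total correlation is zero when $\rho = 0$ (independent features) and grows without bound as $|\rho| \to 1$, which matches its reading as the redundancy shared among the dimensions.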
MULTIVARIATE MI
Multivariate MI accounts for the information shared by two
data sets [6]. Estimating MI can be very challenging when
working with high-dimensional data. Our approach is based
on the invariance property of MI to reparameterize the
space of each variable [8]. Therefore,
we essentially Gaussianize the two
data sets, X and Y, with corresponding
transforms that remove their total correlations. Then, the total correlation
remaining between both Gaussianized data sets is equivalent to the MI
between the original data sets:
$$\mathrm{MI}(X, Y) = T\big([G_{\theta_x}(X), G_{\theta_y}(Y)]\big), \tag{9}$$
which again implies only marginal operations [see (7)]. Figure 3 includes
a Venn diagram illustrating the different IT measures used in this article,
and Table 2 demonstrates how they
compare to the popular Pearson correlation coefficient for different toy data sets.

FIGURE 3. A Venn diagram of the relationships of all IT measures used in this article. The
solid-colored circles represent marginal variables, and the intersection regions with bold
lines show regions for IT measures, such as MI and total correlation, T.
(Diagram labels: x = [x1, x2], y = [y1, y2]; H(x1), H(x2), H(y1), H(y2); H(x) = H([x1, x2]), H(y); T(x) = MI(x1, x2), T(y); MI(x, y); T([x, y]).)

TABLE 2. A COMPARISON OF DIFFERENT IT MEASURES AND THE POPULAR PEARSON CORRELATION COEFFICIENT, $\rho$.
(Entries are listed in the order extracted; the toy data set illustrations are not reproduced.)
Correlation, $\rho(x, y)$: Low, Medium, Low, Medium, High
MI(x, y): Low, Medium, High
Marginal entropy, H(x), H(y): High
Joint entropy, H(x, y): High, Medium, Low
This table is also a visual demonstration of how to interpret MI and its relationship to marginal entropy and joint entropy; MI(x, y) = H(x) + H(y) - H(x, y).
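The contrast between the Pearson correlation coefficient and MI can be reproduced on a 2D toy data set with a strong but nonlinear dependence. The sketch below is an assumption-laden illustration: it uses a simple plug-in histogram MI estimator (adequate only in very low dimensions, as discussed earlier), not RBIG, and the data, bin count, and function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin containing value, clamped to the valid range."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def histogram_mi(xs, ys, n_bins=10):
    """Plug-in MI estimate in nats: sum p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    joint = {}
    for x, y in zip(xs, ys):
        key = (bin_index(x, x_lo, x_hi, n_bins), bin_index(y, y_lo, y_hi, n_bins))
        joint[key] = joint.get(key, 0) + 1
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    return sum((c / n) * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(5000)]
ys_dependent = [x * x + random.gauss(0.0, 0.01) for x in xs]    # nonlinear dependence
ys_independent = [random.uniform(-1.0, 1.0) for _ in range(5000)]  # no dependence

rho_dependent = pearson(xs, ys_dependent)
mi_dependent = histogram_mi(xs, ys_dependent)
mi_independent = histogram_mi(xs, ys_independent)
```

For the parabola, the Pearson coefficient is close to zero even though y is almost a deterministic function of x, whereas the MI estimate is clearly larger than for the independent pair.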
ILLUSTRATIVE EXPERIMENTS
In this section, we explore the information content, redundancy, and relation in a selection of Earth data analysis problems involving remote sensing data and models, using RBIG.
First, we illustrate the method’s ability to analyze standard
remote sensing settings involving total correlation estimation in hyperspectral, radar, and very-high-resolution imagery. Second, we quantify the information content of several
variables that describe a soil–vegetation status and investigate the temporal scales leading to the maximum shared
information for the detection and precursors of anomalies,
such as droughts. Finally, we explore the challenging problems of IT measurement estimates and the quantification
of the spatiotemporal information tradeoff in global Earth
products. Table 3 summarizes the experiments in terms of
measures, applications, and data/simulations.
GAUSSIANIZATION IN REMOTE SENSING DATA
This first set of experiments considers the use of RBIG for
standard remote sensing image processing. We show the
performance of RBIG in hyperspectral, very-high-resolution, and radar imagery and for several applications: joint
(multivariate) Gaussianization, data synthesis, and information estimation.
GAUSSIANIZATION OF RADAR IMAGES
The first part of the experiment focuses on the analysis of
radar imagery. The data were collected in the Urban Expansion Monitoring (UrbEx) project, a part of the European Space Agency’s European Space Research Institute
Data User Program [51]. Results from the UrbEx project were used to perform the analysis of the selected test
sites and for validation purposes. We consider a European
Remote Sensing Satellite 2 (ERS-2) SAR pair selected with
perpendicular baselines between 20 and 150 m to obtain
the interferometric coherence. The corresponding pair
(I 1, I 2) of SAR backscattering intensities (0–35 days) was
stacked for analysis; see Figure 4. The relation between the
intensity features is strongly nonlinear and non-Gaussian
and shows a large dispersion; see Figure 4(a). The total correlation, T, computed with RBIG for the original domain is
T = 0.0929 b. A standard approach in SAR image (pre)processing consists of noise removal and marginal Gaussianization, which can address these problems only partially.
This marginal Gaussianization cannot deal with the saturation for high and low signal values [Figure 4(b)]. A multivariate Gaussianization leads to a fully Gaussian density
[Figure 4(c)]. This is confirmed by the estimated total correlation of T = 0.0095 b, which is lower than that of the marginally
Gaussianized data.
SYNTHESIZING HYPERSPECTRAL IMAGES
To show the ability of the method to deal with high-dimensional data, we consider hyperspectral image processing. We took the standard Airborne Visible/Infrared Imaging Spectrometer Indian Pines data set [52], where the data
have spectral redundancy and complex joint distributions.
The images contain 200 spectral channels, constituting the
(very high) input dimensionality. We learned a Gaussianization transform that led to a multivariate Gaussian domain
of 200-dimension spectral bands. Then, we selected from a
multivariate Gaussian $n = 10^6$ samples of 200 dimensions
and inverted them back to the spectral domain. RBIG can be
used this way to easily generate synthetic spectra. Figure 5(a)
presents the original and synthesized spectra. It shows
how the proposed method enables us to generate/synthesize seemingly realistic spectral distributions, even in such a high-dimensional setting. Figure 5(b) and (c) give corner plots
illustrating joint distributions among various spectral bands
(10, 20, 50, 100, and 150). We see that the marginal and joint
distributions for the RBIG-generated spectra in Figure 5(c)
are very similar to the real data in Figure 5(b) across all pairwise band combinations. It is important to note that some of
the most widely used methods, such as PCA, could replicate
Figure 5(a) with a good approximate mean and standard deviation, but they would not be able to duplicate Figure 5(d),
where all joint distributions are approximately Gaussian.

FIGURE 4. Radar image processing. We illustrate the Gaussianization of 2D radar data comprised of a pair (I1, I2) of ERS-2 SAR backscattering intensities. (a) The joint distribution is non-Gaussian, and preprocessing before applying any algorithm is generally convenient. The (b)
standard marginal Gaussianization does not achieve a full spherical (joint) Gaussian, unlike (c) the RBIG transformation [75], [76].

FIGURE 5. The Gaussianization and synthesis of hyperspectral data, using RBIG. In (a) we show the mean and standard deviation spectrum
for the 21,000 real pixels (mean = black; standard deviation = darker shade) and the 1 million pixels generated synthetically (mean = red;
standard deviation = lighter shade) using RBIG; the axes are wavelength, λ (nm), and radiance (Wm–2nm–1). In (b) and (c), we show the marginal and joint distributions of 10, 20, 50, 100, and 150 spectral bands for the real data and for data generated with RBIG, respectively. Plots were generated with corner [77].

TABLE 3. A SUMMARY OF EXPERIMENTS, WITH DETAILS OF THE DATA SETS, CONFIGURATIONS, APPLICATIONS,
AND MEASURES EMPLOYED.

| DATA SET | CHARACTERISTICS | REFERENCE | CONFIGURATION | APPLICATION | MEASURES |
| --- | --- | --- | --- | --- | --- |
| SAR: European Remote Sensing Satellite 2 | 26 m, backscatter intensity | [51] | Pixel-wise | Gaussianization | |
| Hyperspectral: Airborne Visible/Infrared Imaging Spectrometer | 30 m, 224 channels | [52] | Pixel-wise | Synthesis | |
| Airborne camera: red-green-blue images | 10 cm, 21 classes, 100 images/class | [53] | Spatial | I quantification | |
| Optical: Moderate Resolution Imaging Spectroradiometer land surface temperature, normalized-difference vegetation index | 0.05º, 5.5 years, 14 days | [54] | Temporal | I quantification, PDF comparison | H, MI |
| Passive microwave: Soil Moisture and Ocean Salinity (SMOS) soil moisture, vegetation optical depth | 25 km, 5.5 years, daily | [55] | Temporal | I quantification, PDF comparison | H, MI |
| Observed and simulated: evaporation, SH, precipitation | 0.083º, 10 years, monthly, global | [56] | Spatiotemporal | I quantification | I, H |

INFORMATION IN HIGH-RESOLUTION IMAGES
Very-high-resolution images are constantly acquired by
the new generation of sensors on airborne and spaceborne
platforms. A systematic analysis of the images is necessary.
Machine learning and, in particular, deep learning have
led to an important leap in classification accuracy. However, owing to the wealth of data and their diversity, it becomes necessary to design algorithms that exploit most of
the images’ information content in terms of relevant features and examples. We validate RBIG to estimate the total correlation (multi-information) in a set of aerial scenes
collected in the University of California, Merced, data set
[53], which contains manually extracted images from the
United States Geological Survey’s National Map Urban
Area Imagery collection, from 21 aerial scene categories,
with a 1-ft/pixel resolution. The data set contains highly
overlapping classes and has 100 images per class; examples
appear in Figure 6(a). We extracted 3 × 3 × 3 color patches
from each image, which yielded 6,499,950 27-dimension
feature vectors per class. Then, we developed a Gaussianization transformation for each class and computed the
(spatiospectral) T using RBIG; see Figure 6(d). We show
in Figure 6(e) the average and standard deviation of the T
evolution through 50 iterations for the 21 classes (note the
log scale) and the total correlation per class. More textured
classes, such as runways, freeways, buildings, and intersections, lead to higher T, while rather homogeneous/flat
classes, including chaparral, fields, and forests, have little
information content.

FIGURE 6. The estimation of the total correlation, T, in very-high-resolution aerial imagery. (a) Images for each of the 21 classes in the database, ranked according to their estimated T. (b) Each image is decomposed in 3 × 3 patches with three channels (red-green-blue), making
samples of 27 dimensions. (c) We measure how much overlap there is between the information content (i.e., the total correlation) of the 27
dimensions for each class. We show a Venn diagram to illustrate the measured information, following the same criteria as in Figure 3. (d) The
average total correlation is iteratively computed for the different 21 class-specific RBIG models through 50 iterations, with the mean T (solid)
and the T standard deviation (shaded) across all models. Convergence is achieved very rapidly for all classes (note the log scale). (e) The
ranked T per class computed from the RBIG models [78].
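The patch-based feature construction used in this experiment (3 × 3 spatial windows over three color channels, giving 27-dimensional samples) can be sketched as follows. The article does not state the stride or the border handling, so the sliding window with stride 1 and no padding used here is an assumption, as are the function name and the tiny synthetic image.

```python
def extract_patches(image, size=3):
    """Slide a size x size window (stride 1, no padding) over an H x W x C image
    and flatten each window into one feature vector of length size * size * C."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(height - size + 1):
        for c in range(width - size + 1):
            vec = []
            for dr in range(size):
                for dc in range(size):
                    vec.extend(image[r + dr][c + dc])  # all channels of this pixel
            patches.append(vec)
    return patches


# Tiny synthetic 5 x 6 RGB image whose pixel values encode their position.
image = [[[r, c, r + c] for c in range(6)] for r in range(5)]
patches = extract_patches(image)
```

Each resulting vector is one sample for a class-specific Gaussianization model, so the total correlation then measures redundancy across both neighboring pixels and color channels.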
INFORMATION QUANTIFICATION OF TERRESTRIAL
BIOSPHERE VARIABLES IN TIME
According to climate projections, extreme events are likely
to intensify and become more frequent during the coming
years [59]. The effects of extreme events (such as droughts)
are prevalent not only in the biosphere and atmosphere but
in the anthroposphere. Drought is a major cause of limited agricultural productivity, which accounts for a large
proportion of the crop losses and annual yield variations
throughout the world [60]. Droughts are also direct contributors to social conflicts, migration, and political unrest
(e.g., [61]). There are many studies that show the value of
incorporating EO data for global agricultural systems and
applications [62], [63]. Variables, such as the land surface
temperature (LST) and the normalized-difference vegetation index (NDVI), derived from optical satellites, and,
more recently, soil moisture (SM) and the vegetation optical depth (VOD) derived from passive microwave sensors,
are just a few of the many features that can potentially be
key to the early detection of droughts [54], [55], [64]. The
Soil Moisture Agricultural Drought Index (SMADI) was
proposed in [65] to integrate SM with the LST and NDVI,
showing good agreement with other indices and documented events worldwide [54].
In this experiment, we quantify the information in and
between LST, NDVI, SM, and VOD variables for a study area
in California (only agricultural fields); see Figure 7. The LST
and NDVI are descriptors of the surface temperature and
vegetation chlorophyll content, whereas SM and the VOD
characterize the water content in soils and vegetation [55],
[65]. We also use information measures to evaluate whether
it would be worthwhile to include the VOD as an additional
variable in the SMADI ensemble to characterize droughts.
Prior to the analyses, variables are resampled into a common 0.05º grid and biweekly temporal resolution. Details
of the data sets are provided in Table 3. Measures are conducted for 2010 and 2011 and 2014–2016, which are representative of conditions with and without droughts (see
Figure 7).
We focus on computing multivariate IT measures in
a temporal feature setting, where previous time steps are
included as input features. For example, one input feature
includes the current time stamp, two input features include
the current time stamp and a time stamp from 14 days
earlier, and so on. This enables us to investigate temporal
scales that maximize shared information among remotely
sensed variables. This is particularly relevant for droughts
since there is a time lag between soil/climatic conditions
(e.g., represented by SM and the LST) and plant responses
(e.g., described by the NDVI and VOD), which varies in the
literature from two or three weeks to three months [66].
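The temporal feature setting described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `lagged_features` and the toy series are assumptions introduced here, and each row simply stacks the current value with the values from the preceding time steps (14 days apart for biweekly data).

```python
import numpy as np

def lagged_features(series, n_lags):
    """Stack the current value and the (n_lags - 1) previous values as columns."""
    series = np.asarray(series, dtype=float)
    if n_lags < 1 or n_lags > series.size:
        raise ValueError("n_lags must be between 1 and len(series)")
    n_samples = series.size - n_lags + 1
    # Column j holds the value observed j time steps before the current one.
    cols = [series[n_lags - 1 - j : n_lags - 1 - j + n_samples] for j in range(n_lags)]
    return np.column_stack(cols)

sm = np.arange(6.0)            # toy biweekly series: 0, 1, ..., 5
X = lagged_features(sm, n_lags=3)
print(X.shape)                 # (4, 3)
print(X[0])                    # [2. 1. 0.]
```

With `n_lags=1` the result is the original series as a single column, which corresponds to the "one input feature" case in the text.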
The amount of expected information H for each of
the four variables and how it changes as we include more
temporal dimensions is analyzed in Figure 8(a). Entropy will
always increase with more features. The entropy shown here
has been normalized by the total number of features, which
enables us to quantify the amount of entropy per feature. It
can be seen that the amount of entropy for the VOD is the
highest in all temporal settings, closely followed by the LST.
All variables decrease in entropy as we add more temporal
features. The NDVI saturates at roughly 1.5 b, whereas the
other variables have a steady, smooth decline. We can also
see that the LST and VOD show the largest difference between years with and without droughts and that this difference grows as we increase the temporal dimension. This
result suggests that the LST and VOD observed during longer periods could be more useful in detecting droughts. Figure 8(b) demonstrates that the VOD increases the amount
of expected information when added to the SMADI variable
ensemble in all the considered temporal settings, suggesting
that it would be worthwhile to include the VOD in agricultural drought studies. The results indicate that operational
vegetation monitoring could benefit from synergistic approaches that include multisensor,
multidimensional variables, particularly under stress and
during disturbances such as agricultural droughts.
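To make the idea of entropy per feature concrete, the sketch below normalizes a joint entropy estimate by the number of temporal features. It deliberately swaps in a closed-form Gaussian entropy estimate in place of RBIG, so it only illustrates the normalization; the function name `gaussian_entropy_bits` and the synthetic autocorrelated series are assumptions made for this example.

```python
import numpy as np

def gaussian_entropy_bits(X):
    """Differential entropy (bits) of X (n_samples, n_features) under a Gaussian fit."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    d = X.shape[1]
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet) / np.log(2)

# Toy autocorrelated series standing in for a biweekly variable (assumption).
rng = np.random.default_rng(0)
x = np.zeros(5000)
for t in range(1, x.size):
    x[t] = 0.9 * x[t - 1] + rng.normal()

one_step = x[3:, None]                                            # current time step only
four_steps = np.column_stack([x[3:], x[2:-1], x[1:-2], x[:-3]])   # current + 3 previous

h1 = gaussian_entropy_bits(one_step) / 1
h4 = gaussian_entropy_bits(four_steps) / 4
print(f"entropy per feature: 1 step = {h1:.2f} b, 4 steps = {h4:.2f} b")
```

Because consecutive time steps are redundant, the per-feature value drops as lags are added, which is the qualitative behavior reported for the normalized entropy.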
The MI of every pair of multidimensional variables was
analyzed to investigate the pairs’ relationships and redundancies as well as the optimal time scales for combining
them. Note that standard measures for pairwise comparison, such as Pearson’s correlation, are restricted to one temporal dimension and hence do not facilitate exploring these
scales. The MI scores obtained for LST relations are given in
Figure 9. Interestingly, the MI of the LST–NDVI
and LST–VOD pairs increases up to approximately two
to four temporal dimensions and then saturates. This result
suggests that a period of about one or two months is needed
to capture the soil–plant status through the remotely sensed
variables analyzed in our study region. The curves are relatively similar regardless of whether there is a drought or
not, and the value spread for drought years is considerably
reduced for all variables and especially for the VOD. This
could be related to reduced variability (a limited range of
values) during droughts, but further studies are needed to
confirm this. We also observed that the MI between SM
and all other variables is consistently low for any number of temporal dimensions, and it is also low between the NDVI and
VOD, highlighting the value of combining optical and microwave variables for vegetation/land monitoring.
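The MI between two multidimensional variables can be written as I(X; Y) = H(X) + H(Y) − H(X, Y). The sketch below evaluates this identity under a Gaussian assumption, which is a stand-in for the RBIG estimator used in the article and would miss nonlinear dependence. The function name `gaussian_mi_bits` and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_mi_bits(X, Y):
    """MI (bits) between multidimensional X and Y under a joint Gaussian fit."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)

    def logdet_cov(A):
        cov = np.atleast_2d(np.cov(A, rowvar=False))
        return np.linalg.slogdet(cov)[1]

    joint = np.hstack([X, Y])
    # I(X; Y) = H(X) + H(Y) - H(X, Y); the (2*pi*e) terms cancel for Gaussians.
    return 0.5 * (logdet_cov(X) + logdet_cov(Y) - logdet_cov(joint)) / np.log(2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))                      # e.g., one variable with two time steps
Y = X[:, :1] + 0.5 * rng.normal(size=(5000, 1))     # depends on the first column of X
Z = rng.normal(size=(5000, 2))                      # independent of X

mi_xy = gaussian_mi_bits(X, Y)
mi_xz = gaussian_mi_bits(X, Z)
print(f"MI(X, Y) = {mi_xy:.2f} b, MI(X, Z) = {mi_xz:.3f} b")
```

Unlike a single Pearson coefficient, this quantity accepts any number of temporal dimensions on either side, which is what allows the time scales to be explored.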

FIGURE 7. (a) The distribution of cropland in California, according to the Moderate Resolution Imaging Spectroradiometer International Geosphere–Biosphere Program land cover classification. (b) A time series of the normalized VOD, NDVI, SM, and LST, as well as the SMADI [57], obtained at the selected pixel. The SMADI extreme drought category is marked with an orange horizontal line. (c) The percentage of the area in California that is in each U.S. Drought Monitor [58] category (D0: abnormally dry; D1: moderate drought; D2: severe drought; D3: extreme drought; D4: exceptional drought). Figure 7(a) was generated with QGIS [79].

FIGURE 8. (a) A comparison of the entropy for the VOD, LST, NDVI, and SM individually against the number of considered temporal dimensions. (b) A comparison of the VOD entropy contribution to the joint multidimensional variables integrated in the SMADI (the LST, NDVI, and SM) and the SMADI+ (the LST, NDVI, SM, and VOD) and how it changes as we include more temporal dimensions. Solid lines are mean estimates, and shaded regions are the variance estimates for 2010 and 2011, when there were no droughts, and 2014 and 2015, when there were droughts. Next to each graphic, we show a Venn diagram to illustrate the measured information for three temporal dimensions as an example, following the same criteria as in Figure 3. Plots were generated using seaborn [73].

FIGURE 9. MI between pairs of multidimensional variables: LST–VOD, LST–NDVI, and LST–SM. Solid lines are mean estimates, and shaded regions are the variance estimates for 2010 and 2011, when there were no droughts, and 2014 and 2015, when there were droughts. The Venn diagram illustrates the measured information for three temporal dimensions as an example, following the same criteria as in Figure 3. Plots were generated using seaborn [73].

INFORMATION IN SPATIAL–TEMPORAL EARTH DATA
DATA
For our experiments, we used observational and model-simulated variables from the Earth System Data Lab [56] (https://www.earthsystemdatalab.net/), which is a platform that provides an opportunity for datacentric processing methodologies. The analysis-ready data cube contains and harmonizes more than 40 variables to monitor key processes of the terrestrial surface and the atmosphere. The data exhibit clear spatial–temporal relations that need to be accounted for to properly convey and quantify information. Figure 10 illustrates how we represent the spatial–temporal relations as inputs given a single variable. We focus on three key land surface variables: precipitation, SH, and evaporation, which are outlined in the following:
◗ Precipitation: This is a fundamental variable in land–atmosphere processes. The collected data cover the period from 1980 to 2015 and come from the Global Precipitation Climatology Project [67], [68].
◗ SH: These data cover 2001–2012 and were generated by training an ensemble of machine learning algorithms with eddy covariance data from FLUXNET and satellite observations in a cross-validation approach. Regressions from these observations to different kinds of carbon and energy fluxes were established and used to generate data sets with a spatial resolution of 5 arc minutes and a temporal resolution of eight days. The SH is a conductive heat flux from the Earth’s surface to the atmosphere; it is an important component of Earth’s surface energy budget and is expressed in W m⁻² [69].
◗ Evaporation: These data span 2001–2011 and build on the Global Land Evaporation Amsterdam Model, which consists of a set of algorithms that separately estimate the different components of land evaporation by using input-forcing data sets from re-analyses, optical and microwave satellites, and other merged sources. The model consists of four modules: potential evaporation (the Priestley–Taylor equation), interception (the Gash analytical model), soil (the multilayer soil model plus data assimilation), and stress (semiempirical). The data are sampled on a grid of 0.25° and have daily temporal coverage [70], [71].
The data are organized in a 4D cube x(u, v, t, k) involving (latitude, longitude) spatial coordinates (u, v), time sampling t, and
the variable k. They are provided at two spatial resolutions
(0.083° and 0.25°) and at a temporal resolution of eight
days, encompassing the years 2001–2011. In our experiments, we focus on the lower-resolution products and on
the period from 2008 to 2010.
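As a rough illustration of how a single variable of such a cube can be turned into spatial–temporal input samples, the sketch below extracts every window of a given spatial width and temporal length and flattens it into a feature vector. The array layout (latitude, longitude, time), the helper name `spatiotemporal_samples`, and the random cube are assumptions for this example; the three window shapes are the ones named in the Figure 10 caption. Handling of missing values (e.g., ocean pixels) is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def spatiotemporal_samples(cube, spatial, temporal):
    """Flatten every (spatial x spatial x temporal) window of a (lat, lon, time) cube.

    Returns an array of shape (n_windows, spatial * spatial * temporal).
    """
    windows = sliding_window_view(cube, (spatial, spatial, temporal))
    return windows.reshape(-1, spatial * spatial * temporal)

# Synthetic stand-in for one variable: 20 x 30 pixels, 46 eight-day time steps.
cube = np.random.default_rng(0).normal(size=(20, 30, 46))

for spatial, temporal in [(7, 1), (4, 3), (1, 46)]:
    X = spatiotemporal_samples(cube, spatial, temporal)
    print(f"{spatial} x {spatial} x {temporal}: {X.shape[0]} samples, {X.shape[1]} features")
```

The three configurations yield 49, 48, and 46 features, respectively, so estimates obtained from them are computed at comparable dimensionality.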
SPATIAL–TEMPORAL ANALYSIS
The considered variables (precipitation, SH, and evaporation) are fully coupled. Moisture and precipitation interactions are strongly modulated by both land–atmosphere
exchanges and large-scale atmospheric circulation. Nevertheless, before understanding variable relations, it is important to identify when and where individual variables are expressive. This may help to assess the coupling mechanisms
among variables and improve Earth system models. The
question we want to address in this experiment is as follows: What are the optimal (in information terms) spatial
and temporal scales for exploiting each variable’s information? Using RBIG, we show that the ratio of the spatiotemporal neighboring pixels that gives the most information
can be explicitly calculated. We used RBIG to calculate the
entropy H for the aforementioned variables under different spatial–temporal configurations (fully temporal, spatiotemporal, and fully spatial) as well as the corresponding
information I (x) for each time pixel and variable.
Figure 11 shows the entropy for the different variables
and configurations, following the same procedure as [72]
(and used, only in the spatial domain, in the “Information
and Redundancy in High-Resolution Images” section). Essentially, we formed cubes with the same dimensionality
but different spatiotemporal configurations and computed
the entropy values for each. We chose several configurations ranging from a ratio of purely spatial (ratio = zero)
up to purely temporal (ratio = one). We also looked at different configurations for the number of spatial–temporal
dimensions used, e.g., a maximum of four dimensions
up to a maximum of 49 dimensions (temporally, this is approximately one year). Notice how each variable has a
different spatial–temporal relationship with entropy, but,
in general, temporal configurations (ratio = one) convey
more information than purely spatial ones (ratio = zero) for
all the considered variables. The trends are clear, in particular, for precipitation, where incorporating temporal data for
any number of dimensions yields a higher amount of expected information. For SH and evaporation, the entropy
paths are similar and reveal a fast entropy increase for particular spatiotemporal configurations (ratio ~ 0.8). These
results suggest different optimal (in information terms)
time and space scales for various variables, which may have
implications in further analyses and applications.
Using the same data configurations, we computed the
information content of each sample, following the procedure described in the “Information” section. This helps
to visualize the regions that have more and less information. We show in Figure 12 the results of a spatiotemporal
analysis of the information content of all three variables.
In regions where we expect pronounced seasonal patterns,
the information (complexity) is apparently high in fully
temporal configurations, as the seasonal cycle controls
ecosystem dynamics. Actually, seasonal (temporal) modes
have less informative content in the spatial domain, as they
are mainly driven by solar forcing. The information values
tend to be higher in tropical regions, whereas arid locations
show low-complexity (low-information) patterns. Let us
now look in deeper detail at the different spatiotemporal
configurations and their information patterns.
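The per-sample information content can be sketched as I(x) = −log p(x), assuming that this is the definition given in the “Information” section, which is not reproduced here. The example below uses a fitted Gaussian density as a stand-in for the density learned by RBIG; the function name `gaussian_information_bits` and the synthetic samples are hypothetical.

```python
import numpy as np

def gaussian_information_bits(X):
    """Per-sample information -log2 p(x) under a Gaussian fitted to X."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    diff = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Squared Mahalanobis distance of every sample to the mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    logdet = np.linalg.slogdet(cov)[1]
    nats = 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    return nats / np.log(2)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
X[0] = [6.0, 6.0]                      # an unusual sample far from the bulk

info = gaussian_information_bits(X)
print(info.shape, int(info.argmax()))  # the unusual sample carries the most information
```

Samples in low-density regions receive high information values, which is how maps of I(x) highlight locations with unusual spatial–temporal behavior.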
Global rainfall patterns are traditionally related to strong seasonality, dominated by the position of the Intertropical Convergence Zone and El Niño–La Niña cycles, which occur irregularly at intervals of two to seven years. Spatial data generally dominate with high probability in the Amazon and the tropics and with little information in desert areas (e.g., California, the Arabian Peninsula, and central Australia). As we quantify information in spatiotemporal configurations, clearer patterns of little information (e.g., Australia) and high probability (e.g., the east–west U.S. gradient) emerge [45]. Studying precipitation in the fully temporal configuration translates into a clear dominance of the winter season in the Amazon, Indonesia, and northern Europe. Yet a comparison of temporal versus spatial information in Figure 12 (bottom row) reveals that spatial information dominates in desert areas (e.g., Australia, the Iberian Peninsula, the Sahara, and Mexico), which are reasonably independent of time, and that temporal information dominates in the Sahel (savanna), northern latitudes, and southwest China, which are generally characterized by high rain factors, seasons, and moisture.

FIGURE 10. The decomposition of the Earth system data cube (ESDC) [56] into different spatial–temporal configurations ranging from completely spatial to completely temporal. The 7 × 7 × 1 spatial configuration consists entirely of spatial pixels; this is very similar to spatial patches. The 1 × 1 × 46 configuration includes only temporal pixels, which is essentially a time series. The 4 × 4 × 3 configuration includes a mix of spatial and temporal pixels. Throughout this article, we see different notions of spatial–temporal representation of the ESDC data.

FIGURE 11. (a) The entropy measurement for different spatiotemporal configurations in the ESDC variables. The IT Venn diagram represents Figure 10 and how it relates to measuring entropy: the expected uncertainty. The measured entropy for (b) precipitation, (c) SH, and (d) evaporation from the ESDC [56] changes with different spatial–temporal representations, ranging from fully spatial (ratio = zero) to fully temporal (ratio = one).

The transfer of SH into the air is dependent on the temperature gradient between the surface and the space above.
SH information patterns stand out clearly. While (fully)
spatial information dominates in the northern hemisphere,
(fully) temporal information patterns appear in the tropics, where rainfall is present across larger regions and longer
seasons. The global spatial distribution of SH information
shows the largest values in subtropical, dry regions, where
available energy is preferentially partitioned to SH rather
than latent heat [50], and it seems to be anticorrelated with
the amplitude of the mean seasonal cycle. These results reveal the most SH information in tropical and subtropical
deserts, where a high surface temperature conducts much
heat into the air above, and the least information near the
poles, where surface temperatures are much lower. The information is mainly concentrated in the tropics, too, and
shows patterns similar to those of precipitation, with the
exception of clear spatial information in India.
Evaporation maps reveal that spatial information dominates in deserts and dry regions, where evaporation is limited, while temporal information (with more interannual
variability) resides in northern latitudes. This is mainly due
to low temperatures and radiation, equating to little evaporation throughout the year. Temperate areas show increased
evaporation information in purely spatial and temporal configurations, coinciding with increasing temperatures above
ground moistened by winter rains. Cooler winter temperatures in the southern hemisphere reduce evaporation, which
is also captured in the spatial-versus-temporal divergence
maps. Note that, in very dry regions, there is more information (a lower evaporative fraction), while, conversely, for very
humid regions, the information agrees with [50].

FIGURE 12. The top two rows show information content maps for precipitation, SH, and evaporation, using a fully spatial (7 × 7 spatial width and temporal length of one) and a fully temporal (1 × 1 spatial width and temporal length of 46) configuration. The bottom row shows a divergent map of the tradeoff (subtraction) between the fully spatial and fully temporal information content for each variable. (a) Precipitation. (b) SH. (c) Evaporation. Plots were generated with xarray and cartopy [80], [81].

CONCLUSIONS
This article introduced a Gaussianization method and illustrated how to use it for multivariate density estimation in
the context of Earth system science. The problem is highly
relevant given the advent of all kinds of Earth data (both
remotely sensed and in situ observations), novel products,
and model simulations. Density estimation is a long-standing, unresolved problem in statistics and machine learning,
mainly because of the curse of dimensionality. Data in remote sensing and geosciences pose additional problems for
PDF estimation: high-dimensional data, nonlinear feature
relations, many noise sources, and distinct spatial–temporal
structures. Looking at the literature, most sources dealing
with density estimation involve few dimensions, treat the
problems marginally, and construct parametric models of
the densities. The Gaussianization methodology 1) scales
to very high dimensions, 2) jointly works with all dimensions through simple orthogonal transforms plus marginal
operations, and 3) does not assume any parametric form
of the density. Using the standard multivariate Gaussian
as a convenient target distribution in the transform domain
lets us leverage the change-of-variables formula to compute exact probability densities and, by extension,
IT metrics.
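The alternation of marginal operations and orthogonal transforms can be sketched in a few lines. This is a simplified illustration in the spirit of rotation-based iterative Gaussianization, not the authors' implementation: the marginal step uses a rank-based empirical CDF, the rotations are random orthogonal matrices, and the Jacobian bookkeeping needed to evaluate densities through the change-of-variables formula is omitted. All function names and the toy data are assumptions.

```python
import numpy as np
from statistics import NormalDist

# Standard normal quantile function, applied elementwise.
_norm_ppf = np.vectorize(NormalDist().inv_cdf, otypes=[float])

def marginal_gaussianize(X):
    """Map each column to an approximately N(0, 1) marginal via its ranks."""
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1   # 1..n within each column
    return _norm_ppf(ranks / (n + 1))               # quantiles stay inside (0, 1)

def iterative_gaussianize(X, n_layers=20, seed=0):
    """Alternate marginal Gaussianization with random orthogonal rotations."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(X, dtype=float)
    d = Z.shape[1]
    for _ in range(n_layers):
        Z = marginal_gaussianize(Z)
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
        Z = Z @ Q
    return Z

# Toy data with a nonlinear dependence between the two features.
rng = np.random.default_rng(1)
x1 = rng.uniform(-1, 1, size=1000)
X = np.column_stack([x1, x1**2 + 0.1 * rng.normal(size=1000)])

Z = iterative_gaussianize(X)
print(Z.shape)
```

Each layer is invertible in principle (a monotonic marginal map followed by a rotation), which is what makes a change-of-variables density computation possible once the marginal Jacobians are tracked.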
We showed empirical performance evidence in several
Earth system data analysis problems, using a wide diversity of data (multispectral, hyperspectral, SAR, and global
products from satellites and Earth system models), and
addressed the key problems of information estimation,
redundancy, and synthesis. Our results confirmed the validity of the method, for which we anticipate wide use and
adoption. The framework enables us to tackle all applications involving a PDF estimation, from data classification
to denoising and coding, which were not treated in this article. The methodology also facilitates computing other interesting IT measures, such as KL divergence and conditional independence, which will be a subject of future research.
ACKNOWLEDGMENTS
This research was funded by the European Research
Council (ERC), under the ERC–CoG-2014 project (grant
647423) and the ERC-SyG-2019 USMILE project (grant
855187). J. Emmanuel Johnson thanks the European Space
Agency (ESA) for support via the Early Adopter Call of the
Earth System Data Lab project. Additional support was
provided by Project RTI2018-096765-A-100 (MCIU/AEI/
FEDER, UE). Valero Laparra is supported by the projects
TEC2016-77741-R, DPI2017-89867-C2-2-R, and PID2019-109026RB-I00. María Piles thanks the ESA for the long-term support of this initiative.

AUTHOR INFORMATION
J. Emmanuel Johnson (juan.johnson@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia,
46010, Spain.
Valero Laparra (valero.laparra@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia,
46010, Spain.
María Piles (maria.piles@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia, 46010,
Spain. She is a Senior Member of IEEE.
Gustau Camps-Valls (gustau.camps@uv.es) is with the
Image Processing Laboratory, University of Valencia, Valencia, 46010, Spain. He is a Fellow of IEEE.
REFERENCES
[1] W. Buermann, J. Dong, X. Zeng, R. B. Myneni, and R. E. Dickinson, “Evaluation of the utility of satellite-based vegetation leaf
area index data for climate simulations,” J. Clim., vol. 14, no. 17,
pp. 3536–3550, 2001. doi: 10.1175/1520-0442(2001)014<3536:EOTUOS>2.0.CO;2.
[2] R. H. Moss et al., “The next generation of scenarios for climate
change research and assessment,” Nature, vol. 463, no. 7282,
pp. 747–756, 2010. doi: 10.1038/nature08823.
[3] J. T. Overpeck, G. A. Meehl, S. Bony, and D. R. Easterling, “Climate data challenges in the 21st century,” Science, vol. 331, no.
6018, pp. 700–702, 2011. doi: 10.1126/science.1197869.
[4] V. Eyring et al., “Taking climate model evaluation to the next
level,” Nature Climate Change, vol. 9, no. 2, pp. 102–110, 2018.
doi: 10.1038/s41558-018-0355-y.
[5] M. Reichstein et al., “Deep learning and process understanding for data-driven Earth System Science,” Nature, vol. 566, pp.
195–204, Feb. 2019. doi: 10.1038/s41586-019-0912-1.
[6] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd
ed. Hoboken, NJ: Wiley, 2006.
[7] G. A. Darbellay and I. Vajda, “Estimation of the information
by an adaptive partitioning of the observation space,” IEEE
Trans. Inf. Theory, vol. 45, no. 4, pp. 1315–1321, Sept. 2006. doi:
10.1109/18.761290.
[8] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Phys. Rev. E, vol. 69, no. 6, p. 066138, June
2004. doi: 10.1103/PhysRevE.69.066138.
[9] Q. Wang, S. Kulkarni, and S. Verdú, “A nearest-neighbor approach to estimating divergence between continuous random
vectors,” in Proc. IEEE Int. Symp. Inf. Theory, 2006, pp. 242–246.
doi: 10.1109/ISIT.2006.261842.
[10] N. Leonenko, L. Pronzato, and V. Savani, “A class of Rényi information estimators for multidimensional densities,” Annu.
Statist., vol. 36, no. 5, pp. 2153–2182, 2008. doi: 10.1214/07-AOS539.
[11] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and
Visualization. Hoboken, NJ: Wiley, 2015.
[12] F. Pérez-Cruz, “Estimation of information theoretic measures
for continuous random variables,” in Proc. 22nd Ann. Conf. Neural Inf. Process. Syst., in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. Curran Associates, 2009, pp. 1257–1264.
[13] S. Paul and D. N. Kumar, “Spectral-spatial classification
of hyperspectral data with mutual information based segmented stacked autoencoder approach,” ISPRS J. Photogramm.
Remote Sens., vol. 138, pp. 265–280, 2018. doi: 10.1016/j.isprsjprs.2018.02.001.
[14] A. G. Konings, K. A. McColl, M. Piles, and D. Entekhabi, “How
many parameters can be maximally estimated from a set of
measurements?” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 5,
pp. 1081–1085, 2015. doi: 10.1109/LGRS.2014.2381641.
[15] A. Marinoni and P. Gamba, “Unsupervised data driven feature
extraction by means of mutual information maximization,”
IEEE Trans. Computat. Imag., vol. 3, no. 2, pp. 243–253, 2017.
doi: 10.1109/TCI.2017.2669731.
[16] J. Zhang, M. Zareapoor, X. He, D. Shen, D. Feng, and J. Yang,
“Mutual information based multi-modal remote sensing image
registration using adaptive feature weight,” Remote Sens. Lett.,
vol. 9, no. 7, pp. 646–655, 2018. doi: 10.1080/2150704X.2018.1458343.
[17] S. Prasad and L. M. Bruce, “Hyperspectral feature space partitioning via mutual information for data fusion,” in Proc.
2007 IEEE Int. Geosci. Remote Sens. Symp., pp. 4846–4849. doi:
10.1109/IGARSS.2007.4423946.
[18] L.-Y. Zhao, B.-Y. Lü, X.-R. Li, and S.-H. Chen, “Multi-source
remote sensing image registration based on scale-invariant
feature transform and optimization of regional mutual information,” Acta Phys. Sin., vol. 64, no. 12, p. 124,204, 2015. doi:
10.1109/IGARSS.2016.7729658.
[19] S. Chen, X. Li, L. Zhao, and H. Yang, “Medium-low resolution multisource remote sensing image registration based on sift and robust regional mutual information,” Int. J. Remote Sens., vol. 39, no.
10, pp. 3215–3242, 2018. doi: 10.1080/01431161.2018.1437295.
[20] X. Xu, X. Li, X. Liu, H. Shen, and Q. Shi, “Multimodal registration of remotely sensed images based on Jeffrey’s divergence,”
ISPRS J. Photogramm. Remote Sens., vol. 122, pp. 97–115, Dec.
2016. doi: 10.1016/j.isprsjprs.2016.10.005.
[21] A. C. Frery, “Stochastic contrast measures for SAR data: A survey,”
J. Radars., vol. 8, no. 6, pp. 758–781, 2019. doi: 10.12000/JR19108.
[22] A. D. C. Nascimento, A. C. Frery, and R. J. Cintra, “Detecting
changes in fully polarimetric SAR imagery with statistical information theory,” IEEE Trans. Geosci. Remote. Sens., vol. 57, no.
3, pp. 1380–1392, 2019. doi: 10.1109/TGRS.2018.2866367.
[23] B. L. Ruddell, N. A. Brunsell, and P. Stoy, “Applying information theory in the geosciences to quantify process uncertainty,
feedback, scale,” Eos, Trans. Amer. Geophys. Union, vol. 94, no. 5,
pp. 56–56, 2013. doi: 10.1002/2013EO050007.
[24] G. Papamakarios, “Neural density estimation and likelihood-free inference,” 2019, arXiv:abs/1910.13233.
[25] L. Ardizzone, J. Kruse, C. Rother, and U. Köthe, “Analyzing inverse
problems with invertible neural networks,” 2019, arXiv:1808.04730.
[26] D. J. Rezende et al., “Normalizing flows on tori and spheres,”
2020, arXiv:abs/2002.02428.
[27] S. S. Chen and R. A. Gopinath, “Gaussianization,” in Proc. Adv. Neural Inf. Process. Syst., in Advances in Neural Information Processing
Systems 13, Papers from Neural Information Processing Systems
(NIPS), Denver, CO, 2000, pp. 423–429.
[28] V. Laparra, G. Camps-Valls, and J. Malo, “Iterative gaussianization: From ICA to random rotations,” IEEE Trans. Neural
Netw., vol. 22, no. 4, pp. 537–549, 2011. doi: 10.1109/TNN.2011.2106511.
[29] J. Ballé, V. Laparra, and E. P. Simoncelli, “Density modeling
of images using a generalized normalization transformation,”
2016, arXiv:abs/1511.06281.
[30] C. Meng, Y. Song, J. Song, and S. Ermon, “Gaussianization
flows,” 2020, arXiv:abs/2003.01941.
[31] E. T. Nalisnick, A. Matsukawa, Y. W. Teh, D. Görür, and B. Lakshminarayanan, “Hybrid models with deep and invertible features,” 2019, arXiv:abs/1902.02767.
[32] H. A. Sturges, “The choice of a class interval,” J. Amer. Statist. Assoc.,
vol. 21, no. 153, pp. 65–66, 1926. doi: 10.1080/01621459.1926.10502161.
[33] D. Freedman and P. Diaconis, “On the histogram as a density
estimator: L2 theory,” Zeitschrift für Wahrscheinlichkeitstheorie
und Verwandte Gebiete, vol. 57, no. 4, pp. 453–476, 1981. doi:
10.1007/BF01025868.
[34] K. H. Knuth, “Optimal data-based binning for histograms
and histogram-based probability density models,” Digit. Signal Process., vol. 95, p. 102,581, Dec. 2019. doi: 10.1016/j.dsp.2019.102581.
[35] C. M. Bishop, “Pattern recognition and machine learning,” in
Information Science and Statistics, 5th ed. New York: Springer, 2007.
[36] S. S. Chen and R. A. Gopinath, “Gaussianization,” in Advances in
Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich,
and V. Tresp, Eds., Cambridge, MA: MIT Press, 2001, pp. 423–429.
[37] G. M. Goerg, “The lambert way to gaussianize heavy-tailed data
with the inverse of Tukey’s h transformation as a special case,”
2015, arXiv:1010.2265.
[38] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, “Neural spline flows,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché
Buc, E. Fox, and R. Garnett, Eds., Vancouver, BC, CA: Curran
Associates, 2019, pp. 7511–7522.
[39] G. E. P. Box and D. R. Cox, “An analysis of transformations,” J.
Roy. Statist. Soc. Ser. B (Methodological), vol. 26, no. 2, pp. 211–
252, 1964. doi: 10.1111/j.2517-6161.1964.tb00553.x.
[40] G. Liu, Y. Liu, M. Guo, P. Li, and M. Li, “Variational inference
with Gaussian mixture model and householder flow,” Neural
Netw. Official J. Int. Neural Netw. Soc., vol. 109, pp. 43–55, Jan.
2019. doi: 10.1016/j.neunet.2018.10.002.
[41] J. M. Tomczak and M. Welling, “Improving variational autoencoders using householder flow,” 2016, arXiv:abs/1611.09630.
[42] J. H. Friedman, “Exploratory projection pursuit,” J. Amer.
Statist. Assoc., vol. 82, no. 397, pp. 249–266, 1987. doi:
10.1080/01621459.1987.10478427.
[43] D. I. Inouye and P. Ravikumar, “Deep density destructors,” in
Proc. 35th Int. Conf. Machine Learn., 2018, pp. 2167–2175.
[44] P. Jaini, K. A. Selby, and Y. Yu, “Sum-of-squares polynomial
flow,” Proceedings of Machine Learning Research, vol. 97, K.
Chaudhuri and R. Salakhutdinov, Eds., Long Beach, CA: PMLR,
June 9–15, 2019, pp. 3009–3018.
[45] S. Tuttle and G. Salvucci, “Empirical evidence of contrasting soil moisture–precipitation feedbacks across the United
States,” Science, vol. 352, no. 6287, pp. 825–828, 2016. doi:
10.1126/science.aaa7185.
[46] J. Cardoso, “Dependence, correlation and Gaussianity in independent component analysis,” J. Mach. Learn. Res., vol. 4, nos.
7–8, pp. 1532–4435, 2004. doi: 10.1162/jmlr.2003.4.7-8.1177.
[47] C. E. Shannon, “A mathematical theory of communication,” Bell
Syst. Techn. J., vol. 27, pp. 379–423, 1948.
[48] M. S. Watanabe, “Information theoretical analysis of multivariate correlation,” IBM J. Res. Develop., vol. 4, no. 1, pp. 66–82,
1960. doi: 10.1147/rd.41.0066.
[49] M. Studený and J. Vejnarová, “The multiinformation function
as a tool for measuring stochastic dependence,” in Proc. NATO
Adv. Study Inst. Learn. Graph. Models., 1998, pp. 261–297. doi:
10.5555/308574.308673.
[50] M. Jung et al., “Global patterns of land-atmosphere fluxes
of carbon dioxide, latent heat, and sensible heat derived
from eddy covariance, satellite, and meteorological observations,” J. Geophys. Res., BiogeoSci., vol. 116, no. G3, 2011. doi:
10.1029/2010JG001566.
[51] L. Gómez-Chova, D. Fernández-Prieto, J. Calpe, E. Soria, J. Vila-Francés, and G. Camps-Valls, “Urban monitoring using multitemporal SAR and multispectral data,” Pattern Recognit. Lett., vol. 27,
no. 4, pp. 234–243, 2006. doi: 10.1016/j.patrec.2005.08.004.
[52] M. F. Baumgardner, L. L. Biehl, and D. A. Landgrebe. “220 Band
AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian
Pine Test Site 3.” Sept. 2015. Purdue University. https://purr.purdue.edu/publications/1947/1
[53] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geogr. Inf. Syst., 2010, pp. 270–279. doi:
10.1145/1869790.1869829.
[54] N. Sánchez, Á. González-Zamora, J. Martínez-Fernández, M.
Piles, and M. Pablos, “Integrated remote sensing approach to global
agricultural drought monitoring,” Agric. For. Meteorol., vol. 259,
pp. 141–153, Sept. 2018. doi: 10.1016/j.agrformet.2018.04.022.
[55] R. Fernandez-Moran et al., “SMOS-IC: An alternative SMOS
soil moisture and vegetation optical depth product,” Remote
Sens., vol. 9, no. 5, p. 457, May 2017. doi: 10.3390/rs9050457.
[56] M. Mahecha et al., “Earth system data cubes unravel global
multivariate dynamics,” Earth Syst. Dynamics, vol. 11, no. 1, pp.
201–234, Feb. 2020. doi: 10.5194/esd-11-201-2020.
[57] Á. González-Zamora, N. Sánchez, and M. Piles. “Global Soil
Moisture Agricultural Drought Index (SMADI).” June 17, 2019.
Zenodo. https://zenodo.org/record/3247649#.YFCot9IzbIU
[58] U.S. Drought Monitor. https://droughtmonitor.unl.edu/ (accessed Mar. 3, 2020).
[59] J. Zscheischler, M. D. Mahecha, S. Harmeling, and M. Reichstein, “Detection and attribution of large spatiotemporal extreme events in Earth observation data,” Ecol. Informat., vol. 15,
pp. 66–73, May 2013. doi: 10.1016/j.ecoinf.2013.03.004.
[60] J. S. Boyer, “Plant productivity and environment,” Science,
vol. 218, no. 4571, pp. 443–448, 1982. doi: 10.1126/science.218.
4571.443.
[61] C. P. Kelley, S. Mohtadi, M. A. Cane, R. Seager, and Y. Kushnir,
“Climate change in the Fertile Crescent and implications of the
recent Syrian drought,” Proc. Natl. Acad. Sci., vol. 112, no. 11,
pp. 3241–3246, 2015. doi: 10.1073/pnas.1421533112.
[62] S. Fritz et al., “A comparison of global agricultural monitoring
systems and current gaps,” Agric. Syst., vol. 168, pp. 258–272,
Jan. 2019. doi: 10.1016/j.agsy.2018.05.010.
[63] M. Weiss, F. Jacob, and G. Duveiller, “Remote sensing for agricultural applications: A meta-review,” Remote Sens. Environ., vol.
236, p. 111402, Jan. 2020. doi: 10.1016/j.rse.2019.111402.
[64] S. Sadri, E. F. Wood, and M. Pan, “Developing a drought-monitoring index for the contiguous US using SMAP,” Hydrol. Earth
Syst. Sci., vol. 22, no. 12, pp. 6611–6626, 2018. doi: 10.5194/
hess-22-6611-2018.
[65] N. Sánchez, Á. González-Zamora, M. Piles, and J. Martínez-Fernández, “A new Soil Moisture Agricultural Drought Index
(SMADI) integrating MODIS and SMOS products: A case of
study over the Iberian Peninsula,” Remote Sens., vol. 8, no. 4, p.
287, 2016. doi: 10.3390/rs8040287.
[66] G. P. Petropoulos and T. Islam, Remote Sensing of Hydrometeorological Hazards. Boca Raton, FL: CRC Press, 2017.


[67] R. F. Adler et al., “The Version-2 Global Precipitation Climatology Project (GPCP) Monthly Precipitation Analysis
(1979-Present),” J. Hydrometeorol., vol. 4, no. 6, pp. 1147–1167,
2003. doi: 10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2.
[68] G. J. Huffman, R. F. Adler, D. T. Bolvin, and G. Gu, “Improving
the global precipitation record: GPCP version 2.1,” Geophys. Res.
Lett., vol. 36, no. 17, 2009. doi: 10.1029/2009GL040000.
[69] G. Tramontana et al., “Predicting carbon dioxide and energy
fluxes across global fluxnet sites with regression algorithms,”
Biogeosciences, vol. 13, no. 14, pp. 4291–4313, 2016. doi: 10.5194/
bg-13-4291-2016.
[70] B. Martens et al., “Gleam v3: Satellite-based land evaporation
and root-zone soil moisture,” Geoscientific Model Develop., vol. 10,
no. 5, pp. 1903–1925, 2017. doi: 10.5194/gmd-10-1903-2017.
[71] D. G. Miralles, T. R. H. Holmes, R. A. M. De Jeu, J. H. Gash, A. G.
C. A. Meesters, and A. J. Dolman, “Global land-surface evaporation estimated from satellite-based observations,” Hydrol. Earth
Syst. Sci., vol. 15, no. 2, pp. 453–469, 2011. doi: 10.5194/hess-15-453-2011.
[72] V. Laparra and R. Santos-Rodriguez, “Spatial/spectral information trade-off in hyperspectral images,” in Proc. IEEE Int. Geosci.
Remote Sens. Symp. (IGARSS), 2015, pp. 1124–1127.
[73] M. L. Waskom, “seaborn: statistical data visualization,” J.
Open Source Softw., vol. 6, no. 60, p. 3021, 2021. doi: 10.21105/
joss.03021.
[74] J. D. Hunter, “Matplotlib: A 2D graphics environment,” Comput.
Sci. Eng., vol. 9, no. 3, pp. 90–95, May-June 2007. doi: 10.1109/
MCSE.2007.55.
[75] L. Gomez-Chova, D. Fernández-Prieto, J. Calpe, E. Soria, J. Vila,
and G. Camps-Valls, “Urban monitoring using multi-temporal
SAR and multi-spectral data,” Pattern Recognit. Lett., vol. 27, no.
4, pp. 234–243, 2006. doi: 10.1016/j.patrec.2005.08.004.
[76] P. Castracane et al., “Monitoring urban sprawl and its trends
with EO data. UrbEx, a prototype national service from a
WWF-ESA joint effort,” in Proc. 2003 2nd GRSS/ISPRS Joint
Workshop on Remote Sens. Data Fusion over Urban Areas, pp. 245–
248. doi: 10.1109/DFUA.2003.1219997
[77] D. Foreman-Mackey, “corner.py: Scatterplot matrices in Python,” J. Open Source Softw., vol. 1, no. 2, June 2016. doi:
10.21105/joss.00024.
[78] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial
extensions for land-use classification,” in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst. (GIS ‘10), Nov. 2010, pp.
270–279. doi: 10.1145/1869790.1869829
[79] QGIS Development Team. QGIS Geographic Information System.
(2021). QGIS Association. [Online]. Available: https://www
.qgis.org
[80] S. Hoyer and J. Hamman, “xarray: N-D labeled arrays and datasets in Python,” J. Open Res. Softw., vol. 5, no. 1, p. 10, 2017. doi:
10.5334/jors.148.
[81] Met Office, 2010–2015, “Cartopy: A Cartographic Python Library With a Matplotlib Interface,” Exeter, Devon. [Online].
Available: http://scitools.org.uk/cartopy

Wireless Sensor Networks Applied to Precision Agriculture
A worldwide literature review with emphasis on Latin America

MÓNICA KAREL HUERTA, ANDREA GARCÍA-CEDEÑO, JUAN CARLOS GUILLERMO, AND ROGER CLOTET

Agriculture is fundamental to the economic and social development of populations worldwide since the food of millions of people depends on it. According to the Food and Agriculture Organization (FAO) of the United Nations, in 2017, more than 100 million people were food insecure. In developing countries, where this situation is more pronounced, agriculture is a family activity in which farming processes don’t make use of technology. The use of wireless sensor networks (WSNs) to provide precision agriculture (PA) has demonstrated positive results related to crop yields and resource management, which raises the need to determine the progress of research on the impact of these technologies. This article analyzes different proposals focused on the optimization of agricultural processes, with particular focus on their benefit to small and medium producers and, therefore, to food security. A literature review was conducted with an emphasis on scientific developments in Latin America, a region where family farming is one of the main economic activities. Through this study, it was possible to generate indicators of development and general successes and setbacks as well as to determine the main technical characteristics and most addressed topics in publications on this subject.

Digital Object Identifier 10.1109/MGRS.2020.3044235
Date of current version: 9 February 2021
0274-6638/21©2021IEEE

PROBLEMS IN AGRICULTURE
Today, millions of people suffer from hunger due to the lack
of an available and stable food supply, the main cause
of food insecurity [1]–[3]. An example of this is a situation
in which the inhabitants of a given territory do not have access to enough food to maintain a healthy life [3]–[7]. Food
insecurity is also a growing concern worldwide because of
its serious consequences: the degradation of health, acute
malnutrition, increased risk of birth defects, and high mortality [4], [7]–[10]. According to statistics from FAO, in 2017,
124 million people were food insecure in 51 countries [11].
Among the different factors that give rise to these conditions are climate change, government policies, inequitable
land distribution, chronic poverty, and insufficiently developed agriculture. Agriculture is a key activity in overcoming
famine and promoting economic growth [2], [3], [5]–[7],
[12]–[14]. Although the practice of agriculture
depends on natural resources, their irresponsible use results
in environmental degradation: water scarcity, soil erosion,
greenhouse gas emissions, and deforestation [2], [15], [16].
In the most advanced countries, priority is given to the industrial agricultural model that depends on hydrocarbons,
external energy, and agrochemicals, generating an unfavorable impact on biodiversity and human beings [16], [17].
In contrast to first world countries, the agricultural economic sector of many nations consists mostly of family
farms [18]. Family agriculture contributes approximately 50%
of crops globally [1]; in Latin America and Africa, family-farm land
corresponds to 34.5% and 80%, respectively [16]. Nevertheless,
family agriculture production normally doesn’t reach markets. On the basis of this information, it has been established
that improving the performance of these small producers will
significantly increase the availability of food [16], [19], [20].
Given these circumstances, there is a need to adopt new
sustainable procedures and apply them to the agricultural
sector, with a focus on environmental conservation, profitability through the appropriate use of resources, and promotion of sustainable family-farming production [1], [2],
[5], [13], [15], [16]. Such methods form a safeguard against
food insecurity because of their economic viability and
guarantee of maintaining or increasing agricultural yields
for present and future generations [1], [13], [14], [16], [21].
With regard to government initiatives, most administrations have sought to direct their policies and financing plans in
favor of low-income farmers and to implement technology
in the fields [4], [5], [13], [17], [22]. However, the impact has
been minimal as small producers are unable to adopt the
innovations or they do so inappropriately [17], [23]–[26].
Due to the gap between small farmers and technology, the
scientific community is working on various proposals aimed
at optimizing agricultural processes and overcoming obstacles generated by climate
factors through the use of various technologies: WSNs, mobile geographic information systems, geostatistics, and spectrum analysis, among others [1], [16], [19], [20], [24], [27]–[29].
The application of these techniques in different crop
processes constitutes a new trend of PA [30]–[32].

Within PA, the use of WSNs is the most common technique for improving traditional agricultural processes
[30], [31], [33]–[35]. It is a noninvasive method that monitors information regarding the amount of resources, the
weather, and environmental factors. Data processing can
generate systems for predicting and modeling the parameters that affect agriculture to respond to, for example, climate
change [36]. As a consequence, appropriate agronomic
techniques that comply with the principle of sustainability
and establish a solution to food insecurity are designed for
a given crop. Therefore, the use of PA increases yields and
leads to a crop-management strategy, where investment in
agrochemicals is reduced and profits increase due to the
consequent productivity growth [30], [31].
The problems described above underscore the importance of WSNs in PA for the promotion of agriculture, as supported
by the scientific evidence presented in literature reviews and surveys that analyze the application of this technology and its effectiveness. In light of those results and to provide continuity to
this technology’s development, this article seeks to establish
the main technical characteristics reported in several articles and determine their impact on and benefit for small producers.
TECHNOLOGY AND FAMILY-FARMING PRODUCTION
Within PA, there is a large amount of scientific evidence that reflects the positive effects of WSNs. The application of this
method allows farmers to identify the interaction between
crop growth and soil/environmental factors. Its effectiveness
is manifested in decision making based on monitored data
regarding agronomic techniques suitable for a given crop as
well as for the protection of the environment and farm products [26], [37]–[39]. However, in Latin American countries,
where agriculture is a major economic activity, most of the
agriculture-producing population belongs to rural sectors and
practices farming at a family level as a source of livelihood.
The biggest drawback for the rural agricultural sector
is the lack of synergy between popular advice and scientific development organizations. This is an obstacle to the
inclusion of innovative, low-cost proposals for rural farmers. Faced with this problem, governments have opted for
measures that promote domestic agriculture [1], [6], [16].
Nevertheless, technological solutions can be perceived incorrectly if they are defined primarily as highly sophisticated devices instead of as techniques that generate new capacities to produce goods and services [5]. In the same area,
there is another impediment concerning education: small
farmers cannot acquire industrial electronic solutions due
to their high prices and the farmers’ lack of access to the
training required for the use of such equipment [23].
METHODOLOGY
To obtain current information that shows the state of scientific and technological research on agricultural WSNs and their impact on small producers, we have analyzed research documents developed in various regions of the planet, with an emphasis on Latin America. The literature review will be a starting point for future projects in the search to meet environmental needs and for proposing solutions to reduce food insecurity.
Articles from the Scopus database were considered due to its prestige, magnitude, and the quality of content [40]. In addition, because we wanted to give geographical priority to Latin America and found scarce evidence about it in the main database, we included works from digital repositories of universities and research institutes as well as from journals in the main Latin American indexes [41], [42].
The established methodology is based on a qualitative study of primary research documents. The documents correspond to indexed articles related to proposals and studies of WSN-based platforms developed in the last 14 years and applied to PA. For the extraction of scientific articles, the following search string was applied: “wireless sensor network” and “precision agriculture,” for a metadata analysis corresponding to the title, abstract, and keywords.
The search string, plus a temporal filter that limits the selection to works published from January 2005 to December 2019, returned 694 matching documents. Applying Cochran’s sample calculation formula [43] with a 95% confidence level, ±10% accuracy, and maximum variability yields 86, the number of documents to be analyzed (see Figure 1).
FIGURE 1. The documents sample, sorted by year. [Bar chart titled “‘Wireless Sensor Network’ and ‘Precision Agriculture’ Documents Sample, Sorted by Year”; x-axis: Year (2005 to 2019); y-axis: Documents (%).]
Using inclusion and exclusion criteria (see Table 1), we selected the articles that belong to the sample. Similarly, we extracted the technical data and results regarding the implementation of these proposals. From the information acquired, the main parameters and variables of monitoring were analyzed as well as the acquisition devices and their accessibility according to the use of open hardware and open source software. Other assessments correspond to the most used sensors and brands, communication protocols, application areas, the development of graphic interfaces, and the creation of open databases. Similarly, the main research problems that have motivated the development of these proposals, the beneficiaries, and the effects of the implementations of these proposals at scientific and social levels were considered.
As a final stage, the information collected is classified according to
◗◗ utility: the main applications for which the different networks were designed
◗◗ technical characteristics: to obtain indicators of the most commonly used equipment and materials
◗◗ implementation: to determine the type of crops in which WSN technology is implemented and the level of scientific development in different regions, with emphasis on Latin America.

TABLE 1. THE INCLUSION AND EXCLUSION CRITERIA FOR HARDWARE-RELATED ARTICLES.
INCLUDED
◗◗ Primary research presenting a proposal for a WSN-based platform applied to agriculture
◗◗ Documents presenting an agricultural technological proposal, by which a minimum of one physical or chemical parameter is monitored
◗◗ Documentation detailing the parameters monitored or the use of hardware in terms of sensing and data acquisition
◗◗ Documents whose proposals are physically applied in a laboratory, greenhouse, or open field
EXCLUDED
◗◗ Documentation of secondary research of any kind
◗◗ Documents describing agricultural platforms based on non-WSN technologies
◗◗ Documentation that does not technically express hardware specifications or parameters to be monitored
◗◗ Documents focused only on software matters, such as algorithms, information processing, and device programming, among others

ANALYSIS AND RESULTS
In this section, we extract data regarding the implementation and evaluation of the various platforms detailed in the articles. Applying the selection criteria specified in Table 1, a total of 86 documents were selected as primary scientific research documents published from January 2005 to December 2019 [25], [44]–[127]. The results and analysis of these documents were divided into three sections: utility, technical characteristics, and implementation.
UTILITY
The documentary sample is composed of documentation on platforms that function as a support tool with a specific utility. To determine the main uses of the technology in question, we analyzed the most common applications for which the various sensor networks have been evaluated or designed. Likewise, the variables and parameters that coincide in the majority of the documentary sample were extracted.
This review uses the term variable as a reference to the components of an ecosystem: soil, plant, environment, and others. The state and features of each variable have been set as parameters, a term used to reference chemical and physical indicators, such as humidity, temperature, pH, and others.
As can be seen in Figure 2, the main applications of WSNs in agriculture are oriented toward the appropriate administration of water resources, reflected in irrigation systems, and toward increasing crop production and efficiency through the automation and optimization of processes. These two practices correspond to 26.7 and 36.1% of the projects, respectively.
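The percentages in this review are all taken over the 86-document sample, whose size the methodology attributes to Cochran’s formula applied to 694 matching documents. As a rough cross-check, the following Python sketch applies a common textbook form of that formula with a finite-population correction. The function name, the rounded z-score of 1.96 for 95% confidence, and the particular correction are assumptions made here, not details stated by the authors; under them the result is about 85, close to the reported 86, and the small gap may come from a different rounding or variant of the formula.

```python
import math


def cochran_sample_size(population, z=1.96, margin=0.10, p=0.5):
    """Cochran sample size with a finite-population correction.

    z      : z-score for the confidence level (1.96 for ~95%, assumed rounding)
    margin : desired precision (0.10 for +/-10%)
    p      : assumed proportion; 0.5 corresponds to maximum variability
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it to account for the finite number of matching documents.
    return n0 / (1 + (n0 - 1) / population)


n = cochran_sample_size(694)
print(round(n, 2), math.ceil(n))
```

With a very large population the correction vanishes and the function returns the uncorrected value of roughly 96 documents.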
On a smaller scale, the objectives of the technical evaluation of network performance and development of inclusive
technologies, whether low cost or user friendly, represent
20.9 and 10.5%, respectively. Platforms aimed at improving
the final quality of products and reducing the vulnerability
of crops to climate change account for 4.7% each, and pest
treatment accounts for 2.3%.
The results can be grouped into four areas. First is the improvement of crop yields, which includes production optimization and the automation of water management and pest treatment, representing 65.1%. Second is the evaluation of
the proposals and/or how to make them affordable, both of
which focus on generalizing the use of WSNs; together,
they represent 31.4%. In third and fourth place are final
product quality and climate change vulnerability reduction,
each with 4.7%.
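Because every share above is a fraction of the same 86 documents, the figures can be sanity-checked by converting them back to document counts and re-adding the groups. The Python sketch below does this; the dictionary keys are shortened labels chosen here for illustration, and the assignment of 26.7% to water management and 36.1% to production optimization follows the order in which the text lists them. The shares add up to more than 100%, which suggests that some documents were counted under more than one objective (an inference, not a statement from the source).

```python
SAMPLE_SIZE = 86  # documents analyzed in the review

# Shares (%) as reported in the text; key names are illustrative.
shares = {
    "water_management": 26.7,
    "production_optimization": 36.1,
    "technical_evaluation": 20.9,
    "inclusive_technology": 10.5,
    "product_quality": 4.7,
    "climate_vulnerability": 4.7,
    "pest_treatment": 2.3,
}

# Convert each share to the nearest whole number of documents.
counts = {name: round(pct * SAMPLE_SIZE / 100) for name, pct in shares.items()}

# Grouped areas described in the text.
crop_yield = round(shares["water_management"]
                   + shares["production_optimization"]
                   + shares["pest_treatment"], 1)
generalization = round(shares["technical_evaluation"]
                       + shares["inclusive_technology"], 1)
total = round(sum(shares.values()), 1)

print(counts)
print(crop_yield, generalization, total)
```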
FIGURE 2. The main applications of WSN in agriculture. [Chart titled “WSN Applications in Agriculture”; axis: Documents (%). Categories: Water Management/Irrigation, Production Optimization, Performance Technical Evaluation, Inclusive Technology/Affordable Solutions, Final Product Quality, Climate Change Vulnerability Reduction, Pest Treatment. Plotted values: 36.1%, 26.7%, 20.9%, 10.5%, 4.7%, 2.3%.]

FIGURE 3. The percentage representation of analyzed variables within the documentary sample. [Chart titled “Analyzed Variables”; Documents (%): Soil, 77.91; Environment, 72.09; Plant, 16.28.]

Three major aspects can be deduced from the data behind Figure 2. First, a solution for optimization in agriculture is needed. Using WSNs, farmers can reduce operational costs and/or improve productivity to obtain higher incomes. Additionally, this optimization reduces the environmental impact as a result of improved awareness of greener production (environment friendly, organic, and so forth) and compliance with various regulations (for example, the reduction of pesticides in pest control). As a result of this enhanced leverage, farmers get a better-quality crop. The second aspect is the limited resource of water. Its scarcity is exacerbated by climate change, and it is estimated that about 70% of freshwater will be consumed by 2050 [128]. Water is nevertheless in great demand in agriculture for different kinds of irrigation, from flooding to dripping; this has led to conflicts in establishing fair usage among different consumers. The application of WSNs optimizes water management and enables a more equitable distribution. Third, WSN solutions must be tested and validated to demonstrate their feasibility. Working with WSNs has many advantages over other technologies: low cost, adaptability, an easy learning curve, inclusive technology with open and proprietary solutions, a good cost–benefit ratio, and the possibility of nondestructive tests of the technology in crops. All of this makes many preliminary studies possible, as the number of articles shows.
To make the right agronomical decisions possible, WSN applications need to monitor various parameters. From the analyzed works, a variable classification was made based on where data were acquired: “plant,” “environment,” and “soil.” Figure 3 shows that soil is monitored most frequently (77.91%), followed closely by the environment (72.09%) and, at a much lower rate, the characteristics of the plant or fruit (16.28% of the investigations in the documentary sample). Normally, different parameters were monitored simultaneously. Also, the same parameter was sometimes measured for distinct variables, for example, humidity in the soil and environment. Monitoring plants is more difficult since, as living entities, they grow according to their own characteristics (height, foliage, thickness, and so on), so sensors must be relocated to keep measuring and functioning correctly. In our experience, the direct monitoring of plants entailed some problems related to that specific issue: sensors lost the line of sight, which reduced the effective range of transmission and, in the worst case, resulted in losing all connectivity or part of the data. In some situations, increasing the transmission power may mitigate these disruptions, at the cost of higher power consumption. Because soil and environmental data are sufficient to effectively assess the condition of crops, most scientific documents rule out the direct surveillance of plants. For instance, water/irrigation management can be monitored with a high degree of accuracy using two parameters: soil moisture and ambient humidity. These parameters are specific and are present in many investigations.
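To illustrate how two monitored parameters can drive an irrigation decision, here is a minimal Python sketch of a threshold rule on soil moisture and ambient relative humidity. The thresholds, field names, and the rule itself are hypothetical placeholders; real values depend on the crop, soil type, and sensor calibration, and none of them come from the reviewed documents.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    soil_moisture_pct: float  # volumetric water content, in percent
    air_humidity_pct: float   # ambient relative humidity, in percent


def should_irrigate(reading, soil_min=25.0, humidity_max=85.0):
    """Return True when the soil is dry and the air is not near saturation.

    Both thresholds are illustrative assumptions, not agronomic advice.
    """
    soil_is_dry = reading.soil_moisture_pct < soil_min
    # Very humid air lowers evaporative demand, so watering can wait.
    air_is_humid = reading.air_humidity_pct >= humidity_max
    return soil_is_dry and not air_is_humid


print(should_irrigate(Reading(18.0, 60.0)))
print(should_irrigate(Reading(18.0, 92.0)))
print(should_irrigate(Reading(40.0, 60.0)))
```

Only the first reading (dry soil, moderate air humidity) triggers irrigation under these assumed thresholds; a deployed system would typically add hysteresis or time averaging to avoid toggling on noisy sensor data.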
For each acquisition location, further analysis is performed, identifying the individual monitored parameters. Figure 4 displays that information. For soil, the most frequently monitored parameter is moisture, followed by temperature and, to a lesser extent, pH and electrical conductivity. Concerning the environment, temperature is the most monitored parameter, followed by relative humidity and lighting or solar radiation. On a minor scale, the parameters of wind speed and direction, atmospheric pressure, and gas concentration are considered. In the scarce documentation focused on monitoring plant variables, temperature, humidity, and the diameter of the stem or trunk are the most common.
The common use of humidity and temperature parameters, either in the soil or in the environment, together with the luminance (the amount of solar radiation the crop receives), is not surprising. As can be deduced from most studies, and as we can reaffirm from our experience, they are the agents that have the most significant impact on the appropriate development of the majority of crops.
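The variable/parameter breakdown described above maps naturally onto a small lookup table. The following Python sketch encodes it as an illustration; the identifier names are chosen here, and the lists contain only the parameters named in the text, not the full set plotted in the review’s figures.

```python
# Parameters named in the review for each acquisition location ("variable").
# Identifier spellings are illustrative.
MONITORED = {
    "soil": ["moisture", "temperature", "pH", "electrical conductivity"],
    "environment": [
        "temperature", "relative humidity", "solar radiation",
        "wind speed and direction", "atmospheric pressure",
        "gas concentration",
    ],
    "plant": ["temperature", "humidity", "stem diameter"],
}


def variables_measuring(parameter):
    """List the variables for which a given parameter is monitored."""
    return [var for var, params in MONITORED.items() if parameter in params]


print(variables_measuring("temperature"))
print(variables_measuring("pH"))
```

Temperature is the one parameter that appears under all three variables, which matches the remark that the same parameter is sometimes measured at several acquisition locations.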

FIGURE 4. The percentage representation of measured parameters per variable within the documentary sample. [Chart titled “Sensorized Parameters Per Variable”; y-axis: Documents (%); groups: Soil, Environment, Plant; legend: Moisture/Relative Humidity, Temperature, Luminosity, pH, Electric Conductivity, Pressure, Carbon Dioxide Concentration, Wind Speed/Direction, Trunk Diameter Growth, Others.]

TECHNICAL CHARACTERISTICS
An important factor is the technical characteristics of the sensing device. To analyze them, information about sensors, communication protocols, user interfaces, and the availability of freely accessible data was extracted. The objective was to collect the main technical characteristics of WSNs in the selected articles and then to determine which needs are covered and specify which new approaches would be useful for future research.
To obtain this information, the frequency of the use of the sensing hardware within the documentary sample was analyzed, revealing which models and brands of sensors stand out among the great diversity that exists. This information can be seen in Figure 5 and Table 2.
Meter Group (formerly Decagon and UMS) has more than half of the market; it has specialized in agricultural and food sensors since its creation in the 1980s. Other brands share the rest of the market without a clear second position. Most sensors created by these vendors are centered on humidity or temperature monitoring in the environment, soil, or plant. Both Decagon and UMS were pioneers before their merger and have been in the market for a long time with well-known products and broad support, which gives them an advantage over other companies.

FIGURE 5. The percentage representation of the most commonly used sensor brands within the documentary sample. [Chart titled “Most Popular Sensor Brands”: Meter Group (formerly Decagon), 15.1%; Campbell, 4.7%; Honeywell, 4.7%; Davis Instruments, 2.3%; Vegetronix, 2.3%.]

TABLE 2. THE MOST POPULAR SENSORS.
MODEL | PARAMETER
SHT1X by Sensirion | Digital humidity sensor
SHT7X by Sensirion | Digital relative humidity and temperature sensor
DHT11 by Aosong | Digital temperature and humidity sensor
DHT22 by Aosong | Digital temperature and humidity sensor
EC-5 by Meter Group (formerly Decagon) | Soil moisture/volumetric water content
FC-28, YL-69, and HL-69 modules for Arduino | Soil moisture
LM-35 by various manufacturers | Precision centigrade temperature
DS18B20 by various manufacturers | Digital thermometer

To communicate between sensors or with the sink, many different wireless protocols may be used. The most commonly used one is Zigbee (based on IEEE Standard 802.15.4) [129], thanks to its low energy consumption. It was defined in 2004 and focuses on short-range, low-consumption connectivity. As presented in Figure 6, more than half of the collected material uses Zigbee (56.2%). Other protocols each appear in no more than 5% of the documents: Bluetooth [130], Wi-Fi [131], LoRaWAN [132], [133], radio-frequency identification (RFID) [134], and GSM [135]. Almost 20% of the material does not specify the communication protocol used, making analysis impossible in such cases. Recently, LoRaWAN has been increasingly used. It is a newer protocol, with its first version defined in 2015. It has a higher range of transmission (up to 15 km in the countryside), more bandwidth, and only a slightly increased power consumption (compensated for in part by the evolution of technology, with better batteries and more efficient solar panels).

FIGURE 6. The percentage representation of the most commonly used wireless communication protocols. [Chart titled “Wireless Communication Protocols”: Zigbee, 56.2%; Does Not Specify, 19.1%; Others, 6.7%; LoRaWAN, 4.5%; Wi-Fi, 4.5%; Bluetooth, 4.5%; GSM, 2.3%; RFID, 2.3%.]

Regarding the devices used for data acquisition, displayed in Figure 7, about 42% of documents make use of commercial or private motes or nodes. In this context, private motes or nodes are platforms whose development depends on closed-source software or whose architecture expansion is limited to components from the company that markets them. The most important companies in private motes are Memsic (formerly Crossbow Technology) [136] and Waspmote [137]. About 50% of the documents use open hardware nodes, most of them belonging to different microcontroller models and various versions of development boards from the Arduino microcontroller platform. Normally, research documents use open hardware due to its low cost and flexibility. Commercial hardware’s strength is its reliability, while its weakness is limited parametrization and higher cost. In open hardware, microcontrollers are the most used thanks to their flexibility and adaptability.

FIGURE 7. The classification of data acquisition devices by open source and private characterization within the documentary sample. [Chart titled “Data Acquisition Device Affordability Evaluation”; y-axis: Documents (%); groups: Private/Commercial (41.86), Open Hardware (50), No Specification (8.14); legend: Crossbow–Memsic, Waspmote, Others, Raspberry, Microcontrollers, Arduino, Does Not Specify.]

As a part of this review, we verified whether the authors prioritized easy and friendly interaction between the platform and the user through the development of a graphical user interface (GUI), and we determined the type of application the authors chose. Many of the studies didn’t include the development of a GUI in their methodology or didn’t consider it important to report; as can be seen in Figure 8, this is the case in 50% of the analyzed documents. When the authors used a GUI and reported it, most of them developed a web application, representing 36.05%, while 8.14% used the native mote application, and only 3.49% developed a mobile application.
Papers reporting only research proposals normally didn’t include or report a GUI. Those reporting applied research, for example, with farmers as the end users, were more prone to include and report a developed GUI. The end users don’t have to know the technical details and prefer a user-friendly interface.

FIGURE 8. The percentage representation of GUIs within the documentary sample. [Chart titled “Graphic User Interface”; axis: Documents (%): Web Application, 36.05; Mobile Application, 3.49; Hybrid Application, 1.16; Native Application, 8.14; Not Specified, 50.]

[Chart titled “Research Documentation Distribution per Region”; series: Indexed Articles, Digital Repositories; regions: Latin America, Asia, Europe, North America, Africa, Oceania; y-axis: Documents.]

CONCLUSIONS
Progress in the development of WSNs applied for PA is part of the solution for food insecurity because results from different authors show that, through this technology, it is possible to guarantee the quality of products, optimize processes to increase production and/or preserve resources, and reduce the gap between small farmers and new technologies. According to the experience in all of the analyzed publications, the application of WSNs for PA is very beneficial in terms of time, production, and environmental care factors. In the studied documentary sample, the parameters selected most often for monitoring correspond to humidity and the temperature of the soil and environment, which constitute sufficient basic information to generate optimal agricultural plans. The main focus of the scientific community for the development of WSN-based technologies for PA is production optimization and water management through the design of irrigation systems.
Furthermore, it is agreed that these platforms, if based on open source and open hardware technology, correspond to inclusive solutions for small and medium farmers, favoring more than one economic sector. However, many of the proposals do not direct their development objectives toward specific beneficiaries, such as populations affected

IMPLEMENTATION
We also analyzed the type of farming conditions on which the research focused, the validated results of the application of this technology, and, finally, the regions where
the research was deployed to agriculture. The implementation of the proposed platforms was, in most cases, in an
open field (54.65%), followed by a greenhouse (20.93%),
laboratories (5.81%), and not specified (18.61%). Normally, research started with a laboratory test and then
moved to a greenhouse (as a more controlled ambient
environment) and, finally, an open field. In some cases,
due to the crop type, the greenhouse was the final stage.
The implementation of WSNs for PA has achieved the
following, according to the recollected conclusions from
different researchers in the analyzed documents:
◗◗ a considerable reduction in water consumption through
irrigation, based on monitored data rather than scheduled watering
◗◗ decision support to farmers and the generation of autonomous practices through automated processes, implemented together with the monitoring system
◗◗ increased crop yields and optimized crop growth while
reducing the use of resources
◗◗ investigation of the variations of environmental conditions
◗◗ a decrease in the price of the systems compared to industrial solutions, thanks to the implementation of open
software and hardware, which provide similar yields.
From a regional point of view, studied works are from
various countries on five continents. Asia turned out to be
the continent with the largest number of publications, with
a total of 27 documents. From Latin America, the priority
region for this analysis, nine articles were considered. There
were 16 from Europe, nine from North America, four from
Africa, and two from Oceania, as presented in Figure 9. As
a result of the analysis concerning Latin America, the countries that contributed notable published research in the
field of PA, including undergraduate and graduate theses,
are Colombia, Mexico, Brazil, Ecuador, and Argentina. The
geographic location of all of the reviewed documents and
whether they are indexed publications or correspond to degree projects in digital repositories are outlined in Figure 10.
On one hand, Asia has the largest population of the five
continents and a huge extension of crops (extensive or greenhouse). For these reasons and with many universities and
research centers, Asian researchers explored and applied PA
to improve crop management, resulting in more published
documents in the area. On the other hand, Latin American
countries have a high percentage of family farming [138] and
also have the need for better crop management. However,
with a limited budget in research and with smaller universities, they have more difficulty making research contributions
in the area. For these reasons, some analyzed documents in
this region are from digital repositories.

FIGURE 9. The percentage representation of the regional distribution of the documentary sample.

215

216

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

FIGURE 10. The documentary sample’s geographical locations.

Digital Repositories

Indexed Articles

Colombia (4)
Ecuador (1)
Ecuador (2)

Mexico (3)

United States (9)

Canada (1)

Iran (1)

Egypt (2)
Palestine (1)

Kenya (1)

Argentina (1)

Brazil (2)
Brazil (1)

Colombia (4)

Portugal (2)
Spain (5)
Tunisia (1)

Malawi (1)

Australia (1)

Vietnam (3)
Indonesia (1)

Thailand (1)

India (8)

China (13)

Macedonia (1)
Greece (4)

Germany (2)
Italy (4)

New Zealand (1)

Malaysia (3)

Philippines (1)

Taiwan (1)

by their economic capacity or a low level of production.
The documents corroborate the validity of the platforms
for general purposes, such as water saving and crop monitoring, without reflecting economic–social results.
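As an illustration of the water-saving mechanism that the reviewed works report (irrigating in response to monitored soil data instead of following a fixed schedule), the following minimal Python sketch decides whether and how long to water a plot from its latest soil-moisture readings. It is not taken from any of the analyzed platforms: the names `Reading`, `should_irrigate`, and `irrigation_minutes`, as well as the threshold, target, and minutes-per-percent values, are assumptions chosen only for this example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Reading:
    """One soil-moisture sample from a sensor node (hypothetical structure)."""
    node_id: str
    soil_moisture_pct: float  # volumetric water content, in percent


def should_irrigate(readings: Sequence[Reading], threshold_pct: float = 25.0) -> bool:
    """Irrigate only when the plot's mean soil moisture is below the threshold."""
    if not readings:
        return False  # no monitored data: do not water blindly
    return mean([r.soil_moisture_pct for r in readings]) < threshold_pct


def irrigation_minutes(
    readings: Sequence[Reading],
    threshold_pct: float = 25.0,   # assumed trigger level
    target_pct: float = 35.0,      # assumed moisture level to restore
    minutes_per_pct: float = 2.0,  # assumed pump time per percentage point
) -> float:
    """Watering time proportional to the moisture deficit; 0.0 when not needed."""
    if not should_irrigate(readings, threshold_pct):
        return 0.0
    deficit = target_pct - mean([r.soil_moisture_pct for r in readings])
    return deficit * minutes_per_pct


dry_plot = [Reading("n1", 20.0), Reading("n2", 22.0), Reading("n3", 24.0)]
wet_plot = [Reading("n1", 30.0), Reading("n2", 40.0)]

print(irrigation_minutes(dry_plot))  # mean is 22.0, below threshold -> 26.0
print(irrigation_minutes(wet_plot))  # mean is 35.0, above threshold -> 0.0
```

A scheduled system would water both plots for the same fixed time; in this sketch the wet plot receives no water at all, which is the kind of saving the reviewed documents attribute to data-driven irrigation. A real deployment would calibrate the thresholds per crop and soil type.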
The East, South, and Insular Asia regions are the main
producers of scientific documents related to the implementation of WSNs for PA. Latin America is the second largest producer of scientific documents on this topic. This
contrasts with the reality that the level of Latin American
research cannot yet be compared with that of agricultural
power regions, such as China, India, and the United States.
This situation is explained by the inclusion of the digital repository criterion since, in Latin America, most useful
contributions correspond to undergraduate and graduate
projects that, in some cases, are not published in indexed
repositories. The documents from this region are articles
released in regional journals, indexed in Latindex and
Publindex, whose impact is not comparable to that of global databases, such as Scopus or IEEE Xplore.
Another feature that stands out in this analysis is the
lack of importance given to user friendliness, with some
systems requiring permanent assistance from a technician
and intensive training because of the lack of a user-friendly
interface. Very few primary research articles experiment with newer wireless communication protocols,
such as LoRaWAN. Most of the reviewed works focus on the application
of Zigbee, and fewer center on other well-known
protocols. The most notable deficiency in the scientific research on this technology is the lack of freely
accessible data, which would generate starting points for innovation through big data or machine learning to provide
services adapted to different crop conditions and forecast
the variation of diverse environmental indicators over time.
ACKNOWLEDGMENTS
The authors would like to express their gratitude for the
support of the PLAGRI (Plataforma de digitalización agrícola
para pymes basada en IOT/IOT-based agricultural digitization
platform for SMEs) project by the Telecommunications and
Telematics Research Group (GITEL), Universidad Politécnica
Salesiana, Cuenca, Ecuador. They would also like to thank
project EXT-2020-06 “Sistema de alerta temprana de heladas”
from CONGOPE–ESPE and project RED DUS-C-01 “MASCHA – Monitoreo de microclima urbano” from RedDUS-C-ESPE.
Additionally, the authors gratefully acknowledge the contributions of Jose Ignacio Castillo and Wolfgang Lichtenwagner
for their suggestions in the original version of this document.
AUTHOR INFORMATION
Mónica Karel Huerta (mhuerta@ups.edu.ec) received her
M.Sc. degrees in biomedical engineering and electrical engineering from Universidad Simón Bolívar (USB) in 1999
and 1994, respectively, and her Ph.D. degree (cum laude)
in telematics engineering from the Universitat Politècnica
de Catalunya (Spain) in 2006. She is with the Telecommunications and Telematics Research Group (GITEL),
Universidad Politécnica Salesiana, Cuenca, 010102, Ecuador. She is a full professor at the Universidad Politécnica
Salesiana of Cuenca–Ecuador, Cuenca, 010102, Ecuador.
She was a professor, dean of graduate studies, coordinator of the doctorate in engineering, and the founder of the
Networks and Telematics group at USB. She was a researcher at the Universidad de las Fuerzas Armadas Escuela Politécnica del Ejército in 2014, and in 2015 and 2016 she was
also a researcher at the Universidad Politécnica Salesiana
of Cuenca, both under the Prometeo program of Secretaría
de Educación Superior, Ciencia, Tecnología e Innovación,
Ecuador. Her research focuses on wireless networks, wireless sensor networks, precision agriculture, the Internet of
Things, and telemedicine. She is a Senior Member of IEEE,
vice president of the IEEE Ecuador Section for 2020–2021,
and a member of the IEEE Women in Engineering, Women
in Communications, and Women in Engineering in Medicine and Biology.
Andrea García-Cedeño (agarciac@ups.edu.ec) received
her B.S. degree in electronic engineering with a major in industrial systems from Universidad Politécnica Salesiana in
2017. Since 2017, she has been a research assistant at Telecommunications and Telematics Research Group (GITEL),
Universidad Politécnica Salesiana, Cuenca, 010102, Ecuador. Her research interests include precision agriculture,
wireless sensor networks, and signal acquisition and processing. She is a Member of IEEE and currently serves as the
Chapter secretary of the IEEE Engineering in Medicine and
Biology Society of the IEEE Ecuador Section.
Juan Carlos Guillermo (jguillermo@ups.edu.ec) received
his degree in systems engineering (computer engineering)
with a major in telematics from Universidad Politécnica
Salesiana in 2015. From 2015 to 2017, he was a research assistant with the Computer Science Department of the Universidad de Cuenca. From 2017 to 2019, he was a research
assistant at Telecommunications and Telematics Research
Group (GITEL), Universidad Politécnica Salesiana, 010102
Cuenca, Ecuador. His research interests include projects related to the Internet of Things applied to agriculture and
weather stations, radiological image analysis and processing, network security, and the analysis and processing of
big data. He is a Member of IEEE.
Roger Clotet (roger.clotet@campusviu.es) received
his Ph.D. degree in engineering from Universidad Simón
Bolívar (USB), Venezuela, in 2019, and qualified as a computer science engineer at the Universitat Politècnica de Catalunya, Spain, in 2004. Currently, he is a teacher and researcher at Valencian International University (VIU) and
a member of the Astronomy, Big Data, and Computing
Science Group, VIU, Valencia, 46002, Spain. He taught
in the Computer Science Department of USB between
2010 and 2013 and at the Telecommunications Engineering School of Universidad Católica Andrés Bello between
2011 and 2013, both in Caracas, Venezuela. He was a researcher with the Networks and Applied Telematics group
at USB from 2009 to 2019. His current research interests
include electronic health records, telemedicine, e-health,
e-agriculture, big data, and wireless sensor networks. He is
a Senior Member of IEEE.
REFERENCES
[1] M. Á. Altieri and C. I. Nicholls, “Agroecología: única esperanza
para la soberanía alimentaria y la resiliencia socioecológica,”
Agroecología, vol. 7, no. 2, pp. 65–83, 2013.
[2] O. L. Balogun, “Sustainable agriculture and food crisis in sub-Sahara
Africa,” in Global Food Insecurity, M. Behnassi, S. Draggan, and S. Yaya,
Eds. Berlin: Springer-Verlag, 2011, pp. 283–297.
[3] M. Behnassi, S. Draggan, and S. Yaya, Global Food Insecurity: Rethinking
Agricultural and Rural Development Paradigm and Policy.
Berlin: Springer-Verlag, 2011.
[4] M. Sassi, Understanding Food Insecurity: Key Features, Indicators,
and Response Design. Berlin: Springer-Verlag, 2017.
[5] U. Haruna and M. B. Umar, “Agricultural development for food
security and sustainability in Nigeria,” in Global Food Insecurity,
M. Behnassi, S. Draggan, and S. Yaya, Eds. Berlin: Springer-Verlag, 2011, pp. 63–71.
[6] G. Kasza, J. Szigeti, S. Podruzsik, and K. Keszthelyi, “Risk communication
at the Hungarian guar-gum scandal,” in Global Food
Insecurity, M. Behnassi, S. Draggan, and S. Yaya, Eds. Berlin:
Springer-Verlag, 2011, pp. 173–183.
[7] C. B. Barrett, “Measuring food insecurity,” Science, vol. 327, no.
5967, pp. 825–828, 2010. doi: 10.1126/science.1182768.
[8] K. L. Sharma, “Food security in the south pacific island countries
with special reference to the Fiji Islands,” in Food Insecurity,
Vulnerability and Human Rights Failure, B. Guha-Khasnobis, S. S.
Acharya, and B. Davis, Eds. Berlin: Springer-Verlag, 2007, pp. 35–57.
[9] A. Charman and J. Hodge, “Food security in the SADC region:
An assessment of national trade strategy in the context of the
2001–03 food crisis,” in Food Insecurity, Vulnerability and Human
Rights Failure, B. Guha-Khasnobis, S. S. Acharya, and B. Davis,
Eds. Berlin: Springer-Verlag, 2007, pp. 58–81.
[10] C. Gundersen and J. P. Ziliak, “Food insecurity and health outcomes,”
Health Affairs, vol. 34, no. 11, pp. 1830–1839, 2015. doi:
10.1377/hlthaff.2015.0645.
[11] “Las crisis alimentarias continúan golpeando: el hambre aguda
se intensifica,” Organización de las Naciones Unidas para
la Alimentación y la Agricultura, FAO Headquarters, Rome,
Italy. 2018. Accessed: June 8, 2019. [Online] Available:
http://www.fao.org/news/story/es/item/1110457/icode/
[12] H. G. Bohle, T. E. Downing, and M. J. Watts, “Climate change
and social vulnerability: Toward a sociology and geography of
food insecurity,” Global Environ. Change, vol. 4, no. 1, pp. 37–48,
1994. doi: 10.1016/0959-3780(94)90020-5.
[13] M. S. I. Molla, “18,000 children die of starvation everyday:
Cannot we save them?” in Global Food Insecurity, M. Behnassi,
S. Draggan, and S. Yaya, Eds. Berlin: Springer-Verlag, 2011, pp. 127–147.
[14] R. Gebbers and V. I. Adamchuk, “Precision agriculture and food
security,” Science, vol. 327, no. 5967, pp. 828–831, 2010. doi:
10.1126/science.1183899.

[15] B. Cerfontaine, S. Panhuysen, and C. Wunderlich, Sostenibilidad
Agrícola: Kit de herramientas de planificación. Sustainable Commodity Assistance Network, Winnipeg, Canada, 2014.
[16] M. Altieri and C. Nicholls, “Agroecología: Potenciando la agricultura campesina para revertir el hambre y la inseguridad alimentaria en el mundo,” Revista de Economía Crítica, vol. 10, no.
2, pp. 62–74, 2010.
[17] H. Valenzuela, “Agroecology: A global paradigm to challenge
mainstream industrial agriculture,” Horticulturae, vol. 2, no. 1,
p. 2, 2016. doi: 10.3390/horticulturae2010002.
[18] “Agricultura familiar y desarrollo territorial rural en América
Latina y el Caribe,” Organización de las Naciones Unidas
para la Alimentación y la Agricultura, FAO Headquarters,
Rome, Italy, 2014. Accessed: June 12, 2019. [Online] Available:
http://www.fao.org/3/a-at886s.pdf
[19] D. Mulla and R. Khosla, “Historical evolution and recent advances in precision farming,” in Soil-Specific Farming Precision
Agriculture, R. Lal and B. A. Stewart, Eds. Boca Raton, FL: CRC
Press, 2016, pp.1–35.
[20] R. Bongiovanni, E. Mantovani, S. Best, and Á. Roel, “Agricultura de precisión: Integrando conocimientos para una agricultura moderna y sustentable,” Procisur/IICA, 2006.
[21] Y. Lambrou and R. Laub, “Gender, local knowledge and lessons
learnt in documenting and conserving agrobiodiversity,” in
Food Insecurity, Vulnerability and Human Rights Failure, B. Guha-Khasnobis, S. S. Acharya, and B. Davis, Eds. Berlin: Springer-Verlag, 2007, pp. 161–194.
[22] M. Piamonte, M. Huerta, R. Clotet, J. Padilla, T. Vargas, and D.
Rivas, “WSN prototype for African oil palm bud rot monitoring,” in International Conference of ICT for Adapting Agriculture
to Climate Change, P. Angelov, J. Iglesias, and J. Corrales, Eds.
Berlin: Springer-Verlag, 2017, pp. 170–181.
[23] J. Wanjiku, J. U. Manyengo, W. Oluoch-Kosura, and J. T. Karugia, “Gender differentiation in the analysis of alternative farm
mechanization choices on small farms in Kenya,” in Food Insecurity, Vulnerability and Human Rights Failure, B. Guha-Khasnobis, S. S. Acharya, and B. Davis, Eds. Berlin: Springer-Verlag,
2007, pp. 194–218.
[24] “Sistemas de innovación para el desarrollo rural sostenible en América Latina y el Caribe,” Organización de
las Naciones Unidas para la Alimentación y la Agricultura, FAO Headquarters, Rome, Italy. Accessed:
June 17, 2019. [Online] Available: http://www.fao.org/3/
a-i7769s.pdf
[25] F. J. Ferrández-Pastor, J. M. García-Chamizo, M. Nieto-Hidalgo,
J. Mora-Pascual, and J. Mora-Martínez, “Developing ubiquitous
sensor network platform using Internet of Things: Application
in precision agriculture,” Sensors, vol. 16, no. 7, p. 1141, July
2016. doi: 10.3390/s16071141.
[26] I. H. Erden and O. Tozan, “Remote sensors and mobile technologies for precision agricultural data,” in Proc. 4th Int. Conf. AgroGeoinformatics (Agro-Geoinformatics), July 2015, pp. 105–108.
[27] G. Carrión, M. Huerta, and B. Barzallo, “Monitoring and irrigation of an urban garden using IoT,” in Proc. IEEE Colombian Conf.
Commun. Comput. (COLCOM), 2018, pp. 1–6. doi: 10.1109/ColComCon.2018.8466722.
[28] A. García-Cedeño et al., “Platano: Intelligent technological support platform for Azuay province farmers in Ecuador,” in Proc.
IEEE Int. Conf. Eng. Veracruz (ICEV), 2019, vol. 1, pp. 1–7. doi:
10.1109/ICEV.2019.8920501.
[29] M. Erazo-Rodas et al., “Multiparametric monitoring in equatorian tomato greenhouses (i): Wireless sensor network benchmarking,” Sensors, vol. 18, no. 8, p. 2555, 2018. doi: 10.3390/
s18082555.
[30] A. Baggio, “Wireless sensor networks in precision agriculture,”
in Proc. ACM Workshop Real-World Wireless Sensor Netw. (REALWSN 2005), Stockholm, Sweden, 2005, vol. 20, pp. 1567–1576.
[31] T. Ojha, S. Misra, and N. S. Raghuwanshi, “Wireless sensor networks for agriculture: The state-of-the-art in practice and future
challenges,” Comput. Electron. Agri., vol. 118, pp. 66–84, Oct.
2015. doi: 10.1016/j.compag.2015.08.011.
[32] E. García and F. Flego, “Agricultura de precisión,” Revista Ciencia y Tecnología, vol. 8, pp. 99–116, 2008. [Online]. Available:
https://www.palermo.edu/ingenieria/Ciencia_y_tecnologia/
ciencia_y_tecno_8.htm
[33] J. Abad et al., “Coffee crops variables monitoring: A case
of study in Ecuadorian Andes,” in International Conference
of ICT for Adapting Agriculture to Climate Change, J. Corrales,
P. Angelov, and J. Iglesias, Eds. Berlin: Springer-Verlag, 2018,
pp. 202–217.
[34] J. C. Guillermo, A. García-Cedeño, D. Rivas-Lalaleo, M. Huerta,
and R. Clotet, “IoT architecture based on wireless sensor network applied to agricultural monitoring: A case of study of
cacao crops in Ecuador,” in International Conference of ICT for
Adapting Agriculture to Climate Change, J. Corrales, P. Angelov,
and J. Iglesias, Eds. Berlin: Springer-Verlag, 2018, pp. 42–57.
[35] F. Sichiqui et al., “Agricultural information management: A
case study in corn crops in ecuador,” in The International Conference on Advances in Emerging Trends and Technologies. Berlin:
Springer-Verlag, 2019, pp. 113–124.
[36] A. de la Piedra, F. Benitez-Capistros, F. Dominguez, and A. Touhafi, “Wireless sensor networks for environmental research: A
survey on limitations and challenges,” in Proc. IEEE International Conference on Smart Technologies (EUROCON), July 2013, pp.
267–274.
[37] S. Wolfert, L. Ge, C. Verdouw, and M.-J. Bogaardt, “Big data in
smart farming: A review,” Agri. Syst., vol. 153, pp. 69–80, May
2017. doi: 10.1016/j.agsy.2017.01.023.
[38] R. A. Viscarra Rossel and J. Bouma, “Soil sensing: A new paradigm for agriculture,” Agri. Syst., vol. 148, pp. 71–74, Oct. 2016.
doi: 10.1016/j.agsy.2016.07.001.
[39] Y. Zhu, J. Song, and F. Dong, “Applications of wireless sensor
network in the agriculture environment monitoring,” Proc. Eng.,
vol. 16, pp. 608–614, 2011. [Online]. Available: https://www
.sciencedirect.com/science/article/pii/S1877705811026324 doi:
10.1016/j.proeng.2011.08.1131.
[40] Scopus. Accessed: May 9, 2020. [Online]. Available: http://
www.scopus.com
[41] Latindex, Sistema regional de información en línea para revistas
científicas de América Latina, el Caribe, España y Portugal. Accessed: May 9, 2020. [Online]. Available: https://www.latindex
.org/latindex/gCatalogo
[42] Publindex, Sistema de indexación y homologación de revistas especializadas de CTI. Accessed: May 9, 2020. [Online].
Available: https://scienti.minciencias.gov.co/publindex/#/
revistasPublindex/buscador
[43] G. D. Israel, “Determining sample size,” IFAS Extension, 1992.
[44] J. E. Guaña Moya, “Diseño de una Red de Sensores Inalámbricos (WSN) para monitorear parámetros relacionados con la
agricultura,” Ph.D. thesis, Escuela Politécnica Nacional, Quito,
Ecuador, Nov. 2016.
[45] C. Goumopoulos, “An autonomous wireless sensor/actuator
network for precision irrigation in greenhouses,” in Smart
Sensing Technology for Agriculture and Environmental Monitoring, S. Mukhopadhyay, Ed. Berlin: Springer-Verlag, 2012,
pp. 1–20.
[46] O. Postolache, J. M. Pereira, P. S. Girão, and A. A. Monteiro,
“Greenhouse environment: Air and water monitoring,” in
Smart Sensing Technology for Agriculture and Environmental Monitoring, S. Mukhopadhyay, Ed. Berlin: Springer-Verlag, 2012, pp.
81–102.
[47] L. Bencini, S. Maddio, G. Collodi, D. D. Palma, G. Manes, and
A. Manes, “Development of wireless sensor networks for agricultural monitoring,” in Smart Sensing Technology for Agriculture
and Environmental Monitoring. Berlin: Springer-Verlag, 2012, pp.
157–186.
[48] C. Cambra, S. Sendra, J. Lloret, and L. Garcia, “An IoT service-oriented system for agriculture monitoring,” in Proc. IEEE Int.
Conf. Commun. (ICC), May 2017, pp. 1–6.
[49] G. Sahitya, N. Balaji, C. D. Naidu, and S. Abinaya, “Designing a
wireless sensor network for precision agriculture using Zigbee,”
in Proc. IEEE 7th Int. Adv. Comput. Conf. (IACC), Jan. 2017, pp.
287–291. doi: 10.1109/IACC.2017.0069.
[50] I. Mat, M. R. M. Kassim, and A. N. Harun, “Precision agriculture applications using wireless moisture sensor network,”
in Proc. IEEE 12th Malaysia Int. Conf. Commun. (MICC), 2015,
pp. 18–23.
[51] K. Ferentinos, N. Katsoulas, A. Tzounis, T. Bartzanas, and C.
Kittas, “Wireless sensor networks for greenhouse climate and
plant condition assessment,” Biosyst. Eng., vol. 153, pp. 70–81,
Jan. 2017. doi: 10.1016/j.biosystemseng.2016.11.005.
[52] F. A. Urbano-Molano, “Redes de sensores inalámbricos aplicadas a optimización en agricultura de precisión para cultivos de
café en colombia,” J. de Ciencia e Ingeniería, vol. 5, no. 1, pp.
46–52, 2013.
[53] A. Torre Neto et al., “Wireless sensor network for variable rate
irrigation in citrus,” Inform. Technol. Sustainable Fruit Vegetable
Prod., vol. 7, pp. 563–569, Sept. 2005.
[54] M. Flores-Medina, F. Flores-García, V. Velasco-Martínez, G.
González-Cervantes, and F. Jurado-Zamarripa, “Monitoreo de
humedad en suelo a través de red inalámbrica de sensores,” Tecnología y ciencias del agua, vol. 6, no. 5, pp. 75–88, 2015.
[55] Y. Kim, R. G. Evans, and W. M. Iversen, “Remote sensing and
control of an irrigation system using a distributed wireless sensor network,” IEEE Trans. Instrum. Meas., vol. 57, pp. 1379–1387,
July 2008. doi: 10.1109/TIM.2008.917198.
[56] L. Fernandez, M. Huerta, G. Sagbay, R. Clotet, and A. Soto,
“Sensing climatic variables in a orchid greenhouse,” in Proc.
Int. Caribbean Conf. Devices, Circuits Syst. (ICCDCS), 2017, pp.
101–104. doi: 10.1109/ICCDCS.2017.7959719.
[57] M. Culman, J. M. T. Portocarrero, C. D. Guerrero, C. Bayona, J.
L. Torres, and C. M. d. Farias, “PalmNET: An open-source wireless
sensor network for oil palm plantations,” in Proc. IEEE 14th
Int. Conf. Netw., Sensing Control (ICNSC), May 2017, pp. 783–788.
doi: 10.1109/ICNSC.2017.8000190.
[58] J. Petearson Anzola, V. García-Díaz, and A. C. Jiménez, “WSN
analysis in grid topology for potato crops for IoT,” in Proc.
4th Multidisciplinary Int. Soc. Netw. Conf., 2017, p. 44. doi:
10.1145/3092090.3092104.
[59] T. Cao-hoang and C. N. Duy, “Environment monitoring
system for agricultural application based on wireless sensor
network,” in Proc. 7th Conf. Inf. Sci. Technol. (ICIST), 2017,
pp. 99–102.
[60] F. Mesas-Carrascosa, D. V. Santano, J. Meroño, M. S. de la Orden,
and A. García-Ferrer, “Open source hardware to monitor
environmental parameters in precision agriculture,” Biosyst.
Eng., vol. 137, pp. 73–83, Sept. 2015. doi: 10.1016/j.biosystemseng.2015.07.005.
[61] M. R. Marín, L. Padilla Sánchez, and J. Gómez Gómez, “Sistema
de monitoreo agrícola mediante redes inalámbricas de sensores
para el monitoreo de variables ambientales SISMOAGRO,”
Ingeniería al Día, vol. 2, no. 2, pp. 4–22, Sept. 2016.
[62] A. L. Diedrichs, G. Tabacchi, G. Grünwaldt, M. Pecchia, G. Mercado,
and F. G. Antivilo, “Low-power wireless sensor network
for frost monitoring in agriculture research,” in Proc. IEEE Biennial
Congr. Argentina (ARGENCON), June 2014, pp. 525–530.
doi: 10.1109/ARGENCON.2014.6868546.
[63] R. Aquino-Santos, A. González-Potes, A. Edwards-Block, and
R. A. Virgen-Ortiz, “Developing a new wireless sensor network
platform and its application in precision agriculture,”
Sensors, vol. 11, no. 1, pp. 1192–1211, 2011. doi: 10.3390/s110101192.
[64] R. Filev Maia, I. Netto, and A. L. H. Tran, “Precision agriculture
using remote monitoring systems in brazil,” in Proc. IEEE Global
Humanitarian Technol. Conf. (GHTC), 2017, pp. 1–6.
[65] N. Fahmi, S. Huda, E. Prayitno, M. U. H. A. Rasyid, M. C. Roziqin,
and M. U. Pamenang, “A prototype of monitoring precision
agriculture system based on WSN,” in Proc. Int. Seminar Intell.
Technol. Its Appl. (ISITIA), Aug. 2017, pp. 323–328. doi:
10.1109/ISITIA.2017.8124103.
[66] J. M. Nunez, F. Fonthal, and Y. Quijada, “Design and implementation
of WSN for precision agriculture in white cabbage
crops,” in Proc. IEEE XXIV Congreso Internacional de Ingeniería
Eléctrica, Electrónica y Computación (INTERCON), Arequipa,
Peru, 2017, pp. 1–4.
[67] N. Karimi, A. Arabhosseini, M. Karimi, and M. H. Kianmehr,
“Web-based monitoring system using Wireless Sensor Networks
for traditional vineyards and grape drying buildings,”
Comput. Electron. Agri., vol. 144, pp. 269–283, Jan. 2018. doi:
10.1016/j.compag.2017.12.018.
[68] Y. E. M. Hamouda and B. H. Y. Elhabil, “Precision agriculture
for greenhouses using a wireless sensor network,” in Proc. Palestinian
Int. Conf. Inf. Commun. Technol. (PICICT), May 2017, pp. 78–83.

[69] R. K. Kodali, N. Rawat, and L. Boppana, “WSN sensors for precision agriculture,” in Proc. Region 10 Symp., 2014, pp. 651–656.
[70] R. Godoi Vieira, A. Cunha, M. da Cunha, L. B. Ruiz, and A.
Pires de Camargo, “On the design of a long range WSN for precision irrigation,” IEEE Sensors J., vol. 18, no. 2, pp. 773–780,
Jan. 2018. doi: 10.1109/JSEN.2017.2776859.
[71] U. Dorji, T. Pobkrut, and T. Kerdcharoen, “Electronic nose
based wireless sensor network for soil monitoring in precision
farming system,” in Proc. 9th Int. Conf. Knowledge Smart Technol.
(KST), 2017, pp. 182–186.
[72] R. K. Kodali, S. Soratkal, and L. Boppana, “WSN in coffee cultivation,” in Proc. Int. Conf. Comput., Commun. Automat. (ICCCA),
Apr. 2016, pp. 661–666. doi: 10.1109/CCAA.2016.7813804.
[73] J. Wang, K. Damevski, and H. Chen, “Sensor data modeling
and validating for wireless soil sensor network,” Comput. Electron. Agri., vol. 112, pp. 75–82, Mar. 2015. doi: 10.1016/j.compag.2014.12.016.
[74] Z. Li et al., “Practical deployment of an in-field soil property
wireless sensor network,” Comput. Standards Interf., vol. 36, no.
2, pp. 278–287, 2014. doi: 10.1016/j.csi.2011.05.003.
[75] X. Dong, M. Vuran, and S. Irmak, “Autonomous precision agriculture through integration of wireless underground sensor
networks with center pivot irrigation systems,” Ad Hoc Networks, vol. 11, no. 7, pp. 1975–1987, 2013. doi: 10.1016/j.adhoc.2012.06.012.
[76] Z. Li, J. Wang, R. Higgs, L. Zhou, and W. Yuan, “Design of an
intelligent management system for agricultural greenhouses
based on the internet of things,” in Proc. IEEE Int. Conf. Comput. Sci. Eng. (CSE) and Embedded and Ubiquitous Comput. (EUC),
2017, vol. 2, pp. 154–160.
[77] L. Liu and Y. Zhang, “Design of greenhouse environment
monitoring system based on wireless sensor network,” in
Proc. 3rd Int. Conf. Control, Automat. Robotics (ICCAR), 2017,
pp. 463–466.
[78] L. Geng and T. Dong, “An agricultural monitoring system based
on wireless sensor and depth learning algorithm,” Int. J. Online Eng., vol. 13, no. 12, pp. 127–137, 2017. doi: 10.3991/ijoe.
v13i12.7885.
[79] A. Cama-Pinto, F. Gil-Montoya, J. Gómez-López, A. García-Cruz, and F. Manzano-Agugliaro, “Wireless surveillance system for greenhouse crops,” Dyna, vol. 81, no. 184, pp. 164–170,
2014. doi: 10.15446/dyna.v81n184.37034.
[80] M. F. Quiñones Cuenca, “Sistema de monitoreo de variables
medio ambientales usando una red de sensores inalámbricos y
plataformas de internet de las cosas,” B.S. thesis, Universidad
Nacional de Loja, Loja, Ecuador, 2017.
[81] J. C. Ortega Ortiz, “Desarrollo de un prototipo de adquisición
de variables ambientales en cultivos hidropónicos de lechuga,
mediante una red de sensores, utilizando un sistema embebido,” Ingenierías USB Bogotá, 2014. [Online]. Available: http://
biblioteca.usbbog.edu.co:8080/Biblioteca/BDigital/83534.pdf
[82] J. C. Suárez Barón and M. J. Suárez Barón, “Monitoreo de variables ambientales en invernaderos usando tecnología zigbee,”
in Proc. XLIII Jornadas Argentinas de Informática e Investigación Operativa (43JAIIO)-VI Congreso Argentino de AgroInformática (CAI),
Buenos Aires, 2014, pp. 165–175
[83] G. Aiello, I. Giovino, M. Vallone, P. Catania, and A. Argento, “A
decision support system based on multisensor data fusion for
sustainable greenhouse management,” J. Cleaner Prod., vol. 172,
pp. 4057–4065, Jan. 2018. doi: 10.1016/j.jclepro.2017.02.197.
[84] F. F. Montesano, M. W. van Iersel, F. Boari, V. Cantore, G.
D’Amato, and A. Parente, “Sensor-based irrigation management of soilless basil using a new smart irrigation system: Effects of set-point on plant physiological responses and crop performance,” Agri. Water Manage., vol. 203, pp. 20–29, Apr. 2018.
doi: 10.1016/j.agwat.2018.02.019.
[85] J.-A. Jiang et al., “A wireless sensor network-based monitoring system with dynamic convergecast tree algorithm for precision cultivation management in orchid greenhouses,” Precision Agri., vol. 17,
no. 6, pp. 766–785, Dec. 2016. doi: 10.1007/s11119-016-9448-7.
[86] S. Khan, “Wireless sensor network based water well management system for precision agriculture,” in Proc. 26th Int. Telecommun. Netw. Appl. Conf. (ITNAC), 2016, pp. 44–46. doi: 10.1109/
ATNAC.2016.7878780.
[87] Z. Li, N. Wang, T. Hong, T. Wen, and Z. Liu, “Design of wireless sensor network system based on in-field soil water content
monitoring,” Nongye Gongcheng Xuebao/Trans. Chinese Soc. Agri.
Eng., vol. 26, no. 2, pp. 212–217, 2010.
[88] Z. Li, N. Wang, T. Hong, A. Franzen, and J. Li, “Closed-loop drip
irrigation control using a hybrid wireless sensor and actuator
network,” Sci. China Inform. Sci., vol. 54, no. 3, pp. 577–588,
2011. doi: 10.1007/s11432-010-4086-6.
[89] J. Jao, B. Sun, and K. Wu, “A prototype wireless sensor network
for precision agriculture,” in Proc. Int. Conf. Distrib. Comput.
Syst., July 2013, pp. 280–285.
[90] E. Kampianakis, J. Kimionis, K. Tountas, C. Konstantopoulos,
E. Koutroulis, and A. Bletsas, “Wireless environmental sensor
networking with analog scatter radio and timer principles,”
IEEE Sensors J., vol. 14, no. 10, pp. 3365–3376, Oct. 2014. doi:
10.1109/JSEN.2014.2331704.
[91] J. L. Chávez, F. J. Pierce, T. V. Elliott, and R. G. Evans, “A remote irrigation monitoring and control system for continuous move systems. Part A: Description and development,” Precision Agri., vol.
11, no. 1, pp. 1–10, Feb. 2010. doi: 10.1007/s11119-009-9109-1.
[92] G. Vellidis, M. Tucker, C. Perry, C. Kvien, and C. Bednarz, “A
real-time wireless smart sensor array for scheduling irrigation,”
Comput. Electron. Agri., vol. 61, no. 1, pp. 44–50, 2008. doi:
10.1016/j.compag.2007.05.009.
[93] W. Yitong, S. Yunbo, and Y. Xiaoyu, “Design of multi-parameter
wireless sensor network monitoring system in precision agriculture,” in Proc. 4th Int. Conf. Instrum. Meas., Comput., Commun. Control (IMCCC 2014) 2014, pp. 721–725. doi: 10.1109/
IMCCC.2014.153.
[94] I. Mat, M. R. M. Kassim, and A. N. Harun, “Precision irrigation
performance measurement using wireless sensor network,” in
Proc. 6th Int. Conf. Ubiquitous Future Netw. (ICUFN), 2014, pp.
154–157. doi: 10.1109/ICUFN.2014.6876771.
[95] J. E. G. López, J. C. Chavez, and A. K. J. Sánchez, “Modelado de
una red de sensores y actuadores inalámbrica para aplicaciones
en agricultura de precisión,” in Proc. IEEE Mexican Humanitarian Technol. Conf. (MHTC), 2017, pp. 109–116. doi: 10.1109/
MHTC.2017.7926210.
DECEMBER 2021

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

[96] L. Xiao and L. Guo, “The realization of precision agriculture monitoring system based on wireless sensor network,” in Proc. Int.
Conf. Comput. Commun. Technol. Agri. Eng., 2010, vol. 3, pp. 89–92.
[97] J. Xu, J. Zhang, X. Zheng, X. Wei, and J. Han, “Wireless sensors
in farmland environmental monitoring,” in Proc. Int. Conf. Cyber-Enabled Distrib. Comput. Knowl. Discovery, 2015, pp. 372–379.
[98] A. Medela, B. Cendón, L. González, R. Crespo, and I. Nevares,
“IoT multiplatform networking to monitor and control wineries
and vineyards,” in Proc. Future Netw. Mobile Summit, July 2013,
pp. 1–10.
[99] K. O. Flores, I. M. Butaslac, J. E. M. Gonzales, S. M. G. Dumlao,
and R. S. Reyes, “Precision agriculture monitoring system using wireless sensor network and raspberry Pi local server,” in
Proc. IEEE Region 10 Conf. (TENCON), 2016, pp. 3018–3021.
[100] J. L. Hou, R. Hou, D. S. Gao, and H. R. Shu, “The design and
implementation of orchard long-distance intelligent irrigation
system based on Zigbee and GPRS,” in Advanced Materials Res.,
vol. 588, pp. 1593–1597, Nov. 2012.
[101] I. Hajdu and I. Yule, “Application of a wireless sensor network
for multi-depth soil moisture monitoring at farm scale in New
Zealand’s hill country,” Adv. Animal Biosci., vol. 8, no. 2, pp.
412–417, July 2017. doi: 10.1017/S2040470017000450.
[102] K. Manikandan and S. Rajaram, “Automatic monitoring system
for a precision agriculture based on wireless sensor networks,”
Int. J. Sci., Eng. Comput. Technol., vol. 6, no. 6, p. 208, 2016.
[103] M. Mafuta, M. Zennaro, A. Bagula, G. Ault, H. Gombachika,
and T. Chadza, “Successful deployment of a wireless sensor network for precision agriculture in Malawi,” Int. J. Distrib. Sensor
Netw., vol. 9, no. 5, p. 150,703, 2013. doi: 10.1155/2013/150703.
[104] J. A. López, A.-J. Garcia-Sanchez, F. Soto, A. Iborra, F. GarciaSanchez, and J. Garcia-Haro, “Design and validation of a wireless sensor network architecture for precision horticulture
applications,” Precision Agri., vol. 12, no. 2, pp. 280–295, Apr.
2011. doi: 10.1007/s11119-010-9178-1.
[105] S. Rodríguez, T. Gualotuña, and C. Grilo, “A system for the
monitoring and predicting of data in precision agriculture in
a rose greenhouse based on wireless sensor networks,” in Proc.
CENTERIS 2017—Int. Conf. ENTERprise Inf. Systems/ProjMAN
2017—Int. Conf. Project MANagement/HCist 2017—Int. Conf.
Health and Soc. Care Inf. Syst. Technol., CENTERIS/ProjMAN/
HCist 2017, Jan. 2017, vol. 121, pp. 306–313.
[106] M. Srbinovska, C. Gavrovski, V. Dimcev, A. Krkoleva, and V.
Borozan, “Environmental parameters monitoring in precision agriculture using wireless sensor networks,” J. Cleaner Prod., vol. 88, pp. 297–307, Feb. 2015. doi: 10.1016/j.
jclepro.2014.04.036.
[107] A. Matese, S. Di Gennaro, A. Zaldei, L. Genesio, and F. Vaccari,
“A wireless sensor network for precision viticulture: The NAV
system,” Comput. Electron. Agri., vol. 69, no. 1, pp. 51–58, Nov.
2009. doi: 10.1016/j.compag.2009.06.016.
[108] S. M. Abd El-kader and B. M. Mohammad El-Basioni, “Precision
farming solution in Egypt using the wireless sensor network
technology,” Egyptian Inf. J., vol. 14, no. 3, pp. 221–233, Nov.
2013. doi: 10.1016/j.eij.2013.06.004.
[109] F. Karim, F. Karim, and A. Frihida, “Monitoring system using
Web of Things in precision agriculture,” Procedia Comput. Sci.,

221

vol. 110, pp. 402–409, 2017. doi: 10.1016/j.procs.2017.06.083.
[Online]. Available: https://www.sciencedirect.com/science/
article/pii/S1877050917312590
[110] J. Bauer, B. Siegmann, T. Jarmer, and N. Aschenbr uck,
“On the potential of Wireless Sensor Networks for the insitu assessment of crop leaf area index,” Comput. Electron.
Agri., vol. 128, pp. 149–159, Oct. 2016. doi: 10.1016/j.compag.2016.08.019.
[111] G. E. John, “A low cost wireless sensor network for precision
agriculture,” in Proc. 6th Int. Symp. Embedded Comput. Syst. Des.
(ISED), 2016, pp. 24–27. doi: 10.1109/ISED.2016.7977048.
[112] R. K. Math and N. V. Dharwadkar, “A wireless sensor network
based low cost and energy efficient frame work for precision
agriculture,” in Proc. Int. Conf. Nascent Technol. Eng. (ICNTE),
2017, pp. 1–6. doi: 10.1109/ICNTE.2017.7947883.
[113] T. D. Le and D. H. Tan, “Design and deploy a wireless sensor
network for precision agriculture,” in Proc. 2nd Nat. Found. Sci.
Technol. Develop. Conf. Inf. and Comput. Sci. (NICS), 2015, pp.
294–299.
[114] L. Hui, M. Zhijun, W. Hua, and X. Min, “Spatio-temporal variation analysis of soil temperature based on wireless sensor network,” Biol. Eng., vol. 9, no. 6, p. 9, 2016.
[115] G. Nagarajan and R. Minu, “Wireless soil monitoring sensor for
sprinkler irrigation automation system,” Wireless Personal Commun., vol. 98, no. 2, pp. 1835–1851, 2018. doi: 10.1007/s11277017-4948-y.
[116] Y. Wang, Y. Wang, X. Qi, and L. Xu, “OPAIMS: Open architecture precision agriculture information monitoring system,” in
Proc. 2009 Int. Conf. on Compilers, Archit., Synthesis Embedded
Syst. pp. 233–240. doi: 10.1145/1629395.1629428.
[117] M. Zeni et al., “Low-power low-cost wireless sensors for real-time
plant stress detection,” in Proc. 2015 Annu. Symp. Comput. Develop. New York, NY, pp. 51–59. doi: 10.1145/2830629.2830641.
[118] A. Khattab, S. E. Habib, H. Ismail, S. Zayan, Y. Fahmy, and
M. M. Khairy, “An IoT-based cognitive monitoring system for
early plant disease forecast,” Comput. Electron. Agri., vol. 166, p.
105,028, Nov. 2019. doi: 10.1016/j.compag.2019.105028.
[119] R. S. Jo, M. Lu, V. Raman, and P. H. Then, “Design and implementation of IoT-enabled compost monitoring system,” in Proc.
IEEE 9th Symp. Comput. Appl. Ind. Electron. (ISCAIE), Nov. 2019,
pp. 23–28.
[120] J. M. Núñez V, F. Fonthal, and Y. M. Quezada L, “Design and
implementation of WSN and IoT for precision agriculture in
tomato crops,” in Proc. IEEE Andean Conf., 2018, pp. 1–5. doi:
10.1109/ANDESCON.2018.8564674.
[121] V. A. Vuh, D. C. Trinh, T. C. Truvant, T. D. Bui, “Design of automatic irrigation system for greenhouse based on LoRa technology,” in Proc. Int. Conf. Adv. Technol. Commun. (ATC), 2018, pp.
72–77.
[122] T. Murthy and S. Rasool, “Design of smart bio-shed using IoT
with raspberry PI,” Int. J. Recent Technol. Eng., vol. 8, no. 2 Special Issue 11, pp. 2249–2255, 2019.
[123] Y. Kuang, Y. Shen, L. Lu, and G. Li, “Farmland monitoring
system based on cloud platform,” in Proc. IEEE 8th Joint Int.
Inf. Technol. Artif. Intell. Conf. (ITAIC), 2019, pp. 335–339. doi:
10.1109/ITAIC.2019.8785531.

222

[124]J. Bauer, T. Jarmer, S. Schittenhelm, B. Siegmann, and N.
Aschenbruck, “Processing and filtering of leaf area index time
series assessed by in-situ wireless sensor networks,” Comput.
Electron. Agri., vol. 165, p. 104,867, 2019. doi: 10.1016/j.compag.2019.104867.
[125] S. Sadowski and P. Spachos, “Solar-powered smart agricultural monitoring system using internet of things devices,” in
Proc. IEEE 9th Annu. Inf. Technol., Electron. Mobile Commun. Conf.
(IEMCON), 2018, pp. 18–23. doi: 10.1109/IEMCON.2018.
8614981.
[126] X. Feng, F. Yan, and X. Liu, “Study of wireless communication
technologies on Internet of Things for precision agriculture,”
Wireless Personal Commun., 2019, pp. 1–18.
[127] A. Rao, H. Shao, and X. Yang, “The design and implementation of smart agricultural management platform based on
UAV and wireless sensor network,” in Proc. IEEE 2nd Int. Conf.
Electron. Technol. (ICET), 2019, pp. 248–252. doi: 10.1109/
ELTECH.2019.8839480.
[128] “Water in agriculture,” The World Bank Group Water Global Practice, Washington, D.C. Accessed: Sept. 3, 2020. [Online]. Available: https://www.worldbank.org/en/topic/water-in-agriculture
[129] Zigbee, IEEE Standard for Low-Rate Wireless Networks, IEEE Standard 802.15.4-2015 (Revision of IEEE Standard 802.15.4-2011),
2016, pp. 1–709.
[130] Bluetooth, IEEE Standard for Telecommunications and Information
Exchange Between Systems – LAN/MAN – Specific Requirements –
Part 15: Wireless Medium Access Control (MAC) and Physical Layer
(PHY) Specifications for Wireless Personal Area Networks (WPANs),
IEEE Standard 802.15.1-2002, pp. 1–473.
[131] Wifi, IEEE Standard for Wireless LAN Medium Access Control
(MAC) and Physical Layer (PHY) Specifications, IEEE Standard
802.11-1997, pp. 1–445.
[132] W. Ayoub, A. E. Samhat, F. Nouvel, M. Mroue, and J. Prévotet,
“Internet of mobile things: Overview of LoRaWAN, DASH7,
and NB-IoT in LPWANs standards and supported mobility,”
IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1561–1581, 2019.
doi: 10.1109/COMST.2018.2877382.
[133] F. Adelantado, X. Vilajosana, P. Tuset-Peiro, B. Martinez, J.
Melia-Segui, and T. Watteyne, “Understanding the limits of LoRaWAN,” IEEE Commun. Mag., vol. 55, no. 9, pp. 34–40, 2017.
doi: 10.1109/MCOM.2017.1600613.
[134] V. D. Hunt, A. Puglia, and M. Puglia, An Overview of RFID Technology. New Jersey: Wiley Telecom, 2007, pp. 5–24.
[135] J. Cai and D. J. Goodman, “General packet radio service in
GSM,” IEEE Commun. Mag., vol. 35, no. 10, pp. 122–131, 1997.
doi: 10.1109/35.623996.
[136] Memsic. Accessed: May 10, 2020. [Online]. Available: https://
www.memsic.com/
[137] “Waspmote,” Libelium Comunicaciones Distribuidas S.L. Accessed: May 10, 2020. [Online]. Available: http://www.libelium
.com/products/waspmote/
[138] B. E. Graeub, M. J. Chappell, H. Wittman, S. Ledermann, R.
B. Kerr, and B. Gemmill-Herren, “The state of family farms in
the world,” World Develop., vol. 87, pp. 1–15, Nov. 2016. doi:
10.1016/j.worlddev.2015.05.012.
GRS
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

DECEMBER 2021

Spectral Variability in Hyperspectral Data Unmixing
A comprehensive review
RICARDO AUGUSTO BORSOI, TALES IMBIRIBA, JOSÉ CARLOS MOREIRA BERMUDEZ,
CÉDRIC RICHARD, JOCELYN CHANUSSOT, LUCAS DRUMETZ, JEAN-YVES TOURNERET,
ALINA ZARE, AND CHRISTIAN JUTTEN

The spectral signatures of the materials contained in hyperspectral images, also called endmembers (EMs), can be significantly affected by variations in atmospheric, illumination, and environmental conditions that typically occur within an image. Traditional spectral unmixing (SU) algorithms neglect the spectral variability of the EMs, which propagates significant modeling errors throughout the whole unmixing process and compromises the quality of the results. Therefore, serious efforts have been dedicated to mitigating the effects of spectral variability in SU. This resulted in the development of algorithms that incorporate different strategies to enable the EMs to vary within a hyperspectral image, using, for instance, sets of spectral signatures known a priori as well as Bayesian, parametric, and local EM models.
Each of these approaches has different characteristics and underlying motivations. This article presents a comprehensive literature review contextualizing both classic and recent approaches to solve this problem. We give a detailed evaluation of the sources of spectral variability and their effects in image spectra. Furthermore, we propose a new taxonomy that organizes existing works according to a practitioner’s point of view, based on the necessary amount of supervision and the computational cost. We also summarize methods to construct spectral libraries (which are required by many SU techniques) based on the observed hyperspectral image as well as algorithms for library augmentation and reduction. Finally, we conclude with discussions and an outline of possible future directions for the field.

Digital Object Identifier 10.1109/MGRS.2021.3071158
Date of current version: 21 May 2021
0274-6638/21©2021IEEE

OVERVIEW
Hyperspectral cameras can sample electromagnetic spectra
at hundreds of contiguous wavelength intervals. The high
spectral resolution of hyperspectral images makes them an
important tool for the precise identification and discrimination of different materials in a scene. Hyperspectral images
significantly contribute to different fields and are now at the
core of a vast number of applications, such as space exploration [1], land use analysis, mineral detection, environment
monitoring, field surveillance [2], [3], disease diagnosis,
and image-guided surgery [4]. Despite the advantages of
their high spectral resolution, hyperspectral cameras operate with a delicate tradeoff between spatial resolution and
the signal-to-noise ratio. This happens because the light
observed at the sensor is decomposed into several spectral
bands, which, in turn, demands the pixel size to be large
enough to attain an acceptable signal-to-noise ratio. When
combined with a large target-to-sensor distance, which is
common in many applications, this leads to images that
have a low spatial resolution [5]. The limited spatial resolution of hyperspectral images means that each image pixel
is actually a mixture of P different pure materials, whose
spectra are termed EMs, that are present in the scene [6].
This mixing process conceals important information about
the pure materials and their distribution in an image. SU
aims to solve this problem by decomposing a hyperspectral
image into the spectral signatures of the EMs and their fractional abundance proportions for each pixel [7].
The simplest and most widely used model to represent
the interaction between light and the EMs in the scene is
the linear mixing model (LMM) [6], which represents a given pixel $y_n$, indexed by $n$, with $L$ spectral bands as
$$y_n = M_0 a_n + e_n, \quad \text{subject to } \mathbf{1}^\top a_n = 1 \text{ and } a_n \geq 0, \tag{1}$$
where $M_0 = [m_{0,1}, \ldots, m_{0,P}]$ is an $L \times P$ matrix whose columns are the $P$ EMs, $a_n$ is a vector containing the abundances of every EM in the pixel $y_n$, and $e_n$ is an additive noise vector. Traditionally, the LMM assumes that the signatures $M_0$ of the pure materials are the same for all pixels $y_n$, $n = 1, \ldots, N$, in the image. Although this assumption leads to a well-posed and computationally simpler framework, it limits the applicability of the LMM since it can jeopardize the accuracy of estimated abundances in many circumstances, due to the spectral variability of the EMs.
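To make the constraints in (1) concrete, the following minimal Python sketch simulates one pixel under the LMM and recovers its abundances by least squares restricted to the probability simplex. It is an illustration only and not an algorithm taken from this article: the projected-gradient solver, the function names (`project_simplex`, `unmix_lmm`), the random EM matrix, and the noise level are all assumptions chosen for brevity.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept active
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def unmix_lmm(y, M0, n_iter=5000):
    """Projected-gradient least squares for the abundances in (1)."""
    P = M0.shape[1]
    a = np.full(P, 1.0 / P)                   # start from uniform abundances
    step = 1.0 / np.linalg.norm(M0, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = M0.T @ (M0 @ a - y)
        a = project_simplex(a - step * grad)
    return a

# Synthetic example (all values are assumptions made for illustration).
rng = np.random.default_rng(0)
L, P = 50, 3
M0 = rng.uniform(0.05, 0.9, size=(L, P))      # hypothetical EM signatures
a_true = np.array([0.6, 0.3, 0.1])
y = M0 @ a_true + 1e-3 * rng.standard_normal(L)

a_hat = unmix_lmm(y, M0)
print(a_hat, a_hat.sum())
```

The projection step enforces both the nonnegativity and the sum-to-one constraints at every iteration, so the estimate always remains a valid abundance vector.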
SPECTRAL VARIABILITY IN SPECTRAL UNMIXING
Spectral variability is an effect commonly observed in many scenes in which the spectral signatures of the pure constituent materials vary across the observed hyperspectral image, as illustrated in Figure 1. It can be caused, for instance, by variable illumination and atmospheric conditions. Variability can also be intrinsic to the very definition of a pure material, such as signatures of a single vegetation species that differ significantly due to growing and environmental conditions [8], [9]. In this context, the use of a single matrix $M_0$ for all pixels in the LMM (1) leads to problems such as proportion indeterminacy, where errors in the estimation of the EM spectra at each pixel propagate to the estimated abundances. This results in erroneous abundance estimation and the selection of too many EMs to represent the spectrum of each pixel $y_n$ [8]–[10]. Due to the significant impact of EM variability on abundance estimation quality, a lot of effort has been dedicated to developing algorithms that are able to obtain better abundance estimates in this scenario. The most general form of the LMM that considers spectral variability generalizes (1) to allow a different EM matrix for each pixel, resulting in
$$y_n = M_n a_n + e_n, \quad \text{subject to } \mathbf{1}^\top a_n = 1 \text{ and } a_n \geq 0, \tag{2}$$
for $n = 1, \ldots, N$, where $M_n \in \mathbb{R}^{L \times P}$ is the EM matrix of the $n$th pixel.
SU for spectral variability can be generally defined as two complementary problems related, respectively, to the recovery of the abundances and of the EMs:
◗ P1, which mitigates the adverse effects of spectral variability in the abundance estimation
◗ P2, which estimates the spectral signatures of the EMs present in each pixel of the image.
Both problems have attracted substantial interest. Although all SU methods that account for spectral variability must deal with P1, not all of them take P2 into consideration, due to the additional difficulty that is involved.
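The effect of ignoring the pixel-dependent EM matrices in (2) can be illustrated with a small numerical experiment. The sketch below is only an assumption-laden toy: it takes the simplest possible form of variability, $M_n = \psi_n M_0$ with a hypothetical per-pixel scaling factor $\psi_n$, and it uses unconstrained least squares (dropping the constraints of (1) and (2)) so that the error mechanism is easy to see.

```python
import numpy as np

rng = np.random.default_rng(1)
L, P, N = 40, 3, 200

M0 = rng.uniform(0.05, 0.9, size=(L, P))     # reference EM matrix (hypothetical)
A = rng.dirichlet(np.ones(P), size=N).T      # P x N abundances on the simplex
psi = rng.uniform(0.7, 1.3, size=N)          # assumed per-pixel scaling factors

# Noiseless pixels generated by model (2) with M_n = psi_n * M0.
Y = (M0 @ A) * psi                           # L x N, column n is scaled by psi[n]

# P1 tackled with a single M0 for every pixel, as in model (1).
A_fixed = np.linalg.lstsq(M0, Y, rcond=None)[0]

# Same estimator, but using the correct pixel-dependent matrix M_n.
A_pixel = np.stack(
    [np.linalg.lstsq(psi[n] * M0, Y[:, n], rcond=None)[0] for n in range(N)],
    axis=1,
)

rmse_fixed = np.sqrt(np.mean((A_fixed - A) ** 2))
rmse_pixel = np.sqrt(np.mean((A_pixel - A) ** 2))
print(f"RMSE with a single M0: {rmse_fixed:.4f}")
print(f"RMSE with per-pixel M_n: {rmse_pixel:.2e}")
```

In this toy setting, the abundances estimated with a fixed $M_0$ are simply rescaled by $\psi_n$, so they no longer sum to one; with more realistic variability the error is not a plain rescaling, which is what makes P1 and P2 difficult.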
CONTRIBUTION, TAXONOMY, AND ORGANIZATION
Many SU algorithms have been proposed to address
problems P1 and P2. Different algorithms follow various
methodologies to represent the EMs in the scene. Existing
methods employ Bayesian, parametric, and spatially localized models as well as libraries containing different instances of material spectra known a priori. This multiplicity of models gives rise to solutions presenting advantages
and disadvantages in terms of computational complexity,
accuracy, and the amount of user supervision. In this article, we categorize the methods according to criteria that are
most relevant to the practitioner, such as computational complexity, to provide a comprehensive review that
complements and updates previous summaries [8], [9],
[11], [12]. Since existing SU methods that address spectral
variability have very heterogeneous characteristics, navigating the field can be difficult, especially when accounting for classical algorithms and recent developments. This
difficulty motivated the present review, which presents
a novel taxonomy aimed at the practitioner as well as a
comprehensive categorization of existing approaches. The
contributions and highlights of the article are described in
the following.
A NEW TAXONOMY FOR THE PRACTITIONER
We propose a taxonomy to organize the existing techniques
according to a practitioner’s point of view, based on the
amount of user supervision and the computational complexity required to solve the SU problem. The resulting taxonomy is summarized in the form of a decision tree shown
in Figure 2, which can be used to guide the choice of a family of SU algorithms. The decision tree also dictates the organization of the rest of the article. We start from whether
or not a spectral library is known a priori and proceed to
different families of SU methods based on the tradeoffs
they offer regarding the need for user supervision and computational cost. Table 1 summarizes the main characteristics of each group of techniques and points to illustrations
with high-level descriptions of the key ideas on which the
categories are based.


[Figure 1: three panels, (a) to (c), each plotting pixel spectra against wavelength (µm).]
FIGURE 1. Spectral variability is ubiquitous in hyperspectral images: the pixels in regions composed of a single material [e.g., (a) the trees, (b) roof, and (c) soil in this image] can contain very different spectral signatures. (Source: Image generated based on sample data available from the HyperCUbe software, distributed by the U.S. Army Corps of Engineers.)

[Figure 2: decision tree starting from the question "Are spectral libraries available a priori?"; boxes are color coded by the computational cost of the unmixing methods (low, medium, or high).]
FIGURE 2. The decision tree for hyperspectral unmixing, considering spectral variability. The blue boxes denote families of unmixing algorithms, while the yellow boxes represent additional techniques related to the extraction and processing of spectral libraries. MESMA: multiple-EM spectral mixture analysis.

A COMPREHENSIVE OVERVIEW
AND RECENT HIGHLIGHTS
We provide a comprehensive review of the methods developed to solve the SU problem with EM variability. We include and contextualize the classic strategies that have been reviewed before as well as numerous recent developments in the field. Thus, both classic and recent algorithms are categorized according to the proposed taxonomy, which helps to highlight the advances in each area.
EX SITU SPECTRAL LIBRARIES
A considerable number of SU methods address spectral variability by using libraries of spectra that originally had to be acquired a priori (e.g., through in situ measurements), which used to limit the applicability of these approaches. An important recent development concerns methods that can extract spectral libraries directly from the observed images or generate them using physics-based mathematical models of material spectra. This supports the widespread applicability of library-based SU techniques in situations where spectral libraries are not available or cannot be built. Such methods are reviewed in the “How to Construct Spectral Libraries” section. Moreover, library pruning techniques, which were originally devised to reduce the size of libraries and lessen the computational complexity of SU, have evolved to consider the quality of the unmixing results. Recent library pruning methods aim at removing, before unmixing, entire EM classes or individual spectral signatures that are unlikely to be present in an observed image. This reduces the ill-posedness of the SU problem and can improve abundance estimation. These techniques
are discussed in the “Library Pruning Techniques” section. Table 2 summarizes the key ideas involved in library

TABLE 1. CHARACTERISTICS OF EACH GROUP OF SU TECHNIQUES AND WHERE THEY ARE REVIEWED IN THE ARTICLE.
Groups: MESMA AND VARIANTS | FUZZY SU | SPARSE SU | MACHINE LEARNING | LOCAL SU | PARAMETRIC EM MODELS | EM-MODEL-FREE | BAYESIAN
Amount of user supervision: • | •• | ••• | •• | •
Computational cost: ••• | • | •• | • | •• | •••
Requires spectral libraries?
Estimates pixel-dependent EMs?
Section in the article: “Multiple-Endmember Spectral Mixture Analysis and Its Variants for Small Spectral Libraries” | “Sparse Unmixing” | “Machine Learning Algorithms” | “Local Unmixing Methods” | “Parametric Models” | “Endmember-Model-Free Methods” | “Bayesian Methods”
Illustration of the key ideas: Figure 10 | Figure 11 | Figure 13 | Figure 12 | Figure 14
Sparse SU: SU with sparsity constraints; EM-model-free: SU without explicit EM models.

TABLE 2. CHARACTERISTICS OF SPECTRAL LIBRARY EXTRACTION AND PRUNING TECHNIQUES AND WHERE THEY ARE REVIEWED IN THE ARTICLE.

LIBRARY EXTRACTION TECHNIQUES: IMAGE-BASED LIBRARY EXTRACTION | LIBRARY GENERATION FROM PHYSICS MODELS | SPATIAL INTERPOLATION OF EM SIGNATURES
Key idea: Extracts multiple EM signatures from the observed image and cluster them to construct a library | Create synthetic EM signatures using physicochemical mathematical models describing EM variability | Estimate EM signatures for each pixel by interpolating pure pixels at known spatial locations
Adapted to the HI?
Amount of user supervision: •• | ••• | ••
Depends on the existence of pure pixels?
Section in the article: “Image-Based Library Construction” | “Generating Spectral Libraries From Physics Models” | “Spatial Interpolation of Endmember Signatures”

LIBRARY PRUNING TECHNIQUES: LIBRARY REDUCTION | EM SELECTION | SAME-CLASS EM PRUNING
Key idea: Removes redundant signatures from an existing library to reduce the computational complexity of SU | Removes entire EM classes (e.g., water and trees) not present in the observed image from the library | Selects the signatures from each EM class most closely related to the observed image before SU
Adapted to the HI?
Amount of user supervision: • | ••
Improves the computational cost of SU?
Improves SU quality?
Section in the article: “Library Reduction Techniques” | “Endmember Selection Methods” | “Pruning Libraries Within the Same Class”

HI: hyperspectral image.

extraction and pruning methods as well as the approaches’
main characteristics.
EXPERIMENTAL ASPECTS AND TOOLBOX
The practical aspects related to the evaluation of SU methods when spectral variability is considered are also discussed in the “Experimental Evaluation” section. This
includes the generation of realistic synthetic data and a
list of existing software resources that are available to the
reader. We also present a simulation to demonstrate the
application of a few of the SU techniques reviewed in the
article, which were chosen by selecting different paths in
the proposed decision tree. This example is made publicly available in the form of a software toolbox at https://github.com/ricardoborsoi/unmixing_spectral_variability
and in [13].
ORIGINS OF SPECTRAL VARIABILITY
AND ITS EFFECTS
The variability in spectral signatures occurs mainly due to
1) atmospheric effects, 2) illumination and topographic
changes, and 3) the intrinsic variation of the spectral signatures of the materials (i.e., due to physicochemical differences). Understanding how these conditions affect the
spectral signatures of the materials and the unmixing results is important for the development of informed models
and methods to deal with EM variability. Such knowledge
can be used, for instance, to generate physics-based and
physically inspired models that include the effects of spectral signature variability. Such representations can then be
directly incorporated into the SU process (as discussed in
the “Parametric Models” section) and used to generate synthetic spectral libraries for library-based SU (as discussed
in the “Generating Spectral Libraries From Physics Models” section).
In addition to SU, spectral variability affects other hyperspectral imaging tasks, which prompted extensive investigations into its causes and how it manifests in material spectra. In this context, a recent review article by Theiler
et al. [14] provides an excellent overview of spectral variability in hyperspectral target detection. In particular, the
causes and effects of spectral variability in target detection
are reviewed, with a focus on the study of environmentally induced variability (caused by, e.g., atmospheric and
topographic changes) through an in-depth examination
of radiative transfer models. A detailed computer simulation is included to illustrate how the material spectra
are affected by changes in the parameters of the radiative
transfer model.
In the following, we review the causes and effects of
spectral variability from an SU perspective. Although we
also introduce the radiative transfer function interpretation of some atmospheric and topographic effects, we
focus our exposition on a more generic analysis of the
consequences that spectral variability has for the observed pixel spectra and on the results of SU as reported
by previous experimental works (i.e., with a stronger focus
on the results of, e.g., atmospheric compensation methods
as opposed to the interpretation of the imaging models
themselves). The interested reader can find a more comprehensive analysis from a radiative transfer function
standpoint in [14].
ATMOSPHERIC EFFECTS
One of the main sources of spectral variability is atmospheric interference when measuring ground reflectance.
Atmospheric gases (such as ozone, oxygen, methane,
carbon dioxide, and so on), aerosols, and, most prominently, water vapor absorb significant amounts of radiation, while other molecules and vapors scatter incoming
light [15]. These effects alter the radiance measured at the sensor, which can differ significantly from that corresponding to the desired ground reflectance. Atmospheric absorption by gases is also heavily wavelength dependent, whereas aerosol absorption varies smoothly across the spectrum. These effects must be compensated for to achieve an accurate characterization of surface reflectance.
Atmospheric compensation models can be roughly divided into statistical (empirical) and physics-based varieties [15]. Statistical models are based on additional information about the atmospheric influence, usually obtained
by means of reference objects and calibration panels in
the scene. This information is used to find a relationship
(e.g., linear) between the radiances observed at the sensor
and at the surface of the scene [15]. This results in gain
and offset factors for each spectral band, which are then
uniformly applied to every image pixel to compensate for
the atmospheric effects [15]. Sometimes, when a reference object is not present in the scene, naturally occurring objects can be employed, most commonly consisting
of smooth bodies of water, which exhibit low reflectance
and can be considered dark objects [5]. The downsides of
this approach are that 1) the true reflectance of a reference object must be accurately known and 2) it does not account
for the spatial variability of the distribution of gases and
aerosols. This variability can be very significant, and thus
it can introduce spatially dependent residual atmospheric
effects. A classic example of statistical methods is the empirical line method [5].
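As a rough illustration of how a statistical compensation of this kind operates, the sketch below fits one gain and one offset per band from a few reference targets and then applies them uniformly to every pixel. It is a simplified sketch under stated assumptions, not the procedure of any specific software package: the function names, the two synthetic calibration panels, and the purely linear relationship are hypothetical.

```python
import numpy as np

def empirical_line_fit(radiance_refs, reflectance_refs):
    """Per-band least-squares fit of reflectance ~ gain * radiance + offset.

    Both inputs have shape (K, L): K reference targets and L spectral bands.
    """
    x_mean = radiance_refs.mean(axis=0)
    y_mean = reflectance_refs.mean(axis=0)
    cov = ((radiance_refs - x_mean) * (reflectance_refs - y_mean)).sum(axis=0)
    var = ((radiance_refs - x_mean) ** 2).sum(axis=0)
    gain = cov / var
    offset = y_mean - gain * x_mean
    return gain, offset

def empirical_line_apply(image_radiance, gain, offset):
    """Apply the same per-band gain and offset to every pixel (bands on the last axis)."""
    return image_radiance * gain + offset

# Synthetic check with two hypothetical panels (dark and bright).
rng = np.random.default_rng(0)
L = 5
true_gain = rng.uniform(0.5, 2.0, size=L)
true_offset = rng.uniform(-0.05, 0.05, size=L)
panel_reflectance = np.array([[0.05] * L, [0.60] * L])
panel_radiance = (panel_reflectance - true_offset) / true_gain

gain, offset = empirical_line_fit(panel_radiance, panel_reflectance)
image_radiance = rng.uniform(0.0, 1.0, size=(4, 4, L))
image_reflectance = empirical_line_apply(image_radiance, gain, offset)
```

Because a single gain and offset per band are shared by all pixels, any spatial variation in the atmosphere is left uncorrected, which is the second limitation mentioned above.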
Physics-based models, on the other hand, are robust
alternatives to empirical methods and do not assume that
additional information about the scene is known. These
methods are mature and widely used, addressing the limitations of empirical approaches by employing a rigorous
model that explicitly describes the absorption and scattering effects due to atmospheric gases and aerosols [16]. Popular examples include the atmospheric removal algorithm
and the Fast Line-of-Sight Atmospheric Analysis of Spectral
Hypercubes (FLAASH) [15].
Assuming ground terrain illuminated by the sun,
the light incident on a pixel in the sensor can be roughly
characterized by four sources: 1) solar radiation directly
reflected off the ground, 2) light directly reflected off the
atmosphere into the sensor, 3) light scattered by the atmosphere and reflected off the ground, and 4) light reflected
off surrounding regions on the ground and then scattered
before reaching the sensor (constituting the adjacency effect) [17], [18]. These effects are illustrated in Figure 3. A
model for the reflectance at the sensor $y_{\text{sensor}}$ is given by [15]
$$y_{\text{sensor}} = y_{\text{atm}} T_g + \frac{y_s T_g T_\uparrow T_\downarrow + (y_{\text{avg}} - y_s)\, T_g T_\uparrow T_\downarrow\, r}{1 - y_{\text{avg}}\, s}, \tag{3}$$
where $y_s$ is the reflectance of the surface of interest, $T_g$ is the gaseous transmittance, $y_{\text{atm}}$ is the reflectance of the atmosphere, $T_\uparrow$ and $T_\downarrow$ are the upward and downward scattering transmittances, $r$ is the ratio between the diffuse and total transmittance for the ground-to-sensor path, $s$ is the spherical albedo of the atmosphere, and $y_{\text{avg}}$ is the average surface reflectance in a region around a pixel, which is used to account for scattering (adjacency) effects [15].
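Once all atmospheric quantities are fixed, (3) is linear in the surface reflectance, so it can be inverted in closed form. The sketch below implements the forward model and its algebraic inverse and checks that they round-trip. The function names and the numerical values are hypothetical, all quantities are assumed to be known per band, and the inverse requires that the ratio `r` differ from one.

```python
import numpy as np

def at_sensor_reflectance(y_s, y_avg, y_atm, T_g, T_up, T_down, r, s):
    """Forward model (3): at-sensor reflectance from surface reflectance."""
    direct = y_s * T_g * T_up * T_down
    adjacency = (y_avg - y_s) * T_g * T_up * T_down * r
    return y_atm * T_g + (direct + adjacency) / (1.0 - y_avg * s)

def surface_reflectance(y_sensor, y_avg, y_atm, T_g, T_up, T_down, r, s):
    """Algebraic inverse of (3) for y_s, valid when r != 1."""
    total_T = T_g * T_up * T_down
    # This intermediate term equals (1 - r) * y_s + r * y_avg.
    mixed = (y_sensor - y_atm * T_g) * (1.0 - y_avg * s) / total_T
    return (mixed - r * y_avg) / (1.0 - r)

# Round-trip check on made-up per-band values (three bands).
y_s = np.array([0.30, 0.12, 0.45])
params = dict(
    y_avg=np.array([0.25, 0.15, 0.40]),
    y_atm=np.array([0.05, 0.04, 0.03]),
    T_g=np.array([0.90, 0.70, 0.95]),
    T_up=0.85, T_down=0.80, r=0.2, s=0.1,
)
y_sensor = at_sensor_reflectance(y_s, **params)
y_s_recovered = surface_reflectance(y_sensor, **params)
```

In practice the difficulty lies less in this algebra than in estimating the atmospheric quantities themselves.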
Physics-based atmospheric correction algorithms then
try to obtain the ground reflectance y s from the at-sensor
reflectance y sensor by solving (3). In the overall working of
these algorithms, the first step for atmospheric compensation consists of retrieving the atmospheric parameters necessary to represent the quantities in (3), mainly consisting of
an aerosol description (visibility and type of aerosol) and the
amount of water vapor for each pixel [19]. They are typically
based on variations of the so-called three-band ratio technique, which is an important step to quantify the amount of
water vapor for each pixel. The three-band ratio technique
basically compares ratios of radiances measured near the
edges of a number of spectral wavelengths that are known
to present heavy water vapor absorption (e.g., at around
0.91, 0.94, and 1.14 μm), using this information to derive
the column water vapor information for each pixel [5], [20].
After the necessary parameters have been estimated, (3) can
be solved for the ground reflectance, and an optional postprocessing step can be employed (called spectral polishing) to
remove artifacts from the correction process [19].
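As a rough illustration of what "solving (3) for the ground reflectance" involves, the sketch below implements the forward model in (3) and its algebraic inversion for a single pixel. It assumes that all atmospheric quantities and the neighborhood average reflectance are already known and that r differs from one; the function and argument names (`T_up`, `T_down`, and so on) are illustrative choices, not part of FLAASH or any other package.

```python
import numpy as np


def at_sensor_reflectance(y_s, y_avg, y_atm, T_g, T_up, T_down, r, s):
    """Forward model (3): at-sensor reflectance from the surface reflectance."""
    T = T_g * T_up * T_down
    return y_atm * T_g + (y_s * T + (y_avg - y_s) * T * r) / (1.0 - y_avg * s)


def surface_reflectance(y_sensor, y_avg, y_atm, T_g, T_up, T_down, r, s):
    """Invert (3) for y_s, assuming every other quantity is known and r != 1."""
    T = T_g * T_up * T_down
    # Rearranging (3): (y_sensor - y_atm*T_g) * (1 - y_avg*s) / T = y_s*(1 - r) + r*y_avg
    bracket = (y_sensor - y_atm * T_g) * (1.0 - y_avg * s) / T
    return (bracket - r * y_avg) / (1.0 - r)
```

Applying `surface_reflectance` to the output of `at_sensor_reflectance` with the same parameters returns the original surface reflectance, which is a convenient sanity check. In practice, the difficulty lies in estimating the atmospheric inputs rather than in this algebra.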
Physics-based models can represent and account for
the interaction between solar radiation and the atmosphere very accurately. However, for this accuracy to translate into meaningful surface reflectance estimates, these
models require precise information about atmospheric
properties, which is very difficult to obtain in practice.
This is especially true for scattering and absorption by
aerosols, which are hard to characterize accurately due
to their spatial and temporal variability [21]. Inaccuracies
in the estimation of these parameters (which include the
atmospheric visibility, aerosol model type, and an atmospheric model) introduce errors in the retrieved surface
reflectance spectra that can be significant and spectrally
nonuniform [22].
Furthermore, unlike water vapor compensation, which
is performed on a pixel-by-pixel basis, most methods
assume that individual aerosol and gas concentrations
are uniform across the scene (resulting in a single transmittance spectrum being computed for each gas) [19],
[22]. While this is true for some gases (such as ammonium, oxygen, methane, carbon dioxide, and so on) that
are fairly constant in the atmosphere [20], it is far from
true for aerosols, which may show significant variation in
space [23], [24]. The aerosol concentration can change depending on the environment (e.g., in large cities and rural
areas), and thus it must be provided by users of existing
algorithms [20]. Moreover, standard aerosol types often
do not adequately represent the scene being processed,
leading to inaccuracies in the retrieved spectra [25]. In addition, experimental studies have found that aerosol optical thickness has significant spatial variability within a
single scene [23], [26] and is often correlated with cloud
concentrations [26].
Some works attempted to estimate the aerosol optical
thickness for smaller patches of the image individually
by using shadow detection results [27], an approach that
depends on the presence of a large number of shadowed
pixels. However, acquiring precise data for an accurate
and possibly spatially variable atmospheric correction is
generally difficult, which means that the results of common atmospheric compensation methods can be subject
to significant errors [23]. For instance, a number of studies have investigated the residual errors in surface reflectance data after the application of atmospheric compensation methods by comparing the processed results with
in situ data and by using simulations. These studies found
that, generally, there is still an appreciable error in the retrieved reflectances. As an example, errors in the retrieved
reflectance by atmospheric corrections due to the spatial
variability of the aerosol optical thickness above southern
England were found to be as high as 1.7%, with errors of
5% in the normalized difference vegetation index (NDVI)
[23]. This can be significant for practical applications, as it
corresponds to errors of up to 30% in biomass production
estimates [23], [28].

FIGURE 3. The effects of the atmosphere on the acquired hyperspectral image. The sources of radiation are represented by (a)
light directly reflected by the atmosphere to the sensor, (b) light
scattered by the atmosphere and reflected by the ground, (c) light
directly reflected by the ground, and (d) light reflected by surrounding regions on the ground and then scattered to the sensor.

Moreover, standard methods for column water vapor
retrieval lose accuracy when the aerosol optical thickness
is high, leading to errors of up to 10% if aerosol effects
are not properly compensated for [29]. Note that experimental measurements in a water quality management application found significant differences between the true
and retrieved spectral responses. Errors of up to 15% in reflectance spectra were identified, more prominently concentrated in short (<450-nm) and long (>750-nm) wavelength intervals [30]. Another study evaluated a number
of physics-based atmospheric correction methods in an
experiment for a playa and canola target and found that,
although the average relative differences were moderate,
ranging between 0.023 and 0.042, larger deviations of up
to 0.12 occurred in the near-infrared region [31]. A study
with simulated data found that incorrectly supplying input
parameters to the model used in the FLAASH algorithm
can lead to considerable errors in the retrieved reflectance,
with an absolute difference of up to 0.11, and a strong sensitivity to moisture/optical depth (visibility) errors [22].
Also, very large errors can be introduced by a bad specification of the aerosol model type, with higher errors generally
present in short wavelengths, where scattering processes
are most significant [22].
The influence that uncertainties in the column water
vapor and aerosol optical depth specification have on SU
(given their influence on the retrieved reflectances) was investigated in [24]. The performance degradation was found
to be more severe in the abundance estimation than it was
in the reflectance estimation, with degradation of up to
30% in high-scattering conditions. The results were affected
more acutely by uncertainties in the water vapor
amount than by those in the aerosol optical thickness, although the
latter showed a strong influence on the quality of the reconstructed abundance maps when the EMs were spectrally
similar. Finally, it is interesting to highlight that two characteristics were noticed in these studies. First, the errors in
the retrieved reflectances are fairly nonuniform in spectral
bands, with large spikes often concentrated near bands
where there is significant gas/water absorption [22], [24].
Second, errors due to bad aerosol specification are quite
significant in short wavelengths (450–750 nm), where they
are concentrated [22], [30]. All these effects are illustrated
in Figure 4.
ILLUMINATION AND TOPOGRAPHIC EFFECTS
Varying illumination conditions are one of the main sources of spectral variability in spectral mixture analysis [32].
Illumination changes are mainly due to two effects: varying terrain topography, which affects the angles of the incident radiation, and the occlusion of the light source by
other objects (leading to shaded areas). A number of works
handled the presence of heavily shaded areas by considering the presence of an additional EM representing shadow
[33]–[39]. Although this approach is very simple, its effectiveness is certainly limited since a single spectral signature
can be insufficient to adequately represent all pixels affected by shadow [40]. For instance, there might be many
shadow EMs since shadows in different regions of the image
are influenced by both the material that is being shaded and
the absorption properties of the material that is blocking
the light, which might lead to significantly different spectral signatures [41]. Furthermore, besides presenting a lower
reflectance amplitude, shadow EM is usually significantly
affected by nonlinear atmospheric scattering and multipath
effects because these areas are illuminated by a large proportion of diffuse irradiation scattered by the atmosphere
(i.e., skylight) and nearby objects. This implies that shadow
EM is sensitive to the state of the atmosphere and can vary
significantly in space, depending on the amount of scattered
light being reflected from the sky at each position [42], [43].
When illumination predominantly comes from scattered
radiation, the spectrum not only presents a lower amplitude
but is skewed to short (e.g., blue) wavelengths [44], [45]. This
means that the signal amplitudes in the shorter (blue) wavelengths
are considerably larger than in the rest of the spectra
[45]. Furthermore, since the shadow spectral signature is a
function of diffuse illumination, it depends on the neighboring
image area (where the skylight is scattered) [45] and
the cloud cover. Moreover, variations of ground reflectance
may not be easily discernible from atmospheric effects since
both phenomena are jointly observed and not easily separable
[45]. These facts introduce a strong dependence between
the shadow signature and the spatial position, and they go
against the common notion that shadow EMs can be adequately
represented by scaled versions of true EMs [5] (that
is true only for small illumination variations). This makes
the detection, correction, and quantification of shadow a
challenging task because the physical-based inversion of
these atmospheric effects turns out to be a hard problem.
However, this task is still necessary since linear SU with a
single dark EM usually does not successfully quantify the
presence of shadow in the scene [45].

FIGURE 4. Variability caused by atmospheric effects. [Plot of reflectance versus wavelength comparing the true and atmospherically corrected spectra, annotated with significant errors in short wavelengths due to aerosols, large errors concentrated near water absorption bands, and smaller errors in the remaining spectra.]

Although the presence of shadows is common in hyperspectral
images, a more prominent source of variability
comes from the differing topography of the scene, which
introduces complex fluctuations of the relative angles between
the incoming light source and the sensor for each
pixel of the scene. Topographic variations have been shown
to significantly affect the spectral reflectance values of soil
and green vegetation [46] as well as rocks in lithologic
mapping [47], expanding EM clusters and causing overlap
between classes, hindering the EM identification and unmixing
processes. Considering that only the amplitude of
the incident radiation changes through the scene, the reflectance
spectra of the observed pixels in the LMM become
scaled by a constant positive factor.
This model agrees with the observation that most of the variability in
a hyperspectral image can be represented by a constant scaling of reference EMs [5]. As a simple empirical
verification, we plot a random subset of 30 pixels of red roofs from the
Pavia image; the pixels are pure and
mostly affected by illumination effects. The results, which are depicted in Figure 5, indicate that the pixels
differ mostly by a scaling factor.

FIGURE 5. Thirty pixel instances classified as a red roof in the Pavia
image (in gray), which are primarily affected by illumination, and
their spectral average (in red). The average Pearson correlation
coefficient between each signature and the scaled version of the
mean spectra that is closest to it is about 0.993, indicating good
agreement between illumination-based spectral variability and
the constant scaling model. [Axes: reflectance versus wavelength (µm), 0.45 to 0.8 µm.]

Although a constant scaling model is intuitive and simple, a more rigorous conclusion can be achieved by
analyzing the dependence of radiative
transfer models on the topography of
the scene. To this end, one could resort to the model developed
by Hapke [48], [49], which describes the bidirectional
reflectance (i.e., the reflectance as a function of the incidence
angles of the light source and observer/viewer, as in Figure 6)
as a function of the single scattering albedo and the photometric
parameters of the material [50]. Hapke’s model suggests a
more complex relationship between the EM signatures and
the topography. In this context, the mixture of materials is assumed
to happen at the macroscopic level, enabling the consideration
of the LMM in the albedo domain, where Hapke’s
model acts separately on each EM.

FIGURE 6. Hapke’s model relates the reflectance to the incidence angles of the light source
and the observer/viewer, given a material’s single scattering albedo and photometric parameters [50].
[Diagram: light source, incoming angle, outgoing angle, viewer, and terrain.]

Besides the dependency on the spectral signature
with photometric parameters, which is discussed in the
following section, the dependence on the single scattering
albedo indicates that changes in incident angles can
affect each material in a pixel differently than the others
since the behavior of the reflectance as a function of the
angle is different for each material. This indicates that
each EM/material in a pixel can be variously affected by
topographic effects. Furthermore, the nontrivial relationship between geometry and spectral signatures leads
to a more complex variation than single scaling of each
EM for high-albedo materials [51], [52]. Even small topographic variations can significantly affect the ground
reflectance. For instance, in [53], experimental studies
found that even small slopes (of less than 10°) originating from irregularities in the tree canopy can lead to
appreciable (enough to influence the results of subsequent tasks) changes in the measured reflectance of vegetation spectra.

INTRINSIC SPECTRAL VARIABILITY
Another important source of spectral variability is the intrinsic variation pertaining to the definition of a material, which
is also called intrinsic variability. The characterization of this
type of variability has been prominently studied in the area
of vegetation monitoring, where it poses a huge challenge
to identifying tree species from spectral measurements [54],
[55] and to the characterization of soil and mineral spectra.
Vegetation’s spectral signature can change due to many factors, including microclimates, soil characteristics, precipitation, the presence of heavy metals, drought, foliage age, and
colonization by leaf pathogens [54]. The spectral signature
of soil is also heavily affected by variations in its composition and moisture content [56]. Furthermore, intrinsic
spectral variability is common in mineral spectra, due to
differences in the grain size distribution and the presence of
variable amounts of impurities [57], [58]. Moreover, it also
depends on what level of detail is adopted to represent a
given material (e.g., a tree EM may be split into trunk and
leaf EMs), which is generally application dependent [59].
Although there is a large impact on EM spectral signatures,
the dependence of intrinsic spectral variability on physicochemical parameters, which are usually unknown, makes
this area very hard to tackle.

One characteristic consistently observed in experimental studies is the smoothness of the observed spectra (i.e., the reflectance varies slowly between spectral
bands). This behavior can be taken into account when
designing SU algorithms. Moreover, unlike spectral
changes caused by illumination and topography effects,
intrinsic spectral variability frequently presents a considerable dependence of the variability amplitude on
the spectral wavelength. For instance, the signatures of
different instances of minerals in the United States Geological Survey (USGS) library, presented in Figure 7, show
complex dependence between the reflectance variation
and the wavelength. The samples from alunite and muscovite have a variability that is far from uniform across
the spectrum. Moreover, different instances from pyrite display complex variation, which is not consistent
across all samples, occurring independently in different
regions of the spectra. This behavior has been verified in
similar experimental studies, and it poses a great challenge for differentiating mineral classes based on their
spectral signatures [60].
These characteristics are even more prominent in the
spectral variation of vegetation reflectance, which shows
significant dependence on the wavelength and behaves
very differently in visible, near-infrared, and short-wave-infrared ranges [61]. This means that a simple scaling of
a reference spectral signature is usually not sufficient to
account for variations within tree species [54]. Extensive
experimental studies support this claim. In [54], the author found that the variation of spectral reflectance in the
visible and near-infrared regions can occur independently
when measuring the tropical forest canopy in Brazil. Similar inhomogeneity in spectral variation was also observed
in other studies with tropical tree species [62] and in many
distinctive environments, including conifer [63] and boreal tree forests [64]. Similar nonuniform variation trends
are also consistently observed in seasonal changes, as indicated by many experiments, including in salt marshes [65],
semiarid environments [66], and boreal tree species [67].
Furthermore, nonuniform spectral variations have also
been observed in samples from mineral, soil, and rock
spectra [60]. Numerous works model the spectral signature of materials as a function of photometric and chemical properties of the medium, based on radiative transfer
modeling and empirical approaches. A well-known example is Hapke’s model, which describes the spectra of a
surface composed of particles as a function of parameters,
such as surface roughness and density and the size of the
particles [48], [49].

FIGURE 7. Spectra variation samples from the USGS library. (a) Alunite. (b) Muscovite. (c) Pyrite. [Axes: reflectance versus wavelength (µm).]

Another prominent line of work models the spectral
characteristics of vegetation and soil samples as a function
of biophysical parameters [68]. Models of this kind have
been applied for the estimation of leaf biochemistry from
the observed spectra. An important example consists of the
characterization of leaf reflectance spectra as a function of
leaf biophysical parameters [68], for which a wide variety of
models has been used, ranging from a simple description of
leaf scattering and absorption properties to complex representations that include a detailed description of plant cells’
shape, size, position, and biochemical content [68]. Some
instances of those models include the characterization of
the spectra of broadleaf vegetation as a function of leaf mesophyll structure, pigment and water concentration [69],
angular profiles [70], and, in pine needles, cellulose, lignin,
and water content [71]. Other works model soil reflectance
spectra as functions of moisture conditions [72]–[74] and
snow albedo as a function of snow grain sizes and liquid
equivalent depth [75].
As an example, we generated spectral signatures of vegetation spectra using the PROSPECT-D model [76] as a function
of varying degrees of chlorophyll content, equivalent water
thickness, and dry matter content. The resulting signatures,
which appear in Figure 8, show that intrinsic spectral variability can present complex patterns and nonuniformity, as
it is often concentrated in specific regions of the spectrum.
Through their analytical characterization of EM spectra,
these kinds of models confine spectral variability to a low-dimensional manifold. This constitutes important information that can be leveraged to alleviate/reduce the severe
ill-posedness of unsupervised SU problems that account for
spectral variability.

FIGURE 8. Reflectance spectra for vegetation generated with the PROSPECT-D model [76] for varying degrees of (a) chlorophyll content,
(b) equivalent water thickness, and (c) dry matter content. [Axes: reflectance versus wavelength (µm).]

Another important characteristic is that EMs affected
by intrinsic spectral variability usually display significant
spatial correlation [77]. For instance, many experimental geostatistical works evaluating the spatial distribution and variability of the physicochemical properties
of the soil (e.g., the sand and clay concentration, electrical conductivity, acidity, compaction, and available elements, such as nitrogen, phosphorus, and potassium)
have reported significant spatial correlation/smoothness
in these properties. Reports include measurements performed in Rhodes grass crop terrain [78], calcareous soils
[79], rice fields [80], and tobacco plantations [81]. Besides
directly impacting the spectral signature of the soil, these
characteristics have been widely acknowledged to directly influence vegetation growth (e.g., they show a strong
correlation with crop productivity [78]) and hence their
spectral signature [61], [78]. Therefore, spatial correlation
in the variability is expected both in soil/terrain and in
vegetation signatures. A similar behavior has also been
observed in mineral spectra in the presence of spatially
correlated grain size distributions and impurity concentrations [57], [58]. This implies that the variability tends
to be small in modest spatial neighborhoods even though
it may be large across a sizeable scene. This fact can be
leveraged to design SU algorithms since it supplies information that can be used to reduce the severe ill-posedness
of the problem.
To illustrate this effect, we performed an experiment by
measuring the spectral variability in a homogeneous region
(composed mostly by pure pixels) of soil in the Samson image, presented in Figure 9(a). We then computed the Euclidean distance and spectral angle between each soil pixel
and the average spectra of all pixels in the subregion, which
was used as a reference material signature. The results are
depicted in Figure 9(b) and (c), where it can be seen that the
variability shows strong spatial correlation, as observed in
the Euclidean distance and the spectral angle.

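The two variability measures used in the soil experiment (the Euclidean distance and the spectral angle between each pixel and the regional mean spectrum) can be computed as in the following sketch. It assumes the subregion is available as a NumPy array of shape (rows, cols, bands) with no all-zero spectra; the function name is hypothetical, and loading the Samson image is left out.

```python
import numpy as np


def variability_maps(region):
    """Per-pixel Euclidean distance and spectral angle to the region's mean spectrum.

    region: array of shape (rows, cols, bands) holding reflectance spectra.
    Returns two (rows, cols) maps: distances and angles (in radians).
    """
    rows, cols, bands = region.shape
    pixels = region.reshape(-1, bands)
    mean = pixels.mean(axis=0)  # reference material signature
    dist = np.linalg.norm(pixels - mean, axis=1)
    cos = pixels @ mean / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(mean))
    angle = np.arccos(np.clip(cos, -1.0, 1.0))
    return dist.reshape(rows, cols), angle.reshape(rows, cols)
```

The distance map reacts to both amplitude and shape changes, whereas the angle map is insensitive to a pure scaling of the spectrum, so comparing the two gives a rough indication of how much of the variability is scaling-like.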
UNMIXING METHODS THAT USE SPECTRAL LIBRARIES
METHODS THAT USE SPECTRAL LIBRARIES
Here, the following characteristics are present:
◗ The approaches are usually conceptually simple and easy to interpret.
◗ The quality of the SU results strongly depends on the spectral library.

One of the main approaches to addressing spectral variability in SU is to consider large libraries of spectra acquired
a priori. These libraries contain different instances of each
material in a scene, and the unmixing problem becomes generally equivalent to finding which signatures can best represent each pixel in the scene. Different algorithms have been
proposed for this task, which we review. The spectral libraries used by these methods are sometimes called bundles, and,
in principle, they should account for all possible variations of
each material. Mathematically, they are represented as
$$\mathcal{M}_p = \{\tilde{\boldsymbol{m}}_{p,1}, \ldots, \tilde{\boldsymbol{m}}_{p,M_p}\}, \quad p = 1, \ldots, P, \tag{4}$$
where $\mathcal{M}_p$ is a library/bundle containing $M_p$ reference
spectral signatures $\tilde{\boldsymbol{m}}_{p,i} \in \mathbb{R}^L$ of the $p$th material and $P$ is the
number of materials in the scene. The spectral signature of
each material in the $n$th pixel $\boldsymbol{y}_n$ of the hyperspectral image is then represented as an unknown element $\boldsymbol{m}_{n,p} \in \mathcal{M}_p$
belonging to this bundle.
Those sets can be readily used to constrain the EM matrices of the LMM for the $N$ pixels to belong to a new set
$\boldsymbol{M}_n \in \mathcal{M}$, with $n = 1, \ldots, N$, where
$$\mathcal{M} = \{[\boldsymbol{m}_1, \ldots, \boldsymbol{m}_P],\ \boldsymbol{m}_p \in \mathcal{M}_p,\ p = 1, \ldots, P\} \tag{5}$$
is the set of all possible EM matrices, with $\prod_{p=1}^{P} M_p$ elements.
This definition assumes that only one signature from each
library, $\mathcal{M}_p$, $p = 1, \ldots, P$, is present in each pixel. However,
other representations of the EM signatures as, e.g., sparse
and convex combinations of the elements in $\mathcal{M}_p$, can also
be considered to obtain more flexibility (see, e.g., [82]–[84]).
Such strategies are discussed in the “Multiple-Endmember
Spectral Mixture Analysis and Its Variants for Small Spectral
Libraries” and “Sparse Unmixing” sections.
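The set of candidate EM matrices in (5) is simply the Cartesian product of the bundles, which the sketch below enumerates. It assumes each bundle is stored as an array with one signature per column (shape L by M_p); the function names are illustrative only.

```python
import itertools
import math

import numpy as np


def em_matrices(bundles):
    """Yield every EM matrix [m_1, ..., m_P] with m_p taken from bundle p."""
    # Iterating over B.T visits the columns (signatures) of each bundle B.
    for combo in itertools.product(*(B.T for B in bundles)):
        yield np.column_stack(combo)


def num_em_matrices(bundles):
    """Number of EM matrices in the set: the product of the bundle sizes."""
    return math.prod(B.shape[1] for B in bundles)
```

Even modest libraries grow quickly under this construction: three bundles with 20 signatures each already give 8,000 candidate EM matrices per pixel.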
Different methods have been proposed to solve the SU
problem by using spectral libraries. These can be roughly
divided into four groups of formulations: multiple-EM
spectral mixture analysis (MESMA), sparse SU, machine
learning, and spectral transformations. The MESMA algorithm and its variants formulate SU as a computationally
demanding optimization problem and achieve good quality. Sparse SU formulations use mathematical relaxations
to the MESMA problem that are computationally easier to
solve. Machine learning algorithms provide more flexible
ways to perform SU but also incur high computational complexity. Spectral transformations are empirically oriented
techniques that can be employed to improve methods from
the first three categories.
Although these families use spectral libraries to address
spectral variability in SU, the reasoning underlying each of
them can be quite different, leading to varying degrees of
required user supervision, computational complexity, and
abundance estimation quality, as illustrated in Figure 2.
Moreover, additional prior knowledge can be considered
in different ways, including, e.g., the design of principled
neural network architectures and the manual specification
of the robustness of particular spectral bands to variability.
We review each family of approaches in the following.
MULTIPLE-ENDMEMBER SPECTRAL MIXTURE ANALYSIS AND ITS VARIANTS FOR SMALL SPECTRAL LIBRARIES
MULTIPLE-ENDMEMBER SPECTRAL MIXTURE ANALYSIS AND ITS VARIANTS
This algorithm and its modifications have the following qualities:
◗ They generally provide good SU results.
◗ They are easy to set up, with few or no parameters.
◗ They have very high computational complexity.
◗ Their results depend strongly on the quality of the available spectral library.

FIGURE 9. The spatial behavior of EM variability. (a) The soil subregion of the Samson image (highlighted by a red square). (b) The Euclidean distance and (c) the spectral angle between each pixel and the average spectra of the region.

The MESMA algorithm [37] and its variants (sometimes
also referred to as iterative mixture analysis cycles) are among
the most widely used for this task. These methods (Figure 10) enable the EM signatures to vary on a per-pixel basis while following the model in (4). The unmixing problem
is solved by searching for the EM and abundance combinations that result in the smallest reconstruction error (RE)
for each observed pixel, i.e.,
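$$\underset{\boldsymbol{a}_n,\, \boldsymbol{M}_n}{\arg\min}\ \|\boldsymbol{y}_n - \boldsymbol{M}_n \boldsymbol{a}_n\| \quad \text{subject to } \boldsymbol{M}_n \in \mathcal{M},\ \boldsymbol{a}_n \geq \boldsymbol{0},\ \boldsymbol{1}^\top \boldsymbol{a}_n = 1. \tag{6}$$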

The EM matrices M n constructed by taking spectra from
the bundles are sometimes called EM models.
The MESMA algorithm has been employed in a wide
variety of situations, including natural, urban, and extraterrestrial environments [9, p. 1,607] as well as single and
multidate scenarios [85]. However, even though MESMA
is very amenable to parallelization [86], it consists of a
combinatorial optimization problem whose associated
computational cost can become very high. More specifically, its computational cost scales as the product of the
sizes of the individual libraries, as it consists of solving a
fully constrained least-squares (FCLS) problem $\prod_{p=1}^{P} M_p$
times [87]. This can make the unmixing complexity unrealistic for large library sizes. Furthermore, the problem
(6) can become ill-posed when there are many EMs in
the bundles since different material combinations can
lead to very similar REs. To circumvent these limitations,
several modifications to the original MESMA algorithm
have been proposed.
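The exhaustive search in (6) can be sketched as follows for very small problems. This is a minimal illustration, not the implementation used in [37] or [87]: the FCLS step is solved here by enumerating supports and solving the sum-to-one KKT system on each, which is exact only when the candidate sub-matrices have full column rank and is exponential in P. All function names are hypothetical, and each bundle is assumed to be an array with one signature per column.

```python
import itertools

import numpy as np


def fcls(M, y):
    """Solve min ||y - M a|| s.t. a >= 0, sum(a) = 1 by enumerating supports.

    Exponential in the number of columns, so only suitable for small P.
    """
    P = M.shape[1]
    best_a, best_err = None, np.inf
    for k in range(1, P + 1):
        for support in itertools.combinations(range(P), k):
            Ms = M[:, list(support)]
            # KKT system of least squares with the sum-to-one constraint.
            kkt = np.block([[Ms.T @ Ms, np.ones((k, 1))],
                            [np.ones((1, k)), np.zeros((1, 1))]])
            rhs = np.append(Ms.T @ y, 1.0)
            sol = np.linalg.lstsq(kkt, rhs, rcond=None)[0][:k]
            if np.any(sol < -1e-12) or abs(sol.sum() - 1.0) > 1e-8:
                continue  # violates nonnegativity or sum-to-one
            a = np.zeros(P)
            a[list(support)] = np.clip(sol, 0.0, None)
            err = np.linalg.norm(y - M @ a)
            if err < best_err:
                best_a, best_err = a, err
    return best_a, best_err


def mesma(bundles, y):
    """Brute-force search over all EM models; one FCLS problem per model."""
    best = (None, None, np.inf)
    sizes = [range(B.shape[1]) for B in bundles]
    for combo in itertools.product(*sizes):
        M = np.column_stack([B[:, i] for B, i in zip(bundles, combo)])
        a, err = fcls(M, y)
        if err < best[2]:
            best = (combo, a, err)
    return best  # (selected indices, abundances, reconstruction error)
```

The outer loop in `mesma` runs once per element of the set in (5), which makes the product-of-library-sizes cost explicit. Because each pixel and each EM model is processed independently, the loop is also straightforward to parallelize.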
Many variants of MESMA aim to provide computationally efficient approximate solutions to (6). The simplifications consist of stopping the exhaustive search
optimization procedure (6) early by selecting the first EM
model that presents an RE that is below a threshold and
well distributed across spectral bands [37]. Another proposed approach is to solve (6) approximately by performing unconstrained least squares with every possible EM
model and then selecting the solution that yields positive
abundances and the smallest RE [88]. Although these simple modifications successfully reduce the computational
complexity of MESMA, the approximations involved can
also negatively impact the abundance reconstruction results [89], which imposes practical limitations on the selection of the thresholds and tolerances. This motivated
the consideration of more elaborate strategies to provide
better complexity reductions without impacting the unmixing performance.
An alternative approach to MESMA attempted to
lessen the computational complexity by solving an angle
minimization problem with respect to each library [87],
[90]. Although not guaranteed to converge to the optimal
solution of (6), this strategy performed similarly to MESMA on practical experiments, and it scales linearly with
library sizes, leading to computational improvements for
large numbers of signatures $M_p$, $p = 1, \ldots, P$, in the EM
bundles. Another work considered a mixed integer linear
program reformulation of the MESMA problem. This approach enables a more efficient computation of an exact
solution to (6) for small- to medium-scale problems [91].
A simple technique that is largely employed to reduce
the computational complexity of MESMA is to perform a
careful pruning of the spectral libraries $\mathcal{M}_p$, $p = 1, \ldots, P$.
This process attempts to remove redundant and irrelevant spectra from the libraries before unmixing. These
strategies are described in detail in the “Library Reduction Techniques” section.

FIGURE 10. The MESMA, fuzzy, and sparse SU techniques. MESMA and sparse SU are the main methods based on spectral libraries. The basic
principle behind MESMA is to iteratively search for the combination of EM signatures in the library that, among all possibilities, enables
the closest reconstruction of each observed pixel under the LMM. Sparse SU, on the other hand, performs EM selection and abundance estimation in a single optimization problem by using sparsity and structuring constraints and penalties, which facilitates faster processing times.
[Diagram: the observed image and the spectral library feed either MESMA (perform SU with every possible combination of EMs, then select the abundances and EMs with the smallest reconstruction error) or sparse unmixing (optimize a cost function under sparsity, physical, and structuring constraints), producing the estimated abundances and endmembers.]

Besides reducing complexity, other approaches modify
MESMA with the purpose of improving its accuracy. For
instance, an early practice attempted to alleviate the ill-posedness of MESMA by prioritizing models with a smaller number of EMs for otherwise comparable REs [92], [93].
This avoids increasing the complexity of the model for
marginal gains. When consideration of material nuances
is important, it may be necessary to allow multiple signatures of the same broader EM class in the model. This was
the case in [94], where effects such as different vegetation
species in a single pixel were of interest. Spatial information has also been considered with MESMA by using segmentation algorithms to divide the image into different
homogeneous objects, which are then unmixed individually by using a library that is also constructed from object-based spectra [95], [96]. A different formulation attempted
to increase the flexibility of MESMA by enabling the EMs
of each pixel to be represented as a sparse, nonnegative
combination of the signatures contained in the library for
their respective material class [82], [83]. Under this model, SU was then formulated as a nonconvex optimization
problem with different sparsity constraints, including both
$L_{1/2}$ [82] and $L_0$ norm-based penalties [83]. This problem
was solved through a multiplicative update rule in [82] and
by using the proximal alternating linearized minimization
method in [83].
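
The baseline MESMA search that these variants modify can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: it assumes only two material classes, so that the fully constrained abundances have a closed form, and the function names and toy signatures are hypothetical.

```python
from itertools import product

def unmix_two(y, m1, m2):
    """Constrained least squares for two EMs: a1 in [0, 1], a2 = 1 - a1."""
    d = [u - v for u, v in zip(m1, m2)]
    denom = sum(x * x for x in d)  # assumes m1 and m2 are distinct
    a1 = sum((yi - vi) * di for yi, vi, di in zip(y, m2, d)) / denom
    a1 = min(1.0, max(0.0, a1))  # clipping solves the 1-D constrained problem
    recon = [a1 * u + (1.0 - a1) * v for u, v in zip(m1, m2)]
    re = sum((yi - ri) ** 2 for yi, ri in zip(y, recon)) ** 0.5
    return (a1, 1.0 - a1), re

def mesma(y, library):
    """Try one signature per class; keep the model with the smallest RE."""
    best = None
    for m1, m2 in product(library[0], library[1]):
        abund, re = unmix_two(y, m1, m2)
        if best is None or re < best[2]:
            best = ((m1, m2), abund, re)
    return best

# Toy structured library: two vegetation and two soil signatures (4 bands).
library = [
    [[0.05, 0.08, 0.45, 0.30], [0.04, 0.10, 0.55, 0.35]],
    [[0.20, 0.25, 0.30, 0.35], [0.30, 0.35, 0.40, 0.45]],
]
# Synthetic pixel: 30% of the second vegetation and 70% of the first soil.
y = [0.3 * v + 0.7 * s for v, s in zip(library[0][1], library[1][0])]
model, abundances, re = mesma(y, library)
```

The number of candidate models grows as the product of the per-class library sizes, which is why the complexity-reduction and library-pruning strategies discussed above matter in practice.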
Another set of approaches related to MESMA, referred to as fuzzy unmixing, consider a measure of uncertainty or indeterminacy in the estimated abundances by
computing quantities such as average, maximum, and
minimum cover fractions. One of the first techniques
of this kind employed linear programming methods to
determine maximum and minimum fractional abundances for each material by using spectral libraries extracted from the observed image [97]. Another approach
attempted to determine the abundance indeterminacy
(i.e., its fuzzy membership for each value of the abundance fractions) by evaluating how close synthetically
mixed spectra that had all the possible EM combinations
were to the observed pixel spectra $\mathbf{y}_n$ [98]. This procedure, however, requires the discretization of the abundance values, and its computational complexity does
not scale well with the number P of EM classes. Other
approaches performed linear SU with a large number
of EM models selected at random from the library. Afterward, measures of uncertainty in the estimated fractional abundances, such as maximum, minimum, and
average cover fractions, were computed from the results,
providing a more detailed characterization of the abundances [99]–[101]. A similar work proposed to compute
the final abundance fractions as a weighted sum of the
abundances obtained from SU with each possible combination of signatures drawn from the library $\mathcal{M}$ [92].
The weights corresponded to the probability of each EM
model being actually present in the scene, which was
supposed to be known a priori.
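
The summaries described above can be illustrated with a short sketch. It assumes that abundance vectors have already been estimated for a set of EM models (for instance, models drawn at random from the library) and that optional a priori model probabilities are available. The function name and the numbers are hypothetical.

```python
def summarize_abundances(estimates, probabilities=None):
    """Per-class min, max, mean, and probability-weighted abundance.

    estimates: list of abundance vectors, one per EM model that was tried.
    probabilities: optional a priori probability of each EM model.
    """
    n_models = len(estimates)
    n_classes = len(estimates[0])
    if probabilities is None:
        probabilities = [1.0 / n_models] * n_models
    total = sum(probabilities)
    summary = []
    for p in range(n_classes):
        values = [a[p] for a in estimates]
        weighted = sum(w * v for w, v in zip(probabilities, values)) / total
        summary.append({
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / n_models,
            "weighted": weighted,
        })
    return summary

# Abundances of two classes obtained with three different EM models.
estimates = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
summary = summarize_abundances(estimates, probabilities=[0.5, 0.25, 0.25])
```

The gap between the minimum and maximum fractions gives a simple indication of how strongly the result depends on the choice of EM model.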

SPARSE UNMIXING
Sparse unmixing has the following characteristics:
◗◗ Generally, it is very computationally efficient, especially compared to
multiple-EM spectral mixture analysis (MESMA).
◗◗ The SU results might not be as accurate as those from MESMA.
◗◗ The process can be harder to interpret (e.g., it might select multiple
signatures of the same material to represent a given pixel).
◗◗ The SU results are sensitive to the selection of the regularization
coefficients.

An alternative approach to SU with spectral libraries is
to formulate the SU as a sparse regression problem, where
we want to select a small number of spectral signatures
from the library that can best represent each observed pixel
according to the LMM. Most sparse unmixing methods are
based on an unstructured library, which can be derived
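from (4) by concatenating all the signatures in a single matrix $\mathbf{M}_{\mathrm{Lib}}$, defined as
$$\mathbf{M}_{\mathrm{Lib}} = [\tilde{\mathbf{m}}_{1,1}, \ldots, \tilde{\mathbf{m}}_{p,k}, \tilde{\mathbf{m}}_{p,k+1}, \ldots, \tilde{\mathbf{m}}_{P,M_P}]. \tag{7}$$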
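Using the spectral library defined in (7), the sparse unmixing problem can be formulated as the optimization problem [102], [103]
$$\underset{\mathbf{a}_n \geq \mathbf{0}}{\arg\min} \; \|\mathbf{y}_n - \mathbf{M}_{\mathrm{Lib}} \mathbf{a}_n\|^2 \quad \text{subject to} \quad \|\mathbf{a}_n\|_0 \leq P, \;\; \mathbf{1}^\top \mathbf{a}_n = 1, \tag{8}$$
where $\|\cdot\|_0$ is the $L_0$ pseudonorm, which counts the number of nonzero elements in a vector. Different strategies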
have been proposed to solve the sparse SU problem by using the L 0 pseudonorm, employing, for instance, greedy
(e.g., matching pursuit and forward–backward) algorithms
[104], [105], Lagrangian function (regularized) formulations [106], and multiobjective optimization procedures
that jointly consider the RE and the sparsity of the solution
[107]–[109]. Note that (8) would be equivalent to MESMA
if we added an additional linear structuring constraint to
enforce the occurrence of only a single nonzero abundance
per material class [91].
The optimization problem (8) is, however, nonconvex and generally NP-hard to solve. It is therefore common to relax the $L_0$ pseudonorm constraint into its convex surrogate, leading to the following optimization problem [102]:
$$\underset{\mathbf{a}_n \geq \mathbf{0}}{\arg\min} \; \|\mathbf{y}_n - \mathbf{M}_{\mathrm{Lib}} \mathbf{a}_n\|^2 + \lambda \|\mathbf{a}_n\|_1, \tag{9}$$
where $\|\cdot\|_1$ is the $L_1$ norm and the parameter $\lambda$ controls the
level of sparsity of the estimated abundances. The sum-to-one constraint is not used in (9), due to its incompatibility with the $L_1$ norm [102]: for nonnegative abundances, the sum-to-one constraint fixes the $L_1$ norm at one, so the penalty would have no effect. Although problem (9) is nonsmooth, it is convex and can be solved very efficiently. It also performs well in experiments. This motivated a great deal of interest in sparse unmixing methods, resulting in a number of works proposing improvements, such as the use of alternative sparsity-promoting penalties [110], [111] and different means of spatial regularization [112], [113]. Sparse unmixing methods would merit a more comprehensive review, which is beyond the scope of this article. Thus, in the following, we restrict ourselves to modifications of the sparse SU framework specifically aimed at dealing with spectral variability and structured libraries.
In [114], $L_{2,1}$-norm-based group sparsity constraints were used to favor the selection of abundance vectors containing many entire material classes with zero proportions. A later formulation considered a fractional group $(p, q)$-norm sparsity constraint as a generalization of the approaches based on the $L_{2,1}$ norm [84]. The $(p, q)$-norm penalty provides better control of the sparsity within each group of variables and allows the sum-to-one constraint to be added. However, this comes at the expense of making the optimization problem nonconvex. Another sparse SU formulation [115] proposed to explicitly represent mismatches between the library spectra and the hyperspectral image caused by different acquisition conditions. In this case, the spectral signatures of the library are also estimated in the SU process. However, they are constrained to be within a given Euclidean distance of a corresponding element of the library known a priori. This enables the estimated signatures to vary arbitrarily within Euclidean balls centered at the library elements to compensate for spectral mismatches.
A different approach [116] was to modify the LMM for unmixing mineral spectra in mining applications by including an additional term representing the mixture of the “background” spectrum of the EMs. This background spectrum was defined as the low-frequency part of the spectral signatures and estimated a priori from the library as a parametric function of smooth splines. The performance of an $L_1$ norm-based sparse SU framework under this model was reported to be similar to MESMA, albeit at a much smaller computational cost.

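
Returning to the relaxed problem (9), one simple way to solve it is a proximal gradient (ISTA-type) iteration in which the nonnegativity constraint and the $L_1$ penalty are handled by a single shifted clipping step. The sketch below is an illustration only and is not the solver used in the cited works. The step size is a conservative choice based on the Frobenius norm of the library, and the function name, toy library, and parameter values are assumptions.

```python
def sparse_unmix(y, lib, lam=1e-3, n_iter=5000):
    """Minimize ||y - M a||^2 + lam * sum(a) subject to a >= 0.

    lib: unstructured library, a list of signatures (each a list over bands).
    """
    n_sig, n_bands = len(lib), len(y)
    # The gradient Lipschitz constant 2 * ||M||_2^2 is at most 2 * ||M||_F^2.
    step = 1.0 / (2.0 * sum(v * v for m in lib for v in m))
    a = [0.0] * n_sig
    for _ in range(n_iter):
        resid = [sum(a[k] * lib[k][l] for k in range(n_sig)) - y[l]
                 for l in range(n_bands)]
        grad = [2.0 * sum(lib[k][l] * resid[l] for l in range(n_bands))
                for k in range(n_sig)]
        # For a >= 0 the L1 norm is linear, so the proximal step is a
        # gradient step shifted by lam and clipped at zero.
        a = [max(0.0, a[k] - step * (grad[k] + lam)) for k in range(n_sig)]
    return a

lib = [
    [0.05, 0.08, 0.45, 0.30],
    [0.04, 0.10, 0.55, 0.35],
    [0.20, 0.25, 0.30, 0.35],
    [0.30, 0.35, 0.40, 0.45],
]
y = [0.3 * v + 0.7 * s for v, s in zip(lib[1], lib[2])]
a = sparse_unmix(y, lib)
recon = [sum(a[k] * lib[k][l] for k in range(len(lib))) for l in range(len(y))]
residual = sum((yi - ri) ** 2 for yi, ri in zip(y, recon)) ** 0.5
```

Because library signatures are often highly correlated, the recovered support is not guaranteed to match the signatures used to generate the pixel, and the result depends on the regularization coefficient, as noted in the list of characteristics above.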

MACHINE LEARNING ALGORITHMS
The following is true of machine learning algorithms:
◗◗ They are very flexible approaches that, in principle, can deal with any
effect that is represented in the training data.
◗◗ Most methods have significant computational complexity or do not
have a clear physical motivation.
◗◗ The SU quality depends on the representativeness of the training data
(which are usually generated using a spectral library).
◗◗ They generally do not return EM spectra for each pixel.

[Figure 11 diagram labels: Spectral Library; Synthetic Abundances; Mixed Pixels; Training Data; Supervised Learning; Neural Network (Hidden Layers); Observed Image; Estimated Abundances and EMs.]

FIGURE 11. A description of machine learning-based SU techniques. The flexibility and representation power of machine learning algorithms can be exploited to address spectral variability by formulating SU as a supervised learning problem. One simple approach is to learn a mapping between the mixed pixels and the abundance and EMs based on training data generated synthetically via a spectral library. However, the incorporation of expert knowledge about the SU problem in the design of the machine learning algorithm is important to obtain better performance and effectively address spectral variability.

Some works propose to address spectral variability using machine learning methods (Figure 11) by formulating SU as a supervised regression problem. The objective is to learn transformations that map the observed (mixed) pixel to the abundance fractions [117]–[121] using a supervised training procedure. Mixed pixels with known proportions are employed as training data for algorithms, such as neural networks, random forests, and support vector machines (SVMs). These techniques can be straightforwardly adapted to address spectral variability by considering multiple spectral signatures for each EM when generating the synthetic training data set. This has been done by directly applying regression methods [122] and by converting SU into a classification problem by quantizing the solution space of abundance
values and using a one-against-all strategy [123]. Another
work modified the SVM cost function to directly minimize
the unmixing RE during the training process [124].
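
The generation of a synthetic training set from a structured library can be sketched as follows. This is one plausible recipe under stated assumptions rather than the procedure of a specific cited work: one signature per class is drawn at random for every sample, abundances are drawn uniformly on the simplex, and mixing follows the LMM with optional Gaussian noise. All names and values are hypothetical.

```python
import random

def synth_training_set(library, n_samples, noise_std=0.0, seed=0):
    """Return a list of (mixed_pixel, abundances) training pairs.

    library: list of classes; each class is a list of signatures.
    """
    rng = random.Random(seed)
    n_bands = len(library[0][0])
    data = []
    for _ in range(n_samples):
        # Spectral variability: pick one signature per class at random.
        ems = [rng.choice(signatures) for signatures in library]
        # Normalized Gamma(1, 1) draws are uniform on the simplex.
        g = [rng.gammavariate(1.0, 1.0) for _ in library]
        total = sum(g)
        abund = [v / total for v in g]
        pixel = [sum(a * em[l] for a, em in zip(abund, ems))
                 + rng.gauss(0.0, noise_std) for l in range(n_bands)]
        data.append((pixel, abund))
    return data

library = [
    [[0.05, 0.08, 0.45, 0.30], [0.04, 0.10, 0.55, 0.35]],
    [[0.20, 0.25, 0.30, 0.35], [0.30, 0.35, 0.40, 0.45]],
]
training = synth_training_set(library, n_samples=50, seed=1)
```

The pairs can then be fed to any supervised regressor. Covering both abundance and EM variations requires many samples, which is the source of the large training sets discussed next.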
Usually, these approaches result in extremely large training sets for sizable spectral libraries. Thus, even though some
strategies, such as bootstrap aggregation, have been employed to speed up the training process [125], [126], the computational cost is still very high. Although methodologies
to discard irrelevant (regarding the impact on performance)
subsets of the training data [127], [128] could, in principle,
be applied to accelerate training, recent works have focused
on modifying the algorithms to reduce their complexity. One of the main reasons for this complexity is that the
training data must jointly describe spectral variations due
to changes in the abundances and EMs. Some works have
tried to address this issue by using only pure pixels from a
spectral library as training data. One such approach, which
received considerable attention, consists of extended SVMs.
Extended SVMs employ hybrid soft–hard classification and
regression to address spectral variability. It is assumed that
the spectral space is separable by hyperplanes delimiting
two complementary regions containing only pure and only
mixed pixels, respectively [129]. The extended SVM is then
trained to find a soft–hard classifier containing 1) a hard
classification rule consisting of the hyperplanes delimiting
the regions in which the pixels are considered pure and 2) a
soft classification rule, which determines the abundances of
the pixels considered to be mixed.
Different forms of the extended SVM have been studied,
using either a single kernel [129] or multiple kernels [130],
accounting for the abundance indeterminacy by computing
the maximum and minimum proportion values similarly to
the fuzzy SU procedures [131] and using Fisher discriminant
analysis (FDA) to reduce the within-class spectral signature
variability in the spectral library before training [132]. Although hybrid soft–hard classification methods can be fast
to train, they lack a clear physical interpretation of the results since they have no direct relation to the physical mixing
model. Moreover, the influence of spectral variability on the
regions of the spectral space containing mixed pixels is limited because it comes only from the marginal hyperplanes
that separate the pure from the mixed pixel regions [129].
A related strategy that uses only pure pixels in the training process consists of modeling the latent function from the
mixed pixel spectra to the abundance maps in a probabilistic framework as a multitask Gaussian process [133]. In this
case, the abundance means and covariance matrices are obtained through the posterior distribution of the abundances
conditioned on the training set (i.e., the spectral library) and
the mixed pixels. This strategy was extended to consider spatial correlation in a two-step procedure by using the Gaussian process results from [133] as input to the abundance prior information in a maximum a posteriori estimation
problem [134]. Although this strategy has a strong statistical motivation, the introduction of additional constraints
(e.g., abundance nonnegativity and sum-to-unity) is not
straightforward and results in high computational complexity. Another work proposed to mitigate the influence of spectral variability by processing the image using a geodesic SU
method [135] before applying Gaussian process regression to
estimate the final abundances [136]. Although possibly inaccurate, the preliminary abundances estimated by the geodesic SU algorithm are not affected by EM variations caused by
differences in illumination and acquisition conditions. The
Gaussian process regression then learns to map the inaccurate initial abundances to the desired ones. Despite increasing the robustness to spectral variability, geodesic SU can
introduce significant distortions in the abundances for complex data manifolds, which may not be trivial to mitigate.
Note that other machine learning techniques have been
employed to perform SU without directly addressing spectral variability. These include the use of convolutional neural networks [137], the consideration of neural networks
that are well adapted to learn from fewer samples [138], and
the use of autoencoders to perform unsupervised SU by
identifying the latent codes with the fractional abundances
and the decoder with the mixture model [139]–[142]. Other
works considered specific neural network architectures inspired by unfolding iterative optimization algorithms [143],
and they employed Hopfield neural networks to optimize
the SU cost functions more efficiently [144]. Machine learning methods have been applied in different experimental
settings, such as unmixing spectrally similar vegetation
types [145] and urban surfaces [146] and using training data
collected at multiple locations [147]. Given the success that
machine learning methods are achieving with different
problems, particularly in the area of remote sensing [148],
[149], such techniques may bring important advances if
used to address the EM variability problem in SU.
SPECTRAL TRANSFORMATIONS
Spectral transformations have the following qualities:
◗◗ They can be seen as a “preprocessing” strategy that can be used with
other library-based SU methods.
◗◗ They are conceptually simple and have low complexity.
◗◗ Many of the methods are empirical and require a significant degree of
expert knowledge about the underlying application.
◗◗ The performance of the less supervised methods depends strongly on
the representativeness/quality of the library.

An approach frequently used to mitigate the effect of
spectral variability in library-based SU consists of selecting a subspace of the spectral space that is minimally influenced by the variability of the EMs to be prioritized in
the unmixing process. This idea was introduced to improve
the classification of materials under varying atmospheric illumination conditions [17], [20]. The majority of these methods are based on affine transformations of the observed pixels, defined as
$$\mathbf{W}_\lambda \mathbf{y}_n + \mathbf{b}_n = \mathbf{W}_\lambda \mathbf{M}_n \mathbf{a}_n + \mathbf{W}_\lambda \mathbf{e}_n + \mathbf{b}_n, \tag{10}$$
where the matrix $\mathbf{W}_\lambda$ and the affine term $\mathbf{b}_n$ are determined to minimize the effects of EM variability in the subsequent SU process. Besides modifying the observed pixel spectrum $\mathbf{y}_n$, this transformation is applied to the elements of the spectral library, yielding
$$\mathbf{W}_\lambda \mathcal{M}_p + \mathbf{b}_n \triangleq \{\mathbf{W}_\lambda \mathbf{m} + \mathbf{b}_n : \mathbf{m} \in \mathcal{M}_p\} \tag{11}$$
for $p = 1, \ldots, P$. Different cases of this model have been considered in the literature, most notably with $\mathbf{W}_\lambda$ being a diagonal matrix with positive real (band weighting) and binary (band selection) elements. Note that although traditional dimensionality reduction [e.g., principal component analysis (PCA)] and band selection methods used to compress the hyperspectral image could be implemented through this transformation with $\mathbf{b}_n = \mathbf{0}$, the direct application of compression techniques does not necessarily improve the robustness to spectral variability [150]. Spectral
transformation approaches can be generally divided into
two major groups: those defined a priori based on the user’s
expert knowledge and those constructed automatically by
incorporating information in a spectral library. We review
each case in the following.
USER-DEFINED SPECTRAL TRANSFORMATIONS
The first user-defined spectral transformations were proposed to normalize the effects of illumination and brightness variations and to emphasize useful spectral features.
These approaches include subtracting the reflectance value of
a selected (specific) spectral band from all remaining bands
[99], subtracting from each EM its mean value in the spectral dimension to reduce the variability due to differences in
brightness [151], and normalizing/dividing the reflectance
value at each wavelength by the corresponding value of the
convex hull of the spectral signatures [152]. Other examples
include using the first or second derivatives [153], [154] and
the wavelet transform of the spectral signatures [150] for SU.
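
Two of these user-defined transformations are simple enough to sketch directly. The example assumes that the brightness difference between two signatures of the same material is a constant additive offset, which is an idealization. Under that assumption, subtracting the spectral mean makes the signatures coincide. The function names and values are hypothetical. The same transformation would be applied to the observed pixels and to the library.

```python
def remove_mean(spectrum):
    """Subtract the spectral mean so that a constant offset cancels out."""
    mean = sum(spectrum) / len(spectrum)
    return [v - mean for v in spectrum]

def select_bands(spectrum, keep):
    """Binary band selection: keep only the listed band indices."""
    return [spectrum[l] for l in keep]

reference = [0.10, 0.20, 0.50, 0.40]
brighter = [v + 0.10 for v in reference]  # same material, constant offset
centered_ref = remove_mean(reference)
centered_bright = remove_mean(brighter)
subset = select_bands(reference, [2, 3])
```

Both operations are linear, so they fit the affine model in (10) with a zero affine term: mean removal uses the identity matrix minus a matrix with all entries equal to one over the number of bands, and band selection uses a diagonal matrix with binary entries.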
A spectrum-based approach that has become very popular for solving this problem consists of using band selection
methods. These techniques basically work by performing SU
via selected wavelength intervals in which there is little spectral variability between different spectral signatures of the
same material [9], [99]. Although many of these approaches
rely on expert knowledge about the specific underlying application, they are simple and easily interpretable and also
help in reducing the computational cost of the SU problem.
Examples of band selection methods defined a priori by the
user include the use of the short-wave infrared 2 spectral region (2,100–2,400 nm) for unmixing soil and vegetation in
arid and semiarid environments [99] and the combination
of various spectral regions, such as visible, near infrared, and
short-wave infrared, for other applications [100].

LIBRARY-BASED SPECTRAL TRANSFORMATIONS
Spectral transformations proposed more recently leverage
information contained in the spectral library to compute
the terms $\mathbf{W}_\lambda$ and $\mathbf{b}_n$ of the affine transformation. This
circumvents one of the main downsides of the previous
approaches by making the process automated instead of
delegating the choice to the user. These techniques can be
further divided into three groups, namely, band selection,
band weighting, and more general spectral transformations.
BAND SELECTION
Newer band selection methods seek to identify robust spectral regions based on samples in the spectral library. Different strategies have been proposed. One of the first approaches is based on the analysis of the spectral residuals
obtained by performing a preliminary unmixing of the image using the LMM with an average EM matrix [155]. Only
the spectral bands with minimal residual variance are then
used for SU, based on the empirical observation that they
correspond to more robust spectral zones. Another method, called stable zone unmixing, proposes to select spectral
bands that are robust to spectral variability by minimizing an instability index defined as the ratio between the
intra- and interclass EM variances (computed based on a
spectral library) [156]. This method was later extended to
minimize the instability index and the correlation between
signatures of different EM classes at the same time, aiming to improve the numerical conditioning of the SU problem [157]–[159]. The work [160] proposed to improve the
separability between classes by employing the stable zone
unmixing framework to select an individual set of spectral
bands for each possible subset of EM/material classes that
could be tested with MESMA when considering EM models
with fewer than P signatures in the SU process.
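
The idea behind selecting bands with an instability index can be sketched as follows. The index used here, the mean within-class variance divided by the variance of the class means at each band, is a simplified stand-in chosen for illustration. The exact definition in [156] may differ, and the function names and toy library are hypothetical.

```python
from statistics import mean, pvariance

def instability_index(library):
    """Per-band ratio of intraclass to interclass variance.

    library: list of classes; each class is a list of signatures.
    """
    n_bands = len(library[0][0])
    index = []
    for l in range(n_bands):
        per_class = [[sig[l] for sig in signatures] for signatures in library]
        intra = mean(pvariance(values) for values in per_class)
        inter = pvariance([mean(values) for values in per_class])
        index.append(intra / inter if inter > 0 else float("inf"))
    return index

def select_stable_bands(library, n_keep):
    """Indices of the n_keep bands with the smallest instability index."""
    index = instability_index(library)
    order = sorted(range(len(index)), key=lambda l: index[l])
    return sorted(order[:n_keep])

library = [
    [[0.05, 0.50, 0.30], [0.07, 0.40, 0.31]],  # class 1, two signatures
    [[0.20, 0.30, 0.40], [0.30, 0.32, 0.41]],  # class 2, two signatures
]
stable = select_stable_bands(library, n_keep=2)
```

In this toy library the second band varies strongly within the first class relative to the separation between class means, so it is the band that is discarded.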
BAND WEIGHTING
Band weighting methods are more flexible techniques that
enable one to prioritize the spectral bands in the unmixing
process according to their reliability and significance by using
a continuous weight term. This is usually done by weighting
the RE of each band in the SU cost function. Different approaches have been proposed to compute the weight to be
applied to each band. For instance, a weighting strategy based
on two terms was offered in [161]. One term normalizes the
energy of the reflectance spectra to equalize the contributions
to low- and high-reflectance bands, and another term accounts for the robustness of each band to spectral variability
through its instability index (i.e., the ratio between the intra- and interclass EM variance). This approach was later applied
to monitor the level of defoliation in Eucalyptus plantations
[162] and invasive plant species via multitemporal data [163].
It was later extended in [164] to consider SU that integrated
reflectance and derivative spectra. Band weights based on the
instability index were used to prioritize the more stable spectral bands when designing spectral filters that were robust to
spectral variability, which are low-complexity alternatives
that approximate the solution of the SU problem as a direct
application of a single linear transformation [165].
GENERAL SPECTRAL TRANSFORMATIONS
Another group of approaches employ more flexible linear
transformations to better mitigate the effect of spectral
variability. These techniques consist of variations of FDA,
which is widely used for pattern classification. FDA aims to
find a transformation of the data to obtain a feature space
with the best separability between different classes [166].
In the context of SU, this amounts to minimizing the variance of the signatures of each material while maximizing
the distance between the mean values of the different EM
classes [167]. Mathematically, this is formulated as
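$$\mathbf{W}_\lambda = \underset{\mathbf{W}}{\arg\min} \; \frac{\mathbf{W}^\top \mathbf{S}_{\text{within}} \mathbf{W}}{\mathbf{W}^\top \mathbf{S}_{\text{between}} \mathbf{W}}, \tag{12}$$
where $\mathbf{S}_{\text{within}}$ is the weighted sum of the within-class covariance matrices and $\mathbf{S}_{\text{between}}$ is the covariance matrix of the mean EM spectra.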
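
To see how (12) is solved in practice, consider the special case of a single projection direction $\mathbf{w}$ (one column of the transformation, an assumption made here for simplicity). Setting the gradient of the ratio to zero gives

$$\rho(\mathbf{w}) = \frac{\mathbf{w}^\top \mathbf{S}_{\text{within}} \mathbf{w}}{\mathbf{w}^\top \mathbf{S}_{\text{between}} \mathbf{w}}, \qquad \nabla \rho(\mathbf{w}) = \mathbf{0} \;\Longleftrightarrow\; \mathbf{S}_{\text{within}} \mathbf{w} = \rho(\mathbf{w}) \, \mathbf{S}_{\text{between}} \mathbf{w},$$

for any $\mathbf{w}$ with $\mathbf{w}^\top \mathbf{S}_{\text{between}} \mathbf{w} > 0$. The stationary points are therefore generalized eigenvectors of the pair $(\mathbf{S}_{\text{within}}, \mathbf{S}_{\text{between}})$, and the minimizer is associated with the smallest generalized eigenvalue $\rho$. When $\mathbf{S}_{\text{within}}$ is invertible, this is equivalently the eigenvector of $\mathbf{S}_{\text{within}}^{-1} \mathbf{S}_{\text{between}}$ with the largest eigenvalue, which is the classical Fisher direction.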
The first approaches directly applied FDA to SU by using
spectral libraries that were known a priori [167] or constructed with pure pixels extracted from the observed hyperspectral image [168]. Another work considered the augmentation
of the spectral library with pure pixels extracted from the
image to improve the discrimination among spectrally similar vegetation species [169]. Later approaches studied other
variations, such as the iterative addition of more column vectors to $\mathbf{W}_\lambda$ using a Gram–Schmidt orthonormalization procedure to increase the dimensionality of the output space for
multispectral images with a small number of bands [170].
Another work proposed to make the spectral signatures of
different EMs orthogonal to one another and the spectral
signatures of the same EM all unitary and collinear to improve the numerical conditioning of the SU problem [171].
FDA was successfully applied to improve the performance
of MESMA when unmixing urban surfaces (containing vegetation, soil, water, and man-made materials) using image-extracted spectral libraries [172]. Despite its improved
flexibility, FDA has a downside: it depends on good
estimates of the covariance matrices used in (12).
Thus, it may not perform well if the number of samples in
the libraries is not statistically representative [173].
UNMIXING METHODS THAT ESTIMATE
ENDMEMBERS FROM THE IMAGE
In recent years, a large number of works proposed to address spectral variability in SU without relying on prior
knowledge about spectral libraries. Different strategies
have been proposed to this end, which we divide into four
groups. Local unmixing methods are computationally and
conceptually simple but require significant user supervision. Parametric EM models provide more flexibility to
represent EM spectral variability but make the SU problem
harder to solve. EM-model-free methods address spectral
variability by using different modifications to the SU cost
function. Bayesian methods employ statistical representations for the EMs, which leads to less user supervision at the
price of high computational complexity.
All these methods are able to estimate EMs and abundances directly from the image. However, as seen in the “Unmixing Methods That Use Spectral Libraries” section, the
reasoning underlying each group is quite different, which
leads to various levels of required user supervision, computational complexity, and abundance estimation quality,
as illustrated in Figure 2. Moreover, prior knowledge in the
design of the algorithms is an important ingredient to guarantee good performance, and it includes, e.g., the spectral
and spatial correlation of EM signatures and their statistical
properties. We review each family in the following.
LOCAL UNMIXING METHODS
LOCAL UNMIXING
Local unmixing techniques are characterized by the following:
◗◗ They are conceptually simple and physically motivated.
◗◗ They are computationally efficient.
◗◗ Using them requires a significant amount of user supervision.
◗◗ The selection of the local image regions has a significant impact on
the results.
◗◗ Local EM extraction can be difficult.
◗◗ Grouping the local estimates into global results is also challenging.

A conceptually simple and efficient method to deal with
spectral variability is to perform EM extraction and SU locally for small, nonoverlapping regions of the hyperspectral image. This approach, called local unmixing and detailed
in Figure 12, assumes the EM signatures to be constant in
each region of the image, benefiting from the knowledge
that spectral variability is often negligible in small areas.
The basic framework of local unmixing can be summarized
into the following steps:
1) Divide the observed image into a set of regions.
2) Estimate the number of spectral signatures and extract
the EMs in each region.
3) Perform SU with the local EM signatures.
4) Combine all the local SU results into global sets of EM
signatures and global abundance maps using, e.g., clustering procedures.
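
The first three steps above can be sketched with square tiles and user-supplied routines. This is a skeleton under stated assumptions: `extract_ems` and `unmix` are hypothetical callables standing in for any EM extraction and SU algorithm, the image is a nested list of pixel spectra, and step 4 (clustering the local results) is left out.

```python
def square_tiles(height, width, tile):
    """Step 1: nonoverlapping square tiles (edge tiles may be smaller)."""
    regions = []
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            regions.append((r0, min(r0 + tile, height),
                            c0, min(c0 + tile, width)))
    return regions

def local_unmixing(image, tile, extract_ems, unmix):
    """Steps 2 and 3: extract EMs and unmix independently in each tile."""
    height, width = len(image), len(image[0])
    results = []
    for r0, r1, c0, c1 in square_tiles(height, width, tile):
        pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        ems = extract_ems(pixels)
        abundances = [unmix(p, ems) for p in pixels]
        results.append({"region": (r0, r1, c0, c1),
                        "ems": ems, "abundances": abundances})
    return results

# Check that the tiles cover a 5 x 7 image exactly once.
tiles = square_tiles(5, 7, 3)
covered = [[0] * 7 for _ in range(5)]
for r0, r1, c0, c1 in tiles:
    for r in range(r0, r1):
        for c in range(c0, c1):
            covered[r][c] += 1

# Dummy routines: use the first pixel as the only EM, with abundance one.
image = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
results = local_unmixing(image, 2,
                         extract_ems=lambda pixels: [pixels[0]],
                         unmix=lambda pixel, ems: [1.0])
```

Replacing `square_tiles` with a segmentation routine changes only step 1, which is where the methods reviewed below differ most.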
Although local unmixing methods share similar overall
methodologies, there are important differences in the way
the hyperspectral image is partitioned (e.g., by using simple
square tiles or more advanced image segmentation) and
how the EMs are extracted from each region. This can have
a significant impact on the results.
The first approaches for local unmixing required
complete user supervision. For instance, the variable
MESMA algorithm proposed in [10] used manual image
segmentation to divide the image into local regions. SU
was then performed iteratively, updating the segmentation maps and manually including additional EMs in
the process until a satisfying result was obtained. Later
techniques attempted to reduce the need for user supervision in the process. For instance, EM extraction and SU
were performed individually in local (square) image tiles
in [174] and [175]. Afterward, the locally extracted EMs
and abundance maps were merged into the global EM sets
and abundance maps using clustering algorithms. Image
segmentation methods were later applied to provide more
flexibility when dividing the hyperspectral image into local regions. For instance, in [176], manual EM extraction
and SU (using the FCLS algorithm) were individually performed in each image region defined by a segmentation
algorithm. Another work considered a superpixel decomposition of the image, aided by external map metadata
to compute a more accurate segmentation [177]. A more
sophisticated method proposed in [178] and [179] used
a binary partition tree to divide the image into different
regions, from a coarse to a fine spatial scale. Local unmixing was then performed at the scale of the partition tree
that yielded the smallest REs.
Besides the choice of the segmentation procedure, EM
extraction is a challenging part of local unmixing and has
a great impact on the performance of these algorithms.
A spatially adaptive unmixing method was proposed in
[180] to estimate the distribution of different surfaces in
urban environments. EM spectra for each pixel were synthesized as a weighted average of pure pixels extracted in
a spatial neighborhood specified by the user, with weights
given as a function of their distance to each mixed pixel
at hand. A similar approach employed as EMs the mean
values of pure pixels extracted within each (square) image
region, with the pixels identified through a classification
strategy [181]. These techniques can positively weight pure pixels that are spatially close to each pixel being unmixed.
This idea was also used in other works, such as [182], which performed SU through a variant of the MESMA algorithm,
and in [183], which used only the spatially closest pure
pixels to process each mixed pixel. Other local unmixing approaches considered hierarchical segmentation approaches in which the hyperspectral image was divided
into two spatial scales: a coarse one, where unmixing was
performed with MESMA, and a fine one in which the spectral libraries were extracted by using the spectral signatures of small and homogeneous objects [184] or a priori
knowledge about the abundances obtained from external
high-resolution classification maps [185].
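
The spatially adaptive synthesis of EM spectra described above, in which each EM is a distance-weighted average of nearby pure pixels, can be sketched as follows. The weighting function is an assumption made for illustration, since the text states only that the weights depend on the distance to the mixed pixel. The function name and data are hypothetical.

```python
def local_endmember(target_rc, pure_pixels, radius, power=2.0):
    """Distance-weighted average of the pure pixels within a neighborhood.

    target_rc: (row, col) of the mixed pixel being unmixed.
    pure_pixels: list of ((row, col), spectrum) for one material class.
    Returns None if no pure pixel lies within the radius.
    """
    weights, spectra = [], []
    for (r, c), spectrum in pure_pixels:
        dist = ((r - target_rc[0]) ** 2 + (c - target_rc[1]) ** 2) ** 0.5
        if dist <= radius:
            # Assumed weighting: decays with distance, finite at distance 0.
            weights.append(1.0 / (1.0 + dist) ** power)
            spectra.append(spectrum)
    if not spectra:
        return None
    total = sum(weights)
    n_bands = len(spectra[0])
    return [sum(w * s[l] for w, s in zip(weights, spectra)) / total
            for l in range(n_bands)]

pure = [((0, 1), [0.2, 0.4]), ((0, 3), [0.4, 0.8]), ((10, 10), [1.0, 1.0])]
em = local_endmember((0, 0), pure, radius=5.0)
```

The nearest pure pixel dominates the synthesized spectrum, and the distant pixel at (10, 10) is ignored because it falls outside the user-specified neighborhood.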
An important issue related to local unmixing algorithms
is the determination of the number of EMs contained in each
local image region. While in most experimental works this
was performed empirically and even manually, it is desirable
to have automated methodologies to estimate the number
of local EMs and their spectral signatures. This usually involves the estimation of the intrinsic dimensionality of the
local subset of the hyperspectral image [186]. However, the
performance of intrinsic dimensionality estimators is often
negatively impacted when the size of the data set is small
[187]. This strongly limits the characteristics (i.e., the size) of
the subsets and the segmentation procedures selected for unsupervised local unmixing. Collaborative sparse regression
approaches [110] were proposed to deal with the shortcomings of intrinsic dimensionality estimation by avoiding the
selection of repeated and mixed signatures during unmixing
[188]. The sparsity level was selected via a Bayesian information criterion to obtain a good compromise between small
REs and a low number of selected signatures.

[Figure 12 diagram labels: Observed Image; Divide Image Into Local Regions; Unmixing Results for Local Regions; Clustering; Estimated Abundances and EMs.]

FIGURE 12. A description of local SU techniques. Local SU addresses the variability of EM signatures across space by performing the process on small, compact spatial regions of the image in which the EMs can be assumed to be approximately constant. The local SU results for each image region are then clustered to assemble global abundance maps and sets of EM spectra. Local SU offers a lot of flexibility in the choice of the segmentation of the image and the local EM extraction and clustering strategies, which can have a significant impact on the global SU results.

A different line of work attempted to relax the assumption of the connectedness of the local spatial regions, performing SU in different subsets of the hyperspectral image
that were not necessarily spatially adjacent. For instance,
the piecewise convex model proposed in [189] considered
a set of different EM matrices, all estimated from the entire image. Each pixel was then assigned to one of these
EM matrices through a (fuzzy) membership function,
which was estimated with the other variables in a nonconvex matrix factorization problem. Other works extended
this approach by considering cluster validity indices [190]
and sparsity-promoting priors [191] to estimate parameters such as the number of EM matrices and material
classes in each segment and using spatial constraints to
encourage neighboring pixels to have similar membership
values [192].
Similar work considered the estimation of multiple EM
matrices in a nonnegative matrix factorization framework
by using abundance sparsity constraints instead of employing (fuzzy) membership functions while also penalizing
the mutual coherence between the signatures of different
material classes to improve interclass separability [193]. A
related strategy considered a self-dictionary model, where
multiple EM signatures were directly selected as the hyperspectral image pixels that could best reconstruct most of
the remaining pixels in the scene as a sparse linear combination [194]. Another approach with even more flexibility
considered an individual EM matrix for each image pixel
in a nonnegative matrix factorization formulation [195]. A
regularization term penalizing the trace of the covariance
matrix of the estimated spectral signatures for each class
was also considered to reduce the ill-posedness of the estimation problem.

PARAMETRIC MODELS
PARAMETRIC ENDMEMBER MODELS
Parametric EM models possess the following traits:
◗◗ The SU algorithms are computationally efficient.
◗◗ They are very flexible and physically motivated models to represent
any kind of variability.
◗◗ It is easy to incorporate prior information.
◗◗ Determining a good EM model might require some degree of expert
knowledge.
◗◗ They require significant user supervision for tuning the free model
parameters.
◗◗ Estimating the parameters of the EM models (along with the abundances) can be challenging due to the presence of nonconvex optimization problems and sensitivity to parameter choice and initialization.

A flexible and physically reasonable way to address spectral variability consists of employing parametric models to
represent EM spectra; see Figure 13. These strategies provide
great freedom to incorporate constraints and information
from the underlying application. They are generally based
on representing the EM spectra as
$\boldsymbol{M}_n = f(\boldsymbol{M}_0, \boldsymbol{\theta}_n)$, (13)
where $f(\cdot)$ is a function of an average or reference EM matrix
$\boldsymbol{M}_0$ and a vector of parameters $\boldsymbol{\theta}_n$. The number of parameters in $\boldsymbol{\theta}_n$ is usually small, which enables one to confine the
EM spectra to a low-dimensional manifold. The SU problem
is then formulated as the recovery of the abundances and parameters $\boldsymbol{\theta}_n$ for all pixels in the image. The model in (13)
can be defined based on the underlying physics describing
material spectra as a function of numerous geometric and

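[Figure 13 panel labels: Observed Image; Parametric EM Model, $\min_{\{\boldsymbol{\theta}_n, \boldsymbol{a}_n\}_{n=1}^{N}} \sum_{n=1}^{N} \| \boldsymbol{y}_n - f(\boldsymbol{M}_0, \boldsymbol{\theta}_n)\boldsymbol{a}_n \|^2 + R(\{\boldsymbol{\theta}_n\}_{n=1}^{N}, \boldsymbol{A})$; EM-Model-Free Objective Function, $\min_{\phi, \boldsymbol{A}} g_{\phi}(\boldsymbol{Y} - \boldsymbol{M}_0\boldsymbol{A}) + R(\phi, \boldsymbol{A})$; Regularization Terms; Optimize Cost Function; Estimated Abundances and EMs]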

FIGURE 13. A description of SU techniques based on parametric EM models and EM-model-free SU approaches. Parametric EM models
represent the (variable) signatures of EMs as a function of a low-dimensional vector of parameters. The abundances and vector of EM
parameters for each pixel are then recovered by solving an optimization problem. EM-model-free methods, on the other hand, generally
attempt to indirectly mitigate spectral variability through the design of robust cost functions using, e.g., additive residual terms. The use
of regularization terms is important in both cases to incorporate a priori knowledge about the problem.

photometric parameters, such as Hapke’s [48], [49] and
Shkuratov’s [196] models for packed particles and the PROSPECT and PROSAIL models for vegetation [68], [69]. However, the model (13) can also be inspired by physics but chosen
to facilitate more flexibility and mathematical tractability.
We review these approaches in the following.
PHYSICS-BASED METHODS
The first SU approaches using parametric models aimed
to obtain fractional abundances from intimate mineral
mixtures by inverting the Hapke model [197]. With perfect knowledge about the viewing geometry, the scattering properties of the different materials, and the single
scattering albedo of the EMs, the SU problem using
Hapke’s model becomes linear in the albedo domain
[198]. However, since these variables are hardly available in practice, many works attempted to blindly invert
Hapke and related models. This inversion is mathematically and computationally very difficult, in general, and
requires hyperspectral images acquired at multiple viewing geometries [197]. Thus, subsequent works proposed
simplifications of the scattering characteristics of the
materials in the model (13) to improve its mathematical
tractability [199]. These methods have been successfully
applied to estimate abundance maps of different scenes,
including the Cuprite mining district in Nevada [200]
and the moon [201], [202].
This approach has also been applied to the SU of
vegetation mixtures based on the inversion of radiative
transfer models. The first works simplified the problem
by assuming external knowledge of biophysical parameters. For instance, a model for mixtures of vegetation
and shadowed and illuminated soil was proposed for
SU by approximating plant geometry with spatially distributed cylinders containing layers of leaves [203]. Although spectral variability was supported by means of
changes in biophysical parameters, these were assumed
to be known a priori to solve the SU problem. Another
approach considered the SU of soil and vegetation by
using a simplified mixing model as a function of NDVI
values instead of the full spectral signatures [204]. In
this case, a physical model was employed to represent
the variability of the NDVI values as a function of parameters, such as the viewing geometry, leaf density,
and clumping effect. However, the NDVI “EMs” for each
pixel had to be estimated before SU by using multiangled observations and assuming prior knowledge of the
leaf biophysical parameters.
A later technique for the SU of soil and vegetation
mixtures proposed to blindly estimate the biophysical parameters from the hyperspectral image using the
PROSAIL model for vegetation spectra [205]. The SU
problem was formulated as the recovery of both the
abundances and the two parameters of the PROSAIL
model and solved through an alternating optimization procedure. Note that the other parameters of the
PROSAIL model had to be fixed a priori. Although these
models possess a strong physical motivation, their use
in SU leads to computationally intensive and mathematically challenging (i.e., nonconvex and significantly
ill-posed [206]) problems. This occurs because physics
models were originally devised as forward representations that accurately describe reflectance spectra based
on a set of parameters and were not designed to be inverted, which limits their use for SU in practical problems [207].
PHYSICALLY MOTIVATED AND
NONPHYSICS-BASED METHODS
The low mathematical tractability of physics-based models has motivated recent studies leading to more flexible
and parsimonious models that are inspired only by the
underlying physics. Although these models are not as precise as those presented in the “Physics-Based Methods”
section when representing physical phenomena underlying spectral variability, they enable more efficient SU algorithms that estimate the involved parameters $\boldsymbol{\theta}_n$ from
the observed image. Moreover, although models inspired
by physics can be ill-posed, EM spectra are often confined
to a low-dimensional manifold since they depend only on
a small number of physicochemical variables. This property can be exploited to design parsimonious models with
possible constraints and reduce the ill-posedness of the
SU problem.
Several parametric models have been proposed with
these objectives. One of the resulting SU algorithms is the
scaled constrained least-squares method [18], which attempts to represent uniform illumination variations in each
pixel by introducing an additional scaling factor $\psi_n \in \mathbb{R}_+$
into the EM matrices as
$\boldsymbol{M}_n = \psi_n \boldsymbol{M}_0$. (14)
SU can be performed using the model (14) by solving a
simple nonnegative least-squares problem, which is convex
and computationally efficient. However, this representation lacks the ability to represent the more complex spectral
variability that has been observed in practical scenes, motivating the search for more flexible models.
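The reduction of model (14) to a nonnegative least-squares problem can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of [18]: the function names, the toy matrix `M0`, and the simple projected gradient solver are hypothetical choices. The idea is that, with $\boldsymbol{b} = \psi_n \boldsymbol{a}_n$, the pixel model becomes $\boldsymbol{y}_n = \boldsymbol{M}_0 \boldsymbol{b}$ with $\boldsymbol{b} \geq 0$, and the sum-to-one constraint on the abundances gives $\psi_n$ as the sum of the entries of $\boldsymbol{b}$.

```python
import numpy as np

def nnls_projected_gradient(M, y, n_iter=2000):
    """Solve min_b 0.5 * ||y - M b||^2 subject to b >= 0 by projected gradient."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ b - y)
        b = np.maximum(b - step * grad, 0.0)  # project onto the nonnegative orthant
    return b

def scaled_cls(M0, y):
    """Unmix one pixel under M_n = psi_n * M0, i.e., model (14).

    Substituting b = psi_n * a_n turns the problem into plain nonnegative
    least squares; since a_n sums to one, psi_n is the sum of b.
    """
    b = nnls_projected_gradient(M0, y)
    psi = b.sum()
    a = b / psi if psi > 0 else b
    return a, psi

# Hypothetical toy data: L = 4 bands, P = 2 endmembers.
M0 = np.array([[0.9, 0.1],
               [0.7, 0.3],
               [0.2, 0.6],
               [0.1, 0.8]])
a_true = np.array([0.3, 0.7])
psi_true = 0.8
y = psi_true * M0 @ a_true  # noise-free pixel, darker than the reference

a_hat, psi_hat = scaled_cls(M0, y)
```

In this noise-free example, the recovered abundances and scaling factor should match the values used to synthesize the pixel; with noisy data, they would only be approximations.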
A version of the LMM, the extended LMM (ELMM),
was proposed in [51] and [208] by enabling each EM
in a pixel to be individually scaled by a constant factor, resulting in the following representation for the
EM matrices:
$\boldsymbol{M}_n = \boldsymbol{M}_0 \operatorname{diag}(\boldsymbol{\psi}_n)$, (15)
where vector $\boldsymbol{\psi}_n \in \mathbb{R}_+^P$ contains the scaling factors for each
of the P materials. The ELMM can represent more complex
variability originated from variations in the illumination
and topography, which can affect each material in the
hyperspectral image differently. Furthermore, the ELMM
can be obtained from successive physical approximations
of the Hapke model for small-albedo materials [52]. Based
on an estimate of $\boldsymbol{M}_0$ obtained from the observed image,
SU under the ELMM was formulated as a nonconvex matrix factorization problem in which the model (15) was enforced by means of an additive penalty in the cost function
[51]. A regularization promoting the spatial homogeneity
of the scaling factors $\boldsymbol{\psi}_n$ was also considered to reduce
the ill-posedness of the SU problem [51]. The ELMM has
shown good performance for multitemporal data [209]
and been used to facilitate the interpretation of local unmixing results [210]. Moreover, it can be derived from a
Taylor series expansion of a general nonlinear mixture
model [211], which introduces SU with spectral variability (viewed as a locally linear SU problem) as a direct way
to address the general nonlinear SU problem. This shows
that some mixture models originally devised to represent
spectral variability (such as the ELMM) can achieve good
performance in nonlinear SU.
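A minimal sketch of the ELMM forward model (15) may help make the role of the scaling factors concrete. The values of `M0`, `psi`, and `a` below are hypothetical toy data, and the function name is an assumption made for illustration. The sketch also shows that a single pixel depends on the scaling factors and abundances only through their elementwise product, which is one way to see why a per-pixel fit alone cannot separate them and why additional regularization is useful.

```python
import numpy as np

def elmm_endmembers(M0, psi):
    """Return M_n = M0 @ diag(psi): column p of M0 is scaled by psi[p]."""
    return M0 * psi  # broadcasting over columns, same result as M0 @ np.diag(psi)

# Hypothetical toy data: L = 4 bands, P = 2 endmembers.
M0 = np.array([[0.9, 0.1],
               [0.7, 0.3],
               [0.2, 0.6],
               [0.1, 0.8]])
psi = np.array([1.2, 0.7])  # per-material scaling factors for this pixel
a = np.array([0.4, 0.6])    # abundances (nonnegative, sum to one)

M_n = elmm_endmembers(M0, psi)
y = M_n @ a  # noise-free pixel under the ELMM

# The same pixel written with the reference EMs and "scaled abundances".
y_alt = M0 @ (psi * a)
```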
Despite its physical motivation, the ELMM lacks
the flexibility to represent more complex spectral variability, e.g., variability that affects the spectral bands nonuniformly. To address this limitation, the generalized LMM (GLMM)
was proposed in [212] by introducing an individual
scaling factor for each band, leading to the following
EM model:
$\boldsymbol{M}_n = \boldsymbol{\Psi}_n \circ \boldsymbol{M}_0$, (16)
where the matrix $\boldsymbol{\Psi}_n \in \mathbb{R}^{L \times P}$ contains the scaling factors
for each element of $\boldsymbol{M}_0$ and $\circ$ denotes the Hadamard (elementwise) matrix product. Note that the amount of spectral variability produced by the GLMM is proportional to
the amplitude of the reference spectra $\boldsymbol{M}_0$ in each band.
However, the larger number of parameters makes the SU
problem resulting from (16) more ill-posed, with challenging estimation problems. This motivated the development
of a tensor interpolation framework to estimate the matrices $\boldsymbol{\Psi}_n$ from training hyperspectral data obtained from
prior knowledge about the positions of pure pixels in the
hyperspectral image [213]. However, the performance of
the method proposed in [213] strongly depends on the
number of pure pixels available in the image. The GLMM
also has been successfully used in multitemporal SU [214]
and to represent spectral variability when fusing hyperspectral and multispectral images acquired at different
time instants [215].
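The GLMM in (16) can be illustrated with a few lines of NumPy. This is a sketch with hypothetical toy values (the matrix `M0`, the random scaling factors, and the function name are assumptions). It shows two properties that follow directly from the model: the deviation from the reference spectra equals $(\boldsymbol{\Psi}_n - \boldsymbol{1}) \circ \boldsymbol{M}_0$, so it scales with the amplitude of $\boldsymbol{M}_0$ in each band, and the ELMM (15) is recovered when every row of $\boldsymbol{\Psi}_n$ is the same vector of per-material scalings.

```python
import numpy as np

def glmm_endmembers(M0, Psi):
    """Return M_n = Psi ∘ M0 (Hadamard product), i.e., model (16)."""
    return Psi * M0

# Hypothetical toy data: L = 4 bands, P = 2 endmembers.
M0 = np.array([[0.9, 0.1],
               [0.7, 0.3],
               [0.2, 0.6],
               [0.1, 0.8]])
rng = np.random.default_rng(0)
Psi = 1.0 + 0.1 * rng.standard_normal(M0.shape)  # scalings close to one

M_n = glmm_endmembers(M0, Psi)
deviation = M_n - M0  # equals (Psi - 1) ∘ M0: larger where M0 is larger

# ELMM as a special case: every band of material p shares the factor psi[p].
psi = np.array([1.2, 0.7])
Psi_elmm = np.tile(psi, (M0.shape[0], 1))
M_elmm = glmm_endmembers(M0, Psi_elmm)
```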
Note that the performance of unmixing methods
based on the ELMM and the GLMM heavily depends
on the quality of the reference EM matrix $\boldsymbol{M}_0$, which
must be estimated from the observed image. To reduce the dependence of the ELMM on $\boldsymbol{M}_0$, the authors
of [216] proposed to jointly estimate $\boldsymbol{M}_0$ with the remaining variables during SU. Each column of $\boldsymbol{M}_0$ was
constrained to have a unit norm to obtain EMs as
directional data in the spectral space. Moreover, $\boldsymbol{M}_0$
was initialized using a simple cosine-based k-means
clustering of the observed data cube, which improved
the robustness of the method to the presence of shadowed pixels.
A different EM model was proposed in [217] by considering an additive term to the mean EM matrix, resulting in
the following EM representation:
$\boldsymbol{M}_n = \boldsymbol{M}_0 + d\boldsymbol{M}_n$, (17)
where the matrix $d\boldsymbol{M}_n \in \mathbb{R}^{L \times P}$ is an additive perturbation representing spectral variability. In this case,
the reference EM matrix $\boldsymbol{M}_0$ and the pixel-dependent
additive perturbation terms $d\boldsymbol{M}_n$ were blindly estimated from the hyperspectral image. However, this
model has a large number of parameters. Thus, to
mitigate the ill-posedness of the SU problem, a regularization term consisting of the Frobenius norm of
$d\boldsymbol{M}_n$, $n = 1, \ldots, N$, was included in the unmixing cost
function. Besides the simplicity and mathematical
tractability, the use of an additive perturbation in (17)
also makes the problem amenable to an interesting interpretation when only a single additive perturbation
matrix is considered for all image pixels. In this case,
the SU problem becomes equivalent to a total least-squares problem with constraints [218]. Furthermore,
the perturbed LMM (PLMM) has been considered for
robust SU using an outlier-insensitive RE metric with
an $L_p$ quasi-norm [219] and for multitemporal and distributed SU [220], [221].
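To illustrate how the Frobenius-norm regularization keeps the perturbation in (17) under control, the sketch below considers a single pixel with fixed abundances and a squared Frobenius penalty weighted by a hypothetical parameter `lam`. It is not the algorithm of [217]: constraints such as the nonnegativity of the perturbed EMs are ignored, and all names and values are assumptions. Under these simplifications, minimizing $\tfrac{1}{2}\|\boldsymbol{y}_n - (\boldsymbol{M}_0 + d\boldsymbol{M}_n)\boldsymbol{a}_n\|^2 + \tfrac{\lambda}{2}\|d\boldsymbol{M}_n\|_F^2$ over $d\boldsymbol{M}_n$ has the rank-one closed-form solution $d\boldsymbol{M}_n = \boldsymbol{r}\boldsymbol{a}_n^\top / (\lambda + \boldsymbol{a}_n^\top\boldsymbol{a}_n)$, with $\boldsymbol{r} = \boldsymbol{y}_n - \boldsymbol{M}_0\boldsymbol{a}_n$.

```python
import numpy as np

def plmm_cost(y, M0, dM, a, lam):
    """0.5 * ||y - (M0 + dM) a||^2 + 0.5 * lam * ||dM||_F^2 for one pixel."""
    residual = y - (M0 + dM) @ a
    return 0.5 * residual @ residual + 0.5 * lam * np.sum(dM ** 2)

def plmm_perturbation_update(y, M0, a, lam):
    """Unconstrained minimizer of plmm_cost over dM for fixed abundances a."""
    r = y - M0 @ a  # residual of the plain LMM
    return np.outer(r, a) / (lam + a @ a)

# Hypothetical toy data: L = 4 bands, P = 2 endmembers.
M0 = np.array([[0.9, 0.1],
               [0.7, 0.3],
               [0.2, 0.6],
               [0.1, 0.8]])
a = np.array([0.4, 0.6])
y = np.array([0.45, 0.50, 0.40, 0.55])
lam = 0.5

dM_hat = plmm_perturbation_update(y, M0, a, lam)

# Gradient of the cost with respect to dM, which should vanish at dM_hat.
grad = -np.outer(y - (M0 + dM_hat) @ a, a) + lam * dM_hat
```

A larger `lam` shrinks the perturbation toward zero, so the model falls back to the LMM with the reference EMs; a smaller `lam` lets the perturbation absorb more of the residual.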
One difficulty of parametric EM models involves
constructing functions $f(\cdot)$ that are parsimonious and
still flexible enough to represent complex spectral variations. To circumvent this issue, a deep generative EM
model was proposed in [222], based on the hypothesis
that the EMs lie on low-dimensional manifolds. Instead
of fixing the EM model a priori, variational autoencoders with neural networks were employed to learn the
parametric function $f(\cdot)$ in (13) by using pure pixels
extracted from the observed hyperspectral image. SU
was then formulated as the recovery of the abundances
and representations of the EMs in the learned manifold,
which can be of very small dimension. Despite making SU more well-posed, the resulting cost functions are
nonconvex and can be difficult to optimize. A different
work proposed to exploit the spatial correlation of the
EMs and abundance maps by proposing a general multiscale mixing model addressing EM variability [223].
The SU problem was solved using a multiscale representation of the mixing model, which facilitated the use
of any parametric EM representations, as in (13). This
generated improved results when compared to standard
spatial regularization strategies. Although the formulation was algebraically involved, an approximate algorithm with little complexity was derived through some
simplifying assumptions.
ENDMEMBER-MODEL-FREE METHODS
ENDMEMBER-MODEL-FREE UNMIXING
EM-model-free unmixing is characterized by the following:
◗◗ The algorithms are usually computationally efficient.
◗◗ It involves different strategies with a wide range of model complexities
and user supervision.
◗◗ Its methods usually make few or restrained assumptions about the EM
models (unlike Bayesian and parametric models).
◗◗ Some approaches have a more limited modeling ability.

Other methods have been proposed to blindly mitigate the effects of
spectral variability without assuming any specific model to
represent the EM signatures; they are illustrated in Figure 13.
One simple approach consists of using a metric depending
on the RE in the SU cost function to improve the robustness
of SU to EM variability. It can be motivated by the fact that
the commonly used Euclidean distance is very sensitive to
variations in the amplitude of the pixel spectra, thus being
significantly influenced by illumination variations [224].
This motivated the use of the spectral angle mapper (SAM),
spectral correlation, and spectral information divergence
metrics, which are insensitive to scaling variations
[224], [225]. The downside lies in the nonlinear and possibly nonconvex nature of the resulting SU optimization
problem, which becomes harder to solve. An efficient strategy based on the projected gradient descent algorithm was
proposed to optimize the SU cost function when using the
SAM metric [226]. Although conceptually simple, these approaches focus on specific effects, such as brightness variations, and it is not clear how they can be generalized to address more complex spectral variability.
Recent SU methods consider more general models to
deal with complex intrinsic variability effects. For instance,
an additive residual term in the LMM (1) was introduced in
[227] to account for spectral variability and other unmodeled effects. This term was represented as the product of
two matrices. The first corresponded to the first columns
of the discrete cosine transform, forcing the additive terms
to be spectrally smooth. The second was defined for the
pixel-dependent coefficients, which were forced to be spatially sparse and were estimated by solving a convex optimization problem. A similar technique included ideas from
physically motivated parametric models by considering the
LMM with a constant scaling factor for each pixel to account for rough illumination variations and an additional
nonparametric additive term to account for other types of
spectral variability [228]. This additive term was defined as
the product between an approximately orthonormal basis
matrix that had low coherence with the EM signatures and
a coefficient matrix representing the variability contribution to each pixel. However, these constraints make the resulting optimization problem nonconvex.
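The smooth additive residual described for [227] can be sketched by building a basis from the first columns of a discrete cosine transform. The choice of the orthonormal DCT-II, the number of bands `L`, the number of basis vectors `K`, and the coefficient values below are assumptions made for illustration; the source does not specify them. Under this sketch, a pixel would be modeled as $\boldsymbol{y}_n = \boldsymbol{M}_0\boldsymbol{a}_n + \boldsymbol{D}\boldsymbol{c}_n$, where the coefficient vector $\boldsymbol{c}_n$ is zero for most pixels (spatial sparsity).

```python
import numpy as np

def dct_basis(L, K):
    """First K orthonormal DCT-II basis vectors of length L, as an L x K matrix."""
    l = np.arange(L)[:, None]   # band index
    k = np.arange(K)[None, :]   # frequency index
    D = np.sqrt(2.0 / L) * np.cos(np.pi * (2 * l + 1) * k / (2 * L))
    D[:, 0] /= np.sqrt(2.0)     # the constant vector needs a different scaling
    return D

L, K = 50, 4                    # hypothetical sizes
D = dct_basis(L, K)

c = np.array([0.05, -0.02, 0.01, 0.0])  # hypothetical coefficients for one pixel
residual = D @ c                # spectrally smooth term added to M0 @ a
```

Because only low-frequency cosines are kept, any residual in the span of `D` varies slowly across bands, which is how spectral smoothness is enforced in this sketch.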
A different idea was to estimate a subspace projection of
the observed hyperspectral image that minimizes the effect of spectral variability in SU [229]. This strategy enables
SU to be performed by minimizing the RE in the projected
space. This subspace is forced to be of a low dimension
by penalizing the nuclear norm of the projection operator in the cost function, which is jointly estimated with
the abundances during SU. A more recent method considered the multidimensional representation of the pixel-dependent EM matrices and abundance vectors by applying
mathematical tools from tensor decomposition [230]. By
assuming that the EMs and abundance tensors were approximately low rank, the SU problem was formulated as a
nonconvex, nonnegative tensor factorization problem. This
led to a parsimonious model without the need for explicit
parametric representations of the EMs that are tied to specific applications.
BAYESIAN METHODS
BAYESIAN METHODS
In Bayesian methods, the following qualities are observed:
◗◗ The approaches benefit from well-developed statistical estimation
tools to derive SU methods.
◗◗ They can have a very low degree of user supervision once the statistical distributions are selected.
◗◗ They can use unrealistic distributions (e.g., isotropic Gaussians) to represent EMs for mathematical tractability.
◗◗ Generally, they do not return the specific spectral signatures at each
image pixel.
◗◗ They suffer from a very high computational cost.
◗◗ Hyperparameters may need to be set by the user, and specifying
hyperprior distributions for hierarchical Bayesian models may not
be trivial.

Another set of methods considers EMs to be random
vectors, following multivariate statistical distributions, i.e.,
$\boldsymbol{m}_{n,p} \sim \mathcal{D}(\boldsymbol{\theta}_{n,p})$, (18)
where $\boldsymbol{\theta}_{n,p}$ encodes the parameters of a distribution $\mathcal{D}$. The
spectral signatures actually present in each pixel are realizations of this random vector, and SU is then formulated
as the problem of finding a statistical estimator for the
abundances and for the EMs. These approaches depend
on the statistical distribution D employed to represent
EM spectra, the amount of user supervision required, and
the computational algorithm used to solve the problem.
Some methods require the parameters of the distribution
$\boldsymbol{\theta}_{n,p}$ to be set a priori, which might be difficult in the absence of a large spectral library. Other works reduce user
supervision by employing hierarchical Bayesian methods
to jointly estimate $\boldsymbol{\theta}_{n,p}$ with the remaining parameters, at
the expense of a higher computational cost [231], [232]. Bayesian methods addressing spectral variability (Figure 14) can
be classified according to the statistical distribution used to
represent the EMs: a Gaussian distribution, which provides
mathematical tractability, and more complex distributions
providing a more physically reasonable representation. We
discuss both cases in the following.

THE NORMAL COMPOSITIONAL MODEL
The first statistical model considered for the representation
of EM spectra was a multivariate Gaussian distribution, in
the so-called normal compositional model (NCM), given by
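$\boldsymbol{m}_{n,p} \sim \mathcal{N}(\boldsymbol{\theta}_{n,p})$, (19)
where $\mathcal{D} \equiv \mathcal{N}$ and $\boldsymbol{\theta}_{n,p} = \{\text{mean}, \text{covariance}\}$ contains the
mean vector and covariance matrix for the $p$th EM of the $n$th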
pixel. The NCM has been widely used due to its mathematical tractability [233], [234]. The first works employing it for SU considered expectation-maximization strategies in which the abundances and the mean EM values
and their covariance matrices were iteratively estimated
[233]. However, due to the nonconvexity of the estimation problem, a direct application of expectation-maximization approaches is unable to decide whether variations
observed in the mixed pixel spectra $\boldsymbol{y}_n$ are due to different
abundances or to EM variability. This might result in the
EMs absorbing all the variation in the observed scene with
nearly constant abundances [234]. Some techniques proposed to address this problem by considering the use of
diagonal covariance matrices and empirical strategies to
estimate the EM data more easily from the observed mixed
pixels. For instance, EM means and covariances were estimated a priori by using pure pixels selected from the hyperspectral image [235], or they were iteratively estimated based
on large regions of observed pixels with homogeneous
abundances (obtained from the segmentation of estimates
of the abundances available a priori) [236].
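The tractability of the NCM comes from the fact that a linear combination of Gaussian vectors is again Gaussian. As a sketch, assume that the EMs of a pixel are mutually independent, $\boldsymbol{m}_{n,p} \sim \mathcal{N}(\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p)$, and that white Gaussian noise of variance $\sigma^2$ is added (both are assumptions not stated in the text above). Then, for fixed abundances, $\boldsymbol{y}_n \sim \mathcal{N}\big(\sum_p a_{n,p}\boldsymbol{\mu}_p,\ \sum_p a_{n,p}^2\boldsymbol{\Sigma}_p + \sigma^2\boldsymbol{I}\big)$. The function and variable names below are hypothetical.

```python
import numpy as np

def ncm_log_likelihood(y, a, means, covs, noise_var=0.0):
    """Log-likelihood of pixel y given abundances a under the NCM sketch.

    means: (L, P) array whose columns are the EM mean spectra.
    covs:  (P, L, L) array of EM covariance matrices.
    Assumes independent EMs and optional white noise of variance noise_var.
    """
    L = len(y)
    mu = means @ a
    cov = np.einsum("p,pij->ij", a ** 2, covs) + noise_var * np.eye(L)
    diff = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (L * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical toy data: L = 3 bands, P = 2 endmembers, isotropic covariances.
means = np.array([[0.2, 0.6],
                  [0.4, 0.5],
                  [0.7, 0.1]])
covs = np.stack([0.01 * np.eye(3), 0.04 * np.eye(3)])
a = np.array([0.5, 0.5])
y = means @ a  # pixel placed exactly at the mean of its distribution

ll = ncm_log_likelihood(y, a, means, covs, noise_var=0.001)
```

Note that the abundances enter both the mean and the covariance of the pixel distribution, which is one way to see why abundance changes and EM variability are hard to tell apart when all quantities are estimated jointly.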
Other works attempted to improve different aspects of
this method by using a particle swarm optimization algorithm to solve the (usually intractable) integrals involved
in the estimation of the abundances in the “expectation”
step of the algorithm [237] and by incorporating a priori
information in the form of additional constraints penalizing the nuclear norm of the abundances in groups of pixels
determined through image segmentation methods (to promote spatial homogeneity) [238]. Despite these advances,
the susceptibility of expectation-maximization-based methods to converge to poor local minima of the nonconvex cost
function prevented their large-scale applicability for this
problem. Instead, most recent approaches rely on more robust (although costly) techniques based on Markov chain
Monte Carlo methods to sample the posterior distribution.
Although the works that adopt this approach share the same
underlying idea, they differ significantly in the way in which
EMs and abundances are represented and the amount of
user supervision that is required.
For example, different strategies have been proposed to
represent the mean and the covariance matrices of the EMs
in the NCM. One of the first considered the EM mean values to be known a priori and their covariance matrices to be
multiples of the identity matrix [239] while employing conjugate distributions to make the estimation of the parameters easier. Later works attempted to add more flexibility
by considering, for instance, a single full covariance matrix
shared by all EMs [240] or a positive definite matrix defined
a priori and multiplied by EM-dependent scaling parameters [241]. Diagonal covariance matrices were employed in
[242], which also considered the estimation of the EM mean
values in a hierarchical Bayesian framework, using hyperpriors to directly estimate the distribution parameters from the
observed hyperspectral image. The Bayesian framework was
also applied in [243] to blindly estimate the number of EMs
in the scene via a uniform discrete prior.
Other works attempted to address physically motivated
cases of the general NCM. These included consideration of
the statistical dependence between different EMs to represent spectral variability that may equally affect all materials

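[Figure 14 panel labels: Observed Image; Bayesian Inference; Estimated Abundances and EM Distributions. Relation shown in the figure: $p(\boldsymbol{Y} \mid \boldsymbol{A}, \boldsymbol{\theta}_M)\, p(\boldsymbol{A}, \boldsymbol{\theta}_M) \propto p(\boldsymbol{A}, \boldsymbol{\theta}_M \mid \boldsymbol{Y})$, i.e., likelihood times prior is proportional to the posterior.]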

FIGURE 14. A description of Bayesian SU techniques. Bayesian SU methods represent the EM signatures at each pixel as a realization of a
statistical distribution. Statistical distributions are first attributed to the EMs and abundances and, possibly, other variables or to hyperparameters of these distributions. Using the Bayes rule, the SU results are then derived from the posterior distribution in a Bayesian inference
problem. The abundances and EM distributions can be computed as, e.g., the mean or mode of the posterior distribution.

in the scene (e.g., atmospheric effects) [244] and the explicit
representation of the higher correlation between adjacent
spectral bands to introduce spectral smoothness to the signatures, leading to a well-posed model that is fast to compute [245]. An alternative approach, which has been used
to simplify the unmixing process associated with the NCM,
is to estimate the EM means and covariance matrices a priori based on spectral libraries extracted from the observed
image. This has been performed by considering libraries
obtained through pure pixel-based EM bundle extraction
methods [246] and on multiple EM matrices estimated by a
piecewise, convex, blind SU algorithm [191]. However, these
methods suffer from the limitations of image-based EM
bundle extraction techniques, which are discussed in detail in
the “How to Construct Spectral Libraries” section.
Other works considered a piecewise convex model that
uses a set of different Gaussian distributions to model EMs.
Afterward, during SU, each image pixel is assigned to one of
these distributions through a membership function represented by a Dirichlet random variable. The unmixing problem in this model was solved by considering an alternating
optimization method in a maximum a posteriori framework
[247] and a Markov chain Monte Carlo sampling approach
providing an estimate of the posterior distribution of interest
[248]. Although the Dirichlet prior distribution is frequently
used to represent abundances, many works have considered
variations that incorporate useful information from the underlying practical problem. Examples include the enforcement of abundance sparsity through a sparse Dirichlet prior
[249] and the encouragement of spatial homogeneity by dividing abundance maps into a finite number of classes sharing the same Dirichlet distribution parameters. This division
has been performed blindly by means of a classification
prior by using the Potts model [242] and through an a priori
segmentation of the hyperspectral image in a latent Dirichlet
allocation framework [250].
More recently, the NCM has been applied to problems
unrelated to linear unmixing and spectral variability. For
instance, it has been considered for representing the uncertainties in EM estimation instead of the intrinsic variability
of the material classes, which changes the problem by introducing statistical dependence between the different image pixels [251]. Other works applied the NCM to problems
such as nonlinear SU with a bilinear mixing model [252],
the linear unmixing of sediment grain size distribution
(where EMs represent the grain sizes of constituent materials) to study the transport and deposition of sediments
[253], and the representation of the variability of the EMs across multiple images in multitemporal SU by using additional spatially sparse terms accounting for potential abrupt spectral
changes between the different images [254].
OTHER ENDMEMBER DISTRIBUTIONS
Despite its popularity, the NCM does not have a strong
physical motivation, which led to the consideration of more
accurate distributions to represent EMs. For instance, a beta
distribution was considered in [255] to constrain reflectance
values to physically meaningful ranges and enable possible
skewness in the distribution. Unfortunately, a direct solution
to the SU problem cannot be obtained. Thus, a piecewise
constant model was assumed for the abundances, which
enabled the parameters of the distribution to be estimated
using a combination of a clustering algorithm and a variant
of the method of moments. A Gaussian mixture model was
also considered in [256] to facilitate possibly multimodal
EM distributions. The SU problem was solved as a maximum a posteriori estimation problem using a generalized
expectation-maximization approach. However, since learning the parameters of Gaussian mixture models can be difficult, the distributions were estimated before performing SU,
based on spectral libraries assumed to be known a priori.
Another approach proposed to represent EM spectra as
a sum of an average spectral signature known a priori and
a spatially and spectrally smooth function representing EM
variability to provide a model that is physically more reasonable [257]. Bilinear mixing models were also considered
with an additive residual term to account for mismodeling
effects and outliers. A different technique has been offered
that does not make an explicit assumption about the distribution of EM spectra and, instead, relies only on some of
their statistics. This is the case in [258], which formulates
the SU problem similarly to the method of moments by trying to find the abundance values that match the mean and
covariances obtained through the LMM to those of the observed mixed pixels. A similar work applied the same idea
by using transformed statistics constructed from the ratio
between the means and covariances of the pixels and EMs
in different spectral bands [259]. This strategy increases the
robustness of the method since band ratios are invariant to
illumination variations. However, similar to [255], a piecewise constant abundance model is employed to estimate the
covariance matrix of the observed pixels. Moreover, the covariance matrices of the EMs are assumed to be known a priori.
SPECTRAL LIBRARIES
A large number of the SU techniques discussed in the
“Unmixing Methods That Use Spectral Libraries” section
address spectral variability by using spectral libraries and
bundles known a priori. The performance of these methods is often heavily impacted by how well the libraries can
represent the EMs actually present in the scene. Moreover,
in many practical situations, it is either very costly or even
impossible to obtain laboratory and in situ measurements
of EM spectra. Another issue with many methods presented
in the “Unmixing Methods That Use Spectral Libraries” section (such as MESMA) is that the approaches’ computational complexity increases very quickly with the library size,
which can make the problem intractable for large libraries.
Thus, the challenges of removing redundant and irrelevant
spectra before SU and, especially, extracting spectral libraries directly from observed hyperspectral images are of central importance to enable the techniques discussed in the
“Unmixing Methods That Use Spectral Libraries” section to
be widely applicable. Fortunately, several techniques have
been proposed to address both of these problems, which we
discuss in detail in this section.
HOW TO CONSTRUCT SPECTRAL LIBRARIES
Many library-based SU works assume that spectral libraries
are manually obtained from in situ and controlled laboratory measurements [37], [260], which may be complicated in
practical applications. Moreover, existing libraries may have
been acquired in conditions that do not reflect those actually observed in the scene, which introduces errors in the SU
process [37], [103], [261]. Even the spatial resolution at which
the hyperspectral image was acquired was found to have a
considerable impact on the results of SU with MESMA in urban environments when the library was fixed a priori [262].
Traditional EM extraction algorithms (EEAs), on the other hand, typically consider only a single spectral signature
per material and are thus unable to appropriately address
spectral variability [33], [263]. These shortcomings make
the construction of spectral libraries one of the main challenges of library-based SU methods [260]. A simple and reliable method that has been employed to construct spectral
libraries depends on expert knowledge to manually select
pure pixels of each material from the hyperspectral image
[262], [264]. However, there has been growing interest in
developing methods that can reduce the amount of user supervision and automatically extract libraries directly from
observed hyperspectral images. Three main general lines of
research can be identified in this direction:
1) Extract multiple pure pixels from the observed hyperspectral image to generate a candidate library, and then
cluster the extracted signatures into their respective material classes.
2) Generate libraries using radiative transfer models that
represent EM variability mathematically.
3) Extract pure pixels while keeping information about
their spatial locations, and apply an interpolation algorithm to generate EM signatures for each image pixel.
Figure 15 provides an overview of the key ideas underlying
each of these approaches, which are reviewed in the following.
IMAGE-BASED LIBRARY CONSTRUCTION
IMAGE-BASED LIBRARY EXTRACTION
Image-based library extraction is characterized by the following:
◗◗ It enables spectral libraries to be extracted with signatures that have
the same conditions as the image pixels.
◗◗ It can benefit from expert knowledge to reliably identify pure pixels in
the image.
◗◗ It strongly depends on the presence of pure pixels.
◗◗ The observed image should not be too small.
◗◗ Mixed pixels may be included in the library by mistake.
◗◗ Clustering the extracted signatures into their correct material classes
is challenging.

The simplest approaches for the construction of image-based spectral libraries are completely supervised. Image
pixels are included in the library based on their correlation to some initial EMs manually selected as the extreme
points of the PCA of the observed image [97], [265] or
simply by manually screening a large number of pure pixels extracted from the image by using expert knowledge
about the spectral characteristics of the materials in the
scene [264]. Pure pixels were also extracted from multiple
hyperspectral images of the same scene acquired at different spatial resolutions to increase the diversity of the
resulting spectral library in urban environments [262].
Other work used only partially labeled data to reduce the
amount of domain knowledge that is required [266]. Recent strategies attempted to automate this process by extending EEAs for the extraction of multiple signatures of
each material in the observed image. The first work in this
direction proposed to apply traditional EEAs to random
subsets of pixels that are sampled from the hyperspectral image (with or without replacement) [267]. Different
sets of EM signatures are generated through this method.
All the extracted signatures are then grouped into different sets corresponding to the material classes by using a
clustering algorithm (e.g., k-means). The size of the image
subsets, however, must be selected with great care for the
EEAs to work satisfactorily [187], and the clustering step
can be challenging.
Later works proposed different strategies for the extraction and selection of multiple pure pixels and EM candidates from the observed image. One simple iterative strategy
consists of including in the library all pixels that are within
a given spectral distance of reference EMs [268]. This process
is performed iteratively, with the reference EMs initialized
using a standard EEA and then updated as the mean values
of the library signatures at the previous iteration. Besides
being very simple, this procedure does not require the library signatures to be clustered afterward. A related strategy
worked in a reverse way by iteratively removing pure pixels
from a large initial set of candidate signatures to obtain the
final spectral library [269]. A pixel candidate is removed if
it can be represented with little error as a convex combination of the remaining signatures in the library. A clustering
procedure is then performed to group the selected spectra
into EM classes.
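The iterative strategy described above (collect pixels close to reference EMs, then update the references as library means) can be sketched as follows. This is a minimal illustration rather than the implementation of [268]: the spectral angle is assumed as the distance, the threshold `max_angle` and the function names are hypothetical, and assigning each pixel only to its nearest reference is an added assumption that keeps the classes disjoint.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle (radians) between each row of `a` and the vector `b`."""
    cos = a @ b / (np.linalg.norm(a, axis=1) * np.linalg.norm(b) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def grow_library(pixels, init_ems, max_angle=0.05, n_iter=10):
    """pixels: (N, L) array; init_ems: (P, L) array from any standard EEA.
    Returns a list of P index arrays, one set of library pixels per material."""
    refs = np.array(init_ems, dtype=float)
    members = [np.array([], dtype=int) for _ in refs]
    for _ in range(n_iter):
        # angles[k, n]: angle between reference EM k and pixel n
        angles = np.stack([spectral_angle(pixels, r) for r in refs])
        nearest = angles.argmin(axis=0)
        new_members = [
            np.flatnonzero((nearest == k) & (angles[k] <= max_angle))
            for k in range(len(refs))
        ]
        # Update each reference as the mean of its current library signatures.
        for k, idx in enumerate(new_members):
            if idx.size:
                refs[k] = pixels[idx].mean(axis=0)
        converged = all(np.array_equal(a, b) for a, b in zip(members, new_members))
        members = new_members
        if converged:
            break
    return members

# Toy scene: scaled copies of two EMs plus one 50/50 mixture.
e1 = np.array([1.0, 0.2, 0.1])
e2 = np.array([0.1, 0.3, 1.0])
pixels = np.vstack([0.9 * e1, 1.1 * e1, 0.8 * e2, 1.2 * e2, 0.5 * e1 + 0.5 * e2])
library = grow_library(pixels, [e1, e2])
```

Because every selected pixel is attached to a reference EM, the resulting library is already grouped by material, which is why no separate clustering step is needed.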
Recent works have proposed more involved empirical
approaches to differentiate between spectrally similar materials when extracting and clustering the EM signatures
and to remove mixed pixels from the constructed library.
For instance, in [270], EM extraction was performed multiple times for different subsets of the spectral bands constructed at multiple spectral scales and intervals. These
signatures were then clustered into EM classes based on a
metric constructed from features derived by individually
applying clustering algorithms to the spectral scales and
intervals used previously. A related strategy considered the
extraction and clustering of the library signatures based on
subsets of the wavelet transform coefficients of the reflectance spectra that are robust to spectral variability [271].
These subsets were selected based on how much their
empirical statistical distribution deviated from an uncorrelated Gaussian distribution. The hyperspectral image
was also partitioned into spatial segments via a hierarchical clustering algorithm, and only one signature for each
spatial segment was considered for inclusion in the library.
Another strategy proposed to extract spectral signatures as
the image pixels that can best represent all others in the
observed image as a sparse linear combination [272]. Afterward, these signatures are grouped into material classes through spectral features derived from the slopes of a
piecewise linear approximation of each signature.
Auxiliary libraries available a priori were used to aid in
the extraction of image-based spectral libraries in [172].
The k-nearest-neighbor algorithm was first employed to

classify the image pixels in the different material classes,
using library spectra known a priori as training data. This
led to a set of candidate EMs for each material class. Based
on the classification results, the image-extracted library
was defined as the average spectra of the candidate EMs
of each class that were contained in a spectral neighborhood of each of the training samples (from its corresponding class) [172]. Another group of approaches makes use of
the empirical observation that pure pixels are more likely
to be contained in spatially homogeneous regions. Spectral
libraries can be constructed by restricting EM candidates to
sufficiently homogeneous regions [273], [274] by applying
an image oversegmentation strategy before the pure pixel
extraction [275] and by considering EM candidates as the
average of homogeneous regions obtained from a coarse
spatial scale selected from a multiscale image decomposition [276]. These strategies should be applied with care to
avoid the inclusion of pixels extracted from mixed homogeneous regions in the library.

FIGURE 15. Approaches to generate spectral libraries: (a) image-based library generation using EM extraction (discussed in the “Image-Based Library Construction” section), (b) the spatial interpolation of pure pixels extracted from the image at known locations (the “Spatial Interpolation of Endmember Signatures” section), and (c) the generation of synthetic signatures from physics-based models (the “Generating Spectral Libraries From Physics Models” section).
[Diagram summary: (a) observed image, extract a set of pure pixels or candidate EMs, cluster the extracted signatures; (b) observed image, extract pure pixels and their spatial locations, synthesize EMs for the other spatial locations; (c) physical model (expert knowledge), select different sets of physical parameters θ1, θ2,..., θk (e.g., chlorophyll concentration), synthesize EM signatures f(θ1), f(θ2),..., f(θk) using the parametric model f(θ). All three paths lead to spectral libraries.]

Some alternatives tried to build spectral libraries by using
different forms of the matrix factorization of the hyperspectral image. For instance, spectral libraries for each material
class are constructed in [193] by learning sparse representations of sets of pure pixels of each material, which are extracted from the observed image. More precisely, dictionary
learning is applied to the pure pixels of each material, from
which the resulting basis matrices are used to construct the
spectral library. Another approach proposed to extract the
spectral library using the results of an SU procedure employing a matrix factorization approach, which does not account
for spectral variability [277]. However, besides depending
on the results of another SU algorithm, there is no guarantee
that the selected signatures are pure pixels.
GENERATING SPECTRAL LIBRARIES FROM PHYSICS MODELS
PHYSICS-BASED LIBRARY SYNTHESIS
Physics-based library synthesis has the following qualities:
◗◗ It can generate libraries independently of the observed image.
◗◗ It can represent a wide range of spectral variability if more complex
models are employed.
◗◗ It depends on the availability of an accurate physical model for the
EM spectra.

An alternative approach to generate spectral libraries
that does not depend on the observed hyperspectral image is to employ a physics-based (i.e., radiative transfer)
model describing the reflectance of the EMs as a function
of physicochemical parameters. This enables us to generate different instances of the material spectra to constitute a synthetic library by sampling the free parameters of
the model. Examples of such representation include the
PROSPECT model [69] for vegetation and Hapke’s [48] and
Shkuratov’s models [196] for packed particle spectra. Different representations inspired by physics have been employed to generate and augment spectral libraries for SU in
many applications, which include describing tree canopy
as a function of its height and radius [279], fire temperature radiance as a function of the view and solar geometry
and atmospheric conditions [280], and soil reflectance as
a function of moisture content [73]. This strategy has also
been applied to generate training data for the SU of binary
mixtures of vegetation and impervious materials through
machine learning algorithms [281]–[284]. Note, however,
that directly sampling all parameters of complex models,
such as PROSPECT, might lead to a very large number of
signatures. This has motivated strategies to sample the
parametric models more efficiently and to remove redundant spectra from the generated library [285].
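To make the growth in library size concrete, the following sketch evaluates a parametric model on the Cartesian product of per-parameter grids. The function `toy_reflectance` is a hypothetical stand-in for a real radiative transfer model such as PROSPECT (it is not part of any library), and the grids are arbitrary illustrative values.

```python
import itertools
import math

def toy_reflectance(depth, center, wavelengths):
    # Hypothetical stand-in for f(theta): a flat 0.5 reflectance with one
    # Gaussian absorption band of relative depth `depth` centered at `center` (um).
    return [0.5 * (1.0 - depth * math.exp(-((w - center) / 0.05) ** 2))
            for w in wavelengths]

def synthesize_library(model, grids, wavelengths):
    # One synthetic signature per point of the Cartesian product of the grids.
    return [model(*theta, wavelengths) for theta in itertools.product(*grids)]

wavelengths = [0.4 + 0.01 * i for i in range(200)]  # 200 bands starting at 0.4 um
depths = [0.1, 0.3, 0.5, 0.7]
centers = [0.6, 0.7, 0.8]
library = synthesize_library(toy_reflectance, [depths, centers], wavelengths)
print(len(library))  # 4 * 3 = 12 signatures
```

With m values for each of d parameters, exhaustive gridding yields m^d signatures, which is what motivates more efficient sampling and the removal of redundant spectra.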
Despite their advantages, a significant drawback of
these methods is the requirement for accurate knowledge
of the physical process governing the observation of the
reflectance of the materials by the sensor. A different approach attempted to circumvent this issue by proposing a
data augmentation strategy, where one wishes to synthesize
additional signatures to be included in small, preexisting libraries [286]. The spectral signatures in the library are used
as training data to learn the statistical distribution of the
EMs through deep generative models, such as variational
autoencoders and deep neural networks. This enables one
to sample new signatures from the learned distributions to
augment the existing library.
SPATIAL INTERPOLATION OF ENDMEMBER SIGNATURES
SPATIAL ENDMEMBER INTERPOLATION
For spatial EM interpolation, the following is true:
◗◗ The process uses the hypothesis of spatially correlated EM signatures.
◗◗ It needs knowledge of the spatial position of pure pixels in the scene.
◗◗ The number of available pure pixels can strongly affect the performance of the methods.

A number of approaches based on the assumption that
EMs are spatially correlated have proposed to synthesize
pixel-dependent EM signatures based on a set of pure
pixels at known spatial locations through interpolation
techniques. Many of these works aim to perform the SU
of vegetation and soil mixtures by employing vegetation
indices (i.e., spectral features given by ratios of band differences, such as the NDVI) in lieu of traditional EMs. For
instance, the spatial interpolation of vegetation and soil
NDVIs based on linear regression has been considered
for the SU of coarse-resolution images, where the training samples for the EMs were obtained using classification
maps from complementary high-resolution images available a priori [287].
A similar strategy considered the use of spatially
weighted kriging employing as training samples pure pixels that were manually extracted from the scene [288] or
obtained by randomly sampling the vertices of the simplex obtained by a low-dimensional projection of the
hyperspectral image [289]. This strategy enabled one to
weight the contribution of the training samples according
to their spatial distance to each interpolated signature.
Other works considered the spatial interpolation of actual spectral signatures, instead of just vegetation and soil
indices, by using spatially weighted linear regression or
kriging. This has been performed by considering training
data obtained from complementary high-resolution classification maps [290] and pure pixels extracted from the
image inside subregions that were selected with the aid of
a classification algorithm [291].
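The idea of weighting training samples by their spatial distance can be illustrated with a short sketch. Note that it uses inverse-distance weighting, a much simpler technique that is swapped in here for the kriging and spatially weighted regression used in the cited works; the function name, the `power` exponent, and the toy data are assumptions made for illustration.

```python
import numpy as np

def idw_interpolate_ems(coords, spectra, shape, power=2.0, eps=1e-9):
    """coords: (K, 2) row/column positions of pure pixels of one material;
    spectra: (K, L) signatures at those positions; shape: (rows, cols).
    Returns a (rows, cols, L) array with one interpolated signature per pixel."""
    coords = np.asarray(coords, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)   # (N, 2)
    dist = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)  # (N, K)
    # Closer pure pixels receive larger weights; weights sum to one per pixel.
    weights = 1.0 / (dist + eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ spectra).reshape(shape[0], shape[1], -1)

# Two pure pixels at the ends of a 1 x 5 strip.
ems = idw_interpolate_ems([(0, 0), (0, 4)], [[1.0, 0.0], [0.0, 1.0]], (1, 5))
```

At the known locations the interpolated signature reproduces the training spectrum, and halfway between them it is their average.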
LIBRARY PRUNING TECHNIQUES
One significant problem with many SU methods based
on spectral libraries, such as MESMA, is that their computational complexity increases quickly with the size of
the spectral library. Furthermore, databases containing
laboratory-acquired spectra often contain hundreds of different materials. Using a library of this size can actually decrease the performance of SU since the problem becomes
more and more ill-posed. One solution consists of removing redundant and irrelevant signatures from large spectral
libraries before the SU process. These approaches, which
are also called library pruning, have been largely applied to
reduce the complexity and improve the accuracy of MESMA [38] and sparse unmixing algorithms [292]. There are
three main groups of library pruning techniques. Library
reduction approaches remove only redundant signatures
to improve the computation time. EM selection techniques
identify which materials are present in each hyperspectral
image pixel to remove absent EM classes from the library
before SU. Same-class library pruning attempts to identify
and remove signatures that are acquired in different conditions from those of the observed image. These approaches
are reviewed in detail in the following.
LIBRARY REDUCTION TECHNIQUES
SPECTRAL LIBRARY REDUCTION
Spectral library reduction involves the following:
◗◗ It employs very simple strategies that do not depend on the observed
hyperspectral image.
◗◗ It reduces computational complexity but does not improve the quality
of the SU results.

Library reduction techniques attempt to remove redundant spectral signatures, regardless of the observed hyperspectral image, which tends to reduce the computational
complexity of SU but not necessarily to improve its quality. A common
idea is to find a small set of signatures that best represents the remaining spectra of the same EM class according to some
criterion [293], such as the squared error [38], the average spectral angle [89], or a count-based EM selection metric, in which
one counts the number of signatures that a candidate
can represent with an error below a threshold [93]. An alternative method divided the library signatures into groups
according to their Euclidean norm, selecting one signature
from each group to explicitly account for brightness variations [294].
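A count-based selection of this kind can be sketched for the signatures of a single EM class as follows. This is a simplified greedy variant written for illustration, not the procedure of [93]: the spectral angle is assumed as the representation error, and `max_angle` and the function names are hypothetical.

```python
import numpy as np

def angle_matrix(spectra):
    """Pairwise spectral angles (radians) between the rows of `spectra`."""
    unit = spectra / np.linalg.norm(spectra, axis=1, keepdims=True)
    return np.arccos(np.clip(unit @ unit.T, -1.0, 1.0))

def count_based_reduction(spectra, max_angle=0.03):
    """Greedily keep the candidate that represents the most not-yet-covered
    signatures until every signature is represented. Returns kept indices."""
    spectra = np.asarray(spectra, dtype=float)
    covers = angle_matrix(spectra) <= max_angle  # covers[i, j]: i represents j
    uncovered = np.ones(len(spectra), dtype=bool)
    kept = []
    while uncovered.any():
        counts = (covers & uncovered).sum(axis=1)
        best = int(counts.argmax())
        kept.append(best)
        uncovered &= ~covers[best]
    return kept

a = np.array([1.0, 0.5, 0.2])
b = np.array([0.2, 0.5, 1.0])
spectra = np.vstack([a, a + [0.005, 0.0, 0.0], 0.8 * a, b, 1.1 * b])
kept = count_based_reduction(spectra)
```

In this toy class, three near-duplicates of one spectrum and two of another are reduced to one representative each.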

ENDMEMBER SELECTION METHODS
ENDMEMBER SELECTION
For EM selection, the following must be considered:
◗◗ It can remove only entire material classes from the library for each
pixel; it is also effective for variability-free SU.
◗◗ It can leverage information from the observed hyperspectral image.
◗◗ It can improve SU quality and reduce computational complexity.
◗◗ It usually depends on some sort of classification procedure.
◗◗ It relies on the observation that only a few materials are normally
contained in each pixel.

EM selection techniques attempt to identify which EM
classes are present in each pixel by using information, such
as classification maps [10], [295], to remove absent materials from the library and improve the unmixing results [10],
[295]. This relies on the observation that hyperspectral image pixels usually contain only a small number of materials, and it has also been applied to SU without considering
spectral variability [296], [297]. The simplest EM selection
methods use classification algorithms to select the EM classes
present in mixed pixels [298], [299]. Another work employed
a block sparse unmixing algorithm as a preprocessing step
to remove material classes with low abundance values from
the library for each image pixel before applying the MESMA
algorithm to obtain the final SU results [300]. A more elaborate approach proposed to semantically organize subsets of
material classes in a hierarchical tree, starting from a rough
(e.g., pervious and impervious) differentiation and progressing to a fine one between the EMs (e.g., different vegetation
species) [59]. Afterward, SU was performed at each level of
the tree, using the abundance results in the previous, coarser
level to constrain which EMs could be selected at the current
one (i.e., a pixel containing only a pervious EM in the coarse
scale cannot have a concrete EM in the finer one).
Some recent approaches have proposed to use external
complementary data to aid in identifying which materials
are present in each pixel. For instance, in [301], the hyperspectral image was divided into rural and urban subsets
using external data about road network density, which enabled the use of a separate set of EM classes for each of the
subsets. Another work proposed to use additional lidar data
to remove material classes from the library of each pixel,
based on its height distribution (e.g., a “tree” or “building”
EM can be removed from a pixel that has low height) [302].
PRUNING LIBRARIES WITHIN THE SAME CLASS
SAME-CLASS ENDMEMBER PRUNING
This process has the following qualities:
◗◗ It can remove spectral signatures from each material class that are not
representative of the observed hyperspectral image.
◗◗ It can improve the SU quality and reduce the computational cost, even
for libraries with few material classes.
◗◗ Identifying which signatures in the spectral library do not share acquisition conditions with the observed image is generally difficult.

Recent approaches proposed to remove signatures from
the library that were acquired in conditions that differed from
those of the hyperspectral image, keeping only signatures that
were representative of the observed image. However, measuring the representativeness of EM signatures is a difficult task.
A simple approach suggested removing signatures that have
a large spectral angle and spectral L1 distance relative to the
observed pixels [303]. However, this strategy might discard
relevant signatures in the presence of many mixed pixels. Another work proposed to compare only pure pixels extracted
from the image with the library spectra in the wavelet domain
[304]. A different method was to remove library elements that
have large distances to a small set of the leading eigenvectors
of the observed hyperspectral image and are thus unlikely to
be present therein [305]. This strategy eliminates the direct
need for pure pixels in the scene. It has also been successfully
applied for plant production system monitoring [306], and
it was later extended to consider a brightness normalization
preprocessing step and other strategies from the “Library Reduction Techniques” section to additionally remove redundant spectra [295]. Another work sought to perform library
pruning iteratively in a sparse unmixing formulation by removing signatures corresponding to low abundance values
during the SU process [307]. However, this directly depends
on the accuracy of the SU process at the first iterations.
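The eigenvector-based pruning idea can be sketched as follows: library spectra that lie far from the subspace spanned by the leading eigenvectors of the image are discarded. This is an illustrative approximation and not the algorithm of [305]; the uncentered pixel matrix, the relative residual as the distance, the threshold `max_residual`, and the function name are all assumptions.

```python
import numpy as np

def prune_by_subspace(image_pixels, library, n_components=3, max_residual=0.05):
    """image_pixels: (N, L); library: (M, L). Returns the indices of library
    spectra whose relative distance to the image subspace is small."""
    Y = np.asarray(image_pixels, dtype=float)
    D = np.asarray(library, dtype=float)
    # The leading right singular vectors of Y are the leading eigenvectors
    # of the (uncentered) correlation matrix Y^T Y.
    _, _, vt = np.linalg.svd(Y, full_matrices=False)
    basis = vt[:n_components]            # (k, L), orthonormal rows
    projected = D @ basis.T @ basis      # projection onto the image subspace
    residual = np.linalg.norm(D - projected, axis=1) / np.linalg.norm(D, axis=1)
    return np.flatnonzero(residual <= max_residual)

rng = np.random.default_rng(0)
e1 = np.array([0.9, 0.7, 0.2, 0.1, 0.1])
e2 = np.array([0.1, 0.2, 0.8, 0.9, 0.3])
e3 = np.array([0.1, 0.1, 0.1, 0.1, 0.9])   # material absent from the scene
abund = rng.dirichlet([1.0, 1.0], size=50)
image = abund @ np.vstack([e1, e2])         # noise-free mixtures of e1 and e2
kept = prune_by_subspace(image, np.vstack([e1, e2, e3]), n_components=2)
```

No pure pixel is required here: the subspace is estimated from mixed pixels only, yet the absent signature is still rejected.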
EXPERIMENTAL EVALUATION
This section presents a brief discussion about the experimental evaluation of the unmixing algorithms when spectral variability is considered. We first explore the generation
of synthetic data in detail. Afterward, software packages
that can be useful to practitioners are presented. Finally,
a tutorial-style simulation is presented to demonstrate the
use of a few of the SU techniques reviewed in this article,
after selection using the decision tree in Figure 2.
GENERATING SYNTHETIC DATA
One challenge in the evaluation of unmixing methods is
the lack of reliable ground truth data for the abundances
of real hyperspectral images. The difficulty in collecting
ground truth data is even more pronounced when EM variability is considered. Thus, being able to generate realistic
synthetic data (for which the true abundances are available)
turns out to be important to facilitate a quantitative evaluation of SU algorithms. The production of synthetic data can
be roughly divided into the following three steps:
1) generating synthetic abundances
2) generating EM signatures for each pixel in the image
3) applying the mixing model of choice (in our case, the
LMM) to generate the mixed image pixels.
We discuss each in the following.
GENERATING SYNTHETIC ABUNDANCE DATA
The generation of synthetic abundance maps can be performed in different ways. A simple strategy is to sample the
abundance values randomly from a Dirichlet distribution.
This approach enables one to control the number of pure
pixels in the image and can be useful when performing
Monte Carlo simulations in which large amounts of data
must be created. Another technique consists of introducing
spatial contextual information (i.e., pixels that are close in
space tend to have similar abundance values) into the generated abundances to create more realistic data. Such data can
be produced using, for instance, piecewise smooth images
sampled from a Gaussian random field [308]. This method
can generate images containing smooth regions, sharp transitions, and fine details and whose spatial composition and
regularity characteristics can be controlled by the user [308].
One software tool that can be used to generate abundance
maps according to Gaussian random fields is the Hyperspectral Imagery Synthesis tool for MATLAB (http://www.ehu.eus/ccwintco/uploads/f/fb/Synthesis.zip). Another way
to obtain realistic synthetic abundance maps is to consider
abundances obtained by applying an existing SU algorithm
on a real hyperspectral image [309]. The resulting abundance maps will have a realistic spatial distribution and can
be used as ground truth to generate new synthetic data sets.
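As a minimal sketch of the Dirichlet strategy, abundances can be drawn with NumPy's `Generator.dirichlet`. The function name and the single shared `concentration` parameter are illustrative assumptions; concentrations below one push samples toward the simplex vertices (more nearly pure pixels), while large values produce heavily mixed pixels.

```python
import numpy as np

def sample_abundances(n_pixels, n_materials, concentration=1.0, seed=0):
    """Draw an (N, P) abundance matrix from a symmetric Dirichlet distribution.
    Each row is nonnegative and sums to one."""
    rng = np.random.default_rng(seed)
    alpha = np.full(n_materials, concentration)
    return rng.dirichlet(alpha, size=n_pixels)

nearly_pure = sample_abundances(2000, 3, concentration=0.1)
highly_mixed = sample_abundances(2000, 3, concentration=10.0)
print(nearly_pure.max(axis=1).mean(), highly_mixed.max(axis=1).mean())
```

Reshaping the rows to the image grid gives abundance maps without spatial structure, which is the main limitation compared with the Gaussian random field approach.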
GENERATING SYNTHETIC ENDMEMBER VARIABILITY
Generating realistic EM variability data is not a simple task
since, as explained, the spectral signatures of the materials
depend in a complex way on different physicochemical and environmental parameters. Fortunately, very accurate radiative transfer models have been developed for
many applications. Such representations describe the physical processes governing, e.g., vegetation spectra [68], mineral interactions [48], [196], and atmospheric effects [310].
Well-calibrated radiative transfer models can be used to
generate realistic simulated image scenes that enable one to
simultaneously study nonlinear mixtures and EM variability effects. Experimental studies have found that the data
simulated using such models show very strong agreement
with reference ground truth information collected under
the same circumstances using ground-based spectral measurement setups [311], [312]. This approach was employed
to evaluate nonlinear unmixing models in [313]. Thus, well-calibrated radiative transfer models can be used to generate
realistic simulated hyperspectral data that enable us to develop, optimize, test, and compare different SU techniques
that consider EM variability.
Although complex ray tracing simulations can be considered (e.g., [48], [68], [196], [310], and [314]), here, we present
simplified models for illustrative purposes; the models describe variability present in vegetation spectra and caused by
different viewing geometries. The first is PROSPECT-D [76],
which represents vegetation leaf spectra as a function of,
e.g., the chlorophyll and dry matter content and the equivalent water thickness. PROSPECT-D and related models for
vegetation spectra can be downloaded for different software
platforms (http://teledetection.ipgp.jussieu.fr/prosail/) (including MATLAB and Python).
We also consider a simplification of Hapke’s model [48]
by assuming a Lambertian (isotropic) scattering and a densely
packed medium. This simplified representation describes variations in the reflectance spectra of a material $y_{\text{sensor}}$ (at each
wavelength) as a function of the viewing geometry [50], [52]:
$$y_{\text{sensor}} = \frac{\omega}{\left(1 + 2\mu_1\sqrt{1-\omega}\right)\left(1 + 2\mu_2\sqrt{1-\omega}\right)}, \quad (20)$$
where $\omega$ is the single scattering albedo of the material, and
$\mu_1$ (respectively, $\mu_2$) is the cosine of the angle between the incoming (resp. outgoing) radiation and the normal to the surface. This model enables us to generate different EM spectra
by varying the values of $\mu_1$ and $\mu_2$. While (20) is approximately linear for small albedo values, important nonlinearities occur for large albedo values [52].
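The simplified Hapke model in (20) is easy to evaluate numerically. The sketch below assumes the form given there, with the single scattering albedo in [0, 1] and the two cosines as scalar inputs; the function name, the albedo values, and the incidence angles are illustrative choices.

```python
import numpy as np

def hapke_reflectance(albedo, mu1, mu2):
    """Simplified Hapke reflectance for a single scattering albedo (scalar or
    per-band array) and the cosines of the incidence and emergence angles."""
    albedo = np.asarray(albedo, dtype=float)
    root = np.sqrt(1.0 - albedo)
    return albedo / ((1.0 + 2.0 * mu1 * root) * (1.0 + 2.0 * mu2 * root))

# One synthetic EM signature per incidence angle, for a fixed emergence angle.
albedo = np.linspace(0.2, 0.8, 5)                       # toy 5-band albedo
incidence_cosines = np.cos(np.deg2rad([10.0, 30.0, 50.0, 70.0]))
library = np.array([hapke_reflectance(albedo, mu1, 0.8) for mu1 in incidence_cosines])
```

Each row of `library` is one instance of the same material seen under a different geometry, which is exactly the kind of variability the synthetic data are meant to contain.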
Additionally, we explore variability introduced by errors occurring in a simple atmospheric compensation
model, where the reflectance of each pixel at each wavelength is obtained by dividing the corresponding pixel’s
radiance by the radiance observed at a perfectly reflective
calibration panel [271, Sec. 5(a)(1)], [315]. Assuming full
visibility and that the adjacency effect is negligible, this
model is given by
$$y_{\text{sensor}} = y_s\,\frac{E_{\text{sun-gr}}\,\mu_1 + E_{\text{sky}}}{E_{\text{sun-gr}}\,\mu_2 + E_{\text{sky}}}, \quad (21)$$
where $y_s$ and $y_{\text{sensor}}$ denote the reflectance at the ground and
the sensor, $E_{\text{sun-gr}}$ represents the solar radiance observed at
the ground level, and $E_{\text{sky}}$ is the skylight. Parameters $\mu_1$
and $\mu_2$ are the cosines of the angles between the surface
normal and the direction of the sun at each pixel and at the
calibration panel, respectively. By fixing $\mu_2$, $E_{\text{sun-gr}}$, and $E_{\text{sky}}$
a priori, $\mu_1$ can be varied to simulate spectral signatures at
different viewing geometries.
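A short sketch of the atmospheric compensation model in (21) follows. The function name and all numerical values (the toy ground reflectance, the radiance levels, and the angles) are hypothetical and chosen only to show how varying the pixel's sun-angle cosine produces a family of signatures.

```python
import numpy as np

def atmospheric_variation(y_s, mu_pixel, mu_panel, e_sun_gr, e_sky):
    """Reflectance at the sensor given the ground reflectance y_s, the cosines
    of the sun angle at the pixel and at the calibration panel, the solar
    radiance at ground level, and the skylight (scalars or per-band arrays)."""
    return y_s * (e_sun_gr * mu_pixel + e_sky) / (e_sun_gr * mu_panel + e_sky)

y_s = np.array([0.05, 0.08, 0.04, 0.02])    # toy ground reflectance, 4 bands
e_sun_gr, e_sky, mu_panel = 1.0, 0.2, 0.9   # fixed a priori (illustrative values)
pixel_cosines = np.cos(np.deg2rad([20.0, 40.0, 60.0]))
signatures = np.array([
    atmospheric_variation(y_s, mu, mu_panel, e_sun_gr, e_sky) for mu in pixel_cosines
])
```

When the pixel and the panel share the same orientation, the ratio equals one and the ground reflectance is recovered unchanged; otherwise the signature is scaled, with a band-dependent factor if the radiance terms vary with wavelength.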

GENERATING THE MIXED PIXELS
Finally, each pixel can be generated according to the spectral
variability-accommodating LMM described in (2), with the
EMs of each pixel (the columns of $M_n$) sampled randomly from
the set of synthetically generated signatures. Additive noise
can be introduced to obtain a desired signal-to-noise ratio.
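The mixing step can be sketched as follows, assuming the LMM with pixel-dependent EM matrices and white Gaussian noise. The function name is illustrative, and defining the signal-to-noise ratio as the ratio of mean signal power to noise variance is an assumption, since other conventions exist.

```python
import numpy as np

def mix_pixels(abundances, em_sets, snr_db=30.0, seed=0):
    """abundances: (N, P); em_sets: list of P arrays of shape (S_k, L) holding
    the synthetic signatures of each material.
    Returns the noisy pixels (N, L) and the per-pixel EM matrices (N, L, P)."""
    rng = np.random.default_rng(seed)
    A = np.asarray(abundances, dtype=float)
    n_pixels = A.shape[0]
    # For each material, draw one signature per pixel at random.
    M = np.stack(
        [np.asarray(s, dtype=float)[rng.integers(len(s), size=n_pixels)] for s in em_sets],
        axis=2,
    )
    clean = np.einsum("nlp,np->nl", M, A)   # y_n = M_n a_n for every pixel
    noise_power = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)
    noisy = clean + rng.normal(scale=np.sqrt(noise_power), size=clean.shape)
    return noisy, M

A = np.tile([0.2, 0.3, 0.5], (500, 1))
em_sets = [np.random.default_rng(k).uniform(0.1, 0.9, size=(4, 50)) for k in range(3)]
Y, M = mix_pixels(A, em_sets, snr_db=30.0)
```

Returning the per-pixel EM matrices alongside the image keeps the ground truth needed to evaluate estimated EMs later.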
SOFTWARE RESOURCES
Several software packages are available to perform SU with
spectral variability. Classical techniques, such as MESMA and
some of its alternatives (including library pruning and transformation methods), can be found in the Visualization and
Image Processing for Environmental Research tools software
package [316], which is available as a plug-in for well-established software, such as ENVI and the Quantum Geographic
Information System. An implementation of the MESMA algorithm is also available in R in the remote sensing toolbox.
Algorithms that were developed more recently, on the other
hand, are usually available only as stand-alone prototypes
implemented in MATLAB and Python. A list of software packages for some of the papers reviewed in this work (most of
which are found at the authors’ websites) is in Table 3. Also,
the OpenRemoteSensing website, which aims to share and
disseminate code and papers, has an increasing number of
SU methods, some of which consider spectral variability.

TABLE 3. COMPUTATIONAL CODE CONTAINING IMPLEMENTATIONS OF SOME OF THE WORKS REVIEWED IN THIS ARTICLE.
METHOD | LINK | LANGUAGE
METHODS THAT USE SPECTRAL LIBRARIES
MESMA [37], AAM [87] | https://sites.google.com/site/robheylenresearch/code/AAM.zip?attredirects=0&d=1 | MATLAB
SUnSAL [103], SUnSAL-TV [112] | http://www.lx.it.pt/~bioucas/publications.html | MATLAB
Sparse SU with mixed norms [84] | https://openremotesensing.net/knowledgebase/hyperspectral-image-unmixing-with-endmember-bundles-and-group-sparsity-inducing-mixed-norms/ | MATLAB
BAYESIAN METHODS
BCM [255] | https://github.com/GatorSense/BetaCompositionalModel | MATLAB
NCM-E (NCM by Eches et al.) [239] | http://olivier.eches.free.fr/research.html | MATLAB
UsGNCM [242] | https://sites.google.com/site/abderrahimhalimi/publications | MATLAB
Bayesian OU [254] | https://pthouvenin.github.io/robust-unmixing-plmm/ | MATLAB
PCOMMEND [247] | https://github.com/GatorSense/PCOMMEND | MATLAB
GMM [256] | https://github.com/zhouyuanzxcv/Hyperspectral | MATLAB
PARAMETRIC EM MODELS
ELMM [51] | https://openremotesensing.net/knowledgebase/spectral-variability-and-extended-linear-mixing-model/ | MATLAB
PLMM [217] | https://pthouvenin.github.io/unmixing-plmm/ | MATLAB
GLMM [212] | https://github.com/talesimbiriba/GLMM | MATLAB
DeepGUn [222] | https://github.com/ricardoborsoi/Unmixing_with_Deep_Generative_Models | MATLAB
MUA-SV [223] | https://github.com/ricardoborsoi/DataDependentSUvarRelease | MATLAB
OU [221] | https://pthouvenin.github.io/online-unmixing-plmm/ | MATLAB
EM-MODEL-FREE METHODS
RUSAL [227] | https://sites.google.com/site/abderrahimhalimi/publications | MATLAB
SULoRa [229] | https://sites.google.com/view/danfeng-hong/data-code | MATLAB
ALMM [228] | https://openremotesensing.net/knowledgebase/an-augmented-linear-mixing-model-to-address-spectral-variability-for-hyperspectral-unmixing/ | MATLAB
ULTRA-V [230] | https://github.com/talesimbiriba/ULTRA-V | MATLAB

The code was provided by its respective authors. AAM: alternating angle minimization; SUnSAL: sparse unmixing via variable splitting and augmented Lagrangian; TV: total variation;
BCM: beta compositional model; NCM-E: NCM published by Eches et al.; UsGNCM: unsupervised generalized NCM; OU: online unmixing; PCOMMEND: piecewise convex multiple-model
EM detection and SU; GMM: Gaussian mixture model; DeepGUn: deep generative unmixing; MUA-SV: multiscale unmixing algorithm accounting for spectral variability; RUSAL: robust
unmixing by variable splitting and augmented Lagrangian; SULoRa: subspace unmixing with low-rank attribute embedding; ALMM: augmented LMM; ULTRA-V: unmixing with low-rank
tensor regularization algorithm.

EXPERIMENTAL SETUP AND RESULTS
We now present a simulation to illustrate the application
of some of the algorithms reviewed in this work. Note that
this is merely illustrative and not a comprehensive performance evaluation. We generated a synthetic hyperspectral
image containing vegetation, soil, and water as constituent
materials. Spatially correlated abundances with 50 × 50
pixels were first sampled from a Gaussian random field.
Then, we followed the procedure described in the “Generating Synthetic Endmember Variability” section to generate different EM spectra for each material in the scene.
The PROSPECT-D model was used to produce vegetation
spectra, while the simplified Hapke and atmospheric models were used to create soil and water spectra, respectively,
at different viewing geometries. The generated synthetic
signatures, containing L = 198 bands, can be seen in Figure 16. The EMs contained in each pixel were randomly
sampled from this set of synthesized signatures, and the
pixel spectra were then generated following the LMM with
variability in (2), with white Gaussian noise added to the
image to obtain a signal-to-noise ratio of 30 dB.
To evaluate the SU results, we considered as quantitative
quality measures the root-mean-square error (RMSE) and
the SAM. The RMSE between two generic variables $\boldsymbol{X}$ and $\hat{\boldsymbol{X}}$ is defined as

$$\mathrm{RMSE}_X = \sqrt{\frac{1}{N_X}\,\bigl\|\boldsymbol{X} - \hat{\boldsymbol{X}}\bigr\|_F^2}, \tag{22}$$

where $\|\cdot\|_F$ is the Frobenius norm and $N_X$ denotes the number of elements in $\boldsymbol{X}$. We used the RMSE to evaluate the estimated abundances $\hat{\boldsymbol{A}}$, the reconstructed images $\hat{\boldsymbol{Y}}$, and the estimated pixel-dependent EMs $\hat{\boldsymbol{M}}_n$ (for the cases when this estimate was available). The SAM was also used to evaluate the estimated EMs as

$$\mathrm{SAM}_M = \frac{1}{LPN} \sum_{n=1}^{N} \sum_{p=1}^{P} \arccos\!\left(\frac{\boldsymbol{m}_{p,n}^{\top}\,\hat{\boldsymbol{m}}_{p,n}}{\|\boldsymbol{m}_{p,n}\|\,\|\hat{\boldsymbol{m}}_{p,n}\|}\right), \tag{23}$$

where N is the number of pixels and P the number of materials in the hyperspectral image. We assessed the complexity of the algorithms through their execution times, measured on an Intel Core i7 processor running at 4.2 GHz with 16 GB of random-access memory. Finally, to increase the reliability of the results, we executed the simulation for 10 independent Monte Carlo realizations and reported the average values for all metrics.

FIGURE 16. Generated spectral signatures used in the synthetic hyperspectral image. (a) Vegetation. (b) Soil. (c) Water. [Axes: reflectance versus wavelength (µm).]

ALGORITHM SELECTION AND SETUP
For illustrative purposes, we considered the recovery of the abundance maps following four different paths in the decision tree in Figure 2 (selected according to the algorithm implementations available in Table 3). These included the following:
1) small spectral libraries extracted directly from the image, with no expert knowledge available
• less user supervision: MESMA and its variants [37]
• lower computational cost: sparse unmixing (fractional sparse SU [84])
2) spectral libraries not available a priori
• less user supervision: Bayesian methods [NCM published by Eches et al. (NCM-E) [239] and beta compositional model (BCM) [255], [317]]
• lower computational cost: parametric models [ELMM [51] and deep generative unmixing (DeepGUn) [222]] and EM-model-free methods [robust unmixing by variable splitting and augmented Lagrangian (RUSAL) [227]].
We additionally considered the FCLS solution as a baseline, using a single set of EMs extracted from the image using the vertex component analysis (VCA) algorithm [278]. The EMs extracted by VCA were also used as initialization and reference/mean signatures for some of the algorithms (ELMM, DeepGUn, RUSAL, and NCM-E). For MESMA and sparse SU, the spectral libraries were extracted from the observed image, as described in greater detail in the following section. The spectral libraries were also used to estimate the parameters of the beta distribution in the BCM. The regularization/tuning parameters of the algorithms (fractional sparse SU, ELMM, DeepGUn, and RUSAL) were manually adjusted to maximize the abundance reconstruction performance measured in an independent data set generated
following the specifications at the beginning of the “Experimental Setup and Results” section.
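For reference, the two quality measures used in this experiment can be sketched in a few lines. The function names and the `(N, L, P)` layout for the per-pixel EM matrices are assumptions of this sketch. The spectral angles are averaged over all pixel and material pairs here, which is also an assumption; rescale the result if a different normalizing constant is wanted.

```python
import numpy as np


def rmse(X, X_hat):
    """Root-mean-square error between two arrays of the same shape."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return np.sqrt(np.sum((X - X_hat) ** 2) / X.size)


def sam(M, M_hat):
    """Average spectral angle (radians) between true and estimated EMs.

    M, M_hat : (N, L, P) arrays holding one L-band signature per pixel and
    material (assumed layout).
    """
    M = np.asarray(M, dtype=float)
    M_hat = np.asarray(M_hat, dtype=float)
    inner = np.sum(M * M_hat, axis=1)                                # (N, P)
    norms = np.linalg.norm(M, axis=1) * np.linalg.norm(M_hat, axis=1)
    cosines = np.clip(inner / norms, -1.0, 1.0)                      # guard rounding
    return np.mean(np.arccos(cosines))
```

Because the angle ignores the magnitude of each signature, `sam` is insensitive to pure scaling differences, whereas `rmse` on the same EM matrices is not.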
LIBRARY EXTRACTION
To demonstrate the use of library-based SU methods in
practical settings, the spectral libraries used by MESMA
and fractional sparse SU were extracted directly from the
observed image. We used the method described in [267],
which consists of performing EM extraction (in this case,
using the VCA algorithm) in subsets of pixels randomly
sampled from the image. We extracted five sets of EMs,
using subsets of 500 pixels each (sampled with replacement). The library was kept small to prevent the inclusion
of redundant signatures and reduce the probability of selecting mixed pixels by mistake. As a byproduct, this kept
the computational complexity of methods such as MESMA
very low, while providing good experimental results. The
estimated signatures can be viewed in Figure 17. Although
the spectral variability in Figure 17 is less accentuated than
that of the true EMs in Figure 16, the estimated signatures
are good representatives of the materials in the scene. The
good performance of the library extraction method can be
explained by the presence of multiple pure pixels in the
synthetically generated abundance maps, as evident in the
first row of Figure 18.
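The subset-based extraction described above can be sketched as follows. Two assumptions should be noted: the successive projection algorithm (`spa`) is used here only as a simple pure-pixel stand-in for VCA, and the function `extract_library` and its parameters are hypothetical names for this sketch. The sketch also does not sort the extracted signatures into per-material bundles, which a complete library-construction pipeline would still need to do.

```python
import numpy as np


def spa(Y, P):
    """Successive projection algorithm: select P columns of Y (L, N) as EM candidates.

    Used here as a stand-in for VCA (assumption of this sketch).
    """
    R = np.array(Y, dtype=float)          # residual, copied so Y is untouched
    picked = []
    for _ in range(P):
        j = int(np.argmax(np.sum(R ** 2, axis=0)))   # column with largest residual norm
        picked.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                      # project out the chosen direction
    return Y[:, picked]


def extract_library(Y, P, extractor=spa, n_sets=5, subset_size=500, seed=0):
    """Run EM extraction on random pixel subsets and stack the results.

    Returns an (L, P * n_sets) array of image-derived signatures.
    """
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    sets = []
    for _ in range(n_sets):
        cols = rng.integers(0, N, size=subset_size)  # sampled with replacement
        sets.append(extractor(Y[:, cols], P))
    return np.concatenate(sets, axis=1)
```

With five sets and three materials, the resulting library holds 15 signatures, which is what keeps combinatorial methods such as MESMA cheap in this setting.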

DISCUSSION
The quantitative results are provided in Table 4, while the estimated abundance maps and EMs are depicted in Figures 18 and 19, respectively. Note that $\mathrm{RMSE}_M$ and $\mathrm{SAM}_M$ are not available for FCLS, RUSAL, NCM-E, and BCM since these algorithms do not estimate the spectral signatures of the EMs present in each pixel of the image. All methods that considered spectral variability led to better abundance reconstruction results than the FCLS baseline. In particular, the library-based methods (MESMA and fractional sparse SU) performed very well, likely because the image-extracted spectral library accurately represented the typical EM variability contained in the scene. Moreover, sparse SU with fractional norms performed similarly to, and slightly better than, MESMA.
The methods based on parametric EM models (ELMM and DeepGUn) also led to considerable improvements when compared to FCLS, especially considering that the EMs were estimated directly from the image. The EM-model-free method (RUSAL), which takes general variability and incorrect models into account, also provided an improvement over FCLS, albeit smaller when compared to ELMM and DeepGUn. However, the sensitivity of these techniques to the selection of the regularization parameters can negatively impact their performance when Monte Carlo simulations are considered.
Among the Bayesian methods (NCM-E and BCM), BCM
provided a considerable performance improvement over
FCLS, especially when taking into account the unsupervised nature of the method (i.e., no parameter has to be
adjusted). The NCM-E results, on the other hand, were virtually identical to those of FCLS, which indicates that the
isotropic Gaussian EM hypothesis may not be appropriate
for this data set. The performance of the different methods
can be visually distinguished in Figure 18, especially from
the soil EM, in which the similarity between the reconstructions and the reference abundance maps reflects the
general behavior of the quantitative results from Table 4.
FIGURE 17. EM bundles extracted by batch VCA [267] for (a) vegetation, (b) soil, and (c) water. [Axes: reflectance versus wavelength (µm).]

FIGURE 18. Abundance maps estimated by the algorithms [values are mapped to colors ranging from blue (a = 0) to red (a = 1)]. [Columns: vegetation, soil, and water. Rows: reference, FCLS, MESMA, fractional sparse SU, ELMM, DeepGUn, RUSAL, NCM-E, and BCM.]

TABLE 4. QUANTITATIVE SIMULATION RESULTS.
METHOD       RMSE_A   RMSE_M   SAM_M   RMSE_Y   TIME (S)
FCLS         9.899    —        —       0.239    0.37
MESMA        6.083    0.504    0.234   0.159    4.9
Fractional   5.993    0.525    0.232   0.159    3.41
ELMM         8.695    0.697    0.56    0.127    28.84
DeepGUn      7.203    0.447    0.395   0.324    80.42
RUSAL        9.509    —        —       0.108    1.05
NCM-E        9.897    —        —       0.239    2,482.85
BCM          8.105    —        —       0.472    468.69
The RMSE results are multiplied by 10^4.

The EM reconstruction metrics in Table 4 indicate that the EMs selected by library-based approaches (MESMA and fractional sparse SU) are close to the reference ones, especially in terms of $\mathrm{SAM}_M$, while the model-based approaches (ELMM and DeepGUn) provided slightly worse results, in general, except for DeepGUn’s $\mathrm{RMSE}_M$. The visual assessment of the estimated signatures in Figure 19 shows an interesting pattern, since, despite the quantitative metrics, the amount of variability (i.e., the variance) estimated by the ELMM seems closer to the reference spectra. This shows that identifying the correct spectral signatures in each pixel is very difficult. We also note that smaller image REs ($\mathrm{RMSE}_Y$) did not correlate very well with better abundance estimation results. Since some SU methods that take spectral variability into account adopt flexible models, they can represent the hyperspectral image pixels in $\boldsymbol{Y}$ very
closely without necessarily improving the abundance estimation.
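The gap between reconstruction quality and abundance quality can be illustrated with a small numerical sketch. It is not one of the algorithms evaluated here: it simply compares an unconstrained least-squares fit on a fixed EM matrix with a fit on the same matrix augmented by extra basis vectors (a hypothetical "flexible" model). Because the second model contains the first, its reconstruction error can never be larger, yet nothing forces its abundance error to be smaller. The function names and the dictionary layout are assumptions of this sketch.

```python
import numpy as np


def fit_ls(Y, D):
    """Least-squares coefficients of each column of Y (L, N) on dictionary D (L, K)."""
    C, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return C


def compare_models(Y, M, extra, A_true):
    """Compare reconstruction and abundance errors for a fixed-EM and a flexible fit.

    M     : (L, P) reference EM matrix.
    extra : (L, K) additional basis vectors that make the model more flexible.
    """
    P = M.shape[1]
    results = {}
    for name, D in (("lmm", M), ("flexible", np.hstack([M, extra]))):
        C = fit_ls(Y, D)
        results[name] = {
            "rmse_y": np.sqrt(np.mean((Y - D @ C) ** 2)),
            # Only the first P coefficients are read as abundances.
            "rmse_a": np.sqrt(np.mean((A_true - C[:P]) ** 2)),
        }
    return results
```

Printing both entries of the returned dictionary for a simulated scene shows the reconstruction error dropping for the flexible model, while the abundance error may move in either direction.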
The execution times differed considerably among the methods. Library-based approaches ran very fast (even MESMA) since the spectral library contained few signatures. This demonstrates that the construction of the library can significantly impact the runtime of these techniques. The methods based on parametric models (ELMM and DeepGUn) had intermediate execution times, while RUSAL was very fast. Bayesian methods took the longest to run, with NCM-E being significantly slower than the other techniques. Finally, we note that this example is merely illustrative and not an in-depth evaluation of these methods. Thus, their performance can be different for other data sets and scenarios.

DISCUSSION, CONCLUSIONS,
AND FUTURE DIRECTIONS
Significant advances have been made to mitigate spectral
variability in SU during the past decade, encompassing
experimental and theoretical contributions. Recent work
has, for instance, enabled spectral libraries to be directly
extracted from observed hyperspectral images, provided
more accurate and flexible models to represent the EMs
(e.g., in statistical and parametric methods), and included
different kinds of a priori external information to alleviate
the ill-posedness of the problem, such as the locally correlated characteristics of the EMs and abundances. This was
performed explicitly, by means of regularization approaches and in the definition of the statistical models, as well as
implicitly in the design of the algorithms (e.g., in local SU).
Other methods leveraged the spectral characteristics of EM
variability to design improved algorithms (e.g., in spectral
transformations and robust SU methods).
FIGURE 19. Spectral signatures returned by the algorithms that estimate the EM spectra for each image pixel. [Columns: vegetation, soil, and water. Panels show the reference signatures and those returned by MESMA, fractional sparse SU, ELMM, and DeepGUn. Horizontal axis: wavelength (µm).]

However, there is still a noticeable dependence between the quality of the unmixing solutions and the necessary amount of user supervision in the algorithms. Many recent techniques need considerable tuning to reach their full potential, with a significant portion of algorithm design being left to the user. The lack of more extensive data with reliable ground truths has also made the evaluation of the algorithms somewhat difficult. In the following, we detail some aspects that we think deserve further consideration:

◗ As discussed, one important research direction is to improve the robustness of the methods to the selection of their parameters and to develop informed adjustment methodologies. This could be performed, for instance, by leveraging metadata (e.g., external classification maps) that are available in many applications. This point applies to the majority of SU algorithms reviewed in this article and would make those methods more readily employable as out-of-the-box solutions in practical scenarios.
◗ Most SU algorithms that address spectral variability depend strongly on spectral libraries and reference EM signatures known a priori or extracted from the observed hyperspectral image. Improving the robustness of these methods to the selection of these data is important to guarantee a more reliable SU performance in practice.
◗ The vast majority of the work reviewed in this article uses the LMM to describe the interaction between incident light and the materials in the scene, even though nonlinear mixtures are common in many applications [207]. However, as shown in [211], a general nonlinear mixture model is closely related to a spatially varying version of the LMM, which indicates that linear unmixing with spectral variability is able to address the nonlinear unmixing problem to some extent. Nevertheless, the relationship between these two models deserves to be further investigated. In particular, deciding whether variations in the observed pixel spectra originate from spectral variability, nonlinear interactions, or slight abundance variations can be very difficult.
◗ One aspect that complicates the evaluation of SU methods is the lack of more extensive data with reliable ground truth. However, there is no clear approach to reliably collect ground truth for abundance values. This problem is more pronounced when spectral variability is considered. There is also no clearly agreed-upon protocol to generate realistic synthetic data. A larger, publicly available data set would strengthen the validation of the methods.
◗ Although many techniques have been proposed to model spectral variability, there is still a distinction between restrained models inspired by specific, concrete applications and mathematically flexible ones that aim for a more generic representation. Combining insight from practical applications with a mathematically thorough treatment may lead to improved ways to represent spectral variability in a given scene.
◗ Many of the methods discussed here rely, explicitly or implicitly, on the solution to complex, nonconvex optimization problems that are often solved only approximately to achieve a computationally tractable algorithm. Investigating the use of more reliable approaches to solve those problems can help to evaluate the potential accuracy of the models by reducing the influence of such approximations.
◗ Many algorithms (e.g., MESMA and some statistical approaches) are computationally expensive and do not scale very well for large images. Considering the large amount of data currently in need of processing, it is important to have fast alternatives to solve this problem.
◗ Traditional SU can be readily interpreted as a nonnegative matrix factorization problem. This enables us to understand many of the limitations of the SU problem as well as to identify conditions under which it can be solved exactly. However, such understanding is generally not available when EM variability is considered, except for the particular case of illumination-based spectral variability [216]. A deeper theoretical insight would be valuable to clearly define limiting conditions under
which this problem can or cannot be solved.
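To make the matrix factorization view of traditional SU mentioned in the last item concrete, the sketch below factors an image Y (bands by pixels) into a nonnegative EM matrix and nonnegative abundances using the classical multiplicative updates for the Frobenius-norm objective. It is a generic illustration rather than any method evaluated in this article: the function name, the random initialization, and the iteration count are assumptions, and the sum-to-one abundance constraint is deliberately left out.

```python
import numpy as np


def nmf_unmix(Y, P, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative Y (L, N) as M @ A with M (L, P) >= 0 and A (P, N) >= 0.

    Uses multiplicative updates; the abundance sum-to-one constraint is
    not enforced in this sketch.
    """
    rng = np.random.default_rng(seed)
    L, N = Y.shape
    M = rng.uniform(0.1, 1.0, size=(L, P))
    A = rng.uniform(0.1, 1.0, size=(P, N))
    errors = []
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative because it only rescales them.
        A *= (M.T @ Y) / (M.T @ M @ A + eps)
        M *= (Y @ A.T) / (M @ (A @ A.T) + eps)
        errors.append(np.linalg.norm(Y - M @ A))
    return M, A, errors
```

Different seeds generally lead to different factor pairs with similar reconstruction error, which is one way to see the ill-posedness discussed in this section.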
Initially motivated by Earth observation applications,
spectral variability is now considered one of the main
challenges of SU. Although we have already seen a wealth
of contributions from application- and theoretically oriented researchers, it is expected that the further exchange
of ideas between these two areas will help to advance the
field even further.
ACKNOWLEDGMENT
This work was supported in part by the Brazilian National
Council for Scientific and Technological Development.
AUTHOR INFORMATION
Ricardo Augusto Borsoi (raborsoi@gmail.com) received his
doctorate degree. He is with the Federal University of Santa
Catarina, Florianópolis, SC, 88040-900, Brazil. He is a Student Member of IEEE.
Tales Imbiriba (talesim@gmail.com) received his doctorate degree. He is a research scientist at Northeastern University, Boston, Massachusetts, 02115, USA.
José Carlos Moreira Bermudez (jbermudez@ieee.org)
received his Ph.D. degree in electrical engineering. He is a
professor at the Federal University of Santa Catarina, Florianópolis, SC, 88040-900, Brazil. He is a Senior Member
of IEEE.
Cédric Richard (cedric.richard@unice.fr) received his
Ph.D. degree. He is a full professor with the Laboratoire Lagrange, UMR CNRS 7293, Université Côte d’Azur, 06108
Nice CEDEX 2, France. He is a Senior Member of IEEE.
Jocelyn Chanussot (jocelyn.chanussot@grenoble-inp.fr)
received his Ph.D. degree. He is a professor at the University
Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble,
38000, France. He is a Fellow of IEEE.
Lucas Drumetz (lucas.drumetz@imt-atlantique.fr) received his Ph.D. degree from the University of Grenoble
Alpes, in 2016. He is an associate professor at IMT Atlantique,
UMR CNRS 6285 LabSTICC, Brest, 29238, France. He is a
Member of IEEE.
Jean-Yves Tourneret (jean-yves.tourneret@enseeiht.fr) received his Ph.D. degree. He is a professor at the Institut
National Polytechnique of Toulouse, Toulouse, France. He
is a Fellow of IEEE.
Alina Zare (azare@ece.ufl.edu) received her Ph.D. degree. She is a professor at the University of Florida, Gainesville, Florida, 32611, USA. She is a Senior Member of IEEE.
Christian Jutten (christian.jutten@grenoble-inp.fr)
received his Ph.D., in 1981, and his doctorate in physical sciences, in 1987. He is an emeritus professor at the
University Grenoble Alpes, GIPSA-lab, Grenoble, 38400,
France. He is a Fellow of IEEE.
REFERENCES
[1] T. Kouyama, Y. Yokota, Y. Ishihara, R. Nakamura, S. Yamamoto, and T. Matsunaga, “Development of an application scheme for the SELENE/SP lunar reflectance model for radiometric calibration of hyperspectral and multispectral sensors,” Planetary Space Sci., vol. 124, pp. 76–83, May 2016. doi: 10.1016/j.pss.2016.02.003.
[2] J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 1, no. 2, pp. 6–36, 2013. doi: 10.1109/MGRS.2013.2244672.
[3] D. Manolakis, “Detection algorithms for hyperspectral imaging applications,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 29–43, 2002. doi: 10.1109/79.974724.
[4] G. Lu and B. Fei, “Medical hyperspectral imaging: A review,” J. Biomed. Optics, vol. 19, no. 1, p. 010901, 2014. doi: 10.1117/1.JBO.19.1.010901.
[5] G. A. Shaw and H-h. K. Burke, “Spectral imaging for remote sensing,” Lincoln Laboratory J., vol. 14, no. 1, pp. 3–28, 2003.
[6] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, 2002. doi: 10.1109/79.974727.
[7] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 354–379, 2012. doi: 10.1109/JSTARS.2012.2194696.
[8] A. Zare and K. C. Ho, “Endmember variability in hyperspectral analysis: Addressing spectral variability during spectral unmixing,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 95–104, Jan. 2014. doi: 10.1109/MSP.2013.2279177.
[9] B. Somers, G. P. Asner, L. Tits, and P. Coppin, “Endmember variability in spectral mixture analysis: A review,” Remote Sens. Environ., vol. 115, no. 7, pp. 1603–1616, 2011. doi: 10.1016/j.rse.2011.03.003.
[10] F. García-Haro, S. Sommer, and T. Kemper, “A new tool for variable multiple endmember spectral mixture analysis (VMESMA),” Int. J. Remote Sens., vol. 26, no. 10, pp. 2135–2162, 2005. doi: 10.1080/01431160512331337817.
[11] L. Drumetz, J. Chanussot, and C. Jutten, “Variability of the endmembers in spectral unmixing: Recent advances,” in Proc. 8th IEEE Workshop on Hyperspectral Image Signal Process.: Evolution Remote Sens., Los Angeles, CA, Aug. 2016, pp. 1–5.
[12] L. Drumetz, J. Chanussot, and C. Jutten, “Variability of the endmembers in spectral unmixing,” in Hyperspectral Imaging (Data Handling in Science and Technology), J. M. Amigo, Ed. Amsterdam, The Netherlands: Elsevier, 2020, ch. 2.7, vol. 32, pp. 167–203.
[13] R. A. Borsoi et al., A Complete Toolbox for Spectral Unmixing with Spectral Variability. (version 1.0). Zenodo. [Online]. Available: http://doi.org/10.5281/zenodo.4659311
[14] J. Theiler, A. Ziemann, S. Matteoli, and M. Diani, “Spectral variability of remotely sensed target materials: Causes, models, and strategies for mitigation and robust exploitation,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 8–30, 2019. doi: 10.1109/MGRS.2019.2890997.

[15] M. K. Griffin and H-h. K. Burke, “Compensation of hyperspectral data for atmospheric effects,” Lincoln Lab. J., vol. 14, no. 1,
pp. 29–54, 2003.
[16] B.-C. Gao, M. J. Montes, C. O. Davis, and A. F. Goetz, “Atmospheric correction algorithms for hyperspectral remote sensing
data of land and ocean,” Remote Sens. Environ., vol. 113, pp.
S17–S24, Sept. 2009. doi: 10.1016/j.rse.2007.12.015.
[17] G. Healey and D. Slater, “Models and methods for automated
material identification in hyperspectral imagery acquired under unknown illumination and atmospheric conditions,” IEEE
Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2706–2717, 1999.
doi: 10.1109/36.803418.
[18] J. M. P. Nascimento and J. M. B. Dias, “Does independent component analysis play a role in unmixing hyperspectral data?”
IEEE Trans. Geosci. Remote Sens., vol. 43, no. 1, pp. 175–187, Jan.
2005. doi: 10.1109/TGRS.2004.839806.
[19] M. W. Matthew et al., “Atmospheric correction of spectral imagery: Evaluation of the FLAASH algorithm with AVIRIS data,” in
Proc. 31st Appl. Imagery Pattern Recogn. Workshop, Washington,
D.C., 2002, pp. 157–163.
[20] I. C. Lau, “Application of atmospheric correction to hyperspectral data: Comparisons of different techniques on Hymap
data,” in Proc. 12th Australasian Remote Sens. Photogrammetry
Conf. (ARSPC), Freemantle, Australia, 2004, pp. 1–15.
[21] C. Song, C. E. Woodcock, K. C. Seto, M. P. Lenney, and S. A. Macomber, “Classification and change detection using Landsat TM
data: When and how to correct atmospheric effects?,” Remote
Sens. Environ., vol. 75, no. 2, pp. 230–244, 2001. doi: 10.1016/S0034-4257(00)00169-3.
[22] M. K. Griffin, H. Burke, J. Vail, S. Adler-Golden, and M. Matthew, “Sensitivity of atmospheric compensation model retrievals to input parameter specification,” in Proc. AVIRIS Earth Sci.
Appl. Workshop, Pasadena, CA, 1999, pp. 99–17.
[23] R. Wilson, E. Milton, and J. M. Nield, “Spatial variability of the atmosphere over southern England, and its effect on scene-based atmospheric corrections,” Int. J. Remote
Sens., vol. 35, no. 13, pp. 5198–5218, 2014. doi: 10.1080/01431161.2014.939781.
[24] N. Bhatia, M.-D. Iordache, A. Stein, I. Reusen, and V. A. Tolpekin, “Propagation of uncertainty in atmospheric parameters
to hyperspectral unmixing,” Remote Sens. Environ., vol. 204,
pp. 472–484, Jan. 2018. doi: 10.1016/j.rse.2017.10.008.
[25] C. Bassani, C. Manzo, F. Braga, M. Bresciani, C. Giardino, and
L. Alberotanza, “The impact of the microphysical properties of
aerosol on the atmospheric correction of hyperspectral data in
coastal waters,” Atmos. Measure. Techn., vol. 8, no. 3, pp. 1593–
1604, 2015. doi: 10.5194/amt-8-1593-2015.
[26] Y. J. Kaufman, G. P. Gobbi, and I. Koren, “Aerosol climatology using a tunable spectral variability cloud screening of
AERONET data,” Geophys. Res. Lett., vol. 33, no. 7, 2006. doi:
10.1029/2005GL025478.
[27] D. Schläpfer, A. Hueni, and R. Richter, “Cast shadow detection to quantify the aerosol optical thickness for atmospheric correction of high spatial resolution optical imagery,” Remote Sens., vol. 10, no. 2, p. 200, 2018. doi: 10.3390/rs10020200.


[28] Y. Kaufman and B. Holben, “Calibration of the AVHRR visible
and near-IR bands by atmospheric scattering, ocean glint and
desert reflection,” Int. J. Remote Sens., vol. 14, no. 1, pp. 21–52,
1993. doi: 10.1080/01431169308904320.
[29] N. F. Larsen and K. Stamnes, “Use of shadows to retrieve water vapor in hazy atmospheres,” Appl. Opt., vol. 44, no. 32, pp.
6986–6994, 2005. doi: 10.1364/AO.44.006986.
[30] L. Markelin et al., “Atmospheric correction performance of hyperspectral airborne imagery over a small eutrophic lake under
changing cloud cover,” Remote Sens., vol. 9, no. 1, p. 2, 2016.
doi: 10.3390/rs9010002.
[31] K. Staenz, J. Secker, B.-C. Gao, C. Davis, and C. Nadeau, “Radiative transfer codes applied to hyperspectral data for the retrieval of surface reflectance,” ISPRS J. Photogrammetry Remote
Sens., vol. 57, no. 3, pp. 194–203, 2002. doi: 10.1016/S0924-2716(02)00121-1.
[32] R. J. Murphy, S. T. Monteiro, and S. Schneider, “Evaluating
classification techniques for mapping vertical geology using field-based hyperspectral sensors,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3066–3080, 2012. doi: 10.1109/TGRS.2011.2178419.
[33] A. Plaza, P. Martínez, R. Pérez, and J. Plaza, “A quantitative
and comparative analysis of endmember extraction algorithms
from hyperspectral data,” IEEE Trans. Geosci. Remote Sens.,
vol. 42, no. 3, pp. 650–663, 2004. doi: 10.1109/TGRS.2003.820314.
[34] N. Keshava, J. Kerekes, D. Manolakis, and G. Shaw, “Algorithm
taxonomy for hyperspectral unmixing,” in Proc. Algorithms Multispectral, Hyperspectral, Ultraspectral Imagery VI, Orlando, FL,
2000, vol. 4049, pp. 42–63.
[35] K. McGwire, T. Minor, and L. Fenstermaker, “Hyperspectral
mixture modeling for quantifying sparse vegetation cover
in arid environments,” Remote Sens. Environ., vol. 72, no. 3,
pp. 360–374, 2000. doi: 10.1016/S0034-4257(99)00112-1.
[36] J. W. Boardman, “Automating spectral unmixing of AVIRIS data
using convex geometry concepts,” in Proc. 4th Annu. JPL Airborne
Geosci. Workshop, Jet Propulsion Lab., Pasadena, CA, 1993, pp. 11–14.
[37] D. A. Roberts, M. Gardner, R. Church, S. Ustin, G. Scheer, and
R. Green, “Mapping chaparral in the Santa Monica mountains
using multiple endmember spectral mixture models,” Remote
Sens. Environ., vol. 65, no. 3, pp. 267–279, 1998. doi: 10.1016/S0034-4257(98)00037-6.
[38] P. E. Dennison and D. A. Roberts, “Endmember selection for
multiple endmember spectral mixture analysis using endmember average RMSE,” Remote Sens. Environ., vol. 87, nos. 2–3,
pp. 123–135, 2003. doi: 10.1016/S0034-4257(03)00135-4.
[39] T. Roper and M. Andrews, “Shadow modelling and correction
techniques in hyperspectral imaging,” Electron. Lett., vol. 49,
no. 7, pp. 458–460, 2013. doi: 10.1049/el.2012.4406.
[40] Q. Zhang, V. P. Pauca, R. J. Plemmons, and D. D. Nikic, “Detecting objects under shadows by fusion of hyperspectral and
LiDAR data: A physical model approach,” in Proc. 5th Workshop on Hyperspectral Image Signal Process.: Evolution Remote Sens.
(WHISPERS), Gainesville, FL, 2013, pp. 1–4.
[41] G. J. Fitzgerald, P. J. Pinter, D. J. Hunsaker, and T. R. Clarke,
“Multiple shadow fractions in spectral mixture analysis of a

260

[42]

[43]
[44]

[45]

[46]

[47]

[48]

[49]
[50]

[51]

[52]

[53]

[54]

[55]

[56]

cotton canopy,” Remote Sens. Environ., vol. 97, no. 4, pp. 526–
539, 2005. doi: 10.1016/j.rse.2005.05.020.
K. Choi and E. Milton, “An investigation into the properties of
the dark endmember in spectral feature space,” in Proc. 25th
IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Seoul, South Korea, 2005, pp. 25–29.
D. K. Lynch, “Shadows,” Appl. Opt., vol. 54, no. 4, pp. B154–
B164, 2015. doi: 10.1364/AO.54.00B154.
S. Adler-Golden, M. W. Matthew, G. P. Anderson, G. W. Felde,
and J. A. Gardner, “An algorithm for de-shadowing spectral imagery,” in Proc. 11th JPL Airborne Earth Sci. Workshop, Pasadena,
CA, 2000, pp. 1–8.
D. Schläpfer, R. Richter, and A. Damm, “Correction of shadowing in imaging spectroscopy data by quantification of the
proportion of diffuse illumination,” in Proc. 8th Imaging Spectroscopy Workshop (SIG-EARSeL), Nantes, France, 2013, pp. 8–10.
R. Richter, T. Kellenberger, and H. Kaufmann, “Comparison of
topographic correction methods,” Remote Sens., vol. 1, no. 3,
pp. 184–196, 2009. doi: 10.3390/rs1030184.
J. Feng, B. Rivard, and A. Sanchez-Azofeifa, “The topographic normalization of hyperspectral data: Implications for the
selection of spectral end members and lithologic mapping,”
Remote Sens. Environ., vol. 85, no. 2, pp. 221–231, 2003. doi:
10.1016/S0034-4257(03)00002-6.
B. Hapke, “Bidirectional reflectance spectroscopy, 1, Theory,”
J. Geophys. Res., vol. 86, no. B4, pp. 3039–3054, 1981. doi:
10.1029/JB086iB04p03039.
B. Hapke, Theory of Reflectance and Emittance Spectroscopy. Cambridge, U.K.: Cambridge Univ. Press, 1993.
R. Heylen, M. Parente, and P. Gader, “A review of nonlinear
hyperspectral unmixing methods,” IEEE J. Sel. Topics Appl. Earth
Observ. Remote Sens., vol. 7, no. 6, pp. 1844–1868, June 2014.
doi: 10.1109/JSTARS.2014.2320576.
L. Drumetz, M.-A. Veganzones, S. Henrot, R. Phlypo, J. Chanussot, and C. Jutten, “Blind hyperspectral unmixing using an
extended linear mixing model to address spectral variability,”
IEEE Trans. Image Process., vol. 25, no. 8, pp. 3890–3905, 2016.
doi: 10.1109/TIP.2016.2579259.
L. Drumetz, J. Chanussot, and C. Jutten, “Spectral unmixing:
A derivation of the extended linear mixing model from the
Hapke model,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 11,
pp. 1866–1870, 2020. doi: 10.1109/LGRS.2019.2958203.
B. Combal, and H. Isaka, “The effect of small topographic variations on reflectance,” IEEE Trans. Geosci. Remote Sens., vol. 40,
no. 3, pp. 663–670, 2002. doi: 10.1109/TGRS.2002.1000325.
M. Cochrane, “Using vegetation reflectance variability for
species level classification of hyperspectral data,” Int. J.
Remote Sens., vol. 21, no. 10, pp. 2075–2087, 2000. doi:
10.1080/01431160050021303.
J. Zhang, B. Rivard, A. Sánchez-Azofeifa, and K. Castro-Esau,
“Intra-and inter-class spectral variability of tropical tree species
at La Selva, Costa Rica: Implications for species identification
using HYDICE imagery,” Remote Sens. Environ., vol. 105, no. 2,
pp. 129–141, 2006. doi: 10.1016/j.rse.2006.06.010.
M. F. Baumgardner, L. F. Silva, L. L. Biehl, and E. R. Stoner,
“Reflectance properties of soils,” in Advances in Agronomy, N. C.
Brady, Ed. Amsterdam, The Netherlands: Elsevier, 1986, vol.
38, pp. 1–44.
[57] J. K. Crowley, “Visible and near-infrared spectra of carbonate
rocks: Reflectance variations related to petrographic texture
and impurities,” J. Geophys. Res., Solid Earth, vol. 91, no. B5, pp.
5001–5012, 1986. doi: 10.1029/JB091iB05p05001.
[58] R. N. Clark, “Spectroscopy of rocks and minerals, and principles of spectroscopy,” in Remote Sensing for the Earth Sciences:
Manual of Remote Sensing, A. N. Rencz, Ed. New York: Wiley,
1999, vol. 3, pp. 3–58.
[59] J. Franke, D. A. Roberts, K. Halligan, and G. Menz, “Hierarchical multiple endmember spectral mixture analysis (MESMA) of
hyperspectral imagery for urban environments,” Remote Sens.
Environ., vol. 113, no. 8, pp. 1712–1723, 2009. doi: 10.1016/j.rse.2009.03.018.
[60] J. C. Price, “How unique are spectral signatures?” Remote Sens.
Environ., vol. 49, no. 3, pp. 181–186, 1994. doi: 10.1016/0034-4257(94)90013-2.
[61] G. P. Asner, “Biophysical and biochemical sources of variability
in canopy reflectance,” Remote Sens. Environ., vol. 64, no. 3, pp.
234–253, 1998. doi: 10.1016/S0034-4257(98)00014-5.
[62] M. P. Ferreira, A. E. B. Grondona, S. B. A. Rolim, and Y. E.
Shimabukuro, “Analyzing the spectral variability of tropical
tree species using hyperspectral feature selection and leaf optical modeling,” J. Appl. Remote Sens., vol. 7, no. 1, p. 073502,
2013. doi: 10.1117/1.JRS.7.073502.
[63] P. Gong, R. Pu, and B. Yu, “Conifer species recognition: An exploratory analysis of in situ hyperspectral data,” Remote Sens.
Environ., vol. 62, no. 2, pp. 189–200, 1997. doi: 10.1016/S0034-4257(97)00094-1.
[64] P. Lukeš, P. Stenberg, M. Rautiainen, M. Mõttus, and K. M. Vanhatalo, “Optical properties of leaves and needles for boreal tree
species in Europe,” Remote Sens. Lett., vol. 4, no. 7, pp. 667–676,
2013. doi: 10.1080/2150704X.2013.782112.
[65] Z. Gao and L. Zhang, “Multi-seasonal spectral characteristics
analysis of coastal salt marsh vegetation in Shanghai, China,”
Estuarine, Coastal Shelf Sci., vol. 69, no. 1-2, pp. 217–224, 2006.
doi: 10.1016/j.ecss.2006.04.016.
[66] H. Schmidt and A. Karnieli, “Remote sensing of the seasonal
variability of vegetation in a semi-arid environment,” J. Arid
Environ., vol. 45, no. 1, pp. 43–59, 2000. doi: 10.1006/jare.1999.0607.
[67] M. Mõttus, M. Sulev, and L. Hallik, “Seasonal course of the
spectral properties of alder and birch leaves,” IEEE J. Sel. Topics
Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2496–2505,
2014. doi: 10.1109/JSTARS.2013.2294242.
[68] S. Jacquemoud and S. L. Ustin, “Leaf optical properties: A state
of the art,” in Proc. 8th Int. Symp. Phys. Measure. Signatures Remote Sens., Aussois, France, 2001, pp. 223–332.
[69] S. Jacquemoud and F. Baret, “PROSPECT: A model of leaf optical properties spectra,” Remote Sens. Environ., vol. 34, no. 2, pp.
75–91, 1990. doi: 10.1016/0034-4257(90)90100-Z.
[70] W. Verhoef, “Light scattering by leaf layers with application to
canopy reflectance modeling: The SAIL model,” Remote Sens.
Environ., vol. 16, no. 2, pp. 125–141, 1984. doi: 10.1016/0034-4257(84)90057-9.


[71] T. P. Dawson, P. J. Curran, and S. E. Plummer, “LIBERTY—modeling the effects of leaf biochemical concentration on reflectance spectra,” Remote Sens. Environ., vol. 65, no. 1, pp. 50–60,
1998. doi: 10.1016/S0034-4257(98)00007-8.
[72] D. B. Lobell and G. P. Asner, “Moisture effects on soil reflectance,” Soil Sci. Soc. Amer. J., vol. 66, no. 3, pp. 722–727, 2002.
doi: 10.2136/sssaj2002.0722.
[73] B. Somers, S. Delalieux, W. W. Verstraeten, and P. Coppin, “A
conceptual framework for the simultaneous extraction of subpixel spatial extent and spectral characteristics of crops,” Photogrammetric Eng. Remote Sens., vol. 75, no. 1, pp. 57–68, 2009.
doi: 10.14358/PERS.75.1.57.
[74] M. Sadeghi, S. B. Jones, and W. D. Philpot, “A linear physically-based model for remote sensing of soil moisture using short
wave infrared bands,” Remote Sens. Environ., vol. 164, pp. 66–76,
July 2015. doi: 10.1016/j.rse.2015.04.007.
[75] W. J. Wiscombe and S. G. Warren, “A model for the spectral
albedo of snow. I: Pure snow,” J. Atmos. Sci., vol. 37, no. 12,
pp. 2712–2733, 1980. doi: 10.1175/1520-0469(1980)037<2712:AMFTSA>2.0.CO;2.
[76] J.-B. Féret, A. Gitelson, S. Noble, and S. Jacquemoud, “PROSPECT-D: Towards modeling leaf optical properties through a
complete lifecycle,” Remote Sens. Environ., vol. 193, pp. 204–
215, May 2017. doi: 10.1016/j.rse.2017.03.004.
[77] R. Webster, P. Curran, and J. Munden, “Spatial correlation in reflected radiation from the ground and its implications for sampling and mapping by ground-based radiometry,” Remote Sens.
Environ., vol. 29, no. 1, pp. 67–78, 1989. doi: 10.1016/0034-4257(89)90079-5.
[78] E. Tola, K. Al-Gaadi, R. Madugundu, A. Zeyada, A. Kayad, and
C. Biradar, “Characterization of spatial variability of soil physicochemical properties and its impact on Rhodes grass productivity,” Saudi J. Biol. Sci., vol. 24, no. 2, pp. 421–429, 2017. doi:
10.1016/j.sjbs.2016.04.013.
[79] A. Najafian, M. Dayani, H. R. Motaghian, and H. Nadian,
“Geostatistical assessment of the spatial distribution of
some chemical properties in calcareous soils,” J. Integr. Agri.,
vol. 11, no. 10, pp. 1729–1737, 2012. doi: 10.1016/S2095-3119(12)60177-4.
[80] Y.-C. Wei, Y.-L. Bai, J.-Y. Jin, F. Zhang, L.-P. Zhang, and X.-Q.
Liu, “Spatial variability of soil chemical properties in the reclaiming marine foreland to Yellow Sea of China,” Agri. Sci.
China, vol. 8, no. 9, pp. 1103–1111, 2009. doi: 10.1016/S1671-2927(08)60318-1.
[81] J. Hou-Long et al., “Spatial variability of soil properties in a long-term tobacco plantation in Central China,”
Soil Sci., vol. 175, no. 3, pp. 137–144, 2010. doi: 10.1097/SS.0b013e3181d82176.
[82] Y. Yuan, Y. Feng, and X. Lu, “Projection-based NMF for hyperspectral unmixing,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2632–2643, 2015. doi: 10.1109/
JSTARS.2015.2427656.
[83] T. Uezato, M. Fauvel, and N. Dobigeon, “Hyperspectral unmixing with spectral variability using adaptive bundles and
double sparsity,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6,
pp. 3980–3992, 2019. doi: 10.1109/TGRS.2018.2889256.


[84] L. Drumetz, T. R. Meyer, J. Chanussot, A. L. Bertozzi, and C.
Jutten, “Hyperspectral image unmixing with endmember bundles and group sparsity inducing mixed norms,” IEEE Trans. Image Process., vol. 28, no. 7, pp. 3435–3450, 2019. doi: 10.1109/
TIP.2019.2897254.
[85] C. L. Lippitt, D. A. Stow, D. A. Roberts, and L. L. Coulter, “Multidate MESMA for monitoring vegetation growth forms in southern California shrublands,” Int. J. Remote Sens., vol. 39, no. 3,
pp. 655–683, 2018. doi: 10.1080/01431161.2017.1388936.
[86] S. Bernabe, F. D. Igual, G. Botella, M. Prieto-Matias, and A.
Plaza, “Parallel implementation of the multiple endmember
spectral mixture analysis algorithm for hyperspectral unmixing,” in Proc. High-Performance Comput. Remote Sens. V, 2015,
vol. 9646, p. 96460J. doi: 10.1117/12.2195120.
[87] R. Heylen, A. Zare, P. Gader, and P. Scheunders, “Hyperspectral
unmixing with endmember variability via alternating angle
minimization,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8,
pp. 4983–4993, 2016. doi: 10.1109/TGRS.2016.2554160.
[88] J.-P. Combe et al., “Analysis of OMEGA/Mars express data hyperspectral data using a multiple-endmember linear spectral
unmixing model (MELSUM): Methodology and first results,”
Planetary Space Sci., vol. 56, no. 7, pp. 951–975, 2008. doi:
10.1016/j.pss.2007.12.007.
[89] P. E. Dennison, K. Q. Halligan, and D. A. Roberts, “A comparison of error metrics and constraints for multiple endmember
spectral mixture analysis and spectral angle mapper,” Remote
Sens. Environ., vol. 93, no. 3, pp. 359–367, 2004. doi: 10.1016/j.rse.
2004.07.013.
[90] L. Tits, R. Heylen, B. Somers, P. Scheunders, and P. Coppin, “A
geometric unmixing concept for the selection of optimal binary
endmember combinations,” IEEE Geosci. Remote Sens. Lett., vol.
12, no. 1, pp. 82–86, 2015. doi: 10.1109/LGRS.2014.2326555.
[91] R. Mhenni, S. Bourguignon, J. Ninin, and F. Schmidt, “Spectral
unmixing with sparsity and structuring constraints,” in Proc.
9th Workshop on Hyperspectral Image and Signal Process., Evolution
Remote Sens., Amsterdam, The Netherlands, 2018, pp. 1–5.
[92] C. Song, “Spectral mixture analysis for subpixel vegetation
fractions in the urban environment: How to incorporate endmember variability?” Remote Sens. Environ., vol. 95, no. 2, pp.
248–263, 2005. doi: 10.1016/j.rse.2005.01.002.
[93] D. A. Roberts, P. E. Dennison, M. E. Gardner, Y. Hetzel, S. L.
Ustin, and C. T. Lee, “Evaluation of the potential of Hyperion
for fire danger assessment by comparison to the airborne visible/infrared imaging spectrometer,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 6, pp. 1297–1310, 2003. doi: 10.1109/
TGRS.2003.812904.
[94] K. Tan, X. Jin, Q. Du, and P. Du, “Modified multiple endmember spectral mixture analysis for mapping impervious surfaces
in urban environments,” J. Appl. Remote Sens., vol. 8, no. 1,
p. 085096, 2014. doi: 10.1117/1.JRS.8.085096.
[95] C. Zhang et al., “Mapping urban land cover types using
object-based multiple endmember spectral mixture analysis,” Remote Sens. Lett., vol. 5, no. 6, pp. 521–529, 2014. doi:
10.1080/2150704X.2014.930197.
[96] C. Zhang, “Multiscale quantification of urban composition
from EO-1/Hyperion data using object-based spectral unmixing,”
Int. J. Appl. Earth Observ. Geoinf., vol. 47, pp. 153–162, May
2016. doi: 10.1016/j.jag.2016.01.002.
[97] C. A. Bateson, G. P. Asner, and C. A. Wessman, “Endmember bundles: A new approach to incorporating endmember variability into spectral mixture analysis,” IEEE Trans.
Geosci. Remote Sens., vol. 38, no. 2, pp. 1083–1094, 2000. doi:
10.1109/36.841987.
[98] M. Petrou and P. G. Foschi, “Confidence in linear spectral unmixing of single pixels,” IEEE Trans. Geosci. Remote Sens., vol. 37,
no. 1, pp. 624–626, 1999. doi: 10.1109/36.739132.
[99] G. P. Asner and D. B. Lobell, “A biogeophysical approach for automated SWIR unmixing of soils and vegetation,” Remote Sens.
Environ., vol. 74, no. 1, pp. 99–112, 2000. doi: 10.1016/S0034-4257(00)00126-7.
[100] G. P. Asner and K. B. Heidebrecht, “Spectral unmixing
of vegetation, soil and dry carbon cover in arid regions:
Comparing multispectral and hyperspectral observations,”
Int. J. Remote Sens., vol. 23, no. 19, pp. 3939–3958, 2002. doi:
10.1080/01431160110115960.
[101] G. P. Asner, M. M. Bustamante, and A. R. Townsend, “Scale
dependence of biophysical structure in deforested areas bordering the Tapajos National Forest, Central Amazon,” Remote
Sens. Environ., vol. 87, no. 4, pp. 507–520, 2003. doi: 10.1016/j.
rse.2003.03.001.
[102] J. M. Bioucas-Dias and M. A. Figueiredo, “Alternating direction
algorithms for constrained sparse regression: Application to
hyperspectral unmixing,” in Proc. 2nd Workshop on Hyperspectral Image and Signal Process.: Evolution Remote Sens., Reykjavik,
Iceland, 2010, pp. 1–4.
[103] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Sparse
unmixing of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2014–2039, 2011. doi: 10.1109/
TGRS.2010.2098413.
[104] Z. Shi, W. Tang, Z. Duren, and Z. Jiang, “Subspace matching
pursuit for sparse unmixing of hyperspectral data,” IEEE Trans.
Geosci. Remote Sens., vol. 52, no. 6, pp. 3256–3274, 2014. doi:
10.1109/TGRS.2013.2272076.
[105] W. Tang, Z. Shi, and Y. Wu, “Regularized simultaneous forward–backward greedy algorithm for sparse unmixing of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 9,
pp. 5271–5288, 2014.
[106] Z. Shi, T. Shi, M. Zhou, and X. Xu, “Collaborative sparse hyperspectral unmixing using l0 norm,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5495–5508, 2018. doi: 10.1109/
TGRS.2018.2818703.
[107] X. Xu and Z. Shi, “Multi-objective based spectral unmixing for
hyperspectral images,” ISPRS J. Photogrammetry Remote Sens., vol.
124, pp. 54–69, Feb. 2017. doi: 10.1016/j.isprsjprs.2016.12.010.
[108] X. Xu, Z. Shi, and B. Pan, “ℓ0-based sparse hyperspectral unmixing using spectral information and a multi-objectives
formulation,” ISPRS J. Photogrammetry Remote Sens., vol. 141, pp.
46–58, July 2018. doi: 10.1016/j.isprsjprs.2018.04.008.
[109] X. Xu, Z. Shi, B. Pan, and X. Li, “A classification-based model
for multi-objective hyperspectral sparse unmixing,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 12, pp. 9612–9625, 2019. doi:
10.1109/TGRS.2019.2928021.

[110] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Collaborative sparse regression for hyperspectral unmixing,” IEEE Trans.
Geosci. Remote Sens., vol. 52, no. 1, pp. 341–354, 2014. doi:
10.1109/TGRS.2013.2240001.
[111] Y. Qian, S. Jia, J. Zhou, and A. Robles-Kelly, “Hyperspectral unmixing via L1/2 sparsity-constrained nonnegative matrix factorization,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp.
4282–4297, 2011.
[112] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Total variation spatial regularization for sparse hyperspectral unmixing,”
IEEE Trans. Geosci. Remote Sens., vol. 50, no. 11, pp. 4484–4502,
2012. doi: 10.1109/TGRS.2012.2191590.
[113] R. A. Borsoi, T. Imbiriba, J. C. M. Bermudez, and C. Richard, “A fast multiscale spatial regularization for sparse
hyperspectral unmixing,” IEEE Geosci. Remote Sens. Lett.,
vol. 16, no. 4, pp. 598–602, Apr. 2019. doi: 10.1109/
LGRS.2018.2878394.
[114] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral unmixing with sparse group lasso,” in Proc. IEEE Int.
Geosci. Remote Sens. Symp. (IGARSS), Vancouver, Canada, 2011,
pp. 3586–3589.
[115] X. Fu, W.-K. Ma, J. M. Bioucas-Dias, and T.-H. Chan, “Semiblind hyperspectral unmixing in the presence of spectral library
mismatches,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 9, pp.
5171–5184, 2016. doi: 10.1109/TGRS.2016.2557340.
[116] M. Berman et al., “A comparison between three sparse unmixing algorithms using a large library of shortwave infrared mineral spectra,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 6, pp.
3588–3610, 2017. doi: 10.1109/TGRS.2017.2676816.
[117] K. J. Guilfoyle, M. L. Althouse, and C.-I. Chang, “A quantitative
and comparative analysis of linear and nonlinear spectral mixture models using radial basis function neural networks,” IEEE
Trans. Geosci. Remote Sens., vol. 39, no. 10, pp. 2314–2318, 2001.
doi: 10.1109/36.957296.
[118] A. Baraldi, E. Binaghi, P. Blonda, P. A. Brivio, and A. Rampini,
“Comparison of the multilayer perceptron with neuro-fuzzy
techniques in the estimation of cover class mixture in remotely
sensed data,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 5,
pp. 994–1005, 2001. doi: 10.1109/36.921417.
[119] A. Okujeni, S. Van der Linden, B. Jakimow, A. Rabe, J. Verrelst,
and P. Hostert, “A comparison of advanced regression algorithms for quantifying urban land cover,” Remote Sens., vol. 6,
no. 7, pp. 6324–6346, 2014. doi: 10.3390/rs6076324.
[120] F. Bovolo, L. Bruzzone, and L. Carlin, “A novel technique for
subpixel image classification based on support vector machine,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2983–
2999, 2010. doi: 10.1109/TIP.2010.2051632.
[121] G. A. Licciardi and F. Del Frate, “Pixel unmixing in hyperspectral data by means of neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4163–4172, 2011. doi: 10.1109/
TGRS.2011.2160950.
[122] A. Okujeni, S. van der Linden, L. Tits, B. Somers, and P.
Hostert, “Support vector regression and synthetically mixed
training data for quantifying urban land cover,” Remote Sens.
Environ., vol. 137, no. 1, pp. 184–197, 2013. doi: 10.1016/j.
rse.2013.06.007.

[123] F. A. Mianji and Y. Zhang, “SVM-based unmixing-to-classification conversion for hyperspectral abundance quantification,”
IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4318–4327,
2011. doi: 10.1109/TGRS.2011.2166766.
[124] L. Wang, D. Liu, Q. Wang, and Y. Wang, “Spectral unmixing
model based on least squares support vector machine with
unmixing residue constraints,” IEEE Geosci. Remote Sens. Lett.,
vol. 10, no. 6, pp. 1592–1596, 2013. doi: 10.1109/LGRS.2013.
2262371.
[125] A. Okujeni, S. van der Linden, S. Suess, and P. Hostert, “Ensemble learning from synthetically mixed training data for
quantifying urban land cover with support vector regression,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol.
10, no. 4, pp. 1640–1650, 2017. doi: 10.1109/JSTARS.2016.
2634859.
[126] J. Rosentreter, R. Hagensieker, A. Okujeni, R. Roscher, P. D.
Wagner, and B. Waske, “Subpixel mapping of urban areas using
EnMAP data and multioutput support vector regression,” IEEE
J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 5, pp.
1938–1948, 2017. doi: 10.1109/JSTARS.2017.2652726.
[127] J. Plaza, A. Plaza, R. Perez, and P. Martinez, “On the use of
small training sets for neural network-based characterization
of mixed pixels in remotely sensed hyperspectral images,”
Pattern Recognition, vol. 42, no. 11, pp. 3032–3045, 2009. doi:
10.1016/j.patcog.2009.04.008.
[128] J. Plaza and A. Plaza, “Spectral mixture analysis of hyperspectral scenes using intelligently selected training samples,” IEEE
Geosci. Remote Sens. Lett., vol. 7, no. 2, pp. 371–375, 2010. doi:
10.1109/LGRS.2009.2036139.
[129] L. Wang and X. Jia, “Integration of soft and hard classifications
using extended support vector machines,” IEEE Geosci. Remote
Sens. Lett., vol. 6, no. 3, pp. 543–547, 2009.
[130] Y. Gu, S. Wang, and X. Jia, “Spectral unmixing in multiple-kernel Hilbert space for hyperspectral imagery,” IEEE Trans. Geosci.
Remote Sens., vol. 51, no. 7, pp. 3968–3981, 2013. doi: 10.1109/
TGRS.2012.2227757.
[131] X. Li, X. Jia, L. Wang, and K. Zhao, “On spectral unmixing resolution using extended support vector machines,” IEEE Trans.
Geosci. Remote Sens., vol. 53, no. 9, pp. 4985–4996, 2015. doi:
10.1109/TGRS.2015.2415587.
[132] X. Li, X. Jia, L. Wang, and K. Zhao, “Reduction of spectral unmixing uncertainty using minimum-class-variance support
vector machines,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 9,
pp. 1335–1339, 2016.
[133] T. Uezato, R. J. Murphy, A. Melkumyan, and A. Chlingaryan,
“A novel spectral unmixing method incorporating spectral
variability within endmember classes,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 5, pp. 2812–2831, 2016. doi: 10.1109/
TGRS.2015.2506168.
[134] T. Uezato, R. J. Murphy, A. Melkumyan, and A. Chlingaryan,
“Incorporating spatial information and endmember variability into unmixing analyses to improve abundance estimates,” IEEE Trans. Image Process., vol. 25, no. 12, pp. 5563–
5575, 2016.
[135] R. Heylen, D. Burazerovic, and P. Scheunders, “Non-linear
spectral unmixing by geodesic simplex volume maximization,”
IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 534–542, 2011.
doi: 10.1109/JSTSP.2010.2088377.
[136] B. Koirala, Z. Zahiri, A. Lamberti, and P. Scheunders, “Robust
supervised method for nonlinear spectral unmixing accounting for endmember variability,” IEEE Trans. Geosci. Remote Sens.,
2020. doi: 10.1109/TGRS.2020.3031012.
[137] X. Zhang, Y. Sun, J. Zhang, P. Wu, and L. Jiao, “Hyperspectral
unmixing via deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 11, pp. 1755–1759, 2018. doi:
10.1109/LGRS.2018.2857804.
[138] Y. Zeng, C. Ritz, J. Zhao, and J. Lan, “Attention-based residual
network with scattering transform features for hyperspectral
unmixing with limited training samples,” Remote Sens., vol. 12,
no. 3, p. 400, 2020. doi: 10.3390/rs12030400.
[139] B. Palsson, J. Sigurdsson, J. R. Sveinsson, and M. O. Ulfarsson,
“Hyperspectral unmixing using a neural network autoencoder,” IEEE Access, vol. 6, pp. 25,646–25,656, 2018. doi: 10.1109/
ACCESS.2018.2818280.
[140] Y. Su, J. Li, A. Plaza, A. Marinoni, P. Gamba, and S. Chakravortty, “DAEN: Deep autoencoder networks for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no.
7, pp. 4309–4321, 2019. doi: 10.1109/TGRS.2018.2890633.
[141] Y. Su, A. Marinoni, J. Li, J. Plaza, and P. Gamba, “Stacked nonnegative sparse autoencoders for robust hyperspectral unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 9, pp. 1427–
1431, 2018. doi: 10.1109/LGRS.2018.2841400.
[142] Y. Qu and H. Qi, “uDAS: An untied denoising autoencoder
with sparsity for spectral unmixing,” IEEE Trans. Geosci. Remote
Sens., vol. 57, no. 3, pp. 1698–1712, Mar. 2019. doi: 10.1109/
TGRS.2018.2868690.
[143] Y. Qian, F. Xiong, Q. Qian, and J. Zhou, “Spectral mixture model inspired network architectures for hyperspectral unmixing,”
IEEE Trans. Geosci. Remote Sens., vol. 58, no. 10, pp. 7418–7434,
2020. doi: 10.1109/TGRS.2020.2982490.
[144] J. Li, X. Li, B. Huang, and L. Zhao, “Hopfield neural network
approach for supervised nonlinear spectral unmixing,” IEEE
Geosci. Remote Sens. Lett., vol. 13, no. 7, pp. 1002–1006, 2016.
doi: 10.1109/LGRS.2016.2560222.
[145] S. Cooper, A. Okujeni, C. Jänicke, M. Clark, S. van der Linden,
and P. Hostert, “Disentangling fractional vegetation cover:
Regression-based unmixing of simulated spaceborne imaging
spectroscopy data,” Remote Sens. Environ., vol. 246, p. 111856,
Sept. 2020. doi: 10.1016/j.rse.2020.111856.
[146] Z. Mitraka, F. Del Frate, and F. Carbone, “Nonlinear spectral
unmixing of Landsat imagery for urban surface cover mapping,”
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 7,
pp. 3340–3350, 2016. doi: 10.1109/JSTARS.2016.2522181.
[147] A. Okujeni et al., “Generalizing machine learning regression
models using multi-site spectral libraries for mapping vegetation-impervious-soil fractions across multiple cities,” Remote
Sens. Environ., vol. 216, pp. 482–496, Oct. 2018. doi: 10.1016/j.
rse.2018.07.011.
[148] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 4, no. 2, pp. 22–40,
2016. doi: 10.1109/MGRS.2016.2540798.


[149] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote
Sens. Mag. (replaces Newslett.), vol. 5, no. 4, pp. 8–36, 2017. doi:
10.1109/MGRS.2017.2762307.
[150] J. Li, “Wavelet-based feature extraction for improved endmember abundance estimation in linear unmixing of hyperspectral
signals,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp.
644–649, 2004. doi: 10.1109/TGRS.2003.822750.
[151] C. Wu, “Normalized spectral mixture analysis for monitoring urban composition using ETM+ imagery,” Remote Sens.
Environ., vol. 93, no. 4, pp. 480–492, 2004. doi: 10.1016/j.
rse.2004.08.003.
[152] K. N. Youngentob, D. A. Roberts, A. A. Held, P. E. Dennison, X.
Jia, and D. B. Lindenmayer, “Mapping two eucalyptus subgenera using multiple endmember spectral mixture analysis and
continuum-removed imaging spectrometry data,” Remote Sens.
Environ., vol. 115, no. 5, pp. 1115–1128, 2011. doi: 10.1016/j.
rse.2010.12.012.
[153] J. Zhang, B. Rivard, and A. Sanchez-Azofeifa, “Derivative spectral unmixing of hyperspectral data applied to mixtures of lichen and rock,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 9,
pp. 1934–1940, 2004. doi: 10.1109/TGRS.2004.832239.
[154] P. Debba, E. J. Carranza, F. D. van der Meer, and A. Stein,
“Abundance estimation of spectrally similar minerals by using derivative spectra in simulated annealing,” IEEE Trans.
Geosci. Remote Sens., vol. 44, no. 12, pp. 3649–3658, 2006. doi:
10.1109/TGRS.2006.881125.
[155] X. Miao et al., “Estimation of yellow starthistle abundance
through CASI-2 hyperspectral imagery using linear spectral
mixture models,” Remote Sens. Environ., vol. 101, no. 3, pp.
329–341, 2006. doi: 10.1016/j.rse.2006.01.006.
[156] B. Somers, S. Delalieux, W. Verstraeten, J. Van Aardt, G. Albrigo, and P. Coppin, “An automated waveband selection
technique for optimized hyperspectral mixture analysis,”
Int. J. Remote Sens., vol. 31, no. 20, pp. 5549–5568, 2010. doi:
10.1080/01431160903311305.
[157] B. Somers and G. P. Asner, “Multi-temporal hyperspectral mixture analysis and feature selection for invasive species mapping
in rainforests,” Remote Sens. Environ., vol. 136, no. 1, pp. 14–27,
2013. doi: 10.1016/j.rse.2013.04.006.
[158] O. Ghaffari, M. J. V. Zoej, and M. Mokhtarzade, “Reducing the
effect of the endmembers’ spectral variability by selecting the
optimal spectral bands,” Remote Sens., vol. 9, no. 9, p. 884, 2017.
doi: 10.3390/rs9090884.
[159] Z. Tane, D. Roberts, S. Veraverbeke, Á. Casas, C. Ramirez, and
S. Ustin, “Evaluating endmember and band selection techniques for multiple endmember spectral mixture analysis using post-fire imaging spectroscopy,” Remote Sens., vol. 10, no. 3,
p. 389, 2018. doi: 10.3390/rs10030389.
[160] B. Somers and G. P. Asner, “Tree species mapping in tropical
forests using multi-temporal imaging spectroscopy: Wavelength adaptive spectral mixture analysis,” Int. J. Appl. Earth
Observ. Geoinf., vol. 31, pp. 57–66, Sept. 2014. doi: 10.1016/j.
jag.2014.02.006.
[161] B. Somers, S. Delalieux, J. Stuckens, W. Verstraeten, and P. Coppin, “A weighted linear spectral mixture analysis approach to
address endmember variability in agricultural production systems,” Int. J. Remote Sens., vol. 30, no. 1, pp. 139–147, 2009. doi:
10.1080/01431160802304625.
[162] B. Somers, J. Verbesselt, E. M. Ampe, N. Sims, W. W. Verstraeten, and P. Coppin, “Spectral mixture analysis to monitor
defoliation in mixed-aged Eucalyptus globulus Labill plantations in southern Australia using Landsat 5-TM and EO-1 Hyperion data,” Int. J. Appl. Earth Observ. Geoinf., vol. 12, no. 4, pp.
270–277, 2010. doi: 10.1016/j.jag.2010.03.005.
[163] B. Somers and G. P. Asner, “Invasive species mapping in Hawaiian rainforests using multi-temporal Hyperion spaceborne
imaging spectroscopy,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 6, no. 2, pp. 351–359, 2013. doi: 10.1109/
JSTARS.2012.2203796.
[164] B. Somers, S. Delalieux, W. W. Verstraeten, J. Verbesselt, S.
Lhermitte, and P. Coppin, “Magnitude-and shape-related
feature integration in hyperspectral mixture analysis to
monitor weeds in citrus orchards,” IEEE Trans. Geosci. Remote
Sens., vol. 47, no. 11, pp. 3630–3642, 2009. doi: 10.1109/
TGRS.2009.2024207.
[165] W. Krippner, S. Bauer, and F. P. León, “Considering spectral
variability for optical material abundance estimation,” tm-Technisches Messen, vol. 85, no. 3, pp. 149–158, 2018. doi: 10.1515/
teme-2017-0053.
[166] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). New York: Springer-Verlag,
2006.
[167] C.-I. Chang and B. Ji, “Weighted abundance-constrained linear
spectral mixture analysis,” IEEE Trans. Geosci. Remote Sens., vol.
44, no. 2, pp. 378–388, 2006.
[168] J. Jin, B. Wang, and L. Zhang, “A novel approach based on Fisher discriminant null space for decomposition of mixed pixels
in hyperspectral imagery,” IEEE Geosci. Remote Sens. Lett., vol. 7,
no. 4, pp. 699–703, 2010. doi: 10.1109/LGRS.2010.2046134.
[169] B. D. Bue et al., “Leveraging in-scene spectra for vegetation species discrimination with MESMA-MDA,” ISPRS J. Photogrammetry Remote Sens., vol. 108, pp. 33–48, Oct. 2015. doi: 10.1016/j.
isprsjprs.2015.06.001.
[170] M. Liu, W. Yang, J. Chen, and X. Chen, “An orthogonal Fisher
transformation-based unmixing method toward estimating
fractional vegetation cover in semiarid areas,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 3, pp. 449–453, 2017. doi: 10.1109/
LGRS.2017.2648863.
[171] A. Jafari, R. Safabakhsh, and M. M. Ebadzadeh, “Endmember
orthonormal mapping in hyperspectral mixture analysis to
address endmember variability,” Earth Sci. Informatics, vol. 9,
no. 3, pp. 291–307, 2016. doi: 10.1007/s12145-016-0256-4.
[172] F. Xu, X. Cao, X. Chen, and B. Somers, “Mapping impervious
surface fractions using automated Fisher transformed unmixing,” Remote Sens. Environ., vol. 232, p. 111311, 2019. doi:
10.1016/j.rse.2019.111311.
[173] Q. Du, “Modified Fisher’s linear discriminant analysis for
hyperspectral imagery,” IEEE Geosci. Remote Sens. Lett., vol. 4,
no. 4, pp. 503–507, 2007. doi: 10.1109/LGRS.2007.900751.
[174] K. Canham, A. Schlamm, A. Ziemann, B. Basener, and D. Messinger, “Spatially adaptive hyperspectral unmixing,” IEEE Trans.
Geosci. Remote Sens., vol. 49, no. 11, pp. 4248–4262, 2011. doi:
10.1109/TGRS.2011.2169680.
[175] M. A. Goenaga, M. C. Torres-Madronero, M. Velez-Reyes,
S. J. Van Bloem, and J. D. Chinea, “Unmixing analysis of a
time series of Hyperion images over the Guánica dry forest in
Puerto Rico,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens.,
vol. 6, no. 2, pp. 329–338, 2013. doi: 10.1109/JSTARS.2012.2225096.
[176] M. Li, S. Zang, C. Wu, and Y. Deng, “Segmentation-based and
rule-based spectral mixture analysis for estimating urban imperviousness,” Adv. Space Res., vol. 55, no. 5, pp. 1307–1315,
2015. doi: 10.1016/j.asr.2014.12.015.
[177] H. Sun and A. Zare, “Map-guided hyperspectral image superpixel segmentation using proportion maps,” in Proc. 37th IEEE
Int. Geoscience and Remote Sensing Symp., Fort Worth, TX, 2017,
pp. 3751–3754.
[178] L. Drumetz et al., “Binary partition tree-based local spectral
unmixing,” in Proc. 6th Workshop on Hyperspectral Image Signal
Process.: Evolution Remote Sens., Lausanne, Switzerland, 2014,
pp. 1–4.
[179] M. A. Veganzones, G. Tochon, M. Dalla-Mura, A. J. Plaza, and
J. Chanussot, “Hyperspectral image segmentation using a new
spectral unmixing-based binary partition tree representation,”
IEEE Trans. Image Process., vol. 23, no. 8, pp. 3574–3589, 2014.
doi: 10.1109/TIP.2014.2329767.
[180] C. Deng and C. Wu, “A spatially adaptive spectral mixture
analysis for mapping subpixel urban impervious surface distribution,” Remote Sens. Environ., vol. 133, no. 1, pp. 62–70, 2013.
doi: 10.1016/j.rse.2013.02.005.
[181] C. Wu, C. Deng, and X. Jia, “Spatially constrained multiple
endmember spectral mixture analysis for quantifying subpixel
urban impervious surfaces,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1976–1984, 2014.
[182] S. Cao, Q. Yu, A. Sanchez-Azofeifa, J. Feng, B. Rivard, and Z. Gu,
“Mapping tropical dry forest succession using multiple criteria
spectral mixture analysis,” ISPRS J. Photogrammetry Remote Sens.,
vol. 109, pp. 17–29, Nov. 2015. doi: 10.1016/j.isprsjprs.2015.08.009.
[183] S. Mei, Q. Du, and M. He, “Equivalent-sparse unmixing
through spatial and spectral constrained endmember selection
from an image-derived spectral library,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2665–2675, 2015.
doi: 10.1109/JSTARS.2015.2403254.
[184] C. Deng, “Automated construction of multiple regional libraries for neighborhoodwise local multiple endmember unmixing,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no.
9, pp. 4232–4246, 2016. doi: 10.1109/JSTARS.2016.2541660.
[185] C. Deng, “Incorporating endmember variability into linear
unmixing of coarse resolution imagery: Mapping large-scale
impervious surface abundance using a hierarchically object-based spectral mixture analysis,” Remote Sens., vol. 7, no. 7, pp.
9205–9229, 2015.
[186] A. Robin, K. Cawse-Nicholson, A. Mahmood, and M. Sears,
“Estimation of the intrinsic dimension of hyperspectral images: Comparison of current methods,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2854–2861, 2015.
doi: 10.1109/JSTARS.2015.2432460.


[187] L. Drumetz et al., “Hyperspectral local intrinsic dimensionality,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 4063–
4078, 2016. doi: 10.1109/TGRS.2016.2536480.
[188] L. Drumetz, G. Tochon, M. A. Veganzones, J. Chanussot,
and C. Jutten, “Improved local spectral unmixing of hyperspectral data using an algorithmic regularization path
for collaborative sparse regression,” in Proc. IEEE Int. Conf.
Acoust., Speech Signal Process. (ICASSP), New Orleans,
2017, pp. 6190–6194.
[189] A. Zare, P. Gader, O. Bchir, and H. Frigui, “Piecewise convex
multiple-model endmember detection and spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5, pp. 2853–
2862, 2013. doi: 10.1109/TGRS.2012.2219058.
[190] D. T. Anderson and A. Zare, “Spectral unmixing cluster validity
index for multiple sets of endmembers,” IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens., vol. 5, no. 4, pp. 1282–1295, 2012.
doi: 10.1109/JSTARS.2012.2189556.
[191] A. Zare, P. Gader, T. Allgire, D. Dranishnikov, and R. Close,
“Bootstrapping for piece-wise convex endmember distribution detection,” in Proc. 4th Workshop on Hyperspectral Image
and Signal Process.: Evolution Remote Sens., Shanghai, China,
2012, pp. 1–4.
[192] A. Zare, O. Bchir, H. Frigui, and P. Gader, “Spatially-smooth
piece-wise convex endmember detection,” in Proc. 2nd Workshop on Hyperspectral Image and Signal Process.: Evolution Remote
Sens., Reykjavik, Iceland, 2010, pp. 1–4.
[193] A. Castrodad, Z. Xing, J. B. Greer, E. Bosch, L. Carin, and G.
Sapiro, “Learning discriminative sparse representations for
modeling, source separation, and mapping of hyperspectral
imagery,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp.
4263–4281, 2011. doi: 10.1109/TGRS.2011.2163822.
[194] M.-D. Iordache, A. Okujeni, S. van der Linden, J. Bioucas-Dias,
A. Plaza, and B. Somers, “A multi-measurement vector approach for endmember extraction in urban environments,” in
Proc. Image Inf. Mining Conf.: The Sentinels Era, Bucharest, Romania, 2014, pp. 1–4.
[195] C. Revel, Y. Deville, V. Achard, X. Briottet, and C. Weber, “Inertia-constrained pixel-by-pixel nonnegative matrix factorisation: A hyperspectral unmixing method dealing with intraclass variability,” Remote Sens., vol. 10, no. 11, p. 1706, 2018.
doi: 10.3390/rs10111706.
[196] Y. Shkuratov, L. Starukhina, H. Hoffmann, and G. Arnold, “A
model of spectral albedo of particulate surfaces: Implications
for optical properties of the moon,” Icarus, vol. 137, no. 2, pp.
235–246, 1999. doi: 10.1006/icar.1998.6035.
[197] P. E. Johnson, M. O. Smith, and J. B. Adams, “Simple algorithms for remote determination of mineral abundances and
particle sizes from reflectance spectra,” J. Geophys. Res., Planets,
vol. 97, no. E2, pp. 2649–2657, 1992. doi: 10.1029/91JE02504.
[198] R. Heylen and P. Gader, “Nonlinear spectral unmixing with
a linear mixture of intimate mixtures model,” IEEE Geosci.
Remote Sens. Lett., vol. 11, no. 7, pp. 1195–1199, 2014. doi:
10.1109/LGRS.2013.2288921.
[199] J. F. Mustard and C. M. Pieters, “Photometric phase functions
of common geologic minerals and applications to quantitative
analysis of mineral mixture reflectance spectra,” J. Geophys.
Res., Solid Earth, vol. 94, no. B10, pp. 13,619–13,634, 1989. doi:
10.1029/JB094iB10p13619.
[200] H. Shipman and J. B. Adams, “Detectability of minerals on desert alluvial fans using reflectance spectra,” J. Geophys. Res., Solid
Earth, vol. 92, no. B10, pp. 10,391–10,402, 1987. doi: 10.1029/JB092iB10p10391.
[201] J. F. Mustard, L. Li, and G. He, “Nonlinear spectral mixture
modeling of lunar multispectral data: Implications for lateral
transport,” J. Geophys. Res., Planets, vol. 103, no. E8, pp. 19,419–
19,425, 1998. doi: 10.1029/98JE01901.
[202] D. Dhingra, J. Mustard, S. Wiseman, M. Pariente, C. Pieters,
and P. Isaacson, “Non-linear spectral un-mixing using Hapke
modeling: Application to remotely acquired M3 spectra of spinel bearing lithologies on the moon,” in Proc. Lunar and Planetary Sci. Conf., 2011, vol. 42, p. 2431.
[203] M. Gilabert, F. García-Haro, and J. Melia, “A mixture modeling
approach to estimate vegetation parameters for heterogeneous
canopies in remote sensing,” Remote Sens. Environ., vol. 72, no.
3, pp. 328–345, 2000. doi: 10.1016/S0034-4257(99)00109-1.
[204] W. Song, X. Mu, G. Ruan, Z. Gao, L. Li, and G. Yan, “Estimating
fractional vegetation cover and the vegetation index of bare soil
and highly dense vegetation with a physically based method,”
Int. J. Appl. Earth Observ. Geoinf., vol. 58, pp. 168–176, 2017. doi:
10.1016/j.jag.2017.01.015.
[205] Q. Li, W. Luo, and F. Wang, “A PROSAIL-based spectral unmixing algorithm for solving vegetation spectral variability
problem,” in Proc. MIPPR 2017: Multispectral Image Acquisition,
Process., Anal., Xiangyang, China, 2018, vol. 10607, pp. 125–
130.
[206] K. M. Cannon and J. F. Mustard, “A Monte Carlo approach
to radiative transfer spectral unmixing,” in Proc. 48th Lunar Planetary Sci. Conf., The Woodlands, TX, 2017, pp. 1998–
1999.
[207] N. Dobigeon, J.-Y. Tourneret, C. Richard, J. C. M. Bermudez,
S. McLaughlin, and A. O. Hero, “Nonlinear unmixing of hyperspectral images: Models and algorithms,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 82–94, Jan. 2014. doi: 10.1109/MSP.2013.2279274.
[208] M. A. Veganzones et al., “A new extended linear mixing model
to address spectral variability,” in Proc. 6th Workshop on Hyperspectral Image Signal Process.: Evolution Remote Sens., Lausanne,
Switzerland, 2014, pp. 1–4.
[209] S. Henrot, J. Chanussot, and C. Jutten, “Dynamical spectral
unmixing of multitemporal hyperspectral images,” IEEE Trans.
Image Process., vol. 25, no. 7, pp. 3219–3232, 2016. doi: 10.1109/TIP.2016.2562562.
[210] G. Tochon, L. Drumetz, M. A. Veganzones, M. Dalla Mura, and
J. Chanussot, “From local to global unmixing of hyperspectral
images to reveal spectral variability,” in Proc. 8th Workshop on
Hyperspectral Image Signal Process.: Evolution Remote Sens., Los
Angeles, CA, 2016, pp. 1–5.
[211] L. Drumetz, B. Ehsandoust, J. Chanussot, B. Rivet, M. Babaie-Zadeh, and C. Jutten, “Relationships between nonlinear and
space-variant linear models in hyperspectral image unmixing,”
IEEE Signal Process. Lett., vol. 24, no. 10, pp. 1567–1571, 2017.
doi: 10.1109/LSP.2017.2747478.
[212] T. Imbiriba, R. A. Borsoi, and J. C. M. Bermudez, “Generalized
linear mixing model accounting for endmember variability,”
in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP),
Calgary, Canada, 2018, pp. 1862–1866.
[213] R. A. Borsoi, T. Imbiriba, and J. C. Moreira Bermudez, “Improved hyperspectral unmixing with endmember variability
parametrized using an interpolated scaling tensor,” in Proc.
IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Brighton,
U.K., 2019, pp. 2177–2181.
[214] R. A. Borsoi, T. Imbiriba, P. Closas, J. C. M. Bermudez, and C.
Richard, “Kalman filtering and expectation maximization for
multitemporal spectral unmixing,” IEEE Geosci. Remote Sens.
Lett., 2020. doi: 10.1109/LGRS.2020.3025781.
[215] R. A. Borsoi, T. Imbiriba, and J. C. M. Bermudez, “Superresolution for hyperspectral and multispectral image fusion
accounting for seasonal spectral variability,” IEEE Trans. Image Process., vol. 29, no. 1, pp. 116–127, 2020. doi: 10.1109/TIP.2019.2928895.
[216] L. Drumetz, J. Chanussot, C. Jutten, W.-K. Ma, and A. Iwasaki,
“Spectral variability aware blind hyperspectral image unmixing based on convex geometry,” IEEE Trans. Image Process., vol.
29, pp. 4568–4582, 2020. doi: 10.1109/TIP.2020.2974062.
[217] P.-A. Thouvenin, N. Dobigeon, and J.-Y. Tourneret, “Hyperspectral unmixing with spectral variability using a perturbed linear
mixing model,” IEEE Trans. Signal Process., vol. 64, no. 2, pp.
525–538, Feb. 2016. doi: 10.1109/TSP.2015.2486746.
[218] R. Arablouei, “Spectral unmixing with perturbed endmembers,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 194–
211, 2019. doi: 10.1109/TGRS.2018.2852745.
[219] Y.-R. Syu, C.-H. Lin, and C.-Y. Chi, “An outlier-insensitive
unmixing algorithm with spatially varying hyperspectral
signatures,” IEEE Access, vol. 7, pp. 15,086–15,101, 2019. doi:
10.1109/ACCESS.2018.2890278.
[220] J. Sigurdsson, M. O. Ulfarsson, J. R. Sveinsson, and J. M. Bioucas-Dias, “Sparse distributed multitemporal hyperspectral
unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 11,
pp. 6069–6084, 2017. doi: 10.1109/TGRS.2017.2720539.
[221] P.-A. Thouvenin, N. Dobigeon, and J.-Y. Tourneret, “Online unmixing of multitemporal hyperspectral images accounting for
spectral variability,” IEEE Trans. Image Process., vol. 25, no. 9, pp.
3979–3990, 2016. doi: 10.1109/TIP.2016.2579309.
[222] R. A. Borsoi, T. Imbiriba, and J. C. M. Bermudez, “Deep generative endmember modeling: An application to unsupervised
spectral unmixing,” IEEE Trans. Comput. Imag., vol. 6, pp. 374–
384, 2019. doi: 10.1109/TCI.2019.2948726.
[223] R. A. Borsoi, T. Imbiriba, and J. C. Moreira Bermudez, “A data
dependent multiscale model for hyperspectral unmixing
with spectral variability,” IEEE Trans. Image Process., vol. 29,
pp. 3638–3651, 2020. doi: 10.1109/TIP.2020.2963959.
[224] J. Chen, X. Jia, W. Yang, and B. Matsushita, “Generalization of subpixel analysis for hyperspectral data with flexibility in spectral similarity measures,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2165–2171, 2009. doi: 10.1109/TGRS.2008.2011432.
[225] L. Tits, W. De Keersmaecker, B. Somers, G. P. Asner, J. Farifteh,
and P. Coppin, “Hyperspectral shape-based unmixing to improve intra- and interclass variability for forest and agro-ecosystem monitoring,” ISPRS J. Photogrammetry Remote Sens., vol. 74,
pp. 163–174, Nov. 2012. doi: 10.1016/j.isprsjprs.2012.09.013.
[226] F. Kizel, M. Shoshany, N. S. Netanyahu, G. Even-Tzur, and J. A.
Benediktsson, “A stepwise analytical projected gradient descent
search for hyperspectral unmixing and its code vectorization,”
IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 4925–4943,
2017. doi: 10.1109/TGRS.2017.2692999.
[227] A. Halimi, J. M. Bioucas-Dias, N. Dobigeon, G. S. Buller, and S.
McLaughlin, “Fast hyperspectral unmixing in presence of nonlinearity or mismodeling effects,” IEEE Trans. Comput. Imag., vol.
3, no. 2, pp. 146–159, 2017. doi: 10.1109/TCI.2016.2631979.
[228] D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, “An augmented linear mixing model to address spectral variability for
hyperspectral unmixing,” IEEE Trans. Image Process., vol. 28,
no. 4, pp. 1923–1938, 2019. doi: 10.1109/TIP.2018.2878958.
[229] D. Hong and X. X. Zhu, “SULoRA: Subspace unmixing with
low-rank attribute embedding for hyperspectral data analysis,”
IEEE J. Sel. Topics Signal Process., vol. 12, no. 6, pp. 1351–1363,
2018. doi: 10.1109/JSTSP.2018.2877497.
[230] T. Imbiriba, R. A. Borsoi, and J. C. M. Bermudez, “Low-rank
tensor modeling for hyperspectral unmixing accounting for
spectral variability,” IEEE Trans. Geosci. Remote Sens., vol.
58, no. 3, pp. 1833–1842, 2020. doi: 10.1109/TGRS.2019.2949543.
[231] S. Moussaoui, C. Carteret, D. Brie, and A. Mohammad-Djafari,
“Bayesian analysis of spectral mixture data using Markov chain
Monte Carlo methods,” Chemometr. Intell. Lab. Syst., vol. 81, no.
2, pp. 137–148, 2006. doi: 10.1016/j.chemolab.2005.11.004.
[232] N. Dobigeon, S. Moussaoui, M. Coulon, J.-Y. Tourneret, and
A. O. Hero, “Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery,” IEEE Trans. Signal
Process., vol. 57, no. 11, pp. 4355–4368, 2009. doi: 10.1109/TSP.2009.2025797.
[233] D. Stein, “Application of the normal compositional model to
the analysis of hyperspectral imagery,” in Proc. IEEE Workshop
on Adv. Tech. Anal. Remotely Sens. Data, Greenbelt, MD, 2003,
pp. 44–51.
[234] M. T. Eismann and R. C. Hardie, “Stochastic spectral unmixing
with enhanced endmember class separation,” Appl. Opt., vol. 43,
no. 36, pp. 6596–6608, 2004. doi: 10.1364/AO.43.006596.
[235] L. Liu, B. Wang, and L. Zhang, “Decomposition of mixed pixels
based on Bayesian self-organizing map and Gaussian mixture
model,” Pattern Recog. Lett., vol. 30, no. 9, pp. 820–826, 2009.
doi: 10.1016/j.patrec.2008.05.026.
[236] L. Gao, L. Zhuang, and B. Zhang, “Region-based estimate of
endmember variances for hyperspectral image unmixing,”
IEEE Geosci. Remote Sens. Lett., vol. 13, no. 12, pp. 1807–1811,
2016. doi: 10.1109/LGRS.2016.2614101.
[237] B. Zhang, L. Zhuang, L. Gao, W. Luo, Q. Ran, and Q. Du, “PSOEM: A hyperspectral unmixing algorithm based on normal
compositional model,” IEEE Trans. Geosci. Remote Sens., vol. 52,
no. 12, pp. 7782–7792, 2014.
[238] Y. Ma et al., “Hyperspectral unmixing with Gaussian mixture
model and low-rank representation,” Remote Sens., vol. 11,
no. 8, p. 911, 2019. doi: 10.3390/rs11080911.
[239] O. Eches, N. Dobigeon, C. Mailhes, and J.-Y. Tourneret, “Bayesian estimation of linear mixtures using the normal compositional model. Application to hyperspectral imagery,” IEEE
Trans. Image Process., vol. 19, no. 6, pp. 1403–1413, 2010. doi:
10.1109/TIP.2010.2042993.
[240] H. Kazianka, M. Mulyk, and J. Pilz, “A Bayesian approach
to estimating linear mixtures with unknown covariance
structure,” Appl. Stat., vol. 38, no. 9, pp. 1801–1817, 2011. doi:
10.1080/02664763.2010.529879.
[241] H. Kazianka, “Objective Bayesian analysis for the normal compositional model,” Comput. Statist. Data Anal., vol. 56, no. 6,
pp. 1528–1544, 2012. doi: 10.1016/j.csda.2011.08.016.
[242] A. Halimi, N. Dobigeon, and J.-Y. Tourneret, “Unsupervised
unmixing of hyperspectral images accounting for endmember
variability,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 4904–
4917, 2015. doi: 10.1109/TIP.2015.2471182.
[243] O. Eches, N. Dobigeon, and J.-Y. Tourneret, “Estimating the
number of endmembers in hyperspectral images using the
normal compositional model and a hierarchical Bayesian algorithm,” IEEE J. Sel. Topics Signal Process., vol. 4, no. 3, pp. 582–
591, 2010. doi: 10.1109/JSTSP.2009.2038212.
[244] A. Jafari, M. M. Ebadzadeh, and R. Safabakhsh, “Independent
base vector representation to address endmember variability in
hyperspectral unmixing,” J. Indian Soc. Remote Sens., vol. 45,
no. 3, pp. 417–429, 2017. doi: 10.1007/s12524-016-0599-9.
[245] C. Puladas, K. Hossler, and J. N. Ash, “Sum-product unmixing
for hyperspectral analysis with endmember variability,” IEEE
Geosci. Remote Sens. Lett., vol. 15, no. 12, pp. 1–5, 2018. doi:
10.1109/LGRS.2018.2861577.
[246] L. Zhuang, B. Zhang, L. Gao, J. Li, and A. Plaza, “Normal
endmember spectral unmixing method for hyperspectral imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol.
8, no. 6, pp. 2598–2606, 2015. doi: 10.1109/JSTARS.2014.2360888.
[247] A. Zare and P. Gader, “PCE: Piecewise convex endmember
detection,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 6,
pp. 2620–2632, 2010. doi: 10.1109/TGRS.2010.2041062.
[248] A. Zare, P. Gader, and G. Casella, “Sampling piecewise convex
unmixing and endmember extraction,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 3, pp. 1655–1665, 2013. doi: 10.1109/TGRS.2012.2207905.
[249] F. Amiri and M. Kahaei, “A sparsity-based Bayesian approach
for hyperspectral unmixing using normal compositional model,” Signal, Image Video Process., vol. 12, no. 7, pp. 1361–1367,
2018. doi: 10.1007/s11760-018-1290-0.
[250] S. Zou and A. Zare, “Hyperspectral unmixing with endmember variability using partial membership latent Dirichlet allocation,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process.
(ICASSP), New Orleans, LA, 2017, pp. 6200–6204.
[251] Y. Zhou, A. Rangarajan, and P. D. Gader, “A spatial compositional model for linear unmixing and endmember uncertainty estimation,” IEEE Trans. Image Process., vol. 25, no. 12,
pp. 5987–6002, 2016. doi: 10.1109/TIP.2016.2618002.
[252] W. Luo, L. Gao, R. Zhang, A. Marinoni, and B. Zhang, “Bilinear
normal mixing model for spectral unmixing,” IET Image Process.,
vol. 13, no. 2, pp. 344–354, 2018. doi: 10.1049/iet-ipr.2018.5458.
[253] S.-Y. Yu, S. M. Colman, and L. Li, “BEMMA: A hierarchical
Bayesian endmember modeling analysis of sediment grain-size
distributions,” Math. Geosci., vol. 48, no. 6, pp. 723–741, 2016.
doi: 10.1007/s11004-015-9611-0.
[254] P.-A. Thouvenin, N. Dobigeon, and J.-Y. Tourneret, “A hierarchical Bayesian model accounting for endmember variability and
abrupt spectral changes to unmix multitemporal hyperspectral
images,” IEEE Trans. Comput. Imag., vol. 4, no. 1, pp. 32–45,
2018. doi: 10.1109/TCI.2017.2777484.
[255] X. Du, A. Zare, P. Gader, and D. Dranishnikov, “Spatial and
spectral unmixing using the beta compositional model,” IEEE
J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp.
1994–2003, 2014. doi: 10.1109/JSTARS.2014.2330347.
[256] Y. Zhou, A. Rangarajan, and P. D. Gader, “A Gaussian mixture
model representation of endmember variability in hyperspectral unmixing,” IEEE Trans. Image Process., vol. 27, no. 5, pp.
2242–2256, May 2018. doi: 10.1109/TIP.2018.2795744.
[257] A. Halimi, P. Honeine, and J. M. Bioucas-Dias, “Hyperspectral unmixing in presence of endmember variability, nonlinearity, or mismodeling effects,” IEEE Trans. Image Process.,
vol. 25, no. 10, pp. 4565–4579, 2016. doi: 10.1109/TIP.2016.2590324.
[258] P. Bosdogianni, M. Petrou, and J. Kittler, “Mixture models with
higher order moments,” IEEE Trans. Geosci. Remote Sens., vol. 35,
no. 2, pp. 341–353, 1997. doi: 10.1109/36.563273.
[259] M. Faraklioti and M. Petrou, “Illumination invariant unmixing
of sets of mixed pixels,” IEEE Trans. Geosci. Remote Sens., vol. 39,
no. 10, pp. 2227–2234, 2001. doi: 10.1109/36.957285.
[260] B. Somers, L. Tits, D. Roberts, and E. Wetherley, “Endmember
library approaches to resolve spectral mixing problems in remotely sensed data: Potential, challenges, and applications,”
in Data Handling in Science and Technology, C. Ruckebusch, Ed.
Amsterdam, The Netherlands: Elsevier, 2016, vol. 30, pp. 551–
577.
[261] S. Tompkins, J. F. Mustard, C. M. Pieters, and D. W. Forsyth,
“Optimization of endmembers for spectral mixture analysis,”
Remote Sens. Environ., vol. 59, no. 3, pp. 472–489, 1997. doi:
10.1016/S0034-4257(96)00122-8.
[262] E. B. Wetherley, D. A. Roberts, and J. P. McFadden, “Mapping
spectrally similar urban materials at sub-pixel scales,” Remote
Sens. Environ., vol. 195, no. 1, pp. 170–183, 2017. doi: 10.1016/j.rse.2017.04.013.
[263] M. A. Veganzones and M. Grana, “Endmember extraction
methods: A short review,” in Proc. Int. Conf. Knowledge-Based and
Intell. Inf. Eng. Syst., 2008, pp. 400–407.
[264] C. Quintano, A. Fernández-Manso, and D. A. Roberts, “Multiple endmember spectral mixture analysis (MESMA) to map
burn severity levels from Landsat images in Mediterranean
countries,” Remote Sens. Environ., vol. 136, pp. 76–88, Sept.
2013. doi: 10.1016/j.rse.2013.04.017.
[265] A. Bateson and B. Curtiss, “A method for manual endmember
selection and spectral unmixing,” Remote Sens. Environ., vol. 55,
no. 3, pp. 229–243, 1996. doi: 10.1016/S0034-4257(95)00177-8.
[266] S. Meerdink, J. Bocinsky, E. Wetherley, A. Zare, C. McCurley,
and P. Gader, “Developing spectral libraries using multiple target multiple instance adaptive cosine/coherence estimator,” in
Proc. 10th Workshop on Hyperspectral Imaging Signal Process.: Evolution Remote Sens., Yokohama, Japan, 2019, pp. 1–5.
[267] B. Somers, M. Zortea, A. Plaza, and G. P. Asner, “Automated
extraction of image-based endmember bundles for improved
spectral unmixing,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 396–408, 2012. doi: 10.1109/JSTARS.2011.2181340.
[268] C. Gao, Y. Li, and C.-I. Chang, “Finding endmember classes in hyperspectral imagery,” in Proc. Satell. Data Compression, Commun. Process. XI, vol. 9501, Baltimore, MD, 2015, p.
95010M.
[269] M. Xu, L. Zhang, B. Du, and L. Zhang, “An image-based
endmember bundle extraction algorithm using reconstruction error for hyperspectral imagery,” Neurocomputing, vol.
173, pp. 397–405, Jan. 2016. doi: 10.1016/j.neucom.2015.02.098.
[270] C. Andreou, D. Rogge, and R. Müller, “A new approach for
endmember extraction and clustering addressing inter- and
intra-class variability via multiscaled-band partitioning,” IEEE
J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp.
4215–4231, 2016. doi: 10.1109/JSTARS.2016.2519610.
[271] T. Uezato, R. J. Murphy, A. Melkumyan, and A. Chlingaryan, “A
novel endmember bundle extraction and clustering approach
for capturing spectral variability within endmember classes,”
IEEE Trans. Geosci. Remote Sens., vol. 54, no. 11, pp. 6712–6731,
2016. doi: 10.1109/TGRS.2016.2589266.
[272] J. Yin, C. Huang, X. Luo, and Q. Du, “Automatic endmember
bundle unmixing methodology for lunar regional area mineral mapping,” Icarus, vol. 319, pp. 349–362, Feb. 2019. doi:
10.1016/j.icarus.2018.09.005.
[273] M. Xu, L. Zhang, and B. Du, “An image-based endmember
bundle extraction algorithm using both spatial and spectral information,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2607–2617, 2015. doi: 10.1109/JSTARS.2014.2373491.
[274] Z. Hua, X. Li, and L. Zhao, “Endmember bundle extraction
based on pure pixel index and superpixel segmentation,”
in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2019,
pp. 2131–2134.
[275] X. Xu, J. Li, C. Wu, and A. Plaza, “Regional clustering-based
spatial preprocessing for hyperspectral unmixing,” Remote
Sens. Environ., vol. 204, pp. 333–346, Jan. 2018. doi: 10.1016/j.rse.2017.10.020.
[276] M. C. Torres-Madronero and M. Velez-Reyes, “Integrating spatial information in unsupervised unmixing of hyperspectral
imagery using multiscale representation,” IEEE J. Sel. Topics
Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1985–1993,
2014. doi: 10.1109/JSTARS.2014.2319261.
[277] C. Zhao, G. Zhao, and X. Jia, “Hyperspectral image unmixing based on fast kernel archetypal analysis,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 1, pp. 331–346,
2017. doi: 10.1109/JSTARS.2016.2606504.
[278] J. M. P. Nascimento and J. M. Bioucas-Dias, “Vertex Component Analysis: A fast algorithm to unmix hyperspectral data,”
IEEE Trans. Geosci. Remote Sens., vol. 43, no. 4, pp. 898–910, Apr.
2005. doi: 10.1109/TGRS.2005.844293.
[279] D. R. Peddle, F. G. Hall, and E. F. LeDrew, “Spectral mixture
analysis and geometric-optical reflectance modeling of boreal
forest biophysical structure,” Remote Sens. Environ., vol. 67, no. 3,
pp. 288–297, 1999. doi: 10.1016/S0034-4257(98)00090-X.
[280] P. E. Dennison, K. Charoensiri, D. A. Roberts, S. H. Peterson,
and R. O. Green, “Wildfire temperature and land cover modeling using hyperspectral data,” Remote Sens. Environ., vol. 100,
no. 2, pp. 212–222, 2006. doi: 10.1016/j.rse.2005.10.007.
[281] L. Yang, K. Jia, S. Liang, X. Wei, Y. Yao, and X. Zhang, “A robust
algorithm for estimating surface fractional vegetation cover
from Landsat data,” Remote Sens., vol. 9, no. 8, p. 857, 2017. doi:
10.3390/rs9080857.
[282] K. Jia et al., “Fractional vegetation cover estimation algorithm
for Chinese GF-1 wide field view data,” Remote Sens. Environ.,
vol. 177, no. 1, pp. 184–191, 2016. doi: 10.1016/j.rse.2016.02.019.
[283] A. Verger, F. Baret, and F. Camacho, “Optimal modalities for radiative transfer-neural network estimation of canopy biophysical characteristics: Evaluation over an agricultural area with
CHRIS/PROBA observations,” Remote Sens. Environ., vol. 115,
no. 2, pp. 415–426, 2011. doi: 10.1016/j.rse.2010.09.012.
[284] D. R. Peddle, “Integration of a geometric optical reflectance
model with an evidential reasoning image classifier for improved forest information extraction,” Canadian J. Remote Sens.,
vol. 25, no. 2, pp. 189–196, 1999. doi: 10.1080/07038992.1999.10874716.
[285] L. Tits, B. Somers, and P. Coppin, “The potential and limitations of a clustering approach for the improved efficiency of
multiple endmember spectral mixture analysis in plant production system monitoring,” IEEE Trans. Geosci. Remote Sens.,
vol. 50, no. 6, pp. 2273–2286, 2012. doi: 10.1109/TGRS.2011.2173696.
[286] R. A. Borsoi, T. Imbiriba, J. C. M. Bermudez, and C. Richard,
“Deep generative models for library augmentation in multiple
endmember spectral mixture analysis,” IEEE Geosci. Remote
Sens. Lett., 2020. doi: 10.1109/LGRS.2020.3007161.
[287] F. Maselli, “Definition of spatially variable spectral endmembers by locally calibrated multivariate regression analyses,” Remote Sens. Environ., vol. 75, no. 1, pp. 29–38, 2001. doi: 10.1016/S0034-4257(00)00153-X.
[288] B. Johnson, R. Tateishi, and T. Kobayashi, “Remote sensing of
fractional green vegetation cover using spatially-interpolated
endmembers,” Remote Sens., vol. 4, no. 9, pp. 2619–2634, 2012.
doi: 10.3390/rs4092619.
[289] W. Li and C. Wu, “A geostatistical temporal mixture analysis
approach to address endmember variability for estimating regional impervious surface distributions,” GISci. Remote Sens.,
vol. 53, no. 1, pp. 102–121, 2016. doi: 10.1080/15481603.2015.1118975.
[290] Z. Zhang, C. Liu, J. Luo, Z. Shen, and Z. Shao, “Applying spectral mixture analysis for large-scale sub-pixel impervious cover
estimation based on neighbourhood-specific endmember signature generation,” Remote Sens. Lett., vol. 6, no. 1, pp. 1–10,
2015. doi: 10.1080/2150704X.2014.996677.
[291] W. Li and C. Wu, “A geographic information-assisted temporal
mixture analysis for addressing the issue of endmember class
and endmember spectra variability,” Sensors, vol. 17, no. 3, p.
624, 2017. doi: 10.3390/s17030624.
[292] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Dictionary
pruning in sparse unmixing of hyperspectral data,” in Proc. 4th
Workshop on Hyperspectral Image and Signal Process.: Evolution Remote Sens., Shanghai, China, 2012, pp. 1–4.
[293] K. L. Roth, P. E. Dennison, and D. A. Roberts, “Comparing
endmember selection techniques for accurate mapping of
plant species and land cover using imaging spectrometer data,”
Remote Sens. Environ., vol. 127, pp. 139–152, Dec. 2012. doi:
10.1016/j.rse.2012.08.030.
[294] Y. Xu, J. Shi, and J. Du, “An improved endmember selection
method based on vector length for MODIS reflectance channels,” Remote Sens., vol. 7, no. 5, pp. 6280–6295, 2015. doi:
10.3390/rs70506280.
[295] J. Degerickx, A. Okujeni, M.-D. Iordache, M. Hermy, S. van der
Linden, and B. Somers, “A novel spectral library pruning technique for spectral unmixing of urban land cover,” Remote Sens.,
vol. 9, no. 6, p. 565, 2017. doi: 10.3390/rs9060565.
[296] D. M. Rogge, B. Rivard, J. Zhang, and J. Feng, “Iterative spectral unmixing for optimizing per-pixel endmember sets,” IEEE
Trans. Geosci. Remote Sens., vol. 44, no. 12, pp. 3725–3736,
2006. doi: 10.1109/TGRS.2006.881123.
[297] J. Bian et al., “Monitoring fractional green vegetation cover dynamics over a seasonally inundated alpine wetland using dense
time series HJ-1A/B constellation images and an adaptive endmember selection LSMM model,” Remote Sens. Environ., vol.
197, pp. 98–114, Aug. 2017. doi: 10.1016/j.rse.2017.05.031.
[298] S. Roessner, K. Segl, U. Heiden, and H. Kaufmann, “Automated
differentiation of urban surfaces based on airborne hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp.
1525–1532, 2001. doi: 10.1109/36.934082.
[299] Y. Deng and C. Wu, “Development of a class-based multiple
endmember spectral mixture analysis (C-MESMA) approach
for analyzing urban environments,” Remote Sens., vol. 8, no. 4,
p. 349, 2016. doi: 10.3390/rs8040349.
[300] F. Chen, K. Wang, and T. F. Tang, “Spectral unmixing using
a sparse multiple-endmember spectral mixture model,” IEEE
Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 5846–5861, 2016.
doi: 10.1109/TGRS.2016.2574331.
[301] T. Liu and X. Yang, “Mapping vegetation in an urban area with
stratified classification and multiple endmember spectral mixture analysis,” Remote Sens. Environ., vol. 133, pp. 251–264, June
2013. doi: 10.1016/j.rse.2013.02.020.
[302] J. Degerickx, D. A. Roberts, and B. Somers, “Enhancing the
performance of multiple endmember spectral mixture analysis
(MESMA) for urban land cover mapping using airborne LIDAR
data and band selection,” Remote Sens. Environ., vol. 221, no. 1,
pp. 260–273, 2019. doi: 10.1016/j.rse.2018.11.026.
[303] F. Fan and Y. Deng, “Enhancing endmember selection in multiple endmember spectral mixture analysis (MESMA) for urban impervious surface area mapping using spectral angle and
spectral distance parameters,” Int. J. Appl. Earth Observ. Geoinf.,
vol. 33, pp. 290–301, Dec. 2014. doi: 10.1016/j.jag.2014.06.011.
[304] K. D. Singh and D. Ramakrishnan, “A comparative study of
signal transformation techniques in automated spectral unmixing
of infrared spectra for remote sensing applications,”
Int. J. Remote Sens., vol. 38, no. 5, pp. 1235–1257, 2017. doi:
10.1080/01431161.2017.1280625.
[305] M.-D. Iordache, J. M. Bioucas-Dias, A. Plaza, and B. Somers,
“MUSIC-CSR: Hyperspectral unmixing via multiple signal
classification and collaborative sparse regression,” IEEE Trans.
Geosci. Remote Sens., vol. 52, no. 7, pp. 4364–4382, 2014. doi:
10.1109/TGRS.2013.2281589.
[306] M.-D. Iordache, L. Tits, J. M. Bioucas-Dias, A. Plaza, and B.
Somers, “A dynamic unmixing framework for plant production system monitoring,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 7, no. 6, pp. 2016–2034, 2014. doi: 10.1109/JSTARS.2014.2314960.
[307] X. Zhang et al., “Hyperspectral unmixing via low-rank representation with space consistency constraint and spectral library pruning,” Remote Sens., vol. 10, no. 2, p. 339, 2018. doi:
10.3390/rs10020339.
[308] B. Kozintsev, “Computations with Gaussian random fields,”
Ph.D. dissertation, Univ. of Maryland, College Park, 1999.
[309] Z. Hao, M. Berman, Y. Guo, G. Stone, and I. Johnstone, “Semirealistic simulations of natural hyperspectral scenes,” IEEE
J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp.
4407–4419, 2016. doi: 10.1109/JSTARS.2016.2580178.
[310] A. Berk et al., “MODTRAN5: A reformulated atmospheric band
model with auxiliary species and practical multiple scattering
options,” in Proc. Remote Sens. Clouds Atmos. IX, 2004, vol. 5571,
pp. 78–85.
[311] B. Somers, L. Tits, and P. Coppin, “Quantifying nonlinear spectral mixing in vegetated areas: Computer simulation model
validation and first results,” IEEE J. Sel. Topics Appl. Earth Observ.
Remote Sens., vol. 7, no. 6, pp. 1956–1965, 2013. doi: 10.1109/JSTARS.2013.2289989.
[312] B. Somers et al., “Nonlinear hyperspectral mixture analysis
for tree cover estimates in orchards,” Remote Sens. Environ.,
vol. 113, no. 6, pp. 1183–1193, 2009. doi: 10.1016/j.rse.2009.02.003.
[313] N. Dobigeon, L. Tits, B. Somers, Y. Altmann, and P. Coppin,
“A comparison of nonlinear mixing models for vegetated areas
using simulated and real hyperspectral data,” IEEE J. Sel. Topics
Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1869–1878,
2014. doi: 10.1109/JSTARS.2014.2328872.
[314] L. Tits, W. Delabastita, B. Somers, J. Farifteh, and P. Coppin,
“First results of quantifying nonlinear mixing effects in heterogeneous forests: A modeling approach,” in Proc. IEEE Int. Geosci.
Remote Sens. Symp., Munich, 2012, pp. 7185–7188.
[315] R. Ramakrishnan, J. Nieto, and S. Scheding, “Shadow compensation for outdoor perception,” in Proc. IEEE Int. Conf. Robot.
Automat. (ICRA), 2015, pp. 4835–4842.
[316] D. A. Roberts, K. Halligan, P. Dennison, K. Dudley, B. Somers,
and A. H. Crabbé. Viper Tools User Manual. (version 2.1), VIPER Lab., Univ. California Santa Barbara, Santa Barbara, CA.
92 pages.
[317] X. Du and A. Zare. Gatorsense/betacompositionalmodel: Initial
Release. (version 1.0). Zenodo. [Online]. Available: http://doi.org/10.5281/zenodo.2638288

WOMEN IN GRSS
SHAWN C. KEFAUVER AND HEATHER MCNAIRN

The Women in Engineering International Leadership Conference

As in previous years, the IEEE Geoscience and Remote Sensing Society (GRSS) Inspire, Develop, Empower, Advance (IDEA) Committee spent much of the
spring preparing to host some exciting events at
the International Geoscience and Remote Sensing Symposium (IGARSS). This year, however,
uncertainty hung in the air as we awaited decisions on
whether the largest annual GRSS event would be hybrid or virtual. Alas, the COVID-19 Delta variant
swept across the globe in what was for many the fifth or
sixth wave of coronavirus cases in the global pandemic,
and IGARSS moved to an all-virtual format.
The IDEA Committee held its IGARSS GRSS Diversity Fireside Chat on Friday, 9 July 2021. The Fireside
Chat was an opportunity to introduce ourselves and our
core activities, following up on our success with the rollout of the Down to Earth Podcast, hosted by Stephanie
Tumampos. We discussed our new and more inclusive
structure and introduced our activity leads and coleads
for IDEA’s core activities, including the Women Mentoring Women (WMW) program, our nascent Professional
Development Microgrants, Women in Africa, our Diversity, Equality, and Inclusion Surveys, and our increasing connections with the IEEE Women in Engineering
(WIE) group. We had lively discussions and considered
it a great success and an excellent kickoff to the second
fully virtual IGARSS conference.
The WIE has been doing something quite extraordinary, which the IDEA committee has had several opportunities to benefit from, and we’d like to share our
experiences. The WIE International Leadership Conference (ILC) also went all virtual this year, and with this
format we were able to sponsor attendance to the ILC for
four of our rising-star IDEA committee members: Margot Flemming and Victoria “Vicky” Vanthof, from our
WMW core activity, and Mary Immaculate Neh Fru and
Dr. Nkeiruka “Nke” Nneti Onyia, from our Women in
Africa core initiative.

Digital Object Identifier 10.1109/MGRS.2021.3122734
Date of current version: 14 January 2022

Flemming is a Ph.D. student focusing on geostatistical approaches to downscaling coarse-resolution snow
water equivalent estimates on a project with the government of the Northwest Territories in Canada (see Figure 1). She is a recent recipient of a Canada Graduate Scholarship (Doctoral) from the Natural Sciences and
Engineering Research Council of Canada, which supports her Ph.D. studies. As a
colead of WMW, Flemming shares that “it is very fulfilling and powerful to be involved in supporting women
in a male-dominated field connect and help each other
grow and succeed.”
Her favorite talk at the ILC was by Aisha Moore on
how to manage stress and incorporate self-care into your
professional life. Flemming expresses that “Being in academia, I often find that it becomes the norm to overwork yourself, and end in a burnout. I have seen it many
times with my peers and have also felt it myself. A main
takeaway from this session was that stress can be a positive thing, as long as it’s addressed in a healthy manner.”
One quote from Moore that resonated with Flemming
was “I don’t believe in stress-free living.” Regarding her
take-home message from the conference, Flemming
adds, “One thing I learned from this conference is to be
more confident in my work and my intelligence, which
is something I often struggle with. Hearing from so
many women who have overcome a plethora of challenges and followed their dreams helped me realize that
I too can do that if I just trust myself and believe in myself. Although it may be a slow journey, going forward,
I hope to incorporate this into my research and onto my
career following.”
FIGURE 1. Margot Flemming improving seasonal snow-monitoring approaches by incorporating satellite observations. (Source: Flemming.)

Vanthof is currently a Ph.D. candidate in geography at the University of Waterloo, also in Canada. She is working on remote sensing of surface-water resources to support water management for her dissertation thesis (see Figure 2). In 2019, Vanthof received the Hugh C. Morris Travel Fellowship. Vanthof reflects that “As a researcher you are always competing for awards, and during your Ph.D., you get exposed to writing grants and must learn to take rejection as a progress step and not as defeat. I applied for the fellowship primarily as a learning experience as it consisted of me developing my own research plan and budget, coordinated field work travel, and allowed me to think outside the box. I really struggled writing it because I felt like I wasn’t equipped or experienced enough to be awarded such a prestigious award, but I pushed through my doubts and submitted it. I received the award, and post-COVID will continue on my travel journey across the world.” It’s quite an accomplishment and a journey yet to fully unfold. Her first IGARSS conference was in Texas in 2017, and soon thereafter, Vanthof volunteered as a GRSS Social Media Ambassador for the North America Section before joining GRSS IDEA as the WMW program colead with Flemming. Their combined social media prowess has certainly shown during the launch of the GRSS-sponsored Down to Earth podcast earlier this year. Go team!
Of her ILC experience, Vanthof was most inspired by an amazing talk by Sandy Carter and continues to quote her wisdom, “Everybody is a little behind somebody, so if we just pull one along with us, we will change the culture.” Vanthof expands, “She [Carter] pushed mentorship, and not only for senior researchers and leaders, but for everyone, which I [Vanthof] thought was so important.” This is an especially salient thought considering Vanthof’s current role as colead for the GRSS WMW program, which supports women through mentorship to help them succeed. In addition to Carter’s words, Vanthof was also inspired by a quote and the Q&A interaction with Lynne Doherty, who first said that “Your career is a jungle gym,” on which Doherty later expanded with how she managed to navigate a family while pushing her career boundaries. Doherty humbly highlighted that it’s not easy but added, in a sentence that Vanthof said will stick with her forever, that “Your life is chapters in a book, sometimes it’s a chapter that’s for work and sometimes it’s on family. Sometimes it’s a good chapter and sometimes it’s a bad chapter, be okay with that in the moment.” As a final thought, Vanthof adds, “In my career thus far, I haven’t had the opportunity to attend a leadership conference or a conference that isn’t technical. While I knew that it would be different, I didn’t quite expect it to be as inspirational and motivating as it was.”

FIGURE 2. Victoria “Vicky” Vanthof doing geodesy calibrations in the field. (Source: Vanthof.)

Fru is presently one of IDEA’s coleads for our Women in Africa core initiative and a Ph.D. student at the University of Buea, Cameroon, in applied geology. Her specific interest is in remote sensing related to geosciences, especially on mineral exploration and disaster management (see Figure 3). One highlight from her current work is a team collaboration: “Assessing the Knowledge of Geoscience Education in Africa,” a forthcoming paper that attempts to explain the stereotypes and biases that exist in Earth sciences, especially in Africa. “Doing this project made me realize the gap between women and men in the Earth sciences, and I proposed how this can be handled,” Fru adds. Fru also placed third in the IGARSS 2020 Women in Geoscience #InspireUs photo competition, organized by the IDEA committee. The contest inspired Fru to join IDEA, and she has since launched the first GRSS Chapter in Cameroon. Very impressive work.
Since joining the GRSS, Fru states that being a member has “greatly helped me to learn more from experts in the field, meet wonderful mentors who are always open to listen and direct me on the right path to take, and most
especially, opened me more for scientific collaborations.” The talk that inspired Fru the most at ILC was “Lead Beyond: Advancing Women of Color in Engineering and STEM Entrepreneurship,” something that she could closely relate to as a black woman. The speaker provided definitive statistics on how black women are linked to science, technology, engineering, and mathematics (STEM) and how best to provide opportunities and work together for the better. Fru expressed that the virtual format was excellent, and the session was very interactive. Many questions popped up and with the virtual format, everybody had the opportunity to answer and interact during the session. Fru adds that “Every woman must choose to challenge herself by learning new things all the time. That was my thought before attending the conference. I wanted to learn more from the different panelists, and I did. So many things were learned, like having a checklist to map your career as a STEM entrepreneur, which was very helpful, and I implemented it on different sectors of my career life. Also, I have to be self-aware, being conscious of my own character, feelings, motives, and desires.”

FIGURE 3. Mary Immaculate Neh Fru panning for heavy mineral concentrate. (Source: Fru.)

Our final attendee at the ILC was Dr. Onyia. She is a research associate at the University of Leicester and CEO of LENKÉ Space and Water Solutions Ltd. Dr. Onyia also coleads the GRSS IDEA Women in Africa core initiative. Dr. Onyia is excited about working on developing the LENKÉ soil water index forecast tool (SWIFT) with a team of scientists, programmers, and machine learning experts based in Canada and Latin America (see Figure 4). “SWIFT is a satellite data-based tool designed to support natural resource management in sub-Saharan Africa, particularly water and agricultural resources,” expands Dr. Onyia. Her proudest accomplishment as a STEM entrepreneur is her start-up, LENKÉ Space and Water Solutions Ltd., a company she cofounded with Dr. Lensa E. Jotte. First they won the 2019 Copernicus hackathon hosted by the University of Leicester, then they won the Santander Seed Grant, and next, LENKÉ became the first company to win the European Space Agency Business Incubation Center in Leicester grant.

FIGURE 4. Dr. Nkeiruka “Nke” Nneti Onyia touring Leicester Space Park, the new site for her expanding start-up, LENKÉ Space and Water Solutions Ltd. (Source: Dr. Onyia.)

GRSS is a Society that Dr. Onyia says she cherishes for so many reasons. First and most significant is the fact that her professional journey in remote sensing and geographic information systems gained traction when she attended her first IGARSS conference in Milan in 2015. At the conference, she signed up for and participated in the WinGRSS luncheon, where she met women who had stayed on their paths, overcoming several challenges. Her experience at IGARSS and the luncheon led Dr. Onyia to become an active volunteer in the IDEA committee. Her favorite talk at the ILC was the keynote speech by Stacey Abrams. Dr. Onyia recalls that she “could relate to her life experience of facing and overcoming challenges that instill fear in you. I admire the
(continued on p. 282)

TECHNICAL COMMITTEES
NAOTO YOKOYA, PEDRAM GHAMISI, RONNY HÄNSCH, COLIN PRIEUR, HANA MALHA,
JOCELYN CHANUSSOT, CALEB ROBINSON, KOLYA MALKIN, AND NEBOJSA JOJIC

Report on the 2021 IEEE GRSS Data Fusion
Contest—Geospatial Artificial Intelligence
for Social Good

The Image Analysis and
Data Fusion Technical
Committee (IADF TC) of
the IEEE Geoscience and
Remote Sensing Society
(GRSS) has been organizing the annual Data Fusion Contest (DFC) since
2006. The contest promotes the development
of methods for extracting
geospatial information
from large-scale, multisensor, multimodal, and multitemporal data. It aims to
propose new problem settings that are challenging to
address with existing techniques and to establish new
benchmarks for scientific challenges in remote sensing
image analysis [1]–[5].

THE 2021 DATA FUSION CONTEST
The 2021 IEEE GRSS DFC promoted interdisciplinary research on geospatial artificial intelligence (AI) for social
good. The ultimate goal of the contest is to build models
to understand the state and changes in the manmade
and natural environment using multisensor and multitemporal remote sensing data for sustainable development. This contest was designed as a benchmarking
competition following previous editions [1], [2], [4], [6],
[7]. The 2021 DFC had two tracks running in parallel:
1) Track DSE: detection of settlements without electricity
2) Track MSD: multitemporal semantic change detection.
Digital Object Identifier 10.1109/MGRS.2021.3121628
Date of current version: 14 January 2022

Track DSE, co-organized by Hewlett Packard Enterprise, SolarAid, and Data Science Experts, aimed to
automatically detect human settlements without access
to electricity using multimodal, multiresolution, and
multitemporal satellite remote sensing data. As input
data, we used Sentinel-1 SAR
data, Sentinel-2, Landsat-8,
and Suomi Visible Infrared
Imaging Radiometer Suite
nighttime images. The
original ground sampling
distance (GSD) ranged
from 10 m to 750 m, but all
images were resampled at
10 m. Semantic labels of
four classes (i.e., settlements with and without electricity, no settlements with
and without electricity) were provided at a GSD of
500 m for the training data. Participants submitted binary detection maps of settlements without electricity
with a GSD of 500 m. The classification accuracy was
assessed with the F1 score. The main challenge of Track
DSE was to develop robust and efficient methods for extracting high-level semantic information from heterogeneous data.
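To make the Track DSE metric concrete, the following is a minimal sketch of an F1 computation on a binary detection map. The function name `f1_score_binary`, the flat 0/1 encoding of the 500-m cells, and the toy values are illustrative assumptions and not the contest's evaluation code.

```python
def f1_score_binary(truth, pred):
    """F1 score for binary maps given as flat sequences of 0/1 labels.

    1 marks a 500-m cell labeled "settlement without electricity",
    0 marks everything else (illustrative encoding, not the contest's).
    """
    if len(truth) != len(pred):
        raise ValueError("maps must have the same number of cells")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct detections: treat the score as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 8 cells with 3 true positives, 1 false alarm, and 1 miss.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score_binary(truth, pred)
print(score)
```

Because F1 is the harmonic mean of precision and recall, a map that flags every cell or no cell scores poorly even when the target class is rare, which is one reason it suits a detection task like this.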
Track MSD, co-organized by Microsoft AI for
Earth, focused on the automated detection and
classification of land cover change from multitemporal, multiresolution, and multispectral imagery.
The challenge of Track MSD was to create high-resolution land cover maps for two time periods using
only low-resolution and noisy land cover labels for
training. Participants were provided with 1) 1-m
multispectral aerial imagery for 2013 and 2017 from
the U.S. Department of Agriculture’s National Agriculture Imagery Program data, 2) 30-m multispectral satellite imagery (Landsat-8) for five time points
between 2013 and 2017, and 3) 30-m noisy low-resolution land cover labels for 2013 and 2016 from
the U.S. Geological Survey’s National Land Cover
Database data over Maryland. Participants created
high-resolution (1-m GSD) land
cover maps to identify semantic
changes between 2013 and 2017.
The performance was evaluated
using the intersection-over-union
metric averaged over eight types
of change. The challenge was twofold: to detect which parts of the
image changed between two high-resolution aerial images, and to
identify the class of change based
on weak supervision.
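The Track MSD metric can be sketched in the same spirit. The snippet below averages intersection over union across a set of change classes; the function name `mean_iou`, the class labels "A" and "B", and the choice to skip classes absent from both maps are assumptions made for illustration, not the contest's scoring rules.

```python
def mean_iou(truth, pred, classes):
    """Average intersection over union across the given class labels.

    truth and pred are flat sequences of per-pixel change labels.
    Classes absent from both maps are skipped (an assumption; the
    contest may treat them differently).
    """
    ious = []
    for c in classes:
        inter = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        union = sum(1 for t, p in zip(truth, pred) if t == c or p == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two hypothetical change types, "A" and "B".
truth = ["A", "A", "B", "B", "B", "A"]
pred = ["A", "B", "B", "B", "A", "A"]
score = mean_iou(truth, pred, classes=["A", "B"])
print(score)
```

Averaging per class gives each change type equal weight regardless of how many pixels it covers, so a rare change type counts as much as a common one.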
The 2021 DFC tackled two fundamental technical challenges rooted in real social problems: 1) analysis of multisensor, multiresolution, and multitemporal data, and 2) learning from weak supervision. These two issues are major challenges in a wide range of fields, from Earth observation to computer vision and machine learning. The most important feature of the 2021 DFC is that it is directly related to social issues such as energy equality and environmental conservation. The results of the contest will have a significant impact, not only in terms of technological development, but also as a tool for solving real social problems.

FIGURE 1. The awards for DFC 2021 were handed out during the virtual award ceremony of IGARSS21. The picture shows representatives of all eight winning teams as well as the chair of IADF.

OUTCOME OF THE CONTEST
The first- to fourth-ranked teams in each track were awarded as winners of the contest and presented their solutions during the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2021).
The four winning teams of the 2021 DFC Track DSE were the following:
◗◗ First place: fengkexin team; Yanbiao Ma, Yuxin Li, Kexin Feng, and Xueli Geng (Xidian University, China); dual-task models based on squeeze-and-excitation networks followed by postprocessing based on expert priors [8].
◗◗ Second place: WHU_YuXia team; Yu Xia, Qi Huang, and Hongyan Zhang (Wuhan University, China); ensembling of single-task and dual-task models based on global context convolutional neural networks and random forests [9].
◗◗ Third place: dimartinot team; Thomas Di Martino, Maxime Lenormand, and Elise Colin Koeniguer (ONERA, France); multibranch CNN with 3D convolutions and EfficientNet [10].
◗◗ Fourth place: JIOJIO team; Ruoxian Feng, Mengjiao Wang, Xuanming Zhang, and Jun Zhang (Xidian University, China); ensembling of UNet and LinkNet with depthwise overparameterized convolutional layers [11].
The four winning teams of the 2021 DFC Track MSD were the following:
◗◗ First place: AsheLee team; Zhuohong Li, Fangxiao Lu, Hongyan Zhang, Guangyi Yang, and Liangpei Zhang (Wuhan University, China) [12].
◗◗ Second place: tulilin team; Lilin Tu, Jiayi Li, and Xin Huang (Wuhan University, China) [13].
◗◗ Third place: baoqianyue team; Qianyue Bao, Yang Liu, Zixiao Zhang, Dafan Chen, Yuting Yang, Licheng Jiao, and Fang Liu (Xidian University, China) [14].
◗◗ Fourth place: EVER team; Zhuo Zheng, Yinhe Liu, Shiqi Tian, Junjue Wang, Ailong Ma, and Yanfei Zhong (Wuhan University, China) [15].

Contest Data
The data of the 2021 Data Fusion Contest and its Codalab evaluation websites
[Track DSE (https://competitions.codalab.org/competitions/27943) and Track
MSD (https://competitions.codalab.org/competitions/27956)] with the public
leaderboard will remain available for benchmarking algorithms and publishing
research works. The data are usable free of charge for scientific purposes, but
the contest terms and conditions on the contest webpage remain applicable
(http://www.classic.grss-ieee.org/community/technical-committees/data
-fusion/2021-ieee-grss-data-fusion-contest/). Please read them carefully.

Join the GRSS IADF TC
You can contact the Image Analysis Data Fusion Technical Committee (IADF
TC) chairs at iadf_chairs@grss-ieee.org. If you are interested in joining the
IADF TC, please complete the form on our website (http://www.grss-ieee.org/
community/technical-committees/data-fusion/) or send an email to us including your
◗◗ first and last name
◗◗ institution/company
◗◗ country
◗◗ IEEE membership number (if available)
◗◗ email address.
Members receive information regarding research and applications on image
analysis and data fusion topics, and updates on the annual Data Fusion Contest
and on all other IADF TC activities. Membership in the IADF TC is free! You may
join the LinkedIn IEEE GRSS data fusion discussion forum, http://www.linkedin
.com/groups/IEEE-GRSS-Data-Fusion-Discussion-3678437, or join us on Twitter:
Grssiadf.

At the end of the competition, all winning teams wrote
a four-page paper on their approach, which was peer-reviewed by the DFC organizing committee. These papers
were included in the technical program of IGARSS 2021
and were presented in an invited session on the DFC during the symposium. All of these teams were awarded an
IEEE Certificate of Recognition for their winning participation during the virtual award ceremony of IGARSS 2021
(see Figure 1). The first-, second-, and third-ranked teams
in each track received special prizes, thanks to the support
of the organizing partners. An extended article discussing
the winning solutions of the first- and second-ranked teams
will be submitted for peer review to the open access IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS).
As in previous years, the 2021 DFC attracted participants from a variety of disciplines, including AI and machine learning as well as the remote sensing community.
The participation of such a diverse range of disciplines promotes the development of novel and interdisciplinary approaches to solve technical problems in the remote sensing
and geoscience communities, and also leads to a movement
to challenge global issues by bringing together knowledge
from different fields. The winning teams are all student-led, and their extraordinary efforts have led to dramatic
advances in technology for the new problems addressed in
this competition and to the formation of a vibrant community. One unique feature that differentiates the contest from
previous editions is that the contest was not only a competition but also led to subsequent collaborative projects and
real-world applications.
ACKNOWLEDGMENTS
The Image Analysis and Data Fusion Technical Committee
chairs would like to thank the IEEE GRSS for continuously
supporting the annual Data Fusion Contest through funding and resources.
AUTHOR INFORMATION
Naoto Yokoya (yokoya@k.u-tokyo.ac.jp) is a lecturer at the University of Tokyo, Kashiwa, Chiba, 277-8561, Japan.
Pedram Ghamisi (p.ghamisi@gmail.com) is the head of the machine learning group, Helmholtz-Zentrum Dresden-Rossendorf, Freiberg, D-09599, Germany.
Ronny Hänsch (rww.haensch@gmail.com) is with the German Aerospace Center, Weßling, 82234, Germany.
Colin Prieur (colin.prieur@grenoble-inp.org) is with SICOM, Grenoble INP, Grenoble, Rhône-Alpes, 38400, France.
Hana Malha (hana.malha@hpe.com) is with HPC&AI Competency Center, Grenoble, 38320, France.
Jocelyn Chanussot (jocelyn.chanussot@grenoble-inp.fr) is with Grenoble INP, Grenoble, 38400, France.
Caleb Robinson (caleb.robinson@microsoft.com) is with Microsoft AI for Good Research, Redmond, Washington, 98052, USA.
Kolya Malkin (kolya.malkin@yale.edu) is with Yale University, New Haven, Connecticut, 06520, USA.
Nebojsa Jojic (jojic@microsoft.com) is with Microsoft Research, Redmond, Washington, 98052, USA.
REFERENCES
[1] N. Yokoya et al., “Open data for global multimodal land use classification: Outcome of the 2017 IEEE GRSS data fusion contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 5, pp. 1363–1377, May 2018, doi: 10.1109/JSTARS.2018.2799698.
[2] Y. Xu et al., “Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 6, pp. 1709–1724, Jun. 2019, doi: 10.1109/JSTARS.2019.2911113.
[3] B. Le Saux, N. Yokoya, R. Hansch, M. Brown, and G. Hager, “2019 data fusion contest [Technical Committees],” IEEE Geosci. Remote Sens. Mag., vol. 7, no. 1, pp. 103–105, Mar. 2019, doi: 10.1109/MGRS.2019.2893783.
[4] C. Robinson et al., “Global land-cover mapping with weak supervision: Outcome of the 2020 IEEE GRSS data fusion contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 3185–3199, Mar. 2021, doi: 10.1109/JSTARS.2021.3063849.
[5] N. Yokoya et al., “2021 data fusion contest: Geospatial artificial intelligence for social good [Technical Committees],” IEEE Geosci. Remote Sens. Mag., vol. 9, no. 1, pp. 287–C3, 2021, doi: 10.1109/MGRS.2021.3055633.
[6] S. Kunwar et al., “Large-scale semantic 3-D reconstruction: Outcome of the 2019 IEEE GRSS data fusion contest—Part A,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 922–935, Oct. 2020, doi: 10.1109/JSTARS.2020.3032221.
[7] Y. Lian et al., “Large-scale semantic 3-D reconstruction: Outcome of the 2019 IEEE GRSS data fusion contest—Part B,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1158–1170, Nov. 2020, doi: 10.1109/JSTARS.2020.3035274.
[8] Y. Ma et al., “Multisource data fusion for the detection of settlements without electricity,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1839–1842, doi: 10.1109/IGARSS47720.2021.9553860.
[9] Y. Xia, Q. Huang, and H. Zhang, “A multi-model fusion of convolution neural network and random forest for detecting settlements without electricity,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1843–1846, doi: 10.1109/IGARSS47720.2021.9553087.
[10] T. D. Martino, M. Lenormand, and E. C. Koeniguer, “Multibranch deep learning model for detection of settlements without electricity,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1847–1850, doi: 10.1109/IGARSS47720.2021.9554286.
[11] R. Feng et al., “DO-UNet, DO-LinkNet: UNet, D-LinkNet with DO-Conv for the detection of settlements without electricity challenge,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1851–1854, doi: 10.1109/IGARSS47720.2021.9553097.
[12] Z. Li, F. Lu, H. Zhang, G. Yang, and L. Zhang, “Change cross-detection based on label improvements and multimodel fusion for multi-temporal remote sensing images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2054–2057, doi: 10.1109/IGARSS47720.2021.9553120.
[13] L. Tu, J. Li, and X. Huang, “High-resolution land cover change detection using low-resolution labels via a semisupervised deep learning approach - 2021 IEEE data fusion contest track MSD,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2058–2061, doi: 10.1109/IGARSS47720.2021.9555033.
[14] Q. Bao et al., “MRTA: Multi-resolution training algorithm for multitemporal semantic change detection,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2062–2065, doi: 10.1109/IGARSS47720.2021.9554425.
[15] Z. Zheng, Y. Liu, S. Tian, J. Wang, A. Ma, and Y. Zhong, “Weakly supervised semantic change detection via label refinement framework,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2066–2069, doi: 10.1109/IGARSS47720.2021.9553768.


PAOLO DE MATTHAEIS

Agenda Items of the World Radiocommunication
Conference 2023 With a Potential Impact
on Microwave Remote Sensing

Remote sensing is the scientific discipline of measuring
radiation emitted or reflected from an object or area to
study its physical characteristics. In particular, microwave
remote sensing uses the portion of the electromagnetic
spectrum called the radio-frequency spectrum, which is defined as the range of frequencies from 9 kHz to 3,000 GHz
by the International Telecommunication Union (ITU), a
specialized agency of the United Nations.
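For orientation, the two limits of this range can be converted to free-space wavelengths with the standard relation between wavelength and frequency. The rounded value of the speed of light used here is an assumption made for readability.

```latex
\lambda = \frac{c}{f}, \qquad c \approx 3 \times 10^{8}\ \mathrm{m/s}

\lambda_{9\,\mathrm{kHz}} \approx \frac{3 \times 10^{8}\ \mathrm{m/s}}{9 \times 10^{3}\ \mathrm{Hz}} \approx 33\ \mathrm{km},
\qquad
\lambda_{3{,}000\,\mathrm{GHz}} \approx \frac{3 \times 10^{8}\ \mathrm{m/s}}{3 \times 10^{12}\ \mathrm{Hz}} = 0.1\ \mathrm{mm}
```

The radio-frequency spectrum as defined above therefore spans wavelengths from tens of kilometers down to a tenth of a millimeter.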
Rules governing the usage of the radio-frequency spectrum at the international level are contained in a treaty
called Radio Regulations (RR) [1]. The Radiocommunication
Sector of the ITU (ITU-R) is responsible for updating the RR
at a World Radiocommunication Conference (WRC) that is
held approximately every four years [2]. The next WRC will
be held in November 2023 and is referred to as WRC-23.
During a WRC, in addition to considering and deliberating on revisions to the RR proposed by the ITU members, the
agenda for the following WRC is set. WRC agendas are composed of focused topics whose scope is described by accompanying resolutions. A First Conference Preparatory Meeting
(CPM-1) is held to coordinate the work on the agenda items
among six Study Groups of the ITU-R, which carry out technical studies with the contribution of the ITU Member States
and Sector Members. Approximately six months before the
WRC, a Second Conference Preparatory Meeting (CPM-2) takes
place to consolidate the technical input for all study groups
into one CPM Report that will then be used as a guideline
in making decisions at the WRC [3].
The ITU-R study groups perform studies through their
Working Parties (WPs), with each WP focusing on specific
radiocommunication services and systems [1]. WP 7C, which
falls under Study Group 7 (Science Services), is responsible
for remote sensing systems. In ITU terminology, the radiocommunication service associated with spaceborne remote
sensing instruments is called the Earth exploration-satellite
Service (EESS), and it can be either active or passive.
Figure 1 is a graphical illustration of the WRC-23
agenda items for which WP 7C is responsible and those
to which WP 7C is contributing technical studies, as discussed during the CPM-1 [4] that followed WRC-19 and in
subsequent study group meetings. Note that some of the
WRC-23 agenda items do not seek to change existing regulations globally, but only in some specific geographical
areas. The ITU refers to them as ITU Regions [1]. These regions are shown in Figure 2 and will be used in the brief
descriptions of the agenda items in the next section.

Digital Object Identifier 10.1109/MGRS.2021.3120892
Date of current version: 14 January 2022

AGENDA ITEMS
AI 1.2: INTERNATIONAL MOBILE
TELECOMMUNICATIONS BETWEEN 3.3 AND 10.5 GHz
This agenda item will consider identification of the following frequency bands for international mobile telecommunications (IMT):
◗◗ 3,600–3,800 MHz and 3,300–3,400 MHz (in Region 2)
◗◗ 3,300–3,400 MHz (amend footnote RR No. 5.458 in Region 1)
◗◗ 6,425–7,025 MHz (in Region 1)
◗◗ 7,025–7,125 MHz (globally)
◗◗ 10,000–10,500 MHz (in Region 2).
Resolution 245 (WRC-19) invites ITU-R to conduct sharing
and compatibility studies that also consider protection of
other coprimary services using these bands as well as services operating in adjacent bands.
The remote sensing bands that could be affected are
◗◗ 6,425–7,250 MHz used by passive sensors without allocation (footnote RR No. 5.458)
◗◗ 10–10.4 GHz used by active sensors with a primary
allocation
◗◗ 10.6–10.7 GHz used by passive sensors with a primary
allocation.
Footnote RR No. 5.458 indicates that administrations should keep in mind the needs of the remote sensing passive instruments in their future planning of the
bands 6,425–7,075 MHz and 7,075–7,250 MHz as passive microwave sensor measurements are made in these
frequency bands.
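The relationship between the candidate IMT bands and the remote sensing bands listed above can be checked with simple interval arithmetic. The sketch below is only illustrative: the helper names `overlap_mhz` and `gap_mhz` and the 100-MHz proximity threshold are assumptions, and the band edges are transcribed from the lists in this section.

```python
# Candidate IMT bands under AI 1.2 and remote sensing bands, in MHz,
# transcribed from the lists above.
IMT_BANDS = [
    (3300, 3400),
    (3600, 3800),
    (6425, 7025),
    (7025, 7125),
    (10000, 10500),
]
SENSING_BANDS = {
    "passive, no allocation (RR No. 5.458)": (6425, 7250),
    "active, primary": (10000, 10400),
    "passive, primary": (10600, 10700),
}


def overlap_mhz(a, b):
    """Width in MHz shared by two (low, high) bands; 0 if they are disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def gap_mhz(a, b):
    """Separation in MHz between two bands; 0 if they touch or overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


PROXIMITY_MHZ = 100  # arbitrary threshold for flagging nearby bands

for name, sensing in SENSING_BANDS.items():
    for imt in IMT_BANDS:
        shared = overlap_mhz(imt, sensing)
        if shared > 0:
            print(f"{imt} overlaps {name} by {shared} MHz")
        elif gap_mhz(imt, sensing) <= PROXIMITY_MHZ:
            print(f"{imt} is within {gap_mhz(imt, sensing)} MHz of {name}")
```

A co-frequency overlap and a near-adjacent band raise different concerns (in-band sharing versus unwanted emissions), which is why the sketch reports them separately.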
AI 1.4: HIGH-ALTITUDE PLATFORM STATIONS AS
INTERNATIONAL MOBILE TELECOMMUNICATIONS
BASE STATIONS BELOW 2.7 GHz
This agenda item seeks to extend the opportunities for the
use of high-altitude platform stations as IMT base stations
(HIBSs) in certain bands below 2.7 GHz already identified
for IMT. The frequency bands under consideration are 694–
960 MHz, 1,710–1,885 MHz, 1,885–1,980 MHz, 2,010–
2,025 MHz, 2,110–2,170 MHz, and 2,500–2,690 MHz.
The HIBSs are a combination of two types of systems, IMT
and high-altitude platform stations, which individually
have a high potential for interference.
Concerns for remote sensing related to this agenda item
are the following:
◗◗ The adjacent 2,690–2,700-MHz band is allocated to passive scientific services, i.e., EESS (passive) and radio astronomy (RAS).
◗◗ Second harmonics from portions of the 694–960-MHz band place the L-band at 1,400–1,427 MHz
at risk of interference.

FIGURE 1. The WRC-23 agenda item assignments to WP 7C.
[Diagram content. WP 7C (Remote Sensing Systems) responsible: AI 1.12, AI 1.14, AI 9.1.a, and AI 9.1.d. Contributing: AI 1.2, AI 1.4, AI 1.10, AI 1.13, AI 1.15, AI 1.16, AI 1.17, AI 1.18, and AI 1.19, alongside WP 4A, WP 4C, WP 5B, WP 5D, and WP 7B.]
WP 4A: Efficient Orbit/Spectrum Utilization for the Fixed-Satellite and Broadcasting-Satellite Services
WP 4C: Efficient Orbit/Spectrum Utilization for the Mobile-Satellite and Radiodetermination-Satellite Services
WP 5B: Maritime Mobile Service, Aeronautical Mobile Service, and Radiodetermination Service
WP 5D: International Mobile Telecommunications (IMT) Systems
WP 7B: Space Radiocommunication Applications

FIGURE 2. The ITU Regions.
[World map showing the boundaries of Regions 1, 2, and 3.]

Also note that some passive instruments (e.g., wideband
radiometers for cryosphere and salinity studies [5], [6]) are
planned to operate at 0.5–2.0 GHz without allocation.
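The harmonic concern can be made concrete with a short calculation: a transmitter at frequency f produces a second harmonic at 2f, so only part of the 694–960-MHz band can reach 1,400–1,427 MHz. The following is a minimal sketch; the function name `harmonic_sources` and the constant names are illustrative assumptions, and real interference studies account for much more than this arithmetic.

```python
def harmonic_sources(victim_band, source_band, order=2):
    """Part of source_band whose order-th harmonic lands inside victim_band.

    Bands are (low, high) tuples in MHz. Returns None when no
    fundamental frequency in source_band can reach the victim band.
    """
    low = max(source_band[0], victim_band[0] / order)
    high = min(source_band[1], victim_band[1] / order)
    return (low, high) if low < high else None


L_BAND_PASSIVE = (1400.0, 1427.0)  # MHz, passive band named in the text
HIBS_LOW_BAND = (694.0, 960.0)     # MHz, lowest candidate HIBS band

risky = harmonic_sources(L_BAND_PASSIVE, HIBS_LOW_BAND)
print(risky)
```

Under these assumptions, only fundamentals between 700 and 713.5 MHz have a second harmonic inside 1,400–1,427 MHz, which matches the text's reference to "portions" of the 694–960-MHz band.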
AI 1.10: SHARING/COMPATIBILITY FOR
AERONAUTICAL MOBILE AT 15.4–15.7 AND
22.21–22.5 GHz
Resolution 430 (WRC-19) invites ITU-R to conduct
studies on spectrum needs for new nonsafety aeronautical mobile applications for air–air, ground–air, and
air–ground communications of aircraft systems, particularly for
◗◗ possible new primary allocations to the aeronautical
mobile service in the frequency band 15.4–15.7 GHz
◗◗ changing the primary 22–22.21 GHz allocation from
“mobile except aeronautical mobile” to “mobile” service, which includes aeronautical mobile.
The bands 15.35–15.4 GHz and 22.21–22.5 GHz are
allocated to passive remote sensing systems as primary
and are adjacent to the frequency ranges considered in
this agenda item. However, no known missions have used or are currently using the 15.35–15.4-GHz band.
The 22.21–22.5-GHz band is widely used for water vapor measurements.
AI 1.12: POSSIBLE NEW SECONDARY ALLOCATION
TO EARTH EXPLORATION-SATELLITE SERVICE
(ACTIVE) AROUND 45 MHz
Resolution 656 (WRC-19) invites ITU-R to study a new secondary allocation to the EESS (active) for spaceborne radar
sounders in the 40–50-MHz band. These sensors would be
used for investigating subsurface properties of polar ice and
arid regions.
The instruments would be spaceborne on a low Sun-synchronous orbit at an altitude of around 400 km and be
subject to additional operational constraints, i.e., the radar
is to transmit only over some geographic areas (Antarctica,
Greenland, and the Sahara) at night between 3 a.m. and
6 a.m. local time to minimize errors due to ionospheric
perturbations and limit any impact on other radiocommunication services.
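The operational constraint described above amounts to a gate on local time and location. The sketch below shows one way such a gate could be expressed; the function names, the use of mean solar time (15° of longitude per hour), and the example pass are all assumptions for illustration and do not represent any mission's actual scheduling logic.

```python
from datetime import datetime, timezone


def local_solar_hour(utc_time, longitude_deg):
    """Approximate local mean solar time in hours (0 to 24).

    Uses 15 degrees of longitude per hour and ignores the equation
    of time, which is adequate for a coarse gating sketch.
    """
    utc_hours = utc_time.hour + utc_time.minute / 60 + utc_time.second / 3600
    return (utc_hours + longitude_deg / 15.0) % 24.0


def may_transmit(utc_time, longitude_deg, over_allowed_region):
    """True only over an allowed region between 3 a.m. and 6 a.m. local time."""
    hour = local_solar_hour(utc_time, longitude_deg)
    return over_allowed_region and 3.0 <= hour < 6.0


# Hypothetical pass over Greenland near 45 degrees West at 07:30 UTC.
t = datetime(2023, 1, 15, 7, 30, tzinfo=timezone.utc)
print(local_solar_hour(t, -45.0))
print(may_transmit(t, -45.0, True))
```

The region test is left to the caller (`over_allowed_region`) because the exact boundaries of the permitted areas are not given in the text.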
Technical parameters are still being discussed for an
update of Report ITU-R RS.2455, “Preliminary Results
of Sharing Studies Between a 45-MHz Radar Sounder
and Incumbent Fixed, Mobile, Broadcasting, and Space
Research Services Operating in the 40–50-MHz Frequency Range.”
AI 1.13: POSSIBLE UPGRADE OF 14.8–15.35 GHz
TO THE SPACE RESEARCH SERVICE
Resolution 661 (WRC-19) invites ITU-R to conduct sharing and compatibility studies to determine the feasibility of upgrading the space research service (SRS)
allocation to primary status in the frequency band
14.8–15.35 GHz, while still ensuring protection of the
primary fixed and mobile services within the band and of the
RAS, EESS (passive), and SRS (passive) in the adjacent
band 15.35–15.4 GHz.
Since the band would be used for transmitting and
receiving scientific data and related telemetry information, this agenda item falls under the responsibility of
WP 7B. The 14.8–15.35-GHz band is already a primary
allocation for SRS in the U.S. Table of Allocations. Under
RR 5.340, no emissions are allowed in the frequency
range of 15.35–15.4 GHz. The primary concern is the
potential for radio-frequency interference from out-of-band emission (OOBE) caused by transmissions from
the adjacent band.
AI 1.14: ALLOCATIONS TO PASSIVE REMOTE
SENSING IN THE 231.5–252-GHz RANGE
This agenda item considers possible adjustments of the existing or potential new primary frequency allocations to the
EESS (passive) in the frequency range 231.5–252 GHz, with
the purpose of better aligning the EESS (passive) allocations
created 20 years ago with updated passive sensor design
requirements or adding possible new allocations to the
EESS (passive).
Current EESS (passive) allocations are 235–238 GHz
and 250–252 GHz. The band 237.9–238 GHz is also allocated to the EESS (active) for spaceborne cloud radars only.
These frequencies can be used for measurement of ice cloud
properties, and the 243.2-GHz band is being considered for
future ice cloud imaging passive sensors.
AI 1.15: GEOSTATIONARY EARTH STATIONS IN
MOTION AT 12.75–13.25 GHz
Agenda Item 1.15 is “to harmonize the use of the frequency
band 12.75–13.25 GHz (Earth-to-space) by Earth stations
on aircraft and vessels communicating with geostationary
space stations in the fixed-satellite service globally, in accordance with Resolution 172 (WRC-19).”
The potential for OOBEs into the adjacent EESS (active)
allocation at 13.25–13.75 GHz is of particular concern
for the scientific services. Resolution 172 (WRC-19) also
mentions space-to-Earth operations at 10.7–10.95 GHz,
which is adjacent to the 10.6–10.7-GHz EESS (passive) frequency band.
AI 1.16: EARTH STATIONS IN MOTION NEAR
18.6–18.8 GHz AND OTHER BANDS
Resolution 662 (WRC-19) invites the ITU-R to study and
develop technical, operational, and regulatory measures
for the use of the frequency bands 17.7–18.6, 18.8–19.3,
19.7–20.2 (space-to-Earth), and 27.5–29.1 and 29.5–30 GHz
(Earth-to-space) by Earth stations in motion (ESIM) in
nongeostationary orbit (non-GSO).
The ESIM operations are intended to provide broadband
data services to aeronautical (commercial and business aviation) and maritime routes (passenger cruise and merchant
ships, fishing vessels, and so on) on a global basis. Within the frequency bands under consideration, the bands
17.7–18.6 GHz and 18.8–19.3 GHz are adjacent to the band
18.6–18.8 GHz, which is allocated to the EESS (passive).
Instruments, such as the Advanced Microwave Scanning
Radiometer 2 (AMSR-2) and the Global Precipitation Measurement Microwave Imager (GPM-GMI), operating at
18.6–18.8 GHz are already experiencing interference from
reflections off the ocean surface of broadcast signals from
geostationary satellites, so particular attention needs to be
paid to this issue.
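The adjacency argument above can be checked mechanically. The following is only a minimal sketch: the band edges are the ones quoted in this agenda item, while the helper name `adjacent_bands` and the tolerance are illustrative assumptions, not part of any ITU tool.

```python
# Band edges in GHz, copied from the AI 1.16 text above.
EESS_PASSIVE = (18.6, 18.8)
ESIM_BANDS = [(17.7, 18.6), (18.8, 19.3), (19.7, 20.2), (27.5, 29.1), (29.5, 30.0)]


def adjacent_bands(protected, candidates, tol=1e-9):
    """Return the candidate bands that share an edge with the protected band.

    Hypothetical helper for illustration: a band counts as adjacent when its
    upper edge meets the protected lower edge, or its lower edge meets the
    protected upper edge.
    """
    low, high = protected
    return [
        band
        for band in candidates
        if abs(band[1] - low) <= tol or abs(band[0] - high) <= tol
    ]


print(adjacent_bands(EESS_PASSIVE, ESIM_BANDS))
```

Only the first two bands in the list share an edge with 18.6–18.8 GHz, which is why they are singled out in the text.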
AI 1.17: INTERSATELLITE LINKS AT 11.7–12.7,
18.1–18.6, 18.8–20.2, AND 27.5–30 GHz
This agenda item calls for studies on provisions to allow
satellite-to-satellite links to be operated in several frequency
bands allocated to fixed satellite services. Some of these
bands are adjacent to 18.6–18.8 GHz, where EESS (passive)
systems also operate. As with AI 1.16, there are concerns
about interference due to reflection off Earth’s surface as well
as from the direct path to the remote sensing sensor.
AI 1.18: SPECTRUM NEEDS AND POTENTIAL
NEW ALLOCATIONS TO THE MOBILE SATELLITE
SERVICE FOR FUTURE DEVELOPMENT OF
NARROWBAND MOBILE SATELLITE SYSTEMS
This agenda item calls for consideration of new allocations
to the mobile satellite service to be used by low-data-rate
systems for the collection and management of data from
terrestrial devices in the following bands:
◗◗ 1,695–1,710 MHz in Region 2
◗◗ 2,010–2,025 MHz in Region 1
◗◗ 3,300–3,315 MHz, 3,385–3,400 MHz in Region 2.
The main concern for scientific services is that the frequency band 1,695–1,710 MHz is allocated to the meteorological satellite service and is primarily used for downlink
of meteorological data from non-GSO meteorological satellites to ground stations around the world, thus potentially
affecting other regions besides Region 2. Also, this frequency band is allocated to the meteorological aids service on a primary
basis in all three regions.
The frequency band 3,100–3,300 MHz, adjacent to 3,300–
3,400 MHz, is allocated on a secondary basis to the EESS (active), with a potential for out-of-band interference affecting
active remote sensing instruments operating in this band.
AI 1.19: NEW PRIMARY ALLOCATION TO THE
FIXED SATELLITE SERVICE IN THE
SPACE-TO-EARTH DIRECTION IN THE FREQUENCY
BAND 17.3–17.7 GHz IN REGION 2
Remote sensing has a primary EESS (active) allocation in the
17.2–17.3-GHz band, and new fixed satellite service operations
at 17.3–17.7 GHz could potentially result in increased adjacent band interference.
AI 9.1.a: SPACE WEATHER SENSORS
Resolution 657 (WRC-19) calls for studies on technical and
operational characteristics, spectrum requirements, and
appropriate radio service designations for space weather
sensors to achieve appropriate recognition and protection
in the RR without placing additional constraints on incumbent services.
No regulatory action is to be taken at WRC-23 as an
outcome of this agenda item. ITU-R WP 7C has prepared
a comprehensive Report ITU-R RS.2456, “Space Weather
Sensor Systems Using Radio Spectrum.”
The following two other documents are under development:
◗◗ spectrum requirements and applicable radio service
designations for receive-only space weather sensors that
provide data critical for predictions and warnings
◗◗ interference criteria of receive-only space weather sensors.
AI 9.1.d: PROTECTION OF EARTH EXPLORATION-SATELLITE SERVICE (PASSIVE) AT 36–37 GHz
A preliminary study was performed for WRC-19 Agenda
Item 1.6 on the protection of EESS (passive) sensors operating in the band 36–37 GHz from non-GSO space stations
in large constellations at 37.5–38 GHz. As a result, WRC-19
invited the ITU-R to conduct further analyses on this topic.
The nearby EESS (passive) 36–37-GHz band is critical
for satellite passive microwave measurements primarily
of precipitation and sea ice. AMSR-2, GMI, and WindSat
operate in this band, and the planned European Space
Agency mission Copernicus Imaging Microwave Radiometer will also use it. OOBEs into 36–37 GHz can arise in
several ways:
◗◗ reflections off Earth’s surface, particularly from the
ocean and ice
◗◗ direct coupling into the sensor receiving antenna
◗◗ interference with cold-sky calibration.
THE FREQUENCY ALLOCATIONS IN REMOTE
SENSING TECHNICAL COMMITTEE AND THE IEEE
GEOSCIENCE AND REMOTE SENSING SOCIETY
VIEWS ON WRC-23 AGENDA ITEMS DOCUMENT
The Frequency Allocations in Remote Sensing (FARS) Technical Committee was established by the IEEE Geoscience
and Remote Sensing Society (GRSS) in 2000. It is intended
as a means for the GRSS community to discuss spectrum
management issues that affect the remote sensing field and
defend the interests of the remote sensing community in
matters relevant to frequency allocations. Its membership
includes scientists and engineers working in remote sensing
at a variety of institutions worldwide.
The mission of the FARS Technical Committee is to serve
as an interface between the GRSS and the radio-frequency
regulatory world by
◗◗ educating the remote sensing community on spectrum
management processes and issues
◗◗ promoting the development of radio-frequency interference detection and mitigation technology
◗◗ organizing technical sessions at conferences, workshops, and so on regarding the preceding processes, issues, and technologies
◗◗ providing spectrum managers and regulators with technical input and perspective from remote sensing scientists and engineers
◗◗ fostering the exchange of information between researchers in different fields, such as remote sensing, radio astronomy, and telecommunications, with the common scope of
minimizing harmful interference between systems.
The FARS Technical Committee is working on a document providing views on international regulatory issues
affecting remote sensing operations. In particular, the GRSS
Views document covers the WRC-23 agenda items introduced here that could
have a potential impact on remote sensing operations, as well as
other ITU-R topics that could also affect remote sensing. The
purpose of this document is to
be a tool enabling GRSS members to familiarize themselves
with these issues and to inform
remote sensing scientists and
engineers of these concerns so
that they may engage in their
administrations’ decision-making processes. The FARS Technical Committee encourages the participation of the entire remote sensing community in developing this document. If you are
interested in participating in this effort, please contact the
Technical Committee chair and cochairs at fars_chairs@grss-ieee.org.

REFERENCES
[1] “Radio regulations,” International Telecommunication Union,
Geneva, Switzerland, 2020. [Online]. Available: https://www.itu.int/pub/R-REG-RR/en
[2] “Collection of the basic texts adopted by the plenipotentiary
conference,” Constitution and Convention of the International
Telecommunication Union, Geneva, Switzerland, 2019. [Online]. Available: https://www.itu.int/dms_pub/itu-s/opb/conf/S-CONF-PLEN-2019-PDF-E.pdf
[3] P. de Matthaeis, R. Oliva, Y. Soldo, and S. Cruz-Pol, “Spectrum
management and its importance for microwave remote sensing [Technical Committees],” IEEE Geosci. Remote Sens. Mag.,
vol. 6, no. 2, pp. 17–25, June 2018, doi: 10.1109/MGRS.2018.2832057.
[4] “Results of the first session of the conference preparatory meeting for WRC-23 (CPM23-1),” ITU Radiocommunication Bureau
(BR), Geneva, Switzerland, Administrative Circular CA/251,
Dec. 19, 2019.
[5] G. Macelloni et al., “Cryorad: A low frequency wideband radiometer mission for the study of the cryosphere,” in Proc. IGARSS
2018 - 2018 IEEE Int. Geosci. Remote Sens. Symp., pp. 1998–2000,
doi: 10.1109/IGARSS.2018.8519172.
[6] J. T. Johnson et al., “Microwave radiometry at frequencies from
500 to 1400 MHz: An emerging technology for earth observations,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14,
pp. 4894–4914, Apr. 2021, doi: 10.1109/JSTARS.2021.3073286.

WOMEN IN GRSS

(continued from p. 273)

fact that she chose to embrace her fears, and through
them found her calling.”
Expanding on her ILC experience, Dr. Onyia was also
“honestly surprised with how much I related to the experiences and realities shared by speakers at the conference.
From tips on how to advance as a woman of color in STEM
entrepreneurship, to tips on promoting a psychologically
safe space at your workplace (my first time coming across
this concept), to rocking difficult conversations, I found
these topics very relevant to my current situation with my
company and my other job role.” On the virtual experience,
she adds that she would still prefer the face-to-face event,
especially the networking aspect, but that it was a good
compromise considering the situation. She had attended a
previous ILC in person and adds it was a really eye-opening
experience for her, even beyond her expectations: “The topics
addressed were so contemporary and relevant to current
work environments, and they provided handy tools for easy
implementation.”
At IDEA, our mission is to inspire, develop, empower, and advance all GRSS members. IDEA has developed an amazing team of dedicated volunteers. The
sponsorship of attendance at the ILC has proved to be an
inspirational experience for our IDEA members. The ILC
speakers shared experiences, insights, and advice that will
stay with our four attendees as they continue to pursue
their incredible careers. We are a global community, and
this coming together of women leaders is helping all women feel inspired and empowered. The IDEA committee is
grateful to be able to support our core-initiative coleads in
their journeys.

CHAPTERS
YUMING LU AND FUYOU TIAN

Activities of the IEEE GRSS University of Chinese
Academy of Sciences Student Branch Chapter

The IEEE Geoscience and Remote Sensing Society
(GRSS) University of Chinese Academy of Sciences
(UCAS) Student Branch Chapter is youthful and energetic. It was established on 2 September 2013 and was
the first IEEE GRSS Student Branch Chapter in Beijing.
Dr. Bin Peng served as the inaugural chair, and Yuming
Lu succeeded the previous chair, Fuyou Tian. Currently,
the Chapter officers are as follows:
◗◗ chair: Yuming Lu
◗◗ vice chair: Zhengdong Wang and Ke Zhang
◗◗ secretary: Yangjian Zhang and Xinyu Qian
◗◗ treasurer: Jingjing Zhao
◗◗ Young Professionals (YP) representative: Subit
Chakrabarti
◗◗ advisor: Guoqing Li.
The Chapter conducts academic exchange activities
in the field of remote sensing science and technology
through the IEEE and IEEE GRSS platforms. It aims to
enhance the understanding and connection of Student
Members to IEEE and IEEE GRSS. It also helps members to obtain sufficient technical resources and assistance and promotes communication between members
and professionals.

SOCIALIZATION OF THE ORGANIZATION
To better promote the IEEE GRSS UCAS Student Branch
Chapter, we set up an official public account on WeChat,
which is widely used in China. An official logo combining elements of
the GRSS logo with “Beijing” was designed for the Student Chapter
by Fuyou Tian, Zonghan Ma, and Yuming Lu (Figure 1). Our WeChat
official account number is IEEE_GRSS_UCAS. We posted
introductions to the IEEE and IEEE GRSS on the site to
help students gain a better understanding of the Chapter
and the IEEE. Moreover, we use the website and WeChat
public account to summarize the work that has been carried out, to publicize the work to be carried out, and to
build members’ enthusiasm through publicity. As of
1 June 2021, there were 211 subscribers to our account.

FIGURE 1. The official logo of the IEEE GRSS UCAS Student
Branch Chapter.

Digital Object Identifier 10.1109/MGRS.2021.3115788
Date of current version: 14 January 2022

IEEE FELLOW ACADEMIC FORUM
The IEEE Fellow Academic Forum is a signature activity during which we invite an IEEE Fellow to deliver a
lecture to Student Members. We have done this every
year since 2015. On 17 December 2019, our Chapter
held the fifth IEEE Fellow Academic Forum at the Aerospace Information Research Institute (AIR), Chinese
Academy of Sciences (CAS). For this event, we invited
the Institute of Remote Sensing and Digital Earth
(RADI) Graduate Student Association as co-organizer. Our distinguished invited guest was Prof. Bing
Zhang, deputy dean of the AIR, director
of the Key Laboratory of Digital Earth of the CAS, and
a professor at the UCAS (Figure 2). Because of his outstanding contributions to hyperspectral image acquisition and processing, he
was one of the three scientists elected as IEEE Fellows in
the IEEE GRSS community in 2019. His lecture, titled
“The Evolution of Civilization and Scientific Thinking,”
shared his experience of and thinking about scientific research. Around 60 IEEE Student Members attended
this IEEE forum.

FIGURE 2. Live photos of the fifth IEEE Fellow Academic Forum. (a) Prof. Bing Zhang presenting his distinguished lecture. (b) A group photo of
the lecture.
On 25 November 2020, our Chapter held the fifth
IEEE GRSS webinar and sixth IEEE Fellow Academic Forum online with the IEEE GRSS China office. Our distinguished invited guest was Prof. Jun Li, professor in the
School of Geography and Planning, Sun Yat-Sen University, and editor-in-chief of IEEE Journal of Selected Topics
in Applied Earth Observations and Remote Sensing (JSTARS)
(Figure 3). Her main research interests comprise remotely
sensed hyperspectral image analysis, signal processing,
remote sensing image processing, supervised/semisupervised learning, and active learning. She is also a GRSS
Distinguished Lecturer and was elected as an IEEE Fellow
in the IEEE GRSS community in 2020. In this talk, Prof.
Jun Li provided a comprehensive overview of how to
write a paper for publication from the viewpoint of the
editor-in-chief of JSTARS. Her lecture title was “How to
Write a Paper for Publication with IEEE.” In total, 244 people
attended this lecture.
TECHNICAL ACTIVITIES
On 28 October 2020, we organized IEEE Members to
attend the IEEE GRSS webinar, “Deep Learning for Remote
Sensing Image Analysis: Applications, Methods, and Perspectives,” held by the IEEE GRSS China office (Figure 4).
Our invited guest was Runmin Dong, senior researcher
from Sense Time. In total, 124 people attended this lecture. Deep learning (DL) algorithms have seen a massive
rise in popularity over the past few years and have achieved
significant success in many remote sensing image analysis
tasks. Sense Time is a leading global artificial intelligence
(AI) company focused on developing cutting-edge AI technologies driven by DL algorithms. The report, “Sense
Remote,” introduced the DL algorithms for remote
sensing images developed in-house by Sense Time, including
land cover classification, object detection, semantic segmentation, change detection,
and image super-resolution.
The report also covered some significant issues in DL, for example, semiautomatic labeling, noisy-label learning, and
multitask learning. The effectiveness of these algorithms
was shown through practical applications, such
as detection of illegal construction of buildings, protection
of green space and other natural resources, and so on.

FIGURE 3. (a) The poster of the sixth IEEE Fellow Academic Forum.
(b) A screenshot of the Zoom webinar.

TRAINING WORKSHOP AND FAMOUS
CORPORATION VISITS
We held a multiband and multispectral unmanned aerial
vehicle (UAV) training workshop on 15 November 2019.
During the IEEE GRSS chairman meeting in 2019, we visited Nanjing Agricultural University,
where Prof. Cheng Tao, the chairman of the Nanjing Chapter, showed
us the multispectral UAV in his lab. We believe that multispectral UAVs
will be useful in the quantitative analysis of remote sensing data and
will provide ground truth for landscape classification in the near
future. We therefore held a workshop to train members to use the
DJI P4M, one type of multispectral drone (Figure 5).
The 2019 IEEE Chinese Student Congress was held in Hangzhou on 29–30 July 2019. IEEE GRSS
Student Branch Chapter representatives attended the congress
and visited Alibaba Group and Zhijiang Lab with other Chinese Student Members (Figure 6). The activity was organized
by the IEEE Inc. Beijing Representative Office, aiming to promote
cooperation between Alibaba Group and the IEEE Inc. Beijing
Representative Office. Alibaba Group and Zhijiang Lab are
leading technology institutions in China and are appealing to
graduates. When answering a related question, Alibaba Group considered the possibility of providing internship positions for IEEE Student Members.

FIGURE 4. Screenshots of Dr. Dong presenting her research in the webinar. (a) Dr. Dong explaining the method of building extraction and
(b) showing the results of her research.

FIGURE 5. (a) A photo of the multibands and multispectral UAV training when visiting a UAV device at Nanjing Agricultural University.
(b) A photo of a multispectral drone training workshop.

FIGURE 6. (a) A group photo taken when we visited Alibaba Group. (b) A photo of representatives from UCAS.

FUTURE PLANS
The IEEE GRSS UCAS Student Branch Chapter will continue to organize technical activities to support and serve
Student Members and provide opportunities for Student
Members to communicate with IEEE Fellows in their fields of
interest. In addition, through the important activities carried out in
2020, the WeChat public account matured. We will continue
to maintain and update the WeChat public account and will use it to carry out publicity work.

AUTHOR INFORMATION
Yuming Lu (luym@aircas.ac.cn) is with the College of Resources and Environment, University of Chinese Academy
of Sciences, Beijing, 100049, China and State Key Laboratory of Remote Sensing Science, Aerospace Information
Research Institute, Chinese Academy of Sciences, Beijing,
100101, China.
Fuyou Tian (tianfy@aircas.ac.cn) is with the College of
Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China and State Key Laboratory of Remote Sensing Science, Aerospace Information
Research Institute, Chinese Academy of Sciences, Beijing,
100101, China.

EDUCATION
FRANK CANTERS AND FRIEKE VAN COILLIE

IGARSS 2021 High School Program
Green in the City

Supported by the IEEE Geoscience and Remote Sensing Society (GRSS) High School and Undergrad
Student Outreach Program (HSUSO), the educational
chairs of the 2021 IEEE International Geoscience and
Remote Sensing Symposium (IGARSS 2021) developed a
remote sensing school project targeting 16- and 17-year-old pupils in the third grade of secondary education in
Flanders, Belgium. HSUSO was led by Dr. Linda Bailey
Hayden and Dr. Josée Lévesque. The Green in the City
project focused on the role of urban green in cities. Using dedicated course materials, including knowledge
clips, manuals and tutorials, and data sets for several
Belgian cities, the project offered hands-on training
in basic remote sensing and geographic information
system (GIS) skills. By exploring spectral reflectance
properties of different materials, participating students
learned how to map urban green from satellite data and
analyze the relation between urban green and other environmental and socioeconomic variables at the intraurban scale. The project reached more than 500 students
from 20 schools. Given the success of the project, plans
to roll out an international version of the program are
in the pipeline.
BACKGROUND
In 2014–2015, three Belgian universities (VUB, UGent,
KULeuven) launched the Geomobiel (Geomobile) project. Over a period of one and a half years, more than
5,000 pupils in the third grade of secondary education
in Flanders and Brussels were introduced to the basics
of surveying, remote sensing, and GIS through a series of workshops organized on site in the more than
100 schools that participated in the project. Geomobiel proved such a success that in 2019,
when preparations for IGARSS 2021 started, the idea
came up to recycle the remote sensing component of
Geomobiel and use it as a basis for developing a new
educational project that would focus on sharpening
students’ geospatial research skills. The main idea
was to introduce students to basic remote sensing
concepts through hands-on tutorials and train them
in using geospatial tools to independently perform
research on important environmental themes related
to urban green, quality of life, and sustainability. (See
Figures 1–3.)

Digital Object Identifier 10.1109/MGRS.2021.3126597
Date of current version: 14 January 2022

FIGURE 1. A teacher instructing pupils.
TUTORIALS
With the support of the IEEE GRSS HSUSO, a set of tutorials was developed consisting of three main modules.
Module 1 focuses on how to produce an urban
vegetation map from medium-resolution multispectral imagery. Students are made familiar with the
characteristics of multiresolution satellite imagery
by examining spectral reflectance curves for different
types of urban land cover. Applying this knowledge,
they learn how to produce false color composite images
and how to interpret this type of imagery. They are also
introduced to the concept of vegetation indices and how
to combine information from different spectral bands
to calculate the normalized difference vegetation index
(NDVI) at pixel level. After examining the histogram of
the NDVI, they use thresholding to separate vegetation
from nonvegetation (Figure 4).
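The NDVI and thresholding steps of Module 1 can be sketched in a few lines of NumPy. This is only an illustration, not the tutorial's QGIS workflow: the function names, the tiny reflectance arrays, and the threshold value of 0.3 are assumptions chosen for the example. The index itself follows the usual definition, NDVI = (NIR − R)/(NIR + R).

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - R) / (NIR + R), per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    result = np.zeros_like(total)
    # Pixels where NIR + R is zero are left at 0 to avoid dividing by zero.
    np.divide(nir - red, total, out=result, where=total != 0)
    return result


def vegetation_mask(ndvi_values, threshold):
    """Label pixels 1 (vegetation) where NDVI >= threshold, else 0."""
    return (np.asarray(ndvi_values) >= threshold).astype(np.uint8)


# Made-up 2 x 2 reflectance values, for illustration only.
nir_band = np.array([[0.50, 0.40], [0.10, 0.30]])
red_band = np.array([[0.10, 0.10], [0.10, 0.25]])

index = ndvi(nir_band, red_band)
mask = vegetation_mask(index, threshold=0.3)  # 0.3 is an assumed threshold
print(mask)
```

In practice the threshold is read off the NDVI histogram, as the module describes, rather than fixed in advance.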
In Module 2, students learn
how to transform image data
into maps describing environmental properties at the neighborhood level. Using basic GIS
tools, they learn how to aggregate pixel-level data to administrative units and produce
maps showing the spatial variation in greenness and average
land surface temperature within the city. They also acquire
basic cartographic skills, enabling them to produce good-looking maps with a well-chosen color scheme, legend, and
other map components like map scale, north arrow, title,
and so on.
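Aggregating pixel-level data to administrative units amounts to a zonal mean. The sketch below assumes the neighborhoods have already been rasterized to the same grid as the image, with one integer identifier per pixel; the function name `zonal_mean` and the small arrays are hypothetical, and the tutorial itself does this step with GIS tools rather than code.

```python
import numpy as np


def zonal_mean(values, zones):
    """Average the pixel values within each zone.

    `values` and `zones` are arrays of the same shape; `zones` holds one
    integer neighborhood identifier per pixel (an assumed input format).
    Returns a dict mapping zone identifier to mean value.
    """
    values = np.asarray(values, dtype=float).ravel()
    zones = np.asarray(zones).ravel()
    ids, inverse = np.unique(zones, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return dict(zip(ids.tolist(), (sums / counts).tolist()))


# Made-up vegetation mask (1 = vegetation) and neighborhood identifiers.
vegetation = np.array([[1, 1, 0], [0, 1, 0]])
neighborhoods = np.array([[1, 1, 2], [1, 2, 2]])

greenness = zonal_mean(vegetation, neighborhoods)
print(greenness)
```

Applied to a vegetation mask, the zonal mean is the fraction of green pixels per neighborhood; applied to a temperature raster, it gives the average land surface temperature per neighborhood.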
Module 3 focuses on examining spatial relationships
between different types of data. By linking greenness and
temperature maps at the neighborhood level with census
data (population density, age, income, level of education,
housing, and so on), students are encouraged to explore
relationships among greenness, temperature, and sociodemographic/housing characteristics (Figure 5). They are
also invited to look for other data on the web that might
be useful to examine the role of urban green in cities, inequalities in access to urban green, and environmental justice issues.
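One simple way to explore such relationships is a correlation coefficient between two neighborhood-level variables. The sketch below uses Pearson's r via `numpy.corrcoef`; the five greenness and temperature values are invented for illustration and are not taken from the project data, and the helper name `pearson_r` is an assumption.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, dtype=float),
                             np.asarray(y, dtype=float))[0, 1])


# Invented neighborhood-level values, for illustration only.
greenness = [0.10, 0.30, 0.45, 0.60, 0.80]
temperature = [48.0, 46.5, 45.0, 44.0, 42.0]

r = pearson_r(greenness, temperature)
print(round(r, 3))
```

A strongly negative r in such a comparison would indicate that greener neighborhoods tend to be cooler; it describes an association only and says nothing about cause.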
TEACHING THE TEACHERS
While implementing the Green in the City project, an important role was given to the teachers. Before rolling out
the project in the participating schools, teachers who registered for the project were divided into small working
groups (five teachers each). During an introductory group
session, teachers were informed about the goals of the project and the content of the tutorials and received detailed
instructions for downloading data and other materials
and for installing the software to get started. Besides
the tutorial materials, they also received a detailed teacher’s guide providing background information
on the concepts introduced in the tutorial, an overview
of dos and don’ts, as well as suggestions for working out
research projects with small groups of students using the
material provided. A period of one month was suggested
for the teachers to get acquainted with the QGIS software
and with the materials provided, before starting to work on
the project with their classes.
The project ran over a three-month period (March–
May 2021). During the entire project, online support
was provided to teachers for solving technical issues, for
providing additional information on concepts or methods used, and for sharing tips and tricks. About once a
month, a group session was organized to exchange information and experiences with and among the teachers.
Overall, teachers and students proved very enthusiastic
about the project and about the materials provided. Few
problems occurred. In some schools, technical problems
were reported at the very start of the project that mostly
had to do with installing the QGIS software on different
types of platforms. Yet, apart from these startup issues, in
most schools the project ran smoothly. The time
reserved for the project differed from one school to another, depending on the amount of free space available
in the curriculum, but on average students spent around 12 h
on the project.
SCHOOL COMPETITION
After completing the tutorials, participating groups of students were challenged to demonstrate their skills and develop their own research project using the image and census data provided for several Belgian cities. They were also
invited to take part in the IGARSS 2021 high school program
competition by preparing a 5-min English-language multimedia presentation summarizing the results of their research.

FIGURE 2. Students discussing preliminary results.

FIGURE 3. Pupils carrying out independent spatial research on the
relationship between urban green and temperature.

Presentations were evaluated based on a) the scientific nature of the presentation and interpretation of the results; b) the storyline of the presentation; c) the layout of the presentation, quality of figures, and use of language/sound; and d) creativity and originality. The contributions of the three winners of the competition are available on the IGARSS2021 website. The three winning groups will be invited with their class for a day visit to the European Space Agency and Technology Center in Noordwijk (NL) in the fall of 2021.
OUTLOOK
Given the success of the Green in the City project, ideas are taking shape to make the teaching material accessible and useful for a larger, international audience. As a first step, the development of an international demo version of the tutorial is on the table. Upscaling for international use will involve the translation of all teaching materials to English, including tutorials, instruction videos and knowledge clips, the teacher’s guide, and the technical manual. It also requires the preparation and formatting of image and census data sets for different cities in the world. Besides Brussels as an exemplar European city, the idea is to compile ready-to-use data sets for two other cities for this international demo, one in the United States and one in Canada. This will allow teachers and pupils, in both Europe and the United States/Canada, to address questions related to urban green provision in a city they can easily relate to, but also to study regional and/or intercontinental differences.

FIGURE 4. Module 1: mapping urban green from satellite data. [Workflow diagram in QGIS: reflectance curves for water, vegetation, and soil across the blue, green, red, and infrared bands (about 0.3 to 1 µm); the Ghent NIR (band 4) and red (band 3) grids are combined by raster calculation into an NDVI map, NDVI = (NIR − R) / (NIR + R); a second raster calculation applies a threshold T, where NDVI ≥ T gives pixel value 1 (vegetation) and NDVI < T gives pixel value 0 (no vegetation), producing the Ghent vegetation map.]

FIGURE 5. Module 3: examining spatial relationships among different types of data. [Four maps of Ghent, each in five classes: greenness (0.01–0.94), population density (1–19,348), mean temperature (36.9–49.3), and mean income (15,664–48,593).]
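
The raster calculations summarized in Figure 4 (Module 1) can be sketched in a few lines of Python. This is only a minimal illustration, not the tutorial's actual QGIS procedure: the function names `ndvi` and `vegetation_mask`, the tiny 2 × 2 grids, and the threshold value of 0.3 are all assumptions made for the example.

```python
from typing import List

Grid = List[List[float]]


def ndvi(nir: Grid, red: Grid) -> Grid:
    """Per-pixel NDVI = (NIR - R) / (NIR + R); returns 0.0 where NIR + R is 0."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total != 0 else 0.0)
        out.append(row)
    return out


def vegetation_mask(ndvi_grid: Grid, threshold: float) -> List[List[int]]:
    """Pixel value 1 (vegetation) where NDVI >= threshold, else 0 (no vegetation)."""
    return [[1 if v >= threshold else 0 for v in row] for row in ndvi_grid]


# Hypothetical reflectance values, not taken from the Ghent data set.
nir_band = [[0.50, 0.30], [0.10, 0.40]]
red_band = [[0.10, 0.30], [0.20, 0.10]]

index = ndvi(nir_band, red_band)
mask = vegetation_mask(index, threshold=0.3)  # threshold T chosen arbitrarily
print(mask)
```

In practice the threshold T would be tuned against the imagery at hand; the zero-denominator guard is a design choice of this sketch and is not described in the article.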
ACKNOWLEDGMENTS
We would like to thank all participating students for
their enthusiasm and for making the IGARSS 2021 high
school program a success! We would also like to express
our gratitude to the teachers, for taking the initiative to
participate in the project and for inspiring their students
all the way.
AUTHOR INFORMATION
Frank Canters (frank.canters@vub.be) is with Vrije Universiteit Brussels, Brussels, 1050, Belgium.
Frieke Van Coillie (frieke.vancoillie@ugent.be) is with
Ghent University, Ghent, 9000, Belgium.

IN MEMORIAM

Thomas C. von Deak (1954–2021)

Thomas (Tom) C. von Deak passed away on 30 July
2021. He was a skilled electrical engineer working for NASA at the Glenn Research Center, Cleveland, Ohio, until his retirement, in 2018. Beginning in
2020, along with consulting for the National Oceanic
and Atmospheric Administration (NOAA) on matters
of frequency management, he became increasingly involved with the IEEE Geoscience and Remote Sensing
Society (GRSS) Frequency Allocations in Remote Sensing (FARS) Technical Committee, participating in many
of its activities and becoming its senior advisor. Tom
received his B.S. degree in electrical engineering from
Cornell University, Ithaca, New York.
His NASA career had many phases. From 1991 to
1996, Tom worked on NASA’s Advanced Communications
Technology Satellite (ACTS), contributing to the success of
that unique program, which paved the way for the use
of Ka-band frequencies for commercial and government
communications. ACTS was a significant component of
the NASA space communications program and provided
for the development and flight testing of high-risk, advanced communications technologies. Using multiple
spot beam antennas and advanced onboard switching
and processing systems, ACTS pioneered initiatives in
communications satellite technology. NASA’s Glenn Research Center (previously the Lewis Research Center)
was responsible for the development, management, and
operation of ACTS as part of a long legacy of experimental communications satellites. Tom led efforts to obtain
public sector partnerships to collaborate with the ACTS
Experiment Office; these partner experimenters represented a variety of technologies and fields, including the
ISDN, high data rates, and the sciences. ACTS became
known as a “switchboard in the sky,” and the geostationary satellite was launched in September 1993. While originally planned for 24–48 months of operation, the system remained operational for 127 months.

Digital Object Identifier 10.1109/MGRS.2021.3120438
Date of current version: 14 January 2022

After the launch of ACTS, Tom was part of the ACTS
Experiments Office, which worked with industry and academia on various satellite-based experiments. Through
its coordination, 53 Gigabit Earth Stations were built and
used by more than 100 experimenters. As leader of the
demonstration planning team, Tom was a participant in
many of those experiments, including a groundbreaking
demonstration of 520-Mb/s TCP/IP throughput using asynchronous transfer mode (ATM) among several ground stations. During this time, he authored papers for telecommunications conferences and coauthored several IEEE papers pertaining to high-data-rate applications of ACTS technology. He also represented NASA at the National Institute of Standards and Technology ISDN Forum to advance satellite–terrestrial ISDN telephony interoperability. For their pioneering achievements, Tom and the ACTS team received several NASA awards, and the ACTS Gigabit Satellite Network was officially inducted into the Space Technology Hall of Fame.
Building on his experience with ACTS, Tom also represented NASA in the ATM Forum to obtain a specification
inclusive of geosynchronous satellite requirements. The
ATM Forum was a telecommunication consortium that
established industry standards for ATM protocols operating predominately over fiber optics.
Just prior to the official end of the ACTS experiments program, in 2000, Tom looked for new challenges and joined the NASA Spectrum Management Office at the Glenn Research Center. He first worked on various aspects of spectrum management supporting space research missions and space radiocommunication systems for NASA programs. He assisted, for example, in efforts to obtain spectrum authorization from the National Telecommunications and Information Administration (NTIA) to enable NASA’s Tracking Data Relay Satellite System (TDRSS) to provide support for commercial programs, and he eventually got involved in international spectrum management work, attending various technical meetings of the International Telecommunication Union Radiocommunication Sector (ITU-R).
His first foray into supporting remote sensing spectrum management came in various groups that were studying the effects of ultrawideband communications systems and their possible impact on space radiocommunication systems and remote sensing systems, especially passive remote sensing systems. As a U.S. delegate to many World Radiocommunication Conferences (WRCs), including those in 2003, 2007, 2012, 2015, and 2019, Tom contributed vital work in the area of protecting passive sensors from out-of-band (OOB) emissions. At the 2003 WRC, for example, he represented NASA on two key issues related to ensuring the long-term protection of TDRSS forward-link spectrum and shielding passive service bands from adjacent band interference. Both matters were hotly debated and required long hours of negotiation. Considered to be the most contentious topic of the conference that year, the TDRSS issue was fully resolved, ensuring long-term access for space missions, notably the Space Transport System and the International Space Station. The U.S. head of delegation recognized Tom in writing for his contributions.
Tom continued to devote time and energy to remote sensing, particularly spaceborne passive microwave sensors providing the ability to obtain all-weather, day and night, global observations of Earth and its atmosphere. He organized and cochaired an international workshop on Earth exploration-satellite service (EESS) wideband downlink spectrum to examine how to best utilize the 8,025–8,400-MHz band for downlinking Earth remote sensing data in a cooperative manner. He developed the agenda and solicited papers/presenters from across all aspects of the EESS community (government and private sector EESS service providers, ground station operators, foreign and domestic, as well as federal regulators). Many of the attendees wrote to say that the workshop was the best they had participated in.
Tom led efforts to improve the interference environment for passive sensors through the development of and advocacy for inputs to task groups of the ITU-R. He coordinated NASA interests with other U.S. government agencies, such as NOAA and the National Science Foundation. He supported the NTIA in drafting a proposal to WRC 2003 on agenda item 1.8.2. The proposal met with opposition from active service interests but underscored the need to find a means for protecting passive service operations from adjacent band interference. His efforts culminated in the WRC 2007 decision to mandate specific protections in the radio regulations for certain passive sensing frequency bands that are critical to weather forecasting and climate studies.
Delegations to WRCs represent national commercial and not-for-profit interests. Participating as a U.S. delegate requires not only expertise and the ability to converse clearly on highly technical subjects but strong negotiating skills, gained largely from experience. Through many years, Tom was responsible for various WRC agenda items of interest to NASA, working on important studies and texts for the ITU-R Conference Preparatory Meeting (CPM) reports that form the basis of members’ proposals to the WRC. As part of that work, he was active in the Organization of American States Inter-American Telecommunication Commission (CITEL). At CITEL, he was a U.S. delegate and spokesperson for several WRC proposals that were important to NASA and remote sensing. Tom was also detailed for some time to the NASA Systems Engineering Office, Space Communications and Navigation division, at NASA Headquarters. In this role, he provided valuable input in systems planning and engineering for NASA’s communication networks.
Although he had a great deal of experience in various aspects of telecommunication and radiocommunication, especially using satellites and other space assets, Tom became passionate about spectrum management (Figure 1) for remote sensing systems and their protection from radio-frequency interference (RFI). He took on many roles, assisting the NASA remote sensing spectrum manager at various international venues, such as ITU-R Working Party 7C, which deals with remote sensing systems. He also participated in the Space Frequency Coordination Group, which includes more than 30 of the world’s civil space agencies. He represented NASA on the World Meteorological Organization subgroup on RF issues as a subject matter expert in active and passive remote sensing systems.

FIGURE 1. Thomas von Deak was a fierce advocate of frequency spectrum use for remote sensing applications.

Tom strongly believed in mentoring and sharing knowledge. He developed content for training classes to provide information to developing-country regulators on the use of space-based remote sensing, the value of spectrum regulations, and the importance of protecting the operation of remote sensing and GPS. He also developed and taught a popular course to International Space University graduate students. Through role-playing exercises in a model WRC setting, students learned about the procedures and practices of an actual intergovernmental treaty meeting.
It was during his work to protect passive remote sensing
bands from OOB emissions that Tom became involved with
the FARS Technical Committee. He recognized that the remote
sensing community needed to be more involved with spectrum management to promote its interests. He took the initiative to prepare a comprehensive presentation, “Spectrum
101 for Remote Sensing,” that he gave at a special session
organized by the FARS Technical Committee at the 2006
International Geoscience and Remote Sensing Symposium
(IGARSS), in Denver, Colorado. He attended many subsequent IGARSS meetings and assisted FARS members with
spectrum management matters through his efforts to educate and interact with remote sensing scientists and engineers. In 2020, already a FARS senior advisor on spectrum
management, he proposed and became involved with an
initiative to standardize the methodology to quantitatively
assess frequency bands with respect to RFI and contributed,
among other things, to defining the GRSS views for WRC-23 agenda items. Overall, Tom assisted FARS members
with spectrum management for more than 15 years as part
of his commitment to educate and engage remote sensing
scientists and engineers.
In the United States, Tom participated in some activities
of the National Academy of Sciences Committee on Radio
Frequencies (CORF). He gave briefings to CORF members
at their annual meetings on various aspects of spectrum
management, including consideration of WRC agenda
items that might directly or indirectly affect remote sensing science. He also worked with representatives of NASA’s
Science Mission Directorate Earth Sciences Division to further the interest and awareness of remote sensing spectrum
management within the NASA science community.
In 2013, Tom was tapped to be NASA’s remote sensing
spectrum manager. Chief among his responsibilities was
ensuring that all spectrum/regulatory aspects of NASA’s
Earth science remote sensing program were addressed and
protected in ITU-R meetings, and his contributions to ITU-R
Working Party 7C were of critical importance. He advanced
work at the ITU-R level in several areas, including ITU-R
report RS.2178, “The Essential Role and Global Importance
of Radio Spectrum Use for Earth Observations and Related
Applications,” and two ITU-R recommendations, RS.1859-1
and RS.1883-1: “Use of Remote Sensing Systems for Data
Collection to be Used in the Event of Natural Disasters and
Similar Emergencies” and “Use of Remote Sensing Systems
in the Study of Climate Change and the Effects Thereof.”
These international documents have helped bring awareness of the importance of Earth observation systems and the radio spectrum that they use to many people in the telecommunications field.
Tom generated U.S. input to ITU-R document 7C/101,
“Active Sensor Characteristics on Using Allocated Bands
From 432 MHz to 238 GHz Applications,” and he was instrumental in developing ITU-R recommendation RS.2106-0,
“Detection and Resolution of RFI to Earth Exploration Satellite Service (Passive) Sensors,” which helps explain the
difficulties of dealing with RFI for passive sensors on satellites. He also contributed to the development and updates
of recommendation RS.2017-0, “Performance and Interference Criteria for Satellite Passive Remote Sensing,” which
added remote sensing applications above 275 GHz to the
ITU-R technical literature. He led a multiyear effort to develop recommendation RS.2105-0, “Typical Technical and Operational Characteristics of Earth Exploration-Satellite Service (Active) Systems Using Allocations Between 432 MHz and 238 GHz,” and he was a key participant in efforts to update recommendation RS.1861-0, “Typical Technical and Operational Characteristics of Earth Exploration-Satellite Service (Passive) Systems Using Allocations Between 1.4 and 275 GHz,” which was recently accepted at ITU-R Study Group 7 (Science Services), the parent organization of Working Party 7C. When the approval process is complete, the result will be an updated and much improved RS.1861-1 thanks in no small part to Tom’s efforts during many years.
After retiring from NASA, Tom continued to support the remote sensing community in various venues as a consultant to NOAA. He was always passionate about the
need to protect Earth observation and remote sensing systems, especially those employing passive sensors on satellites. His calm demeanor and logical technical and policy
arguments were effective in international negotiations. His
insight and expertise in spectrum management, especially
for remote sensing, were highly valued by his many colleagues both domestically and internationally.
Always curious about how things work, Tom had an inventive nature and often customized tools and equipment
for special purposes. His passion for the beauty and sustainability of this planet fueled his advocacy for science to
study Earth and mitigate climate change. With a long career
that took him from early satellite communications to global negotiations on behalf of NASA and the United States,
Tom left an indelible imprint on spectrum management for
remote sensing, and his absence creates a hole that will be
difficult to fill.


Gail Skofronick-Jackson (1963–2021)

With a heavy heart, we announce that Dr. Gail Skofronick-Jackson, 58, of McLean, Virginia, died suddenly on 7 September 2021. Gail was deployed with a joint NASA–European Space Agency airborne campaign team in St. Croix, U.S. Virgin Islands. On a day off from the experiments, she perished in a tragic accident while hiking with colleagues.

Dr. Gail Skofronick-Jackson. (Photo courtesy of Warren Shultzaberger/NASA.)

Gail was a brilliant scientist, as well as a deeply passionate and principled person, who carried her enthusiasm for life over into her career at NASA. She was a dedicated researcher whose interests included passive remote sensing, radiative transfer theory, and the detection and estimation of falling snow using active and passive spaceborne sensors.
Gail was born in Madison, Wisconsin, on 12 February 1963, and moved to Tallahassee in 1964 with her parents, Dr. James and Dot Skofronick. Gail received her B.S. degree in electrical engineering from Florida State University. She went on to complete her M.S. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology, after which she was hired at NASA. At the time of her death, she was a program manager at NASA Headquarters, Science Mission Directorate. Dr. Karen St. Germain, director of the Earth Sciences Division at NASA Headquarters and Gail’s longtime colleague, stated that “she was one of our very best: brilliant, thoughtful, and deeply committed to the science we do and the integrity with which we do it. And she died being exactly who she was: an avid outdoors person, an athlete, and an admirer of our Earth.”
Gail is survived by her beloved husband of 29 years, Dr. David Jackson, and their children, Marina (25) and Matthew (23); her parents, Dr. James and Dot Skofronick of Tallahassee; brothers, Greg of Ann Arbor, Michigan, and Gary (Anna) Skofronick of DeLand, Florida; sister, Gretchen (Dr. Paul) Desch of Naperville, Illinois; and many nieces, nephews, aunts, uncles, and cousins.
Gail loved spending time with her family, traveling, and cooking gourmet meals with her husband. Her many interests included running, swimming, hiking, caving, and gardening. She was a dedicated mother who took great joy in her children’s numerous interests and activities. Gail cherished her many friends, especially in the McLean Moms Run This Town running community. She was also active in her church, the Foundry United Methodist Church in Washington, D.C.
Gail was an IEEE Fellow and active within the IEEE Geoscience and Remote Sensing Society (GRSS), serving on its Administrative Committee from 2012 to 2016. She was associate editor of IEEE Transactions on Geoscience and Remote Sensing and IEEE Geoscience and Remote Sensing Magazine. She was also part of the local organizing committee for the 2020 International Geoscience and Remote Sensing Symposium. Within the GRSS, Gail championed IEEE Women in Engineering, organized successful Women in GRSS events, and paved the way for the GRSS Women Mentoring Women initiative.
Because Gail was always excited to encourage young women to pursue science, technology, engineering, and mathematics careers, a memorial scholarship has been established in her name for students studying science and electrical engineering at Florida State University. Information about this scholarship can be found at https://spark.fsu.edu/Project/1935.

Digital Object Identifier 10.1109/MGRS.2021.3132777
Date of current version: 14 January 2022
