IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE
DECEMBER 2021 VOLUME 9, NUMBER 4
WWW.GRSS-IEEE.ORG

FEATURES

8 Methods for Small, Weak Object Detection in Optical High-Resolution Remote Sensing Images
by Wei Han, Jia Chen, Lizhe Wang, Ruyi Feng, Fengpeng Li, Lin Wu, Tian Tian, and Jining Yan

35 Hyperspectral Image Clustering
by Han Zhai, Hongyan Zhang, Pingxiang Li, and Liangpei Zhang

68 Change Detection From Very-High-Spatial-Resolution Optical Remote Sensing Images
by Dawei Wen, Xin Huang, Francesca Bovolo, Jiayi Li, Xinli Ke, Anlu Zhang, and Jón Atli Benediktsson

102 The CCSDS 123.0-B-2 “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” Standard
by Miguel Hernández-Cabronero, Aaron B. Kiely, Matthew Klimesh, Ian Blanes, Jonathan Ligo, Enrico Magli, and Joan Serra-Sagristà
[Image credit: NASA/JPL]

120 Advances and Opportunities in Remote Sensing Image Geometric Registration
by Ruitao Feng, Huanfeng Shen, Jianjun Bai, and Xinghua Li

143 Deep Learning Meets SAR
by Xiao Xiang Zhu, Sina Montazeri, Mohsin Ali, Yuansheng Hua, Yuanyuan Wang, Lichao Mou, Yilei Shi, Feng Xu, and Richard Bamler

173 Forward-Looking Ground-Penetrating Radar
by Davide Comite, Fauzia Ahmad, Moeness G. Amin, and Traian Dogaru

191 Gaussianizing the Earth
by J. Emmanuel Johnson, Valero Laparra, María Piles, and Gustau Camps-Valls

209 Wireless Sensor Networks Applied to Precision Agriculture
by Mónica Karel Huerta, Andrea García-Cedeño, Juan Carlos Guillermo, and Roger Clotet

223 Spectral Variability in Hyperspectral Data Unmixing
by Ricardo Augusto Borsoi, Tales Imbiriba, José Carlos Moreira Bermudez, Cédric Richard, Jocelyn Chanussot, Lucas Drumetz, Jean-Yves Tourneret, Alina Zare, and Christian Jutten

[Figure, PG. 191: panels (a)–(c) labeled “Spatial–Temporal,” showing I(x) plotted against Longitude (°) and Latitude (°).]

ON THE COVER: The cover on this issue illustrates the development trend of high-resolution remote sensing (HRRS) data sets over the last decade. The feature by Han et al., on page 8, reviews the use of these data sets in the development, verification, and evaluation of new algorithms for detection of objects in HRRS images.

SCOPE
IEEE Geoscience and Remote Sensing Magazine (GRSM) will inform readers of activities in the IEEE Geoscience and Remote Sensing Society, its technical committees, and chapters. GRSM will also inform and educate readers via technical papers, provide information on international remote sensing activities and new satellite missions, publish contributions on education activities, industrial and university profiles, conference news, book reviews, and a calendar of important events.

Digital Object Identifier 10.1109/MGRS.2021.3120176

COLUMNS & DEPARTMENTS
Pages: 3, 6, 271, 274, 284, 289, 293
FROM THE EDITOR; PRESIDENT’S MESSAGE; WOMEN IN GRSS; TECHNICAL COMMITTEES; CHAPTERS; EDUCATION; IN MEMORIAM

EDITORIAL BOARD

Dr. James L. Garrison, Editor-in-Chief
School of Aeronautics and Astronautics, Purdue University, West Lafayette, Indiana 47907 USA
Email: jlg@ieee.org

Dr. Paolo Gamba
University of Pavia, Italy

Dr. Linda Hayden
Center of Excellence in Remote Sensing Education and Research, Elizabeth City State University, USA
Email: haydenl@mindspring.com

Dr. Irena Hajnsek
ETH Zürich, Switzerland, and DLR, Germany
Email: Irena.Hajnsek@dlr.de

Dr. Michael Inggs
University of Cape Town, South Africa
Email: mikings@gmail.com

Dr. John Kerekes, Cochair, Conference Advisory Committee
Rochester Institute of Technology, USA
Email: kerekes@cis.rit.edu

Dr. David M. Le Vine
NASA Goddard Space Flight Center, USA
Email: David.M.LeVine@nasa.gov

Dr. Gail Skofronick Jackson
NASA Goddard Space Flight Center, USA
Email: Gail.S.Jackson@nasa.gov

Dr. Marwan Younis
DLR, Germany
Email: marwan.younis@dlr.de

GRS OFFICERS

President: Dr. David Kunkee, The Aerospace Corporation, USA
Executive Vice President: Dr. Mariko Burgin, Jet Propulsion Laboratory (JPL), USA
Vice President of Publications: Dr. William Emery, University of Colorado, USA
Vice President of Information Resources: Dr. Sidharth Misra, Jet Propulsion Laboratory (JPL), USA
Vice President of Professional Activities: Dr. Lorenzo Bruzzone, University of Trento, Italy
Vice President of Meetings and Symposia: Dr. Saibun Tjuatja, The University of Texas at Arlington
Vice President of Technical Activities: Dr. Fabio Pacifici, Maxar, USA
Secretary: Dr. Steven C. Reising, Colorado State University, USA
Chief Financial Officer: Dr. John Kerekes, Rochester Institute of Technology, USA

IEEE PERIODICALS MAGAZINES DEPARTMENT

Journals Production Manager: Sara T. Scudder
Senior Managing Editor: Geraldine Krolin-Taylor
Senior Art Director: Janet Dudar
Associate Art Director: Gail A. Schnitzer
Production Coordinator: Theresa L. Smith
Director, Business Development–Media & Advertising: Mark David, +1 732 465 6473, m.david@ieee.org, Fax: +1 732 981 1855
Advertising Production Manager: Felicia Spagnoli
Production Director: Peter M. Tuohy
Editorial Services Director: Kevin Lisankie
Senior Director, Publishing Operations: Dawn M. Melley

MISSION STATEMENT
The IEEE Geoscience and Remote Sensing Society of the IEEE seeks to advance science and technology in geoscience, remote sensing and related fields using conferences, education, and other resources.

IEEE Geoscience and Remote Sensing Magazine (ISSN 2168-6831) is published quarterly by The Institute of Electrical and Electronics Engineers, Inc., IEEE Headquarters: 3 Park Ave., 17th Floor, New York, NY 10016-5997, +1 212 419 7900. Responsibility for the contents rests upon the authors and not upon the IEEE, the Society, or its members. IEEE Service Center (for orders, subscriptions, address changes): 445 Hoes Lane, Piscataway, NJ 08854, +1 732 981 0060. Individual copies: IEEE members US$20.00 (first copy only), nonmembers US$110.00 per copy. Subscription rates: included in Society fee for each member of the IEEE Geoscience and Remote Sensing Society. Nonmember subscription prices available on request. Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright Law for private use of patrons: 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without fee. For all other copying, reprint, or republication information, write to: Copyrights and Permission Department, IEEE Publishing Services, 445 Hoes Lane, Piscataway, NJ 08854 USA. Copyright © 2021 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Application to Mail at Periodicals Postage Prices is pending at New York, New York, and at additional mailing offices. Canadian GST #125634188. Canada Post Corporation (Canadian distribution) publications mail agreement number 40013885. Return undeliverable Canadian addresses to PO Box 122, Niagara Falls, ON L2E 6S8 Canada. Printed in USA.

IEEE prohibits discrimination, harassment, and bullying. For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

Digital Object Identifier 10.1109/MGRS.2021.3120166
FROM THE EDITOR
BY JAMES L. GARRISON

Introducing the December Issue

Digital Object Identifier 10.1109/MGRS.2021.3129109
Date of current version: 14 January 2022

Welcome to the December 2021 issue of IEEE Geoscience and Remote Sensing Magazine! Our theme in this issue is “innovative methods for different modalities.” With this in mind, we have nine feature articles covering a variety of processing and analysis techniques with applications across a range of remote sensing modalities.

Our first five features lie in the optical spectrum. We start off with the problem of identifying small, weak, and typically anthropogenic objects in high-resolution remote sensing images. Applications such as urban monitoring, military reconnaissance, and national security all make use of this capability. Han et al. review the challenges of this problem. A broad range of object-detection frameworks is described, including template matching, object-based image analysis, classical machine learning, and deep learning (DL). These are applied to 13 widely used data sets and evaluated for detection speed and accuracy. Recent advances to improve performance in the presence of image degradation, sensor limitations, object variation, and insufficient training data as well as improvements in suppressing background information and incorporating related context information are presented. The article concludes by identifying promising future research directions, including the use of multisource data fusion, weakly supervised detection, automatic neural architecture search, and a universal object framework. This issue’s cover image was taken from Figure 6 of the article.

Hyperspectral images (HSIs) are high-dimensional data sets, which can be characterized as having a “cube” structure with thousands of spectral bands forming the third dimension. The interpretation of HSI data using supervised methods requires a large amount of high-quality labeled data for training. Collecting and processing a sufficiently large training set is very labor- and time-intensive, but there is a risk of underfitting if an insufficient number of examples are provided. Unsupervised classification methods can expand the use and interpretation of HSIs in some applications. Zhai et al. review the current status of clustering, a widely used unsupervised method that groups similar pixels and separates dissimilar pixels into different classes based only upon the properties of the hyperspectral data themselves, obviating the need for labeled samples. They group clustering methods into nine main categories: centroid based, density based, probability based, bionics based, intelligent computing based, graph based, subspace clustering, DL based, and hybrid-mechanism based. Several popular clustering methods are then evaluated on two widely used images: Indian Pines and the University of Houston. Quantitative measures of clustering performance (e.g., overall accuracy and purity), along with running time, were compared across these different methods. Spectral–spatial methods were generally found to outperform spectral-based approaches, suggesting the value that spatial information adds to improve clustering. Centroid-, density-, and probability-based methods generally did not perform well because HSIs often do not meet their basic assumptions, but they have low complexity and are efficient with large data sets. Two examples of recently developed subspace clustering methods were found to show good potential for use with HSIs, but at a large computational cost. The article concludes by identifying several HSI clustering challenges and possible future research lines, including the tradeoff between accuracy and efficiency, pointing toward hybrid approaches and the integration with high-performance computing. Multifeature methods and object or subpixel-based methods are also identified, along with DL, as future research directions. Finally, automatic estimation of the number of clusters is an important research problem that has not, thus far, received much attention.

The feature by Wen et al. is concerned with change detection, an important technique in remote sensing. This becomes a particularly challenging problem with the advent of very high resolution (VHR) images. A comprehensive review of the research on VHR change detection is provided, covering methods, applications, and a discussion of future directions.

Moving on, the next article addresses compression, a necessity for handling an increasing amount of data while being limited by communication bandwidth or power. As indicated earlier, HSI generates a substantially larger volume of data than other imagers (up to 5 TB per day in the case of HyspIRI), so effective compression is required. “Near-lossless” algorithms can provide a balance between reduction in data volume and error by allowing the user to specify a bound on the maximum error introduced by compression. In our fourth feature, Hernández-Cabronero et al. present a comprehensive review of the Consultative Committee for Space Data Systems (CCSDS) 123.0-B-2 Standard, “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression,” the latest in a series of standards developed by the CCSDS. CCSDS 123.0-B-2 incorporates support for near-lossless compression to achieve significantly better results. It has a number of novel features, including enhanced performance on low-entropy data, modes to facilitate efficient hardware implementation, and support for ancillary information. Decompression is backward compatible with data generated by CCSDS 123.0-B-1. Compression performance was demonstrated using mostly public data consisting of 17 multispectral images, 38 HSIs, and two sounder data samples, produced from 14 different instruments. Generally, the new standard was able to meet state-of-the-art performance specifications in absolute or relative error measurements.

Our fifth feature, by Feng et al., addresses the systematic geometric distortions encountered when aligning two or more remote sensing images collected at different times, with different viewing angles, or from different instruments. Registration techniques have been developed to perform this alignment using information in the images themselves. This is often a required preprocessing step for advanced methods such as image mosaicking or image fusion. A review of intensity-based, feature-based, and combined approaches to registration is presented, along with evaluation methods for registration performance (tie-point accuracy, transformation model performance, and alignment error). Some future trends include acceleration of the registration process, the use of compressed sensing methods, and frame-by-frame alignment. A combination of different advanced methods and high-performance computing may be necessary to meet future requirements for high-resolution, heterogeneous, and cross-scale remote sensing images.

The next feature, by Zhu et al., marks a good transition from optical to microwave modalities, describing the largely unrealized potential to apply DL methods (which have a long history in optical remote sensing) to synthetic aperture radar (SAR) data. DL models seek to encode input data into effective feature representations for target tasks. Common methods include convolutional neural networks, recurrent neural networks, and generative adversarial networks. Most of the DL approaches are supervised, however, and the existence of high-quality benchmark data for training is important. Although DL has proven quite effective in extracting data from optical images, its application to SAR has been quite limited, mostly due to the lack of these large and representative benchmark data sets. In addition, some of the specific characteristics of SAR signals have made the direct application of DL models more difficult.
These characteristics include their larger dynamic range, signal statistics, imaging geometry, and the fact that native SAR data are complex valued, with much of the information content in the phase. This article reviews six typical applications of DL to SAR: terrain surface classification, object detection, parameter inversion, despeckling, interferometric SAR, and the data fusion of SAR with optical images. The generation of representative training data sets, unsupervised DL, interferometric data processing, quantification of uncertainty, large-scale nonlinear optimization problems, and cognitive sensors are identified as promising future trends in this area. Several spaceborne SAR missions are expected to be launched in the upcoming years. Hopefully, this article will encourage more joint initiatives in this area.

Forward-looking ground-penetrating radar (FL-GPR) has found important applications in real-time security, military situational awareness, and humanitarian demining. Typically mounted on a vehicle, FL-GPR can provide target detection from a standoff distance. Comite et al. review methods of detecting, locating, and imaging surface targets from array-based FL-GPR systems, considering aspects of both the electromagnetic modeling and signal processing in the problem formulation and solutions. These are challenging problems as the signal return is strongly influenced by soil conditions and surface roughness. Furthermore, the target signature can be quite weak because most of the transmitted energy is forward scattered, and returns from the ground interface can dominate the radar measurements and obscure the target. Electromagnetic modeling and image-formation methods applied to this problem are introduced. The article also reviews migration approaches adapted from seismology, microwave tomography, and data-adaptive/compressive sensing. The use of FL-GPR from unmanned aerial vehicles is a promising future research area with a number of challenges, such as antenna design.
Other open issues concern the detection of nonmetallic targets and real-time operation under realistic conditions. As in many other remote sensing fields, machine learning is attracting interest relevant to FL-GPR. Multiplatform data fusion under communication and computation constraints is another important research area.

Copious amounts of data do not necessarily mean large quantities of information. Quantifying the information content in Earth science and climate data can be difficult as the application of information theory requires a good estimation of the probability densities. For many types of remote sensing data, producing the density estimate is problematic.
Johnson et al. review work on “Gaussianization” methods to produce statistics that can be used to estimate information-theoretic measures (e.g., entropy, total correlation, and mutual information). This methodology scales to high dimensions, uses a simple orthogonal transform, and does not assume any parametric form for the density. This approach is demonstrated on several distinct types of data, including radar backscattering intensities, hyperspectral data, and aerial optical images. It is also applied to quantify the information content of soil–vegetation status in agroecosystems. Code and demonstrations of the implemented algorithms and information-theoretic measures are provided.

Next we have a literature review on the use of wireless sensor networks for precision agriculture, focusing on Latin America. Huerta et al. describe how these networks have been applied to improve traditional agricultural processes in the region by monitoring the weather and environment in a noninvasive manner. They document the growth and global distribution of publications on this topic and the benefit of this technology to the agricultural industry in terms of time, production, and environmental factors.

Our last feature, by Borsoi et al., concerns spectral variability in hyperspectral images. Bayesian, parametric, and local endmember (EM) techniques have been developed to address this problem. A literature review covers both classic and recent approaches and provides a new taxonomy to organize these methods from the perspective of the user, based upon the necessary amount of supervision and the computational cost. The article concludes with an outline of future research directions.

“Women in GRSS” reports on the IGARSS GRSS Diversity Fireside Chat, held in conjunction with the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), and on the Women in Engineering (WIE) International Leadership Conference, both held virtually this year. This issue contains two Technical Committee (TC) columns.
The first, from the Information Analysis and Data Fusion TC, presents results from this year’s data fusion contest with the theme of “Geospatial AI for Social Good.” The second one is from the Frequency Allocations in Remote Sensing TC, which reviews items relevant to microwave remote sensing on the World Radiocommunication Conference agenda for 2023. The student branch of the University of Chinese Academy of Sciences, established in 2013, is featured in our Chapters column. The “Education” column reports on the “Green in the City” high school program targeting 16- and 17-year-old pupils in Flanders, Belgium, and held in conjunction with IGARSS.

Lastly, I am sorry to report on the loss of two very active members of the geoscience and remote sensing community: Tom von Deak, who worked to ensure that radiofrequency spectrum needs for Earth science remote sensing (continued on p. 7)
PRESIDENT’S MESSAGE
BY DAVID KUNKEE

GRSS Accomplishments in 2021: Success and Unexpected Turns

Digital Object Identifier 10.1109/MGRS.2021.3129110
Date of current version: 14 January 2022

In this last message of the year, I think it is important to summarize accomplishments of the Society in 2021 and describe our preparations for success in 2022. By the numbers, our Society continues to grow. We added two student branch Chapters in September with another one expected in November. When this one receives its final approval, the IEEE Geoscience and Remote Sensing Society (GRSS) community will total 70 Chapters, 28 student branch Chapters, and 10 ambassadors engaged with communities in locations where we hope GRSS Chapters will form in the near future. This means that the GRSS community now consists of 98 combined Chapters with membership in the Society surpassing 5,300 members and submissions to our journals surpassing expectations. The numbers confirm continued success for GRSS in 2021, but this year also brought some unexpected twists and turns, given the high expectations at the beginning of the year for an immediate return to in-person meetings.

In the spring, GRSS completed transition of the website to a new provider and updated its appearance and structure. I hope that it offers more content and is more straightforward to navigate. The process of improving the website is ongoing, and we are continuing to transition and add material. It is great to see this result from many past planning sessions and discussions.

Extensive preparation by the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) team to address numerous possible outcomes this summer about the state of COVID-19 worldwide resulted in a decision to pivot from the hybrid meeting format planned at the beginning of the year to a fully virtual meeting. IGARSS 2021 again enjoyed record attendance and continued success overall, including an in-person drone workshop and an evolved meeting format. We are leveraging lessons learned from the past two IGARSSs to provide the best experience for the upcoming IGARSS 2022, which is currently expected to have both online and in-person content for those who can attend the meeting in Kuala Lumpur.

This past year, GRSS education and outreach activities also expanded to include schools offered in all seasons, not just summer, and we expanded GRSS course offerings through the IEEE Learning Network. Don’t forget our cool videos, and we are now sponsoring the second season of our “Down to Earth” podcast. Before the end of the year, we also plan to reinstate in-person engagements with our booth at the upcoming American Geophysical Union Fall Meeting in December.

Also underway is the third GRSS Student Grand Challenge. This activity is a collaboration between the Van Allen Foundation of the University of Montpellier and IEEE GRSS. The combined activity consists of four projects overall: REmote Sensing detection of Plastic POllution in the Gulf of LIons, optiCal floAt for PlasTic quAntIficatioN, Remote Identification of Microplastics using Ocean Surface Anomalies, and Micro-PLAStic in the SEA Detection experiment, with GRSS facilitating the latter three projects. The project kickoff meetings, with three of the four participants of this collaboration, were held in October, with the fourth project kickoff meeting anticipated for November. Tracking plastics and debris in our oceans is an important topic requiring a multidisciplinary approach. The value of a wide variety of data, both remotely and in situ sensed, needs to be assessed. New sensors may be needed to supply data to better understand the problem, assessments for decision makers, and design support to better control the problem and its impacts. The four projects underway focus on different approaches to the overall problem and possible mitigations.
It is exciting to see the enthusiasm and value of these different approaches coming together.

In November, GRSS cosponsored the 2021 Asia Pacific Conference on Synthetic Aperture Radar, which was held in Bali, Indonesia, thanks to conference organizers Josaphat Sumantyo, the GRSS Instrumentation and Future Technologies Technical Committee, and Arifin Nugroho, chair of the GRSS Indonesian Chapter. Thanks also for keynote presentations provided by GRSS Administrative Committee (AdCom) members Alberto Moreira and Paul Rosen.

Next, I would like to announce that in September, GRSS selected the proposing team from Brisbane, Australia, to host IGARSS 2025, which is now planned for early August 2025. Congratulations to Prof. Xiuping Jia, Prof. Jeffrey Walker, and Prof. Jocelyn Chanussot, general cochairs of IGARSS 2025. I would also like to thank all the teams that participated in the competition for their hard work and preparation. We recognize and appreciate your efforts, and we hope that you will continue to support the long-term success of IGARSS. The call for proposals for IGARSS 2026 from IEEE Regions 1–7 and 9 has now been posted on the GRSS website. Interested groups should submit a letter of intent and a preliminary proposal (preproposal) to the vice president of meetings and symposia of the GRSS at vp_meetings_symposia@grss-ieee.org by 1 March 2022.

I am also happy to report that GRSS now has a published standard (IEEE 4003-2021) on IEEE Xplore describing global navigation satellite systems reflectometry data sets. This standard is notable not only because it was developed almost independently of industry representatives but also because a draft for balloting was produced in two years despite changes in leadership. The GRSS Standards Committee has several more IEEE standards projects ongoing. In future AdCom meetings, GRSS may consider further defining the role of standards activity as it relates to the Society’s core mission.

In 2021, GRSS leadership held four additional executive sessions of AdCom meetings. These online sessions provided some extra time for discussion on important topics, which has been difficult due to the inability to hold in-person meetings.
This year, all AdCom meetings were again held virtually due to the continually changing nature of the global pandemic, although we are now planning to restart in-person meetings, beginning with our spring AdCom meeting in March.

From our recent November AdCom meeting, some key decisions include the adoption of changes to our Bylaws and Operations and Procedures (OPs) Manual that better reflect the Society’s practice and help ensure transparency in our future operations. The scope of these changes included the addition of required clauses and conditions for our GRSS awards committees, changes to the conference advisory committee charter, and reduction of the GRSS past-president term of office from three to two years. The roles of social media chair and social media ambassadors were also codified in our documents. Finally, additions to the OPs Manual in November defined the terms of our associate and topical associate editors. Please look for these updates and additions to the GRSS Bylaws and OPs Manual on our website. There is a requirement from IEEE to allow a 30-day review period for changes to the Bylaws before they become effective.

Considering the scope of the November meeting, I would like to thank the AdCom for their many contributions to GRSS activities throughout the year as well as their time spent preparing and reporting at all of the meetings. Of note, the November AdCom meeting included 15 portfolio topics with 52 live presentations and 20 consent agenda presentations. The live meeting was held in short sessions spread over three days to cover the scope of activities within the Society. It was clear from listening to the many speakers at the November meeting that the level of activity is continuing to grow with our membership.

To conclude my December letter, it is with a very heavy heart that I forward news of the passing of Dr. Gail Skofronick-Jackson due to an accident while she was on the island of St. Croix in the U.S. Virgin Islands. Gail was a close friend and colleague to many of us within GRSS, NASA, and the international Earth science community. Within GRSS, she served as a member of the AdCom from 2012 to 2016, was a member of the organizing committee of IGARSS 2020, and for several years organized and led GRSS Women in Engineering activities. Gail was a brilliant scientist, continually enthusiastic to learn about the world around us, and always very thoughtful of others. I am grateful for all of the times we were able to share with her at our various meetings and activities.

FROM THE EDITOR (continued from p. 5)

were represented in international proceedings, and Dr. Gail Skofronick-Jackson, NASA program manager, IEEE Fellow, former Administrative Committee member, and leader of WIE activities. Their memorials begin on page 289.

As I have mentioned in the past few issues, IEEE Geoscience and Remote Sensing Magazine has now implemented a two-stage review process to give more timely feedback to potential authors. Short (five pages or fewer, excluding references) white papers will be submitted first. These will then be reviewed by associate editors or members of the editorial board. Following a positive review of the white paper, authors may be invited to submit a full manuscript, which will then undergo a complete peer review.

Contributions to our regular columns (“Chapters,” “Space Agencies,” “Women in GRSS,” “Education,” “Software and Data Sets,” and “Conference Reports”) are always welcome. White papers, columns, and invited manuscripts should be submitted through Manuscript Central at http://mc.manuscriptcentral.com/grsm. Proposals for special issues should be sent to me directly at jlg@ieee.org.

Please continue to stay safe!
Methods for Small, Weak Object Detection in Optical High-Resolution Remote Sensing Images
A survey of advances and challenges

WEI HAN, JIA CHEN, LIZHE WANG, RUYI FENG, FENGPENG LI, LIN WU, TIAN TIAN, AND JINING YAN

Object detection, which focuses on locating objects of interest and categorizing them, has long played a critical role in the development of remote sensing imagery. Following significant improvements in Earth observation technologies, the objects in high-resolution remote sensing (HRRS) images show additional detailed information and more complex patterns. Some applications, such as urban monitoring, military reconnaissance, and national security, have urgent needs in terms of identifying small-scale (small) and weak-feature-response (weak) objects. However, such objects usually occupy only a small proportion of an image and vary in color, shape, and texture, and their features are easily affected by weather, illumination, and occlusion. These characteristics of small, weak objects make their detection a more challenging task than generic object detection. This article comprehensively reviews the existing challenges and corresponding technologies for addressing that task and its specific problems.

INTRODUCTION
Object detection in remote sensing images aims at locating objects of interest on the ground and categorizing them. The term object generally refers to man-made or highly structured bodies (vehicles, buildings, ships, and so forth) that are independent of complex background environments as well as landscapes. As a fundamental task in the field of satellite and aerial image analysis, object detection plays an important role in a wide range of applications, such as urban planning, geographic information processing, precision agriculture, and environmental monitoring.
Digital Object Identifier 10.1109/MGRS.2020.3041450
Date of current version: 25 January 2021
0274-6638/21©2021IEEE
In the past 20 years, the increasing image interpretation accuracy of these applications has enabled them to meet the requirements of actual scenarios and has thus significantly promoted the development of Earth observation technologies and object-detection approaches. The spatial, temporal, and spectral resolutions of Earth observation sensors have also been greatly improved [1]–[3]. For instance, the images from Google Earth (Google Inc.) [4] have resolutions of up to approximately 0.5 m. WorldView-3 (DigitalGlobe, Inc.) [5] provides a 0.31-m panchromatic resolution and a 1.24-m multispectral resolution. These HRRS images show more texture and shape and additional detailed information about geospatial objects as well as complex spatial patterns. The data volume of HRRS images has also dramatically increased, and a massive number of images is now accessible. The advantage of HRRS images is that they can offer the most economical and efficient way to achieve full-time, high-precision Earth surface monitoring with global coverage. Fast detection of small-scale (small), weak-feature-response (weak), and nonuniformly (sparsely or densely) distributed objects is of great significance when meeting the requirements of real scenarios in many special applications, such as military reconnaissance, national security, urban monitoring, and geological disaster monitoring. Unlike natural images, which are often clearer and contain several categories of objects, HRRS images cover an extensive range of the Earth’s surface and involve a massive number of objects. The objects vary in scale, color, shape, and texture; their features are easily affected by weather, illumination, occlusion, and imaging conditions.
In addition, the great distance between the sensor and targets means that some kinds of targets occupy only a few to dozens of pixels in the imaging plane and are presented as small objects that can easily be overwhelmed by a bright background [6]. Objects of this kind are usually characterized by a low signal-to-noise ratio (SNR) and inadequate structure information, which is presented as a weak feature response. These characteristics make the detection of small, weak targets a more challenging task in remote sensing. The past decade has witnessed major advances in object detection in remote sensing images. At an early stage, various models based on prior knowledge [7]–[10] were proposed for target detection in satellite images. As image resolution increases, prior-knowledge-based models increase in uncertainty because the high complexity of HRRS images tends to cause limited detection accuracy. More recently, various forms of machine learning (ML) approaches [11] have played a critical role in object detection. With the increasing availability of big data and remarkable advances in data mining, novel methods have come into use for HRRS image processing. Deep learning (DL) models [12]–[15] have attracted serious attention and become dominant tools for processing large-scale, high-dimension data; they have achieved satisfactory accuracy for several tasks in the field of remote sensing. By stacking multiple nonlinear layers, DL models extract semantic information about objects as well as the context relationships among them and the background. DL models demonstrate superiority in the extractions and fusions of multiscale features and have therefore outperformed the early models, with significant developments in remote sensing object representations. In recent decades, many works have presented ML- and DL-based models, leading to the creation of a series of benchmark data sets for promoting remote sensing and small, weak object detection [16]–[19]. 
Although several survey papers on object detection have been published, they have focused mainly on detection technologies from the image-processing aspect [20], [21] or on reviewing some categories of approaches, such as ML- [11] and DL-based methods [19], or some specific detection problems and tasks, including vehicle detection [22] and salient methods [23]–[25] for remote sensing object detection. There is still a lack of a comprehensive review of existing works that addresses the problems of small, weak object detection. Based on the aforementioned analysis, this article concentrates on the challenges of and recent advances in addressing these problems. Its content can be summarized as follows:
◗ This article systematically analyzes the challenges of small, weak target detection. According to their causes, the challenges have been divided into three aspects: image quality, object variations, and complex context.
◗ The technical evolution of object detection, including main developments in the fields of computer vision and remote sensing, is comprehensively reviewed; the existing benchmark data sets and their contributions to small, weak object detection are introduced and analyzed.
◗ The existing works that address the various challenges are also summarized, and some promising research directions for further improving small, weak object detection are discussed.

DIFFICULTIES AND CHALLENGES IN REMOTE SENSING SMALL, WEAK OBJECT DETECTION

GENERIC OBJECT DETECTION IN REMOTE SENSING
Object detection, a fundamental and essential task, has attracted broad attention over the past decades. The task is defined as follows: given a remote sensing image, determine whether it includes instances of objects from predefined categories, and, if it does, predict the spatial location and the extent of each instance [27]. Although thousands of geospatial objects occupy optical remote sensing images, research scholars interested in this topic use the term objects to refer to human-made or highly structured bodies (e.g., ships, vehicles, and airplanes) that have sharp boundaries and are independent of the background environment and landscape items [11] rather than unstructured bodies or scenes, such as the sky or clouds. Generally, the spatial location and extent of an object can be defined using a bounding box (BB) (a horizontal or oriented rectangle tightly bounding the object) or a precise pixelwise segmentation mask, as shown in Figure 1.
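To make the two rectangle types concrete, the following minimal sketch represents a horizontal and an oriented bounding box and computes the axis-aligned box that encloses an oriented one. The class names and the (center, size, angle) parametrization are assumptions chosen for illustration; the data sets discussed in this article may store boxes differently, for example, as four corner points.

```python
import math
from dataclasses import dataclass


@dataclass
class HBB:
    """Horizontal (axis-aligned) box given by two opposite corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


@dataclass
class OBB:
    """Oriented box: center, width, height, and rotation angle in radians.

    This parametrization is a hypothetical choice for the sketch.
    """
    cx: float
    cy: float
    w: float
    h: float
    angle: float

    def area(self) -> float:
        return self.w * self.h

    def corners(self):
        # Rotate the four half-size offsets and shift them to the center.
        c, s = math.cos(self.angle), math.sin(self.angle)
        offsets = [(-self.w / 2, -self.h / 2), (self.w / 2, -self.h / 2),
                   (self.w / 2, self.h / 2), (-self.w / 2, self.h / 2)]
        return [(self.cx + dx * c - dy * s, self.cy + dx * s + dy * c)
                for dx, dy in offsets]

    def enclosing_hbb(self) -> HBB:
        xs, ys = zip(*self.corners())
        return HBB(min(xs), min(ys), max(xs), max(ys))


# A long, thin object (e.g., a ship) rotated by 45 degrees.
ship = OBB(cx=100.0, cy=50.0, w=40.0, h=8.0, angle=math.radians(45))
hbb = ship.enclosing_hbb()
fill_ratio = ship.area() / hbb.area()
print(f"object fills {fill_ratio:.0%} of its horizontal box")
```

In this made-up example, the rotated object covers well under half of its horizontal box, so most of the area inside the horizontal rectangle is background.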
Over the past several years, BB annotation has become the most widely used method for evaluating detection performance in remote sensing images; it defines the location of an object by the corner coordinates of a rectangle. The main advantage of this type of annotation is that it focuses on locating only objects of interest, ignoring the context. Therefore, it can greatly reduce the labor cost of labeling data and can be used to quickly create large-scale object-detection data sets for specific applications.

FIGURE 1. The different annotation types in the HRSC2016 [26] data set. (a) The original image, (b) the HBBs’ annotation, (c) the OBBs’ annotation, and (d) the pixelwise segmentation mask.

The precise pixelwise segmentation mask is an annotation method wherein each pixel in the image is assigned a category label, such as forest, farmland, road, or background. This type can be applied in scenarios in which the environmental context is important, but it requires more expert knowledge and labor. Due to the massive number of object categories and instances, complex backgrounds, and the large data volume of HRRS images, precise pixelwise segmentation mask annotation is rarely used in large-scale remote sensing target detection.

There are two types of widely used BB annotation methods: horizontal BBs (HBBs) and oriented (rotated) BBs (OBBs). HBBs (the axis-aligned rectangles in Figure 1) were the first to be used to localize objects. However, objects in HRRS images often appear in arbitrary orientations and may be densely distributed. In some extreme but common scenarios, an HBB encloses both the background and the target of interest; it cannot accurately or compactly outline the location of the object and may decrease detector performance. The OBB annotation, which can be regarded as adding an angle to the HBB, is used to obtain a tight bound for a rotated object. In this article, we mainly review methods with these two types of BB annotations.

DIFFICULTIES AND CHALLENGES IN SMALL, WEAK OBJECT DETECTION
Relevant works for small, weak object detection of infrared images started to appear long ago, when the spatial resolution of remote sensing images was relatively low and infrared images were the main data source for object detection. Many works have addressed solutions to such problems [28]–[32].
Related works covering the analysis of object detection in infrared images [33]–[35] originally defined a small object as one with a total spatial extent of fewer than 80 pixels (a width of fewer than nine pixels), which is less than 0.2% of an image of 256 × 256 pixels. As shown in Figure 2, the long imaging distance means that the target takes up only a few dozen pixels in the imaging plane, presenting as a small target. Objects of this kind are basically shapeless and have no available texture features. Small objects are usually characterized as having a low SNR, a small size, and inadequate structure information because of the undulating clutter and the imaging distance. These characteristics make small targets very difficult to detect, and small targets are easily overwhelmed by a bright background [6]. Therefore, a small, weak object is more formally defined as 1) small: the scale of the target is small, or the target’s proportion of the total image is low; and 2) weak: the features of the target are insufficient and easily affected by its background.

Thanks to the acquisition of HRRS images and the requirements of the related applications, small, weak object detection has attracted increasing attention. Although numerous efforts have been made to develop detectors and benchmark data sets for promoting the development of small, weak object detection in HRRS images, there is still no consensus on the definition of small, weak object detection. Kang et al. [16] proposed a complex background benchmark wherein vehicles were set as the small objects. VEDAI (for “vehicle detection in aerial imagery”) [17] is a data set created for small target detection, but the authors did not propose a specific definition for a small object. Xia et al.
[18] proposed a new large-scale benchmark for HRRS image detection, dividing the object instances into three classes according to the pixel width of their BBs: small for a width from 10 to 50, middle for a width from 50 to 300, and large for a width of greater than 300. This was the first work to clearly define a scale for small objects in HRRS images. As for the weak feature response of objects, no related work has offered sufficient discussion or drawn a clear conclusion.

FIGURE 2. Small, weak object detection in infrared images.

TABLE 1. THE CHALLENGES AFFECTING SMALL, WEAK OBJECT DETECTION.

| ASPECT | SPECIFIC CONTENT |
| --- | --- |
| Image quality | Mixed noise, missing patches, occlusion caused by cloud, blur, shadow, and multisource data |
| Object variations | Small size, high intraclass variations, a change of the object features caused by illumination and background, antagonism of the background and the target, a lack of annotation samples, nonuniform distribution, and an imbalance of positive and negative training examples |
| Complex context | Many types and quantities of background targets and complex distribution patterns |

In this section, we comprehensively consider the factors that affect detection performance and then summarize the difficulties of and challenges to small, weak object detection in HRRS images. In Table 1, each influencing factor is examined from the three aspects of image quality, object variations, and complex context as follows.

1) Image quality: In the process of HRRS image acquisition, the imaging environment, satellite platform, optical system, and electronic equipment may affect the image quality, which leads to a certain degree of degradation of the acquired images. As presented in Figure 3, these images cannot fully meet the requirements of precise interpretation in real-world applications. There are two main categories of factors that degrade the image quality. The first comprises factors that arise in the imaging process, such as noise, blurring, cloud occlusion, missing information, and shadow. These factors are the main reason for remote sensing image degradation. The second category arises from the limitations of sensor production technologies and application scenarios. Because spectral, spatial, and temporal resolutions are often mutually restricted, imaging sensors can achieve high resolution in only one of these three aspects. For these kinds of low-quality images, some methods for improving the image quality should be applied. Multisource satellite data with different resolutions should be used in a complementary way to obtain the required data.

2) Object variations: An HRRS image can cover an extensive area of the Earth’s surface and contain many kinds of objects. The scale variations of object instances in HRRS images are great, and some objects are very small. As depicted in Figure 4, some objects always take up a small proportion of the total image and show a weak feature response; for example, the width of a small ship can be fewer than 25 pixels. Differences in resolution, scale, color, shape, and texture within a single category create high intraclass variations for objects. Small-object instances may also crowd into a specific region of an aerial image. Additionally, HRRS images are noisy, and the features of objects easily change when affected by weather, illumination, and occlusions. Some specific targets are adversarial and camouflaged, making them difficult to identify effectively.
Another critical problem is that the annotation samples may be insufficient. At present, there are more than 2,000 satellites in orbit around the world; they generate more than a petabyte of data every day. However, there are only roughly 100 GB of annotated data for target detection.

3) Complex context: Generally, the background and context of objects of interest are complex and crowded with other types of objects, as displayed in Figure 5. Natural images are often taken from horizontal perspectives, while HRRS images are typically taken as bird’s-eye views; this implies that many objects of interest form complex spatial patterns with the background. The intricate patterns increase the difficulty of object detection in HRRS images.

FIGURE 3. Some problems caused by image quality: (a) blur, (b) noise, (c) missing information, (d) cloud, (e) shadow, and (f) multisource data [IKONOS (0.8–1 m) and WorldView-3 (0.31 m)].

FIGURE 4. Some problems caused by object variations: (a) small-scale weak feature; (b) high intraclass variations; (c) multiclass densely distributed instances; (d) occlusions; and (e) camouflage and adversariness.

Considering the three aspects of the challenges, small, weak object detection in remote sensing can be characterized as follows: 1) data quality, i.e., HRRS images for small, weak object detection may be of low quality due to the noise, illumination change, occlusion, and so forth introduced in the imaging process; 2) objects, i.e., they are of small scale, have a weak feature response with many categories showing high intraclass variations and a nonuniform distribution, and may lack annotation samples; and 3) context, i.e., the context is complex and changeable, and targets are easily hidden in the background. All of these characteristics make small, weak object detection a more challenging task than generic object detection. To promote its development, more work is needed to address these different aspects of the challenges and their difficulties.

A REVIEW OF HRRS OBJECT-DETECTION BENCHMARK DATA SETS AND PERFORMANCE EVALUATION

HRRS OBJECT-DETECTION BENCHMARK DATA SETS
Throughout the development of object detection and recognition, data sets have played a critical role not only as common resources for the evaluation and verification of algorithm performance but also in pushing research into increasingly complex and challenging problems [20]. Over the past decade, in particular, detection and recognition methods based on DL have achieved tremendous success in addressing visual-understanding problems in the computer vision community; large amounts of annotation data, including Pascal visual object classes [44], ImageNet [45], Microsoft common objects in context (COCO) [46], and Open Images [47], have played a key role in this success. The development of Earth observation technologies and access to a large number of HRRS images make it possible to build large-scale data sets for capturing the vast richness and diversity of objects, promoting unprecedented performance in remote sensing object detection.

In past decades, research groups in remote sensing have released many public data sets with different characteristics for solving different problems.
There are 13 widely used data sets: the Institute for Computer Science and Control in Hungary-Inria (SZTAKI-Inria) [36], Northwestern Polytechnical University very high resolution (NWPU VHR)-10 [37], Chinese Academy of Sciences UCAS-AOD [38], road scene object detection (RSOD) [40], data set for object detection (DOTA), object detection in aerial images (ODAI) [18], VEDAI [17], high-resolution ship collection 2016 (HRSC2016) [26], 3K vehicle [16], cars overhead with context (COWC) [39], xView [41], HRRS detection (HRRSD) [42], [43], and detection in optical remote sensing images (DIOR) [19]. The attributes of these data sets are listed in Table 2 for comparison. The development trend and some representative small, weak targets of the data sets are displayed in Figures 6 and 7, respectively. Each data set is introduced in this section. The things-and-stuff data set [48] is excluded from this discussion because of its relatively low spatial resolution.

FIGURE 5. The problems introduced by complex contexts. (a) A complex context and (b) massive background objects.

SZTAKI-INRIA
This benchmark data set, from SZTAKI and the Inria Sophia Antipolis-Méditerranée Research Center in France [36], was created for building detection and is a multisensor aerial set from QuickBird, IKONOS, and Google Earth [4]. It contains nine images and 665 building instances, annotated with OBBs. The images of the data set have three bands: red, green, and blue (RGB).

NWPU VHR-10
This publicly available 10-class geospatial object-detection data set from NWPU in Xi’an, China, is used for research purposes [37]. The object classes are airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, and vehicle. The data set contains 800 VHR remote sensing images cropped from the Google Earth and Vaihingen data sets, which were then manually annotated by experts into 3,775 instances with HBBs. The image resolutions range from 0.5 to 2 m.

TABLE 2. COMPARISONS OF THE AVAILABLE BENCHMARK DATA SETS IN THE EARTH OBSERVATION COMMUNITY.

| DATA SET NAME | TOTAL CATEGORIES | IMAGES | INSTANCES | IMAGE WIDTH | DATA SOURCE | RESOLUTION | ANNOTATION | YEAR | CHARACTERISTICS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SZTAKI-INRIA [36] | 1 | 9 | 665 | ~800 | QuickBird, IKONOS, and Google Earth | 0.5–1 m | OBBs | 2012 | Single category, high-resolution satellite images, multiple sensors |
| NWPU VHR-10 [37] | 10 | 800 | 3,775 | ~1,000 | Google Earth | 0.3–2 m | HBBs | 2014 | Multiple categories, clean background |
| UCAS-AOD [38] | 2 | 910 | 6,029 | 1,280 | Google Earth | 0.3–2 m | HBBs | 2015 | Airplane and vehicle detection |
| VEDAI [17] | 9 | 1,210 | 3,640 | 1,024 | Utah AGRC | 0.125 m | OBBs | 2015 | Small-scale objects, multispectral and multiresolution images, illumination changes |
| 3K vehicle [16] | 2 | 20 | 14,235 | 5,616 | DLR 3K camera system | 0.13 m | OBBs | 2015 | Small-scale objects, VHR images |
| COWC [39] | 1 | 53 | 32,716 | 2,000–19,000 | Six sources | 0.15 m | Dot | 2016 | Small-scale objects, multisensor images |
| HRSC2016 [26] | 1 | 1,061 | 2,976 | ~1,000 | Google Earth | 0.4–2 m | Three types | 2016 | Sufficient object variations, complex background |
| RSOD [40] | 4 | 976 | 6,950 | ~1,000 | Google Earth, Tianditu | 0.3–3 m | HBBs | 2017 | Multisensor and multiresolution images |
| DOTA [18] | 15 | 2,806 | 188,282 | 800–4,000 | Google Earth, JL-1, and GF-2 | 0.3–1 m | HBBs and OBBs | 2018 | Multisensor and multiresolution images, nonuniform distribution, many object categories, sufficient object variations |
| ODAI [18] | 16 | 2,806 | ~400,000 | 800–4,000 | Google Earth, JL-1, and GF-2 | 0.3–1 m | HBBs and OBBs | 2019 | Improved version of DOTA, more instances and categories, especially for small, weak objects |
| xView [41] | 60 | 1,128 | ~1,000,000 | 2,000–4,000 | WorldView-3 | 0.3 m | HBBs | 2018 | Complex background, many categories, massive instances, dense distribution, noise, blur, occlusion |
| HRRSD [42], [43] | 13 | 21,761 | 55,740 | ~11,000 | Google Earth | 0.15–1.2 m | HBBs | 2019 | Many categories, many instances, sufficient variations |
| DIOR [19] | 20 | 23,463 | 192,472 | 800 | Google Earth | 0.5–30 m | HBBs | 2019 | Complex background, many categories, noise, blur, occlusion |

DLR: German Aerospace Center; COWC: cars overhead with context; AGRC: Automated Geographic Reference Center.
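One simple way to read Table 2 is to compare how densely each data set is annotated. The sketch below divides the instance count by the image count for every row; the numbers are copied from the table, the approximate ("~") entries for ODAI and xView are used at face value, and the helper name `instances_per_image` is hypothetical. The ratio is only a rough indicator because image sizes differ greatly between data sets.

```python
# (images, instances) pairs copied from Table 2.
# Entries given as "~" in the table (ODAI, xView) are approximate.
TABLE2_COUNTS = {
    "SZTAKI-INRIA": (9, 665),
    "NWPU VHR-10": (800, 3_775),
    "UCAS-AOD": (910, 6_029),
    "VEDAI": (1_210, 3_640),
    "3K vehicle": (20, 14_235),
    "COWC": (53, 32_716),
    "HRSC2016": (1_061, 2_976),
    "RSOD": (976, 6_950),
    "DOTA": (2_806, 188_282),
    "ODAI": (2_806, 400_000),
    "xView": (1_128, 1_000_000),
    "HRRSD": (21_761, 55_740),
    "DIOR": (23_463, 192_472),
}


def instances_per_image(images: int, instances: int) -> float:
    """Average number of annotated instances per image."""
    return instances / images


density = {name: instances_per_image(*counts)
           for name, counts in TABLE2_COUNTS.items()}

# Print the data sets from the most to the least densely annotated.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:13s} {value:8.1f} instances per image")
```

Under these counts, the data sets built from very large images (xView, 3K vehicle, and COWC) have by far the most instances per image, which is consistent with the "dense distribution" and "small-scale objects" entries in the table.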
UCAS-AOD
This UCAS data set, collected from Google Earth, contains two detection classes: airplane and vehicle [38]. The airplane category has 600 images with 3,210 instances, while the vehicle category has 310 images with 2,819 vehicles.

VEDAI
This data set was created for the task of multiclass vehicle detection in satellite images [17]. It consists of nine categories with a total of 3,640 instances, including boat, car, camping car, plane, pickup, tractor, truck, van, and a category labeled “other.” The data set has 1,210 images, each of which is 1,024 × 1,024 pixels with VHR (12.5 cm). VEDAI is provided as a tool to benchmark automatic target-recognition algorithms in unconstrained environments. The vehicles contained in the database, in addition to being small, exhibit different characteristics, such as multiple orientations, illumination/shadowing changes, peculiarities, and occlusions. Furthermore, each image is available in several spectral bands and resolutions.

3K VEHICLE
This data set is another of those used for vehicle detection [16]. It has 20 images with 5,616 × 3,744 pixels and a spatial resolution of 13 cm. It contains 14,235 vehicles with OBBs. The images were captured by the German Aerospace Center 3K camera system (a real-time airborne digital monitoring system) at a height of 1 km above the ground.

FIGURE 6. The development trend of existing HRRS data sets. Timeline: 2012, SZTAKI-INRIA; 2014, NWPU VHR-10; 2015, UCAS-AOD, 3K vehicle, and VEDAI; 2016, COWC and HRSC2016; 2017, RSOD; 2018, DOTA and ODAI and xView; 2019, HRRSD and DIOR. Trend labels: multiple class, high resolution, small scale, illumination/shadow, higher resolution, complex background, multiple scales, massive instance, more categories, dense distribution, multisensor data, and sufficient variations.

CARS OVERHEAD WITH CONTEXT
Also created for vehicle detection, the cars overhead with context (COWC) data set images are standardized to 15 cm per pixel at ground level from their original resolutions [39]. The set contains 32,716 unique cars from six sources: Toronto, Canada; Selwyn, New Zealand; Potsdam and Vaihingen, Germany; and Columbus, Ohio, and Utah, United States, covering different geographical locations and produced by different imaging sensors. The car sizes range from 24 to 48 pixels. Two of the sets (Vaihingen and Columbus) are in gray scale; the others are in RGB color. It should be noted that each car in the annotated images has a dot placed on its center.

HRSC2016
HRSC2016 is a benchmark data set for boat detection [26]; it has 1,061 images and 2,976 instances from Google Earth with HBB annotations. The image sizes vary from 300 × 300 to 1,500 × 900 pixels. The images contain large variations of scale, position, shape, and appearance.

RSOD
RSOD consists of 976 images and 6,950 object instances involving four categories [40]. The data set was collected from Google Earth and Tianditu, and its resolutions range from 0.3 to 3 m.

DOTA AND ODAI
DOTA is a larger-scale data set with HBB and OBB annotations [18]. It contains 2,806 large images and classifies objects into 15 categories: baseball diamond, ground track field, small and large vehicles, tennis court, basketball court, storage tank, soccer field, roundabout, swimming pool, helicopter, bridge, harbor, ship, and plane. The fully annotated DOTA contains 188,282 object instances, which vary greatly in scale, orientation, and aspect ratio; the resolutions range from 0.3 to 1 m. The images are collected mainly from Google Earth [4], but some are taken from JL-1 and the rest from GF-2 of the China Center for Resources Satellite Data and Application. ODAI is an updated version of the DOTA data set and contains 0.4 million annotated object instances in 16 categories. Both DOTA and ODAI use the same aerial images, but ODAI has revised and updated the annotation of objects, adding many small-object instances (approximately 10 pixels or fewer) that were missed in DOTA and extending the categories by adding a new one: a container crane.

XVIEW
This is one of the largest published aerial data sets, covering 60 object classes [41]. It contains images from complex scenes and more than one million object instances with HBB annotations. Compared with images in existing HRRS data sets, xView images are high resolution, multispectral, and labeled with a greater variety of objects. The images collected from WorldView-3 have a resolution of up to 0.3 m. The variations of scale, color, shape, and texture make the data set more challenging to the remote sensing community.

FIGURE 7. Small, weak object examples of existing HRRS data sets. (a) Small, weak objects in large-scale data sets and (b) small, weak objects in small-scale data sets for vehicle detection. Labeled examples include a ship, a windmill, and a ground track field.

HRRSD
The HRRSD data set is a large-scale benchmark with 21,761 RGB images extracted from Google Earth and has spatial resolutions ranging from 0.15 to 1.2 m [42], [43]. There are 13 categories of objects, which allows this to be considered an extended version of the NWPU VHR-10 data set with additional classes, such as crossroads, parking lots, and T junctions. This data set is class balanced, and each category has 3,700–5,000 instances.

DIOR
The DIOR data set is a recently released aerial data set [19]. It contains 23,463 images with 800 × 800 pixels and 192,472 instances labeled with HBBs. The images were collected from Google Earth and have resolutions ranging from 0.5 to 30 m. This data set has sufficient variations of scale, weather, seasons, imaging conditions, and quality as well as high interclass similarity and intraclass diversity. It is also one of the larger-scale data sets, with massive images and object instances.

COMPARISON
As shown in Figure 6, early HRRS data sets, such as SZTAKI-INRIA [36] and NWPU VHR-10 [37], contained a small number of categories and instances for the detection of large or easily recognized objects. After several years, scholars have forged ahead to introduce massive numbers of instances, many categories, multisensor data, complex context, and low-quality images to create large-scale, challenging data sets, such as xView [41], DIOR [19], HRRSD [42], [43], and DOTA [18], which are becoming more and more in line with the conditions of actual applications. These four data sets contain 13 or more object categories and more than 50,000 object instances, with resolutions ranging from 0.3 to 30 m, all available for the development of detectors that adapt to large-scale object detection. Some representative small, weak examples from the aforementioned data sets are collected in Figure 7(a). It can be seen that the objects in the large-scale data sets, such as ships and windmills, are very small scaled, weak featured, and easily affected by the context and low-quality data. These data sets are very challenging and suitable for developing detectors for small, weak object detection.

VEDAI [17], 3K vehicle [16], and COWC [39] are three relatively small-scale data sets used for vehicle detection. As shown in Figure 7, their images have VHR (up to about 12.5 cm), and their object sizes, which fall within a fixed range, are beneficial for developing and testing a model to detect small objects. They can supplement the large-scale data sets mentioned previously. However, some critical limitations of these data sets are that the object categories are few, including only cars or airplanes, and the image quality is high, which is not consistent with actual scenarios. VHR data sets with more categories, variations, and challenges need to be developed further.

EVALUATION METRICS
There are two categories of metrics for evaluating detector performance: detection speed in frames per second (FPS) and detection accuracy in precision, recall, and average precision (AP). FPS expresses how fast the detector is; it is the number of images (frames) that the detector can process per second. For example, if the time needed for a detector to analyze a standard-scale image is 0.04 s, its detection speed is a frame rate of 25 FPS.

For a given input image $I$, the outputs of a detector are the predicted results $\{(b_j, c_j, p_j)\}_{j=1}^{M}$ (indexed by the object order $j$; $M$ is the number of predicted detections) of the BB $b_j$, predicted label $c_j$, and confidence score $p_j$. The ground-truth boxes are $\{(B_k, C_k)\}_{k=1}^{N}$ (indexed by the order $k$; $N$ is the number of ground-truth boxes) of the BB $B_k$ and label $C_k$. The predictions $\{(b_j, c_j, p_j)\}_{j=1}^{M}$ are greedily matched to $\{(B_k, C_k)\}_{k=1}^{N}$. For a given confidence threshold $t$ and an intersection over union (IoU) threshold $\varepsilon$, a predicted result $(b_j, c_j, p_j)$ is set as a true positive (TP) if the following criteria are met:
◗ The predicted label $c_j$ is equal to the label $C_k$ of a ground-truth box $(B_k, C_k)$, and $p_j$ is greater than $t$.
◗ The IoU value between the predicted BB $b_j$ and the ground-truth BB $B_k$, $\mathrm{IoU}(b_j, B_k)$, is larger than $\varepsilon$, where $\mathrm{IoU}(b_j, B_k)$ is computed as

$$\mathrm{IoU}(b_j, B_k) = \frac{\mathrm{area}(b_j \cap B_k)}{\mathrm{area}(b_j \cup B_k)}, \tag{1}$$

and the symbols $\cap$ and $\cup$ denote the intersection and union, respectively. The value of $\varepsilon$ is generally set to 0.5.
Otherwise, the predicted result is regarded as a false-positive (FP) sample.

Precision is the proportion of correct detection instances out of the total detection results predicted by the detector. Based on the calculations of the TP and FP results, it can be computed by

$$P(t) = \frac{TP}{TP + FP}. \tag{2}$$

Recall is defined as the proportion of all ground-truth positive instances that are detected by a detector. It can be formulated by

$$R(t) = \frac{TP}{N}, \tag{3}$$

where $N$ is the number of ground-truth boxes. Precision and recall can be used to derive AP, which is the metric most used in recent works. AP is usually computed for each class separately. The precision, P(t), and the recall, R(t), can
be computed as a function of the confidence threshold t; by varying t, different pairs (P, R) can be obtained. In principle, this allows precision to be considered as a function of recall, from which the AP value can be found. The mean AP, the average of the AP values of all the object categories, has therefore been adopted as the final measure to evaluate the overall accuracy [44], [45], [49].

A BRIEF REVIEW OF OBJECT-DETECTION FRAMEWORKS
Incredible progress has been made in feature representations and classifiers for object detection. In terms of feature representation and recognition, an impressive change is the shift from handcrafted features to DL features. In terms of localization, the sliding-window stage is mainstream. However, the number of windows is extensive and increases dramatically with the number of image pixels, especially when processing remote sensing images. Therefore, scholars focus mainly on the design of effective and efficient object-detection strategies; these include sharing feature computations, cascading, reducing per-window computations, the fast localization of objects of interest, and the reduction of computational costs. In the following, we briefly review milestone works (see Table 3 and Figure 8).

TABLE 3. THE COMPARISONS OF FIVE CATEGORIES OF DETECTION METHODS.
METHOD NAME | MAIN CATEGORIES/MILESTONES | HIGHLIGHTS | LIMITATIONS
Template matching | Rigid template matching [7], [50], [51]; deformable template [52]–[54] | Simple and fast to implement, no training samples required | Limited to the variations of object appearances, consumes more prior knowledge
Knowledge | Geometric information [8], [55]; context knowledge [9], [56] | Detects objects from coarse-to-fine hierarchical architecture, combines more prior information | Defining the detection rules and knowledge is subjective, labor consuming
OBIA | Multiresolution segmentation [57]–[59] | Flexible incorporation of different features, GIS-like functionality and expert knowledge | Lacks generic solutions to the full automation of segmentation process, defining the classification rules is subjective and not robust
Classical ML | Features: HoGs [10], BoWs [60], texture features [61], [62], and so on; classifiers: SVM [63], [64], AdaBoost [65], [66], kNN [67], and CRF [68], [69] | Automatically establishes object-and-learn feature representation, better scalability and compatibility | Labels many training samples, detection accuracy depends on the training samples and the feature extractor
DL | Two stage: RCNN [70], SPPNet [71], fast RCNN [72], faster RCNN [73], RFCN [74], and so forth; one stage: YOLO [75], SSD [76], RetinaNet [77], and CornerNet [78] | End-to-end framework without manual intervention, automatically learns high-level features, adapts to large-scale complex image processing | Labels a large number of samples, consumes massive computing resources
OBIA: object-based image analysis; GIS: geographic information system; HoGs: histogram of oriented gradients; BoWs: bag of words; SVM: support vector machine; kNN: k-nearest neighbor; CRF: conditional random field; RCNN: region-based convolutional neural network; SPPNet: spatial pyramid pooling network; RFCN: region-based fully convolutional network; YOLO: you only look once; SSD: single-shot multibox detector.

FIGURE 8. A road map of object-detection frameworks. SVM: support vector machine; BoWs: bag-of-words; HoGs: histogram of oriented gradients; RCNN: region-based convolutional neural network; SPPNet: spatial pyramid pooling network; FPN: feature pyramid network; SSD: single-shot multibox detector; YOLO: you only look once.
[Figure 8 content: a timeline from 1980 to 2020 covering template-matching methods (rigid and deformable template matching), knowledge-based methods (geometric information, context knowledge), OBIA-based methods, classical ML methods (SVM, AdaBoost, BoWs, HoGs), and DL methods (RCNN, SPPNet, fast RCNN, faster RCNN, FPN, mask RCNN, YOLO, SSD, RetinaNet, CornerNet).]
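Returning briefly to the evaluation metrics of Equations (1)–(3), the following minimal Python sketch shows how IoU, precision, recall, and AP could be computed for one class on one image. It is an illustration under stated assumptions, not the evaluation code of any benchmark: boxes are assumed to be axis-aligned tuples `(x1, y1, x2, y2)`, the function names are hypothetical, and AP is computed with the all-point interpolated area under the precision–recall curve, which is only one of several conventions in use.

```python
def iou(a, b):
    """Equation (1): IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_flags(preds, gts, eps):
    """Greedily match predictions (highest score first) to unused ground truths.

    preds: list of (box, label, score); gts: list of (box, label).
    Returns 1 (TP) or 0 (FP) per prediction, in descending-score order.
    """
    flags, used = [], set()
    for box, label, _ in sorted(preds, key=lambda p: -p[2]):
        best_k, best_v = None, eps
        for k, (gbox, glabel) in enumerate(gts):
            v = iou(box, gbox)
            if k not in used and glabel == label and v > best_v:
                best_k, best_v = k, v
        if best_k is not None:
            used.add(best_k)
        flags.append(int(best_k is not None))
    return flags


def precision_recall(preds, gts, t=0.5, eps=0.5):
    """Equations (2) and (3) at confidence threshold t and IoU threshold eps."""
    flags = match_flags([p for p in preds if p[2] > t], gts, eps)
    tp, fp = sum(flags), len(flags) - sum(flags)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


def average_precision(preds, gts, eps=0.5):
    """Area under the precision-recall curve (assumed all-point interpolation)."""
    if not gts:
        return 0.0
    tp = fp = 0
    points = []  # (recall, precision) after each prediction, by descending score
    for flag in match_flags(preds, gts, eps):
        tp, fp = tp + flag, fp + 1 - flag
        points.append((tp / len(gts), tp / (tp + fp)))
    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        envelope = max(p for _, p in points[i:])  # monotone precision envelope
        ap += (recall - prev_recall) * envelope
        prev_recall = recall
    return ap


gts = [((0, 0, 10, 10), "car"), ((20, 20, 30, 30), "car")]
preds = [((0, 0, 10, 10), "car", 0.9),
         ((21, 21, 31, 31), "car", 0.8),
         ((50, 50, 60, 60), "car", 0.7)]
print(precision_recall(preds, gts), average_precision(preds, gts))
```

In this toy case, two of the three detections match a ground-truth box, so precision is 2/3 and recall is 1; the mean AP would then be the average of `average_precision` over all classes.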
OBJECT-DETECTION METHODS BASED ON TEMPLATE MATCHING
Methods based on template matching [11] are a simple approach to object detection; they find matches in an input image based on a series of predefined templates. The two main steps are 1) template generation, in which a template for each object category is generated by manual design or by learning from the training set, and 2) similarity measurement, in which, given an input image, the template is compared with the image at each possible position to find the matches. The methods have been classified into two groups: rigid template matching and deformable template matching. Early research concentrated mainly on rigid template matching, applying it to detect specific objects with simple appearances and small variations [7], [50], [51]. Because of its ability to both impose geometrical constraints on the shape and integrate the local image evidence, deformable template matching is more powerful and flexible than rigid shape matching in handling shape deformations and intraclass variations [52]–[54]. Object-detection methods based on template matching are simple and easy to implement for a specific task; only expert knowledge is needed to design them, and they do not need training samples. However, designing the templates calls for considerable prior knowledge and extensive computations, and the templates are sensitive to changes in the scale, rotation, shape, and viewpoint of objects.

OBJECT-DETECTION METHODS BASED ON KNOWLEDGE
Object-detection methods based on knowledge transform object detection into a hypothesis-testing problem by establishing various knowledge types and rules. The establishment of knowledge and rules is the most important step. The two most widely used kinds of knowledge are geometric and context knowledge.
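As a concrete illustration of the two steps of rigid template matching described in the preceding subsection (template generation and similarity measurement), the sketch below slides a fixed template over a grayscale image and scores every position. The choice of normalized cross-correlation as the similarity measure and the function name `match_template` are assumptions made for this example; the cited works use a variety of measures.

```python
import numpy as np


def match_template(image, template):
    """Score a rigid template at every position of a 2-D grayscale image.

    Returns a map of normalized cross-correlation values in [-1, 1];
    the best match is the position with the highest score.
    """
    img_h, img_w = image.shape
    tpl_h, tpl_w = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((img_h - tpl_h + 1, img_w - tpl_w + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + tpl_h, x:x + tpl_w]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            scores[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores


rng = np.random.default_rng(0)
image = rng.random((12, 12))
template = image[3:7, 5:9].copy()      # "template generation": cut from the image
scores = match_template(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best)                            # position of the best match
```

The double loop makes the cost grow with the number of image positions times the template size, which reflects the computational burden noted above; a rotated or rescaled object would also score poorly against this single fixed template.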
The geometric information method is the most important and is widely used for early-target object detection; users can encode prior knowledge by taking parametric, specific, or generic-shape models [8], [55]. Context knowledge is also crucial; its most widely used forms are the object and background context and the relationships between objects and their surrounding regions or other objects [9], [56]. Methods of this kind enable users to perform the detection process through a coarse-to-fine hierarchical structure. However, decisions on how to define the prior knowledge and detection rules are subjective, which poses critical challenges to these methods. Rules that are too loose cause false positives; rules that are too tight cause false negatives.

OBJECT-DETECTION METHODS USING OBJECT-BASED IMAGE ANALYSIS
With the increasing availability of submeter images, object-based image analysis (OBIA) has been presented for classifying or mapping HRRS imagery into meaningful objects [57]–[59]. It contains two steps: image segmentation and object classification. First, imagery is segmented into homogeneous regions (also called objects), each representing a relatively homogeneous group of pixels; this is achieved by selecting the desired scale, shape, and compactness criteria. In the second step, a classification process is applied to these objects. An advantage of OBIA-based methods is that they exploit the knowledge of geographic information systems to overcome the limitations of pixel-based image-classification methods. The real challenges to the satisfactory performance of OBIA methods are in defining appropriate segmentation parameters for objects that vary in size, shape, and spatial distribution. In addition, accuracy assessments of OBIA are difficult, although many efforts have been made to address the problem.
The technique’s advantages lie in its flexible incorporation of the shape, texture, geometry, and contextual semantic features as well as expert knowledge, making it context aware and multisource capable. Generic solutions to the full automation of the segmentation process are still missing, and the expert knowledge needed to decide how to define the classification rules is still subjective; these problems limit the technique’s adaptability to different tasks. OBJECT-DETECTION METHODS BASED ON CLASSICAL ML Due to the remarkable advances of ML techniques, especially their impressive feature representations and powerful classifiers, many recent approaches have taken object detection to be a classification problem, achieving significant improvements. ML object detection can be performed by training a classifier that captures the variations in object appearances and the views from a set of training data. The classifier takes a set of regions (object proposals or image patches) with their feature representations as the input; the output consists of their corresponding predicted labels. The most important components in the process of object detection are feature extraction, feature fusion, and classifier training. The dimension-reduction step is an optional operation. A histogram of oriented gradients (HoGs) feature [10], a bag-of-words (BoWs) feature [60], texture [61], [62], sparse representation-based [79], and Haar-like features [80] are common. The classifiers include support vector machines (SVMs) [63], [64], AdaBoost [65], [66], k-nearest neighbors [67], and conditional random fields [68], [69]. Methods based on ML can be automatically established using ML techniques. The scalability and compatibility are both greatly improved, but these methods need a large number of training samples to learn classifiers and are not suitable for large-scale data sets. 
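To illustrate the feature-extraction step of the classical ML pipeline described above, the following sketch computes a simplified gradient-orientation histogram for one image patch. It is a single-cell simplification written for this example, not the full HoG descriptor of [10] (which uses cells, blocks, and block normalization); the function name and the choice of nine unsigned orientation bins are assumptions. A classifier such as an SVM would then be trained on such feature vectors.

```python
import numpy as np


def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    grad_y, grad_x = np.gradient(patch.astype(float))
    magnitude = np.hypot(grad_x, grad_y)
    angle = np.degrees(np.arctan2(grad_y, grad_x)) % 180.0
    index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)
    hist = np.bincount(index.ravel(), weights=magnitude.ravel(), minlength=bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


vertical_edge = np.zeros((8, 8))
vertical_edge[:, 4:] = 1.0       # intensity changes along x
horizontal_edge = np.zeros((8, 8))
horizontal_edge[4:, :] = 1.0     # intensity changes along y

print(np.argmax(orientation_histogram(vertical_edge)))
print(np.argmax(orientation_histogram(horizontal_edge)))
```

The two patches fall into different dominant bins, which is the kind of appearance cue a downstream classifier can separate; it also hints at the limitation noted in the text, since a rotated object changes the histogram.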
In addition, the representation ability of the learned features is not robust enough to deal with variations in an object’s appearance.

DL-BASED DETECTION FRAMEWORKS
We discuss DL detectors separately from the ML methods described previously because of the great success
of DL-based techniques in recent years. Deep convolutional neural networks (CNNs) can extract high-level feature representations of an input image and improve classification performance. Girshick et al. [70] took the lead, applying CNNs to object detection by developing region-based CNN (RCNN) features. Since then, many milestones have marked the unprecedented speed of the development of object detection. The main milestone approaches are reviewed in the following sections; they can be categorized into two classes according to the presence or absence of a proposal generation stage: two- and one-stage detection frameworks. In the next sections, existing milestones of the two categories of detection frameworks are introduced first, and then the advances of DL-based detectors in small, weak object detection are reviewed.

TWO-STAGE DETECTION FRAMEWORKS
As depicted in Figure 9(a), for an input image, a two-stage detector first extracts DL features using a pretrained CNN architecture. Then, in the region proposal step, many regions of interest (RoIs), i.e., regions where a target is likely to exist, are generated. Finally, a detection head with a classifier and a regressor simultaneously predicts the location and category of a target for each RoI. The critical characteristic of two-stage detection frameworks is that they contain a preprocessing component for generating object proposals. These kinds of detectors have dominated object recognition since the creation of RCNNs [70] due to their remarkable detection performance on benchmark data sets.

REGIONS WITH CNN FEATURES
The main principle of RCNNs [70] is that they first extract a set of object proposals (candidate boxes) using a selective search. The proposals are resized to a fixed scale and fed into a CNN model pretrained on ImageNet [12], for example, a Visual Geometry Group network [81], a residual neural network (ResNet) [13], or ResNeXt [82], to extract high-level features.
FIGURE 9. The main structures of mainstream frameworks. (a) An illustration of two-stage detection frameworks (using a faster RCNN as an example). (b) An illustration of one-stage detection frameworks (using YOLO as an example). RPN: region proposal network; RoI: region of interest.
[Figure 9 content: (a) input image, feature extractor, feature maps, RPN (for each pixel position: whether there is a target, box location), RoI feature vector with region proposal, classification and regression (for each RoI: multiclass classification, BB regressor), output results; (b) input image, feature extractor, feature maps, feature grid, FC, classification and regression (for each grid: multiclass classification, BB regressor), output results.]

Then, a linear SVM classifier is used to predict the presence of an object and the object category for each proposed region. RCNNs have achieved remarkable improvement in natural image object detection, but they have obvious drawbacks; for example, the selective search strategy may generate more than 2,000 proposal candidates for one image,
significantly increasing the computation cost and slowing the detection speed.

SPATIAL PYRAMID POOLING NETWORK
To reduce the computational costs incurred by an RCNN, He et al. [71], [83] proposed the spatial pyramid pooling network (SPPNet), wherein the SPP layer is the main improvement. Instead of requiring an input image of fixed size, the SPP layer can generate a fixed-length feature representation regardless of the size of the input proposals. During the detection process, the feature maps need only be computed once from the entire image. The SPP layer can then extract the corresponding region of the feature maps and generate a fixed-size feature representation for each region proposal. This significantly speeds up detection by avoiding repeated computations of the feature maps. SPPNet achieved speeds more than 20 times faster than those of RCNNs. However, it is not an end-to-end framework and can fine-tune only its fully connected layers, thus limiting the efficiency and performance of the model.

FAST RCNN AND FASTER RCNN
In 2015, Girshick et al. [72] proposed the fast RCNN detection framework that uses a unified neural module to localize and recognize targets. It increases detection precision and accelerates detection speed because it can train a classifier and a BB regressor simultaneously. Although fast RCNN outperforms RCNNs and SPPNet, it is restricted by the proposal-generation strategy. The faster RCNN framework presented by Ren et al. [73] is a fully end-to-end framework. It breaks through the speed bottleneck of fast RCNN by introducing a region proposal network (RPN) that generates object proposals using a CNN model. It achieved a near-real-time detection speed and state-of-the-art accuracy. From RCNNs to faster RCNN, the building blocks of a detector, including region proposal generation, feature extraction, and BB regression, have been gradually improved and unified into an effective learning framework.
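The key property of the SPP layer described above, a fixed-length output regardless of the size of the input region, can be seen in a few lines of NumPy. This is a sketch under assumptions: the function name `spp_pool`, the pyramid levels (1, 2, 4), and the use of max pooling over a `(channels, height, width)` array are illustrative choices, not the exact configuration of SPPNet [71].

```python
import numpy as np


def spp_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over n-by-n grids for each pyramid level.

    The output length is C * sum(n * n for n in levels), independent of H and W.
    """
    _, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # floor of the lower bound, ceiling of the upper bound: bins are never empty
            y0, y1 = (i * height) // n, -((-(i + 1) * height) // n)
            for j in range(n):
                x0, x1 = (j * width) // n, -((-(j + 1) * width) // n)
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
region_a = rng.random((8, 13, 17))   # two region proposals of different sizes
region_b = rng.random((8, 30, 5))
print(spp_pool(region_a).shape, spp_pool(region_b).shape)
```

Both regions yield a vector of length 8 × (1 + 4 + 16) = 168, so the same fully connected layers can follow without resizing the proposals.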
REGION-BASED FULLY CONVOLUTIONAL NETWORK
The regionwise subnetwork for localizing and recognizing an object in faster RCNN still needs to be applied per region proposal (several hundred proposals per image). To address this problem in faster RCNN, Dai et al. [74] proposed the region-based fully convolutional network (RFCN), a fully convolutional architecture with most of the computations shared over the entire image. Dai et al. constructed a set of position-sensitive score maps by using a bank of specialized convolutional layers as the FCN output and adding a position-sensitive RoI pooling (RoIPool) layer on top. An RFCN with ResNet101 could achieve an accuracy comparable to faster RCNN (often with faster running times).

MASK RCNN
Mask RCNN was presented by He et al. [84], [85] to tackle pixelwise object-instance segmentation by extending faster RCNN. Mask RCNN adopts the same two-stage pipeline with an identical first stage (RPN). In the second stage, mask RCNN adds a branch that outputs a binary mask for each RoI in parallel with the class prediction and box offset. The new branch is an FCN [86], [87] on top of a CNN feature map. To avoid the misalignments caused by the original RoIPool layer, an RoI alignment layer was proposed to preserve the pixel-level spatial correspondence. With a ResNeXt101-feature pyramid network (FPN) backbone, mask RCNN achieved the top results for COCO object-instance segmentation and BB object detection [46].

FPN
The previous examples detect objects on only the top layer of the feature-extraction network. In some cases, this is not suitable for localizing objects, especially small ones. Lin et al. [88] proposed an FPN whose top-down architecture has skip connections to the remaining all-scale feature maps.
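The top-down pathway with skip connections can be sketched as follows. This is a deliberately reduced illustration, assuming that all levels already share the same channel count and that each level is exactly half the spatial size of the previous one; the learned lateral and smoothing convolutions of the actual FPN [88] are omitted, and the function names are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_pyramid(features):
    """Merge backbone features ordered fine-to-coarse into a feature pyramid.

    Each output level is its own backbone map (the skip connection) plus the
    upsampled, already-merged coarser level.
    """
    pyramid = [features[-1]]                     # start from the coarsest map
    for lateral in reversed(features[:-1]):
        pyramid.insert(0, lateral + upsample2x(pyramid[0]))
    return pyramid


c3 = np.full((4, 8, 8), 1.0)    # fine, shallow level
c4 = np.full((4, 4, 4), 2.0)
c5 = np.full((4, 2, 2), 3.0)    # coarse, deep level
p3, p4, p5 = top_down_pyramid([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)
```

Every pyramid level keeps the resolution of its backbone map while inheriting information from the deeper levels, which is why the finest level can serve small objects.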
The FPN shows great advances for detecting objects with a wide variety of scales and aspect ratios and has been set as a basic building block in many recent detectors.

CHAINED CASCADE NETWORK AND CASCADE RCNN
Two-stage object detection can be considered a cascade structure; the first stage removes large amounts of background, and the second stage classifies the remaining regions. Recently, the end-to-end learning of more than two cascaded classifiers and regressors for generic object detection was proposed in the chained cascade network [89], extended in cascade RCNN [90], and later applied to simultaneous object detection and instance segmentation [91]. These models have a sequence of detection heads trained with increasing IoU thresholds. The subsequent heads, with their increasing IoU thresholds, train on more abundant positive samples to conduct accurate detection and avoid the problem of overfitting.

ONE-STAGE DETECTION FRAMEWORKS
Although two-stage detectors perform satisfactorily, they are computation intensive and therefore unsuitable for scenarios with limited storage and computational capability. Research scholars have therefore started to design one-stage unified detection approaches to accelerate detection speed. As displayed in Figure 9(b), a one-stage detector directly predicts the locations of the BB and the class probabilities in an entire image by using a single CNN. It does not involve the steps of region proposal generation, feature resampling, and postclassification, but it does encapsulate all of the computations in a single network [20].

YOU ONLY LOOK ONCE
You only look once (YOLO), presented by Redmon et al. [75], is considered the first one-stage detector in the DL era. The model divides the entire image into many regions, then predicts the category probabilities and BB offsets for each region simultaneously. Two improved versions, YOLO v2 and
v3, were proposed later [92], [93]; these further improve detection precision while retaining high detection speed. Although they have obvious speed advantages, these models have a lower localization accuracy than do the two-stage models, especially for small-scale objects.

SINGLE-SHOT MULTIBOX DETECTOR
To further boost the localization accuracy of a one-stage detector, Liu et al. developed a single-shot multibox detector (SSD) [76], which is faster than YOLO and achieves better detection accuracy. The main idea of SSD is that it can effectively combine an RPN in faster RCNN with multiscale feature maps, thus achieving high detection accuracy while keeping a fast detection speed. Unlike two-stage detectors, an SSD can predict only a fixed number of BBs, followed by a nonmaximal suppression (NMS) operation to obtain the final results. The network architecture of an SSD uses FCNs. It carries out detection processing on multiple feature maps, each of which predicts a category score and location offset for each box of an appropriate size.

RETINANET
For years, there has been a large gap between the accuracies of one- and two-stage detectors. Lin et al. [77] claimed that the central cause of this gap is the extreme foreground–background class imbalance encountered during the training of dense detectors. To counter this, a new loss function, focal loss, was proposed in RetinaNet to improve on the standard cross-entropy loss. Focal loss makes the detector focus more on hard-to-classify examples during training. It enables one-stage detectors to achieve detection performances comparable to those of two-stage detectors while maintaining a high detection speed.

CORNERNET
Law et al. [78], [94], arguing that the anchor boxes for regressing the location of objects could cause a huge imbalance between positive and negative examples, proposed CornerNet. This formulates BB object detection as the identification of paired top-left and bottom-right key points.
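Returning to the focal loss introduced with RetinaNet above, a short sketch shows how it down-weights easy examples relative to the standard cross-entropy loss. The binary form used here is $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class. The function name and the defaults $\gamma = 2$ and $\alpha = 0.25$ are assumptions for illustration and would normally be tuned.

```python
import numpy as np


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-example binary focal loss.

    p: predicted foreground probability; y: 1 for foreground, 0 for background.
    With gamma = 0 and alpha = 0.5, this reduces to half the cross-entropy loss.
    """
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1.0 - 1e-7)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


probs = np.array([0.9, 0.1])     # an easy and a hard foreground example
labels = np.array([1, 1])
cross_entropy = -np.log(probs)
print(focal_loss(probs, labels) / cross_entropy)
```

The printed ratios show that the modulating factor $(1 - p_t)^{\gamma}$ shrinks the loss of the well-classified example far more than that of the misclassified one, so the many easy background samples in a dense detector contribute little to training.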
In CornerNet, the backbone network consists of two stacked hourglass networks [95], with a simple corner pooling approach to better localize corners. Its accuracy, although improved, was obviously lower than that of SSD and YOLO. CornerNet may generate incorrect BBs because it is difficult to decide which pairs of key points belong to the same objects. Duan et al. [96] addressed the problem by detecting each object as a triplet of key points, introducing an extra point at the center of a proposal.

DL FRAMEWORKS FOR SMALL, WEAK OBJECT DETECTION
Though there is not a clear definition of small, weak object detection in the field of remote sensing, some excellent DL-based works have addressed the related challenges. Data augmentation is a straightforward and simple technique used to improve the detection accuracy of small objects. Kisantal et al. [97] simply oversampled images with small objects and augmented each of those by copying and pasting objects many times for small-object detection. Features of different levels in DL models can effectively retain the location and semantic information of targets with different scales. The development of multiscale detection, that is, detecting objects in an appropriate feature level, is marked by many milestones, such as an FPN [88] and path aggregation [98], extended [99], multilevel [100], and multiscale FPNs [101]. These models have proved their superiority and achieved satisfactory performances, especially for small-scale object detection. Although there has been success with multiscale detection, some objects lack the discriminative features necessary for recognition. Deng et al. [102] developed a feature-level superresolution method that enhances the features of small RoIs. Li et al. [103] proposed a perceptual generative adversarial network (GAN) to bring the representations of tiny objects closer to those of large objects with similar characteristics for more precise detection.
Visual attention is an effective method used to highlight objects of interest, so it is used to detect small and dim objects. Yang et al. [104] developed a multicategory rotation detector for small, cluttered, and rotated objects wherein a supervised pixel-attention network and a channel-attention network are jointly used for highlighting small and cluttered objects. Lim et al. [105] combined the context information and the objects of interest to address the limited information of small objects. To address the nonuniform distribution, Yang et al. [106] presented a clustered detection network wherein a cluster proposal subnetwork generates object cluster regions and a scale-estimation subnetwork estimates object scales for each region. The cluster-based scale estimation is more accurate than the ones based on single objects, and the clustered regions implicitly model the prior context information. The detailed techniques and approaches for addressing small, weak target detection are summarized in the next section.

ADVANCES FOR ADDRESSING DIFFERENT CHALLENGES IN SMALL, WEAK OBJECT DETECTION
Inspired by the significant progress of object-detection methods and technologies, extensive studies have been devoted to object detection in remote sensing. Having thoroughly reviewed the recent progress of representative methods for remote sensing object detection, we introduce some critical technologies and methods that address the challenges to small, weak object detection. All of the mentioned approaches are divided into three aspects for solving the challenges discussed in the “Difficulties and Challenges in Remote Sensing Small, Weak Object Detection” section.

HANDLING THE CHALLENGES INVOLVED IN IMAGE QUALITY
In remote sensing image acquisition, there are various kinds of uncertain factors, such as noise, blurring, thin
clouds, missing information, and shadows, which may cause some degree of image degradation. In addition, due to the limitations of manufacturing technologies and the characteristics of imaging sensors, remote sensing images can reach a high resolution in only one aspect of spectral, spatial, and temporal resolution. These low-quality images cause the missing or false detection of small, weak objects. Therefore, improving the quality of remote sensing images is of great significance for small, weak object detection. In the following, the problems to be solved by the current methods for improving image quality are summarized from two aspects: image degradation and imaging sensor limitations. HANDLING IMAGE DEGRADATION The factors that cause image degradation can be divided into two categories: 1) the atmospheric influence on the reflection wave of ground objects and 2) the loss of information caused by the damaged components of the imaging sensors. Furthermore, a variety of degradation models, such as noise, blurring, thin clouds, missing information, and shadows, have been produced. Over the past few years, many approaches have been developed for addressing these different types of degradation models. In general, noise cannot be entirely avoided while acquiring remote sensing images. The most common types are additive, multiplicative speckle, and stripe noises. Some classical denoising methods are described in [107]–[109]. The causes of blurring in remote sensing images are optical blurring, mainly caused by imaging components; motion blurring, caused by relative motion between the target and sensor; and atmospheric blurring, caused by atmospheric turbulence. Most deblurring models use regularization terms to keep the solution stable and suppress the corresponding noise interference. 
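The stabilizing role of a regularization term can be illustrated with the simplest non-blind case. The sketch below assumes a known blur kernel, circular (periodic) convolution, and a Tikhonov penalty of weight `lam`, which gives the closed-form frequency-domain solution $\hat{X} = \overline{K}\,Y / (|K|^2 + \lambda)$. The function name and the test scene are hypothetical; the cited deblurring methods generally use more elaborate priors.

```python
import numpy as np


def tikhonov_deblur(blurred, kernel, lam=1e-2):
    """Restore an image blurred by a known kernel (circular convolution assumed).

    lam weights the regularization term; as lam approaches 0, this becomes
    plain inverse filtering, which amplifies noise where the kernel response is small.
    """
    k_hat = np.fft.fft2(kernel, s=blurred.shape)   # zero-padded kernel spectrum
    y_hat = np.fft.fft2(blurred)
    x_hat = np.conj(k_hat) * y_hat / (np.abs(k_hat) ** 2 + lam)
    return np.real(np.fft.ifft2(x_hat))


rng = np.random.default_rng(0)
scene = np.zeros((32, 32))
scene[12:20, 12:20] = 1.0                          # a small bright object
kernel = np.full((5, 5), 1.0 / 25.0)               # 5 x 5 box blur
blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(kernel, s=scene.shape)))
noisy = blurred + 0.05 * rng.standard_normal(scene.shape)

rmse = lambda estimate: np.sqrt(np.mean((estimate - scene) ** 2))
print(rmse(tikhonov_deblur(noisy, kernel, lam=1e-2)),
      rmse(tikhonov_deblur(noisy, kernel, lam=1e-12)))
```

With almost no regularization, the noise is amplified at frequencies where the blur response is close to zero and the error grows sharply; the regularized solution trades a small bias for stability.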
In general, existing works for image deblurring can be divided into 1) image restoration with a known blur kernel function and 2) blind image restoration with an unknown blur kernel function [110], [111]. A large number of remote sensing images are likely covered by clouds, which can be characterized as thin and thick clouds. Thin clouds lead to the color fading of objects and reduce the contrast of objects in the images, making them difficult to recognize. In recent years, many approaches [112]–[114] have been proposed for thin cloud removal. Thick clouds and damaged sensors cause the loss of some image regions. In this case, the surface information of Earth obtained by images is incomplete and difficult to use in real-world applications. Some representative methods [115]–[117] have been developed to restore the missing parts of remote sensing images. Because of the imaging angle of sensors, shadows are one of the basic characteristics of remote sensing images. Tall trees, scattered buildings, mountains, and so on may cause shadows. Many small, weak objects in shadows are more difficult to recognize. Some effective methods for removing shadows are introduced in [118]–[121].

HANDLING SENSOR LIMITATIONS
Due to the limitations of sensors, remote sensing images achieve high performance in only one aspect of spatial, spectral, and temporal resolution, which cannot meet the requirements for some specific tasks. Additionally, when processing remote sensing images, it is necessary to discretize the time, space, spectrum, and observation angle information from original images to save them in the form of digital images. The process of discretization often means downsampling data, which inevitably leads to a loss of information. To some extent, image-fusion models that fuse single or multisource images with different resolutions can remedy the degradation of remote sensing images and improve the data quality.
Information-complementary fusion methods include spatial and spectral [122], temporal and spatial [123], and multispectral and hyperspectral fusion [124]. HANDLING THE CHALLENGES INVOLVED WITH OBJECT VARIATIONS HRRS images always contain massive object categories and instances, which are variant in scale, appearance, and distribution. The features easily change, as they are affected by weather, illumination, and occlusions. Additionally, due to large image sizes, the problem of unbalanced positive and negative training examples is quite serious, and high-quality training instances are relatively few. Obtaining large-scale annotation data sets is another critical problem for achieving satisfactory detection performance. The aforementioned challenges of object variations are divided into four types in this article: scale variations, high intraclass variations, the imbalance of positive and negative examples, and a lack of annotation data sets. The scale problem should belong with high intraclass variations; however, because of its importance in remote sensing object detection, we list it separately and summarize the corresponding methods. The methods used to address the challenges of these four aspects are introduced in the following sections. HANDLING SCALE VARIATIONS In the remote sensing community, scale variations, overlarge images, complex image backgrounds, and the nonuniform distribution of training samples make detection tasks more challenging, especially for small and cluttered objects. Some targets, such as football fields and harbors, are wider than 150 m and occupy 300 pixels in an image, while the widths of some other targets, such as vehicles, are fewer than 3 m and can occupy only 10 pixels in an image. The multiscale detection of objects with different sizes and aspect ratios is one of the main challenges in remote sensing object detection. Many scholars have further improved the model and achieved better results for robust multiscale detection. 
There are three main categories of detection methods used in Earth observation. The first category uses an image or sliding-window pyramids as the input. Zhang et al. [125], [126] resized the input image to different scales and extracted image features on each scale. Yao et al. [127]–[129] used
multiscale sliding windows with different step sizes to conduct training with images for generating potential candidate boxes. This method, however, is too time- and computation consuming to meet the requirements of practical applications. The second category is based mainly on various multiscale features of a manual design, such as a scale-invariant feature transform (SIFT) [130], an HoG [10], and a BoW [60]. Beril et al. [131] utilized the SIFT feature and graph theory to detect buildings and urban areas. Shi et al. [40], [132] combined both circle-frequency and HoG features to learn the appearances and shapes of objects. Sun et al. [134] developed a spatial sparse-coding BoW model to build the visual vocabulary by clustering local features; it can effectively fuse local and global features. However, the two categories of methods pose difficulties when it comes to achieving satisfactory performances for remote sensing target detection because they all depend on handcrafted features—extracted according to expert experience—and are not robust enough to process complex remote sensing images. Since 2014, many learning-based detectors that incorporate the object proposal strategy, coupled with the remarkable performance of DL-based features [13], [14], [81], [135], have enabled significant improvements in the performance of object localization and recognition [136]–[138]. Multireference and multiresolution detection, developed on this basis, have become the two most widely used fundamental blocks in the task of object detection [21]. The main idea of multireference detection is to predefine a set of reference boxes (anchor boxes) with different sizes and aspect ratios and then to predict the detection box based on those references. The milestone models are faster RCNN [73], RetinaNet [77], and mask RCNN [84], [85]. Multiresolution detection detects objects with different scales by constructing a feature pyramid at different layers of the network. 
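The multireference idea described above can be sketched in a few lines. The following is a minimal illustration, not the implementation of any cited detector: the function names `make_anchors` and `decode_box`, the chosen scales, and the aspect ratios are assumptions made for this example, and the offset parameterization follows the commonly used center/log-size form.

```python
import numpy as np

def make_anchors(center_x, center_y, scales, aspect_ratios):
    """Build reference boxes [x1, y1, x2, y2] around one location.

    Each anchor has area scale**2 and width/height equal to the aspect ratio.
    """
    boxes = []
    for s in scales:
        for r in aspect_ratios:
            w = s * np.sqrt(r)   # wider boxes for r > 1
            h = s / np.sqrt(r)   # taller boxes for r < 1
            boxes.append([center_x - w / 2, center_y - h / 2,
                          center_x + w / 2, center_y + h / 2])
    return np.array(boxes)

def decode_box(anchor, offsets):
    """Turn predicted offsets (tx, ty, tw, th) into a box relative to an anchor."""
    xa, ya = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    wa, ha = anchor[2] - anchor[0], anchor[3] - anchor[1]
    tx, ty, tw, th = offsets
    x, y = xa + tx * wa, ya + ty * ha          # shift the center
    w, h = wa * np.exp(tw), ha * np.exp(th)    # rescale width and height
    return np.array([x - w / 2, y - h / 2, x + w / 2, y + h / 2])

# Hypothetical scales (in pixels) spanning small vehicles to large facilities.
anchors = make_anchors(100.0, 100.0, scales=(8, 32, 128),
                       aspect_ratios=(0.5, 1.0, 2.0))
unchanged = decode_box(anchors[4], (0.0, 0.0, 0.0, 0.0))
```

With zero offsets the decoded box equals its anchor, which is why well-chosen anchor sizes give the regressor a good starting point for small objects.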
The shallow layers hold information about small objects, while the deep layers contain information about large objects. The main improvements are in the FPN [88]. To detect multiscale objects, especially small ones, in HRRS images, Guo et al. [139] and Zhang et al. [140] designed unified multiscale detection frameworks; they used a modified FPN as well as anchors with different scales and aspect ratios. Qiu et al. [141] developed an adaptive aspect ratio multiscale network, which utilizes a multiscale feature gate-fusion subnetwork and an aspect ratio attention network to learn the weights of different feature maps and automatically select the appropriate aspect ratios in accordance with the aspect ratios of objects. Wu et al. [142] introduced multiscale and rotation-insensitive convolutional channel features by involving two modules, the rotation-insensitive descriptor and the multiscale aggregated descriptor. AlAlimi et al. [143] designed a shallow-deep feature extraction scheme that employs a squeeze-and-excitation network and ResNet to obtain feature maps. Deng et al. [144] addressed the problem of scale variation by applying different filters to several intermediate layers. Li et al. [145] proposed multiscale convolutional feature fusion to detect objects in multisensor HRRS images, using a symmetric encoder–decoder module to extract and fuse multiscale and high-level spatial features. Some scholars have focused their research work on segmentation methods. Dong and You [146], [147] utilized a graph-segmentation algorithm, constructed on multiscale saliency maps, to overcome the problem of ship scale change and accurately locate candidate regions. Kang et al. [148] designed an FCN with dense SPP for building detection that can extract dense and multiscale features simultaneously. Mo et al.
[149] focused on generating an anchor of the most suitable scale for each category and developed a class-specific anchor block, which provides better initial values for an RPN. Xie et al. [150] used multiple detectors with different sensitivities and used the fused features to complete the task of target detection. Superresolution [102] and GANs [103] have also been used to restore or enhance the feature response of small targets during the detection process.

HANDLING INTRACLASS VARIATIONS

Objects in HRRS images vary in color, texture, and shape because of the vast number of object instances and categories as well as the influences of weather, illumination, imaging conditions, and occlusion. For real-scenario HRRS image object detection, object representations that are both robust and discriminative should be extracted. Many recent works have been devoted to handling object variations by applying DL models to remote sensing object detection. However, CNN models are not inherently invariant to general geometric transformations of the input data. In processing HRRS images, the performance of these models is therefore limited by the intraclass variations of objects. Data augmentation, including rotation and resizing, is the most straightforward method used to address intraclass variations. To some extent, these operations can make detectors learn robustness with regard to rotation and scale, although they can involve expensive training and a massive number of model parameters. Therefore, many attempts have been made to learn invariant CNN representations with respect to different transformations, including scale [151]–[153], rotation [151], [154]–[156], or both [157]. Early deformable part-based models (DPMs) [157], which represent objects by components arranged in a deformable configuration, were successful for generic object detection, and these models are less sensitive to object variations in both pose and viewpoint.
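Rotation augmentation, mentioned above, must transform the annotations together with the pixels. The sketch below shows one simple case, a 90° counterclockwise rotation; the function name `rotate90_with_boxes` and the box convention (x along columns, y along rows, `x2` and `y2` exclusive) are assumptions made for this example rather than part of any cited method.

```python
import numpy as np

def rotate90_with_boxes(image, boxes):
    """Rotate an H x W (x C) image 90 degrees counterclockwise with its boxes.

    boxes: (N, 4) array of [x1, y1, x2, y2] in pixel units.
    """
    width = image.shape[1]
    rotated = np.rot90(image, k=1)  # rotates the first two axes
    x1, y1, x2, y2 = boxes.T
    # A pixel at (row, col) moves to (width - 1 - col, row), so the old
    # y-range becomes the new x-range and the old x-range is mirrored.
    new_boxes = np.stack([y1, width - x2, y2, width - x1], axis=1)
    return rotated, new_boxes

# Toy 3 x 4 image with one 2 x 2 object.
image = np.zeros((3, 4))
image[0:2, 1:3] = 1.0
boxes = np.array([[1, 0, 3, 2]])
rotated, rotated_boxes = rotate90_with_boxes(image, boxes)
```

Applying the function repeatedly yields the 180° and 270° variants; arbitrary angles would additionally require interpolation and rotated or enlarged boxes.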
Many scholars have attempted to combine DPMs with CNNs, aiming to realize the advantages of both [159]–[161]. To address the problem of occlusions, deformable RoIPool [161]–[163] and deformable convolution have been proposed to achieve more flexibility in fixed geometric structures [27]. Another method, the application of GANs [164], [165] to generate missing parts of objects and context, is promising. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
HANDLING THE IMBALANCE OF POSITIVE AND NEGATIVE EXAMPLES

In essence, training a detector is a problem of imbalanced data learning. For detectors based on a sliding window, the imbalance between objects of interest and backgrounds may be as extreme as 10^4–10^5 background windows for each object [21]. For a modern detection task that also predicts the object aspect ratio, the imbalance ratio increases to greater than 10^6. In this case, a vast number of easy negative samples would dominate the training process, and the detector would perform poorly on hard-to-recognize objects, especially small, weak objects. Hard negative mining aims to solve the problem of imbalanced data during the training process. Bootstrapping was a milestone technique used for addressing the problem of training data imbalance in object detection, in which the training starts with a small number of background samples to which new misclassified backgrounds are added iteratively during the training process [166]. Later in the DL era, detectors such as faster RCNN [73] and YOLO [75] balanced positive and negative samples by weighting. However, that method cannot completely address the imbalanced data problem. Bootstrapping was therefore reused in DL-based detectors [76], [167]. In RefineDet [168], an anchor-refinement module is designed to filter easy negatives. An alternative improvement is to design new loss functions [77], [170] by reshaping the standard cross-entropy loss to put more focus on difficult, misclassified examples. The recent A-Fast-RCNN detection model [164], which utilizes GANs to handle occlusion and deformation samples, is also regarded as an example of a hard mining approach. Pang et al. [172] proposed an IoU-balanced sampling method to adaptively select high-quality negative examples in the proposal candidates for stabilizing the training process.
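The idea of reshaping the cross-entropy loss so that easy examples contribute little can be illustrated with the focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The following is a minimal NumPy sketch for the binary case, not the code of any cited detector; the function name `focal_loss` is hypothetical, and the defaults gamma = 2 and alpha = 0.25 are commonly used values taken here as assumptions.

```python
import numpy as np

def focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Per-example binary focal loss.

    prob:  predicted probability of the positive class, in (0, 1).
    label: 1 for object, 0 for background.
    """
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)                # avoid log(0)
    p_t = np.where(label == 1, prob, 1.0 - prob)          # prob. of the true class
    alpha_t = np.where(label == 1, alpha, 1.0 - alpha)    # class weighting
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

probs = np.array([0.9, 0.1])    # an easy and a hard positive example
labels = np.array([1, 1])
easy_loss, hard_loss = focal_loss(probs, labels)
```

Setting gamma to 0 recovers a class-weighted cross-entropy, which makes the role of the modulating factor easy to check.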
In the Earth observation literature, recent research works reveal that detection data sets contain an overwhelming number of easy examples and only a few difficult examples. Many scholars have therefore tried to mine the more representative difficult examples to balance the proportion of foreground–background class examples. Traditional methods usually freeze the model to mine negative examples; however, positive sample mining is also essential to avoid missed detections. Moreover, freezing the model to collect difficult examples dramatically slows training. Cheng et al. [173] developed a two-step iterative training strategy, which alternates between updating the detection model given the training set and adaptively selecting the difficult negative examples for the next model update. Focusing on airport detection, Cai et al. [174] and Xu et al. [175] applied cascade strategies to automatically select difficult examples according to the loss values of proposals. The cascade strategies significantly suppressed the false alarms that existed in airport detection.

HANDLING INSUFFICIENT TRAINING DATA

The difficulty of acquiring annotated samples means that the training data are usually not sufficient for obtaining ideal models, and data augmentation is the most straightforward method for increasing training data. In addition, researchers have developed many methods to address the problem; these can be divided into three categories: transfer learning (TL), active learning (AL), and weakly supervised learning (WSL). TL can effectively transfer well-trained knowledge from one or more source tasks to another task; this needs only a small amount of labeled data and eliminates the drudgery of preliminary learning [176]–[179]. Dong et al. [180] proposed a Sig-NMS-based faster RCNN with TL; this can annotate not only the class of an object but also its location. Chan-Hon-Tong et al.
[181] and Kellenberger et al. [182] exploited an AL-based strategy to find very confident samples for the quick retrieval of TPs in the target data set. Another method, WSL, addresses the data insufficiency problem by training detection using image-level labels only. Recently, research works on WSL have followed different branches. Some scholars have utilized multi-instance learning for WSL [183]–[185]. If an image contains many object candidates, it is considered to involve a set of labeled bags, with each bag containing many instances; image-level annotation acts as the label. The object detector is then obtained by alternating detector training, using the detector to select the most likely object instances in positive images. Research works on CNN visualization have demonstrated that the convolution layer of a CNN model behaves as a target detector even though there is no supervision of the object’s location. Therefore, class-activation mapping sheds light on a way to give a CNN model localization ability by training it on image-level labels [186]–[188]. Some scholars automatically select the most informative regions and train them with image-level annotation [189]. Another method masks out different regions of the image to localize the object [190]. Interactive annotation [184] and generative adversarial training have also been used for WSL [191]. To address the problem of a lack of annotated HRRS data sets, Zhang et al. [192] employed an iterative, weakly supervised learning framework to automatically mine and augment a training data set from the original images. Cao et al. [193] proposed a novel multi-instance-detection algorithm based on learning, using it to learn instancewise detectors from such a “weak annotation.” In the algorithm, a density estimator is adopted to estimate the density map of vehicle instances from the positive regions; a multi-instance SVM is then trained to classify and locate vehicle instances from this map. 
Whereas existing WSL methods take scenes as being isolated and ignore the mutual cues between scene pairs when optimizing deep networks, Li et al. [194] exploited both the separate scene category information and the mutual cues between scene pairs to train deep networks well enough to pursue superior object-detection performance.
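The class-activation mapping idea mentioned in this section can be sketched as a weighted sum of the last convolutional feature maps, using the weights that the classification layer assigns to the class of interest. The example below is a simplified illustration under stated assumptions: the network is assumed to end in global average pooling followed by one linear layer, the names `class_activation_map` and `cam_to_box` are hypothetical, and the box is taken around all above-threshold pixels rather than the largest connected component.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_index):
    """Compute a normalized activation map for one class.

    feature_maps: (C, H, W) output of the last convolutional layer.
    fc_weights:   (num_classes, C) weights of the linear layer that
                  follows global average pooling (an assumed architecture).
    """
    weights = fc_weights[class_index]                               # (C,)
    cam = np.tensordot(weights, feature_maps, axes=([0], [0]))      # (H, W)
    cam = cam - cam.min()
    if cam.max() > 0:
        cam = cam / cam.max()                                       # scale to [0, 1]
    return cam

def cam_to_box(cam, threshold=0.5):
    """Return [x1, y1, x2, y2) around above-threshold pixels, or None."""
    ys, xs = np.where(cam >= threshold)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

# Toy example: channel 0 responds to a 2 x 2 object, channel 1 to a corner.
features = np.zeros((2, 4, 4))
features[0, 1:3, 1:3] = 1.0
features[1, 0, 0] = 1.0
fc = np.array([[1.0, 0.0], [0.0, 1.0]])
cam = class_activation_map(features, fc, class_index=0)
box = cam_to_box(cam)
```

Only image-level labels are needed to train the classifier that supplies `fc_weights`, which is what makes this a weakly supervised route to localization.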
HANDLING COMPLEX CONTEXT

Objects of interest are always embedded in a typical context with surrounding environments and objects. An HRRS image usually covers a broad area and contains many kinds of objects that form an intricate spatial pattern. The complex background of the objects of interest increases the difficulty of highly accurate detection; however, many existing works have demonstrated that the proper use of context information can improve the performance of detectors. Current works on handling complex backgrounds can be divided into two categories: 1) detection with background suppression and 2) detection with related context information.

DETECTION WITH SUPPRESSING BACKGROUND

Many early works, taking advantage of the remarkable feature-extraction ability of the CNN model, directly applied the models to adapt to the complex, changeable background and learn discriminative features for HRRS image detection [126], [195]. To effectively distinguish between the target and background information, Xiao et al. [196] designed an encoder–decoder network to perform paired semantic segmentation for per-pixel prediction. The top-left and bottom-right parts of the objects of interest are then predicted, and the rotated minimum BB is generated as the rotated anchor. Compared with the methods above, this method is more robust across different data sets.

DETECTION WITH RELATED CONTEXT INFORMATION

The remote sensing community has long acknowledged that context information benefits object detection. Therefore, more work has been done to explore how to make good use of that information. Context information can be placed into two categories: local and global context [21]. Local context refers to visual information such as the texture, color, and objects in the region that surrounds the targets to be detected. In contrast, global context employs scene semantics as the additional information for target detection.
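Local context, as defined above, is typically gathered from an enlarged region around a candidate box. A minimal sketch of that step follows; the function name `expand_box` and the default scale factor of 2 are illustrative assumptions, and the enlarged region is clipped to the image bounds.

```python
def expand_box(box, image_width, image_height, scale=2.0):
    """Enlarge a box about its center to take in surrounding context.

    box: (x1, y1, x2, y2) in pixels; the result is clipped to the image.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(image_width), cx + half_w),
            min(float(image_height), cy + half_h))

# A 20 x 20 pixel proposal grows to 40 x 40 pixels of object plus context.
context_box = expand_box((40, 40, 60, 60), image_width=200, image_height=200)
```

Features pooled from the enlarged region can then be combined with those of the original box, which is especially helpful when the object itself covers only a few pixels.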
Existing methods focus mainly on fusing local contexts to improve detection performance. Gong et al. [197] integrated the context RoIs’ mining layer into the detector. The layer can extract local context features by mapping context RoIs to multilevel feature maps. Considering the limited label information provided by objects, especially small objects, in the feature map, Mo et al. [149] doubled the size of the region proposal box, keeping it centered on the predicted box, to incorporate the local context information and thus improve the discriminative ability of features in recognizing the objects. Ma et al. proposed a multimodel decision fusion network [198], based on gated recurrent units (GRUs) [199], in which one of the subnetworks is designed to learn the local context of objects of interest and the object–object relationships. GRUs are used to merge all of the features and form a discriminative feature representation. Bell et al. [200] developed the inside–outside network (ION) to exploit information both inside and outside the RoIs; it integrates the contextual information outside the RoIs by using spatial recurrent neural networks. Xiao et al. [129] fused auxiliary features within and around the RoIs to represent the complementary information of each region proposal for airport detection, effectively alleviating detection problems caused by the diversity of illumination intensities in remote sensing images. To generate accurate rotated BBs in large-scale aerial images, Feng et al. [202] proposed a detection network that introduced a novel sequence local context module. It can extract local context features, thus making the rotated BB fit the ship tightly. The accurate BB can include the discriminative parts, such as the prow, and exclude noise information, such as the background. Other works have promoted the global context as additional information. Focusing on the task of vehicle detection, Tao et al.
[158] proposed a vehicle-detection method driven by scene context. This first classifies the input image into different scene categories (e.g., road, parking lot, and others) and then detects vehicles in each scene separately, using the contextual information provided by the preceding scene classification. Incorporating scene information before vehicle detection can effectively confine the region where vehicles may be present and allows a more flexible postprocessing strategy to be applied according to the scene type. By analyzing the relationship of objects and scenes in remote sensing images, Chen et al. [133] found that most of the objects appeared in their relevant scenes. The objects have a strong correlation with the contextual information of their scene. Chen et al. [133] proposed a scene-contextual FPN that fuses the global scene features into region proposal features for training the classifier. Both global and local contextual information is valuable, so the fusion of the two may achieve a better performance. Relevant work has been carried out on this approach. Zhang and Liu et al. [169] proposed a context-aware detection network to improve the accuracy of target detection; this can learn the correlations of the global information (at the scene level) and the local neighboring objects or features (at the object level). Li and Gong [156] used a double-channel network to fuse the local and global features to enhance the discrimination of the feature.

FUTURE RESEARCH DIRECTIONS

Despite tremendous recent progress in small, weak object detection, the main technologies are still primitive and cannot satisfactorily address all the difficulties and challenges. Our analysis shows that future research may focus on (but should not be limited to) the following areas.

DETECTION WITH MULTISOURCE DATA FUSION

Detectors for small, weak object detection may not be stable.
The fusion of multiple sources/modalities of data, such as 3D point clouds, lidar, and Internet data, is of great importance for improving detection accuracy. Two critical problems should be addressed: how to encode multisource or multimodal data into a unified input for the detectors
and how to transfer well-trained detectors to different modalities of data.

WEAKLY SUPERVISED DETECTION

Recent state-of-the-art approaches require many samples with accurate annotation, in the manner of fully supervised learning. However, labeling samples is labor intensive and time consuming. Meanwhile, weakly/partially annotated or unlabeled samples are easily accessible and plentiful. Therefore, it is essential to leverage DL-based models to learn from these samples to boost detection ability.

LIGHTWEIGHT OBJECT MODEL

The number of layers in existing CNN models for extracting features has dramatically increased from several [14] to hundreds [13], [206]. These models have millions of parameters and need massive computation resources and training data to obtain an ideal model. To train CNN models efficiently, much work has been done to develop a series of lightweight and compact models. However, a significant gap in efficiency between detectors and the human eye remains.

AUTOMATIC NEURAL ARCHITECTURE SEARCH

Most existing target detectors are based on manual design. Meeting problems of ever-increasing complexity requires increasing domain knowledge and expertise. Recently, a natural research direction has been to automatically select and build a detector whose performance is balanced against its number of parameters, as in automated ML [201]. Related work should be carried out for small, weak object detection.

IMPROVEMENT OF IMAGE QUALITY

Affected by imaging conditions such as weather, light, and sensor resolution, remote sensing images may not meet usage requirements because they are blurred, noisy, or of low resolution. Algorithms such as image fusion, image denoising, and superresolution have been developed to address these problems. These should be combined with detection methods to improve detection performance.
UNIVERSAL OBJECT FRAMEWORK

Recently, increasing efforts have been made in learning universal representations, reinforcement learning, and lifelong learning; these are effective in learning, transferring, and reasoning about knowledge from massive data. It is meaningful to design a universal object framework based on state-of-the-art advances, which can gradually self-evolve and improve detection performance.

CONCLUSIONS

To meet the requirements of some applications, the task of small, weak object detection, which is more challenging than generic object detection, has become increasingly important and attracted much attention. During the last several years, considerable efforts have been made to develop various methods that address small, weak object detection. This article presented a systematic review of the advances of small, weak object detection in the remote sensing community. Having analyzed the challenges and difficulties of small, weak target detection, we discussed the technical evolution of object detection and benchmark data sets. Finally, we categorized the existing works that address different challenges and outlined some promising research directions for the further improvement of small, weak object detection. The research of small, weak object detection is still far from complete, but given the breakthroughs over the past several years, we are optimistic about future developments.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China under grants U1711266 and 41925007 and the Fundamental Research Funds for the Central Universities, China University of Geosciences, Wuhan (no. 162301212697). Lizhe Wang and Ruyi Feng are the corresponding authors.
AUTHOR INFORMATION

Wei Han (weihan@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China.

Jia Chen (chen_jia@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China.

Lizhe Wang (lizhe.wang@foxmail.com) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China, and the Key Laboratory of Geological Survey and Evaluation of the Ministry of Education, China University of Geosciences, Wuhan, 430078, China.

Ruyi Feng (fengry@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China.

Fengpeng Li (li_feng_peng@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China.

Lin Wu (wulin@cug.edu.cn) is with the Key Laboratory of Geological Survey and Evaluation of the Ministry of Education, China University of Geosciences, Wuhan, 430074, China.

Tian Tian (tiantian@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of
Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China.

Jining Yan (yanjn@cug.edu.cn) is with the School of Computer Science, China University of Geosciences, Wuhan, 430078, China, and also the Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, 430078, China.

REFERENCES

[1] Z. Lin et al., “A contextual and multitemporal active-fire detection algorithm based on FengYun-2G S-VISSR data,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8840–8852, 2019. doi: 10.1109/TGRS.2019.2923248.
[2] Z. Lin et al., “An active fire detection algorithm based on multitemporal FengYun-3C VIRR data,” Remote Sens. Environ., vol. 211, pp. 376–387, June 2018. doi: 10.1016/j.rse.2018.04.027.
[3] N. Wang, F. Chen, B. Yu, and Y. Qin, “Segmentation of large-scale remotely sensed images on a spark platform: A strategy for handling massive image tiles with the MapReduce model,” ISPRS J. Photogram. Remote Sens., vol. 162, pp. 137–147, Apr. 2020. doi: 10.1016/j.isprsjprs.2020.02.012.
[4] N. Gorelick, M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore, “Google earth engine: Planetary-scale geospatial analysis for everyone,” Remote Sens. Environ., vol. 202, pp. 18–27, Dec. 2017. doi: 10.1016/j.rse.2017.06.031.
[5] D. Li, Y. Ke, H. Gong, and X. Li, “Object-based urban tree species classification using bi-temporal worldview-2 and worldview-3 images,” Remote Sens., vol. 7, no. 12, pp. 16,917–16,937, 2015. doi: 10.3390/rs71215861.
[6] K. Huang and X. Mao, “Detectability of infrared small targets,” Infrared Phys. Techn., vol. 53, no. 3, pp. 208–217, 2010. doi: 10.1016/j.infrared.2009.12.001.
[7] D. M. McKeown Jr. and J. L. Denlinger, “Cooperative methods for road tracking in aerial imagery,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 1988, pp. 662–672. doi: 10.1109/CVPR.1988.196307.
[8] S. Leninisha and K. Vani, “Water flow based geometric active deformable model for road network,” ISPRS J. Photogram. Remote Sens., vol. 102, pp. 140–147, Apr. 2015. doi: 10.1016/j.isprsjprs.2015.01.013.
[9] J. Peng and Y. Liu, “Model and context-driven building extraction in dense urban aerial images,” Int. J. Remote Sens., vol. 26, no. 7, pp. 1289–1307, 2005. doi: 10.1080/01431160512331326675.
[10] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2005, pp. 886–893. doi: 10.1109/CVPR.2005.177.
[11] G. Cheng and J. Han, “A survey on object detection in optical remote sensing images,” ISPRS J. Photogram. Remote Sens., vol. 117, pp. 11–28, July 2016. doi: 10.1016/j.isprsjprs.2016.03.014.
[12] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2009, pp. 248–255.
[13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. 25: 26th Annu. Conf. Neural Inf. Process. Syst., 2012, pp. 1106–1114.
[15] F. Li, R. Feng, W. Han, and L. Wang, “High-resolution remote sensing image scene classification via key filter bank based on convolutional neural network,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 11, pp. 8077–8092, 2020. doi: 10.1109/TGRS.2020.2987060.
[16] K. Liu and G. Mattyus, “Fast multiclass vehicle detection on aerial images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 9, pp. 1938–1942, 2015. doi: 10.1109/LGRS.2015.2439517.
[17] S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun. Image R, vol. 34, pp. 187–203, 2016.
doi: 10.1016/j.jvcir.2015.11.002.
[18] G. Xia et al., “DOTA: A large-scale dataset for object detection in aerial images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3974–3983. doi: 10.1109/CVPR.2018.00418.
[19] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, “Object detection in optical remote sensing images: A survey and a new benchmark,” ISPRS J. Photogram. Remote Sens., vol. 159, pp. 296–307, Jan. 2020. doi: 10.1016/j.isprsjprs.2019.11.023.
[20] L. Liu et al., “Deep learning for generic object detection: A survey,” Int. J. Comput. Vis., vol. 128, no. 2, pp. 261–318, 2020. doi: 10.1007/s11263-019-01247-4.
[21] Z. Zou, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,” 2019. [Online]. Available: http://arxiv.org/abs/1905.05055
[22] M. Manana, C. Tu, and P. A. Owolawi, “A survey on vehicle detection based on convolution neural networks,” in Proc. 3rd IEEE Int. Conf. Comput. Commun. (ICCC), 2017, pp. 1751–1755. doi: 10.1109/CompComm.2017.8322840.
[23] A. Borji, M.-M. Cheng, Q. Hou, H. Jiang, and J. Li, “Salient object detection: A survey,” Comput. Vis. Media, vol. 1411, no. 7, pp. 1–34, 2014. doi: 10.1007/s41095-019-0149-9.
[24] W. Wang, Q. Lai, H. Fu, J. Shen, and H. Ling, “Salient object detection in the deep learning era: An in-depth survey,” 2019, arXiv:1904.09146.
[25] J. Han, D. Zhang, G. Cheng, N. Liu, and D. Xu, “Advanced deep-learning techniques for salient and category-specific object detection: A survey,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 84–100, 2018. doi: 10.1109/MSP.2017.2749125.
[26] Z. Liu, H. Wang, L. Weng, and Y. Yang, “Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 8, pp. 1074–1078, 2016. doi: 10.1109/LGRS.2016.2565705.
[27] X. Zhang, Y. Yang, Z. Han, H. Wang, and C. Gao, “Object class detection: A survey,” ACM Comput. Surv., vol. 46, no. 1, pp. 10:1–10:53, 2013.
doi: 10.1145/2522968.2522978.
[28] G.-D. Wang, C.-Y. Chen, and X.-B. Shen, “Facet-based infrared small target detection method,” Electron. Lett., vol. 41, no. 22, pp. 1244–1246, 2005. doi: 10.1049/el:20052289.
[29] G. J. Klinker, S. A. Shafer, and T. Kanade, “Image segmentation and reflection analysis through color,” in Proc. Appl. Artific. Intell. VI, vol. 937, 1988, pp. 229–244. doi: 10.1117/12.946980.
[30] P. W. Kruse, “Principles of uncooled infrared focal plane arrays,” in Semiconductors Semimetals, vol. 47, P. W. Kruse and D. D. Skatrud, Eds. Amsterdam, The Netherlands: Elsevier, 1997, pp. 17–42.
[31] J. Han, Y. Ma, B. Zhou, F. Fan, K. Liang, and Y. Fang, “A robust infrared small target detection algorithm based on human visual system,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 12, pp. 2168–2172, 2014. doi: 10.1109/LGRS.2014.2323236.
[32] B. Lei, B. Wang, G. Sun, Y. Xu, P. Hong, C. Liu, and S. Yue, “A fast detection method for small weak infrared target in complex background,” in Proc. Infrared, Millimeter-Wave, Terahertz Technol. IV, vol. 10030, 2016, p. 100301V. doi: 10.1117/12.2245912.
[33] A. G. Tartakovsky, S. Kligys, and A. Petrov, “Adaptive sequential algorithms for detecting targets in a heavy IR clutter,” in Proc. Signal Data Process. Small Targets 1999, vol. 3809, pp. 119–130. doi: 10.1117/12.364013.
[34] A. G. Tartakovsky and R. B. Blazek, “Effective adaptive spatial-temporal technique for clutter rejection in IRST,” in Proc. Signal Data Process. Small Targets 2000, vol. 4048, pp. 85–95. doi: 10.1117/12.392023.
[35] B. L. Rozovskii, A. Petrov, and R. B. Blazek, “Interactive banks of Bayesian matched filters,” in Proc. Signal Data Process. Small Targets 2000, vol. 4048, pp. 122–133. doi: 10.1117/12.391972.
[36] C. Benedek, X. Descombes, and J. Zerubia, “Building development monitoring in multitemporal remotely sensed image pairs with stochastic birth-death dynamics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 1, pp. 33–50, 2012. doi: 10.1109/TPAMI.2011.94.
[37] G. Cheng, J. Han, P. Zhou, and L. Guo, “Multi-class geospatial object detection and geographic image classification based on collection of part detectors,” ISPRS J. Photogram. Remote Sens., vol. 98, pp. 119–132, 2014. doi: 10.1016/j.isprsjprs.2014.10.002.
[38] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J.
Jiao, “Orientation robust object detection in aerial images using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process., 2015, pp. 3735–3739. doi: 10.1109/ICIP.2015.7351502. [39] T. N. Mundhenk, G. Konjevod, W. A. Sakla, and K. Boakye, “A large contextual dataset for classification, detection and counting of cars with deep learning,” in Proc. Comput. Vis. - ECCV 2016 - 14th Euro. Conf., Amsterdam, The Netherlands, pp. 785–800. doi: 10.1007/978-3-319-46487-9_48. [40] Z. Xiao, Q. Liu, G. Tang, and X. Zhai, “Elliptic Fourier transformation-based histograms of oriented gradients for rotationally invariant object detection in remote-sensing images,” Int. J. Remote Sens., vol. 36, no. 2, pp. 618–644, 2015. doi: 10.1080/01431161.2014.999881. [41] D. Lam et al., “xView: Objects in context in overhead imagery,” 2018, arXiv:1802.07856. [42] Y. Zhang, Y. Yuan, Y. Feng, and X. Lu, “Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5535–5548, 2019. doi: 10.1109/TGRS.2019.2900302. [43] X. Lu, Y. Zhang, Y. Yuan, and Y. Feng, “Gated and axis-concentrated localization network for remote sensing object detection,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 179–192, 2020. doi: 10.1109/TGRS.2019.293517. [44] M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman, “The Pascal visual object classes challenge: A retrospective,” Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, 2015. doi: 10.1007/s11263-014-0733-5. [45] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015. doi: 10.1007/s11263-015-0816-y. [46] T. Lin et al., “Microsoft COCO: Common objects in context,” in Proc. Comput. Vis. - ECCV 2014 - 13th Euro. Conf., D. J. Fleet, T. Pajdla, B. Schiele, and T.
Tuytelaars, Eds., in Lecture Notes in Computer Science, vol. 8693, 2014, pp. 740–755. doi: 10.1007/978-3-319-10602-1_48. [47] A. Kuznetsova et al., “The open images dataset V4: Unified image classification, object detection, and visual relationship detection at scale,” 2018. [Online]. Available: http://arxiv.org/abs/1811.00982 [48] G. Heitz and D. Koller, “Learning spatial context: Using stuff to find things,” in ECCV 2008: Proc. 10th Euro. Conf. Comput. Vis., Part I, pp. 30–43. doi: 10.1007/978-3-540-88682-2_4. [49] M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman, “The Pascal visual object classes (VOC) challenge,” Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, 2010. doi: 10.1007/s11263-009-0275-4. [50] J. Zhang, X. Lin, Z. Liu, and J. Shen, “Semi-automatic road tracking by template matching and distance transformation in urban areas,” Int. J. Remote Sens., vol. 32, no. 23, pp. 8331–8347, 2011. doi: 10.1080/01431161.2010.540587. [51] J. Zhou, W. F. Bischof, and T. Caelli, “Road tracking in aerial images based on human–computer interaction and Bayesian filtering,” ISPRS J. Photogram. Remote Sens., vol. 61, no. 2, pp. 108–124, 2006. doi: 10.1016/j.isprsjprs.2006.09.002. [52] M. A. Fischler and R. A. Elschlager, “The representation and matching of pictorial structures,” IEEE Trans. Comput., vol. C-22, no. 1, pp. 67–92, 1973. doi: 10.1109/T-C.1973.223602. [53] A. K. Jain, Y. Zhong, and M. Dubuisson-Jolly, “Deformable template models: A review,” Signal Process., vol. 71, no. 2, pp. 109–129, 1998. doi: 10.1016/S0165-1684(98)00139-X. [54] C. Xu and H. Duan, “Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target recognition for low-altitude aircraft,” Pattern Recognit. Lett., vol. 31, no. 13, pp. 1759–1772, 2010. doi: 10.1016/j.patrec.2009.11.018. [55] A. Huertas and R. Nevatia, “Detecting buildings in aerial images,” Comput. Vis. Graph. Image Process., vol. 41, no. 2, pp. 131–152, 1988.
doi: 10.1016/0734-189X(88)90016-3. [56] R. B. Irvin and D. M. McKeown, “Methods for exploiting the relationship between buildings and their shadows in aerial imagery,” IEEE Trans. Syst., Man, Cybern., vol. 19, no. 6, pp. 1564–1575, 1989. doi: 10.1109/21.44071. [57] T. Blaschke, “Object based image analysis for remote sensing,” ISPRS J. Photogram. Remote Sens., vol. 65, no. 1, pp. 2–16, 2010. doi: 10.1016/j.isprsjprs.2009.06.004. [58] T. Blaschke et al., “Geographic object-based image analysis–Towards a new paradigm,” ISPRS J. Photogram. Remote Sens., vol. 87, pp. 180–191, Jan. 2014. doi: 10.1016/j.isprsjprs.2013.09.014.
[59] T. Blaschke, S. Lang, and G. Hay, Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications. Berlin: Springer-Verlag, 2008. [60] F. Li and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. CVPR, pp. 524–531. doi: 10.1109/CVPR.2005.16. [61] Ö. Aytekin, U. Zöngür, and U. Halici, “Texture-based airport runway detection,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 471–475, 2013. doi: 10.1109/LGRS.2012.2210189. [62] C. Senaras, M. Ozay, and F. T. Yarman-Vural, “Building detection with decision fusion,” IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., vol. 6, no. 3, pp. 1295–1304, 2013. doi: 10.1109/JSTARS.2013.2249498. [63] V. Vapnik, Statistical Learning Theory. Hoboken, NJ: Wiley, 1998. [64] J. Inglada, “Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features,” ISPRS J. Photogram. Remote Sens., vol. 62, no. 3, pp. 236–248, 2007. doi: 10.1016/j.isprsjprs.2007.05.011. [65] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proc. 13th Int. Conf. Machine Learn. (ICML ’96), Bari, Italy, 1996, pp. 148–156. doi: 10.5555/3091696.3091715. [66] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997. doi: 10.1006/jcss.1997.1504. [67] E. Blanzieri and F. Melgani, “Nearest neighbor classification of remote sensing images with the maximal margin principle,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1804–1811, 2008. doi: 10.1109/TGRS.2008.916090. [68] E. Li, J. Femiani, S. Xu, X. Zhang, and P. Wonka, “Robust rooftop extraction from visible band images using higher order CRF,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4483–4495, 2015. doi: 10.1109/TGRS.2015.2400462. [69] P.
Zhong and R. Wang, “A multiple conditional random fields ensemble model for urban area detection in remote sensing optical images,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 3978–3988, 2007. doi: 10.1109/TGRS.2007.907109. [70] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 580–587. doi: 10.1109/CVPR.2014.81. [71] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, 2015. doi: 10.1109/TPAMI.2015.2389824. [72] R. B. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, 2015, pp. 1440–1448. doi: 10.1109/ICCV.2015.169. [73] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Annu. Conf. Neural Inf. Process. Syst., 2015, pp. 91–99. [74] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Proc. Annu. Conf. Neural Inf. Process. Syst., 2016, pp. 379–387. [75] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 779–788. [76] W. Liu et al., “SSD: Single shot multibox detector,” in Proc. 14th Euro. Conf. Comput. Vis., Amsterdam, The Netherlands, 2016, pp. 21–37. doi: 10.1007/978-3-319-46448-0_2. [77] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, 2020. doi: 10.1109/TPAMI.2018.2858826. [78] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” Int. J. Comput. Vis., vol. 128, no. 3, pp. 642–656, 2020.
doi: 10.1007/s11263-019-01204-1. [79] Y. Zhao and J. Yang, “Hyperspectral image denoising via sparse representation and low-rank constraint,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 1, pp. 296–308, 2015. doi: 10.1109/TGRS.2014.2321557. [80] P. A. Viola and M. J. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proc. 2001 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 511–518. doi: 10.1109/CVPR.2001.990517. [81] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014. [Online]. Available: http://arxiv.org/abs/1409.1556 [82] S. Xie, R. B. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proc. 2017 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 5987–5995. doi: 10.1109/CVPR.2017.634. [83] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in Proc. 13th Euro. Conf. Comput. Vis., 2014, pp. 346–361. doi: 10.1007/978-3-319-10578-9_23. [84] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2980–2988. doi: 10.1109/ICCV.2017.324. [85] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, 2020. doi: 10.1109/TPAMI.2018.2844175. [86] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 3431–3440. doi: 10.1109/CVPR.2015.7298965. [87] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017. doi: 10.1109/TPAMI.2016.2572683. [88] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 936–944. doi: 10.1109/CVPR.2017.106. [89] W. Ouyang, K. Wang, X. Zhu, and X. Wang, “Chained cascade network for object detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 1956–1964. doi: 10.1109/ICCV.2017.214. [90] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6154–6162. doi: 10.1109/CVPR.2018.00644.
[91] K. Chen et al., “Hybrid task cascade for instance segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 4974–4983. doi: 10.1109/CVPR.2019.00511. [92] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 6517–6525. doi: 10.1109/CVPR.2017.690. [93] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018. [Online]. Available: http://arxiv.org/abs/1804.02767 [94] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in Proc. 15th Euro. Conf. Comput. Vis., 2018, pp. 765–781. [95] A. Newell, K. Yang, and J. Deng, “Stacked hourglass networks for human pose estimation,” in Proc. 14th Euro. Conf. Comput. Vis., Amsterdam, The Netherlands, 2016, pp. 483–499. [96] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet: Keypoint triplets for object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 6568–6577. [97] M. Kisantal, Z. Wojna, J. Murawski, J. Naruniec, and K. Cho, “Augmentation for small object detection,” 2019. [Online]. Available: http://arxiv.org/abs/1902.07296 [98] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 8759–8768. [99] C. Deng, M. Wang, L. Liu, and Y. Liu, “Extended feature pyramid network for small object detection,” 2020. [Online]. Available: https://arxiv.org/abs/2003.07021 [100] Q. Zhao et al., “M2Det: A single-shot object detector based on multi-level feature pyramid network,” in Proc. 33rd AAAI Conf. Artific. Intell., 2019, pp. 9259–9266. doi: 10.1609/aaai.v33i01.33019259. [101] Z. Liu, G. Gao, L. Sun, and Z. Fang, “HRDNet: High-resolution detection network for small objects,” 2020. [Online]. Available: https://arxiv.org/abs/2006.07607 [102] J. Noh, W. Bae, W. Lee, J. Seo, and G. Kim, “Better to follow, follow to be better: Towards precise supervision of feature super-resolution for small object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 9724–9733.
[103] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, “Perceptual generative adversarial networks for small object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1951–1959. doi: 10.1109/CVPR.2017.211. [104] X. Yang et al., “SCRDet: Towards more robust detection for small, cluttered and rotated objects,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 8231–8240. doi: 10.1109/ICCV.2019.00832. [105] J. Lim, M. Astrid, H. Yoon, and S. Lee, “Small object detection using context and attention,” 2019. [Online]. Available: http://arxiv.org/abs/1912.06319 [106] F. Yang, H. Fan, P. Chu, E. Blasch, and H. Ling, “Clustered object detection in aerial images,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 8310–8319. [107] V. S. Frost, J. A. Stiles, K. S. Shanmugan, and J. C. Holtzman, “A model for radar images and its application to adaptive digital filtering of multiplicative noise,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-4, no. 2, pp. 157–166, 1982. doi: 10.1109/TPAMI.1982.4767223. [108] D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel, “Adaptive noise smoothing filter for images with signal-dependent noise,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-7, no. 2, pp. 165–177, 1985. doi: 10.1109/TPAMI.1985.4767641. [109] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D, Nonlinear Phenomena, vol. 60, nos. 1–4, pp. 259–268, 1992. doi: 10.1016/0167-2789(92)90242-F. [110] C. R. Vogel and M. E. Oman, “Fast, robust total variation-based reconstruction of noisy, blurred images,” IEEE Trans. Image Process., vol. 7, no. 6, pp. 813–824, 1998. doi: 10.1109/83.679423. [111] J. Cai, H. Ji, C.
Liu, and Z. Shen, “Framelet-based blind motion deblurring from a single image,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 562–572, 2012. doi: 10.1109/TIP.2011.2164413. [112] M. Xu, M. R. Pickering, A. J. Plaza, and X. Jia, “Thin cloud removal based on signal transmission principles and spectral mixture analysis,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1659–1669, 2016. doi: 10.1109/TGRS.2015.2486780. [113] Y. Zhang, B. Guindon, and J. Cihlar, “An image transform to characterize and compensate for spatial variations in thin cloud contamination of Landsat images,” Remote Sens. Environ., vol. 82, nos. 2–3, pp. 173–187, 2002. doi: 10.1016/S0034-4257(02)00034-2. [114] S. Le Hégarat-Mascle and C. André, “Use of Markov random fields for automatic cloud/shadow detection on high resolution optical images,” ISPRS J. Photogram. Remote Sens., vol. 64, no. 4, pp. 351–366, 2009. doi: 10.1016/j.isprsjprs.2008.12.007. [115] J. Zhang, M. K. Clayton, and P. A. Townsend, “Missing data and regression models for spatial images,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 1574–1582, 2015. doi: 10.1109/TGRS.2014.2345513. [116] C. Zeng, H. Shen, and L. Zhang, “Recovering missing pixels for Landsat ETM+ SLC-off imagery using multi-temporal regression analysis and a regularization method,” Remote Sens. Environ., vol. 131, pp. 182–194, Apr. 2013. doi: 10.1016/j.rse.2012.12.012. [117] X. Li, H. Shen, L. Zhang, and H. Li, “Sparse-based reconstruction of missing information in remote sensing images from spectral/temporal complementary information,” ISPRS J. Photogram. Remote Sens., vol. 106, pp. 1–15, Aug. 2015. doi: 10.1016/j.isprsjprs.2015.03.009. [118] H. Li, L. Zhang, and H. Shen, “An adaptive nonlocal regularized shadow removal method for aerial remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 106–120, 2014. doi: 10.1109/TGRS.2012.2236562. [119] G. D. Finlayson, S. D. Hordley, and M. S.
Drew, “Removing shadows from images using retinex,” in Proc. 10th Color Imag. Conf., Color Sci. Eng. Syst., Technol., Appl., 2002, pp. 73–79. [120] A. Suzuki, A. Shio, H. Arai, and S. Ohtsuka, “Dynamic shadow compensation of aerial images based on color and spatial analysis,” in Proc. 15th Int. Conf. Pattern Recognit. (ICPR’00),
Barcelona, Spain, 2000, pp. 1317–1320. doi: 10.1109/ICPR.2000.905339. [121] H. Song, B. Huang, and K. Zhang, “Shadow detection and reconstruction in high-resolution satellite images via morphological filtering and example-based learning,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2545–2554, 2014. doi: 10.1109/TGRS.2013.2262722. [122] H. Li, B. S. Manjunath, and S. K. Mitra, “Multi-sensor image fusion using the wavelet transform,” in Proc. 1994 Int. Conf. Image Process., pp. 51–55. doi: 10.1109/ICIP.1994.413273. [123] F. Gao, J. G. Masek, M. R. Schwaller, and F. G. Hall, “On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 8, pp. 2207–2218, 2006. doi: 10.1109/TGRS.2006.872081. [124] Q. Wei, J. M. Bioucas-Dias, N. Dobigeon, and J. Tourneret, “Hyperspectral and multispectral image fusion based on a sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7, pp. 3658–3668, 2015. doi: 10.1109/TGRS.2014.2381272. [125] L. Zhang and Y. Zhang, “Airport detection and aircraft recognition based on two-layer saliency model in high spatial resolution remote-sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 4, pp. 1511–1524, 2017. doi: 10.1109/JSTARS.2016.2620900. [126] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, “Accurate object localization in remote sensing images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2486–2498, 2017. doi: 10.1109/TGRS.2016.2645610. [127] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, “A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF,” Neurocomputing, vol. 164, pp. 162–172, Sept. 2015. doi: 10.1016/j.neucom.2015.02.073. [128] J. Han, D. Zhang, G. Cheng, L. Guo, and J.
Ren, “Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 6, pp. 3325–3337, 2015. doi: 10.1109/TGRS.2014.2374218. [129] Z. Xiao, Y. Gong, Y. Long, D. Li, X. Wang, and H. Liu, “Airport detection based on a multiscale fusion feature for optical remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 9, pp. 1469–1473, 2017. doi: 10.1109/LGRS.2017.2712638. [130] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94. [131] B. Sirmaçek and C. Ünsalan, “Urban-area and building detection using SIFT keypoints and graph theory,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1156–1167, 2009. doi: 10.1109/TGRS.2008.2008440. [132] Z. Shi, X. Yu, Z. Jiang, and B. Li, “Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4511–4523, 2014. doi: 10.1109/TGRS.2013.2282355. [133] C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao, and J. Zhang, “Scene context-driven vehicle detection in high-resolution aerial images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7339–7351, 2019. doi: 10.1109/TGRS.2019.2912985. [134] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, “Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 1, pp. 109–113, 2011. doi: 10.1109/LGRS.2011.2161569. [135] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 1–9. doi: 10.1109/CVPR.2015.7298594. [136] S. Zhuang, P. Wang, B. Jiang, G. Wang, and C. Wang, “A single shot framework with multi-scale feature fusion for geospatial object detection,” Remote Sens., vol. 11, no. 5, p. 594, 2019.
doi: 10.3390/rs11050594. [137] S. Chen, R. Zhan, and J. Zhang, “Geospatial object detection in remote sensing imagery based on multiscale single-shot detector with activated semantics,” Remote Sens., vol. 10, no. 6, p. 820, 2018. doi: 10.3390/rs10060. [138] W. Li, R. Dong, H. Fu, and L. Yu, “Large-scale oil palm tree detection from high-resolution satellite images using two-stage convolutional neural networks,” Remote Sens., vol. 11, no. 1, p. 11, 2019. doi: 10.3390/rs11010011. [139] W. Guo, W. Yang, H. Zhang, and G. Hua, “Geospatial object detection in high resolution satellite images based on multiscale convolutional neural network,” Remote Sens., vol. 10, no. 1, p. 131, 2018. doi: 10.3390/rs10010131. [140] X. Zhang et al., “Geospatial object detection on high resolution remote sensing imagery based on Double multi-scale feature Pyramid Network,” Remote Sens., vol. 11, no. 7, p. 755, 2019. doi: 10.3390/rs11070755. [141] H. Qiu, H. Li, Q. Wu, F. Meng, K. N. Ngan, and H. Shi, “A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for object detection in remote sensing images,” Remote Sens., vol. 11, no. 13, pp. 1–23, 2019. doi: 10.3390/rs11131594. [142] X. Wu, D. Hong, P. Ghamisi, W. Li, and R. Tao, “MsRi-CCF: Multi-scale and rotation-insensitive convolutional channel features for geospatial object detection,” Remote Sens., vol. 10, no. 12, p. 1990, 2018. doi: 10.3390/rs10121990. [143] D. AL-Alimi, Y. Shao, R. Feng, M. A. Al-Qaness, M. A. Elaziz, and S. Kim, “Multi-scale geospatial object detection based on shallow-deep feature extraction,” Remote Sens., vol. 11, no. 21, 2019. [144] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, “Multiscale object detection in remote sensing imagery with convolutional neural networks,” ISPRS J. Photogram. Remote Sens., vol. 145, pp. 3–22, Nov. 2018. doi: 10.1016/j.isprsjprs.2018.04.003. [145] Z. Li, H. Shen, Q. Cheng, Y. Liu, S. You, and Z.
He, “Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors,” ISPRS J. Photogram. Remote Sens., vol. 150, pp. 197–212, Mar. 2019. doi: 10.1016/j.isprsjprs.2019.02.017. [146] C. Dong, J. Liu, F. Xu, and C. Liu, “Ship detection from optical remote sensing images using multi-scale analysis and Fourier HOG descriptor,” Remote Sens., vol. 11, no. 13, p. 1529, 2019. doi: 10.3390/rs11131529.
[147] Y. You, Z. Li, B. Ran, J. Cao, S. Lv, and F. Liu, “Broad area target search system for ship detection via deep convolutional neural network,” Remote Sens., vol. 11, no. 17, p. 1965, 2019. doi: 10.3390/rs11171965. [148] W. Kang, Y. Xiang, F. Wang, and H. You, “EU-Net: An efficient fully convolutional network for building extraction from optical remote sensing images,” Remote Sens., vol. 11, no. 23, p. 2813, 2019. doi: 10.3390/rs11232813. [149] N. Mo, L. Yan, R. Zhu, and H. Xie, “Class-specific anchor based and context-guided multi-class object detection in High Resolution Remote Sensing Imagery with a convolutional neural network,” Remote Sens., vol. 11, no. 3, p. 272, 2019. doi: 10.3390/rs11030272. [150] W. Xie, H. Qin, Y. Li, Z. Wang, and J. Lei, “A novel effectively optimized one-stage network for object detection in remote sensing imagery,” Remote Sens., vol. 11, no. 11, p. 1376, 2019. doi: 10.3390/rs11111376. [151] G. Cheng, P. Zhou, and J. Han, “RIFD-CNN: Rotation-invariant and fisher discriminative convolutional neural networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2884–2893. doi: 10.1109/CVPR.2016.315. [152] J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872–1886, 2013. doi: 10.1109/TPAMI.2012.230. [153] H. He, Y. Lin, F. Chen, H. Tai, and Z. Yin, “Inshore ship detection in remote sensing images via weighted pose voting,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 6, pp. 3091–3107, 2017. doi: 10.1109/TGRS.2017.2658950. [154] G. Cheng, J. Han, P. Zhou, and D. Xu, “Learning rotation-invariant and Fisher discriminative convolutional neural networks for object detection,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 265–278, 2019. doi: 10.1109/TIP.2018.2867198. [155] Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, “Oriented response networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 4961–4970.
[156] K. Li, G. Cheng, S. Bu, and X. You, “Rotation-insensitive and context-augmented object detection in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2337–2348, 2018. doi: 10.1109/TGRS.2017.2778300. [157] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Proc. Annu. Conf. Neural Inf. Process. Syst., 2015, pp. 2017–2025. [158] C. Chen, W. Gong, Y. Chen, and W. Li, “Object detection in remote sensing images based on a scene-contextual feature pyramid network,” Remote Sens., vol. 11, no. 3, p. 339, 2019. doi: 10.3390/rs11030339. [159] R. B. Girshick, F. N. Iandola, T. Darrell, and J. Malik, “Deformable part models are convolutional neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 437–446. doi: 10.1109/CVPR.2015.7298641. [160] W. Ouyang et al., “DeepID-Net: Deformable deep convolutional neural networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 2403–2412. doi: 10.1109/CVPR.2015.7298854. [161] J. Dai et al., “Deformable convolutional networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 764–773. [162] T. Mordan, N. Thome, G. Hénaff, and M. Cord, “End-to-end learning of latent deformable part-based representations for object detection,” Int. J. Comput. Vis., vol. 127, nos. 11–12, pp. 1659–1679, 2019. doi: 10.1007/s11263-018-1109-z. [163] W. Ouyang and X. Wang, “Joint deep learning for pedestrian detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2013, pp. 2056–2063. doi: 10.1109/ICCV.2013.257. [164] X. Wang, A. Shrivastava, and A. Gupta, “A-fast-RCNN: Hard positive generation via adversary for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 3039–3048. doi: 10.1109/CVPR.2017.324. [165] S. Zhang, J. Yang, and B.
Schiele, “Occluded pedestrian detection through guided attention in CNNs,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 6995–7003. doi: 10.1109/CVPR.2018.00731. [166] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2005, pp. 886–893. doi: 10.1109/CVPR.2005.177. [167] A. Shrivastava, A. Gupta, and R. B. Girshick, “Training region-based object detectors with online hard example mining,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 761–769. doi: 10.1109/CVPR.2016.89. [168] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, “Single-shot refinement neural network for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 4203–4212. [169] G. Zhang, S. Lu, and W. Zhang, “CAD-net: A context-aware detection network for objects in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,015–10,024, 2019. doi: 10.1109/TGRS.2019.2930982. [170] J. Jin, K. Fu, and C. Zhang, “Traffic sign recognition with hinge loss trained convolutional neural networks,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 1991–2000, 2014. doi: 10.1109/TITS.2014.2308281. [171] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2261–2269. doi: 10.1109/CVPR.2017.243. [172] J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, and D. Lin, “Libra R-CNN: Towards balanced learning for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 821–830. doi: 10.1109/CVPR.2019.00091. [173] G. Cheng et al., “Object detection in remote sensing imagery using a discriminatively trained mixture model,” ISPRS J. Photogram. Remote Sens., vol. 85, pp. 32–43, Nov. 2013. doi: 10.1016/j.isprsjprs.2013.08.001. [174] B. Cai, Z. Jiang, H. Zhang, D. Zhao, and Y.
Yao, “Airport detection using end-to-end convolutional neural network with hard example mining,” Remote Sens., vol. 9, no. 11, pp. 1–20, 2017. doi: 10.3390/rs9111198. [175] Y. Xu, M. Zhu, S. Li, H. Feng, S. Ma, and J. Che, “End-to-end airport detection in remote sensing images combining cascade region proposal networks and multi-threshold detection networks,” Remote Sens., vol. 10, no. 10, pp. 1–17, 2018. doi: 10.3390/rs10101516.
[176] M. Zhu, Y. Xu, S. Ma, S. Li, H. Ma, and Y. Han, “Effective airplane detection in remote sensing images based on multilayer feature fusion and improved nonmaximal suppression algorithm,” Remote Sens., vol. 11, no. 9, p. 1062, 2019. doi: 10.3390/rs11091062. [177] G. Zhou and Y. Zhang, “Transfer and association: A novel detection method for targets without prior homogeneous samples,” Remote Sens., vol. 11, no. 12, p. 1492, 2019. doi: 10.3390/rs11121492. [178] Z. Chen, T. Zhang, and C. Ouyang, “End-to-end airplane detection using transfer learning in remote sensing images,” Remote Sens., vol. 10, no. 1, pp. 1–15, 2018. doi: 10.3390/rs10010139. [179] C. Liu, S. Li, F. Chang, and W. Dong, “Supplemental boosting and cascaded ConvNet based transfer learning structure for fast traffic sign detection in unknown application scenes,” Sensors, vol. 18, no. 7, p. 2386, 2018. doi: 10.3390/s18072386. [180] R. Dong, D. Xu, J. Zhao, L. Jiao, and J. An, “Sig-NMS-based RCNN combining transfer learning for small target detection in VHR optical remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8534–8545, 2019. doi: 10.1109/TGRS.2019.2921396. [181] A. Chan-Hon-Tong and N. Audebert, “Object detection in remote sensing images with center only,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2018, pp. 7054–7057. doi: 10.1109/IGARSS.2018.8517860. [182] B. Kellenberger, D. Marcos, S. Lobry, and D. Tuia, “Half a percent of labels is enough: Efficient animal detection in UAV imagery using deep CNNs and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 9524–9533, 2019. doi: 10.1109/TGRS.2019.2927393. [183] R. G. Cinbis, J. J. Verbeek, and C. Schmid, “Weakly supervised object localization with multi-fold multiple instance learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 1, pp. 189–203, 2017. doi: 10.1109/TPAMI.2016.2535231. [184] D. P. Papadopoulos, J. R. R. Uijlings, F. Keller, and V.
Ferrari, “We don’t need no bounding-boxes: Training object class detectors using only human verification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 854–863. doi: 10.1109/CVPR.2016.99. [185] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, “Solving the multiple instance problem with axis-parallel rectangles,” Artif. Intell., vol. 89, nos. 1–2, pp. 31–71, 1997. doi: 10.1016/S0004-3702(96)00034-3. [186] Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, “Soft proposal networks for weakly supervised object localization,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 1859–1868. [187] A. Diba, V. Sharma, A. M. Pazandeh, H. Pirsiavash, and L. V. Gool, “Weakly supervised cascaded convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 5131–5139. [188] B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2921–2929. doi: 10.1109/CVPR.2016.319. [189] H. Bilen and A. Vedaldi, “Weakly supervised deep detection networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2846–2854. doi: 10.1109/CVPR.2016.311. [190] L. Bazzani, A. Bergamo, D. Anguelov, and L. Torresani, “Self-taught object localization with deep networks,” in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), 2016, pp. 1–9. doi: 10.1109/WACV.2016.7477688. [191] Y. Shen, R. Ji, S. Zhang, W. Zuo, and Y. Wang, “Generative adversarial learning towards fast weakly supervised detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 5764–5773. doi: 10.1109/CVPR.2018.00604. [192] F. Zhang, B. Du, L. Zhang, and M. Xu, “Weakly supervised learning based on coupled convolutional neural networks for aircraft detection,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 9, pp. 5553–5563, 2016. doi: 10.1109/TGRS.2016.2569141. [193] L.
Cao et al., “Weakly supervised vehicle detection in satellite images via multi-instance discriminative learning,” Pattern Recognit., vol. 64, pp. 417–424, Apr. 2017. doi: 10.1016/j.patcog.2016.10.033. [194] Y. Li, Y. Zhang, X. Huang, and A. L. Yuille, “Deep networks under scene-level supervision for multi-class geospatial object detection from remote sensing images,” ISPRS J. Photogram. Remote Sens., vol. 146, pp. 182–196, Sept. 2018. doi: 10.1016/j. isprsjprs.2018.09.014. [195] Y. Ren, C. Zhu, and S. Xiao, “Small object detection in optical remote sensing images via modified Faster R-CNN,” Appl. Sci., vol. 8, no. 5, p. 813, 2018. doi: 10.3390/app8050813. [196] X. Xiao, Z. Zhou, B. Wang, L. Li, and L. Miao, “Ship detection under complex backgrounds based on accurate rotated anchor boxes from paired semantic segmentation,” Remote Sens., vol. 11, no. 21, pp. 1–18, 2019. doi: 10.3390/rs11212506. [197] Y. Gong et al., “Context-aware convolutional neural network for object detection in VHR remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 1, pp. 34–44, 2020. doi: 10.1109/TGRS.2019.2930246. [198] W. Ma, Q. Guo, Y. Wu, W. Zhao, X. Zhang, and L. Jiao, “A novel multi-model decision fusion network for object detection in remote sensing images,” Remote Sens., vol. 11, no. 7, pp. 1–18, 2019. doi: 10.3390/rs11070737. [199] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proc. 2014 Conf. Empir. Methods Nat. Lang. Process. (EMNLP), 1724–1734. doi: 10.3115/v1/D14-1179. [200] S. Bell, C. L. Zitnick, K. Bala, and R. B. Girshick, “Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks,” in Proc. 2016 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2874–2883. doi: 10.1109/ CVPR.2016.314. [201] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” in Proc. 5th Int. Conf. Learn. Represent. (ICLR), 2017. [202] Y. Feng, W. 
Diao, X. Sun, M. Yan, and X. Gao, “Towards automated ship detection and category recognition from highresolution aerial images,” Remote Sens., vol. 11, no. 16, pp. 1–23, 2019. GRS IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
Hyperspectral Image Clustering
Current achievements and future lines

HAN ZHAI, HONGYAN ZHANG, PINGXIANG LI, AND LIANGPEI ZHANG

Hyperspectral remote sensing organically combines traditional space imaging with advanced spectral measurement technologies, delivering advantages stemming from continuous spectrum data and rich spatial information. This development of hyperspectral technology takes remote sensing into a brand-new phase, making the technology widely applicable in various fields. Hyperspectral clustering analysis is widely utilized in hyperspectral image (HSI) interpretation and information extraction, where it reveals the natural partition pattern of pixels in an unsupervised way. In this article, current hyperspectral clustering algorithms are systematically reviewed and summarized in nine main categories: centroid-based, density-based, probability-based, bionics-based, intelligent computing-based, graph-based, subspace clustering, deep learning-based, and hybrid mechanism-based. The performance of several popular hyperspectral clustering methods is demonstrated on two widely used data sets. HSI clustering challenges and possible future research lines are identified.

Digital Object Identifier 10.1109/MGRS.2020.3032575
Date of current version: 19 January 2021
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, DECEMBER 2021
0274-6638/21©2021IEEE

THE NECESSITY FOR HSI CLUSTERING

Hyperspectral sensors can image an area of interest at a nanometer spectral resolution and collect rich spectral information to capture subtle differences among various ground objects [1]–[3]. An HSI has a 3D cube structure, containing tens and up to hundreds of bands, as shown in Figure 1, to support the fine recognition of ground objects [4]–[9]. This benefits numerous applications, such as mineral exploration [10], [11], vegetation monitoring [12], [13], the quantitative inversion of physical and biological parameters [14], [15], military reconnaissance [16], [17], and so forth. However, with such high-dimensional data, the interpretation of HSIs commonly relies on a great quantity of high-quality labeled samples to avoid the Hughes phenomenon caused by having insufficient training examples and the underfitting problem that results from the inadequate training of the classifiers [18]–[20]. Unfortunately, in practice, sample collection is commonly time consuming, labor intensive, expensive, and inefficient, and, in some remote and uninhabited areas, training samples can be unavailable, which greatly limits the applications of hyperspectral remote sensing. Therefore, it is necessary to develop unsupervised ground object recognition
theory and methods to overcome the restrictions related to labeled samples and prior knowledge.

FIGURE 1. The 3D cube structure of an HSI.

Clustering is an effective unsupervised pattern recognition and information extraction technique, and it is a common means for HSI interpretation [21]–[25]. Hyperspectral clustering groups similar pixels and separates dissimilar pixels, with each assemblage corresponding to a certain class, by fully mining the structural properties of hyperspectral data according to a similarity criterion, such as distance [26], [27], correlation [28], spectral angle [29], and pair-wise pixel metrics [30]. Because no labeled samples are required, clustering is more attractive than supervised classification in many applications. Especially when no labeled samples are available, clustering can be an effective approach for ground object recognition, greatly improving the application potential of hyperspectral remote sensing.

HSIs have a much more complex internal structure than handwritten digits, text, natural pictures, and multispectral images. In addition, there is a large spectral variability in HSIs, as pixels from the same class can have different spectra, given the complexity of the imaging environment. Generally, in the high-dimensional feature space, the distribution of pixels is relatively sparse and uniform, with no clear rules to follow. Accordingly, hyperspectral clustering is commonly a more challenging task.

Hyperspectral clustering has experienced decades of development, and a large number of methods have been put forward. However, to the best of our knowledge, very few studies have systematically and comprehensively reviewed the current research status of hyperspectral clustering.
Therefore, in this article, we fill this gap and investigate the current hyperspectral clustering methods in the literature to provide a detailed summary and analysis of various clustering methods, and we discuss challenges and possible future directions.

REVIEW OF CURRENT HYPERSPECTRAL CLUSTERING METHODS

Hyperspectral clustering generally includes two major tasks, i.e., estimating the number of clusters and constructing a proper clustering model. However, studies of the first task are relatively few in the hyperspectral clustering field. In [31]–[33], the number of clusters is automatically estimated by evolutionary algorithms and by using statistical histograms. However, these methods are generally bound to specific clustering models, such as the fuzzy c-means (FCM) model [34], and are not universally applicable. In addition, many density-based models can automatically estimate the number of clusters [35]–[37]. However, due to the inherent defects of density-based clustering, such techniques are generally less effective when applied to HSIs, which will be discussed in a later section. In some studies [38], [39], the optimal number of clusters is determined by a series of experiments. However, this strategy is time consuming and impractical in many use cases. In many cases, the number of clusters is regarded as a manually input parameter [21], [22], [40]–[44]. This number can be determined by visually interpreting the original HSIs [21], [41], which is simple and convenient but subjective and not fully automated. More often, in practice, this quantity is set as the number of classes in the ground truth [22], [42]–[44].

Generally speaking, cluster number estimation has always been an important topic in hyperspectral clustering research, while clustering model construction is the core of hyperspectral clustering, whose soundness and effectiveness have a direct influence on the final clustering accuracy.
Thus, the clustering methodology/model has always been a focus in the HSI processing field, and most of the existing work concentrates on clustering methodologies. In this article, we also focus on clustering models and methods. On the basis of the principle and the working mechanism, the current hyperspectral clustering methods can be classified into nine main types: 1) centroid-based methods, 2) density-based methods, 3) probability-based methods, 4) bionics-based methods, 5) intelligent computing-based methods, 6) graph-based methods, 7) subspace clustering methods, 8) deep learning-based methods, and 9) hybrid mechanism-based methods.

In practice, an HSI can be expressed as a 2D data matrix, i.e., $Y = [Y_1, Y_2, \ldots, Y_{MN}] \in \mathbb{R}^{D \times MN}$, with each column denoting a pixel, where $D$ and $MN$ represent the number of bands and the number of pixels, respectively.
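The matrix form above can be illustrated with a minimal NumPy sketch. It assumes the cube is stored as an array of shape (M, N, D), i.e., rows, columns, bands, and that pixels are ordered row by row; neither convention is specified in the text, and the helper names `cube_to_matrix` and `labels_to_map` are hypothetical.

```python
import numpy as np


def cube_to_matrix(cube: np.ndarray) -> np.ndarray:
    """Flatten an (M, N, D) cube into a (D, M*N) matrix Y, one pixel per column."""
    M, N, D = cube.shape
    # Row-major flattening: pixel (r, col) becomes column j = r * N + col.
    return cube.reshape(M * N, D).T


def labels_to_map(labels: np.ndarray, M: int, N: int) -> np.ndarray:
    """Fold a length-MN label vector back into an (M, N) cluster map."""
    return np.asarray(labels).reshape(M, N)


# Toy cube: 2 x 3 pixels with 4 bands (values are arbitrary).
cube = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
Y = cube_to_matrix(cube)
print(Y.shape)  # (4, 6)
```

Any clustering routine that returns one label per column of `Y` can then be mapped back to image coordinates with `labels_to_map`.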
For hyperspectral clustering, in the case of $c$ different classes, the core task is to partition the pixels into $c$ different groups based on a certain clustering model, with each group corresponding to a certain class. Different methods handle the internal structure and the complexity of HSIs with different model assumptions, which determine their clustering effect to a large degree. A taxonomy of the hyperspectral clustering methods considered in this article appears in Table 1.

TABLE 1. THE TAXONOMY OF HYPERSPECTRAL CLUSTERING METHODS.

| CATEGORY | MECHANISM | SUBCATEGORY | REPRESENTATIVE METHODS |
| --- | --- | --- | --- |
| Centroid | Assumes the cluster has a ball-like structure in the feature space; clusters HSIs by iteratively minimizing the overall partition error | Hard partition | k-means [47], ISODATA [49], NC-k-means [52] |
| | | Soft partition | FCM [45], FCM-S1 [64], FLDNICM [69] |
| Density | Assumes clusters are dense point sets separated by sparse areas in the feature space; clusters HSIs based on the local density and relative distances of pixels | - | CFSFDP [71], DAE [72], SSDL [77] |
| Probability | Assumes pixels from the same class satisfy a probability distribution model; clusters HSIs based on a probability rule | - | GMM [79], ICAMM [80], CLDD [86] |
| Bionics | Simulates the complex internal structure of HSIs with a certain biological model; clusters HSIs through a biological evolution algorithm | - | SOM [88], UAIC [42], UADSM [39] |
| Intelligent computing | Based on other clustering models; utilizes advanced intelligent computing algorithms to search for the global optimal solution to the clustering model | Single objective | FCIDE [92], MoDEFC [31], PSO-GMM [93] |
| | | Multiple objective | AFCMDE [32], AFCMOMA [94], MOPSO [38] |
| Graph | Models the similarity among pixels with an adjacency matrix; clusters HSIs with a graph cut algorithm | Complete graph | SC [105], SENP [106], NLTV [107] |
| | | Bipartite graph | SSCC-BG [115], S-SC [116], BGP-CJS [117] |
| | | Abbreviated graph | FSCAG [43], SGCNR [121] |
| Subspace clustering | Models the internal complex structure of HSIs via the union of subspaces; explores the underlying adjacency between pixels through self-representation learning; groups HSIs by applying spectral clustering (SC) to the adjacency matrix induced by the coefficient matrix | Spectral–spatial subspace clustering | S⁴C [40], L2-SSC [41], SSC-3DEPF [128] |
| | | Multiple-view subspace clustering | SSMLC [139], k-SSMLC [140], p-SSMLC [141] |
| | | Kernel subspace clustering | KSSC-SMP [142], KSLRSC [143] |
| Deep learning | Relies on deep neural networks to learn more discriminative features for clustering and more accurately simulate the nonlinearity of data | Autoencoder | DCN [147], DMC [148], DSCNet [151] |
| | | Separated network | CCNN [155], DBNC [156], JSL [159] |
| | | Generative network | CatGAN [162], DAGMC [164], VaDE [166] |
| Hybrid mechanism | Deals with the clustering task by combining two or more clustering models | - | k-GMM [168], k-FDPC [169], SDCR [174] |

ISODATA: iterative self-organizing data analysis technique algorithm; NC-k-means: neighborhood-constrained k-means; FCM-S1: FCM with mean filtered spatial information; FLDNICM: fuzzy local double neighborhood information c-means; CFSFDP: clustering by the fast searching and finding of density peaks; DAE: density analysis ensemble; SSDL: spectral–spatial (SS) diffusion learning; GMM: Gaussian mixture model (MM); ICAMM: independent component analysis MM; CLDD: clustering based on the latent Dirichlet distribution; SOM: self-organizing map; UAIC: unsupervised artificial immune classifier; UADSM: unsupervised spectral matching classifier based on artificial deoxyribonucleic acid (DNA) computing; FCIDE: fuzzy clustering (FC) using improved differential evolution (DE); MoDEFC: modified DE FC; PSO-GMM: particle swarm optimization-based GMM; AFCMDE: automatic FC based on multiple-objective DE; AFCMOMA: adaptive multiple-objective memetic FC algorithm; MOPSO: multiple-objective PSO; SC: spectral clustering; SENP: Schroedinger Eigenmap with nondiagonal potentials; NLTV: graph-based nonlocal total variation; SSCC-BG: SS coclustering based on a bipartite graph (BG); S-SC: sequential SC; BGP-CJS: BG partition-based coclustering with joint sparsity; FSCAG: fast SC with anchor graph; SGCNR: scalable graph-based clustering with nonnegative relaxation; S⁴C: SS sparse subspace clustering; L2-SSC: $\ell_2$-norm regularized sparse subspace clustering; SSC-3DEPF: SSC based on 3D edge-preserving filtering; SSMLC: SS-based multiple-view low-rank SSC; p-SSMLC: parallel SSMLC; DCN: deep clustering network; DMC: deep multiple-manifold clustering; DSCNet: deep SSC based on an autoencoder network; CCNN: clustering based on a convolutional neural network; DBNC: deep belief network nonparametric clustering; JSL: joint unsupervised learning; CatGAN: categorical generative adversarial network; DAGMC: deep adversarial Gaussian mixture autoencoder clustering; VaDE: variational deep embedding; k-GMM: hybridization of the k-means and the GMM; k-FDPC: hybridization of the k-means and fast finding of density peaks clustering; SDCR: sparse dictionary-based anchor regression.

CENTROID-BASED CLUSTERING METHODS

Centroid-based methods are the most classical and representative clustering approach, and they were also the earliest to be introduced to HSI analysis [45], [46]. Such techniques
are based on the assumption that a cluster has a “ball-like” structure in the feature space. Starting with random initializations, such methods iteratively update the centroids and their associated pixel partitions until the overall partition error meets the tolerance requirement or the number of iterations reaches the predefined maximum value, as illustrated in Figure 2. Partition error is generally defined as the sum of squared distances between the assigned pixels and the corresponding centroids across all classes. Centroid-based clustering methods mainly include two types, i.e., hard partition clustering and soft partition clustering, based on whether a pixel may belong to multiple classes.

FIGURE 2. The centroid-based clustering mechanism. (a) The original pixel points. (b) The initialization (randomly selecting the centroids). (c) The pixel assignment. (d) Updating the centroids. (e) Updating the pixel assignment. (f) The clustering result. Steps (d) and (e) are repeated iteratively.

HARD PARTITION-BASED CLUSTERING

Hard partition-based methods allow each pixel to belong to only one class and assign each pixel to the nearest cluster. A typical example is the k-means [47], commonly regarded as the origin of clustering analysis and one of the earliest clustering methods applied to HSIs [46]. The principle of the k-means is simple: it segments HSIs by minimizing the partition error across all $c$ classes, as in (1):

$$\min \sum_{i=1}^{c} \sum_{j=1}^{MN} \left\| Y_j - \mu_i \right\|_2^2, \tag{1a}$$

$$\mu_i = \frac{1}{n_i} \sum_{l=1}^{n_i} Y_l, \qquad \left\| Y_j - \mu_{i^*} \right\|_2^2 \le \left\| Y_j - \mu_i \right\|_2^2, \quad i \in \{1, \ldots, c\}, \tag{1b}$$

where $\mu_i$ denotes the centroid of the $i$th cluster and $n_i$ represents the number of pixels in the $i$th cluster. Specifically, the k-means starts with randomly selected centroids and then iteratively updates the cluster centroids, with each pixel $Y_j$ assigned to the nearest cluster centroid $\mu_{i^*}$ based on the distance metric, according to (1b) [48], until the cluster centroids do not change or the total partition error in (1a) does not significantly vary.

Based on the k-means, numerous improved methods were developed. For example, the iterative self-organizing data analysis technique algorithm (ISODATA) was proposed to improve the clustering effect by integrating the dynamic adjustment mechanism of clusters into the clustering process, and it was successfully applied to HSIs [49]. In [50], a distributed k-means clustering method was developed for HSIs to further improve efficiency and practicability by employing the parallel computing technique. In [51], a kernel k-means was used for HSI feature extraction, which conducts clustering in the much-higher-dimensional kernel space to relieve the nonlinearity of HSIs. In addition, a neighborhood-constrained k-means
(NC-k-means) approach was put forward, inspired by the clearly evident spatial correlation among neighboring pixels [52]. With a pure neighborhood index integrated into (1), the spatial information of HSIs is incorporated to help with the spectral analysis, and a much better clustering result is obtained. Furthermore, a two-stage k-means clustering technique combined with a neighboring union histogram (k-NUH) was developed, integrating the spatial information by the NUH [53]. It divides HSIs into several uncorrelated groups and computes the NUH of each collection based on the first few principal components. Then, it employs a two-stage k-means model to cluster HSIs from rough to fine. Moreover, an improved k-means (I-means) algorithm was proposed for HSI mineral mapping. It takes the spectral information divergence as the similarity measurement and initializes the centroids via three different strategies [54].

SOFT PARTITION-BASED CLUSTERING

Differing from hard partition-based approaches, soft partition-based methods consider the uncertainty of the pixel partitioning during the clustering process, allowing each pixel to belong to multiple classes, which may be more suitable for HSIs, due to the mixed pixel problem. Such techniques assign a fuzzy membership to each pixel in the range of [0, 1], with the sum of the memberships across all $c$ classes being equal to one. The most representative soft partition-based clustering method is the FCM model [34], [45], which can be formulated as in (2):

$$\min \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \left\| Y_j - \mu_i \right\|_2^2, \tag{2a}$$

$$\mu_i = \frac{\sum_{j=1}^{MN} U_{i,j}^{m} Y_j}{\sum_{j=1}^{MN} U_{i,j}^{m}}, \qquad U_{i,j} = \frac{\left\| Y_j - \mu_i \right\|_2^{-2/(m-1)}}{\sum_{l=1}^{c} \left\| Y_j - \mu_l \right\|_2^{-2/(m-1)}}, \tag{2b}$$

where $U$ denotes the fuzzy membership matrix, with each element $U_{i,j}$ standing for the fuzzy membership of the $j$th pixel belonging to the $i$th centroid; $\mu_i$ represents the $i$th centroid, which can be updated according to (2b); and $m$ is the fuzzy exponent. Based on the FCM, many enhanced methods were successively proposed.
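Before turning to those variants, the alternating updates in (2b) can be sketched in a few lines of NumPy. This is a minimal illustration rather than any cited implementation: the random membership initialization, the stopping rule on the change in $U$, the small floor on squared distances, and the function name `fcm` are all assumptions made for the example.

```python
import numpy as np


def fcm(Y, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means on Y of shape (D, MN).

    Returns centroids mu of shape (D, c) and memberships U of shape (c, MN).
    """
    rng = np.random.default_rng(seed)
    MN = Y.shape[1]
    # Assumed initialization: random memberships, each column summing to one.
    U = rng.random((c, MN))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                   # U_{i,j}^m
        mu = (Y @ W.T) / W.sum(axis=1)               # centroid update in (2b)
        # Squared distances ||Y_j - mu_i||^2, shape (c, MN).
        d2 = ((Y[:, None, :] - mu[:, :, None]) ** 2).sum(axis=0)
        d2 = np.maximum(d2, 1e-12)                   # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))               # ||.||^{-2/(m-1)}
        U_new = inv / inv.sum(axis=0, keepdims=True) # membership update in (2b)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return mu, U
```

A hard cluster map, if needed, follows from `U.argmax(axis=0)`; keeping `U` itself preserves the per-pixel uncertainty that motivates soft partitioning.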
In [55] and [56], two weighted FCM models were developed, i.e., fuzzy weighted c-means (FWCM) and new weighted FCM (NW-FCM). These two approaches weight the similarity between neighboring pixels and the center pixel, which effectively improves the clustering performance. In [57], an uncertainty analysis-based FCM (UAFCM) algorithm was introduced. It detects pixels that have a large uncertainty through entropy- and squared error-based criteria and reclassifies those pixels to refine the clustering results. In addition, to address the nonlinearity of HSIs, a kernel FCM was used in HSI semisupervised classification [58]. To overcome the sensitivity of the FCM to initialization, an improved FCM algorithm based on the support vector domain description (SVDD) was proposed for HSIs [59]. It estimates the cluster centroids based on the SVDD to reduce the influence of noise and outliers on the centroids. Furthermore, in [33], an automatic histogram-based FCM (AHFCM) algorithm was developed. It obtains the initializations and the number of clusters for the FCM through two steps, clustering each band by calculating the slopes in the histogram and automatically fusing the labeled images.

However, these techniques take only the spectral information into account, so they are susceptible to noise and singular points, and the spatial homogeneity of the clustering result is difficult to guarantee. To overcome these obstacles, a large number of enhanced FCM models that incorporate spatial information were developed. A representative example can be found in the spatial model for fuzzy clustering (SMFC) [60], with the formulation shown in (3):

$$\min \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \left\| Y_j - \mu_i \right\|_2^2 + \frac{\beta}{2} \sum_{i=1}^{c} \sum_{j=1}^{MN} U_{i,j}^{m} \sum_{p \in M_i} \sum_{q \in N_j} U_{p,q}^{m}, \tag{3a}$$

$$\mu_i = \frac{\sum_{j=1}^{MN} U_{i,j}^{m} Y_j}{\sum_{j=1}^{MN} U_{i,j}^{m}}, \qquad U_{i,j} = \frac{\left( \left\| Y_j - \mu_i \right\|_2^2 + \beta \sum_{p \in M_i} \sum_{q \in N_j} U_{p,q}^{m} \right)^{-1/(m-1)}}{\sum_{l=1}^{c} \left( \left\| Y_j - \mu_l \right\|_2^2 + \beta \sum_{p \in M_l} \sum_{q \in N_j} U_{p,q}^{m} \right)^{-1/(m-1)}}, \tag{3b}$$

where $N_j$ represents the neighbors of pixel $j$, $M_i = \{1, 2, \ldots, c\} \setminus \{i\}$, and $\beta$ is a tradeoff parameter. Through the spatial penalty term in (3a), the spatial neighborhood information is integrated to smooth the membership matrix, which leads to a more accurate result. In addition, in [61], a conditional FCM (C-FCM) algorithm was proposed. It simultaneously makes use of spectral–spatial information via the generalized multiplication of the spatial information and the spectral information. These methods have been successfully applied to HSIs [62]. To better utilize the spatial information, a neighborhood constraint clustering (NCC) algorithm was put forward [62]. It exploits the local spatial information via a neighborhood homogeneity index and obtains smoother clustering results with a higher accuracy for HSIs. In addition, through adding a spatial constraint term to (2), an FCM with spatial information (FCM-S) algorithm was proposed [63]. It explores the spatial neighborhood information through a local window that is opened for each target pixel and obtains much better performance compared to the FCM. However, the FCM-S is computationally complex. To tackle this problem, two improved versions were developed, i.e., the FCM-S1 and FCM-S2, which, respectively, employ the mean filtered result and the median filtered result to simplify the spatial information calculation [64]. These techniques were then successfully applied to HSIs [65]. However, the spatial regularization
parameters are difficult to determine, and the global information is poorly utilized. To overcome the drawbacks of these models, an adaptive memetic fuzzy clustering algorithm with spatial information (AMASFC) was proposed [65]. Through adaptively determining the spatial regularization parameters based on the information entropy and by simultaneously exploring the local information and the global information via the memetic algorithm, the clustering accuracy is further improved. Furthermore, a fuzzy approach with the spatial membership relations (FASMR) algorithm was proposed [66]. It incorporates the spatial information via a Gaussian filter and explores the membership relations among pixels in a local neighborhood. Moreover, by defining a fuzzy factor to integrate the spectral information and the local spatial information and to avoid parameter determination, a new fuzzy local information c-means clustering model (FLICM) was developed [67]. It was then applied to HSIs [68]. However, the FLICM has drawbacks, such as fuzzy edges and poor maintenance of spatial details. Faced with these obstacles, an adaptive FLICM (ADFLICM) algorithm was put forward [68]. It constructs a pixel spatial attraction model to adaptively measure the effects of neighboring pixels through weighting, which better recognizes the boundaries among different classes and maintains the details. Then, by flexibly exploiting the local spatial information and the spectral information, an improved version, the fuzzy local double neighborhood information c-means clustering (FLDNICM) algorithm, was introduced [69]. A fuzzy prior probability function is constructed based on the mutual dependent information between neighboring pixels and the center pixel to accurately model the spatial contextual information of HSIs. In this way, the clustering accuracy is further improved.

Generally speaking, due to their simplicity and efficiency, centroid-based methods are very popular in many practical applications. However, centroid-based methods are, in essence, “hill-climbing” algorithms that easily become trapped in local optima [65], [70]. What is worse, the “ball-like” structure assumption generally cannot be satisfied by HSIs, due to a complex internal structure and a large spectral variability, which limit the approaches’ clustering performance to a large degree.

DENSITY-BASED CLUSTERING METHODS

Density-based clustering methods partition pixels according to density criteria, under the basic assumption that clusters are generally dense point sets separated by sparse areas in the feature space. Such methods cluster HSIs based on the local density and the relative distances of pixels, as detailed in Figure 3.

FIGURE 3. The density-based clustering mechanism. (a) The HSI. (b) The density assumption for the clusters. (c) Searching the cluster centroids based on the local density and the relative distances (decision graph of $\delta$ versus $\rho$). (d) The clustering result.

A typical example is the clustering by the fast searching and finding of density peaks (CFSFDP) algorithm [71]. It assumes that cluster centroids are surrounded by pixel points that have a lower density and are relatively far from pixel points with a higher density, computing two quantities for each pixel, i.e., local density $\rho_i$ and relative distance $\delta_i$, as in (4), to search the optimal centroids:

$$\rho_i = \sum_{j} \chi\left( d_{ij} - d_c \right), \qquad \chi(x) = \begin{cases} 1, & \text{if } x < 0 \\ 0, & \text{otherwise} \end{cases}, \tag{4a}$$

$$\delta_i = \begin{cases} \min\left( d_{ij} \right), & j : \rho_j > \rho_i \\ \max\left( d_{ij} \right), & \rho_i = \max(\rho) \end{cases}, \tag{4b}$$
where $d_{ij}$ denotes the distance between pixels $i$ and $j$, and $d_c$ represents the cutoff distance. Cluster centroids can be found by constructing a decision graph, i.e., a $\delta$–$\rho$ graph, or determined by the measurement $\gamma_i = \rho_i \times \delta_i$. Pixels with significantly large $\delta$ and relatively large $\rho$ values in the decision graph or with a significantly large $\gamma$ are considered to be centroids. Then, by assigning each pixel to the nearest cluster centroid, the final clustering result can be obtained. To further improve the efficiency and accuracy of CFSFDP, an enhanced version, i.e., density analysis ensemble (DAE) clustering, was developed for HSIs [72]. The DAE uses a random subspace ensemble to establish a series of clustering systems, with each individual system corresponding to a density analysis. Subsequently, the final clustering result is obtained by majority voting.

Another representative method is the density-based spatial clustering of applications with noise (DBSCAN) [73]. The core idea of DBSCAN is to find pixels that have a higher density and connect them to generate clusters. The approach was utilized for HSI band selection and obtained good results [74]. In addition, the mean shift (MS) is also a typical density-based model, based on the rule of density gradient ascent [75]. In [76], an adaptive MS algorithm was put forward by integrating nonnegative matrix factorization (NMF) and bandwidth selection, which better segments HSIs.

In addition, in recent years, a series of nearest-neighbor density-based clustering methods were developed for HSIs. For example, the k-nearest-neighbor density-based clustering (KNNCLUST) method was proposed by extending the k-nearest-neighbor (KNN) model to an iterative procedure to automatically estimate the number of clusters [35]. Each pixel is assigned based on its KNNs and the distances to those neighbors by using the Bayes decision rule.
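Returning to the density-peak quantities in (4), the following NumPy sketch computes $\rho_i$, $\delta_i$, and $\gamma_i$ for a small set of pixels and then assigns each pixel to the nearest selected centroid, as described above. It is a toy illustration under stated assumptions: the pixel itself is excluded from its own density count, ties in $\rho$ are broken by sorting order (which (4b) does not specify), the number of centroids `c` is supplied by the user instead of being read off a decision graph, and the full pairwise distance matrix is only practical for small inputs. The function name `density_peaks` is hypothetical.

```python
import numpy as np


def density_peaks(Y, d_c, c):
    """Toy CFSFDP-style analysis for Y of shape (D, n).

    Returns rho, delta, gamma, the indices of the c selected centroids,
    and one label per pixel.
    """
    X = Y.T                                            # (n, D), one pixel per row
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))               # pairwise distances d_ij
    n = d.shape[0]
    # (4a) with the cutoff kernel; subtract 1 to exclude the pixel itself.
    rho = (d < d_c).sum(axis=1) - 1
    # (4b): distance to the nearest pixel of higher density. Assumption: ties
    # in rho are resolved by a stable descending sort.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = d[order[0]].max()                # global density peak
    for k in range(1, n):
        i = order[k]
        delta[i] = d[i, order[:k]].min()
    gamma = rho * delta                                # gamma_i = rho_i * delta_i
    centers = np.argsort(-gamma)[:c]                   # c largest gamma values
    labels = d[:, centers].argmin(axis=1)              # nearest-centroid assignment
    return rho, delta, gamma, centers, labels
```

In practice, the cutoff `d_c` strongly affects `rho`, so plotting `delta` against `rho` and inspecting the resulting decision graph is a useful check before fixing `c`.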
Then, KNNCLUST was applied to HSIs, and a stochastic extended version, i.e., the kernel stochastic expectation maximum (KSEM), was developed for HSIs [36]. The KSEM employs KNNs to estimate the contextual class conditional distribution, which it iteratively updates with the posterior probability to account for the current clustering result. Then, the KSEM defines the stopping criterion based on the clustering entropy to make the conditional distribution converge to a stationary clustering result. As a result, the KSEM outperforms KNNCLUST. Moreover, a graph watershed clustering based on nearest neighbors (GWNN) algorithm was introduced for HSIs to alleviate the quadratic complexity of KNN estimation [37]. GWNN utilizes a labeling rule similar to KNNCLUST to account for the local density values and introduces a coarse-to-fine multiresolution scheme, instead of a full KNN graph computation with all pixels. Consequently, GWNN effectively enhances the efficiency of the model and obtains a high clustering accuracy.

Furthermore, in [77], an unsupervised spectral–spatial diffusion learning (SSDL)-based clustering algorithm was proposed for HSIs. SSDL takes advantage of geometrical estimation and diffusion-inspired labeling to exploit the spectral–spatial duality of HSIs, based on the diffusion distances. SSDL includes two main steps, i.e., finding the cluster modes through density estimation and geometric analysis and assigning pixels to the corresponding modes based on the spectral–spatial proximity. In addition, based on SSDL, an enhanced spectral–spatial diffusion geometry (SSDG)-based clustering method was developed [78]. SSDG introduces the spatially regularized random walk strategy to the diffusion construction, regularizes neighboring pixels by Markov diffusion, searches cluster modes via kernel density estimation and the diffusion distance, and assigns pixels based on the selected modes.
As a result, SSDG further improves the clustering accuracy.

In short, density-based methods are relatively robust to noise and the shapes of clusters. In addition, many density-based methods can automatically estimate the number of clusters. However, because pixels are distributed relatively sparsely and uniformly in the high-dimensional feature space of HSIs, the assumption behind density-based clustering is not fully satisfied, which degrades the clustering effect to a large degree.

PROBABILITY-BASED CLUSTERING METHODS

Probability-based clustering methods partition pixels based on certain likelihood criteria. Such methods assume that pixels from the same class generally obey a certain probability distribution, with each cluster modeled by a multivariate conditional distribution with specific parameters and the HSIs modeled by the joint probability distribution, as in Figure 4. Then, the final clustering result can be obtained by maximizing the likelihood function based on a certain probability rule, such as expectation maximization (EM), the maximum posterior probability, and the Bayesian rule.

A representative probability-based clustering method is the Gaussian mixture model (GMM) [79]. The GMM is based on the assumption that hyperspectral pixels generally satisfy the Gaussian distribution, and it models each cluster with a multivariate Gaussian conditional distribution, as in (5):

$$p(Y_i) = \sum_{j=1}^{c} P_j \, g\left( Y_i \mid m_j, C_j \right), \tag{5a}$$

$$p(Y) = \prod_{i=1}^{MN} p(Y_i), \tag{5b}$$

where $g$ is a Gaussian probability density function (pdf) and $P_j$ is the prior probability of the $j$th cluster, with $m_j$ and $C_j$ denoting the mean vector and the covariance matrix of the $j$th cluster. Then, according to a certain probability rule, such as EM, the GMM partitions pixels into $c$ different clusters to obtain the final clustering result.
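As a hedged illustration of how EM can fit the mixture in (5), the sketch below alternates between computing posterior responsibilities (E-step) and re-estimating $P_j$, $m_j$, and $C_j$ (M-step). Several details are assumptions made for the example and are not taken from [79]: the farthest-point initialization of the means, the isotropic starting covariances, the small ridge `reg` added to keep each covariance positive definite, and the fixed iteration count. The names `log_gauss` and `gmm_em` are hypothetical.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log of the Gaussian pdf g(x | mean, cov) for each row x of X."""
    D = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)           # whitened residuals, (D, n)
    maha = (z ** 2).sum(axis=0)                    # squared Mahalanobis distance
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det + maha)


def gmm_em(Y, c, n_iter=100, reg=1e-6, seed=0):
    """EM for a c-component GMM on Y of shape (D, MN); returns P, m, C, labels."""
    X = Y.T
    n, D = X.shape
    rng = np.random.default_rng(seed)
    # Assumed initialization: a random pixel, then farthest-point picks.
    idx = [int(rng.integers(n))]
    for _ in range(1, c):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    means = X[idx].copy()
    covs = np.array([np.eye(D) * (X.var() + reg) for _ in range(c)])
    priors = np.full(c, 1.0 / c)
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each pixel.
        log_r = np.stack(
            [np.log(priors[j]) + log_gauss(X, means[j], covs[j]) for j in range(c)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: update priors P_j, means m_j, and covariances C_j.
        Nj = R.sum(axis=0) + 1e-12
        priors = Nj / n
        means = (R.T @ X) / Nj[:, None]
        for j in range(c):
            diff = X - means[j]
            covs[j] = (R[:, j, None] * diff).T @ diff / Nj[j] + reg * np.eye(D)
    return priors, means, covs, R.argmax(axis=1)
```

For real HSIs, where the number of bands $D$ is large relative to the pixels per cluster, full covariance estimates can become ill conditioned, so a dimensionality reduction step or a larger `reg` may be needed before such a sketch is usable.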
Considering that hyperspectral pixels commonly do not strictly obey the Gaussian distribution, an independent component analysis mixture model (ICAMM) was constructed for HSIs [80], [81]. The ICAMM represents each cluster as a non-Gaussian distribution, as in (6):

$$p(Y_i \mid \Theta) = \sum_{j=1}^{c} p(Y_i \mid \omega_j, \theta_j)\, P(\omega_j), \tag{6a}$$

$$p(Y \mid \Theta) = \prod_{i=1}^{MN} p(Y_i \mid \Theta), \tag{6b}$$

where $\Theta = [\theta_1, \theta_2, \ldots, \theta_c]$ is the class parameter set and $P(\omega_j)$ is the prior probability of the $j$th class $\omega_j$. Then, the independent components and the mixing matrix of each class are estimated based on the modified information maximum model, and the membership probability of each pixel for each class is computed. Based on the maximum membership probability rule, the pixel partition result can be obtained from the ICA model. A weighted principal component analysis ICA (WPCA-ICA) method was developed to extract the independent features based on second- and higher-order statistics, which performs better for HSIs [82]. Furthermore, in [83], a nonparametric stochastic expectation maximum (NPSEM) algorithm was proposed, which extends stochastic EM to a nonparametric representation to further improve the model's practicability. The NPSEM was then introduced to HSIs and performed well [36]. In [84], a pairwise Markov field (PMF) model was constructed to segment noisy and blurred astronomical HSIs. It integrates the PMF model into the Bayesian framework to optimize the probability model, and it segments HSIs based on faint signals. In addition, to better learn the similarity among hyperspectral pixels, a layered sparse adaptive possibility c-means clustering (LSAPCM) approach was developed [85]. It integrates the layered possibility into the FCM framework to extend the architecture to a probability optimization model, and it produces good clustering results. In [86], a novel clustering model based on the latent Dirichlet distribution (CLDD) was constructed by introducing the topic model to simulate the structure of HSIs, with each topic modeled by the LDD.
Moreover, considering that the mixed pixels of HSIs generally degrade the GMM performance, a Bayesian clustering method based on the spectral mixture model (SMM) and the Markov random field (MRF) was put forward for HSIs [87]. The Bayesian SMM-MRF utilizes the SMM to obtain the endmember abundance for each mixed pixel, and it assigns the mixed pixel according to the dominant endmember. Subsequently, this method integrates the SMM into the Bayesian framework to construct a conditional distribution of the mixed pixels to search for the dominant endmember, with the MRF utilized to optimize the label prior. Last, by solving the maximum posterior probability problem based on the EM rule, the pixel partition result is obtained. By considering the mixed pixel problem and comprehensively utilizing spectral–spatial information, the Bayesian SMM-MRF achieves good performance. As a whole, probability-based clustering methods have strict mathematical foundations and employ various probability theories to optimize the clustering model. However, the complex internal structure and large spectral variability of HSIs mean that hyperspectral pixels do not strictly obey specific probability distributions, which is inconsistent with the assumptions of such methods. As a result, probability-based clustering methods may fail to obtain good performance for HSIs.

BIONICS-BASED CLUSTERING METHODS

Bionics-based clustering methods employ certain biological models, such as artificial neural networks (NNs), to simulate the complex internal structure of HSIs and partition pixels based on certain biological evolution algorithms, as described in Figure 5. A typical example is the self-organizing map (SOM) model, which is an unsupervised learning method based on the Kohonen NN and has been successfully applied to HSIs [42], [88]. The SOM automatically learns the underlying similarity among the input pixels and then puts similar pixels close together in the network.
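To give a feel for how a SOM places similar pixels close together, the sketch below performs a standard Kohonen-style update: the neuron closest to a pixel wins, and the winner and its grid neighbors are pulled toward that pixel with a Gaussian neighborhood weight. The grid size, the linearly decaying learning rate, and the names `train_som` and `map_pixels` are illustrative assumptions, not details taken from [42] or [88].

```python
import numpy as np

def train_som(Y, grid=(3, 3), n_iter=400, eta=0.5, sigma=1.0, seed=0):
    """Learning stage: returns neuron weights of shape (rows * cols, bands)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = Y[rng.integers(0, len(Y), size=rows * cols)].astype(float)   # initialise from pixels
    for t in range(n_iter):
        y = Y[rng.integers(0, len(Y))]
        winner = np.argmin(np.linalg.norm(W - y, axis=1))            # closest neuron wins
        grid_d2 = np.sum((coords - coords[winner]) ** 2, axis=1)     # squared grid distance
        rate = eta * (1.0 - t / n_iter)                              # decaying rate (assumption)
        gain = rate * np.exp(-grid_d2 / (2.0 * sigma ** 2))
        W += gain[:, None] * (y - W)                                 # move winner and neighbours
    return W

def map_pixels(Y, W):
    """Clustering stage: index of the best-matching neuron for every pixel."""
    return np.linalg.norm(Y[:, None, :] - W[None, :, :], axis=2).argmin(axis=1)

# Usage: 80 synthetic pixels with 5 bands mapped onto a 3 x 3 grid
rng = np.random.default_rng(0)
Y = rng.normal(0.0, 1.0, (80, 5))
W = train_som(Y)
neuron_of_pixel = map_pixels(Y, W)
```

Pixels that share a best-matching neuron, or land on adjacent neurons, can then be treated as members of the same cluster.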
The SOM generally consists of an input layer and a competitive layer, with a learning stage and a clustering stage. In the learning stage, the winning neurons are selected based on Euclidean distance, and then the weights of the winning neurons and the neighboring neurons are updated, as in (7). In the clustering stage, similar pixels are mapped to the neighboring neurons:

$$W_{ij}^{k+1} = W_{ij}^{k} + \Delta W_{ij}, \qquad \Delta W_{ij} = \eta \exp\!\left(-\frac{d^{2}_{j,\,I(Y_i)}}{2\sigma^{2}}\right)\left(Y_j - W_{ij}\right), \tag{7}$$

where $W_{ij}^{k}$ denotes the weight between neurons $i$ and $j$ in the $k$th iteration, $\Delta W_{ij}$ stands for the weight gains, $d$ means the Euclidean distance, $I(\cdot)$ represents the activated neuron, $\eta$ is the learning rate, and $\sigma$ is the kernel parameter.

FIGURE 4. The probability-based clustering mechanism. (a) The HSI. (b) The probability model construction and optimization. (c) The clustering result.

FIGURE 5. The bionics-based clustering mechanism. (a) The HSI. (b) The biological model construction. (c) The biological evolution optimization. (d) The clustering result.

To better simulate the complexity of HSIs, many advanced biological models have been constructed. For example, in [42], an unsupervised artificial immune classifier (UAIC) was proposed. The UAIC utilizes an artificial immune system to simulate the complex internal structure of HSIs and employs a series of biological computation techniques, such as clonal selection, immune network, and immune memory, to partition pixels. Specifically, cluster centroids are randomly selected, and each pixel is assigned to the cluster with the maximum affinity between antigens and antibodies. An immune evolution algorithm is utilized to update the antibody population and the memory cell (MC) pool until convergence. As a result, the UAIC obtains a relatively good result for HSIs. Then, an enhanced version of the UAIC, i.e., an unsupervised artificial immune network for remote sensing classification (RSUAIN), was constructed to further improve the clustering performance [89]. Instead of utilizing the distance threshold scalar to update the MC pool and constrain the number of MCs, the RSUAIN introduces two immunological parameters, i.e., the death rate and the suppression rate, to update the MC matrix and determine the structure of the network by controlling the connection of network cells. Then, the RSUAIN forces each class to have an inner network connection and enhances the diversity of the MC population via a suppression rate to improve the evolution quality.

In addition, considering the large volume, high dimension, and spectral diversity of HSIs, an unsupervised spectral-matching classifier based on artificial deoxyribonucleic acid (DNA) computing (UADSM) was put forward [39]. The UADSM employs an artificial DNA model to simulate the complexity of HSIs, and it clusters pixels through a series of artificial DNA computing techniques, including DNA spectral coding, optimization, and matching. The UADSM extracts multiple spectral features, such as the shape, amplitude, and slope, to enhance the discriminability of the features and optimizes clusters by recombining DNA strands. Based on the normalized DNA spectral similarity, the spectral signature of each pixel is assigned to the corresponding cluster to obtain the clustering result. Moreover, in [90], a novel context-aware unsupervised discriminative extreme learning machine (CUDELM) algorithm was developed for HSIs. The CUDELM introduces the extended NN, i.e., the ELM, to efficiently learn the structural information. Then, local spectral–spatial information is incorporated into the hidden layer features via a context-aware propagation filter, and the local and global structural information is integrated through regularization to learn more discriminative features. Consequently, the CUDELM yields accurate clustering results for HSIs. Besides, in [91], a new weighted incremental NN (WINN) method was developed for HSI segmentation. The WINN models the topology of pixels by using a set of weighted nodes, with the weights determined by the local density, and clusters the net through a watershed-like procedure to obtain the final clustering result.

On the whole, bionics-based clustering methods can effectively simulate the internal complexity of HSIs to some degree, and they may produce accurate clustering results by employing advanced biological evolution algorithms. However, these methods still face obstacles. For example, in practice, the complex structure of HSIs cannot always be well fitted by specific biological models, and the large spectral variability further reduces the modeling accuracy, which limits the clustering performance.

INTELLIGENT COMPUTING-BASED CLUSTERING METHODS

Intelligent computing-based clustering methods are generally founded on other clustering models, such as the centroid-based clustering model, and utilize advanced intelligent computing algorithms, such as genetic evolution, differential evolution, and particle swarm optimization (PSO), to search for the global optimal solution of the clustering model and further improve the clustering performance, as presented in Figure 6. According to the number of objective functions in the optimization problem, intelligent computing-based clustering methods can be further divided into two types: 1) single-objective-based clustering and 2) multiobjective-based clustering.
SINGLE-OBJECTIVE-BASED CLUSTERING

The single-objective-based clustering method has only a single objective function in the optimization problem, with an intelligent computing technique utilized to search for the global optimal solution. A representative single-objective-based clustering method is fuzzy clustering using an improved differential evolution (FCIDE) algorithm [92]. It introduces a certain validation index as the fitness function and searches for the optimal solution based on the differential evolution algorithm. Specifically, FCIDE utilizes the clustering separation (CS) measure or the Davies–Bouldin (DB) measure as the validation index to define the fitness function, as in (8):

$$f = \frac{1}{CS_i(K) + eps} \quad \text{or} \quad f = \frac{1}{DB_i(K) + eps}, \tag{8}$$

where $K$ denotes the number of clusters and $eps$ is an adjustment factor. The definitions of CS and DB are given in [92]. Based on FCIDE, in [31], a modified differential evolution fuzzy clustering (MoDEFC) algorithm was put forward to further improve the clustering performance. MoDEFC constructs a model using the Xie–Beni index as a validation index. FCIDE and MoDEFC were then introduced to HSIs, delivering good performance [32]. In addition, the AMASFC method employs the memetic algorithm to combine local and global information to search for the optimal solution, and it further improves the clustering accuracy [65]. Moreover, considering that the GMM–EM easily falls into a local optimal solution, a novel PSO-based GMM clustering (PSO-GMM) method was developed for HSIs [93]. It uses the advanced PSO algorithm instead of EM to search for a global optimal solution and improves the parameterization and parameter updating approaches to overcome the degeneracy problem. Consequently, the clustering accuracy is effectively improved.

FIGURE 6. The intelligent computing-based clustering mechanism. (a) The HSI. (b) The clustering model construction. (c) Intelligent computing. (d) The clustering result.
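The DB-based fitness in (8) can be sketched as follows. The code uses the commonly cited form of the Davies–Bouldin index with Euclidean distances, which is an assumption here; the exact definition used by FCIDE is the one given in [92]. The function names and the default `eps` are illustrative.

```python
import numpy as np

def davies_bouldin(Y, labels):
    """Davies-Bouldin index with Euclidean distances (lower means better separated clusters)."""
    ids = np.unique(labels)
    centroids = np.array([Y[labels == k].mean(axis=0) for k in ids])
    scatter = np.array([np.linalg.norm(Y[labels == k] - centroids[i], axis=1).mean()
                        for i, k in enumerate(ids)])
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)                        # exclude i == j from the maximum
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()

def fitness(Y, labels, eps=1e-6):
    """Fitness of a candidate partition: larger is better, following the second form in (8)."""
    return 1.0 / (davies_bouldin(Y, labels) + eps)

# Usage: a partition that follows the data scores higher than an arbitrary one
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 0.3, (40, 3)), rng.normal(5.0, 0.3, (40, 3))])
good = np.repeat([0, 1], 40)      # first 40 pixels in cluster 0, last 40 in cluster 1
bad = np.tile([0, 1], 40)         # alternating labels that ignore the data
scores = fitness(Y, good), fitness(Y, bad)
```

An evolutionary search such as differential evolution would evaluate this fitness for every individual in the population and keep the candidates with the highest values.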
MULTIOBJECTIVE-BASED CLUSTERING

Multiobjective-based clustering methods generally optimize more than one objective function and simultaneously search for the optimal solutions based on certain intelligent computing techniques. Compared with the single-objective-based clustering methods, multiobjective-based clustering approaches are more popular and generally perform better, as they consider several factors at the same time, e.g., spectral and spatial information or local and global information. A representative example is the automatic fuzzy clustering based on the multiobjective differential evolution (AFCMDE) algorithm [32]. It extends the MoDEFC model to a multiobjective version for an improved ability to learn the complexity of remote sensing images, with two objective functions included, i.e., the partition error and the Xie–Beni index, as in (9):

$$\min f(Y) = [f_1(Y), f_2(Y)], \tag{9a}$$

$$f_1 = \sum_{i=1}^{c}\sum_{j=1}^{MN} U_{i,j}^{m}\,\lVert Y_j - \mu_i\rVert_2^2, \qquad f_2 = \frac{\sum_{i=1}^{c}\sum_{j=1}^{MN} U_{i,j}^{m}\,\lVert Y_j - \mu_i\rVert_2^2}{MN\,\min_{i \neq k}\lVert \mu_i - \mu_k\rVert_2^2}, \tag{9b}$$

$$\mu_i = \frac{\sum_{j=1}^{MN} U_{i,j}^{m}\, Y_j}{\sum_{j=1}^{MN} U_{i,j}^{m}}, \qquad U_{i,j} = \frac{\lVert Y_j - \mu_i\rVert^{-2/(m-1)}}{\sum_{l=1}^{c}\lVert Y_j - \mu_l\rVert^{-2/(m-1)}}. \tag{9c}$$

Here, $U_{i,j}$ is the fuzzy membership of pixel $Y_j$ in the $i$th cluster, $\mu_i$ is the $i$th cluster centroid, and $m$ is the fuzzy exponent. Specifically, AFCMDE consists of two layers, i.e., optimization and clustering. In the optimization layer, a feasible number of clusters is obtained by minimizing these two objective functions. In the clustering layer, a nondominated sorting method is utilized to update the population and search the Pareto front to obtain the final clustering result. Through multiobjective optimization, AFCMDE outperforms MoDEFC. Then, based on AFCMDE, a multiobjective memetic FCM algorithm (AFCMOMA) was presented to further improve the optimization capability of the model [94]. The approach introduces the memetic algorithm to balance the local and global search ability and adds a new population-updating strategy to obtain more high-quality individual samples. As a result, the clustering accuracy is further improved.
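The two objectives in (9) and the Pareto comparison used by nondominated sorting can be sketched as below. This is only an evaluation of candidate centroids, not the AFCMDE search itself; the function names and the small constant added to the squared distances (to avoid division by zero) are assumptions.

```python
import numpy as np

def squared_distances(Y, centers):
    """Squared Euclidean distance from every centroid (rows) to every pixel (columns)."""
    return np.sum((Y[None, :, :] - centers[:, None, :]) ** 2, axis=2) + 1e-12

def memberships(Y, centers, m=2.0):
    """Fuzzy memberships U[i, j] of pixel j in cluster i, following (9c)."""
    inv = squared_distances(Y, centers) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def objectives(Y, centers, m=2.0):
    """Return (f1, f2): the partition error and the Xie-Beni index of (9b)."""
    d2 = squared_distances(Y, centers)
    f1 = np.sum(memberships(Y, centers, m) ** m * d2)
    center_d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(center_d2, np.inf)                  # ignore i == k in the minimum
    f2 = f1 / (Y.shape[0] * center_d2.min())
    return f1, f2

def dominates(a, b):
    """True if objective vector a Pareto-dominates b when both objectives are minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Usage: centroids placed on the data dominate centroids placed between the clusters
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 0.3, (50, 4)), rng.normal(5.0, 0.3, (50, 4))])
on_data = np.array([[0.0] * 4, [5.0] * 4])
off_data = np.array([[2.0] * 4, [3.0] * 4])
better = dominates(objectives(Y, on_data), objectives(Y, off_data))
```

In a multiobjective search, candidates that no other candidate dominates form the Pareto front from which the final solution is selected.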
In addition, a novel social recognition-based multiobjective gravitational algorithm (SMGSA) was developed for HSIs to learn the similarity relationships among pixels [95]. The SMGSA algorithm searches individuals among the elite ones obtained by the gravitational force and the general ones learned from the social recognition model, based on the whole population, to generate an outstanding exploitation ability. Furthermore, in [38], a novel multiobjective PSO (MOPSO) method was proposed for HSIs to simultaneously solve three problems, i.e., estimating the cluster statistical parameters, searching for the best discriminative bands, and estimating the number of clusters, using three different optimization criteria. Moreover, based on the advanced sparse subspace clustering (SSC) model, a multiobjective SSC (MOSSC) method was put forward for HSIs. It treats the sparse constraint term and the data fidelity term as two objective functions to avoid the manual determination of the regularization parameter that is required in the SSC model [96].

In general, with the help of advanced intelligent optimization algorithms to search for the global optimal solution to the clustering model, intelligent computing-based clustering methods may perform better than traditional clustering approaches. However, such techniques still have several disadvantages that limit their practical applications to some degree. For example, the principle of such methods is relatively complex, with a high application threshold. In addition, such techniques are generally based on other clustering models, and their performance is limited by the inherent defects of the foundation clustering models, such as FCM and GMM.

GRAPH-BASED CLUSTERING METHODS

Graph-based clustering is one of the recently developed advanced hyperspectral clustering approaches that evolved from graph theory.
Such methods generally model the relationships among hyperspectral pixels with an adjacency matrix $\tilde{W} \in \mathbb{R}^{MN \times MN}$, also known as a similarity graph, whose elements represent the similarity between a corresponding pair of pixels or the penalty factor when separating the corresponding two pixels into different subgraphs. The adjacency matrix is the basis of graph clustering. The quality of the matrix directly affects the final clustering accuracy. In practice, it is generally constructed by the $\varepsilon$-ball strategy [97], the KNN strategy [98], or the full connection strategy [99]. Then, by applying a certain graph cut algorithm to minimize the total cutting cost of the adjacency matrix, the final clustering result can be obtained, as shown in Figure 7. Specifically, graph cut is a very important part of graph theory. It aims to segment a graph into several disjoint and distinctive subgraphs by maximizing the intrasubgraph similarity and minimizing the intersubgraph similarity, with each subgraph denoting a specific class. Over decades of development, many graph cut algorithms have been proposed, including minimum cut [100], ratio cut [101], normalized cut [102], average cut [103], minimum–maximum cut [104], and so on. Among them, the normalized cut algorithm is the most widely used. According to the differences among the constructed graphs, graph-based clustering methods can be coarsely divided into three main kinds: 1) complete graph-based clustering, 2) bipartite graph-based clustering, and 3) abbreviated graph-based clustering.

COMPLETE GRAPH-BASED CLUSTERING

Complete graph-based clustering methods group HSIs based on an adjacency matrix $\tilde{W}$ that consists of all pixels; the matrix contains the similarity between any pair of pixels, at a size of $MN \times MN$.
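To make the adjacency-then-cut pipeline concrete, the sketch below builds a KNN adjacency matrix with Gaussian edge weights and partitions it with a commonly used spectral relaxation of the normalized cut: the eigenvectors of the normalized graph Laplacian with the smallest eigenvalues are taken as an embedding, and k-means is run on the embedded pixels. The row normalization of the embedding, the deterministic k-means initialization, and all function names are assumptions chosen for illustration; the cited methods may differ in these details.

```python
import numpy as np

def knn_gaussian_graph(Y, k=10, sigma=1.0):
    """Symmetric KNN adjacency matrix with Gaussian similarities as edge weights."""
    n = len(Y)
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(d2, np.inf)                       # a pixel is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nearest] = True
    mask = mask | mask.T                               # keep an edge if either end selects it
    return W * mask

def kmeans(X, c, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialisation."""
    idx = [0]
    for _ in range(c - 1):
        dist = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        idx.append(int(np.argmax(dist.min(axis=1))))
    centers = X[idx].copy()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        for j in range(c):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_clustering(W, c):
    """Relaxed normalized cut: embed with the c smallest Laplacian eigenvectors, then k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                    # eigenvalues returned in ascending order
    F = vecs[:, :c]
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # row-normalise the embedding
    return kmeans(F, c)

# Usage on 40 synthetic pixels drawn from two spectra
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 0.3, (20, 4)), rng.normal(5.0, 0.3, (20, 4))])
labels = spectral_clustering(knn_gaussian_graph(Y, k=10), c=2)
```

The pairwise distance matrix alone needs memory proportional to $(MN)^2$, so a dense construction like this one is only practical for small images.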
A typical example is spectral clustering (SC) [102], [105], which generally employs the normalized cut algorithm to conduct graph cutting, with a spectral analysis model formulated as the following optimization problem:

$$\min_{F} \operatorname{Tr}\!\left(F^{T} L F\right) \quad \text{subject to} \quad F^{T} F = I, \tag{10}$$

where $L$ is the graph Laplacian matrix, i.e., $L = D - \tilde{W}$, and $D$ is the degree matrix, which is a diagonal matrix with the diagonal element being $D_{ii} = \sum_{j} \tilde{W}_{ij}$. SC commonly solves the optimization problem via singular value decomposition (SVD). By extracting the $c$ eigenvectors corresponding to the $c$ smallest eigenvalues, an optimal $F$ can be obtained, where $c$ is the number of clusters. Then, by applying k-means to $F$, the final clustering result can be obtained.

FIGURE 7. The graph-based clustering mechanism. (a) The HSI. (b) The adjacency matrix. (c) The graph cut. (d) The clustering result.

In [106], a novel Schroedinger eigenmap with nondiagonal potentials for a spectral–spatial clustering (SENP) algorithm was proposed for HSIs. The approach employs a Schroedinger eigenmap, which is an extension of the graph Laplacian matrix, to integrate barrier and cluster potentials to accurately model the similarity between pixels. Then, different kinds of nondiagonal potentials are explored within the model to encode the spatial proximity and integrate the spectral proximity through manifold learning. As a result, the graph discriminability is enhanced, and a more accurate clustering result is obtained. In addition, in [107], a graph-based nonlocal total variation (NLTV) method was developed. It explores the spatial information of HSIs with an NLTV constraint to construct a more accurate similarity graph, and it introduces the primal–dual hybrid gradient algorithm to efficiently solve the graph cut problem. Consequently, NLTV obtains accurate clustering results for HSIs. Furthermore, a joint spectral–spatial clustering with a block-diagonal amplified affinity matrix (JC-BAAM) algorithm was proposed.
It considers the size and shape differences of the spatial neighborhoods of different hyperspectral pixels to promote the block-diagonal property of the affinity matrix and increase the separability between different classes [108]. Besides, by paying special attention to small variations in data density and scaling the clusters based on the latent structure, a novel graph-based clustering (GC) algorithm was developed for HSIs. It performs better for small classes that have few pixels [109]. In addition, a graph clustering-based method was put forward to solve semisupervised and unsupervised classification problems for HSIs [110]. It constructs a pairwise pixel similarity graph and develops a parallel Nyström extension model that randomly samples the graph to obtain a low-rank approximation of the graph Laplacian for SC.

Moreover, some other extended graph-based clustering models, i.e., manifold-based models, were developed for HSIs. For example, in [111], a multimanifold SC (MMSC) algorithm was proposed for HSIs that constructs a nearest-neighbor connectivity model based on the shared nearest neighborhood and estimates the tangent space with a weighted principal component analysis (PCA). Then, an enhanced MMSC, i.e., contractive autoencoder-based MMSC (CA-MMSC), was developed for HSIs to estimate the tangent space via a contractive autoencoder and obtain better performance [112]. In [113], a rank 2 NMF-based hierarchical clustering (H2NMF) algorithm was developed for HSIs. It first treats all pixels as a cluster and then splits one cluster into two disjoint clusters using a rank 2 NMF model until obtaining stable results. In addition, in [114], an orthogonal graph-regularized NMF (OGNMF) method was introduced. It combines the orthogonal graph constraints with the NMF model to learn the local structure information of HSIs and achieves relatively good clustering results. In addition, a robust manifold factorization-based clustering (RMFC) algorithm was proposed for HSIs [22]. It employs a low-rank matrix factorization framework to simultaneously deal with the dimension reduction (DR) task and the clustering task, with manifold regularization to enhance the robustness of the clustering model. With the help of the out-of-sample extension trick, it can be extended to large HSIs.

BIPARTITE GRAPH-BASED CLUSTERING

Bipartite graph-based clustering is an extended version of complete graph-based clustering, and it has been successfully applied to HSIs with good results. In contrast to the complete graph, the bipartite graph models the relationships between two different sets, i.e., the anchor set and the pixel set, to obtain a structured similarity matrix $\hat{W} \in \mathbb{R}^{(MN+n) \times (MN+n)}$ at a larger size, as in (11):

$$\hat{W} = \begin{pmatrix} 0 & A \\ A^{T} & 0 \end{pmatrix}. \tag{11}$$

Here, $A$ is generally constructed based on Gaussian kernel distances, with the KNN strategy utilized, as in (12):

$$A_{ij} = \begin{cases} \exp\!\left(-\delta\,\lVert Y_i - \hat{Y}_j\rVert^{2}\right), & Y_i \in M_k(\hat{Y}_j) \ \text{or} \ \hat{Y}_j \in M_k(Y_i), \\ 0, & \text{otherwise}, \end{cases} \tag{12}$$

where $\hat{Y} \in \mathbb{R}^{D \times n}$ is the anchor matrix derived from the HSI matrix, $\delta$ is the kernel parameter, and $M_k(\cdot)$ denotes the $k$-nearest-neighbor set. A representative bipartite graph-based clustering method is spectral–spatial coclustering based on a bipartite graph (SSCC-BG) [115]. It extracts anchors from the cluster centroids of k-means and then constructs a bipartite graph between centroids and pixels.
SSCC-BG obtains good clustering results for HSIs by fusing spectral information and spatial information into the graph. In addition, in [116], a sequential SC (S-SC) method was developed to efficiently cluster HSIs. It employs the minibatch k-means to determine the anchors and conduct cluster assignments. Then, based on the bipartite graph, S-SC utilizes the sequential SVD for the product of the rows and columns of $A^{T}A$, instead of directly decomposing it, which effectively reduces the computational complexity and improves the efficiency of the model. Furthermore, in [117], a novel bipartite graph partition-based coclustering with joint sparsity (BGP-CJS) was put forward for HSIs. The technique builds a more informative bipartite graph with a learned $A$ from the joint-sparsity-constrained optimization problem. Then, an efficient spectral graph-based normalized cut method is proposed to simultaneously cluster the rows and columns of the similarity matrix. Consequently, the BGP-CJS further improves the clustering accuracy.

ABBREVIATED GRAPH-BASED CLUSTERING

To overcome the large computational complexity of complete graph-based clustering, efficient abbreviated graph-based clustering has been developed, which selects only a few important and representative points, i.e., key points, to construct a similarity graph at a much smaller size of $n \times n$, where $n$ is the number of key points. In practice, the abbreviated graph is generally induced by the anchor graph. Hence, anchor graph-based clustering methods are the most representative abbreviated graph-based clustering models. In many recent studies, the anchor graph is utilized to evaluate the similarity among pixels, instead of the complete graph [118], [119]. Such methods generally include two main steps, i.e., anchor selection and relation matrix construction.
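The two steps can be sketched as follows, under simple assumptions: anchors are drawn at random from the pixels, and each pixel is linked to its few nearest anchors with Gaussian kernel weights that are normalized to sum to one per pixel. The function names, the number of linked anchors `k`, and the kernel width `sigma` are illustrative choices rather than settings from [118] or [119].

```python
import numpy as np

def select_anchors(Y, n_anchors, seed=0):
    """Step 1: pick anchors by random selection from the pixels (rows of Y)."""
    rng = np.random.default_rng(seed)
    return Y[rng.choice(len(Y), size=n_anchors, replace=False)]

def relation_matrix(Y, anchors, k=3, sigma=1.0):
    """Step 2: MN x n relation matrix linking each pixel to its k nearest anchors."""
    d2 = np.sum((Y[:, None, :] - anchors[None, :, :]) ** 2, axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(len(Y))[:, None]
    dk = d2[rows, nearest]                                # distances to the k nearest anchors
    A = np.zeros_like(d2)
    # Subtracting the smallest distance keeps the largest weight at exactly 1 (no underflow)
    A[rows, nearest] = np.exp(-(dk - dk[:, :1]) / (2.0 * sigma ** 2))
    return A / A.sum(axis=1, keepdims=True)               # each row sums to one

# Usage: 100 synthetic pixels with 5 bands summarised by 8 anchors
rng = np.random.default_rng(0)
Y = rng.normal(0.0, 1.0, (100, 5))
anchors = select_anchors(Y, n_anchors=8)
A = relation_matrix(Y, anchors, k=3)
```

Because `A` has only `k` nonzero entries per row, its storage grows linearly with the number of pixels, which is the source of the efficiency gain over a complete graph.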
The representative pixels or cluster centroids are considered as anchors, which are commonly obtained via random selection or preclustering. The relation matrix $A \in \mathbb{R}^{MN \times n}$ models the relationships among anchors and pixels, and it is generally constructed based on certain similarity measurements, such as the Gaussian kernel distance. In [120], $A$ is taken as a variable and obtained by learning, as in (13), which leads to a more accurate $A$:

$$\min_{A\mathbf{1} = \mathbf{1},\; A_{ij} \geq 0} \; \sum_{i=1}^{MN}\sum_{j=1}^{n} \lVert Y_i - \hat{Y}_j\rVert_2^2\, A_{ij} + \gamma\,\lVert A\rVert_F^2, \tag{13}$$

where $\gamma$ is the regularization parameter and $\mathbf{1} \in \mathbb{R}^{n \times 1}$ is a vector whose elements are all ones. With the obtained $A$, the adjacency matrix $\tilde{W}$ can be constructed as $\tilde{W} = A \Lambda^{-1} A^{T}$, where $\Lambda \in \mathbb{R}^{n \times n}$ is a diagonal matrix with the diagonal element being $\Lambda_{jj} = \sum_{i=1}^{MN} A_{ij}$. Such methods are generally much more efficient, and they are more scalable to large HSIs, given their small computing demands. However, because only a few key points are utilized to approximate the structure information of HSIs, the underlying adjacency among pixels cannot be accurately mined. Consequently, the clustering accuracy of such methods is generally reduced.

A typical example of the abbreviated graph-based clustering method is the fast SC with an anchor graph (FSCAG) algorithm [43]. To ensure the clustering efficiency, FSCAG randomly selects the anchors from the original hyperspectral pixels. Then, the relation matrix $A$ is learned from (13), with a spatial constraint based on the mean filtered results of HSIs, inspired by FCM-S1, to incorporate spatial information into the anchor graph. Last, through spectral analysis of the adjacency matrix $\tilde{W}$ induced by $A$, as in (10), the final clustering result is obtained. In addition, in [121], a scalable graph-based clustering with nonnegative relaxation (SGCNR) algorithm was proposed. It learns $A$ from (13) to construct the adjacency matrix $\tilde{W}$. Then, by adding an additional nonnegative constraint to the spectral analysis model to more accurately relax it from the discrete case to the continuous case, improved clustering results can be obtained.

In summary, because of the flexible graph construction means, powerful structure information mining ability, and relatively robust clustering performance, graph-based clustering methods have drawn wide attention and become one of the research hot spots in the hyperspectral clustering field. However, they are generally restricted by computational complexity, and they need to strike a compromise between accuracy and efficiency, as in abbreviated graph-based clustering. In addition, due to the inadequate consideration of the interactions among pixels during the graph construction process and the influences from the large spectral variability and high correlations among hyperspectral pixels, such techniques generally cannot accurately mine the underlying adjacency among pixels, which limits their clustering performance to a certain degree.

SUBSPACE CLUSTERING METHODS

Subspace clustering is another recently developed advanced hyperspectral clustering approach founded on graph-based clustering models. Such methods generally model same-class pixels that have various spectral signatures with a subspace and approximate the complex internal structure of HSIs by a union of subspaces, as detailed in Figure 8, which may relieve the large spectral variability and improve the modeling accuracy. Then, such methods explore the underlying adjacency among pixels through self-representation learning via an overcomplete dictionary derived from the HSI data, with a certain prior structural constraint utilized for the representation coefficient matrix to obtain stable solutions, as in (14).
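Self-representation learning can be illustrated with a deliberately simple structural constraint. The sketch below assumes a squared Frobenius-norm (ridge) penalty on the coefficient matrix, chosen here only because it has a closed-form solution; sparse and low-rank constraints require iterative solvers. Columns of `Y` are pixels, and the function names and the value of `lam` are illustrative.

```python
import numpy as np

def ridge_self_representation(Y, lam=0.1):
    """Closed-form C minimising ||Y - Y C||_F^2 + lam * ||C||_F^2 (columns of Y are pixels)."""
    G = Y.T @ Y                                           # Gram matrix of the pixels
    # Setting the gradient to zero gives (G + lam * I) C = G
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)

def adjacency_from_coefficients(C):
    """Symmetric, nonnegative similarity graph built from the coefficient matrix."""
    return np.abs(C) + np.abs(C).T

# Usage: 30 synthetic pixels lying in two orthogonal 2-D subspaces of a 6-band space
rng = np.random.default_rng(0)
basis_1, basis_2 = np.eye(6)[:, :2], np.eye(6)[:, 2:4]
Y = np.hstack([basis_1 @ rng.normal(size=(2, 15)), basis_2 @ rng.normal(size=(2, 15))])
C = ridge_self_representation(Y)
W = adjacency_from_coefficients(C)
```

In this idealized case, pixels are represented only by pixels from their own subspace, so `W` is block diagonal and a spectral clustering step applied to it would recover the two groups. Real HSIs have correlated subspaces, which is why stronger structural constraints are used in practice.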
By fully exploring the interactions among pixels and the contribution of each atom to the target pixel, the learned adjacency matrix may be more accurate and informative, and it may guarantee that pixels are segmented into the correct subspaces:

$$\min_{C} \; G(C) \quad \text{subject to} \quad Y = YC + N, \tag{14}$$

where $C \in \mathbb{R}^{MN \times MN}$ is the representation coefficient matrix, which contains pairwise-pixel similarity and reveals the latent partition pattern of pixels to a certain degree. Here, $G(\cdot)$ denotes a certain prior structural constraint for $C$, including the sparse constraint [122], low-rank constraint [123], energy constraint [124], and so on. Then, the adjacency matrix $\tilde{W}$ can be induced from the coefficient matrix $C$, e.g., $\tilde{W} = |C| + |C|^{T}$. Last, by applying SC to the adjacency matrix, the final clustering result can be obtained [125], [126].

The most representative subspace clustering model is SSC [122], [127], which exploits the underlying adjacency among hyperspectral pixels by solving the following sparsity-promoting optimization problem, based on the basic assumption that each target pixel can be recovered by only a few atoms from its own subspace in the HSI self-dictionary. The SSC model can be formulated as in (15):

$$\min_{C} \; \lVert C\rVert_1 + \frac{\lambda}{2}\,\lVert Y - YC\rVert_F^2 \quad \text{subject to} \quad \operatorname{diag}(C) = 0, \; C^{T}\mathbf{1} = \mathbf{1}, \tag{15}$$

where $\lambda$ is a regularization parameter to balance the sparsity term and the data fidelity term, $\operatorname{diag}(C) = 0$ is to avoid the trivial solution caused by representing each pixel by itself, and $C^{T}\mathbf{1} = \mathbf{1}$ means that the affine subspace model is adopted.

FIGURE 8. The subspace clustering mechanism. (a) The HSI. (b) Subspace modeling. (c) Self-representation learning. (d) The similarity graph. (e) The clustering result. DN: digital number.

Although SSC has shown significant potential in hyperspectral clustering, its clustering performance is still limited by some shortcomings, e.g., ignoring the importance of the spatial information and the nonlinearity of HSIs. Based on this fact, in recent years, many enhanced subspace clustering algorithms have been proposed to further improve the clustering performance and exploit the potential of subspace clustering. On the basis of the working mechanism, such methods can be coarsely summarized into three main categories: 1) spectral–spatial subspace clustering, 2) multiview subspace clustering, and 3) kernel subspace clustering.

SPECTRAL–SPATIAL SUBSPACE CLUSTERING

Spectral–spatial subspace clustering methods focus on exploring the spectral–spatial duality of HSIs within the self-representation framework to reduce the influence of salt-and-pepper noise and enhance the spatial homogeneity of the clustering result. By incorporating spatial information to help spectral analysis in the representation domain, the piecewise smoothness of the representation coefficient matrix can be effectively enhanced, and the representation bias can be reduced to some degree. As a result, the clustering performance can be effectively improved. In general, with a certain spatial constraint, the spectral–spatial subspace clustering model can be formulated as follows:

$$\min_{C} \; G(C) + \alpha R(C) \quad \text{subject to} \quad Y = YC + N, \tag{16}$$

where $R(\cdot)$ denotes the spatial regularization term and $\alpha$ is a regularization parameter to trade off the importance between the spectral term and the spatial term. A typical example is the spectral–spatial SSC ($S^{4}C$) algorithm [40].
The S$^4$C algorithm promotes the representation of target pixels by highly related atoms via a weighting strategy and incorporates spatial neighborhood information through an eight-neighborhood local average spatial regularization, which yields an integrated self-representation model. This regularization is based on the assumption that the average coefficient in a small local window should be close to the coefficient of the center pixel. Because this local average assumption cannot be satisfied in areas with a complex land cover distribution, an $\ell_2$-norm regularized SSC (L2-SSC) algorithm was proposed [41]. It incorporates spatial information in a more refined way by constructing an efficient four-neighborhood $\ell_2$-norm spatial regularization, which further improves the clustering performance. In addition, in [128], a spectral–spatial SSC based on 3D edge-preserving filtering (SSC-3DEPF) algorithm was put forward. It applies 3D edge-preserving filtering to the sparse coefficient matrix obtained by SSC to extract spectral–spatial information and generate a more accurate coefficient matrix, which is favorable for clustering. In [129], a joint SSC (JSSC) method was proposed to make use of spatial information through joint sparse representation. It forces the pixels in a spatial neighborhood to share the same sparse basis. In [44], based on the sparse coefficient matrix learned by SSC, two enhanced methods were put forward to construct a more accurate adjacency matrix, i.e., cosine–Euclidean (CE) and CE dynamic weighting (CEDW). These two methods simultaneously utilize the spectral and spatial information, with the cosine similarity measuring the spectral similarity and Euclidean distances incorporating the spatial information. Moreover, in [130], a Laplacian-regularized low-rank subspace clustering (LLRSC) algorithm was proposed.
It incorporates three different Laplacian regularizations into the low-rank subspace clustering (LRSC) model to explore the correlation information of HSIs, and it achieves good performance in HSI band selection. In [131], a spectral–spatial LRSC (SS-LRSC) model was developed. It utilizes a new modulation strategy to incorporate the correlations into the low-rank representation matrix through weighting and local spatial bilateral filtering, which performs well for HSIs. Furthermore, in [132], a Gaussian kernel dynamic similarity matrix-based SSC (GKD-SSC) method was introduced. It improves the quality of the adjacency matrix by simultaneously utilizing the sparse coefficient matrix obtained by SSC and the Gaussian kernel similarity based on the distances between pixels after PCA processing. Considering the high computational complexity of sparse recovery-based methods, a total variation (TV)-regularized collaborative representation clustering algorithm with a locally adaptive dictionary (TV-CRC-LAD) was proposed for HSIs [21]. This approach exploits the collaborative and competitive relationships among pixels from all classes in the self-representation process and deals with the clustering task within the collaborative representation framework, with less complexity. It then reduces the serious interference from unrelated atoms in the whole dictionary by constructing a locally adaptive dictionary for each target pixel, and it integrates spatial information to enhance the piecewise smoothness of the coefficient matrix via the TV regularization. As a result, TV-CRC-LAD may perform better for HSIs. In addition, to overcome the high computational complexity and the time and memory cost of the self-dictionary-based methods, a sketched subspace clustering method was developed [133].
It conducts the self-representation learning under a sketched dictionary with far fewer atoms, obtained by random projection, which reduces the computational complexity and enhances the scalability of the model to a large degree. The sketched subspace clustering method was then introduced to HSIs, and, by adding a TV constraint to incorporate spatial information, an enhanced method was proposed, i.e., TV sketched subspace clustering [134]. Furthermore, pixel-based clustering methods are easily affected by salt-and-pepper noise and cannot accurately model the spatial neighborhoods of hyperspectral pixels with various shapes. For this reason, several object/superpixel-based SSC methods were developed for HSIs. A typical example is the mass center-reweighted object-oriented SSC (MCR-OOSSC) algorithm [135]. It flexibly models spatial neighborhoods with various shapes via objects obtained from oversegmentation and extracts more representative and discriminative object mass centers as features to construct the object sparse representation model, as in (17):

$$\min_{\tilde{C}} \; \|\tilde{C}\|_{1} + \frac{\lambda}{2}\|\tilde{Y} - \tilde{Y}\tilde{C}\|_{F}^{2} \quad \text{subject to} \quad \operatorname{diag}(\tilde{C}) = 0, \; \tilde{C}^{T}\mathbf{1} = \mathbf{1}. \tag{17}$$

Here, $\tilde{Y} \in \mathbb{R}^{D \times G}$ is the object mass center data matrix and $\tilde{C} \in \mathbb{R}^{G \times G}$ is the associated sparse coefficient matrix, with $G$ denoting the number of objects. Based on the MCR-OOSSC approach, in [136], a higher-order superpixel-based SSC algorithm with a conditional random field (SP-SSC-CRF) was proposed. It integrates the advantages of the S$^4$C and OOSSC methods to generate an enhanced model and utilizes the conditional random field to further smooth the within-class noise. In general, these object/superpixel-based methods improve the clustering performance to a certain degree and greatly reduce the time cost by converting pixel clustering to object clustering, which significantly increases the attractiveness of subspace clustering in practical applications. Moreover, to better evaluate the discriminative information and more accurately learn the nonlinear structure of HSIs, a Laplacian-regularized deep subspace clustering (LRDSC) algorithm was proposed [137]. It combines subspace clustering with a deep convolutional autoencoder network to learn the nonlinearity of HSIs and extracts spectral–spatial information through 3D convolutions and deconvolutions with skip connections to fully exploit multilevel features. Consequently, LRDSC obtains highly competitive clustering performance for HSIs.

MULTIVIEW SUBSPACE CLUSTERING

Multiview subspace clustering methods take full advantage of complementary information found in different domains of HSIs to further improve the clustering performance.
Generally, each view corresponds to a specific feature domain, such as the spectral feature domain, the contexture feature domain, the shape feature domain, and so on. Such techniques generally construct a unified model to integrate the self-representation problems of the multiview features. A typical example is the spectral–spatial-based multiview low-rank SSC (SSMLC) method [138], which has been applied to HSIs [139]. Specifically, it generates the spectral view by spectral partitioning to obtain correlated bands and creates the spatial view by morphological processing. In addition, another view is generated by PCA to remove the serious noise in HSIs. By integrating different views within the SSC framework, SSMLC can be modeled as in (18):

$$\min_{C^{1}, C^{2}, \ldots, C^{m}} \; \sum_{i=1}^{m} \left( \beta_{1}\|C^{i}\|_{*} + \beta_{2}\|C^{i}\|_{1} \right) + \gamma \sum_{1 \le i, j \le m, \; i \ne j} \|C^{i} - C^{j}\|_{F}^{2} \quad \text{subject to} \quad Y^{i} = Y^{i}C^{i}, \; \operatorname{diag}(C^{i}) = 0, \tag{18}$$

where $m$ denotes the number of views and $Y^{i}$ indicates the feature matrix of the $i$th view, with $C^{i}$ representing the associated coefficient matrix. The terms $\beta_{1}$, $\beta_{2}$, and $\gamma$ are three regularization parameters. The second summation forces the coefficient matrices learned from different views to share the same pattern. Through multiview learning, the complementary information of HSIs can be effectively integrated, and the discriminability of the representation coefficients can be enhanced to some degree, which leads to more accurate clustering results. Considering the nonlinearity of HSIs, the SSMLC model was extended to a kernel version, i.e., k-SSMLC [140]. It further improves the clustering accuracy by introducing the kernel technique to address the nonlinearly separable problem of HSIs in the multiview subspace clustering framework. In addition, to overcome the large computational burden of multiview subspace clustering, a parallel SSMLC (p-SSMLC) method was put forward [141]. It adopts a simple parallel strategy to reduce the time cost of SSMLC.
Specifically, given the large size of remote sensing images, the HSI is first partitioned into many nonoverlapping 3D blocks. Then, the SSMLC method is applied to each 3D block to obtain the local clustering results. Last, by merging these local clustering outcomes, the final clustering result is obtained. By processing the blocks in parallel, the overall time cost is significantly reduced, which further improves the practicability of the computationally expensive multiview subspace clustering models.

KERNEL SUBSPACE CLUSTERING

Due to the complex imaging environment and serious interference from various nonlinear factors in the imaging process, HSIs generally have an obvious nonlinear structure, and different classes are generally not linearly separable. However, most subspace clustering methods are based on the linear subspace assumption, which utilizes the union of linear subspaces to approximate the complex nonlinear internal structure of HSIs, leading to a large systematic error and poor separability among different classes. As a result, the clustering performance is degraded to some degree. For this reason, kernel subspace clustering methods have been developed. They replace the linear model with a kernel self-representation model to relieve the nonlinearity of HSIs and more accurately mine the latent adjacency among pixels. Such methods first map pixels from the original feature space into a much higher dimensional kernel space to approximately transform the nonlinearly separable problem into a linearly separable one. Then, the self-representation property of the mapped features in the reproducing kernel space is exploited to construct a kernel self-representation model, as in (19), which generally leads to a more accurate coefficient matrix:

$$\min_{C} \; \mathcal{H}(C) + \frac{\lambda}{2}\|\mathcal{K}(Y) - \mathcal{K}(Y)C\|_{F}^{2} \quad \text{subject to} \quad \operatorname{diag}(C) = 0, \tag{19}$$

where $\mathcal{K}(\cdot)$ denotes the kernelized data matrix, with the Gaussian radial basis function commonly utilized. A typical kernel subspace clustering method is the kernel SSC algorithm with a spatial maximum pooling operation (KSSC-SMP) [142]. The KSSC-SMP extends SSC to nonlinear manifolds to construct the KSSC model, which relieves the nonlinearly separable problem of HSIs to some degree. Then, it incorporates spatial neighborhood information through spatial maximum pooling to generate more discriminative features. Consequently, the KSSC-SMP may outperform linear SSC methods. In addition, in [143], a kernel sparse and LRSC (KSLRSC) algorithm was proposed. It utilizes sparse and low-rank constraints to simultaneously explore the local and global structure information of HSIs. Accordingly, the underlying adjacency among pixels can be more accurately learned. The KSLRSC method was then extended to semisupervised classification for HSIs. Furthermore, by adding a TV denoising constraint into the KSSC model to enhance the similarity among pixels from the same subspace, a KSSC with TV denoising (KSSC-TVD) algorithm was put forward for HSIs [144]. In addition, the k-SSMLC method is also a typical kernel subspace clustering model [140], which extends the linear multiview subspace clustering model to a kernel version to further improve the clustering accuracy.

In general, because of accurate modeling and powerful information extraction capability, subspace clustering methods have shown great potential for HSI clustering and achieved very competitive performance. In recent years, subspace clustering has gained progressively more attention. However, such methods generally come with a high computational complexity and massive time and memory consumption, which limits their applications to some degree.

DEEP LEARNING-BASED CLUSTERING METHODS

Deep learning-based clustering methods have been recently developed and are among the most advanced clustering techniques [145].
These approaches rely on deep NNs (DNNs), such as fully connected networks (FCNs) and convolutional NNs (CNNs), to learn more discriminative features for clustering and to more accurately model the nonlinearity of data, as shown in Figure 9. Such methods generally deploy two components, i.e., the network and the clustering model. Since there are no available labeled samples, these models are generally optimized in an unsupervised way. According to the basic architecture, deep learning-based clustering methods can be further divided into three main categories: 1) autoencoder-based clustering, 2) separated network-based clustering, and 3) generative network-based clustering [146].

FIGURE 9. The deep learning-based clustering mechanism. (a) The HSI. (b) Deep learning. (c) The clustering result. [Panel (b) shows three architectures: an autoencoder (encoder, features, decoder, reconstruction loss, clustering loss, joint training); a separated network (features, clustering model, fine-tuning the network); and a generative network (generator G(z), discriminator D(x)).]

AUTOENCODER-BASED CLUSTERING

Autoencoder-based clustering methods are the earliest and most representative deep clustering approaches. An autoencoder is an unsupervised NN with the advantages of simplicity and effectiveness. It generally consists of an encoder for data representation and a decoder for data reconstruction, and it self-trains by minimizing the reconstruction error. A typical example is the deep clustering network (DCN) [147]. It implements DR via a deep autoencoder network to learn more k-means-friendly features and optimizes the DR and clustering tasks in a unified framework, as shown in (20):

$$\min_{I, \{s_{i}\}} \; \sum_{i=1}^{MN} \left( \ell\big(g(f(Y_{i})), Y_{i}\big) + \frac{\gamma}{2}\|f(Y_{i}) - I s_{i}\|_{2}^{2} \right) \quad \text{subject to} \quad s_{j,i} \in \{0, 1\}, \; \mathbf{1}^{T}s_{i} = 1, \; \forall i, j, \tag{20}$$

where $f(\cdot)$ and $g(\cdot)$ denote the nonlinear mapping functions of the encoder and the decoder, respectively; $\ell(\cdot)$ stands for the reconstruction loss function, defined as $\ell(Y_{i}, X_{i}) = \|Y_{i} - X_{i}\|_{2}^{2}$, with $X_{i}$ being the reconstructed sample; $I$ represents the centroid matrix, with its $j$th column referring to the $j$th cluster centroid; and $s_{i}$ is the assignment vector with only one nonzero element. The first term is the network loss, and the second term is the clustering assignment loss, with $\gamma$ being a tradeoff parameter. The DCN solves this problem with an alternating stochastic algorithm. By learning more discriminative features via a deep autoencoder network, the DCN outperforms traditional clustering methods.

A set of deep clustering models has been developed based on the autoencoder approach. In [148], a deep multimanifold clustering (DMC) method was proposed. It integrates a locality-preserving constraint into the deep autoencoder network to learn the latent embedded manifolds and more informative features, using both the reconstruction loss and the locality-preserving loss. Then, the proximity of the representations to the centroids is employed as a penalty to make the representations more clustering friendly. Furthermore, a dual autoencoder-based deep SC (DAE-DSC) method was proposed that jointly optimizes the deep autoencoder and the deep SC networks [149]. It employs a dual autoencoder network to learn more robust reduced representations and uses mutual information to more effectively preserve discriminative information. Moreover, a general deep clustering model was developed by integrating traditional clustering models, e.g., the k-means and the GMM, into the deep networks [150]. It yields a much higher accuracy than traditional methods.
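To make the two terms of the DCN objective in (20) concrete, the following sketch evaluates the loss for a fixed encoder, decoder, and centroid set, choosing each one-hot assignment vector as the nearest centroid. It is only an illustration under stated assumptions: the function name `dcn_objective` is hypothetical, the encoder and decoder are passed in as arbitrary callables rather than trained networks, samples are stored as rows instead of columns, and no optimization of the network or the centroids is performed.

```python
import numpy as np


def dcn_objective(Y, encode, decode, centroids, gamma=1.0):
    """Evaluate the objective of (20) for fixed networks and centroids.

    Y         : N x D array, one sample per row (rows are an assumption of this sketch).
    encode    : callable mapping N x D -> N x d (stands in for f).
    decode    : callable mapping N x d -> N x D (stands in for g).
    centroids : K x d array, one cluster centroid per row.
    Returns the total loss and the hard assignment of each sample.
    """
    Z = encode(Y)                                   # latent features f(Y_i)
    X = decode(Z)                                   # reconstructions g(f(Y_i))
    recon = np.sum((X - Y) ** 2, axis=1)            # reconstruction loss per sample
    # Squared distance of every latent feature to every centroid (N x K).
    d2 = np.sum((Z[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    assign = np.argmin(d2, axis=1)                  # one-hot s_i stored as an index
    cluster = d2[np.arange(Z.shape[0]), assign]     # ||f(Y_i) - I s_i||_2^2
    total = float(np.sum(recon + 0.5 * gamma * cluster))
    return total, assign
```

With identity functions for `encode` and `decode`, the reconstruction term vanishes and the objective reduces to a scaled k-means cost, which is one way to see why the second term encourages k-means-friendly features.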
The autoencoder has also been successfully introduced to subspace clustering, where deep networks handle feature extraction and the modeling of data nonlinearity more effectively and deliver competitive performance. For example, in [151], a deep subspace clustering algorithm based on an autoencoder network (DSCNet) was proposed. It inserts a self-expressive layer between the encoder and the decoder to learn the pairwise adjacency between data points via back propagation, and it integrates the reconstruction loss and the self-representation loss to learn more discriminative representations. In addition, a structured autoencoder-based subspace clustering (StructAE) method was developed, which constructs a structured autoencoder network to more effectively preserve the local and global structure information of data [152]. Furthermore, a self-supervised convolutional subspace clustering network (S2ConvSCN) was put forward [153]. It employs a convolutional autoencoder network to fully explore the spatial information of an image, and it adds a self-expression module and an SC module into the network to generate a trainable end-to-end model. Then, it jointly optimizes the feature extraction and subspace clustering in a self-supervised way. Moreover, the LRDSC algorithm introduced previously obtains a higher clustering accuracy for HSIs by introducing a convolutional autoencoder to subspace clustering [137]. In addition, in [154], a deep subspace clustering band selection model was developed for HSIs, combining a convolutional autoencoder network with the SSC model and producing a good band selection effect.

SEPARATED NETWORK-BASED CLUSTERING

Separated network-based clustering methods generally optimize a deep network only by the clustering loss, with the network and the clustering model separated.
Although the basic network can be very deep, these methods may fail to learn informative features for clustering, due to the absence of a network constraint, such as the reconstruction loss. Therefore, the initialization of the network is crucial for these methods. Generally, the network is pretrained or randomly initialized. A typical example is clustering based on CNN (CCNN) [155]. It deals with the feature extraction and clustering tasks within the CNN framework in an iterative way. First, it employs a CNN pretrained on ImageNet to extract features for initial clustering, with $c$ randomly selected centroids and the minibatch k-means utilized. Then, it exploits the difference between the labels predicted by the CNN and by the minibatch k-means to fine-tune the network, based on the stochastic gradient descent algorithm, and simultaneously updates the cluster centroids, as in (21). The CCNN also employs feature drift compensation to relieve mismatching and further improve the clustering accuracy:

$$\mathrm{SSE} = \frac{1}{2}\sum_{j=1}^{c}(y_{j} - t_{j})^{2}, \tag{21a}$$

$$\mu_{j}^{(k)} = (1 - \gamma_{j})\,\mu_{j}^{(k-1)} + \gamma_{j}\,\eta_{\text{new}}^{(k)}, \tag{21b}$$

where SSE denotes the sum of the squared error; $y_{j}$ and $t_{j}$ stand for the labels predicted by the CNN and the minibatch k-means, respectively; $\mu_{j}^{(k)}$ represents the $j$th centroid in the $k$th iteration; $\eta_{\text{new}}^{(k)}$ indicates the extracted features from a minibatch assigned to $\mu_{j}$; and $\gamma_{j}$ is the learning rate of the $j$th centroid, which is defined as the reciprocal of the number of samples in the $j$th cluster.

Unsupervised pretraining is widely utilized for separated network-based clustering methods. In [156], a deep belief network (DBN) nonparametric clustering (DBNC) algorithm was proposed that relies on an unsupervised pretrained DBN. It learns the reduced representations of data through the pretrained DBN and employs nonparametric clustering with the maximum margin to perform clustering, with the parameters of the top layer of the DBN fine-tuned subsequently.
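The centroid update in (21b) is a convex combination of the previous centroid and the newly assigned feature, with a per-centroid learning rate equal to the reciprocal of the cluster size. The short sketch below shows this rule for one feature at a time; the function name `update_centroid` and the one-feature-per-call convention are assumptions made for illustration and are not taken from [155].

```python
import numpy as np


def update_centroid(mu_prev, eta_new, n_assigned):
    """Apply the update of (21b) to a single centroid.

    mu_prev    : previous centroid (1D array).
    eta_new    : feature vector newly assigned to this centroid (1D array).
    n_assigned : number of samples in the cluster, including the new one.
    """
    gamma = 1.0 / n_assigned          # learning rate: reciprocal of the cluster size
    return (1.0 - gamma) * mu_prev + gamma * eta_new
```

When the counter is increased by one before every call, repeated application of this rule keeps the centroid equal to the running mean of all features assigned so far, so early assignments are not forgotten and later ones move the centroid less and less.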
In addition, a deep embedded clustering (DEC) method was developed [157]. It first pretrains a stacked autoencoder network in an unsupervised way, based on the reconstruction loss, and drops the decoder part. Then, it fine-tunes the network with the clustering loss and iteratively refines the clustering result based on the Kullback–Leibler divergence between the soft assignment and the auxiliary distribution. Based on the DEC model, an improved version was put forward that employs a convolutional autoencoder network to learn more informative features for clustering [158].

Random initialization is also often utilized for separated network-based clustering models. For instance, in [159], a CNN-based joint unsupervised learning (JSL) algorithm was proposed. It starts with a random initialization and formulates a recurrent framework to jointly update the representations and clusters during the training process, with clustering as the forward pass and representation learning as the backward pass. In addition, a deep SSC (DSSC) method was developed, which combines a DNN with SSC [160]. It randomly initializes the network and iteratively refines the sparse coding and the clustering results in the forward propagation stage, with the parameters of the DNN updated in the backward propagation stage. Furthermore, a DNN-based SC (SpectralNet) method was proposed [161]. It randomly initializes the parameters of the network and considers three terms in the unsupervised training process: 1) affinity learning based on a Siamese network, 2) embedding learning under an orthogonality constraint to map the data into eigenspace, and 3) the clustering assignment. As a result, it significantly outperforms traditional SC methods.

GENERATIVE NETWORK-BASED CLUSTERING

Differing from autoencoder- and separated network-based deep clustering methods, generative network-based clustering approaches simultaneously perform clustering and uncover the underlying structure of data to generate new samples. These methods generally aim at learning the real structure of data as accurately as possible to create high-quality samples. Therefore, they can more effectively guarantee the discriminability of the extracted features. The most representative generative deep clustering methods are the generative adversarial network (GAN)-based approaches. These methods commonly include two parts, i.e., the generator and the discriminator, and improve the quality of the extracted features through the antagonism between the generator and the discriminator. A typical example is the categorical GAN (CatGAN) [162]. It plays a minimum–maximum adversarial game to learn a discriminative classifier in an unsupervised way, by trading off mutual information between the observations and the predicted class distribution. Its discriminator and generator are defined as in (22):

$$L_{D} = \max_{D} \; \mathcal{G}\big[p(y \mid D)\big] - E_{Y_{i}}\Big[\mathcal{G}\big[p(y \mid Y_{i}, D)\big]\Big] + E_{z}\Big[\mathcal{G}\big[p(y \mid G(z), D)\big]\Big], \tag{22a}$$

$$L_{G} = \min_{G} \; -\mathcal{G}\big[p(y \mid D)\big] + E_{z}\Big[\mathcal{G}\big[p(y \mid G(z), D)\big]\Big], \tag{22b}$$

where $\mathcal{G}[\cdot]$ denotes the empirical entropy, $y$ is the predicted label of a given example $Y_{i}$, and $z$ is a generated noise vector from a prior distribution $P(z)$, with $D$ and $G$ representing the discriminator and the generator, respectively. Through adversarial learning, the CatGAN effectively enhances the discriminability and robustness of the classifier and yields a high clustering accuracy. Based on the CatGAN, in [163], an information-maximizing GAN (InfoGAN) was proposed. It enhances the clustering performance by exploiting the mutual information among the fixed small subset of latent variables and the observations. Furthermore, a deep adversarial GMM autoencoder clustering (DAGMC) algorithm was developed [164]. It uses an adversarial autoencoder network to learn the reduced representations and employs a tunable GMM for clustering. It simultaneously considers the autoencoder, the GMM, and the adversarial losses in its objective, which is optimized by the stochastic gradient descent algorithm. Moreover, a deep adversarial subspace clustering (DASC) method was put forward [165]. It is also based on an adversarial autoencoder network, with a generator for subspace estimation and the clustering assignment and a discriminator for clustering performance evaluation. It progressively learns more informative representations, with the self-representation and subspace clustering tasks supervised by adversarial learning.

In addition to GAN-based models, a set of variational autoencoder (VAE)-based generative deep clustering methods has been developed, integrating certain probability models into a deep network to learn the distribution of data for sample generation. For example, in [166], a variational deep embedding (VaDE) algorithm was proposed. It integrates GMM into the VAE network for sample generation and optimizes the clustering problem by maximizing the evidence lower bound via the stochastic gradient variational Bayes estimator. In addition, a VAE with Gaussian mixture (VAE-GM) method was put forward [167]. It generates samples from a prior distribution, i.e., a Gaussian mixture, and introduces the minimum information constraint to relieve the over-regularization of VAE. As a result, it yields a high clustering accuracy.

Overall, deep learning-based clustering methods can bring about a higher clustering accuracy due to their powerful feature learning and nonlinearity-fitting capabilities. Accordingly, they have become a research hot spot in the clustering field. However, most deep clustering methods are concentrated in the computer vision field, with few trials in the hyperspectral remote sensing arena. Hence, more deep learning-based hyperspectral clustering methods should be developed to promote the development of this field. In addition, most of the relevant works focus on improving the clustering performance but ignore the theoretical exploration behind the performance, which leads to the poor interpretability of these methods and limits their popularization and application to a certain degree.

HYBRID MECHANISM-BASED CLUSTERING MODELS

Hybrid mechanism-based methods deal with the clustering task by combining two or more models, as presented in Figure 10. Considering that a single clustering model generally has certain shortcomings, such techniques integrate the advantages of different schemes to further improve the clustering performance. A typical example is the combination of the centroid-based clustering scheme with other approaches, such as the k-GMM [168]. The k-GMM combines centroid- and probability-based clustering. Taking advantage of the k-means and the GMM, k-GMM obtains better clustering accuracy for HSIs. In addition, in [169], an improved fast density peaks-based clustering (k-FDPC) algorithm was proposed, which is a hybridization of the k-means and the CFSFDP. Based on the CFSFDP approach, this algorithm calculates the local density based on an adaptive bandwidth pdf, and it searches cluster centroids by fitting the density and distance decision graph. Subsequently, it infers the pixel assignment through the k-means. As a result, k-FDPC outperforms both k-means and CFSFDP. In addition, bionics-based clustering can also be combined with other approaches. For example, in [170], a fuzzy Kohonen local information c-means clustering (FKLICM) method was put forward. It employs the Kohonen NN to model the complexity of remote sensing images and integrates the discriminative rules of the FLICM to enhance the discriminability of the model. Consequently, more accurate clustering results are obtained. In addition, by combining the advanced artificial bee colony (ABC) model with MRF, a novel ABC–MRF clustering algorithm was developed for HSIs [171]. The ABC model is utilized to better search cluster centroids and optimize the objective function, with the MRF utilized to incorporate spatial neighborhood information to further improve the clustering accuracy. Moreover, the graph-based clustering scheme can be flexibly combined with other clustering schemes as well. For example, in [172], a Gaussian SC model (GSC) was constructed by integrating the powerful information extraction ability of the graph model into the GMM framework. In [173], by combining the k-means with the graph model, a graph-based k-means (G-k-means) technique was developed. It utilizes the graph model to estimate the parameters and initializations for the k-means, which effectively improves the clustering performance and yields an accurate segmentation result for HSIs for Mars exploration. In addition, by combining anchor graph-based clustering with subspace clustering, a sparse dictionary-based anchor regression (SDCR) algorithm was introduced for HSIs [174]. It constructs a more representative dictionary through dictionary learning with double sparsity constraints, and it utilizes the anchor subspace regression to efficiently evaluate the similarity between hyperspectral pixels. With the help of SC, the final clustering result is obtained. By integrating the advantages of the anchor graph and the subspace, SDCR achieves good performance.

Generally speaking, by comprehensively taking advantage of two or more different clustering schemes, hybrid mechanism-based clustering methods can overcome the defects of the individual techniques and may bring about better clustering performance. In theory, hybridization can be extended to any two or more clustering schemes, and progressively more attractive hybrid clustering methods may be developed through future research.

FIGURE 10. The hybrid mechanism-based clustering scheme. (a) The HSI. (b) The hybrid clustering model construction. (c) The hybrid model optimization. (d) The clustering result. [Panel (b) combines a centroid-based model, a density-based model, a probability-based model, and a graph-based model.]

EXPERIMENTS

In this section, the performance of some popular and representative hyperspectral clustering algorithms is evaluated, including FCM (https://github.com/wwwwwwzj/fcm) [45], FCM-S1 [64], CFSFDP (https://github.com/DesperadoZ/Density_Peak_Clustering) [71], GMM (https://github.com/AdamaTG/Matlab_GMM) [79], SC (https://github.com/jhliu17/SpectralClustering) [105], FSCAG [43], SGCNR [121], SSC (http://vision.jhu.edu/code/) [40], [122], and L2-SSC [41]. Specifically, FCM is one of the most representative centroid-based clustering methods, while FCM-S1 is a classical improved version of FCM, achieved by incorporating spatial information. CFSFDP is a representative density-based
clustering approach. GMM is a typical probability-based clustering method. SC is a complete graph-based clustering technique, while FSCAG and SGCNR are two recently developed state-of-the-art abbreviated graph-based clustering approaches. SSC is a representative subspace clustering method, and L2-SSC is a very competitive spectral–spatial subspace clustering technique for HSIs. These clustering methods were tested on two well-known HSIs, i.e., the Indian Pines image and the University of Houston image, with both cluster maps and quantitative assessments provided for comprehensive evaluation and comparison. Specifically, the producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), kappa, and purity were utilized for quantitative analysis. In addition, the running time of each method is given. The parameters of each approach were manually tuned for the best performance. In the experiments, the clusters' thematic information was automatically determined by the widely used Hungarian algorithm [175], [176], with the number of clusters set to the number of classes in the ground truth [22], [43], [121]. Until now, there has been no unified standard for the utilization of unlabeled pixels in the ground truth in the hyperspectral clustering field. Some works utilize all the pixels and give the cluster map of the whole image [21], [39], [68], while other works give only the cluster map of the labeled pixels in the ground truth [22], [77], [117]. Generally speaking, each of these strategies has advantages. The former is more in line with the working mechanism of unsupervised clustering, as there is no available prior knowledge. The latter can more clearly present the differences between the clustering results of different algorithms. In this article, the latter is utilized. The Indian Pines image was collected by the Airborne Visible/Infrared Imaging Spectrometer sensor over northwestern Indiana on 12 June 1992.
This image has a size of 145 × 145 pixels and 220 spectral bands, with a spatial resolution of 20 m. In the experiments, only 200 bands were utilized for analysis, with 20 bad-quality bands removed. This scene covers an agricultural area and has a relatively concentrated land cover distribution. It contains 16 different classes, with many subclasses of vegetation. As in [21] and [177], nine main classes are utilized for clustering. The false-color image and the ground truth are shown in Figure 11(a) and (b). Figure 11(c) displays the mean spectra of the nine classes, and Figure 11(d) gives the t-distributed stochastic neighbor embedding (t-SNE) graph of the labeled samples of the nine classes [178], [179]. The t-SNE graph shows that different classes are mixed together and are difficult to separate, which makes the clustering task very challenging.

The University of Houston image is a relatively new HSI data set, provided by the 2013 IEEE Geoscience and Remote Sensing Society Data Fusion Competition. It was obtained above the University of Houston by the National Center for Airborne Laser Mapping sensor on 23 June 2012. Different from the Indian Pines image, this scene mainly covers an urban area with a relatively complex land cover distribution. The image has a size of 349 × 1,905 pixels, with 144 spectral bands. In the experiment, a typical subset with a size of 160 × 150 × 144 was utilized, with seven main classes included [21]. The false-color image, the ground truth, the mean spectra, and the t-SNE graph of labeled samples of the seven classes appear in Figure 12.

Cluster maps of the different methods are provided in Figures 13 and 14, with the quantitative evaluations given in Tables 2 and 3, respectively.
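The evaluation protocol described above (matching cluster labels to ground-truth classes, then computing PA, UA, OA, kappa, and purity on the labeled pixels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, labels are assumed to be integers in 0..c-1, the number of clusters is assumed to equal the number of classes, and the cluster-to-class matching is done by brute force over permutations rather than by the Hungarian algorithm cited in the text (both solve the same assignment problem; brute force is only practical for a handful of classes).

```python
from itertools import permutations


def confusion(truth, pred, c):
    """c x c count matrix: rows = ground-truth class, columns = predicted label."""
    cm = [[0] * c for _ in range(c)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def match_clusters(cm):
    """Return mapping[k] = class assigned to cluster k, maximizing matched pixels.

    Brute force over all permutations (acceptable for c <= 9). The Hungarian
    algorithm solves the same assignment problem in polynomial time.
    """
    c = len(cm)
    return max(permutations(range(c)),
               key=lambda m: sum(cm[m[k]][k] for k in range(c)))


def evaluate(truth, pred, c):
    """OA, kappa, purity, and per-class PA/UA, computed on labeled pixels only."""
    n = len(truth)
    raw = confusion(truth, pred, c)
    mapping = match_clusters(raw)
    cm = confusion(truth, [mapping[p] for p in pred], c)

    rows = [sum(r) for r in cm]                                  # pixels per true class
    cols = [sum(cm[i][j] for i in range(c)) for j in range(c)]   # pixels per mapped cluster
    diag = [cm[i][i] for i in range(c)]

    oa = sum(diag) / n
    pe = sum(rows[i] * cols[i] for i in range(c)) / (n * n)      # chance agreement
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 0.0
    purity = sum(max(raw[i][k] for i in range(c)) for k in range(c)) / n
    pa = [diag[i] / rows[i] if rows[i] else 0.0 for i in range(c)]
    ua = [diag[i] / cols[i] if cols[i] else 0.0 for i in range(c)]
    return {"OA": oa, "kappa": kappa, "purity": purity, "PA": pa, "UA": ua}


# Toy example with three classes and nine labeled pixels.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [1, 1, 1, 2, 2, 0, 0, 0, 0]
scores = evaluate(truth, pred, 3)
print(scores)
```

In the toy example, eight of the nine pixels are matched after relabeling, so OA and purity are both 8/9, and kappa is 5/6. For real HSIs with more classes, a polynomial-time assignment solver would replace `match_clusters`.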
Taken together, the experimental results show that, in general, the spectral–spatial methods outperform the spectral-based approaches: by taking full advantage of the spectral–spatial duality of HSIs, they obtain smoother clustering results with a higher accuracy, which suggests that the spatial information is informative and favorable for clustering. Specifically, FCM and FCM-S1 fail to perform well for both HSI data sets, with a large number of misclassifications, a significant amount of within-class noise in the cluster map, and relatively lower clustering accuracy. Generally speaking, centroid-based methods are more suitable for data that have a well-separated and near-spherical geometric structure [180]. However, this condition is generally violated for HSIs. Comparatively speaking, these methods perform better on the second image scene, where there is better indivisibility, as shown in Figures 11 and 12. Similarly, GMM also performs poorly for both image scenes because its assumption that samples from different classes obey a mixture of Gaussian distributions cannot be fully satisfied by HSIs. Although CFSFDP obtains relatively smooth clustering results for both scenes, several important classes are not effectively recognized, especially for the Indian Pines image. This is because density-based methods commonly make strong assumptions about the distribution of the feature space and are suitable for data with a multimodal distribution and nonlinear structure [180]. Unfortunately, the complexity of HSIs generally conflicts with the performance guarantee of these methods. By comparison, SC performs better, as it more accurately exploits the similarity among pixels by means of the graph. It obtains the second-best clustering accuracy for the Indian Pines image and the fourth-best clustering accuracy for the University of Houston image.
Since the abbreviated graph cannot accurately model the relationships among pixels, SGCNR and FSCAG fail to obtain good performance for both scenes, although they are very efficient. There are a large number of misclassifications and a notable amount of noise in the cluster maps for both scenes. In general, graph-based methods also need certain performance guarantees and are more suitable for data with a geometric structure in which samples from different classes are almost orthogonal or in which the overlap between classes is small relative to the indivisibility [180]–[182], conditions that cannot be well satisfied by HSIs.
Relative to these methods, the recently developed subspace clustering techniques, i.e., SSC and L2-SSC, may better model the complex structure of HSIs and relieve the large spectral variability with the subspace model. Through self-representation learning, interactions among pixels can be more effectively exploited, and the underlying adjacency between pixels can be more accurately learned, which might guarantee that pixels are partitioned into the correct groups. As a result, SSC and L2-SSC have a relatively good performance and show significant potential for HSIs. L2-SSC achieves the best clustering results, with smoother cluster maps and higher clustering accuracy for both scenes. However, behind the good performance, some restrictive assumptions are needed to guarantee their effectiveness, i.e., a tolerable noise level to support a strict subspace model, enough samples for each subspace, and a low affinity between different subspaces, which has been theoretically proved [183]. It can be seen that subspace clustering methods fail to obtain a satisfactory accuracy for the Indian Pines image, due to the strong noise and serious overlap between different classes, while the approaches perform well for the noiseless University of Houston image, with larger distances between different classes. In addition, it should be noted that SSC and L2-SSC suffer from a large computational complexity and are time consuming compared with the other clustering methods, a shortcoming of such approaches that still needs to be addressed.

FIGURE 11. The Indian Pines data set. (a) The original image (red: 40; green: 30; blue: 20). (b) The ground truth. (c) The mean spectra of the nine classes (DN value versus band number). (d) The t-SNE graph of labeled samples of the nine classes (dimension two versus dimension one after DR). Classes: corn-notill, corn-minimum-till, grass/pasture, grass/trees, hay-windrowed, soybeans-notill, soybeans-minimum-till, soybeans-clean, and woods.
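For readers who want the self-representation idea behind SSC in symbols, a commonly used form of the model is sketched below. The notation here is an assumption made for illustration and may differ from the notation used elsewhere in this article: $X \in \mathbb{R}^{D \times MN}$ stacks the pixel spectra as columns, $C \in \mathbb{R}^{MN \times MN}$ is the coefficient matrix, $\lambda$ is a trade-off parameter, and $W$ is the affinity matrix passed to spectral clustering.

```latex
\min_{C}\; \|C\|_{1} + \frac{\lambda}{2}\,\|X - XC\|_{F}^{2}
\quad \text{s.t.} \quad \operatorname{diag}(C) = 0,
\qquad
W = |C| + |C|^{\top}
```

Each pixel is expressed as a sparse combination of the other pixels, and the constraint $\operatorname{diag}(C) = 0$ rules out the trivial solution in which a pixel represents itself. Because $C$ has one row and one column per pixel, this form also makes plain why the time and memory costs of such methods grow quickly with image size.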
SUMMARY AND DISCUSSION
Hyperspectral remote sensing images provide a wealth of spectral information and show subtle differences between various classes to support fine land cover classification, and they have been an important data resource in various applications. Because HSIs are typical high-dimensional data, their interpretation relies on a large number of labeled samples. However, it is very difficult to acquire high-quality samples in practice. Therefore, during recent decades, many clustering methods have been developed for HSIs to deal with the interpretation task in an unsupervised way. In this article, we systematically reviewed the existing hyperspectral clustering methods in the literature and summarized them into nine main kinds, i.e., centroid-based, density-based, probability-based, bionics-based, intelligent computing-based, graph-based, subspace clustering, deep learning-based, and hybrid mechanism-based. In addition, we introduced the principle and mechanism of each type of clustering method and reviewed the representative approaches in detail, with the advantages and disadvantages briefly summarized.

From this research, we find that the development of hyperspectral clustering is not balanced. The development of the centroid-, density-, and probability-based clustering methods is more mature, especially for the former two approaches. The achievements of these two kinds of clustering methods are relatively more abundant. Research on bionics-based clustering is relatively scarce and demands more attention in future work. In addition, deep learning has obtained remarkable achievements in the computer vision field, but it has few applications in the hyperspectral clustering arena. More effort and trials are needed in the future. Recently, graph-based clustering and subspace clustering have gained increasing attention due to their relatively good clustering performance, and more and more algorithms have been proposed.

FIGURE 12. The University of Houston data set. (a) The original image (red: 110; green: 40; blue: 10). (b) The ground truth. (c) The mean spectra of the seven classes (DN value versus band number). (d) The t-SNE graph of labeled samples of the seven classes (dimension two versus dimension one after DR). Classes: building 1, building 2, grass, trees, grass-synthetic, running track, and bare soil.

FIGURE 13. Cluster maps of the different methods for the Indian Pines image. (a) The ground truth. (b) FCM. (c) FCM-S1. (d) CFSFDP. (e) GMM. (f) SC. (g) SGCNR. (h) FSCAG. (i) SSC. (j) L2-SSC. Legend: corn-no-till, corn-minimum-till, grass/pasture, grass/trees, hay-windrowed, soybeans-no-till, soybeans-minimum-till, soybeans-clean, woods, and unlabeled.

FIGURE 14. Cluster maps of the different methods for the University of Houston image. (a) The ground truth. (b) FCM. (c) FCM-S1. (d) CFSFDP. (e) GMM. (f) SC. (g) SGCNR. (h) FSCAG. (i) SSC. (j) L2-SSC. Legend: building 1, building 2, grass, trees, grass-synthetic, running track, bare soil, and unlabeled.
TABLE 2. QUANTITATIVE EVALUATIONS OF THE DIFFERENT METHODS FOR THE INDIAN PINES IMAGE.
METRIC    CLASS      FCM      FCM-S1   CFSFDP   GMM      SC       SGCNR    FSCAG    SSC      L2-SSC
PA (%)    Cluster 1  35.78    37.9     58.33    25.59    30.21    39.27    28.6     48.39    44.33
          Cluster 2  0        0        3.86     17.54    0.96     0        3.01     8.07     0
          Cluster 3  56.73    60.25    0        36.65    57.56    39.09    51.76    59.42    65.01
          Cluster 4  52.19    54.79    66.71    67.97    77.4     55.21    64.96    67.67    76.71
          Cluster 5  98.95    100      0        84.35    99.58    90.79    99.71    88.08    100
          Cluster 6  49.38    49.59    59.57    30.31    53.5     40.64    55.95    34.36    45.68
          Cluster 7  42.93    42.62    15.72    56.43    47.98    50.26    40.17    48.92    67.09
          Cluster 8  33.73    34.57    0        15.08    28.84    16.16    27.52    2.87     2.53
          Cluster 9  47.43    45.68    99.68    64.43    55.73    48.08    58.04    60.16    49.96
UA (%)    Cluster 1  52.95    53.34    28.1     37.96    63.78    44.81    65.96    45.22    48.62
          Cluster 2  0        0        8.74     7.42     18.18    0        3.14     7.55     0
          Cluster 3  30.34    30.04    0        27.75    33.03    36.78    33.73    34.41    34.97
          Cluster 4  94.54    93.9     53.22    75.45    91.87    67.05    82.96    96.67    98.94
          Cluster 5  69.56    68.92    0        80.28    60.79    62.64    94.51    98.14    93.91
          Cluster 6  20.78    21.01    23.22    28.48    21.54    23.05    22.09    28.64    23.97
          Cluster 7  51.92    52.89    58.13    45.27    49.41    45.9     47.04    45.03    51.78
          Cluster 8  21.93    22.76    0        25.19    23.42    20.82    20.45    4.05     9.74
          Cluster 9  88.76    91.02    77.08    84.16    94.5     77.11    92.73    96.09    98.29
OA (%)               43.03    43.55    38.75    45.18    46.92    42.45    43.99    46.27    51.15
Kappa                0.3427   0.3497   0.2927   0.3486   0.3839   0.3227   0.353    0.3676   0.419
Purity               0.5528   0.5596   0.473    0.5205   0.5498   0.4843   0.5642   0.5489   0.5647
Time (s)             69       381      497      30.68    5409     7.42     1.44     32764    13532
Cluster 1: corn-notill; cluster 2: corn-minimum-till; cluster 3: grass/pasture; cluster 4: grass/trees; cluster 5: hay-windrowed; cluster 6: soybeans-no-till; cluster 7: soybeans-minimum-till; cluster 8: soybeans-clean; cluster 9: woods.

TABLE 3. THE QUANTITATIVE EVALUATION OF THE DIFFERENT METHODS FOR THE UNIVERSITY OF HOUSTON IMAGE.
METRIC    CLASS      FCM      FCM-S1   CFSFDP   GMM      SC       SGCNR    FSCAG    SSC      L2-SSC
PA (%)    Cluster 1  51.73    51.12    99.82    68.37    88.72    66.67    74.59    79.53    91.8
          Cluster 2  95.16    97.25    93.96    39.52    96.52    38.94    38.68    58.42    99.08
          Cluster 3  94.45    95.51    99.94    95.01    93.48    59.78    76.09    95.25    96.61
          Cluster 4  49.44    68.5     2.07     28.42    73.69    37.58    87.66    68.67    64.32
          Cluster 5  100      100      99.76    99.95    100      76.66    96.71    98.9     100
          Cluster 6  16.37    0        98.42    41.21    0.39     79.16    58.82    94.87    98.42
          Cluster 7  67.32    72.32    94.91    73.48    54.54    48.92    82.81    69.07    89.09
UA (%)    Cluster 1  100      98.57    98.72    99.75    100      65.93    98.59    99.77    99.27
          Cluster 2  47.51    49.54    99.81    23.66    50.87    13.73    20.18    51.04    68.48
          Cluster 3  90.32    92.21    76.6     82.6     96.08    81.05    95.58    96.44    92.69
          Cluster 4  43.89    79.82    30.1     37.13    78.47    39.32    66.49    88.45    86.65
          Cluster 5  82.65    88       96.24    88.93    75.16    59.88    92.81    46       74.28
          Cluster 6  17.28    0        99.87    22.04    0.5      43.67    60       100      99.07
          Cluster 7  69.65    67.34    98.38    80.43    60.1     79.02    79.16    78.45    99.03
OA (%)               73.99    76.62    86.69    74.01    78.27    57.45    77.14    83.82    91.13
Kappa                0.6614   0.6968   0.8199   0.6556   0.7209   0.4756   0.7131   0.7921   0.8849
Purity               0.8083   0.8301   0.87     0.7903   0.8153   0.6931   0.8479   0.8382   0.9113
Time (s)             32       391      507      39       24382    7.47     1.28     24382    53281
Cluster 1: building 1; cluster 2: building 2; cluster 3: grass; cluster 4: trees; cluster 5: grass-synthetic; cluster 6: running track; cluster 7: bare soil.

Moreover, we comprehensively compared and analyzed the performance of several popular hyperspectral clustering methods on two well-known HSIs. From the experimental results, we find that, in general, spectral–spatial methods outperform spectral-based methods, which indicates the importance of spatial information. Centroid-, density-, and probability-based methods, e.g., FCM, FCM-S1, CFSFDP, and GMM, do not perform well because their assumptions
cannot be fully satisfied by HSIs. FCM and FCM-S1 have a low complexity of $O(MNDct)$, where $t$ denotes the number of iterations, and are relatively efficient; thus, they are suitable for large hyperspectral data sets. CFSFDP has a higher complexity of approximately $O((MN)^2)$ and requires a relatively large memory to store a sizeable pairwise pixel distance matrix, limiting its suitability for large HSIs. GMM has a relatively large complexity of $O((MN)^2 ct)$, which degrades its suitability for large HSIs to some degree. By comparison, complete graph-based methods, e.g., SC, may perform well, but they suffer from a large computational cost. Specifically, SC has a large complexity of $O((MN)^2 Dt)$ and is time consuming, which greatly reduces its practicability. Comparatively speaking, abbreviated graph-based methods, e.g., FSCAG and SGCNR, are very efficient and suitable for large HSIs because of their lower complexity. The complexities of FSCAG and SGCNR are $O(MNDu)$ and $O(MND \log u + MNc^2 + MNcv + c^3)$, respectively, where $u$ is the number of anchors and $v$ is the number of nearest neighbors, with $u, v \ll MN$ [43], [121]. However, their clustering accuracy cannot be guaranteed. Relative to the above methods, subspace clustering approaches, e.g., SSC and L2-SSC, may deal better with the clustering task for HSIs and bring about a competitive clustering performance. However, such methods generally have a very large computational complexity of about $O((MN)^3 t)$ and are very time and memory consuming, which degrades their attractiveness in practical use and greatly hinders their application to large hyperspectral data sets.

In general, clustering is an important and necessary technique for HSI interpretation, but it has much room for improvement. Accuracy, efficiency, and intelligence may be the major lines of development for hyperspectral clustering in the future.
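To make the scaling argument above concrete, the short sketch below estimates the memory needed for a dense pairwise matrix over all pixels, using the image sizes quoted in the experiments. It is an illustration only: the function name is hypothetical, every pixel (labeled or not) is assumed to take part, and each matrix entry is assumed to be stored as an 8-byte float.

```python
def pairwise_matrix_bytes(rows, cols, bytes_per_entry=8):
    """Memory for a dense (MN x MN) pairwise matrix over an M x N image.

    Assumes one entry per pixel pair and 8-byte floats by default.
    """
    n_pixels = rows * cols
    return n_pixels * n_pixels * bytes_per_entry


# Image sizes taken from the experimental description in the text.
scenes = {
    "Indian Pines (145 x 145)": (145, 145),
    "Houston subset (160 x 150)": (160, 150),
    "Houston full (349 x 1905)": (349, 1905),
}

for name, (m, n) in scenes.items():
    gib = pairwise_matrix_bytes(m, n) / 2**30
    print(f"{name}: {m * n} pixels, dense pairwise matrix of about {gib:.1f} GiB")
```

Under these assumptions, even the two small test scenes need a few gibibytes for a single dense matrix, and the full University of Houston image needs several tebibytes, which illustrates why methods with quadratic or cubic growth in the number of pixels are hard to apply to large HSIs.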
Based on the current state of research on hyperspectral clustering, the main challenges and possible future research lines are outlined as follows.

DEVELOPING EFFECTIVE AND EFFICIENT MODELS
Accuracy and efficiency are both very important in practical applications. However, most current hyperspectral clustering algorithms cannot address these two aspects well at the same time. For example, subspace clustering may bring about a higher clustering accuracy, but significant computational complexity generally follows, which degrades the technique’s scalability to large scenes and limits its practical applications. Centroid-, density-, and probability-based methods are generally efficient but have limited clustering accuracy for HSIs. Hence, how to develop more effective and efficient hyperspectral clustering models with a high accuracy and a low time cost is an interesting and attractive topic. Generally speaking, combining the advantages of different clustering models, as in hybrid mechanism-based clustering, may be an effective way to overcome these obstacles. In addition, combining advanced clustering models with high-performance computing techniques, such as parallel computing, may greatly enhance the efficiency while guaranteeing a high clustering accuracy.

DEVELOPING MULTIFEATURE-BASED METHODS
Hyperspectral remote sensing images generally suffer from the serious problem that pixels from the same class have different spectra, while pixels from different classes have similar spectra, which greatly degrades the separability among different classes. Multiple features from different views/domains, e.g., spectrum, texture, and geometry, describe ground objects from different perspectives and can provide complementary information that enhances the discriminative capability of a clustering model and thus improves the clustering accuracy.
However, most existing clustering methods either integrate the spatial information by means of regularization, which explores the discriminability of the spectral–spatial information only in a simple way, or simply fuse multiple features through concatenation, which does not fully exploit the potential of the multidomain information in HSIs. Therefore, more advanced multifeature-based clustering methods should be developed to further improve the clustering accuracy.

DEVELOPING OBJECT- OR SUPERPIXEL-BASED METHODS
Hyperspectral remote sensing images contain abundant and complex spatial neighborhood information; however, they are seriously influenced by noise during the imaging process. Most existing clustering methods are pixel-based methods, which have several inherent shortcomings. First, pixel-based methods are easily affected by salt-and-pepper noise, resulting in fragmented cluster maps. Second, pixel-based methods cannot flexibly model spatial neighborhoods with various shapes, which leads to an inadequate exploitation of the spatial information of HSIs. Last, due to the large number of pixels, pixel-based methods may suffer from a large computational cost, especially graph-based clustering and subspace clustering methods. Object-/superpixel-based clustering techniques can effectively overcome these obstacles. Thus, more object-/superpixel-based clustering methods should be developed for HSIs to further improve the clustering performance.

PUSHING DEEP LEARNING INTO THE HYPERSPECTRAL CLUSTERING FIELD
Hyperspectral remote sensing images generally have a typical nonlinear structure due to the complex imaging environment and the influences of many nonlinear factors. Thus, pixels from different classes are commonly not linearly separable. On the other hand, low-level spectral or spatial features have a limited discriminability and cannot well distinguish various classes with high similarity.
However, most existing clustering methods are based on linear models to approximate the nonlinearity of HSIs, which leads to a large systematic error, or simply deal
with the nonlinearity of HSIs through the kernel technique. It should be noted that the kernel approach is, in essence, a template-based model, which results in a large amount of computation and can only alleviate the nonlinear separability problem to a certain extent, which restricts the technique’s practical applications. Many deep learning-based methods have been developed in the computer vision field and have shown powerful capabilities for nonlinear fitting and feature extraction, but successful deployments in hyperspectral clustering are very rare. Generally speaking, current hyperspectral clustering methods remain at the stage of shallow learning and only utilize the low-level features of HSIs, which yields a limited clustering accuracy. Due to the huge differences between hyperspectral data and natural images, directly applying deep models from the computer vision field to HSIs generally fails to obtain satisfactory results. Therefore, how to adjust/modify deep models to better learn the intrinsic structure of HSIs and extract more informative and discriminative features to further improve the clustering performance would be a very promising research line.

AUTOMATICALLY ESTIMATING THE NUMBER OF CLUSTERS
Automatically and accurately estimating the number of clusters is very important for hyperspectral clustering, as it makes clustering more intelligent and attractive in practical applications. However, most current studies focus on improving the clustering models, and studies on the automatic estimation of the number of clusters are relatively few. Although some methods can automatically estimate the number of clusters for HSIs, they are generally bound to specific clustering models, e.g., FCM, and have limited generality. Hence, finding a technique to automatically and accurately estimate the number of clusters in a more generic way will be an interesting and important research direction in the future.
ACKNOWLEDGMENTS
This work was funded, in part, by the Special Foundation for National Science and Technology Basic Research Program of China, under grant 2019FY202503; the National Key Research and Development Program of China, under grant 2018YFB0504500; the National Natural Science Foundation of China, under grants 42001313, 61871298, and 42071322; and the Fundamental Research Funds for Central Universities, under grant G1323520273. Readers who have questions about the article are encouraged to directly contact the corresponding author, Hongyan Zhang (zhanghongyan@whu.edu.cn).

AUTHOR INFORMATION
Han Zhai (zhaihan@cug.edu.cn) is with the School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China. He is a Member of IEEE.
Hongyan Zhang (zhanghongyan@whu.edu.cn) is with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, 430079, China. He is a Senior Member of IEEE.
Pingxiang Li (pxLi@whu.edu.cn) is with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, 430079, China. He is a Member of IEEE.
Liangpei Zhang (zlp62@whu.edu.cn) is with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, 430079, China. He is a Fellow of IEEE.

REFERENCES
[1] A. Plaza et al., “Recent advances in techniques for hyperspectral image processing,” Remote Sens. Environ., vol. 113, pp. S110–S122, Sept. 2009. doi: 10.1016/j.rse.2007.07.028.
[2] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson, “Advances in hyperspectral image classification: Earth monitoring with statistical learning methods,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 45–54, Jan. 2014. doi: 10.1109/MSP.2013.2279179.
[3] M. Imani and H.
Ghassemian, “An overview on spectral and spatial information fusion for hyperspectral image classification: Current trends and challenges,” Inf. Fusion, vol. 59, pp. 59–83, July 2020. doi: 10.1016/j.inffus.2020.01.007.
[4] P. Duan, X. Kang, S. Li, P. Ghamisi, and J. A. Benediktsson, “Fusion of multiple edge-preserving operations for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,336–10,349, 2019. doi: 10.1109/TGRS.2019.2933588.
[5] H. Zhang, Y. Song, C. Han, and L. Zhang, “Remote sensing image spatiotemporal fusion using a generative adversarial network,” IEEE Trans. Geosci. Remote Sens., early access, 2020. doi: 10.1109/TGRS.2020.3010530.
[6] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, 2005. doi: 10.1109/TGRS.2005.846154.
[7] H. Zhang, L. Liu, W. He, and L. Zhang, “Hyperspectral image denoising with total variation regularization and nonlocal low-rank tensor decomposition,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3071–3084, 2019. doi: 10.1109/TGRS.2019.2947333.
[8] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Cloud/shadow detection based on spectral indices for multi/hyperspectral optical remote sensing imagery,” ISPRS J. Photogram. Remote Sens., vol. 144, pp. 235–253, Oct. 2018. doi: 10.1016/j.isprsjprs.2018.07.006.
[9] W. He, H. Zhang, and L. Zhang, “Total variation regularized reweighted sparse nonegative matrix factorization for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3909–3921, 2017. doi: 10.1109/TGRS.2017.2683719.
[10] F. A. Kruse, J. W. Boardman, and J. F. Huntington, “Comparison of airborne hyperspectral data and EO-1 Hyperion for mineral mapping,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 6, pp. 1388–1400, 2003. doi: 10.1109/TGRS.2003.812908.
[11] L. Tusa et al., “Mineral mapping and vein detection in hyperspectral drill-core scans: Application to porphyry-type mineralization,” Minerals, vol. 9, no. 2, p. 122, 2019. doi: 10.3390/min9020122.
[12] U. Bradter, J. O’Connell, W. E. Kunin, C. W. Boffey, R. J. Ellis, and T. G. Benton, “Classifying grass-dominated habitats from remotely sensed data: The influence of spectral resolution, acquisition time and the vegetation classification system on accuracy and thematic resolution,” Sci. Total Environ., vol. 711, p. 134,584, Apr. 2020. doi: 10.1016/j.scitotenv.2019.134584.
[13] H. Zhang, J. Kang, X. Xu, and L. Zhang, “Accessing the temporal and spectral features in crop type mapping using multi-temporal Sentinel-2 imagery: A case study of Yi’an County, Heilongjiang province, China,” Comput. Electron. Agricul., vol. 176, p. 105,618, Sept. 2020. doi: 10.1016/j.compag.2020.105618.
[14] R. Darvishzadeh, C. Atzberger, A. Skidmore, and M. Schlerf, “Mapping grassland leaf area index with airborne hyperspectral imagery: A comparison study of statistical approaches and inversion of radiative transfer models,” ISPRS J. Photogram. Remote Sens., vol. 66, no. 6, pp. 894–906, 2011. doi: 10.1016/j.isprsjprs.2011.09.013.
[15] B. Kong, H. Yu, R. Du, and Q. Wang, “Quantitative estimation of biomass of alpine grasslands using hyperspectral remote sensing,” Rangeland Ecol. Manage, vol. 72, no. 2, pp. 336–346, 2019. doi: 10.1016/j.rama.2018.10.005.
[16] K. C. Tiwari, M. K. Arora, and D. Singh, “An assessment of independent component analysis for detection of military targets from hyperspectral images,” Int. J. Appl. Earth Observ. Geoinf., vol. 13, no. 5, pp. 730–740, 2011. doi: 10.1016/j.jag.2011.03.007.
[17] M. Shimoni, R. Haelterman, and C. Perneel, “Hypersectral imaging for military and security applications: Combining myriad processing and sensing techniques,” IEEE Geosci. Remote Sens. Mag., vol. 7, no. 2, pp. 101–117, 2019. doi: 10.1109/MGRS.2019.2902525.
[18] A. Plaza, P.
Martínez, J. Plaza, and R. Pérez, “Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466–479, 2005. doi: 10.1109/TGRS.2004.841417.
[19] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, “Locality-preserving dimensionality reduction and classification for hyperspectral image analysis,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4, pp. 1185–1198, 2012. doi: 10.1109/TGRS.2011.2165957.
[20] W. Li, F. Feng, H. Li, and Q. Du, “Discriminant analysis-based dimension reduction for hyperspectral image classification: A survey of the most recent advances and an experimental comparison of different techniques,” IEEE Geosci. Remote Sens. Mag., vol. 6, no. 1, pp. 15–34, 2018. doi: 10.1109/MGRS.2018.2793873.
[21] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Total variation regularized collaborative representation clustering with a locally adaptive dictionary for hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 166–180, 2019. doi: 10.1109/TGRS.2018.2852708.
[22] L. Zhang, L. Zhang, B. Du, J. You, and D. Tao, “Hyperspectral image unsupervised classification by robust manifold matrix factorization,” Inf. Sci., vol. 485, pp. 154–169, June 2019. doi: 10.1016/j.ins.2019.02.008.
[23] Y. Kong, Y. Cheng, C. P. Chen, and X. Wang, “Hyperspectral image clustering based on unsupervised broad learning,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 11, pp. 1741–1745, 2019. doi: 10.1109/LGRS.2019.2907598.
[24] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Nonlocal means regularized sketched reweighted sparse and low-rank subspace clustering for large hyperspectral images,” IEEE Trans. Geosci. Remote Sens., early access, 2020. doi: 10.1109/TGRS.2020.3023418.
[25] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Sparsity-based clustering for large hyperspectral remote sensing images,” IEEE Trans. Geosci. Remote Sens., early access, 2020. doi: 10.1109/TGRS.2020.3032427.
[26] H. Kashima, J. Hu, B. Ray, and M. Singh, “K-means clustering of proportional data using L1 distance,” in Proc. IEEE Int. Conf. Pattern Recognit., Dec. 2008, pp. 1–4. doi: 10.1109/ICPR.2008.4760982.
[27] J. Mao and A. K. Jain, “A self-organizing network for hyperellipsoidal clustering (HEC),” IEEE Trans. Neural Netw., vol. 7, no. 1, pp. 16–29, 1996. doi: 10.1109/72.478389.
[28] Y. Ma, S. Lao, E. Takikawa, and M. Kawade, “Discriminant analysis in correlation similarity measure space,” in Proc. Int. Conf. Mach. Learn., June 2007, pp. 577–584. doi: 10.1145/1273496.1273569.
[29] J. Chen, X. Jia, W. Yang, and B. Matsushita, “Generalization of subpixel analysis for hyperspectral data with flexibility in spectral similarity measures,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2165–2171, 2009. doi: 10.1109/TGRS.2008.2011432.
[30] C. Rohkohl and K. Engel, “Efficient image segmentation using pairwise pixel similarities,” in Proc. Joint Pattern Recognit. Symp. (JPRS), Berlin: Springer-Verlag, Sept. 2007, pp. 254–263. doi: 10.1007/978-3-540-74936-3_26.
[31] U. Maulik and I. Saha, “Automatic fuzzy clustering using modified differential evolution for image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 9, pp. 3503–3510, 2010. doi: 10.1109/TGRS.2010.2047020.
[32] Y. Zhong, S. Zhang, and L. Zhang, “Automatic fuzzy clustering based on adaptive multi-objective differential evolution for remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 5, pp. 2290–2301, 2013. doi: 10.1109/JSTARS.2013.2240655.
[33] S. Ghaffarian and S. Ghaffarian, “Automatic histogram-based fuzzy C-means clustering for remote sensing imagery,” ISPRS J. Photogram. Remote Sens., vol. 97, pp. 46–57, Nov. 2014. doi: 10.1016/j.isprsjprs.2014.08.006.
[34] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[35] T. N. Tran, R. Wehrens, and L. M. Buydens, “KNN-kernel density-based clustering for high-dimensional multivariate data,” Comput. Statist. Data Anal., vol. 51, no. 2, pp. 513–525, 2006. doi: 10.1016/j.csda.2005.10.001.
[36] C. Cariou and K. Chehdi, “Nearest neighbor-density-based clustering methods for large hyperspectral images,” in Proc. Image Signal Process. Remote Sens. XXIII. Int. Soc. Opt. Photon., Oct. 2017, vol. 10427, p. 1,042,70I. doi: 10.1117/12.2278221.
[37] C. Cariou and K. Chehdi, “Unsupervised nearest neighbors clustering with application to hyperspectral images,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 6, pp. 1105–1116, 2015. doi: 10.1109/JSTSP.2015.2413371. [38] A. Paoli, F. Melgani, and E. Pasolli, “Clustering of hyperspectral images based on multiobjective particle swarm optimization,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 12, pp. 4175–4188, 2009. doi: 10.1109/TGRS.2009.2023666. [39] H. Jiao, Y. Zhong, and L. Zhang, “An unsupervised spectral matching classifier based on artificial DNA computing for hyperspectral remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4524–4538, 2013. doi: 10.1109/ TGRS.2013.2282356. [40] H. Zhang, H. Zhai, L. Zhang, and P. Li, “Spectral-spatial sparse subspace clustering for hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3672–3684, 2016. doi: 10.1109/TGRS.2016.2524557. [41] H. Zhai, H. Zhang, L. Zhang, P. Li, and A. Plaza, “A new sparse subspace clustering algorithm for hyperspectral remote sensing imagery,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 1, pp. 43– 47, 2017. doi: 10.1109/LGRS.2016.2625200. [42] Y. Zhong, L. Zhang, B. Huang, and P. Li, “An unsupervised artificial immune classifier for multi/hyperspectral remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 2, pp. 420–431, 2006. doi: 10.1109/TGRS.2005.861548. [43] R. Wang, F. Nie, and W. Yu, “Fast spectral clustering with anchor graph for large hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 11, pp. 2003–2007, 2017. doi: 10.1109/ LGRS.2017.2746625. [44] Q. Yan, Y. Ding, J. J. Zhang, Y. Xia, and C. H. Zheng, “A discriminated similarity matrix construction based on sparse subspace clustering algorithm for hyperspectral imagery,” Cognit. Syst. Res., vol. 53, pp. 98–110, Jan. 2019. doi: 10.1016/j.cogsys.2018.01.003. [45] C. W. Ahn, M. F. Baumgardner, and L. L. 
Biehl, “Delineation of soil variability using geostatistics and fuzzy clustering analyses of hyperspectral data,” Soil Sci. Soc. Amer. J., vol. 63, no. 1, pp. 142– 150, 1999. doi: 10.2136/sssaj1999.03615995006300010021x. [46] D. Lavenier, “FPGA implementation of the k-means clustering algorithm for hyperspectral images,” Los Alamos National Lab, LAUR, Los Alamos, NM, 2000. [Online]. Available: https:// www.researchgate.net/publication/2582177_FPGA_imple mentation_of_the_k-means_clustering_algorithm_for_hyper spectral_images [47] S. Lloyd, “Least squares quantization in PCM,” IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 129–137, 1982. doi: 10.1109/ TIT.1982.1056489. [48] K. Alsabti, S. Ranka, and V. Singh, “An efficient k-means clustering algorithm,” Elect. Eng. Comput. Sci., vol. 43, 1997. [49] S. A. El Rahman, “Hyperspectral imaging classification using ISODATA algorithm: Big data challenge,” in Proc. IEEE Int. Conf. E-Learn. (ECOF), Oct. 2015, pp. 247–250. doi: 10.1109/ ECONF.2015.39. [50] J. M. Haut, M. Paoletti, J. Plaza, and A. Plaza, “Cloud implementation of the K-means algorithm for hyperspectral image analysis,” J. Supercomput., vol. 73, no. 1, pp. 514–529, 2017. doi: 10.1007/s11227-016-1896-3. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [51] B. Zhao, L. Gao, W. Liao, and B. Zhang, “A new kernel method for hyperspectral image feature extraction,” Geo-spat. Inf. Sci., vol. 20, no. 4, pp. 309–318, 2017. doi: 10.1080/10095020. 2017.1403088. [52] B. Zhang, S. Li, C. Wu, L. Gao, W. Zhang, and M. Peng, “A neighbourhood-constrained k-means approach to classify very high spatial resolution hyperspectral imagery,” Remote Sens. Lett., vol. 4, no. 2, pp. 161–170, 2013. doi: 10.1080/2150704X. 2012.713139. [53] W. Yang, K. Hou, B. Liu, F. Yu, and L. Lin, “Two-stage clustering technique based on the neighboring union histogram for hyperspectral remote sensing images,” IEEE Access, vol. 5, pp. 5640–5647, Apr. 2017. doi: 10.1109/ACCESS.2017.2695616. [54] Z. 
Ren, L. Sun, Q. Zhai, and X. Liu, “Mineral mapping with hyperspectral image based on an improved k-means clustering algorithm,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2019, pp. 2989–2992. [55] B. C. Kuo and D. A. Landgrebe, “Nonparametric weighted feature extraction for classification,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 5, pp. 1096–1105, May 2004. doi: 10.1109/ TGRS.2004.825578. [56] C. C. Hung, S. Kulkarni, and B. C. Kuo, “A new weighted fuzzy c-means clustering algorithm for remotely sensed image classification,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 543– 553, 2010. doi: 10.1109/JSTSP.2010.2096797. [57] Q. Wang and W. Shi, “Unsupervised classification based on fuzzy c-means with uncertainty analysis,” Remote Sens. Lett., vol. 4, no. 11, pp. 1087–1096, 2013. doi: 10.1080/2150704X.2013.832842. [58] X. Liu, B. He, and X. Li, “Semi-supervised classification for hyperspectral remote sensing image based on PCA and kernel FCM algorithm,” in Proc. GeoInformatics Joint Conf. GIS Built Environ., Classif. Remote Sens. Images. Int. Soc. Opt. Photon., Nov. 2008, vol. 7147, p. 714,71I. doi: 10.1117/12.813255. [59] S. Niazmardi, S. Homayouni, and A. Safari, “An improved FCM algorithm based on the SVDD for unsupervised hyperspectral data classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 831–839, 2013. doi: 10.1109/ JSTARS.2013.2244851. [60] D. L. Pham, “Spatial models for fuzzy clustering,” Comput. Vis. Image Understand, vol. 84, no. 2, pp. 285–297, 2001. doi: 10.1006/cviu.2001.0951. [61] W. Pedrycz, “Conditional fuzzy c-means,” Pattern Recog. Lett., vol. 17, no. 6, pp. 625–631, 1996. doi: 10.1016/01678655(96)00027-X. [62] S. Li, B. Zhang, A. Li, X. Jia, L. Gao, and M. Peng, “Hyperspectral imagery clustering with neighborhood constraints,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 588–592, 2012. doi: 10.1109/LGRS.2012.2215005. [63] X. Y. Wang and J. 
Bu, “A fast and robust image segmentation using FCM with spatial information,” Dig. Signal Process., vol. 20, no. 4, pp. 1173–1182, 2010. doi: 10.1016/j.dsp.2009.11.007. [64] S. Chen and D. Zhang, “Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure,” IEEE Trans. Syst., Man, Cybern. B Cybern., vol. 34, no. 4, pp. 1907–1916, 2004. doi: 10.1109/TSMCB.2004. 831165. 63
[65] Y. Zhong, A. Ma, and L. Zhang, “An adaptive memetic fuzzy clustering algorithm with spatial information for remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 4, pp. 1235–1248, 2014. doi: 10.1109/JSTARS.2014.2303634. [66] G. Bilgin, S. Erturk, and T. Yildirim, “Unsupervised classification of hyperspectral-image data using fuzzy approaches that spatially exploit membership relations,” IEEE Geosci. Remote Sens. Lett., vol. 5, no. 4, pp. 673–677, 2008. doi: 10.1109/ LGRS.2008.2002319. [67] S. Krinidis and V. Chatzis, “A robust fuzzy local information Cmeans clustering algorithm,” IEEE Trans. Image Process., vol. 19, no. 5, pp. 1328–1337, 2010. doi: 10.1109/TIP.2010.2040763. [68] H. Zhang, Q. Wang, W. Shi, and M. Hao, “A novel adaptive fuzzy local information c-means clustering algorithm for remotely sensed imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 5057–5068, 2017. doi: 10.1109/ TGRS.2017.2702061. [69] H. Zhang, L. Bruzzone, W. Shi, M. Hao, and Y. Wang, “­ Enhanced spatially constrained remotely sensed imagery classification ­using a fuzzy local double neighborhood information c-means clustering algorithm,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 8, pp. 2896–2910, 2018. doi: 10.1109/ JSTARS.2018.2846603. [70] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognit. Lett., vol. 31, no. 8, pp. 651–666, 2010. doi: 10.1016/j. patrec.2009.09.011. [71] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014. doi: 10.1126/science.1242072. [72] Y. Chen, S. Ma, X. Chen, and P. Ghamisi, “Hyperspectral data clustering based on density analysis ensemble,” Remote Sens. Lett., vol. 8, no. 2, pp. 194–203, 2017. doi: 10.1080/2150704X.2016.1249295. [73] H. Bäcklund, A. Hedblom, and N. Neijman, “A density-based spatial clustering of application with noise,” Data Min. TNM, vol. 33, pp. 
11–30, Nov. 2011. [74] S. Jia, G. Tang, J. Zhu, and Q. Li, “A novel ranking-based clustering approach for hyperspectral band selection,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 1, pp. 88–102, 2015. doi: 10.1109/TGRS.2015.2450759. [75] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, 2002. doi: 10.1109/34.1000236. [76] X. Huang and L. Zhang, “An adaptive mean-shift analysis approach for object extraction and classification from urban hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 12, pp. 4173–4185, 2008. doi: 10.1109/TGRS.2008.2002577. [77] J. M. Murphy and M. Maggioni, “Unsupervised clustering and active learning of hyperspectral images with nonlinear diffusion,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 3, pp. 1829– 1845, 2018. doi: 10.1109/TGRS.2018.2869723. [78] J. M. Murphy and M. Maggioni, “Spectral-spatial diffusion geometry for hyperspectral image clustering,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 7, pp. 1243–1247, 2020. doi: 10.1109/ LGRS.2019.2943001. [79] N. Acito, G. Corsini, and M. Diani, “An unsupervised algorithm for hyperspectral image segmentation based on the 64 [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] Gaussian mixture model,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2003, vol. 6, pp. 3745–3747. C. A. Shah, M. K. Arora, and P. K. Varshney, “Unsupervised classification of hyperspectral data: An ICA mixture model based approach,” Int. J. Remote Sens., vol. 25, no. 2, pp. 481– 487, 2004. doi: 10.1080/01431160310001618040. C. A. Shah, P. K. Varshney, and M. K. Arora, “ICA mixture model algorithm for unsupervised classification of remote sensing imagery,” Int. J. Remote Sens., vol. 28, no. 8, pp. 1711–1731, 2007. doi: 10.1080/01431160500462121. C. F. Li, L. Liu, Y. M. Lei, J. Y. Yin, J. J. Zhao, and X. K. 
Sun, “Clustering for HSI hyperspectral image with weighted PCA and ICA,” J. Intell. Fuzzy Syst., vol. 32, no. 5, pp. 3729–3737, 2017. doi: 10.3233/JIFS-169305. G. Celeux, “The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem,” Comput. Statist. Quart., vol. 2, pp. 73–82, Jan. 1985. J. B. Courbot, V. Mazet, E. Monfrini, and C. Collet, “Pairwise Markov fields for segmentation in astronomical hyperspectral images,” Signal Process., vol. 163, pp. 41–48, Oct. 2019. doi: 10.1016/j.sigpro.2019.05.005. S. D. Xenaki, K. D. Koutroumbas, A. A. Rontogiannis, and O. A. Sykioti, “A layered sparse adaptive possibilistic approach for hyperspectral image clustering,” in Proc. IEEE Geosci. Remote Sens. Symp. (IGARSS), July 2014, pp. 2890–2893. doi: 10.1109/ IGARSS.2014.6947080. C. Teodor, B. Alzenk, R. Constantinescu, and M. Datcu, “Unsupervised classification of EO-1 Hyperion hyperspectral data using Latent Dirichlet allocation,” in Proc. IEEE Int. Symp. Signals Circuits Syst. (ISSCS), July 2013, pp. 1–4. doi: 10.1109/ ISSCS.2013.6651211. Y. Fang, L. Xu, J. Peng, H. Yang, A. Wong, and D. A. Clausi, “Unsupervised Bayesian classification of a hyperspectral image based on the spectral mixture model and Markov random field,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 9, pp. 3325–3337, 2018. doi: 10.1109/JSTARS.2018.2858008. A. Baraldi and F. Parmiggiani, “A neural network for unsupervised categorization of multivalued input patterns: An application to satellite image clustering,” IEEE Trans. Geosci. Remote Sens., vol. 33, no. 2, pp. 305–316, Mar. 1995. doi: 10.1109/36.377930. Y. Zhong, L. Zhang, and W. Gong, “Unsupervised remote sensing image classification using an artificial immune network,” Int. J. Remote Sens., vol. 32, no. 19, pp. 5461–5483, 2011. doi: 10.1080/01431161.2010.502155. J. Xu, H. Li, P. Liu, and L. 
Xiao, “A novel hyperspectral image clustering method with context-aware unsupervised discriminative extreme learning machine,” IEEE Access, vol. 6, pp. 16,176– 16,188, Mar. 2018. doi: 10.1109/ACCESS.2018.2813988. H. H. Muhammed, “Unsupervised hyperspectral image segmentation using a new class of neuro-fuzzy systems based on weighted incremental neural networks,” in Proc. IEEE Applied Imagery Pattern Recognit. Workshop (AIPR), Oct. 2002, pp. 171– 177. doi: 10.1109/AIPR.2002.1182272. S. Das, A. Abraham, and A. Konar, “Automatic clustering using an improved differential evolution algorithm,” IEEE Trans. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
Syst., Man, Cybern. A, Syst. Humans, vol. 38, no. 1, pp. 218–237, Jan. 2008. doi: 10.1109/TSMCA.2007.909595. [93] Ç. Ari and S. Aksoy, “Unsupervised classification of remotely sensed images using Gaussian mixture models and particle swarm optimization,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2010, pp. 1859–1862. doi: 10.1109/ IGARSS.2010.5653855. [94] A. Ma, Y. Zhong, and L. Zhang, “Adaptive multiobjective memetic fuzzy clustering algorithm for remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4202–4217, 2015. doi: 10.1109/TGRS.2015.2393357. [95] A. Zhang et al., “Clustering of remote sensing imagery using a social recognition-based multi-objective gravitational search algorithm,” Cognit. Comput., vol. 11, no. 6, pp. 789–798, 2019. doi: 10.1007/s12559-018-9582-9. [96] Y. Wan, Y. Zhong, A. Ma, and L. Zhang, “Multi-objective sparse subspace clustering for hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 4, pp. 2290–2307, 2019. doi: 10.1109/TGRS.2019.2947253. [97] R. S. Zemel and M. Á. Carreira-Perpiñán, “Proximity graphs for clustering and manifold learning,” in Proc. Adv. Neural Inf. Process. Syst. (ANIPS), 2005, pp. 225–232. [98] X. Zhu, C. Change Loy, and S. Gong, “Constructing robust affinity graphs for spectral clustering,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), 2014, pp. 1450–1457. [99] S. Liu, S. De Mello, J. Gu, G. Zhong, M. H. Yang, and J. Kautz, “Learning affinity via spatial propagation networks,” in Proc. Adv. Neural Inf. Process. Syst. (ANIPS), 2017, pp. 1520–1530. [100] D. R. Karger and C. Stein, “A new approach to the minimum cut problem,” J. ACM, vol. 43, no. 4, pp. 601–640, 1996. doi: 10.1145/234533.234534. [101] S. Wang and J. M. Siskind, “Image segmentation with ratio cut,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 6, pp. 675– 690, 2003. doi: 10.1109/TPAMI.2003.1201819. [102] J. Shi and J. 
Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888– 905, Aug. 2000. doi: 10.1109/34.868688. [103] P. Soundararajan and S. Sarkar, “Analysis of mincut, average cut, and normalized cut measures,” in Proc. Workshop Percept. Organiz. Comput. Vision (POCV), July 2001, pp. 1–4. [104] C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon, “A min-max cut algorithm for graph partitioning and data clustering,” in Proc. IEEE Int. Conf. Data Min. (ICDM), Nov. 2001, pp. 107–114. [105] U. Von Luxburg, “A tutorial on spectral clustering,” Statist. Comput., vol. 17, no. 4, pp. 395–416, 2007. doi: 10.1007/s11222007-9033-z. [106] N. D. Cahill, W. Czaja, and D. W. Messinger, “Schroedinger eigenmaps with nondiagonal potentials for spatial-spectral clustering of hyperspectral imagery,” in Proc. Alg. Tech. Multispe. Hyperspe. Ultraspe. Image. XX. Int. Soc. Opt. Photon. (ISOP), June 2014, vol. 9088, p. 908,804. [107] W. Zhu et al., “Unsupervised classification in hyperspectral imagery with nonlocal total variation and primal-dual hybrid gradient algorithm,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2786–2798, 2017. doi: 10.1109/TGRS. 2017.2654486. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [108] L. Fan and D. W. Messinger, “Joint spatial–spectral hyperspectral image clustering using block-diagonal amplified affinity matrix,” Opt. Eng., vol. 57, no. 3, p. 033107, 2018. doi: 10.1117/1. OE.57.3.033107. [109] B. Hufnagl and H. Lohninger, “A graph-based clustering method with special focus on hyperspectral imaging,” Anal. Chimica Acta, vol. 1097, pp. 37–48, Feb. 2020. doi: 10.1016/j. aca.2019.10.071. [110] Z. Meng, E. Merkurjev, A. Koniges, and A. L. Bertozzi, “Hyperspectral image classification using graph clustering methods,” Image Process. Line, vol. 7, pp. 218–245, Aug. 2017. doi: 10.5201/ ipol.2017.204. [111] A. Hassanzadeh, T. Kauranne, and A. 
Kaarna, “A multi-manifold clustering algorithm for hyperspectral remote sensing imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2016, pp. 3326–3329. [112] A. Hassanzadeh, A. Kaarna, and T. Kauranne, “Unsupervised multi-manifold classification of hyperspectral remote sensing images with contractive autoencoder,” in Proc. Scandinavian Conf. Image Anal. (SCIA), Cham: Springer-Verlag, June 2017, pp. 169–180. [113] N. Gillis, D. Kuang, and H. Park, “Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 4, pp. 2066–2078, 2014. doi: 10.1109/TGRS.2014.2352857. [114] L. Tian, Q. Du, I. Kopriva, and N. Younan, “Orthogonal graphregularized non-negative matrix factorization for hyperspectral image clustering,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2019, pp. 795–798. [115] W. Liu, S. Li, X. Lin, Y. Wu, and R. Ji, “Spectral–spatial co-clustering of hyperspectral image data based on bipartite graph,” Multimedia Syst., vol. 22, no. 3, pp. 355–366, 2016. doi: 10.1007/ s00530-015-0450-0. [116] A. Hassanzadeh, A. Kaarna, and T. Kauranne, “Sequential spectral clustering of hyperspectral remote sensing image over bipartite graph,” Appl. Soft Comput., vol. 73, pp. 727–734, Dec. 2018. doi: 10.1016/j.asoc.2018.09.015. [117] N. Huang, L. Xiao, and Y. Xu, “Bipartite graph partition based coclustering with joint sparsity for hyperspectral images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 12, pp. 4698–4711, 2019. doi: 10.1109/JSTARS.2019.2953378. [118] W. Liu, J. He, and S.-F. Chang, “Large graph construction for scalable semi-supervised learning,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 679–686. [119] D. Cai and X. Chen, “Large scale spectral clustering via landmarkbased sparse representation,” IEEE Trans. Cybern., vol. 45, no. 8, pp. 1669–1680, Aug. 2015. [120] F. Nie, W. Zhu, and X. 
Li, “Unsupervised large graph embedding,” in Proc. 31st Conf. Artif. Intell. (AAAI), 2017, pp. 2422–2428. [121] R. Wang, F. Nie, Z. Wang, F. He, and X. Li, “Scalable graphbased clustering with nonnegative relaxation for large hyperspectral image,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7352–7364, 2019. doi: 10.1109/TGRS.2019.2913004. [122] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, 2013. doi: 10.1109/ TPAMI.2013.57. 65
[123] R. Vidal and P. Favaro, “Low rank subspace clustering (LRSC),” Pattern Recognit. Lett., vol. 43, pp. 47–61, July 2014. doi: 10.1016/j.patrec.2013.08.006. [124] Z. Wu, M. Yin, Y. Zhou, X. Fang, and S. Xie, “Robust spectral subspace clustering based on least square regression,” Neural Process. Lett., vol. 48, no. 3, pp. 1359–1372, 2018. doi: 10.1007/ s11063-017-9726-z. [125] V. M. Patel, H. Van Nguyen, and R. Vidal, “Latent space sparse subspace clustering,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2013, pp. 225–232. [126] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171–184, 2012. doi: 10.1109/TPAMI.2012.88. [127] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in Proc. IEEE Conf. Comput. Vision Pattern Recog. (CVPR), June 2009, pp. 2790–2797. [128] A. Li, A. Qin, Z. Shang, and Y. Y. Tang, “Spectral-spatial sparse subspace clustering based on three-dimensional edge-preserving filtering for hyperspectral image,” Int. J. Pattern Recognit. Artif. Intell., vol. 33, no. 3, p. 1,955,003, 2019. doi: 10.1142/ S0218001419550036. [129] S. Huang, H. Zhang, and A. Pižurica, “Semisupervised sparse subspace clustering method with a joint sparsity constraint for hyperspectral remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 3, pp. 989–999, 2019. doi: 10.1109/JSTARS.2019.2895508. [130] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Laplacian-regularized low-rank subspace clustering for hyperspectral image band selection,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 3, pp. 1723–1740, 2019. doi: 10.1109/TGRS.2018. 2868796. [131] J. Xu, N. Huang, and L. Xiao, “Spectral-spatial subspace clustering for hyperspectral images via modulated low-rank representation,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), pp. 3202–3205, July 2017. [132] Y. Long, X. Deng, G. Zhong, J. Fan, and F. 
Liu, “Gaussian kernel dynamic similarity matrix based sparse subspace clustering for hyperspectral images,” in Proc. Int. Conf. Comput. Intell. Security (CIS), Dec. 2019, pp. 211–215. [133] P. A. Traganitis and G. B. Giannakis, “Sketched subspace clustering,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1663–1675, 2017. doi: 10.1109/TSP.2017.2781649. [134] S. Huang, H. Zhang, Q. Du, and A. Pižurica, “Sketch-based subspace clustering of hyperspectral images,” Remote Sens., vol. 12, no. 5, p. 775, 2020. doi: 10.3390/rs12050775. [135] H. Zhai, H. Zhang, L. Zhang, and P. Li, “Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images,” J. Appl. Remote Sens., vol. 10, no. 4, p. 046014, 2016. doi: 10.1117/1.JRS.10.046014. [136] L. Wang et al., “Fast high-order sparse subspace clustering with cumulative MRF for hyperspectral images,” IEEE Geosci. Remote Sens. Lett., early access, 2020. doi: 10.1109/LGRS. 2020.2968350. [137] M. Zeng, Y. Cai, X. Liu, Z. Cai, and X. Li, “Spectral-spatial clustering of hyperspectral image based on Laplacian regularized 66 deep subspace clustering,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2019, pp. 2694–2697. [138] M. Brbić and I. Kopriva, “Multi-view low-rank sparse subspace clustering,” Pattern Recognit., vol. 73, pp. 247–258, Jan. 2018. doi: 10.1016/j.patcog.2017.08.024. [139] L. Tian, Q. Du, I. Kopriva, and N. Younan, “Spatial-spectral based multi-view low-rank sparse subspace clustering for hyperspectral imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), July 2018, pp. 8488–8491. [140] L. Tian, Q. Du, I. Kopriva, and N. Younan, “Kernel spatial-spectral based multi-view low-rank sparse subspace clustering for hyperspectral imagery,” in Proc. IEEE Workshop Hyperspec. Image Signal Process. Evol. Remote Sens. (WHISPERS), Sept. 2018, pp. 1–4. [141] L. Tian and Q. 
Du, “Parallel multi-view low-rank and sparse subspace clustering for unsupervised hyperspectral image classification,” in Proc. IEEE Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf. (APSIPA ASC), Nov. 2018, pp. 618–621. [142] H. Zhai, H. Zhang, X. Xu, L. Zhang, and P. Li, “Kernel sparse subspace clustering with a spatial max pooling operation for hyperspectral remote sensing data interpretation,” Remote Sens., vol. 9, no. 4, p. 335, 2017. doi: 10.3390/rs9040335. [143] F. De Morsier, M. Borgeaud, V. Gass, J. P. Thiran, and D. Tuia, “Kernel low-rank and sparse graph for unsupervised and semisupervised classification of hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3410–3420, 2016. doi: 10.1109/TGRS.2016.2517242. [144] J. Bacca, C. A. Hinojosa, and H. Arguello, “Kernel sparse subspace clustering with total variation denoising for hyperspectral remote sensing images,” in Proc. Math. Imag. Opt. Soc. Amer. (MIOSA), June 2017, pp. MTu4C–MTu45. [145] M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza, “Deep learning classifiers for hyperspectral imaging: A review,” ISPRS J. Photogram. Remote Sens., vol. 158, pp. 279–317, 2019. doi: 10.1016/j. isprsjprs.2019.09.006. [146] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long, “A survey of clustering with deep learning: From the perspective of network architecture,” IEEE Access, vol. 6, pp. 39,501–39,514, July 2018. doi: 10.1109/ACCESS.2018.2855437. [147] B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, “Towards k-means-friendly spaces: Simultaneous deep learning and clustering,” in Proc. Int. Conf. Mach. Learn. (ICML), July 2017, pp. 3861–3870. [148] D. Chen, J. Lv, and Y. Zhang, “Unsupervised multi-manifold clustering by learning deep representation,” in Proc. Workshop 31th AAAI Conf. Artif. Intell. (AAAI), Mar. 2017, pp. 385–391. [149] X. Yang, C. Deng, F. Zheng, J. Yan, and W. Liu, “Deep spectral clustering using dual autoencoder network,” in Proc. IEEE Conf. Comput. Vis. 
Pattern Recognit. (CVPR), 2019, pp. 4066–4075. [150] K. Tian, S. Zhou, and J. Guan, “Deepcluster: A general clustering framework based on deep learning,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, Cham: Springer-Verlag, Sept. 2017, pp. 809–825. [151] P. Ji, T. Zhang, H. Li, M. Salzmann, and I. Reid, “Deep subspace clustering networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2017, pp. 24–33. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[152] X. Peng, J. Feng, S. Xiao, W. Y. Yau, J. T. Zhou, and S. Yang, “Structured autoencoders for subspace clustering,” IEEE Trans. Image Process., vol. 27, no. 10, pp. 5076–5086, 2018. doi: 10.1109/TIP.2018.2848470. [153] J. Zhang et al., “Self-supervised convolutional subspace clustering network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 5473–5482. [154] M. Zeng, Y. Cai, Z. Cai, X. Liu, P. Hu, and J. Ku, “Unsupervised hyperspectral image band selection based on deep subspace clustering,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 12, pp. 1889–1893, 2019. doi: 10.1109/LGRS.2019.2912170. [155] C. C. Hsu and C. W. Lin, “CNN-based joint clustering and representation learning with feature drift compensation for large-scale image data,” IEEE Trans. Multimedia, vol. 20, no. 2, pp. 421–429, 2017. doi: 10.1109/TMM.2017.2745702. [156] G. Chen, “Deep learning with nonparametric clustering,” 2015. [Online]. Available: http://arxiv.org/abs/1501.03084 [157] J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” in Proc. Int. Conf. Mach. Learn. (ICML), June 2016, pp. 478–487. [158] F. Li, H. Qiao, and B. Zhang, “Discriminatively boosted image clustering with fully convolutional auto-encoders,” Pattern Recognit., vol. 83, pp. 161–173, Nov. 2018. doi: 10.1016/j.patcog .2018.05.019. [159] J. Yang, D. Parikh, and D. Batra, “Joint unsupervised learning of deep representations and image clusters,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 5147–5156. [160] X. Peng, J. Feng, S. Xiao, J. Lu, Z. Yi, and S. Yan, “Deep sparse subspace clustering,” 2017. [Online]. Available: http://arxiv.org/ abs/1709.08374 [161] U. Shaham, K. Stanton, H. Li, B. Nadler, R. Basri, and Y. Kluger, “Spectralnet: Spectral clustering using deep neural networks,” 2018. [Online]. Available: http://arxiv.org/abs/1801.01587 [162] J. T. 
Springenberg, “Unsupervised and semi-supervised learning with categorical generative adversarial networks,” 2015. [Online]. Available: http://arxiv.org/abs/1511.06390 [163] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2016, pp. 2172–2180. [164] W. Harchaoui, P. A. Mattei, and C. Bouveyron, “Deep adversarial Gaussian mixture auto-encoder for clustering,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–5. [165] P. Zhou, Y. Hou, and J. Feng, “Deep adversarial subspace clustering,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 1596–1604. [166] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, “Variational deep embedding: An unsupervised and generative approach to clustering,” 2016. [Online]. Available: http://arxiv.org/ abs/1611.05148 [167] N. Dilokthanakul et al., “Deep unsupervised clustering with Gaussian mixture variational autoencoders,” 2016. [Online]. Available: http://arxiv.org/abs/1611.02648 [168] V. E. Neagoe and V. Chirila-Berbentea, “Improved Gaussian mixture model with expectation-maximization for clustering of remote sensing imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2016, pp. 3063–3065. [169] H. Xie et al., “Unsupervised hyperspectral remote sensing image clustering based on adaptive density,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 4, pp. 632–636, 2018. doi: 10.1109/ LGRS.2017.2786732. [170] K. K. Singh, M. J. Nigam, K. Pal, and A. Mehrotra, “A fuzzy Kohonen local information c-means clustering for remote sensing imagery,” IETE Tech. Rev., vol. 31, no. 1, pp. 75–81, 2014. doi: 10.1080/02564602.2014.891375. [171] X. Sun, L. Yang, L. Gao, B. Zhang, S. Li, and J. Li, “Hyperspectral image clustering method based on artificial bee colony algorithm and Markov random fields,” J. Appl. Remote Sens., vol. 9, no. 1, p. 
095047, 2015. doi: 10.1117/1.JRS.9.095047. [172] S. G. Beaven, G. G. Hazel, and A. D. Stocker, “Automated Gaussian spectral clustering of hyperspectral data,” in Proc. Alg. Tech. Multispec. Hyperspec. Ultraspec. Image. VIII. Int. Soc. Opt. Photo. (ISOP), 2002, vol. 4725, pp. 254–267. [173] L. Galluccio, O. Michel, P. Comon, and A. O. Hero, III, “Graph based k-means clustering,” Signal Process., vol. 92, no. 9, pp. 1970– 1984, 2012. doi: 10.1016/j.sigpro.2011.12.009. [174] N. Huang and L. Xiao, “Hyperspectral image clustering via sparse dictionary-based anchored regression,” IET Image Process., vol. 13, no. 2, pp. 261–269, 2018. doi: 10.1049/iet-ipr .2018.5421. [175] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Res. Logist. Quart., vol. 2, no. 1–2, pp. 83–97, 1955. doi: 10.1002/nav.3800020109. [176] G. Carpaneto and P. Toth, “Algorithm 548: Solution of the assignment problem,” ACM Trans. Math. Softw., vol. 6, no. 1, pp. 104–111, 1980. doi: 10.1145/355873.355883. [177] H. Yuan and Y. Y. Tang, “A novel sparsity-based framework using max pooling operation for hyperspectral image classification,” ‘‘ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 8, pp. 3570–3576, Aug. 2014. doi: 10.1109/JSTARS.2014.2339298. [178] L. V. D. Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008. [179] L. Van Der Maaten, “Fast optimization for t-SNE,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Sept. 2010, vol. 100, pp. 1–5. [180] M. Maggioni and J. M. Murphy, “Learning by unsupervised nonlinear diffusion,” J. Mach. Learn. Res., vol. 20, no. 160, pp. 1–56, 2019. [181] W. Czaja and M. Ehler, “Schroedinger eigenmaps for the analysis of biomedical data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 5, pp. 1274–1280, 2012. doi: 10.1109/TPAMI.2012.270. [182] G. Schiebinger, M. J. Wainwright, and B. Yu, “The geometry of kernelized spectral clustering,” Ann. Statist., vol. 43, no. 2, pp. 
819–846, 2015. doi: 10.1214/14-AOS1283. [183] M. Soltanolkotabi, E. Elhamifar, and E. J. Candes, “Robust subspace clustering,” Ann. Statist., vol. 42, no. 2, pp. 669–699, 2014. doi: 10.1214/13-AOS1199. GRS DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE 67
Change Detection From Very-High-Spatial-Resolution Optical Remote Sensing Images
Methods, applications, and future directions

DAWEI WEN, XIN HUANG, FRANCESCA BOVOLO, JIAYI LI, XINLI KE, ANLU ZHANG, AND JÓN ATLI BENEDIKTSSON

IEEE Geoscience and Remote Sensing Magazine, December 2021
Digital Object Identifier 10.1109/MGRS.2021.3063465
Date of current version: 5 April 2021

Change detection is a vibrant area of research in remote sensing. Thanks to increases in the spatial resolution of remote sensing images, subtle changes at a finer geometrical scale can now be effectively detected. However, change detection from very-high-spatial-resolution (VHR) (≤5 m) remote sensing images is challenging due to limited spectral information, spectral variability, geometric distortion, and information loss. To address these challenges, many change detection algorithms have been developed. However, a comprehensive review of change detection in VHR images is lacking in the existing literature. This review aims to fill the gap and mainly includes three aspects: methods, applications, and future directions.
FIGURE 1. An outline of this review, including (a) applications, (b) methods, and (c) future directions.

BACKGROUND
Change detection is a vibrant area of research with wide-ranging applicability, including damage assessment, land management, and environment monitoring. Due to the revisit property of Earth observation sensors, multitemporal remote sensing images at a large geographical scale can be acquired easily and conveniently. Due to their extensive availability, optical images have become the main data sources for change detection [1]. Since these satellite sensors are able to acquire images with meter and submeter spatial resolutions, ground objects can be investigated in fine spatial detail [2]. Subtle change detection using these VHR images has drawn great interest in both the academic and industrial communities. However, multitemporal VHR images exhibit unique properties, such as limited spectral information, intrinsic spectral variability, spatial displacement, and information loss, that limit the usefulness of traditional change detection methods. Therefore, a great number of studies have been carried out on VHR change detection, and a series of new research topics has emerged along with advances in remote sensing technology and data computing methods. In this regard, a timely overview of VHR change detection is required to summarize the new techniques and applications.
Although a number of reviews about change detection using remote sensing data [3]–[10] exist in the literature, these publications discuss general change detection methods and do not focus on high-spatial-resolution images. Only a few available works involve VHR images, e.g., the reviews in [6] and [7]. However, those two works concern object-based change detection methods for VHR data, neglecting other aspects, e.g., recent technological advances in deep learning and multiview and 3D change detection. Moreover, specific applications of VHR change detection have rarely been summarized and discussed in the currently available literature. Therefore, a comprehensive review of change detection from VHR remote sensing images, including methods, applications, and future directions, is presented (Figure 1).

ISSUES RELATED TO VHR IMAGES AND THEIR CHANGE DETECTION
With the ongoing development of remote sensing imaging techniques, an increasing number of VHR sensors are available, and many new sensors are being planned and launched [11]. New platforms, such as unmanned aerial vehicles (UAVs) and remotely piloted aircraft systems, have grown in popularity [12] and are now providing a large amount of VHR remote sensing data. As seen in Table 1, the imaging capabilities of VHR platforms and sensors are continually being improved with higher spatial resolutions, more spectral bands, and higher temporal revisit frequencies. In addition, most VHR sensors provide an along-track and across-track pair for stereo capture [12], [13]. With the improved capability of VHR remote sensing equipment, it is now becoming possible to achieve subtle, detailed, and frequent 3D change detection.

TABLE 1. THE MAIN PARAMETERS OF SOME VHR SENSORS.
SENSOR | SPATIAL RESOLUTION (M) | NUMBER OF BANDS | REVISIT TIME (DAYS) | LAUNCH YEAR
IKONOS | 1 | Four | One to three | 1999
QuickBird | 0.61 | Four | 1.5–2.5 | 2001
SPOT-5 | 2.5 | Four | 26 | 2002
OrbView-3 | 1 | Four | Three | 2003
Cartosat-2 | 0.8 | One | Four | 2007
WorldView-1 | 0.5 | One | 1.7 | 2007
GeoEye-1 | 0.41 | Four | Fewer than three | 2008
WorldView-2 | 0.46 | Four | 1.1 | 2009
KOMPSAT-3 | 0.7 | Four | Three | 2012
Ziyuan-3 | 2.1 | Four | Four to five | 2012
SPOT-6/7 | 2 | Four | One | 2012/2014
Gaofen-1 | 2 | Four | Fewer than four | 2013
Gaofen-2 | 0.8 | Four | Four | 2014
Planet Labs | 3 | Four | One or two | 2014
Deimos-2 | 1 | Four | One or two | 2014
WorldView-3 | 0.31 | 16 | Fewer than one | 2014
DMC-3 | 1 | Four | One | 2015
WorldView-4 | 0.31 | Four | Fewer than one | 2016
SPOT: Satellite Pour l’Observation de la Terre; KOMPSAT: Korean Multipurpose Satellite; DMC: Disaster Monitoring Constellation.

Although change detection using VHR images is advantageous, from a technological point of view, it remains a challenge due to 1) limited spectral information, 2) intrinsic spectral variability, 3) spatial displacement, and 4) information loss, as discussed in the following.

1) Limited spectral information: Compared to coarse- and medium-resolution sensors, images captured by VHR sensors usually provide a smaller number of bands. Although WorldView-3, one of the most advanced VHR sensors, can provide images with 16 spectral bands, most VHR images, e.g., from IKONOS, QuickBird, WorldView-2, and Ziyuan-3, cover only four bands (blue, green, red, and near-infrared) [14]. With limited spectral information, it is difficult to separate classes that have similar spectral signatures because of the low between-class variance [15]–[18]. Researchers have also pointed out that it is difficult to achieve high-accuracy change detection with the limited spectral information [5], [15], [19]–[21] of VHR images. This may inhibit the direct use of traditional spectral-based change detection methods, e.g., change vector analysis (CVA) [22]. Therefore, other categories of features are often adopted to augment the spectral information for VHR change detection.

2) Spectral variability: There exists a high degree of spectral variability in VHR images. Buildings, for example, have complicated appearances, with various roof superstructures, such as chimneys, water tanks, and pipelines; this leads to significantly heterogeneous spectral characteristics in VHR images [23], [24].
High spectral variability within geographic objects increases the within-class variance, which inevitably introduces uncertainty into spectral-based image interpretation methods. External factors, such as atmospheric conditions, phenological stages, sun angles, soil moisture, tidal stages, and water turbidity, may make unchanged objects temporally variant in their spectral features and hence result in them being incorrectly identified as changed ones [25], [26]. In addition, temporary objects, such as cars on a road, visible in VHR images can also affect the performance of traditional spectral-based change detection methods using VHR images.

FIGURE 2. The spatial displacement in multispectral data acquired with different viewing geometries in an unchanged urban scene [21]: (a) Image (t1), with a satellite angle zenith of 153°, and (b) image (t2), with a satellite angle zenith of 129°12´. (c) The result of traditional spectral-based CVA shows a high number of false alarms (black and white indicate unchanged and changed areas, respectively) [31].

3) Spatial displacement: The VHR imaging systems on optical satellites are highly agile platforms and can operate as constellations [27] that can support rapid retargeting, short revisit times (for instance, <1 day for WorldView-3 and WorldView-4), and stereoscopic coverage for rapid disaster response and 3D change detection [28]. However, this imaging mode makes it extremely difficult to acquire multitemporal images with the same or close viewing angles for accurate change detection [29], [30]. As such, multitemporal VHR images may suffer from apparent spatial displacement due to the parallax distortion of land cover objects, especially for high-rise buildings [31]. Specifically, a building may display distinct spatial morphologies (e.g., roofs and facades) in multitemporal VHR images due to different viewing angles (Figure 2).
This may lead to a large number of commission errors if traditional spectral and pixel-based change detection methods are adopted. Precise orthorectification using VHR digital surface models (DSMs) is a feasible solution to this problem. In particular, sensors equipped with multiview imaging systems, for instance, the three-line array of Ziyuan-3 and the two cameras of Cartosat-2, are preferred: they collect multiview images nearly simultaneously, and hence in similar atmospheric conditions, which provides stereo pairs and makes the collection of multitemporal data convenient.

4) Information loss: VHR images suffer from serious information loss owing to the presence of clouds/haze, cloud shadows, and shadows cast by terrain, buildings, and trees. The problem of cloud and cloud shadow contamination can be avoided by selecting cloud-free observations [32]. However, shadows cast by terrain, buildings, and trees seem unavoidable in VHR imagery, especially in urban areas [33]. Although shadow information is useful in building detection and height estimation [34]–[36], it becomes a problem for change detection in wider areas [37]. Since the direction and length of shadows are dependent on the sun’s azimuth and elevation angle at the time of image acquisition, shadow-affected areas are different in multitemporal images. Besides, in the case of occlusions by vertical structures (e.g., high-rise buildings and trees), the problem of information loss can be more complicated. With different viewing geometries in multitemporal images, the size and direction of the tilting effect can vary, as shown in Figure 2. Overall, the regions affected by shadow and occlusions may become invisible and different in multitemporal VHR images.
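The dependence of shadows and building lean on the acquisition geometry can be illustrated with simple flat-terrain trigonometry. The following is a minimal sketch rather than a method from the reviewed literature: the function names are hypothetical, the ground is assumed to be flat and horizontal, and the sun azimuth is assumed to be measured clockwise from north.

```python
import math

def shadow_length(height_m, sun_elevation_deg):
    """Length of the shadow cast on flat ground by a vertical structure."""
    return height_m / math.tan(math.radians(sun_elevation_deg))

def shadow_offset(height_m, sun_elevation_deg, sun_azimuth_deg):
    """(east, north) offset of the shadow tip from the base of the structure.

    The shadow points away from the sun, i.e., toward azimuth + 180 degrees.
    """
    length = shadow_length(height_m, sun_elevation_deg)
    direction = math.radians(sun_azimuth_deg + 180.0)
    return length * math.sin(direction), length * math.cos(direction)

def relief_displacement(height_m, off_nadir_deg):
    """Approximate horizontal lean of a roof relative to its footprint."""
    return height_m * math.tan(math.radians(off_nadir_deg))
```

Under these assumptions, a 50-m building casts a shadow of about 87 m at a sun elevation of 30° but only 50 m at 45°, so the shadow-affected area differs between two acquisitions even when nothing on the ground has changed.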
METHODS
Change detection methods for VHR images are commonly based on two steps: 1) feature extraction and 2) change detection (see Figure 1).

FEATURE EXTRACTION
Change detection methods rely on effective multitemporal feature representation to indicate whether and what changes have occurred. It has been agreed that spectral-based methods become ineffective in dealing with the challenges facing VHR change detection. During the past decades, a large number of image features have been extracted, which can compensate for the limited spectral information contained in VHR images and improve the discriminative capability of image change information. In this review, image features designed for VHR change detection are divided into the following categories: textural, deep, object based, and angular (Figure 3). These are potentially useful for dealing with the challenges of limited spectral information and intrinsic spectral variability. A summary of the major features used for VHR change detection, including categories, subcategories, descriptions, characteristics, most-used sensors, and corresponding references, is presented in Table 2.

TEXTURAL FEATURES
Textural features depict contextual and structural information by using a moving window or kernel, where the parameters of size, direction, and distance must be appropriately determined [5], [38]. Textural features for VHR change detection can be categorized as statistical, structural, model based, and transform based. Statistical textures describe the relationships between the gray levels of local windows, e.g., the gray-level cooccurrence matrix (GLCM); local binary patterns (LBPs); and pixel shape index (PSI).
The GLCM, the most popular statistical texture, measures the contrast (e.g., dissimilarity and homogeneity), orderliness (e.g., the angular second moment and entropy), and statistical (e.g., the mean, variance, and correlation) attributes within local windows [39], [40]. The LBP, an ordered set of binary comparisons of pixel values between the central pixel and its neighboring ones, is invariant to monotonic grayscale change [41]. The PSI aims to measure the length of direction lines, which are extended based on gray-level similarity along a series of directions [42]. Some representative examples for VHR change detection using statistical textures are briefly introduced in the following. Tan et al. [43] adopted the GLCM in an automatic change detection method to consider the variation information of direction, distance, and amplitude in images. Li et al. [44] applied the local similarity of GLCM textures to detect changes and demonstrated that this kind of feature was robust against both noise and spectral similarity. Peng and Zhang [45] used the LBP for change detection from Gaofen-1 imagery, and both qualitative and quantitative analyses demonstrated the effectiveness of the proposed approach. Zhang et al. [46] identified building change types, i.e., new construction, demolition, and reconstruction, by using LBP features and obtained satisfactory change detection results with a high detection accuracy and precise structure boundaries. Liu et al. [47] proposed a line-constrained shape feature, a modified version of the PSI, for building change detection, and the results showed the approach’s advantage in individual building change detection in a lightly populated region.

FIGURE 3. Features for change detection using high-spatial-resolution remote sensing images. (a) Textural features. (b) Deep features. (c) Object-based features. (d) Angular features. ADF: angular difference feature.

TABLE 2. A SUMMARY OF THE FEATURES USED FOR VHR IMAGE CHANGE DETECTION.
CATEGORY | SUBCATEGORY | DESCRIPTION | CHARACTERISTICS | SENSOR | REFERENCES
Textural features | Statistical | Describe the relationships among the gray levels of local windows | Edge effect, difficulty of identifying parameters | QuickBird [48]–[53] | [43]–[47]
Textural features | Structural | Investigate the geometry, shapes, and edges of regions | Edge effect, difficulty of identifying parameters | QuickBird [48]–[53] | [48]–[53]
Textural features | Model based | Obtain coefficients from the model describing the relationships among the local image neighborhood | Edge effect, difficulty of identifying parameters | QuickBird [48]–[53] | [56]–[61]
Textural features | Transform based | Capture local structures in a transformed space | Edge effect, difficulty of identifying parameters | QuickBird [48]–[53] | [63], [64]
Deep features | Autoencoders | Learn efficient encoding through the optimization of a series of criteria | Complex training and parameter tuning, “black-box” nature, high computational burden, overfitting, and so on | Gaofen-2 [66], [77] and Google Earth images [66], [76] | [70]–[73]
Deep features | Convolutional neural networks | Extract mid- and high-level abstract features by interleaving convolutional and pooling layers | Complex training and parameter tuning, “black-box” nature, high computational burden, overfitting, and so on | Gaofen-2 [66], [77] and Google Earth images [66], [76] | [66], [67], [75]–[78]
Object-based features | First level | Radiometry, geometry, and texture for each image object | Determination of appropriate segmentation parameters and uncertainties of the segmentation results | QuickBird [88], [89] | [85], [88], [89]
Object-based features | Second level | Relationships between two image objects, e.g., adjacency and proximity, and relationships with neighboring objects | Determination of appropriate segmentation parameters and uncertainties of the segmentation results | QuickBird [88], [89] | [91], [92]
Object-based features | Third level | Spatial arrangements of multiple objects | Determination of appropriate segmentation parameters and uncertainties of the segmentation results | QuickBird [88], [89] | [95]
Angular features | Implicit | Orthographic images and DSMs | Availability of multiangle images | Ziyuan-3 [2], [21] | [21], [98], [99]
Angular features | Explicit | Quantify the differences contained in multiangle images, such as angular difference features | Availability of multiangle images | Ziyuan-3 [2], [21] | [2]

Structural textures, e.g., morphological profiles (MPs) and attribute profiles (APs), facilitate the investigation of the geometries, shapes, and edges of regions, with the convex and concave components being erased so that the geometric information of relevant structures is preserved and unimportant details are attenuated [48], [49]. MPs and APs have proved to be effective in VHR change detection since they can simplify results and reduce noise components (e.g., spectral variations) [48], [49]. For instance, Liu et al. [50] took the geometrical structure of change targets into account using MPs. In addition, the morphological building index (MBI) [36], which is defined as differential MPs with linear structural elements, has been extensively used in VHR change detection in urban areas since it can highlight bright and high-contrast structures, mostly consisting of buildings, in remote sensing images. For example, Huang et al. [51] proposed an automatic building change detection framework based on the MBI. Experimental results showed that the proposed method outperformed supervised classification via a support vector machine (SVM). In addition, point and line features, for instance, Harris [52] and scale-invariant feature transforms (SIFTs) [53], can improve the discriminability of man-made objects, such as buildings, roads, and cars, by describing corners and edges, therefore improving results.

Model-based textures, e.g., Markov random fields (MRFs) and fractal models, aim to represent textures through stochastic processes [54]. MRF models present spatial context through a graph-based image representation, where the nodes and edges of the graph express pixels and their relationship with connected nodes, respectively. Fractal models can depict texture roughness and complexity by capturing self-similar and self-affine patterns [55]. A number of MRF-based methods have been proposed to deal with VHR image change detection [56]–[60] because of their ability to describe local spatial relationships. Specifically, Bruzzone and Prieto [57] introduced a change detection method based on an MRF to model prior class probabilities by interpixel dependence, which increased the accuracy and reliability of the change detection results. In [60], spatial constraints between neighboring samples were formulated using an MRF in an active learning process for change detection. Multifractal features were applied to change detection in [61], and experiments on a complex landscape that included urban areas, agricultural fields, trees, and an unregulated river indicated that the features were tolerant to some degree to multitemporal differences caused by the viewing geometry and illumination angles.

Transform-based textures, e.g., Gabor, wavelets, and contourlets (CTs), aim to convert images into a new space to capture local structures corresponding to scale, localization, and orientation [62]. For example, Li et al. [63] used a Gabor-based approach to improve the change detection performance since the technique can capture contextual information at different scales and orientations. Wei et al. [64] introduced wavelet pyramid decomposition features to VHR change detection. Thus, in VHR images, the complexity of homogeneous regions can be reduced in low-scale features, and details and edge information can be retained in high-scale ones [64].

In a comparative study conducted by Li et al. [65], a number of representative textural features were selected for change detection using VHR images, and it was shown that texture-based change detection methods can obtain better performance than spectral-based pixel ones. Texture change detection results are demonstrated in Figure 4, and it can be seen that, compared to using individual textures, combining multiple textures can improve change detection accuracy.

FIGURE 4. Change detection results based on textures: (a) image (t1), (b) image (t2), (c) the reference change map, (d) the GLCM, (e) APs, (f) a 2D wavelet transform (WT), (g) a fractal, (h) a fuzzy set (APs plus a 2D WT plus a 3D WT), (i) a fuzzy set (all textures), (j) a random forest (APs plus a 2D WT plus a 3D WT), and (k) a random forest (all textures) [65].
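The two statistical textures discussed in this section, the GLCM and the LBP, can be illustrated compactly. The following NumPy sketch is illustrative only and is not the implementation used in any of the cited studies: the function names are hypothetical, the GLCM is computed for a single nonnegative offset on a window that is assumed to be already quantized to integer gray levels in [0, levels), and the LBP is the basic 3 × 3, eight-neighbor variant.

```python
import numpy as np

def glcm(window, levels, dx=1, dy=0):
    """Normalized symmetric co-occurrence matrix for one offset (dx, dy >= 0)."""
    window = np.asarray(window)
    rows, cols = window.shape
    # Pair every pixel with its neighbor at the given offset.
    ref = window[: rows - dy, : cols - dx].ravel()
    nbr = window[dy:, dx:].ravel()
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref, nbr), 1.0)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()

def glcm_features(p):
    """Contrast- and orderliness-type measures of a normalized GLCM."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "asm": float(np.sum(p ** 2)),  # angular second moment
        "entropy": float(-np.sum(nonzero * np.log(nonzero))),
    }

def lbp_3x3(image):
    """Eight-neighbor LBP codes for the interior pixels of an image."""
    image = np.asarray(image)
    rows, cols = image.shape
    center = image[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nbr = image[1 + dy : rows - 1 + dy, 1 + dx : cols - 1 + dx]
        # Set this bit where the neighbor is at least as bright as the center.
        codes += (nbr >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes
```

Because the LBP code depends only on the ordering of gray values, applying any strictly increasing transform to the image (for example, a gain and an offset) leaves the codes unchanged, which is the invariance property noted in [41].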
DEEP FEATURES
Deep feature representation based on the layer-wise learning of image patterns is a very promising research direction for change detection in VHR images [66], [67]. Differing from traditional handcrafted features, higher-level abstractions (both linear and nonlinear features) can be automatically extracted and optimized by multilayer neural networks, which can retain crucial variations and discard uncorrelated differences for change detection tasks [68]. In recent years, many deep learning methods have been developed, such as autoencoder (AE) models and convolutional neural networks (CNNs), for deep feature extraction in change detection with VHR images.

The AE is an unsupervised feature learning model that is constructed by minimizing the reconstruction error. However, it may learn a useless feature representation, such as a simple copy of the input [69]. To overcome that issue, variant models, e.g., the denoising AE (DAE) [70], sparse AE (SAE) [71], and Fisher AE (FAE) [72], have been employed for VHR change detection, with denoising, sparsity, and Fisher discriminant criteria, respectively. Specifically, a stacked DAE was used to learn high-level features from the local neighborhood [70]. In [70], it was found that the filters learned by a stacked DAE have a stronger representation capability than existing explicit ones. Based on the SAE, Su et al. [71] transformed a difference image into a suitable feature space for suppressing noise and extracting key change information in the change detection framework. Liu et al. [72] used the FAE for unsupervised layer-wise feature learning and showed that the model can generate more discriminative features than the original AE. In addition to unsupervised feature learning through the optimization of certain criteria, AE-based models can learn effective features in a supervised way by considering label consistency, e.g., the contractive AE [73].
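The denoising criterion can be made concrete with a very small example. The sketch below trains a single-hidden-layer DAE with plain gradient descent in NumPy; it is a toy illustration under stated assumptions, not the stacked DAE of [70]. The function name, layer size, noise level, learning rate, and number of epochs are all hypothetical choices, and the input is assumed to be a matrix of flattened, roughly unit-scaled image patches.

```python
import numpy as np

def train_denoising_autoencoder(x, hidden=8, noise_std=0.1, lr=0.2,
                                epochs=200, seed=0):
    """Fit a one-hidden-layer denoising autoencoder to x of shape (n, d).

    Returns an encoder function and the per-epoch reconstruction losses.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.1, (d, hidden))
    b_enc = np.zeros(hidden)
    w_dec = rng.normal(0.0, 0.1, (hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        noisy = x + rng.normal(0.0, noise_std, x.shape)  # corrupt the input
        h = np.tanh(noisy @ w_enc + b_enc)               # encode
        recon = h @ w_dec + b_dec                        # decode
        err = recon - x                                  # compare with the CLEAN input
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_recon = 2.0 * err / err.size
        g_w_dec = h.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ w_dec.T) * (1.0 - h ** 2)     # derivative of tanh
        g_w_enc = noisy.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec

    def encode(v):
        """Map inputs of shape (m, d) to learned features of shape (m, hidden)."""
        return np.tanh(v @ w_enc + b_enc)

    return encode, losses
```

Because the network must reconstruct the clean patch from a corrupted copy, simply copying the input is no longer an optimal solution, which is the motivation for the denoising variant given above. In a change detection setting, the encoded features of co-located patches from the two dates would then be compared.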
It is well recognized that CNNs are effective in extracting mid- and high-level abstract features by interleaving convolutional and pooling layers [74]. According to the feature learning strategy, CNNs can be categorized as unsupervised [67], [75], [76], supervised [77], fine-tuning [66], and transfer learning based [78]. For example, Zhan et al. [75] used a pretrained CNN to automatically extract deep spatial–spectral features for change detection in VHR satellite images. Saha et al. [67] developed unsupervised deep CVA for change detection, and a network trained on remote sensing aerial images for semantic labeling by Volpi and Tuia [79] was adopted for deep feature extraction. As detailed in Figure 5, the experimental results demonstrated that, compared to object-based methods, deep features are effective for capturing change information and are promising for distinguishing multiclass change information. Wang et al. [77] trained a model through manually selected samples, where the parameters of the shared convolutional layers were initialized by the pretrained ResNet-50 model, and the others were randomly initialized. Hou et al. [66] chose to extract CNN-based deep features through a fine-tuned Visual Geometry Group (VGG)-16 by transferring a model pretrained on large-scale natural images to the remote sensing domain via an aerial image data set. Liu et al. [78] proposed a CNN-based transfer learning method for change detection. In particular, the loss function was designed by combining high-level features extracted from a pretrained model (i.e., the U-net model trained on an open source data set) and semantic information contained in change detection data sets.

Notably, deep learning methods depend on an enormous amount of training data, which may not be available for multitemporal VHR remote sensing imagery [74].
Meanwhile, great differences in spectral properties and image contexts among natural red–green–blue (RGB) images and remote sensing data result in deep features extracted by fine-tuned models that do not fully represent the essential characteristics of remote sensing images. As a result, the contrast between a small number of remote sensing data sets and a large number of natural images during model learning may hamper the further improvement of VHR change detection using deep features. In recent years, large multitemporal data sets have been released, such as 86 image pairs from the DigitalGlobe satellite constellation (i.e., QuickBird, WorldView-1, WorldView-2, and GeoEye-1) [80], 291 pairs of multitemporal aerial images [81], and more than 700,000 labeled instances for building damage assessment [82]. It can be anticipated that more and larger multitemporal VHR remote sensing data sets with diverse image characteristics and various acquisition conditions will appear in the near future. In this case, the essential change features for VHR remote sensing images can be effectively extracted by a deep network specialized for multitemporal remote sensing data.

OBJECT-BASED FEATURES
Object-based features refer to spectral, geometry, texture, extent, and contextual information at the object scale rather than single pixels and groups of pixels within a kernel filter/moving window. In this way, an image object is viewed as the processing unit for change detection. An object is a set of spatially adjacent pixels that are spectrally similar and that can be extracted through image segmentation. Overall, object-based features are effective in VHR change detection since they mitigate radiometric differences, spectral variability, and misregistration errors [38], [83]. However, appropriate segmentation parameters, which are often dependent on subjective and laborious trial-and-error experiments, need to be determined [84].
Furthermore, shortcomings and problems in different multitemporal image segmentation strategies, e.g., 1) the segmentation of only one monotemporal image, 2) the segmentation of stacked multitemporal images, and 3) the independent segmentation of multitemporal images, should be carefully considered and tackled [5], [85]. Specifically, geometric changes (e.g., the size and shape) cannot be captured by 1) and 2) [85]. Moreover, strategy 2) may also result in “sliver objects” caused by image misregistration. As for strategy 3), spatial correspondence between multitemporal objects needs to be established.
Generally speaking, three levels of object-based features can be used for change detection [86]. In the first, the object-based features include the radiometry, geometry, and texture for each image object [87]. For instance, in [88], key points of each object are extracted in change detection, which was successfully applied in three landslide scenes and one view that examined land use changes. Bovolo [89] computed the mean values of texture measures in separate parcels for change detection, and better accuracy with high fidelity in the homogeneous and border regions was achieved by the object-based method than with the pixel-based one. However, in these studies, texture is still extracted in a pixel-based manner and depends on the size of a moving window (or kernel). More importantly, kernel- and window-based texture can create between-class texture, leading to an edge effect [87]. Therefore, object-oriented texture computed within the boundary of an object is recommended, such as object-wise GLCM texture measures [87] and object-based MPs [90].

The second-level object-based features exploit relationships between two image objects, e.g., adjacency, proximity, and relations between neighboring objects [87]. For example, Liang et al. [91] considered the relations of neighboring objects in feature extraction for object-oriented change detection. Yu et al. [92] combined a relative border with a “forest with no change” and the normalized difference vegetation index (NDVI) to identify the category of “change from forest to developed land.”

The third-level features refer to spatial arrangements among multiple objects [87]. Third-level object-based features have been used in image classification, such as urban functional zone extraction [93] and urban village detection [94]. Nevertheless, such features have rarely been used in VHR image change detection. In [95], spatial dependency and sharing boundaries among multiple objects are considered to reduce spurious errors caused by shadow in urban vegetation change detection.

Object-based CVA results [85] derived from different multitemporal segmentation strategies are presented in Figure 6, where it can be observed that different multitemporal segmentation strategies can significantly affect change detection results.

FIGURE 5. Change detection results for QuickBird bi-temporal images: (a) image (t1), (b) image (t2), (c) the reference change map, (d) multiclass deep CVA, (e) binary change deep CVA, and (f) object-based CVA [67]. Legend classes: ωnc, ωc1, ωc2, and ωc3; bounding boxes denote changes.
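To make the idea of an object as the processing unit concrete, the following NumPy sketch computes first-level object features (here, only the per-object mean of each feature band) and compares them between two dates. It is an illustration under stated assumptions, not the procedure of [85] or [89]: the function names are hypothetical, a single label map shared by both dates is assumed (as in strategies 1 and 2 above), and the labels are assumed to be integers in [0, n_objects).

```python
import numpy as np

def object_means(features, labels):
    """Mean feature vector of every object.

    features: (rows, cols, bands) array; labels: (rows, cols) integer ids.
    """
    flat_labels = labels.ravel()
    flat_feats = features.reshape(-1, features.shape[-1]).astype(float)
    n_objects = int(flat_labels.max()) + 1
    counts = np.bincount(flat_labels, minlength=n_objects)
    sums = np.zeros((n_objects, flat_feats.shape[1]))
    np.add.at(sums, flat_labels, flat_feats)  # accumulate per object
    return sums / np.maximum(counts, 1)[:, None]

def object_change_magnitude(feat_t1, feat_t2, labels):
    """Object-wise change magnitude, mapped back to the pixel grid."""
    diff = object_means(feat_t2, labels) - object_means(feat_t1, labels)
    magnitude = np.linalg.norm(diff, axis=1)
    return magnitude[labels]
```

Averaging within objects suppresses pixel-level spectral variability, but, as noted above, a shared segmentation cannot represent changes in object size or shape.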
ANGULAR FEATURES
Multiangle satellite images can be acquired by WorldView-2, IKONOS, Cartosat-1, and Ziyuan-3 through across-track and along-track stereoscopy [96]. Spatial and spectral variations encoded in multiangle images can be extracted as new information sources for change detection. To be specific, multiangle observations can capture information about bidirectional reflectance signatures and vertical structures (e.g., trees and buildings) and hence complement conventional spectral and spatial features [27]. In this article, angular features are categorized as 1) implicit ones that are generated by stereo photogrammetry, such as orthographic images and DSMs, and 2) explicit ones that capture angular variations, such as angular difference features [97].

Most existing change detection studies based on multiangle VHR imagery adopt implicit angular features. For example, Chaabouni-Chouayakh et al. [98] presented a fully automatic change detection method for urban monitoring using IKONOS stereo data, and their experimental results verified the effectiveness of the joint use of multispectral and DSM features. Tian et al. [99] investigated building and forest change detection using panchromatic Cartosat-1 stereo imagery, and they found that extracted height values from DSMs can greatly improve change detection accuracy. Huang et al. [21] used photogrammetrically derived orthographic images from multiangle Ziyuan-3 data to monitor subtle changes across urban areas, and it was shown that the use of orthographic images can minimize the influence of spatial inconsistency among multitemporal data, e.g., misregistration and parallax distortion for high-rise buildings.
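The use of height values as an implicit angular feature can be sketched as a simple DSM difference. The example below is a hypothetical illustration and not the method of [98] or [99]: the function name and the 3-m threshold are assumptions, the two DSMs are assumed to be coregistered on the same grid and expressed in meters, and missing heights are assumed to be stored as NaN.

```python
import numpy as np

def height_change_mask(dsm_t1, dsm_t2, min_height_change_m=3.0):
    """Height difference and a signed change label for two coregistered DSMs.

    Labels: 1 for a height increase (e.g., new construction), -1 for a
    decrease (e.g., demolition), and 0 for no significant change or no data.
    """
    diff = np.asarray(dsm_t2, dtype=float) - np.asarray(dsm_t1, dtype=float)
    valid = np.isfinite(diff)  # ignore cells without a height in either date
    change = np.zeros(diff.shape, dtype=np.int8)
    change[valid & (diff >= min_height_change_m)] = 1
    change[valid & (diff <= -min_height_change_m)] = -1
    return diff, change
```

In practice, such a height cue would be combined with spectral or textural evidence, since photogrammetric DSMs contain matching errors that a fixed threshold alone cannot separate from real change.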
On the other hand, explicit angular features aim at describing the differences contained in multiangle images, e.g., the angular difference feature [100], multiangular builtup index (MABI) [101], multiangle spectral variation feature [27], stacked multiangle spectral feature [102], and bidirectional reflectance distribution function-based index [103]. Benefiting from these explicit angular features, detailed urban and vegetation classifications were achieved using multiangle VHR images. Nevertheless, in the current literature, the previously mentioned explicit angular features have seldom been employed for change detection. One exception is a recent study presented in [2]. In it, the MABI, which indicates spectral and structural variations in multiview images, was used. Specifically, Huang et al. [2] integrated planar (i.e., MBI, Harris, and PanTex) and vertical [multispectral image (MSI), normalized DSM (nDSM), and MABI] features to detect newly constructed buildings and identify their change timing by using time-series, multiview Ziyuan-3 imagery. Figure 7 gives an example of change results from different feature combinations. It shows that the joint use of planar and vertical features can generate more accurate results in terms of change extents and timings. To better evaluate the different kinds of features, we create a Ziyuan-3 multiview change detection (MVCD) data (a) (b) (c) (d) (e) (f) FIGURE 6. Object-based CVA results from different multitemporal segmentation strategies: (a) image (t1), (b) image (t 2), (c) the reference change map, (d) the segmentation of image(t1), (e) the segmentation of stacked multitemporal images, and (f) the separate segmentation of each monotemporal image [85]. 76 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
set, which is available at http://irsip.whu.edu.cn/resources/resources_en_v2.php. It includes both urban and rural scenes with diverse and complex change types, and, moreover, it considers seasonal and illumination influences. These characteristics enable the MVCD to function as a challenging change detection data set. A comparative analysis of different attributes, including the GLCM [39], AP [49], CT [62], MABI [101], object-wise GLCM (GLCM-Obj) [87], and deep features [67], has been carried out. Specifically, the change intensity map was obtained by CVA, and the threshold for each feature was determined based on receiver operating characteristic curves to achieve a balance between commission and omission errors [65]. Qualitative and quantitative experimental results are provided in Figure 8 and Table 3, respectively. The spectral feature fails to detect changes between spectrally similar classes (e.g., bare soil and buildings), and unchanged objects with spectral variation are incorrectly detected as changed ones. The GLCM, AP, and CT can depict textural changes, e.g., the spatial distribution of the gray value, geometry, and local details. Among them, the CT gives more complete changed regions, and the AP produces more false alarms. The MABI emphasizes building changes, but it is not sensitive to other variations (e.g., soil, vegetation, and roads), which therefore leads to a large omission error. The GLCM-Obj generates smoother results with smaller omissions but larger commission errors than its pixel-wise version. Deep CVA outperforms the other methods, but false alarms caused by shadows and seasonal effects can still be observed.

CHANGE DETECTORS
VHR change detectors can be categorized as algebra-, transform-, and machine learning-based indicators. CVA is one of the most widely used algebraic approaches, and it is carried out by measuring the difference between bi-temporal multifeature vectors to derive a change vector for VHR images [67], [104], [105].
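The CVA step and the error measures mentioned above can be sketched as follows. This is a minimal illustration that assumes co-registered feature stacks of shape (H, W, B) and a hand-picked threshold rather than the ROC-based selection used in the experiments. The function names are hypothetical, and the commission and omission definitions follow one common convention that may differ in detail from the one behind Table 3.

```python
import numpy as np

def cva_magnitude(features_t1, features_t2):
    """Per-pixel Euclidean norm of the change vector; inputs have shape (H, W, B)."""
    delta = np.asarray(features_t2, dtype=float) - np.asarray(features_t1, dtype=float)
    return np.sqrt((delta ** 2).sum(axis=-1))

def commission_omission(predicted, reference):
    """Commission: false alarms / predicted changes. Omission: misses / reference changes."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    false_alarms = np.logical_and(predicted, ~reference).sum()
    misses = np.logical_and(~predicted, reference).sum()
    commission = false_alarms / max(predicted.sum(), 1)
    omission = misses / max(reference.sum(), 1)
    return commission, omission

# Toy bi-temporal feature stacks (2 x 2 pixels, 2 features each).
t1 = np.zeros((2, 2, 2))
t2 = np.array([[[3.0, 4.0], [0.0, 0.0]],
               [[0.0, 1.0], [0.0, 0.0]]])
magnitude = cva_magnitude(t1, t2)
change_map = magnitude > 2.0  # hand-picked threshold, for illustration only
reference = np.array([[True, False], [True, False]])
commission, omission = commission_omission(change_map, reference)
```

Raising the threshold generally trades commission errors for omission errors, which is why a balance point has to be chosen for each feature.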
Transform-based methods, such as principal component analysis [106] and multivariate alteration detection [107], attempt to suppress no-change areas and emphasize change information in the transformed feature space. In the machine learning community, change detection is often viewed as a classification problem. In conventional classification-based VHR change detection, spectral–spatial feature extraction and detectors (e.g., SVMs [108] and the random forest [65]) are separately implemented. Deep learning, a recent research hot spot, can integrate these two operations in a joint learning framework, which makes it very promising for VHR change detection [109], [110]. Deep learning-based change detectors can be grouped in terms of different criteria, including learning and fusion strategies, network models, and processing units (Table 4).

FIGURE 7. Experimental results for the automatic monitoring of newly constructed building areas (NCBAs) using planar (i.e., MBI, Harris, and Pantex) and vertical (MSI, nDSM, and MABI) features [2]. (a) Multitemporal Ziyuan-3 images. (b) NCBAs and their change timing.

We first discuss learning strategies. On the basis of a large amount of annotated data, supervised deep learning methods can capture semantic changes, and hence they
are sensitive to actual variations of interest and tolerant to “pseudo changes” (such as geometric deformation and radiation distortions caused by spatial displacement and phenology variation, respectively) [110]–[116]. However, it is difficult to learn a deep model only from the training samples of a study area since the proportion of the change area is usually very small. To tackle this problem, on the one hand, transfer learning [117] and meta-learning [118] are considered to leverage knowledge from other data sources. Transfer learning strategies focus on fine-tuning pretrained models that are designed for different but related tasks. Meta-learning can learn from little labeled data, and it can learn how to learn by utilizing previous experiences [119].

FIGURE 8. A comparison of different features for the MVCD data set: (a) image (t1), (b) image (t2), (c) the ground reference, (d) spectral features, (e) the GLCM, (f) APs, (g) CTs, (h) the MABI, (i) the GLCM-Obj, and (j) deep features.

TABLE 3. THE CONSIDERED METHODS’ CHANGE DETECTION ACCURACY WHEN USING THE MVCD DATA (%).
METHOD | CORRECTNESS | COMMISSION ERROR | OMISSION ERROR | OVERALL ERROR
Spectral | 70.88 | 24.51 | 29.12 | 26.62
GLCM | 65.05 | 16.1 | 34.95 | 22.04
AP | 71.75 | 40.2 | 28.25 | 33.18
CT | 75.13 | 33.84 | 24.87 | 28.67
MABI | 57.87 | 28.76 | 42.13 | 34.18
GLCM-Obj | 74.51 | 30.47 | 25.49 | 27.76
Deep | 79.98 | 25.46 | 20.02 | 22.41

Given the
huge difference between VHR remote sensing images and data from other fields (e.g., natural RGB images) in terms of the image modality, spectral bands, spatial resolution, viewing angle, and so on, large amounts of publicly available multitemporal VHR remote sensing data are required to construct a robust VHR deep change detector. On the other hand, semisupervised deep learning methods, with the consideration of unlabeled samples [120], can relieve the burdensome labeling process, although the effects of unlabeled samples as well as the complexity of the semisupervised model should be further investigated.

With regard to the fusion strategy, according to how bi-temporal images are dealt with, deep learning-based change detectors can be classified as early fusion and late fusion. Early fusion methods concatenate multitemporal images as a whole input into a deep network [110]. Early fusion is able to capture the hierarchical difference representation, i.e., from low-level grayscale differences in shallow layers to high-level semantic changes in deep layers. However, grayscale differences that are not relevant to semantic changes, e.g., spatial misalignment and the internal variability of objects, may propagate to deeper layers and therefore lead to false alarms [113]. In contrast to early fusion, late fusion methods separately learn monotemporal features and concatenate them later as an input to the change detection layers [121]. This kind of network architecture may lead to insufficient learning: during network training, gradients in high layers have difficulty flowing backward to lower ones [122], which degrades the change detection performance. Thus, in [113], early and late fusion networks were combined to complement one another.
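The structural difference between the two fusion strategies can be sketched with array shapes alone. In the toy example below, `toy_encoder` is a hypothetical stand-in for a learned feature extractor (a per-pixel linear map followed by a ReLU), and sharing weights between the two late fusion branches is an assumption made for brevity; none of this reproduces a network from the cited works.

```python
import numpy as np

def toy_encoder(image, weights):
    """Stand-in for a learned feature extractor: per-pixel linear map plus ReLU."""
    return np.maximum(image @ weights, 0.0)

rng = np.random.default_rng(0)
image_t1 = rng.random((4, 4, 3))  # H x W x bands
image_t2 = rng.random((4, 4, 3))

# Early fusion: stack the two dates along the band axis, then encode once.
early_input = np.concatenate([image_t1, image_t2], axis=-1)        # (4, 4, 6)
early_features = toy_encoder(early_input, rng.standard_normal((6, 8)))

# Late fusion: encode each date separately, then concatenate the features.
shared_weights = rng.standard_normal((3, 8))
late_features = np.concatenate(
    [toy_encoder(image_t1, shared_weights), toy_encoder(image_t2, shared_weights)],
    axis=-1,
)                                                                  # (4, 4, 16)
```

In the early fusion branch, raw differences between the dates are mixed from the first layer onward, whereas the late fusion branch keeps the two dates apart until the feature level, which mirrors the trade-off discussed above.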
As for network models, AEs [123], [124], deep belief networks [125], CNNs [110], [112], [113], [115], [120], [126], recurrent neural networks (RNNs) [127]–[129], generative adversarial networks (GANs) [130], [131], and graph neural networks [132] have been adopted for end-to-end change detection. The CNN is one of the most widely used methods, and mainstream CNN architectures, such as AlexNet [133], VGGNet [134], GoogLeNet [135], ResNet [136], and DenseNet [137] as well as their variants, have been considered [138]. RNNs with modules such as long short-term memory and gated recurrent units, as well as their variants, are also widely employed to model the phenological process of multitemporal VHR images, due to the superiority of recurrent layers in processing sequential data and modeling time-series dependence. In addition, the U-net and its variants, which are composed of an encoder to hierarchically extract semantic information and a counterpart decoder to delineate spatial details, can be viewed as AE architectures for VHR change detection. They receive much attention due to their ability to maintain the spatial details of change objects. Recently, some studies proposed hybrid models, such as those in [111] and [127]. For instance, as illustrated in Figure 9, a CNN and an RNN are combined in one end-to-end network to extract joint spectral–spatial–temporal features [111]. In [139], difference-based methods using edge-based level set evolution (ELSE), region-based level set evolution (RLSE), MRFs, and fully convolutional networks (FCNs) as well as postclassification-based methods with SVMs, CNNs, GANs, Siamese convolutional networks (SCNs), and end-to-end GAN-based Siamese frameworks (GSFs) are compared for landslide detection (Figure 10). Since observing landslides separately from unchanged and other changed regions is required, this kind of change detection is challenging.
As can be seen, the four difference-based methods lead to more false alarms. As for the five postclassification methods, deep learning techniques generally outperformed SVMs, due to their explorative capabilities in representing related changes and suppressing irrelevant variations.

TABLE 4. A SUMMARY OF DEEP LEARNING-BASED CHANGE DETECTORS.
CRITERIA | CATEGORY | DESCRIPTION | REFERENCES
Learning strategy | Supervised | Based on a large number of labeled samples | [110]–[116], [121], [124]–[130], [139], [140]
Learning strategy | Transfer learning | Fine-tunes pretrained models that are designed for different but related tasks | [117], [131]
Learning strategy | Meta-learning | Learns from little labeled data and learns how to learn | [118]
Learning strategy | Semisupervised | Joint use of labeled and unlabeled data | [120], [132]
Fusion strategy | Early fusion | Uses concatenated multitemporal images as input | [110], [114], [115], [125]–[129], [131], [132], [139]
Fusion strategy | Late fusion | Learns monotemporal features separately and then concatenates them as a whole input | [111], [112], [116], [117], [121], [130], [140]
Network model | CNN | Stacked convolutional, pooling, and fully connected layers | [110], [112]–[116], [120], [126]
Network model | Recurrent neural network | Models with a recurrent hidden state, e.g., gated recurrent units and long short-term memory | [127]–[129]
Network model | AE | Reconstructs the input with an encoder–decoder structure | [123], [124]
Network model | Deep belief network | Composed of layer-wise restricted Boltzmann machine | [125]
Network model | Graph neural network | Learns graph structure, e.g., relationships between features of pixels/objects | [132]
Network model | Generative adversarial network | Generator and discriminator that are adversarially trained | [130], [131], [139]
Processing unit | Patch | Assigns a label to each patch | [111], [115]–[117], [120], [121], [128]–[130]
Processing unit | Pixel | Predicts change labels for each pixel | [110], [113], [114], [126], [131], [139]
Processing unit | Object | Incorporation of segments/superpixels | [124], [125], [127], [132], [140]
FIGURE 9. An end-to-end architecture composed of a CNN, RNN, and fully connected network for change detection [111]. (a) Image (t1) (top) and image (t2). (b) The convolutional subnetwork. (c) The recurrent subnetwork. (d) The fully convolutional layers. (e) The binary change detection (top) and multiclass change detection. [Diagram labels: convolutional layers of branch (t1) and branch (t2), unrolled recurrent layer, fully convolutional layer, sigmoid/softmax.]

FIGURE 10. Landslide detection results from different methods: (a) image (t1), (b) image (t2), (c) ELSE, (d) RLSE, (e) an MRF, (f) an FCN, (g) an SVM, (h) a CNN, (i) a GAN, (j) an SCN, (k) a GSF, and (l) the ground truth. White and black indicate areas where landslides are detected and not detected, respectively. Red and blue circles represent landslide pixels that are wrongly detected and omitted [139].
According to the processing unit, deep learning-based detectors are divided into patch- [116], [130], pixel- [110], [117], and object-based [127], [140] varieties. For a patch-based change detection task, a sliding window with a fixed size is used to divide the study area into a series of patches, and each patch is assigned a label by the detector. In this way, each pixel in the patch is assigned the same label. Consequently, only the rough location of a change is obtained, not its fine-grained boundary. However, patch-based change detection can reduce the influence of spatial misalignment to some extent in VHR change detection. Since patch-based deep learning networks view each patch as the change detection unit and encode each patch as a set of feature maps with coarser spatial resolutions, the spatial misalignment of these feature maps becomes smaller, and some errors of spatial alignment are therefore avoided in a change detection task. In other words, when regarding a patch as the change analysis unit, only a very large misalignment can cause an unchanged image patch to be identified as a changed one, and a small misalignment can be tolerated. Several important issues should be noted for the patch-based method, such as the oversmoothing of results and the selection of the patch size. The multiscale strategy [135] may be appropriate for addressing these issues, but it inevitably leads to larger computation burdens.

Pixel-based methods usually employ semantic segmentation architectures to predict pixel-wise change detection results [33]. Specifically, in semantic segmentation architectures, after extracting abstract semantic information through multilayer encoding (e.g., convolution layers), a series of operations, e.g., interpolation, deconvolution, and upsampling, is used to progressively decode semantic information into feature maps that have the same spatial resolution as the input images.
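Returning to the patch-based procedure described above, the following sketch shows why its output is coarse yet tolerant of small, isolated differences. It assumes non-overlapping tiles (stride equal to the patch size) and replaces the trained detector with a simple mean-intensity rule. Both choices, like the function names, are simplifications made for illustration only.

```python
import numpy as np

def label_patches(change_intensity, patch_size, threshold):
    """Label a patch as changed (1) when its mean change intensity exceeds a threshold."""
    h, w = change_intensity.shape
    rows, cols = h // patch_size, w // patch_size
    cropped = change_intensity[:rows * patch_size, :cols * patch_size]
    blocks = cropped.reshape(rows, patch_size, cols, patch_size)
    return (blocks.mean(axis=(1, 3)) > threshold).astype(int)

def patch_labels_to_map(patch_labels, patch_size):
    """Expand patch labels to pixels: every pixel inherits the label of its patch."""
    tile = np.ones((patch_size, patch_size), dtype=patch_labels.dtype)
    return np.kron(patch_labels, tile)

intensity = np.array([
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.6],  # the isolated 0.6 mimics a small misalignment artifact
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
labels = label_patches(intensity, patch_size=2, threshold=0.5)
pixel_map = patch_labels_to_map(labels, patch_size=2)
```

The isolated high value in the upper-right patch is averaged away, while the changed region in the upper-left patch is reported as a full 2 x 2 block, so its true boundary is lost.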
Unlike traditional pixel-based change detectors that suffer from misregistration, viewing angle differences, and occlusions, deep learning methods can predict pixel-wise change detection with a highly semantic abstraction of the spatial context. However, object boundaries are often blurred in the change detection results, as up-sampling layers reconstruct the appearance but not the shape of objects. To cope with this issue, improved networks have been designed. UNet++, for example, combines nested features to preserve change region boundaries, considering that shallow layers are better able to capture spatial details [110].

Object-based deep learning methods are also considered for change detection [127], [140]. A simple approach is to adopt object-based segmentation in the pre/postprocessing step, as shown in [140]. On the other hand, object information can also be considered during the training process by adding object-wise loss terms [127]. However, issues related to conventional multitemporal image segmentation, such as oversegmentation, undersegmentation, and “sliver objects” caused by misregistration, remain unsolved. In the future, object-based detectors need to generate semantic segments and establish spatial correspondence between multitemporal segments.

The types of characteristics most often used for each criterion (i.e., the learning strategy, fusion strategy, network model, and processing unit) in VHR change detection are summarized in the following:
1) For the learning strategy, supervised learning is the most widely used method for VHR change detection. However, the great amount of labor required to collect a large number of training samples becomes a bottleneck, especially for deep network models, which leads to increasing attention for other learning strategies.
2) Late and early fusion strategies have their own strengths and weaknesses in representing multitemporal features and their differences, and hence hybrid fusion is sometimes chosen.
3) Among various network models, CNNs are the most commonly considered, and they are often coupled with other networks to form hybrid models, for instance, CNN–RNNs [111].
4) As for the processing unit, most studies consider patch- and pixel-level models. Patch-level detectors are more tolerant to spatial misalignment, but pixel-based ones are more appropriate for identifying fine-grained changes.

APPLICATIONS OF VHR CHANGE DETECTION
VHR image change detection is widely used in a large number of practical scenarios. A series of representative applications is the focus of this review, including the monitoring and change detection of 1) land cover and land use, 2) buildings, 3) vegetation, 4) crops, 5) lakes and wetlands, 6) ecosystem services, and 7) impervious surfaces.

LAND COVER AND LAND USE CHANGE DETECTION
Compared to coarse- and medium-resolution images, VHR images can reveal detailed and subtle intraurban change information [141]. Specifically, urban change detection by combining multiple features (e.g., object-based spectral, shape, and texture attributes) was presented in [142], where changes to detailed urban objects, e.g., buildings, roads, and playgrounds, can be detected. Huang et al. [21] identified pixel-level change transitions in 2012–2013 using Ziyuan-3 orthographic images, and the experimental result is presented in Figure 11. It can be seen that, even in the one-year period, small-scale changes extensively occurred in the urban area of Wuhan, China. For instance, fine-scale urban land cover transitions caused by pond infilling, building demolitions, building construction, weed growth, and site preparation can be observed.
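Pixel-level change transitions of the kind described above are often summarized as a "from-to" matrix. The sketch below is a generic postclassification comparison, not the procedure of [21]: it assumes two co-registered class maps with integer codes, and the codes and the function name are hypothetical.

```python
import numpy as np

def transition_matrix(classes_t1, classes_t2, n_classes):
    """Count pixels moving from class i (rows, date 1) to class j (columns, date 2)."""
    c1 = np.asarray(classes_t1).ravel()
    c2 = np.asarray(classes_t2).ravel()
    counts = np.bincount(c1 * n_classes + c2, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

# Hypothetical class codes: 0 = soil, 1 = roof, 2 = grass.
map_2012 = np.array([[0, 0, 1], [2, 2, 1]])
map_2013 = np.array([[1, 0, 0], [2, 0, 1]])
transitions = transition_matrix(map_2012, map_2013, n_classes=3)
changed_pixels = int(transitions.sum() - np.trace(transitions))
```

Off-diagonal entries give the pixel count for each transition (for example, row 0, column 1 is soil to roof), and the diagonal holds the unchanged pixels.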
In [143], changes in detailed land cover classes, including bright roofs, gray roofs, tile roofs, brown fields, dark asphalt, light asphalt, and so on, were analyzed using IKONOS and GeoEye-1 images. As for land use change detection, Wu et al. [108] interpreted change transitions, e.g., from sparse housing to industrial areas, by combining spectral and SIFT features. In [144], land use maps of Shenzhen (a highly dynamic and developed megacity in China) were generated in 2005 and 2017 based on VHR satellite data. As demonstrated in Figure 12, detailed land use categories, including residential, commercial,
industrial, infrastructure, grassland, farmland, woodland, water, breeding surfaces, and unused land, were monitored. In addition, the performance of different features, i.e., color histograms (CHs), LBPs, SIFTs, and deep features, was compared, and the best accuracies of 96.9% and 97.1% were obtained by the deep learning method [Figure 12(b)].

BUILDING CHANGE DETECTION
Buildings are one of the most dynamic artificial structures, and building change detection is important for urban development monitoring (e.g., building demolition and construction) and disaster management (e.g., building damage caused by natural hazards). Numerous methods for building change detection have been proposed [19], [51]–[53], [85], [145]–[157]. Some studies focus on multitemporal building observation and subsequent change analysis, where descriptors for building detection in VHR images are a critical issue. The descriptors can be categorized as template matching (e.g., the snake model) [158], knowledge based (e.g., shadow evidence and the MBI) [36], [159], and machine learning [148], [160]. For example, in [52], the MBI and the Harris detector were used to identify building areas, and then building change detection was conducted through interest point matching. Other types of methods directly explore changes in shapes, colors, and textural properties that are highly related to characteristics of buildings. For example, in [51], multitemporal variations in the MBI and spectral information were used to identify altered buildings. Likewise, in [85], the change feature generated by the MBI and spectral features was considered the indicator of building change. In [161], building changes were detected through the aggregation of spectral and textural features.
Figure 13 provides building change detection results from different methods, including SVMs based on MBI features (MBI–SVM), building interest point detection using the MBI and the Harris detector, MBI-based CVA (MBI–CVA), the fusion of the MBI and spectral and shape features, CVA using morphological features, and object-based CVA. It can be seen that automatic methods can achieve performance comparable to or better than supervised ones, i.e., the MBI–SVM [Figure 13(d)]. Meanwhile, the results of the MBI–CVA [Figure 13(g)] show more small false alarms. The fusion of the MBI and other features, e.g., the Harris detector [Figure 13(f)] and spectral and shape features [Figure 13(h)], can reduce these errors. These results illustrate that effective feature representation is the key to achieving good performance for VHR change detection.

FIGURE 11. Land cover change detection using Ziyuan-3 satellite imagery from 2012 and 2013. (a) The change detection result of the study area in Wuhan. (b)–(f) Five example cases of the change detection result and corresponding bi-temporal images [21]. [Legend: soil to roof, roof to soil, soil to grass, grass to soil, water to grass, no change.]

FIGURE 12. Land use change detection in the city of Shenzhen using high-spatial-resolution satellite imagery from 2005 to 2017, including (a) land use maps and (b) an accuracy assessment with different features [144].

FIGURE 13. Building change detection maps obtained by different algorithms: (a) image (t1), (b) image (t2), (c) the reference change map, (d) the MBI–SVM, (e) object-based CVA, (f) the MBI and the Harris detector, (g) the MBI–CVA, (h) the fusion of the MBI with spectral and shape features, and (i) CVA using morphological features [51], [52].

Apart from 2D characteristics, 3D information has been exploited for building change detection in recent studies. With easier access to 3D data, such as multiview images, 3D information indicated by angular features can be conveniently used. More importantly, misregistration caused by spatial displacement is minimized [162]. Turker and Cetinkaya [163] detected damaged buildings by calculating the difference between digital elevation maps derived from pre- and postearthquake stereo images. In [157], multichannel indicators, such as height differences and texture similarities, are fused to monitor building changes. The incorporation of angular features is effective in improving the performance of building change detection, and it has potential for quantifying 3D dynamic processes in urban renewal and development. However, due to the relatively high cost of 3D data acquisition, such as lidar and multiview UAV images, only a few studies investigate detailed building change processes in 3D space. Benefiting from time-series, multiview satellite imagery, Wen et al. [155] analyzed 3D annual building changes in inner city areas of four Chinese megacities (Beijing, Shanghai, Xi’an, and Wuhan). Their results characterized changes in the horizontal direction, such as construction and demolition, and quantified changes in the vertical direction, i.e., height and volume (Figure 14). It should be noted that uncertainty and the cost of 3D data can present a bottleneck for the development and application of 3D building change detection. Specifically, on the one hand, lidar data are relatively accurate but not recurrently acquired. On the other hand, photogrammetrically derived 3D data from multiview images are a sufficiently cost-effective alternative to lidar, but their 3D reconstruction qualities depend on metaparameters of stereo pairs (e.g., intersections, off-nadir angles, sun elevations, azimuth angles, completeness, and time differences) [164]. Therefore, successful 3D building change detection relies on more advanced models that can produce accurate multitemporal 3D data in an economical and effective way. Very recently, deep learning has been explored for 3D reconstruction from multiview images. For example, a CNN-based method was proposed for dense image matching in [165]. This novel technique may provide a new research orientation for 3D urban change detection when vertical and height information can be accurately derived from multiview satellite images.

FIGURE 14. The annual 3D building change in subset areas of Shanghai that was achieved using multiview satellite imagery. (a) Subset area 1. (b) Subset area 2 [155].

VEGETATION CHANGE DETECTION
Analysis of vegetation change is important to understanding ecological transitions [166]. Using VHR imagery, vegetation change can be investigated at a much finer scale, e.g., from forest stands to individual trees. In general, there are three types of vegetation changes: 1) seasonal, caused by plant phenology; 2) gradual, caused by interannual climate variability, land management, and land degradation; and 3) abrupt, caused by disturbances, e.g., urbanization, deforestation, and fires [167]. In [168], to assess seasonal changes, both spectral and textural information extracted from multiseasonal Pléiades imagery (2 m) was used for multiseasonal leaf area index (LAI) mapping. The results showed that the highest LAI occurred in midsummer, followed by late spring, autumn, and winter, and the observed seasonal change trend was similar to that based on the in situ measured LAI. Seasonal changes in the crown scale in an Amazon tropical evergreen forest were assessed by Wang et al. [169] using Planet constellation imagery with a spatial resolution of 3 m. The crown scale fraction of nonphotosynthetic vegetation showed large seasonal trend variability from June to November. As for gradual changes, Gärtner et al.
[170] used QuickBird and WorldView-2 imagery to quantify tree crown diameter changes in a degraded riparian tugai forest in northwestern China, and their results indicated that the diameter increased by 1.14 m, on average, during 2005–2011. Tian et al. [171] explored DSMs from satellite stereo sensors to monitor vertical tree growth and found that periodic annual increments at the study sites were in the range of 0.3–0.5 m. In the case of abrupt change, Dalagnol et al. [172] quantified tree canopy loss and gap recovery in tropical forests where there was low-intensity logging by using WorldView-2 and GeoEye-1 images. Their study showed that VHR satellite imagery has potential for tracking small-scale human disturbances. Ardila et al. [173] identified bi-temporal tree crown elliptical objects through the iterative surface fitting of a Gaussian model to crown membership in two urban residential areas in The Netherlands using QuickBird and aerial images. A detection rate of 77% was reported for both removed and planted trees. In addition to coverage, tree crown diameters, and canopy heights, species types are an essential parameter of vegetation community structures. In particular, VHR imagery is able to identify small and highly mixed species. Since different vegetation types exhibit similar spectral characteristics, textures are often used to identify various species. For instance, Lu and He [174] investigated seasonal species variations in a tall grassland in Ontario, Canada, during the growing season (from April to December) in 2015 using UAV images. The reflectance value, vegetation indices, and GLCM textures were used in the classification, and temporal change analysis revealed the growing process and succession of different species.
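Since GLCM textures recur throughout this review as a way to separate spectrally similar classes, a minimal sketch of the idea may help. It builds a non-symmetric, normalized co-occurrence matrix for a single nonnegative pixel offset and derives the contrast statistic from it. The function names and toy images are assumptions, and the quantization, windowing, and offset choices of the cited studies are not reproduced.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy >= 0)."""
    image = np.asarray(image)
    h, w = image.shape
    ref = image[0:h - dy, 0:w - dx]   # reference pixels
    nbr = image[dy:h, dx:w]           # neighbors at the given offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

smooth = np.array([[0, 0, 0], [0, 0, 0]])    # uniform texture
checker = np.array([[0, 1, 0], [1, 0, 1]])   # alternating texture
contrast_smooth = glcm_contrast(glcm(smooth, levels=2))
contrast_checker = glcm_contrast(glcm(checker, levels=2))
```

The two toy images share a narrow gray-level range but yield very different contrast values, which is the property that makes texture useful when spectra alone are ambiguous.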
Notably, some advanced methods, e.g., deep features [175], photogrammetrically derived DSMs from stereo images [176], phenological characteristics [177], and data fusion (e.g., lidar and airborne hyperspectral images) [178], have been considered for the change analysis of vegetation species. Moreover, some researchers attempted to discriminate vegetation function types, e.g., park, roadside, and residential–industrial trees in urban areas [179]. Vegetation function-type change monitoring is likewise of great significance but has not been addressed in the current research.

MONITORING CROP CHANGES
Information about agricultural land changes, crop type conversions, and crop growth, critical for precision agriculture, can be effectively captured using VHR images. In [180], land cover data for Guanlin, Yixing City, China, in 2006, 2009, 2012, and 2015 were generated using QuickBird images, and they showed a decrease followed by an increase in the agricultural land area. Malinverni et al. [181] quantified the temporal variation of main crop rotations on the Capitanata plain of Southern Italy using WorldView-2 images, and the textural features (e.g., the GLCM and the Gabor wavelet) were employed to improve the classification accuracy. The study suggests that multitemporal classification is preferred in crop mapping, due to its rich phenological characteristics. Furthermore, frequent crop growth monitoring is extremely important for timely decision making in precision agriculture. Therefore, time-series data are recommended, although dense time series of VHR images are relatively difficult to acquire. Recently, new-generation micro-/nanosatellites (e.g., Planet) and UAV systems have become available and are able to obtain time-series VHR images, which have potential for agricultural applications. For example, Sadeh et al. [182] detected sowing dates using dense time-series Planet CubeSat data with an interval of two days.
As shown in Figure 15, a partly sown field was successfully detected, implying that detailed processes on a near daily basis can be monitored by dense time series of VHR data. Likewise, Bendig et al. [183] monitored plant growth based on crop surface models using stereo UAV images. Notably, height differences between cultivars and their increased trend during the growing season can be observed.

FIGURE 15. A sowing detection result obtained using time-series Planet CubeSat images [182]. (a) RGB satellite imagery. (b) The change result. (c) The sowing detection result.

Crop change caused by disease and insect damage can also be located. VHR images are able to identify small-extent disease and insect damage, which is beneficial for controlling problems at early stages. Generally, diseases and insects can result in various kinds of harm to crop canopies, such as the removal of leaves, skeletonizing of leaf tissue, and discoloration of leaves, and these effects vary depending on the type of disease, insect, and crop [184]. Therefore, different damage shows various spectral and structural characteristics in remote sensing images, which makes the identification of disease and insect problems via VHR images a challenging task. One of the successful applications was presented by Johansen et al. [185], where GeoEye-1 images acquired in 2012, 2013, and 2014 were used to detect canegrub damage in sugarcane fields. In the study, objects with low NDVI values and rough textures were identified as likely to be damaged, and they were further classified as low, medium, and high likelihood. Franke and Menz [186] observed different levels of disease severity in a plot of winter wheat using multitemporal QuickBird images acquired in April, May, and June. The experimental results show that VHR multispectral data are only moderately suitable for damage detection at an early growth stage, a fact attributed to the subtle spectrum and texture differences between damaged and healthy crops [187], [188]. However, VHR hyperspectral sensors seem to have potential to address this issue. For example, in [189], spectral and spatial features were extracted by a CNN from UAV hyperspectral images for the detection of yellow rust across a whole crop cycle of winter wheat. Satisfactory accuracy was achieved through all growing stages, due to the detailed spectral information and rich spatial details in VHR hyperspectral images.

MONITORING LAKES AND WETLANDS
Lakes and wetlands, which play a critical role in biodiversity, ecosystems, hydrology, and climate regulation, are highly dynamic due to various natural and anthropogenic factors, such as climate change, farming, urbanization, floods, and hydrological interventions [190]. Therefore, accurate and timely monitoring of lakes and wetlands is important for management, restoration, and protection. Many studies have used remote sensing data for monitoring lakes, from a local to a global scale. They include lake changes between 1975 and 2015 across the Yangtze floodplain in China via Landsat images [191], water clarity changes in lakes and reservoirs across China that were observed using Moderate Resolution Imaging Spectroradiometer (MODIS) data [192] from 2000 to 2017, and global surface water changes between 1984 and 2015 acquired through Landsat images [193]. In these studies, which were subject to relatively low spatial resolution, lakes with large areas were targeted. However, more than 303.6 million of the 304 million lakes at the global scale are smaller than 1 km² [194]. Therefore, VHR remote sensing images are required for observing them. To our knowledge, however, only a few studies have focused on lake monitoring using VHR images. Cooley et al. [195] tracked water changes in the 470 lakes (0.0025–1.23 km²) in the Yukon Flats of north-central Alaska during mid to late summer (23 June to 1 October) in 2016, using Planet CubeSat images with a spatial resolution of 3 m. A time-series analysis revealed that the area of 83% of the studied lakes had decreased and that 22% of the lakes had lost more than half their surface. Notably, more applications of advanced methods of water detection through VHR images, e.g., deep learning [196] and physical approaches [197], are needed. Furthermore, information about black and odorous water [198] and water types (e.g., rivers, lakes, canals, and ponds) [199] is of increasing interest, and multitemporal monitoring is imperative. In addition to lakes, VHR images have potential for monitoring detailed changes in wetland ecosystems. In [200], the results of five-level mangrove features, including vegetation boundaries, mangrove stands, mangrove zonations, individual tree crowns, and species communities, using different data sets [Landsat (30 m), Advanced Land
Observing Satellite Advanced Visible and Near-Infrared Radiometer 2 (10 m), pan-sharpened WorldView-2 (0.5 m), and lidar] were generated and compared. As described in Figure 16, the Landsat image cannot accurately discriminate the mangrove extent, due to the mixed-pixel problem [Figure 16(e)], and more fine-scale mangrove features, i.e., tree-crown-level species, can be captured only by pan-sharpened WorldView-2 imagery [Figure 16(l)–(p)]. By summarizing the current literature, it can be found that most studies focus on detecting the extent of wetland change but ignore species change. For instance, Hu et al. [201] monitored land cover changes in the Hangzhou Xixi wetland from 2000 to 2013 using IKONOS, QuickBird, and WorldView-2 images. It was shown that the nonwetland area increased by approximately 100%, mostly in the form of herbaceous zones, followed by forests, ponds, cropland, marshes, and rivers. Wu et al. [202] integrated lidar data and multitemporal aerial imagery (1 m) to map wetland inundation dynamics in the Prairie Pothole region of North America, which is characterized by millions of small depressional wetlands. The difficulties of species change detection in wetlands lie in the following aspects. On the one hand, tidal and phenological changes make different plant species highly dynamic on daily and seasonal frequencies, respectively. On the other hand, many species have a similar spectral reflectance during the peak biomass in complex wetland landscapes [203], and the spectral signature of the same species can be influenced by many complex factors, such as the off-nadir angle, sun-viewing geometry, crown porosity, leaf clumping, and ground surface scattering [204]. For instance, in [200], mangrove species were categorized from WorldView-2 images using the nearest-neighbor classifier to extract object-based spectral and textural features within tree crowns, but a low overall accuracy of around 54% was reported. 
As demonstrated in Figure 16(p), misclassified open scrub Avicennia marina can be clearly observed. To improve the discriminative power among various species, the potential of VHR hyperspectral images, dense time-series data, and vertical information for characterizing detailed spectral, phenological, and height attributes needs to be explored.

ECOSYSTEM SERVICES MONITORING Ecosystem services link ecosystems to human welfare by regarding nature as a stock providing a flow of services (e.g., local climate regulation and water purification) [205]. Monitoring urban ecosystem services is of great value for investigating ecological function changes and can help improve the understanding of urbanization impacts on local ecological benefits. VHR satellite data can monitor spatially explicit ecosystem services at fine scales. Generally speaking, there are two categories of methods to derive ecosystem services: 1) statistical regression and radiative transfer models and 2) land use/cover-based methods [206]. Since in situ observations are not always available and the validity of statistical regression and radiative transfer models is affected by time inconsistencies between ground and remotely sensed measurements, land use/cover-based methods are often preferred. For example, in [207], land use/cover maps of Shanghai’s urban core from 2000 to 2009 were classified using IKONOS and GeoEye-1 images, and the classes were then transformed into ecosystem service supply and demand budgets, including regulating, provisioning and cultural services, and ecological integrity. An increase of at least 20% in ecosystem service supply budgets was observed, which was mainly attributed to the replacement of continuous urban fabric and industrial areas by high-rise commercial/residential areas despite a slight increase in urban green sites. Huang et al.
[144] assessed ecosystem service change in Shenzhen from 2005 to 2017 using Gaofen-2 (4-m) and QuickBird (2.4-m) images. In the study, multitemporal land use maps were generated by a transferred deep CNN (as shown in Figure 12), based on which ecosystem service supply and demand values were estimated. It was found that supply capacity had decreased by 13.7% due to a reduction in woodlands, water, farmland, and so on, but, on the other hand, demand values had grown by 23.5% because urban expansion and redevelopment had increased the amount of residential, commercial, and infrastructure land. The results clearly demonstrated the ecosystem degradation of Shenzhen during the previous 10 years. Ren et al. [208] evaluated the ecosystem services of Guyuan City in 2003, 2009, and 2014 via VHR satellite imagery (e.g., QuickBird and Gaofen-1) and showed that VHR images were advantageous in the dynamic, quantitative, and visual examination of ecological changes. With VHR remote sensing images, fine-scale ecosystem services within urban areas can be effectively quantified. However, most of the current works focus on urban areas and ignore the ecosystem services of natural scenes, such as forests and wetlands. Moreover, these works present only case studies, and large-scale examinations are still lacking. IMPERVIOUS-SURFACE CHANGE DETECTION The change detection of impervious surfaces is important in monitoring and understanding urban development and has been extensively studied in the remote sensing literature. However, most of the existing studies monitor the change of impervious surfaces based on coarse- and medium-spatial-resolution satellite imagery, such as MODIS and Landsat [209], [210], which, on the other hand, have difficulty dealing with areas that have low impervious-surface intensities and mixed pixels [211]. During recent decades, images with high spatial resolution have provided new opportunities for subtle impervious-surface monitoring at very fine scales. 
However, impervious-surface monitoring using VHR imagery is a challenging task. VHR multitemporal images exhibit a large number of details (e.g., buildings, roads, driveways, and sidewalks), greater spatial heterogeneity (e.g., different viewing geometries), and occlusion by urban trees, shadow, and vertical structure layover [212]. To address the problem caused by shadow, Li et al. [213] extracted multiscale object features and further classified shaded areas to extract impervious surfaces using QuickBird and IKONOS imagery. More recently, Zhang and Huang [214] developed a two-stage object-based classification method based on multilevel features (i.e., spectral, textural, shape, and class related) for time-series impervious-surface change detection in Shenzhen in 2003–2017, including the impervious-surface mapping of both nonshaded and shaded areas. As can be seen in Figure 17, in addition to single changes across the studied period (i.e., cases 1 and 2), some regions (e.g., case 3) experienced multiple changes.

FIGURE 16. Five-level mangrove features generated using different data sets [200]. (a) Level 1 TM, (b) level 1 AVNIR-2, (c) level 1 WorldView-2, (d) WorldView-2 RGB image, (e) level 2 TM, (f) level 2 AVNIR-2, (g) level 2 WorldView-2, (h) level 2 WorldView-2+LiDAR, (i) level 3 AVNIR-2, (j) level 3 WorldView-2, (k) level 3 WorldView-2+LiDAR, (l) level 4 pan-sharpened WorldView-2, (m) level 4 pan-sharpened WorldView-2+LiDAR, (n) WorldView-2 PC1,2,1, (o) level 5 pan-sharpened WorldView-2, (p) level 5 pan-sharpened WorldView-2+LiDAR, and (q) aerial photograph. Levels: level 1, local vegetation cover; level 2, local vegetation community; level 3, mangrove zonations; level 4, tree crowns; and level 5, species community.

FIGURE 17. Impervious-surface monitoring results from Shenzhen during 2003–2017. (a) Some typical cases of change profiles and (b) change detection results [214]. Red borders represent corresponding change times. Image dates: 2003, 2005, 2007, 2010, 2012, 2015, and 2017.

SUMMARY OF VHR CHANGE DETECTION DIMENSIONS As suggested in [10], remote sensing change detection can be categorized according to different dimensions, e.g., input data, temporal resolutions, change categories, targets, and analysis units. Since this research focuses on VHR optical images, the input data are discussed in terms of spatial resolutions. Therefore, we divide VHR change detection studies by considering the following five categorization schemes:
1) spatial resolution: HR (2–5 m), VHR (1–2 m), and ultraHR (UHR) (<1 m)
2) temporal resolution: bi-temporal and multitemporal
3) analysis unit: pixel, object, and patch
4) change category: binary change (BC), multiple change (MC), and directional change (DC) categories
5) targets.
In terms of the previously mentioned categorization schemes, a distribution of the literature reviewed in this study appears in Figure 18. Most articles use only bi-temporal images (78.12%) and concern binary change (66.32%). With regard to spatial resolution, 43.75% of the papers use UHR images, followed by VHR (33.33%) and HR (22.92%) images. As for analysis units, pixels and objects have almost the same number of articles, but patch-based change detection is rarely reported. Of the studies reviewed in this research, more than half involve land cover and land use change detection with multiple targets considered, followed by a series of specific targets, including buildings (20%), vegetation (10.53%), crops (8.42%), lakes and wetlands (5.26%), ecosystem services (3.16%), and impervious surfaces (2.1%).
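The five categorization schemes summarized in this section can be captured in a small record type, which makes distributions such as those reported for Figure 18 easy to recompute. The following is a minimal sketch, not the authors' procedure: the names `spatial_class`, `Study`, and `distribution` are hypothetical, the toy records are invented for illustration, and the handling of boundary values (exactly 1 m counted as VHR, exactly 2 m counted as HR) is an assumption, since the text gives only the ranges.

```python
from collections import Counter
from dataclasses import dataclass


def spatial_class(gsd_m: float) -> str:
    """Map a ground sampling distance (meters) to HR, VHR, or UHR.

    Ranges follow the text: UHR (<1 m), VHR (1-2 m), HR (2-5 m).
    Assumption: 1 m is treated as VHR and 2 m as HR.
    """
    if gsd_m <= 0 or gsd_m > 5:
        raise ValueError("resolution outside the 0-5 m range of the scheme")
    if gsd_m < 1:
        return "UHR"
    if gsd_m < 2:
        return "VHR"
    return "HR"


@dataclass(frozen=True)
class Study:
    """One reviewed study described along the five dimensions."""
    gsd_m: float   # spatial resolution in meters
    temporal: str  # "bi-temporal" or "multitemporal"
    unit: str      # "pixel", "object", or "patch"
    change: str    # "BC", "MC", or "DC"
    target: str    # e.g., "buildings", "crops"


def distribution(studies, key):
    """Return the percentage of studies per category for one dimension."""
    counts = Counter(key(s) for s in studies)
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 2) for k, v in counts.items()}


# Hypothetical toy records, not taken from the reviewed literature.
studies = [
    Study(0.5, "bi-temporal", "pixel", "BC", "buildings"),
    Study(0.3, "bi-temporal", "object", "BC", "vegetation"),
    Study(1.5, "multitemporal", "object", "MC", "crops"),
    Study(2.4, "bi-temporal", "patch", "DC", "land cover and land use"),
]
by_resolution = distribution(studies, lambda s: spatial_class(s.gsd_m))
by_temporal = distribution(studies, lambda s: s.temporal)
```

Under this mapping, the Gaofen-2 (4 m) and QuickBird (2.4 m) images mentioned earlier both fall into the HR class, whereas pan-sharpened WorldView-2 (0.5 m) falls into UHR.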
FIGURE 18. The distribution of different dimensions for the studies reviewed in this research: (a) temporal resolution, (b) spatial resolution, (c) change categories, (d) analysis units, and (e) targets. Values shown: (a) bi-temporal (78.12%) and multitemporal (21.88%); (b) UHR (43.75%), VHR (33.33%), and HR (22.92%); (c) BC (66.32%), MC (17.89%), and DC (15.79%); (d) object (46.31%), pixel (43.16%), and patch (10.53%); (e) land cover and land use (50.53%), buildings (20%), vegetation (10.53%), crops (8.42%), lakes and wetlands (5.26%), ecosystem services (3.16%), and impervious surfaces (2.1%).

RECOMMENDATIONS FOR FUTURE WORK

FROM CHANGE DETECTION TO TRACKING Most VHR change detection studies focus on bi-temporal images and multiple time series. However, change events, such as phenology and urban development, cannot be well characterized by coarse temporal observations. Frequent HR monitoring of both human and natural activities deserves much attention, especially when small satellite constellation (e.g., Planet) images become available. With time-series VHR images, change detection is advanced from simply locating variations via bi-temporal data to dense time-series monitoring [215]. There have been attempts at time-series monitoring using VHR images of buildings [155], crops [216], water [195], impervious surfaces [214], newly constructed building areas [2], forests [217], and landslides [218]. However, most of these methods are merely an extension of bi-temporal techniques by multiple pair comparisons, which is not sufficient to capture the temporal context and semantics and to support time series analysis. Recently, VHR videos acquired by SkySat-1, Jilin-1, and the UrtheCast Iris camera have shown great potential for near-real-time target tracking from space. Most of the current change detection studies have focused on the appearance/disappearance and shape changes of objects, but studies related to tracking moving objects (e.g., ships, planes, trains, and vehicles) in VHR sequential videos are limited. In [219], the automatic detection and tracking of moving ships using satellite video was achieved based on multiscale saliency and surrounding contrast analysis. Wang et al. [220] presented a UAV-based vehicle detecting and tracking system, which jointly considered edges, optical flows, and local feature points. The first-ranked team at the 2016 IEEE Geoscience and Remote Sensing Society Data Fusion Contest designed an innovative deep neural network with an MSI and spaceborne video as input, and object activity was analyzed using the Kanade–Lucas–Tomasi key point tracker [221], [222]. During the coming years, space videos are likely to be a very important data source for Earth monitoring, and more promising studies based on VHR sequential videos can be expected, while a new era in VHR change detection that shifts from conventional multitemporal change detection to video sequential tracking may dawn. Despite the preceding attempts, change tracking using VHR videos is still in its early stage and needs to be further explored. Notably, unlike conventional videos, challenges related to satellite video processing may include the small size of moving objects (e.g., vehicles), complex backgrounds (e.g., building relief displacement in urban scenes), camera movements, and low frame rates.
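The limitation noted in this section, namely that many time-series methods simply repeat a bi-temporal comparison over consecutive image pairs, can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the inputs are co-registered single-band images stacked as a `(T, H, W)` array, simple thresholded differencing stands in for whatever bi-temporal detector a study actually uses, and the function names `pairwise_changes` and `change_profile` are hypothetical.

```python
import numpy as np


def pairwise_changes(stack: np.ndarray, threshold: float) -> np.ndarray:
    """Apply bi-temporal differencing to each consecutive image pair.

    stack: float array of shape (T, H, W), ordered by acquisition date.
    Returns a boolean array of shape (T - 1, H, W); entry k is True where
    the absolute difference between dates k and k + 1 exceeds the threshold.
    """
    return np.abs(np.diff(stack, axis=0)) > threshold


def change_profile(stack: np.ndarray, threshold: float):
    """Summarize the pairwise results per pixel.

    Returns (n_changes, first_change): the number of flagged pairs and the
    index of the later date of the first flagged pair (-1 if unchanged).
    """
    changes = pairwise_changes(stack, threshold)
    n_changes = changes.sum(axis=0)
    # argmax returns the first True along the time axis; mask unchanged pixels.
    first_change = np.where(n_changes > 0, changes.argmax(axis=0) + 1, -1)
    return n_changes, first_change


# Toy 4-date, 2 x 2 stack (synthetic values, for illustration only).
stack = np.zeros((4, 2, 2))
stack[2:, 0, 0] = 1.0  # persistent change starting at date index 2
stack[1, 1, 1] = 1.0   # one-date spike, e.g., noise or a transient object
n_changes, first_change = change_profile(stack, threshold=0.5)
```

In the toy stack, the persistent change is counted once, whereas the one-date spike is counted as two changes because each pair is judged in isolation. That behavior is one concrete way in which pairwise comparison fails to use temporal context, which a method modeling the whole per-pixel trajectory could take into account.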
HR GLOBAL CHANGE DETECTION Remote sensing imagery has long been considered an effective data source for global change detection, due to its large coverage area, convenient access, and frequent revisits. Previous multitemporal global maps of land cover and thematic change detection are often generated at a relatively coarse resolution (i.e., >300 m), e.g., 8-km-resolution global forest change based on Advanced Very High Resolution Radiometer data for 1982–1999 [223], 500-m resolution mapping of the global urban extent from MODIS data from 2005 and 2009 [224], [225], and the 300-m resolution annual Climate Change Initiative Land Cover maps from 1992 to 2015 [226]. More recently, global-scale change detection with fine spatial resolution (around 30 m) has been attempted with open source Landsat imagery. Notable examples include the Global Forest Cover database [227], GlobeLand30 global land cover product [228], Global Artificial Impervious Area annual maps [229], Global Surface Water data sets by the European Commission Joint Research Center [230], and Global Human Settlement Layer framework [231]. Please note that 30 m is not a high spatial resolution in a common sense, but it should be regarded as high in the case of intercontinental and global mapping. Recently, Gong et al. [232] developed a 10-m resolution global land cover map through Sentinel-2 images acquired in 2017. It is a trend that global products are being developed in finer spatial and temporal resolutions that can characterize heterogeneous and mixed areas more accurately. For instance, the Planet CubeSats are able to acquire images at a 3–5-m spatial resolution with near-real-time daily global coverage [233], which has potential for VHR global change detection in the future. In addition, cloud computing platforms, such as Google Earth Engine and Amazon Web Services, can facilitate the processing of large volumes of satellite images and speed the development of VHR global mapping [234]. 
HYPERSPECTRAL CHANGE DETECTION Hyperspectral data can distinguish more detailed land cover types due to their rich spectral information. For a long time, the data availability of hyperspectral images seemingly limited real applications in precise change detection. Recently, however, the development of hyperspectral satellites with a relatively fine spatial resolution, e.g., Gaofen-5 (30 m, with 330 spectral bands), Tiangong-1 (10 m, with 128 spectral bands), and Zhuhai-1 (10 m, with 32 spectral bands), and airborne hyperspectral sensors, e.g., HyMap (3 m, with 126 spectral bands) and the Reflective Optics System Imaging Spectrometer (ROSIS) (1.3 m, with 115 spectral bands), has significantly increased the availability of multitemporal hyperspectral images. However, studies related to VHR hyperspectral change detection are very limited, and even the existing methodologies were developed based on synthetic data [235]. Moreover, advances in hyperspectral image classification benefit from a set of widely used public benchmark data sets, e.g., the ROSIS Pavia University and Airborne Visible/Infrared Imaging Spectrometer Salinas data sets [236]. Therefore, there is an urgent need for public hyperspectral change detection data sets to promote the development of the related research fields. URBAN FUNCTIONAL ZONE CHANGE DETECTION Currently, the classification of urban functional zones is one of the important research areas in interpreting VHR remote sensing images, as the urban functional zones can bridge the semantic gap between land cover and human socioeconomic activities. Current urban functional zone mapping not only involves various image features, e.g., deep [237], [238], angular [97], object based [239], and textural [240], but it also refers to multisource geographic information, such as points of interest (POIs) [241], social media [242], and mobile phone positioning [100].
In rapidly urbanizing regions, the timely and accurate monitoring of urban functional zones is crucial for planning and management. However, studies for change detection in urban functional zones are lacking. Frankly, urban functional zone change detection is a difficult task since land cover change does not necessarily signify the conversion of a functional zone type. Meanwhile, multisource geographic data, e.g., POIs, are widely used for functional zone classification [230], but these data do not provide a time tag, which hampers the dynamic monitoring of urban functional zones. These issues should be overcome to effectively monitor changes in cities. CONCLUSIONS With the increasing availability of VHR remote sensing images, precise, frequent, and stereo change detection becomes possible. To the best of our knowledge, a comprehensive review of VHR change detection is lacking in the current literature. Therefore, this article aimed to summarize recent advances in VHR remote sensing image change detection, including methods and applications. The review of methods focused on feature extraction and change detectors for multitemporal VHR images. Applications including change detection for land cover and land use, impervious surfaces, buildings, crops, vegetation, lakes and wetlands, and ecosystem services were reviewed. Finally, some future directions were suggested and discussed for this important research area. Recommendations for future work include focusing on change tracking, global change detection, hyperspectral change detection, and urban functional zone change detection to generate frequent and detailed semantic change information on a global scale. ACKNOWLEDGMENTS The authors are grateful to the editor-in-chief, associate editor, and reviewers for their insightful comments and suggestions. 
This research was supported by the National Natural Science Foundation of China, under grants 41901279, 41771360, and 41971295, and the Chinese Academy of Sciences Interdisciplinary Innovation Team, under grant JCTD-2019-04. (Corresponding author: Xin Huang.)
AUTHOR INFORMATION Dawei Wen (daweiwen@mail.hzau.edu.cn) received the B.E. degree in surveying and mapping and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2013 and 2018, respectively. She is a postdoctoral researcher in the College of Public Administration, Huazhong Agricultural University, Wuhan, 430070, China. Her research interests include the change analysis of multitemporal remote sensing images and remote sensing applications. Xin Huang (xhuang@whu.edu.cn) received the Ph.D. degree in photogrammetry and remote sensing in 2009 from Wuhan University, Wuhan, China. He is a Luojia Distinguished Professor at Wuhan University, Wuhan, 430079, China, where he teaches remote sensing, photogrammetry, and image interpretation. He is the founder and director of the Institute of Remote Sensing Information Processing, School of Remote Sensing and Information Engineering, Wuhan University. He has published more than 150 peer-reviewed articles (Science Citation Index papers) in international journals. His research interests include remote sensing image processing methods and applications. He was the recipient of the Boeing Award for the Best Paper in Image Analysis and Interpretation from the American Society for Photogrammetry and Remote Sensing (ASPRS) in 2010, the second-place recipient of the John I. Davidson President’s Award from ASPRS in 2018, and the winner of the IEEE Geoscience and Remote Sensing Society 2014 Data Fusion Contest. He was an associate editor of Photogrammetric Engineering and Remote Sensing (2016–2019) and of IEEE Geoscience and Remote Sensing Letters (2014–2020), and he now serves as an associate editor of IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (since 2018). He is also an editorial board member of Remote Sensing of Environment (since 2019), Science of Remote Sensing (since 2020), and Remote Sensing (since 2018). He is a Senior Member of IEEE.
Francesca Bovolo (bovolo@fbk.eu) received the B.S. and M.S. degrees in telecommunication engineering (summa cum laude) and the Ph.D. degree in communication and information technologies from the University of Trento, Italy, in 2001, 2003, and 2006, respectively, where she remained as a research fellow until June 2013. She is the founder and head of the Remote Sensing for Digital Earth unit at Fondazione Bruno Kessler, Trento, 38123, Italy, and a member of the Remote Sensing Laboratory, Trento. Her research interests include multitemporal remote sensing image analysis; change detection in multispectral, hyperspectral, and synthetic aperture radar images and VHR images; time series analysis; content-based time series retrieval; domain adaptation; and lidar and radar sounders. She was the publication chair for the 2015 IEEE International Geoscience and Remote Sensing Symposium. She is the cochair of the Society of Photographic Instrumentation Engineers International Conference on Signal and Image Processing for Remote Sensing. She is a Senior Member of IEEE. Jiayi Li (zjjerica@whu.edu.cn) received the B.S. degree from Central South University, Changsha, China, in 2011 and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2016. She is currently an assistant professor in the School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430079, China. She has authored more than 30 peer-reviewed articles (Science Citation Index papers) in international journals. Her research interests include hyperspectral imagery, sparse representation, computation vision and pattern recognition, and remote sensing images.
She is a reviewer for more than 10 international journals, including IEEE Transactions on Geoscience and Remote Sensing, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Geoscience and Remote Sensing Letters, IEEE Signal Processing Letters, and International Journal of Remote Sensing. She is the guest editor of the special issue “Change Detection Using Multisource Remotely Sensed Imagery” of Remote Sensing (an open-access journal of the Multidisciplinary Digital Publishing Institute). She is a Member of IEEE. Xinli Ke (kexl@mail.hzau.edu.cn) received the B.S. degree in land planning and utilization from Huazhong Agricultural University, Wuhan, China, in 2001 and the M.S. degree in cartography and geographical information systems and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2006 and 2009, respectively. He is a professor in the College of Public Administration, Huazhong Agricultural University, Wuhan, 430070, China. Anlu Zhang (zhanganlu@mail.hzau.edu.cn) received the Ph.D. degree in 1999 from Huazhong Agricultural University, Wuhan, China. He has been a professor at Huazhong Agricultural University, Wuhan, 430070, China, since 2000. He is an executive director of the China Land Society; deputy director of the Academic Committee, China Land Society; deputy director of the Youth Working Committee, China Land Society; and a member of the Expert Committee, Land Remediation Center, Ministry of Land and Resources. Jón Atli Benediktsson (benedikt@hi.is) received the Cand.Sci. degree in electrical engineering from the University of Iceland, Reykjavik, Iceland, in 1984, and the M.S.E.E. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, Indiana, USA, in 1987 and 1990, respectively. He is with the Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavik, IS 107, Iceland. 
From 2009 to 2015, he was the prorector of science and academic affairs and a professor of electrical and computer engineering at the University of Iceland. In 2015, he was the rector of the University of Iceland. He is a cofounder of Oxymap, Reykjavik, a biomedical start-up company. He has authored and coauthored extensively in his fields of interest. His research interests include remote sensing, image analysis, pattern recognition, biomedical analysis of signals, and signal processing. He is a Fellow of IEEE.
REFERENCES
[1] S. Liu, Q. Du, X. Tong, A. Samat, and L. Bruzzone, “Unsupervised change detection in multispectral remote sensing images via spectral-spatial band expansion,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 9, pp. 3578–3587, 2019. doi: 10.1109/JSTARS.2019.2929514.
[2] X. Huang, Y. Cao, and J. Li, “An automatic change detection method for monitoring newly constructed building areas using time-series multi-view high-resolution optical satellite images,” Remote Sens. Environ., vol. 244, p. 111802, 2020. doi: 10.1016/j.rse.2020.111802.
[3] I. J, P. Coppin, K. Nackaerts, B. Muys, and E. Lambin, “Digital change detection methods in ecosystem monitoring: A review,” Int. J. Remote Sens., vol. 25, no. 9, pp. 1565–1596, 2004. doi: 10.1080/0143116031000101675.
[4] D. Lu, P. Mausel, E. Brondizio, and E. Moran, “Change detection techniques,” Int. J. Remote Sens., vol. 25, no. 12, pp. 2365–2401, 2004. doi: 10.1080/0143116031000139863.
[5] A. P. Tewkesbury, A. J. Comber, N. J. Tate, A. Lamb, and P. F. Fisher, “A critical synthesis of remotely sensed optical image change detection techniques,” Remote Sens. Environ., vol. 160, pp. 1–14, 2015. doi: 10.1016/j.rse.2015.01.006.
[6] M. Hussain, D. Chen, A. Cheng, H. Wei, and D. Stanley, “Change detection from remotely sensed images: From pixel-based to object-based approaches,” ISPRS J. Photogrammetry Remote Sens., vol. 80, pp. 91–106, June 2013. doi: 10.1016/j.isprsjprs.2013.03.006.
[7] G. Chen, G. J. Hay, L. M. T. Carvalho, and M. A. Wulder, “Object-based change detection,” Int. J. Remote Sens., vol. 33, no. 14, pp. 4434–4457, 2012. doi: 10.1080/01431161.2011.648285.
[8] S. Liu, D. Marinelli, L. Bruzzone, and F. Bovolo, “A review of change detection in multitemporal hyperspectral images: Current techniques, applications, and challenges,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 140–158, 2019. doi: 10.1109/MGRS.2019.2898520.
[9] F. Bovolo and L. Bruzzone, “The time variable in data fusion: A change detection perspective,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 3, no. 3, pp. 8–26, 2015. doi: 10.1109/MGRS.2015.2443494.
[10] H. Si Salah, S. E. Goldin, A. Rezgui, B. Nour El Islam, and S. AitAoudia, “What is a remote sensing change detection technique? Towards a conceptual framework,” Int. J. Remote Sens., vol. 41, no. 5, pp. 1788–1812, 2020. doi: 10.1080/01431161.2019.1674463.
[11] H. Han et al., “A mixed property-based automatic shadow detection approach for VHR multispectral remote sensing images,” Appl. Sci., vol. 8, no. 10, p. 1883, 2018. doi: 10.3390/app8101883.
[12] C. Toth and G. Jóźków, “Remote sensing platforms and sensors: A survey,” ISPRS J. Photogrammetry Remote Sens., vol. 115, pp. 22–36, May 2016. doi: 10.1016/j.isprsjprs.2015.10.004.
[13] D. Poli and T. Toutin, “Review of developments in geometric modelling for high resolution satellite pushbroom sensors,” Photogrammetric Rec., vol. 27, no. 137, pp. 58–73, 2012. doi: 10.1111/j.1477-9730.2011.00665.x.
[14] M. Dalla Mura, S. Prasad, F. Pacifici, P. Gamba, J. Chanussot, and J. A. Benediktsson, “Challenges and opportunities of multimodality and data fusion in remote sensing,” Proc. IEEE, vol. 103, no. 9, pp. 1585–1601, 2015. doi: 10.1109/JPROC.2015.2462751.
[15] R. Momeni, P. Aplin, and D. Boyd, “Mapping complex urban land cover from spaceborne imagery: The influence of spatial resolution, spectral band set and classification approach,” Remote Sens., vol. 8, no. 2, p. 88, 2016. doi: 10.3390/rs8020088.
[16] M. Volpi, D. Tuia, F. Bovolo, M. Kanevski, and L. Bruzzone, “Supervised change detection in VHR images using contextual information and support vector machines,” Int. J. Appl. Earth Observat. Geoinf., vol. 20, pp. 77–85, Feb. 2013. doi: 10.1016/j.jag.2011.10.013.
[17] J. P. Ardila, W. Bijker, V. A. Tolpekin, and A. Stein, “Multitemporal change detection of urban trees using localized region-based active contours in VHR images,” Remote Sens. Environ., vol. 124, pp. 413–426, 2012. doi: 10.1016/j.rse.2012.05.027.
[18] J. Gong, C. Liu, and X. Huang, “Advances in urban information extraction from high-resolution remote sensing imagery,” Sci. China Earth Sci., vol. 63, no. 4, pp. 463–475, 2020. doi: 10.1007/s11430-019-9547-x.
[19] R. Qin, X. Huang, A. Gruen, and G. Schmitt, “Object-based 3-D building change detection on multitemporal stereo images,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 8, no. 5, pp. 2125–2137, 2015. doi: 10.1109/JSTARS.2015.2424275.
[20] D. Liu et al., “Integration of historical map and aerial imagery to characterize long-term land-use change and landscape dynamics: An object-based analysis via Random Forests,” Ecol. Indicators, vol. 95, pp. 595–605, Dec. 2018. doi: 10.1016/j.ecolind.2018.08.004.
[21] X. Huang, D. Wen, J. Li, and R. Qin, “Multi-level monitoring of subtle urban changes for the megacities of China using high-resolution multi-view satellite imagery,” Remote Sens. Environ., vol. 196, pp. 56–75, July 2017. doi: 10.1016/j.rse.2017.05.001.
[22] G. Xian and C. Homer, “Updating the 2001 National Land Cover Database impervious surface products to 2006 using Landsat imagery change detection methods,” Remote Sens. Environ., vol. 114, no. 8, pp. 1676–1686, 2010. doi: 10.1016/j.rse.2010.02.018.
[23] M. Pesaresi et al., “A global human settlement layer from optical HR/VHR RS data: Concept and first results,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 6, no. 5, pp. 2102–2131, 2013. doi: 10.1109/JSTARS.2013.2271445.
[24] L. Bruzzone and F. Bovolo, “A novel framework for the design of change-detection systems for very-high-resolution remote sensing images,” Proc. IEEE, vol. 101, no. 3, pp. 609–630, 2012. doi: 10.1109/JPROC.2012.2197169.
[25] M. Lu, J. Chen, H. Tang, Y. Rao, P. Yang, and W. Wu, “Land cover change detection by integrating object-based data blending model of Landsat and MODIS,” Remote Sens. Environ., vol. 184, pp. 374–386, Oct. 2016. doi: 10.1016/j.rse.2016.07.028.
[26] S. Ye, D. Chen, and J. Yu, “A targeted change-detection procedure by combining change vector analysis and post-classification approach,” ISPRS J. Photogrammetry Remote Sens., vol. 114, pp. 115–124, Apr. 2016. doi: 10.1016/j.isprsjprs.2016.01.018.
[27] N. Longbotham, C. Chaapel, L. Bleiler, C. Padwick, W. J. Emery, and F. Pacifici, “Very high resolution multiangle urban
[28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] 94 classification analysis,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4, pp. 1155–1170, 2012. doi: 10.1109/TGRS.2011.2165548. D. Poli, F. Remondino, E. Angiuli, and G. Agugiaro, “Radiometric and geometric evaluation of GeoEye-1, WorldView-2 and Pléiades-1A stereo images for 3D information extraction,” ISPRS J. Photogrammetry Remote Sens., vol. 100, pp. 35–47, 2015/02/01/, 2015. doi: 10.1016/j.isprsjprs.2014.04.007. F. Pacifici, N. Longbotham, and W. J. Emery, “The importance of physical quantities for the analysis of multitemporal and multiangular optical very high spatial resolution images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6241–6256, 2014. doi: 10.1109/TGRS.2013.2295819. K. Jacobsen, “High resolution satellite imaging systems-an overview,” Photogrammetrie Fernerkundung Geoinf., vol. 2005, pp. 487–496, Jan. 2005. D. Wen, X. Huang, L. Zhang, and J. A. Benediktsson, “A novel automatic change detection method for urban high-resolution remotely sensed imagery based on multiindex scene representation,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 1, pp. 609– 625, 2016. doi: 10.1109/TGRS.2015.2463075. N. Tatar, M. Saadatseresht, H. Arefi, and A. Hadavand, “A robust object-based shadow detection method for cloud-free high resolution satellite images over urban areas and water bodies,” Adv. Space Res., vol. 61, no. 11, pp. 2787–2800, 2018. doi: 10.1016/j.asr.2018.03.011. A. Movia, A. Beinat, and F. Crosilla, “Shadow detection and removal in RGB VHR images for land use unsupervised classification,” ISPRS J. Photogrammetry Remote Sen., vol. 119, pp. 485– 495, Sept. 2016. doi: 10.1016/j.isprsjprs.2016.05.004. G. Liasis and S. Stavrou, “Satellite images analysis for shadow detection and building height estimation,” ISPRS J. Photogrammetry Remote Sens., vol. 119, pp. 437–450, Sept. 2016. doi: 10.1016/j.isprsjprs.2016.07.006. N. Kadhim and M. 
Mourshed, “A shadow-overlapping algorithm for estimating building heights from VHR satellite images,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 1, pp. 8–12, 2018. doi: 10.1109/LGRS.2017.2762424. X. Huang and L. Zhang, “A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery,” Photogrammetric Eng. Remote Sens., vol. 77, no. 7, pp. 721–732, 2011. doi: 10.14358/PERS.77.7.721. H. Song, B. Huang, and K. Zhang, “Shadow detection and reconstruction in high-resolution satellite images via morphological filtering and example-based learning,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2545–2554, 2014. doi: 10.1109/TGRS.2013.2262722. T. Blaschke et al., “Geographic object-based image analysis – Towards a new paradigm,” ISPRS J. Photogrammetry Remote Sens., vol. 87, pp. 180–191, Jan. 2014. doi: 10.1016/j.isprsjprs.2013.09.014. R. M. Haralick, K. Shanmugam, and I. H. Dinstein, “Textural features for image classification,” IEEE Trans. on systems, man, and cybernetics, vol. SMC-3, no. 6, pp. 610–621, 1973. doi: 10.1109/TSMC.1973.4309314. M. Hall-Beyer, “GLCM texture: A tutorial, version v3.0,” Univ. of Calgary, 2007. [Online]. Available: ttp://www.fp.ucalgary.ca/ mhallbey/tutorial.htm [41] S. Yao, S. Pan, T. Wang, C. Zheng, W. Shen, and Y. Chong, “A new pedestrian detection method based on combined HOG and LSS features,” Neurocomputing, vol. 151, pp. 1006–1014, Mar. 2015. doi: 10.1016/j.neucom.2014.08.080. [42] L. Zhang, X. Huang, B. Huang, and P. Li, “A pixel shape index coupled with spectral information for classification of high spatial resolution remotely sensed imagery,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 10, pp. 2950–2961, 2006. [43] K. Tan, X. Jin, A. Plaza, X. Wang, L. Xiao, and P. Du, “Automatic change detection in high-resolution remote sensing images by using a multiple classifier system and spectral– spatial features,” IEEE J. Select. Topics Appl. Earth Observat. 
Remote Sens., vol. 9, no. 8, pp. 3439–3451, 2016. doi: 10.1109/ JSTARS.2016.2541678. [44] Z. Li, W. Shi, M. Hao, and H. Zhang, “Unsupervised change detection using spectral features and a texture difference measure for VHR remote-sensing images,” Int. J. Remote Sens., vol. 38, no. 23, pp. 7302–7315, 2017. doi: 10.1080/01431161.2017.1375616. [45] D. Peng and Y. Zhang, “Object-based change detection from satellite imagery by segmentation optimization and multifeatures fusion,” Int. J. Remote Sens., vol. 38, no. 13, pp. 3886– 3905, 2017. doi: 10.1080/01431161.2017.1308033. [46] L. Zhang, B. Zhong, and A. Yang, “Building change detection using object-oriented LBP feature map in very high spatial resolution imagery,” in Proc. 10th Int. Workshop on the Anal. Multitemporal Remote Sens. Images (MultiTemp), 2019, pp. 1–4. doi: 10.1109/Multi-Temp.2019.8866919. [47] H. Liu, M. Yang, J. Chen, J. Hou, and M. Deng, “Line-constrained shape feature for building change detection in VHR remote sensing imagery,” ISPRS Int. J. Geo-Inform., vol. 7, no. 10, p. 410, 2018. doi: 10.3390/ijgi7100410. [48] M. Dalla Mura, J. A. Benediktsson, F. Bovolo, and L. Bruzzone, “An unsupervised technique based on morphological filters for change detection in very high resolution images,” IEEE Geosci. Remote Sens. Lett., vol. 5, no. 3, pp. 433–437, 2008. doi: 10.1109/LGRS.2008.917726. [49] N. Falco, M. Dalla Mura, F. Bovolo, J. A. Benediktsson, and L. Bruzzone, “Change detection in VHR images based on morphological attribute profiles,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 636–640, 2013. doi: 10.1109/LGRS.2012.2222340. [50] S. Liu, Q. Du, X. Tong, A. Samat, L. Bruzzone, and F. Bovolo, “Multiscale morphological compressed change vector analysis for unsupervised multiple change detection,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 10, no. 9, pp. 4124– 4137, 2017. doi: /10.1109/JSTARS.2017.2712119. [51] X. Huang, L. Zhang, and T. 
Zhu, “Building change detection from multitemporal high-resolution remotely sensed images based on a morphological building index,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 7, no. 1, pp. 105–115, 2014. [52] Y. Tang, X. Huang, and L. Zhang, “Fault-tolerant building change detection from urban high-resolution remote sensing imagery,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 5, pp. 1060–1064, 2013. [53] X. Huang, T. Zhu, L. Zhang, and Y. Tang, “A novel building change index for automatic building change detection from high-resolution remote sensing imagery,” Remote sensing letters, vol. 5, no. 8, pp. 713–722, 2014. doi: 10.1080/2150704X.2014.963732. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[54] G. R. Cross and A. K. Jain, “Markov random field texture models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-5, no. 1, pp. 25–39, 1983. doi: 10.1109/TPAMI.1983.4767341. [55] T. Xu, I. D. Moore, and J. C. Gallant, “Fractals, fractal dimensions and landscapes—A review,” Geomorphology, vol. 8, no. 4, pp. 245–262, 1993. doi: 10.1016/0169-555X(93)90022-T. [56] C. Benedek, M. Shadaydeh, Z. Kato, T. Szirányi, and J. Zerubia, “Multilayer Markov Random Field models for change detection in optical remote sensing images,” ISPRS J. Photogrammetry Remote Sensing, vol. 107, pp. 22–37, Sept. 2015. doi: 10.1016/j. isprsjprs.2015.02.006. [57] L. Bruzzone and D. F. Prieto, “An adaptive semiparametric and context-based approach to unsupervised change detection in multitemporal remote-sensing images,” IEEE Trans. Image Process., vol. 11, no. 4, pp. 452–466, 2002. doi: 10.1109/TIP.2002.999678. [58] A. Ghosh, B. N. Subudhi, and L. Bruzzone, “Integration of Gibbs Markov random field and Hopfield-type neural networks for unsupervised change detection in remotely sensed multitemporal images,” IEEE Trans. Image Process., vol. 22, no. 8, pp. 3087–3096, 2013. doi: 10.1109/TIP.2013.2259833. [59] B. N. Subudhi, F. Bovolo, A. Ghosh, and L. Bruzzone, “Spatiocontextual fuzzy clustering with Markov random field model for change detection in remotely sensed images,” Optics Laser Technol., vol. 57, pp. 284–292, Apr. 2014. doi: 10.1016/j.optlastec.2013.10.003. [60] H. Yu, W. Yang, G. Hua, H. Ru, and P. Huang, “Change detection using high resolution remote sensing images based on active learning and Markov random fields,” Remote Sensing, vol. 9, no. 12, p. 1233, 2017. doi: 10.3390/rs9121233. [61] S. Aleksandrowicz, A. Wawrzaszek, W. Drzewiecki, and M. Krupiński, “Change detection using global and local multifractal description,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 8, pp. 1183–1187, 2016. doi: 10.1109/LGRS.2016.2574940. [62] S. Luan, C. Chen, B. Zhang, J. Han, and J. 
Liu, “Gabor convolutional networks,” IEEE Trans. Image Process., vol. 27, no. 9, pp. 4357–4366, 2017. [63] Z. Li, W. Shi, H. Zhang, and M. Hao, “Change detection based on Gabor wavelet features for very high resolution remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 5, pp. 783–787, 2017. doi: 10.1109/LGRS.2017.2681198. [64] C. Wei, P. Zhao, X. Li, Y. Wang, and F. Liu, “Unsupervised change detection of VHR remote sensing images based on multi-resolution Markov Random Field in wavelet domain,” Int. J. Remote Sens., vol. 40, no. 20, pp. 7750–7766, 2019. doi: 10.1080/01431161.2019.1602792. [65] Q. Li, X. Huang, D. Wen, and H. Liu, “Integrating multiple textural features for remote sensing image change detection,” Photogrammetric Eng. Remote Sens., vol. 83, no. 2, pp. 109–121, 2017. doi: 10.14358/PERS.83.2.109. [66] B. Hou, Y. Wang, and Q. Liu, “Change detection based on deep features and low rank,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2418–2422, 2017. doi: 10.1109/LGRS.2017.2766840. [67] S. Saha, F. Bovolo, and L. Bruzzone, “Unsupervised deep change vector analysis for multiple-change detection in VHR images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3677–3693, 2019. doi: 10.1109/TGRS.2018.2886643. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [68] T. Zhan, M. Gong, J. Liu, and P. Zhang, “Iterative feature mapping network for detecting multiple changes in multi-source remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 146, pp. 38–51, Dec. 2018. doi: 10.1016/j.isprsjprs.2018.09.002. [69] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learning Res., vol. 11, pp. 3371–3408, Dec. 2010. [70] P. Zhang, M. Gong, L. Su, J. Liu, and Z. 
Li, “Change detection based on deep feature representation and mapping transformation for multi-spatial-resolution remote sensing images,” ISPRS J. Photogrammetry Remote Sens., vol. 116, pp. 24–41, June 2016. doi: 10.1016/j.isprsjprs.2016.02.013. [71] L. Su, M. Gong, P. Zhang, M. Zhang, J. Liu, and H. Yang, “Deep learning and mapping based ternary change detection for information unbalanced images,” Pattern Recogn., vol. 66, pp. 213–228, June 2017. doi: 10.1016/j.patcog.2017.01.002. [72] G. Liu, L. Li, L. Jiao, Y. Dong, and X. Li, “Stacked Fisher autoencoder for SAR change detection,” Pattern Recogn., vol. 96, p. 106,971, Dec. 2019. doi: 10.1016/j.patcog.2019.106971. [73] N. Lv, C. Chen, T. Qiu, and A. K. Sangaiah, “Deep learning and superpixel feature extraction based on contractive autoencoder for change detection in SAR images,” IEEE Trans. Ind. Inf., vol. 14, no. 12, pp. 5530–5538, 2018. doi: 10.1109/TII.2018.2873492. [74] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 5, no. 4, pp. 8–36, 2017. doi: 10.1109/MGRS.2017.2762307. [75] T. Zhan, M. Gong, X. Jiang, and M. Zhang, “Unsupervised scale-driven change detection with deep spatial-spectral features for VHR images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 8, pp. 1–13, 2020. doi: 10.1109/TGRS.2020.2968098. [76] S. Saha, L. Mou, C. Qiu, X. X. Zhu, F. Bovolo, and L. Bruzzone, “Unsupervised deep joint segmentation of multitemporal highresolution images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 1–13, 2020. doi: 10.1109/TGRS.2020.2990640. [77] Q. Wang, X. Zhang, G. Chen, F. Dai, Y. Gong, and K. Zhu, “Change detection based on Faster R-CNN for high-resolution remote sensing images,” Remote Sensing Letters, vol. 9, no. 10, pp. 923–932, 2018. doi: 10.1080/2150704X.2018.1492172. [78] J. 
Liu et al., “Convolutional neural network-based transfer learning for optical aerial images change detection,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 1, pp. 127–131, 2019. doi: 10.1109/LGRS.2019.2916601. [79] M. Volpi and D. Tuia, “Dense semantic labeling of subdecimeter resolution images with convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 881–893, 2016. doi: 10.1109/TGRS.2016.2616585. [80] L. Gueguen and R. Hamid, “Toward a generalizable image representation for large-scale change detection: Application to generic damage analysis,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3378–3387, 2016. doi: 10.1109/TGRS.2016.2516402. [81] R. Caye Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, “Multitask learning for large-scale semantic change detection,” Comput. Vision Image Understanding, vol. 187, p. 102783, 2019. doi: 10.1016/j.cviu.2019.07.003. 95
[82] R. Gupta et al., “Creating xBD: A dataset for assessing building damage from satellite imagery,” in Proc. IEEE Conf. Comput. Vision and Pattern Recogn. Workshops, 2019, pp. 10–17. [83] J. Zhu, Y. Su, Q. Guo, and T. C. Harmon, “Unsupervised objectbased differencing for land-cover change detection,” Photogrammetric Eng. Remote Sens., vol. 83, no. 3, pp. 225–236, 2017. doi: 10.14358/PERS.83.3.225. [84] D. Ming, J. Li, J. Wang, and M. Zhang, “Scale parameter selection by spatial statistics for GeOBIA: Using mean-shift based multi-scale segmentation as an example,” ISPRS J. Photogrammetry Remote Sens., vol. 106, pp. 28–41, Aug. 2015. doi: 10.1016/ j.isprsjprs.2015.04.010. [85] P. Xiao, M. Yuan, X. Zhang, X. Feng, and Y. Guo, “Cosegmentation for object-based building change detection from highresolution remotely sensed images,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 3, pp. 1587–1603, 2017. doi: 10.1109/ TGRS.2016.2627638. [86] Y. Liu, Q. Guo, and M. Kelly, “A framework of region-based spatial relations for non-overlapping features and its application in object based image analysis,” ISPRS J. Photogrammetry Remote Sens., vol. 63, no. 4, pp. 461–475, 2008. doi: 10.1016/ j.isprsjprs.2008.01.007. [87] M. Kim and M. Madden, Xu, Bo, “GEOBIA vegetation mapping in great smoky mountains national park with spectral and nonspectral ancillary information,” Photogrammetric Eng. Remote Sensing, vol. 76, no. 2, pp. 137–149, 2010. doi: 10.14358/PERS.76.2.137. [88] Z. Lv, T. Liu, and J. A. Benediktsson, “Object-oriented key point vector distance for binary land cover change detection using VHR remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 6524–6533, 2020. doi: 10.1109/ TGRS.2020.2977248. [89] F. Bovolo, “A multilevel parcel-based approach to change detection in very high resolution multitemporal images,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 1, pp. 33–37, 2009. doi: 10.1109/LGRS.2008.2007429. [90] C. Geiß, M. Klotz, A. Schmitt, and H. 
Taubenböck, “Objectbased morphological profiles for classification of remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 5952–5963, 2016. doi: 10.1109/TGRS.2016.2576978. [91] J. Liang et al., “A comparison of two object-oriented methods for land-use/cover change detection with SPOT 5 imagery,” Sensor Lett., vol. 10, no. 1, pp. 415–424, 2012. doi: 10.1166/ sl.2012.1865. [92] W. Yu, W. Zhou, Y. Qian, and J. Yan, “A new approach for land cover classification and change analysis: Integrating backdating and an object-based method,” Remote Sensing Environment, vol. 177, pp. 37–47, May 2016. doi: 10.1016/j.rse.2016.02.030. [93] X. Zhang, S. Du, Q. Wang, and W. Zhou, “Multiscale geoscene segmentation for extracting urban functional zones from VHR satellite images,” Remote Sens., vol. 10, no. 2, p. 281, 2018. doi: 10.3390/rs10020281. [94] H. Liu, X. Huang, D. Wen, and J. Li, “The use of landscape metrics and transfer learning to explore urban villages in China,” Remote Sens., vol. 9, no. 4, p. 365, 2017. doi: 10.3390/rs9040365. [95] J. Zhou, B. Yu, and J. Qin, “Multi-level spatial analysis for change detection of urban vegetation at individual tree scale,” 96 Remote Sens., vol. 6, no. 9, pp. 9086–9103, 2014. doi: 10.3390/ rs6099086. [96] M. A. Aguilar, M. D. M. Saldana, and F. J. Aguilar, “Generation and quality assessment of stereo-extracted DSM from GeoEye-1 and WorldView-2 imagery,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 2, pp. 1259–1271, 2013. doi: 10.1109/TGRS.2013.2249521. [97] X. Huang, H. Chen, and J. Gong, “Angular difference feature extraction for urban scene classification using ZY-3 multi-angle high-resolution satellite imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 135, pp. 127–141, Jan. 2018. doi: 10.1016/j. isprsjprs.2017.11.017. [98] H. Chaabouni-Chouayakh, I. R. Arnau, and P. Reinartz, “Towards automatic 3-D change detection through multispectral and digital elevation model information fusion,” Int. J. 
Image Data Fusion, vol. 4, no. 1, pp. 89–101, 2013. doi: 10.1080/19479832.2012.739577. [99] J. Tian, P. Reinartz, P. d’Angelo, and M. Ehlers, “Region-based automatic building and forest change detection on Cartosat-1 stereo imagery,” ISPRS J. Photogrammetry Remote Sens., vol. 79, pp. 226–239, May 2013. doi: 10.1016/j.isprsjprs.2013.02.017. [100] W. Tu et al., “Portraying urban functional zones by coupling remote sensing imagery and human sensing data,” Remote Sens., vol. 10, no. 1, p. 141, 2018. doi: 10.3390/rs10010141. [101] C. Liu, X. Huang, Z. Zhu, H. Chen, X. Tang, and J. Gong, “Automatic extraction of built-up area from ZY3 multi-view satellite imagery: Analysis of 45 global cities,” Remote Sens. Environ., vol. 226, pp. 51–73, June 2019. doi: 10.1016/j.rse.2019.03.033. [102] R. Duca and F. D. Frate, “Hyperspectral and multiangle CHRIS–PROBA images for the generation of land cover maps,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 10, pp. 2857–2866, 2008. doi: 10.1109/TGRS.2008.2000741. [103] Y. Yan, L. Deng, X. Liu, and L. Zhu, “Application of UAV-based multi-angle hyperspectral remote sensing in fine vegetation classification,” Remote Sens., vol. 11, no. 23, p. 2753, 2019. doi: 10.3390/rs11232753. [104] M. Zanetti and L. Bruzzone, “A theoretical framework for change detection based on a compound multiclass statistical model of the difference image,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 2, pp. 1129–1143, 2017. doi: 10.1109/ TGRS.2017.2759663. [105] Y. T. Solano-Correa, F. Bovolo, and L. Bruzzone, “An approach to multiple change detection in VHR optical images based on iterative clustering and adaptive thresholding,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 8, pp. 1–5, 2019. doi: 10.1109/ LGRS.2019.2896385. [106] J. S. Deng, K. Wang, Y. H. Deng, and G. J. Qi, “PCA‐based land‐ use change detection and analysis using multitemporal and multisensor satellite data,” Int. J. Remote Sens., vol. 29, no. 16, pp. 4823–4838, 2008. 
doi: 10.1080/01431160801950162. [107] A. Tahraoui, R. Kheddam, A. Bouakache, and A. Belhadj-Aissa, “Land change detection using multivariate alteration detection and Chi squared test thresholding,” in Proc. 4th Int. Conf. Adv. Technol. Signal and Image Process. (ATSIP), 2018, pp. 1–6. doi: 10.1109/ATSIP.2018.8364501. [108] C. Wu, L. Zhang, and L. Zhang, “A scene change detection framework for multi-temporal very high resolution remote IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
sensing images,” Signal Process., vol. 124, pp. 184–197, July 2016. doi: 10.1016/j.sigpro.2015.09.020. [109] X. Zhang, R. Fan, L. Ma, X. Liao, and X. Chen, “Change detection in very high-resolution images based on ensemble CNNs,” Int. J. Remote Sens., vol. 41, no. 12, pp. 4757–4779, 2020. doi: 10.1080/01431161.2020.1723818. [110] D. Peng, Y. Zhang, and H. Guan, “End-to-end change detection for high resolution satellite images using improved UNet++,” Remote Sens., vol. 11, no. 11, p. 1382, 2019. doi: 10.3390/rs11111382. [111] L. Mou, L. Bruzzone, and X. X. Zhu, “Learning spectral-spatialtemporal features via a recurrent convolutional neural network for change detection in multispectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 924–935, 2019. doi: 10.1109/TGRS.2018.2863224. [112] T. Bao, C. Fu, T. Fang, and H. Huo, “PPCNET: A combined patch-level and pixel-level end-to-end deep network for high-resolution remote sensing image change detection,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 10, pp. 1–5, 2020. doi: 10.1109/LGRS.2019.2955309. [113] C. Zhang et al., “A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,” ISPRS J. Photogrammetry Remote Sensing, vol. 166, pp. 183–200, Aug. 2020. doi: 10.1016/j.isprsjprs.2020.06.003. [114] T. Lei, Y. Zhang, Z. Lv, S. Li, S. Liu, and A. K. Nandi, “Landslide inventory mapping from bitemporal images using deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 6, pp. 982–986, 2019. doi: 10.1109/LGRS.2018.2889307. [115] W. Wiratama, J. Lee, S.-E. Park, and D. Sim, “Dual-dense convolution network for change detection of high-resolution panchromatic imagery,” Appl. Sci., vol. 8, no. 10, p. 1785, 2018. doi: 10.3390/app8101785. [116] W. Zhang and X. Lu, “The spectral-spatial joint learning for change detection in multispectral imagery,” Remote Sens., vol. 11, no. 3, p. 240, 2019. doi: 10.3390/rs11030240. [117] A. 
Song and J. Choi, “Fully convolutional networks with multiscale 3D filters and transfer learning for change detection in high spatial resolution satellite images,” Remote Sens., vol. 12, no. 5, p. 799, 2020. doi: 10.3390/rs12050799. [118] M. Zhai, H. Liu, and F. Sun, “Lifelong learning for scene recognition in remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 9, pp. 1472–1476, 2019. doi: 10.1109/LGRS.2019.2897652. [119] M. Rußwurm, S. Wang, M. Korner, and D. Lobell, “Meta-learning for few-shot land cover classification,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. Workshops, 2020, pp. 200–201. [120] R. Hedjam, A. Abdesselam, and F. Melgani, “Change detection in unlabeled optical remote sensing data using Siamese CNN,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 13, pp. 4178–4187, July 2020. doi: 10.1109/JSTARS.2020.3009116. [121] H. Chen, C. Wu, B. Du, L. Zhang, and L. Wang, “Change detection in multisource VHR images via deep siamese convolutional multiple-layers recurrent neural network,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 4, pp. 2848–2864, 2020. doi: 10.1109/ TGRS.2019.2956756. [122] J. Liu, M. Gong, A. K. Qin, and K. C. Tan, “Bipartite differential neural network for unsupervised image change detection,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 3, pp. 876–890, 2020. doi: 10.1109/TNNLS.2019.2910571. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [123] X. Junfeng, Z. Baoming, G. Haitao, L. Jun, and L. Yuzhun, “Combining iterative slow feature analysis and deep feature learning for change detection in high-resolution remote sensing images,” J. Appl. Remote Sens., vol. 13, no. 2, pp. 1–16, 2019. doi: 10.1117/1.JRS.13.024506. [124] J. Fan, K. Lin, and M. Han, “A novel joint change detection approach based on weight-clustering sparse autoencoders,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 2, pp. 685–699, 2019. doi: 10.1109/JSTARS.2019.2892951. [125] A. 
Argyridis and D. P. Argialas, “Building change detection through multi-scale GEOBIA approach by integrating deep belief networks with fuzzy ontologies,” Int. J. Image Data Fusion, vol. 7, no. 2, pp. 148–171, 2016. [126] P. F. Alcantarilla, S. Stent, G. Ros, R. Arroyo, and R. Gherardi, “Street-view change detection with deconvolutional networks,” Autonom. Robots, vol. 42, no. 7, pp. 1301–1322, 2018. doi: 10.1007/s10514-018-9734-5. [127] R. Jing et al., “Object-based change detection for VHR remote sensing images based on a Trisiamese-LSTM,” Int. J. Remote Sens., vol. 41, no. 16, pp. 6209–6231, 2020. doi: 10.1080/01431161.2020.1734253. [128] J. Geng, J. Fan, H. Wang, and X. Ma, “Change detection of marine reclamation using multispectral images via patchbased recurrent neural network,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2017, pp. 612–615. doi: 10.1109/ IGARSS.2017.8127028. [129] H. Lyu, H. Lu, and L. Mou, “Learning a transferable change rule from a recurrent neural network for land cover change detection,” Remote Sens., vol. 8, no. 6, p. 506, 2016. doi: 10.3390/ rs8060506. [130] M. Gong, X. Niu, P. Zhang, and Z. Li, “Generative adversarial networks for change detection in multispectral imagery,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2310–2314, 2017. doi: 10.1109/LGRS.2017.2762694. [131] M. Gong, Y. Yang, T. Zhan, X. Niu, and S. Li, “A generative discriminatory classified network for change detection in multispectral imagery,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 1, pp. 321–333, 2019. doi: 10.1109/ JSTARS.2018.2887108. [132] S. Saha, L. Mou, X. X. Zhu, F. Bovolo, and L. Bruzzone, “Semisupervised change detection using graph convolutional network,” IEEE Geosci. Remote Sens. Lett., 2020. [133] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Adv. Neural Information Process. Syst., 2012, pp. 1097–1105. [134] K. Simonyan and A. 
Zisserman, “Ver y deep convolutional n e t works for large-scale image recognition,” 2014, arXiv: 1409.1556. [135] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inceptionv4, inception-resnet and the impact of residual connections on learning,” 2016, arXiv:1602.07261. [136] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recogn., 2016, pp. 770–778. [137] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vision Pattern Recogn., 2017, pp. 4700–4708. 97
The CCSDS 123.0-B-2 “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” Standard
A comprehensive review

MIGUEL HERNÁNDEZ-CABRONERO, AARON B. KIELY, MATTHEW KLIMESH, IAN BLANES, JONATHAN LIGO, ENRICO MAGLI, AND JOAN SERRA-SAGRISTÀ

Digital Object Identifier 10.1109/MGRS.2020.3048443
Date of current version: 10 February 2021
0274-6638/21 © 2021 IEEE. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, DECEMBER 2021

The Consultative Committee for Space Data Systems (CCSDS) published the CCSDS 123.0-B-2, “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” standard. This standard extends the previous issue, CCSDS 123.0-B-1, which supported only lossless compression, while maintaining backward compatibility. The main novelty of the new issue is support for near-lossless compression, i.e., lossy compression with user-defined absolute and/or relative error limits in the reconstructed images. This new feature is achieved via closed-loop quantization of prediction errors. Two further additions arise from the new near-lossless support: first, the calculation of predicted sample values using sample representatives that may not be equal to the reconstructed sample values, and, second, a new hybrid entropy coder designed to provide enhanced compression performance for low-entropy data, prevalent when nonlossless compression is used. These new features enable significantly smaller compressed data volumes than those achievable with CCSDS 123.0-B-1 while controlling the quality of the decompressed images. As a result, larger amounts of valuable information can be retrieved given a set of bandwidth and energy consumption constraints.

BACKGROUND
During the past 30 years, multispectral imaging and hyperspectral imaging (HSI) have become a staple tool used for geoscience remote sensing and Earth observation [1],
[2]. This type of imagery enables the simultaneous registration of multiple parts of the electromagnetic spectrum, providing invaluable information for many detection, classification, and unmixing problems [3]. As a result, today, remote sensing HSI is used in many commercial, scientific, and defense areas, including precision agriculture, mining, forestry, coastal and oceanic observation, intelligence, and disaster monitoring [3]–[6]. Due to the growing quantity of deployed sensors [7], the number of public and private remote sensing stakeholders [4], and the ongoing effort to improve the analysis of retrieved images [8]–[19], the importance of HSI is likely to increase in the future.

Images produced by multispectral and hyperspectral sensors consist of multiple spectral bands, instead of the three (red, green, and blue) present in traditional color images. Depending on the application and the available hardware, the number of registered bands can be on the order of tens, hundreds, and even thousands [20]. Thus, HSI generates significantly larger volumes of data compared to traditional imagers. Moreover, the spatial resolution of the deployed sensors also follows a rising trend, further increasing the amount of data produced. For instance, the HyspIRI sensor developed by NASA can produce up to 5 TB of data per day [21]. However, the downlink channel capacity between the remote sensing devices and the ground stations is constrained, which limits the amount and quality of the retrieved data [22]. Data compression is typically applied to reduce the amount of data to be downloaded, hence improving effective transmission capacity [23]–[27]. Due to hardware and energy constraints, employed algorithms must be tailored to attain a beneficial tradeoff between complexity and efficiency [22], [28].

When lossless compression is applied to images, the resulting compressed data suffice to reconstruct identical copies of the originals.
On the other hand, lossy compression enables the transmission of even smaller data volumes at the cost of the reconstructed images not being identical to the originals. Among lossy compression algorithms, those that provide user-controlled bounds on the maximum error introduced in any sample are referred to as near lossless. In spite of the distortion introduced by lossy and near-lossless methods, several studies have concluded that reconstructed images can be successfully used for the intended analysis tasks [29]. This is sometimes observed for compressed images up to 25 times smaller than the original ones [30]. Nevertheless, a successful analysis can be performed only when the amount of loss is adequate for the type of images and the task at hand [29], [31].

One of the main advantages of near-lossless compressors is that they guarantee the accuracy of all the reconstructed samples in an image. This is in contrast to regular lossy compression approaches, which typically provide competitive average distortion results but no assurance about the fidelity of any given set of samples. Regardless of the employed compression regime, compression algorithms must meet very stringent limitations in terms of complexity and required computational resources [32]. This constraint is particularly relevant for small satellites and CubeSats, which have attracted much scientific and industrial interest recently [4], [33].

The CCSDS, founded in 1982, publishes the standards for spaceflight communication used in more than 900 space missions to date. (An updated list of space missions using CCSDS standards can be found at https://public.ccsds.org/implementations/missions.aspx.) CCSDS standards enable cooperation among space agencies and with industrial associates, seeking enhanced interoperability, reliability, and cost-effectiveness.
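The difference between a per-sample error guarantee and a good average distortion, as discussed above for near-lossless versus regular lossy compression, can be illustrated with a small Python sketch. The sample values, the two reconstructions, and the helper names (`peak_absolute_error`, `mean_squared_error`, `is_near_lossless`) are hypothetical and chosen only for illustration; they do not come from the standard or from the cited studies.

```python
def peak_absolute_error(original, reconstructed):
    """Largest absolute difference between any original and reconstructed sample."""
    return max(abs(o - r) for o, r in zip(original, reconstructed))


def mean_squared_error(original, reconstructed):
    """Average squared difference, a typical 'average distortion' measure."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def is_near_lossless(original, reconstructed, max_error):
    """True if every reconstructed sample is within max_error of the original."""
    return peak_absolute_error(original, reconstructed) <= max_error


# Made-up samples from one spectral band (illustrative values only).
original = [100, 102, 98, 101, 250, 99, 100, 103]

# Hypothetical reconstruction A: every sample is off by exactly 2.
recon_bounded = [102, 100, 100, 99, 248, 101, 98, 101]

# Hypothetical reconstruction B: all samples exact except one, which is off by 5.
recon_average = [100, 102, 98, 101, 245, 99, 100, 103]

if __name__ == "__main__":
    for name, recon in (("A", recon_bounded), ("B", recon_average)):
        print(
            name,
            "MSE:", mean_squared_error(original, recon),
            "peak error:", peak_absolute_error(original, recon),
            "within bound of 2:", is_near_lossless(original, recon, 2),
        )
```

In this made-up case, reconstruction B has the lower mean squared error, yet only reconstruction A satisfies a maximum-error bound of 2. This is the kind of per-sample assurance that an average distortion figure alone cannot provide.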
The latest CCSDS compression standard (CCSDS 123.0-B-2, “Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression” [34]), the central topic of this article, supersedes CCSDS 123.0-B-1 [35] while maintaining backward compatibility. In the CCSDS naming convention, suffixes “-1” and “-2” denote the first and second issues, respectively, of a standard. Hereafter, CCSDS 123.0-B-1 and CCSDS 123.0-B-2 are also denoted as Issue 1 and Issue 2, respectively.

Perhaps the most relevant novel feature of Issue 2 is a new near-lossless compression regime, enabled by a closed-loop scalar quantizer in the prediction stage [36]. Note that this in-loop quantization approach enables a higher compression performance than quantization of input samples before prediction [26]. With this new feature, users can specify the maximum error limits (absolute and/or relative) introduced in the decompressed images. Fidelity settings can vary from band to band and can be periodically updated within an image.

Another new feature of Issue 2 is a hybrid entropy coder option. It is specifically designed to provide improved performance on low-entropy data, i.e., for the case when prediction errors tend to be small compared to the quantizer step size. The hybrid encoder extends the sample-adaptive codes of CCSDS 123.0-B-1 with 16 additional variable-to-variable-length codes, which can represent multiple input symbols using a single codeword. To guarantee backward compatibility, both lossless and near-lossless compression can be performed with either of CCSDS 123.0-B-1’s original entropy coders or with the new hybrid option.

A third novelty in the new standard is a new mode within the predictor stage called narrow local sums, which is designed to facilitate efficient hardware implementations.
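The closed-loop (in-loop) quantization idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure defined by the standard: the predictor is a toy one (the previously reconstructed sample), the uniform quantizer with step size 2m + 1 is the textbook near-lossless form, the way the absolute and relative limits are combined into a per-sample bound m is an assumption, and all function and parameter names are hypothetical. Clipping to the valid sample range and entropy coding are omitted.

```python
def max_error(predicted, abs_limit=None, rel_limit=None, dyn_range_bits=16):
    """Per-sample error bound m derived from absolute and/or relative limits.

    Assumption: the relative bound scales with the magnitude of the predicted
    value, and the tighter of the two bounds is used when both are given.
    """
    bounds = []
    if abs_limit is not None:
        bounds.append(abs_limit)
    if rel_limit is not None:
        bounds.append((rel_limit * abs(predicted)) >> dyn_range_bits)
    return min(bounds) if bounds else 0  # no limits given -> lossless


def quantize(residual, m):
    """Uniform quantizer with step 2m + 1; reconstruction error is at most m."""
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + m) // (2 * m + 1))


def encode(samples, **limits):
    """Return quantizer indices; prediction uses reconstructed values only."""
    indices = []
    recon_prev = 0
    for s in samples:
        predicted = recon_prev  # toy predictor, NOT the standard's predictor
        m = max_error(predicted, **limits)
        q = quantize(s - predicted, m)
        indices.append(q)
        # Closed loop: track what the decoder will reconstruct.
        recon_prev = predicted + q * (2 * m + 1)
    return indices


def decode(indices, **limits):
    """Mirror of encode(): rebuild samples from quantizer indices."""
    out = []
    recon_prev = 0
    for q in indices:
        predicted = recon_prev
        m = max_error(predicted, **limits)
        recon_prev = predicted + q * (2 * m + 1)
        out.append(recon_prev)
    return out


if __name__ == "__main__":
    samples = [1000, 1003, 1010, 990, 995, 1200]  # made-up values
    decoded = decode(encode(samples, abs_limit=2), abs_limit=2)
    print(decoded, max(abs(a - b) for a, b in zip(samples, decoded)))
```

Because the encoder predicts from the same reconstructed values the decoder will have, quantization errors do not accumulate from sample to sample, and the bound m holds for every sample. Setting the limits to zero (or omitting them) reduces the sketch to lossless coding.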
Yet another change introduced in the new standard is added support for optional supplementary information tables, which can provide ancillary image or instrument information, e.g., to identify the wavelengths associated with each spectral band.

This article provides a comprehensive overview of Issue 2, paying special attention to the new concepts and capabilities not present in Issue 1. The content presented hereafter extends that of a previous conference work [36]. The following overview is more in depth and assumes no
previous knowledge of Issue 1, and a performance evaluation is included. Furthermore, the experimental results discussed here complement those in [37] by providing both a quantitative and qualitative comparison to other relevant compression methods.

THE NEW CCSDS 123.0-B-2 STANDARD

PREVIOUS WORK
The CCSDS Data Compression Working Group (1995–2007; 2020–present) and the Multispectral and Hyperspectral Data Compression (MHDC) Working Group (2007–2020) have developed and maintained several compression standards applicable to remote sensing HSI, listed chronologically in Table 1. The CCSDS 121.0-B-1 standard describes a general-purpose adaptive entropy coder. In CCSDS 121.0-B-2, the efficiency and flexibility of this entropy coder were enhanced by allowing larger block sizes and the possibility of using a restricted set of codewords. (As this entropy coder is available in the new CCSDS 123.0-B-2 standard, an overview is provided later in the “Block-Adaptive Coder” section.) The CCSDS 122.0-B-1 standard was designed specifically for image data and supports both lossless and lossy regimes. It consists of a spatial discrete wavelet transform followed by a bit-plane coder. The CCSDS 122.1-B-1 standard extends CCSDS 122.0-B-1 by allowing the application of spectral decorrelation transforms. To provide compatibility between the 122.0 and 122.1 standards, a second issue of 122.0 (CCSDS 122.0-B-2) was also published. Finally, the CCSDS 123.0-B-1 standard formalizes a predictive coding scheme for multispectral and hyperspectral data. This standard is the immediate predecessor of the one addressed in this article, and their functional blocks are described in subsequent subsections.

Several hardware implementations of the CCSDS 123.0-B-1 standard can be found in the literature.
In [44], a parallelization technique is described that achieves from 31 to 123 megasamples per second (Ms/s), respectively, on the Xilinx V-7 XC7VX690T and V-5QV FX130T field-programmable gate arrays (FPGAs). In [45], parallelization using C-slow retiming is proposed, which achieves a throughput of up to 213 Ms/s on a space-grade Virtex-5QV FPGA. In [46], another implementation, this one with a throughput of 147 Ms/s on a Xilinx Zynq-7020 FPGA, is described. The FPGA design discussed in [47] allows parallel processing of any number of samples, provided that resource constraints are met. This enables configurable tradeoffs between throughput and power consumption. In [48], a low-cost FPGA design is described for the prediction block of CCSDS 123.0-B-1, with a throughput as high as 20 Ms/s on a Xilinx Zynq-7000 FPGA. In [49]–[51], low-complexity and low-occupancy FPGA designs are proposed. These implementations are designed to be independent and combinable in a plug-and-play fashion. The latest version of this system, referred to as SHyLoC 2.0, yields a throughput of 150 Ms/s on a Xilinx Virtex XQR5VFX130 FPGA.

TABLE 1. A CHRONOLOGY OF CCSDS DATA-COMPRESSION STANDARDS.

NAME             RELEASE          STATUS    REGIME   MULTISPECTRAL
121.0-B-1 [38]   May 1997         Retired   LL       No
122.0-B-1 [39]   May 2005         Retired   LL, LS   No
121.0-B-2 [40]   April 2012       Retired   LL       No
123.0-B-1 [35]   May 2012         Retired   LL       Yes
122.0-B-2 [41]   September 2017   Active    LL, LS   No
122.1-B-1 [42]   September 2017   Active    LL, LS   Yes
123.0-B-2 [34]   February 2019    Active    LL, NL   Yes
121.0-B-3 [43]   August 2020      Active    LL       No

Active recommendations (blue books) and retired, i.e., superseded, standards (silver books) are identified in the STATUS column. Lossless, lossy, and near-lossless compression regimes are denoted as LL, LS, and NL, respectively. The MULTISPECTRAL column indicates whether or not several bands can be compressed simultaneously.
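To put the throughput figures quoted above into perspective, the following Python sketch estimates how long a compressor running at a given rate would need to process one hyperspectral cube. The cube dimensions (512 × 680 pixels, 224 bands) and the 16-bit sample depth are assumptions chosen only for illustration, and the function names are hypothetical. The throughputs are the ones reported in the text for [45], [46], [48], and SHyLoC 2.0, and the estimate ignores any I/O or configuration overhead.

```python
def samples_in_cube(rows, cols, bands):
    """Total number of samples in a rows x cols x bands image cube."""
    return rows * cols * bands


def raw_size_bytes(n_samples, bits_per_sample=16):
    """Uncompressed size, assuming a fixed bit depth per sample."""
    return n_samples * bits_per_sample // 8


def processing_time_s(n_samples, throughput_msps):
    """Seconds needed at a sustained throughput given in megasamples/s."""
    return n_samples / (throughput_msps * 1e6)


if __name__ == "__main__":
    # Hypothetical cube dimensions (assumption, not taken from the article).
    n = samples_in_cube(rows=512, cols=680, bands=224)
    print(f"samples: {n}, raw size: {raw_size_bytes(n) / 1e6:.1f} MB")

    # Throughputs in Ms/s as quoted in the text for each implementation.
    quoted = {"[48]": 20, "[46]": 147, "SHyLoC 2.0": 150, "[45]": 213}
    for label, msps in quoted.items():
        print(f"{label}: {processing_time_s(n, msps):.3f} s per cube")
```

Such a back-of-the-envelope estimate is only a rough guide, since the quoted throughputs were measured on different FPGA devices and under different configurations.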
Hardware designs for CCSDS 123.0-B-2 are currently in progress, with the European Commission funding two research projects within the framework of the Horizon 2020 (H2020) program [52], [53] and with NASA and the European Space Agency funding other research projects [54], [55]. To the best of our knowledge, there are no public implementations of Issue 2 available.

Extensions to CCSDS compression algorithms have been published as well. In [56], a method to extend lossless predictive coding schemes (in particular, CCSDS 123.0-B-1) was proposed. This method enables compression in a lossy regime, producing constant signal-to-noise ratio (SNR) and accurate rate control. In [57], a lightweight arithmetic coder was proposed as a possible replacement for the entropy coder of CCSDS 123.0-B-1. Several algorithms related to the prediction stage of Issue 2, based on recursive least-squares theory, have also been suggested. These algorithms describe more adaptive prediction methods at the cost of increased computational complexity. In [58], the inverse correlation matrix of the local differences is used to update the prediction weights. In [59], this predictor is enhanced by adaptively selecting the number of local differences to be used. In [60], two prediction modes are described: the first uses only spectral neighbors in the weight update process; the second also employs spatial neighbors. For each band, the better of the two in terms of mean absolute error is selected for coding. In [61], the image is divided into nonoverlapping regions, which allows for parallel application of the methods described in [59] and [60].

OVERVIEW OF THE NEW STANDARD

The CCSDS 123.0-B-2 standard is based on the fast lossless extended (FLEX) compressor [62]. In turn, FLEX is based on the fast lossless (FL) compressor [63], which was formalized as CCSDS 123.0-B-1. FLEX improves upon FL by adding adjustable lossy compression capabilities while maintaining the option to perform lossless compression.
The latest CCSDS compression standard extends FLEX by adding new features, such as relative error limits, periodic error limit updating, and new prediction modes that facilitate hardware
implementations. Very importantly, Issue 2 has been designed to retain many of FL’s desirable properties, including low computational complexity; single-pass compression and decompression; automatic adaptation to the data being compressed; and the ability to operate requiring a constant, reasonably sized memory space. Moreover, Issue 2 inherits all the capabilities of CCSDS 123.0-B-1, allowing decompression of the data output by the latter. These features make both issues of the CCSDS 123.0 standard suitable for use onboard spaceborne systems, including small satellite missions. Note that compressed images do not include synchronization markers or any other similar scheme. It is assumed that the transport layer will provide the ability to locate the next image in the event of a bit error or data loss. The general structure of the Issue 2 compressor is shown in Figure 1. Similar to CCSDS 123.0-B-1, the input data—signed or unsigned integers—go through a predictor stage in which previously coded information is employed to predict the value of the next sample to be compressed. As a main novelty of Issue 2, prediction errors are uniformly quantized. The quantization bin sizes are determined by the user’s choice of absolute error limit (i.e., the maximum allowed absolute difference between the original and reconstructed sample values) and/or the relative error limit, which controls the maximum ratio of the error to the sample’s predicted value. Quantized data are then mapped to nonnegative integers, which then are input to the entropy coder. When nonzero error limits are selected, quantizer indices represent approximations of the aforementioned prediction errors, instead of the actual values. In this case, data output by the predictor stage typically exhibit lower entropy rates, which allows the coder to produce smaller compressed files. To make decompression possible, the decoder must be able to make the same predictions as the encoder. 
To guarantee this, when nonzero error limits are selected, prediction is done using so-called sample representatives instead of the original samples. The following sections provide an informative description of the aforementioned functional blocks. For the sake of readability, some definitions in this description are simplified so as not to cover boundary cases, e.g., the image edges when neighboring samples are involved. Interested readers are referred to [34] for complete, normative definitions. A list of the symbols employed hereafter is available in Table 2 for ease of reference.

PREDICTOR STAGE

The predictor stage is designed to process input samples sequentially in a single pass, producing one mapped quantizer index per input sample. Although CCSDS 123.0-B-1 was designed to accept input samples of, at most, 16 bits, Issue 2 accepts bit depths, $D$, up to 32 bits. Hereafter, $s_z(t)$ denotes the $t$th sample of the $z$th spectral band in raster scan order, and $\delta_z(t)$ is its corresponding mapped quantizer index. To obtain $\delta_z(t)$, a prediction of the sample’s original value, denoted as $\hat{s}_z(t)$, is computed as described in the “Prediction” section, and the prediction error is computed as

$$\Delta_z(t) = s_z(t) - \hat{s}_z(t). \tag{1}$$

This prediction error is then quantized, as discussed in the “Quantization” section, to produce a quantizer index $q_z(t)$. This index is mapped to a nonnegative value, $\delta_z(t)$, which is the output of the predictor stage, as described in the “Quantizer Index Mapping” section. The quantizer index is also transformed into its corresponding sample representative $s''_z(t)$, as described in the “Sample Representatives” section. These representatives are then used to obtain the predicted values, $\hat{s}_z(t)$, used in (1). As mentioned previously in this section, the sample value prediction must be based on $s''_z(t)$ instead of $s_z(t)$ to avoid compressor–decompressor prediction differences when compression is not lossless.
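The need to predict from values that the decoder can also compute can be illustrated with a toy closed-loop coder. The sketch below is not the CCSDS predictor: it assumes a trivial previous-sample predictor, a uniform quantizer of bin size 2m + 1, and the bin center as the reference for the next prediction. All function names are hypothetical. It compares prediction from reconstructed values (closed loop) with prediction from original samples (open loop).

```python
def sgn(x):
    return (x > 0) - (x < 0)


def quantize(err, m):
    """Index of the uniform bin (width 2m + 1) that contains err."""
    return sgn(err) * ((abs(err) + m) // (2 * m + 1))


def encode(samples, m, closed_loop=True):
    """Toy encoder: predict each sample from the previous one."""
    indices, reference = [], 0
    for s in samples:
        prediction = reference
        q = quantize(s - prediction, m)
        indices.append(q)
        if closed_loop:
            # Predict from the value the decoder will also have.
            reference = prediction + q * (2 * m + 1)
        else:
            # Predict from the original sample, which the decoder never sees.
            reference = s
    return indices


def decode(indices, m):
    """Toy decoder: rebuild samples from the quantizer indices alone."""
    rebuilt, reference = [], 0
    for q in indices:
        reference = reference + q * (2 * m + 1)
        rebuilt.append(reference)
    return rebuilt


def max_abs_error(original, rebuilt):
    return max(abs(a - b) for a, b in zip(original, rebuilt))


samples = [0, 2, 4, 6, 8, 10, 12, 14]
m = 2
err_closed = max_abs_error(samples, decode(encode(samples, m, True), m))
err_open = max_abs_error(samples, decode(encode(samples, m, False), m))
print(err_closed, err_open)  # 2 14
```

With the closed loop, the reconstruction error stays within m. With the open loop, every prediction error falls in the zero bin, so the decoder never moves and the error accumulates.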
FIGURE 1. A structure overview of the CCSDS 123.0-B-2 compressor. The new functional blocks with respect to CCSDS 123.0-B-1 are highlighted in blue while the modified blocks are shown in green.

TABLE 2. A LIST OF SYMBOLS REFERENCED IN THIS ARTICLE.

SYMBOL                                   MEANING

GENERAL
$s_z(t)$                                 Original sample value ($t$th sample of spectral band $z$)
$D$                                      Dynamic range in bits
$s_{\min}$, $s_{\max}$                   Minimum and maximum allowed sample values
$s_{\mathrm{mid}}$                       Midrange sample value
$N_X$, $N_Y$                             Horizontal and vertical spatial dimensions of the image
$N_Z$                                    Number of spectral bands of the image

SAMPLE REPRESENTATIVE CALCULATION
$s''_z(t)$                               Sample representative for $s_z(t)$
$\Theta$                                 Sample representative resolution
$\phi_z$                                 Sample representative damping for band $z$
$\psi_z$                                 Sample representative offset for band $z$

PREDICTION
$\hat{s}_z(t)$                           Predicted value for $s_z(t)$
$\Omega$                                 Prediction weight arithmetic resolution
$P$                                      Number of previous bands used for the prediction
$s_{z,y,x}$                              Alternative notation for $s_z(t)$
$\sigma_{z,y,x}$                         Local sum for $s_{z,y,x}$
$d_{z,y,x}$, $d^{N}_{z,y,x}$, $d^{W}_{z,y,x}$, $d^{NW}_{z,y,x}$   Local differences for $s_{z,y,x}$
$\mathbf{U}_{z,y,x}$                     Local difference vector for $s_{z,y,x}$
$\mathbf{W}_{z,y,x}$                     Prediction weight vector for $s_{z,y,x}$
$\hat{d}_{z,y,x}$                        Predicted central local difference for $s_{z,y,x}$
$\tilde{s}_z(t)$                         Double-resolution predicted value for $s_z(t)$
$\nu_{\min}$, $\nu_{\max}$, $t_{\mathrm{inc}}$, $\varsigma_z^{(i)}$, $\varsigma_z^{*}$   User-specified weight update parameters

QUANTIZATION
$\Delta_z(t)$                            Prediction error for $s_z(t)$
$q_z(t)$                                 Quantizer index of $\Delta_z(t)$
$s'_z(t)$                                Clipped quantizer bin center for $\Delta_z(t)$
$a_z$                                    Maximum absolute error in the spectral band $z$
$r_z$                                    Maximum relative error in the spectral band $z$
$m_z(t)$                                 Maximum reconstruction error $|s_z(t) - s'_z(t)|$

QUANTIZER INDEX MAPPING
$\delta_z(t)$                            Mapped quantizer index for $q_z(t)$
$\theta_z(t)$                            Scaled difference between $\hat{s}_z(t)$ and the closest of $s_{\min}$ and $s_{\max}$

ENTROPY CODING
$U_{\max}$                               Golomb-power-of-2 (GPO2) length limit
$\Sigma_z(t)$                            Accumulator value for $\delta_z(t)$
$\Gamma(t)$                              Counter value for $\delta_z(t)$
$\gamma^{*}$                             Sample-adaptive rescaling counter size
$k_z(t)$                                 GPO2 code index for $\delta_z(t)$
$\tilde{\Sigma}_z(t)$                    High-resolution accumulator value for $\delta_z(t)$
$i$                                      Hybrid code index
$T_i$                                    Hybrid code entropy-threshold constants
$L_i$                                    Hybrid code symbol-limit constants
$X$                                      Hybrid code escape symbol

QUANTIZATION

The CCSDS 123.0-B-2 standard allows for quantization of each prediction error $\Delta_z(t)$ into a quantizer index $q_z(t)$ so that $\Delta_z(t)$ (and, thus, also the input sample $s_z(t)$) can be reconstructed with maximum error $m_z(t)$. A quantizer with uniform bin size $2m_z(t) + 1$ is used, i.e.,

$$q_z(t) = \mathrm{sgn}(\Delta_z(t)) \cdot \left\lfloor \frac{|\Delta_z(t)| + m_z(t)}{2m_z(t) + 1} \right\rfloor, \tag{2}$$

where the sgn function is defined as

$$\mathrm{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0. \end{cases} \tag{3}$$

The users control $m_z(t)$ indirectly by selecting the maximum absolute error $a_z$, the maximum relative error $r_z$, or both for each spectral band $z$. When only absolute error limits are specified,

$$m_z(t) = a_z. \tag{4}$$

When only relative error limits are set,

$$m_z(t) = \left\lfloor \frac{r_z\,|\hat{s}_z(t)|}{2^{D}} \right\rfloor, \tag{5}$$

where $\hat{s}_z(t)$ is the predicted value for the original sample $s_z(t)$. Setting relative error limits allows for the reconstruction of different samples with dissimilar degrees of precision. More specifically, the samples predicted to have a smaller magnitude are reconstructed with lower error. Note that predicted, rather than actual, sample values are used in (5) to keep the encoder and the decoder synchronized. Thus, absolute error bounds are not guaranteed when only a relative error limit $r_z > 0$ is specified. When both the absolute and relative error limits are used, $m_z(t)$ is set to the minimum of (4) and (5). When lossless compression is desired in band $z$, users may set $a_z = 0$ or $r_z = 0$ so that $m_z(t) = 0$. This guarantees that $q_z(t) = \Delta_z(t)$, i.e., that the original samples can be reconstructed exactly.

It is worth emphasizing that error limits can be set individually for each spectral band. With this mechanism, higher-importance bands can be reconstructed with greater fidelity (even perfect fidelity), while lesser-priority bands can be represented with lower fidelity using smaller compressed data volumes [56], [64]–[6]. Furthermore, the periodic error limit update option can be activated so that the error limits, and hence the fidelity, can be changed within a band. This option is useful to meet a given downlink transmission rate constraint and/or to better preserve the image regions expected to contain features of interest.
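The error-limit rules in (4) and (5) and the quantizer in (2) can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the normative procedure: function names are hypothetical, `None` is used here to mean that a limit is not in use, and the clipping of reconstructed values to the valid sample range is omitted.

```python
def sgn(x):
    return (x > 0) - (x < 0)


def max_error(predicted, bit_depth, a_z=None, r_z=None):
    """m_z(t): a_z as in (4), floor(r_z*|predicted| / 2^D) as in (5), or their minimum."""
    limits = []
    if a_z is not None:
        limits.append(a_z)
    if r_z is not None:
        limits.append((r_z * abs(predicted)) >> bit_depth)
    if not limits:
        raise ValueError("at least one error limit is required")
    return min(limits)


def quantize(delta, m):
    """Quantizer index q_z(t) of prediction error delta, as in (2)."""
    return sgn(delta) * ((abs(delta) + m) // (2 * m + 1))


def dequantize(q, m):
    """Center of the quantizer bin identified by q (no clipping applied)."""
    return q * (2 * m + 1)


D = 12
for predicted, sample in [(3200, 3237), (320, 313)]:
    m = max_error(predicted, D, a_z=20, r_z=64)
    delta = sample - predicted
    q = quantize(delta, m)
    print(m, q, abs(delta - dequantize(q, m)))
# prints "20 1 4" and then "5 -1 4"
```

In the second case the relative limit is the tighter one, so the sample with the smaller predicted magnitude is reconstructed with a smaller maximum error, as described above. With m = 0 the quantizer returns the prediction error unchanged.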
It should be highlighted that the standard does not define a specific method for selecting error limit values, e.g., to meet a given downlink rate. This is because error limit values are encoded in the bitstream, and thus, the decoder does not need to know how those error limits were selected.

SAMPLE REPRESENTATIVES

The decompressor must duplicate the prediction operation performed by the compressor, but, in general, the original image samples $s_z(t)$ cannot be perfectly reconstructed from the compressed bitstream because of information lost during the quantization stage. Consequently, the prediction
calculation (in both the compressor and decompressor) is performed using sample representatives $s''_z(t)$ in place of the original samples $s_z(t)$. A naive solution to this problem is to use the central point $s'_z(t)$ of the quantizer bin, whose index $q_z(t)$ is transmitted to the decoder. The quantizer bin center $s'_z(t)$ can be calculated as

$$s'_z(t) = \mathrm{clip}\big(\hat{s}_z(t) + q_z(t) \cdot (2m_z(t) + 1),\; s_{\min},\; s_{\max}\big), \tag{6}$$

where $s_{\min}$ and $s_{\max}$ are the minimum and maximum values, respectively, allowed for an input sample and

$$\mathrm{clip}(x, a, b) = \min(b, \max(a, x)) \tag{7}$$

guarantees that $s'_z(t)$ falls within the allowed value range. However, using the quantizer bin center $s'_z(t)$ as the sample representative $s''_z(t)$ for prediction does not always minimize compressed data volume [37]. This is true even for $m_z(t) = 0$, i.e., lossless compression. In the CCSDS 123.0-B-2 standard, three user-specified parameters can be used to adjust the choice of $s''_z(t)$. These are the sample representative resolution ($\Theta$), damping ($\phi_z$), and offset ($\psi_z$) parameters. Based on them, sample representatives $s''_z(t)$ are defined as an integer approximation to

$$\frac{\phi_z}{2^{\Theta}}\,\hat{s}_z(t) + \left(1 - \frac{\phi_z}{2^{\Theta}}\right)\left(s'_z(t) - \frac{\psi_z}{2^{\Theta}}\,\mathrm{sgn}(q_z(t))\,m_z(t)\right). \tag{8}$$

Regardless of the parameter choice, the sample representatives always fall between $s'_z(t)$ and $\hat{s}_z(t)$. Parameter $\Theta$ determines the precision with which representatives are computed. Parameter $\phi_z$ limits the effect of noisy samples in the representative calculation. In turn, parameter $\psi_z$ establishes a bias toward $s'_z(t)$ or $\hat{s}_z(t)$, depending on its value. Although $\Theta$ is defined for the whole image, $\phi_z$ and $\psi_z$ can be chosen on a band-by-band basis. Setting $\phi_z = \psi_z = 0$ causes the sample representatives to be equal to $s'_z(t)$; larger values of $\phi_z$ and/or $\psi_z$ produce representatives closer to $\hat{s}_z(t)$. Note that, depending on the parameter choice, $s''_z(t)$ may not be contained in the quantizer bin identified by $q_z(t)$. The empirical results indicate that setting the damping and offset parameters to values other than zero tends to provide larger benefits to compression performance when spectral bands are closer in wavelength and for images with larger noise prevalence [37].

FIGURE 2. An overview of the prediction block in CCSDS 123.0-B-2.

PREDICTION

The predicted sample value $\hat{s}_z(t)$ for an input sample $s_z(t)$ is computed causally using sample representatives from spectral bands $z - P, \ldots, z$, where $P \ge 0$ is a user-defined parameter. Within each band, previous sample representatives are used to compute local sums. These can be regarded as preliminary, scaled estimates of the actual sample value. Local sums, in combination with the sample representatives, are used to compute local differences. The predicted value $\hat{s}_z(t)$ is then calculated using the local sum in the current band $z$ as well as a weighted sum of local differences from the current and previous bands. The use of local sums can be understood as a local mean subtraction, with prediction carried out in the mean-subtracted domain. Figure 2 displays an overview of the prediction process. Its stages are more precisely described in the following.

Local sums are computed from previous sample representatives using one of the four available modes. Similar to CCSDS 123.0-B-1, each mode is either neighbor- or column-oriented. As a novelty of Issue 2, modes can now be narrow instead of wide. The sample representatives used to calculate the local sums depend on the selected mode, as depicted in Figure 3. In the figure and hereafter, $s_{z,y,x}$ is used to denote the current sample $s_z(t)$, which makes explicit the band index $z$ as well as the spatial coordinates $(x, y)$ within the band. In all of the modes, the highlighted sample representatives are multiplied by the factor indicated in Figure 3 and added together to obtain the local sum $\sigma_{z,y,x}$ corresponding to $s_{z,y,x}$. For instance, the narrow neighbor-oriented local sums are computed as

$$\sigma_{z,y,x} = s''_{z,y-1,x-1} + 2s''_{z,y-1,x} + s''_{z,y-1,x+1}. \tag{9}$$

FIGURE 3. The local sum calculation modes available in Issue 2. The current sample position is highlighted with a blue border. The sample representatives employed for the corresponding local sum are shown in orange. (a) Wide neighbor-oriented (factor 1 for $s''_{z,y-1,x-1}$, $s''_{z,y-1,x}$, $s''_{z,y-1,x+1}$, and $s''_{z,y,x-1}$), (b) narrow neighbor-oriented (factors 1, 2, and 1 for $s''_{z,y-1,x-1}$, $s''_{z,y-1,x}$, and $s''_{z,y-1,x+1}$), and (c) column-oriented (factor 4 for $s''_{z,y-1,x}$).

As can be observed in Figure 3, column-oriented local sums employ sample representatives at the same $x$ coordinate, whereas neighbor-oriented sums also use sample representatives at contiguous $x$ coordinates. In turn, the new narrow option removes the dependency on $s''_{z,y,x-1}$, which facilitates pipelining in a hardware implementation at the cost of some compression performance loss [37]. Note that wide and narrow column-oriented modes are identical in the general case. Notwithstanding, only the wide column-oriented mode uses $s''_{z,y,x-1}$ for calculating local sums at the first row, i.e., $y = 0$, of each spectral band.

Local differences are computed based on the sample representatives and the local sums. For an input sample $s_{z,y,x}$, up to four local difference types are computed: the central difference ($d_{z,y,x}$) and three directional differences, i.e., north ($d^{N}_{z,y,x}$), west ($d^{W}_{z,y,x}$), and northwest ($d^{NW}_{z,y,x}$). They are defined as follows:

$$\begin{aligned} d_{z,y,x} &= 4s''_{z,y,x} - \sigma_{z,y,x}, \\ d^{N}_{z,y,x} &= 4s''_{z,y-1,x} - \sigma_{z,y,x}, \\ d^{W}_{z,y,x} &= 4s''_{z,y,x-1} - \sigma_{z,y,x}, \\ d^{NW}_{z,y,x} &= 4s''_{z,y-1,x-1} - \sigma_{z,y,x}. \end{aligned} \tag{10}$$

The predicted sample value is then computed using either the full or reduced prediction modes. In the full prediction mode, the local difference vector $\mathbf{U}_{z,y,x}$ is defined using directional differences from the current spectral band and central differences from the previous bands:

$$\mathbf{U}_{z,y,x} = \big[d^{N}_{z,y,x},\; d^{W}_{z,y,x},\; d^{NW}_{z,y,x},\; d_{z-1,y,x},\; \ldots,\; d_{z-P,y,x}\big]. \tag{11}$$

In the reduced prediction mode, the local difference vector uses only central differences from previous bands:

$$\mathbf{U}_{z,y,x} = \big[d_{z-1,y,x},\; \ldots,\; d_{z-P,y,x}\big]. \tag{12}$$

In both modes, a prediction weight vector $\mathbf{W}_{z,y,x}$ is used to obtain a weighted sum of local differences, called the predicted central local difference, as

$$\hat{d}_{z,y,x} = \mathbf{W}^{T}_{z,y,x}\,\mathbf{U}_{z,y,x}. \tag{13}$$

The predicted sample is then calculated as an integer approximation to

$$\hat{s}_{z,y,x} \approx \left\lfloor \frac{\hat{d}_{z,y,x} + 2^{\Omega}\,\sigma_{z,y,x}}{2^{\Omega+2}} \right\rfloor, \tag{14}$$

where $\Omega$ is a parameter that controls arithmetic precision. The initial prediction weight vector for each band, $\mathbf{W}_{z,0,0}$, can be defined based on default or user-provided values. In either case, vector elements are updated after processing each input sample $s_z(t)$. The updates are based on the obtained prediction error and several user-defined parameters, namely, $\nu_{\min}$, $\nu_{\max}$, $\varsigma_z^{(i)}$, $\varsigma_z^{*}$, and $t_{\mathrm{inc}}$, which control the rate at which weights are adapted to the original image statistics. More precisely, smaller values of $\varsigma_z^{(i)}$, $\varsigma_z^{*}$, $\nu_{\min}$, $\nu_{\max}$, and $1/t_{\mathrm{inc}}$ typically produce larger weight updates. This results in a faster adaptation to the source statistics at the cost of worse steady-state compression performance [37].

It is important to highlight that two prediction modes (full and reduced) as well as two different local mean types (column and neighbor oriented) are available in Issue 2 so that prediction is effective for the image data produced by different types of instruments. For instance, when streaking artifacts are present in the images, reduced column-oriented prediction tends to produce the best results [37].

QUANTIZER INDEX MAPPING

The prediction errors $\Delta_z(t)$ obtained in (1) as well as their corresponding quantizer indices $q_z(t)$ defined in (2) may be negative. However, the entropy coders available in CCSDS 123.0-B are defined for nonnegative input values. The quantizer index mapping stage depicted in Figure 1 provides a one-to-one mapping between valid quantizer indices and nonnegative values, referred to as mapped quantizer indices and denoted as $\delta_z(t)$. This functional block remains unaltered with respect to the previous issue of the standard [35]. A key property of this mapping is that indices can be represented using the same number of bits as in the original image. This is true because predicted values are guaranteed to satisfy $\hat{s}_z(t) \in [s_{\min}, s_{\max}]$; i.e., predictions do not exceed the range of allowed sample input values given bit depth $D$. Thus, the number of possible prediction errors equals the number of elements in the aforementioned interval. Based on this, the mapping is defined as

$$\delta_z(t) = \begin{cases} |q_z(t)| + \theta_z(t), & |q_z(t)| > \theta_z(t) \\ 2|q_z(t)|, & 0 \le (-1)^{\tilde{s}_z(t)}\,q_z(t) \le \theta_z(t) \\ 2|q_z(t)| - 1, & \text{otherwise,} \end{cases} \tag{15}$$

where $\tilde{s}_z(t)$ is a double-resolution version of the predicted sample value defined in the “Prediction” section, and $\theta_z(t)$ is the difference between the predicted value and the nearest interval endpoint, i.e.,

\begin{align} \theta_z(t) = \min\Bigg( &\left\lfloor \frac{\hat{s}_z(t) - s_{\min} + m_z(t)}{2m_z(t) + 1} \right\rfloor, \tag{16} \\ &\left\lfloor \frac{s_{\max} - \hat{s}_z(t) + m_z(t)}{2m_z(t) + 1} \right\rfloor \Bigg). \tag{17} \end{align}

ENCODER STAGE

The encoder stage compresses the sequence of mapped quantizer indices $\delta_z(t)$ produced by the predictor stage into a variable-length bitstream. This operation is reversible, meaning that an identical sequence of mapped quantizer indices can be recovered from the bitstream. These indices allow for an exact or approximate reconstruction of the input image, depending on the error limits set in the predictor stage. In Issue 2, three coders are available for this purpose: sample adaptive, block adaptive, and hybrid. The user must select one of them to code all the mapped quantizer indices for an image. The first two encoding options were already present in the previous issue of the standard [35], while the hybrid coder is new in Issue 2. The hybrid coder tends to provide better compression performance than the other two options, but the benefit may be small when compression is lossless. An overview of the three available coders is provided in the following sections.

BLOCK-ADAPTIVE CODER

The block-adaptive coder is a separate CCSDS standard, originally specified in [38] and later extended in [40], based on Rice coding. In this coder, the samples are partitioned into disjoint blocks of a fixed length of between eight and 64 samples. Each block is encoded using the most effective of five available coding methods: zero block, second extension, fundamental sequence, sample splitting, and no compression. A simplified diagram of this process is shown in Figure 4. Interested readers are referred to [67] for a summary of key operational concepts and a detailed performance analysis of this coder.

FIGURE 4. An overview of the block-adaptive entropy coder. The coding options executed in parallel for each block are highlighted in orange.

SAMPLE-ADAPTIVE CODER

In the sample-adaptive coder, each mapped quantizer index $\delta_z(t)$ is compressed using a variable-length codeword from a family of length-limited Golomb-power-of-2 (GPO2) codes. Each GPO2 code is identified by an index $k$, which is selected based on the statistics of previously coded samples. Given $k$ and $\delta_z(t)$, the selected codeword is denoted as $\mathfrak{R}_k(\delta_z(t))$ and defined as follows:
◗◗ If $\lfloor \delta_z(t)/2^{k} \rfloor < U_{\max}$, $\mathfrak{R}_k(\delta_z(t))$ consists of $\lfloor \delta_z(t)/2^{k} \rfloor$ zeros, followed by a one, followed by the $k$ least-significant bits of the binary representation of $\delta_z(t)$.
◗◗ Otherwise, $\mathfrak{R}_k(\delta_z(t))$ consists of $U_{\max}$ zeros, followed by the binary representation of $\delta_z(t)$ using $D$ bits.

Here, $U_{\max}$ is a user-specified parameter utilized to limit the maximum codeword length, and $D$ is the image’s bit depth. Two variables are used to keep track of the input data statistics and to choose the GPO2 family’s index $k_z(t)$ to code $\delta_z(t)$: an accumulator $\Sigma_z(t)$ and a counter $\Gamma(t)$. The ratio of these two variables determines $k_z(t)$:
◗◗ If $2\Gamma(t) > \Sigma_z(t) + \lfloor 49\Gamma(t)/2^{7} \rfloor$, then $k_z(t) = 0$.
◗◗ Otherwise, $k_z(t)$ is the largest positive integer such that

$$k_z(t) \le D - 2, \qquad \Gamma(t)\,2^{k_z(t)} \le \Sigma_z(t) + \lfloor 49\Gamma(t)/2^{7} \rfloor. \tag{18}$$

Mapped quantizer indices typically follow a nonstationary geometric distribution, for which $k_z(t)$ is a good parameter estimator. Note that the counter and accumulator variables are initialized based on user-specified parameters. The values of the counter and the accumulator variables are updated after coding each input sample $\delta_z(t-1)$. More specifically, $\Gamma$ is increased by one, and $\Sigma$ is increased by $\delta_z(t-1)$. In addition, both $\Gamma$ and $\Sigma$ are periodically divided by two (rounding down) to enable calculation using finite-precision arithmetic. This division is hereafter referred to as renormalization.

HYBRID CODER

The hybrid coder uses the statistics of previously encoded data to classify each input mapped quantizer index as either a high- or low-entropy sample. The high-entropy samples are compressed using a variation of the length-limited GPO2 code family described in the “Sample-Adaptive Coder” section. The low-entropy samples are coded using another family of 16 variable-to-variable-length codes, i.e., codes in which several input samples can be encoded with a single codeword. A detailed description of these variable-to-variable-length codes can be found in [68].
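For the sample-adaptive coder described above, the length-limited GPO2 codeword and the code index selection in (18) can be sketched as follows. This is a minimal sketch: function names are hypothetical, codewords are returned as bit strings for readability, a bit depth of at least 3 is assumed, and the update and renormalization of the counter and accumulator are left out.

```python
def gpo2_codeword(delta, k, u_max, bit_depth):
    """Length-limited GPO2 codeword for mapped quantizer index delta, as a bit string."""
    quotient = delta >> k  # floor(delta / 2^k)
    if quotient < u_max:
        low_bits = format(delta & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
        return "0" * quotient + "1" + low_bits
    # Escape case: u_max zeros followed by delta written with bit_depth bits.
    return "0" * u_max + format(delta, f"0{bit_depth}b")


def select_k(counter, accumulator, bit_depth):
    """Code index k_z(t) from the counter and accumulator, following (18)."""
    rhs = accumulator + ((49 * counter) >> 7)
    if 2 * counter > rhs:
        return 0
    # Here counter * 2^1 <= rhs holds, so k = 1 is valid (assuming bit_depth >= 3).
    k = 1
    while k + 1 <= bit_depth - 2 and (counter << (k + 1)) <= rhs:
        k += 1
    return k


print(gpo2_codeword(13, 2, 8, 16))   # 000101
print(len(gpo2_codeword(100, 1, 8, 16)))  # 24
print(select_k(16, 100, 16))         # 2
print(select_k(16, 10, 16))          # 0
```

In the second call the quotient (50) reaches the length limit, so the codeword is 8 zeros followed by the 16-bit representation of 100. A large accumulator relative to the counter, i.e., a large recent average index, leads to a larger k.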
The ability to adaptively switch between GPO2 and variable-to-variable-length codes gives this coder the name hybrid. Variable-to-variable-length codes enable very efficient compression of highly predictable (low-entropy) samples, which become more prevalent when near-lossless error limits are used in the predictor stage. Meanwhile, variable-to-variable-length codes introduce variability in the latency between the arrival of a low-entropy mapped quantizer index and the output of the codeword that encodes it. To accommodate this, codewords emitted by the hybrid coder are designed so that they can be decoded in reverse order. This is possible thanks to two main properties of the coder. First, output codewords are suffix-free rather than prefix-free. Second, the compressed image ends with a specification of the final state of the coder. A set of flush tables is provided in the standard to signal the code states in an unambiguous and compact manner. Reverse decoding allows for simpler and more memory-efficient implementations than does FLEX’s original hybrid entropy coder [62].

The remainder of this section describes Issue 2’s hybrid coder. A flow diagram of this coder’s logic is provided in Figure 5 to support this description.

FIGURE 5. A flow diagram of CCSDS 123.0-B-2’s hybrid coder. The logical decisions are highlighted in orange, the processes that update the codes’ internal state are shown in green, and the processes that emit codewords are presented in purple background.

The classification of samples as high or low entropy is performed using a similar statistical approach to that of the sample-adaptive coder. Two variables are used to keep track of these statistics: a counter $\Gamma(t)$ and a high-resolution accumulator $\tilde{\Sigma}_z(t)$. These variables are updated the same way as in the sample-adaptive coder, with two main differences. First, variables are updated before coding the input sample; this is done so that decoding can proceed in reverse order. To this effect, the least-significant bit of the accumulator variable is output before renormalization so that the decoder can invert this process. Second, $\tilde{\Sigma}_z(t)$ is increased by $4\delta_z(t)$ instead of $\delta_z(t)$ to enable a more precise estimation of the input data statistics. The ratio $\tilde{\Sigma}_z(t)/\Gamma(t)$ determines whether a sample is a high- or low-entropy symbol. More specifically, $\delta_z(t)$ is defined as high entropy if and only if

$$\tilde{\Sigma}_z(t) \cdot 2^{14} \ge T_0 \cdot \Gamma(t), \tag{19}$$

where $T_0$ is a constant provided in the standard. With this definition, well-predicted image regions are coded with low-entropy codes, and the high-entropy mode is used otherwise.

Each high-entropy sample is encoded using a family of reversed length-limited GPO2 codes. As in the sample-adaptive case, each code is identified by an index, $k_z(t)$. For the hybrid coder, $k_z(t)$ is the largest positive integer that satisfies

$$k_z(t) \le \max(D - 2, 2), \qquad \Gamma(t)\,2^{k_z(t)+2} \le \tilde{\Sigma}_z(t) + \lfloor 49\Gamma(t)/2^{5} \rfloor. \tag{20}$$

The codeword emitted for the high-entropy sample $\delta_z(t)$, $\mathfrak{R}'_{k_z(t)}(\delta_z(t))$, is defined as follows:
◗◗ If $\lfloor \delta_z(t)/2^{k_z(t)} \rfloor < U_{\max}$, then $\mathfrak{R}'_{k_z(t)}(\delta_z(t))$ consists of the $k_z(t)$ least-significant bits of the binary representation of $\delta_z(t)$, followed by a one, followed by $\lfloor \delta_z(t)/2^{k_z(t)} \rfloor$ zeros.
◗◗ Otherwise, $\mathfrak{R}'_{k_z(t)}(\delta_z(t))$ consists of the $D$-bit binary representation of $\delta_z(t)$ followed by $U_{\max}$ zeros.

The low-entropy samples are processed with one of 16 available variable-to-variable-length codes. The code index used to process a low-entropy sample $\delta_z(t)$ is the largest $i$ satisfying

$$\tilde{\Sigma}_z(t) \cdot 2^{14} < \Gamma(t) \cdot T_i, \qquad 0 \le i \le 15, \tag{21}$$

where $T_0, \ldots, T_{15}$ are constants provided in the standard, and $T_0$ is used in (19). With this definition, the magnitude of recent prediction errors determines the next variable-to-variable-length code to be used. Each code $i$ has a prefix of previously input samples. When a sample is processed, a symbol is added to the corresponding code’s prefix. The standard defines a list of complete prefixes for each code.
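The decisions in (19)–(21) and the reversed codeword can be sketched as follows. This is only an illustration: the threshold list below is a made-up, strictly decreasing placeholder and not the $T_i$ constants of the standard (which should be taken from [34]); function names are hypothetical; and the bit order used for the least-significant bits is an assumption.

```python
# Placeholder thresholds (NOT the values from the standard): 16 decreasing constants.
PLACEHOLDER_T = [300000 - 18000 * j for j in range(16)]


def classify(acc_hi, counter, thresholds):
    """Return ('high', None) per (19), or ('low', i) with the largest i allowed by (21)."""
    scaled = acc_hi << 14
    if scaled >= thresholds[0] * counter:
        return ("high", None)
    # Index 0 always qualifies here, so the maximum is well defined.
    i = max(j for j, t in enumerate(thresholds) if scaled < counter * t)
    return ("low", i)


def high_entropy_k(acc_hi, counter, bit_depth):
    """Largest positive k satisfying both conditions in (20)."""
    rhs = acc_hi + ((49 * counter) >> 5)
    for k in range(max(bit_depth - 2, 2), 0, -1):
        if (counter << (k + 2)) <= rhs:
            return k
    raise ValueError("no positive k satisfies (20) for these statistics")


def reversed_gpo2(delta, k, u_max, bit_depth):
    """Reversed length-limited GPO2 codeword, as a bit string."""
    quotient = delta >> k
    if quotient < u_max:
        # Assumption: the k low bits are written most-significant bit first.
        low_bits = format(delta & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
        return low_bits + "1" + "0" * quotient
    return format(delta, f"0{bit_depth}b") + "0" * u_max


print(classify(100, 4, PLACEHOLDER_T))   # ('high', None)
print(classify(20, 4, PLACEHOLDER_T))    # ('low', 12)
print(high_entropy_k(100, 4, 16))        # 2
print(reversed_gpo2(13, 2, 8, 16))       # 011000
```

With decreasing thresholds, a smaller accumulator-to-counter ratio (better-predicted data) selects a higher low-entropy code index.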
Once a symbol has been added, if code $i$’s prefix matches any of the complete prefixes defined for it, a codeword that uniquely identifies that prefix and its associated sequence of input samples is emitted. After that, the prefix for that code is cleared. It is worth noting that the complete prefixes defined for code $i$ cannot contain sample values satisfying $\delta_z(t) > L_i$, where $L_0, \ldots, L_{15}$ are constants defined in the standard. When such a sample, referred to as an unlikely sample, is processed, $\mathfrak{R}'_0(\delta_z(t) - L_i - 1)$ is emitted, and an escape symbol $X$ is added to the prefix instead of $\delta_z(t)$. Adding $X$ to any code’s prefix is guaranteed to make it complete and trigger the emission of an output codeword. The input symbol limit $L_i$ limits the size of the input alphabet in the low-entropy codes by treating all of the unlikely symbols in the same way. This reduces the number of codewords in a code. As escape symbols occur with low probability, the efficiency with which these residual values are encoded has only a small impact on the overall coding effectiveness.

COMPRESSION PERFORMANCE

EXPERIMENTAL SETUP

The lossless and near-lossless compression performances of Issue 2 are evaluated in this section. The results are provided for both the block- and sample-adaptive entropy coders already present in Issue 1 and are compared to those of the new hybrid coder defined in Issue 2. The hybrid coder’s computational complexity is comprehensively addressed in [69], so the execution time results are not presented here. The empirical results were obtained using a varied corpus of 17 multispectral images, 38 hyperspectral images, and two sounder data samples. These were generated by 14 different instruments deployed in real missions, except for the Pleiades images, which are simulated.
Most of the images included are raw, giving more weight to the direct compression of images as they are acquired, while nonraw instances processed after acquisition are also included to represent some possible onboard calibration. Both pushbroom and whiskbroom sensors are covered in the corpus, which includes the streaking artifacts that are characteristic of pushbroom instruments (such as Hyperion) in uncalibrated images. A diverse range of spectral separations is considered, and examples of images with significant noise levels (the Moon Mineralogy Mapper) or that are acquired with airborne instruments [the Compact Airborne Spectrographic Imager (CASI)] are included as well. Regarding the dynamic range, all the hyperspectral and sounding instruments produce data with bit depths of at least 11 bits, whereas, for multispectral instruments, samples of lower bit depths are available, too. A summary of this corpus, produced by the CCSDS MHDC Working Group, is provided in Table 3. All of the images are publicly available, except for those produced by the Infrared Atmospheric Sounding Interferometer (IASI) and Meteosat Second Generation instruments, due to licensing restrictions. (The download links for the test images can be found at http://cwe.ccsds.org/sls/docs/sls-dc/123.0-B-Info/TestData.) The “Entropy” column in the table represents the zero-order entropy of the images. Note that this is not a strict bound on compression efficiency and should be regarded only as an assessment of the difficulty of compressing the images.

The performance results are obtained by invoking Issue 2's compressor with the default set of parameters described in [37], except for the Hyperion, IASI, Moderate Resolution Imaging Spectroradiometer (MODIS), and Système Pour l'Observation de la Terre 5 (SPOT5) instruments. For these, the following parameters are modified to enhance compression performance: $t_{\mathrm{inc}} = 2^9$, $\nu_{\min} = \nu_{\max} = 0$, $U_{\max} = 32$, $\gamma^* = 11$, and $\gamma_0 = 4$.
A full prediction with wide, neighbor-oriented local sums is used in most of the images, including those of the four aforementioned instruments. The column-oriented local sums are employed for images that present streaking artifacts, i.e., when the average sample values exhibit strong differences for contiguous x positions. A full analysis of the impact of parameter tuning on performance as well as an identification of images with streaking artifacts can be found in [37].

To provide a comparison baseline, the authors' implementation of CCSDS 122.1-B-1, the reference implementation of the JPEG-LS standard, and the original authors' implementation of multiband context-based adaptive lossless image coding (M-CALIC) [70] are included in the comparison as well. (Note that the employed JPEG-LS implementation is available at https://github.com/thorfdbg/libjpeg; to attain lossless and near-lossless compression, this compressor was invoked with parameter −ls 0.) For CCSDS 122.1-B-1, the best-performing configuration in terms of rate distortion, i.e., float discrete wavelet transform (DWT) and spectral pairwise-orthogonal transform (POT), is used. JPEG-LS is arguably the best-known compression standard; it offers low complexity and supports both lossless and near-lossless regimes. In turn, M-CALIC is another low-complexity algorithm well known for its competitive compression performance. Note that, because JPEG-LS does not admit an arbitrary number of spectral bands, images are reshaped by concatenating the bands along the y-axis. More specifically, an image with a width, height, and number of bands equal to $N_X$, $N_Y$, and $N_Z$, respectively, is transformed into a one-band image with a width and height equal to $N_X$ and $N_Y \cdot N_Z$, respectively. No attempt is made to perform decorrelation across spectral bands for JPEG-LS. In contrast, M-CALIC is designed specifically to exploit spectral redundancy in hyperspectral images.

TABLE 3. A SUMMARY OF THE EMPLOYED CORPUS PROPERTIES. THE ENTROPY (IN BITS) IS AVERAGED FOR ALL OF THE IMAGES IN EACH ROW.
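| INSTRUMENT | ACRONYM | IMAGE TYPE | BIT DEPTH D | ENTROPY | NUMBER OF BANDS | WIDTH | HEIGHT | NUMBER OF IMAGES |
|---|---|---|---|---|---|---|---|---|
| Atmospheric Infrared Sounder | AIRS | Raw | 12 | 11.2 | 1,501 | 90 | 135 | 1 |
| Airborne Visible/Infrared Imaging Spectrometer | AVIRIS | Raw | 15 | 12.6 | 224 | 680 | 512 | 1 |
| Airborne Visible/Infrared Imaging Spectrometer | AVIRIS | Raw | 10 | 8.6 | 224 | 614 | 512 | 1 |
| Airborne Visible/Infrared Imaging Spectrometer | AVIRIS | Calibrated | 13 | 10.3 | 224 | 677 | 512 | 13 |
| Compact Airborne Spectrographic Imager | CASI | Raw | 12, 13, and 15 | 11.6 | 72 | 406 | 1,225 | 3 |
| Compact Reconnaissance Imaging Spectrometer for Mars | CRISM | FRT, raw | 11 | 10.1 | 107 | 640 | 510 | 2 |
| Compact Reconnaissance Imaging Spectrometer for Mars | CRISM | FRT, raw | 12, 13 | 10.4 | 438 | 640 | 510 | 2 |
| Compact Reconnaissance Imaging Spectrometer for Mars | CRISM | FRT, raw | 12, 13 | 10.6 | 545 | 640 | 510 | 2 |
| Compact Reconnaissance Imaging Spectrometer for Mars | CRISM | HRL, raw | 12, 13 | 11.2 | 545 | 320 | 450 | 2 |
| Compact Reconnaissance Imaging Spectrometer for Mars | CRISM | MSP, raw | 11 | 9.8 | 74 | 64 | 2,700 | 2 |
| Hyperion | Hyperion | Raw | 12 | 8.5 | 242 | 256 | 1,024 | 3 |
| Infrared Atmospheric Sounding Interferometer | IASI | Calibrated | 12 | 11 | 8,461 | 66 | 60 | 1 |
| Landsat | Landsat | Raw | 8 | 6.6 | 6 | 1,024 | 1,024 | 3 |
| Moon Mineralogy Mapper | M3 | Target, raw | 12 | 9.7 | 260 | 640 | 512 | 2 |
| Moon Mineralogy Mapper | M3 | Global, raw | 11, 12 | 9.4 | 86 | 320 | 512 | 2 |
| Moderate Resolution Imaging Spectroradiometer | MODIS | Night, raw | 12 | 10.8 | 17 | 1,354 | 2,030 | 2 |
| Moderate Resolution Imaging Spectroradiometer | MODIS | Day, raw | 12, 13 | 8.6 | 14 | 1,354 | 2,030 | 2 |
| Moderate Resolution Imaging Spectroradiometer | MODIS | 500 m, raw | 12, 13 | 11.1 | 5 | 2,708 | 4,060 | 2 |
| Moderate Resolution Imaging Spectroradiometer | MODIS | 250 m, raw | 12 | 10.4 | 2 | 5,416 | 8,120 | 2 |
| Meteosat Second Generation | MSG | Calibrated | 10 | 8.2 | 11 | 3,712 | 3,712 | 1 |
| Pleiades High Resolution | Pleiades | High resolution, simulated | 12 | 10.8 | 4 | 224 | 2,465 | 1 |
| Pleiades High Resolution | Pleiades | High resolution, simulated | 12 | 10.2 | 4 | 224 | 2,448 | 3 |
| SWIR Full Spectrum Imager | SFSI | Calibrated | 15 | 9.9 | 240 | 452 | 140 | 1 |
| SWIR Full Spectrum Imager | SFSI | Raw | 9, 11 | 7.4 | 240 | 496 | 140 | 2 |
| Système Pour l'Observation de la Terre 5 High Resolution Geometric | SPOT5 | HRG, processed | 8 | 6.8 | 3 | 1,024 | 1,024 | 1 |
| Vegetation | Vegetation | Raw | 10 | 9.4 | 4 | 1,728 | 10,080 | 2 |

FRT: full-resolution target; HRL: half-resolution long; MSP: multispectral survey; HRG: high-resolution geometric.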
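Two of the steps described in the experimental setup lend themselves to a short illustration: the zero-order entropy reported in the “Entropy” column of Table 3 and the band concatenation along the y-axis used to feed multiband images to JPEG-LS. The following Python sketch is only an illustration under simple assumptions; the function names and the nested-list image layout `image[z][y][x]` are hypothetical and are not taken from the tools used by the authors.

```python
from collections import Counter
from math import log2


def zero_order_entropy(samples):
    """Zero-order (memoryless) entropy, in bits per sample, of a sequence."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def concatenate_bands(image):
    """Stack the NZ bands of image[z][y][x] along the y-axis.

    The result is a single band with NY * NZ rows of NX samples each.
    """
    return [list(row) for band in image for row in band]


if __name__ == "__main__":
    # Tiny synthetic image: NZ = 2 bands, NY = 2 rows, NX = 3 columns.
    image = [
        [[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [1, 2, 3]],
    ]
    single_band = concatenate_bands(image)
    print(len(single_band), len(single_band[0]))
    flat = [s for row in single_band for s in row]
    print(zero_order_entropy(flat))
```

Because the concatenated image is coded as a single band, any redundancy between bands is visible to the coder only as spatial structure, which is consistent with the remark that no spectral decorrelation is attempted for JPEG-LS.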
LOSSLESS COMPRESSION RESULTS
Lossless compression results are obtained for all of Issue 2's entropy coders, for JPEG-LS, and for M-CALIC by setting the absolute error limit to zero. For each image $I$ in the test corpus, the compression ratio is defined as

$$\mathrm{CR}(I) = \frac{N_X \cdot N_Y \cdot N_Z \cdot D}{\text{compressed data size (bits)}}. \tag{22}$$

Based on this definition, higher compression ratio values indicate better compression. A distribution of the obtained compression ratios for each compressor is shown in Figure 6. Vertical bar heights indicate the relative frequency of each range of compression ratios. The average compression ratio, plus/minus one standard deviation, is denoted with a dot and two horizontal bars. Note that the aggregated results presented here and in the “Near-Lossless Compression Results” section are not necessarily representative of any particular image or instrument. This is due to their different statistical properties and the fact that a different number of images is available for each instrument.

As can be observed, all three entropy coders in Issue 2 yield similar compression ratio distributions and average values. In turn, JPEG-LS and M-CALIC produce average compression ratios 25% and 13% lower, respectively, than those of Issue 2. These differences can be explained by the more advanced predictor stage used in Issue 2.

FIGURE 6. A distribution of lossless compression ratios. (Panels: sample adaptive, hybrid, block adaptive, JPEG-LS, and M-CALIC; horizontal axis: compression ratio; vertical axis: relative frequency.)

To provide further insight, the average compression ratios grouped by instrument are shown in the “Lossless” columns of Table 4 for Issue 2 using the hybrid coder, for JPEG-LS, and for M-CALIC. Consistent with the previous discussion, the CCSDS compressor yields higher compression efficiency than do JPEG-LS and M-CALIC for most instruments. Improvements of up to 63.7% and 63.4%, respectively, can be observed. Only for the MODIS instrument does JPEG-LS
perform better, yielding an average compression ratio 7.7% higher than Issue 2's with the hybrid coder. In turn, M-CALIC improves upon JPEG-LS in all cases and is able to yield results between 0.3% and 8.9% better than Issue 2 for five of the tested instruments. These differences can be explained by the fact that M-CALIC employs an arithmetic entropy coder, which enables better modeling of the source's statistics, although at the cost of higher computational complexity.

TABLE 4. THE AVERAGE COMPRESSION RATIO RESULTS GROUPED BY INSTRUMENT.
Column groups: CCSDS 123.0-B-2 with the hybrid coder (CCSDS), JPEG-LS, and M-CALIC.

| INSTRUMENT | CCSDS LOSSLESS | CCSDS PAE 1 | CCSDS PAE 2 | CCSDS PAE 5 | CCSDS PAE 16 | JPEG-LS LOSSLESS | JPEG-LS PAE 1 | JPEG-LS PAE 2 | JPEG-LS PAE 5 | JPEG-LS PAE 16 | M-CALIC LOSSLESS | M-CALIC PAE 1 | M-CALIC PAE 2 | M-CALIC PAE 5 | M-CALIC PAE 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIRS | 2.86 | 4.56 | 6.09 | 10.76 | 35.74 | 1.89 | 2.51 | 2.95 | 3.97 | 6.68 | 2.87 | 4.51 | 5.92 | 9.95 | 27.51 |
| AVIRIS | 3.11 | 5.28 | 7.66 | 15.29 | 37.52 | 1.90 | 2.56 | 3.03 | 4.09 | 7.06 | 3.01 | 4.82 | 6.31 | 10.23 | 21.36 |
| CASI | 2.29 | 3.22 | 3.96 | 5.91 | 12.21 | 1.66 | 2.08 | 2.36 | 2.96 | 4.38 | 2.27 | 3.17 | 3.87 | 5.63 | 11.09 |
| CRISM | 3.10 | 5.05 | 6.87 | 11.15 | 22.93 | 2.20 | 3.08 | 3.71 | 5.14 | 8.17 | 2.21 | 3.21 | 4.01 | 6.08 | 13.43 |
| Hyperion | 2.86 | 4.57 | 6.09 | 10.80 | 44.75 | 2.44 | 3.56 | 4.48 | 6.76 | 13.22 | 2.79 | 4.36 | 5.72 | 9.59 | 28.38 |
| IASI | 2.53 | 3.75 | 4.70 | 7.17 | 14.96 | 1.92 | 2.56 | 3.01 | 4.05 | 7.12 | 2.48 | 3.64 | 4.55 | 6.94 | 15.75 |
| Landsat | 2.35 | 4.12 | 6.24 | 12.8 | 41.88 | 2.13 | 3.68 | 5.09 | 8.46 | 20.33 | 2.37 | 3.97 | 5.4 | 9.25 | 19.51 |
| M3 | 4.38 | 7.44 | 9.61 | 14.27 | 24.28 | 2.72 | 4.15 | 5.29 | 7.27 | 10.33 | 2.68 | 4.17 | 5.42 | 8.86 | 22.49 |
| MODIS | 1.94 | 2.60 | 3.07 | 4.12 | 7.35 | 2.09 | 2.77 | 3.24 | 4.27 | 6.95 | 2.13 | 2.72 | 3.22 | 4.35 | 7.39 |
| MSG | 2.77 | 4.49 | 6.06 | 10.01 | 24.18 | 2.64 | 4.20 | 5.39 | 8.08 | 14.78 | 2.73 | 4.12 | 5.31 | 8.22 | 17.45 |
| Pleiades | 1.66 | 2.12 | 2.43 | 3.11 | 5.04 | 1.62 | 2.06 | 2.36 | 3.01 | 4.64 | 1.68 | 2.16 | 2.49 | 3.23 | 5.18 |
| SFSI | 3.07 | 5.18 | 7.02 | 11.97 | 53.21 | 2.58 | 3.75 | 4.65 | 6.99 | 16.5 | 2.91 | 4.39 | 5.65 | 9.13 | 30.05 |
| SPOT5 | 1.55 | 2.21 | 2.74 | 4.22 | 10.00 | 1.45 | 2.03 | 2.48 | 3.63 | 6.69 | 1.54 | 2.22 | 2.74 | 4.07 | 8.90 |
| Vegetation | 1.95 | 2.77 | 3.40 | 5.04 | 10.54 | 1.87 | 2.61 | 3.16 | 4.42 | 7.78 | 2.03 | 2.86 | 3.51 | 5.08 | 10.05 |
| All | 2.67 | 4.20 | 5.55 | 9.07 | 22.98 | 2.12 | 3.00 | 3.67 | 5.17 | 9.20 | 2.35 | 3.44 | 4.34 | 6.7 | 15.43 |

PAE: peak absolute error.

NEAR-LOSSLESS COMPRESSION RESULTS
Near-lossless compression results are obtained for all three entropy coders in CCSDS 123.0-B-2 as well as for JPEG-LS and M-CALIC by limiting the maximum absolute error in any pixel of the reconstructed images. This error is hereafter denoted as peak absolute error (PAE). Two illustrative examples of near-lossless compression using Issue 2 and JPEG-LS are provided in Figure 7. In the top row, it can be observed that Issue 2's hybrid coder enables higher image quality, i.e., a lower PAE, at similar, albeit smaller, compressed data sizes. Furthermore, for sufficiently low PAEs, reconstructed images are hardly distinguishable from the originals. In turn, the bottom row illustrates how moderately larger PAEs introduce some texture artifacts but retain the image's structure, and so might not hinder analysis tasks performed on it [30], [31]. A visual inspection of this row also reveals that Issue 2 introduces distortion patterns similar to those of JPEG-LS. This is expected because both algorithms apply quantization after prediction. It is worth noting that the choice of entropy coder in CCSDS 123.0-B-2 does not affect the obtained reconstructed image, only the compressed data size. Compressed data rate differences aside, a similar discussion regarding visual quality applies for M-CALIC, too. It is omitted here owing to space constraints.

The remainder of this section provides a quantitative discussion of the compression performance of the aforementioned algorithms in relation to the fidelity of the reconstructed images. For each compressor, PAE, and input image $I$, the compressed data rate expressed in bits per sample (bps) is computed as

$$\text{compressed data rate} = \frac{\text{compressed data size (bits)}}{N_X \cdot N_Y \cdot N_Z}. \tag{23}$$

In turn, the peak SNR (PSNR) between $I$ and its reconstructed counterpart $\hat{I}$ is defined as

$$\mathrm{PSNR}(I, \hat{I}) = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}(I, \hat{I})}\right) \ \text{(dB)}. \tag{24}$$

Here, $\mathrm{MAX}_I$ denotes the dynamic range of an image, i.e., $2^D - 1$, where $D$ is $I$'s bit depth, and $\mathrm{MSE}(I, \hat{I})$ is the mean squared error between $I$ and $\hat{I}$, i.e.,

$$\mathrm{MSE}(I, \hat{I}) = \frac{\sum_{x}^{N_X} \sum_{y}^{N_Y} \sum_{z}^{N_Z} \left(I_{z,y,x} - \hat{I}_{z,y,x}\right)^2}{N_X \cdot N_Y \cdot N_Z}. \tag{25}$$

FIGURE 7. (a) A crop (256 × 256) of Band 220 of an original AVIRIS f060925t01p00r12_sc00 image (calibrated, 16 bit); (b) and (c) the colocated crops of the same AVIRIS image after reconstruction with CCSDS 123.0-B-2's hybrid coder [compressed at 2.4 bits per sample (bps)] and JPEG-LS (2.9 bps) with absolute error limits of 2 and 16, respectively; (d) a crop (128 × 128) of an original SPOT5 toulouse_spot5_xs_extract1 image (processed, 8 bit); and (e) and (f) the colocated crops of the same SPOT5 image after reconstruction with CCSDS 123.0-B-2's hybrid coder (1.1 bps) and JPEG-LS (1.4 bps) with an absolute error limit of 12. The brightness and magnification have been adjusted in all of the images to facilitate a comparison. The SPOT5 images are presented using false color.
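The rate and fidelity metrics in (22)–(25), together with the PAE, can be computed as in the following minimal Python sketch. The function names are hypothetical, and images are assumed to be given as flat sequences of integer samples (any consistent ordering of the z, y, and x indices works, as all of these metrics are sums or maxima over every sample).

```python
import math


def compression_ratio(nx, ny, nz, bit_depth, compressed_bits):
    """Compression ratio as in (22): original size over compressed size."""
    return (nx * ny * nz * bit_depth) / compressed_bits


def data_rate_bps(nx, ny, nz, compressed_bits):
    """Compressed data rate in bits per sample, as in (23)."""
    return compressed_bits / (nx * ny * nz)


def mse(original, reconstructed):
    """Mean squared error over all samples, as in (25)."""
    return sum((a - b) ** 2
               for a, b in zip(original, reconstructed)) / len(original)


def psnr_db(original, reconstructed, bit_depth):
    """PSNR in dB as in (24); infinite for a lossless reconstruction."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    max_value = 2 ** bit_depth - 1
    return 10 * math.log10(max_value ** 2 / error)


def peak_absolute_error(original, reconstructed):
    """PAE: largest absolute difference over all samples."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    reconstructed = [11, 19, 30, 42]
    print(peak_absolute_error(original, reconstructed))
    print(mse(original, reconstructed))
    print(psnr_db(original, reconstructed, bit_depth=8))
```

Note that (22) and (23) are linked by the bit depth: the compression ratio equals $D$ divided by the compressed data rate, so either quantity can be recovered from the other when $D$ is known.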
The spectral angle is computed at each $(x, y)$ position for each original and reconstructed image pair, defined in [71] as

$$\alpha(x, y) = \cos^{-1}\left(\frac{\sum_{z}^{N_Z} I_{z,y,x} \cdot \hat{I}_{z,y,x}}{\sqrt{\sum_{z}^{N_Z} I_{z,y,x}^2 \cdot \sum_{z}^{N_Z} \hat{I}_{z,y,x}^2}}\right). \tag{26}$$

The mean spectral angle and maximum spectral angle metrics are defined as the average and maximum spectral angle, respectively, for all $(x, y)$ positions in the image.

Figure 8 provides near-lossless compressed data rate results for the three entropy coders of Issue 2, for JPEG-LS, and for M-CALIC, setting PAE limits between 0 (lossless) and 32. For each coder and PAE value, the plotted value is the mean compressed data rate for all the images in the corpus. Markers have been included in the figure at the integer PAE values for which data have been obtained, and linear interpolation is used between them for the sake of readability. The results indicate that, for larger PAE values, the differences between Issue 2's coders become more apparent than for the lossless case. When compared to the block- and sample-adaptive coders, the hybrid coder yields compressed data rates up to 0.2 and 0.6 bps lower, respectively. For PAE values up to 5, both JPEG-LS and M-CALIC are outperformed by all entropy coders of Issue 2. For PAE values from 20 onward, M-CALIC improves upon the block-adaptive coder. For PAEs larger than 25, JPEG-LS produces better results than the sample-adaptive coder. Notwithstanding, for PAE values of 2 and above, the hybrid coder's average results are consistently better than those of all other compressors.

The global results presented in Figure 8 are complemented by Table 4, which also reports average compression ratios for several PAE values. In it, the average compression ratios for each instrument are provided. It can be observed that the per-instrument results are generally consistent with global averages, with similar exceptions as for the lossless case. These behaviors are explained by the different predictor stages and by the way in which each coder handles the low-entropy data prevalent in near-lossless compression. The sample-adaptive coder does not have a mode in which multiple input symbols are compressed in a single codeword. Therefore, the minimum length of any sample-adaptive codeword sets a lower bound for the compressed data rates achievable by this coder. Both JPEG-LS and the block-adaptive coder have run-length modes that allow coding of consecutive zeros in a single codeword. Thus, their compression performance improves as such runs become more prevalent. In turn, the 16 stateful codes featured in the hybrid coder enable a more efficient processing of low-entropy data, including inputs that are not sequences of only zeros. Finally, M-CALIC's performance improvement for higher PAEs is due to its arithmetic entropy coder, which is close to optimal for many data distributions.

FIGURE 8. The average compressed data rate in bps as a function of the maximum absolute error. (Curves: block adaptive, sample adaptive, hybrid, JPEG-LS, M-CALIC, and CCSDS 122.1 (POT); horizontal axis: PAE; vertical axis: compressed data rate in bps.)

FIGURE 9. The average PSNR results as a function of the average compressed data rate. (Curves: block adaptive, sample adaptive, hybrid, JPEG-LS, M-CALIC, and CCSDS 122.1 (POT); horizontal axis: compressed data rate; vertical axis: PSNR in dB.)

In addition to considering the compressed data rates and PAE of the reconstructed images, it is useful to consider other distortion metrics to better understand the efficiency of each coder. To complete the rate-distortion compression performance comparison, the average PSNR as a function of the average compressed data rate is plotted in Figure 9. The mean spectral angle and maximum spectral angle metrics are plotted in Figure 10(a) and (b), respectively. All of the metrics are computed for each coder, PAE value (or target bitrate, for CCSDS 122.1-B-1), and test image, and the mean values are used in the plots. Markers are placed at the obtained data points, and linear interpolation is used between them to enhance readability. As in the previous case, the hybrid coder yields better fidelity results than do the other near-lossless coders for all the metrics, especially at low compressed data rates. This and other differences among compressors are comparable to those shown in Figure 8, for similar reasons as mentioned previously in this section.

When compared to CCSDS 122.1-B-1, all of the near-lossless codecs yield significantly better PAE results. This is expected, as the CCSDS 122.1-B-1 standard is not designed to bound the maximum introduced error but rather to minimize MSE. At low bitrates, i.e., below 1.25 bps, CCSDS 122.1-B-1 yields the best PSNR results of all the tested codecs. Again, this can be explained by the minimization goal of the standard. At higher bitrates, the hybrid coder of Issue 2 produces the best PSNR results, which illustrates the competitive performance of CCSDS 123.0-B-2. When spectral angles are considered, the relative performance of the near-lossless coders is very similar to the PAE and PSNR cases. In turn, for the mean spectral angle metric, CCSDS 122.1-B-1 improves upon all the other coders for bitrates up to 2 bps. This can be explained by the fact that CCSDS 122.1-B-1 applies a spectral transform across all bands, instead of predicting pixel values using a local spatial and spectral neighborhood. Interestingly, when the maximum spectral angle is considered, CCSDS 123.0-B-2 yields better results than does CCSDS 122.1-B-1, except for low bitrates, i.e., below 0.75 bps. This can be explained by the fact that CCSDS 123.0-B-2 is near lossless, i.e., it bounds the maximum error introduced in any pixel of the image.

CONCLUSIONS
Multispectral imaging and HSI have become invaluable tools for many commercial, scientific, and defense applications of remote sensing.
With the advent of sensors allowing enhanced spatial and spectral resolution, data compression is paramount to maximize the amount of valuable information retrieved from spaceborne systems. In particular, near-lossless compression can significantly improve the effective capacity of transmission channels while providing strict control of the distortion introduced in the images. Even if rate-control strategies are possible, strong quality guarantees are prioritized over obtaining constant data rates in near-real-time transmission.

The CCSDS 123.0-B-2 compression standard published by the CCSDS enables the specification of absolute and/or relative error limits at the image or band level. This is achieved via the uniform, in-loop quantization of prediction errors, which yields higher performance at the cost of implementation simplicity. As the decompressor does not have access to the original image samples, sample representatives are used instead in the predictor stage. To fully exploit the lower entropy rates exhibited by quantized data, a new hybrid entropy coder is defined for Issue 2. This coder includes 16 variable-to-variable-length codes selected on a sample-by-sample basis depending on the statistics of previously coded information. One last improvement over CCSDS 123.0-B-1 is the definition of narrow local sums that facilitate the design of highly efficient hardware implementations. Experimental results with a comprehensive corpus of test images indicate that the new hybrid coder yields competitive compression performance results, measurably improving upon the other coding modes of Issue 2 as well as upon the JPEG-LS compression standard and the M-CALIC algorithm.
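The idea of uniform, in-loop quantization with a bounded absolute error can be illustrated with the following Python sketch. It is a deliberately simplified toy, not the standard's algorithm: the predictor is a hypothetical previous-sample predictor, the reconstructed sample itself plays the role of the sample representative, and the quantizer is the usual uniform quantizer with step size 2m + 1 for an absolute error limit m.

```python
def quantize(residual, m):
    """Uniform quantizer with step 2m + 1 (m = 0 gives lossless coding)."""
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + m) // (2 * m + 1))


def compress_and_reconstruct(samples, m, initial_prediction=0):
    """Toy in-loop scheme: predict from the previous RECONSTRUCTED sample.

    Returns the quantizer indices (what an entropy coder would receive)
    and the reconstruction that the decompressor would obtain.
    """
    indices, reconstructed = [], []
    prediction = initial_prediction  # hypothetical starting value
    for sample in samples:
        index = quantize(sample - prediction, m)
        value = prediction + index * (2 * m + 1)
        indices.append(index)
        reconstructed.append(value)
        # The encoder predicts from the reconstructed value, which is the
        # only information the decoder has, so both stay synchronized.
        prediction = value
    return indices, reconstructed


if __name__ == "__main__":
    data = [100, 103, 98, 120, 119, 50]
    for limit in (0, 2, 5):
        _, recon = compress_and_reconstruct(data, limit)
        print(limit, max(abs(a - b) for a, b in zip(data, recon)))
```

Because the quantization happens inside the prediction loop, errors do not accumulate from sample to sample, and the absolute error of every reconstructed sample stays within the chosen limit.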
FIGURE 10. The spectral angle metrics as a function of the compressed data rate. (a) The mean spectral angle and (b) the maximum spectral angle. (Curves: block adaptive, sample adaptive, hybrid, JPEG-LS, M-CALIC, and CCSDS 122.1 (POT); horizontal axes: compressed data rate in bps; vertical axes: spectral angle in degrees.)

The standard obtains state-of-the-art performance in absolute or relative error measurements,
while other approaches may provide better performance in terms of quadratic error at very low rates. Regarding future developments related to this standard, it is unlikely that major changes will be introduced soon.

ACKNOWLEDGMENTS
Miguel Hernández-Cabronero, Ian Blanes, and Joan Serra-Sagristà received partial funding from the postdoctoral fellowship program Beatriu de Pinós, reference 2018-BP00008, funded by the Secretary of Universities and Research (Government of Catalonia) and by the H2020 Programme of Research and Innovation of the European Union (EU) under Marie Skłodowska-Curie grant agreement 801370; from the EU's H2020 program under grant agreement 776151; from the Spanish Government under grant RTI2018-095287-B-I00; and from the Catalan Government under grant 2017SGR-463. The research conducted at the Jet Propulsion Laboratory at the California Institute of Technology was performed under a contract with NASA. Miguel Hernández-Cabronero is the corresponding author.

AUTHOR INFORMATION
Miguel Hernández-Cabronero (miguel.hernandez@uab.cat) is with the Department of Information and Communications Engineering, Universitat Autònoma de Barcelona, Barcelona, 08193, Spain.
Aaron B. Kiely (aaron.b.kiely@jpl.nasa.gov) is with the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, 91109, USA. He is a Senior Member of IEEE.
Matthew Klimesh (matthew.a.klimesh@jpl.nasa.gov) is with the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, 91109, USA. He is a Senior Member of IEEE.
Ian Blanes (ian.blanes@uab.ca) is with the Department of Information and Communications Engineering, Universitat Autònoma de Barcelona, Barcelona, 08193, Spain. He is a Senior Member of IEEE.
Jonathan Ligo (jonathan.ligo@jhuapl.edu) is with the Applied Physics Laboratory, Johns Hopkins University, Baltimore, Maryland, 20723, USA. He is a Member of IEEE.
Enrico Magli (enrico.magli@polito.it) is with the Department of Electronics and Telecommunications, Politecnico di Torino, Turin, 10129, Italy. He is a Fellow of IEEE.
Joan Serra-Sagristà (joan.serra@uab.cat) is with the Department of Information and Communications Engineering, Universitat Autònoma de Barcelona, Barcelona, 08193, Spain. He is a Senior Member of IEEE.

REFERENCES
[1] M. Parente, J. Kerekes, and R. Heylen, “A special issue on hyperspectral imaging [from the guest editors],” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 6–7, June 2019. doi: 10.1109/MGRS.2019.2912617.
[2] E. J. Ientilucci and S. Adler-Golden, “Atmospheric compensation of hyperspectral data: An overview and review of in-scene and physics-based approaches,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 31–50, June 2019. doi: 10.1109/MGRS.2019.2904706.
[3] M. J. Khan, H. S. Khan, A. Yousaf, K. Khurshid, and A. Abbas, “Modern trends in hyperspectral image analysis: A review,” IEEE Access, vol. 6, pp. 14,118–14,129, Mar. 2018. doi: 10.1109/ACCESS.2018.2812999.
[4] M. Malyy, Z. Tekic, and A. Golkar, “What drives technology innovation in new space? A preliminary analysis of venture capital investments in earth observation start-ups,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 1, pp. 59–73, Mar. 2019. doi: 10.1109/MGRS.2018.2886999.
[5] J. Theiler, A. Ziemann, S. Matteoli, and M. Diani, “Spectral variability of remotely sensed target materials: Causes, models, and strategies for mitigation and robust exploitation,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 8–30, June 2019. doi: 10.1109/MGRS.2019.2890997.
[6] Y. Zhong et al., “Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 6, no. 4, pp. 46–62, Dec. 2018. doi: 10.1109/MGRS.2018.2867592.
[7] G. Denis et al., “Towards disruptions in Earth observation? New Earth Observation systems and markets evolution: Possible scenarios and impacts,” Acta Astronaut. (U.K.), vol. 137, pp. 415–433, Aug. 2017. doi: 10.1016/j.actaastro.2017.04.034.
[8] W. Sun and Q. Du, “Hyperspectral band selection: A review,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 118–139, June 2019. doi: 10.1109/MGRS.2019.2911100.
[9] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. A. Benediktsson, “Deep learning for hyperspectral image classification: An overview,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 6690–6709, 2019. doi: 10.1109/TGRS.2019.2907932.
[10] P. Duan, X. Kang, S. Li, P. Ghamisi, and J. A. Benediktsson, “Fusion of multiple edge-preserving operations for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,336–10,349, 2019. doi: 10.1109/TGRS.2019.2933588.
[11] Y. Su, J. Li, A. Plaza, A. Marinoni, P. Gamba, and S. Chakravortty, “DAEN: Deep autoencoder networks for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4309–4321, 2019. doi: 10.1109/TGRS.2018.2890633.
[12] Y. Chen, K. Zhu, L. Zhu, X. He, P. Ghamisi, and J. A. Benediktsson, “Automatic design of convolutional neural network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 7048–7066, 2019. doi: 10.1109/TGRS.2019.2910603.
[13] J. M. Haut et al., “Cloud deep networks for hyperspectral image analysis,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 9832–9848, 2019. doi: 10.1109/TGRS.2019.2929731.
[14] B. Tu, X. Zhang, X. Kang, J. Wang, and J. A. Benediktsson, “Spatial density peak clustering for hyperspectral image classification with noisy labels,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 5085–5097, 2019. doi: 10.1109/TGRS.2019.2896471.
[15] K. Bhardwaj, S. Patra, and L. Bruzzone, “Threshold-free attribute profile for classification of hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 7731–7742, 2019. doi: 10.1109/TGRS.2019.2916169.
[16] X. Lu, L. Dong, and Y. Yuan, “Subspace clustering constrained sparse NMF for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3007–3019, 2020. doi: 10.1109/TGRS.2019.2946751.
[17] C. J. Della Porta, A. A. Bekit, B. H. Lampe, and C. Chang, “Hyperspectral image classification via compressive sensing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 10, pp. 8290–8303, 2019. doi: 10.1109/TGRS.2019.2920112.
[18] J. Nalepa, M. Myller, and M. Kawulok, “Validating hyperspectral image segmentation,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 8, pp. 1264–1268, 2019. doi: 10.1109/LGRS.2019.2895697.
[19] D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, and X. X. Zhu, “Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., 2020, pp. 1–18.
[20] “IASI level 1: Product guide,” EUMETSAT, Tech. Rep. EUM/OPS-EPS/MAN/04/0032, Darmstadt, Germany, Sept. 2019.
[21] K. Turpie, S. Veraverbeke, R. Wright, M. Anderson, and D. Quattrochi, “NASA 2014 The Hyperspectral Infrared Imager (HyspIRI) – Science impact of deploying instruments on separate platforms,” Jet Propulsion Lab., Tech. Rep. JPL-Publ-14-13, July 2014. [Online]. Available: http://hdl.handle.net/2060/20160001776
[22] S.-E. Qian, Optical Satellite Data Compression and Implementation. Bellingham, WA: SPIE, 2013.
[23] B. Huang, Satellite Data Compression. Berlin, Germany: Springer Science & Business Media, 2011.
[24] K. Sayood, Introduction to Data Compression, 5th ed. San Mateo, CA: Morgan Kaufmann, 2017.
[25] S. Álvarez-Cortés, J. Serra-Sagristà, J. Bartrina-Rapesta, and M. W. Marcellin, “Regression wavelet analysis for near-lossless remote sensing data compression,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 2, pp. 790–798, 2020. doi: 10.1109/TGRS.2019.2940553.
[26] D. Valsesia and E. Magli, “High-throughput onboard hyperspectral image compression with ground-based CNN reconstruction,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 9544–9553, Dec. 2019. doi: 10.1109/TGRS.2019.2927434.
[27] M. Díaz et al., “Real-time hyperspectral image compression onto embedded GPUs,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 8, pp. 2792–2809, 2019. doi: 10.1109/JSTARS.2019.2917088.
[28] S.-E. Qian, Optical Satellite Signal Processing and Enhancement. Bellingham, WA: SPIE, 2013.
[29] Z. Chen, Y. Hu, and Y. Zhang, “Effects of compression on remote sensing image classification based on fractal analysis,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4577–4590, July 2019. doi: 10.1109/TGRS.2019.2891679.
[30] J. García-Sobrino, J. Serra-Sagristà, and A. J. Pinho, “Competitive segmentation performance on near-lossless and lossy compressed remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 5, pp. 834–838, 2020. doi: 10.1109/LGRS.2019.2934997.
[31] F. Garcia-Vilchez et al., “On the impact of lossy compression on hyperspectral image classification and unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 2, pp. 253–257, 2010. doi: 10.1109/LGRS.2010.2062484.
[32] I. Blanes, E. Magli, and J. Serra-Sagristà, “A tutorial on image compression for optical space imaging systems,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 2, no. 3, pp. 8–26, Sept. 2014. doi: 10.1109/MGRS.2014.2352465.
[33] A. D. George and C. M. Wilson, “Onboard processing with hybrid and reconfigurable computing on small satellites,” Proc. IEEE, vol. 106, no. 3, pp. 458–470, 2018. doi: 10.1109/JPROC.2018.2802438.
[34] Low-Complexity Lossless and Near-Lossless Multispectral and Hyperspectral Image Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 123.0-B-2, Feb. 2019. [Online]. Available: https://public.ccsds.org/Pubs/123x0b2c1.pdf
[35] Lossless Multispectral & Hyperspectral Image Compression. Silver Book, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 123.0-B-1-S, May 2012. [Online]. Available: https://public.ccsds.org/Pubs/123x0b1ec1s.pdf
[36] A. Kiely et al., “The new CCSDS Standard for low-complexity lossless and near-lossless multispectral and hyperspectral image compression,” in Proc. 6th Int. Workshop on On-Board Payload Data Compression (OBPDC), 2018, pp. 1–6.
[37] I. Blanes, A. Kiely, M. Hernández-Cabronero, and J. Serra-Sagristà, “Performance impact of parameter tuning on the CCSDS-123.0-B-2 low-complexity lossless and near-lossless multispectral and hyperspectral image compression standard,” MDPI Remote Sens., vol. 11, no. 11, p. 1390, 2019. doi: 10.3390/rs11111390.
[38] Lossless Data Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 121.0-B-1-S, May 1997. [Online]. Available: https://public.ccsds.org/Pubs/121x0b1sc2.pdf
[39] Image Data Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 122.0-B-1-S, May 2005. [Online]. Available: https://public.ccsds.org/Pubs/122x0b1c3s.pdf
[40] Lossless Data Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 121.0-B-2, Apr. 2012. [Online]. Available: https://public.ccsds.org/Pubs/121x0b2ec1s.pdf
[41] Image Data Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 122.0-B-2, Sept. 2017. [Online]. Available: https://public.ccsds.org/Pubs/122x0b2.pdf
[42] Spectral Preprocessing Transform for Multispectral and Hyperspectral Image Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 122.1-B-1, Sept. 2017. [Online]. Available: https://public.ccsds.org/Pubs/122x1b1.pdf
[43] Lossless Data Compression, Consultative Committee for Space Data Systems (CCSDS) Standard CCSDS 121.0-B-3, Aug. 2020. [Online]. Available: https://public.ccsds.org/Pubs/121x0b3.pdf
[44] D. Báscones, C. González, and D. Mozos, “Parallel implementation of the CCSDS 1.2.3 standard for hyperspectral lossless compression,” MDPI Remote Sens., vol. 9, no. 10, p. 973, 2017. doi: 10.3390/rs9100973.
[45] A. Tsigkanos, N. Kranitis, G. A. Theodorou, and A. Paschalis, “A 3.3 Gbps CCSDS 123.0-B-1 multispectral hyperspectral image compression hardware accelerator on a space-grade SRAM FPGA,” IEEE Trans. Emerg. Topics Comput., early access, July 12, 2018.
[46] J. Fjeldtvedt, M. Orlandić, and T. A. Johansen, “An efficient real-time FPGA implementation of the CCSDS-123 compression standard for hyperspectral images,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 11, no. 10, pp. 3841–3852, 2018. doi: 10.1109/JSTARS.2018.2869697.
[47] M. Orlandić, J. Fjeldtvedt, and T. A. Johansen, “A parallel FPGA implementation of the CCSDS-123 compression algorithm,” MDPI Remote Sens., vol. 11, no. 6, p. 673, 2019. doi: 10.3390/rs11060673.
[48] L. M. V. Pereira, D. A. Santos, C. A. Zeferino, and D. R. Melo, “A low-cost hardware accelerator for CCSDS 123 predictor in FPGA,” in Proc. IEEE Int. Symp. Circuits and Syst. (ISCAS), 2019, pp. 1–5. doi: 10.1109/ISCAS.2019.8702428.
[49] L. Santos, L. Berrojo, J. Moreno, J. F. López, and R. Sarmiento, “Multispectral and hyperspectral lossless compressor for space applications (HyLoC): A low-complexity FPGA implementation of the CCSDS 123 standard,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 9, no. 2, pp. 757–770, 2016. doi: 10.1109/JSTARS.2015.2497163.
[50] L. Santos, A. J. Gomez, and R. Sarmiento, “Implementation of CCSDS standards for lossless multispectral and hyperspectral satellite image compression,” IEEE Trans. Aerosp. Electron. Syst., vol. 56, no. 2, pp. 1120–1138, 2020. doi: 10.1109/TAES.2019.2929971.
[51] Y. Barrios, A. J. Sánchez, L. Santos, and R. Sarmiento, “Shyloc 2.0: A versatile hardware solution for on-board data and hyperspectral image compression on future space missions,” IEEE Access, vol. 8, pp. 54,269–54,287, 2020. doi: 10.1109/ACCESS.2020.2980767.
[52] “High-speed integrated satellite data systems for leading EU industry,” European Commission, Hi-SIDE Project, H2020-COMPET-3-2017 (RIA): High speed data chain, Germany, 2018–2021.
[53] “Next generation satellite processing chain for rapid civil alerts,” European Commission, EO-ALERT Project, H2020-COMPET-3-2017 (RIA): High speed data chain, Spain, 2018–2021.
[54] D.
Keymeulen et al., “High performance space data acquisition, clouds screening and data compression with modified COTS embedded system-on-chip instrument avionics for space-based next generation imaging spectrometers (NGIS),” in Proc. 6th Int. Workshop on On-Board Payload Data Compression (OBPDC), 2018, pp. 7–15. [55] “Copernicus Hyperspectral Imaging Mission for the Environment, mission requirements document.” European Space Agency, France, 2018. http://esamultimedia.esa.int/docs/EarthObservation/Copernicus_CHIME_MRD_v2.1_Issued20190723.pdf [56] M. Conoscenti, R. Coppola, and E. Magli, “Constant SNR, rate control, and entropy coding for predictive lossy hyperspectral image compression,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7431–7441, 2016. doi: 10.1109/TGRS.2016.2603998. [57] J. Bartrina-Rapesta, I. Blanes, F. Aulí-Llinàs, J. Serra-Sagristà, V. Sanchez, and M. W. Marcellin, “A lightweight contextual arithmetic coder for on-board remote sensing data compression,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 8, pp. 4825–4835, 2017. doi: 10.1109/TGRS.2017.2701837. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [58] J. Song, Z. Zhang, and X. Chen, “Lossless compression of hyperspectral imagery via RLS filter,” Electron. Lett., vol. 49, no. 16, pp. 992–994, 2013. doi: 10.1049/el.2013.1315. [59] F. Gao and S. Guo, “Lossless compression of hyperspectral images using conventional recursive least-squares predictor with adaptive prediction bands,” J. Appl. Remote Sens., vol. 10, no. 1, p. 015010, 2016. doi: 10.1117/1.JRS.10.015010. [60] A. C. Karaca and M. K. Güllü, “Lossless hyperspectral image compression using bimodal conventional recursive leastsquares,” Remote Sens. Lett., vol. 9, no. 1, pp. 31–40, 2018. doi: 10.1080/2150704X.2017.1375612. [61] A. C. Karaca and M. K. Güllü, “Superpixel based recursive leastsquares method for lossless compression of hyperspectral images,” Multidimensional Syst. Signal Process., vol. 30, no. 2, pp. 
903–919, 2019. [62] D. Keymeulen et al., “High performance space computing with system-on-chip instrument avionics for space-based Next Generation Imaging Spectrometers (NGIS),” in Proc. NASA/ESA Conf. Adaptive Hardware and Syst. (AHS), Aug. 2018, pp. 33–36. doi: 10.1109/AHS.2018.8541473. [63] M. Klimesh, “Low-complexity lossless compression of hyperspectral imagery via adaptive filtering,” Jet Propulsion Lab., NASA, Pasadena, CA, Tech. Rep., 2005. [Online]. Available: http://ipnpr.jpl.nasa.gov/progress_report/42-163/163H.pdf [64] D. Valsesia and E. Magli, “A novel rate control algorithm for onboard predictive coding of multispectral and hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 10, pp. 6341–6355, 2014. doi: 10.1109/TGRS.2013.2296329. [65] D. Valsesia and E. Magli, “Fast and lightweight rate control for onboard predictive coding of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 3, pp. 394–398, 2017. [66] R. Guerra, Y. Barrios, M. Díaz, A. Baez, S. López, and R. Sarmiento, “A hardware-friendly hyperspectral lossy compressor for nextgeneration space-grade field programmable gate arrays,” IEEE J. Select. Topics Appl. Earth Observat. Remote Sens., vol. 12, no. 12, pp. 4813–4828, 2019. doi: 10.1109/JSTARS.2019.2919791. [67] Lossless Data Compression, Green Book, no. 3, Consultative Committee for Space Data Systems (CCSDS), Washington, D.C., 2013. [68] I. Blanes, A. Kiely, L. Santos, M. Hernández-Cabronero, and J. Serra-Sagristà, “The hybrid entropy encoder of CCSDS 123.0B-2: Insights and decoding process,” in Proc. 7th Int. Workshop on On-Board Payload Data Compression (OBPDC), Sept. 2020, pp. 1–10. [69] M. Hernández-Cabronero, J. Portell, I. Blanes, and J. SerraSagristà, “High-performance lossless compression of hyperspectral remote sensing scenes based on spectral decorrelation,” MDPI Remote Sens., vol. 12, no. 18, p. 2955, 2020. doi: 10.3390/rs12182955. [70] E. Magli, G. Olmo, and E. 
Quacchio, “Optimized onboard lossless and near-lossless compression of hyperspectral data using CALIC,” IEEE Geosci. Remote Sens. Lett., vol. 1, no. 1, pp. 21–25, 2004. doi: 10.1109/LGRS.2003.822312. [71] F. A. Kruse et al., “The spectral image processing system (SIPS)interactive visualization and analysis of imaging spectrometer data,” AIP Conf. Proc., vol. 283, no. 1, pp. 192–201, 1993. GRS 119
Advances and Opportunities in Remote Sensing Image Geometric Registration
A systematic review of state-of-the-art approaches and future research directions

RUITAO FENG, HUANFENG SHEN, JIANJUN BAI, AND XINGHUA LI

IEEE Geoscience and Remote Sensing Magazine, December 2021. Digital Object Identifier 10.1109/MGRS.2021.3081763. Date of current version: 28 June 2021. 0274-6638/21 ©2021 IEEE.

Geometric registration underpins the accuracy of most remote sensing image processing and analysis tasks, such as image mosaicking, image fusion, and time-series analysis. In recent decades, geometric registration has attracted considerable attention in the remote sensing community, leading to a large amount of research on the subject. However, few studies have systematically reviewed its current status and investigated its development trends in depth. Moreover, new approaches are constantly emerging, and some issues still need to be solved. Thus, this article presents a survey of state-of-the-art approaches for remote sensing image registration in terms of intensity-based, feature-based, and combination techniques. Optical flow estimation and deep learning-based methods are summarized, and software-operated registration and registration evaluation are introduced. Building on recent advances, promising opportunities are explored.

OVERVIEW
Remote sensing images from various sensors, periods, and viewpoints can provide complementary information about regions of interest (ROIs) and support Earth surface observation. Owing to various factors, such as Earth's rotation and curvature and variations in platform altitudes, remote sensing images contain systematic geometric distortions that cannot be thoroughly corrected without high-precision elevation data [through the digital elevation model (DEM) or the digital surface model (DSM)] and control points on the ground. Although the true digital orthophoto map (TDOM) promises accurate spatial positions, it has high production costs and is difficult for general users to obtain. Therefore, most available remote sensing images retain small geometrical distortions after systematic correction, so objects in one image do not spatially correspond to those in another image, as in Figure 1. Furthermore, topographical fluctuations in mountainous regions, differences in imaging viewpoints (shown in Figure 2), and differences in spatial resolution cause dislocation between two images covering the same scene. Thus, geometrical registration techniques are implemented to align two or more images from the image-to-image perspective rather than from the imaging mechanism. Consequently, geometrical registration is an image-processing technique that aligns different images of the same scene acquired at various times and viewing angles and with multiple sensors [1]. As a fundamental task in remote sensing information processing, it is a prerequisite for many practical applications, such as image mosaicking [2], image fusion [3], land cover change detection [4], [5], and disaster evaluation [6], [7].

FIGURE 1. Multitemporal optical image geometrical dislocation. (a) A reference image taken by Landsat 5 on 15 October 1990. (b) A sensed image taken by Landsat 5 on 15 September 1993. (c) The overlap of (a) and (b).

It is worth noting that there is a technical term, coregistration, that is similar to, but not exactly the same as, image registration. It is now commonly used in aerial and unmanned aerial vehicle image registration, generally including multimode registration and alignment through the aid of auxiliary data. When the registration is conducted with a GPS/inertial measurement unit, it usually establishes a connection between an image and the simulated or real ground [8]. Registration technology is still what generates the tie points from which these relationships are constructed. With real ground control points (GCPs), the tie points between the reference and sensed bands are produced to register different bands of hyperspectral images [9]. Additionally, when the orientation of the reference image is determined, the coregistration of multitemporal high-resolution image blocks can be achieved automatically without GCPs [10]. Although a growing number of papers focus on coregistration techniques that draw on auxiliary positioning data, the core of the process remains image registration, as far as we are concerned.
Therefore, the emphasis is put on the opportunities and challenges of geometrical registration in remote sensing fields.

Geometrical registration can be traced to the 1970s, when the United States proposed image registration to analyze target objects in aircraft-aided navigation and weapons systems. Since then, it has rapidly developed, particularly in the domains of remote sensing, computer vision, and medical image processing. Several review studies of computer vision and medical image processing have been published [11]–[16]. Building on a widespread survey of image registration, published in 1992 by Brown [15], a 2003 review [16] comprehensively summarized the subsequent research. In recent years, several overviews of image registration have focused on newly developed approaches inspired by extant versions [17]–[19]. However, these surveys are limited to analyzing and drawing conclusions based on conventional approaches [20]–[22]. Since the first study of multispectral and multitemporal digital imagery registration in 1970 [23], an increasing number of papers have contributed to the field. A total of 140,983 related studies with the keywords image registration or image matching, published from 1979 to January 2021, were retrieved from Web of Science (WoS). When screening again using the keyword remote sensing, 46,141 articles were found; they are plotted by publication year in Figure 3, together with their proportion of the total number of papers on WoS per year. A small number of papers were presented early in the field's development, with remote sensing image registration accounting for a minimal percentage of annual WoS publications. More recently, a considerable number of studies have been published, peaking in 2019. Thus, comprehensive analysis is necessary to identify unsolved problems for the rapid development of this field.

FIGURE 2. The angle difference from multitemporal images in a mountainous region. [Figure labels: T and T + t.]

FIGURE 3. The number of papers about remote sensing image registration on WoS, per year. [Axes: year, 1979 to 2021; published papers, 0 to 3,500; proportion (%), 0 to 0.14. Annotated peak: 2019, 3,154 published papers.]

In this article, we summarize various classical approaches to remote sensing image registration as well as recent methods based on deep learning, optical flow estimation, and image registration software. We also point out interesting aspects and analyze development trends from our perspective, without describing specific approaches in detail. Concretely, the registration approaches can be classified into three categories, namely, intensity-based, feature-based, and combination registration, as detailed in Figure 4. The intensity-based technique directly uses pixel intensity information to register images, including the conventional area-based approach and optical flow estimation. Approaches that register images using geometrical or advanced features, rather than intensity information, are defined as feature-based. Combination registration mainly consists of the integration of feature- and area-based methods as well as the integration of two geometric feature-based techniques. Many detailed classifications are presented in each category.

FIGURE 4. The remote sensing image registration algorithms. GAN: generative adversarial network. [Recoverable diagram labels: intensity-based method (area based: spatial domain, frequency domain; optical flow: dense optical flow, sparse optical flow); feature-based method (geometrical feature based: points, lines, and polygons, feature matching, ...; deep learning: Siamese network, GAN); combination method (feature- and area-based methods; two geometric feature-based methods).]

All registration approaches must undergo coordinate transformation and resampling to ultimately acquire the aligned image, as demonstrated in Figure 5. Before this step, a transformation model for coordinate recalculation must be constructed, except in optical flow estimation. In general, transformation models, such as the affine, projective, piecewise linear, and thin-plate spline models, are derived from global or local parametric models. To calculate these models, images are preprocessed to extract representative features through techniques including geometrical- and advanced-feature extraction and matching. Given that intensity information is directly utilized in area-based registration, feature extraction is omitted, and the transformation model is constructed when matching the intensity information. Since most approaches contribute to the preliminary steps (e.g., feature extraction, feature matching, and mismatched feature elimination) rather than designing new transformation models or presenting novel resampling techniques, this article likewise emphasizes those preliminary steps, comprehensively summarizing studies and further predicting development trends.

INTENSITY-BASED REGISTRATION
Intensity-based registration directly employs original or extended intensity information, such as gradients, for registering remote sensing images. In addition to the traditional area-based approach, we classify optical flow estimation, which directly calculates the displacement increments of corresponding pixels from intensity information, as intensity-based registration.

AREA-BASED METHOD
In general, area-based registration starts from a similarity criterion established in advance and adopts an optimal search strategy to iteratively find the parameters of the transformation model that maximize or minimize the similarity measurement, thereby achieving the spatial registration of images, as illustrated in Figure 6. As the transformation model is progressively optimized, the aligned image changes gradually, which is mainly reflected in the growing black area in the lower- and upper-left-hand corners of the aligned image. This approach differs from image matching, which is generally understood as template matching. Although both methods directly employ intensity information, template matching aims to extract the centroids of matched windows as feature points. This process is not true geometric registration, but it constitutes an important step. Here, we introduce area-based registration. The well-known core of this technique is the similarity metric, which has been researched in terms of spatial- and frequency-domain approaches [16], [24], [25].

SPATIAL-DOMAIN APPROACH
Spatial-domain techniques directly employ the intensity differences and statistical information of all pixels, without any image transformation.
These methods generally approach the problem from one of two perspectives, namely, the correlation-like technique or the mutual information (MI) algorithm.

CORRELATION-LIKE SIMILARITY METRIC
This technique determines the spatial alignment of images by directly comparing the similarity of corresponding pixels. It is vulnerable to intensity changes, which may be introduced, for instance, by noise, thick or thin clouds, and differences in the photosensitive components of various sensors. As a fundamental similarity metric, the cross-correlation (CC) algorithm directly compares corresponding pixels and iteratively registers the images until their CC is maximized, which is useful for small rigid-body and affine transformations [26], [27]. Many other correlation-like similarity metrics are available, including the sequential similarity detection algorithm [28], correlation coefficient [29], [30], normalized CC (NCC) [31]–[33], sum of squared differences [34], Hausdorff distance [35], and other minimum distance criteria. NCC, in particular, is very popular and widely applied due to its invariance to linear intensity variations [31], [36], [37]. Recently, the centers of windows well matched by NCC have been used as feature points to solve transformation model parameters [38], a process known as image matching. Denoting by $\rho(R, S)$ the NCC coefficient of matched windows, we calculate NCC as follows:

$$
\rho(R, S) = \frac{\sum_{i=1}^{m \times n} \bigl(R(i) - \mu_R\bigr)\bigl(S(i) - \mu_S\bigr)}{\sqrt{\sum_{i=1}^{m \times n} \bigl(R(i) - \mu_R\bigr)^2 \, \sum_{i=1}^{m \times n} \bigl(S(i) - \mu_S\bigr)^2}}, \tag{1}
$$

where the predefined window consists of $m \times n$ pixels, $R(i)$ and $S(i)$ denote the intensities at position $i$ in the windows of the reference and sensed images, and $\mu_R$ and $\mu_S$ are the average intensity values of the respective windows.
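As an illustration of (1), the following minimal Python sketch computes the NCC coefficient of two windows. It assumes the windows have already been flattened into equally sized lists of intensities; the function name `ncc` and the sample values are hypothetical and are not taken from the cited works.

```python
import math


def ncc(ref_window, sensed_window):
    """Normalized cross-correlation of two equally sized, flattened windows."""
    if len(ref_window) != len(sensed_window) or not ref_window:
        raise ValueError("windows must be non-empty and equally sized")
    n = len(ref_window)
    mean_r = sum(ref_window) / n
    mean_s = sum(sensed_window) / n
    # Numerator of (1): sum of products of mean-centered intensities.
    num = sum((r - mean_r) * (s - mean_s)
              for r, s in zip(ref_window, sensed_window))
    # Denominator of (1): square root of the product of the two energy terms.
    energy_r = sum((r - mean_r) ** 2 for r in ref_window)
    energy_s = sum((s - mean_s) ** 2 for s in sensed_window)
    denom = math.sqrt(energy_r * energy_s)
    # A flat window has no structure to correlate; return 0 by convention here.
    return num / denom if denom > 0 else 0.0


window = [10, 12, 15, 11, 30, 28, 9, 14, 16]   # a 3 x 3 window, row-major
brighter = [2 * v + 5 for v in window]          # linear intensity change
inverted = [-v for v in window]                 # contrast reversal

score_linear = ncc(window, brighter)
score_inverted = ncc(window, inverted)
```

In this toy case, the linearly rescaled window scores 1 (up to rounding), which reflects the invariance to linear intensity variations noted above, while the contrast-reversed window scores -1.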
NCC has been used to generate tie points that resist complicated geometric deformation [31], [38], [39]; it has recently been integrated with a novel feature descriptor [e.g., the local self-similarity (LSS) descriptor] for robust feature extraction in multimodal remote sensing image registration [36]. Although NCC is superior to the traditional correlation-like similarity metrics, it is unable to handle nonlinear radiometric differences, which is a common problem for correlation-like similarity metrics.

FIGURE 5. General geometrical registration. [Flowchart labels: reference image; sensed image; intensity information; gradient information; geometrical feature; advanced feature; image preprocessing; transformation model construction; displacement field; coordinate transformation/resampling; aligned image.]

MI APPROACH
MI is a more recent technique than the correlation-like metrics; it has been successfully applied to multispectral and multisensor image registration due to its robustness against nonlinear radiation differences [40]–[43], and it is usually calculated by (2). The normalized MI (NMI) method is a measure that is independent of changes in the marginal entropies of two images in their region of overlap [44], [45]. MI and NMI are the same type of statistical similarity measurement, and both are prone to registration errors. Inspired by these approaches, the region–MI approach was developed [46] with consideration of structural information.
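As a rough illustration of how MI can be estimated in practice, the sketch below computes it as the sum of the two marginal entropies minus the joint entropy, using a coarse joint histogram. It assumes 8-bit intensities supplied as flattened, equally sized lists; the function name, the bin count, and the sample data are illustrative assumptions rather than settings from the cited works.

```python
import math
from collections import Counter


def mutual_information(ref, sensed, bins=8, max_value=255):
    """Estimate MI (in bits) between two equally sized lists of intensities."""
    if len(ref) != len(sensed) or not ref:
        raise ValueError("inputs must be non-empty and equally sized")
    width = (max_value + 1) / bins

    def to_bin(value):
        # Clamp so that max_value falls into the last bin.
        return min(int(value / width), bins - 1)

    pairs = [(to_bin(r), to_bin(s)) for r, s in zip(ref, sensed)]
    n = len(pairs)

    def entropy(counts):
        # Shannon entropy of the empirical distribution given by the counts.
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    h_r = entropy(Counter(r for r, _ in pairs))   # marginal entropy, reference
    h_s = entropy(Counter(s for _, s in pairs))   # marginal entropy, sensed
    h_rs = entropy(Counter(pairs))                # joint entropy, 2D histogram
    return h_r + h_s - h_rs


reference = [0, 64, 128, 192] * 4
# Nonlinear, non-monotonic intensity remapping, as might occur across sensors.
lookup = {0: 200, 64: 10, 128: 90, 192: 250}
remapped = [lookup[v] for v in reference]
constant = [100] * len(reference)

mi_remapped = mutual_information(reference, remapped)
mi_constant = mutual_information(reference, constant)
```

Here the remapped window keeps all the information of the reference despite the nonlinear intensity change, so its MI equals the reference entropy (2 bits), whereas the constant window shares no information with the reference and yields an MI of 0.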
Furthermore, rotationally invariant regional MI considers not only the spatial information but also the influence that local gray variations and rotation changes have on the computation of the probability density function [45]:

$$
\begin{aligned}
MI(R, S) &= H(R) + H(S) - H(R, S), \\
H(R) &= -\sum_{r \in R} P(r) \log_2 P(r), \\
H(S) &= -\sum_{s \in S} P(s) \log_2 P(s), \\
H(R, S) &= -\sum_{r \in R,\, s \in S} P(r, s) \log_2 P(r, s),
\end{aligned} \tag{2}
$$

where $H(R)$ and $H(S)$ are the Shannon entropies of the reference and sensed images, respectively; $H(R, S)$ represents the joint entropy; $P(r)$ and $P(s)$ are the marginal probability distributions of $R$ and $S$; and $P(r, s)$ is the joint probability distribution, which is calculated in practice by 2D histogram binning, treating the intensities as discrete random variables. Additionally, there is an MI registration based on displacement maps, which is similar to optical flow estimation. In this variational framework, MI is employed as the similarity metric for displacement calculation [47]. Overall, the MI-like algorithms originating from information theory measure the statistical dependence between two data sets and are particularly suitable for registering images acquired with different imaging mechanisms. However, they are computationally expensive, which may be restrictive, as remote sensing images are always relatively large.

FIGURE 6. Conventional area-based registration. Pay attention to how the black-edge region changes in the lower- and upper-left corners of the aligned image. (a) The aligned images overlapping. (b) The sensed image. (c) The original images overlapping. (d) The fifth iteration. (e) The reference image. (f) The first iteration. (g) The fourth iteration. (h) The third iteration. (i) The second iteration.

FREQUENCY-DOMAIN APPROACHES
Frequency-domain approaches indirectly utilize intensity information, transforming an image and exploiting its frequency-domain features for registration. By so doing, they accelerate the computation for relatively small geometric dislocations. Fourier techniques are typical representatives of frequency-domain registration and were first used to register images with translational changes [48]. Phase-based correlation approaches [23], [49]–[51] exploit the Fourier transform to register images by searching for the global optimal match [53]; they compute the cross-power spectrum of the sensed and reference images and seek the location of its peak. The translational and rotational properties of the Fourier transform are employed to calculate the transformation parameters [53]. Frequency-domain approaches are robust against frequency-dependent noise and illumination changes. They are also computationally efficient [54] since they neither involve feature extraction, as feature-based approaches do, nor require an optimization approach in the spatial domain, which would increase their computational complexity [53]. However, given that the Fourier transform offers poor spatial localization, the operation can be replaced by a wavelet transform with strong spatial and frequency localization [55], which can be applied to remote sensing image registration [56]. Recently, phase congruency (PC) has been used to represent structural information in remote sensing images; it is similar to the image gradient but is invariant to image contrast and brightness variations [57], [58].

In short, most correlation-like approaches are statistical similarity metrics that neither exploit structural information nor incur high computational complexity. Owing to their easy hardware implementations, they remain in frequent use for registration evaluation [59]. Fourier techniques have some advantages in terms of computational efficiency, and they are robust against frequency-dependent noise. However, they have limitations in the case of image pairs with significantly different spectral content. Although MI methods offer outstanding performance compared with the two aforementioned algorithms, they do not always provide a global maximum of the entire search space for the transformation, as insufficient image information or limited overlap between two scenes inevitably reduces their robustness [16], [25]. Overall, intensity-based approaches directly use the pixel values of an image, without error accumulation, offering high-precision registration. However, these algorithms have limitations in handling large rotations, translations, and scale differences, and they are quite time-consuming.

OPTICAL FLOW ESTIMATION
Similar to the area-based approaches, optical flow estimation calculates object motions with direct and indirect consistency constraints based on pixel intensity. This technique is popular in computer vision for motion estimation. Owing to the similarity between the displacements of corresponding pixels under the same coordinate system and the optical flow of an object, some studies have utilized optical flow estimation to register remote sensing images [60], [61]. Unlike area-based approaches, optical flow estimation calculates pixel displacement based on intensity and gradient consistency constraints for coordinate recalculation. After resampling, the intensity value is assigned to the new noninteger position, and the aligned image is acquired [62], as summarized in Figure 7.

FIGURE 7. Optical flow estimation for remote sensing image registration. [$(u_i, v_j)$ indicates the pixel coordinates in the reference image, and $(u'_i, v'_j)$ indicates the coordinates of the corresponding pixel in the sensed image. The coordinate difference, which we call the displacement, is depicted as $(\Delta u_i, \Delta v_j)$, so that $u = u' + \Delta u$ and $v = v' + \Delta v$.]

Optical flow is a 2D displacement field that describes the apparent motion of brightness patterns between two successive images [63], and its concept was proposed by Gibson [64]. Horn and Schunck (HS) [63] and Lucas and Kanade (LK) [65] proposed differential approaches for optical flow calculation in 1981. Since then, many extensions and modifications have been proposed for video image processing [66]–[68]. Given that the process is at the initial stage of development in the remote sensing field and that many studies have focused on differential techniques, the following
aspects are generally emphasized in research on remote sensing image registration.

DENSE OPTICAL FLOW ESTIMATION
The differential method for dense optical flow calculation proposed by HS is generally called the typical global approach [63]. Dense optical flow calculates each pixel's motion in a scene, as in Figure 8. The regular grid represents image pixels, and the displacement is displayed at equal intervals, where only the displacement directions and magnitudes of the green pixels are marked, for brevity. The HS optical flow integrates the brightness constancy assumption and the global smoothness constraint to separately estimate the pixel motion in the x and y directions. The intensity value constancy assumption is markedly susceptible to slight brightness changes [69], which are inevitable for remote sensing images. Adding the spatial gradient constancy assumption to the HS equation [as in (3)] is popular in research on multitemporal remote sensing image registration [62], [69]:

$$
E(u, v) = \int_{\Omega} \psi\bigl(|I(\mathbf{x} + \mathbf{w}) - I(\mathbf{x})|^2 + \gamma\, |\nabla I(\mathbf{x} + \mathbf{w}) - \nabla I(\mathbf{x})|^2\bigr)\, d\mathbf{x} + \alpha \int_{\Omega} \psi\bigl(|\nabla_3 u|^2 + |\nabla_3 v|^2\bigr)\, d\mathbf{x}, \tag{3}
$$

where $\mathbf{w} = (u, v, 1)^T$ is the pixel displacement to be solved, $\mathbf{x} = (x, y, t)^T$ is a pixel coordinate, $\Omega$ is the integration domain, $\psi(s^2) = \sqrt{s^2 + \varepsilon^2}$ is an increasing concave function, and $\varepsilon$ is a fixed value. Here, $\gamma$ and $\alpha$ are the weights for the gradient and smoothness terms, respectively, and $\nabla_3 = (\partial_x, \partial_y, \partial_t)^T$ indicates a spatiotemporal smoothness assumption and is often replaced by the spatial gradient when used for remote sensing image registration. Owing to the per-pixel computation of optical flow estimation, very local deformation due to terrain elevations can be eliminated.

FIGURE 8. Dense optical flow. [Figure labels: pixels; optical flow.]

Occlusion remains a challenge for accurate dense optical flow calculation [66], and its effect is similar to that of land use (LU) and land cover (LC) changes in remote sensing images [62].
Under this circumstance, an object in the reference (sensed) image cannot be found in the sensed (reference) image. For example, in the yellow, rounded rectangles in Figure 9(a) and (b), a road disappears in the sensed image. This leads to abnormal pixel displacement, in Figure 9(c), where the magnitudes and directions of the displacements are inconsistent with the neighborhood. The successive abnormal displacements further change the content of the aligned image, although it is highly geometrically aligned with the reference image in Figure 9(d). This change violates the principle of image registration, which is to spatially align the sensed and reference images without altering the image content. After the abnormal displacement correction, the recalculated displacement is similar to that of the surrounding region, as in Figure 9(e). Furthermore, the aligned image is similar to the corresponding region in the sensed image in Figure 9(b), and the two are spatially aligned with the reference image, as in Figure 9(f). Large-scale movement is another concern when applying optical flow to remote sensing image registration, and an improved approach was proposed for it in [70]. The pixel displacement calculated by the extended phase correlation technique is used as the initial motion estimate for the global optical flow to achieve general remote sensing image registration, especially for large-scale movement deformation [70]. However, given that dense optical flow estimation calculates the displacement for each pixel, it is unsuitable for the real-time registration of large images, although it provides a high-precision result.

SPARSE OPTICAL FLOW ESTIMATION
Sparse optical flow estimation is more popular for remote sensing image registration than its dense counterpart is.
The sparse optical flow represented by the local difference may be supported in a specified local region, such as the position of the feature points extracted by popular extractors, including the scale-invariant feature transform (SIFT), as shown in Figure 10. This approach assumes that pixel motions are identical within a local neighborhood and estimates the optical flow by performing least-squares regression with a set of similar equations [66]. The LK gradient-based approach [65], as the origin, is widely used to estimate the motion of video images, on an equal footing with the HS model. The GeFOLKI algorithm was developed from LK and implemented on a graphics processing unit to achieve real-time and robust optical flow estimation [60], [71]. Furthermore, the GeFOLKI algorithm is adopted for the coregistration of heterogeneous data, such as synthetic aperture radar (SAR)–lidar images and SAR–optical images [61]. Subsequently, given the different imaging mechanisms of SAR and high-resolution optical images, which benefit from the high registration precision of optical flow estimation, two dense feature descriptors replace raw intensities when aligning images by an optical-to-SAR flow; this combines the global and local optical flow estimation approaches [72]. Sparse optical flow based on specified and distinct pixels is computationally time saving, whereas its accuracy for remote sensing image registration is relatively low compared with the dense optical flow approach. In addition, it is less vulnerable to LU–LC changes because no similar features are available for sparse optical flow estimation in the changed region.

FIGURE 9. Abnormal displacement detection and correction. (a) The reference image. (b) The sensed image. (c) The displacement field estimated by (3). (d) The aligned image overlapping (a). (e) The corrected displacement field. (f) The aligned image formed by overlapping the corrected optical flow with (a). The highlighted road in (a) disappears in (b), leading to similar occlusion.

FIGURE 10. Sparse optical flow. (Legend: pixels, feature points, optical flow.)

In summary, optical flow estimation has been developed in computer vision for motion estimation in superresolution reconstruction for several decades, whereas it is in the initial stage of use in remote sensing image registration. Optical flow estimation is a superior pixel displacement calculation approach that is particularly interesting in the case of very local deformation due to, for example, terrain elevation, which has considerable influence on high-resolution image registration [61]. The efficiency of optical flow estimation should be considered when applying it to remote sensing because a wide field of view (WFV) is a characteristic of remote sensing images. Moreover, due to social development and seasonal changes, LU–LC changes are frequent phenomena for multitemporal remote sensing images. The dense optical flow approach is sensitive to such changes, leading to abnormal displacement and the alteration of the content of an aligned image. Therefore, efficient and accurate correction should be integrated into the initial optical flow estimation when used for registration.

FEATURE-BASED REGISTRATION
The feature-based approach directly exploits the abstract features of an image, rather than the pixel intensity, for registration. Feature refers to a distinct geometrical or advanced characteristic extracted by a specified approach. Geometrical features are distinct points, line segments, and closed boundary regions in a remote sensing image that can be detected or extracted by extant or novel approaches. Advanced features are abstract descriptions of local regions, which are extracted by a neural network (NN) (especially in the deep learning approach) to represent the original image. Registration with geometric features is regarded as conventional feature-based registration, and the use of advanced features is defined as novel feature-based registration.

CONVENTIONAL FEATURE-BASED METHOD
In general, salient and distinctive features, such as points, line segments, and closed boundary regions, are manually and automatically detected to represent the original remote sensing image. The feature correspondence is then established between the reference and sensed images by a similarity comparison of the feature descriptors. The geometric relationship is then calculated to guide the spatial alignment of the sensed image with the reference. Ultimately, coordinates in the sensed image are transformed. The transformed coordinates are usually noninteger, and their intensity values are acquired by interpolation, as demonstrated in Figure 11. In the following, we summarize geometrical feature extraction and matching because research into this subject has been at the core of the traditional feature-based approach.

FEATURE EXTRACTION
Feature extraction, as used here, covers both feature detection and extraction. Detection aims to locate distinctive features in an image and determine their positions. In the feature-extraction stage, a recognizable descriptor is uniquely constructed to identify the detected feature. Formerly, features were manually selected. This approach is still in use today, as in the “image-to-image registration” module in Environment for Visualizing Images (ENVI) software. Experts require a considerable amount of time for this approach, especially for large remote sensing images. At present, many methods have been proposed to automatically acquire representative features. Common geometrical features, including salient points (line intersections, corners, points on curves with high curvature, and road crossings) [73], [74], polylines (roads, contours, and edges) [41], [75], and polygons (closed boundary regions and lakes) [76], are selected by the specified approach. As shown in Figure 11, the yellow points, line segments, and regions are detected to abstractly describe the original image.

FEATURE POINTS
The local points at which the gray value varies dramatically in all directions are feature points, including corner points, inflection points, and T-intersection points. Many attempts have been made to extract them in computer vision, inspiring the development of feature point extraction in remote sensing. The first corner detection approach was proposed by Moravec in 1977 [77]. This algorithm is fast to compute but is sensitive to noise and vulnerable to image rotation, leading to its rare use in the remote sensing field. The Harris corner detector was proposed in 1988 [78]. This algorithm is invariant under grayscale and rotational changes. It and improved Harris algorithms are applied to remote sensing image processing [38], [74], [79], [80], mainly with respect to multiscale corner detection. Smith and Brady presented the smallest univalue segment assimilating nucleus operator [81], which is insensitive to local noise and has high anti-interference ability [82]. However, it is not widely used in remote sensing image registration [83], whereas the SIFT algorithm is [45], [58], [74], [84]–[90]. The SIFT was developed by Lowe [92] and is invariant under rotation, scale, and translational changes [93]. It has been followed by many improved versions, such as principal component analysis SIFT [94], scale-restriction (SR) SIFT [36], [95], affine SIFT [96], and uniform robust SIFT [97], [98].
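As a rough illustration of the corner-detection idea behind the Harris detector discussed above, the following minimal sketch computes a Harris-style corner response with NumPy only. It is not the implementation from [78] or from any cited remote sensing work: the function names `window_sum` and `harris_response`, the box window of radius `r`, and the constant `k = 0.04` (a conventional choice) are all assumptions made for this example.

```python
import numpy as np

def window_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) box window (edge-replicated borders)."""
    h, w = a.shape
    p = np.pad(a, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, k=0.04, r=1):
    """Harris-style response R = det(M) - k * trace(M)^2 at every pixel.

    M is the local structure tensor built from image gradients.
    R > 0 suggests a corner, R < 0 an edge, and R near 0 a flat region.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)          # derivatives along rows and columns
    sxx = window_sum(gx * gx, r)
    syy = window_sum(gy * gy, r)
    sxy = window_sum(gx * gy, r)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Usage: a bright square on a dark background has four corners.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
print(R[5, 5] > 0, R[10, 5] < 0)  # corner pixel vs. edge pixel
```

In practice, a Gaussian window and nonmaximum suppression would normally replace the box window and the raw response map used here.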
Moreover, the speeded-up robust features (SURF) [99] algorithm was proposed by Bay et al. to overcome the time-consuming nature of the SIFT for large-scale remote sensing images [100]–[102]. SURF applies an integral image to compute image derivatives and quantizes the gradient orientations in a small number of histogram bins [103]. Additionally, the features from accelerated segment test (FAST) [104]; binary, robust, independent elementary features (BRIEF) [105]; oriented FAST and rotated BRIEF [106], [107]; KAZE [108]; and accelerated KAZE [109] algorithms are fast tools for descriptor construction but are less widely utilized in remote sensing. In addition, a novel key point detector combining corners and blobs for remote sensing image registration is under development to increase the number of correctly matched features [110]. Recently, to address intensity differences in multimodal remote sensing images, robust and novel feature descriptors have been adopted to depict detected feature points; these include the LSS descriptor, which accommodates effects such as nonlinear intensity differences [36]; the histogram of oriented PC, based on structural similarity measures [57]; and maximally stable PC, representing a novel affine and contrast-invariant descriptor [111]. All of these incorporate PC information. PC is similar to the image gradient, presenting structural information with resistance to variations in illumination [112]. Therefore, the use of PC information is a trend in the construction of robust feature descriptors for multimodal remote sensing images.

FEATURE LINES
A feature line is also known as a line feature; it is the generalization of feature points, such as general line segments [113], object contours [75], roads, coast lines [114], and rivers [115].
Given that feature lines have more attributes than feature points as control features [116], they have been gradually developed for use in image registration [117] as well as remote sensing image registration [116], [118], [119]. Standard edge detection, as with the Canny detector [120], [121], and detectors based on the Laplacian of Gaussian [122] are conventional feature line detection approaches [16]. Recently, some excellent detectors generating precise and robust line segments have been proposed [123], [124], and they are suitable for line detection in remote sensing images. Feature lines are comparatively less utilized in the remote sensing field than are feature points because matching them is an obstacle. They are often abstracted from corners, midpoints, and endpoints as final features [16], thereby losing their geometric value.

FIGURE 11. The geometrical feature-based registration algorithm. (Flowchart stages: feature point, feature line, and feature region; feature matching; mismatched features elimination; transformation model construction; coordinates transformation/resampling; aligned image.)
FEATURE REGION
Feature region is a general term for all closed boundary regions of appropriate size, e.g., lakes [125], forests [126], buildings [113], urban areas [127], and so on. Before robust feature point extraction approaches were developed, the feature region was used to indirectly extract feature points. Regions with high contrast were extracted by filtering [128] and image segmentation [129] and described with moment-invariant descriptors [130], [131]. They are often abstracted by their centers of gravity [128], [132]–[135], which are invariant with respect to rotation, scaling, and skewing and are stable under random noise and gray-level variation [16]. Compared with feature points and lines, the extraction and description of feature regions were relatively early foci of research, and they have been used less for recent feature-based registration.

FEATURE MATCHING AND MISMATCHED FEATURE ELIMINATION
The correspondence relationship between reference and sensed images can be established based on detected feature points, lines, and regions, exploiting various descriptors of features [16], [136], [137]. Mismatched features are an inevitable byproduct of general feature matching, and eliminating them purifies the correspondences so that the generated transformation models are as accurate as possible. A pair of features with similar attributes is considered a candidate match despite radiometric differences, noise, image distortion, and so forth. Under these circumstances, a robust matching measurement is essential. Feature matching approaches can be generally classified into two categories, namely, feature similarity and spatial relations.

FEATURE SIMILARITY
The constructed feature descriptors are used to establish the correspondence between extracted features in the reference and sensed images through feature similarity comparison.
Feature similarity is evaluated in the feature space by using the Euclidean distance ratio between the first and second nearest neighbors [92]. For efficiency, the k-dimensional tree and the best-bin-first algorithms are employed for feature similarity determination [93], [138]. The clustering technique [140], chamfer matching [141], and PC models are frequently used matching approaches, and they are invariant under intensity changes during matching [1].

SPATIAL RELATIONS
Aimed at tie point matching in poor textural regions, approaches based on spatial relations have been developed. Representative of these, graph-based feature point matching considers feature points as graph nodes. Feature matching is then transformed into a node-correspondence problem and solved by graph matching [125], [142]. Graph matching is applied to image feature correspondences, although it is not affine invariant [143]. By finding a consensus nearest-neighbor graph from candidate matches, a graph-transformation matching approach is developed [144]. Targeting the problem in [143], a similar graph matching for tie point matching in poor textural images is proposed [101]. Furthermore, Xiong and Zhang introduced a novel interest point matching for high-resolution satellite images [145]. For this, the relative position and angle are used to reduce ambiguity and to avoid false matching; the approach is therefore suitable for image shifting and rotation, but affine and large-scale transformations are not considered [144].

MISMATCHED FEATURE ELIMINATION
Although the extracted features in a reference image have been matched with the corresponding ones in the sensed image via the aforementioned approach, some mismatched feature points are inevitable, further affecting the transformation model estimation [32], [76]. Therefore, eliminating mismatched features with a specified approach is necessary [146], [147].
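The nearest-neighbor distance-ratio test described under feature similarity can be sketched as follows. This is a brute-force illustration under stated assumptions, not the k-dimensional tree or best-bin-first search of [93], [138]: the function name `ratio_test_matches` and the threshold `ratio = 0.8` are choices made for this example, and the descriptors are plain NumPy arrays rather than the output of any particular extractor.

```python
import numpy as np

def ratio_test_matches(desc_ref, desc_sen, ratio=0.8):
    """Match descriptors with the first/second nearest-neighbor distance ratio.

    desc_ref: (N, D) descriptors from the reference image.
    desc_sen: (M, D) descriptors from the sensed image, with M >= 2.
    Returns a list of (ref_index, sen_index) pairs that pass the test.
    """
    desc_ref = np.asarray(desc_ref, dtype=float)
    desc_sen = np.asarray(desc_sen, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(desc_ref[:, None, :] - desc_sen[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(d.shape[0]):
        first, second = order[i, 0], order[i, 1]
        # Keep the match only if the best candidate is clearly better
        # than the runner-up; ambiguous matches are discarded.
        if d[i, first] < ratio * d[i, second]:
            matches.append((i, int(first)))
    return matches

# Usage with toy 2D "descriptors": the third reference descriptor is
# equally close to two sensed descriptors and is therefore rejected.
ref = [[0, 0], [10, 10], [5, 5]]
sen = [[0.1, 0], [10, 10.1], [4, 6], [6, 4]]
print(ratio_test_matches(ref, sen))
```

A lower threshold keeps fewer but more distinctive matches; the remaining mismatches are left to the elimination step discussed next.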
Generally, based on the initial matching result, random sample consensus (RANSAC) is used to remove mismatched points. This method randomly selects a sample of the candidate matches in each iteration and finds the largest consensus set to calculate the final model parameters [33], [148]. RANSAC performs well and robustly when there are no more than 50% outliers [144], [149], [150]. Combining the local structure with global information, a restricted spatial order constraints algorithm is developed to find exact matched feature points in reference and sensed images [144]. Based on the affine-invariance property of the triangle-area representation (TAR), a robust sample consensus judging algorithm is proposed to efficiently identify bad samples and ensure accuracy with a light computational load [151]. For images with simple patterns, large affine transformations, and low overlapping areas, a mismatch-removal principle based on the TAR value of the k-nearest neighbors is proposed and referred to as k-nearest neighbors–TAR [149]. Furthermore, an improved RANSAC approach called fast sample consensus is developed to obtain correct matching in a few iterations [150], [152]. Thus, most of the retained feature points in the reference image accurately correspond to the specified feature points in the sensed image, as shown by the feature points connected by the yellow lines in Figure 12; this adds precision to the transformation model estimation in the following step.

The geometrical feature-based approach abstracts an original remote sensing image with distinct features instead of its intensity information, which is efficient and can easily process large rotations, translations, and scale differences between reference and sensed images. However, position errors in the automatically extracted features are inevitable, and a few mismatched features cannot be eliminated. This leads to a relatively low registration precision compared with the intensity-based approach.
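To make the mismatch-elimination step concrete, the following minimal sketch applies a basic RANSAC loop to candidate tie points under an affine transformation model. It is an illustration only, not the method of any cited work: the affine model, the three-point sample size, the inlier tolerance `tol`, the iteration count, and the helper names `fit_affine`, `apply_affine`, and `ransac_affine` are all assumptions introduced for this example.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map: dst ~= [src, 1] @ A, with A of shape (3, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A

def apply_affine(A, pts):
    """Apply the affine map A to an (N, 2) array of points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A

def ransac_affine(src, dst, n_iter=500, tol=1.0, seed=0):
    """Estimate an affine model from matches (src[i], dst[i]) with RANSAC.

    Returns the model refitted on the largest consensus set and a boolean
    inlier mask; matches outside the mask are treated as mismatches.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Three matches are the minimum needed to define an affine map.
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no consensus set with at least three matches found")
    return fit_affine(src[best], dst[best]), best

# Usage: 30 correct matches and 10 gross mismatches.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (40, 2))
A_true = np.array([[0.9, 0.2], [-0.2, 0.9], [5.0, -3.0]])
dst = apply_affine(A_true, src)
dst[30:] += rng.uniform(20, 50, (10, 2))   # simulated mismatches
A_est, mask = ransac_affine(src, dst)
print(mask.sum(), "matches kept of", len(mask))
```

The fixed iteration count is a simplification; adaptive stopping rules based on the observed inlier ratio are common in practice.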
NOVEL FEATURE-BASED REGISTRATION BY DEEP LEARNING
Deep learning provides a new concept for remote sensing image registration. It essentially refers to image registration based on advanced feature extraction [153]. Deep learning originated in computer vision and has a long history [154]. In recent years, it has gradually entered use in remote sensing image applications, such as image fusion [155], [156], LC classification [157], [158], and segmentation [159]. The framework is data driven and can generate image features by learning from many training data sets with a specified principle [158]. Therefore, it is suitable for remote sensing image registration. Some studies have focused on feature matching for this purpose [158], [160]. Most utilize a Siamese network consisting of two parts to train a deep NN (DNN) [161]–[164]. One part extracts features from image patch pairs by training a Siamese, pseudo-Siamese, or improved Siamese network [165]; the other part measures the similarity between these features for image matching. In [164], the DNN inspired the construction of a deep learning framework for remote sensing image registration. In addition, generative adversarial networks (GANs) are applied to image matching and registration [166], [167]. These approaches first translate an image into another one by training the GANs, enabling two images to have similar intensities and feature information [166], [168]. Feature extraction and matching are subsequently performed between two artificially generated images, effectively improving the performance of image matching. To address the deficiencies of specified-scale NNs, multitask learning is introduced to improve the registration precision [169]. Wang et al. break through the limitations of the traditional deep learning approach, which extracts image features in one network and matches them with the other NN. They design an end-to-end network using forward propagation and backward feedback to learn the mapping functions of the patches and their matching labels for remote sensing image registration [164]. Recently, Li et al. paired image blocks from sensed and reference images and directly learned the displacement parameters of four corners of the sensed block relative to the reference image on a deep learning regression network, which differs from the traditional deep learning method [170].

FIGURE 12. Feature matching examples.

Deep learning has advantages over the traditional registration approach. It is completely data driven and has strong flexibility, enabling it to theoretically fit any complex mapping function, whereas the traditional registration method can deal only with fixed pattern registration. Moreover, deep learning extracts abstract and high-level semantic information. Compared with low-level gray and gradient data, deep semantic information is more consistent with the way humans understand images. Therefore, deep learning methods can extract robust features. However, deep learning has challenges. It highly depends on image samples; when there is a lack of data or the data quality is poor, deep learning methods have difficulty ensuring the effectiveness of the registration results. Although remote sensing images are now easy to acquire, the lack of manual annotation and standard data is still very serious. Deep learning, in essence, learns the statistical characteristics of a large number of similar images, but its input–output process is a complex, nonlinear mapping without clear physical significance. Additionally, deep learning requires high computing power and has major hardware requirements, limiting its applicability. In short, remote sensing image registration based on deep learning is still in its infancy, and its registration framework is not mature. However, many studies have demonstrated that deep learning methods can achieve or even surpass the optimal level of traditional registration approaches in terms of accuracy and efficiency.
We predict that deep learning-based methods will become important solutions to the problem of real-time and high-precision remote sensing image registration.

REGISTRATION BASED ON THE COMBINATION METHOD
As mentioned, feature- and intensity-based approaches have their own advantages. Different feature extractors also have various precisions. To integrate these strengths as fully as possible, combination techniques have been developed. Typically, popular combinations consist of two aspects, namely, feature- and area-based approaches; however, some integrate two geometric feature-based approaches, such as the SIFT and Harris detectors.

COMBINATIONS OF FEATURE- AND AREA-BASED ALGORITHMS
Feature-based approaches are typically suitable for images with more significant structural data than intensity information. However, they are restricted by the distribution and accuracy of the features. On the other hand, area-based approaches are appropriate for images with more distinctive intensity information; however, they require the intensity information of the reference and sensed images to be correlated. Thus, the two methods have complementary pros and cons. To further improve registration accuracy and robustness, some studies focus on a combination of geometric feature- and area-based techniques [171]. Huang et al. [172] proposed a hybrid approach to aligning images by intensities within a scale-invariant feature region. Elsewhere, a wavelet-based feature extraction technique and an area-based method with NCC were combined to reduce the local distortion caused by terrain relief [173]. In a wavelet-based hierarchical pyramid framework, Mekky et al. [174] proposed a hybrid approach using MI and the SIFT; employing the rough registration parameters of the area-based approach for MI, the number of false alarms obtained by the SIFT was reduced. In addition, Gong et al. employed the robustness of the SIFT and the accuracy of MI, proposing a novel coarse-to-fine registration framework aimed at registering optical and SAR remote sensing images [90]. For multisensor SAR image registration, Suri et al. proposed a multistage registration strategy. The rough parameters of the transformation model are estimated by MI, and this model is introduced during the SIFT matching phase to increase the number of tie points [175]. Under the SIFT and MI combinations, Heo et al. introduced a stereo matching method that produces accurate depth maps [176]. All these approaches can be considered coarse-to-fine processing chains. The basic idea is to improve the result of the feature-based approach by adopting an optimization process from an area-based technique [90], [171]. The combined methods integrate the robustness of the feature-based algorithm with the accuracy of the area-based approach. They are relatively few compared with individual methods, but their combination will be the focus in the near future, from our point of view. To deal with the possible accumulation of errors, bundle block adjustment is usually needed [178], [179] to register sequential images. Moreover, the integration of different geometric feature-based approaches is being developed, as well, for ever-increasing transformation model estimation accuracy, generating precise registration results to the greatest extent possible.

INTEGRATION OF TWO GEOMETRIC FEATURE-BASED APPROACHES
In addition to combinations of feature- and area-based techniques, the integration of two geometric feature-based approaches is a developing trend for high-precision registration. In particular, the feature points extracted by different methods are used to register images in two stages. Yu et al. proposed to extract feature points using the SIFT for the preregistration of Satellite Pour l’Observation de la Terre-5/Thematic Mapper/Quickbird images from different sensors [74].
In the fine registration stage, the Harris algorithm for corner point detection is applied to detect distinct corners, and the extracted points are matched by the NCC algorithm. Similarly, Lee used SURF to extract the feature points of a low-resolution image after Haar wavelet transformation, which is defined as rough registration [180]. Fine registration is the same as the approach proposed by Yu et al. Recently, Ye et al. utilized SR–SIFT to extract the feature points in the preregistration stage for distinct translation, rotation, and scale difference elimination. To further optimize registration, the Harris algorithm was employed to detect feature points in the reference and prealigned images and describe them by LSS for matching [36]. To register large, high-resolution remote sensing images, a coarse-to-fine strategy combining the Harris–Laplace detector with the SIFT descriptor has been proposed. After rough registration, a large image is divided into small, processable blocks for fine alignment [181]. Additionally, in a new two-step registration, the approximate spatial relationship is calculated with the deep features using a convolutional NN in the first step. Then, the previous result is adjusted based on the extracted local features [182]. Another technique combines feature point and feature line methods for the registration of images covering low-texture scenes in the computer vision field [183]. Since low- and repeated-texture regions are common in remote sensing images, feature lines can be employed to supplement the number of feature points. Therefore, besides the combination of two geometric feature-based methods, the integration of different geometrical features has great potential for the high-precision alignment of remote sensing images [22]. Since combination schemes integrate the advantages of two or more registration approaches, they offer remarkable precision.
Moreover, in general, preregistration provides a rough result that approximates the final alignment. With fine-tuning in the optimized registration stage, a high-precision registration result is finally acquired. This strategy is suitable for remote sensing image registration with large spatial position differences. However, it is as time-consuming as two or more alignment strategies.

SOFTWARE-BASED REGISTRATION
Most reviews emphasize the ever-increasing number of image registration approaches that are improved on the basis of existing methods for registering larger and more complicated images [16], [184]. Few studies have evaluated the performance of software-embedded image registration modules and the packages/tools for image geometric registration [185]. Thus, in this section, we present some examples. The Earth Resources Data Analysis System (ERDAS), ENVI, PCI Geomatica, ER Mapper, and Arc Geographic Information System (GIS) are well-known software packages for remote sensing image processing that include registration modules. ER Mapper was acquired by ERDAS a few years ago. These packages integrate conventional manual and automatic registration programs. Concretely, ENVI can register two remote sensing images or align one image with a map covering the same scene. A user can extract tie points by observing similar objects lying on two images, such as corners of buildings, road intersections, inflection points of rivers, and so on. With a uniform point distribution, the parameters of a specified transformation model can be estimated. There are some general geometric mapping functions, including affine, polynomial, and triangulation transformation models. Geometric mapping is generally conducted by an expert and is time-consuming and tedious. It is difficult to avoid subjective factors while extracting tie points, especially when registering WFV images that require more time than general image registration.
To reduce manual labor and improve the registration efficiency, an automatic alignment technique is also included in ENVI. The user specifies the reference and sensed images. After setting the area-based matching parameters, the tie points for transformation model construction are automatically extracted, and the aligned image is then obtained. Neither the manually extracted tie points nor the automatically acquired points in ENVI are sufficiently accurate. For example, the coordinates of an extracted feature point are (157.05, 171), which may indicate only the neighborhood of the real corner. Under this circumstance, the calculated geometric spatial relationship is not as precise as it could be. The obtained registration result is usually worse than expected, especially for high-resolution remote sensing images with inconsistent local deformation.

ERDAS was developed by the ERDAS Corporation, in the United States. Compared with ENVI, it can produce tie points with higher location accuracies [for instance, the coordinate of the extracted feature point is (385.776, 75.161), which has more decimal places] to generate precise mapping functions between reference and sensed images that approximate real geometric relations. Additionally, there are abundant transformation models, such as linear rubber sheeting, nonlinear rubber sheeting, and the direct linear transform. Elevation data are introduced into the registration to generate the high-precision alignment of mountainous remote sensing images, even using the digital terrain model (DTM). Furthermore, the region and interval of the selected tie points can be set manually in the “AutoSync” module. To acquire a high-precision registration result, the elevation data (DEM or DTM) should be input at the same time as the image to be registered. If higher-spatial-resolution elevation data were included in ERDAS, the corresponding information would be automatically extracted when an image’s geographic information was identified to register the input image.

Image registration can also be conducted in ArcGIS, although most researchers would probably utilize the software to solve problems with the GIS, such as spatial analysis. PCI Geomatica is oriented more toward producing orthophotos and fusion images than toward registering remote sensing images. However, both ArcGIS and PCI Geomatica contain an image registration module. The steps for alignment processing are similar to those for the aforementioned software, including manual registration and automatic operation.
Some different transformation models, such as spline, similarity polynomial, and projective transformations, are used to achieve the high-precision registration of complicated remote sensing images. However, sometimes the result is unsatisfactory for further applications, as the tie points are not uniformly distributed and their number is small.

Pixel Information Expert is a new generation of remote sensing image processing software that was developed by Beijing Aerospace Hongtu Information Technology. It can handle the dislocation of multisource, heterogeneous remote sensing images since it integrates a novel algorithm with a focus on multimodal remote sensing image registration. It can be tested free for 30 days. In addition, copyrighted geometric registration software, such as the Hyperspectral Image Processing and Analysis System, GeoImager, Titan Image, and so forth, was developed by the Institute of Remote Sensing, Chinese Academy of Sciences.

Because high-resolution image registration is an important task in remote sensing image processing, much emphasis has been placed on it. To extract dense tie points representing local geometric relationships, SURF and an adaptive binning SIFT descriptor have been combined [186]. With the guidance of the local transformation model, an accurate registration result is obtained. The MATLAB code for the algorithm is provided, with experimental data, at https://www.researchgate.net/publication/320354469_HRImReg. The code is encrypted, and the parameters cannot be adjusted. It can be used only for comparative experiments to evaluate a proposed approach. When doing simulation experiments to assess a feature point detector or to evaluate a mismatch elimination approach with real data, the progressive sparse spatial consensus algorithm can be employed [187]. The code, with experimental data, is publicly available at https://github.com/jiayi-ma?tab=repositories.
It has been tested on photographs from the computer vision field. To apply it to remote sensing images, some improvements are needed. Beyond these, there are many commercial and open-source software packages/tools for geometric registration. There are also different points of view, which should be discussed in depth in the future as more resources become available. However, an evaluation of registration approaches should be conducted, as well, whenever an aligned image is generated from software or a proposed method.

EVALUATION OF IMAGE REGISTRATION ACCURACY
For the spatial alignment of remote sensing images, it is highly desirable to provide users with an estimate of how accurate the registration actually is. Accuracy evaluation is a nontrivial problem that is present in all literature on remote sensing image registration. We have identified three aspects to measuring the registration accuracy on the basis of different considerations, including tie point identification, the transformation model performance, and the alignment error. In this section, we review basic approaches for alignment assessment.

ACCURACY OF TIE POINTS
The quality and quantity of tie points are important to guarantee high-precision image registration. The number of redundant tie points, beyond the minimum required to compute the specified transformation model, is essential information since we generally use as many tie points as possible to calculate the parameters of the mapping function for alignment. Furthermore, we must allow for a residual $(Tx_i, Ty_i)$ for the $i$th extracted feature point compared with the origin of the image [188]. If there are $N$ tie points, the root-mean-square error (RMSE) can be estimated as follows:

$$\mathrm{RMS}_{tp} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left((Tx_i)^2 + (Ty_i)^2\right)}. \tag{4}$$

To enable general comparison, the RMSE should be computed across the normalized (to the pixel size) residuals. Additionally, the bad point proportion should be calculated to evaluate the extracted feature point.
The bad point proportion is the number of residuals that lie above a certain threshold multiplied by the ellipse formed by the pixel size. Besides the criteria mentioned, the distribution of tie points is attracting increased attention. To obtain a uniform distribution of tie points, some papers have proposed extracting feature points within a specified
subregion [30]. A detection approach is employed to extract the specified number of feature points. Tie points affect the registration accuracy but are not the only factor.

TRANSFORMATION MODEL PERFORMANCE

The transformation model abstractly represents the geometric mapping function from a sensed image to a reference image. The actual between-image geometric distortion is difficult to obtain without prior information, and the estimated transformation approximates the real geometric relationship between images. Assuming $N$ matched feature points, one part of the $N$ pairs of tie points ($T$ pairs) is used to estimate the mapping function through the least-squares method. The remaining $N - T$ points in the sensed image are employed as test points to be transformed into the reference image system [188]. The distance between each transformed coordinate and the corresponding point in the reference image is calculated as the residual, the root mean square of which represents the accuracy of the estimated transformation model:

$$\mathrm{RMS}_{te} = \sqrt{\frac{1}{N-T}\sum_{j=1}^{N-T}\left((x_j - Hx'_j)^2 + (y_j - Hy'_j)^2\right)}, \tag{5}$$

where $H$ denotes the transformation model estimated from $T$ pairs of tie points, and $(x_j, y_j)$ and $(x'_j, y'_j)$ represent the corresponding points in the reference and sensed images, respectively. Furthermore, a $\chi^2$ goodness-of-fit test may be applied [188] to analyze whether the residuals are equally distributed across all quadrants. However, “overfitting” may yield zero error for a mapping model with sufficient degrees of freedom; this is a well-known phenomenon in numerical analysis. Under this circumstance, the registration results may not be optimal.

ALIGNMENT ERROR

The oldest method for estimating registration accuracy is visual assessment by a domain expert, which is still in use and remains the most effective technique, although it cannot be quantified [16], [188]. At present, this is performed using professional software, such as ENVI and ArcGIS, with shutter tools.
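Returning briefly to the test-point check in (5), the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transformation $H$ is taken to be affine, the first `n_fit` pairs play the role of the $T$ fitting pairs, and the function names and synthetic points are hypothetical.

```python
import numpy as np

def fit_affine(sensed, reference):
    """Least-squares affine map taking sensed (x', y') to reference (x, y)."""
    sensed = np.asarray(sensed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    design = np.hstack([sensed, np.ones((len(sensed), 1))])   # rows [x', y', 1]
    params, *_ = np.linalg.lstsq(design, reference, rcond=None)
    return params                                              # shape (3, 2)

def holdout_rms(sensed, reference, n_fit):
    """Fit on the first n_fit pairs (T) and score the other N - T, as in (5)."""
    sensed = np.asarray(sensed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if not 3 <= n_fit < len(sensed):
        raise ValueError("need 3 <= n_fit < N for an affine model")
    params = fit_affine(sensed[:n_fit], reference[:n_fit])
    test = np.hstack([sensed[n_fit:], np.ones((len(sensed) - n_fit, 1))])
    residuals = reference[n_fit:] - test @ params              # (x - Hx', y - Hy')
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))

# Synthetic tie points: a known affine map, with and without matching noise.
rng = np.random.default_rng(0)
sensed_pts = rng.uniform(0, 1000, size=(40, 2))
true_map = np.array([[1.01, 0.02], [-0.02, 0.99], [5.0, -3.0]])
reference_pts = np.hstack([sensed_pts, np.ones((40, 1))]) @ true_map
exact = holdout_rms(sensed_pts, reference_pts, n_fit=30)
noisy = holdout_rms(sensed_pts, reference_pts + rng.normal(0, 0.5, (40, 2)), n_fit=30)
```

Because the test points are not used for fitting, a model with many degrees of freedom cannot reach zero error simply by interpolating them, which is the overfitting concern raised above.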
Similarity metrics used in area-based registration, such as MI, NMI, and CC, are frequently employed to evaluate alignment accuracy [59]. These indicators are easily influenced by changes in image information over time and by radiometric differences. To quantitatively present the alignment error, the RMSE is calculated with (4) using feature points manually extracted by a specialist [85]. Since image registration aims to achieve the relative spatial alignment of two different images, there is no gold standard reference image with which to evaluate the registration accuracy. When outcomes are evaluated against at least three criteria, the result that performs best on most of them indicates the best registration, as different assessments have their own advantages and disadvantages.

FUTURE TRENDS

There have been a large number of independent studies on remote sensing image registration, and much effort has been put into constructing robust feature descriptors and eliminating mismatched features. With the development of sensor technology and application requirements, some novel opportunities and challenges must be addressed for remote sensing image registration. To us, it seems likely that the future of this field will include accelerated, combined, heterogenous, cross-scale, and smart remote sensing image registration techniques, which are introduced in detail in the following.

ACCELERATED REMOTE SENSING IMAGE REGISTRATION

With the ongoing development of sensor technology, the spatial resolution of remote sensing images increases, resulting in a growing number of features with distinctive details. The huge number of features puts real-time registration of remote sensing images further out of reach and makes the alignment of large-scale images inefficient. Thus, constructing descriptors and matching the detected features is time-consuming for general images, especially WFV ones.
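One common way to reduce this cost is to split an image into subregions and process them concurrently. The following minimal Python sketch shows the idea only. The gradient-based `strongest_points` function is a hypothetical stand-in for a real feature detector, and a thread pool is used purely for simplicity; a process pool or GPU would be the analogous choice for heavier detectors.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def strongest_points(tile, row0, col0, k):
    """Hypothetical stand-in detector: the k pixels with the largest gradient magnitude."""
    grad_r, grad_c = np.gradient(tile.astype(float))   # needs tiles of at least 2 x 2 pixels
    strength = np.hypot(grad_r, grad_c)
    flat = np.argsort(strength, axis=None)[-k:]        # flat indices of the k strongest pixels
    rows, cols = np.unravel_index(flat, tile.shape)
    return np.column_stack([rows + row0, cols + col0]) # shift back to full-image coordinates

def detect_in_tiles(image, tile_size, k_per_tile, workers=4):
    """Split the image into square tiles and run the detector on each concurrently."""
    jobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for r in range(0, image.shape[0], tile_size):
            for c in range(0, image.shape[1], tile_size):
                tile = image[r:r + tile_size, c:c + tile_size]
                jobs.append(pool.submit(strongest_points, tile, r, c, k_per_tile))
        return np.vstack([job.result() for job in jobs])

# A random 256 x 384 array stands in for an image; it splits into 2 x 3 tiles.
rng = np.random.default_rng(1)
image = rng.random((256, 384))
points = detect_in_tiles(image, tile_size=128, k_per_tile=10)
```

Requesting a fixed number of points per tile also spreads the detected features evenly over the image, which ties in with the uniform tie-point distribution discussed earlier.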
As proposed in [52], to achieve real-time registration to the greatest extent possible, remote sensing image registration can be operated on a cloud platform based on finite-state chaotic compressed sensing theory. Similarly, cloud computing [91] and some hardware systems may also be effective for accelerating image registration. At present, parallel computing [139] is the easiest path to implementation. Here, an image is divided into several subregions, and the image features in each one are extracted simultaneously, based on the same principles, on different parallel processors, as is the transformation model construction. The parallel commands are easy to implement in MATLAB and on other platforms.

COMBINED APPROACHES FOR IMAGE REGISTRATION

With the development of imaging sensors, the resolution of remote sensing images has increased, and local deformation has become obvious. For example, the geometric distortion caused by terrain relief and high-rise buildings leads to inaccurate registration [36], introducing difficulties for remote sensing image applications. As the topographic image in Figure 13(c) shows, the reference and sensed images cover plain and mountainous regions simultaneously. The displacements of corresponding pixels were calculated for spatial registration, and enlarged views of the displacements in the marked rectangular regions are shown in Figure 13(d) and (e). The magnitude and direction of the displacements in the plain region are similar, but they differ in the mountainous region. Here, multistage registration with a global mapping function cannot exactly describe the spatial relationship between the reference and sensed images, and neither can the local transformation model. Given that displacements vary in different terrain regions, dividing images into a series of regions and registering each with a specified approach may yield a high-precision alignment, which points to a combination of different techniques. Concretely, the transformation model is calculated with distinct tie features in the plain region. With the transformation model, rather than directly obtaining the aligned plain region, the displacement guiding pixels to alignment is estimated. In mountainous regions, dense optical flow estimation, borrowed from computer vision, is utilized to acquire the displacement of each corresponding pixel. Then, the displacement fields from different terrain regions are mosaicked (e.g., using the inverse distance weighted function for uniform transitions in image stitching) to obtain a seamless displacement field of the entire image [177]. This is a creative combination of different registration approaches in a coordinated way, differing from the serial-mode combined approaches mentioned in the “Registration Based on the Combination Method” section. Therefore, regional registration accommodating complex geometric relationships that vary with terrain differences may become a significant trend in remote sensing image registration, giving full play to the registration advantages of different approaches in various terrain regions.

FIGURE 13. The spatial position of corresponding pixels in a remote sensing image of complex terrain. (a) The reference image. (b) The sensed image. (c) The topographic image. (d) The displacements in the mountainous region marked with a yellow rectangle in (a) and (b). (e) The displacements in the plain region marked with a red rectangle in (a) and (b).

HETEROGENOUS AND CROSS-SCALE IMAGE REGISTRATION

Heterogenous and cross-scale images collected simultaneously or at different times provide complementary information to improve our understanding of an entire scene during Earth observation or even during disaster rescues. However, such data usually have dramatically different spatial resolutions, intensities, noise, geometries, and so on, owing to different imaging principles. Some studies have focused on spatial registration, including optical image and SAR registration, optical image and infrared image registration, and satellite image and map registration [36], [57]. These works emphasized the robust construction of descriptors to resist intensity and noise differences and other influential factors.
Large scale differences between cross-scale images (which are much greater than the fourfold resolution difference between panchromatic and multispectral images) introduce difficulties for extracting geometrical features from low-resolution images that are similar to those from high-resolution images. Thus, generating the tie features of cross-scale images for transformation model construction, even during high-precision registration, is difficult. Additionally, high-efficiency heterogenous and cross-scale image registration remains an open problem that is worth researching in the near future. For a concrete example, the approximately real-time registration of optical and SAR images may offer an approach for analyzing disaster regions as quickly as possible for rescue purposes by means of registering and comparing images before and after an event. These applications are vital for rescue operations. Precise and efficient heterogeneous and cross-scale image registration is a mandatory prerequisite for high-precision, real-time applications.

SMART REMOTE SENSING IMAGE REGISTRATION

To register multiple remote sensing images, one simple and conventional idea is to align them frame by frame, namely, by converting multiple image registration into pair-to-pair alignment. An alternative process, learning from the simultaneous mosaicking of multiframe images, specifies a reference image connected to the others and stitches the other images to the reference one. Therefore, when images to be registered are read into the program, the coordinates of the four corners in each image are extracted. The reference image is determined by comparing these coordinates. As presented in Figure 14, images A, B, C, and D are simultaneously aligned with the reference image (marked in green) according to a general registration strategy, as there is overlap between the images.
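The reference-image selection step can be sketched as follows. The text does not state how the corner coordinates are compared, so this minimal Python example assumes one plausible criterion: choose the image whose bounding box overlaps the largest number of other images, with ties broken by total overlap area. All function names and footprints are hypothetical.

```python
def bounding_box(corners):
    """Axis-aligned box (xmin, ymin, xmax, ymax) from four (x, y) corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

def overlap_area(box_a, box_b):
    """Area shared by two axis-aligned boxes; 0.0 if they do not intersect."""
    width = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    height = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    return max(width, 0.0) * max(height, 0.0)

def choose_reference(footprints):
    """Pick the image that overlaps the most others (assumed criterion).

    footprints: dict mapping an image name to its four corner coordinates.
    Ties are broken by the larger total overlap area.
    """
    boxes = {name: bounding_box(c) for name, c in footprints.items()}

    def score(name):
        areas = [overlap_area(boxes[name], boxes[other])
                 for other in boxes if other != name]
        return sum(a > 0 for a in areas), sum(areas)

    return max(boxes, key=score)

def rect(x0, y0, x1, y1):
    """Four corners of an axis-aligned rectangle, for building test footprints."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

# Layout loosely modeled on Figure 14: four images around a central one.
footprints = {
    "A": rect(0, 6, 5, 10), "B": rect(7, 6, 12, 10),
    "C": rect(0, 0, 5, 4), "D": rect(7, 0, 12, 4),
    "center": rect(3, 2, 9, 8),
}
reference = choose_reference(footprints)
```

In this layout the central image overlaps all four neighbors while each neighbor overlaps only the center, so the central image is selected. Real footprints are generally rotated quadrilaterals, for which the bounding-box overlap is only an approximation.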
Unlike frame-to-frame approaches, this technique needs to specify only the reference image, and the intermediate results are not written out and read back in many times, which saves memory and improves computational efficiency. From our point of view, this is smart registration, which is particularly useful for WFV-image generation. However, when images overlap, a more intelligent approach needs to be developed. Moreover, images to be registered may have small overlapping areas. This overlap presents a challenge for high-accuracy alignment because only a small number of geometric and intensity features are available for constructing the transformation model. This problem should be intelligently solved to register images with a low ratio of overlapping regions. Typically, these images are used to produce WFV images by means of stitching. Further solutions should be provided in the future.

Therefore, the large-scale, complex distortion of high-resolution, heterogenous, and cross-scale remote sensing images must be a focus of future research. In this situation, the traditional single-registration approach may not meet requirements. For real-time, high-precision registration, a combination of alignment approaches and high-performance computing is considered very promising.

FIGURE 14. The spatial position of multiple images to be registered.

CONCLUSIONS

In this article, we presented a comprehensive and quantitative summary of intensity-based, feature-based, and combined approaches to remote sensing image registration. Conventional methods and new applications of deep learning and optical
flow techniques were included. The performance of registration software packages and tools was analyzed. Additionally, novel registration evaluations were presented to support an effective assessment. The development of any approach aims to improve registration accuracy as much as possible because registration is an important step for preprocessing remote sensing images. Several such techniques have been developed, as recounted in this article. However, as resolutions increase, the problem of inconsistent local distortion caused by high-rise buildings and topographic relief has become apparent; this cannot be exactly described by the transformation model. Moreover, WFV images are an emerging trend in satellite image production, enabling a whole ROI to be contained within one image. This poses a challenge for real-time registration and for the memory required during registration processing. Therefore, we believe that future research on remote sensing image registration will use accelerated registration, combined approaches for remote sensing image registration, heterogeneous and cross-scale image registration, and smart registration. Challenges remain, and considerable additional research is required. We perform this research with the advantage of lower entrance barriers than the TDOM generation.

ACKNOWLEDGMENTS

The work was supported by the National Natural Science Foundation of China (grants 41971303 and 41701394), Key Research and Development Program of Shaanxi Province (grant 2020NY-166), and Fundamental Research Funds for the Central Universities (grant GK202103143). The authors thank the editor-in-chief and associate editor of IEEE Geoscience and Remote Sensing Magazine as well as four anonymous reviewers for their advice on strengthening the manuscript.

AUTHOR INFORMATION

Ruitao Feng (feng-rt@snnu.edu.cn) is with the School of Geography and Tourism, Shaanxi Normal University, Xi’an, 710062, China.
Huanfeng Shen (shenhf@whu.edu.cn) is with the School of Resource and Environment Science and the Collaborative Innovation Center for Geospatial Technology, Wuhan University, Wuhan, 430072, China. He is a Senior Member of IEEE.

Jianjun Bai (bjj@snnu.edu.cn) is with the School of Geography and Tourism, Shaanxi Normal University, Xi’an, 710062, China.

Xinghua Li (lixinghua5540@whu.edu.cn) is with the School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430072, China. He is a Senior Member of IEEE.

REFERENCES

[1] A. Wong and D. A. Clausi, “ARRSI: Automatic registration of remote-sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 5, pp. 1483–1493, 2007. doi: 10.1109/TGRS.2007.892601.
[2] X. Li, N. Hui, H. Shen, Y. Fu, and L. Zhang, “A robust mosaicking procedure for high spatial resolution remote sensing images,” ISPRS J. Photogram. Remote Sens., vol. 109, pp. 108–125, Nov. 2015. doi: 10.1016/j.isprsjprs.2015.09.009.
[3] H. Shen, X. Meng, and L. Zhang, “An integrated framework for the spatio-temporal-spectral fusion of remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7135–7148, 2016. doi: 10.1109/TGRS.2016.2596290.
[4] Y. Lu, P. Wu, X. Ma, and X. Li, “Detection and prediction of land use/land cover change using spatiotemporal data fusion and the Cellular Automata–Markov model,” Environ. Monitoring Assessment, vol. 191, no. 2, p. 68, 2019. doi: 10.1007/s10661-019-7200-2.
[5] Z. Lv, T. Liu, C. Shi, J. A. Benediktsson, and H. Du, “Novel land cover change detection method based on k-means clustering and adaptive majority voting using bitemporal remote sensing images,” IEEE Access, vol. 7, pp. 34,425–34,437, Jan. 2019. doi: 10.1109/ACCESS.2019.2892648.
[6] C. Yuan, F. Wang, S. Wang, and Y. Zhou, “Accuracy evaluation of flood monitoring based on multiscale remote sensing for different landscapes,” Geomatics, Natural Hazards Risk, vol. 10, no. 1, pp. 1389–1411, 2019. doi: 10.1080/19475705.2019.1580224.
[7] L. Yang and G. Cervone, “Analysis of remote sensing imagery for disaster assessment using deep learning: A case study of flooding event,” Soft Comput., vol. 23, no. 24, pp. 13,393–13,408, 2019. doi: 10.1007/s00500-019-03878-8.
[8] K. Barbieux, “Pushbroom hyperspectral data orientation by combining feature-based and area-based co-registration techniques,” Remote Sens., vol. 10, no. 4, p. 645, 2018. doi: 10.3390/rs10040645.
[9] Y. Jiang, J. Wang, L. Zhang, G. Zhang, X. Li, and J. Wu, “Geometric processing and accuracy verification of Zhuhai-1 hyperspectral satellites,” Remote Sens., vol. 11, no. 9, p. 996, 2019. doi: 10.3390/rs11090996.
[10] I. Aicardi, F. Nex, M. Gerke, and A. M. Lingua, “An image-based approach for the co-registration of multi-temporal UAV image datasets,” Remote Sens., vol. 8, no. 9, p. 779, 2016. doi: 10.3390/rs8090779.
[11] F. P. M. Oliveira and J. M. R. S. Tavares, “Medical image registration: A review,” Comput. Methods Biomech. Biomed. Eng., vol. 17, no. 2, pp. 73–93, 2014. doi: 10.1080/10255842.2012.670855.
[12] A. Sotiras, C. Davatzikos, and N. Paragios, “Deformable medical image registration: A survey,” IEEE Trans. Med. Imag., vol. 32, no. 7, pp. 1153–1190, 2013. doi: 10.1109/TMI.2013.2265603.
[13] M. A. Viergever, J. B. A. Maintz, S. Klein, K. Murphy, M. Staring, and J. P. W. Pluim, “A survey of medical image registration,” Med. Image Anal., vol. 33, pp. 140–144, Oct. 2016. doi: 10.1016/j.media.2016.06.030.
[14] G. Haskins, U. Kruger, and P. Yan, “Deep learning in medical image registration: A survey,” Mach. Vis. Appl., vol. 31, nos. 1–2, p. 8, 2020. doi: 10.1007/s00138-020-01060-x.
[15] L. G. Brown, “A survey of image registration techniques,” ACM Comput. Surv., vol. 24, no. 4, pp. 325–376, 1992. doi: 10.1145/146370.146374.
[16] B. Zitová and J. Flusser, “Image registration methods: A survey,” Image Vis. Comput., vol. 21, no. 11, pp. 977–1000, 2003. doi: 10.1016/S0262-8856(03)00137-9.
[17] M. Deshmukh and U. Bhosle, “A survey of image registration,” Int. J. Image Process., vol. 5, no. 3, p. 245, 2011.
[18] Z. Xiong and Y. Zhang, “A critical review of image registration methods,” Int. J. Image Data Fusion, vol. 1, no. 2, pp. 137–158, 2010. doi: 10.1080/19479831003802790.
[19] M. V. Wyawahare, P. M. Patil, and H. K. Abhyankar, “Image registration techniques: An overview,” Int. J. Signal Process., Image Process. Pattern Recognit., vol. 2, no. 3, pp. 11–28, 2009.
[20] C. Dalmiya and V. Dharun, “A survey of registration techniques in remote sensing images,” Indian J. Sci. Technol., vol. 8, no. 26, pp. 1–7, 2015. doi: 10.17485/ijst/2015/v8i26/81048.
[21] R. M. Ezzeldeen, H. H. Ramadan, T. M. Nazmy, M. A. Yehia, and M. S. Abdel-Wahab, “Comparative study for image registration techniques of remote sensing images,” Egyptian J. Remote Sens. Space Sci., vol. 13, no. 1, pp. 31–36, 2010. doi: 10.1016/j.ejrs.2010.07.004.
[22] M. P. S. Tondewad and M. M. P. Dale, “Remote sensing image registration methodology: Review and discussion,” Proc. Comput. Sci., vol. 171, pp. 2390–2399, June 2020. doi: 10.1016/j.procs.2020.04.259.
[23] P. E. Anuta, “Spatial registration of multispectral and multitemporal digital imagery using fast Fourier transform techniques,” IEEE Trans. Geosci. Electron., vol. 8, no. 4, pp. 353–368, 1970. doi: 10.1109/TGE.1970.271435.
[24] X. Xu, X. Li, X. Liu, H. Shen, and Q. Shi, “Multimodal registration of remotely sensed images based on Jeffrey’s divergence,” ISPRS J. Photogram. Remote Sens., vol. 122, pp. 97–115, Dec. 2016. doi: 10.1016/j.isprsjprs.2016.10.005.
[25] J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, “Robust feature matching for remote sensing image registration via locally linear transforming,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6469–6481, 2015. doi: 10.1109/TGRS.2015.2441954.
[26] N. Hanaizumi and S. Fujimur, “An automated method for registration of satellite remote sensing images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 1993, pp. 1348–1350. doi: 10.1109/IGARSS.1993.322087.
[27] W. F. Webber, “Techniques for image registration,” in Proc. LARS Symp., West Lafayette, IN, 1973, pp. 1–7.
[28] D. I. Barnea and H. F. Silverman, “A class of algorithms for fast digital image registration,” IEEE Trans. Comput., vol. C-21, no. 2, pp. 179–186, 1972. doi: 10.1109/TC.1972.5008923.
[29] S. i. Kaneko, Y. Satoh, and S. Igarashi, “Using selective correlation coefficient for robust image registration,” Pattern Recognit., vol. 36, no. 5, pp. 1165–1173, 2003. doi: 10.1016/S0031-3203(02)00081-X.
[30] H. Gonçalves, J. A. Gonçalves, L. Corte-Real, and A. C. Teodoro, “CHAIR: Automatic image registration based on correlation and Hough transform,” Int. J. Remote Sens., vol. 33, no. 24, pp. 7936–7968, Dec. 20, 2012. doi: 10.1080/01431161.2012.701345.
[31] J. Inglada and A. Giros, “On the possibility of automatic multisensor image registration,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 10, pp. 2104–2120, 2004. doi: 10.1109/TGRS.2004.835294.
[32] J. Ma, J. C. Chan, and F. Canters, “Fully automatic subpixel image registration of multiangle CHRIS/Proba data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2829–2839, 2010. doi: 10.1109/TGRS.2010.2042813.
[33] Y. Wu, W. Ma, Q. Su, S. Liu, and Y. Ge, “Remote sensing image registration based on local structural information and global constraint,” J. Appl. Remote Sens., vol. 13, no. 1, p. 1, 2019. doi: 10.1117/1.JRS.13.016518.
[34] G. Wolberg and S. Zokai, “Image registration for perspective deformation recovery,” in Proc. SPIE, Automatic Target Recognit. X, Orlando, FL, 2000, vol. 4050, pp. 259–270.
[35] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, “Comparing images using the Hausdorff distance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 9, pp. 850–863, 1993. doi: 10.1109/34.232073.
[36] Y. Ye and J. Shan, “A local descriptor based registration method for multispectral remote sensing images with non-linear intensity differences,” ISPRS J. Photogram. Remote Sens., vol. 90, pp. 83–95, 2014. doi: 10.1016/j.isprsjprs.2014.01.009.
[37] Y. Hel-Or, H. Hel-Or, and E. David, “Fast template matching in non-linear tone-mapped images,” in Proc. Int. Conf. Comput. Vision (ICCV), Barcelona, Spain, 2011, pp. 1355–1362. doi: 10.1109/ICCV.2011.6126389.
[38] Y. Bentoutou, N. Taleb, K. Kpalma, and J. Ronsin, “An automatic image registration for applications in remote sensing,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 9, pp. 2127–2137, 2005. doi: 10.1109/TGRS.2005.853187.
[39] K. Taejung and I. Yong-Jo, “Automatic satellite image registration by combination of matching and random sample consensus,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 5, pp. 1111–1117, 2003. doi: 10.1109/TGRS.2003.811994.
[40] J. P. Kern and M. S. Pattichis, “Robust multispectral image registration using mutual-information models,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 5, pp. 1494–1505, 2007. doi: 10.1109/TGRS.2007.892599.
[41] H. m. Chen, M. K. Arora, and P. K. Varshney, “Mutual information-based image registration for remote sensing data,” Int. J. Remote Sens., vol. 24, no. 18, pp. 3701–3706, 2003. doi: 10.1080/0143116031000117047.
[42] A. A. Cole-Rhodes, K. L. Johnson, J. LeMoigne, and I. Zavorin, “Multiresolution registration of remote sensing imagery by optimization of mutual information using a stochastic gradient,” IEEE Trans. Image Process., vol. 12, no. 12, pp. 1495–1511, 2003. doi: 10.1109/TIP.2003.819237.
[43] D. Brunner, G. Lemoine, and L. Bruzzone, “Earthquake damage assessment of buildings using VHR optical and SAR imagery,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2403–2420, 2010. doi: 10.1109/TGRS.2009.2038274.
[44] X. Wang, W. Yang, A. Wheaton, N. Cooley, and B. Moran, “Efficient registration of optical and IR images for automatic plant water stress assessment,” Comput. Electron. Agriculture, vol. 74, no. 2, pp. 230–237, 2010. doi: 10.1016/j.compag.2010.08.004.
[45] S. Chen, X. Li, L. Zhao, and H. Yang, “Medium-low resolution multisource remote sensing image registration based on SIFT and robust regional mutual information,” Int. J. Remote Sens., vol. 39, no. 10, pp. 3215–3242, 2018. doi: 10.1080/01431161.2018.1437295.
[46] L. Y. Zhao, B. Y. Lü, X. R. Li, and S. H. Chen, “Multi-source remote sensing image registration based on scale-invariant
feature transform and optimization of regional mutual information,” Acta Phys. Sin., vol. 64, no. 12, p. 124204 (1–11), 2015.
[47] G. Hermosillo, C. Chefd’Hotel, and O. Faugeras, “Variational methods for multimodal image matching,” Int. J. Comput. Vis., vol. 50, no. 3, pp. 329–343, Dec. 1, 2002. doi: 10.1023/A:1020830525823.
[48] R. N. Bracewell, The Fourier Transform and Its Applications. New York: McGraw-Hill, 1986.
[49] H. Foroosh, J. B. Zerubia, and M. Berthod, “Extension of phase correlation to subpixel registration,” IEEE Trans. Image Process., vol. 11, no. 3, pp. 188–200, Mar. 2002. doi: 10.1109/83.988953.
[50] X. Wan, J. G. Liu, and H. Yan, “The illumination robustness of phase correlation for image alignment,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5746–5759, 2015. doi: 10.1109/TGRS.2015.2429740.
[51] X. Wan, J. Liu, H. Yan, and G. L. K. Morgan, “Illumination-invariant image matching for autonomous UAV localisation based on optical sensing,” ISPRS J. Photogram. Remote Sens., vol. 119, pp. 198–213, Sept. 2016. doi: 10.1016/j.isprsjprs.2016.05.016.
[52] Z. Liu, L. Wang, X. Wang, X. Shen, and L. Li, “Secure remote sensing image registration based on compressed sensing in cloud setting,” IEEE Access, vol. 7, pp. 36,516–36,526, Mar. 2019. doi: 10.1109/ACCESS.2019.2903826.
[53] M. Xu and P. K. Varshney, “A subspace method for Fourier-based image registration,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 3, pp. 491–494, 2009. doi: 10.1109/LGRS.2009.2018705.
[54] L. Lucchese, S. Leorin, and G. M. Cortelazzo, “Estimation of two-dimensional affine transformations through polar curve matching and its application to image mosaicking and remote-sensing data registration,” IEEE Trans. Image Process., vol. 15, no. 10, pp. 3008–3019, 2006. doi: 10.1109/TIP.2006.877519.
[55] P. Bao and D. Xu, “Complex wavelet-based image mosaics using edge-preserving visual perception modeling,” Comput. Graph., vol. 23, no. 3, pp. 309–321, 1999. doi: 10.1016/S0097-8493(99)00040-0.
[56] H. Gang and Z. Yun, “Combination of feature-based and area-based image registration technique for high resolution remote sensing image,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2007, pp. 377–380. doi: 10.1109/IGARSS.2007.4422809.
[57] Y. Ye, J. Shan, L. Bruzzone, and L. Shen, “Robust registration of multimodal remote sensing images based on structural similarity,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2941–2958, 2017. doi: 10.1109/TGRS.2017.2656380.
[58] H. Yang, X. Li, L. Zhao, and S. Chen, “A novel coarse-to-fine scheme for remote sensing image registration based on SIFT and phase correlation,” Remote Sens., vol. 11, no. 15, p. 1833, 2019. doi: 10.3390/rs11151833.
[59] Y. Han, F. Bovolo, and L. Bruzzone, “An approach to fine coregistration between very high resolution multispectral images based on registration noise distribution,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6650–6662, 2015. doi: 10.1109/TGRS.2015.2445632.
[60] A. Plyer, E. Colin-Koeniguer, and F. Weissgerber, “A new coregistration algorithm for recent applications on urban SAR images,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2198–2202, 2015. doi: 10.1109/LGRS.2015.2455071.
[61] G. Brigot, E. Colin-Koeniguer, A. Plyer, and F. Janez, “Adaptation and evaluation of an optical flow method applied to coregistration of forest remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 7, pp. 2923–2939, 2016. doi: 10.1109/JSTARS.2016.2578362.
[62] R. Feng, X. Li, and H. Shen, “Mountainous remote sensing images registration based on improved optical flow estimation,” ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., vol. IV-2/W5, pp. 479–484, June 2019. doi: 10.5194/isprs-annals-IV-2-W5-479-2019.
[63] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artif. Intell., vol. 17, nos. 1–3, pp. 185–203, 1981. doi: 10.1016/0004-3702(81)90024-2.
[64] J. J. Gibson, The Perception of the Visual World. Oxford: Houghton Mifflin, 1950.
[65] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. Imag. Understanding Workshop, 1981, pp. 121–130.
[66] Z. Tu et al., “A survey of variational and CNN-based optical flow techniques,” Signal Processing: Image Commun., vol. 72, pp. 9–24, Mar. 2019. doi: 10.1016/j.image.2018.12.002.
[67] M. J. Black and P. Anandan, “The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,” Comput. Vision Image Understand., vol. 63, no. 1, pp. 75–104, 1996. doi: 10.1006/cviu.1996.0006.
[68] C. Liu, J. Yuen, and A. Torralba, “SIFT Flow: Dense correspondence across scenes and its applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 978–994, 2011. doi: 10.1109/TPAMI.2010.147.
[69] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, “High accuracy optical flow estimation based on a theory for warping,” in Proc. Eur. Conf. Comput. Vision (ECCV), 2004, pp. 25–36. doi: 10.1007/978-3-540-24673-2_3.
[70] J.-Y. Xiong, Y.-P. Luo, and G.-R. Tang, “An improved optical flow method for image registration with large-scale movements,” Acta Autom. Sin., vol. 34, no. 7, pp. 760–764, 2008. doi: 10.3724/SP.J.1004.2008.00760.
[71] A. Plyer, G. Le Besnerais, and F. Champagnat, “Massively parallel Lucas Kanade optical flow for real-time video processing applications,” J. Real-Time Image Process., vol. 11, no. 4, pp. 713–730, 2016. doi: 10.1007/s11554-014-0423-0.
[72] Y. Xiang, F. Wang, L. Wan, N. Jiao, and H. You, “OS-Flow: A robust algorithm for dense optical and SAR image registration,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 1–20, 2019. doi: 10.1109/TGRS.2019.2905585.
[73] C. Huo, C. Pan, L. Huo, and Z. Zhou, “Multilevel SIFT matching for large-size VHR image registration,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 2, pp. 171–175, 2012. doi: 10.1109/LGRS.2011.2163491.
[74] L. Yu, D. Zhang, and E.-J. Holden, “A fast and fully automatic registration approach based on point features for multisource remote-sensing images,” Comput. Geosci., vol. 34, no. 7, pp. 838–848, 2008. doi: 10.1016/j.cageo.2007.10.005.
[75] L. Hui, B. S. Manjunath, and S. K. Mitra, “A contour-based approach to multisensor image registration,” IEEE Trans. Image Process., vol. 4, no. 3, pp. 320–334, 1995. doi: 10.1109/83.366480.
[76] H. Goncalves, L. Corte-Real, and J. A. Goncalves, “Automatic image registration through image segmentation and SIFT,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2589–2600, 2011. doi: 10.1109/TGRS.2011.2109389.
[77] H. P. Moravec, “Techniques towards automatic visual obstacle avoidance,” no. 2, p. 584, 1977. [Online]. Available: https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/1977/aip.txt
[78] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proc. Alvey Vision Conf., Manchester, U.K., 1988, vol. 15, pp. 147–151.
[79] Y. Xiang, F. Wang, and H. You, “OS-SIFT: A robust SIFT-like algorithm for high-resolution optical-to-SAR image registration in suburban areas,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 6, pp. 3078–3090, 2018. doi: 10.1109/TGRS.2018.2790483.
[80] I. Misra, S. M. Moorthi, D. Dhar, and R. Ramakrishnan, “An automatic satellite image registration technique based on Harris corner detection and Random Sample Consensus (RANSAC) outlier rejection model,” in 1st Int. Conf. on Recent Advances in Information Technology (RAIT), 2012, pp. 68–73.
[81] S. M. Smith and J. M. Brady, “SUSAN–A new approach to low level image processing,” Int. J. Comput. Vis., vol. 23, no. 1, pp. 45–78, 1997. doi: 10.1023/A:1007963824710.
[82] C. Leng, H. Zhang, B. Li, G. Cai, Z. Pei, and L. He, “Local feature descriptor for image matching: A survey,” IEEE Access, vol. 7, pp. 6424–6434, 2019. doi: 10.1109/ACCESS.2018.2888856.
[83] W. He and X. Deng, “A modified SUSAN corner detection algorithm based on adaptive gradient threshold for remote sensing image,” in Proc. Int. Conf. Optoelectron. Image Process., 2010, vol. 1, pp. 40–43.
[84] R. Feng, X. Li, W. Zou, and H. Shen, “Registration of multitemporal GF-1 remote sensing images with weighting perspective transformation model,” in Proc. IEEE Int. Conf. Image Process. (ICIP), 2017, pp. 2264–2268. doi: 10.1109/ICIP.2017.8296685.
[85] R. Feng, Q. Du, X. Li, and H. Shen, “Robust registration for remote sensing images by combining and localizing feature- and area-based methods,” ISPRS J. Photogram. Remote Sens., vol. 151, pp. 15–26, May 2019. doi: 10.1016/j.isprsjprs.2019.03.002.
[86] Y. Duan, X. Huang, J. Xiong, Y. Zhang, and B. Wang, “A combined image matching method for Chinese optical satellite imagery,” Int. J. Digital Earth, vol. 9, no. 9, pp. 851–872, 2016. doi: 10.1080/17538947.2016.1151955.
[87] P. K. Konugurthi, R. Kune, R. Nooka, and V. Sarma, “Autonomous ortho-rectification of very high resolution imagery using SIFT and genetic algorithm,” Photogram. Eng. Remote Sens., vol. 82, no. 5, pp. 377–388, 2016. doi: 10.14358/PERS.82.5.377.
[88] Q. Li, G. Wang, J. Liu, and S. Chen, “Robust scale-invariant feature matching for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 287–291, 2009. doi: 10.1109/LGRS.2008.2011751.
[89] W. Ma et al., “Remote sensing image registration with modified SIFT and enhanced feature matching,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 1, pp. 3–7, 2017. doi: 10.1109/LGRS.2016.2600858.
[90] M. Gong, S. Zhao, L. Jiao, D. Tian, and S. Wang, “A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7, pp. 4328–4338, 2014. doi: 10.1109/TGRS.2013.2281391.
[91] C. A. Lee, S. D. Gasster, A. Plaza, C. Chang, and B. Huang, “Recent developments in high performance computing for remote sensing: A review,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 4, no. 3, pp. 508–527, 2011. doi: 10.1109/JSTARS.2011.2162643.
[92] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94.
[93] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005. doi: 10.1109/TPAMI.2005.188.
[94] K. Yan and R. Sukthankar, “PCA-SIFT: A more distinctive representation for local image descriptors,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), Washington, D.C., 2004, vol. 2, pp. 506–513. doi: 10.1109/CVPR.2004.1315206.
[95] Y. Zheng, Z. Cao, and Y. Xiao, “Multi-spectral remote image registration based on SIFT,” Electron. Lett., vol. 44, no. 2, pp. 107–108, 2008.
[96] J. Morel and G. Yu, “ASIFT: A new framework for fully affine invariant image comparison,” SIAM J. Imag. Sci., vol. 2, no. 2, pp. 438–469, 2009. doi: 10.1137/080732730.
[97] A. Sedaghat, M. Mokhtarzade, and H. Ebadi, “Uniform robust scale-invariant feature matching for optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4516–4527, 2011. doi: 10.1109/TGRS.2011.2144607.
[98] A. Sedaghat and H. Ebadi, “Distinctive order based self-similarity descriptor for multi-sensor remote sensing image matching,” ISPRS J. Photogram. Remote Sens., vol. 108, pp. 62–71, Oct. 2015. doi: 10.1016/j.isprsjprs.2015.06.003.
[99] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Proc. Eur. Conf. Comput. Vision (ECCV), Graz, Austria, 2006, pp. 404–417.
[100] W. Yan, H. She, and Z. Yuan, “Robust registration of remote sensing image based on SURF and KCCA,” J. Indian Soc. Remote Sens., vol. 42, no. 2, pp. 291–299, 2014. doi: 10.1007/s12524-013-0324-x.
[101] X. Yuan, S. Chen, W. Yuan, and Y. Cai, “Poor textural image tie point matching via graph theory,” ISPRS J. Photogram. Remote Sens., vol. 129, pp. 21–31, July 2017. doi: 10.1016/j.isprsjprs.2017.04.015.
[102] R. Bouchiha and K. Besbes, “Automatic remote-sensing image registration using SURF,” Int. J. Comput. Theory Eng., vol. 5, no. 1, pp. 88–92, 2013. doi: 10.7763/IJCTE.2013.V5.653. [103] J. Chen et al., “WLD: A robust local image descriptor,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1705–1720, 2010. doi: 10.1109/TPAMI.2009.155. [104] E. Rosten and T. Drummond, “Machine learning for highspeed corner detection,” in Proc. Eur. Conf. Comput. Vision (ECCV), Graz, Austria, 2006, pp. 430–443. 139
[105] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary robust independent elementary features,” in Proc. Eur. Conf. Comput. Vision (ECCV), Crete, Greece, 2010, pp. 778–792. [106] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in Proc. Int. Conf. Comput. Vision (ICCV), Barcelona, Spain, 2011, pp. 2564–2571. [107] D. Ma and H. Lai, “Remote sensing image matching based improved ORB in NSCT domain,” J. Indian soc. Remote Sens., vol. 47, no. 5, pp. 801–807, 2019. doi: 10.1007/s12524-019-00958-y. [108] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, “KAZE Features,” in Proc. Eur. Conf. Comput. Vision (ECCV), Florence, Italy, 2012, pp. 214–227. [109] P. Alcantarilla, J. Nuevo, and A. Bartoli, “Fast explicit diffusion for accelerated features in nonlinear scale spaces,” in Proc. Brit. Mach. Vision Conf. (BMVC), Bristol, U.K., 2013, pp. 1–11. [110] Y. Ye, M. Wang, S. Hao, and Q. Zhu, “A novel keypoint detector combining corners and blobs for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 3, pp. 451–455, Mar. 31, 2020. doi: 10.1109/LGRS.2020. 2980620. [111] X. Liu, Y. Ai, J. Zhang, and Z. Wang, “A novel affine and contrast invariant descriptor for infrared and visible image registration,” Remote Sens., vol. 10, no. 4, p. 658, 2018. doi: 10.3390/ rs10040658. [112] Z. Ye et al., “Robust fine registration of multisensor remote sensing images based on enhanced subpixel phase correlation,” Sensors, vol. 20, no. 15, p. 4338, Aug. 4, 2020. doi: 10.3390/ s20154338. [113] Y. C. Hsieh, D. M. McKeown, and F. P. Perlant, “Performance evaluation of scene registration and stereo matching for cartographic feature extraction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 214–238, 1992. doi: 10.1109/34.121790. [114] S. Dongseok, J. K. Pollard, and J. Muller, “Accurate geometric correction of ATSR images,” IEEE Trans. Geosci. Remote Sens., vol. 35, no. 4, pp. 
997–1006, 1997. doi: 10.1109/36.602542. [115] J. Inglada and F. Adragna, “Automatic multi-sensor image registration by edge matching using genetic algorithms,” in Proc. Int. Geosci. Remote Sens. Symp. (IGARSS), Sydney, NSW, Australia, 2001, vol. 5, pp. 2313–2315. [116] W. Shi and A. Shaker, “The Line‐Based Transformation Model (LBTM) for image‐to‐image registration of high‐resolution satellite image data,” Int. J. Remote Sens., vol. 27, no. 14, pp. 3001– 3012, 2006. doi: 10.1080/01431160500486716. [117] T.-Z. Xiang, G.-S. Xia, X. Bai, and L. Zhang, “Image stitching by line-guided local warping with global similarity constraint,” Pattern Recognit., vol. 83, pp. 481–497, Nov. 2018. doi: 10.1016/j. patcog.2018.06.013. [118] C. Zhao and A. A. Goshtasby, “Registration of multitemporal aerial optical images using line features,” ISPRS J. Photogram. Remote Sens., vol. 117, pp. 149–160, July 2016. doi: 10.1016/j. isprsjprs.2016.04.002. [119] C. Li and W. Shi, “The generalized-line-based iterative transformation model for imagery registration and rectification,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 8, pp. 1394–1398, 2014. doi: 10.1109/LGRS.2013.2293844. 140 [120] A. O. Ok, J. D. Wegner, C. Heipke, F. Rottensteiner, U. Soergel, and V. Toprak, “Matching of straight line segments from aerial stereo images of urban areas,” ISPRS J. Photogram. Remote Sens., vol. 74, pp. 133–152, Nov. 2012. doi: 10.1016/j.isprsjprs.2012.09.003. [121] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679– 698, 1986. doi: 10.1109/TPAMI.1986.4767851. [122] D. Marr and E. Hildreth, “Theory of edge detection,” Proc. Roy. Soc. Ser. B-Biol. Sci., vol. 207, no. 1167, pp. 187–217, 1980. doi: 10.1098/rspb.1980.0020. [123] R. G. v. Gioi, J. Jakubowicz, J. Morel, and G. Randall, “LSD: A fast line segment detector with a false detection control,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 4, pp. 722–732, 2010. 
doi: 10.1109/TPAMI.2008.300. [124] C. Akinlar and C. Topal, “EDLines: A real-time line segment detector with a false detection control,” Pattern Recog. Lett., vol. 32, no. 13, pp. 1633–1642, 2011. doi: 10.1016/j.patrec.2011.06.001. [125] A. Goshtasby and G. C. Stockman, “Point pattern matching using convex hull edges,” IEEE Trans. Syst., Man, Cybern., vol. SMC15, no. 5, pp. 631–637, 1985. doi: 10.1109/TSMC.1985.6313439. [126] W. Dorigo, M. Hollaus, W. Wagner, and K. Schadauer, “An application-oriented automated approach for co-registration of forest inventory and airborne laser scanning data,” Int. J. Remote Sens., vol. 31, no. 5, pp. 1133–1153, 2010. doi: 10.1080/01431160903380581. [127] B. Sirmacek and C. Unsalan, “Urban-area and building detection using SIFT keypoints and graph theory,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1156–1167, 2009. doi: 10.1109/ TGRS.2008.2008440. [128] J. Flusser and T. Suk, “A moment-based approach to registration of images with affine geometric distortion,” IEEE Trans. Geosci. Remote Sens., vol. 32, no. 2, pp. 382–387, 1994. doi: 10.1109/36.295052. [129] N. R. Pal and S. K. Pal, “A review on image segmentation techniques,” Pattern Recognit., vol. 26, no. 9, pp. 1277–1294, 1993. doi: 10.1016/0031-3203(93)90135-J. [130] D. Xiaolong and S. Khorram, “Development of a feature-based approach to automated image registration for multitemporal and multisensor remotely sensed imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. Proc. Remote Sens.-A Sci. Vision Sustainable Develop. (IGARSS), 1997, vol. 1, pp. 243–245. [131] L. M. Fonseca and B. Manjunath, “Registration techniques for multisensor remotely sensed imagery,” Photogram. Eng. Remote Sensing (PERS), vol. 62, no. 9, pp. 1049–1056, 1996. [132] A. Goshtasby, G. C. Stockman, and C. V. Page, “A region-based approach to digital image registration with subpixel accuracy,” IEEE Trans. Geosci. Remote Sens., vol. GE-24, no. 3, pp. 390–399, 1986. doi: 10.1109/TGRS.1986.289597. 
[133] J. Ton and A. K. Jain, “Registering Landsat images by point matching,” IEEE Trans. Geosci. Remote Sens., vol. 27, no. 5, pp. 642–651, 1989. doi: 10.1109/TGRS.1989.35948. [134] Y. Chen, X. Zhang, Y. Zhang, S. J. Maybank, and Z. Fu, “Visible and infrared image registration based on region features and edginess,” Mach. Vis. Appl., vol. 29, no. 1, pp. 113–123, 2018. doi: 10.1007/s00138-017-0879-6. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[135] A. Irani Rahaghi, U. Lemmin, D. Sage, and D. A. Barry, “Achieving high-resolution thermal imagery in low-contrast lake surface waters by aerial remote sensing and image registration,” Remote Sens. Environ., vol. 221, pp. 773–783, Feb. 2019. doi: 10.1016/j.rse.2018.12.018. [136] A. Li, X. Cheng, H. Guan, T. Feng, and Z. Guan, “Novel image registration method based on local structure constraints,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 9, pp. 1584–1588, 2014. doi: 10.1109/LGRS.2014.2305982. [137] S. Jiang and W. Jiang, “Hierarchical motion consistency constraint for efficient geometrical verification in UAV stereo image matching,” ISPRS J. Photogram. Remote Sens., vol. 142, pp. 222–242, Aug. 2018. doi: 10.1016/j.isprsjprs.2018. 06.009. [138] J. S. Beis and D. G. Lowe, “Shape indexing using approximate nearest-neighbour search in high-dimensional spaces,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 1997, vol. 97, pp. 1000–1006. doi: 10.1109/ CVPR.1997.609451. [139] Y. Ma et al., “Remote sensing big data computing: Challenges and opportunities,” Future Gener. Comput. Syst., vol. 51, pp. 47– 60, Oct. 2015. doi: 10.1016/j.future.2014.10.029. [140] G. Stockman, S. Kopstein, and S. Benett, “Matching images to models for registration and object detection via clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-4, no. 3, pp. 229–241, 1982. doi: 10.1109/TPAMI.1982.4767240. [141] G. Borgefors, “Hierarchical chamfer matching: A parametric edge matching algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 10, no. 6, pp. 849–865, 1988. doi: 10.1109/ 34.9107. [142] L. Livi and A. Rizzi, “The graph matching problem,” Pattern Anal. Appl., vol. 16, no. 3, pp. 253–283, 2013. doi: 10.1007/ s10044-012-0284-8. [143] L. Torresani, V. Kolmogorov, and C. Rother, “Feature correspondence via graph matching: models and global optimization,” in Proc. Eur. Conf. Comput. Vision (ECCV), Berlin, Heidelberg, 2008, pp. 596–609. [144] Z. Liu, J. 
An, and Y. Jing, “A simple and robust feature point matching algorithm based on restricted spatial order constraints for aerial image registration,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 514–527, 2012. doi: 10.1109/ TGRS.2011.2160645. [145] Z. Xiong and Y. Zhang, “A novel interest-point-matching algorithm for high-resolution satellite images,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 12, pp. 4189–4200, 2009. doi: 10.1109/ TGRS.2009.2023794. [146] H. Chang, G. Wu, and M. Chiang, “Remote sensing image registration based on modified SIFT and feature slope grouping,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 9, pp. 1363–1367, 2019. doi: 10.1109/LGRS.2019.2899123. [147] S. Zhili and Z. Jiaqi, “Image registration approach with scaleinvariant feature transform algorithm and tangent-crossingpoint feature,” J. Electron. Imag., vol. 29, no. 2, pp. 1–14, Mar. 2020. doi: 10.1117/1.JEI.29.2.023010. [148] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analyDECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE sis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981. doi: 10.1145/358669.358692. [149] K. Zhang, X. Li, and J. Zhang, “A robust point-matching algorithm for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 2, pp. 469–473, 2014. doi: 10.1109/ LGRS.2013.2267771. [150] Y. Wu, W. Ma, M. Gong, L. Su, and L. Jiao, “A novel pointmatching algorithm based on fast sample consensus for image registration,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 1, pp. 43–47, 2015. doi: 10.1109/LGRS.2014.2325970. [151] B. Li and H. Ye, “RSCJ: Robust sample consensus judging algorithm for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 4, pp. 574–578, 2012. doi: 10.1109/ LGRS.2011.2175434. [152] H. 
Zhang et al., “Remote sensing image registration based on local affine constraint with circle descriptor,” IEEE Geosci. Remote Sens. Lett., early access, 2020. doi: 10.1109/ LGRS.2020.3027096. [153] F. Ye, Y. Su, H. Xiao, X. Zhao, and W. Min, “Remote sensing image registration using convolutional neural network features,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 2, pp. 232–236, 2018. doi: 10.1109/LGRS.2017.2781741. [154] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, 2013. doi: 10.1109/ TPAMI.2012.231. [155] W. Huang, L. Xiao, Z. Wei, H. Liu, and S. Tang, “A new pansharpening method with deep neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 5, pp. 1037–1041, 2015. doi: 10.1109/LGRS.2014.2376034. [156] Y. Xing, M. Wang, S. Yang, and L. Jiao, “Pan-sharpening via deep metric learning,” ISPRS J. Photogram. Remote Sens., vol. 145, pp. 165–183, Nov. 2018. doi: 10.1016/j.isprsjprs.2018. 01.016. [157] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, “Training deep convolutional neural networks for land–cover classification of high-resolution imagery,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 4, pp. 549–553, 2017. doi: 10.1109/LGRS.2017.2657778. [158] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, “Deep learning in remote sensing applications: A meta-analysis and review,” ISPRS J. Photogram. Remote Sens., vol. 152, pp. 166–177, June 2019. doi: 10.1016/j.isprsjprs.2019.04.015. [159] Y. Liu, D. Minh Nguyen, N. Deligiannis, W. Ding, and A. Munteanu, “Hourglass-shapenetwork based semantic segmentation for high resolution aerial imagery,” Remote Sens., vol. 9, no. 6, p. 522, 2017. doi: 10.3390/rs9060522. [160] H. Zhang et al., “Registration of multimodal remote sensing image based on deep fully convolutional neural network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 
8, pp. 3028–3042, 2019. doi: 10.1109/JSTARS.2019. 2916560. [161] N. Merkle, W. Luo, S. Auer, R. Müller, and R. Urtasun, “Exploiting deep matching and SAR data for the geo-localization accuracy improvement of optical satellite images,” Remote Sensing, vol. 9, no. 6, p. 586, 2017. doi: 10.3390/rs9060586. 141
[162] H. He, M. Chen, T. Chen, and D. Li, “Matching of remote sensing images with complex background variations via Siamese convolutional neural network,” Remote Sens., vol. 10, no. 3, p. 355, 2018. doi: 10.3390/rs10020355. [163] L. H. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu, “Identifying corresponding patches in SAR and optical images with a pseudo-Siamese CNN,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 784–788, 2018. doi: 10.1109/LGRS.2018. 2799232. [164] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, “A deep learning framework for remote sensing image registration,” ISPRS J. Photogram. Remote Sens., vol. 145, pp. 148–164, Nov. 2018. doi: 10.1016/j.isprsjprs.2017.12.012. [165] R. Fan, B. Hou, J. Liu, J. Yang, and Z. Hong, “Registration of multi-resolution remote sensing images based on L2-Siamese model,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1–1, Nov. 19, 2020. doi: 10.1109/JSTARS.2020. 3038922. [166] N. Merkle, S. Auer, R. Müller, and P. Reinartz, “Exploring the potential of conditional adversarial networks for optical and SAR image matching,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 6, pp. 1811–1820, 2018. doi: 10.1109/ JSTARS.2018.2803212. [167] H. L. Hughes, M. Schmitt, and X. X. Zhu, “Mining hard negative samples for SAR-optical image matching using generative adversarial networks,” Remote Sens., vol. 10, no. 10, p. 1552, 2018. doi: 10.3390/rs10101552. [168] J. Zhang, W. Ma, Y. Wu, and L. Jiao, “Multimodal remote sensing image registration based on image transfer and local features,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 8, pp. 1210– 1214, 2019. doi: 10.1109/LGRS.2019.2896341. [169] N. Girard, G. Charpiat, and Y. Tarabalka, “Aligning and updating cadaster maps with aerial images by multi-task, multi-resolution deep learning,” in Proc. Asian Conf. Comput. Vision (ACCV 2018), Cham, 2019, pp. 675–690. [170] L. Li, L. Han, M. Ding, Z. Liu, and H. 
Cao, “Remote sensing image registration based on deep learning regression model,” IEEE Geosci. Remote Sens. Lett., early access, 2020. doi: 10.1109/ LGRS.2020.3032439. [171] F. Liu, F. Bi, L. Chen, H. Shi, and W. Liu, “Feature-area optimization: A novel SAR image registration method,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 2, pp. 242–246, 2016. doi: 10.1109/LGRS.2015.2507982. [172] X. Huang, Y. Sun, D. Metaxas, F. Sauer, and C. Xu, “Hybrid image registration based on configural matching of scale-invariant salient region features,” in Proc. IEEE Comput. Society Conf. Comput. Vis. Pattern Recognit. (CVPR), Washington, D. C., 2004, pp. 167–167. doi: 10.1109/CVPR.2004.362. [173] G. Hong and Y. Zhang, “Combination of feature-based and area-based image registration technique for high resolution remote sensing image,” in Proc. Int. Geosci. Remote Sens. Symp. (IGARSS), Barcelona, Spain, 2007, pp. 377–380. [174] N. E. Mekky, F. E.-Z. Abou-Chadi, and S. Kishk, “Waveletbased image registration techniques: A study of performance,” Int. J. Comput. Sci. Netw. Security, vol. 11, no. 2, pp. 188–196, 2011. 142 [175] S. Suri, P. Schwind, P. Reinartz, and J. Uhl, “Combining mutual information and scale invariant feature transform for fast and robust multisensor SAR image registration,” in Proc. 75th Annu. ASPRS Conf., 2009. [176] Y. S. Heo, K. M. Lee, and S. U. Lee, “Joint depth map and color consistency estimation for stereo images with different illuminations and cameras,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 5, pp. 1094–1106, 2013. doi: 10.1109/ TPAMI.2012.167. [177] R. Feng, Q. Du, H. Shen, and X. Li, “Region-by-region registration combining feature-based and optical flow methods for remote sensing images,” Remote Sens., vol. 13, no. 8, p. 1475, 2021. doi: 10.3390/rs13081475. [178] C. Xing, J. Wang, and Y. Xu, “A method for building a mosaic with UAV images,” Int. J. Inform. Eng. Electron. Bus., vol. 2, no. 1, pp. 9–15, 2010. 
doi: 10.5815/ijieeb.2010.01.02. [179] Z. Kang, L. Zhang, S. Zlatanova, and J. Li, “An automatic mosaicking method for building facade texture mapping using a monocular close-range image sequence,” ISPRS J. Photogram. Remote Sens., vol. 65, no. 3, pp. 282–293, 2010. doi: 10.1016/j. isprsjprs.2009.11.003. [180] S. R. Lee, “A coarse-to-fine approach for remote-sensing image registration based on a local method,” Int. J. Smart Sens. Intell. Syst., vol. 3, no. 4, 2010. [181] K. Sharma and A. Goyal, “Very high resolution image registration based on two step Harris-Laplace detector and SIFT descriptor,” in 2013 4th Int. Conf. Comput., Commun. Netw. Technol. (ICCCNT), pp. 1–5. doi: 10.1109/ICCCNT.2013.6726632. [182] W. Ma, J. Zhang, Y. Wu, L. Jiao, H. Zhu, and W. Zhao, “A novel two-step registration method for remote sensing images based on deep and local features,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4834–4843, 2019. doi: 10.1109/ TGRS.2019.2893310. [183] S. Li, L. Yuan, J. Sun, and L. Quan, “Dual-feature warpingbased motion model estimation,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), 2015, pp. 4283–4291. [184] S. Nag, “Image registration techniques: A survey,” Nov. 28, 2017, arXiv:1712.07540. [185] J. S. Bhatt and N. Padmanabhan, “Image Registration for meteorological applications: Development of a generalized software for sensor data registration at ISRO,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 8, no. 4, pp. 23–37, 2020. doi: 10.1109/MGRS.2019.2949382. [186] A. Sedaghat and N. Mohammadi, “High-resolution image registration based on improved SURF detector and localized GTM,” Int. J. Remote Sens., vol. 40, no. 7, pp. 2576–2601, Apr. 2019. doi: 10.1080/01431161.2018.1528402. [187] Y. Ma, J. Wang, H. Xu, S. Zhang, X. Mei, and J. Ma, “Robust image feature matching via progressive sparse spatial consensus,” IEEE Access, vol. 5, pp. 24,568–24,579, Oct. 2017. doi: 10.1109/ ACCESS.2017.2768078. [188] H. Goncalves, J. A. 
Goncalves, and L. Corte-Real, “Measures for an objective evaluation of the geometric correction process quality,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 292– 296, 2009. doi: 10.1109/LGRS.2008.2012441. GRS IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
Deep Learning Meets SAR
Concepts, models, pitfalls, and perspectives

XIAO XIANG ZHU, SINA MONTAZERI, MOHSIN ALI, YUANSHENG HUA, YUANYUAN WANG, LICHAO MOU, YILEI SHI, FENG XU, AND RICHARD BAMLER

Deep learning in remote sensing has received considerable international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in synthetic aperture radar (SAR) data processing and first attempts have been successful, its huge potential remains locked. In this article, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing special characteristics of SAR data, review the state of the art of deep learning applied to SAR, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet underexploited field and to pave the way for the use of deep learning in big SAR data processing workflows.

Digital Object Identifier 10.1109/MGRS.2020.3046356
Date of current version: 9 February 2021
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, DECEMBER 2021. 0274-6638/21 ©2021 IEEE

MOTIVATION
In recent years, deep learning [1] has been developed at a dramatic pace, achieving great success in many fields. Unlike conventional algorithms, deep learning-based methods commonly employ hierarchical architectures, such as deep neural networks, to extract feature representations of raw data for numerous tasks. For instance, convolutional neural networks (CNNs) are capable of learning low- and high-level features from raw images with stacks of convolutional and pooling layers and then applying the extracted features to various computer vision tasks, such as large-scale image recognition [2], object detection [3], and semantic segmentation [4]. Inspired by numerous successful applications in the computer vision community, the use of deep learning in remote sensing is now receiving significant attention [5].

As first attempts at SAR applications, deep learning-based methods have been adopted for a variety of tasks, including terrain surface classification [6], object detection [7], parameter inversion [8], despeckling [9], specific functions in interferometric SAR (InSAR) [10], and SAR–optical data fusion [11]. For terrain surface classification from SAR and polarimetric SAR (PolSAR) images, effective feature extraction is essential. Conventionally, these features are extracted based on expert domain knowledge and are usually applicable to only a small number of cases and data sets. Deep learning feature extraction has, however, proved to overcome, to some degree, both of the aforementioned issues [6]. For SAR target detection, conventional approaches mainly rely on template matching, where specific templates are manually created [12] to classify different categories, and on traditional machine learning (ML) methods, such as support vector machines (SVMs) [13], [14]; in contrast, modern deep learning algorithms aim at applying deep CNNs to automatically extract discriminative features for target recognition [7]. For parameter inversion, deep learning models are employed to learn the latent mapping function from SAR images to estimated parameters, e.g., sea ice concentration [8]. Regarding despeckling, conventional methods often rely on artificial filters and may suffer from the improper elimination of sharp features when denoising. Furthermore, the development of SAR and optical image joint analysis has been motivated by the capacities of extracting features from both types of images. For applications in InSAR, only a few studies have been carried out, such as the work described in [10]. However, these algorithms neglect the special characteristics of phase and simply use an out-of-the-box deep learning-based model.

Despite initial successes, and unlike the evaluation of optical data, the huge potential of deep learning in SAR and InSAR remains locked. For example, to the best knowledge of the authors, there is no example of deep learning in SAR that has been developed for the operational processing of big data or integrated into the production chain of any satellite mission. This article aims at stimulating more research in this interesting yet underexploited research field.

INTRODUCTION TO RELEVANT DEEP LEARNING MODELS AND CONCEPTS
In this section, we briefly review relevant deep learning algorithms that were originally proposed for visual data processing and that are widely used for state-of-the-art research into deep learning in SAR. In addition, we mention the latest deep learning developments, which are not yet widely applied to SAR but may help create the next generation of its algorithms. Figure 1 gives an overview of the deep learning models we discuss in this section.

[Figure 1 image: panels (a)–(j) grouped under CNN, RNN, generative models, deep RL, and GNN.]
FIGURE 1. A selection of relevant deep learning models. (a) The Visual Geometry Group Network. (Source: [15].) (b) The residual neural network (ResNet) block. (Source: [16].) (c) The U-Net. (Source: [17].) (d) The long short-term memory unit. (Source: [18].) (e) The variational autoencoder. (Source: [19].) (f) The recurrent neural network (RNN). (Source: [20].) (g) The generative adversarial network. (Source: [21].) (h) The convolutional graph neural network (GNN). (Source: [22].) (i) The recurrent GNN. (Source: [23].) (j) Neural architecture search using deep reinforcement learning (RL). (Source: [24].) ReLU: rectified linear unit; GRU: gated recurrent unit.
Before discussing deep learning algorithms, we would like to stress that the importance of high-quality benchmark data sets in deep learning research cannot be overstated. Especially in supervised learning, the knowledge that can be gained by a model is bounded by the information present in the training data set. For example, the Modified National Institute of Standards and Technology (MNIST) [25] data set played a key role in LeCun’s seminal paper about CNNs and gradient-based learning [26]. Similarly, there would be no AlexNet [27], the network that kickstarted the current deep learning renaissance, without the ImageNet [28] data set, which contains more than 14 million images and 22,000 classes. ImageNet has been such an important part of deep learning research that, more than 10 years after it was published, it is still used as a standard benchmark to evaluate the performance of CNNs for image classification.

DEEP LEARNING MODELS
The main principle of deep learning models is to encode input data into effective feature representations for target tasks. To exemplify how a deep learning framework works, we take the autoencoder as an example: it first maps input data to a latent representation via a trainable nonlinear mapping and then reconstructs the inputs through a reverse mapping. The reconstruction error is usually defined as the Euclidean distance between the inputs and the reconstructed inputs. The parameters of autoencoders are optimized by gradient descent-based optimizers, such as stochastic gradient descent, root mean square propagation (RMSProp) [29], and Adam [30], during the backpropagation step.

CNNs
With the success of AlexNet in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where the network scored a top-five test error of 15.3%, compared to the second-best test error of 26.2%, CNNs have attracted worldwide attention and are now used for many image understanding tasks, such as image classification, object detection, and semantic segmentation.
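The top-five error quoted for ILSVRC counts a prediction as correct when the true class is among the five classes with the highest scores. The following is a minimal sketch of that metric in plain Python; the function name `top_k_error` and the toy scores are illustrative assumptions, not part of any benchmark tool kit.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy example (hypothetical scores for a 7-class problem, two samples).
scores = [
    [0.10, 0.05, 0.30, 0.20, 0.15, 0.12, 0.08],  # true class 5: ranked fourth
    [0.40, 0.25, 0.10, 0.09, 0.08, 0.05, 0.03],  # true class 6: ranked last
]
labels = [5, 6]

print(top_k_error(scores, labels, k=5))  # 0.5: only the first sample is a top-five hit
print(top_k_error(scores, labels, k=1))  # 1.0: neither true class has the highest score
```

The top-five criterion is more forgiving than the top-one criterion, which is why the two numbers differ for the same predictions.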
AlexNet consists of five convolutional layers, three max-pooling layers, and three fully connected layers. One of the key AlexNet innovations was the use of graphics processing units (GPUs), which made it possible to train such large networks with huge data sets without using supercomputers. Within just two years, the Visual Geometry Group Network (VGGNet) [2] overtook AlexNet in performance by achieving a 6.8% top-five test error at ILSVRC-2014; the main difference was that it used only 3 × 3 convolutional kernels, which enabled it to have more channels and, in turn, capture more diverse features.

The residual neural network (ResNet) [31], U-Net [32], and DenseNet [33] were the next major CNN architectures. Their main feature was the idea of connecting not only neighboring layers but any two layers in the network by using skip connections. This helped reduce information loss across networks, mitigated the problem of vanishing gradients, and facilitated the design of deeper networks. U-Net is one of the most commonly used image segmentation networks. It has an autoencoder-based architecture that uses skip connections to concatenate features from the first layer to the last, the second to the second last, and so on; this way, fine-grained information from the initial layers reaches the end layers. U-Net was initially proposed for medical image segmentation, where data labeling is a big problem. The authors employed heavy data augmentation techniques on input data, making it possible to learn from only a few hundred annotated samples. In ResNet, skip connections were incorporated within individual blocks, not across the whole network. Since its initial proposal, ResNet has undergone many architectural tweaks, and, even after five years, its variants are consistently among the top scorers on ImageNet.
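The two kinds of skip connection described above can be contrasted in a few lines. This is a toy sketch on plain Python lists rather than real feature maps; `layer`, `residual_block`, and `unet_style_merge` are hypothetical names chosen for illustration, and the "layer" is a simple elementwise affine map followed by a ReLU instead of a convolution.

```python
def layer(x, weight, bias):
    """Toy layer: elementwise affine map followed by a ReLU."""
    return [max(0.0, weight * v + bias) for v in x]


def residual_block(x, weight, bias):
    """ResNet-style skip inside a block: output = input + layer(input)."""
    fx = layer(x, weight, bias)
    return [a + b for a, b in zip(x, fx)]


def unet_style_merge(encoder_features, decoder_features):
    """U-Net-style skip: concatenate early (encoder) and late (decoder) features."""
    return encoder_features + decoder_features


x = [1.0, -2.0, 3.0]

# Additive skip: the input passes through unchanged and the layer adds a correction.
print(residual_block(x, weight=0.5, bias=0.0))          # [1.5, -2.0, 4.5]

# Concatenating skip: fine-grained early features sit next to processed ones.
print(unet_style_merge(x, layer(x, 0.5, 0.0)))           # [1.0, -2.0, 3.0, 0.5, 0.0, 1.5]
```

With the layer weights set to zero, `residual_block` reduces to the identity, which gives some intuition for why additive skips make very deep stacks easier to train: a block only has to learn a correction to its input.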
In DenseNet, each layer is connected to all preceding layers, which reduces the size of the network, albeit at the cost of increased memory usage. For a more detailed explanation of different CNN models, interested readers are referred to [34]. These CNN models have also proved their worth in SAR processing tasks; e.g., see [35]–[37]. For more examples and details of CNNs in SAR, see the “Recent Advances in Deep Learning Applied to SAR” section.
RECURRENT NEURAL NETWORKS
Besides CNNs, recurrent neural networks (RNNs) [38] are a major class of deep networks. Their main building blocks are recurrent units, which take the current input and the output of the previous state as input. They provide state-of-the-art results for processing data of variable lengths, including text and time-series information. Their weights can be replaced with convolutional kernels for visual processing tasks, such as image captioning and predicting future frames/points in visual time-series data. Long short-term memory (LSTM) [39] is one of the most popular RNN architectures: its cells can store values from any past instances and are not severely affected by the problem of vanishing gradients. As with other time-series data, RNNs are a natural choice from the deep learning tool kit for processing SAR time-series information; e.g., see [40].
GENERATIVE ADVERSARIAL NETWORKS
Proposed by Ian Goodfellow et al. [41], generative adversarial networks (GANs) are among the most popular and exciting inventions in the field of deep learning. Based on game-theoretic principles, they consist of two networks called a generator and a discriminator. The generator’s objective is to learn a latent space through which it can create samples from the same distribution as the training data, while the discriminator tries to learn to distinguish whether a sample is from the generator or the training data.
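The two-player game just described is commonly written as the minimax objective from [41]. The symbols used here ($G$ for the generator, $D$ for the discriminator, $p_{\mathrm{data}}$ for the training data distribution, and $p_z$ for the latent prior) are notation introduced for this illustration only.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator raises the objective by scoring real samples near one and generated samples near zero, while the generator lowers it by producing samples that the discriminator scores as real.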
This very simple mechanism underlies many of the cutting-edge algorithms for various applications, e.g., generating artificial photorealistic images and videos, superresolution, and text-to-image synthesis. For example, in the SAR domain, GANs
have already been successfully used in cloud removal applications [42], [43]. See the “Recent Advances in Deep Learning Applied to SAR” section for more examples.
SUPERVISED, UNSUPERVISED, AND REINFORCEMENT LEARNING
SUPERVISED LEARNING
Most popular deep learning models fall under the category of supervised deep learning; i.e., they need labeled data sets to learn objective functions. One big challenge of supervised learning is generalization, i.e., how well a trained model performs on test data. Therefore, it is vital that the training data truly represent the actual data distribution so that the model can handle unseen data. If a model fits the training data well but fails on test data, it has overfit. In the deep learning literature, there are several techniques that can be used to avoid overfitting, e.g., dropout [44].
UNSUPERVISED LEARNING
Unsupervised learning refers to the class of algorithms where the training data do not contain labels. For instance, in classical data analysis, principal component analysis [45] can be used to reduce the data dimension, followed by a clustering algorithm to group similar data points. In deep generative modeling, autoencoders, variational autoencoders (VAEs) [46], and GANs [41] are some of the popular techniques that can be used for unsupervised learning. Their primary goal is to generate output data from the same distribution as the input data. Autoencoders consist of an encoder that finds a compressed, latent representation of the input and a decoder that translates a representation back to the original input. VAEs take autoencoders to the next level by learning a whole distribution instead of just a single representation at the end of the encoder; this, in turn, can be used by the decoder to generate the whole distribution of outputs. The trick to learning this distribution is to also learn the variance along with the mean of the latent representation at the encoder–decoder meeting point and to add a Kullback–Leibler divergence-based loss term to the standard reconstruction loss function of the autoencoders.
DEEP REINFORCEMENT LEARNING
Reinforcement learning (RL) tries to mimic human learning behavior, i.e., taking actions and then adjusting them for the future, according to feedback from the environment. For example, young children learn to repeat or not repeat their actions based on the reaction of their parents. The RL model consists of an environment with states, actions to transition between those states, and a reward system for ending up in different states. The objective of the algorithm is to learn the best actions for given states using a feedback–reward system. In classical RL algorithms, function approximators are used to calculate the probability of different actions in different states. Deep RL employs different types of neural networks to create these functions [47], [48]. Recently, deep RL received particular attention and achieved popularity due to the success of Google DeepMind’s AlphaGo [49], which defeated the Go board game world champion. This task was considered impossible for computers until a few years ago.
RELEVANT DEEP LEARNING CONCEPTS
AUTOMATIC ML
Deep networks have many hyperparameters to choose from, for example, the number of layers, kernel sizes, types of optimizers, skip connections, and the like. There are billions of possible combinations of these parameters, and, given high computational time and energy costs, it is hard to find the best-performing network, even from among a few hundred candidates. In the case of deep learning, the objective of automatic ML (AutoML) is to find the most efficient and high-performing deep network for a given data set and task. The first major attempt in this field was by Zoph et al. [24], who used deep RL to find the optimum CNN for image classification.
In the system, an RNN creates CNN architectures and, based on their classification results, proposes changes to them. This process continues to loop until the optimum architecture is found. This algorithm was able to find networks that competed with the state of the art, but it required more than 800 GPUs, which is unrealistic for practical application. Recently, there have been many developments in the AutoML field, and they have made it possible to perform such tasks in more intelligent and efficient ways. More details about the field of network architecture search can be found in [51]. Furthermore, AutoML has already been successfully applied to SAR for PolSAR classification [52]. The method shows great potential for segmentation and classification tasks, in particular.
GEOMETRIC DEEP LEARNING: GRAPH NEURAL NETWORKS
Apart from well-structured image data, there is a large amount of unstructured data in real life, e.g., knowledge graphs and social networks, that cannot be directly processed by a deep CNN. Usually, these data are represented in the form of graphs, where each node indicates an entity and edges delineate mutual relations. As an approach to learning from unstructured data, geometric deep learning has been attracting increasing attention; the most common architecture is the graph neural network (GNN), which has also proved to be successful in dealing with structured data. Specifically, using the terminology of graphs, the nodes of a graph can be regarded as feature descriptions of entities, and the edges are established by measuring the relations and distances between nodes and are encoded in an adjacency matrix. Once a graph is constructed, messages can be propagated among nodes by simply performing matrix multiplication. Accordingly, the authors of [53] proposed graph convolutional networks (GCNs), which are characterized by utilizing graph convolutions; the authors of [45] accelerated the process. Moreover, the
units in recurrent GNNs (RGNNs) [23], [55] have been shown to be effective in learning from graphs. The usefulness of GNNs in SAR is still to be properly explored, and [56] is one of the few attempts to do so.
POSSIBLE PITFALLS
To develop tailored deep learning architectures and prepare suitable training data sets for SAR and InSAR tasks, it is important to understand that SAR data are different from optical remote sensing data, not to mention images downloaded from the Internet. In this section, we discuss the special characteristics (and possible pitfalls) encountered while applying deep learning to SAR.
What makes SAR data and SAR data processing by neural networks unique? SAR data are substantially different from optical imagery in many respects. The following points should be considered when transferring CNN experience and expertise from optical to SAR data:
◗◗ Dynamic range: Depending on the spatial resolution, the dynamic range of SAR images can be up to 90 dB (TerraSAR-X high-resolution spotlight data with a resolution of roughly 1 m). Moreover, the distribution is extremely asymmetric, with the majority of the pixels in the low-amplitude range (distributed scatterers) and a long tail representing bright discrete scatterers, in particular, in urban areas. Standard CNNs are not able to handle such dynamic ranges, and hence most approaches feature dynamic compression as a preprocessing step. In [57], the authors first take only amplitude values from zero to 255 and then subtract the mean values of each image. In [11] and [58], normalization is performed as a preprocessing step, which significantly compresses the dynamic range.
◗◗ Signal statistics: To retrieve features from SAR (amplitude and intensity) images, speckle statistics must be considered. Speckle is a multiplicative, rather than an additive, phenomenon.
This has consequences: while the optimum estimator of the radar brightness of a homogeneous image patch under speckle is a simple moving-average operation (i.e., a convolution, as in the additive noise case), other detectors of edges and low-level features that are optimum under additive Gaussian noise may no longer be optimum in the case of SAR. A popular example is Touzi’s constant false alarm rate edge detector [59] for SAR images, which uses the ratio of two spatial averages across adjacent windows. This operation cannot be emulated by the first layer of a standard CNN. Some studies use a logarithmic mapping of the SAR images prior to feeding them into a CNN [9], [60]. This turns speckle into an additive random variable and, as a side effect, reduces the dynamic range. But, still, a single convolutional layer can emulate only approximations to optimum SAR feature estimators. It could be valuable to supplement the original logarithmic SAR image by a few low-pass-filtered and logarithmized versions as input to a CNN. Another approach is to apply a sophisticated speckle reduction filter before entering a CNN, e.g., nonlocal averaging [61]–[63].
◗◗ Imaging geometry: The SAR image coordinates, range and azimuth, are not arbitrary, such as east and north or x and y, but, rather, reflect the peculiarities of the image generation process. Layover always occurs at the near range of an object, and shadow always results at the far range. That means data augmentation by SAR image rotation would lead to nonsense imagery that would never be generated by a SAR.
◗◗ The complex nature of SAR data: The most valuable information in SAR data lies in the phase. This applies to SAR image formation, which takes place in the complex signal domain, as well as to PolSAR, InSAR, and tomographic SAR data processing, meaning that an entire CNN must be able to handle complex numbers. For the convolution operation, this is trivial.
The nonlinear activation function and the loss function, however, require thorough consideration. Depending on whether the activation function independently acts on the real and imaginary parts of the signal, or only on its magnitude, and where bias is added, the phase will be distorted to different degrees. If we use PolSAR data for land cover and target classification, a nonlinear processing of the phase is even desirable because the phase between different polarimetric channels has physical meaning and hence contributes to the classification process. In SAR interferometry and tomography, however, the absolute phase has no meaning; i.e., the CNN must be invariant to an arbitrary phase offset. Assume some interferometric input signal $x$ to a CNN and the output signal $\mathrm{CNN}(x)$ with phase
$$\hat{\phi} = \angle\,\mathrm{CNN}(x). \tag{1}$$
Any constant phase offset $\phi_0$ does not change the meaning of the interferogram. Thus, we require an invariance that we refer to as phase linearity (which is valid at least in the expectation):
$$\mathrm{CNN}(x e^{j\phi_0}) = \mathrm{CNN}(x)\, e^{j\phi_0}. \tag{2}$$
This linearity is violated, for example, if the activation function is separately applied to real and imaginary parts and if a bias is added to the complex numbers. Another point to consider in regression-type InSAR CNN processing (e.g., for noise reduction) is the loss function. If the quantity of interest is not the complex number itself but its phase, the loss function must be able to handle the cyclic nature of phases. It may also be advantageous that the loss function is independent, at least to a certain degree, of the signal magnitude to relieve a CNN from modeling the magnitude. A loss function that meets these requirements is, for example,
$$\mathcal{L} = E\left[e^{j(\angle\,\mathrm{CNN}(x) - \angle y)}\right], \tag{3}$$
where $y$ is the reference signal. Some authors use the magnitude and phase, rather than real and imaginary parts, as input to a CNN. This approach is not invariant to phase offset, either. The interpretation of a phase function as a real-valued function forces a CNN to disregard the sharp discontinuities at the $\pm\pi$ transitions, whose positions are inconsequential. A standard CNN would pounce on these, interpreting them as edges.
◗◗ Simulation-based training and validation data: The prevailing lack of ground truth for regression-type tasks, such as speckle reduction and InSAR denoising, might tempt us to use simulated SAR data for the training and validation of neural networks. However, this bears the risk that our networks will learn models that are far too simplified. Unlike optical imaging, where highly realistic scenes can be simulated, e.g., by PC games, the simulation of SAR data is more of a scientific topic that lacks the power of commercial companies and a huge market. SAR simulators focus on specific scenarios, e.g., vegetation (only distributed scatterers are considered) and persistent (point) scatterers. The most advanced simulators are probably the ones for computing the radar backscatter signatures of single military objects, such as vessels. To our knowledge, though, there is no simulator available that can, e.g., generate realistic interferometric data of rugged terrain with layover, spatially varying coherence, and diverse scattering mechanisms. Often, simplified scattering assumptions are made, e.g., that speckle is multiplicative. Even this is not true; pure Gaussian scattering can be found only for quite homogeneous surfaces and low-resolution SARs.
As soon as the resolution increases, the chances of having a few dominating scatterers in a resolution cell increase, and the statistics become substantially different from those of fully developed speckle.
RECENT ADVANCES IN DEEP LEARNING APPLIED TO SAR
In this section, we provide an in-depth review of deep learning methods applied to SAR data from six perspectives: terrain surface classification, object detection, parameter inversion, despeckling, InSAR, and SAR–optical data fusion. For each, we state notable developments in chronological order and report their advantages and disadvantages. Finally, each section concludes with a brief summary. It is worth mentioning that the application of deep learning to SAR image formation is not explicitly treated here. For SAR focusing, we have to distinguish between general-purpose focusing and the imaging of objects with a priori known properties, such as sparsity. General-purpose algorithms produce data for applications including land use and land cover classification, glacier monitoring, biomass estimation, and interferometry. These are complex-valued, focused data that retain all the information contained in the raw data. General-purpose focusing has a well-defined system model and requires a sequence of fast Fourier transforms (FFTs) and phasor multiplications, i.e., linear operations, such as matrix–vector multiplications. For decades, optimal algorithms have been developed to perform these operations at the highest possible speeds and with diffraction-limited accuracy. There is no reason that deep neural networks should perform better or faster than this gold standard. If we want to introduce prior knowledge about imaged objects, however, specialized focusing algorithms may be beneficially learned by neural networks. But, even then, it might make sense to focus raw data first through a standard algorithm and apply deep learning for postprocessing. In [64], a CNN is trained to focus sparse military targets.
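Returning briefly to the phase-linearity requirement (2) and the cyclic loss (3) from the list of pitfalls, the following sketch checks both numerically on single complex samples. It is a toy illustration under stated assumptions, not a complex-valued CNN: `split_relu`, `magnitude_relu`, `phase_linearity_error`, and `phasor_mean` are hypothetical helper names, and the bias value is arbitrary.

```python
import cmath
from typing import Callable, Sequence


def split_relu(z: complex) -> complex:
    """ReLU applied independently to the real and imaginary parts."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


def magnitude_relu(z: complex, bias: float = -0.1) -> complex:
    """Threshold the magnitude only (real bias added to |z|); the phase is untouched."""
    mag = abs(z)
    if mag == 0.0:
        return 0j
    return z * (max(mag + bias, 0.0) / mag)


def phase_linearity_error(f: Callable[[complex], complex], x: complex, phi0: float) -> float:
    """|f(x e^{j phi0}) - f(x) e^{j phi0}|, which is zero when (2) holds for f."""
    rot = cmath.exp(1j * phi0)
    return abs(f(x * rot) - f(x) * rot)


def phasor_mean(pred: Sequence[complex], ref: Sequence[complex]) -> complex:
    """Sample estimate of E[exp(j(angle(pred) - angle(ref)))], the expression in (3)."""
    total = 0j
    for p, r in zip(pred, ref):
        total += cmath.exp(1j * (cmath.phase(p) - cmath.phase(r)))
    return total / len(pred)


x, phi0 = 1.0 + 0.5j, 2.0
print(phase_linearity_error(split_relu, x, phi0))      # clearly nonzero
print(phase_linearity_error(magnitude_relu, x, phi0))  # zero up to rounding

# Phases of +3.1 and -3.1 rad differ by 6.2 rad as real numbers, but the
# phasor sees only the wrapped difference of about -0.083 rad.
print(cmath.phase(phasor_mean([cmath.exp(3.1j)], [cmath.exp(-3.1j)])))
```

The split activation clips the rotated sample differently from the unrotated one, so a constant phase offset changes the output, whereas the magnitude-only activation commutes with the rotation. The phasor in `phasor_mean` depends only on phase differences, so it is insensitive both to $2\pi$ wraps and to the signal magnitude.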
Nevertheless, in this approach, the raw data are partially focused by an FFT before entering the CNN.
TERRAIN SURFACE CLASSIFICATION
As an important direction for SAR applications, terrain surface classification using PolSAR images is rapidly advancing with the help of deep learning. Regarding feature extraction, most conventional methods rely on exploring physical scattering properties [65] and texture information [66] in SAR images. However, these features are mainly human designed based on specific problems and characteristics of data sources. Compared to conventional methods, deep learning is superior in terrain surface classification due to its capability of automatically learning discriminative features. Moreover, deep learning approaches, such as CNNs, can effectively extract not only polarimetric characteristics but also spatial patterns of PolSAR images [6]. Some of the most notable deep learning techniques for PolSAR image classification are reviewed in the following.
Xie et al. [67] first applied deep learning to terrain surface classification using PolSAR images. They employed a stacked autoencoder (SAE) to automatically learn deep features from PolSAR data and then fed these features to a Softmax classifier. Remarkable improvements in both the classification accuracy and the visual effect proved that this method could effectively learn a comprehensive feature representation for classification purposes. Instead of simply applying an SAE, Geng et al. [70] proposed a deep convolutional autoencoder (DCAE) for automatically extracting features and performing classification. The first layer of the DCAE is a handcrafted convolutional layer, where filters are predefined, such as gray-level co-occurrence matrices and Gabor filters. The second layer performs a scale transformation, which integrates correlated neighbor pixels to reduce speckle. Following these two layers, a trained SAE, which is similar to [67], is attached for learning more abstract features.
Tested on high-resolution, single-polarization TerraSAR-X images, the method achieved remarkable classification accuracy. Based on a DCAE, for SAR image classification, Geng et al. [68] proposed a framework, called the deep supervised and contractive neural network (DSCNN), which introduced histogram of oriented gradients descriptors. In addition, a supervised penalty was designed to capture relevant information between features and labels, and a contractive restriction, which can enhance the local invariance, was employed in the following trainable autoencoder layers. An example of applying the DSCNN to TerraSAR-X data from a small area in Norway appears in Figure 2. Compared to other algorithms, the DSCNN achieves a highly accurate and noise-free classification map.
FIGURE 2. Classification maps obtained from a TerraSAR-X image of a small area in Norway [68]. (a)–(f) depict the results of classification using (a) an SVM (accuracy = 78.42%), (b) a sparse representation classifier (SRC) (accuracy = 85.61%), (c) a random forest (accuracy = 82.20%) [69], (d) an SAE (accuracy = 87.26%) [67], (e) a DCAE (accuracy = 94.57%) [70], and (f) a contractive autoencoder (accuracy = 88.74%). (g)–(i) show the combination of a DSCNN with (g) an SVM (accuracy = 96.98%), (h) an SRC (accuracy = 92.51%) [71], and (i) a random forest (accuracy = 96.87%). (j) and (k) represent the classification results of (j) a DSCNN (accuracy = 97.09%) and (k) a DSCNN followed by spatial regularization (accuracy = 97.53%), which achieves higher accuracy than the other methods.
In addition to the aforementioned methods, many studies integrate SAE models with conventional classification algorithms for terrain surface classification. Hou et al. [73] proposed an SAE combined with superpixels for PolSAR image classification. Multiple layers of the SAE are trained on a pixel-by-pixel basis. Superpixels are formed based on Pauli-decomposed pseudocolor images. Outputs of the SAE are used as features in the final step of k-nearest-neighbor superpixel clustering. Zhang et al. [74] applied a sparse SAE to PolSAR image classification by taking into account local spatial information. Qin et al. [75] applied adaptive restricted Boltzmann machine boosting to PolSAR image classification. Zhao et al. [76] proposed a discriminant deep belief network for SAR image classification, in which discriminant features are gleaned by combining ensemble learning with a deep belief network in an unsupervised manner. Moreover, taking into account that most current deep learning methods aim at exploiting features from PolSAR image polarization information and spatial information, Gao et al. [72] proposed a dual-branch CNN to learn features from both perspectives for terrain surface classification. This method is built on two feature extraction channels: one to extract polarization features from the six-channel real matrix and the other to extract the spatial features of a Pauli decomposition. Next, the extracted features are combined using two parallel, fully connected layers, and they are finally fed to a Softmax layer for classification. The detailed architecture of this network is illustrated in Figure 3.
Different variations of CNNs have been used for terrain surface classification, as well. In [77], Zhou et al. first extracted a six-channel covariance matrix and then fed it to a trainable CNN for PolSAR image classification. Wang et al. [78] proposed a fully convolutional network (FCN) integrated with sparse and low-rank subspace representations for classifying PolSAR images. Chen et al. [79] improved CNN performance by incorporating expert knowledge of target scattering mechanism interpretation and polarimetric feature mining. In a more recent work [80], He et al. proposed the combination of features learned from nonlinear manifold embedding and applying an FCN to input PolSAR images; the final classification was carried out in an ensemble approach by an SVM. In [81], the authors focused on the computational efficiency of deep learning methods, proposing the use of lightweight 3D CNNs. They showed that a classification accuracy comparable to other CNN methods was achievable while significantly reducing the number of learned parameters and therefore gaining computational efficiency.
Apart from these single-image classification schemes using CNNs, the use of SAR image time series for crop classification has been shown in [40] and [82]. The authors of both papers experimented with using RNN-based architectures to exploit the temporal dependency of multitemporal SAR images to improve classification accuracy. A unique approach for tackling PolSAR classification was recently proposed in [52], where, for the first time, the authors utilized an AutoML technique to find the optimum CNN architecture for each data set. The approach takes into account the complex nature of PolSAR images, is cost effective, and achieves high classification accuracy [52].
Most of the aforementioned methods rely primarily on preprocessing and transforming raw, complex-valued data into features in the real domain and then feeding them into a common CNN, which constrains the possibility of directly learning features from raw information. To tackle this problem, Zhang et al. [83] proposed a novel complex-valued CNN (CV-CNN) specifically designed to process complex values in PolSAR data, i.e., the off-diagonal elements of a coherency or covariance matrix. The CV-CNN not only takes complex numbers as inputs but also employs complex weights and complex operations throughout different layers. A complex-valued backpropagation algorithm was also developed for CV-CNN training. Other notable complex-valued deep learning approaches for classification using PolSAR images can be found in [84]–[86].
Differing from the previously mentioned works, which exploit the complex-valued nature of SAR images in PolSAR image classification, Huang et al. [87] recently proposed a novel deep learning framework called the Deep SAR-Net for land use classification focusing on feature extraction from single-polarimetric complex SAR images. The authors perform a feature fusion based on spatial features learned
from intensity images and time–frequency features extracted from the spectral analysis of complex SAR images. Since the time–frequency features are highly relevant for distinguishing different backscattering mechanisms within SAR images, they gain accuracy in classifying man-made objects compared to the use of typical CNNs, which focus only on spatial information.
Although not completely related to terrain surface classification, it is also worth mentioning that the combination of SAR and PolSAR images with feed-forward neural networks has been extensively used for sea ice classification. This topic is not treated any further in this section, and the interested reader is referred to [88]–[92] for more information.
Similar to the polarimetric signature, InSAR coherence provides information about physical scattering properties. In [35], interferometric volume decorrelation is used as a feature for forest/nonforest mapping together with radar backscatter and the incidence angle. The authors used bistatic TerraSAR-X Add-On for Digital Elevation Measurement (TanDEM-X) data, where temporal decorrelation can be neglected. They compared different architectures and concluded that CNNs outperformed the random forest and that the U-Net [32] proved best for this segmentation task.
To summarize, it is apparent that deep learning-based SAR and PolSAR classification algorithms have advanced considerably in the past few years.
Although, at first, the emphasis was on low-rank representation learning using SAEs [67] and its modifications [70], later research focused on a multitude of issues relevant to SAR imagery, such as accounting for speckle [68], [70], preserving spatial structures [72], and exploiting the complex nature of the data [83]–[85], [87]. It can also be seen that the labeled data scarcity challenge has driven researchers to use semisupervised learning algorithms [86], although weakly supervised methods for semantic annotation, which have been proposed for high-resolution optical data [93], have not been explicitly explored for classification tasks using SAR data. Furthermore, specific metric-learning approaches to enhance class separability [94] can be adopted for SAR imagery to improve overall classification accuracy. Finally, one of ML’s important fields, AutoML, which had not been extensively exploited by the remote sensing community, has found an application in PolSAR image classification [52].
FIGURE 3. The architecture of the dual-branch deep CNN (the Dual-CNN) for PolSAR image classification proposed in [72]. FC: fully connected; RGB: red–green–blue.
OBJECT DETECTION
Although various characteristics distinguish SAR images from optical red–green–blue (RGB) images, the SAR object detection problem is still analogous to optical image classification and segmentation in the sense that feature extraction from raw data is always a crucial first step. Hence, given the success in the optical domain, there is no doubt that deep learning is one of the most promising ways to develop state-of-the-art SAR object detection algorithms.
The majority of the earlier work related to SAR object detection using deep learning consists of taking successful deep learning methods for optical object detection and applying them with minor tweaks to military vehicle detection [the Moving and Stationary Target Acquisition and Recognition (MSTAR) data set] and ship detection with custom data sets. Even small networks are easily able to achieve more than 90% test accuracy for most of these tasks.
The first attempt at military vehicle detection can be found in [7], where Chen et al. used an unsupervised sparse autoencoder to generate convolution kernels from random patches of a given input for a single-layer CNN, which generated features to train a Softmax classifier for categorizing military targets in the MSTAR data set [96]. The experiments in [7] showed great potential for applying CNNs to SAR target recognition. With this discovery, Chen et al.
[97] proposed A-ConvNets, a simple five-layer CNN that was able to achieve state-of-the-art accuracy of approximately 99% on MSTAR. Following this trend, more and more authors applied CNNs to MSTAR [37], [98], [99]. Morgan [37] successfully applied a modestly sized, three-layer CNN to MSTAR, and, building on that work, Wilmanski et al. [100] investigated the effects that initialization and optimizer selection had on the final results. Ding et al. [98] investigated the capabilities of a CNN model combined with domain-specific data augmentation techniques (e.g., pose synthesis and speckle adding) in SAR object detection. Furthermore, Du et al. [99] proposed a displacement- and rotation-insensitive CNN and claimed that data augmentation using training samples is necessary and critical during the preprocessing stage.
On the same data set, instead of treating a CNN as an end-to-end model, Wagner [101] and, similarly, Gao [102] integrated a CNN and an SVM by first using a CNN to extract features and then feeding the features to an SVM for final prediction. Specifically, Gao et al. [103] added class separation information to the cross-entropy cost function as a regularization term, which they showed explicitly facilitated intraclass compactness and interclass separability and improved the quality of the extracted features. More recently, Furukawa [104] proposed VersNet, an encoder–decoder-style segmentation network, to not only identify but also localize multiple objects in an input SAR image. Moreover, Zhang et al. [95] proposed an approach based on multiaspect image sequences as a preprocessing step. They accounted for backscattering signals from different viewing geometries, followed by feature extraction through Gabor filters and dimensionality reduction; they eventually fed the results to a bidirectional LSTM model for the joint recognition of targets. This SAR awareness-trial-repeat framework is presented in Figure 4.
Ship detection is another SAR task. Early studies of applying deep learning models to ship detection [105]–[109] mainly consisted of two stages: first, cropping patches from the whole SAR image and then identifying whether cropped patches belonged to target objects by using a CNN. Because of fixed patch sizes, these methods were not robust enough to accommodate variations in ship geometry, such as size and shape. This problem was overcome by using region-based CNNs [110], [111], with the creative use of skip connections and feature fusion techniques in later literature. For example, Li et al. [112] fused features of the last three convolution layers before feeding them to a region proposal network (RPN). Kang et al. [113] introduced a contextual region-based network that fused features from different levels. Meanwhile, to make the most of features at different resolutions, Jiao et al. [114] densely connected each layer to subsequent ones and fed features from all the layers to a separate RPN to generate proposals; in the end, the best proposal was chosen based on an intersection-over-union score.
In more recent works on SAR object detection, researchers have explored many other ideas to complement current efforts. Dechesne et al. [115] proposed a multitask network that simultaneously learned to detect, classify, and estimate the length of ships. Mullissa et al. [84] showed that CNNs can be trained directly with complex-valued SAR data; Kazemi et al. [117] performed object classification using an RNN-based architecture directly on received SAR signals instead of processed SAR images; and Rostami et al. [118] and Huang et al. [119] explored knowledge transfers and transfer learning from other domains to the SAR arena for object detection. Perhaps one of the more interesting recent works in this application area relates to building detection, by Shahzad et al. [120].
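As a minimal sketch of the intersection-over-union score mentioned above for ranking region proposals, the following computes the score for axis-aligned boxes. The box layout (x_min, y_min, x_max, y_max), the helper names, and the idea of ranking proposals against a single reference box are assumptions made for illustration; the text does not specify how [114] applies the score.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_proposal(proposals: Sequence[Box], reference: Box) -> Box:
    """Hypothetical helper: pick the proposal that overlaps the reference most."""
    return max(proposals, key=lambda p: iou(p, reference))
```

A score of 1 means the boxes coincide and 0 means they do not overlap, which makes the score insensitive to the absolute size of a ship.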
The authors tackle the problem of very-high-resolution (VHR) SAR building detection using an FCN [121] architecture for feature extraction, followed by a conditional random fields RNN [122], which helps give similar weights to neighboring pixels. This architecture produced building segmentation masks with up to 93% accuracy. An example of the detected buildings can be seen in Figure 5, where Figure 5(a) is the amplitude of the input TerraSAR-X image of Berlin and Figure 5(b) is the predicted building mask. Another major contribution made in that paper addresses the lack of training data by introducing an automatic annotation technique, which annotates the SAR tomography data using OpenStreetMap (OSM) data. As an extension of the preceding work, Sun et al. [123] tackled the problem of individual building segmentation in large-scale urban areas. They proposed a conditional geographic information system (GIS)-aware network (CG-Net) that learns multilevel visual features and employs building footprint data to normalize these features for predicting building masks. Thanks to the novel network architecture and the large number of building labels automatically generated from accurate digital elevation model (DEM) and GIS building footprints, this network achieves an F1 score of 75.08% for individual building segmentation. With the predicted building masks, large-scale level-of-detail 1 building models are reconstructed, with a mean height error of 2.39 m.
FIGURE 4. A flowchart of the multiaspect-aware bidirectional approach for SAR automatic target recognition proposed in [95]. Stage labels: original images, multiaspect image sequence and sample construction, feature detection (Gabor filter and TPLBP), dimensionality reduction, multiaspect feature learning (LSTM), and classification (Softmax). Three example Softmax outputs (T72/BMP2/BRDM2): 0.9/0.03/0.07, 0.92/0.02/0.06, and 0.98/0.01/0.01. TPLBP: three-patch local binary pattern.
Overall, deep learning has shown very good performance in existing SAR object detection tasks. There are two main challenges that the algorithm designer needs to keep in mind when tackling any SAR object detection task. The first relates to identifying characteristics of SAR imagery, such as imaging geometry, the size of objects, and speckle noise.
The second and bigger difficulty concerns the lack of good-quality, standardized data sets. As we observed, the most popular data set, MSTAR, is too easy for deep nets, and, for ship detection, the majority of authors create their own data sets, which makes it very hard to judge the quality of the proposed algorithms and even harder to compare different algorithms. An example of a difficult-to-create data set can be found in global building detection. The shape, size, and style of buildings change quite drastically from region to region, and so a good data set for this purpose requires training examples taken from buildings from around the world, a task that requires significant effort to produce high-quality annotations of enough structures that deep nets can learn from them.
PARAMETER INVERSION
Parameter inversion from SAR images is a challenging field in SAR applications. As one important branch, ice concentration estimation is now attracting great attention due to its importance to ice monitoring and climate research [124]. Since there are complex interactions between SAR signals and sea ice [125], empirical algorithms face difficulties with interpreting SAR images for accurate ice concentration estimation. Wang et al. [8] resorted to a CNN for generating ice concentration maps from dual-polarized SAR images. Their method takes image patches of intensity-scaled dual-band SAR images as inputs and directly outputs ice concentrations. In [126] and [127], Wang et al. employed various CNN models to estimate ice concentrations from SAR images during the melt season. Labels were produced by ice experts via visual interpretation. The algorithm was tested on dual-polarization RadarSat-2 data. Since the problem under consideration concerns the regression of a continuous value, the mean square error is selected as the loss function. Experimental results demonstrate that CNNs can offer a more accurate result than comparative operational products.
In a different application, Song et al. [130] used a deep CNN, including five pairs of convolutional and maximum pooling layers followed by two fully connected layers, for inverting rough surface parameters from SAR images. The network training was based on simulated data, due solely to the scarcity of real training material. The method was able to invert the desired parameters with a reasonable accuracy, and the authors showed that training a CNN for parameter inversion purposes could be done quite efficiently. Furthermore, Zhao et al. [131] designed a CV-CNN to directly learn physical scattering signatures from PolSAR images. The authors notably proposed a framework to automatically generate labeled data, which led to a supervised learning algorithm for the aforementioned parameter inversion. The approach is similar to the study presented in [132], where the authors used deep learning for SAR image colorization and for learning a full PolSAR image from single-polarization data. Another interesting application of deep learning in parameter inversion was recently published in [133]. The authors propose a deep neural network architecture containing a CNN and a GAN to automatically learn SAR image simulation parameters from a small number of real SAR images. They later feed the learned parameters to a SAR simulator, such as RaySAR [134], to generate a wide variety of simulated SAR images, which can increase training data production and improve the interpretation of SAR images that have complex backscattering scenarios.
On the whole, deep learning-based parameter estimation for SAR applications has not yet been fully exploited. Unfortunately, most of the remote sensing community’s focus has been devoted to classical problems, which overlap with computer vision tasks, such as classification, object detection, segmentation, and denoising. One reason for this might be that, since parameter estimation usually requires the incorporation of appropriate physical models and tackles the problem at hand as regression rather than classification, domain knowledge is quite essential for applying deep learning for such tasks, especially for SAR images, with their peculiar physical characteristics. One interesting study [87], described in detail in the “Terrain Surface Classification” section, designs discriminative features through the spectral analysis of complex-valued SAR data and is an important work toward including deep learning in parameter inversion studies using SAR data. We hope that, in the future, more studies will be carried out in this direction. DESPECKLING Speckle, which is caused by the coherent interaction among scattered signals from subresolution objects, often makes processing and interpreting SAR images difficult. Therefore, despeckling is a crucial procedure before applying SAR images to various tasks. Conventional methods aim at removing speckle either spatially, where local spatial filters, such as Lee [135], Kuan [136], and Frost filters [137], are employed, or by using wavelet-based methods [138]–[140]. For a full overview of these techniques, the reader is referred to [141]. During the past decade, patch-based methods for speckle reduction have gained popularity due to their ability to preserve spatial features while not sacrificing image resolution [142]. Deledalle et al. 
[143] proposed one of the first nonlocal patch-based methods applied to speckle reduction by taking into account the statistical properties of speckle, combined with the original nonlocal image denoising algorithm introduced in [144]. A vast number of variations of the nonlocal method for SAR despeckling have been proposed, with the most notable ones included in [145] and [146].
FIGURE 5. (a) A VHR TerraSAR-X image of Berlin and (b) the predicted building mask [120].
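To make the conventional spatial filters mentioned above more concrete, the following is a minimal sketch of a Lee-type local statistics filter in a commonly used simplified form. It is not taken from [135]; the window size, the number of looks, and the function name are assumptions, and the weight 1 - Cu²/Ci² (speckle versus local coefficient of variation) is only one of several variants in use.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter(intensity: np.ndarray, window: int = 7, looks: float = 1.0) -> np.ndarray:
    """Simplified Lee-type filter for a 2D SAR intensity image (odd window size)."""
    image = intensity.astype(np.float64)
    pad = window // 2
    padded = np.pad(image, pad, mode="reflect")
    # One (window x window) neighborhood per output pixel.
    neighborhoods = sliding_window_view(padded, (window, window))
    local_mean = neighborhoods.mean(axis=(-2, -1))
    local_var = neighborhoods.var(axis=(-2, -1))
    # Squared coefficient of variation of speckle, assuming 1/looks for intensity data.
    cu2 = 1.0 / looks
    # Squared local coefficient of variation of the observed image.
    ci2 = local_var / np.maximum(local_mean ** 2, 1e-12)
    # Weight near 0 in homogeneous areas (output is the local mean),
    # weight near 1 where local variation exceeds what speckle explains.
    weight = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-12), 0.0, 1.0)
    return local_mean + weight * (image - local_mean)
```

The dependence on `window` and `looks` illustrates the manual parameter selection that motivates the learning-based methods discussed next.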
However, on the one hand, the manual selection of appropriate parameters for conventional algorithms is not easy and is sensitive to reference images. On the other hand, it is difficult to achieve a balance between preserving distinct image features and removing artifacts through empirical despeckling methods. To overcome these limitations, methods based on deep learning have been developed. Inspired by the success of image denoising using a residual learning network architecture in the computer vision community [147], Chierchia et al. [60] first introduced a residual learning CNN for SAR image despeckling by presenting a 17-layer CNN for learning to subtract speckle components from noisy images. Considering that speckle noise is assumed to be multiplicative, a homomorphic approach with coupled logarithmic and exponential transformations is performed before and after feeding images to the network. In this case, multiplicative speckle noise is transformed into an additive form and can be recovered by residual learning, where logarithmic speckle noise is regarded as residual. As shown in Figure 6, an input logarithmic noisy image is identically mapped to a fusion layer via a shortcut connection and then added elementwise with the learned residual image to produce a logarithmic clean image. Afterward, denoised images can be obtained by an exponential transformation. Wang et al. [9] proposed a CNN, called the image despeckling CNN (ID-CNN), that can directly learn denoised images via a componentwise division-residual layer with skip connections. In other words, homomorphic processing is not introduced for transforming multiplicative noise into additive noise, and, at a final stage, the noisy image is divided by the learned noise to yield a clean image. As a step forward with respect to the two aforementioned residual-based learning methods, Zhang et al.
[148] employed a dilated residual network (DRN), SAR–DRN, instead of simply stacking convolutional layers. Unlike [60] and similar to [9], SAR–DRN is trained in an end-to-end fashion using a combination of dilated convolutions and skip connections with a residual learning structure, which indicates that prior knowledge, such as a noise description model, is not required in the workflow. In [149], Yue et al. proposed a novel deep neural network architecture specifically designed for SAR despeckling. It used a CNN to extract image features and reconstruct a discrete radar cross section (RCS) probability density function (PDF). It was trained by a hybrid loss function that measured the distance between the actual SAR image intensity PDF and the estimated one derived from convolution between the reconstructed RCS PDF and a prior speckle PDF. Experimental results demonstrated that the proposed despeckling neural network could achieve performance comparable to nonlearning state-of-the-art methods. The unique distribution of SAR intensity images was also taken into account in [150]. The authors proposed a different loss function, which contained three terms between the true and reconstructed images: the common L2 loss, the L2 difference between the gradient of the two images, and the Kullback–Leibler divergence between the distribution of the two images. The three terms are designed to emphasize spatial details, the identification of strong scatterers, and speckle statistics, respectively. Experiments in [150] show improved performance compared to the SAR–block-matching 3D algorithm (BM3D) [128] and SAR–DRN [148]. In [57], the problem of despeckling was tackled using a time series of images. Employing a stack of images for despeckling is not unique to deep learning-based methods, as recently demonstrated in [151]. In [57], the authors utilized a multilayer perceptron with several hidden layers to learn nonlinear intensity characteristics of training image patches. 
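The two residual formulations and the three-term loss described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [60], [9], or [150]: the networks are replaced by caller-supplied functions (`predict_log_speckle` and `predict_speckle` are hypothetical names), the loss weights and the histogram-based estimate of the Kullback–Leibler term are choices made here, and a histogram is not differentiable, so this form suits evaluation rather than training.

```python
import numpy as np

EPS = 1e-6  # guards against log(0) and division by zero


def homomorphic_residual_despeckle(noisy, predict_log_speckle):
    """Log transform, subtract the predicted log-speckle residual, exponentiate."""
    log_noisy = np.log(noisy + EPS)
    log_clean = log_noisy - predict_log_speckle(log_noisy)
    return np.exp(log_clean)


def division_residual_despeckle(noisy, predict_speckle):
    """Divide the noisy image by the predicted speckle, with no log transform."""
    return noisy / np.maximum(predict_speckle(noisy), EPS)


def three_term_loss(pred, target, w_grad=1.0, w_kl=1.0, bins=32):
    """L2 term + L2 term on image gradients + KL divergence between histograms."""
    l2 = np.mean((pred - target) ** 2)
    gy_p, gx_p = np.gradient(pred)
    gy_t, gx_t = np.gradient(target)
    grad = np.mean((gx_p - gx_t) ** 2 + (gy_p - gy_t) ** 2)
    lo = min(pred.min(), target.min())
    hi = max(pred.max(), target.max())
    if hi <= lo:
        hi = lo + 1.0
    p, _ = np.histogram(target, bins=bins, range=(lo, hi))
    q, _ = np.histogram(pred, bins=bins, range=(lo, hi))
    p = (p + EPS) / np.sum(p + EPS)
    q = (q + EPS) / np.sum(q + EPS)
    kl = np.sum(p * np.log(p / q))
    return float(l2 + w_grad * grad + w_kl * kl)
```

With an oracle in place of the network (a function that returns the true speckle), both formulations return the clean image, which is a quick way to check that the transformations before and after the network are consistent.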
This approach showed promising results and performance comparable to that of state-of-the-art despeckling algorithms. Again using single images instead of time series, in [36], the authors proposed a deep encoder–decoder CNN architecture with a focus on feature preservation, which is a weakness of CNNs. They modified the U-Net [32] to accommodate speckle statistical features. Another notable CNN approach was introduced in [129], where the authors employed a nonlocal structure, while the weights for pixelwise similarity measures were assigned using a CNN. The results of this approach, called CNN–nonlocal means (NLM), are reported in Figure 7, where the superiority of the method with respect to both feature preservation and speckle reduction is clearly observed. One of the drawbacks of the aforementioned algorithms is the requirement of noise-free and noisy image pairs for training. Often, those training data are simulated using optical images with multiplicative noise. This is, of course, not ideal for real SAR images. Therefore, one elegant solution is the noise-to-noise framework [152], where the network requires only two noisy images of the same area. The authors of [152] prove that the network is able to learn a clean representation of the image, given that the noise distributions of the two noisy images are independent and identical. This idea was employed in SAR despeckling in [153]. The authors made use of multitemporal SAR images of the same area as the input to the noise-to-noise network. To mitigate the effect of the temporal change between the input SAR image pairs, they multiplied the original loss function by a patch similarity term.
FIGURE 6. The CNN architecture for SAR image despeckling [60]. The noisy image passes through a logarithm; a stack of convolutional layers (convolution + ReLU, repeated convolution + BN + ReLU, and a final convolution) estimates the residual image, which is subtracted from the logarithmic noisy image; and an exponent yields the CNN-filtered image. BN: batch normalization.
FIGURE 7. A comparison of speckle reduction among SAR–BM3D [128], SAR–CNN [60], and CNN–NLM applied to a small strip of Constellation of Small Satellites for Mediterranean Basin Observation–SkyMed data above Caserta, Italy, where the reference clean image has been obtained by temporal multilooking applied to a stack of SAR images [129]. (a) The clean image. (b) The noisy image. (c) SAR–BM3D is applied. (d) SAR–CNN is applied. (e) CNN–NLM is applied.
From the deep learning-based despeckling methods reviewed in this section, it can be observed that most methods employ CNN-based architectures with single images of a scene for training; they either output clean images in an end-to-end fashion or propose residual-based techniques to learn underlying noise models. With the availability of large archives of time series thanks to the Sentinel-1 mission, an interesting direction is to exploit the temporal correlation of speckle characteristics for despeckling applications. One critical issue is oversmoothing in despeckling, and it needs to be addressed. Many of the CNN-based methods perform well in terms of speckle removal but are not able to preserve sharp edges. This is quite problematic in despeckling high-resolution SAR images of urban areas, in particular. Another problem in supervised deep learning-based despeckling techniques concerns the lack of ground truth data. In many studies, the training data set is built by corrupting optical images through multiplicative noise. This is far from realistic for despeckling applied to real SAR data. Therefore, despeckling in an unsupervised manner would be highly desirable and worth attention.
InSAR
InSAR is one of the most important SAR techniques, and it is widely used in reconstructing the topography of the Earth’s surface, i.e., DEM generation [65], [154], [155], and detecting topographical displacements, e.g., monitoring volcanic eruptions [156]–[158], earthquakes [159], [160], land subsidence [161], and urban areas by using time-series methods [162]–[164]. The principle of InSAR is to first measure the interferometric phase between signals received by two antennas located at different positions and then extract topographic information from the obtained interferogram by unwrapping and converting the absolute phase to height. However, an actual interferogram often suffers from a large number of singular points, which originate from the interference distortion and noise in radar measurements. These points result in unwrapping errors and, consequently, low-quality DEMs. To tackle this problem, Ichikawa and Hirose [165] applied a complex-valued neural network (CV-NN) in the spectral domain to restore singular points. With the help of the complex Markov random field filter [166], they aimed at learning ideal relationships between the spectrum of neighboring pixels and that of the center pixels via a one-hidden-layer CV-NN. Notably, the center pixels of each training sample are supposed to be ideal points, which indicates that singular points are not fed to the network during the training procedure. Similarly, Oyama and Hirose [167] restored singular points with a CV-NN in the spectrum domain. Related to topography extraction, Costante et al. [169] proposed a full CNN encoder–decoder architecture for estimating DEMs from single-pass image acquisitions.
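The InSAR principle stated above (measure the interferometric phase, unwrap it, and convert it to height) can be illustrated with a short, noise-free sketch. The function names and the `height_of_ambiguity` parameter (the height change that produces one full 2π phase cycle) are assumptions introduced here, and the 1D `numpy.unwrap` call only stands in for real 2D phase unwrapping, which must cope with the singular points discussed in the text.

```python
import numpy as np


def interferometric_phase(master: np.ndarray, slave: np.ndarray) -> np.ndarray:
    """Wrapped phase, in (-pi, pi], of the interferogram master * conj(slave)."""
    return np.angle(master * np.conj(slave))


def phase_to_height(unwrapped_phase: np.ndarray, height_of_ambiguity: float) -> np.ndarray:
    """Scale unwrapped phase to height; one 2*pi cycle equals the height of ambiguity."""
    return unwrapped_phase * height_of_ambiguity / (2.0 * np.pi)


# Usage on a synthetic, noise-free phase ramp along one image row.
true_phase = np.linspace(0.0, 6.0 * np.pi, 200)
master = np.exp(1j * true_phase)
slave = np.ones_like(master)
wrapped = interferometric_phase(master, slave)
unwrapped = np.unwrap(wrapped)  # valid here because neighboring samples differ by less than pi
heights = phase_to_height(unwrapped, height_of_ambiguity=50.0)
```

In this ideal case, the unwrapped phase matches the true ramp; noise and singular points break the assumption that neighboring samples differ by less than π, which is where the restoration methods above come in.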
They demonstrated that this model was capable of extracting high-level features from input radar images using an encoder section and then reconstructing full-resolution DEMs via a decoder section. Moreover, the network can potentially solve the layover phenomenon in one single-look SAR image that has contextual features. In addition to reconstructing DEMs, Schwegmann et al. [170] presented a CNN-based technique to detect subsidence deformations from interferograms. They employed a nine-layer network to extract salient information from interferograms and displacement maps for discriminating deformation targets from deformation-like targets. Furthermore, Anantrasirichai et al. [10], [171], [172] used a pretrained CNN to automatically detect volcanic ground deformation through InSAR images. They divided each image into patches and relabeled it with binary labels, i.e., “background” and “volcano,” and finally fed it to the network to predict volcano deformation. In [173], they further improved their method to be able to detect slow-moving volcanoes using a time series of interferograms. In another study related to automatic volcanic deformation detection, Valade et al. [168] designed and trained a CNN from scratch to learn a decorrelation mask from input wrapped interferograms; the CNN then was used to detect volcanic ground deformation. A flowchart of this approach appears in Figure 8. The training in both [168] and [173] was based on simulated data. Another geophysics-motivated example of using deep learning on InSAR data, which was actually proposed earlier than the previously mentioned CNN-based studies, can be found in [174]–[176], where the authors used simple feed-forward shallow neural networks for seismic event characterization and automatic seismic source parameter inversion by exploiting the power of neural networks in solving nonlinear problems. Recently, deep learning has been utilized for tomographic processing, as well.
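The score computation in the Figure 8 workflow of [168] (deformation map Wm = W · λ/4π, deformation score DEF = std-dev(Wm), and an alert if DEF > 0.001) can be written out as a small sketch. The formulas and the threshold are those shown in the figure; the default wavelength (roughly C-band) and the function names are assumptions made here, since the text does not give the wavelength or the units of the threshold.

```python
import numpy as np

ASSUMED_WAVELENGTH_M = 0.0555  # assumption: roughly C-band; not stated in the text
ALERT_THRESHOLD = 0.001        # threshold shown in Figure 8; units not stated there


def deformation_map(unwrapped_phase: np.ndarray,
                    wavelength_m: float = ASSUMED_WAVELENGTH_M) -> np.ndarray:
    """Wm = W * lambda / (4 * pi)."""
    return unwrapped_phase * wavelength_m / (4.0 * np.pi)


def deformation_score(unwrapped_phase: np.ndarray,
                      wavelength_m: float = ASSUMED_WAVELENGTH_M) -> float:
    """DEF = standard deviation of the deformation map."""
    return float(np.std(deformation_map(unwrapped_phase, wavelength_m)))


def should_alert(score: float, threshold: float = ALERT_THRESHOLD) -> bool:
    """True when the score exceeds the alert threshold."""
    return score > threshold
```

Because the score is a standard deviation, a spatially uniform phase offset gives a score of zero, whereas a localized or ramp-like signal raises it.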
An unfolded deep network that involves vector-approximate message-passing algorithms was proposed in [177]. Experiments with simulated and real data were performed, showing the spectral estimation gains and achieving competitive performance. In [178], a real-valued deep neural network was applied for multiple-input, multiple-output SAR 3D imaging. It displayed a better superresolution power compared with other compressive sensing-based methods.
In summary, it can be concluded that the use of deep learning methods in InSAR is still at a very early stage. Although deep learning has been incorporated in different applications combined with InSAR, the potential of interferograms has not been fully exploited, except in the pioneering work of Hirose [179]. Many applications treat interferograms and deformation maps obtained from interferograms as images similar to RGB and gray-scale ones, and therefore the complex nature of interferograms has remained unnoticed. Apart from this issue, as with deep learning-based SAR despeckling, the lack of ground truth data for detection and image restoration problems provides motivation to focus on developing semisupervised and unsupervised algorithms that combine deep learning and InSAR. Alternatively, a training database consisting of interferograms for different scenarios and different phase contributions could be beneficial for supervised learning applications. Simulation-based interferogram generation for the latter was recently proposed in [180].
SAR–OPTICAL DATA FUSION
The fusion of SAR and optical images can provide complementary information about targets. However, considering the two different sensing modalities, the prior identification and the coregistration of corresponding images are challenging [181] but compulsory.
FIGURE 8. The workflow of the volcano deformation (DEF) detection proposed in [168]. The CNN is trained on simulated data and later used to perceive phase gradients and a decorrelation mask from the input wrapped interferograms to locate ground deformation caused by volcanoes. (a) The CNN training, with synthetic wrapped interferograms as the input and synthetic phase gradients (x and y) and a synthetic decorrelation mask as the desired outputs. (b) The phase gradient detection, in which the trained CNN estimates phase gradients (x and y) and a decorrelation mask from real wrapped interferograms. (c) The phase unwrapping and score computation: the estimated unwrapped phase (W) gives the deformation map Wm = W · λ/4π and the deformation score DEF = std-dev(Wm). (d) The dissemination: time series and deformation maps on a public website and an email alert to a private list if DEF > 0.001.
For the purpose of identifying and matching SAR and optical images, many current methods resort to deep learning, given its powerful capabilities of extracting effective features from complex images. In [58], the authors proposed a CNN for identifying corresponding image patches of VHR optical and SAR imagery of complex urban scenes. Their network consists of two streams: one designed for extracting features from optical images and one responsible for learning features from SAR images. Next, the extracted features are fused via a concatenation layer for further binary prediction of their correspondence. A selection of true positives, false positives, false negatives, and true negatives of SAR–optical image patches from [58] is presented in Figure 9. Similarly,
Hughes et al. [11] proposed a pseudo-Siamese CNN for learning a multisensor correspondence predictor for SAR and optical image patches. Notably, the networks in [11] and [58] are trained and validated on the SARptical data set [182], [183], which is specifically built for the joint analysis of VHR SAR and optical images in dense urban areas. In [184], the authors proposed a deep learning framework that can obtain an end-to-end mapping between image patch pairs and their matching labels. An image pair is first transformed into two 1D vectors and then concatenated to build a large 1D vector as the network input. Then, hidden layers are stacked for learning the mapping between input vectors and output binary labels, which indicate their correspondence. For the purpose of matching SAR and optical images, Merkle et al. [185] presented a CNN that consists of a feature extraction stage (a Siamese network) and a similarity measure stage (a dot product layer). Specifically, features of input optical and SAR images are extracted via two separate nine-layer branches and then fed to a dot product layer for predicting the shift of the optical image within the large SAR reference patch. Experimental results indicate that this deep learning-based method outperforms state-of-the-art matching approaches [186], [187]. Furthermore, Abulkhanov et al. [188] successfully trained a neural network to build feature point descriptors to identify corresponding patches among SAR and optical images and match the detected descriptors using the random sample consensus algorithm [189]. In contrast to training a model to identify corresponding image patches, Merkle et al. [190] first employed a conditional GAN (cGAN) to generate artificial SAR-like images from optical images and then matched them with real SAR images. The authors demonstrated that the matching accuracy and precision improved through the proposed strategy. 
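The similarity measure stage described above for [185], in which features of a small optical patch are compared with those of a larger SAR reference patch through a dot product to predict the shift, can be sketched as follows. The learned feature extraction branches are omitted: the 2D arrays passed in stand for feature maps, and the zero-mean normalization, the function name, and the argmax readout are assumptions made here rather than details taken from [185].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dot_product_shift(template_feat: np.ndarray, reference_feat: np.ndarray):
    """Slide the template over the reference and score each offset by a
    normalized dot product; return the best (row, col) offset and the score map."""
    th, tw = template_feat.shape
    # All candidate windows: shape (H - th + 1, W - tw + 1, th, tw).
    windows = sliding_window_view(reference_feat, (th, tw))
    centered = windows - windows.mean(axis=(-2, -1), keepdims=True)
    window_norm = np.sqrt((centered ** 2).sum(axis=(-2, -1))) + 1e-12
    template = template_feat - template_feat.mean()
    template = template / (np.linalg.norm(template) + 1e-12)
    # Dot product between the template and every candidate window.
    scores = np.einsum("ijkl,kl->ij", centered, template) / window_norm
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(row), int(col)), scores
```

The score map has one entry per candidate offset, so its peak plays the role of the predicted shift of the small patch within the reference patch.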
Inspired by that study, more researchers resorted to using GANs for the purpose of SAR–optical image matching (see [191] and [192] for a review). With respect to applications of SAR and optical image matching, Yao et al. [193] aimed at applying SAR and optical images to semantic segmentation with deep neural networks. They collected optical patches from Google Earth that corresponded to TerraSAR-X patches and built ground truths using data from OSM. Then, SAR and optical images were separately fed to different CNNs to predict semantic labels (buildings, natural areas, land use, and water). Despite the fact that their experimental results did not outperform the state of the art [194], likely because of the network design or the training strategy, they deduced that introducing advanced models and simultaneously using both data sources can greatly improve the performance of semantic segmentation. Another application, mentioned in [195], demonstrated that standard fusion techniques for SAR and optical images require data from both sources, which indicates that it is still not easy to interpret SAR images without the support of optical ones. To address this issue, Schmitt et al. [195] proposed an automatic colorization network composed of a VAE and a mixture density network [196] to predict artificially colored SAR images (i.e., Sentinel-1 images). These images proved to disclose more information to human interpreters than the original SAR data did. In [42], the authors tackled the problem of cloud removal from optical imagery. They introduced a cGAN architecture to fuse SAR and cloud-corrupted multispectral data for generating cloud- and haze-free multispectral optical information. Experiments proved the effectiveness of the proposed network for removing clouds from multispectral data with auxiliary SAR data.
Extending previous multimodal networks for cloud removal, the authors of [43] proposed a cycle-consistent GAN architecture [197] that utilizes an image forward–backward translation consistency loss. Cloudcovered optical information is reconstructed via SAR data fusion, while changes to cloud-free areas are minimized through the cycle consistency loss. The cycle-consistent architecture facilitates training without pixelwise correspondences between cloudy input and cloud-free target optical imagery, relaxing requirements for the training data set. In summary, it can be seen that the utilization of deep learning methods for SAR–optical data fusion has been a hot topic in the remote sensing community. Although a handful of data sets consisting of optical and SAR corresponding image patches is available for different terrain types and applications, one of the biggest problems remains the scarcity of high-quality training data. Semisupervised methods, as proposed in [198], seem to be a viable option (c) (d) FIGURE 9. Randomly selected patches obtained from the testing phase of the network for SAR–optical image patch correspondence detec- tion proposed in [11]. (a) True positives. (b) False positives. (c) False negatives. (d) True negatives. 158 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
to tackle the issue. A great challenge in SAR–optical image matching concerns the extreme difference between the two sensors’ viewing geometries. For this, it is important to exploit auxiliary 3D data to assist the training data generation.

EXISTING BENCHMARK DATA SETS AND THEIR LIMITATIONS
To train and evaluate deep learning models, large data sets are indispensable. Unlike RGB images in the computer vision community, which can be easily collected and interpreted, SAR images are much more difficult to annotate due to their complex properties. Our research shows that big SAR data sets created for the primary purpose of deep learning investigations are nearly nonexistent in the community. In recent years, only a few SAR data sets have been made public for training and assessing deep learning models. In the following, we categorize those data sets according to their best-suited deep learning problem and focus on openly accessible and well-curated large data sets (see Table 1 for summaries of the open SAR data sets). In particular, we consider the following categories of deep learning problems in SAR:
◗◗ Image classification: Each pixel or patch in one image is classified into a single label. This is often the case in typical land use/land cover classification problems.
◗◗ Scene classification: Similar to image classification, one image or patch is classified into a single label. However, one scene is usually much larger than an image patch. Hence, a different network architecture is required.
◗◗ Semantic segmentation: One image or patch is segmented into a classification map of the same dimension. Training such neural networks also requires densely annotated data.
◗◗ Object detection: This is much like scene classification. However, detection often requires the estimation of the object location.
◗◗ Registration/matching: This provides binary classification (matched and unmatched) and estimates the translation between two image patches. Such tasks require that pairs of two different image patches be matched as training data.

TABLE 1. AVAILABLE OPEN SAR DATA SETS.
| NAME | DESCRIPTION | SUITABLE TASKS | RELATED WORK |
| --- | --- | --- | --- |
| So2Sat LCZ42¹ [200], TensorFlow² | This data set contains 400,673 pairs of corresponding Sentinel-1 dual-polarity image patches, Sentinel-2 multispectral image patches, and manually labeled LCZ classes across 42 urban agglomerations (plus 10 additional smaller areas) around the globe. It is the first Earth observation data set that provides a quantitative measure of the label uncertainty, achieved by having a group of domain experts cast 10 independent votes for 19 cities in the data set. | Image classification; data fusion; quantification of uncertainties | [201] |
| OpenSARUrban³ [199] | This data set includes 33,358 Sentinel-1 dual-polarity image patches covering 21 major cities in China, labeled with 10 classes of urban scenes. | Image classification | |
| SEN12MS⁴ [202] | In this data set, there are 180,662 corresponding image triplets containing Sentinel-1 dual-polarity SAR data, Sentinel-2 multispectral imagery, and MODIS-derived land cover maps, covering all inhabited continents during all meteorological seasons. | Image classification; semantic segmentation; data fusion | |
| MSAW⁵ [204] | This data set contains quad-polarity X-band SAR imagery from Capella Space, with a 0.5-m spatial resolution, which covers 120 km² in the area of Rotterdam. A total of 48,000 unique building footprints are labeled with associated height information curated from the 3D Basis Registratie Adressen en Gebouwen data set. | Semantic segmentation | |
| PolSAR Image Data Set on San Francisco⁶, Label⁷ [205] | The data set includes PolSAR images of San Francisco from five different sensors. Each image was densely labeled to five or six classes, such as mountain, water, high-density urban, low-density urban, vegetation, developed, and bare soil. | Image classification; semantic segmentation; data fusion | [206] |
| MSTAR⁸ [207] | This data set contains 17,658 X-band VHR SAR image chips (patches) of 10 classes of different vehicles plus one class of a simple geometric-shaped target. SAR images of pure clutter are also included. | Object detection; scene classification | [97], [98], [208] |
| OpenSARShip 2.0⁹ [209] | This data set includes 34,528 Sentinel-1 SAR image chips of ships, with ship geometric information, types, and corresponding AIS information. | Object detection; scene classification | [210] |
| SAR-Ship data set¹⁰ [211] | Here, there are 43,819 Gaofen-3 and Sentinel-1 image chips of different ships. Each image chip has a dimension of 256 × 256 pixels in range and azimuth. | Object detection; scene classification | |
| SARptical¹¹ [212] | The SARptical data set includes 10,108 coregistered pairs of TerraSAR-X VHR spotlight image patches and UltraCam aerial RGB image patches for Berlin. The coregistration is defined by the matching of the 3D position of the center of the image pair. | Image matching | [11], [183] |
| SEN1-2¹² [203] | This data set contains 282,384 pairs of corresponding Sentinel-1 single-polarization-intensity and Sentinel-2 RGB image patches collected across the globe. The patches are 256 × 256 pixels. | Image matching; data fusion | [202], [203] |

¹ https://doi.org/10.14459/2018mp1483140.
² https://www.tensorflow.org/datasets/catalog/so2sat.
³ https://doi.org/10.21227/3sz0-dp26.
⁴ https://mediatum.ub.tum.de/1474000.
⁵ https://spacenet.ai/sn6-challenge/.
⁶ https://www.ietr.fr/polsarpro-bio/san-francisco/.
⁷ https://github.com/liuxuvip/PolSF.
⁸ https://www.sdms.afrl.af.mil/index.php?collection=mstar.
⁹ http://opensar.sjtu.edu.cn/Data/Search?key=OpenSARShip.
¹⁰ https://github.com/CAESAR-Radi/SAR-Ship-Dataset.
¹¹ https://syncandshare.lrz.de/getlink/figixjRV9idETzPgG689dGB/SARptical_data.zip.
¹² https://mediatum.ub.tum.de/1436631.

IMAGE/SCENE CLASSIFICATION
So2Sat LCZ42
So2Sat LCZ42 [200] follows the local climate zones (LCZs) classification scheme. The data set consists of 400,673 pairs of dual-polarity Sentinel-1 and multispectral Sentinel-2 image patches from 42 urban agglomerations, plus 10 additional smaller areas, across five continents. The image patches are hand labeled into one of the 17 LCZ classes [213]. The Sentinel-1 image patches contain a geocoded, single-look complex image as well as a despeckled Lee-filtered variant. In particular, it is the first Earth observation data set that provides a quantitative measure of the label uncertainty, achieved by letting a group of domain experts cast 10 independent votes covering 19 cities. It therefore can be considered a large-scale data fusion and classification benchmark data set for cutting-edge ML methodological developments, such as automatic topology learning, data fusion, and the quantification of uncertainties.

OpenSARUrban
OpenSARUrban [199] consists of 33,358 patches of Sentinel-1 dual-polarity images covering 21 major cities in China. The data set was manually annotated according to a hierarchical classification scheme, with 10 classes of urban scenes at its finest level.
Each image patch has a dimension of 100 × 100 pixels, with a pixel spacing of 10 m [the Sentinel-1 ground-range-detected (GRD) product]. This data set can support deep learning studies of urban target characterization and content-based SAR image queries. Figure 10 shows samples.

FIGURE 10. Samples of the OpenSARUrban data set [199]. Six classes are shown from top to bottom: dense and low-rise residential buildings, a general residential area, high-rise buildings, villas, an industrial storage area, and vegetation.

SEMANTIC SEGMENTATION/CLASSIFICATION
SEN12MS
SEN12MS [202] was created based on its previous version, SEN1-2 [203]. It consists of 180,662 triplets of dual-polarity Sentinel-1 image patches, multispectral Sentinel-2 image patches, and Moderate Resolution Imaging Spectroradiometer (MODIS) land cover maps. The patches are georeferenced, with a ground sampling distance of 10 m. Each image patch has a dimension of 256 × 256 pixels. We expect this data set to support the community in developing sophisticated deep learning-based approaches for common tasks, such as scene classification and semantic segmentation for land cover mapping.

MULTISENSOR ALL-WEATHER MAPPING
The Multisensor All-Weather Mapping (MSAW) [204] data set includes high-resolution SAR data, which covers 120 km² in the area of Rotterdam, The Netherlands. The quad-polarized X-band SAR imagery from Capella Space, with a 0.5-m spatial resolution, was used for the SpaceNet 6 Challenge. In total, 48,000 unique building footprints have been labeled with additional building heights.

PolSAR IMAGE DATA SET ON SAN FRANCISCO
This data set [205] consists of PolSAR images of San Francisco from eight different sensors, including Airborne SAR, Advanced Land Observing Satellite (ALOS)-1, ALOS-2, RadarSat-2, Sentinel-1A, Sentinel-1B, Gaofen-3, and Radar Imaging Satellite (data compiled by E. Pottier of the Institute of Electronics and Telecommunications of Rennes). Five of the eight images were densely labeled with five or six land use/land cover classes in [205]. These densely annotated images correspond to roughly 3,000 training patches of 128 × 128 pixels. Although the data volume is relatively low for deep learning research, this is the only annotated multisensor PolSAR data set, to the best of our knowledge. Therefore, we suggest that its creator increase the number of annotated images to enable its greater potential use.

OBJECT DETECTION
MSTAR
MSTAR [207] is one of the earliest data sets for SAR target recognition. It consists of 17,658 X-band SAR image chips (patches) of 10 classes of vehicles plus one class of simple geometric-shaped targets. The collected SAR image patches are 128 × 128 pixels, with a resolution of 1 ft in the range and azimuth. In addition, 100 SAR images of clutter are provided. In our opinion, the number of image patches is relatively low for deep learning models, especially considering the number of classes. In addition, this data set represents a rather ideal and unrealistic scenario: vehicles are centered in the patch, and the clutter is quite homogeneous, without disturbing signals. However, considering the scarcity of such data sets, MSTAR is a valuable source for target recognition.

OpenSARShip 2.0
This data set [209] is based on its previous version, OpenSARShip [210]. It contains 34,528 Sentinel-1 SAR image patches of different ships, with automatic identification system (AIS) information. For each SAR image patch, the creators manually extracted the ship length, width, and direction as well as the vessel type by verifying the data on the Marine Traffic website [209]. Roughly one-third of the patches are extracted from Sentinel-1 GRD products, and the other two-thirds are from Sentinel-1 single-look complex products. OpenSARShip 2.0 is one of the handful of SAR data sets suitable for object detection.

SAR-SHIP-DATA SET
This data set [211] was created using 102 Gaofen-3 and 108 Sentinel-1 images. It consists of 43,819 ship chips of 256 pixels in both the range and azimuth. The ships mainly have distinct scales and backgrounds. Therefore, this data set can be employed for developing multiscale object detection models.

FUSAR–SHIP
The FUSAR–Ship data set [214] was created using space–time matched-up data sets of Gaofen-3 SAR images and ship AIS messages. It consists of more than 5,000 ship chips with corresponding vessel information extracted from AIS messages, which can be used to trace back to each unique ship of any particular chip.

AIR–SARShip 1.0/2.0
The AIR–SARShip data set [215] has 31 (version 1.0) and 300 (version 2.0) SAR images from the Gaofen-3 satellite, including 1- and 3-m-resolution imagery with different imaging modes, such as spotlight and stripmap. There are more than 10 object categories, including ships, tankers, fishing boats, and others. The scene types include ports, islands, reefs, and sea surfaces of different levels.

REGISTRATION/MATCHING
SARptical
The SARptical data set [183], [212] was designed for interpreting VHR spaceborne SAR images of dense urban areas. It consists of 10,108 pairs of corresponding VHR SAR and optical image patches whose locations are precisely coregistered in 3D. The patches are extracted from TerraSAR-X VHR spotlight images with a resolution better than 1 m and from UltraCam aerial optical images with a 20-cm pixel spacing, respectively. Unlike low- and medium-resolution images, high-resolution SAR and optical images in dense urban areas have very distinct geometries. Therefore, in the SARptical data set, the center points of each image pair are matched in 3D space via sophisticated 3D reconstruction and matching algorithms. The universal transverse Mercator coordinates of the center pixel of each pair are also made publicly available. This data set contributes to applications of multimodal data classification and SAR–optical image coregistration. However, we believe more training samples are required for learning complicated SAR–optical image-to-image mapping.
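Data sets of coregistered SAR–optical pairs, such as SARptical, supply only matched examples, whereas the binary matched/unmatched task listed among the problem categories also needs unmatched ones. The following is a minimal sketch of one way to derive both, assuming that patches are addressed by an integer index and that index i on the SAR side and index i on the optical side form a coregistered pair. The function name `make_matching_pairs` and the cyclic-shift strategy for negatives are illustrative assumptions, not part of any data set's tooling or of the methods in [11] or [183].

```python
import random


def make_matching_pairs(n_pairs, seed=0):
    """Build (sar_index, optical_index, label) samples for a matched/unmatched task.

    Assumption: SAR patch i and optical patch i are a coregistered pair.
    Label 1 marks a matched pair, label 0 an unmatched one.
    """
    if n_pairs < 2:
        raise ValueError("at least two pairs are needed to form negatives")
    rng = random.Random(seed)

    # Positives: identical index on both sides.
    positives = [(i, i, 1) for i in range(n_pairs)]

    # Negatives: a cyclic shift in [1, n_pairs - 1] guarantees that every
    # SAR patch is paired with an optical patch from a different location.
    shift = rng.randrange(1, n_pairs)
    negatives = [(i, (i + shift) % n_pairs, 0) for i in range(n_pairs)]

    samples = positives + negatives
    rng.shuffle(samples)  # mix classes before batching
    return samples


if __name__ == "__main__":
    for sar_idx, opt_idx, label in make_matching_pairs(4):
        print(sar_idx, opt_idx, label)
```

The sketch yields a balanced set with one negative per positive. In practice, negatives drawn from nearby or visually similar locations are harder than random ones, so the sampling strategy is a design choice that this example does not address.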
SEN1-2
The SEN1-2 data set [203] includes 282,384 pairs of corresponding Sentinel-1 single-polarization-intensity and Sentinel-2 RGB image patches collected from across the globe and in all meteorological seasons. The patches are 256 × 256 pixels. Their distribution through the four seasons is roughly even. SEN1-2 is the first large open data set of this kind. We believe it will support further developments in the field of deep learning for remote sensing as well as multisensor data fusion, such as SAR image colorization and SAR–optical image matching.

OTHER DATA SETS
SAMPLE PolSAR IMAGES FROM THE EUROPEAN SPACE AGENCY
These data sets (https://earth.esa.int/web/polsarpro/data-sources/sample-datasets) include, for example, the Flevoland PolSAR data set, which several works use for agricultural land use/land cover classification. The authors of [216]–[218] manually labeled it according to different classification schemes.

SAR IMAGE LAND COVER
This data set [219] is not publicly available. Readers should contact the creator.

AIRBUS SHIP DETECTION CHALLENGE
This data set can be accessed at https://www.kaggle.com/c/airbus-ship-detection.

CONCLUSION AND FUTURE TRENDS
This article reviewed the state of the art of an important and underexploited research field: deep learning in SAR. Relevant deep learning models were introduced, and their applications in six fields (terrain surface classification, object detection, parameter inversion, despeckling, InSAR, and SAR–optical data fusion) were analyzed in depth. Existing benchmark data sets and their limitations were discussed. In summary, despite early successes, the full exploitation of deep learning in SAR is mostly limited by 1) the lack of large and representative benchmark data sets and 2) the shortage of tailored deep learning models that take full account of SAR signal characteristics. Looking forward, the years ahead will be exciting.
Next-generation spaceborne SAR missions will simultaneously provide high-resolution and global coverage, which will enable novel applications, such as monitoring the dynamic Earth. To retrieve geoparameters from these data, the development of new analytics methods is warranted. Deep learning is among the most promising methods. To fully unlock its potential in SAR/InSAR applications in this big SAR data era, there are several promising future directions, including the following:
◗◗ Large and representative benchmark data sets: As summarized in this article, there are only a handful of SAR benchmarks, in particular, when multimodal ones are excluded. For instance, in SAR target detection, methods are mainly tested on a single benchmark data set, MSTAR, where only several thousand target samples (several hundred for each class) are provided for training. With respect to InSAR, due to the lack of ground truth, data sets are extremely deficient or nearly nonexistent. Large and representative expert-annotated benchmark data sets are in high demand in the SAR community and deserve more attention.
◗◗ Unsupervised deep learning: To bypass the deficiencies in annotated data in SAR, unsupervised deep learning is a promising direction. These algorithms derive insights directly from the data themselves and perform feature learning, representation learning, and clustering, which could be further used for data-driven analytics. Autoencoders and their extensions, such as VAEs and deep embedded clustering algorithms, are popular choices. With respect to despeckling, the high complexity of SAR images and the lack of ground truth make it infeasible to produce appropriate benchmarks from real data. Noise to noise [152] is an elegant example of unsupervised denoising, where the authors of [152] learn denoised data without clean data. Despite the nice visual appearance of the results, preserving details is a must for SAR applications.
◗◗ Interferometric data processing: Since deep learning methods were initially applied to perception tasks in computer vision, many methods resort to transforming SAR images, e.g., PolSAR images, into RGB-like images in advance, or they focus only on intensities. In other words, the most essential component of a SAR measurement, the phase information, is not appropriately considered. Although CV-CNNs are capable of learning phase information and show great potential in processing CV-SAR images, only a few such attempts have been made [83]. Extending CNNs to the complex domain, while preserving precious phase information, would enable networks to directly learn features from raw data and would open up a wide range of SAR/InSAR applications.
◗◗ Quantification of uncertainties: Generally speaking, geoparameter estimates without uncertainty measures are considered invalid in remote sensing. Appropriately trained deep learning models can achieve highly accurate predictions. Yet they fail to quantify the uncertainty of these predictions. Here, giving a statement about the predictive uncertainty, while considering both aleatoric uncertainty and epistemic uncertainty, is of crucial importance. The Bayesian deep learning community has developed a model-agnostic and easy-to-implement methodology to estimate both the data uncertainty and model uncertainty within deep learning models [54], which is awaiting exploration by the SAR community.
◗◗ Large-scale nonlinear optimization problems: The development of inversion algorithms should keep pace with data growth. Fast solvers are needed for many advanced parameter inversion models, which often involve nonconvex, nonlinear, and complex-valued optimization problems, such as compressive sensing-based tomographic inversion and low-rank complex tensor decomposition for InSAR time series data analysis. In some cases, the iterations of the optimization algorithms perform computations similar to those in layers in neural networks, that is, a linear step followed by a nonlinear activation (see, for example, the iteratively reweighted least-squares approach). It is thus meaningful to replace computationally expensive optimization algorithms with unrolled deep architectures that could be trained from simulated data [50].
◗◗ Cognitive sensors: Radars, and SARs in particular, are very complex and versatile imaging machines. A variety of modes (stripmap, spotlight, ScanSAR, terrain observation with progressive scans, and so on), swath widths, incidence angles, and polarizations can be programmed in near real time. Cognitive radars go a giant step further: they autonomously adapt their operational modes to the environment to be imaged through an intelligent interplay of transmit waveforms, adaptive signal processing on the receiver side, and learning. Cognitive SARs are still in their conceptual and experimental phase and are often justified by the stunning capabilities of the echolocation system of bats. In his pioneering article [116], Haykin defines three ingredients of a cognitive radar: “1) intelligent signal processing, which builds on learning through interactions of the radar with the surrounding environment; 2) feedback from the receiver to the transmitter, which is a facilitator of intelligence; and 3) preservation of the information content of radar returns, which is realized by the Bayesian approach to target detection through tracking.” Such a SAR could, e.g., perform low-resolution yet wide-swath surveillance of a coastal area and, in a first step, detect objects of interest, such as ships, in real time.
Based on such detection, the transmit waveform could be modified, for instance, by zooming into the region of interest and enabling a close-up look at an object and possibly classifying or even identifying it. Reinforcement (online) learning is part of the concept, as are fast and reliable detectors and classifiers (trained offline), e.g., based on deep learning. All this is edge computing; the learning algorithms have to perform in real time and with the limited compute resources onboard the satellite or airplane.

Last but not least, technology advances in deep learning in remote sensing will be possible only if experts in remote sensing and ML work closely together. This is particularly true when it comes to SAR. Thus, we encourage more joint initiatives to work collaboratively toward deep learning-powered, explainable, and reproducible big SAR data analytics.

ACKNOWLEDGMENTS
The work of Xiao Xiang Zhu is jointly supported by the European Research Council, under the European Union’s Horizon 2020 research and innovation program (grant ERC-2016-StG-714087); the Helmholtz Association, through Helmholtz AI, Munich Unit at Aeronautics, Space, and Transport, and through the Helmholtz Excellent Professorship Data Science in Earth Observation: Big Data Fusion for Urban Research; and the German Federal Ministry of Education and Research, through the international Future AI Lab AI4EO (grant 01DD20001).

AUTHOR INFORMATION
Xiao Xiang Zhu (xiaoxiang.zhu@dlr.de) received her M.Sc., Dr.-Ing., and habilitation degrees in signal processing from the Technical University of Munich (TUM), Munich, Germany, in 2008, 2011, and 2013, respectively. She is a professor of data science in Earth observation at TUM and the head of the Department of Earth Observation Data Science, Remote Sensing Technology Institute, German Aerospace Center, Wessling, 82234, Germany.
Since 2019, she has been a co-coordinator of the Munich Data Science Research School and the head of the aeronautics, space, and transport research field at the Helmholtz Association, Bonn, Germany. She has directed the Future Lab AI4EO: Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond, Munich, since 2020, and she serves on the board of directors of the Munich Data Science Institute, TUM. She was a guest scientist or visiting professor at the Italian National Research Council, Naples, Italy; Fudan University, Shanghai, China; the University of Tokyo, Tokyo, Japan; and the University of California, Los Angeles, Los Angeles, California, USA, in 2009, 2014, 2015, and 2016, respectively. Her research interests include remote sensing and Earth observation, signal processing, machine learning, and data science, with a special focus on global urban mapping. She is a member of the Junge Akademie/Junges Kolleg, Berlin–Brandenburg Academy of Sciences and Humanities; the German National Academy of Sciences Leopoldina; and the Bavarian Academy of Sciences and Humanities. She is an associate editor of IEEE Transactions on Geoscience and Remote Sensing and a Fellow of IEEE.

Sina Montazeri (sina.montazeri@dlr.de) received his B.Sc. degree in geodetic engineering from the University of Isfahan, Isfahan, Iran, in 2011; his M.Sc. degree in geomatics from Delft University of Technology, Delft, The Netherlands, in 2014; and his Ph.D. degree in radar remote sensing from the Technical University of Munich (TUM), Munich, Germany, in 2019, with a dissertation on geodetic synthetic aperture radar (SAR) interferometry. In 2012, he spent two weeks with the Laboratoire des Sciences de l’Image, de l’Informatique et de la Télédétection, University of Strasbourg, Strasbourg, France, as a junior researcher working on thermal remote sensing.
From 2013 to 2015, he was a research assistant at the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, 82234, Germany, where he was involved in the absolute localization of point clouds obtained from SAR tomography. From 2015 to 2019, he was a research associate with the Signal Processing in Earth Observation research group, TUM, and IMF–DLR, working on the automatic positioning of ground control points from multiview radar images. He is currently a senior researcher in the Department of Earth Observation Data Science, IMF–DLR, focused on developing machine learning algorithms applied to radar imagery. His research interests include advanced interferometric SAR techniques for the deformation monitoring of urban infrastructure, image and signal processing relevant to radar imagery, and applied machine learning. He received the DLR Science Award and the IEEE Geoscience and Remote Sensing Society Transactions Prize Paper Award, in 2016 and 2017, respectively, for his work on geodetic SAR tomography.

Mohsin Ali (syed.ali@dlr.de) received his B.S. degree in computer engineering from the National University of Science and Technology, Islamabad, Pakistan, in 2013 and his M.S. degree in computer science from the University of Freiburg, Freiburg, Germany, in 2018. He is a Ph.D. degree candidate at the Earth Observation Center, German Aerospace Center, Wessling, 82234, Germany, supervised by Prof. Xiao Xiang Zhu. His research interests include uncertainty estimation in deep learning models for remote sensing applications.

Yuansheng Hua (yuansheng.hua@dlr.de) received his B.S. degree in remote sensing science and technology from Wuhan University, Wuhan, China, in 2014 and his M.S. degree in Earth-oriented space science and technology from the Technical University of Munich (TUM), Munich, Germany, in 2018. He is pursuing his Ph.D. degree at the German Aerospace Center, Wessling, 82234, Germany, and at TUM. In 2019, he was a visiting researcher at Wageningen University and Research, Wageningen, The Netherlands. His research interests include remote sensing, computer vision, and deep learning, especially their applications in remote sensing. He is a Student Member of IEEE.

Yuanyuan Wang (y.wang@tum.de) received his B.Eng.
degree, with honors, in electrical engineering from Hong Kong Polytechnic University, Hong Kong, China, in 2008, and his M.Sc. and Dr.-Ing. degrees from the Technical University of Munich (TUM), Munich, Germany, in 2010 and 2015, respectively. In June and July 2014, he was a guest scientist at the Institute of Visual Computing, ETH Zürich, Zürich, Switzerland. He is currently with the Department of Earth Observation Data Science, Remote Sensing Technology Institute, German Aerospace Center, Wessling, 82234, Germany, where he leads the Big SAR Data working group. He is also a guest member of the Professorship of Data Science in Earth Observation, TUM, where he supports the scientific management of European Research Council projects So2Sat and AI4SmartCities. His research interests include optimal and robust parameter estimation in multibaseline interferometric synthetic aperture radar (SAR), multisensor fusion algorithms of SAR and optical data, nonlinear optimization with complex numbers, machine learning in SAR, and high-performance computing for big data. He serves as a reviewer for multiple IEEE Geoscience and Remote Sensing Society and other remote sensing journals, and he was named one of the best reviewers of IEEE Transactions on Geoscience and Remote Sensing, in 2016. He is an associate editor of the Royal Meteorological Society’s Geoscience Data Journal. He is a Member of IEEE.

Lichao Mou (lichao.mou@dlr.de) received his B.S. degree in automation from the Xi’an University of Posts and Telecommunications, Xi’an, China, in 2012; his M.S. degree in signal and information processing from the University of the Chinese Academy of Sciences, Beijing, China, in 2015; and his Dr.-Ing. degree from the Technical University of Munich (TUM), Munich, Germany, in 2020.
He is a guest professor at the Munich AI Future Lab AI4EO, TUM, and the head of the Visual Learning and Reasoning team, Department of Earth Observation Data Science, Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, 82234, Germany. Since 2019, he has been an artificial intelligence consultant for the Helmholtz Artificial Intelligence Cooperation Unit of the Helmholtz Association of Germany. In 2015, he spent six months at the Computer Vision Group, University of Freiburg, Freiburg, Germany. In 2019, he was a visiting researcher at the Cambridge Image Analysis Group, University of Cambridge, Cambridge, U.K. From 2019 to 2020, he was a research scientist at IMF–DLR. He was the first-place winner of the 2016 IEEE GRSS Data Fusion Contest and a finalist for the Best Student Paper Award at the Joint Urban Remote Sensing Event, in 2017 and 2019. He is a Member of IEEE.

Yilei Shi (yilei.shi@tum.de) received his Dipl.-Ing. degree in mechanical engineering and his Dr.-Ing. degree in engineering from the Technical University of Munich (TUM), Germany. In April and May 2019, he was a guest scientist in the Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, U.K. He is currently a senior scientist with the Chair of Remote Sensing Technology, TUM, Munich, 82024, Germany. His research interests include computational intelligence; fast-solver and parallel computing for large-scale problems; advanced methods for synthetic aperture radar (SAR) and interferometric SAR processing; machine learning and deep learning for a variety of data sources, such as SAR, optical images, medical images, and so on; and partial differential equation-related numerical modeling and computing. He is a Member of IEEE.

Feng Xu (fengxu@fudan.edu.cn) received his B.E. degree, with honors, in information engineering from Southeast University, Nanjing, China, in 2003 and his Ph.D.
degree, with honors, in electronic engineering from Fudan University, China, in 2008. From 2008 to 2010, he was a postdoctoral fellow at the National Oceanic and Atmospheric Administration Center for Satellite Application and Research, Camp Springs, Maryland, USA. From 2010 to 2013, he was with Intelligent Automation, Rockville, Maryland, USA, and with the NASA Goddard Space Flight Center, Greenbelt, Maryland, USA, as a research scientist. In 2012, he was selected for China’s Global Experts Recruitment Program and subsequently returned to Fudan University, Shanghai, 200433, China, in 2013, where he is currently a professor in the School of Information Science and Technology and the vice director of the Ministry of Education Key Laboratory for Information Science of Electromagnetic Waves. He has authored more than 30 papers in peer-reviewed journals, coauthored two books, and written many conference papers, and he holds two patents. His research interests include electromagnetic scattering modeling, synthetic aperture radar information retrieval, and radar system development. He was a recipient of the second-class National Nature Science Award and the 2014 SUMMA graduate fellowship in advanced electromagnetics. He serves as an associate editor of IEEE Geoscience and Remote Sensing Letters. He is the founding chair of the IEEE Geoscience and Remote Sensing Society Shanghai Chapter and a Senior Member of IEEE.

Richard Bamler (richard.bamler@dlr.de) received his Dipl.-Ing. degree in electrical engineering, Dr.-Ing. degree in engineering, and habilitation degree in signal and systems theory, in 1980, 1986, and 1988, respectively, from the Technical University of Munich, Germany. He worked at the university, from 1981 to 1989, on optical signal processing, holography, wave propagation, and tomography. He joined the German Aerospace Center (DLR), Wessling, 82234, Germany, in 1989, where he is currently the director of the Remote Sensing Technology Institute. In early 1994, he was a visiting scientist at the NASA Jet Propulsion Laboratory in preparation for the Spaceborne Imaging Radar-C/X-band Synthetic Aperture Radar (SIR-C/X-SAR) missions, and, in 1996, he was a guest professor at the University of Innsbruck. Since 2003, he has held a full professorship in remote sensing technology at the Technical University of Munich, Munich, 80333, Germany, as a double appointment with his DLR position.
His teaching activities include university lectures and courses covering signal processing, estimation theory, and synthetic aperture radar (SAR). Since he joined the DLR, his team has worked on SAR and optical remote sensing, image analysis and understanding, stereo reconstruction, computer vision, ocean color, passive and active atmospheric sounding, and laboratory spectrometry. His team is responsible for the development of the operational processors for SIR-C/X-SAR, the Shuttle Radar Topography Mission, TerraSAR-X, TerraSAR-X Add-On for Digital Elevation Measurement, the Tandem-L mission, the Second European Remote Sensing Satellite Global Ozone Monitoring Experiment (GOME), Environmental Satellite Scanning Imaging Absorption Spectrometer for Atmospheric Cartography, Meteorological Operational Satellite/GOME-2, Sentinel-5 Precursor, Sentinel-4, DLR Earth Sensing Imaging Spectrometer, the Environmental Mapping and Analysis Program mission, and others. His research interests include algorithms for optimum information extraction from remote sensing data, with an emphasis on SAR. This involves new estimation algorithms, such as sparse reconstruction, compressive sensing, and deep learning. He is a Fellow of IEEE.

REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. doi: 10.1038/nature14539.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, arXiv:1409.1556.
[3] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3212–3232, 2019. doi: 10.1109/TNNLS.2018.2876865.
[4] Y. Guo, Y. Liu, T. Georgiou, and M. S. Lew, “A review of semantic segmentation using deep neural networks,” Int. J. Multimedia Inf. Retrieval, vol. 7, no. 2, pp. 87–93, 2018. doi: 10.1007/s13735-017-0141-z.
[5] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 8–36, 2017. doi: 10.1109/MGRS.2017.2762307.
[6] H. Parikh, S. Patel, and V. Patel, “Classification of SAR and PolSAR images using deep learning: A review,” Int. J. Image Data Fusion, vol. 11, no. 1, pp. 1–32, 2020. doi: 10.1080/19479832.2019.1655489.
[7] S. Chen and H. Wang, “SAR target recognition based on deep learning,” in Proc. Int. Conf. Data Sci. Adv. Anal. (DSAA), 2014, pp. 541–547. doi: 10.1109/DSAA.2014.7058124.
[8] L. Wang, A. Scott, L. Xu, and D. Clausi, “Ice concentration estimation from dual-polarized SAR images using deep convolutional neural networks,” in IEEE Trans. Geosci. Remote Sens., vol. 11, no. 1, pp. 1–32, 2014. doi: 10.1109/TGRS.2016.2543660.
[9] P. Wang, H. Zhang, and V. Patel, “SAR image despeckling using a convolutional neural network,” IEEE Signal Process. Lett., vol. 24, no. 12, pp. 1763–1767, 2017. doi: 10.1109/LSP.2017.2758203.
[10] N. Anantrasirichai, J. Biggs, F. Albino, P. Hill, and D. Bull, “Application of machine learning to classification of volcanic deformation in routinely generated InSAR data,” JGR, Solid Earth, vol. 123, no. 8, pp. 6592–6606, 2018. doi: 10.1029/2018JB015911.
[11] L. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu, “Identifying corresponding patches in SAR and optical images with a pseudo-Siamese CNN,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 784–788, 2018. doi: 10.1109/LGRS.2018.2799232.
[12] K. Ikeuchi, T. Shakunaga, M. Wheeler, and T. Yamazaki, “Invariant histograms and deformable template matching for SAR target recognition,” in Proc. CVPR IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 1996, pp. 100–105. doi: 10.1109/CVPR.1996.517060.
[13] Q. Zhao and J. Principe, “Support vector machines for SAR automatic target recognition,” IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 2, pp. 643–654, 2001. doi: 10.1109/7.937475.
[14] M. Bryant and F. Garber, “SVM classifier applied to the MSTAR public data set,” in Proc. Algorithms Synth. Aperture Radar Imag., 1999, pp. 355–360. doi: 10.1117/12.357652.
[15] M. Ferguson, R. Ak, Y.-T. T. Lee, and K. H. Law, “Automatic localization of casting defects with convolutional neural networks,” in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2017, pp. 1726–1735. doi: 10.1109/BigData.2017.8258115.
[16] K. Chen, K. Chen, Q. Wang, Z. He, J. Hu, and J. He, “Short-term load forecasting with deep residual networks,” IEEE Trans. Smart Grid, vol. 10, no. 4, pp. 3943–3952, July 2019. doi: 10.1109/TSG.2018.2844307.
[17] Y. Han and J. C. Ye, “Framing U-Net via Deep Convolutional Framelets: Application to Sparse-View CT,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, Jun. 2018, doi: 10.1109/TMI.2018.2823768. [18] “Long short-term memory.” Wikimedia. https://upload.wiki​ media.org/wikipedia/commons/thumb/3/3b/The_LSTM_ cell.png/1280px-The_LSTM_cell.png (accessed May 27, 2020). [19] Y. Yang, K. Zheng, C. Wu, and Y. Yang, “Improving the Classification Effectiveness of Intrusion Detection by Using Improved Conditional Variational AutoEncoder and Deep Neural Network,” Sensors, vol. 19, no. 11, p. 2528, Jun. 2019. doi: 10.3390/ s19112528. [20] W. Feng, N. Guan, Y. Li, X. Zhang, and Z. Luo, “Audio visual speech recognition with multimodal recurrent neural networks,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), May 2017, pp. 681–688. doi: 10.1109/IJCNN.2017.7965918. [21] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, “Generative Adversarial Networks: An Overview,” IEEE Signal Process. Mag., vol. 35, no. 1, pp. 53– 65, Jan. 2018. doi: 10.1109/MSP.2017.2765202. [22] M. Zitnik, M. Agrawal, and J. Leskovec, “Modeling polypharmacy side effects with graph convolutional networks,” Bioinformatics, vol. 34, no. 13, pp. 457–466, 2018. doi: 10.1093/bioinformatics/bty294. [23] B. Huang and K. M. Carley, “Residual or gate? Towards deeper graph neural networks for inductive graph representation learning,” Aug. 2019, arXiv: 1904.08035. [24] M. Alioscha-Perez, A. D. Berenguer, E. Pei, M. C. Oveneke, and H. Sahli, “Neural architecture search under black-box objectives with deep reinforcement learning and increasingly-sparse rewards,” in 2020 Int. Conf. Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, Feb. 2020. pp. 276–281. doi: 10.1109/ICAIIC48513.2020.9065031. [25] Y. LeCun, C. Cortes, and C. Burges, “Mnist handwritten digit database,” 2010. [Online]. Available: http://yann.lecun. com/exdb/mnist/ [26] Y. 
LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. doi: 10.1109/5.726791. [27] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105. [28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848. [29] T. Tieleman and G. Hinton, “Lecture 6.5-Rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Netw. Machine Learn., vol. 4, no. 2, pp. 26–31, 2012. [30] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90. [32] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. 166 [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] Conf. Med. Image Comput. Comput.-Assisted Intervention, 2015, pp. 234–241. doi: 10.1007/978-3-319-24574-4_28. G. Huang, Z. Liu, K. Weinberger, and L. Maaten, “Densely connected convolutional networks,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 2261–2269. doi: 10.1109/ CVPR.2017.243. T. Hoeser and C. Kuenzer, “Object detection and image segmentation with deep learning on earth observation data: A review-Part I: evolution and recent trends,” Remote Sens., vol. 12, no. 10, p. 1667, 2020. doi: 10.3390/rs12101667. A. Mazza, F. Sica, P. Rizzoli, and G. Scarpa, “TanDEM-X forest mapping using convolutional neural networks,” Remote Sens., vol. 11, no. 24, p. 2980, Jan. 2019. 
doi: 10.3390/rs11242980. F. Lattari, B. Gonzalez Leon, F. Asaro, A. Rucci, C. Prati, and M. Matteucci, “Deep learning for SAR image despeckling,” Remote Sens., vol. 11, no. 13, p. 1532, 2019. doi: 10.3390/rs11131532. D. Morgan, “Deep convolutional neural networks for ATR from SAR imagery,” in Proc. SPIE, vol. 9475, May 13, 2015. doi: 10.1117/12.2176558. B. A. Pearlmutter, “Learning state space trajectories in recurrent neural networks,” Neural Computat., vol. 1, no. 2, pp. 263–269, 1989. doi: 10.1162/neco.1989.1.2.263. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computat., vol. 9, no. 8, pp. 1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735. E. Ndikumana, D. Ho Tong Minh, N. Baghdadi, D. Courault, and L. Hossard, “Deep recurrent neural network for agricultural classification using multitemporal SAR sentinel-1 for Camargue, France,” Remote Sens., vol. 10, no. 8, p. 1217, 2018. doi: 10.3390/rs10081217. I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680. C. Grohnfeld, M. Schmitt, and X. X. Zhu, “A conditional generative adversarial network to fuse SAR and multispectral optical data for cloud removal from Sentinel-2 images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2018, pp. 1726–1729, doi: 10.1109/IGARSS.2018.8519215. P. Ebel, M. Schmitt, and X. Zhu, “Cloud removal in unpaired sentinel-2 imagery using cycle-consistent GAN and SAR-optical data fusion,” in Proc. IGARSS 2020 IEEE Int. Geosci. Remote Sens. Symp. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014. doi: 10.5555/2627435.2670313. K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” London, Edinburgh, Dublin Philosoph. Mag. J. Sci., vol. 2, no. 11, pp. 559–572, 1901. doi: 10.1080/14786440109462720. D. P. Kingma and M. 
Welling, “Auto-encoding variational bayes,” 2013, arXiv:1312.6114. V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236. H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource management with deep reinforcement learning,” in Proc. 15th ACM Workshop Hot Topics Netw., 2016, pp. 50–56. doi: 10.1145/3005745.3005750. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[49] D. Silver et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, p. 484, 2016. doi: 10.1038/nature16961. [50] X. Chen, J. Liu, Z. Wang, and W. Yin, “Theoretical linear convergence of unfolded ista and its practical weights and thresholds,” 2018. [51] T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A survey,” 2018, arXiv:1808.05377. [52] H. Dong, B. Zou, L. Zhang, and S. Zhang, “Automatic design of CNNs via differentiable neural architecture search for PolSAR image classification,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 9, pp. 1–14, 2020. doi: 10.1109/TGRS.2020.2976694. [53] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 2016, arXiv:1609.02907. [54] A. Kendall and Y. Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5580–5590. doi: 10.5555/3295222.3295309. [55] Y. Shi, Q. Li, and X. X. Zhu, “Building segmentation through a gated graph convolutional neural network with deep structured feature embedding,” ISPRS J. Photogram. Remote Sens., vol. 159, pp. 184–197, Jan. 2020. doi: 10.1016/j.isprsjprs. 2019.11.004. [56] F. Ma, F. Gao, J. Sun, H. Zhou, and A. Hussain, “Attention graph convolution network for image segmentation in big SAR imagery data,” Remote Sens., vol. 11, no. 21, p. 2586, 2019. doi: 10.3390/rs11212586. [57] X. Tang, L. Zhang, and X. Ding, “SAR image despeckling with a multilayer perceptron neural network,” Int. J. Digit. Earth, vol. 12, no. 3, pp. 1–21, 2018. doi: 10.1080/17538947.2018. 1447032. [58] L. Mou, M. Schmitt, Y. Wang, and X. X. Zhu, “A CNN for the identification of corresponding patches in SAR and optical imagery of urban scenes,” in Proc. Urban Remote Sens. Event (JURSE), 2017, pp. 1–4. doi: 10.1109/JURSE.2017.7924548. [59] R. Touzi, A. Lopes, and P. 
Bousquet, “A statistical and geometrical edge detector for SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 26, no. 6, pp. 764–773, 1988. doi: 10.1109/36.7708. [60] G. Chierchia, D. Cozzolino, G. Poggi, and L. Verdoliva, “SAR image despeckling through convolutional neural networks,” 2017, arXiv:1704.00275. [61] Y. Shi, X. X. Zhu, and R. Bamler, “Optimized parallelization of non-local means filter for image noise reduction of InSAR image,” in Proc. IEEE Int. Conf. Inf. Automat., 2015, pp. 1515–1518. doi: 10.1109/ICInfA.2015.7279525. [62] X. X. Zhu, R. Bamler, M. Lachaise, F. Adam, Y. Shi, and M. Eineder, “Improving TanDEM-X DEMs by non-local InSAR filtering,” in Proc. Euro. Conf. Synth. Aperture Radar (EUSAR), 2014, pp. 1–4. [63] L. Denis, C.-A. Deledalle, and F. Tupin, “From patches to deep learning: Combining self-similarity and neural networks for SAR image despeckling,” in Proc. IGARSS 2019 - 2019 IEEE Int. Geosci. Remote Sens. Symp., pp. 5113–5116. doi: 10.1109/ IGARSS.2019.8898473. [64] J. Gao, B. Deng, Y. Qin, H. Wang, and X. Li, “Enhanced radar imaging using a complex-valued convolutional neural netDECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] work,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 1, pp. 35–39, 2019. doi: 10.1109/LGRS.2018.2866567. A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou, “A tutorial on synthetic aperture radar,” IEEE Geosci. Remote Sens. Mag., vol. 1, no. 1, pp. 6–43, 2013. doi: 10.1109/MGRS.2013.2248301. C. He, S. Li, Z. Liao, and M. Liao, “Texture classification of PolSAR data based on sparse coding of wavelet polarization textons,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 8, pp. 4576– 4590, 2013. doi: 10.1109/TGRS.2012.2236338. H. Xie, S. Wang, K. Liu, S. Lin, and B. Hou, “Multilayer feature learning for polarimetric synthetic radar data classification,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. 
(IGARSS), 2014, pp. 2818–2821. doi: 10.1109/IGARSS.2014.6947062. J. Geng, H. Wang, J. Fan, and X. Ma, “Deep supervised and contractive neural network for SAR image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 4, pp. 2442–2459, 2017. doi: 10.1109/TGRS.2016.2645226. S. Uhlmann and S. Kiranyaz, “Integrating color features in polarimetric SAR image classification,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 4, pp. 2197–2216, 2014. doi: 10.1109/TGRS. 2013.2258675. J. Geng, J. Fan, H. Wang, X. Ma, B. Li, and F. Chen, “High-resolution SAR image classification via deep convolutional autoencoders,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2351–2355, 2015. doi: 10.1109/LGRS.2015.2478256. B. Hou, B. Ren, G. Ju, H. Li, L. Jiao, and J. Zhao, “SAR image classification via hierarchical sparse representation and multisize patch features,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 1, pp. 33–37, 2016. doi: 10.1109/LGRS.2015.2493242. F. Gao, T. Huang, J. Wang, J. Sun, A. Hussain, and E. Yang, “Dual-branch deep convolution neural network for polarimetric SAR image classification,” Appl. Sci., vol. 7, no. 5, p. 447, 2017. doi: 10.3390/app7050447. B. Hou, H. Kou, and L. Jiao, “Classification of polarimetric SAR images using multilayer autoencoders and superpixels,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 7, pp. 3072–3081, 2016. doi: 10.1109/JSTARS.2016.2553104. L. Zhang, W. Ma, and D. Zhang, “Stacked sparse autoencoder in PolSAR data classification using local spatial information,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 9, pp. 1359–1363, 2016. doi: 10.1109/LGRS.2016.2586109. F. Qin, J. Guo, and W. Sun, “Object-oriented ensemble classification for polarimetric SAR imagery using restricted Boltzmann machines,” Remote Sens. Lett., vol. 8, no. 3, pp. 204–213, 2017. doi: 10.1080/2150704X.2016.1258128. Z. Zhao, L. Jiao, J. Zhao, J. Gu, and J. 
Zhao, “Discriminant deep belief network for high-resolution SAR image classification,” Pattern Recognit., vol. 61, pp. 686–701, 2017. doi: 10.1016/j.patcog.2016.05.028. Y. Zhou, H. Wang, F. Xu, and Y. Jin, “Polarimetric SAR image classification using deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 12, pp. 1935–1939, 2016. doi: 10.1109/LGRS.2016.2618840. Y. Wang, C. He, X. Liu, and M. Liao, “A hierarchical fully convolutional network integrated with sparse and low-rank subspace 167
[79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] 168 representations for PolSAR imagery classification,” Remote Sens., vol. 10, no. 2, p. 342, 2018. doi: 10.3390/rs10020342. S. Chen and C. Tao, “PolSAR image classification using polarimetric-feature-driven deep convolutional neural network,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 4, pp. 627–631, 2018. doi: 10.1109/LGRS.2018.2799877. C. He, M. Tu, D. Xiong, and M. Liao, “Nonlinear manifold learning integrated with fully convolutional networks for PolSAR image classification,” Remote Sens., vol. 12, no. 4, p. 655, 2020. doi: 10.3390/rs12040655. H. Dong, L. Zhang, and B. Zou, “PolSAR image classification with lightweight 3D convolutional networks,” Remote Sens., vol. 12, no. 3, p. 396, 2020. doi: 10.3390/rs12030396. N. Teimouri, M. Dyrmann, and R. N. Jørgensen, “A novel spatio-temporal FCN-LSTM network for recognizing various crop types using multi-temporal radar images,” Remote Sens., vol. 11, no. 8, p. 990, 2019. doi: 10.3390/rs11080990. Z. Zhang, H. Wang, F. Xu, and Y. Jin, “Complex-valued convolutional neural network and its application in polarimetric SAR image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 12, pp. 7177–7188, 2017. doi: 10.1109/TGRS.2017.2743222. A. G. Mullissa, C. Persello, and A. Stein, “PolSARNet: A deep fully convolutional network for polarimetric SAR image classification,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 12, pp. 5300–5309, 2019. doi: 10.1109/ JSTARS.2019.2956650. L. Li, L. Ma, L. Jiao, F. Liu, Q. Sun, and J. Zhao, “Complex contourlet-CNN for polarimetric SAR image classification,” Pattern Recognit., vol. 100, p. 107,110, Apr. 2020. doi: 10.1016/j. patcog.2019.107110. W. Xie, G. Ma, F. Zhao, H. Liu, and L. Zhang, “PolSAR image classification via a novel semi-supervised recurrent complex-valued convolution neural network,” Neurocomputing, vol. 388, pp. 255–268, May 2020. doi: 10.1016/j.neucom. 2020.01.020. Z. 
Huang, M. Datcu, Z. Pan, and B. Lei, “Deep SAR-Net: Learning objects from signals,” ISPRS J. Photogram. Remote Sens., vol. 161, pp. 179–193, Mar. 2020. doi: 10.1016/j.isprsjprs.2020.01.016. R. Ressel, A. Frost, and S. Lehner, “A neural network-based classification for sea ice types on x-band SAR images,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 7, pp. 3672–3680, 2015. doi: 10.1109/JSTARS.2015.2436993. R. Ressel, S. Singha, and S. Lehner, “Neural network based automatic sea ice classification for CL-pol RISAT-1 imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2016, pp. 4835–4838. doi: 10.1109/IGARSS.2016.7730261. R. Ressel, S. Singha, S. Lehner, A. Rosel, and G. Spreen, “Investigation into different polarimetric features for sea ice classification using x-band synthetic aperture radar,” IEEE J. Select. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 7, pp. 3131–3143, 2016. doi: 10.1109/JSTARS.2016.2539501. S. Singha, M. Johansson, N. Hughes, S. M. Hvidegaard, and H. Skourup, “Arctic sea ice characterization using spaceborne fully polarimetric L-, C-, and X-band SAR with validation by airborne measurements,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 7, pp. 3715–3734, 2018. doi: 10.1109/TGRS.2018.2809504. [92] N. Zakhvatkina, V. Smirnov, and I. Bychkova, “Satellite SAR data-based sea ice classification: An overview,” Geosciences, vol. 9, no. 4, p. 152, 2019. doi: 10.3390/geosciences9040152. [93] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, “Semantic annotation of high-resolution satellite images via weakly supervised learning,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3660–3671, 2016. doi: 10.1109/TGRS.2016.2523563. [94] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, “When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 5, pp. 2811–2821, 2018. doi: 10.1109/TGRS.2017.2783902. [95] F. 
Zhang, C. Hu, Q. Yin, W. Li, H. Li, and W. Hong, “SAR target recognition using the multi-aspect-aware bidirectional LSTM recurrent neural networks,” 2017, arXiv:1707.09875. [96] E. Keydel, S. Lee, and J. Moore, “MSTAR extended operating conditions: A tutorial,” in Proc. SPIE, vol. 2757, pp. 228–242, 1996. doi: 10.1117/12.242059. [97] S. Chen, H. Wang, F. Xu, and Y. Jin, “Target classification using the deep convolutional networks for SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4806–4817, 2016. doi: 10.1109/TGRS.2016.2551720. [98] J. Ding, B. Chen, H. Liu, and M. Huang, “Convolutional neural network with data augmentation for SAR target recognition,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 3, pp. 364–368, 2016. doi: 10.1109/LGRS.2015.2513754. [99] K. Du, Y. Deng, R. Wang, T. Zhao, and N. Li, “SAR ATR based on displacement-and rotation-insensitive CNN,” Remote Sens. Lett., vol. 7, no. 9, pp. 895–904, 2016. doi: 10.1080/2150704X.2016.1196837. [100] M. Wilmanski, C. Kreucher, and J. Lauer, “Modern approaches in deep learning for SAR ATR,” in Proc. SPIE 9843, Algorithms for Synthetic Aperture Radar Imagery XXIII, vol. 9843, May 14, 2016, p. 98430N. doi: 10.1117/12.2220290. [101] S. Wagner, “SAR ATR by a combination of convolutional neural network and support vector machines,” IEEE Trans. Aerosp. Electron. Syst., vol. 52, no. 6, pp. 2861–2872, 2016. doi: 10.1109/ TAES.2016.160061. [102] F. Gao, T. Huang, J. Sun, J. Wang, A. Hussain, and E. Yang, “A new algorithm for SAR image target recognition based on an improved deep convolutional neural network,” Cogn. Computat., vol. 11, no. 6, pp. 809–824, 2019. doi: 10.1007/s12559-018-9563-z. [103] F. Gao, T. Huang, J. Wang, J. Sun, E. Yang, and A. Hussain, “Combining deep convolutional neural network and SVM to SAR image target recognition,” in Proc. IEEE Int. Conf. Internet of Things (iThings) IEEE Green Comput. Commun. (GreenCom) IEEE Cyber, Phys. Soc. Comput. 
(CPSCom) IEEE Smart Data (SmartData), 2017, pp. 1082–1085. doi: 10.1109/iThings-GreenComCPSCom-SmartData.2017.165. [104] H. Furukawa, “Deep learning for end-to-end automatic target recognition from synthetic aperture radar imagery,” 2018, arXiv:1801.08558. [105] D. Cozzolino, G. D Martino, G. Poggi, and L. Verdoliva, “A fully convolutional neural network for low-complexity singlestage ship detection in Sentinel-1 SAR images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2017, pp. 886–889. doi: 10.1109/IGARSS.2017.8127094. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[106] C. Schwegmann, W. Kleynhans, B. Salmon, L. Mdakane, and R. Meyer, “Very deep learning for ship discrimination in synthetic aperture radar imagery,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2016, pp. 104–107. doi: 10.1109/ IGARSS.2016.7729017. [107] C. Bentes, A. Frost, D. Velotto, and B. Tings, “Ship-iceberg discrimination with convolutional neural networks in high resolution SAR images,” in Proc. Euro. Conf. Synth. Aperture Radar (EUSAR), 2016, pp. 1–4. [108] N. Ødegaard, A. Knapskog, C. Cochin, and J. Louvigne, “Classification of ships using real and simulated data in a convolutional neural network,” in Proc. IEEE Radar Conf. (RadarConf), 2016, pp. 1–6. doi: 10.1109/RADAR.2016.7485270. [109] Y. Liu, M. Zhang, P. Xu, and Z. Guo, “SAR ship detection using sea-land segmentation-based convolutional neural network,” in Proc. Int. Workshop Remote Sens. Intell. Process. (RSIP), 2017, pp. 1–4. doi: 10.1109/RSIP.2017.7958806. [110] R. Girshick, “Fast R-CNN,” 2015, arXiv:1504.08083. [111] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2017. doi: 10.1109/TPAMI.2016.2577031. [112] J. Li, C. Qu, and J. Shao, “Ship detection in SAR images based on an improved faster R-CNN,” in Proc. SAR Big Data Era: Models, Methods Appl. (BIGSARDATA), 2017, pp. 1–6, doi: 10.1109/ BIGSARDATA.2017.8124934. [113] M. Kang, K. Ji, X. Leng, and Z. Lin, “Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection,” Remote Sens., vol. 9, no. 8, p. 860, 2017. doi: 10.3390/rs9080860. [114] J. Jiao et al., “A densely connected end-to-end neural network for multiscale and multiscene SAR ship detection,” IEEE Access, vol. 6, pp. 20,881–20,892, Apr. 2018. doi: 10.1109/ACCESS.2018.2825376. [115] C. Dechesne, S. Lefèvre, R. Vadaine, G. Hajduch, and R. 
Fablet, “Multi-task deep learning from sentinel-1 SAR: Ship detection, classification and length estimation,” presented at the Conf. Big Data from Space, 2019. [116] S. Haykin, “Cognitive radar: A way of the future,” IEEE Signal Process. Mag., vol. 23, no. 1, pp. 30–40, 2006. doi: 10.1109/ MSP.2006.1593335. [117] S. Kazemi, B. Yonel, and B. Yazici, “Deep learning for direct automatic target recognition from SAR data,” in Proc. IEEE Radar Conf. (RadarConf), 2019, pp. 1–6. doi: 10.1109/RADAR.2019. 8835492. [118] M. Rostami, S. Kolouri, E. Eaton, and K. Kim, “Deep transfer learning for few-shot SAR image classification,” Remote Sens., vol. 11, no. 11, p. 1374, 2019. doi: 10.3390/rs11111374. [119] Z. Huang, Z. Pan, and B. Lei, “What, where, and how to transfer in SAR target recognition based on deep CNNs,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 4, 2019. doi: 10.1109/ TGRS.2019.2947634. [120] M. Shahzad, M. Maurer, F. Fraundorfer, Y. Wang, and X. X. Zhu, “Buildings detection in VHR SAR images using fully convolution neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 1100–1116, 2019. doi: 10.1109/TGRS.2018.2864716. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [121] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 3431–3440. doi: 10.1109/CVPR.2015.7298965. [122] S. Zheng et al., “Conditional random fields as recurrent neural networks,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1529–1537. doi: 10.1109/ICCV.2015.179. [123] Y. Sun, Y. Hua, L. Mou, and X. X. Zhu, “CG-net: Conditional GIS-aware network for individual building segmentation in VHR SAR images,” 2020, arXiv:2011.08362. [124] F. Radar and J. Falkingham. “Global satellite observation requirements for floating ice.” World Meteorological Organization. 
https://globalcryospherewatch.org/satellites/docs/PSTG-4_ Doc_08-04_GlobSatObsReq-FloatingIce.pdf (accessed Jan. 25, 2021). [125] W. Dierking, “Sea ice monitoring by synthetic aperture radar,” Oceanography, vol. 26, no. 2, pp. 100–111, 2013. doi: 10.5670/ oceanog.2013.33. [126] L. Wang, K. Scott, L. Xu, and D. Clausi, “Sea ice concentration estimation during melt from dual-pol SAR scenes using deep convolutional neural networks: A case study,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4524–4533, 2016. doi: 10.1109/TGRS.2016.2543660. [127] L. Wang, “Learning to estimate sea ice concentration from SAR imagery,” Ph.D. dissertation, Univ. Waterloo, 2016. [Online]. Available: http://hdl.handle.net/10012/10954 [128] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva, “A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 606–616, 2012. doi: 10.1109/TGRS.2011.2161586. [129] D. Cozzolino, L. Verdoliva, G. Scarpa, and G. Poggi, “Nonlocal CNN SAR image despeckling,” Remote Sens., vol. 12, no. 6, p. 1006, 2020. doi: 10.3390/rs12061006. [130] T. Song, L. Kuang, L. Han, Y. Wang, and Q. H. Liu, “Inversion of rough surface parameters from SAR images using simulationtrained convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 7, pp. 1130–1134, 2018. doi: 10.1109/ LGRS.2018.2822821. [131] J. Zhao, M. Datcu, Z. Zhang, H. Xiong, and W. Yu, “Contrastive-regulated CNN in the complex domain: A method to learn physical scattering signatures from flexible polsar images,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10,116–10,135, 2019. doi: 10.1109/TGRS.2019.2931620. [132] Q. Song, F. Xu, and Y.-Q. Jin, “Radar image colorization: converting single-polarization to fully polarimetric using deep neural networks,” IEEE Access, vol. 6, pp. 1647–1661, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8141881 doi: 10.1109/ACCESS.2017.2779875. [133] S. 
Niu, X. Qiu, B. Lei, C. Ding, and K. Fu, “Parameter extraction based on deep neural network for SAR target simulation,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 7, pp. 4901–4914, 2020. doi: 10.1109/TGRS.2020.2968493. [134] S. Auer, R. Bamler, and P. Reinartz, “RaySAR - 3D SAR simulator: Now open source,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Beijing, 2016, pp. 6730–6733. doi: 10.1109/ IGARSS.2016.7730757. 169
Forward-Looking Ground-Penetrating Radar
Subsurface target imaging and detection: A review

DAVIDE COMITE, FAUZIA AHMAD, MOENESS G. AMIN, AND TRAIAN DOGARU

Detection of shallow-buried, in-road threats using a forward-looking (FL) ground-penetrating radar (GPR) system has attracted significant research interest in the last decade. An FL-GPR mounted on a moving platform can provide standoff target detection and imaging. This enables real-time sensing and situational awareness over large ground areas. The main challenge facing this sensing technology is high false-alarm rates due to scattering arising from air–ground interface roughness and subsurface clutter. In this article, we present a comprehensive review of the state-of-the-art techniques that address the unique challenges associated with FL-GPR technology. Specifically, we focus on array-based FL-GPR systems and consider both electromagnetic modeling and signal processing for problem formulation and solutions. Image formation methods and target detection approaches are discussed, highlighting their offerings and shortcomings in providing reliable system performance.

Digital Object Identifier 10.1109/MGRS.2020.3048368
Date of current version: 9 February 2021
DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE

THE CHALLENGES OF FORWARD-LOOKING GROUND-PENETRATING RADAR

In recent years, radar imaging and detection of shallow-buried targets have garnered much interest due to the need for reliable subsurface investigations in a variety of applications, including real-time security, military situational awareness, and humanitarian demining of unexploded ordnance over large areas [1]–[8].
Although a broad class of sensing modalities, including seismic and radiometric, has been proposed in the literature for the detection of buried targets [9]–[11], electromagnetic waves remain a viable option (see, e.g., [12]) owing to their various attributes, such as superior ground penetration, sensitivity to arbitrarily shaped plastic targets, and robustness to different soil conditions. In particular, the FL-GPR technology is gaining impetus as it enables sensing from a standoff distance.

A major motivation for the development of early FL-GPR systems has been their terrain-mapping capabilities, used to clear roads of explosive hazards. Vehicle-borne, down-looking radar systems previously employed in this application lacked the standoff detection range that would enable spotting of the hazard before the vehicle drove over it. By pointing the antenna array to look ahead of the vehicle, FL-GPR systems are able to achieve a reasonable lead detection time before reaching the actual explosive hazard location.

However, the performance of an FL-GPR system is highly impacted by rough surface clutter (see, e.g., [13]). Depending on the soil conditions and degree of surface roughness, the returns from the ground interface can dominate the radar measurements and obscure the target response. This leads to significant uncertainty in the assessment and interpretation of the attained radar images.

Compared to its downward-looking counterpart, wherein the antennas are either coupled or very close to the ground surface (see, e.g., [3], [4], and [14]–[16]), an FL-GPR system employs oblique and near-grazing incidence sensing to enable target detection from a safe standoff distance. In this case and depending on the roughness profile of the illuminated surface, most of the energy would be forward-scattered along the specular direction, yielding reduced returns from the air–ground interface.

In practice, however, even if the backscattered echo from the rough surface is relatively weak, the intensity of the signal returns from concealed targets can also be quite low. This renders target detection and localization challenging, especially in the case of nonmetallic objects. Therefore, proper design of both imaging and detection approaches becomes fundamental to improving the performance of the FL configuration.
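As a rough illustration of why near-grazing illumination reduces the interface returns, the sketch below applies the Rayleigh roughness criterion. This is a textbook rule of thumb, not a result taken from the works reviewed here: a surface behaves as approximately smooth, and therefore mostly specular, when its rms height is below λ/(8 cos θ), with θ the incidence angle measured from the surface normal. The function names, the 2-cm rms height, and the 3-GHz frequency are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)

def rayleigh_height_limit(freq_hz, incidence_deg):
    """Largest rms height (m) still treated as smooth by the Rayleigh criterion.

    incidence_deg is measured from the surface normal (0 = looking straight down).
    """
    wavelength = C / freq_hz
    return wavelength / (8.0 * math.cos(math.radians(incidence_deg)))

def is_smooth(rms_height_m, freq_hz, incidence_deg):
    """True if the surface passes the Rayleigh smoothness test."""
    return rms_height_m < rayleigh_height_limit(freq_hz, incidence_deg)

# Illustrative values only: a 2-cm rms surface observed at 3 GHz.
for angle in (0.0, 45.0, 80.0):
    limit_cm = 100.0 * rayleigh_height_limit(3e9, angle)
    print(f"incidence {angle:4.1f} deg: limit {limit_cm:.2f} cm, "
          f"smooth = {is_smooth(0.02, 3e9, angle)}")
```

Under these assumptions, the same 2-cm rms surface fails the test at normal incidence but passes it at 80° incidence, which is consistent with the predominantly forward-scattered energy described above.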
To compensate for the loss of energy due to the signal bounce at the ground interface, synthetic aperture radar (SAR)-based focusing is typically employed [4], [17]–[21], wherein coherently combining the returns at multiple antenna positions focuses the energy to an image pixel, thereby improving weak target representations. Several approaches based on electromagnetic modeling and statistical detection analysis have been proposed for FL-GPR (see, e.g., [19], [22], and [23] and the references therein).

In this article, we focus on array-based FL-GPR systems and provide a comprehensive review of the state-of-the-art radar imaging and detection methods, highlighting their advantages and limitations. We attempt to group advances in FL-GPR based on the nature of the data employed, system prototyping, properties of the imaging scene assumed, and principal signal processing algorithms undertaken.

SOLUTION TO THE FORWARD PROBLEM

A controlled solution for the forward-scattering problem can be a powerful tool to assess and characterize the ground interface contributions, predict the target signature, and design and validate image formation methods, including clutter mitigation approaches. This would require determining the scattered field from the illuminated scene that essentially comprises the targets buried in a dielectric half-space with a rough surface profile. For simplicity and without loss of generality, the involved media are assumed homogeneous. Under known materials and imaging geometry, in the presence of a flat ground interface and considering targets represented by canonical simple shapes, the underlying scattering problem can be analytically characterized by solving Maxwell’s equations (see, e.g., [12], [24], and [25]). However, in most cases of practical interest, those assumptions and prior information do not hold, and more realistic and flexible approaches are needed.
These approaches call for implementing a full-wave solution of the scattering problem numerically (see, e.g., [26] and the references therein) or collecting experimental data. In the next sections, we summarize key FL-GPR systems used in data collection and also discuss the numerical approaches used for data modeling.

FORWARD-LOOKING GROUND-PENETRATING RADAR PROTOTYPES AND EXPERIMENTAL APPROACHES

Approaches based on controlled experiments, though complicated and costly, are valuable in understanding phenomenology and can provide real scattering data. This, however, requires the availability of specific facilities and preparation of experimental campaigns. Toward this end, different prototypes and radar systems have been developed for data measurements, which are summarized as follows.

In [27]–[29], a prototype of a high-resolution GPR system was designed and deployed by SRI International. This system is a stepped-frequency, fully polarimetric radar, operating over the 0.3–3-GHz frequency band. The prototype was conceived to operate as an FL SAR system, providing a ground-surface resolution of about 5 cm. The experimental activity was originally designed to define optimal FL-GPR parameters and support image processing for the standoff target detection of concealed antitank mines. Reference [27] was the first publication reporting experimental data collection and processing with an FL SAR system for GPR applications. It provided insights into the signal-to-clutter ratio (SCR) of shallow-buried targets as well as key features of clutter statistics.

Time-frequency analysis was applied in [30] using an FL-GPR system to detect plastic targets buried under a rough ground surface. Different quadratic time-frequency distributions were considered to characterize and interpret the scattering from both the targets and the rough surface.
This work employed experimental data described in [27] and proposed a target detector based on the signal ambiguity function, which showed superior detection performance over a conventional detector.

In [31], an FL-GPR operating from 0.76 to 3.8 GHz was developed by Planning System Incorporated (PSI). This FL-GPR system is a broadband, stepped-frequency, continuous wave (CW) system performing digital phase detection of the CW echoes on a fixed number of receiving channels.
Experimental campaigns were carried out to collect data in the field, accounting for both metallic and plastic objects. A near-field delay-and-sum beamforming algorithm (more details are provided in the “Image Formation” section) was implemented to provide focused images of the considered area. To meet the system bandwidth requirements, the antennas were Archimedean spirals, each housed in a cavity-backed structure.

The U.S. Army Combat Capabilities Development Command Army Research Laboratory (ARL) FL-GPR prototype, called the synchronous impulse reconstruction (SIRE) radar [see Figure 1(a)], is an ultrawideband (UWB) radar based on the transmission of short pulses [32]. For the imaging and detection of buried targets, the system employs a physical array of 16 receiving antennas, which provide a long aperture for high cross-range resolution. The transmitted pulse has a 0.3–3-GHz bandwidth, which represents a tradeoff between fine down-range resolution and the ability to penetrate soil depths of a few centimeters. To increase the signal-to-noise ratio, the baseband receiver integrates radar returns from multiple pulses prior to processing for target detection. The system hardware was based on commercially available integrated circuits, which provided a low-cost and lightweight digitizing scheme. In [32], both simulations and measurements in the field were conducted considering on-surface metallic targets; the possibility of penetrating foliage and weather was experimentally assessed.

Following the design and testing of SIRE, ARL researchers proposed a new UWB radar system, called the spectrally agile frequency-incrementing reconfigurable (SAFIRE) radar system [34]. SAFIRE was designed to provide an unprecedented capability of adapting the operating frequency to the surrounding electromagnetic environment, thereby lowering the susceptibility of the system to radio-frequency (RF) interference.
To this end, SAFIRE employed stepped-frequency waveforms and sought to eliminate system transmissions that are likely to cause interference to nearby sources of disturbance [35]. Currently, such a feature is considered essential for FL-GPRs operating in congested RF environments. The SAFIRE operating band ranges from 300 to 2,000 MHz, with a minimum frequency step size of 1 MHz. The SAFIRE system can be configured in either an FL or side-looking orientation and is equipped with a uniform linear array made of 16 Vivaldi receiving antennas and two quad-ridge horn transmit antennas. The latter are placed above the ends of the receiver array. The sequential firing of the two transmitters provided orthogonal waveforms, which established a multiple-input, multiple-output (MIMO) configuration with an extended virtual aperture for improved cross-range resolution.

Experimental FL-GPR data, collected by the Army Look Ahead Radar Impulse (for) Countermine (ALARIC) vehicle-borne UWB impulse radar system, were used in [36] to provide the first assessment of coherent integration through exploitation of the platform movement. The system employs an impulse generator at approximately 950 MHz and has a 300–3,000-MHz bandwidth (down-range resolution of approximately 5 cm). A pair of transverse electromagnetic horn transmit antennas, placed at the two ends of a 2-m-wide receiver array, was considered to provide good pulse fidelity while minimizing the reflected power of the transmitter. The receiver array comprised 16 identical Vivaldi notch antennas, which were selected because of their compact size and low cross coupling between elements. Using physical array measurements from multiple platform positions, it was shown that conventional synthetic aperture processing can be used to form FL-GPR images of good quality, though at the expense of the lateral position estimates of the targets within the illuminated scene.
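The bandwidths and the frequency step quoted for these systems can be related to down-range performance through two standard free-space relations, ΔR = c/(2B) for the range resolution and R_max = c/(2Δf) for the unambiguous range of a stepped-frequency waveform. The short sketch below evaluates them; the function names and the dictionary are hypothetical, and propagation in soil, where the wave speed is lower, is deliberately ignored.

```python
C = 299_792_458.0  # speed of light in vacuum (m/s)

def range_resolution(bandwidth_hz):
    """Free-space down-range resolution c / (2B), in meters."""
    return C / (2.0 * bandwidth_hz)

def unambiguous_range(freq_step_hz):
    """Free-space unambiguous range c / (2 * df) of a stepped-frequency waveform, in meters."""
    return C / (2.0 * freq_step_hz)

# Operating bands quoted in the text (Hz); the grouping is only for illustration.
bands = {
    "SRI / SIRE / ALARIC (0.3-3 GHz)": (0.3e9, 3.0e9),
    "PSI (0.76-3.8 GHz)": (0.76e9, 3.8e9),
    "SAFIRE (0.3-2 GHz)": (0.3e9, 2.0e9),
}
for name, (f_lo, f_hi) in bands.items():
    print(f"{name}: {100.0 * range_resolution(f_hi - f_lo):.1f} cm")

print(f"SAFIRE, 1-MHz step: {unambiguous_range(1e6):.0f} m unambiguous range")
```

For a 2.7-GHz bandwidth the free-space resolution is roughly 5.6 cm, the same order as the resolution figures quoted for these systems, and a 1-MHz step corresponds to an unambiguous range of about 150 m.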
The preliminary ALARIC results also demonstrated the possibility of successfully detecting metallic targets buried near the surface.

More recently, the authors in [33] proposed an experimental test for the assessment of imaging and detection performance by means of an FL-GPR under realistic conditions. Test facilities at Ingegneria dei Sistemi S.p.A., headquartered in Italy [see Figure 1(b)], are equipped with a moving platform that can support two or more antennas, pointing toward a test site that comprises several resolution cells of the FL-GPR system. The test field allows the inclusion of heterogeneous soils. Data were gathered in the frequency range from 0.4 to 2 GHz using a transmit and a receive horn antenna, both connected to a network analyzer. The antennas, spaced 93.5 cm apart and tilted at a 45° angle, were mounted on the moving platform at a distance of 1.42 m from the air–soil interface. The platform was moved along straight tracks with a constant spatial step of 0.02 m. Scanning lines of about 8 m were used to collect data over a sandy portion of the test site. The experiments were performed after intense rainfall, which reproduced challenging operational conditions. This resulted in a nonhomogeneous background medium that consisted of two layers: the upper layer, with a thickness of a few centimeters, was dry sand, while the deeper layer comprised wet sand. This work also discussed the performance achievable with a conventional microwave tomographic approach to focus FL-GPR data.

FIGURE 1. (a) The system and field test of the U.S. ARL. (Source: [32]). (b) The test facilities at Ingegneria dei Sistemi S.p.A., Italy. (Source: [33]).

NUMERICAL APPROACHES

Numerical data obtained by means of a finite-difference time-domain (FDTD) method, modeling an FL-GPR system with two transmitters and 16 receivers, were presented in [18]. The modeled system has a frequency bandwidth from 0.3 to 1.5 GHz. In particular, a near-field army FDTD software package was developed at the ARL for synthesizing FL-GPR numerical data accounting for realistic sensing conditions. More information on the modeling approach and targets used for the analysis can be found in [37] and [38]. An example of focused FL-GPR numerical data from [38] is reproduced in Figure 2; the images are formed over a horizontal plane in front of the transmitting and receiving antennas considering both metallic and plastic targets whose locations are specified in Figure 2(a). Both the flat ground interface [Figure 2(a)] and the rough surface [Figure 2(b)] are simulated. The latter was generated by assuming a random process model described by Gaussian statistics.

FIGURE 2. The focused numerical data (in decibels) for a scene with size equal to 9 × 19 m: ground with (a) a flat surface and (b) a randomly rough surface characterized by a root mean square (rms) surface height equal to 0.8 cm and correlation length of 14.93 cm. Axes: x (m) and y (m); panel (a) also marks the Rx array, the two Tx positions, (θinc, φ1) and (θinc, φ2), and the numbered target locations. Further details can be found in [38]. Rx: receive; Tx: transmit. (Source: [38].)

A 3D full-wave approach, based on a finite-difference frequency-domain method and optimized to provide real-time solutions, was proposed in [20] to model an FL-GPR on a moving platform and calculate the scattering from rough terrains located at large electrical distances from the antennas. For a synthetic aperture, the computational domain was reduced to a small subset of the observed region, and the surface clutter was determined by a simple multiplication of a precomputed impulse response matrix of the rough profile with a matrix characterizing the FL-GPR transmitted signal. This approach significantly reduced the complexity through an efficient use of computational resources, thereby permitting the representation of lossy/frequency-dispersive soils and target-detection processing in real time. This is especially useful in scenarios where an experimental performance validation may incur a high cost and/or require significantly more resources.

The authors in [39] extended the real-time 3D simulation to the multiview case, considering a realistic velocity of the moving platform. The matrix-multiplication-based surface clutter computation in this case required an additional precomputed correction matrix of the moving platform measurement steps along the direction of motion. The method was tested via Monte Carlo simulations. In practice, the proposed simulation-based approach can be used to estimate the scattering from the rough surface profile, which can then be subtracted from the actual FL-GPR measurements. The resulting difference signals can then be processed for image formation and target detection.
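The matrix-multiplication idea reported for [20] and [39] can be sketched in a simplified time-domain form. This is a minimal illustration under stated assumptions, not the implementation of those works: the surface response is assumed linear and time invariant, `impulse_responses` is a hypothetical array of precomputed per-receiver responses, and the transmitted pulse is encoded as a convolution (Toeplitz) matrix so that the clutter prediction reduces to a single matrix product, which is then subtracted from the measurements.

```python
import numpy as np

def pulse_matrix(pulse, n_taps):
    """Build S (n_taps x n_samples) so that h @ S equals np.convolve(h, pulse)."""
    n_samples = n_taps + len(pulse) - 1
    S = np.zeros((n_taps, n_samples))
    for k in range(n_taps):
        S[k, k:k + len(pulse)] = pulse  # row k holds the pulse delayed by k samples
    return S

def predict_clutter(impulse_responses, pulse):
    """Surface clutter at every receiver from one matrix product.

    impulse_responses: (n_rx, n_taps) precomputed surface responses (assumed given).
    Returns an (n_rx, n_samples) array.
    """
    return impulse_responses @ pulse_matrix(pulse, impulse_responses.shape[1])

def subtract_clutter(measured, impulse_responses, pulse):
    """Difference signals left after removing the predicted surface clutter."""
    return measured - predict_clutter(impulse_responses, pulse)

# Synthetic check with made-up numbers: 4 receivers, 50-tap responses, 8-sample pulse.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 50))
pulse = np.hanning(8)
clutter = predict_clutter(H, pulse)

target = np.zeros_like(clutter)
target[2, 20] = 1.0  # a single target echo on receiver 2
residual = subtract_clutter(clutter + target, H, pulse)
```

In this idealized check the residual equals the target echo because the surface responses are known exactly. With measured data, the quality of the subtraction would depend on how closely the simulated surface matches the true one.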
IMAGE FORMATION

Once the scattering data have been collected or generated by solving the forward problem, a postprocessing procedure is needed to produce an image of the illuminated scene [12], [40]. In most cases, when the detection of concealed targets is of interest, the image is 2D and formed over a horizontal plane within an area ahead of the moving FL system [see Figures 2(a) and (b)]. Although the height of the 2D image can be arbitrarily chosen, the capability of penetrating lossy soils at microwave frequencies is on the order of 2–10 cm, which is comparable to the achievable resolution. Therefore, varying the height by a few centimeters will not significantly affect the image quality and target-detection capability.

Several image formation approaches have been proposed in the literature, with a majority being simple adaptations of conventional algorithms used for focusing SAR data. More involved strategies based on electromagnetic formulations of the problem have been presented to account for the presence of the dielectric interface and near-field conditions arising due to shorter distances between the antennas and imaging region of interest. In the following sections, we give an overview of these methods.

MIGRATION

Among the most well-known image formation algorithms, migration has been broadly used to focus GPR data. Migration is a family of imaging techniques that originate from the seismic literature [41], [42]. Over the years, this class of algorithms has been adapted within radar imaging frameworks, including SAR and GPR (see, e.g., [43] and [44]). From a practical viewpoint, the algorithm essentially operates on the scattered field at the receiver to compensate for the different delays encountered by the signal generated by point-like scatterers, which are illuminated within a certain time interval during the movement of the system (the FL-GPR platform in this case).
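The delay compensation just described can be illustrated with a frequency-domain delay-and-sum. The following is a minimal sketch, not any of the cited implementations: it assumes a monostatic antenna, free-space propagation, an ideal point scatterer without spreading loss, and a 2D geometry. The function names `migrate` and `simulate_point_target` and all numerical values are hypothetical.

```python
import numpy as np

C0 = 299_792_458.0  # free-space speed of light (m/s)

def migrate(data, freqs, ant_xy, pix_xy):
    """Delay-and-sum focusing of monostatic stepped-frequency data.

    data   : (n_pos, n_freq) complex scattered field
    freqs  : (n_freq,) frequencies in Hz
    ant_xy : (n_pos, 2) antenna positions in metres
    pix_xy : (n_pix, 2) image pixel positions in metres
    """
    # One-way distance from every antenna position to every pixel.
    dist = np.linalg.norm(ant_xy[:, None, :] - pix_xy[None, :, :], axis=2)
    image = np.zeros(len(pix_xy), dtype=complex)
    for k, f in enumerate(freqs):
        # Undo the two-way propagation phase, then sum over positions.
        phase = np.exp(1j * 4.0 * np.pi * f * dist / C0)
        image += data[:, k] @ phase
    return np.abs(image)

def simulate_point_target(freqs, ant_xy, target_xy):
    """Ideal free-space echo of a unit point scatterer (assumed model)."""
    dist = np.linalg.norm(ant_xy - target_xy, axis=1)
    return np.exp(-1j * 4.0 * np.pi * np.outer(dist, freqs) / C0)

# Hypothetical geometry: 21 antenna positions along x, 33 frequencies.
freqs = np.linspace(0.4e9, 2.0e9, 33)
ant_xy = np.column_stack([np.linspace(-1.0, 1.0, 21), np.zeros(21)])
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 21), np.linspace(1.0, 2.0, 21))
pix_xy = np.column_stack([xs.ravel(), ys.ravel()])

target_xy = pix_xy[200]  # place the scatterer exactly on a grid node
data = simulate_point_target(freqs, ant_xy, target_xy)
image = migrate(data, freqs, ant_xy, pix_xy)
peak_xy = pix_xy[np.argmax(image)]
```

In this idealized setting, the contributions add in phase only at the pixel that coincides with the scatterer, so the image maximum falls there. The soil interface, antenna patterns, and clutter are all ignored in the sketch.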
In radar imaging, the migration algorithm is sometimes assimilated to beamforming approaches since both essentially compensate for the hyperbolic patterns representing raw data in a time-range scattering diagram. Image reconstruction in terms of migrated data can be achieved by numerically implementing a double integral function of time and range, which includes the scattered field and migration operator (see, e.g., [45] and [46]). A number of contributions on the application of migration and beamforming algorithms to GPR data have been proposed. A comprehensive review of these approaches can be found in [17].

MICROWAVE TOMOGRAPHY

GPR imaging methods based on an electromagnetic formulation constitute a so-called inverse problem [12], [47]. Mathematically, the direct problem is well posed: its solution exists, is unique, and depends continuously on the data (see, e.g., [40] and [48]). The problem becomes ill posed when the uniqueness of the solution and/or the continuity of its data dependence do not hold. The latter implies that even a small error in the scattered field (e.g., the presence of additive thermal noise) can cause a considerable error in the reconstruction of the background dielectric characteristics. In practice, a regularization is applied to solve the inverse problem [48]. The main objective of the numerous methods that exist to regularize the inverse problem is to renounce an ideal solution in favor of suitably robust results.

Imaging procedures based on a linear solution of the scattering equation have been shown to be simple and particularly suitable for the processing of GPR data [4], [49]–[51], including FL configurations [23], [33].
These procedures are mainly based on the Born approximation (BA) (see, e.g., [40] and [49] and the references therein), which essentially approximates the internal field of a dielectric object with the incident field, the latter being a known term. By suitably defining the Green’s function of the problem [12], the electromagnetic formulation of the scattering field based on a linear solution allows near-field consideration consistent with the nature of the illumination. We can also describe in the formulation a flat air–soil interface by defining a Green’s function for multilayered media.

Methods based on the inversion of the linear scattering equation are often referred to as microwave tomography approaches [49], which essentially consist of retrieving the unknown profile of a dielectric object, i.e., the contrast function, from the knowledge of the scattered field collected at the receiving antenna. The contrast function is defined as the relative difference between the (complex) permittivity of the target and that of the reference propagation scenario (free space, in the case at hand). By modeling the transmitting antennas as vertically oriented Hertzian dipoles and measuring only the VV-polarization scattered field from the investigation domain $D$, the linear relationship under BA for shallow-buried targets can be expressed as [12], [23]

$$E_s(\mathbf{r}_r, \omega) = -j k_b^2\, \omega \mu_0\, \hat{\mathbf{z}}_0 \cdot \iint_D \mathbf{G}(\mathbf{r}, \mathbf{r}_r, \omega) \cdot \left[ \mathbf{G}(\mathbf{r}, \mathbf{r}_t, \omega) \cdot \hat{\mathbf{z}}_0 \right] \chi(\mathbf{r})\, d\mathbf{r}, \tag{1}$$

where $E_s$ is the VV-polarized scattered field corresponding to angular frequency $\omega$ collected at point $\mathbf{r}_r$, $\chi$ is the unknown scene reflectivity, $\mathbf{G}$ is the free-space dyadic Green’s function, $k_b = \sqrt{\varepsilon_r}\, k_0$ is the wavenumber in the medium, and $k_0 = \omega \sqrt{\varepsilon_0 \mu_0}$ is the free-space wavenumber. The vectors $\mathbf{r}_r$ and $\mathbf{r}_t$ represent the positions of the receive and transmit antennas, respectively; $\mathbf{r}$ denotes a generic point in the image area; and $\hat{\mathbf{z}}_0$ is the unit vector along the vertical direction. The operator “$\cdot$” in (1) represents the dyadic product and is implemented as the usual product between a 3 × 3 matrix and a 3 × 1 vector.

To generate the image, (1) is discretized by means of a conventional method-of-moments approach (i.e., implementing a point-matching procedure) [12]. To limit the computational burden, the linear problem can be simply solved by applying the adjoint operator, which is also known as the backpropagation algorithm (BPA) [52], and solving for the unknown scene reflectivity. That is,

$$\boldsymbol{\chi} = \mathbf{L}_{zz}^{*}\, \mathbf{E}_{s}^{z}, \tag{2}$$

where $\mathbf{L}_{zz}^{*}$ is the adjoint of the discretized linear operator in (1), and $\mathbf{E}_{s}^{z}$ and $\boldsymbol{\chi}$ are stacked vectors representing the collected scattered field data and discretized version of the unknown scene reflectivity, respectively. The spatial map defined by the magnitude of $\boldsymbol{\chi}$ is the tomographic image of $D$.

An alternative, computationally more demanding approach to regularize and solve (1) can be implemented using truncated singular value decomposition (TSVD) [48], [54]. To achieve robustness of the solution against noise and the uncertainties of the parameters of the reference scenario, the inversion is performed by implementing the following equation [23], [48]:

$$\boldsymbol{\chi} = \sum_{n=1}^{N} \frac{1}{\sigma_n} \left\langle \mathbf{E}_{s}^{z}, \mathbf{u}_n \right\rangle \mathbf{v}_n, \tag{3}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product, $\sigma_n$ denotes the singular values (sorted in a decreasing order) of the linear operator $\mathbf{L}_{zz}$, $\mathbf{u}_n$ and $\mathbf{v}_n$ are the singular vectors of $\mathbf{L}_{zz}$, and $N$ is the truncation index, whose choice should ensure a compromise between resolution and smoothness of the reconstruction and the stability of the solution against noise. TSVD belongs to the class of inverse filtering methods [54] and has frequently been applied to process GPR data [33], [49]. A performance comparison between TSVD and BPA for an FL-GPR was conducted in [23].
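The two inversions in (2) and (3) can be compared on a toy problem. The following is a minimal sketch under stated assumptions: the operator `L` is a small synthetic matrix with prescribed, rapidly decaying singular values rather than a discretization of (1), and the function names, noise level, and truncation index are hypothetical choices made only for illustration.

```python
import numpy as np

def backpropagate(L, e_s):
    """Adjoint (BPA) estimate as in (2): chi = L^H e_s."""
    return L.conj().T @ e_s

def tsvd_invert(L, e_s, n_trunc):
    """TSVD estimate as in (3), keeping the n_trunc largest singular values."""
    U, s, Vh = np.linalg.svd(L, full_matrices=False)
    # Inner products <e_s, u_n> scaled by 1 / sigma_n.
    coeffs = (U[:, :n_trunc].conj().T @ e_s) / s[:n_trunc]
    # Weighted sum of the right singular vectors v_n.
    return Vh[:n_trunc].conj().T @ coeffs

# Synthetic ill-conditioned operator (assumed, not derived from (1)).
rng = np.random.default_rng(0)
n_meas, n_pix = 40, 30
U, _ = np.linalg.qr(rng.standard_normal((n_meas, n_pix)))
V, _ = np.linalg.qr(rng.standard_normal((n_pix, n_pix)))
sing_vals = 10.0 ** (-np.arange(n_pix) / 3.0)
L = (U * sing_vals) @ V.T

# Sparse "scene" with two scatterers and slightly noisy data.
chi_true = np.zeros(n_pix)
chi_true[[7, 21]] = [1.0, 0.5]
e_s = L @ chi_true + 1e-4 * rng.standard_normal(n_meas)

chi_bpa = backpropagate(L, e_s)
chi_tsvd = tsvd_invert(L, e_s, n_trunc=10)
chi_full = tsvd_invert(L, e_s, n_trunc=n_pix)  # no truncation

err_tsvd = np.linalg.norm(chi_tsvd - chi_true)
err_full = np.linalg.norm(chi_full - chi_true)
```

Without truncation, the noise components along the smallest singular values are amplified by many orders of magnitude, which is the instability that the truncation index $N$ is meant to control. The adjoint estimate needs no decomposition, which is consistent with its lower computational cost.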
In Figure 3, we reproduce the images generated by the two methods using the near-field numerical data in [23]. The reconstruction capabilities of both schemes were investigated by analyzing the achievable resolution limits and considering the impact of rough surface clutter on image quality. It was shown that the two methods provide comparable imaging capabilities with few differences. More specifically, a microwave inverse imaging approach can provide improved target reconstructions over BPA, specifically enhancing the response of weak targets. On the other hand, BPA provides smoother and cleaner images that are less affected by environmental clutter.

FIGURE 3. Reconstructed images for plastic and metallic targets, both on top of and buried below a rough surface with an rms height equal to 0.8 cm and correlation length of 14.93 cm [23]. The amplitude is normalized to the maximum and expressed in decibels over the interval [–50, 0]. (a) TSVD inversion with a truncation index of N = 180. (b) Adjoint inversion. The range between the strongest and weakest target is around 45 dB for the TSVD and nearly 48 dB for the adjoint method. Axes: x (m) and y (m). (Source: [23].)

In general, BPA is preferred in the case of a large investigation domain and when implementing multiview and multiaperture strategies [19], [36], [53] or multilook incoherent processing [44]. An example of an FL-GPR image achieved with BPA based on a multiaperture strategy, i.e., integration of a certain number of FL-GPR scans selected along the track of the sensor platform [53], is shown in Figure 4.

FIGURE 4. The normalized BPA tomographic reconstruction, on the decibel scale, achieved through the integration of sets of eight FL-GPR apertures. The processed numerical data are described in [37]. The true target positions are indicated with red crosses. Axes: x (m) and y (m). (Source: [53].)

An FL-GPR image based on real data, described in [28] and [29], is depicted in Figure 5. The crosses with label P10 point to the nominal locations of plastic mines buried at a depth of 10 cm.
The image is strongly cluttered with contributions from the rough surface.

The aforementioned tomographic imaging methods are based on free-space approximation, neglecting the presence of the air-to-ground interface and assuming the propagation as occurring in a homogeneous dielectric medium. The performance of the approximate free-space tomographic imaging was contrasted in [55] with that of a tomographic algorithm that accounts for the presence of the actual half-space geometry. The latter implements the spectral representation of the dyadic Green’s function. Using numerical electromagnetic FL-GPR data, the authors in [55] demonstrated that a free-space approximation can lead to a loss of imaging resolution and degradation in the SCR, as compared to its half-space counterpart. The impact of the lower resolution was also observed in the estimated target statistics [53].

FIGURE 5. An FL-GPR image of plastic mines. The crosses point to the nominal buried locations of the targets. The label P10 denotes a plastic mine buried at a depth of 10 cm. As expected, the image is strongly cluttered by the contribution from the rough surface. The blank region indicates where a strong stake (fiducial) return has been masked from the image to make the mine returns more visible. Axes: X distance (m) and Y distance (m). (Source: [30].)

DATA-ADAPTIVE AND COMPRESSIVE SENSING METHODS

A data-adaptive approach for FL-GPR image formation was proposed in [44] (Figure 6). It is based on amplitude and phase estimation and rank-deficient robust Capon beamforming. Twelve evenly spaced scans, each covering 2 m in the down range, were used to form the entire image, which covered 24 m in total. The amplitude- and phase-estimation algorithm in conjunction with the robust Capon beamformer provided a significantly enhanced image quality compared to BPA.

FIGURE 6. Real-data-based, single-look imaging results: (a) a BPA imaging result and (b) the results of a hybrid of amplitude- and phase-estimation algorithm and robust Capon beamformer. Axes: down range (m) and cross range (m); amplitude in decibels. (Source: [44].)

FIGURE 7. A CS image of sparse data without clutter suppression. Axes: range (m) and azimuth (m). (Source: [56].)

Compressive sensing (CS) methods can also be applied to exploit the intrinsic sparsity of the illuminated scene in terms of the number of buried targets. A CS approach was employed in [56] for scene reconstruction using measurements from a MIMO FL-GPR system. Assuming a linear model relating the measured data and the unknown scene reflectivity, the image formation can be posed as a solution to an inverse problem regularized by a sparsity-inducing norm. This framework permits scene reconstruction with spatial and temporal sampling at sub-Nyquist rates. In real environments, even with few targets, there exists strong clutter that populates and subsequently degrades the quality of the reconstructed image. This is because the rough surface clutter in the FL-GPR can be distributed over the entire region. An FL-GPR image from [56], generated by processing real data of a shallow-buried metallic antitank landmine using the CS technique, is depicted in Figure 7. Clearly, without clutter suppression, it
is difficult to distinguish the target from the clutter in the reconstructed image.

CLUTTER-MITIGATION STRATEGIES

The standoff-sensing capability of FL-GPR comes at the expense of the energy backscattered by the illuminated targets. The weak target responses are vulnerable to interference scattering arising from the air–ground interface roughness and subsurface clutter. Therefore, it is imperative to eliminate or significantly reduce the clutter for effective and reliable target detection.

Over many years, considerable attention has been devoted to the suppression of clutter generated by the ground bounce in the down-looking (DL) configuration, wherein the detection of objects buried at large depths (on the order of tens of centimeters) is possible [54], [57]–[63]. Since the ground bounce in DL is typically from a fixed range and has the highest strength, it is conventionally removed by estimating and subtracting the ground return from the measured signals or via time gating [64]. In FL-GPR sensing, however, the rough air–ground interface creates clutter that is essentially distributed over the entire area illuminated by the sensor. Additionally, reflections generated by rocks and other objects lying on the surface above the targets can be the source of strong clutter or false alarms (see, e.g., [53], [65], and [66]). Since the illuminated area in the FL-GPR usually extends beyond the image region where the targets reside, strong clutter can also derive from nearby shrubbery, rocks, and other objects lying on the surface. Because of these factors, clutter-suppression approaches devised for the DL configuration may not directly apply to FL-GPR.

Figure 8 shows an image from [56] obtained by applying BPA to real data corresponding to a shallow-buried landmine in a road 6 m wide, with rocks and shrubs populating the roadside. The various types of clutter are clearly visible in the image. More specifically, in addition to the clutter in the image region, strong azimuth clutter and short-range clutter are also visible. The former is due to large shrubs and on-surface rocks on the side of the road, while the latter is associated with ranges adjacent to the radar system that cause returns with small propagation delays.

FIGURE 8. An FL-GPR image showing different types of clutter. The data are acquired by a vehicle-mounted stepped-frequency FL-GPR virtual aperture radar, and BPA is used to generate the image. Annotated regions: short-range clutter, azimuth clutter, clutter in the reconstruction region, buried landmine, and reconstruction region. Axes: range (m) and azimuth (m). (Source: [56].)

Some research efforts have been devoted to rough surface clutter characterization and reduction in FL-GPR (see, e.g., [20], [56], and [67] and the references therein). One of the first attempts to characterize rough surface clutter in FL-GPR is documented in [68], where plane-wave time-domain scattering from a fixed target in the presence of a rough surface was numerically solved by means of an FDTD algorithm. The authors examined the statistics of the pulse scattered from the surface and applied conventional matched filtering for target detection.

A method based on the scattering solution through physical optics was proposed in [69]. The authors demonstrated that, by analyzing both scattering amplitude and phase as well as employing time-frequency signal representations, it is possible to suppress clutter and improve target-detection performance over conventional approaches based on background subtraction or parameter analysis [70].

An analytical approach was developed in [71] to examine the impact of the rough surface on the detection of buried targets in FL-GPR. This approach quantified the coherent and incoherent components of the cross section of buried targets using physical-optics approximation. The total received signal from the targets and surrounding clutter was determined to consist of three components: a coherent signal (whose phase is well defined and can be tracked) corresponding to the target, an incoherent signal (whose phase is random)
generated by the target, and an incoherent clutter contribution. As such, the problem of subsurface target detection can rely on the identification of a partially coherent broadband signal in the presence of noise. This approach, however, would require the design of a coherent system, which is complicated and expensive. Further, it could fail not only in the presence of strong surface roughness profiles or inhomogeneities but also for weak (i.e., dielectric) target responses, when the useful signal can lose its partially coherent nature. Nonetheless, the main analytical approaches are based on physical-optics scattering and a Gaussian representation of the correlation function of the rough soil; these assumptions, however, may not represent all possible realistic conditions.

To overcome some intrinsic limitations of the analytical approaches and provide a more realistic prediction of backscattering in FL-GPR systems, in both the presence and absence of buried targets, a full-wave solution based on an FDTD modeling of dispersive soil (i.e., described by a frequency-dependent permittivity) was proposed in [72]. This work also developed a statistical analysis of the rough-surface scattering, constituting one of the first attempts at the application of optimum hypothesis testing to solve the problem of the detection of radar returns from buried mines in the presence of rough surface clutter.

The effects of surface clutter on time-reversal-based FL-GPR imaging were investigated numerically in [73], where a large realistic scene consisting of landmines buried under a rough surface was considered. This work emphasized the role of the polarization of the incident wave and impact of the surface parameters on the dynamic range of the radar images comprising both clutter and metallic/dielectric targets. The impact of target orientation was also considered therein.

Following a similar full-wave approach, the authors in [74] characterized clutter in the image domain and proposed a statistical polarimetric approach for the reduction of the rough surface clutter to improve the signal-to-background ratio. Specifically, the method was based on the analysis of the polarimetric coherence of the backscattered signal, which is assumed to be zero for the rough surface clutter and nonzero for human-made discrete targets.

A synthetic aperture near-field beamforming approach was used to reduce clutter for antitank mine detection in [31], which also proposed a statistical analysis of the signal and clutter contributions based on real data from metallic and plastic landmines. This work provided useful insights into the relative intensity of clutter and target echoes, discussing the challenging nature of the detection of plastic materials.

In [36], [37], [44], and [53], the authors demonstrated the advantage of coherently integrating measurements corresponding to multiple consecutive platform positions for rough surface clutter reduction. The approach simply takes advantage of the coherent and static nature (when observed from different spatial positions) of the scattering generated by human-made targets with respect to the rough surface contribution.

In [67] and [75], an alternative approach, based on the coherence factor (CF), was proposed for clutter reduction. The performance of the CF-based approach was quantified in terms of the SCR in the image domain. The approach leveraged the matched filtering formulation of microwave near-field tomographic imaging to define the CF for a multiantenna FL-GPR system. The CF was used to generate a coherence map of the region of interest, which was then applied as a mask to the original tomographic image. Since the CF map assumes small values for low-coherence image regions, which correspond to strong rough surface clutter contributions, the final image has significantly reduced clutter and is more amenable to the implementation of a subsequent target-detection procedure. A comparative example is shown in Figure 9.

FIGURE 9. The CF-based imaging results using FL-GPR numerical data of plastic and metallic landmines: images (a) before clutter suppression and (b) after CF-based enhancement. Axes: x (m) and y (m); amplitude in decibels. (Source: [67].)

In [56], a clutter-suppression method, in conjunction with CS imaging, described in the “Image Formation” section, was designed for a MIMO, array-based FL-GPR. A preprocessing method was proposed for reducing the azimuth and short-range clutter localized in specific regions outside of the image area, as depicted in Figure 8. This was achieved by implementing azimuth filtering on sparse-array data and range-profile domain suppression via an inverse Fourier transform. The clutter-suppressed version of the CS-based image in Figure 7 is depicted in Figure 10, where the impact of the clutter reduction method is clearly visible. The clutter
in the region containing the targets, on the other hand, can be reduced by fine-tuning the regularization parameter associated with the sparsity-based inverse problem. Toward this end, an iterative procedure was implemented in [56] to estimate an optimum regularization parameter in the presence of rough surface clutter, based on the ratio of clean areas within the image with respect to cluttered regions.

FIGURE 10. A CS image of Figure 7 after clutter suppression. Axes: range (m) and azimuth (m). (Source: [56].)

TARGET DETECTION

The presence of rough surface clutter in FL-GPR imagery renders the detection of on-surface and buried targets challenging. Owing to the oblique illumination in the FL configuration, only a small fraction of the transmitted energy is backscattered from the target and collected by the radar receiver. The deeper the burial depth of the target, the weaker the signal return. More importantly, due to the similar dielectric features of plastic targets and the surrounding soil (permittivity on the order of 3–4 for a dry background medium), the scatterers cannot be easily differentiated from clutter in both the spectral and image domains [11]. To this end, innovative statistical and spectral approaches have been devised in the literature to offer reliable target detection in FL-GPR applications. In the following sections, we group these approaches based on statistical and spectral methods.

STATISTICAL DETECTORS

In [28] and [29], the effectiveness of two statistical signal-processing techniques, namely, the polarimetric whitening filter and generalized likelihood ratio test (LRT), was investigated for different types of targets buried at various depths below the interface.
The capability of these methods to detect metallic targets with high confidence was illustrated. However, an unsatisfactory detection performance for plastic mines was observed due to 1) a mismatch between the ground truth and assumed target and clutter statistics as well as 2) an incomplete exploitation of the target signatures.

A locally adaptive detection method that adjusted the detection criteria automatically and dynamically across different spatial regions of the FL-GPR image was proposed in [76]. In this work, an FL-GPR image was processed with a locally adaptive standard deviation filter to compute the standard deviation of a small neighborhood around each pixel of interest in the image. More specifically, prior to performing target detection, each image pixel value was replaced by the maximum pixel value within a rectangular neighborhood of dimensions equal to 3 m in the cross range and 1.5 m in the down range. Potential targets within the image were identified by performing the following operation:

$$A = \arg_{u,v} \left\{ G_f(u, v) \ge \min\left\{ O_f(u, v),\, -60 \right\} \right\}, \tag{4}$$

where $O_f(u, v)$ denotes the filtered image, $A$ is the set of local-maxima locations, and $G$ denotes the FL-GPR image. An empirical value of –60 dB was selected as the threshold. An example from [76] is depicted in Figure 11, where the associated false-alarm locations are indicated with white crosses. Expectedly, because of the nonoptimal choice of the threshold, the processed image still exhibits a considerable number of false alarms.

An image-domain LRT-based detection strategy was proposed in [53], which exploits the multiview intrinsic nature of the FL-GPR configuration. The multiple views of the scene correspond to measurements from different positions along the platform trajectory. For an LRT detector, the exact statistics of the targets and clutter in the FL-GPR images need to be known a priori.
To this end, clutter and target pixel sets, obtained from the training data, were used to determine the target and clutter statistics. The targets were represented by a three-component Gaussian mixture model, whereas the clutter was found to be Rayleigh distributed. Two different LRT detection strategies were employed for fusion of the multiview images. The first performed simultaneous detection and fusion within the LRT framework under the assumption of independent target and clutter statistics from one viewpoint to another. Mathematically, the pixelwise LRT applied on $N_{im}$ images, $\{X_n(i, j);\ n = 1, 2, \ldots, N_{im}\}$, is given by

$$LR(i, j) = \prod_{n=1}^{N_{im}} \frac{p(X_n(i, j) \mid H_1)}{p(X_n(i, j) \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma, \tag{5}$$

where $p(X_n(i, j) \mid H_0)$ and $p(X_n(i, j) \mid H_1)$ are the conditional probability density functions of the $n$th image under the null (target absent) and alternative (target present) hypotheses, respectively, and data independence across the multiple views is assumed. By comparing the likelihood ratio with a threshold $\gamma$ determined using the Neyman–Pearson theorem [77], a fused binary image $F_f$ can be defined as

$$F_f(i, j) = \begin{cases} 1 & \text{if } LR(i, j) > \gamma \\ 0 & \text{if } LR(i, j) \le \gamma. \end{cases} \tag{6}$$

The second method applied the LRT detector to individual images, followed by fusion of the detected binary images through a pixel-by-pixel multiplication. Since the clutter generates different image-domain signatures when observed from different viewpoints, both strategies take advantage of the clutter diversity provided by the multiple views, though the latter scheme does not require the data independence assumption across the multiple views.

An adaptive version of the LRT detector of [53] was proposed in [78] to allow enhanced multiview detection of low-signature targets in a rough surface clutter environment. To achieve a more accurate estimation of the image-domain statistics, the target and clutter distributions were iteratively adjusted by means of a two-step procedure. The first step aimed at separating the image into target and clutter regions, whereas the second step used the extracted target and clutter regions to update the target and clutter statistics. This process was repeated until convergence was achieved. A binary image from [78], corresponding to the image presented in Figure 4, is reported in Figure 12. In [78], it was shown that an adaptive detector can outperform its nonadaptive counterpart in terms of the false-alarm rates while providing comparable detection performance.

A robust LRT detector, under the assumption of independent viewpoints, was proposed in [79] for multiview FL-GPR imaging. Instead of modeling the distributions of the target and clutter pixels with parametric families, a band of feasible probability densities under each hypothesis was constructed using training data.
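The two multiview fusion strategies built on (5) and (6) can be sketched as follows. This is an illustration under simplifying assumptions, not the detector of [53]: the target density is a single Gaussian rather than a three-component mixture, the clutter is Rayleigh, and the function names, parameter values, and threshold are hypothetical.

```python
import numpy as np

def rayleigh_pdf(x, sigma):
    """Rayleigh density, used here as the clutter model under H0."""
    return (x / sigma**2) * np.exp(-x**2 / (2.0 * sigma**2))

def gaussian_pdf(x, mean, std):
    """Single Gaussian density, a stand-in for the target model under H1."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def fused_lrt(images, sigma_clutter, target_mean, target_std, gamma):
    """Joint detection and fusion as in (5)-(6), evaluated in the log domain."""
    eps = 1e-300  # guards the logarithm against zero-valued densities
    log_lr = np.zeros(images.shape[1:])
    for x in images:
        p1 = gaussian_pdf(x, target_mean, target_std)
        p0 = rayleigh_pdf(x, sigma_clutter)
        log_lr += np.log(p1 + eps) - np.log(p0 + eps)
    return (log_lr > np.log(gamma)).astype(np.uint8)

def detect_then_fuse(images, sigma_clutter, target_mean, target_std, gamma):
    """Second strategy: per-view LRT, then pixel-by-pixel multiplication."""
    fused = np.ones(images.shape[1:], dtype=np.uint8)
    for x in images:
        fused *= fused_lrt(x[None], sigma_clutter, target_mean, target_std, gamma)
    return fused

# Hypothetical data: four views of Rayleigh clutter with one persistent target.
rng = np.random.default_rng(1)
views = rng.rayleigh(scale=0.1, size=(4, 32, 32))
views[:, 10, 12] = 1.0

joint = fused_lrt(views, 0.1, 1.0, 0.2, gamma=1.0)
separate = detect_then_fuse(views, 0.1, 1.0, 0.2, gamma=1.0)
```

With a unit threshold, every pixel declared by the multiply-after-detection scheme is also declared by the joint scheme, since a product of ratios that each exceed one also exceeds one. In practice, the threshold would be set from a desired false-alarm rate rather than fixed at one.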
FIGURE 11. An FL-GPR-processed image in [76]. The × symbol indicates false alarms, + indicates fiducial alarm, and a circle indicates a target. The panels correspond to two different regions along a track that have slightly different lengths: (a) –285 to –275 m and (b) –216 to –206 m. Axes: along array (m) and along track (m). (Source: [76].)

FIGURE 12. The detection result for the image presented in Figure 4 obtained through the adaptive procedure [78]. The red areas indicate the detected target regions, while the black areas represent false alarms. Axes: x (m) and y (m). (Source: [78].)

FIGURE 13. The detection result for the image presented in Figure 4 obtained using the robust LRT detector. Axes: x (m) and y (m). (Source: [79].)

The detector was then designed such that it minimized the maximum error probability for all feasible density pairs within the two bands. This relaxed the strong assumptions about the clutter and noise distributions, rendering the detector robust against statistical model deviations. The minimax approach is critical in cases where accurate estimation of the distribution of the background clutter may be challenging. A binary image from [79], corresponding to the image of Figure 4, is depicted in Figure 13. It was demonstrated that,
as compared to detectors based on parametric models, robust detectors can lead to significantly reduced false-alarm rates, particularly in cases where there is a mismatch between the assumed model and true distributions. Both the robust and parametric detectors were extended to incorporate statistical dependence between multiview images via a copula-based function in [80].

SPECTRAL APPROACHES

To improve the detection performance of plastic objects, a time-frequency approach was proposed in [30]. The detection problem was conventionally formulated by considering a signal corrupted by interference. To deal with the nonstationary nature of both the signal and clutter, the authors employed time-frequency distribution to provide temporal localization of the signal spectral components. The detector considered the signal time-frequency representation based on the Choi–Williams distribution or, equivalently, the ambiguity function and applied discriminant features extracted using principal component analysis plus the linear discriminant method [81], [82]. The effectiveness of this approach and the employed detector was demonstrated using experimental results.

Frequency subband processing was used in [83], together with co- and cross-polarized signals, for enhanced target-detection performance in FL-GPR sensing. Images were formed using one wide subband and four narrow-frequency subbands within a 2.5-GHz signal bandwidth to analyze the frequency dependency of landmines and clutter. An FL-GPR image, corresponding to the copolarized (VV) signal over multiple subbands, is shown in Figure 14. It is evident that the clutter is particularly strong, but its distribution changes over the frequency bands considered. On this basis, a number of features, including the magnitude, local contrast, ratio between copolarized and cross-polarized signals, and features of polarimetric decompositions, were extracted at each narrow-frequency subband and employed to assess their role in target detection. For this purpose, the authors considered a Fisher’s linear discriminant (FLD)-based classifier, which can be mathematically described as follows. With $C_1$ and $C_2$ representing the “false-alarm” and “target” classes, respectively, and given $N$ training feature vectors $\{\mathbf{x}_n,\ n = 1, \ldots, N\}$ that have been labeled as $C_1$ and $C_2$, FLD finds the projection direction in the feature space that maximizes the class separation

$$\mathbf{y}_{\mathrm{FLD}} = (\mathbf{m}_2 - \mathbf{m}_1)\, \mathbf{S}_w^{-1}, \tag{7}$$

where $\mathbf{m}_i$, $i = 1, 2$, is the mean feature vector of the $i$th class and $\mathbf{S}_w$ is the within-class scatter matrix, defined as

$$\mathbf{S}_w = \frac{1}{N} \left[ \sum_{\mathbf{x}_n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{\mathbf{x}_n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T \right]. \tag{8}$$

When an unlabeled testing data point is collected, its confidence value is determined by the projection of its feature vector on $\mathbf{y}_{\mathrm{FLD}}$, and it is classified by means of simple thresholding.

FIGURE 14. An example of images achieved with the FL-GPR system in [83] keeping unchanged the illuminated scene and polarization (VV) but exploiting different frequency subbands: (a) 0.8–2.8, (b) 0.75–1.35, (c) 1.25–1.85, (d) 1.75–2.35, and (e) 2.25–2.85 GHz. The black circle indicates the true target location. Axes: cross-track range (m) and along-track range (m), with the vehicle direction along track. (Source: [83].)

In [84], the authors performed target detection using space–wavenumber processing and a feature-based method, employing data measured by means of a vehicle-mounted FL-GPR equipped with a MIMO array. The approach was applied in the image domain and relied on the definition of a bistatic scattering function associated with selected pixels. A set of images achieved with different incident and scattering angles was used to estimate the bistatic scattering function. Experimental results demonstrated that the proposed method can offer an efficient feature vector for landmine discrimination.

An original approach to process measured data collected at the U.S. Army test site using the radar system
developed by PSI was proposed in [85]. The method relied on the definition of a set of spatial lanes in the radar image. The identification of potential targets was first independently performed in each subregion, and they were then tracked across the subregions. Weighted averages of the corresponding geometrical features were evaluated, and the target persistence across the spatial regions was used to reduce the false-alarm rates. Targets appearing in a limited number of lanes were removed as part of the detection scheme. An analysis of the spectral features extracted from the scattered signal, with the goal to improve the performance of buried explosive hazard detection, was provided in [86]. Natural resonant frequency and polarization features of improvised explosive devices were examined in [87] for FL-GPR. In [76], a spectrum-based classifier was proposed that rejected false alarms by classifying each potential target based on its spatial frequency spectrum. A method based on the use of narrow-band and full-band radar processing, coupled with a classifier exploiting complex-valued Gabor filter responses, was proposed in [66]. Full-band radar images yielded high spatial resolution, while narrow-band images provided the means to detect targets with unique signatures. A composite confidence map was implemented to detect local maxima and isolate potential target pixels.

FUTURE TRENDS
A completely different radar-based approach from FL-GPR to road mapping and clearing is to employ a traditional airborne, side-looking SAR system flying on a track parallel to the road and imaging the ground area of interest. This approach has the advantages of a high coverage rate as well as the fact that the platform does not come in contact with the in-road hazard. Nevertheless, these radar systems operate at relatively long ranges (at least 1 km) and, consequently, require larger transmitted power and longer coherent integration intervals to achieve high image resolution.
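As a rough illustration of why long ranges call for longer coherent integration, the sketch below evaluates the textbook narrow-angle stripmap SAR relation L ≈ λR / (2δ), which gives the synthetic aperture length L needed to reach a cross-range resolution δ at range R. The 1-GHz center frequency, the 0.5-m resolution, the 10-m comparison range, and the function name are illustrative assumptions, not parameters of any system cited here.

```python
# Illustrative sketch only: all numerical values below are assumptions.
C = 299_792_458.0  # speed of light in vacuum (m/s)


def synthetic_aperture_length(freq_hz, range_m, cross_range_res_m):
    """Aperture length L = lambda * R / (2 * delta) for narrow-angle stripmap SAR."""
    wavelength = C / freq_hz
    return wavelength * range_m / (2.0 * cross_range_res_m)


FREQ_HZ = 1.0e9       # assumed center frequency
RESOLUTION_M = 0.5    # assumed cross-range resolution

for label, range_m in (("long range, 1 km", 1000.0), ("short range, 10 m", 10.0)):
    length = synthetic_aperture_length(FREQ_HZ, range_m, RESOLUTION_M)
    print(f"{label}: required aperture length = {length:.1f} m")
```

Under this approximation the required aperture grows linearly with range, so a platform at 1 km must fly, and integrate coherently over, a track 100 times longer than one at 10 m for the same cross-range resolution.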
Both modeling and experimental studies have demonstrated the difficulty of detecting weak buried targets (such as plastic landmines) by side-looking GPR systems, even in mild clutter conditions [21], [88].

Recent advances in radar sensors based on unmanned aerial vehicle (UAV) platforms promise to bring together the advantages of both types of aforementioned imaging systems [89]–[96]. Thus, a UAV-based SAR system can operate at small elevations and ranges, requiring a small amount of power and a short synthetic aperture length. At the same time, as the flying platform does not come in contact with the ground, the standoff range requirement relevant to ground-vehicle-borne systems does not apply in this case. A UAV-mounted radar system is likely to be significantly less costly to build and operate than any of the current airborne or FL-GPR systems, while the excellent control of UAV flight trajectories makes motion compensation easier to accomplish. The radar antenna can be readily configured as down-looking, side-looking, or FL, depending on the imaging application, while antenna arrays can be combined with the synthetic aperture created by platform motion. Another possible scenario is using a network of distributed UAV-based SAR imaging sensors working cooperatively to map a ground area. Vehicle-borne radar systems will still have a role in explosive hazard detection; however, we envision that future trends will favor employing unmanned ground vehicle platforms for this application.

One potential capability of FL-GPR systems that has rarely been exploited in practice thus far is the creation of 3D images of the scene under investigation.
By adding the height dimension to a radar map, one can infer the depth of a buried target and partially mitigate the surface clutter, which now affects only a limited part of the image volume. Additional target features inferred from the 3D map can also be useful in target-classification applications. One example of a GPR system operating on the FL SAR principle and capable of creating 3D images is the iRadar, developed by the Lawrence Livermore National Laboratory [97]. While the iRadar antenna array is mounted close to the ground and provides only a modest standoff range, one can envision a system equipped with a similar array mounted on a UAV flying at a height of 1–2 m over the road and mapping the area of interest, including the underground volume [96]. Although the surface clutter becomes less an issue in the detection of buried targets in 3D images, underground inhomogeneities created by different soil layers, rocks, roots, and so on represent a new source of clutter that may degrade the detection and classification performance. In addition to buried object detection, FL radar technology is finding use in other emerging applications. One example is attempting to exploit the 3D imaging capability of an FL radar to assist helicopter landing in degraded visual environments (DVEs), such as those created by brownout conditions. A prototype radar system based on this principle is currently being developed at ARL [98]. To achieve an angular resolution of 0.1–0.2°, comparable to optical sensors such as lidar, this radar system must operate in the millimeter-wave regime (Ka band). The wave attenuation through dust, sand storms, or other DVE conditions at these frequencies (less than 1 dB/km one way) is still low enough to provide a see-through capability, which is not available in infrared, optical, or lidar sensors. 
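To see why such fine angular resolution pushes the design toward millimeter waves, the sketch below applies the common diffraction-limited rule of thumb θ ≈ λ / D to estimate the aperture width D needed for a given angular resolution θ. The 0.15° target, the 35-GHz Ka-band frequency, the 3-GHz comparison frequency, and the function name are assumptions chosen for illustration; the actual design in [98] may rely on different relations, for example two-way or synthetic-aperture gains.

```python
import math

# Illustrative sketch only: frequencies and resolution target are assumptions.
C = 299_792_458.0  # speed of light in vacuum (m/s)


def required_aperture_m(freq_hz, angular_res_deg):
    """Aperture width D = lambda / theta (theta in radians), diffraction-limited rule of thumb."""
    wavelength = C / freq_hz
    return wavelength / math.radians(angular_res_deg)


TARGET_RES_DEG = 0.15  # assumed value inside the 0.1-0.2 degree interval quoted in the text

for label, freq_hz in (("3 GHz", 3.0e9), ("35 GHz (Ka band)", 35.0e9)):
    width = required_aperture_m(freq_hz, TARGET_RES_DEG)
    print(f"{label}: aperture width of about {width:.1f} m")
```

With these assumed numbers, the aperture needed at 3 GHz is tens of meters wide, whereas at 35 GHz it shrinks to a few meters, which is closer to what a helicopter could plausibly carry.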
The 3D map of the landing zone obtained by the FL SAR would be interpreted in terms of natural or human-made terrain features, and this information would be passed to the pilot via a helmet-mounted display to assist in deciding whether the landing zone is safe. The principle of the helicopter-mounted FL SAR system for 3D landing zone mapping is explained in Figure 15. The system is equipped with a 2-m-wide front-bumper-type linear antenna array, which provides resolution in the azimuth direction, while the forward motion of the platform at constant height creates a synthetic aperture with sufficient elevation look-angle diversity to offer resolution in the vertical direction. The radar waveform bandwidth (between 0.5 and 1 GHz) provides resolution in the down-range direction. To date, several studies based on computer simulations have demonstrated the feasibility of this concept and emphasized some of the major challenges associated with it.

FIGURE 15. (a) A schematic representation of the helicopter-borne FL SAR system for 3D landing-zone imaging, showing the proposed configuration involving a linear antenna array. (b) The equivalence of this imaging system with a 2D antenna array. Panel labels: 1D linear array, forward motion, 2D synthetic array, equivalent 2D array, obstacle.

CONCLUSIONS
In this article, we presented an overview of image formation and subsurface target-detection techniques using FL-GPR. This emerging technology has gained increasing interest due to its humanitarian and military applications while maintaining operator safety. We provided a balanced account of existing methods and discussed their respective advantages and limitations. The presented image formation approaches included conventional back-projection, microwave tomographic techniques, and CS-based methods, with the last of these assuming the underlying scene to be sparse. We also outlined different approaches to deal with clutter arising from the rough ground interface.
Finally, we detailed statistical and spectral techniques for landmine detection in FL-GPR applications. While a broad range of imaging, target-detection, and clutter-suppression techniques have been proposed in the literature, there are still open issues, particularly associated with the detection of plastic targets and real-time operation under various challenging realistic conditions, that require further investigations. New machine learning algorithms could also be devised for target classification, especially in the presence of strong clutter.

The future trend of radar deployment on unmanned platforms (both aerial and ground based) brings forth new challenges. From an implementation perspective, the antenna design is a critical issue, especially when using antenna arrays with the limited space available on an unmanned aerial platform. At the preferred operational frequency range of 0.3–3 GHz, depending on the radiation performance, the antennas can be quite bulky and heavy. Compact designs using metamaterial-based UWB conformal antenna technology are a promising potential solution to the implementation challenges. From an algorithmic perspective, multiplatform data fusion strategies under both communication and computation constraints could be devised to achieve enhanced performance using a distributed network of unmanned platforms. In short, research into devising effective solutions for addressing the aforementioned challenges is critical to providing performance guarantees.

AUTHOR INFORMATION
Davide Comite (davide.comite@uniroma1.it) received his master’s degree (cum laude) in communications engineering and Ph.D. degree in electromagnetics and mathematical models for engineering from the Sapienza University of Rome, Rome, Italy, in 2011 and 2015, respectively. He was a visiting Ph.D.
student with the Institute of Electronics and Telecommunications of Rennes, University of Rennes 1, Rennes, France, from March to June 2014 and a postdoctoral researcher with the Center for Advanced Communications, Villanova University, Villanova, Pennsylvania, USA, in 2015. He is currently a postdoctoral researcher with the Sapienza University of Rome, Rome, 00184, Italy. His research interests include the study of scattering from natural surfaces as well as GNSS reflectometry over the land, microwave imaging and object detection performed through ground-penetrating radar, modeling of the radar signature in forward-scatter radar systems, study and design of leaky-wave antennas, and generation of nondiffracting waves and pulses. He has been a recipient of a number of awards at national and international conferences. Most recently, he received a Young Scientist Award for the General Assembly and Scientific Symposium of the International Union
of Radio Science (URSI) 2020. In 2019 and 2020, the IEEE Antennas and Propagation Society recognized him as an Outstanding Reviewer for IEEE Transactions on Antennas and Propagation, and he was honored as the best reviewer for IEEE Journal of Selected Topics in Applied Earth Observation and Remote Sensing in 2020. He is an associate editor of Journal of Engineering and Microwaves, Antennas, and Propagation, both by the Institution of Engineering and Technology, and IEEE Access. He is a Senior Member of IEEE and of URSI.

Fauzia Ahmad (fauzia.ahmad@temple.edu) received her Ph.D. degree in electrical engineering from the University of Pennsylvania in 1997. She is an associate professor in the Department of Electrical and Computer Engineering, Temple University, Philadelphia, Pennsylvania, 19122, USA. Prior to joining Temple University in 2016, she was a research professor and the director of the Radar Imaging Lab at the Center for Advanced Communications, Villanova University. She has more than 250 publications in the areas of array and statistical signal processing, computational imaging with applications in radar and ultrasonics, compressive sensing, machine learning, radar signal processing, and structural health monitoring. She is a Fellow of IEEE and of the Society of Photo-Optical Instrumentation Engineers (SPIE). She is the past chair of the IEEE Dennis J. Picard Medal for Radar Technologies and Applications Committee and SPIE Compressive-Sensing Conference series. She currently chairs the SPIE Big Data Conference series. She is a member of the Sensor Array and Multichannel Technical Committee of the IEEE Signal Processing Society, member of the Computational Imaging Technical Committee of the IEEE Signal Processing Society, and member of the Electrical Cluster of the Franklin Institute Committee on Science and the Arts. She also serves as an associate editor of IEEE Transactions on Computational Imaging and IET Radar, Sonar, & Navigation.

Moeness G.
Amin (moeness.amin@villanova.edu) received his Ph.D. degree in 1984 from the University of Colorado, Boulder. Since 1985, he has been on the faculty of the Department of Electrical and Computer Engineering at Villanova University, Villanova, Pennsylvania, 19085, USA, where he is now a professor and director of the Center for Advanced Communications. He is a Fellow of IEEE, the Society of Photo-Optical Instrumentation Engineers, the Institution of Engineering and Technology (IET), and the European Association for Signal Processing (EURASIP). He is a recipient of the U.S. Fulbright Distinguished Chair in Advanced Science and Technology, Alexander von Humboldt Research Award, IET Achievement Medal, IEEE Warren D. White Award for Excellence in Radar Engineering, IEEE Signal Processing Society Technical Achievement Award, NATO Scientific Achievement Award, EURASIP Technical Achievement Award, and IEEE Third Millennium Medal. He was a Distinguished Lecturer of the IEEE Signal Processing Society. He has more than 850 journal and conference publications in the areas of wireless communications, time–frequency analysis, sensor array processing, satellite navigation, ultrasound imaging, and radar signal processing. He is a recipient of 12 best paper awards. He is the editor of three books from CRC Press: Through-the-Wall Radar Imaging (2011), Compressive Sensing for Urban Radar (2014), and Radar for Indoor Monitoring (2017). He serves on the editorial board of Proceedings of the IEEE.

Traian Dogaru (traian.v.dogaru.civ@mail.mil) received his degree in electrical engineering from the Polytechnic University of Bucharest, Bucharest, Romania, in 1990 and his M.S. and Ph.D. degrees in electrical engineering from Duke University, Durham, North Carolina, USA, in 1997 and 1999, respectively. He was a research associate with Duke University, developing algorithms for electromagnetic field modeling, between 1999 and 2001.
He has been with the U.S. Army Research Laboratory, Adelphi, Maryland, 20783, USA, since 2001. His research interests include radar signature modeling, computational electromagnetics, signal processing, radar imaging and detection of concealed targets, sensing through the wall, foliage penetration, and ground-penetrating radar, as well as applying advanced computational modeling techniques to the analysis of complex sensing scenarios. He is a Member of IEEE.

REFERENCES
[1] A. P. Annan, “GPR: History, trends, and future developments,” Subsurface Sens. Technol. Appl., vol. 3, no. 4, pp. 253–270, 2002. doi: 10.1023/A:1020657129590. [2] D. J. Daniels, “A review of GPR for landmine detection,” Sens. Imag., vol. 7, no. 3, p. 90, 2006. doi: 10.1007/s11220-006-0024-5. [3] M. Sato, “Principles of mine detection by ground-penetrating radar,” in Anti-personnel Landmine Detection for Humanitarian Demining. Berlin: Springer-Verlag, 2009, pp. 19–26. [4] I. Catapano, G. Gennarelli, G. Ludeno, F. Soldovieri, and R. Persico, “Ground-penetrating radar: Operation principle and data processing,” in Wiley Encyclopedia Elect. Electron. Eng. Hoboken, NJ: Wiley, 2019, pp. 1–23. [5] L. Robledo, M. Carrasco, and D. Mery, “A survey of land mine detection technology,” Int. J. Remote Sens., vol. 30, no. 9, pp. 2399–2410, 2009. doi: 10.1080/01431160802549435. [6] W. R. Scott, K. Kim, G. D. Larson, A. C. Gurbuz, and J. H. McClellan, “Combined seismic, radar, and induction sensor for landmine detection,” J. Acoust. Soc. Amer., vol. 123, no. 5, pp. 3042–3042, 2008. doi: 10.1121/1.2932726. [7] C. R. Ratto, P. A. Torrione, and L. M. Collins, “Exploiting ground-penetrating radar phenomenology in a context-dependent framework for landmine detection and discrimination,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 5, pp. 1689–1700, 2010. doi: 10.1109/TGRS.2010.2084093. [8] M. G.
Fernández et al., “Synthetic aperture radar imaging system for landmine detection using a ground penetrating radar on board a unmanned aerial vehicle,” IEEE Access, vol. 6, pp. 45,100–45,112, 2018. [9] S. Vitebskiy and L. Carin, “Resonances of perfectly conducting wires and bodies of revolution buried in a lossy dispersive half-space,” IEEE Trans. Antennas Propag., vol. 44, no. 12, pp. 1575–1583, 1996. doi: 10.1109/8.546243.
[10] I. J. Gupta, A. van der Merwe, and C.-C. Chen, “Extraction of complex resonances associated with buried targets,” in Proc. SPIE Detection Remediation Technol. Mines Minelike Targets III, 1998, vol. 3392, pp. 1022–1032. doi: 10.1117/12.324149. [11] L. Carin, N. Geng, M. McClure, J. Sichina, and L. Nguyen, “Ultrawide-band synthetic-aperture radar for mine-field detection,” IEEE Antennas Propag. Mag., vol. 41, no. 1, pp. 18–33, 1999. doi: 10.1109/74.755021. [12] R. Persico, Introduction to Ground Penetrating Radar: Inverse Scattering and Data Processing. Hoboken, NJ: Wiley, 2014. [13] M. El-Shenawee and C. M. Rappaport, “Monte Carlo simulations for clutter statistics in minefields: AP-mine-like-target buried near a dielectric object beneath 2-D random rough ground surfaces,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 6, pp. 1416– 1426, 2002. doi: 10.1109/TGRS.2002.800275. [14] A. C. Gurbuz, J. H. McClellan, and W. R. Scott, “A compressive sensing data acquisition and imaging method for stepped frequency GPRs,” IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2640–2650, 2009. doi: 10.1109/TSP.2009.2016270. [15] D. Comite, A. Galli, I. Catapano, and F. Soldovieri, “Advanced imaging for down-looking contactless GPR systems,” Appl. Comput. Electromagn. Soc. J., vol. 33, no. 7, pp. 1–4, 2017. [16] G. Ludeno, G. Gennarelli, S. Lambot, F. Soldovieri, and I. Catapano, “A comparison of linear inverse scattering models for contactless GPR imaging,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 10, pp. 7305–7316, 2020. doi: 10.1109/TGRS.2020.2981884. [17] R. Solimene, I. Catapano, G. Gennarelli, A. Cuccaro, A. Dell’Aversano, and F. Soldovieri, “SAR imaging algorithms and some unconventional applications: A unified mathematical overview,” IEEE Signal Process. Mag., vol. 31, no. 4, pp. 90–98, 2014. doi: 10.1109/MSP.2014.2311271. [18] T. 
Dogaru, “NAFDTD: A near-field finite difference time domain solver,” Army Research Lab., Sensors and Electronic Devices Directorate, Adelphi, MD, Tech. Rep. ARL-TR-6110, 2012. [19] D. Comite, F. Ahmad, M. Amin, and T. Dogaru, “Multi-aperture processing for improved target detection in forward-looking GPR applications,” in Proc. Eur. Conf. Antennas Propag., 2016, pp. 1–3. [20] M. M. Tajdini, B. Gonzalez-Valdes, J. A. Martinez-Lorenzo, A. W. Morgenthaler, and C. M. Rappaport, “Real-time modeling of forward-looking synthetic aperture ground penetrating radar scattering from rough terrain,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 5, pp. 2754–2765, May 2019. doi: 10.1109/TGRS.2018.2876808. [21] L. Nguyen, K. Ranney, K. Sherbondy, and A. Sullivan, “Detection of buried in-road IED targets using airborne ultra-wideband (UWB) low-frequency SAR,” in Proc. 60th MSS Tri-Service Radar Symp., 2014. [22] J. A. Camilo, J. M. Malof, P. A. Torrione, L. M. Collins, and K. D. Morton Jr., “Clutter and target discrimination in forward-looking ground penetrating radar using sparse structured basis pursuits,” in Proc. SPIE Detection Sens. Mines, Explosive Objects, and Obscured Targets XX, 2015, vol. 9454, p. 94540V. doi: 10.1117/12.2176491. [23] F. Soldovieri, G. Gennarelli, I. Catapano, D. Liao, and T. Dogaru, “Forward-looking radar imaging: A comparison of two data processing strategies,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 2, pp. 562–571, 2016. doi: 10.1109/JSTARS.2016.2543840. [24] A. W. Morgenthaler and C. M. Rappaport, “Scattering from lossy dielectric objects buried beneath randomly rough ground: Validating the semi-analytic mode matching algorithm with 2-D FDFD,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 11, pp. 2421–2428, 2001. doi: 10.1109/36.964978. [25] J. T. Johnson and R. J. Burkholder, “Coupled canonical grid/discrete dipole approach for computing scattering from objects above or below a rough interface,” IEEE Trans.
Geosci. Remote Sens., vol. 39, no. 6, pp. 1214–1220, 2001. doi: 10.1109/36.927443. [26] D. Comite, A. Galli, I. Catapano, and F. Soldovieri, “The role of the antenna radiation pattern in the performance of a microwave tomographic approach for GPR imaging,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 10, pp. 4337–4347, 2017. doi: 10.1109/JSTARS.2016.2636833. [27] J. Kositsky and P. Milanfar, “Forward-looking high-resolution GPR system,” in Proc. SPIE Detection and Remediation Technol. Mines Minelike Targets IV, 1999, vol. 3710, pp. 1052–1062. [28] J. Kositsky and C. A. Amazeen, “Results from a forward-looking GPR mine detection system,” in Proc. SPIE Detection and Remediation Technol. Mines and Minelike Targets VI, 2001, vol. 4394, pp. 700–711. [29] J. Kositsky, R. Cosgrove, C. A. Amazeen, and P. Milanfar, “Results from a forward-looking GPR mine detection system,” in Proc. SPIE Detection and Remediation Technol. Mines Minelike Targets VII, 2002, vol. 4742, pp. 206–217. [30] Y. Sun and J. Li, “Time–frequency analysis for plastic landmine detection via forward-looking ground penetrating radar,” Inst. Elect. Eng. Proc. Radar, Sonar, Navigation, vol. 150, no. 4, pp. 253–261, 2003. [31] M. R. Bradley, T. R. Witten, M. Duncan, and R. McCummins, “Anti-tank and side-attack mine detection with a forward-looking GPR,” in Proc. SPIE Detection and Remediation Technol. Mines Minelike Targets IX, 2004, vol. 5415, pp. 421–432. [32] M. Ressler, L. Nguyen, F. Koenig, D. Wong, and G. Smith, “The Army Research Laboratory (ARL) synchronous impulse reconstruction (SIRE) forward-looking radar,” in Proc. SPIE Unmanned Systems Technology IX, 2007, vol. 6561, pp. 35–46. [33] I. Catapano, A. Affinito, A. Del Moro, G. Alli, and F. Soldovieri, “Forward-looking ground-penetrating radar via a linear inverse scattering approach,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5624–5633, 2015. doi: 10.1109/TGRS.2015.2426502. [34] B. R. Phelan, K. D. Sherbondy, K. I. 
Ranney, and R. M. Narayanan, “Design and performance of an ultra-wideband stepped-frequency radar with precise frequency control for landmine and IED detection,” in Proc. SPIE Radar Sensor Technology XVIII, 2014, vol. 9077, pp. 53–64. [35] B. R. Phelan, K. I. Ranney, K. A. Gallagher, J. T. Clark, K. D. Sherbondy, and R. M. Narayanan, “Design of ultrawideband stepped-frequency radar for imaging of obscured targets,” IEEE Sensors J., vol. 17, no. 14, pp. 4435–4446, 2017. doi: 10.1109/JSEN.2017.2707340. [36] T. Ton, D. Wong, and M. Soumekh, “ALARIC forward-looking ground penetrating radar system with standoff capability,” in IEEE Int. Conf. Wireless Information Technol. Syst., 2010, pp. 1–4.
[37] D. Liao, T. Dogaru, and A. Sullivan, “Large-scale, full-wave-based emulation of step-frequency forward-looking radar imaging in rough terrain environments,” Sens. Imag., vol. 15, no. 1, p. 88, 2014. [38] D. Liao and T. Dogaru, “Full-wave characterization of rough terrain surface scattering for forward-looking radar applications,” IEEE Trans. Antennas Propag., vol. 60, no. 8, pp. 3853–3866, 2012. doi: 10.1109/TAP.2012.2201076. [39] M. M. Tajdini, A. W. Morgenthaler, and C. M. Rappaport, “Multiview synthetic aperture ground-penetrating radar detection in rough terrain environment: A real-time 3-d forward model,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 5, pp. 3400–3410, 2019. doi: 10.1109/TGRS.2019.2954776. [40] M. Pastorino, Microwave Imaging, vol. 208. Hoboken, NJ: Wiley, 2010. [41] G. A. McMechan, “A review of seismic acoustic imaging by reverse-time migration,” Int. J. Imag. Syst. Technol., vol. 1, no. 1, pp. 18–21, 1989. doi: 10.1002/ima.1850010104. [42] C. Özdemir, Ş. Demirci, E. Yiğit, and B. Yilmaz, “A review on migration methods in b-scan ground penetrating radar imaging,” Math. Problems Eng., vol. 2014, pp. 1–17, June 2014. doi: 10.1155/2014/280738. [43] J. M. Lopez-Sanchez and J. Fortuny-Guasch, “3-D radar imaging using range migration techniques,” IEEE Trans. Antennas Propag., vol. 48, no. 5, pp. 728–737, 2000. doi: 10.1109/8.855491. [44] Y. Wang, Y. Sun, J. Li, and P. Stoica, “Adaptive imaging for forward-looking ground penetrating radar,” IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 3, pp. 922–936, 2005. doi: 10.1109/ TAES.2005.1541439. [45] J. Gazdag, “Wave equation migration with the phase-shift method,” Geophysics, vol. 43, no. 7, pp. 1342–1351, 1978. doi: 10.1190/1.1440899. [46] I. Catapano, F. Soldovieri, G. Alli, G. Mollo, and L. A. Forte, “On the reconstruction capabilities of beamforming and a microwave tomographic approach,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 12, pp. 2369–2373, 2015. doi: 10.1109/LGRS.2015.2476514. 
[47] W. C. Chew, Waves and Fields in Inhomogeneous Media. Piscataway, NJ: IEEE Press, 1995. [48] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging. Boca Raton, FL: CRC Press, 1998. [49] G. Leone and F. Soldovieri, “Analysis of the distorted Born approximation for subsurface reconstruction: Truncation and uncertainties effects,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 1, pp. 66–74, 2003. doi: 10.1109/TGRS.2002.806999. [50] P. Meincke, “Linear GPR inversion for lossy soil and a planar air-soil interface,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 12, pp. 2713–2721, 2001. doi: 10.1109/36.975005. [51] T. B. Hansen and P. M. Johansen, “Inversion scheme for ground penetrating radar that takes into account the planar air-soil interface,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 1, pp. 496–506, 2000. doi: 10.1109/36.823944. [52] A. Ben-Israel and T. N. Greville, Generalized Inverses: Theory and Applications, vol. 15. New York: Springer Science & Business Media, 2003. [53] D. Comite, F. Ahmad, D. Liao, T. Dogaru, and M. G. Amin, “Multiview imaging for low-signature target detection in rough-surface clutter environment,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 5220–5229, 2017. doi: 10.1109/TGRS.2017.2703820. [54] R. Solimene, A. Cuccaro, A. Dell’Aversano, I. Catapano, and F. Soldovieri, “Ground clutter removal in GPR surveys,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 3, pp. 792–798, 2013. doi: 10.1109/JSTARS.2013.2287016. [55] D. Comite, F. Ahmad, and T. Dogaru, “Performance of free-space tomographic imaging approximation for shallow-buried target detection,” in Proc. IEEE 7th Int. Workshop on Comput. Adv. Multi-Sensor Adaptive Process., 2017, pp. 1–4. [56] J. Yang, T. Jin, X. Huang, J. Thompson, and Z. Zhou, “Sparse MIMO array forward-looking GPR imaging based on compressed sensing in clutter environment,” IEEE Trans. Geosci.
Gaussianizing the Earth
Multidimensional information measures for Earth data analysis

J. EMMANUEL JOHNSON, VALERO LAPARRA, MARÍA PILES, AND GUSTAU CAMPS-VALLS

Digital Object Identifier 10.1109/MGRS.2021.3066260. Date of current version: 6 May 2021. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, DECEMBER 2021. 0274-6638/21 ©2021 IEEE.

Information theory (IT) is an excellent framework for analyzing Earth system data because it enables us to characterize uncertainty and redundancy and is universally interpretable. However, accurately estimating information content is challenging because spatiotemporal data are high-dimensional and heterogeneous and have nonlinear characteristics. In this article, we apply multivariate Gaussianization for probability density estimation, which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology enables us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual information (MI). We demonstrate how IT measures can be applied in various Earth system data analysis problems. First, we show how the method can be used to jointly Gaussianize radar backscattering intensities, synthesize hyperspectral data, and quantify information content in aerial optical images. We also quantify the information content of several variables that describe the soil–vegetation status in agroecosystems and investigate the temporal scales that maximize their shared information under extreme events, such as droughts. Finally, we measure the relative information content of space and time dimensions in remote sensing products and model simulations involving long records
of key variables, such as precipitation, sensible heat (SH), and evaporation. Results confirm the validity of the method, for which we anticipate wide use and adoption. Code and demonstrations of the implemented algorithms and IT measures are provided.

EARTH DATA AND INFORMATION DELUGE

Understanding the spatiotemporal dynamics of Earth system models and observation data is fundamental to monitoring our planet and understanding climate change [1]–[4]. We now face an information deluge from remote sensing platforms that continuously increase the spatial, temporal, and spectral resolution of data sources. Earth system data come in high volumes, are heterogeneous, and are riddled with uncertainty [5], which poses important challenges in analysis, modeling, and understanding. The statistical analysis of remote sensing data and model simulations requires dealing with this large amount of heterogeneous, multivariate, and spatiotemporal material.

Copious amounts of data do not necessarily mean large quantities of information. For example, it is now widely acknowledged that models are often correlated and share common traits, features, and information content. Which features are the most appropriate and representative? How can we best quantify their information content in meaningful units? Essential Earth variables and data products exhibit high levels of redundancy in space and time. So, what is the most appropriate space, time, or spatiotemporal scale one should look at? The same questions arise when trying to assess and choose the most adequate observational variable and biogeophysical parameter for Earth monitoring.

From a purely statistical standpoint, information quantification for Earth and climate data is difficult. IT is the appropriate framework to study information content, uncertainty, and redundancy [6].
The estimation of entropy and MI for discrete and continuous random variables has been addressed through different approaches in the statistics literature [7]–[10]. But the estimation of IT measures for multivariate data is problematic. Some methods, such as those using histograms [6], [11] and nearest neighbors [8]–[10], can be very limiting, as they do not scale well, do not converge to the true measure, and show a high estimation bias [12]. Nevertheless, in the remote sensing and geosciences community, there have been many successful application-driven approaches to overcome this challenge. Examples include studying feature redundancy in image classifiers [13], assessing the maximum number of parameters that can be estimated given a set of observations [14], remote sensing feature extraction and weighting [15], [16], data fusion [17], image registration [18]–[20], synthetic aperture radar (SAR) data characterization [21], [22], and quantifying uncertainty in models and observations [23]. These methods are, however, application-driven, and none has been tested in very-high-dimensional scenarios, which is crucial for data characterization.

All information quantification metrics require a good multivariate density estimator. This is especially problematic in Earth observation (EO) data with moderate- to high-dimensional problems and nonlinear feature relations. These issues affect the classic parametric density estimators based on the exponential family of solutions and mixture distributions as well as nonparametric methods based on histograms, kernel density estimation (KDE), and k-nearest neighbors (kNNs). As an alternative to these traditional methods, there is a new class of techniques called neural density estimators [24], which are parameterized neural networks that approximate densities. They use the "change-of-variables" formula to estimate the densities of inputs and enable one to draw samples of input data.
They have promise, as they have been successfully used in applications related to Earth system sciences, including inverse problems [25] and density estimation [26].

In this article, we look at a particular class of models in the neural density estimation family. Specifically, we introduce the Gaussianization method [27] and a generalized algorithm called rotation-based iterative Gaussianization (RBIG) [28]. RBIG applies a repeated sequence of simple feature-wise Gaussianization transformations and orthogonal rotations until convergence. In each iteration, the total correlation and the non-Gaussianity are reduced and converge toward zero, that is, toward full independence. The learned transformation toward the Gaussian domain is invertible, which enables us to easily synthesize data by inverting samples drawn from the Gaussian domain. The approach is also advantageous because it enables us to estimate IT measures, such as entropy, total correlation, non-Gaussianity, and MI, in high-dimensional data. It is fast and easy to apply and has links to deep neural networks [28]–[30].

MULTIVARIATE GAUSSIANIZATION

PROBABILITY DENSITY FUNCTION ESTIMATION

Most problems in signal and image processing, IT, and machine learning involve the challenging task of multidimensional probability density function (PDF) estimation. A PDF, or simply a density $p(\cdot)$, takes an input $\mathbf{x} \in \mathcal{X}$ and outputs a density value with the following properties: 1) it is nonnegative, $p(\mathbf{x}) \geq 0, \forall \mathbf{x} \in \mathbb{R}^D$, and 2) it integrates to one, $\int_{\mathcal{X}} p(\mathbf{x})\, d\mathbf{x} = 1$. In practice, we usually do not have access to the PDF $p(\cdot)$, but we do have a set of (multivariate) samples drawn from the generating process, $\mathbf{x} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$, from which to estimate the PDF.
Accurate PDF estimation is important because it enables us to 1) calculate the probability of any arbitrary input data point, which accounts for the relative likelihood that the value of the random variable will equal the sample; 2) generate samples $\mathbf{x}' \sim p(\mathbf{x})$ from this distribution, thus facilitating data synthesis, background and support estimation, and anomaly detection; and 3) calculate expectations for functions (or transformations) of arbitrary form $f(\mathbf{x})$ given $p(\mathbf{x})$, i.e., $\mathbb{E}_{\mathbf{x}}[f(\mathbf{x})]$, which enables us to, e.g., characterize a system.
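The three uses listed above can be illustrated with a one-dimensional toy case in which the density is known in closed form. The following is only a minimal sketch using Python's standard library; the choice of a standard normal as a stand-in for an estimated $p(\mathbf{x})$, the function $f(x) = x^2$, and the sample size are assumptions made for illustration and are not part of the method described in this article.

```python
from statistics import NormalDist, fmean

# Stand-in for an estimated density p(x): a 1D standard normal (assumption).
density = NormalDist(mu=0.0, sigma=1.0)

# 1) Evaluate the density at an arbitrary input point.
p_at_zero = density.pdf(0.0)

# 2) Generate samples x' ~ p(x); the seed only makes the run repeatable.
samples = density.samples(10_000, seed=0)

# 3) Approximate E_x[f(x)] by a sample average. Here f(x) = x**2, whose
#    exact expectation under N(0, 1) equals the variance, i.e., 1.
second_moment = fmean(x * x for x in samples)

print(f"p(0) = {p_at_zero:.4f}")
print(f"E[x^2] is approximately {second_moment:.3f}")
```

In a realistic setting, the closed-form density would be replaced by a learned estimator, but the three operations keep the same roles.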
Having access to all these properties gives us the ability to tackle long-standing problems in machine learning and statistics. With accurate PDF estimates, one can model the conditional densities of data generated from a prior distribution, develop accurate and efficient compression schemes, and use principled objective functions, such as the maximum likelihood. In addition, having access to an accurate density estimator can be useful in many hybrid applications to deal with out-of-sample and out-of-distribution problems [31]. The problem is, therefore, to estimate the density $p(\mathbf{x})$ given a set of samples from $\mathcal{X}$.

The simplest approach to PDF estimation assumes that the density has a parametric functional form defined by a fixed number of tunable parameters. The Gaussian assumption is the most widely adopted for unimodal distributions; it is parameterized by a mean $\boldsymbol{\mu}$ and a covariance matrix $\boldsymbol{\Sigma}$. If more than one mode is assumed, a mixture of Gaussians (MoG) generally leads to better fits. However, finding a parametric form for the distribution that properly fits particular data is very difficult in most cases.

The alternative technique comes from nonparametric models, which do not assume a specific form for the distribution and are learned from data. The simplest nonparametric method estimates the PDF by partitioning the data space into nonoverlapping bins, where the density is estimated as the fraction of data points in the bin divided by the volume of the bin. This estimator runs the risk of overfitting or underfitting, depending upon how the bins are selected. Thus, there are several rule-of-thumb estimators with a wide range of guidelines for choosing the most appropriate bin size: 1) an overall good estimator using Sturges's Rule [32], 2) an estimator that is better for a larger number of samples and is more robust to outliers by using the Freedman–Diaconis method [33], and 3) more Bayesian approaches [34].
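The histogram estimator and two of the bin-selection rules mentioned above can be sketched in a few lines. This is an illustrative one-dimensional example in plain Python, not the implementation used in this article; the function names, the Gaussian toy data, and the sample size are assumptions. The rules are used in their common textbook forms: Sturges's rule takes $\lceil \log_2 n \rceil + 1$ bins, and the Freedman–Diaconis rule takes a bin width of $2\,\mathrm{IQR}/n^{1/3}$.

```python
import math
import random
from statistics import quantiles

def sturges_bins(n):
    """Sturges's rule: ceil(log2(n)) + 1 bins for n samples."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_bins(data):
    """Freedman-Diaconis rule: bin width 2 * IQR / n**(1/3)."""
    q1, _, q3 = quantiles(data, n=4)
    width = 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)
    if width == 0.0:          # degenerate data: fall back to a single bin
        return 1
    return max(1, math.ceil((max(data) - min(data)) / width))

def histogram_density(data, n_bins):
    """Density per bin: fraction of points in the bin / bin width."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        k = min(int((x - lo) / width), n_bins - 1)  # x == hi goes in last bin
        counts[k] += 1
    return width, [c / (len(data) * width) for c in counts]

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # toy data (assumption)

for name, n_bins in [("Sturges", sturges_bins(len(data))),
                     ("Freedman-Diaconis", freedman_diaconis_bins(data))]:
    width, dens = histogram_density(data, n_bins)
    area = sum(d * width for d in dens)  # a valid density integrates to 1
    print(f"{name}: {n_bins} bins, total area = {area:.3f}")
```

In $D$ dimensions, the same construction needs a number of bins that grows exponentially with $D$, which is the practical face of the curse of dimensionality.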
However, histogram-based PDF estimation methods are affected by the curse of dimensionality, so they cannot be applied to a large number of features. Alternative parametric schemes that determine the optimal bin width by maximum likelihood have been introduced [24]. However, they are very rigid and lead to extremely rough density functions.

To achieve smoother PDF estimates, KDE is popular. It places a nonlinear kernel function on top of each example, with a bandwidth parameter that controls the degree of smoothness. Unfortunately, the bias–variance tradeoff results in overfitting or underfitting the PDF, especially in moderate- to high-dimensional problems. In the previous approaches, the bandwidth is typically fixed a priori following heuristics in the literature [35], and it rarely accounts for the concentration of points, i.e., that smaller bins should be placed in regions with a higher concentration of points, in the form of an adaptive bit allocation scheme. This can be addressed by using kNNs, which have one adaptive bandwidth per location and depend on the number of available training points. However, all the preceding density estimators suffer from the curse of dimensionality: as the dimensionality increases, the space becomes sparser, and density estimates are unreliable.

GAUSSIANIZATION FOR PDF ESTIMATION

An alternative way to estimate a PDF from observational data is to employ a data transformation to a convenient domain instead of working explicitly in the high-dimensional input domain. The question of what constitutes a convenient domain is a long-standing one. Ideally, the domain should have independent components so that one can work in each dimension independently to get rid of the curse of dimensionality.
It should enable one to perform operations and compute quantities therein, and it should be invertible so that one can express these quantities in meaningful units of the input domain. The Gaussian distribution has the desirable properties of showing independent components and being mathematically tractable and is thus a good candidate for density estimation.

A class of Gaussianization methods [28], [30] looks for transforms to a multivariate Gaussian domain. These transforms are related to projection pursuit transformations introduced in [42] and seek to transform a multivariate distribution $p(\mathbf{x})$, where $\mathbf{x} \sim \mathcal{X} \in \mathbb{R}^d$, into a standardized multivariate Gaussian distribution [27], [28]:

$$G_\theta : \mathbf{x} \in \mathbb{R}^d \mapsto \mathbf{z} \in \mathbb{R}^d, \qquad \mathbf{x} \sim p(\mathbf{x}), \quad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d), \tag{1}$$

where $\theta$ are the parameters learned to Gaussianize the data $\mathbf{x}$, $\mathbf{0}$ is a vector of zeros (for the means), and $\mathbf{I}_d$ is the identity matrix (for the covariance). By construction, the Gaussianization transform is a parameterized function $G_\theta$ consisting of a sequence of $L$ iterations (or layers), each performing an orthogonal rotation of the data and a marginal Gaussianization transformation to every feature. The transformation $G_\theta$ in each iteration $\ell$ is defined as

$$G_\theta : \mathbf{x}_{\ell+1} = \mathbf{R}_\ell\, \Psi_\ell(\mathbf{x}_\ell), \qquad \ell = 0, 1, \ldots, L,$$

where $\mathbf{x}_0$ corresponds to the original data $\mathbf{x}$, $\Psi_\ell$ is the marginal Gaussianization of each dimension of $\mathbf{x}_\ell$ for the iteration $\ell$, and $\mathbf{R}_\ell$ is a rotation matrix for the marginally Gaussianized variable $\Psi_\ell(\mathbf{x}_\ell)$. After convergence in $L$ iterations, the transformation contains all the needed information to convert data coming from the original density into a multivariate Gaussian. Here, $\theta$ collectively groups all parameters: those from the rotation matrix $\mathbf{R}$ and the marginal transformation $\Psi$. For example, one could use a principal component analysis (PCA) transformation for the rotation matrix $\mathbf{R}$ and a histogram transformation for the marginal Gaussianization transformation $\Psi$. Then, the eigenvectors obtained from PCA describing $\mathbf{R}$ and the parameterizations of $\Psi$ would define $\theta$. See Table 1 for more details on the decomposition of this formula and Figure 1 for a full decomposition of a toy data set.
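The layer structure described above (a marginal Gaussianization followed by a rotation) can be sketched for two-dimensional data with the standard library alone. This is a simplified illustration and not the RBIG reference implementation: the marginal step uses a rank-based empirical CDF rather than a histogram estimator, the rotation is a closed-form 2D PCA, and the noisy-sine toy data, the helper names, and the fixed number of layers are all assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def marginal_gaussianize(values):
    """Marginal step: map one feature to N(0, 1) via its empirical CDF."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    out = [0.0] * n
    for rank, idx in enumerate(order):
        u = (rank + 0.5) / n              # marginal uniformization, in (0, 1)
        out[idx] = STD_NORMAL.inv_cdf(u)  # inverse Gaussian CDF
    return out

def pca_rotation_2d(a, b):
    """Rotation step: rotate 2D data onto its principal axes."""
    ma, mb = fmean(a), fmean(b)
    saa = fmean((x - ma) ** 2 for x in a)
    sbb = fmean((y - mb) ** 2 for y in b)
    sab = fmean((x - ma) * (y - mb) for x, y in zip(a, b))
    angle = 0.5 * math.atan2(2.0 * sab, saa - sbb)
    c, s = math.cos(angle), math.sin(angle)
    return ([c * x + s * y for x, y in zip(a, b)],
            [-s * x + c * y for x, y in zip(a, b)])

# Toy data (assumption): a noisy sine wave.
rng = random.Random(0)
x1 = [rng.uniform(-3.0, 3.0) for _ in range(500)]
x2 = [math.sin(v) + 0.2 * rng.gauss(0.0, 1.0) for v in x1]

for layer in range(5):  # number of layers chosen arbitrarily here
    g1, g2 = marginal_gaussianize(x1), marginal_gaussianize(x2)
    x1, x2 = pca_rotation_2d(g1, g2)  # x_{l+1} = R_l Psi_l(x_l)

m1, m2 = fmean(x1), fmean(x2)
cov12 = fmean((a - m1) * (b - m2) for a, b in zip(x1, x2))
total_var = (fmean((a - m1) ** 2 for a in x1)
             + fmean((b - m2) ** 2 for b in x2))
print(f"covariance after last rotation: {cov12:.2e}")
print(f"total variance: {total_var:.3f}")
```

A practical implementation would also store the per-layer parameters (the empirical CDFs and rotation angles) so that the transform can be inverted, and it would stop adding layers once the total correlation no longer decreases instead of using a fixed count.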
TABLE 1. A SUMMARY OF THE COMPONENTS OF THE GAUSSIANIZATION ALGORITHM.

| DESCRIPTION | NOTATION | TRANSFORMATION | DOMAIN |
| --- | --- | --- | --- |
| Marginal uniformization | $U$ | Histogram [28], mixture CDF [36], KDE [30], Lambert [37], spline [38], Box–Cox [39] | $\mathbb{R} \to [0, 1]$ |
| Inverse CDF | $\mathrm{CDF}^{-1}$ | Inverse Gaussian CDF, logistic, inverse Cauchy CDF | $[0, 1] \to \mathbb{R}$ |
| Marginal Gaussianization | $\Psi = \mathrm{CDF}^{-1} \circ U$ | Marginal uniformization + inverse CDF | $\mathbb{R} \to \mathbb{R}$ |
| Rotation | $\mathbf{R}$ | PCA [28], independent component analysis [27], random rotations [28], Householder transformations [40], [41] | $\mathbb{R}^d \to \mathbb{R}^d$ |
| Gaussianization block | $G_\ell = \mathbf{R}\,[\Psi_1 \cdots \Psi_d]$ | Composition of rotation + marginal Gaussianization | $\mathbb{R}^d \to \mathbb{R}^d$ |
| Gaussianization transform | $G = [G_1 \circ \cdots \circ G_L]$ | Composition of Gaussianization blocks | $\mathbb{R}^d \to \mathbb{R}^d$ |

CDF: cumulative distribution function.

FIGURE 1. A complete Gaussianization of a noisy sine wave to a marginally and jointly Gaussian distributed one. We use PCA for the rotation matrix and a histogram cumulative distribution function estimator for the marginal transformation. Plots were generated with seaborn [73].

We can use the change-of-variables formula to calculate the PDF of $\mathbf{x}$ as

$$p_{\mathbf{x}}(\mathbf{x}) = p_{\mathbf{z}}(G_\theta(\mathbf{x}))\, |\nabla_{\mathbf{x}} G_\theta(\mathbf{x})|, \tag{2}$$

where $|\nabla_{\mathbf{x}} G_\theta(\mathbf{x})|$ is the (absolute value of the) determinant of the Jacobian of $G_\theta$ with respect to $\mathbf{x}$. Generally, any unknown PDF of $\mathbf{x}$ can be estimated as long as we have the transformation $G_\theta$ along with its Jacobian. Intuitively, this transformation essentially converts the density of $\mathcal{X}$ into unstructured noise (often Gaussian or normal) [24], [26], [43]. There is no limit to the number of composite transformations $G_\Theta = G_{\theta_1} \circ G_{\theta_2} \circ \cdots \circ G_{\theta_L}$ that can be used to sufficiently converge to the Gaussian distribution. In addition, because $G_\Theta$ is invertible, we can sample points in the original domain $\mathbf{x}' \in \mathcal{X}$ by generating samples in the transformed Gaussian domain and propagating these through the inverse transformation $G_\Theta^{-1}$. Because the transform is a product of linear and marginal operations, the Jacobian and the inverse transform can be easily computed [28], [44].

The original Gaussianization algorithm [27] worked by applying an orthogonal rotation matrix via independent component analysis and an MoG for the marginal Gaussian transformation. After enough repetitions $L$, it was shown that this converged to a multivariate Gaussian distribution [27]. In [28], we extended Gaussianization by realizing that the method will converge with any orthogonal rotation matrix $\mathbf{R}$, and we named the algorithm RBIG.
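The change-of-variables formula in (2) and the sampling-by-inversion idea can be checked numerically in one dimension. The sketch below assumes a case where the Gaussianizing transform is known exactly: if $x$ is log-normal, then $G(x) = \log x$ maps it to $\mathcal{N}(0, 1)$ and $|dG/dx| = 1/x$. The transform, the interval, and the sample size are illustrative assumptions, not part of RBIG, where the transform is learned from data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def G(x):
    """Transform to the Gaussian domain (known in closed form here)."""
    return math.log(x)

def G_inv(z):
    """Inverse transform, from the Gaussian domain back to the data domain."""
    return math.exp(z)

def jacobian_abs(x):
    """|dG/dx|, the one-dimensional Jacobian determinant."""
    return 1.0 / x

def p_x(x):
    """Change of variables: p_x(x) = p_z(G(x)) * |dG/dx|."""
    return STD_NORMAL.pdf(G(x)) * jacobian_abs(x)

# Probability that x falls in [a, b], integrating p_x with the midpoint rule.
a, b, steps = 0.5, 2.0, 2000
h = (b - a) / steps
prob_density = sum(p_x(a + (i + 0.5) * h) for i in range(steps)) * h

# The same probability from synthetic samples: draw z ~ N(0, 1), then invert.
zs = STD_NORMAL.samples(20_000, seed=1)
xs = [G_inv(z) for z in zs]
prob_samples = sum(a <= x <= b for x in xs) / len(xs)

print(f"from density: {prob_density:.4f}, from samples: {prob_samples:.4f}")
```

The two estimates should agree up to Monte Carlo error, which is the same consistency one would expect between the density and the synthesized samples of a learned Gaussianization transform.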
The extension to arbitrary orthogonal rotations facilitated simpler and faster choices, such as PCA and even randomly generated orthogonal rotation matrices. In addition, much simpler univariate estimators, such as histograms, were used to significantly speed up the algorithm. Meng et al. [30] coined the term Gaussianization flows and extended the iterative algorithm to be fully parameterized and trainable by incorporating a mixture of logistics as the marginal Gaussianization layer and a sequence of Householder flows [40], [41] as the rotation layer. They also proved that this is a universal approximator and showed convincing results that Gaussianization is comparable to other classes of methods specifically designed for density estimation and sampling [30]. All transformations and example variants can be found in Table 1. For details about the theoretical convergence properties of Gaussianization flows, see [27], [28], and [30].

Regardless of the chosen method, to find the parameters $\theta$ for the transformation $G_\theta$, we minimize the following cost function with respect to $\theta$:

$$\mathcal{L}(\theta) = D_{\mathrm{KL}}\left[\, p_{\mathbf{z}}(G_\theta(\mathbf{x})) \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I}_D) \,\right], \tag{3}$$

which is the Kullback–Leibler (KL) divergence between the estimated Gaussian distribution and the true multivariate Gaussian distribution of mean $\mathbf{0}$ and covariance $\mathbf{I}$; in other words, this is a measure of how non-Gaussian our distribution is after transformation. This reveals a direct relationship with information-theoretic concepts and measures. Chen [27], [46] showed that (3) can be decomposed as

$$\mathcal{L}(\theta) = T(\mathbf{x}) + J_m(\mathbf{x}), \tag{4}$$

where $T(\mathbf{x})$ is the total correlation ($T$) (also known as multi-information and multivariate MI) between all the marginal distributions and $J_m(\mathbf{x})$ is the KL divergence between the marginal distributions and the standard Gaussian normal distribution. Intuitively, this cost function is trying to minimize the information shared among each of the marginal distributions and ensure that they follow a normal Gaussian distribution. We want to highlight the fact that RBIG vastly transforms and simplifies the PDF estimation problem, from directly estimating the density of the high-dimensional multivariate distribution in $\mathcal{X}$ to doing it indirectly through a transformation to a Gaussian domain, all by using a series of marginal transformations, which are straightforward and fast.

An example of how RBIG works on a simple 2D toy data set is provided in Figure 2. We transform a non-Gaussian 2D data set into a 2D marginal and jointly Gaussian distribution along with the inverse transformation (first row). The second row demonstrates how we can use RBIG to synthesize points in the data domain by using the inverse transformation. Figure 2(f) shows evolution through iterations of the final total correlation (as a measure of redundancy) and the non-Gaussianity (as a measure of the distance to a Gaussian). See the RBIG site, https://ipl-uv.github.io/rbig_jax/, for a working implementation of the RBIG algorithm in Python and MATLAB.

FIGURE 2. The density estimation of a sinusoid with heteroscedastic noise, using RBIG. The original data distribution $\mathcal{X}$ is mapped to a Gaussian domain $\mathcal{Z}$, with transform $G_\theta$ parameterized by a set of rotations and marginal Gaussianizations collectively denoted as $\theta$, which has an analytic inverse transformation, $\mathbf{x} = G_\theta^{-1}(\hat{\mathbf{z}})$, to recover the original data. One can sample random data from the Gaussian in domain $\mathcal{Z}$ and use the inverse transformation of $\mathbf{z}$ to $\hat{\mathbf{x}}$ for data synthesis. We also demonstrate the losses: the equivalence of the change in the total correlation between layers $\Delta T$ and the KL divergence between transformed data and a multivariate Gaussian (non-Gaussianity). (a) $\mathcal{X}$. (b) $\hat{\mathbf{z}} = G_\theta(\mathbf{x})$. (c) $\mathbf{x} = G_\theta^{-1}(\hat{\mathbf{z}})$. (d) $\mathcal{Z}$. (e) $\hat{\mathbf{x}} = G_\theta^{-1}(\mathbf{z})$. (f) $\Delta T$ and non-Gaussianity (loss versus iterations). Plots were generated with matplotlib [74].

IT MEASURES USING THE RBIG TRANSFORM

RBIG was designed for density estimation but was inspired by, and had connections to, IT [6]. The series of transformations learned by RBIG converts data from the original domain to a standard multivariate Gaussian one. The features are marginally independent, which is important for determining information-theoretic measures using the Gaussianization scheme. This reduction in redundancy is iteratively achieved and can be explicitly computed by summing up all the layer redundancy reductions. This metric is known as the total correlation, and computing it enables us to derive information-theoretic measures from data.

INFORMATION

Shannon information $I$ [47] is based on the idea that a sample, $\mathbf{x}_i$, is more interesting (it carries more information) if it is less probable. The formal definition of information is

$$I(\mathbf{x}_i) = -\log\left(p_{\mathbf{x}}(\mathbf{x}_i)\right). \tag{5}$$
It can be used, for instance, to highlight regions of more interest in a data set. Information can be computed for each sample in our data set by using RBIG and (2). The expected value of the information provided by a complete data set, $\mathbf{x}$, is called entropy:

$$H(\mathbf{x}) = \mathbb{E}_{\mathbf{x}}\left[-\log\left(p_{\mathbf{x}}(\mathbf{x})\right)\right]. \tag{6}$$

While entropy could be computed by estimating the information of each sample in a data set using (5) and averaging, computing it using the ability of RBIG to calculate the total correlation is more convenient, as we will see in the following section.

TOTAL CORRELATION

The total correlation, $T$, accounts for the information shared among the dimensions of a multidimensional random variable [48], [49]. Details of how to compute $T$ using RBIG can be found in [28]; here, we sketch the main idea. Given data $\mathbf{x} \in \mathbb{R}^D$, we first learn the Gaussianization transform with $L$ iterations and compute the cumulative reduction in the total correlation in each iteration as

$$T(\mathbf{x}) = \sum_{\ell=1}^{L} \left( D\, H(\mathcal{N}(0, 1)) - \sum_{d=1}^{D} H(x_{\ell,d}) \right). \tag{7}$$

The number of layers $L$ will be determined by the reduction in the total correlation with each transformation. If there is no change in the total correlation after some threshold number of layers, we can assume that the dimensions $x_d$ are completely independent. It is important to note that all entropy calculations involve only marginal operations, which are simple and fast, enabling RBIG to be used on large data sets that have a high number of dimensions.

JOINT ENTROPY

While the concept of information is attached to a particular sample, entropy is used in different fields to characterize how unpredictable a complete process is. Entropy can be easily computed from the learned RBIG transformation by

$$H(\mathbf{x}) = \sum_{d=1}^{D} H(x_d) - T(\mathbf{x}), \tag{8}$$

where $\sum_{d=1}^{D} H(x_d)$ are marginal entropy estimations and $T(\mathbf{x})$ also involves marginal estimations [see (7)].
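The relation in (8) can be verified on a case where every term has a closed form. The sketch below assumes a two-dimensional Gaussian with unit variances and correlation $\rho$, for which $H(x_d) = \tfrac{1}{2}\log(2\pi e)$ and $T(\mathbf{x}) = -\tfrac{1}{2}\log(1 - \rho^2)$, with all quantities in nats. It also includes a simple histogram plug-in estimator of a marginal entropy, since the measures above rely only on marginal operations; the estimator, the bin count, and the value of $\rho$ are illustrative assumptions rather than the choices made in RBIG.

```python
import math
import random

def gaussian_entropy(var):
    """Differential entropy (nats) of a 1D Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def histogram_entropy(data, n_bins=30):
    """Plug-in estimate of a 1D differential entropy (nats) from a histogram."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    n = len(data)
    # H is approximately -sum_k p_k * log(p_k / width), p_k = fraction in bin k
    return -sum((c / n) * math.log(c / (n * width)) for c in counts if c > 0)

rho = 0.8                                       # assumed correlation
marginals = 2 * gaussian_entropy(1.0)           # sum_d H(x_d)
total_corr = -0.5 * math.log(1.0 - rho ** 2)    # T(x) for a 2D Gaussian
joint = marginals - total_corr                  # equation (8)
joint_exact = 0.5 * math.log((2 * math.pi * math.e) ** 2 * (1 - rho ** 2))

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
h_hat = histogram_entropy(sample)

print(f"H(x) via (8): {joint:.4f} nats, closed form: {joint_exact:.4f} nats")
print(f"marginal entropy: estimated {h_hat:.3f}, exact {gaussian_entropy(1.0):.3f}")
```

Dividing any of these values by $\log 2$ converts them from nats to bits.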
MULTIVARIATE MI

Multivariate MI accounts for the information shared by two data sets [6]. Estimating MI can be very challenging when working with high-dimensional data. Our approach is based on the invariance of MI under reparameterization of the space of each variable [8]. Therefore, we essentially Gaussianize the two data sets, $\mathbf{X}$ and $\mathbf{Y}$, with corresponding transforms that remove their total correlations. Then, the total correlation remaining between both Gaussianized data sets is equivalent to the MI between the original data sets:

$$\mathrm{MI}(\mathbf{X}, \mathbf{Y}) = T\left(\left[G_{\theta_x}(\mathbf{X}),\, G_{\theta_y}(\mathbf{Y})\right]\right), \tag{9}$$

which again implies only marginal operations [see (7)]. Figure 3 includes a Venn diagram illustrating the different IT measures used in this article, and Table 2 demonstrates how they compare to the popular Pearson correlation coefficient for different toy data sets.

FIGURE 3. A Venn diagram of the relationships of all IT measures used in this article. The solid-colored circles represent marginal variables, and the intersection regions with bold lines show regions for IT measures, such as MI and total correlation, $T$.

TABLE 2. A COMPARISON OF DIFFERENT IT MEASURES AND THE POPULAR PEARSON CORRELATION COEFFICIENT, $\rho$.

| MEASURE | NOTATION | TOY DATA SET 1 | TOY DATA SET 2 | TOY DATA SET 3 | TOY DATA SET 4 | TOY DATA SET 5 | TOY DATA SET 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation | $\rho(x, y)$ | Low | Medium | Low | Low | Medium | High |
| MI | $\mathrm{MI}(x, y)$ | Low | Medium | High | High | High | High |
| Marginal entropy | $H(x), H(y)$ | High | High | High | High | High | High |
| Joint entropy | $H(x, y)$ | High | Medium | Medium | Low | Low | Low |

This table is also a visual demonstration of how to interpret MI and its relationship to marginal entropy and joint entropy; $\mathrm{MI}(x, y) = H(x) + H(y) - H(x, y)$.

ILLUSTRATIVE EXPERIMENTS

In this section, we use RBIG to explore the information content, redundancy, and relations in a selection of Earth data analysis problems involving remote sensing data and models. First, we illustrate the method's ability to analyze standard remote sensing settings involving total correlation estimation in hyperspectral, radar, and very-high-resolution imagery. Second, we quantify the information content of several variables that describe a soil–vegetation status and investigate the temporal scales leading to the maximum shared information for the detection and precursors of anomalies, such as droughts. Finally, we explore the challenging problems of IT measurement estimates and the quantification of the spatiotemporal information tradeoff in global Earth products. Table 3 summarizes the experiments in terms of measures, applications, and data/simulations.

GAUSSIANIZATION IN REMOTE SENSING DATA

This first set of experiments considers the use of RBIG for standard remote sensing image processing. We show the performance of RBIG in hyperspectral, very-high-resolution, and radar imagery and for several applications: joint (multivariate) Gaussianization, data synthesis, and information estimation.

GAUSSIANIZATION OF RADAR IMAGES

The first part of the experiment focuses on the analysis of radar imagery. The data were collected in the Urban Expansion Monitoring (UrbEx) project, a part of the European Space Agency's European Space Research Institute Data User Program [51]. Results from the UrbEx project were used to perform the analysis of the selected test sites and for validation purposes. We consider a European Remote Sensing Satellite 2 (ERS-2) SAR pair selected with perpendicular baselines between 20 and 150 m to obtain the interferometric coherence.
The corresponding pair (I1, I2) of SAR backscattering intensities (0–35 days) was stacked for analysis; see Figure 4. The relation between the intensity features is strongly nonlinear and non-Gaussian and shows a large dispersion; see Figure 4(a). The total correlation, T, computed with RBIG for the original domain is T = 0.0929 b. A standard approach in SAR image (pre)processing consists of noise removal and marginal Gaussianization, which can address these problems only partially. This marginal Gaussianization cannot deal with the saturation for high and low signal values [Figure 4(b)]. A multivariate Gaussianization leads to a fully Gaussian density [Figure 4(c)]. This is confirmed by the estimated total correlation of T = 0.0095 b, which is lower than that of the marginally Gaussianized data.

TABLE 3. A SUMMARY OF EXPERIMENTS, WITH DETAILS OF THE DATA SETS, CONFIGURATIONS, APPLICATIONS, AND MEASURES EMPLOYED.

EXPERIMENT | DATA SET | CHARACTERISTICS | REFERENCE | CONFIGURATION | APPLICATION | MEASURES
1 | SAR: European Remote Sensing Satellite 2 | 26 m, backscatter intensity | [51] | Pixel-wise | Gaussianization | T
1 | Hyperspectral: Airborne Visible/Infrared Imaging Spectrometer | 30 m, 224 channels | [52] | Pixel-wise | Synthesis | T
1 | Airborne camera: red-green-blue images | 10 cm, 21 classes, 100 images/class | [53] | Spatial | I quantification | T
2 | Optical: Moderate Resolution Imaging Spectroradiometer land surface temperature, normalized-difference vegetation index | 0.05º, 5.5 years, 14 days | [54] | Temporal | I quantification, PDF comparison | H, MI
2 | Passive microwave: Soil Moisture and Ocean Salinity (SMOS) soil moisture, vegetation optical depth | 25 km, 5.5 years, daily | [55] | Temporal | I quantification, PDF comparison | H, MI
3 | Observed and simulated: evaporation, SH, precipitation | 0.083º, 10 years, monthly, global | [56] | Spatiotemporal | I quantification | I, H

SYNTHESIZING HYPERSPECTRAL IMAGES

To show the ability of the method to deal with high-dimensional data, we consider hyperspectral image processing. We took the standard Airborne Visible/Infrared Imaging Spectrometer Indian Pines data set [52], where the data have spectral redundancy and complex joint distributions. The images contain 200 spectral channels, constituting the (very high) input dimensionality. We learned a Gaussianization transform that led to a multivariate Gaussian domain of 200-dimension spectral bands. Then, we drew n = 10^6 samples of 200 dimensions from a multivariate Gaussian and inverted them back to the spectral domain. RBIG can be used this way to easily generate synthetic spectra. Figure 5(a) presents the original and synthesized spectra. It shows how the proposed method enables us to generate/synthesize seemingly realistic spectral distributions, even in such a high-dimensional setting. Figure 5(b) and (c) gives corner plots illustrating joint distributions among various spectral bands (10, 20, 50, 100, and 150). We see that the marginal and joint distributions for the RBIG-generated spectra in Figure 5(c)
are very similar to the real data in Figure 5(b) across all pairwise band combinations. It is important to note that some of the most widely used methods, such as PCA, could replicate Figure 5(a) with a good approximate mean and standard deviation, but they would not be able to duplicate Figure 5(d), where all joint distributions are approximately Gaussian.

FIGURE 4. Radar image processing. We illustrate the Gaussianization of 2D radar data comprising a pair (I1, I2) of ERS-2 SAR backscattering intensities. (a) The joint distribution is non-Gaussian, and preprocessing before applying any algorithm is generally convenient. (b) The standard marginal Gaussianization does not achieve a full spherical (joint) Gaussian, unlike (c) the RBIG transformation [75], [76].

FIGURE 5. The Gaussianization and synthesis of hyperspectral data, using RBIG. In (a) we show the mean and standard deviation spectrum for the 21,000 real pixels (mean = black; standard deviation = darker shade) and the 1 million pixels generated synthetically (mean = red; standard deviation = lighter shade) using RBIG. In (b) and (c), we show the marginal and joint distributions of 10, 20, 50, 100, and 150 spectral bands for the real data and for data generated with RBIG, respectively. Plots were generated with corner [77]. [Panel (a) axes: wavelength, λ (nm), versus radiance (W m–2 nm–1); legend: Real, Generated.]

INFORMATION IN HIGH-RESOLUTION IMAGES

Very-high-resolution images are constantly acquired by the new generation of sensors on airborne and spaceborne platforms. A systematic analysis of the images is necessary. Machine learning and, in particular, deep learning have led to an important leap in classification accuracy. However, owing to the wealth of data and their diversity, it becomes necessary to design algorithms that exploit most of the images’ information content in terms of relevant features and examples.

FIGURE 6. The estimation of the total correlation, T, in very-high-resolution aerial imagery. (a) Images for each of the 21 classes in the database, ranked according to their estimated T. (b) Each image is decomposed in 3 × 3 patches with three channels (red-green-blue), making samples of 27 dimensions. (c) We measure how much overlap there is between the information content (i.e., the total correlation) of the 27 dimensions for each class. We show a Venn diagram to illustrate the measured information, following the same criteria as in Figure 3. (d) The average total correlation is iteratively computed for the 21 different class-specific RBIG models through 50 iterations, with the mean T (solid) and the T standard deviation (shaded) across all models. Convergence is achieved very rapidly for all classes (note the log scale). (e) The ranked T per class computed from the RBIG models [78].

We validate RBIG to estimate the total correlation (multi-information) in a set of aerial scenes collected in the University of California, Merced, data set [53], which contains manually extracted images from the United States Geological Survey’s National Map Urban Area Imagery collection, from 21 aerial scene categories, with a 1-ft/pixel resolution. The data set contains highly overlapping classes and has 100 images per class; examples appear in Figure 6(a). We extracted 3 × 3 × 3 color patches from each image, which yielded 6,499,950 27-dimension feature vectors per class. Then, we developed a Gaussianization transformation for each class and computed the (spatiospectral) T using RBIG; see Figure 6(d). We show in Figure 6(d) the average and standard deviation of the T evolution through 50 iterations for the 21 classes (note the log scale) and, in Figure 6(e), the total correlation per class. More textured classes, such as runways, freeways, buildings, and intersections, lead to higher T, while rather homogeneous/flat
classes, including chaparral, fields, and forests, have little information content.

INFORMATION QUANTIFICATION OF TERRESTRIAL BIOSPHERE VARIABLES IN TIME

According to climate projections, extreme events are likely to intensify and become more frequent during the coming years [59]. The effects of extreme events (such as droughts) are prevalent not only in the biosphere and atmosphere but also in the anthroposphere. Drought is a major cause of limited agricultural productivity, which accounts for a large proportion of the crop losses and annual yield variations throughout the world [60]. Droughts are also direct contributors to social conflicts, migration, and political unrest (e.g., [61]). There are many studies that show the value of incorporating EO data for global agricultural systems and applications [62], [63]. Variables, such as the land surface temperature (LST) and the normalized-difference vegetation index (NDVI), derived from optical satellites, and, more recently, soil moisture (SM) and the vegetation optical depth (VOD) derived from passive microwave sensors, are just a few of the many features that can potentially be key to the early detection of droughts [54], [55], [64]. The Soil Moisture Agricultural Drought Index (SMADI) was proposed in [65] to integrate SM with the LST and NDVI, showing good agreement with other indices and documented events worldwide [54].

In this experiment, we quantify the information in and between LST, NDVI, SM, and VOD variables for a study area in California (only agricultural fields); see Figure 7. The LST and NDVI are descriptors of the surface temperature and vegetation chlorophyll content, whereas SM and the VOD characterize the water content in soils and vegetation [55], [65]. We also use information measures to evaluate whether it would be worthwhile to include the VOD as an additional variable in the SMADI ensemble to characterize droughts.
Prior to the analyses, variables are resampled into a common 0.05º grid and biweekly temporal resolution. Details of the data sets are provided in Table 3. Measures are conducted for 2010 and 2011 and 2014–2016, which are representative of conditions with and without droughts (see Figure 7). We focus on computing multivariate IT measures in a temporal feature setting, where previous time steps are included as input features. For example, one input feature includes the current time stamp, two input features include the current time stamp and a time stamp from 14 days earlier, and so on. This enables us to investigate temporal scales that maximize shared information among remotely sensed variables. This is particularly relevant for droughts since there is a time lag between soil/climatic conditions (e.g., represented by SM and the LST) and plant responses (e.g., described by the NDVI and VOD), which varies in the literature from two or three weeks to three months [66].

The amount of expected information H for each of the four variables and how it changes as we include more temporal dimensions is analyzed in Figure 8(a). Entropy will always increase with more features. The entropy shown here has been normalized by the total number of features, which enables us to quantify the amount of entropy per feature. It can be seen that the amount of entropy for the VOD is the highest in all temporal settings, closely followed by the LST. All variables decrease in entropy as we add more temporal features. The NDVI saturates at roughly 1.5 b, whereas the other variables have a steady, smooth decline. We can also see that the LST and VOD show the largest difference between years with and without droughts and that the difference is largest as we increase the temporal dimension. This result suggests that the LST and VOD observed during longer periods could be more useful in detecting droughts.
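The temporal feature setting and the per-feature normalization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`lagged_features`, `gaussian_entropy_bits`) are hypothetical, the series is a synthetic autoregressive signal rather than any of the remote sensing variables, and the entropy is the closed-form value for a fitted Gaussian, used here as a simple stand-in for an RBIG estimate.

```python
import numpy as np


def lagged_features(series, n_lags):
    """Return rows [x_t, x_(t-1), ..., x_(t-n_lags+1)] for every valid t."""
    series = np.asarray(series, dtype=float)
    # Column j holds the series delayed by j time steps.
    columns = [series[n_lags - 1 - j : series.size - j] for j in range(n_lags)]
    return np.column_stack(columns)


def gaussian_entropy_bits(features):
    """Differential entropy (bits) of a Gaussian fitted to the samples.

    Stand-in for an RBIG estimate; exact only if the data are Gaussian.
    """
    n_dims = features.shape[1]
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n_dims * np.log(2 * np.pi * np.e) + logdet) / np.log(2)


# Hypothetical biweekly series with strong memory (AR(1), coefficient 0.9).
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
series = np.zeros_like(noise)
for t in range(1, series.size):
    series[t] = 0.9 * series[t - 1] + noise[t]

# Entropy per feature for one to four temporal dimensions (14-day steps).
entropy_per_feature = {}
for n_lags in range(1, 5):
    features = lagged_features(series, n_lags)
    entropy_per_feature[n_lags] = gaussian_entropy_bits(features) / n_lags
```

For a series with memory, each added lag is partly predictable from the others, so the entropy per feature in `entropy_per_feature` drops as temporal dimensions are added, which is the qualitative behavior reported for Figure 8(a).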
Figure 8(b) demonstrates that the VOD increases the amount of expected information when added to the SMADI variable ensemble in all the considered temporal settings, suggesting that it would be worthwhile to include the VOD in agricultural drought studies. The results indicate that vegetation monitoring operational settings could benefit from synergistic approaches that facilitate including multisensory, multidimensional variables, in particular, under stress and during disturbances, such as agricultural droughts. The MI of every pair of multidimensional variables was analyzed to investigate the pairs’ relationships and redundancies as well as the optimal time scales for combining them. Note that standard measures for pairwise comparison, such as Pearson’s correlation, are restricted to one temporal dimension and hence do not facilitate exploring these scales. The MI scores obtained for LST relations are given in Figure 9. Interestingly, the figure shows that the LST–NDVI and LST–VOD show an MI increase to approximately two to four temporal dimensions and then saturate. This result suggests that a period of about one or two months is needed to capture the soil–plant status through the remotely sensed variables analyzed in our study region. The curves are relatively similar regardless of whether there is a drought or not, and the value spread for drought years is considerably reduced for all variables and especially for the VOD. This could be related to reduced variability (a limited range of values) during droughts, but further studies are needed to confirm this. We also observed that the MI is consistently low among SM and all variables with any number of temporal dimensions, and it is also low between the NDVI and VOD, highlighting the value of combining optical and microwave variables for vegetation/land monitoring. 
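To make the contrast between a single-step Pearson correlation and MI over several temporal dimensions concrete, the following sketch uses the identity MI(X, Y) = H(X) + H(Y) − H(X, Y) with Gaussian entropies. It is an assumption-laden toy: the function names are hypothetical, the "driver" and "response" series are synthetic, and the Gaussian closed form replaces the RBIG-based estimate used in the article.

```python
import numpy as np


def lagged_features(series, n_lags):
    """Return rows [x_t, x_(t-1), ..., x_(t-n_lags+1)] for every valid t."""
    series = np.asarray(series, dtype=float)
    columns = [series[n_lags - 1 - j : series.size - j] for j in range(n_lags)]
    return np.column_stack(columns)


def gaussian_mi_bits(x_features, y_features):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) in bits, assuming joint Gaussianity."""

    def logdet_cov(samples):
        cov = np.atleast_2d(np.cov(samples, rowvar=False))
        return np.linalg.slogdet(cov)[1]

    joint = np.hstack([x_features, y_features])
    # The (2*pi*e) terms of the three Gaussian entropies cancel out.
    nats = 0.5 * (logdet_cov(x_features) + logdet_cov(y_features) - logdet_cov(joint))
    return nats / np.log(2)


rng = np.random.default_rng(1)
n = 5000
driver = rng.normal(size=n)
# The response follows the driver with a two-step delay
# (np.roll wraps the first two samples around, which is harmless here).
response = np.roll(driver, 2) + 0.5 * rng.normal(size=n)

# Pearson correlation only sees the same-time-step pair.
pearson_same_step = np.corrcoef(driver, response)[0, 1]

# MI between the lag-embedded variables for one to four temporal dimensions.
mi_by_dims = {
    k: gaussian_mi_bits(lagged_features(driver, k), lagged_features(response, k))
    for k in range(1, 5)
}
```

In this toy, the same-step correlation and the one-dimensional MI are both close to zero, while the MI becomes clearly positive once the temporal window is long enough to cover the delay between the two series.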
FIGURE 7. (a) The distribution of cropland in California, according to the Moderate Resolution Imaging Spectroradiometer International Geosphere–Biosphere Program land cover classification. (b) A time series of the normalized VOD, NDVI, SM, and LST, as well as the SMADI index [57] obtained at the selected pixel. The SMADI extreme drought category is marked with an orange horizontal line. (c) The percentage of the area in California that is in U.S. drought monitor [58] categories. Figure 7(a) was generated with QGIS [79]. [Legend in (c): D0, abnormally dry; D1, moderate drought; D2, severe drought; D3, extreme drought; D4, exceptional drought.]

FIGURE 8. (a) A comparison of the entropy for the VOD, LST, NDVI, and SM individually against the number of considered temporal dimensions. (b) A comparison of the VOD entropy contribution to the joint multidimensional variables integrated in the SMADI (the LST, NDVI, and SM) and the SMADI+ (the LST, NDVI, SM, and VOD) and how it changes as we include more temporal dimensions. Solid lines are mean estimates, and shaded regions are the variance estimates for 2010 and 2011, when there were no droughts, and 2014 and 2015, when there were droughts. Next to each graphic, we show a Venn diagram to illustrate the measured information for three temporal dimensions as an example, following the same criteria as in Figure 3. Plots were generated using seaborn [73]. [Axes: temporal dimensions versus normalized entropy.]

FIGURE 9. MI between pairs of multidimensional variables: LST–VOD, LST–NDVI, and LST–SM. Solid lines are mean estimates, and shaded regions are the variance estimates for 2010 and 2011, when there were no droughts, and 2014 and 2015, when there were droughts. The Venn diagram illustrates the measured information for three temporal dimensions as an example, following the same criteria as in Figure 3. Plots were generated using seaborn [73]. [Axes: temporal dimensions versus normalized MI.]

INFORMATION IN SPATIAL–TEMPORAL EARTH DATA

DATA

For our experiments, we used observational and model simulated variables from the Earth Science Data Lab [56] (https://www.earthsystemdatalab.net/), which is a platform that provides an opportunity for datacentric processing methodologies. The analysis-ready data cube contains and harmonizes more than 40 variables to monitor key processes of the terrestrial surface and the atmosphere.
The data exhibit clear spatial–temporal relations that need to be accounted for to properly convey and quantify information. Figure 10 illustrates how we represent the spatial–temporal relations as inputs given a single variable. We focus on three key land surface variables: precipitation, SH, and evaporation, which are outlined in the following:

◗ Precipitation: This is a fundamental variable in land–atmosphere processes. The collected data cover the period from 1980 to 2015 and come from the Global Precipitation Climatology Project [67], [68].

◗ SH: These data cover 2001–2012 and were generated by training an ensemble of machine learning algorithms with eddy covariance data from FLUXNET and satellite observations in a cross-validation approach. Regressions from these observations to different kinds of carbon and energy fluxes were established and used to generate data sets with a spatial resolution of 5 arc minutes and a temporal resolution of eight days. The SH is a conductive heat flux from the Earth’s surface to the atmosphere; it is an important component of Earth’s surface energy budget and is expressed in [W m–2] [69].

◗ Evaporation: These data span 2001–2011 and build on the Global Land Evaporation Amsterdam Model, which consists of a set of algorithms that separately estimate the different components of land evaporation by using input-forcing data sets from re-analyses, optical and microwave satellites, and other merged sources. The model consists of four modules: potential evaporation (the Priestley–Taylor equation), interception (the Gash analytical model), soil (the multilayer soil model plus data assimilation), and stress (semiempirical). The data are sampled on a grid of 0.25º and have daily temporal coverage [70], [71].

The data are organized in a 4D cube x (u, v, t, k) involving (latitude,
longitude) spatial coordinates (u, v), time sampling t, and the variable k. They are provided at two spatial resolutions (0.083º and 0.25º) and at a temporal resolution of eight days, encompassing the years 2001–2011. In our experiments, we focus on the lower-resolution products and on the period from 2008 to 2010. SPATIAL–TEMPORAL ANALYSIS The considered variables (precipitation, SH, and evaporation) are fully coupled. Moisture and precipitation interactions are vastly modulated by both land–atmosphere exchanges and large-scale atmospheric circulation. Nevertheless, before understanding variable relations, it is important to identify when and where individual variables are expressive. This may help to assess the coupling mechanisms among variables and improve Earth system models. The question we want to address in this experiment is as follows: What are the optimal (in information terms) spatial and temporal scales for exploiting each variable’s information? Using RBIG, we show that the ratio of the spatiotemporal neighboring pixels that gives the most information can be explicitly calculated. We used RBIG to calculate the entropy H for the aforementioned variables under different spatial–temporal configurations (fully temporal, spatiotemporal, and fully spatial) as well as the corresponding information I (x) for each time pixel and variable. Figure 11 shows the entropy for the different variables and configurations, following the same procedure as [72] (and used, only in the spatial domain, in the “Information and Redundancy in High-Resolution Images” section). Essentially, we formed cubes with the same dimensionality but different spatiotemporal configurations and computed the entropy values for each. We chose several configurations ranging from a ratio of purely spatial (ratio = zero) up to purely temporal (ratio = one). 
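One way to form samples of (nearly) equal dimensionality under different spatial–temporal configurations is to slide a block over a single-variable cube and flatten each block into a feature vector. The sketch below is hypothetical: `cube_to_samples` is an illustrative helper, the cube is random toy data rather than the Earth science data cube, and the three block shapes are the 7 × 7 × 1, 4 × 4 × 3, and 1 × 1 × 46 configurations named for Figure 10. It relies on `sliding_window_view`, which is available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cube_to_samples(cube, spatial, temporal):
    """Flatten every (temporal, spatial, spatial) block of a (time, lat, lon) cube.

    Returns an array of shape (n_blocks, temporal * spatial * spatial).
    """
    windows = sliding_window_view(cube, (temporal, spatial, spatial))
    return windows.reshape(-1, temporal * spatial * spatial)


rng = np.random.default_rng(2)
cube = rng.normal(size=(50, 10, 10))  # toy (time, lat, lon) cube, not ESDC data

# (spatial width, temporal length) for each configuration.
configurations = {"7x7x1": (7, 1), "4x4x3": (4, 3), "1x1x46": (1, 46)}
samples = {
    name: cube_to_samples(cube, spatial, temporal)
    for name, (spatial, temporal) in configurations.items()
}
```

Each entry of `samples` can then be passed to whichever entropy estimator is being used, so that configurations with 49, 48, and 46 features are compared on a similar footing.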
We also looked at different configurations for the number of spatial–temporal dimensions used, e.g., a maximum of four dimensions up to a maximum of 49 dimensions (temporally, this is approximately one year). Notice how each variable has a different spatial–temporal relationship with entropy, but, in general, temporal configurations (ratio = one) convey more information than purely spatial ones (ratio = zero) for all the considered variables. The trends are clear, in particular, for precipitation, where incorporating temporal data for any number of dimensions yields a higher amount of expected information. For SH and evaporation, the entropy paths are similar and reveal a fast entropy increase for particular spatiotemporal configurations (ratio ~ 0.8). These results suggest different optimal (in information terms) time and space scales for various variables, which may have implications in further analyses and applications. Using the same data configurations, we computed the information content of each sample, following the procedure described in the “Information” section. This helps to visualize the regions that have more and less information. We show in Figure 12 the results of a spatiotemporal analysis of the information content of all three variables. In regions where we expect pronounced seasonal patterns, the information (complexity) is apparently high in fully temporal configurations, as the seasonal cycle controls ecosystem dynamics. Actually, seasonal (temporal) modes have less informative content in the spatial domain, as they are mainly driven by solar forcing. The information values tend to be higher in tropical regions, whereas arid locations show low-complexity (low-information) patterns. Let us now look in deeper detail at the different spatiotemporal configurations and their information patterns. 
FIGURE 10. The decomposition of the Earth science data cube (ESDC) [56] into different spatial–temporal configurations ranging from completely spatial to completely temporal. The 7 × 7 × 1 spatial configuration consists entirely of spatial pixels; this is very similar to spatial patches. The 1 × 1 × 46 configuration includes only temporal pixels, which is essentially a time series. The 4 × 4 × 3 configuration includes a mix of spatial and temporal pixels. Throughout this article, we see different notions of spatial–temporal representation of the ESDC data.

Global rainfall patterns are traditionally related to strong seasonality, dominated by the position of the Intertropical Convergence Zone and El Niño–La Niña cycles, which occur irregularly at intervals of two to seven years. Spatial data generally dominate with high probability in the Amazon and the tropics and with little information in desert areas (e.g., California, the Arabian Peninsula, and central Australia). As we quantify information in spatiotemporal configurations, clearer patterns of little information (e.g., Australia) and
high probability (e.g., the east–west U.S. gradient) emerge [45]. Studying precipitation in the fully temporal configuration translates into a clear ruling of the winter season in the Amazon, Indonesia, and northern Europe. Yet a comparison of temporal versus spatial information in Figure 12 (bottom row) reveals that spatial information dominates in desert areas (e.g., Australia, the Iberian Peninsula, the Sahara, and Mexico), which are reasonably independent of time, and that temporal information dominates in the Sahel (savanna), northern latitudes, and southwest China, which are generally characterized by high rain factors, seasons, and moisture.

FIGURE 11. (a) The entropy measurement for different spatiotemporal configurations in the ESDC variables. The IT Venn diagram represents Figure 10 and how it relates to measuring entropy: the expected uncertainty. The measured entropy for (b) precipitation, (c) SH, and (d) evaporation from the ESDC [56] changes with different spatial–temporal representations, ranging from fully spatial (ratio = zero) to fully temporal (ratio = one). [Axes in (b)–(d): ratio versus entropy; one curve each for 4, 9, 16, 25, 36, and 49 dimensions.]

The transfer of SH into the air is dependent on the temperature gradient between the surface and the space above. SH information patterns stand out clearly. While (fully) spatial information dominates in the northern hemisphere, (fully) temporal information patterns appear in the tropics, where rainfall is present across larger regions and longer seasons. The global spatial distribution of SH information shows the largest values in subtropical, dry regions, where available energy is preferentially partitioned to SH rather than latent heat [50], and it seems to be anticorrelated with the amplitude of the mean seasonal cycle.
These results reveal the most SH information in tropical and subtropical deserts, where a high surface temperature conducts much heat into the air above, and the least information near the poles, where surface temperatures are much lower. The information is mainly concentrated in the tropics, too, and shows patterns similar to those of precipitation, with the exception of clear spatial information in India.

Evaporation maps reveal that spatial information dominates in deserts and dry regions, where evaporation is limited, while temporal information (with more interannual variability) resides in northern latitudes. This is mainly due to low temperatures and radiation, equating to little evaporation throughout the year. Temperate areas show increased evaporation information in purely spatial and temporal configurations, coinciding with increasing temperatures above ground moistened by winter rains. Cooler winter temperatures in the southern hemisphere reduce evaporation, which is also captured in the spatial-versus-temporal divergence maps. Note that, in very dry regions, there is more information (a lower evaporative fraction), while, conversely, for very humid regions, the information agrees with [50].

FIGURE 12. The top two rows show information content maps, I(x), for precipitation, SH, and evaporation, using a fully spatial (7 × 7 spatial width and temporal length of one) and a fully temporal (1 × 1 spatial width and temporal length of 46) configuration. The bottom row shows a divergent map of the tradeoff (subtraction) between the fully spatial and fully temporal information content for each variable. (a) Precipitation. (b) SH. (c) Evaporation. Plots were generated with xarray and cartopy [80], [81]. [Map axes: longitude (°) versus latitude (°).]

CONCLUSIONS

This article introduced a Gaussianization method and illustrated how to use it for multivariate density estimation in the context of Earth system science. The problem is highly relevant given the advent of all kinds of Earth data (both remotely sensed and in situ observations), novel products, and model simulations. Density estimation is a long-standing, unresolved problem in statistics and machine learning, mainly because of the curse of dimensionality. Data in remote sensing and geosciences pose additional problems for PDF estimation: high-dimensional data, nonlinear feature relations, many noise sources, and distinct spatial–temporal
structures. Looking at the literature, most sources dealing with density estimation involve few dimensions, treat the problems marginally, and construct parametric models of the densities. The Gaussianization methodology 1) scales to very high dimensions, 2) jointly works with all dimensions through simple orthogonal transforms plus marginal operations, and 3) does not assume any parametric form of the density. Using the standard multivariate Gaussian as a convenient goal distribution in the transform domain leverages the change-of-variables formula to compute exact probability densities. By extension, we are able to compute IT metrics easily.

We showed empirical performance evidence in several Earth system data analysis problems, using a wide diversity of data (multispectral, hyperspectral, SAR, and global products from satellites and Earth system models), and addressed the key problems of information estimation, redundancy, and synthesis. Our results confirmed the validity of the method, for which we anticipate wide use and adoption. The framework enables us to tackle all applications involving a PDF estimation, from data classification to denoising and coding, which were not treated in this article. The methodology also facilitates computing other interesting IT measures, such as KL divergence and conditional independence, which will be a subject of future research.

ACKNOWLEDGMENTS

This research was funded by the European Research Council (ERC), under the ERC–CoG-2014 project (grant 647423) and the ERC-SyG-2019 USMILE project (grant 855187). J. Emmanuel Johnson thanks the European Space Agency (ESA) for support via the Early Adopter Call of the Earth System Data Lab project. Additional support was provided by Project RTI2018-096765-A-100 (MCIU/AEI/FEDER, UE). Valero Laparra is supported by the projects TEC2016-77741-R, DPI2017-89867-C2-2-R, and PID2019-109026RB-I00. Maria Piles thanks the ESA for the long-term support of this initiative.
AUTHOR INFORMATION

J. Emmanuel Johnson (juan.johnson@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia, 46010, Spain.

Valero Laparra (valero.laparra@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia, 46010, Spain.

María Piles (maria.piles@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia, 46010, Spain. She is a Senior Member of IEEE.

Gustau Camps-Valls (gustau.camps@uv.es) is with the Image Processing Laboratory, University of Valencia, Valencia, 46010, Spain. He is a Fellow of IEEE.

REFERENCES

[1] W. Buermann, J. Dong, X. Zeng, R. B. Myneni, and R. E. Dickinson, “Evaluation of the utility of satellite-based vegetation leaf area index data for climate simulations,” J. Clim., vol. 14, no. 17, pp. 3536–3550, 2001. doi: 10.1175/1520-0442(2001)0142.0.CO;2.
[2] R. H. Moss et al., “The next generation of scenarios for climate change research and assessment,” Nature, vol. 463, no. 7282, pp. 747–756, 2010. doi: 10.1038/nature08823.
[3] J. T. Overpeck, G. A. Meehl, S. Bony, and D. R. Easterling, “Climate data challenges in the 21st century,” Science, vol. 331, no. 6018, pp. 700–702, 2011. doi: 10.1126/science.1197869.
[4] V. Eyring et al., “Taking climate model evaluation to the next level,” Nature Climate Change, vol. 9, no. 2, pp. 102–110, 2018. doi: 10.1038/s41558-018-0355-y.
[5] M. Reichstein et al., “Deep learning and process understanding for data-driven Earth System Science,” Nature, vol. 566, pp. 195–204, Feb. 2019. doi: 10.1038/s41586-019-0912-1.
[6] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006.
[7] G. A. Darbellay and I. Vajda, “Estimation of the information by an adaptive partitioning of the observation space,” IEEE Trans. Inf. Theory, vol. 45, no. 4, pp. 1315–1321, Sept. 2006. doi: 10.1109/18.761290.
[8] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Phys. Rev. E, vol. 69, no. 6, p. 066138, June 2004. doi: 10.1103/PhysRevE.69.066138.
[9] Q. Wang, S. Kulkarni, and S. Verdú, “A nearest-neighbor approach to estimating divergence between continuous random vectors,” in Proc. IEEE Int. Symp. Inf. Theory, 2006, pp. 242–246. doi: 10.1109/ISIT.2006.261842.
[10] N. Leonenko, L. Pronzato, and V. Savani, “A class of Rényi information estimators for multidimensional densities,” Ann. Statist., vol. 36, no. 5, pp. 2153–2182, 2008. doi: 10.1214/07-AOS539.
[11] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization. Hoboken, NJ: Wiley, 2015.
[12] F. Pérez-Cruz, “Estimation of information theoretic measures for continuous random variables,” in Proc. 22nd Ann. Conf. Neural Inf. Process. Syst., in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. Curran Associates, 2009, pp. 1257–1264.
[13] S. Paul and D. N. Kumar, “Spectral-spatial classification of hyperspectral data with mutual information based segmented stacked autoencoder approach,” ISPRS J. Photogramm. Remote Sens., vol. 138, pp. 265–280, 2018. doi: 10.1016/j.isprsjprs.2018.02.001.
[14] A. G. Konings, K. A. McColl, M. Piles, and D. Entekhabi, “How many parameters can be maximally estimated from a set of measurements?” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 5, pp. 1081–1085, 2015. doi: 10.1109/LGRS.2014.2381641.
[15] A. Marinoni and P. Gamba, “Unsupervised data driven feature extraction by means of mutual information maximization,” IEEE Trans. Computat. Imag., vol. 3, no. 2, pp. 243–253, 2017. doi: 10.1109/TCI.2017.2669731.
[16] J. Zhang, M. Zareapoor, X. He, D. Shen, D. Feng, and J. Yang, “Mutual information based multi-modal remote sensing image registration using adaptive feature weight,” Remote Sens. Lett., vol. 9, no. 7, pp. 646–655, 2018. doi: 10.1080/2150704X.2018.1458343.
[17] S. Prasad and L. M. Bruce, “Hyperspectral feature space partitioning via mutual information for data fusion,” in Proc. 2007 IEEE Int. Geosci. Remote Sens. Symp., pp. 4846–4849. doi: 10.1109/IGARSS.2007.4423946. [18] L.-Y. Zhao, B.-Y. Lü, X.-R. Li, and S.-H. Chen, “Multi-source remote sensing image registration based on scale-invariant feature transform and optimization of regional mutual information,” Acta Phys. Sin., vol. 64, no. 12, p. 124,204, 2015. doi: 10.1109/IGARSS.2016.7729658. [19] S. Chen, X. Li, L. Zhao, and H. Yang, “Medium-low resolution multisource remote sensing image registration based on sift and robust regional mutual information,” Int. J. Remote Sens., vol. 39, no. 10, pp. 3215–3242, 2018. doi: 10.1080/01431161.2018.1437295. [20] X. Xu, X. Li, X. Liu, H. Shen, and Q. Shi, “Multimodal registration of remotely sensed images based on Jeffrey’s divergence,” ISPRS J. Photogramm. Remote Sens., vol. 122, pp. 97–115, Dec. 2016. doi: 10.1016/j.isprsjprs.2016.10.005. [21] A. C. Frery, “Stochastic contrast measures for SAR data: A survey,” J. Radars., vol. 8, no. 6, pp. 758–781, 2019. doi: 10.12000/JR19108. [22] A. D. C. Nascimento, A. C. Frery, and R. J. Cintra, “Detecting changes in fully polarimetric SAR imagery with statistical information theory,” IEEE Trans. Geosci. Remote. Sens., vol. 57, no. 3, pp. 1380–1392, 2019. doi: 10.1109/TGRS.2018.2866367. [23] B. L. Ruddell, N. A. Brunsell, and P. Stoy, “Applying information theory in the geosciences to quantify process uncertainty, feedback, scale,” Eos, Trans. Amer. Geophys. Union, vol. 94, no. 5, pp. 56–56, 2013. doi: 10.1002/2013EO050007. [24] G. Papamakarios, “Neural density estimation and likelihoodfree inference,” 2019, arXiv:abs/1910.13233. [25] L. Ardizzone, J. Kruse, C. Rother, and U. Köthe, “Analyzing inverse problems with invertible neural networks,” 2019, arXiv:1808.04730. [26] D. J. Rezende et al., “Normalizing flows on tori and spheres,” 2020, arXiv:abs/2002.02428. [27] S. S. Chen and R. 
A. Gopinath, “Gaussianization,” in Proc. Adv. Neural Inf. Process. Syst., in Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS), Denver, CO, 2000, pp. 423–429. [28] V. Laparra, G. Camps-Valls, and J. Malo, “Iterative gaussianization: From ICA to random rotations,” IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 537–549, 2011. doi: 10.1109/TNN.2011 .2106511. [29] J. Ballé, V. Laparra, and E. P. Simoncelli, “Density modeling of images using a generalized normalization transformation,” 2016, arXiv:abs/1511.06281. [30] C. Meng, Y. Song, J. Song, and S. Ermon, “Gaussianization flows,” 2020, arXiv:abs/2003.01941. [31] E. T. Nalisnick, A. Matsukawa, Y. W. Teh, D. Görür, and B. Lakshminarayanan, “Hybrid models with deep and invertible features,” 2019, arXiv:abs/1902.02767. [32] H. A. Sturges, “The choice of a class interval,” J. Amer. Statist. Assoc., vol. 21, no. 153, pp. 65–66, 1926. doi: 10.1080/01621459.1926 .10502161. [33] D. Freedman and P. Diaconis, “On the histogram as a density estimator:l2 theory,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 57, no. 4, pp. 453–476, 1981. doi: 10.1007/BF01025868. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [34] K. H. Knuth, “Optimal data-based binning for histograms and histogram-based probability density models,” Digit. Signal Process., vol. 95, p. 102,581, Dec. 2019. doi: 10.1016/j. dsp.2019.102581. [35] C. M. Bishop, “Pattern recognition and machine learning,” in Information Science and Statistics, 5th ed. New York: Springer, 2007. [36] S. S. Chen and R. A. Gopinath, “Gaussianization,” in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., Cambridge, MA: MIT Press, 2001, pp. 423–429. [37] G. M. Goerg, “The lambert way to gaussianize heavy-tailed data with the inverse of Tukey’s h transformation as a special case,” 2015, arXiv:1010.2265. [38] C. Durkan, A. Bekasov, I. 
Murray, and G. Papamakarios, “Neural spline flows,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett, Eds., Vancouver, BC, CA: Curran Associates, 2019, pp. 7511–7522. [39] G. E. P. Box and D. R. Cox, “An analysis of transformations,” J. Roy. Statist. Soc. Ser. B (Methodological), vol. 26, no. 2, pp. 211– 252, 1964. doi: 10.1111/j.2517-6161.1964.tb00553.x. [40] G. Liu, Y. Liu, M. Guo, P. Li, and M. Li, “Variational inference with Gaussian mixture model and householder flow,” Neural Netw. Official J. Int. Neural Netw. Soc., vol. 109, pp. 43–55, Jan. 2019. doi: 10.1016/j.neunet.2018.10.002. [41] J. M. Tomczak and M. Welling, “Improving variational autoencoders using householder flow,” 2016, arXiv:abs/1611.09630. [42] J. H. Friedman, “Exploratory projection pursuit,” J. Amer. Statist. Assoc., vol. 82, no. 397, pp. 249–266, 1987. doi: 10.1080/01621459.1987.10478427. [43] D. I. Inouye and P. Ravikumar, “Deep density destructors,” in Proc. 35th Int. Conf. Machine Learn., 2018, pp. 2167–2175. [44] P. Jaini, K. A. Selby, and Y. Yu, “Sum-of-squares polynomial flow,” Proceedings of Machine Learning Research, vol. 97, K. Chaudhuri and R. Salakhutdinov, Eds., Long Beach, CA: PMLR, June 9–15, 2019, pp. 3009–3018. [45] S. Tuttle and G. Salvucci, “Empirical evidence of contrasting soil moisture–precipitation feedbacks across the United States,” Science, vol. 352, no. 6287, pp. 825–828, 2016. doi: 10.1126/science.aaa7185. [46] J. Cardoso, “Dependence, correlation and Gaussianity in independent component analysis,” J. Mach. Learn. Res., vol. 4, nos. 7–8, pp. 1532–4435, 2004. doi: 10.1162/jmlr.2003.4.7-8.1177. [47] C. E. Shannon, “A mathematical theory communication,” Bell Syst. Techn. J., vol. 27, pp. 379–423, 1948. [48] M. S. Watanabe, “Information theoretical analysis of multivariate correlation,” IBM J. Res. Develop., vol. 4, no. 1, pp. 66–82, 1960. doi: 10.1147/rd.41.0066. [49] M. 
Studený and J. Vejnarová, The multiinformation function as a tool for measuring stochastic dependence,” in Proc. NATO Adv. Study Inst. Learn. Graph. Models., 1998, pp. 261–297. doi: 10.5555/308574.308673. [50] M. Jung et al., “Global patterns of land-atmosphere fluxes of carbon dioxide, latent heat, and sensible heat derived from eddy covariance, satellite, and meteorological observations,” J. Geophys. Res., BiogeoSci., vol. 116, no. G3, 2011. doi: 10.1029/2010JG001566. 207
[51] L. Gómez-Chova, D. Fernández-Prieto, J. Calpe, E. Soria, J. VilaFrancés, and G. Camps-Valls, “Urban monitoring using multitemporal SAR and multispectral data,” Pattern Recognit. Lett., vol. 27, no. 4, pp. 234–243, 2006. doi: 10.1016/j.patrec.2005.08.004. [52] M. F. Baumgardner, L. L. Biehl, and D. A. Landgrebe. “220 Band AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian Pine Test Site 3.” Sept. 2015. Purdue University. https://purr .purdue.edu/publications/1947/1 [53] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geogr. Inf. Syst., 2010, pp. 270–279. doi: 10.1145/1869790.1869829. [54] N. Sánchez, Á. González-Zamora, J. Martínez-Fernández, M. Piles, and M. Pablos, “Integrated remote sensing approach to global agricultural drought monitoring,” Agric. For. Meteorol., vol. 259, pp. 141–153, Sept. 2018. doi: 10.1016/j.agrformet.2018.04.022. [55] R. Fernandez-Moran et al., “SMOS-IC: An alternative SMOS soil moisture and vegetation optical depth product,” Remote Sens., vol. 9, no. 5, p. 457, May 2017. doi: 10.3390/rs9050457. [56] M. Mahecha et al., “Earth system data cubes unravel global multivariate dynamics,” Earth Syst. Dynamics, vol. 11, no. 1, pp. 201–234, Feb. 2020. doi: 10.5194/esd-11-201-2020. [57] Á. González-Zamora, N. Sánchez, and M. Piles. “Global Soil Moisture Agricultural Drought Index (SMADI).” June 17, 2019. Zenodo. https://zenodo.org/record/3247649#.YFCot9IzbIU [58] U.S. Drought Monitor. https://droughtmonitor.unl.edu/ (accessed Mar. 3, 2020). [59] J. Zscheischler, M. D. Mahecha, S. Harmeling, and M. Reichstein, “Detection and attribution of large spatiotemporal extreme events in Earth observation data,” Ecol. Informat., vol. 15, pp. 66–73, May 2013. doi: 10.1016/j.ecoinf.2013.03.004. [60] J. S. Boyer, “Plant productivity and environment,” Science, vol. 218, no. 4571, pp. 443–448, 1982. doi: 10.1126/science.218. 4571.443. [61] C. P. Kelley, S. 
Mohtadi, M. A. Cane, R. Seager, and Y. Kushnir, “Climate change in the Fertile Crescent and implications of the recent Syrian drought,” Proc. Natl. Acad. Sci., vol. 112, no. 11, pp. 3241–3246, 2015. doi: 10.1073/pnas.1421533112. [62] S. Fritz et al., “A comparison of global agricultural monitoring systems and current gaps,” Agric. Syst., vol. 168, pp. 258–272, Jan. 2019. doi: 10.1016/j.agsy.2018.05.010. [63] M. Weiss, F. Jacob, and G. Duveiller, “Remote sensing for agricultural applications: A meta-review,” Remote Sens. Environ., vol. 236, p. 111,402, Jan. 2020. doi: 10.1016/j.rse.2019.111402. [64] S. Sadri, E. F. Wood, and M. Pan, “Developing a drought-monitoring index for the contiguous US using SMAP,” Hydrol. Earth Syst. Sci., vol. 22, no. 12, pp. 6611–6626, 2018. doi: 10.5194/ hess-22-6611-2018. [65] N. Sánchez, Á. González-Zamora, M. Piles, and J. MartínezFernández, “A new Soil Moisture Agricultural Drought Index (SMADI) integrating MODIS and SMOS products: A case of study over the Iberian Peninsula,” Remote Sens., vol. 8, no. 4, p. 287, 2016. doi: 10.3390/rs8040287. [66] G. P. Petropoulos and T. Islam, Remote Sensing of Hydrometeorological Hazards. Boca Raton, FL: CRC Press, 2017. 208 [67] R. F. Adler et al., “The Version-2 Global Precipitation Climatology Project (GPCP) Monthly Precipitation Analysis (1979-Present),” J. Hydrometeorol., vol. 4, no. 6, pp. 1147–1167, 2003. doi: 10.1175/1525-7541(2003)004<1147:T VGPCP> 2.0.CO;2. [68] G. J. Huffman, R. F. Adler, D. T. Bolvin, and G. Gu, “Improving the global precipitation record: GPCP version 2.1,” Geophys. Res. Lett., vol. 36, no. 17, 2009. doi: 10.1029/2009GL040000. [69] G. Tramontana et al., “Predicting carbon dioxide and energy fluxes across global fluxnet sites with regression algorithms,” Biogeosciences, vol. 13, no. 14, pp. 4291–4313, 2016. doi: 10.5194/ bg-13-4291-2016. [70] B. Martens et al., “Gleam v3: Satellite-based land evaporation and root-zone soil moisture,” Geoscientific Model Develop., vol. 
10, no. 5, pp. 1903–1925, 2017. doi: 10.5194/gmd-10-1903-2017. [71] D. G. Miralles, T. R. H. Holmes, R. A. M. De Jeu, J. H. Gash, A. G. C. A. Meesters, and A. J. Dolman, “Global land-surface evaporation estimated from satellite-based observations,” Hydrol. Earth Syst. Sci., vol. 15, no. 2, pp. 453–469, 2011. doi: 10.5194/hess15-453-2011. [72] V. Laparra and R. Santos-Rodriguez, “Spatial/spectral information trade-off in hyperspectral images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2015, pp. 1124–1127. [73] M. L. Waskom, “seaborn: statistical data visualization,” J. Open Source Softw., vol. 6, no. 60, p. 3021, 2021. doi: 10.21105/ joss.03021. [74] J. D. Hunter, “Matplotlib: A 2D graphics environment,”Comput. Sci. Eng., vol. 9, no. 3, pp. 90–95, May-June 2007. doi: 10.1109/ MCSE.2007.55. [75] L. Gomez-Chova, D. Fernández-Prieto, J. Calpe, E. Soria, J. Vila, and G. Camps-Valls, “Urban monitoring using multi-temporal SAR and multi-spectral data,” Pattern Recognit. Lett., vol. 27, no. 4, pp. 234–243, 2006. doi: 10.1016/j.patrec.2005.08.004. [76] P. Castracane et al., “Monitoring urban sprawl and its trends with EO data. UrbEx, a prototype national service from a WWF-ESA joint effort,” in Proc. 2003 2nd GRSS/ISPRS Joint Workshop on Remote Sens. Data Fusion over Urban Areas, pp. 245– 248. doi: 10.1109/DFUA.2003.1219997 [77] D. Foreman-Mackey, “corner.py: Scatterplot matrices in Python,” J. Open Source Softw., vol. 1, no. 2, June 2016. doi: 10.21105/joss.00024. [78] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst. (GIS ‘10), Nov. 2010, pp. 270–279. doi: 10.1145/1869790.1869829 [79] QGIS Development Team. QGIS Geographic Information System. (2021). QGIS Association. [Online]. Available: https://www .qgis.org [80] S. Hoyer and J. Hamman, “xarray: N-D labeled arrays and datasets in Python,” J. Open Res. Softw., vol. 5, no. 1, p.10, 2017. 
doi: 10.5334/jors.148. [81] Met Office, 2010–2015, “Cartopy: A Cartographic Python Library With a Matplotlib Interface,” Exeter, Devon. [Online]. Available: http://scitools.org.uk/cartopy GRS IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
Wireless Sensor Networks Applied to Precision Agriculture
A worldwide literature review with emphasis on Latin America

MÓNICA KAREL HUERTA, ANDREA GARCÍA-CEDEÑO, JUAN CARLOS GUILLERMO, AND ROGER CLOTET

Agriculture is fundamental to the economic and social development of populations worldwide since the food of millions of people depends on agriculture. According to the Food and Agriculture Organization (FAO) of the United Nations, in 2017, more than 100 million people were food insecure. In developing countries, where this situation is more pronounced, agriculture is a family activity in which farming processes don’t make use of technology. The use of wireless sensor networks (WSNs) to provide precision agriculture (PA) has demonstrated positive results related to crop yields and resource management, which raises the need to determine the progress of research on the impact of these technologies. This article analyzes different proposals focused on the optimization of agricultural processes, with particular focus on their benefit to small and medium producers and, therefore, to food security. A literature review was conducted with an emphasis on scientific developments in Latin America, a region where family farming is one of the main economic activities. Through this study, it was possible to generate indicators of development and general successes and setbacks as well as to determine the main technical characteristics and most addressed topics in publications on this subject.

IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, DECEMBER 2021
Digital Object Identifier 10.1109/MGRS.2020.3044235
Date of current version: 9 February 2021
0274-6638/21©2021IEEE
PROBLEMS IN AGRICULTURE
Today, millions of people suffer from hunger due to the lack of availability and stability of the food supply, the main causes of food insecurity [1]–[3]. An example of this is a situation in which the inhabitants of a given territory do not have access to enough food to maintain a healthy life [3]–[7]. Food insecurity is also a growing concern worldwide because of its serious consequences: the degradation of health, acute malnutrition, increased risk of birth defects, and high mortality [4], [7]–[10]. According to statistics from FAO, in 2017, 124 million people were food insecure in 51 countries [11]. Among the different factors that give rise to these conditions are climate change, government policies, inequitable land distribution, chronic poverty, and insufficiently developed agriculture.

Agriculture is a key activity in overcoming famine and promoting economic growth [2], [3], [5]–[7], [12]–[14]. Although agriculture depends on natural resources, their irresponsible use results in environmental degradation: water scarcity, soil erosion, greenhouse gas emissions, and deforestation [2], [15], [16]. In the most advanced countries, priority is given to the industrial agricultural model that depends on hydrocarbons, external energy, and agrochemicals, generating an unfavorable impact on biodiversity and human beings [16], [17].

In contrast to first world countries, the agricultural economic sector of many nations is constituted mostly of family farms [18]. Family agriculture contributes approximately 50% of crops globally [1]; in Latin America and Africa, family-farm lands correspond to 34.5% and 80%, respectively [16]. Nevertheless, family agriculture production normally does not reach markets. On the basis of this information, it has been established that improving the performance of these small producers will significantly increase the availability of food [16], [19], [20].
Given these circumstances, there is a need to adopt new sustainable procedures and apply them to the agricultural sector, with a focus on environmental conservation, profitability through the appropriate use of resources, and promotion of sustainable family-farming production [1], [2], [5], [13], [15], [16]. Such methods form a safeguard against food insecurity because of their economic viability and guarantee of maintaining or increasing agricultural yields for present and future generations [1], [13], [14], [16], [21].

With regard to government initiatives, most administrations intended to guide their policies and financing plans in favor of low-income farmers and to implement technology in the fields [4], [5], [13], [17], [22]. However, the impact has been minimal, as small producers are unable to adopt the innovations or they do so inappropriately [17], [23]–[26]. Due to the gap between small farmers and technology, the scientific community is working on various proposals aimed at optimizing agricultural processes and combating the obstacles generated by climate factors through the use of various technologies: WSNs, mobile geographic information systems, geostatistics, and spectrum analysis, among others [1], [16], [19], [20], [24], [27]–[29]. The application of these techniques in different crop processes constitutes a new trend of PA [30]–[32].

Within PA, the use of WSNs is the most common technique for improving traditional agricultural processes [30], [31], [33]–[35]. It is a noninvasive method that monitors information regarding the amount of resources, the weather, and environmental factors. Data processing can generate systems for predicting and modeling agriculture-affected parameters to respond to, for example, climate change [36]. As a consequence, appropriate agronomic techniques that comply with the principle of sustainability and establish a solution to food insecurity are designed for a given crop.
Therefore, the use of PA increases yields and leads to a crop-management strategy in which investment in agrochemicals is reduced and profits increase due to the consequent productivity growth [30], [31].

The established problems support the indisputable importance of WSNs in PA for the promotion of agriculture, based on scientific evidence presented in literature reviews and surveys analyzing the application of this technology and its effectiveness. In light of the results and to provide continuity to this technology’s development, this article seeks to establish the main technical characteristics reported in several articles and determine their impact on and benefit for small producers.

TECHNOLOGY AND FAMILY-FARMING PRODUCTION
Within PA, there is a large amount of scientific evidence that reflects the positive effects of WSNs. The application of this method allows farmers to identify the interaction between crop growth and soil/environmental factors. Its effectiveness is manifested in decision making based on monitored data regarding agronomic techniques suitable for a given crop as well as for the protection of the environment and farm products [26], [37]–[39].

However, in Latin American countries, where agriculture is a major economic activity, most of the agriculture-producing population belongs to rural sectors and practices farming at a family level as a source of livelihood. The biggest drawback for the rural agricultural sector is the lack of synergy between popular advice and scientific development organizations. This is an obstacle to the inclusion of innovative, low-cost proposals for rural farmers. Faced with this problem, governments have opted for measures that promote domestic agriculture [1], [6], [16]. Nevertheless, technological solutions can be perceived incorrectly if they are defined primarily as highly sophisticated devices instead of as techniques that generate new capacities to produce goods and services [5].
In the same area, there is another impediment concerning education: small farmers cannot acquire industrial electronic solutions due to their high prices and the farmers’ lack of access to the training required for the use of such equipment [23].

METHODOLOGY
To obtain current information that shows the state of scientific technological research on agricultural WSNs and their impact on small producers, we have analyzed research documents developed in various regions of the planet, with an emphasis on Latin America. The literature review will be a starting point for future projects in the search to meet environmental needs and for proposing solutions to reduce food insecurity. Articles from the Scopus database were considered due to its prestige, magnitude, and the quality of content [40]. In addition, because we wanted to give geographical priority to Latin America and found scarce evidence about it in the main database, we included works from digital repositories of universities and research institutes as well as from journals in the main Latin American indexes [41], [42].

The established methodology is based on a qualitative study of primary research documents. The documents correspond to indexed articles related to proposals and studies of WSN-based platforms developed in the last 14 years and applied to PA. For the extraction of scientific articles, the following search string was applied to the title, abstract, and keywords metadata: “wireless sensor network” and “precision agriculture.” The number of matching documents was 694, resulting from the search string plus a temporal filter that limits the selection of works to those published from January 2005 to December 2019. Applying Cochran’s sample calculation formula [43] with a 95% confidence level, ±10% accuracy, and maximum variability gives 86, the number of documents to be analyzed (see Figure 1).

Using inclusion and exclusion criteria (see Table 1), we made the selection of articles that belong to the sample. Similarly, we carried out the extraction of the technical data and results regarding the implementation of these proposals. From the information acquired, the main parameters and variables of monitoring were analyzed as well as the acquisition devices and their accessibility according to the use of open hardware and open source software. Other assessments correspond to the most used sensors and brands, communication protocols, application areas, the development of graphic interfaces, and the creation of open databases. Similarly, the main research problems that have motivated the development of these proposals, the beneficiaries, and the effects of the implementations of these proposals at scientific and social levels were considered. As a final stage, the information collected is classified according to
◗◗ utility: the main applications for which the different networks were designed
◗◗ technical characteristics: to obtain indicators of the most commonly used equipment and materials
◗◗ implementation: to determine the type of crops in which WSN technology is implemented and the level of scientific development in different regions, with emphasis on Latin America.

FIGURE 1. The documents sample, sorted by year. [Bar chart titled “‘Wireless Sensor Network’ and ‘Precision Agriculture’ Documents Sample, Sorted by Year”; x-axis: year, 2005 to 2019; y-axis: documents (%).]

TABLE 1. THE INCLUSION AND EXCLUSION CRITERIA FOR HARDWARE-RELATED ARTICLES.
| INCLUDED | EXCLUDED |
| --- | --- |
| Primary research presenting a proposal for a WSN-based platform applied to agriculture | Documentation of secondary research of any kind |
| Documents presenting an agricultural technological proposal, by which a minimum of one physical or chemical parameter is monitored | Documents describing agricultural platforms based on non-WSN technologies |
| Documentation detailing the parameters monitored or the use of hardware in terms of sensing and data acquisition | Documentation that does not technically express hardware specifications or parameters to be monitored |
| Documents whose proposals are physically applied in a laboratory, greenhouse, or open field | Documents focused only on software matters, such as algorithms, information processing, and device programming, among others |

ANALYSIS AND RESULTS
In this section, we extract data regarding the implementation and evaluation of the various platforms detailed in the articles. Applying the selection criteria specified in Table 1, a total of 86 documents match these criteria as primary scientific research documents from January 2005 to December 2019 [25], [44]–[127]. The results and analysis of these documents were divided into three sections: utility, technical characteristics, and implementation.

UTILITY
The documentary sample is composed of documentation on platforms that function as a support tool with a specific utility. To determine the main uses of the technology in question, we analyzed the most common applications for which the various sensor networks have been evaluated or designed. Likewise, the variables and parameters that coincide in the majority of the documentary sample were extracted. This review uses the term variable as a reference to the components of an ecosystem: soil, plant, environment, and others. The state and features of each variable have been set as parameters, a term used to reference chemical and physical indicators, such as humidity, temperature, pH, and others.

As can be seen in Figure 2, the main applications of WSNs in agriculture are oriented toward the appropriate administration of water resources, reflected in irrigation systems, and toward increasing the production and efficiency of crops through the automation and optimization of processes. These two practices correspond to 26.7 and 36.1% of the projects, respectively. On a smaller scale, the objectives of the technical evaluation of network performance and development of inclusive technologies, whether low cost or user friendly, represent 20.9 and 10.5%, respectively. Platforms aimed at improving the final quality of products and reducing the vulnerability of crops to climate change account for 4.7% each, and pest treatment accounts for 2.3%.

The results can be grouped into four areas. First is the improvement of crop yields, which includes production optimization and the automation of water management and pest treatment, representing 65.1%. Second is the evaluation of the proposals and/or how to make them affordable, both of which focus on generalizing the use of WSNs; in this case, they represent 31.4%. In third and fourth place are final product quality and climate change vulnerability reduction, both cases with 4.7%.

FIGURE 2. The main applications of WSN in agriculture. [Bar chart titled “WSN Applications in Agriculture,” documents (%): water management/irrigation, 26.7%; production optimization, 36.1%; performance technical evaluation, 20.9%; inclusive technology/affordable solutions, 10.5%; final product quality, 4.7%; climate change vulnerability reduction, 4.7%; pest treatment, 2.3%.]

FIGURE 3. The percentage representation of analyzed variables within the documentary sample. [Bar chart titled “Analyzed Variables,” documents (%): soil, 77.91; environment, 72.09; plant, 16.28.]

From the Figure 2 background data, three major aspects are deduced. First, a solution for optimization in agriculture is needed. Using WSNs, farmers can reduce operational costs and/or improve productivity to obtain higher incomes.
Additionally, the environmental impact is reduced with this optimization as a result of improved awareness of greener production (environment friendly, organic, and so forth) and compliance with different required regulations (for example, the reduction of pesticides in pest control). As a result of this enhanced leverage, farmers get a better-quality crop. The second aspect is the limited resource of water. Its constraint is exacerbated by climate change, and it is estimated that about 70% of freshwater will be consumed by 2050 [128]. However, it is in great demand in agriculture for different kinds of irrigation, from flooding to dripping; this has led to conflicts in establishing fair usage among different consumers. The application of WSNs optimizes water management and enables a more equitable distribution. Third, WSN solutions must be tested and validated to demonstrate their feasibility. Working with WSNs has many advantages over other technologies: low cost, adaptability, an easy learning curve, inclusive technology with open and proprietary solutions, a good cost–benefit relation, and the possibility of nondestructive tests of the technology in crops. All of this makes many preliminary studies possible, as the number of articles shows.

To make the right agronomical decisions possible, WSN applications need to monitor various parameters. From the analyzed works, a variable classification was made based on where data were acquired: in “plant,” “environment,” and “soil.” Figure 3 shows that soil is the most frequent, with 77.91%, followed closely by the environment (72.09%) and, lastly, with a much smaller value, by the characteristics of the plant or fruit, corresponding to 16.28% of the total investigations from the documentary sample. Normally, different parameters were monitored simultaneously. Also, sometimes, the same parameter was measured for distinct variables, for example, humidity in the soil and environment.
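The sample-size step described in the Methodology and the percentages reported for Figures 2 and 3 can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the exact variant of Cochran’s formula is an assumption, since the article cites [43] without writing the formula out. With the textbook form (z = 1.96 for 95% confidence, p = 0.5 for maximum variability, e = 0.10) and a finite-population correction for the 694 matching documents, the result is about 84.5, which rounds up to 85. This is close to, but not identical to, the 86 reported, so the authors may have used a slightly different variant or rounding.

```python
import math


def cochran_sample_size(population, z=1.96, margin=0.10, p=0.5):
    """Cochran's formula with the usual finite-population correction.

    Assumed variant (not spelled out in the article):
    n0 = z^2 * p * (1 - p) / e^2 and n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)      # corrected for a population of N
    return n0, n


def implied_count(percent, sample_size):
    """Nearest whole number of documents behind a reported percentage."""
    return round(percent / 100 * sample_size)


POPULATION = 694  # documents matched by the search string (from the text)
SAMPLE = 86       # documents analyzed (from the text)

n0, n = cochran_sample_size(POPULATION)
print(f"n0 = {n0:.2f}, corrected n = {n:.2f}, rounded up = {math.ceil(n)}")

# Figure 2: share of documents per application (values from the text).
applications = {
    "water management/irrigation": 26.7,
    "production optimization": 36.1,
    "performance technical evaluation": 20.9,
    "inclusive technology/affordable solutions": 10.5,
    "final product quality": 4.7,
    "climate change vulnerability reduction": 4.7,
    "pest treatment": 2.3,
}
total = sum(applications.values())
grouped = {
    "crop yield": round(
        applications["water management/irrigation"]
        + applications["production optimization"]
        + applications["pest treatment"], 1),
    "evaluation/affordability": round(
        applications["performance technical evaluation"]
        + applications["inclusive technology/affordable solutions"], 1),
}
print(grouped, f"total = {total:.1f}%")

# Figure 3: share of documents per monitored variable (values from the text).
variables = {"soil": 77.91, "environment": 72.09, "plant": 16.28}
counts = {name: implied_count(pct, SAMPLE) for name, pct in variables.items()}
print(counts, "categories overlap:", sum(counts.values()) > SAMPLE)
```

Under these assumptions, the grouped application shares reproduce the 65.1% and 31.4% quoted in the text, and the Figure 3 percentages map to 67, 62, and 14 of the 86 documents. The Figure 2 percentages add up to 105.9%, which is consistent with some platforms being counted under more than one application, just as many platforms monitor more than one variable.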
Monitoring plants is more difficult because, as living entities, they grow according to their own characteristics (height, foliage, thickness, and so on), so sensors must be relocated to keep measuring and functioning correctly. In our experience, the direct monitoring of plants caused problems of exactly this kind: sensors lost the line of sight, which reduced the effective range of transmission and, in the worst case, resulted in losing all connectivity or part of the data. In some situations, increasing the transmission power, at the cost of higher power consumption, may mitigate these disruptions. Because soil and environmental measurements are usually sufficient to assess the condition of crops effectively, most scientific documents rule out the direct surveillance of plants. For instance, water/irrigation management can be monitored with a high degree of accuracy using two variables: soil moisture and ambient humidity. These parameters are specific and are present in many investigations.

For each acquisition location, further analysis is performed, identifying individual monitored variables. Figure 4 displays that information. For soil, the most frequently monitored parameter is moisture, followed by temperature and, to a lesser extent, pH and electrical conductivity. Concerning the environment, temperature is the most monitored parameter, followed by relative humidity and lighting or solar radiation. On a minor scale, the parameters of wind speed and direction, atmospheric pressure, and gas concentration are considered. From the scarce documentation focused on monitoring plant variables, temperature, humidity, and the diameter of the stem or trunk are the most common.

The common use of humidity and temperature parameters, either in the soil or in the environment, together with the luminance (the amount of solar radiation the crop receives), is not surprising. As can be deduced from most studies, and as we can reaffirm with our experience, they are the agents with the most significant impact on the appropriate development of the majority of crops.

FIGURE 4. The percentage representation of measured parameters per variable within the documentary sample. [Bar chart, “Sensorized Parameters Per Variable”: documents (%) for soil, environment, and plant. Legend: moisture/relative humidity, temperature, luminosity, pH, electric conductivity, pressure, carbon dioxide concentration, wind speed/direction, trunk diameter growth, and others.]

TECHNICAL CHARACTERISTICS
An important factor is the technical characteristics of the sensing device. To analyze this information about sensors, the communication protocols, user interface, and availability of freely accessible data were extracted. The objective was to collect the main technical characteristics of WSNs in the selected articles and then to determine the needs they cover and specify which new approaches would be convenient for future research.

To obtain this information, the frequency of the use of the sensing hardware within the documentary sample was analyzed, revealing which models and brands of sensors stand out among the great diversity that exists. This information can be seen in Figure 5 and Table 2.

FIGURE 5. The percentage representation of the most commonly used sensor brands within the documentary sample. [Pie chart, “Most Popular Sensor Brands.” Recoverable labels: Meter Group (formerly Decagon), 15.1%; Campbell, 4.7%; Honeywell, 4.7%; Davis Instruments, 2.3%; Vegetronix, 2.3%.]

TABLE 2. THE MOST POPULAR SENSORS.
MODEL | PARAMETER
SHT1X by Sensirion | Digital humidity sensor
SHT7X by Sensirion | Digital relative humidity and temperature sensor
DHT11 by Aosong | Digital temperature and humidity sensor
DHT22 by Aosong | Digital temperature and humidity sensor
EC-5 by Meter Group (formerly Decagon) | Soil moisture/volumetric water content
FC-28, YL-69, and HL-69 modules for Arduino | Soil moisture
LM-35 by various manufacturers | Precision centigrade temperature
DS18B20 by various manufacturers | Digital thermometer

Meter Group (formerly Decagon and UMS) has more than half of the market; it has specialized in agricultural and food sensors since its creation in the 1980s. Other brands share the rest of the market without a clear second position. Most sensors created by these vendors focus on humidity or temperature monitoring in the environment, soil, or plant. Both Decagon and UMS were pioneers before their merger and have been in the market for a long time with well-known products and broad support, which gives them an advantage over other companies.
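As an illustration of how a raw reading from one of the analog sensors in Table 2 becomes a physical value, the sketch below converts an analog-to-digital converter (ADC) count from an LM35 into degrees Celsius. The LM35 has a linear scale factor of 10 mV per °C; the 10-bit resolution and 5 V reference are assumptions chosen to resemble a common Arduino setup, and the function name `lm35_celsius` is hypothetical.

```python
def lm35_celsius(adc_count, vref=5.0, adc_bits=10):
    """Convert a raw ADC count from an LM35 into degrees Celsius.

    Assumes the LM35 scale factor of 10 mV per degree Celsius and an ADC
    whose count equals Vin * 2**adc_bits / vref (both are assumptions).
    """
    steps = 1 << adc_bits
    if not 0 <= adc_count < steps:
        raise ValueError("ADC count out of range")
    volts = adc_count * vref / steps
    return volts / 0.010  # 10 mV per degree Celsius


# Hypothetical reading: a count of 51 corresponds to roughly 0.249 V.
temperature = lm35_celsius(51)
print(f"{temperature:.1f} degrees Celsius")
```

Digital sensors such as the DHT22 or DS18B20 deliver calibrated values over their own protocols, so this kind of conversion applies only to analog outputs.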
To communicate between sensors or with the sink, many different wireless protocols may be used. The most commonly used one is Zigbee (built on IEEE Standard 802.15.4) [129], thanks to its low energy consumption. It was defined in 2004 and focuses on short-range, low-consumption connectivity. As presented in Figure 6, more than half of the collected material uses Zigbee (56.2%). Other protocols, each representing no more than 5% of the documents, are also used: Bluetooth [130], Wi-Fi [131], LoRaWAN [132], [133], radio-frequency identification [134], and GSM [135]. Almost 20% of the material does not specify the communication protocol used, making analysis impossible in such cases. Recently, LoRaWAN has been increasingly used. It is a newer protocol, with its first version defined in 2015. It offers a longer transmission range (up to 15 km in the countryside) than Zigbee, albeit at a lower data rate, with only slightly increased power consumption (compensated for in part by the evolution of technology, with better batteries and more efficient solar panels).

FIGURE 6. The percentage representation of the most commonly used wireless communication protocols. [Pie chart, “Wireless Communication Protocols”: Zigbee, 56.2%; does not specify, 19.1%; others, 6.7%; LoRaWAN, 4.5%; Wi-Fi, 4.5%; Bluetooth, 4.5%; GSM, 2.3%; RFID, 2.3%.]

Regarding the devices used for data acquisition, displayed in Figure 7, about 42% of documents make use of commercial or private motes or nodes. Here, these terms refer to platforms whose development depends on closed-source software or whose architecture expansion is limited to components of the same company that markets them. The most important private mote platforms are those of Memsic (formerly Crossbow Technology) [136] and Waspmote [137]. About 50% of the devices use open hardware nodes, most of them belonging to different microcontroller models and various versions of development boards from the Arduino microcontroller platform. Normally, research documents use open hardware due to its low cost and flexibility. Commercial hardware’s strength is its reliability, while its weakness is limited parametrization and higher cost. In open hardware, microcontrollers are the most used thanks to their flexibility and adaptability.

FIGURE 7. The classification of data acquisition devices by open source and private characterization within the documentary sample. [Stacked bar chart, “Data Acquisition Device Affordability Evaluation”: documents (%) for private/commercial (41.86), open hardware (50), and no specification (8.14). Legend: Crossbow–Memsic, Waspmote, others, Raspberry, microcontrollers, Arduino, and does not specify.]

As a part of this review, we verified whether the authors prioritized easy and friendly interaction between the platform and the user through the development of a graphical user interface (GUI), and we determined the type of application the authors chose. Many of the studies didn’t include the development of a GUI in their methodology or didn’t consider it important to report; as can be seen in Figure 8, this is the case in 50% of the analyzed documents. When the authors used a GUI and reported it, most of them developed a web application (36.05%), while 8.14% used the native mote application, and only 3.49% developed a mobile application. Papers reporting only research proposals normally didn’t include or report a GUI. Those reporting applied research, for example, with farmers as the end users, were more prone to include and report a developed GUI. The end users don’t have to know the technical details and prefer a user-friendly interface.

FIGURE 8. The percentage representation of GUIs within the documentary sample. [Bar chart, “Graphic User Interface”: documents (%) for web application (36.05), mobile application (3.49), hybrid application (1.16), native application (8.14), and not specified (50).]

IMPLEMENTATION
We also analyzed the type of farming conditions on which the research focused, the validated results of the application of this technology, and, finally, the regions where the research was deployed.

The implementation of the proposed platforms was, in most cases, in an open field (54.65%), followed by a greenhouse (20.93%) and laboratories (5.81%); 18.61% of the documents did not specify it. Normally, research started with a laboratory test and then moved to a greenhouse (as a more controlled environment) and, finally, an open field. In some cases, due to the crop type, the greenhouse was the final stage.

According to the conclusions collected from the different researchers in the analyzed documents, the implementation of WSNs for PA has achieved the following:
◗◗ a considerable reduction in water consumption through irrigation, based on monitored data rather than scheduled watering
◗◗ decision support to farmers and the generation of autonomous practices through automated processes, implemented together with the monitoring system
◗◗ increased crop yields and optimized crop growth while reducing the use of resources
◗◗ investigation of the variations of environmental conditions
◗◗ a decrease in the price of the systems compared to industrial solutions, thanks to the implementation of open software and hardware, which provide comparable results.

From a regional point of view, studied works are from various countries on five continents. Asia turned out to be the continent with the largest number of publications, with a total of 27 documents. From Latin America, the priority region for this analysis, nine articles were considered. There were 16 from Europe, nine from North America, four from Africa, and two from Oceania, as presented in Figure 9. As a result of the analysis concerning Latin America, the countries that contributed notable published research in the field of PA, including undergraduate and graduate theses, are Colombia, Mexico, Brazil, Ecuador, and Argentina. The geographic location of all of the reviewed documents and whether they are indexed publications or correspond to degree projects in digital repositories are outlined in Figure 10.

On one hand, Asia has the largest population of the five continents and a vast cultivated area (open field or greenhouse). For these reasons, and with many universities and research centers, Asian researchers explored and applied PA to improve crop management, resulting in more published documents in the area. On the other hand, Latin American countries have a high percentage of family farming [138] and also need better crop management. However, with limited research budgets and smaller universities, they have more difficulty making research contributions in the area. For these reasons, some analyzed documents in this region are from digital repositories.

FIGURE 9. The percentage representation of the regional distribution of the documentary sample. [Bar chart, “Research Documentation Distribution per Region”: documents for Latin America, Asia, Europe, North America, Africa, and Oceania, split into indexed articles and digital repositories.]

FIGURE 10. The documentary sample’s geographical locations. [World map distinguishing digital repositories from indexed articles. Country labels as extracted: Colombia (4), Ecuador (1), Ecuador (2), Mexico (3), United States (9), Canada (1), Iran (1), Egypt (2), Palestine (1), Kenya (1), Argentina (1), Brazil (2), Brazil (1), Colombia (4), Portugal (2), Spain (5), Tunisia (1), Malawi (1), Australia (1), Vietnam (3), Indonesia (1), Thailand (1), India (8), China (13), Macedonia (1), Greece (4), Germany (2), Italy (4), New Zealand (1), Malaysia (3), Philippines (1), Taiwan (1).]

CONCLUSIONS
Progress in the development of WSNs applied for PA is part of the solution for food insecurity because results from different authors show that, through this technology, it is possible to guarantee the quality of products, optimize processes to increase production and/or preserve resources, and reduce the gap between small farmers and new technologies. According to the experience in all of the analyzed publications, the application of WSNs for PA is very beneficial in terms of time, production, and environmental care. In the studied documentary sample, the parameters selected most often for monitoring are the humidity and temperature of the soil and environment, which constitute sufficient basic information to generate optimal agricultural plans. The main focus of the scientific community in developing WSN-based technologies for PA is production optimization and water management through the design of irrigation systems. Furthermore, it is agreed that these platforms, if based on open source and open hardware technology, are inclusive solutions for small and medium farmers, favoring more than one economic sector. However, many of the proposals do not direct their development objectives toward specific beneficiaries, such as populations affected by their economic capacity or a low level of production. The documents corroborate the validity of the platforms for general purposes, such as water saving and crop monitoring, without reflecting economic–social results.

The East, South, and Insular Asia regions are the main producers of scientific documents related to the implementation of WSNs for PA. Latin America is the second largest producer of scientific documents on this topic. This contrasts with the reality that the level of Latin American research cannot yet be compared with that of agricultural power regions, such as China, India, and the United States. The result is explained by the inclusion of the digital repository criterion since, in Latin America, most useful contributions correspond to undergraduate and graduate projects that, in some cases, are not published in indexed repositories. The documents from this region are articles released in regional journals, indexed in Latindex and Publindex, whose impact is not comparable to global databases, such as Scopus or IEEE Xplore.

Another feature that stands out in this analysis is the lack of importance given to user friendliness, with some systems requiring permanent assistance from a technician and intensive training because of the lack of a user-friendly interface. There are very few primary research articles experimenting with new wireless communication protocols, such as LoRaWAN. Most of them focus on the application of Zigbee, and fewer of them center on other well-known protocols. The most notable deficiency in the scientific research of this technology is the lack of freely accessible data, which would provide starting points for innovation through big data or machine learning to offer services adapted to different crop conditions and to forecast the variation of diverse environmental indicators over time.
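The irrigation approach reported by the reviewed works, in which watering follows monitored soil moisture rather than a fixed schedule, can be sketched as a simple threshold rule with hysteresis. This is only an illustrative assumption about how such a controller might look; the thresholds, the unit (volumetric water content in percent), and the `IrrigationController` class are hypothetical and are not taken from any of the reviewed systems.

```python
class IrrigationController:
    """Threshold rule with hysteresis on soil moisture (hypothetical values)."""

    def __init__(self, start_below=20.0, stop_above=30.0):
        if start_below >= stop_above:
            raise ValueError("start_below must be lower than stop_above")
        self.start_below = start_below  # open the valve under this moisture (%)
        self.stop_above = stop_above    # close the valve over this moisture (%)
        self.valve_open = False

    def update(self, soil_moisture):
        """Update and return the valve state for a new moisture reading."""
        if soil_moisture < self.start_below:
            self.valve_open = True
        elif soil_moisture > self.stop_above:
            self.valve_open = False
        # Between the two thresholds the previous state is kept.
        return self.valve_open


controller = IrrigationController()
states = [controller.update(m) for m in (25.0, 18.0, 26.0, 31.0, 25.0)]
print(states)
```

The gap between the two thresholds keeps the valve from toggling on every small fluctuation of the readings, which matters for battery-powered nodes and actuators.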
ACKNOWLEDGMENTS
The authors would like to express their gratitude for the support of the PLAGRI (Plataforma de digitalización agrícola para pymes basada en IOT/IOT-based agricultural digitization platform for SMEs) project by the Telecommunications and Telematics Research Group (GITEL), Universidad Politécnica Salesiana, Cuenca, Ecuador. They would also like to thank project EXT-2020-06 “Sistema de alerta temprana de heladas” from CONGOPE–ESPE and project RED DUS-C-01 “MASCHA – Monitoreo de microclima urbano” from RedDUS-C-ESPE. Additionally, the authors gratefully acknowledge the contributions of Jose Ignacio Castillo and Wolfgang Lichtenwagner for their suggestions in the original version of this document.

AUTHOR INFORMATION
Mónica Karel Huerta (mhuerta@ups.edu.ec) received her M.Sc. degrees in biomedical engineering and electrical engineering from Universidad Simón Bolívar (USB) in 1999 and 1994, respectively, and her Ph.D. degree (cum laude) in telematics engineering from the Universitat Politècnica de Catalunya (Spain) in 2006. She is with the Telecommunications and Telematics Research Group (GITEL), Universidad Politécnica Salesiana, Cuenca, 010102, Ecuador. She is a full professor at the Universidad Politécnica Salesiana of Cuenca–Ecuador, Cuenca, 010102, Ecuador. She was a professor, dean of graduate studies, coordinator of the doctorate in engineering, and the founder of the Networks and Telematics group at USB. She was a researcher at the Universidad de las Fuerzas Armadas Escuela Politécnica del Ejército in 2014, and in 2015 and 2016 she was also a researcher at the Universidad Politécnica Salesiana of Cuenca, both under the Prometeo program of Secretaría de Educación Superior, Ciencia, Tecnología e Innovación, Ecuador. Her research focuses on wireless networks, wireless sensor networks, precision agriculture, the Internet of Things, and telemedicine.
She is a Senior Member of IEEE, vice president of the IEEE Ecuador Section for 2020–2021, and a member of the IEEE Women in Engineering, Women in Communications, and Women in Engineering in Medicine and Biology. Andrea García-Cedeño (agarciac@ups.edu.ec) received her B.S. degree in electronic engineering with a major in industrial systems from Universidad Politécnica Salesiana in 2017. Since 2017, she has been a research assistant at Telecommunications and Telematics Research Group (GITEL), Universidad Politécnica Salesiana, Cuenca, 010102, Ecuador. Her research interests include precision agriculture, wireless sensor networks, and signals acquisition and processing. She is a Member of IEEE and currently serves as the Chapter secretary of the IEEE Engineering in Medicine and Biology Society of the IEEE Ecuador Section. Juan Carlos Guillermo (jguillermo@ups.edu.ec) received his degree in systems engineering (computer engineering) with a major in telematics from Universidad Politécnica Salesiana in 2015. From 2015 to 2017, he was a research assistant with the Computer Science Department of the Universidad de Cuenca. From 2017 to 2019, he was a research assistant at Telecommunications and Telematics Research Group (GITEL), Universidad Politécnica Salesiana, 010102 Cuenca, Ecuador. His research interests include projects related to the Internet of Things applied to agriculture and weather stations, radiological image analysis and processing, network security, and the analysis and processing of big data. He is a Member of IEEE. Roger Clotet (roger.clotet@campusviu.es) received his Ph.D. degree in engineering from Universidad Simón Bolívar (USB), Venezuela, in 2019, and is a computer science engineer with the Universitat Politècnica de Catalunya, Spain in 2004. Currently, he is a teacher and researcher at Valencian International University (VIU) and a member of the Astronomy, Big Data, and Computing Science Group, VIU, Valencia, 46002, Spain. 
He taught in the Computer Science Department of USB between 2010 and 2013 and at the Telecommunications Engineering School of Universidad Católica Andrés Bello between 2011 and 2013, both in Caracas, Venezuela. He was a researcher with the Networks and Applied Telematics group at USB from 2009 to 2019. His current research interests include electronic health records, telemedicine, e-health, e-agriculture, big data, and wireless sensor networks. He is a Senior Member of IEEE.

REFERENCES
[1] M. Á. Altieri and C. I. Nicholls, “Agroecología: única esperanza para la soberanía alimentaria y la resiliencia socioecológica,” Agroecología, vol. 7, no. 2, pp. 65–83, 2013.
[2] O. L. Balogun, “Sustainable agriculture and food crisis in sub-Sahara Africa,” in Global Food Insecurity, M. Behnassi, S. Draggan, and S. Yaya, Eds. Berlin: Springer-Verlag, 2011, pp. 283–297.
[3] M. Behnassi, S. Draggan, and S. Yaya, Global Food Insecurity: Rethinking Agricultural and Rural Development Paradigm and Policy. Berlin: Springer-Verlag, 2011.
[4] M. Sassi, Understanding Food Insecurity: Key Features, Indicators, and Response Design. Berlin: Springer-Verlag, 2017.
[5] U. Haruna and M. B. Umar, “Agricultural development for food security and sustainability in Nigeria,” in Global Food Insecurity, M. Behnassi, S. Draggan, and S. Yaya, Eds. Berlin: Springer-Verlag, 2011, pp. 63–71.
[6] G. Kasza, J. Szigeti, S. Podruzsik, and K. Keszthelyi, “Risk communication at the Hungarian guar-gum scandal,” in Global Food Insecurity, M. Behnassi, S. Draggan, and S. Yaya, Eds. Berlin: Springer-Verlag, 2011, pp. 173–183.
[7] C. B. Barrett, “Measuring food insecurity,” Science, vol. 327, no. 5967, pp. 825–828, 2010. doi: 10.1126/science.1182768.
[8] K. L. Sharma, “Food security in the South Pacific island countries with special reference to the Fiji Islands,” in Food Insecurity, Vulnerability and Human Rights Failure, B. Guha-Khasnobis, S. S. Acharya, and B. Davis, Eds. Berlin: Springer-Verlag, 2007, pp. 35–57.
[9] A. Charman and J. Hodge, “Food security in the SADC region: An assessment of national trade strategy in the context of the 2001–03 food crisis,” in Food Insecurity, Vulnerability and Human Rights Failure, B. Guha-Khasnobis, S. S. Acharya, and B. Davis, Eds. Berlin: Springer-Verlag, 2007, pp. 58–81.
[10] C. Gundersen and J. P. Ziliak, “Food insecurity and health outcomes,” Health Affairs, vol. 34, no. 11, pp. 1830–1839, 2015. doi: 10.1377/hlthaff.2015.0645.
[11] “Las crisis alimentarias continúan golpeando: el hambre aguda se intensifica,” Organización de las Naciones Unidas para la Alimentación y la Agricultura, FAO Headquarters, Rome, Italy, 2018. Accessed: June 8, 2019. [Online]. Available: http://www.fao.org/news/story/es/item/1110457/icode/
[12] H. G. Bohle, T. E. Downing, and M. J. Watts, “Climate change and social vulnerability: Toward a sociology and geography of food insecurity,” Global Environ. Change, vol. 4, no. 1, pp. 37–48, 1994. doi: 10.1016/0959-3780(94)90020-5.
[13] M. S. I. Molla, “18,000 children die of starvation everyday: Cannot we save them?” in Global Food Insecurity, M. Behnassi, S. Draggan, and S. Yaya, Eds. Berlin: Springer-Verlag, 2011, pp. 127–147.
[14] R. Gebbers and V. I. Adamchuk, “Precision agriculture and food security,” Science, vol. 327, no. 5967, pp. 828–831, 2010. doi: 10.1126/science.1183899.
[15] B. Cerfontaine, S. Panhuysen, and C. Wunderlich, Sostenibilidad Agrícola: Kit de herramientas de planificación. Sustainable Commodity Assistance Network, Winnipeg, Canada, 2014.
[16] M. Altieri and C. Nicholls, “Agroecología: Potenciando la agricultura campesina para revertir el hambre y la inseguridad alimentaria en el mundo,” Revista de Economía Crítica, vol. 10, no. 2, pp. 62–74, 2010.
[17] H. Valenzuela, “Agroecology: A global paradigm to challenge mainstream industrial agriculture,” Horticulturae, vol. 2, no. 1, p. 2, 2016. doi: 10.3390/horticulturae2010002.
[18] “Agricultura familiar y desarrollo territorial rural en América Latina y el Caribe,” Organización de las Naciones Unidas para la Alimentación y la Agricultura, FAO Headquarters, Rome, Italy, 2014. Accessed: June 12, 2019. [Online]. Available: http://www.fao.org/3/a-at886s.pdf
[19] D. Mulla and R.
Khosla, “Historical evolution and recent advances in precision farming,” in Soil-Specific Farming Precision Agriculture, R. Lal and B. A. Stewart, Eds. Boca Raton, FL: CRC Press, 2016, pp.1–35. [20] R. Bongiovanni, E. Mantovani, S. Best, and Á. Roel, “Agricultura de precisión: Integrando conocimientos para una agricultura moderna y sustentable,” Procisur/IICA, 2006. [21] Y. Lambrou and R. Laub, “Gender, local knowledge and lessons learnt in documenting and conserving agrobiodiversity,” in Food Insecurity, Vulnerability and Human Rights Failure, B. GuhaKhasnobis, S. S. Acharya, and B. Davis, Eds. Berlin: SpringerVerlag, 2007, pp. 161–194. [22] M. Piamonte, M. Huerta, R. Clotet, J. Padilla, T. Vargas, and D. Rivas, “WSN prototype for African oil palm bud rot monitoring,” in International Conference of ICT for Adapting Agriculture to Climate Change, P. Angelov, J. Iglesias, and J. Corrales, Eds. Berlin: Springer-Verlag, 2017, pp. 170–181. [23] J. Wanjiku, J. U. Manyengo, W. Oluoch-Kosura, and J. T. Karugia, “Gender differentiation in the analysis of alternative farm mechanization choices on small farms in Kenya,” in Food Insecurity, Vulnerability and Human Rights Failure, B. Guha-Khasnobis, S. S. Acharya, and B. Davis, Eds. Berlin: Springer-Verlag, 2007, pp. 194–218. [24] “Sistemas de innovación para el desarrollo rural sostenible en América Latina y el Caribe,” Organización de las Naciones Unidas para la Alimentación y la Agricultura, FAO Headquarters, Rome, Italy. Accessed: June 17, 2019. [Online] Available: http://www.fao.org/3/ a-i7769s.pdf [25] F. J. Ferrández-Pastor, J. M. García-Chamizo, M. Nieto-Hidalgo, J. Mora-Pascual, and J. Mora-Martínez, “Developing ubiquitous sensor network platform using Internet of Things: Application in precision agriculture,” Sensors, vol. 16, no. 7, p. 1141, July 2016. doi: 10.3390/s16071141. [26] I. H. Erden and O. Tozan, “Remote sensors and mobile technologies for precision agricultural data,” in Proc. 4th Int. Conf. 
Agro-Geoinformatics (Agro-Geoinformatics), July 2015, pp. 105–108.
[27] G. Carrión, M. Huerta, and B. Barzallo, “Monitoring and irrigation of an urban garden using IoT,” in Proc. IEEE Colombian Conf. Commun. Comput. (COLCOM), 2018, pp. 1–6. doi: 10.1109/ColComCon.2018.8466722.
[28] A. García-Cedeño et al., “Platano: Intelligent technological support platform for Azuay province farmers in Ecuador,” in Proc. IEEE Int. Conf. Eng. Veracruz (ICEV), 2019, vol. 1, pp. 1–7. doi: 10.1109/ICEV.2019.8920501. [29] M. Erazo-Rodas et al., “Multiparametric monitoring in equatorian tomato greenhouses (i): Wireless sensor network benchmarking,” Sensors, vol. 18, no. 8, p. 2555, 2018. doi: 10.3390/ s18082555. [30] A. Baggio, “Wireless sensor networks in precision agriculture,” in Proc. ACM Workshop Real-World Wireless Sensor Netw. (REALWSN 2005), Stockholm, Sweden, 2005, vol. 20, pp. 1567–1576. [31] T. Ojha, S. Misra, and N. S. Raghuwanshi, “Wireless sensor networks for agriculture: The state-of-the-art in practice and future challenges,” Comput. Electron. Agri., vol. 118, pp. 66–84, Oct. 2015. doi: 10.1016/j.compag.2015.08.011. [32] E. García and F. Flego, “Agricultura de precisión,” Revista Ciencia y Tecnología, vol. 8, pp. 99–116, 2008. [Online]. Available: https://www.palermo.edu/ingenieria/Ciencia_y_tecnologia/ ciencia_y_tecno_8.htm [33] J. Abad et al., “Coffee crops variables monitoring: A case of study in Ecuadorian Andes,” in International Conference of ICT for Adapting Agriculture to Climate Change, J. Corrales, P. Angelov, and J. Iglesias, Eds. Berlin: Springer-Verlag, 2018, pp. 202–217. [34] J. C. Guillermo, A. García-Cedeño, D. Rivas-Lalaleo, M. Huerta, and R. Clotet, “IoT architecture based on wireless sensor network applied to agricultural monitoring: A case of study of cacao crops in Ecuador,” in International Conference of ICT for Adapting Agriculture to Climate Change, J. Corrales, P. Angelov, and J. Iglesias, Eds. Berlin: Springer-Verlag, 2018, pp. 42–57. [35] F. Sichiqui et al., “Agricultural information management: A case study in corn crops in ecuador,” in The International Conference on Advances in Emerging Trends and Technologies. Berlin: Springer-Verlag, 2019, pp. 113–124. [36] A. de la Piedra, F. Benitez-Capistros, F. 
Dominguez, and A. Touhafi, “Wireless sensor networks for environmental research: A survey on limitations and challenges,” in Proc. IEEE International Conference on Smart Technologies (EUROCON), July 2013, pp. 267–274.
[37] S. Wolfert, L. Ge, C. Verdouw, and M.-J. Bogaardt, “Big data in smart farming: A review,” Agri. Syst., vol. 153, pp. 69–80, May 2017. doi: 10.1016/j.agsy.2017.01.023.
[38] R. A. Viscarra Rossel and J. Bouma, “Soil sensing: A new paradigm for agriculture,” Agri. Syst., vol. 148, pp. 71–74, Oct. 2016. doi: 10.1016/j.agsy.2016.07.001.
[39] Y. Zhu, J. Song, and F. Dong, “Applications of wireless sensor network in the agriculture environment monitoring,” Proc. Eng., vol. 16, pp. 608–614, 2011. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877705811026324 doi: 10.1016/j.proeng.2011.08.1131.
[40] Scopus. Accessed: May 9, 2020. [Online]. Available: http://www.scopus.com
[41] Latindex, Sistema regional de información en línea para revistas científicas de América Latina, el Caribe, España y Portugal. Accessed: May 9, 2020. [Online]. Available: https://www.latindex.org/latindex/gCatalogo
[42] Publindex, Sistema de indexación y homologación de revistas especializadas de CTI. Accessed: May 9, 2020. [Online]. Available: https://scienti.minciencias.gov.co/publindex/#/revistasPublindex/buscador
[43] G. D. Israel, “Determining sample size,” IFAS Extension, 1992.
[44] J. E. Guaña Moya, “Diseño de una Red de Sensores Inalámbricos (WSN) para monitorear parámetros relacionados con la agricultura,” Ph.D. thesis, Escuela Politécnica Nacional, Quito, Ecuador, Nov. 2016.
[45] C. Goumopoulos, “An autonomous wireless sensor/actuator network for precision irrigation in greenhouses,” in Smart Sensing Technology for Agriculture and Environmental Monitoring, S. Mukhopadhyay, Ed. Berlin: Springer-Verlag, 2012, pp. 1–20.
[46] O. Postolache, J. M. Pereira, P. S. Girão, and A. A.
Monteiro, “Greenhouse environment: Air and water monitoring,” in Smart Sensing Technology for Agriculture and Environmental Monitoring, S. Mukhopadhyay, Ed. Berlin: Springer-Verlag, 2012, pp. 81–102. [47] L. Bencini, S. Maddio, G. Collodi, D. D. Palma, G. Manes, and A. Manes, “Development of wireless sensor networks for agricultural monitoring,” in Smart Sensing Technology for Agriculture and Environmental Monitoring. Berlin: Springer-Verlag, 2012, pp. 157–186. [48] C. Cambra, S. Sendra, J. Lloret, and L. Garcia, “An IoT serviceoriented system for agriculture monitoring,” in Proc. IEEE Int. Conf. Commun. (ICC), May 2017, pp. 1–6. [49] G. Sahitya, N. Balaji, C. D. Naidu, and S. Abinaya, “Designing a wireless sensor network for precision agriculture using Zigbee,” in Proc. IEEE 7th Int. Adv. Comput. Conf. (IACC), Jan. 2017, pp. 287–291. doi: 10.1109/IACC.2017.0069. [50] I. Mat, M. R. M. Kassim, and A. N. Harun, “Precision agriculture applications using wireless moisture sensor network,” in Proc. IEEE 12th Malaysia Int. Conf. Commun. (MICC), 2015, pp. 18–23. [51] K. Ferentinos, N. Katsoulas, A. Tzounis, T. Bartzanas, and C. Kittas, “Wireless sensor networks for greenhouse climate and plant condition assessment,” Biosyst. Eng., vol. 153, pp. 70–81, Jan. 2017. doi: 10.1016/j.biosystemseng.2016.11.005. [52] F. A. Urbano-Molano, “Redes de sensores inalámbricos aplicadas a optimización en agricultura de precisión para cultivos de café en colombia,” J. de Ciencia e Ingenier ia, vol. 5, no. 1, pp. 46–52, 2013. [53] A. Torre Neto et al., “Wireless sensor network for variable rate irrigation in citrus,” Inform. Technol. Sustainable Fruit Vegetable Prod., vol. 7, pp. 563–569, Sept. 2005. [54] M. Flores-Medina, F. Flores-García, V. Velasco-Martínez, G. González-Cervantes, and F. Jurado-Zamarripa, “Monitoreo de humedad en suelo a través de red inalámbrica de sensores,” Tecnología y ciencias del agua, vol. 6, no. 5, pp. 75–88, 2015. [55] Y. Kim, R. G. Evans, and W. M. 
Iversen, “Remote sensing and control of an irrigation system using a distributed wireless sensor network,” IEEE Trans. Instrum. Meas., vol. 57, pp. 1379–1387, July 2008. doi: 10.1109/TIM.2008.917198.
[56] L. Fernandez, M. Huerta, G. Sagbay, R. Clotet, and A. Soto, “Sensing climatic variables in a orchid greenhouse,” in Proc. Int. Caribbean Conf. Devices, Circuits Syst. (ICCDCS), 2017, pp. 101–104. doi: 10.1109/ICCDCS.2017.7959719.
[57] M. Culman, J. M. T. Portocarrero, C. D. Guerrero, C. Bayona, J. L. Torres, and C. M. d. Farias, “PalmNET: An open-source wireless sensor network for oil palm plantations,” in Proc. IEEE 14th Int. Conf. Netw., Sensing Control (ICNSC), May 2017, pp. 783–788. doi: 10.1109/ICNSC.2017.8000190.
[58] J. Petearson Anzola, V. García-Díaz, and A. C. Jiménez, “WSN analysis in grid topology for potato crops for IoT,” in Proc. 4th Multidisciplinary Int. Soc. Netw. Conf., 2017, p. 44. doi: 10.1145/3092090.3092104.
[59] T. Cao-hoang and C. N. Duy, “Environment monitoring system for agricultural application based on wireless sensor network,” in Proc. 7th Conf. Inf. Sci. Technol. (ICIST), 2017, pp. 99–102.
[60] F. Mesas-Carrascosa, D. V. Santano, J. Meroño, M. S. de la Orden, and A. García-Ferrer, “Open source hardware to monitor environmental parameters in precision agriculture,” Biosyst. Eng., vol. 137, pp. 73–83, Sept. 2015. doi: 10.1016/j.biosystemseng.2015.07.005.
[61] M. R. Marín, L. Padilla Sánchez, and J. Gómez Gómez, “Sistema de monitoreo agrícola mediante redes inalámbricas de sensores para el monitoreo de variables ambientales SISMOAGRO,” Ingeniería al Día, vol. 2, no. 2, pp. 4–22, Sept. 2016.
[62] A. L. Diedrichs, G. Tabacchi, G. Grünwaldt, M. Pecchia, G. Mercado, and F. G. Antivilo, “Low-power wireless sensor network for frost monitoring in agriculture research,” in Proc. IEEE Biennial Congr. Argentina (ARGENCON), June 2014, pp. 525–530. doi: 10.1109/ARGENCON.2014.6868546.
[63] R. Aquino-Santos, A. González-Potes, A. Edwards-Block, and R. A. Virgen-Ortiz, “Developing a new wireless sensor network platform and its application in precision agriculture,” Sensors, vol. 11, no. 1, pp. 1192–1211, 2011. doi: 10.3390/s110101192.
[64] R. Filev Maia, I. Netto, and A. L. H. Tran, “Precision agriculture using remote monitoring systems in Brazil,” in Proc. IEEE Global Humanitarian Technol. Conf. (GHTC), 2017, pp. 1–6.
[65] N. Fahmi, S. Huda, E. Prayitno, M. U. H. A. Rasyid, M. C. Roziqin, and M. U. Pamenang, “A prototype of monitoring precision agriculture system based on WSN,” in Proc. Int. Seminar Intell. Technol. Its Appl. (ISITIA), Aug. 2017, pp. 323–328. doi: 10.1109/ISITIA.2017.8124103.
[66] J. M. Nunez, F. Fonthal, and Y. Quijada, “Design and implementation of WSN for precision agriculture in white cabbage crops,” in Proc. IEEE XXIV Congreso Internacional de Ingeniería Eléctrica, Electrónica y Computación (INTERCON), Arequipa, Peru, 2017, pp. 1–4.
[67] N. Karimi, A. Arabhosseini, M. Karimi, and M. H. Kianmehr, “Web-based monitoring system using Wireless Sensor Networks for traditional vineyards and grape drying buildings,” Comput. Electron. Agri., vol. 144, pp. 269–283, Jan. 2018. doi: 10.1016/j.compag.2017.12.018.
[68] Y. E. M. Hamouda and B. H. Y. Elhabil, “Precision agriculture for greenhouses using a wireless sensor network,” in Proc. Palestinian Int. Conf. Inf. Commun. Technol. (PICICT), May 2017, pp. 78–83.
[69] R. K. Kodali, N. Rawat, and L. Boppana, “WSN sensors for precision agriculture,” in Proc. Region 10 Symp., 2014, pp. 651–656.
[70] R. Godoi Vieira, A. Cunha, M. da Cunha, L. B. Ruiz, and A. Pires de Camargo, “On the design of a long range WSN for precision irrigation,” IEEE Sensors J., vol. 18, no. 2, pp. 773–780, Jan. 2018. doi: 10.1109/JSEN.2017.2776859.
[71] U. Dorji, T. Pobkrut, and T. Kerdcharoen, “Electronic nose based wireless sensor network for soil monitoring in precision farming system,” in Proc. 9th Int. Conf. Knowledge Smart Technol. (KST), 2017, pp. 182–186.
[72] R. K. Kodali, S. Soratkal, and L. Boppana, “WSN in coffee cultivation,” in Proc. Int. Conf. Comput., Commun. Automat. (ICCCA), Apr. 2016, pp. 661–666. doi: 10.1109/CCAA.2016.7813804.
[73] J. Wang, K. Damevski, and H.
Chen, “Sensor data modeling and validating for wireless soil sensor network,” Comput. Electron. Agri., vol. 112, pp. 75–82, Mar. 2015. doi: 10.1016/j.compag.2014.12.016. [74] Z. Li et al., “Practical deployment of an in-field soil property wireless sensor network,” Comput. Standards Interf., vol. 36, no. 2, pp. 278–287, 2014. doi: 10.1016/j.csi.2011.05.003. [75] X. Dong, M. Vuran, and S. Irmak, “Autonomous precision agriculture through integration of wireless underground sensor networks with center pivot irrigation systems,” Ad Hoc Networks, vol. 11, no. 7, pp. 1975–1987, 2013. doi: 10.1016/j.adhoc.2012.06.012. [76] Z. Li, J. Wang, R. Higgs, L. Zhou, and W. Yuan, “Design of an intelligent management system for agricultural greenhouses based on the internet of things,” in Proc. IEEE Int. Conf. Comput. Sci. Eng. (CSE) and Embedded and Ubiquitous Comput. (EUC), 2017, vol. 2, pp. 154–160. [77] L. Liu and Y. Zhang, “Design of greenhouse environment monitoring system based on wireless sensor network,” in Proc. 3rd Int. Conf. Control, Automat. Robotics (ICCAR), 2017, pp. 463–466. [78] L. Geng and T. Dong, “An agricultural monitoring system based on wireless sensor and depth learning algorithm,” Int. J. Online Eng., vol. 13, no. 12, pp. 127–137, 2017. doi: 10.3991/ijoe. v13i12.7885. [79] A. Cama-Pinto, F. Gil-Montoya, J. Gómez-López, A. GarcíaCruz, and F. Manzano-Agugliaro, “Wireless surveillance system for greenhouse crops,” Dyna, vol. 81, no. 184, pp. 164–170, 2014. doi: 10.15446/dyna.v81n184.37034. [80] M. F. Quiñones Cuenca, “Sistema de monitoreo de variables medio ambientales usando una red de sensores inalámbricos y plataformas de internet de las cosas,” B.S. thesis, Universidad Nacional de Loja, Loja, Ecuador, 2017. [81] J. C. Ortega Ortiz, “Desarrollo de un prototipo de adquisición de variables ambientales en cultivos hidropónicos de lechuga, mediante una red de sensores, utilizando un sistema embebido,” Ingenierías USB Bogotá, 2014. [Online]. 
Available: http:// biblioteca.usbbog.edu.co:8080/Biblioteca/BDigital/83534.pdf [82] J. C. Suárez Barón and M. J. Suárez Barón, “Monitoreo de variables ambientales en invernaderos usando tecnología zigbee,” in Proc. XLIII Jornadas Argentinas de Informática e Investigación Operativa (43JAIIO)-VI Congreso Argentino de AgroInformática (CAI), Buenos Aires, 2014, pp. 165–175 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[83] G. Aiello, I. Giovino, M. Vallone, P. Catania, and A. Argento, “A decision support system based on multisensor data fusion for sustainable greenhouse management,” J. Cleaner Prod., vol. 172, pp. 4057–4065, Jan. 2018. doi: 10.1016/j.jclepro.2017.02.197. [84] F. F. Montesano, M. W. van Iersel, F. Boari, V. Cantore, G. D’Amato, and A. Parente, “Sensor-based irrigation management of soilless basil using a new smart irrigation system: Effects of set-point on plant physiological responses and crop performance,” Agri. Water Manage., vol. 203, pp. 20–29, Apr. 2018. doi: 10.1016/j.agwat.2018.02.019. [85] J.-A. Jiang et al., “A wireless sensor network-based monitoring system with dynamic convergecast tree algorithm for precision cultivation management in orchid greenhouses,” Precision Agri., vol. 17, no. 6, pp. 766–785, Dec. 2016. doi: 10.1007/s11119-016-9448-7. [86] S. Khan, “Wireless sensor network based water well management system for precision agriculture,” in Proc. 26th Int. Telecommun. Netw. Appl. Conf. (ITNAC), 2016, pp. 44–46. doi: 10.1109/ ATNAC.2016.7878780. [87] Z. Li, N. Wang, T. Hong, T. Wen, and Z. Liu, “Design of wireless sensor network system based on in-field soil water content monitoring,” Nongye Gongcheng Xuebao/Trans. Chinese Soc. Agri. Eng., vol. 26, no. 2, pp. 212–217, 2010. [88] Z. Li, N. Wang, T. Hong, A. Franzen, and J. Li, “Closed-loop drip irrigation control using a hybrid wireless sensor and actuator network,” Sci. China Inform. Sci., vol. 54, no. 3, pp. 577–588, 2011. doi: 10.1007/s11432-010-4086-6. [89] J. Jao, B. Sun, and K. Wu, “A prototype wireless sensor network for precision agriculture,” in Proc. Int. Conf. Distrib. Comput. Syst., July 2013, pp. 280–285. [90] E. Kampianakis, J. Kimionis, K. Tountas, C. Konstantopoulos, E. Koutroulis, and A. Bletsas, “Wireless environmental sensor networking with analog scatter radio and timer principles,” IEEE Sensors J., vol. 14, no. 10, pp. 3365–3376, Oct. 2014. doi: 10.1109/JSEN.2014.2331704. 
[91] J. L. Chávez, F. J. Pierce, T. V. Elliott, and R. G. Evans, “A remote irrigation monitoring and control system for continuous move systems. Part A: Description and development,” Precision Agri., vol. 11, no. 1, pp. 1–10, Feb. 2010. doi: 10.1007/s11119-009-9109-1. [92] G. Vellidis, M. Tucker, C. Perry, C. Kvien, and C. Bednarz, “A real-time wireless smart sensor array for scheduling irrigation,” Comput. Electron. Agri., vol. 61, no. 1, pp. 44–50, 2008. doi: 10.1016/j.compag.2007.05.009. [93] W. Yitong, S. Yunbo, and Y. Xiaoyu, “Design of multi-parameter wireless sensor network monitoring system in precision agriculture,” in Proc. 4th Int. Conf. Instrum. Meas., Comput., Commun. Control (IMCCC 2014) 2014, pp. 721–725. doi: 10.1109/ IMCCC.2014.153. [94] I. Mat, M. R. M. Kassim, and A. N. Harun, “Precision irrigation performance measurement using wireless sensor network,” in Proc. 6th Int. Conf. Ubiquitous Future Netw. (ICUFN), 2014, pp. 154–157. doi: 10.1109/ICUFN.2014.6876771. [95] J. E. G. López, J. C. Chavez, and A. K. J. Sánchez, “Modelado de una red de sensores y actuadores inalámbrica para aplicaciones en agricultura de precisión,” in Proc. IEEE Mexican Humanitarian Technol. Conf. (MHTC), 2017, pp. 109–116. doi: 10.1109/ MHTC.2017.7926210. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [96] L. Xiao and L. Guo, “The realization of precision agriculture monitoring system based on wireless sensor network,” in Proc. Int. Conf. Comput. Commun. Technol. Agri. Eng., 2010, vol. 3, pp. 89–92. [97] J. Xu, J. Zhang, X. Zheng, X. Wei, and J. Han, “Wireless sensors in farmland environmental monitoring,” in Proc. Int. Conf. Cyber-Enabled Distrib. Comput. Knowl. Discovery, 2015, pp. 372–379. [98] A. Medela, B. Cendón, L. González, R. Crespo, and I. Nevares, “IoT multiplatform networking to monitor and control wineries and vineyards,” in Proc. Future Netw. Mobile Summit, July 2013, pp. 1–10. [99] K. O. Flores, I. M. Butaslac, J. E. M. Gonzales, S. M. G. 
Dumlao, and R. S. Reyes, “Precision agriculture monitoring system using wireless sensor network and raspberry Pi local server,” in Proc. IEEE Region 10 Conf. (TENCON), 2016, pp. 3018–3021. [100] J. L. Hou, R. Hou, D. S. Gao, and H. R. Shu, “The design and implementation of orchard long-distance intelligent irrigation system based on Zigbee and GPRS,” in Advanced Materials Res., vol. 588, pp. 1593–1597, Nov. 2012. [101] I. Hajdu and I. Yule, “Application of a wireless sensor network for multi-depth soil moisture monitoring at farm scale in New Zealand’s hill country,” Adv. Animal Biosci., vol. 8, no. 2, pp. 412–417, July 2017. doi: 10.1017/S2040470017000450. [102] K. Manikandan and S. Rajaram, “Automatic monitoring system for a precision agriculture based on wireless sensor networks,” Int. J. Sci., Eng. Comput. Technol., vol. 6, no. 6, p. 208, 2016. [103] M. Mafuta, M. Zennaro, A. Bagula, G. Ault, H. Gombachika, and T. Chadza, “Successful deployment of a wireless sensor network for precision agriculture in Malawi,” Int. J. Distrib. Sensor Netw., vol. 9, no. 5, p. 150,703, 2013. doi: 10.1155/2013/150703. [104] J. A. López, A.-J. Garcia-Sanchez, F. Soto, A. Iborra, F. GarciaSanchez, and J. Garcia-Haro, “Design and validation of a wireless sensor network architecture for precision horticulture applications,” Precision Agri., vol. 12, no. 2, pp. 280–295, Apr. 2011. doi: 10.1007/s11119-010-9178-1. [105] S. Rodríguez, T. Gualotuña, and C. Grilo, “A system for the monitoring and predicting of data in precision agriculture in a rose greenhouse based on wireless sensor networks,” in Proc. CENTERIS 2017—Int. Conf. ENTERprise Inf. Systems/ProjMAN 2017—Int. Conf. Project MANagement/HCist 2017—Int. Conf. Health and Soc. Care Inf. Syst. Technol., CENTERIS/ProjMAN/ HCist 2017, Jan. 2017, vol. 121, pp. 306–313. [106] M. Srbinovska, C. Gavrovski, V. Dimcev, A. Krkoleva, and V. Borozan, “Environmental parameters monitoring in precision agriculture using wireless sensor networks,” J. 
Cleaner Prod., vol. 88, pp. 297–307, Feb. 2015. doi: 10.1016/j. jclepro.2014.04.036. [107] A. Matese, S. Di Gennaro, A. Zaldei, L. Genesio, and F. Vaccari, “A wireless sensor network for precision viticulture: The NAV system,” Comput. Electron. Agri., vol. 69, no. 1, pp. 51–58, Nov. 2009. doi: 10.1016/j.compag.2009.06.016. [108] S. M. Abd El-kader and B. M. Mohammad El-Basioni, “Precision farming solution in Egypt using the wireless sensor network technology,” Egyptian Inf. J., vol. 14, no. 3, pp. 221–233, Nov. 2013. doi: 10.1016/j.eij.2013.06.004. [109] F. Karim, F. Karim, and A. Frihida, “Monitoring system using Web of Things in precision agriculture,” Procedia Comput. Sci., 221
vol. 110, pp. 402–409, 2017. doi: 10.1016/j.procs.2017.06.083. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S1877050917312590 [110] J. Bauer, B. Siegmann, T. Jarmer, and N. Aschenbr uck, “On the potential of Wireless Sensor Networks for the insitu assessment of crop leaf area index,” Comput. Electron. Agri., vol. 128, pp. 149–159, Oct. 2016. doi: 10.1016/j.compag.2016.08.019. [111] G. E. John, “A low cost wireless sensor network for precision agriculture,” in Proc. 6th Int. Symp. Embedded Comput. Syst. Des. (ISED), 2016, pp. 24–27. doi: 10.1109/ISED.2016.7977048. [112] R. K. Math and N. V. Dharwadkar, “A wireless sensor network based low cost and energy efficient frame work for precision agriculture,” in Proc. Int. Conf. Nascent Technol. Eng. (ICNTE), 2017, pp. 1–6. doi: 10.1109/ICNTE.2017.7947883. [113] T. D. Le and D. H. Tan, “Design and deploy a wireless sensor network for precision agriculture,” in Proc. 2nd Nat. Found. Sci. Technol. Develop. Conf. Inf. and Comput. Sci. (NICS), 2015, pp. 294–299. [114] L. Hui, M. Zhijun, W. Hua, and X. Min, “Spatio-temporal variation analysis of soil temperature based on wireless sensor network,” Biol. Eng., vol. 9, no. 6, p. 9, 2016. [115] G. Nagarajan and R. Minu, “Wireless soil monitoring sensor for sprinkler irrigation automation system,” Wireless Personal Commun., vol. 98, no. 2, pp. 1835–1851, 2018. doi: 10.1007/s11277017-4948-y. [116] Y. Wang, Y. Wang, X. Qi, and L. Xu, “OPAIMS: Open architecture precision agriculture information monitoring system,” in Proc. 2009 Int. Conf. on Compilers, Archit., Synthesis Embedded Syst. pp. 233–240. doi: 10.1145/1629395.1629428. [117] M. Zeni et al., “Low-power low-cost wireless sensors for real-time plant stress detection,” in Proc. 2015 Annu. Symp. Comput. Develop. New York, NY, pp. 51–59. doi: 10.1145/2830629.2830641. [118] A. Khattab, S. E. Habib, H. Ismail, S. Zayan, Y. Fahmy, and M. M. 
Khairy, “An IoT-based cognitive monitoring system for early plant disease forecast,” Comput. Electron. Agri., vol. 166, p. 105,028, Nov. 2019. doi: 10.1016/j.compag.2019.105028. [119] R. S. Jo, M. Lu, V. Raman, and P. H. Then, “Design and implementation of IoT-enabled compost monitoring system,” in Proc. IEEE 9th Symp. Comput. Appl. Ind. Electron. (ISCAIE), Nov. 2019, pp. 23–28. [120] J. M. Núñez V, F. Fonthal, and Y. M. Quezada L, “Design and implementation of WSN and IoT for precision agriculture in tomato crops,” in Proc. IEEE Andean Conf., 2018, pp. 1–5. doi: 10.1109/ANDESCON.2018.8564674. [121] V. A. Vuh, D. C. Trinh, T. C. Truvant, T. D. Bui, “Design of automatic irrigation system for greenhouse based on LoRa technology,” in Proc. Int. Conf. Adv. Technol. Commun. (ATC), 2018, pp. 72–77. [122] T. Murthy and S. Rasool, “Design of smart bio-shed using IoT with raspberry PI,” Int. J. Recent Technol. Eng., vol. 8, no. 2 Special Issue 11, pp. 2249–2255, 2019. [123] Y. Kuang, Y. Shen, L. Lu, and G. Li, “Farmland monitoring system based on cloud platform,” in Proc. IEEE 8th Joint Int. Inf. Technol. Artif. Intell. Conf. (ITAIC), 2019, pp. 335–339. doi: 10.1109/ITAIC.2019.8785531. 222 [124]J. Bauer, T. Jarmer, S. Schittenhelm, B. Siegmann, and N. Aschenbruck, “Processing and filtering of leaf area index time series assessed by in-situ wireless sensor networks,” Comput. Electron. Agri., vol. 165, p. 104,867, 2019. doi: 10.1016/j.compag.2019.104867. [125] S. Sadowski and P. Spachos, “Solar-powered smart agricultural monitoring system using internet of things devices,” in Proc. IEEE 9th Annu. Inf. Technol., Electron. Mobile Commun. Conf. (IEMCON), 2018, pp. 18–23. doi: 10.1109/IEMCON.2018. 8614981. [126] X. Feng, F. Yan, and X. Liu, “Study of wireless communication technologies on Internet of Things for precision agriculture,” Wireless Personal Commun., 2019, pp. 1–18. [127] A. Rao, H. Shao, and X. 
Yang, “The design and implementation of smart agricultural management platform based on UAV and wireless sensor network,” in Proc. IEEE 2nd Int. Conf. Electron. Technol. (ICET), 2019, pp. 248–252. doi: 10.1109/ ELTECH.2019.8839480. [128] “Water in agriculture,” The World Bank Group Water Global Practice, Washington, D.C. Accessed: Sept. 3, 2020. [Online]. Available: https://www.worldbank.org/en/topic/water-in-agriculture [129] Zigbee, IEEE Standard for Low-Rate Wireless Networks, IEEE Standard 802.15.4-2015 (Revision of IEEE Standard 802.15.4-2011), 2016, pp. 1–709. [130] Bluetooth, IEEE Standard for Telecommunications and Information Exchange Between Systems – LAN/MAN – Specific Requirements – Part 15: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Wireless Personal Area Networks (WPANs), IEEE Standard 802.15.1-2002, pp. 1–473. [131] Wifi, IEEE Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Standard 802.11-1997, pp. 1–445. [132] W. Ayoub, A. E. Samhat, F. Nouvel, M. Mroue, and J. Prévotet, “Internet of mobile things: Overview of LoRaWAN, DASH7, and NB-IoT in LPWANs standards and supported mobility,” IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1561–1581, 2019. doi: 10.1109/COMST.2018.2877382. [133] F. Adelantado, X. Vilajosana, P. Tuset-Peiro, B. Martinez, J. Melia-Segui, and T. Watteyne, “Understanding the limits of LoRaWAN,” IEEE Commun. Mag., vol. 55, no. 9, pp. 34–40, 2017. doi: 10.1109/MCOM.2017.1600613. [134] V. D. Hunt, A. Puglia, and M. Puglia, An Overview of RFID Technology. New Jersey: Wiley Telecom, 2007, pp. 5–24. [135] J. Cai and D. J. Goodman, “General packet radio service in GSM,” IEEE Commun. Mag., vol. 35, no. 10, pp. 122–131, 1997. doi: 10.1109/35.623996. [136] Memsic. Accessed: May 10, 2020. [Online]. Available: https:// www.memsic.com/ [137] “Waspmote,” Libelium Comunicaciones Distribuidas S.L. Accessed: May 10, 2020. [Online]. 
Available: http://www.libelium .com/products/waspmote/ [138] B. E. Graeub, M. J. Chappell, H. Wittman, S. Ledermann, R. B. Kerr, and B. Gemmill-Herren, “The state of family farms in the world,” World Develop., vol. 87, pp. 1–15, Nov. 2016. doi: 10.1016/j.worlddev.2015.05.012. GRS IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
Spectral Variability in Hyperspectral Data Unmixing
A comprehensive review

RICARDO AUGUSTO BORSOI, TALES IMBIRIBA, JOSÉ CARLOS MOREIRA BERMUDEZ, CÉDRIC RICHARD, JOCELYN CHANUSSOT, LUCAS DRUMETZ, JEAN-YVES TOURNERET, ALINA ZARE, AND CHRISTIAN JUTTEN

IEEE Geoscience and Remote Sensing Magazine, December 2021. Digital Object Identifier 10.1109/MGRS.2021.3071158. Date of current version: 21 May 2021. 0274-6638/21 ©2021 IEEE.

The spectral signatures of the materials contained in hyperspectral images, also called endmembers (EMs), can be significantly affected by variations in atmospheric, illumination, and environmental conditions that typically occur within an image. Traditional spectral unmixing (SU) algorithms neglect the spectral variability of the EMs, which propagates significant modeling errors throughout the whole unmixing process and compromises the quality of the results. Therefore, serious efforts have been dedicated to mitigating the effects of spectral variability in SU. This resulted in the development of algorithms that incorporate different strategies to enable the EMs to vary within a hyperspectral image, using, for instance, sets of spectral signatures known a priori as well as Bayesian, parametric, and local EM models. Each of these approaches has different characteristics and underlying motivations. This article presents a comprehensive literature review contextualizing both classic and recent approaches to solve this problem. We give a detailed evaluation of the sources of spectral variability and their effects in image spectra. Furthermore, we propose a new taxonomy that organizes existing works according to a practitioner’s point of view, based on the necessary amount of supervision and the computational cost. We also summarize methods to construct spectral libraries (which are required by many SU techniques) based on the observed hyperspectral image as well as algorithms for library augmentation and reduction. Finally, we conclude with discussions and an outline of possible future directions for the field.

OVERVIEW
Hyperspectral cameras can sample electromagnetic spectra at hundreds of contiguous wavelength intervals. The high spectral resolution of hyperspectral images makes them an important tool for the precise identification and discrimination of different materials in a scene. Hyperspectral images significantly contribute to different fields and are now at the core of a vast number of applications, such as space exploration [1], land use analysis, mineral detection, environment monitoring, field surveillance [2], [3], disease diagnosis, and image-guided surgery [4].

Despite the advantages of their high spectral resolution, hyperspectral cameras operate with a delicate tradeoff between spatial resolution and the signal-to-noise ratio. This happens because the light observed at the sensor is decomposed into several spectral bands, which, in turn, requires the pixel size to be large enough to attain an acceptable signal-to-noise ratio. When combined with a large target-to-sensor distance, which is common in many applications, this leads to images that have a low spatial resolution [5].

The limited spatial resolution of hyperspectral images means that each image pixel is actually a mixture of P different pure materials, whose spectra are termed EMs, that are present in the scene [6]. This mixing process conceals important information about the pure materials and their distribution in an image. SU aims to solve this problem by decomposing a hyperspectral image into the spectral signatures of the EMs and their fractional abundance proportions for each pixel [7].
The simplest and most widely used model to represent the interaction between light and the EMs in the scene is the linear mixing model (LMM) [6], which represents a given pixel $\mathbf{y}_n$, indexed by $n$, with $L$ spectral bands as

$$\mathbf{y}_n = \mathbf{M}_0 \mathbf{a}_n + \mathbf{e}_n, \quad \text{subject to } \mathbf{1}^\top \mathbf{a}_n = 1 \text{ and } \mathbf{a}_n \geq \mathbf{0}, \tag{1}$$

where $\mathbf{M}_0 = [\mathbf{m}_{0,1}, \ldots, \mathbf{m}_{0,P}]$ is an $L \times P$ matrix whose columns are the $P$ EMs, $\mathbf{a}_n$ is a vector containing the abundances of every EM in the pixel $\mathbf{y}_n$, and $\mathbf{e}_n$ is an additive noise vector. Traditionally, the LMM assumes that the signatures $\mathbf{M}_0$ of the pure materials are the same for all pixels $\mathbf{y}_n$, $n = 1, \ldots, N$, in the image. Although this assumption leads to a well-posed and computationally simpler framework, it limits the applicability of the LMM since it can jeopardize the accuracy of estimated abundances in many circumstances, due to the spectral variability of the EMs.

SPECTRAL VARIABILITY IN SPECTRAL UNMIXING
Spectral variability is an effect commonly observed in many scenes in which the spectral signatures of the pure constituent materials vary across the observed hyperspectral image, as illustrated in Figure 1. It can be caused, for instance, by variable illumination and atmospheric conditions. Variability can also be intrinsic to the very definition of a pure material, such as signatures of a single vegetation species that differ significantly due to growing and environmental conditions [8], [9]. In this context, the use of a single matrix $\mathbf{M}_0$ for all pixels in the LMM (1) leads to problems such as proportion indeterminacy, where errors in the estimation of the EM spectra at each pixel propagate to the estimated abundances. This results in erroneous abundance estimation and the selection of too many EMs to represent the spectrum of each pixel $\mathbf{y}_n$ [8]–[10]. Due to the significant impact of EM variability on abundance estimation quality, a lot of effort has been dedicated to developing algorithms that are able to obtain better abundance estimates in this scenario.
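The effect of using one fixed EM matrix when the pixel was actually produced by different signatures can be illustrated numerically. The following is a minimal sketch, not a method from the literature reviewed here: the EM matrix is random rather than real spectra, the per-EM scaling factors are an assumed stand-in for variability, and the solver is a plain projected-gradient routine (the names `project_simplex` and `unmix_constrained_ls` are hypothetical) that enforces the sum-to-one and nonnegativity constraints of (1).

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]                      # entries sorted in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def unmix_constrained_ls(y, M, n_iter=5000):
    """Projected gradient for min ||y - M a||^2 with a on the unit simplex."""
    P = M.shape[1]
    a = np.full(P, 1.0 / P)                   # start from uniform abundances
    step = 1.0 / np.linalg.norm(M, 2) ** 2    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        a = project_simplex(a - step * M.T @ (M @ a - y))
    return a

rng = np.random.default_rng(0)
L, P = 50, 3                                  # assumed numbers of bands and EMs
M0 = rng.uniform(0.05, 0.9, size=(L, P))      # synthetic reference EM matrix
a_true = np.array([0.6, 0.3, 0.1])            # abundances used to build the pixels

# Pixel that follows the LMM exactly (noise-free): unmixing with M0 is consistent.
y_matched = M0 @ a_true
a_matched = unmix_constrained_ls(y_matched, M0)

# Pixel whose EMs are scaled versions of M0 (assumed variability), unmixed with M0.
scales = np.array([0.7, 1.0, 1.2])            # hypothetical per-EM scaling factors
y_mismatched = (M0 * scales) @ a_true
a_mismatched = unmix_constrained_ls(y_mismatched, M0)

print("true abundances:      ", a_true)
print("matched EM matrix:    ", np.round(a_matched, 4))
print("mismatched EM matrix: ", np.round(a_mismatched, 4))
```

In the matched case the estimate agrees with the abundances used to generate the pixel, whereas in the mismatched case the estimate still satisfies both constraints but no longer corresponds to the true proportions, which is the kind of error propagation described above.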
The most general form of the LMM that considers spectral variability generalizes (1) to allow a different EM matrix for each pixel, resulting in

$$\mathbf{y}_n = \mathbf{M}_n \mathbf{a}_n + \mathbf{e}_n, \quad \text{subject to } \mathbf{1}^\top \mathbf{a}_n = 1 \text{ and } \mathbf{a}_n \geq \mathbf{0} \tag{2}$$

for $n = 1, \ldots, N$, where $\mathbf{M}_n \in \mathbb{R}^{L \times P}$ is the EM matrix of the $n$th pixel. SU for spectral variability can be generally defined as two complementary problems related, respectively, to the recovery of the abundances and EMs. These can be defined as the following:
• P1, which mitigates the adverse effects of spectral variability in the abundance estimation
• P2, which estimates the spectral signatures of the EMs present in each pixel of the image.
Both problems have attracted substantial interest. While all SU methods must deal with P1 while accounting for spectral variability, not all of them take P2 into consideration, due to the additional difficulty that is involved.

CONTRIBUTION, TAXONOMY, AND ORGANIZATION
Many SU algorithms have been proposed to address problems P1 and P2. Different algorithms follow various methodologies to represent the EMs in the scene. Existing methods employ Bayesian, parametric, and spatially localized models as well as libraries containing different instances of material spectra known a priori. This multiplicity of models gives rise to solutions presenting advantages and disadvantages in terms of computational complexity, accuracy, and the amount of user supervision. In this article, we categorize the methods according to criteria that are most relevant to the practitioner, such as computational complexity, to provide a comprehensive review that complements and updates previous summaries [8], [9], [11], [12]. Since existing SU methods that address spectral variability have very heterogeneous characteristics, navigating the field can be difficult, especially when accounting for classical algorithms and recent developments.
This difficulty motivated the present review, which presents a novel taxonomy aimed at the practitioner as well as a comprehensive categorization of existing approaches. The contributions and highlights of the article are described in the following.

[Figure 1: three panels, (a), (b), and (c), each plotting pixel spectra against wavelength (µm) over roughly 0.5 to 2 µm.]
FIGURE 1. Spectral variability is ubiquitous in hyperspectral images: the pixels in regions composed of a single material [e.g., (a) the trees, (b) roof, and (c) soil in this image] can contain very different spectral signatures. (Source: Image generated based on sample data available from the HyperCUbe software, distributed by the U.S. Army Corps of Engineers.)

A NEW TAXONOMY FOR THE PRACTITIONER
We propose a taxonomy to organize the existing techniques according to a practitioner’s point of view, based on the amount of user supervision and the computational complexity required to solve the SU problem. The resulting taxonomy is summarized in the form of a decision tree shown in Figure 2, which can be used to guide the choice of a family of SU algorithms. The decision tree also dictates the organization of the rest of the article. We start from whether or not a spectral library is known a priori and proceed to different families of SU methods based on the tradeoffs they offer regarding the need for user supervision and computational cost. Table 1 summarizes the main characteristics of each group of techniques and points to illustrations with high-level descriptions of the key ideas on which the categories are based.

[Figure 2: decision tree. Decision nodes: “Are Spectral Libraries Available a Priori?”, “Extract Libraries From the Observed Image?”, “Is the Library Very Large?”, “Prune Signatures to Make the Library Small?”, “Less Supervision or Less Computational Cost?” (branches: Less Cost, Less Supervision), “Is Expert Knowledge Available?”, “Apply Spectral Transformation to the Image and Library?”, and “Estimate Abundances and EMs for Each Pixel”. Families of unmixing algorithms: Fuzzy Unmixing, MESMA and Variants, Sparse Unmixing, Machine Learning Methods, Bayesian Models, Parametric EM Models, EM-Model-Free Methods, and Local Unmixing. Library techniques: EM Libraries Extraction, User-Defined Spectral Transformation, and Library-Based Spectral Transformation. Color code for the computational cost of the unmixing methods: Low, Medium, High.]
FIGURE 2. The decision tree for hyperspectral unmixing, considering spectral variability. The blue boxes denote families of unmixing algorithms, while the yellow boxes represent additional techniques related to the extraction and processing of spectral libraries. MESMA: multiple-EM spectral mixture analysis.

TABLE 1. CHARACTERISTICS OF EACH GROUP OF SU TECHNIQUES AND WHERE THEY ARE REVIEWED IN THE ARTICLE.
Technique | Amount of user supervision | Computational cost | Requires spectral libraries? | Estimates pixel-dependent EMs? | Section in the article
MESMA and variants | • | ••• | Yes | Yes | “Multiple-Endmember Spectral Mixture Analysis and Its Variants for Small Spectral Libraries”
Fuzzy SU | • | ••• | Yes | Yes | “Multiple-Endmember Spectral Mixture Analysis and Its Variants for Small Spectral Libraries”
Sparse SU | •• | • | Yes | Yes | “Sparse Unmixing”
Machine learning | ••• | •• | Yes | No | “Machine Learning Algorithms”
Local SU | ••• | • | No | Yes | “Local Unmixing Methods”
Parametric EM models | ••• | •• | No | Yes | “Parametric Models”
EM-model-free | •• | •• | No | No | “Endmember-Model-Free Methods”
Bayesian | • | ••• | No | No | “Bayesian Methods”
Illustration of the key ideas (in the order listed in the table): Figure 10, Figure 11, Figure 13, Figure 12, Figure 14.
Sparse SU: SU with sparsity constraints; EM-model-free: SU without explicit EM models.

A COMPREHENSIVE OVERVIEW AND RECENT HIGHLIGHTS
We provide a comprehensive review of the methods developed to solve the SU problem with EM variability. We include and contextualize the classic strategies that have been reviewed before as well as numerous recent developments in the field. Thus, both classic and recent algorithms are categorized according to the proposed taxonomy, which helps to highlight the advances in each area.

EX SITU SPECTRAL LIBRARIES
A considerable number of SU methods address spectral variability by using libraries of spectra that originally had to be acquired a priori (e.g., through in situ measurements), which used to limit the applicability of these approaches. An important recent development concerns methods that can extract spectral libraries directly from the observed images or generate them using physics-based mathematical models of material spectra. This supports the widespread applicability of library-based SU techniques in situations where spectral libraries are not available or cannot be built. Such methods are reviewed in the “How to Construct Spectral Libraries” section. Moreover, library pruning techniques, which were originally devised to reduce the size of libraries and lessen the computational complexity of SU, have evolved to consider the quality of the unmixing results. Recent library pruning methods aim at removing, before unmixing, entire EM classes or individual spectral signatures that are unlikely to be present in an observed image. This reduces the ill-posedness of the SU problem and can improve abundance estimation. These techniques are discussed in the “Library Pruning Techniques” section. Table 2 summarizes the key ideas involved in library extraction and pruning methods as well as the approaches’ main characteristics.

TABLE 2. CHARACTERISTICS OF SPECTRAL LIBRARY EXTRACTION AND PRUNING TECHNIQUES AND WHERE THEY ARE REVIEWED IN THE ARTICLE.
LIBRARY EXTRACTION TECHNIQUES
Characteristic | Image-based library extraction | Library generation from physics models | Spatial interpolation of EM signatures
Key idea | Extracts multiple EM signatures from the observed image and clusters them to construct a library | Creates synthetic EM signatures using physicochemical mathematical models describing EM variability | Estimates EM signatures for each pixel by interpolating pure pixels at known spatial locations
Adapted to the HI? | Yes | No | Yes
Amount of user supervision | •• | ••• | ••
Depends on the existence of pure pixels? | Yes | No | Yes
Section in the article | “Image-Based Library Construction” | “Generating Spectral Libraries From Physics Models” | “Spatial Interpolation of Endmember Signatures”
LIBRARY PRUNING TECHNIQUES
Characteristic | Library reduction | EM selection | Same-class EM pruning
Key idea | Removes redundant signatures from an existing library to reduce the computational complexity of SU | Removes entire EM classes (e.g., water and trees) not present in the observed image from the library | Selects the signatures from each EM class most closely related to the observed image before SU
Adapted to the HI? | No | Yes | Yes
Amount of user supervision | • | •• | ••
Improves the computational cost of SU? | Yes | Yes | Yes
Improves SU quality? | No | Yes | Yes
Section in the article | “Library Reduction Techniques” | “Endmember Selection Methods” | “Pruning Libraries Within the Same Class”
HI: hyperspectral image.

EXPERIMENTAL ASPECTS AND TOOLBOX
The practical aspects related to the evaluation of SU methods when spectral variability is considered are also discussed in the “Experimental Evaluation” section. This includes the generation of realistic synthetic data and a list of existing software resources that are available to the reader. We also present a simulation to demonstrate the application of a few of the SU techniques reviewed in the article, which were chosen by selecting different paths in the proposed decision tree. This example is made publicly available in the form of a software toolbox at https://github.com/ricardoborsoi/unmixing_spectral_variability and in [13].

ORIGINS OF SPECTRAL VARIABILITY AND ITS EFFECTS
The variability in spectral signatures occurs mainly due to 1) atmospheric effects, 2) illumination and topographic changes, and 3) the intrinsic variation of the spectral signatures of the materials (i.e., due to physicochemical differences). Understanding how these conditions affect the spectral signatures of the materials and the unmixing results is important for the development of informed models and methods to deal with EM variability. Such knowledge can be used, for instance, to generate physics-based and physically inspired models that include the effects of spectral signature variability. Such representations can then be directly incorporated into the SU process (as discussed in the “Parametric Models” section) and used to generate synthetic spectral libraries for library-based SU (as discussed in the “Generating Spectral Libraries From Physics Models” section).

In addition to SU, spectral variability affects other hyperspectral imaging tasks, which prompted extensive investigations into its causes and how it manifests in material spectra. In this context, a recent review article by Theiler et al.
[14] provides an excellent overview of spectral variability in hyperspectral target detection. In particular, the causes and effects of spectral variability in target detection are reviewed, with a focus on the study of environmentally induced variability (caused by, e.g., atmospheric and topographic changes) through an in-depth examination of radiative transfer models. A detailed computer simulation is included to illustrate how the material spectra are affected by changes in the parameters of the radiative transfer model.

In the following, we review the causes and effects of spectral variability from an SU perspective. Although we also introduce the radiative transfer function interpretation of some atmospheric and topographic effects, we focus our exposition on a more generic analysis of the consequences that spectral variability has for the observed pixel spectra and on the results of SU as reported by previous experimental works (i.e., with a stronger focus on the results of, e.g., atmospheric compensation methods as opposed to the interpretation of the imaging models themselves). The interested reader can find a more comprehensive analysis from a radiative transfer function standpoint in [14].

ATMOSPHERIC EFFECTS

One of the main sources of spectral variability is atmospheric interference when measuring ground reflectance. Atmospheric gases (such as ozone, oxygen, methane, carbon dioxide, and so on), aerosols, and, most prominently, water vapor absorb significant amounts of radiation, while other molecules and vapors scatter incoming light [15]. These effects have an impact on the radiance measured at the sensor, which can become significantly different than that corresponding to the desired ground reflectance. Atmospheric absorption from gases is also heavily wavelength dependent, whereas aerosol absorption varies smoothly in spectra. These effects must be compensated for to achieve an accurate characterization of surface reflectance.
Atmospheric compensation models can be roughly divided into statistical (empirical) and physics-based varieties [15]. Statistical models are based on additional information about the atmospheric influence, usually obtained by means of reference objects and calibration panels in the scene. This information is used to find a relationship (e.g., linear) between the radiances observed at the sensor and at the surface of the scene [15]. This results in gain and offset factors for each spectral band, which are then uniformly applied to every image pixel to compensate for the atmospheric effects [15]. Sometimes, when a reference object is not present in the scene, naturally occurring objects can be employed, most commonly consisting of smooth bodies of water, which exhibit low reflectance and can be considered dark objects [5]. The downsides of this approach are 1) the true reflectance of a reference object must be accurately known, and 2) it does not account for the spatial variability of the distribution of gases and aerosols. This variability can be very significant, and thus it can introduce spatially dependent residual atmospheric effects. A classic example of statistical methods is the empirical line method [5].

Physics-based models, on the other hand, are robust alternatives to empirical methods and do not assume that additional information about the scene is known. These methods are mature and widely used, addressing the limitations of empirical approaches by employing a rigorous model that explicitly describes the absorption and scattering effects due to atmospheric gases and aerosols [16]. Popular examples include the atmospheric removal algorithm and the Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) [15].

Assuming ground terrain illuminated by the sun, the light incident on a pixel in the sensor can be roughly
characterized by four sources: 1) solar radiation directly reflected off the ground, 2) light directly reflected off the atmosphere into the sensor, 3) light scattered by the atmosphere and reflected off the ground, and 4) light reflected off surrounding regions on the ground and then scattered before reaching the sensor (constituting the adjacency effect) [17], [18]. These effects are illustrated in Figure 3. A model for the reflectance at the sensor $y_{\mathrm{sensor}}$ is given by [15]

$$y_{\mathrm{sensor}} = y_{\mathrm{atm}} T_g + \frac{y_s T_g T_{\downarrow} T_{\uparrow} + (y_{\mathrm{avg}} - y_s)\, T_g T_{\downarrow} T_{\uparrow}\, r}{1 - y_{\mathrm{avg}}\, s}, \qquad (3)$$

where $y_s$ is the reflectance of the surface of interest, $T_g$ is the gaseous transmittance, $y_{\mathrm{atm}}$ is the reflectance of the atmosphere, $T_{\uparrow}$ and $T_{\downarrow}$ are the upward and downward scattering transmittances, $r$ is the ratio between the diffuse and total transmittance for the ground-to-sensor path, $s$ is the spherical albedo of the atmosphere, and $y_{\mathrm{avg}}$ is the average surface reflectance in a region around a pixel, which is used to account for scattering (adjacency) effects [15]. Physics-based atmospheric correction algorithms then try to obtain the ground reflectance $y_s$ from the at-sensor reflectance $y_{\mathrm{sensor}}$ by solving (3).

In the overall working of these algorithms, the first step for atmospheric compensation consists of retrieving the atmospheric parameters necessary to represent the quantities in (3), mainly consisting of an aerosol description (visibility and type of aerosol) and the amount of water vapor for each pixel [19]. They are typically based on variations of the so-called three-band ratio technique, which is an important step to quantify the amount of water vapor for each pixel. The three-band ratio technique basically compares ratios of radiances measured near the edges of a number of spectral wavelengths that are known to present heavy water vapor absorption (e.g., at around 0.91, 0.94, and 1.14 μm), using this information to derive the column water vapor information for each pixel [5], [20].
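The three-band ratio idea can be sketched in a few lines. The snippet below is a minimal illustration, assuming the common formulation in which the radiance at the center of a water vapor absorption feature is divided by a continuum value interpolated from two window channels on either side of the feature. The function name, the window wavelengths, and the radiance values are hypothetical, and the conversion from ratio to column water vapor (normally done with lookup tables computed from a radiative transfer model) is deliberately left out.

```python
def three_band_ratio(wl_left, rad_left, wl_center, rad_center, wl_right, rad_right):
    """Ratio of the in-band radiance to the continuum interpolated at the band center.

    wl_* are wavelengths (um) and rad_* are radiances of the left window channel,
    the absorption band center, and the right window channel.
    """
    # Linear interpolation weights of the two window channels at the band center.
    w_right = (wl_center - wl_left) / (wl_right - wl_left)
    w_left = 1.0 - w_right
    continuum = w_left * rad_left + w_right * rad_right
    return rad_center / continuum


# Hypothetical radiances around the 0.94 um water vapor feature for one pixel.
ratio_dry = three_band_ratio(0.865, 50.0, 0.940, 45.0, 1.040, 40.0)
ratio_humid = three_band_ratio(0.865, 50.0, 0.940, 25.0, 1.040, 40.0)

# A deeper absorption feature (more water vapor) gives a smaller ratio.
print(ratio_dry, ratio_humid)
```

Because the ratio is computed independently for each pixel, it yields the per-pixel water vapor information that the text contrasts with scene-wide aerosol assumptions.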
After the necessary parameters have been estimated, (3) can be solved for the ground reflectance, and an optional postprocessing step can be employed (called spectral polishing) to remove artifacts from the correction process [19].

Physics-based models can represent and account for the interaction between solar radiation and the atmosphere very accurately. However, for this accuracy to translate into meaningful surface reflectance estimates, these models require precise information about atmospheric properties, which is very difficult to obtain in practice. This is especially true for scattering and absorption by aerosols, which are hard to characterize accurately due to their spatial and temporal variability [21]. Inaccuracies in the estimation of these parameters (which include the atmospheric visibility, aerosol model type, and an atmospheric model) introduce errors in the retrieved surface reflectance spectra that can be significant and spectrally nonuniform [22].

Furthermore, unlike water vapor compensation, which is performed on a pixel-by-pixel basis, most methods assume that individual aerosol and gas concentrations are uniform across the scene (resulting in a single transmittance spectrum being computed for each gas) [19], [22]. While this is true for some gases (such as ammonium, oxygen, methane, carbon dioxide, and so on) that are fairly constant in the atmosphere [20], it is far from true for aerosols, which may show significant variation in space [23], [24]. The aerosol concentration can change depending on the environment (e.g., in large cities and rural areas), and thus it must be provided by users of existing algorithms [20]. Moreover, standard aerosol types often do not adequately represent the scene being processed, leading to inaccuracies in the retrieved spectra [25].
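To make the role of these parameters concrete, the following sketch evaluates the reflectance model in (3) for a single band and inverts it in closed form for the surface reflectance. It is only an illustration under stated assumptions: the two surface terms in (3) are taken to share the denominator 1 − y_avg·s while the atmospheric term y_atm·T_g does not, the neighborhood average y_avg is treated as known, and all function names and numeric values are hypothetical rather than taken from any real correction package.

```python
def at_sensor_reflectance(y_s, y_avg, y_atm, t_gas, t_down, t_up, r, s):
    """Forward model (3) for one band (assumed form, see the lead-in)."""
    k = t_gas * t_down * t_up
    return y_atm * t_gas + (y_s * k + (y_avg - y_s) * k * r) / (1.0 - y_avg * s)


def surface_reflectance(y_sensor, y_avg, y_atm, t_gas, t_down, t_up, r, s):
    """Closed-form inversion of (3) for y_s, given the atmospheric parameters."""
    k = t_gas * t_down * t_up
    surface_term = (y_sensor - y_atm * t_gas) * (1.0 - y_avg * s)
    return (surface_term - y_avg * k * r) / (k * (1.0 - r))


# Hypothetical "true" scene and atmosphere for one band.
truth = dict(y_avg=0.25, y_atm=0.05, t_gas=0.9, t_down=0.85, t_up=0.9, r=0.2, s=0.1)
y_sensor = at_sensor_reflectance(0.30, **truth)

# Inversion with the correct parameters recovers the surface reflectance.
recovered = surface_reflectance(y_sensor, **truth)

# Inversion with a misestimated downward transmittance gives a biased result.
wrong = dict(truth, t_down=0.80)
biased = surface_reflectance(y_sensor, **wrong)

print(recovered, biased)
```

With these illustrative numbers, a modest error in a single transmittance shifts the retrieved reflectance by roughly 0.02, which is the kind of parameter sensitivity the paragraph above describes.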
In addition, experimental studies have found that aerosol optical thickness has significant spatial variability within a single scene [23], [26] and is often correlated with cloud concentrations [26]. Some works attempted to estimate the aerosol optical thickness for smaller patches of the image individually by using shadow detection results [27], an approach that depends on the presence of a large number of shadowed pixels. However, acquiring precise data for an accurate and possibly spatially variable atmospheric correction is generally difficult, which means that the results of common atmospheric compensation methods can be subject to significant errors [23].

FIGURE 3. The effects of the atmosphere on the acquired hyperspectral image. The sources of radiation are represented by (a) light directly reflected by the atmosphere to the sensor, (b) light scattered by the atmosphere and reflected by the ground, (c) light directly reflected by the ground, and (d) light reflected by surrounding regions on the ground and then scattered to the sensor.

For instance, a number of studies have investigated the residual errors in surface reflectance data after the application of atmospheric compensation methods by comparing the processed results with in situ data and by using simulations. These studies found that, generally, there is still an appreciable error in the retrieved reflectances. As an example, errors in the retrieved reflectance by atmospheric corrections due to the spatial variability of the aerosol optical thickness above southern England were found to be as high as 1.7%, with errors of
5% in the normalized difference vegetation index (NDVI) [23]. This can be significant for practical applications, as it corresponds to errors of up to 30% in biomass production estimates [23], [28]. Moreover, standard methods for column water vapor retrieval lose accuracy when the aerosol optical thickness is high, leading to errors of up to 10% if aerosol effects are not properly compensated for [29].

Note that experimental measurements in a water quality management application found significant differences between the true and retrieved spectral responses. Errors of up to 15% in reflectance spectra were identified, more prominently concentrated in short (<450-nm) and long (>750-nm) wavelength intervals [30]. Another study evaluated a number of physics-based atmospheric correction methods in an experiment for a playa and canola target and found that, although the average relative differences were moderate, ranging between 0.023 and 0.042, larger deviations of up to 0.12 occurred in the near-infrared region [31]. A study with simulated data found that incorrectly supplying input parameters to the model used in the FLAASH algorithm can lead to considerable errors in the retrieved reflectance, with an absolute difference of up to 0.11, and a strong sensitivity to moisture/optical depth (visibility) errors [22]. Also, very large errors can be introduced by a bad specification of the aerosol model type, with higher errors generally present in short wavelengths, where scattering processes are most significant [22].

The influence that uncertainties in the column water vapor and aerosol optical depth specification have on SU (given their influence on the retrieved reflectances) was investigated in [24]. The performance degradation was found to be more severe in the abundance estimation than it was in the reflectance estimation, with degradation of up to 30% in high-scattering conditions.
The results were more acutely affected due to uncertainties in the water vapor amount than in the aerosol optical thickness, although the latter showed a strong influence on the quality of the reconstructed abundance maps when the EMs were spectrally similar.

Finally, it is interesting to highlight that two characteristics were noticed in these studies. First, the errors in the retrieved reflectances are fairly nonuniform in spectral bands, with large spikes often concentrated near bands where there is significant gas/water absorption [22], [24]. Second, errors due to bad aerosol specification are quite significant in short wavelengths (450–750 nm), where they are concentrated [22], [30]. All these effects are illustrated in Figure 4.

ILLUMINATION AND TOPOGRAPHIC EFFECTS

Varying illumination conditions are one of the main sources of spectral variability in spectral mixture analysis [32]. Illumination changes are mainly due to two effects: varying terrain topography, which affects the angles of the incident radiation, and the occlusion of the light source by other objects (leading to shaded areas).

A number of works handled the presence of heavily shaded areas by considering the presence of an additional EM representing shadow [33]–[39]. Although this approach is very simple, its effectiveness is certainly limited since a single spectral signature can be insufficient to adequately represent all pixels affected by shadow [40]. For instance, there might be many shadow EMs since shadows in different regions of the image are influenced by both the material that is being shaded and the absorption properties of the material that is blocking the light, which might lead to significantly different spectral signatures [41].
Furthermore, besides presenting a lower reflectance amplitude, shadow EM is usually significantly affected by nonlinear atmospheric scattering and multipath effects because these areas are illuminated by a large proportion of diffuse irradiation scattered by the atmosphere (i.e., skylight) and nearby objects. This implies that shadow EM is sensitive to the state of the atmosphere and can vary significantly in space, depending on the amount of scattered light being reflected from the sky at each position [42], [43]. When illumination predominantly comes from scattered radiation, the spectrum not only presents a lower amplitude but is skewed to short (e.g., blue) wavelengths [44], [45]. This means that the signal amplitudes in the shorter (blue) wavelengths are considerably larger than in the rest of the spectra [45]. Furthermore, since the shadow spectral signature is a function of diffuse illumination, it depends on the neighboring image area (where the skylight is scattered) [45] and the cloud cover. Moreover, variations of ground reflectance may not be easily discernible from atmospheric effects since both phenomena are jointly observed and not easily separable [45]. These facts introduce a strong dependence between the shadow signature and the spatial position, and they go against the common notion that shadow EMs can be adequately represented by scaled versions of true EMs [5] (that is true only for small illumination variations). This makes the detection, correction, and quantification of shadow a challenging task because the physical-based inversion of these atmospheric effects turns out to be a hard problem. However, this task is still necessary since linear SU with a single dark EM usually does not successfully quantify the presence of shadow in the scene [45].

FIGURE 4. Variability caused by atmospheric effects. (Plot of the true and atmospherically corrected reflectance versus wavelength, annotated with large errors concentrated near water absorption bands, significant errors in short wavelengths due to aerosols, and smaller errors in the remaining spectra.)

Although the presence of shadows is common in hyperspectral images, a more prominent source of variability comes from the differing topography of the scene, which introduces complex fluctuations of the relative angles between the incoming light source and the sensor for each pixel of the scene. Topographic variations have been shown to significantly affect the spectral reflectance values of soil and green vegetation [46] as well as rocks in lithologic mapping [47], expanding EM clusters and causing overlap between classes, hindering the EM identification and unmixing processes. Considering that only the amplitude of the incident radiation changes through the scene, the reflectance spectra of the observed pixels in the LMM become scaled by a constant positive factor. This model agrees with the observation that most of the variability in a hyperspectral image can be represented by a constant scaling of reference EMs [5]. As a simple empirical verification, we plot a random subset of 30 pixels of red roofs from the Pavia image; the pixels are pure and mostly affected by illumination effects. The results, which are depicted in Figure 5, indicate that the pixels differ mostly by a scaling factor.

FIGURE 5. Thirty pixel instances classified as a red roof in the Pavia image (in gray), which are primarily affected by illumination, and their spectral average (in red). The average Pearson correlation coefficient between each signature and the scaled version of the mean spectra that is closest to it is about 0.993, indicating good agreement between illumination-based spectral variability and the constant scaling model. (Axes: reflectance versus wavelength, 0.45–0.8 µm.)

Although a constant scaling model is intuitive and simple, a more rigorous conclusion can be achieved by analyzing the dependence of radiative transfer models on the topography of the scene. To this end, one could resort to the model developed by Hapke [48], [49], which describes the bidirectional reflectance (i.e., the reflectance as a function of the incidence angles of the light source and observer/viewer, as in Figure 6) as a function of the single scattering albedo and the photometric parameters of the material [50]. Hapke’s model suggests a more complex relationship between the EM signatures and the topography. In this context, the mixture of materials is assumed to happen at the macroscopic level, enabling the consideration of the LMM in the albedo domain, where Hapke’s model acts separately on each EM.

FIGURE 6. Hapke’s model relates the reflectance to the incidence angles of the light source and the observer/viewer, given a material’s single scattering albedo and photometric parameters [50].

Besides the dependency of the spectral signature on photometric parameters, which is discussed in the following section, the dependence on the single scattering albedo indicates that changes in incident angles can
affect each material in a pixel differently than the others since the behavior of the reflectance as a function of the angle is different for each material. This indicates that each EM/material in a pixel can be variously affected by topographic effects. Furthermore, the nontrivial relationship between geometry and spectral signatures leads to a more complex variation than single scaling of each EM for high-albedo materials [51], [52]. Even small topographic variations can significantly affect the ground reflectance. For instance, in [53], experimental studies found that even small slopes (of less than 10°) originating from irregularities in the tree canopy can lead to appreciable (enough to influence the results of subsequent tasks) changes in the measured reflectance of vegetation spectra.

INTRINSIC SPECTRAL VARIABILITY

Another important source of spectral variability is the intrinsic variation pertaining to the definition of a material, which is also called intrinsic variability. The characterization of this type of variability has been prominently studied in the area of vegetation monitoring, where it poses a huge challenge to identifying tree species from spectral measurements [54], [55] and to the characterization of soil and mineral spectra. Vegetation’s spectral signature can change due to many factors, including microclimates, soil characteristics, precipitation, the presence of heavy metals, drought, foliage age, and colonization by leaf pathogens [54]. The spectral signature of soil is also heavily affected by variations in its composition and moisture content [56]. Furthermore, intrinsic spectral variability is common in mineral spectra, due to differences in the grain size distribution and the presence of variable amounts of impurities [57], [58].
Moreover, it also depends on what level of detail is adopted to represent a given material (e.g., a tree EM may be split into trunk and leaf EMs), which is generally application dependent [59].

Although there is a large impact on EM spectral signatures, the dependence of intrinsic spectral variability on physicochemical parameters, which are usually unknown, makes this area very hard to tackle. One characteristic consistently observed in experimental studies is the smoothness of the observed spectra (i.e., the reflectance varies slowly between spectral bands). This behavior can be taken into account when designing SU algorithms. Moreover, unlike spectral changes caused by illumination and topography effects, intrinsic spectral variability frequently presents a considerable dependence of the variability amplitude on the spectral wavelength.

For instance, the signatures of different instances of minerals in the United States Geological Survey (USGS) library, presented in Figure 7, show complex dependence between the reflectance variation and the wavelength. The samples from alunite and muscovite have a variability that is far from uniform across the spectrum. Moreover, different instances from pyrite display complex variation, which is not consistent across all samples, occurring independently in different regions of the spectra. This behavior has been verified in similar experimental studies, and it poses a great challenge for differentiating mineral classes based on their spectral signatures [60].

These characteristics are even more prominent in the spectral variation of vegetation reflectance, which shows significant dependence on the wavelength and behaves very differently in visible, near-infrared, and short-wave-infrared ranges [61]. This means that a simple scaling of a reference spectral signature is usually not sufficient to account for variations within tree species [54]. Extensive experimental studies support this claim.
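The difference between illumination-like and intrinsic variability can be illustrated by fitting a single scaling factor to a spectrum and looking at what is left over. The sketch below uses the least-squares scale ψ = ⟨y, m⟩ / ⟨m, m⟩ of a reference signature m; the function names and the six-band spectra are hypothetical values chosen only to mimic a vegetation-like shape, not measurements.

```python
import math


def best_scaling(reference, spectrum):
    """Least-squares scale psi minimizing ||spectrum - psi * reference||."""
    num = sum(m * y for m, y in zip(reference, spectrum))
    den = sum(m * m for m in reference)
    return num / den


def relative_residual(reference, spectrum):
    """Norm of the part of the spectrum not explained by a scaled reference."""
    psi = best_scaling(reference, spectrum)
    err = math.sqrt(sum((y - psi * m) ** 2 for m, y in zip(reference, spectrum)))
    return err / math.sqrt(sum(y * y for y in spectrum))


# Hypothetical reference signature (three visible bands, three infrared bands).
reference = [0.05, 0.08, 0.04, 0.45, 0.50, 0.30]

# Illumination-like change: every band is scaled by the same factor.
shaded = [0.7 * v for v in reference]

# Intrinsic-like change: visible bands decrease while near-infrared bands increase.
intrinsic = [0.03, 0.05, 0.025, 0.52, 0.58, 0.30]

res_shaded = relative_residual(reference, shaded)
res_intrinsic = relative_residual(reference, intrinsic)
print(best_scaling(reference, shaded), res_shaded, res_intrinsic)
```

The scaled spectrum is reproduced almost exactly, whereas the wavelength-dependent change leaves a residual of several percent that no single scaling factor can remove.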
In [54], the author found that the variation of spectral reflectance in the visible and near-infrared regions can occur independently when measuring the tropical forest canopy in Brazil. Similar inhomogeneity in spectral variation was also observed in other studies with tropical tree species [62] and in many distinctive environments, including conifer [63] and boreal tree forests [64]. Similar nonuniform variation trends are also consistently observed in seasonal changes, as indicated by many experiments, including in salt marshes [65], semiarid environments [66], and boreal tree species [67]. Furthermore, nonuniform spectral variations have also been observed in samples from mineral, soil, and rock spectra [60].

FIGURE 7. Spectra variation samples from the USGS library. (a) Alunite. (b) Muscovite. (c) Pyrite. (Axes: reflectance versus wavelength, 0.5–2.5 µm.)

Numerous works model the spectral signature of materials as a function of photometric and chemical properties of the medium, based on radiative transfer modeling and empirical approaches. A well-known example is Hapke’s model, which describes the spectra of a surface composed of particles as a function of parameters, such as surface roughness and density and the size of the particles [48], [49]. Another prominent line of work models the spectral characteristics of vegetation and soil samples as a function of biophysical parameters [68]. Models of this kind have been applied for the estimation of leaf biochemistry from the observed spectra. An important example consists of the characterization of leaf reflectance spectra as a function of leaf biophysical parameters [68], for which a wide variety of models has been used, ranging from a simple description of leaf scattering and absorption properties to complex representations that include a detailed description of plant cells’ shape, size, position, and biochemical content [68]. Some instances of those models include the characterization of the spectra of broadleaf vegetation as a function of leaf mesophyll structure, pigment and water concentration [69], angular profiles [70], and, in pine needles, cellulose, lignin, and water content [71]. Other works model soil reflectance spectra as functions of moisture conditions [72]–[74] and snow albedo as a function of snow grain sizes and liquid equivalent depth [75].

As an example, we generated spectral signatures of vegetation spectra using the PROSPECT-D model [76] as a function of varying degrees of chlorophyll content, equivalent water thickness, and dry matter content. The resulting signatures, which appear in Figure 8, show that intrinsic spectral variability can present complex patterns and nonuniformity, as it is often concentrated in specific regions of the spectrum. Through their analytical characterization of EM spectra, these kinds of models confine spectral variability to a low-dimensional manifold. This constitutes important information that can be leveraged to alleviate/reduce the severe ill-posedness of unsupervised SU problems that account for spectral variability.

FIGURE 8. Reflectance spectra for vegetation generated with the PROSPECT-D model [76] for varying degrees of (a) chlorophyll content, (b) equivalent water thickness, and (c) dry matter content. (Axes: reflectance versus wavelength, 0.5–2.5 µm.)

Another important characteristic is that EMs affected by intrinsic spectral variability usually display significant spatial correlation [77]. For instance, many experimental geostatistical works evaluating the spatial distribution and variability of the physicochemical properties of the soil (e.g., the sand and clay concentration, electrical conductivity, acidity, compaction, and available elements, such as nitrogen, phosphorus, and potassium) have reported significant spatial correlation/smoothness in these properties. Reports include measurements performed in Rhodes grass crop terrain [78], calcareous soils [79], rice fields [80], and tobacco plantations [81]. Besides directly impacting the spectral signature of the soil, these characteristics have been widely acknowledged to directly influence vegetation growth (e.g., they show a strong correlation with crop productivity [78]) and hence their spectral signature [61], [78]. Therefore, spatial correlation in the variability is expected both in soil/terrain and in vegetation signatures. A similar behavior has also been observed in mineral spectra in the presence of spatially correlated grain size distributions and impurity concentrations [57], [58]. This implies that the variability tends to be small in modest spatial neighborhoods even though it may be large across a sizeable scene. This fact can be leveraged to design SU algorithms since it supplies information that can be used to reduce the severe ill-posedness of the problem.

To illustrate this effect, we performed an experiment by measuring the spectral variability in a homogeneous region (composed mostly of pure pixels) of soil in the Samson image, presented in Figure 9(a). We then computed the Euclidean distance and spectral angle between each soil pixel and the average spectra of all pixels in the subregion, which was used as a reference material signature. The results are depicted in Figure 9(b) and (c), where it can be seen that the variability shows strong spatial correlation, as observed in the Euclidean distance and the spectral angle.
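The two measures used in the spatial-correlation experiment on the Samson soil region are simple to compute. The following sketch shows them for a handful of made-up three-band pixels; it is not the authors' toolbox code, the helper names are hypothetical, and a real experiment would loop over every pixel of the selected subregion and arrange the results as maps.

```python
import math


def mean_spectrum(pixels):
    """Band-wise average of a list of spectra (the reference signature)."""
    n = len(pixels)
    return [sum(band) / n for band in zip(*pixels)]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spectral_angle(a, b):
    """Angle (radians) between two spectra; insensitive to a common scaling."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.acos(cosine)


# Hypothetical soil pixels from a homogeneous subregion.
soil_pixels = [
    [0.30, 0.35, 0.40],
    [0.28, 0.33, 0.39],
    [0.33, 0.38, 0.42],
    [0.24, 0.28, 0.32],
]
reference = mean_spectrum(soil_pixels)

distances = [euclidean_distance(p, reference) for p in soil_pixels]
angles = [spectral_angle(p, reference) for p in soil_pixels]
print(distances, angles)
```

The two measures are complementary: the Euclidean distance responds to amplitude changes such as illumination, while the spectral angle only responds to changes in spectral shape.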
UNMIXING METHODS THAT USE SPECTRAL LIBRARIES

METHODS THAT USE SPECTRAL LIBRARIES
Here, the following characteristics are present:
◗ The approaches are usually conceptually simple and easy to interpret.
◗ The quality of the SU results strongly depends on the spectral library.

One of the main approaches to addressing spectral variability in SU is to consider large libraries of spectra acquired a priori. These libraries contain different instances of each material in a scene, and the unmixing problem becomes generally equivalent to finding which signatures can best represent each pixel in the scene. Different algorithms have been proposed for this task, which we review.

The spectral libraries used by these methods are sometimes called bundles, and, in principle, they should account for all possible variations of each material. Mathematically, they are represented as

$$\mathcal{M}_p = \{\tilde{\mathbf{m}}_{p,1}, \ldots, \tilde{\mathbf{m}}_{p,M_p}\}, \quad p = 1, \ldots, P, \qquad (4)$$

where $\mathcal{M}_p$ is a library/bundle containing $M_p$ reference spectral signatures $\tilde{\mathbf{m}}_{p,i} \in \mathbb{R}^L$ of the $p$th material and $P$ is the number of materials in the scene. The spectral signature of each material in the $n$th pixel $\mathbf{y}_n$ of the hyperspectral image is then represented as an unknown element $\mathbf{m}_{n,p} \in \mathcal{M}_p$ belonging to this bundle. Those sets can be readily used to constrain the EM matrices of the LMM for the $N$ pixels to belong to a new set, $\mathbf{M}_n \in \mathcal{M}$, with $n = 1, \ldots, N$, where

$$\mathcal{M} = \{[\mathbf{m}_1, \ldots, \mathbf{m}_P],\ \mathbf{m}_p \in \mathcal{M}_p,\ p = 1, \ldots, P\} \qquad (5)$$

is the set of all possible EM matrices, with $\prod_{p=1}^{P} M_p$ elements. This definition assumes that only one signature from each library, $\mathcal{M}_p$, $p = 1, \ldots, P$, is present in each pixel. However, other representations of the EM signatures as, e.g., sparse and convex combinations of the elements in $\mathcal{M}_p$, can also be considered to obtain more flexibility (see, e.g., [82]–[84]). Such strategies are discussed in the “Multiple-Endmember Spectral Mixture Analysis and Its Variants for Small Spectral Libraries” and “Sparse Unmixing” sections.
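The set of candidate EM matrices in (5) is the Cartesian product of the bundles, so it can be enumerated directly. The short sketch below does this for three hypothetical bundles with made-up three-band signatures; the dictionary layout and the function name are illustrative choices rather than anything prescribed by the methods reviewed here.

```python
from itertools import product
from math import prod

# Hypothetical bundles: material name -> list of reference signatures.
bundles = {
    "soil": [[0.30, 0.35, 0.40], [0.25, 0.30, 0.36]],
    "vegetation": [[0.05, 0.45, 0.50], [0.04, 0.40, 0.55], [0.06, 0.50, 0.48]],
    "water": [[0.03, 0.02, 0.01]],
}


def endmember_models(bundles):
    """Yield every EM model: one signature picked from each bundle."""
    names = list(bundles)
    for choice in product(*(bundles[name] for name in names)):
        yield dict(zip(names, choice))


models = list(endmember_models(bundles))

# The number of models is the product of the bundle sizes, as stated for (5).
expected_count = prod(len(signatures) for signatures in bundles.values())
print(len(models), expected_count)
```

Even in this toy case the count is 2 × 3 × 1 = 6, and adding one more signature to any bundle multiplies the total, which is why library size matters so much for the search-based methods discussed next.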
Different methods have been proposed to solve the SU problem by using spectral libraries. These can be roughly divided into four groups of formulations: multiple-EM spectral mixture analysis (MESMA), sparse SU, machine learning, and spectral transformations. The MESMA algorithm and its variants formulate SU as a computationally demanding optimization problem and achieve good quality. Sparse SU formulations use mathematical relaxations to the MESMA problem that are computationally easier to solve. Machine learning algorithms provide more flexible ways to perform SU but also at a high computational cost. Spectral transformations are empirically oriented techniques that can be employed to improve methods from the first three categories. Although these families use spectral libraries to address spectral variability in SU, the reasoning underlying each of them can be quite different, leading to varying degrees of required user supervision, computational complexity, and abundance estimation quality, as illustrated in Figure 2. Moreover, additional prior knowledge can be considered in different ways, including, e.g., the design of principled neural network architectures and the manual specification of the robustness of particular spectral bands to variability. We review each family of approaches in the following.

FIGURE 9. The spatial behavior of EM variability. (a) The soil subregion of the Samson image (highlighted by a red square). (b) The Euclidean distance and (c) the spectral angle between each pixel and the average spectra of the region.

MULTIPLE-ENDMEMBER SPECTRAL MIXTURE ANALYSIS AND ITS VARIANTS FOR SMALL SPECTRAL LIBRARIES

MULTIPLE-ENDMEMBER SPECTRAL MIXTURE ANALYSIS AND ITS VARIANTS
This algorithm and its modifications have the following qualities:
◗ They generally provide good SU results.
◗ They are easy to set up, with few or no parameters.
◗ They have very high computational complexity.
◗ Their results depend strongly on the quality of the available spectral library.
The MESMA algorithm [37] and its variants (sometimes also referred to as iterative mixture analysis cycles) are among the most widely used for this task. These methods (Figure 10) enable the EM signatures to vary on a per-pixel basis while following the model in (4). The unmixing problem is solved by searching for the EM and abundance combinations that result in the smallest reconstruction error (RE) for each observed pixel, i.e.,

$$\operatorname*{arg\,min}_{\boldsymbol{a}_n, \boldsymbol{M}_n} \ \|\boldsymbol{y}_n - \boldsymbol{M}_n \boldsymbol{a}_n\|^2 \quad \text{subject to} \quad \boldsymbol{M}_n \in \mathcal{M}, \ \boldsymbol{a}_n \geq \boldsymbol{0}, \ \boldsymbol{1}^\top \boldsymbol{a}_n = 1. \tag{6}$$

The EM matrices $\boldsymbol{M}_n$ constructed by taking spectra from the bundles are sometimes called EM models. The MESMA algorithm has been employed in a wide variety of situations, including natural, urban, and extraterrestrial environments [9, p. 1,607] as well as single and multidate scenarios [85]. However, even though MESMA is very amenable to parallelization [86], it consists of a combinatorial optimization problem whose associated computational cost can become very high. More specifically, its computational cost scales as the product of the sizes of the individual libraries because it requires solving a fully constrained least-squares (FCLS) problem $\prod_{p=1}^{P} M_p$ times [87]. This can make the unmixing complexity unrealistic for large library sizes. Furthermore, the problem (6) can become ill-posed when there are many EMs in the bundles since different material combinations can lead to very similar REs. To circumvent these limitations, several modifications to the original MESMA algorithm have been proposed.

Many variants of MESMA aim to provide computationally efficient approximate solutions to (6). One simplification consists of stopping the exhaustive search in (6) early by selecting the first EM model that presents an RE that is below a threshold and well distributed across spectral bands [37].
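The exhaustive search behind (6) can be sketched in a few lines. The following is only a minimal illustration, not the implementation of [37]: the FCLS step is solved here with a plain projected-gradient loop (an assumption made to keep the example NumPy-only), and the function names, iteration count, and random toy bundles are all hypothetical.

```python
import itertools
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def fcls(y, M, n_iter=1000):
    """Fully constrained least squares via projected gradient (illustrative solver)."""
    a = np.full(M.shape[1], 1.0 / M.shape[1])
    step = 1.0 / np.linalg.norm(M, 2) ** 2      # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        a = project_simplex(a - step * M.T @ (M @ a - y))
    return a

def mesma(y, bundles):
    """Try every EM model (one signature per bundle) and keep the smallest RE."""
    best_idx, best_a, best_re = None, None, np.inf
    for idx in itertools.product(*(range(B.shape[1]) for B in bundles)):
        M = np.column_stack([B[:, i] for B, i in zip(bundles, idx)])
        a = fcls(y, M)
        re = np.linalg.norm(y - M @ a) ** 2
        if re < best_re:
            best_idx, best_a, best_re = idx, a, re
    return best_idx, best_a, best_re

# Toy data: three bundles with 3, 4, and 2 random signatures over 20 bands.
rng = np.random.default_rng(0)
bundles = [rng.random((20, 3)), rng.random((20, 4)), rng.random((20, 2))]
a_true = np.array([0.5, 0.3, 0.2])
y = np.column_stack([bundles[0][:, 1], bundles[1][:, 2], bundles[2][:, 0]]) @ a_true
idx, a_hat, re = mesma(y, bundles)
```

The loop in `mesma` visits 3 × 4 × 2 = 24 models here; the product structure of that count is exactly what makes the search expensive for larger bundles.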
Another proposed approach is to solve (6) approximately by performing unconstrained least squares with every possible EM model and then select the solution that yields positive abundances and the smallest RE [88]. Although these simple modifications successfully reduce the computational complexity of MESMA, the approximations involved can also negatively impact the abundance reconstruction results [89], which imposes practical limitations on the selection of the thresholds and tolerances. This motivated the consideration of more elaborate strategies to provide better complexity reductions without impacting the unmixing performance. An alternative approach to MESMA attempted to lessen the computational complexity by solving an angle minimization problem with respect to each library [87], [90]. Although not guaranteed to converge to the optimal solution of (6), this strategy performed similarly to MESMA on practical experiments, and it scales linearly with library sizes, leading to computational improvements for large numbers of signatures $M_p$, $p = 1, \ldots, P$, in the EM bundles. Another work considered a mixed integer linear program reformulation of the MESMA problem. This approach enables a more efficient computation of an exact solution to (6) for small- to medium-scale problems [91].

A simple technique that is largely employed to reduce the computational complexity of MESMA is to perform a careful pruning of the spectral libraries $\mathcal{M}_p$, $p = 1, \ldots, P$. This process attempts to remove redundant and irrelevant spectra from the libraries before unmixing. These strategies are described in detail in the “Library Reduction Techniques” section.

FIGURE 10. The MESMA, fuzzy, and sparse SU techniques. MESMA and sparse SU are the main methods based on spectral libraries. The basic principle behind MESMA is to iteratively search for the combination of EM signatures in the library that, among all possibilities, enables the closest reconstruction of each observed pixel under the LMM. Sparse SU, on the other hand, performs EM selection and abundance estimation in a single optimization problem by using sparsity and structuring constraints and penalties, which facilitates faster processing times.
Besides reducing complexity, other approaches modify MESMA with the purpose of improving its accuracy. For instance, an early practice attempted to alleviate the ill-posedness of MESMA by prioritizing models with a smaller number of EMs for otherwise comparable REs [92], [93]. This avoids increasing the complexity of the model for marginal gains. When consideration of material nuances is important, it may be necessary to allow multiple signatures of the same broader EM class in the model. This was the case in [94], where effects such as different vegetation species in a single pixel were of interest. Spatial information has also been considered with MESMA by using segmentation algorithms to divide the image into different homogeneous objects, which are then unmixed individually by using a library that is also constructed from object-based spectra [95], [96]. A different formulation attempted to increase the flexibility of MESMA by enabling the EMs of each pixel to be represented as a sparse, nonnegative combination of the signatures contained in the library for their respective material class [82], [83]. Under this model, SU was then formulated as a nonconvex optimization problem with different sparsity constraints, including both $L_{1/2}$ [82] and $L_0$ norm-based penalties [83]. This problem was solved through a multiplicative update rule in [82] and by using the proximal alternating linearized minimization method in [83].

Another set of approaches related to MESMA, referred to as fuzzy unmixing, considers a measure of uncertainty or indeterminacy in the estimated abundances by computing quantities such as average, maximum, and minimum cover fractions. One of the first techniques of this kind employed linear programming methods to determine maximum and minimum fractional abundances for each material by using spectral libraries extracted from the observed image [97].
Another approach attempted to determine the abundance indeterminacy (i.e., its fuzzy membership for each value of the abundance fractions) by evaluating how close synthetically mixed spectra that had all the possible EM combinations were to the observed pixel spectra $\boldsymbol{y}_n$ [98]. This procedure, however, required the discretization of the abundance values, and its computational complexity does not scale well with the number $P$ of EM classes. Other approaches performed linear SU with a large number of EM models selected at random from the library. Afterward, measures of uncertainty in the estimated fractional abundances, such as maximum, minimum, and average cover fractions, were computed from the results, providing a more detailed characterization of the abundances [99]–[101]. A similar work proposed to compute the final abundance fractions as a weighted sum of the abundances obtained from SU with each possible combination of signatures drawn from the library $\mathcal{M}$ [92]. The weights corresponded to the probability of each EM model being actually present in the scene, which was assumed to be known a priori.

SPARSE UNMIXING
Sparse unmixing has the following characteristics:
◗◗ Generally, it is very computationally efficient, especially compared to multiple-EM spectral mixture analysis (MESMA).
◗◗ The SU results might not be as accurate as those from MESMA.
◗◗ The process can be harder to interpret (e.g., it might select multiple signatures of the same material to represent a given pixel).
◗◗ The SU results are sensitive to the selection of the regularization coefficients.

An alternative approach to SU with spectral libraries is to formulate the SU as a sparse regression problem, where we want to select a small number of spectral signatures from the library that can best represent each observed pixel according to the LMM.
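To make the sparse regression idea concrete, the sketch below stacks all library signatures as columns of one matrix and fits a single pixel with a nonnegative, L1-penalized least-squares criterion. This is one common convex formulation, solved here with a basic proximal-gradient loop; the solver choice, the function name `sparse_unmix`, the penalty weight `lam`, and the random toy library are assumptions for illustration rather than any specific published algorithm.

```python
import numpy as np

def sparse_unmix(y, M_lib, lam=1e-3, n_iter=5000):
    """Minimize ||y - M_lib a||^2 + lam * ||a||_1 subject to a >= 0.

    For a >= 0 the L1 norm equals sum(a), so the proximal step reduces to
    shifting by step * lam and clipping at zero.
    """
    a = np.zeros(M_lib.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(M_lib, 2) ** 2)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * M_lib.T @ (M_lib @ a - y)
        a = np.maximum(a - step * (grad + lam), 0.0)
    return a

# Toy library: 12 random signatures over 50 bands; the pixel mixes two of them.
rng = np.random.default_rng(0)
M_lib = rng.random((50, 12))
a_true = np.zeros(12)
a_true[[2, 7]] = [0.6, 0.4]
y = M_lib @ a_true
a_hat = sparse_unmix(y, M_lib)
```

A single run replaces the model-by-model search of MESMA, at the price of a regularization weight that has to be chosen; larger values of `lam` push more entries of `a_hat` to exactly zero but also shrink the remaining ones.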
Most sparse unmixing methods are based on an unstructured library, which can be derived from (4) by concatenating all the signatures in a single matrix $\boldsymbol{M}_{\mathrm{Lib}}$, defined as

$$\boldsymbol{M}_{\mathrm{Lib}} = [\tilde{\boldsymbol{m}}_{1,1}, \ldots, \tilde{\boldsymbol{m}}_{p,k}, \tilde{\boldsymbol{m}}_{p,k+1}, \ldots, \tilde{\boldsymbol{m}}_{P,M_P}]. \tag{7}$$

Using the spectral library defined in (7), the sparse unmixing problem can be formulated as the optimization problem [102], [103]

$$\operatorname*{arg\,min}_{\boldsymbol{a}_n \geq \boldsymbol{0}} \ \|\boldsymbol{y}_n - \boldsymbol{M}_{\mathrm{Lib}} \boldsymbol{a}_n\|^2 \quad \text{subject to} \quad \|\boldsymbol{a}_n\|_0 \leq P, \ \boldsymbol{1}^\top \boldsymbol{a}_n = 1, \tag{8}$$

where $\|\cdot\|_0$ is the $L_0$ pseudonorm, which counts the number of nonzero elements in a vector. Different strategies have been proposed to solve the sparse SU problem by using the $L_0$ pseudonorm, employing, for instance, greedy (e.g., matching pursuit and forward–backward) algorithms [104], [105], Lagrangian function (regularized) formulations [106], and multiobjective optimization procedures that jointly consider the RE and the sparsity of the solution [107]–[109]. Note that (8) would be equivalent to MESMA if we added an additional linear structuring constraint to enforce the occurrence of only a single nonzero abundance per material class [91]. The optimization problem (8) is, however, nonconvex and generally NP-hard to solve. It is therefore common to relax the $L_0$ pseudonorm constraint into its convex surrogate, leading to the following optimization problem [102]:

$$\operatorname*{arg\,min}_{\boldsymbol{a}_n \geq \boldsymbol{0}} \ \|\boldsymbol{y}_n - \boldsymbol{M}_{\mathrm{Lib}} \boldsymbol{a}_n\|^2 + \lambda \|\boldsymbol{a}_n\|_1, \tag{9}$$

where $\|\cdot\|_1$ is the $L_1$ norm and the parameter $\lambda$ controls the level of sparsity of the estimated abundances. The sum-to-one constraint is not used in (9), due to its incompatibility with the $L_1$ norm [102]. Although problem (9) is nonsmooth, it is convex and can be solved very efficiently. Besides, it produces good experimental performance. This motivated a great deal of interest in sparse unmixing methods, resulting in a number of works proposing improvements, such as the use of alternative sparsity-promoting penalties [110], [111] and different means of spatial regularization [112], [113]. Sparse unmixing methods would merit a more comprehensive review, which is beyond the scope of this article. Thus, in the following, we restrict ourselves to modifications of the sparse SU framework specifically aimed at dealing with spectral variability and structured libraries.

In [114], $L_{2,1}$ norm-based group sparsity constraints were used to favor the selection of abundance vectors containing many entire material classes with zero proportions. A later formulation considered a fractional group $(p, q)$-norm sparsity constraint as a generalization of the approaches based on the $L_{2,1}$ norm [84]. The $(p, q)$-norm penalty provides better control of the sparsity within each group of variables as well as the addition of the sum-to-one constraint. However, this comes at the expense of making the optimization problem nonconvex. Another sparse SU formulation [115] proposed to explicitly represent mismatches between the library spectra and the hyperspectral image caused by different acquisition conditions. In this case, the spectral signatures of the library are also estimated in the SU process. However, they are constrained to be within a given Euclidean distance of a corresponding element of the library known a priori. This enables the estimated signatures to vary arbitrarily within Euclidean balls centered at the library elements to compensate for spectral mismatches. A different approach [116] was to modify the LMM for unmixing mineral spectra in mining applications by including an additional term representing the mixture of the “background” spectrum of the EMs. This background spectrum was defined as the low-frequency part of the spectral signatures and estimated a priori from the library as a parametric function of smooth splines. The performance of an $L_1$ norm-based sparse SU framework under this model was reported to be similar to MESMA, albeit at a much smaller computational cost.

MACHINE LEARNING ALGORITHMS
The following is true of machine learning algorithms:
◗◗ They are very flexible approaches that, in principle, can deal with any effect that is represented in the training data.
◗◗ Most methods have significant computational complexity or do not have a clear physical motivation.
◗◗ The SU quality depends on the representativeness of the training data (which are usually generated using a spectral library).
◗◗ They generally do not return EM spectra for each pixel.

Some works propose to address spectral variability using machine learning methods (Figure 11) by formulating SU as a supervised regression problem. The objective is to learn transformations that map the observed (mixed) pixel to the abundance fractions [117]–[121] using a supervised training procedure. Mixed pixels with known proportions are employed as training data for algorithms, such as neural networks, random forests, and support vector machines (SVMs). These techniques can be straightforwardly adapted to address spectral variability by considering multiple spectral signatures for each EM when generating the synthetic training data set. This has been done by directly applying regression methods [122] and by converting SU into a classification problem by quantizing the solution space of abundance values and using a one-against-all strategy [123]. Another work modified the SVM cost function to directly minimize the unmixing RE during the training process [124]. Usually, these approaches result in extremely large training sets for sizable spectral libraries. Thus, even though some strategies, such as bootstrap aggregation, have been employed to speed up the training process [125], [126], the computational cost is still very high.

FIGURE 11. A description of machine learning-based SU techniques. The flexibility and representation power of machine learning algorithms can be exploited to address spectral variability by formulating SU as a supervised learning problem. One simple approach is to learn a mapping between the mixed pixels and the abundance and EMs based on training data generated synthetically via a spectral library. However, the incorporation of expert knowledge about the SU problem in the design of the machine learning algorithm is important to obtain better performance and effectively address spectral variability.

Although methodologies to discard irrelevant (regarding the impact on performance) subsets of the training data [127], [128] could, in principle, be applied to accelerate training, recent works have focused on modifying the algorithms to reduce their complexity. One of the main reasons for this complexity is that the training data must jointly describe spectral variations due to changes in the abundances and EMs. Some works have tried to address this issue by using only pure pixels from a spectral library as training data. One such approach, which received considerable attention, consists of extended SVMs. Extended SVMs employ hybrid soft–hard classification and regression to address spectral variability. It is assumed that the spectral space is separable by hyperplanes delimiting two complementary regions containing only pure and only mixed pixels, respectively [129]. The extended SVM is then trained to find a soft–hard classifier containing 1) a hard classification rule consisting of the hyperplanes delimiting the regions in which the pixels are considered pure and 2) a soft classification rule, which determines the abundances of the pixels considered to be mixed.
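The library-based training sets used by the supervised regression approaches in this section can be illustrated with a short sketch. It draws abundances at random, picks one signature per material from each bundle, and mixes them linearly; a ridge regression then stands in for the neural networks, random forests, or SVMs mentioned in the text. The Dirichlet sampling, the ridge model, the function names, and the toy bundles (scaled copies of random base spectra) are all assumptions made for illustration.

```python
import numpy as np

def make_training_set(bundles, n_samples, rng):
    """Synthetic mixed pixels: one random signature per class, Dirichlet abundances."""
    P = len(bundles)
    A = rng.dirichlet(np.ones(P), size=n_samples)          # rows are abundance vectors
    Y = np.empty((n_samples, bundles[0].shape[0]))
    for n in range(n_samples):
        M = np.column_stack([B[:, rng.integers(B.shape[1])] for B in bundles])
        Y[n] = M @ A[n]                                    # linear mixing, no noise
    return Y, A

def fit_ridge(Y, A, alpha=1e-3):
    """Linear map (with bias) from spectra to abundances, fitted by ridge regression."""
    X = np.hstack([Y, np.ones((Y.shape[0], 1))])
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ A)

def predict(W, Y):
    """Apply the fitted linear map to new spectra."""
    return np.hstack([Y, np.ones((Y.shape[0], 1))]) @ W

# Toy bundles: five brightness-scaled variants of each of three base signatures.
rng = np.random.default_rng(0)
base = rng.random((30, 3))
bundles = [base[:, [p]] * rng.uniform(0.8, 1.2, size=(1, 5)) for p in range(3)]
Y, A = make_training_set(bundles, 500, rng)
W = fit_ridge(Y, A)
A_hat = predict(W, Y)
```

Because every training pixel has to cover both abundance changes and signature changes, the number of samples needed grows quickly with the bundle sizes, which is the cost issue raised in the text.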
Different forms of the extended SVM have been studied, using either a single kernel [129] or multiple kernels [130], accounting for the abundance indeterminacy by computing the maximum and minimum proportion values similarly to the fuzzy SU procedures [131] and using Fisher discriminant analysis (FDA) to reduce the within-class spectral signature variability in the spectral library before training [132]. Although hybrid soft–hard classification methods can be fast to train, they lack a clear physical interpretation of the results since they have no direct relation to the physical mixing model. Moreover, the influence of spectral variability on the regions of the spectral space containing mixed pixels is limited because it comes only from the marginal hyperplanes that separate the pure from the mixed pixel regions [129].

A related strategy that uses only pure pixels in the training process consists of modeling the latent function from the mixed pixel spectra to the abundance maps in a probabilistic framework as a multitask Gaussian process [133]. In this case, the abundance means and covariance matrices are obtained through the posterior distribution of the abundances conditioned on the training set (i.e., the spectral library) and the mixed pixels. This strategy was extended to consider spatial correlation in a two-step procedure by using the Gaussian process results from [133] as prior information on the abundances in a maximum a posteriori estimation problem [134]. Although this strategy has a strong statistical motivation, the introduction of additional constraints (e.g., abundance nonnegativity and sum-to-unity) is not straightforward and results in high computational complexity. Another work proposed to mitigate the influence of spectral variability by processing the image using a geodesic SU method [135] before applying Gaussian process regression to estimate the final abundances [136].
Although possibly inaccurate, the preliminary abundances estimated by the geodesic SU algorithm are not affected by EM variations caused by differences in illumination and acquisition conditions. The Gaussian process regression then learns to map the inaccurate initial abundances to the desired ones. Despite increasing the robustness to spectral variability, geodesic SU can introduce significant distortions in the abundances for complex data manifolds, which may not be trivial to mitigate.

Note that other machine learning techniques have been employed to perform SU without directly addressing spectral variability. These include the use of convolutional neural networks [137], the consideration of neural networks that are well adapted to learn from fewer samples [138], and the use of autoencoders to perform unsupervised SU by identifying the latent codes with the fractional abundances and the decoder with the mixture model [139]–[142]. Other works considered specific neural network architectures inspired by unfolding iterative optimization algorithms [143], and they employed Hopfield neural networks to optimize the SU cost functions more efficiently [144]. Machine learning methods have been applied in different experimental settings, such as unmixing spectrally similar vegetation types [145] and urban surfaces [146] and using training data collected at multiple locations [147]. Given the success that machine learning methods are achieving with different problems, particularly in the area of remote sensing [148], [149], such techniques may bring important advances if used to address the EM variability problem in SU.

SPECTRAL TRANSFORMATIONS
Spectral transformations have the following qualities:
◗◗ They can be seen as a “preprocessing” strategy that can be used with other library-based SU methods.
◗◗ They are conceptually simple and have low complexity.
◗◗ Many of the methods are empirical and require a significant degree of expert knowledge about the underlying application.
◗◗ The performance of the less supervised methods depends strongly on the representativeness/quality of the library.

An approach frequently used to mitigate the effect of spectral variability in library-based SU consists of selecting a subspace of the spectral space that is minimally influenced by the variability of the EMs to be prioritized in the unmixing process. This idea was introduced to improve the classification of materials under varying atmospheric
illumination conditions [17], [20]. The majority of these methods are based on affine transformations of the observed pixels, defined as

$$\boldsymbol{W}_{\lambda} \boldsymbol{y}_n + \boldsymbol{b}_n = \boldsymbol{W}_{\lambda} \boldsymbol{M}_n \boldsymbol{a}_n + \boldsymbol{W}_{\lambda} \boldsymbol{e}_n + \boldsymbol{b}_n, \tag{10}$$

where the matrix $\boldsymbol{W}_{\lambda}$ and the affine term $\boldsymbol{b}_n$ are determined to minimize the effects of EM variability in the subsequent SU process. Besides modifying the observed pixel spectrum $\boldsymbol{y}_n$, this transformation is applied to the elements of the spectral library, yielding

$$\boldsymbol{W}_{\lambda} \mathcal{M}_p + \boldsymbol{b}_n \triangleq \{\boldsymbol{W}_{\lambda} \boldsymbol{m} + \boldsymbol{b}_n : \boldsymbol{m} \in \mathcal{M}_p\} \tag{11}$$

for $p = 1, \ldots, P$. Different cases of this model have been considered in the literature, most notably with $\boldsymbol{W}_{\lambda}$ being a diagonal matrix with positive real (band weighting) or binary (band selection) elements. Note that although traditional dimensionality reduction [e.g., principal component analysis (PCA)] and band selection methods used to compress the hyperspectral image could be implemented through this transformation with $\boldsymbol{b}_n = \boldsymbol{0}$, the direct application of compression techniques does not necessarily improve the robustness to spectral variability [150]. Spectral transformation approaches can be generally divided into two major groups: those defined a priori based on the user’s expert knowledge and those constructed automatically by incorporating information in a spectral library. We review each case in the following.

USER-DEFINED SPECTRAL TRANSFORMATIONS
The first user-defined spectral transformations were proposed to normalize the effects of illumination and brightness variations and to emphasize useful spectral features. These approaches include subtracting the reflectance value of a selected (specific) spectral band from all remaining bands [99], subtracting from each EM its mean value in the spectral dimension to reduce the variability due to differences in brightness [151], and normalizing/dividing the reflectance value at each wavelength by the corresponding value of the convex hull of the spectral signatures [152].
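Two of the simplest transformations of this kind can be written directly as matrices acting on a spectrum. The sketch below builds a mean-removal matrix and a binary diagonal band-selection matrix and applies them to a toy signature; the six-band spectrum, the brightness offset of 0.05, and the band mask are hypothetical values chosen only to show the mechanics.

```python
import numpy as np

L = 6                                                # number of spectral bands (toy value)
W_center = np.eye(L) - np.ones((L, L)) / L           # subtracts the mean over the bands
keep = np.array([1, 1, 0, 0, 1, 1], dtype=float)     # hypothetical band-selection mask
W_select = np.diag(keep)                             # binary diagonal matrix

m = np.array([0.20, 0.25, 0.40, 0.45, 0.30, 0.28])   # toy reference signature
m_bright = m + 0.05                                  # same material, additive brightness offset

# Distance between the two versions of the signature before and after centering.
gap_before = np.linalg.norm(m_bright - m)
gap_after = np.linalg.norm(W_center @ m_bright - W_center @ m)

# Band selection simply zeroes the bands that are not kept.
m_selected = W_select @ m
```

Mean removal cancels a constant additive offset exactly, so `gap_after` is zero up to rounding, whereas multiplicative brightness changes would need a different normalization.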
Other examples include using the first or second derivatives [153], [154] and the wavelet transform of the spectral signatures [150] for SU. A spectrum-based approach that has become very popular for solving this problem consists of using band selection methods. These techniques basically work by performing SU via selected wavelength intervals in which there is little spectral variability between different spectral signatures of the same material [9], [99]. Although many of these approaches rely on expert knowledge about the specific underlying application, they are simple and easily interpretable and also help in reducing the computational cost of the SU problem. Examples of band selection methods defined a priori by the user include the use of the short-wave infrared 2 spectral region (2,100–2,400 nm) for unmixing soil and vegetation in arid and semiarid environments [99] and the combination of various spectral regions, such as visible, near infrared, and short-wave infrared, for other applications [100].

LIBRARY-BASED SPECTRAL TRANSFORMATIONS
Spectral transformations proposed more recently leverage information contained in the spectral library to compute the terms $\boldsymbol{W}_{\lambda}$ and $\boldsymbol{b}_n$ of the affine transformation. This circumvents one of the main downsides of the previous approaches by making the process automated instead of delegating the choice to the user. These techniques can be further divided into three groups, namely, band selection, band weighting, and more general spectral transformations.

BAND SELECTION
Newer band selection methods seek to identify robust spectral regions based on samples in the spectral library. Different strategies have been proposed. One of the first approaches is based on the analysis of the spectral residuals obtained by performing a preliminary unmixing of the image using the LMM with an average EM matrix [155].
Only the spectral bands with minimal residual variance are then used for SU, based on the empirical observation that they correspond to more robust spectral zones. Another method, called stable zone unmixing, proposes to select spectral bands that are robust to spectral variability by minimizing an instability index defined as the ratio between the intra- and interclass EM variances (computed based on a spectral library) [156]. This method was later extended to minimize the instability index and the correlation between signatures of different EM classes at the same time, aiming to improve the numerical conditioning of the SU problem [157]–[159]. The work [160] proposed to improve the separability between classes by employing the stable zone unmixing framework to select an individual set of spectral bands for each possible subset of EM/material classes that could be tested with MESMA when considering EM models with fewer than $P$ signatures in the SU process.

BAND WEIGHTING
Band weighting methods are more flexible techniques that enable one to prioritize the spectral bands in the unmixing process according to their reliability and significance by using a continuous weight term. This is usually done by weighting the RE of each band in the SU cost function. Different approaches have been proposed to compute the weight to be applied to each band. For instance, a weighting strategy based on two terms was offered in [161]. One term normalizes the energy of the reflectance spectra to equalize the contributions of low- and high-reflectance bands, and another term accounts for the robustness of each band to spectral variability through its instability index (i.e., the ratio between the intra- and interclass EM variance). This approach was later applied to monitor the level of defoliation in Eucalyptus plantations [162] and invasive plant species via multitemporal data [163]. It was later extended in [164] to consider SU that integrated reflectance and derivative spectra.
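The instability index that drives both band selection and band weighting can be sketched as a per-band variance ratio. The version below is a simplified reading of the definition given in the text (average within-class variance divided by the variance of the class means) and is not claimed to match the exact formula of [156] or the weights of [161]; the function names and the two small hand-made bundles are hypothetical.

```python
import numpy as np

def instability_index(bundles, eps=1e-12):
    """Per-band ratio of intraclass to interclass variance (simplified definition)."""
    intra = np.mean([B.var(axis=1) for B in bundles], axis=0)
    class_means = np.column_stack([B.mean(axis=1) for B in bundles])
    inter = class_means.var(axis=1)
    return intra / (inter + eps)

def select_stable_bands(bundles, n_bands):
    """Indices of the n_bands bands with the lowest instability index."""
    return np.sort(np.argsort(instability_index(bundles))[:n_bands])

def band_weights(bundles, eps=1e-12):
    """Continuous weights, larger for more stable bands, normalized to sum to one."""
    w = 1.0 / (instability_index(bundles) + eps)
    return w / w.sum()

# Two toy bundles (rows: 4 bands, columns: 3 signatures per material).
# Bands 0 and 1 separate the classes well; bands 2 and 3 vary strongly within class.
B1 = np.array([[0.10, 0.11, 0.09],
               [0.20, 0.21, 0.19],
               [0.30, 0.50, 0.10],
               [0.40, 0.70, 0.20]])
B2 = np.array([[0.60, 0.61, 0.59],
               [0.70, 0.69, 0.71],
               [0.35, 0.55, 0.15],
               [0.45, 0.60, 0.30]])
stable = select_stable_bands([B1, B2], n_bands=2)
weights = band_weights([B1, B2])
```

Band selection corresponds to thresholding this index, while band weighting keeps every band and scales its contribution to the RE.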
Band weights based on the instability index were used to prioritize the more stable spectral bands when designing spectral filters that were robust to spectral variability, which are low-complexity alternatives
that approximate the solution of the SU problem as a direct application of a single linear transformation [165].

GENERAL SPECTRAL TRANSFORMATIONS
Another group of approaches employ more flexible linear transformations to better mitigate the effect of spectral variability. These techniques consist of variations of FDA, which is widely used for pattern classification. FDA aims to find a transformation of the data to obtain a feature space with the best separability between different classes [166]. In the context of SU, this amounts to minimizing the variance of the signatures of each material while maximizing the distance between the mean values of the different EM classes [167]. Mathematically, this is formulated as

$$\boldsymbol{W}_{\lambda} = \operatorname*{arg\,min}_{\boldsymbol{W}} \ \frac{\boldsymbol{W}^\top \boldsymbol{S}_{\mathrm{within}} \boldsymbol{W}}{\boldsymbol{W}^\top \boldsymbol{S}_{\mathrm{between}} \boldsymbol{W}}, \tag{12}$$

where $\boldsymbol{S}_{\mathrm{within}}$ is the weighted sum of the within-class covariance matrices and $\boldsymbol{S}_{\mathrm{between}}$ is the covariance matrix of the mean EM spectra. The first approaches directly applied FDA to SU by using spectral libraries that were known a priori [167] or constructed with pure pixels extracted from the observed hyperspectral image [168]. Another work considered the augmentation of the spectral library with pure pixels extracted from the image to improve the discrimination among spectrally similar vegetation species [169]. Later approaches studied other variations, such as the iterative addition of more column vectors to $\boldsymbol{W}_{\lambda}$ using a Gram–Schmidt orthonormalization procedure to increase the dimensionality of the output space for multispectral images with a small number of bands [170]. Another work proposed to make the spectral signatures of different EMs orthogonal to one another and the spectral signatures of the same EM all unitary and collinear to improve the numerical conditioning of the SU problem [171]. FDA was successfully applied to improve the performance of MESMA when unmixing urban surfaces (containing vegetation, soil, water, and man-made materials) using image-extracted spectral libraries [172].
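A library-based FDA transformation can be sketched by building the two scatter matrices from the bundles and solving the associated generalized eigenproblem. Minimizing the within/between ratio is treated here as equivalent to keeping the directions with the largest between/within ratio, which is the usual way FDA is computed; the small ridge term `reg`, the function name, and the synthetic three-band bundles are assumptions for illustration.

```python
import numpy as np

def fda_transform(bundles, n_components, reg=1e-6):
    """Rows of the result span the directions with the largest between/within scatter."""
    n_bands = bundles[0].shape[0]
    n_total = sum(B.shape[1] for B in bundles)
    means = [B.mean(axis=1) for B in bundles]
    S_within = np.zeros((n_bands, n_bands))
    for B, mu in zip(bundles, means):
        D = B - mu[:, None]
        S_within += D @ D.T / n_total                  # weighted sum of class covariances
    S_between = np.cov(np.column_stack(means))         # covariance of the class means
    # Generalized eigenproblem S_between w = eigval * S_within w (regularized).
    A = np.linalg.solve(S_within + reg * np.eye(n_bands), S_between)
    evals, evecs = np.linalg.eig(A)
    order = np.argsort(evals.real)[::-1][:n_components]
    return evecs[:, order].real.T

# Toy bundles: the classes differ only in band 0, while band 2 is very noisy.
rng = np.random.default_rng(0)
sd = np.array([0.01, 0.01, 0.2])[:, None]
B1 = np.array([0.2, 0.5, 0.5])[:, None] + sd * rng.standard_normal((3, 20))
B2 = np.array([0.3, 0.5, 0.5])[:, None] + sd * rng.standard_normal((3, 20))
W = fda_transform([B1, B2], n_components=1)
z1, z2 = W @ B1, W @ B2                                # projected signatures
```

With only a handful of signatures per class the estimate of `S_within` becomes unreliable, which is why the regularization term is included in this sketch.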
Despite its improved flexibility, FDA has a downside: it depends on a good estimate of the covariance matrices used in (12). Thus, it may not perform well if the number of samples in the libraries is not statistically representative [173].

UNMIXING METHODS THAT ESTIMATE ENDMEMBERS FROM THE IMAGE
In recent years, a large number of works proposed to address spectral variability in SU without relying on prior knowledge about spectral libraries. Different strategies have been proposed to this end, which we divide into four groups. Local unmixing methods are computationally and conceptually simple but require significant user supervision. Parametric EM models provide more flexibility to represent EM spectral variability but make the SU problem harder to solve. EM-model-free methods address spectral variability by using different modifications to the SU cost function. Bayesian methods employ statistical representations for the EMs, which leads to less user supervision at the price of high computational complexity. All these methods are able to estimate EMs and abundances directly from the image. However, as seen in the “Unmixing Methods That Use Spectral Libraries” section, the reasoning underlying each group is quite different, which leads to various levels of required user supervision, computational complexity, and abundance estimation quality, as illustrated in Figure 2. Moreover, prior knowledge in the design of the algorithms is an important ingredient to guarantee good performance, and it includes, e.g., the spectral and spatial correlation of EM signatures and their statistical properties. We review each family in the following.

LOCAL UNMIXING METHODS
Local unmixing techniques are characterized by the following:
◗◗ They are conceptually simple and physically motivated.
◗◗ They are computationally efficient.
◗◗ Using them requires a significant amount of user supervision.
◗◗ The selection of the local image regions has a significant impact on the results.
◗◗ Local EM extraction can be difficult.
◗◗ Grouping the local estimates into global results is also challenging.

A conceptually simple and efficient method to deal with spectral variability is to perform EM extraction and SU locally for small, nonoverlapping regions of the hyperspectral image. This approach, called local unmixing and detailed in Figure 12, assumes the EM signatures to be constant in each region of the image, benefiting from the knowledge that spectral variability is often negligible in small areas. The basic framework of local unmixing can be summarized into the following steps:
1) Divide the observed image into a set of regions.
2) Estimate the number of spectral signatures and extract the EMs in each region.
3) Perform SU with the local EM signatures.
4) Combine all the local SU results into global sets of EM signatures and global abundance maps using, e.g., clustering procedures.

Although local unmixing methods share similar overall methodologies, there are important differences in the way the hyperspectral image is partitioned (e.g., by using simple square tiles or more advanced image segmentation) and how the EMs are extracted from each region. This can have a significant impact on the results. The first approaches for local unmixing required complete user supervision. For instance, the variable MESMA algorithm proposed in [10] used manual image segmentation to divide the image into local regions. SU was then performed iteratively, updating the segmentation maps and manually including additional EMs in the process until a satisfying result was obtained. Later
techniques attempted to reduce the need for user supervision in the process. For instance, EM extraction and SU were performed individually in local (square) image tiles in [174] and [175]. Afterward, the locally extracted EMs and abundance maps were merged into the global EM sets and abundance maps using clustering algorithms. Image segmentation methods were later applied to provide more flexibility when dividing the hyperspectral image into local regions. For instance, in [176], manual EM extraction and SU (using the FCLS algorithm) were individually performed in each image region defined by a segmentation algorithm. Another work considered a superpixel decomposition of the image, aided by external map metadata to compute a more accurate segmentation [177]. A more sophisticated method proposed in [178] and [179] used a binary partition tree to divide the image into different regions, from a coarse to a fine spatial scale. Local unmixing was then performed at the scale of the partition tree that yielded the smallest REs. Besides the choice of the segmentation procedure, EM extraction is a challenging part of local unmixing and has a great impact on the performance of these algorithms. A spatially adaptive unmixing method was proposed in [180] to estimate the distribution of different surfaces in urban environments. EM spectra for each pixel were synthesized as a weighted average of pure pixels extracted in a spatial neighborhood specified by the user, with weights given as a function of their distance to the mixed pixel at hand. A similar approach employed as EMs the mean values of pure pixels extracted within each (square) image region, with the pixels identified through a classification strategy [181]. These techniques can positively weight pure pixels that are spatially close to each pixel being unmixed.
This idea was also used in other works, such as [182], which performed SU through a variant of the MESMA algorithm, and [183], which used only the spatially closest pure pixels to process each mixed pixel. Other local unmixing approaches considered hierarchical segmentation strategies in which the hyperspectral image was divided into two spatial scales: a coarse one, where unmixing was performed with MESMA, and a fine one in which the spectral libraries were extracted by using the spectral signatures of small and homogeneous objects [184] or a priori knowledge about the abundances obtained from external high-resolution classification maps [185]. An important issue related to local unmixing algorithms is the determination of the number of EMs contained in each local image region. While in most experimental works this was performed empirically and even manually, it is desirable to have automated methodologies to estimate the number of local EMs and their spectral signatures. This usually involves the estimation of the intrinsic dimensionality of the local subset of the hyperspectral image [186]. However, the performance of intrinsic dimensionality estimators is often negatively impacted when the size of the data set is small [187]. This strongly limits the characteristics (i.e., the size) of the subsets and the segmentation procedures selected for unsupervised local unmixing. Collaborative sparse regression approaches [110] were proposed to deal with the shortcomings of intrinsic dimensionality estimation by avoiding the selection of repeated and mixed signatures during unmixing [188]. The sparsity level was selected via a Bayesian information criterion to obtain a good compromise between small REs and a low number of selected signatures.

FIGURE 12. A description of local SU techniques. Local SU addresses the variability of EM signatures across space by performing the process on small, compact spatial regions of the image in which the EMs can be assumed to be approximately constant. The local SU results for each image region are then clustered to assemble global abundance maps and sets of EM spectra. Local SU offers a lot of flexibility in the choice of the segmentation of the image and the local EM extraction and clustering strategies, which can have a significant impact on the global SU results. [Diagram stages: Observed Image; Divide Image Into Local Regions; Unmixing Results for Local Regions; Clustering; Estimated Abundances and EMs.]
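Steps 1) to 3) of the local unmixing framework can be sketched in a few lines. The code below is only an illustration under simplifying assumptions that are not taken from any of the cited works: the regions are square tiles, the number of EMs `P` is the same in every tile and known, each tile contains at least one pure pixel of every material, the local EMs are picked with the successive projection algorithm, and the abundances come from ordinary least squares followed by clipping and renormalization (a crude stand-in for FCLS). The function names `spa` and `local_unmixing` are hypothetical. Step 4), the clustering of local results into global ones, is left out.

```python
import numpy as np

def spa(Y, P):
    """Successive projection algorithm: indices of P columns of Y (L x n) used as local EMs."""
    R = Y.astype(float).copy()
    idx = []
    for _ in range(P):
        j = int(np.argmax(np.sum(R**2, axis=0)))   # column with the largest residual norm
        idx.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)                 # project out the selected direction
    return idx

def local_unmixing(cube, tile, P):
    """Tile an (H, W, L) cube, extract P EMs per tile, and unmix each tile separately."""
    H, W, L = cube.shape
    abund = np.zeros((H, W, P))
    local_ems = {}
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            block = cube[r:r + tile, c:c + tile, :]
            h, w, _ = block.shape
            Y = block.reshape(-1, L).T                 # L x n pixels of this tile
            M = Y[:, spa(Y, P)]                        # local EM matrix, L x P
            A, *_ = np.linalg.lstsq(M, Y, rcond=None)  # unconstrained abundances, P x n
            A = np.clip(A, 0.0, None)                  # enforce nonnegativity (crudely)
            A /= np.maximum(A.sum(axis=0, keepdims=True), 1e-12)  # sum to one
            abund[r:r + tile, c:c + tile, :] = A.T.reshape(h, w, P)
            local_ems[(r, c)] = M
    return local_ems, abund
```

Note that the EM ordering returned for each tile is arbitrary, which is precisely why a clustering or matching step is needed before the local abundance maps can be read as global ones.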
A different line of work attempted to relax the assumption of the connectedness of the local spatial regions, performing SU in different subsets of the hyperspectral image that were not necessarily spatially adjacent. For instance, the piecewise convex model proposed in [189] considered a set of different EM matrices, all estimated from the entire image. Each pixel was then assigned to one of these EM matrices through a (fuzzy) membership function, which was estimated with the other variables in a nonconvex matrix factorization problem. Other works extended this approach by considering cluster validity indices [190] and sparsity-promoting priors [191] to estimate parameters such as the number of EM matrices and material classes in each segment and using spatial constraints to encourage neighboring pixels to have similar membership values [192]. A similar work considered the estimation of multiple EM matrices in a nonnegative matrix factorization framework by using abundance sparsity constraints instead of employing (fuzzy) membership functions while also penalizing the mutual coherence between the signatures of different material classes to improve interclass separability [193]. A related strategy considered a self-dictionary model, where multiple EM signatures were directly selected as the hyperspectral image pixels that could best reconstruct most of the remaining pixels in the scene as a sparse linear combination [194]. Another approach with even more flexibility considered an individual EM matrix for each image pixel in a nonnegative matrix factorization formulation [195]. A regularization term penalizing the trace of the covariance matrix of the estimated spectral signatures for each class was also considered to reduce the ill-posedness of the estimation problem.

PARAMETRIC MODELS

PARAMETRIC ENDMEMBER MODELS
Parametric EM models possess the following traits:
◗◗ The SU algorithms are computationally efficient.
◗◗ They are very flexible and physically motivated models to represent any kind of variability.
◗◗ It is easy to incorporate prior information.
◗◗ Determining a good EM model might require some degree of expert knowledge.
◗◗ They require significant user supervision for tuning the free model parameters.
◗◗ Estimating the parameters of the EM models (along with the abundances) can be challenging due to the presence of nonconvex optimization problems and sensitivity to parameter choice and initialization.

A flexible and physically reasonable way to address spectral variability consists of employing parametric models to represent EM spectra; see Figure 13. These strategies provide great freedom to incorporate constraints and information from the underlying application. They are generally based on representing the EM spectra as

$$\mathbf{M}_n = f(\mathbf{M}_0, \boldsymbol{\theta}_n), \qquad (13)$$

where $f(\cdot)$ is a function of an average or reference EM matrix $\mathbf{M}_0$ and a vector of parameters $\boldsymbol{\theta}_n$. The number of parameters in $\boldsymbol{\theta}_n$ is usually small, which enables one to confine the EM spectra to a low-dimensional manifold. The SU problem is then formulated as the recovery of the abundances and parameters $\boldsymbol{\theta}_n$ for all pixels in the image.

FIGURE 13. A description of SU techniques based on parametric EM models and EM-model-free SU approaches. Parametric EM models represent the (variable) signatures of EMs as a function of a low-dimensional vector of parameters. The abundances and vector of EM parameters for each pixel are then recovered by solving an optimization problem. EM-model-free methods, on the other hand, generally attempt to indirectly mitigate spectral variability through the design of robust cost functions using, e.g., additive residual terms. The use of regularization terms is important in both cases to incorporate a priori knowledge about the problem. [Diagram stages: Observed Image; Optimize Cost Function; Estimated Abundances and EMs. Parametric EM model cost: $\min_{\{\boldsymbol{\theta}_n, \mathbf{a}_n\}_{n=1}^{N}} \sum_{n=1}^{N} \|\mathbf{y}_n - f(\mathbf{M}_0, \boldsymbol{\theta}_n)\mathbf{a}_n\|^2 + R(\{\boldsymbol{\theta}_n\}_{n=1}^{N}, \mathbf{A})$. EM-model-free objective function: $\min_{\phi, \mathbf{A}} g_{\phi}(\mathbf{Y} - \mathbf{M}_0\mathbf{A}) + R(\phi, \mathbf{A})$, where $R$ denotes the regularization terms.]

The model in (13) can be defined based on the underlying physics describing material spectra as a function of numerous geometric and
photometric parameters, such as Hapke’s [48], [49] and Shkuratov’s [196] models for packed particles and the PROSPECT and PROSAIL models for vegetation [68], [69]. Alternatively, the model (13) can be merely inspired by physics and chosen to provide more flexibility and mathematical tractability. We review these approaches in the following.

PHYSICS-BASED METHODS

The first SU approaches using parametric models aimed to obtain fractional abundances from intimate mineral mixtures by inverting the Hapke model [197]. With perfect knowledge about the viewing geometry, the scattering properties of the different materials, and the single scattering albedo of the EMs, the SU problem using Hapke’s model becomes linear in the albedo domain [198]. However, since these variables are hardly available in practice, many works attempted to blindly invert Hapke and related models. This inversion is mathematically and computationally very difficult, in general, and requires hyperspectral images acquired at multiple viewing geometries [197]. Thus, subsequent works proposed simplifications of the scattering characteristics of the materials in the model (13) to improve its mathematical tractability [199]. These methods have been successfully applied to estimate abundance maps of different scenes, including the Cuprite mining district in Nevada [200] and the Moon [201], [202]. This approach has also been applied to the SU of vegetation mixtures based on the inversion of radiative transfer models. The first works simplified the problem by assuming external knowledge of biophysical parameters. For instance, a model for mixtures of vegetation and shadowed and illuminated soil was proposed for SU by approximating plant geometry with spatially distributed cylinders containing layers of leaves [203]. Although spectral variability was supported by means of changes in biophysical parameters, these were assumed to be known a priori to solve the SU problem.
Another approach considered the SU of soil and vegetation by using a simplified mixing model as a function of NDVI values instead of the full spectral signatures [204]. In this case, a physical model was employed to represent the variability of the NDVI values as a function of parameters, such as the viewing geometry, leaf density, and clumping effect. However, the NDVI “EMs” for each pixel had to be estimated before SU by using multiangled observations and assuming prior knowledge of the leaf biophysical parameters. A later technique for the SU of soil and vegetation mixtures proposed to blindly estimate the biophysical parameters from the hyperspectral image using the PROSAIL model for vegetation spectra [205]. The SU problem was formulated as the recovery of both the abundances and the two parameters of the PROSAIL model and solved through an alternating optimization procedure. Note that the other parameters of the PROSAIL model had to be fixed a priori. Although these models possess a strong physical motivation, their use in SU leads to computationally intensive and mathematically challenging (i.e., nonconvex and significantly ill-posed [206]) problems. This occurs because physics models were originally devised as forward representations that accurately describe reflectance spectra based on a set of parameters and were not designed to be inverted, which limits their use for SU in practical problems [207].

PHYSICALLY MOTIVATED AND NONPHYSICS-BASED METHODS

The low mathematical tractability of physics-based models has motivated recent studies leading to more flexible and parsimonious models that are inspired only by the underlying physics.
Although these models are not as precise as those presented in the “Physics-Based Methods” section when representing physical phenomena underlying spectral variability, they enable more efficient SU algorithms that estimate the involved parameters $\boldsymbol{\theta}_n$ from the observed image. Moreover, although models inspired by physics can be ill-posed, EM spectra are often confined to a low-dimensional manifold since they depend only on a small number of physicochemical variables. This property can be exploited to design parsimonious models with possible constraints and reduce the ill-posedness of the SU problem. Several parametric models have been proposed with these objectives. One of the resulting SU algorithms is the scaled constrained least-squares method [18], which attempts to represent uniform illumination variations in each pixel by introducing an additional scaling factor $\psi_n \in \mathbb{R}_+$ into the EM matrices as

$$\mathbf{M}_n = \psi_n \mathbf{M}_0. \qquad (14)$$

SU can be performed using the model (14) by solving a simple nonnegative least-squares problem, which is convex and computationally efficient. However, this representation lacks the ability to represent the more complex spectral variability that has been observed in practical scenes, motivating the search for more flexible models. A version of the LMM, the extended LMM (ELMM), was proposed in [51] and [208] by enabling each EM in a pixel to be individually scaled by a constant factor, resulting in the following representation for the EM matrices:

$$\mathbf{M}_n = \mathbf{M}_0 \,\mathrm{diag}(\boldsymbol{\psi}_n), \qquad (15)$$

where the vector $\boldsymbol{\psi}_n \in \mathbb{R}_+^{P}$ contains the scaling factors for each of the $P$ materials. The ELMM can represent more complex variability originating from variations in the illumination and topography, which can affect each material in the hyperspectral image differently. Furthermore, the ELMM
can be obtained from successive physical approximations of the Hapke model for small-albedo materials [52]. Based on an estimate of $\mathbf{M}_0$ obtained from the observed image, SU under the ELMM was formulated as a nonconvex matrix factorization problem in which the model (15) was enforced by means of an additive penalty in the cost function [51]. A regularization promoting the spatial homogeneity of the scaling factors $\boldsymbol{\psi}_n$ was also considered to reduce the ill-posedness of the SU problem [51]. The ELMM has shown good performance for multitemporal data [209] and been used to facilitate the interpretation of local unmixing results [210]. Moreover, it can be derived from a Taylor series expansion of a general nonlinear mixture model [211], which introduces SU with spectral variability (viewed as a locally linear SU problem) as a direct way to address the general nonlinear SU problem. This shows that some mixture models originally devised to represent spectral variability (such as the ELMM) can achieve good performance in nonlinear SU. Despite its physical motivation, the ELMM lacks the flexibility to represent more complex spectral variability, e.g., variability that affects the spectral bands nonuniformly. To address this limitation, the generalized LMM (GLMM) was proposed in [212] by introducing an individual scaling factor for each band, leading to the following EM model:

$$\mathbf{M}_n = \boldsymbol{\Psi}_n \odot \mathbf{M}_0, \qquad (16)$$

where the matrix $\boldsymbol{\Psi}_n \in \mathbb{R}^{L \times P}$ contains the scaling factors for each element of $\mathbf{M}_0$ and $\odot$ denotes the Hadamard (elementwise) matrix product. Note that the amount of spectral variability produced by the GLMM is proportional to the amplitude of the reference spectra $\mathbf{M}_0$ in each band. However, the larger number of parameters makes the SU problem resulting from (16) more ill-posed, with challenging estimation problems.
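The scaling models (14) to (16) are easy to write down numerically, and (14) also shows why the scaled constrained least-squares problem reduces to nonnegative least squares: writing $\mathbf{b} = \psi_n \mathbf{a}_n$ turns the constraints $\psi_n \ge 0$, $\mathbf{a}_n \ge 0$, $\mathbf{1}^\top \mathbf{a}_n = 1$ into the single constraint $\mathbf{b} \ge 0$. The sketch below is a minimal illustration of that change of variables, not the implementation of [18]; the function names are hypothetical, and the small projected-gradient solver `nnls_pg` is an assumed stand-in for any off-the-shelf NNLS routine.

```python
import numpy as np

def nnls_pg(M, y, iters=5000):
    """Nonnegative least squares, min_{x >= 0} 0.5 * ||M x - y||^2, by projected gradient."""
    G, b = M.T @ M, M.T @ y
    step = 1.0 / np.linalg.norm(G, 2)     # 1 / Lipschitz constant of the gradient
    x = np.zeros(M.shape[1])
    for _ in range(iters):
        x = np.maximum(x - step * (G @ x - b), 0.0)
    return x

def scls(M0, y):
    """Model (14): y ~ psi * M0 @ a with a on the simplex, solved through b = psi * a >= 0."""
    b = nnls_pg(M0, y)
    psi = b.sum()          # since sum(a) = 1, the scaling is the sum of b
    return psi, b / psi    # assumes psi > 0, i.e., y is not orthogonal to every EM

def elmm(M0, psi):
    """Model (15): one scaling factor per endmember; equals M0 @ np.diag(psi)."""
    return M0 * psi[None, :]

def glmm(M0, Psi):
    """Model (16): one scaling factor per band and endmember (Hadamard product)."""
    return Psi * M0
```

The parameter counts per pixel make the trade-off in the text visible: one scalar for (14), $P$ values for (15), and $L \times P$ values for (16).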
The ill-posedness of the GLMM motivated the development of a tensor interpolation framework to estimate the matrices $\boldsymbol{\Psi}_n$ from training hyperspectral data obtained from prior knowledge about the positions of pure pixels in the hyperspectral image [213]. However, the performance of the method proposed in [213] strongly depends on the number of pure pixels available in the image. The GLMM also has been successfully used in multitemporal SU [214] and to represent spectral variability when fusing hyperspectral and multispectral images acquired at different time instants [215]. Note that the performance of unmixing methods based on the ELMM and the GLMM heavily depends on the quality of the reference EM matrix $\mathbf{M}_0$, which must be estimated from the observed image. To reduce the dependence of the ELMM on $\mathbf{M}_0$, the authors of [216] proposed to jointly estimate $\mathbf{M}_0$ with the remaining variables during SU. Each column of $\mathbf{M}_0$ was constrained to have a unit norm to obtain EMs as directional data in the spectral space. Moreover, $\mathbf{M}_0$ was initialized using a simple cosine-based k-means clustering of the observed data cube, which improved the robustness of the method to the presence of shadowed pixels. A different EM model was proposed in [217] by considering an additive term to the mean EM matrix, resulting in the following EM representation:

$$\mathbf{M}_n = \mathbf{M}_0 + \mathrm{d}\mathbf{M}_n, \qquad (17)$$

where the matrix $\mathrm{d}\mathbf{M}_n \in \mathbb{R}^{L \times P}$ is an additive perturbation representing spectral variability. In this case, the reference EM matrix $\mathbf{M}_0$ and the pixel-dependent additive perturbation terms $\mathrm{d}\mathbf{M}_n$ were blindly estimated from the hyperspectral image. However, this model has a large number of parameters. Thus, to mitigate the ill-posedness of the SU problem, a regularization term consisting of the Frobenius norm of $\mathrm{d}\mathbf{M}_n$, $n = 1, \ldots, N$, was included in the unmixing cost function.
Besides the simplicity and mathematical tractability, the use of an additive perturbation in (17) also makes the problem amenable to an interesting interpretation when only a single additive perturbation matrix is considered for all image pixels. In this case, the SU problem becomes equivalent to a total least-squares problem with constraints [218]. Furthermore, the perturbed LMM (PLMM) has been considered for robust SU using an outlier-insensitive RE metric with an $\ell_p$ quasi-norm [219] and for multitemporal and distributed SU [220], [221]. One difficulty of parametric EM models involves constructing functions $f(\cdot)$ that are parsimonious and still flexible enough to represent complex spectral variations. To circumvent this issue, a deep generative EM model was proposed in [222], based on the hypothesis that the EMs lie on low-dimensional manifolds. Instead of fixing the EM model a priori, variational autoencoders with neural networks were employed to learn the parametric function $f(\cdot)$ in (13) by using pure pixels extracted from the observed hyperspectral image. SU was then formulated as the recovery of the abundances and representations of the EMs in the learned manifold, which can be of very small dimension. Despite making SU more well-posed, the resulting cost functions are nonconvex and can be difficult to optimize. A different work proposed to exploit the spatial correlation of the EMs and abundance maps through a general multiscale mixing model addressing EM variability [223]. The SU problem was solved using a multiscale representation of the mixing model, which facilitated the use of any parametric EM representation, as in (13). This generated improved results when compared to standard spatial regularization strategies. Although the formulation was algebraically involved, an approximate algorithm with low complexity was derived through some simplifying assumptions.
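To see how the Frobenius-norm regularization tames the additive model (17), consider one building block of an alternating scheme: updating the perturbation of a single pixel while the abundances and the reference matrix are held fixed. Assuming a squared Frobenius penalty with weight `lam` (the exact form of the penalty and the solver used in [217] are not restated here, so this is an illustration rather than that algorithm), the subproblem $\min_{\mathrm{d}\mathbf{M}} \|\mathbf{y} - (\mathbf{M}_0 + \mathrm{d}\mathbf{M})\mathbf{a}\|^2 + \lambda \|\mathrm{d}\mathbf{M}\|_F^2$ has the closed-form solution $\mathrm{d}\mathbf{M} = \mathbf{r}\mathbf{a}^\top / (\mathbf{a}^\top\mathbf{a} + \lambda)$ with $\mathbf{r} = \mathbf{y} - \mathbf{M}_0\mathbf{a}$. The function name below is hypothetical.

```python
import numpy as np

def plmm_perturbation(y, M0, a, lam):
    """Minimizer of ||y - (M0 + dM) a||^2 + lam * ||dM||_F^2 over dM, for fixed a and M0."""
    r = y - M0 @ a                          # residual of the unperturbed LMM
    return np.outer(r, a) / (a @ a + lam)   # rank-one update, shrunk by lam
```

Without the penalty (`lam` close to zero) the perturbation absorbs the whole residual, which is the ill-posedness the regularization is meant to control; as `lam` grows, the perturbation shrinks toward zero and the model falls back to the LMM.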
ENDMEMBER-MODEL-FREE METHODS

ENDMEMBER-MODEL-FREE UNMIXING
EM-model-free unmixing is characterized by the following:
◗◗ The algorithms are usually computationally efficient.
◗◗ It involves different strategies with a wide range of model complexities and user supervision.
◗◗ Its methods usually make few or restrained assumptions about the EM models (unlike Bayesian and parametric models).
◗◗ Some approaches have a more limited modeling ability.

Several methods have been proposed to blindly mitigate the effects of spectral variability, without assuming any specific model to represent the EM signatures; they are described in Figure 13. One simple approach consists of using a different RE-based metric in the SU cost function to improve the robustness of SU to EM variability. It can be motivated by the fact that the commonly used Euclidean distance is very sensitive to variations in the amplitude of the pixel spectra, thus being significantly influenced by illumination variations [224]. This drove consideration of spectral angle mapper (SAM), spectral correlation, and spectral information divergence metrics, due to their insensitivity to scaling variations [224], [225]. The downside lies in the nonlinear and possibly nonconvex nature of the resulting SU optimization problem, which becomes harder to solve. An efficient strategy based on the projected gradient descent algorithm was proposed to optimize the SU cost function when using the SAM metric [226]. Although conceptually simple, these approaches focus on specific effects, such as brightness variations, and it is not clear how they can be generalized to address more complex spectral variability. Recent SU methods consider more general models to deal with complex intrinsic variability effects. For instance, an additive residual term in the LMM (1) was introduced in [227] to account for spectral variability and other unmodeled effects. This term was represented as the product of two matrices.
The first corresponded to the first columns of the discrete cosine transform, forcing the additive terms to be spectrally smooth. The second was defined for the pixel-dependent coefficients, which were forced to be spatially sparse and were estimated by solving a convex optimization problem. A similar technique included ideas from physically motivated parametric models by considering the LMM with a constant scaling factor for each pixel to account for rough illumination variations and an additional nonparametric additive term to account for other types of spectral variability [228]. This additive term was defined as the product between an approximately orthonormal basis matrix that had low coherence with the EM signatures and a coefficient matrix representing the variability contribution to each pixel. However, these constraints make the resulting optimization problem nonconvex. A different idea was to estimate a subspace projection of the observed hyperspectral image that minimizes the effect of spectral variability in SU [229]. This strategy enables SU to be performed by minimizing the RE in the projected space. This subspace is forced to be of a low dimension by penalizing the nuclear norm of the projection operator in the cost function, which is jointly estimated with the abundances during SU. A more recent method considered the multidimensional representation of the pixel-dependent EM matrices and abundance vectors by applying mathematical tools from tensor decomposition [230]. By assuming that the EMs and abundance tensors were approximately low rank, the SU problem was formulated as a nonconvex, nonnegative tensor factorization problem. This led to a parsimonious model without the need for explicit parametric representations of the EMs that are tied to specific applications.
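The scaling insensitivity that motivates the SAM metric in this section can be checked in a few lines. The sketch below uses the standard definition of the spectral angle (the arccosine of the normalized inner product); the function name `sam` is chosen here for illustration, and the spectra are assumed to be nonzero.

```python
import numpy as np

def sam(x, y):
    """Spectral angle (radians) between two nonzero spectra x and y."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))   # clip guards against round-off

y = np.array([0.2, 0.4, 0.6, 0.3])
shaded = 0.5 * y                      # same material under half the illumination
print(sam(y, shaded))                 # numerically zero: the angle ignores scaling
print(np.linalg.norm(y - shaded))     # clearly nonzero Euclidean distance
```

The same property explains the limitation noted above: a metric that only discounts a common scaling of the spectrum cannot account for variability that changes the spectral shape.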
BAYESIAN METHODS

BAYESIAN METHODS
In Bayesian methods, the following qualities are observed:
◗◗ The approaches benefit from well-developed statistical estimation tools to derive SU methods.
◗◗ They can have a very low degree of user supervision once the statistical distributions are selected.
◗◗ They can use unrealistic distributions (e.g., isotropic Gaussians) to represent EMs for mathematical tractability.
◗◗ Generally, they do not return the specific spectral signatures at each image pixel.
◗◗ They suffer from a very high computational cost.
◗◗ Hyperparameters may need to be set by the user, and specifying hyperprior distributions for hierarchical Bayesian models may not be trivial.

Another set of methods considers EMs to be random vectors, following multivariate statistical distributions, i.e.,

$$\mathbf{m}_{n,p} \sim \mathcal{D}(\boldsymbol{\theta}_{n,p}), \qquad (18)$$

where $\boldsymbol{\theta}_{n,p}$ encodes the parameters of a distribution $\mathcal{D}$. The spectral signatures actually present in each pixel are realizations of this random vector, and SU is then formulated as the problem of finding a statistical estimator for the abundances and for the EMs. These approaches differ in the statistical distribution $\mathcal{D}$ employed to represent EM spectra, the amount of user supervision required, and the computational algorithm used to solve the problem. Some methods require the parameters of the distribution $\boldsymbol{\theta}_{n,p}$ to be set a priori, which might be difficult in the absence of a large spectral library. Other works reduce user supervision by employing hierarchical Bayesian methods to jointly estimate $\boldsymbol{\theta}_{n,p}$ with the remaining parameters, at the price of a higher computational cost [231], [232]. Bayesian methods addressing spectral variability (Figure 14) can be classified according to the statistical distribution used to represent the EMs: a Gaussian distribution, which provides mathematical tractability, and more complex distributions providing a more physically reasonable representation. We discuss both cases in the following.
THE NORMAL COMPOSITIONAL MODEL

The first statistical model considered for the representation of EM spectra was a multivariate Gaussian distribution, in the so-called normal compositional model (NCM), given by

$$\mathbf{m}_{n,p} \sim \mathcal{N}(\boldsymbol{\theta}_{n,p}), \qquad (19)$$

where $\mathcal{D} \equiv \mathcal{N}$ and $\boldsymbol{\theta}_{n,p} = \{\text{mean}, \text{covariance}\}$ contains the mean vector and covariance matrix for the $p$th EM of the $n$th pixel. The NCM has been widely used due to its mathematical tractability [233], [234]. The first works employing it for SU considered expectation-maximization strategies in which the abundances and the mean EM values and their covariance matrices were iteratively estimated [233]. However, due to the nonconvexity of the estimation problem, a direct application of expectation-maximization approaches is unable to decide whether variations observed in the mixed pixel spectra $\mathbf{y}_n$ are due to different abundances or to EM variability. This might result in the EMs absorbing all the variation in the observed scene with nearly constant abundances [234]. Some techniques proposed to address this problem by considering the use of diagonal covariance matrices and empirical strategies to estimate the EM data more easily from the observed mixed pixels. For instance, EM means and covariances were estimated a priori by using pure pixels selected from the hyperspectral image [235], and they were iteratively estimated based on large regions of observed pixels with homogeneous abundances (obtained from the segmentation of estimates of the abundances available a priori) [236].
Other works attempted to improve different aspects of this method by using a particle swarm optimization algorithm to solve the (usually intractable) integrals involved in the estimation of the abundances in the “expectation” step of the algorithm [237] and by incorporating a priori information in the form of additional constraints penalizing the nuclear norm of the abundances in groups of pixels determined through image segmentation methods (to promote spatial homogeneity) [238]. Despite these advances, the susceptibility of expectation-maximization-based methods to converge to poor local minima of the nonconvex cost function prevented their large-scale applicability for this problem. Instead, most recent approaches rely on more robust (although costly) techniques based on Markov chain Monte Carlo methods to sample the posterior distribution. Although the works that adopt this approach share the same underlying idea, they differ significantly in the way in which EMs and abundances are represented and the amount of user supervision that is required. For example, different strategies have been proposed to represent the mean and the covariance matrices of the EMs in the NCM. One of the first considered the EM mean values to be known a priori and their covariance matrices to be multiples of the identity matrix [239] while employing conjugate distributions to make the estimation of the parameters easier. Later works attempted to add more flexibility by considering, for instance, a single full covariance matrix shared by all EMs [240] or a positive definite matrix defined a priori and multiplied by EM-dependent scaling parameters [241]. Diagonal covariance matrices were employed in [242], which also considered the estimation of the EM mean values in a hierarchical Bayesian framework, using hyperpriors to directly estimate the distribution parameters from the observed hyperspectral image. 
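To make the NCM in (19) concrete, the sketch below computes the distribution it implies for a mixed pixel. It assumes, for illustration only, that the EMs of a pixel are statistically independent and that no additive noise is present, in which case a pixel with abundances $\mathbf{a}$ is Gaussian with mean $\sum_p a_p \boldsymbol{\mu}_p$ and covariance $\sum_p a_p^2 \boldsymbol{\Sigma}_p$. The function names and array layout (`mus` of shape P x L, `covs` of shape P x L x L) are choices made here, not notation from the cited works.

```python
import numpy as np

def ncm_pixel_moments(a, mus, covs):
    """Mean and covariance of y = sum_p a[p] * m_p for independent m_p ~ N(mus[p], covs[p])."""
    mean = a @ mus                                  # (L,)
    cov = np.einsum("p,pij->ij", a**2, covs)        # sum_p a_p^2 * Sigma_p, (L, L)
    return mean, cov

def ncm_sample_pixels(a, mus, covs, n, rng):
    """Draw n pixels by sampling each EM independently and mixing with abundances a."""
    y = np.zeros((n, mus.shape[1]))
    for a_p, mu, C in zip(a, mus, covs):
        y += a_p * rng.multivariate_normal(mu, C, size=n)
    return y
```

Because the abundances enter both the mean and the covariance of the pixel, the likelihood is not a simple least-squares term, which is one way to see why inference under the NCM typically relies on iterative or sampling-based estimators such as those discussed above.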
The Bayesian framework was also applied in [243] to blindly estimate the number of EMs in the scene via a uniform discrete prior.

FIGURE 14. A description of Bayesian SU techniques. Bayesian SU methods represent the EM signatures at each pixel as a realization of a statistical distribution. Statistical distributions are first attributed to the EMs and abundances and, possibly, other variables or to hyperparameters of these distributions. Using the Bayes rule, the SU results are then derived from the posterior distribution in a Bayesian inference problem. The abundances and EM distributions can be computed as, e.g., the mean or mode of the posterior distribution. [Diagram stages: Observed Image; Bayesian Inference; Estimated Abundances and EM Distributions. Bayes rule: $p(\mathbf{Y} \mid \mathbf{A}, \boldsymbol{\theta}_M)\, p(\mathbf{A}, \boldsymbol{\theta}_M) \propto p(\mathbf{A}, \boldsymbol{\theta}_M \mid \mathbf{Y})$, i.e., likelihood times prior is proportional to the posterior.]

Other works attempted to address physically motivated cases of the general NCM. These included consideration of the statistical dependence between different EMs to represent spectral variability that may equally affect all materials
in the scene (e.g., atmospheric effects) [244] and the explicit representation of the higher correlation between adjacent spectral bands to introduce spectral smoothness to the signatures, leading to a well-posed model that is fast to compute [245]. An alternative approach, which has been used to simplify the unmixing process associated with the NCM, is to estimate the EM means and covariance matrices a priori based on spectral libraries extracted from the observed image. This has been performed by considering libraries obtained through pure pixel-based EM bundle extraction methods [246] and multiple EM matrices estimated by a piecewise convex blind SU algorithm [191]. However, these methods suffer from the limitations of image-based EM bundle extraction techniques, which are discussed in detail in the “How to Construct Spectral Libraries” section. Other works considered a piecewise convex model that uses a set of different Gaussian distributions to model EMs. Afterward, during SU, each image pixel is assigned to one of these distributions through a membership function represented by a Dirichlet random variable. The unmixing problem in this model was solved by considering an alternating optimization method in a maximum a posteriori framework [247] and a Markov chain Monte Carlo sampling approach providing an estimate of the posterior distribution of interest [248]. Although the Dirichlet prior distribution is frequently used to represent abundances, many works have considered variations that incorporate useful information from the underlying practical problem. Examples include the enforcement of abundance sparsity through a sparse Dirichlet prior [249] and the encouragement of spatial homogeneity by dividing abundance maps into a finite number of classes sharing the same Dirichlet distribution parameters.
This division has been performed blindly by means of a classification prior by using the Potts model [242] and through an a priori segmentation of the hyperspectral image in a latent Dirichlet allocation framework [250]. More recently, the NCM has been applied to problems unrelated to linear unmixing and spectral variability. For instance, it has been considered for representing the uncertainties in EM estimation instead of the intrinsic variability of the material classes, which changes the problem by introducing statistical dependence between the different image pixels [251]. Other works applied the NCM to problems such as nonlinear SU with a bilinear mixing model [252], the linear unmixing of sediment grain size distributions (where EMs represent the grain sizes of constituent materials) to study the transport and deposition of sediments [253], and the representation of the variability of the EMs across multiple images in multitemporal SU by using additional spatially sparse terms accounting for potential abrupt spectral changes between the different images [254].

OTHER ENDMEMBER DISTRIBUTIONS

Despite its popularity, the NCM does not have a strong physical motivation, which led to the consideration of more accurate distributions to represent EMs. For instance, a beta distribution was considered in [255] to constrain reflectance values to physically meaningful ranges and enable possible skewness in the distribution. Unfortunately, a direct solution to the SU problem cannot be obtained. Thus, a piecewise constant model was assumed for the abundances, which enabled the parameters of the distribution to be estimated using a combination of a clustering algorithm and a variant of the method of moments. A Gaussian mixture model was also considered in [256] to facilitate possibly multimodal EM distributions.
The SU problem was solved as a maximum a posteriori estimation problem using a generalized expectation-maximization approach. However, since learning the parameters of Gaussian mixture models can be difficult, the distributions were estimated before performing SU, based on spectral libraries assumed to be known a priori. Another approach proposed to represent EM spectra as a sum of an average spectral signature known a priori and a spatially and spectrally smooth function representing EM variability to provide a model that is physically more reasonable [257]. Bilinear mixing models were also considered with an additive residual term to account for mismodeling effects and outliers. A different technique has been offered that does not make an explicit assumption about the distribution of EM spectra and, instead, relies only on some of their statistics. This is the case in [258], which formulates the SU problem similarly to the method of moments by trying to find the abundance values that match the mean and covariances obtained through the LMM to those of the observed mixed pixels. A similar work applied the same idea by using transformed statistics constructed from the ratio between the means and covariances of the pixels and EMs in different spectral bands [259]. This strategy increases the robustness of the method since band ratios are invariant to illumination variations. However, similar to [255], a piecewise constant abundance model is employed to estimate the covariance matrix of the observed pixels. Moreover, the covariance matrices of the EMs are assumed to be known a priori. SPECTRAL LIBRARIES A large number of the SU techniques discussed in the “Unmixing Methods That Use Spectral Libraries” section address spectral variability by using spectral libraries and bundles known a priori. The performance of these methods is often heavily impacted by how well the libraries can represent the EMs actually present in the scene. 
Moreover, in many practical situations, it is either very costly or even impossible to obtain laboratory and in situ measurements of EM spectra. Another issue with many methods presented in the “Unmixing Methods That Use Spectral Libraries” section (such as MESMA) is that the approaches’ computational complexity increases very quickly with the library size, which can make the problem intractable for large libraries. Thus, the challenges of removing redundant and irrelevant spectra before SU and, especially, extracting spectral libraries directly from observed hyperspectral images are of central importance to enable the techniques discussed in the
“Unmixing Methods That Use Spectral Libraries” section to be widely applicable. Fortunately, several techniques have been proposed to address both of these problems, which we discuss in detail in this section. HOW TO CONSTRUCT SPECTRAL LIBRARIES Many library-based SU works assume that spectral libraries are manually obtained from in situ and controlled laboratory measurements [37], [260], which may be complicated in practical applications. Moreover, existing libraries may have been acquired in conditions that do not reflect those actually observed in the scene, which introduces errors in the SU process [37], [103], [261]. Even the spatial resolution at which the hyperspectral image was acquired was found to have a considerable impact on the results of SU with MESMA in urban environments when the library was fixed a priori [262]. Traditional EM extraction algorithms (EEAs), on the other hand, typically consider only a single spectral signature per material and are thus unable to appropriately address spectral variability [33], [263]. These shortcomings make the construction of spectral libraries one of the main challenges of library-based SU methods [260]. A simple and reliable method that has been employed to construct spectral libraries depends on expert knowledge to manually select pure pixels of each material from the hyperspectral image [262], [264]. However, there has been growing interest in developing methods that can reduce the amount of user supervision and automatically extract libraries directly from observed hyperspectral images. Three main general lines of research can be identified in this direction: 1) Extract multiple pure pixels from the observed hyperspectral image to generate a candidate library, and then cluster the extracted signatures into their respective material classes. 2) Generate libraries using radiative transfer models that represent EM variability mathematically. 
3) Extract pure pixels while keeping information about their spatial locations, and apply an interpolation algorithm to generate EM signatures for each image pixel.

Figure 15 provides an overview of the key ideas underlying each of these approaches, which are reviewed in the following.

IMAGE-BASED LIBRARY CONSTRUCTION

IMAGE-BASED LIBRARY EXTRACTION
Image-based library extraction is characterized by the following:
◗◗ It enables spectral libraries to be extracted with signatures that have the same conditions as the image pixels.
◗◗ It can benefit from expert knowledge to reliably identify pure pixels in the image.
◗◗ It strongly depends on the presence of pure pixels.
◗◗ The observed image should not be too small.
◗◗ Mixed pixels may be included in the library by mistake.
◗◗ Clustering the extracted signatures into their correct material classes is challenging.

The simplest approaches for the construction of image-based spectral libraries are completely supervised. Image pixels are included in the library based on their correlation to some initial EMs manually selected as the extreme points of the PCA of the observed image [97], [265] or simply by manually screening a large number of pure pixels extracted from the image by using expert knowledge about the spectral characteristics of the materials in the scene [264]. Pure pixels were also extracted from multiple hyperspectral images of the same scene acquired at different spatial resolutions to increase the diversity of the resulting spectral library in urban environments [262]. Other work used only partially labeled data to reduce the amount of domain knowledge that is required [266]. Recent strategies attempted to automate this process by extending EEAs for the extraction of multiple signatures of each material in the observed image. The first work in this direction proposed to apply traditional EEAs to random subsets of pixels that are sampled from the hyperspectral image (with or without replacement) [267].
Different sets of EM signatures are generated through this method. All the extracted signatures are then grouped into different sets corresponding to the material classes by using a clustering algorithm (e.g., k-means). The size of the image subsets, however, must be selected with great care for the EEAs to work satisfactorily [187], and the clustering step can be challenging. Later works proposed different strategies for the extraction and selection of multiple pure pixels and EM candidates from the observed image. One simple iterative strategy consists of including in the library all pixels that are within a given spectral distance of reference EMs [268]. This process is performed iteratively, with the reference EMs initialized using a standard EEA and then updated as the mean values of the library signatures at the previous iteration. Besides being very simple, this procedure does not require the library signatures to be clustered afterward. A related strategy worked in a reverse way by iteratively removing pure pixels from a large initial set of candidate signatures to obtain the final spectral library [269]. A pixel candidate is removed if it can be represented with little error as a convex combination of the remaining signatures in the library. A clustering procedure is then performed to group the selected spectra into EM classes. Recent works have proposed more involved empirical approaches to differentiate between spectrally similar materials when extracting and clustering the EM signatures and to remove mixed pixels from the constructed library. For instance, in [270], EM extraction was performed multiple times for different subsets of the spectral bands constructed at multiple spectral scales and intervals. These signatures were then clustered into EM classes based on a metric constructed from features derived by individually applying clustering algorithms to the spectral scales and intervals used previously. 
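The simple iterative strategy described above (include all pixels within a given spectral distance of the reference EMs, then update the references as the library means) can be sketched as follows. This is only a minimal illustration, not the implementation of [268]: the function names, the use of the spectral angle as the distance, the threshold value, and the rule that assigns each pixel to its nearest reference are all assumptions made for this example, and the initial EMs are taken as given rather than produced by an actual EEA.

```python
import numpy as np

def spectral_angle(X, r):
    """Spectral angle (radians) between each row of X (N, L) and a reference r (L,)."""
    cos = X @ r / (np.linalg.norm(X, axis=1) * np.linalg.norm(r) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def grow_library(pixels, init_ems, max_angle=0.05, n_iter=10):
    """Iteratively collect, for each class, the pixels close to its reference EM.

    pixels:   (N, L) array of image spectra.
    init_ems: (P, L) array of initial reference EMs (e.g., from a standard EEA).
    Returns a list with one (K_p, L) array per class and the final references.
    """
    pixels = np.asarray(pixels, dtype=float)
    refs = np.asarray(init_ems, dtype=float).copy()
    library = [np.empty((0, pixels.shape[1])) for _ in refs]
    for _ in range(n_iter):
        # Angle between every pixel and every reference EM: shape (N, P).
        angles = np.stack([spectral_angle(pixels, r) for r in refs], axis=1)
        nearest = angles.argmin(axis=1)  # assumption: one class per pixel
        new_refs = refs.copy()
        for k in range(len(refs)):
            mask = (nearest == k) & (angles[:, k] <= max_angle)
            library[k] = pixels[mask]
            if mask.any():
                # Update the reference as the mean of the current library.
                new_refs[k] = library[k].mean(axis=0)
        if np.allclose(new_refs, refs):
            break  # references stopped moving
        refs = new_refs
    return library, refs
```

Because every selected pixel is attached to a reference EM, the resulting library is already grouped by material class, which is why no clustering step is needed afterward.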
A strategy related to that of [270] considered the extraction and clustering of the library signatures based on
subsets of the wavelet transform coefficients of the reflectance spectra that are robust to spectral variability [271]. These subsets were selected based on how much their empirical statistical distribution deviated from an uncorrelated Gaussian distribution. The hyperspectral image was also partitioned into spatial segments via a hierarchical clustering algorithm, and only one signature for each spatial segment was considered for inclusion in the library. Another strategy proposed to extract spectral signatures as the image pixels that can best represent all others in the observed image as a sparse linear combination [272]. Afterward, these signatures are grouped into material classes through spectral features derived from the slopes of a piecewise linear approximation of each signature. Auxiliary libraries available a priori were used to aid in the extraction of image-based spectral libraries in [172]. The k-nearest-neighbor algorithm was first employed to classify the image pixels in the different material classes, using library spectra known a priori as training data. This led to a set of candidate EMs for each material class. Based on the classification results, the image-extracted library was defined as the average spectra of the candidate EMs of each class that were contained in a spectral neighborhood of each of the training samples (from its corresponding class) [172]. Another group of approaches makes use of the empirical observation that pure pixels are more likely to be contained in spatially homogeneous regions. Spectral libraries can be constructed by restricting EM candidates to sufficiently homogeneous regions [273], [274] by applying an image oversegmentation strategy before the pure pixel extraction [275] and by considering EM candidates as the average of homogeneous regions obtained from a coarse spatial scale selected from a multiscale image decomposition [276]. 
These strategies should be applied with care to avoid the inclusion of pixels extracted from mixed homogeneous regions in the library. Some alternatives tried to build spectral libraries by using different forms of the matrix factorization of the hyperspectral image. For instance, spectral libraries for each material class are constructed in [193] by learning sparse representations of sets of pure pixels of each material, which are extracted from the observed image. More precisely, dictionary learning is applied to the pure pixels of each material, from which the resulting basis matrices are used to construct the spectral library. Another approach proposed to extract the spectral library using the results of an SU procedure employing a matrix factorization approach, which does not account for spectral variability [277]. However, besides depending on the results of another SU algorithm, there is no guarantee that the selected signatures are pure pixels.

FIGURE 15. Approaches to generate spectral libraries: (a) image-based library generation using EM extraction (discussed in the “Image-Based Library Construction” section), (b) the spatial interpolation of pure pixels extracted from the image at known locations (the “Spatial Interpolation of Endmember Signatures” section), and (c) the generation of synthetic signatures from physics-based models (the “Generating Spectral Libraries From Physics Models” section). [Panels: (a) observed image, extract a set of pure pixels or candidate EMs, cluster the extracted signatures; (b) observed image, extract pure pixels and their spatial locations, synthesize EMs for the other spatial locations; (c) physical model (expert knowledge), select different sets of physical parameters θ1, θ2, ..., θk (e.g., chlorophyll concentration), synthesize EM signatures using the parametric model f(θ). All three lead to spectral libraries.]

GENERATING SPECTRAL LIBRARIES FROM PHYSICS MODELS

PHYSICS-BASED LIBRARY SYNTHESIS
Physics-based library synthesis has the following qualities:
◗◗ It can generate libraries independently of the observed image.
◗◗ It can represent a wide range of spectral variability if more complex models are employed.
◗◗ It depends on the availability of an accurate physical model for the EM spectra.

An alternative approach to generate spectral libraries that does not depend on the observed hyperspectral image is to employ a physics-based (i.e., radiative transfer) model describing the reflectance of the EMs as a function of physicochemical parameters. This enables us to generate different instances of the material spectra to constitute a synthetic library by sampling the free parameters of the model. Examples of such representation include the PROSPECT model [69] for vegetation and Hapke’s [48] and Shkuratov’s models [196] for packed particle spectra.
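As a rough illustration of sampling the free parameters of a physical model to build a synthetic library, the sketch below uses the simplified Hapke expression given later in this article as (20), where the free parameters are the cosines of the incidence and emergence angles. The function names, the grid of angles, and the albedo spectrum are hypothetical choices made for this example; a realistic library would start from a measured single scattering albedo.

```python
import numpy as np

def hapke_reflectance(albedo, mu1, mu2):
    """Simplified Hapke model of (20).

    albedo: single scattering albedo per band, values in [0, 1].
    mu1, mu2: cosines of the incidence and emergence angles.
    """
    albedo = np.asarray(albedo, dtype=float)
    g = np.sqrt(1.0 - albedo)
    return albedo / ((1.0 + 2.0 * mu1 * g) * (1.0 + 2.0 * mu2 * g))

def synthesize_library(albedo, incidence_deg, emergence_deg):
    """Return one signature per (incidence, emergence) pair, shape (K, L)."""
    signatures = []
    for theta_i in incidence_deg:
        for theta_e in emergence_deg:
            mu1 = np.cos(np.deg2rad(theta_i))
            mu2 = np.cos(np.deg2rad(theta_e))
            signatures.append(hapke_reflectance(albedo, mu1, mu2))
    return np.array(signatures)

# Hypothetical 5-band albedo spectrum and a small grid of viewing geometries.
albedo = np.linspace(0.1, 0.9, 5)
library = synthesize_library(albedo, incidence_deg=[0, 30, 60], emergence_deg=[0, 45])
```

Sampling every parameter on a dense grid makes the library grow multiplicatively with the number of parameters, which is the practical motivation for more efficient sampling and for pruning redundant spectra.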
Different representations inspired by physics have been employed to generate and augment spectral libraries for SU in many applications, which include describing tree canopy as a function of its height and radius [279], fire temperature radiance as a function of the view and solar geometry and atmospheric conditions [280], and soil reflectance as a function of moisture content [73]. This strategy has also been applied to generate training data for the SU of binary mixtures of vegetation and impervious materials through machine learning algorithms [281]–[284]. Note, however, that directly sampling all parameters of complex models, such as PROSPECT, might lead to a very large number of signatures. This has motivated strategies to sample the parametric models more efficiently and to remove redundant spectra from the generated library [285]. Despite their advantages, a significant drawback of these methods is the requirement for accurate knowledge of the physical process governing the observation of the reflectance of the materials by the sensor. A different approach attempted to circumvent this issue by proposing a data augmentation strategy, where one wishes to synthesize additional signatures to be included in small, preexisting libraries [286]. The spectral signatures in the library are used as training data to learn the statistical distribution of the EMs through deep generative models, such as variational autoencoders and deep neural networks. This enables one to sample new signatures from the learned distributions to augment the existing library.

SPATIAL INTERPOLATION OF ENDMEMBER SIGNATURES

SPATIAL ENDMEMBER INTERPOLATION
For spatial EM interpolation, the following is true:
◗◗ The process uses the hypothesis of spatially correlated EM signatures.
◗◗ It needs knowledge of the spatial position of pure pixels in the scene.
◗◗ The number of available pure pixels can strongly affect the performance of the methods.
A number of approaches based on the assumption that EMs are spatially correlated have proposed to synthesize pixel-dependent EM signatures based on a set of pure pixels at known spatial locations through interpolation techniques. Many of these works aim to perform the SU of vegetation and soil mixtures by employing vegetation indices (i.e., spectral features given by ratios of band differences, such as the NDVI) in lieu of traditional EMs. For instance, the spatial interpolation of vegetation and soil NDVIs based on linear regression has been considered for the SU of coarse-resolution images, where the training samples for the EMs were obtained using classification maps from complementary high-resolution images available a priori [287]. A similar strategy considered the use of spatially weighted kriging employing as training samples pure pixels that were manually extracted from the scene [288] or obtained by randomly sampling the vertices of the simplex obtained by a low-dimensional projection of the hyperspectral image [289]. This strategy enabled one to weight the contribution of the training samples according to their spatial distance to each interpolated signature. Other works considered the spatial interpolation of actual spectral signatures, instead of just vegetation and soil indices, by using spatially weighted linear regression or kriging. This has been performed by considering training data obtained from complementary high-resolution classification maps [290] and pure pixels extracted from the image inside subregions that were selected with the aid of a classification algorithm [291].

LIBRARY PRUNING TECHNIQUES

One significant problem with many SU methods based on spectral libraries, such as MESMA, is that their computational complexity increases quickly with the size of
the spectral library. Furthermore, databases containing laboratory-acquired spectra often contain hundreds of different materials. Using a library of this size can actually decrease the performance of SU since the problem becomes more and more ill-posed. One solution consists of removing redundant and irrelevant signatures from large spectral libraries before the SU process. These approaches, which are also called library pruning, have been largely applied to reduce the complexity and improve the accuracy of MESMA [38] and sparse unmixing algorithms [292]. There are three main groups of library pruning techniques. Library reduction approaches remove only redundant signatures to improve the computation time. EM selection techniques identify which materials are present in each hyperspectral image pixel to remove absent EM classes from the library before SU. Same-class library pruning attempts to identify and remove signatures that are acquired in different conditions from those of the observed image. These approaches are reviewed in detail in the following. LIBRARY REDUCTION TECHNIQUES SPECTRAL LIBRARY REDUCTION Spectral library reduction involves the following: ◗◗ It employs very simple strategies that do not depend on the observed hyperspectral image. ◗◗ It reduces computational complexity but does not improve the quality of the SU results. Library reduction techniques attempt to remove redundant spectral signatures, regardless of the observed hyperspectral image, which tends to improve the computational complexity of SU but not necessarily its quality. A common idea is to find a small set of signatures that can best represent the remaining spectra of the same EM class in some sense [293], such as the squared error [38], average spectral angle [89], and count-based EM selection metric, where one counts the number of signatures that one candidate can represent with an error below a threshold [93]. 
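To make the count-based idea concrete, the sketch below greedily picks, within one EM class, the signature that can represent the largest number of not-yet-covered signatures with an error below a threshold. It is a simplified illustration and not the metric of [93]: here "represent" means fitting one signature by a single least-squares scaling of another and checking the residual RMSE, and the function name, the greedy covering loop, and the threshold are assumptions. Signatures are assumed to be nonzero.

```python
import numpy as np

def count_based_reduction(library, rmse_threshold=0.01):
    """Greedy count-based selection of representative signatures.

    library: (K, L) array with the signatures of ONE material class.
    Returns the indices of the selected signatures.
    """
    S = np.asarray(library, dtype=float)
    K, L = S.shape
    gram = S @ S.T
    # scale[i, j]: least-squares factor a such that a * S[i] best fits S[j].
    scale = gram / np.diag(gram)[:, None]
    # resid[i, j, :] = S[j] - scale[i, j] * S[i]
    resid = S[None, :, :] - scale[:, :, None] * S[:, None, :]
    rmse = np.sqrt((resid ** 2).mean(axis=2))
    covers = rmse <= rmse_threshold  # covers[i, j]: candidate i represents j

    uncovered = np.ones(K, dtype=bool)
    selected = []
    while uncovered.any():
        # Count how many still-uncovered signatures each candidate represents.
        counts = (covers & uncovered[None, :]).sum(axis=1)
        best = int(counts.argmax())
        selected.append(best)
        uncovered &= ~covers[best]
    return selected
```

Since every signature represents itself with zero error, the loop always terminates, and the reduced library is simply `library[selected]`.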
An alternative method divided the library signatures into groups according to their Euclidean norm, selecting one signature from each group to explicitly account for brightness variations [294].

ENDMEMBER SELECTION METHODS

ENDMEMBER SELECTION
For EM selection, the following must be considered:
◗◗ It can remove only entire material classes from the library for each pixel; it is also effective for variability-free SU.
◗◗ It can leverage information from the observed hyperspectral image.
◗◗ It can improve SU quality and reduce computational complexity.
◗◗ It usually depends on some sort of classification procedure.
◗◗ It relies on the observation that only a few materials are normally contained in each pixel.

EM selection techniques attempt to identify which EM classes are present in each pixel by using information, such as classification maps [10], [295], to remove absent materials from the library and improve the unmixing results [10], [295]. This relies on the observation that hyperspectral image pixels usually contain only a small number of materials, and it has also been applied to SU without considering spectral variability [296], [297]. The simplest EM selection methods use classification algorithms to select the EM classes present in mixed pixels [298], [299]. Another work employed a block sparse unmixing algorithm as a preprocessing step to remove material classes with low abundance values from the library for each image pixel before applying the MESMA algorithm to obtain the final SU results [300]. A more elaborate approach proposed to semantically organize subsets of material classes in a hierarchical tree, starting from a rough (e.g., pervious and impervious) differentiation and progressing to a fine one between the EMs (e.g., different vegetation species) [59]. Afterward, SU was performed at each level of the tree, using the abundance results in the previous, coarser level to constrain which EMs could be selected at the current one (i.e., a pixel containing only a pervious EM in the coarse scale cannot have a concrete EM in the finer one). Some recent approaches have proposed to use external complementary data to aid in identifying which materials are present in each pixel. For instance, in [301], the hyperspectral image was divided into rural and urban subsets using external data about road network density, which enabled the use of a separate set of EM classes for each of the subsets. Another work proposed to use additional lidar data to remove material classes from the library of each pixel, based on its height distribution (e.g., a “tree” or “building” EM can be removed from a pixel that has low height) [302].

PRUNING LIBRARIES WITHIN THE SAME CLASS

SAME-CLASS ENDMEMBER PRUNING
This process has the following qualities:
◗◗ It can remove spectral signatures from each material class that are not representative of the observed hyperspectral image.
◗◗ It can improve the SU quality and reduce the computational cost, even for libraries with few material classes.
◗◗ Identifying which signatures in the spectral library do not share acquisition conditions with the observed image is generally difficult.

Recent approaches proposed to remove signatures from the library that were acquired in conditions that differed from those of the hyperspectral image, keeping only signatures that were representative of the observed image. However, measuring the representativeness of EM signatures is a difficult task. A simple approach suggested removing signatures that have a large spectral angle and spectral $L_1$ distance relative to the observed pixels [303]. However, this strategy might discard relevant signatures in the presence of many mixed pixels. Another work proposed to compare only pure pixels extracted from the image with the library spectra in the wavelet domain
[304]. A different method was to remove library elements that have large distances to a small set of the leading eigenvectors of the observed hyperspectral image and are thus unlikely to be present therein [305]. This strategy eliminates the direct need for pure pixels in the scene. It has also been successfully applied for plant production system monitoring [306], and it was later extended to consider a brightness normalization preprocessing step and other strategies from the “Library Reduction Techniques” section to additionally remove redundant spectra [295]. Another work sought to perform library pruning iteratively in a sparse unmixing formulation by removing signatures corresponding to low abundance values during the SU process [307]. However, this directly depends on the accuracy of the SU process at the first iterations. EXPERIMENTAL EVALUATION This section presents a brief discussion about the experimental evaluation of the unmixing algorithms when spectral variability is considered. We first explore the generation of synthetic data in detail. Afterward, software packages that can be useful to practitioners are presented. Finally, a tutorial-style simulation is presented to demonstrate the use of a few of the SU techniques reviewed in this article, after selection using the decision tree in Figure 2. GENERATING SYNTHETIC DATA One challenge in the evaluation of unmixing methods is the lack of reliable ground truth data for the abundances of real hyperspectral images. The difficulty in collecting ground truth data is even more pronounced when EM variability is considered. Thus, being able to generate realistic synthetic data (for which the true abundances are available) turns out to be important to facilitate a quantitative evaluation of SU algorithms. 
The production of synthetic data can be roughly divided into the following three steps: 1) generating synthetic abundances, 2) generating EM signatures for each pixel in the image, and 3) applying the mixing model of choice (in our case, the LMM) to generate the mixed image pixels. We discuss each in the following.

GENERATING SYNTHETIC ABUNDANCE DATA

The generation of synthetic abundance maps can be performed in different ways. A simple strategy is to sample the abundance values randomly from a Dirichlet distribution. This approach enables one to control the number of pure pixels in the image and can be useful when performing Monte Carlo simulations in which large amounts of data must be created. Another technique consists of introducing spatial contextual information (i.e., pixels that are close in space tend to have similar abundance values) into the generated abundances to create more realistic data. Such data can be produced using, for instance, piecewise smooth images sampled from a Gaussian random field [308]. This method can generate images containing smooth regions, sharp transitions, and fine details and whose spatial composition and regularity characteristics can be controlled by the user [308]. One software tool that can be used to generate abundance maps according to Gaussian random fields is the Hyperspectral Imagery Synthesis tool for MATLAB (http://www.ehu.eus/ccwintco/uploads/f/fb/Synthesis.zip). Another way to obtain realistic synthetic abundance maps is to consider abundances obtained by applying an existing SU algorithm on a real hyperspectral image [309]. The resulting abundance maps will have a realistic spatial distribution and can be used as ground truth to generate new synthetic data sets.

GENERATING SYNTHETIC ENDMEMBER VARIABILITY

Generating realistic EM variability data is not a simple task since, as explained, the spectral signatures of the materials present a complex dependence on different physicochemical and environmental parameters.
Fortunately, very accurate radiative transfer models have been developed for many applications. Such representations describe the physical processes governing, e.g., vegetation spectra [68], mineral interactions [48], [196], and atmospheric effects [310]. Well-calibrated radiative transfer models can be used to generate realistic simulated image scenes that enable one to simultaneously study nonlinear mixtures and EM variability effects. Experimental studies have found that the data simulated using such models show very strong agreement with reference ground truth information collected under the same circumstances using ground-based spectral measurement setups [311], [312]. This approach was employed to evaluate nonlinear unmixing models in [313]. Thus, well-calibrated radiative transfer models can be used to generate realistic simulated hyperspectral data that enable us to develop, optimize, test, and compare different SU techniques that consider EM variability. Although complex ray tracing simulations can be considered (e.g., [48], [68], [196], [310], and [314]), here, we present simplified models for illustrative purposes; the models describe variability present in vegetation spectra and caused by different viewing geometries. The first is PROSPECT-D [76], which represents vegetation leaf spectra as a function of, e.g., the chlorophyll and dry matter content and the equivalent water thickness. PROSPECT-D and related models for vegetation spectra can be downloaded for different software platforms, including MATLAB and Python (http://teledetection.ipgp.jussieu.fr/prosail/). We also consider a simplification of Hapke’s model [48] by assuming a Lambertian (isotropic) scattering and a densely packed medium.
This simplified representation describes variations in the reflectance spectra of a material $y_{\text{sensor}}$ (at each wavelength) as a function of the viewing geometry [50], [52]:

$$ y_{\text{sensor}} = \frac{\omega}{\left(1 + 2\mu_1\sqrt{1-\omega}\right)\left(1 + 2\mu_2\sqrt{1-\omega}\right)}, \tag{20} $$

where $\omega$ is the single scattering albedo of the material, and $\mu_1$ (respectively, $\mu_2$) is the cosine of the angle between the incoming (resp. outgoing) radiation and the normal to the surface. This model enables us to generate different EM spectra
by varying the values of $\mu_1$ and $\mu_2$. While (20) is approximately linear for small albedo values, important nonlinearities occur for large albedo values [52]. Additionally, we explore variability introduced by errors occurring in a simple atmospheric compensation model, where the reflectance of each pixel at each wavelength is obtained by dividing the corresponding pixel’s radiance by the radiance observed at a perfectly reflective calibration panel [271, Sec. 5(a)(1)], [315]. Assuming full visibility and that the adjacency effect is negligible, this model is given by

$$ y_{\text{sensor}} = y_s \, \frac{E_{\text{sun-gr}}\,\mu_1 + E_{\text{sky}}}{E_{\text{sun-gr}}\,\mu_2 + E_{\text{sky}}}, \tag{21} $$

where $y_s$ and $y_{\text{sensor}}$ denote the reflectance at the ground and the sensor, $E_{\text{sun-gr}}$ represents the solar radiance observed at the ground level, and $E_{\text{sky}}$ is the skylight. Parameters $\mu_1$ and $\mu_2$ are the cosines of the angles between the surface normal and the direction of the sun at each pixel and at the calibration panel, respectively. By fixing $\mu_2$, $E_{\text{sun-gr}}$, and $E_{\text{sky}}$ a priori, $\mu_1$ can be varied to simulate spectral signatures at different viewing geometries.

GENERATING THE MIXED PIXELS

Finally, each pixel can be generated according to the spectral variability-accommodating LMM described in (2), with the EMs for each pixel (the columns of $M_n$) sampled randomly from the set of synthetically generated signatures. Additive noise can be introduced to obtain a desired signal-to-noise ratio.

SOFTWARE RESOURCES

Several software packages are available to perform SU with spectral variability. Classical techniques, such as MESMA and some of its alternatives (including library pruning and transformation methods), can be found in the Visualization and Image Processing for Environmental Research tools software package [316], which is available as a plug-in for well-established software, such as ENVI and the Quantum Geographic Information System. An implementation of the MESMA algorithm is also available in R in the remote sensing toolbox.
Algorithms that were developed more recently, on the other hand, are usually available only as stand-alone prototypes implemented in MATLAB and Python. A list of software packages for some of the papers reviewed in this work (most of which are found at the authors’ websites) is in Table 3.

TABLE 3. COMPUTATIONAL CODE CONTAINING IMPLEMENTATIONS OF SOME OF THE WORKS REVIEWED IN THIS ARTICLE.

| METHOD | LINK | LANGUAGE |
|---|---|---|
| METHODS THAT USE SPECTRAL LIBRARIES | | |
| MESMA [37], AAM [87] | https://sites.google.com/site/robheylenresearch/code/AAM.zip?attredirects=0&d=1 | MATLAB |
| SUnSAL [103], SUnSAL-TV [112] | http://www.lx.it.pt/~bioucas/publications.html | MATLAB |
| Sparse SU with mixed norms [84] | https://openremotesensing.net/knowledgebase/hyperspectral-image-unmixing-with-endmember-bundles-and-group-sparsity-inducing-mixed-norms/ | MATLAB |
| BAYESIAN METHODS | | |
| BCM [255] | https://github.com/GatorSense/BetaCompositionalModel | MATLAB |
| NCM-E (NCM by Eches et al.) [239] | http://olivier.eches.free.fr/research.html | MATLAB |
| UsGNCM [242] | https://sites.google.com/site/abderrahimhalimi/publications | MATLAB |
| Bayesian OU [254] | https://pthouvenin.github.io/robust-unmixing-plmm/ | MATLAB |
| PCOMMEND [247] | https://github.com/GatorSense/PCOMMEND | MATLAB |
| GMM [256] | https://github.com/zhouyuanzxcv/Hyperspectral | MATLAB |
| PARAMETRIC EM MODELS | | |
| ELMM [51] | https://openremotesensing.net/knowledgebase/spectral-variability-and-extended-linear-mixing-model/ | MATLAB |
| PLMM [217] | https://pthouvenin.github.io/unmixing-plmm/ | MATLAB |
| GLMM [212] | https://github.com/talesimbiriba/GLMM | MATLAB |
| DeepGUn [222] | https://github.com/ricardoborsoi/Unmixing_with_Deep_Generative_Models | MATLAB |
| MUA-SV [223] | https://github.com/ricardoborsoi/DataDependentSUvarRelease | MATLAB |
| OU [221] | https://pthouvenin.github.io/online-unmixing-plmm/ | MATLAB |
| EM-MODEL-FREE METHODS | | |
| RUSAL [227] | https://sites.google.com/site/abderrahimhalimi/publications | MATLAB |
| SULoRa [229] | https://sites.google.com/view/danfeng-hong/data-code | MATLAB |
| ALMM [228] | https://openremotesensing.net/knowledgebase/an-augmented-linear-mixing-model-to-address-spectral-variability-for-hyperspectral-unmixing/ | MATLAB |
| ULTRA-V [230] | https://github.com/talesimbiriba/ULTRA-V | MATLAB |

The code was provided by its respective authors. AAM: alternating angle minimization; SUnSAL: sparse unmixing via variable splitting and augmented Lagrangian; TV: total variation; BCM: beta compositional model; NCM-E: NCM published by Eches et al.; UsGNCM: unsupervised generalized NCM; OU: online unmixing; PCOMMEND: piecewise convex multiple-model EM detection and SU; GMM: Gaussian mixture model; DeepGUn: deep generative unmixing; MUA-SV: multiscale unmixing algorithm accounting for spectral variability; RUSAL: robust unmixing by variable splitting and augmented Lagrangian; SULoRa: subspace unmixing with low-rank attribute embedding; ALMM: augmented LMM; ULTRA-V: unmixing with low-rank tensor regularization algorithm.

Also,
the OpenRemoteSensing website, which aims to share and disseminate code and papers, has an increasing number of SU methods, some of which consider spectral variability. EXPERIMENTAL SETUP AND RESULTS We now present a simulation to illustrate the application of some of the algorithms reviewed in this work. Note that this is merely illustrative and not a comprehensive performance evaluation. We generated a synthetic hyperspectral image containing vegetation, soil, and water as constituent materials. Spatially correlated abundances with 50 # 50 pixels were first sampled from a Gaussian random field. Then, we followed the procedure described in the “Generating Synthetic Endmember Variability” section to generate different EM spectra for each material in the scene. The PROSPECT-D model was used to produce vegetation spectra, while the simplified Hapke and atmospheric models were used to create dirt and water spectra, respectively, at different viewing geometries. The generated synthetic signatures, containing L = 198 bands, can be seen in Figure 16. The EMs contained in each pixel were randomly sampled from this set of synthesized signatures, and the pixel spectra were then generated following the LMM with variability in (2), with white Gaussian noise added to the image to obtain a signal-to-noise ratio of 30 dB. To evaluate the SU results, we considered as quantitative quality measures the root-mean-square error (RMSE) and the SAM. The RMSE between two generic variables X and Xt is defined as RMSE X = 1 t 2 N X X - X F , (22) where $ F is the Frobenius norm and N X denotes the number of elements in X. We used the RMSE to evaluate the estimated abundances At , the reconstructed images Yt , t n (for the cases and the estimated pixel-dependent EMs M when this estimate was available). 
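As a rough illustration of the data-generation and evaluation steps just described, the Python sketch below draws one EM per material and per pixel from a bundle of signatures, mixes them following the LMM with variability, adds white Gaussian noise at a target signal-to-noise ratio, and computes the RMSE in (22). It is a minimal sketch under assumptions that are not part of the original experiment: the function names, the toy bundle, the Dirichlet-distributed abundances (the simulation above samples them from a Gaussian random field), and the definition of the signal-to-noise ratio as mean signal power over noise variance are all illustrative choices. The helper `atmospheric_variant` simply evaluates the atmospheric compensation model in (21).

```python
import numpy as np


def atmospheric_variant(y_s, n1, n2, e_sun_gr, e_sky):
    """Evaluate (21): scale the ground reflectance y_s by the ratio of the
    illumination at the pixel (cosine n1) to that at the panel (cosine n2)."""
    return y_s * (e_sun_gr * n1 + e_sky) / (e_sun_gr * n2 + e_sky)


def generate_pixels(bundles, abundances, snr_db, rng):
    """Mix per-pixel EMs drawn from `bundles` and add white Gaussian noise.

    bundles    : list of P arrays of shape (L, K_p), one bundle per material
    abundances : array of shape (P, N); columns are nonnegative and sum to one
    Returns the noisy image (L, N), the noiseless image (L, N), and the
    per-pixel EMs with shape (L, P, N).
    """
    num_materials, num_pixels = abundances.shape
    num_bands = bundles[0].shape[0]
    em_per_pixel = np.empty((num_bands, num_materials, num_pixels))
    for p, bundle in enumerate(bundles):
        # Pick one signature of material p at random for every pixel.
        picks = rng.integers(bundle.shape[1], size=num_pixels)
        em_per_pixel[:, p, :] = bundle[:, picks]
    # y_n = M_n a_n for every pixel n.
    clean = np.einsum("lpn,pn->ln", em_per_pixel, abundances)
    # Assumed SNR definition: mean signal power divided by noise variance.
    noise_var = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)
    noisy = clean + rng.normal(scale=np.sqrt(noise_var), size=clean.shape)
    return noisy, clean, em_per_pixel


def rmse(x, x_hat):
    """RMSE in (22): root of the squared Frobenius distance per element."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.sqrt(np.sum((x - x_hat) ** 2) / x.size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_bands, num_pixels = 198, 2500
    base = np.linspace(0.05, 0.15, num_bands)  # toy reference signature
    # Toy bundle: vary n1 in (21) with n2, E_sun-gr, and E_sky held fixed.
    bundle = np.stack(
        [atmospheric_variant(base, n1, 0.9, 1.0, 0.2)
         for n1 in np.linspace(0.5, 1.0, 10)],
        axis=1,
    )
    bundles = [bundle, 2.0 * bundle, 4.0 * bundle]  # three toy materials
    abundances = rng.dirichlet(np.ones(3), size=num_pixels).T
    noisy, clean, _ = generate_pixels(bundles, abundances, 30.0, rng)
    print("RMSE between noisy and noiseless images:", rmse(clean, noisy))
```

With real data, the toy bundles would be replaced by the signatures produced by the physical models, and `rmse` would be applied to the estimated abundances, reconstructed image, and estimated EMs in turn.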
The SAM was also used to evaluate the estimated EMs as

$$ \mathrm{SAM}_{\mathbf{M}} = \frac{1}{LPN} \sum_{n=1}^{N} \sum_{p=1}^{P} \arccos\left( \frac{\mathbf{m}_{p,n}^{\top} \hat{\mathbf{m}}_{p,n}}{\|\mathbf{m}_{p,n}\| \, \|\hat{\mathbf{m}}_{p,n}\|} \right), \tag{23} $$

where $N$ is the number of pixels and $P$ the number of materials in the hyperspectral image. We assessed the complexity of the algorithms through their execution times, measured on a 4.2-GHz Intel Core i7 processor with 16 GB of random-access memory. Finally, to increase the reliability of the results, we executed the simulation for 10 independent Monte Carlo realizations and reported the average values for all metrics.

FIGURE 16. Generated spectral signatures used in the synthetic hyperspectral image. (a) Vegetation. (b) Soil. (c) Water. (Axes: reflectance versus wavelength in µm.)

ALGORITHM SELECTION AND SETUP

For illustrative purposes, we considered the recovery of the abundance maps following four different paths in the decision tree in Figure 2 (selected according to the algorithm implementations available in Table 3). These included the following:
1) small spectral libraries extracted directly from the image, with no expert knowledge available
  • less user supervision: MESMA and its variants [37]
  • lower computational cost: sparse unmixing (fractional sparse SU [84])
2) spectral libraries not available a priori
  • less user supervision: Bayesian methods [NCM published by Eches et al. (NCM-E) [239] and beta compositional model (BCM) [255], [317]]
  • lower computational cost: parametric models [ELMM [51] and deep generative unmixing (DeepGUn) [222]] and EM-model-free methods [robust unmixing by variable splitting and augmented Lagrangian (RUSAL) [227]].

We additionally considered the FCLS solution as a baseline, using a single set of EMs extracted from the image using the vertex component analysis (VCA) algorithm [278]. The EMs extracted by VCA were also used as initialization and reference/mean signatures for some of the algorithms (ELMM, DeepGUn, RUSAL, and NCM-E). For MESMA and sparse SU, the spectral libraries were extracted from the
observed image, as described in greater detail in the following section. The spectral libraries were also used to estimate the parameters of the beta distribution in the BCM. The regularization/tuning parameters of the algorithms (fractional sparse SU, ELMM, DeepGUn, and RUSAL) were manually adjusted to maximize the abundance reconstruction performance measured on an independent data set generated following the specifications at the beginning of the “Experimental Setup and Results” section.

LIBRARY EXTRACTION

To demonstrate the use of library-based SU methods in practical settings, the spectral libraries used by MESMA and fractional sparse SU were extracted directly from the observed image. We used the method described in [267], which consists of performing EM extraction (in this case, using the VCA algorithm) in subsets of pixels randomly sampled from the image. We extracted five sets of EMs, using subsets of 500 pixels each (sampled with replacement). The library was kept small to prevent the inclusion of redundant signatures and reduce the probability of selecting mixed pixels by mistake. As a byproduct, this kept the computational complexity of methods such as MESMA very low, while providing good experimental results. The estimated signatures can be viewed in Figure 17. Although the spectral variability in Figure 17 is less accentuated than that of the true EMs in Figure 16, the estimated signatures are good representatives of the materials in the scene. The good performance of the library extraction method can be explained by the presence of multiple pure pixels in the synthetically generated abundance maps, as evident in the first row of Figure 18.

DISCUSSION

The quantitative results are provided in Table 4, while the estimated abundance maps and EMs are depicted in Figures 18 and 19, respectively.
Note that $\mathrm{RMSE}_{\mathbf{M}}$ and $\mathrm{SAM}_{\mathbf{M}}$ are not available for FCLS, RUSAL, NCM-E, and BCM since these algorithms do not estimate the spectral signatures of the EMs present in each pixel of the image. All methods that considered spectral variability led to better abundance reconstruction results than the FCLS baseline. In particular, the library-based methods (MESMA and fractional sparse SU) had very good performance, which likely occurred because the image-extracted spectral library accurately represented the typical EM variability contained in the scene. Moreover, sparse SU with fractional norms performed similarly to, and slightly better than, MESMA. The methods based on parametric EM models (ELMM and DeepGUn) also led to considerable improvements when compared to FCLS, especially considering that the EMs were estimated directly from the image. The EM-model-free method (RUSAL), which takes general variability and incorrect models into account, also provided an improvement over FCLS, albeit a smaller one than ELMM and DeepGUn. However, the sensitivity of these techniques to the selection of the regularization parameters can negatively impact their performance when Monte Carlo simulations are considered. Among the Bayesian methods (NCM-E and BCM), BCM provided a considerable performance improvement over FCLS, especially when taking into account the unsupervised nature of the method (i.e., no parameter has to be adjusted). The NCM-E results, on the other hand, were virtually identical to those of FCLS, which indicates that the isotropic Gaussian EM hypothesis may not be appropriate for this data set. The performance of the different methods can be visually distinguished in Figure 18, especially for the soil EM, in which the similarity between the reconstructions and the reference abundance maps reflects the general behavior of the quantitative results from Table 4.
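For reference, the FCLS baseline used in these comparisons estimates, for each pixel, the abundances that minimize the squared reconstruction error for a single fixed set of EMs, subject to nonnegativity and sum-to-one constraints. The Python sketch below solves that constrained least-squares problem with projected gradient descent onto the probability simplex. This solver is a swapped-in choice made here for brevity and is not claimed to be the FCLS implementation used in the experiments; the function names and the iteration count are likewise assumptions.

```python
import numpy as np


def project_onto_simplex(v):
    """Euclidean projection of each column of v onto the probability simplex
    (sort-based method)."""
    num_rows, num_cols = v.shape
    sorted_desc = -np.sort(-v, axis=0)
    cumsum_minus_one = np.cumsum(sorted_desc, axis=0) - 1.0
    counts = np.arange(1, num_rows + 1)[:, None]
    # Number of entries that remain positive after the projection.
    rho = np.sum(sorted_desc - cumsum_minus_one / counts > 0, axis=0)
    theta = cumsum_minus_one[rho - 1, np.arange(num_cols)] / rho
    return np.maximum(v - theta, 0.0)


def fcls_projected_gradient(endmembers, pixels, num_iters=2000):
    """Minimize ||Y - M A||_F^2 subject to A >= 0 and columns of A summing to one.

    endmembers : array (L, P) with one EM per column (the matrix M)
    pixels     : array (L, N) with one pixel spectrum per column (the matrix Y)
    """
    num_materials = endmembers.shape[1]
    gram = endmembers.T @ endmembers
    cross = endmembers.T @ pixels
    # Step size 1 / (Lipschitz constant of the gradient of 0.5 * ||Y - M A||_F^2).
    step = 1.0 / np.linalg.norm(gram, 2)
    abundances = np.full((num_materials, pixels.shape[1]), 1.0 / num_materials)
    for _ in range(num_iters):
        gradient = gram @ abundances - cross
        abundances = project_onto_simplex(abundances - step * gradient)
    return abundances


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.uniform(0.1, 0.9, size=(30, 3))   # toy EM matrix
    A = rng.dirichlet(np.ones(3), size=50).T  # toy ground-truth abundances
    A_hat = fcls_projected_gradient(M, M @ A)
    print("Largest abundance error:", np.max(np.abs(A_hat - A)))
```

Because a single EM matrix is shared by all pixels, any pixel-to-pixel change in the true signatures is absorbed into the abundance estimates, which is the behavior that the variability-aware methods are designed to avoid.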
The EM reconstruction metrics in Table 4 indicate that the EMs selected by library-based approaches (MESMA and fractional sparse SU) are close to the reference ones, especially in terms of $\mathrm{SAM}_{\mathbf{M}}$, while the model-based approaches (ELMM and DeepGUn) provided slightly worse results, in general, except for DeepGUn’s $\mathrm{RMSE}_{\mathbf{M}}$. The visual assessment of the estimated signatures in Figure 19 shows an interesting pattern, since, despite the quantitative metrics, the amount of variability (i.e., the variance) estimated by the ELMM seems closer to that of the reference spectra. This shows that identifying the correct spectral signatures in each pixel is very difficult. We also note that smaller image REs ($\mathrm{RMSE}_{\mathbf{Y}}$) did not correlate very well with better abundance estimation results. Since some SU methods that take spectral variability into account adopt flexible models, they can represent the hyperspectral image pixels in $\mathbf{Y}$ very closely without necessarily improving the abundance estimation.

FIGURE 17. EM bundles extracted by batch VCA [267] for (a) vegetation, (b) soil, and (c) water. (Axes: reflectance versus wavelength in µm.)

FIGURE 18. Abundance maps estimated by the algorithms [values are mapped to colors ranging from blue (a = 0) to red (a = 1)]. (Rows: reference, FCLS, MESMA, fractional, ELMM, DeepGUn, RUSAL, NCM-E, and BCM; columns: vegetation, soil, and water.)

TABLE 4. QUANTITATIVE SIMULATION RESULTS.

ALGORITHM | $\mathrm{RMSE}_{\mathbf{A}}$ | $\mathrm{RMSE}_{\mathbf{M}}$ | $\mathrm{SAM}_{\mathbf{M}}$ | $\mathrm{RMSE}_{\mathbf{Y}}$ | TIME (S)
FCLS | 9.899 | n/a | n/a | 0.239 | 0.37
MESMA | 6.083 | 0.504 | 0.234 | 0.159 | 4.9
Fractional | 5.993 | 0.525 | 0.232 | 0.159 | 3.41
ELMM | 8.695 | 0.697 | 0.56 | 0.127 | 28.84
DeepGUn | 7.203 | 0.447 | 0.395 | 0.324 | 80.42
RUSAL | 9.509 | n/a | n/a | 0.108 | 1.05
NCM-E | 9.897 | n/a | n/a | 0.239 | 2,482.85
BCM | 8.105 | n/a | n/a | 0.472 | 468.69

The RMSE results are multiplied by $10^4$.

The execution times differ considerably among the methods. Library-based approaches were able to run very fast (even for MESMA) since the spectral library contained few signatures. This demonstrates that the construction of the library can significantly impact the runtime performance of these techniques. The methods based on parametric models (ELMM and DeepGUn) had intermediate execution times, while RUSAL was very fast. Finally, Bayesian methods took the longest to run, with NCM-E proceeding significantly slower than the other techniques.
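The observation that a small image reconstruction error does not imply accurate abundances can be checked with a toy calculation. The Python sketch below assumes an ELMM-like model in which each EM may be multiplied by its own positive scaling factor, and it shows that a pixel generated with one abundance vector is reproduced exactly by a clearly different abundance vector once the EMs are rescaled. The function name and the numerical values are illustrative assumptions, and the scaling factors are deliberately extreme; in practice, regularization limits how far such factors can move.

```python
import numpy as np


def scaled_em_fit(endmembers, a_true, a_wrong):
    """Reproduce a pixel with the wrong abundances by rescaling the EMs.

    endmembers : array (L, P) with the reference EMs as columns
    a_true     : abundances used to generate the pixel, shape (P,)
    a_wrong    : different abundances with strictly positive entries, shape (P,)
    Returns the reconstruction error, the abundance error, and the scaling factors.
    """
    pixel = endmembers @ a_true
    # Per-EM scaling factors chosen so that scale_p * a_wrong_p = a_true_p.
    scales = a_true / a_wrong
    scaled_endmembers = endmembers * scales  # scales column p by scales[p]
    reconstruction = scaled_endmembers @ a_wrong
    recon_error = np.linalg.norm(pixel - reconstruction)
    abundance_error = np.linalg.norm(a_true - a_wrong)
    return recon_error, abundance_error, scales


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.uniform(0.1, 0.9, size=(50, 3))  # toy reference EMs
    recon_error, abundance_error, scales = scaled_em_fit(
        M, np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.3, 0.5])
    )
    print("Reconstruction error:", recon_error)
    print("Abundance error:", abundance_error)
    print("Scaling factors:", scales)
```

The reconstruction error vanishes up to rounding while the abundance error does not, so the image RE alone cannot distinguish the two solutions; additional information, such as regularization or reference signatures, is needed to prefer the correct one.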
Finally, we note that this simulation is merely illustrative and not an in-depth evaluation of these methods. Thus, their performance can be different for other data sets and scenarios.

DISCUSSION, CONCLUSIONS, AND FUTURE DIRECTIONS

Significant advances have been made to mitigate spectral variability in SU during the past decade, encompassing experimental and theoretical contributions. Recent work has, for instance, enabled spectral libraries to be directly extracted from observed hyperspectral images, provided more accurate and flexible models to represent the EMs (e.g., in statistical and parametric methods), and included different kinds of a priori external information to alleviate the ill-posedness of the problem, such as the locally correlated characteristics of the EMs and abundances. This was performed explicitly, by means of regularization approaches and in the definition of the statistical models, as well as implicitly in the design of the algorithms (e.g., in local SU).
Other methods leveraged the spectral characteristics of EM variability to design improved algorithms (e.g., in spectral transformations and robust SU methods). However, there is still a noticeable dependence between the quality of the unmixing solutions and the necessary amount of user supervision in the algorithms. Many recent techniques need considerable tuning to reach their full potential, with a significant portion of algorithm design being left to the user. The lack of more extensive data with reliable ground truths has also made the evaluation of the algorithms somewhat difficult. In the following, we detail some aspects that we think deserve further consideration:
◗ As discussed, one important research direction is to improve the robustness of the methods to the selection of their parameters and to develop informed adjustment methodologies. This could be performed, for instance, by leveraging metadata (e.g., external classification maps) that are available in many applications. This point applies to the majority of SU algorithms reviewed in this article and would make those methods more readily employable as out-of-the-box solutions in practical scenarios.
◗ Most SU algorithms that address spectral variability depend strongly on spectral libraries and reference EM signatures known a priori or extracted from the observed hyperspectral image. Improving the robustness of these methods to the selection of these data is important to guarantee a more reliable SU performance in practice.
◗ The vast majority of the work reviewed in this article uses the LMM to describe the interaction between incident light and the materials in the scene, even though nonlinear mixtures are common in many applications [207]. However, as shown in [211], a general nonlinear mixture model is closely related to a spatially varying version of the LMM, which indicates that linear unmixing with spectral variability is able to address the nonlinear unmixing problem to some extent. Nevertheless, the relationship between these two models deserves to be further investigated. In particular, deciding whether variations in the observed pixel spectra originate from spectral variability, nonlinear interactions, or slight abundance variations can be very difficult.
◗ One obstacle to the evaluation of SU methods is the lack of more extensive data with reliable ground truth. However, there is no clear approach to reliably collect ground truth for abundance values. This problem is more pronounced when spectral variability is considered. There is also no clearly agreed-upon protocol to generate realistic synthetic data. A larger, publicly available data set would strengthen the validation of the methods.
◗ Although many techniques have been proposed to model spectral variability, there is still a distinction between restrained models inspired by specific, concrete applications and mathematically flexible ones that aim for a more generic representation. Combining insight from practical applications with a mathematically thorough treatment may lead to improved ways to represent spectral variability in a given scene.
◗ Many of the methods discussed here rely, explicitly or implicitly, on the solution to complex, nonconvex optimization problems that are often solved only approximately to achieve a computationally tractable algorithm. Investigating the use of more reliable approaches to solve those problems can help to evaluate the potential accuracy of the models by reducing the influence of such approximations.
◗ Many algorithms (e.g., MESMA and some statistical approaches) are computationally expensive and do not scale very well for large images. Considering the large amount of data currently in need of processing, it is important to have fast alternatives to solve this problem.
◗ Traditional SU can be readily interpreted as a nonnegative matrix factorization problem. This enables us to understand many of the limitations of the SU problem as well as to identify conditions under which it can be solved exactly. However, such understanding is generally not available when EM variability is considered, except for the particular case of illumination-based spectral variability [216]. A deeper theoretical insight would be valuable to clearly define limiting conditions under which this problem can or cannot be solved.

FIGURE 19. Spectral signatures returned by the algorithms that estimate the EM spectra for each image pixel. (Rows: reference, MESMA, fractional, ELMM, and DeepGUn; columns: vegetation, soil, and water; horizontal axes: wavelength in µm.)

Initially motivated by Earth observation applications, spectral variability is now considered one of the main challenges of SU. Although we have already seen a wealth of contributions from application- and theoretically oriented researchers, it is expected that the further exchange of ideas between these two areas will help to advance the field even further.

ACKNOWLEDGMENT

This work was supported in part by the Brazilian National Council for Scientific and Technological Development.

AUTHOR INFORMATION

Ricardo Augusto Borsoi (raborsoi@gmail.com) received his doctorate degree. He is with the Federal University of Santa Catarina, Florianópolis, SC, 88040-900, Brazil. He is a Student Member of IEEE.
Tales Imbiriba (talesim@gmail.com) received his doctorate degree. He is a research scientist at Northeastern University, Boston, Massachusetts, 02115, USA.
José Carlos Moreira Bermudez (jbermudez@ieee.org) received his Ph.D. degree in electrical engineering. He is a professor at the Federal University of Santa Catarina, Florianópolis, SC, 88040-900, Brazil. He is a Senior Member of IEEE.
Cédric Richard (cedric.richard@unice.fr) received his Ph.D. degree. He is a full professor with the Laboratoire Lagrange, UMR CNRS 7293, Université Côte d’Azur, 06108 Nice CEDEX 2, France. He is a Senior Member of IEEE.
Jocelyn Chanussot (jocelyn.chanussot@grenoble-inp.fr) received his Ph.D. degree. He is a professor at the University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, 38000, France. He is a Fellow of IEEE.
Lucas Drumetz (lucas.drumetz@imt-atlantique.fr) received his Ph.D. degree from the University of Grenoble Alpes, in 2016. He is an associate professor at IMT Atlantique, UMR CNRS 6285 LabSTICC, Brest, 29238, France. He is a Member of IEEE.
Jean-Yves Tourneret (jean-yves.tourneret@enseeiht.fr) received his Ph.D. degree. He is a professor at the Institut National Polytechnique of Toulouse, Toulouse, France. He is a Fellow of IEEE.
Alina Zare (azare@ece.ufl.edu) received her Ph.D. degree. She is a professor at the University of Florida, Gainesville, Florida, 32611, USA. She is a Senior Member of IEEE.
Christian Jutten (christian.jutten@grenoble-inp.fr) received his Ph.D., in 1981, and his doctorate in physical sciences, in 1987. He is an emeritus professor at the University Grenoble Alpes, GIPSA-lab, Grenoble, 38400, France. He is a Fellow of IEEE.

REFERENCES

[1] T. Kouyama, Y. Yokota, Y. Ishihara, R. Nakamura, S. Yamamoto, and T. Matsunaga, “Development of an application scheme
for the SELENE/SP lunar reflectance model for radiometric calibration of hyperspectral and multispectral sensors,” Planetary Space Sci., vol. 124, pp. 76–83, May 2016. doi: 10.1016/j.pss.2016.02.003.
[2] J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 1, no. 2, pp. 6–36, 2013. doi: 10.1109/MGRS.2013.2244672.
[3] D. Manolakis, “Detection algorithms for hyperspectral imaging applications,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 29–43, 2002. doi: 10.1109/79.974724.
[4] G. Lu and B. Fei, “Medical hyperspectral imaging: A review,” J. Biomed. Optics, vol. 19, no. 1, p. 010901, 2014. doi: 10.1117/1.JBO.19.1.010901.
[5] G. A. Shaw and H-h. K. Burke, “Spectral imaging for remote sensing,” Lincoln Lab. J., vol. 14, no. 1, pp. 3–28, 2003.
[6] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, 2002. doi: 10.1109/79.974727.
[7] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 354–379, 2012. doi: 10.1109/JSTARS.2012.2194696.
[8] A. Zare and K. C. Ho, “Endmember variability in hyperspectral analysis: Addressing spectral variability during spectral unmixing,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 95–104, Jan. 2014. doi: 10.1109/MSP.2013.2279177.
[9] B. Somers, G. P. Asner, L. Tits, and P. Coppin, “Endmember variability in spectral mixture analysis: A review,” Remote Sens. Environ., vol. 115, no. 7, pp. 1603–1616, 2011. doi: 10.1016/j.rse.2011.03.003.
[10] F. García-Haro, S. Sommer, and T. Kemper, “A new tool for variable multiple endmember spectral mixture analysis (VMESMA),” Int. J. Remote Sens., vol. 26, no. 10, pp. 2135–2162, 2005. doi: 10.1080/01431160512331337817.
[11] L. Drumetz, J. Chanussot, and C. Jutten, “Variability of the endmembers in spectral unmixing: Recent advances,” in Proc. 8th IEEE Workshop on Hyperspectral Image Signal Process.: Evolution Remote Sens., Los Angeles, CA, Aug. 2016, pp. 1–5.
[12] L. Drumetz, J. Chanussot, and C. Jutten, “Variability of the endmembers in spectral unmixing,” in Hyperspectral Imaging (Data Handling in Science and Technology), J. M. Amigo, Ed. Amsterdam, The Netherlands: Elsevier, 2020, ch. 2.7, vol. 32, pp. 167–203.
[13] R. A. Borsoi et al., A Complete Toolbox for Spectral Unmixing with Spectral Variability. (version 1.0). Zenodo. [Online]. Available: http://doi.org/10.5281/zenodo.4659311
[14] J. Theiler, A. Ziemann, S. Matteoli, and M. Diani, “Spectral variability of remotely sensed target materials: Causes, models, and strategies for mitigation and robust exploitation,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 7, no. 2, pp. 8–30, 2019. doi: 10.1109/MGRS.2019.2890997.
[15] M. K. Griffin and H-h. K. Burke, “Compensation of hyperspectral data for atmospheric effects,” Lincoln Lab. J., vol. 14, no. 1, pp. 29–54, 2003.
[16] B.-C. Gao, M. J. Montes, C. O. Davis, and A. F. Goetz, “Atmospheric correction algorithms for hyperspectral remote sensing data of land and ocean,” Remote Sens. Environ., vol. 113, pp. S17–S24, Sept. 2009. doi: 10.1016/j.rse.2007.12.015.
[17] G. Healey and D. Slater, “Models and methods for automated material identification in hyperspectral imagery acquired under unknown illumination and atmospheric conditions,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2706–2717, 1999. doi: 10.1109/36.803418.
[18] J. M. P. Nascimento and J. M. B. Dias, “Does independent component analysis play a role in unmixing hyperspectral data?” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 1, pp. 175–187, Jan. 2005. doi: 10.1109/TGRS.2004.839806.
[19] M. W.
Matthew et al., “Atmospheric correction of spectral imagery: Evaluation of the FLAASH algorithm with AVIRIS data,” in Proc. 31st Appl. Imagery Pattern Recogn. Workshop, Washington, D.C., 2002, pp. 157–163. [20] I. C. Lau, “Application of atmospheric correction to hyperspectral data: Comparisons of different techniques on Hymap data,” in Proc. 12th Australasian Remote Sens. Photogrammetry Conf. (ARSPC), Freemantle, Australia, 2004, pp. 1–15. [21] C. Song, C. E. Woodcock, K. C. Seto, M. P. Lenney, and S. A. Macomber, “Classification and change detection using landsat TM data: When and how to correct atmospheric effects?,” Remote Sens. Environ., vol. 75, no. 2, pp. 230–244, 2001. doi: 10.1016/ S0034-4257(00)00169-3. [22] M. K. Griffin, H. Burke, J. Vail, S. Adler-Golden, and M. Matthew, “Sensitivity of atmospheric compensation model retrievals to input parameter specification,” in Proc. AVIRIS Earth Sci. Appl. Workshop, Pasadena, CA, 1999, pp. 99–17. [23] R. Wilson, E. Milton, and J. M. Nield, “Spatial variability of the atmosphere over southern England, and its effect on scene-based atmospheric corrections,” Int. J. Remote Sens., vol. 35, no. 13, pp. 5198–5218, 2014. doi: 10.1080/ 01431161.2014.939781. [24] N. Bhatia, M.-D. Iordache, A. Stein, I. Reusen, and V. A. Tolpekin, “Propagation of uncertainty in atmospheric parameters to hyperspectral unmixing,” Remote Sens. Environ., vol. 204, pp. 472–484, Jan. 2018. doi: 10.1016/j.rse.2017.10.008. [25] C. Bassani, C. Manzo, F. Braga, M. Bresciani, C. Giardino, and L. Alberotanza, “The impact of the microphysical properties of aerosol on the atmospheric correction of hyperspectral data in coastal waters,” Atmos. Measure. Techn., vol. 8, no. 3, pp. 1593– 1604, 2015. doi: 10.5194/amt-8-1593-2015. [26] Y. J. Kaufman, G. P. Gobbi, and I. Koren, “Aerosol climatology using a tunable spectral variability cloud screening of AERONET data,” Geophys. Res. Lett., vol. 33, no. 7, 2006. doi: 10.1029/2005GL025478. [27] D. Schläpfer, A. 
Hueni, and R. Richter, “Cast shadow detection to quantify the aerosol optical thickness for atmospheric correction of high spatial resolution optical imagery,” Remote Sens., vol. 10, no. 2, p. 200, 2018. doi: 10.3390/rs10020200.
[28] Y. Kaufman and B. Holben, “Calibration of the AVHRR visible and near-IR bands by atmospheric scattering, ocean glint and desert reflection,” Int. J. Remote Sens., vol. 14, no. 1, pp. 21–52, 1993. doi: 10.1080/01431169308904320. [29] N. F. Larsen and K. Stamnes, “Use of shadows to retrieve water vapor in hazy atmospheres,” Appl. Opt., vol. 44, no. 32, pp. 6986–6994, 2005. doi: 10.1364/AO.44.006986. [30] L. Markelin et al., “Atmospheric correction performance of hyperspectral airborne imagery over a small eutrophic lake under changing cloud cover,” Remote Sens., vol. 9, no. 1, p. 2, 2016. doi: 10.3390/rs9010002. [31] K. Staenz, J. Secker, B.-C. Gao, C. Davis, and C. Nadeau, “Radiative transfer codes applied to hyperspectral data for the retrieval of surface reflectance,” ISPRS J. Photogrammetry Remote Sens., vol. 57, no. 3, pp. 194–203, 2002. doi: 10.1016/S09242716(02)00121-1. [32] R. J. Murphy, S. T. Monteiro, and S. Schneider, “Evaluating classification techniques for mapping vertical geology using field-based hyperspectral sensors,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3066–3080, 2012. doi: 10.1109/ TGRS.2011.2178419. [33] A. Plaza, P. Martínez, R. Pérez, and J. Plaza, “A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 650–663, 2004. doi: 10.1109/TGRS.2003. 820314. [34] N. Keshava, J. Kerekes, D. Manolakis, and G. Shaw, “Algorithm taxonomy for hyperspectral unmixing,” in Proc. Algorithms Multispectral, Hyperspectral, Ultraspectral Imagery VI, Orlando, FL, 2000, vol. 4049, pp. 42–63. [35] K. McGwire, T. Minor, and L. Fenstermaker, “Hyperspectral mixture modeling for quantifying sparse vegetation cover in arid environments,” Remote Sens. Environ., vol. 72, no. 3, pp. 360–374, 2000. doi: 10.1016/S0034-4257(99)00112-1. [36] J. W. Boardman, “Automating spectral unmixing of AVIRIS data using convex geometry concepts,” in Proc. 4th Annu. 
JPL Airborne Geosci. Workshop, Jet Propulsion Lab., Pasadena, CA, 1993, pp. 11–14.
[37] D. A. Roberts, M. Gardner, R. Church, S. Ustin, G. Scheer, and R. Green, “Mapping chaparral in the Santa Monica mountains using multiple endmember spectral mixture models,” Remote Sens. Environ., vol. 65, no. 3, pp. 267–279, 1998. doi: 10.1016/S0034-4257(98)00037-6.
[38] P. E. Dennison and D. A. Roberts, “Endmember selection for multiple endmember spectral mixture analysis using endmember average RMSE,” Remote Sens. Environ., vol. 87, nos. 2–3, pp. 123–135, 2003. doi: 10.1016/S0034-4257(03)00135-4.
[39] T. Roper and M. Andrews, “Shadow modelling and correction techniques in hyperspectral imaging,” Electron. Lett., vol. 49, no. 7, pp. 458–460, 2013. doi: 10.1049/el.2012.4406.
[40] Q. Zhang, V. P. Pauca, R. J. Plemmons, and D. D. Nikic, “Detecting objects under shadows by fusion of hyperspectral and LiDAR data: A physical model approach,” in Proc. 5th Workshop on Hyperspectral Image Signal Process.: Evolution Remote Sens. (WHISPERS), Gainesville, FL, 2013, pp. 1–4.
[41] G. J. Fitzgerald, P. J. Pinter, D. J. Hunsaker, and T. R. Clarke, “Multiple shadow fractions in spectral mixture analysis of a cotton canopy,” Remote Sens. Environ., vol. 97, no. 4, pp. 526–539, 2005. doi: 10.1016/j.rse.2005.05.020.
[42] K. Choi and E. Milton, “An investigation into the properties of the dark endmember in spectral feature space,” in Proc. 25th IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Seoul, South Korea, 2005, pp. 25–29.
[43] D. K. Lynch, “Shadows,” Appl. Opt., vol. 54, no. 4, pp. B154–B164, 2015. doi: 10.1364/AO.54.00B154.
[44] S. Adler-Golden, M. W. Matthew, G. P. Anderson, G. W. Felde, and J. A. Gardner, “An algorithm for de-shadowing spectral imagery,” in Proc. 11th JPL Airborne Earth Sci. Workshop, Pasadena, CA, 2000, pp. 1–8.
[45] D. Schläpfer, R. Richter, and A. Damm, “Correction of shadowing in imaging spectroscopy data by quantification of the proportion of diffuse illumination,” in Proc. 8th Imaging Spectroscopy Workshop (SIG-EARSeL), Nantes, France, 2013, pp. 8–10.
[46] R. Richter, T. Kellenberger, and H. Kaufmann, “Comparison of topographic correction methods,” Remote Sens., vol. 1, no. 3, pp. 184–196, 2009. doi: 10.3390/rs1030184.
[47] J. Feng, B. Rivard, and A. Sanchez-Azofeifa, “The topographic normalization of hyperspectral data: Implications for the selection of spectral end members and lithologic mapping,” Remote Sens. Environ., vol. 85, no. 2, pp. 221–231, 2003. doi: 10.1016/S0034-4257(03)00002-6.
[48] B. Hapke, “Bidirectional reflectance spectroscopy, 1, Theory,” J. Geophys. Res., vol. 86, no. B4, pp. 3039–3054, 1981. doi: 10.1029/JB086iB04p03039.
[49] B. Hapke, Theory of Reflectance and Emittance Spectroscopy. Cambridge, U.K.: Cambridge Univ. Press, 1993.
[50] R. Heylen, M. Parente, and P. Gader, “A review of nonlinear hyperspectral unmixing methods,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1844–1868, June 2014. doi: 10.1109/JSTARS.2014.2320576.
[51] L. Drumetz, M.-A. Veganzones, S. Henrot, R. Phlypo, J. Chanussot, and C. Jutten, “Blind hyperspectral unmixing using an extended linear mixing model to address spectral variability,” IEEE Trans. Image Process., vol. 25, no. 8, pp. 3890–3905, 2016. doi: 10.1109/TIP.2016.2579259.
[52] L. Drumetz, J. Chanussot, and C. Jutten, “Spectral unmixing: A derivation of the extended linear mixing model from the Hapke model,” IEEE Geosci. Remote Sens. Lett., vol. 17, no. 11, pp. 1866–1870, 2020. doi: 10.1109/LGRS.2019.2958203.
[53] B. Combal and H. Isaka, “The effect of small topographic variations on reflectance,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 3, pp. 663–670, 2002. doi: 10.1109/TGRS.2002.1000325.
[54] M. Cochrane, “Using vegetation reflectance variability for species level classification of hyperspectral data,” Int. J. Remote Sens., vol. 21, no. 10, pp. 2075–2087, 2000. doi: 10.1080/01431160050021303.
[55] J. Zhang, B. Rivard, A. Sánchez-Azofeifa, and K. Castro-Esau, “Intra- and inter-class spectral variability of tropical tree species at La Selva, Costa Rica: Implications for species identification using HYDICE imagery,” Remote Sens. Environ., vol. 105, no. 2, pp. 129–141, 2006. doi: 10.1016/j.rse.2006.06.010.
[56] M. F. Baumgardner, L. F. Silva, L. L. Biehl, and E. R. Stoner, “Reflectance properties of soils,” in Advances in Agronomy, N. C.
[57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] Brady, Ed. Amsterdam, The Netherlands: Elsevier, 1986, vol. 38, pp. 1–44. J. K. Crowley, “Visible and near-infrared spectra of carbonate rocks: Reflectance variations related to petrographic texture and impurities,” J. Geophys. Res., Solid Earth, vol. 91, no. B5, pp. 5001–5012, 1986. doi: 10.1029/JB091iB05p05001. R. N. Clark, “Spectroscopy of rocks and minerals, and principles of spectroscopy,” in Remote Sensing for the Earth Sciences: Manual of Remote Sensing, A. N. Rencz, Ed. New York: Wiley, 1999, vol. 3, pp. 3–58. J. Franke, D. A. Roberts, K. Halligan, and G. Menz, “Hierarchical multiple endmember spectral mixture analysis (MESMA) of hyperspectral imagery for urban environments,” Remote Sens. Environ., vol. 113, no. 8, pp. 1712–1723, 2009. doi: 10.1016/j. rse.2009.03.018. J. C. Price, “How unique are spectral signatures?” Remote Sens. Environ., vol. 49, no. 3, pp. 181–186, 1994. doi: 10.1016/00344257(94)90013-2. G. P. Asner, “Biophysical and biochemical sources of variability in canopy reflectance,” Remote Sens. Environ., vol. 64, no. 3, pp. 234–253, 1998. doi: 10.1016/S0034-4257(98)00014-5. M. P. Ferreira, A. E. B. Grondona, S. B. A. Rolim, and Y. E. Shimabukuro, “Analyzing the spectral variability of tropical tree species using hyperspectral feature selection and leaf optical modeling,” J. Appl. Remote Sens., vol. 7, no. 1, p. 73,502, 2013. doi: 10.1117/1.JRS.7.073502. P. Gong, R. Pu, and B. Yu, “Conifer species recognition: An exploratory analysis of in situ hyperspectral data,” Remote Sens. Environ., vol. 62, no. 2, pp. 189–200, 1997. doi: 10.1016/S00344257(97)00094-1. P. Lukeš, P. Stenberg, M. Rautiainen, M. Mõttus, and K. M. Vanhatalo, “Optical properties of leaves and needles for boreal tree species in Europe,” Remote Sens. Lett., vol. 4, no. 7, pp. 667–676, 2013. doi: 10.1080/2150704X.2013.782112. Z. Gao and L. 
Zhang, “Multi-seasonal spectral characteristics analysis of coastal salt marsh vegetation in Shanghai, China,” Estuarine, Coastal Shelf Sci., vol. 69, no. 1-2, pp. 217–224, 2006. doi: 10.1016/j.ecss.2006.04.016. H. Schmidt and A. Karnieli, “Remote sensing of the seasonal variability of vegetation in a semi-arid environment,” J. Arid Environ., vol. 45, no. 1, pp. 43–59, 2000. doi: 10.1006/jare.1999. 0607. M. Mõttus, M. Sulev, and L. Hallik, “Seasonal course of the spectral properties of alder and birch leaves,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2496–2505, 2014. doi: 10.1109/JSTARS.2013.2294242. S. Jacquemoud and S. L. Ustin, “Leaf optical properties: A state of the art,” in Proc. 8th Int. Symp. Phys. Measure. Signatures Remote Sens., Aussois, France, 2001, pp. 223–332. S. Jacquemoud and F. Baret, “PROSPECT: A model of leaf optical properties spectra,” Remote Sens. Environ., vol. 34, no. 2, pp. 75–91, 1990. doi: 10.1016/0034-4257(90)90100-Z. W. Verhoef, “Light scattering by leaf layers with application to canopy reflectance modeling: The SAIL model,” Remote Sens. Environ., vol. 16, no. 2, pp. 125–141, 1984. doi: 10.1016/00344257(84)90057-9. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [71] T. P. Dawson, P. J. Curran, and S. E. Plummer, “LIBERTY—modeling the effects of leaf biochemical concentration on reflectance spectra,” Remote Sens. Environ., vol. 65, no. 1, pp. 50–60, 1998. doi: 10.1016/S0034-4257(98)00007-8. [72] D. B. Lobell and G. P. Asner, “Moisture effects on soil reflectance,” Soil Sci. Soc. Amer. J., vol. 66, no. 3, pp. 722–727, 2002. doi: 10.2136/sssaj2002.0722. [73] B. Somers, S. Delalieux, W. W. Verstraeten, and P. Coppin, “A conceptual framework for the simultaneous extraction of subpixel spatial extent and spectral characteristics of crops,” Photogrammetric Eng. Remote Sens., vol. 75, no. 1, pp. 57–68, 2009. doi: 10.14358/PERS.75.1.57. [74] M. Sadeghi, S. B. Jones, and W. D. 
Philpot, “A linear physicallybased model for remote sensing of soil moisture using short wave infrared bands,” Remote Sens. Environ., vol. 164, pp. 66– 76, July 2015. doi: 10.1016/j.rse.2015.04.007. [75] W. J. Wiscombe and S. G. Warren, “A model for the spectral ­albedo of snow. I: Pure snow,” J. Atmos. Sci., vol. 37, no. 12, pp. 2712–2733, 1980. doi: 10.1175/1520-0469(1980)037<2712: AMFTSA>2.0.CO;2. [76] J.-B. Féret, A. Gitelson, S. Noble, and S. Jacquemoud, “PROSPECT-D: Towards modeling leaf optical properties through a complete lifecycle,” Remote Sens. Environ., vol. 193, pp. 204– 215, May 2017. doi: 10.1016/j.rse.2017.03.004. [77] R. Webster, P. Curran, and J. Munden, “Spatial correlation in reflected radiation from the ground and its implications for sampling and mapping by ground-based radiometry,” Remote Sens. Environ., vol. 29, no. 1, pp. 67–78, 1989. doi: 10.1016/00344257(89)90079-5. [78] E. Tola, K. Al-Gaadi, R. Madugundu, A. Zeyada, A. Kayad, and C. Biradar, “Characterization of spatial variability of soil physicochemical properties and its impact on Rhodes grass productivity,” Saudi J. Biol. Sci., vol. 24, no. 2, pp. 421–429, 2017. doi: 10.1016/j.sjbs.2016.04.013. [79] A. Najafian, M. Dayani, H. R. Motaghian, and H. Nadian, “Geostatistical assessment of the spatial distribution of some chemical properties in calcareous soils,” J. Integr. Agri., vol. 11, no. 10, pp. 1729–1737, 2012. doi: 10.1016/S20953119(12)60177-4. [80] Y.-C. Wei, Y.-L. Bai, J.-Y. Jin, F. Zhang, L.-P. Zhang, and X.-Q. Liu, “Spatial variability of soil chemical properties in the reclaiming marine foreland to Yellow Sea of China,” Agri. Sci. China, vol. 8, no. 9, pp. 1103–1111, 2009. doi: 10.1016/S16712927(08)60318-1. [81] J. Hou-Long et al., “Spatial variability of soil properties in a long-term tobacco plantation in Central China,” Soil Sci., vol. 175, no. 3, pp. 137–144, 2010. doi: 10.1097/ SS.0b013e3181d82176. [82] Y. Yuan, Y. Feng, and X. 
Lu, “Projection-based NMF for hyperspectral unmixing,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2632–2643, 2015. doi: 10.1109/ JSTARS.2015.2427656. [83] T. Uezato, M. Fauvel, and N. Dobigeon, “Hyperspectral unmixing with spectral variability using adaptive bundles and double sparsity,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3980–3992, 2019. doi: 10.1109/TGRS.2018.2889256. 261
[84] L. Drumetz, T. R. Meyer, J. Chanussot, A. L. Bertozzi, and C. Jutten, “Hyperspectral image unmixing with endmember bundles and group sparsity inducing mixed norms,” IEEE Trans. Image Process., vol. 28, no. 7, pp. 3435–3450, 2019. doi: 10.1109/ TIP.2019.2897254. [85] C. L. Lippitt, D. A. Stow, D. A. Roberts, and L. L. Coulter, “Multidate MESMA for monitoring vegetation growth forms in southern California shrublands,” Int. J. Remote Sens., vol. 39, no. 3, pp. 655–683, 2018. doi: 10.1080/01431161.2017.1388936. [86] S. Bernabe, F. D. Igual, G. Botella, M. Prieto-Matias, and A. Plaza, “Parallel implementation of the multiple endmember spectral mixture analysis algorithm for hyperspectral unmixing,” in Proc. High-Performance Comput. Remote Sens. V, 2015, vol. 9646, p. 96460J. doi: 10.1117/12.2195120. [87] R. Heylen, A. Zare, P. Gader, and P. Scheunders, “Hyperspectral unmixing with endmember variability via alternating angle minimization,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4983–4993, 2016. doi: 10.1109/TGRS.2016.2554160. [88] J.-P. Combe et al., “Analysis of OMEGA/Mars express data hyperspectral data using a multiple-endmember linear spectral unmixing model (MELSUM): Methodology and first results,” Planetary Space Sci., vol. 56, no. 7, pp. 951–975, 2008. doi: 10.1016/j.pss.2007.12.007. [89] P. E. Dennison, K. Q. Halligan, and D. A. Roberts, “A comparison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper,” Remote Sens. Environ., vol. 93, no. 3, pp. 359–367, 2004. doi: 10.1016/j.rse. 2004.07.013. [90] L. Tits, R. Heylen, B. Somers, P. Scheunders, and P. Coppin, “A geometric unmixing concept for the selection of optimal binary endmember combinations,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 1, pp. 82–86, 2015. doi: 10.1109/LGRS.2014.2326555. [91] R. Mhenni, S. Bourguignon, J. Ninin, and F. Schmidt, “Spectral unmixing with sparsity and structuring constraints,” in Proc. 
9th Workshop on Hyperspectral Image and Signal Process., Evolution Remote Sens., Amsterdam, The Netherlands, 2018, pp. 1–5. [92] C. Song, “Spectral mixture analysis for subpixel vegetation fractions in the urban environment: How to incorporate endmember variability?” Remote Sens. Environ., vol. 95, no. 2, pp. 248–263, 2005. doi: 10.1016/j.rse.2005.01.002. [93] D. A. Roberts, P. E. Dennison, M. E. Gardner, Y. Hetzel, S. L. Ustin, and C. T. Lee, “Evaluation of the potential of hyperion for fire danger assessment by comparison to the airborne visible/infrared imaging spectrometer,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 6, pp. 1297–1310, 2003. doi: 10.1109/ TGRS.2003.812904. [94] K. Tan, X. Jin, Q. Du, and P. Du, “Modified multiple endmember spectral mixture analysis for mapping impervious surfaces in urban environments,” J. Appl. Remote Sens., vol. 8, no. 1, p. 85096, 2014. doi: 10.1117/1.JRS.8.085096. [95] C. Zhang et al., “Mapping urban land cover types using object-based multiple endmember spectral mixture analysis,” Remote Sens. Lett., vol. 5, no. 6, pp. 521–529, 2014. doi: 10.1080/2150704X.2014.930197. [96] C. Zhang, “Multiscale quantification of urban composition from EO-1/Hyperion data using object-based spectral unmix- 262 ing,” Int. J. Appl. Earth Observ. Geoinf., vol. 47, pp. 153–162, May 2016. doi: 10.1016/j.jag.2016.01.002. [97] C. A. Bateson, G. P. Asner, and C. A. Wessman, “Endmember bundles: A new approach to incorporating endmember variability into spectral mixture analysis,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 2, pp. 1083–1094, 2000. doi: 10.1109/36.841987. [98] M. Petrou and P. G. Foschi, “Confidence in linear spectral unmixing of single pixels,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 1, pp. 624–626, 1999. doi: 10.1109/36.739132. [99] G. P. Asner and D. B. Lobell, “A biogeophysical approach for automated SWIR unmixing of soils and vegetation,” Remote Sens. Environ., vol. 74, no. 1, pp. 99–112, 2000. 
doi: 10.1016/S00344257(00)00126-7. [100] G. P. Asner and K. B. Heidebrecht, “Spectral ­u nmixing of vegetation, soil and dry carbon cover in arid regions: Comparing multispectral and hyperspectral observations,” ­ Int. J. Remote Sens., vol. 23, no. 19, pp. 3939–3958, 2002. doi: 10.1080/01431160110115960. [101] G. P. Asner, M. M. Bustamante, and A. R. Townsend, “Scale dependence of biophysical structure in deforested areas bordering the Tapajos National Forest, Central Amazon,” Remote Sens. Environ., vol. 87, no. 4, pp. 507–520, 2003. doi: 10.1016/j. rse.2003.03.001. [102] J. M. Bioucas-Dias and M. A. Figueiredo, “Alternating direction algorithms for constrained sparse regression: Application to hyperspectral unmixing,” in Proc. 2nd Workshop on Hyperspectral Image and Signal Process.: Evolution Remote Sens., Reykjavik, Iceland, 2010, pp. 1–4. [103] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Sparse unmixing of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2014–2039, 2011. doi: 10.1109/ TGRS.2010.2098413. [104] Z. Shi, W. Tang, Z. Duren, and Z. Jiang, “Subspace matching pursuit for sparse unmixing of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 6, pp. 3256–3274, 2014. doi: 10.1109/TGRS.2013.2272076. [105] W. Tang, Z. Shi, and Y. Wu, “Regularized simultaneous forward–backward greedy algorithm for sparse unmixing of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 9, pp. 5271–5288, 2014. [106] Z. Shi, T. Shi, M. Zhou, and X. Xu, “Collaborative sparse hyperspectral unmixing using l0 norm,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5495–5508, 2018. doi: 10.1109/ TGRS.2018.2818703. [107] X. Xu, and Z. Shi, “Multi-objective based spectral unmixing for hyperspectral images,” ISPRS J. Photogrammetry Remote Sens., vol. 124, pp. 54–69, Feb. 2017. doi: 10.1016/j.isprsjprs.2016.12.010. [108] X. Xu, Z. Shi, and B. 
Pan, “ℓ0 -based sparse hyperspectral unmixing using spectral information and a multi-objectives formulation,” ISPRS J. Photogramm. Remote Sens., vol. 141, pp. 46–58, July 2018. doi: 10.1016/j.isprsjprs.2018.04.008. [109] X. Xu, Z. Shi, B. Pan, and X. Li, “A classification-based model for multi-objective hyperspectral sparse unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 9612–9625, 2019. doi: 10.1109/TGRS.2019.2928021. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
[110] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Collaborative sparse regression for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 341–354, 2014. doi: 10.1109/TGRS.2013.2240001. [111] Y. Qian, S. Jia, J. Zhou, and A. Robles-Kelly, “Hyperspectral unmixing via L ½ sparsity-constrained nonnegative matrix factorization,” IEEE Trans. Geosci. and Remote Sens., vol. 49, no. 11, pp. 4282–4297, 2011. [112] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Total variation spatial regularization for sparse hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 11, pp. 4484–4502, 2012. doi: 10.1109/TGRS.2012.2191590. [113] R. A. Borsoi, T. Imbiriba, J. C. M. Bermudez, and C. Richard, “A fast multiscale spatial regularization for sparse hyperspectral unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 4, pp. 598–602, Apr. 2019. doi: 10.1109/ LGRS.2018.2878394. [114] M.-D. Iordache, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral unmixing with sparse group lasso,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Vancouver, Canada, 2011, pp. 3586–3589. [115] X. Fu, W.-K. Ma, J. M. Bioucas-Dias, and T.-H. Chan, “Semiblind hyperspectral unmixing in the presence of spectral library mismatches,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 9, pp. 5171–5184, 2016. doi: 10.1109/TGRS.2016.2557340. [116] M. Berman et al., “A comparison between three sparse unmixing algorithms using a large library of shortwave infrared mineral spectra,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 6, pp. 3588–3610, 2017. doi: 10.1109/TGRS.2017.2676816. [117] K. J. Guilfoyle, M. L. Althouse, and C.-I. Chang, “A quantitative and comparative analysis of linear and nonlinear spectral mixture models using radial basis function neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 10, pp. 2314–2318, 2001. doi: 10.1109/36.957296. [118] A. Baraldi, E. Binaghi, P. Blonda, P. A. Brivio, and A. 
Rampini, “Comparison of the multilayer perceptron with neuro-fuzzy techniques in the estimation of cover class mixture in remotely sensed data,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 5, pp. 994–1005, 2001. doi: 10.1109/36.921417. [119] A. Okujeni, S. Van der Linden, B. Jakimow, A. Rabe, J. Verrelst, and P. Hostert, “A comparison of advanced regression algorithms for quantifying urban land cover,” Remote Sens., vol. 6, no. 7, pp. 6324–6346, 2014. doi: 10.3390/rs6076324. [120] F. Bovolo, L. Bruzzone, and L. Carlin, “A novel technique for subpixel image classification based on support vector machine,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2983– 2999, 2010. doi: 10.1109/TIP.2010.2051632. [121] G. A. Licciardi and F. Del Frate, “Pixel unmixing in hyperspectral data by means of neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4163–4172, 2011. doi: 10.1109/ TGRS.2011.2160950. [122] A. Okujeni, S. van der Linden, L. Tits, B. Somers, and P. Hostert, “Support vector regression and synthetically mixed training data for quantifying urban land cover,” Remote Sens. Environ., vol. 137, no. 1, pp. 184–197, 2013. doi: 10.1016/j. rse.2013.06.007. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE [123] F. A. Mianji and Y. Zhang, “SVM-based unmixing-to-classification conversion for hyperspectral abundance quantification,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4318–4327, 2011. doi: 10.1109/TGRS.2011.2166766. [124] L. Wang, D. Liu, Q. Wang, and Y. Wang, “Spectral unmixing model based on least squares support vector machine with unmixing residue constraints,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 6, pp. 1592–1596, 2013. doi: 10.1109/LGRS.2013. 2262371. [125] A. Okujeni, S. van der Linden, S. Suess, and P. Hostert, “Ensemble learning from synthetically mixed training data for quantifying urban land cover with support vector regression,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 4, pp. 
1640–1650, 2017. doi: 10.1109/JSTARS.2016. 2634859. [126] J. Rosentreter, R. Hagensieker, A. Okujeni, R. Roscher, P. D. Wagner, and B. Waske, “Subpixel mapping of urban areas using EnMAP data and multioutput support vector regression,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 5, pp. 1938–1948, 2017. doi: 10.1109/JSTARS.2017.2652726. [127] J. Plaza, A. Plaza, R. Perez, and P. Martinez, “On the use of small training sets for neural network-based characterization of mixed pixels in remotely sensed hyperspectral images,” Pattern Recognition, vol. 42, no. 11, pp. 3032–3045, 2009. doi: 10.1016/j.patcog.2009.04.008. [128] J. Plaza, and A. Plaza, “Spectral mixture analysis of hyperspectral scenes using intelligently selected training samples,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 2, pp. 371–375, 2010. doi: 10.1109/LGRS.2009.2036139. [129] L. Wang and X. Jia, “Integration of soft and hard classifications using extended support vector machines,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 3, pp. 543–547, 2009. [130] Y. Gu, S. Wang, and X. Jia, “Spectral unmixing in multiple-kernel Hilbert space for hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 7, pp. 3968–3981, 2013. doi: 10.1109/ TGRS.2012.2227757. [131] X. Li, X. Jia, L. Wang, and K. Zhao, “On spectral unmixing resolution using extended support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 9, pp. 4985–4996, 2015. doi: 10.1109/TGRS.2015.2415587. [132] X. Li, X. Jia, L. Wang, and K. Zhao, “Reduction of spectral unmixing uncertainty using minimum-class-variance support vector machines,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 9, pp. 1335–1339, 2016. [133] T. Uezato, R. J. Murphy, A. Melkumyan, and A. Chlingaryan, “A novel spectral unmixing method incorporating spectral variability within endmember classes,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 5, pp. 2812–2831, 2016. doi: 10.1109/ TGRS.2015.2506168. [134] T. Uezato, R. J. Murphy, A. 
Melkumyan, and A. Chlingaryan, “Incorporating spatial information and endmember variability into unmixing analyses to improve abundance estimates,” IEEE Trans. Image Process., vol. 25, no. 12, pp. 5563– 5575, 2016. [135] R. Heylen, D. Burazerovic, and P. Scheunders, “Non-linear spectral unmixing by geodesic simplex volume maximization,” 263
IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 534–542, 2011. doi: 10.1109/JSTSP.2010.2088377. [136] B. Koirala, Z. Zahiri, A. Lamberti, and P. Scheunders, “Robust supervised method for nonlinear spectral unmixing accounting for endmember variability,” IEEE Trans. Geosci. Remote Sens., 2020. doi: 10.1109/TGRS.2020.3031012. [137] X. Zhang, Y. Sun, J. Zhang, P. Wu, and L. Jiao, “Hyperspectral unmixing via deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 11, pp. 1755–1759, 2018. doi: 10.1109/LGRS.2018.2857804. [138] Y. Zeng, C. Ritz, J. Zhao, and J. Lan, “Attention-based residual network with scattering transform features for hyperspectral unmixing with limited training samples,” Remote Sens., vol. 12, no. 3, p. 400, 2020. doi: 10.3390/rs12030400. [139] B. Palsson, J. Sigurdsson, J. R. Sveinsson, and M. O. Ulfarsson, “Hyperspectral unmixing using a neural network autoencoder,” IEEE Access, vol. 6, pp. 25,646–25,656, 2018. doi: 10.1109/ ACCESS.2018.2818280. [140] Y. Su, J. Li, A. Plaza, A. Marinoni, P. Gamba, and S. Chakravortty, “DAEN: Deep autoencoder networks for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4309–4321, 2019. doi: 10.1109/TGRS.2018.2890633. [141] Y. Su, A. Marinoni, J. Li, J. Plaza, and P. Gamba, “Stacked nonnegative sparse autoencoders for robust hyperspectral unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 9, pp. 1427– 1431, 2018. doi: 10.1109/LGRS.2018.2841400. [142] Y. Qu and H. Qi, “uDAS: An untied denoising autoencoder with sparsity for spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 3, pp. 1698–1712, Mar. 2019. doi: 10.1109/ TGRS.2018.2868690. [143] Y. Qian, F. Xiong, Q. Qian, and J. Zhou, “Spectral mixture model inspired network architectures for hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 10, pp. 7418–7434, 2020. doi: 10.1109/TGRS.2020.2982490. [144] J. Li, X. Li, B. Huang, and L. 
Zhao, “Hopfield neural network approach for supervised nonlinear spectral unmixing,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 7, pp. 1002–1006, 2016. doi: 10.1109/LGRS.2016.2560222. [145] S. Cooper, A. Okujeni, C. Jänicke, M. Clark, S. van der Linden, and P. Hostert, “Disentangling fractional vegetation cover: Regression-based unmixing of simulated spaceborne imaging spectroscopy data,” Remote Sens. Environ., vol. 246, p. 111,856, Sept. 2020. doi: 10.1016/j.rse.2020.111856. [146] Z. Mitraka, F. Del Frate, and F. Carbone, “Nonlinear spectral unmixing of landsat imagery for urban surface cover mapping,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 7, pp. 3340–3350, 2016. doi: 10.1109/JSTARS.2016.2522181. [147] A. Okujeni et al., “Generalizing machine learning regression models using multi-site spectral libraries for mapping vegetation-impervious-soil fractions across multiple cities,” Remote Sens. Environ., vol. 216, pp. 482–496, Oct. 2018. doi: 10.1016/j. rse.2018.07.011. [148] L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: A technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 4, no. 2, pp. 22–40, 2016. doi: 10.1109/MGRS.2016.2540798. 264 [149] X. X. Zhu et al., “Deep learning in remote sensing: A comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag. (replaces Newslett.), vol. 5, no. 4, pp. 8–36, 2017. doi: 10.1109/MGRS.2017.2762307. [150] J. Li, “Wavelet-based feature extraction for improved endmember abundance estimation in linear unmixing of hyperspectral signals,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 644–649, 2004. doi: 10.1109/TGRS.2003.822750. [151] C. Wu, “Normalized spectral mixture analysis for monitoring urban composition using ETM+ imagery,” Remote Sens. Environ., vol. 93, no. 4, pp. 480–492, 2004. doi: 10.1016/j. rse.2004.08.003. [152] K. N. Youngentob, D. A. Roberts, A. A. Held, P. E. Dennison, X. Jia, and D. B. 
Lindenmayer, “Mapping two eucalyptus subgenera using multiple endmember spectral mixture analysis and continuum-removed imaging spectrometry data,” Remote Sens. Environ., vol. 115, no. 5, pp. 1115–1128, 2011. doi: 10.1016/j. rse.2010.12.012. [153] J. Zhang, B. Rivard, and A. Sanchez-Azofeifa, “Derivative spectral unmixing of hyperspectral data applied to mixtures of lichen and rock,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 9, pp. 1934–1940, 2004. doi: 10.1109/TGRS.2004.832239. [154] P. Debba, E. J. Carranza, F. D. van der Meer, and A. Stein, “Abundance estimation of spectrally similar minerals by using derivative spectra in simulated annealing,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 12, pp. 3649–3658, 2006. doi: 10.1109/TGRS.2006.881125. [155] X. Miao et al., “Estimation of yellow starthistle abundance through CASI-2 hyperspectral imagery using linear spectral mixture models,” Remote Sens. Environ., vol. 101, no. 3, pp. 329–341, 2006. doi: 10.1016/j.rse.2006.01.006. [156] B. Somers, S. Delalieux, W. Verstraeten, J. Van Aardt, G. Albrigo, and P. Coppin, “An automated waveband selection technique for optimized hyperspectral mixture analysis,” Int. J. Remote Sens., vol. 31, no. 20, pp. 5549–5568, 2010. doi: 10.1080/01431160903311305. [157] B. Somers and G. P. Asner, “Multi-temporal hyperspectral mixture analysis and feature selection for invasive species mapping in rainforests,” Remote Sens. Environ., vol. 136, no. 1, pp. 14–27, 2013. doi: 10.1016/j.rse.2013.04.006. [158] O. Ghaffari, M. J. V. Zoej, and M. Mokhtarzade, “Reducing the effect of the endmembers’ spectral variability by selecting the optimal spectral bands,” Remote Sens., vol. 9, no. 9, p. 884, 2017. doi: 10.3390/rs9090884. [159] Z. Tane, D. Roberts, S. Veraverbeke, Á. Casas, C. Ramirez, and S. Ustin, “Evaluating endmember and band selection techniques for multiple endmember spectral mixture analysis using post-fire imaging spectroscopy,” Remote Sens., vol. 10, no. 3, p. 389, 2018. 
doi: 10.3390/rs10030389. [160] B. Somers and G. P. Asner, “Tree species mapping in tropical forests using multi-temporal imaging spectroscopy: Wavelength adaptive spectral mixture analysis,” Int. J. Appl. Earth Observ. Geoinf., vol. 31, pp. 57–66, Sept. 2014. doi: 10.1016/j. jag.2014.02.006. [161] B. Somers, S. Delalieux, J. Stuckens, W. Verstraeten, and P. Coppin, “A weighted linear spectral mixture analysis approach to IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE DECEMBER 2021
address endmember variability in agricultural production systems,” Int. J. Remote Sens., vol. 30, no. 1, pp. 139–147, 2009. doi: 10.1080/01431160802304625. [162] B. Somers, J. Verbesselt, E. M. Ampe, N. Sims, W. W. Verstraeten, and P. Coppin, “Spectral mixture analysis to monitor defoliation in mixed-aged Eucalyptus globulus Labill plantations in southern Australia using Landsat 5-TM and EO-1 hyperion data,” Int. J. Appl. Earth Observ. Geoinf., vol. 12, no. 4, pp. 270–277, 2010. doi: 10.1016/j.jag.2010.03.005. [163] B. Somers and G. P. Asner, “Invasive species mapping in Hawaiian rainforests using multi-temporal hyperion spaceborne imaging spectroscopy,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 351–359, 2013. doi: 10.1109/ JSTARS.2012.2203796. [164] B. Somers, S. Delalieux, W. W. Verstraeten, J. Verbesselt, S. Lhermitte, and P. Coppin, “Magnitude-and shape-related feature integration in hyperspectral mixture analysis to monitor weeds in citrus orchards,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 11, pp. 3630–3642, 2009. doi: 10.1109/ TGRS.2009.2024207. [165] W. Krippner, S. Bauer, and F. P. León, “Considering spectral variability for optical material abundance estimation,” tm-Technisches Messen, vol. 85, no. 3, pp. 149–158, 2018. doi: 10.1515/ teme-2017-0053. [166] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). New York: Springer-Verlag, 2006. [167] C.-I. Chang and B. Ji, “Weighted abundance-constrained linear spectral mixture analysis,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 2, pp. 378–388, 2006. [168] J. Jin, B. Wang, and L. Zhang, “A novel approach based on Fisher discriminant null space for decomposition of mixed pixels in hyperspectral imagery,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 699–703, 2010. doi: 10.1109/LGRS.2010.2046134. [169] B. D. Bue et al., “Leveraging in-scene spectra for vegetation species discrimination with MESMA-MDA,” ISPRS J. 
Photogrammetry Remote Sens., vol. 108, pp. 33–48, Oct. 2015. doi: 10.1016/j. isprsjprs.2015.06.001. [170] M. Liu, W. Yang, J. Chen, and X. Chen, “An orthogonal Fisher transformation-based unmixing method toward estimating fractional vegetation cover in semiarid areas,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 3, pp. 449–453, 2017. doi: 10.1109/ LGRS.2017.2648863. [171] A. Jafari, R. Safabakhsh, and M. M. Ebadzadeh, “Endmember orthonormal mapping in hyperspectral mixture analysis to address endmember variability,” Earth Sci. Informatics, vol. 9, no. 3, pp. 291–307, 2016. doi: 10.1007/s12145-016-0256-4. [172] F. Xu, X. Cao, X. Chen, and B. Somers, “Mapping impervious surface fractions using automated Fisher transformed unmixing,” Remote Sens. Environ., vol. 232, p. 111,311, 2019. doi: 10.1016/j.rse.2019.111311. [173] Q. Du, “Modified Fisher’s linear discriminant analysis for hyperspectral imagery,” IEEE Geosci. Remote Sens. Lett., vol. 4, no. 4, pp. 503–507, 2007. doi: 10.1109/LGRS.2007.900751. [174] K. Canham, A. Schlamm, A. Ziemann, B. Basener, and D. Messinger, “Spatially adaptive hyperspectral unmixing,” IEEE Trans. DECEMBER 2021 IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE Geosci. Remote Sens., vol. 49, no. 11, pp. 4248–4262, 2011. doi: 10.1109/TGRS.2011.2169680. [175] M. A. Goenaga, M. C. Torres-Madronero, M. Velez-Reyes, S. J. Van Bloem, and J. D. Chinea, “Unmixing analysis of a time series of hyperion images over the Guánica dry forest in Puerto Rico,” IEEE J. Sel. Topics Appl. Earth Observ Remote Sens., vol. 6, no. 2, pp. 329–338, 2013. doi: 10.1109/JSTARS.2012. 2225096. [176] M. Li, S. Zang, C. Wu, and Y. Deng, “Segmentation-based and rule-based spectral mixture analysis for estimating urban imperviousness,” Adv. Space Res., vol. 55, no. 5, pp. 1307–1315, 2015. doi: 10.1016/j.asr.2014.12.015. [177] H. Sun and A. Zare, “Map-guided hyperspectral image superpixel segmentation using proportion maps,” in Proc. 37th IEEE Int. 
WOMEN IN GRSS
SHAWN C. KEFAUVER AND HEATHER MCNAIRN
The Women in Engineering International Leadership Conference
Digital Object Identifier 10.1109/MGRS.2021.3122734
Date of current version: 14 January 2022
As in previous years, the IEEE Geoscience and Remote Sensing Society (GRSS) Inspire, Develop, Empower, Advance (IDEA) Committee spent much of the spring getting prepared to host some exciting events at the International Geoscience and Remote Sensing Symposium (IGARSS). This year was no different, although uncertainty hung in the air as we awaited decisions on whether the largest annual GRSS event would be a hybrid or virtual one. Alas, the COVID-19 Delta variant swept across the globe in what was for many the fifth or sixth wave of coronavirus cases in the global pandemic, and IGARSS moved to all virtual.
The IDEA Committee held its IGARSS GRSS Diversity Fireside Chat on Friday, 9 July 2021. The Fireside Chat was an opportunity to introduce ourselves and our core activities, following up on our success with the rollout of the Down to Earth Podcast, hosted by Stephanie Tumampos. We discussed our new and more inclusive structure and introduced our activity leads and coleads for IDEA’s core activities, including the Women Mentoring Women (WMW) program, our nascent Professional Development Microgrants, Women in Africa, our Diversity, Equality, and Inclusion Surveys, and our increasing connections with the IEEE Women in Engineering (WIE) group. We had lively discussions and considered it a great success and an excellent kickoff to the second fully virtual IGARSS conference.
The WIE has been doing something quite extraordinary, which the IDEA committee has had several opportunities to benefit from, and we’d like to share our experiences. The WIE International Leadership Conference (ILC) also went all virtual this year, and with this format we were able to sponsor attendance to the ILC for four of our rising-star IDEA committee members: Margot Flemming and Victoria “Vicky” Vanthof, from our WMW core activity, and Mary Immaculate Neh Fru and Dr. Nkeiruka “Nke” Nneti Onyia, from our Women in Africa core initiative.
Flemming is a Ph.D. student focusing on geostatistical approaches to downscaling coarse-resolution snow water equivalent estimates on a project with the government of the Northwest Territories in Canada (see Figure 1). She is a recent recipient of a Canada Graduate Scholarship – Doctoral from the Natural Sciences and Engineering Research Council of Canada, which supports her Ph.D. studies. As a colead of WMW, Flemming shares that “it is very fulfilling and powerful to be involved in supporting women in a male-dominated field connect and help each other grow and succeed.” Her favorite talk at the ILC was by Aisha Moore on how to manage stress and incorporate self-care into your professional life. Flemming expresses that “Being in academia, I often find that it becomes the norm to overwork yourself, and end in a burnout. I have seen it many times with my peers and have also felt it myself. A main takeaway from this session was that stress can be a positive thing, as long as it’s addressed in a healthy manner.” One quote from Moore that resonated with Flemming was “I don’t believe in stress-free living.” Regarding her take-home message from the conference, Flemming adds, “One thing I learned from this conference is to be more confident in my work and my intelligence, which is something I often struggle with.
Hearing from so many women who have overcome a plethora of challenges and followed their dreams helped me realize that I too can do that if I just trust myself and believe in myself. Although it may be a slow journey, going forward, I hope to incorporate this into my research and onto my career following.”
FIGURE 1. Margot Flemming improving seasonal snow-monitoring approaches by incorporating satellite observations. (Source: Flemming.)
Vanthof is currently a Ph.D. candidate in geography at the University of Waterloo, also in Canada. She is working on remote sensing of surface-water resources to support water management for her dissertation (see Figure 2). In 2019, Vanthof received the Hugh C. Morris Travel Fellowship. Vanthof reflects that “As a researcher you are always competing for awards, and during your Ph.D., you get exposed to writing grants and must learn to take rejection as a progress step and not as defeat. I applied for the fellowship primarily as a learning experience as it consisted of me developing my own research plan and budget, coordinated field work travel, and allowed me to think outside the box. I really struggled writing it because I felt like I wasn’t equipped or experienced enough to be awarded such a prestigious award, but I pushed through my doubts and submitted it. I received the award, and post-COVID will continue on my travel journey across the world.” It’s quite an accomplishment and a journey yet to fully unfold. Her first IGARSS conference was in Texas in 2017, and soon thereafter, Vanthof volunteered as a GRSS Social Media Ambassador for the North America Section before joining GRSS IDEA as the WMW program colead with Flemming. Their combined social media prowess has certainly shown during the launch of the GRSS-sponsored Down to Earth podcast earlier this year. Go team!
Of her ILC experience, Vanthof was most inspired by an amazing talk by Sandy Carter and continues to quote her wisdom, “Everybody is a little behind somebody, so if we just pull one along with us, we will change the culture.” Vanthof expands, “She [Carter] pushed mentorship, and not only for senior researchers and leaders, but for everyone, which I [Vanthof] thought was so important.” This is an especially salient thought considering Vanthof’s current role as colead for the GRSS WMW program, which supports women through mentorship to help them succeed. In addition to Carter’s words, Vanthof was also inspired by a quote and the Q&A interaction with Lynne Doherty, who first said that “Your career is a jungle gym,” on which Doherty later expanded with how she managed to navigate a family while pushing her career boundaries. Doherty humbly highlighted that it’s not easy but added, in a sentence that Vanthof said will stick with her forever, that “Your life is chapters in a book, sometimes it’s a chapter that’s for work and sometimes it’s on family. Sometimes it’s a good chapter and sometimes it’s a bad chapter, be okay with that in the moment.” As a final thought, Vanthof adds, “In my career thus far, I haven’t had the opportunity to attend a leadership conference or a conference that isn’t technical. While I knew that it would be different, I didn’t quite expect it to be as inspirational and motivating as it was.”
FIGURE 2. Victoria “Vicky” Vanthof doing geodesy calibrations in the field. (Source: Vanthof.)
FIGURE 3. Mary Immaculate Neh Fru panning for heavy mineral concentrate. (Source: Fru.)
Fru is presently one of IDEA’s coleads for our Women in Africa core initiative and a Ph.D. student at the University of Buea, Cameroon, in applied geology. Her specific interest is in remote sensing related to geosciences, especially on mineral exploration and disaster management (see Figure 3). One highlight from her current work is a team collaboration: “Assessing the Knowledge of Geoscience Education in Africa,” a forthcoming paper that attempts to explain the stereotypes and biases that exist in Earth sciences, especially in Africa. “Doing this project made me realize the gap between women and men in the Earth sciences, and I proposed how this can be handled,” Fru adds. Fru also placed third in the IGARSS 2020 Women in Geoscience #InspireUs photo competition, organized by the IDEA committee. The contest inspired Fru to join IDEA, and she has since launched the first GRSS Chapter in Cameroon. Very impressive work. Since joining the GRSS, Fru states that being a member has “greatly helped me to learn more from experts in the field, meet wonderful mentors who are always open to listen and direct me on the right path to take, and most especially, opened me more for scientific collaborations.” The talk that inspired Fru the most at ILC was “Lead Beyond: Advancing Women of Color in Engineering and STEM Entrepreneurship,” something that she could closely relate to as a black woman. The speaker provided definitive statistics on how black women are linked to science, technology, engineering, and mathematics (STEM) and how best to provide opportunities and work together for the better. Fru expressed that the virtual format was excellent, and the session was very interactive. Many questions popped up, and with the virtual format, everybody had the opportunity to answer and interact during the session. Fru adds that “Every woman must choose to challenge herself by learning new things all the time. That was my thought before attending the conference. I wanted to learn more from the different panelists, and I did. So many things were learned, like having a checklist to map your career as a STEM entrepreneur, which was very helpful, and I implemented it on different sectors of my career life. Also, I have to be self-aware, being conscious of my own character, feelings, motives, and desires.”
FIGURE 4. Dr. Nkeiruka “Nke” Nneti Onyia touring Leicester Space Park, the new site for her expanding start-up, LENKÉ Space and Water Solutions Ltd. (Source: Dr. Onyia.)
Our final attendee at the ILC was Dr. Onyia. She is a research associate at the University of Leicester and CEO of LENKÉ Space and Water Solutions Ltd. Dr. Onyia also coleads the GRSS IDEA Women in Africa core initiative. Dr. Onyia is excited about working on developing the LENKÉ soil water index forecast tool (SWIFT) with a team of scientists, programmers, and machine learning experts based in Canada and Latin America (see Figure 4).
“SWIFT is a satellite data-based tool designed to support natural resource management in sub-Saharan Africa, particularly water and agricultural resources,” expands Dr. Onyia. Her proudest accomplishment as a STEM entrepreneur is her start-up, LENKÉ Space and Water Solutions Ltd., a company she cofounded with Dr. Lensa E. Jotte. First they won the 2019 Copernicus hackathon hosted by the University of Leicester, then they won the Santander Seed Grant, and next, LENKÉ became the first company to win the European Space Agency Business Incubation Center in Leicester grant.
GRSS is a Society that Dr. Onyia says she cherishes for so many reasons. First and most significant is the fact that her professional journey in remote sensing and geographic information systems gained traction when she attended her first IGARSS conference in Milan in 2015. At the conference, she signed up for and participated in the WinGRSS luncheon, where she met women who had stayed on their paths, overcoming several challenges. Her experience at IGARSS and the luncheon led Dr. Onyia to become an active volunteer in the IDEA committee. Her favorite talk at the ILC was the keynote speech by Stacey Abrams. Dr. Onyia recalls that she “could relate to her life experience of facing and overcoming challenges that instill fear in you. I admire the (continued on p. 282)
TECHNICAL COMMITTEES
NAOTO YOKOYA, PEDRAM GHAMISI, RONNY HÄNSCH, COLIN PRIEUR, HANA MALHA, JOCELYN CHANUSSOT, CALEB ROBINSON, KOLYA MALKIN, AND NEBOJSA JOJIC
Report on the 2021 IEEE GRSS Data Fusion Contest—Geospatial Artificial Intelligence for Social Good
Digital Object Identifier 10.1109/MGRS.2021.3121628
Date of current version: 14 January 2022
The Image Analysis and Data Fusion Technical Committee (IADF TC) of the IEEE Geoscience and Remote Sensing Society (GRSS) has been organizing the annual Data Fusion Contest (DFC) since 2006. The contest promotes the development of methods for extracting geospatial information from large-scale, multisensor, multimodal, and multitemporal data. It aims to propose new problem settings that are challenging to address with existing techniques and to establish new benchmarks for scientific challenges in remote sensing image analysis [1]–[5].
THE 2021 DATA FUSION CONTEST
The 2021 IEEE GRSS DFC promoted interdisciplinary research on geospatial artificial intelligence (AI) for social good. The ultimate goal of the contest is to build models to understand the state and changes in the manmade and natural environment using multisensor and multitemporal remote sensing data for sustainable development. This contest was designed as a benchmarking competition following previous editions [1], [2], [4], [6], [7]. The 2021 DFC had two tracks running in parallel:
1) Track DSE: detection of settlements without electricity
2) Track MSD: multitemporal semantic change detection.
Track DSE, co-organized by Hewlett Packard Enterprise, SolarAid, and Data Science Experts, aimed to automatically detect human settlements without access to electricity using multimodal, multiresolution, and multitemporal satellite remote sensing data. As input data, we used Sentinel-1 SAR data, Sentinel-2, Landsat-8, and Suomi Visible Infrared Imaging Radiometer Suite nighttime images.
The original ground sampling distance (GSD) ranged from 10 m to 750 m, but all images were resampled at 10 m. Semantic labels of four classes (i.e., settlements with and without electricity, no settlements with and without electricity) were provided at a GSD of 500 m for the training data. Participants submitted binary detection maps of settlements without electricity with a GSD of 500 m. The classification accuracy was assessed with the F1 score. The main challenge of Track DSE was to develop robust and efficient methods for extracting high-level semantic information from heterogeneous data.
Track MSD, co-organized by Microsoft AI for Earth, focused on the automated detection and classification of land cover change from multitemporal, multiresolution, and multispectral imagery. The challenge of Track MSD was to create high-resolution land cover maps for two time periods using only low-resolution and noisy land cover labels for training. Participants were provided with 1) 1-m multispectral aerial imagery for 2013 and 2017 from the U.S. Department of Agriculture’s National Agriculture Imagery Program data, 2) 30-m multispectral satellite imagery (Landsat-8) for five time points between 2013 and 2017, and 3) 30-m noisy low-resolution land cover labels for 2013 and 2016 from the U.S. Geological Survey’s National Land Cover Database data over Maryland. Participants created
high-resolution (1-m GSD) land cover maps to identify semantic changes between 2013 and 2017. The performance was evaluated using the intersection-over-union metric averaged over eight types of change. The challenge was twofold: to detect which parts of the image changed between two high-resolution aerial images, and to identify the class of change based on weak supervision.
The 2021 DFC tackled two fundamental technical challenges rooted in real social problems: 1) analysis of multisensor, multiresolution, and multitemporal data, and 2) learning from weak supervision. These two issues are major challenges in a wide range of fields, from Earth observation to computer vision and machine learning. The most important feature of the 2021 DFC is that it is directly related to social issues such as energy equality and environmental conservation. The results of the contest will have a significant impact, not only in terms of technological development, but also as a tool for solving real social problems.
FIGURE 1. The awards for DFC 2021 were handed out during the virtual award ceremony of IGARSS 2021. The picture shows representatives of all eight winning teams as well as the chair of IADF.
OUTCOME OF THE CONTEST
The first- to fourth-ranked teams in each track were awarded as winners of the contest and presented their solutions during the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2021).
The four winning teams of the 2021 DFC Track DSE were the following:
◗◗ First place: fengkexin team; Yanbiao Ma, Yuxin Li, Kexin Feng, and Xueli Geng (Xidian University, China); dual-task models based on squeeze-and-excitation networks followed by postprocessing based on expert priors [8].
◗◗ Second place: WHU_YuXia team; Yu Xia, Qi Huang, and Hongyan Zhang (Wuhan University, China); ensembling of single-task and dual-task models based on global context convolutional neural networks and random forests [9].
◗◗ Third place: dimartinot team; Thomas Di Martino, Maxime Lenormand, and Elise Colin Koeniguer (ONERA, France); multibranch CNN with 3D convolutions and EfficientNet [10].
◗◗ Third place: JIOJIO team; Ruoxian Feng, Mengjiao Wang, Xuanming Zhang, and Jun Zhang (Xidian University, China); ensembling of UNet and LinkNet with depthwise overparameterized convolutional layers [11].
The four winning teams of the 2021 DFC Track MSD were the following:
◗◗ First place: AsheLee team; Zhuohong Li, Fangxiao Lu, Hongyan Zhang, Guangyi Yang, and Liangpei Zhang (Wuhan University, China) [12].
◗◗ Second place: tulilin team; Lilin Tu, Jiayi Li, and Xin Huang (Wuhan University, China) [13].
◗◗ Third place: baoqianyue team; Qianyue Bao, Yang Liu, Zixiao Zhang, Dafan Chen, Yuting Yang, Licheng Jiao, and Fang Liu (Xidian University, China) [14].
◗◗ Fourth place: EVER team; Zhuo Zheng, Yinhe Liu, Shiqi Tian, Junjue Wang, Ailong Ma, and Yanfei Zhong (Wuhan University, China) [15].
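The two evaluation metrics named for the tracks, the F1 score for Track DSE and the intersection-over-union averaged over change types for Track MSD, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the contest's scoring code: the function names, the flat-list map representation, the class identifiers, and the choice to skip classes absent from both maps are all hypothetical.

```python
def f1_score_binary(pred, truth):
    """F1 score for two equally long sequences of 0/1 labels.

    Here 1 is assumed to mean "settlement without electricity".
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when nothing is positive.
    return 2 * tp / denom if denom else 0.0


def mean_iou(pred, truth, class_ids):
    """Intersection over union averaged over the given class ids.

    Classes that appear in neither map are skipped (an assumption).
    """
    ious = []
    for c in class_ids:
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy Track DSE style example: four tiles, flattened.
pred_dse = [1, 0, 1, 1]
true_dse = [1, 0, 0, 1]
f1 = f1_score_binary(pred_dse, true_dse)  # TP=2, FP=1, FN=0

# Toy Track MSD style example: 0 = no change, 1 and 2 = change types.
pred_msd = [0, 1, 1, 2]
true_msd = [0, 1, 2, 2]
miou = mean_iou(pred_msd, true_msd, class_ids=[1, 2])

print(f"F1 = {f1:.2f}, mean IoU = {miou:.2f}")
```

In the toy data the F1 score is 2·2/(2·2 + 1 + 0) = 0.8, and each change type has an intersection of one pixel and a union of two, giving a mean IoU of 0.5.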
Contest Data
The data of the 2021 Data Fusion Contest and its Codalab evaluation websites [Track DSE (https://competitions.codalab.org/competitions/27943) and Track MSD (https://competitions.codalab.org/competitions/27956)] with the public leaderboard will remain available for benchmarking algorithms and publishing research works. The data are usable free of charge for scientific purposes, but the contest terms and conditions on the contest webpage remain applicable (http://www.classic.grss-ieee.org/community/technical-committees/data-fusion/2021-ieee-grss-data-fusion-contest/). Please read them carefully.
Join the GRSS IADF TC
You can contact the Image Analysis and Data Fusion Technical Committee (IADF TC) chairs at iadf_chairs@grss-ieee.org. If you are interested in joining the IADF TC, please complete the form on our website (http://www.grss-ieee.org/community/technical-committees/data-fusion/) or send an email to us including your
◗◗ first and last name
◗◗ institution/company
◗◗ country
◗◗ IEEE membership number (if available)
◗◗ email address.
Members receive information regarding research and applications on image analysis and data fusion topics, and updates on the annual Data Fusion Contest and on all other IADF TC activities. Membership in the IADF TC is free! You may join the LinkedIn IEEE GRSS data fusion discussion forum, http://www.linkedin.com/groups/IEEE-GRSS-Data-Fusion-Discussion-3678437, or join us on Twitter: Grssiadf.
At the end of the competition, all winning teams wrote a four-page paper on their approach, which was peer-reviewed by the DFC organizing committee. These papers were included in the technical program of IGARSS 2021 and were presented in an invited session on the DFC during the symposium. All of these teams were awarded an IEEE Certificate of Recognition for their winning participation during the virtual award ceremony of IGARSS 2021 (see Figure 1).
The first-, second-, and third-ranked teams in each track received special prizes, thanks to the support of the organizing partners. An extended article discussing the winning solutions of the first- and second-ranked teams will be submitted for peer review to the open access IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS).
As in previous years, the 2021 DFC attracted participants from a variety of disciplines, including AI and machine learning as well as the remote sensing community. The participation of such a diverse range of disciplines promotes the development of novel and interdisciplinary approaches to solve technical problems in the remote sensing and geoscience communities, and also leads to a movement to challenge global issues by bringing together knowledge from different fields. The winning teams are all student-led, and their extraordinary efforts have led to dramatic advances in technology for the new problems addressed in this competition and to the formation of a vibrant community. One unique feature that differentiates the contest from previous editions is that the contest was not only a competition but also led to subsequent collaborative projects and real-world applications.
ACKNOWLEDGMENTS
The Image Analysis and Data Fusion Technical Committee chairs would like to thank the IEEE GRSS for continuously supporting the annual Data Fusion Contest through funding and resources.
AUTHOR INFORMATION
Naoto Yokoya (yokoya@k.u-tokyo.ac.jp) is a lecturer at the University of Tokyo, Kashiwa, Chiba, 277-8561, Japan.
Pedram Ghamisi (p.ghamisi@gmail.com) is the head of the machine learning group, Helmholtz-Zentrum Dresden-Rossendorf, Freiberg, D-09599, Germany.
Ronny Hänsch (rww.haensch@gmail.com) is with the German Aerospace Center, Weßling, 82234, Germany.
Colin Prieur (colin.prieur@grenoble-inp.org) is with SICOM, Grenoble INP, Grenoble, Rhône-Alpes, 38400, France.
Hana Malha (hana.malha@hpe.com) is with HPC&AI Competency Center, Grenoble, 38320, France.
Jocelyn Chanussot (jocelyn.chanussot@grenoble-inp.fr) is with Grenoble INP, Grenoble, 38400, France.
Caleb Robinson (caleb.robinson@microsoft.com) is with Microsoft AI for Good Research, Redmond, Washington, 98052, USA.
Kolya Malkin (kolya.malkin@yale.edu) is with Yale University, New Haven, Connecticut, 06520, USA.
Nebojsa Jojic (jojic@microsoft.com) is with Microsoft Research, Redmond, Washington, 98052, USA.
REFERENCES
[1] N. Yokoya et al., “Open data for global multimodal land use classification: Outcome of the 2017 IEEE GRSS data fusion contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 5, pp. 1363–1377, May 2018, doi: 10.1109/JSTARS.2018.2799698.
[2] Y. Xu et al., “Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 6, pp. 1709–1724, Jun. 2019, doi: 10.1109/JSTARS.2019.2911113.
[3] B. Le Saux, N. Yokoya, R. Hänsch, M. Brown, and G. Hager, “2019 data fusion contest [Technical Committees],” IEEE Geosci. Remote Sens. Mag., vol. 7, no. 1, pp. 103–105, Mar. 2019, doi: 10.1109/MGRS.2019.2893783.
[4] C. Robinson et al., “Global land-cover mapping with weak supervision: Outcome of the 2020 IEEE GRSS data fusion contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 3185–3199, Mar. 2021, doi: 10.1109/JSTARS.2021.3063849.
[5] N. Yokoya et al., “2021 data fusion contest: Geospatial artificial intelligence for social good [Technical Committees],” IEEE Geosci. Remote Sens. Mag., vol. 9, no. 1, pp. 287–C3, 2021, doi: 10.1109/MGRS.2021.3055633.
[6] S. Kunwar et al., “Large-scale semantic 3-D reconstruction: Outcome of the 2019 IEEE GRSS data fusion contest - Part A,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 922–935, Oct. 2020, doi: 10.1109/JSTARS.2020.3032221.
[7] Y. Lian et al., “Large-scale semantic 3-D reconstruction: Outcome of the 2019 IEEE GRSS data fusion contest - Part B,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1158–1170, Nov. 2020, doi: 10.1109/JSTARS.2020.3035274.
[8] Y. Ma et al., “Multisource data fusion for the detection of settlements without electricity,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1839–1842, doi: 10.1109/IGARSS47720.2021.9553860.
[9] Y. Xia, Q. Huang, and H. Zhang, “A multi-model fusion of convolution neural network and random forest for detecting settlements without electricity,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1843–1846, doi: 10.1109/IGARSS47720.2021.9553087.
[10] T. D. Martino, M. Lenormand, and E. C. Koeniguer, “Multibranch deep learning model for detection of settlements without electricity,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1847–1850, doi: 10.1109/IGARSS47720.2021.9554286.
[11] R. Feng et al., “DO-UNet, DO-LinkNet: UNet, D-LinkNet with DO-Conv for the detection of settlements without electricity challenge,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 1851–1854, doi: 10.1109/IGARSS47720.2021.9553097.
[12] Z. Li, F. Lu, H. Zhang, G. Yang, and L. Zhang, “Change cross-detection based on label improvements and multimodel fusion for multi-temporal remote sensing images,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2054–2057, doi: 10.1109/IGARSS47720.2021.9553120.
[13] L. Tu, J. Li, and X. Huang, “High-resolution land cover change detection using low-resolution labels via a semisupervised deep learning approach - 2021 IEEE data fusion contest track MSD,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2058–2061, doi: 10.1109/IGARSS47720.2021.9555033.
[14] Q. Bao et al., “MRTA: Multi-resolution training algorithm for multitemporal semantic change detection,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2062–2065, doi: 10.1109/IGARSS47720.2021.9554425.
[15] Z. Zheng, Y. Liu, S. Tian, J. Wang, A. Ma, and Y. Zhong, “Weakly supervised semantic change detection via label refinement framework,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2021, pp. 2066–2069, doi: 10.1109/IGARSS47720.2021.9553768.
PAOLO DE MATTHAEIS
Agenda Items of the World Radiocommunication Conference 2023 With a Potential Impact on Microwave Remote Sensing
Remote sensing is the scientific discipline of measuring radiation emitted or reflected from an object or area to study its physical characteristics. In particular, microwave remote sensing uses the portion of the electromagnetic spectrum called the radio-frequency spectrum, which is defined as the range of frequencies from 9 kHz to 3,000 GHz by the International Telecommunication Union (ITU), a specialized agency of the United Nations. Rules governing the usage of the radio-frequency spectrum at the international level are contained in a treaty called Radio Regulations (RR) [1]. The Radiocommunication Sector of the ITU (ITU-R) is responsible for updating the RR at a World Radiocommunication Conference (WRC) that is held approximately every four years [2]. The next WRC will be held in November 2023 and is referred to as WRC-23.
During a WRC, in addition to considering and deliberating on revisions to the RR proposed by the ITU members, the agenda for the following WRC is set. WRC agendas are composed of focused topics whose scope is described by accompanying resolutions. A First Conference Preparatory Meeting (CPM-1) is held to coordinate the work on the agenda items among six Study Groups of the ITU-R, which carry out technical studies with the contribution of the ITU Member States and Sector Members. Approximately six months before the WRC, a Second Conference Preparatory Meeting (CPM-2) takes place to consolidate the technical input for all study groups into one CPM Report that will then be used as a guideline in making decisions at the WRC [3].
The ITU-R study groups perform studies through their Working Parties (WPs), with each WP focusing on specific radiocommunication services and systems [1]. WP 7C, which falls under Study Group 7 (Science Services), is responsible for remote sensing systems.
In ITU terminology, the radiocommunication service associated with spaceborne remote sensing instruments is called the Earth exploration-satellite service (EESS), and it can be either active or passive.
Figure 1 is a graphical illustration of the WRC-23 agenda items for which WP 7C is responsible and those to which WP 7C is contributing technical studies, as discussed during the CPM-1 [4] that followed WRC-19 and in subsequent study group meetings. Note that some of the WRC-23 agenda items do not seek to change existing regulations globally, but only in some specific geographical areas. The ITU refers to them as ITU Regions [1]. These regions are shown in Figure 2 and will be used in the brief descriptions of the agenda items in the next section.
Digital Object Identifier 10.1109/MGRS.2021.3120892
Date of current version: 14 January 2022
AGENDA ITEMS
AI 1.2: INTERNATIONAL MOBILE TELECOMMUNICATIONS BETWEEN 3.3 AND 10.5 GHz
This agenda item will consider identification of the following frequency bands for international mobile telecommunications (IMT):
◗◗ 3,600–3,800 MHz and 3,300–3,400 MHz (in Region 2)
◗◗ 3,300–3,400 MHz (amend footnote RR No. 5.458 in Region 1)
◗◗ 6,425–7,025 MHz (in Region 1)
◗◗ 7,025–7,125 MHz (globally)
◗◗ 10,000–10,500 MHz (in Region 2).
Resolution 245 (WRC-19) invites ITU-R to conduct sharing and compatibility studies that also consider protection of other coprimary services using these bands as well as services operating in adjacent bands.
The remote sensing bands that could be affected are
◗◗ 6,425–7,250 MHz used by passive sensors without allocation (footnote RR No. 5.458)
◗◗ 10–10.4 GHz used by active sensors with a primary allocation
◗◗ 10.6–10.7 GHz used by passive sensors with a primary allocation.
Footnote RR No.
5.458 indicates that administrations should keep in mind the needs of the remote sensing passive instruments in their future planning of the bands 6,425–7,075 MHz and 7,075–7,250 MHz as passive microwave sensor measurements are made in these frequency bands.
FIGURE 1. The WRC-23 agenda item assignments to WP 7C. [Diagram: WP 7C (Remote Sensing Systems) is shown as responsible for or contributing to agenda items AI 1.2, AI 1.4, AI 1.10, AI 1.12, AI 1.13, AI 1.14, AI 1.15, AI 1.16, AI 1.17, AI 1.18, AI 1.19, AI 9.1.a, and AI 9.1.d, together with WP 4A, WP 4C, WP 5B, WP 5D, and WP 7B.] WP 4A: Efficient Orbit/Spectrum Utilization for the Fixed-Satellite and Broadcasting-Satellite Services; WP 4C: Efficient Orbit/Spectrum Utilization for the Mobile-Satellite and Radiodetermination-Satellite Services; WP 5B: Maritime Mobile Service, Aeronautical Mobile Service, and Radiodetermination Service; WP 5D: International Mobile Telecommunications (IMT) Systems; WP 7B: Space Radiocommunication Applications.
FIGURE 2. The ITU Regions. [World map showing Region 1, Region 2, and Region 3.]
AI 1.4: HIGH-ALTITUDE PLATFORM STATIONS AS INTERNATIONAL MOBILE TELECOMMUNICATIONS BASE STATIONS BELOW 2.7 GHz
This agenda item seeks to extend the opportunities for the use of high-altitude platform stations as IMT base stations (HIBSs) in certain bands below 2.7 GHz already identified for IMT. The frequency bands under consideration are 694–960 MHz, 1,710–1,885 MHz, 1,885–1,980 MHz, 2,010–2,025 MHz, 2,110–2,170 MHz, and 2,500–2,690 MHz. The HIBSs are a combination of two types of systems, IMT and high-altitude platform stations, which individually have a high potential for interference.
Concerns for remote sensing related to this agenda item are the following:
◗◗ The adjacent 2,690–2,700-MHz band is allocated to passive scientific services, i.e., EESS (passive), and radio astronomy (RAS).
◗◗ Secondary harmonics from portions of the 694–960-MHz band place the L-band at 1,400–1,427 MHz at risk of interference.
Also note that some passive instruments (e.g., wideband radiometers for cryosphere and salinity studies [5], [6]) are planned to operate at 0.5–2.0 GHz without allocation.

AI 1.10: SHARING/COMPATIBILITY FOR AERONAUTICAL MOBILE AT 15.4–15.7 AND 22.21–22.5 GHz

Resolution 430 (WRC-19) invites ITU-R to conduct studies on spectrum needs for new nonsafety aeronautical mobile applications for air–air, ground–air, and air–ground communications of aircraft systems, particularly for
◗◗ possible new primary allocations to the aeronautical mobile service in the frequency band 15.4–15.7 GHz
◗◗ changing the primary 22–22.21 GHz allocation from “mobile except aeronautical mobile” to “mobile” service, which includes aeronautical mobile.

The bands 15.35–15.4 GHz and 22.21–22.5 GHz are allocated to passive remote sensing systems on a primary basis and are adjacent to the frequency ranges considered in this agenda item. However, no known missions have used or are currently using the 15.4–15.7-GHz band. The 22.21–22.5-GHz band is widely used for water vapor measurements.

AI 1.12: POSSIBLE NEW SECONDARY ALLOCATION TO EARTH EXPLORATION-SATELLITE SERVICE (ACTIVE) AROUND 45 MHz

Resolution 656 (WRC-19) calls for the study of a new secondary allocation to the EESS (active) for spaceborne radar sounders in the 40–50-MHz band. These sensors would be used for investigating subsurface properties of polar ice and arid regions. The instruments would fly in a low Sun-synchronous orbit at an altitude of around 400 km and be subject to additional operational constraints, i.e., the radar is to transmit only over some geographic areas (Antarctica, Greenland, and the Sahara) at night between 3 a.m. and 6 a.m. local time to minimize errors due to ionospheric perturbations and limit any impact on other radiocommunication services.
Technical parameters are still being discussed for an update of Report ITU-R RS.2455, “Preliminary Results of Sharing Studies Between a 45-MHz Radar Sounder and Incumbent Fixed, Mobile, Broadcasting, and Space Research Services Operating in the 40–50-MHz Frequency Range.”

AI 1.13: POSSIBLE UPGRADE OF 14.8–15.35 GHz TO THE SPACE RESEARCH SERVICE

Resolution 661 (WRC-19) invites ITU-R to conduct sharing and compatibility studies to determine the feasibility of upgrading the space research service (SRS) allocation to primary status in the frequency band 14.8–15.35 GHz, while still ensuring protection of the primary fixed and mobile services within the band and RAS, EESS (passive), and SRS (passive) in the adjacent band 15.35–15.4 GHz. Since the band would be used for transmitting and receiving scientific data and related telemetry information, this agenda item falls under the responsibility of WP 7B. The 14.8–15.35-GHz band is already a primary allocation for SRS in the U.S. Table of Allocations, and, under RR 5.340, no emissions are allowed in the frequency range of 15.35–15.4 GHz. The primary concern is the potential for radio-frequency interference from out-of-band emission (OOBE) caused by transmissions from the adjacent band.

AI 1.14: ALLOCATIONS TO PASSIVE REMOTE SENSING IN THE 231.5–252-GHz RANGE

This agenda item considers possible adjustments of the existing or potential new primary frequency allocations to the EESS (passive) in the frequency range 231.5–252 GHz, with the purpose of better aligning the EESS (passive) allocations created 20 years ago with updated passive sensor design requirements or adding possible new allocations to the EESS (passive). Current EESS (passive) allocations are 235–238 GHz and 250–252 GHz. The band 237.9–238 GHz is also allocated to the EESS (active) for spaceborne cloud radars only.
These frequencies can be used for measurement of ice cloud properties, and the 243.2-GHz band is being considered for future ice cloud imaging passive sensors.

AI 1.15: GEOSTATIONARY EARTH STATIONS IN MOTION AT 12.75–13.25 GHz

Agenda Item 1.15 is “to harmonize the use of the frequency band 12.75–13.25 GHz (Earth-to-space) by Earth stations on aircraft and vessels communicating with geostationary space stations in the fixed-satellite service globally, in accordance with Resolution 172 (WRC-19).” The potential for OOBEs into the adjacent EESS (active) allocation at 13.25–13.75 GHz is of particular concern for the scientific services. Resolution 172 (WRC-19) also mentions space-to-Earth operations at 10.7–10.95 GHz, which is adjacent to the 10.6–10.7-GHz EESS (passive) frequency band.

AI 1.16: EARTH STATIONS IN MOTION NEAR 18.6–18.8 GHz AND OTHER BANDS

Resolution 662 (WRC-19) invites the ITU-R to study and develop technical, operational, and regulatory measures for the use of the frequency bands 17.7–18.6, 18.8–19.3, 19.7–20.2 (space-to-Earth), and 27.5–29.1 and 29.5–30 GHz (Earth-to-space) by Earth stations in motion (ESIM) in nongeostationary orbit (non-GSO). The ESIM operations are intended to provide broadband data services to aeronautical (commercial and business aviation) and maritime routes (passenger cruise and merchant ships, fishing vessels, and so on) on a global basis. Within the frequency bands under consideration, the bands
17.7–18.6 GHz and 18.8–19.3 GHz are adjacent to the band 18.6–18.8 GHz, which is allocated to the EESS (passive). Instruments such as the Advanced Microwave Scanning Radiometer 2 (AMSR-2) and the Global Precipitation Measurement Microwave Imager (GPM-GMI), operating at 18.6–18.8 GHz, are already experiencing interference from reflections off the ocean surface of broadcast signals from geostationary satellites, so particular attention needs to be paid to this issue.

AI 1.17: INTERSATELLITE LINKS AT 11.7–12.7, 18.1–18.6, 18.8–20.2, AND 27.5–30 GHz

This agenda item calls for studies on provisions to allow satellite-to-satellite links to be operated in several frequency bands allocated to fixed satellite services. Some of these bands are adjacent to 18.6–18.8 GHz, where EESS (passive) systems also operate. As with AI 1.16, there are concerns about interference due to reflection off Earth’s surface as well as from the direct path to the remote sensing sensor.

AI 1.18: SPECTRUM NEEDS AND POTENTIAL NEW ALLOCATIONS TO THE MOBILE SATELLITE SERVICE FOR FUTURE DEVELOPMENT OF NARROWBAND MOBILE SATELLITE SYSTEMS

This agenda item calls for consideration of new allocations to the mobile satellite service to be used by low-data-rate systems for the collection and management of data from terrestrial devices in the following bands:
◗◗ 1,695–1,710 MHz in Region 2
◗◗ 2,010–2,025 MHz in Region 1
◗◗ 3,300–3,315 MHz, 3,385–3,400 MHz in Region 2.

The main concern for scientific services is that the frequency band 1,695–1,710 MHz is allocated to the meteorological satellite service and is primarily used for downlink of meteorological data from non-GSO meteorological satellites to ground stations around the world, thus potentially affecting regions other than Region 2. Also, this frequency band is allocated to the meteorological aids service on a primary basis in all three regions.
The frequency band 3,100–3,300 MHz, adjacent to 3,300–3,400 MHz, is allocated on a secondary basis to the EESS (active), with a potential for out-of-band interference affecting active remote sensing instruments operating in this band.

AI 1.19: NEW PRIMARY ALLOCATION TO THE FIXED SATELLITE SERVICE IN THE SPACE-TO-EARTH DIRECTION IN THE FREQUENCY BAND 17.3–17.7 GHz IN REGION 2

Remote sensing has a primary EESS (active) allocation in the 17.2–17.3-GHz band, and new fixed satellite service operations at 17.3–17.7 GHz could potentially result in increased adjacent band interference.

AI 9.1.a: SPACE WEATHER SENSORS

Resolution 657 (WRC-19) calls for studies on technical and operational characteristics, spectrum requirements, and appropriate radio service designations for space weather sensors to achieve appropriate recognition and protection in the RR without placing additional constraints on incumbent services. No regulatory action is to be taken at WRC-23 as an outcome of this agenda item. ITU-R WP 7C has prepared a comprehensive Report ITU-R RS.2456, “Space Weather Sensor Systems Using Radio Spectrum.” The following two other documents are under development:
◗◗ spectrum requirements and applicable radio service designations for receive-only space weather sensors that provide data critical for predictions and warnings
◗◗ interference criteria of receive-only space weather sensors.

AI 9.1.d: PROTECTION OF EARTH EXPLORATION-SATELLITE SERVICE (PASSIVE) AT 36–37 GHz

A preliminary study was performed for WRC-19 Agenda Item 1.6 on the protection of EESS (passive) sensors operating in the band 36–37 GHz from non-GSO space stations in large constellations at 37.5–38 GHz. As a result, WRC-19 invited the ITU-R to conduct further analyses on this topic. The nearby EESS (passive) 36–37-GHz band is critical for satellite passive microwave measurements, primarily of precipitation and sea ice.
AMSR-2, GMI, and WindSat operate in this band, and the planned European Space Agency mission Copernicus Imaging Microwave Radiometer will also use it. OOBEs into 36–37 GHz can arise in several ways:
◗◗ reflections off Earth’s surface, particularly from the ocean and ice
◗◗ direct coupling into the sensor receiving antenna
◗◗ interference with cold-sky calibration.

THE FREQUENCY ALLOCATIONS IN REMOTE SENSING TECHNICAL COMMITTEE AND THE IEEE GEOSCIENCE AND REMOTE SENSING SOCIETY VIEWS ON WRC-23 AGENDA ITEMS DOCUMENT

The Frequency Allocations in Remote Sensing (FARS) Technical Committee was established by the IEEE Geoscience and Remote Sensing Society (GRSS) in 2000. It is intended as a means for the GRSS community to discuss spectrum management issues that affect the remote sensing field and defend the interests of the remote sensing community in matters relevant to frequency allocations. Its membership includes scientists and engineers working in remote sensing at a variety of institutions worldwide. The mission of the FARS Technical Committee is to serve as an interface between the GRSS and the radio-frequency regulatory world by
◗◗ educating the remote sensing community on spectrum management processes and issues
◗◗ promoting the development of radio-frequency interference detection and mitigation technology
◗◗ organizing technical sessions at conferences, workshops, and so on regarding the preceding processes, issues, and technologies
◗◗ providing spectrum managers and regulators with technical input and perspective from remote sensing scientists and engineers
◗◗ fostering the exchange of information between researchers in different fields, such as remote sensing, radio astronomy, and telecommunications, with the common goal of minimizing harmful interference between systems.

The FARS Technical Committee is working on a document providing views on international regulatory issues affecting remote sensing operations. In particular, the GRSS Views document covers the WRC-23 agenda items introduced here that could have an impact on remote sensing operations, as well as other ITU-R topics that could also affect remote sensing. The purpose of this document is to be a tool enabling GRSS members to familiarize themselves with these issues and to inform remote sensing scientists and engineers of these concerns so that they may engage in their administrations’ decision-making processes. The FARS Technical Committee encourages the participation of the entire remote sensing community in developing this document. If you are interested in participating in this effort, please contact the Technical Committee chair and cochairs at fars_chairs@grss-ieee.org.

REFERENCES
[1] “Radio regulations,” International Telecommunication Union, Geneva, Switzerland, 2020. [Online]. Available: https://www.itu.int/pub/R-REG-RR/en
[2] “Collection of the basic texts adopted by the plenipotentiary conference,” Constitution and Convention of the International Telecommunication Union, Geneva, Switzerland, 2019. [Online]. Available: https://www.itu.int/dms_pub/itu-s/opb/conf/S-CONF-PLEN-2019-PDF-E.pdf
[3] P. de Matthaeis, R. Oliva, Y. Soldo, and S. Cruz-Pol, “Spectrum management and its importance for microwave remote sensing [Technical Committees],” IEEE Geosci. Remote Sens. Mag., vol. 6, no. 2, pp. 17–25, June 2018, doi: 10.1109/MGRS.2018.2832057.
[4] “Results of the first session of the conference preparatory meeting for WRC-23 (CPM23-1),” ITU Radiocommunication Bureau (BR), Geneva, Switzerland, Administrative Circular CA/251, Dec. 19, 2019.
[5] G. Macelloni et al., “Cryorad: A low frequency wideband radiometer mission for the study of the cryosphere,” in Proc. IGARSS 2018 - 2018 IEEE Int. Geosci. Remote Sens. Symp., pp. 1998–2000, doi: 10.1109/IGARSS.2018.8519172.
[6] J. T. Johnson et al., “Microwave radiometry at frequencies from 500 to 1400 MHz: An emerging technology for earth observations,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 4894–4914, Apr. 2021, doi: 10.1109/JSTARS.2021.3073286.

WOMEN IN GRSS (continued from p. 273)

fact that she chose to embrace her fears, and through them found her calling.” Expanding on her ILC experience, Dr. Onyia was also “honestly surprised with how much I related to the experiences and realities shared by speakers at the conference. From tips on how to advance as a woman of color in STEM entrepreneurship, to tips on promoting a psychologically safe space at your workplace (my first time coming across this concept), to rocking difficult conversations, I found these topics very relevant to my current situation with my company and my other job role.” On the virtual experience, she adds that she would still prefer the face-to-face event, especially the networking aspect, but that it was a good compromise considering the situation. She had attended a previous ILC in person and adds that it was a really eye-opening experience for her, even beyond her expectations: “The topics addressed were so contemporary and relevant to current work environments, and they provided handy tools for easy implementation.”

At IDEA, our mission is to inspire, develop, empower, and advance all GRSS members. IDEA has developed an amazing team of dedicated volunteers. The sponsorship of attendance at the ILC has proved to be an inspirational experience for our IDEA members. The ILC speakers shared experiences, insights, and advice that will stay with our four attendees as they continue to pursue their incredible careers. We are a global community, and this coming together of women leaders is helping all women feel inspired and empowered. The IDEA committee is grateful to be able to support our core-initiative coleads in their journeys.
CHAPTERS

YUMING LU AND FUYOU TIAN

Activities of the IEEE GRSS University of Chinese Academy of Sciences Student Branch Chapter

Digital Object Identifier 10.1109/MGRS.2021.3115788
Date of current version: 14 January 2022

The IEEE Geoscience and Remote Sensing Society (GRSS) University of Chinese Academy of Sciences (UCAS) Student Branch Chapter is youthful and energetic. It was established on 2 September 2013 and was the first IEEE GRSS Student Branch Chapter in Beijing. Dr. Bin Peng served as the inaugural chair, and Yuming Lu succeeded the previous chair, Fuyou Tian. Currently, the Chapter officers are as follows:
◗◗ chair: Yuming Lu
◗◗ vice chairs: Zhengdong Wang and Ke Zhang
◗◗ secretaries: Yangjian Zhang and Xinyu Qian
◗◗ treasurer: Jingjing Zhao
◗◗ Young Professionals (YP) representative: Subit Chakrabarti
◗◗ advisor: Guoqing Li.

The Chapter conducts academic exchange activities in the field of remote sensing science and technology through the IEEE and IEEE GRSS platforms. It aims to strengthen Student Members’ understanding of and connection to IEEE and the IEEE GRSS. It also helps members obtain sufficient technical resources and assistance and promotes communication between members and professionals.

SOCIALIZATION OF THE ORGANIZATION

To better promote the IEEE GRSS UCAS Student Branch Chapter, we created an official public account on WeChat, which is widely used in China. An official logo was designed for the Student Chapter by Fuyou Tian, Zonghan Ma, and Yuming Lu, which combines elements of the GRSS logo with “Beijing” (Figure 1). Our WeChat official account number is IEEE_GRSS_UCAS. We posted an introduction to IEEE and the IEEE GRSS on the site to help students better understand the Chapter and the IEEE. Moreover, we use the website and WeChat public account to summarize the work that has been carried out, to publicize upcoming work, and to mobilize everyone’s enthusiasm through publicity. As of 1 June 2021, the account had 211 subscribers.

FIGURE 1. The official logo of the IEEE GRSS UCAS Student Branch Chapter.

IEEE FELLOW ACADEMIC FORUM

The IEEE Fellow Academic Forum is a distinctive activity during which we invite an IEEE Fellow to deliver a lecture to Student Members. We have done this every year since 2015. On 17 December 2019, our Chapter held the fifth IEEE Fellow Academic Forum at the Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS). For this event, we invited the Institute of Remote Sensing and Digital Earth (RADI) Graduate Student Association to be co-organizers. Our distinguished invited guest was Prof. Bing Zhang, deputy dean of the AIR Institute, the director of the Key Laboratory of Digital Earth of the CAS, and a professor at the UCAS (Figure 2). Because of his outstanding contributions to hyperspectral image acquisition and processing, he was one of the three scientists elected as IEEE Fellows in the IEEE GRSS community in 2019. His lecture, titled “The Evolution of Civilization and Scientific Thinking,” shared his experience of and thinking about scientific research. Around 60 IEEE Student Members attended this IEEE forum.

FIGURE 2. Live photos of the fifth IEEE Fellow Academic Forum. (a) Prof. Bing Zhang presenting his distinguished lecture. (b) A group photo of the lecture.

On 25 November 2020, our Chapter held the fifth IEEE GRSS webinar and sixth IEEE Fellow Academic Forum online with the IEEE GRSS China office. Our distinguished invited guest was Prof. Jun Li, professor in the School of Geography and Planning, Sun Yat-Sen University, and editor-in-chief of IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) (Figure 3). Her main research interests comprise remotely sensed hyperspectral image analysis, signal processing, remote sensing image processing, supervised/semisupervised learning, and active learning. She is also a GRSS Distinguished Lecturer and was elected as an IEEE Fellow in the IEEE GRSS community in 2020. In her talk, titled “How to Write a Paper for Publication with IEEE,” Prof. Jun Li provided a comprehensive overview of how to write a paper for publication from the viewpoint of the editor-in-chief of JSTARS. In total, 244 people attended this lecture.

TECHNICAL ACTIVITIES

On 28 October 2020, we arranged for IEEE Members to attend the IEEE GRSS webinar, “Deep Learning for Remote Sensing Image Analysis: Applications, Methods, and Perspectives,” held by the IEEE GRSS China office (Figure 4). Our invited guest was Runmin Dong, senior researcher from Sense Time.
In total, 124 people attended this lecture. Deep learning (DL) algorithms have seen a massive rise in popularity over the past few years and have achieved significant success in many remote sensing image analysis tasks. Sense Time is a leading global artificial intelligence (AI) company focused on developing cutting-edge AI technologies driven by DL algorithms. In the report, “Sense Remote,” the DL algorithms for remote sensing images developed in-house by Sense Time were introduced, including land cover classification, object detection, semantic segmentation, change detection, and image super-resolution. The report also covered some significant issues in DL, for example, semiautomatic labeling, noisy label learning, and multitask learning. The effectiveness of these algorithms was illustrated with practical applications, such as the detection of illegal building construction and the protection of green space and other natural resources.

FIGURE 3. (a) The poster of the sixth IEEE Fellow Academic Forum. (b) A screenshot of the Zoom webinar.

FIGURE 4. Screenshots of Dr. Dong presenting her research in the webinar. (a) Dr. Dong explaining the method of building extraction and (b) showing the results of her research.

TRAINING WORKSHOP AND FAMOUS CORPORATION VISITS

We held a multiband and multispectral unmanned aerial vehicle (UAV) training workshop on 15 November 2019. During the IEEE GRSS chairman meeting in 2019, we visited Nanjing Agricultural University, where Prof. Cheng Tao, the chairman of the Nanjing Chapter, showed us the multispectral UAV in his lab. We believe that multispectral UAVs will be useful in the quantitative analysis of remote sensing data and will provide ground truth for landscape classification in the near future. We held a workshop to train members in how to use the DJ P4M, one type of multispectral drone (Figure 5).

FIGURE 5. (a) A photo of the multiband and multispectral UAV training when visiting a UAV device at Nanjing Agricultural University. (b) A photo of a multispectral drone training workshop.

The 2019 IEEE Chinese Student Congress was held in Hangzhou on 29–30 July 2019. IEEE GRSS Student Branch Chapter representatives attended the congress and visited Alibaba Group and Zhijiang Lab with other Chinese Student Members (Figure 6). The activity was organized by the IEEE Inc. Beijing Representative Office, aiming to promote cooperation between Alibaba Group and the IEEE Inc. Beijing Representative Office. Alibaba Group and Zhijiang Lab are leading technology institutions in China and are appealing to graduates. When answering a related question, Alibaba Group said that it would consider the possibility of providing internship positions for IEEE Student Members.

FIGURE 6. (a) A group photo taken when we visited Alibaba Group. (b) A photo of representatives from UCAS.

FUTURE PLANS

The IEEE GRSS UCAS Student Branch Chapter will continue to organize technical activities to support and serve Student Members and provide opportunities for Student Members to communicate with IEEE Fellows in their fields of interest. In addition, as important activities were carried out in 2020, the WeChat public account matured. We will continue to maintain and update the WeChat public account and will use it to carry out publicity work.

AUTHOR INFORMATION

Yuming Lu (luym@aircas.ac.cn) is with the College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China, and the State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100101, China.
Fuyou Tian (tianfy@aircas.ac.cn) is with the College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China, and the State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100101, China.
EDUCATION

FRANK CANTERS AND FRIEKE VAN COILLIE

IGARSS 2021 High School Program
Green in the City

Digital Object Identifier 10.1109/MGRS.2021.3126597
Date of current version: 14 January 2022

Supported by the IEEE Geoscience and Remote Sensing Society (GRSS) High School and Undergrad Student Outreach Program (HSUSO), the educational chairs of the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2021) developed a remote sensing school project targeting 16- and 17-year-old pupils in the third grade of secondary education in Flanders, Belgium. HSUSO was led by Dr. Linda Bailey Hayden and Dr. Josée Lévesque. The Green in the City project focused on the role of urban green in cities. Using dedicated course materials, including knowledge clips, manuals and tutorials, and data sets for several Belgian cities, the project offered hands-on training in basic remote sensing and geographic information system (GIS) skills. By exploring spectral reflectance properties of different materials, participating students learned how to map urban green from satellite data and analyze the relation between urban green and other environmental and socioeconomic variables at the intraurban scale. The project reached more than 500 students from 20 schools. Given the success of the project, plans to roll out an international version of the program are in the pipeline.

BACKGROUND

In 2014–2015, three Belgian universities (VUB, UGent, KULeuven) launched the Geomobiel (Geomobile) project. Over a period of one and a half years, more than 5,000 pupils in the third grade of secondary education in Flanders and Brussels were introduced to the basics of surveying, remote sensing, and GIS through a series of workshops organized on site in the more than 100 schools that participated in the project. Geomobiel turned out to be such a success that in 2019, when preparations for IGARSS 2021 started, the idea came up to reuse the remote sensing component of Geomobiel as a basis for developing a new educational project that would focus on sharpening students’ geospatial research skills. The main idea was to introduce students to basic remote sensing concepts through hands-on tutorials and train them in using geospatial tools to independently perform research on important environmental themes related to urban green, quality of life, and sustainability. (See Figures 1–3.)

FIGURE 1. A teacher instructing pupils.

TUTORIALS

With the support of the IEEE GRSS HSUSO, a set of tutorials was developed consisting of three main modules. Module 1 focuses on how to produce an urban vegetation map from medium-resolution multispectral imagery. Students are made familiar with the characteristics of multiresolution satellite imagery by examining spectral reflectance curves for different types of urban land cover. Applying this knowledge,
they learn how to produce false color composite images and how to interpret this type of imagery. They are also introduced to the concept of vegetation indices and how to combine information from different spectral bands to calculate the normalized difference vegetation index (NDVI) at the pixel level. After examining the histogram of the NDVI, they use thresholding to separate vegetation from nonvegetation (Figure 4).

In Module 2, students learn how to transform image data into maps describing environmental properties at the neighborhood level. Using basic GIS tools, they learn how to aggregate pixel-level data to administrative units and produce maps showing the spatial variation in greenness and average land surface temperature within the city. They also acquire basic cartographic skills, enabling them to produce good-looking maps with a well-chosen color scheme, legend, and other map components like map scale, north arrow, title, and so on.

Module 3 focuses on examining spatial relationships between different types of data. By linking greenness and temperature maps at the neighborhood level with census data (population density, age, income, level of education, housing, and so on), students are encouraged to explore relationships among greenness, temperature, and sociodemographic/housing characteristics (Figure 5). They are also invited to look for other data on the web that might be useful to examine the role of urban green in cities, inequalities in access to urban green, and environmental justice issues.

TEACHING THE TEACHERS

While implementing the Green in the City project, an important role was given to the teachers. Before rolling out the project in the participating schools, teachers who registered for the project were subdivided into small working groups (five teachers each).
During an introductory group session, teachers were informed about the goals of the project and the content of the tutorials and received detailed instructions for downloading data and other materials and for installing the software to get started. In addition to the tutorial materials, they also received a detailed teacher’s guide providing background information on the concepts introduced in the tutorial, an overview of dos and don’ts, as well as suggestions for working out research projects with small groups of students using the material provided. A period of one month was suggested for the teachers to get acquainted with the QGIS software and with the materials provided before starting to work on the project with their classes.

The project ran over a three-month period (March–May 2021). During the entire project, online support was provided to teachers for solving technical issues, for providing additional information on concepts or methods used, and for sharing tips and tricks. About once a month, a group session was organized to exchange information and experiences with and among the teachers. Overall, teachers and students proved very enthusiastic about the project and about the materials provided. Few problems occurred. In some schools, technical problems were reported at the very start of the project that mostly had to do with installing the QGIS software on different types of platforms. Yet, apart from these startup issues, in most schools the project ran smoothly. Although the time reserved for the project differed from one school to another, depending on the amount of free space available in the curriculum, on average students spent around 12 h on the project.

SCHOOL COMPETITION

After completing the tutorials, participating groups of students were challenged to demonstrate their skills and develop their own research project using the image and census data provided for several Belgian cities.
They were also invited to take part in the IGARSS 2021 high school program competition by preparing a 5-min multimedia presentation in English summarizing the results of their research.

FIGURE 2. Students discussing preliminary results.

FIGURE 3. Pupils carrying out independent spatial research on the relationship between urban green and temperature.
FIGURE 4. Module 1: mapping urban green from satellite data. [Workflow diagram: QGIS raster calculations on the Ghent near-infrared (band 4) and red (band 3) images give the NDVI map, with NDVI = (NIR − R)/(NIR + R); a second raster calculation gives the vegetation map, where pixels with NDVI ≥ T receive the value 1 (vegetation) and pixels with NDVI < T receive the value 0 (no vegetation). An inset shows spectral curves for water, vegetation, and soil across the blue, green, red, and infrared bands.]

Presentations were evaluated based on a) the scientific nature of the presentation and interpretation of the results; b) the storyline of the presentation; c) the layout of the presentation, quality of figures, and use of language/sound; and d) creativity and originality. The contributions of the three winners of the competition are available on the IGARSS 2021 website. The three winning groups will be invited with their class for a day visit to the European Space Agency and Technology Center in Noordwijk (NL) in the fall of 2021.

OUTLOOK
Given the success of the Green in the City project, ideas are taking shape to make the teaching material accessible and useful for a larger, international audience. As a first step, the development of an international demo version of the tutorial is on the table. Upscaling for international use will involve the translation of all teaching materials to English, including tutorials, instruction videos and knowledge clips, the teacher’s guide, and the technical manual. It also requires the preparation and formatting of image and census data sets for different cities in the world. Besides Brussels as an exemplar European city, the idea is to compile ready-to-use data sets for two other cities for this international
demo, one in the United States and one in Canada. This will allow teachers and pupils in both Europe and the United States/Canada to address questions related to urban green provision in a city they can easily relate to and to study regional and/or intercontinental differences.

FIGURE 5. Module 3: examining spatial relationships among different types of data. [Four classified maps of Ghent: greenness, population density, mean temperature, and mean income.]

ACKNOWLEDGMENTS
We would like to thank all participating students for their enthusiasm and for making the IGARSS 2021 high school program a success! We would also like to express our gratitude to the teachers for taking the initiative to participate in the project and for inspiring their students all the way.

AUTHOR INFORMATION
Frank Canters (frank.canters@vub.be) is with Vrije Universiteit Brussel, Brussels, 1050, Belgium.
Frieke Van Coillie (frieke.vancoillie@ugent.be) is with Ghent University, Ghent, 9000, Belgium.
IN MEMORIAM

Thomas C. von Deak (1954–2021)

Digital Object Identifier 10.1109/MGRS.2021.3120438
Date of current version: 14 January 2022

Thomas (Tom) C. von Deak passed away on 30 July 2021. He was a skilled electrical engineer who worked for NASA at the Glenn Research Center, Cleveland, Ohio, until his retirement in 2018. Beginning in 2020, along with consulting for the National Oceanic and Atmospheric Administration (NOAA) on matters of frequency management, he became increasingly involved with the IEEE Geoscience and Remote Sensing Society (GRSS) Frequency Allocations in Remote Sensing (FARS) Technical Committee, participating in many of its activities and becoming its senior advisor.

Tom received his B.S. degree in electrical engineering from Cornell University, Ithaca, New York. His NASA career had many phases. From 1991 to 1996, Tom worked on NASA’s Advanced Communications Technology Satellite (ACTS), contributing to the success of that unique program, which paved the way for the use of Ka-band frequencies for commercial and government communications. ACTS was a significant component of the NASA space communications program and provided for the development and flight testing of high-risk, advanced communications technologies. Using multiple spot beam antennas and advanced onboard switching and processing systems, ACTS pioneered initiatives in communications satellite technology. NASA’s Glenn Research Center (previously the Lewis Research Center) was responsible for the development, management, and operation of ACTS as part of a long legacy of experimental communications satellites. Tom led efforts to obtain public sector partnerships to collaborate with the ACTS Experiment Office; these partner experimenters represented a variety of technologies and fields, including ISDN, high data rates, and the sciences. ACTS became known as a “switchboard in the sky,” and the geostationary satellite was launched in September 1993. While originally planned for 24–48 months of operation, the system remained operational for 127.

After the launch of ACTS, Tom was part of the ACTS Experiments Office, which worked with industry and academia on various satellite-based experiments. Through its coordination, 53 Gigabit Earth Stations were built and used by more than 100 experimenters. As leader of the demonstration planning team, Tom was a participant in many of those experiments, including a groundbreaking demonstration of 520-Mb/s TCP/IP throughput using asynchronous transfer mode (ATM) among several ground stations. During this time, he authored papers for telecommunications conferences and coauthored several IEEE papers pertaining to high-data-rate applications of ACTS technology. He also represented NASA at the National Institute of Standards and Technology ISDN Forum to advance satellite–terrestrial ISDN telephony interoperability. For their pioneering achievements, Tom and the ACTS team received several NASA awards, and the ACTS Gigabit Satellite Network was officially inducted into the Space Technology Hall of Fame.

Building on his experience with ACTS, Tom also represented NASA in the ATM Forum to obtain a specification inclusive of geosynchronous satellite requirements. The ATM Forum was a telecommunication consortium that established industry standards for ATM protocols operating predominantly over fiber optics.

Just prior to the official end of the ACTS experiments program, in 2000, Tom looked for new challenges and joined the NASA Spectrum Management Office at the Glenn Research Center. He first worked on various aspects of spectrum management supporting space
research missions and space radiocommunication systems for NASA programs. He assisted, for example, in efforts to obtain spectrum authorization from the National Telecommunications and Information Administration (NTIA) to enable NASA’s Tracking Data Relay Satellite System (TDRSS) to provide support for commercial programs, and he eventually got involved in international spectrum management work, attending various technical meetings of the International Telecommunication Union Radiocommunication Sector (ITU-R).

His first foray into supporting remote sensing spectrum management came in various groups that were studying the effects of ultrawideband communications systems and their possible impact on space radiocommunication systems and remote sensing systems, especially passive remote sensing systems. As a U.S. delegate to many World Radiocommunication Conferences (WRCs), including those in 2003, 2007, 2012, 2015, and 2019, Tom contributed vital work in the area of protecting passive sensors from out-of-band (OOB) emissions. At the 2003 WRC, for example, he represented NASA on two key issues related to ensuring the long-term protection of TDRSS forward-link spectrum and shielding passive service bands from adjacent band interference. Both matters were hotly debated and required long hours of negotiation. Considered to be the most contentious topic of the conference that year, the TDRSS issue was fully resolved, ensuring long-term access for space missions, notably the Space Transportation System and the International Space Station. The U.S. head of delegation recognized Tom in writing for his contributions.

Tom continued to devote time and energy to remote sensing, particularly spaceborne passive microwave sensors providing the ability to obtain all-weather, day and night, global observations of Earth and its atmosphere. He organized and cochaired an international workshop on Earth exploration-satellite service (EESS) wideband downlink spectrum to examine how to best utilize the 8,025–8,400-MHz band for downlinking Earth remote sensing data in a cooperative manner. He developed the agenda and solicited papers/presenters from across all aspects of the EESS community (government and private sector EESS service providers, ground station operators, foreign and domestic, as well as federal regulators). Many of the attendees wrote to say that the workshop was the best they had participated in.

FIGURE 1. Thomas von Deak was a fierce advocate of frequency spectrum use for remote sensing applications.

Tom led efforts to improve the interference environment for passive sensors through the development of and advocacy for inputs to task groups of the ITU-R. He coordinated NASA interests with other U.S. government agencies, such as NOAA and the National Science Foundation. He supported the NTIA in drafting a proposal to WRC 2003 on agenda item 1.8.2. The proposal met with opposition from active service interests but underscored the need to find a means for protecting passive service operations from adjacent band interference. His efforts culminated in the WRC 2007 decision to mandate specific protections in the radio regulations for certain passive sensing frequency bands that are critical to weather forecasting and climate studies.

Delegations to WRCs represent national commercial and not-for-profit interests. Participating as a U.S. delegate requires not only expertise and the ability to converse clearly on highly technical subjects but also strong negotiating skills, gained largely from experience. Over many years, Tom was responsible for various WRC agenda items of interest to NASA, working on important studies and texts for the ITU-R Conference Preparatory Meeting (CPM) reports that form the basis of members’ proposals to the WRC. As part of that work, he was active in the Organization of American States Inter-American Telecommunication Commission (CITEL). At CITEL, he was a U.S. delegate and spokesperson for several WRC proposals that were important to NASA and remote sensing. Tom was also detailed for some time to the NASA Systems Engineering Office, Space Communications and Navigation division, at NASA Headquarters. In this role, he provided valuable input in systems planning and engineering for NASA’s communication networks.

Although he had a great deal of experience in various aspects of telecommunication and radiocommunication, especially using satellites and other space assets, Tom became passionate about spectrum management (Figure 1) for remote sensing systems and their protection from radio-frequency interference (RFI). He took on many roles, assisting the NASA remote sensing spectrum manager at various international venues, such as ITU-R Working Party 7C, which deals with remote sensing systems. He also participated in the Space Frequency Coordination Group, which includes more than 30 of the world’s civil space agencies. He represented NASA on the World Meteorological Organization subgroup on RF issues as a subject matter expert in active and passive remote sensing systems.

Tom strongly believed in mentoring and sharing knowledge. He developed content for training classes to provide information to developing-country regulators on the use of space-based remote sensing, the value of spectrum regulations, and the importance of protecting
the operation of remote sensing and GPS. He also developed and taught a popular course to International Space University graduate students. Through role-playing exercises in a model WRC setting, students learned about the procedures and practices of an actual intergovernmental treaty meeting.

It was during his work to protect passive remote sensing bands from OOB emissions that Tom became involved with the FARS Technical Committee. He recognized that the remote sensing community needed to be more involved with spectrum management to promote its interests. He took the initiative to prepare a comprehensive presentation, “Spectrum 101 for Remote Sensing,” that he gave at a special session organized by the FARS Technical Committee at the 2006 International Geoscience and Remote Sensing Symposium (IGARSS), in Denver, Colorado. He attended many subsequent IGARSS meetings and assisted FARS members with spectrum management matters through his efforts to educate and interact with remote sensing scientists and engineers. In 2020, already a FARS senior advisor on spectrum management, he proposed and became involved with an initiative to standardize the methodology to quantitatively assess frequency bands with respect to RFI and contributed, among other things, to defining the GRSS views for WRC 2023 agenda items. Overall, Tom assisted FARS members with spectrum management for more than 15 years as part of his commitment to educate and engage remote sensing scientists and engineers.

In the United States, Tom participated in some activities of the National Academy of Sciences Committee on Radio Frequencies (CORF). He gave briefings to CORF members at their annual meetings on various aspects of spectrum management, including consideration of WRC agenda items that might directly or indirectly affect remote sensing science.
He also worked with representatives of NASA’s Science Mission Directorate Earth Sciences Division to further the interest and awareness of remote sensing spectrum management within the NASA science community.

In 2013, Tom was tapped to be NASA’s remote sensing spectrum manager. Chief among his responsibilities was ensuring that all spectrum/regulatory aspects of NASA’s Earth science remote sensing program were addressed and protected in ITU-R meetings, and his contributions to ITU-R Working Party 7C were of critical importance. He advanced work at the ITU-R level in several areas, including ITU-R report RS.2178, “The Essential Role and Global Importance of Radio Spectrum Use for Earth Observations and Related Applications,” and two ITU-R recommendations: RS.1859-1, “Use of Remote Sensing Systems for Data Collection to be Used in the Event of Natural Disasters and Similar Emergencies,” and RS.1883-1, “Use of Remote Sensing Systems in the Study of Climate Change and the Effects Thereof.” These international documents have helped bring awareness of the importance of Earth observation systems and the radio spectrum that they use to many people in the telecommunications field.

Tom generated U.S. input to ITU-R document 7C/101, “Active Sensor Characteristics on Using Allocated Bands From 432 MHz to 238 GHz Applications,” and he was instrumental in developing ITU-R recommendation RS.2106-0, “Detection and Resolution of RFI to Earth Exploration Satellite Service (Passive) Sensors,” which helps explain the difficulties of dealing with RFI for passive sensors on satellites. He also contributed to the development and updates of recommendation RS.2017-0, “Performance and Interference Criteria for Satellite Passive Remote Sensing,” which added remote sensing applications above 275 GHz to the ITU-R technical literature.
He led a multiyear effort to develop recommendation RS.2105-0, “Typical Technical and Operational Characteristics of Earth Exploration-Satellite Service (Active) Systems Using Allocations Between 432 MHz and 238 GHz,” and he was a key participant in efforts to update recommendation RS.1861-0, “Typical Technical and Operational Characteristics of Earth Exploration-Satellite Service (Passive) Systems Using Allocations Between 1.4 and 275 GHz,” which was recently accepted at ITU-R Study Group 7 (Science Services), the parent organization of Working Party 7C. When the approval process is complete, the result will be an updated and much improved RS.1861-1, thanks in no small part to Tom’s efforts over many years.

After retiring from NASA, Tom continued to support the remote sensing community in various venues as a consultant to NOAA. He was always passionate about the need to protect Earth observation and remote sensing systems, especially those employing passive sensors on satellites. His calm demeanor and logical technical and policy arguments were effective in international negotiations. His insight and expertise in spectrum management, especially for remote sensing, were highly valued by his many colleagues both domestically and internationally.

Always curious about how things work, Tom had an inventive nature and often customized tools and equipment for special purposes. His passion for the beauty and sustainability of this planet fueled his advocacy for science to study Earth and mitigate climate change. With a long career that took him from early satellite communications to global negotiations on behalf of NASA and the United States, Tom left an indelible imprint on spectrum management for remote sensing, and his absence creates a hole that will be difficult to fill.
Gail Skofronick-Jackson (1963–2021)

Digital Object Identifier 10.1109/MGRS.2021.3132777
Date of current version: 14 January 2022

With a heavy heart, we announce that Dr. Gail Skofronick-Jackson, 58, of McLean, Virginia, died suddenly on 7 September 2021. Gail was deployed with a joint NASA–European Space Agency airborne campaign team in St. Croix, U.S. Virgin Islands. On a day off from the experiments, she perished in a tragic accident while hiking with colleagues.

Gail was a brilliant scientist, as well as a deeply passionate and principled person, who carried her enthusiasm for life over into her career at NASA. She was a dedicated researcher whose interests included passive remote sensing, radiative transfer theory, and the detection and estimation of falling snow using active and passive spaceborne sensors.

Gail was born in Madison, Wisconsin, on 12 February 1963, and moved to Tallahassee in 1964 with her parents, Dr. James and Dot Skofronick. Gail received her B.S. degree in electrical engineering from Florida State University. She went on to complete her M.S. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology, after which she was hired at NASA. At the time of her death, she was a program manager at NASA Headquarters, Science Mission Directorate. Dr. Karen St. Germain, director of the Earth Sciences Division at NASA Headquarters and Gail’s longtime colleague, stated that “she was one of our very best: brilliant, thoughtful, and deeply committed to the science we do and the integrity with which we do it. And she died being exactly who she was: an avid outdoors person, an athlete, and an admirer of our Earth.”

Dr. Gail Skofronick-Jackson. (Photo courtesy of Warren Shultzaberger/NASA.)

Gail is survived by her beloved husband of 29 years, Dr. David Jackson, and their children, Marina (25) and Matthew (23); her parents, Dr. James and Dot Skofronick of Tallahassee; brothers, Greg of Ann Arbor, Michigan, and Gary (Anna) Skofronick of DeLand, Florida; sister, Gretchen (Dr. Paul) Desch of Naperville, Illinois; and many nieces, nephews, aunts, uncles, and cousins.

Gail loved spending time with her family, traveling, and cooking gourmet meals with her husband. Her many interests included running, swimming, hiking, caving, and gardening. She was a dedicated mother who took great joy in her children’s numerous interests and activities. Gail cherished her many friends, especially in the McLean Moms Run This Town running community. She was also active in her church, the Foundry United Methodist Church in Washington, D.C.

Gail was an IEEE Fellow and active within the IEEE Geoscience and Remote Sensing Society (GRSS), serving on its Administrative Committee from 2012 to 2016. She was associate editor of IEEE Transactions on Geoscience and Remote Sensing and IEEE Geoscience and Remote Sensing Magazine. She was also part of the local organizing committee for the 2020 International Geoscience and Remote Sensing Symposium. Within the GRSS, Gail championed IEEE Women in Engineering, organized successful Women in GRSS events, and paved the way for the GRSS Women Mentoring Women initiative.

Because Gail was always excited to encourage young women to pursue science, technology, engineering, and mathematics careers, a memorial scholarship has been established in her name for students studying science and electrical engineering at Florida State University. Information about this scholarship can be found at https://spark.fsu.edu/Project/1935.