Authors: Gonzalez-Perez C., Martin-Rodilla P., Pereira-Fariña M.

Tags: history, archaeology, archaeological research

ISBN: 978-3-031-37155-4

Year: 2023

Text
Quantitative Archaeology and Archaeological Modelling

Cesar Gonzalez-Perez
Patricia Martin-Rodilla
Martín Pereira-Fariña
Editors

Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches


Quantitative Archaeology and Archaeological Modelling
Series Editors
Andrew Bevan, University College London, London, UK
Oliver Nakoinz, Institut für Ur- und Frühgeschichte, University of Kiel, Kiel, Germany
Quantitative approaches and modelling techniques have played an increasingly significant role in archaeology over the last few decades, as can be seen both by their prominence in published research and in university courses. Despite this popularity, there remains only a limited number of book-length treatments in archaeology on these subjects (with the exception perhaps being general-purpose GIS). ‘Quantitative Archaeology and Archaeological Modelling’ is a book series that therefore responds to this need for (a) basic, methodologically transparent, manuals for teaching at all levels, (b) good practice guides with a series of reproducible case studies, and (c) higher-level extended discussions of bleeding edge problems. This series is also intended to be interdisciplinary in the analytical theory and method it fosters, international in its scope, datasets, contributors and audience, and open to both deliberately novel and well-established approaches.
Editors
Cesar Gonzalez-Perez, Incipit, CSIC, Santiago de Compostela, Spain
Martín Pereira-Fariña, Facultade de Filosofía, University of Santiago de Compostela, Santiago de Compostela, Spain
Patricia Martin-Rodilla, Facultade de Informática, Universidade de A Coruña, A Coruna, Spain
ISSN 2366-5998
ISSN 2366-6005 (electronic)
Quantitative Archaeology and Archaeological Modelling
ISBN 978-3-031-37155-4
ISBN 978-3-031-37156-1 (eBook)
https://doi.org/10.1007/978-3-031-37156-1
This work was supported by COST Action “Saving European Archaeology from the Digital Dark Age” (SEADDA), CA 18128, https://www.cost.eu/actions/CA18128/ (CA 18128) and by grant PID2020-114758RB-I00 funded by MCIN/AEI/10.13039/501100011033
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.
Foreword
How do archaeologists come to establish and validate accounts of the past, based on their encounters with its material remains as mediated by fieldwork, collections, and years of study and toil? How do they justify claims they make, and on what grounds do they accept or reject claims made by others? How do they reach good decisions as they investigate, construct, curate, and communicate the archaeological record? What are archaeological facts, and how do they come to be accepted as such? What are the traits of sound archaeological syllogisms? And, more generally, what is archaeological knowledge? Where can we find it, and in which forms does it manifest itself? How can it be captured, represented, and analyzed? How is it communicated, debated, and evaluated? Is there “good” and “bad” archaeological knowledge, and how can we tell them apart? Which factors are at play in knowledge-making, and in knowing? What are the implications and stakes of archaeological knowledge, and the ways it comes into being? Few archaeologists spend much time reflecting directly on this Pandora’s box of vexing questions. Yet many of them, prompted by engaging with the transdisciplinary perspectives in this exciting volume on the use of computational approaches to discourse and argument analysis in archaeology, are central to methodological aspects of archaeological research, and to the acquisition of archaeological expertise. For one thing, competent archaeologists should surely be able to reason on the validity of an archaeological study in their area of expertise, and, beyond that, to produce research findings substantiated by persuasive arguments, supported by reliable evidence, and consonant with accepted knowledge in their field.
On the other hand, scholars of archaeological theory, as well as those concerned with policies, decision-making, and interventions related to the preservation of archaeological heritage, its multiple and often conflicting socioeconomic, cultural and symbolic uses, and the future of archaeological work, need also to grapple routinely with questions related to the factors under which archaeological knowledge is produced, the felicity conditions under which archaeological facts can be deemed to be acceptable, and the status, impact, and repercussions of resulting knowledge for contemporary societies. In almost all aspects of archaeological work, researchers and professionals are inevitably entwined in knowledge-laden activities,
as they engage with the body of scholarship in their area of expertise; as they identify research topics and questions; as they collect, represent, and analyze evidence from archaeological fieldwork and collections; as they develop identifications, classifications, descriptions, explanations, and, more generally, accounts of the material record of humanity and its implications for past societies and cultures; as they produce archaeological reports, catalogs, databases, monographs, articles, and conference papers; as they debate and come to conclusions on the validity of research ideas and findings, and on deliberations on the management and use of archaeological heritage, be it in scholarly publications, administrative and policy venues, or in informal interpersonal settings including online communications; and, last but not least, as they address the historical and contemporary misuses of archaeology by political and state actors, the appropriation of research agendas and heritage policies by dominant ideologies and sectarian and economic interests, and of archaeologically manifested phenomena by sensationalism, pseudo-science, and irrationalism. We might assert, paraphrasing Bruno Latour, that archaeology, not unlike experimental science, “has two faces: one that knows, and one that does not yet.” The latter is of relevance here.
It offers a view of the discipline not as “readymade science” with its middle-range theories and accounts of particular sites, cultures, periods, artifact types, etc., but as a “science in the making”: a domain where archaeological knowledge, as an object (manifested in the representations of ideas in texts, visual representations, data structures, and the like), is examined in its articulation with archaeological knowing or knowledge-making as an activity, ripe with “uncertainty, people at work, decisions, competition, controversies.” It is precisely in this domain of archaeological activity where the Pandora’s box of our initial questions is primarily located. Studying how archaeologists establish ideas, facts, and assertions from their encounters with the material remains of the past, from the translation of the material record of features and finds in the field into an informational record made of descriptions, data points, visualizations, enmeshed with identifications of sites, archaeological contexts, artifacts, types and assemblages in the excavation report, and further developed into typologies, seriations, and other manifestations of archaeological systematics, as well as into synthetic accounts and interpretations, explanations, and theories in scholarly publication, has been a fruitful way to approach archaeology “in the making.” From publications such as Mike Edgeworth’s fascinating ethnography of the “acts of discovery” in an unnamed excavation in Britain, to the fertile qualitative investigations of diverse aspects of archaeological information work in northern Europe by Isto Huvila, and the multisited study of archaeological curation across different stages in the formation of four North American archaeological collections in Sarah Buchanan’s insightful doctoral dissertation, the study of archaeological practices and knowledge work has emerged as the pursuit of a growing trans-disciplinary community of researchers concerned with making sense of the agents, processes, settings, mediating tools, and objects of archaeology “in the making.”
A central aspect of “archaeology in the making” concerns how archaeological data, facts, and assertions related to them are represented in different genres of representations, and how such representations – from descriptive records, lists, and catalogs to research publications – underlie different modes of archaeological knowledge production. As I argued in an earlier manuscript (Dallas, 2016), we owe a seminal, and perhaps the first, systematic attempt toward a theorization of these questions to the still under-appreciated intellectual contribution of French Classical archaeologist and information scientist Jean-Claude Gardin. A pioneer of computational analysis in archaeology in the 1950s, he was initially preoccupied with the development of analytical “codes” or vocabularies for the formal description and classification of archaeological artifacts, culminating in the development of his Syntol free structure indexing language, a means for representing the content of documents through n-place predicates expressible in a machine language. Drawing critically from fields as diverse as documentation, classification theory, material culture studies, structural linguistics, argumentation theory, and philosophy of science, in his “Document analysis and linguistic theory” (1973), Gardin then expands his earlier attempts to account for the intellectual content of archaeological documents through term indexing by an added emphasis on their syntax and semantics, noting that “the boundary between syntax and semantics becomes so fuzzy that it is not possible any more to regard syntax as independent nor to confine semantics to an interpretative function.” This is the foundation of Jean-Claude Gardin’s seminal contribution to the theory of archaeological argumentation and discourse, translated into English as Archaeological constructs: an aspect of theoretical archaeology (1980). The book is a formidable theoretical construct in its own right.
In the first chapter, it outlines Gardin’s “iterative model” linking the acquisition of archaeological materials with their annotation and consequent generation of propositions, and offers examples of what he calls a “logicist analysis” of processes of cataloging, classification, pattern recognition, and historical inference that constitute the “lifecycle” of archaeological knowledge process. He then goes on to analyze processes relevant to the construction of two very different kinds of archaeological publications: “compilations,” such as finds catalogs or excavation reports, typically concerned with material remains of the past and their attributes, and “explanations,” such as synthetic monographs and interpretative accounts of ancient societies, their history, and mode of life. In his analysis, he castigates the failure of traditional archaeological publication in the narrative genre to attend to methodological rigor, theoretical frugality, and clarity, even often violating sound reasoning. As an alternative, he advocates the “condensation” of archaeological scholarly prose through a process of schematization, taking the form of an ordered tree of logical inferences using modus ponens, and operating on a lexicon of structures of symbols representing propositions – in other words, an inference tree. But then, Gardin adds the following qualification: “I am not proposing a new handbook on archaeological theory, from which students can learn the techniques of observation and interpretation [ . . . ] my goal is an analysis of the mental operations carried out in archaeological constructions of all sorts, from the collecting of data to
the writing of an article or book in published form.” While his action-oriented, even polemical, advocacy of a mode of archaeological communication based on formal reasoning is undeniable, he notably advances also a salient approach to representing and understanding the way actual archaeological argument unfolds in practice: a way to make archaeologists “more aware of the empirical or social limits of our interpretations” – what he calls “a practical epistemology” of archaeological knowledge. Adopting Stephen Toulmin’s criterion of “reasonableness,” he advocates an archaeology whose propositions and theories, as represented in its publication practices, stand the test of reason, but also intends his logicist schematization as a means “to gain a deeper understanding of what our interpretive writings ‘are’, as symbolic constructs; we also wish to evaluate what those constructs can ‘do’, in the universe of discourse under study.” The most notable methodological contribution of Gardin’s theorization of archaeological argumentation concerns archaeological publication. His method of re-expressing traditional archaeological argument in terms of a lexicon of symbols and a set of argumentation operations has been adopted by a limited number of studies. Among them, ethnoarchaeologist Valentine Roux’s Arkeotek project goes beyond logicist schematization to address the interdependence between archaeological data constitution on the one hand and scholarly argumentation on the other. Its hypertext-based “Scientific Constructs and Data” model provides for integrating archaeological argumentation structure with descriptive archaeological data. Further work demonstrates the possibility of modeling the logicist schema of scholarly reasoning as a formal ontology.
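As an illustration only, the kind of inference tree described above, built by chaining modus ponens steps over propositions, can be sketched in Python. This is not Gardin's notation, nor the Arkeotek data model; the `Rule` class, the `derive` and `explain` functions, and the example propositions about layer 3 and wall W are all hypothetical, invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One inference step: if every premise holds, the conclusion holds."""
    premises: tuple[str, ...]
    conclusion: str


def derive(observations: set[str], rules: list[Rule]) -> dict[str, Rule]:
    """Apply modus ponens until nothing new follows.

    Returns, for each derived proposition, the rule that justified it.
    """
    known = set(observations)
    support: dict[str, Rule] = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and all(p in known for p in rule.premises):
                known.add(rule.conclusion)
                support[rule.conclusion] = rule
                changed = True
    return support


def explain(proposition: str, support: dict[str, Rule], depth: int = 0) -> list[str]:
    """Render the inference tree below a proposition as indented lines."""
    lines = ["  " * depth + proposition]
    rule = support.get(proposition)
    if rule is not None:  # observations have no supporting rule and end the branch
        for premise in rule.premises:
            lines.extend(explain(premise, support, depth + 1))
    return lines


# Hypothetical observations and rules, invented purely for illustration.
OBSERVATIONS = {
    "sherds of type A occur in layer 3",
    "type A is dated to period X at other sites",
    "wall W cuts layer 3",
}
RULES = [
    Rule(
        ("sherds of type A occur in layer 3", "type A is dated to period X at other sites"),
        "layer 3 dates to period X",
    ),
    Rule(
        ("layer 3 dates to period X", "wall W cuts layer 3"),
        "wall W is no earlier than period X",
    ),
]

if __name__ == "__main__":
    tree = explain("wall W is no earlier than period X", derive(OBSERVATIONS, RULES))
    print("\n".join(tree))
```

In this sketch, a conclusion is listed with its supporting premises indented beneath it, so a reader can trace each interpretive claim back to the observations it rests on. A claim with no chain of support never enters the tree, which is one simple way to read the demand for explicit reasoning.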
In a parallel development, the UK Archaeology Data Service’s Internet Archaeology journal featured, as early as 1997, a similar capability, offering interactive access to archaeological studies that allowed simultaneous access to scholarly claims and supporting data: a non-lasting experiment which, nevertheless, still goes beyond the current state-of-play in research data publication. Such attention to the structure and content of archaeological scholarly communication, and its reliance on the propositional content and structure of publications, is self-evidently justified on the pragmatic grounds of allowing better access to and evaluation of claims made by archaeological research. Yet, dealing with argumentation and discourse in archaeology makes the case for accounting, beyond methodology, for ontological, epistemological, and axiological considerations. In other words, when we consider archaeological knowledge “in the making” as a worthy subject of study, we need to decide on questions of existence, knowledge, and values. As regards ontology, most archaeologists would agree that their domain of reference – including material remains of past human activity and past people – exists, or has existed, independent from our knowledge of it; that it consists of differentiated objects and structures – be it natural or social – which have powers and ways of acting that contribute to the production of events; that apart from actual objects accessible directly to experience, this external world is also composed of latent, underlying entities and relations between observable entities, yet such relations may be contingent rather than necessary; but also that, unlike natural objects, social particulars such as a specific action, an artifact, or an archaeological
culture are dependent also on categories accessible only within our own interpretive frame, even if we still admit that they exist regardless of our specific interpretation of them. At the epistemological level, on the other hand, many (but not all) workers in the field will admit that archaeological knowledge is theory-laden, socially constructed, and historically situated; therefore, what we accept as true today may be falsified tomorrow, and “thought collectives” (to use Ludwik Fleck’s useful term) adopting different theoretical premises may legitimately have conflicting views of what constitutes knowledge on a given subject; that there are both continuities and discontinuities in the evolution of archaeological knowledge; and that the production of archaeological knowledge is a social practice, and therefore social relations, context, and interests, as well as the ways in which archaeological knowledge is communicated (typically, through historically sanctioned genres of information carriers), influence its content. Finally, at the axiological level, most archaeologists would adhere to the idea that archaeological research should be critical of its object of inquiry, and that the understanding of archaeological phenomena entails viewing them critically; some would also add that archaeological practice should be emancipatory, and adhere to values of social justice and an ethics of care. Readers with an interest in the philosophy of science may recognize that this set of ontological, epistemological, and axiological positions is aligned with a critical realist account of the human sciences (and, in fact, derived directly from Andrew Sayer’s account of critical realist assumptions): a transcendental realist ontology, a constructivist epistemology, and a value-laden, reflexive axiology.
In tandem, a critical realist account conceives the process of archaeological explanation – one common objective of archaeological argumentation – as consisting of the identification of some past human activity or phenomenon to be explained and its resolution into elements, re-description of these elements in the theoretical language of archaeology (or the approach to archaeology espoused by the researcher), a retroductive attempt to describe the likely structural conditions (such as causal mechanisms, material-semiotic rules, etc.) and tendencies involved, and, finally, a process of elimination of alternative causes, or explanations. Of course, not all archaeological research aims at explanation: in fact, the reliance of archaeological knowledge related to social aspects of past reality on categories (kinds) that can only become accessible through human cognition – those which, in a more clearly constructivist vein, have been called “interactive kinds” by philosopher Ian Hacking – on the shared scholarly language of the epistemic community in which an archaeological study is situated, makes it clear that words used for identification or assignment of properties of archaeological entities have consequences on the content of archaeological knowledge. In other words, far from being the result of menial or mechanical work with limited value as knowledge, archaeological descriptions, such as those found in field recording sheets and collections databases, do matter. This has an interesting implication on what we consider as the scope of archaeological argumentation. Clearly, a causal syllogism connecting an archaeological phenomenon to likely causes, or a justification provided for some intervention concerning the protection and use of an aspect of the archaeological heritage,
belongs within the purview of argumentation. But what about a finds database? What about the identification of some archaeological feature, its assignment to some particular function, provenience, or cultural period, in a catalog without explanatory aspirations? What about the broad range of visualizations often included as part of archaeological reports and publications? What of the illustrations – figures, photographs, diagrams, models – often accompanying archaeological texts? Are we to assume that they play no role in archaeological argumentation, and, if so, that they are not involved in knowledge production? The last statement points to an interesting conundrum: pragmatically, the very inclusion of visualizations and illustrations within archaeological documents indicates that they contribute to knowledge production. If we were to accept that they do not participate in argumentation, then we would need to posit other rhetorical modes of archaeological knowledge beyond argumentation. But, in fact, it should not surprise us that no archaeological document consists solely of propositions linked together to form an argumentation structure. The most lucid exposition (pun intended) of this is provided by Gavin Lucas in his recent Writing the Past monograph, where he demonstrates how argument not only co-exists but in fact cooperates in the very same text toward archaeological knowledge construction with instances of all three alternative rhetorical modes systematized as early as the nineteenth century in the context of rhetoric and composition studies: narrative, presenting a story unfolding through time through the involvement of actors and events; description, involving the presentation of qualities and attributes of some observed object or event; and exposition, explaining or clarifying a topic or issue.
How, then, different archaeological communication objects mobilize different rhetorical modes, and how they are articulated in reports and publications to construct archaeological knowledge, is a fascinating topic. Going beyond rhetorical modes, the example of archaeological visualizations, which I had the opportunity to reflect upon a few years ago in an interesting conference session on “Visualization as analysis in archaeology,” provides good insights on how a site section and “hermeneutic matrix” diagram may act as an exposition of the temporality and longevity of each excavation cut; how a dynamic virtual reconstruction of the Antikythera mechanism captures performative knowledge, and supports a plausible explanation, about the function of the mechanism; more generally, how archaeological visualization constitutes an objectual epistemic practice rather than being merely an act of display; and how an archaeological 3D visualization can act as an “epistemic contract” (borrowing Harold Garfinkel’s identification of the transcript of an outpatient clinic interview as “therapeutic contract” rather than as “actuarial record”), made to support the generation of knowledge claims in further steps of the interpretation ladder, rather than to represent faithfully “what the sensor saw.”
This edited volume is not an archaeological study. It is, rather, a collective work about archaeology as a field of knowledge and as a practice of knowledge-making. It offers a shared foundation useful to archaeologists curious about the conditions of archaeological knowledge production and the potential of computational approaches for opening new paths for communicating and validating archaeological research,
computer scientists from the fields of natural language processing and argumentation support, information researchers interested in archaeological practices and knowledge work, anthropologists and sociologists of science, and others interested in how archaeologists produce knowledge through argumentation “in use.” In the spirit of the agonistic nature of argument, the volume accommodates diverse, and in some cases dissonant, conceptualizations and computational approaches to argumentation and discourse, ranging from archaeological to computational, from normative to accommodative, from pragmatic to illustrative, from synthetic to highly focused, and from instrumental to critical. It provides useful insights, and stimulates ample reflection toward new questions. It is unique in combining critical and theoretical accounts of archaeological discourse and knowledge work, and overviews of key computational approaches to discourse and argument analysis, with examples of specific applications to the formal representation of archaeological knowledge, ranging from the identification of topics through computer-assisted recognition of historical names and common descriptors, to formal conceptualizations that allow the articulation between the domain of archaeological discourse which archaeological texts inhabit, and the domain of past human activity which such texts refer to. Reiterating the core thesis he originally advanced in The Uses of Argument, Stephen Toulmin admits to “a single, deeply held conviction: that, in science and philosophy alike, [people] demonstrate their rationality not by ordering their concepts and beliefs in tidy formal structures, but by their preparedness to respond to novel situations with open minds – acknowledging the shortcomings of their former procedures and moving beyond them.
Here again, the key notions are ‘adaptation’ and ‘demand’, rather than ‘form’ and ‘validity’.” In a similar vein, the dynamic nature, historicity, and pragmatic situatedness of archaeological argumentation are acknowledged across this volume. In diverse ways, different chapters address the content of archaeological argumentation, offer methods and examples to identify its subject-matter computationally and to represent formally its logical and procedural structure, and offer insights on the conditions under which particular claims are (and should be) accepted. They account for the reliance of archaeological argumentation on communicative processes, set in motion by archaeologists in conversational semiotic activity governed by historically situated systems of signification. Furthermore, they also engage with the dependence of archaeological discourse on reference to “things-in-the-world” – empirically manifested aspects of the archaeological record, persons and collectivities, objects, places, and events, as well as conceptual entities comprising the subject-matter of arguments. Finally, they illustrate how discourse “in use” hinges on the pragmatic dimensions of archaeological knowledge work – affiliation to thought collectives (to use Ludwik Fleck’s salient notion) and communities of interest with their shared communicative codes and accepted knowledge, presuppositions, norms, motivations, affects, and future stakes – which underpin the discursive activity of archaeologists as they respond and adapt to a changing field of epistemic, ethical, political, socioeconomic, and cultural challenges. Reaching beyond epistemological, methodological, and axiological considerations on the nature, poetics, and politics of archaeological
knowledge, argumentation, and discourse, which have been the focus of numerous earlier contributions (from Jean-Claude Gardin to Alison Wylie, Rosemary Joyce, and Gavin Lucas, to name but a few), this volume provides a pragmatically useful body of knowledge on the relevance, critical context, methods, and practical applications of discourse and argument analysis technologies as tools to represent, analyze, and reflect on archaeological knowledge and its production, aptly demonstrated through salient case studies of computational approaches. At a time when the representation of the archaeological record and the production of archaeological knowledge are increasingly mediated by digital research infrastructures and associated standards, tools, and procedures, and when the promises of deep learning and artificial intelligence assume renewed impetus across the disciplines, the task of understanding archaeological discourse and argumentation as knowledge work becomes an urgent undertaking. This volume addresses consequential issues and offers examples of promising computational approaches for representing the dynamic structure and situated process of archaeological argument, and its discursive and pragmatic underpinnings in past and contemporary realities. It opens important additional questions, contributing to the emergence of an important interdisciplinary subfield bridging archaeological theory and method with computational approaches to meaning and argument analysis. Most importantly, it also provides a springboard for intervening, by mobilizing the archaeological community to act toward the use of computational technologies to enable reflexive, critically informed, and relevant approaches to the production, publication, epistemic validation, and use of archaeological knowledge, adapted to the demands and challenges facing contemporary societies, and the planet.
Faculty of Information, University of Toronto, Toronto, Canada
Costis Dallas
Reference
Dallas, C. (2016). Jean-Claude Gardin on archaeological data, representation and knowledge: Implications for digital archaeology. Journal of Archaeological Method and Theory, 23(1), 305–330. https://doi.org/10.1007/s10816-015-9241-3
Preface
Most of the knowledge that we produce in archaeology comes from careful argumentation from basic premises to elaborate conclusions. Initial premises include descriptions of finds, features, sites, and landscapes, while conclusions range from settlement patterns to trade routes or social organisations. In this regard, most archaeological texts constitute discourses aiming to persuade the reader to accept a series of conclusions based on some initial premises, often factual and evidentially supported. Whether or not an archaeological text is capable of persuading its readers and thus advancing the state of the art in the field depends on the quality of the chosen premises as well as the robustness of the subsequent argumentation. Therefore, paying attention to discourse and argumentation in archaeology constitutes a crucial aspect of meta-research. Language technologies have evolved rapidly over the last 10 years, and today we can process natural language on a computer with relative ease, at least for some well-defined purposes. The conceptualisation of discourse and argumentation has advanced significantly as well, together with applied approaches. Although the importance of discourse and language in archaeology has been pointed out by many authors, there is no comprehensive work to date that presents a panoramic view of argumentation and discourse approaches and technologies in archaeology. In this book, we aim to provide this.
Audience and Objectives
This book is aimed at archaeologists with an interest in language, discourse, and argumentation, and specifically in how archaeological conclusions are obtained through argumentation processes. In particular, researchers in archaeology can find the book useful to gain a better understanding of how argumentation can take us from premises to conclusions and learn how to do it better. Lecturers and students of archaeology can use the book to learn specific conceptual approaches and
xiv Preface computational approaches to discourse and argumentation analysis for archaeological texts. All in all, the book aims to provide a comprehensive overview of conceptual approaches and computational techniques for argument analysis in archaeology. It does so by building slowly from scratch, starting with introductory topics and progressing towards advanced and more specialised issues. Also, the book unites theory and practice, providing a comprehensive panorama of conceptual approaches and computational techniques. The book starts with the basic foundations of discourse and argumentation analysis, introducing the main goals of discourse analysis, presenting different approaches to what an argument is, and concluding with cutting-edge and stateof-the-art technologies for the fully automatic analysis of texts. In addition, the book tackles different contexts where archaeological discourses are found, from data collected during fieldwork to archiving of excavation reports or court resolutions on heritage-listed items. The book also presents an updated review of approaches and methods related to natural language processing and text mining that are applicable to archaeological settings, and at multiple linguistic levels such as lexical, grammatical, and discursive. Also, the book proposes some methodological approaches for the analysis of argumentative strengths and weaknesses in archaeological texts based on Toulmin’s schemes. Finally, the book considers different degrees of formalisation in discourse analysis, from critical Foucauldian approaches to the more quantitative computational analytics, and takes into account the social dimension of archaeological discourse production. Book Structure This book is organised into two major sections: Conceptual Approaches and Computational Techniques. A preface provides a general introduction, and a final chapter offers some speculations as to what the future of discourse and argumentation in archaeology may look like. 
The first section, Conceptual Approaches, contains a collection of contributions from different foundations and perspectives, offering a comprehensive overview of the discursive and argumentative phenomenon in archaeology and its ramifications. In Chap. 1, Martín Pereira-Fariña presents the fundamental principles of discourse analysis and three different theoretical approaches to how arguments can be represented, summarising the process to transform raw data into an annotated corpus that allows us to draw conclusions anchored in how language is used in context. In Chap. 2, Stephen Stead deals with the issue of documenting the argumentation in a discourse so that it can interoperate with other sets of data. In Chap. 3, Michael E. Smith offers a historical journey through different stages and degrees of importance attributed to the study of archaeological argumentation, analyses some reasons for the low level of attention that is paid to argumentation in archaeology today, and presents a methodological proposal based on argument strengths and weaknesses. In Chap. 4, Alejandro Sobrino and Beatriz Calderón introduce a theoretical framework for the analysis of causal linguistic structures related to culturally relevant elements, acknowledging that causality can be linguistically expressed in multiple ways, and showing how this issue can be tackled. In Chap. 5, in turn, Cesar Gonzalez-Perez focuses on what archaeological texts talk about and presents an approach to connect the argumentation in the discourse with the underlying ontological elements in the world, using a referential device named ontological proxies. In Chap. 6, Isto Huvila takes a more sociological, anthropological, and critical approach to archaeological discourse and reflects on discourses in archaeology as situated in their social context of production, including an analysis of the role of different agents and the impact of new ways of discourse production such as social networks or other techno-mediated mechanisms. In Chap. 7, Cesar Gonzalez-Perez, Martín Pereira-Fariña, Patricia Martín-Rodilla, and Leticia Tobalina tackle the issue of vagueness in archaeological discourses and present a conceptual framework to capture and manage vague information from the field to the text. Finally, in Chap. 8, Jeremy Huggett uses a multimodal approach to extend discourse analysis in archaeology beyond the mere text.

The second section, Computational Techniques, provides a sample of some algorithmic approaches that have proved useful to deal with discourse and argumentation in archaeology. In Chap. 9, Patricia Martín-Rodilla offers an introductory overview of how computer-based processing of natural language has been applied to archaeological texts, and what major lines of work exist today. In Chap. 10, Holly Wright, Tim Evans, and Katie Green deal with the natural language processing of lexicon in archaeological texts from the perspective of a large digital archive, showing how these techniques are useful for information extraction for researchers. In Chap. 11, Alex Brandsen deals with text mining at the lexical, grammatical, and discursive levels, as well as machine learning applied to archaeological texts. In Chap. 12, John Lawrence, Martín Pereira-Fariña, and Jacky Visser go beyond the discourse itself to explore the mining and analysis of arguments from plain text, with a special focus on argument analytics and result dissemination. Lastly, in Chap. 13, Maria Elena Castiello provides an approach to processing the vagueness that is inherent to archaeological language in a site modelling context.

For those readers who have a special interest in a particular topic, the book admits a theme-oriented reading in addition to a linear sequence of chapters. Chapters 2, 3, and 4 in Part I, as well as Chap. 12 in Part II, deal with argumentation and different approaches to understanding how people argue to defend their standpoints. Chapters 5 and 7 in Part I, as well as Chaps. 9, 10, and 11 in Part II, deal with lexical, grammatical, and semantic language processing. Finally, Chaps. 4, 6, 7, and 8 in Part I, as well as Chap. 13 in Part II, deal with language as used in context, including social aspects, vagueness, and multi-modality.

Enjoy reading!

Santiago de Compostela, Spain    Cesar Gonzalez-Perez
A Coruna, Spain    Patricia Martín-Rodilla
Santiago de Compostela, Spain    Martín Pereira-Fariña
Acknowledgements

The editors wish to thank the authors of the chapters of this book for their generous contributions, as well as the Springer staff who guided and helped us throughout the publication process. The editors must acknowledge the contributions and support of the following grants towards the preparation of this book: project “Heritage 3.0: Argumentation and Conceptual Modelling for Enhanced Cultural Heritage Participation and Management Policies” (ACME), grant number PID2020-114758RB-I00 funded by MCIN/AEI/10.13039/501100011033; project “Deflationist Views in Ontology and Metaontology”, grant number PID2020-115482GB-I00 funded by MCIN/AEI/10.13039/501100011033; project “Saving European Archaeology from the Digital Dark Age” (SEADDA), grant number CA18128 funded by EC COST Actions; and Consellería de Educación, Universidade e Formación Profesional (accreditation 2019-2022 ED431G/01, ED431B 2019/03); and European Regional Development Fund, which acknowledges the CITIC Research Centre in ICT at the University of A Coruña as a member of the Galician University System.
Contents

1 Introduction to Discourse Analysis and Argumentation Theory
  Martín Pereira-Fariña

Part I Conceptual Approaches

2 Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches
  Stephen Stead
3 Making Good Arguments in Archaeology
  Michael E. Smith
4 A Causal Model Application to a Cultural Heritage Sentence Analysis
  Alejandro Sobrino and Beatriz Calderón-Cerrato
5 What Archaeological Texts Argue About: Denotations and Ontological Proxies
  Cesar Gonzalez-Perez
6 The Social Production of Discourse in Archaeology
  Isto Huvila
7 Dealing with Vagueness in Archaeological Discourses
  Cesar Gonzalez-Perez, Martín Pereira-Fariña, Patricia Martín-Rodilla, and Leticia Tobalina-Pulido
8 Extending Discourse Analysis in Archaeology: A Multimodal Approach
  Jeremy Huggett

Part II Computational Techniques

9 Computer Processing of Language: Where Archaeological Discourse and Computers Meet
  Patricia Martín-Rodilla
10 NLP and Archaeology: A View from a Digital Archive
  Holly Wright, Tim N. L. Evans, and Katie Green
11 Information Extraction and Machine Learning for Archaeological Texts
  Alex Brandsen
12 Argument Mining and Analytics in Archaeology
  John Lawrence, Martín Pereira-Fariña, and Jacky Visser
13 Computational Processing of Language Vagueness for Archaeological Site Modelling
  Maria Elena Castiello

Part III The Future

14 Future Directions
  Cesar Gonzalez-Perez, Martín Pereira-Fariña, and Patricia Martín-Rodilla
Chapter 1
Introduction to Discourse Analysis and Argumentation Theory

Martín Pereira-Fariña

M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: martin.pereira@usc.gal

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_1

Abstract Discourse analysis is an explicit and systematic study of the structures, strategies and manoeuvres of texts or talks in terms of a given theoretical framework. The current stage of computational technologies allows us to tackle this task from different perspectives. Throughout this chapter, I explore how an argument can be characterised and analysed from three theoretical perspectives (logical, pragmatic and cognitive). Each of these approaches leads us to a different type of discourse analysis, emphasizing different angles of the same text, which shows the richness of this analytical framework. After that, I describe the main steps for transforming raw text into an annotated corpus, which is essential to draw any reliable conclusions from it. Annotation is a complex task, essential for a good quality analysis of discourse, but it can be split into doable steps. Finally, the chapter concludes with some ideas for the exploitation of these results and how they can be disseminated.

Keywords Argumentation · Annotation · Corpus creation · Discourse analysis · Ontology

1.1 Introduction

It is 4 pm on a cold day in February. Two senior archaeologists are discussing the future of the Cave of Altamira [1], a set of charcoal drawings and polychrome paintings that constitute one of the first masterpieces in the history of mankind. Sitting in front of each other, together with a moderator, they debate a question that has been floating around Altamira during the last 20 years: should the Cave be opened for public access or just for experts with research purposes? Experts have mixed opinions:

• Researcher 1 (R1): I think the question is basically that the Cave of Altamira should be opened because it is obviously a place, to say simply, that everyone has the right to visit. Thus, from that principle, I think that all that can be negotiated, discussed, and talked about, it’s under what conditions and, above all, for what, that is, what is the benefit of opening it, right? Starting from the principle that heritage, everyone has the right to access it, then, considering the problematic and risky conditions that the cave has, from there one can think of possible restrictions and criteria to restrict and so on, but from the outset, I mean, it must be open. That is my position.
• Researcher 2 (R2): I think not, precisely for the same reason; because everyone has the right to access heritage, but if everyone accesses heritage, that heritage is destroyed, isn’t it? So, I think that access should be restricted to experts, let’s say, and to researchers, and in fact I think that there was even a more or less exact reproduction and, well, that for tourism purposes or just for dissemination, I think that it could therefore do the job quite well. And basically, that is my point. If you want, we can go into more detail, but...
• Moderator (M): Only experts, you say, should access to the cave.
• Researcher 2 (R2): Yes, indeed, just researchers. Let’s say, people who are in research centres and it is essential, come on, that they enter to check certain things, for example... I don’t know.

[1] World Heritage Site by UNESCO located in Santillana del Mar (Cantabria), North of Spain. For more information: http://www.culturaydeporte.gob.es/mnaltamira/en/home.html

We can easily appreciate how each position is argued and how a certain common background is presupposed. So, how are both discourses elaborated? Are they talking about the cave as a physical object or a social object?
What type of reasons do they use for supporting their corresponding positions? What is the connection between the researchers and the cave? Are they considering themselves as experts or as regular visitors? Could they have any potential conflict of interest? What is the context in which this debate is happening? This is not just a scientific debate; it also has a social impact. Understanding and evaluating it requires unpacking the connections between language, reality, and speakers. Discourse analysis tackles precisely this question, as defined by Paltridge (2012, p. 2):

    Discourse analysis examines patterns of language across texts and considers the relationship between language and the social and cultural contexts in which it is used. Discourse analysis also considers the ways that the use of language presents different views of the world and different understandings. It examines how the use of language is influenced by relationships between participants as well as the effects the use of language has upon social identities and relations. It also considers how views of the world, and identities, are constructed through the use of discourse.

Therefore, discourse analysis does not study the language itself, but the language in use (Gee, 2011). It is a broad and interdisciplinary field, connected with other disciplines such as semiotics (Eco, 1979), linguistics (Serrano, 1983) or communication studies (Chandler, 2003). There are two main methodological approaches, which are differentiated by their respective goals (Gee, 2011):

• Descriptive: It aims to understand how language works in different communicative situations: what are the topics of discussion, what is the grammar applied to produce meaning, what are the different stylistic resources or manoeuvres to produce meaning, etc.
• Critical: It aims to intervene in social, political or cultural problems and controversies and provoke changes in the world based on studying how language works.

In this chapter, I will focus on a descriptive type of discourse analysis whose main goal is to unpack argument structures in a given discourse and, eventually, the folk ontology underpinning that discourse. The goal is to provide robust theoretical and methodological grounds for understanding the different methods of reasoning and how knowledge is produced in a field such as cultural heritage (Lucas, 2019). Discourse analysis, following the spirit of archaeological stratigraphy, allows us to identify the different layers that constitute the structures of meaning (what are the internal elements of the text and how they are organised) and interaction (how speakers take part in the discourse).

This chapter is organised as follows. The section “What Is an Argument?” introduces three different views about the notion of argument. The section “Designing an Annotation Campaign” describes how to design and carry out a concrete study in discourse analysis. The section “Exploiting the Results” provides some ideas for the communication and dissemination of the results extracted from the study. Finally, this chapter concludes with some reflections on the impact that discourse analysis can have in the field of archaeology and cultural heritage studies.

1.2 What Is an Argument?
R1 says “I think the question is basically that the Cave of Altamira should be opened” while R2 replies “I think that access should be restricted to experts”. Both speakers maintain opposed positions with respect to the Cave, which can be reconstructed as assertions as follows:

• R1: The access to the Cave should be opened to everybody.
• R2: The access to the Cave should be restricted to experts.

These are not arguments; these are just assertions. So, what do we need to have an argument? Generally speaking, an argument requires at least another assertion that plays the role of support. A more specific definition is highly dependent on how the relationship between both statements is conceptualized. Next, I will describe three different approaches:

• Logical approach: An argument is a sort of linguistic entity where a statement, named conclusion, is supported by one or more statements, named premises (Salmon, 1984). Logic studies the connection between premises and conclusion in order to determine when it is correct and when it is not; i.e., the rules and principles to determine the validity of the argument.
• Pragmatic approach: An argument is a particular type of speech act where a speaker has the intention to support a specific statement, the conclusion, by means of another statement or a set of them, the premises (Janier & Reed, 2017). Argument theories based on Speech Act Theory (Austin, 1989; Searle, 1965) aim to identify when a speaker intends to make an argument; determining its validity is a secondary issue.
• Cognitive approach: An argument is a cognitive category where the linguistic expression acts as a sign vehicle of a specific relation of support between two or more mental representations, where one is the conclusion and the other or others are the premises (van den Hoven, 2015; Searle, 1965). This is the most ambitious approach, because it aims to identify both the structure of the argument and its strength. The key point of strength is not logical validity but acceptability; i.e., whether the argument has convinced its addressee or not (Mercier, 2012).

In the following subsections, I will describe each of these views in detail and illustrate how they can be diagrammatically represented for their computational analysis.

1.2.1 Logical Approach

The logical analysis of a discourse fragment entails three basic steps (Salmon, 1984): (i) checking whether that text is an argument or not; (ii) distinguishing between premises and conclusion; and, (iii) if the argument is not complete, adding the hidden or presupposed premises.
Thus, let’s consider again the main positions expressed by R1 and R2:

    R1: I think the question is basically that the Cave of Altamira should be opened because it is obviously a place, to say simply, that everyone has the right to visit.

    R2: I think not, precisely for the same reason; because everyone has the right to access heritage, but if everyone accesses heritage, that heritage is destroyed, isn’t it?

Arguments rarely appear in a stereotypical way (one premise per line, a horizontal line, and the conclusion below it) in natural language discourse. Usually, they appear disorganised and hidden in the middle of the discourse, accompanied by non-argumentative fragments. So, step (i) aims to recognise argumentative text among non-argumentative text. We must start by looking for certain linguistic particles or phrases that indicate the presence of arguments. Some typical expressions are “therefore”, “hence”, “consequently”, “so”, “it follows that”, “since”, “for”, “because”, etc. In the example, both R1 and R2 use “because”, and this indicates that there is an argument there.

Step (ii) consists of identifying the premises and the conclusion of the argument and reconstructing the propositions expressed by them. For distinguishing premises from conclusion, we can use linguistic markers again. Particles such as “therefore”, “hence”, “consequently”, “so” or “it follows that” indicate that the conclusion is going to be introduced; particles such as “since”, “for” or “because” indicate that what follows are premises. In the previous example, both R1 and R2 use “because”, which gives us a delimitation mark to split the text following the pattern “<conclusion>because<premise(s)>”. R2, in addition, uses the linguistic marker “but”, which usually indicates that a new premise is added to the argument. Next, I reconstruct the argument structure of both speakers (Table 1.1).

The next step is reconstructing the propositions. This is a problematic notion in philosophy (Richard, 2013). For the sake of simplicity, we assume here its minimal definition: a proposition is what is expressed by a statement, and it has a truth-value (it is true or false). Next, we show the simplest reconstruction of the argument and the propositions by R1 and R2, removing epistemic verbs and any other linguistic elements not necessary to make clear its main content (Table 1.2).

However, both the R1 and R2 arguments seem to be incomplete: there is a lack of connection between the premise and the conclusion. Step (iii), following Salmon’s methodology, consists in the reconstruction of hidden premises.
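Steps (i) and (ii) lend themselves to a rough computational sketch. The Python fragment below is only an illustration under stated assumptions: the function name `split_on_premise_marker` and the splitting heuristic are hypothetical, not part of Salmon’s method or of any tool mentioned in this chapter. It searches for a premise marker, splits the text following the “<conclusion>because<premise(s)>” pattern, and treats “but” as introducing an additional premise. The marker “for” is deliberately left out, because it is too often a plain preposition (as in R2’s “for the same reason”).

```python
import re

# Premise markers taken from the text; "for" is omitted on purpose because it
# is usually a preposition and would produce false splits (an assumption of
# this sketch, not a rule from the chapter).
PREMISE_MARKERS = ["because", "since"]


def split_on_premise_marker(text):
    """Split text as '<conclusion> marker <premise(s)>'.

    Returns (conclusion, marker, premises) where premises is a list of
    strings, or None when no premise marker is found.
    """
    pattern = r"\b(" + "|".join(re.escape(m) for m in PREMISE_MARKERS) + r")\b"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    if match is None:
        return None  # step (i): no indicator found, so no argument detected
    conclusion = text[:match.start()].strip(" ,;")
    rest = text[match.end():]
    # "but" usually signals that a new premise is being added.
    premises = [p.strip(" ,;.?") for p in re.split(r"\bbut\b", rest)]
    return conclusion, match.group(1).lower(), premises


r1 = ("I think the Cave of Altamira should be opened because "
      "everyone has the right to visit it.")
r2 = ("I think not; because everyone has the right to access heritage, "
      "but if everyone accesses heritage, that heritage is destroyed")

r1_parts = split_on_premise_marker(r1)
r2_parts = split_on_premise_marker(r2)
```

A heuristic of this kind only locates candidate boundaries; reconstructing the propositions and the hidden premises still requires interpretation by the analyst.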
Thus, R1 is presupposing a link between “the right to visit a place by everyone” and “The Cave of Altamira should be opened”; therefore, we need an additional premise (a conditional) to make this connection: “If the Cave of Altamira is a place that everyone has the right to visit, then it should be opened”. In the case of R2, the additional premises are “The Cave of Altamira is a heritage site” and “The Cave of Altamira should not be destroyed”. Figure 1.1 shows the full reconstruction of R1 and R2, including hidden premises, by means of a diagrammatic representation using LogosLink.

Table 1.1 Reconstruction of the argument structure of R1 and R2 arguments

R1  Premise:     The Cave of Altamira is obviously a place that everyone has the right to visit.
    Conclusion:  The Cave of Altamira should be opened.
R2  Premise:     The Cave of Altamira will be destroyed if everyone has the right to access it.
    Premise:     If everyone accesses heritage, that heritage is destroyed.
    Conclusion:  The access to the Cave of Altamira should be restricted to experts.

Table 1.2 Reconstruction of the propositions of the R1 and R2 arguments

R1  Premise:     The Cave of Altamira is a place that everyone has the right to visit
    Conclusion:  The access to the Cave of Altamira should be opened to everyone
R2  Premise:     If everyone accesses heritage, then that heritage is destroyed
    Premise:     If everyone has the right to access the Cave of Altamira, then it will be destroyed
    Conclusion:  The access to the Cave of Altamira should be restricted to experts

Fig. 1.1 Full reconstruction of R1 (left-hand side) and R2 (right-hand side) arguments. In green, the hidden premises that have been added. The nodes contain premises and conclusions, and the arrows always point to the conclusion

The logical approach considers arguments as single and autonomous units that must be fully reconstructed to be evaluated. The two basic types of arguments are: (i) deductive arguments; and (ii) inductive arguments. Deductive arguments are demonstrative (Salmon, 1984); therefore, if the premises are true and the argument is valid, then the conclusion is necessarily true. However, a deductive argument does not provide new information, because the information in the conclusion is already implicit in the premises; in other words, the conclusion only makes explicit information that was already in the premises. Inductive arguments are not demonstrative (Black, 1967); therefore, premises only provide a degree of support, confidence or even probability for the truthfulness of the conclusion. However, an inductive argument provides new information which is not included in the premises. R1 is reconstructed as a deductive argument, since the conclusion is just the consequent of the conditional, which can be inferred because the antecedent is asserted as a premise. R2 is an inductive one, since it adds new information: for example, that restricting access to “experts” also entails that “the Cave should not be opened to everyone”.

Logic is a very well-defined methodology for evaluating the quality of arguments. Different types of logics (propositional logic, first-order logic, etc.) allow us to evaluate different types of arguments. However, deductive or even inductive arguments are very rare in natural language discourse because we have to deal with incomplete information and uncertainty in many everyday situations.
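The notion of deductive validity used above can be checked mechanically for small propositional arguments. The following Python sketch is an illustration only; the helper names `is_valid` and `implies` are hypothetical and are not taken from LogosLink or any other tool. It enumerates all truth assignments and declares an argument valid when no assignment makes every premise true and the conclusion false. Here p stands for “the Cave of Altamira is a place that everyone has the right to visit” and q for “the Cave of Altamira should be opened”, so the reconstructed R1 argument has the form of modus ponens.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    """Return True when no truth assignment makes all premises true
    and the conclusion false (propositional validity by truth table)."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # counterexample found
    return True


# Reconstructed R1: p, if p then q, therefore q (modus ponens).
r1_is_valid = is_valid(
    premises=[lambda p, q: p, lambda p, q: implies(p, q)],
    conclusion=lambda p, q: q,
    n_vars=2,
)

# For contrast: q, if p then q, therefore p (affirming the consequent).
reversed_is_valid = is_valid(
    premises=[lambda p, q: q, lambda p, q: implies(p, q)],
    conclusion=lambda p, q: p,
    n_vars=2,
)
```

Validity in this sense says nothing about whether the premises are actually true, which is why the hidden conditional premise added in step (iii) remains open to challenge.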
Moreover, reconstructing arguments in this way usually requires a lot of presuppositions and extracting implicit information that cannot be easily derived from the original text. Finally, it does not allow us to capture the dynamics of the debate, and complex argumentative structures cannot be analysed.

A more flexible framework under a logical approach is the Periodic Table of Arguments (PTA) (Wagemans, 2016, 2019). It focuses on the study of arguments in natural language by means of a step-by-step method for identifying arguments, including more types than deductive and inductive.

1.2.2 Pragmatic Approach

Usually, arguments are elaborated during a communicative interchange, in a dialogue. In such circumstances, any speaker pursues a specific goal: either to justify themselves or to persuade others (Mercier & Sperber, 2018). To achieve this goal, speakers use different linguistic structures and argument structures. From a rhetorical point of view, if a speaker uses rational arguments, the speaker must prove the truth of the premises, and the audience will then accept the truth of the conclusion (Perelman & Olbrechts-Tyteca, 1973).

Speech act theory is the general frame upon which this approach is built. A speech act is the production of a linguistic instance, an utterance, under specific circumstances (Searle, 1965). The illocutionary act is the minimal unit of linguistic communication, and it comprises two components (Searle & Vanderveken, 1989): (i) an illocutionary force; and, (ii) a propositional content. For example, “Open the window!” and “Could you open the window?” are two utterances with the same propositional content (i.e., ‘you should open the window’) but with different illocutionary forces: the former is an ‘order’, and the latter is a ‘request’. Currently, there is no fixed catalogue of illocutionary forces, although some of them are widely accepted, such as assertion or questioning (Searle & Vanderveken, 1989).

In this section, I will introduce Inference Anchoring Theory (IAT) (Reed & Budzynska, 2010; Janier & Reed, 2017), whose main goal is to describe and capture dialogical aspects of argumentation; and Pragma-dialectics (van Eemeren & Grootendorst, 1984, 2004), a normative approach for the development of a rational conversation.
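The distinction drawn above between illocutionary force and propositional content can be made concrete with a small data structure. The sketch below is a minimal illustration under my own assumptions: the class name `IllocutionaryAct` and its field names are hypothetical and do not correspond to the data model of OVA+, IAT, or any other framework cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IllocutionaryAct:
    """An utterance paired with its two components
    (Searle & Vanderveken, 1989): force and propositional content."""
    utterance: str               # what was actually said
    force: str                   # e.g. 'order', 'request', 'assertion'
    propositional_content: str   # the reconstructed proposition


order = IllocutionaryAct(
    utterance="Open the window!",
    force="order",
    propositional_content="you should open the window",
)
request = IllocutionaryAct(
    utterance="Could you open the window?",
    force="request",
    propositional_content="you should open the window",
)

# Same propositional content, different illocutionary force.
same_content = order.propositional_content == request.propositional_content
same_force = order.force == request.force
```

Keeping the utterance alongside the reconstructed proposition mirrors the idea, developed for IAT in what follows, that an analysis should preserve what was actually said as well as what was meant.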
IAT presupposes that the analysis of dialogical interactions allows us to extract the argument form of a discourse, since linguistic argumentative indicators (such as ‘therefore’ or ‘because’) are not as common in spoken language as in written texts (Janier & Reed, 2017). The sequence of interventions during a dialogue also conveys the structure of the argument that the speaker wants to elaborate. Thus, IAT argument analysis requires the following steps: (i) segmenting the utterances of each speaker into argumentative units; (ii) identifying the illocutionary forces and reconstructing the propositional content of the argumentative units; and, (iii) unpacking and reconstructing the argumentative relations between the propositions. Figure 1.2 shows the diagrammatic analysis of a fragment of the first interchange between R1 and R2 using the IAT framework and OVA+ (Janier & Reed, 2017), a web annotation tool specifically developed for IAT analysis.

As can be observed in Fig. 1.2, the fragment of the dialogue between R1 and R2 is represented as a graph composed of three main sections: (i) the right-hand side, where we capture the dialogical structure, which comprises both the utterances from each speaker (locutions) and the relevant moves between them (transitions); (ii) the middle section, which contains the illocutionary forces representing the speaker’s communicative intentions; and, (iii) the left-hand side, which represents the argument structure.

Fig. 1.2 IAT analysis of dialogue between Researcher 1 and Researcher 2

This analysis presents several relevant differences with respect to the logical one. Firstly, it captures both the utterances (the actual statement that was said by the speaker) and their propositional content (with the minimal possible reconstruction), which allows us to keep track of what was actually said by each speaker. Secondly, it shows the dynamics of the dialogue and the turn taking among the participants. Thirdly, the disagreement between both speakers is explicitly captured by the “Default conflict” node in the left-hand side, which indicates that an already stated proposition is being rejected. Fourthly, it also gathers the intentions of the speakers through the illocutionary forces, which can come from both the utterances and the turn taking. Figure 1.2 contains four illocutionary forces, although IAT defines more than 20 different ones (Janier & Reed, 2017): (i) “Asserting”, which indicates that a speaker just made a statement; (ii) “Arguing”, which indicates that a speaker has the intention to support a claim; (iii) “Disagreeing”, which captures the speaker’s intention of rejecting a statement that has already been said; and, (iv) “Rhetorical question”, which shows that a speaker has made a claim but formulated it as a question, so no answer is required.

Thus, from the perspective of Discourse Analysis, we can get a deeper understanding of how argumentation is elaborated using the pragmatic approach rather than the logical one. Its main weakness is the lack of a systematic methodology for evaluating the strength of the argument, something that the logical approach provides.

The other pragmatic approach mentioned, Pragma-dialectics (van Eemeren & Grootendorst, 1984, 2004), is, essentially, a normative model where any argumentative exchange is taken as an instantiation of the ideal model of a critical discussion whose goal is a reasonable resolution of a difference of opinion. This conversation is guided by a set of rules, named “dialogue protocol” (that should be captured by “Transitions”), to achieve the proposed goal; the violation of any of these rules will constitute a fallacy. Pragma-dialectics establishes three basic components for a rational conversation: (i) setting the roles of participants, basically protagonist (who argues in favour of the standpoint) and antagonist (who argues against the standpoint); (ii) going through the four stages of the discussion (confrontation stage, opening stage, argumentation stage and concluding stage); and (iii) evaluating whether any of the 15 rules of critical discussion (spread along the different stages) were violated.
The analysis of an argument within this framework requires the following steps: (i) identifying the standpoints of the discussion, each of which is composed of a proposition and an illocutionary force (the attitude of the speaker with respect to that proposition); (ii) recognizing the protagonist and the antagonist, and assigning their respective standpoints; (iii) agreeing on the shared propositions that establish the common ground of the speakers; and (iv) identifying the argumentative structures used by the speakers during the discussion, which include both argument schemes and critical questions (Walton et al., 2008). A deeper analysis of this framework is beyond the scope of this chapter, since it requires the analysis of the full dialogue; however, from the perspective of Discourse Analysis, it is a very valuable framework.

1.2.3 Cognitive Approach

The last approach to the nature of arguments that I will explore here considers argumentation as a mental process (van den Hoven, 2015) or a cognitive activity (Mercier & Sperber, 2018). An argument expressed in natural language (written, spoken, through images, etc.) is not the argument itself but the representation of a mental process. Therefore, understanding or interpreting an argument always entails the reconstruction of the corresponding mental process. Thus, any linguistic argument is a sign vehicle of the mental process and, therefore, it must be analysed as a semiotic entity (van den Hoven, 2015): its textual part is a sort of representamen which stands for the argument itself –the
object– which is a cognitive entity with a particular goal. As hearers, we reconstruct the connection between the textual argument and the argument itself through the interpretant (Peirce, 1958; Chandler, 2003). This semiotic conception of arguments presupposes two main types of relationships (van den Hoven, 2015):

– Mimesis: The textual argument is a perfect imitation of the mental process of argumentation.
– Diegesis: The propositions constituting the argument convey a specific interpretation and evaluation of the world.

Both relationships are the constituent parts of the so-called ‘discourse world’. It comprises the background, presuppositions, commitments, beliefs or desires of each speaker (shared or not) and, therefore, it plays a major role in the reconstruction of the argument for its understanding and evaluation (Mercier & Sperber, 2018). Under this approach, the speaker’s intention of making an argument is not enough for there to be an argument; it must also be recognised as such by the hearer. Therefore, arguing is, essentially, a social activity (Mercier & Sperber, 2018).

From our point of view, this is the richest framework for modelling argumentative discourse, although it is also the most complex one. To the best of my knowledge, there is still no fully developed framework for it. IAT/ML (Gonzalez-Perez, 2020), which combines IAT with conceptual modelling (Gonzalez-Perez, 2018), is a theoretical approach under development that is grounded on this conception of argumentation.
IAT/ML defines four basic steps to carry out a cognitive analysis of an argumentation: (i) setting the initial discourse world of the participants in the conversation by means of a conceptual modelling language; (ii) identifying the chunks of text that act as signs of a mental process of argumentation (following linguistic indicators, grammatical structures, images, etc.); (iii) reconstructing the argument mentally elaborated by each speaker using the contextual information and foreknowledge available to the analyst (which might vary significantly between analysts); and (iv) evaluating whether the result of the interaction requires any change in the discourse world.

Figure 1.3 shows a reconstruction of the discourse world (ontology) underlying the debate between R1 and R2 about the Cave of Altamira using ConML (http://www.conml.org/default.aspx), a conceptual modelling language. Each node represents a discourse entity, such as the “Cave of Altamira”, which appears in a central position since it is the main entity discussed in the debate. Both speakers know that the cave is the support of the prehistoric paintings, but they disagree with respect to the “RightOfUse”, which appears twice with two different values: once as “Experts may access” and once as “Everyone may access”. Each edge defines a directed connection between entities, such as those between the “RightOfUse” and two different groups of people, “Experts” on the one side and “Everyone” on the other.
Fig. 1.3 Conceptual model of the debate between Researcher 1 and Researcher 2 about the Cave of Altamira

Fig. 1.4 Ontological proxy for connecting the argument and ontological analysis of the debate between Researcher 1 and Researcher 2

The ontological model in Fig. 1.3 can be related to the argument diagram in Fig. 1.2 to obtain a more complete analysis of this debate. However, the two diagrams cannot be linked directly; they require a sort of intermediary, i.e., an ontological proxy (Gonzalez-Perez, 2020). Figure 1.4 shows an example of an ontological proxy, where R1 has committed to the existence (“Makes reference that refers to”) of the entities “Altamira Cave” and “Everyone” when asserting the proposition “The cave should be opened to everyone”.
Although IAT/ML is still under development, it has several strengths. It allows us to gather the “discourse world” underlying the debate, which is essential to identify the disposition of the hearer to be persuaded; i.e., whether the arguments put forward by the other speaker are aligned with his or her previous beliefs, desires or assumptions or, on the contrary, conflict with them. The former makes persuasion easier than the latter. As for the drawbacks, this type of analysis is more time-consuming and demanding than an analysis based on a pragmatic approach.

I have presented three theoretical frameworks, supported by alternative views of the nature of arguments, that can be applied to argument analysis in Discourse Analysis. Next, I will describe the key steps of a general methodology for carrying out the argumentative analysis of a specific dataset, independently of the adopted theoretical framework.

1.3 Designing an Annotation Campaign

Annotation means adding interpretative information (premise, conclusion, illocutionary force, etc.) at a meta-level to describe how language-in-use works (Fort, 2016, p. 10). This is the basic methodology in Discourse Analysis, and the type of information that can be added includes a wide variety of elements, from simply adding the speakers or the timestamps to the transcription of a recorded audio, to very detailed marks for intonation, voice pitch, emphasis, etc. Here, I will focus on the basic principles and requirements for annotating arguments; i.e., on marking the argument structure and its components on a selected dataset.

Annotation is usually organised in campaigns. Every annotation campaign is guided by a goal; i.e., the research question to be answered or the hypothesis to be validated.
In the case of argument annotation, those goals range from unpacking the argumentative structure of a specific discourse, such as in the case of the US 2016 presidential debates (Visser et al., 2018), to other more specific aims, such as annotating the argument schemes (Visser et al., 2021) or identifying the type of argumentative propositions that have been used (Jo et al., 2020).

An annotation campaign is, most of the time, a collaborative process that requires a team, although it can be carried out individually as well. A successful annotation campaign requires preparatory work, which is crucial when the annotation is collaborative. The main actors participating in any annotation campaign are (Fort, 2016): (i) the campaign manager, the person in charge of the full process, who decides when the annotation is ready to start and when it is finished; (ii) the expert annotators, a person or group of people who know the theory and the guidelines very well and who are able to assess the quality of the annotation; and (iii) the annotators, the team of people specifically trained for the annotation campaign who will do the main task of adding marks and labels to the raw corpus.

Next, I will describe the three basic stages of an annotation campaign (Fort, 2016): (i) pre-campaign, which consists of preparing all the requirements and training
annotators for the task; (ii) annotation, when the material is actually annotated by the annotators; and (iii) evaluation, when the quality of the annotation is assessed.

1.3.1 Pre-campaign

The very first step in any annotation campaign is to set its goal; as I said, the research question that needs to be addressed, the hypothesis that we want to validate or even the type of annotated data that we want to obtain. The definition of this goal is also highly dependent on the theoretical framework supporting the annotation. For instance, if we adopt a logical approach, the dynamics of the interaction among the speakers is not relevant, while, if we adopt a pragmatic or a cognitive approach, this information must be annotated.

Any pre-campaign should include, at least, the following stages (Fort, 2016): (i) creating the raw corpus; (ii) creating the guidelines; and (iii) training the annotators.

1.3.1.1 Creating the Raw Corpus

Annotation campaigns are expensive. They require either a very well-trained team to complete the task well (which is expensive to train) or a large team to complete the task quickly (which is expensive in terms of quality). For that reason, the material to be annotated must be carefully selected by the campaign manager in order to make the annotation campaign as efficient as possible.

The campaign manager is responsible for creating the raw corpus (Fort, 2016). He or she has to keep in mind the main goals of the annotation campaign and the theoretical framework for the argument analysis in order to select a representative and unbiased raw data set. Some discourse features to consider are topic, genre, context, etc. Any error in the selection of the raw text might entail annotating more material or reannotating something already annotated, which makes the campaign more expensive.
The second main task for the manager in this stage is to guarantee that all the material is in the right format to be handled by the annotation tool used by the annotators (today, any annotation task is computationally supported). Most of the time, the material is written text. Thus, it must be presented as a clean, structured, and human-readable document for the annotators. This usually requires, at least, the following: (i) removing all those text spans already discarded for the analysis (e.g., footnotes, comments, typos, etc.); (ii) introducing all the metadata required for the analysis (speakers, timestamps, etc.) and formatting the text in a specific manner to make the distinction among these types of information clear; (iii) dividing the interventions of the speakers into different paragraphs, with a clear cut between them; and (iv) identifying the language in which the text is written.
Annotating written text has several advantages (it is easy to store, stable, machine readable, etc.) but it has some drawbacks as well (paralinguistic elements of communication such as pitch, gestures, etc. are lost). Other available formats are audio or video recordings, but they are usually transcribed into written text because that format is simpler to work with. What needs to be transcribed will depend on the goal of the analysis.

1.3.1.2 Annotation Guidelines

Annotation guidelines constitute the set of rules and principles that must be applied to identify what should be annotated and the categories that must be applied and written by the annotators (Fort, 2016). A relatively stable and agreed set of guidelines is crucial for good quality in annotation, and it must always be attached to the annotated corpus. As happens with the selection of the raw corpus, any change in the annotation guidelines might have a severe impact on the annotated material; for that reason, introducing a change must be carefully evaluated.

In the case of annotation campaigns on argumentative texts, guidelines must indicate both what part of the text should be annotated (it might be the case that not all the text is argumentative) and how. These guidelines are highly dependent on the theoretical framework supporting the analysis, and they should be as short as possible but detailed enough to facilitate the annotators’ task. Here, I suggest four basic recommendations for the elaboration of a stable and good set of guidelines for argument analysis:

1. Explaining how to navigate through the guidelines, recommending what tasks should be done first and how to proceed. It might be the case that an annotator is more efficient doing the task in a different way, but it is important to set certain milestones to be able to evaluate the progress of the annotation.
2. Defining criteria that are as clear as possible for distinguishing between argumentative and non-argumentative parts of the text (i.e., guiding the annotator in selecting those parts that should be annotated and those that should not). Argumentative text must be carefully analysed following the principles and rules defined in the next point; non-argumentative text can be omitted, but it is important to keep these text fragments in the raw corpus in case further analysis is required.
3. Defining the set of labels, marks, and rules that the annotators must apply to the text. It is essential to illustrate this with examples; ideally, with real examples selected from another data set to avoid potential biases. In the case of argumentative analysis, these include propositions, locutions, illocutionary forces, argumentation relations, transitions, etc.
4. Defining a checklist of the most common errors, which the annotators must review before moving on to the next piece of text in order to minimize recurrent errors.
Annotation guidelines are usually the result of an iterative process by means of which they are frequently updated. Several annotation methodologies have been proposed which include the creation of guidelines as one of the first steps, such as Agile annotation (Voormann & Gut, 2008); Hovy and Lavid’s methodology (Hovy & Lavid, 2010), which defines a six-step procedure where steps 2 to 5 consist of updating the annotation guidelines until they are reliable enough to be used; and the MAMA methodology (Pustejovsky & Stubbs, 2012), which is based on a cyclical annotation process that continues until the quality of the annotation is good enough. All these methodologies coincide in two basic recommendations: (i) starting with a quick draft of the guidelines; and (ii) testing it with the material that is going to be annotated and updating it accordingly until its quality is good enough.

Another approach for annotating text is defining a set of relevant questions (instead of rules and labels) that must be answered using the raw data. Gee (2011) proposes an annotation guide consisting of a set of questions to be asked of the raw material, labelling those statements in the text that answer the questions.

1.3.1.3 Annotators Training

Annotators, the actual team who are going to create the annotated corpora, are an essential part of the process and, therefore, they must be carefully selected and trained. Good training should be carried out under the same conditions as the actual annotation and must address, at least, three dimensions (Fort, 2016): training on the annotation itself, training on the annotation tool and evaluating the training.

The training on the annotation itself focuses on teaching the team how to read the guidelines and how to apply them. This should be done with a mini-corpus extracted from the raw data to make the annotators familiar with the material that is going to be analysed.
In addition, these training sessions should be used to discard those people who cannot perform the task correctly. Once the annotator team has been selected, a new annotation session in real conditions is highly recommended to assess how they work together and to obtain a realistic time estimate for the task.

The training on the annotation tool should make all the annotators familiar with the annotation software. This should also be done with a mini-corpus of the raw data to identify all the potential doubts and problems that might appear during the annotation itself.

The evaluation of the training allows us to know when the team is ready to begin the task. This can be done either against a reference annotation of a mini-corpus of the raw material (F-measure or accuracy, see Sect. 1.3.3) or between annotators (inter-annotator agreement, see Sect. 1.3.3). The insights from the evaluation will also be very useful for the organization of the annotation itself; for instance, for assigning annotators according to their strongest skills.
1.3.2 Annotation

Once the annotation guidelines are relatively stable and the annotators have been trained, the actual annotation can start. Although it is likely that both guidelines and annotators will need to be corrected to reach their maximum capabilities, these corrections should be as minimal as possible and never as significant as during the pre-campaign stage. At this stage, which is usually time limited (although the limit can be months), trained annotators add the marks and labels to the raw data and produce the annotated corpus, the output of the process.

There are two basic ways of organising a team of annotators (Fort, 2016): (i) as a well-defined and limited team of people (collaborative annotation); or (ii) as an undefined and large group of people (crowdsourcing annotation).

Collaborative annotation can run either in parallel, where each annotator annotates a different part of the raw corpus at the same time, or sequentially, where each annotator performs the same task over the whole corpus (i.e., one annotator applies rule 1 of the guidelines, another applies only rule 2, etc.). Sequential annotation can speed up the process, since annotators are specialised in one specific task (e.g., splitting the text into argumentative and non-argumentative parts, identifying argumentative relationships, etc.) and they can do it very efficiently. However, it also entails a risk of bias, given that subsequent steps are based on a single source and accuracy is more difficult to control. On the other hand, parallel annotation makes quality control easier (the output from each annotator can be evaluated individually) but it is usually slower. Neither of these approaches is essentially better than the other; choosing one or the other depends on each annotation campaign (see footnote 3).

Crowdsourcing annotation has been gaining popularity and relevance during the last 15 years, especially since Amazon Mechanical Turk (see footnote 4) appeared.
Although both approaches require a group of annotators, in crowdsourcing, unlike in collaborative annotation, this is an undefined and large group of people, usually recruited through an open call (Fort, 2016, p. 63). Crowdsourcing has achieved massive success because it is a low-cost solution and it is broadly distributed, so annotators can be easily and quickly replaced. Sequential annotation is very rare in crowdsourcing because its core is, precisely, a group of people working in parallel. Its main challenge is training annotators, given that we are dealing with an open and undefined team. There are several strategies to evaluate the skills of the crowd (e.g., the background knowledge that they need, training exercises, tests, etc.) but they are much more difficult to control than in collaborative annotation.

Collaborative and crowdsourcing annotation are two alternative and complementary ways of running annotation campaigns; neither of them is essentially better than the other.

3 http://bbc.arg.tech/ shows an example of an annotation campaign carried out using collaborative annotation within 24 h.
4 https://www.mturk.com
1.3.3 Evaluating the Annotation

Evaluating the quality of the annotation is an essential step to obtain a good-quality annotated corpus. However, it is also a challenging task since there is no such thing as a ground truth. Annotating is highly interpretative and discrepancies among annotators are inevitable. Thus, we can only aim to tackle this evaluation in terms of consistency among the annotators. If two or more annotators coincide in the analysis of a text span, that analysis is expected to be accepted by any other person who performs the same task. However, this is also an expensive task, both in time and human resources, because it always requires more annotation. The advantage of a regular evaluation is that it checks whether the annotation is being done correctly, whether there is any error that needs to be fixed or whether the guidelines require an update.

There are two main approaches for assessing the quality of an annotation task (Fort, 2016): (i) inter-annotator agreement, which consists in checking the agreement between two or more annotators in the annotation; and (ii) gold standard, which consists in creating an ideal mini-reference annotated corpus from the raw data and comparing the outputs from the annotators with it (precision, accuracy, F-measure, etc.). In both cases, a double annotation is required, and it usually consists of a random sample of 10% of the raw data.

For inter-annotator agreement there are two basic categories of evaluation metrics:

• Based on agreement: They assess the reproducibility of the annotation by considering whether two annotators annotating the same excerpt match. The main assumption can be formulated as follows: if two annotators match, the annotation is right; if there is a mismatch, the annotation is wrong.
• Based on error detection: They compare annotated chunks of text with a gold standard.
The main assumption can be formulated as follows: if the annotator matches the gold standard, the annotation is right; if not, there is a mistake. Another possible method is to consider one of the annotators as the gold standard and evaluate the other in terms of how close he or she is to that gold standard, and vice versa.

The key concept in both measures is observed agreement; i.e., the percentage of times the annotators annotated the same chunk of text in the same way. It is calculated by means of a confusion matrix or contingency table, which allows us to observe in which categories or labels annotators agree or which categories have actually been used. However, observed agreement is not enough for a reliable quality measure, since it does not consider agreement by chance: the simpler the annotation, the higher the probability of an agreement by chance.

Table 1.3 recreates a fictional contingency table for the annotation of two categories (Support and Conflict) in a corpus with 39 argumentative instances. The diagonal shows the actual agreement between both annotators (both agree that there were the same 15 instances of Support and the same 18 instances of Conflict) and the other values show disagreement; for instance, there were 2 instances classified as Support by Annotator 2 but annotated as Conflict by Annotator 1 and there were 4 instances
of Support annotated by Annotator 1 that were labelled as Conflict by Annotator 2.

Table 1.3 Contingency table for the annotation of two argumentative categories, Support and Conflict, in a corpus with 39 instances

                    Annotator 2
Annotator 1     Support    Conflict    Total
Support            15          4         19
Conflict            2         18         20
Total              17         22         39

Thanks to this representation, it is easy to infer which categories are prevalent and whether disagreements are concentrated in a single category or spread among most of them.

The well-known kappa measures, based on the notion of observed agreement, are useful both for assessing the reliability of the annotation guidelines and the quality of the annotated corpora. The two most widely used metrics are (see footnote 5):

• Cohen’s kappa (Cohen, 1960): It compares the annotation between two annotators.
• Fleiss’ kappa (Fleiss, 1971): Very similar to Cohen’s kappa, it compares the annotation between more than two annotators.

Both measures define a coefficient relating the observed agreement and the expected agreement; i.e., the chance that two annotators match given how they actually annotated the categories. This means that the expected agreement between two annotators over two categories is not 50%, but depends on the probability of each annotator assigning a specific category to a unit. Let’s imagine an unreliable annotator who applies the same category to all the units; the expected value for that annotator is not 50% for each category but 100% for one of them. Cohen’s kappa is defined as follows:

\[ \kappa = \frac{A_o - A_e}{1 - A_e} \]

where $A_o$ stands for the observed agreement and $A_e$ stands for the expected agreement by chance (which must be calculated). There are different computational resources for obtaining Cohen’s kappa, such as R, SPSS, Excel or even webpages (e.g., https://www.graphpad.com/quickcalcs/kappa1/). For instance, the Cohen’s kappa for the annotators of Table 1.3 is 0.691.
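The value quoted for Table 1.3 can be reproduced directly from the definition. The following is a minimal Python sketch, not taken from any of the tools mentioned above; the function name `cohen_kappa` and the nested-list layout of the table (rows for Annotator 1, columns for Annotator 2) are assumptions made here for illustration.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square contingency table.

    table[i][j] is the number of units that Annotator 1 placed in
    category i and Annotator 2 placed in category j (assumed layout).
    """
    size = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: proportion of units on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total

    # Expected agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total**2

    return (observed - expected) / (1 - expected)


# Table 1.3: categories in the order Support, Conflict.
TABLE_1_3 = [
    [15, 4],   # Annotator 1: Support
    [2, 18],   # Annotator 1: Conflict
]

print(round(cohen_kappa(TABLE_1_3), 3))  # prints 0.691
```

With 33 of the 39 instances on the diagonal, the observed agreement is about 0.846, while the marginal totals give an expected agreement of about 0.502; the gap between the two, rescaled by the room left above chance, gives the reported 0.691.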
5 For a very detailed analysis of kappa measures, cf. Fort (2016).

Cohen’s kappa (and Fleiss’ kappa, since it is a generalization of Cohen’s kappa) presupposes an equal negative impact for all errors. However, this is not always the case. Let’s imagine an annotation task where annotators have to identify inference types and conflict types. A disagreement between annotating inference and conflict is much more relevant than a disagreement between inference types or between conflict types. Weighted coefficients for errors allow us to rank disagreements according to their
impact on the annotation. The most widely used measure of this type is a weighted version of Cohen’s kappa (Cohen, 1968):

\[ \kappa_w = 1 - \frac{D_o}{D_e} \]

where $D_o$ is the observed disagreement and $D_e$ is the disagreement expected by chance, with each disagreement weighted by the distance between categories (if they can be ordered) or by the relevance of the error. It is worth noting that this weight is not empirically obtained but depends on the knowledge or intuitions of the campaign manager. This is especially relevant in annotation campaigns where categories are highly interpretative or subjective, because this might introduce a bias from the manager. In other campaigns, where the annotation is not as subjective, such as argument scheme annotation (Visser et al., 2021), weighted Cohen’s kappa can provide an informative result on the reliability of the annotation.

The result of kappa measures ranges from −1 to 1, where −1 means total disagreement, 1 total agreement and 0 agreement no better than chance. There is no unanimity on a single value that marks a good agreement, since this is highly dependent on the complexity of the annotation task. However, three main ranges are generally established:

• (0,0.4]: reliability is poor or very poor.
• (0.4,0.8): reliability is good.
• [0.8,1]: reliability is very good or almost perfect.

Metrics based on a gold standard aim to detect errors in annotations rather than disagreements. The most widely used metric is the F-measure, originally designed for information retrieval, which is a useful tool for quality evaluation in annotation campaigns (Fort, 2016). It is simpler to calculate than any of the previous measures, and each annotation category can be individually evaluated. The F-measure is defined as the harmonic mean of precision and recall, and it ranges from 0 to 1 ([0, 1]), where 0 is the worst result and 1 the perfect one:

\[ F\text{-measure} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]

where recall and precision are, respectively:

\[ \text{Recall} = \frac{\text{Number of correct annotations by annotator}}{\text{Number of correct annotations in gold standard}} \]

\[ \text{Precision} = \frac{\text{Number of correct annotations by annotator}}{\text{Total number of annotations by annotator}} \]
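The three formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `precision_recall_f` and the counts in the usage example are hypothetical and do not come from any real annotation campaign.

```python
def precision_recall_f(correct, annotated, gold):
    """Precision, recall and F-measure for one annotation category.

    correct   -- annotations by the annotator that match the gold standard
    annotated -- total number of annotations made by the annotator
    gold      -- number of annotations in the gold standard
    """
    if correct == 0:
        # No matches at all: every score is zero by convention (assumption).
        return 0.0, 0.0, 0.0
    precision = correct / annotated
    recall = correct / gold
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for a single category, e.g. Support:
# the annotator produced 40 annotations, 30 of which match the
# 50 annotations of that category in the gold standard.
precision, recall, f_measure = precision_recall_f(correct=30, annotated=40, gold=50)
print(precision, recall, round(f_measure, 3))
```

Because the function works on the counts of a single category, it can be called once per label, which is what makes a category-by-category evaluation possible.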
The key point for the F-measure is the gold standard. This can be built as the result of a very careful expert annotation or as the result of the overlap and resolved disagreements between two annotators. The F-measure is very useful for training annotators because it allows us to analyse each annotated category individually and guide the training to correct errors very specifically. Its main drawback is that chance is not taken into account, and this can have a severe impact on the reliability of the evaluation when the number of categories is very low (e.g., in an annotation campaign with only two categories, the baseline for the F-measure is already 0.5 if both annotators annotate the text randomly); however, when the number of categories is very high, kappa measures and the F-measure tend to agree (Hripcsak & Rothschild, 2005).

The kappa family and the F-measure are the two most widely used metrics for assessing the quality of an annotation. There are other metrics (Fort, 2016, pp. 50–62), although these are beyond the scope of this introductory chapter. A complementary measure is intra-annotator agreement, which consists in applying the same measures but comparing annotators with themselves in order to assess the reproducibility of the annotation.

1.3.4 Finalization

Finalization is the last stage of any annotation campaign and, when it is completed, the corpus is ready to be made public. The person mainly responsible for it is the campaign manager, as the foremost expert both in the topic and in the theoretical framework that supports the annotation guidelines. It comprises three main parts (Fort, 2016): (i) adjudication; (ii) technical reviewing; and (iii) publication.

Adjudication is the correction, by the campaign manager or another expert, of the annotated corpora. Its goal is to remove all discrepancies between annotators.
In this case, the campaign manager (or another expert) has to review the annotation, check the competing interpretations and decide which one fits the annotation guidelines better.

Technical reviewing consists in checking that there are no errors or corrupted files. For instance, IAT guidelines (and IAT/ML as well) indicate that every inference node must be anchored to a transition; therefore, any unanchored node is a technical error. In addition, it should be checked whether the annotation includes any forbidden character or element that might corrupt the final file. Ideally, this review is done automatically or, at least, semi-automatically, and is integrated in the annotation tool itself.

Publication is the very last step of the process, meaning that the job has been finished. The annotated corpus should be published with the latest version of the annotation guidelines. For instance, if an error that requires a modification of the guidelines appears during adjudication, then the updated version of the annotation guide must accompany the annotated corpora.
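As an illustration of how the technical review described in this section could be partly automated, the following Python sketch checks the two conditions mentioned: inference nodes without an anchoring transition, and forbidden characters. The data layout (a list of node identifiers and a dictionary of anchors), the identifiers themselves and the set of forbidden characters are all hypothetical; they do not reflect the file format of any particular annotation tool.

```python
# Hypothetical set of characters that would corrupt the exported file.
FORBIDDEN_CHARACTERS = {"\x00", "\ufffd"}


def unanchored_inference_nodes(inference_nodes, anchors):
    """Return the inference nodes that are not anchored to any transition.

    inference_nodes -- list of inference node identifiers
    anchors         -- dict mapping an inference node to its transition
    """
    return [node for node in inference_nodes if not anchors.get(node)]


def has_forbidden_characters(text, forbidden=FORBIDDEN_CHARACTERS):
    """Return True if the annotated text contains a forbidden character."""
    return any(character in forbidden for character in text)


# Hypothetical annotation: three inference nodes, one without a transition.
inference_nodes = ["inf1", "inf2", "inf3"]
anchors = {"inf1": "tr1", "inf3": "tr4"}

print(unanchored_inference_nodes(inference_nodes, anchors))  # prints ['inf2']
print(has_forbidden_characters("The cave should be opened to everyone"))  # prints False
```

A check of this kind only flags candidate errors; deciding how to repair them remains a task for the campaign manager or another expert.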
1.4 Exploiting the Results

Well-annotated corpora are valuable resources. They can be exploited in several ways, both for the development of practical applications, such as machine learning algorithms for Natural Language Processing, and for addressing theoretical issues, such as the study of specific research questions. In addition, they are highly reusable, since the same corpora can be used by different researchers to tackle different research questions, even ones formulated years after the annotation was done, as in Visser et al. (2021). In this chapter, we will focus on two ways of exploiting an annotated corpus of arguments: (i) descriptive metrics; and (ii) interpretative analysis.

1.4.1 Descriptive Metrics

Descriptive metrics provide a first and quick insight into the annotated corpora. They consist of a range of statistics that allow us to build an overview of the corpora (Lawrence et al., 2016). The most basic statistics, proposed in Lawrence et al. (2016), are:

• Raw numbers: A quantitative description of the entities that constitute the annotated corpora, such as the number of words, sentences, etc. Its main purpose is to give a general idea of the size of the corpus.
• Number of propositions: This shows the number of annotated propositions, which are one of the central points in any argumentative theoretical framework.
• Number of argumentative nodes: It indicates the number of propositions linked with, at least, one other proposition. This gives us a general idea of the nature of the text, whether it is highly argumentative (a high proportion of arguments with respect to the number of propositions) or not (a low proportion of arguments with respect to the number of propositions).

Depending on the adopted theoretical framework, more elaborate statistics can be defined.
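The three basic statistics listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `descriptive_metrics`, the representation of the corpus as a text string, a list of proposition identifiers and a list of linked pairs, and the example values are all hypothetical.

```python
def descriptive_metrics(text, propositions, links):
    """Basic descriptive statistics for an annotated corpus.

    text         -- the raw text of the corpus
    propositions -- list of proposition identifiers
    links        -- list of (proposition, proposition) pairs that are
                    connected by an argumentative relation
    """
    # Propositions that take part in at least one link.
    linked = {proposition for pair in links for proposition in pair}
    argumentative = len(linked & set(propositions))
    return {
        "words": len(text.split()),
        "propositions": len(propositions),
        "argumentative_nodes": argumentative,
        # Proportion of propositions that are part of an argument.
        "argumentative_ratio": argumentative / len(propositions) if propositions else 0.0,
    }


# Hypothetical example: four propositions, three of them linked.
text = "The cave should be opened to everyone. Only experts should access it."
propositions = ["p1", "p2", "p3", "p4"]
links = [("p1", "p2"), ("p2", "p3")]

metrics = descriptive_metrics(text, propositions, links)
print(metrics["propositions"], metrics["argumentative_nodes"], metrics["argumentative_ratio"])
```

A ratio close to 1 would suggest a highly argumentative text, while a ratio close to 0 would suggest that most propositions are not connected to any other.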
Next, we propose some exploratory metrics based on IAT/ML (many of them are also applicable to IAT):
• Categories of illocutionary connections: This metric captures the number of instances of each category of illocutionary force per speaker. The total number equals the number of propositions (every proposition has an illocutionary force), but the distribution among the different categories suggests the type of text or even its genre. For instance, a text with a high proportion of challenges by one speaker but not by the other might be an interview; on the other hand, a roughly balanced number of challenges might indicate a debate.
• Number of conflict nodes and rephrase nodes: These are complementary to the metric of argumentative nodes. The combination of the three provides us with some insights about the nature of the annotated corpora, for instance whether the topic is highly controversial or not.
• Number of objects: This metric derives from conceptual modelling (Gonzalez-Perez, 2018). It denotes the number of relevant entities in the world mentioned in the corpora and the values of their attributes. This provides us with insights about the presumptions and background of the speakers involved in the debate: what their respective starting points are and what their final status is with respect to their beliefs or commitments about the world.
• Hot objects: This is the result of counting the references that an object receives from the propositions. It indicates how central the entity is in the debate and allows us to identify the main positions of the speakers with respect to it.
Given the statistical nature of these metrics, typical visualization methods, such as tables, boxplots, diagrams, etc., are perfectly suitable for the dissemination and communication of these results.
1.4.2 Interpretative Analysis
We can go further than basic statistics and elaborate a more in-depth analysis of the annotated corpora. A well-defined set of statistics of this type for IAT is described in (Lawrence et al., 2016), which distinguishes between two main families: (i) dialogically oriented statistics, which focus on providing insights about the dynamics and complex interactions of the debate; and (ii) real-time statistics, which focus on displaying how the debate is evolving.
Dialogically oriented statistics unpack the inner dynamics of the debate, showing the key points of the interaction among speakers once the debate is over.
They provide an a posteriori analysis which allows us to characterise, among other things, the presence of each speaker in the discussion according to the uttered locutions, the interaction among participants (who replies to whom and when) and the most controversial propositions of the debate, that is, those that have generated the most reactions among speakers. An application of this technology to a real-world case can be seen in Piloting argument technology with the BBC,6 a project developed by the Centre for Argument Technology (ARG-tech) and the BBC in 2017.
Real-time statistics, on the other hand, focus on showing how the debate is evolving live (Plüss et al., 2018). The current state of this technology only allows us to visualize the transcription of the debate, the participation of the speakers and the topics that have been discussed. However, since this is shown in real time, it can modify the behaviour of the participants themselves: the visuals show whether someone is speaking too much or who is totally quiet.
6 http://bbc.arg.tech/bts/
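The dialogically oriented statistics described above (speaker presence, who replies to whom, and which contributions attract the most reactions) can be approximated with simple counts. The sketch below is not the implementation used by Lawrence et al. (2016) or in the BBC project. It assumes a hypothetical transcript format in which each locution records its speaker and, optionally, the identifier of the locution it replies to.

```python
from collections import Counter


def dialogical_stats(locutions):
    """Return (presence, replies, reactions) for a finished debate.

    locutions: list of dicts in a hypothetical format:
               {"id": str, "speaker": str, "reply_to": str | None}
    presence:  number of locutions uttered by each speaker
    replies:   how often each (replier, replied-to speaker) pair occurs
    reactions: how many replies each locution has received
    """
    by_id = {loc["id"]: loc for loc in locutions}
    replying = [loc for loc in locutions if loc.get("reply_to") is not None]
    presence = Counter(loc["speaker"] for loc in locutions)
    replies = Counter(
        (loc["speaker"], by_id[loc["reply_to"]]["speaker"]) for loc in replying
    )
    reactions = Counter(loc["reply_to"] for loc in replying)
    return presence, replies, reactions


# Invented exchange between two speakers, R1 and R2.
debate = [
    {"id": "L1", "speaker": "R1", "reply_to": None},
    {"id": "L2", "speaker": "R2", "reply_to": "L1"},
    {"id": "L3", "speaker": "R1", "reply_to": "L2"},
    {"id": "L4", "speaker": "R2", "reply_to": "L1"},
]
presence, replies, reactions = dialogical_stats(debate)
print(presence)
print(replies)
print(reactions.most_common(1))
```

In this invented exchange, the locution with the most replies is a candidate for the most controversial point of the debate, in the sense used in the text.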
A step further in this interpretative analysis of an annotated corpus under the IAT/ML theory is to analyse its underlying conceptual model. Currently, I only propose an a posteriori analysis, but it can provide us with interesting insights about the worldview of each speaker and the potential overlaps and discrepancies between them. We distinguish three main families of metrics: (i) ontologically oriented statistics; (ii) hierarchically oriented statistics; and (iii) temporal statistics.
Ontologically oriented statistics focus on the discrepancies among the entities presupposed or accepted by each speaker. These comprise two main types of ontological disagreement (Gonzalez-Perez, 2018, pp. 158–159):
• Predication conflict: This occurs when two different speakers agree on both the existence of an entity and its properties, but differ on the values assigned to them. For instance, in Figs. 1.1 and 1.3, R1 and R2 agree on the existence of the “RightOfUse” but differ on its predication: R1 asserts that “Everyone might access” while R2 asserts only that “Experts might access”.
• Existence conflict: This happens when two different speakers disagree on the existence of the entity itself; one of the speakers says that the entity exists while the other denies it. For instance, R2 asserts the existence of different situations that might affect the conservation of the Cave, which are not recognised by R1.
Hierarchically oriented statistics describe the relationships between the different entities handled by speakers (both categories and instances of those categories). The typical disagreement that happens here is the classification conflict (Gonzalez-Perez, 2018), in which two speakers classify an entity in different ways.
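The three kinds of conflict defined above can be detected mechanically once each speaker's conceptual model is available in a comparable form. The sketch below assumes a hypothetical representation in which a model maps entity names to a category and a set of attribute values. It also treats the absence of an entity from a speaker's model as non-recognition of that entity, which is a simplification of the existence conflict described in the text. The entity `ConservationSituation`, the category names and the attribute name `access` are invented for illustration; only `RightOfUse` and its two values come from the example above.

```python
def find_conflicts(model_a, model_b):
    """Compare two speakers' conceptual models and list their disagreements.

    Each model is a dict in a hypothetical format:
        {entity_name: {"category": str, "attributes": {name: value}}}
    Returns a list of tuples:
        ("existence", entity), ("classification", entity)
        or ("predication", entity, attribute).
    """
    conflicts = []
    for entity in sorted(set(model_a) | set(model_b)):
        a, b = model_a.get(entity), model_b.get(entity)
        if a is None or b is None:
            # Only one speaker recognises the entity.
            conflicts.append(("existence", entity))
            continue
        if a["category"] != b["category"]:
            conflicts.append(("classification", entity))
        shared = set(a["attributes"]) & set(b["attributes"])
        for attribute in sorted(shared):
            if a["attributes"][attribute] != b["attributes"][attribute]:
                conflicts.append(("predication", entity, attribute))
    return conflicts


r1 = {
    "RightOfUse": {"category": "Right", "attributes": {"access": "Everyone"}},
}
r2 = {
    "RightOfUse": {"category": "Right", "attributes": {"access": "Experts"}},
    "ConservationSituation": {"category": "Situation", "attributes": {}},
}
conflicts = find_conflicts(r1, r2)
print(conflicts)
```

Counting the tuples of each kind gives the ontologically and hierarchically oriented statistics, and running the same comparison on the models at the start and at the end of a debate would show which disagreements were resolved.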
In the debate between R1 and R2 there is no classification conflict, but let us imagine a discussion where R1 is an archaeologist and R2 a geologist. Both agree on the existence of the cave, but they might differ in its classification: the former asserts that its importance lies in being a cultural object, while the latter only emphasises its relevance as a geological element.
Finally, temporal statistics focus on assessing how many changes have occurred between the initial conceptual model (defined at the beginning of the debate) and the final one, when the debate has ended. These changes derive from the disagreements described above, and they capture whether each disagreement was resolved (a change in the conceptual model) or not (the disagreement persists). This would allow us to assess how the discourse world has been modified by the debate.
1.5 Conclusion
Discourse analysis, and specifically argumentative analysis as an essential part of it, are useful theoretical frameworks for understanding how speakers motivate and justify their ideas or actions in the world and how these are communicated to others in order to persuade them. Although this is a qualitative, highly interpretative and context-dependent task, it must be rigorously performed in order to obtain results that are acceptable from a scientific point of view.
There are different approaches in the literature for tackling the analysis of the argumentative structure of a given discourse. We have described three of them, which rely on different views about what an argument is. Thus, when an argument is considered as a logical entity, it is treated as an isolated unit and the goal of the analysis is to identify its inner structure; when an argument is conceived as a speech act, it is studied as a communication element where the intention of the speaker plays a major role; and when an argument is perceived as a cognitive element, it has a communicative part and a representational one, and both the intention of the speaker and the background of the hearer are part of the analysis to reconstruct the discourse world underlying the debate. Each of these frameworks allows us to achieve different goals, ranging from looking for purely linguistic results to uncovering the beliefs, intentions and commitments of speakers.
Each theoretical approach determines how discourse in general and arguments in particular must be annotated. Annotation is the procedure through which we add interpretative information to a text, and it is an essential step for performing a reliable analysis. Its key point is the definition of the annotation guidelines, which translate the theoretical principles into specific labels to be added. I have described the basic principles that should guide an annotation campaign in order to obtain a valuable result, which include data collection, the creation of annotation guidelines, the training of the annotators and the finalization of the campaign.
The final step is the exploitation and communication of the results. The exploitation is usually based on a qualitative analysis of the annotated corpus, identifying patterns, regularities, etc., supported by a quantitative analysis as well (such as the number of words, propositions, inferences, conflicts, rephrases, etc.).
Visualization is the key point for the communication of these results, since annotated corpora are rarely friendly to non-expert users and are difficult to interpret.
However, any discourse analysis study has its limitations. Firstly, theoretical frameworks are highly interpretative, and different researchers can elaborate different annotation guidelines from the same theory. This might generate confusion when the obtained results are compared, but it is also a source of richness, showing how complex natural language is when it is studied in its communicative dimension (rather than a representational one). Secondly, elaborating annotation guidelines is a time-consuming task which usually requires several cycles, and a perfect and absolutely complete set of guidelines is impossible to reach. However, a rigorous evaluation of both the process and the annotated corpora can guarantee a reliable result. The last main drawback is the risk of bias, because it is difficult for researchers to set aside their presumptions or prejudices about the topic of analysis; however, this problem can be minimised by following a rigorous evaluation as described earlier in this chapter.
Discourse analysis in general, and argumentative analysis in particular, is a methodology which will allow us to acquire a better understanding of how knowledge is produced and communicated to others in cultural heritage studies, among other areas.
Acknowledgements This research has received financial support from the grant “Heritage 3.0: Argumentation and Conceptual Modelling for Enhanced Cultural Heritage Participation and Management Policies” (ACME), grant number PID2020-114758RB-I00, and from the project “Deflationist Views in Ontology and Metaontology”, grant number PID2020-115482GB-I00, both funded by MCIN/AEI/10.13039/501100011033.
References
Austin, J. L. (1989). How to do things with words: The William James lectures delivered at Harvard University in 1955 (2nd ed.). University Press.
Black, M. (1967). Induction. In P. Edwards (Ed.), The encyclopedia of philosophy (pp. 169–181). Macmillan/Free Press and Collier-Macmillan.
Chandler, D. (2003). Semiotics: The basics (1st publication repr ed.). Routledge.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
Eco, U. (1979). A theory of semiotics. Indiana University Press.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Fort, K. (2016). Collaborative annotation for reliable natural language processing: Technical and sociological aspects. ISTE/Wiley.
Gee, J. P. (2011). An introduction to discourse analysis: Theory and method (3rd ed.). Routledge.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology: Software engineering principles for cultural heritage. Springer.
Gonzalez-Perez, C. (2020). Connecting discourse and domain models in discourse analysis through ontological proxies. Electronics (Basel), 9(11), 1955.
Hovy, E., & Lavid, J. (2010). Towards a “science” of corpus annotation: A new methodological challenge for corpus. International Journal of Translation, 22(1), 13–36.
Hripcsak, G., & Rothschild, A. S. (2005). Agreement, the f-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association: JAMIA, 12(3), 296–298.
Janier, M., & Reed, C. (2017). I didn’t say that! Uses of SAY in mediation discourse. Discourse Studies, 19(6), 619–647.
Jo, Y., Mayfield, E., Reed, C., & Hovy, E. (2020). Machine-aided annotation for fine-grained proposition types in argumentation. In 12th international conference on language resources and evaluation Marseille (p. 1008).
Lawrence, J., Duthie, R., Budzynska, K., & Reed, C. (2016). Argument analytics (p. 371).
Lucas, G. (2019). Writing the past (1st ed.). Routledge.
Mercier, H. (2012). Looking for arguments. Argumentation, 26(3), 305–324.
Mercier, H., & Sperber, D. (2018). The enigma of reason. Penguin Books.
Paltridge, B. (2012). Discourse analysis: An introduction (2nd ed.). Bloomsbury.
Peirce, C. S. (1958). Collected papers of Charles Sanders Peirce. Harvard University Press.
Perelman, C., & Olbrechts-Tyteca, L. (1973). The new rhetoric. A treatise on argumentation. University of Notre Dame Press.
Plüss, B., Sperrle, F., Gold, V., El-Assady, M., Hautli-Janisz, A., Budzynska, K., & Reed, C. (2018). Augmenting public deliberations through stream argument analytics and visualisations. 18 October 2018 through 19 October 2018, p. 1.
Pustejovsky, J., & Stubbs, A. (2012). Natural language annotation for machine learning. O’Reilly Media, Incorporated.
Reed, C., & Budzynska, K. (2010). How dialogues create arguments. In F. van Eemeren, B. Garssen, D. Godden, & G. Mitchell (Eds.), 7th international conference of the International Society for the Study of argumentation. Rozenberg/Sic Sat.
Richard, M. (2013). What are propositions? Canadian Journal of Philosophy, 43(5–6), 702–719.
Salmon, W. C. (1984). Logic (3rd ed.). Prentice-Hall.
Searle, J. (1965). What is a speech act? In Philosophy in America (pp. 221–239). Allen and Unwin.
Searle, J., & Vanderveken, D. (1989). Foundations of illocutionary logic (1st ed., repr. ed.). University Press.
Serrano, S. (1983). La Semiótica: una introducción a la teoría de los signos (2nd ed.). Montesinos.
van den Hoven, P. J. (2015). Cognitive semiotics in argumentation; a theoretical exploration. Argumentation, 29(2), 157–176.
van Eemeren, F. H., & Grootendorst, R. (1984). Speech acts in argumentative discussions. De Gruyter.
van Eemeren, F. H., & Grootendorst, R. (2004). A systematic theory of argumentation (1st Publication ed.). Cambridge University Press.
Visser, J., Lawrence, J., Wagemans, J. H. M., Reed, C., Modgil, S., & Budzynska, K. (2018). Revisiting computational models of argument schemes: Classification, annotation, comparison (p. 313). IOS Press.
Visser, J., Lawrence, J., Reed, C., Wagemans, J., & Walton, D. (2021). Annotating argument schemes. Argumentation, 35(1), 101–139.
Voormann, H., & Gut, U. (2008). Agile corpus creation. Corpus Linguistics and Linguistic Theory, 4(2), 235–251.
Wagemans, J. H. M. (2016). Constructing a periodic table of arguments. Available: https://scholar.uwindsor.ca/ossaarchive/OSSA11/papersandcommentaries/106. 29 Jan 2021.
Wagemans, J. H. M. (2019). Four basic argument forms. Research in Language, 17(1), 57–69.
Walton, D., Reed, C., & Macagno, F. (2008). Argumentation schemes. Cambridge University Press.
Part I Conceptual Approaches
Chapter 2
A New Approach to Interoperable Argumentation Documentation
Stephen Stead
S. Stead, Paveprime Ltd, Purley, UK; e-mail: steads@paveprime.org
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_2
Abstract The chapter outlines the development of support for inference chains in the CIDOC Conceptual Reference Model family of standards. It illustrates the capabilities available in CRMbase and notes the limitation that the evolution of any assertion or knowledge revision cannot be adequately documented. It then continues by detailing the extended facilities delivered by CRMinf that address this shortcoming. Next it considers the added potential to document the scholarly reading of texts that are considered specious and finally looks to future work on the extension.
Keywords CIDOC CRM · CRMinf · Argumentation · Inference chains
2.1 History
The CIDOC Conceptual Reference Model (CIDOC CRM) is a formal ontology intended to facilitate the integration of cultural heritage data sets. It provides the semantic definitions of elements of scholarly discourse, about both tangible and intangible heritage, that are needed to support such cross-organisational integration. It has been an ISO standard (ISO 21127) since 2004 and is actively maintained, refined, and enhanced by a user community known as the CRM Special Interest Group (CRM-SIG). It consists of a core set of concepts (CRMbase), which form the ISO standard, plus a family of extensions that add the elements needed to cover more specialised areas of data and their integration.
The idea of incorporating Jean-Claude Gardin’s concept of an “Inference Chain” (Gardin, pers. comm. and 1990) was first suggested at the CIDOC Conceptual Reference Model working meeting in Agios Pavlos, Crete in 2000 (Stead, pers. comm.). This became truly possible in version 3.4.2 (Crofts et al., 2003) when, based on the experience of the team at the University of Oslo, the properties P140 assigned attribute to (was attributed by) and P141 assigned (was assigned by) were introduced. However, as Doerr et al. (2011) demonstrated, this is not sufficient to represent both the structure and the evolution of argumentation, as it only allows the representation of a “finished” chain at the point in time of the completion of the current documentation state. Despite the successful proof-of-concept implementation of the Integrated Argumentation Model (IAM) (Boutsika, 2010), it was felt that a lighter version was required for integration into the CRM standards family. The work on this was started in 2014 and, after initial consultation drafts, a working version with an RDFS representation was released in 2015. A further enhancement was released in 2019 that formalised the concept of scholarly reading that had been developing in the CRM-SIG since 2017 (see Issue 334: Scholarly Reading at http://www.cidoc-crm.org/Issue/ID-334-scholarlyreading).
2.2 Representing Inference Chains in CRMbase
The representation of simple Inference Chains that document the construction of archaeological phases can be done by simply using the property P46 is composed of (forms part of). This allows the documentation of the constituent elements that, together, form larger groupings that are of interest to the researcher. For example, at the West House in Akrotiri, Santorini:
E19 Physical Object [individual slab]
P46i forms part of
E18 Physical Thing [upper room slab surface]
P46i forms part of
E19 Physical Object [West House]
(Μιχαηλίδου, 2001, pp. 40, 68–70)
However, this representation does not capture that this is a later interpretation of the original excavation data. It only represents the “current” understanding at the time of the 2001 publication of the Inference Chain. A richer picture can be captured using E13 Attribute Assignment and associated properties to capture the details of the process of making such assertions.
For example, the first triple in the above (E19-P46i-E18) can be enriched with the following:
Path 1: E19 Physical Object [individual slab] P141i was assigned by E13 Attribute Assignment [connecting slab to floor] P140 assigned attribute to (was attributed by) E18 Physical Thing [upper room slab surface]
Path 2: E13 Attribute Assignment [connecting slab to floor] P177 assigned property of type E55 Type [“P46 is composed of (forms part of)”] P2 has type (is type of) E55 Type [“Type of Property”] P2 has type (is type of) E55 Type [“Type of Type”]
Path 3: E13 Attribute Assignment [connecting slab to floor] P14 carried out by (performed) E39 Actor [ID of actor] (P14.1 in the role of E55 Type [“Assigner”]) P1 is identified by (identifies) E41 Appellation [“Μιχαηλίδου, Α.”]
This is in effect a reification triangle over the original triple, but it additionally allows the connection of a timespan to the instance of E13 Attribute Assignment and of information on how the assignment was done, by adding a reference to the E29 Design or Procedure used.
Path 4: E13 Attribute Assignment [connecting slab to floor] P4 has time-span (is time-span of) E52 Time-Span [2001]
Path 5: E13 Attribute Assignment [connecting slab to floor] P33 used specific technique (was used by) E29 Design or Procedure [Guidelines for Fragment Reconstruction]
Even this enriched representation provides no direct mechanism for understanding the evolution of the inference chain. The temporal order of instances of E13 Attribute Assignment can of course be recovered, but representing changes in understanding using only this temporal order can make querying for the current state of knowledge very difficult. For instance, if an earlier assertion was made that the individual slab was part of a different slab surface (the lower room one, for instance), how would it be known that the corresponding triple was no longer “current” when the new assertion about the upper slab floor was entered? It would be apparent that the E13 Attribute Assignment for the new assertion was more recent, but this would not enable the user to understand whether the old assertion was no longer valid, whether both assertions were valid, or whether they were competing assertions. The querier would need a detailed understanding of the specifics of each knowledge revision process to be able to craft queries that gave the desired results, and in a multi-source integration environment this would be extremely onerous.
2.3 The Building Blocks of CRMinf
CRMinf provides the framework for documenting the evolution of Inference Chains. It recognises that there are clear separations between the assertions that constitute an argument; the process and temporal order of argumentation activities; and the belief in the result. This clear separation is a novelty in argumentation models (Doerr et al., 2011) and enables two important capabilities.
The first is the documentation of Inference Chains that were not constructed in the logical order of their constituent assertions: in the first example, the recognition of the Upper Room Slab Surface as part of the West House occurred before the assertion that the Individual Slab was part of it (Μαρινάτος, 1974; Μιχαηλίδου, 2001). That the chronology of argumentation does not match the logical sequence of the assertions that make up the argument is almost axiomatic in archaeology, as the observation process is usually the inverse of the formation process. The second capability is the clear and unambiguous recording of the process and results of knowledge revision activities acting upon an information system.
To enable this separation, CRMinf provides three functional groups of classes and properties. In the first functional group, the argumentation activities are located in time (process); the second provides the mechanism for understanding who believes what (belief); and the third covers what it is they believe in (the assertions).
Fig. 2.1 The first functional group of CRMinf entities showing the three types of knowledge creation activity
The first functional group differentiates (see Fig. 2.1) between three types of knowledge creation activity and groups them under the I1 Argumentation class. The stipulation is that the person, or group, undertaking the argumentation is making honest inferences or observations; that is, they attest that the resulting belief value is correct at the time the activity was undertaken and that any methodology used was correctly applied. Knowledge creation is specifically the instilling of justified belief in the mind of some person(s) (Bruseker et al., 2018).
The first type of knowledge creation is observation. The class for this is defined in CRMsci and covers the use of human senses, often augmented with tools and instruments, to note some attributes of real-world objects and processes. These attributes are approximations to the traits exhibited by the real-world entities and take the form of propositions about them. Subsequently, these propositions can be used for inference making, including evaluation, and are typically recorded in an information system of some sort.
The second type of knowledge creation is inference making. In this, a set of propositions has some kind of processing or inference logic (see Fig. 2.3) applied to it to produce a new set of propositions. In CRMinf the source and outcome sets of propositions always have a conviction or belief associated with them. This ensures that all the steps in an Inference Chain are fully provenanced. The inference logic that is applied can be in any form that is acceptable in the community of use. It can include employing formal logic; using probabilistic reasoning and other mathematical models; the application of social theory; or the comparison with cultural parallels.
The third type of knowledge creation is belief adoption.
This covers the cases where an existing belief in a set of propositions is adopted by a person or group who were not the creator(s) of the belief that has been adopted. The adoption does not have to include all the propositions in the original set, but it does have to accept the same level of belief as was originally held. To disagree with the original belief, a process of inference making is required in which the original belief is the premise for the inference making.
Fig. 2.2 The second functional piece of CRMinf: the I2 Belief. It also shows the elements required for scholarly reading
The second functional group is the conviction or belief itself (see Fig. 2.2). Convictions and beliefs are temporal entities: that is, they are perdurants and exist only for a period of time. They come into existence when an individual or group (an instance of E39 Actor in CRM terms) performs some kind of knowledge creation activity (observation, inference making or belief adoption (see above)) and pass out of existence either when the individual dies (or the group is dissolved) or when the actors change their belief in the associated propositions through an(other) inference making activity. Note that the adoption of another, contradictory belief about some or all of the propositions in the original proposition set should, necessarily, terminate the original belief, despite humans being contrary creatures that can simultaneously believe quite contradictory things. The problem with allowing an actor to hold such contradictory opinions, in a system of honest beliefs, is that it both undermines faith in the “honesty” and means that the automation of change propagation in downstream inference chains becomes highly problematic. Forcing the use of an inference making activity to change belief in the whole of the original proposition set and simultaneously creating one or more new beliefs (with the same belief value in zero, one or more of the original propositions and new belief values in all of the remainder) alleviates this issue.
Beliefs are always held in relation to a particular proposition set (which is the third functional group) and associate a specific belief value with that proposition set for their lifespan.
Belief values can take many forms, including Bayesian probability values or ordinal scales (for instance: “unlikely”, “possible”, “probable”, “certain”). However, the minimum implementation requirement is a three-value system of “True”, “False”, and “Unknown”. This allows automation of downstream change propagation, with previous values of “True” or “False” being changed to “Unknown” for all inferences downstream of a changed belief value. Boutsika (2010) reports on a successful proof-of-concept implementation of this strategy for inferences about the Oetzi Iceman.
The third functional group is the proposition set that the belief is held about (see Fig. 2.3). The standard is agnostic as to how this is implemented but requires that the propositions are uniquely identifiable and refer to recognisable instances of the classes or concepts of a formal ontology (including, but not restricted to, the CRM). Guidelines for using Named Graphs are under development in the CRM-SIG (http://www.cidoc-crm.org/Issue/ID-526-named-graphusage-recommendations-guideline-document) and interest in RDF-star is also covered by this issue.
Fig. 2.3 The third functional piece of CRMinf: the I4 Proposition Set. The figure also shows the position in the hierarchy of I3 Inference Logic and I6 Belief Value
The properties that link the components of these three functional groups are straightforward (see Fig. 2.4). All three knowledge creation activities are linked to the resulting conviction (the superclass of belief introduced to support scholarly reading (see below)) using the J2 concluded that (was concluded by) property. The link from inference making activities to the input/source conviction (i.e. belief) is provided by the J1 used as premise (was premise for) property, and the link to the inference logic applied is made with the J3 applies (was applied by) property. Belief adoption is connected to the adopted belief by the J6 adopted (adopted by) property.
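The downstream change propagation enabled by the three-value system can be sketched as a graph traversal. The Python sketch below is not taken from the CRMinf specification or from Boutsika's implementation. It assumes a hypothetical in-memory structure in which each belief identifier maps to the beliefs concluded by inference making activities that used it as a premise (collapsing each J1 used as premise and J2 concluded that pair into a single edge), and it resets every “True” or “False” belief downstream of a changed belief to “Unknown”.

```python
from collections import deque


def propagate_change(premise_of, belief_values, changed_belief):
    """Return a copy of belief_values with all beliefs downstream of
    changed_belief reset to "Unknown".

    premise_of:     dict mapping a belief id to the ids of beliefs that were
                    concluded from it (hypothetical simplification of J1/J2)
    belief_values:  dict mapping a belief id to "True", "False" or "Unknown"
    changed_belief: id of the belief whose value has just been revised
    """
    updated = dict(belief_values)
    queue = deque(premise_of.get(changed_belief, []))
    seen = set()
    while queue:
        belief = queue.popleft()
        if belief in seen:
            continue  # guard against revisiting shared or cyclic conclusions
        seen.add(belief)
        if updated.get(belief) in ("True", "False"):
            updated[belief] = "Unknown"
        queue.extend(premise_of.get(belief, []))
    return updated


# Invented chain: B1 supports B2, B2 supports B3; B4 is independent.
premise_of = {"B1": ["B2"], "B2": ["B3"]}
values = {"B1": "False", "B2": "True", "B3": "False", "B4": "True"}
revised = propagate_change(premise_of, values, "B1")
print(revised)
```

Beliefs that do not depend on the revised one keep their values, which is what makes the revision cheap to automate; the affected beliefs then need a new inference making activity before they can be held as “True” or “False” again.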
2 A New Approach to Interoperable Argumentation Documentation 35 Fig. 2.4 An overview of the key CRMinf properties linking the classes (in red). Super-properties from CRMbase are shown in green Finally, the beliefs are connected to their proposition sets by the J4 that (is subject of) property and to their believe values by the J5 holds to be property. 2.4 The Extension of CRMinf to Cover Scholarly Reading As originally conceived the CRMinf provides a rich framework for documenting the adoption of a scholar’s belief by another. However, what about the case where the requirement is to document a publication where some or all of the content is considered specious. Here the belief is that the content was correctly interpreted, irrespective of the readers belief in the propositions set out in the publication. This has been generalized to cover the uncontentious reading of any publication. By uncontentious reading the standard intends to cover cases where multiple scholars are likely to agree on the propositions that would be recovered from inspecting a copy (i.e. the symbolic representation) of the publication, even while some of them vehemently reject these propositions: that is scholars may agree that Gaius Suetonius Tranquillus wrote in De Vita Caesarum that Nero was singing in Rome while it was burning from July 19 in 64 CE even if they do not believe that this is true. However, it is categorically not intended to cover cases where scholarly debate is about the “reading” of partially illegible texts. The new superclass I8 Conviction was introduced to provide a generalisation over I2 Belief and the new I9 Provenanced Comprehension, that covers the correct reading of the overt message of an instance of E73 Information Object and its conversion into a set of propositions (see Fig. 2.2). Such a reading and conversion
36 S. Stead is always undertaken in the context of an explicit statement about the provenance of the source being read. Three new properties provide the necessary links for such uncontentious reading: J8 understands (is understood by) links it to the source information object, J9 believes in provenance (provenance is believed by) attaches the explicit statement of provenance and J10 reads as connects to the proposition set generated by the reading. At the same time as the introduction of the provenanced comprehension construct a set of simple links to the source of the proposition set being adopted by belief adoption activities were introduced. This is to provide a parallel, simple mechanism for documenting uncontentious reading of sources that are believed and provides simple links or shortcuts to a range of potential source types: J7 is based on evidence from (is evidence for) links to information objects, J11 used manifestation (was manifestation used by) links to LRMoo manifestations and J12 used item (was item used by) links to LRMoo items. 2.5 The Future of CRMinf Work continues on CRMinf and the development of support for I11 Situations, as a subclass of proposition sets, that deal the persistence of value ranges of things over a timespan is being undertaken as part of the work on the CRMsoc extension that is intended to deal with social relationships and obligations. In addition, the new, as yet unnamed, extension that provides properties that link to types rather than instances of classes, as well as “Closed-World” assertions about types of things that are not present will provide exciting new opportunities to exploit the power of CRMinf . References Boutsika, K. 2010. Computer supported collaborative factual argumentation and conflict resolution (Masters of Science thesis). Department of Computer Science, University of Crete. Bruseker, R., Daskalaki, M., Doerr, M., & Stead, S. (2018). 2018 is that a good concept? 
In Mieko Matsumoto and Espen Uleberg (Eds.), CAA2016: Oceans of data. Proceedings of the 44th conference on computer applications and quantitative methods in archaeology.
Crofts, N., Doerr, M., Gill, T., Stead, S., & Stiff, M. (2003). Definition of the CIDOC conceptual reference model. Version 3.4.2.
Doerr, M., Kritsotaki, A., & Boutsika, K. (2011). Factual argumentation: A core model for assertions making. Journal on Computing and Cultural Heritage, 3(3), 34. https://doi.org/10.1145/1921614.192161
Gardin, J.-C. (1990). The structure of archaeological theories. In Studies in modern archaeology. Vol. 3. Mathematics and information science in archaeology: A flexible framework (pp. 7–25). Bonn.
Μαρινάτος, Σ. (1974). Ανασκαφαί Θήρας VI. ΒΑΕ 64, πίν. 38β. Η Εν Αθήναις Αρχαιολογική Εταιρεία.
Μιχαηλίδου, Α. (2001). Ακρωτήρι Θήρας. Η μελέτη των ορόφων στα κτήρια του οικισμού. ΒΑΕ 212, Αθήνα: Η Εν Αθήναις Αρχαιολογική Εταιρεία.
Chapter 3
Making Good Arguments in Archaeology

Michael E. Smith

Abstract This chapter reviews epistemological and methodological issues of argumentation in archaeology. It begins with historical reasons for the lack of attention to argumentation in recent decades. Next, it reviews the status of archaeological argumentation as set out in a 2015 paper (Smith ME, SAA Archaeol Record 15:18–23, 2015b). This is followed by an expansion of this line of thought based on a methodological approach initiated by Stephen Toulmin in 1958. Toulmin’s scheme is based on visual diagrams to show the sequential steps in an argument. It is a particularly helpful method to show the difference between strong and weak archaeological arguments about the past. I examine four archaeological arguments, and use Toulmin’s method to assess their strength. The final topic is an examination of archaeological modeling as a form of argument.

Keywords Arguments · Stephen Toulmin · Warrants · Analogy

3.1 Introduction

After some serious missteps by Lewis Binford and other new archaeologists in the 1970s, most archaeologists stopped paying attention to the forms of argumentation in our field. Discussions of the use of analogy continued for a while longer (Wylie, 1985), but they eventually died down. Postprocessualists and scientifically-minded archaeologists alike avoided the argumentation topic, leaving the theme of archaeological epistemology impoverished. Recently, however, attention to arguments in archaeology has started to grow again (Smith, 2017; Chapman & Wylie, 2015, 2016; Smith, 2015b; Orser, 2014; Gibbon, 2014; Moro Abadía & Lewis-Sing, 2021; Currie, 2016). This chapter is a contribution to this line of work.

M. E. Smith
School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
e-mail: mesmith9@asu.edu

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 C. Gonzalez-Perez et al.
(eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_3
This chapter begins with historical reasons for the lack of attention to argumentation in recent decades. Next, it reviews the status of archaeological argumentation as set out in a 2015 paper (Smith, 2015b). This is followed by an expansion of this line of thought based on a methodological approach initiated by Stephen Toulmin in 1958 (Toulmin, 2003). Toulmin’s scheme is based on visual diagrams to show the sequential steps in an argument. It is a particularly helpful method to show the difference between strong and weak archaeological arguments about the past. The final topic is an examination of archaeological modeling as a form of argument.

3.2 The Historical Lack of Attention to Argumentation in Archaeology

In the 1960s and 1970s, Lewis Binford and the processualist archaeologists adopted the covering law approach associated with Carl Hempel (1965) and the logical positivist philosophers of science. The most explicit and strident statement of this choice was the book Explanation in Archaeology: An Explicitly Scientific Approach (Watson et al., 1971). To explain an event, these authors claimed, is to subsume the event under a general law, which then implies that the law explains the event. By the time the processualists had adopted what Wylie (2002) calls “Hempelian positivism,” philosophers of science had already rejected this approach to explanation as inappropriate for the social and historical sciences. Indeed, the explanatory approach of Watson et al. was criticized by a philosopher of science (Morgan, 1973). Watson et al. responded to this critique by saying that as a philosopher of science, Morgan didn’t know anything about archaeology, and therefore he should leave them alone (Watson et al., 1974).

Independent of the philosophy of science, some archaeologists were quick to criticize the use of Hempel and covering laws by their colleagues. Jeremy Sabloff et al.
(1973:112) called this work “naïve,” and Kent Flannery (1973:51), never one to mince his words, opined that the Hempelian approach “has produced some of the worst archaeology on record.” The only covering laws archaeologists could come up with were so trivial that Flannery called them “Mickey Mouse laws” (p. 51). Nevertheless, the processualists maintained their use of covering law explanation, in spite of its lack of fit for historical and social sciences. This stubborn holding to a faulty view of explanation “caused great harm to archaeology by setting scientifically minded archaeologists on an unproductive tangent” (Smith, 2017:521).

During the 1970s and 1980s archaeology published in the English language saw acrimonious debates between two opposing philosophical and epistemological camps. Lewis Binford and the processualists promoted a scientific approach that used a faulty explanatory method (covering laws), while Ian Hodder and the postprocessualists promoted a non-scientific and humanities-oriented archaeology. Bruce Trigger (2006:444–478) characterized this debate as part of a long-term conflict between rationalism (processualism) and romanticism (post-processualism).
In the wake of these archaeological “theory wars,” neither camp engaged publicly or published on the issue of argumentation. The postprocessualists, in their embrace of social constructivism, interpretivism, and the primacy of meaning, ignored explicit considerations of argumentation and explanations. In the words of Bruce Trigger (2006:466), “neither Leroi-Gourhan nor Hodder discovered how to advance beyond speculation in interpreting the meaning of such regularities.” Postprocessualists avoided engagement with the philosophy of science concepts on these topics, with one exception. That exception was an erroneous claim that logical positivism was synonymous with science, and therefore the more general scholarly rejection of logical positivism as an adequate explanatory framework implied that science was not appropriate for archaeology (Johnson, 2010; Martinón-Torres & Killick, 2013). If these authors were correct on this matter, then much of science, from genetics to astronomy, would cease to exist. But, of course, few if any of the sciences have used the covering law approach for many decades. Whether this claim by the postprocessualists originated in ignorance or guile, it does nothing to advance argumentation or theory in archaeology (Smith, 2017).

The processualists and their descendants, whose theoretical approach has been called “processualist-plus” (Hegmon, 2003), similarly failed to engage explicitly with epistemological issues surrounding arguments and explanation, although there were a few exceptions (e.g., Fogelin, 2007). Archaeologists employing a scientific epistemology largely put their heads down, worked on their own materials with the best explanatory schemes they knew of, and stayed away from public discussion of arguments and explanations.
In Trigger’s (2006:462) words, “Although American archaeologists were increasingly open to theoretical diversity, most of them lacked the ambition to try to determine in an operational manner under what circumstances specific sorts of theories were and were not applicable.”

Alison Wylie noted that during the 1990s the loud debates between the processualists and the postprocessualists largely subsided from public view. She observed that, “An unreflective ‘live and let live’ pluralism exempts a great many untenable assumptions from reasoned examination” (Wylie, 2017a:129). But the situation is more serious than she suggested. Wylie has failed to examine the spread of weak and speculative arguments by the postprocessualists and their descendants. It seems likely that a major reason for this is the lack of a robust epistemological literature in archaeology that provides guidelines on acceptable arguments and explanations.

Many archaeologists tried to conduct their research using a scientific perspective, but were frustrated with the lack of epistemological discussion of how to improve archaeological arguments. Nearly all of the professional debate was on the level of abstract social theory, not epistemology. In my own case, I began reading in social science disciplines beyond my home discipline of anthropology, largely to learn about research on cities and neighborhoods. I was pleasantly surprised to find an active epistemological literature on explanation and argument, particularly in the fields of sociology, political science, and historical social science (Tilly, 2008; Gerring, 2012; Abbott, 2004; Mahoney et al., 2009). Covering law explanations are absent from these fields, and causal mechanisms provide the dominant form of explanation (Demeulenaere, 2011; Hedström, 2005); see other papers in this
volume for discussions of causality. The social-science works cited above lead one to the philosophy of social science and history for suggestions on how to improve argumentation in archaeology (Bunge, 2004; Little, 2010; Manicas, 2006).

3.3 Archaeological Argumentation

A 2015 paper on argumentation (Smith, 2015b) included three broad critiques of widespread archaeological practices of argumentation and explanation: the lack of testing of ideas; the poor use of methods of analogy; and a reliance on abstract, philosophical social theory. This section reviews the current status of these issues and provides background for a discussion of Stephen Toulmin’s methods for analyzing arguments.

3.3.1 The Importance of Testing

Stephen Haber (1999:312) articulates an important consideration in argumentation as follows: “The fundamental question of all serious fields of scholarly inquiry [is]: How would you know if you are wrong?” This notion derives from Karl Popper’s (1934) concept of falsifiability. For Popper, scientific explanations must be falsifiable. He emphasized crucial experiments that can falsify definitively one or more propositions. In the social sciences, however, such definitive experiments are rare. In the words of John Gerring (2012:31), “Some theories are more falsifiable than others.” In their textbook on social science methods, Charles Ragin and Lisa Amoroso discuss the importance of testing as follows:

By testing hypotheses, it is possible to improve the overall quality of the pool of ideas. Ideas that fail to receive support gradually lose their appeal, while those that are supported more consistently gain greater stature in the pool. While a single unsuccessful hypothesis rarely kills a theory, over time, unsupported ideas fade from current thinking.
It is important to identify the most fertile and powerful ways of thinking and to assess different ideas, comparing them as explanations of general patterns and features of social life. Testing theories can also serve to refine them. By working through the implications of a theory and then testing this refinement, it is possible to progressively improve and elaborate a set of ideas. (Ragin & Amoroso, 2011:39). This emphasis on testing is almost universal in the literature on social science methods. Perri 6 and Christine Bellamy (2012:52) conclude a discussion of Popper’s views by observing that, for social scientists, “to meet scientific standards of rigour, theories must be stipulated in ways that make them empirically testable.” Philosopher of science Mario Bunge, discussing Bruce Trigger’s approach to archaeology, states that the scientific method “may be boiled down to the rule, check your guesses” (Bunge, 2013:153). Perhaps not surprisingly, quite a few postprocessualist archaeologists are on record opposing the usefulness of testing.
Matthew Johnson (2010:223), for example, argues that archaeologists should “shift from a language of ‘testing’ to a language of ‘evaluation’,” and Ian Hodder and Scott Hutson (2003:239) claim that, “Instead of testing, we come to an understanding.”

The post-hoc argument is a type of untested (and untestable) argument common in archaeology. I quote from my 2015 paper on this:

Lewis Binford (1981) discussed problems with this procedure, which he called “post-hoc accommodative argument.” He was referring to an interpretation that is applied to the data and findings once the research activities are complete. The problem with post-hoc arguments is that they can’t be shown to be wrong. The analysis is done, and the post-hoc interpretation cannot be disproven without another round of research. We can all dream up numerous alternatives to explain (or explain away) any set of findings. But without some form of testing, post-hoc arguments serve to introduce potentially faulty or misleading interpretations into the literature. (Smith, 2015b:19).

As pointed out by Geoffrey Clark (2000:852), such arguments are common in the field of paleoanthropology, in spite of the fact that, “it is a weak form of explanation.” In some branches of psychology, post-hoc arguments are strongly condemned not only as problematic arguments but also as ethical lapses (Kerr, 1998; Leung, 2011). A common analogy for post-hoc arguments is a farmer who paints bulls-eyes around the bullet holes in his barn to show off his superior shooting skill.

3.3.2 The Decline of Argument by Analogy

A large and sometimes contentious literature on the use of “ethnographic analogy” in Americanist archaeology was synthesized and formalized by Alison Wylie’s paper, “The Reaction Against Analogy” (Wylie, 1985). Nearly all of the explicit archaeological uses of the method of analogy, from Lewis Binford and the new archaeologists through Wylie’s paper, were in the form of inductive logic.
Indeed, Wylie’s criteria for assessing the strength of an argument by analogy are almost identical to the criteria for inductive inferences as discussed in textbooks on logic (Copi, 1982:397–400). Analogies are neither correct nor incorrect; instead, they are more or less useful, typically depending on their strength. Here is what Wylie said about the strength of analogies: The standard criteria for evaluating what I have described as formal analogies are, then: the number and extent of similarities between source and subject; the number and diversity of sources cited in the premises in which known and inferred similarities co-occur as postulated for the subject; and finally, expansiveness of the conclusions relate to the premises (Wylie, 1985:98). The two strategies developed for strengthening formal analogy—the strategies of expanding the base of interpretation and elaborating the fit between source and subject—must be treated as directives for the active investigation of sources and subjects rather than as criteria for assessing analogical conclusions reflectively, after they are formulated. And the inquiry they initiate must be specifically designed to determine what causal connections hold between the material and cultural or behavioral variables of interest, and under what conditions these connections may be expected to hold (Wylie, 1985:101).
This approach to argument by analogy (a formal argument based on the rules of inductive logic) has been abandoned by many archaeologists in the decades since 1985. In its place, archaeologists have begun using three problematic practices that generate weak and often misleading arguments: ad-hoc analogies, empty citations, and heuristic analogies.

1. The problem of “ad-hoc analogies” has been described as follows:

Instead of following these simple and well-known guidelines, many authors today invoke analogy by citing one, or perhaps two, analogical cases from anywhere in the world that seem somehow related to the argument at hand. I refer to these arguments as “ad-hoc analogies.” There is little consideration for sampling or formal comparison. Ad hoc analogies provide no support at all for the argument at hand. The fact that some human group somewhere in the world did something vaguely similar to what you are claiming for your archaeological case does not in fact support your claim. (Smith, 2015b:20)

2. Empty citations are bibliographic references to works that do not contain any data supportive of the case at hand. Instead, they merely signal works that make a point similar to the point of the author. Such works are cited to lend an aura of support to the argument, when in fact they contain no empirical support at all. Empty citations are included in a work to falsely inflate the apparent strength or quality of an analogical argument. The classic discussion of empty citations is a paper by Anne-Wil Harzing (2002); other analyses include Todd et al. (2010), Henige (2011), and Abbott (2010).

3. The use of heuristic analogues is a growing practice in archaeology with parallels in the field of historical climate change and sustainability science. For the latter realm, Meyer et al. (1998) contrast formal and heuristic analogies. A formal analogy is an argument employing inductive logic, as promoted by Wylie.
Heuristic analogues are less rigorous comparisons, either of whole societies or systems, or of parts of a small sample of societies. Typically, a complex historical or archaeological setting or event is compared with conditions today, without any formal testing. “They are heuristic because they are too complex or too contextually different to be formally specified” (Meyer et al., 1998:220). Examples include historical episodes (e.g., the collapse of an empire), and events (a plague); these can be based on historical narratives, archaeological data, paleoenvironmental reconstructions, or ethnographic documentation. Jared Diamond’s (2004) analyses of societal collapses are heuristic analogues, as are the cases promoted by Michael Glantz (1991, 2019) in what he called “forecasting by analogy.” While such analogues can be enlightening and educational, they rarely provide a rigorous scientific explanation or understanding (Dearing et al., 2010). They do not permit testing, and they do not conform to the criteria for a successful inductive (formal analogical) argument. Archaeologists are increasingly offering heuristic analogues in the name of argument by analogy. The procedure tends to go as follows. The archaeologist wants to explain attributes of a particular past cultural context, often a social or institutional setting; this is the target case. He or she chooses a single better-documented
parallel case (from history, archaeology, or ethnography), and asserts that the two settings are sufficiently similar to apply information from the well-described case to the target case. This permits numerous details from the former to be simply applied to the latter without testing. Keith Eppich (2020), for example, uses information from medieval Italy to illuminate Classic Maya society, and Maxime Lamoureux-St-Hilaire (2020) generalizes and promotes this process of heuristic analogue comparison for interpreting Classic Maya society. Davide Domenici’s (2018) application of Aztec evidence to Teotihuacan provides another example.

These heuristic analogues in archaeology are a particularly weak form of argument. In comparison with formal analogy, where archaeologists have developed strategies to improve the strength and relevance of analogical reasoning (Wylie, 1985), heuristic analogues use only a single case for their source-side comparison. In his discussion of argument by analogy, Matthew Johnson (2010:66–69) phrases a number of hypothetical examples in terms of this kind of single-case, heuristic comparison. Additional critiques of complex untested analogues are found in the literature on comparisons in the discipline of history (Kocka, 2003; Sewell, 1967) and anthropology (Ember & Ember, 2009; Bodnár, 2019). While the single-case analogy may suffice if one’s goals are to “color the past,” or “make it recognizable to us and our audiences” (Lamoureux-St-Hilaire, 2020:8), such heuristic analogues are inadequate if our goal is to explain past events and processes (Meyer et al., 1998).

3.3.3 The Popularity of Abstract Social Theory

For many archaeologists, “theory” has come to be synonymous with highly abstract social theory (Thomas, 2015).
While this body of thought may be useful for understanding the social world on a very general, philosophical level, it is not of much help for understanding the basic human activities, institutions, and social conditions that comprise human life and society on a daily basis. High-level theory is very broad and applicable to many situations, but its empirical content is quite low (Abend, 2008; Mills, 1959; Tilly, 2008; Bunge, 1999). In the social sciences, most explanatory theory is of a lower epistemological level, and it is often called “middle-range theory” (Merton, 1968). This concept has almost nothing in common with the notion of middle-range theory as used by Lewis Binford (1983, 1989); for comment, see Raab and Goodyear (1984), or Smith (2015b).

Because of the highly abstract epistemological level of much social theory in archaeology (Thomas, 2015), it is hard to make rigorous empirical arguments. Concepts such as practice theory, materiality, alterity and assemblage theory are consistent with a staggeringly broad range of propositions, to the point where it can be difficult to determine whether a particular argument is supported or not. Proponents of these approaches find it difficult or unpleasant to frame their arguments in a fashion that can be tested, including formal inductive analogies. Not surprisingly, those archaeologists who promote the use of abstract social theory
(Johnson, 2010; Hodder & Hutson, 2003) are the same writers who disparage the use of testing in archaeology.

3.4 The Structure of Arguments: Stephen Toulmin’s Scheme

“An argument is a connected series of statements intended to establish a proposition” (Monty Python, 1989:86).

In 1958 Stephen Toulmin introduced a new formal approach to argumentation to the philosophy of science (Toulmin, 2003). In place of the former scheme, which was based on major premises, minor premises and conclusions, his approach emphasized the varied nature of facts and their level of support. He introduced the strength of arguments as an important consideration, using a diagram (Fig. 3.1) to show the logical trajectory from data or facts to the claim. Warrants, which provide justification and support for the claim, are a key feature. This scheme was then developed for archaeology by Chapman and Wylie (2016); see also Bonnin (2019) for an account relevant to archaeology.

The following summary is based on Toulmin (2003), Chapman and Wylie (2016), and Bonnin (2019). Bonnin suggests that to begin the process, facts are preferable to data, the term used by Toulmin. Data connotes the total information generated by a project, whereas facts better describe the specific pieces of data arrayed for a given argument. The line from facts to claims is supported by warrants. Warrants are “general, hypothetical statements, which act as bridges and authorize the sort of step to which our particular argument commits us” (Toulmin, 2003:91). This is probably the most important innovation of Toulmin’s approach. Toulmin’s concept of warrant was incorporated into the generalized argument structure described in the textbook, The Craft of Research (Booth et al., 2008:chap. 7). These authors define warrant as a “general principle that justifies relating your particular reason to your particular claim” (p. 114).
Fig. 3.1 Stephen Toulmin’s diagram for arguments. (Graphic by Michael E. Smith, based on Toulmin (2003:97))

Most warrants in archaeological arguments consist of either comparative data or theory (Smith, 2015b:20), and a variety of models also regularly serve as archaeological warrants. Analogy is one of the formal ways that comparative data are employed as warrants. Warrants are justified and supported by backings. Bonnin (2019:6) defines backings as “further facts that can be brought to ensure the applicability of the warrants by specifying that the circumstances in which the warrants are applied are the right ones. Backings are
secondary facts used in support of warrants. Backings are distinguished from facts functionally.” Rebuttals “identify exceptions and delimit the scope of an argument” (Chapman & Wylie, 2016:35); they “indicate specific circumstances in which the claim made would turn out to be invalid” (Bonnin, 2019:4). Finally, qualifiers describe the quality and strength of the evidence (facts, warrants, backings) as it relates to the strength of the argument. Toulmin’s (2003) examples of qualifiers include “This must be the case,” “This may be the case,” as well as terms such as certainly, probably, and possibly. These are difficult to incorporate into the basic diagram. Chapman and Wylie (2016) provide a number of complex archaeological arguments diagrammed with Toulmin’s scheme, and I refer readers to that source for an excellent discussion.

3.5 Using Warrants to Distinguish Strong and Weak Arguments

The warrants in Toulmin’s scheme provide a way to measure the strength or weakness of a particular argument. In this section I link the topic of testing to Toulmin’s argument structure. The warrants in strong arguments are those that have been, or can be, tested and rejected if necessary, while many or most of the warrants in weak arguments cannot be tested. Social science methodologists 6 and Bellamy state, “We can define warrant as the degree of confidence that we have in an inference’s capability to deliver truths about the things we cannot observe directly” (6 & Bellamy, 2012:13). Warrants and their backings provide support to the claim of an argument; they are not subjected to testing in a given study. If a given warrant and its backings have been tested in the past, their relevance and strength are much greater than those of a warrant that has not been, or cannot be, tested. Economists Klappholz and Agassi discuss the importance of testing as follows: “our interest in testing stems from the fact that we learn by it.
Yet in order to learn it is necessary that the test be such as to expose a hypothesis to the risk of falsification” (Klappholz & Agassi, 1959:65). These points are best illustrated with examples. Instead of using the graphical representation of arguments, I employ a list format that permits a more efficient use of space; these lists can be easily transformed into the graphical form of Fig. 3.1. For clarity of comparison, I limit consideration to relatively simple arguments for the use or significance of domestic artifacts and features. Two strong arguments from my own research are shown in Fig. 3.2. Argument 1—that small bowls were tools used in hand-spinning cotton—is supported by a number of warrants, each supported in turn by solid backings (Smith & Hirth, 1988; Smith, 2015a). I use the term “solid” because these backings have been exposed to evaluation in the past. For example, a prior implied claim that a twirling spindle
leaves abrasion on the base of a bowl (Warrant 2, Backing 1) has never been falsified. Most of the warrants for this argument amount to prior tests that have failed to falsify specific claims. As a result, Argument 1 is a strong argument that is widely accepted by archaeologists.

Fig. 3.2 Schematic depiction of two strong arguments from the author’s research. Argument 1 is based on Smith and Hirth (1988) and Smith (2015a); argument 2 is based on Olson and Smith (2016)

The warrants and backings for Argument 2, that large houses at Aztec rural provincial sites were elite residences, are not quite as secure as in the first case. These backings are based on social patterns and trends, not on technological constraints as in Argument 1. While the social patterns are relatively strong (Olson & Smith, 2016; Smith, 1992), they are somewhat weaker than the
warrants and backings in Argument 1. I judge “very likely” as an appropriate qualifier for this argument.

Fig. 3.3 portrays two weak interpretivist arguments for the meanings of ceramic figurines in terms of political ideology.1 Most archaeologists would probably agree that any interpretation of the meaning of ancient objects will be less secure (weaker) than most interpretations of the uses of objects or buildings. Indeed, many archaeologists are of the opinion that meanings cannot be recovered for ancient objects in the absence of texts (Flannery & Marcus, 1993). One advantage of Toulmin’s scheme is that it allows us to see precisely where the weaknesses in such arguments lie.

In Argument 3, Elizabeth Brumfiel (1996) makes the claim that the attributes and contexts of female ceramic figurines imply that the dominant ideology of the Aztec state was resisted by commoners. The warrants she gives are noteworthy for their low level of support by backings. Some have no (stated) backings at all, and others are simply weak. Warrant 2 is backed by an abstract theoretical position that can be reconciled with a number of conflicting claims. Because of the abstract nature of the dominant ideology thesis, it cannot be tested directly, making this a weak warrant and backing. Warrant 4 is backed by citing the opinions of other scholars, rather than empirical findings. This is an example of empty citation, as discussed above. Warrants 1 and 5 are basically assertions. I suggest that the qualifier “It is possible that” indicates the level of strength of this argument. Brumfiel (1996:161) qualifies her argument as follows: “where the influence of the dominant ideology is felt, it does not always result in ideological dominance.” While this is a somewhat vague conclusion, it does provide some qualification to her central claim.
In Argument 4, Christina Halperin (2009) builds on Brumfiel’s paper to make a related argument about the role of figurines in transmitting a dominant ideology. As in Argument 3, the backings here are quite weak; they include abstract theoretical positions (Warrants 2 and 3), and assertions (Warrants 4 and 5). Again, this is a weak and speculative argument, but that does not stop the author from making rather definite claims without qualification; one example is her conclusion: “figurines aided in the dissemination of state symbols and ideologies” (Halperin, 2009:396).

1 I chose these examples because they are two of the cases that sparked my initial inquiry into the strength of arguments (Smith, 2015b). I saw these as particularly weak or problematic arguments whose validity could not easily be tested, an observation that led me to investigate the epistemology of argumentation in greater depth.

If archaeologists were to use Toulmin’s scheme to discuss and present their arguments, they might be induced to provide more realistic qualifiers of the strength of those arguments. This is, in fact, a fundamental requirement of arguments in science. Arguments “should never be a categorical assertion, but should always convey the author’s assessment of the credibility of his own claims” (Ziman, 1978:64); see also 6 and Bellamy (2012:36–37). In some fields, such as climate change research by the Intergovernmental Panel on Climate Change, scientists
48 M. E. Smith Fig. 3.3 Schematic depiction of two weak arguments. Argument 3 is from Brumfiel (1996); argument 4 is in Halperin (2009) have developed coding systems for explicitly indicating the strength of every claim (Adler & Hirsch Hadorn, 2014; Ebi, 2011), something archaeologists should consider doing.
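One way to make the strength of an argument explicit is to record each Toulmin component as data, so that missing backings and the chosen qualifier are visible at a glance. The Python sketch below is only an illustration under stated assumptions, not a procedure proposed in this chapter: the names `Warrant`, `ToulminArgument`, `unbacked_warrants` and `qualified_claim` are hypothetical, and the encoding of Argument 3 covers only the warrants characterised in the text (1, 2, 4 and 5), with backing labels paraphrased from it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Warrant:
    """A numbered warrant and, if one is stated, the kind of backing offered."""
    number: int
    backing: Optional[str] = None  # None means no backing was stated


@dataclass
class ToulminArgument:
    """Minimal, hypothetical record of a Toulmin-style argument."""
    claim: str
    qualifier: str
    warrants: List[Warrant] = field(default_factory=list)

    def unbacked_warrants(self) -> List[int]:
        """Return the numbers of the warrants that have no stated backing."""
        return [w.number for w in self.warrants if w.backing is None]

    def qualified_claim(self) -> str:
        """Prefix the claim with its qualifier so the strength is always stated."""
        return f"{self.qualifier} {self.claim}"


# Illustrative encoding of Argument 3 (Brumfiel, 1996) as described in the text.
argument_3 = ToulminArgument(
    claim="the dominant ideology of the Aztec state was resisted by commoners",
    qualifier="It is possible that",
    warrants=[
        Warrant(1),  # described as an assertion
        Warrant(2, backing="abstract theoretical position"),
        Warrant(4, backing="opinions of other scholars"),
        Warrant(5),  # described as an assertion
    ],
)

print(argument_3.unbacked_warrants())  # [1, 5]
print(argument_3.qualified_claim())
```

Keeping the qualifier as a required field is a deliberate design choice in this sketch: a claim cannot be recorded without an explicit statement of its strength.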
3.6 Models as Arguments
A model is a simplified representation of some part of the world created in order to better understand the organization and dynamics of that part of the world. In the words of John Ziman (1978:23), a model is “no more than an analogy or metaphor. It implies a structure of logical and mathematical relations that has many similarities with what it purports to explain, but cannot be fully identified with it.” Following the definitions of arguments and models employed in this paper, models can be seen as a type of argument, and arguments can in turn be seen as a type of model.
In spite of their abstract similarity, these two concepts, models and arguments, are rarely discussed together in archaeology. Each has its own literature, with relatively few citations across one another. Chapman and Wylie (2016) discuss both in their book, but in separate chapters without cross-references. These authors only connect the two concepts at an abstract level, where arguments and models are both components or strands of the cables of evidence and inference that make up archaeological knowledge of the past. I suggest that in order to achieve a more comprehensive view of argumentation in archaeology, it is useful to view models as a type of argument. Models begin with a set of facts, they are manipulated by the analyst in ways analogous to warrants, and the end result is a claim.
From the perspective of argumentation, archaeological models consist of two sequential arguments. The first argument, which I will call the internal argument, is the model itself. The second, or external, argument is the operation that links the results of the model to some aspect of the past, or to a more general realm of social or ecological processes.
Archaeological works on modeling devote most or all of their attention to the internal arguments (Clarke, 1972; Wylie, 2017b; van der Leeuw & McGlade, 1997; Kohler & van der Leeuw, 2007; Romanowska et al., 2019). The external argument, usually called the validation of the model, is where the results are compared to external data to assess the degree of fit (Cegielski & Rogers, 2016; McGlade, 2014).
I use the terms internal and external deliberately to line up with the concepts of internal and external validity in the field of social science methodology. Internal validity asks whether a finding is true for a chosen sample. That is, does the model operate properly and produce results that make sense given the inputs and methods? External validity asks whether a given finding can be generalized to a broader population of cases (Gerring, 2012:84). James McGlade (2014:288), in discussing issues in archaeological simulation models, stresses the need for “a stronger focus on epistemological issues, rather than on technological/methodological preoccupations,” and this suggestion can be mapped onto the internal/external division. Indeed, the preoccupation of modelers with internal validity at the expense of external validity parallels the situation in the field of economics. In the words of philosopher of science Nancy Cartwright:
“Economists make a huge investment to achieve rigor inside their models, that is to achieve internal validity. But how do they decide what lessons to draw about target situations outside from conclusions rigorously derived inside the model? That is, how do they establish external validity? We find: thought, discussion, debate; relatively secure knowledge; past practice; good bets. But not rules, check lists, detailed practicable procedures; nothing with the rigor demanded inside the models.” (Cartwright, 2007:18)
Perhaps an acknowledgement of models as two-part arguments may promote greater attention to epistemology by archaeological modelers, as called for by McGlade (2014).
3.7 Conclusions
One of the negative consequences of the period when argumentation and epistemology receded in archaeology is that weak arguments have become tolerated. Weak arguments are now a regular feature of peer-reviewed publications. Their conclusions, often based on abstract social theory combined with scanty empirical evidence and no testing, are not reliable, and this prevents the development of a strong foundation of solid archaeological evidence. While this situation may be acceptable to those with an interpretivist orientation (where concerns are local, with little interest in generalization), a scientific perspective requires the creation of a reliable body of findings, and those findings must rely on adequate forms of argumentation.
The chapters in this book contribute to a growing trend of published studies on argumentation in archaeology. In addition to the suggestions of other chapters, I propose that attention to the form and structure of arguments can improve the reliability and usefulness of the claims we make from archaeological data. The works of philosophers of science who focus on social science and history (Toulmin, 2003; Little, 2010; Wylie, 2002) are very helpful in this regard, but it is up to archaeological practitioners to do what it takes to improve our arguments. A continuing methodological advancement in argumentation will have two benefits.
First, it will help create a more robust record of archaeological knowledge, thereby improving our understanding of past societies and cultures around the world. Second, it will allow archaeological data to be used in transdisciplinary research that goes beyond the confines of our discipline (Smith, 2021) and contributes to broader research questions in the social, natural, and historical sciences.
Acknowledgements
An email exchange with Alison Wylie helped organize my thoughts on several issues of argumentation. Iza Romanowska, Stefani Crabtree, and several other archaeological modelers on Twitter stimulated my thinking on models and their relationship with arguments. Frasier Neiman made some insightful comments and suggestions that helped me see the linkages between testing and warrants with more clarity.
References
6, P., & Bellamy, C. (2012). Principles of methodology: Research design in social science. Sage.
Abbott, A. (2004). Methods of discovery: Heuristics for the social sciences. Norton.
Abbott, A. (2010). Varieties of ignorance. American Sociologist, 41, 174–189.
Abend, G. (2008). The meaning of “theory”. Sociological Theory, 26, 173–199.
Adler, C. E., & Hirsch Hadorn, G. (2014). The IPCC and treatment of uncertainties: Topics and sources of dissensus. Wiley Interdisciplinary Reviews: Climate Change, 5(5), 663–676. https://doi.org/10.1002/wcc.297
Binford, L. R. (1981). Bones: Ancient men and modern myths. Academic.
Binford, L. R. (1983). In pursuit of the past: Decoding the archaeological record. Thames and Hudson.
Binford, L. R. (1989). Debating archaeology. Academic.
Bodnár, J. (2019). Comparing in global times: Between extension and incorporation. Critical Historical Studies, 6(1), 1–32.
Bonnin, T. (2019). Evidential reasoning in historical sciences: Applying Toulmin schemes to the case of Archezoa. Biology and Philosophy, 34(2), 30.
Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research (3rd ed.). University of Chicago Press.
Brumfiel, E. M. (1996). Figurines and the Aztec state: Testing the effectiveness of ideological domination. In R. P. Wright (Ed.), Gender and archaeology (pp. 143–166). University of Pennsylvania Press.
Bunge, M. (1999). Social science under debate: A philosophical perspective. University of Toronto Press.
Bunge, M. (2004). How does it work?: The search for explanatory mechanisms. Philosophy of the Social Sciences, 34(2), 182–210.
Bunge, M. (2013). Bruce Trigger and the philosophical matrix of scientific research. In S. Chrisomalis & A. Costopolous (Eds.), Human expeditions: Inspired by Bruce Trigger (pp. 143–159). University of Toronto Press.
Cartwright, N. (2007). Are RCT’s the gold standard? BioSocieties, 2(1), 11–20.
Cegielski, W. H., & Rogers, J. D. (2016). Rethinking the role of agent-based modeling in archaeology. Journal of Anthropological Archaeology, 41, 283–298. https://doi.org/10.1016/j.jaa.2016.01.009
Chapman, R., & Wylie, A. (Eds.). (2015). Material evidence. Routledge.
Chapman, R., & Wylie, A. (2016). Evidential reasoning in archaeology. Bloomsbury Press.
Clark, G. A. (2000). On the questionable practice of invoking the metaphysic. American Anthropologist, 102(4), 851–853.
Clarke, D. L. (Ed.). (1972). Models in archaeology. Methuen.
Copi, I. M. (1982). Introduction to logic (6th ed.). Macmillan.
Currie, A. (2016). Ethnographic analogy, the comparative method, and archaeological special pleading. Studies in History and Philosophy of Science Part A, 55, 84–94.
Dearing, J. A., Braimoh, A. K., Reenberg, A., Turner, B. L., II, & van der Leeuw, S. (2010). Complex land systems: The need for long time perspectives to assess their future. Ecology and Society, 15(4), 21.
Demeulenaere, P. (Ed.). (2011). Analytical sociology and social mechanisms. Cambridge University Press.
Diamond, J. (2004). Collapse: How societies choose to fail or succeed. Viking.
Domenici, D. (2018). Beyond dichotomies: Teotihuacan and the Mesoamerican urban tradition. In D. Domenici & N. Marchetti (Eds.), Urbanized landscapes in early Syro-Mesopotamia and prehispanic Mesoamerica: Papers of a cross-cultural seminar held in honor of Robert McCormick Adams (pp. 35–70). Otto Harrassowitz.
Ebi, K. L. (2011). Differentiating theory from evidence in determining confidence in an assessment finding. Climatic Change, 108, 693–700.
Ember, C. R., & Ember, M. (2009). Cross-cultural research methods. AltaMira.
Eppich, K. (2020). Analogy as theory and method. The SAA Archaeological Record, 20(1), 31–34.
Flannery, K. V. (1973). Archaeology with a capital S. In C. L. Redman (Ed.), Research and theory in current archaeology (pp. 47–58). Wiley.
Flannery, K. V., & Marcus, J. (1993). Cognitive archaeology. Cambridge Archaeological Journal, 3, 260–270.
Fogelin, L. (2007). Inference to the best explanation: A common and effective form of archaeological reasoning. American Antiquity, 72, 603–625.
Gerring, J. (2012). Social science methodology: A unified framework. Cambridge University Press.
Gibbon, G. (2014). Critically reading the theory and methods of archaeology: An introductory guide. Rowman and Littlefield.
Glantz, M. H. (1991). The use of analogies: In forecasting ecological and societal responses to global warming. Environment: Science and Policy for Sustainable Development, 33(5), 10–33.
Glantz, M. H. (2019). Societal responses to regional climatic change: Forecasting by analogy. Routledge.
Haber, S. (1999). Anything goes: Mexico’s “new” cultural history. Hispanic American Historical Review, 79, 309–330.
Halperin, C. T. (2009). Figurines as bearers of and burdens in late classic Maya state politics. In C. T. Halperin, K. A. Faust, R. Taube, & A. Giguet (Eds.), Mesoamerican figurines: Small-scale indices of large-scale social phenomena (pp. 378–403). University Press of Florida.
Harzing, A.-W. (2002). Are our referencing errors undermining our scholarship and credibility? The case of expatriate failure rates. Journal of Organizational Behavior, 23, 127–148.
Hedström, P. (2005). Dissecting the social: On the principles of analytical sociology. Cambridge University Press.
Hegmon, M. (2003). Setting theoretical egos aside: Issues and theory in North American archaeology. American Antiquity, 68, 213–243.
Hempel, C. (1965). Aspects of scientific explanation. Free Press.
Henige, D. P. (2011). Truth or hope? Stimulus and response in scholarly publishing. Journal of Scholarly Publishing, 42(2), 205–225.
Hodder, I., & Hutson, S. R. (2003). Reading the past. Cambridge University Press.
Johnson, M. (2010). Archaeological theory: An introduction. Blackwell.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.
Klappholz, K., & Agassi, J. (1959). Methodological prescriptions in economics. Economica, 26(101), 60–74.
Kocka, J. (2003). Comparison and beyond. History and Theory, 42, 39–44.
Kohler, T. A., & van der Leeuw, S. E. (Eds.). (2007). Model-based archaeology of socionatural systems. SAR Press.
Lamoureux-St-Hilaire, M. (2020). Comparative approaches and analogical reasoning for Mayanists. The SAA Archaeological Record, 20(1), 8–13.
Leung, K. (2011). Presenting post hoc hypotheses as a priori: Ethical and theoretical issues. Management and Organization Review, 7(3), 471–479.
Little, D. (2010). New contributions to the philosophy of history. Springer.
Mahoney, J., Kimball, E., & Koivu, K. L. (2009). The logic of historical explanation in the social sciences. Comparative Political Studies, 42(1), 114–146.
Manicas, P. T. (2006). A realist philosophy of social science: Explanation and understanding. Cambridge University Press.
Martinón-Torres, M., & Killick, D. (2013). Archaeological theories and archaeological sciences. In A. Gardner, M. Lake, & U. Sommer (Eds.), Oxford handbook of archaeological theory. Oxford University Press.
McGlade, J. (2014). Simulation as narrative: Contingency, dialogics, and the modeling conundrum. Journal of Archaeological Method and Theory, 21(2), 288–305.
Merton, R. K. (1968). Social theory and social structure. Free Press.
Meyer, W. B., Butzer, K. W., Downing, T. E., II, Wenzel, B. L. T., & Wescoat, J. L. (1998). Reasoning by analogy. In S. Rayner & E. L. Malone (Eds.), Human choice and climate change, vol. 3: Tools for policy analysis (pp. 217–289). Battelle Press.
Mills, C. W. (1959). The sociological imagination. Oxford University Press.
Monty Python. (1989). The complete Monty Python’s flying circus: All the words (Vol. 2). Pantheon Books.
Morgan, C. G. (1973). Archaeology and explanation. World Archaeology, 4(3), 259–276.
Moro Abadía, O., & Lewis-Sing, E. (2021). The decline of epistemology in archaeology: Comments on an ongoing discussion. In L. Coltofean-Arizancu & M. Díaz-Andreu (Eds.), Interdisciplinarity and archaeology: Scientific interactions in nineteenth- and twentieth-century archaeology (pp. 203–223). Oxbow Books.
Olson, J. M., & Smith, M. E. (2016). Material expressions of wealth and social class at Aztec-period sites in Morelos, Mexico. Ancient Mesoamerica, 27(1), 133–147.
Orser, C. E., Jr. (2014). Archaeological thinking: How to make sense of the past. Rowman and Littlefield.
Popper, K. R. (1934). The logic of scientific discovery. Harper and Row.
Raab, L. M., & Goodyear, A. C. (1984). Middle-range theory in archaeology: A critical review of origins and applications. American Antiquity, 49, 255–268.
Ragin, C. C., & Amoroso, L. M. (2011). Constructing social research: The unity and diversity of method. Sage.
Romanowska, I., Crabtree, S. A., Harris, K., & Davies, B. (2019). Agent-based modeling for archaeologists: Part 1 of 3. Advances in Archaeological Practice, 7(2), 178–184. https://doi.org/10.1017/aap.2019.6
Sabloff, J. A., Beale, T. W., & Kurland, A. M., Jr. (1973). Supplement: Recent developments in archaeology. Annals of the American Academy of Political and Social Science, 408, 103–118.
Sewell, W. H. (1967). Marc Bloch and the logic of comparative history. History and Theory, 6(2), 208–218.
Smith, M. E. (1992). Archaeological research at Aztec-period rural sites in Morelos, Mexico. Volume 1, Excavations and Architecture/Investigaciones arqueológicas en sitios rurales de la época Azteca en Morelos, Tomo 1, excavaciones y arquitectura. University of Pittsburgh.
Smith, M. E. (Ed.). (2015a). Artefactos Domésticos de Casas Posclásicas en Cuexcomate y Capilco, Morelos. Archaeopress.
Smith, M. E. (2015b). How can archaeologists make better arguments? The SAA Archaeological Record, 15(4), 18–23.
Smith, M. E. (2017). Social science and archaeological inquiry. Antiquity, 91(356), 520–528.
Smith, M. E. (2021). Why archaeology’s relevance to global challenges has not been recognized. Antiquity, 95, 1061–1095.
Smith, M. E., & Hirth, K. G. (1988). The development of Prehispanic cotton-spinning technology in western Morelos, Mexico. Journal of Field Archaeology, 15, 349–358.
Thomas, J. (2015). The future of archaeological theory. Antiquity, 89, 1287–1296.
Tilly, C. (2008). Explaining social processes. Paradigm Publishers.
Todd, P. A., Guest, J. R., Lu, J., & Chou, L. M. (2010). One in four citations in marine biology papers is inappropriate. Marine Ecology Progress Series, 408, 289–303.
Toulmin, S. (2003). The uses of arguments (Updated ed.). Cambridge University Press.
Trigger, B. G. (2006). A history of archaeological thought (2nd ed.). Cambridge University Press.
van der Leeuw, S. E., & McGlade, J. (Eds.). (1997). Time, process, and structured transformation in archaeology. Routledge.
Watson, P. J., LeBlanc, S. A., & Redman, C. L. (1971). Explanation in archaeology: An explicitly scientific approach. Columbia University Press.
Watson, P. J., LeBlanc, S. A., & Redman, C. L. (1974). The covering law model in archaeology: Practical uses and formal interpretations. World Archaeology, 6(2), 125–132.
Wylie, A. (1985). The reaction against analogy. Advances in Archaeological Method and Theory, 8, 63–111.
Wylie, A. (Ed.). (2002). Thinking from things: Essays in the philosophy of archaeology. University of California Press.
Wylie, A. (2017a). From the ground up: Philosophy and archaeology. Proceedings of the American Philosophical Association, 91, 118–136.
Wylie, A. (2017b). Representational and experimental modeling in archaeology. In L. Magnani & T. Bertolotti (Eds.), Springer handbook of model-based science (pp. 989–1002). Springer.
Ziman, J. (1978). Reliable knowledge: An exploration of the grounds for belief in science. Cambridge University Press.
Chapter 4
A Causal Model Application to a Cultural Heritage Sentence Analysis
Alejandro Sobrino and Beatriz Calderón-Cerrato
Abstract In this paper we approach a cultural heritage sentence focusing on its causal content, with the aim of providing a causal graph that, once pruned using Bayesian techniques, shows schematically and in abbreviated form the essential content of the sentence for a non-specialist or general audience. For that purpose, the paper develops the following story line. We begin by noting the frequent controversies around heritage and its prosecution when discrepancies emerge. Next, we analyze a Spanish legal sentence about cultural heritage, focusing on its causal structure and lexicon. In this respect, relevant aspects of causality are discussed from both a logical and a lexical point of view, which makes it possible to extract from the text of the judgment those sentences that are causally most salient. Differences between the causality of physical facts and that of legal facts are also clarified. Finally, a causal graph is drawn from the selected set of causal phrases of the sentence, and a Bayesian analysis is applied to separate effective causes from spurious ones in order to understand the judge’s verdict. We conclude that causal analysis is useful for grasping the factual and evidentiary contents of a sentence about heritage.
Keywords Causality · Knowledge · Explanation · Bayesian networks · Counterfactual
A. Sobrino (corresponding author), Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain, e-mail: alejandro.sobrino@usc.es
B. Calderón-Cerrato, Incipit CSIC, Santiago de Compostela, Spain, e-mail: beatriz.calderon-cerrato@incipit.csic.es
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_4
4.1 Cultural Heritage and Its Disputes
The cultural heritage of a community is, together with other factors such as its geography, its weather or its language, a property that defines a people. Cultural heritage is anchored in the past in terms of the legacy we receive, and is projected into the future as a unifying force of societies that use their identities as elements that define and differentiate them: the peculiarity of their squares, their dances, their traditions or their accents. Cultural heritage comprises not only material heritage, but also natural and intangible heritage, through which access to cultural diversity is promoted, conveying lifestyles and experiences between generations. This has given cultural heritage prominence in our modern societies, where tourism has turned into a massive and regular activity with an essential economic function. It is indispensable, then, to look after its authenticity and its good conservation. Heritage has to be exhibited but at the same time preserved, in a difficult balance in which sustainability is a key element. Heritage maintenance requires policies that protect it in its fragility, reconciling the necessary exploitation of the environment with care for the legacy of our ancestors, and investing in the conservation and revitalization of what is inherited and in its legal protection against possible abuses.
Attempts have been made to preserve the ‘fragile wealth’ that heritage entails through at least three types of actions:
1. Identification: registrations and inscriptions, which determine what elements are valuable and require special attention in their defense or safeguarding. Registering and inventorying thus become necessary activities to know what each culture has and to detect what it lacks.
2. Promotion: investment and awareness, which measure the degree of commitment of public bodies to the valorization and defense of what is inherited; promotion of continued investments that result in the conservation and luster of heritage, involving the private sector and civil society, which thus recognizes its importance.
3. Preservation: prohibiting by law those actions that, voluntarily or involuntarily, damage the heritage, with those responsible receiving the appropriate notice or reprimand. If the protection of an asset causes a loss of profit, the Administration assumes the appropriate compensation.
In the case of legally protected Listed Heritage Items (LHI from now on), judicial decisions play a leading role. A judicial sentence is a legal resolution that expresses a final decision on a process, which can be either criminal or civil. With the judicial sentence, the litigation or lawsuit filed by the parties ends and the judge issues the final resolution. With regard to form, sentences can be classified as written or oral, although the latter is only possible for some processes. In this work we use a written sentence. Regarding their content and explanations, sentences must have the following parts or sections:
1. Preamble: it contains the data of the place, date, identification of the parties, lawyers, procedure number, etc.
2. Background of facts and proven facts: it sets out the requests of the parties involved in the process and states what has happened according to the judge’s assessment and the available evidence.
3. Fundamentals of Law: it describes, in separate and numbered paragraphs, the legal arguments that have motivated the resolution in favor of one of the parties.
4. Operative part and ruling: it contains the judge’s decision and determines the resolution by which the parties must abide.
However, civil sentences like the one we analyze do not have a precise structure prescribed in the Civil Procedure Law itself (Taranilla, 2015: 67). The proven facts therefore do not usually form an independent section within the overall structure of the text and are instead integrated into the section on the fundamentals of law. Even so, the sentence that we analyze has a sub-section within the fundamentals of law, called Relevant background for the resolution of the case, that differentiates these two sections.
Judicial sentences are argumentative texts in which the verdict is reasoned from the proven facts and the fundamentals of law. The sentence follows a logical scheme of legal coverage, in which proven facts are subsumed under a legal rule to derive the expected consequences. The parties must defend their points of view by relying on convincing facts and theories, as well as on supporting evidence, in order to persuade the court and convince the judges to accept their petitions through arguments. As an argumentative text, it can be formalized using a logical language in an attempt to demonstrate how conclusions follow deductively from premises.
However, this approach is questionable because (i) it ignores that law is not only a conceptual or axiomatic system, but also has intentional agents and social effects that may require a legal rule to be annulled or modified, and (ii) legislators can never fully predict under what circumstances the law should apply. Both factors imply that legislation is formulated in general and abstract terms, which creates uncertainty and room for disagreement. Legal reasoning seems to respond to a dynamic logic in which the logical status of a premise can change when other premises are incorporated, altering the assessment that the first one had received. Bayesian networks offer a model to formally study these cases.
Since they are argumentatively linked texts, court rulings include conditional and causal sentences that follow generic patterns such as ‘Si . . . entonces’ (‘If . . . then’), ‘En consecuencia’ (‘In consequence’) or ‘Hay relación de causalidad’ (‘There is a causal relation’). In this paper, we analyze a court ruling about cultural heritage in order to: (i) show its conditional and causal structure, given the abundance and prominence of this type of sentence, (ii) draw a causal graph that visually shows the flow of reasoning that leads to the verdict, and (iii) use counterfactual or Bayesian analysis of causality to prune the causal graph and show, in an effective and summarized way, the causes that de facto substantiate the verdict.
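As a rough illustration of how sentences carrying such markers might be flagged before any graph is drawn, the Python sketch below matches only the three Spanish patterns quoted above. It is a minimal sketch under stated assumptions, not the extraction procedure used in this chapter: the dictionary `CAUSAL_MARKERS`, the function `tag_markers` and the example sentences are all hypothetical and were invented for this illustration.

```python
import re
from typing import List

# Hypothetical patterns covering only the three markers quoted in the text.
CAUSAL_MARKERS = {
    "conditional": re.compile(r"\bsi\b.+?\bentonces\b", re.IGNORECASE),
    "consequence": re.compile(r"\ben consecuencia\b", re.IGNORECASE),
    "causal_relation": re.compile(r"\brelación de causalidad\b", re.IGNORECASE),
}


def tag_markers(sentence: str) -> List[str]:
    """Return the labels of all marker patterns found in the sentence."""
    return [
        label
        for label, pattern in CAUSAL_MARKERS.items()
        if pattern.search(sentence)
    ]


# Invented example sentences (not taken from the ruling).
examples = [
    "Si se inscribe el bien, entonces se protege.",
    "En consecuencia, se desestima el recurso.",
    "No hay relación de causalidad entre la inscripción y el daño.",
    "El tribunal tiene su sede en Pamplona.",
]

for text in examples:
    print(tag_markers(text), "<-", text)
```

A surface match of this kind only signals that a sentence is a candidate; whether it asserts or denies a causal link (as in the third example) still has to be decided by reading it.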
4.1.1 Analysis of a Sentence About Cultural Heritage
The sentence under analysis was issued by the Superior Court of Justice (Contentious Chamber) based in Pamplona/Iruña, in appeal No. 306/2018 against Resolution 95/2018 of April 6 of the General Director of Culture and the Príncipe de Viana Institution. Next, we summarize the sentence according to the aforementioned parts:
Background of facts:
• 08.11.1984. Mármoles Baztán is granted the exploitation of the Alkerdi-Berroberría area to extract marble.
• The concession is extended 30 years from 2014.
• 05.07.2014. There is a blast by Mármoles Baztán that causes alarm in the population. The Urdax City Council requires the concessionaire not to do any more blasting and entrusts a study to the Department of Environment of the Autonomous Community.
• 12.08.2014. Blasting is temporarily suspended and archaeological studies are started in nearby caves.
• 04.08.2016. Paleolithic engravings are discovered.
• 06.08.2016. The entire Alkerdi-Berroberría system is registered as LHI.
• 31.08.2016. Mármoles Baztán files a patrimonial claim for damages derived from the inscription as LHI, which is dismissed.
• 21.11.2016. Mármoles Baztán also files a patrimonial claim with the Economic Development Council of the Government of Navarra for damages derived from the declaration of nullity of the concession extension. It is dismissed.
• 12.09.2016. Once a new extension was processed, it was rejected due to the existence of LHI.
• 13.06.2018. An appeal was filed, and it was dismissed.
• 29.05.2018. The inscription was appealed again, and the appeal was rejected because there was rock art.
Fundamentals of Law:
• Registration as LHI is a non-declarative procedural act that lacks the substantive force to produce damages (sentences of 19 Nov 2013 and TS 21 Oct 2008).
• It is necessary for the administration to issue an act by which the existence of LHI is recognized or declared (Art. 40.2 of Law 16/1985 on Spanish Historical Heritage).
The parties allege that:
• Plaintiff: the causal relationship between the actions of the administration (registration as LHI) and the damages derived from the impossibility of continuing with the exploitation is evident.
• Defendant: the causal relationship between the declaration of the registration as LHI of the Alkerdi-Berroberría system and the damage now claimed has not been proven.
• Co-defendant: Zurich Ins. says that the policy does not cover the claim that is the subject of this process. There is no causal relationship between the administrative action and the damages claimed.
Petition of the Defendant and Co-defendant. Annulment of the appeal and, consequently, inadmissibility of patrimonial responsibility for damages derived from the registration as LHI of the Alkerdi-Berroberría system operated by Mármoles Baztán.
Disagreement of the Plaintiff. There is a set of actions by the Administration that are determinants of the damage caused:
• Extension of the concession.
• Investigation of the Alkerdi-Berroberría system.
• Declaration of the Alkerdi-Berroberría system as LHI.
These actions were carried out, not because the law required them, but by decision of the Administration, which is the cause of the damage suffered. Therefore, the causal relationship between the activity of the administration (registration of an area as LHI) and the damages derived from not continuing the exploitation is evident.
Disagreement of the Defendant. The registration of the Alkerdi-Berroberría system as LHI is an administrative action in accordance with the law and a procedural act, motivated by the presence of cave paintings, that does not cause damage. The inscription itself has no effect on the exploitation of the quarry. There is no causal relationship between the administrative action and the damage that is said to be borne by the administered party, since registration as LHI produces neither the revocation of the extension of exploitation nor its denial.
Resolution. It was therefore established that the contested administrative action was in accordance with the law, and the appeal and the claim filed by Mármoles Baztán were dismissed because there is no causal relationship between the registration of the Alkerdi-Berroberría system as LHI and the damages claimed.
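The dispute summarized above can be pictured as a small directed graph in which the contested link is the edge from registration as LHI to the damages claimed. The Python sketch below is a hypothetical encoding made only for illustration, not the causal graph developed later in the chapter: the node names, the `narrated` and `rejected` labels, and the reading of the background events as a causal chain are assumptions. The pruning shown is purely structural (an edge is dropped and reachability is rechecked); it does not perform any Bayesian computation.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Each edge is (cause, effect, status). All names are hypothetical labels.
# "narrated": sequence taken from the background of facts (assumed causal).
# "rejected": link asserted by the plaintiff and dismissed in the resolution.
EDGES: List[Tuple[str, str, str]] = [
    ("blast", "archaeological_studies", "narrated"),
    ("archaeological_studies", "engravings_found", "narrated"),
    ("engravings_found", "lhi_registration", "narrated"),
    ("lhi_registration", "damages", "rejected"),
]


def build_graph(edges: List[Tuple[str, str, str]], keep: Set[str]) -> Dict[str, List[str]]:
    """Build an adjacency list using only edges whose status is in `keep`."""
    graph: Dict[str, List[str]] = {}
    for cause, effect, status in edges:
        if status in keep:
            graph.setdefault(cause, []).append(effect)
    return graph


def reachable(graph: Dict[str, List[str]], source: str, target: str) -> bool:
    """Breadth-first search: is there a directed path from source to target?"""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


plaintiff_view = build_graph(EDGES, keep={"narrated", "rejected"})
court_view = build_graph(EDGES, keep={"narrated"})

print(reachable(plaintiff_view, "lhi_registration", "damages"))  # True
print(reachable(court_view, "lhi_registration", "damages"))      # False
```

Keeping the status on each edge, rather than deleting rejected edges outright, lets the same data express both the plaintiff's reading and the court's.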
In this schematic summary of the sentence, which paraphrases selected passages of it, the presence of causal relationships can be noted both in the petitions of the parties and in the justification of the sentence. Causality, and the implication or subsumption of particular cases under norms, are essential in the justification of the verdict. Therefore, its causal study seems justified. This paper, then, is organized as follows: in Sect. 4.2 we analyze the structure and logical properties of conditional and causal sentences, showing analogies and differences. In Sect. 4.3 we show the differences between the concepts of causality, typical of the physical sciences, and causation, characteristic of the social sciences, such as law, and we present a causal model for its adequate analysis. In Sect. 4.4 we analyze the grammatical and lexical form of conditional and causal sentences. In Sect. 4.5, Bayesian networks are applied to the analysis of the sentence summarized above, in order to separate the effective causes from the spurious ones in the justification of the verdict or opinion. This analysis allows us to obtain a causal graph showing how the ruling follows from the proven facts. In Sect. 4.6,
we summarize the conclusions of the work and outline how it could be continued. As complementary material, the sentence under analysis is added as an annex.

4.2 Conditional Logic and Causal Logic: Analogies and Divergences

‘Ingesting 2 grams of cyanide causes death’ can be paraphrased as ‘If someone takes 2 grams of cyanide, he dies’. Conditional statements and causal statements both attempt to relate sentences to each other so that there is some link between them. While they have similarities, they also show differences. Below, we look at some analogies and some dissimilarities.

4.2.1 Strict and Material Conditional

Conditional sentences usually contain the discourse markers if . . . then, but there are sentences containing these words that do not express any relation between the protasis and the apodosis, such as If Einstein was a physicist, Madrid is the capital of Spain. That is a material conditional. Strict conditionals are opposed to material conditionals because the relation between the antecedent and the consequent is necessary. A strict conditional is one that uses the modal operator of necessity, □(p → q), meaning that q follows from p in all possible worlds, i.e., that there is no imaginable situation that makes the antecedent true and the consequent false. Such is the case in If a set A is properly contained in another set B, then A is smaller than B. The conditional of a valid inference rule is strict, since the conclusion necessarily follows from the premises.

Although from a strictly logical point of view validity is only concerned with form, in causal arguments the content is also a matter of interest. A causal argument substantiates an effective link between cause and effect. But, in addition to the actual connection between cause and effect, the conceivable but non-present elements that would enable or disable such a connection are also kept in mind.
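The contrast drawn in Sect. 4.2.1 can be illustrated with a minimal Python sketch. It is not part of the authors' method: the function names, the representation of a ‘possible world’ as a tuple, and the small universe of sets are assumptions made only for illustration. A material conditional is evaluated in a single world, whereas a strict conditional is required to hold in every world considered.

```python
from itertools import combinations

def material(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q

def strict(worlds, p, q):
    """Strict conditional: the material conditional holds in every world."""
    return all(material(p(w), q(w)) for w in worlds)

# Material reading: true in the actual world although the clauses are unrelated.
print(material(True, True))  # True (Einstein was a physicist, Madrid is the capital)

# Assumed toy worlds: every truth assignment to the two unrelated clauses.
toy_worlds = [(e, m) for e in (True, False) for m in (True, False)]
print(strict(toy_worlds, lambda w: w[0], lambda w: w[1]))  # False

# Set example: worlds are all pairs (A, B) of subsets of a small assumed universe.
universe = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(universe) + 1)
           for c in combinations(universe, r)]
set_worlds = [(a, b) for a in subsets for b in subsets]
# "A < B" on frozensets tests proper containment.
print(strict(set_worlds, lambda w: w[0] < w[1],
             lambda w: len(w[0]) < len(w[1])))  # True
```

The Einstein example fails the strict test because one of the toy worlds makes the antecedent true and the consequent false, while no pair of finite sets does so in the containment example.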
Consider the following arguments in Modus Ponens format (Table 4.1).

Table 4.1 Examples of Modus Ponens arguments
If you pull the trigger, the gun goes off | If I fertilize the plant, it grows
I pull the trigger | I fertilize the plant
Then the gun goes off | Then it grows

Formal logic classifies both arguments as equally valid, but causal logic gives them different credibility. Indeed, the argument on the left is more convincing than the argument on the right because it has fewer disabling factors: there are few scenarios where, if the trigger is pulled, the gun will not fire, but there are plenty of scenarios in which a plant gets fertilized and does not grow, for example if it is not watered enough, if it does not have good sun exposure, if it has fungus, etc. (cf. Cummings et al., 1991).

4.2.2 Indicative Versus Subjunctive or Counterfactual Conditional

In causal relations, indicative conditionals (if a then b) and subjunctive or counterfactual conditionals (not b unless a; if not a then not b) are relevant. Subjunctive conditionals are often used as a test of causality: if a is the cause of b, then if a does not occur, b does not occur either or, put differently, b does not occur unless a does. This is true if the cause is single, but not in a multi-causal scenario, where the effect may be due to an alternative cause that is just as sufficient to produce it. Subjunctive conditionals allow us to conjecture what follows if the cause does not occur in the current world, either because we assume its non-occurrence (that the Earth does not move around the Sun) or because we intervene in it (moving one billiard ball before it is hit by another and thus impeding the transmission of motion to a third). Intervention, which is an intentional human action, suggests an alternative interpretation for the word ‘condition’: in an indicative conditional statement, the antecedent is a sufficient condition for the consequent; in a subjunctive conditional statement, the effect is dependent on or independent of a possible cause, conditioned on whether or not an intervention (Markov principle) is made on it (cf. Pearl, 2009). In the case of multiple causality, this makes it possible to calculate how possible alternative causes influence each other. For example, whooping cough can be caused by two bacteria: Bordetella pertussis and Bordetella parapertussis.
Each is sufficient to cause the disease independently. Suppose a person is found to have whooping cough and it is not due to the bacterium Bordetella pertussis. The causes that were independent before are no longer so, because the exclusion of one of them allows the other to be inferred as the effective cause.

4.2.3 Positive and Negative Causality

Positive causality stems from the assumption that causal influence is produced by the transfer of matter, energy, or information from cause to effect (cf. Dowe, 2000). Negative causality, on the other hand, is the absence of such influence, whether by inaction, omission, or absence. In positive causality there is an effective relationship between cause and effect (If I pull the trigger, the gun goes off). Negative causation, by contrast, arises because the cause, the effect, or both, do not happen (The
absence of vigilance caused the robbery). Depending on which of these does not happen, negative causation receives different names: prevention, prevention by omission or prevention by absence and, finally, absence (cf. Barros, 2013). Some philosophers, such as Lewis or Armstrong, hold that negative causality is not real causality because a non-fact cannot be considered a real cause (cf. Armstrong, 1999). But there is no doubt that negative causality is invoked in fields such as medicine or law: thus, it is said that scurvy is caused by the absence of vitamin C, or that failing to help in an accident is a cause of punishment in case of serious injuries. It could be said that, even if they are not causally productive, absences are causally relevant; they do not transfer anything, but they cause effects (cf. Glennan, 2009).

Negative causation, or causation by disconnection, presents problems for a counterfactual definition of causation. In omissions, it is difficult to determine a single cause whose non-occurrence brought about the effect. In positive causality, every alternative cause is sufficient for the effect, but in negative causality, all omissions are necessary. How many omissions must be taken into account, and which of them are responsible for the effect, are questions that frequently do not have a clear answer. In the legal field this is called promiscuity and indicates that, in multi-causal situations, it is necessary to distinguish between relevant and non-relevant causes. But relevance is an imprecise term and the scales on which it is ordered may vary.

4.2.4 Transitivity

The transitivity of causality is often associated with physical or Michottian causality, exemplified by the movement of a billiard ball, which is transferred to the other balls with which it collides in a domino effect. Figure 4.1 illustrates the fact that if motion is impressed on ball A (efficient cause) and this in turn strikes ball B, which then strikes ball C, C also moves because of A.
In the ideal case, this example could have as many intermediate ball links as desired, although the inevitable friction on any surface limits the transmission of the motion initiated by A. Transitivity allows causes to be seen as proximate or distant; thus, in Fig. 4.1, B is closer to C than A is, and the closer a cause is to the effect, the better its causal influence can be identified. In Fig. 4.1, A causally influences C by means of B. But an intervention on B can put A out of play. This example shows that Michottian causality is always transitive, but causal dependence may not be (cf. McDermott, 1995).

Fig. 4.1 Michottian causality
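The effect of an intervention on B can be sketched in a few lines of Python. This is an illustrative toy model of the chain in Fig. 4.1 under idealized assumptions (no friction, deterministic transfer of motion); the function names and the do_b parameter are hypothetical and chosen only for this example.

```python
def run(a_moves, do_b=None):
    """Propagate motion along the chain A -> B -> C.

    do_b, if given, is an intervention that fixes the state of B
    regardless of what A does (e.g. pushing B or removing it from A's path).
    """
    b_moves = a_moves if do_b is None else do_b
    c_moves = b_moves
    return {"A": a_moves, "B": b_moves, "C": c_moves}

def c_depends_on_a(do_b=None):
    """Counterfactual check: does changing A change C?"""
    return run(True, do_b)["C"] != run(False, do_b)["C"]

print(c_depends_on_a())            # True: without intervention, C follows A
print(c_depends_on_a(do_b=True))   # False: B is pushed, C moves whatever A does
print(c_depends_on_a(do_b=False))  # False: B is removed, C rests whatever A does
```

Once B is fixed from outside, the state of C no longer varies with A, which is the sense in which the intervention puts A out of play.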
In effect, a counterfactual reading of Fig. 4.1 says that ‘C if B’ and that ‘B if A’. But if an intervention is made on ball B that alters its state (e.g., pushing it towards C before A hits it, or removing it from A’s trajectory), the link between C and A is broken and it is then meaningless to counterfactually attribute to A any causal power over C.

4.3 Conditionality and Causality in Natural Sciences and Law

Causality plays an undeniable role in the natural sciences, but also in the social sciences. In the former, it is usually named ‘causality’, and in the latter, ‘causation’. In the laws of physics, equations usually reflect the necessary connection between the events they relate, a link that is intended to be stable or eternal and precise. In contrast, causation in the social sciences, such as law, reflects the relationship between a person’s conduct and the damage caused, which legally takes the form of pointing out a fault or attributing guilt, and varies according to countries, courts, or individuals (cf. Lagnado & Gerstenberg, 2017:574). While in the field of physical sciences one usually speaks of causality, in the field of law the term ‘causation’ is used to emphasize that not only physical processes are at work, but also rational agents acting intentionally. We will analyse this difference according to several parameters.

3.1 Causality and precision: while causality tends to be precise, causation is generally imprecise

Physical causality is often characterized as precise: same causes, same effects, and always the same effects. Physical or Laplacian determinism argues that, if we know the laws governing matter and the initial conditions of a problem, it is possible to accurately predict any future event or occurrence. Causation in social sciences, on the other hand, is imprecise.
Legal reasoning must be based on proved facts or relationships, but also on enacted laws and the social sentiment about them. While causality is based on physical laws, causation rests on rules published by legal scholars attending only to more or less widely accepted social norms, and therefore using vague language to accommodate divergences that may lead to different interpretations (cf. Li, 2017). If the rules of physics tend to be precise, social rules are contextual and imprecise, since they involve not only immaterial objects or entities, but also qualitative reasoners using beliefs and presuppositions. Hence, legal reasoning perhaps shares more elements with history than with natural sciences (cf. Lehmann & Breuker, 2000:127–8).

3.2 Concrete and abstract causality: while causality is usually abstract, causation is concrete
Causality is concerned with regularities involving physical objects and shows a kind of necessary relationship between them, so that the cause provokes the effect, the effect does not occur without the cause, and the effect occurs after the cause. This relationship constitutes for Mackie (1980) the cement of the universe and illustrates the stable or essential regularities underlying scientific explanation. The laws of physics are often refined expressions of causal relationships. Given its relevance, causality is a key notion in science and metaphysically prior to the notions of space and time.

Causation, on the other hand, refers to a social, not a physical property. It does not occur between objects, but between events affecting agents, and it is characterized by the application of a general principle to a specific case, an action from which follow the effects that the rule foresees, usually in terms of attribution of guilt. In effect, Hart and Honoré (1959) based the assignment of liability to actors who cause harm to others on:
• their specific conduct,
• the causal connection between that conduct and the harm caused, and
• the culpability legally implied by it.
Legal causation is concerned with factual actions, and the ascription of liability is, to a large extent, a matter of cause in fact and is always made after the fact.

3.3 INUS Causation and NESS Causation: while causality is usually INUS-type, causation is said to be NESS-type

Physical causation can be uni- or pluri-causal, the latter being more abundant due to the rarity of causes that are both sufficient and necessary. Mackie characterizes physical causation as INUS causation (Insufficient but Non-redundant parts of a condition which is itself Unnecessary but Sufficient for the occurrence of the effect) (cf. Mackie, 1980), where a cause is a necessary element in a non-redundant group that is sufficient, but not necessary, to cause the effect.
For example, let us say that to identify (I) an individual, one needs to know his full name (N) and ID number (D) or passport number (P); in symbols: N ∧ (D ∨ P) → I. Let us look at one of the causal factors, (D ∨ P), and specifically at D. D is not necessary for identification, insofar as P can perform that function, but it is not sufficient either, because N is always required to obtain I. Therefore, D belongs to a causal group (D ∨ P) that is insufficient, but not redundant, for the effect, and is an unnecessary, but sufficient part of that group. Wright adapted this causality to the field of law and called it NESS causality, for Necessary Element of a Sufficient Set (cf. Wright, 1985): in a specific situation, a relevant causal condition is a necessary element of a set of conditions that are jointly responsible and sufficient for the harm to occur (cf. Honoré & Gardner, 2019).

3.4 Negative causes and negligence: while in causality absence is not always readily admitted as a cause, negative causes are a regular part of causation and can be substantiated in negligence, a clear form of causing harm

Negligence in law relates to negative causation and has to do with causes that are sufficient for the effect independently of each other. When there are several
negligent actors that are independent of each other, each of them can be considered an independent sufficient cause and, therefore, legally liable for the damage provoked, as each is a substantial and autonomous causal factor of it. For example, when a fire is favoured by the failure of a watchman and a fireman to act in time, both are negligent. In that case, liability arises from inaction, although it can sometimes also be attributed to actual but ineffective attempts to prevent damage.

3.5 Causal transitivity: while in causality the cause can be propagated distantly or, under ideal conditions, even indefinitely, in causation the proximate cause has a relevant value, limiting the transmission of the ‘conditio sine qua non’ or ‘but-for test’ backward

Causal chains break if the causal power decreases at each link and do not break otherwise. The latter is only the case if ideal assumptions are postulated, such as a frictionless surface or agents who are unbiased in their actions, far removed from how objects in nature or people in society behave. In law, the remoteness of the cause is relevant for causal attribution. Usually, only proximate causes have causal power, which decreases as the number of causal links increases. In effect, assuming an unlimited transitive causal link, the conditio sine qua non or but-for test would be endless and the causal attribution doubtful: the victim would not have been harmed if the aggressor had not fired, he would not have fired if the weapon had not been made, the weapon would not have been made if the metal had not been available, etc. Proximate cause is a legal limitation of cause in fact. Despite this difficulty, counterfactual explanation is relevant in law, since liability is always judged after the facts and once the process of reasoning involving facts and rules is completed.
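The regress of the but-for test along a causal chain, and its limitation by proximate cause, can be sketched as follows. This is only a toy illustration: the chain reproduces the example given in the text, while the function names and the max_links cut-off are assumptions introduced here and do not correspond to any legal criterion stated by the authors.

```python
# Chain taken from the example in the text, ordered from the most remote link to the effect.
CHAIN = ["metal available", "weapon made", "aggressor fired", "victim harmed"]

def harm_occurs(removed=None):
    """The final effect occurs only if no link of the chain is removed."""
    return removed not in CHAIN

def but_for_causes():
    """Links that pass the but-for test: without them, the harm would not occur."""
    return [link for link in CHAIN[:-1]
            if harm_occurs() and not harm_occurs(link)]

def proximate_causes(max_links=1):
    """Keep only the but-for causes at most max_links steps from the effect
    (max_links is an assumed, illustrative cut-off)."""
    causes = but_for_causes()
    return causes[max(0, len(causes) - max_links):]

print(but_for_causes())    # ['metal available', 'weapon made', 'aggressor fired']
print(proximate_causes())  # ['aggressor fired']
```

Every link in the chain passes the but-for test, which is why the test alone cannot settle the attribution; restricting attention to the links nearest the effect singles out the shot as the relevant cause.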
Causal reasoning is a kind of abductive inference in which the occurrence of an event B (a damage) is explained by a previous event A that causes it. In cases of causal overdetermination, once the effect is instantiated (e.g., by proving that it occurs), backward inference allows causation to be attributed by virtue of the communication opened between causal factors that were previously each sufficient to cause it independently. Let us look at an example known as a ‘collider’, represented in Fig. 4.2: A is caused by C or B and, in turn, A causes E: B → A ← C, A → E. In principle, B and C are independent factors in causing E. But when the value of E is known and the value of B or C is instantiated, the communication between the possible causes is opened, allowing one of them to be chosen as the effective cause. For example, let B and C be two people, each wielding a firearm that can cause serious injury and death. The information that B has one weapon says nothing about what C, who has another gun, can do. However, if it is verified that the victim is dead (E) and that B’s pistol is jammed, the cause of the death is attributable to C. Therefore, the instantiation of the effect and of one of the causes turns causes that were independent before into causes that now depend on each other in the assessment of the harm caused.
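The backward inference described above can be checked numerically with a small Python sketch of the collider B → A ← C, A → E. The prior probabilities of 1/2 for each shooter and the deterministic links are assumptions made only for this illustration; they are not taken from the sentence or from the authors' model.

```python
from fractions import Fraction
from itertools import product

P_B = Fraction(1, 2)  # assumed prior that B fires
P_C = Fraction(1, 2)  # assumed prior that C fires

def joint():
    """Yield (b, c, a, e, probability) for every combination of the independent causes."""
    for b, c in product((0, 1), repeat=2):
        p = (P_B if b else 1 - P_B) * (P_C if c else 1 - P_C)
        a = int(b or c)  # A: the victim is shot if B or C fires
        e = a            # E: the harm follows deterministically from A
        yield b, c, a, e, p

def prob(query, given=lambda row: True):
    """Conditional probability of query given a condition, by enumeration."""
    rows = [r for r in joint() if given(r)]
    total = sum(r[4] for r in rows)
    return sum(r[4] for r in rows if query(r)) / total

c_fired = lambda r: r[1] == 1
print(prob(c_fired, lambda r: r[0] == 0))                # 1/2
print(prob(c_fired, lambda r: r[3] == 1))                # 2/3
print(prob(c_fired, lambda r: r[3] == 1 and r[0] == 0))  # 1
```

Knowing only that B did not fire leaves the probability that C fired at its prior value, since the two causes are independent. Once the death is also known, excluding B makes C the certain cause, which is the dependence between causes that the text describes.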
Fig. 4.2 Example of causal overdetermination and backward inference

4.4 Causal and Conditional Lexicon

Legal language is considered a specialized language. Unlike other specialized languages, which create new words to refer to a specific reality, the law often uses common words to which new meanings are added. That is why a legal text is difficult to understand: although most of the words used are familiar, they carry specific meanings (Martí Sánchez, 2004: 175). Given the nature of the sentence as a legal textual genre, the purpose of which is to declare the guilt or innocence of an entity or person for their actions, many of these new meanings denote causal and conditional relationships. Another characteristic of legal texts is the abundance of conjunctions and phrases. In the case of conditional sentences, there are four common structures in English:
(a) If + present + present (universal truth);
(b) If + present + will + infinitive (real or possible);
(c) If + past simple + would + infinitive (hypothetical);
(d) If + past perfect + would + have + participle (regret).
In Spanish, however, this is a more complex construction that admits different interpretations according to the verbal forms used (Bosque & Demonte, 1999):
(a) Si + present indicative + present indicative/future indicative/conditional/past indicative forms; si + imperfect indicative + imperfect indicative/imperfect subjunctive/conditional (probability);
(b) Si + imperfect subjunctive + conditional/imperfect indicative; si + imperfect indicative + imperfect indicative; si + conditional + conditional; si + imperfect subjunctive + conditional (improbable);
(c) Si + pluperfect subjunctive + pluperfect subjunctive/compound conditional/conditional/pluperfect indicative; si + pluperfect indicative + pluperfect indicative; si + present indicative + present indicative (unreality).
It also depends on the conjunction or phrase used instead of si (‘if’). In this section, we briefly present some of these elements and new meanings, but for our study we chose only a small subset of all the sentences that could be retrieved. In our sentence, there are words that coincide with or belong to the family of ‘cause’ or ‘consequence’, and others with the same flavor but of traditionally legal use, such as guilt, damage or verbs such as to verify. These are the words that allow us to retrieve the paragraphs that are relevant to the causal argument justifying the ruling of the sentence. Let us look at some examples below:
(a) Dicho cese constituye un daño efectivo, consistente en la pérdida de los gastos realizados ( . . . ). [‘The aforementioned termination constitutes an effective damage, consisting of the loss of expenses incurred’]
(b) ( . . . ) Se indemnice en concepto de daño emergente. [‘Compensation for consequential damages’]
(c) ( . . . ) Se condene a la administración a la indemnización de los daños y perjuicios causados. [‘The administration is ordered to compensate for the damages and losses caused’]
(d) ( . . . ) Por la que se desestimó la reclamación patrimonial derivada del perjuicio sufrido a consecuencia de la suspensión de los trabajos de perforación y voladuras de la cantera. [‘By which the patrimonial claim derived from the damage suffered as a result of the suspension of the drilling and blasting works of the quarry was rejected’]
In (a) and (b) we have two common lexical markers in law. On the one hand, (a) refers to damage to tangible assets, that is, to the economic loss that the investments already made entail for the plaintiff, Mármoles Baztán S.L. (MB).
On the other hand, in (b) there is a reference to consequential damage which, in civil law (the branch to which the sentence we analyze belongs), refers to the losses that result from an obligation not being fulfilled or being fulfilled belatedly (Real Academia Española, 2020, definition 2),1 in this case by the administration. In (c), ‘daños y perjuicios’ (‘damages and losses’) is a well-known expression in law that refers to economic injury resulting from actions or omissions, whether intentional or not (ibid., definition 1).2 Finally, in (d) damage is used, with a meaning close to that of effective damage, to argue that the decisions made by the administration have involved real damage in terms of expected earnings.

1 Retrieved from https://dpej.rae.es/lema/da%C3%B1o-emergente
2 Retrieved from https://dpej.rae.es/lema/da%C3%B1os-y-perjuicios

Culpa (‘guilt’) is another of those words with causal implications that are commonly used in legal language. In our sentence we have the derivative culpabilísticos (‘culpabilistic’):
(a) Esta nota es la aparentemente más compleja, puesto que la doctrina común de la responsabilidad extracontractual y por actos ilícitos deviene en un complejo
fenómeno de examen sobre la relación de causalidad, la eventual concurrencia y relevancia de concausas y la existencia de elementos culpabilísticos. [‘This is apparently the most complex note, since the common doctrine of tort liability and liability for illegal acts becomes a complex phenomenon of examination of the causal relationship, the eventual concurrence and relevance of concauses and the existence of culpabilistic elements’].
In this case, it refers to the procedure for attributing damage to the public administration, for which it is necessary to examine the causal relationship between the administration’s actions and the effects, as well as its relevance, and to determine whether those effects can really be attributed to the administration’s conduct.
As we said, we also find other words that allow us to weave a causal network and that do not necessarily have a specific meaning in this type of text. For example, verbs such as derivar (‘derive’), obligar (‘force’) or comprobar (‘check’), inflected mainly as participles in our sentence, have a semantics that implies the influence of one origin or agent on another or on something. Therefore, it is necessary to consider them in our analysis, since they are relevant for the construction of the argument:
(a) ( . . . ) Se reconozca la responsabilidad patrimonial por los daños derivados de la inscripción como BIC del Sistema Alkerdi Berroberría (AB). [‘( . . . ) The patrimonial responsibility for the damages derived from the registration as LHI of the Alkerdi Berroberría System (AB) be recognized’].
(b) El órgano que ordene un acto de ejecución material de resoluciones estará obligado a notificar al particular interesado la resolución que autorice la actuación administrativa. [‘The body that orders an act of material execution of resolutions will be obliged to notify the interested individual of the resolution that authorizes the administrative action’].
(c) La Administración ha comprobado que existe arte rupestre en el Sistema Alkerdi Berroberría. [‘The Administration has verified that there is rock art in the Alkerdi Berroberría System’].
The sentence also contains words typical of the idea of causality, such as causa (‘cause’), efecto (‘effect’) or consecuencia (‘consequence’):
(a) Las partes, además se han pronunciado sobre la cuestión de fondo, de manera que no se causa indefensión alguna. [‘The parties have also stated their position on the substantive issue, so that no defenselessness is caused’].
(b) ( . . . ) La inscripción del BIC no produce como efecto, ni revocación de la prórroga de la explotación que ostentaba la recurrente ni la denegación de la misma. [‘( . . . ) The registration as LHI has the effect of neither the revocation of the extension of the exploitation that the appellant held nor the denial of the same’].
(c) La inscripción en el Registro de Bienes del patrimonio cultural de Navarra es consecuencia necesaria de su declaración como BIC por ministerio de la Ley. [‘The inscription in the Register of Listed Heritage Items of Navarra is a necessary consequence of its declaration as LHI by operation of law’].
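The way such lexical markers can be used to retrieve candidate sentences may be sketched with a simple pattern match. The following Python fragment is only an illustration under stated assumptions: the list of word stems is assembled here from the examples quoted in this section and is not the authors' actual lexicon, the function names are hypothetical, and the third sample sentence is an invented control with no causal marker.

```python
import re

# Assumed, illustrative stems drawn from the examples quoted in this section.
CAUSAL_STEMS = ["daño", "perjuicio", "culpabil", "derivad", "obligad",
                "comprobad", "causa", "efecto", "consecuencia"]
PATTERN = re.compile(r"\b(?:" + "|".join(CAUSAL_STEMS) + r")\w*", re.IGNORECASE)

def causal_markers(sentence):
    """Return the causal markers found in a sentence, in order of appearance."""
    return [m.group(0).lower() for m in PATTERN.finditer(sentence)]

def retrieve(sentences):
    """Keep only the sentences containing at least one causal marker."""
    return [(s, causal_markers(s)) for s in sentences if causal_markers(s)]

SENTENCES = [
    "La Administración ha comprobado que existe arte rupestre en el Sistema Alkerdi Berroberría.",
    "Las partes, además se han pronunciado sobre la cuestión de fondo, de manera que no se causa indefensión alguna.",
    "La cantera se encuentra en Navarra.",  # invented control sentence
]

for sentence, markers in retrieve(SENTENCES):
    print(markers, "->", sentence)
```

A match only signals a candidate: as the section notes, sentences that are causal in themselves may still be irrelevant to the argument, so the retrieved set would need to be filtered by hand or by further criteria.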
Regarding conjunctions and phrases, these are particles that abound in legal language, precisely because of their ability to relate clauses and other elements. Por (‘by’) is a prominent element due to its multiple functions. It works as an introducer of causal complements and agent complements, the latter used both in passive sentences and in participle constructions (Herrero Ruiz de Loizaga, 1992: 342). Likewise, it forms lexicalized phrases such as por razón de (‘by reason of’), whose function is to introduce causal subordinate clauses, and it heads prepositional phrases that are also lexicalized, such as por tanto (‘therefore’) or por ello (‘for that reason’), which anaphorically take up the previous content and introduce it with causal value (Pérez Saldanya, 2014: 3450). Other examples traditionally defined as causal conjunctions and structures are porque (‘because’), pues, ya que, puesto que (‘since’), dado que (‘given that’) and al + infinitive (‘as+pronoun+to be’). All of them introduce causal subordinate clauses, and their function in this sentence is that of an adjunct, providing information that justifies or explains the content of the main clause:
(a) La actuación de la administración ha sido anormal porque se procedió a inscribir el BIC sin previo expediente de declaración ( . . . ). [‘The behaviour of the administration has been abnormal because the LHI was registered without a prior declaration file ( . . . )’]
(b) No se ha admitido tácitamente los daños ni su valoración, pues la resolución objeto de la litis no se pronuncia al respecto ( . . . ). [‘The damages or their assessment have not been tacitly admitted, since the resolution that is the subject of the dispute does not pronounce on the matter ( . . . )’].
(c) No existe vía de hecho ya que se tramitó expediente administrativo en la declaración del Sistema Alkerdi Berroberría como BIC.
[‘There is no de facto procedure since an administrative file was processed in the declaration of the Alkerdi Berroberría System as LHI’].
(d) En el momento de interponer demanda la sentencia no era firme, puesto que contra ella se había presentado recurso de casación ante el Tribunal Supremo. [‘At the time of filing the claim, the sentence was not final, since an appeal for cassation had been filed against it before the Supreme Court’].
(e) ( . . . ) se ha interpuesto de forma prematura y extemporánea dado que pendía proceso judicial sobre la conformidad o disconformidad a derecho de la declaración inscripción del Sistema Alkerdi Berroberría como BIC. [‘( . . . ) it has been filed prematurely and extemporaneously, given that judicial proceedings were pending regarding the conformity or non-conformity with the law of the declaration of registration of the Alkerdi Berroberría System as LHI’].
(f) La inscripción de cuevas concretas como Alkerdi I o Alkerdi II no garantizan la protección del patrimonio al tratarse de un sistema kárstico único. [‘The inscription of specific caves such as Alkerdi I or Alkerdi II does not guarantee the protection of the heritage as it is a unique karst system’].
In addition, documents such as that of the Comisión para la modernización del lenguaje jurídico (‘Commission for the modernization of legal language’), directed
by Estrella Montolío Durán (Montolío Durán, 2011: 118), show a use of the gerund that is characteristic of legal language, called cause-consequence. Its use is considered incorrect, since it violates one of the three guidelines proposed by the author, which is that «la acción del gerundio tiene que realizarse al mismo tiempo o antes que la acción del verbo principal» [‘the action of the gerund has to take place at the same time as or before the action of the main verb’]. This is the case of Interpuesto recurso, por sentencia de 13 de junio de 2018 ORD 112/2013 se desestimó confirmándose la resolución [‘Appeal filed, by sentence of June 13, 2018 ORD 112/2013 it was dismissed, confirming the resolution’], in which the dismissal of the appeal happens first and the confirmation of the resolution comes afterwards. That is why the use of the gerund is considered incorrect: the action expressed by the gerund does not take place at the same time as or before that of the main verb. In this case, in addition, the subject of the gerund is not the same as that of the main clause (ibid.).

With regard to conditionality, we have already seen that the boundaries between conditionality and causality are fuzzy. Even so, it is also possible to extract from the sentence some properly conditional lexicon, such as siempre que (‘provided that’), bajo la condición de (‘under the condition of’), en (el) caso de (‘in case of’) and una vez (‘once’) + participle. All these elements introduce conditional subordinate clauses and express either the conditions for a specific circumstance to occur, as in (a), or the hypothetical consequences that would follow if another, also hypothetical, act happened, as in (b).
The adverbs solo or solamente (‘only’) combined with si (‘if’) are also relevant: with them the protasis of the subordinate clause appears postposed, and its function is to restrict the meaning of the main clause (Bosque & Demonte, 1999: 3652), as in (c):
(a) ( . . . ) prorrogables por períodos iguales hasta un máximo de 90 años, bajo la condición de cumplir el Plan de Restauración del Espacio. [‘( . . . ) extendable for equal periods up to a maximum of 90 years, under the condition of complying with the Space Restoration Plan’].
(b) ( . . . ) lo que hubiera podido variar el acto administrativo originario en caso de haberse observado el trámite omitido. [‘( . . . ) which could have changed the original administrative act had the omitted procedure been observed’].
(c) ( . . . ) solamente podría estimarse una declaración de daños y perjuicios a favor de la demandante si la actuación administrativa fuera contraria al ordenamiento jurídico [‘( . . . ) a declaration of damages could only be upheld in favor of the plaintiff if the administrative action were contrary to the legal system’].
These elements appear throughout the sentence; however, an examination that uses the entire causal and conditional lexicon documented here is highly complex, especially because of the discards required, since sentences that are causal in themselves may have no relevance in the chain of causes and effects that leads to the ruling.
As we said at the beginning of this paper (see Sect. 4.1), our sentence has a sub-section within the fundamentals of law called Relevant background for the resolution of the case. Due to its relevance for the resolution, we comment very
briefly on its characteristics: the explicit temporal indication and the use of the durative present or of impersonal sentences. These last two are due to the claim of objectivity typical of legal texts (Aguirrezabala & Fanduzzi, 2012: 111).

4.5 Cultural Heritage Sentence: Conditional and Causal Structure

Having shown the logical and lexical properties of causal expressions, we now aim to apply these distinctive features to the extraction of relevant causal phrases from the heritage sentence described in Sect. 4.1, with the purpose of using them to draw a causal graph that, once pruned with Bayesian methods, will provide the non-specialist public with a summarized interpretation of the sentence.

In the analysis of a legal sentence, it is a challenge to establish what, how and why the convicted actors did things or engaged in reprehensible conduct that allows responsibility to be attributed to them. To answer the what-question (what did they do?) means to set out the facts and laws related to the case; to answer the why-question (why did they do it?) entails showing the necessity or plausibility of the judicial ruling; and to answer the how-question (how did they do it?) means to justify the reasoning process that evidences the coherence of the ruling with the facts and norms. In order to contextualize the answers to these questions in the analysis of a sentence, the heading must be retrieved and, from the facts mentioned, those that permit the conclusion to be inferred should be selected. For the sake of simplicity, we will illustrate our proposal using a toy example. Generalizing this methodology to a larger number of sentences is beyond the scope of this paper and would be a challenge for future work.

In order to address this task, we select the following sentences:
(1) Sentences from the heading.
(2) Phrases from the several sections of the heritage sentence (mentioned in point 1) that include causal lexicon, such as the words ‘cause’ and ‘effect’, or other words relevant from a legal point of view, such as ‘fault’ and ‘foreseeability’ (or their synonyms).
(3) Sentences containing inferential words, such as ‘consequently’, ‘inferred’, . . . , linked to the evidentiary process.
(4) Sentences containing proven facts; these constitute an essential part of any judgement since they are used as premises to infer a conclusion, although they have no specific lexicon and are therefore difficult to retrieve by a non-manual procedure.

The lexicon makes it possible to retrieve sentences with causal content, and those sentences can be represented by means of a graph. The proven facts make it possible to instantiate specific nodes of the graph with values and, consequently, to intervene in the causal chain by isolating the nodes that lose their influence, thus varying the possible attribution of blame. Using the lexicon referred to in Table 4.2, it is possible to retrieve several sentences with causal content in the heritage sentence analyzed (see Annex). The heading and the proven facts deserve special mention, as propositions that
truthfully describe what happened. The heading stands out not so much for the specificity of its vocabulary as for its position in the legal sentence (it appears in first place). The proven facts are found in the section ‘Relevant background for the resolution of the case’, and they are retrieved as the lexical characterization described in Sect. 4.4 advises.

Table 4.2 Scrutinized lexicon in the heritage sentence
Lexicon       Terms
+Causal       Cause, Effect, Damage, Loss, Fault
−Causal       By absence
Evidential    Derived, Consequence, Committed, Proved, Foreseeable/typical behaviour

Retrieved sentences can be represented by means of a causal graph that summarizes the information they contain. A causal graph consists of nodes labelled with data or information, and links denoting the causal relationships between them. An interventionist or counterfactual analysis of the causal graph makes it possible to instantiate a node and thereby discard other nodes linked to it as possible causes (Fig. 4.3).

The ruling, which goes against the interests of Mármoles Baztán (MB), the efficient cause of the complaint, bases its argumentation on whether the administration registered the exploited system as a Listed Heritage Item (LHI) by voluntary decision (value 0), as Mármoles Baztán claims, or by legal imposition (value 1). Therefore, the crucial node in the causal tree is the one highlighted in colour. It is a fact of law that the Administration must register as LHI any geographical area in which rock art remains are found. Therefore, the node receives the value 1, becomes active and is in a position to transmit causality to the nodes that follow it. At the same time, since it is instantiated, it breaks any connection with the nodes that precede it, deactivating their possible causal influence (Fig. 4.4).
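The instantiation step just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the node names and edges below are hypothetical simplifications of the case, not a transcription of Fig. 4.3, and the helper functions `intervene` and `upstream_causes` are names introduced here for illustration. Fixing a node by law is modelled as removing every edge that points into it, so that the preceding nodes can no longer reach the effect.

```python
from collections import deque

# Hypothetical, simplified causal chain (NOT the authors' Fig. 4.3).
# Each key is a cause; each value lists its direct effects.
EDGES = {
    "public_alarm": ["archaeological_study"],
    "archaeological_study": ["administration_decision"],
    "administration_decision": ["lhi_registration"],
    "lhi_registration": ["quarry_stoppage"],
    "quarry_stoppage": ["loss_of_profit"],
}


def intervene(edges, node):
    """Instantiate `node`: drop every edge that points into it."""
    return {
        cause: [effect for effect in effects if effect != node]
        for cause, effects in edges.items()
    }


def upstream_causes(edges, effect):
    """Return every node from which `effect` can still be reached."""
    parents = {}
    for cause, effects in edges.items():
        for target in effects:
            parents.setdefault(target, set()).add(cause)
    seen, queue = set(), deque([effect])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Complete graph: every preceding node is a candidate cause of the loss.
before = upstream_causes(EDGES, "loss_of_profit")

# Registration fixed by legal imposition (value 1): incoming edges are cut.
after = upstream_causes(intervene(EDGES, "lhi_registration"), "loss_of_profit")

print(sorted(before))
print(sorted(after))
```

In this toy graph, once the registration node is fixed, the administration's decision and everything before it drop out of the set of candidate causes, which mirrors the exoneration argument in the text.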
If it had not been obliged by law, the administration could have omitted the registration of the Alkerdi Berroberría system (AB) as LHI; the causal flow would then be that of the complete graph, and the liability for loss of profit would be causally attributable to the department of the administration that made the registration. But if the law obliges the administration to register as LHI any area in which rock art is found, such as the Paleolithic engravings documented in the Annex, this exonerates it from any liability for the damage caused. An interventionist analysis of causality thus has, in this case, an illustrative and faithful exemplification, and shows how structural models make it possible to intervene in causal nodes and attribute degrees of responsibility in terms of proximity to the effect (cf. Stapleton, 2015). In the sentences extracted from the legal sentence and in the graph that summarizes them, the following issues can be noted in relation to the theoretical background set out in the previous sections:
Fig. 4.3 Causal graph of the analyzed legal sentence
Fig. 4.4 Instancing and intervention in the causal graph of the sentence

1. Sentences are rich in causal knowledge, as revealed by the considerable number of paragraphs in which causal lexicon can be found. The causal content has a descriptive but, principally, an evidential relevance, as the sentence mined from the resolution section shows.
2. Sentences containing the word ‘obliged’ in the factual background may refer to legal texts of necessary or unavoidable compliance and, therefore, to legal facts to which value 1 must be assigned when they appear. Thus, we have included the term ‘obliged’ in the section on the logical lexicon.
3. The expression ‘for lack of’ points to a negative cause, a prevention by omission: the failure to publish the restoration plan of Mármoles Baztán (an event which is due to intentional agents, not to natural forces) causes the nullity of the extension of the exploitation concession of the Alkerdi Berroberría system for the marble company.
4. The words ‘damage’ and ‘harm’ point to the possible liability of the defendant, from which it is exonerated if the damage is an unintended effect of its activity. The distinction between a central and a collateral effect is relevant. The but-for test often serves this purpose, characterizing the necessity of the cause, that is, its status as conditio sine qua non. Side effects are unintended effects but, in some cases, foreseeable ones, i.e., derived from the cause.
5. The words ‘derived’ or ‘consequently’ are part of sentences with a logical flavour and causal implications: ‘for damage deriving from the extension of the concession’ is equivalent to ‘for damage caused by the extension of the concession’. Consequently, they point to a necessary inference from a premise that is true by definition, being so stipulated by law; i.e., something therefore ‘foreseeable’, ‘irresistible’ or ‘force majeure’.
6. A relevant question in this legal sentence, and therefore worthy of attention, is whether the administration carried out the archaeological studies because of the cultural alarm caused in the population living near the quarry or because of another, more spurious type of concern, such as the fear of landslides or the appearance of cracks in their houses. In other words, it is not clear whether people had suspicions about the archaeological value of the field and this caused them concern or whether, fearing that the quarry activity would cause damage to their geographical surroundings or their personal properties, they urged the administration to carry out archaeological studies.
In that case, the cessation of quarrying activity would have had the desired collateral effect.

This legal sentence shows not only the abundant presence of causal relationships, but also of theoretical aspects related to this subject. This is pointed out, for example, in the section devoted to the ‘Legal backgrounds’, where it is stated that a legal doctrine used in the debate derives from a complex examination of three aspects: “causality, the possible concurrence or relevance of causes and the existence of a fault element”, emphasizing multi-causality and the restrictions that must be taken into account when considering the causal factor, whether in a forward or backward (counterfactual) interpretation.

4.6 Conclusions and Future Work

In this paper, we have analyzed a sentence on a lawsuit concerning cultural heritage in which cave engravings are involved. The sentence shows an argumentative thread in which causal logic, and conditional logic with causal content, have a notable presence. In the text we note the frequent presence of phrases involving causal relationships between
the proven facts and the damages, proven or not, resulting from those facts. With no claim to completeness, and recognising that important points of law are left out of the picture, we have summarized the sentence taking into account the facts described and the causal relationships between them, which allow the verdict to be justified. Furthermore, we have deactivated the spurious nodes, resulting in a scaled-down version of the above-mentioned graph, useful for informing a non-specialist audience.

Court documents are increasingly available in electronic format, which facilitates their accessibility and digital processing. Sentences are documents that follow a general and conventional structure, in which specific paragraphs have specific communicative functions. They also contain descriptions of facts and implicit and explicit references to legal norms. The facts are frequently expressed in a narrative way; they refer to events or episodes and are relevant because they are premises that serve to justify the verdict. Causal graphs schematically show the causal link between norms, facts and ruling, helping to explain the attribution of guilt or responsibility to the agents involved in the legal trial, focusing on causation as something typical of intentional agents who have diverse or opposing interests.

The analysis carried out in this paper must be generalized so that it becomes useful for litigation that has cultural heritage as its focus. Sentences are, in general terms, difficult for the general public to understand, and people are often reluctant to read them in their entirety. The analysis proposed here can be extended to other sentences, using techniques of natural language processing and automatic generation of text summaries, thus placing itself in the tradition of automating the information contained in legal texts, as in systems such as LetSum (Farzindar & Lapalme, 2004) or DIRECT (Hoekstra & Breuker, 2007).
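As a first step towards such automation, the lexicon-based retrieval used in Sect. 4.5 could be sketched as below. This is a minimal sketch, not the procedure actually followed in this paper, where the sentences were selected by hand: the category names, the prefix-matching rule and the function names `build_patterns` and `retrieve` are assumptions made for illustration. The terms come from Table 4.2 and from the inferential words mentioned in Sect. 4.5, and the sample sentences are taken from the Annex.

```python
import re

# Terms from Table 4.2 plus the inferential words of Sect. 4.5.
# The grouping into these three keys is an assumption for this sketch.
LEXICON = {
    "causal": ["cause", "effect", "damage", "loss", "fault"],
    "evidential": ["derived", "consequence", "committed", "proved"],
    "inferential": ["consequently", "inferred"],
}


def build_patterns(lexicon):
    """Compile one case-insensitive pattern per category.

    Each term is matched as a word prefix, so 'damage' also finds
    'damages'. This is crude: 'effect' would also match 'effective'.
    """
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\w*",
            re.IGNORECASE,
        )
        for category, terms in lexicon.items()
    }


def retrieve(sentences, lexicon=LEXICON):
    """Return (sentence, matched categories) for every sentence with a hit."""
    patterns = build_patterns(lexicon)
    hits = []
    for sentence in sentences:
        categories = sorted(
            category
            for category, pattern in patterns.items()
            if pattern.search(sentence)
        )
        if categories:
            hits.append((sentence, categories))
    return hits


SAMPLE = [
    "Damages are not a consequence of the declaration or registration of the LHI.",
    "Consequently, the claim is inadmissible.",
    "In the same terms, the co-defendant made its answering brief.",
]

hits = retrieve(SAMPLE)
for sentence, categories in hits:
    print(categories, "->", sentence)
```

Such a filter only covers selection criteria (2) and (3); the proven facts of criterion (4) have no specific lexicon and would still need to be located by their position in the document or by hand.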
Therefore, fruitful future work would be to generalize and automate the technique of generating causal graphs of sentences on cultural heritage in order to produce automatic summaries accessible to everyone, allowing a visual but sufficiently informative understanding of their content.

Acknowledgement This research was funded by the Spanish Ministry for Science, Innovation and Universities (grants TIN2017-84796-C2-1-R, PID2020-112623GBI00, and PDC2021-121072C21) and the Galician Ministry of Education, University and Professional Training (grants ED431C2018/29 and ED431G2019/04). All grants were co-funded by the European Regional Development Fund (ERDF/FEDER program).

Annex

SENTENCE No. 8/2020
PRESIDENT, Dª Mª JESÚS AZCONA LABAINO
MAGISTRATES, D. ANTONIO SÁNCHEZ IBÁÑEZ
Dª ANA IRURITA DIEZ DE ULZURRUN

In Pamplona, February 4 two thousand and twenty.

Seen by the Contentious-Administrative Chamber of this Hon. Superior Court of Justice of Navarra, constituted by the Magistrates expressed, the judicial decree of the appeal number 306/2018, filed against Resolution 95/2018 of April 6, of the General Director of Culture and the Príncipe de Viana Institution, being parties in it: as appellant the entity “MÁRMOLES BAZTÁN, SA”, represented by the Lawyer Mr. Miguel José Leache Resano and assisted by the Attorney Mr. Oriol Prósper Cardoso; as defendant the FORAL COMMUNITY OF NAVARRA, represented and directed by the Lawyer of the Legal Services of the aforementioned Public Administration; and as co-defendant the insurer “ZURICH INSURANCE PLC SUC. EN ESPAÑA”, represented by the Attorney Ms. Natividad Izaguirre Oyarbide and assisted by the Attorney Ms. Olga Triguero Arrojo.

The Background

FIRST. After the appropriate procedural steps, by means of a document presented on December 28, 2018, the claim corresponding to the appeal of the heading was formalized, requesting that a sentence be handed down by which, allowing the appeal, the appealed resolution is annulled, the patrimonial responsibility for the damages derived from the registration as LHI of the Alkerdi Berroberría System is declared, and the Administration is ordered to indemnify Mármoles Baztán, SA the amount of 7,923,207 euros for consequential damages and 31,829,938.32 euros plus the legal interest of said amount calculated from the date of the claim in administrative proceedings until the notification of the resolution, as well as the payment of the procedural costs.

SECOND. Once the corresponding transfer was made, in writing presented on February 13, 2019, the defendant Administration opposed the demand, based on the facts and fundamentals of law that it deemed appropriate.
In the same terms, the co-defendant made its answering brief.

THIRD. Received the trial lawsuit and completed the process of conclusions, it was indicated for voting and ruling that took place on February 4, 2020, the speaker being Magistrate Mrs. ANA IRURITA DIEZ DE ULZURRUN.

Fundamentals of Law

FIRST. Approach to the Contentious-Administrative Appeal

The subject of this contentious-administrative appeal is Resolution 95/2018 of April 6, of the General Director of Culture, Príncipe de Viana Institution, by which the
claim for patrimonial liability made by Mármoles Baztán, S.A. on August 31, 2017, for the economic damage suffered as a result of registration in the Register of the Property of Cultural Interest as an Archaeological Zone of the Alkerdi Berroberría System, is declared inadmissible. It is reasoned in the aforementioned resolution that the registration of the LHI is an act of non-declarative procedure that lacks substantivity to produce damages, in addition to the impossibility of issuing a pronouncement on the merits while the judicial pendency persists, since various proceedings are pending regarding the declaration/inscription of the aforementioned cultural asset. In support of this reasoning, it cites the sentences of the AN of November 19, 2013, or of the Supreme Court of October 21, 2008. This premature claim due to lack of exhaustion of legal remedies prevents the alleged damage from being produced, specified and determined, STS July 10, 1992 or September 30, 2014. Consequently, the claim is inadmissible.

The plaintiff defends the nullity of the aforementioned resolution by affirming the concurrence of each and every one of the requirements to be able to assess the patrimonial responsibility of the defendant Administration. In the first place, it alleges that it is necessary for the administration to issue an act recognizing or declaring the existence of the LHI, as required by article 40.2 Law 16/1985 on Spanish Historical Heritage. Registration also requires prior determinations such as the delimitation of the area that is declared LHI, which requires an administrative act. This is the procedure followed in all the Autonomous Communities. In this case, there is no administrative act declaring the LHI, so there has been a de facto way, on which the claim of patrimonial responsibility is based.
The inscription of the Alkerdi System in the Registry includes the mention of the level of protection that corresponds to it, therefore the exploitation of the quarry cannot continue. Said termination constitutes an effective damage, consisting of the loss of the expenses incurred and the loss of the future profits of the exploitation. The causal relationship between the administration’s activity (LHI registration) and the damages derived from the impossibility of continuing exploitation is evident. The amount claimed amounts to 7,923,207 euros for expenses and 31,829,938.32 euros for the loss of the exploitation. The damage is unlawful since the behaviour of the administration has been abnormal because the LHI was registered without prior declaration file, and the administration must compensate for the damages derived from the imposed limitation (STS of December 2, 2014). The claim is not premature, because the LHI statement is executive and has caused damage that is permanent and can be the subject of a claim. For all this, it requests that its claim be upheld, that the nullity of resolution 95/2018 of April 6 be declared, and that the administration be ordered to compensate for the damages caused (7,923,207 euros for consequential damages and 31,829,938.32 euros for lost profits), with the interest from the date of the claim, and the costs of the proceedings.

The defendant Administration requests the dismissal of the contentious-administrative appeal, and that the contested resolution be declared in accordance with the law. It recalls that the claim is based on damages supposedly produced by the declaration or registration as LHI of the Alkerdi Berroberría system. This
declaration was the subject of the appeal ORD 525/2016, which had not been resolved at the time of issuing resolution 95/2018. The appeal was dismissed by sentence 200/2018 of May 29, declaring the administrative action in accordance with law. At the time of filing the claim, the sentence was not final, since an appeal had been filed against it before the Supreme Court. By order of March 28, 2019, the appeal was declared inadmissible, but resolution 95/2018 is in accordance with the law when it states that the claim must be considered untimely and premature, since when the damage is attributed to the annulment of an act, the term to claim has to start from its firmness as provided in article 67.1 Law 39/2015. In this case, at the time of issuance of the resolution under appeal, sentence 200/2018 had not reached firmness and therefore the claim was formulated ahead of time.

Alternatively, the claim should be dismissed. There is no de facto procedure since an administrative file was processed in the declaration of the Alkerdi Berroberría System as LHI, inasmuch as there are manifestations of rock art in it. There has been no defencelessness since it is the current legislation that determines which assets are considered LHIs. The inscription in the Listed Heritage Items of the cultural heritage of Navarra is a necessary consequence of its declaration as LHI by ministry of the Law – articles 24 and 13 of Law 14/2005. The administration was obliged to register, which is what it has done in this procedure without there being any abnormal operation of the public services. Based on the above and in terms of the damages, they are not sufficiently specified either in the administrative claim or in the lawsuit.
Many of them coincide with those claimed in procedure 37/2017 filed against Resolution 255/2016, of November 21, of the Technical Secretary General of the Department of Economic Development of the Government of Navarra, by which the patrimonial claim derived from the damage suffered as a result of the suspension of the drilling and blasting works of the quarry, agreed by resolution 668/2014 being parties thereto. Likewise, the causal relationship between the declaration or registration as LHI of the Alkerdi Berroberría system and the damage suffered for which it is now claimed has not been established. Damages are not a consequence of the declaration or registration of the LHI. The damages or their valuation have not been tacitly admitted, as the resolution that is the subject of the dispute does not pronounce on the matter because it does not enter into valuation of said concepts. Quantification is not possible because it was not carried out at the administrative headquarters. For all this, he requests that the claim be dismissed. Zurich Insurance opposes the lawsuit by first claiming that its policy does not cover the claim that is the subject of this process. Once the foregoing has been established, it indicates that the inadmissibility is correct, given that the administrative claim was filed before the ruling was issued on the legal compliance of the declaration of the Alkerdi Berroberría System as LHI. The final sentence is the one that determines the moment of birth of the action to claim. Alternatively, the claim cannot succeed because registration as a LHI is not the execution of an administrative act but compliance with a legal provision once the existence of rock art has been proven in compliance with the provisions of article 15 and DA 2ª of the Foral Law 14/2005. It was not necessary to process
the procedure provided for in article 19 of the aforementioned Foral Law since the budget for such a declaration was agreed, as stated in sentence 200/2018 of May 29 of this Chamber. The inscription in the Registry is a consequence of the declaration of the LHI, required by article 24 of the LF 14/2005. It is irrelevant how the procedures are managed in other Autonomous Communities, since our legislation has its own regulation. Regarding damages, the same concepts as in PORD 37/2017 and 313/2018 are being claimed without in this case there being a causal relationship between the administrative action and said damages.

SECOND. Relevant Background for the Resolution of the Case

From the documentary in the file, the following relevant facts emerge, which have not been denied by the parties:

1. On November 8, 1984, “Mármoles del Baztán, S.A.” was granted the Alkerdi exploitation concession title for the extraction of marble for a period of 30 years, extendable for equal periods up to a maximum of 90 years, under the condition of complying with the Natural Space Restoration Plan.
2. By Resolution 901/2013 of October 10 of the General Director of Industry, Energy and Innovation, the concession of the Alkerdi exploitation was extended, with marble limestone as its object, with a period of validity of 30 years from 2014, expiration date of the initial concession; the exploitation project as well as the restoration plan were approved.
3.
On July 15, 2014, a blast took place that caused great alarm in the population due to the power and noise generated, after which the Urdax City Council issued a resolution dated July 25, 2014 requiring the concessionaire to comply with the established in resolution 513/1999 of the General Director of Culture (Príncipe de Viana Institution), within the framework of the 1999 Environmental Impact Study, and in order to protect the archaeological sites of Berroberría and Alkerdi, as well as the rest of the caves and existing water currents, do not carry out blasting in situ, at the same time that it requires the Department of the Environment to, together with the Department of Culture and the Department of Economy, Finance, Industry and Employment, as well as the CHC, inspect in the field of their respective competencies the activity that is being carried out in the concession so that the appropriate corrective measures may be required. By resolution 668/2014 of August 12, the blasting is provisionally suspended and various studies are started on them and the archaeological status of the nearby caves. 4. In August 2016, as a result of the research study carried out by the Sociedad Ciencias Aranzadi, commissioned by the Government of Navarra, a series of engravings from the Paleolithic era were discovered in the cave of Alkerdi 2, until then unknown, and classified as the most ancient of Navarra. The report also notes that the entire Karst system as a whole needs to be protected by warning that continued activity at the quarry would affect undiscovered cultural heritage.
5. As a result of all this, on August 29, 2016, the entire Alkerdi Berroberría system was registered in the Register of Listed Heritage Items of Navarra as LHI and in the category of Archaeological Zone. This statement was the subject of the appeal ORD 525/2016, issuing sentence 200/2018 of May 29, which dismissed the claim as it did not appreciate the factual way in the declaration of the Alkerdi Berroberría System as LHI and the unnecessary observance of the procedures of article 19 of the Regional Law 14/2005 given the verification of the existence of rock art in the caves of the system. The sentence also declared inadmissible the claim for damages for the stoppage of the exploitation activity, “because it was understood that they were not a consequence of the declaration of the LHI but, where appropriate, of the revocation of the extension of the concession” (FJ 7º). By order of March 28, 2019, the Supreme Court rejected the appeal for cassation.
6. By Regional Order 192/2016 of September 12 of the Minister of Economic Development, the nullity of resolution 901/2013 of October 10, by which the extension of the Alkerdi concession had been declared, was declared ex officio due to lack of publication of the restoration plan. The Foral Order was appealed, giving rise to ORD 113/2017, in which dismissal sentence No. 251/2018 of June 28 was issued.
7. Once the extension was processed again, by Resolution 197/2016 of October 28 of the General Director of Industry, Energy and Innovation, the existence of the new LHI was denied. Appeal filed, by sentence of June 13, 2018 ORD 112/2013 was dismissed confirming the resolution.
8.
On August 31, 2017, Mármoles Baztán filed a patrimonial claim with the Counselor of the Department of Culture, Sports and Youth of the Government of Navarra, for the damages derived from the registration of the LHI for the amount of 7,923,207 euros as consequential damages and 31,829,938.32 euros for the loss of the exploitation concession, which was declared inadmissible by resolution 95/2018 of April 6 of the General Director of Culture and the Príncipe de Viana Institution, subject of this litigation.
9. On August 31, a patrimonial claim was also filed before the Minister of Economic Development of the Government of Navarra for the damages derived from the declaration of nullity of the extension of the mining concession by Foral Order 192/2016 of September 12, for the amount of 7,923,207 euros, declared inadmissible by resolution 72/2018 of April 20, giving rise to ORD 313/2018.

THIRD. On the Inadmissibility of the Claim of Patrimonial Responsibility

Resolution 95/2018 of April 6 rejects the claim of responsibility articulated by the appellant company on the understanding that it has been filed prematurely and extemporaneously, given that judicial proceedings regarding the conformity or nonconformity to law of the declaration and registration of the Alkerdi Berroberría System as LHI were pending. On the date of said resolution, sentence 200/2018 of May 29, ORD 525/2016, had not been issued on this matter. At the time of filing the contentious-administrative appeal, on July 25, 2018, the sentence was pending
an appeal before the Supreme Court, an appeal that was finally declared inadmissible by order of March 28, 2019. That is, at this time it is indeed possible to analyze the claim of patrimonial responsibility, since the procedural impediment for the birth of the action wielded by the defendants has disappeared: there is already a final sentence that affirms the conformity to law of the declaration of the Alkerdi Berroberría System as LHI. The parties have also ruled on the substantive issue, so that no defencelessness is caused.

Notwithstanding the foregoing, it is indeed necessary to indicate that the pending judicial process on the LHI registration declaration was not an obstacle to resolving the claim of patrimonial liability articulated by Mármoles Baztán SA, since the action was based on the damages that the registration of the LHI, declared by way of fact in the plaintiff’s thesis, has caused it, a statement that is executive. That is to say, it was not necessary to wait for the issuance of a sentence on whether or not said statement was in accordance with the law, since the appellant claims for the damages that in its opinion the statement of the LHI itself and its registration in the corresponding Registry have caused it, affecting the exploitation concession held over the Alkerdi quarry.

FOURTH.
Regarding the Patrimonial Responsibility as a Result of the Registration of the Alkerdi Berroberria System as LHI in the Register of Listed Heritage Items of Navarra The patrimonial liability of the public administration is regulated by articles 32– 35 of Law 40/2015 of October 1, on the Legal Regime of the Public Sector, legal precepts that make explicit the general principle of compensation by the Public Administrations for damages and losses caused by the operation of public services, constitutionally sanctioned in Spain in article 106.2 of the Constitution – which indicates that “Individuals, in the terms established by law, shall have the right to be compensated for any injury they suffer in any of their assets and rights, except in cases of force majeure, provided that the injury is a consequence of the operation of public services”. These norms are applicable to Local Entities in merit of the normative provision of article 54 of the Regulatory Law of the Bases of the Local Regime (Law 7/1985, of April 2, which refers to the general legislation on administrative responsibility, to the as well as article 223 of the Regulations for the Organization and Operation of Local Corporations (Royal Decree 2568/1986, of November 28. The aforementioned legal regime has been extensively applied – and, consequently, developed and interpreted – by the Jurisprudence (both applying the current and cited article 32.1 and its predecessor, article 139 of Law 30/1992), forming a body of doctrine, within which it can be affirmed that, for the declaration of the patrimonial responsibility of the Administration, the concurrence of two substantial positive requirements, one negative and the other procedural, is necessary:
A. The first of the positive ones is that there is an effective, economically evaluable and individualized damage with respect to a person or group of people, that the interested party does not have the legal duty to bear. This requirement is included in the elements to be tested, although some of its aspects are produced or manifested within the scope of the arguments of the parties (simplified by the existence of a catalogue of jurisprudential solutions that can be invoked and appreciated without further discussion), such as the extent and nature of the compensable damages, the legitimate persons and the cases in which there is a legal obligation to bear the damage.
B. The second positive requirement is that the damage is attributable to a Public Administration. This note is apparently the most complex, since the common doctrine of tort liability and for illegal acts becomes a complex phenomenon of examination on the causal relationship, the eventual concurrence and relevance of causes and the existence of guilty elements. However, in the administrative patrimonial responsibility, in the configuration that we have enjoyed since the 1957 Law (even since the 1954 Expropriation Law), it is greatly simplified by the legal expression that the injury “is a consequence of the normal or abnormal operation of public services” (articles 122 of the Forced Expropriation Law, 40 of the Law of the Legal Regime of the State Administration and 32 Law 40/2015).
Fundamentally, there are four imputation titles for the purpose of determining the responsibility of an Administration with respect to a specific injury: that the injury occurs as a direct consequence of the ordinary exercise of the service; that the injury is due to an abnormality or non-functioning of the public service; that there is a risk situation created by the Administration in the area of production of the harmful event, or that there is an unjust enrichment by the Administration. C. The negative factor is that it does not obey force majeure damage. This note has been conceptually and jurisprudentially specified in the sense that it concerns, in order to be able to the concurrence of force majeure, an event produced with the traditional requirements that distinguish force majeure from the fortuitous event (concepts of predictability and irresistibility), but specifically that it is a cause outside the scope of the public service. D. The procedural element is that the appropriate claim is formulated before the responsible Administration in the period of 1 year, counting from the occurrence of the injury. This element raises the question of the initial term – on which there are sufficient jurisprudential details – and on the Administration to which the claims should be addressed if several of them concur. The plaintiff considers that there is a set of actions of the administration that are the determinants of the damage that has been caused and, which include the extension of the concession, the investigation of the system by hiring the Aranzadi study society, the declaration as LHI and the delimitation of the Alkerdi Berroberría area, including the Alkerdi quarry as LHI. The appellant highlights that the delimitation of the protection zone and the non-continuation of the exploitation are not produced by operation of the law but are a decision of the administration and both actions
84 A. Sobrino and B. Calderón-Cerrato are those that suppose normal or abnormal action of the administration causing the damage that it has suffered, consisting of expenses incurred to continue with the mining operation after the renewal of the initial concession and the loss of profit that occurs when not being able to continue with the activity. However, the assumptions required by law and jurisprudence do not meet for the claim to prosper. First of all, it is necessary to indicate that the registration of the Alkerdi Berroberria System as an archaeological zone LHI is an administrative action in accordance with the law. Said question as well as the analysis of the procedure followed for the declaration of the LHI; the extension of the same and its incompatibility with the blasting system for the extraction of mineral, was alleged in the procedure ORD 525/2016 of this Chamber, and resolved by the final sentence 200/2018 of May 29 that reasons in this regard: SIXTH. On the Declaration of LHI by the Ministry of Law and the Protection of the Alkerdi Berroberría System The plaintiff also challenges the appealed resolution alleging the contradiction of the LHI statement by operation of the Law with art. 93 of Law 39/2015, which prohibits the Administration from initiating a material enforcement action that limits the rights of individuals without previously adopting the resolution that serves as a legal basis, for which it has acted in fact, causing defencelessness. The art. 93 of Law 30/1992 (although the plaintiff mistakenly points out art. 93 of Law 39/2015, which has no relation to the issue under discussion) establishes that: “1. Public Administrations shall not initiate any material action of execution of resolutions that limit the rights of individuals without having previously adopted the resolution that serves as a legal basis. 2. 
The body that orders an act of material execution of resolutions will be obliged to notify the interested individual of the resolution that authorizes the administrative action”. The precept is framed in the TV, referring to the execution of administrative acts and is not applicable in this case because the registration as LHI of the Alkerdi Berroberría System is not an execution of a previous administrative act, but rather, once the existence of rock art, the inscription is carried out in the General Register of Listed Heritage Items of the Ministry of Education, Culture and Sports, with the category of archaeological zone, in compliance with the provisions of art. 15 and the Second Additional Provision of Foral Law 14/2005, of November 22, on the Cultural Heritage of Navarra, which is the one that establishes that “Listed Heritage Items are declared by the Ministry of this Foral Law: a) Caves, shelters and places that contain manifestations of rock art, as well as prehistoric megalithic manifestations”. The Administration has verified that there is rock art in the Alkerdi Berroberría System and, in compliance with legal provisions, has proceeded to request registration as a LHI, in the exercise of its powers (art. 4 of the Foral Law 14/2005), and this does not imply consecrating the impunity of the Administration, because said registration is subject to the control of the Courts, as is the case with this
contentious-administrative appeal, and neither has material defencelessness been caused to the plaintiff, which is the only kind that could determine the nullity of the administrative action, following the constant doctrine of the Constitutional Court, which has been stating that defencelessness with constitutional legal significance occurs only when the interested party is, unjustifiably, unable to seek the judicial protection of their rights and legitimate interests, or when the violation of procedural norms carries with it the deprivation of the right to defence, with the consequent real and effective damage to the interests of the affected party by being deprived of their right to allege, prove and, where appropriate, to reply to the contrary arguments (Constitutional Court Sentences 31/1984, 48/1984, 70/1984, 48/1986, 155/1988, 58/1989 and 161/2001, among many others). The STS of February 1, 2018 (ROJ: STS 350/2018; ECLI:ES:TS:2018:350) Sentence: 139/2018 Appeal: 3218/2015 Speaker: Maria Del Pilar Teso Gamella, recalls that: “jurisprudence of this Chamber has been especially restrictive regarding the treatment of this ground for full nullity, based on article (62.1.e) of Law 30/1992, declaring that the formal defects necessary to apply this radical nullity must be of such a dimension that the procedure must have been completely and absolutely dispensed with, and the omission of any of its procedures is not enough. Having singularly valued” the consequences produced by such omission to the interested party, the lack of defence that has really originated and, above all, what could have changed the original administrative act in case the omitted procedure had been observed (SSTS of October 17, 1991 and May 31, 2000) (STS of May 5, 2008). The art.
19 of the Foral Law 14/2005 establishes the procedure for the declaration of Listed Heritage Items, with the initiation agreement, 30-day public information process in the case of real estate, the provisional application to the affected assets of the protection regime established in the provincial law, the hearing of the interested parties and the town councils in whose term the real estate and the Administration Departments of the Provincial Community affected by reason of their competences, the mission of the technical reports necessary for the description of the property, as well as the justifications of the relevance and singular character that determine its declaration as Listed Heritage Item. Likewise, the mandatory report of the Navarre Council of Culture and the declaration as Listed Heritage Item by the Government of Navarra must be included, at the proposal of the competent body, with the subsequent registration in the Register of Listed Heritage Items of Navarra that will be communicated to the General State Administration, the City Council where the property is located and the interested parties. This is the general procedure, however, the same law contains the express provision that the caves, shelters and places that contain manifestations of rock art, as well as prehistoric megalithic manifestations, are Listed Heritage Items by ministry of this Foral Law. This legal provision displaces, in the case of caves with rock art, the procedure contained in art. 19 and it is the Law itself that determines that these places, once the existence of rock art has been verified, are Listed Heritage Items and, consequently, the inscription in the Register of Listed Heritage Items of Navarra proceeds.
Therefore, the Provincial Administration has acted in accordance with the specific provisions for this type of property contained in Foral Law 14/2005, which determines the non-existence of the alleged de facto way. Regarding the allegation that the technicians of the Administration’s Mining Section maintain that it is not necessary to protect the entire massif and that, as rock art has only been found in the Alkerdi and (supposedly) Alkerdi 2 caves, the declaration of LHI can only be understood to take place by operation of the law for said caves, and not for the rest of the massif: considering the evidence examined in court, it must also be rejected. Thus, although D. Victorio, a mining engineer who made a report for the plaintiff, maintains that the karst massif is irregular, without an established pattern, that the different caves may not be linked, that a connection study must be carried out between the caves, and that the quarry can be exploited without blasting and without damaging the heritage, D. Jose María stated that Alkerdi II is the mouth, but the entire karst system must be protected, the entire network of cavities that are connected. They have reached level 3; level 4 is missing, which is where the current river is. The caves communicate because there is circulation of air and water. There is only painting in Alkerdi II and engravings in Alkerdi I and Alkerdi II. The impact of the quarry is proven because dust, cut marble fragments, plastics, etc. are entering, and water is leaking into the cave. They are remains of the quarry and it has already affected the cave. He concludes that the quarry is incompatible with the cave; even without blasting, the extractive activity as a whole is incompatible.
In the same way, Don Juan Ramón, professor at the school of mines, states that the caves form a karst system. Dª Estrella, a geologist, declared that it is a single system and that the entire system must be protected because it is not fully investigated; no quarry activity must continue. In the square where gravel is accumulated there are two holes that communicate with the painting room, so, in this expert's opinion, the operation of the quarry cannot continue because of the currents of humidity, air and CO2. There is incompatibility between the quarry and the protection of the caves. Finally, D. Adriano, a geologist who participated in the study by the Aranzadi Science Society, stated that Alkerdi II is under the quarry, that the system is a single one with shared dynamics and that, therefore, to protect it, action must be taken on the entire system, and insists, like the other technicians, that the quarry is incompatible with the conservation of heritage. Assessing the explanations offered by the different technicians in trial, all the technicians who have testified are emphatic and conclusive that it is a single karst system that requires global protection and, therefore, the declaration of the entire System as a Listed Heritage Item. The technician who disagrees with this conclusion, D. Alonso, says only that the different caves may not be linked; he does not affirm with verified data that they are not linked, and although he states that, in his opinion, the quarry can be exploited without blasting and without damaging the heritage, he does not offer sufficient reasons to accept his conclusions to the detriment of those of all the other technicians who have testified in court. The reasons offered by the experts in geology and archaeology who have testified in court endorse the inscription as a Listed Heritage Item of the Alkerdi Berroberría System, not of isolated caves, and, in this sense, the comparison made by D. Antonio was very
graphic when he pointed out that it would be like protecting an altarpiece in a church and not protecting the church that contains it. It is not possible to carry out the restrictive interpretation that the plaintiff postulates in its brief of conclusions because the protection of the archaeological heritage must be carried out in an integral way, and the registration of the entire System as LHI is oriented to this purpose. The inscription of specific caves such as Alkerdi I or Alkerdi II does not guarantee the protection of heritage, as it is a single karst system. Due to the foregoing, this ground of challenge must also be rejected.

That is to say, there was no de facto way in the LHI declaration and it was necessary, in view of the expert reports carried out, to register the entire Alkerdi Berroberría system in the Register of Listed Heritage Items of Navarra, including the Alkerdi quarry. These are actions that the Government of Navarra has carried out in full compliance with Foral Law 14/2005 of November 22, on the Cultural Heritage of Navarra, and therefore we are facing a normal operation of the administration. Based on the above, the registration of the Alkerdi System as LHI in the Registry does not produce any more effects than those provided for in Foral Law 14/2005 regarding the protection regime of the aforementioned assets, which for Listed Heritage Items are included in articles 35 to 41 of the aforementioned rule. The registration in itself has no effect on the exploitation of the quarry, which in this case was not even paralyzed at the time of initiation of the declaration procedure of the LHI (article 19.1 d of Foral Law 14/2005: “d. The initiation of the file of declaration of cultural interest with respect to a real estate will determine the suspension of the corresponding municipal licenses for subdivision, construction or demolition in the affected areas, as well as the effects of those already granted. The works that due to force majeure had to be carried out without postponement in such areas will require, in any case, the authorization of the Department responsible for culture.”), since it had been paralyzed by a resolution other than said procedure, namely Resolution 668/2014 of August 12, a final resolution.

Therefore, there is no causal relationship between the administrative action and the damage that is said to be borne by the company, since the registration of the LHI does not produce as an effect either the revocation of the extension of the exploitation that the appellant held or its denial. The revocation of the extension of the concession was agreed by Resolution 901/2013 of October 10 of the Director General of Industry, Energy and Innovation of the Government of Navarra, in response to the omission of an essential procedure in its processing, namely the submission to public information of the restoration plan submitted by the applicant company. Subsequently, by Resolution 197/2016 of October 28 of the General Director of Industry, Energy and Innovation, the extension of the requested exploitation was denied, since it was understood that the mining exploitation works on the Alkerdi quarry were incompatible with the preservation of the LHI Alkerdi Berroberría System, both the Príncipe de Viana Institution (based on a report prepared by the Aranzadi Science Society) and the Environmental Quality and Climate Change Service having reported unfavorably. In other words, the nullity and denial have been based on issues unrelated to the LHI registration; one of a procedural nature and the other derived from the assessment of the compatibility
between the specific intended exploitation and the necessary preservation of the Alkerdi Berroberría System. For this reason, it is necessary to conclude that what has been able to generate the damages that are claimed here (expenses incurred to continue with the exploitation and those derived from the loss of profits) would in any case be the nullity of the extension agreed by Foral Order 192/2016 of September 12 of the Minister of Economic Development and the denial of the extension by Resolution 197/2016 of October 28 of the General Director of Industry, Energy and Innovation of the Government of Navarra. Both are the administrative actions that have dealt with the rights and expectations that Mármoles del Baztán held about the Alkerdi quarry. This was indicated in the seventh legal basis of sentence 200/2018 issued in ORD 525/16:

SEVENTH. About the Damages Claimed

Finally, the plaintiff alleges that the declaration and registration of the Alkerdi Berroberría System supposes for Mármoles del Baztán SA the paralysis of its exploitation activity and requests that the Administration compensate it in the amount of €42,546 per month for the period that runs from the registration of the “Alkerdi Berroberría System” in the Register of Cultural Heritage of Navarra until the ruling is handed down in this procedure, plus the cost of employment regulation during the same period.
This application must also be rejected because the object of the initial procedure is the conformity or not to the Right of registration as LHI of the Alkerdi Berroberría System in the General Register of Goods of Cultural Interest of the Ministry of Education, Culture and Sports, with the category of archaeological zone and a declaration of damages could only be upheld in favour of the plaintiff if the administrative action was contrary to the legal system and, as has already been stated throughout this sentence, the contested administrative action is considered to be right. Furthermore, as upon reaching the defendant, the quantification of the damages alleged by the plaintiff is not duly accredited, because they have not been the subject of this proceeding, and they are not damages derived from the LHI declaration by operation of law, but, where appropriate, of the revocation of the extension of the concession, which is the object of the OP O12/2017 followed before this Chamber; which determines the rejection of this motive for appeal and with that of the complaint filed as the administrative action challenged in accordance with the Legal System”. And actually, this is also what the plaintiff comes to understand, who at the same time that she filed the claim for patrimonial liability that has given rise to this Litis, she filed another claim for damages, this one yes, that had caused her the Foral Order 192/2016 of September 12 of the Economic Development Counselor cancelling the extension of the exploitation concession that had been granted in 2014. This claim has led to the ordinary procedure 313/2018, which is pending.
In view of the foregoing and reiterating that there is no causal relationship between the registration of the Alkerdi Berroberría System as LHI and the damages claimed, the lawsuit cannot succeed, and must be dismissed in this regard.

FIFTH. Costs

As for the costs, their imposition is not appropriate in view of the partial upholding of the claim.

In the name of His Majesty The King and by the authority conferred by The Spanish Nation,

We Rule

That we must partially uphold the present contentious administrative appeal filed by the attorney SR LEACHE on behalf of MÁRMOLES DEL BAZTÁN SA against the agreement already identified in the heading of this resolution, which is annulled for not being in accordance with the law, rejecting the claim of patrimonial responsibility filed for the damages derived from the registration of the Alkerdi Berroberría System as a Listed Heritage Item. Without costs.

Notify this Judicial Resolution in accordance with article 248 of the Organic Law of the Judicial Power, stating that against it, it is only possible to lodge a cassation appeal before the corresponding Chamber, solely and exclusively, in the event that there is any case of objective cassation interest and with the established legal requirements, all in accordance with articles 86 and following of the Law of the Contentious Administrative Jurisdiction in the wording given by Organic Law 7/2015 of July 21. Said appeal must be prepared before this Chamber of the Superior Court of Justice of Navarra within the period of 30 days following the notification of this Sentence.
The parties are informed that in any case, and in all cassation appeals that are presented, all the writings relating to the corresponding cassation appeal must be inexcusably adjusted to the extrinsic conditions and requirements that have been approved by Agreement of the Chamber of Government of the Supreme Court and this Superior Court of Justice of Navarra on dates 21-4-2016 and 27-6-2016 respectively. These Agreements are posted on the notice board of this Superior Court of Justice as well as published on the website of the General Council of the Judiciary (www.poderjudicial.es) for the public and general knowledge. Thus, by this our sentence definitively judged, we pronounce it, send it and sign it.
Chapter 5 What Archaeological Texts Argue About: Denotations and Ontological Proxies

Cesar Gonzalez-Perez

C. Gonzalez-Perez, Incipit CSIC, Santiago de Compostela, Spain. e-mail: cesar.gonzalez-perez@incipit.csic.es

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_5

Abstract Argumentation-oriented discourse analysis usually focuses on what is being said and how, following the text under analysis quite literally, and paying little attention to the things in the world to which the text refers. However, to perform argumentation-oriented discourse analysis, one must assume certain conceptualisations by the author in order to interpret and reconstruct propositions and argumentation structures. These conceptualisations are rarely captured as a product of the analysis process. In this chapter, we argue that considering the ontology to which a discourse refers as well as the text itself provides a richer and more useful representation of the discourse and its argumentation structures, facilitates intertextual analysis, and improves understandability of the analysis products. To this end, we propose the notions of ontological proxies and denotations, i.e. the conceptual artefacts that connect elements in the argumentation structure to the associated ontology elements, and the propositional segments that anchor these to the text, respectively.

Keywords Ontological proxies · Argumentation · Discourse modelling · Conceptual modelling · Ontologies

5.1 Introduction

Discourse analysis helps us understand the structure, content and objectives of texts, contributing to better insights into how people say what they say, how they justify their claims and, overall, how we construct knowledge. Usually, discourse analysis focuses on “saying, doing and being” (Gee, 2014), where saying refers to what is said, doing to the practice of speaking by the author, and being to his or her social roles. Different discourse analysis techniques such as RST (Rhetorical Structure Theory) (Mann & Thompson, 1988) or IAT (Inference Anchoring Theory) (Janier et al., 2016; Reed & Budzynska, 2010) serve different purposes, one of which is the identification and study of argumentation structures. Argument-oriented discourse analysis usually proceeds by breaking down a text into meaningful chunks, such as locutions or utterances, and then constructing a model of how these chunks are related to each other in terms of argumentation schemes or coherence relations (Centre for Argument Technology, 2018). The final products of argument-oriented discourse analysis, in this manner, are diagrams and accompanying texts that describe what argumentation devices such as inferences, conflicts or rephrasings are being employed by the author.

Naturally, argument-oriented discourse analysis focuses on what is being said and follows the source text as literally as possible. This is a desirable property, as being faithful to the text minimises unwanted biases and spurious information that the analyst might otherwise inject. However, this also has the consequence that little or no attention is paid to the actual things in the world to which the text refers. But the analyst must necessarily develop a mental model of what entities are being referred to by the text in order to understand it, resolve references, construct meaning and, in general, make sense of the words. In particular, proposition reconstruction (i.e. rewording the literal locutions in the text so that standalone propositions can be obtained) often plays a central part in argument-oriented discourse analysis, as illustrated by the IAT Guidelines (Centre for Argument Technology, 2018). And reconstructing propositions requires the analyst to guess or unveil what was in the mind of the authors, so that their words make sense. This mental model of the discourse domain that the analyst constructs is rarely mentioned in the discourse analysis literature, despite its apparent centrality.
Consequently, it is rarely captured as a product of the analysis process, and is usually lost forever. Readers or users of the analysis products must re-create this mental model in their heads, possibly diverging from the interpretation adopted by the analyst, and thus hindering the communication and utility of the analysis products.

When the text being analysed involves a situation in which two or more agents exchange arguments, this issue becomes even more important. The analysis of dialogical texts, as well as the analysis of independent but intertextually connected texts, requires the analyst to discover the common ontology shared by the different authors and interpret their utterances in relation to it. A shared ontology between authors must exist; otherwise, no communication would be possible. For example, if an author publishes a note as a response to a paper by someone else, this second author must share an ontology with the first one in order to respond to him or her. But, again, this shared ontology is rarely documented, and the products of the discourse analysis rarely refer to it. In this manner, the reconstructed propositions and argumentation relations are only anchored on the text but not on the world external to it, leaving to each reader or user the task of re-creating this ontology in their heads, hoping that they get it right, and re-interpreting the analysis products in relation to it.

In this chapter we argue that the mental model that the analyst develops in relation to the discourse being analysed should be captured during argument-oriented discourse analysis, and documented as a proper analysis product, so that users of the diagrams or other artefacts that result from the analysis can refer to it as necessary. To do this, we propose the use of conceptual models to represent the relevant parts of the world that the text refers to. In addition, we argue that detailed connections should be made between the conventional products of argument-oriented discourse analysis, usually diagrams, and these conceptual models, so that tracing between discourse and world becomes feasible. These connections are mediated by conceptual artefacts named ontological proxies. Finally, we argue that ontological proxies must be anchored to the propositions that refer to them via denotations. This chapter is based on a previous article (Gonzalez-Perez, 2020), which is now updated and presented with an archaeological focus.

5.2 Proposed Approach

The approach followed in this chapter is based on conceptual modelling. This means that we consider that the product of an argument-oriented discourse analysis effort is a conceptual model, i.e. a formalised representation of a part of the world in terms of concepts as dictated by a given formalism or modelling language. Conceptual models are powerful because they represent a part of the world through controlled simplification so that we can reason about them and apply the results of our reasoning back to the part of the world being represented (Gonzalez-Perez, 2018). For example, we can represent the geography of a place through a digital map in a Geographical Information System, reason on the digital map (for example, by measuring the distance between two villages), and then apply the conclusions of our reasoning back to the physical world (we expect these villages to be at the measured distance).
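The map example above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: the `Village` class, the village names and their coordinates are invented here, and the Earth is simplified to a sphere, which is itself an instance of the controlled simplification that conceptual models rely on.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

# Assumption: a spherical Earth with its mean radius (a deliberate simplification).
EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Village:
    """A modelling element: a simplified stand-in for a real village."""
    name: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def distance_km(a: Village, b: Village) -> float:
    """Great-circle distance between two villages (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Hypothetical villages; names and coordinates are invented for illustration.
village_a = Village("Village A", 42.80, -8.50)
village_b = Village("Village B", 42.90, -8.40)

# Reasoning happens on the model; the conclusion is then applied to the world.
print(f"{distance_km(village_a, village_b):.1f} km")
```

The measurement is taken on the model, not on the ground, yet we expect it to hold for the real villages to the extent that the simplifications (point locations, spherical Earth) are acceptable for the purpose at hand.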
Conceptual models are composed of modelling elements, which are formalised concepts that adhere to a given formalism or modelling language. This modelling language is usually described through a metamodel, which defines what kinds of modelling elements, or primitives, can be used, and how they may be connected. For example, many modelling languages such as ConML (Incipit, 2020) or UML (OMG, 2017a) establish that the world is to be described in terms of primitives such as Type and Instance, or equivalent ones, adopting a classical type/token (Wetzel, 2018) stance. In our case, the part of the world being modelled is the discourse under analysis, and the modelling language is a more or less explicit collection of primitives from which the analysis products are constructed. In our work we use an extended version of IAT (Janier et al., 2016; Reed & Budzynska, 2010), which defines basic modelling primitives such as Locution, Proposition, Inference and Illocutionary Force, as well as specific relationships between them. Even though IAT has not been described through an explicit metamodel, its major “building blocks” (locutions, propositions, inferences, etc.) can be readily characterised from the literature. In this manner, performing a discourse analysis with IAT entails re-expressing what
96 C. Gonzalez-Perez
the text says in terms of IAT’s primitives, i.e. what locutions there are, how they are reconstructed into propositions, how illocutionary forces anchor each proposition onto a locution, how inferences connect propositions to drive the argumentation from premises to conclusions, and so on. Consequently, the final product of an argumentation-oriented discourse analysis effort is a conceptual model of the discourse, which describes the discourse in terms of the above-mentioned modelling primitives. We will call this model a discourse model.

The central thesis of this chapter is that every discourse model needs to be accompanied by a conceptual model of the discourse domain, that is, the part of the world to which the text refers. We will call this model a domain model.

At this point, we must make a clarification. Within information technologies, the representation of the world has been approached from two different disciplinary traditions and has thus generated two different sets of terms and assumptions. In the world of software engineering, the term “conceptual model” is often used, whereas in the tradition of artificial intelligence and computer systems, the term “ontology” is more common. The commonalities between conceptual models and ontologies are far more numerous than their differences (Atkinson et al., 2006; Gonzalez-Perez, 2017; Henderson-Sellers, 2011), so we will use “conceptual model” in this chapter, although “ontology” would work equally well. The fact that the discourse model and the domain model are both conceptual models allows us to treat them homogeneously and to interconnect them, as we explain in further sections. Figure 5.1 summarises our approach.

Fig. 5.1 An author produces a discourse (bottom) referring to a part of the world in their mind (right-hand side). By looking only at the discourse, an analyst creates a discourse model to represent the discourse (top left), plus a domain model to represent the associated domain (top right). Since the discourse refers to the domain (thick arrow, bottom), the discourse model must somehow refer to the domain model (dashed arrow, top)
There is an extensive body of literature on conceptual modelling (as well as ontologies), and conceptual modelling is practised today through the use of many techniques, languages and tools, such as ConML (Gonzalez-Perez, 2018; Incipit, 2020), OntoUML (Suchánek, 2018), OWL (World Wide Web Consortium, 2012) or even UML (OMG, 2017b). To express discourse models, as introduced above, we employ a modified and formalised version of IAT (Janier et al., 2016; Reed & Budzynska, 2010), supplemented with details from the Periodic Table of Arguments (Wagemans, 2019; Wagemans, 2020), which we call IAT/ML (Gonzalez-Perez & Pereira-Fariña, 2021). IAT/ML is large and complex, and a thorough description is out of the scope of this chapter, but this should not matter for the current discussion, as the approach that we propose is independent of any particular modelling formalism. On the other hand, we chose ConML to express domain models, as it is especially suited to the representation of soft issues such as vagueness, temporality and subjectivity (Gonzalez-Perez, 2013), which are often important in discourse analysis.

A full description of ConML is out of the scope of this chapter, but we can offer a brief description. ConML is a general-purpose conceptual modelling language especially oriented towards the humanities and social sciences. It is based on the object-oriented paradigm, so its metamodel defines modelling primitives such as Class, Attribute, Association, Object and Link (Gonzalez-Perez, 2018; Incipit, 2020). This means that ConML models represent parts of the world in terms of what categories of things (classes) there are, what properties they have (attributes), how they relate to each other (associations), what particular entities exist (objects), and how they are connected to one another (links).
Even though the discourse and domain models are both conceptual models, they are expressed in terms of different languages (IAT/ML and ConML, respectively), and thus they must be considered two separate models rather than one. Keeping these models separate also makes sense for modularity reasons. For example, an intertextuality study addressing commonalities and differences between a set of related texts may want to use a common domain model for the whole collection of texts, but obviously one discourse model for each of them. In this manner, the relationship between discourse models and domain models (top of Fig. 5.1) is many-to-one.

An example may help. Consider the following excerpt from an archaeological assessment report (Angove, 2020):

The hedgerows bounding the site to the south-east are shown on the Charlestown Tithe Map and are therefore historically important using the criteria of the 1997 Hedgerow Regulations.

Here, the author is describing the fact that some hedgerows are historically important because they appear in a certain historical document, which, according to some criteria, constitutes grounds to consider them so. Note that the particular criteria are not mentioned in this fragment, but only a general reference to some regulations. Similarly, the specific site that the author is discussing is not mentioned,
although we can determine what it is from previous sections in the report. Using IAT/ML, we would model this fragment as depicted in Fig. 5.2.

Fig. 5.2 An IAT/ML diagram showing the text fragment mentioned above. Locutions are shown as large boxes on the right-hand side, whereas propositions are shown as large boxes on the left. Note that an inference, labelled IN18, indicates how propositions are argumentatively related. (The diagram was prepared with LogosLink, a software tool developed by the author)

The diagram in Fig. 5.2 constitutes a small part of a larger discourse model. To construct this model, the analyst had to interpret what the author meant. Expressions such as “the site” or “the criteria” need some reconstruction, as the fragment bears no reference to what site or criteria are being discussed. In the absence of an explicit domain model, the discourse model depicted above fails to convey the necessary information to the reader, who must interpret the diagram themselves and can only hope to arrive at the same mental model as the analyst who created it.

A domain model of this text fragment would look like the one depicted in Fig. 5.3. This domain model represents the major things that are explicitly mentioned by the author, such as the site or the Charlestown Tithe Map. It also represents other things that do not appear in the text but that we know about, such as the site’s name (which is mentioned by the author in previous locutions) or the fact that the 1997 Hedgerow Regulations do apply to the site (which is implied by the author). All in all, this domain model captures the interpretation that the analyst made of the discourse and can be used as a reference to better understand the discourse model.

At this point, the question remains as to how elements in the discourse model should be connected to elements in the domain model, as depicted by Fig. 5.4.
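The domain model in Fig. 5.3 can also be written down as plain data, which may help to see what it contains. The following Python fragment is only a minimal sketch and is not ConML: the `DomainObject` and `Link` classes and the `links_from` helper are hypothetical names introduced for illustration, whereas the identifiers, categories, values and link names are those shown in the figure and described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DomainObject:
    """An entity in the domain model: identifier, category and values."""
    identifier: str
    category: str
    values: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Link:
    """A named, directed connection between two domain objects."""
    name: str
    source: str  # identifier of the object the link starts from
    target: str  # identifier of the object the link points to


# Objects as listed in Fig. 5.3, indexed by identifier.
objects = {
    o.identifier: o
    for o in [
        DomainObject("TheSite", "Site", {"Name": "Land off Mill Lane"}),
        DomainObject("HR1997", "CompoundNorm",
                     {"Title": "Hedgerow Regulations", "Year": 1997}),
        DomainObject("Hedgerows", "ConstructiveElement",
                     {"Location": "SE", "HistoricallyImportant": True}),
        DomainObject("TitheMap", "Map", {"Name": "Charlestown Tithe Map"}),
    ]
}

# Links as described in the text: the regulations apply to the site,
# and the tithe map represents the hedgerows.
links = [
    Link("AppliesTo", "HR1997", "TheSite"),
    Link("Represents", "TitheMap", "Hedgerows"),
]


def links_from(identifier):
    """Return the links that start at the object with this identifier."""
    return [link for link in links if link.source == identifier]
```

With this structure, `links_from("TitheMap")` returns the single Represents link towards Hedgerows, which is the kind of information a reader of the discourse model alone would have to guess.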
[Fig. 5.3 diagram. TheSite: Site (Name = “Land off Mill Lane”); HR1997: CompoundNorm (Title = “Hedgerow Regulations”, Year = 1997), linked to TheSite by AppliesTo; Hedgerows: ConstructiveElement (Location = SE, HistoricallyImportant = true); TitheMap: Map (Name = “Charlestown Tithe Map”), linked to Hedgerows by Represents.]
Fig. 5.3 A ConML diagram showing a domain model for the text fragment mentioned above. Boxes represent entities in the world. For each one, an identifier and a category are given, separated by a colon. For some entities, values are stated, such as in the case of Location = SE for Hedgerows. Lines connecting boxes stand for links between entities and are labelled accordingly

[Fig. 5.4 diagram. Fragments of the discourse model in Fig. 5.2 shown above the domain model in Fig. 5.3.]
Fig. 5.4 Diagram fragments for the discourse and domain models are displayed here. Blue arrows connecting them stand for the expected connections between elements in the discourse and elements in the domain. Discourse fragments have been highlighted in different shades for clarity. For example, the words “the Charlestown Tithe Map” in proposition PR10 (top left) must be connected to the TitheMap: Map entity (bottom right)

The discourse and domain models are different models, each using a different language, so there is no common formalism that may establish the rules for the necessary connection. In other words, neither the IAT/ML metamodel nor the ConML metamodel can represent both propositions and entities in the world. In addition, IAT offers no modelling primitive to represent fragments of a proposition, such as “are historically important” or “The 1997 Hedgerow Regulations” in Fig. 5.4. To address these issues, we propose the notion of ontological proxy, as well as the related notion of denotation.
5.3 Results

An ontological proxy is an element in a discourse model that stands for another element in the associated domain model, and which may be referenced by multiple propositions. Let us unpack this definition and explore its consequences.

• Ontological proxies are model elements. This means that, like any other model elements, they are formalised concepts in the mind of the analyst (Gonzalez-Perez, 2018), and are usually communicated via depictions in diagrams or other media.
• Ontological proxies are elements in the discourse model. This means that the IAT/ML metamodel must contain suitable modelling primitives to accommodate them. In other words, the IAT/ML metamodel must define primitives for ontological proxies as well as locutions, propositions and inferences.
• Every discourse model must have an associated domain model. As we introduced above, a common domain model may be shared by multiple discourse models, but every discourse model must have one and only one domain model.
• Each ontological proxy stands for one element in the associated domain model. By “stands for” we mean that a proxy can work as a simpler replacement for the domain element it refers to, since both the ontological proxy and the associated domain element represent the same thing in the world. It is for this reason that they are called “proxies”.
• Ontological proxies must be simpler than the associated domain elements; otherwise, there would be no point in using them. Also, and for the sake of modularity, ontological proxies must be as independent as possible from the modelling language employed to express the domain model. For these two reasons, ontological proxies must be lightweight and minimal.
• Each ontological proxy may be referenced by multiple propositions. Actually, it is fragments of propositions that refer to ontological proxies, as highlighted in Fig. 5.4. Each proposition fragment that refers to an ontological proxy is called a denotation.
These consequences have been used as design criteria to extend the IAT/ML metamodel and incorporate the necessary constructs to support ontological proxies. The following subsections describe these criteria and the associated implementation in greater detail.

5.3.1 IAT/ML Metamodel

As described above, the IAT/ML metamodel must provide modelling primitives to express ontological proxies and denotations. Figure 5.5 shows the relevant part of the metamodel.
[Fig. 5.5 diagram. Classes Model, Proposition, Denotation, Ontology and OntologyElement (abstract); each Denotation RefersTo one OntologyElement as its Target.]
Fig. 5.5 Diagram depicting a section of the IAT/ML metamodel. Model on the top right refers to discourse models. Ontology refers to domain models (but see text below for details)

According to the metamodel, every discourse model (simply called Model in Fig. 5.5) has an associated domain model (called Ontology in the figure). We said in previous sections that multiple discourse models can share a common domain model. However, the Ontology class in Fig. 5.5 does not represent domain models themselves, but the proxy image of a domain model that is kept by a discourse model. In other words, and from the perspective of a discourse model (Model in Fig. 5.5), Ontology represents a private and simplified copy of the associated ontology. Consequently, this relationship has been modelled as a one-to-one whole/part association. Furthermore, every private and simplified ontology contains a number of ontological proxies, called ontology elements in the metamodel. OntologyElement is an abstract class, as indicated by the “(A)” marker in Fig. 5.5. This means that it has a number of subtypes representing different kinds of ontological proxies, which we discuss below.

Reading now from left to right in the diagram, every proposition has a number of denotations. A denotation is a fragment of a proposition that refers to an ontology element. The concept of denotation allows us to pick specific words or phrases in a proposition that clearly refer to an element in the ontology, such as “are historically important” in PR12 or “The 1997 Hedgerow Regulations” in PR14 in Fig. 5.4. Figure 5.6 depicts a sample instance model conforming to the metamodel in Fig. 5.5.
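The part of the metamodel described above can be sketched in code to make the roles of Model, Ontology, Proposition, Denotation and OntologyElement more tangible. The Python fragment below is an illustrative sketch rather than the IAT/ML implementation: field names such as `start`, `end` and `target` and the `add_denotation` helper are assumptions, and character ranges are assumed to be zero-based and inclusive. The proposition text and the two ranges (0..48 and 67..87) are the values given for PR10 in Fig. 5.6.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyElement:
    """An ontological proxy; its identifier matches a domain model element."""
    identifier: str


@dataclass
class Ontology:
    """The private, simplified image of the domain model kept by one Model."""
    name: str
    elements: list = field(default_factory=list)


@dataclass
class Denotation:
    """A fragment of a proposition that refers to one ontology element."""
    start: int  # index of the first character (inclusive, assumed)
    end: int    # index of the last character (inclusive, assumed)
    content: str
    target: OntologyElement


@dataclass
class Proposition:
    content: str
    denotations: list = field(default_factory=list)

    def add_denotation(self, start, end, target):
        """Create a denotation from an inclusive character range."""
        fragment = self.content[start:end + 1]
        denotation = Denotation(start, end, fragment, target)
        self.denotations.append(denotation)
        return denotation


@dataclass
class Model:
    """A discourse model, owning exactly one Ontology."""
    name: str
    ontology: Ontology
    propositions: list = field(default_factory=list)


hedgerows = OntologyElement("Hedgerows")
tithe_map = OntologyElement("TitheMap")
model = Model("LandOffMillLane", Ontology("Default", [hedgerows, tithe_map]))

pr10 = Proposition(
    "The hedgerows bounding the site to the south-east "
    "are shown on the Charlestown Tithe Map."
)
model.propositions.append(pr10)
dn20 = pr10.add_denotation(0, 48, hedgerows)
dn21 = pr10.add_denotation(67, 87, tithe_map)
```

Deriving the denotation content from the range, instead of storing both independently, is a design choice of this sketch; it guarantees that the two cannot drift apart.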
In the figure, the ontological proxies are the objects of type OntologyElement. These objects have an Identifier value whose contents match the identifiers of elements in the domain model. This matching relationship is what makes ontological proxies work precisely as proxies. Note that, in the diagram, proxy relationships are shown as blue arrows between the associated elements, but they do not exist as formal relationships as such, since, as we explained above, the discourse and domain models are expressed using different languages. In any case, both human users of the models and computers processing them can easily find these matches and thus navigate the proxy relationships.

[Fig. 5.6 diagram. Discourse model: LandOffMillLane: Model; PR10: Proposition (Content = “The hedgerows bounding the site to the south-east are shown on the Charlestown Tithe Map.”); DN20: Denotation (Range = 0..48, Content = “The hedgerows bounding the site to the south-east”), which RefersTo AT3: OntologyElement (Identifier = “Hedgerows”); DN21: Denotation (Range = 67..87, Content = “Charlestown Tithe Map”), which RefersTo AT4: OntologyElement (Identifier = “TitheMap”); Default: Ontology. Domain model: Hedgerows: ConstructiveElement (Location = SE, HistoricallyImportant = true); TitheMap: Map (Name = “Charlestown Tithe Map”).]
Fig. 5.6 Diagram depicting how ontological proxies work. Above the line, an instance model conforming to the metamodel in Fig. 5.5 is shown, stating that proposition PR10 has two denotations for “The hedgerows . . . ” and “Charlestown Tithe Map”. Each denotation refers to a particular ontological element of the discourse model’s associated domain model (ontology). Below the line, a fragment of the associated domain model from Fig. 5.3 is shown. Blue arrows across the line depict the fact that ontological elements work as proxies to elements in the domain model, as shown by the matching identifiers “Hedgerows” and “TitheMap”

As we said above, and as depicted in Fig. 5.5, OntologyElement is an abstract class and has a number of subtypes, corresponding to the different kinds of ontology elements that are common in domain models. Of course, there are many languages that one could use to express a domain model, so the IAT/ML metamodel must be generic enough to cater for as many as possible. For this purpose, we decided to implement a small but varied range of subtypes of OntologyElement, with the design goal that at least languages such as ConML, OntoUML and OWL should be supported. Most conceptual modelling languages adopt an object-oriented approach and hence include primitives such as Class, Attribute, Object and Link. However, terminology varies between languages, and the specific semantics of the major primitives are also slightly different.
Most languages, however, share the fact that they distinguish clearly between types and instances (or categories and entities, depending on the terminology used) as a major architectural principle around which their metamodels are organised. This means that ontological elements could also be organised along these lines. However, we felt that adopting a multilevel modelling approach (Atkinson & Kühne, 2001; Clark et al., 2014) would entail little extra complexity and provide a much richer and more expressive ontological infrastructure. Multilevel modelling allows chains of type/instance relationships of arbitrary length, thus enabling the homogeneous treatment of types and instances for many common purposes and supporting higher-order types with a rather simple structure. For these reasons, we adopted the multilevel modelling principles sketched in (Almeida et al., 2018) and designed the OntologyElement subtype hierarchy shown in Fig. 5.7.

The first subtype of OntologyElement is Entity, which represents things in the world such as the computer I am using, my house, the Second World War, or the 5/2016 Act on Cultural Heritage. Anything in the world may be an entity. Entities are characterised through facets of two kinds: values and references. Values represent atomic qualities or quantities of entities, such as the fact that I am 53 years old or that the Second World War began in 1939. References, in turn,
represent connections between entities, such as the fact that I (an entity) work at Incipit CSIC (another entity), or that the 5/2016 Act on Cultural Heritage (an entity) applies in Galicia, Spain (another entity).

[Fig. 5.7 diagram. OntologyElement (abstract) and its subtypes: Entity, Atom, Category, Facet (abstract), Value, Reference, Feature (abstract), Property and Association.]
Fig. 5.7 Part of the IAT/ML metamodel showing the class hierarchy under OntologyElement. Please see the text below for a detailed description of each model element

Entities come in two kinds, depending on whether or not they can be instantiated, as described in the multilevel modelling literature (Almeida et al., 2018; Clark et al., 2014). Some entities are not instantiable, that is, they cannot work as templates for other entities. These are called “particulars” (and sometimes “atoms”) in philosophy, “ur-elements” in mathematics, or “objects” in the object-oriented approach in software engineering. We call them atoms. Some examples of atoms include myself, the Second World War, or the 5/2016 Act on Cultural Heritage. Other entities, by contrast, can be instantiated into other entities, working as templates for them, and usually correspond to generic concepts or ideas. For example, the notion of Tree can be instantiated into individual trees, such as each of the trees I can see through the window as I type this sentence. Similarly, the notion of Person is instantiated into each individual person. These
instantiable entities are called “universals” in philosophy or “classes” in object-oriented software engineering. We call them categories. In general, we can say that every entity has a category as type, since, in the words of George Lakoff, “There is nothing more basic than categorization to our thought, perception, action, and speech” (Lakoff, 1990). For example, I am of the Person category, the Second World War is of the ArmedConflict category, and the 5/2016 Act on Cultural Heritage is of the Law category. In practice, and especially when constructing ontologies with some degree of uncertainty, we do not know or are not interested in the category of some entities, so specifying them is not mandatory.

Now, since categories are also entities, they can have values and references. In addition, they can be characterised through two extra kinds of features: properties and associations. Properties define possible values of the entities of the category. For example, since every person has a value for their age, we can capture this fact by stating that the Person category has an Age property. Similarly, associations define possible references of the entities of the category. For example, since every person has been born in a particular place, we can capture this fact by stating that the Person category has a WasBornIn association towards the Place category.

In this manner, the IAT/ML metamodel supports ontological proxies of six concrete kinds: atoms, values, references, categories, properties, and associations. Although some types of modelling primitives are not covered (such as OntoUML non-sortals, for example), these six kinds map nicely to the major modelling primitives of almost any conceptual modelling language, as exemplified by Table 5.1. We must also remark that the notation used in Fig. 5.6 is convenient to visualise the details of the data structures implementing the models.
However, we suggest a different notation for most practical purposes, which is shown in Fig. 5.8. The following sections provide guidance on how to find ontological proxies as well as some examples to illustrate how they can be used in practice.

Table 5.1 Mappings between IAT/ML ontology element subtypes and modelling primitives of common conceptual modelling languages

IAT/ML        ConML             OntoUML           OWL
Atom          Object            (Not supported)   Individual
Value         Value             (Not supported)   DataProperty
Reference     Reference         (Not supported)   ObjectProperty
Category      Class             RigidSortal       Class
Property      Attribute         Property          (Handled through axioms)
Association   Semi-association  Relation          (Handled through axioms)
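The six concrete kinds of ontological proxy, together with the abstract classes above them, can be sketched as a small class hierarchy. The Python code below is a hedged illustration and not the actual IAT/ML metamodel: the field names (`category`, `facets`, `features`, `content`, `target`) are assumptions, the `StartYear` identifier is hypothetical, and Python dataclasses do not enforce that OntologyElement, Facet and Feature are abstract. The examples (ArmedConflict, the Second World War beginning in 1939, Person with Age and WasBornIn) come from the text, and the final dictionary reproduces the ConML column of Table 5.1.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class OntologyElement:
    identifier: str


@dataclass
class Entity(OntologyElement):
    """Anything in the world; characterised by values and references."""
    category: Optional["Category"] = None  # optional, as stated in the text
    facets: list = field(default_factory=list)


@dataclass
class Atom(Entity):
    """A non-instantiable entity."""


@dataclass
class Category(Entity):
    """An instantiable entity; may define properties and associations."""
    features: list = field(default_factory=list)


@dataclass
class Facet(OntologyElement):
    """Common parent of Value and Reference (abstract in Fig. 5.7)."""


@dataclass
class Value(Facet):
    """An atomic quality or quantity of an entity."""
    content: Any = None


@dataclass
class Reference(Facet):
    """A connection from one entity to another."""
    target: Optional[Entity] = None


@dataclass
class Feature(OntologyElement):
    """Common parent of Property and Association (abstract in Fig. 5.7)."""


@dataclass
class Property(Feature):
    """Defines a possible value for the entities of a category."""


@dataclass
class Association(Feature):
    """Defines a possible reference for the entities of a category."""
    target: Optional[Category] = None


place = Category("Place")
person = Category(
    "Person",
    features=[Property("Age"), Association("WasBornIn", target=place)],
)
armed_conflict = Category("ArmedConflict", features=[Property("StartYear")])
ww2 = Atom(
    "SecondWorldWar",
    category=armed_conflict,
    facets=[Value("StartYear", 1939)],
)

# ConML column of Table 5.1: IAT/ML proxy kind -> ConML primitive.
CONML_PRIMITIVE = {
    Atom: "Object",
    Value: "Value",
    Reference: "Reference",
    Category: "Class",
    Property: "Attribute",
    Association: "Semi-association",
}
```

Because Category is itself a subtype of Entity, a category can in turn have a category, which is what allows type/instance chains of arbitrary length in the multilevel approach.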
[Fig. 5.8 diagram. Discourse model: ontological proxies Atom Hedgerows and Atom TitheMap. Domain model: Hedgerows: ConstructiveElement (Location = SE, HistoricallyImportant = true); TitheMap: Map (Name = “Charlestown Tithe Map”).]
Fig. 5.8 This depicts the same situation that was shown in Fig. 5.6, but using the IAT/ML notation introduced earlier plus some additional lines and symbols. Ellipses represent ontological proxies, that is, instances of OntologyElement in Fig. 5.7. Matching elements in the domain model are shown to the right

5.3.2 Constructing Ontological Proxies

As we described in previous sections, ontological proxies are model elements. This means that they are mental constructs that adhere to a well-known formalism or modelling language. In this section we tackle the issue of how ontological proxies, as model elements, are constructed.

As explained above, ontological proxies are referred to by fragments of propositions. In Fig. 5.8, for example, the fragments “The hedgerows bounding the site to the south-east” and “the Charlestown Tithe Map” are highlighted to indicate that they correspond to denotations, each of them referring to an ontological proxy. So, in order to determine what ontological proxies must be constructed for a given proposition, we must take into account the following guidelines.

First, it is important to acknowledge that conceptual modelling is always done for a purpose, i.e. it is a situated activity driven by a goal. Two models of the same part of the world but pursuing different goals are likely to be very different. In addition, conceptual modelling, as a concept-creation process, is clearly dependent on subjective traits of the analyst such as academic and cultural background or personal preferences. Consequently, it is impossible to provide clear-cut rules as to how to construct ontological proxies; only approximate guides can be offered.
Having said this, it is safe to say that the process of constructing ontological proxies is often driven by an examination of the lexicon and grammar employed by the proposition at hand, with the goal of answering the question “what is this sentence talking about?”. For example, in “The hedgerows bounding the site to the south-east are shown on the Charlestown Tithe Map” in Fig. 5.8, we can observe the following:

• The subject “The hedgerows bounding the site to the south-east” refers to some hedgerows.
• The verb “are shown on” indicates a representation relationship between these hedgerows and a map.
• The complement “the Charlestown Tithe Map” refers to the medium supporting this representation.

This means that the proposition contains three denotations, which in turn hint at three potential entities: some hedgerows, which we can conceptualise as a constructive element; a representation relationship; and a medium on which this representation is captured, i.e. the map. It also expresses connections between them: “The hedgerows . . . ” points at the thing being represented, and “the Charlestown Tithe Map” points at the thing doing the representation. We can depict this by the domain model in Fig. 5.9.

[Fig. 5.9 diagram. Hedgerows: ConstructiveElement (Location = SE, HistoricallyImportant = true); TitheMap: Map (Name = “Charlestown Tithe Map”), linked to Hedgerows by Represents.]
Fig. 5.9 Domain model depicting the observations made from the analysis of proposition PR10 in Fig. 5.8

Note that, in the domain model, we state that the map represents the hedgerows, rather than saying that the hedgerows are shown on the map, which would be closer to what the text literally says. Reconstructions like these, which do not alter the semantics significantly, can be safely done when domain modelling if they produce models that are clearer and easier to understand. In our case, we are using the Cultural Heritage Abstract Reference Model (CHARM) (Gonzalez-Perez et al., 2018; Gonzalez-Perez & Parcero Oubiña, 2011; Incipit, 2016) as guidance for domain modelling, so some category and association names are taken from it and adapted as necessary for better interoperability of the resulting model.

The domain model, as is, contains two entities, named by the identifiers Hedgerows and TitheMap. Note also that we have chosen particular categories for these entities: Hedgerows is a ConstructiveElement and TitheMap is a Map. These categories are taken from CHARM, but other options may also be valid.
For example, stating that the hedgerows referred to by proposition PR10 constitute a constructive element may not be shared by everyone, as it responds to a particular conceptualisation of the landscape. If we suspected that model users may not share this ontology, then we should rather employ a different category such as the noncommittal StructureEntity. Choosing the right category is not always easy, as often there is not much information in the text about what “right” means in this context. Using a domain-specific reference model or ontology, as we did with CHARM in this example, can be useful, as it provides a catalogue of common concepts in the domain to choose from. In this example, all the denotations refer to entities in the world, or atoms in our domain model. Other propositions may refer to other kinds of ontological elements, such as values or references. For example, PR12 in Fig. 5.4 states that “The hedgerows bounding the site to the south-east are historically important”. Here, the fragment “are historically important” can be interpreted as denoting a
value for the entity denoted by “Hedgerows”, namely the predication that they are historically important (represented by the HistoricallyImportant value in the figure).

In general, proper nouns or qualified noun phrases, such as “the hedgerows” or “the 1997 Hedgerow Regulations”, usually denote material or immaterial entities. Verbal phrases headed by dynamic verbs such as “excavate” or “reconstruct” (not in our example) usually denote processes or activities. Both can be modelled through Entity ontological proxies. Verbal phrases with stative or passive-mode verbs, such as “are shown on” or “state”, often denote predications of values or references on the subject entity, which can be modelled through Value and Reference ontological proxies. Adjectival clauses such as “historically important” usually denote the content of values or references.

A special mention should be made of phrases with the verb “to be”, as this verb may carry different meanings in many languages. In English, for example, “to be” may indicate either existence (e.g. “there is a site”), which would be modelled through an Entity; identity (e.g. “this area is the destination of mass migrations”), which can also be modelled as an Entity plus a Reference; predication (e.g. “the artefact is 12 cm long”), which is best modelled as a Value or a Reference; classification (“this is a post hole”), which can be modelled through an Entity and a Category; or subsumption (“a house is a structure”), which should be modelled through two related instances of Category (Gonzalez-Perez, 2018). Sentences containing “to be” must be carefully analysed.

Note that this lexical and grammatical analysis of propositions allows us to define elements in a domain model, rather than the ontological proxies themselves.
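The readings of “to be” listed above can be summarised as a small lookup table. The Python fragment below is merely a convenience sketch: the `Suggestion` type, the `combine` flag and the `describe` helper are hypothetical names introduced here, and deciding which sense applies to a given sentence remains the analyst's task. The senses, example phrases and suggested proxy kinds are those given in the text.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    example: str    # example phrase from the text
    proxies: tuple  # proxy kinds suggested for this sense
    combine: str    # "all": use them together; "any": choose one of them


TO_BE = {
    "existence": Suggestion(
        "there is a site", ("Entity",), "all"),
    "identity": Suggestion(
        "this area is the destination of mass migrations",
        ("Entity", "Reference"), "all"),
    "predication": Suggestion(
        "the artefact is 12 cm long", ("Value", "Reference"), "any"),
    "classification": Suggestion(
        "this is a post hole", ("Entity", "Category"), "all"),
    "subsumption": Suggestion(
        "a house is a structure", ("Category", "Category"), "all"),
}


def describe(sense):
    """Render the suggested proxy kinds for one sense of 'to be'."""
    suggestion = TO_BE[sense]
    joiner = " + " if suggestion.combine == "all" else " or "
    return joiner.join(suggestion.proxies)
```

For instance, `describe("predication")` yields “Value or Reference”, whereas `describe("classification")` yields “Entity + Category”.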
Ontological proxies, by definition, are lightweight replacements for elements in the domain model, so once this model has been created and is stable, an ontological proxy can be constructed for each model element. Going back to the example in Fig. 5.9, we would construct three ontological proxies: an Atom for Hedgerows, another Atom for TitheMap, and a Value for HistoricallyImportant = true.

As we proceed to analyse more propositions in the same discourse, we would be adding to the domain model, or altering it to accommodate new elements. For example, it is likely that another proposition tells us something relevant to characterise the tithe map involved in the argumentation, or to locate the hedgerows in Fig. 5.9 in relation to the site, or even to add extra details to any of these entities. Conceptual modelling is usually an iterative and incremental task, which eventually converges to a stable resolution.

5.3.3 Examples of Use

Let us look at some examples of ontological proxies in practice. Firstly, let us focus on the issue of how ontological proxies may help us to document particular interpretations of the discourse. Consider the following fragment:
Alice: The 5/2016 law says that you cannot build close to a protected site.
Bob: But the law also says that I have the right to buy and possess any land.

A first approach to analysing this fragment may interpret the exchange as a conflict, taking “the law” in Bob’s line to refer to the same thing as “The 5/2016 law” in Alice’s. In fact, the “But” lexical marker heading Bob’s retort is a usual indicator of conflict. This interpretation is captured by the models depicted in Fig. 5.10.

[Fig. 5.10 diagram. Discourse model: Atom HeritageAct5_2016. Domain model: HeritageAct5_2016: CompoundNorm (Name = “5/2016 Heritage Act”).]
Fig. 5.10 Discourse and domain models for the interpretation that “The 5/2016 law” and “the law” refer to the same thing

However, an alternative interpretation is possible. The denotation “the law” in Bob’s line may refer to the general laws and regulations that apply, rather than the 5/2016 Heritage Act in particular. If this is the case, then Bob is saying that regulations, in general, allow you to buy and possess any land, which may not be in conflict with Alice’s proposition after all, as the 5/2016 Heritage Act could be making an exception to the general right to buy and possess land. This alternative interpretation is captured in Fig. 5.11.

[Fig. 5.11 diagram. Discourse model: Atom HeritageAct5_2016 and Atom Regulations. Domain model: HeritageAct5_2016: CompoundNorm (Name = “5/2016 Heritage Act”); Regulations: CompoundNorm (Name = “Overall regulations”).]
Fig. 5.11 Discourse and domain models for the interpretation that “The 5/2016 law” and “the law” refer to different but related things

Here, two ontological proxies exist, capturing the fact that the 5/2016 Heritage Act is part of a larger set of overall regulations. Once this interpretation has been established, it is clear that there is no necessary conflict between propositions PR10
5 What Archaeological Texts Argue About: Denotations and Ontological Proxies 109 and PR12, as shown. Note that, in the absence of ontological proxies, the two discourse diagrams (corresponding to the boxes displayed on a grid) from Figs. 5.10 and 5.11 would show different options but with no associated explanation. A reader of these models would find no information as why a conflict was or was not described between the propositions. Once we incorporate the ontological proxies, however, and even in the absence of the domain model, the interpretation of the discourse becomes clear. Let us now move to a different example and focus on how ontological proxies can work to assist in lexical/semantic studies. Consider the following text (Ruiz Mantilla, 2020): People tend to go down South, where there is wealth and work. And they expel the Muslim population. The North was hard, and they got rumours about Al Andalus being like an Eden. Here, two terms, “the South” and “Al Andalus”, are being used to refer to the same thing. This interpretation is shown in Fig. 5.12. First, note that propositions PR24 and PR26 use “South” or “the South” to refer to the southern region of Spain, whereas PR43 uses “Al Andalus” to refer to the discourse model Atom TheSouth domain model TheSouth: NonMaterialPlace Name = “The South”, “Al Andalus” Fig. 5.12 Discourse and domain models for the interpretation that “the South” and “Al Andalus” refer to the same thing
110 C. Gonzalez-Perez same place. This is interpretation is clearly documented by the single ontological proxy labelled TheSouth. Once this has been established, it is easy to see why PR43 works as a premise (together with PR30) for inference IN573 and leading to the conclusion PR24: living in the North was hard, and since people got rumours that Al Andalus was like an Eden, they moved there. This argument only makes sense if we assume that Al Andalus and the South are the same thing. Again, this assumption is clearly documented through ontological proxies and thus works as grounding to support inference IN573. Finally, let us consider how ontological proxies may be useful to intertextual studies. Consider the following fragments, taken from different documents (Angove, 2020; Historic Environment Service, Cornwall Council, 2012): Angove: The hedgerows bounding the site to the south-east are shown on the Charlestown Tithe Map. HESCC: The tithe map of 1842 illustrates how the settlement had expanded since the 1825 survey. Here, speakers Angove and HESCC (the Historical Environment Service of Cornwall Council) are not engaged in a dialog, and they may not even know about each other. But both are discussing the Charlestown Tithe Map, albeit by using different ways to denote it. Angove uses the denotation “the Charlestown Tithe Map” whereas HESCC uses “The tithe map”. Figure 5.13 depicts the models for both fragments. In this example, the denotation “the Charlestown Tithe Map” discourse model 1, as well as the denotation “The tithe map” in discourse model 2, point both to atoms labelled TitheMap. Furthermore, discourse model 2 contains a denotation pointing to a Year value for this atom, with contents “1842”. The domain model is shared between the two discourse models. In it, we can see a single object TitheMap with a subjectively marked value for the Year attribute, corresponding to the Year value in discourse model 2. 
The modelling of subjectivity is outside the scope of this chapter, but a brief introduction can be found in Gonzalez-Perez (2013). Essentially, the line starting with “Year” in the TitheMap box in the domain model stands for a value given to this object by a particular agent, in our case HESCC, which may not be shared by other agents. We chose to use the subjective marker to show that the 1842 attribution of the map is provided only by the second text, and not mentioned by the first one. In this manner, two discourse models that were in principle disconnected and structurally unrelated are now linked together through a common domain model that documents the associated speaker perspectives. This captures the fact that both discourses are referring to a common set of things in the world, namely the 1842 tithe map of Charlestown. This example only involves two discourse models, but this approach can be applied with any number of discourse models as long as all of them refer to a common set of things in the world.
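The linking of the two fragments through shared proxies can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the classes `Atom` and `DiscourseModel` and the function `shared_referents` are hypothetical names invented here, not part of IAT/ML, ConML or LogosLink, and value proxies such as Year are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    """Hypothetical ontological proxy standing for one domain object."""
    label: str  # lightweight key; the only link to the domain model


@dataclass
class DiscourseModel:
    """A discourse model reduced to its denotations (assumed structure)."""
    speaker: str
    # Maps each denoting phrase found in the text to the proxy it points to.
    denotations: dict[str, Atom] = field(default_factory=dict)

    def referents(self) -> set[str]:
        """Labels of all domain elements this discourse refers to."""
        return {proxy.label for proxy in self.denotations.values()}


def shared_referents(a: DiscourseModel, b: DiscourseModel) -> set[str]:
    """Labels of the domain elements that both discourse models refer to."""
    return a.referents() & b.referents()


# The two fragments above, each with its own way of denoting the map.
angove = DiscourseModel("Angove", {
    "The hedgerows": Atom("Hedgerows"),
    "the Charlestown Tithe Map": Atom("TitheMap"),
})
hescc = DiscourseModel("HESCC", {
    "The tithe map": Atom("TitheMap"),
})

print(shared_referents(angove, hescc))  # {'TitheMap'}
```

Because the two denotations differ as strings but resolve to proxies with the same label, the intersection exposes the intertextual link without either discourse model needing to know about the other.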
[Figure 5.13: discourse model 1 with Atom Hedgerows and Atom TitheMap; discourse model 2 with Atom TitheMap, Value Year = 1842 and Atom Charlestown; shared domain model with TitheMap: Map, Name = “Charlestown Tithe Map”, Year $HESCC = 1842]
Fig. 5.13 Discourse and domain models for the fragments above. Note that the two discourse models share a common domain model

5.4 Conclusions

The previous sections have presented the notions of ontological proxy and denotation, and described how ontological proxies and denotations can be used to better express domain facts that are relevant to the discourse being analysed. Various aspects must be highlighted.

Firstly, ontological proxies are independent of the specific languages or approaches that one employs for discourse or domain modelling. We have chosen IAT/ML and ConML, but ontological proxies do not rely on these choices. Rather, they are an abstract device that mediates between a discourse model and a domain model, whatever formalisms are used to express them. As we previously stated, the six concrete kinds of ontological proxies (atoms, values, references, categories, properties, and associations) map nicely to the major modelling primitives of almost any conceptual modelling language.

Secondly, ontological proxies are part of the discourse model. This means that the discourse model is autonomous and does not need an accompanying domain model to stay expressive. In fact, we could remove the right-hand side in every figure in the previous section, and the diagrams would still be understandable. Of course, ontological proxies are proxies, and therefore lightweight, so they do not contain every detail that the full domain model can offer. This is especially clear, for example, in Fig. 5.13, where the fact that there is a subjective year attribution of the Charlestown tithe map can be seen only in the domain model. Still, ontological proxies provide a good balance between expressiveness and conciseness, which arguably would minimise the need to retrieve and examine the domain model in most situations.

In addition, the fact that the connections between discourse and domain models are established via lightweight elements follows the principle of modularity that has been crucial in software engineering since at least the 1980s (Meyer, 1997). According to this principle, discourse and domain models are kept separate (they are different “modules”) but connected through few and weak links, namely, the mappings between ontological proxies and elements in the domain model. This allows each of these two artefacts to live separately, using whichever formalism is required for each one, but still be connected when needed.

Another relevant issue is that of limited expressiveness. Since ontological proxies are simpler replacements for domain model elements, they are limited by how expressive the chosen modelling language is. In this chapter we have used ConML, which is capable, for example, of representing different subjective views on the same things, or temporal change, with minimum burden, as it provides specific mechanisms to do so. Not all modelling languages do this. If the chosen domain modelling language does not offer a similar mechanism to represent subjective views, for example, propositions such as “As opposed to the local government, tourists often think that the cathedral urgently needs repairs” would be difficult to analyse and express, as the opposed subjective views described by it could not be satisfactorily represented by any primitive in the language.
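To make the requirement concrete, the following Python sketch shows the minimum that a mechanism for subjective views has to store: a value per attribute and per agent. It is an assumption-laden illustration, not ConML's actual mechanism; the class `DomainObject`, its methods and the attribute name `NeedsUrgentRepairs` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DomainObject:
    """Hypothetical domain object whose attribute values may depend on an agent."""
    name: str
    # attribute -> {agent -> value}; the agent None holds a shared, non-subjective value.
    values: dict[str, dict[Optional[str], Any]] = field(default_factory=dict)

    def assert_value(self, attribute: str, value: Any, agent: Optional[str] = None) -> None:
        """Record the value that a given agent attributes to this object."""
        self.values.setdefault(attribute, {})[agent] = value

    def views(self, attribute: str) -> dict[Optional[str], Any]:
        """All recorded views on one attribute, keyed by agent."""
        return dict(self.values.get(attribute, {}))

    def is_contested(self, attribute: str) -> bool:
        """True when at least two agents hold different (hashable) values."""
        return len(set(self.views(attribute).values())) > 1


# The cathedral proposition quoted above, as two opposed subjective views.
cathedral = DomainObject("Cathedral")
cathedral.assert_value("NeedsUrgentRepairs", True, agent="Tourists")
cathedral.assert_value("NeedsUrgentRepairs", False, agent="LocalGovernment")

print(cathedral.is_contested("NeedsUrgentRepairs"))  # True
```

A language that can only store one value per attribute would have to drop one of the two views or duplicate the object, which is the kind of difficulty the proposition above illustrates.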
In this regard, and despite the fact that ConML is highly expressive (Gonzalez-Perez, 2013), it still lacks support for irrealis modalities such as conditionals or imperatives, so ontological proxies for denotations using these modalities are difficult or impossible to represent properly.

The theoretical proposal introduced in this chapter has been implemented in the LogosLink software tool, as previously mentioned, and has been applied to the analysis of selected texts from a corpus of over 800 articles on COVID-19 from the Spanish edition of The Conversation (The Conversation, 2020). It is also being used to analyse a number of documents on archaeological sites related to Mansilla de la Sierra (La Rioja, Spain), the Portico of Glory at the Cathedral of Santiago de Compostela (Spain), and other areas.

Future research directions include the following. The ConML language will be extended to support inequality predication, so that facts such as “The site is wider than 120 m” can be captured. Also, ConML will be extended to support various additional modalities such as deontic or hypothetical structures. This will allow domain models to become much richer and more expressive, as described above. The subclasses of OntologyElement in IAT/ML will be extended likewise so that propositions containing constructs like these can be adequately linked to domain elements. Additional extensions will be made to allow denotations to refer not only to specific ontological proxies, but also to the changes associated with them. This will make it possible, for example, to cater for statements expressing persuasion or change of mind, such as “I was convinced that the cathedral was fine, but now I see that it needs some repairs”.
Finally, a comprehensive specification of IAT/ML, including a proper graphical notation, will be prepared and published. From the point of view of tool implementation, LogosLink will be updated with the new additions to IAT/ML, and support will be added for multi-model projects so that additional analytical options become possible, especially in relation to intertextual analysis. This material will be made available through the IAT/ML web site (Gonzalez-Perez, Pereira-Fariña & Calderon-Cerrato, 2021).

References

Almeida, J. P. A., Frank, U., & Kühne, T. (2018). Multi-level modelling (Dagstuhl seminar 17492). Wadern, Germany. https://doi.org/10.4230/DagRep.7.12.18
Angove, A. (2020). Land off Mill Lane, Charlestown, Cornwall – Archaeological Assessment. https://doi.org/10.5284/1084120
Atkinson, C., & Kühne, T. (2001). The essence of multilevel metamodelling. In «UML» 2001: Modeling languages, concepts and tools (Vol. 2185, pp. 19–33). Springer.
Atkinson, C., Gutheil, M., & Kiko, K. (2006). On the relationship of ontologies and models. In Proceedings of the 2nd International workshop on meta-modelling (WoMM) (Vol. LNI 96, pp. 47–60).
Centre for Argument Technology. (2018). Annotation guidelines for Inference Anchoring Theory (IAT) with support for Conventional Implicatures (CIs). [Online]. Available: https://typo.uni-konstanz.de/add-up/wp-content/uploads/2018/04/IAT-CI-Guidelines.pdf
Clark, T., Gonzalez-Perez, C., & Henderson-Sellers, B. (2014). A foundation for multi-level modelling. In C. Atkinson, G. Grossmann, T. Kühne, & J. de Lara (Eds.), Proceedings of the workshop on multi-level modelling co-located with ACM/IEEE 17th international conference on model driven engineering languages & systems (MoDELS 2014) (Vol. 1286, pp. 43–52). CEUR-WS.org.
Gee, J. P. (2014). An introduction to discourse analysis: Theory and method. Routledge.
Gonzalez-Perez, C. (2013). Modelling temporality and subjectivity in ConML. In R. Wieringa & S. Nurcan (Eds.), 7th IEEE International conference on research challenges in information science (RCIS 2013) (pp. 1–6). IEEE Computer Society.
Gonzalez-Perez, C. (2017). How ontologies can help in software engineering. In J. Cunha, J. P. Fernandes, R. Lämmel, J. Saraiva, & V. Zaytsev (Eds.), Grand timely topics in software engineering (Vol. 10223: LNCS) (pp. 26–44). Springer.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology. Springer.
Gonzalez-Perez, C. (2020, November). Connecting discourse and domain models in discourse analysis through ontological proxies. Electronics, 9(11), 1955. https://doi.org/10.3390/electronics9111955
Gonzalez-Perez, C., & Parcero Oubiña, C. (2011). A conceptual model for cultural heritage definition and motivation. In M. Zhou, I. Romanowska, Z. Wu, P. Xu, & P. Verhagen (Eds.), Revive the past: Proceeding of the 39th conference on computer applications and quantitative methods in archaeology (pp. 234–244). Amsterdam University Press.
Gonzalez-Perez, C., Pereira-Fariña, M., & Calderon-Cerrato, B. (2021). IAT/ML. http://www.iatml.org/
Gonzalez-Perez, C., Martín-Rodilla, P., & Pereira-Fariña, M. (2018). Computer-assisted analysis of combined argumentation and ontology in archaeological discourse. In 46th computer applications and quantitative methods in archaeology (CAA 2018). Tübingen.
Henderson-Sellers, B. (2011). Bridging metamodels and ontologies in software engineering. Journal of Systems and Software, 84(2), 301–313. https://doi.org/10.1016/j.jss.2010.10.025
Historic Environment Service, Cornwall Council. (2012). Charlestown conservation area character appraisal & management plan. [Online]. Available: https://map.cornwall.gov.uk/reports_conservation_areas/Charlestown.pdf
Incipit. (2016). CHARM white paper. Incipit, CSIC. [Online]. Available: http://www.charminfo.org/Resources/Technical.aspx
Incipit. (2020). ConML technical specification. Incipit, CSIC. [Online]. Available: http://www.conml.org/Resources/TechSpec.aspx
Janier, M., Aakhus, M., Budzynska, K., & Reed, C. (2016). Modeling argumentative activity with inference anchoring theory. In D. Mohammed & M. Lewinski (Eds.), Argumentation and reasoned action. Volume I Proceedings of the 1st European conference on argumentation (Vol. 1, no. 62). College Publications.
Lakoff, G. (1990). Women, fire, and dangerous things. University of Chicago Press.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text – Interdisciplinary Journal for the Study of Discourse, 8(3). https://doi.org/10.1515/text.1.1988.8.3.243
Meyer, B. (1997). Object-oriented software construction (2nd ed.). Prentice-Hall.
OMG. (2017a). Unified modeling language 2.5.1. [Online]. Available: https://www.omg.org/spec/UML/
OMG. (2017b). Unified modeling language. Object Management Group.
Reed, C., & Budzynska, K. (2010). How dialogues create arguments. In ISSA Proceedings 2010. [Online]. Available: http://rozenbergquarterly.com/issa-proceedings-2010-how-dialogues-create-arguments/
Ruiz Mantilla, J. (2020, February 23). Peridis: En comarcas de la montaña palentina nacen ya más osos que niños. El País.
Suchánek, M. (2018). OntoUML specification. https://ontouml.readthedocs.io/. Accessed 9 Oct 2020.
The Conversation, Spanish Edition. (2020). https://theconversation.com/es. Accessed 16 Oct 2020.
Wagemans, J. (2019). Four basic argument forms. Research in Language, 17(1), 57–69. https://doi.org/10.2478/rela-2019-0005
Wagemans, J. (2020). Periodic table of arguments. https://periodic-table-of-arguments.org/. Accessed 16 Oct 2020.
Wetzel, L. (2018). Types and tokens. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Metaphysics Research Lab/Stanford University.
World Wide Web Consortium. (2012). OWL 2 Web Ontology Language. World Wide Web Consortium. [Online]. Available: http://www.w3.org/TR/2012/REC-owl2-overview-20121211/
Chapter 6 The Social Production of Discourse in Archaeology

Isto Huvila

I. Huvila, Department of ALM, Uppsala University, Uppsala, Sweden. e-mail: isto.huvila@abm.uu.se

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_6

Abstract Archaeology is a profoundly social and collaborative enterprise. Even if it is a discipline of things, archaeology is also a discipline of discourses of things. The making of new archaeological information and knowledge both leans on and weaves a conversation of the past that is fundamentally as social as it is material. These conversations traverse an immense spectrum of archaeological practices and contexts far beyond archaeology itself. This chapter provides an overview of how discourses are produced in archaeology, their characteristics and contemporary facets, and how studying the social production of archaeological discourse(s) is helpful for understanding archaeology and archaeological knowledge. Discourse refers not only to talking or writing about archaeology but also to documenting, communicating and conveying archaeology, archaeological information and knowledge by diverse means, and by doing that, influencing archaeological practices and the production of archaeological knowledge. The chapter starts by asking where contemporary archaeological discourse is produced and continues by inquiring into who participates and who is left out, how to analyse and explain archaeological discourses, what characterises them, and finally, why understanding the social production of archaeological discourse can be useful for archaeologists and non-archaeologists.

6.1 Introduction

Archaeology is a profoundly social and collaborative enterprise. Even if it is a discipline of things (Olsen, 2012), archaeology is also a discipline of discourses of things. The making of new archaeological information and knowledge both leans on and weaves a conversation of the past that is fundamentally as social as it is material. These conversations traverse an immense spectrum of archaeological practices and contexts far beyond archaeology itself. This chapter aims to provide an overview of how discourses are produced in archaeology, their characteristics and contemporary facets, and how studying the social production of archaeological discourse(s) is helpful for understanding archaeology and archaeological knowledge. Here, as in the earlier chapters of this volume, discourse refers not only to talking or writing about archaeology but also to documenting, communicating and conveying archaeology, archaeological information and knowledge by diverse means, and by doing that, influencing archaeological practices and the production of archaeological knowledge.

More precisely, the word discourse is used in the following in three different senses. Firstly, archaeological discourse in the singular refers to the entirety of how archaeology is discussed, communicated, talked and written about and conveyed in society. Secondly, as this entirety consists of multiple parallel, partly overlapping and in many cases conflicting ways of thinking and communicating about archaeology, the references to (archaeological) discourses in the plural refer to this myriad. Thirdly, throughout the chapter, when discussing various discourse theories and notions like Smith’s authorised heritage discourse, the terms discourse and discourses refer to the particular matters stipulated in the specific parts of the literature.

In this chapter, we start the inquiry into the social production of archaeological discourse and discourses by asking where contemporary archaeological discourse is produced and continue by inquiring into who participates and who is left out, how to analyse and explain archaeological discourses, what characterises them, and finally, why understanding the social production of archaeological discourse can be useful for archaeologists and non-archaeologists.
6.2 Approaching Archaeological Discourses

6.2.1 Whereabouts of Archaeological Discourses

The extent and variety of archaeological work conducted in different countries around the world, the diversity of practices across the different branches of archaeology and of archaeological and archaeology-inspired theorising, and the broad implications of archaeological knowledge in society mean that archaeology is discussed in multiple languages in a large variety of scholarly, professional and non-professional outlets. Some areas of archaeology are more international than others but, as Aitchison (2017) notes of archaeological practices, archaeological knowledge and knowledge-making are also characterised by a certain parochiality and a proliferation of spaces and places where archaeology is discussed, rather than by a cosmopolitanism of predominant discourses and venues comparable, for instance, to that of the sciences.
Even if scholarly texts and academic publications carry considerable weight as principal means of conveying archaeological discourse, discussing archaeology is not confined to the literature. By quantity, the most extensive corpus of archaeological knowledge is inscribed in investigation reports. They are a key source of information about specific investigations and localities even if a single report seldom attracts an especially wide audience (Börjesson, 2015, 2016a, b). At the same time, they play a measurable role as a site where the discussion and deliberation of archaeological knowledge is conveyed and unfolds. Recent research in particular has also started to emphasise the role of oral and informal information exchange both in the field (e.g. Morgan & Wright, 2018) and, more generally, in archaeological information work (Huvila, 2014). While archaeologists have always had their social networks, much of the contemporary quasi-formal professional and academic archaeological exchange and reflection takes place online in social media (Walker, 2014; Huvila, 2014; Kansa et al., 2011). Another significant context of negotiating and framing archaeological information and knowledge highlighted in the scholarship is administrative and policy documents, including heritage conventions and information standards (e.g. Börjesson, 2016a, b; Lafrenz Samuels, 2016; Enqvist, 2016). Their impact might be somewhat invisible but in practice, they play a prodigious role in steering both contemporary and future archaeological discourses (Lafrenz Samuels, 2016) by amplifying and silencing specific aspects and perspectives in archaeological discourse and, consequently, what is known about and within archaeology.
Besides professional and administrative outlets, popular and popular science media often report archaeological research results, and amateur archaeologists’ magazines, local archaeological and historical societies and, recently, especially social media (e.g. García-Ceballos et al., 2021; Wakefield, 2020; Huvila, 2013) extend the reach and diversity of how and where archaeology is discussed and debated. The rapid and profound digitalisation of social exchange in society has also broadened the reach and diversity of both professional and public archaeological discourse. Some of the outlets are less obvious than others. Major social media sites such as Twitter and Facebook have developed into significant arenas for archaeology-related exchange (Huvila, 2013; Richardson, 2014; Wakefield, 2020). Digital games have also developed into considerable sites of producing and conveying archaeological knowledge (Morgan, 2016). In a somewhat more confined sense, previously expensive technologies that have become available to both ordinary archaeologists and hobbyists have broadened the scope and reach of archaeological and archaeology-related social exchange. As, for instance, the proliferation of metal-detecting (Dobat et al., 2020) and the amateur use of Google Earth (Liang et al., 2018) for detecting archaeological sites evince, they can provide new opportunities for non-archaeologists to engage in archaeological and archaeology-related knowledge-making and emerge as new sites for diverse archaeological and archaeology-related discourses. Similarly, even if Wikipedia is hardly classifiable as an archaeological medium, its popularity as a source of colloquial information means that its significance for communicating archaeological knowledge is far from negligible (Grillo & Contreras, 2019). In non-digital contexts, museum exhibitions and major archaeological sites are obvious conveyors of archaeological knowledge but, as Högberg’s (2012) discourse analytical study of information boards shows, even the physical world contains simultaneously obvious and invisible scenes where archaeological knowledge is communicated and formed.

In summary, even if there are certain established venues where a substantial part of archaeological discourse takes place, the diversity of outlets where archaeology and archaeology-related matters are discussed in contemporary society is significant. Less familiar sites and contexts are easy to miss, as are those that are only emerging as arenas for archaeology-related exchange.

6.2.2 Discussants in Archaeological Discourses

A pertinent follow-up question to where archaeological discourse takes place is who participates in the discussion. While archaeologists certainly are both the loudest and most numerous contributors in the production of archaeological discourse, they are not the only ones to talk about archaeology (Hamilakis & Anagnostopoulos, 2009). Huvila and Huggett (2018) make a useful distinction between archaeological and archaeology-related practices by drawing a line between doings that pertain to archaeology proper and diverse activities that are associated with archaeology, archaeological work and information. While archaeological practices range from professional and academic fieldwork to research, education and archaeological heritage administration, archaeology-related practices extend to such areas as land development, tourism, popular education and history enactment.
In the perimeters of archaeology, the scholarship has had an evident propensity to focus on education and non-professionals whereas professional archaeology-related discourses and actors, like land developers (Huvila, 2017b) or administrators (Huvila, 2016b) and their engagements with archaeology have not been discussed and problematised to a comparable degree. Laužikas et al. (2018) review archaeology-related communities and the degrees of their creolisation in relation to archaeology. Their study identifies ten peripheral spaces between archaeology and other domains including arts and design, travel and tourism, branding, crime, identity work, alternative archaeologies, museums and heritage, amateur-archaeology, education and public policy, and proposes a model for studying them. Even if none of these archaeology-related communities and discourses is novel per se, the upsurge of the digital sphere has revolutionised the volume, topographies and reach of especially non-professional participation in both archaeological and archaeology-related discourses. The studies conducted so far evince a large heterogeneity of means and modes of engaging with archaeology. Even if many archaeology-related communities and discourses concern local archaeology (Deeley et al., 2014), many of them have become ‘glocal’ in the sense that interest-driven communities traverse boundaries and unite groups across large geographical areas (Laužikas et al., 2018).
As a whole, a glimpse at the popular and professional archaeological literature and discourse across the digital and non-digital outlets shows that the predominant voice in archaeological discourse is that of professional and academic archaeologists. A parallel closer look at the proliferating variety of emerging, especially digital, venues where archaeological matters are debated today evinces, however, a fast widening and, as Laužikas et al. (2018) express it, creolisation of both archaeological discourse and its discussants.

6.2.3 Approaches to Analysing Discourses

Multiple methods can be used to investigate and follow the social production of archaeological discourse. Much of the earlier work has been conceptual and theoretical. It has tended to follow the major shifts in archaeological theory and the philosophy of science (see e.g. Trigger, 1989; Hodder, 2001; Harris & Cipolla, 2017). Evidence-based research has typically been conducted using qualitative methods. Different varieties of discourse analysis and studies of knowledge-making alike are typically based on a close reading of written texts. Lucas (2019) highlights Hodder’s (1989) study of site reports and Tilley’s (1989a) analysis of Cambridge inaugural lectures as two key inquiries into archaeological textual discourses. Even if the study of archaeological discourses goes farther back in time (e.g. Gardin, 1967; Wylie, 1985; Barrett, 1988), the year 1989 with the texts of Hodder, Tilley and Fahnestock (1989) marks a certain beginning of a recognisable broader interest in empirical analysis of archaeological texts and discourse as a social phenomenon. This interest has since broadened to a certain extent to cover other types of texts, media and discourses, and approaches with a distinct focus on, for instance, narratives, argumentation and discourse (for a partial overview, see Lucas, 2019).
This chapter does not attempt to provide a systematic overview of all possible approaches that have been used to analyse (archaeological) discourses or would be useful to that end, but by providing a selection of examples (Table 6.1), it offers a brief glimpse of the variety of alternatives. In an archaeological context, Martín-Rodilla (2015) has analysed discourses in archaeological documents using Hobbs’ (1985) discourse-analytical approach to argumentation relations between clauses in text. Discourse analysis and close reading can be applied to other forms of texts as well. Huvila (2011) has used Laclau and Mouffe’s (2001) discourse theoretical approach to analyse interview records for identifying power dynamics and frictions between different ways of conceptualising the role and value of archaeological reports. This variant of discourse theory can help to understand, for example, how different expectations and aims of writing archaeological field reports clash and co-exist with each other in the documents, and how the presence of multiple discourses not only leads to conflicts and dissatisfaction with the usefulness of the documentation but also helps the documents to inform, and be used by, multiple communities with widely different needs and expectations (Huvila, 2011, 2012).
Table 6.1 Examples of approaches to analysing discourses

Approach | Key aspects | References | Examples of uses in archaeology-related contexts
Hobbs’ discourse-analytical approach | Structure of discourse | Hobbs (1985) | Martín-Rodilla (2015)
Laclau and Mouffe’s discourse theory | Politics of discourses, hegemony, antagonism, articulations | Laclau and Mouffe (2001) | Huvila (2011)
Critical Discourse Analysis | Language as a social practice, power asymmetries | Fairclough (1992) | Smith (2012) and Enqvist (2014)
Foucault | Knowledge, power, social practices | Foucault (1979, 1998, 2002) | Waterman (2014), Olsson (2016), and Bapty (2014)
Derrida | Deconstruction, internal logic of texts or discourses | Derrida (1967) | Bapty (2014) and Tilley (1994)
Lacan | Subjectivity, four discourses (fundamental types) | Lacan (1966) | Nordbladh and Yates (2014) and Shanks and Tilley (1988)
White | Narratives, forms of discourse, strategies of explanation | White (1975, 1987) | Pluciennik (1999)
Giddens | Structuration (of e.g. discourse), role of language in structuration | Giddens (1984) | Mizoguchi (1997)
Bakhtin | Dialogue, history of discourse, language, heteroglossia | Bakhtin (1981) | Joyce (2002)
‘What’s the Problem Represented to be’ | Representation of problems, problem representations, policy analysis, implications of representations to different groups | Bacchi (2012) | -
Discourse Historical Approach | Identity construction, discrimination, historical dimensions of discourse formation | Reisigl and Wodak (2009) | -
Bibliometrics and informetrics | Quantitative analysis of publications, information and discourse | Borgman and Furner (2002) and Groth and Gurney (2010) | Hutson (2002) and Jørgensen (2015)
Visual discourse analysis | Visuals, visual information, visual media | Jancsary et al. (2016) and Albers (2013) | -
Wetherell and Potter’s discourse analysis | Discursive psychology, interpretative repertoires | Wetherell and Potter (1988) | Smith and Campbell (2017)
Body and discourse | Body, embodiment | Coupland and Gwyn (2003) | Goodwin (2003), Huvila (2019b), and Olsson (2016)
Fairclough’s (1992) Critical Discourse Analysis has been used especially in critical heritage studies to inquire into archaeological heritage discourses (e.g. Smith, 2012; Enqvist, 2014) and into how archaeologists and archaeology produce particular understandings of ‘archaeological heritage’. In addition to applying specific discourse analytical frameworks, investigations into archaeological discourses (e.g. Hodder, 1989; Edgeworth, 1991; Bapty & Yates, 2014) have followed less specific content analytical approaches and historical method, and have found inspiration from several different discourse theorists including Foucault, Derrida and Lacan (incl. Foucault, 1979, 1998, 2002; Derrida, 1967; Lacan, 1966; see e.g. Bapty & Yates, 2014; Thomas, 1993; Smith, 2004), White (incl. White, 1975, 1987; see e.g. Pluciennik, 1999), Giddens (Giddens, 1984; e.g. in Mizoguchi, 1997), Bakhtin (e.g. Bakhtin, 1981, in Joyce, 2002) and others. Lucas’s (2019) recent contribution to the inquiry into literary knowledge production in archaeology draws on a broad array of theorists in linguistics, composition studies and rhetoric. Examples of variants of discourse analysis that have not made their way to the analysis of archaeological discourses, but that could well be useful, are, for instance, Bacchi’s (2012) ‘What’s the Problem Represented to be’ approach and the Discourse Historical Approach (Reisigl & Wodak, 2009). Bacchi’s method traces problems that are embedded in discourses and often remain unarticulated behind articulated solutions and priorities. The Discourse Historical Approach builds on Critical Discourse Analysis, although emphasising that the interaction between discourse structures and social structures is mediated rather than determined by the latter. In addition to qualitative text analytical methods, discourses can also be investigated using quantitative text analysis. For example, Jackson et al.
(2020) have studied how archaeological texts conceptualise bone material, suggesting that the two main ways of referring to bones are to discuss them as objects and as related to bodies. Apart from inquiring into texts, the social production of archaeological discourse can also be followed in other types of traces. In the literature, in parallel to text, discourses can be traced using bibliometric and informetric methods by investigating how authors cite different texts in their work (e.g. Hutson, 2002; Jørgensen, 2015). Discourse is also carried by images and can be approached using visual discourse analysis (Jancsary et al., 2016; Albers, 2013). Further, discourses and their related interpretative repertoires can also be uncovered in quantitative survey data by interpreting responses as indicative of different ways of thinking and talking about particular matters (e.g. Huvila, 2020b). Goodwin’s (2003) study of young archaeologists who are learning to excavate and interpret their findings showcases further how it is possible to study the social production of bodily discourses in action, and how doing so can provide critical insights into how archaeology is learned and how archaeological knowledge is produced in highly material terms. The material and bodily dimensions of archaeological discourse have been analysed further by others, drawing for instance on genre theory (Huvila, 2019a) and the work of Foucault and Fairclough (Olsson, 2016). As a whole, the brief summary of examples of discourse analytical and theoretical approaches and perspectives in this section shows first and foremost the diversity and sheer number of alternative ways in which archaeological discourse and discourses can
be studied. Needless to say, there are many others worth considering that, similarly to the ones discussed here, provide means to analyse the social production of both the broader archaeological discourse and the multiplicity of archaeological and archaeology-related discourses.

6.3 Characteristics of Archaeological Discourses

After a brief survey of where archaeological discourses unfold and how they can be studied, we will proceed in the following to provide a short exposé of three aspects of the social production of archaeological discourse that characterise the contemporary and past exchange of archaeological ideas and how the discourse unfolds in the social fabric of archaeological and archaeology-related practices. We will consider the social and societal underpinnings that influence archaeological discourse, the questions of power and mandate to make authoritative claims about archaeological matters, and finally the structural and infrastructural scaffolding that conveys and sustains archaeological knowledge work.

6.3.1 Social and Societal Underpinnings

A key question in approaching the unfolding of archaeological discourse is what drives archaeological knowledge production. On a broad societal level, the question is obviously about the general rationale of studying the human past. On the level of the formation of specific archaeological discourses, it is more an issue of what drives the interest in particular issues about the past. Throughout the history of science, the development of scientific and scholarly disciplines has been explained to a varying degree by internal and external influences. Even if there is hardly a consensus, a popular tendency in recent scholarship has been to espouse, in broad terms, a contextualist standpoint that emphasises the influence and interplay of both societal and intra-disciplinary influences (Brush, 1995; Schnapp, 2012; Salminen, 2020).
For a contextualist, it is obvious that even if individual archaeologists and their personal interests and yearning for knowledge should not be dismissed as instigators of new archaeological knowledge (Farid, 2015), both the popular interest in archaeology (Trigger, 1995) and, in many cases, glaringly political interests have, and have had, a major impact on the discourse and the production of new archaeological knowledge (e.g. Kohl & Fawcett, 1995a; Mizoguchi, 1997; Hegardt & Källén, 2011; Bernbeck, 2012). Apart from the interests that spur the perceived significance of archaeology and archaeological knowledge, there are multiple socio-structural factors within and in the close vicinity of archaeology that influence how the intra-disciplinary discourse unfolds. Archaeology is shaped by transdisciplinary influences from other scientific and scholarly fields. Even if there has always been exchange between
scientific and scholarly fields (Díaz-Andreu & Coltofean-Arizancu, 2020), the recent rapid adaptation of scientific and data-intensive digital analysis methods to archaeological work has had a profound, arguably to a degree unprecedented, impact (Kristiansen, 2014a). However, in parallel to external influence, archaeology is also shaped by archaeology itself. The archetypal forms of archaeological work (Moser, 2007) and the social and practical organisation of archaeological practices shape and enable archaeological knowledge production (Shanks, 2012), together with paradigmatic theories (Kristiansen, 2014b) and the structures of organising and exhibiting archaeological knowledge (Coye, 2009) and information (Huvila, 2019c) in different forms and modalities.

Even if archaeology has never been a solitary undertaking, discourse and knowledge production have traditionally been a concern of the individual archaeologists who have directed fieldwork and research projects (Huvila, 2017a). Undoubtedly the best-known approach that directs attention to the relevance of increasing the multivocality of archaeological discourse and knowledge production is the reflexive archaeology envisioned and developed by Hodder and colleagues (Hodder, 2000), which summons everyone wielding a trowel to take part in the reflexive practice (Berggren & Hodder, 2003) by participating both in the bodily discourse of hands-on practice (cf. Olsson, 2015; Huvila, 2019b) and ‘in conversations at the edge of the trench’ (Morgan & Wright, 2018, p. 146). Reflexive archaeology, and how it has been applied and developed since the early 1990s, deserves to be credited with directing focus on the fundamentally social nature of how archaeological discourse develops and archaeological knowledge unfolds, but it evinces at the same time the multiplicity of challenges (Hamilakis, 1999) of how to maintain and capture it in its broad diversity.
The growing number of public archaeology (e.g. Okamura & Matsuda, 2011; Wakefield, 2020; papers in Williams et al., 2019) and community archaeology initiatives (e.g. Miroff & Versaggi, 2020; Bromberg et al., 2017) provides parallel examples of engagements with non-archaeologists. The outcomes of individual endeavours can always be debated (Simpson & Williams, 2008; Emerson & Hoffman, 2019), and even if the most successful ones tend to have some rough edges, as archaeological work does in general (Silliman, 2018), there are signs that the archaeological community is taking significant steps towards increasing the multivocality of participation in archaeological discourse and knowledge production.

Besides the social structures of archaeological work, collective society-level priorities and arrangements have an equally consequential, albeit somewhat more indirect, impact on archaeological discourse. Early archaeology and archaeological discourse were characterised by antiquarian and national interests in the past and its material, primarily monumental, remains (Kohl & Fawcett, 1995b). Internationally, archaeology became a public, societal matter only towards the mid and late twentieth century. The commercialisation of the sector and the introduction of new public management principles in development-led archaeology from the 1990s onwards in a number of countries have contributed to further change and to the framing of archaeology as a business (Rostock, 2007) rather than as a branch of scholarship or a public good. This has stirred up critique of the increasing influence of profit-oriented ideologies on archaeological work (e.g. Zorzin, 2015; Demoule, 2012)
and discourse (Smith, 2004). In parallel, even if contemporary national political agendas are not always as blatant as they were, for instance, in early twentieth-century colonial Africa (e.g. Conde et al., 2016), National Socialist Germany (Arnold & Hassmann, 1995) or the Soviet Union (Shnirelman, 1995), there is plenty of evidence (e.g. Gustafsson & Karlsson, 2011; Stylianou-Lambert & Bounia, 2016) that the usefulness of archaeology has not escaped the attention of present-day politicians either. Besides public policy, the framing of archaeology in the popular debate, and not least in popular culture, has a continuing impact on archaeology and how archaeological discourse evolves (Matthews, 2004). It has both direct and indirect repercussions for funding, for the perceived value of particular aspects of archaeological heritage, consequently for the research agendas adopted in archaeology, and not least for archaeologists’ self-understanding of their identity and role as professionals and cognitive authorities.

In summary, even if archaeologists play a decisive role in the formation of archaeological discourse, professional and academic archaeology are not isolated from the society where the discourse takes place. Archaeology is used by extra-archaeological actors in society as much as archaeological discourse as a whole, and particular perspectives on archaeological and archaeology-related matters, stem from the society where the discourse and discourses take place.

6.3.2 Power and Mandate

There are many reasons why particular perspectives gain precedence in archaeological discourses and why certain discourses prevail and others pass away. The general societal and discursive power structures that are conventionally used to explain inequalities operate also in archaeology. What is important, however, is to be sensitive to their influence and lack of it, to avoid turning the critique itself into a structure that generates new kinds of inequities.
A non-negligible reason why certain positions persist is the propensity of societal regimes – whether they are professional and academic disciplines, political ideologies or public authorities – to essentialise discourses. In the heritage field, one of the most prominent descriptions of how this can happen is Smith’s (2006) well-known analysis of how heritage itself can be seen as a discourse and how a very particular institutionalised authorised heritage discourse has become predominant in defining what heritage is and how it should be acted upon. In a comparable sense, there are authoritative archaeological discourses that stipulate what pertains to archaeology and what remains outside of its perimeters. Enqvist’s (2016) study of archaeological heritage professionals’ framing of what counts as archaeological sites and monuments exemplifies the influence of their sayings and doings on how archaeology is understood in society in practice. The mandate to make authoritative claims about archaeology and archaeological knowledge has traditionally been tightly intertwined with the structural hierarchies of
archaeological work. Not only the privilege to interpret and publish but also finds, documentation and entire sites have been routinely attributed to the individuals who direct excavations and fieldwork. In addition to the social nature of archaeological knowledge production, reflexive archaeology (Hodder, 2000) has also emphasised that knowledge-making should not be limited to the post-survey interpretation of field documentation carried out by a field director or, in some cases, a small group of senior archaeologists (cf. Bradley, 2003; Lucas, 2001; Tilley, 1989b). Even if archaeological knowledge-making has irrefutably become a more social and collective enterprise than before, many archaeologists still remain silent in the archaeological record (Lucas, 2001). The deliberate naming and non-naming of subjects is a powerful mechanism that includes and excludes. It helps archaeological information to traverse different archaeological and archaeology-related domains or discourses (Huvila, 2017a), gives authority to ‘archaeological’ propositions and makes them harder to refute. These findings are similar to Enqvist’s (2016) observations of the Finnish authoritative heritage discourse, where the social significance of managing archaeological heritage is depersonalised and reduced to obeying the Antiquities Act. At the same time, however, selective naming and non-naming also contribute to the fact that ‘a very large group of anonymous and silent archaeologists’ still have no voice in archaeological discourse (Lucas, 2001, p. 12). Unsurprisingly, the dividing lines between silence and non-silence have tended to follow not only those of merit and experience but also those of gender, social status and origins (Berggren & Hodder, 2003) – the boundaries between the privileged and the disadvantaged.
Feminist archaeology and feminist research on archaeology have demonstrated that archaeology, like scientific and scholarly research in general, has a long parallel history of being dominated by men and stereotypically masculine perspectives. Much of archaeological work has focused on public rather than domestic matters and has projected onto past societies assumptions of the prevalence of gender roles that dominated in the Western cultural hemisphere during the last couple of centuries (Conkey, 2003). In parallel to pure gender bias, feminist theorising has also directed attention to the general insensitivity to multiplicity – of accountabilities, scales and perspectives (Wylie, 2007) – in archaeological discourse. Besides gender, this applies to indigenous (e.g. Marliac, 2005), popular culture (Holtorf, 2005) and, in general, non-predominant and non-professional perspectives, and to the broader framing of research, for instance, concerning the mutual influence of micro- and macro-level factors and phenomena (Conkey, 2003). Western archaeologists have also been criticised for a tendency to orientalise (in the sense of Said, 1979) non-Western views on archaeological heritage as exotic and less creditable (Starzmann, 2012) and for neo-colonialism enacted through the global appropriation of local heritage as world heritage (Stobiecka, 2020).

Overall, it is easy to both exaggerate and underestimate the influence of power relations in the formation of discourses. The recent archaeological literature has taken significant steps towards disclosing predominant hierarchies, deconstructing traditional narratives of archaeology and the past, and identifying silences and
insensitivities. At the same time, it is apparent that the power structures and multiplicities highlighted in the contemporary debate are only examples – many of them flagrant ones – but still mere instances of the complex dynamics of how the mandate of having a say in archaeological matters affects archaeological knowledge production.

6.3.3 Structural and Infrastructural Scaffolding

The previous sections provided a glimpse into how archaeological discourse is influenced by its underpinning intra-disciplinary and broader societal fluctuations and by conscious and unconscious acts of seizing and maintaining control. At the same time, it is shaped by supporting and ‘scaffolding’ (Wylie, 2017) technical and material structures and infrastructures. This has become perhaps especially apparent with the digitalisation of discursive infrastructures. Apart from enabling exchange, diverse digital and non-digital technologies influence how archaeologists and non-archaeologists alike can participate in archaeological discourses and knowledge-making. Critics have drawn attention to emerging digital inequalities between large, well-funded and smaller archaeological projects (Watson, 2019; Chadwick, 2003) and warned of a substantial risk that new infrastructures reinforce existing detrimental hierarchies (Taylor & Gibson, 2017) and erect new unwanted ones instead of contributing to the advancement and quality of the archaeological enterprise as a whole. Besides having or not having access to particular technologies, the unfolding of archaeological discourse depends on their specific qualities and characteristics. Digital geographic data and 3D visualisations are probably the most intensely debated examples of contemporary technologies used by both professional archaeologists and others to communicate archaeology and to contribute to archaeological knowledge-making. Instead of being neutral, they both have their own language (Manovich, 2001, also e.g.
Cochrane & Russell, 2007) that influences interpretations (Copplestone & Dunne, 2017) and the on-going discourse, and that is sometimes instrumental in introducing new ones. The same applies to documentation equipment. Moving from paper-based to digital field documentation influences not only what is captured and how but also how the documented phenomena are framed and described (Huvila, 2019b). In parallel to technologies, the making of discourse is similarly shaped by different literary, non-literary (Bakhtin, 1982) and social genres (Miller, 1984) of their (re)presentation. Lucas (2019) proposes that instead of genres, it goes further back to text types – or, perhaps, in a broader sense, to information, its function and especially its structure. In a sociotechnical sense, borrowing from Pickering (1995), discourse can be described as unfolding in a ‘dance’ of agency between technologies, their users (Gunnarsson, 2020) and information bearers (Huvila, 2016a). Each of them puts the others to work (Huvila, 2018) to advance a particular
set of goals and aspirations that are partly consciously instituted and partly inherent to the constituents of the discursive dance. Even if there is no real doubt about the critical influence of technologies and technical infrastructures on the unfolding of discourse, as Lucas (2019) remarks, the proliferation of new forms of media, technologies and genres of (re)presentation does not necessarily mean that the types of writing (or discussing) archaeology would change to a radical extent. In parallel to following the social and societal base, power relations, and structural and infrastructural underpinnings of how archaeological discourse unfolds, it is equally important to follow the discourse itself and its forms. Archaeological discourse can still be enacted as a narrative, argumentation or persuasion, exposition or, for instance, conversation (Lucas, 2019), even if the genres, technologies and language of its expression would change shape and the societal and social premises of discussing archaeology would become different.

6.4 Understanding the Discursive Production of Archaeological Knowledge Matters

After reviewing the constituents and some of the predominant characteristics of archaeological discourse, it is appropriate to summarise key implications of understanding how archaeological discourse comes into being. Theoretical and evidence-based exploration of the social production of archaeological discourse and its premises informs both archaeological inquiry itself and, in a broader and more indirect sense, the making and use of archaeological information and knowledge both within archaeology and in neighbouring disciplines and contexts.

First and fundamentally, a general understanding and empirical descriptions of the social enactment of archaeological discourse can help to understand archaeological knowledge production and to debunk inaccurate conceptions of how archaeological knowledge comes into being.
A direct consequence of the social nature and multiplicity of communities and venues where archaeological discourse is enacted is that it is hardly justified to refer to archaeological discourse in the singular. There are multiple parallel, partly overlapping, national and thematic discourse communities in academic and professional archaeology alone (Venclova, 2007) with their own local knowledges and systems of knowing (Huvila, 2020a). While some archaeology-related discourses and practices have explicit and implicit influence on how archaeology is discussed and practised, many of them are excluded. In parallel to how individuals and groups of people are excluded from participating in the archaeological discourse by consciously and unconsciously silencing their voices, other voices and perspectives are suppressed by designating them non-archaeological or archaeologically less relevant. In this respect, an analytical distinction between archaeological and archaeology-related (as in Huvila & Huggett, 2018) should not be treated as a question of the value of knowledge and knowledge claims. Instead, the relation of discourses to archaeology and the
difference between the two could be seen in terms of centred sets, as a question of how far a discourse is from the nucleus of ‘archaeology’ and other parallel epistemic cores (as in Huvila, 2019d), for instance, of entertainment, land development, identity-building or commercial interests.

Second, a thorough understanding of the social production of archaeological discourse and discourses is a necessary precondition for any meaningful attempts to pluralise archaeological knowledge production. Decolonisation and engagement with the past and present social injustices (Starzmann, 2012) of archaeological practices and discourses are impossible without a comprehensive understanding of the ideological and pragmatic dimensions of archaeological work (Schlanger, 2012), the discursive mechanisms of colonisation, and the mechanisms of silencing, making invisible, taking over and expropriating discourses. Understanding and acknowledging multiple perspectives and discourses is merely a starting point (Lucas, 2019) but a necessary premise of fostering meaningful techniques of participation, increased epistemic openness (Marila, 2020) and, for instance, envisioning, developing and using new tools together with local communities (Palmer, 2013) that do not backfire and lead to new inequalities. The efforts to pluralise archaeological discourse, and studies following such attempts, have simultaneously identified both parallel developments that run against the striving for broader participation in knowledge-making across the spectrum of archaeological and creolised communities and the general difficulty of realising multivocality in practice. Attitudes are not changing fast, if at all, and not necessarily in directions that embrace multivocality as an ideal.
The increasing professional and scholarly specialisation, the adoption of increasingly complex technologies (Watson, 2019), excessive confidence in digital data (Bevan, 2015) and the failure to understand and account for the contexts of knowledge production (Huvila, 2019c) come with a real risk that it can be exceedingly difficult to participate in archaeological knowledge-making.

Third, a comprehensive insight into the social production of archaeological discourse is of vital importance in understanding and developing archaeological information work. Without a thorough understanding of how archaeology is discussed, archaeological information remains difficult to find and use, and demanding to manage. Producing information and documentation that is easily understood by its actual and potential contemporary and future users is equally challenging, if possible at all. While some would deny the possibility of developing a general language for representing archaeological knowledge along the lines of what, for instance, Gardin’s (1980, 1999) logicist programme aimed at achieving (also e.g. Djindjian, 2004), even narrower standardisation and interoperability of information require an in-depth understanding of how archaeological discourse unfolds and how it has had a strong tendency to resist settling for shared concepts and terms (e.g. Pavel, 2010; Oikarinen & Kortelainen, 2013). So far, there is still relatively little empirical and theoretical work, from both historical and contemporary perspectives, on how archaeological materiality is translated into words (Lucas, 2019) and other conveyers of information. Salminen’s (2020) critique that historical research has been overly concerned with political influences and has neglected the role of individual researchers in their contemporary contexts applies also to a certain
extent to the literature on the production of contemporary archaeological practices and discourses.

Fourth and finally, a better understanding of the social production of archaeological discourse is necessary for understanding and increasing its diversity. Using the discourse analytic concept of Wetherell and Potter (1988), there are multiple interpretative repertoires of how archaeology and archaeological matters are conceived by archaeologists and non-archaeologists across the broad variety of archaeology-related (e.g. those in Laužikas et al., 2018, and others) and archaeological local and global communities. They all come with their own arrangements or regimes of truth (Foucault, 1975), value (Boltanski & Thévenot, 2006) and what counts as information (Ekbia & Evans, 2009). Differences and contradictions do not mean that any particular perspective would be inherently or necessarily wrong; rather, they bestow an epistemic responsibility to inquire into them, their internal and external plausibility and implications, and how and where they emerge. As Lucas (2019) reminds us, the tendency to contrast (Western) scientific discourses with other discourses – especially indigenous ones – risks lumping together widely different epistemic cultures, reinforcing old stereotypical dichotomies and pushing towards a forced choice between archaeology and ‘non-archaeology’, not as a mere analytical division but as a distinction of legitimacy. Lucas (2019) seconds Sillitoe’s (2002) proposal to focus on epistemic practices and on specific problems and issues as means to elicit dialogue and collaboration, instead of underlining foundational epistemological differences.
Instead of treating differences as a question of demarcations (Sommerlund, 2002) and ending up engaging in mere face-work (Clauss, 2016), a more productive approach could be to focus on distances and proximities between communities (Huvila, 2020a), their practices and matters of concern, what they perceive as real options and, most importantly, what implications and concrete consequences the different discourses have.

6.5 Conclusions

There are several key takeaways to consider from a review of the social production of archaeological discourse. Even if archaeology and archaeological discourse are social and enacted in the thick of a large number of overlapping, parallel and often distant communities, it does not mean that archaeological discourse would necessarily be inclusive or dialogic. Understanding that is the first necessary step in that direction, similarly to how it is a precondition for understanding how archaeological knowledge unfolds, for advancing archaeological information work, for developing better and more meaningful tools, documentation and infrastructures for archaeological knowledge production, and for engaging with the broad and diverse archaeological field of discursivity. A popular contemporary suggestion for academics and professionals is to reach out and engage in the discourse outside of archaeology proper. The relevance of participating in public archaeology-related discourse in social media, and for instance,
to contribute to Wikipedia has been broadly acknowledged (e.g. Harding, 2007; Scherzler, 2010) but at the same time embraced by relatively few in archaeology and in the scholarly and professional community as a whole (e.g. AtallahBidart, 2020; Cyron, 2017). Acknowledging and understanding that archaeological discourse is social does not, however, mean that every archaeologist would need to do outreach in every conceivable community in person. Sometimes the expectations of community participation can be over-dimensioned (Chirikure et al., 2010). What is probably more important is to be knowledgeable of on-going discourse, to be able to position oneself – in an anthropological sense (cf. Hamilakis & Anagnostopoulos, 2009) – in the thick of archaeological things, and that the field of discursivity itself is inclusive, open and dialogic not only for an archaeological discourse, but for discourses in the plural.

References

Aitchison, K. (2017). On the outside looking in: What will Brexit mean for European archaeology? The Historic Environment, 8(3), 194–198.
Albers, P. (2013). Visual discourse analysis. In P. Albers, T. Holbrook, & A. Flint (Eds.), New methods of literacy research (p. 8). Routledge.
Arnold, B., & Hassmann, H. (1995). Archaeology in Nazi Germany: The legacy of the Faustian bargain. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics, and the practice of archaeology. Cambridge University Press.
AtallahBidart, S. (2020). Collaborer sur wikipédia pour co-construire une société de la connaissance. Revue française des sciences de l’information et de la communication, 20.
Bacchi, C. (2012). Introducing the ‘what’s the problem represented to be?’ Approach. In A. Bletsas & C. Beasley (Eds.), Engaging with Carol Bacchi: Strategic interventions and exchanges (pp. 21–24). University of Adelaide Press.
Bakhtin, M. M. (1981). Dialogic imagination: Four essays. University of Texas Press.
Bakhtin, M. M. (1982).
L’oeuvre de François Rabelais et la culture populaire au Moyen Age et sous la Renaissance. Gallimard.
Bapty, I. (2014). Nietzsche, Derrida and Foucault: Re-excavating the meaning of archaeology. In I. Bapty & T. Yates (Eds.), Archaeology after structuralism: Post structuralism and the practice of archaeology (pp. 214–276). Routledge.
Bapty, I., & Yates, T. (Eds.). (2014). Archaeology after structuralism: Post structuralism and the practice of archaeology. Routledge.
Barrett, J. C. (1988). Fields of discourse: Reconstituting a social archaeology. Critique of Anthropology, 7(3), 5–16.
Berggren, A., & Hodder, I. (2003). Social practice, method, and some problems of field archaeology. American Antiquity, 68(3), 421–434. http://www.jstor.org/stable/3557102
Bernbeck, R. (2012). The political dimension of archaeological practices. In D. T. Potts (Ed.), A companion to the archaeology of the ancient Near East (pp. 87–105). Wiley-Blackwell.
Bevan, A. (2015). The data deluge. Antiquity, 89(348), 1473–1484.
Boltanski, L., & Thévenot, L. (2006). On justification. Princeton University Press.
Borgman, C. L., & Furner, J. (2002). Scholarly communication and bibliometrics. ARIST, 36(1), 2–72.
Börjesson, L. (2015). Grey literature – Grey sources? Nuancing the view on professional documentation: The case of Swedish archaeology. Journal of Documentation, 71(6), 1158–1182.
Börjesson, L. (2016a). Beyond information policy: Conflicting documentation ideals in extra-academic knowledge making practices. Journal of Documentation, 72(4), 674–695.
Börjesson, L. (2016b). Research outside academia? An analysis of resources in extra-academic report writing. In Proceedings of the 2016 ASIS&T annual meeting, Copenhagen (pp. 1–10).
Bradley, R. (2003). Seeing things: Perception, experience and the constraints of excavation. Journal of Social Archaeology, 3(2), 151–168. http://jsa.sagepub.com/content/3/2/151.abstract
Bromberg, F., Cressey, P., Fesler, G., Nasca, P., & Reeder, R. (2017). We dig Alexandria: A reflection on more than fifty years of community archaeology. In Urban archaeology, municipal government and local planning (pp. 203–225). Springer.
Brush, S. G. (1995). Scientists as historians. Osiris, 10(1), 214–231.
Chadwick, A. (2003). Post-processualism, professionalization and archaeological methodologies. Towards reflective and radical practice. Archaeological Dialogues, 10(1), 97–117.
Chirikure, S., Manyanga, M., Ndoro, W., & Pwiti, G. (2010). Unfulfilled promises? Heritage management and community participation at some of Africa’s cultural heritage sites. International Journal of Heritage Studies, 16(1–2), 30–44.
Clauss, L. R. (2016). Betwixt and between: Archaeology’s liminality and activism’s transformative promise. In S. Atalay (Ed.), Transforming archaeology (pp. 29–44). Routledge.
Cochrane, A., & Russell, I. (2007). Visualizing archaeologies: A manifesto. Cambridge Archaeological Journal, 17(01), 3–19.
Conde, P., Senna-Martínez, J. C., & Martins, A. C. (2016). Archeological connections: Tracking and tracing international relations throughout Portuguese colonialism. In G. Delley, M. Díaz-Andreu, F. Djindjian, V. M. Fernández, A. Guidi, & M.-A. Kaeser (Eds.), History of archaeology: International perspectives (pp. 51–62). Archaeopress.
Conkey, M. W. (2003). Has feminism changed archaeology? Signs, 28(3), 867–880.
Copplestone, T., & Dunne, D. (2017). Digital media, creativity, narrative structure and heritage. Internet Archaeology, 44.
Coupland, J., & Gwyn, R. (Eds.). (2003). Discourse, the body, and identity. Palgrave Macmillan.
Coye, N. (2009). Collections, musées, paysages. Les Nouvelles de l’archéologie, 117, 3–5.
Cyron, M. (2017). Wikipedia. macht. archäologie. Archäologische Informationen, 40.
Deeley, K., Pruitt, B., Skolnik, B. A., & Leone, M. P. (2014). Local discourses in archaeology. In C. Smith (Ed.), Encyclopedia of global archaeology (pp. 4540–4545). Springer. https://doi.org/10.1007/978-1-4419-0465-2_1556
Demoule, J.-P. (2012). Rescue archaeology: A European view. Annual Review of Anthropology, 41, 611–626.
Derrida, J. (1967). De la grammatologie. Les Éditions de Minuit.
Díaz-Andreu, M., & Coltofean-Arizancu, L. (2020). Interdisciplinarity in archaeology – A historical introduction. In L. Coltofean-Arizancu & M. D.-A. García (Eds.), Interdisciplinarity and archaeology: Scientific interactions in nineteenth- and twentieth-century archaeology (pp. 1–21). Oxbow.
Djindjian, F. (2004). La publication scientifique en langue naturelle est-elle en archéologie un discours logique? Essai de conception d’un langage cognitif d’aide à la publication. Archeologia e calcolatori, 15, 51–61.
Dobat, A. S., Deckers, P., Heeren, S., Lewis, M., Thomas, S., & Wessman, A. (2020). Towards a cooperative approach to hobby metal detecting: The European public finds recording network (EPFRN) vision statement. European Journal of Archaeology, 23(2), 272–292.
Edgeworth, M. (1991). The act of discovery: An ethnography of the subject-object relation in archaeological practice. Ph.D. thesis, University of Durham.
Ekbia, H. R., & Evans, T. P. (2009). Regimes of information: Land use, management, and policy. The Information Society, 25(5), 328–343.
Emerson, P., & Hoffman, N. (2019). Technical, political, and social issues in archaeological collections data management. Advances in Archaeological Practice, 7(3), 258–266.
Enqvist, J. (2014). The new heritage: A missing link between Finnish archaeology and contemporary society? Fennoscandia Archaeologica, XXXI, 101–123.
Enqvist, J. (2016). Suojellut muistot: Arkeologisen perinnön hallinnan kieli, käsitteet ja ideologia. Doctoral dissertation, University of Helsinki.
Fahnestock, J. (1989). Arguing in different forums: The Bering crossover controversy. Science, Technology & Human Values, 14(1), 26–42.
Fairclough, N. (1992). Discourse and social change. Polity.
Farid, S. (2015). ‘Proportional representation’: Multiple voices in archaeological interpretation at Çatalhöyük. In R. Chapman & A. Wylie (Eds.), Material evidence: Learning from archaeological practice (pp. 59–78). Routledge.
Foucault, M. (1975). Surveiller et punir, naissance de la prison. Gallimard.
Foucault, M. (1979). My body, this paper, this fire. Oxford Literary Review, 4(1), 9–28.
Foucault, M. (1998). What is an author? In J. D. Faubion (Ed.), Aesthetics, method and epistemology (pp. 205–222). The New Press.
Foucault, M. (2002). The archeology of knowledge. Routledge. L’Archeologie du savoir first published 1969 by Editions Gallimard.
García-Ceballos, S., Rivero, P., Molina-Puche, S., & Navarro-Neri, I. (2021). Educommunication and archaeological heritage in Italy and Spain: An analysis of institutions’ use of Twitter, sustainability, and citizen participation. Sustainability, 13(4), 1602.
Gardin, J.-C. (1967). Methods for the descriptive analysis of archaeological material. American Antiquity, 32(1), 13–30.
Gardin, J.-C. (1980). Archaeological constructs: An aspect of theoretical archaeology. Cambridge University Press.
Gardin, J.-C. (1999). Archéologie, formalisation et sciences sociales. Sociologie et sociétés, 31(1), 119–127. http://www.erudit.org/revue/socsoc/1999/v31/n1/001282ar.pdf
Giddens, A. (1984). The constitution of society: Outline of the theory of structuration. Polity.
Goodwin, C. (2003). The Body in Action. In J. Coupland & R. Gwyn (Eds.), Discourse, the body, and identity. Palgrave Macmillan. http://site.ebrary.com/id/10076971
Grillo, K. M., & Contreras, D. A. (2019). Public archaeology’s mammoth in the room: Engaging Wikipedia as a tool for teaching and outreach. Advances in Archaeological Practice, 7(4), 435–442.
Groth, P., & Gurney, T. (2010). Studying scientific discourse on the web using bibliometrics: A chemistry blogging case study. In Proceedings of the WebSci10: extending the frontiers of society on-line. Web Science Trust.
Gunnarsson, F. (2020). Digitalisation and its impact on archaeological knowledge production. In J. Hansson & J. Svensson (Eds.), Doing digital humanities: Concepts, approaches, cases (pp. 27–44). Linnaeus University Press.
Gustafsson, A., & Karlsson, H. (2011). A spectre is haunting Swedish archaeology – The spectre of politics: Archaeology, cultural heritage and the present political situation in Sweden. Current Swedish Archaeology, 19(1), 11–36.
Hamilakis, Y. (1999). La trahison des archeologues? Archaeological practice as intellectual activity in postmodernity. Journal of Mediterranean Archaeology, 12(1), 60–79.
Hamilakis, Y., & Anagnostopoulos, A. (2009). What is archaeological ethnography? Public Archaeology, 8(2–3), 65–87.
Harding, A. (2007). Communication in Archaeology. European Journal of Archaeology, 10(2–3), 119–133. http://eja.sagepub.com/cgi/content/abstract/10/2-3/119
Harris, O. J., & Cipolla, C. (2017). Archaeological theory in the new millennium. Routledge.
Hegardt, J., & Källén, A. (2011). Being through the past: Reflections on Swedish archaeology and heritage management. In L. R. Lozny (Ed.), Comparative archaeologies (pp. 109–135). Springer.
Hobbs, J. R. (1985). On the coherence and structure of discourse. Technical report, Center for the Study of Language and Information (CSLI).
Hodder, I. (1989). Writing archaeology: Site reports in context. Antiquity, 63(239), 268–274.
Hodder, I. (2000). Towards reflexive method in archaeology: The example at Çatalhöyük. McDonald Institute for Archaeological Research.
Hodder, I. (Ed.). (2001). Archaeological theory today. Polity.
Högberg, A. (2012). The voice of the authorized heritage discourse: A critical analysis of signs at ancient monuments in Skåne, Southern Sweden. Current Swedish Archaeology, 20, 131–167. http://www.arkeologiskasamfundet.se/csa/
Holtorf, C. (2005). Beyond crusades: How (not) to engage with alternative archaeologies. World Archaeology, 37(4), 544–551.
Hutson, S. R. (2002). Gendered citation practices in American Antiquity and other archaeology journals. American Antiquity, 67(2), 331–342. http://www.jstor.org/stable/2694570
Huvila, I. (2011). The politics of boundary objects: Hegemonic interventions and the making of a document. Journal of the Association for Information Science and Technology, 62(12), 2528–2539.
Huvila, I. (2012). Authorship and documentary boundary objects. In 45th Hawaii international conference on system science (HICSS) (pp. 1636–1645). IEEE Computer Society.
Huvila, I. (2013). Engagement has its consequences: The emergence of the representations of archaeology in social media. Archäologische Informationen, 36, 21–30.
Huvila, I. (2014). Archaeologists and their information sources. In I. Huvila (Ed.), Perspectives to archaeological information in the digital society (pp. 25–54). Department of ALM, Uppsala University.
Huvila, I. (2016a). Awkwardness of becoming a boundary object: Mangle and materialities of reports, documentation data and the archaeological work. The Information Society, 32(4), 280–297.
Huvila, I. (2016b). ‘If we just knew who should do it’, or the social organization of the archiving of archaeology in Sweden. Information Research, 21(2), Paper 713. http://www.informationr.net/ir/21-2/paper713.html
Huvila, I. (2017a). Archaeology of no names? The social productivity of anonymity in the archaeological information process. ephemera, 17(2), 351–376.
Huvila, I. (2017b). Land developers and archaeological information. Open Information Science, 1(1), 71–90.
Huvila, I. (2018). Putting to (information) work: A Stengersian perspective on how information technologies and people influence information practices. The Information Society, 34(4), 229–243.
Huvila, I. (2019a). Genres and situational appropriation of information. Journal of Documentation, 75(6), 1503–1515.
Huvila, I. (2019b). Learning to work between information infrastructures. Information Research, 24(2), paper 819. http://www.informationr.net/ir/24-2/paper819.html
Huvila, I. (2019c). Management of archaeological information and knowledge in digital environment. In M. Handzic (Ed.), Knowledge management, arts and humanities (pp. 147–169). Springer.
Huvila, I. (2019d). Rethinking context in information research: Bounded versus centred sets. Information Research, 24(4), paper colis1912. http://www.informationr.net/ir/24-4/colis/colis1912.html
Huvila, I. (2020a). Information-making-related information needs and the credibility of information. Information Research, 25(4), paper isic2002. http://informationr.net/ir/25-4/isic2020/isic2002.html
Huvila, I. (2020b). Librarians on user participation in five European countries/perspectives de bibliothécaires sur la participation des utilisateurs dans cinq pays européens. Canadian Journal of Information and Library Science, 43(2), 127–157.
Huvila, I., & Huggett, J. (2018). Archaeological practices, knowledge work and digitalisation. Journal of Computer Applications in Archaeology, 1(1), 88–100.
Jackson, S. E., Richissin, C. E., McCabe, E. E., & Lee, J. J. (2020). Data-informed tools for archaeological reflexivity: Examining the substance of bone through a meta-analysis of academic texts. Internet Archaeology, 55.
Jancsary, D., Höllerer, M. A., & Meyer, R. E. (2016). Critical analysis of visual and multimodal texts. In Methods of critical discourse studies (pp. 180–204). SAGE.
Jørgensen, E. K. (2015). Typifying scientific output: A bibliometric analysis of archaeological publishing across the science/humanities spectrum (2009–2013). Danish Journal of Archaeology, 4(2), 125–139.
Joyce, R. A. (2002). The languages of archaeology: Dialogue, narrative, and writing. Blackwell.
Kansa, E. C., Kansa, S. W., & Watrall, E. (Eds.). (2011). Archaeology 2.0: New approaches to communication and collaboration. Cotsen Institute of Archaeology, UC Los Angeles.
Kohl, P. L., & Fawcett, C. (1995a). Archaeology in the service of the state: Theoretical considerations. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics, and the practice of archaeology (pp. 3–18). Cambridge University Press.
Kohl, P. L., & Fawcett, C. P. (Eds.). (1995b). Nationalism, politics, and the practice of archaeology. Cambridge University Press.
Kristiansen, K. (2014a). Towards a new paradigm? The third science revolution and its possible consequences in archaeology. Current Swedish Archaeology, 22, 11–34.
Kristiansen, K. (2014b). What is in a paradigm? Reply to comments. Current Swedish Archaeology, 22, 65–71.
Lacan, J. (1966). Écrits. Éditions du Seuil.
Laclau, E., & Mouffe, C. (2001). Hegemony and socialist strategy: Towards a radical democratic politics (2nd ed.). Verso.
Lafrenz Samuels, K. (2016). Transnational turns for archaeological heritage: From conservation to development, governments to governance. Journal of Field Archaeology, 41(3), 355–367.
Laužikas, R., Dallas, C., Thomas, S., Kelpšienė, I., Huvila, I., Luengo, P., Nobre, H., Toumpouri, M., & Vaitkevičius, V. (2018). Archaeological knowledge production and global communities: Boundaries and structure of the field. Open Archaeology, 4(1), 350–364.
Liang, J., Gong, J., & Li, W. (2018). Applications and impacts of Google Earth: A decadal review (2006–2016). ISPRS Journal of Photogrammetry and Remote Sensing, 146, 91–107.
Lucas, G. (2001). Critical approaches to fieldwork: Contemporary and historical archaeological practice. Routledge.
Lucas, G. (2019). Writing the past: Knowledge and literary production in archaeology. Routledge.
Manovich, L. (2001). The language of new media. MIT Press.
Marila, M. (2020). Introductory notes to a speculative epistemology of archaeology. Ph.D. thesis, University of Helsinki.
Marliac, A. (2005). Scientific discourse and local discourses: The case of African archaeology. International Journal of Historical Archaeology, 9(1), 57–70.
Martín-Rodilla, P. (2015). An empirical approach to the analysis of archaeological discourse. In A. Traviglia (Ed.), Across space and time: Papers from the 41st conference on computer applications and quantitative methods in archaeology, Perth, 25–28 March 2013 (pp. 319–325). Amsterdam University Press.
Matthews, C. N. (2004). Public significance and imagined archaeologists: Authoring pasts in context. International Journal of Historical Archaeology, 8, 1–25.
Miller, C. R. (1984). Genre as social action. The Quarterly Journal of Speech, 70(2), 151–167.
Miroff, L. E., & Versaggi, N. M. (2020). Community archaeology at the trowel’s edge. Advances in Archaeological Practice, 1–11.
Mizoguchi, K. (1997). The reproduction of archaeological discourse: The case of Japan. Journal of European Archaeology, 5(2), 149–165.
Morgan, C. (2016). Video games and archaeology. SAA Archaeological Record, 16(5), 9–10.
Morgan, C., & Wright, H. (2018). Pencils and pixels: Drawing and digital media in archaeological field recording. Journal of Field Archaeology, 43(2), 136–151.
Moser, S. (2007). On disciplinary culture: Archaeology as fieldwork and its gendered associations. Journal of Archaeological Method and Theory, 14(3), 235–263.
Nordbladh, J., & Yates, T. (2014). This perfect body, this virgin text: Between sex and gender in archaeology. In I. Bapty & T. Yates (Eds.), Archaeology after structuralism: Post structuralism and the practice of archaeology (pp. 222–237). Routledge.
Oikarinen, T., & Kortelainen, T. (2013). Challenges of diversity, consistency, and globality in indexing of local archeological artifacts. Knowledge Organization, 40(2), 123–135.
Okamura, K., & Matsuda, A. (2011). New perspectives in global public archaeology. Springer.
Olsen, B. (2012). Archaeology: The discipline of things. University of California Press.
Olsson, M. (2015). Making sense of the past: The information practices of field archaeologists. In Presentation at the i3 conference, Aberdeen, Scotland.
Olsson, M. (2016). Making sense of the past: The embodied information practices of field archaeologists. Journal of Information Science, 42(3), 410–419.
Palmer, M. H. (2013). (In)digitizing Cáuigú historical geographies: Technoscience as a postcolonial discourse. In A. Lünen & C. Travis (Eds.), History and GIS: Epistemologies, considerations and reflections (pp. 39–58). Springer.
Pavel, C. (2010). Describing and interpreting the past: European and American approaches to the written record of the excavation. Editura Universitatii din Bucuresti.
Pickering, A. (1995). The mangle of practice: Time, agency, and science. University of Chicago Press.
Pluciennik, M. (1999). Archaeological narratives and other ways of telling. Current Anthropology, 40(5), 653–678.
Reisigl, M., & Wodak, R. (2009). The discourse-historical approach (DHA). In R. Wodak & M. Meyer (Eds.), Methods of critical discourse studies (2nd ed., pp. 87–121). SAGE.
Richardson, L.-J. (2014). Public archaeology in a digital age. Ph.D. thesis, UCL.
Rostock, J. (2007). Arkæologi som forretning – om en diskurs med uheldige konsekvenser. Arkæologisk Forum, 17, 33–39.
Said, E. W. (1979). Orientalism. Vintage Books.
Salminen, T. (2020). Arkeologian historia: tehtyä ja tehtävää. Muinaistutkija, 1, 35–47.
Scherzler, D. (2010). Das Ende des Frontalunterrichts Beobachtungen zu Archäologie und Web 2.0 im Frühling 2011. Archäologische Informationen, 33(1), 99–111. http://www.dianescherzler.de/downloads/AI_33_Scherzler.pdf
Schlanger, N. (2012). Situations archéologiques, expériences coloniales. Les Nouvelles de l’archéologie, 128, 41–46.
Schnapp, A. (2012). La crise de l’archéologie, de ses lointaines origines à aujourd’hui. Les Nouvelles de l’archéologie, 128, 3–6.
Shanks, M. (2012). The archaeological imagination. Left Coast Press.
Shanks, M., & Tilley, C. (1988). Social theory and archaeology. University of New Mexico Press.
Shnirelman, V. A. (1995). From internationalism to nationalism: Forgotten pages of Soviet archaeology in the 1930s and 1940s. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics, and the practice of archaeology (pp. 120–138). Cambridge University Press.
Silliman, S. W. (2018). Engaging archaeology: 25 case studies in research practice. Wiley.
Sillitoe, P. (2002). Globalizing indigenous knowledge. In P. Sillitoe, A. Bicker, & J. Pottier (Eds.), Participating in development: Approaches to indigenous knowledge (pp. 108–138). Routledge.
Simpson, F., & Williams, H. (2008). Evaluating community archaeology in the UK. Public Archaeology, 7(2), 69–90.
Smith, L. (2004). Archaeological theory and the politics of cultural heritage. Routledge.
Smith, L. (2006). Uses of heritage. Routledge.
Smith, L. (2012). Discourses of heritage: Implications for archaeological community practice. Nuevo mundo mundos nuevos.
Smith, L., & Campbell, G. (2017). The tautology of ‘intangible values’ and the misrecognition of intangible cultural heritage. Heritage & Society, 10(1), 26–44.
Sommerlund, J. (2002). Demarcations and boundary objects: Scientific balancing acts in molecular microbial ecology. Ph.D. thesis, Copenhagen Business School.
Starzmann, M. T. (2012). Archaeological fieldwork in the Middle East: Academic agendas, labour politics and neo-colonialism. In N. Schlanger, S. van der Linde, M. van den Dries, & C. Slappendel (Eds.), European archaeology abroad: Global settings, comparative perspectives. Sidestone Press.
Stobiecka, M. (2020). Archaeological heritage in the age of digital colonialism. Archaeological Dialogues, 27(2), 113–125.
Stylianou-Lambert, T., & Bounia, A. (2016). The political museum. Routledge.
Taylor, J., & Gibson, L. K. (2017). Digitisation, digital interaction and social media: Embedded barriers to democratic heritage. International Journal of Heritage Studies, 23(5), 408–420. https://doi.org/10.1080/13527258.2016.1171245
Thomas, J. (1993). Discourse, totalization and ‘the Neolithic’. In C. Y. Tilley (Ed.), Interpretative archaeology (pp. 357–394). Berg.
Tilley, C. (1989a). Discourse and power: The genre of the Cambridge inaugural lecture. In D. Miller, M. Rowlands, & C. Tilley (Eds.), Domination and resistance (pp. 40–62). Routledge.
Tilley, C. (1989b). Excavation as theatre. Antiquity, 63(239), 275–280.
Tilley, C. (1994). Interpreting material culture. In S. M. Pearce (Ed.), Interpreting objects and collections (pp. 67–75). Routledge.
Trigger, B. G. (1989). A history of archaeological thought. Cambridge University Press.
Trigger, B. G. (1995). Romanticism, nationalism, and archaeology. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics, and the practice of archaeology (pp. 263–279). Cambridge University Press.
Venclova, N. (2007). Communication within archaeology: Do we understand each other? European Journal of Archaeology, 10(2–3), 207–222.
Wakefield, C. (2020). Digital public archaeology at Must Farm: A critical assessment of social media use for archaeological engagement. Internet Archaeology, 55.
Walker, D. (2014). Decentering the discipline? Archaeology, museums and social media. AP: Online Journal in Public Archaeology, S1, 77–102.
Waterman, S. (2014). Discourse and domination: Michel Foucault and the problem of ideology. In I. Bapty & T. Yates (Eds.), Archaeology after structuralism: Post structuralism and the practice of archaeology (pp. 79–103). Routledge.
Watson, S. (2019). Whither archaeologists? Continuing challenges to field practice. Antiquity, 93(372), 1643–1652.
Wetherell, M., & Potter, J. (1988). Discourse analysis and the identification of interpretative repertoires. In C. Antaki (Ed.), Analysing everyday explanation: A casebook of methods (pp. 168–183). Sage.
White, H. (1975). Metahistory. Johns Hopkins University Press.
White, H. (1987). The content of form: Narrative discourse and historical representation. Johns Hopkins University Press.
Williams, H., Pudney, C., & Ezzeldin, A. (2019). Public archaeology: Arts of engagement. Archaeopress.
Wylie, A. (1985). Between philosophy and archaeology. American Antiquity, 50(2), 478–490. http://www.jstor.org/stable/280505
Wylie, A. (2007). Doing archaeology as a feminist: Introduction. Journal of Archaeological Method and Theory, 14(3), 209–216.
Wylie, A. (2017). How archaeological evidence bites back: Strategies for putting old data to work in new ways. Science, Technology & Human Values, 42(2), 203–225.
Zorzin, N. (2015). Dystopian archaeologies: The implementation of the logic of capital in heritage management. International Journal of Historical Archaeology, 19(4), 791–809.
Chapter 7
Dealing with Vagueness in Archaeological Discourses

Cesar Gonzalez-Perez, Martín Pereira-Fariña, Patricia Martín-Rodilla, and Leticia Tobalina-Pulido

C. Gonzalez-Perez (✉) · L. Tobalina-Pulido
Incipit CSIC, Santiago de Compostela, Spain
e-mail: cesar.gonzalez-perez@incipit.csic.es; leticia.tobalina-pulido@incipit.csic.es

M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: martin.pereira@usc.es

P. Martín-Rodilla
Department of Computer Science and Information Technologies, University of A Coruña, A Coruña, Spain
e-mail: patricia.martin.rodilla@udc.es

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_7

Abstract Vagueness is an intriguing topic, especially in the humanities. It has been treated as a problem that contaminates information and makes research harder, but also as an expression of human subjectivity that enriches our accounts of the world. Vagueness is studied by philosophers, treated by computer scientists, and used by archaeologists, intentionally or unintentionally. This chapter aims to provide a comprehensive overview of how vagueness has been treated in philosophy and computer science, and to offer a synthetic theoretical framework to operationalise vagueness in archaeological discourses that can be applied for practical purposes. To illustrate this, an empirical study is described.

Vagueness is everywhere, and archaeology is no exception. From quantitative measurements or datings to uncertain assessments of function or use, archaeologists deal with imprecise, inaccurate and uncertain information all the time.

In addition, vagueness is strongly embedded in language. Human language contains a number of mechanisms to express vagueness, such as hedges (“approximately”, “might”) or ranges (“between 12 and 15”). This means that any archaeological discourse is likely to employ devices like these to describe relevant information.

Vagueness is also a computational challenge. Common representations of knowledge that are stored on computer systems discard vagueness, thus losing nuance and richness. Computer scientists have tried to incorporate these aspects into data through approaches such as fuzzy logic or protoforms, which provide richer accounts of what is being represented, but at the cost of higher complexity.

This chapter begins with a philosophical introduction to vagueness, and then describes how vagueness has been treated in computer science. A vagueness framework is then proposed based on the preceding discussion, and an empirical study is described to illustrate practical applications.

Keywords Vagueness · Archaeological discourse · Imprecision · Inaccuracy · Uncertainty · Error

7.1 Philosophical Groundings of Vagueness

If you remove grains of rice from a pile one by one, at what point will the pile cease to be a pile? Where does Mount Everest start? How many hairs must be removed from a head for it to become bald? What is the height of a tall person? How many parts can be replaced in a classic car before it is no longer original? Questions of this sort have been a matter of philosophical reflection since Ancient Greece, when Eubulides of Miletus wondered what constitutes a “stone heap” (van Deemter, 2010). This is the so-called Sorites paradox, which has been formulated in a variety of ways. A typical formulation is:

1 stone does not make a heap.
If 1 stone does not make a heap, then 2 stones do not either.
If 2 stones do not make a heap, then 3 stones do not either.
...
Therefore, no finite number of stones will make a heap.

This paradox shows the key point of vagueness, a phenomenon that is usually defined as the existence of borderline cases (van Deemter, 2010; Hyde, 2008; Keefe, 2000; Williamson, 1996). A borderline case falls in a sort of grey area where there is no clear cut between what is the case and what is not, as with “heap” above. Vagueness is a very common phenomenon in natural language, and it should not be confused with other linguistic phenomena such as ambiguity, meaninglessness or uncertainty.
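The step-by-step structure of the Sorites argument can be made explicit with a short sketch. The Python below is only an illustration under stated assumptions: the function name `sorites_chain` and the choice of 10,000 stones are hypothetical and do not come from the chapter. It applies the two premises of the paradox mechanically and shows that this reasoning never reaches a point at which a heap appears.

```python
def sorites_chain(max_stones: int) -> dict[int, bool]:
    """Return, for each count from 1 to max_stones, whether it counts as a heap.

    Hypothetical helper: it encodes only the two premises of the paradox.
    """
    is_heap = {1: False}  # Premise 1: 1 stone does not make a heap.
    for n in range(1, max_stones):
        # Premise 2 (tolerance): if n stones do not make a heap,
        # then n + 1 stones do not make a heap either.
        if not is_heap[n]:
            is_heap[n + 1] = False
    return is_heap


verdicts = sorites_chain(10_000)
# Every count is classified as "not a heap", including 10,000 stones,
# which clashes with the everyday use of the word.
print(any(verdicts.values()))  # False
```

Each individual step looks harmless, yet the chain as a whole yields a conclusion that few speakers would accept. Blocking that chain in one way or another is what theories of vagueness attempt to do.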
Next, we present an overview of the concept of vagueness and the key points of four philosophical approaches to it.

7.1.1 Philosophical Approaches to Vagueness

Vagueness is an ambiguous concept, and is itself vague. In a technical sense (Hyde, 2008), it should not be confused with the concepts of ambiguity, meaninglessness or uncertainty. Thus, a statement such as “I saw you in the bank” is not vague but ambiguous, because it has well-defined meanings (I could have seen you either in a bank office or sitting on the bank of the river having a picnic) but it is unclear which one the speaker is referring to. Regarding meaninglessness, we say that an expression such as “boomrap” is meaningless because its reference is unknown, and it does not entail the existence of borderline cases. Lastly, a sentence such as “she will arrive on time or 10 minutes late” shows that we do not know exactly when a future event is going to happen, but we still have a clear criterion to determine whether the statement is true or false.

Vagueness is complex as well. Different kinds of vagueness can be recognised in natural language (Hyde, 2008), such as degree-vagueness and combinatory-vagueness, or vagueness of concept application or individuation, but all of them show a certain degree of resemblance. On the one hand, we can talk about absolute borderline cases, which refer to those predicates for which, no matter how much conceptual analysis or empirical investigation we do, we will never be able to determine whether the predicate can be applied or not (e.g., “bald”, “heap”). On the other hand, we can talk about compositional borderline cases, which derive from a combination of a variety of (possibly) crisp features or conditions that generates a grey area of borderline cases. Take, for instance, the concept of country: although we can provide a more or less well-defined list of features that a country must satisfy, there are certain territories whose status is debatable, despite satisfying each criterion independently. Some well-known examples are Hong Kong, Kosovo, or Palestine. For the sake of simplicity, we focus only on absolute borderline cases, such as predicates like “tall”, “heap” or “bald”; hedges like “approximately” or “more or less”; and quantifiers like “many” or “most”.
This is precisely the starting point of the contemporary studies of vagueness. Russell, in his seminal paper Vagueness (Russell, 1923), laid the groundwork for these studies by arguing that vagueness is a purely linguistic phenomenon, a semantic one. The main function of natural language is the representation of the world but, as a representational system, it is imperfect, and vagueness is some kind of defect in it (Hyde, 2008). Well-defined formal languages, such as classical logic, are free of this problem; therefore, according to this view, the only right way to address vagueness is to eliminate it. This approach is also compatible with Russell’s logical atomism project, which rejected the existence of vague entities and held that neither classical logic nor semantics is suitable for the analysis of natural language.

Russell’s philosophical project lost its prevalence when a new conception of natural language emerged in the 1950s: ordinary language is useful for much more than just representing or talking about the world; actually, when we are saying something, we are doing something (Austin, 1989; Wittgenstein, 1989). Under this new framework, new views of vagueness arose, the epistemic and the pragmatic being the most relevant ones.

The epistemic view (Williamson, 1996) argues that vagueness is not a semantic problem but an epistemic one. Its central point is to reject the existence of borderline cases: predicates are always sharp, but we do not know where their boundaries lie. Therefore, linguistic vagueness is derived from a certain type of ignorance, namely lack of information. Like the semantic view, the epistemic view thus assumes that reality is crisp, i.e. there are no vague objects (Evans, 1978).

The pragmatic view (van Deemter, 2010) assumes the existence of borderline cases, but denies that this constitutes a semantic or epistemic problem. Vagueness arises as a feature (not a defect) of the relation between the users of language and language itself. Therefore, vagueness cannot be studied independently of its communicative function. Moreover, vagueness is not just present in language, but everywhere. It appears in many other realms, such as in the classification of species (van Deemter, 2010, Chap. 2) or when we try to define what obesity means (Chap. 3). Thus, the project of the pragmatic view is not to eliminate vagueness but to understand how we handle it when it appears and how we adopt specific conventions to interpret it according to context (Keefe, 2000). With respect to the existence of vague entities in the world, the pragmatic approach presupposes the existence of vague properties, objects and even identities (Akiba, 2014).

7.1.2 Theories of Vagueness

As we said, Russell’s goal was to eliminate vagueness from natural language because it was a defect. However, today we know that this is impossible, and more recent approaches aim to deal with vagueness as a feature of language. The three most relevant theories for this are supervaluationism, subvaluationism, and gradability.

Supervaluationism is based on bivalent classical semantics, that is, having true and false as the only proper truth values. It assumes that the truth value of a vague statement can be indeterminate, i.e., vagueness generates truth value gaps (Keefe, 2000). This leads to a non-classical semantics since, for instance, the truth value of a sentence does not depend only on the truth values of its constituents (which resolves the Sorites paradox by rejecting the conjunction of its premises), but one that remains classical for each individual specification.
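The idea of truth value gaps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' formalisation: the predicate “tall”, the three cut-off heights and all names (`Verdict`, `supervaluate`, `is_tall`) are invented here. A statement is evaluated classically under every admissible way of making the predicate precise (often called a precisification in the literature); it counts as true only if it is true under all of them, false only if it is false under all of them, and otherwise falls into a gap.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    GAP = "indeterminate"


# Assumption: three admissible cut-offs (in cm) for "tall"; the values are invented.
ADMISSIBLE_CUTOFFS_CM = (175.0, 180.0, 185.0)


def supervaluate(statement: Callable[[float], bool],
                 cutoffs: Iterable[float] = ADMISSIBLE_CUTOFFS_CM) -> Verdict:
    """Evaluate a statement classically under each cut-off, then combine the results."""
    results = {statement(cutoff) for cutoff in cutoffs}
    if results == {True}:
        return Verdict.TRUE
    if results == {False}:
        return Verdict.FALSE
    return Verdict.GAP


def is_tall(height_cm: float, cutoff: float) -> bool:
    """Classical, two-valued reading of "tall" once a cut-off has been fixed."""
    return height_cm >= cutoff


height = 180.0  # a borderline case under the assumed cut-offs
print(supervaluate(lambda c: is_tall(height, c)).value)      # indeterminate
print(supervaluate(lambda c: not is_tall(height, c)).value)  # indeterminate
print(supervaluate(lambda c: is_tall(height, c) or not is_tall(height, c)).value)  # true
```

The last line shows why the semantics is not truth-functional: “tall or not tall” comes out true although neither disjunct has a determinate truth value, while each individual cut-off is still evaluated with ordinary two-valued logic.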
Therefore, supervaluationism is a sort of nonclassical metalanguage compatible with a classical logic study of language. Subvaluationism (Hyde, 2008) is the counterpart of supervaluationism and a form of paraconsistent logic (a conjunction of contradictory statements can be collectively true while not being all individually true). Instead of admitting truth value gaps, this theory admits truth value gluts, i.e., predicates or statements expressing borderline cases can be true and false simultaneously. As opposed to supervaluationism, subvaluationism rejects classical semantics and the law of the excluded middle; therefore, it is not compatible with classical logic. To avoid this criticism, subvaluationism argues that accepting the overlapping of truth values in a sentence does not entail accepting it for every sentence, but still, it has detractors (Keefe, 2000). Supervaluationism and subvaluationism are theories of vagueness that reject the epistemic view of vagueness, as they consider vagueness to be not a matter of ignorance but a semantic phenomenon. However, they agree with the epistemic view
7 Dealing with Vagueness in Archaeological Discourses 141

regarding the non-existence of vagueness of the world (i.e. there are no vague objects), while holding that there is vagueness in the world as a result of how our representational tools and skills work. Gradability, the third main theory of vagueness, defends the existence of vague properties and vague objects. Vague predicates, such as “tall” or “small”, are vague because they allow us to talk about properties that are essentially vague. Vague nouns, such as “mountain”, or determiners, such as “many”, denote vague objects or situations because there is no way to define perfect boundaries for them. However, the fact that a predicate is vague does not mean that it can only refer to borderline cases; on the contrary, crisp cases can exist as well. For instance, if I am at the top of a mountain, I am clearly on the mountain; if I am in the valley next to the mountain and I can see the mountain, I am clearly not on the mountain. If I start to walk towards the mountain, I will be on the mountain at some point without being aware of when that change happened; in other words, there is no dichotomy between being on and off the mountain, but a continuum (van Deemter, 2010). Thus, gradability is a matter of “to what extent?” and assumes that truth values constitute a continuum. This is the foundation of many-valued logics, such as fuzzy logic (discussed in further sections of this chapter), which define a functional semantics that rejects the law of the excluded middle but allows us to define the meaning of expressions such as “almost true”. The pragmatic view is the main supporter of the gradability approach. The assignment of a truth value to a specific function is a matter of context and social convention, in other words, a matter of how the predicate is being used by people. This limits its logical study in classical terms and requires a more empirical study based on speakers and language use in different contexts.
Defining a representative sample for studies like this is challenging. In spite of that, current studies suggest that contextual circumstances and background information make users choose the crisp definition that is most suitable for each communicative situation, although this can change if the circumstances change (van Deemter, 2010). An open question affecting all of these theories is higher-order vagueness (Hyde, 2008): Is it possible to define a clear-cut boundary for the border of borderline cases? Of course, this question can be recursively formulated ad infinitum. This generates additional questions, such as how many truth values are necessary to handle vague predicates, but these kinds of questions are beyond the scope of this book. Next, we describe which of these views have been used, and how, in the development of mathematical and computational tools for handling vagueness in practical applications.

7.2 Computational Treatments of Vagueness

The theory of computation has inherited many of the approaches analysed in the previous section for applications in computer systems, which has allowed us to incorporate vagueness into algorithms and decision-making systems. Note that the
works in this area belong to one (or several) specific disciplines within computer science and computation theory. For this reason, this section does not aim to review all of them exhaustively, but to give the reader an overview of the most outstanding computational studies and applications of vagueness, highlighting the approaches applied to archaeological discourses. Firstly, we can find strongly mathematical treatments of vagueness. Most of them are based on a margin-of-error paradigm, such as the Interval Predictor Model (Lacerda & Crespo, 2017), that is, models that estimate uncertainty regions of the information contained. Statistical approaches are also common in mathematical-computational modelling of vagueness, and generally associate probability functions with especially vague attributes (features of each data type) of the information that we are modelling. The probability functions (Fermüller et al., 2017) can be indicators of the precision (used in inferential statistics) or the certainty degree of the attribute values (that is, measures of error for a given value). These solutions explicitly model different aspects or dimensions of vagueness and have been widely applied for years in archaeological contexts, such as in studies with GIS data (Lieskovský et al., 2013; Runz et al., 2007) or 3D reconstructions (Nicolucci & Hermon, 2010; de Runz et al., 2013), or structured data and classification mechanisms (Hermon & Niccolucci, 2002). However, these models treat the vagueness of the information as a function of margin of error, a semantic paradigm which assumes that some dimension of the vagueness is always due to the deviation of our knowledge in relation to a “true” reference value and tries to compute that deviation (Martin-Rodilla et al., 2019b).
This approach is hardly generalizable when the data sources are unstructured, expressed in natural language and from humanistic or social domains (Martin-Rodilla et al., 2019b; PROVIDEH, 2018). Secondly, there are mathematical accounts of vagueness grounded on philosophical ideas, such as Black’s (Black, 1937). These works laid the foundations for the modelling of the complex phenomenon of vagueness at a computational level. Years later and based on Black’s works, Zadeh’s fuzzy logic-based theory (Zadeh, 1996) constituted a milestone at the computational level, allowing for the first time the computational representation and treatment of vagueness. It is a less error-focused approach (Zadeh, 1996, 2010), which develops specific techniques (for example, fuzzy sets and degrees of probability, rule bases, linguistic summaries such as fuzzy description of variables or fuzzy quantifiers, and similarity measures) to model vague aspects of information. Since then, fuzzy logic has been widely used as a method of representation of vagueness in computer systems, with implementations of different kinds (de Silva, 1995; Syropoulos, 2016), including some attempts in archaeology (Baxter, 2009; Reeler, 1999; Taheri et al., 2019). We can also find some novel approaches from computational theory itself, closer to classical programming language representations (Coletti, 2020) or developing specific informational ambiguity metrics (Fabbrini et al., 2001; Fantechi et al., 2018). In recent years, some comprehensive approaches have been developed from software engineering, including vagueness aspects within conceptual models of software applications (Abualdenien & Borrmann, 2020; Martin-Rodilla et al.,
2019b), especially after various attempts to identify features for which vague information can be conceptually modelled, such as set membership, interval membership, incompleteness, and related issues (Jing et al., 2008). At a software level, this conceptualization has been translated into implementations in both relational and non-relational databases (Martin-Rodilla et al., 2019a). All these approaches have been successfully applied in heterogeneous domains: bio-medical sciences (He & Smit, 2021), logistics (Ottomanelli & Wong, 2011), e-government and infrastructures, energy resources, etc., as well as in so-called expert or decision-making assistance systems (de Silva, 1995). Their implementation involves in most cases certain adaptations of fuzzy logic depending on the nature of the domain data. However, the intrinsic presence of vagueness in natural language is still a challenge for the formal representation of fuzzy theory in computation, especially when the source we wish to deal with at a computational level is discourse in humanistic and social science domains, characterized by a great presence of vagueness in the produced narratives (Hermon & Niccolucci, 2002; Martin-Rodilla et al., 2019b). In order to address this challenge, many of the current fuzzy-level approaches in computing incorporate interdisciplinary and hybrid approaches that specifically attempt to improve the computational treatment of vagueness in natural language narratives.
We can find, for example, implementations of fuzzy logic with linguistic characteristics, such as HFLTS (Ashtiani & Azgomi, 2016), Fuzzy Natural Logic (Novak, 2017) or hybrid approaches for incorporating linguistic aspects into new protoforms (abstracted models whose instances represent knowledge about data (Zadeh, 2002)) in fuzzy logic (Ramos-Soto & Martin-Rodilla, 2021), and specific studies of the performance of these models in applications that use natural language, such as automated Question and Answering (Q&A) systems (Gupta et al., 2018) or automatic generation of language (Ramos-Soto & Martin-Rodilla, 2021). Also, linguistic-analytical studies are common in this area, determining metrics to quantify aspects of vagueness in specific linguistic categories or expressions (in different languages) such as markers (Malyuga & McCarthy, 2018) or adjectives (Gasmi & Bourahla, 2017; Lassiter & Goodman, 2017), and even analytically selecting the most determining aspect of vagueness given an expression in natural language (that can express or reflect various aspects of vagueness at the same time) (Raskin & Taylor, 2014). Continuing with the hybrid approaches with a linguistic component, interdisciplinary research lines from linguistics apply a corpus-based approach to the problem of the representation of vagueness at a computational level, analysing large volumes of texts in different languages and then automating the detection of patterns and/or specific expressions with a vague component (Lebanoff & Liu, 2018; Rashkin et al., 2017).
This approach is used both for specific languages, such as Romanian, English or German (Dinu et al., 2017; Leto Russo, 2019; Li, 2019; Quammie-Wallen, 2021), and with multi-language parallel corpora such as Russian/English (Malyuga & McCarthy, 2018) or German/Spanish/Mandarin (Cutting, 2019), among others, and is applied using different corpus sources and goals, such as medical corpora, analysis of fake news (Rashkin et al., 2017), political discourses (Leto Russo, 2019; Rashkin et al., 2017) or historical corpora (Dinu et
al., 2017; Toledo, 2017). Finally, there are computational implementations arising from formal theories in discourse, such as Rhetorical Structure Theory (Mann & Thompson, 1987) and/or formalization of argumentation, such as Inference Anchoring Theory (Janier et al., 2016). However, their specifications do not deal with aspects of vagueness explicitly, leaving the categorization of vague expressions for the modeller to tackle by hand. Also, some of the automation attempts in RST, for example (Joty et al., 2015), apply automatic classification models for the discourse analysis, but their implementation does not explicitly include vagueness support. In summary, the computational treatment of vagueness has been, and continues to be, deeply influenced by mathematical models based on margin-of-error paradigms or by fuzzy logic models incorporating recently updated and hybridized variants. In the case of margin-of-error models, their semantics makes it difficult to apply them to archaeological discourses and their vagueness, due to the subjective, narrative-based and vague nature of these discourses. In the case of the fuzzy logic-based models, the new approaches to hybrid intelligence and hybrid corpus methodologies with machine learning algorithms show some promising results (Gupta et al., 2018), although their application in humanities domains (and in particular with archaeological discourses as sources) is still novel, marginal and not standardized between projects. Further progress in this area would be possible with formal theories that allow, to a certain extent, the standardization of the computational treatment of vagueness in archaeological discourses, with enough flexibility of application between projects to include the different dimensions of vagueness in archaeological discourse analysis, together with algorithms and multilingual techniques for treating them.
The following sections develop this need and propose an approach for the specific conceptualization of vagueness based on the previous philosophical foundations, and especially oriented to the application in archaeology through the computational techniques described.

7.3 Concept of Vagueness as a Conceptual Modelling Issue

After having described how vagueness has been tackled from philosophy and computation, we propose now a conceptualisation of vagueness that is grounded on conceptual modelling (Gonzalez-Perez, 2018; Olivé, 2007). Conceptual modelling is a discipline that aims to provide theories and techniques to represent the world in a manner that is useful for humans to build information systems. It is closely related to ontologies and ontology engineering, but the latter emphasises machine processing whereas conceptual modelling has traditionally emphasised human communication (Gonzalez-Perez, 2017). Conceptual models are often expressed in terms of types and tokens (Wetzel, 2018). Types correspond to what categories exist in the world
(often called classes), what properties these have (attributes), and how they relate to one another (via associations), whereas tokens correspond to what class instances exist (objects), what properties they exhibit (values), and how they connect to each other (links). In this context, we see vagueness as a property related to the imperfection, lack of clarity, lack of detail, doubt, and unreliability of object existence, values or links. Vagueness cannot be defined comprehensively, as it encompasses phenomena of very different nature. It is not a classical category having a criteria-based definition, but rather a Lakoffian radial category (Lakoff, 1990) comprising phenomena having a rough family resemblance.

7.3.1 Sources of Vagueness

In previous works, we have argued that vagueness originates mostly from two kinds of sources: ontological and epistemic (Gonzalez-Perez, 2018; Tobalina-Pulido & Gonzalez-Perez, 2020).

Ontological Vagueness comes from the fact that some things in the world do not have clear-cut or perfectly defined boundaries but rather exhibit gradual change. A typical example, often used in philosophy and presented in a previous section of this chapter, is that of the limits of a hill. If we are sitting at the top of a hill, we can certainly state that we are on the hill. Similarly, if we are sitting at the bottom of the valley, we can safely state that we are not on the hill. However, if we start walking from the top of the hill downward towards the valley, there is no clear boundary at which we switch from being on the hill to not being on it. Rather, it is a gradual change, and therefore we say that the limits of the hill are ontologically vague. Future or imaginary events are also a common source of ontological vagueness.
For example, the precise duration of a task planned for the future has not been established yet (since the task has not yet been carried out), and therefore any statement about it is ontologically vague.

Epistemic Vagueness comes from the fact that our knowledge about the world is usually imperfect and incomplete. For example, we may not be sure about how many children Alexander the Great had. Certainly, he either had none or one or two, etc. But since we are not sure, any statement that we make about it will necessarily be epistemically vague. Also, epistemic vagueness is sometimes purposefully injected into a statement by blurring or hedging what we say for semantic purposes. For example, we may say that there were between 15 and 20 houses in a village that we visited last year, as we cannot recall the specific number. Using an interval instead of committing to a particular number indicates our lack of perfect knowledge for the sake of being correct.
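As an illustration only, the two sources of vagueness can be sketched in code. The following Python fragment is a minimal sketch under our own assumptions and is not part of the proposal itself: `hill_membership`, its linear ramp and the 100 m and 500 m thresholds are hypothetical choices for the hill example, and `IntervalValue` is a hypothetical name for the kind of interval used in the village example.

```python
from dataclasses import dataclass


def hill_membership(distance_from_summit_m, core_m=100.0, foot_m=500.0):
    """Degree (0.0 to 1.0) to which a point counts as being 'on the hill'.

    Assumption: membership is 1 within core_m of the summit, 0 beyond foot_m,
    and decreases linearly in between. Both thresholds are illustrative only.
    """
    if distance_from_summit_m <= core_m:
        return 1.0
    if distance_from_summit_m >= foot_m:
        return 0.0
    return (foot_m - distance_from_summit_m) / (foot_m - core_m)


@dataclass(frozen=True)
class IntervalValue:
    """An epistemically vague count, stated as an inclusive interval."""
    low: int
    high: int

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    def covers(self, value):
        """True if the stated interval is compatible with the given value."""
        return self.low <= value <= self.high

    @property
    def width(self):
        """Wider intervals commit to less detail."""
        return self.high - self.low


# "There were between 15 and 20 houses in the village."
houses = IntervalValue(15, 20)
```

The ramp in this sketch still has crisp end points, so it does not escape the question of higher-order vagueness raised earlier in the chapter; it only shows how gradual change and an interval-valued statement could each be represented.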
7.3.2 Vagueness Variables

In order to study vagueness, we must operationalise it in the form of different variables that can be described and measured. We propose the following.

7.3.2.1 Imprecision

Imprecision is the absence of detail in a statement in relation to what is being represented. In other words, a statement is very precise if it contains a lot of detail. For example, if we say that “Mount Everest is 8800 m high”, we are being quite imprecise, as Everest is in fact 8848.86 m high according to the latest measurements. Expressions such as “more or less”, “roughly” or “approximately” usually indicate imprecision. However, imprecision can be present even in the absence of any lexical marker, as in the previous example. It is common that, especially in colloquial language, detail is removed for simplicity and conciseness. Imprecision is a property of statements, regardless of what they aim to represent. Imprecision may originate from:

• The inherent ontological vagueness of the entity being described. For example, a statement such as “the Himalayas is 700 km long” is imprecise due to the ontological vagueness of the length of any mountain range. No matter how good our instruments and technologies are, we cannot obtain a significant measurement that is much more precise.

• The limitation of our instruments and technologies. For example, a common field thermometer can tell us that the temperature is 19 °C, or perhaps 19.3 °C. These measurements are likely to be imprecise in the sense that expressing the actual temperature may require more significant figures. This kind of limitation is often called the systematic error of the instrument or technology, and should not be confused with the measurement error, as described in further sections.

• The intentional removal of detail. For example, if we are unsure about how many children Alexander the Great had, we can say that “he had between 4 and 6”.
This lacks detail but, as described in further sections, is a more reliable expression than a more detailed one.

7.3.2.2 Inaccuracy

Inaccuracy is the difference between the content of a statement and the entity being represented by it. In other words, a statement is very accurate if it describes this entity very faithfully. For example, if we state that “there are 21 post holes in this area” when in fact there are 21 post holes, we are being fully accurate. However, if in reality there are 16 post holes, then we are being quite inaccurate. Inaccuracy is a property of the relationship between a statement and that which it aims to represent. For this reason, inaccuracy can be difficult to detect by looking
at the statements only. We need to compare what is being said to what is being described in order to ascertain inaccuracy. Inaccuracy usually originates from epistemic vagueness. In other words, it is our lack of knowledge that makes our statements inaccurate.

7.3.2.3 Uncertainty

Uncertainty is the degree of doubt that we possess about a statement. In other words, a very certain statement is one about which we feel highly confident. For example, in a statement such as “I think this should be a burial site”, the lexical markers “I think” and “should” clearly point to uncertainty and indicate that the speaker harbours some doubt about its content. Markers such as “perhaps”, “maybe” or “I think” usually indicate uncertainty. Uncertainty is a property of statements, regardless of what they aim to represent. Uncertainty usually originates from epistemic vagueness, that is, our imperfect knowledge about the world.

7.3.2.4 Error

Error is the difference between the contents of two or more statements that aim to represent the same thing. In other words, an error-free statement is one that coincides with other statements representing the same thing. For example, if we measure the length of a tomb twice and obtain 14.2 m and 13.9 m as results, the error is given by the fact that the two measurements do not coincide. The larger the difference, the larger the error. Note that error only makes sense when the two (or more) statements represent the same thing. However, determining whether two statements represent the same thing or not can be tricky. For example, imagine two specialists producing independent reports on the conservation status of a monument, the first one concluding that the monument does not need restoration and the second one concluding that it does.
This can only be considered erroneous if the specialists used the same techniques and approaches to assess the state of the monument and share the same goals. As soon as significant subjective issues are introduced, it can be argued that the two statements do not represent the same thing and, therefore, no error exists. For example, if two neighbours provide their respective opinions on the worth and value of a derelict monument risking demolition, we should not appeal to error when they disagree, as their opinions cannot be considered representations of the monument itself but of the particular values and preferences of each neighbour. Error, as defined by this proposal, includes measurement error, which is a property of how instruments and techniques are used for a specific task. In our previous example about measuring a tomb twice, perhaps the person making the measurements did not secure the measurement tape correctly and it slipped, producing an error. However, error in this proposal excludes systematic error, which
is a property of the instruments and technologies that we use rather than our use of them, and which we consider to be part of imprecision as described in a previous section.

7.3.3 Relationships Between Vagueness Variables

Several relationships exist between vagueness variables.

7.3.3.1 Imprecision Decreases Uncertainty

In general, the more imprecise a statement is, the less uncertain it becomes. In other words, removing detail from a statement tends to make it more certain, because the less detail it contains, the more cases it covers, the weaker the commitment it entails, and the better the chances that it is correct. For example, the statement “Alexander the Great had 5 children” is very precise, as it conveys a specific figure. However, the only state of affairs that would make it true is if Alexander the Great had indeed 5 children, so we are reasonably uncertain about it. Changing the statement to “Alexander the Great had 4 or 5 children” removes some detail and makes the statement more imprecise, but also makes it more certain as we are now covering more options than before. Changing it further to “Alexander the Great had between 0 and 50 children” makes it extremely certain but also extremely imprecise. Similarly, the statement “the site was abandoned in 2257 BCE” is quite precise but probably very uncertain, as it is unlikely that we have reliable information about an event that happened over 4000 years ago. Hedging the statement as “the site was abandoned around 2200 BCE” removes detail but gains in certainty. And adding a margin, as in “the site was abandoned in 2200 BCE ± 300”, makes it much more certain, albeit less precise. Precision makes a statement informative, whereas certainty makes it reliable. A statement such as “the site was abandoned in 2257 BCE” is very informative, but it can be unreliable if we are not certain of what it says.
At the opposite end of the spectrum, a statement such as “Alexander the Great had between 0 and 50 children” provides nearly no information, but is highly reliable as it is almost certainly true. Often, we must find a useful balance between imprecision and uncertainty, for example by purposefully removing detail to gain certainty, which is a common technique in, for example, archaeological dating.

7.3.3.2 Imprecision Increases Inaccuracy

In general, the more imprecise a statement is, the more inaccurate it becomes. In other words, removing detail from a statement tends to make it less faithful to what
it aims to describe, as the absence of detail moves the stated content towards nice round figures which, under the usual interpretation, are less likely to be accurate. For example, the statement “the main tower is 22.7 m tall” is quite precise, and probably quite accurate if the measurement was properly done. Instead, “the main tower is 22 m tall” is more imprecise and, consequently, more inaccurate, as the removal of detail is moving the conveyed information away from the actual height of the tower. And “the main tower is roughly 20 m tall” is very imprecise and probably very inaccurate, as the tower is unlikely to be exactly 20 m tall. Consider, however, that a very precise statement can be extremely inaccurate. In other words, a high amount of detail does not entail a faithful representation. For example, imagine an archaeological site having a protection perimeter of 2.15 km². A statement such as “the protection perimeter is 4.45 km²” provides a quite detailed description but is wildly inaccurate. However, a statement such as “the protection perimeter is roughly 2 km²” is more imprecise but much more accurate. In other words, precision is useless if we are not accurate.

7.3.3.3 Error Increases Uncertainty

In general, the more error we have, the more uncertain we are. In other words, detecting frequent and large discrepancies between descriptions tends to make us less sure about what we say, as it is difficult to choose between them. For example, imagine that a set of pottery fragments is dated using two different techniques. If one yields an estimated date of 3200 BCE and the other estimates it as 3100 BCE, we can be quite confident about the pottery’s age. However, if the dating techniques yield estimates of 3200 BCE and 4500 BCE, we cannot be sure of which number is better.

7.4 Empirical Study

In order to test the proposed vagueness framework, an empirical study was carried out.
The aim was to determine and characterise how readers perceive vagueness variables in terms of their textual expression, under the hypothesis that vagueness perception obtained empirically from specialists should correlate with vagueness variables as measured in the texts. Two well-known archaeological elements were selected: the Roman Villa of Liédena (Navarra, Spain), henceforth referred to as “Villa”, and the Visigoth bone brooch of Santa María de Hito (Cantabria, Spain), referred to as “Brooch”. For each of these, four text fragments in Spanish between 200 and 1000 words each were extracted from various publications and labelled A to H, as shown in Table 7.1. The experiment was organised as follows. Imprecision and uncertainty were measured for each text fragment by counting lexical markers, as described below. Then, different archaeologists were asked to read the texts and score each in relation
to these variables through an online survey. Finally, the assigned scores were compared against the measured values for each text, looking for correlations.

Table 7.1 Selected text fragments for the empirical study

| Element | Text | Reference |
|---|---|---|
| Villa | A | Altadill, J. 1921. “Los mosaicos romanos de Liédena”, Boletín de la Comisión de Monumentos Históricos y Artísticos de Navarra, 62–63. |
| Villa | B | Mezquíriz Irujo, M. A. 2009. “Las villae tardorromanas del Valle del Ebro”, Trabajos de Arqueología Navarra, 21, 222–223. |
| Villa | C | Taracena Aguirre, B. 1950. “Excavaciones en Navarra, La villa romana de Liédena”, Príncipe de Viana, 38–39, 14–15. |
| Villa | D | Vizcaíno León, D. et al. 2013. “La reconstrucción virtual del patrimonio arqueológico al servicio de la divulgación y puesta en valor de la Villa Romana de Liédena (Navarra, España)”, VAR, 4, 104–108. |
| Brooch | E | Gimeno García-Lomas, R. 1978. “Hallazgo de un broche alto medieval trabajado en hueso”, Boletín del Seminario de Estudios de Arte y Arqueología, 44, 430–432. |
| Brooch | F | García Guinea, M. A. 2006. “Broche de cinturón (Necrópolis de Santa María de Hito), Apocalipsis: el ciclo histórico de Beato de Liébana: catálogo de la exposición”, Santillana del Mar. |
| Brooch | G | Europapress, 2015. “Un broche de hueso encontrado en la necrópolis medieval de Santa María de Hito, Pieza del Mes de la UC”. https://www.europapress.es/cantabria/cultura-deporte-00760/noticia-broche-hueso-encontrado-necropolis-medieval-santa-maria-hito-pieza-mes-uc-20150306171046.html |
| Brooch | H | Gutiérrez Cuenca, E. & Hierro Gárate, J. A. 2018. “Broche de cinturón de Santa María de Hito”, La pieza del mes 2014–2016, Museo de Prehistoria y Arqueología de Cantabria, 24–25. |

The actual text contents are not shown for brevity.

7.4.1 Measuring Imprecision and Uncertainty

Imprecision and uncertainty were measured by counting the number of relevant lexical markers in each text, and then dividing this by the text word count and multiplying by 1000 to obtain an appropriately scaled index.
For imprecision, markers included qualifiers such as “some”, “quite”, “approximately” or “much”; approximate dating expressions such as “Constantine coin” or “Later Roman Empire” or those spanning a range, such as “first to third centuries”. Table 7.2 shows the results. In the case of uncertainty, markers included subjectivity expressions such as “I believe that” or “I think”; hedges such as “maybe” or “it seems likely”, and question marks or similar signs indicating doubt associated to data, such as in “1.5 m?”. Table 7.3 shows the results.
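The index described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the procedure actually followed in the study: the marker list is a short hypothetical English sample (the analysed texts are in Spanish), whole-word matching is our own simplification, and the names `count_markers`, `count_words` and `vagueness_index` are invented for illustration.

```python
import re

# Hypothetical English sample of imprecision markers, for illustration only.
# The study analysed Spanish texts and its full marker inventory is not listed.
IMPRECISION_MARKERS = ["approximately", "roughly", "some", "quite", "much"]


def count_markers(text, markers):
    """Count whole-word, case-insensitive occurrences of the given markers."""
    total = 0
    for marker in markers:
        pattern = r"\b" + re.escape(marker) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def count_words(text):
    """Count words as maximal runs of word characters (a simplification)."""
    return len(re.findall(r"\w+", text))


def vagueness_index(marker_count, word_count):
    """Markers per 1000 words: 1000 * marker_count / word_count."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000 * marker_count / word_count


# Text A of the Villa: 27 imprecision markers in 769 words.
index_a = round(vagueness_index(27, 769), 2)

# Naive automatic counting on an invented sentence.
sample = "The wall is roughly 2 m long and quite thick"
sample_index = vagueness_index(
    count_markers(sample, IMPRECISION_MARKERS), count_words(sample)
)
```

Normalising by word count makes fragments of very different lengths comparable, which matters here because the selected texts range from about 200 to over 1000 words.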
Table 7.2 Imprecision measurements

| Element | Text | Marker count | Word count | Imprecision |
|---|---|---|---|---|
| Villa | A | 27 | 769 | 35.11 |
| Villa | B | 29 | 654 | 44.34 |
| Villa | C | 42 | 1056 | 39.77 |
| Villa | D | 13 | 190 | 68.42 |
| Brooch | E | 12 | 562 | 21.35 |
| Brooch | F | 12 | 215 | 55.81 |
| Brooch | G | 8 | 202 | 39.60 |
| Brooch | H | 17 | 528 | 32.20 |

Imprecision is calculated as 1000 × MarkerCount/WordCount

Table 7.3 Uncertainty measurements

| Element | Text | Marker count | Word count | Uncertainty |
|---|---|---|---|---|
| Villa | A | 10 | 769 | 13.00 |
| Villa | B | 11 | 654 | 16.82 |
| Villa | C | 12 | 1056 | 11.36 |
| Villa | D | 2 | 190 | 10.53 |
| Brooch | E | 5 | 562 | 8.90 |
| Brooch | F | 6 | 215 | 27.91 |
| Brooch | G | 3 | 202 | 14.85 |
| Brooch | H | 7 | 528 | 13.26 |

Uncertainty is calculated as 1000 × MarkerCount/WordCount

7.4.2 Survey and Assigned Scores

Two online surveys were created, one for each archaeological element, using the Google Forms platform. Both surveys had the same structure of four parts. The first part gathered demographic data of respondents, in terms of work experience in archaeology and their degree of experience in the subject of the texts (either Roman villae or Visigoth brooches). The second part gathered information on how familiar each respondent was with each archaeological element, in terms of having visited, read about, or worked on it. The third part gathered information on perceived precision and certainty for each text, asking the respondent to assess the precision and certainty of each text fragment using a 5-point Likert scale quantified on a scale of 0–10. In addition, a third variable was added as an additional check: respondents were asked to assess the absence of relevant details and information from each text. Finally, the fourth part of the survey allowed respondents to make comments on the survey, make suggestions for improvement, and leave their email address to be informed of the results. It must be highlighted that the surveys were phrased in terms of precision and certainty, whereas vagueness measurements were made in terms of imprecision and uncertainty.
Although the vagueness framework presented in this chapter conceptualises variables in “negative” form, a survey phrased in “positive” form was considered easier to understand for specialists not familiar with this vagueness theory. The first survey, corresponding to the Villa, was answered by 26 archaeologists. The second survey, corresponding to the Brooch, was answered by 15 archaeologists. Table 7.4 shows the survey results.

7.4.3 Discussion of Results

The major hypothesis of the study was that correlations should be observed between vagueness measurements (as shown in Tables 7.2 and 7.3) and vagueness reported by specialists (as shown in Table 7.4). In particular, measured imprecision was expected to negatively correlate with perceived precision, and measured uncertainty was expected to negatively correlate with perceived certainty. In addition to studying these, correlations were also investigated between all remaining pairs of measured and reported variables. Figure 7.1 shows the results for the primary hypothesis. For the Villa element, reported precision negatively correlated with measured imprecision, but very weakly. However, reported certainty negatively correlated with measured uncertainty to a greater extent. For the Brooch element, reported precision negatively correlated with measured imprecision quite strongly, and reported certainty negatively correlated with measured uncertainty much more weakly. This supports the proposed hypothesis. For the Villa element, a weak positive correlation was also found between reported certainty and measured imprecision (R² = 0.2057). For the Brooch element, weak negative correlations were found across variables, that is, between reported precision and measured uncertainty (R² = 0.1734) and between reported certainty and measured imprecision (R² = 0.1908). Overall, the results seem to support the hypothesis. However, the low number of texts employed and the weak correlations do not allow us to draw strong conclusions.
At this stage, we believe that there may exist additional factors, beyond the lexical markers that were considered in this study, that influence how a text is perceived regarding precision and certainty. One candidate factor may be the style of the text. For example, text A scored the lowest for both reported precision and certainty for the Villa element. This is a text dating back to 1921, which employs a language that most people would consider too convoluted and old-fashioned today. This may be contributing to the low reported values for this text. Another candidate factor may be the preconceptions and previous experiences of the specialists assessing the texts. For someone who knows much about an element such as the Villa or the Brooch, a text may look imprecise or uncertain even if other specialists with less experience of this particular element would see it as quite precise or certain. Finally, other linguistic devices in addition to the lexicon may be contributing to the specialists’ reported scores, such as sentence length and complexity or connector use. These observations are made possible by the theoretical framework for vagueness described in this chapter. For example, the fact that imprecision and uncertainty are separately defined and characterised allowed us to study how people perceive them through different sets of markers.

Table 7.4 Average and standard deviation for perceived precision, certainty and absence as reported by specialists through the surveys, on a scale of 0–10

Element              Villa                         Brooch
Text                 A      B      C      D      E      F      G      H
Precision average    4.10   5.30   7.10   4.80   7.33   5.17   3.33   7.17
Precision std. dev   2.22   2.58   2.31   2.23   1.93   2.13   2.69   1.25
Certainty average    3.30   4.90   7.20   6.40   7.00   6.17   4.67   7.50
Certainty std. dev   2.32   2.50   2.16   2.65   2.27   2.01   3.14   1.29
Absence average      7.20   5.50   4.10   5.50   6.50   7.00   8.00   4.17
Absence std. dev     2.68   2.65   2.63   2.65   2.55   2.45   2.08   1.49

Fig. 7.1 Reported vs. measured (im)precision (left) and (un)certainty (right) for the Villa (top) and Brooch (bottom) elements. Each panel plots the reported value (horizontal axis, 0–10) against the measured value (vertical axis). Panels: Villa (im)precision, R² = 0.0274; Villa (un)certainty, R² = 0.2582; Brooch (im)precision, R² = 0.355; Brooch (un)certainty, R² = 0.0692
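The correlations behind the primary hypothesis can be recomputed from the published tables. The sketch below assumes that each R² in Fig. 7.1 is the squared Pearson coefficient between the per-text survey averages (Table 7.4) and the per-text measurements (Tables 7.2 and 7.3), and that texts A–D belong to the Villa and texts E–H to the Brooch. Both points are assumptions made here for illustration, since the chapter does not state how the values were computed. All names in the code are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Texts A..H, in order. Counts from Tables 7.2 and 7.3.
word_counts = [769, 654, 1056, 190, 562, 215, 202, 528]
imprecision_markers = [27, 29, 42, 13, 12, 12, 8, 17]
uncertainty_markers = [10, 11, 12, 2, 5, 6, 3, 7]

# Survey averages from Table 7.4 (scale of 0-10).
reported_precision = [4.10, 5.30, 7.10, 4.80, 7.33, 5.17, 3.33, 7.17]
reported_certainty = [3.30, 4.90, 7.20, 6.40, 7.00, 6.17, 4.67, 7.50]

# Measurements recomputed as 1000 * MarkerCount / WordCount (unrounded).
measured_imprecision = [1000 * m / w for m, w in zip(imprecision_markers, word_counts)]
measured_uncertainty = [1000 * m / w for m, w in zip(uncertainty_markers, word_counts)]

# Assumption: texts A-D describe the Villa, texts E-H the Brooch.
elements = {"Villa": slice(0, 4), "Brooch": slice(4, 8)}

results = {}
for name, texts in elements.items():
    r_precision = pearson_r(reported_precision[texts], measured_imprecision[texts])
    r_certainty = pearson_r(reported_certainty[texts], measured_uncertainty[texts])
    results[name] = {
        "precision": (r_precision, r_precision ** 2),
        "certainty": (r_certainty, r_certainty ** 2),
    }
    print(f"{name}: precision r = {r_precision:+.3f}, R2 = {r_precision ** 2:.4f}; "
          f"certainty r = {r_certainty:+.3f}, R2 = {r_certainty ** 2:.4f}")
```

Under these assumptions, all four coefficients should come out negative and the squared values close to those shown in Fig. 7.1. With only four texts per element, each coefficient rests on four data points, which is consistent with the caution expressed above about drawing strong conclusions.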
7.5 Conclusions

In this chapter we have described how vagueness is treated in philosophy and computer science. We have also proposed an operationalised theory for vagueness variables, and illustrated it through an empirical study. By managing vagueness explicitly, as proposed in this chapter, archaeological texts can be explicit about their imprecision, inaccuracy, uncertainty and error. This allows more faithful representations of the archaeological record and more nuanced interpretations of their accounts. However, dealing with vagueness comes at a cost, as it increases the complexity of the associated information systems and even the associated field methodologies (Tobalina-Pulido & Gonzalez-Perez, 2020). We will keep working to find an acceptable balance between expressivity and complexity, and better approaches and techniques to implement this vagueness theory in archaeological software tools.

References

Abualdenien, J., & Borrmann, A. (2020). Vagueness visualization in building models across different design stages. Advanced Engineering Informatics, 45, 101107. https://doi.org/10.1016/j.aei.2020.101107
Akiba, K. (2014). Vague objects and vague identity (Vol. 33). Springer.
Ashtiani, M., & Azgomi, M. A. (2016). A hesitant fuzzy model of computational trust considering hesitancy, vagueness and uncertainty. Applied Soft Computing, 42, 18–37. https://doi.org/10.1016/j.asoc.2016.01.023
Austin, J. L. (1989). How to do things with words: The William James lectures delivered at Harvard University in 1955 (2nd ed.). University Press.
Baxter, M. J. (2009). Archaeological data analysis and fuzzy clustering. Archaeometry, 51(6), 1035–1054. https://doi.org/10.1111/j.1475-4754.2008.00449.x
Black, M. (1937). Vagueness. An exercise in logical analysis. Philosophy of Science, 4(4), 427–455. http://www.jstor.org/stable/184414
Coletti, G. (2020). Decision rules under vague and uncertain information. In Fuzzy approaches for soft computing and approximate reasoning: Theories and applications: Dedicated to Bernadette Bouchon-Meunier (pp. 85–97). Springer International Publishing. https://doi.org/10.1007/978-3-030-54341-9_8
Cutting, J. (2019). German, Spanish and Mandarin speakers’ metapragmatic awareness of vague language compared. Journal of Pragmatics, 151, 128–140. https://doi.org/10.1016/j.pragma.2019.03.011
de Runz, C., Desjardin, E., Piantoni, F., & Herbin, M. (2013). Using fuzzy logic to manage uncertain multi-modal data in an archaeological GIS. In International Symposium on Spatial Data Quality (ISSDQ), 13–15 June 2007, Enschede, the Netherlands (Vol. 7).
de Silva, C. W. (1995). Intelligent control. Routledge.
Dinu, A., Hahn, W. v., & Vertan, C. (2017, November). On the annotation of vague expressions: A case study on Romanian historical texts. In Proceedings of the workshop on language technology for Digital Humanities in Central and (South-)Eastern Europe (pp. 24–31). https://doi.org/10.26615/978-954-452-046-5_004
Evans, G. (1978). Can there be vague objects? Analysis, 38(4), 208. https://doi.org/10.1093/analys/38.4.208
Fabbrini, F., Fusani, M., Gnesi, S., & Lami, G. (2001). An automatic quality evaluation for natural language requirements. In Proceedings of the seventh international workshop on RE Foundation for Software Quality (REFSQ’2001) (pp. 4–5).
Fantechi, A., Ferrari, A., Gnesi, S., & Semini, L. (2018, August). Requirement engineering of software product lines: Extracting variability using NLP. In 2018 IEEE 26th international Requirements Engineering conference (RE) (pp. 418–423). https://doi.org/10.1109/RE.2018.00053
Fermüller, C. G., Hofer, M., & Ortiz, M. (2017). Querying with vague quantifiers using probabilistic semantics. In Flexible Query Answering Systems: 12th International Conference, FQAS 2017, London, UK, June 21-22, 2017, Proceedings 12 (pp. 15–27). Springer International Publishing.
Gasmi, M., & Bourahla, M. (2017). Reasoning with vague concepts in description logics. International Journal of Fuzzy System Applications, 6(2), 43–58. https://doi.org/10.4018/IJFSA.2017040103
Gonzalez-Perez, C. (2017). How ontologies can help in software engineering. In J. Cunha, J. P. Fernandes, R. Lämmel, J. Saraiva, & V. Zaytsev (Eds.), Grand timely topics in software engineering (LNCS) (Vol. 10223, pp. 26–44). Springer.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology. Springer.
Gupta, C., Jain, A., & Joshi, N. (2018). Fuzzy logic in natural language processing – A closer view. Procedia Computer Science, 132, 1375–1384. https://doi.org/10.1016/j.procs.2018.05.052
He, L., & Smit, E. (2021). Vague language in online medical consultation. European Journal of Health Communication, 2(1), 1–28. https://doi.org/10.47368/ejhc.2021.001
Hermon, S., & Niccolucci, F. (2002). Estimating subjectivity of typologists and typological classification with fuzzy logic. Archeologia e Calcolatori, 13, 217–232.
Hyde, D. (2008). Vagueness, logic and ontology. Ashgate.
Janier, M., Aakhus, M., Budzynska, K., & Reed, C. (2016). Modeling argumentative activity with inference anchoring theory. In D. Mohammed & M. Lewinski (Eds.), Argumentation and reasoned action. Volume I proceedings of the 1st European conference on argumentation (Vol. 1, no. 62). College Publications.
Jing, X., Pinel, P., Pi, L., Aranega, V., & Baron, C. (2008). Modeling uncertain and imprecise information in process modeling with UML.
Joty, S., Carenini, G., & Ng, R. T. (2015). CODRA: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 41(3), 385–435. https://doi.org/10.1162/COLI_a_00226
Keefe, R. (2000). Theories of vagueness. Cambridge University Press.
Lacerda, M. J., & Crespo, L. G. (2017, May). Interval predictor models for data with measurement uncertainty. In 2017 American Control Conference (ACC) (pp. 1487–1492). https://doi.org/10.23919/ACC.2017.7963163
Lakoff, G. (1990). Women, fire, and dangerous things. University of Chicago Press.
Lassiter, D., & Goodman, N. D. (2017). Adjectival vagueness in a Bayesian model of interpretation. Synthese, 194(10), 3801–3836. https://doi.org/10.1007/s11229-015-0786-1
Lebanoff, L., & Liu, F. (2018, August). Automatic detection of vague words and sentences in privacy policies. http://arxiv.org/abs/1808.06219
Leto Russo, P. G. (2019). A corpus-based study of vague language in political discourse: Trump and the strategic use of vague terms. Università degli Studi di Modena e Reggio Emilia.
Li, S. (2019). Communicative significance of vague language: A diachronic corpus-based study of legislative texts. English for Specific Purposes, 53, 104–117. https://doi.org/10.1016/j.esp.2018.11.001
Lieskovský, T., Duračiová, R., & Karell, L. (2013). Selected mathematical principles of archaeological predictive models creation and validation in the GIS environment. Interdisciplinarity and Archaeology – Natural Science in Archaeology, IV(2), 177–190. https://doi.org/10.24916/iansa.2013.2.4
Malyuga, E., & McCarthy, M. (2018). English and Russian vague category markers in business discourse: Linguistic identity aspects. Journal of Pragmatics, 135, 39–52. https://doi.org/10.1016/j.pragma.2018.07.011
Mann, W. C., & Thompson, S. A. (1987). Rhetorical structure theory: Description and construction of text structures. In Natural language generation (pp. 85–95). Springer.
Martin-Rodilla, P., & Gonzalez-Perez, C. (2019a). Conceptualization and non-relational implementation of ontological and epistemic vagueness of information in digital humanities. Informatics, 6(2), 20. https://doi.org/10.3390/informatics6020020
Martin-Rodilla, P., Pereira-Fariña, M., & Gonzalez-Perez, C. (2019b). Qualifying and quantifying uncertainty in digital humanities: A fuzzy-logic approach. In ACM international conference proceeding series (pp. 788–794). https://doi.org/10.1145/3362789.3362833
Nicolucci, F., & Hermon, S. (2010). A fuzzy logic approach to reliability in archaeological virtual reconstruction. In F. Nicolucci & S. Hermon (Eds.), Beyond the artifact. Digital interpretation of the past. Proceedings of CAA2004, Prato 13–17 April 2004 (pp. 28–35). Archaeolingua.
Novak, V. (2017). Fuzzy logic in natural language processing. In 2017 IEEE international conference on fuzzy systems (FUZZ-IEEE) (pp. 1–6). https://doi.org/10.1109/FUZZ-IEEE.2017.8015405
Olivé, A. (2007). Conceptual modeling of information systems. Springer.
Ottomanelli, M., & Wong, C. K. (2011). Modelling uncertainty in traffic and transportation systems. Transportmetrica, 7(1), 1–3. https://doi.org/10.1080/18128600903244636
PROVIDEH. (2018). CHIST-ERA call 2016 – VADMU topic. http://www.chistera.eu/projects/providedh
Quammie-Wallen, P. (2021). Vague language in Hong Kong English, ‘something like that’. English Today, 37(1), 13–25. https://doi.org/10.1017/S0266078419000415
Ramos-Soto, A., & Martin-Rodilla, P. (2021). Enriching linguistic descriptions of data: A framework for composite protoforms. Fuzzy Sets and Systems, 407, 1–26. https://doi.org/10.1016/j.fss.2019.11.013
Rashkin, H., Choi, E., Jang, J. Y., Volkova, S., & Choi, Y. (2017). Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 2931–2937). https://doi.org/10.18653/v1/D17-1317
Raskin, V., & Taylor, J. M. (2014, June). Fuzziness, uncertainty, vagueness, possibility, and probability in natural language. In 2014 IEEE Conference on Norbert Wiener in the 21st Century (21CW) (pp. 1–6). https://doi.org/10.1109/NORBERT.2014.6893868
Reeler, C. (1999). Neural networks and fuzzy logic analysis in archaeology. In L. Dingwall, S. Exon, V. Gaffney, S. Laflin, & M. van Leusen (Eds.), Proceedings of the 25th anniversary conference, University of Birmingham, April 1997. Archaeopress.
Runz, C. D., Desjardin, E., Piantoni, F., & Herbin, M. (2007). Using fuzzy logic to manage uncertain multi-modal data in an archaeological GIS.
Russell, B. A. W. (1923). Vagueness. Australasian Journal of Psychology and Philosophy, 1, 84–92.
Syropoulos, A. (2016). A (basis for a) philosophy of a theory of fuzzy computation. https://doi.org/10.2478/kjps-2018-0009
Taheri, S. M., Ghadim, F. I., & Kabirian, M. (2019, January). Application of fuzzy inference systems in archaeology. In 2019 7th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS) (pp. 1–4). https://doi.org/10.1109/CFIS.2019.8692167
Tobalina-Pulido, L., & Gonzalez-Perez, C. (2020). Valoración de la calidad de los datos arqueológicos a través de la gestión de su vaguedad. Aplicación al estudio del poblamiento tardorromano. Complutum, 31(2), 341–358. https://doi.org/10.5209/cmpl.72488
Toledo, E. Q. (2017). Vague language in the corpus of historical English texts (Vol. 2).
van Deemter, K. (2010). Not exactly: In praise of vagueness. Oxford University Press.
Wetzel, L. (2018). Types and tokens. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Fall 201). Metaphysics Research Lab, Stanford University.
Williamson, T. (1996). Vagueness (Paperback). Routledge.
Wittgenstein, L. (1989). Philosophical investigations (3rd ed. re. ed.). Blackwell.
Zadeh, L. A. (1996). Fuzzy logic = Computing with words. IEEE Transactions on Fuzzy Systems, 4(2), 103–111. https://doi.org/10.1109/91.493904
Zadeh, L. A. (2002). A prototype-centered approach to adding deduction capability to search engines-the concept of protoform. In 2002 annual meeting of the North American fuzzy information processing society proceedings. NAFIPS-FLINT 2002 (Cat. No. 02TH8622) (pp. 523–525). https://doi.org/10.1109/NAFIPS.2002.1018115
Zadeh, L. A. (2010, August). A summary and update of ‘fuzzy logic’. In 2010 IEEE international conference on granular computing (pp. 42–44). https://doi.org/10.1109/GrC.2010.144
Chapter 8
Extending Discourse Analysis in Archaeology: A Multimodal Approach

Jeremy Huggett

J. Huggett, Archaeology, School of Humanities, University of Glasgow, Glasgow, UK; e-mail: jeremy.huggett@glasgow.ac.uk

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_8

Abstract Archaeology is a highly visual discipline, reliant on observation as well as description, and consequently makes extensive use of diagrams, maps, plans, illustrations and photography as well as textual narratives in communicating its interpretations of past material culture. If discourse analysis is to shed light on the construction of archaeological knowledge it therefore should seek to incorporate the visual alongside the textual, but at present discussions of the two modes are largely independent of each other, with an emphasis on the text. A case study examines the interrelationships and interdependencies that exist between text and illustrations in archaeological grey literature, and argues that a multimodal approach to knowledge creation is called for which better reflects the different modes and media used in archaeology.

Keywords Text · Visualisation · Multimodal · Knowledge · Grey literature

8.1 Archaeology and Discourse Analysis

Discourse analysis is frequently defined quite specifically in terms of the analysis of text or speech. It was originally defined by Harris as “a formal method for the analysis of connected speech or writing based on its linguistic components in order to obtain new information about the text under study” (Harris, 1952, p. 1), and similar definitions are found in contributions to this volume (for example, Pereira-Fariña, Gonzalez-Perez, Martin-Rodilla, Lawrence et al., and Castiello) and elsewhere. For instance, “Discourse analysis examines patterns of language across texts and considers the relationship between language and the social and cultural contexts in which it is used” (Paltridge, 2012, p. 2) and “ . . . discourse analysis is a view of language at the level of text. Discourse analysis is also a view of language in use . . . ” (Paltridge, 2012, p. 7). Discourse analysis is often defined in terms of ‘language’ rather than purely text, and while this might be extended to non-linguistic forms of communication the context frequently indicates a more restrictive interpretation. For instance, Schiffrin et al. (2001a, p. 1) categorise discourse as anything beyond the sentence, as language in use, and as a broader range of social practice which includes non-linguistic and non-specific instances of language. This third category would appear to admit visual languages, for example, although the volume (Schiffrin et al., 2001b) is fundamentally textual in outlook and while the content of the second edition of the volume (Tannen et al., 2015) is significantly different, the textual emphasis across the contributions is largely unchanged. Similarly, Gee (2011, p. 7) describes discourse analysis as the study of language-in-use, and suggests that “a Discourse is a ‘dance’ that exists in the abstract as a coordinated pattern of words, deeds, values, beliefs, symbols, tools, objects, times, and places” (Gee, 2011, p. 36). While this might potentially allow non-textual representations as a form of linguistic symbolism, the emphasis throughout remains on language as text and speech. Discourse is associated with language and in turn language is restricted to text and speech.

But to what extent is archaeology a primarily textual discipline? It is common for archaeology to be described as “writing history”, creating “narrative accounts of past cultures” (QAA, 2014, p. 8), for example, which would imply the centrality of a textual approach. The close association between archaeology and text is evidenced through terminology such as the ‘archaeological record’, the decoration on a pot categorised as a ‘grammar’, the life history of an artefact described as a ‘biography’, and so on.
The discipline of archaeology has itself been traditionally divided in textual terms, with historical archaeology focusing on ‘literate’ societies and leaving ‘illiterate’ societies to prehistoric archaeology: what Hawkes (1954, pp. 156–57) termed “text-aided” archaeology as opposed to “text-free” archaeology. The presence of texts in historical archaeology has been seen by some as limiting the potential for archaeological theory, analysis, and interpretation, making archaeology subservient to history, a perspective leading some historical archaeologists to attempt to set texts aside and treat their studies as ahistorical (e.g. Moreland, 2006, pp. 136–37). At the same time, paradoxically, prehistoric archaeologists might bemoan the absence of texts (see, for example, Andrén, 1998, pp. 2–4; Moreland, 2003, pp. 9–13) since an archaeology without texts was seen to limit the degree to which past cultural activities could be reconstructed (Hawkes, 1954, p. 160ff).

A textual focus is equally concerned with how archaeology and its associated knowledge are presented and communicated. The task of the archaeologist is to ‘write’ archaeology, from desk-based assessments and project proposals, to fieldwork with its inscriptions made in field notebooks, diaries, and context records, to the preparation of the final report and publication, and its subsequent incorporation in works of synthesis. Consequently “Writing long-form linear text remains the privileged form in which our research is shared” (Tringham & Danis, 2019, p. 62). As Joyce argued, archaeological discourse is dialogic in nature:
The formation of marked genres – including site reports and more popular media, such as museum exhibits – are formalizations of specific dialogues, amenable to analysis as genres. Archaeology is a textual practice from the field through the lab and into all forms of dissemination. (Joyce, 2002, p. 3)

This apparent textual pre-eminence in archaeology was reinforced during the structural and post-structural theoretical debates in archaeology in the 1980s and 1990s concerning text-based approaches to understanding the past. As Hodder observed, the idea that material culture was itself a text that could be ‘read’ was tacitly assumed in archaeology and the challenge lay not so much in the idea that artefacts could ‘speak’ but in understanding what they meant (Hodder, 1986, p. 122ff; see also Hodder & Hutson, 2003, p. 167ff). Shanks and Tilley, jointly and separately, similarly sought to treat material culture as text, arguing that material culture was

. . . a communicative medium of considerable importance for transmitting, storing and preserving social knowledge and as a symbolic medium for orientating people in their natural and social environment because of the relative permanence of material culture vis a vis speech acts. So material culture can be regarded in oral societies as a form of writing and discourse inscribed in a material medium in just the same way as words in chirographic and typographic cultures are inscribed on a page. (Shanks & Tilley, 1987, pp. 96–97; see also Shanks, 1992; Tilley, 1991, for example)

Although Hodder initially argued that material culture was easier to decipher than texts where the language was not known, it was quickly recognised that the relationship was not straightforward and material culture did not communicate in the same way as writing.
For example, Barrett (1988) rejected the idea of an archaeological ‘record’ left by the past that could be ‘read’ like a text; instead, the past consisted of fragmentary traces of social practices whose presences and relationships could best be understood as a field of discourse. Elsewhere, Olsen remarked that

. . . we come to ignore the differences between things and text: that material culture is in the world and plays a fundamentally different constitutive role for our being in this world than texts and language. Things do far more than just speak and express meanings . . . (Olsen, 2003, p. 90)

Others simply observed that the linearity associated with text made it a poor analogy for the spatial and temporal complexities of archaeological evidence (for example, Renfrew, 1989, pp. 35–36; see Preucel, 2006, pp. 138–42 for a useful overview).

8.2 From Text to Visualisation

Archaeology and text have therefore been entangled in complex practical and theoretical ways over the years, further complicated by association with Foucault’s use of ‘archaeology’ as a means of analysing written and spoken discourse (Foucault, 1989). However, although the textual analogy retains considerable influence, its pre-eminence is over-stated. An alternative perspective argues that archaeology is a “profoundly visual discipline” (James, 2015, p. 1189); it is “an explicitly visual science . . . [which] has from its very beginnings developed a distinctive visual language that it has used to communicate theories, technical principles, and data” (Moser, 1996, p. 185); a discipline in which its “illustrative traditions are central” (Moser, 2001, p. 280) and where its everyday discourse is reliant on visual communication (Bateman, 2006, p. 68). Indeed, “There can be no doubt that archaeological ‘imagination’ has always been visual to a large extent” (Hussain, 2021, p. 140). Visual representations have long been a key means of archaeological communication, as seen in Carter’s volumes on the Tutankhamun excavations, Flinders Petrie’s Ten Years Diggings in Egypt, or Gardner’s Ancient Athens, for instance (see Thornton, 2018). So much of archaeology is predicated on observation that visual representations can in some senses be seen as ‘natural’ – for example, Hope-Taylor proposed the existence of a universal visual language for archaeological evidence:

Translate such data into words, and not only are they removed one step further from reality but also their meaning is put internationally at risk . . . Since we can all understand each other’s drawings and photographs whatever language we happen to speak, it must always be folly to verbalize where we could visualize. (Hope-Taylor, 1967, p. 181)

So if the visual component of archaeology is so important, why has the textual emphasis remained so dominant? One characteristic of archaeological publications is that visual representations are often demoted to an accompanying role to the text, illustrating but otherwise contributing little to the discourse. For example, James points to what he calls “a widespread ‘logocentrism’ and ‘iconophobia’ . . .
based on the notion that the more pictures a work has, the less seriously it is taken” (James, 1997, p. 24). More recently, Opgenhaffen (2021, pp. 354–55) has observed how few illustrations accompany more theoretically-inclined archaeological publications, whereas an extensively illustrated student textbook on archaeological theories, methods and practice contains no section on visualisation practice, emphasising the niche character of visualisation in archaeological communication. Pétursdóttir argues that

. . . this attitude towards visual material is . . . anchored in a more deep-seated discrimination between word and image, between the articulated and the artistic and, more generally, in a semiotics of suspicion that has permeated the humanities and social sciences throughout the 20th century. (Pétursdóttir, 2020, p. 102)

Part of this is bound up in a traditional distrust of images as lacking appropriate objectivity and transparency, a position inherited from logical empiricism which saw visualisation as inferior to text in a form of linguistic determinism (for example, Baigrie, 1996; Giere, 1996; Topper, 1996). This assumes that visual representations are at best illustrations in support of the text and denies their capacity to carry information, even evidence, in their own right. The power of the visual in discourse is nevertheless important:

“You doubt what I say? I’ll show you.” And, without moving more than a few inches, I unfold in front of your eyes figures, diagrams, plates, texts, silhouettes, and then and there present things that are far away and with which some sort of two-way connection has now been established. (Latour, 1990, p. 36)

However, in such a scenario it would clearly still be possible to see visual representations as little more than props in an argument without necessarily having evidential value. The use of visual representations in archaeology has been seen as rather different to the standard approaches to scientific images because of the way that they do represent evidence in their own right (for example, Bueno, 2016, p. 15; Lopes, 2009; Hussain, 2021). This is not to suggest that they are not selective, or indeed, that they are objective, but such selection is part of the process of archaeological knowledge construction. In the field, for example, drawings are fundamental to the process of knowledge creation:

The reiterative process through which site drawings are transformed into illustrations for publication gradually separates the image from the subjective interpretive process that was at the root of its inception. The conscious and unconscious decisions that were part of the image’s creation become embedded more deeply within the knowledge authority structures of the discipline. The fuzzily drawn lines are sharpened and the hesitantly drawn boundaries are strengthened and defined through the repeated tracing and redrawing of the original field drawing. (Bateman, 2006, p. 78)

The means by which archaeological representations achieve this evidential status is through the use of conventions: social or symbolic practices which ensure commonality of understanding and enable comparability between visualisations (see, for example, Lopes, 2009, p. 12; Moser, 2001, pp. 268–69).
The conventions determine the information to be included and frequently the way it is to be represented, creating what are effectively technical drawings of artefacts, maps, plans, stratigraphic sections, and the like. These conventions represent visual sets of rules, both tacit and explicit, which are accepted and understood by the archaeological community, if not beyond. Conventions differ between modes of visualisation (drawing, map, plan etc.) and subjects (lithics, pottery, etc.), and between media (photograph, drawing, etc.), providing different ways of seeing and representation. Such conventions

. . . work to imbue visualisations with the quality of objectivity (which brings together other qualities such as transparency, scientific-ness and facticity). This produces the impression that visualisations are showing the facts, telling it like it is, offering windows onto data. (Kennedy et al., 2016, p. 716)

For example, the preference in archaeological field drawings for two-dimensional presentations – either top-down (as in maps and plans) or frontal view (as in section drawings) – may have its origins in field practice (recording via the two-dimensional permatrace sheet or computer screen, for instance), but it also carries with it an implicit objectivity (although it is not objective) and may present an impression of control and authority through a ‘god-like’ perspective (Kennedy et al., 2016, p. 723). Even when three-dimensional data are collected, as in structure-from-motion imaging of stratigraphic sections, they are frequently represented as two-dimensional images or tracings.
164 J. Huggett Classically, visual representations are seen to have a supporting role to the text: . . . the employment of a graphical feature, photograph, map, or other representational device to elucidate, explain, or show something in a text . . . the illustration is meant to summarize an argument, provide a reference point, or corroborate the text (Burdick et al., 2012, p. 43) The text retains priority in such a scenario: the image provides data or backing for an argument while the detail of the specific position is expressed textually. Indeed, if an illustration simply recapitulates the text, the necessity of its inclusion may legitimately be open to question (e.g. Candea, 2019, pp. 65–66). That aside, images are seen as a means of improving the readability and understandability of the text through their capacity to summarise and communicate information more economically, although their success is dependent on skilful presentation and often – ironically – on appropriate labelling and captioning. Even if they are not necessarily peripheral to the presentation of argument (c.f. Moser, 1996, p. 186), they can appear to add a spurious level of authority by virtue of their inclusion, with their ‘scientific’ air of objectivity and transparency. Furthermore, there may be unrecognised, even hidden implications embedded in the visualisation which go beyond the intentions of the author – for example, drawn elements such as circles imply closure, solid lines suggest clear boundaries (Candea, 2019, p. 76), and, of course, the range of doubts and uncertainties inherent in archaeological field drawings are frequently resolved in their final publication form. Like texts, images can also mislead . . . through the constant ambiguity between what is being figured and what is merely a convenient way to draw something. 
Is the distance between these two forms, their respective size, or the thickness of the line meant to be relevant, or is it merely the clearest way to arrange a picture on the page? (Candea, 2019, pp. 76–77) Crucially, text and image are different modes of expression, employing different languages and conventions: words and grammar producing sentences on the one hand, with shape, colour, size and space on the other, although aspects such as page layout and typography blur the distinction between the two. A consequence of these different modes is that there is always a semantic gap between text and image that is bridged through interpretation of the relationships between them. There may be a degree of functional equivalency between visual and textual arguments, but the means by which they present their information and their relationship with author and reader differ (e.g. van den Hoven, 2012, p. 258ff). The question remains, however, that if visual representations provide more than simply decoration or “bravura display” (Flanders, 1998, p. 309) for the accompanying text, should they not be incorporated as part of the analysis of a discourse, rather than that analysis focussing solely on the text? Even the most basic archaeological grey literature fieldwork reports contain often substantial graphical components alongside their texts (for example, see Fig. 8.1). Subsequently applying optical character recognition to extract the text for analytical purposes wrenches the text from that intimate relationship with the graphical and image components and restores the division between text and illustration to the state prior to the preparation of the final report, changing the interplay between text and image in the process.
Fig. 8.1 Snapshot of the Inverkeithing Friary excavation report (Beckett, 2018), excluding the cover and content pages and appendices

This raises important questions concerning the role and function of the images and graphics, and the extent to which the text in the report is reliant on or independent of them. Are the visuals in archaeological texts critical to the discourse on the page, and can a discourse analysis focused on the text alone adequately capture the knowledge represented?

8.3 Discourse Analysis and Visualisation

Understanding visualisation communication entails understanding the underlying codes – codes we may already know, at least implicitly, without necessarily knowing what we know or how we ‘read’ an image (Kress & van Leeuwen, 2006, pp. 32–33). Such codes provide the vehicle through which a visualisation creates meaning.
Kjeldsen (2018, p. 79) describes visual images as providing a thick and rich but ambiguous representation because of the range of dimensions and visual details they provide, whereas text is seen as providing unambiguous but thin information. For instance, an archaeological statement such as ‘layer X is cut by layer Y’ describes a stratigraphic condition in a straightforward if abstract manner, whereas a matrix diagram demonstrates this visually along with other relationships that either layer might be involved in, a section diagram shows the relationship visually along with details of the shape and extent of the cut, and a photograph may show this together with an indication of the basis for the distinction between the layers based on the colour and texture differentiation, for example, often with other contextual elements visible in the background. Whether text can legitimately be described as unambiguous is also open to question: the relative regularity and clarity of textual codes might be mistaken for a lack of ambiguity, and differences in phrasing and shading can be used to imply uncertainty or lack of clarity in a description in much the same way as they can be represented visually.

How meaning is communicated through visual representation has been categorised in numerous ways (see summary in Engelhardt, 2007, for example). For instance, Engebretsen and Weber (2018, pp. 277–78) identify a series of ‘graphic modes’ within the broader set of modes or semiotic resources. These graphic modes include typography, layout, maps, diagrams, drawings, and photographs, and each offers different semiotic affordances and employs different conventions. Furthermore, each of these modes consists of a set of what they call semiotic elements, or sub-modes (for example, font, size, colour, shape, spatial arrangement, etc.), each of which in turn can be broken down further into more specific characteristics (for example, hue, saturation, luminance, texture, etc.).
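The nesting of graphic modes, sub-modes, and characteristics described above can be pictured as a simple data structure. The following Python fragment is a minimal sketch only, not a scheme proposed by Engebretsen and Weber: the class names, the function `shared_sub_modes`, and the assignment of particular sub-modes and characteristics to particular modes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubMode:
    """A semiotic element (sub-mode) with its finer-grained characteristics."""
    name: str
    characteristics: tuple[str, ...] = ()


@dataclass
class GraphicMode:
    """A graphic mode such as a map, drawing, or photograph."""
    name: str
    sub_modes: list[SubMode] = field(default_factory=list)

    def sub_mode_names(self) -> set[str]:
        return {s.name for s in self.sub_modes}


def shared_sub_modes(a: GraphicMode, b: GraphicMode) -> set[str]:
    """Names of the sub-modes that two graphic modes have in common."""
    return a.sub_mode_names() & b.sub_mode_names()


# Hypothetical assignments: which sub-modes belong to which mode is assumed here.
colour = SubMode("colour", ("hue", "saturation", "luminance"))
shape = SubMode("shape")
size = SubMode("size")

drawing = GraphicMode("drawing", [colour, shape, size])
photograph = GraphicMode("photograph", [colour, size])

print(sorted(shared_sub_modes(drawing, photograph)))  # ['colour', 'size']
```

A structure of this kind would only record which elements are present; it says nothing about how they create meaning, which remains a matter of interpretation.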
In an alternative approach, Drucker (2014, pp. 65–66) categorises visualisation according to different parameters which can be combined in different ways to different ends. For example, there may be different graphical formats (maps, plans, timelines, charts, photographs, etc.), they may have different purposes (mapping, data presentation, calculation, etc.), they may have different types of content (spatial, temporal, quantitative, qualitative, interpretative, etc.), they may structure meaning differently (by analogy, through comparison, connection, in 2D, 3D, etc.), or may differ according to their disciplinary origins (geographical maps, geological sections, statistical charts, genealogical trees, etc.). In both cases, while some aspects of a visual code may be shared between different (sub)modes, others may be unique and indeed, may specifically characterise a particular visualisation method. There is also an analytical division in terms of the methodology used to expose the workings of visual representations as meaning-making devices. On the one hand, methods may be derived from linguistic analysis, based on the notion that visualisations possess a ‘grammar’ which enables them to be treated analogous to texts (based on Kress & van Leeuwen, 2006, for example). On the other hand, methods may be derived from information visualisation studies, itself concerned with the design of visual representations to facilitate understanding, employing graphical analytics and using visualisations to uncover relationships in other visualisations (see Kilchör & Lehmann, 2021; Uggla, 2021, for example).
Table 8.1 Some of the semiotic elements or sub-modes associated with visual representations organised according to the ideational, interpersonal, and compositional metafunctions

What is being communicated?
  Type of data presentation (graph, chart, map, flowchart, network . . . )
  Type of data (quantitative, qualitative . . . )
  Type of information (facts, process, classification, structure, concept . . . )
  Type of representation (comparison in size, ranking, distribution, correlation, space/location change over time . . . )
  Type of subject (event, action, people, objects . . . )
  Style (pictorial, non-pictorial . . . )
  Basic semiotic resources (lines, points, circles . . . )
  Visual variables used (size, shape, colour, texture, surface, volume, duration, order, perspective . . . )
  What is not shown or omitted?
  Are other visuals integrated with this one? (embedded drawings, photos, etc.)

How is it presented?
  Style of visualisation (scientific, hand-drawn, cartoon, standard software template . . . )
  Purpose (narrative, descriptive, explanatory, argumentative, exploratory . . . )
  User engagement (degree of interactivity . . . )
  Relationship between author/reader (top-down, bottom-up, linear, non-linear, narrative, exploratory . . . )
  Distance (small – ‘showing’ mode, large – ‘telling’ mode, viewpoint – 2D/3D, ‘god’ view, immersive . . . )
  Attitude (professional, casual, sensational, impartial, objective, subjective, factual . . . )
  Knowledge required (level of visual literacy)
  Framing (fact-based or not? Is uncertainty shown? Can different visualisations be chosen?)
  Appearance of trustworthiness or reliability
  Balance between aesthetics and ethics

How is meaning created?
  Grouping of units (proximity, spatial arrangement, foreground, background . . . )
  Salience or emphasis (through size, colour, shape, contrast, repetition, dynamics . . . )
  Framing of units (through axes, legend, caption, text boxes, frames, connecting lines, space, colour . . . )
  Positioning of units (horizontal, vertical, radial, circular, top, bottom, centre . . . )
  Nature of layout (2D/3D, gridded, alignment, contrast, consistency, symmetry, balance, margin . . . )
  Navigation Hierarchy (information architecture, information layers . . . )
  Reader guidance (defined reading path: left to right etc., no predefined path . . . )
  Usability (information density and complexity, interactivity, accessibility, inclusion . . . )
  Causal relations? (arrows, nodes and connectors . . . )

Adapted from Weber (2019, Tables 1, 2 and 3)

Fundamentally, all approaches can be seen to build from three metafunctions originally defined by Halliday: ideational, interpersonal, and textual (e.g. Halliday & Matthiessen, 2006, p. 511ff). These essentially ask what is being communicated, how the content is presented to the ‘reader’, and how the composition is used to create meaning. For example, Weber (2019) has usefully structured a framework of textual and graphical modes of visualisation around these three metafunctions, and the semiotic elements or sub-modes associated with visual representations are summarised in Table 8.1. This analytical framework illustrates a common problem with many forms of discourse-related visualisation analysis: it is highly descriptive and consequently
labour-intensive to apply since the process of recording essentially constitutes a form of entextualisation, a translation of the visual characteristics into textual description (see, for example, Jones, 2021, p. 10ff), which tends to imply a high degree of human intervention in the process. This complexity may legitimately raise questions as to scalability of such approaches to large corpora, and in turn, how digital tools might be brought to bear on visualisation analysis.

8.4 Multimodal Discourse Analysis

Although discourse studies more generally have privileged language, virtually equating one with the other, as Rheindorf notes,

. . . if the ultimate aim of critically studying discourse is to reveal the ways in which it constitutes, maintains, and transforms social reality and relations . . . such logocentrism is a severe limitation: to focus only on the linguistic elements risks ignoring a significant portion of the meaning potential of texts . . . . (Rheindorf, 2019, p. 93)

That said, the term ‘text’ has been stretched to cover other analytical objects, as in the case of archaeological approaches to material culture. Bateman et al. (2017, p. 52) observed that anything subjected to semiotic analysis could be treated as a ‘text’, one effect of which was in many instances to inappropriately associate the properties of texts with non-textual objects. Consequently, terms like ‘visual language’, ‘visual grammar’, and ‘visual literacy’ potentially run the risk of mis-associating textual properties with visual representations and presume that non-textual objects perform in a similar manner to texts. This can present problems when the non-textual is categorised and described in primarily textual terms. The logocentric nature of discourse analysis began to change with the recognition by Kress and van Leeuwen of what they called a “communicational ensemble” (2001, p.
111), acknowledging that meaning was created in many different ways through different modes and media coming together. Consequently, they argued, the idea that “language is the central means of representing and communicating even though there are ‘extra-linguistic’, ‘para-linguistic’ things going on as well – is simply no longer tenable, that it never really was, and certainly is not now.” (Kress & van Leeuwen, 2001, p. 111). They subsequently produced what they called a ‘grammar’ of images (Kress & van Leeuwen, 2006) but this multimodal approach to the creation of meaning extended beyond text and images, ranging across layouts, music, gestures, video and film, soundtracks, 3D objects, artefacts, space, architecture, etc. Kress (2010, p. 79) describes these multiple modes or different semiotic resources as presenting a challenge to notions of language since the different modes offer different potentials which affect the choice of modes used in specific instances of communication. Hence, for example, a multimodal analysis may involve the examination of the words and their presentation on the page in conjunction with the function and meaning of visual images, and the way the two semiotic resources are integrated with each other (e.g. O’Halloran, 2004, p. 1). For
instance, the text and image may refer to each other through cross-references in the text to the image, representations within the image of aspects of the text, and so on. In combination, therefore, they provide different possibilities for meaning-making, and further, expand what is possible to express using one or other mode alone (Bateman, 2011, p. 17). While different modes may have specific properties uniquely associated with them, they are not necessarily restricted to them but may operate across different modes. For example, ‘framing’ in an image context may refer to the boundedness of the image, but it may equally refer to the layout of a text, or the divisions between architectural spaces, or the intervals in film or music (Kress & van Leeuwen, 2001, pp. 2–3). Although multimodal analysis implies a unified multidimensional approach across all modes, this is not always the case: each mode may be analysed individually. Although this clearly cuts across the objective of multimodal analysis and hence restricts potential outcomes, it recognises that a ‘true’ multimodal discourse analysis is highly complex. For example, the presentation of images, graphics, words, typography, and their spatial arrangement on a page represents an intricate tapestry of interrelationships, further complicated in a digital arena with the introduction of hyperlinks, sound, animations and moving images, making the treatment of all the semiotic modes as a single entity extremely challenging. Consequently, while it may be possible to treat the verbal-visual complex as a single analytical unit, alternatively it may be feasible to separate out each mode and analyse it on its own, perhaps drawing them all together as a final step (e.g. Bednarek & Caple, 2017, p. 9).
To illustrate this, Bednarek and Caple developed a topology which allows any analysis to be positioned relative to choices about the unit of analysis and the semiotic mode (the two axes in Fig. 8.2) (Bednarek & Caple, 2017, pp. 9–12). For example, an analysis might be monomodal, focussing on a single mode within a single text (bottom right in Fig. 8.2), or a single mode across several texts (top right in Fig. 8.2), potentially repeating the study examining a different mode and ultimately combining both to generate a multimodal analysis. Alternatively, an analysis might be multimodal from the outset, looking at a combination of different modes across several texts (top left in Fig. 8.2) or within a single text (bottom left in Fig. 8.2).

8.5 Multimodal Analysis and Archaeological Discourse

Archaeological scholarship on the written text itself has been primarily monomodal, focussing on the textual component and saying little about other means of communication that can be used in conjunction with text. One of the most extensive discussions of archaeology and text is that by Lucas (2019), a volume which itself only contains one figure (and hence reinforces Opgenhaffen’s (2021, pp. 354–55) observation about the lack of illustration in such texts). Lucas’ discussion of the role of text in archaeological knowledge production says nothing about visual representation as a contributing factor, although it is interesting to consider
the relationship and role of his figure illustrating the Folkton Drums in relation to the accompanying textual discussion of the drums (Lucas, 2019, pp. 147–49). By way of comparison, Fagan (2016) briefly refers to illustrations in a book that otherwise focuses on text, although Connah (2010) includes a full chapter on visual explanation in his book on writing in archaeology. Overall, however, there is only limited consideration of the multimodal nature of the writing process in archaeology.

Fig. 8.2 A topology of semiotic resources and analytical units, focusing on choices surrounding the analysis of language and/or images in texts. (Adapted from Bednarek & Caple, 2017, Figure 1.3)

There is a considerable body of scholarship looking at different aspects of the nature and role of visualisation in archaeology, in addition to the manuals and guides defining methods and conventions (for example, Adkins & Adkins, 2009). There are discussions of analog and digital field drawing (e.g. Bateman, 2006; Morgan & Wright, 2018; Morgan et al., 2021), drawings and visual representations (e.g. Molyneaux, 1997; Moser, 1996, 2001, 2014; Perry & Johnson, 2014; Hussain, 2021), 2D and 3D digital imagery (e.g. Frischer & Dakouri-Hild, 2008; Garstki, 2017), photography (e.g. Carter, 2015; contributions in McFadyen & Hicks, 2019; Morgan, 2016; Shanks & Svabo, 2013), mapping (e.g. Gillings et al., 2019), and aerial and satellite imagery (e.g. Hanson & Oltean, 2013; Parcak, 2009), as well as a range of image-related contributions to Smiles and Moser (2005), for example. There is also a vast archaeological literature on material culture discourse
associated with artefacts, structures and monuments, for example, which is not considered here. Like their textual equivalents, most discussions of archaeological visualisation are primarily monomodal and focus on the particular type of visual representation concerned and say relatively little about the relationship of that mode with the broader context in which it may be reproduced. Some exceptions to this include, for example, Morgan’s study of photography at Çatalhöyük with its use of framing and semiotic codes (Morgan, 2016), Baird’s analysis of photography at Dura-Europos (Baird, 2011, 2019), Hussain’s comparative analysis of French and Anglophone lithic imagery (Hussain, 2021), Carter’s examination of the use of scales in archaeological site photography (Carter, 2015), or indeed, many of the studies in McFadyen and Hicks (2019). These would likely in other contexts be recognised as discourse studies wherein they primarily examine a single, if non-textual, mode. In some instances, however, they also refer to accompanying modes: for example, Carter describes how the importance of a photograph only becomes apparent from the accompanying text and comments on the arrangement of the images on the page (Carter, 2015, p. 9). Monomodal or multimodal, much of the work represented in the archaeological discussions of visualisation provides a valuable grounding for a wider discourse-based analysis examining the way in which archaeologists integrate linguistic and non-linguistic aspects within their discourses. Like texts, archaeological visualisations are often several steps removed from the phenomena they represent or organise: a field drawing may be one step removed, while a final publication drawing will be several steps further removed as a consequence of intervening interpretation, redrawing, and reconfiguration.
The same can be said for photographs, with original images subject to subsequent enhancement, cropping, and resizing, for example. Such processes place the eventual viewer as observer at potentially some distance from the representation and the processes it has undergone. As Drucker observes: “The interpretative acts that become encoded in graphical formats may disappear from final view in the process, but they are the persistent ghosts in the visual scheme, rhetorical elements of generative artefacts” (Drucker, 2014, p. 66). To investigate the effects of this and examine the degree to which archaeological texts are reliant on accompanying images, maps, and diagrams for their meaning, a selection of archaeological grey literature reports can be examined. These are derived from the Archaeology Data Service Grey Literature Archive, which is particularly appropriate in this context given that the archive is increasingly being used as a corpus for discourse-style analysis and natural language processing (e.g. Richards et al., 2011, see also Wright and Evans, this volume). The example reports chosen here for the two case studies have been quasi-randomly selected; they are understood to be broadly representative of their type and their discussion should not be construed as criticism of the reports or their authors. Case Study 2 will be discussed in less detail since the focus will be on significant differences in presentation from Case Study 1, bearing in mind each are produced by different commercial archaeological organisations.
8.5.1 Case Study 1: Excavation Report

The Inverkeithing excavation report by Northlight Heritage (Beckett, 2018) is a data structure report (DSR), a required output of any archaeological intervention in Scotland and intended to provide the basis for further analysis and archiving. The structure of a DSR broadly corresponds to the reporting requirements laid down by professional bodies (e.g. CIfA, 2020b, pp. 13–15) and consists of a narrative account of the intervention accompanied by maps, plans and diagrams as required together with lists of data. This report concerns a small-scale excavation undertaken over a period of 12 days on the site of the former Franciscan Friary at Inverkeithing, in Fife, Scotland, in the area of what is currently a park garden. In this example, up to 40% of the main body of the report as produced (excluding the cover, contents pages, and appendices) consists of a mixture of photographs, maps, plans and section drawings (see Fig. 8.1). One feature that is emphasised early in the report is that this was in part a community project: ‘Back in the Habit – Digging for Inverkeithing’s Medieval Friary’. Many of the photographs provided in the report (see Fig. 8.3) quite literally flesh out the brief, factual statement on community engagement and education in the text which focuses on the number of volunteers, the number of school children, and the number of visitors on the open day, together with a brief description of what the school children did on their visit. Five out of the six images show people working or training, emphasising both the active engagement and nature of work activities undertaken. In most cases, faces are hidden or obscure, providing a degree of anonymity to those depicted.
While it would be difficult to argue that the images contain information crucial to the report, they offer a useful flavour of the public engagement activities and in combination with the raw numbers of participants noted in the text provide a valuable indicator of public interest for audiences such as the funders of the project. This is underlined by a number of textual visitor accounts of their memories of the site. Finally, a photograph of a number of the volunteers and staff following a day of backfilling is provided at the end of the main report. This is one of the few images where faces are clearly visible: they are presumably among the list of volunteers provided in the acknowledgements, though none are identified.

Fig. 8.3 Inverkeithing Friary: examples of photographs. Volunteers excavating (top left and right); school children learning (middle left and right); volunteers training (bottom left and right) (a composite of Beckett, 2018, Plates 2, 3 and 5)

Fig. 8.4 Inverkeithing Friary: site plan, showing trench locations (after Beckett, 2018, Figure 2)

A key illustration is the site plan, showing the location of buildings, trenches, and other features (Fig. 8.4). Apart from the site’s map coordinates provided in the text, and a description of its location relative to other modern contemporary streets and buildings, there are no other textual details of the location of the site provided, which underlines the significance of the site plan in conjunction with the earlier location map. The plan incorporates a standard north arrow and scale bar, and the bounding border shows the overlying map grid with coordinates, making the scale bar slightly redundant. The boxed key distinguishes the flower bed and lawns by colour (although the empty delineated area to the lower right of the plan is actually lawn but not coded as such). The excavation trenches are delineated with dot-dash lines, and provided with labels that float outside each trench, lacking connectors to link them unambiguously although the relationship is visually clear. A wall is labelled, again without a connecting line and the label itself is well-separated from its colour coded area, which may introduce some ambiguity. Correspondingly, the label referring to a projected wall line does have a connector, but the resolution is such that this might relate to the dashed line extending from the hospitium building (and otherwise unidentified), or alternatively to the faint dotted lines which link back to the wall in trench 2. Reference to the textual description in the report makes it clear that the latter applies. The interior of trenches 1 and 2 contain a series of areas bounded by dashed lines, labelled S1 to S10, whereas trench 3 contains two areas demarcated by dotted lines with numeric labels contained in square brackets. Trench 4 contains demarcated but unlabelled areas, including one that is colour coded. Given its similar treatment to the coded area within trench 2 which is labelled
as a wall, the same meaning could be assumed to apply here. None of the codes used in the labels are explained in the key or elsewhere on the plan, and their meaning is only found by reference to the text: ‘S’ stands for sondage, the square bracketed numbers represent cuts, while the coded area in trench 4 is a partial floor surface rather than a wall. In the text itself, trench locations are identified in general terms: ‘SW corner of the Friary Gardens’, ‘Eastern edge of the site’, for example, and the locations of the sondages are described in similar terms relative to their trench. Details such as depth and contents of the trenches are entirely contained within the textual descriptions since these would overload the two-dimensional plan. This brief overview highlights the considerable degree to which the textual component and the illustration are inter-dependent. Elements are contained within each that are not common to both, while other elements are only understandable by reference from one to the other. For example, the colour code in the site plan applied to the standing buildings and to the wall and floor features in the excavation is not shown in the key provided, but reading the text it becomes clear that this is used to denote structures in general. Similarly, the shared coded representation of the structures in trenches 2 and 4 could legitimately imply both are walls, given the label in trench 2, but the text description of trench 4 indicates that this is not the case. It is clear, therefore, that ambiguities in either the text or the plan require to be resolved by reference to the other. The section drawing in Fig. 8.5a is evidently diagrammatic in format, as indicated by the representation of grass on the ground surface and the hatching of the mortar layer (009), for example. A key is provided to the shading used, and a scalebar is shown.
Each of the layers is shown as a bounded area identified by a numeric code in brackets, but beyond the indications from the key, the nature of each of the layers is dependent on the textual description. For example, (001) and (028) are described as topsoil/landscaping overburden, while (029) is a deposit of sandy loam containing oyster shell (indicated from the key) and some stones (not apparent from the drawing). The section drawing shows that layer (010) contained stone and oyster shell but the textual description indicates it also included green-glazed pottery and butchered animal bone. What is unclear is the extent to which the coded objects shown in the layers represent the actual position and shape of stones, charcoal, etc. or are purely representative. The stratigraphic relationships between layers are implicit within the section plan, and also specified in the context information table in the text appendices. In the text, the interpretation of the layers is separated by some distance from their description, reserved for a discussion/summary section. For example, both (029) and (009) in the section are interpreted as material left behind from the robbing out of the wall. The markers shown labelled b and b! presumably represent the section datum line used in the creation of the original field drawing but site coordinates for these are not provided so the precise location, orientation, and height of the section is not known. It is also difficult to tie the section drawing in with the plan provided (Fig. 8.5b), despite both relating to the same area.

Fig. 8.5 Inverkeithing Friary: (a) SE-facing section of Sondage 6; (b) Plan of wall within Sondage 6/7 (a composite of Beckett, 2018, Figures 4 and 5)

The plan in Fig. 8.5b contains a north arrow, scalebar, and a key to the shading used. The wall (017) is shown in some detail, although again the presentation is diagrammatic rather than artistic since the outlines of the stones conventionally represent where stone meets the surrounding matrix rather than a ‘true’ representation of the stone from above. The two sondages are shown and labelled with connectors, but their numbers are not shown.
It is possible that the sondage in the top left is S6 on the basis that the section drawing of S6 is SE-facing (from its caption), the S6 section shows layer (31) to the right of the wall (17), and although layer (30) is not shown, its description in the report appendix indicates that (30) is under (10), which is shown. This demonstrates that ambiguities in visual representations may be resolved by reference to other visualisations, as well as to the accompanying text. Similarly, lack of detail in the text can often be resolved through information provided in the visual representations.

8.5.2 Case Study 2: Field Evaluation and Watching Brief

The Wind Hill archaeological evaluation and watching brief report by AOC Archaeology (Walker, 2020) is a report structured according to the professional standards defined for reporting field evaluations (CIfA, 2020b, 13–15) and watching briefs (CIfA, 2020a, 14–15). The Wind Hill evaluation was undertaken in advance of the construction of a parking area and driveway, with the objective of establishing the presence or otherwise of any archaeological remains that might be encountered during the groundworks, and, if found, evaluating their extent, preservation, date, and significance. The report consists of a number of narrative sections followed by selected maps, drawings, and photographs, with context summary tables in an appendix. Almost 50% of the main body of the report (excluding the cover, contents pages, and appendices) consists of maps, plans, sections and photographs. However, there is a sharp differentiation between text and image, with all the figures and plates placed in sections at the end of the report rather than embedded in the text at appropriate locations. Alongside the report is a digital site data archive (Walker, 2021) which includes the report, a set of photographs (only 9 of which appear in the report), and low resolution scans of original site records, including drawing sheets, registers of levels, photos, and finds, together with trench records and some selected context sheets.

Fig. 8.6 Wind Hill: site plan, showing the standardised template format and test pit locations. (After Walker, 2020, Figure 2)

Figure 8.6 demonstrates the use of a standard template consisting of a large bounding box enclosing the drawn area and a series of small bounding boxes containing the figure number, north arrow, key, scale bar, and company logo. This could be seen as heightening an impression of professional reliability, perhaps reinforced by signs that the plan is digital rather than hand-drawn. The plan is not gridded and lacks coordinates so locational information for the site is limited to the description and national grid reference in the accompanying text. The plan provides the arrangement of test pits within the monitored area, adding more specific detail to the general locational information in the text which places test pits relative to the garden area (the proposed car park) and the proposed driveway to the south. Identification of the bounded areas beyond the monitored area is unclear from both plan and text, although Plates 1 and 8 (Walker, 2021) provide some contextual information, together with additional photographs in the digital archive.
Fig. 8.7 Wind Hill: (a) Final section drawing for test pit 4 (extracted from Walker, 2020, Figure 3); (b) Field drawing of test pit 4 section (reconstructed from Walker, 2021, drawing sheet 4)

Figure 8.7a shows one example of a section drawing, extracted from a composite illustration of selected sections which uses the standard template incorporating scalebar, key, and logo etc. All the layers are demarcated with firm, strong boundaries with the exception of the interfaces between 4/006, 4/002 and 4/004 which are shown with dashed lines. Apart from labels for the areas representing layers and the cardinal points at the corners of the section, areas of natural deposit and an animal burrow are also labelled. Crosses mark the location of a horizontal datum line though no locational information is provided (although the height of the datum can be calculated from the levels register in the digital archive). All the information concerning the nature of the layers visible in the section is reliant on the accompanying text which indicates that the dashed lines represent interfaces between layers that were difficult to define clearly, and discusses the interpretation of 4/004 as a possible pit (subsequently interpreted as root/animal disturbance) and cut 4/005 as a linear feature (thought to be modern disturbance). No distinction between layers and cuts is shown in the presentation of the context numbers, other than the use of connecting lines. The section drawing itself is a simplified, summary diagram of what was recorded in the field: reference to the archived field drawing (see Fig. 8.7b) shows the presence of annotations and the representation of stones in the sections, together with the location of pottery, marked as a strong black line in 4/006 to the left of the animal burrow and identified as glazed post-medieval pottery in the archived record (Walker, 2021).
The interface between 4/001 and 4/006 is shown as a firm boundary in the final drawing but is evidently a dashed line in the field drawing, and is noted as a diffuse horizon in the archive record (Walker, 2021). This example highlights not only the interrelationship of illustration and text but also the relationship between report and archive, in the way that it sheds light on details in the field records that do not make it through into the final report as a consequence of decisions made during the post-excavation process which are only revealed by virtue of access to the digital archive.

These two case studies underline the inter-relationships between visual representation and textual description and interpretation in archaeological reports. At times, the visual expands on what is included in the text, at other times it simply illustrates what the text says. On other occasions, the text explains what the visual is showing, and in others again, the text and visual(s) interact to resolve questions about both. Other reports will not necessarily share the same ambiguities or the same sets of relationships, since these are in part related to convention, organisational and individual custom, and circumstances surrounding the archaeological intervention concerned. Again, the observations made here are not criticisms of the reports or their authors, but are derived from a close reading of the selected visual representations and their associated texts.

The outcome of this overview demonstrates that, for some purposes at least, visual representations from archaeological texts cannot easily be ignored, as they do more than simply accompany the text or illustrate what is already clear from the textual descriptions. Of course, for certain approaches – for instance, natural language processing to extract the what, where, and when of archaeological sites (for example, see Wright and Evans, this volume) – the visual representations are not required since adequate information can usually be found in the text.
However, if more detailed processing of the text is undertaken to extract information about features, contexts, objects, and so on, then it is very likely that the visual representations will provide important information, both supplementing and providing new information to add to that which can be derived from the text. Seeking to understand the archaeological data, and the subsequent warrants and claims concerning a site, without including consideration of the visual representations accompanying the textual report will demonstrably risk inaccuracy and error. A multimodal analysis that incorporates all the modes embedded within such archaeological reports is therefore a necessity in all but the simplest of cases.

8.6 Digital Multimodal Discourse Analysis

A multimodal analysis incorporating all the modes of communication present at the same time can be a highly complex task, a complexity compounded by the reliance on describing and categorising non-linguistic modes in textual terms. Questions about the extent to which this kind of analysis can be automated and conducted digitally with minimal human intervention have been asked for some time, largely because computers are commonly seen to be more compatible with linguistic rather than graphical forms of analysis. Optical character recognition enabled large bodies of text to be consumed digitally, while natural language processing techniques are capable of automating the annotation of texts. However, Salway (2010, p. 50) highlighted that “A major obstacle to the computer-based analysis of multimodal texts is the current limit on what can be achieved with automatic image and video analysis techniques, compared with text analysis.” Similarly, Thomas (2017, p. 2) describes illustrations as “the pictorial obstacles that computational tools come up against”. The primary problem identified is that “images . . . cannot be treated in the same computational way as texts: they cannot be marked up, retrieved or ‘mined’ like words.” (Thomas, 2017, p. 2). Text is seen to be more computationally tractable because words and sequences of words form explicit meaning-bearing units, whereas visual representations have no equivalent accessible units of meaning (e.g. Salway, 2010, pp. 51–52; Kirschenbaum, 2003, pp. 145–46). As a consequence, computer-based analyses of visual representations have relied on textual descriptions of selected characteristics or the detection of words and phrases within the textual component that refer in various ways to the accompanying images. However, this approach creates a significant semantic gap between a coded description of a graphical representation and what that visualisation actually contains (and correspondingly, what a ‘reader’ would see). The textual characterisation is a poor substitute for the original, and further, inserts an interpretative layer incorporating a specific theoretical perspective into the analysis.

A method for automating the handling of images that has been increasingly investigated is the use of neural networks: deep learning computer systems which are capable of recognising objects and categorising images, and which can be used to automatically annotate an image collection. In supervised neural networks, a representation of the elements sought within the image dataset has to be coded or metadata constructed in advance (Arnold & Tilton, 2019, p. i4).
To do this, either an existing training dataset which has already been labelled may be employed, or a new training dataset has to be created which entails coding a set of images in advance. In either event, a degree of manual tagging or annotation is required before being subsequently applied automatically across the larger image set via the neural network. Alternatively, in unsupervised neural networks, the algorithm discovers patterns and groupings without the need for pre-labelled data. The complexity and work entailed in creating a training dataset can be considerable (for example, Hiippala et al., 2021, p. 673ff), which means that most analysts employ a pre-trained neural network and apply it to the data in question. Many of these pre-trained networks are based on the ImageNet dataset, consisting of over 14 million labelled images across over 20,000 categories (see Crawford & Paglen, 2019 for a critical overview). For example, Arnold and Tilton (2019, p. i10) employed an ImageNet-trained neural network on a collection of 170,000 images and were able to categorise the images on the basis of the dominant objects represented in each image. However, Wevers and Smits (2020) employed an ImageNet-trained neural network to categorise images within newspapers, but found that it did not work well on historic images. This is partly because ImageNet employs images scraped from the Internet, hence focuses on contemporary objects captured using high-resolution photography (Wevers & Smits, 2020, p. 200). The same problem has been experienced elsewhere; for example:
. . . networks are often challenged by unknown, often pre-modern object categories or objects defamiliarized by style properties. This is mainly because detection networks were trained on real photos and therefore have never seen instances of swords, medieval clothing or objects deformed by Cubism . . . (Lang & Ommer, 2021, p. 7)

Other related problems arising with such pre-trained networks include the fact that resources such as ImageNet are primarily photographic in nature (Chávez Heras & Blanke, 2021, p. 1155) which limits their value for other non-linguistic data, so, although Wevers and Smits were able to use their neural network to differentiate between images and illustrations, for example, their subsequent analyses focus on images alone (Wevers & Smits, 2020, p. 197). Furthermore, images in ImageNet and other similar training datasets are labelled in terms of nouns, so although objects etc. may be detected, conceptual descriptions are not incorporated which is a limitation when it comes to understanding meaning-making. Pre-trained networks are also susceptible to the biases within the training data (see, for example, Crawford & Paglen, 2019) as well as what Offert and Bell call ‘perceptual bias’, defined as “the difference between the assumed ‘ways of seeing’ of a machine vision system, our reasonable expectations regarding its way of representing the visual world, and its actual perceptual topology.” (Offert & Bell, 2021, pp. 1133–1134). While Arnold and Tilton (2019) describe the use of neural networks as “distant viewing”, in which the ‘viewing’ is undertaken by the network, Offert and Bell argue for ‘close reading’ of feature visualisations. Using the output images generated by the neural network in response to the inputs, rather than the input images themselves, they are able to show that the original dataset is often heavily biased towards specific, misleading, depictions.
For example, the ‘fence’ class “not only picked up the general geometric structure of the fence but also the fact that many photos of fences in the original dataset . . . seem to contain people confined behind these fences ... this also means that images of people behind fences will appear more fence-like to the classifier” (Offert & Bell, 2021, p. 1141). This underlines that algorithms do not look at an image in the way humans do: they handle images as matrices of pixel values (Wevers & Smits, 2020, p. 196).

A fundamental problem with neural networks, however, is their lack of interpretability (cf. Huggett, 2021, p. 424ff; see also Offert & Bell, 2021, pp. 1135–1136). How the different layers of a neural network actually generate the identification is largely opaque, or, if examined closely, is largely uninterpretable to the human eye. A range of approaches to the interpretability of machine learning have been proposed: for example, building models that have explainability designed into them from the outset, post-hoc methods which seek to approximate the model in a way that is more easily explainable, and interactive methods which allow a clearer functioning of the model at each stage (Selbst & Barocas, 2018, p. 1110). All have limitations built into them, whether it is over-simplifying the model to make it understandable or limiting the range of variables under consideration, for example. Hiippala (2021, pp. 144–47) writes of the cascading risks in applying computational models, ranging from the selection of data and its annotation, the model selected, and the training choices made, to the subsequent deployment of the model, and argues that these require an understanding of the underlying assumptions at each stage, which underlines the importance of expertise spanning the humanities and computer science. In the end, however, computational models are not yet at the stage where they can meet Kirschenbaum’s challenge:

Whereas it might be possible to imagine a pattern-matching algorithm that could distinguish between shepherds and sheep, how could a computer ever hope to recognize the difference between shepherds and, say, philosophers? (Kirschenbaum, 2003, p. 147)

8.7 Conclusions

A recent discussion of information-making in archaeological field reports looks at how archaeologists document their information work practices within their reports, and finds that evidence for this occurs throughout the typical report (Huvila et al., 2021, p. 1120). Interestingly, the use of visual representations other than photographs as a means of documenting practice is not considered in what is primarily a textual review of report writing, essentially relying on whether or not the ‘event’ of drawing or photography is referred to in the narrative text rather than evidence of its presence in the report (Huvila et al., 2021, p. 1113). However, they do note that most reports contain photographs which may be used to depict the context of the site and its details along with images of archaeologists at work (Huvila et al., 2021, p. 1115), as also seen in the case studies discussed above. Clearly, including the different kinds of visual representations used in archaeological reports – maps, plans, sections, etc. as well as photographs – in a multimodal study as discussed here would be a natural extension to this kind of work.

However, the use of discourse analysis in archaeology (including the discussion here) has, perhaps inevitably, focused on finished products: the final textual reports of archaeological interventions.
If a key objective of discourse analysis is to understand archaeological knowledge creation rather than to simply enhance the ability to categorise and locate archaeological reports, then this emphasis on the end products of archaeological practice is only part of the story. Archaeological reports are constructed from the products of fieldwork: the databases, excavation diaries, textual records, plans, sections, drawings, photographs, as well as the objects themselves. It is the creation of these and the way these are subsequently incorporated into the final narrative that constitutes the process of archaeological knowledge creation. This process is broadly equivalent to the acts of translation, transduction, and transformation defined by Kress (2010, pp. 124–30) in relation to the movement of meaning and meaning change. Translation essentially constitutes the movement of meaning from one mode to another, effectively from one ‘language’ to another (Kress, 2010, p. 124). This includes the shift from image to writing, from descriptive record to drawing, for example, such as takes place during the post-excavation phase of archaeological reporting. Kress calls this ‘transduction’: “the re-articulation of meaning from the entities of one mode into the entities of the new mode” (Kress, 2010, p. 125), emphasising the shift from words to visual representation or from photographic image to textual description. Kress also defines the process of ‘transformation’, which entails the reordering of elements in a text or other semiotic object while remaining in the same mode and without ontological change (Kress, 2010, p. 129). For instance, using textual descriptions and interpretations from field records and incorporating them in narrative text would constitute transformation, whereas digitising a pencil field drawing would be transduction, moving from one material mode to another even whilst retaining its identity as a drawing.

The same processes of translation, transduction and transformation might equally be applied to the archaeological recording of the material evidence in the first place, but such would be beyond the scope of this chapter and would further extend the discourse analysis into speech and gesture, for example (see Edgeworth, 2003, 2006, 2012, for instance). In relation to field recording, however, the work by Mickel (2015) and Sandoval (2020) on excavation diaries, Morgan et al.’s (2021) study of drawing and knowledge construction, and especially Sandoval’s examination of context records and their accompanying sketches (Sandoval, 2021) show how a close reading of such records can shed light on the knowledge creation process (see also Huggett, 2020, p. 10ff). The value of a multimodal analysis is the way in which it would draw all these threads (and more) together.

In the same way that an understanding of knowledge creation in archaeology is limited by a discourse analysis focused primarily on finished texts, examinations of knowledge creation are, perhaps by definition, largely focused on creation rather than the consumption and consequent understanding of knowledge.
For example, “Scholars studying multimodal discourse have mainly focused on meaning-making as the primary property of the text and as the result of the intention of the maker rather than as the result of the inference process carried out by the receiver of a multimodal text” (Tseronis & Pollaroli, 2018, p. 150). To a degree, this task is taken up by studies of argumentation, both within discourse studies and beyond (for example, see the approaches outlined in many of the contributions to this volume). However, the methods of analysis used in argumentation studies such as these are primarily textual in outlook and there is comparatively little consideration of the use of non-linguistic modes in archaeological argumentation. Beyond discourse studies, non-linguistic modes similarly lack discussion in the context of ‘reading’ archaeology: for example, Gibbon (2014) only refers to images in the mind, not on the page, while elsewhere the traditional emphasis remains on reading the past as if it were a text (for example, Hodder & Hutson, 2003). Groarke, for example, emphasises the range of semiotic resources that may be incorporated within argumentation, employed by the narrator and received by the reader:

. . . there are modes of arguing that employ visuals of many different sorts (diagrams, graphs, photographs, videos, paintings, observation, etc.), tactile sensations, musical notes, non-verbal sounds, and a wide variety of other non-verbal elements. . . . In the age of print, an overwhelming emphasis on the verbal mode of arguing may have been adequate and appropriate. In a digital age, we need a set of modes that accommodates digital communication and the ease with which it embraces images and sounds of all sorts. (Groarke, 2015, p. 142)

Clearly, coming to an understanding of how archaeological knowledge is created is an important endeavour, but an appreciation of how that knowledge is received and how the different modes incorporated in its communication are employed is also critical to that understanding, and to subsequent developments in knowledge creation and communication. So while analyses of discourses that look beyond texts are important, analyses that consider all parties to a discourse, rather than just those who created it, are equally so.

This emphasis on the need for multimodal analysis has been a core proposition in this chapter, with a particular stress on the importance of visual representations as one of the key vehicles in archaeological discourse. Consequently it has been argued that limiting discourse analysis to text, especially when visualisations are present, is a significant restriction in understanding the nature of that discourse. Archaeological visual representations are of more than secondary interest: “they intimately resonate with broader concerns of knowledge production and archaeological theory” (Hussain, 2021, p. 155). Unpicking those intimate relationships within the collective ensemble of semiotic resources used in archaeological knowledge creation is a task requiring a multimodal, rather than monomodal, approach.

References

Adkins, L., & Adkins, R. (2009). Archaeological illustration (Cambridge Manuals in Archaeology). Cambridge University Press.
Andrén, A. (1998). Between artifacts and texts: Historical archaeology in global perspective (Contributions to Global Historical Archaeology). Springer. https://doi.org/10.1007/978-1-4757-9409-0
Arnold, T., & Tilton, L. (2019). Distant viewing: Analyzing large visual corpora. Digital Scholarship in the Humanities, 34(Supplement_1), i3–i16. https://doi.org/10.1093/llc/fqz013
Baigrie, B. (1996). Introduction. In B. Baigrie (Ed.), Picturing knowledge: Historical and philosophical problems concerning the use of art in science (pp. xvii–xxiv). University of Toronto Press.
Baird, J. A. (2011). Photographing Dura-Europos, 1928–1937: An archaeology of the archive. American Journal of Archaeology, 115(3), 427–466. https://doi.org/10.3764/aja.115.3.0427
Baird, J. A. (2019). Exposing archaeology: Time in archaeological photographs. In L. McFadyen & D. Hicks (Eds.), Archaeology and photography: Time, objectivity and archive (pp. 73–95). Bloomsbury Visual Arts.
Barrett, J. C. (1988). Fields of discourse: Reconstituting a social archaeology. Critique of Anthropology, 7(3), 5–16. https://doi.org/10.1177/0308275X8800700301
Bateman, J. (2006). Pictures, ideas, and things: The production and currency of archaeological images. In M. Edgeworth (Ed.), Ethnographies of archaeological practice: Cultural encounters, material transformations (pp. 68–80). AltaMira Press.
Bateman, J. A. (2011). The decomposability of semiotic modes. In K. L. O’Halloran & B. A. Smith (Eds.), Multimodal studies: Exploring issues and domains (Routledge Studies in Multimodality 2) (pp. 17–38). Routledge. https://doi.org/10.4324/9780203828847
Bateman, J. A., Wildfeuer, J., & Hiippala, T. (2017). Multimodality: Foundations, research and analysis – problem-oriented introduction. De Gruyter. https://doi.org/10.1515/9783110479898
Beckett, A. (2018). Inverkeithing Friary archaeological excavation. (Northlight Heritage). Archaeology Data Service [distributor]. https://doi.org/10.5284/1058630
Bednarek, M., & Caple, H. (2017). The discourse of news values: How news organizations create newsworthiness. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780190653934.001.0001
Bueno, O. (2016). Visual reasoning in science and mathematics. In L. Magnani & C. Casadio (Eds.), Model-based reasoning in science and technology (Studies in applied philosophy, epistemology and rational ethics) (pp. 3–19). Springer. https://doi.org/10.1007/978-3-319-38983-7_1
Burdick, A., Drucker, J., Lunenfeld, P., Presner, T., & Schnapp, J. (2012). Digital humanities. MIT Press. https://doi.org/10.7551/mitpress/9248.001.0001
Candea, M. (2019). On visual coherence and visual excess: Writing, diagrams, and anthropological form. Social Analysis: The International Journal of Cultural and Social Practice, 63(4), 63–88. https://doi.org/10.3167/sa.2019.630404
Carter, C. (2015). The development of the scientific aesthetic in archaeological site photography? Bulletin of the History of Archaeology, 25(2), Art. 4. https://doi.org/10.5334/bha.258
Chávez Heras, D., & Blanke, T. (2021). On machine vision and photographic imagination. AI & Society, 36, 1153–1165. https://doi.org/10.1007/s00146-020-01091-y
CIfA. (2020a). Standard and guidance for an archaeological watching brief. Chartered Institute for Archaeologists. https://www.archaeologists.net/codes/cifa
CIfA. (2020b). Standard and guidance for archaeological field evaluation. Chartered Institute for Archaeologists. https://www.archaeologists.net/codes/cifa
Connah, G. (2010). Writing about archaeology. Cambridge University Press. https://doi.org/10.1017/CBO9780511845383
Crawford, K., & Paglen, T. (2019, September 19). Excavating AI: The politics of images in machine learning training sets. Excavating AI. https://excavating.ai
Drucker, J. (2014). Graphesis: Visual forms of knowledge production. Harvard University Press.
Edgeworth, M. (2003). Acts of discovery: An ethnography of archaeological practice (British Archaeological Reports International Series 1131). Archaeopress.
Edgeworth, M. (Ed.). (2006). Ethnographies of archaeological practice: Cultural encounters, material transformations (Worlds of Archaeology Series). AltaMira Press.
Edgeworth, M. (2012). Follow the cut, follow the rhythm, follow the material. Norwegian Archaeological Review, 45(1), 76–92. https://doi.org/10.1080/00293652.2012.669995
Engebretsen, M., & Weber, W. (2018). Graphic modes: The visual representation of data. In C. Cotter & D. Perrin (Eds.), The Routledge handbook of language and media (Routledge handbooks in linguistics) (pp. 277–295). Routledge. https://doi.org/10.4324/9781315673134
Engelhardt, Y. (2007). Syntactic structures in graphics. IMAGE. Zeitschrift Für Interdisziplinäre Bildwissenschaft, 5(3:1), 23–35. https://doi.org/10.25969/MEDIAREP/16745
Fagan, B. (2016). Writing archaeology: Telling stories about the past. Routledge. https://doi.org/10.4324/9781315415611
Flanders, J. (1998). Trusting the electronic edition. Computers and the Humanities, 31(4), 301–310.
Foucault, M. (1989). Archaeology of knowledge. (A. M. Sheridan Smith, Trans.). Routledge. https://doi.org/10.4324/9780203604168
Frischer, B., & Dakouri-Hild, A. (Eds.). (2008). Beyond illustration: 2D and 3D technologies as tools for discovery in archaeology (British Archaeological Reports International Series 1805). Archaeopress.
Garstki, K. (2017). Virtual representation: The production of 3D digital artifacts. Journal of Archaeological Method and Theory, 24(3), 726–750. https://doi.org/10.1007/s10816-016-9285-z
Gee, J. P. (2011). An introduction to discourse analysis: Theory and method (3rd ed.). Routledge. https://doi.org/10.4324/9780203847886
Gibbon, G. E. (2014). Critically reading the theory and methods of archaeology: An introductory guide. AltaMira Press.
Giere, R. (1996). Visual models and scientific judgement. In B. Baigrie (Ed.), Picturing knowledge: Historical and philosophical problems concerning the use of art in science (pp. 269–302). University of Toronto Press.
Gillings, M., Hacigüzeller, P., & Lock, G. (Eds.). (2019). Re-mapping archaeology: Critical perspectives, alternative mappings. Routledge. https://doi.org/10.4324/9781351267724
Groarke, L. (2015). Going multimodal: What is a mode of arguing and why does it matter? Argumentation, 29(2), 133–155. https://doi.org/10.1007/s10503-014-9336-0
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2006). Construing experience through meaning: A language-based approach to cognition. Open Linguistics Series. Continuum.
Hanson, W. S., & Oltean, I. A. (Eds.). (2013). Archaeology from historical aerial and satellite archives. Springer. https://doi.org/10.1007/978-1-4614-4505-0
Harris, Z. S. (1952). Discourse analysis. Language, 28(1), 1–30. https://doi.org/10.2307/409987
Hawkes, C. (1954). Archeological theory and method: Some suggestions from the Old World. American Anthropologist, 56(2), 155–168.
Hiippala, T. (2021). Distant viewing and multimodality theory: Prospects and challenges. Multimodality & Society, 1(2), 134–152. https://doi.org/10.1177/26349795211007094
Hiippala, T., Alikhani, M., Haverinen, J., Kalliokoski, T., Logacheva, E., Orekhova, S., Tuomainen, A., Stone, M., & Bateman, J. A. (2021). AI2D-RST: A multimodal corpus of 1000 primary school science diagrams. Language Resources and Evaluation, 55(3), 661–688. https://doi.org/10.1007/s10579-020-09517-1
Hodder, I. (1986). Reading the past: Current approaches to interpretation in archaeology (1st ed.). Cambridge University Press.
Hodder, I., & Hutson, S. (2003). Reading the past: Current approaches to interpretation in archaeology (3rd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511814211
Hope-Taylor, B. (1967). Archaeological Draughtsmanship: Part III. Antiquity, 41(163), 181–189.
Huggett, J. (2020). Capturing the silences in digital archaeological knowledge. Information, 11(5), 278. https://doi.org/10.3390/info11050278
Huggett, J. (2021). Algorithmic agency and autonomy in archaeological practice. Open Archaeology, 7(1), 417–434. https://doi.org/10.1515/opar-2020-0136
Hussain, S. T. (2021). Compelling image-worlds: A pictorial perspective on the epistemology of stone artefact analysis in Palaeolithic archaeology. In S. A. de Beaune, A. Guidi, O. M. Abadía, & M. Tarantini (Eds.), New advances in the history of archaeology (Proceedings of the XVIII UISPP World Congress (4–9 June 2018, Paris, France)) (Vol. 16, pp. 138–170). Archaeopress.
Huvila, I., Sköld, O., & Börjesson, L. (2021). Documenting information making in archaeological field reports. Journal of Documentation, 77(5), 1107–1127. https://doi.org/10.1108/JD-11-2020-0188
James, S. (1997). Drawing inferences: Visual reconstructions in theory and practice. In B. Molyneaux (Ed.), The cultural life of images: Visual representation in archaeology (pp. 22–48). Routledge. https://doi.org/10.4324/9781315888460
James, S. (2015). “Visual competence” in archaeology: A problem hiding in plain sight. Antiquity, 89(347), 1189–1202. https://doi.org/10.15184/aqy.2015.60
Jones, R. H. (2021). Data collection and transcription in discourse analysis: A technological history. In K. Hyland, B. Paltridge, & L. L. C. Wong (Eds.), The Bloomsbury handbook of discourse analysis (pp. 9–20). Bloomsbury Academic. https://doi.org/10.5040/9781350156111
Joyce, R. A. (2002). The languages of archaeology: Dialogue, narrative, and writing. Blackwell Publishers. https://doi.org/10.1002/9780470693520
Kennedy, H., Hill, R. L., Aiello, G., & Allen, W. (2016). The work that visualisation conventions do. Information, Communication & Society, 19(6), 715–735. https://doi.org/10.1080/1369118X.2016.1153126
Kilchör, F., & Lehmann, J. (2021). Graphical viewing at a distance: Graphical analytics as a method for the investigation of illustrated books. Visual Communication, 20(3), 415–436. https://doi.org/10.1177/1470357220972165
8 Extending Discourse Analysis in Archaeology: A Multimodal Approach 187 Kirschenbaum, M. (2003). The word as image in an age of digital reproduction. In M. E. Hocks & M. R. Kendrick (Eds.), Eloquent images: Word and image in the age of new media (pp. 137–156). MIT Press. https://doi.org/10.7551/mitpress/2694.001.0001 Kjeldsen, J. E. (2018). Visual rhetorical argumentation. Semiotica: Journal of the International Association for Semiotic Studies/Revue de l’Association Internationale de Sémiotique, 220(January), 69–94. https://doi.org/10.1515/sem-2015-0136 Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. Routledge. https://doi.org/10.4324/9780203970034 Kress, G., & van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary communication. Arnold. Kress, G., & van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). Routledge. Lang, S., & Ommer, B. (2021). Transforming information into knowledge: How computational methods reshape art history. Digital Humanities Quarterly, 15, 3. http://digitalhumanities.org/ dhq/vol/15/3/000560/000560.html Latour, B. (1990). Drawing things together. In M. Lynch & S. Woolgar (Eds.), Representation in scientific practice (pp. 19–68). MIT Press. Lopes, D. (2009). Drawing in a social science: Lithic illustration. Perspectives on Science, 17(1), 5–25. Lucas, G. (2019). Writing the past: Knowledge and literary production in archaeology. Routledge. https://doi.org/10.4324/9780429444487 McFadyen, L., & Hicks, D. (Eds.). (2019). Archaeology and photography: Time, objectivity and archive. Bloomsbury Visual Arts. https://doi.org/10.4324/9781003103325 Mickel, A. (2015). Reasons for redundancy in reflexivity: The role of diaries in archaeological epistemology. Journal of Field Archaeology, 40(3), 300–309. https://doi.org/10.1179/ 2042458214Y.0000000002 Molyneaux, B. (Ed.). (1997). The cultural life of images: Visual representation in archaeology. Routledge. 
https://doi.org/10.4324/9781315888460 Moreland, J. (2003). Archaeology and text (Duckworth Debates in Archaeology). Duckworth. Moreland, J. (2006). Archaeology and texts: Subservience or enlightenment. Annual Review of Anthropology, 35(1), 135–151. https://doi.org/10.1146/annurev.anthro.35.081705.123132 Morgan, C. (2016, September). Analog to digital: Transitions in theory and practice in archaeological photography at Çatalhöyük. Internet Archaeology, 42. https://doi.org/10.11141/ia.42.7 Morgan, C., & Wright, H. (2018). Pencils and pixels: Drawing and digital media in archaeological field recording. Journal of Field Archaeology, 43(2), 136–151. https://doi.org/10.1080/ 00934690.2018.1428488 Morgan, C., Petrie, H., Wright, H., & Taylor, J. S. (2021). Drawing and knowledge construction in archaeology: The Aide Mémoire Project. Journal of Field Archaeology, 46(8), 614–628. https:/ /doi.org/10.1080/00934690.2021.1985304 Moser, S. (1996). Visual representation in depicting the missing-link origins. In B. Baigrie (Ed.), Picturing knowledge: Historical and philosophical problems concerning the use of art in science (pp. 185–214). University of Toronto Press. Moser, S. (2001). Archaeological representation: The visual conventions for constructing knowledge about the past. In I. Hodder (Ed.), Archaeological theory today (pp. 262–283). Polity Press. Moser, S. (2014). Making expert knowledge through the image: Connections between antiquarian and early modern scientific illustration. Isis, 105(1), 58–99. https://doi.org/10.1086/675551 O’Halloran, K. L. (2004). Introduction. In K. L. O’Halloran (Ed.), Multimodal discourse analysis: Systemic-functional perspectives (Open Linguistics Series) (pp. 1–7). Continuum. Offert, F., & Bell, P. (2021). Perceptual bias and technical metapictures: Critical machine vision as a humanities challenge. AI & Society, 36, 1133–1144. https://doi.org/10.1007/s00146-02001058-z Olsen, B. (2003). Material culture after text: Re-membering things. 
Norwegian Archaeological Review, 36(2), 87–104. https://doi.org/10.1080/00293650310000650
188 J. Huggett Opgenhaffen, L. (2021). Visualizing archaeologists: A reflexive history of visualization practice in archaeology. Open Archaeology, 7(1), 353–377. https://doi.org/10.1515/opar-2020-0138 Paltridge, B. (2012). Discourse analysis: An introduction (2nd ed.). Bloomsbury Academic. Parcak, S. H. (2009). Satellite remote sensing for archaeology. Routledge. https://doi.org/10.4324/ 9780203881460 Perry, S., & Johnson, M. (2014). Reconstruction art and disciplinary practice: Alan Sorrell and the negotiation of the archaeological record. The Antiquaries Journal, 94, 323–352. https://doi.org/ 10.1017/S0003581514000249 Pétursdóttir, Þ. (2020). Visual essays: Different ways of knowing and communicating the archaeological. Norwegian Archaeological Review, 53(2), 101–103. https://doi.org/10.1080/ 00293652.2020.1860119 Preucel, R. W. (2006). Archaeological semiotics (Social Archaeology). Blackwell. QAA. (2014). ‘Subject benchmark statement: Archaeology’. UK quality code for higher education. Quality Assurance Agency for Higher Education. Renfrew, C. (1989). Comments on archaeology into the 1990s. Norwegian Archaeological Review, 22(1), 33–41. https://doi.org/10.1080/00293652.1989.9965488 Rheindorf, M. (2019). Revisiting the toolbox of discourse studies: New trajectories in methodology, open data and visualization. Palgrave Macmillan. https://doi.org/10.1007/978-3-030-19369-0 Richards, J., Jeffrey, S., Waller, S., Ciravegna, F., Chapman, S., & Zhang, Z. (2011). The archaeology data service and the Archaeotools project: Faceted classification and natural language processing. In E. C. Kansa, S. W. Kansa, & E. Watrall (Eds.), Archaeology 2.0: New approaches to communication and collaboration (pp. 27–56). Cotsen Institute of Archaeology Press. Salway, A. (2010). The computer-based analysis of narrative and multimodality. In E. Ruth (Ed.), New perspectives on narrative and multimodality (Routledge Studies in Multimodality) (pp. 50–64). Routledge. 
https://doi.org/10.4324/9780203869437 Sandoval, G. (2020). In pursuit of a reflexive recording. An epistemic analysis of excavation diaries from the Çatalhöyük Research Project. Norwegian Archaeological Review, 53, 135–153. https:/ /doi.org/10.1080/00293652.2020.1854338 Sandoval, G. (2021). Single-context recording, field interpretation and reflexivity: An analysis of primary data in context sheets. Journal of Field Archaeology, 46(7), 496–512. https://doi.org/ 10.1080/00934690.2021.1926700 Schiffrin, D., Tannen, D., & Hamilton, H. E. (2001a). Introduction. In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The handbook of discourse analysis (1st ed.). Blackwell. https://doi.org/ 10.1002/9780470753460 Schiffrin, D., Tannen, D., & Hamilton, H. E. (Eds.). (2001b). The handbook of discourse analysis (1st ed.). Blackwell. https://doi.org/10.1002/9780470753460 Selbst, A. D., & Barocas, S. (2018). The intuitive appeal of explainable machines. Fordham Law Review, 87, 1085–1139. https://doi.org/10.2139/ssrn.3126971 Shanks, M. (1992). Experiencing the past: On the character of archaeology. Routledge. https:// doi.org/10.4324/9780203973639 Shanks, M., & Svabo, C. (2013). Archaeology and photography: A pragmatology. In A. GonzálezRuibal (Ed.), Reclaiming archaeology: Beyond the tropes of modernity (Archaeological Orientations) (pp. 89–102). Routledge. https://doi.org/10.4324/9780203068632 Shanks, M., & Tilley, C. (1987). Social theory and archaeology. Polity Press. Smiles, S., & Moser, S. (Eds.). (2005). Envisioning the past: Archaeology and the image. Blackwell. https://doi.org/10.1002/9780470774830 Tannen, D., Hamilton, H. E., & Schiffrin, D. (Eds.). (2015). The handbook of discourse analysis (2nd ed.). Wiley. https://doi.org/10.1002/9781118584194 Thomas, J. (2017). Nineteenth-century illustration and the digital. Springer. https://doi.org/ 10.1007/978-3-319-58148-4 Thornton, A. (2018). Archaeologists in print: Publishing for the people. UCL Press. 
https://doi.org/ 10.14324/111.9781787352575 Tilley, C. (1991). Material culture and text. Routledge. https://doi.org/10.4324/9781315746883
8 Extending Discourse Analysis in Archaeology: A Multimodal Approach 189 Topper, D. (1996). Towards an epistemology of scientific illustration. In B. Baigrie (Ed.), Picturing knowledge: Historical and philosophical problems concerning the use of art in science (pp. 215–249). University of Toronto Press. Tringham, R., & Danis, A. (2019). Doing sensory archaeology. In R. Skeates & J. Day (Eds.), The Routledge handbook of sensory archaeology (1st ed., pp. 48–75). Routledge. https://doi.org/ 10.4324/9781315560175-4 Tseronis, A., & Pollaroli, C. (2018). Introduction: Pragmatic insights for multimodal argumentation. International Review of Pragmatics, 10(2), 147–157. https://doi.org/10.1163/1877310901002001 Uggla, K. (2021). Interpreting information visualization. In S. Petersson (Ed.), Digital human sciences: New objects – New approaches (pp. 103–126). Stockholm University Press. https:/ /doi.org/10.16993/bbk/ van den Hoven, P. (2012). The narrator and the interpreter in visual and verbal argumentation. In F. H. van Eemeren & B. Garssen (Eds.), Topical themes in argumentation theory: Twenty exploratory studies (Argumentation Library) (pp. 257–271). Springer. https://doi.org/10.1007/ 978-94-007-4041-9_17 Walker, M. (2020). Wind Hill, Bransdale, North Yorkshire – Archaeological evaluation and watching brief report (AOC Archaeology Group 52051). Archaeology Data Service [distributor]. https://doi.org/10.5284/1085027 Walker, M. (2021). Site data from an archaeological evaluation and watching brief at Wind Hill, Bransdale, North Yorkshire (AOC Archaeology Group). Archaeology Data Service [distributor]. https://doi.org/10.5284/1085027 Weber, W. (2019). Towards a semiotics of data visualization – An inventory of graphic resources. In 2019 23rd international conference information visualisation (IV) (pp. 323–328). https:// doi.org/10.1109/IV.2019.00061 Wevers, M., & Smits, T. (2020). The visual digital turn: Using neural networks to study historical images. 
Digital Scholarship in the Humanities, 35(1), 194–207. https://doi.org/10.1093/llc/ fqy085
Part II Computational Techniques
Chapter 9
Computer Processing of Language: Where Archaeological Discourse and Computers Meet

Patricia Martín-Rodilla

Abstract Archaeological practice produces a vast amount of documentation about our past in the form of archaeological narratives in free-format texts (internal reports, academic publications or dissemination activities). In recent years, this huge amount of unstructured textual documentation has produced an increasing interest in the application of computational processing of natural language as part of archaeological research and practice. Advancing in the understanding, analysis, processing and exploitation of these archaeological narratives by machines requires in-depth training in natural language processing methods and techniques from other areas, such as linguistics, software engineering or artificial intelligence. This chapter provides a brief historical and typological review of the different computational approaches to natural language processing (which will be addressed in subsequent chapters of the volume), and then focuses on the computational processing of archaeological discourse and its possibilities. Archaeological discourse, in the form of free-text narratives, constitutes the expression and reflection of the archaeological knowledge produced. Thus, a computational treatment of discourse is necessary to advance the relation between archaeologists and computers. The chapter presents the different computational approaches adopted at the discourse level and the kinds of applications in archaeology that they make possible.

Keywords Natural language processing · Archaeology · Discourse parsing · Computational discourse analysis

P. Martín-Rodilla
Department of Computer Science and Information Technologies, University of A Coruña, A Coruña, Spain
e-mail: patricia.martin.rodilla@udc.es

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_9
9.1 Introduction

Documenting archaeological practice is one of the most important activities of any archaeologist. The vast amount of raw data and information processed as a result of fieldwork campaigns, qualitative work (surveys, anthropological studies in archaeology, etc.) or quantitative approaches (e.g. archaeometric studies or geographical, network-based or dating analyses, among others) is commonly interpreted and reported in the form of complex and elaborate natural language narratives about these findings, either in reports (the so-called grey literature) or in academic and dissemination publications (scientific journals, books, teaching materials, etc.). The presence of all this knowledge in free-style texts and the need to process it, together with the high degree of techno-mediation of some processes and methodologies in archaeology (Huggett, 2004), have led to an increasing interest among the archaeological community in the computational processing of natural language (especially from textual sources) and its possibilities in archaeology. We can find, for example, large archaeological projects at the European level that study the possibilities of NLP or that apply NLP to specific archaeological contexts and sources, such as ARIADNE (Vlachidis et al., 2017) or, more recently, SEADDA (SEADDA Project, 2020).

With these recent works in mind, the aim of this chapter is twofold. First of all, the chapter gives the reader a non-exhaustive but multifaceted and up-to-date overview of the natural language processing area. It is important to highlight here that natural language processing, as an interdisciplinary area, covers a field of knowledge too vast to address in one book chapter.
Therefore, it is not the goal of this first part of the chapter to cover all the theories, methods or techniques available in NLP, but rather to briefly organize the historical evolution of the discipline and to provide a small typology of the techniques, tools and problems approached with NLP at the computational level. This first part serves as the basis for the rest of the chapters of the volume, which deal in depth with some of these problems and/or methods and their application in archaeology. Thus, this first section of the chapter takes a short historical journey through the discipline of Natural Language Processing (hereafter NLP) and the type of information that we can currently extract and analyze using automatic and semi-automatic NLP methods. This information is extracted at different linguistic levels (lexical, grammatical, etc.), and these levels determine the type of application and the research goals that can be proposed at the archaeological level. This tour also illustrates some real projects that extract linguistic information from archaeological sources.

Secondly, the chapter focuses on the computational processing of archaeological discourse and its possibilities. In recent years, discourse analysis has been integrated into the human-machine relationship, allowing some automation and software assistance in the identification of discursive elements, referenced ontological entities and inferential relationships. Intrinsically narrative domains, such as historical, archaeological, or anthropological studies, need this level of computational treatment
in order to perform a real analysis of the knowledge produced, which is expressed in free texts. This bidirectional relationship between discourse and archaeological knowledge (a transversal idea across this volume) is explored in this chapter from a more computational point of view, addressing techniques, methods and tools for the automatic and semi-automatic analysis of discourse and their application to current and future archaeological research.

9.1.1 A Brief History of NLP

We can situate Natural Language Processing (NLP) as an interdisciplinary study area, since (1) it draws on linguistics, cognitive science, computer science and artificial intelligence, and (2) it is an area clearly defined by its problem orientation: its main objective is to improve the interaction between machines and humans using natural language, with machines capable of understanding, processing, and producing expressions in natural language. This last aspect, the production of natural language by machines, has given rise to a subarea called natural language generation, which is outside the scope of this chapter. We will therefore focus on the relevant advances made in the relation between machines and humans in which humans produce natural language data and machines try to understand and process them.

Although earlier formal linguistic theories clearly influenced the field, the 1950s are often cited as the beginning of the discipline of NLP as we know it today, specifically the work of Alan Turing on the definition of intelligent machines and their relationship with language in Computing Machinery and Intelligence (Turing, 2009) and the work of Noam Chomsky in the book Syntactic Structures in 1957 (Chomsky, 2002), as foundations for the study of the relations between natural language and computers.
At this point, Chomsky created a style of grammar called Phrase-Structure Grammar, which methodically translated natural language sentences into a format usable by computers. Chomsky's subsequent work on generative grammar led to real developments during the 1960s and 1970s, such as SHRDLU (Winograd, 1971), a natural language system that allowed users to give orders to the computer within "blocks worlds" with restricted vocabularies (using memory as contextual information for the system), or ELIZA (Weizenbaum, 1966), a basic conversational simulator written by Joseph Weizenbaum between 1964 and 1966.

It was also at this time, in 1964, that the U.S. National Research Council (NRC) created the Automatic Language Processing Advisory Committee (ALPAC) with the mission of evaluating the progress of NLP research. ALPAC analyzed 12 years of research and $20 million invested in NLP, especially in developments related to machine translation. The ALPAC report (Pierce & Carroll, 1966) was issued in 1966 and constitutes a turning point in NLP research. The report was highly critical of the research done in machine
translation so far and emphasized the need for basic research in the NLP area. Its publication caused the U.S. government to reduce its funding of the topic dramatically, as well as a drastic worldwide drop in NLP research.

In the early 1980s there was a resurgence of NLP studies, thanks to symbolic approaches and some grammatical extensions (Indurkhya & Damerau, 2010). NLP started to become a highly specialized area, with well-defined tasks that divided efforts and obtained better results at specific linguistic levels. For example, the study of reference and coreference problems, the rule-based parsing approach and Rhetorical Structure Theory at the discourse level are important NLP advances of this time (Indurkhya & Damerau, 2010; Lesk, 1986; Mann & Thompson, 1988).

The success of these advances came only in specific contexts or domains, almost as ad hoc applications. To try to overcome this barrier to generalization, the 1990s were dedicated to applying purely grammar-based approaches (later called rule-based approaches). Rule-based NLP solutions were evaluated using statistical methods, with the aim of comparing system performance and achieving a certain degree of generalization in the developments provided. Some quantitative models became popular for some NLP applications, such as N-gram models (Rosenfeld, 2000): an N-gram model predicts the occurrence of a word based on the occurrence of its N-1 previous words in the text. Some of the most relevant successes of purely statistical NLP are in the subfield of automatic translation (Hirschberg & Manning, 2015; Manning & Schutze, 1999), whereas it presents modest results in other specific NLP problems and tasks.
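The N-gram idea described above can be illustrated with a minimal sketch for the case N = 2 (a bigram model), in which the probability of a word is estimated only from the word that precedes it. The toy corpus and the function names below are assumptions made for illustration; they do not come from any project cited in this chapter, and a realistic model would also need smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each word follows each preceding word."""
    following = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        following[prev][word] += 1
    return following


def next_word_probability(model, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    total = sum(model[prev].values())
    return model[prev][word] / total if total else 0.0


# Hypothetical toy corpus, invented for illustration only.
corpus = "the pit cut the ditch and the ditch cut the wall".split()
model = train_bigram_model(corpus)

# "the" is followed by pit, ditch, ditch and wall, so P(ditch | the) = 2/4.
print(next_word_probability(model, "the", "ditch"))
```

The estimate is a plain relative frequency, which is why purely statistical models of this kind depend so heavily on the size and coverage of the corpus they are trained on.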
Seeking to improve NLP in those tasks for which the symbolic, rule-based and statistical approaches did not offer good results, recurrent neural networks such as LSTM models were introduced in 1997 (Khurana et al., 2017), which would mark the beginning of machine learning approaches in NLP (as opposed to rule-based approaches). Currently, advances in representation learning and deep machine learning methods are commonly used in natural language processing, and such methods can achieve state-of-the-art results in many NLP tasks. Thus, we can find today both supervised and unsupervised machine learning approaches to NLP. The supervised approach is based on a training dataset, generally annotated by human experts, which the machine uses to learn generalizations and data patterns and then to recognize them in a larger textual corpus. Supervised methods require the annotation of a small corpus of training documents, a time-consuming task that is nevertheless less labor-intensive than the creation of hand-crafted rule-based systems. The unsupervised approach refers to machine learning without any human intervention in the learning process. Here, probabilistic clustering techniques are employed to create the training dataset (without human annotations) and to obtain an output result, which is subsequently applied to a larger corpus.

This dichotomy between rule-based and machine learning approaches continues today, with different theoretical positions among NLP researchers (Hirschberg & Manning, 2015; Indurkhya & Damerau, 2010). Some advantages of the machine learning approach point to a greater generalization of the solutions achieved and use
of the volume of textual data currently available, with adequate precision. As for the rule-based approach, its advantages include a high degree of customization of the solution and higher accuracy rates in some NLP tasks, as well as no need for previously annotated corpora or heavy computational processing. In recent years, approaches based on green computing, concerned with the impact of computing and its practices on the environment, have warned of the huge amount of energy and resources of all kinds used in training machine learning systems for any task. In this sense, for tasks where the rule-based approach achieves precision similar to that of learning approaches, it can constitute a sustainable alternative for many NLP applications (Strubell et al., 2019).

Currently, research trends include hybrid, sustainable NLP methods that use resources efficiently, with training phases only when necessary, and that analyze the risks, pros and cons of each approach. This hybrid approach mainly affects NLP applications with a large volume of unstructured information (textual or oral) and processing requirements, as is the case for applications in the heritage, cultural and archaeological domains. Note that the applications of all possible NLP approaches to textual, humanistic, cultural, and archaeological domains will be discussed in the following sections of this chapter.

After this historical review of NLP, the next section details the language levels and types of tasks that NLP currently addresses.

9.2 Natural Language: Understanding Levels and Formal Structure for Computational Analysis

As seen in the previous historical tour, NLP has gone through different stages of development at the technological and methodological levels.
Generally, the transition from one stage to another has been marked by a technological advance in information extraction or in information analysis methods. These advances did not necessarily occur within the NLP community; rather, the community adopted general approaches and evaluated them to verify that they offered good results in the various tasks that make up NLP as a discipline. We can therefore identify four current paradigms that coexist in NLP developments:

• Rule-based methods: These involve the production of sets of linguistic symbols or sets of rules and grammars (usually handwritten), relying on the manual encoding of linguistic (and world) knowledge. The rules allow the identification, extraction and analysis of information from text at different linguistic levels, as well as the application of derivation rules that expand the results (Boufaden et al., 2002; Indurkhya & Damerau, 2010; Polanyi et al., 2004).
• Statistical methods: These involve the formalization of the NLP problem as a statistical problem, and the subsequent application of statistical models to confirm or reject the significance of the model's results for the specific NLP task (Manning & Schutze, 1999; Rosenfeld, 2000).
• Learning methods (neural): These involve the application of automatic algorithms that learn how to solve the NLP task at hand. Both supervised and unsupervised learning methods are applied, always relying on the capability of the computer to learn linguistic knowledge from a large volume of linguistic information (in the form of corpora) (Khurana et al., 2017; Liu et al., 2020).
• Hybrid intelligence methods: These involve the injection of deep and structured linguistic knowledge (defined by humans as formal knowledge models, not just annotated texts) into learning models (in machine learning or deep learning approaches) to develop hybrid approaches for NLP tasks. In these hybrid approaches, abstract and structured knowledge from specialists can be used not just as training data to learn uninterpretable black-box models, but also to design the models themselves, making them more transparent, easier for humans to interpret, and more efficient for specific purposes (Gamallo et al., 2020). This paradigm also shares some foundations with cognitive science (Mishra & Bhattacharyya, 2018; Sharp & Delmonte, 2015), in that cognitive information about language (eye-tracking studies, sensor-based studies of brain mechanisms, etc.) is also included in some computational models.

As previously seen, each paradigm has its advantages and disadvantages, as well as implications for technological and project organization decisions, for the characteristics of the results obtained and for sustainability. It is also possible to combine paradigms or to focus on one paradigm for each specific NLP task defined. For more details on this comparison of paradigms, see Chowdhury (2003), Hirschberg and Manning (2015), Indurkhya and Damerau (2010), and Khurana et al. (2017).

But it is not only the technological and methodological paradigm that we must select when faced with an NLP challenge.
Depending on the goal and the kind of information that we need to extract and analyze in each NLP project, we must also decide at which level of natural language processing we want to work. NLP has been subdivided into a typology of tasks (with different levels of complexity and abstraction in their definition), with the aim of solving NLP problems at different linguistic levels. Although there are other categorizations of language levels in linguistics, it is common to use the categorization proposed by Liddy and Feldman (Feldman, 1999; Liddy, 1998) to organize language structurally and then define NLP tasks and challenges within each level. Thus, it is possible to extract meaning from a written text (or spoken language) at seven levels (Feldman, 1999; Liddy, 1998), from lower to higher level of abstraction:

• Phonological level: analysis of pronunciation and prosody, including phoneme recognition or similarity tasks, etc. It is common to combine NLP applications with voice recognition and speech audio technologies in order to apply NLP at this level (Chaudhary et al., 2018).
• Morphological level: analysis of the smallest pieces of language that carry meaning. This includes dealing with stems, suffixes and prefixes or lemmas (Balakrishnan & Lloyd-Yemoh, 2014; Lovins, 1968). Also, determining the part
of speech (POS) of each word is an important task at this level (Indurkhya & Damerau, 2010).
• Lexical level: analysis of the lexical meaning of words and interpretation of parts of speech, such as determining the polarity of a given word in sentiment analysis, determining whether a certain word is a proper name (the NER task, e.g., proper names of persons or places) (Indurkhya & Damerau, 2010), or performing disambiguation at the word level (Lesk, 1986). The lexical level may or may not appear depending on the linguistic classification, with some of its tasks assigned to the semantic level in other classifications.
• Syntactic level: analysis of sentence structure and sentence-based roles. It implies performing a grammatical parse and interpreting the results of that analysis (Indurkhya & Damerau, 2010; Soricut & Marcu, 2003).
• Semantic level: analysis of words in the context of the sentence. Examples include determining polarity in a given context in sentiment analysis, or the Information Extraction (IE) task (Andersen et al., 1992), which extracts specific predefined information from the text, especially triples in the form of objects or subjects and their relationships.
• Discourse level: analysis above the level of the sentence. This allows the analysis of paragraphs or complete documents, trying to extract structural and semantic information by applying discourse analysis methodologies. The relationships between sentences allow causal or argumentative analysis (among others) at the document level. It also includes aspects of discursive intention and coherence (Harris, 1981; Kurdi, 2017; Mann & Thompson, 1988).
• Pragmatic level: analysis of the use of linguistic structures in specific situations, depending on the context of use. It generally requires analyses at some of the previous levels and additional human knowledge that can sometimes be provided to the machine (Kurdi, 2017).
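As a small illustration of the morphological level in the list above, the following sketch strips a few common English suffixes so that related word forms are reduced to a shared stem. It is a deliberately naive, rule-based toy: the suffix list and the minimum stem length are assumptions chosen for this example, and it is not the algorithm of Lovins (1968) or of any other published stemmer.

```python
# Hypothetical suffix list, checked in order; real stemmers use far richer rule sets.
SUFFIXES = ("ions", "ion", "ings", "ing", "ed", "es", "s")


def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix if the remaining stem is long enough."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


words = ["excavations", "excavated", "excavating", "sherds"]
print([strip_suffix(w) for w in words])
```

With this rule set, the three forms of "excavate" collapse to the same stem, which is the kind of normalization that later levels (lexical, semantic) typically rely on.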
Note that the lexical and semantic levels may share or overlap some tasks in other linguistic classifications. In the classification above, the lexical level corresponds to analyses focused on the word itself as the minimal unit, with its internal structure, self-contained meaning and specific lexical typology. As an example, consider this pair of expressions:

1. “Cold pizza drives me crazy”
2. “You have to be crazy to eat one of those pieces of cold pizza”

At the lexical level, we apply strategies word by word. In a sentiment analysis application, for instance, we work with polarity at the word level, where the term “crazy” could carry a negative polarity, although it is clearly positive in the first context and negative in the second. Lexical strategies perform well to the extent that they can generalize and balance the individual values of each unit or word. This holds not only for sentiment analysis, but for any NLP algorithm or strategy that operates on the same lexical basis.

At the semantic level, we include applications that still take the word as the minimal unit but also analyze its context of use, its meaning in that context and the role the word plays in it. Returning to the two sentences above and the sentiment analysis example, we would now have NLP algorithms or strategies based not only on the individual polarity of each element, but also on the role it plays in the specific sentence (for example, if we have information about the expression “to drive someone crazy” and its common use), and we would employ this information to reason about the polarity of whole sentences. As explained for the lexical level, some authors do not make this distinction and work by default at the semantic level, folding the lexical approach into it for some NLP algorithms and strategies. The remaining levels of the classification used here behave in a more standardized way.

With these fundamentals, it is possible to see the discipline of NLP as a matrix in which the rows are the language levels and their subtasks, and the columns are the paradigms of NLP methods and techniques (other, more detailed classifications subdivide both rows and columns, and both could be expanded in the future). Figure 9.1 summarizes the matrix structure that we propose here. In Fig. 9.1, the most relevant NLP tasks are categorized by the Liddy and Feldman levels (Feldman, 1999; Liddy, 1998). Since detailing all possible NLP tasks is beyond the scope of this chapter, we list only the best-known ones, so that the reader has an idea of what type of linguistic information can be extracted and what depth of analysis can be carried out at each level. The plus icon indicates the combinations of paradigm and linguistic level that occur most commonly in the literature (those with more developments and results achieved).
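The contrast between the lexical and the semantic treatment of the two pizza sentences can be sketched in a few lines of Python. This is only a toy under stated assumptions: both lexicons and their polarity values are hypothetical, and a phrase table is used as a crude stand-in for the contextual knowledge that a real semantic-level system would need.

```python
import re

# Hypothetical word-level lexicon: polarity of isolated words (lexical level).
WORD_POLARITY = {"crazy": -1}

# Hypothetical phrase-level lexicon: expressions whose polarity differs from
# that of their parts (a crude stand-in for semantic-level context).
PHRASE_POLARITY = {("drives", "me", "crazy"): +1}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_polarity(tokens):
    """Sum word polarities, ignoring the context in which each word appears."""
    return sum(WORD_POLARITY.get(token, 0) for token in tokens)


def contextual_polarity(tokens):
    """Score known expressions as a whole and fall back to word polarity."""
    score, i = 0, 0
    while i < len(tokens):
        for phrase, polarity in PHRASE_POLARITY.items():
            if tuple(tokens[i : i + len(phrase)]) == phrase:
                score += polarity
                i += len(phrase)
                break
        else:  # no known expression starts at position i
            score += WORD_POLARITY.get(tokens[i], 0)
            i += 1
    return score


s1 = tokenize("Cold pizza drives me crazy")
s2 = tokenize("You have to be crazy to eat one of those pieces of cold pizza")
print(lexical_polarity(s1), contextual_polarity(s1))  # -1 1
print(lexical_polarity(s2), contextual_polarity(s2))  # -1 -1
```

The word-level score is the same for both sentences, while the context-aware score separates them, which is the distinction drawn in the text.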
In Fig. 9.1, an unmarked cell does not mean that no work has been carried out for that combination. Note that the higher the level of linguistic abstraction, the more work has been done on learning and hybrid paradigms, due in part to the complexity of the tasks at those levels. In the combinations of the matrix without a symbol, there are fewer works or more modest results, although NLP development attempts can be found in almost all combinations. The proposed NLP structure does not aim to establish a standard typology that covers every method, technique and paradigm adopted, but rather to summarize the NLP area for readers, who can easily extend the approach if their needs are more specific.

Fig. 9.1 NLP summary matrix structure. NLP paradigms as columns and linguistic levels (and some NLP tasks examples) as rows

9.2.1 An Overview on Natural Language Computational Analysis in Archaeology

Not all of the previous language levels and NLP paradigms have been explored in the specific domain of archaeology. In many cases, it is the archaeologists who have decided what textual information, and at what level, is relevant and useful to extract and analyze for a specific archaeological investigation. NLP paradigms used in archaeology have followed a historical development similar to that of NLP itself, moving from initial rule-based approaches to a gradual exploration of statistical and learning approaches. Recently, an active research community in archaeology has explored the possibilities of the most advanced learning methods on archaeological sources (Alex et al., 2020). Meanwhile, hybrid intelligence approaches (including cognitive approximations to NLP and cognitive-based research questions) are still at an early stage within NLP itself and have not been widely applied in archaeology. It is important to highlight that the implications of adopting any NLP paradigm (i.e., rule-based vs. statistical or learning methods) are also valid for the archaeological domain, or are even slightly
magnified in this field. Specifically, rule-based methods are usually employed in small, ad hoc applications (rules are difficult to generalize), while the other approaches are chosen in larger projects with a greater volume of information (but professionals need to be trained). This is also true for archaeological NLP applications. For example, the rule-based approach is mostly adopted in small applications of NLP in archaeology, since the high variability of textual sources makes such developments hard to generalize. Meanwhile, statistical, learning-based or hybrid approaches require archaeologists to receive specific training in formal, statistical and algorithmic methods (sometimes with a significant learning curve), or require interdisciplinary teams to be set up to carry out these larger projects. In the most up-to-date approaches, other factors may also influence both the choice of NLP paradigm and its results, such as how familiar the researchers who carry out the study are with the texts. The lack of annotated corpora for training supervised methods is an even greater handicap in domains such as archaeology.

Regarding the linguistic levels treated, the motivations and goals of the archaeological community when working at one or more of the previous levels are numerous and varied. We can find, for example, a whole set of applications at the lexical level motivated by the prior development in archaeology of thesauri, domain typologies and controlled vocabularies, which lay an ideal basis for NLP at the morphological and lexical levels (Felicetti, 2017; Vlachidis et al., 2017). As will be seen in later chapters, this has yielded successful results in some applications in archaeology.
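To make the role of controlled vocabularies more concrete, the following Python sketch performs a simple rule-based lookup of vocabulary terms in a text. It is an illustrative sketch only: the `VOCABULARY` entries, their labels and the sample report are invented for this example and do not come from any real archaeological thesaurus.

```python
import re

# Hypothetical controlled vocabulary: term -> category label.
VOCABULARY = {
    "pottery sherd": "ARTEFACT",
    "pottery": "MATERIAL",
    "hillfort": "SITE",
    "rock art": "ART",
}


def annotate(text, vocabulary=VOCABULARY):
    """Return (term, label, start offset) for every vocabulary term in the text."""
    # Longer terms come first so that "pottery sherd" wins over "pottery".
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (match.group(1).lower(), vocabulary[match.group(1).lower()], match.start())
        for match in pattern.finditer(text)
    ]


report = "A pottery sherd was found near the hillfort, close to rock art panels."
for term, label, start in annotate(report):
    print(f"{start:>3}  {label:<9} {term}")
```

A lookup of this kind is easy to build but, as noted above for rule-based methods in general, it generalizes poorly: spelling variants, inflected forms and terms missing from the vocabulary are silently skipped.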
At the syntactic level, applications in archaeology decrease considerably, although the studies carried out help us to detect domain-specific challenges. For example, some initial efforts have been made to integrate grammatical analysis into the inference of fieldwork methodologies from grey literature (Epure et al., 2015). Also, Jeffrey et al. (2011) and Jockers and Underwood (2015) defined some structural challenges for NLP and text mining in the humanities, and the reports of the ARIADNE project on NLP in archaeology already collect challenges that range from the syntactic to the pragmatic level, such as the development of multilingual grammars, the role of negation in archaeology or some special cases of archaeological ontological ambiguity in ontology learning processes (Vlachidis et al., 2017).

The levels that have been developed in the archaeological domain are treated in specific chapters of this volume. Thus, the rest of this chapter is dedicated to the discourse level, while other chapters address the lexical (Chap. 9), syntactic and semantic (Chap. 10) and pragmatic (Chaps. 11 and 12) levels. The previous historical and typological tour has detailed what kinds of problems NLP can solve as a discipline, what subtasks it addresses and what approaches we can use. The rest of the chapter uses those foundations to focus on one of the levels of highest abstraction and interest in the archaeological domain: the discourse level. The following sections detail the work done so far on automatic and semi-automatic discourse processing, why it is interesting and necessary in archaeology, and how it has been applied to archaeological contexts until now.

9.3 Where Archaeological Discourse and Computers Meet

Among many other functions, language acts as a bridge: it connects people, transmitting the most unstructured, complex or instinctive ideas from underlying models. Furthermore, it serves as a pathway between people and machines (e.g., programming languages or semantic models). Natural language processing and text mining techniques have improved this human-machine relationship, both in analysis (with automatic parsers, language recognizers or prediction models) and in natural language production (in the form of chatbots or data-to-text applications). However, many of these advanced models reproduce syntactic and/or semantic textual patterns algorithmically, leaving out information from higher levels of abstraction such as speaker intention, composition and discursive coherence (Martín-Rodilla, 2015). In recent years, discourse analysis has been integrated into the human-machine relationship, allowing us to automate some discursive extraction and to assist, via software, the identification of discursive elements, referenced ontological entities and inferential relationships. Inherently narrative domains, such as archaeology, need software support at the discursive level, mainly due to intrinsic characteristics of the domain such as subjectivity or ambiguity (González-Pérez, 2018).

Let us take an archaeological case study as motivation: why are the discursive level and its computational treatment interesting for archaeology? Let us imagine that we want to carry out an archaeological project that aims to investigate and recover archaeological heritage at sites in ancient rural areas that were flooded for dam construction.
Numerous studies have addressed the villages flooded by the construction of dams in the 1960s and 1970s in the USA, Spain (during Franco’s dictatorship) (del Romero Renau, 2013) or Portugal (Arcà et al., 2001), and the resulting underwater heritage. Given the temporal proximity of the context, abundant grey literature is available (official reports, local newspaper stories, publications about these archaeological sites, etc.). All this textual material contains narratives, produced over time, about the material evidence that we can find (What are the different chronologies involved?), the motivations and opinions about its flooding, conservation or destruction (Should we conserve what was flooded?), and the implications it had for the local population (How do they perceive that heritage now?) (Fig. 9.2).

Fig. 9.2 Example of an archaeological project that illustrates the different levels of NLP information of interest and the absence of discourse-level treatment of the textual sources

In order to answer these research questions, the automatic or semi-automatic analysis of the textual sources allows a systematic treatment of the narratives produced about this archaeological heritage. Using natural language processing (NLP), we can, for example, (1) extract from the reports and publications the types of entities and archaeological findings identified in previous investigations, (2) establish metadata for all the existing documentation so that it can be searched computationally in an optimal way, or (3) run a sentiment analysis on the newspapers to examine the opinions of the local press over time. All these analyses can be performed with NLP working at the lexical, grammatical or semantic level.

However, let us look again at the research questions we have proposed: What are the different chronologies involved? Should we conserve what was flooded? How do local people perceive that heritage now? As we can see in Fig. 9.2, working at these linguistic levels gives us only a partial answer to the first question (that is, the types of archaeological entities that previous scientific reports or publications deal with) or some sentiment analysis results on opinions, but it does not allow us to answer the rest of the questions. Based on what evidence did the archaeologists reach their conclusions regarding the chronologies involved? What discourses support the conservation or destruction of the flooded heritage? What reasons does the local population give for perceiving this heritage as positive or negative? All these sub-questions belong to a deeper field of analysis. It is the discourse-level treatment of the reports, publications and news about these sites that will really answer the research questions raised. Of course, it is possible to carry out all this work without software assistance (not automating any aspect of the analysis process) but, as is evident, the increase in textual sources of information and in formal methods to perform discourse
analysis (connected with the other NLP levels) facilitates the treatment of these sources and the systematization of the work at a methodological level, and improves the replicability and consistency of the study. Therefore, we will be able to find out what material evidence supports the chronology of the flooded fields, the different reasons why a village was chosen for flooding, the different reasons why the local population holds positive or negative opinions about the flooded heritage, etc.

As the example above illustrates, archaeological textual sources (grey literature, publications, dissemination materials, etc.) connect the discourse that archaeologists employ to explain their findings and reasoning with the archaeological knowledge produced inside the text. The vast amount of textual material on archaeological investigations currently available makes the computational approach at all levels (storage and access, extraction, treatment and analysis) a real necessity. At the discourse level, the computational approach allows us to cover many textual sources under the same formal and theoretical representation, comparing different approaches and applying metrics that support deeper analyses of the discursive aspects of interest. In the previous case study, for example, it allows us to analyze causality across all sources, looking for the different reasons why that village was chosen for flooding, or to carry out a contrastive discourse analysis within the opinion analysis. The development of formal methods of representation, automatic treatment and specific NLP algorithms at the discursive level opens the door to a treatment of archaeological textual sources that is much richer and better integrated into the research questions of the archaeological domain.
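The causality analysis mentioned for the case study can be approximated, very roughly, by looking for explicit causal connectives. The Python sketch below is a minimal illustration under strong assumptions: the marker list is hypothetical and far from complete, the sample sentences are invented, and implicit causality (which real discourse-level methods must handle) is ignored altogether.

```python
import re

# Hypothetical list of explicit causal connectives, longest alternatives first.
CAUSAL_MARKERS = ("because of", "because", "due to", "owing to")
MARKER_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(marker) for marker in CAUSAL_MARKERS) + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def causal_pairs(text):
    """Return one effect/marker/cause record per sentence with a causal marker."""
    pairs = []
    for sentence in split_sentences(text):
        match = MARKER_PATTERN.search(sentence)
        if match:
            pairs.append({
                "effect": sentence[: match.start()].strip(" ,"),
                "marker": match.group(1).lower(),
                "cause": sentence[match.end():].strip(" ,."),
            })
    return pairs


# Invented sample text, not taken from any real report.
sample = (
    "The village was flooded because the dam required a larger reservoir. "
    "Residents were relocated in 1965. "
    "The church was moved uphill due to its historical value."
)
for pair in causal_pairs(sample):
    print(pair)
```

Applied to many sources, records of this kind could be grouped by effect to compare the reasons that different texts give for the same event, although a sketch this simple assumes that the effect always precedes the marker.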
Despite these advantages, the use of semi-automatic or automatic treatment of textual sources in archaeology for analysis at the discursive level is still residual. The next section goes through the area of computational discourse analysis and then details the works that have used it in archaeological contexts, where more possible applications can be seen.

9.3.1 Computational Analysis of Discourse

The meaning and references of the term “discourse” have changed over time, and its overlaps with terms such as speech, text or context have been studied (Gordon, 2009; Kurdi, 2017). Currently, we can find at least two interdisciplinary approaches that deal with the concept of discourse, considering the linguistic phenomenon and its content in different ways. Firstly, the legacy of linguistic, semantic and communicative studies allows us to define discourse as the underlying conceptualization in a communicative act (spoken or written), which has domain-specific vocabularies and structural elements. Secondly, discourse can be defined as a linguistic construct made up of statements, allowing the discourse creator to assign “meaning to words and to communicate repeatable semantic relations to, between, and among the statements, objects, or subjects of the discourse” (Foucault & Kremer-Marietti, 1969).
We can therefore identify in these two approaches the objects of discourse analysis (writing, conversation, any communicative event), a discipline that subdivides discourse into sub-elements (at different levels) and formalizes them, either attending only to linguistic criteria or expanding the concept to include socio-psychological aspects of the authors. Thus, discourse analysis constitutes a methodology of textual (or oral) analysis based on subdividing discourse into lower-level elements (i.e., sentences) and characterizing them, in order to analyze the meaning of the elements and the internal connections between them (Harris, 1981). Semi-automatic and automatic NLP work is possible under both approaches to discourse analysis as a methodology, although with differences in complexity and applications. In this chapter we review existing computational approaches to discourse analysis focusing on the linguistic approach, including some extensions into intention, coherence and other formal metrics that already work at the discourse level (Kurdi, 2017). Thus, the review does not include conceptualizations of discourse around ideas, beliefs or relations between knowledge and power (Foucault & Kremer-Marietti, 1969; Lessa, 2006), because of the reduced number of computational implementations of them. However, computational applications of discourse analysis are expanding steadily and influencing areas such as argument mining. More information about sociological and philosophical aspects of discourse, and some computational approximations to them, can be found in other chapters of this volume (e.g., Chaps. 1 and 3).

At a computational level, discourse structure has received increasing attention in recent years due to the benefits that its application offers in some NLP tasks, such as automatic summarization or question answering (Atutxa et al., 2019).
Some discursive phenomena, such as topic modelling or anaphora, are more advanced in terms of computational approaches (Kurdi, 2017; Wiseman et al., 2016). However, the most important aspect to deal with in any computational approach to textual sources is the formalization of the discourse itself. Formalization enables the subsequent recognition, extraction and application of computational methods in textual analysis. The first formalizations organized discourse around units called utterances. These minimal units of discourse were connected to each other both logically and topologically (Kurdi, 2017). The subsequent development of this seminal idea gave rise to formalizations of discursive structures that enabled computational advances in discourse analysis. Hobbs’s formalization constitutes the first formal discourse theory that uses a tree as the underlying text structure (Hobbs, 1985), and current applications based on Hobbs’s approach can still be found (Dutta et al., 2008). Computational approaches at the discourse level became widespread later, with the formulation of Rhetorical Structure Theory (hereafter RST), a theory of discourse structure formalized by Mann and Thompson (1988). RST internally represents the structure of any discourse as a tree of discourse units, which are related to each other by rhetorical criteria (analyzing content at a functional and semantic level within the discourse elements and their relationships) (Marcu, 2000). Given a textual source, we can therefore obtain the underlying discourse tree, which records the discursive elements, whether they are central (nuclei) or peripheral (satellites) within the discourse, and whether they are related at a causal, contrastive or elaborative level, among others (Taboada & Mann, 2006). RST is taken as the base formalization by almost all computational approaches that currently work at the discursive linguistic level, although some approaches adapt RST or use different computational techniques for their parsers (Kurdi, 2017).

Marcu (2000) defined the first method for constructing discourse parsers. This method involves a first segmentation of the discourse and a subsequent construction of the RST discourse tree. Current parsers follow this methodology, although they build the tree with different methods, either through rule-based NLP (Boufaden et al., 2002; Polanyi et al., 2004; Soricut & Marcu, 2003) or using statistical or machine learning approaches (Heilman & Sagae, 2015; Joty et al., 2013; Li et al., 2014), and they explore different source languages (Liu et al., 2020). Most of the existing parsers are high-quality developments in terms of internal formalization and offer real automation results from textual sources. However, because they were developed within research projects without continued funding, many of them are hard to access or present steep learning curves (and are therefore also difficult to modify and improve through research). This could be one of the reasons why this type of development is little known outside the academic environment or outside the field of computational linguistics.

In order to promote research and application of natural language processing at the discourse level, numerous computational resources have been developed in recent years, especially corpora annotated using RST.
These annotated corpora act as a gold standard for evaluating algorithms, methods or novel techniques, as well as a source of training data when new algorithms based on supervised methods are developed. RST-based treebanks and annotated corpora are available in different languages (Cao, 2018; Cao et al., 2018; Das & Stede, 2018; Iruskieta et al., 2013; Mann & Taboada, 2005–2021).

Using these parsers as an extraction basis, automatic or semi-automatic discourse analysis has been applied in diverse domains (in addition to its use to improve NLP tasks), such as medicine (Paulino et al., 2018), media (in fake news detection or hate speech recognition, among others) (Fortuna & Nunes, 2018; Karimi & Tang, 2019; Kolhatkar & Taboada, 2017) or the legal domain (Gamallo et al., 2019; Kurdi, 2017; Moens et al., 2007). Another area of complementary development is that of software strategies for displaying information from discourse analysis, along two sub-lines: (1) discourse analysis carried out manually by experts and their collective annotation, for which Martin-Rodilla and Sánchez (2020) present an extensive review of applications, and (2) information visualization techniques for displaying the results of the parsers (Martin-Rodilla & Sánchez, 2020; Zhao et al., 2012).

Apart from RST, some structural theories of discourse developed in recent years use graphs as the underlying formalization (Radev, 2000; Webber, 2004; Webber & Joshi, 2012). A complete review of the current state of RST application, and a comparison with existing graph-based approaches to discourse automation, can be found in Hou et al. (2020).
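The nucleus and satellite structure of an RST tree, and the two-step scheme of segmentation followed by tree construction, can be sketched as follows in Python. This is a toy under stated assumptions, not a reimplementation of any parser cited above: the `Node` class, the `MARKERS` table and its relation labels are hypothetical, and the segmenter only handles a single connective placed in the middle of a sentence.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical connective-to-relation table; real parsers use far richer cues.
MARKERS = {"because": "cause", "although": "concession"}
MARKER_PATTERN = re.compile(r"\b(" + "|".join(MARKERS) + r")\b", re.IGNORECASE)


@dataclass
class Node:
    """A discourse unit: a leaf with text, or a span with child units."""
    role: str                       # "nucleus", "satellite" or "span"
    relation: Optional[str] = None  # relation that a satellite holds to its nucleus
    text: Optional[str] = None      # only set on leaves
    children: List["Node"] = field(default_factory=list)


def segment(sentence: str) -> Tuple[List[str], Optional[str]]:
    """Step 1: split the sentence into discourse units at the first marker."""
    match = MARKER_PATTERN.search(sentence)
    if match is None:
        return [sentence.strip(" .,")], None
    before = sentence[: match.start()].strip(" .,")
    after = sentence[match.end():].strip(" .,")
    return [before, after], MARKERS[match.group(1).lower()]


def build_tree(sentence: str) -> Node:
    """Step 2: attach the unit after the marker as a satellite of the first unit."""
    units, relation = segment(sentence)
    if relation is None:
        return Node(role="nucleus", text=units[0])
    nucleus = Node(role="nucleus", text=units[0])
    satellite = Node(role="satellite", relation=relation, text=units[1])
    return Node(role="span", children=[nucleus, satellite])


tree = build_tree("The site was abandoned because the valley was flooded.")
for child in tree.children:
    print(child.role, child.relation, repr(child.text))
```

Even in this reduced form, the output already separates the central claim (the nucleus) from the material that supports it (the satellite), which is the kind of information that the discourse-level questions of the case study require.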
In summary, the computational approaches that allow an automatic analysis of discursive structures from textual sources have changed greatly in recent years. Theories such as RST or the novel graph-based approaches offer the formalization needed to create parsers that automate discourse analysis of textual sources. The existence of complementary software tools, such as editors that let experts annotate texts, treebanks that provide large collections of analyzed texts, and advances in the visualization of results, allows discourse analysis at the linguistic level to be applied to a wide variety of domains. However, the limited accessibility of the tools, the steep learning curve of the methods and tools, and the need for interdisciplinary teams in this type of analysis constitute barriers to its wider application. The following section briefly details the efforts made in this regard in the archaeological context.

9.3.2 Applications in Archaeological Discourse

Although researchers' study of archaeological discourse from a methodological perspective is a constant in archaeological research, the application of computational methods that allow its formalization, software assistance or even automation is still residual. At the topic modeling level, there are initial applications with existing algorithms (Borgo Ton, 2019), but they do not formalize aspects of discourse from the base textual sources. At a formal level (which, as seen previously, is necessary to advance in the application of automatic parsing and deep computational analysis), the first formalizations of archaeological discourse can be found in the works of Gardin (1980), with ramifications in current applications (Dallas, 2016; Moscati, 2016).
These works are the basis for the application of computational discourse analysis in archaeology, although they constitute theoretical frameworks not fully implemented in computational algorithms. The same is true of some more advanced efforts to recover the linguistic aspect of archaeological discourse with formalizations based on Hobbs or RST (Martín-Rodilla, 2015, 2018; Martín-Rodilla & González-Pérez, 2014). Also, empirical studies with professionals in archaeology and nearby areas of knowledge show greater satisfaction when these methods take the form of software assistance or semi-automation (Martin-Rodilla, 2018; Rodilla & González-Pérez, 2017), and conclude that some human know-how and interdisciplinary teams are necessary to undertake more ambitious efforts in computational discourse analysis in archaeology (González-Pérez, 2021).

In summary, the archaeological domain has not yet fully tested the advances in automatic and semi-automatic discourse analysis detailed above, either because those advances do not leave the academic-linguistic environment or because of an excessive fear of automation within archaeology. Nevertheless, there are some promising formalizations and studies in this area that could, in the future, combine the advances detailed in this chapter with real applications in archaeological contexts.
9.4 Computer Processing of Language and Discourse in Archaeology: Current and Future Perspectives

In summary, this chapter consists of two separate thematic parts. Firstly, it offers an extensive tour through the history of natural language processing as a discipline, its paradigms (the most common methods and tasks) and the linguistic levels at which NLP can work. In addition, the current situation of NLP in archaeology is briefly contextualized (a more in-depth study is provided in subsequent chapters). At the NLP paradigm level, rule-based approaches are well established, and learning-based approaches are currently expanding. Future steps possibly include more attempts at hybrid intelligence approaches in archaeology. Recent projects pay more attention to cognitive aspects of the discipline, such as XSCAPE, recently awarded in the ERC Synergy Grant 2020 call (Incipit-CSIC, 2020), which investigates whether the material structures of archaeological settlements, buildings, roads and artefacts actively change brain and mind patterns of thought and attention. This could also have ramifications for the language and discourse that archaeologists produce in their textual narratives, and it gives us an idea of the future importance of cognitive studies in archaeology. This trend in cognitive studies will probably also advance the application of hybrid NLP and cognitive approaches to textual sources in archaeology, but this is only a prediction that remains to be fulfilled.

In the second part, the linguistic level of discourse and its computational possibilities in archaeology are specifically addressed. As has been seen, the discursive level is fundamental from a theoretical point of view, since it deals with vital aspects of archaeological discourse such as causality, negation, subjectivity or ambiguity, among others. In recent decades, there have been important advances in computational formalization and NLP applications at the discourse level. However, software assistance and NLP approaches at the discourse level are still residual in archaeology. Current and future advances in representation, extraction and analysis techniques at the discursive level will allow the approach to be generalized and possibly applied systematically to textual sources in archaeology.

Acknowledgments This research has received financial support from the Saving European Archaeology from the Digital Dark Age (SEADDA) 2019-2023 COST ACTION CA 18128 and Xunta de Galicia (“Consellería de Cultura, Educación e Universidade”) and the ERDF (“Centro Singular de Investigación de Galicia” accreditation ED431G 2019/01).

References

Alex, B., Kramer, I. C., Verschoof-van der Vaart, W. B., Orengo, H. A., Garcia-Molsosa, A., & Conesa, F. C. (2020). Machine learning in archaeological research; challenges and opportunities. Session 5 at 48th computer applications and quantitative methods in archaeology (CAA) conference, Oxford, UK.
Andersen, P. M., Hayes, P. J., Weinstein, S. P., Huettner, A. K., Schmandt, L. M., & Nirenburg, I. (1992). Automatic extraction of facts from press releases to generate news stories. In Third conference on applied natural language processing, pp. 170–177.
Arcà, A., Bednarik, R. G., Fossati, A., Jaffe, L., & Abreu, M. S. (2001). Damned dams again: The plight of Portuguese rock art. Rock Art Research, 18, i–iv.
Atutxa, A., Bengoetxea, K., de Ilarraza, A. D., & Iruskieta, M. (2019). Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool. Plos One, 14(9), e0221639.
Balakrishnan, V., & Lloyd-Yemoh, E. (2014). Stemming and lemmatization: A comparison of retrieval performances. Lecture Notes on Software Engineering, 2, 262–267.
Borgo Ton, M. (2019). Magic lantern shows through a macroscopic lens: Topic modelling and mapping as methods for media archaeology. Early Popular Visual Culture, 17(3–4), 341–360.
Boufaden, N., Lapalme, G., & Bengio, Y. (2002). Segmentation en thèmes de conversations téléphoniques: traitement en amont pour l’extraction d’information. In Actes de la 9ème conférence sur le Traitement Automatique des Langues Naturelles (TALN) 2002.
Cao, S. (2018). Elaboration of a RST Chinese treebank. http://hdl.handle.net/10810/26206
Cao, S., da Cunha, I., & Iruskieta, M. (2018). The RST Spanish-Chinese treebank. In Proceedings of the joint workshop on linguistic annotation, multiword expressions and constructions (LAWMWE-CxG-2018), pp. 156–166.
Chaudhary, A., Zhou, C., Levin, L., Neubig, G., Mortensen, D. R., & Carbonell, J. G. (2018). Adapting word embeddings to new languages with morphological and phonological subword representations. arXiv preprint arXiv:1808.09500.
Chomsky, N. (2002). Syntactic structures. Walter de Gruyter.
Chowdhury, G. G. (2003). Natural language processing. Annual Review of Information Science and Technology, 37(1), 51–89.
Dallas, C. (2016). Jean-Claude Gardin on archaeological data, representation and knowledge: implications for digital archaeology. Journal of Archaeological Method and Theory, 23(1), 305–330.
Das, D., & Stede, M. (2018). Developing the bangla RST discourse treebank. In Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018).
del Romero Renau, L. (2013). La construcción de sociedades hidráulicas: El caso de España y del Oeste de EE. UU. Cuadernos de geografía, 93, 53–77.
Dutta, K., Prakash, N., & Kaushik, S. (2008). Resolving pronominal anaphora in hindi using hobbs algorithm. Web Journal of Formal Computation and Cognitive Linguistics, 1(10), 5607–5611.
Epure, E. V., Martín-Rodilla, P., Hug, C., Deneckère, R., & Salinesi, C. (2015). Automatic process model discovery from textual methodologies. In 2015 IEEE 9th international conference on research challenges in information science (RCIS), pp. 19–30.
Feldman, S. (1999). NLP meets the Jabberwocky: Natural language processing in information retrieval. Online-Weston Then Wilton, 23, 62–73.
Felicetti, A. (2017). Teaching archaeology to machines: Extracting semantic knowledge from free text excavation reports. Digital Humanities, p. 9.
Fortuna, P., & Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4), 1–30.
Foucault, M., & Kremer-Marietti, A. (1969). L’archéologie du savoir (Vol. 1). Gallimard.
Gamallo, P., Martín-Rodilla, P., & Calderón, B. (2019). Identifying causal relations in legal documents with dependency syntactic analysis. In 8th symposium on languages, applications and technologies (SLATE 2019).
Gamallo, P., García, M., Martin-Rodilla, P., & Pereira-Farina, M. (2020). Workshop on hybrid intelligence for natural language processing tasks (co-located at ECAI-2020). March 2021. Available at https://hi4nlp.pages.citius.usc.es/
Gardin, J. C. (1980). Archaeological constructs: an aspect of theoretical archaeology. Cambridge University Press.
González-Pérez, C. (2018). Information modelling for archaeology and anthropology. Software engineering principles for cultural heritage. Springer.
9 Computer Processing of Language: Where Archaeological Discourse. . . 211 González-Pérez, C. (2021). Heritage 3.0 project: Argumentation and conceptual modelling for enhanced cultural heritage participation and management policies. Grant PID2020114758RB-I00 Founder and prescriptor: Spanish NAtional Agency for Research Funding (Agencia Estatal de Investigación). Available at http://www.incipit.csic.es/en/project/acme Gordon, C. (2009). Making meanings, creating family: Intertextuality and framing in family interaction. OUP. Harris, Z. S. (1981). Discourse analysis. In Papers on syntax (pp. 107–142). Springer. Heilman, M., & Sagae, K. (2015). Fast rhetorical structure theory discourse parsing. arXiv preprint arXiv:1505.02425. Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261–266. Hobbs, J. R. (1985). On the coherence and structure of discourse. CSLI Publications. Hou, S., Zhang, S., & Fei, C. (2020). Rhetorical structure theory: A comprehensive review of theory, parsing methods and applications. Expert Systems with Applications, 157, 113421. Huggett, J. (2004). Archaeology and the new technological fetishism. Archeologia e Calcolatori, 15, 81–92. Incipit-CSIC. (2020). XSCAPE Material Minds Project (ERC-2020-SyG 951631 – XSCAPE). 08/03/2021; Available at http://www.incipit.csic.es/en/project/xscape Indurkhya, N., & Damerau, F. J. (2010). Handbook of natural language processing (Vol. 2). CRC Press. Iruskieta, M., Aranzabe, M. J., Diaz de Ilarraza, A., Gonzalez, I., Lersundi, M., & Lopez de Lacalle, O. (2013). The RST Basque TreeBank: an online search interface to check rhetorical relations. In 4th workshop RST and discourse studies 2013, pp. 40–49. Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S., Zhang, Z., & Austin, A. (2011). When ontology and reality collide: The Archaeotools project, faceted classification and natural language processing in an archaeological context. 
In Proceedings of the 36th international conference, Budapest, 2–6 April 2008, pp. 285–290. Jockers, M. L., & Underwood, T. (2015). Text-mining the humanities. In A new companion to digital humanities (pp. 291–306). Wiley. Joty, S., Carenini, G., Ng, R., & Mehdad, Y. (2013). Combining intra-and multi-sentential rhetorical parsing for document-level discourse analysis. In Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Volume 1: Long papers), pp. 486– 496. Karimi, H., & Tang, J. (2019). Learning hierarchical discourse-level structure for fake news detection. arXiv preprint arXiv:1903.07389. Khurana, D., Koli, A., Khatter, K., & Singh, S. (2017). Natural language processing: State of the art, current trends and challenges. arXiv preprint arXiv:1708.05148. Kolhatkar, V., & Taboada, M. (2017). Constructive language in news comments. In Proceedings of the first workshop on abusive language online, pp. 11–17. Kurdi, M. Z. (2017). Natural language processing and computational linguistics 2: semantics, discourse and applications (Vol. 2). Wiley. Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on systems documentation (pp. 24–26). Association for Computing Machinery. Lessa, I. (2006). Discursive struggles within social welfare: Restaging teen motherhood. British Journal of Social Work, 36(2), 283–298. Li, J., Li, R., & Hovy, E. (2014). Recursive deep models for discourse parsing. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 2061–2069, . Liddy, E. D. (1998). Enhanced text retrieval using natural language processing. Bulletin of the American Society for Information Science and Technology, 24(4), 14–16. Liu, Z., Shi, K., & Chen, N. F. (2020). Multilingual neural RST discourse parsing. arXiv preprint arXiv:2012.01704.
Lovins, J. B. (1968). Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1–2), 22–31.
Mann, W. C., & Taboada, M. (2005–2021). RST tools for analysts. [12/03/2021]; Available at https://www.sfu.ca/rst/06tools/index.html
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
Manning, C., & Schutze, H. (1999). Foundations of statistical natural language processing. MIT Press.
Marcu, D. (2000). The theory and practice of discourse parsing and summarization. MIT Press.
Martín-Rodilla, P. (2015). An empirical approach to the analysis of archaeological discourse. In Across Space and Time. Papers from the 41st Conference on Computer Applications and Quantitative Methods in Archaeology, Perth 25–28, March 2013 (vol. 319). https://doi.org/10.5117/9789089647153
Martin-Rodilla, P. (2018). Digging into software knowledge generation in cultural heritage. Springer.
Martín-Rodilla, P., & Gonzalez-Perez, C. (2014). An ISO/IEC 24744-derived modelling language for discourse analysis. In 2014 IEEE eighth international conference on research challenges in information science (RCIS), pp. 1–10.
Martin-Rodilla, P., & Sánchez, M. (2020). Software support for discourse-based textual information analysis: A systematic literature review and software guidelines in practice. Information, 11(5), 256.
Mishra, A., & Bhattacharyya, P. (2018). Cognitively inspired natural language processing: An investigation based on eye-tracking. Springer.
Moens, M.-F., Boiy, E., Palau, R. M., & Reed, C. (2007). Automatic detection of arguments in legal texts. In Proceedings of the 11th international conference on artificial intelligence and law (pp. 225–230). Association for Computing Machinery.
Moscati, P. (2016). Jean-Claude Gardin and the evolution of archaeological computing. Les nouvelles de l’archéologie, 144, 10–13.
Paulino, A., Sierra, G., Hernández-Domínguez, L., da Cunha, I., & Bel-Enguix, G. (2018). Rhetorical relations in the speech of Alzheimer’s patients and healthy elderly subjects: An approach from the RST. Computación y Sistemas, 22(3), 895–905. Pierce, J. R., & Carroll, J. B. (1966). Language and machines: Computers in translation and linguistics (ALPAC report). National Academy of Sciences/National Research Council. Polanyi, L., Culy, C., Van Den Berg, M., Thione, G. L., & Ahn, D. (2004). A rule based approach to discourse parsing. In Proceedings of the 5th SIGdial workshop on discourse and dialogue at HLT-NAACL 2004, pp. 108–117. Radev, D. (2000). A common theory of information fusion from multiple text sources step one: Cross-document structure. In 1st SIGdial workshop on discourse and dialogue, pp. 74–83. Rodilla, P. M., & González-Pérez, C. (2017). A modelling language for discourse analysis in humanities: Definition, design, validation and first experiences. Revista de Humanidades Digitales, 1, 368–378. Rosenfeld, R. (2000). Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 88(8), 1270–1278. SEADDA Project. (2020). SEADDA ACTION COST CA18128 – Saving European archaeology from the digital dark age 08/03/2021; Available at https://www.seadda.eu/ Sharp, B., & Delmonte, R. (2015). Natural language processing and cognitive science. De Gruyter. Soricut, R., & Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical information. In Proceedings of the 2003 human language technology conference of the North American chapter of the Association for Computational Linguistics, pp. 228–235. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243. Taboada, M., & Mann, W. C. (2006). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459.
Turing, A. M. (2009). Computing machinery and intelligence. In Parsing the Turing test (pp. 23–65). Springer.
Vlachidis, A., Tudhope, D., Wansleeben, M., Azzopardi, J., Green, K., Xia, L., & Wright, H. (2017). D16.4: Final report on natural language processing. Ariadne.
Webber, B. (2004). D-LTAG: Extending lexicalized TAG to discourse. Cognitive Science, 28(5), 751–779.
Webber, B., & Joshi, A. (2012). Discourse structure and computation: Past, present and future. In Proceedings of the ACL-2012 special workshop on rediscovering 50 years of discoveries, pp. 42–54.
Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.
Winograd, T. (1971). Procedures as a representation for data in a computer program for understanding natural language. Massachusetts Institute of Technology Cambridge Project Mac.
Wiseman, S., Rush, A. M., & Shieber, S. M. (2016). Learning global features for coreference resolution. arXiv preprint arXiv:1604.03035.
Zhao, J., Chevalier, F., Collins, C., & Balakrishnan, R. (2012). Facilitating discourse analysis with interactive visualization. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2639–2648.
Chapter 10
NLP and Archaeology: A View from a Digital Archive

Holly Wright, Tim N. L. Evans, and Katie Green

H. Wright (✉) · T. N. L. Evans · K. Green
Archaeology Data Service, University of York, York, UK
e-mail: holly.wright@york.ac.uk; tim.evans@york.ac.uk; katie.green@york.ac.uk

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_10

Abstract The Archaeology Data Service (ADS) has been experimenting with Natural Language Processing (NLP) methodologies for over 12 years. As an accredited digital repository, the focus has been to explore how NLP techniques can be used to augment any basic digital object’s metadata and to begin to facilitate increased human and machine access. The words used within the ADS archive catalogue and archaeological reports have added value; they provide detail, context and understanding, but conversely, they can also be ambiguous. The NLP techniques studied go beyond allowing a user to search a PDF file, to building a classification for the user, and then continuing to improve the rules behind the method(s). While these experiments solidified our view that NLP has an important role to play in our core services, our ability to implement them in a robust way has remained elusive. This chapter presents our journey from an archaeological perspective, in the hope that it will be useful to researchers who wish to engage with NLP methodologies in Social Sciences and Humanities, while also giving the point of view of a trusted digital repository. It also reports ADS efforts to implement NLP within our collections, discussing why this remains elusive and the challenges ahead.

Keywords Archaeology data service · Natural language processing · Named entity recognition · Archive · Metadata

10.1 Introduction

The Archaeology Data Service (ADS) has been experimenting with Natural Language Processing (NLP) technologies and methodologies for over 12 years. As an accredited digital repository that actively curates over two million digital objects, the focus of the NLP work at the ADS has been to explore how these techniques can
be used to augment the basic metadata that describes a digital object, and to begin to facilitate increased human and machine access. In simple terms, the words used within the ADS archive catalogue and archaeological reports have added value; they provide detail, context and understanding, but conversely, they can also be ambiguous. The NLP techniques studied go beyond allowing a user to search a PDF file (for example) for matching words and phrases, to building a classification for the user, and then continuing to improve and refine the rules behind the method(s). While these experiments solidified our view that NLP has an important role to play in our core services, our ability to implement them in a robust way has remained elusive. This chapter will present our journey from an archaeological domain perspective rather than a technical perspective, in the hope that it will be useful to researchers who wish to engage with NLP methodologies using Social Sciences and Humanities data, while also giving a user needs perspective from the point of view of a trusted digital repository and its users. This chapter sets out the history of the many directions in which the ADS has attempted to implement NLP functionality within our collections, discusses why it remains elusive, and outlines the challenges we hope to address in future.

10.2 Archaeology and Unpublished Fieldwork Reports

Countries like the UK, with a large development-led archaeology sector, have a problem. Every time a developer or a government department decides to undertake a project that may result in the destruction of an archaeological resource, they must hire a commercial field unit to assess, and potentially ‘offset’, the loss of the archaeological resource in a way that advances understanding and provides public benefit (Thomas, 2019). In most instances, this results in a single synthetic output: the unpublished fieldwork report (also referred to as grey literature).
This report is meant to satisfy the requirements of the local government authority, by documenting and describing what was found, and what its significance might be. These reports are often quite mundane but form a critical part of the corpus archaeologists must consult prior to undertaking any work nearby, or for academic research focussed beyond the site level (Fulford & Holbrook, 2018). As such, archaeologists must access these reports, to understand any prior archaeological interventions and include them in their planning (Evans, 2015). This used to mean travel to local authority offices to consult the lone paper copy of the report, resulting in time and expense impacting the already tight profit margins to which most archaeologists must work (Bradley, 2006). To mitigate this in the UK, the OASIS system (https://oasis.ac.uk/) was developed to automate these compliance procedures, where practitioners must provide information about their investigations to local Historic Environment Records (HERs) or national heritage bodies in digital form (Richards & Hardman, 2008). The ADS was able to add digital preservation and dissemination of these reports to this workflow, resulting in over 62,000 reports now freely available online through a designated interface called
the ADS Library of Unpublished Fieldwork Reports (n.d.), latterly incorporated within a larger application called the ADS Library (https://archaeologydataservice.ac.uk/library/). To understand the impact of the open dissemination of these reports in digital form, an economic assessment was undertaken, resulting in the primary conclusion that a significant number of commercial archaeological field units in the UK now make considerable use of ADS held reports within their costing and business models, and the resultant savings are now a critical part of their commercial workflows (Beagrie & Houghton, 2013).

10.3 Metadata Challenges

As important as this resource is to most archaeological work in the UK, the actual content within unpublished fieldwork reports is notoriously difficult to access. While most reports from the 2000s to the present were “born digital” and created using a word processing program, HERs continue to digitise the report backlogs, many of which are scanned from typewritten pages which OCR software still struggles to read. The purpose of these reports is to inform the description and classification of the individual heritage asset, so the creation of resource discovery metadata to allow their content to be searched alongside other reports has become an ongoing challenge for the ADS. When Natural Language Processing (NLP) as a form of automated metadata extraction first came on the scene, ADS staff quickly saw a potential solution they were eager to explore. Use of controlled vocabularies was not originally a major feature of OASIS, but as time went on the value of their incorporation became obvious. Work was undertaken to standardise controlled vocabularies for the heritage sector in Great Britain (Binding & Tudhope, 2016) along with international standards relevant to archaeology, such as the Getty Art and Architecture Thesaurus and the Getty Thesaurus of Geographical Names (Cobb, 2015).
The possibilities for incorporating Named Entity Recognition (NER), in combination with NLP, might bring not only greater Findability and Accessibility, but also Interoperability; the F, A and I in the FAIR Principles (Wilkinson et al., 2016), to the Library of Unpublished Fieldwork Reports. 10.4 The Archaeotools Project In 2007 the ADS made its first attempt to use NLP to address this challenge. In partnership with the Natural Language Processing (NLP) Research Group at the University of Sheffield the ADS undertook the Archaeotools project, funded under the UK Arts and Humanities e-Science Initiative. For perspective, at the start of Archaeotools the Library of Unpublished Fieldwork Reports totalled around 2300 with a growth rate of 50–100 each month (Jeffrey et al., 2009), and even at that
point the challenges in creating robust metadata were considered significant enough to look to NLP for help. Over 2 years, the Archaeotools project worked to implement Information Extraction (IE) over a corpus of around 1000 unstructured Unpublished Fieldwork Reports, using very simple NER to map the results to subject (What), location (Where) or temporal designation (When) and to Dublin Core (DC) entities for publication information:

• Subject (topics covered, findings mentioned) mapped to What
• Location (place names related to events and findings) mapped to Where
• Temporal (temporal information related to findings) mapped to When
• Grid reference mapped to Where
• Report title, creator, publisher, publisher contact, publication date mapped to DC
• Event dates mapped to DC
• Bibliography and references mapped to DC

Archaeotools applied both a knowledge engineering (KE) approach and an automatic training (AT) approach to the ADS Library of Unpublished Fieldwork Reports. For the parts of the data that appeared in standardised contexts, such as the title of the report, the KE approach was applied. For the heterogeneous and irregular data, such as placenames and subjects, both approaches were then combined for IE. This produced mixed results. For example, when archaeologists discuss a site, they invariably also discuss other sites that are relevant to the site under investigation. This caused problems for Archaeotools, as the NLP was unable to distinguish between them and determine Where the site was located amongst all the extracted placenames. This was addressed by only returning the placenames found in the summary or, where there was no summary, in the first 10% of the document. Even so, the correct placename could not be identified in 162 out of 960 reports (Richards et al., 2011), which speaks to the lack of structure in archaeological reports generally.
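The placename restriction described above can be illustrated with a short sketch. This is not the Archaeotools implementation: the function names are hypothetical, the placenames are assumed to have been extracted already by some NER step, and "the first 10% of the document" is assumed here to be measured in characters.

```python
from typing import List, Optional


def placename_search_region(report_text: str,
                            summary: Optional[str] = None,
                            fraction: float = 0.10) -> str:
    """Return the part of a report in which site placenames are trusted.

    Assumption: if a non-empty summary exists it is used on its own;
    otherwise the leading `fraction` of the report (by characters) is used.
    """
    if summary and summary.strip():
        return summary
    cutoff = max(1, int(len(report_text) * fraction))
    return report_text[:cutoff]


def candidate_site_placenames(extracted: List[str],
                              report_text: str,
                              summary: Optional[str] = None) -> List[str]:
    """Keep only extracted placenames that occur in the trusted region."""
    region = placename_search_region(report_text, summary)
    kept: List[str] = []
    for name in extracted:
        # Simple substring test; duplicates are dropped, order is preserved.
        if name in region and name not in kept:
            kept.append(name)
    return kept
```

With no summary, a placename mentioned only in a later comparative discussion falls outside the leading portion of the text and is dropped, which is the behaviour the heuristic is after. It also shows why the heuristic can fail: a report whose opening does not name the site yields no candidate at all.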
Upon completion of the Archaeotools project, it was agreed there was potential for applying automated data and metadata extraction to Unpublished Fieldwork Reports, and that the combined approach was generally felt to be successful. Archaeotools was much more successful when applied to structured data within the ADS Archsearch search interface and helped build the Solr index upon which Archsearch was based. It was also successful in identifying trends in the use of terms in other types of unstructured text held by the ADS, such as the Proceedings of the Society of Antiquaries of Scotland (PSAS). The PSAS have been published since 1851, and as such do not follow modern forms of sentence structure and syntax, but Archaeotools was still able to find useful patterns: Here is an example section of text from an early PSAS paper and the named entities that could be extracted from it using NLP (Bateman & Jeffrey, 2011): The bronze ring inscribed with runic characters, presented to the Society, was found in the year 1849, in the Abbey Park, in the immediate neighbourhood of St Andrews. It is a large bronze finger ring inscribed on the two faces in Anglo-Saxon runes, and is of peculiar interest, as being, it is believed, the only example of the Paleography of our Anglo-Saxon
forefathers hitherto found in Scotland, with the single, but most important exception of the noble monument at Ruthwell, Dumfriesshire. (Wilson, 1851)

What – Bronze Ring, Runic Inscription (also ‘the monument at Ruthwell’)
Where – Abbey Park, St Andrews, (also Ruthwell, Dumfriesshire)
When – Anglo-Saxon (also ‘found 1849’)
Who – Wilson, D.
Media – PSAS (PDF)

This early attempt allowed the ADS to see what might be possible, but moving forward relied on access to highly specialised research expertise which was not available after completion of the Archaeotools project. We also became aware that while we could clearly see how useful this might be for our users, and the archaeologists in the project considered it largely successful, for NLP researchers our desired application, extracting richer resource discovery metadata from unstructured text, was very mundane in comparison to their other research areas within computer science.

10.5 NLP and the ARIADNE Infrastructure

The next opportunity to advance NLP and NER capabilities for the ADS Unpublished Fieldwork Reports did not arrive until 2014 with the Advanced Research Infrastructure for Archaeological Dataset Networking in Europe (ARIADNE), which was funded under the European Community’s Seventh Framework Programme. The primary output of the project was the ARIADNE Portal (https://portal.ariadne-infrastructure.eu/), but a range of research pilots were also key to the project, including NLP. For ARIADNE, partners built on previous work done within the Semantic Technologies for Archaeological Resources (STAR) project (Vlachidis et al., 2010) using rule-based NLP methods and the GATE toolkit developed at Sheffield (https://gate.ac.uk/). Work done in English within STAR was expanded within ARIADNE to determine whether it could be adapted for Dutch and Swedish grey literature.
This work was undertaken in collaboration with ARIADNE partners at Leiden University, DANS (Dutch reports) and the Swedish National Data Service (Swedish reports) and made use of glossaries and thesauri from the Dutch and Swedish partners, importing the thesauri into GATE, and analysing the suitability and performance for NER use. The NER experimentation focused on the following characteristics:

• Archaeological Context
• Material
• Physical Object (Finds)
• Monument
• Place
• Temporal (Time Appellation)
Fig. 10.1 Screenshot of the suite of named entity recognizer pipelines developed within the ARIADNE project, available on GATEcloud (https://cloud.gate.ac.uk/shopfront)

Once extracted, these were mapped to native vocabularies, CIDOC CRM subjects or Getty Art and Architecture Thesaurus concepts. Additional thematic case studies were undertaken, including a numismatic study, and a dendrochronology study (Vlachidis et al., 2017). The archaeology and dendrochronology NER pipelines are openly available via GATEcloud (Fig. 10.1).

With the hiring of an Applications Developer from the NLP Research Group at Sheffield, the ADS was also able to experiment with machine learning-based NLP techniques for our NLP contribution to ARIADNE. The ADS built upon the lessons learned from Archaeotools and attempted once again to use NLP tools to unlock the potential of its Unpublished Fieldwork Reports. This text typically exists in PDF, MS Word, or plain text files within the ADS Library of Unpublished Fieldwork Reports. Training data developed by Archaeotools was applied to a classifier (a machine learning tool that takes data items and places them into classes resulting in a statistical model used to extract entities from text). Several classifiers were tested. In the end, the CRF classifier was chosen, not because it produced better results than other classifiers, but because it was easier to implement into an API and required less computing time to produce results (Vlachidis et al., 2017). The models were built by the classifier using gazetteers, which were then directly applied to data from the Unpublished Fieldwork Reports. As there was no Gold Standard for Unpublished Fieldwork Reports, a group of reports from the North Yorkshire region which had not been part of any previous Archaeotools training data were chosen and manually scored.
The gazetteers improved extraction performance, confirming substantial overlap of information from various corpora within the grey literature. To train the CRF classifier, a window size of five surrounding tokens and the following feature set was used:
• N-Grams with max length of six tokens (i.e., contiguous sequence of words)
• Exact token string
• Features from previous word class sequence
• Archaeological Gazetteer

Fig. 10.2 Screen shot of the prototype ADS API showing original text on the left and metadata entities extracted from the text on the right

As a proof-of-concept, a prototype API was developed (Fig. 10.2). The prototype allowed domain experts to annotate reports, generate resource discovery metadata where none existed, and generate metadata which could be used to further train the classifiers. While only a prototype, the interface showed how an API might be visualised if implemented in an existing interface, allowing users to correct the results, ensuring the creation of better-quality metadata. The API included the Named Entities to which the ADS would map the extracted metadata, using the thesauri created within the Semantic ENrichment Enabling Sustainability of arCHaeological Links (SENESCHAL) project (Binding & Tudhope, 2016). Text was entered into the “input text area” allowing entities to be extracted using the CRF classifier. The extracted entities were displayed as suggested metadata to the right of the entered text, allowing users to assess the relevance of the extracted entities. The API also detected and extracted UK grid references using manually crafted regular expressions, which were automatically verified using UK Geospatial data held within a spatial database. By clicking on a magnifying glass icon beside each generated entity, users could jump directly
to the word in the text from which the result was derived, allowing easy manual verification or correction (Vlachidis et al., 2017). The entities extracted by the NER module with this method, using a relatively short piece of summary text, produced good results. The small number of entities returned were easy to view and manage within the API, although this became more complicated when tested with a larger body of text. Development time was focussed on creating and refining a version of the API that allowed external users to submit NER tasks to an ADS server, which then returned a set of terms, including their category and offsets, which developers could incorporate into their existing interfaces. The API was a RESTful HTTP web service: clients submitted a task by POSTing JSON to an API endpoint and, if the request was successful, the API returned JSON in the response. Depending on the complexity of the task and length of the content, the API might return the result asynchronously, in which case the results were not immediately available, and it was envisioned that a delay would have to be implemented by the developer after the task was submitted. Here is an example of the kind of text tested by the API:

The various sites that Butser Ancient Farm occupied over the years were all, in one way or another, based on the concept of demonstrating what a farm, which would have existed in the British Iron Age circa 300 BC, might have been like. It was founded in 1972 as the Butser Ancient Farm Project and occupied sites on Little Butser Hill, Hampshire UK, the so-called Demonstration Site in the grounds of Queen Elizabeth Country Park, Hampshire and finally it moved to its present site at Bascomb Down in 1991. The work was extended to include the construction of a Roman Villa in 2002.
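The submit-then-wait pattern described above can be sketched from a client's point of view. Everything specific here is an assumption made for illustration: the base URL, the `/tasks` path, and the `status`, `task_id` and `entities` field names are hypothetical and are not taken from the ADS prototype. The transport is passed in as a function so the polling logic can be exercised without a live service.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional

# Hypothetical transport: (HTTP method, URL, JSON body or None) -> decoded JSON.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_ner_task(text: str,
                    transport: Transport,
                    base_url: str = "https://example.org/ner",  # placeholder
                    poll_seconds: float = 0.0,
                    max_polls: int = 5) -> List[dict]:
    """POST the text, then poll while the (assumed) status is 'pending'."""
    reply = transport("POST", base_url + "/tasks", {"text": text})
    polls = 0
    while reply.get("status") == "pending" and polls < max_polls:
        time.sleep(poll_seconds)  # the client-side delay mentioned above
        reply = transport("GET", base_url + "/tasks/" + str(reply["task_id"]), None)
        polls += 1
    if reply.get("status") != "done":
        raise TimeoutError("NER task did not finish in time")
    # Each entity is assumed to carry a category plus character offsets.
    return reply["entities"]
```

Returning character offsets alongside each category is what would let an interface jump from a suggested entity back to its place in the submitted text. A real client would call `submit_ner_task(text, http_transport)` against whatever endpoint the service actually exposed.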
Using the simple example text, the API returned the following response aligned to pre-defined general categories such as location, subject and period (what, where, when):

Placename: Butser Ancient Farm
Placename: Little Butser Hill
Placename: Hampshire
Placename: Bascomb Down
Subject: farm
Subject: Villa
Temporal: British Iron Age circa 300 BC
Temporal: Roman
Temporal: 2002

The ADS planned to test the API as part of the redevelopment of the OASIS system. The aim was to allow an archaeologist to upload a report to OASIS, and by choosing to use the NER service, automatically extract suggested metadata for the report. The metadata could then be accepted or rejected by the user and then automatically populated into the correct fields within OASIS. OASIS is intended to be easy to use and the process of uploading a report, creating the relevant metadata, and submitting it to the system was meant to be quick to complete. It took time to assess how long the call-response method would take in real time, as OASIS was already quite a process-intensive application. The idea of adding
equally intensive NER functionality, including the time needed to approve or reject suggested matches, would likely try user patience. In addition, the user base of OASIS was expanding beyond commercial archaeologists, with community users also being a key demographic, making ease and speed of use even more important. Ensuring the updates to the OASIS system were fit for purpose meant the deployment was delayed, and it was not possible to continue the work within the ARIADNE project, but the API was circulated informally to ARIADNE partners for internal review to test the service and provide feedback. It was found that while the service did not return any false positives, it failed to return all potential positives. This would indicate that while the metadata generated by the service was reliable, it was not complete. It was determined that this was likely due to a need for more training data, and/or an adjustment to the algorithm. The ADS hoped to continue to work on the service beyond the completion of the ARIADNE project, to see if further improvements were possible, but this was curtailed due to capacity issues and staffing changes.

10.6 The ADS and NLP at the University of York

The ADS got another opportunity to work toward our goal of using NLP to enhance the metadata in our Unpublished Fieldwork Reports after an approach by our own Department of Computer Science at the University of York. After a few meetings it was decided that this would be a fruitful research avenue which we should pursue together. Once again, the interests of the computer scientists were not necessarily the interests of the archaeologists, and after further discussion, the importance of applying NLP to enhance the metadata within our Unpublished Fieldwork Reports over more cutting-edge, but less useful avenues, was agreed upon.
This led to opportunities for MSc students in Computer Science and Archaeological Information Systems (AIS). The AIS student was interested in applying NLP to identify Zooarchaeological data for her dissertation, and the implementation provided a dissertation topic for the Computer Science student. The students worked together, guided by Computer Science and ADS staff, with additional Zooarchaeology domain expertise and supervision provided by staff in the Department of Archaeology. Working directly with an archaeologist with domain expertise was particularly important to the usefulness of the outcome. At this point, the ADS had experience of working with quite a few computer scientists interested in Machine Learning generally, and NLP in particular. As an open access digital archive, the ADS is also approached fairly regularly by computer scientists from a range of countries and institutions wanting to use our data for their own research. This is of course encouraged, but when the results have been shared with us, it has been clear that the researchers did not understand the kinds of questions that are of importance to archaeologists (even when that was their expressed intention for wanting to use the data). Because they had not worked alongside archaeologists during the entire process, the results were, at best, of little use and, at worst, lacking an important ethical
understanding of how to interact with data about the human past. For a project where the ADS would be investing time and capacity, we were insistent that for this new collaboration, archaeologists with critical domain expertise and an understanding of the theoretical underpinnings of our discipline were included in all phases of the work. As the Computer Science staff expertise was in deep learning neural networks, the students were advised to use the Keras open-source neural network library (Keras, 2017), as Keras allowed easy switching between different backend engines. The AIS student used a statistical approach to evaluate the performance of the NER tool, and created a questionnaire to evaluate its usefulness to archaeologists. To create the NER Gold Standard for Zooarchaeological data, a diverse range of Zooarchaeological reports were chosen from the (at the time) over 42,000 reports held by the ADS. This ensured a wide range of animal taxa, locations and archaeological time periods were represented. These were then annotated by Zooarchaeology specialists to create the Test Set, which was then checked by the Zooarchaeology domain expert at York as the superannotator. The Test Set was then transformed into XML using GATE, resulting in a Gold Standard consisting of over 2000 annotations in 97 different classes (Talboom, 2017). As this was undertaken as MSc dissertation projects, it was not possible to fully develop the NER tool, but some promising outcomes were evident. The tool was able to correctly tag and annotate entities; however, it also tagged words that did not have a tag in the original annotation, or tagged the incorrect term. These errors could have been reduced with further development.
Rather than focussing on the F-measure, which is invariably disappointing for computer scientists when dealing with archaeological data (a lower F-measure often represents a very useful result for domain users), the AIS student focussed on understanding the usefulness for archaeological researchers. Using the Likert scale, users were asked to evaluate their agreement with the following statements, along with some qualitative queries:
• The tool retrieved the required information
• The majority of the retrieved documents was relevant
• I trust the retrieved information
• The tool found all relevant documents within the repository
• What percentage of the received documents do you see as relevant?
• Other perceptions?
Once the testing was completed, users were then asked to evaluate their agreement with these additional statements/queries:
• The tool is time saving
• The tool is facilitating my searching for information
• The tool is relevant to my work tasks
• I would use this tool in my future research
• What type of queries did you use?
• Other perceptions?
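To make the F-measure mentioned above concrete, the following minimal Python sketch computes precision, recall, and their harmonic mean (F1) by comparing predicted entities with a gold standard. The annotations, the label names, and the function name `precision_recall_f1` are invented for illustration and are not taken from the tools described in this chapter. The toy data mirrors the situation reported for the ARIADNE prototype: no false positives, but some entities missed.

```python
def precision_recall_f1(gold, predicted):
    """Compare two collections of (text, label) entity annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: share of predicted entities that are correct
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard entities that were found
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented toy annotations (hypothetical labels)
gold = {("cattle", "TAXON"), ("sheep", "TAXON"),
        ("Roman", "PERIOD"), ("Hampshire", "PLACE")}
predicted = {("cattle", "TAXON"), ("Roman", "PERIOD"), ("Hampshire", "PLACE")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case every predicted entity is correct, so precision is perfect while recall is not, which is the "reliable but not complete" pattern described earlier in the chapter.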
While the NER tool could not be completed within the timeframe of the MSc, the creation of this evaluation structure, undertaken with the domain expert, was a step in the right direction. The following year, this work was used by another student as the basis for his BSc and then MSc dissertations in Archaeological Science, expanding the NER tool for osteoarchaeology. This work used the same technical approach but expanded the testing of the tool's usefulness to include domain experts in osteoarchaeology, archaeologists, archaeology students, and non-archaeologists in the UK (Talks, 2019). While the evaluation was quite general, it shows we are moving to a point where we can start asking these key questions in a robust way (Fig. 10.3). As with our other attempts, however, the capacity to move this into ADS technical workflows was lost when the researchers and students moved on, but our discussions continue, and an effective, efficient, and useful solution will be found.

Fig. 10.3 Bar chart showing the mean results of the usefulness of osteoarchaeological entity search, using the seven-point Likert scale. (Reproduced from Talks, 2019)

10.7 Conclusion

The way the ADS currently views the potential of NLP, particularly regarding its use as part of OASIS, focuses on two main challenges. It should be relatively simple to extract accurate Dublin Core-type metadata, so the first challenge is making it possible to drill down within unstructured text to pull out significance beyond the literal identification of terminologies (such as henge). This is particularly important, as archaeological grey literature does not exist in a vacuum. It links
to and from extant inventory records such as the Heritage Gateway (https://www.heritagegateway.org.uk/gateway/) in England, Canmore (https://canmore.org.uk/) in Scotland, and the ARIADNE Portal (https://portal.ariadne-infrastructure.eu/) internationally. Users of these aggregation interfaces may well come across ADS reports by searching for a particular site-type, but may not realise there is additional, more detailed information available because it isn't represented in the metadata. Associated with this would be an ambition to allow greater understanding and a thematic sense of how this digital object contributes to a research topic or question. For example, rather than searching for Mesolithic and/or Neolithic, this may allow resource discovery related to the Mesolithic to Neolithic transition, and therefore evidence of the adoption of agriculture. Pragmatically this would also improve the range of vocabularies to encompass specialisms within the archaeological community such as Zooarchaeology; turning the research questions themselves into defined concepts. Within the UK, a new generation of Research Frameworks (e.g. https://scarf.scot/) have begun to turn common questions and fields of study into clear entities, and more generally, other disciplines and initiatives outside of archaeology, such as the Marine Environment (https://vocab.nerc.ac.uk/collection/), have taken steps to define their classifications and understandings to help the researcher identify how something contributes to their specialism. Another example would be the pottery reporting and classification work currently underway by Historic England (Barclay et al., 2016; Medieval Pottery Research Group, 2019). An NLP approach that can augment digital objects to show a user why and how they are relevant is an ambitious but necessary step for us, particularly as the ADS Library of Unpublished Fieldwork Reports continues to grow at such a rapid rate.
It also has the benefit of being a dynamic exercise. Unlike the production of metadata at the point of data creation or deposition within an archive, it allows continued re-evaluation of a digital object, based on new ideas and interpretations. The ongoing curation of metadata through the implementation of NLP tools may be key to fulfilling the ambition of a living, engaging and relevant digital archive. The second challenge is the historic backlog of Unpublished Fieldwork Reports. If the ADS currently holds around 62,000 reports, we estimate that represents maybe half (optimistically) of the reports produced in England alone since 1990 (where roughly 4000 fieldwork events take place per year, on average). A large factor limiting the deposition and online dissemination of these Unpublished Fieldwork Reports is the time it would require to manually create metadata. The ability to auto-generate syntactically meaningful metadata for this backlog would double the usefulness of the resource by significantly increasing findability, accessibility, and interoperability. This would also provide a pathway for dealing with other types of unstructured text the ADS hold, such as backruns of archaeological journals. The ADS has learned many lessons about the difficulties of incorporating NLP into our workflows in any practical way over the last 12 years, but our belief in the potential it could unlock for our unstructured data remains undiminished. If NLP is going to be implemented in substantive ways, it needs to be supported in the same
ways as other forms of vital technical infrastructure. At the same time, the barriers to implementation are not technical. We understand the technology and how to use it, but for smaller organisations, such as the ADS, it is nearly impossible to retain NLP specialists who can both implement and maintain these types of systems in the long term, and it is currently unclear how this problem can be solved. The project-to-project funding landscape and the transient nature of academia also contribute to this issue. It is difficult to make the case for these needs when it is equally difficult to create viable proofs-of-concept to even demonstrate their potential. We will keep working to find a solution.

References

Barclay, A., Knight, D., Booth, P., Evans, J., Brown, D. H., & Wood, I. (2016). Standards for pottery studies in archaeology. Medieval Pottery Research Group. https://romanpotterystudy.org.uk/wp-content/uploads/2016/06/Standard_for_Pottery_Studies_in_Archaeology.pdf
Bateman, J., & Jeffrey, S. (2011). What matters about the monument: Reconstructing historical classification. Internet Archaeology, 29. https://doi.org/10.11141/ia.29.6
Beagrie, N., & Houghton, J. (2013). The value and impact of the archaeology data service. JISC Website. http://repository.jisc.ac.uk/5509/1/ADSReport_final.pdf
Binding, C., & Tudhope, D. (2016). Improving interoperability using vocabulary linked data. International Journal on Digital Libraries, 17(1), 5–21.
Bradley, R. (2006, September). Bridging the two cultures – Commercial archaeology and the study of prehistoric Britain. The Antiquaries Journal, 86, 1–13.
Cobb, J. (2015). The journey to linked open data: The Getty vocabularies. Journal of Library Metadata, 15(3-4), 142–156.
Evans, T. N. L. (2015). A reassessment of archaeological Grey literature: Semantics and paradoxes. Internet Archaeology, 40. https://doi.org/10.11141/ia.40.6
Fulford, M., & Holbrook, N. (2018). Relevant beyond the Roman period: Approaches to the investigation, analysis and dissemination of archaeological investigations of the rural settlements and landscapes of Roman Britain. Archaeological Journal, 175(2), 214–230.
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S., & Zhang, Z. (2009). The Archaeotools project: Faceted classification and natural language processing in an archaeological context. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1897), 2507–2519.
Keras. (2017, June 10). Keras. https://keras.io/
Library of Unpublished Fieldwork Reports. (n.d.). https://archaeologydataservice.ac.uk/archives/view/greylit/. Accessed 15 Apr 2021.
Medieval Pottery Research Group. (2019, December 21). A guide to the classification of Medieval ceramic forms digitisation project completed! https://medievalpottery.org.uk/2019/12/21/a-guide-to-the-classification-of-medieval-ceramic-forms-digitisation-project-completed/
Richards, J. D., & Hardman, C. S. (2008). Stepping back from the trench edge: An archaeological perspective on the development of standards for recording and publication. In M. Greengrass & L. Hughes (Eds.), The virtual representation of the past. Digital research in the arts & humanities (pp. 101–112). Ashgate.
Richards, J., Jeffrey, S., Waller, S., Ciravegna, F., Chapman, S., & Zhang, Z. (2011). The archaeology data service and the Archaeotools project. Archaeology, 2. https://doi.org/10.2307/j.ctvhhhfgw.11
Talboom, L. (2017). Improving the discoverability of zooarchaeological data with the help of natural language processing (MSc in archaeological information systems). University of York.
Talks, A. (2019). An exploration of NLP and NER for enhanced search in osteoarchaeological and palaeopathological textual resources (Bachelor of science in bioarchaeology). University of York.
Thomas, R. (2019). It’s not mitigation! Policy and practice in development-led archaeology in England. The Historic Environment: Policy & Practice, 10(3-4), 328–344.
Vlachidis, A., Binding, C., Tudhope, D., & May, K. (2010, July). Excavating grey literature: A case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources. ASLIB Proceedings, 39, 88.
Vlachidis, A., Tudhope, D., Wansleeben, M., Azzopardi, J., Green, K., Xia, L., & Wright, H. (2017). D16.4 final report on natural language processing. http://legacy.ariadne-infrastructure.eu/wp-content/uploads/2019/01/D16.4_Final_Report_on_Natural_Language_Processing_Final.pdf
Wilkinson, M. D., Michel, D., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., et al. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3(March), 160018.
Wilson, D. (1851). Inscribed ring. Proceedings of the Society of Antiquaries of Scotland, 1, 1851–1854.
Chapter 11
Information Extraction and Machine Learning for Archaeological Texts

Alex Brandsen

Abstract Archaeologists are creating ever-increasing amounts of textual data. So much, in fact, that manual reading and inspection has become practically impossible. By leveraging computational approaches, it is possible to extract relevant information from this big data, allowing for more efficient research and new analyses. In this chapter, methods and techniques to extract information from archaeological texts through Machine Learning are introduced and discussed, with a focus on practical examples. After reading the chapter, you should have a clear grasp of the possibilities of text mining in archaeology, the current state of research, and enough information to start your own text analyses.

Keywords Information extraction · Text mining · Machine learning · Data science

A. Brandsen, Faculty of Archaeology, Leiden University, Leiden, Netherlands
e-mail: a.brandsen@arch.leidenuniv.nl
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_11

11.1 Introduction

In the last ten years or so, archaeologists have started generating ‘big data’: information assets characterised by the four V’s: Volume, Velocity, Veracity, and Variety. Volume simply means the size of the data, generally meaning many gigabytes or terabytes of data. Velocity is the speed at which data updates, and Veracity is a measure of how trustworthy data is; these two V’s are generally less relevant to archaeology. Variety speaks to the level of heterogeneity in the data, and how fuzzy or unclear data is, something we do encounter regularly in archaeology. In short, big data is so unwieldy that it is not feasible to analyse it with conventional methods. This problem of having too much data has been described by multiple authors, with Bevan calling it a “data deluge” (Bevan, 2015, p. 1) and Vince noting “we are drowning in our own data” (Vince, 1996, p. 1). Dealing with structured data (such as databases and geospatial data) has received a fair share of our attention, but much less research is being done on processing and analysing unstructured information: the documents that archaeologists write (Bevan, 2015). These texts contain a wealth of information, and by using computational tools to access, extract, and combine information in the documents, we can perform new synthesising research on large scales. Due to the amount of text data, computational methods almost become a necessity: in the Netherlands alone more than 4000 excavation reports are produced each year, not to mention thousands of books, papers, and preprints. When we extrapolate that to the situation across the world, it quickly becomes clear that manual inspection of these texts is unfeasible.

In this chapter, methods and techniques to extract information from archaeological texts through Machine Learning are introduced and discussed, with a focus on practical examples. After reading the chapter, you should have a clear grasp of the possibilities of text mining in archaeology, the current state of research, and enough information to start your own text analyses.

11.2 Information Extraction Techniques

In this section, an overview is given of techniques that are useful for Information Extraction. The focus here is on explaining what the methods do and what use they have, while the following sections go more into the technical details on how to practically apply these methods. The first part of this section explains some general concepts, and the subsections deal with specific techniques.

Natural Language Processing (NLP) is a research field which explores how computers can be used to understand and manipulate natural language, i.e. speech and written text in human language (as opposed to formal/constructed language such as programming languages) (Chowdhury, 2005). A document collection that we can analyse with NLP is called a corpus.
Text Mining is a subfield of NLP, and is a group of tasks all related to analysing written text (Feldman & Sanger, 2007). The most common task is Information Extraction (IE): extracting information from unstructured text. In essence, IE is text simplification: turning unstructured text into a structured view of the information present in the text. A number of techniques fall under IE; the most used ones are:
• Named Entity Recognition (NER), detection of entities (or concepts) in text. For example, finding all archaeological artefacts or time periods mentioned in a document.
• Document Classification, the process of automatically assigning labels to a text. For example, assigning subject metadata to a document by classifying it into one, or a number of categories.
• Topic Modelling, automatically clustering documents into distinct groups based on their content.
• Relationship Extraction, the identification of relations between entities. For example, finding relations such as [artefact] is found in [context].
• Coreference Resolution, detection of coreference between entities. For example, in the sentence “We found an arrow head, it is dated to the Neolithic”, it is useful to know that “it” and “arrow head” refer to the same real world entity.
• Terminology Extraction or ontology extraction, a method of automatically constructing a thesaurus by analysing a large corpus.
At the moment in archaeology, NER, document classification, and topic modelling are being researched the most, and we will focus mainly on these techniques so we can illustrate the methods with archaeological examples.

Machine Learning (ML) is a form of Artificial Intelligence, which uses relations between data points in large data sets to create statistical models which can be used for various purposes. Generally, a Machine Learning algorithm will be able to take a human-annotated set of data (e.g. labelled entities in text) and create a statistical model which can make predictions for new, unlabelled data (e.g. predict entities in text). Another way to make predictions is to use handcrafted rules: a Rule-Based approach. Here, an expert manually creates rules that can predict labels, e.g., “if a word is in an artefact word list, label it as an artefact”. Both approaches have been used with various degrees of success in archaeological text mining. More detailed information about ML can be found in Sect. 11.5.

The term Grey Literature is used to describe documents which are not published in the traditional sense of the word by academic or commercial publishing houses, such as field reports and theses. A lot of research in archaeological text mining is focused on using this type of literature, as it is generally the most prevalent and the least studied.
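The rule-based approach described above ("if a word is in an artefact word list, label it as an artefact") can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the word lists in `GAZETTEER`, the label abbreviations, and the function name `tag_entities` are hypothetical, and a real system would load its terms from a thesaurus and handle inflection, spelling variants, and ambiguity.

```python
import re

# Hypothetical word lists; a real project would load these from a thesaurus
GAZETTEER = {
    "ART": {"arrow head", "axe", "urn"},
    "MAT": {"flint", "bronze"},
    "PER": {"neolithic", "bronze age"},
}


def tag_entities(text):
    """Return (start, end, matched text, label) for each word-list term found."""
    matches = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                matches.append((m.start(), m.end(), m.group(0), label))
    # Sort by position; at the same position the longer match comes first,
    # so that "Bronze Age" (period) wins over "bronze" (material)
    matches.sort(key=lambda m: (m[0], -(m[1] - m[0])))
    entities, last_end = [], 0
    for start, end, surface, label in matches:
        if start >= last_end:  # skip matches overlapping an accepted one
            entities.append((start, end, surface, label))
            last_end = end
    return entities


sentence = "We found a flint arrow head dated to the Neolithic and a Bronze Age urn."
for start, end, surface, label in tag_entities(sentence):
    print(f"{surface!r} -> {label}")
```

The overlap rule shows why handcrafted rules quickly need extra logic: a plain word-list lookup would label "Bronze" in "Bronze Age" as a material, which is one reason Machine Learning approaches that take context into account are attractive.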
11.2.1 Named Entity Recognition

Named Entity Recognition is the process of finding different categories of named entities (or concepts) in text. Quite often, the categories of entities are persons, organisations, locations, time periods, and quantities, as defined in CoNLL-2002 (Conference on Natural Language Learning), the most used NER benchmark (Tjong Kim Sang, 2002). For archaeology, these entities are not as relevant, except for time periods and locations. Generally, archaeologists are interested in entity types such as artefacts, materials, contexts, species, locations, and time periods. An example sentence with marked archaeological named entities is shown in Fig. 11.1.

NER can be useful for a range of applications. In archaeology, it is mainly used to automatically generate metadata, i.e. descriptions of data (Jeffrey et al., 2009; Byrne & Klein, 2010; Vlachidis, 2012; Niccolucci & Richards, 2013; Vlachidis et al., 2017). A lot of archaeological texts have limited or no metadata at all, and to be able to find these texts for research, it is useful to have some description of the data, such as which time periods, places, and artefacts are mentioned in the text. Instead of
using all the detected entities, often a selection is made of the most important ones to serve as metadata. It is also possible to connect the detected entities to entries in thesauri, to further improve interoperability between data sets (Tudhope et al., 2011). Instead of using a selection of entities as metadata, it is also possible to index all the entities in a search engine, together with the full text of the documents (Brandsen et al., 2019, 2020). This makes it possible for researchers to do advanced searches and find more documents relevant to their research. Recent research by Brandsen and Lippok (2021) shows that using such an intelligent search engine leads to more data and new insights. With either search on entities, or automatic metadata generation, the goal is to make the data more FAIR (Findable, Accessible, Interoperable and Reusable, Wilkinson et al., 2016).

Fig. 11.1 Example sentence with named entities marked. Entity types have been shortened: MAT = material, ART = artefact, PER = time period, and CON = context

Another approach that is made possible by entity extraction is pattern mining: using algorithms to automatically extract meaningful patterns from data. In other domains this has been researched extensively, but in archaeology it is quite rare. One example of pattern mining is the work by Wilcke et al. (2019), but they worked with hand-created XML (eXtensible Markup Language) data, not extracted entities. Their results are somewhat limited, but this might be partly due to the small amount of data. Once these methods are applied to thousands or millions of entities extracted from big data, more meaningful patterns might emerge.

11.2.2 Document Classification

Unlike NER, where the extraction of entities is the goal, document classification aims to assign one or more labels to a text. As with NER, however, this is often done to create metadata (Brandsen & Koole, 2021).
Another approach is to label documents as relevant or irrelevant for a particular research question (Fischer et al., 2021). In archaeology, the focus has mainly been on NER, so there are not many examples. However, there are many possible applications, for example: determining whether or not tweets or reviews are about (a particular kind of) archaeology, or classifying a large number of papers into certain categories for further study. There are three variants of document classification:
1. Binary classification, each document is classified as either belonging, or not belonging, to one class (is a document relevant or not?)
2. Multi-class classification, each document can be classified as belonging to one of multiple classes (which time period is a document about?)
3. Multi-label classification, each document can be classified as belonging to one or more classes (which subject(s) are discussed in this document?)
Generally, when assigning metadata, we would be looking for multiple classes for each document, so multi-label classification is the most common.

A particular type of document classification worth mentioning here is sentiment analysis, a task where the goal is to determine whether a text is positive or negative about a certain topic (Turney, 2002). This task is quite popular, receiving a lot of research interest, mainly in e-commerce and social media settings, where finding out whether a post or review is positive or negative is useful information. In an archaeological setting, it has been used, for example, to study the reactions to the destruction of heritage sites by ISIS (Cunliffe & Curini, 2018) and how tourists interact with monuments (Paolanti et al., 2019).

11.2.3 Topic Modelling

Topic modelling is a Machine Learning technique that can be used to cluster a collection of documents into groups, based on the word content of those documents. It is an unsupervised Machine Learning technique, as it does not require data annotated by humans (see Sect. 11.5.1). Because of this, it is a quick and easy way to start analysing a corpus. However, it is difficult to get accurate or meaningful results, which is why document classification is often more worthwhile. That being said, some potential uses for topic modelling include automatically grouping papers about a certain topic into subtopics to decide which will be manually read, and investigating changes in language use (or even theoretical trends) in archaeological literature over time and/or space (Plets et al., 2021; Jackson et al., 2020). An example of a topic model is shown in Fig. 11.4.
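Returning to the three document classification variants of Sect. 11.2.2, the difference between them lies mainly in the shape of the output. The following Python sketch makes that difference visible using a deliberately naive keyword count in place of a trained classifier. The keyword lists in `KEYWORDS` and all function names are hypothetical, chosen only for illustration.

```python
# Hypothetical keyword lists per class; a trained model would replace these scores
KEYWORDS = {
    "Neolithic": {"neolithic", "megalith", "polished"},
    "Roman": {"roman", "villa", "samian"},
    "Medieval": {"medieval", "moat", "abbey"},
}


def class_scores(text):
    """Count how many keywords of each class occur in the text."""
    tokens = set(text.lower().split())
    return {label: len(tokens & words) for label, words in KEYWORDS.items()}


def binary(text, label):
    """Binary: does the document belong to one given class, yes or no?"""
    return class_scores(text)[label] > 0


def multi_class(text):
    """Multi-class: exactly one class, here the best-scoring one."""
    scores = class_scores(text)
    return max(scores, key=scores.get)


def multi_label(text):
    """Multi-label: every class with at least one keyword hit (possibly none)."""
    return sorted(label for label, score in class_scores(text).items() if score > 0)


report = "a roman villa with samian ware was later cut by a medieval moat"
print(binary(report, "Neolithic"))  # a single yes/no answer
print(multi_class(report))          # a single label
print(multi_label(report))          # a list of labels
```

For metadata generation the multi-label output is usually the relevant one, since a report on a multi-period site should receive every applicable period rather than only the dominant one.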
11.2.4 Information Retrieval

Related to Information Extraction is Information Retrieval (IR): methods to retrieve a set of documents based on a user-defined query. In essence, IR is building search engines. IR is a research area of its own, with conferences and journals dedicated to the topic, and an in-depth discussion is out of the scope of this chapter. However, it is worth briefly discussing IR in the context of Information Extraction. Finding relevant literature for research is of course a common problem across all of science, and archaeology is no exception. But currently, most literature search is done using metadata search: searching through the title, description, and sometimes keywords manually entered by the author or archival service. Such metadata cannot fully capture all the information present in a document, and as such relevant information can be missed. More advanced search systems, including full-text search and named entity search, have been explored to some extent (Paijmans & Brandsen, 2010; Gibbs & Colley, 2012; Brandsen et al., 2019), but more research is needed to create better search engines for archaeologists.

11.3 Previous Research on Information Extraction in the Archaeology Domain

As mentioned by Richards et al. (2015), archaeological texts have excellent potential for text mining, due to their relatively well-controlled vocabulary. Much work has gone into producing thesauri (controlled word lists) in multiple languages (Gilman & Newman, 2007; Brandt et al., 1992), which we can leverage to extract information from text. In the last fifteen years, a range of projects have been undertaken which have attempted to use text mining within archaeology, starting with rule-based methods, and gradually moving towards Machine Learning based methods. In this section, we provide a brief overview of these text mining studies.

Amrani et al. (2008) created a workflow allowing archaeologists to extract information from English texts, but in a quite specialised way on a small collection. At the same time, the OpenBoek project (Paijmans & Brandsen, 2010) used Machine Learning to automatically label time periods and locations in Dutch field reports, which were searchable together with the full text in a web application. Byrne and Klein (2010) experimented with extracting archaeological events and converting them to RDF (Resource Description Framework) triples, to increase the interconnectivity between data sets from different sources. The Archaeotools project used a combination of rule-based and Machine Learning approaches to automatically generate location, time period, and subject metadata for a small selection of reports. This generated metadata could then be used for searching in a faceted interface (Jeffrey et al., 2009).
In the OPTIMA project, Vlachidis (2012) applied rule-based techniques to perform NER and express entities in the CIDOC-CRM schema.¹ The output of this research was further built upon in the STAR and STELLAR projects, where Tudhope et al. (2011) created a search demonstrator which searches through extracted entities from text and five excavation databases at the same time. As part of the international ARIADNE project, some experiments were undertaken with NLP on grey literature. The ADS (Archaeology Data Service) in the UK created a prototype web application which uses NER to automatically create metadata for English reports, and experimented with rule-based NER for Dutch and Swedish reports as well (Vlachidis et al., 2017).

¹ The International Documentation Committee Conceptual Reference Model (a way to model information) for cultural heritage and museum documentation, as defined by the International Committee for Documentation (CIDOC) (2014).
In her Master’s thesis, Talboom (2017) specifically targeted zooarchaeological entities in reports, using Machine Learning to perform NER. Building on her work, Talks (2019) added more entity types and did an extensive evaluation with users. Very recently, Fischer et al. (2021) used text mining as part of their research on ruralisation in the Netherlands. They created a term document matrix and compared this with a list of keywords related to the topic of ruralisation, to assess the usefulness of a large number of reports for a number of topics. In a slightly different direction, Plets et al. (2021) describe research on grey literature from Belgium, looking at theoretical trends over time. They successfully use text mining to find these trends and chart the decrease in text quality due to developer-led archaeology. Similarly, Jackson et al. (2020) used topic modelling techniques on English data to see if there are patterned ways in which archaeologists write about osteology. In the Netherlands, the AGNES (Archaeological Grey literature Named Entity Search) project has been working to create a search engine for Dutch excavation reports, which leverages Machine Learning NER to make more efficient and detailed search possible (Brandsen et al., 2019). This project will be extended to also include English and German documents, and include more document types (such as books, papers, etc.) over the next four years.

From this overview it is evident that there is a clear focus on grey literature, presumably due to its ubiquity and potential for Information Extraction. A lot of research also focuses on making data more FAIR (Wilkinson et al., 2016), by automatically creating more metadata, by building search engines, and by expressing unstructured text information in machine-readable formats to increase interoperability.
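The idea of a term document matrix compared against a keyword list, mentioned above, can be illustrated with a short Python sketch. This is not the implementation used by Fischer et al. (2021); the mini corpus, the keyword list, and the function names are invented for illustration, and tokenisation is reduced to lower-casing and splitting on whitespace.

```python
from collections import Counter


def term_document_matrix(documents):
    """Map each document id to a Counter of lower-cased word counts."""
    return {doc_id: Counter(text.lower().split())
            for doc_id, text in documents.items()}


def keyword_score(matrix, keywords):
    """Sum, per document, the counts of the given keywords."""
    return {doc_id: sum(counts[k] for k in keywords)
            for doc_id, counts in matrix.items()}


# Invented mini corpus and keyword list, for illustration only
documents = {
    "report_a": "farmstead with field system and farmstead enclosure",
    "report_b": "urban cemetery with stone church",
}
keywords = ["farmstead", "field", "enclosure"]

matrix = term_document_matrix(documents)
scores = keyword_score(matrix, keywords)
# Rank documents from most to least keyword hits
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Ranking reports by such a score gives a rough first indication of which documents are worth reading for a given topic, before any more sophisticated classification is attempted.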
Generally, the aim is to assist archaeologists in their research by making big data sets that are difficult to navigate more manageable and searchable. The hope is that by harnessing computer power to analyse and summarise big data, we can do better synthesising research at large scales, leading to a better view of the past. 11.4 Preprocessing As mentioned in the introduction, text is unstructured data. This means that there is no external structure added to the data which allows computers to easily process it. For example, in a database table, each number or string is stored in a cell, which is in a specific column and row. This column/row structure allows computers to ‘understand’ the data and perform analyses. Humans can easily make sense of text by reading it, because we have an incredible amount of background knowledge: we know the world we exist in, we know which words describe which concepts in that world, we know the language the text is written in, and we have the ability to read words and process them into meaningful information in our minds. However, to computers, text is just a sequence of individual symbols with no inherent meaning, and they do not know the language or the concepts in the real
236 A. Brandsen world that the text describes. This makes it a lot more difficult to work with text data than it is to work with structured data. To convert text into a format that a computer can work with, we need to do some preprocessing. During this process, we can also help our analyses by excluding or transforming words (further detailed below). While preprocessing is not the most exciting part of a text analysis, the choices made in this part of the process can make big differences in the outcome of an analysis. In addition, it tends to take up a substantial amount of time: often more time is spent on defining and fine-tuning the preprocessing methods than on the actual analysis itself. In the next couple of sections, an overview is given of common preprocessing tasks, how to perform them, and what effect they (can) have on the results of an analysis.
Note Most of the software we reference in this chapter is Linux and Python based, but all steps and methods can be done with other software on other platforms as well.
11.4.1 Converting to Plain Text
For many data sets, the first step is to convert the files to plain text (.txt files). Most often, text data sets in archaeology are collections of PDF (Portable Document Format) or Microsoft Word files. These formats are not ideal for computational approaches, as they also encode style information (among other details), which we generally do not need in our analyses and which just causes unwanted noise. A lot of tools exist to convert PDF files to plain text. Commonly used tools are pdftotext,2 a command-line utility for Linux distributions, and the PDFMiner3 and PyPDF24 packages in Python. For Word files, the most used tools are docx2txt5 and the Python library textract,6 which can extract text from a range of file formats, including image and sound files.
Choosing which tool is best for a use case depends on the end goal of the analysis, and which software you are using, but any tool that creates plain text should be sufficient. 2 https://www.xpdfreader.com/pdftotext-man.html. 3 https://pdfminersix.readthedocs.io/. 4 https://pypi.org/project/PyPDF2/. 5 http://docx2txt.sourceforge.net/. 6 https://textract.readthedocs.io/en/stable/.
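As an illustration of the conversion step, the following minimal sketch calls the pdftotext command-line utility from Python for every PDF in a folder. It assumes that pdftotext is installed and available on the system path; the function names and the folder layout are hypothetical and only serve the example.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Return the pdftotext call for one file; -enc UTF-8 sets the output encoding."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)]


def convert_folder(pdf_dir, txt_dir):
    """Convert every PDF in pdf_dir to a .txt file with the same name in txt_dir.

    Hypothetical helper: assumes the pdftotext executable is installed.
    """
    txt_dir = Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        txt_path = txt_dir / (pdf_path.stem + ".txt")
        # check=True raises CalledProcessError if pdftotext reports a failure
        subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
        converted.append(txt_path)
    return converted
```

Calling convert_folder("reports_pdf", "reports_txt") (both folder names are placeholders) would write one plain text file per report. Scanned PDFs without a text layer will yield empty or near-empty output and need OCR first.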
11.4.2 Optical Character Recognition
Most documents we deal with nowadays are ‘born digital’, which means they were created using computer software. Born digital documents have the text encoded as actual characters, which we can extract using the methods mentioned above. However, some files are scanned pictures of existing hard copy documents; this is mainly the case for older documents (before the 2000s). In this case, the file does not contain actual computer-readable characters but, as far as the computer is concerned, just a grid of pixels in varying colours. To extract computer-readable text, the process of Optical Character Recognition (OCR) is needed (Merali & Smith, 1985). This method ‘reads’ the image of the text, and uses pattern matching and/or Machine Learning methods to translate it into machine-readable text. OCR is never 100% accurate, so you should expect noise to be introduced in this phase, with the level of noise largely dependent on the quality of the original print and the quality of the scans. But once the computer-readable text is available, we can continue with the rest of the preprocessing.
11.4.3 Sentence Boundary Detection
Most methods and analyses require one sentence per line in the text file, but often this is not what the plain text conversion provides. The first step is to do sentence boundary detection (also called sentence boundary disambiguation): automatically detecting where sentences begin and end (Riley, 1989). This might seem trivial, as sentences normally end with a full stop, exclamation mark or question mark, but in practice it is quite challenging due to the potential ambiguity of punctuation marks. A full stop, for example, can be part of an abbreviation or an email address, or be a decimal point: all instances where we should not end a sentence. The following sentences illustrate the problem:
We found a Neolithic(?) flint axe in pit no. 2, but didn’t find any pottery. An adjacent post hole yielded enough charcoal for a C14 dating.
Here, a number of potential problems are highlighted: the question mark after “Neolithic” and the full stop after “pit no” are not the ends of the sentence. Also note the full stop on the next line; this is not a typo, but a common occurrence in text created by PDF conversion and/or OCR. The correct sentence split is on the full stop after “pottery”:
We found a Neolithic(?) flint axe in pit no. 2, but didn’t find any pottery.
An adjacent post hole yielded enough charcoal for a C14 dating.
Sentence boundary detection is mainly done using rules of varying complexity, but can also be tackled by Machine Learning. In Python, the most commonly used tool is the NLTK (Natural Language ToolKit) package, which also performs a large number of other NLP tasks (Bird et al., 2009).
Note The first sentence in the box above will be used to illustrate all the following steps, to give a view of the full process.
11.4.4 Tokenisation
As mentioned earlier in this section, computers see text as a sequence of symbols with no inherent meaning. This also means that computers do not know what words are, or how to distinguish where a word starts and ends. To convert a sentence into a sequence of words, we use tokenisation, which returns a list of tokens. Tokens are similar to words, and a token often is a word, but not always. A token is defined as an instance of a sequence of characters that are grouped together as a useful unit for processing (Manning et al., 2008). This difference between words and tokens can be illustrated by tokenising our example sentence:
We found a Neolithic ( ? ) flint axe in pit no . 2 , but did n’t find any pottery .
In this example, most of the tokens are indeed words, but punctuation marks have also become individual tokens and “didn’t” has been converted to two separate tokens. This tokenisation process is important as it removes noise from words (such as the brackets and question mark after ‘Neolithic’) and turns sentences into chunks of information that can be processed further.
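To make the two steps above more concrete, here is a minimal rule-based sketch using only the Python standard library. It is not the NLTK implementation: the abbreviation list, the splitting rule (a sentence ends at punctuation followed by whitespace and a capital letter) and the token pattern are simplifying assumptions chosen to handle the example sentences.

```python
import re

# Hypothetical mini list of abbreviations that should not end a sentence
ABBREVIATIONS = {"no", "fig", "e.g", "i.e", "cf", "ca"}

# Tokens: the "n't" clitic, the word part before it, other words, single symbols
TOKEN_PATTERN = re.compile(r"n['’]t\b|\w+(?=n['’]t\b)|\w+|[^\w\s]")


def split_sentences(text):
    """Split text on . ! ? when followed by whitespace and a capital letter."""
    text = " ".join(text.split())  # undo line breaks left by PDF conversion
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # e.g. "fig. A" is not a sentence boundary
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])
    return sentences


def tokenise(sentence):
    """Return a list of tokens; punctuation marks become separate tokens."""
    return TOKEN_PATTERN.findall(sentence)


text = ("We found a Neolithic(?) flint axe in pit no. 2,\n"
        "but didn't find any pottery. An adjacent post hole\n"
        "yielded enough charcoal for a C14 dating.")
sentences = split_sentences(text)
tokens = tokenise(sentences[0])
print("\n".join(sentences))
print(" ".join(tokens))
```

Real reports contain many more edge cases (initials, URLs, headings without punctuation), which is why an established tool is usually preferable to hand-written rules like these.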
11.4.5 Normalisation
Once we have a list of tokens, we can normalise and clean the text. Many different methods can be applied at this stage; the most common ones are discussed here: lowercasing, removing words, stripping characters, stemming, and lemmatisation.
Important All normalisation preprocessing steps described below can affect the end result of the analysis both positively and negatively. Depending on the data, the methods used and the end goal, each normalisation technique should be individually considered.
11.4.5.1 Lowercasing
This is pretty much what the title suggests: changing all uppercase characters to their respective lowercase versions. Lowercasing is useful for most analyses, as it decreases the number of different tokens in your data set, and merges the uppercase and lowercase versions of a token into one. This intuitively makes sense as there is no semantic difference between e.g. “Axe” and “axe”, but to a computer, these are two different strings, and will be analysed separately. There are some exceptions where it is better to keep the uppercase characters; a good example is Named Entity Recognition (NER), a method for automatically finding and labelling certain concepts such as person names and place names. To be able to recognise such a name, having the casing intact is useful, as names will most often be capitalised, making it easier to distinguish between the last name “Flint” and the material “flint”. Lowercasing is a common function in most text analysis software and programming languages, e.g. in Python the string method lower() can be used. Here is our example sentence with lowercasing applied:
we found a neolithic ( ? ) flint axe in pit no . 2 , but did n’t find any pottery .
11.4.5.2 Removing Words
Quite often, certain words are uninformative for an analysis, and removing them will reduce noise.
It removes low-level information from the text to give more weight to important information. Besides this effect, removing common words also reduces the size of the data, and thus reduces the training time of Machine Learning algorithms. A method that is often used is to remove so-called ‘stop words’. Stop words are the most common words in a language, like articles, prepositions, pronouns, conjunctions, etc. Some examples in English include “the”, “a”, “so”, “is” and “that”. The words that should be deleted are specified in a manually compiled stop word list. Luckily, most, if not all, text analysis software provides such a list for English, and often for many more languages too. Here, we have used the NLTK stop word list to remove them from our example:
found neolithic ( ? ) flint axe pit . 2 , find pottery .
Another way to reduce the total number of different tokens is to remove the n most common tokens; this is very similar to removing a predefined list of stop words. Conversely, it is also possible to remove tokens that occur at most n times in the data set, with n often being a number between 1 and 3. This way we remove tokens that are uncommon, and thus uninformative for some tasks. Do keep in mind that for certain types of analyses, having stop words or uncommon words in your data can be useful. An example is sentiment analysis, where words like “not” are indicative of a negative sentiment.
11.4.5.3 Stripping Characters
Besides removing words, we can also remove other types of tokens, such as punctuation, numbers and symbols. Doing all three on our example sentence leads to:
found neolithic flint axe pit find pottery
Quite often these types of tokens are not informative, but there are exceptions. If you are trying to find C14 dates in text, removing all symbols will also remove “±”, which is a very strong indicator of a C14 date in archaeological texts.
11.4.5.4 Stemming
Stemming is the process of reducing words to their stem, i.e. removing the suffix of the word. For example, “house”, “houses” and “housing” all have the same stem: “hous”.
As you can see, the stem does not need to be an actual word, although it often is. It is sufficient if all related words are reduced to the same stem, even if that stem is not a word, as to a computer there is no semantic difference. Stemming groups related words into one representation, again reducing the variety in the data. If we apply stemming using the Porter stemmer (Porter, 1980) from NLTK to our example sentence, we end up with:
found neolith flint axe pit find potteri
While stemming in general does reduce the variety of tokens, in this case it has not: “found” and “find” have been assigned separate stems, while really they have a very similar meaning. Stemming does not take into account the actual meaning of a word, but uses rules to remove suffixes.
11.4.5.5 Lemmatisation
Lemmatisation is similar to stemming, but a bit more advanced. It reduces a word not to its stem, but to its lemma: the dictionary form of a word. Instead of chopping off a word’s suffix, it uses linguistic features to determine the Part Of Speech (POS) and semantic meaning of a word, and subsequently finds the corresponding lemma. This means that the lemma of “axing” is “ax” and the lemma of “axe” is “axe”, indicating the semantic difference between the two. If we use lemmatisation instead of stemming, our example sentence looks like this:
find neolithic flint axe pit find pottery
Here we see that “found” and “find” are both assigned the same lemma, “find”, unlike with stemming. Depending on your application, either stemming or lemmatisation can be more appropriate, but something to keep in mind is that lemmatisation is a more difficult task than stemming, and as such is less accurate.
11.4.5.6 Normalisation and Information Loss
As already indicated with examples for e.g. lowercasing, stripping characters, and removing words, not all normalisation techniques are useful for every analysis. This is because the goal of normalisation is to reduce the complexity of the data, i.e. to simplify it. However, when data is simplified, some information is lost.
The trick is finding a balance between normalising the text to such an extent that classifiers can more easily learn statistical patterns, while not removing any information that might be useful for that classifier.
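The normalisation steps discussed in Sect. 11.4.5 can be chained into a small pipeline, as in the sketch below. The stop word list is a hypothetical mini list (not the NLTK list), and crude_stem is a toy suffix stripper written for this example; it is not the Porter stemmer and only illustrates the idea of rule-based suffix removal.

```python
# Hypothetical mini stop word list, just large enough for the example sentence
STOP_WORDS = {"we", "a", "in", "no", "but", "did", "n't", "any"}


def lowercase(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def strip_non_words(tokens):
    """Drop punctuation, numbers and symbols (keep purely alphabetic tokens)."""
    return [token for token in tokens if token.isalpha()]


def crude_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 characters remain."""
    for suffix in ("ing", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = ["We", "found", "a", "Neolithic", "(", "?", ")", "flint", "axe", "in",
          "pit", "no", ".", "2", ",", "but", "did", "n't", "find", "any",
          "pottery", "."]
without_stop_words = remove_stop_words(lowercase(tokens))
normalised = strip_non_words(without_stop_words)
stems = [crude_stem(word) for word in ("house", "houses", "housing")]
print(" ".join(without_stop_words))
print(" ".join(normalised))
print(stems)
```

Keeping each step as a separate function makes it easy to switch individual steps on or off and compare their effect, which is the kind of inspection this section recommends.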
Generally, there are certain types of preprocessing that are commonly used for each type of analysis, but every data set is different and requires thorough consideration by inspecting the data and comparing normalisation steps. In archaeology, we often deal with particular types of fuzziness and ambiguity in our data, when compared to other domains. This means that when working with archaeological data, careful consideration is needed from both the computer science side and the archaeology side, to make optimal choices regarding normalisation and other aspects of the development of text mining tools.
11.4.6 Adding Structure
At this point we have preprocessed our text, and we are at the final step before we can start our analysis: adding structure, so a computer can do something with our data. The easiest way to do this is the so-called Bag of Words (BoW) approach (Manning et al., 2009). Here we simply create a table with a column for each word and a row for each sentence (or document). In the cells, we store how often a word occurs in each sentence. This word count is called Term Frequency (TF). See Table 11.1 for an example using the two sentences we introduced in Sect. 11.4.3. Note that the order of the words in the original sentences is lost; this is why it is called a Bag of Words: all the words end up in a ‘bag’, shuffled and without ordering. Most text analysis software will do this data transformation automatically. While here the BoW is represented as a table for clarity, in reality each sentence is stored as a vector: a list of numbers. In Python, this would look like:
[0, 1, 0, 0, 0, 2, 1, 0, 1, 1, 0, 1, 0]
At this point, the computer does not know which TF stands for which word, because it does not need this information. Based purely on the vectors of a large number of sentences (or documents), it can extract statistical relationships and make predictions based on those.
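The Bag of Words transformation can be sketched in a few lines of standard-library Python. The first part rebuilds the vectors of Table 11.1 from the two lemmatised example sentences. The second part applies one common variant of the TF-IDF weighting of Sect. 11.4.6.1, here taken to be TF multiplied by log(N / DF), to a hypothetical two-document toy corpus; other variants of the formula exist, so this particular choice is an assumption.

```python
import math
from collections import Counter


def bag_of_words(documents):
    """Return the sorted vocabulary and one term frequency vector per document."""
    vocabulary = sorted({term for doc in documents for term in doc})
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


def tf_idf(vectors):
    """Weight each TF by log(N / DF); assumes vectors come from bag_of_words."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    doc_freq = [sum(1 for vec in vectors if vec[i] > 0) for i in range(n_terms)]
    return [[tf * math.log(n_docs / df) for tf, df in zip(vec, doc_freq)]
            for vec in vectors]


# The two example sentences after preprocessing (lemmatised, as in Table 11.1)
sentences = [
    "find neolithic flint axe pit find pottery".split(),
    "adjacent post hole yield charcoal c14 dating".split(),
]
vocabulary, vectors = bag_of_words(sentences)
print(vocabulary)
print(vectors[0])

# Hypothetical toy corpus: 'flint' occurs in every document, so its weight is 0
toy_vocabulary, toy_vectors = bag_of_words([["flint", "axe"], ["flint", "pottery"]])
toy_weights = tf_idf(toy_vectors)
print(toy_vocabulary)
print(toy_weights)
```

In the toy corpus the term that occurs in every document receives a weight of zero, while the terms unique to one document keep a positive weight.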
For example, if we are interested in automatically finding sentences about the Neolithic, an algorithm would infer that if the TF of ‘neolithic’ is not 0, the sentence has the label Neolithic. Of course this is not a great example, as just looking for the term ‘neolithic’ would be enough to find that out, but relationships between other (less literal) words can also be used to make predictions.
Table 11.1 Example of two sentences converted into a Bag of Words table, after preprocessing
Sentence  Adjacent  Axe  c14  Charcoal  Dating  Find  Flint  Hole  Neolithic  Pit  Post  Pottery  Yield
1         0         1    0    0         0       2     1      0     1          1    0     1        0
2         1         0    1    1         1       0     0      1     0          0    1     0        1
11.4.6.1 Term Frequency and Inverse Document Frequency
In the above example, we used the Term Frequency to create the vector. While this is an easy way to create a vector, it is not always ideal. Some words are simply more frequent in general, but that does not mean they are actually more important or relevant. To counteract this problem, we can use the Term Frequency–Inverse Document Frequency (TF-IDF), which lowers the value if a word occurs in many documents (Manning et al., 2009). TF-IDF is currently the most used statistical measure for information retrieval and text mining (Beel et al., 2016).
11.4.7 Selecting Preprocessing Steps
All the preprocessing steps discussed here have different effects on the eventual input data for Machine Learning, and can greatly affect the outcome. It is always worth considering which steps will help for a particular analysis, as not all steps are always applicable. In general though, when doing document classification and topic modelling, most of these steps will help increase the performance, as they decrease the variety in the text and group different forms of semantically similar words together, making it easier to generalise over the data. On the other hand, for NER (and also e.g. word embeddings, see Sect. 11.5.5), it is wise to only perform sentence detection and tokenisation and none of the normalisation steps, as differences in e.g. casing and symbols can be key indicators for entities. Lastly, another option is to simply try all possible combinations of preprocessing steps in a brute force method, and select the best performing combinations (Brandsen & Koole, 2021).
11.5 Machine Learning
Once the data has been selected, preprocessed, and converted into the right format, the actual analysis can be performed, e.g. NER or document classification. Most information extraction methods used today are based on Machine Learning (ML), a subfield of Artificial Intelligence. ML can be defined as the study of algorithms that automatically improve through experience (Mitchell, 1997). This means that these algorithms can build models based on training (or sample) data, without being programmed by a human to do so, in this way ‘learning’ by themselves how to predict labels for unseen data.
Machine Learning is ubiquitous in modern life, being used in everything from predicting the spam status of emails to preventing traffic accidents in cars by automatically detecting obstacles. Within archaeological research, ML is also becoming more popular, and is being used for a wide range of problems. Some examples are the automatic detection of archaeological features in LiDAR (Light Detection And Ranging) data (Verschoof-van der Vaart et al., 2020; Trier et al., 2018), classification of pottery types based on photos (Gualandi et al., 2021; Pawlowicz & Downum, 2021), analysing projectile point typology (Nash & Prewitt, 2016), and differentiating between lithic assemblages (Grove & Blinkhorn, 2020). For a more in-depth overview of ML in archaeology and cultural heritage, see Bickler (2021) and Fiorucci et al. (2020). Machine Learning has also been applied to textual data, both modern and ancient. Some examples of the analysis of ancient texts are the translation of cuneiform script using an app (Sanders, 2018) and the reconstruction of missing pieces of ancient Greek text (Sommerschield, 2020). But mostly, ML is used to analyse modern texts about archaeology: e.g. books, papers, theses, and field reports written by archaeologists in the last couple of decades. Some examples include codifying semantically consistent definitions of archaeological concepts (Davis, 2020), Named Entity Recognition (Paijmans & Brandsen, 2010; Vlachidis et al., 2017; Tudhope et al., 2011; Talboom, 2017; Brandsen et al., 2019; Vlachidis et al., 2021), classifying reports on time period, location and/or subject (Jeffrey et al., 2009; Brandsen & Koole, 2021), topic modelling (Jackson et al., 2020), creating a list of relevant documents for certain topics (Fischer et al., 2021) and investigating theoretical trends over time (Plets et al., 2021).
Machine Learning is often juxtaposed with rule-based approaches: methods where a researcher defines a set of rules by hand, which are used to predict labels. These rule-based methods have been successful in many cases, but we see that ML approaches are being used more and more, as they are generally more effective at learning patterns in complex data (Richards et al., 2015; Bickler, 2021). This is also why this chapter will mainly focus on ML methods. That being said, rule-based approaches still have a place in current research, especially for problems where there is not a lot of training data, and can be used together with ML methods in many cases.
11.5.1 Supervised and Unsupervised Learning
Machine Learning can be subdivided into two main types: supervised and unsupervised learning. The difference is that supervised learning uses data that has been labelled by humans, while unsupervised learning uses raw, unlabelled data.
In effect, supervised learning is where an algorithm learns patterns between the raw data and true labels, while unsupervised learning detects patterns in the raw data itself. To give an example of supervised learning in text, we can take the automatic labelling of papers with topics. Imagine a stack of thousands of archaeology papers with no information on topic (no assigned keywords in the metadata). But it would be useful to know the topic, so we can make a selection of which papers to read. It is possible to create a classifier model to predict the topic (or class) of a paper, by feeding a supervised Machine Learning algorithm a collection of data that has been labelled by an archaeologist. See Table 11.2 for a simplified example with two possible subjects: Neolithic or Bronze Age. The first four rows are training data, with a label assigned by a human. The ‘Content’ column contains the titles of the papers, preprocessed as discussed in Sect. 11.4.
Table 11.2 Simplified example of Machine Learning document classification, with four human labelled training examples and one unlabelled document in the bottom row
Type        Content                                  Class (or label)
Training    Flint domestication use wear analysis    Neolithic
Training    Knapping flint wheat harvest             Neolithic
Training    Flint bronze sickle knapping             Bronze Age
Training    Bronze axe wood use wear analysis        Bronze Age
Prediction  Domestication flint knapping microscope  ???
Try it Yourself Based on the information in this table, a human would be able to predict the label of the last row. If you want to do some human brain powered ‘machine’ learning, you can try it yourself: which label do you predict for the last row?
By reading the words in the examples, humans can figure out that the terms “domestication”, “flint”, and “knapping” are indicators that the predicted label should be “Neolithic”, even if they have no prior knowledge of archaeology. Computers can do this too, but mathematically. Imagine each term being assigned a score between 0 and 1, based on which documents the terms occur in. A score of 0 means it only occurs in Neolithic, a score of 1 means it only occurs in Bronze Age, and a score between 0 and 1 means it occurs in both to some degree. For a new, unlabelled document, we can then calculate the average of all the term scores and predict a label based on whether it is above or below 0.5. This process is illustrated in Table 11.3. For each term, we calculate in what percentage of documents it occurs for each label, and then average those scores to get a final prediction.
Table 11.3 Example prediction based on how often terms occur in a class in the training data. The percentages show in which proportion of documents from a class this term occurs. The label ‘Neolithic’ can be assigned with a 66.6% certainty
Term           Neolithic %  Bronze age %
Domestication  50%          0%
Flint          100%         50%
Knapping       50%          50%
Microscope     n/a          n/a
Average        66.6%        33.3%
The term “domestication” occurs in 50% of Neolithic documents, and not at all in Bronze Age documents, meaning it is an indicator (or feature) of a document belonging to the Neolithic class. “flint” occurs in both classes but more in Neolithic and “knapping” occurs in both equally, meaning it does not indicate either class. Then finally, “microscope” does not occur in either class, so also does not affect the classification.
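The calculation behind Table 11.3 can be reproduced with a short sketch. For every term it computes the proportion of training documents per class that contain the term, and then averages these proportions over the terms of the new document that were seen during training. The function names are hypothetical, and this mirrors the simplified scheme of the table rather than a full Naive Bayes classifier.

```python
# Training data from Table 11.2 (already preprocessed and lowercased)
training = [
    ("flint domestication use wear analysis", "Neolithic"),
    ("knapping flint wheat harvest", "Neolithic"),
    ("flint bronze sickle knapping", "Bronze Age"),
    ("bronze axe wood use wear analysis", "Bronze Age"),
]


def term_proportions(training):
    """For each term, the share of documents per class in which it occurs."""
    docs_per_class = {}
    for text, label in training:
        docs_per_class.setdefault(label, []).append(set(text.split()))
    proportions = {}
    for label, docs in docs_per_class.items():
        for term in set().union(*docs):
            share = sum(term in doc for doc in docs) / len(docs)
            proportions.setdefault(term, {})[label] = share
    return proportions, list(docs_per_class)


def predict(text, training):
    """Average the per-class proportions of all known terms in the document."""
    proportions, labels = term_proportions(training)
    known_terms = [term for term in text.split() if term in proportions]
    if not known_terms:
        return None, {}  # no term was seen in training, so no prediction
    scores = {
        label: sum(proportions[term].get(label, 0.0) for term in known_terms)
        / len(known_terms)
        for label in labels
    }
    return max(scores, key=scores.get), scores


label, scores = predict("domestication flint knapping microscope", training)
print(label, scores)
```

The term “microscope” is ignored because it does not occur in the training data, which corresponds to the n/a row of the table.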
When the scores are averaged, we can see that the label “Neolithic” is predicted with 66.6% certainty. Of course, this is a very simplified model of classification, but it should give an insight into how Machine Learning algorithms deal with text data. In real world examples, there are often many more possible labels, many more terms to take into account, and possibly bias due to differences in document size, all of which complicate matters.
Unsupervised learning does not use any labelled data, but can still make subdivisions in data. In essence, most unsupervised learning methods are some variation of a clustering algorithm. Of course, archaeologists are very familiar with clustering algorithms, and we have been using these methods for at least 40 years (Doran & Hodson, 1975). Some examples include geospatial clustering of finds (Bogdanovic, 2015) and clustering artefacts into a typology (Gilboa et al., 2004). It is possible to do the same with text data, after transforming the text into a vector (as discussed in Sect. 11.4.6). The most commonly used unsupervised learning technique for archaeological texts is topic modelling: automatically creating a number of clusters, each with a certain topic, defined by which words are most frequent in that cluster. A simplified example is provided in Table 11.4, where four documents are preprocessed to only contain the terms ‘flint’ and ‘bronze’, and the term frequencies are shown for each.
Table 11.4 Simplified example of four documents, with term frequencies for the terms ‘flint’ and ‘bronze’
Document number  Document content                   Flint TF  Bronze TF
1                Flint bronze flint bronze flint    3         2
2                Flint flint flint flint            4         0
3                Bronze flint bronze bronze bronze  1         4
4                Bronze bronze bronze               0         3
At this point, the documents have been vectorised: for each document there is a vector with two dimensions (the dimensions being flint and bronze).
This can also be expressed as a mapping from document number to vector (here displayed in Python syntax):
{1: [3, 2], 2: [4, 0], 3: [1, 4], 4: [0, 3]}
For each document number, there is a corresponding list containing two numbers (a two-dimensional vector). By treating these numbers as x and y values, we can easily plot this as a scatter plot to visualise the data: see Fig. 11.2. Here, the document vectors are plotted in two-dimensional vector space, and an algorithm has been applied to cluster the points into two groups based on their position in the plot. In essence, this is how clustering text data works, although normally the vectors used have more than two dimensions, often hundreds or even thousands, which makes these hyper-dimensional vector spaces difficult to intuitively illustrate.7
Fig. 11.2 Scatter plot of the data from Table 11.4. Points have been clustered and assigned a label and colour
The group label and colour have been manually assigned, and this is an important point: any clustering algorithm will return a number of clusters, but it cannot assign a label; this has to be done manually afterwards by inspecting the data. In this example, it was very easy to assign topic labels, as there are two well-defined groups with different content, but this is not always the case. An example is the work by Plets et al. (2021), who used topic modelling to try and detect changes in theoretical thought in archaeology over time. Unfortunately, the clusters presented by the algorithm could not be assigned to different schools of thought. Another problem with (some) clustering algorithms is that they are non-deterministic, i.e. running the same analysis on the same data with the same settings can produce differing results every time. The size of the difference can be small or substantial, and any conclusion based on the method will have to take this into account. As unsupervised learning does not provide actual labels for our data, it is not often used. Therefore, the rest of this chapter will mainly focus on the characteristics of supervised learning.
7 There are possibilities to display multi-dimensional data in two or three dimensions: an often used method is Principal Component Analysis (Wold et al., 1987) which ‘flattens’ data, but also loses complexity.
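To illustrate how a clustering algorithm groups the four document vectors of Table 11.4, the sketch below implements a bare-bones k-means procedure with two clusters. Using k-means is an assumption made for this example; the text does not say which algorithm produced Fig. 11.2. The starting centroids are fixed to the first and last document so that the result is reproducible; with randomly chosen starting points, which is the usual practice, the outcome can vary between runs, as noted above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to its nearest centroid until assignments are stable."""
    centroids = [list(c) for c in initial_centroids]  # copy, do not mutate input
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                # New centroid is the mean of the cluster members per dimension
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Document vectors from Table 11.4: [flint TF, bronze TF]
documents = {1: [3, 2], 2: [4, 0], 3: [1, 4], 4: [0, 3]}
points = list(documents.values())
assignment, centroids = k_means(points, [points[0], points[-1]])
clusters = dict(zip(documents, assignment))
print(clusters)
print(centroids)
```

The algorithm only returns cluster numbers (0 and 1). Deciding that one cluster is about flint and the other about bronze is still a manual interpretation step.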
11.5.2 Training Data and Validation
For any supervised Machine Learning method, training data with annotated labels is required for the algorithm to learn from. This training data is sometimes also called the ‘ground truth’. Depending on the task, different types of labels are needed. In the case of document classification, one or more labels are needed for each document. For Named Entity Recognition, a label is required for each token. Such a combination of an observation (a document or a token) and a corresponding label is called a sample. It is important that the training data is representative of the entire data set, so the algorithm can learn, and deal with, the variety that exists in the data. Once the labelled data has been created, it needs to be split into a train set and a test set. The train set is used to train the model, so the algorithm uses these samples to create statistical relations. The test set is then used to evaluate how well the model is performing. This is done by letting the model predict labels on the test set, and then comparing them to the ground truth labels to calculate a performance metric (see Sect. 11.5.4). It is important that the model does not ‘see’ the test set during training, as that would give an unfair advantage, and the performance score would not reflect the effectiveness it will have on unlabelled data. Often, the data is split into 80% train set and 20% test set, also called an 80/20 split. But other splits with more or less test data can be useful, depending on the task and the amount of data available. However, such a static split does come with a caveat: if the test set coincidentally happens to be very easy or hard to predict, the score does not truly reflect how well the model would perform on new data. To prevent this, it is often better to perform k-fold cross validation.
This means that the data is split into k equal sized chunks, and the model is trained k times, each time using one of the chunks as the test set, and the rest of the chunks (k-1) as training data. Afterwards, the performance metrics are averaged across the k runs to provide a more well-rounded indication of the model’s quality. For any task, a relatively large number of samples is needed for the algorithm to be effective. Unfortunately, there is no predefined number of samples which would guarantee good performance: each task is different and has varying levels of complexity, which influences the amount of data needed. For some simpler tasks with just two possible labels (a binary task), 300 to 500 samples might be enough, but for e.g. complex NER, thousands of examples are needed for each target entity. One way to determine if more data will improve the performance, is by again splitting the data into k chunks (with k often being 10), and running the algorithm k-1 times, starting with 2 chunks (1 train, 1 test) and every time adding one chunk of data (which becomes the test set). The performance scores can then be plotted in a line graph to judge whether adding more labelled data would help. A curve that flattens out means adding more data will probably not help, but a curve that has not flattened out yet indicates more data will probably increase the performance (Brandsen et al., 2020).
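The chunk-based validation procedure described above can be sketched as follows. The function k_fold_splits divides the sample indices into k chunks and yields one train/test split per chunk; cross_validate averages the scores over the k runs. Both function names and the train_and_score callback are hypothetical: the callback stands in for whatever model training and evaluation is used in practice.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k chunks."""
    indices = list(range(n_samples))
    # Spread any remainder over the first chunks so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(samples, k, train_and_score):
    """Train k times, each time testing on a different chunk; average the scores.

    train_and_score is a hypothetical callback taking (train_samples,
    test_samples) and returning a single performance score.
    """
    scores = [
        train_and_score([samples[i] for i in train], [samples[i] for i in test])
        for train, test in k_fold_splits(len(samples), k)
    ]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)
```

In practice the samples are usually shuffled before splitting, so that each chunk is representative of the whole data set. Setting k equal to the number of samples gives the special case known as leave-one-out cross validation.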
11.5.3 Commonly Used Algorithms for Information Extraction

Many algorithms have been developed for Machine Learning, each with different strengths and weaknesses. To give an idea of which are useful for Information Extraction, some commonly used methods are discussed here for each type of task. This list is far from exhaustive; for a more complete overview see Mohri et al. (2018).

For text classification, the most commonly used algorithm used to be Naive Bayes (NB), which uses the probabilities of known events to predict new events. In fact, the example in Table 11.3 is a form of NB. It is particularly useful when working with small training data, as it learns quickly compared to other methods which require more data. However, this method is not very powerful or good at handling complex data, and has been largely superseded by the Support Vector Machines (SVM) algorithm (Cortes & Vapnik, 1995).

SVM works by plotting vectors in a space (like in Fig. 11.2), and drawing a line (called a hyperplane in multidimensional space) that separates the groups of points so that the distance between the hyperplane and the nearest points on either side is maximised. Any new vector is assigned a label depending on which side of the hyperplane it falls. To illustrate this, a hyperplane has been added in Fig. 11.3, together with a new, unlabelled point (green square). Based on this hyperplane, the new point would be classified as ‘Neolithic’ by the SVM. In practice, the boundary between classes is often not straight, but bends around the vector points in high-dimensional space. This can be calculated, but unfortunately not visualised.

Fig. 11.3 Scatter plot of the data from Table 11.4. Points have been clustered and assigned a label and colour. A hyperplane dividing the two groups of points has been added in green. A new, unlabelled vector is displayed (green square)

For NB and SVM, the order of the samples does not matter and is not taken into account. However, for Named Entity Recognition, a lot of information is encoded in the order of, and the context around, a token. Take, for example, the time period entity:
it is very likely that time periods are preceded by tokens such as “around” or “from”, for example “we found a house from 1800 BCE”. Having information about the tokens before and after the token the algorithm is trying to label is very useful, and so for NER, other algorithms are more effective. The most well-known one is the Conditional Random Fields (CRF) algorithm, which is generally the starting point for any sequence classification problem (Joachims, 1998). It is relatively easy to use, does not require much computing power or time to run, and generally produces good results. NB and SVM are available in the scikit-learn Python library (Pedregosa et al., 2011), among others, and CRF can be used through scikit-learn compatible add-on packages such as sklearn-crfsuite.

For both document classification and NER, neural networks (also known as Deep Learning) have seen an increase in popularity over the last decade. As they are able to capture complexity more accurately than ‘traditional’ algorithms, they can provide state-of-the-art performance. For document classification, Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) are often used. For NER, the Bidirectional Long Short Term Memory (BiLSTM) algorithm is popular, as well as the Bidirectional Encoder Representations from Transformers (BERT) architecture. BERT is discussed in more detail in Sect. 11.5.6 below.

Clustering is often performed using the k-means algorithm. Also used extensively with other types of data, k-means aims to group vectors into k clusters by minimising the within-cluster variance. Specifically for topic modelling, LDA (Latent Dirichlet Allocation) is often used, and the resulting topic model can relatively easily be visualised with the pyLDAvis Python library (Sievert & Shirley, 2014). In Fig. 11.4 an example is shown of the output of pyLDAvis, displaying ten clusters of documents about ancient fire use.
Topic number 8 has been highlighted, and the top relevant terms for that topic are displayed on the right. Judging from the top terms, this particular cluster seems to be about burning bones and/or cremations.

11.5.4 Evaluation and Performance Metrics

To see how well an algorithm performs, we need to evaluate the output. For unsupervised learning, there are no target labels, so a quantitative evaluation against a ground truth is not possible, and a qualitative evaluation is needed by manually inspecting the outcome. For supervised learning, it is possible to quantitatively measure the output, as we can compare the labels predicted by the algorithm to the true labels assigned by human annotators, and calculate performance metrics. Due to the fuzziness and ambiguity in archaeological data, a manual inspection of the predicted labels is sometimes warranted as well, to see in detail where the algorithm is correct and incorrect (or nearly correct). Even so, a performance metric should always be calculated when possible: it gives an overview of the performance over the entire test set, makes it possible to compare different methods on the same data, and promotes reproducible open science.
Fig. 11.4 Topic model visualisation by pyLDAvis
In the rest of this section, the most common metrics for text mining are discussed, but many more exist, and it is worth investigating which one is most suitable for a given task. Most metrics involve calculations of percentages between correctly and incorrectly classified items. A label is predicted by the algorithm for each item in the test set, and those predicted labels are compared to the true labels. Each prediction can then be assigned to one of the categories listed below. The categories and metrics are further explained with an archaeological example: imagine a document classification task where the goal is to automatically label a large set of papers as being relevant or irrelevant to a certain research topic, e.g. Early Medieval cremations in Europe. As the number of possibly relevant papers is too large to inspect manually, using a Machine Learning algorithm to make a preselection could be useful.

• True positive (tp). When a paper is relevant, and the label is correctly predicted as ‘relevant’.
• True negative (tn). When a paper is irrelevant, and the label is correctly predicted as ‘irrelevant’.
• False negative (fn). When a paper is relevant, but the label is incorrectly predicted as ‘irrelevant’. More simply put: a paper that has not been recognised as relevant by the system.
• False positive (fp). When a paper is not relevant, but the label is incorrectly predicted as ‘relevant’. More simply put: the system thinks a paper is relevant when it is not.

These categories are further illustrated in Table 11.5.

Table 11.5 Illustrating the true/false positive/negative categories

               Prediction: True   Prediction: False
Label: True    tp                 fn
Label: False   fp                 tn

Once all the items have been assigned a category, it is possible to calculate performance metrics. The most used measures in Machine Learning in general are recall, precision and F1 score.
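As a small illustration of how each prediction is assigned to one of the four categories, the sketch below counts them for the relevant/irrelevant paper example. The function name and the six labelled papers are invented for this example and do not come from a real data set.

```python
def confusion_counts(true_labels, predicted_labels, positive="relevant"):
    """Count tp, fp, fn and tn for a binary task.

    `positive` names the label treated as the positive class; any other
    label is treated as negative.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive:
            # Predicted positive: correct only if the paper really is relevant.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a relevant paper here has been missed.
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Invented ground truth and predictions for six papers.
true_labels = ["relevant", "relevant", "irrelevant",
               "irrelevant", "relevant", "irrelevant"]
predicted_labels = ["relevant", "irrelevant", "relevant",
                    "irrelevant", "relevant", "irrelevant"]

counts = confusion_counts(true_labels, predicted_labels)
print(counts)
```

In this toy case two relevant papers are found, one relevant paper is missed, one irrelevant paper is wrongly flagged, and two irrelevant papers are correctly left out.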
Recall shows what proportion of actual positives was identified correctly. For our example, it indicates out of all the relevant papers, what percentage have been correctly labelled as ‘relevant’. It can also be viewed as the percentage of papers that have been found. It is defined as follows:

\[ \mathrm{Recall} = \frac{tp}{tp + fn} \tag{11.1} \]
Precision shows what proportion of positive identifications was actually correct. For our example, it indicates out of all the papers labelled as ‘relevant’, what percentage was actually relevant. In essence, it shows how often the algorithm is right when it predicts a label. It is defined as follows:

\[ \mathrm{Precision} = \frac{tp}{tp + fp} \tag{11.2} \]

The F1 score (or F measure) combines recall and precision to provide an overall evaluation metric. More specifically, it is the harmonic mean of precision and recall, and is defined as:

\[ F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{11.3} \]

The 1 in F1 means that recall and precision are equally important (and thus equally weighted) when calculating the harmonic mean. But in some cases, either recall or precision is more important, in which case the F score can be weighted to favour one of them. This is done by changing to the F0.5 score (precision is 2 times more important/weighted) or the F2 score (recall is 2 times more important/weighted). For example, Brandsen et al. (2019) show that when searching for documents, Dutch archaeologists are more interested in getting as many relevant documents as possible, even if this means getting more irrelevant documents. This means that recall is more important, and the F2 score would be more suited for that task.

Other metrics are less popular, but can be useful in certain situations. These include the ROC (Receiver Operating Characteristic) curve, the related AUC (Area Under the ROC Curve), and the MCC (Matthews Correlation Coefficient) (Verschoof-van der Vaart & Landauer, 2021). If a less popular metric is chosen, it is useful to also include the most common metric(s) for the task, to be able to compare algorithms between studies. Generally, these metrics are not calculated manually.
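Although library functions are normally used, writing the formulas out once can make them more concrete. The sketch below implements Eqs. 11.1 to 11.3 and the weighted variant in its standard general form, F_beta = (1 + beta^2) · precision · recall / (beta^2 · precision + recall); the beta notation, the function names and the counts (30 tp, 10 fp, 20 fn) are assumptions made for this illustration, not values from the chapter.

```python
def recall(tp, fn):
    """Eq. 11.1: share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Eq. 11.2: share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_beta(precision_value, recall_value, beta=1.0):
    """Weighted harmonic mean; beta=1 gives the F1 score of Eq. 11.3.

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    if precision_value == 0 and recall_value == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision_value * recall_value / (b2 * precision_value + recall_value)


# Invented counts for the relevant/irrelevant paper example.
tp, fp, fn = 30, 10, 20

p = precision(tp, fp)   # 30 / 40
r = recall(tp, fn)      # 30 / 50

f1 = f_beta(p, r)               # precision and recall weighted equally
f2 = f_beta(p, r, beta=2.0)     # favours recall
f05 = f_beta(p, r, beta=0.5)    # favours precision

print(round(p, 3), round(r, 3), round(f1, 3), round(f2, 3), round(f05, 3))
```

Since recall is the lower of the two values in this toy case, the recall-weighted F2 score comes out below F1, and the precision-weighted F0.5 score comes out above it.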
Most Machine Learning libraries have functions that calculate the metrics automatically, based on an input of predicted labels and correct labels. For Python, the metrics module of the scikit-learn library (sklearn.metrics) provides the metrics discussed here, among many others.

11.5.5 Word Embeddings

Word embeddings are a different way to represent tokens. Instead of using the actual string (or a number assigned to that string), word embedding algorithms
convert tokens into vectors. Instead of creating a vector for each document (like in Sect. 11.4.6), a vector is created for each token in the document. The vectors are created by the word embeddings algorithm in such a way that words which occur in similar contexts (i.e. have similar words near them in sentences) have similar vectors (i.e. are near each other in the vector space). This is based on the distributional hypothesis: words that have similar contexts will have similar meanings (Harris, 1954). Once the vectors for the individual tokens have been calculated, a single vector for the document can be created (for example by averaging all the token vectors).

Word embeddings are useful because to a computer, “axe” and “adze” are two completely unrelated strings; the computer does not know they are semantically similar. However, if the vectors of these two words are near each other, the computer can use this information to understand that they are similar. To illustrate this, the following two sentences (before and after preprocessing) would have substantially different vectors when using the method from Sect. 11.4.6:

• “The axe was used to chop wood” (axe used chop wood)
• “The birch was carved with an adze” (birch carved adze)

In fact, after preprocessing, none of the tokens overlap between the sentences. Yet it is clear to humans that these sentences have substantial semantic similarity. Assuming the word embeddings have been created correctly, these two sentences would be quite similar: axe is similar to adze, carved is similar to chop, and birch is similar to wood. And when averaged into a document vector, the computer understands these sentences to be similar, even though the tokens are completely different.
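The axe/adze example can be made concrete with a toy calculation. The three-dimensional vectors below are invented by hand purely for illustration (real embeddings are learned from large corpora and typically have hundreds of dimensions), and cosine similarity is used here as one common way to measure how near two vectors are.

```python
from math import sqrt

# Hand-made toy vectors (an assumption for this sketch, not learned embeddings).
toy_vectors = {
    "axe":    [0.9, 0.1, 0.0],
    "adze":   [0.8, 0.2, 0.1],
    "chop":   [0.1, 0.9, 0.0],
    "carved": [0.2, 0.8, 0.1],
    "wood":   [0.0, 0.1, 0.9],
    "birch":  [0.1, 0.0, 0.8],
    "used":   [0.3, 0.3, 0.3],
}


def average_vector(tokens):
    """Build a document vector by averaging the token vectors."""
    vectors = [toy_vectors[token] for token in tokens]
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sentence_a = ["axe", "used", "chop", "wood"]   # "The axe was used to chop wood"
sentence_b = ["birch", "carved", "adze"]       # "The birch was carved with an adze"

shared_tokens = set(sentence_a) & set(sentence_b)
sentence_similarity = cosine_similarity(average_vector(sentence_a),
                                        average_vector(sentence_b))
axe_adze = cosine_similarity(toy_vectors["axe"], toy_vectors["adze"])
axe_wood = cosine_similarity(toy_vectors["axe"], toy_vectors["wood"])

print(shared_tokens, round(sentence_similarity, 3),
      round(axe_adze, 3), round(axe_wood, 3))
```

The two sentences share no tokens, so a purely string-based comparison finds no overlap between them, while the averaged vectors still point in nearly the same direction.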
This makes word embeddings incredibly powerful for dealing with text data, and consequently, they have been applied with great success to many tasks: from document classification and NER, to automatically expanding search queries and tracking the change of meanings of words over time.

Word embeddings can be created by multiple algorithms; the most popular currently are word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014) and FastText (Bojanowski et al., 2016). Instead of just averaging word vectors to get to a document vector, it is also possible to use the doc2vec model (Le & Mikolov, 2014) to create more sophisticated document vectors. All these models can be implemented in Python using the gensim library (Rehurek & Sojka, 2010).

11.5.6 Transfer Learning

While word embeddings are a significant improvement over the bag-of-words model, the current state of the art in NLP is transfer learning, specifically transformer based methods. These Deep Learning algorithms can ‘learn’ language from extremely large unlabelled text collections (billions of tokens) to create a language model, and can then use this model to better perform specific tasks. The idea behind these language models is that they mimic human behaviour: by already knowing a language, it is easier to try and predict classes. The most well-known architecture is called BERT (Bidirectional Encoder Representations from Transformers), developed by researchers at Google (Devlin et al., 2019).

Similar to word embeddings, BERT also creates vectors for tokens. However, traditional word embeddings such as word2vec are context independent, meaning a token will always have the same vector, regardless of its context. In this case, the word ‘flint’ will have the same vector in “a flint axe” and “Mr. Flint”, while being semantically very different. BERT produces context-dependent embeddings, meaning the vector of a token is different if it occurs in a different context. This means that BERT is particularly useful for tasks where synonymy and polysemy are a problem, and can handle more complex tasks with higher performance.

While BERT is being used extensively in other domains for a wide range of tasks, in archaeology it has not been used much yet. An exception to this is the work by Brandsen et al. (2021), who used BERT for NER, showing substantial improvement over a CRF based method. BERT does have large potential for use in archaeology, as it leverages unlabelled data to train the neural net. Normally for deep learning algorithms, a very large amount of labelled data is needed to train the network, which often is not available in our domain. By creating a language model with unlabelled data, only a modest amount of labelled data is needed to fine-tune the model on a specific NLP task. The unlabelled training data does not necessarily need to be archaeological data either, as long as it is in the same language, which is why it is called ‘transfer learning’: knowledge is transferred from one domain to another. However, research does show that using unlabelled training data from the domain itself can lead to modest increases in performance (Lee et al., 2019; Beltagy et al., 2020; Brandsen et al., 2021).
11.6 Conclusions

This chapter has described various Information Extraction methods, how to perform these using Machine Learning, and given an introduction to data preprocessing and the evaluation of text mining algorithms, with a focus on practical archaeological examples. This provides a snapshot of the current state of research, as well as some ideas and inspiration for future directions.

Even though a large proportion of this chapter is dedicated to machine automation, computers are not going to replace archaeologists any time soon, as also noted by other archaeologists working with Machine Learning (Verschoof-van der Vaart et al., 2020; Traviglia et al., 2016). While computers are great at calculating answers, they are not able to ask any questions: formulating research ideas and analysing the output of algorithms will still have to be done by humans. A certain level of creativity and ability to ‘connect the dots’ is needed in science, which we need human brains for. While neural networks are getting increasingly complex and are starting to mimic human learning, they are still rudimentary when compared to the incredible ability of humans to learn from scratch, connect ideas, and think out of
the box, while algorithms are (quite literally) bound by their ‘box’, or the limits of the programming that created them.

Instead, computational tools are meant to further enhance the archaeologist’s ability to draw meaningful conclusions from raw data and to make this process more efficient. Outsourcing menial tasks to e.g. students and volunteers has a long history in archaeology, and science as a whole. The more we can replace this valuable human time with relatively inexpensive computing time, the more we can focus on the interesting parts of archaeology: drawing conclusions and building theories relating to past human behaviour.

However, this new big data paradigm (Löwenborg, 2018) and the associated techniques also pose new challenges (Kintigh et al., 2014; Gattiglia, 2015). An example is the reliability of data. While data has always been central to archaeological knowledge, in this new paradigm large data sets can be presumed to be unproblematic, and any problems with quality or reliability to be overcome purely by the quantity of data (Huggett, 2020). This can cause the conceptual understanding of the creation of archaeological data, gained over decades of discussion, to be overlooked when performing these large scale syntheses (Cunningham & MacEachern, 2016). At the same time, discussions around big data have seen a renewed interest in the relation between data and the knowledge created from this data (Leonelli, 2015).

As big data is getting increasingly ubiquitous in archaeology, it seems inevitable that computational methods to find, combine, and analyse nuggets of information from large data sets will become increasingly commonplace. As other domains (and specifically computer science scholars) push the state of the art of NLP towards ever-increasing performance, we as archaeologists can use and adapt these new tools with relative ease, or collaborate with experts.
Using these methods and applying them to our own data, for our own research questions, we can perform better synthesising research at larger scales, leading to a better, more thorough understanding of the past.

References

Amrani, A., Abajian, V., & Kodratoff, Y. (2008). A chain of text-mining to extract information in archaeology. In Information and communication technologies: From theory to applications, ICTTA 2008, Damascus, Syria (pp. 1–5). https://doi.org/10.1109/ICTTA.2008.4529905
Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: A literature survey. International Journal on Digital Libraries, 17(4), 305–338. https://doi.org/10.1007/S00799-015-0156-0
Beltagy, I., Lo, K., & Cohan, A. (2020). SCIBERT: A pretrained language model for scientific text. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference. Hong Kong: Association for Computational Linguistics. https://doi.org/10.18653/v1/d19-1371
Bevan, A. (2015). The data deluge. Antiquity, 89(348), 1473–1484. https://doi.org/10.15184/aqy.2015.102
Bickler, S. H. (2021). Machine learning arrives in archaeology. Advances in Archaeological Practice, 9(2), 186–191. https://doi.org/10.1017/aap.2021.6
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. Sebastopol: O’Reilly.
Bogdanovic, I. (2015). Spatial cluster detection in archaeology: Current theory and practice. In Mathematics and archaeology (pp. 366–382). Boca Raton: CRC Press.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5(1), 135–146.
Brandsen, A., & Koole, M. (2021). Labelling the past: Data set creation and multi-label classification of Dutch archaeological excavation reports. Language Resources and Evaluation, 56, 543–572. https://doi.org/10.1007/s10579-021-09552-6
Brandsen, A., Lambers, K., Verberne, S., & Wansleeben, M. (2019). User requirement solicitation for an information retrieval system applied to Dutch grey literature in the archaeology domain. Journal of Computer Applications in Archaeology, 2(1), 21–30. https://doi.org/10.5334/jcaa.33
Brandsen, A., & Lippok, F. (2021). A burning question – Using an intelligent grey literature search engine to change our views on early medieval burial practices in the Netherlands. Journal of Archaeological Science, 133, 105456. https://doi.org/10.1016/j.jas.2021.105456
Brandsen, A., Verberne, S., Lambers, K., & Wansleeben, M. (2021). Can BERT dig it? - Named entity recognition for information retrieval in the archaeology domain. http://arxiv.org/abs/2106.07742
Brandsen, A., Verberne, S., Wansleeben, M., & Lambers, K. (2020). Creating a dataset for named entity recognition in the archaeology domain. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 4573–4577). Marseille: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.562/
Brandt, R., Drenth, E., Montforts, M., Proos, R., Roorda, I., & Wiemer, R. (1992). Archeologisch Basisregister. Tech. Rep., Rijksdienst voor Cultureel Erfgoed, Amersfoort.
Byrne, K., & Klein, E. (2010). Automatic extraction of archaeological events from text. In B. Frischer, J. Crawford, & D. Koller (Eds.), Making history interactive: Computer applications and quantitative methods in archaeology 2009. BAR International Series (vol. 2079, pp. 48–56). Oxford.
Chowdhury, G. G. (2005). Natural language processing. Annual Review of Information Science and Technology, 37(1), 51–89. https://doi.org/10.1002/aris.1440370103
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Cunliffe, E., & Curini, L. (2018). ISIS and heritage destruction: A sentiment analysis. Antiquity, 92(364), 1094–1111. https://doi.org/10.15184/AQY.2018.134
Cunningham, J. J., & MacEachern, S. (2016). Ethnoarchaeology as slow science. World Archaeology, 48(5), 628–641.
Davis, D. S. (2020). Defining what we study: The contribution of machine automation in archaeological research. Digital Applications in Archaeology and Cultural Heritage, 18, e00152. https://doi.org/10.1016/J.DAACH.2020.E00152
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (vol. 1, pp. 4171–4186). Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Doran, J., & Hodson, F. (1975). Mathematics and computers in archaeology. Harvard: Harvard University Press.
Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches in analyzing unstructured data. Cambridge: Cambridge University Press.
Fiorucci, M., Khoroshiltseva, M., Pontil, M., Traviglia, A., Del Bue, A., & James, S. (2020). Machine learning for cultural heritage: A survey. Pattern Recognition Letters, 133, 102–108. https://doi.org/10.1016/j.patrec.2020.02.017
Fischer, A., Londen, H. V., Bercken, A. B. V. D., Visser, R., & Renes, J. (2021). NAR 68 Urban farming and ruralisation in the Netherlands (1250 up to the nineteenth century), unravelling farming practice and the use of (open) space by synthesising archaeological reports using text mining. Nederlandse Archeologische Rapporten (NAR) 68.
Gattiglia, G. (2015). Think big about data: Archaeology and the big data challenge. Archäologische Informationen, 38(1), 113–124. https://doi.org/10.11588/ai.2015.1.26155
Gibbs, M., & Colley, S. (2012). Digital preservation: Online access and historical archaeology ‘grey literature’ from New South Wales, Australia. Australian Archaeology, 75, 95–103. https://doi.org/10.1080/03122417.2012.11681957
Gilboa, A., Karasik, A., Sharon, I., & Smilansky, U. (2004). Towards computerized typology and classification of ceramics. Journal of Archaeological Science, 31(6), 681–694. https://doi.org/10.1016/j.jas.2003.10.013
Gilman, P., & Newman, M. (2007). Informing the future of the past: Guidelines for historic environment records (2nd edn.). Tech. Rep., ADS, ALGAO UK, English Heritage, Historic Scotland, RCAHMS and RCAHMW.
Grove, M., & Blinkhorn, J. (2020). Neural networks differentiate between middle and later stone age lithic assemblages in eastern Africa. PloS One, 15(8), e0237528.
Gualandi, M. L., Gattiglia, G., & Anichini, F. (2021). An open system for collection and automatic recognition of pottery through neural network algorithms. Heritage, 4(1), 140–159.
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.
Huggett, J. (2020). Is big digital data different? Towards a new archaeological paradigm. Journal of Field Archaeology, 45(suppl. 1), S8–S17. https://doi.org/10.1080/00934690.2020.1713281
International Committee for Documentation (CIDOC). (2014). Information and documentation: A reference ontology for the interchange of cultural heritage information (ISO Standard No. 21127:2014). Tech. Rep., International Organization for Standardization. https://www.iso.org/standard/57832.html
Jackson, S., Richissin, C. E., McCabe, E. E., & Lee, J. J. (2020). Data-informed tools for archaeological reflexivity: Examining the substance of bone through a meta-analysis of academic texts. Internet Archaeology, 55. https://doi.org/10.11141/ia.55.12
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S., & Zhang, Z. (2009). The Archaeotools project: Faceted classification and natural language processing in an archaeological context. Philosophical Transactions Series A, Mathematical, Physical, and Engineering Sciences, 367(1897), 2507–2519. https://doi.org/10.1098/rsta.2009.0038
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Machine learning: ECML-98 (pp. 137–142). Berlin: Springer.
Kintigh, K. W., Altschul, J. H., Beaudry, M. C., Drennan, R. D., Kinzig, A. P., Kohler, T. A., Limp, W. F., Maschner, H. D., Michener, W. K., Pauketat, T. R., Peregrine, P., Sabloff, J. A., Wilkinson, T. J., Wright, H. T., & Zeder, M. A. (2014). Grand challenges for archaeology. American Antiquity, 79(1), 5–24.
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In International Conference on Machine Learning. Proceedings of Machine Learning Research (pp. 1188–1196).
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234–1240. https://doi.org/10.1093/bioinformatics/btz682
Leonelli, S. (2015). What counts as scientific data? A relational framework. Philosophy of Science, 82(5), 810–821.
Löwenborg, D. (2018). Knowledge production with data from archaeological excavations. In Archaeology and archaeological information in the digital society (pp. 37–53). Milton Park: Routledge.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Manning, C. D., Raghavan, P., & Schütze, H. (2009). An introduction to information retrieval. Cambridge: Cambridge University Press. https://doi.org/10.1109/LPT.2009.2020494
Merali, Z., & Smith, J. (1985). Optical character recognition: The technology and its application in information units and libraries. Wetherby: Boston Spa.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, ICLR 2013 Workshop Track Proceedings.
Mitchell, T. (1997). Machine learning. New York: McGraw Hill.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning (2nd edn.). Cambridge: MIT Press.
Nash, B. S., & Prewitt, E. R. (2016). The use of artificial neural networks in projectile point typology. Lithic Technology, 41(3), 194–211.
Niccolucci, F., & Richards, J. D. (2013). ARIADNE: Advanced research infrastructures for archaeological dataset networking in Europe. International Journal of Humanities and Arts Computing, 7(1–2), 70–88. https://doi.org/10.3366/ijhac.2013.0082
Paijmans, H., & Brandsen, A. (2010). Searching in archaeological texts: Problems and solutions using an artificial intelligence approach. PalArch’s Journal of Archaeology of Egypt/Egyptology, 7(2), 1–6.
Paolanti, M., Pierdicca, R., Martini, M., Felicetti, A., Malinverni, E., Frontoni, E., & Zingaretti, P. (2019). Deep convolutional neural networks for sentiment analysis of cultural heritage. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4215, 871–878.
Pawlowicz, L. M., & Downum, C. E. (2021). Applications of deep learning to decorated ceramic typology and classification: A case study using Tusayan White Ware from Northeast Arizona. Journal of Archaeological Science, 130, 105375. https://doi.org/10.1016/j.jas.2021.105375
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
Plets, G., Huijnen, P., & van Oeveren, D. (2021). Excavating archaeological texts: Applying digital humanities to the study of archaeological thought and banal nationalism. Journal of Field Archaeology, 46, 289–302. https://doi.org/10.1080/00934690.2021.1899889
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–50). Valletta: ELRA.
Richards, J., Tudhope, D., & Vlachidis, A. (2015). Text mining in archaeology: Extracting information from archaeological reports. In J. A. Barcelo & I. Bogdanovic (Eds.), Mathematics and archaeology (pp. 240–254). Boca Raton: CRC Press. https://doi.org/10.1201/b18530-15
Riley, M. D. (1989). Some applications of tree-based modelling to speech and language. In Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics (ACL) (pp. 339–352). https://doi.org/10.3115/1075434.1075492
Sanders, D. H. (2018). Neural networks, AI, phone-based VR, machine learning, computer vision and the CUNAT automated translation app–not your father’s archaeological toolkit. In 2018 3rd Digital Heritage International Congress (DigitalHERITAGE) Held Jointly with 2018 24th International Conference on Virtual Systems & Multimedia (VSMM 2018) (pp. 1–5). Piscataway: IEEE.
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces (pp. 63–70).
Sommerschield, T. (2020). Ralegh Radford Rome Awards: Restoring ancient text using machine learning: A case-study on Greek and Latin epigraphy. Papers of the British School at Rome, 88, 387–388. https://doi.org/10.1017/S0068246220000240
Talboom, L. (2017). Improving the discoverability of zooarchaeological data with the help of Natural Language Processing. Master’s thesis, University of York.
Talks, A. (2019). An exploration of NLP and NER for enhanced search in osteoarchaeological and palaeopathological textual resources. Master’s thesis, University of York.
Tjong Kim Sang, E. F. (2002). Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002).
Traviglia, A., Cowley, D., & Lambers, K. (2016). Finding common ground: Human and computer vision in archaeological prospection. AARGnews-The Newsletter of the Aerial Archaeology Research Group, 53, 11–24.
Trier, Ø. D., Salberg, A. B., & Pilø, L. H. (2018). Semi-automatic mapping of charcoal kilns from airborne laser scanning data using deep learning. In CAA2016: Oceans of Data. Proceedings of the 44th Conference on Computer Applications and Quantitative Methods in Archaeology (pp. 219–231). Oxford: Archaeopress.
Tudhope, D., May, K., Binding, C., & Vlachidis, A. (2011). Connecting archaeological data and grey literature via semantic cross search. Internet Archaeology, 30(30). https://doi.org/10.11141/ia.30.5
Turney, P. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 417–424).
Verschoof-van der Vaart, W. B., Lambers, K., Kowalczyk, W., & Bourgeois, Q. P. (2020). Combining deep learning and location-based ranking for large-scale archaeological prospection of LiDAR data from the Netherlands. ISPRS International Journal of Geo-Information, 9(5), 293. https://doi.org/10.3390/ijgi9050293
Verschoof-van der Vaart, W. B., & Landauer, J. (2021). Using CarcassonNet to automatically detect and trace hollow roads in LiDAR data from the Netherlands. Journal of Cultural Heritage, 47, 143–154. https://doi.org/10.1016/j.culher.2020.10.009
Vince, A. (1996). Editorial. Internet Archaeology, 1. https://doi.org/10.11141/ia.1.7
Vlachidis, A. (2012). Semantic indexing via knowledge organization systems: Applying the CIDOC-CRM to archaeological grey literature. Unpublished PhD thesis, University of South Wales (USW).
Vlachidis, A., Tudhope, D., & Wansleeben, M. (2021). Knowledge-based named entity recognition of archaeological concepts in Dutch. In E. Garoufallou & M. A. Ovalle-Perandones (Eds.), 14th International Conference on Metadata and Semantic Research (pp. 53–64). Cham: Springer. https://doi.org/10.1007/978-3-030-71903-6_6
Vlachidis, A., Tudhope, D., Wansleeben, M., Azzopardi, J., Green, K., Xia, L., & Wright, H. (2017). D16.4: Final report on natural language processing. Tech. Rep., ARIADNE. http://legacy.ariadne-infrastructure.eu/wp-content/uploads/2019/01/D16.4_Final_Report_on_Natural_Language_Processing_Final.pdf
Wilcke, W. X., de Boer, V., de Kleijn, M. T., van Harmelen, F. A., & Scholten, H. J. (2019). User-centric pattern mining on knowledge graphs: An archaeological case study. Journal of Web Semantics, 59, 1–10. https://doi.org/10.1016/j.websem.2018.12.004
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., ... Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1–3), 37–52.
Chapter 12
Argument Mining and Analytics in Archaeology

John Lawrence, Martín Pereira-Fariña, and Jacky Visser

J. Lawrence · J. Visser: Department of Computing, University of Dundee, Dundee, UK; e-mail: j.lawrence@dundee.ac.uk; j.visser@dundee.ac.uk
M. Pereira-Fariña: Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_12

Abstract The ever increasing volume of textual data ripe for analysis has driven computational efforts to unlock the wealth of information contained within. The automated reconstruction of the argumentative structure of texts, Argument Mining, meets this challenge by not only showing what claims are being advanced (conclusion), but also why (premises). In this chapter, we start by surveying some of the foundations and state-of-the-art of argument mining and how they can be applied in domain-specific tasks in different research contexts, such as archaeology. After that, we discuss two central themes in argumentation critical for argument mining: argument schemes (common patterns of reasoning) and discourse markers (that function as argumentative indicators). Next, we describe how to create specific datasets for argument mining systems by means of annotated text corpora and how to store them using the Argument Interchange Format ontology. We conclude by explaining Argument Analytics, a visual way to deliver the output of argument mining systems to their potential users.

Keywords Argument mining · Argument scheme · Corpus · Analytics · Ontology

12.1 Introduction

The ever increasing volume of textual data ripe for analysis has driven computational efforts to unlock the wealth of information contained within. Automated techniques such as Opinion Mining and Sentiment Analysis make it possible to
identify the views expressed in a piece of text, for example whether a product review is positive or negative (Pang & Lee, 2008). While these well-established techniques can be effectively used to determine the stance of argumentative texts, they stop short of reconstructing the reasoning advanced in support of that stance. The automated reconstruction of the argumentative structure of texts, Argument Mining, meets this challenge by not only showing what claims are being advanced, but also why.

The earliest approaches to argument mining (Moens et al., 2007; Palau & Moens, 2009) attempted to detect the argumentative parts of a text by first splitting it into sentences and using features of these sentences to classify each as either Argument or Non-Argument, and then classifying each Argument sentence as either premise or conclusion. Whilst much recent work in this area builds on these concepts and techniques, the range of tasks and technologies available has grown dramatically, as have the application areas. The tasks addressed can be broadly categorised as: identifying argument components, including boundary detection and argument/non-argument classification; identifying clausal properties, both intrinsic, such as whether the clause is factual or opinion-based, and contextual, such as whether the clause is the conclusion to an argument; and identifying relational properties, from simple premise/conclusion relationships, to whether a set of clauses form an instance of an argument scheme.

In this chapter, we will survey some of the foundations and state-of-the-art of argument mining: the automated reconstruction of the reasons advanced in defence (or attack) of a disputed claim (Lawrence & Reed, 2020). Argument mining tools can be fine-tuned for application on a variety of text genres and domains.
While we will mostly discuss general argument mining, the methods can be utilised to great effect on specialised domain-specific tasks in, e.g., research contexts—whether in archaeology or beyond. After outlining the application of argument mining techniques in the archaeology domain (Sect. 12.2), we will use the remainder of this chapter to first discuss two central themes in argument mining and argumentation broadly (Sect. 12.3): argument schemes as common patterns of reasoning that can be identified and used to determine the structure of argumentation (Sect. 12.3.1), and discourse markers that function as argumentative indicators (Sect. 12.3.2). Next, we turn to argumentation datasets, focusing on both annotated and generated text corpora developed specifically for argument mining (Sect. 12.4). In the next section, we briefly look at the Argument Interchange Format, a shared ontology for representing argumentative concepts and labels (Sect. 12.5) that can be employed to increase the compatibility and reusability of annotated datasets. In the last part of the chapter, we will explain Argument Analytics, which can take the output of argument mining systems to provide quantitative metrics and visualisations. These data-driven infographics give users an overview of complex evidential reasoning and debates, supporting the sense-making process on otherwise hard to track argumentative cases (Sect. 12.6).

12.2 Applying Argument Technologies in Archaeology

Lucas (2019) defends an argumentative model of reasoning as one of the possible theoretical frameworks for knowledge production in archaeology. This type of knowledge does not become stable until it is elaborated in complex texts (although even then it remains mobile) and, because archaeological knowledge consists of claims about the past that cannot be indubitably verified, there are always alternative, competing ways of interpreting the archaeological record. The most recent work studying how archaeological knowledge is produced (Chapman & Wylie, 2016) is mainly inspired by Toulmin’s (1958) model of argumentation, which has several limitations when it comes to capturing the variety and richness of the ways in which people build different types of arguments in the field. We argue, however, in favour of using annotated text corpora (see Sect. 12.4) and Argument Analytics (see Sect. 12.6) in the study of knowledge production in archaeology as an alternative to the Toulmin model.

Knowledge production in archaeology requires different text types. In addition to the documents generated during any research (reports, field diaries, etc.), the two main text types are journal articles and books (Lucas, 2019). Because journal articles usually contain more argumentation than books, which are more descriptive and expository, articles should constitute the raw data for creating annotated corpora. In addition, since this type of text uses a formal linguistic register and is carefully written, its annotation should be easier and more reliable. On the other hand, different texts can provide alternative views of the same underlying data. Therefore, the texts must develop alternative lines of reasoning.
Texts aligned with the same view or defending the same main claim can be put together into the same set of raw corpus data to be annotated, which allows us to explore their inter-textual relationships (Chandler, 2003; Visser et al., 2018a); i.e., to discover how texts and authors are connected to one another and how the content of various texts cross-references and relies on the meaning of others. As a result, arguments are not isolated islands but, following Lucas’ program (Lucas, 2019, p. 61), “arguments work best not as linear chains but as multiple strands which work to triangulate around the same chain”.

Efforts to create good annotated text corpora will make it possible to address some of the current epistemic issues in the field of archaeology, because such corpora unpack how different researchers argue in favour of their competing interpretative views about the past. A standard ontology and annotation scheme (see Sect. 12.5), and notions such as argument schemes (see Sect. 12.3.1), have a higher expressive power than Toulmin’s model, and they allow us to better capture the argumentative richness of texts. In addition, combining these manual methods with automated techniques for Argument Mining and Argument Analytics makes it possible to capture the theoretical pluralism and overcome that inner tension with an argumentative model of reasoning. As we describe in Sect. 12.6, annotated corpora not only provide information about the content of the debates but also about their dynamics. This allows us to observe both the interaction between speakers and their main
points of agreement and disagreement. As a result, we will be able to join together the different theories with respect to a particular issue and how they are defended.

In conclusion, we argue that it is worth making the effort to create good annotated text corpora in the archaeological field. Other fields where such corpora have already been created, such as political debate, education, legal texts, and newspapers, have already benefited from argument technologies. For instance, Visser et al. (2020a) have successfully deployed argument technology to all secondary schools in the UK to instil critical literacy skills in distinguishing fake news from genuine news. Similarly, the project The Morality of Abortion¹ shows how different debaters in the BBC Radio 4 programme The Moral Maze argue their different positions with respect to abortion laws and what their main points of conflict were. It is our contention that the same can be achieved in the archaeological field.

12.3 Dimensions of Argument Mining

Many of the earliest approaches to argument mining focused on applying existing computational linguistic techniques to identify specific facets of the argumentative structure contained within a text. Instead of requiring an a priori defined set of rules that the software applies to a given example of argumentative text, these techniques train machine learning algorithms on manually annotated argument datasets (see Sect. 12.4) to produce models capable of automatically classifying text. By feeding in a sufficient volume of text spans along with their annotated class (e.g. part of a sentence with the annotated label ‘premise’), the model learns to associate specific linguistic cues with this class and can then predict to which class new text spans should be assigned.
In this way, the system learns a model on the basis of an appropriately labelled set of examples (the training data), which is then applied to an as yet unseen set of unlabelled examples (the test data) to test how well the system performs. As advances are made in the performance of machine learning models, this strategy continues to deliver incrementally improving results (Galassi et al., 2020).

More recent advances have started to explore characteristic features of natural language that are specifically related to argumentative intent or to particular application domains. By combining such features and the accompanying techniques in a concerted approach, the insights from various disciplines and perspectives can be leveraged to achieve the best results (Lawrence & Reed, 2015). In this section, we discuss two of these central dimensions: argument schemes (Sect. 12.3.1) and linguistic features (Sect. 12.3.2).

¹ https://bbc.arg.tech/
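
The train-then-test workflow described in this section can be illustrated with a minimal sketch. It assumes a toy bag-of-words Naive Bayes classifier written in plain Python; the labelled spans, the function names (`tokenize`, `train`, `predict`) and the two-class label set are hypothetical and invented for illustration, not taken from any of the systems cited above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a span and split it into word tokens."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def train(examples):
    """Collect label and per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest Laplace-smoothed log probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: text spans with their annotated class.
TRAINING_DATA = [
    ("because the site was flooded", "premise"),
    ("since the pottery is local", "premise"),
    ("because visitors alter the humidity", "premise"),
    ("therefore the cave should stay closed", "conclusion"),
    ("so the settlement was abandoned", "conclusion"),
    ("thus the vessel was imported", "conclusion"),
]

# Hypothetical held-out test data, never shown to the model during training.
TEST_DATA = [
    ("because the layer is undisturbed", "premise"),
    ("therefore the wall was rebuilt", "conclusion"),
]

model = train(TRAINING_DATA)
predictions = [predict(model, text) for text, _ in TEST_DATA]
accuracy = sum(p == gold for p, (_, gold) in zip(predictions, TEST_DATA)) / len(TEST_DATA)
```

On data this small the model simply latches onto cue words such as "because" and "therefore"; real systems need far larger annotated corpora, as discussed in Sect. 12.4.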

12.3.1 Argument Schemes

Argument schemes capture persuasive structures of (typically presumptive) inference from a set of premises to a conclusion, relying on stereotypical patterns of human reasoning.² As such, argument schemes represent a historical descendant of the topics of Aristotle (Aristotle, 1958) and, much like Aristotle’s topics, play an equally valuable role in the production, analysis, and evaluation of arguments, whether by human arguers or by automated software.

Several attempts have been made at creating taxonomies of the most commonly used schemes. To give just a small sample of the existing scholarship: Hastings (1963), Perelman and Olbrechts-Tyteca (1969), Kienpointner (1992), van Eemeren and Grootendorst (1992), Pollock (1995), Walton (1996), Grennan (1997), Katzav and Reed (2004), Walton et al. (2008), and Wagemans (2016). Although these sets of schemes overlap in many places, the number of schemes identified and their granularity varies greatly. As a result, most argument analyses tend to contain examples from only one scheme set, with various permutations of the schemes described by Douglas Walton being the most commonly used in computational approaches.

Whilst the majority of these argument schemes are general in their nature, applying to any situation where argument can be found, some schemes are specific to a certain context or domain. For example, Wyner et al. (2012) identify a ‘consumer argumentation scheme’ to represent the arguments made in product reviews relative to the preferences and values of the potential buyer. Similarly, Green (2015) identifies ten custom argument schemes targeted at genetics research articles.
For example, one of the schemes presented, ‘Failed to Observe Effect of Hypothesized Cause’, looks for situations where specific properties were not observed, and where it is assumed that a specific condition that would result in those properties is present, leading to the conclusion that the condition may not be present. Green (2018) further argues for schemes expressed in terms of domain concepts rather than by generic definitions like those used by Walton et al. (2008). She carries out a pilot annotation study of schemes for 15 arguments in the Results/Discussion section of biological/biomedical journal articles. To the best of our knowledge, no domain-specific argument schemes have yet been specified for archaeology.

² The study of argument schemes has a long history, ranging from Antiquity to modern academic research (Garssen, 2001). In the literature, various authors use different terms to signify the same general idea (with small variations): e.g., ‘argument scheme’ (van Eemeren and Grootendorst, 1992), ‘argumentation scheme’ (Walton, 1996), and ‘argumentative scheme’ (Perelman & Olbrechts-Tyteca, 1969). In this chapter, we will use the term ‘argument scheme’.

Understanding the argument scheme, whether general or domain-specific, instantiated in a piece of natural language text can help us understand its persuasive force beyond what many existing automated techniques for extracting meaning offer. If we consider the argument in Example (1), then sentiment analysis techniques, for instance, allow us to understand at a high level what views are being presented (that the speaker is against opening the Cave of Altamira in Spain, for example), but they are unable to provide details on exactly why this standpoint is held.³

(1) Opening the Cave of Altamira to the public may damage the drawings, because environmental stability is essential to their preservation, and the visitors risk altering the environmental parameters.

Looking at the structure of the argumentation in this example, we can see that the propositions “environmental stability is essential to their preservation” and “the visitors risk altering the environmental parameters” are working together as a linked argument (Snoeck Henkemans, 1992; Freeman, 2011) to support the conclusion “Opening the Cave of Altamira to the public may damage the drawings”. Furthermore, we can see that the link between the premises and conclusion is an instance of Argument from Cause to Effect (Walton et al., 2008). In Walton’s approach to argument schemes a particular label is often assigned to each component part of a scheme instance. For the Argument from Cause to Effect in Example (1), the scheme components are only labelled as major and minor premise, as follows:

Major Premise: environmental stability is essential to the preservation of the drawings
Minor Premise: the visitors risk altering the environmental parameters
Conclusion: opening the Cave of Altamira to the public may damage the drawings

The features of these common patterns of argument provide us with a way in which to both identify that an argument is advanced and determine its structure. By using the specific nature of each component proposition in a scheme, we can identify where a particular scheme is being used and classify the propositions accordingly, thereby gaining a deeper understanding of the reasoning expressed in a piece of text.
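
As an illustration of how a labelled scheme instance such as the one in Example (1) might be held in software, the following is a minimal sketch. The `SchemeInstance` class and the `render` helper are hypothetical names introduced here; they are not part of any existing argument mining tool or of the Argument Interchange Format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeInstance:
    """One instantiation of an argument scheme (hypothetical structure)."""
    scheme: str        # name of the scheme, e.g. as given by Walton et al. (2008)
    premises: dict     # maps a component label to the proposition filling it
    conclusion: str


def render(instance):
    """Format a scheme instance as one labelled component per line."""
    lines = [f"Scheme: {instance.scheme}"]
    lines += [f"{role}: {text}" for role, text in instance.premises.items()]
    lines.append(f"Conclusion: {instance.conclusion}")
    return "\n".join(lines)


# Example (1) from the text, encoded with its component labels.
altamira = SchemeInstance(
    scheme="Argument from Cause to Effect",
    premises={
        "Major Premise": "environmental stability is essential to the "
                         "preservation of the drawings",
        "Minor Premise": "the visitors risk altering the environmental parameters",
    },
    conclusion="opening the Cave of Altamira to the public may damage the drawings",
)

print(render(altamira))
```

Keeping the component labels explicit, rather than storing a flat list of premises, is what would let a classifier be trained per component type, in line with the idea of using the specific nature of each proposition to recognise a scheme.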

Argument schemes can be a strong feature in argument mining and in the reconstruction of enthymemes (understood narrowly as arguments with unexpressed premises) (Feng & Hirst, 2011). To maximise their efficacy, the number and variation of individual schemes in annotated argument corpora should be as large as possible. Existing annotations, however, tend to use restricted sets of scheme types, while struggling to obtain reliable annotation results. For example, Duschl (2007) initially adopts a selection of nine argument schemes described by Walton (1996) for his annotation of transcribed middle-school student interviews about science fair projects. Later, however, he collapses several schemes into four more general classes no longer directly related to particular scheme types. This deviation from Walton’s typology appears to be motivated by the need to improve annotation agreement. Similarly, Song et al. (2014) base their annotation on a modification of Walton’s typology, settling on a restricted set of three more general schemes: policy, causal, and sample, while Anthony and Kim (2015) employ a bespoke set of nine coding labels modified from the categories used by Duschl (2007) and nine schemes described in a textbook by Walton (2006).

³ This example was adapted and translated from the Cultural Heritage – Altamira corpus, available at http://corpora.aifdb.org/Altamira.

Visser et al. (2020b) develop an annotation procedure that aims to stay close to Walton’s original typology, while facilitating the reliable annotation of a broad range of argument schemes. The main principle guiding the annotation is the clustering of argument schemes on the basis of intuitively clear features recognisable to annotators. Due to the strong reliance on the distinctive properties of arguments that are characteristic for a particular scheme, the annotation procedure bears a striking resemblance to methods for biological taxonomy, that is, the identification of organisms in the various sub-fields of biology (see, e.g., Voss, 1952; Pankhurst, 1978). Drawing on the biological analogue and building on the guidelines used by Visser et al. (2018b), they develop a taxonomic key for the identification of argument schemes in accordance with Walton’s typology: the Argument Scheme Key (ASK).

The ASK is a dichotomous identification key that leads the analyst through a series of disjunctive choices based on the distinctive features of a ‘species’ of argument scheme to the particular type. Starting from the distinction between source-based and other arguments, each further choice in the key leads to either a particular argument scheme or to a further distinction. The distinctive characteristics are numbered; where the characteristic that led to a given point in the key does not directly precede it, its number is listed between brackets. To further simplify the annotation of argument schemes, Lawrence et al. (2019) develop a software solution that takes the user through the ASK in a series of binary choices to result in a suggested scheme.
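
The idea of walking a dichotomous key through a series of binary choices can be sketched as a small binary decision tree. This is a toy illustration only: apart from the opening distinction between source-based and other arguments, which is mentioned above, the questions and the layout of `TOY_KEY` are invented for this sketch and do not reproduce the actual ASK; `Choice` and `identify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Choice:
    """One dichotomous choice: a yes/no question with two outcomes."""
    question: str
    if_yes: Union["Choice", str]   # either a further choice or a scheme name
    if_no: Union["Choice", str]


def identify(key: Union[Choice, str], answer: Callable[[str], bool]) -> str:
    """Follow the key until a leaf (a scheme name) is reached."""
    node = key
    while isinstance(node, Choice):
        node = node.if_yes if answer(node.question) else node.if_no
    return node


# Hypothetical miniature key; NOT the real Argument Scheme Key.
TOY_KEY = Choice(
    "Does the argument depend on a source?",
    Choice(
        "Is the source presented as an expert?",
        "Argument from Expert Opinion",
        "Argument from Position to Know",
    ),
    Choice(
        "Does the argument depend on a causal relation?",
        "Argument from Cause to Effect",
        "Argument from Sign",
    ),
)

# Answers an annotator might give for Example (1) on the Cave of Altamira.
answers = {
    "Does the argument depend on a source?": False,
    "Does the argument depend on a causal relation?": True,
}
suggested = identify(TOY_KEY, lambda question: answers[question])
print(suggested)
```

Passing the answers in as a callable means the same traversal could be driven by an interactive prompt instead of a prepared dictionary, which is roughly the kind of interaction an assistant tool would offer.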
This ASK Assistant is integrated in the Online Visualisation of Argument (OVA) tool (Janier et al., 2014), a web-browser-based application for analysing and annotating the argumentative structure of natural language text. OVA⁴ has over 3000 individual users in 38 countries, analysing argumentative texts ranging from online discussions (Lawrence et al., 2017) to election debates (Visser et al., 2019).

⁴ http://ova.arg.tech

12.3.2 Linguistic Features

Discourse markers are explicitly stated linguistic expressions of the relationship between statements (Webber et al., 2011), and, when present, provide strong indicators of argumentative structure (van Eemeren et al., 2007). For instance, if we consider Example (2), then this can be split into two separate propositions “the Palace of Culture and Science in Warsaw should not be demolished” and “it [the Palace of Culture and Science in Warsaw] houses many public institutions”.⁵ The presence of the discourse marker “because” between these two propositions is a clear indication that the second is being employed as a reason for the first.

(2) The Palace of Culture and Science in Warsaw should not be demolished, because it houses many public institutions.

⁵ Example adapted from Budzynska et al. (2021).

Discourse indicators have been successfully leveraged as a component of argument mining techniques. For example, Stab and Gurevych (2014b) used indicators as a feature in multiclass classification of argument components, with each clause classified as a major claim, claim or premise, or as non-argumentative. Similar indicators are used by Wyner et al. (2012), along with domain terminology (e.g. product names and properties), to highlight potential argumentative sections of online product reviews. However, there has been little study of how well indicators perform on their own, how frequently they occur in real-world text, and how well different individual indicators map to specific argumentative relations.

There are many different ways in which indicators can appear, and a wide range of relations which they can suggest (Knott, 1996). Lawrence and Reed (2017) limit their search to specific terms indicating support or attack relations between a pair of propositions. Specifically, they consider those indicators which show an argumentative relation between sequential propositions of the form A [indicator] B (as we saw in Example (2)) or [indicator] A B (e.g. “Because the Palace of Culture and Science in Warsaw houses many public institutions, it should not be demolished”). They also consider the relationship between indicators and the directionality of the argumentative connections (e.g. A because B suggests a support relation from the premise B to the conclusion A, whereas A therefore B suggests a support relation from A to B).
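
The two surface forms and their directionality can be made concrete with a short sketch. It assumes a naive word-level match against the indicators listed in Table 12.1; the function name `find_relation` and the splitting heuristics (first medial indicator, comma as clause boundary) are simplifications introduced here and are not the method used by Lawrence and Reed (2017).

```python
# Indicator -> (relation, direction), transcribed from Table 12.1.
# "forward":  A [indicator] B relates A to B.
# "backward": A [indicator] B relates B to A.
INDICATORS = {}
for word in "so therefore accordingly then thus consequently hence ergo".split():
    INDICATORS[word] = ("support", "forward")
for word in "because since as".split():
    INDICATORS[word] = ("support", "backward")
for word in "but however nonetheless nevertheless still yet though whereas".split():
    INDICATORS[word] = ("conflict", "forward")
for word in "although except despite albeit".split():
    INDICATORS[word] = ("conflict", "backward")


def find_relation(sentence):
    """Return (relation, source, target) for the first indicator found, else None."""
    text = sentence.strip().rstrip(".")
    words = text.split()
    first = words[0].lower().strip(",") if words else ""
    # Form "[indicator] A, B": only attempted for backward indicators ("because").
    if first in INDICATORS and INDICATORS[first][1] == "backward" and "," in text:
        a, b = text[len(words[0]):].split(",", 1)
        return INDICATORS[first][0], a.strip(), b.strip()
    # Form "A [indicator] B": split at the first indicator that is not sentence-initial.
    for i in range(1, len(words) - 1):
        token = words[i].lower().strip(",")
        if token in INDICATORS:
            relation, direction = INDICATORS[token]
            a = " ".join(words[:i]).rstrip(",")
            b = " ".join(words[i + 1:])
            return (relation, a, b) if direction == "forward" else (relation, b, a)
    return None


medial = find_relation(
    "The Palace of Culture and Science in Warsaw should not be demolished, "
    "because it houses many public institutions."
)
initial = find_relation(
    "Because the Palace of Culture and Science in Warsaw houses many public "
    "institutions, it should not be demolished."
)
```

Both calls yield a support relation running from the "houses many public institutions" proposition to the "should not be demolished" proposition. Short, ambiguous words such as "as", "so" or "still" will trigger many false matches under this kind of matching, which is one reason indicators alone are a weak signal.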

In this work, two sources of candidate discourse indicators were used: an aggregation of those found in existing literature (Groarke et al., 1997; Knott, 1996), and a domain specific list extracted from relations in the US2016 corpora (Visser et al., 2019). In each case, these lists were extended by including synonyms identified using WordNet (Miller, 1995). The indicators drawn from existing literature are shown in Table 12.1.

Table 12.1 Argumentative discourse indicators from existing literature

Relation type   Direction   Indicators
support         A → B       So, therefore, accordingly, then, thus, consequently, hence, ergo
support         A ← B       Because, since, as
conflict        A → B       But, however, nonetheless, nevertheless, still, yet, though, whereas
conflict        A ← B       Although, except, despite, albeit

Surprisingly, the results show that indicators which are commonly mentioned in the literature as being useful for identifying argumentative structure rarely occur in the examined data. The indicator “therefore”, for instance, only occurs once within the entire US2016G1tv corpus (where it does indeed connect two inferentially linked text spans). Of those indicators which do appear more frequently in US2016G1tv, most provide little information. For example, whilst there were 30 instances of the indicator “so” occurring between adjacent spans, only 37.5% of these instances were between spans where a support relation exists. A possible explanation can be found in the spoken genre of the US2016G1tv corpus, in which “so” may be used rather as a linguistic device signalling turn-taking. The one exception here is the indicator “because”. This indicator appears between spans 71 times and, of these, 87.3% were connected by a support relationship. Whilst this is a promising result, and suggests that, in those cases where “because” occurs, it can tell us with high accuracy the type of connection, it is also shown that using this method on its own would leave approximately 80% of support relations (as well as all conflict relations) unidentified.

These results are supported by those of earlier work carried out on the Araucaria corpus (Reed et al., 2008). Focusing on the thirteen most reliable support indicators and eleven most reliable conflict indicators, Lawrence and Reed (2015) achieved an overall precision of 0.89, but a recall of only 0.04, concluding that: “discourse indicators may provide a useful component in an argument mining approach, but, unless supplemented by other methods, are inadequate for identifying even a small percentage of the argumentative structure”.

12.4 Annotated Corpora of Argumentation

One of the challenges faced by current approaches to argument mining is the lack of suitably large quantities of appropriately labelled (or annotated) arguments to serve as training and test data.
Especially techniques based on neural networks and deep learning require vast quantities of data to perform well, and to prevent the system from over-fitting to the data—fitting to a limited arbitrary text sample, at the expense of wider applicability. Several recent efforts have been made to improve this situation through the creation of annotated text corpora and argument datasets across a range of different communicative domains. These efforts can be broken down into two main categories: manually annotated corpora of argumentative components and structure found in natural language text; and manually or automatically generated corpora. In this section, we will discuss some of the prototypical and most widely used annotated and generated text corpora for argument mining. We will inevitably leave out a great many alternatives, but chose this sample to give a reasonable introduction—a wider overview is presented by, e.g., Lawrence and Reed (2020), and some of the datasets not discussed independently in this chapter can still be found in Table 12.2. The Internet Argument Corpus (IAC) consists of .∼390,000 posts in .∼11,000 online discussions, totalling some 73,000,000 words (Walker et al., 2012). Subsets
AraucariaDB US2016 MM2012 Dispute mediation Digging by debating Name AIFdb Corpora Argumentation schemes 62,881 words 87,064 words = 0.68 = 0.75 Single annotator .κ = 0.55 (types), 0.61 (relations) .κ .κ Single annotator 35,789 words 26,923 words 29,068 words Single annotator 6704 words Examples of occurrences of Walton’s argumentation schemes found in episodes of the BBC Moral Maze Radio 4 programme. Collection of analyses of nineteenth century philosophical texts from the Hathi Trust collection. Argument maps of mediation session transcripts Analyses of all episodes from the 2012 summer season of the BBC Moral Maze Radio 4 programme. 2016 US presidential elections: annotations of selected excerpts of primary and general election debates, combined with annotations of selected excerpts of corresponding Reddit comments. An import of 661 argument analyses produced using Araucaria and stored in the Araucaria database. IAA Size Description Table 12.2 Significant argumentation datasets available online http://corpora.aifdb.org/araucaria http://corpora.aifdb.org/US2016 http://corpora.aifdb.org/mm2012 http://corpora.aifdb.org/mediation http://corpora.aifdb.org/dbyd http://corpora.aifdb.org/schemes URL Reed (2006) Visser et al. (2018a) Budzynska et al. (2014) Janier and Reed (2016) Murdock et al. (2017) Lawrence and Reed (2016) Reference 272 J. Lawrence et al.
Available elsewhere Argument annotated essays Language of opposition Microtext Internet argument corpus (IAC) Imported into AIFdb eRulemaking AraucariaDBpl The corpus consists of argument annotated persuasive essays including annotations of argument components and argumentative relations. Argument maps of 67 comment threads from regulationroom.org. Consisting of 11,000 discussions and developed for research in political debate on internet forums. Subsets of the data have been annotated for topic, stance, agreement, sarcasm, and nastiness among others. Used in Rutgers for the SALTS project (http://salts.rutgers.edu/). 112 manually created, short texts with explicit argumentation, and little argumentatively irrelevant material. A selection of over 50 Polish language analyses created using the Polish version of Araucaria. 147,271 words 48,666 words 7828 words 1,031,398 words 26,083 words 2,654 words = 0.22-0.60, ≈ 0.47 = 0.73 = 0.83 = 0.64-0.88 (types), 0.71-0.74 (relations) .κ .κ Not reported .κ̄ .κ .κ Single annotator https://bit.ly/2OlRZnt http://corpora.aifdb.org/Microtext http://corpora.aifdb.org/looc1 http://corpora.aifdb.org/IAC http://corpora.aifdb.org/RRD http://corpora.aifdb.org/araucariapl (continued) Stab and Gurevych (2017) Peldszus (2014) Ghosh et al. (2014) Walker et al. (2012) Park and Cardie (2014) Budzynska (2011) 12 Argument Mining and Analytics in Archaeology 273
IBM project debater datasets Internet argument corpus (IAC) 2 Consumer debt collection practices (CDCP) Name Argument annotated user-generated web discourse Table 12.2 (continued) Description User comments, forum posts, blogs and newspaper articles annotated with an argument scheme based on an extended Toulmin model User comments about rule proposals by the Consumer Financial Protection Bureau collected from an eRulemaking website Corpus for research in political debate on internet forums. It includes topic annotations, response characterizations, and stance. Collection of annotated data sets developed as part of Project Debater to facilitate this research. Organized by research sub-fields. = 0.65 (types), 0.44 (relations) Not reported .∼500,000 Various forum posts Various .α .∼88,000 words IAA = 0.51-0.80 .αU Size 84,673 words https://ibm.co/2OlqieA https://nlds.soe.ucsc.edu/iac2 http://joonsuk.org URL https://bit.ly/2vdkHOD (Rinott et al., 2015), Levy et al. (2017) etc Abbott et al. (2016) Niculae et al. (2017) Reference Habernal and Gurevych (2017) 274 J. Lawrence et al.
of the data have been annotated with a variety of labels such as topic, stance, agreement, sarcasm, and nastiness. The IAC is further developed in the IAC Version 2 (Abbott et al., 2016), a collection of corpora for research in political debate on internet forums. It consists of three datasets: 4forums (∼414K posts), ConvinceMe (∼65K posts), and a sample from CreateDebate (∼3K posts). The annotation includes topics, response characterisations, and stance classification. The detail of argument annotation in both IAC datasets is still rather limited in comparison to that available in other datasets.

One of the ways in which others have succeeded in creating corpora with more detailed argument annotation is by narrowing down the scope and focusing on a specific domain. Green (2014), for instance, creates a freely available corpus of open-access, full-text scientific articles from the biomedical genetics research literature, annotated to support argument mining applications. However, there are challenges to creating such corpora, such as the extensive use of biological, chemical, and clinical terminology in the BioNLP domain requiring specialist annotators trained in this field (Green, 2015).

Legal texts constitute another highly specialised domain. Walker et al. (2014) mark up successful and unsuccessful patterns of argument in U.S. judicial decisions. Building on a corpus of vaccine-injury compensation cases that report fact-finding about causation, based on both scientific and non-scientific evidence and reasoning, patterns of reasoning are identified and used to illustrate the difficulty of developing a type or annotation system for characterising these patterns.

In the development of the Argument Annotated Essays Corpus (AAEC), Stab and Gurevych (2014a) leverage the inherent argumentative nature of a particular text genre.
The AAEC consists of argument-annotated persuasive essays, featuring not just topic and stance identification, and annotation of argument components and relations, but also persuasiveness scores for (a selection of) the arguments. Drawn from 90 English language essays, the initial AAEC corpus comprises 90 major claims, 429 claims, and 1033 premises, connected by 1312 support and 161 attack relations—with the second version of the AAEC (Stab and Gurevych, 2017) further extending this to 402 essays, 751 major claims, 1506 claims, and 3832 premises, connected by 3613 support and 219 attack relations. The persuasiveness annotation by Carlile et al. (2018) also includes scores for attributes that potentially impact persuasiveness: Eloquence, Specificity, Relevance, and Evidence, and the means of persuasion—Ethos, Pathos and Logos. The usefulness of this addition to AAEC has been demonstrated in the development of automated methods for persuasiveness scoring of essays (Ke et al., 2018). Another corpus focusing on persuasive essays is the generated corpus of argumentative “microtexts”. Peldszus (2014) creates this corpus by tasking participants to write approximately five segments in which: all segments are argumentatively relevant; there is a segment acting as the main claim of the text; all other segments are supporting/attacking the main claim or another segment; and at least one possible objection to the claim is considered in the text. Whilst this method of generating textual data produces very clear examples of argumentation, the artificial nature of its construction means that results obtained on the dataset may not
generalise well to naturally occurring unrestricted text. Nonetheless, the Microtext corpus provides a valuable resource for controlled ‘laboratory’ testing of argument mining techniques.

Whilst the previously discussed datasets can be viewed as fully structured argument data, there is an increasing usage of larger semi-structured sources. One source for such data is the ChangeMyView⁶ (CMV) Reddit subcommunity, the argumentative nature of which has been successfully leveraged for gathering semi-structured data by, amongst others, Hidey and McKeown (2018). The data takes the form of discussion threads where the original poster of a thread provides a viewpoint on a specific topic, and other users reply with comments aiming to change this view. If the original poster finds that a comment succeeds in changing their viewpoint, they can reply with a ∆ (delta) indicating this. The textual CMV data contains strong indicators of arguments and counterarguments (Hua & Wang, 2017).

To support the creation and curation of argument datasets, software infrastructure has been developed, including tools for argument annotation and online repositories. AIFdb⁷ is an online, freely accessible database of annotated argumentative texts (Lawrence et al., 2012). Arguably the most comprehensively annotated collection of such data, AIFdb contains a range of independent annotated corpora, comprising over 2.2m words and 200,000 claims in fourteen different languages⁸ and over 20,000 argument maps compliant with the Argument Interchange Format (AIF, see next Section) (Chesñevar et al., 2006). In Table 12.2, we survey some of the corpora contained in AIFdb (both native AIF and imported) as well as some of the main online corpora available elsewhere.

12.5 Argument Interchange Format

Argumentation theory is a large and diverse field stretching from analytical philosophy to communication theory and social psychology.
The computational investigation of the space has multiplied that spectrum by a diversity of its own in semantics, logics and inferential systems. One of the problems associated with the diversity and productivity of the field, however, is fragmentation: with many researchers from various backgrounds focusing on different aspects of argumentation, it is increasingly difficult to reintegrate results into a coherent whole. This in turn makes it difficult for new research to build upon old. Furthermore, the large variation in theoretical interpretations of argumentative concepts leads to idiosyncratic labels in annotated datasets.

6 https://www.reddit.com/r/changemyview/.
7 http://www.aifdb.org.
8 Amharic, Chinese, Dutch, English, French, German, Hindi, Italian, Japanese, Polish, Portuguese, Russian, Spanish and Ukrainian.

To tackle such problems, the computational argument community built a common ontology for argument to support interchange between
different research projects and applications in the area: the Argument Interchange Format (AIF) (Chesñevar et al., 2006). Owing to its roots in computational argumentation, a main aspiration of the AIF is to facilitate data interchange among various tools and methods for argument analysis, manipulation and visualisation. Whilst the ideal of a single format might not be feasible in such a diverse field, a common consensus on the standards and technologies employed is desirable. Furthermore, the AIF project aims to develop a commonly agreed-upon core ontology that specifies the basic concepts used to express argumentative information and relations. The purpose of this ontology is not to replace other languages for expressing argument but rather to serve as an abstract interlingua that acts as the centrepiece to multiple individual languages for argumentation. These argument languages can be, for example, logical languages (e.g. ASPIC’s defeasible logic; Prakken, 2010), visual languages (e.g. Araucaria’s AML format for diagrams; Reed & Rowe, 2004) or natural language (e.g. as used in the pragma-dialectical theory; van Eemeren, 2018).

The AIF can be seen as a representation scheme constructed in three layers. At the most abstract layer, the AIF provides a hierarchy of concepts which can be used to describe argument structure. This hierarchy describes an argument by conceiving of it as a network of connected nodes that are of two types: information nodes that capture data (such as datum and claim nodes in an analysis using the Toulmin (1958) model, or premises and conclusions in a box-and-arrow analysis in the style of Freeman (1991), for example), and scheme nodes that describe passage between information nodes (similar to the application of warrants or rules of inference).
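The node network described above can be illustrated with a small data structure. The following is a minimal sketch in Python, not the AIF specification or any AIFdb API: the class names `Node` and `ArgumentGraph`, the `kind` codes "I" and "S", and the example propositions are all assumptions made for illustration. It encodes one constraint of the format, namely that two information nodes are never linked directly but only through a scheme node.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str        # hypothetical codes: "I" = information node, "S" = scheme node
    text: str = ""


@dataclass
class ArgumentGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (source_id, target_id) pairs

    def add_node(self, node_id, kind, text=""):
        self.nodes[node_id] = Node(node_id, kind, text)

    def add_edge(self, source_id, target_id):
        source, target = self.nodes[source_id], self.nodes[target_id]
        # Information nodes may only be connected via a scheme node.
        if source.kind == "I" and target.kind == "I":
            raise ValueError("I-nodes must be linked through a scheme node")
        self.edges.append((source_id, target_id))


# Invented example: one premise supporting one conclusion.
graph = ArgumentGraph()
graph.add_node("i1", "I", "The sherds are wheel-made")
graph.add_node("i2", "I", "The layer postdates the adoption of the potter's wheel")
graph.add_node("s1", "S", "inference")
graph.add_edge("i1", "s1")   # premise -> scheme node
graph.add_edge("s1", "i2")   # scheme node -> conclusion
```

Attempting `graph.add_edge("i1", "i2")` raises a `ValueError`, since that edge would join two information nodes without an intervening scheme node.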
Scheme nodes in turn come in several different guises, including scheme nodes that correspond to support or inference (or ‘rule application nodes’), scheme nodes that correspond to conflict or refutation (or ‘conflict application nodes’), scheme nodes that correspond to rephrase and scheme nodes that correspond to value judgements or preference orderings (or ‘preference application nodes’). At this topmost layer, there are various constraints on how components interact: information nodes, for example, can only be connected to other information nodes via scheme nodes of one sort or another. Scheme nodes, on the other hand, can be connected to other scheme nodes directly (in cases, for example, of arguments that have inferential components as conclusions, e.g. in patterns such as Kienpointner’s (1992) ‘warrant-establishing arguments’). Inference captured by multiple incoming scheme nodes thus naturally corresponds to convergent argumentation; that covered by multiple premises supporting a single incoming scheme node corresponds to linked argumentation (Walton, 2006). A second, intermediate layer provides a set of specific argumentation schemes (and value hierarchies, and conflict patterns). Thus, the uppermost layer in the AIF ontology lays out that presumptive argumentation schemes are types of rule application nodes, but it is the intermediate layer that cashes those presumptive argumentation schemes out into Argument from Consequences, Argument from Cause to Effect and so on. At this layer, the form of specific argumentation schemes is defined: each will have a conclusion description (such as ‘A may plausibly be taken to be true’) and one or more premise descriptions (such as ‘E is an expert
in domain D’). Walton’s schemes (Walton, 1996; Walton et al., 2008) have been developed in full for the AIF (Rahwan et al., 2007).

Finally, the third and most concrete level supports the integration of actual fragments of argument, with individual argument components (such as strings of text) instantiating elements of the layer above. At this third layer, an instance of a given scheme is represented as a rule application node: RA for applications of rules of inference, CA for conflict scheme applications, MA for rephrases or transformations, PA for preference schemes, etc. These rule application nodes are said to fulfil the presumptive argumentation scheme descriptors at the level above. As a result of this fulfilment relation, premises of the rule application node fulfil the premise descriptors, the conclusion fulfils the conclusion descriptor, presumptions can fulfil presumption descriptors, and conflicts can be instantiated via instances of conflict schemes that fulfil the conflict scheme descriptors at the level above. Rephrase plays a slightly different role, that of connecting information nodes of similar propositional content. Again, all the constraints at the intermediate layer are inherited, and new constraints are introduced by virtue of the structure of the argument at hand.

12.6 Argument Analytics

Argument Analytics⁹ provides a suite of automated techniques for quantitatively processing and visualising characteristics of large sets of analysed argumentative data (Lawrence et al., 2016). More specifically, the developed methods work with any data conforming to the Argument Interchange Format, be it pre-annotated data (from AIFdb, for instance), or the output of argument mining software. Argument Analytics components range from the detailed statistics required for discourse analysis or argument mining, to infographic-style representations, offering insights in a way that is accessible to a general audience.
The extendable set of modules currently comprises: simple statistical data, which provides both an overview of the argument structure and frequencies of patterns such as argumentation schemes; dialogical data highlighting the behaviour of participants of the dialogue; and real-time data allowing for the graphical representation of an argument structure developing over time. Together these analytics open an avenue to giving feedback on live debates, generating hypotheses from large sets of evidence, producing summaries of citizen science, and more. The Argument Analytics platform is designed specifically for making sense out of argument data represented according to the AIF such as the data stored in the AIFdb¹⁰ database (Lawrence et al., 2012).

9 http://analytics.arg.tech.
10 http://www.aifdb.org.

AIFdb Corpora enables Argument Analytics to display the interpretations of data, whether on a single AIF argument
map (stored in AIFdb as a NodeSet), or a large corpus containing hundreds or thousands of such AIF representations.

12.6.1 Simple Statistics

The simple statistics modules allow an analyst to quickly make sense of a large amount of annotated argument data. Although these calculations are straightforward and relatively easy to automate, they nevertheless provide interesting insights into the data. The overview page shows a range of statistics, offering a rapidly digested summary of the overall argumentative structure. The number of Information nodes provides an indication of the overall size of the analysis. The average number of words per Information Node illustrates the complexity of the ideas presented, and how succinctly they are expressed. The numbers of inference (RA) and conflict (CA) nodes give a suggestion as to the nature of the dialogue, which is further expanded by showing the ratios of RA to CA (capturing how diverse the perspectives in the debate are) and RA to I (how dense the argumentation is). From these ratios it is possible to get an idea of: how close the relationships are between the points being made (low ratios of RA and CA to I-nodes suggest an argument that is quite loose and fragmented); the levels of conflict and agreement; and, perhaps, how contentious the issue being discussed is, with a higher ratio of CA to RA suggesting a more contentious issue.

The Pattern Count modules expand on the overview to give detailed statistics suitable for more in-depth argument and discourse analysis. They provide the frequencies of commonly occurring patterns, split into two categories.
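The overview figures described above reduce to simple counts and ratios. As a rough sketch (the function name `overview`, the `(node_type, text)` input format and the example propositions are assumptions for illustration, not the Argument Analytics implementation), they could be computed as follows:

```python
from collections import Counter


def overview(nodes):
    """Summarise a list of (node_type, text) pairs, e.g. ("I", "...") or ("RA", "")."""
    counts = Counter(node_type for node_type, _ in nodes)
    i_texts = [text for node_type, text in nodes if node_type == "I"]
    total_words = sum(len(text.split()) for text in i_texts)

    def ratio(a, b):
        # None signals an undefined ratio (no nodes of type b).
        return counts[a] / counts[b] if counts[b] else None

    return {
        "i_nodes": counts["I"],
        "ra_nodes": counts["RA"],
        "ca_nodes": counts["CA"],
        "words_per_i_node": total_words / len(i_texts) if i_texts else None,
        "ra_to_ca": ratio("RA", "CA"),
        "ra_to_i": ratio("RA", "I"),
        "ca_to_ra": ratio("CA", "RA"),
    }


# Invented example: four propositions, two inferences, one conflict.
example = [
    ("I", "The ditch cuts the wall"),
    ("I", "The ditch is later than the wall"),
    ("I", "The wall was rebuilt"),
    ("I", "The ditch fill contains residual pottery"),
    ("RA", ""), ("RA", ""), ("CA", ""),
]
stats = overview(example)
```

For this invented input the RA to I ratio is 0.5 and the CA to RA ratio is 0.5; how such values should be read (loose versus dense, contentious versus consensual) remains a matter of interpretation, as the text notes.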
The first category, argumentative and illocutionary patterns, describes both the nature of the interactions, for example levels of agreement and disagreement, and the way in which participants have expressed themselves and interacted with each other, such as how frequently a participant questions the statements of others compared to how frequently they assert their own views. The second category, dialogical patterns, illustrates the flow of the discourse and gives an indication of any dialogical rules, either explicit or implicit, to which the participants are conforming. Such dialogical patterns are also useful, for instance, to show cross-cultural differences in dialogue, or differences in the formality and setting of dialogues.

12.6.2 Comparative Statistics

Comparative statistics modules (Duthie et al., 2016) allow for the validation of both manual and automatic argument analysis. Such calculations enable comparison between two manual analyses to determine the efficacy of annotation guidelines via inter-annotator agreement (e.g. Cohen’s Kappa; Cohen, 1960), or the comparison
of results from automatic techniques to a manually created gold standard (e.g. precision, recall and F1-Score; van Rijsbergen, 1979). The examples given in this section refer to two human annotators, but in each case the same calculations could be applied with one of these being an annotation produced by an automatic system.

There are a number of considerations that must be taken into account when calculating agreement, such as what effect a differing segmentation of the original text, in two separate annotations, may have on the assignment of inference and conflict in an argument structure. To account for this, the agreement and results calculations were split into smaller sub-calculations covering segmentation similarity, propositional contents (inference and conflict) and dialogical contents (locutions). Calculating agreement for segmentation of argumentative units is a challenging task (Wacholder et al., 2014). The modular architecture of Argument Analytics allows for a range of measures to be displayed, and currently differences are accounted for using various segmentation similarity algorithms, which give an overall normalised score for the similarity. Propositional contents are compared by separating nodes from the text and instead using the Levenshtein distance for the matching of nodes. Dialogical contents are compared in the same way, with word ordering added to the Levenshtein distance for node matching and with additional calculations for the intricacies of dialogue (Duthie et al. (2016) provide an in-depth description of the comparative statistics module).

12.6.3 Dialogue Statistics

For those argument analyses where there is a dialogue taking place between multiple participants, a range of dialogue analytics modules are able to provide insights into the dynamics of the discourse, and make these complex interactions accessible to a general audience.
There is growing demand to present complex argumentative structures to a broad audience in ways which are both intuitive and interactive. Whilst there is some progress towards this goal, for example, the Election Debate Visualisation Project (Plüss & De Liddo, 2015), many of these approaches rely on custom, genre-specific interfaces for both the elicitation and display of argumentative structure. Dialogue-oriented analytics modules make use of both the locution details stored in AIFdb and the participant details provided by the Argument Web social layer (Snaith et al., 2017). Each of the modules in this section is illustrated using data from an episode of the BBC Radio 4 program Moral Maze.¹¹ These examples show how such graphical displays of information can take the technical details captured in the argumentative structure of a complex debate, and present them in ways which are easily processed by a general audience.

11 http://www.bbc.co.uk/programmes/b006qk11.
Fig. 12.1 Graphical representations of the relative involvement of each participant in a dialogue, and how stimulating the points made by each participant are

The structural statistics modules extract particular facets of the argumentative structure in order to display data such as who is speaking most, which pairs of participants are interacting most and who is making the best-supported arguments. As such, they provide a greater insight into the argumentative structure than that which is afforded by looking at a simple argument map of the same data.

For each participant, the number of locutions they have made is counted and represented in a bar chart. This provides an easy way of identifying which participants were most, and least, dominant within a dialogue. An example can be seen in Fig. 12.1, which shows that Jan Macvarish was the most active participant in this dialogue with twenty-three locutions, whereas Matthew Taylor was least active with only one locution made.

A point of debate is stimulating if it receives responses, either to agree or disagree. From the analysed argument structure, we count the number of locutions which each participant has made that have at least one response, and those which have been ignored by the other participants. The example in Fig. 12.1 shows that whilst Claire Fox has only made three locutions, they have all been responded to in some way, whereas, of the six locutions made by Clifford Longley, only two received any attention from the other participants.

The chord diagram shows the interaction between participants. A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix. The data is arranged radially around a circle with the relationships between the points drawn as arcs connecting the data together.
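The counts behind these displays are straightforward to derive once each locution records its speaker and the locution it responds to. The sketch below assumes a hypothetical input format (dictionaries with `id`, `speaker` and `reply_to` keys) and invented participants; it is not the AIFdb data model. It returns the number of locutions per participant, how many of those drew a response from someone else, how many were ignored, and per-pair counts of the kind that could populate the matrix behind a chord diagram.

```python
from collections import Counter, defaultdict


def dialogue_counts(locutions):
    by_id = {loc["id"]: loc for loc in locutions}
    made = Counter(loc["speaker"] for loc in locutions)

    # (author, responder) -> ids of the author's locutions the responder replied to
    answered = defaultdict(set)
    for loc in locutions:
        target = by_id.get(loc.get("reply_to"))
        if target is not None and target["speaker"] != loc["speaker"]:
            answered[(target["speaker"], loc["speaker"])].add(target["id"])

    # A locution counts as "responded to" if anyone else replied to it.
    responded_ids = set().union(*answered.values())
    responded = Counter(by_id[i]["speaker"] for i in responded_ids)
    ignored = {speaker: made[speaker] - responded[speaker] for speaker in made}
    pair_counts = {pair: len(ids) for pair, ids in answered.items()}
    return made, responded, ignored, pair_counts


# Invented dialogue with three hypothetical participants.
locutions = [
    {"id": 1, "speaker": "Ann", "reply_to": None},
    {"id": 2, "speaker": "Ben", "reply_to": 1},
    {"id": 3, "speaker": "Cat", "reply_to": 1},
    {"id": 4, "speaker": "Ann", "reply_to": 2},
    {"id": 5, "speaker": "Ben", "reply_to": None},
]
made, responded, ignored, pair_counts = dialogue_counts(locutions)
```

Using sets of locution identifiers, rather than raw reply counts, means that a locution answered several times by the same participant is counted once for that pair; whether that matches the published figures is an assumption of this sketch.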
In this case, the arcs represent interaction between participants, with the width of the arc at each end representing the number of locutions made by that participant to which the connected participant has responded. Viewing the interactions in this way makes it easy to identify, for example, cliques. An example chord diagram can be seen in Fig. 12.2. Clicking on a specific participant emphasises their connections with other participants. For
example, with Melanie Philips selected (as shown on the right of the figure), we can see that the majority of her interactions were with Jan Macvarish, reflecting the fact that, for a period of the dialogue, Melanie was questioning Jan.

Fig. 12.2 Chord diagrams showing the frequency of interactions between participants. The diagram on the right shows Melanie Philips selected, highlighting just those interactions in which she is involved

Fig. 12.3 Graphical representation of the turn structure in a dialogue, highlighting the way in which each participant introduces themselves, followed by direct interactions between two pairs of participants

Similar to the average number of words per I-node presented in the overview, verbosity shows a comparison of the average length of locutions made by each participant. By comparing in this way, we are able to see not just the overall complexity of the ideas expressed, but also how prolix or concise each participant is in presenting their ideas.

Temporal statistics use the time-stamping of locutions provided by AIFdb to show how the state of a dialogue has altered as it has progressed. These statistics provide clues, not easily discernible from an argument map, as to when individual participants have been most involved in the dialogue, when conflict has arisen, and changes in topic that have occurred as the dialogue progresses.

Using the time-stamping of locutions provided by AIFdb, a graphical representation of the turn structure in a dialogue is created. This visualisation provides a quick overview of when each participant has been most active, suggesting details of any pre-defined turn-taking rules. The example shown in Fig. 12.3 reflects the turn structure in a Moral Maze episode. As the episode begins, each of the four regular panelists speaks briefly about the topic being discussed.
A guest witness is then introduced and, after providing their own views on the topic, is questioned first by one of the panelists and then by a second.
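Both verbosity and the turn structure can be derived from time-stamped locutions alone. A minimal sketch, assuming a hypothetical `(speaker, seconds, text)` tuple per locution and invented speakers (the function names are illustrative and not part of Argument Analytics):

```python
from itertools import groupby
from statistics import mean


def verbosity(locutions):
    """Average locution length in words for each speaker."""
    lengths = {}
    for speaker, _, text in locutions:
        lengths.setdefault(speaker, []).append(len(text.split()))
    return {speaker: mean(values) for speaker, values in lengths.items()}


def turns(locutions):
    """Collapse consecutive locutions by the same speaker into (speaker, count) turns."""
    ordered = sorted(locutions, key=lambda loc: loc[1])   # order by timestamp
    return [(speaker, len(list(group)))
            for speaker, group in groupby(ordered, key=lambda loc: loc[0])]


# Invented excerpt: two panelists and a witness.
excerpt = [
    ("Panelist 1", 0, "I think this matters"),
    ("Panelist 2", 10, "I disagree"),
    ("Witness", 20, "Let me explain my view"),
    ("Witness", 30, "It rests on evidence"),
    ("Panelist 1", 40, "Why"),
]
average_length = verbosity(excerpt)
turn_sequence = turns(excerpt)
```

Plotting `turn_sequence` along a time axis would give a display of the kind shown in Fig. 12.3; the sketch stops at the underlying data.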
Semantics-based analytics use Dung-style semantics (Dung, 1995) to determine the acceptability of a participant’s arguments. An AIF graph is translated into ASPIC+; then, using TOAST, a Dung-style abstract argumentation framework is derived and evaluated.

The defended points in a dialogue are those where conflicting points have been made, but these conflicting points have, in turn, been attacked. It is easy in a broad-ranging and complex dialogue for points to be made which are not challenged, either due to going unnoticed or being simply dismissed. By looking at those points which are challenged and then later defended, we gain an insight into both the validity of a point and how crucial it is to the argument which a participant is making.

Where one participant has more acceptable arguments than another, the former is said to carry more sway. This value is calculated for each participant, and displayed as the relative balance in sway between each pair of the most commonly interacting participants. This can, to some extent, be viewed as who is winning in a debate: best supporting their own points and best attacking the points made by the other participants in the dialogue.

12.6.4 Real-Time Statistics

Many of the modules used in Argument Analytics have the ability not only to display data on a fixed, pre-analysed argument structure, but also to update in real time as the structure evolves. This functionality has been used, for example, in a tool developed for the Built Environment for Social inclusion through the Digital Economy (BESiDE) project,¹² to facilitate round table discussions between architects working on the design of care environments, and the various stakeholders involved in the design process. As the discussion is taking place, the audio is recorded and an analyst uses a custom-designed interface to segment the dialogue when either the topic or the speaker changes.
A simple dialogue protocol is used, allowing participants to make moves of various types (e.g. asking questions, agreeing with another participant, and offering their own opinion), and relating to a set of predefined topics relevant to the design project. Throughout the discussion, the dialogue overview shown in Fig. 12.4 is displayed for all participants to see. This overview includes a transcript of the dialogue on the right-hand side, and analytics modules displaying how much each participant has spoken, and which topics have been discussed, on the left. In testing these interfaces, it is interesting to see that they serve not only an informative function, but actually impact the dynamics of the dialogue. When a participant can see that they are talking more than everyone else, they tend to let others speak more.

12 http://beside.ac.uk/.

When someone hasn’t spoken yet, the other participants notice this, and make an effort to direct questions
at them. And, when one topic has been less explored than the others, there is a noticeable shift towards that area in both the questions asked and the points raised. This ability of the argumentative and dialogical structure not only to represent the outcome of a discussion, but also to inform the participants and help ensure that all areas are fully explored, has wide-ranging potential applications. The current limitation to providing this kind of interface more widely is the ability to perform real-time analysis. However, as tools such as the Analysis Wall (Bex et al., 2013), which has been used to analyse several hour-long radio programmes in real time, improve, and as automatic argument mining techniques develop, it is easy to imagine such a live display accompanying activities such as debates, meetings and media coverage.

Fig. 12.4 Real-time Argument Analytics highlighting the involvement of individual participants and the topics discussed

12.7 Conclusion

The recent rapid growth in argument mining shows that there is an increasing demand for the automated extraction of deeper meaning from the vast amounts of data that we currently produce and consume. Although techniques in opinion mining are able to tell us which conclusions are drawn, they do not tell us how those conclusions are supported. There is substantial commercial opportunity here as businesses increasingly want to build on the data that they gather in order to know more about the thoughts and behaviours of their customers, and it is unsurprising that many of the large players in the field are engaging, most visibly to date, IBM.

One of the primary remaining challenges faced by argument mining is the lack of annotated argument data represented in a consistent ontology. Much recent work has focused on producing annotation guidelines targeted at specific domains (e.g. Kirschner et al., 2015, Walker et al., 2014, and Kiesel et al., 2015), and
whilst this has shown that data from these fields can be consistently annotated, the use of specific annotation schemes aimed at individual areas means that any techniques developed using this data are limited to that domain. The volume of data, particularly data annotated at the most fine-grained level, is still far below what would be required to apply many of the techniques previously discussed in a domain-independent manner.

Attempts are being made to overcome this lack of data, including the use of crowdsourced annotation (Ghosh et al., 2014; Skeppstedt et al., 2018) and automatic methods to extend the data currently annotated (Bilu et al., 2015). As these efforts combine with increasing attention to manual analysis, the volume of data available should increase rapidly. Schulz et al. (2018) also offer some solace in this regard, showing how multi-task learning (training models across datasets from different domains) can improve results in domains where limited domain-specific annotated data is available. To the best of our knowledge, no large-scale corpora are available yet for Archaeology or Cultural Heritage.

Even in cases where there is a greater volume of data, conflicting notions of argument are often problematic. In a qualitative analysis of six different, widely used argument datasets, Daxenberger et al. (2017) show that each dataset appears to conceptualise claims quite differently. These results clearly highlight the need for greater effort in building a framework in which argument mining tasks are carried out, covering all aspects from agreement on the argument theoretical concepts being identified, through to uniform presentation of results and data. The Argument Interchange Format (Chesñevar et al., 2006) is a constructive proposal in this direction to arrive at a shared ontology of argumentation.
Argument mining remains profoundly challenging, and traditional methods on their own seem to need to be complemented by stronger, knowledge-driven analysis and processing. However, the pieces required to successfully automate the process of turning unstructured data into structured argument are starting to take shape. As the volume of analysed argument continues to increase, and existing techniques are further developed and brought together, rapid progress can be expected.

In addition to argument mining, we discussed the Argument Analytics suite, which provides a comprehensive range of analytic tools from the detailed statistics required for discourse analysis, to graphic visual representations making the same data accessible to a general audience. The existing modules which we have described offer solutions to a broad range of potential user groups, including those involved in argument analysis and critical discourse analysis, those working on argument mining applications, scientists working with complex evidence sets, people performing political or social studies, and members of the general public who wish to get a greater understanding of the issues and dynamics of a complex debate.

Acknowledgments This research has received financial support from the grant “Heritage 3.0: Argumentation and Conceptual Modelling for Enhanced Cultural Heritage Participation and Management Policies” (ACME), grant number PID2020-114758RB-I00 funded by MCIN/AEI/10.13039/501100011033 and project “Deflationist Views in
Ontology and Metaontology”, grant number PID2020-115482GB-I00, both funded by MCIN/AEI/10.13039/501100011033.

References

Abbott, R., Ecker, B., Anand, P., & Walker, M. A. (2016). Internet argument corpus 2.0: An SQL schema for dialogic social media and the corpora to go with it. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), Portoroz (pp. 4445–4452).
Anthony, R., & Kim, M. (2015). Challenges and remedies for identifying and classifying argumentation schemes. Argumentation, 29(1), 81–113.
Aristotle. (1958). Topics. Oxford: Oxford University Press.
Bex, F., Lawrence, J., Snaith, M., & Reed, C. (2013). Implementing the argument web. Communications of the ACM, 56(10), 66–73.
Bilu, Y., Hershcovich, D., & Slonim, N. (2015). Automatic claim negation: Why, how and when. In Proceedings of the 2nd Workshop on Argumentation Mining (pp. 84–93). Denver: Association for Computational Linguistics.
Budzynska, K. (2011). Araucaria-PL: Software for teaching argumentation theory. In Proceedings of the Third International Congress on Tools for Teaching Logic (TICTTL 2011), Salamanca (pp. 30–37).
Budzynska, K., Janier, M., Reed, C., Saint-Dizier, P., Stede, M., & Yaskorska, O. (2014). A model for processing illocutionary structures and argumentation in debates. In Proceedings of the 9th Edition of the Language Resources and Evaluation Conference (LREC), Reykjavik (pp. 917–924).
Budzynska, K., Koszowy, M., & Pereira-Fariña, M. (2021). Associating ethos with objects: Reasoning from character of public figures to actions in the world. Argumentation, 35(4), 519–549.
Carlile, W., Gurrapadi, N., Ke, Z., & Ng, V. (2018). Give me more feedback: Annotating argument persuasiveness and related attributes in student essays. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne: Association for Computational Linguistics.
Chandler, D. (2003). Semiotics: The basics (1st publ.
repr edition). London: Routledge. Chapman, R., & Wylie, A. (2016). Evidential reasoning in archaeology. London: Bloomsbury Academic. Chesñevar, C., McGinnis, J., Modgil, S., Rahwan, I., Reed, C., Simari, G., South, M., Vreeswijk, G., & Willmott, S. (2006). Towards an argument interchange format. The Knowledge Engineering Review, 21(04), 293–316. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. Daxenberger, J., Eger, S., Habernal, I., Stab, C., & Gurevych, I. (2017). What is the essence of a claim? Cross-domain claim identification. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2055–2066). Copenhagen: Association for Computational Linguistics. Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2), 321–357. Duschl, R. A. (2007). Quality argumentation and epistemic criteria. In Argumentation in science education (pp. 159–175). Berlin: Springer. Duthie, R., Lawrence, J., Budzynska, K., & Reed, C. (2016). The CASS technique for evaluating the performance of argument mining. In Proceedings of the 3rd Workshop on Argumentation Mining (pp. 40–49). Berlin: Association for Computational Linguistics.
12 Argument Mining and Analytics in Archaeology 287 Feng, V. W., & Hirst, G. (2011). Classifying arguments by scheme. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 987–996). Portland: Association for Computational Linguistics. Freeman, J. B. (1991). Dialectics and the macrostructure of arguments: A theory of argument structure (vol. 10). Berlin: Walter de Gruyter. Freeman, J. B. (2011). Argument structure: Representation and theory. Berlin: Springer. Galassi, A., Kersting, K., Lippi, M., Shao, X., & Torroni, P. (2020). Neural-symbolic argumentation mining: An argument in favor of deep learning and reasoning. Frontiers in Big Data, 2, 52. Garssen, B. J. (2001). Argument schemes. In F. H. van Eemeren (ed.), Crucial concepts in argumentation theory (pp. 81–99). Amsterdam: Amsterdam University Press. Ghosh, D., Muresan, S., Wacholder, N., Aakhus, M., & Mitsui, M. (2014). Analyzing argumentative discourse units in online interactions. In Proceedings of the First Workshop on Argumentation Mining (pp. 39–48). Baltimore: Association for Computational Linguistics. Green, N. (2014). Towards creation of a corpus for argumentation mining the biomedical genetics research literature. In Proceedings of the First Workshop on Argumentation Mining (pp. 11–18). Baltimore: Association for Computational Linguistics. Green, N. (2015). Identifying argumentation schemes in genetics research articles. In Proceedings of the 2nd Workshop on Argumentation Mining (pp. 12–21). Denver: Association for Computational Linguistics. Green, N. (2018). Proposed method for annotation of scientific arguments in terms of semantic relations and argument schemes. In Proceedings of the 5th Workshop on Argument Mining. Brussels: Association for Computational Linguistics. Grennan, W. (1997). Informal logic: Issues and techniques. Montreal: McGill-Queen’s PressMQUP. Groarke, L., Tindale, C., & Fisher, L. (1997). 
Good reasoning matters! A constructive approach to critical thinking. Toronto: Oxford University Press. Habernal, I., & Gurevych, I. (2017). Argumentation mining in user-generated web discourse. Computational Linguistics, 43(1), 125–179. Hastings, A. C. (1963). A reformulation of the modes of reasoning in argumentation. Ph.D. Thesis, Northwestern University. Hidey, C., & McKeown, K. (2018). Persuasive influence detection: The role of argument sequencing. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans. Hua, X., & Wang, L. (2017). Neural argument generation augmented with externally retrieved evidence. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 219–230). Vancouver: Association for Computational Linguistics. Janier, M., Lawrence, J., & Reed, C. (2014). OVA+: An argument analysis interface. In S. Parsons, N. Oren, C. Reed, & F. Cerutti (Eds.), Proceedings of the Fifth International Conference on Computational Models of Argument (COMMA 2014) (pp. 463–464). Pitlochry: IOS Press. Janier, M., & Reed, C. (2016). Corpus resources for dispute mediation discourse. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC), Portoroz (pp. 1014–1021). Katzav, J., & Reed, C. (2004). On argumentation schemes and the natural classification of arguments. Argumentation, 18(2), 239–259. Ke, Z., Carlile, W., Gurrapadi, N., & Ng, V. (2018). Learning to give feedback: Modeling attributes affecting argument persuasiveness in student essays. In Proceedings of the TwentySeventh International Joint Conference on Artificial Intelligence, IJCAI-18 (pp. 4130–4136). Stockholm: International Joint Conferences on Artificial Intelligence Organization. Kienpointner, M. (1992). Alltagslogik: Struktur und funktion von argumentationsmustern. Stuttgart: Frommann-Holzboog. Kiesel, J., Al Khatib, K., Hagen, M., & Stein, B. (2015). 
A shared task on argumentation mining in newspaper editorials. In Proceedings of the 2nd Workshop on Argumentation Mining (pp. 35–38). Denver: Association for Computational Linguistics.
288 J. Lawrence et al. Kirschner, C., Eckle-Kohler, J., & Gurevych, I. (2015). Linking the thoughts: Analysis of argumentation structures in scientific publications. In Proceedings of the 2nd Workshop on Argumentation Mining (pp. 1–11). Denver: Association for Computational Linguistics. Knott, A. (1996). A data-driven methodology for motivating a set of coherence relations. Ph.D. Thesis, Department of Artificial Intelligence, University of Edinburgh. Lawrence, J., Bex, F., Reed, C., & Snaith, M. (2012). AIFdb: Infrastructure for the argument web. In Proceedings of the Fourth International Conference on Computational Models of Argument (COMMA 2012) (pp. 515–516).Vienna: IOS Press. Lawrence, J., Duthie, R., Budzysnka, K., & Reed, C. (2016). Argument analytics. In P. Baroni, M. Stede, & T. Gordon (Eds.), Proceedings of the Sixth International Conference on Computational Models of Argument (COMMA 2016) (pp. 371–378). Berlin. IOS Press. Lawrence, J., Park, J., Budzynska, K., Cardie, C., Konat, B., & Reed, C. (2017). Using argumentative structure to interpret debates in online deliberative democracy and erulemaking. ACM Transactions on Internet Technology, 17(3), 25. Lawrence, J., & Reed, C. (2015). Combining argument mining techniques. In: Proceedings of the 2nd Workshop on Argumentation Mining (pp. 127–136). Denver: Association for Computational Linguistics. Lawrence, J., & Reed, C. (2016). Argument mining using argumentation scheme structures. In P. Baroni, M. Stede, & T. Gordon (Eds.), Proceedings of the Sixth International Conference on Computational Models of Argument (COMMA 2016) (pp. 379–390). Potsdam: IOS Press. Lawrence, J., & Reed, C. (2017). Mining argumentative structure from natural language text using automatically generated premise-conclusion topic models. In Proceedings of the 4th Workshop on Argument Mining (pp. 39–48). Copenhagen: Association for Computational Linguistics. Lawrence, J., & Reed, C. (2020). Argument mining: A survey. 
Computational Linguistics, 45(4), 765–818. Lawrence, J., Visser, J., & Reed, C. (2019). An online annotation assistant for argument schemes. In Proceedings of the 13th Linguistic Annotation Workshop (pp. 100–107). Florence: Association for Computational Linguistics. Levy, R., Gretz, S., Sznajder, B., Hummel, S., Aharonov, R., & Slonim, N. (2017). Unsupervised corpus–wide claim detection. In Proceedings of the 4th Workshop on Argument Mining (pp. 79–84). Copenhagen: Association for Computational Linguistics. Lucas, G. (2019). Writing the past, 1 edn. Milton: Routledge. Miller, G. A. (1995). Wordnet: A lexical database for english. Communications of the ACM, 38(11), 39–41. Moens, M.-F., Boiy, E., Palau, R. M., & Reed, C. (2007). Automatic detection of arguments in legal texts. In Proceedings of the 11th International Conference on Artificial Intelligence and Law (pp. 225–230). Stanford: ACM. Murdock, J., Allen, C., Borner, K., Light, R., McAlister, S., Ravenscroft, A., Rose, R., Rose, D., Otsuka, J., Bourget, D., Lawrence, J., & Reed, C. (2017). Multi-level computational methods for interdisciplinary research in the hathitrust digital library. PLOS ONE, 12(9), 1–21. Niculae, V., Park, J., & Cardie, C. (2017). Argument mining with structured SVMS and RNNS. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 985–995). Vancouver: Association for Computational Linguistics. Palau, R. M., & Moens, M.-F. (2009). Argumentation mining: The detection, classification and structure of arguments in text. In Proceedings of the 12th International Conference on Artificial Intelligence and Law (pp. 98–107). Barcelona: ACM. Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2, 1–135. Pankhurst, R. J. (1978). Biological identification. London: Edward Arnold. Park, J., & Cardie, C. (2014). 
Identifying appropriate support for propositions in online user comments. In Proceedings of the First Workshop on Argumentation Mining (pp. 29–38). Baltimore: Association for Computational Linguistics.
12 Argument Mining and Analytics in Archaeology 289 Peldszus, A. (2014). Towards segment-based recognition of argumentation structure in short texts. In Proceedings of the First Workshop on Argumentation Mining (pp. 88–97). Baltimore: Association for Computational Linguistics. Perelman, C., & Olbrechts-Tyteca, L. (1969). The new rhetoric: A treatise on argumentation. Notre Dame: University of Notre Dame Press. Plüss, B., & De Liddo, A. (2015). Engaging citizens with televised election debates through online interactive replays. In Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video (pp. 179–184). New York: ACM. Pollock, J. L. (1995). Cognitive carpentry: A blueprint for how to build a person. Cambridge: MIT Press. Prakken, H. (2010). An abstract framework for argumentation with structured arguments. Argument and Computation, 1(1), 93–124. Rahwan, I., Zablith, F., & Reed, C. (2007). Laying the foundations for a world wide argument web. Artificial Intelligence, 171, 897–921. Reed, C. (2006). Preliminary results from an argument corpus. In E. M. Bermúdez & L. R. Miyares (Eds.), Linguistics in the twenty-first century (pp. 185–196). Cambridge: Cambridge Scholars Press. Reed, C., Mochales Palau, R., Rowe, G., & Moens, M.-F. (2008). Language resources for studying argument. In Proceedings of the 6th Language Resources and Evaluation Conference (LREC2008), Marrakech (pp. 91–100). Reed, C., & Rowe, G. (2004). Araucaria: Software for argument analysis, diagramming and representation. International Journal on Artificial Intelligence Tools, 13(4), 961–980. Rinott, R., Dankin, L., Perez, C. A., Khapra, M. M., Aharoni, E., & Slonim, N. (2015). Show me your evidence-an automatic method for context dependent evidence detection. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon (pp. 440–450). Schulz, C., Eger, S., Daxenberger, J., Kahse, T., & Gurevych, I. (2018). 
Multi-task learning for argumentation mining in low-resource settings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (pp. 35–41). New Orleans: Association for Computational Linguistics. Skeppstedt, M., Peldszus, A., & Stede, M. (2018). More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing. In Proceedings of the 5th Workshop on Argument Mining (pp. 155–163). Brussels: Association for Computational Linguistics. Snaith, M., Medellin, R., Lawrence, J., & Reed, C. (2017). Arguers and the argument web. In F. Bex, F. Grasso, N. Green, F. Paglieri, & C. Reed (Eds.), Argument technologies: Theory, analysis & applications (pp. 57–72). College Publications. Snoeck Henkemans, A. F. (1992). Analyzing complex argumentation. SicSat. Song, Y., Heilman, M., Beigman Klebanov, B., & Deane, P. (2014). Applying argumentation schemes for essay scoring. In Proceedings of the First Workshop on Argumentation Mining (pp. 69–78). Association for Computational Linguistics. Stab, C., & Gurevych, I. (2014a). Annotating argument components and relations in persuasive essays. In Proceedings of the 25th International Conference on Computational Linguistics, Dublin (pp. 1501–1510). Stab, C., & Gurevych, I. (2014b). Identifying argumentative discourse structures in persuasive essays. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 46–56). Doha: Association for Computational Linguistics. Stab, C., & Gurevych, I. (2017). Parsing argumentation structures in persuasive essays. Computational Linguistics, 43(3), 619–659. Toulmin, S. E. (1958). The uses of argument. Cambridge: Cambridge University Press. van Eemeren, F. H. (2018). Argumentation theory: A pragma-dialectical perspective. Argumentation Library. Berlin: Springer. van Eemeren, F. H., & Grootendorst, R. (1992). 
Argumentation, communication, and fallacies: A pragma-dialectical perspective. Mahwah: Lawrence Erlbaum Associates.
290 J. Lawrence et al. van Eemeren, F. H., Houtlosser, P., & Snoeck Henkemans, A. F. (2007). Argumentative indicators in discourse: A pragma-dialectical study. Argumentation Library. Berlin: Springer. van Rijsbergen, C. J. (1979). Information retrieval. Butterworth. Visser, J., Duthie, R., Lawrence, J., & Reed, C. (2018a). Intertextual correspondence for integrating corpora. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3511–3517). Miyazaki: European Language Resources Association (ELRA). Visser, J., Konat, B., Duthie, R., Koszowy, M., Budzynska, K., & Reed, C. (2020). Argumentation in the 2016 US presidential elections: annotated corpora of television debates and social media reaction. Language Resources and Evaluation, 54(1), 123–154. Visser, J., Lawrence, J., & Reed, C. (2020a). Reason-checking fake news. Communications of the ACM, 63(11), 38–40. Visser, J., Lawrence, J., Reed, C., Wagemans, J., & Walton, D. (2021). Annotating Argument Schemes. Argumentation, 35(1), 101–139. Visser, J., Lawrence, J., Wagemans, J., & Reed, C. (2018b). Revisiting computational models of argument schemes: Classification, annotation, comparison. In S. Modgil, K. Budzynska, & J. Lawrence (Eds.), Proceedings of the Seventh International Conference on Computational Models of Argument (COMMA 2018) (pp. 313–324). Warsaw: IOS Press. Voss, E. G. (1952). The history of keys and phylogenetic trees in systematic biology. Journal of the Science Laboratories, Denison University, 43(1), 1–25. Wacholder, N., Muresan, S., Ghosh, D., & Aakhus, M. (2014). Annotating multiparty discourse: Challenges for agreement metrics. LAW VIII, p. 120. Wagemans, J. (2016). Constructing a periodic table of arguments. In P. Bondy & L. 
Benacquista (Eds.), Argumentation, Objectivity, and Bias: Proceedings OSSA 11 (pp. 1–12). OSSA. Walker, M. A., Tree, J. E. F., Anand, P., Abbott, R., & King, J. (2012). A corpus for research on deliberation and debate. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul (pp. 812–817). Walker, V., Vazirova, K., & Sanford, C. (2014). Annotating patterns of reasoning about medical theories of causation in vaccine cases: Toward a type system for arguments. In Proceedings of the First Workshop on Argumentation Mining (pp. 1–10). Baltimore: Association for Computational Linguistics. Walton, D. (1996). Argumentation schemes for presumptive reasoning. Mahwah: Lawrence Erlbaum Associates. Walton, D. (2006). Fundamentals of critical argumentation. Cambridge: Cambridge University Press. Walton, D., Reed, C., & Macagno, F. (2008). Argumentation schemes. Cambridge: Cambridge University Press. Webber, B., Egg, M., & Kordoni, V. (2011). Discourse structure and language technology. Natural Language Engineering, 18(4), 437–490. Wyner, A., Schneider, J., Atkinson, K., & Bench-Capon, T. (2012). Semi-automated argumentative analysis of online product reviews. In Proceedings of the Fourth International Conference on Computational Models of Argument (COMMA 2012) (pp. 43–50). Vienna: IOS Press.
Chapter 13
Computational Processing of Language Vagueness for Archaeological Site Modelling

Maria Elena Castiello

Abstract This chapter outlines the challenge that language uncertainty and vagueness pose for the construction of predictive models in archaeology. It presents methods and examples for dealing with the uncertainty and vagueness arising from archaeological datasets, and elaborates quantitative tools to process and integrate them in a Machine Learning (ML) framework. In particular, the chapter focuses on the combination of a fuzzy set approach with the well-known ensemble algorithm Random Forest (RF). On this basis, Archaeological Predictive Maps (APM) are produced for two case studies, and an uncertainty visualization strategy is defined, based on statistics and cognitive theory methods. A procedure is suggested to visually represent and communicate the uncertainty in the final output of the modeling procedure. A four-step methodology is described to consistently estimate and process language vagueness without increasing the computational cost of an ML environment, so that it is possible to produce APMs that incorporate confidence intervals and subjective values. The goal is to provide archaeologists with the theoretical and methodological infrastructure needed to critically evaluate and compute the various levels of uncertainty and vagueness inherent in archaeological databases, to design best practices for establishing the scientific transparency of results, and to improve the efficiency of APMs in research and decision-making processes within cultural heritage management and archaeological research.

Keywords Archaeology · Uncertainty · Machine learning · Fuzzy theory · Predictive maps

M. E. Castiello
Institute of Archaeological Sciences, University of Bern, Bern, Switzerland
e-mail: maria.castiello@faculty.unibe.ch

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_13
13.1 Introduction

There have been many attempts at taxonomies for structuring uncertainty (an example is shown in Fig. 13.1). Smithson’s taxonomy indicates that uncertainty itself can be internally differentiated (Smithson, 1989), and more recently Spiegelhalter (2017) has provided a short categorization of uncertainty, focusing on the statistical modelling of risks, which is expressed as follows:

– aleatoric (irreducible randomness inherent in a process)
– epistemic (uncertainty from a lack of knowledge that could theoretically be reduced given more information)
– ontological (uncertainty about how accurately the modelling describes reality, which can only be described subjectively)

Epistemic uncertainty, understood as ‘lack of information’, is an unavoidable aspect of life and is inherent to most data. In this Chapter, the term uncertainty refers to the epistemic kind, which can be quantified and visualized, most commonly as a probability distribution, prior to any archaeological (predictive) modelling procedure. Moreover, since the future is by definition uncertain, as we only have limited information about it, predictions are generally uncertain too, and this uncertainty can only be resolved by comparing the prediction with the actual outcome (Piotrowski, 2019). Thus, using models to project probable futures, based on current information and understanding, can entail additional uncertainties. Indeed, Evans (2012) noted that uncertainty can have various sources, occurring throughout the life cycle of any modelling exercise: (i) uncertainty associated with input data; (ii) model choice; (iii) model parameters; and (iv) model outputs. Input data in particular can introduce uncertainty by various means, such as errors in data measurement, overlooked data, or the choice of inappropriate sample sizes or inappropriate discretization measures.
Choices in model settings might also introduce uncertainty in different ways, in particular through the selection of variables, scale, parameters, and algorithms, or mathematical transformations (Espig et al., 2020; Brouwer Burg et al., 2012).

Fig. 13.1 Smithson’s taxonomy of ignorance. (Source: Smithson, 1989: 9)

But epistemic uncertainty does not only concern the future; it mainly concerns the past and everything interconnected with it. Fusco and de Runz (2020) have rightly pointed out that “what we perceive about past human activities is limited to the materials that cross the ages to reach us, and obviously to the areas investigated and methods used”. This archaeological incompleteness causes uncertainty (often described linguistically/verbally) at different research stages, starting from the excavation databases and the assembly of inventory collections, which are used today for a multitude of quantitative analyses (e.g. spatial and statistical analyses; site distribution maps; 3D reconstructions, etc.). As already highlighted by Niccolucci and Hermon (2003), “Archaeological (spatial) concepts are often defined in an imprecise way”, the archaeological “site” itself being one example, and epistemic vagueness may indeed occur “when trying to define what exactly is a settlement or when struggling to ascribe an archaeological object to a precise epoch”, as Gonzalez-Perez (2018) pointed out (Fig. 13.2).

Scholars have long acknowledged and extensively discussed this problem (see for example: Taheri et al., 2019; Ramos-Soto et al., 2017; van der Leeuw, 2016; Barceló & Bogdanovic, 2015; Niccolucci & Hermon, 2015; Lieskovský et al., 2013; Evans, 2012; Mink et al., 2009; De Runz et al., 2007; Refsgaard et al., 2007; Niccolucci & Hermon, 2004; Ducke, 2003). Lock and Harris (1996) underlined the general weight of uncertainty in archaeology, where data are mostly fragmentary and difficult to date. The theme was also addressed by Bevan et al. (2013), who said: “One final, complicating factor for archaeologists is the fact that archaeological observations are very partial, imperfect records of past activity. Much of the variability in our observed spatial patterns is due to patchy levels of archaeological preservation and investigation”.

The uncertainty or bias encompassed in archaeological datasets is clearly an endemic factor and can often be traced back to the survey and storage/digitalization methods. Hermon and Niccolucci (2003) pointed out that the elaboration of a database is essentially based on maps (sometimes in the traditional paper format, which needs to be digitized), photos (which need to be adjusted with photogrammetry programs) and drawings (both hand and computer made); in doing so, archaeologists perform a simplification, which in some cases may result in an oversimplification of past reality. Likewise, the typological classification of an archaeological object presents several aspects that may affect the accuracy of the archiving procedure. If we look closely at excavation activity, the uncovered objects or structures are often only the remains of what were probably the foundations of ancient buildings. This scarce information needs to be completed by analyzing sparse material, using ethnographic parallels and textual descriptions, or comparing with better preserved sites (if they exist). The rest of the interpretation has to be completed by the researcher on the basis of his/her knowledge or, ultimately, left incomplete. When creating an excavation database, the interpretations of analytical results are always prone to an unknown degree of uncertainty. Similarly, many uncertainties surround the interpretive possibilities of an archaeological record as well as the subsequent contextualization and modelling of human behavior.
Fig. 13.2 A conceptual model of uncertainty in spatial data. (Adapted from Fischer et al., 2006)

The problem of uncertain, inaccurate archaeological information has been approached in different ways over time, in the belief that tackling the issue would strengthen modelling accuracy. Fusco and de Runz (2020) in fact point out that “if swept under the carpet, data imperfection spreads throughout analyses, results and interpretation. It then grows out of control, and prevents us from assessing the validity of our conclusions or from directly comparing situations and phenomena.” Integrating uncertainty thus also makes it possible to question how models can be compared, and provides more information about the data and the model itself (Morrison, 2015). As seen so far, any modelling exercise is sensitive to the quality and quantity of the underlying data. At the same time, however, defining what is meant by quality of information can be very difficult, given the diversity of dimensions that this concept takes on (Fusco, 2016). According to Goodchild (2003):

Quality [...] is a measure of the difference between the data and the reality that they represent, and becomes poorer as the data and the corresponding reality diverges. Thus, if data are of poor quality, and tell us little about the geographic world, then they have little value.
The predictive models used in archaeology are no exception to such rules. Archaeological Predictive Maps (APMs) have been abundantly described in the literature and are intended here as an exploratory data tool that can help to identify suitable locations of specific types of human activity and their archaeological remains, and to trace and highlight patterns of settlement preference in the landscape (to cite a few examples: Castiello, 2022; Castiello & Tonini, 2021; Rogers et al., 2014; Verhagen, 2007; Kamermans et al., 2005; Van Leusen, 2002; Kvamme, 1990). APMs, as the result of computational models, are liable to accumulate uncertainty, and to lose reliability, to an unknown degree throughout their design process. However, they have the clear advantage of being (1) formal and (2) executable, and can thus be automatically tested against large amounts of actual linguistic assertions (Piotrowski, 2019). According to Piotrowski (2019), uncertainty, when processed and manipulated in computational models, can be preserved through the modeling exercise and can be formalized and made explicit in the visual results. The problem of managing uncertain data in computer science has generally received much more attention in mathematics and the natural sciences than in the humanities, and various methods and approaches have been developed by mathematicians and statisticians to deal with the different types of uncertainty. This is probably motivated by the need to provide effective real-world applications (e.g. the need to deal with uncertain data from environmental sensors or medical data, anonymized statistical data, spatio-temporal data extrapolated by mobile applications, etc.) (Piotrowski, 2019). The literature indeed provides several examples of computational approaches for dealing with uncertainty that draw heavily on probability theory, statistics and information theory (see e.g. Nagypál & Motik, 2003; McBurney & Parsons, 2001; Shannon, 1948; Dempster, 1967; Shafer, 1976; Zadeh, 1965).

In the archaeological research field, Brouwer Burg et al. (2012) noticed that, despite the growing body of literature on archaeological modelling and, more recently, on computational modelling, the question of uncertainty and model validation is still rarely addressed. According to Gonzalez-Perez (2018), “revealing all doubts, as well as the general model limitations to the reader, can be seen as a matter of scientific ethics, at least as important as compiling a convincing story.”

Attempts to resolve this conundrum of structuring and integrating uncertainty in archaeological (predictive) modeling procedures have long relied mostly on a ‘classic’ probabilistic framework. The Archäoprognose Brandenburg project carried out by Ducke (2014), and the resulting APM, represent a first attempt in this direction. A key concept of this study was the management of the uncertainty introduced by missing data, incomplete datasets, errors, and the diverse sources of information. The procedure selected relied on the Dempster–Shafer Theory of Evidence (DST) (Van Leusen et al., 2009; Ducke & Münch, 2005; Ejstrud, 2005; Ducke, 2003; Ejstrud, 2003). The author assigns an explicit set of values to the data, e.g. by weighting the data or variables used in the modeling procedure on the basis of expert judgment or subjective knowledge. Chronological uncertainty has meanwhile been approached probabilistically with Bayesian statistics (Buck et al., 1996). Desachy (2012), for example, produced a first chronological model to deal with stratigraphic sequence
interpretation and chronological uncertainty, tasks generally handled during archaeological excavations. Crema et al. (2010) addressed the similar issue of intrinsic temporal uncertainty in archaeological datasets by adopting a probabilistic approach and a diachronic analysis to develop a distribution model of the Middle to Late Jomon pithouses in Japan, trying “to make best use of the available information by integrating different degrees of knowledge”. The authors ultimately suggested an “environment where comparisons between alternative hypotheses are made easier”. Similarly, Bevan et al. (2013) explored probabilistic and spatial-statistical methods for assigning pottery artifacts discovered during intensive excavations carried out on Antikythera island (Greece) to particular chronological periods. The authors suggested a belief-based approach to quantify local, intra-site uncertainty and compare the chronology assigned across different excavations and sites “by considering the degree to which the uncertainty associated with one period is linked to the uncertainty associated with another”.

In archaeological predictive modelling, by contrast, scholars have gradually moved away from the ‘classical’ probabilistic framework, which is considered not the best suited to modelling uncertainty and vagueness in predictive archaeology (Fusco & de Runz, 2020). For instance, Conolly and Lake (2006) stated that:

Prediction is probabilistic. Very few, if any, models predict site occurrence with the absolute certainty of presence or absence. Consequently it usually makes sense to talk about the model correctly predicting site presence at some specified probability p between 0.0 and 1.0.

The most recent trends show a strong preference for a fuzzy logic approach.
In short, fuzzy logic (a branch of mathematics that evolved out of “fuzzy set” theory) is a technique that takes uncertainty into account by ranking the “truth” or accuracy of the modeled data by degree or percentage, rather than treating it as binary (true/false) information (Zadeh, 1965; Hájek, 1998; Yager & Filev, 1994; Halpern, 2003). A useful reminder of fuzzy theory is Gacôgne’s (2003) definition: “Fuzzy logic, or more generally the treatment of uncertainties, is to study the representation of imprecise knowledge and reasoning approached”. Zadeh introduced fuzzy set theory in 1965: a fuzzy set is characterized by a membership function that may take any value between 0 and 1, rather than only the two extreme values as for ordinary (crisp) sets. As defined by Fisher (2006), a fuzzy set $F$ is a pair $(\Omega, \mu_F)$, where $\Omega$ is a set and $\mu_F$ is the mapping of $\Omega$ to the unit interval $[0, 1]$:

$$\mu_F : \Omega \to [0, 1], \qquad \mu_F(\omega) = \text{degree of membership of } \omega \text{ in } F \quad \text{for } \omega \in \Omega$$

Although this approach has not yet been widely adopted in archaeology (Fusco, 2016; Evans, 2012; Refsgaard et al., 2007), its theoretical and methodological framework is particularly well suited to modelling human behavior over space and time, where one attempts to analyze and predict with exactitude the spatial preferences, choices, and movements of individuals. In the predictive modeling procedure, such an approach, as pointed out by Fusco (2016), has the advantage of keeping all
the available data rather than considering only those estimated as “reliable” and eliminating the “unreliable” from the databases. This pre-selection could in fact be counterproductive: in addition to losing data, it could lead the analysis to overestimate the quality and certainty of the data considered to be reliable. Fuzzy techniques have been applied in many research fields since their formulation (Ragin, 2000; Roberts, 1986; Moraczewski, 1993; Sattler, 1996) but, again, with very few explorations undertaken in archaeology (Verhagen, 2007; Hatzinikolaou et al., 2003; Niccolucci et al., 2001; Barceló & Pallarés, 1998). For instance, the ArchaeDyn I research project (Favory & Nuninger, 2008) developed a method for weighting the incompleteness and uncertainty of archaeological data through “Confidence Maps”. This approach highlighted the biases and reliability of inventory data and, using fuzzy logic, suggested ways to interpret the absence or the lower number of archaeological data in a particular investigated area. The study helped raise awareness of the importance of uncertainty visualization in archaeological data processing. As Fusco and de Runz (2020) stressed, “Considering information in a fuzzy dimension offers an alternative method which prevents us from making restrictive choices in modelling and/or forcing us to reject all unreliable data”. They have very recently suggested an exploratory method to model the spatiotemporal structures and dynamics of settlements during the Bronze Age in the Syrian Fertile Crescent, which were described mainly by imperfect archaeological data.
The authors developed a fuzzy set approach to tackle data imperfection and to make estimates and assumptions about potential settlement locations in unsurveyed areas by setting up an archaeological predictive modeling framework that integrates uncertainty. Strong emphasis has been placed on the advantage of incorporating such qualitative predication into the predictive modelling procedure not only textually but also numerically (by calculating an index of reliability, for example one that goes from “totally sure”, that is “this is what we believe”, to pure uncertainty, that is “we have no proof whatsoever”) and visually; this becomes central to a robust analysis and strengthens the integrity of the research results (Gonzalez-Perez, 2018; Balla et al., 2013; Jaroslaw & Hildebrandt-Radke, 2009; Vaughn & Crawford, 2009). A quantification protocol allows us to assign quantitative uncertainty values to qualitative linguistic labels (such as “unsure” or “unknown”) in a systematic manner. Once obtained, the quantitative values can be further processed algorithmically in the ML framework.

The approach developed in this Chapter is part of an exploratory line of reasoning in which one first tries to identify the different levels of uncertainty expressed by the data, going from the general to the specific, before modelling uncertain sites through the ML approach. Indeed, according to Farinetti et al. (2004), uncertainty should ideally not be considered only a posteriori, when the results of the modeling procedure have already been obtained, but should be integrated at the time of data collection, not only as an attribute of the artifact or site in question (such as its epoch or typology) but as the point of view, or even the theoretical basis, from which the data are considered.
It is argued here that combining Machine Learning algorithms based on a binary logic (“site”/“no site”) with an upstream fuzzy approach represents an innovative and interesting solution and “can enable more rigorous research practice, and attune archaeologists to data-centric imperfections processing in archaeological data” (Gupta, 2020). At the same time, it is also necessary to frame and communicate this uncertainty effectively, and thus to choose the right visualization method. The communication of uncertainty can play an equally significant role in the quantification and processing of archaeological information, although research on this topic has been relatively limited and its importance often underestimated. Only recently has it begun to emerge as a topic of great importance. Significant contributions to the communication of uncertain data and information originate from combining techniques drawn from cognitive theory, psychology and statistics (Padilla et al., 2021; Hullman et al., 2015, 2018; Fernandes et al., 2018; Zikmund-Fisher et al., 2014). In this specific context, the most accredited method to visually represent uncertainty belongs to the group of visual encoding channels (Munzner, 2014; MacEachren et al., 2012; Brodlie et al., 2012). As described by Padilla et al. (2021):

“Visual encoding channels define the appearance of marks using controls such as color, position, and transparency. Techniques that use encoding channels have the added benefit of adjusting a mark that is already in use, such as making a mark more transparent if the uncertainty is high” (Fig. 13.3).
Since the way in which people reason with uncertainty is nonintuitive, and this difficulty can be intensified when uncertainty information is communicated visually (Padilla et al., 2021), the encoding channel method is believed to be more efficient in evoking uncertainty associations, notably in geographic information systems and cartography (Kinkeldey et al., 2014, 2017; MacEachren et al., 2018). It has therefore been selected in this Chapter to visually represent uncertainty in the final predictive maps. It is argued that using a visualization technique based on encoding channels can stimulate archaeologists to incorporate uncertainty in their decision-making process and indirectly encourage them to use this uncertainty information rather than eliminating it.

Fig. 13.3 Examples of encoding channels. (Source: Padilla et al., 2021)
13.2 Case Studies

Two case studies were selected and analyzed for uncertainty quantification and predictive modeling. The institutional archaeological databases of the Cantons of Aargau and Geneva, two regions located respectively in northern and southern Switzerland (Figs. 13.4 and 13.5), were provided by the local Archaeological Departments (see footnote 1) in the form of digital tables containing a list of surveys carried out in the regions over the last decades and inventoried until 2015 (for Aargau: 3101 entries; for Geneva: 865 entries). Both databases contained information belonging to different epochs, spanning from the Mesolithic to the Middle Ages, with several attribute fields and details about the discoveries made in the regions. As is often the case, the recorded information varies considerably in structure and quantity from one region to another. Hence, two newly constructed geospatial databases were developed in the ArcGIS environment (Esri, release 10.7) in order to establish a reproducible approach to data management. Since one of the goals of the modeling procedure explained in this study is to identify areas in the landscape likely to contain still undiscovered Roman settlements (based on uncertain information), analyses focused on the “settlement” category, which often refers to building-housing-living spaces. As shown in Fig. 13.6, the entries defined as belonging to the Roman epoch account for roughly 10–30% of all data points in the Cantons of Aargau and Geneva. On closer inspection of the databases, many fields carry doubtful information, numerous entries lack precise geographical coordinates, and several rows were left blank.
The inventoried records mainly express uncertainty or vagueness through linguistic statements: archaeologists qualitatively evaluated and inventoried the discovered objects with a degree of subjective reliability by adding attributes to the fields, such as “sure”, “unsure”, “unknown”, “undefined”, “possible”, and “potential”. A new database architecture was then designed to minimize potential errors during data entry or modification and to maximize the flexibility of the database and its potential for further research based on the same data. Entries with no coordinates and those left blank were erased. Coordinates were checked and adjusted to comply with the requirements of the new system. Descriptive information about each entry was cross-referenced.

In the specific case of the AG database, the uncertainty is manifold. At times, it relates to the chronology and to the typological characteristics. The degree of uncertainty is originally stored in both the Typology and Datation fields with qualitative expressions such as sure, unsure, unknown. In the Typology field we can find, for example, the following entries: Settlement – unsure, Religious site – sure, Grave – unknown, etc.

1 Kanton Aargau, Departement Bildung, Kultur und Sport, Abteilung Kultur, Kantonsarchäologie. République et Canton de Genève, Office du patrimoine et des sites, Service cantonale d’archéologie.
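The clean-up and the combined "label plus qualifier" entries described above can be illustrated with a small Python sketch. It is not the procedure used to build the new databases (that was done in ArcGIS); the function names, the coordinate field names `x` and `y`, and the restriction to the three qualifiers sure, unsure and unknown are assumptions made for illustration only. The pattern accepts either an en dash or a plain hyphen as separator.

```python
import re

# Hypothetical pattern: "<label> <dash> <qualifier>", where the dash is an
# en dash (U+2013) or a plain hyphen and the qualifier is one of three terms.
ENTRY_PATTERN = re.compile(
    r"^\s*(?P<label>.+?)\s*(?:\u2013|-)\s*(?P<quality>sure|unsure|unknown)\s*$",
    re.IGNORECASE,
)


def split_entry(raw):
    """Split a raw field value into (label, quality).

    Returns (label, None) when no reliability qualifier is present.
    """
    match = ENTRY_PATTERN.match(raw)
    if match is None:
        return raw.strip(), None
    return match.group("label"), match.group("quality").lower()


def has_coordinates(record, x_field="x", y_field="y"):
    """True if both coordinate fields are filled in (field names are assumed)."""
    return all(record.get(name) not in (None, "") for name in (x_field, y_field))


if __name__ == "__main__":
    for value in ["Settlement \u2013 unsure", "Religious site - sure", "Grave \u2013 unknown"]:
        print(split_entry(value))
```

Keeping the qualifier as a separate value, rather than discarding entries that are not "sure", is what allows it to be quantified later instead of being used as a filter.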
Fig. 13.4 Roman archaeological sites in the Canton of Aargau
Fig. 13.5 Roman archaeological sites in the Canton of Geneva

The same is true for the Datation field, where information is stored as follows: Roman – sure, Medieval – unsure, Neolithic – unknown, etc. In the GE database, uncertainty is not differentiated between typology and datation but is assigned to each entry as a whole; e.g. a Roman settlement is considered as sure, or a certain area is considered as a potential-suspected Roman religious site. Qualifying terms such as sure, unsure, unknown and potential are inexact concepts whose meanings are fuzzy. Thus, uncertainty arising from the interpretation and processing of these inexact concepts has nothing to do with randomness but is directly related to fuzziness. Since these linguistic variables interact with varying degrees of intensity, a conventional binary representation is usually inadequate (Leung, 1983). Indeed, the uncertainty contained in the two databases analyzed is subjective and
directly related to the opinion of the agent/researcher who compiled them in the first instance, and the reliability of the data depends on said agent’s state of knowledge. Fuzzy theory and fuzzy sets can help to address and handle this problem because they do not require sharp boundaries that distinguish members of a set from non-members. On the contrary, fuzzy membership reflects a degree of belonging (Zadeh, 1965). A fuzzy set A of a universe X is defined by a function that assigns to each object x in X a membership degree of x in A.

Fig. 13.6 Percentage of entries labeled as belonging to the Roman epoch in the Aargau and Geneva databases. (Source: Castiello, 2022)

13.3 Method

Four sequential steps were followed for the exploration and processing of archaeological language uncertainty: (i) identification, (ii) quantification, (iii) modelling and (iv) visualization, as schematically illustrated in Fig. 13.7.

Fig. 13.7 The four steps of the proposed methodology

The methodological procedure essentially aims to explore the effectiveness of a fuzzy set theory approach in archaeological data uncertainty quantification and to develop a ML model framework for predicting archaeological settlements in given
areas. In particular, it suggests a way to visually represent the quantified uncertainty and to incorporate it into the modeling outputs. Thus, the approach selected starts from what we know, within a computed degree of certainty (site presences and their characteristics), and leads to a model of what we want to know (where the probability of discovering archaeological sites is highest).

13.3.1 Identification

Following the more recent literature (Fusco & de Runz, 2020; Martin-Rodilla et al., 2019; Gonzalez-Perez, 2018; Fusco, 2016; Niccolucci & Hermon, 2015; Oštir et al., 2007; Hatzinikolaou, 2006; Niccolucci & Hermon, 2004), after a first screening and pre-processing of the databases, a fuzzy quantification procedure was defined. The modelling procedure was then extended to compute the “numerical confidence values” or “numerical degree of membership” assigned to the settlement presences in both databases, which express the subjective level of ‘confidence’ in the assignment under consideration (Quality uncertainty). Although this coefficient of membership is indeed assigned “subjectively”, the procedure is rooted in a long tradition whose major exponents include De Finetti (1970) for probability theory and Savage (1972) for statistics. Thus, the numeric values express in numerical terms a series of subjectively evaluated elements in which the experience and scientific correctness of the research converge (Hermon & Niccolucci, 2003), with the aim of giving the scientific status of measurability and verifiability to a problem of reliability.

13.3.2 Quantification

As shown in Table 13.1, a numeric value intended as a coefficient of membership was first assigned to the Typology of the discovery as classified and stored in the database. As mentioned above, this study focuses on Roman settlements.
Therefore, site types corresponding to settlements are assigned the maximum membership value of 1. Findings that may hint at the presence of settlements, but are not defined as such, are assigned a value of 0.75, and single and other findings (e.g. ceramic sherds, coins, etc.) the value 0.5. Considering that, for example, fortifications, water infrastructure and religious sites are often discovered in close proximity to a settlement, the coefficient of membership (or degree of reliability) assigned to these categories was defined as the highest, and so on for the others. The second step involved assigning a coefficient of membership to the Datation field entries, as shown in Table 13.2. Unsurprisingly, datations corresponding to the Roman period are assigned the value 1. A value of 0.25 was assigned to the medieval epoch, as settlements of that period are likely to have developed in continuity with former Roman settlements.
Table 13.1 Coefficient memberships assigned to each class of the AG database

Type                    Type uncertainty
Settlement              1
Fortification           1
Water infrastructure    1
Religious sites         1
Graves                  0.75
Roads                   0.75
Bridges                 0.5
Quarry                  0.5
Single finds            0.5
Others                  0.5
Unknown                 0.5

Table 13.2 Coefficient memberships assigned to each chronological definition as defined in the AG database

Period          Period uncertainty
Roman           1
Roman Empire    1
Medieval        0.25
Others          0

Table 13.3 The quantification of uncertain linguistic variables expressed with a degree of reliability value

Quality    Quality uncertainty
Sure       1
Unsure     0.75
Unknown    0.5

As mentioned above, the typological classification as well as the datation of each entry is labeled with a degree of reliability expressed by a qualitative expression. A degree of reliability value was assigned to the linguistic uncertainty terms that accompany the Typology and Datation definitions, as shown in Table 13.3. Figure 13.8 shows the Typological uncertainty, obtained by multiplying the coefficient of membership of Type by the degree of reliability (Quality) of that definition, while Fig. 13.9 shows the Datation uncertainty, obtained by multiplying the coefficient of membership of Period by the degree of reliability (Quality) of that definition. Finally, the Total uncertainty is calculated by multiplying the Typological uncertainty by the Datation uncertainty (Fig. 13.10). A similar, although simpler, procedure was set up for computing the uncertainty in the GE database, as this database provides only one-dimensional information about uncertainty. Table 13.4 shows the uncertainty values for the linguistic variable contained in the database, giving an appreciation of the reliability of the interpretation for each entry.
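The multiplication scheme described above for the AG database can be sketched in a few lines of Python. The coefficients are those of Tables 13.1 to 13.3, paired with categories in the order in which they are listed; the dictionary and function names are illustrative assumptions, and the sketch is not the code used in the study.

```python
# Coefficients of membership as listed in Tables 13.1-13.3 (AG database).
TYPE_MEMBERSHIP = {
    "Settlement": 1.0, "Fortification": 1.0,
    "Water infrastructure": 1.0, "Religious sites": 1.0,
    "Graves": 0.75, "Roads": 0.75,
    "Bridges": 0.5, "Quarry": 0.5, "Single finds": 0.5,
    "Others": 0.5, "Unknown": 0.5,
}
PERIOD_MEMBERSHIP = {"Roman": 1.0, "Roman Empire": 1.0, "Medieval": 0.25, "Others": 0.0}
QUALITY = {"sure": 1.0, "unsure": 0.75, "unknown": 0.5}


def typological_uncertainty(site_type, quality):
    """Type coefficient multiplied by the reliability of that typological label."""
    return TYPE_MEMBERSHIP[site_type] * QUALITY[quality]


def datation_uncertainty(period, quality):
    """Period coefficient multiplied by the reliability of that dating."""
    return PERIOD_MEMBERSHIP[period] * QUALITY[quality]


def total_uncertainty(site_type, type_quality, period, period_quality):
    """Product of the typological and datation uncertainties."""
    return (typological_uncertainty(site_type, type_quality)
            * datation_uncertainty(period, period_quality))


if __name__ == "__main__":
    # "Settlement - unsure" dated "Roman - sure": 1 * 0.75 * 1 * 1
    print(total_uncertainty("Settlement", "unsure", "Roman", "sure"))   # 0.75
    # "Graves - sure" dated "Medieval - unsure": 0.75 * 1 * 0.25 * 0.75
    print(total_uncertainty("Graves", "sure", "Medieval", "unsure"))    # 0.140625
```

Because the total is a product, any entry dated to "Others" (coefficient 0) receives a total of 0 whatever its typology, and a value of 1 is reached only by a sure Roman entry of one of the four highest-ranked types.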
Fig. 13.8 Typological uncertainty quantification for the AG database

Fig. 13.9 Datation uncertainty quantification for the AG database
Fig. 13.10 Total uncertainty quantification for the AG database

Table 13.4 General uncertainty quantification for the GE database

Quality uncertainty for Roman settlements in GE    Label        Value
Known and/or excavated site                        Sure         1
Potential site extension (around known site)       Potential    0.75
Presumed site                                      Unsure       0.5
Other epochs or no findings                        Absence      0

13.3.3 Modelling

Both the AG and GE databases were processed first in a GIS (Geographical Information System) environment and then within R, a software environment for statistical computing and graphics (R Core Team, 2018). The newly elaborated datasets, which carried only the quantified uncertainty, were integrated into the Random Forest (RF) predictive modelling procedure. Specifically, the package randomForest (Liaw & Wiener, 2002) was used for probability mapping.
The Random Forest (RF) algorithm (Breiman et al., 2018; Breiman & Cutler, 2010), a machine-learning approach capable of handling discrete values, was adapted here to estimate the probability of discovering archaeological Roman settlements in the two regions analyzed (see footnote 2). Generally, RF regression-based computations involve the use of several geo-environmental proxies. A short list of these proxies, all likely to influence site location, comprised: the Digital Elevation Model (DEM; altitude) and its derivatives (Slope, Northness and Eastness); Distance to water (lakes and rivers); Agricultural suitability; Depth of vegetal soil; Soil skeleton; Water saturation and Water storage capacity; and Permeability and Nutrient storage capacity. These proxies were combined with the pre-processed archaeological data (site presences and site pseudo-absences) (Lotfian, 2016; Kulkarni & Sinha, 2012; Breiman, 2001; Liaw & Wiener, 2002). The parameters for the RF logistic regression models were then calibrated in the same way for both regions, following the protocol described in Castiello (2022):

• Define the input training and testing datasets, including predictor variables and response variables (1380 real presences and pseudo-absences for AG and 241 real presences and pseudo-absences for GE)
• Perform an external cross-validation to add an additional accuracy measurement (spatial k-fold cross-validation; Valavi et al., 2019)
• Select the number of trees to grow (1000)
• Select the number of predictor variables for creating the binary rules at each split (4)

13.3.4 Visualization

The outputs of the regression predictive models are visually expressed by the maps in Figs. 13.11 and 13.12. As mentioned, the regression returned predictions as continuous values in the interval [0, 1]. These APMs show the probability that each pixel of the rasters contains a Roman settlement.
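As a side note on the calibration protocol of Sect. 13.3.3, the idea behind a spatial k-fold cross-validation can be sketched as follows. This is a deliberately simplified illustration in Python, not the procedure of Valavi et al. (2019) nor the R code used in the study: the square-block scheme, the function name and the parameters `block_size`, `k` and `seed` are all assumptions. The point is only that presences and pseudo-absences lying close together are held out together, so that spatial autocorrelation does not inflate the accuracy estimate.

```python
import math
import random


def spatial_block_folds(points, block_size, k, seed=0):
    """Assign each (x, y) point to one of k folds, block by block.

    Points falling in the same square block of side `block_size` always
    share a fold; blocks are shuffled and dealt out to folds in turn.
    """
    def block_of(x, y):
        return (math.floor(x / block_size), math.floor(y / block_size))

    blocks = sorted({block_of(x, y) for x, y in points})
    random.Random(seed).shuffle(blocks)
    fold_of_block = {block: index % k for index, block in enumerate(blocks)}
    return [fold_of_block[block_of(x, y)] for x, y in points]


if __name__ == "__main__":
    # Five hypothetical site coordinates (metres) in three 1 km blocks.
    sites = [(10, 10), (20, 30), (1500, 10), (1600, 90), (10, 2500)]
    print(spatial_block_folds(sites, block_size=1000, k=2))
```

Each fold would then serve once as the test set while the model is fitted on the remaining folds, with the number of trees (1000) and of predictors tried at each split (4) kept fixed as in the protocol.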
The probability values are expressed with a gradient color scale that goes from 0.0 to 1 and from light green to dark green, where dark green corresponds to a “sure” prediction of Roman settlements, light green to an “unsure” prediction, and white to a “sure” prediction of the absence of Roman settlements. The scale of uncertainty is thus reproduced by the color intensity, which can be blurred in proportion to the uncertainty related to the settlements’ Typology and Datation. As a result, the most uncertain values and areas all appear as the same shade of green. The range absence-unknown-unsure-presumed-potential-sure is integrated and reproduced in the prediction outputs.

2 A comprehensive and detailed description of RF functioning and application in archaeological context and predictive modelling can be found, for example, in Castiello (2022) and Castiello and Tonini (2021).
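The color encoding described above can be illustrated with a minimal Python sketch that maps a predicted probability to an RGB color. The three anchor colors, the placement of light green at 0.5 and the linear interpolation between anchors are assumptions chosen for illustration; the text specifies only the ordering white, light green, dark green, not the exact ramp used for Figs. 13.11 and 13.12.

```python
# Assumed anchor colors (RGB, 0-255); the actual map palette is not specified.
WHITE = (255, 255, 255)        # "sure" absence, probability 0
LIGHT_GREEN = (144, 238, 144)  # "unsure" prediction, assumed at probability 0.5
DARK_GREEN = (0, 100, 0)       # "sure" presence, probability 1


def _interpolate(start, end, t):
    """Linear interpolation between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def probability_to_rgb(p):
    """Map a predicted probability in [0, 1] to a color on the green ramp."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p <= 0.5:
        return _interpolate(WHITE, LIGHT_GREEN, p / 0.5)
    return _interpolate(LIGHT_GREEN, DARK_GREEN, (p - 0.5) / 0.5)


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, probability_to_rgb(p))
```

A continuous ramp of this kind uses color as the encoding channel, so the uncertainty is read directly from the mark that already carries the prediction.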
Fig. 13.11 Predictive map for the Roman archaeological settlements in Canton Aargau

13.4 Conclusion

This Chapter explored the concept of processing and modeling uncertain information expressed through linguistic variables in an archaeological context. It sets out how to explore, describe, quantify, process, and finally visualize uncertainty as crucial steps of the archaeological research process. The aim was to provide an overview of processing and quantification approaches and to further integrate uncertain values within an innovative predictive modeling
framework. The methodology developed accounts for the effects of quantified uncertainty in the institutional archaeological databases of the two selected case studies through a fuzzy approach, and presents accessible methods to model the location of Roman archaeological sites using cutting-edge Machine Learning technologies.

Fig. 13.12 Predictive map for the Roman archaeological settlements in Canton Geneva

Given a set of environmental features selected as influential factors in site location preferences, an innovative application of the Random Forest algorithm for
computing the probability of discovering Roman archaeological sites in the two regions analyzed was proposed. Finally, a procedure for uncertainty visualization was selected from the well-known techniques in use in the natural sciences, to help carry uncertain archaeological information through to the final prediction output in an effective manner.

First, different definitions and categorizations of uncertainty and ignorance were presented to better frame the uncertainty in the original databases, essentially expressed as vague information. Secondly, previous approaches to archaeological uncertainty were examined, and a suitable mathematical method for quantifying an expert’s belief based on limited knowledge was selected, described and applied to archaeological scenarios. Fuzzy set theory was explained in detail and applied to the archaeological case studies. Third, the uncertainty quantification procedure was integrated into the predictive modeling framework based on the Random Forest algorithm. The regression analysis produced advanced predictive maps highlighting the zones with the highest and lowest probability of discovering Roman archaeological settlements, given a set of environmental proxies. The various levels of uncertainty expressed by linguistic terms in the databases are integrated into the calculations and carried into the final result by visualizing the degree of uncertainty with color grades.

While an extensive literature on spatial and geo-historical data imprecision exists, the literature on uncertainty in archaeological research contexts highlights some cumbersome issues. The quantification, processing and visualization of archaeological uncertainty are still in their infancy and at a conceptual stage when compared to other disciplines.
Reasoning with uncertainty or with imperfection is undeniably difficult, but in line with recent contributions to uncertainty quantification research, it is argued that accounting for the various dimensions of archaeological data imperfection will prevent us from assessing hypotheses on past settlement patterns that are too rigid and restrictive. Indeed, as recent studies have revealed, some types of modelling, for example through fuzzy logic and fuzzy set theory, can broaden the horizons of archaeological research, and the right visualization methods can improve decision-making in a variety of contexts, from hazard forecasting to healthcare communication and certainly also in archaeology and cultural heritage management.

References

Balla, A., Pavlogeorgatos, G., Tsiafakis, D., & Pavlidis, G. (2013). Locating Macedonian tombs using predictive modelling. Journal of Cultural Heritage, 14(5), 403–410. Barceló, A., & Bogdanovic, I. (Eds.). (2015). Mathematics and archaeology. Taylor & Francis. Barceló, J. A., & Pallarés, M. (1998). Beyond GIS: The archaeology of social spaces. Archaeologia e Calcolatori, 1, 47–80. Bevan, A., Crema, E.R., Li, X., & Palmisano, A. (2013). Intensities, Interactions and Uncertainties: Some New Approaches to Archaeological Distributions. In Computational Approaches to
Archaeological Space, edited by A. Bevan, and M. Lake, 27–52. Walnut Creek: Left Coast Press. Breiman, L. (2001). Random forests. Machine Learning, 45, 15–32. Breiman, L., & Cutler, A. (2010). Random forests. Available at: http://www.stat.berkeley.edu/~breiman/RandomForests/ Breiman, L, Cutler, A, Liaw, A., & Wiener, M. (2018). Breiman and Cutler’s Random Forests for Classification and Regression. R package version 4.6–14. https://doi.org/10.1023/A:1010933404324 Brodlie, K., Allendes, R. O., & Lopes, A. (2012). A review of uncertainty in data visualization. In J. Dill et al. (Eds.), Expanding the frontiers of visual analytics and visualization (pp. 81–109). Springer. Brouwer Burg, M., Peeters, H., & Lovis, W. A. (Eds.). (2012). Uncertainty and sensitivity analysis in archaeological computational modeling. Springer/University of California. Buck, C. E., Cavanagh, W., & Litton, C. D. (1996). Bayesian approach to interpreting archaeological data. Wiley. Castiello, M. E. (2022). Computational and machine learning tools for archeological site modeling. Springer. ISBN : 978-3-030-88566-3 Castiello, M. E., & Tonini, M. (2021). An explorative application of random forest algorithm for archaeological predictive modelling. A Swiss case study. Journal of Computer Applications in Archaeology, 4, 110–125. Conolly, J., & Lake, M. (2006). Geographical information systems in archaeology (p. 338). Cambridge University Press. Crema, E. R., Bevan, A., & Lake, M. (2010). A probabilistic framework for assessing spatiotemporal point patterns in the archaeological record. Journal of Archaeological Science, 37(5), 1118–1130. De Finetti, B. (1970). Teoria delle probabilità, Sintesi introduttiva con appendice critica. Einaudi. De Runz, C., Desjardin, E., Piantoni, F. Herbin, M. (2007). Using fuzzy logic to manage uncertain multi-modal data in an archaeological GIS. 
International symposium on Spatial Data Quality, Pays-Bas, Enschede. Dempster, A. P. (1967). Upper and lower probabilities induced by a multi-valued mapping. Annals of Mathematical Statistics, 38, 325–339. Desachy, B. (2012). Formaliser le raisonnement chronologique et son incertitude en archeologie de terrain. Cybergeo: European Journal of Geography, Systemes, Modelisation, Geostatistiques, document 597. Ducke, B. (2003). Archaeological predictive modelling in intelligent network structure. In M. Doerr & A. Sarris (Eds.), Proceedings of the 29th conference of the computer applications in archaeology (pp. 267–273). Hellenic Ministry of Culture. Ducke, B. (2014). An integrative approach to archaeological landscape evaluation: Locational preferences, site preservation and uncertainty mapping. The Archaeology of Erosion, the Erosion of Archaeology, 1, 13–22. Ducke, B. & Münch U., 2005. Predictive modelling and the archaeological heritage of Brandenburg (Germany) (M. van Leusen & H. Kamermans, Eds.) (pp. 93–107). Ejstrud, B., 2003. Indicative models in landscape management: Testing the methods. The archaeology of landscapes and geographic information systems. Predictive maps, settlement dynamics and space and time in prehistory (J. Kunow & J. Müller, Eds.) (pp. 119–134). Ejstrud, B. (2005). Taphonomic models. Using Dempster-Shafer theory to assess the quality of archaological data and indicative models (H. Kamermans & M. van Leusen, Eds.) (pp. 189– 198). Espig, M., Finlay-Smits, S.C., Meenken, E.D., Wheeler, D.M., Sharifi, M., Shah, M., 2020. Understanding and communicating uncertainty in data-rich environments: Towards a transdisciplinary approach. In: Nutrient management in farmed landscapes. (Eds.) C.L. Christensen, D.J. Horne and R. Singh). Occasional Report No. 33. Farmed Landscapes Research Centre, Massey University, .
Evans, A. (2012). Uncertainty and error. In A. J. Heppenstall, A. Crooks, L. M. See, & M. Batty (Eds.), Agent-based models for geographical systems. Springer. Farinetti, E., Hermon, S., & Niccolucci, F. (2004). Fuzzy logic application to artefact surface survey data. In F. Niccolucci & S. Hermon (Eds.), Beyond the artifact: Digital interpretation of the past: Proceedings of CAA 2004 (pp. 125–129). Budapest. Favory, F., & Nuninger, L. (2008). ArchaeDyn. Dynamique spatiale du peuplement et ressources naturelles: vers une analyse intégrée dans le long terme, de la Préhistoire au Moyen Age, ArchaeDyn, Rapport d’activité scientifique 2005–2007, p. 71. Fernandes, M., Walls, L., Munson, S., et al. (2018). Uncertainty displays using quantile dotplots or CDFs improve transit decision-making. In Proceedings of the 2018 CHI conference on human factors in Computing Systems, ACM, p. 144. Fischer, P., Comber, A., Wadsworth, R. (2006). Approaches to Uncertainty in Spatial Data. In R. Devillers & R. Jeansoulin (Eds.) Fundamentals of Spatial Data Quality. Wiley, ISBN: 9780470612156 Fisher, P. F. (2006). Models of uncertainty in spatial data. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical information systems: Principles, techniques, management and applications (pp. 191–205). Wiley. Fusco, J. (2016). Analyse des dynamiques spatio-temporelles des systèmes de peuplement dans un contexte d’incertitude: Application à l’archéologie spatiale. University Nice Sophia Antipolis. Retrieved from https://tel.archives-ouvertes.fr/tel-01341554 Fusco, J., & de Runz, C. (2020). Spatial fuzzy sets. In M. Gillings, P. Hacıgüzeller, & G. Lock (Eds.), Archaeological spatial analysis. A methodological guide. Routledge. Gacôgne, L. (2003). Logique floue et applications (p. 128). Institut d’informatique d’entreprise d’Evry. Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology. 
Software engineering principles for cultural heritage. Springer. Goodchild, M. F. 2003. The nature and value of geographic information. In: M. Duckham, M. F. Goodchild, & M. Worboys (Eds.), Foundations of geographic information science : Taylor & Francis. pp. 18–30. Gupta, N. (2020). Preparing archaeological data for spatial analysis. In M. Gillings, P. Hacıgüzeller, & G. Lock (Eds.), Archaeological spatial analysis. A methodological guide. Routledge. Hájek, P. (1998). Metamathematics of fuzzy logic. Kluwer. Halpern, J. Y. (2003). Reasoning about uncertainty. MIT Press. Hatzinikolaou, E. G. (2006). Quantitative methods in archaeological prediction: From binary to fuzzy logic. In M. W. Mehrer & K. L. Wescott (Eds.), GIS and archaeological site location modelling (pp. 437–446). Taylor & Francis. Hatzinikolaou, E. G., Hatzichristos, T., Siolas, A., & Mantzourani, E. (2003). Predicting archaeological site locations using GIS and fuzzy logic. In M. Doerr & A. Sarris (Eds.), The digital heritage in archaeology: Computer applications and quantitative methods in archaeology (pp. 169–178). Archive of Monuments and Publications, Hellenic Ministry of Culture. Hermon, S., & Niccolucci, F. (2003). A Fuzzy Logic Approach to Typology in Archaeological Research. In M. Doerr and A. Sarris (Eds), The digital Heritage of Archaeology. Athens, Archive of Monuments and Publications. 307–310. Hullman, J., Resnick, P., & Adar, E. (2015). Hypothetical outcome plots outperform error bars and violin plots for inferences about reliability of variable ordering. PLoS One, 10(11), e0142444. Hullman, J., Qiao, X., Correll, M., et al. (2018). In pursuit of error: A survey of uncertainty visualization evaluation. IEEE, 25(1), 903–913. Jaroslaw, J., & Hildebrandt-Radke, I. (2009). Using multivariate statistics and fuzzy logic system to analyse settlement preferences in lowland areas of the temperate zone: An example from the Polish Lowlands. Journal of Archaeological Science, 36(10), 2096–2107. 
Kamermans, H., Deeben, J., Hallewas, D., Zoetbrood, P., van Leusen, M., & Verhagen, P. (2005). Project proposal. In M. van Leusen & H. Kamermans (Eds.), Predictive modelling for archaeological heritage management: A research agenda (Nederlandse Archeologische Rapporten 29) (pp. 13–23). Rijksdienst voor het Oudheidkundig Bodemonderzoek.
13 Computational Processing of Language Vagueness for Archaeological Site. . . 313 Kinkeldey, C., MacEachren, A. M., & Schiewe, J. (2014). How to assess visual communication of uncertainty? A systematic review of geospatial uncertainty visualisation user studies. Cartography and Geography, 51(4), 372–386. Kinkeldey, C., MacEachren, A. M., Riveiro, M., & Schiewe, J. (2017). Evaluating the effect of visually represented geodata uncertainty on decision-making: systematic review, lessons learned, and recommendations. Cartography and Geography Information Science, 44(1), 1– 21. https://doi.org/10.1080/15230406.2015.1089792 Kulkarni, V. Y., & Sinha, P. K. (2012). Pruning of Random Forest classifiers: A survey and future directions. In International Conference on Data Science & Engineering (ICDSE), Cochin, Kerala, 2012 (pp. 64–68). https://doi.org/10.1109/ICDSE.2012.6282329 Kvamme, K. L. (1990). The fundamental principles and practice of predictive archaeological modeling. In A. Voorrips (Ed.), Mathematics and information science in archaeology: A flexible framework (pp. 275–295). HOLOSVerlag. Leung, Y. (1983). Fuzzy sets approach to spatial analysis and planning, a nontechnical evaluation. Geografiska Annaler. Series B, Human Geography, 65(2), 65–75. Liaw, A., & Wiener, M. (2002). Classification and regression by Random Forest. R News, 2(3), 18–22. Lieskovský, T., Ďuračiová, R., & Karell, L. (2013). Selected mathematical principles of archaeological predictive models creation and validation in the GIS environment. Interdisciplinaria archaeologica. Natural Sciences in Archaeology, 4(2), 33–46. Lock, G., & Harris, T. M. (1996). Danebury revisited: An English iron age hillfort in a digital landscape. In M. Aldenderfer & H. D. G. Maschner (Eds.), Anthropology, space and geographic information systems (pp. 214–240). Oxford University Press. Lotfian, M. 2016. Urban climate modeling, case study of Milan city. Master thesis, Politecnico di Milano. MacEachren, A. M., Roth, R. 
E., O’Brien, J., et al. (2012). Visual semiotics & uncertainty visualization: An empirical study. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2496–2505. https://doi.org/10.1109/TVCG.2012.279 MacEachren, A. M., Roth, R. E., O’Brien, J., et al. (2018). Visual semiotics & uncertainty visualization: an empirical study. IEEE Trans. Vis. Comput. Graph., 18 (12), 2496–2505. http:/ /doi.org/10.1109/TVCG.2012.279 Martin-Rodilla, P., Pereira-Farina M., Gonzalez-Perez, C. 2019. Qualifying and quantifying uncertainty in digital humanities: A fuzzy-logic approach. In Seventh international conference on technological ecosystems for enhancing multiculturality, 16–18 October 2019, Leon. McBurney, P., & Parsons, S. (2001). Representing epistemic uncertainty by means of dialectical argumentation. Annals of Mathematics and Artificial Intelligence, 32(1–4), 125–169. Mink, P., Ripy, J., Bailey, K., & Grossardt, T. H. (2009). Predictive archaeological modeling using GIS-based fuzzy set estimation: A case study in Woodford County, Kentucky (Kentucky Transportation Center Faculty and Researcher Publications. 12). https://uknowledge.uky.edu/ ktc_facpub/12 Moraczewski, I. R. (1993). Fuzzy logic for phytosociology II. Generalizations and predictions. Vegetatio, 106(1), 13–20. Morrison, M. S. (2015). Reconstructing reality: Models, mathematics, and simulations. Oxford University Press. Munzner, T. (2014). Visualization analysis and design. CRC Press. Nagypál, G., & Motik, B. (2003). A fuzzy model for representing uncertain, subjective, and vague temporal knowledge in ontologies. In R. Meersman, Z. Tari, & D. C. Schmidt (Eds.), On the move to meaningful internet systems. Springer. Niccolucci, F., & Hermon, S. (2003). La logica fuzzy e le sue applicazioni alla ricerca archeologica. Archeologia e Calcolatori, 14, 97–110. Niccolucci, F., & Hermon, S. (2004). 
A fuzzy logic approach to reliability in archaeological virtual reconstruction, in Proceedings of the 2004 Computer Applications in Archaeology (CAA) Conference
314 M. E. Castiello Niccolucci, F., & Hermon, S. (2015). Time, chronology and classification. In J. A. Barceló & I. Bogdanovic (Eds.), Mathematics and archaeology. Taylor & Francis. Niccolucci, F., D’Andrea, A., & Crescioli, M., 2001. Archaeological applications of fuzzy databases. In Z. Stančič & T. Veljanovski (Eds.), Computing archaeology for understanding the past. CAA 2000. Computer applications and quantitative methods in archaeology. Proceedings of the 28th conference, Ljubljana, April 2000, pp. 107–116. Oštir, K., Kokalj, Ž., Saligny, L., Tolle, F., Nunninger, L., avec la collaboration de F. Pennors et K. Zaksek. (2007). Confidence maps: A tool to evaluate archaeological data’s relevance in spatial analysis. In Layers of perception. Proceedings of the 35th computer applications and quantitative methods in archaeology conference, Berlin, Germany, April 2–6, 2007, Bonn, pp. 272–277. Padilla, L. M. K, Powell, M, Kay, M., & Hullman, J. (2021). Uncertain About Uncertainty: How Qualitative Expressions of Forecaster Confidence Impact Decision-Making With Uncertainty Visualizations. Front. Psychol. 11:579267. https://doi.org/10.3389/fpsyg.2020.579267 Piotrowski, M. (2019). Accepting and modeling uncertainty. In v. A. Kuczera, T. Wübbena, & T. Kollatz (Eds.), Die Modellierung des Zweifels – Schlüsselideen und -konzepte zur graphbasierten Modellierung von Unsicherheiten (Zeitschrift für digitale Geisteswissenschaften, 4). Ragin, C. C. (2000). Fuzzy-set social science. University of Chicago Press. Ramos-Soto, A., Alonso, J. M., Reiter, E., & van Deemter, K. (2017). An empirical approach for modeling fuzzy geographical descriptors. IEEE. Refsgaard, J. C., van der Sluijs, J. P., Etejberg, A. L., & Vanrollegham, P. A. (2007). Uncertainty in the environmental modeling process—A framework and guidance. Environmental Modeling and Software, 22, 1543–1556. Roberts, D. W. (1986). Ordination on the basis of fuzzy set theory. Vegetatio, 66, 123–131. Rogers, S. 
R., Fischer, M., & Huss, M. (2014). Combining glaciological and archaeological methods for gauging glacial archaeological potential. Journal of Archaeological Science, 52, 410–420. https://doi.org/10.1016/j.jas.2014.09.010 R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at http://www.R-project. org/ Sattler, R. (1996). Classical morphology and continuum morphology: Opposition and continuum. Annals of Botany, 78, 577–581. Savage, L. (1972). The foundation of statistics. Dover. Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press. Shannon, C. E. (1948). A mathematical theory of communications. The Bell System Technical Journal, 27, 379–432. Smithson, M. (1989). Ignorance and Uncertainty: Emerging Paradigms. New York: SpringerVerlag. https://doi.org/10.1007/978-1-4612-3628-3 Spiegelhalter, D. (2017). Risk and uncertainty communication. Annual Review of Statistical Applications, 4, 31–60. Taheri, S. M., Ghadim, F. I., & Kabirian, M. (2019). Application of fuzzy inference systems in archaeology. In 7th Iranian joint congress on Fuzzy and Intelligent System, Iran, Bojnurd, 29– 31 January 2019. Valavi, R., Elith, J., & Guillera-Arroita, G. (2019). blockCV: An r package for generating spatially or environmentally separated folds for k-fold cross validation of species distribution models. Methods in Ecology and Evolution, 10(2), 225–232. https://doi.org/10.1111/2041-210X.13107 van der Leeuw, S. 2016. Uncertainties. In: Brouwer Burg, M Peeters J and Lovis W (Eds.) Uncertainty and sensitivity analysis in archaeological computational modeling. Springer. Van Leusen, P. M. (2002). Pattern to process: Methodological investigations into the formation and interpretation of spatial patterns in archaeological landscapes. PhD thesis, Faculty of Arts. Available at: http://dissertations.ub.rug.nl/faculties/arts/2002/ Van Leusen, M., Millard, A. R., & Ducke, B. 
(2009). Dealing with uncertainties in archaeological prediction. In H. Kamermans, M. van Leusen, & P. Verhagen (Eds.), Archaeological prediction and risk management: Alternatives to current practice. (pp. 123–160). Leiden: Leiden University Press.
13 Computational Processing of Language Vagueness for Archaeological Site. . . 315 Vaughn, S., & Crawford, T. (2009). A predictive model of archaeological potential: An example from northwestern Belize. Applied Geography, 29(4), 542–555. Verhagen, P. (2007). Case studies in archaeological predictive modelling. PhD thesis, Leiden University Press. Yager, R. R., & Filev, D. P. (1994). Essentials of fuzzy modeling and control. Wiley. Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–355. Zikmund-Fisher, B. J., Witteman, H. O., Dickson, M., et al. (2014). Blocks, ovals, or people? Icon type affects risk perceptions and recall of pictographs. Medical Decision Making, 34(4), 443– 453.
Part III The Future
Chapter 14
Future Directions

Cesar Gonzalez-Perez, Martín Pereira-Fariña, and Patricia Martín-Rodilla

Now we reach the end of the book. Over 13 chapters, we have described a number of conceptual approaches and computational techniques for discourse and argumentation analysis in archaeology. Our aim has been to offer a consolidated and integrated view of various works and research lines, which are often found scattered across different fields. By combining the expertise of specialists in discourse analysis, argumentative analysis, natural language processing, archaeology, and digital humanities, we hope to have succeeded in our goal.

In Part I, composed of 7 chapters, we have addressed the ways in which the production and understanding of different genres of archaeological discourse (technical reports, scientific papers, dissemination documents) can benefit from incorporating discourse analysis methodologies and principles. In Part II, composed of 5 chapters, we have described a sample of computational techniques that can be applied to partially or fully automate most of the conceptual approaches described in Part I.

C. Gonzalez-Perez
Incipit, CSIC, Santiago de Compostela, Spain
e-mail: cesar.gonzalez-perez@incipit.csic.es

M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: martin.pereira@usc.es

P. Martín-Rodilla
Department of Computer Science and Information Technologies, University of A Coruña, A Coruña, Spain
e-mail: patricia.martin.rodilla@udc.es

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches, Quantitative Archaeology and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_14
14.1 Areas to Develop

This book is just a starting point. Although computer systems are widely used today to store and process data, their use for the storage and processing of discourses is still uncommon. We hope that this book can serve as a basis for developing guidelines and protocols that aid the adoption of the proposed approaches and techniques by archaeologists, and that it helps in the implementation of computer systems capable of processing discourses and arguments as their primary kind of information. To achieve this, a few areas need to be developed further.

First, we believe that natural language processing and other language technologies are primarily being applied to the analysis of plain text, that is, text that has been decoupled from its context and purpose. Current approaches and algorithms are capable of parsing, analysing and manipulating the elements in a piece of text, but this is not sufficient. We argue that language technologies like these should aim to raise their level of abstraction so that they can deal with discourses rather than texts, that is, with language as used by particular agents with specific purposes and in specific social contexts. In this manner, the context where a discourse is produced, the intention of the speaker, and the relationships between the discourse and the entities in the world would become objects of study in addition to the lexicon, structure and meaning of the text itself. Incorporating these contextual aspects will require new conceptualisations and new computational techniques.

Second, visualisation and dissemination techniques must be improved. Displaying the result of an argument analysis on a computer screen, for example, requires a large amount of screen real estate, and still fails to convey the details necessary for a comprehensive and deep understanding of what is being represented. We need to devise new visualisation techniques that can easily expose the large and complex networks of discourse elements on a two-dimensional screen, either for researchers or for the general public as a dissemination vehicle. Adding interaction to these visual devices would provide an extra layer of value for the exploration and comprehension of argumentation structures. This interaction can be implemented in different ways, such as the automatic production of linguistic summaries in response to queries formulated by users, or even conversational agents that would allow users to engage in a conversation that dynamically navigates the results of the argument analysis.

A third area that needs further development is the applicability of these conceptual approaches and computational techniques. This book has been specifically oriented towards archaeology, but nothing precludes the presented approaches and techniques from also being applied to anthropology, history, and other related disciplines. Doing this will probably require new conceptual developments as well as discipline-specific trials and experiments.
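To make the first of these areas more concrete, the distinction between a text and a discourse can be sketched as a data structure. The following Python fragment is only an illustrative assumption on our part: the class name `Discourse`, its fields, and the example sentences are hypothetical and are not taken from any system described in this book. It shows that the same wording, produced by different agents with different purposes, yields different discourses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discourse:
    """A text together with the context in which it was produced (hypothetical model)."""
    text: str                         # the plain text that current tools analyse
    speaker: str                      # the agent producing the discourse
    purpose: str                      # the intention of the speaker
    social_context: str               # the setting in which the discourse occurs
    refers_to: tuple[str, ...] = ()   # entities in the world the discourse is about


def same_text_different_discourse(a: Discourse, b: Discourse) -> bool:
    """Return True when two discourses share their wording but differ in context."""
    return a.text == b.text and a != b


# Invented example data, for illustration only.
report = Discourse(
    text="The wall was probably built in the Iron Age.",
    speaker="excavation director",
    purpose="document findings",
    social_context="technical report",
    refers_to=("wall",),
)
panel = Discourse(
    text="The wall was probably built in the Iron Age.",
    speaker="museum curator",
    purpose="inform visitors",
    social_context="dissemination document",
    refers_to=("wall",),
)

print(same_text_different_discourse(report, panel))
```

A text-only analysis would treat `report` and `panel` as identical, whereas a discourse-level analysis would have access to the speaker, purpose and setting of each.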
14.2 The Future

Of course, anything that we say about the future is quite speculative, but we want to conclude by examining some plausible scenarios where we may find ourselves in the near future. Once the areas described in the previous section are properly developed, we will be able to attain new standards with regard to discourse and argumentation analysis in archaeology.

First and foremost, open and public science will be truly achievable, as the justification of knowledge will be almost universally accessible. In other words, if we are able to unpack, visualise and explain the complete argumentative structure of any claim, anyone will be able to assess how well supported that knowledge is.

Second, a new kind of scientific repository will become possible. We are familiar with dataset repositories, which store datasets and make them available for public reuse, and with document repositories, which store and serve documents. By developing the areas described above, however, we would be able to create argument repositories, which would contain large meshes of interconnected entities, claims about these entities, and argumentation relationships (such as inferences, conflicts or rephrasings) between them. These meshes would be constructed from many different works by many different authors, so a truly multi-vocal and intertextual account of the archaeological record would be possible. Tasks that today are cumbersome and difficult, such as gathering a comprehensive bibliography on a particular site or find, would become extremely easy.

Third, knowledge generation from argument repositories through artificial intelligence (AI) would become possible. Current AI techniques are capable of detecting patterns, mining for hidden relationships, and learning about them. Once argument repositories are available, current and future AI approaches will be able to generate new knowledge in an automated or semi-automated manner. This will allow the scientific community to detect and fight against fraudulent research where data or evidence is falsified or strongly biased in favour of any given agenda.

In addition to these advances, some new challenges and issues will appear as well. First, open argument repositories will be vulnerable to security and safety threats, so proper measures will need to be implemented. Given that these repositories will be expected to constitute a reliable source of knowledge, their robustness against tampering or accidental misuse should be a critical concern.

Second, bias will be injected into argument repositories. Human knowledge is never free of bias, which would certainly be captured and “fossilised” in repositories, as it is today captured and preserved in reports or books. It is difficult to assess how big a problem this will be. On the one hand, argument repositories as described above will be much larger and more complex than any book or resource that exists today, so bias will accumulate in larger quantities. On the other hand, argument repositories will contain specific features to deal with multi-vocality and subjective perspectives, so the fact that any particular claim is subjectively biased may not constitute a significant problem after all. In any case, we need to be aware of bias injection, and design systems that can handle it adequately.
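The structure of an argument repository, and the way it could keep different voices apart, can be illustrated with a minimal sketch. The Python code below is a simplification under our own assumptions: the names `Claim`, `ArgumentRepository`, `claims_about` and `conflicts` are hypothetical, the example claims are invented, and a real repository would need far richer models of entities, provenance and argumentation. The three relationship kinds are the ones mentioned above (inferences, conflicts and rephrasings).

```python
from collections import defaultdict
from dataclasses import dataclass

# Relationship kinds named in the text; the labels themselves are our assumption.
RELATION_KINDS = {"inference", "conflict", "rephrase"}


@dataclass(frozen=True)
class Claim:
    claim_id: str
    entity: str   # the entity the claim is about
    author: str   # who makes the claim, kept to support multi-vocality
    text: str


class ArgumentRepository:
    """A toy mesh of claims and argumentation relationships (hypothetical)."""

    def __init__(self):
        self.claims = {}
        self.relations = []  # (source_id, kind, target_id) triples

    def add_claim(self, claim):
        self.claims[claim.claim_id] = claim

    def relate(self, source_id, kind, target_id):
        if kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {kind}")
        if source_id not in self.claims or target_id not in self.claims:
            raise KeyError("both claims must be in the repository")
        self.relations.append((source_id, kind, target_id))

    def claims_about(self, entity):
        """Group the claims about an entity by author, one list per voice."""
        by_author = defaultdict(list)
        for claim in self.claims.values():
            if claim.entity == entity:
                by_author[claim.author].append(claim)
        return dict(by_author)

    def conflicts(self, entity):
        """Return (source, target) claim pairs in conflict where the source is about the entity."""
        return [
            (self.claims[s], self.claims[t])
            for s, kind, t in self.relations
            if kind == "conflict" and self.claims[s].entity == entity
        ]


# Invented example data, for illustration only.
repo = ArgumentRepository()
repo.add_claim(Claim("c1", "hillfort", "Author A",
                     "The hillfort was occupied until the Roman period."))
repo.add_claim(Claim("c2", "hillfort", "Author B",
                     "The hillfort was abandoned before the Roman period."))
repo.add_claim(Claim("c3", "hillfort", "Author A",
                     "Roman-period pottery was found in the upper layers."))
repo.relate("c3", "inference", "c1")  # c3 supports c1
repo.relate("c2", "conflict", "c1")   # c2 contradicts c1

for author, claims in repo.claims_about("hillfort").items():
    print(author, [c.claim_id for c in claims])
```

Because every claim keeps its author, conflicting claims can coexist in the repository and be presented as distinct perspectives rather than being merged into a single, possibly biased, account.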
Third, new knowledge generated through the application of AI to argument repositories could result, on some occasions, in morally challenging or even unacceptable situations. We can imagine, for example, that an AI system reaches the conclusion, by working on an argument repository, that some highly valued archaeological monument must be abandoned, destroyed, or interpreted according to some specific perspective to the detriment of others. Finding a moral justification for results of this kind will be difficult. Explainable AI is an active area of research today, so we may expect to see AI approaches in the near future that are better at explaining why a result is the way it is, thus allowing us to find improved moral justifications for them.

Fourth, this envisioned future may entail a threat to creativity in archaeology. If computer-based argument repositories are taken as a reliable major source of knowledge, the human role in devising new research questions, proposing hypotheses, making interpretations and overall producing knowledge is likely to be challenged. Even without being aware of it, archaeologists will possibly become biased by the very information in the repositories. Again, it is difficult to forecast how much the built-in multi-vocality and subjectivity management features of our future knowledge systems will be able to guard us against this.

We leave you with these thoughts. Language is perhaps the most human of traits and, in this book, we have argued that discourse and argumentation in archaeology can be somehow tamed through analysis and computerisation. How far can we go down this road? How far should we venture?