Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches - Gonzalez-Perez C., Martin-Rodilla P., Pereira-Fariña M.

1 Introduction to Discourse Analysis and Argumentation Theory
1.1 Introduction
1.3.1 Pre-campaign
1.3.1.1 Creating the Raw Corpus
1.4 Exploiting the Results
1.4.1 Descriptive Metrics
2 Discourse and Argumentation in Archaeology
2.1 History
2.2 Representing Inference Chains in CRMbase
2.4 The Extension of CRMinf to Cover Scholarly Reading
3 Making Good Arguments in Archaeology
3.1 Introduction
3.2 The Historical Lack of Attention to Argumentation in Archaeology
3.3 Archaeological Argumentation
3.3.1 The Importance of Testing
3.3.2 The Decline of Argument by Analogy
3.3.3 The Popularity of Abstract Social Theory
3.5 Using Warrants to Distinguish Strong and Weak Arguments
4 A Causal Model Application to a Cultural Heritage Sentence Analysis
4.1.1 Analysis of a Sentence About Cultural Heritage
4.2 Conditional Logic and Causal Logic
4.2.1 Strict and Material Conditional
4.2.2 Indicative Versus Subjunctive or Counterfactual Conditional
4.2.3 Positive and Negative Causality
4.3 Conditionality and Causality in Natural Sciences and Law
The Background
Fundamentals of Law
FIRST. Approach to the Contentious-Administrative Appeal
SECOND. Relevant Background for the Resolution of the Case
THIRD. On the Inadmissibility of the Claim of Patrimonial Responsibility
FOURTH. Regarding the Patrimonial Responsibility as a Result of the Registration of the Alkerdi Berroberria System as LHI in the Register of Listed Heritage Items of Navarra
SIXTH. On the Declaration of LHI by the Ministry of Law and the Protection of the Alkerdi Berroberría System
5 What Archaeological Texts Argue About
5.1 Introduction
6 The Social Production of Discourse in Archaeology
6.1 Introduction
6.2 Approaching Archaeological Discourses
6.2.1 Whereabouts of Archaeological Discourses
6.2.2 Discussants in Archaeological Discourses
6.2.3 Approaches to Analysing Discourses
6.3 Characteristics of Archaeological Discourses
6.3.1 Social and Societal Underpinnings
6.3.3 Structural and Infrastructural Scaffolding
6.4 Understanding the Discursive Production of Archaeological Knowledge Matters
7 Dealing with Vagueness in Archaeological Discourses
7.1 Philosophical Groundings of Vagueness
7.1.1 Philosophical Approaches to Vagueness
7.2 Computational Treatments of Vagueness
7.3 Concept of Vagueness as a Conceptual Modelling Issue
7.3.2 Vagueness Variables
7.3.2.1 Imprecision
7.3.2.2 Inaccuracy
7.3.3 Relationships Between Vagueness Variables
7.3.3.1 Imprecision Decreases Uncertainty
7.3.3.2 Imprecision Increases Inaccuracy
7.3.3.3 Error Increases Uncertainty
7.4 Empirical Study
7.4.1 Measuring Imprecision and Uncertainty
8 Extending Discourse Analysis in Archaeology
8.1 Archaeology and Discourse Analysis
8.3 Discourse Analysis and Visualisation
8.5 Multimodal Analysis and Archaeological Discourse
8.6 Digital Multimodal Discourse Analysis
9.2.1 An Overview on Natural Language Computational Analysis in Archaeology
9.3 Where Archaeological Discourse and Computers Meet
9.3.1 Computational Analysis of Discourse
9.3.2 Applications in Archaeological Discourse
9.4 Computer Processing of Language and Discourse in Archaeology
References
10 NLP and Archaeology
10.1 Introduction
10.2 Archaeology and Unpublished Fieldwork Reports
10.3 Metadata Challenges
10.4 The Archaeotools Project
10.6 The ADS and NLP at the University of York
11 Information Extraction and Machine Learning for Archaeological Texts
11.1 Introduction
11.2.3 Topic Modelling
11.2.4 Information Retrieval
11.3 Previous Research on Information Extraction in the Archaeology Domain
11.4.2 Optical Character Recognition
11.4.3 Sentence Boundary Detection
11.4.5 Normalisation
11.4.5.1 Lowercasing
11.4.5.2 Removing Words
11.4.5.3 Stripping Characters
11.4.5.4 Stemming
11.4.5.5 Lemmatisation
11.4.5.6 Normalisation and Information Loss
11.4.6 Adding Structure
11.4.6.1 Term Frequency and Inverse Document Frequency
11.4.7 Selecting Preprocessing Steps
11.5 Machine Learning
11.5.1 Supervised and Unsupervised Learning
11.5.3 Commonly Used Algorithms for Information Extraction
11.5.4 Evaluation and Performance Metrics
12 Argument Mining and Analytics in Archaeology
12.1 Introduction
12.2 Applying Argument Technologies in Archaeology
12.6.1 Simple Statistics
12.6.2 Comparative Statistics
13 Computational Processing of Language Vagueness for Archaeological Site Modelling
13.3.1 Identification
13.3.2 Quantification
Author: Gonzalez-Perez C., Martin-Rodilla P., Pereira-Fariña M.
Tags: history, archaeology, archaeological research
ISBN: 978-3-031-37155-4
Year: 2023
Quantitative Archaeology and Archaeological Modelling

Cesar Gonzalez-Perez
Patricia Martin-Rodilla
Martín Pereira-Fariña Editors

Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches

Quantitative Archaeology and Archaeological
Modelling
Series Editors
Andrew Bevan, University College London, London, UK
Oliver Nakoinz, Institut für Ur- und Frühgeschichte, University of Kiel, Kiel,
Germany

Quantitative approaches and modelling techniques have played an increasingly
significant role in archaeology over the last few decades, as can be seen both
by their prominence in published research and in university courses. Despite this
popularity, there remains only a limited number of book-length treatments in
archaeology on these subjects (with the exception perhaps being general-purpose
GIS). ‘Quantitative Archaeology and Archaeological Modelling’ is a book series
that therefore responds to this need for (a) basic, methodologically transparent,
manuals for teaching at all levels, (b) good practice guides with a series of
reproducible case studies, and (c) higher-level extended discussions of bleeding
edge problems. This series is also intended to be interdisciplinary in the analytical
theory and method it fosters, international in its scope, datasets, contributors and
audience, and open to both deliberately novel and well-established approaches.

Editors
Cesar Gonzalez-Perez
Incipit, CSIC
Santiago de Compostela, Spain
Martín Pereira-Fariña
Facultade de Filosofía
University of Santiago de Compostela
Santiago de Compostela, Spain

Patricia Martin-Rodilla
Facultade de Informática
Universidade de A Coruña
A Coruña, Spain

ISSN 2366-5998
ISSN 2366-6005 (electronic)
Quantitative Archaeology and Archaeological Modelling
ISBN 978-3-031-37155-4
ISBN 978-3-031-37156-1 (eBook)
https://doi.org/10.1007/978-3-031-37156-1
This work was supported by COST Action “Saving European Archaeology from the Digital Dark Age”
(SEADDA), CA 18128, https://www.cost.eu/actions/CA18128/ (CA 18128) and by grant PID2020-114758RB-I00 funded by MCIN/AEI/10.13039/501100011033
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Paper in this product is recyclable.

Foreword

How do archaeologists come to establish and validate accounts of the past, based
on their encounters with its material remains as mediated by fieldwork, collections,
and years of study and toil? How do they justify claims they make, and on what
grounds do they accept or reject claims made by others? How do they reach good
decisions as they investigate, construct, curate, and communicate the archaeological
record? What are archaeological facts, and how do they come to be accepted as
such? What are the traits of sound archaeological syllogisms? And, more generally,
what is archaeological knowledge? Where can we find it, and in which forms does
it manifest itself? How can it be captured, represented, and analyzed? How is it
communicated, debated, and evaluated? Is there “good” and “bad” archaeological
knowledge, and how can we tell them apart? Which factors are at play in knowledge-making, and in knowing? What are the implications and stakes of archaeological
knowledge, and the ways it comes into being?
Few archaeologists spend much time reflecting directly on this Pandora’s box
of vexing questions. Yet many of them, prompted by engaging with the transdisciplinary perspectives in this exciting volume on the use of computational
approaches to discourse and argument analysis in archaeology, are central to
methodological aspects of archaeological research, and to the acquisition of archaeological expertise. For one thing, competent archaeologists should surely be able
to reason on the validity of an archaeological study in their area of expertise, and,
beyond that, to produce research findings substantiated by persuasive arguments,
supported by reliable evidence, and consonant with accepted knowledge in their
field. On the other hand, scholars of archaeological theory, as well as those
concerned with policies, decision-making, and interventions related to the preservation of archaeological heritage, its multiple and often conflicting socioeconomic,
cultural and symbolic uses, and the future of archaeological work, need also to
grapple routinely with questions related to the factors under which archaeological
knowledge is produced, the felicity conditions under which archaeological facts can
be deemed to be acceptable, and the status, impact, and repercussions of resulting
knowledge for contemporary societies. In almost all aspects of archaeological work,
researchers and professionals are inevitably entwined in knowledge-laden activities,
as they engage with the body of scholarship in their area of expertise; as they identify
research topics and questions; as they collect, represent, and analyze evidence
from archaeological fieldwork and collections; as they develop identifications,
classifications, descriptions, explanations, and, more generally, accounts of the
material record of humanity and its implications for past societies and cultures;
as they produce archaeological reports, catalogs, databases, monographs, articles,
and conference papers; as they debate and come to conclusions on the validity of
research ideas and findings, and on deliberations on the management and use of
archaeological heritage, be it in scholarly publications, administrative and policy
venues, or in informal interpersonal settings including online communications;
and, last but not least, as they address the historical and contemporary misuses of
archaeology by political and state actors, the appropriation of research agendas and
heritage policies by dominant ideologies and sectarian and economic interests, and
of archaeologically manifested phenomena by sensationalism, pseudo-science, and
irrationalism.
We might assert, paraphrasing Bruno Latour, that archaeology, not unlike
experimental science, “has two faces: one that knows, and one that does not yet.”
The latter is of relevance here. It offers a view of the discipline not as “readymade
science” with its middle-range theories and accounts of particular sites, cultures,
periods, artifact types, etc., but as a “science in the making”: a domain where
archaeological knowledge, as an object (manifested in the representations of ideas
in texts, visual representations, data structures, and the like), is examined in its
articulation with archaeological knowing or knowledge-making as an activity, ripe
with “uncertainty, people at work, decisions, competition, controversies.” It is
precisely in this domain of archaeological activity where the Pandora’s box of our
initial questions is primarily located.
Studying how archaeologists establish ideas, facts, and assertions from their
encounters with the material remains of the past, from the translation of the
material record of features and finds in the field into an informational record
made of descriptions, data points, visualizations, enmeshed with identifications of
sites, archaeological contexts, artifacts, types and assemblages in the excavation
report, and further developed into typologies, seriations, and other manifestations
of archaeological systematics, as well as into synthetic accounts and interpretations, explanations, and theories in scholarly publication, has been a fruitful
way to approach archaeology “in the making.” From publications such as Mike
Edgeworth’s fascinating ethnography of the “acts of discovery” in an unnamed
excavation in Britain, to the fertile qualitative investigations of diverse aspects of
archaeological information work in northern Europe by Isto Huvila, and the multisited study of archaeological curation across different stages in the formation of four
North American archaeological collections in Sarah Buchanan’s insightful doctoral
dissertation, the study of archaeological practices and knowledge work has emerged
as the pursuit of a growing trans-disciplinary community of researchers concerned
with making sense of the agents, processes, settings, mediating tools, and objects of
archaeology “in the making.”

A central aspect of “archaeology in the making” concerns how archaeological
data, facts, and assertions related to them are represented in different genres of
representations, and how such representations – from descriptive records, lists,
and catalogs to research publications – underlie different modes of archaeological
knowledge production. As I argued in an earlier manuscript (Dallas, 2016), we owe
a seminal, and perhaps the first, systematic attempt toward a theorization of these
questions to the still under-appreciated intellectual contribution of French Classical
archaeologist and information scientist Jean-Claude Gardin. A pioneer of computational analysis in archaeology in the 1950s, he was initially preoccupied with the
development of analytical “codes” or vocabularies for the formal description and
classification of archaeological artifacts, culminating in the development of his
Syntol free structure indexing language, a means for representing the content of
documents through n-place predicates expressible in a machine language. Drawing
critically from fields as diverse as documentation, classification theory, material
culture studies, structural linguistics, argumentation theory, and philosophy of
science, in his “Document analysis and linguistic theory” (1973), Gardin then
expands his earlier attempts to account for the intellectual content of archaeological
documents through term indexing by an added emphasis on their syntax and
semantics, noting that “the boundary between syntax and semantics becomes so
fuzzy that it is not possible any more to regard syntax as independent nor to confine
semantics to an interpretative function.”
This is the foundation of Jean-Claude Gardin’s seminal contribution to the
theory of archaeological argumentation and discourse, translated into English as
Archaeological constructs: an aspect of theoretical archaeology (1980). The book
is a formidable theoretical construct in its own right. In the first chapter, it outlines
Gardin’s “iterative model” linking the acquisition of archaeological materials with
their annotation and consequent generation of propositions, and offers examples of
what he calls a “logicist analysis” of processes of cataloging, classification, pattern
recognition, and historical inference that constitute the “lifecycle” of the archaeological
knowledge process. He then goes on to analyze processes relevant to the construction of two very different kinds of archaeological publications: “compilations,”
such as finds catalogs or excavation reports, typically concerned with material
remains of the past and their attributes, and “explanations,” such as synthetic
monographs and interpretative accounts of ancient societies, their history, and
mode of life. In his analysis, he castigates the failure of traditional archaeological
publication in the narrative genre to attend to methodological rigor, theoretical
frugality, and clarity, even often violating sound reasoning. As an alternative, he
advocates the “condensation” of archaeological scholarly prose through a process
of schematization, taking the form of an ordered tree of logical inferences using
modus ponens, and operating on a lexicon of structures of symbols representing
propositions – in other words, an inference tree.
But then, Gardin adds the following qualification: “I am not proposing a new
handbook on archaeological theory, from which students can learn the techniques of
observation and interpretation [ . . . ] my goal is an analysis of the mental operations
carried out in archaeological constructions of all sorts, from the collecting of data to
the writing of an article or book in published form.” While his action-oriented, even
polemical, advocacy of a mode of archaeological communication based on formal
reasoning is undeniable, he notably advances also a salient approach to representing
and understanding the way actual archaeological argument unfolds in practice: a
way to make archaeologists “more aware of the empirical or social limits of our
interpretations” – what he calls “a practical epistemology” of archaeological knowledge. Adopting Stephen Toulmin’s criterion of “reasonableness,” he advocates
an archaeology whose propositions and theories, as represented in its publication
practices, stand the test of reason, but also intends his logicist schematization as a
means “to gain a deeper understanding of what our interpretive writings ‘are’, as
symbolic constructs; we also wish to evaluate what those constructs can ‘do’, in the
universe of discourse under study.”
The most notable methodological contribution of Gardin’s theorization of
archaeological argumentation concerns archaeological publication. His method
of re-expressing traditional archaeological argument in terms of a lexicon of
symbols and a set of argumentation operations has been adopted by a limited
number of studies. Among them, ethnoarchaeologist Valentine Roux’s Arkeotek
project goes beyond logicist schematization to address the interdependence between
archaeological data constitution on the one hand and scholarly argumentation on
the other. Its hypertext-based “Scientific Constructs and Data” model provides for
integrating archaeological argumentation structure with descriptive archaeological
data. Further work demonstrates the possibility of modeling the logicist schema
of scholarly reasoning as a formal ontology. In a parallel development, the UK
Archaeology Data Service’s Internet Archaeology journal featured, as early as
1997, a similar ability of offering interactive access to archaeological studies
that allowed simultaneous access to scholarly claims and supporting data: a
non-lasting experiment which, nevertheless, still goes beyond the current state-of-play in research data publication. Such attention to the structure and content
of archaeological scholarly communication, and its reliance on the propositional
content and structure of publications, is self-evidently justified on the pragmatic grounds
of allowing better access to and evaluation of claims made by archaeological
research.
Yet, dealing with argumentation and discourse in archaeology makes the case for
accounting, beyond methodology, for ontological, epistemological, and axiological
considerations. In other words, when we consider archaeological knowledge “in the
making” as a worthy subject of study, we need to decide on questions of existence,
knowledge, and values. As regards ontology, most archaeologists would agree that
their domain of reference – including material remains of past human activity and
past people – exists, or has existed, independent from our knowledge of it; that it
consists of differentiated objects and structures – be it natural or social – which have
powers and ways of acting that contribute to the production of events; that apart from
actual objects accessible directly to experience, this external world is also composed
of latent, underlying entities and relations between observable entities, yet such
relations may be contingent rather than necessary; but also that, unlike natural
objects, social particulars such as a specific action, an artifact, or an archaeological
culture are dependent also on categories accessible only within our own interpretive
frame, even if we still admit that they exist regardless of our specific interpretation
of them. At the epistemological level, on the other hand, many (but not all) workers
in the field will admit that archaeological knowledge is theory-laden, socially
constructed, and historically situated; therefore, what we accept as true today may
be falsified tomorrow, and “thought collectives” (to use Ludwik Fleck’s useful term)
adopting different theoretical premises may legitimately have conflicting views of
what constitutes knowledge on a given subject; that there are both continuities and
discontinuities in the evolution of archaeological knowledge; and that the production
of archaeological knowledge is a social practice, and therefore social relations,
context, and interests, as well as the ways in which archaeological knowledge
is communicated (typically, through historically sanctioned genres of information
carriers), influence its content. Finally, at the axiological level, most archaeologists
would adhere to the idea that archaeological research should be critical of its
object of inquiry, and that the understanding of archaeological phenomena entails
viewing them critically; some would also add that archaeological practice should be
emancipatory, and adhere to values of social justice and an ethics of care.
Readers with an interest in the philosophy of science may recognize that this set
of ontological, epistemological, and axiological positions is aligned with a critical
realist account of the human sciences (and, in fact, derived directly from Andrew
Sayer’s account of critical realist assumptions): a transcendental realist ontology,
a constructivist epistemology, and a value-laden, reflexive axiology. In tandem,
a critical realist account conceives the process of archaeological explanation –
one common objective of archaeological argumentation – as consisting of the
identification of some past human activity or phenomenon to be explained and its
resolution into elements, re-description of these elements in the theoretical language
of archaeology (or the approach to archaeology espoused by the researcher), a
retroductive attempt to describe the likely structural conditions (such as causal
mechanisms, material-semiotic rules, etc.) and tendencies involved, and, finally,
a process of elimination of alternative causes, or explanations. Of course, not all
archaeological research aims at explanation: in fact, the reliance of archaeological
knowledge related to social aspects of past reality on categories (kinds) that
can only become accessible through human cognition – those which, in a more
clearly constructivist vein, have been called “interactive kinds” by philosopher Ian
Hacking – on the shared scholarly language of the epistemic community in which
an archaeological study is situated, makes it clear that words used for identification
or assignment of properties of archaeological entities have consequences on the
content of archaeological knowledge. In other words, far from being the result
of menial or mechanical work with limited value as knowledge, archaeological
descriptions, such as those found in field recording sheets and collections databases,
do matter.
This has an interesting implication on what we consider as the scope of archaeological argumentation. Clearly, a causal syllogism connecting an archaeological
phenomenon to likely causes, or a justification provided for some intervention
concerning the protection and use of an aspect of the archaeological heritage,
belongs within the purview of argumentation. But what about a finds database?
What about the identification of some archaeological feature, its assignment to some
particular function, provenience, or cultural period, in a catalog without explanatory
aspirations? What about the broad range of visualizations often included as part
of archaeological reports and publications? What of the illustrations – figures,
photographs, diagrams, models – often accompanying archaeological texts? Are we
to assume that they play no role in archaeological argumentation, and, if so, that
they are not involved in knowledge production?
The last statement points to an interesting conundrum: pragmatically, the very
inclusion of visualizations and illustrations within archaeological documents indicates that they contribute to knowledge production. If we were to accept that they
do not participate in argumentation, then we would need to posit other rhetorical
modes of archaeological knowledge beyond argumentation. But, in fact, it should
not surprise us that no archaeological document consists solely of propositions
linked together to form an argumentation structure. The most lucid exposition
(pun intended) of this is provided by Gavin Lucas in his recent Writing the Past
monograph, where he demonstrates how argument not only co-exists but in fact
cooperates in the very same text toward the archaeological knowledge construction
with instances of all three alternative rhetorical modes systematized as early as the
nineteenth century in the context of rhetoric and composition studies: narrative,
presenting a story unfolding through time through the involvement of actors and
events; description, involving the presentation of qualities and attributes of some
observed object or event; and, exposition, explaining or clarifying a topic or issue.
How, then, different archaeological communication objects mobilize different
rhetorical modes, and how they are articulated in reports and publications to
construct archaeological knowledge, is a fascinating topic. Going beyond rhetorical
modes, the example of archaeological visualizations, which I had the opportunity to
reflect upon a few years ago in an interesting conference session on “Visualization
as analysis in archaeology,” provides good insights on how a site section
and “hermeneutic matrix” diagram may act as an exposition of the temporality
and longevity of each excavation cut; or, how a dynamic virtual reconstruction
of the Antikythera mechanism captures performative knowledge, and supports a
plausible explanation, about the function of the mechanism; and, more generally,
how archaeological visualization constitutes an objectual epistemic practice rather
than being merely an act of display; and an archaeological 3D visualization can
act as an “epistemic contract” (borrowing Harold Garfinkel’s identification of the
transcript of an outpatient clinic interview as “therapeutic contract” rather than as
“actuarial record”), made to support the generation of knowledge claims in further
steps of the interpretation ladder, rather than to represent faithfully “what the sensor
saw.”
This edited volume is not an archaeological study. It is, rather, a collective work
about archaeology as a field of knowledge and as a practice of knowledge-making.
It offers a shared foundation useful to archaeologists curious about the conditions of
archaeological knowledge production and the potential of computational approaches
for opening new paths for communicating and validating archaeological research,
computer scientists from the fields of natural language processing and argumentation support, information researchers interested in archaeological practices and
knowledge work, anthropologists and sociologists of science, and others interested
in how archaeologists produce knowledge through argumentation “in use.” In the
spirit of the agonistic nature of argument, the volume accommodates diverse,
and in some cases dissonant, conceptualizations and computational approaches
to argumentation and discourse, ranging from archaeological to computational,
from normative to accommodative, from pragmatic to illustrative, from synthetic
to highly focused, and from instrumental to critical. It provides useful insights,
and stimulates ample reflection toward new questions. It is unique in combining
critical and theoretical accounts of archaeological discourse and knowledge work,
and overviews of key computational approaches to discourse and argument analysis,
with examples of specific applications to the formal representation of archaeological
knowledge, ranging from the identification of topics through computer-assisted
recognition of historical names and common descriptors, to formal conceptualizations that allow the articulation between the domain of archaeological discourse
which archaeological texts inhabit, and the domain of past human activity which
such texts refer to.
Reiterating the core thesis he originally advanced in The Uses of Argument,
Stephen Toulmin admits to “a single, deeply held conviction: that, in science
and philosophy alike, [people] demonstrate their rationality not by ordering their
concepts and beliefs in tidy formal structures, but by their preparedness to respond to
novel situations with open minds—acknowledging the shortcomings of their former
procedures and moving beyond them. Here again, the key notions are ‘adaptation’
and ‘demand’, rather than ‘form’ and ‘validity’.” In a similar vein, the dynamic
nature, historicity, and pragmatic situatedness of archaeological argumentation are
acknowledged across this volume. In diverse ways, different chapters address the
content of archaeological argumentation, offer methods and examples to identify its
subject-matter computationally and to represent formally its logical and procedural
structure, and offer insights on the conditions under which particular claims
are (and should be) accepted. They account for the reliance of archaeological
argumentation on communicative processes, set in motion by archaeologists in
conversational semiotic activity governed by historically situated systems of signification. Furthermore, they also engage with the dependence of archaeological
discourse on reference to “things-in-the-world” – empirically manifested aspects
of the archaeological record, persons and collectivities, objects, places, and events,
as well as conceptual entities comprising the subject-matter of arguments. Finally,
they illustrate how discourse “in use” hinges on the pragmatic dimensions of
archaeological knowledge work – affiliation to thought collectives (to use Ludwik
Fleck’s salient notion) and communities of interest with their shared communicative
codes and accepted knowledge, presuppositions, norms, motivations, affects, and
future stakes – which underpin the discursive activity of archaeologists as they
respond and adapt to a changing field of epistemic, ethical, political, socioeconomic,
and cultural challenges. Reaching beyond epistemological, methodological, and
axiological considerations on the nature, poetics, and politics of archaeological
knowledge, argumentation, and discourse, which have been the focus of numerous
earlier contributions (from Jean-Claude Gardin to Alison Wylie, Rosemary Joyce,
and Gavin Lucas, to name but a few), this volume provides a pragmatically useful
body of knowledge on the relevance, critical context, methods, and practical applications of discourse and argument analysis technologies as tools to represent, analyze,
and reflect on archaeological knowledge and its production, aptly demonstrated
through salient case studies of computational approaches.
At a time when the representation of the archaeological record and the production of archaeological knowledge are increasingly mediated by digital research
infrastructures and associated standards, tools, and procedures, and when the
promises of deep learning and artificial intelligence assume renewed impetus
across the disciplines, the task of understanding archaeological discourse and
argumentation as knowledge work becomes an urgent undertaking. This volume
addresses consequential issues and offers examples of promising computational
approaches for representing the dynamic structure and situated process of archaeological argument, and its discursive and pragmatic underpinnings in past and
contemporary realities. It opens important additional questions, contributing to the
emergence of an important interdisciplinary subfield bridging archaeological theory
and method with computational approaches to meaning and argument analysis.
Most importantly, it also provides a springboard for intervening, by mobilizing
the archaeological community to act toward the use of computational technologies
to enable reflexive, critically informed, and relevant approaches to the production,
publication, epistemic validation, and use of archaeological knowledge, adapted to
the demands and challenges facing contemporary societies, and the planet.
Faculty of Information, University of Toronto, Toronto, Canada

Costis Dallas

Reference
Dallas, C. (2016). Jean-Claude Gardin on archaeological data, representation and
knowledge: Implications for digital archaeology. Journal of Archaeological
Method and Theory, 23(1), 305–330. https://doi.org/10.1007/s10816-015-9241-3

Preface

Most of the knowledge that we produce in archaeology comes from careful argumentation from basic premises to elaborate conclusions. Initial premises include
descriptions of finds, features, sites, and landscapes, while conclusions range from
settlement patterns to trade routes or social organisations. In this regard, most
archaeological texts constitute discourses aiming to persuade the reader to accept a
series of conclusions based on some initial premises, often factual and evidentially
supported. Whether or not an archaeological text is capable of persuading its readers
and thus advance the state of the art in the field depends on the quality of the chosen
premises as well as the robustness of the subsequent argumentation. Therefore,
paying attention to discourse and argumentation in archaeology constitutes a crucial
aspect of meta-research.
Language technologies have evolved rapidly over the last 10 years, and today
we can process natural language on a computer with relative ease, at least for
some well-defined purposes. The conceptualisation of discourse and argumentation
has advanced significantly as well, together with applied approaches. Although the
importance of discourse and language in archaeology has been pointed out by many
authors, there is no comprehensive work to date that presents a panoramic view of
argumentation and discourse approaches and technologies in archaeology. In this
book, we aim to provide this.

Audience and Objectives
This book is aimed at archaeologists with an interest in language, discourse, and
argumentation, and specifically in how archaeological conclusions are obtained
through argumentation processes. In particular, researchers in archaeology can find
the book useful to gain a better understanding of how argumentation can take us
from premises to conclusions and learn how to do it better. Lecturers and students
of archaeology can use the book to learn specific conceptual approaches and
computational approaches to discourse and argumentation analysis for archaeological texts.
All in all, the book aims to provide a comprehensive overview of conceptual
approaches and computational techniques for argument analysis in archaeology.
It does so by building slowly from scratch, starting with introductory topics and
progressing towards advanced and more specialised issues. Also, the book unites
theory and practice, providing a comprehensive panorama of conceptual approaches
and computational techniques.
The book starts with the basic foundations of discourse and argumentation
analysis, introducing the main goals of discourse analysis, presenting different
approaches to what an argument is, and concluding with cutting-edge and state-of-the-art technologies for the fully automatic analysis of texts. In addition, the
book tackles different contexts where archaeological discourses are found, from data
collected during fieldwork to archiving of excavation reports or court resolutions on
heritage-listed items.
The book also presents an updated review of approaches and methods related to
natural language processing and text mining that are applicable to archaeological
settings, and at multiple linguistic levels such as lexical, grammatical, and discursive. Also, the book proposes some methodological approaches for the analysis of
argumentative strengths and weaknesses in archaeological texts based on Toulmin’s
schemes.
Finally, the book considers different degrees of formalisation in discourse analysis, from critical Foucauldian approaches to the more quantitative computational
analytics, and takes into account the social dimension of archaeological discourse
production.

Book Structure
This book is organised into two major sections: Conceptual Approaches and Computational Techniques. A preface provides a general introduction, and a final chapter
offers some speculations as to what the future of discourse and argumentation in
archaeology may look like.
The first section, Conceptual Approaches, contains a collection of contributions
from different foundations and perspectives, offering a comprehensive overview of
the discursive and argumentative phenomenon in archaeology and its ramifications.
In Chap. 1, Martín Pereira-Fariña presents the fundamental principles of discourse
analysis and three different theoretical approaches to how arguments can be
represented, summarising the process to transform raw data into an annotated corpus
that allows us to draw conclusions anchored in how language is used in context. In
Chap. 2, Stephen Stead deals with the issue of documenting the argumentation in a
discourse so that it can interoperate with other sets of data. In Chap. 3, Michael E.
Smith offers a historical journey through different stages and degrees of importance
attributed to the study of archaeological argumentation, analyses some reasons for
the low level of attention that is paid to argumentation in archaeology today, and
presents a methodological proposal based on argument strengths and weaknesses. In
Chap. 4, Alejandro Sobrino and Beatriz Calderón introduce a theoretical framework
for the analysis of causal linguistic structures related to culturally relevant elements,
acknowledging that causality can be linguistically expressed in multiple ways, and
showing how this issue can be tackled.
In Chap. 5, in turn, Cesar Gonzalez-Perez focuses on what archaeological texts
talk about and presents an approach to connect the argumentation in the discourse
with the underlying ontological elements in the world, using a referential device
named ontological proxies. In Chap. 6, Isto Huvila takes a more sociological,
anthropological, and critical approach to archaeological discourse and reflects on
discourses in archaeology as situated in their social context of production, including
an analysis of the role of different agents and the impact of new ways of discourse
production such as social networks or other techno-mediated mechanisms. In
Chap. 7, Cesar Gonzalez-Perez, Martín Pereira-Fariña, Patricia Martín-Rodilla, and
Leticia Tobalina tackle the issue of vagueness in archaeological discourses and
present a conceptual framework to capture and manage vague information from the
field to the text. Finally, in Chap. 8, Jeremy Huggett uses a multimodal approach to
extend discourse analysis in archaeology beyond the mere text.
The second section, Computational Techniques, provides a sample of some
algorithmic approaches that have proved useful to deal with discourse and argumentation in archaeology. In Chap. 9, Patricia Martín-Rodilla offers an introductory
overview of how computer-based processing of natural language has been applied
to archaeological texts, and what major lines of work exist today. In Chap. 10, Holly
Wright, Tim Evans, and Katie Green deal with the natural language processing
of lexicon in archaeological texts from the perspective of a large digital archive,
showing how these techniques are useful for information extraction for researchers.
In Chap. 11, Alex Brandsen deals with text mining at the lexical, grammatical, and
discursive levels, as well as machine learning applied to archaeological texts. In
Chap. 12, John Lawrence, Martín Pereira-Fariña, and Jacky Visser go beyond the
discourse itself to explore the mining and analysis of arguments from plain text,
with a special focus on argument analytics and result dissemination. Lastly, in Chap.
13, Maria Elena Castiello provides an approach to processing the vagueness that is
inherent to archaeological language in a site modelling context.
For those readers who have a special interest in a particular topic, the book admits
a theme-oriented reading in addition to a linear sequence of chapters. Chapters 2, 3,
and 4 in Part I, as well as Chap. 12 in Part II, deal with argumentation and different
approaches to understanding how people argue to defend their standpoints. Chapters
5 and 7 in Part I, as well as Chaps. 9, 10, and 11 in Part II, deal with lexical,
grammatical, and semantic language processing. Finally, Chaps. 4, 6, 7 and 8 in
Part I, as well as Chap. 13 in Part II, deal with language as used in context,
including social aspects, vagueness, and multi-modality.
Enjoy reading!
Santiago de Compostela, Spain
A Coruña, Spain
Santiago de Compostela, Spain

Cesar Gonzalez-Perez
Patricia Martín-Rodilla
Martín Pereira-Fariña

Acknowledgements

The editors wish to thank the authors of the chapters of this book for their generous
contributions, as well as the Springer staff who guided and helped us throughout the
publication process.
The editors must acknowledge the contributions and support of the following grants towards the preparation of this book: project “Heritage 3.0: Argumentation and Conceptual Modelling for Enhanced Cultural Heritage Participation and Management Policies” (ACME), grant number PID2020-114758RB-I00
funded by MCIN/AEI/10.13039/501100011033; project “Deflationist Views in
Ontology and Metaontology”, grant number PID2020-115482GB-I00 funded by
MCIN/AEI/10.13039/501100011033; project “Saving European Archaeology from
the Digital Dark Age” (SEADDA), grant number CA18128 funded by EC COST
Actions; and Consellería de Educación, Universidade e Formación Profesional
(accreditation 2019-2022 ED431G/01, ED431B 2019/03); and European Regional
Development Fund, which acknowledges the CITIC Research Centre in ICT at the
University of A Coruña as a member of the Galician University System.

Contents

1 Introduction to Discourse Analysis and Argumentation Theory
Martín Pereira-Fariña

Part I Conceptual Approaches

2 Discourse and Argumentation in Archaeology: Conceptual and Computational Approaches
Stephen Stead

3 Making Good Arguments in Archaeology
Michael E. Smith

4 A Causal Model Application to a Cultural Heritage Sentence Analysis
Alejandro Sobrino and Beatriz Calderón-Cerrato

5 What Archaeological Texts Argue About: Denotations and Ontological Proxies
Cesar Gonzalez-Perez

6 The Social Production of Discourse in Archaeology
Isto Huvila

7 Dealing with Vagueness in Archaeological Discourses
Cesar Gonzalez-Perez, Martín Pereira-Fariña, Patricia Martín-Rodilla, and Leticia Tobalina-Pulido

8 Extending Discourse Analysis in Archaeology: A Multimodal Approach
Jeremy Huggett

Part II Computational Techniques

9 Computer Processing of Language: Where Archaeological Discourse and Computers Meet
Patricia Martín-Rodilla

10 NLP and Archaeology: A View from a Digital Archive
Holly Wright, Tim N. L. Evans, and Katie Green

11 Information Extraction and Machine Learning for Archaeological Texts
Alex Brandsen

12 Argument Mining and Analytics in Archaeology
John Lawrence, Martín Pereira-Fariña, and Jacky Visser

13 Computational Processing of Language Vagueness for Archaeological Site Modelling
Maria Elena Castiello

Part III The Future

14 Future Directions
Cesar Gonzalez-Perez, Martín Pereira-Fariña, and Patricia Martín-Rodilla

Chapter 1

Introduction to Discourse Analysis
and Argumentation Theory
Martín Pereira-Fariña

Abstract Discourse analysis is an explicit and systematic study of the structures,
strategies and manoeuvres of texts or talks in terms of a given theoretical framework.
The current stage of computational technologies allows us to tackle this task
from different perspectives. In this chapter, I explore how an argument can be
characterised and analysed from three theoretical perspectives (logical, pragmatic and
cognitive). Each of these approaches leads us to a different type of discourse analysis,
emphasizing different angles of the same text, which shows the richness of this
analytical framework. After that, I describe the main steps for transforming raw
text into an annotated corpus, essential to draw any reliable conclusions from it.
Annotation is a complex task, essential for a good quality analysis of discourse, but
it can be split into doable steps. Finally, the chapter concludes with some ideas for
the exploitation of these results and how they can be disseminated.
Keywords Argumentation · Annotation · Corpus creation · Discourse analysis ·
Ontology

M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: martin.pereira@usc.gal
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_1

1.1 Introduction
It is 4 pm on a cold day in February. Two senior archaeologists are discussing
the future of the Cave of Altamira, a set of charcoal drawings and polychrome
paintings that constitute one of the first masterpieces in the history of mankind
(a UNESCO World Heritage Site located in Santillana del Mar, Cantabria, in the north of
Spain; for more information, see http://www.culturaydeporte.gob.es/mnaltamira/en/home.html).
Sitting opposite each other, together with a moderator, they debate a question
that has surrounded Altamira for the last 20 years: should the Cave be
opened for public access or only to experts for research purposes? Experts have
mixed opinions:
• Researcher 1 (R1): I think the question is basically that the Cave of Altamira
should be opened because it is obviously a place, to say simply, that everyone
has the right to visit. Thus, from that principle, I think that all that can be
negotiated, discussed, and talked about, it’s under what conditions and, above
all, for what, that is, what is the benefit of opening it, right? Starting from the
principle that heritage, everyone has the right to access it, then, considering the
problematic and risky conditions that the cave has, from there one can think of
possible restrictions and criteria to restrict and so on, but from the outset, I mean,
it must be open. That is my position.
• Researcher 2 (R2): I think not, precisely for the same reason; because everyone
has the right to access heritage, but if everyone accesses heritage, that heritage
is destroyed, isn’t it? So, I think that access should be restricted to experts, let’s
say, and to researchers, and in fact I think that there was even a more or less
exact reproduction and, well, that for tourism purposes or just for dissemination,
I think that it could therefore do the job quite well. And basically, that is my point.
If you want, we can go into more detail, but...
• Moderator (M): Only experts, you say, should access to the cave.
• Researcher 2 (R2): Yes, indeed, just researchers. Let’s say, people who are in
research centres and it is essential, come on, that they enter to check certain
things, for example... I don’t know.
We can easily appreciate how each position is argued and how a certain common
background is presupposed. So, how are both discourses elaborated? Are they
talking about the cave as a physical object or a social object? What type of reasons
do they use for supporting their corresponding positions? What is the connection
between the researchers and the cave? Are they considering themselves as experts
or as regular visitors? Could they have any potential conflict of interest? What is the
context in which this debate is happening?
This is not just a scientific debate; it also has a social impact. Understanding
and evaluating it requires unpacking the connections between language, reality,
and speakers. Discourse analysis tackles precisely this question, as defined by
Paltridge (2012, p. 2):
Discourse analysis examines patterns of language across texts and considers the relationship
between language and the social and cultural contexts in which it is used. Discourse analysis
also considers the ways that the use of language presents different views of the world and
different understandings. It examines how the use of language is influenced by relationships
between participants as well as the effects the use of language has upon social identities and
relations. It also considers how views of the world, and identities, are constructed through
the use of discourse.

Therefore, discourse analysis does not study the language itself, but the language in
use (Gee, 2011). It is a broad and interdisciplinary field, connected with other disciplines such as semiotics (Eco, 1979), linguistics (Serrano, 1983) or communication
studies (Chandler, 2003). There are two main methodological approaches which are
differentiated by their respective goals (Gee, 2011):
• Descriptive: It aims to understand how language works in different communicative situations: what are the topics of discussion, what is the grammar applied
to produce meaning, what are the different stylistic resources or manoeuvres to
produce meaning, etc.
• Critical: It aims to intervene in social, political or cultural problems and
controversies and provoke changes in the world based on studying how language
works.
In this chapter, I will focus on a descriptive type of discourse analysis whose main
goal is to unpack argument structures in a given discourse and, eventually, the folk
ontology underpinning a specific discourse. The goal is to provide robust theoretical
and methodological grounds for understanding the different methods of reasoning
and how knowledge is produced in a field such as cultural heritage (Lucas, 2019).
Discourse analysis, following the spirit of archaeological stratigraphy, allows us to
identify the different layers that constitute the structures of meaning (what are
the internal elements of the text and how they are organised) and interaction (how
speakers take part in the discourse).
This chapter is organised as follows. Section “What Is an Argument?” introduces
three different views of the notion of argument. Section “Designing an
Annotation Campaign” describes how to design and carry out a concrete study
in discourse analysis. Section “Exploiting the Results” provides some ideas for the
communication and dissemination of the results extracted from the study. Finally,
this chapter concludes with some reflections on the impact that discourse analysis
can have in the field of archaeology and cultural heritage studies.

1.2 What Is an Argument?
R1 says “I think the question is basically that the Cave of Altamira should be
opened” while R2 replies “I think that access should be restricted to experts”.
Both speakers maintain opposed positions with respect to the Cave, which can be
reconstructed as assertions as follows:
• R1: The access to the Cave should be opened to everybody.
• R2: The access to the Cave should be restricted to experts.
These are not arguments; these are just assertions. So, what do we need to have an
argument? Generally speaking, an argument requires at least another assertion that
plays the role of support. A more specific definition is highly dependent on how the
relationship between both statements is conceptualized. Next, I will describe three
different approaches:
• Logical approach: An argument is a sort of linguistic entity where a statement,
named conclusion, is supported by one or more statements, named premises
(Salmon, 1984). Logic studies the connection between premises and conclusion
in order to determine when it is correct and when it is not; i.e., the rules and
principles to determine the validity of the argument.
• Pragmatic approach: An argument is a particular type of speech act where a
speaker has the intention to support a specific statement, the conclusion, by
means of another statement or a set of them, the premises (Janier & Reed, 2017).
Argument theories based on Speech Act Theory (Austin, 1989; Searle, 1965)
aim to identify when a speaker intends to make an argument; determining its
validity is a secondary issue.
• Cognitive approach: An argument is a cognitive category where the linguistic
expression is acting as a sign vehicle of a specific relation of support between
two or more mental representations where one is the conclusion and the other
or others are the premises (van den Hoven, 2015; Searle, 1965). This is the most
ambitious approach, because it aims both to identify the structure of the argument
and its strength. The key point of strength is not logical validity but acceptability;
i.e., whether the argument has convinced its addressee or not (Mercier, 2012).
In the following subsections, I will describe in detail each of these views and I
will illustrate how they can be diagrammatically represented for their computational
analysis.

1.2.1 Logical Approach
The logical analysis of a discourse fragment entails three basic steps (Salmon,
1984): (i) checking whether that text is an argument or not; (ii) distinguishing
between premises and conclusion; and, (iii) if the argument is not complete, adding
the hidden or presupposed premises. Thus, let’s consider again the main positions
expressed by R1 and R2:
R1: I think the question is basically that the Cave of Altamira should be opened
because it is obviously a place, to say simply, that everyone has the right to visit.
R2: I think not, precisely for the same reason; because everyone has the right to
access heritage, but if everyone accesses heritage, that heritage is destroyed,
isn’t it?
Arguments rarely appear in a stereotypical way (one premise per line, then a horizontal
line, and the conclusion below it) in natural language discourse. Usually, they
appear disorganised and hidden in the middle of the discourse, accompanied by
non-argumentative fragments. So, step (i) aims to distinguish argumentative text
from non-argumentative text. We must start by looking for certain linguistic particles
or phrases that indicate the presence of arguments. Some typical expressions
are “therefore”, “hence”, “consequently”, “so”, “it follows that”, “since”, “for”,
“because”, etc. In the example, both R1 and R2 use “because” and this
indicates that there is an argument there.
Step (ii) consists of identifying the premises and the conclusion of the argument
and reconstructing the propositions expressed by them. For distinguishing premises
from conclusion, we can use linguistic markers again. Particles such as “therefore”,
“hence”, “consequently”, “so” or “it follows that” indicate that the conclusion is
going to be introduced; particles such as “since”, “for” or “because”, indicate that
what follows are premises. In the previous example, both R1 and R2 use
“because”, which gives us a delimitation mark to split the text following this pattern:
“<conclusion> because <premise(s)>”. R2, in addition, uses the linguistic marker
“but”, which usually indicates that a new premise is added to the
argument. Next, I reconstruct the argument structure of both speakers (Table 1.1).
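Steps (i) and (ii) can be roughly approximated with simple marker matching. The following is a minimal Python sketch under strong assumptions: the marker lists are the ones given above, the names CONCLUSION_MARKERS, PREMISE_MARKERS, find_markers and split_on_because are hypothetical, and the output is only a first guess that an analyst still has to check by hand.

```python
import re

# Marker lists taken from the text; treating them as reliable cues is an assumption.
CONCLUSION_MARKERS = ("therefore", "hence", "consequently", "so", "it follows that")
PREMISE_MARKERS = ("since", "for", "because")

def find_markers(text):
    """Step (i): list the (marker, role) pairs that occur as whole words."""
    lowered = text.lower()
    found = []
    for role, markers in (("conclusion", CONCLUSION_MARKERS),
                          ("premise", PREMISE_MARKERS)):
        for marker in markers:
            if re.search(r"\b" + re.escape(marker) + r"\b", lowered):
                found.append((marker, role))
    return found

def split_on_because(text):
    """Step (ii): apply the pattern '<conclusion> because <premise(s)>'."""
    conclusion, separator, premises = text.partition(" because ")
    if not separator:
        return None  # no "because" found, so the pattern does not apply
    return {"conclusion": conclusion.strip(), "premises": premises.strip()}

# R1's position, as quoted in the text.
r1 = ("I think the question is basically that the Cave of Altamira should be "
      "opened because it is obviously a place, to say simply, that everyone "
      "has the right to visit.")
r1_markers = find_markers(r1)
r1_parts = split_on_because(r1)
print(r1_markers)
print(r1_parts)
```

Marker matching of this kind over-generates. Applied to R2's utterance, it would also flag “for” in “for the same reason”, where the word does not introduce a premise, which is one reason why the reconstruction remains a manual, interpretive task.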
The next step is reconstructing the propositions. This is a problematic notion in
philosophy (Richard, 2013). For the sake of simplicity, we assume here its minimal
definition: a proposition is what is expressed by a statement, and it has a truth-value
(it is true or false). Next, we show the simplest reconstruction of the argument and
the propositions by R1 and R2, removing epistemic verbs and any other linguistic
elements not necessary to make clear its main content (Table 1.2).
Table 1.1 Reconstruction of the argument structure of the R1 and R2 arguments
R1
  Premise: The Cave of Altamira is obviously a place that everyone has the right to visit.
  Conclusion: The Cave of Altamira should be opened.
R2
  Premise: The Cave of Altamira will be destroyed if everyone has the right to access it.
  Premise: If everyone accesses heritage, that heritage is destroyed.
  Conclusion: The access to the Cave of Altamira should be restricted to experts.

Table 1.2 Reconstruction of the propositions of the R1 and R2 arguments
R1
  Premise: The Cave of Altamira is a place that everyone has the right to visit
  Conclusion: The access to the Cave of Altamira should be opened to everyone
R2
  Premise: If everyone accesses heritage, then that heritage is destroyed
  Premise: If everyone has the right to access the Cave of Altamira, then it will be destroyed
  Conclusion: The access to the Cave of Altamira should be restricted to experts

However, both R1 and R2 seem to be incomplete: there is a lack of connection
between the premise and the conclusion. Step (iii), following Salmon’s methodology,
consists in reconstructing the hidden premises. Thus, R1 presupposes
a link between “the right to visit a place by everyone” and “The Cave of Altamira
should be opened”; therefore, we need an additional premise (a conditional) to make
this connection: “If the Cave of Altamira is a place that everyone has the right
to visit, then it should be opened”. In the case of R2, the additional premises are
“The Cave of Altamira is a heritage site” and “The Cave of Altamira should not be
destroyed”. Figure 1.1 shows the full reconstruction of R1 and R2, including hidden
premises, by means of a diagrammatic representation using LogosLink.

Fig. 1.1 Full reconstruction of R1 (left-hand side) and R2 (right-hand side) arguments. In green,
the hidden premises that have been added. The nodes contain premises and conclusions, and the
arrows always point to the conclusion
The logical approach considers arguments as single and autonomous units that must
be fully reconstructed to be evaluated. The two basic types of arguments are:
(i) deductive arguments; and (ii) inductive arguments. Deductive arguments are
demonstrative (Salmon, 1984); therefore, if the premises are true and the argument
is valid, then the conclusion is necessarily true. However, they do not provide
new information because the information in the conclusion is already implicit in
the premises; in other words, the conclusion only makes explicit information that
was already in the premises. Inductive arguments are not demonstrative (Black,
1967); therefore, premises only provide a degree of support, confidence, or
even probability to the truthfulness of the conclusion. However, they provide new
information which is not included in the premises.
R1 is reconstructed as a deductive argument, since the conclusion is just the
consequent of the conditional, which can be inferred because the antecedent is asserted
as a premise. R2 is an inductive one, since it adds new information, such as the assumption that restricting access to
“experts” also entails that “the Cave should not be opened to everyone”.
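The deductive reading of R1 can be checked mechanically. The sketch below is a minimal illustration, assuming that R1 is formalised in propositional logic with two atoms and that the hidden premise is read as a material conditional; the function name is_valid and the variable names are hypothetical. It tests validity by brute force over a truth table: an argument is valid if no assignment makes every premise true and the conclusion false.

```python
from itertools import product

def is_valid(premises, conclusion, n_vars):
    """Valid iff no truth assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # counterexample found
    return True

# Assumed formalisation (not given in the text):
# p: "The Cave of Altamira is a place that everyone has the right to visit"
# q: "The Cave of Altamira should be opened"
explicit_premise = lambda p, q: p
hidden_premise = lambda p, q: (not p) or q  # material conditional: if p then q
conclusion = lambda p, q: q

without_hidden = is_valid([explicit_premise], conclusion, 2)
with_hidden = is_valid([explicit_premise, hidden_premise], conclusion, 2)
print(without_hidden, with_hidden)
```

Under these assumptions, the argument is not valid with the explicit premise alone and becomes valid once the hidden conditional is added, which is what step (iii) is meant to achieve. No comparable check is offered for R2, whose support is a matter of degree.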
Logic is a very well-defined methodology for evaluating the quality of arguments.
Different types of logics (propositional logic, first-order logic, etc.) allow us
to evaluate different types of arguments. However, deductive or even inductive
arguments are very rare in natural language discourse because we have to deal with
incomplete information and uncertainty in many everyday situations. Moreover,
reconstructing arguments in this way usually requires a lot of presuppositions and
extracting implicit information that cannot be easily derived from the original text.
Finally, it does not allow us to capture the dynamics of the debate and complex
argumentative structures cannot be analysed.
A more flexible framework under a logical approach is the Periodic Table of
Arguments (PTA) (Wagemans, 2016, 2019). It focuses on the study of arguments
in natural language by means of a step-by-step method for identifying arguments,
including more types than deductive and inductive.

1.2.2 Pragmatic Approach
Usually, arguments are elaborated during a communicative interchange, in a
dialogue. In such circumstances, any speaker pursues a specific goal: either to
justify themselves or to persuade others (Mercier & Sperber, 2018). To achieve this
goal, speakers use different linguistic structures and argument structures. From a
rhetorical point of view, if a speaker uses rational arguments, he must prove the truth
of his premises and the audience will accept the truth of the conclusion (Perelman
& Olbrechts-Tyteca, 1973).
Speech act theory is the general frame upon which this approach is built up. A
speech act is the production of a linguistic instance, an utterance, under specific
circumstances (Searle, 1965). The illocutionary act is the minimal unit of linguistic
communication, and it comprises two components (Searle & Vanderveken, 1989):
(i) an illocutionary force; and, (ii) a propositional content. For example, “Open
the window!” and “Could you open the window?” are two utterances with the
same propositional content (i.e., ‘you should open the window’) but with different
illocutionary forces: the former is an ‘order’, and the latter is a ‘request’. Currently,
there is no fixed catalogue of illocutionary forces, although some of them are
widely accepted, such as assertion or questioning (Searle & Vanderveken, 1989).
In this section, I will introduce Inference Anchoring Theory (IAT) (Reed &
Budzynska, 2010; Janier & Reed, 2017), whose main goal is to describe and
capture dialogical aspects of argumentation, and Pragma-dialectics (van Eemeren &
Grootendorst, 1984, 2004), a normative approach for the development of a rational
conversation.
IAT presupposes that the analysis of dialogical interactions allows us to extract
the argument form of a discourse, since linguistic argumentative indicators (such
as, ‘therefore’ or ‘because’) are not as common in spoken language as in written
texts (Janier & Reed, 2017). The sequence of interventions during a dialogue also
conveys the structure of the argument that the speaker wants to elaborate. Thus,
IAT argument analysis requires the following steps: (i) segmenting the utterances
of each speaker into argumentative units; (ii) identifying the illocutionary forces
and reconstructing the propositional content of the argumentative units; and, (iii)
unpacking and reconstructing the argumentative relations between the propositions.
Figure 1.2 shows the diagrammatic analysis of a fragment of the first interchange
between R1 and R2 using the IAT framework and OVA+ (Janier & Reed, 2017), a web
annotation tool specifically developed for IAT analysis.
As can be observed in Fig. 1.2, the fragment of the dialogue between R1 and
R2 is represented as a graph composed of three main sections: (i) the right-hand
side, where we capture the dialogical structure, comprising both the utterances
from each speaker (locutions) and the relevant moves between them (transitions);
(ii) the middle, which contains the illocutionary forces representing the speaker’s
communicative intentions; and, (iii) the left-hand side, which represents the argument
structure.

Fig. 1.2 IAT analysis of dialogue between Researcher 1 and Researcher 2

This analysis presents several relevant differences with respect to the logical one.
Firstly, it captures both the utterances (the actual statement that was said by the
speaker) and their propositional content (with the minimal possible reconstruction),
which allows us to keep track of what was actually said by each speaker.
Secondly, it shows the dynamics of the dialogue and the turn taking among the
participants. Thirdly, the disagreement between both speakers is explicitly captured
by the “Default conflict” node on the left-hand side, which indicates that a proposition
already stated is being rejected. Fourthly, it also gathers the intentions of the
speakers through the illocutionary forces, which can come from both the utterances
and the turn taking. Figure 1.2 contains four illocutionary forces, although IAT
defines more than 20 different ones (Janier & Reed, 2017): (i) “Asserting”, which
indicates that a speaker just made a statement; (ii) “Arguing”, which indicates that
a speaker has the intention to support a claim; (iii) “Disagreeing”, capturing the
speaker’s intention of rejecting a statement that has already been said; and, (iv)
“Rhetorical question”, which shows that a speaker has made a claim but formulated
it as a question, so no answer is required.
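The three-part layout just described (locutions and transitions, illocutionary forces, argument structure) can be pictured as a small typed graph. The following Python sketch is only an illustration under stated assumptions: the class names, the field names and the two invented utterances are hypothetical, and they reproduce neither the content of Fig. 1.2 nor the data model of OVA+.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Locution:
    """Right-hand side: what a speaker actually said."""
    speaker: str
    text: str


@dataclass(frozen=True)
class Proposition:
    """Left-hand side: reconstructed propositional content."""
    content: str


@dataclass(frozen=True)
class IllocutionaryLink:
    """Middle: a force anchoring the argument structure in the dialogue."""
    force: str       # e.g. "Asserting", "Arguing", "Disagreeing"
    source: object   # a Locution, or a transition between two Locutions
    target: object   # a Proposition, or a relation between Propositions


@dataclass
class IATGraph:
    locutions: list = field(default_factory=list)
    transitions: list = field(default_factory=list)  # (Locution, Locution)
    relations: list = field(default_factory=list)    # (kind, Proposition, Proposition)
    links: list = field(default_factory=list)

    def forces_used(self):
        """Return the set of illocutionary forces present in the analysis."""
        return {link.force for link in self.links}


# Invented two-turn exchange (hypothetical wording, for illustration only).
l1 = Locution("R1", "The cave should be opened to everyone.")
l2 = Locution("R2", "No, it should not be opened to everyone.")
p1 = Proposition("The cave should be opened to everyone")
p2 = Proposition("The cave should not be opened to everyone")

graph = IATGraph()
graph.locutions += [l1, l2]
turn = (l1, l2)
graph.transitions.append(turn)
conflict = ("Default conflict", p2, p1)
graph.relations.append(conflict)
graph.links += [
    IllocutionaryLink("Asserting", l1, p1),
    IllocutionaryLink("Asserting", l2, p2),
    # Disagreeing is anchored in the transition, not in a single locution.
    IllocutionaryLink("Disagreeing", turn, conflict),
]

print(sorted(graph.forces_used()))
```

Keeping locutions and propositions as separate types mirrors the first difference noted above: what was said and what was reconstructed stay distinguishable.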
Thus, from the perspective of Discourse Analysis, we can get a deeper understanding of how argumentation is elaborated using the pragmatic approach rather
than the logical one. Its main weakness is the lack of a systematic methodology for
evaluating the strength of the argument, something that the logical approach provides.
The other pragmatic approach mentioned, Pragma-dialectics (van Eemeren &
Grootendorst, 1984, 2004), is, essentially, a normative model where any argumentative exchange is taken as an instantiation of the ideal model of a critical discussion
whose goal is a reasonable resolution of a difference of opinion. This conversation
is guided by a set of rules, named “dialogue protocol” (that should be captured by
“Transitions”), to achieve the proposed goal; the violation of any of these rules
constitutes a fallacy.
Pragma-dialectics establishes three basic components for a rational conversation:
(i) setting the roles of participants, basically protagonist (who argues in favour of the
standpoint) and antagonist (who argues against the standpoint); (ii) going through
the four stages of the discussion (confrontation stage, opening stage, argumentation
stage and concluding stage); and, (iii) evaluating whether any of the 15 rules of critical
discussion (spread along the different stages) were violated.
The analysis of an argument within this framework requires the following steps:
(i) identifying the standpoints of the discussion, each of which is composed of a proposition
and an illocutionary force (the attitude of the speaker with respect to that proposition); (ii) recognizing the protagonist and the antagonist, assigning their respective
standpoints; (iii) agreeing on the shared propositions that establish the common
ground of the speakers; and, (iv) identifying the argumentative structures used by the
speakers during the discussion, which include both argument schemes and critical
questions (Walton et al., 2008). A deeper analysis of this framework is beyond the
scope of this paper, since it requires the analysis of the full dialogue; however, from
the perspective of Discourse Analysis, it is a very valuable framework.

1.2.3 Cognitive Approach
The last approach to the nature of arguments that I will explore here considers
argumentation as a mental process (van den Hoven, 2015) or a cognitive activity
(Mercier & Sperber, 2018). An argument expressed in natural language (written,
spoken, through images, etc.) is not the argument itself but the representation of a
mental process. Therefore, understanding or interpreting an argument always entails
the reconstruction of the corresponding mental process.
Thus, any linguistic argument is a sign vehicle of the mental process and,
therefore, it must be analysed as a semiotic entity (van den Hoven, 2015): its
textual part is a sort of representamen which stands for the argument itself (the
object), which is a cognitive entity with a particular goal. As hearers, we reconstruct
that connection between the textual argument and the argument itself through the
interpretant (Peirce, 1958; Chandler, 2003).
This semiotic conception of arguments presupposes two main types of relationships (van den Hoven, 2015):
– Mimesis: The textual argument is a perfect imitation of the mental process of
argumentation.
– Diegesis: The propositions constituting the argument convey a specific interpretation and evaluation of the world.
Both relationships are the constituent parts of the so-called ‘discourse world’. It
comprises the background, presuppositions, commitments, beliefs or desires of each
speaker (shared or not) and, therefore, it plays a major role in the reconstruction of
the argument for its understanding and evaluation (Mercier & Sperber, 2018). Under
this approach, the intention of the speaker to make an argument is not enough to
have an argument; it must also be recognised as such by the hearer. Therefore,
arguing is, essentially, a social activity (Mercier & Sperber, 2018).
From my point of view, this is the richest framework for modelling argumentative
discourse, although it is also the most complex one. To the best of my knowledge, there is not
yet a fully developed framework for it. IAT/ML (Gonzalez-Perez, 2020), which
combines IAT with conceptual modelling (Gonzalez-Perez, 2018), is a theoretical
approach under development grounded on this conception of argumentation.
IAT/ML defines four basic steps to carry out a cognitive analysis of an argumentation: (i) setting the initial discourse world of the participants in the conversation
by means of a conceptual modelling language; (ii) identifying the chunks of
text that act as a sign of a mental process of argumentation (following
linguistic indicators, grammar structures, images, etc.); (iii) reconstructing the
argument mentally elaborated by each speaker using the contextual information and
foreknowledge available to the analyst (which might vary significantly between
analysts); and, (iv) evaluating whether the result of the interaction requires any
change in the discourse world.
Figure 1.3 shows a reconstruction of the discourse world (ontology) underlying
the debate between R1 and R2 about the Cave of Altamira using ConML, a
conceptual modelling language.2 Each node represents a discourse entity, such as
the “Cave of Altamira”, which appears in a central position since it is the main
entity discussed in the debate. Both speakers know that the cave is the support of the
prehistoric paintings, but they disagree with respect to the “RightOfUse”, which
appears twice taking two different values: once as “Experts may access” and once as
“Everyone may access”. Each edge defines a directed connection between entities,
such as between the “RightOfUse” and two different groups of people, “Experts”
on one side and “Everyone” on the other.

2 http://www.conml.org/default.aspx

Fig. 1.3 Conceptual model of the debate between Researcher 1 and Researcher 2 about the Cave
of Altamira

Fig. 1.4 Ontological proxy for connecting the argument and ontological analysis of the debate
between Researcher 1 and Researcher 2

The ontological model in Fig. 1.3 can be related to the argument diagram in Fig. 1.2
to obtain a more complete analysis of this debate. However, the two diagrams cannot
be linked straightforwardly; they require a sort of intermediary, i.e., an ontological
proxy (Gonzalez-Perez, 2020). Figure 1.4 shows an example of an ontological proxy,
where R1 has committed to the existence (“Makes reference that refers to”) of the
entities “Altamira Cave” and “Everyone” when asserting the proposition “The
cave should be opened to everyone”.

Although IAT/ML is still under development, it has several strengths. It allows us
to gather the “discourse world” underlying the debate, which is essential to identify
the disposition of the hearer to be persuaded; i.e., whether the arguments put forward by
the other speaker are aligned with his or her previous beliefs, desires or assumptions
or, on the contrary, conflict with them. The former makes persuasion easier
than the latter. As a drawback, this type of analysis is more time consuming and
demanding than an analysis based on a pragmatic approach.
I have presented three theoretical frameworks, supported by alternative views of
the nature of arguments, that can be applied for argument analysis in Discourse
Analysis. Next, I will describe the key steps of a general methodology for carrying
out the argumentative analysis of a specific dataset, independently of the adopted
theoretical framework.

1.3 Designing an Annotation Campaign
Annotation means adding interpretative information (premise, conclusion, illocutionary force, etc.) at a meta-level to describe how language-in-use works (Fort,
2016, p. 10). This is the basic methodology in Discourse Analysis and the type of
information that can be added includes a wide variety of elements, from just adding
the speakers or the timestamps to a transcription of a recorded audio to
very detailed marks for intonation, voice pitch, emphasis, etc. Here, I will focus on
the basic principles and requirements for annotating arguments; i.e., on marking the
argument structure and its components on a selected dataset.
Annotation is usually organised in campaigns. Every annotation campaign is
guided by a goal; i.e., the research question to be answered or the hypothesis to be
validated. In the case of argument annotation, those goals range from unpacking
the argumentative structure of a specific discourse, such as in the case of the US
2016 presidential debates (Visser et al., 2018), to other more specific aims, such
as annotating the argument schemes (Visser et al., 2021) or identifying the type of
argumentative propositions that have been used (Jo et al., 2020).
An annotation campaign is, most of the time, a collaborative process that
requires a team, although it can be done individually as well. A successful annotation campaign requires preparatory work, which is crucial when the annotation
is collaborative. The main actors participating in any annotation campaign are (Fort,
2016): (i) the campaign manager, the person in charge of the full process and who
decides when the annotation is ready to start and when it is finished; (ii) the expert
annotator, a person or set of people who know the theory and the guidelines
very well and who are able to assess the quality of the annotation; and,
(iii) the annotators, the team of people specifically trained for the annotation campaign
who will do the main task of adding marks and labels to the raw corpus.
Next, I will describe the three basic stages of an annotation campaign (Fort,
2016): (i) pre-campaign, which consists of preparing all the requirements and training
annotators for the task; (ii) annotation, when the material is actually annotated by
the annotators; and, (iii) evaluation, when the quality of the annotation is assessed.

1.3.1 Pre-campaign
The very first step in any annotation campaign is to set its goal; as I said, the
research question that needs to be addressed, the hypothesis that we want to validate
or even the type of annotated data that we want to obtain. The definition of this goal
is also highly dependent on the theoretical framework supporting the annotation.
For instance, if we adopt a logical approach, the dynamics of the interaction among
the speakers is not relevant, while, if we adopt a pragmatic or a cognitive approach,
this information must be annotated.
Any pre-campaign should include, at least, the following stages (Fort, 2016): (i)
creating the raw corpus, (ii) creating the guidelines; and, (iii) training the annotators.

1.3.1.1 Creating the Raw Corpus
Annotation campaigns are expensive. They require either a very well-trained team
to complete the task well (which is expensive to train) or a large team to complete
the task quickly (which is expensive in terms of quality). For that reason, the material
to be annotated must be carefully selected by the campaign manager in order to
make the annotation campaign as efficient as possible.
The campaign manager is responsible for creating the raw corpus (Fort,
2016). The manager has to keep in mind the main goals of the annotation campaign
and the theoretical framework for the argument analysis in order to select a
representative and unbiased raw data set. Some discourse features to consider
are topic, genre, context, etc. Any error in the selection of the raw text might
entail annotating more material or reannotating something already annotated, which
makes the campaign more expensive.
The second main task for the manager in this stage is to guarantee that all that
material is in the right format to be handled by the annotation tool (today, any
annotation task is computationally supported) used by the annotators. Most of the
time the format of the material is written text. Thus, it must be presented as a
clean, structured, and human-readable document for the annotators. This usually
requires, at least, the following points: (i) removing all those text spans already
discarded for the analysis (e.g., footnotes, comments, typos, etc.); (ii) introducing all
those metadata required for the analysis (speakers, timestamps, etc.) and formatting
the text in a specific manner to make clear the distinction among these types of
information; (iii) dividing the interventions of the speakers into different paragraphs
and making a clear cut between them; and, (iv) identifying the language in which the
text is written.
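As a minimal sketch of points (ii) and (iii), the following Python snippet turns a plain transcript into one record per intervention, with speaker and timestamp kept apart from the text. The input layout (`SPEAKER [hh:mm:ss]: text`), the function name and the sample lines are assumptions made for illustration only; real corpora and annotation tools impose their own formats.

```python
import re

# Assumed line layout: "R1 [00:00:05]: utterance text"
TURN = re.compile(
    r"^(?P<speaker>\w+)\s*\[(?P<time>\d{2}:\d{2}:\d{2})\]:\s*(?P<text>.*)$"
)


def parse_transcript(raw):
    """Split a raw transcript into speaker turns with separate metadata."""
    turns = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = TURN.match(line)
        if match:
            turns.append({
                "speaker": match["speaker"],
                "timestamp": match["time"],
                "text": match["text"],
            })
        elif turns:
            # A line without metadata continues the previous intervention.
            turns[-1]["text"] += " " + line
    return turns


sample = """
R1 [00:00:05]: The cave should be opened
to everyone.
R2 [00:00:12]: It should not be opened to everyone.
"""

for turn in parse_transcript(sample):
    print(f'{turn["speaker"]} @ {turn["timestamp"]}: {turn["text"]}')
```

Each dictionary corresponds to one intervention, so the cut between speakers is explicit before any annotation starts.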

Annotating written text has several advantages (easy to store, stable, machine
readable, etc.) but it has some drawbacks as well (paralinguistic elements of
communication such as pitch, gestures, etc. are missed). Other available formats
are audio or video recordings, but they are usually transcribed into written text
because it is simpler to work with them in this format. What needs to be transcribed
will depend on the goal of the analysis.

1.3.1.2 Annotation Guidelines
Annotation guidelines constitute the set of rules and principles that must be applied
to identify what should be annotated and the categories that must be applied
and written by the annotators (Fort, 2016). A relatively stable and agreed set of
guidelines is crucial for good quality in annotation and it must always be attached to
the annotated corpus. As happens with the selection of the raw corpus, any change
in the annotation guidelines might have a severe impact on the annotated material;
for that reason, introducing a change must be carefully evaluated.
In the case of annotation campaigns on argumentative texts, guidelines must
indicate both what part of the text should be annotated (it might be the case that
not all the text is argumentative) and how. These guidelines are highly dependent
on the theoretical framework supporting the analysis, and they should be as short
as possible but detailed enough to facilitate the annotators’ task. Here, I suggest four
basic recommendations for the elaboration of a stable and good set of guidelines for
argument analysis:
1. Explaining how to navigate through the guidelines, recommending what tasks
should be done first and how to proceed. It might be the case that an annotator is
more efficient doing the task in a different way, but it is important to set certain
milestones to be able to evaluate the progress of annotation.
2. Defining criteria that are as clear as possible for distinguishing between argumentative
and non-argumentative parts of the text (i.e., guiding the annotator in selecting
those parts that should be annotated and those that should not). Argumentative text must be
carefully analysed following the principles and rules defined in the next point;
non-argumentative texts can be omitted, but it is important to keep these text
fragments in the raw corpus just in case further analysis is required.
3. Defining the set of labels, marks, and rules that the annotators must apply
to the text. It is essential to illustrate this with examples, ideally with real
examples but selected from another data set to avoid potential biases. In the case
of argumentative analysis, these include propositions, locutions, illocutionary
forces, argumentation relations, transitions, etc.
4. Defining a checklist with the most common errors that the annotators must review
before moving on to the next piece of text to minimize possible and recurrent
errors.

Annotation guidelines are usually the result of an iterative process by means of
which they are frequently updated. Several annotation methodologies have been
proposed which include the creation of guidelines as one of the first steps, such
as Agile annotation (Voormann & Gut, 2008); Hovy and Lavid’s methodology
(Hovy & Lavid, 2010), a 6-step procedure in which steps
2 to 5 consist of updating the annotation guidelines until they are reliable
enough to be used; and the MAMA methodology (Pustejovsky & Stubbs, 2012), which
is based on a cyclical annotation process that is repeated until the quality of the annotation is
good enough. All these methodologies coincide in two basic recommendations: (i)
starting with a quick draft of the guidelines; and, (ii) testing it with the material
that is going to be annotated and updating it accordingly until its quality is good
enough.
Another approach for annotating text is defining a set of relevant questions
(instead of rules and labels) that must be answered using the raw data. Gee (2011)
proposes an annotation guide consisting of a set of questions to be asked of the raw
material and labelling those statements in the text that answer the questions.

1.3.1.3 Annotators Training
Annotators, the actual team who is going to create the annotated corpora, are an
essential part of the process and, therefore, they must be carefully selected and
trained. Good training should be done under the same conditions as the actual
annotation and must address, at least, three dimensions (Fort, 2016): training on the
annotation itself, training on the annotation tool and evaluating the training.
The training on annotation itself focuses on teaching the team how to read the
guidelines and how to apply them. This should be done with a mini-corpus extracted
from the raw data to make them familiar with the material that is going to be
analysed. In addition, these training sessions should be used to discard those people
who cannot perform the task correctly. Once the annotator team has been selected,
a new annotation session in real conditions is highly recommended to assess how
they work together and to obtain a realistic time estimate for the task.
The training on the annotation tool should make all the annotators familiar with
the annotation software. This should also be done with a mini-corpus of the raw
data to identify all the potential doubts and problems that might appear during the
annotation itself.
The evaluation of the training will allow us to know when the team is ready
to begin the task. This can be done on a mini-corpus of the raw
material (F-measure or accuracy, see section Evaluating the Annotation) or between
annotators (inter-annotator agreement, see section Evaluating the Annotation). The
insights of the evaluation will also be very useful for organising the
annotation itself, for instance by assigning annotators according to their strongest skills.

1.3.2 Annotation
Once annotation guidelines are relatively stable and annotators have been trained,
the actual annotation can start. Although it is likely that both guidelines and annotators need to be corrected to reach their maximum capabilities, these corrections
should be as minimal as possible and never as relevant as during the pre-campaign
stage. At this stage, which usually is time limited (although the limit can be
months), trained annotators add the marks and labels to the raw data and produce
the annotated corpus, the output of the process.
There are two basic ways of organising a team of annotators (Fort, 2016): (i) they
can constitute a well-defined and limited team of people (collaborative annotation);
or, (ii) an undefined and large group of people (crowdsourcing annotation).
Collaborative annotation can run either in parallel, where each annotator
annotates a different part of the raw corpus at the same time, or sequentially, where
each annotator performs the same task over the whole corpus (i.e., one annotator
applies rule 1 of the guidelines, another applies only rule 2, etc.). Sequential annotation
can speed up the process, since annotators are specialised in one specific task
(e.g., splitting the text into argumentative and non-argumentative parts, identifying
argumentative relationships, etc.) and they can do it very efficiently. However, it
also entails a risk of bias, given that subsequent steps are based on a single source
and accuracy is more difficult to control. On the other hand, parallel annotation
makes quality control easier (each output from each annotator can be evaluated
individually) but it is usually slower. Neither of those approaches is essentially better
than the other; choosing one or the other depends on each annotation campaign.3
Crowdsourcing annotation has been gaining popularity and relevance during the
last 15 years, especially after Amazon Mechanical Turk4 appeared. Unlike
collaborative annotation, although both require a group of annotators, this is an
undefined and large group of people, usually recruited through an open call (Fort,
2016, p. 63). Crowdsourcing has achieved massive success because it is a low-cost
solution and it is broadly distributed; therefore, annotators can be easily and quickly
replaced. Sequential annotation is very rare in crowdsourcing because its core
is, precisely, a group of people working in parallel. Its main challenge is training
annotators, given that we are dealing with an open and undefined team. There are several
strategies to evaluate the skills of the crowd (e.g., background knowledge that they
need, training exercises, tests, etc.) but they are much more difficult to control than
in collaborative annotation.
Collaborative and crowdsourcing annotation are two alternative and complementary ways of doing annotation campaigns; neither of them is essentially better than the
other.

3 http://bbc.arg.tech/ shows an example of an annotation campaign done using collaborative
annotation within 24 h.
4 https://www.mturk.com

1.3.3 Evaluating the Annotation
Evaluating the quality of the annotation is an essential step to obtain a good
quality annotated corpus. However, it is also a challenging task since there is no
such thing as a ground truth. Annotating is highly interpretative and discrepancies among
annotators are inevitable. Thus, we can only aim to tackle this evaluation in terms of
consistency among the annotators. If two or more coincide in the analysis of a text
span, that analysis is expected to be accepted by any other person who performs
the same task. However, this is also an expensive task, both in time and human
resources, because it always requires more annotation. The advantage of a regular
evaluation is that it checks whether the annotation is being done correctly, whether there is any error
that needs to be fixed and whether the guidelines require an update.
There are two main approaches for assessing the quality of an annotation
task (Fort, 2016): (i) inter-annotator agreement, which consists in checking the
agreement between two or more annotators in the annotation; and, (ii) gold standard,
which consists in creating an ideal mini-reference annotated corpus from the raw
data and comparing the outputs from the annotators with it (precision, accuracy, F-measure, etc.). In both cases, a double annotation is required, and it usually consists
of a random sample of 10% of the raw data.
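Drawing the subset for double annotation can be made reproducible with a fixed random seed. The sketch below assumes that the raw corpus has already been divided into units with identifiers; the function name, the identifiers and the seed value are hypothetical.

```python
import random


def sample_for_double_annotation(unit_ids, fraction=0.10, seed=0):
    """Pick a reproducible random subset of units for double annotation."""
    if not unit_ids:
        return []
    size = max(1, round(len(unit_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the selection can be repeated
    return sorted(rng.sample(list(unit_ids), size))


units = [f"excerpt-{i:03d}" for i in range(1, 201)]  # hypothetical identifiers
subset = sample_for_double_annotation(units)
print(len(subset))  # 20 of the 200 units
```

Recording the seed together with the guidelines lets the same sample be regenerated if the evaluation has to be repeated.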
For these approaches there are two basic categories of evaluation
metrics:
• Based on agreement: They assess the reproducibility of the annotation considering
the matching of two annotators annotating the same excerpt. The main assumption can be formulated as follows: if two annotators match, the annotation is
right; if there is a mismatch, the annotation is wrong.
• Based on error detection: They compare annotated chunks of text with a gold
standard. The main assumption can be formulated as follows: if the annotator
matches the gold standard, the annotation is right; if not, there is a mistake.
Another possible method is to consider one of the annotators as the gold standard
and evaluate the other in terms of how close they are to it, and vice versa.
The key concept in both measures is observed agreement; i.e., the percentage
of times the annotators annotated the same chunk of text in the same way. It is
calculated by means of a confusion matrix or contingency table, which allows us
to observe in which categories or labels annotators agree and which categories have
actually been used. However, observed agreement is not enough for a reliable
quality measure, since it does not consider agreement by chance: the simpler
the annotation, the higher the probability of an agreement by chance. Table 1.3
recreates a fictional contingency table for the annotation of two categories (Support
and Conflict) in a corpus with 39 argumentative instances. The diagonal shows the
actual agreement between both annotators (both agree on the same 15
instances of Support and the same 18 instances of Conflict) and the other values
show disagreement; for instance, there were 2 instances classified as Support by
Annotator 2 but annotated as Conflict by Annotator 1 and there were 4 instances
of Support annotated by Annotator 1 that were labelled as Conflict by Annotator 2.
Thanks to this representation, it is easy to infer which categories are prevalent
and whether disagreements are concentrated in a single category or spread
amongst most of them.

Table 1.3 Contingency table for the annotation of two argumentative categories, Support and Conflict, in a corpus with 39 instances

                      Annotator 1
Annotator 2     Support   Conflict   Total
Support            15         2        17
Conflict            4        18        22
Total              19        20        39
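The figures in Table 1.3 can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the nested list simply transcribes the table with Annotator 2 in the rows and Annotator 1 in the columns.

```python
# Table 1.3: rows are Annotator 2, columns are Annotator 1.
labels = ["Support", "Conflict"]
table = [
    [15, 2],   # Annotator 2: Support
    [4, 18],   # Annotator 2: Conflict
]

total = sum(sum(row) for row in table)
agreements = sum(table[i][i] for i in range(len(labels)))  # diagonal cells
observed_agreement = agreements / total

row_totals = [sum(row) for row in table]            # Annotator 2
column_totals = [sum(col) for col in zip(*table)]   # Annotator 1

print(total, agreements, round(observed_agreement, 3))  # 39 33 0.846
print(dict(zip(labels, row_totals)))      # {'Support': 17, 'Conflict': 22}
print(dict(zip(labels, column_totals)))   # {'Support': 19, 'Conflict': 20}
```

The observed agreement of about 85% looks high, but it still includes the matches that two annotators would produce by chance.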
The well-known kappa measures, based on the notion of observed agreement,
are useful for assessing both the reliability of the annotation guidelines and the
quality of the annotated corpora. The two most widely used metrics are:5
• Cohen’s kappa (Cohen, 1960): It compares the annotation between two annotators.
• Fleiss’ kappa (Fleiss, 1971): Very similar to Cohen’s kappa, it compares the
annotation between more than two annotators.
Both measures define a coefficient relating the observed agreement to the
expected agreement; i.e., the chance that two annotators match based on how they
actually annotated categories. This means that the expected agreement between two
annotators in two categories is not simply 50%, but depends on the probability with which each annotator
assigns a specific category to a unit. Let’s imagine an unreliable annotator who applies
the same category to all the units; the expected value for that annotator is not 50%
for each category but 100% for one of them.
Cohen’s kappa is defined as follows:
$$\kappa = \frac{A_o - A_e}{1 - A_e}$$

where $A_o$ stands for the observed agreement and $A_e$ stands for the expected
agreement by chance (it must be calculated). There are different computational
resources for obtaining Cohen’s kappa, such as R, SPSS, Excel or even
webpages (e.g., https://www.graphpad.com/quickcalcs/kappa1/). For instance,
Cohen’s kappa for the annotators of Table 1.3 is 0.691.
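The value reported for Table 1.3 can be reproduced by hand. In the sketch below, the expected agreement is computed in the usual way, as the sum over categories of the product of both annotators' proportions for that category; the function name and table layout are my own choices, not part of any particular tool.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square contingency table.

    Rows hold the counts of one annotator, columns those of the other.
    """
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]

    # Observed agreement: share of units on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / total
    # Expected agreement: sum over categories of the product of the
    # proportions with which each annotator used that category.
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


table_1_3 = [[15, 2], [4, 18]]
print(round(cohen_kappa(table_1_3), 3))  # 0.691
```

Here the observed agreement is 33/39 and the expected agreement is (17·19 + 22·20)/39² = 763/1521. Note that the function divides by zero when the expected agreement is 1, which is the case when both annotators assign a single category to every unit.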
5 For a very detailed analysis of kappa measures, cf. Fort (2016).

Cohen’s kappa (and Fleiss’ kappa, since it is a generalization of Cohen’s kappa)
presupposes an equal negative impact for all errors. However, this is not always the
case. Let’s imagine an annotation task where annotators have to identify inference
types and conflict types. A disagreement between annotating inference and conflict
is much more relevant than a disagreement between inference types or conflict types.
Weighted coefficients for errors allow us to rank disagreements according to their
impact on the annotation. The most used measure of this type is a weighted
version of Cohen’s kappa (Cohen, 1968):
$$\kappa_w = 1 - \frac{D_o}{D_e}$$

where Do is the observed disagreement and De the expected disagreement; i.e., the
chance of agreement given the distance between categories (if they can be ordered)
or the relevance of the error. It is worth noting that this weight is not empirically
obtained but it depends on the knowledge or intuitions of the campaign manager.
This is especially relevant in annotation campaigns where categories are highly
interpretative or subjective, because this might introduce a bias from the manager.
Other campaigns, where the annotation is not as subjective, such as argument
schemes annotation (Visser et al., 2021), weighted Cohen’s kappa can provide an
informative result of the reliability of the annotation.
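
The role of the weights can be illustrated with a minimal Python sketch. The function names, the label format (`inf_1`, `con_2`, etc.) and the weights 0.5 and 1.0 are assumptions made for the example; in a real campaign the weights would be set by the campaign manager.

```python
from collections import Counter


def weighted_kappa(labels_a, labels_b, weight):
    """Weighted kappa, 1 - D_o / D_e, for two annotators.

    weight(x, y) returns the cost of confusing category x with category y
    (0 for identical categories).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty set of units")
    n = len(labels_a)
    # Observed disagreement D_o: mean weight over the annotated units.
    d_o = sum(weight(x, y) for x, y in zip(labels_a, labels_b)) / n
    # Expected disagreement D_e: mean weight if the two annotators' label
    # distributions were combined independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    d_e = sum(freq_a[x] * freq_b[y] * weight(x, y)
              for x in freq_a for y in freq_b) / (n * n)
    if d_e == 0:
        raise ValueError("kappa is undefined when the expected disagreement is 0")
    return 1 - d_o / d_e


def example_weight(x, y):
    """Hypothetical weights: confusing two inference types (or two conflict
    types) costs 0.5, confusing an inference with a conflict costs 1.0."""
    if x == y:
        return 0.0
    same_kind = x.split("_")[0] == y.split("_")[0]
    return 0.5 if same_kind else 1.0


# Hypothetical annotations of six units.
ann_1 = ["inf_1", "inf_1", "inf_2", "con_1", "con_2", "inf_1"]
ann_2 = ["inf_1", "inf_2", "inf_2", "con_1", "inf_1", "inf_1"]
kappa_w = weighted_kappa(ann_1, ann_2, example_weight)
# With weights of 0 or 1 only, the result reduces to unweighted Cohen's kappa.
kappa_plain = weighted_kappa(ann_1, ann_2, lambda x, y: float(x != y))
```

For these labels the weighted coefficient is slightly higher than the unweighted one, because one of the two disagreements only confuses two inference types.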
The result of kappa measures ranges from −1 to 1, where −1 means total
disagreement, 1 total agreement and 0 agreement no better than chance. There is no
consensus on a single threshold for good agreement, since this is highly
dependent on the complexity of the annotation task. However, three main ranges are
generally established:
• (0,0.4]: reliability is poor or very poor.
• (0.4,0.8): reliability is good.
• [0.8,1]: reliability is very good or almost perfect.
Metrics based on a gold standard aim to detect errors in annotations rather
than disagreements. The most used metric is the F-measure, originally designed for
information retrieval, which is a useful tool for quality evaluation in annotation
campaigns (Fort, 2016). It is simpler to calculate than any of the previous measures
and each annotation category can be evaluated individually.
The F-measure is defined as the harmonic mean of recall and precision, and it ranges
from 0 to 1 ([0, 1]), where 0 is the worst result and 1 the perfect one:

$$F\text{-measure} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

where recall and precision are, respectively:

$$\text{Recall} = \frac{\text{Number of correct annotations by annotator}}{\text{Number of correct annotations in gold standard}}$$

$$\text{Precision} = \frac{\text{Number of correct annotations by annotator}}{\text{Total number of annotations by annotator}}$$
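
As an illustration of these definitions, the following Python sketch computes precision, recall and F-measure against a gold standard, both overall and per category. The function names, the representation of annotations as `(unit_id, category)` pairs and the data are assumptions made for the example.

```python
def f_measure(annotator, gold):
    """Return (precision, recall, F) for two sets of (unit_id, category) pairs."""
    correct = len(annotator & gold)  # annotations that match the gold standard
    precision = correct / len(annotator) if annotator else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def f_measure_by_category(annotator, gold):
    """Evaluate each annotation category individually."""
    categories = {category for _, category in annotator | gold}
    return {
        c: f_measure({a for a in annotator if a[1] == c},
                     {g for g in gold if g[1] == c})
        for c in categories
    }


# Hypothetical gold standard and annotation (unit 4 was left unannotated).
gold = {(1, "inference"), (2, "inference"), (3, "conflict"),
        (4, "rephrase"), (5, "inference")}
annotation = {(1, "inference"), (2, "conflict"), (3, "conflict"), (5, "inference")}

overall = f_measure(annotation, gold)
by_category = f_measure_by_category(annotation, gold)
```

The per-category result shows where training should focus: in this made-up case the annotator never produces a wrong "inference" label but misses one, while half of the "conflict" labels are wrong.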

M. Pereira-Fariña

The key point for the F-measure is the gold standard. This can be built as the result
of a very careful expert annotation or from the overlap between two annotators once
their disagreements have been resolved.
The F-measure is very useful for training annotators because it allows us to analyse
each annotated category individually and to target the training at specific errors.
Its main drawback is that chance is not taken into account, and this
can have a severe impact on the reliability of the evaluation when the number
of categories is very low (e.g., in an annotation campaign with only two categories,
the baseline for the F-measure is already 0.5 if both annotators annotate the text
randomly); however, when the number of categories is very high, kappa measures and the
F-measure tend to agree (Hripcsak & Rothschild, 2005).
The kappa family and the F-measure are the two most widely used metrics for assessing
the quality of an annotation. There are other metrics (Fort, 2016, pp. 50–62), although
these are out of the scope of this introductory chapter. A complementary measure
is intra-annotator agreement, which consists in applying the same measures but
comparing annotators with themselves in order to assess the reproducibility of the
annotation.

1.3.4 Finalization
Finalization is the last stage of any annotation campaign and, when it is completed,
the corpus is ready to be made public. The campaign manager is mainly responsible for
it, being the foremost expert both in the topic and in the theoretical
framework that supports the annotation guidelines. It consists of three main parts
(Fort, 2016): (i) adjudication; (ii) technical reviewing; and (iii) publication.
Adjudication is the correction, by the campaign manager or another expert, of the
annotated corpora. Its goal is to remove all discrepancies between annotators. To do
so, the campaign manager (or another expert) has to review the annotation, check
the competing interpretations and decide which one fits the annotation
guidelines better.
Technical reviewing consists in checking that there are no errors or corrupted
files. For instance, IAT guidelines (and IAT/ML as well) indicate that every
inference node must be anchored to a transition; therefore, any un-anchored node is
a technical error. In addition, it should be checked whether the annotation includes
any forbidden character or element that might corrupt the final file. Ideally, this
review is done automatically or, at least, semi-automatically, and is integrated in
the annotation tool itself.
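
A check of this kind could be automated along the following lines. This is only a sketch: the node representation (dictionaries with `id`, `type`, `text` and `anchored_to` keys) and the set of forbidden characters are hypothetical and do not reproduce the file format of any particular annotation tool.

```python
# Hypothetical forbidden characters: NUL and the Unicode replacement character.
FORBIDDEN_CHARS = {"\x00", "\ufffd"}


def technical_errors(nodes):
    """Return (node_id, problem) pairs found in a list of annotation nodes."""
    errors = []
    for node in nodes:
        # Every inference node must be anchored to a transition.
        if node["type"] == "inference" and not node.get("anchored_to"):
            errors.append((node["id"], "un-anchored inference node"))
        # No node text may contain a forbidden character.
        if FORBIDDEN_CHARS & set(node.get("text", "")):
            errors.append((node["id"], "forbidden character"))
    return errors


# Hypothetical annotation with two technical errors.
nodes = [
    {"id": "n1", "type": "proposition", "text": "The cave is fragile."},
    {"id": "n2", "type": "inference", "anchored_to": "t1"},
    {"id": "n3", "type": "inference"},
    {"id": "n4", "type": "proposition", "text": "Access is restricted\ufffd"},
]
errors = technical_errors(nodes)
```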
Publication is the very last step of the process, meaning that the job has been
finished. The annotated corpus should be published with the latest version of the
annotation guidelines. For instance, if an error that requires a modification of the
guidelines appears during adjudication, then the updated version of the annotation
guide must accompany the annotated corpora.

1.4 Exploiting the Results
Well-annotated corpora are valuable resources. They can be exploited in several
ways, both for practical applications, such as the development of
machine learning algorithms for Natural Language Processing, and for addressing
theoretical issues, such as the study of specific research questions. In addition, they
are highly reusable, since the same corpora can be used by different researchers and
for tackling different research questions, even ones formulated years after the
annotation was done, as in Visser et al. (2021).
In this chapter, we will focus on two ways of exploiting an annotated corpus of
arguments: (i) descriptive metrics; and (ii) interpretative analysis.

1.4.1 Descriptive Metrics
Descriptive metrics provide a first and quick insight into the annotated corpora. They
consist of a range of statistics that allow us to build an
overview of the corpora (Lawrence et al., 2016). The most basic statistics, proposed in
Lawrence et al. (2016), are:
• Raw numbers: A quantitative description of the entities that constitute the
annotated corpora, such as the number of words, sentences, etc. Their purpose
is to give a general idea of the size of the corpus.
• Number of propositions: This shows the number of annotated propositions,
which are one of the central points in any argumentative theoretical framework.
• Number of argumentative nodes: It indicates the number of propositions linked
with at least one other proposition. This gives us a general idea of the nature of
the text, whether it is highly argumentative (a high proportion of arguments with
respect to the number of propositions) or not (a low proportion of arguments with
respect to the number of propositions).
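
These basic statistics are straightforward to compute once the annotation is available in a structured form. The following Python sketch assumes a hypothetical representation in which propositions are identifiers and links are `(source, target)` pairs; the text, the identifiers and the naive whitespace word count are illustrative only.

```python
def descriptive_metrics(text, propositions, links):
    """Basic corpus statistics from a text, its propositions and their links."""
    linked = set()
    for source, target in links:
        linked.update((source, target))
    # Argumentative nodes: propositions linked with at least one other one.
    argumentative = linked & set(propositions)
    return {
        "words": len(text.split()),  # naive whitespace tokenisation
        "propositions": len(propositions),
        "argumentative_nodes": len(argumentative),
        "argumentative_ratio": (len(argumentative) / len(propositions)
                                if propositions else 0.0),
    }


# Hypothetical annotated fragment: p2 supports p1, p3 is not linked.
text = "The cave must be protected because it contains rock art. Tourism is growing."
propositions = ["p1", "p2", "p3"]
links = [("p2", "p1")]
metrics = descriptive_metrics(text, propositions, links)
```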
Depending on the adopted theoretical framework, more elaborate statistics can be
defined. Next, we propose some exploratory metrics based on IAT/ML (many
of them are also applicable to IAT):
• Categories of illocutionary connections: This metric captures the number
of instances of each category of illocutionary force by speaker. The total
number is the same as the number of propositions (every proposition has an
illocutionary force), but the distribution among the different categories suggests
the type of text or even its genre; for instance, a text with a high proportion of
challenging by one speaker and not by the other might be an
interview; on the other hand, a more or less balanced number of challenges
might indicate a debate.
• Number of conflict nodes and rephrase nodes: They are complementary to the
metric of argumentative nodes. The combination of the three provides
us with some insights about the nature of the annotated corpora, for instance whether
it deals with a highly controversial topic or not.
• Number of objects: This metric derives from conceptual modelling (Gonzalez-Perez,
2018). It denotes the number of relevant entities in the world mentioned
in the corpora and the values of their attributes. This provides insights about
the presumptions and background of the speakers involved in the debate, what
their respective starting points are and what their final status is with respect to
their beliefs or commitments about the world.
• Hot objects: It is the result of counting the references that an object receives
from the propositions. This indicates how central the entity is in the
debate and allows us to identify the main positions of speakers with respect to it.
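
The hot objects metric, for example, amounts to a frequency count. The sketch below assumes a hypothetical mapping from proposition identifiers to the objects they refer to; the object names and the data are illustrative.

```python
from collections import Counter


def hot_objects(references):
    """Count, for each object, how many propositions refer to it."""
    counts = Counter()
    for objects in references.values():
        # Count each object at most once per proposition.
        counts.update(set(objects))
    return counts


# Hypothetical references from propositions to objects.
references = {
    "p1": ["Cave", "RightOfUse"],
    "p2": ["Cave"],
    "p3": ["Cave", "Conservation"],
    "p4": ["RightOfUse"],
}
ranking = hot_objects(references).most_common()
```

Here `ranking` lists the objects from the most to the least referenced, so its first entry is the most central entity of this made-up debate.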
Given the statistical nature of these metrics, typical visualization methods, such
as tables, boxplots, diagrams, etc. are perfectly suitable for the dissemination and
communication of these results.

1.4.2 Interpretative Analysis
We can go further than basic statistics and elaborate a more in-depth analysis of
the annotated corpora. A well-defined set of this type of statistics for IAT is
described in Lawrence et al. (2016), who distinguish between two main
families: (i) dialogically oriented statistics, which focus on providing insights about
the dynamics and complex interactions of the debate; and (ii) real-time statistics,
which focus on displaying how the debate is evolving.
Dialogically oriented statistics unpack the inner dynamics of the debate, showing the
key points of the interaction among speakers once the debate is over. They provide
an a posteriori analysis which allows us to characterise, among other things, the
presence of each speaker in the discussion according to the uttered locutions, the
interaction among participants (who replies to whom and when) or the most
controversial propositions of the debate, those that have generated the most reactions
among speakers. A real-world application of this technology can be
seen in Piloting argument technology with the BBC (http://bbc.arg.tech/bts/), a
project developed by the Center for Argument Technology (ARG-tech) and the BBC in 2017.
Real-time statistics, on the other hand, focus on showing how the debate is evolving
live (Plüss et al., 2018). The current state of this technology only allows us
to visualize the transcription of the debate, the participation of the speakers and
the topics that have been discussed. However, since this is shown in real time, it
can modify the behaviour of the participants themselves: the visuals show whether
someone is speaking too much or who is totally quiet.

A step further in the interpretative analysis of a corpus annotated under
IAT/ML is to analyse its underlying conceptual model. Currently, I only
propose an a posteriori analysis, but it can provide interesting insights about the
worldview of each speaker and the potential overlaps and discrepancies between
them. We distinguish three main families of metrics: (i) ontologically oriented
statistics; (ii) hierarchically oriented statistics; and (iii) temporal statistics.
Ontologically oriented statistics focus on the discrepancies among the entities
presupposed or accepted by each speaker. These comprise two main types of
ontological disagreement (Gonzalez-Perez, 2018, pp. 158–159):
• Predication conflict: It occurs when two different speakers agree both on the
existence of an entity and its properties, but they differ on the values assigned to
them. For instance, in Figs. 1.1 and 1.3, R1 and R2 agree on the existence of the
“RightOfUse” but they differ on its predication: R1 asserts that “Everyone might
access” while R2 asserts only that “Experts might access”.
• Existence conflict: It happens when two different speakers disagree on the
existence of the entity itself; one of the speakers says that the entity exists while
the other denies it. For instance, R2 asserts the existence of different situations
that might affect the conservation of the Cave, which are not recognised by R1.
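
Both kinds of conflict can be detected mechanically once each speaker's conceptual model is available in a structured form. The following Python sketch makes several simplifying assumptions: each model is a hypothetical dictionary from entity names to attribute values, an existence conflict is approximated as an entity present in only one model, and the attribute name `access` and the entity `ConservationThreat` are invented for the example.

```python
def ontological_conflicts(model_a, model_b):
    """Return (existence_conflicts, predication_conflicts) between two models.

    Each model maps an entity name to a dict of attribute values.
    """
    # Existence conflict (simplified): the entity appears in only one model.
    existence = sorted(set(model_a) ^ set(model_b))
    # Predication conflict: shared entity and attribute, different values.
    predication = []
    for entity in sorted(set(model_a) & set(model_b)):
        shared_attributes = set(model_a[entity]) & set(model_b[entity])
        for attribute in sorted(shared_attributes):
            value_a, value_b = model_a[entity][attribute], model_b[entity][attribute]
            if value_a != value_b:
                predication.append((entity, attribute, value_a, value_b))
    return existence, predication


# Hypothetical models loosely based on the R1 and R2 example.
model_r1 = {"Cave": {}, "RightOfUse": {"access": "Everyone might access"}}
model_r2 = {"Cave": {}, "RightOfUse": {"access": "Experts might access"},
            "ConservationThreat": {}}
existence, predication = ontological_conflicts(model_r1, model_r2)
```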
Hierarchically oriented statistics describe the relationships between the different
entities handled by speakers (both categories and instances of those categories).
The typical disagreement here is the classification conflict (Gonzalez-Perez, 2018),
which arises when two speakers classify an entity in different ways. In the
debate between R1 and R2 there is no such conflict, but let’s imagine a discussion
where R1 is an archaeologist and R2 a geologist: both agree on the existence of the
cave, but they might differ in its classification; the former asserts that its
importance lies in being a cultural object while the latter only emphasises its
relevance as a geological element.
Finally, temporal statistics focus on assessing how many changes have occurred
between the initial conceptual model (defined at the beginning of the debate)
and the final one, when the debate has ended. These changes are derived from
the disagreements described before and they capture whether they were solved (a
change in the conceptual model) or not (the disagreement is kept). This would allow
us to assess how the discourse world has been modified after the debate.

1.5 Conclusion
Discourse analysis, and specifically argumentative analysis as an essential part of it,
are useful theoretical frameworks for understanding how speakers motivate and justify
their ideas or actions in the world and how these are communicated to others in order
to persuade them. Although this is a qualitative, highly interpretative and
context-dependent task, it must be rigorously performed in order to obtain acceptable
results from a scientific point of view.

There are different approaches in the literature for tackling the analysis of the
argumentative structure of a given discourse. We have described three of them,
which rely on different views about what an argument is. Thus, when an
argument is considered as a logical entity, it is treated as an isolated unit and the
goal of the analysis is to identify its inner structure; when an argument is conceived
as a speech act, the argument is studied as a communication element where the
intention of the speaker plays a major role; and, when an argument is perceived as a
cognitive element, it has a communicative part and a representational one and both
the intention of the speaker and the background of the hearer are part of the analysis
to reconstruct the discourse world underlying the debate. Each of these frameworks
allows us to achieve different goals, ranging from looking for purely linguistic results
to uncovering the beliefs, intentions and commitments of speakers.
Each theoretical approach determines how discourse in general and arguments
in particular must be annotated. Annotation is the procedure through which we add
interpretative information to a text, and it is an essential step for performing a
reliable analysis. Its key point is the definition of the annotation guidelines, which
translate the theoretical principles into specific labels to be added. I have described
the basic principles that should guide an annotation campaign for obtaining a
valuable result, which include data collection, the creation of annotation guidelines,
the training of the annotators and the finalization of the campaign.
The final step is the exploitation and communication of the results. The exploitation
is usually based on a qualitative analysis of the annotated corpus, identifying
patterns, regularities, etc., supported by a quantitative analysis as well (such as the
number of words, propositions, inferences, conflicts, rephrases, etc.). Visualization
is the key point for the communication of these results, since annotated corpora are
rarely friendly for a non-expert user and are difficult to interpret.
However, any discourse analysis study has its limitations. Firstly, theoretical
frameworks are very interpretative and different researchers can elaborate different
annotation guidelines from the same theory. This might potentially generate
confusion when the obtained results are compared, but it is also a source of
richness, showing how complex natural language is when it is studied in its
communicative dimension (rather than a representative one). Secondly, elaborating
annotation guidelines is a time-consuming task which usually requires several
cycles, and a perfect and absolutely complete set of guidelines is impossible to
reach. However, a rigorous evaluation of both the process and the annotated corpora
can guarantee a reliable result. The last main drawback is the risk of bias, because
it is difficult for a researcher to avoid his or her presumptions or prejudices about
the topic of analysis; however, this problem can be minimised by following a rigorous
evaluation as described above.
Discourse analysis in general, and argumentative analysis in particular, is a
methodology which will allow us to acquire a better understanding of how knowledge is
produced and communicated to others in cultural heritage studies, among other
areas.

Acknowledgements This research has received financial support from the grant “Heritage 3.0: Argumentation and Conceptual Modelling for Enhanced Cultural Heritage
Participation and Management Policies” (ACME), grant number PID2020-114758RBI00 funded by MCIN/AEI/10.13039/501100011033 and project “Deflationist Views in
Ontology and Metaontology”, grant number PID2020-115482GB-I00, both funded by
MCIN/AEI/10.13039/501100011033.

References
Austin, J. L. (1989). How to do things with words: The William James lectures delivered at Harvard
University in 1955 (2nd ed.). University Press.
Black, M. (1967). Induction. In P. Edwards (Ed.), The encyclopedia of philosophy (pp. 169–181).
Macmillan/Free Press and Collier-Macmillan.
Chandler, D. (2003). Semiotics: The basics (1st publication repr ed.). Routledge.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20(1), 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
Eco, U. (1979). A theory of semiotics. Indiana University Press.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological
Bulletin, 76(5), 378–382.
Fort, K. (2016). Collaborative annotation for reliable natural language processing: Technical and
sociological aspects. ISTE/Wiley.
Gee, J. P. (2011). An introduction to discourse analysis: Theory and method (3rd ed.). Routledge.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology: Software
engineering principles for cultural heritage. Springer.
Gonzalez-Perez, C. (2020). Connecting discourse and domain models in discourse analysis through
ontological proxies. Electronics (Basel), 9(11), 1955.
Hovy, E., & Lavid, J. (2010). Towards a “science” of corpus annotation: A new methodological
challenge for corpus. International Journal of Translation, 22(1), 13–36.
Hripcsak, G., & Rothschild, A. S. (2005). Agreement, the f-measure, and reliability in information
retrieval. Journal of the American Medical Informatics Association: JAMIA, 12(3), 296–298.
Janier, M., & Reed, C. (2017). I didn’t say that! Uses of SAY in mediation discourse. Discourse
Studies, 19(6), 619–647.
Jo, Y., Mayfield, E., Reed, C., & Hovy, E. (2020). Machine-aided annotation for fine-grained
proposition types in argumentation. In 12th international conference on language resources
and evaluation Marseille (p. 1008).
Lawrence, J., Duthie, R., Budzynska, K., & Reed, C. (2016). Argument analytics (p. 371).
Lucas, G. (2019). Writing the past (1st ed.). Routledge.
Mercier, H. (2012). Looking for arguments. Argumentation, 26(3), 305–324.
Mercier, H., & Sperber, D. (2018). The enigma of reason. Penguin Books.
Paltridge, B. (2012). Discourse analysis: An introduction (2nd ed.). Bloomsbury.
Peirce, C. S. (1958). Collected papers of Charles Sanders Peirce. Harvard University Press.
Perelman, C., & Olbrechts-Tyteca, L. (1973). The new rhetoric. A treatise on argumentation.
University of Notre Dame Press.
Plüss, B., Sperrle, F., Gold, V., El-Assady, M., Hautli-Janisz, A., Budzynska, K., & Reed, C. (2018).
Augmenting public deliberations through stream argument analytics and visualisations.18
October 2018 through 19 October 2018, p. 1.
Pustejovsky, J., & Stubbs, A. (2012). Natural language annotation for machine learning. O’Reilly
Media, Incorporated.

Reed, C., & Budzynska, K. (2010). How dialogues create arguments. In F. van Eemeren, B.
Garssen, D. Godden, & G. Mitchell (Eds.), 7th international conference of the International
Society for the Study of Argumentation. Rozenberg/Sic Sat.
Richard, M. (2013). What are propositions? Canadian Journal of Philosophy, 43(5–6), 702–719.
Salmon, W. C. (1984). Logic (3rd ed.). Prentice-Hall.
Searle, J. (1965). What is a speech act? In Philosophy in America (pp. 221–239). Allen and Unwin.
Searle, J., & Vanderveken, D. (1989). Foundations of illocutionary logic (1st ed., repr. ed.).
University Press.
Serrano, S. (1983). La Semiótica: una introducción a la teoría de los signos (2nd ed.). Montesinos.
van den Hoven, P. J. (2015). Cognitive semiotics in argumentation; a theoretical exploration.
Argumentation, 29(2), 157–176.
van Eemeren, F. H., & Grootendorst, R. (1984). Speech acts in argumentative discussions. De
Gruyter.
van Eemeren, F. H., & Grootendorst, R. (2004). A systematic theory of argumentation (1st
Publication ed.). Cambridge University Press.
Visser, J., Lawrence, J., Wagemans, J. H. M., Reed, C., Modgil, S., & Budzynska, K. (2018).
Revisiting computational models of argument schemes: Classification, annotation, comparison
(p. 313). IOS Press.
Visser, J., Lawrence, J., Reed, C., Wagemans, J., & Walton, D. (2021). Annotating argument
schemes. Argumentation, 35(1), 101–139.
Voormann, H., & Gut, U. (2008). Agile corpus creation. Corpus Linguistics and Linguistic Theory,
4(2), 235–251.
Wagemans, J. H. M. (2016). Constructing a periodic table of arguments. Available:
https://scholar.uwindsor.ca/ossaarchive/OSSA11/papersandcommentaries/106. 29 Jan 2021.
Wagemans, J. H. M. (2019). Four basic argument forms. Research in Language, 17(1), 57–69.
Walton, D., Reed, C., & Macagno, F. (2008). Argumentation schemes. Cambridge University Press.

Part I

Conceptual Approaches

Chapter 2

A New Approach to Interoperable
Argumentation Documentation
Stephen Stead

Abstract The chapter outlines the development of support for inference chains
in the CIDOC Conceptual Reference Model family of standards. It illustrates the
capabilities available in CRMbase and notes the limitation that the evolution of any
assertion or knowledge revision cannot be adequately documented. It then continues
by detailing the extended facilities delivered by CRMinf that address this shortcoming. Next it considers the added potential to document the scholarly reading of
texts that are considered specious and finally looks to future work on the extension.
Keywords CIDOC CRM · CRMinf · Argumentation · Inference chains

2.1 History
The CIDOC Conceptual Reference Model (CIDOC CRM) is a formal ontology
intended to facilitate the integration of cultural heritage data sets. It provides the
semantic definitions of elements of scholarly discourse, about both tangible and
intangible heritage, that are needed to support such cross-organisational integration.
It has been an ISO standard (ISO 21127) since 2004 and is actively maintained,
refined, and enhanced by a user community known as the CRM Special Interest
Group (CRM-SIG). It consists of a core set of concepts (CRMbase), that form the
ISO standard, plus a family of extensions that add the elements needed to cover
more specialised areas of data and their integration.
The idea of incorporating Jean-Claude Gardin’s concept of an “Inference Chain”
(Gardin, pers. comm. and 1990) was first suggested at the CIDOC Conceptual
Reference Model working meeting in Agios Pavlos, Crete in 2000 (Stead, pers.
comm.). This became truly possible in version 3.4.2 (Crofts et al., 2003) when,
based on the experience of the team at the University of Oslo, the properties P140
assigned attribute to (was attributed by) and P141 assigned (was assigned by) were
introduced. However, as Doerr et al. (2011) demonstrated, this is not sufficient to
represent both the structure and the evolution of argumentation, as it only allows
the representation of a “finished” chain at the point in time of the completion of
the current documentation state. Despite the successful proof-of-concept
implementation of the Integrated Argumentation Model (IAM) (Boutsika, 2010), it
was felt that a lighter version was required for integration into the CRM standards
family. Work on this started in 2014 and, after initial consultation drafts,
a working version with an RDFS representation was released in 2015. A further
enhancement was released in 2019 that formalised the concept of scholarly reading
that had been developing in the CRM-SIG since 2017 (see
Issue 334: Scholarly Reading at http://www.cidoc-crm.org/Issue/ID-334-scholarlyreading).

S. Stead
Paveprime Ltd, Purley, UK
e-mail: steads@paveprime.org
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_2

2.2 Representing Inference Chains in CRMbase
Simple Inference Chains that document the construction of
archaeological phases can be represented by simply using the property P46 is composed
of (forms part of). This allows the documentation of the constituent elements that
together form larger groupings that are of interest to the researcher.
For example, at the West House in Akrotiri, Santorini:
E19 Physical Object [individual slab] P46i forms part of E18 Physical Thing [upper room
slab surface] P46i forms part of E19 Physical Object [West House] (Mιχαηλίδoυ, 2001,
pp. 40, 68–70)

However, this representation does not capture that this is a later interpretation of the
original excavation data. It only represents the “current” understanding at the time
of the 2001 publication of the Inference Chain.
A richer picture can be captured using E13 Attribute Assignment and associated
properties to capture the details of the process of making such assertions. For
example, the first triple in the above (E19-P46i-E18) can be enriched with the
following:
Path 1} E19 Physical Object [individual slab] P141i was assigned by E13 Attribute
Assignment [connecting slab to floor] P140 assigned attribute to (was attributed by) E18
Physical Thing [upper room slab surface]
Path 2} E13 Attribute Assignment [connecting slab to floor] P177 assigned property of type
E55 Type [“P46 is composed of (forms part of)”] P2 has type (is type of) E55 Type [“Type
of Property”] P2 has type (is type of) E55 Type [“Type of Type”]
Path 3} E13 Attribute Assignment [connecting slab to floor] P14 carried out by (performed)
E39 Actor [ID of actor] (P14.1 in the role of E55 Type [“Assigner”]) P1 is identified by
(identifies) E41 Appellation [“Mιχαηλίδoυ, A.”]

This is in effect a reification triangle over the original triple, but it additionally
allows the connection of a timespan to the instance of E13 Attribute Assignment
and information on how it was done, by adding a reference to the E29 Design or
Procedure used.
Path 4} E13 Attribute Assignment [connecting slab to floor] P4 has time-span (is time-span
of) E52 Time-Span [2001]
Path 5} E13 Attribute Assignment [connecting slab to floor] P33 used specific technique
(was used by) E29 Design or Procedure [Guidelines for Fragment Reconstruction]
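
The structure of Paths 1 to 5 can be sketched in a few lines of Python. This is an illustration only: triples are modelled as plain tuples of strings rather than with an RDF library, the function `reify_statement` and its parameter names are hypothetical, and the Path 3 appellation and the Path 2 type hierarchy are left out for brevity.

```python
def reify_statement(domain, property_type, range_, assignment,
                    actor=None, time_span=None, procedure=None):
    """Build the triples of an E13 Attribute Assignment over one statement."""
    triples = [
        # The reification triangle over the original triple.
        (assignment, "P140 assigned attribute to", domain),
        (assignment, "P141 assigned", range_),
        (assignment, "P177 assigned property of type", property_type),
    ]
    # Optional provenance: who made the assertion, when, and how.
    if actor is not None:
        triples.append((assignment, "P14 carried out by", actor))
    if time_span is not None:
        triples.append((assignment, "P4 has time-span", time_span))
    if procedure is not None:
        triples.append((assignment, "P33 used specific technique", procedure))
    return triples


triples = reify_statement(
    domain="E18 Physical Thing [upper room slab surface]",
    property_type='E55 Type ["P46 is composed of (forms part of)"]',
    range_="E19 Physical Object [individual slab]",
    assignment="E13 Attribute Assignment [connecting slab to floor]",
    actor="E39 Actor [ID of actor]",
    time_span="E52 Time-Span [2001]",
    procedure="E29 Design or Procedure [Guidelines for Fragment Reconstruction]",
)
```

Every resulting triple has the same instance of E13 Attribute Assignment as its subject, which is what allows the actor, the time-span and the procedure to be attached to the assertion rather than to the objects themselves.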

Even this enriched representation provides no direct mechanism for understanding
the evolution of the inference chain. The temporal order of instances of E13
Attribute Assignment can of course be recovered but representing changes in
understanding, using only this temporal order, can make querying for the current
state of knowledge very difficult. For instance, if an earlier assertion was made
that the individual slab was part of a different slab surface (the lower room one
for instance), how would it be known that the corresponding triple was no longer
“current” when the new assertion about the upper slab floor was entered? It would be
apparent that the corresponding E13 Attribute Assignment was more recent, but this
would not enable the user to understand if the old assertion was no longer valid, or if
both assertions were valid, or if they were competing assertions. The querier would
need detailed understanding of the specifics of each knowledge revision process to
be able to craft queries that gave the desired results and in a multi-source integration
environment this would be extremely onerous.

2.3 The Building Blocks of CRMinf
CRMinf provides the framework for documenting the evolution of Inference Chains.
It recognises there are clear separations between the assertions that constitute an
argument; the process and temporal order of argumentation activities; and the belief
in the result. This clear separation is a novelty in argumentation models (Doerr et
al., 2011) and enables two important capabilities. The first is the documentation of
Inference Chains that were not constructed in the logical order of their constituent
assertions: so in the first example the recognition of the Upper Room Slab Surface
being part of the West House occurred before the assertion that the Individual
Slab was part of it (Mαρινάτoς 1974; Mιχαηλίδoυ, 2001). That the chronology
of argumentation does not match the logical sequence of the assertions that make
up the argument is almost axiomatic in archaeology as the observation process is
usually the inverse of the formation process. The second capability is the clear and
unambiguous recording of the process and results of knowledge revision activities
acting upon an information system.
To enable this separation CRMinf provides three functional groups of classes and
properties. In the first functional group the argumentation activities are located in
time (process); the second provides the mechanism for understanding who believes
what (belief) and the third what it is they believe in (the assertions).

Fig. 2.1 The first functional group of CRMinf entities showing the three types of knowledge
creation activity

The first functional group differentiates (see Fig. 2.1) between three types of
knowledge creation activity and groups them under the I1 Argumentation class. The
stipulation is that the person, or group, undertaking the argumentation is making
honest inferences or observations; that is, they attest that the resulting belief value
is correct at the time the activity was undertaken and that any methodology used was
correctly applied. Knowledge creation is specifically the instilling of justified belief
in the mind of some person(s) (Bruseker et al., 2018). The first type of knowledge
creation is observation. The class for this is defined in CRMsci and covers the use of
human senses, often augmented with tools and instruments, to note some attributes
of real-world objects and processes. These attributes are approximations to the
traits exhibited by the real-world entities and take the form of propositions about
them. Subsequently, these propositions can be used for inference making, including
evaluation, and are typically recorded in an information system of some sort.
The second type of knowledge creation is inference making. In this, a set of
propositions has some kind of processing or inference logic (see Fig. 2.3) applied
to it to produce a new set of propositions. In CRMinf the source and outcome
sets of propositions always have a conviction or belief associated with them. This
ensures that all the steps in an Inference Chain are fully provenanced. The inference
logic that is applied can be in any form that is acceptable in the community of
use. It can include formal logic, probabilistic reasoning and other
mathematical models, the application of social theory, or comparison with
cultural parallels.
The third type of knowledge creation is belief adoption. This covers the cases
where an existing belief in a set of propositions is adopted by a person or group
who were not the creator(s) of the belief that has been adopted. The adoption does
not have to include all the propositions in the original set, but it does have to accept
the same level of belief as originally believed. To disagree with the original belief,
a process of inference making is required in which the original belief is the premise
for the inference making.

Fig. 2.2 The second functional piece of CRMinf: the I2 Belief. It also shows the elements required for scholarly reading
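
The belief adoption rule described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the CRMinf specification: the `Belief` class, its field names and the `adopt_belief` function are hypothetical names chosen for the example, and the requirement that at least one proposition be adopted is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Belief:
    holder: str                    # the actor holding the belief (hypothetical field)
    propositions: FrozenSet[str]   # identifiers of the propositions believed
    value: str                     # the belief value, e.g. "True"


def adopt_belief(original: Belief, adopter: str, subset: FrozenSet[str]) -> Belief:
    """Adopt some or all propositions of an existing belief, keeping its value."""
    if adopter == original.holder:
        raise ValueError("adoption is by someone other than the original holder")
    # Assumption: at least one proposition must be adopted.
    if not subset or not subset <= original.propositions:
        raise ValueError("adopted propositions must be a non-empty subset")
    # The belief value is copied unchanged; disagreement needs inference making.
    return Belief(holder=adopter, propositions=subset, value=original.value)


original = Belief("Scholar A", frozenset({"p1", "p2", "p3"}), "True")
adopted = adopt_belief(original, "Scholar B", frozenset({"p1", "p3"}))
```

The function deliberately offers no way to change the belief value, mirroring the statement that disagreement requires a separate inference making activity.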
The second functional group is the conviction or belief itself (see Fig. 2.2).
Convictions and beliefs are temporal entities: that is, they are perdurants and exist
only for a period of time. They come into existence when an individual or group (an
instance of E39 Actor in CRM terms) performs some kind of knowledge creation
activity (observation, inference making or belief adoption, as described above) and pass out
of existence either when the individual dies (or the group is dissolved) or when the actors
change their belief in the associated propositions through an(other) inference
making activity. Note that the adoption of a contradictory belief about
some or all of the propositions in the original proposition set should, necessarily,
terminate the original belief, despite humans being contrary creatures that can
simultaneously believe quite contradictory things. The problem with allowing an
actor to hold such contradictory opinions, in a system of honest beliefs, is that it
both undermines faith in the “honesty” and makes the automation of change
propagation in downstream inference chains highly problematic. Forcing
the use of an inference making activity to change belief in the whole of the original
proposition set and simultaneously create one or more new beliefs (with the same
belief value in zero, one or more of the original propositions and new belief
values in all of the remainder) alleviates this issue. Beliefs are always held in
relation to a particular proposition set (which is the third functional group) and
associate a specific belief value with that proposition set for their lifespan. Belief
values can take many forms, including Bayesian probability values or ordinal
scales (for instance: “unlikely”, “possible”, “probable”, “certain”). However, the
minimum implementation requirement is a three-value system of “True”, “False”,
and “Unknown”. This allows automation of downstream change propagation, with
previous values of “True” or “False” being changed to “Unknown” for all inferences
downstream of a changed belief value. Boutsika (2010) reports on a successful
proof-of-concept implementation of this strategy for inferences about the Oetzi
Iceman.
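
The downstream change propagation described above can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the implementation reported by Boutsika (2010): the function name `propagate_unknown`, the dictionary-based representation of inference chains and the belief identifiers are all hypothetical.

```python
from collections import deque


def propagate_unknown(premise_for, values, changed):
    """Reset every belief downstream of `changed` to "Unknown".

    premise_for maps a belief id to the ids of beliefs inferred from it.
    values maps a belief id to "True", "False" or "Unknown".
    A new dict is returned; the input dict is left untouched.
    """
    updated = dict(values)
    queue = deque(premise_for.get(changed, []))
    seen = {changed}                  # never overwrite the revised belief itself
    while queue:
        belief = queue.popleft()
        if belief in seen:
            continue
        seen.add(belief)
        updated[belief] = "Unknown"   # previous "True" or "False" is reset
        queue.extend(premise_for.get(belief, []))
    return updated


# Hypothetical chain: b1 is a premise for b2 and b3, which are premises for b4.
premise_for = {"b1": ["b2", "b3"], "b2": ["b4"], "b3": ["b4"]}
values = {"b1": "True", "b2": "True", "b3": "False", "b4": "True", "b5": "True"}
values["b1"] = "False"                # the holder revises b1
result = propagate_unknown(premise_for, values, "b1")
```

In this example b2, b3 and b4 become "Unknown", while b5, which does not depend on b1, keeps its value.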
The third functional group is the proposition set that the belief is held about
(see Fig. 2.3). The standard is agnostic as to how this is implemented but
requires that the propositions are uniquely identifiable and refer to recognisable
instances of the classes or concepts of a formal ontology (including, but not
restricted to, the CRM). Guidelines for using Named Graphs are under development in the CRM-SIG (http://www.cidoc-crm.org/Issue/ID-526-named-graphusage-recommendations-guideline-document) and interest in RDF-star is also covered by this issue.
The properties that link the components of these three functional groups are
straightforward (see Fig. 2.4). All three knowledge creation activities are linked
to the resulting conviction (the superclass of belief introduced to support scholarly
reading (see below)) using the J2 concluded that (was concluded by) property. The
link from inference making activities to the input/source conviction (i.e. belief) is
provided by the J1 used as premise (was premise for) property and the link to the
inference logic applied is made with the J3 applies (was applied by) property. Belief
adoption is connected to the adopted belief by the J6 adopted (adopted by) property.

Fig. 2.3 The third functional piece of CRMinf: the I4 Proposition Set. The figure also shows the
position in the hierarchy of I3 Inference Logic and I6 Belief Value

Fig. 2.4 An overview of the key CRMinf properties linking the classes (in red). Super-properties
from CRMbase are shown in green

Finally, the beliefs are connected to their proposition sets by the J4 that (is subject
of) property and to their belief values by the J5 holds to be property.
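
To make the linking properties concrete, the following Python sketch records one inference step as plain (subject, property, object) triples and then recovers its provenance. The property labels are those named in the text; the instance identifiers (`inference_1`, `belief_a` and so on) and the helper functions are hypothetical, and a real implementation would more likely use an RDF store than a Python list.

```python
# One inference making activity, recorded as (subject, property, object) triples.
triples = [
    ("inference_1", "J1 used as premise", "belief_a"),
    ("inference_1", "J3 applies", "logic_1"),
    ("inference_1", "J2 concluded that", "belief_b"),
    ("belief_a", "J4 that", "propositions_a"),
    ("belief_a", "J5 holds to be", "True"),
    ("belief_b", "J4 that", "propositions_b"),
    ("belief_b", "J5 holds to be", "True"),
]


def objects(subject, prop):
    """Return all objects linked to `subject` by `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def provenance(belief):
    """List the activities that concluded `belief`, with premises and logic."""
    steps = []
    for s, p, o in triples:
        if p == "J2 concluded that" and o == belief:
            steps.append({
                "activity": s,
                "premises": objects(s, "J1 used as premise"),
                "logic": objects(s, "J3 applies"),
            })
    return steps


steps_for_b = provenance("belief_b")
```

Calling `provenance` repeatedly on the premises it returns walks an inference chain back towards the original observations.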

2.4 The Extension of CRMinf to Cover Scholarly Reading
As originally conceived, CRMinf provides a rich framework for documenting
the adoption of a scholar’s belief by another. However, what about the case where
the requirement is to document a publication where some or all of the content is
considered specious? Here the belief is that the content was correctly interpreted,
irrespective of the reader’s belief in the propositions set out in the publication. This
has been generalized to cover the uncontentious reading of any publication. By
uncontentious reading the standard intends to cover cases where multiple scholars
are likely to agree on the propositions that would be recovered from inspecting
a copy (i.e. the symbolic representation) of the publication, even while some of
them vehemently reject these propositions. For example, scholars may agree that Gaius
Suetonius Tranquillus wrote in De Vita Caesarum that Nero was singing in Rome
while it was burning from July 19 in 64 CE even if they do not believe that this is
true. However, it is categorically not intended to cover cases where scholarly debate
is about the “reading” of partially illegible texts.
The new superclass I8 Conviction was introduced to provide a generalisation
over I2 Belief and the new I9 Provenanced Comprehension, that covers the correct
reading of the overt message of an instance of E73 Information Object and its
conversion into a set of propositions (see Fig. 2.2). Such a reading and conversion
is always undertaken in the context of an explicit statement about the provenance of
the source being read. Three new properties provide the necessary links for such
uncontentious reading: J8 understands (is understood by) links it to the source
information object, J9 believes in provenance (provenance is believed by) attaches
the explicit statement of provenance and J10 reads as connects to the proposition
set generated by the reading.
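
The distinction between agreeing on a reading and agreeing on its truth can be sketched as follows. This is an illustrative assumption-laden model, not the CRMinf encoding: the two classes, their field names and the `reader` and `holder` fields are hypothetical, and only the comments map fields to the properties named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenancedComprehension:
    reader: str                  # hypothetical field: who performed the reading
    understands: str             # J8: the source information object
    believes_in_provenance: str  # J9: the explicit statement of provenance
    reads_as: frozenset          # J10: the proposition set recovered by reading


@dataclass(frozen=True)
class Belief:
    holder: str
    that: frozenset              # J4: the proposition set
    holds_to_be: str             # J5: the belief value


reading = frozenset({"Nero sang in Rome while the city burned"})
a = ProvenancedComprehension("Scholar A", "De Vita Caesarum",
                             "written by Suetonius", reading)
b = ProvenancedComprehension("Scholar B", "De Vita Caesarum",
                             "written by Suetonius", reading)

# Both scholars recover the same propositions but hold different beliefs in them.
belief_a = Belief("Scholar A", a.reads_as, "True")
belief_b = Belief("Scholar B", b.reads_as, "False")

same_reading = a.reads_as == b.reads_as
same_belief = belief_a.holds_to_be == belief_b.holds_to_be
```

Here the reading is uncontentious (`same_reading` is true) even though the two scholars disagree about the propositions themselves.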
At the same time as the provenanced comprehension construct was introduced,
a set of simple links to the source of the proposition set being adopted by belief
adoption activities was added. These provide a parallel, simple mechanism
for documenting uncontentious reading of sources that are believed, with
shortcuts to a range of potential source types: J7 is based on evidence
from (is evidence for) links to information objects, J11 used manifestation (was
manifestation used by) links to LRMoo manifestations and J12 used item (was item
used by) links to LRMoo items.

2.5 The Future of CRMinf
Work continues on CRMinf. Support for I11 Situations, a
subclass of proposition sets that deals with the persistence of value ranges of things over
a timespan, is being developed as part of the work on the CRMsoc extension, which is
intended to deal with social relationships and obligations. In addition, a new, as yet
unnamed, extension that provides properties that link to types rather than instances
of classes, as well as “Closed-World” assertions about types of things that are not
present, will provide exciting new opportunities to exploit the power of CRMinf.

References
Boutsika, K. (2010). Computer supported collaborative factual argumentation and conflict resolution (Masters of Science thesis). Department of Computer Science, University of Crete.
Bruseker, R., Daskalaki, M., Doerr, M., & Stead, S. (2018). 2018 is that a good concept? In
Mieko Matsumoto and Espen Uleberg (Eds.), CAA2016: Oceans of data proceedings of the
44th conference on computer applications and quantitative methods in archaeology.
Crofts, N., Doerr, M., Gill, T., Stead, S., & Stiff, M. (2003). Definition of the CIDOC conceptual
reference model. Version 3.4.2.
Doerr, M., Kritsotaki, A., & Boutsika, K. (2011). Factual argumentation—A core model for
assertions making. Journal on Computing and Cultural Heritage, 3(3), 34. https://doi.org/10.1145/1921614.192161
Gardin, J-CL. (1990). The structure of archaeological theories. In Studies in modern archaeology.
Vol 3. Mathematics and information science in archaeology: A flexible framework (pp. 7–25).
Bonn.
Μαρινάτος, Σ. (1974). Ανασκαφαί Θήρας VI. ΒΑΕ 64, πίν. 38β. Η Εν Αθήναις
Αρχαιολογική Εταιρεία.
Μιχαηλίδου, Α. (2001). Ακρωτήρι Θήρας. Η μελέτη των ορόφων στα κτήρια του
οικισμού. ΒΑΕ 212, Αθήνα: Η Εν Αθήναις Αρχαιολογική Εταιρεία.

Chapter 3

Making Good Arguments in Archaeology
Michael E. Smith

Abstract This chapter reviews epistemological and methodological issues of argumentation in archaeology. It begins with historical reasons for the lack of attention
to argumentation in recent decades. Next, it reviews the status of archaeological
argumentation as set out in a 2015 paper (Smith ME, SAA Archaeol Record 15:18–
23, 2015b). This is followed by an expansion of this line of thought based on a
methodological approach initiated by Stephen Toulmin in 1958. Toulmin’s scheme
is based on visual diagrams to show the sequential steps in an argument. It is
a particularly helpful method to show the difference between strong and weak
archaeological arguments about the past. I examine four archaeological arguments,
and use Toulmin’s method to assess their strength. The final topic is an examination
of archaeological modeling as a form of argument.
Keywords Arguments · Stephen Toulmin · Warrants · Analogy

3.1 Introduction
After some serious missteps by Lewis Binford and other new archaeologists in the
1970s, most archaeologists stopped paying attention to the forms of argumentation
in our field. Discussions of the use of analogy continued for a while longer (Wylie,
1985), but they eventually died down. Postprocessualists and scientifically-minded
archaeologists alike avoided the argumentation topic, leaving the theme of archaeological epistemology impoverished. Recently, however, attention to arguments in
archaeology has started to grow again (Smith, 2017; Chapman & Wylie, 2015, 2016;
Smith, 2015b; Orser, 2014; Gibbon, 2014; Moro Abadía & Lewis-Sing, 2021; Currie,
2016). This chapter is a contribution to this line of work.

M. E. Smith (✉)
School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
e-mail: mesmith9@asu.edu
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_3

This chapter begins with historical reasons for the lack of attention to argumentation in recent decades. Next, it reviews the status of archaeological argumentation
as set out in a 2015 paper (Smith, 2015b). This is followed by an expansion
of this line of thought based on a methodological approach initiated by Stephen
Toulmin in 1958 (Toulmin, 2003). Toulmin’s scheme is based on visual diagrams to
show the sequential steps in an argument. It is a particularly helpful method to show
the difference between strong and weak archaeological arguments about the past.
The final topic is an examination of archaeological modeling as a form of argument.

3.2 The Historical Lack of Attention to Argumentation in Archaeology
In the 1960s and 1970s, Lewis Binford and the processualist archaeologists adopted
the covering law approach associated with Carl Hempel (1965) and the logical
positivist philosophers of science. The most explicit and strident statement of
this choice was the book Explanation in Archaeology: An Explicitly Scientific
Approach (Watson et al., 1971). To explain an event, these authors claimed, is to
subsume the event under a general law, which then implies that the law explains
the event. By the time the processualists had adopted what Wylie (2002) calls
“Hempelian positivism,” philosophers of science had already rejected this approach
to explanation as inappropriate for the social and historical sciences. Indeed, the
explanatory approach of Watson et al. was criticized by a philosopher of science
(Morgan, 1973). Watson et al. responded to this critique by saying that as a
philosopher of science, Morgan didn’t know anything about archaeology, and
therefore he should leave them alone (Watson et al., 1974).
Independent of the philosophy of science, some archaeologists were quick to
criticize the use of Hempel and covering laws by their colleagues. Jeremy Sabloff et
al. (1973:112) called this work “naïve,” and Kent Flannery (1973:51)—never one to
mince his words—opined that the Hempelian approach “has produced some of the
worst archaeology on record.” The only covering laws archaeologists could come
up with were so trivial that Flannery called them “Mickey Mouse laws” (p. 51).
Nevertheless, the processualists maintained their use of covering law explanation,
in spite of its lack of fit for historical and social sciences. This stubborn holding
to a faulty view of explanation “caused great harm to archaeology by setting
scientifically minded archaeologists on an unproductive tangent” (Smith, 2017:521).
During the 1970s and 1980s archaeology published in the English language
saw acrimonious debates between two opposing philosophical and epistemological
camps. Lewis Binford and the processualists promoted a scientific approach that
used a faulty explanatory method (covering laws), while Ian Hodder and the postprocessualists promoted a non-scientific and humanities-oriented archaeology. Bruce
Trigger (2006:444–478) characterized this debate as part of a long-term conflict
between rationalism (processualism) and romanticism (post-processualism).

In the wake of these archaeological “theory wars,” neither camp engaged publicly
or published on the issue of argumentation. The postprocessualists—in their
embrace of social constructivism, interpretivism, and the primacy of meaning—
ignored explicit considerations of argumentation and explanations. In the words
of Bruce Trigger (2006:466), “neither Leroi-Gourhan nor Hodder discovered how
to advance beyond speculation in interpreting the meaning of such regularities.”
Postprocessualists avoided engagement with the philosophy of science concepts on
these topics, with one exception. That exception was an erroneous claim that logical
positivism was synonymous with science, and therefore the more general scholarly
rejection of logical positivism as an adequate explanatory framework implied that
science was not appropriate for archaeology (Johnson, 2010; Martinón-Torres &
Killick, 2013). If these authors were correct on this matter, then much of science,
from genetics to astronomy, would cease to exist. But, of course, few if any of the
sciences have used the covering law approach for many decades. Whether this
claim by the postprocessualists originated in ignorance or guile, it does nothing to
advance argumentation or theory in archaeology (Smith, 2017).
The processualists and their descendants, whose theoretical approach has been
called “processualist-plus” (Hegmon, 2003), similarly failed to engage explicitly
with epistemological issues surrounding arguments and explanation, although there
were a few exceptions (e.g., Fogelin, 2007). Archaeologists employing a scientific
epistemology largely put their heads down, worked on their own materials with the
best explanatory schemes they knew of, and stayed away from public discussion of
arguments and explanations. In Trigger’s (2006:462) words, “Although American
archaeologists were increasingly open to theoretical diversity, most of them lacked
the ambition to try to determine in an operational manner under what circumstances
specific sorts of theories were and were not applicable.”
Allison Wylie noted that during the 1990s the loud debates between the processualists and the postprocessualists largely subsided from public view. She observed
that, “An unreflective ‘live and let live’ pluralism exempts a great many untenable
assumptions from reasoned examination” (Wylie, 2017a:129). But the situation is
more serious than she suggested. Wylie has failed to examine the spread of weak
and speculative arguments by the postprocessualists and their descendants. It seems
likely that a major reason for this is the lack of a robust epistemological literature in
archaeology that provides guidelines on acceptable arguments and explanations.
Many archaeologists tried to conduct their research using a scientific perspective,
but were frustrated with the lack of epistemological discussion of how to improve
archaeological arguments. Nearly all of the professional debate was on the level
of abstract social theory, not epistemology. In my own case, I began reading in
social science disciplines beyond my home discipline of anthropology, largely to
learn about research on cities and neighborhoods. I was pleasantly surprised to find
an active epistemological literature on explanation and argument, particularly in
the fields of sociology, political science, and historical social science (Tilly, 2008;
Gerring, 2012; Abbott, 2004; Mahoney et al., 2009). Covering law explanations
are absent from these fields, and causal mechanisms provide the dominant form
of explanation (Demeulenaere, 2011; Hedström, 2005); see other papers in this
volume for discussions of causality. The social-science works cited above lead one
to the philosophy of social science and history for suggestions on how to improve
argumentation in archaeology (Bunge, 2004; Little, 2010; Manicas, 2006).

3.3 Archaeological Argumentation
A 2015 paper on argumentation (Smith, 2015b) included three broad critiques
of widespread archaeological practices of argumentation and explanation: the
lack of testing of ideas; the poor use of methods of analogy; and a reliance on
abstract, philosophical social theory. This section reviews the current status of these
issues and provides background for a discussion of Stephen Toulmin’s methods for
analyzing arguments.

3.3.1 The Importance of Testing
Stephen Haber (1999:312) articulates an important consideration in argumentation
as follows: “The fundamental question of all serious fields of scholarly inquiry
[is]: How would you know if you are wrong?” This notion
derives from Karl Popper’s (1934) concept of falsifiability. For Popper, scientific
explanations must be falsifiable. He emphasized crucial experiments that can falsify
definitively one or more propositions. In the social sciences, however, such definitive
experiments are rare. In the words of John Gerring (2012:31), “Some theories are
more falsifiable than others.” In their textbook on social science methods, Charles
Ragin and Lisa Amoroso discuss the importance of testing as follows:
By testing hypotheses, it is possible to improve the overall quality of the pool of ideas.
Ideas that fail to receive support gradually lose their appeal, while those that are supported
more consistently gain greater stature in the pool. While a single unsuccessful hypothesis
rarely kills a theory, over time, unsupported ideas fade from current thinking. It is important
to identify the most fertile and powerful ways of thinking and to assess different ideas,
comparing them as explanations of general patterns and features of social life. Testing
theories can also serve to refine them. By working through the implications of a theory
and then testing this refinement, it is possible to progressively improve and elaborate a set
of ideas. (Ragin & Amoroso, 2011:39).

This emphasis on testing is almost universal in the literature on social science
methods. Perri 6 and Christine Bellamy (2012:52) conclude a discussion of Popper’s
views by observing that, for social scientists, “to meet scientific standards of
rigour, theories must be stipulated in ways that make them empirically testable.”
Philosopher of science Mario Bunge, discussing Bruce Trigger’s approach to
archaeology, states that the scientific method “may be boiled down to the rule,
check your guesses” (Bunge, 2013:153). Perhaps not surprisingly, quite a few
postprocessualist archaeologists are on record opposing the usefulness of testing.
Matthew Johnson (2010:223), for example, argues that archaeologists should “shift
from a language of ‘testing’ to a language of ‘evaluation’,” and Ian Hodder and Scott
Hutson (2003:239) claim that, “Instead of testing, we come to an understanding.”
The post-hoc argument is a type of untested—and untestable—argument common in archaeology. I quote from my 2015 paper on this:
Lewis Binford (1981) discussed problems with this procedure, which he called “post-hoc
accommodative argument.” He was referring to an interpretation that is applied to the
data and findings once the research activities are complete. The problem with post-hoc
arguments is that they can’t be shown to be wrong. The analysis is done, and the post-hoc
interpretation cannot be disproven without another round of research. We can all dream
up numerous alternatives to explain (or explain away) any set of findings. But without
some form of testing, post-hoc arguments serve to introduce potentially faulty or misleading
interpretations into the literature. (Smith, 2015b:19).

As pointed out by Geoffrey Clark (2000:852), such arguments are common in the
field of paleoanthropology, in spite of the fact that, “it is a weak form of explanation.”
In some branches of psychology, post-hoc arguments are strongly condemned not
only as problematic arguments but also as ethical lapses (Kerr, 1998; Leung, 2011).
A common analogy for post-hoc arguments is a farmer who paints bulls-eyes around
the bullet holes in his barn to show off his superior shooting skill.

3.3.2 The Decline of Argument by Analogy
A large and sometimes contentious literature on the use of “ethnographic analogy”
in Americanist archaeology was synthesized and formalized by Alison Wylie’s
paper, “The Reaction Against Analogy” (Wylie, 1985). Nearly all of the explicit
archaeological uses of the method of analogy—from Lewis Binford and the new
archaeologists through Wylie’s paper—were in the form of inductive logic. Indeed,
Wylie’s criteria for assessing the strength of an argument by analogy are almost
identical to the criteria for inductive inferences as discussed in textbooks on logic
(Copi, 1982:397–400). Analogies are neither correct nor incorrect; instead, they are
more or less useful, typically depending on their strength. Here is what Wylie said
about the strength of analogies:
The standard criteria for evaluating what I have described as formal analogies are, then: the
number and extent of similarities between source and subject; the number and diversity
of sources cited in the premises in which known and inferred similarities co-occur as
postulated for the subject; and finally, expansiveness of the conclusions relative to the
premises (Wylie, 1985:98).
The two strategies developed for strengthening formal analogy—the strategies of expanding
the base of interpretation and elaborating the fit between source and subject—must be
treated as directives for the active investigation of sources and subjects rather than as
criteria for assessing analogical conclusions reflectively, after they are formulated. And the
inquiry they initiate must be specifically designed to determine what causal connections
hold between the material and cultural or behavioral variables of interest, and under what
conditions these connections may be expected to hold (Wylie, 1985:101).

This approach to argument by analogy—a formal argument based on the rules of
inductive logic—has been abandoned by many archaeologists in the decades since
1985. In its place, archaeologists have begun using three problematic practices that
generate weak and often misleading arguments: ad-hoc analogies, empty citations,
and heuristic analogies.
1. The problem of “ad-hoc analogies” has been described as follows:
Instead of following these simple and well-known guidelines, many authors today
invoke analogy by citing one, or perhaps two, analogical cases from anywhere in the
world that seem somehow related to the argument at hand. I refer to these arguments as
“ad-hoc analogies.” There is little consideration for sampling or formal comparison.
Ad hoc analogies provide no support at all for the argument at hand. The fact that
some human group somewhere in the world did something vaguely similar to what you
are claiming for your archaeological case does not in fact support your claim. (Smith,
2015b:20)

2. Empty citations are bibliographic references to works that do not contain any
data supportive of the case at hand. Instead, they merely signal works that make
a point similar to the point of the author. Such works are cited to lend an aura
of support to the argument, when in fact they contain no empirical support at all.
Empty citations are included in a work to falsely inflate the apparent strength or
quality of an analogical argument. The classic discussion of empty citations is
a paper by Anne-Wil Harzing (2002); other analyses include Todd et al. (2010),
Henige (2011), and Abbott (2010).
3. The use of heuristic analogues is a growing practice in archaeology with
parallels in the field of historical climate change and sustainability science. For
the latter realm, Meyer et al. (1998) contrast formal and heuristic analogies. A
formal analogy is an argument employing inductive logic, as promoted by Wylie.
Heuristic analogues are less rigorous comparisons, either of whole societies
or systems, or of parts of a small sample of societies. Typically, a complex
historical or archaeological setting or event is compared with conditions today,
without any formal testing. “They are heuristic because they are too complex
or too contextually different to be formally specified” (Meyer et al., 1998:220).
Examples include historical episodes (e.g., the collapse of an empire), and events
(a plague); these can be based on historical narratives, archaeological data,
paleoenvironmental reconstructions, or ethnographic documentation. Jared Diamond’s (2004) analyses of societal collapses are heuristic analogues, as are the
cases promoted by Michael Glantz (1991, 2019) in what he called “forecasting
by analogy.” While such analogues can be enlightening and educational, they
rarely provide a rigorous scientific explanation or understanding (Dearing et al.,
2010). They do not permit testing, and they do not conform to the criteria for a
successful inductive (formal analogical) argument.
Archaeologists are increasingly offering heuristic analogues in the name of argument by analogy. The procedure tends to go as follows. The archaeologist wants to
explain attributes of a particular past cultural context, often a social or institutional
setting; this is the target case. He or she chooses a single better-documented
parallel case (from history, archaeology, or ethnography), and asserts that the
two settings are sufficiently similar to apply information from the well-described
case to the target case. This permits numerous details from the former to be
simply applied to the latter without testing. Keith Eppich (2020), for example, uses
information from medieval Italy to illuminate Classic Maya society, and Maxime
Lamoureux-St-Hilaire (2020) generalizes and promotes this process of heuristic
analogue comparison for interpreting Classic Maya society. Davide Domenici’s
(2018) application of Aztec evidence to Teotihuacan provides another example.
These heuristic analogues in archaeology are a particularly weak form of
argument. In comparison with formal analogy, where archaeologists have developed
strategies to improve the strength and relevance of analogical reasoning (Wylie,
1985), heuristic analogues use only a single case for their source-side comparison.
In his discussion of argument by analogy, Matthew Johnson (2010:66–69) phrases
a number of hypothetical examples in terms of this kind of single-case, heuristic
comparison. Additional critiques of complex untested analogues are found in the
literature on comparisons in the discipline of history (Kocka, 2003; Sewell, 1967)
and anthropology (Ember & Ember, 2009; Bodnár, 2019). While the single-case
analogy may suffice if one’s goals are to “color the past,” or “make it recognizable
to us and our audiences” (Lamoureux-St-Hilaire, 2020:8), such heuristic analogues
are inadequate if our goal is to explain past events and processes (Meyer et al.,
1998).

3.3.3 The Popularity of Abstract Social Theory
For many archaeologists, “theory” has come to be synonymous with highly abstract
social theory (Thomas, 2015). While this body of thought may be useful for
understanding the social world on a very general, philosophical level, it is not of
much help for understanding the basic human activities, institutions, and social
conditions that comprise human life and society on a daily basis. High-level theory
is very broad and applicable to many situations, but its empirical content is quite low
(Abend, 2008; Mills, 1959; Tilly, 2008; Bunge, 1999). In the social sciences, most
explanatory theory is of a lower epistemological level, and it is often called “middle-range theory” (Merton, 1968). This concept has almost nothing in common with the
notion of middle-range theory as used by Lewis Binford (1983, 1989); for comment,
see Raab and Goodyear (1984), or Smith (2015b).
Because of the highly abstract epistemological level of much social theory in
archaeology (Thomas, 2015), it is hard to make rigorous empirical arguments.
Concepts such as practice theory, materiality, alterity and assemblage theory are
consistent with a staggeringly broad range of propositions, to the point where
it can be difficult to determine whether a particular argument is supported or
not. Proponents of these approaches find it difficult or unpleasant to frame their
arguments in a fashion that can be tested, including formal inductive analogies.
Not surprisingly, those archaeologists who promote the use of abstract social theory
(Johnson, 2010; Hodder & Hutson, 2003) are the same writers who disparage the use
of testing in archaeology.

3.4 The Structure of Arguments: Stephen Toulmin’s Scheme
“An argument is a connected series of statements intended to establish a proposition” (Monty Python, 1989:86). In 1958 Stephen Toulmin introduced a new formal
approach to argumentation to the philosophy of science (Toulmin, 2003). In place
of the former scheme—which was based on major premises, minor premises and
conclusions—his approach emphasized the varied nature of facts and their level of
support. He introduced the strength of arguments as an important consideration,
using a diagram (Fig. 3.1) to show the logical trajectory from data or facts to the
claim. Warrants, which provide justification and support for the claim, are a key
feature. This scheme was then developed for archaeology by Chapman and Wylie
(2016); see also Bonnin (2019) for an account relevant to archaeology.
The following summary is based on Toulmin (2003), Chapman and Wylie (2016),
and Bonnin (2019). Bonnin suggests that the starting point of an argument is better
termed facts than data, the term used by Toulmin. Data connotes the total information generated
by a project, whereas facts better describe the specific pieces of data arrayed for a
given argument. The line from facts to claims is supported by warrants. Warrants
are “general, hypothetical statements, which act as bridges and authorize the sort
of step to which our particular argument commits us” (Toulmin, 2003:91). This is
probably the most important innovation of Toulmin’s approach.
Toulmin’s concept of warrant was incorporated into the generalized argument
structure described in the textbook, The Craft of Research (Booth et al., 2008:chap.
7). These authors define warrant as a “general principle that justifies relating your
particular reason to your particular claim” (p. 114). Most warrants in archaeological
arguments consist of either comparative data or theory (Smith, 2015b:20), and a
variety of models also regularly serve as archaeological warrants. Analogy is one
of the formal ways that comparative data are employed as warrants. Warrants are
justified and supported by backings. Bonnin (2019:6) defines backings as “further
facts that can be brought to ensure the applicability of the warrants by specifying that
the circumstances in which the warrants are applied are the right ones. Backings are
secondary facts used in support of warrants. Backings are distinguished from facts
functionally.”

Fig. 3.1 Stephen Toulmin’s diagram for arguments. (Graphic by Michael E. Smith, based on Toulmin (2003:97))

Rebuttals “identify exceptions and delimit the scope of an argument” (Chapman
& Wylie, 2016:35); they “indicate specific circumstances in which the claim made
would turn out to be invalid” (Bonnin, 2019:4). Finally, qualifiers describe the
quality and strength of the evidence (facts, warrants, backings) as it relates to the
strength of the argument. Toulmin’s (2003) examples of qualifiers include “This
must be the case,” “This may be the case,” as well as terms such as certainly,
probably, and possibly. These are difficult to incorporate into the basic diagram.
Chapman and Wylie (2016) provide a number of complex archaeological arguments
diagrammed with Toulmin’s scheme, and I refer readers to that source for an
excellent discussion.
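
The components summarized in this section can be sketched as a small data structure. The following is only an illustrative sketch, not a representation used by Toulmin, Bonnin, or Chapman and Wylie: the class names (`Warrant`, `ToulminArgument`), the helper methods, and the placeholder content are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Warrant:
    """A general statement that licenses the step from facts to claim."""
    statement: str
    # Backings: secondary facts offered in support of this warrant.
    backings: List[str] = field(default_factory=list)


@dataclass
class ToulminArgument:
    """Hypothetical container for the components named in the summary."""
    facts: List[str]
    warrants: List[Warrant]
    claim: str
    qualifier: str = "possibly"  # e.g. "certainly", "probably", "possibly"
    rebuttals: List[str] = field(default_factory=list)

    def unbacked_warrants(self) -> List[Warrant]:
        """Return the warrants that have no stated backing."""
        return [w for w in self.warrants if not w.backings]

    def summary(self) -> str:
        """One-line reading of the argument: facts, qualifier, claim."""
        return f"Given {len(self.facts)} fact(s), {self.qualifier}: {self.claim}"


# Placeholder content only; not taken from any published argument.
example = ToulminArgument(
    facts=["fact A", "fact B"],
    warrants=[
        Warrant("general principle linking A to the claim", backings=["earlier study"]),
        Warrant("general principle linking B to the claim"),
    ],
    claim="claim C",
    qualifier="probably",
    rebuttals=["unless condition D holds"],
)
print(example.summary())
print([w.statement for w in example.unbacked_warrants()])
```

Storing the qualifier as an explicit field is one way around the difficulty, noted above, of fitting qualifiers into the basic diagram.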

3.5 Using Warrants to Distinguish Strong and Weak Arguments
The warrants in Toulmin’s scheme provide a way to measure the strength or
weakness of a particular argument. In this section I link the topic of testing to
Toulmin’s argument structure. The warrants in strong arguments are those that
have been—or can be—tested and rejected if necessary, while many or most of the
warrants in weak arguments cannot be tested. Social science methodologists 6 and
Bellamy state, “We can define warrant as the degree of confidence that we have in an
inference’s capability to deliver truths about the things we cannot observe directly”
(6 and Bellamy 2012:13).
Warrants and their backings provide support to the claim of an argument; they
are not subjected to testing in a given study. If a given warrant and its backings
have been tested in the past, their relevance and strength are much greater than those
of a warrant that has not been, or cannot be, tested. Economists Klappholz
and Agassi discuss the importance of testing as follows: “our interest in testing
stems from the fact that we learn by it. Yet in order to learn it is necessary that
the test be such as to expose a hypothesis to the risk of falsification” (Klappholz
& Agassi, 1959:65). These points are best illustrated with examples. Instead
of using the graphical representation of arguments, I employ a list format that
permits a more efficient use of space; these lists can be easily transformed into
the graphical form of Fig. 3.1. For clarity of comparison, I limit consideration to
relatively simple arguments for the use or significance of domestic artifacts and
features.
Two strong arguments from my own research are shown in Fig. 3.2. Argument
1—that small bowls were tools used in hand-spinning cotton—is supported by a
number of warrants, each supported in turn by solid backings (Smith & Hirth, 1988;
Smith, 2015a). I use the term “solid” because these backings have been exposed
to evaluation in the past. For example, a prior implied claim that a twirling spindle
leaves abrasion on the base of a bowl (Warrant 2, Backing 1) has never been falsified.
Most of the warrants for this argument amount to prior tests that have failed to
falsify specific claims. As a result, Argument 1 is a strong argument that is widely
accepted by archaeologists. The warrants and backings for Argument 2, that large
houses at Aztec rural provincial sites were elite residences, are not quite as secure
as in the first case. These backings are based on social patterns and trends, not on
technological constraints as in Argument 1. While the social patterns are relatively
strong (Olson & Smith, 2016; Smith, 1992), they are somewhat weaker than the
warrants and backings in Argument 1. I judge “very likely” as an appropriate
qualifier for this argument.

Fig. 3.2 Schematic depiction of two strong arguments from the author’s research. Argument 1 is
based on Smith and Hirth (1988) and Smith (2015a); argument 2 is based on Olson and Smith
(2016)

Fig. 3.3 portrays two weak interpretivist arguments for the meanings of ceramic
figurines in terms of political ideology.1 Most archaeologists would probably agree
that any interpretation of the meaning of ancient objects will be less secure—
weaker—than most interpretations of the uses of objects or buildings. Indeed, many
archaeologists are of the opinion that meanings cannot be recovered for ancient
objects in the absence of texts (Flannery & Marcus, 1993). One advantage of
Toulmin’s scheme is that it allows us to see precisely where the weaknesses in such
arguments lie.
In Argument 3, Elizabeth Brumfiel (1996) makes the claim that the attributes
and contexts of female ceramic figurines imply that the dominant ideology of the
Aztec state was resisted by commoners. The warrants she gives are noteworthy
for their low level of support by backings. Some have no (stated) backings at all,
and others are simply weak. Warrant 2 is backed by an abstract theoretical position
that can be reconciled with a number of conflicting claims. Because of the abstract
nature of the dominant ideology thesis, it cannot be tested directly, making this
a weak warrant and backing. Warrant 4 is backed by citing the opinions of other
scholars, rather than empirical findings. This is an example of empty citation, as
discussed above. Warrants 1 and 5 are basically assertions. I suggest that the qualifier
“It is possible that” indicates the level of strength of this argument. Brumfiel
(1996:161) qualifies her argument as follows: “where the influence of the dominant
ideology is felt, it does not always result in ideological dominance.” While this
is a somewhat vague conclusion, it does provide some qualification to her central
claim.
In Argument 4, Christina Halperin (2009) builds on Brumfiel’s paper to make a
related argument about the role of figurines in transmitting a dominant ideology. As
in Argument 3, the backings here are quite weak; they include abstract theoretical
positions (Warrants 2 and 3), and assertions (Warrants 4 and 5). Again, this is a weak
and speculative argument, but that does not stop the author from making rather
definite claims without qualification; one example is her conclusion: “figurines
aided in the dissemination of state symbols and ideologies” (Halperin, 2009:396).
If archaeologists were to use Toulmin’s scheme to discuss and present their
arguments, they might be induced to provide more realistic qualifiers of the strength
of those arguments. This is, in fact, a fundamental requirement of arguments in
science. Arguments “should never be a categorical assertion, but should always
convey the author’s assessment of the credibility of his own claims” (Ziman,
1978:64); see also 6 and Bellamy (2012:36–37). In some fields, such as climate
change research by the Intergovernmental Panel on Climate Change, scientists
have developed coding systems for explicitly indicating the strength of every
claim (Adler & Hirsch Hadorn, 2014; Ebi, 2011), something archaeologists should
consider doing.

Fig. 3.3 Schematic depiction of two weak arguments. Argument 3 is from Brumfiel (1996);
argument 4 is in Halperin (2009)

1 I chose these examples because they are two of the cases that sparked my initial inquiry into the
strength of arguments (Smith, 2015b). I saw these as particularly weak or problematic arguments
whose validity could not easily be tested, an observation that led me to investigate the epistemology
of argumentation in greater depth.

3.6 Models as Arguments
A model is a simplified representation of some part of the world created in order
to better understand the organization and dynamics of that part of the world. In the
words of John Ziman (1978:23), a model is “no more than an analogy or metaphor.
It implies a structure of logical and mathematical relations that has many similarities
with what it purports to explain, but cannot be fully identified with it.” Following
the definitions of arguments and models employed in this paper, models can be seen
as a type of argument, and arguments can in turn be seen as a type of model.
In spite of their abstract similarity, these two concepts—models and arguments—
are rarely discussed together in archaeology. Each has its own literature, with
relatively few citations across one another. Chapman and Wylie (2016) discuss
both in their book, but in separate chapters without
cross-references. These authors only connect the two concepts at an abstract level,
where arguments and models are both components or strands of the cables of
evidence and inference that make up archaeological knowledge of the past. I
suggest that in order to achieve a more comprehensive view of argumentation in
archaeology, it is useful to view models as a type of argument. Models begin with a
set of facts, they are manipulated by the analyst in ways analogous to warrants, and
the end result is a claim.
From the perspective of argumentation, archaeological models consist of two
sequential arguments. The first argument—which I will call the internal argument—
is the model itself. The second, or external, argument is the operation that links the
results of the model to some aspect of the past, or to a more general realm of social or
ecological processes. Archaeological works on modeling devote most or all of their
attention to the internal arguments (Clarke, 1972; Wylie, 2017b; van der Leeuw &
McGlade, 1997; Kohler & van der Leeuw, 2007; Romanowska et al., 2019). The
external argument—usually called the validation of the model—is where the results
are compared to external data to assess the degree of fit (Cegielski & Rogers, 2016;
McGlade, 2014).
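
The two-part view can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the growth rule standing in for a model, the function names, the invented numbers, and the choice of root-mean-square error as the measure of fit. It is not drawn from any of the modeling works cited above.

```python
import math


def internal_argument(initial_population, growth_rate, steps):
    """The model itself: turn inputs (facts) into results by fixed rules."""
    population = initial_population
    results = []
    for _ in range(steps):
        population = population * (1 + growth_rate)
        results.append(population)
    return results


def external_argument(results, observed):
    """Validation step: root-mean-square error between model output and external data."""
    if len(results) != len(observed):
        raise ValueError("results and observed must have the same length")
    squared = [(r - o) ** 2 for r, o in zip(results, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Invented numbers, for illustration only.
simulated = internal_argument(initial_population=100.0, growth_rate=0.1, steps=3)
observed = [112.0, 120.0, 135.0]
fit = external_argument(simulated, observed)
print(simulated, round(fit, 2))
```

The first function can be checked entirely on its own terms, while the second cannot run without data from outside the model. That asymmetry is the point of separating the two arguments.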
I use the terms internal and external deliberately to line up with the concepts
of internal and external validity in the field of social science methodology. Internal
validity asks whether a finding is true for a chosen sample. That is, does the model
operate properly and produce results that make sense given the inputs and methods?
External validity asks whether a given finding can be generalized to a broader
population of cases (Gerring, 2012:84). James McGlade (2014:288), in discussing
issues in archaeological simulation models, stresses the need for “a stronger focus
on epistemological issues, rather than on technological/methodological preoccupations,” and this suggestion can be mapped onto the internal/external division.
Indeed, the preoccupation of modelers with internal validity at the expense of
external validity parallels the situation in the field of economics. In the words of
philosopher of science Nancy Cartwright:
Economists make a huge investment to achieve rigor inside their models, that is to achieve
internal validity. But how do they decide what lessons to draw about target situations outside
from conclusions rigorously derived inside the model? That is, how do they establish
external validity? We find: thought, discussion, debate; relatively secure knowledge; past
practice; good bets. But not rules, check lists, detailed practicable procedures; nothing with
the rigor demanded inside the models. (Cartwright, 2007:18)

Perhaps an acknowledgement of models as two-part arguments may promote greater
attention to epistemology by archaeological modelers, as called for by McGlade
(2014).

3.7 Conclusions
One of the negative consequences of the period when argumentation and epistemology receded in archaeology is that weak arguments have become tolerated.
Weak arguments are now a regular feature of peer-reviewed publications. Their
conclusions, often based on abstract social theory combined with scanty empirical
evidence and no testing, are not reliable, preventing the development of a strong
foundation of solid archaeological evidence. While this situation may be acceptable
to those with an interpretivist orientation, where interests are local with little
concern for generalization, a scientific perspective requires the creation of a
reliable body of findings, and those findings must rely on adequate forms of
argumentation.
The chapters in this book contribute to a growing trend of published studies
on argumentation in archaeology. In addition to the suggestions of other chapters,
I propose that attention to the form and structure of arguments can improve
the reliability and usefulness of the claims we make from archaeological data.
The works of philosophers of science who focus on social science and history
(Toulmin, 2003; Little, 2010; Wylie, 2002) are very helpful in this regard, but
it is up to archaeological practitioners to do what it takes to improve our arguments.
A continuing methodological advancement in argumentation will have two
benefits. First, it will help create a more robust record of archaeological knowledge,
thereby improving our understanding of past societies and cultures around the
world. Second, it will allow archaeological data to be used in transdisciplinary
research that goes beyond the confines of our discipline (Smith, 2021) and
contributes to broader research questions in the social, natural, and historical
sciences.
Acknowledgements An email exchange with Alison Wylie helped organize my thoughts on several issues of argumentation. Iza Romanowska, Stefani Crabtree, and several other archaeological
modelers on Twitter stimulated my thinking on models and their relationship with arguments.
Frasier Neiman made some insightful comments and suggestions that helped me see the linkages
between testing and warrants with more clarity.

References
6, P., & Bellamy, C. (2012). Principles of methodology: Research design in social science. Sage.
Abbott, A. (2004). Methods of discovery: Heuristics for the social sciences. Norton.
Abbott, A. (2010). Varieties of ignorance. American Sociologist, 41, 174–189.
Abend, G. (2008). The meaning of “theory”. Sociological Theory, 26, 173–199.
Adler, C. E., & Hirsch Hadorn, G. (2014). The IPCC and treatment of uncertainties: Topics and
sources of dissensus. Wiley Interdisciplinary Reviews: Climate Change, 5(5), 663–676. https://doi.org/10.1002/wcc.297
Binford, L. R. (1981). Bones: Ancient men and modern myths. Academic.
Binford, L. R. (1983). In pursuit of the past: Decoding the archaeological record. Thames and
Hudson.
Binford, L. R. (1989). Debating archaeology. Academic.
Bodnár, J. (2019). Comparing in global times: Between extension and incorporation. Critical
Historical Studies, 6(1), 1–32.
Bonnin, T. (2019). Evidential reasoning in historical sciences: Applying Toulmin schemes to the
case of Archezoa. Biology and Philosophy, 34(2), 30.
Booth, W. C., Colomb, G. G., & Williams, J. M. (2008). The craft of research. 3rd ed. University
of Chicago Press.
Brumfiel, E. M. (1996). Figurines and the Aztec state: Testing the effectiveness of ideological
domination. In R. P. Wright (Ed.), Gender and archaeology (pp. 143–166). University of
Pennsylvania Press.
Bunge, M. (1999). Social science under debate: A philosophical perspective. University of Toronto
Press.
Bunge, M. (2004). How does it work?: The search for explanatory mechanisms. Philosophy of the
Social Sciences, 34(2), 182–210.
Bunge, M. (2013). Bruce Trigger and the philosophical matrix of scientific research. In S.
Chrisomalis & A. Costopoulos (Eds.), Human expeditions: Inspired by Bruce Trigger (pp. 143–
159). University of Toronto Press.
Cartwright, N. (2007). Are RCT’s the gold standard? BioSocieties, 2(1), 11–20.
Cegielski, W. H., & Rogers, J. D. (2016). Rethinking the role of agent-based modeling in
archaeology. Journal of Anthropological Archaeology, 41, 283–298. https://doi.org/10.1016/j.jaa.2016.01.009
Chapman, R., & Wylie, A. (Eds.). (2015). Material evidence. Routledge.
Chapman, R., & Wylie, A. (2016). Evidential reasoning in archaeology. Bloomsbury Press.
Clark, G. A. (2000). On the questionable practice of invoking the metaphysic. American Anthropologist, 102(4), 851–853.
Clarke, D. L. (Ed.). (1972). Models in archaeology. Methuen.
Copi, I. M. (1982). Introduction to logic. 6th ed. Macmillan.
Currie, A. (2016). Ethnographic analogy, the comparative method, and archaeological special
pleading. Studies in History and Philosophy of Science Part A, 55, 84–94.
Dearing, J. A., Braimoh, A. K., Reenberg, A., Turner, B. L., II, & van der Leeuw, S. (2010).
Complex land systems: The need for long time perspectives to assess their future. Ecology and
Society, 15(4), 21.
Demeulenaere, P. (Ed.). (2011). Analytical sociology and social mechanisms. Cambridge University Press.
Diamond, J. (2004). Collapse: How societies choose to fail or succeed. Viking.
Domenici, D. (2018). Beyond dichotomies: Teotihuacan and the Mesoamerican urban tradition.
In D. Domenici & N. Marchetti (Eds.), Urbanized landscapes in early Syro-mesopotamia
and prehispanic Mesoamerica: Papers of a cross-cultural seminar held in honor of Robert
McCormick Adams (pp. 35–70). Otto Harrassowitz.
Ebi, K. L. (2011). Differentiating theory from evidence in determining confidence in an assessment
finding. Climatic Change, 108, 693–700.
Ember, C. R., & Ember, M. (2009). Cross-cultural research methods. AltaMira.
Eppich, K. (2020). Analogy as theory and method. The SAA Archaeological Record, 20(1), 31–34.
Flannery, K. V. (1973). Archaeology with a capital S. In C. L. Redman (Ed.), Research and theory
in current archaeology (pp. 47–58). Wiley.
Flannery, K. V., & Marcus, J. (1993). Cognitive archaeology. Cambridge Archaeological Journal,
3, 260–270.
Fogelin, L. (2007). Inference to the best explanation: A common and effective form of archaeological reasoning. American Antiquity, 72, 603–625.
Gerring, J. (2012). Social science methodology: A unified framework. Cambridge University Press.
Gibbon, G. (2014). Critically reading the theory and methods of archaeology: An introductory
guide. Rowman and Littlefield.
Glantz, M. H. (1991). The use of analogies: In forecasting ecological and societal responses to
global warming. Environment: Science and Policy for Sustainable Development, 33(5), 10–33.
Glantz, M. H. (2019). Societal responses to regional climatic change: Forecasting by analogy.
Routledge.
Haber, S. (1999). Anything goes: Mexico’s “new” cultural history. Hispanic American Historical
Review, 79, 309–330.
Halperin, C. T. (2009). Figurines as bearers of and burdens in late classic Maya state politics. In C.
T. Halperin, K. A. Faust, R. Taube, & A. Giguet (Eds.), Mesoamerican figurines: Small-scale
indices of large-scale social phenomena (pp. 378–403). University Press of Florida.
Harzing, A.-W. (2002). Are our referencing errors undermining our scholarship and credibility?
The case of expatriate failure rates. Journal of Organizational Behavior, 23, 127–148.
Hedström, P. (2005). Dissecting the social: On the principles of analytical sociology. Cambridge
University Press.
Hegmon, M. (2003). Setting theoretical egos aside: Issues and theory in North American
archaeology. American Antiquity, 68, 213–243.
Hempel, C. (1965). Aspects of scientific explanation. Free Press.
Henige, D. P. (2011). Truth or hope? Stimulus and response in scholarly publishing. Journal of
Scholarly Publishing, 42(2), 205–225.
Hodder, I., & Hutson, S. R. (2003). Reading the past. Cambridge University Press.
Johnson, M. (2010). Archaeological theory: An introduction. Blackwell.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196–217.
Klappholz, K., & Agassi, J. (1959). Methodological prescriptions in economics. Economica,
26(101), 60–74.
Kocka, J. (2003). Comparison and beyond. History and Theory, 42, 39–44.
Kohler, T. A., & van der Leeuw, S. E. (Eds.). (2007). Model-based archaeology of socionatural
systems. SAR Press.
Lamoureux-St-Hilaire, M. (2020). Comparative approaches and analogical reasoning for
Mayanists. The SAA Archaeological Record, 20(1), 8–13.
Leung, K. (2011). Presenting post hoc hypotheses as a priori: Ethical and theoretical issues.
Management and Organization Review, 7(3), 471–479.
Little, D. (2010). New contributions to the philosophy of history. Springer.
Mahoney, J., Kimball, E., & Koivu, K. L. (2009). The logic of historical explanation in the social
sciences. Comparative Political Studies, 42(1), 114–146.
Manicas, P. T. (2006). A realist philosophy of social science: Explanation and understanding.
Cambridge University Press.
Martinón-Torres, M., & Killick, D. (2013). Archaeological theories and archaeological sciences. In
A. Gardner, M. Lake, & U. Sommer (Eds.), Oxford handbook of archaeological theory. Oxford
University Press.
McGlade, J. (2014). Simulation as narrative: Contingency, dialogics, and the modeling conundrum.
Journal of Archaeological Method and Theory, 21(2), 288–305.
Merton, R. K. (1968). Social theory and social structure. Free Press.
Meyer, W. B., Butzer, K. W., Downing, T. E., II, Wenzel, B. L. T., & Wescoat, J. L. (1998).
Reasoning by analogy. In S. Rayner, & E. L. Malone (Eds.), Human choice and climate change,
vol. 3: Tools for policy analysis (pp. 217–289). Battelle Press.
Mills, C. W. (1959). The sociological imagination. Oxford University Press.
Monty Python. (1989). The complete Monty Python’s flying circus: All the words (Vol. 2). Pantheon
Books.
Morgan, C. G. (1973). Archaeology and explanation. World Archaeology, 4(3), 259–276.
Moro Abadía, O., & Lewis-Sing, E. (2021). The decline of epistemology in archaeology:
Comments on an ongoing discussion. In L. Coltofean-Arizancu & M. Díaz-Andreu (Eds.),
Interdisciplinarity and archaeology: Scientific interactions in nineteenth- and twentieth-century
archaeology (pp. 203–223). Oxbow Books.
Olson, J. M., & Smith, M. E. (2016). Material expressions of wealth and social class at Aztec-period sites in Morelos, Mexico. Ancient Mesoamerica, 27(1), 133–147.
Orser, C. E., Jr. (2014). Archaeological thinking: How to make sense of the past. Rowman and
Littlefield.
Popper, K. R. (1934). The logic of scientific discovery. Harper and Row.
Raab, L. M., & Goodyear, A. C. (1984). Middle-range theory in archaeology: A critical review of
origins and applications. American Antiquity, 49, 255–268.
Ragin, C. C., & Amoroso, L. M. (2011). Constructing social research: The unity and diversity of
method. Sage.
Romanowska, I., Crabtree, S. A., Harris, K., & Davies, B. (2019). Agent-based modeling for
archaeologists: Part 1 of 3. Advances in Archaeological Practice, 7(2), 178–184. https://doi.org/10.1017/aap.2019.6
Sabloff, J. A., Beale, T. W., & Kurland, A. M., Jr. (1973). Supplement: Recent developments in
archaeology. Annals of the American Academy of Political and Social Science, 408, 103–118.
Sewell, W. H. (1967). Marc Bloch and the logic of comparative history. History and Theory, 6(2),
208–218.
Smith, M. E. (1992). Archaeological research at Aztec-period rural sites in Morelos, Mexico.
Volume 1, Excavations and Architecture/Investigaciones arqueológicas en sitios rurales de la
época Azteca en Morelos, Tomo 1, excavaciones y arquitectura. University of Pittsburgh.
Smith, M. E. (Ed.). (2015a). Artefactos Domésticos de Casas Posclásicas en Cuexcomate y
Capilco, Morelos. Archaeopress.
Smith, M. E. (2015b). How can archaeologists make better arguments? The SAA Archaeological
Record, 15(4), 18–23.
Smith, M. E. (2017). Social science and archaeological inquiry. Antiquity, 91(356), 520–528.
Smith, M. E. (2021). Why archaeology’s relevance to global challenges has not been recognized.
Antiquity, 95, 1061–1095.
Smith, M. E., & Hirth, K. G. (1988). The development of prehispanic cotton-spinning technology
in western Morelos, Mexico. Journal of Field Archaeology, 15, 349–358.
Thomas, J. (2015). The future of archaeological theory. Antiquity, 89, 1287–1296.
Tilly, C. (2008). Explaining social processes. Paradigm Publishers.
Todd, P. A., Guest, J. R., Lu, J., & Chou, L. M. (2010). One in four citations in marine biology
papers is inappropriate. Marine Ecology Progress Series, 408, 289–303.
Toulmin, S. (2003). The uses of argument. Updated edition, Cambridge University Press.
Trigger, B. G. (2006). A history of archaeological thought. 2nd ed. Cambridge University Press.
van der Leeuw, S. E., & McGlade, J. (Eds.). (1997). Time, process, and structured transformation
in archaeology. Routledge.
Watson, P. J., LeBlanc, S. A., & Redman, C. L. (1971). Explanation in archaeology: An explicitly
scientific approach. Columbia University Press.
Watson, P. J., LeBlanc, S. A., & Redman, C. L. (1974). The covering law model in archaeology:
Practical uses and formal interpretations. World Archaeology, 6(2), 125–132.
Wylie, A. (1985). The reaction against analogy. Advances in Archaeological Method and Theory,
8, 63–111.
Wylie, A. (Ed.). (2002). Thinking from things: Essays in the philosophy of archaeology. University
of California Press.
Wylie, A. (2017a). From the ground up: Philosophy and archaeology. Proceedings of the American
Philosophical Association, 91, 118–136.
Wylie, A. (2017b). Representational and experimental modeling in archaeology. In L. Magnani &
T. Bertolotti (Eds.), Springer handbook of model-based science (pp. 989–1002). Springer.
Ziman, J. (1978). Reliable knowledge: An exploration of the grounds for belief in science.
Cambridge University Press.

Chapter 4

A Causal Model Application to a Cultural Heritage Sentence Analysis
Alejandro Sobrino and Beatriz Calderón-Cerrato

Abstract In this paper we approach a cultural heritage sentence focusing
on its causal content, with the aim of providing a causal graph that, once pruned
using Bayesian techniques, shows schematically and in abbreviated form the essential
content of the sentence for a non-specialist or general audience. For that purpose,
the paper develops the following story line. We begin by noting the frequent
controversies around heritage and its prosecution when discrepancies emerge. Next,
we analyze a Spanish legal sentence about cultural heritage, focusing on its causal
structure and lexicon. In this respect, relevant aspects of causality are discussed
from both a logical and a lexical point of view, which makes it possible to extract from
the text of the judgment those sentences that are causally most salient. Differences
between the causality of physical facts and that of legal facts are also clarified. Finally, a causal
graph is drawn from the selected set of causal phrases of the sentence, and a
Bayesian analysis is applied to separate effective causes from spurious ones in order
to understand the judge’s verdict. We conclude that causal analysis is useful
for grasping the factual and evidentiary contents of a sentence about
heritage.
Keywords Causality · Knowledge · Explanation · Bayesian networks ·
Counterfactual

A. Sobrino (✉)
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: alejandro.sobrino@usc.es
B. Calderón-Cerrato
Incipit CSIC, Santiago de Compostela, Spain
e-mail: beatriz.calderon-cerrato@incipit.csic.es
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_4

4.1 Cultural Heritage and Its Disputes
The cultural heritage of a community is, together with other factors such as its geography,
its weather or its language, a property that defines its people. Cultural heritage is
anchored in the past through the legacy we receive and is projected into the future
as a unifying force of societies, which use their identities as elements that define and
differentiate them: the peculiarity of their squares, their dances, their traditions or
their accents. Cultural heritage comprises not only material heritage but also
natural and intangible heritage, which promotes access to cultural diversity
by conveying lifestyles and experiences between generations.
This has given cultural heritage prominence in modern societies, where
tourism has become a massive and regular activity with an essential economic
function. It is indispensable, then, to look after its authenticity and its good conservation. Heritage has to be exhibited but at the same time preserved, in a difficult
balance in which sustainability is a key element. Because heritage is fragile, its maintenance requires
policies that protect it, reconciling the necessary exploitation
of the environment with care for the legacy of our ancestors, and investing in the
conservation and revitalization of what is inherited and in its legal protection against
possible abuses.
Attempts to preserve the ‘fragile wealth’ that heritage entails have involved at least
three types of action:
1. Identification: registrations and inscriptions, which determine what elements are
valuable and require special attention in their defense or safeguarding. Registering
and inventorying thus become necessary activities to know what each culture has
and to detect what it lacks.
2. Promotion: investment and awareness, which measure the degree of commitment
of public bodies to the valorization and defense of what is inherited; this includes the promotion
of continued investments that result in the conservation and luster of heritage,
involving the private sector and civil society, which thus recognize its importance.
3. Preservation: legal prohibition of those individuals or actions that, voluntarily or
involuntarily, damage heritage, with those responsible receiving the appropriate notice or reprimand. If
protecting the asset causes a loss of profit, the Administration assumes
the appropriate compensation.
In the case of law preserved Listed Heritage Items (LHI from now onwards), judicial
decisions play a leading role. A judicial sentence is a legal resolution that expresses
a final decision on a process, which can be either criminal or civil. With the judicial
sentence, the litigation or lawsuit filed by the parties ends and the judge fails the final
resolution. Due to the form, sentences can be classified into written or oral, although
the latter is only possible for some processes. In this work we use a written sentence.
Regarding their content and explanations, sentences must have the following parts
or sections:


1. Preamble: it encloses the data of the place, date, identification of the parties,
lawyers, procedure number, etc.
2. Background of facts and proven facts: it literally explains the requests of the
parties involved in the process and expresses what has happened according to the
discretion of the judge and the available evidence.
3. Fundamentals of Law: it describes in separate and numbered paragraphs the legal
arguments that have motivated the resolution in favor of one of the parties.
4. Operative part and ruling: it contains the judge’s decision and determines the
resolution to which the parties must abide.
However, civil sentences like the one we handle do not have a precise structure
in the Civil Procedure Law itself (Taranilla, 2015: 67). Proven facts are, then, a
section that does not usually have independence within the superstructure of the
text and, instead, are integrated into the section on the fundamentals of law. Even
so, the sentence that we analyze has a sub-section within the fundamentals of law
called Relevant background for the resolution of the case to differentiate these two
sections.
Judicial sentences are argumentative texts in which the verdict is reasoned from proven
facts and fundamentals of law. The sentence follows a logical scheme of
legal coverage, where proven facts are subsumed under a juridical rule to derive the
expected consequences. The parties must defend their points of view relying on
convincing facts and theories as well as supporting evidence to persuade the
court and convince the judges to embrace their petitions through arguments. As
an argumentative text, it can be formalized using a logical language that attempts
to demonstrate how conclusions follow deductively from premises. However, this
approach is questionable because (i) it ignores that law is not only a conceptual or
axiomatic system, but also involves intentional agents and social effects that may require
a legal rule to be annulled or modified, and (ii) legislators can never fully
predict under what circumstances the law should apply. Both factors imply that
legislation is formulated in general and abstract terms, which creates uncertainty and
room for disagreement. Legal reasoning seems to respond to a dynamic logic that
allows the logical status of a premise to change when others are incorporated,
and with it the assessment that the premise had received. Bayesian networks
offer a model to formally study these cases.
Since they are argumentatively linked texts, court rulings include conditional and
causal sentences that follow generic patterns such as ‘Si . . . entonces’ (‘If . . . then’),
‘En consecuencia’ (‘In consequence’) or ‘Hay relación de causalidad’ (‘There is
a causal relation’). In this paper, we analyze a court ruling about cultural heritage
in order to (i) show its conditional and causal structure, given the abundance
and prominence of this type of sentence, (ii) draw a causal graph that visually
shows the reasoning flow that leads to the verdict, and (iii) use counterfactual or
Bayesian analysis of causality to prune the causal graph and show in an effective
and summarized way the causes that de facto substantiate the verdict.


A. Sobrino and B. Calderón-Cerrato

4.1.1 Analysis of a Sentence About Cultural Heritage
The sentence under analysis was issued by the Superior Court of Justice
(Contentious Chamber) based in Pamplona/Iruña, appeal No. 306/2018, against
Resolution 95/2018 of April 6 of the General Director of Culture and the Príncipe de
Viana Institution. Next, we summarize the sentence according to the aforementioned
parts:
Background of facts:
• 08.11.1984. Mármoles Baztán is granted the exploitation of the Alkerdi-Berroberría area to extract marble.
• The concession is extended 30 years from 2014.
• 05.07.2014. There is a blast by Mármoles Baztán that causes alarm in the
population. The Urdax City Council requires the concessionaire not to do any
more blasting and entrusts a study to the Department of Environment of the
Autonomous Community.
• 12.08.2014. Blasting is temporarily suspended and archaeological studies are
started in nearby caves.
• 04.08.2016. Paleolithic engravings are discovered.
• 06.08.2016. The entire Alkerdi-Berroberría system is registered as LHI.
• 31.08.2016. Mármoles Baztán files a patrimonial claim for damages derived from
the inscription as LHI that is dismissed.
• 21.11.2016. Mármoles Baztán also files a patrimonial claim with the Economic
Development Council of the Government of Navarra for damages derived from
the declaration of nullity of the concession extension. It is dismissed.
• 12.09.2016. Once a new extension was processed, it was rejected due to the
existence of LHI.
• 13.06.2018. An appeal was filed, and it was dismissed.
• 29.05.2018. Such inscription was appealed again, and it was rejected because
there was rock art.
Fundamentals of Law:
• Registration as LHI is an act of non-declarative procedure that lacks substantivity
to produce damages (sentences 19 Nov 2013, TS 21 Oct 2008).
• It is necessary for the administration to issue an act by which the existence of LHI
is recognized or declared (art 40.2 law 16/1985 Spanish Historical Heritage).
The parties allege that:
• Plaintiff: the causal relationship between the actions of the administration (registration as LHI) and the damages derived from the impossibility of continuing
with the exploitation is evident.
• Defendant: the causal relationship between the declaration of the registration as
LHI of the Alkerdi-Berroberría system and the damage suffered for which it is
now claimed has not been proven.


• Co-defendant: Zurich Ins. says that the policy does not cover the claim that is the
subject of this process. There is no causal relationship between the administrative
action and the prejudices claimed.
Petition of the Defendant and Co-defendant. Annulment of the appeal and, consequently, inadmissibility of patrimonial responsibility for damages derived from
the registration as LHI of the Alkerdi-Berroberría system operated by Mármoles
Baztán.
Disagreement of the Plaintiff. There is a set of actions by the Administration that
are determinants of the damage caused:
• Extension of the concession.
• Investigation of the Alkerdi-Berroberría system.
• Declaration of the Alkerdi-Berroberría system as LHI.
These actions were carried out not because the law required them but by decision of
the Administration, which is therefore the cause of the damage suffered. Hence, the causal
relationship between the activity of the administration (registration of an area as
LHI) and the damages derived from not continuing the exploitation is evident.
Disagreement of the Defendant. The registration of the Alkerdi-Berroberría system
as LHI is an administrative action in accordance with the law: a procedural act,
motivated by the presence of cave paintings, that does not cause damage. The inscription itself has
no effect on the exploitation of the quarry. There is no causal relationship between
the administrative action and the damage said to be borne by the administered party,
since registration as an LHI produces neither the revocation of the
extension of exploitation nor its denial.
Resolution. Therefore, it was demonstrated that the contested administrative action
was in accordance with the law and the reason for the appeal and the claim filed by
Mármoles Baztán were dismissed because there is no causal relationship between
the registration of the Alkerdi-Berroberría system as LHI and the damages claimed.
In this schematic summary of the sentence, which paraphrases selected passages
of the sentence, the presence of causal relationships can be noted both in the
petitions of the parties and in the justification of the sentence. The causality and
the implication or subsumption of particular cases in norms are essential in the
justification of the verdict. Therefore, its causal study seems justified. This paper,
then, is articulated in the following way: in Sect. 4.2 we analyze the structure
and logical properties of conditional and causal sentences, showing analogies
and differences. In Sect. 4.3 we show the differences between the concepts of
causality, typical of the physical sciences, and causation, characteristic of the social
sciences, such as law, and we show a causal model for its adequate analysis. In
Sect. 4.4 we analyze the grammatical and lexical form of conditional and causal
sentences. In Sect. 4.5, Bayesian networks are applied to the analysis of the sentence
summarized above, in order to separate the effective causes from the spurious ones
in the justification of the verdict or opinion. This analysis allows us to obtain a
causal graph showing how the ruling follows from the proven facts. In Sect. 4.6,
we summarize the conclusions of the work and outline how to continue it. As
complementary material, the sentence under analysis is added as an annex.

4.2 Conditional Logic and Causal Logic: Analogies and Divergences
‘Ingesting 2 grams of cyanide causes death’ can be paraphrased as ‘If someone takes
2 grams of cyanide, he dies’. Conditional statements and causal statements attempt
to relate sentences to each other so that there is some entanglement or link between
them. While they have similarities, they also show differences. Below, we will look
at some analogies and some dissimilarities.

4.2.1 Strict and Material Conditional
Conditional sentences usually contain the discourse markers if . . . then, but there
are sentences containing these words that do not express any relation between the
protasis and the apodosis, such as If Einstein was a physicist, Madrid is the capital
of Spain. That is a material conditional. Strict conditionals are opposed to material
conditionals because the relation between the antecedent and the consequent is
imperative or necessary. A strict conditional is one that uses the modal operator of
necessity, □(p ➔ q), meaning that q follows from p in all possible worlds, i.e., that
there is no imaginable situation that makes the antecedent true and the consequent
false. Such is the case in If a set A is properly contained in another set B, then A is
smaller than B. The conditional of a valid inference rule is strict since the conclusion
necessarily follows from the premises.
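The contrast between the two conditionals can be made concrete with a minimal sketch. It assumes a toy set of ‘possible worlds’ built by enumerating truth values, and it restricts the set example to small finite sets; the names material, strict and worlds are illustrative and not part of the original analysis.

```python
from itertools import product

def material(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q

def strict(antecedent, consequent, worlds) -> bool:
    """Strict conditional: the material conditional holds in every world."""
    return all(material(antecedent(w), consequent(w)) for w in worlds)

# Toy possible worlds (an assumption): every combination of two unrelated facts.
worlds = [
    {"einstein_physicist": e, "madrid_capital": m}
    for e, m in product([True, False], repeat=2)
]
actual = {"einstein_physicist": True, "madrid_capital": True}

# True in the actual world, but only as a material conditional.
material_holds = material(actual["einstein_physicist"], actual["madrid_capital"])
# Fails as a strict conditional: some world has a physicist Einstein and
# a Madrid that is not the capital.
strict_holds = strict(
    lambda w: w["einstein_physicist"], lambda w: w["madrid_capital"], worlds
)

# Set example, restricted to finite sets: each "world" is a pair (A, B).
universe = [frozenset(s) for s in ([], [1], [2], [1, 2])]
set_worlds = list(product(universe, repeat=2))
subset_strict = strict(
    lambda w: w[0] < w[1],            # A is properly contained in B
    lambda w: len(w[0]) < len(w[1]),  # A is smaller than B
    set_worlds,
)

print(material_holds, strict_holds, subset_strict)
```

Under these assumptions, the Einstein sentence holds materially but not strictly, while the set sentence holds in every enumerated world.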
Although from a strictly logical point of view validity is only concerned with
form, in causal arguments the content is a matter of interest. A causal argument
substantiates an effective link between cause and effect. But, in addition to the
true connection between cause and effect, the possible non-present but conceivable
elements that would help to enable or disable such a connection are kept in mind.
Let the following arguments be in the Modus Ponens format (Table 4.1):

Table 4.1 Examples of Modus Ponens arguments
If you pull the trigger, the gun goes off | If I fertilize the plant, it grows
I pull the trigger | I fertilize the plant
Then the gun goes off | Then it grows

Formal logic classifies both arguments as equally valid, but causal logic gives
them different credibility. Indeed, the argument on the left is more convincing than
the argument on the right because it has fewer disabling factors: there are few
scenarios where, if the trigger is pulled, the gun will not fire, but there are plenty
of scenarios in which a plant gets fertilized and does not grow: if it is not watered
enough, if it does not have good sun exposure, if it has fungus, etc. (cf.
Cummings et al., 1991).

4.2.2 Indicative Versus Subjunctive or Counterfactual Conditional
In causal relations, indicative conditionals (if a then b) and subjunctive or counterfactual conditionals (not b unless a; if not a then not b) are relevant. Subjunctive
conditionals are often used as a test of causality: if a is the cause of b, then if a does
not occur, b does not occur either or, put differently, b does not occur unless
a does. This is true if the cause is single, but not in a multi-causal scenario, where
the effect may be due to an alternative cause that is equally sufficient to produce
it.
Subjunctive conditionals allow us to conjecture what follows if the cause does
not occur in the current world, either because we assume its non-occurrence (that
the earth does not have rotational motion around the sun) or because we intervene
in it (moving one billiard ball before it is hit by another and thus impeding the
transmission of motion to a third). Intervention, which is a human intentional action,
suggests an alternative interpretation for the word ‘condition’: in an indicative
conditional statement, the antecedent is a sufficient condition for the consequent;
in a subjunctive conditional statement, the effect is dependent on or independent of a
possible cause, conditioned on whether or not an intervention (Markov principle)
is made on it (cf. Pearl, 2009). In the case of multiple causality, this makes it possible
to calculate how possible alternative causes influence each other. For example,
whooping cough can be caused by two bacteria: Bordetella pertussis and Bordetella
parapertussis. Each is sufficient to independently cause the disease. Suppose a
person is found to have whooping cough and it is not due to the bacterium
Bordetella pertussis. The causes that were independent before are no longer so, because
the exclusion of one of them allows the other to be inferred as the effective cause.
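The whooping cough example can be sketched numerically. The prior probabilities below are purely hypothetical, chosen only for illustration, and the disease is modeled as a deterministic OR of the two infections; neither assumption comes from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical priors (assumptions for illustration only).
P_BP = Fraction(5, 100)   # infection with Bordetella pertussis
P_BPP = Fraction(1, 100)  # infection with Bordetella parapertussis

def joint(bp: bool, bpp: bool) -> Fraction:
    """Joint probability of the two independent infections."""
    p1 = P_BP if bp else 1 - P_BP
    p2 = P_BPP if bpp else 1 - P_BPP
    return p1 * p2

def prob(event, given=lambda bp, bpp: True) -> Fraction:
    """P(event | given), computed by enumerating the four joint states."""
    states = list(product([True, False], repeat=2))
    den = sum(joint(*s) for s in states if given(*s))
    num = sum(joint(*s) for s in states if given(*s) and event(*s))
    return num / den

def cough(bp: bool, bpp: bool) -> bool:
    """Each bacterium alone is sufficient for the disease."""
    return bp or bpp

def has_bpp(bp: bool, bpp: bool) -> bool:
    return bpp

prior = prob(has_bpp)
given_cough = prob(has_bpp, given=cough)
given_cough_not_bp = prob(
    has_bpp, given=lambda bp, bpp: cough(bp, bpp) and not bp
)

print(prior, given_cough, given_cough_not_bp)
```

With these numbers, the probability of parapertussis rises once the disease is observed and reaches certainty once pertussis is excluded, which is the inference described in the paragraph above.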

4.2.3 Positive and Negative Causality
Positive causality rests on the assumption that causal influence is produced by the
transfer of matter, energy, or information from cause to effect (cf. Dowe, 2000).
Negative causality, on the other hand, is the absence of such influence, whether by
inaction, omission, or absence. In positive causality there is an effective relationship
between cause and effect (If I pull the trigger, the gun goes off). Negative causation,
on the other hand, arises because the cause, the effect, or both do not happen (The
absence of vigilance caused the robbery). Depending on which of them fails to happen, it
receives different names: prevention, prevention by omission or prevention by absence and,
finally, absence (cf. Barros, 2013).
Some philosophers, such as Lewis or Armstrong, hold that negative causality
is not real causality because a non-fact cannot be considered a real cause (cf.
Armstrong, 1999). But there is no doubt that negative causality takes place in fields
such as medicine or law: thus, it is said that scurvy is caused by the absence of
vitamin C or that the omission of help in an accident is a cause for punishment in
case of serious injuries. It could be said that, even if they are not causally productive,
absences are causally relevant; they do not transfer anything, but they cause effects
(cf. Glennan, 2009).
Negative causation or causation by disconnection presents problems for a
counterfactual definition of causation. Thus, it is difficult to identify, among omissions,
a single cause whose non-occurrence brings about the effect. In positive causality,
every alternative cause is sufficient for the effect, but in negative causality, all
omissions are necessary. How many omissions must be taken into account and
which of them are responsible for the effect are questions that frequently do not
have a clear answer. In the legal field this is called promiscuity and indicates that,
in multi-causal situations, it is necessary to distinguish between relevant and non-relevant causes. But relevance is an imprecise term and the scales on which it is
ordered may vary.

4.2.4 Transitivity
The transitivity of causality is often associated with physical or Michottian causality,
exemplified by the movement of a billiard ball, which is transferred to other balls
with which it collides in a domino effect.
Figure 4.1 illustrates the fact that if motion is imparted to ball A (efficient
cause) and this in turn strikes ball B, which strikes C, then C also moves because of A.
In the ideal case, this example could have as many intermediate ball links as desired,
although the inevitable friction on any surface limits the transmission of the motion initiated by
A. Transitivity allows causes to be seen as proximate or distant; thus, in the
figure, B is closer to C than A is, and the closer a cause is to the effect, the better its causal
influence can be identified. In Fig. 4.1, A causally influences C by means of B. But
an intervention on B can put A out of play. This example shows that Michottian
causality is always transitive, but causal dependence may not be (cf. McDermott,
1995).
Fig. 4.1 Michottian causality


In effect, a counterfactual reading of Fig. 4.1 says that ‘C if B’ and that ‘B
if A’. But if an intervention is made on the ball B altering its state (e.g., pushing
it towards C before A hits it or separating it from A’s trajectory), the link between
C and A is broken and it is then meaningless to counterfactually attribute to A any
causal power over C.
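The effect of an intervention on B can be sketched with a small deterministic model of the chain A → B → C. This is a minimal illustration under idealized assumptions (no friction, Boolean ‘moves / does not move’ states); the function names and the do_b parameter are hypothetical.

```python
def run(a_moves: bool, do_b=None) -> dict:
    """Simulate the chain A -> B -> C.

    If do_b is given, B's own mechanism is overridden by the intervention,
    so B no longer listens to A.
    """
    b_moves = a_moves if do_b is None else do_b
    c_moves = b_moves  # C moves exactly when B strikes it
    return {"A": a_moves, "B": b_moves, "C": c_moves}

def c_depends_on_a(do_b=None) -> bool:
    """Counterfactual test: does toggling A change what happens to C?"""
    return run(True, do_b)["C"] != run(False, do_b)["C"]

no_intervention = c_depends_on_a()         # chain intact
b_removed = c_depends_on_a(do_b=False)     # B taken out of A's trajectory
b_pushed = c_depends_on_a(do_b=True)       # B pushed towards C before A arrives

print(no_intervention, b_removed, b_pushed)
```

Without intervention, C counterfactually depends on A; once B is fixed from outside, in either direction, toggling A no longer makes any difference to C.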

4.3 Conditionality and Causality in Natural Sciences and Law
Causality plays an undeniable role in the natural sciences, but also in the social sciences.
In the former, it is usually named ‘causality’, and in the latter, ‘causation’. In the
laws of physics, equations usually reflect the necessary connection between the
events they connect, a link that is intended to be stable or eternal and precise.
In contrast, causation in the social sciences, such as law, reflects the relationship
between a person’s conduct and the damage caused, which legally takes the form of
pointing out a fault or attributing guilt, and varies according to countries, courts, or
individuals (cf. Lagnado & Gerstenberg, 2017:574). The term ‘causation’ is
used in law to emphasize that not only physical processes are at work, but also rational
agents acting intentionally. We will analyse this difference according to several
parameters.
3.1 Causality and precision: while causality tends to be precise, causation is
generally imprecise
Physical causality is often characterized as precise: same causes, same effects, and
always the same effects. Physical or Laplacian determinism argues that, if we know
the laws governing matter and the initial conditions of a problem, it is possible
to accurately predict any future event or occurrence. Causation in the social sciences,
on the other hand, is imprecise. Legal reasoning must be based on proved facts
or relationships, but also on enacted laws and the social sentiment about them.
While causality is based on physical laws, causation rests on rules published
by legal scholars that answer only to more or less widely accepted social norms and
therefore use vague language to accommodate divergences, which may lead to
different interpretations (cf. Li, 2017). If the rules of physics tend to be precise,
social rules are contextual and imprecise, since they involve not only immaterial
objects or entities, but also agents who reason qualitatively using beliefs and presuppositions.
Hence, legal reasoning perhaps shares more elements with history than with the natural
sciences (cf. Lehmann & Breuker, 2000:127–8).
3.2 Concrete and abstract causality: while causality is usually abstract, causation
is concrete


Causality is concerned with regularities involving physical objects and shows a kind
of necessary relationship between them, so that the cause provokes the effect, the
effect does not occur without the cause, and the effect occurs after the cause. This
relationship constitutes for Mackie (1980) the cement of the universe and illustrates
the stable or essential regularities underlying scientific explanation. The laws of
physics are often refined expressions of causal relationships. Given its relevance,
causality is a key notion in science and metaphysically prior to the notions of space
and time. Causation, on the other hand, refers to a social, not a physical property.
It does not occur between objects, but between events affecting agents and it is
characterized by the application of a general principle to a specific case, an action
from which follow the effects that the rule foresees, usually in terms of attribution
of guilt. In effect, Hart and Honoré (1959) based the assignment of liability to actors
who cause harm to others on:
• their specific conduct,
• the causal connection between that conduct and the harm caused, and
• the culpability legally implied by it.
Legal causation is concerned with factual actions and the ascription of liability is,
to a large extent, cause in fact and always after-the-fact.
3.3 INUS Causation and NESS Causation: while causality is usually INUS-type,
causation is said to be NESS-type
Physical causation can be mono- or multi-causal, the latter being more common due
to the rarity of sufficient and necessary causes. Mackie characterizes physical
causation as INUS causation (Insufficient but Non-redundant parts of a condition
which is itself Unnecessary but Sufficient for the occurrence of the effect; cf.
Mackie, 1980), where a cause is a necessary element in a non-redundant group that
is sufficient, but not necessary, to cause the effect. For example, let us say that to
identify (I) an individual, one needs to know his full name (N) and ID number
(D) or passport number (P); in symbols: N ∧ (D ∨ P) ➔ I. Let us look at one of
the causal factors, (D ∨ P), and specifically at D. D is not necessary
for identification, insofar as P can perform that function, but it is not sufficient either,
because N is always required to obtain I. Therefore, D belongs to a causal group
(D ∨ P) that is insufficient, but not redundant, for the effect, and D is an unnecessary
but sufficient part of that group. Wright adapted this causality to the field of law
and called it NESS causality, Necessary Element of a Sufficient Set (cf. Wright,
1985): in a specific situation, a relevant causal condition is a necessary element of
a set of conditions that are jointly responsible and sufficient for the harm to occur
(cf. Honoré & Gardner, 2019).
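The identification example can be checked mechanically. The sketch below enumerates all subsets of the factors N, D and P and tests each one against the rule N ∧ (D ∨ P) ➔ I. The reading of NESS used here, membership in at least one minimal sufficient set, is a simplifying assumption for illustration, and all function names are hypothetical.

```python
from itertools import combinations

FACTORS = ("N", "D", "P")  # full name, ID number, passport number

def identified(present) -> bool:
    """The rule from the text: N and (D or P) yields identification."""
    return "N" in present and ("D" in present or "P" in present)

def subsets(items):
    """All subsets of the given factors, as frozensets."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

sufficient_sets = [s for s in subsets(FACTORS) if identified(s)]
# A sufficient set is minimal if no proper subset of it is also sufficient.
minimal_sufficient = [
    s for s in sufficient_sets if not any(t < s for t in sufficient_sets)
]

def sufficient_alone(factor: str) -> bool:
    return identified(frozenset({factor}))

def necessary(factor: str) -> bool:
    """A factor is necessary if every sufficient set contains it."""
    return all(factor in s for s in sufficient_sets)

def ness(factor: str) -> bool:
    """Necessary element of at least one minimal sufficient set."""
    return any(factor in s for s in minimal_sufficient)

for f in FACTORS:
    print(f, sufficient_alone(f), necessary(f), ness(f))
```

Under this reading, D is neither sufficient alone nor necessary overall, yet it is a necessary element of the sufficient set {N, D}, whereas N is necessary in every sufficient set.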
3.4 Negative causes and negligence: while in causality absence is not always
readily admitted as a cause, negative causes are a regular part of causation
and can be substantiated in negligence, a clear form of causing harm
Negligence in law relates to negative causation and concerns
causes that are each independently sufficient for the effect. When there are several
negligent actors that are independent of each other, each of them can be considered
an independent sufficient cause and, so, legally liable for the resulting damage,
as each is a substantial and autonomous causal factor of it. For example, when a
fire is favoured by the failure of a watchman and a fireman to act in time, the
negligence is of both. In that case, liability arises from an inaction, although it
can sometimes also be attributed to factual but ineffective attempts to prevent
damage.
3.5 Causal transitivity: while in causality the cause can be propagated distantly
or, under ideal conditions, even indefinitely, in causation the proximate cause
has a relevant value, limiting the transmission of the ‘conditio sine qua non’ or
‘but-for test’ backward
Causal chains break if the causal power decreases at each link and do
not break otherwise. The latter is only the case if ideal assumptions are
postulated, such as a frictionless surface or unbiased agents in their actions,
far removed from how objects in nature or people in society behave. In
law, the remoteness of the cause is relevant for causal attribution. Usually,
only proximate causes have causal power. Causality decreases as causal links
increase.
In effect, assuming an unlimited transitive causal link, the conditio sine qua non
or but-for test would be endless and the causal attribution doubtful: the victim would
not have been harmed if the aggressor had not fired, he would not have fired if the
weapon had not been made, the weapon would not have been made if the metal
had not been available, etc. Proximate cause is a legal limitation of cause in fact. Despite
this difficulty, counterfactual explanation is relevant in law since liability is always
judged after the facts and once the process of reasoning involving facts and rules
is completed. Causal reasoning is a kind of abductive inference in which the occurrence
of an event B (a damage) is explained by a previous event A that causes it.
In case of causal overdetermination, once the effect is instantiated (e.g., by proving
that it occurs), backward inference allows causation to be attributed by virtue of
the communication opened between the causal factors that were previously sufficient to
cause it independently. Let us look at an example known as a ‘collider’, represented
in Fig. 4.2:
A is caused by C or B and, in turn, A causes E: B → A ← C, A → E. In
principle, B and C are independent factors in causing E. But when the value of E
is known and the value of B or C is instantiated, the communication between the
possible causes is opened, allowing the choice of one of them as the effective cause.
For example, let B and C be two people each wielding a firearm that can cause
serious injury and death. The information that B has one weapon says nothing about
what C, who has another gun, can do. However, if it is verified that the victim is dead (E) and
that B’s pistol is jammed, the cause of death is attributable to C. Therefore, the
instantiation of the effect and one of the causes turns causes that were independent
before into causes that now depend on each other in the assessment of the harm
caused.
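The opening of ‘communication’ between B and C can be illustrated by enumerating the joint distribution of the network B → A ← C, A → E. The numerical values below (each shooter fires with probability 1/2, and a shot victim dies with probability 9/10) are hypothetical and serve only to make the dependence visible; the helper names are likewise assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical parameters (assumptions for illustration only).
P_B = Fraction(1, 2)           # B's gun fires
P_C = Fraction(1, 2)           # C's gun fires
P_E_GIVEN_A = Fraction(9, 10)  # the victim dies if shot

def joint(b: bool, c: bool, a: bool, e: bool) -> Fraction:
    """Joint probability for the network B -> A <- C, A -> E."""
    p = (P_B if b else 1 - P_B) * (P_C if c else 1 - P_C)
    p *= 1 if a == (b or c) else 0   # A (victim is shot) is a deterministic OR
    p_e = P_E_GIVEN_A if a else Fraction(0)
    p *= p_e if e else 1 - p_e
    return p

def prob(query: dict, evidence: dict) -> Fraction:
    """P(query | evidence) by brute-force enumeration of all states."""
    num = den = Fraction(0)
    for b, c, a, e in product([True, False], repeat=4):
        world = {"b": b, "c": c, "a": a, "e": e}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(b, c, a, e)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

c_prior = prob({"c": True}, {})
c_given_b_jammed = prob({"c": True}, {"b": False})
c_given_death = prob({"c": True}, {"e": True})
c_given_death_b_jammed = prob({"c": True}, {"e": True, "b": False})
c_given_death_b_fired = prob({"c": True}, {"e": True, "b": True})

print(c_prior, c_given_b_jammed, c_given_death,
      c_given_death_b_jammed, c_given_death_b_fired)
```

Before the effect is known, learning that B's pistol jammed leaves the probability for C unchanged. Once the death is observed, the same information makes C certain, while learning that B did fire returns C to its prior value.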


Fig. 4.2 Example of causal overdetermination and backward inference

4.4 Causal and Conditional Lexicon
Legal language is considered a specialized language. Unlike other specialized languages, which create new words to refer to a specific reality, the law often uses
common words to which new meanings are added. That is why it is difficult to
understand a legal text: although most of the words used are familiar, they carry
specific meanings (Martí Sánchez, 2004: 175). Given the nature of the sentence
as a legal textual genre, the purpose of which is to establish the guilt or innocence of
an entity or person for their actions, many of these new meanings denote causal and
conditional relationships. Another characteristic of legal texts is the abundance
of conjunctions and phrases. In the case of conditional sentences, there are four
common structures in English:
(a) If + present + present (universal truth);
(b) If + present + will + infinitive (real or possible);
(c) If + past simple + would + infinitive (hypothetical);
(d) If + past perfect + would + have + participle (regret).

In Spanish, however, this is a more complex construction that contains different
interpretations according to the verbal forms used (Bosque & Demonte, 1999):
(a) Si + present indicative + present indicative/future indicative/conditional/past
indicative forms; si + imperfect indicative + imperfect indicative/imperfect
subjunctive/conditional (probability);
(b) Si + imperfect subjunctive + conditional/imperfect indicative; si + imperfect
indicative + imperfect indicative; si + conditional + conditional; si + imperfect subjunctive + conditional (improbable);
(c) Si + pluperfect subjunctive + pluperfect subjunctive/compound conditional/conditional/pluperfect indicative; si + pluperfect indicative + pluperfect indicative; si + present indicative + present indicative (unreality).


It also depends on the conjunction or phrase used instead of si (‘if’). In this section,
we briefly present some of these elements and new meanings, but for our study we
chose only a small subset of all the sentences that could be retrieved.
In our sentence, there are words that coincide with or belong to the family of
‘cause’ or ‘consequence’, and others with a similar sense but a traditionally legal use,
such as guilt, damage or verbs such as to verify. These are the words that allow us
to retrieve the paragraphs relevant to the causal argument that justifies
the ruling of the sentence. Let us look at some of those examples below:
(a) Dicho cese constituye un daño efectivo, consistente en la pérdida de los gastos
realizados ( . . . ). [‘Said termination constitutes an effective damage,
consisting of the loss of expenses incurred’]
(b) ( . . . ) Se indemnice en concepto de daño emergente. [‘Compensation for
consequential damages’]
(c) ( . . . ) Se condene a la administración a la indemnización de los daños y
perjuicios causados. [‘The administration has to compensate for the damages
and compensations caused’]
(d) ( . . . ) Por la que se desestimó la reclamación patrimonial derivada del perjuicio sufrido a consecuencia de la suspensión de los trabajos de perforación
y voladuras de la cantera. [‘By which the patrimonial claim derived from the
damage suffered as a result of the suspension of the drilling and blasting works
of the quarry was rejected’]
In (a) and (b) we have two common lexical markers in law. On the one hand,
(a) refers to the damage to tangible assets, that is, to the economic loss that the
investments already made entail for the plaintiff (Mármoles Baztán S.L. (MB)).
On the other hand, in (b) there is a reference to consequential damage that, in civil
law (the branch to which the sentence that we analyze belongs), refers to the losses
that result from an obligation not being fulfilled or being fulfilled
late (Real Academia Española, 2020, definition 2),1 in this case
by the administration. In (c), ‘daños y perjuicios’ (‘damages and compensation’)
is a well-known expression in law that refers to economic injury resulting from
actions or omissions, whether intentional or not (ibid., definition 1).2 Finally, in (d)
damage is used, with a meaning close to that of effective damage, to argue that
the decisions made by the administration have involved real damage in terms of
expected earnings.
Culpa (‘guilt’) is another of those words with causal implications that are
commonly used in legal language. In our sentence we have the derivative culpabilísticos (‘culpabilistic’):
(a) Esta nota es la aparentemente más compleja, puesto que la doctrina común de
la responsabilidad extracontractual y por actos ilícitos deviene en un complejo
fenómeno de examen sobre la relación de causalidad, la eventual concurrencia
y relevancia de concausas y la existencia de elementos culpabilísticos. [‘This is
apparently the most complex note, since the common doctrine of tort liability
and for illegal acts becomes a complex phenomenon of examination of the
causal relationship, the eventual concurrence and relevance of concauses and
the existence of culpability elements’].

1 Retrieved from https://dpej.rae.es/lema/da%C3%B1o-emergente
2 Retrieved from https://dpej.rae.es/lema/da%C3%B1os-y-perjuicios

In this case, it refers to the procedure for attributing damage to the public administration, for which it is necessary to examine the causal relationship between the
administration’s actions and the effects, as well as its relevance, and to determine whether
those effects can really be attributed to the administration’s conduct.
As we said, we also find other words that allow us to weave a causal network
and that do not necessarily have a specific meaning in this type of text. For
example, verbs such as derive (‘derivar’), force (‘obligar’) or check (‘comprobar’),
which in our sentence are mostly inflected as participles, have a semantics that implies
the influence of one origin or one agent on another or on something. Therefore,
it is necessary to consider them in our analysis, since they are relevant to the
construction of the argument:
(a) ( . . . ) Se reconozca la responsabilidad patrimonial por los daños derivados de
la inscripción como BIC del Sistema Alkerdi Berroberría (AB). [‘( . . . ) The
patrimonial responsibility for the damages derived from the registration as LHI
of the Alkerdi Berroberría System (AB) is recognized’].
(b) El órgano que ordene un acto de ejecución material de resoluciones estará
obligado a notificar al particular interesado la resolución que autorice la
actuación administrativa. [‘The body that orders an act of material execution of
resolutions will be obliged to notify the interested individual of the resolution
that authorizes the administrative action’].
(c) La Administración ha comprobado que existe arte rupestre en el Sistema
Alkerdi Berroberría. [‘The Administration has verified that there is rock art
in the Alkerdi Berroberría System’].
Also relevant are words typical of the idea of causality, such as causa (‘cause’), efecto (‘effect’)
or consecuencia (‘consequence’):
(a) Las partes, además se han pronunciado sobre la cuestión de fondo, de manera
que no se causa indefensión alguna. [‘The parties have also stated their position on the
substantive issue, so that no defenselessness is caused’].
(b) ( . . . ) La inscripción del BIC no produce como efecto, ni revocación de la
prórroga de la explotación que ostentaba la recurrente ni la denegación de
la misma. [‘( . . . ) The registration as LHI does not produce the effect, neither
revocation of the extension of the exploitation that the appellant held, nor the
denial of the same’].
(c) La inscripción en el Registro de Bienes del patrimonio cultural de Navarra
es consecuencia necesaria de su declaración como BIC por ministerio de la
Ley. [‘The inscription in the Register of Listed Heritage Items of Navarra is a
necessary consequence of its declaration as LHI by Ministry of Law’].


Regarding conjunctions and phrases, these are particles that abound in legal
language, precisely because of their ability to relate elements to one another.
Por (‘by’) is a prominent element due to its multiple functions. It works as an
introducer of causal complements and agent complements, the latter employed both
in passive sentences and in participle constructions (Herrero Ruiz de Loizaga, 1992:
342). Likewise, it forms lexicalized phrases, such as por razón de (‘by reason of’),
whose function is to introduce causal subordinate clauses, and it heads
prepositional phrases that are also lexicalized, such as por tanto (‘therefore’) or por
ello (‘for that reason’), which anaphorically take up the previous content and
introduce it with causal value (Pérez Saldanya, 2014: 3450).
Other examples traditionally defined as conjunctions and causal structures are
porque (‘because’), pues, ya que, puesto que (‘since’), dado que (‘given that’) and
al + infinitive (‘as+pronoun+to be’). All of them introduce causal subordinate
clauses and their function in this sentence is that of adjunct, providing information
that justifies or explains the content of the main clause:
(a) La actuación de la administración ha sido anormal porque se procedió a
inscribir el BIC sin previo expediente de declaración ( . . . ). [‘The behaviour of
the administration has been abnormal because the LHI was registered without
prior declaration file ( . . . )’].
(b) No se ha admitido tácitamente los daños ni su valoración, pues la resolución
objeto de la litis no se pronuncia al respecto ( . . . ). [‘The damages or their
assessment have not been tacitly admitted, since the resolution that is the
subject of the dispute does not pronounce on the matter ( . . . )’].
(c) No existe vía de hecho ya que se tramitó expediente administrativo en la
declaración del Sistema Alkerdi Berroberría como BIC. [‘There is no de facto
procedure since an administrative file was processed in the declaration of the
Alkerdi Berroberría System as LHI’].
(d) En el momento de interponer demanda la sentencia no era firme, puesto que
contra ella se había presentado recurso de casación ante el Tribunal Supremo.
[‘At the time of filing the claim, the sentence was not final, since an appeal for
cassation had been filed against it before the Supreme Court’].
(e) ( . . . ) se ha interpuesto de forma prematura y extemporánea dado que pendía
proceso judicial sobre la conformidad o disconformidad a derecho de la
declaración inscripción del Sistema Alkerdi Berroberría como BIC. [‘( . . . )
it has been filed prematurely and extemporaneously, given that judicial
proceedings were pending regarding the conformity or non-conformity with
the right of the declaration of registration of the Alkerdi Berroberría System as
LHI’].
(f) La inscripción de cuevas concretas como Alkerdi I o Alkerdi II no garantizan
la protección del patrimonio al tratarse de un sistema kárstico único. [‘The
inscription of specific caves such as Alkerdi I or Alkerdi II does not guarantee
the protection of the heritage as it is a unique karst system’].
In addition, documents such as the report of the Comisión para la modernización del
lenguaje jurídico (‘Commission for the modernization of legal language’), directed
by Estrella Montolío Durán (Montolío Durán, 2011: 118), describe a use
of the gerund characteristic of legal language, called cause-consequence. Its use is considered
incorrect, since it violates one of the three guidelines proposed by the author,
which is that «la acción del gerundio tiene que realizarse al mismo tiempo o antes
que la acción del verbo principal» (‘the action of the gerund has to take place at the
same time as, or before, the action of the main verb’). This is the case of Interpuesto recurso, por
sentencia de 13 de junio de 2018 ORD 112/2013 se desestimó confirmándose
la resolución [‘An appeal having been filed, by sentence of June 13, 2018 ORD 112/2013 it was
dismissed, confirming the resolution’]. In this example, the dismissal of the appeal
happens first, and then comes the confirmation of the resolution. That is why the
use of the gerund is considered incorrect: the action expressed by the gerund does not
happen at the same time as or before that of the main verb. In this case, in
addition, the requirement that the subject of the gerund be the same as that of the
main clause is not fulfilled (ibid.).
With regard to conditionality, we have already seen that the boundaries between
conditionality and causality are fuzzy. Even so, it is also possible to extract from the
sentence some properly conditional lexicon, such as siempre que (‘provided that’), bajo la
condición de (‘under the condition of’), en (el) caso de (‘in case of’) and una vez
(‘once’) + participle. All these elements introduce conditional subordinate clauses
and express either the conditions for a specific circumstance to occur, as in (a),
or the hypothetical consequences that would occur if another, also hypothetical,
act happened, as in (b). The adverbs solo or solamente (‘only’), together with
si (‘if’), are also relevant; with them the protasis of the subordinate clause
appears postposed and its function is to restrict the meaning of the main clause
(Bosque & Demonte, 1999: 3652), as in (c):
(a) ( . . . ) prorrogables por períodos iguales hasta un máximo de 90 años, bajo la
condición de cumplir el Plan de Restauración del Espacio. [‘( . . . ) extendable
for equal periods up to a maximum of 90 years, under the condition of
complying with the Space Restoration Plan’].
(b) ( . . . ) lo que hubiera podido variar el acto administrativo originario en caso
de haberse observado el trámite omitido. [‘( . . . ) which could have changed the
original administrative act had the omitted procedure been observed’].
(c) ( . . . ) solamente podría estimarse una declaración de daños y perjuicios a
favor de la demandante si la actuación administrativa fuera contraria al
ordenamiento jurídico [‘( . . . ) a declaration of damages could only be upheld
in favor of the plaintiff if the administrative action was contrary to the legal
system’].
These elements appear throughout the sentence; however, an examination that
uses the entire causal and conditional lexicon documented here is highly complex,
especially because many matches have to be discarded: sentences that are causal in themselves may
have no relevance in the chain of causes and effects that leads to the ruling.
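As an illustration of how the lexicon documented in this section could be searched for automatically, the following is a minimal sketch in Python. The marker lists are taken from the examples above, but the grouping into two labels, the regular expressions and the function name tag_markers are our own assumptions for illustration; constructions such as al + infinitive or una vez + participle are left out because they would require morphological analysis. The sketch only finds candidate markers and does not decide whether a match is relevant to the ruling.

```python
import re

# Markers come from the lexicon discussed in the text; the two labels and
# the regular expressions are illustrative assumptions, not the authors' tool.
MARKERS = {
    "causal": [
        r"\bporque\b", r"\bpues\b", r"\bya que\b", r"\bpuesto que\b",
        r"\bdado que\b", r"\bpor tanto\b", r"\bpor ello\b", r"\bpor razón de\b",
    ],
    "conditional": [
        r"\bsiempre que\b", r"\bbajo la condición de\b",
        r"\ben (?:el )?caso de\b", r"\bsi\b",
    ],
}


def tag_markers(sentence):
    """Return (label, matched text) pairs for every marker found in the sentence."""
    hits = []
    for label, patterns in MARKERS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                hits.append((label, match.group(0).lower()))
    return hits


if __name__ == "__main__":
    example = "No existe vía de hecho ya que se tramitó expediente administrativo"
    print(tag_markers(example))
```

Every hit would still need to be reviewed by hand, since a sentence may contain a causal marker without contributing to the chain of causes and effects.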
As we said at the beginning of this paper (see Sect. 4.1), our sentence has a
sub-section within the fundamentals of law called Relevant background for the
resolution of the case. Given its relevance for the resolution, we comment very
briefly on its characteristics: the explicit temporal indication and the use of
the durative present or of impersonal sentences. These last two are due to the claim
of objectivity typical of legal texts (Aguirrezabala & Fanduzzi, 2012: 111).

4.5 Cultural Heritage Sentence: Conditional and Causal Structure
Having shown the logical and lexical properties of causal expressions, we now
apply these distinctive features to the extraction of relevant causal phrases
from the heritage sentence described in point 1, with the purpose of using them to
draw a causal graph that, once pruned with Bayesian methods, will provide the non-specialist public with a summarized interpretation of the sentence.
In the analysis of a legal sentence, it is a challenge to establish what the convicted
actors did, and how and why they did it or engaged in reprehensible conduct that allows
responsibility to be attributed to them. Answering the what-question (what did they
do?) means setting out the facts and laws related to the case; answering the why-question
(why did they do it?) entails showing the necessity or plausibility of the judicial
ruling; and answering the how-question (how did they do it?) means justifying the
reasoning process that shows the coherence of the ruling with the facts and
norms. In order to contextualize the answers to these questions in the analysis of
a sentence, the heading must be retrieved and, from the facts mentioned, those that
allow the conclusion to be inferred must be selected. For the sake of simplicity,
we will illustrate our proposal using a toy example. Generalizing this methodology
to a larger number of sentences is beyond the scope of this paper and remains a
challenge for future work.
In order to address this task, we select the following sentences: (1) Sentences
from the heading. (2) Phrases from the several sections of the heritage sentence (mentioned in point 1) which include causal lexicon, such as the words ‘cause’ and
‘effect’, or other words relevant from a legal point of view, such as ‘fault’ and ‘foreseeability’ (or
their synonyms). (3) Sentences containing inferential words, such as ‘consequently’,
‘inferred’, . . . , linked to the evidentiary process. (4) Sentences containing proven
facts; these constitute an essential part of any judgement since they are used as
premises to infer a conclusion, although they correspond to sentences without a
specific lexicon and are therefore difficult to retrieve by a non-manual procedure.
The lexicon will make it possible to retrieve sentences with causal content. Those
sentences can be represented by means of a graph. The proven facts will make it possible to
instantiate specific nodes of the graph with values and, consequently, to intervene
in the causal chain by isolating the nodes that lose their influence, thus changing the
possible attribution of blame.
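The four selection criteria just described can be sketched as a small routine. This is only an illustrative sketch under our own assumptions: the word lists contain just the example words quoted above, matching is by exact token (so inflected forms and synonyms are missed), the heading is taken to be the first sentence, and proven facts are passed in as manually chosen indices because, as noted, they carry no specific lexicon. The names select_sentences and proven_fact_ids are hypothetical.

```python
import re

# Illustrative word lists limited to the examples quoted in the text.
CAUSAL_WORDS = {"cause", "effect", "fault", "foreseeability"}
INFERENTIAL_WORDS = {"consequently", "inferred"}


def select_sentences(sentences, proven_fact_ids=()):
    """Group sentence indices by the selection criterion they satisfy."""
    selected = {"heading": [], "causal": [], "inferential": [], "proven_fact": []}
    for index, sentence in enumerate(sentences):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if index == 0:
            # (1) The heading is identified by position, not by vocabulary.
            selected["heading"].append(index)
        if tokens & CAUSAL_WORDS:
            # (2) Causal or legally relevant lexicon.
            selected["causal"].append(index)
        if tokens & INFERENTIAL_WORDS:
            # (3) Inferential words linked to the evidentiary process.
            selected["inferential"].append(index)
        if index in proven_fact_ids:
            # (4) Proven facts are marked manually by the analyst.
            selected["proven_fact"].append(index)
    return selected


if __name__ == "__main__":
    toy = [
        "Sentence No. 8/2020, appeal 306/2018.",
        "The registration does not produce the effect of revoking the extension.",
        "Consequently, the claim is inadmissible.",
        "On July 15, 2014, a blast took place.",
    ]
    print(select_sentences(toy, proven_fact_ids={3}))
```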
Using the lexicon referred to in Table 4.2, it is possible to retrieve several
sentences with causal content in the heritage sentence analyzed (see Annex). The
heading and the proven facts deserve a special mention, as they are propositions that
truthfully describe what happened. The heading stands out not so much for
the specificity of its vocabulary as for its position in the legal sentence (it appears
in first place). In the legal sentence, the proven facts are to be found in the section
‘Relevant background for the resolution of the case’, and they are retrieved following the
lexical characterization described in Sect. 4.4.

Table 4.2 Scrutinized lexicon in the heritage sentence

+Causal    −Causal       Evidential
Cause      By absence    Derived
Effect                   Consequence
Damage                   Committed
Loss                     Proved
Fault                    Foreseeable/typical behaviour
Retrieved sentences can be represented by means of a causal graph that summarizes the information they contain. A causal graph consists of nodes
labelled with data or information, and links denoting the causal relationships between
them. An interventionist or counterfactual analysis of the causal graph makes it possible to
fix a possible instantiation of a node and thereby discard other nodes linked
to it as possible causes (Fig. 4.3).
The ruling, which is negative for the interests of Mármoles Baztán
(MB), the efficient cause of the complaint, bases its argumentation on whether
the administration took the decision to register the exploited system as
a Listed Heritage Item (LHI) voluntarily (value 0), as Mármoles Baztán says, or by legal
imposition (value 1). Therefore, the crucial node in the causal tree is the one
highlighted in colour. It is a fact of law that the Administration must register as LHI
any geographical area in which rock art remains are found. Therefore, the node receives
the value 1, becomes active and is in a position to transmit causality to the nodes that follow
it. At the same time, since it is instantiated, it breaks any connection with the nodes
that precede it, deactivating their possible causal influence (Fig. 4.4).
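The effect of instantiating a node can be sketched on a toy graph. The node names below are hypothetical and do not reproduce Fig. 4.3 or Fig. 4.4; the sketch only assumes, as described above, that fixing the value of a node removes every link that points into it while leaving its outgoing links intact. The functions intervene and reachable are our own illustrative helpers.

```python
from collections import deque

# Hypothetical, heavily simplified graph (cause -> list of effects).
# Node names are illustrative and do not reproduce Fig. 4.3.
GRAPH = {
    "administration_discretion": ["registration_as_LHI"],
    "registration_as_LHI": ["quarrying_halted"],
    "quarrying_halted": ["loss_of_profit"],
}


def intervene(graph, node):
    """Fix a node's value from outside: drop every edge that points into it."""
    return {
        cause: [effect for effect in effects if effect != node]
        for cause, effects in graph.items()
    }


def reachable(graph, start):
    """Return the nodes that can still receive causal influence from start."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for effect in graph.get(current, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


# The law fixes registration_as_LHI to value 1, so its incoming edges are cut.
PRUNED = intervene(GRAPH, "registration_as_LHI")

if __name__ == "__main__":
    print(reachable(GRAPH, "administration_discretion"))
    print(reachable(PRUNED, "administration_discretion"))
    print(reachable(PRUNED, "registration_as_LHI"))
```

In the pruned graph the discretionary decision no longer reaches the loss of profit, whereas the instantiated node still transmits causality to the nodes that follow it.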
The administration could have omitted the inscription of the Alkerdi Berroberría
system (AB) as LHI had it not been obliged by law; in that case the causal flow would be
that of the complete graph, and the liability for loss of profit would be causally attributable
to the department of the administration that had made the registration. But if the
law obliges the administration to inscribe as LHI any area in which Neolithic
engravings are found, this exonerates it from any liability for the damage caused.
The interventionist analysis of causality finds here an illustrative and realistic
example, which shows how structural models make it possible to intervene in
causal nodes and attribute degrees of responsibility in terms of proximity to the
effect (cf. Stapleton, 2015).
In the sentences extracted from the sentence and in the graph that summarizes
them, the following issues can be noted in relation to the theoretical background set
out in the previous sections:

Fig. 4.3 Causal graph of the analyzed legal sentence

Fig. 4.4 Instancing and intervention in the causal graph of the sentence

1. Sentences are rich in causal knowledge, as is revealed by the considerable
number of paragraphs in which causal lexicon can be found. The causal content
has descriptive but, above all, evidential relevance, as the sentence mined
from the resolution section shows.
2. Sentences containing the word ‘obliged’ in the factual background may refer to
legal text of necessary or unavoidable compliance and, therefore, to legal facts to
which value 1 must be assigned when they appear. Thus, we have included the
term ‘obliged’ in the section on the logical lexicon.


3. The expression ‘for lack of’ points to a negative cause of prevention by omission:
the failure to publish the restructuring plan of Mármoles Baztán (an event which
is due to intentional agents, not to natural forces) causes the nullity of the
extension of the exploitation concession of the Alkerdi Berroberría system for
the marble company.
4. The words ‘damage’ and ‘harm’ point to the possible liability of the defendant,
from which it is exonerated if the damage is an unintended effect of its activity. The
distinction between a central and a collateral effect is relevant. The but-for test often
serves this purpose, characterizing the necessity of the cause, that is, its status as
conditio sine qua non. Side effects are unintended effects but, in some cases, foreseeable
ones, i.e., derived from the cause.
5. The words ‘derived’ or ‘consequently’ are part of sentences with a logical flavour and
causal implications: ‘for damage deriving from the extension of the concession’
is equivalent to ‘for damage caused by the extension of the concession’.
Consequently, they point to a necessary inference from a premise which is
‘true’ by definition, being so stipulated by law; i.e., something therefore
‘foreseeable’, ‘irresistible’ or ‘force majeure’.
6. A relevant question in this legal sentence, and therefore worthy of attention,
is whether the administration carried out the archaeological studies because of
the cultural alarm caused in the population living near the quarry or because
of another more spurious type of concern, such as the fear of landslides or the
appearance of cracks in their houses. In other words, it is not clear whether
people had suspicions about the archaeological value of the field and this caused
them concern or whether, fearing that the quarry activity would cause damage
to their geographical surroundings or their personal properties, they urged the
administration to carry out archaeological studies. In that case, the cessation of
quarrying activity would have had the desired collateral effect.
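The but-for test mentioned in point 4 can be made concrete with a toy structural model. The following sketch is only an assumption-laden illustration: the single boolean equation, the variable names and the chosen actual values are ours and are not taken from the ruling or from the figures. A variable counts as a but-for cause when flipping its actual value, with everything else held fixed, changes the outcome.

```python
def quarrying_halted(rock_art_found, law_obliges, administration_chooses):
    """Toy structural equation: the system is registered (and quarrying halted)
    either by legal imposition or by a free choice of the administration."""
    registered = (rock_art_found and law_obliges) or administration_chooses
    return registered


def is_but_for_cause(model, actual, variable):
    """Flip one variable, keep the rest fixed, and compare the outcomes."""
    counterfactual = dict(actual)
    counterfactual[variable] = not actual[variable]
    return model(**actual) != model(**counterfactual)


# Hypothetical actual world: rock art exists, the law imposes registration,
# and the administration makes no discretionary choice.
ACTUAL = {
    "rock_art_found": True,
    "law_obliges": True,
    "administration_chooses": False,
}

if __name__ == "__main__":
    for name in ACTUAL:
        print(name, is_but_for_cause(quarrying_halted, ACTUAL, name))
```

Under these assumptions the legal obligation passes the but-for test while the administration's discretion does not, which mirrors the exoneration discussed above.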
This legal sentence shows not only the abundant presence of causal relationships,
but also of theoretical aspects related to this subject. This is pointed out, for example,
in the section devoted to the ‘Legal backgrounds’, where it is stated that a legal
doctrine used in the debate derives from a complex examination of three aspects:
“causality, the possible concurrence or relevance of causes and the existence of a
fault element”, emphasizing multi-causality and the restrictions that must be taken
into account when considering the causal factor, whether in a forward or backward
(counterfactual) interpretation.

4.6 Conclusions and Future Work
In this paper, we have analyzed a sentence on a lawsuit focusing on cultural heritage
in which cave engravings are involved. The sentence shows an argumentative thread
where causal logic, and conditional logic with causal content, have a notable presence. In the
text we note the frequent presence of phrases involving causal relationships between
the proven facts and the damages, whether proved or not, resulting from those facts. With no claim
to completeness, and recognising that important points of law are left out of the
picture, we have summarized the sentence taking into account the facts described
and the causal relationships between them, which allow the verdict to be justified.
Furthermore, we have deactivated the spurious nodes, resulting in a scaled-down version of the above-mentioned graph, useful for informing a
non-specialist public.
Court documents are increasingly available in an electronic format, facilitating
their accessibility and digital processing. Sentences are documents that follow
a general and conventional structure, where specific paragraphs have specific
communicative functions. Also, they contain descriptions of facts and implicit and
explicit references to legal norms. The facts are frequently expressed in a narrative
way; they attend to events or episodes and are relevant because they are premises
that serve to justify the verdict. Causal graphs schematically show the causal links
between norms, facts and ruling, helping to explain the attribution of guilt or
responsibility to the agents involved in the legal trial, focusing on causation as
something typical of intentional agents who have diverse or opposing interests.
The analysis carried out in this paper must be generalized so that it becomes
useful for litigation focused on cultural heritage. Sentences are, in
general terms, difficult to understand for the general public, who are often reluctant
to read them in their entirety. The analysis proposed here can be extended to
other sentences, using techniques of natural language processing and automatic
generation of text summaries, thus placing itself in the tradition of automating
the processing of the information contained in legal texts, as in systems such as LetSum (Farzindar &
Lapalme, 2004) or DIRECT (Hoekstra & Breuker, 2007). Therefore, a fruitful line of future work
is to generalize and automate the technique
of generating causal graphs of sentences on cultural heritage in order to achieve automatic
summaries accessible to everyone, allowing a visual but sufficiently informative
understanding of their content.
Acknowledgement This research was funded by the Spanish Ministry for Science, Innovation and
Universities (grants TIN2017-84796-C2-1-R, PID2020-112623GBI00, and PDC2021-121072C21) and the Galician Ministry of Education, University and Professional Training (grants
ED431C2018/29 and ED431G2019/04). All grants were co-funded by the European Regional
Development Fund (ERDF/FEDER program).

Annex
SENTENCE No. 8/2020
PRESIDENT,
Dª Mª JESÚS AZCONA LABAINO
MAGISTRATES,
D. ANTONIO SÁNCHEZ IBÁÑEZ
Dª ANA IRURITA DIEZ DE ULZURRUN
In Pamplona, February 4 two thousand and twenty.
Seen by the Contentious-Administrative Chamber of this Hon. Superior Court
of Justice of Navarra, constituted by the Magistrates expressed, the judicial decree
of the appeal number 306/2018, filed against Resolution 95/2018 of April 6, of the
General Director of culture and the Príncipe de Viana Institution, being parties in it:
as appellant the entity “MÁRMOLES BAZTÁN, SA”, represented by the Lawyer
Mr. Miguel José Leache Resano and assisted by the Attorney Mr. Oriol Prósper
Cardoso; as defendant the FORAL COMMUNITY OF NAVARRA, represented
and directed by the Lawyer of the Legal Services of the aforementioned Public
Administration; and as co-defendant the insurer “ZURICH INSURANCE PLC
SUC. EN ESPAÑA”, represented by the Attorney Ms. Natividad Izaguirre Oyarbide
and assisted by the Attorney Ms. Olga Triguero Arrojo.

The Background
FIRST. After the appropriate procedural steps, by means of a document presented
on December 28, 2018, the claim corresponding to the appeal of the heading was
formalized, requesting that a sentence be handed down by which, upholding
the appeal, the appealed resolution is annulled, the patrimonial responsibility
for the damages derived from the registration as LHI of the Alkerdi Berroberría
System is recognized, and the Administration is ordered to indemnify Mármoles Baztán, SA
in the amount of 7,923,207 euros for consequential damages and 31,829,938.32
euros, plus the legal interest of said amount calculated from the date of the claim
in administrative proceedings until the notification of the resolution, as well as
the payment of the procedural costs.
SECOND. Once the corresponding transfer was made, in writing presented on
February 13, 2019, the defendant Administration opposed the demand, based on
the facts and fundamentals of law that it deemed appropriate. In the same terms,
the co-defendant made its answering brief.
THIRD. Once the case had been opened for evidence and the conclusions phase completed, a date
was set for voting and ruling, which took place on February 4, 2020, the
reporting judge being Magistrate Ms. ANA IRURITA DIEZ DE ULZURRUN.

Fundamentals of Law
FIRST. Approach to the Contentious-Administrative Appeal
The subject of this contentious-administrative appeal is Resolution 95/2018 of April
6, of the General Director of Culture, Príncipe de Viana Institution, by which the
claim for patrimonial liability made by Mármoles Baztán, S.A. on
August 31, 2017 for the economic damage suffered as a result of registration in
the Register of the Property of Cultural Interest as an Archaeological Zone of the
Alkerdi Berroberría System is declared inadmissible. It is reasoned in the aforementioned resolution that the
registration of the LHI is an act of non-declarative procedure that lacks substantivity
to produce damages, in addition to the impossibility of issuing a pronouncement on
the merits while the judicial pendency persists, since various proceedings are under way
regarding the declaration/inscription of the aforementioned cultural asset. In support
of this reasoning, it cites the sentences of the AN of November 19, 2013, or of the
Supreme Court of October 21, 2008. The fact that the claim is premature, due to lack of exhaustion
of legal remedies, prevents the alleged damage from being produced, specified and
determined (STS July 10, 1992 or September 30, 2014). Consequently, the claim is
inadmissible.
The plaintiff defends the nullity of the aforementioned resolution by affirming
the concurrence of each and every one of the requirements to be able to assess the
patrimonial responsibility of the defendant Administration.
In the first place, it alleges that it is necessary for the administration to issue
an act recognizing or declaring the existence of the LHI, as required by article
40.2 Law 16/1985 on Spanish Historical Heritage. Registration also requires prior
determinations such as the delimitation of the area that is declared LHI, which
requires an administrative act. This is the procedure followed in all the Autonomous
Communities. In this case, there is no administrative act declaring the LHI, so
there has been a de facto procedure, on which the claim of patrimonial responsibility is
based. The inscription of the Alkerdi System in the Registry includes the mention of
the level of protection that corresponds to it, therefore the exploitation of the quarry
cannot continue. Said termination constitutes an effective damage, consisting of the
loss of the expenses incurred and the loss of the future profits of the exploitation.
The causal relationship between the administration’s activity (LHI registration) and
the damages derived from the impossibility of continuing exploitation is evident.
The amount claimed is 7,923,207 euros for expenses and 31,829,938.32
euros for the loss of the exploitation. The damage is unlawful since the behaviour of
the administration has been abnormal because the LHI was registered without prior
declaration file, and the administration must compensate for the damages derived
from the imposed limitation (STS of December 2, 2014). The claim is not premature,
because the LHI statement is executive and has caused damage that is permanent and
can be the subject of a claim.
For all this, it requests that its claim be upheld, that resolution
95/2018 of April 6 be declared null, and that the administration be ordered to pay compensation for
the damages caused (7,923,207 euros for consequential damages and 31,829,938.32
euros for lost profits), the interest from the claim and
the costs of the proceedings.
The defendant Administration requests the dismissal of the contentious-administrative appeal and that the contested resolution be declared in accordance
with the law. It recalls that the claim is based on damages supposedly produced
by the declaration or registration as LHI of the Alkerdi Berroberría system, and that the
declaration was the subject of the appeal ORD 525/2016, which had not been
resolved at the time of issuing resolution 95/2018. The appeal was dismissed by
sentence 200/2018 of May 29, declaring the administrative action in accordance
with law. At the time of filing the claim, the sentence was not final, since an appeal
had been filed against it before the Supreme Court. By order of March 28, 2019, the
appeal was declared inadmissible, but resolution 95/2018 is in accordance with the law when
it states that the claim must be considered untimely and premature, since when the
damage is attributed to the annulment of an act, the term to claim has to start from
the moment it becomes final, as provided in article 67.1 Law 39/2015. In this case, at the time of
issuance of the resolution under appeal, sentence 200/2018 had not become final
and therefore the claim was formulated ahead of time.
Alternatively, the claim should be dismissed. There is no de facto procedure since
an administrative file was processed in the declaration of the Alkerdi Berroberría
System as LHI, inasmuch as there are manifestations of rock art in it. There has been
no defencelessness since it is the current legislation that determines which assets
are considered LHIs. The inscription in the Register of Listed Heritage Items of the cultural
heritage of Navarra is a necessary consequence of its declaration as LHI by ministry
of the Law (articles 24 and 13 of Law 14/2005). The administration was obliged
to register, which is what it has done in this procedure without there being any
abnormal operation of the public services.
Based on the above and in terms of the damages, they are not sufficiently
specified either in the administrative claim or in the lawsuit. Many of them coincide
with those claimed in procedure 37/2017 filed against Resolution 255/2016, of
November 21, of the Technical Secretary General of the Department of Economic
Development of the Government of Navarra, by which the patrimonial claim derived
from the damage suffered as a result of the suspension of the drilling and blasting
works of the quarry, agreed by resolution 668/2014 being parties thereto. Likewise,
the causal relationship between the declaration or registration as LHI of the Alkerdi
Berroberría system and the damage suffered for which it is now claimed has not been
established. Damages are not a consequence of the declaration or registration of the
LHI. The damages or their valuation have not been tacitly admitted, as the resolution
that is the subject of the dispute does not pronounce on the matter because it does
not enter into valuation of said concepts. Quantification is not possible because it
was not carried out in the administrative proceedings. For all this, it requests that
the claim be dismissed.
Zurich Insurance opposes the lawsuit by first claiming that its policy does
not cover the claim that is the subject of this process. Once the foregoing has
been established, it indicates that the inadmissibility is correct, given that the
administrative claim was filed before the ruling was issued on the legal compliance
of the declaration of the Alkerdi Berroberría System as LHI. The final sentence is
the one that determines the moment of birth of the action to claim.
Alternatively, the claim cannot succeed because registration as an LHI is not
the execution of an administrative act but compliance with a legal provision once
the existence of rock art has been proven, in compliance with the provisions of
article 15 and DA 2ª of the Foral Law 14/2005. It was not necessary to process
the procedure provided for in article 19 of the aforementioned Foral Law since
the prerequisite for such a declaration was met, as stated in sentence 200/2018 of
May 29 of this Chamber. The inscription in the Registry is a consequence of the
declaration of the LHI, required by article 24 of the LF 14/2005. It is irrelevant
how the procedures are managed in other Autonomous Communities, since our legislation
has its own regulation. Regarding damages, the same concepts as in PORD
37/2017 and 313/2018 are being claimed, without there being in this case a causal
relationship between the administrative action and said damages.
SECOND. Relevant Background for the Resolution of the Case
From the documents in the file, the following relevant facts emerge, which have
not been denied by the parties:
1. On November 8, 1984, “Mármoles del Baztán, S.A.” was granted the title to the
Alkerdi exploitation concession for the extraction of marble for a period of 30 years,
extendable for equal periods up to a maximum of 90 years, under the condition
of complying with the Natural Space Restoration Plan.
2. By Resolution 901/2013 of October 10 of the General Director of Industry,
Energy and Innovation, the concession of the Alkerdi exploitation, whose object is
marble limestone, was extended for a period of 30 years from 2014, the expiration
date of the initial concession; the exploitation project and the restoration plan
were also approved.
3. On July 15, 2014, a blast took place that caused great alarm in the population
due to the power and noise generated, after which the Urdax City Council
issued a resolution dated July 25, 2014 requiring the concessionaire to comply
with the provisions of resolution 513/1999 of the General Director of Culture
(Príncipe de Viana Institution), issued within the framework of the 1999 Environmental
Impact Study, and, in order to protect the archaeological sites of Berroberría and
Alkerdi, as well as the other caves and existing watercourses, not to
carry out blasting in situ. At the same time, it required the Department of
the Environment, together with the Department of Culture and the Department
of Economy, Finance, Industry and Employment, as well as the CHC, to inspect,
each within its own sphere of competence, the activity being carried out in
the concession so that the appropriate corrective measures could be required.
By resolution 668/2014 of August 12, the blasting was provisionally suspended
and various studies were started on the blasting and on the archaeological status
of the nearby caves.
4. In August 2016, as a result of the research study carried out by the Sociedad
Ciencias Aranzadi, commissioned by the Government of Navarra, a series of
previously unknown engravings from the Paleolithic era were discovered in the cave
of Alkerdi 2 and classified as the oldest in Navarra. The report also
notes that the Karst system as a whole needs to be protected, warning
that continued activity at the quarry would affect undiscovered cultural heritage.
5. As a result of all this, on August 29, 2016, the entire Alkerdi Berroberría system
was registered in the Register of Listed Heritage Items of Navarra as LHI,
in the category of Archaeological Zone. This declaration was the subject of
appeal ORD 525/2016, in which sentence 200/2018 of May 29 was issued. It dismissed
the claim because it found no de facto way in the declaration of the Alkerdi
Berroberría System as LHI and held that it was unnecessary to observe the procedures of
article 19 of the Regional Law 14/2005, given the verification of the existence of
rock art in the caves of the system. The sentence also declared inadmissible the
claim for damages for the stoppage of the exploitation activity, “because it was
understood that they were not a consequence of the declaration of the LHI but,
where appropriate, of the revocation of the extension of the concession” (FJ 7º).
By order of March 28, 2019, the Supreme Court rejected the appeal for cassation.
6. By Regional Order 192/2016 of September 12 of the Minister of Economic
Development, resolution 901/2013 of October 10, by which the extension of the
Alkerdi concession had been granted, was declared null ex officio due
to the lack of publication of the restoration plan. The Foral Order was appealed, giving
rise to ORD 113/2017, in which dismissal sentence No. 251/2018 of June 28 was
issued.
7. Once the extension was processed again, it was denied by Resolution 197/2016 of
October 28 of the General Director of Industry, Energy and Innovation, in view of the
existence of the new LHI. An appeal was filed and, by sentence of June 13, 2018,
ORD 112/2013 was dismissed, confirming the resolution.
8. On August 31, 2017, Mármoles Baztán filed a patrimonial claim with the
Counselor of the Department of Culture, Sports and Youth of the Government
of Navarra for the damages derived from the registration of the LHI, in the
amount of 7,923,207 euros as consequential damages and 31,829,938.32 euros
for the loss of the exploitation concession. The claim was declared inadmissible by
resolution 95/2018 of April 6 of the General Director of Culture and the Príncipe de
Viana Institution, which is the subject of this litigation.
9. On August 31, a patrimonial claim was also filed before the Minister of
Economic Development of the Government of Navarra for the damages derived
from the declaration of nullity of the extension of the mining concession by
Foral Order 192/2016 of September 12, for the amount of 7,923,207 euros; it was
declared inadmissible by resolution 72/2018 of April 20, giving rise to ORD 313/2018.
THIRD. On the Inadmissibility of the Claim of Patrimonial Responsibility
Resolution 95/2018 of April 6 rejects the claim of responsibility articulated by
the appellant company on the understanding that it was filed prematurely and
out of time, given that judicial proceedings on whether the declaration and registration
of the Alkerdi Berroberría System as LHI conformed to the law were pending. On the
date of said resolution, sentence 200/2018 of
May 29, ORD 525/2016, had not been issued on this matter. At the time of filing
the contentious-administrative appeal; on July 25, 2018, the sentence was pending
an appeal before the Supreme Court, an appeal that was finally declared inadmissible by
order of March 28, 2019. That is, at this time it is indeed possible to analyze the claim
of patrimonial responsibility, since the procedural impediment to the birth of the
action invoked by the defendants has disappeared: there is now a final sentence
that affirms the conformity to law of the declaration of the Alkerdi Berroberría
System as LHI. The parties have also made submissions on the substantive issue, so
that no defencelessness is caused.
Notwithstanding the foregoing, it should be noted that the pending
judicial process on the declaration and registration of the LHI was not an obstacle to
resolving the claim of patrimonial liability articulated by Mármoles Baztán SA, since the
action was based on the damages caused to it by the registration of the LHI, which
according to the plaintiff’s thesis was declared by de facto way, and that declaration is
enforceable. That is to say, it was not necessary to wait for a sentence on whether or
not the declaration was in accordance with the law, since the appellant claims for
the damages that, in its opinion, were caused by the declaration of the LHI itself and by
its registration in the corresponding Registry, which affected the exploitation concession
held over the Alkerdi quarry.
FOURTH. Regarding the Patrimonial Responsibility as a Result of the Registration of the Alkerdi Berroberria System as LHI in the Register of Listed Heritage Items of Navarra
The patrimonial liability of the public administration is regulated by articles 32–35
of Law 40/2015 of October 1, on the Legal Regime of the Public Sector, legal
precepts that make explicit the general principle of compensation by the Public
Administrations for damages and losses caused by the operation of public services,
constitutionally enshrined in Spain in article 106.2 of the Constitution, which
indicates that “Individuals, in the terms established by law, shall have the right
to be compensated for any injury they suffer in any of their assets and rights,
except in cases of force majeure, provided that the injury is a consequence of the
operation of public services”. These norms are applicable to Local Entities by virtue
of article 54 of the Regulatory Law of the Bases of the
Local Regime (Law 7/1985, of April 2), which refers to the general legislation on
administrative responsibility, as well as article 223 of the Regulations for the
Organization and Operation of Local Corporations (Royal Decree 2568/1986, of
November 28).
The aforementioned legal regime has been extensively applied – and, consequently,
developed and interpreted – by the jurisprudence (applying both the current
article 32.1 cited above and its predecessor, article 139 of Law 30/1992), forming a
body of doctrine according to which, for the declaration of the
patrimonial responsibility of the Administration, the concurrence of two substantive
positive requirements, one negative and the other procedural, is necessary:
A. The first of the positive ones is that there is effective, economically evaluable
and individualized damage with respect to a person or group of people, which
the interested party does not have the legal duty to bear. This requirement is
among the elements to be proven, although some of its aspects arise
or are manifested within the scope of the arguments of the parties (simplified by
the existence of a catalogue of jurisprudential solutions that can be invoked and
applied without further discussion), such as the extent and nature of the
compensable damages, the persons with standing and the cases in which there is a
legal obligation to bear the damage.
B. The second positive requirement is that the damage is attributable to a Public
Administration. This note is apparently the most complex, since in the common
doctrine of tort liability and liability for illegal acts it involves a complex
examination of the causal relationship, the possible concurrence and relevance
of causes and the existence of elements of fault. However, in administrative
patrimonial responsibility, in the configuration that we have enjoyed since the
1957 Law (even since the 1954 Expropriation Law), it is greatly simplified by
the legal expression that the injury “is a consequence of the normal or abnormal
operation of public services” (articles 122 of the Forced Expropriation Law,
40 of the Law of the Legal Regime of the State Administration and 32 of Law
40/2015). Fundamentally, there are four grounds of imputation for the purpose of
determining the responsibility of an Administration with respect to a specific
injury: that the injury occurs as a direct consequence of the ordinary exercise of
the service; that the injury is due to an abnormality or non-functioning of the
public service; that there is a risk situation created by the Administration in the
area in which the harmful event occurs; or that there is unjust enrichment by
the Administration.
C. The negative requirement is that the damage is not due to force majeure. This note
has been conceptually and jurisprudentially specified in the sense that, for force
majeure to be found, there must be an event that meets
the traditional requirements that distinguish force majeure from the fortuitous
event (concepts of foreseeability and irresistibility) and, specifically, that is a
cause outside the scope of the public service.
D. The procedural element is that the appropriate claim is formulated before the
responsible Administration within a period of 1 year, counting from the occurrence
of the injury. This element raises the question of when the period begins, on which
there are sufficient jurisprudential details, and of the Administration to which the
claim should be addressed if several of them concur.
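As a reading aid, the four requirements A to D can be sketched as a simple checklist. This is only an illustrative sketch and not part of the sentence: the class `LiabilityClaim`, its field names and the function `unmet_requirements` are hypothetical, the example dates are invented, and the 1-year period is approximated as 365 days counted from the occurrence of the injury.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class LiabilityClaim:
    """Hypothetical record of the facts to be assessed (illustrative only)."""
    effective_individualized_damage: bool  # requirement A
    no_legal_duty_to_bear: bool            # requirement A
    attributable_to_administration: bool   # requirement B (causal link)
    force_majeure: bool                    # requirement C (negative)
    injury_date: date                      # requirement D
    filing_date: date                      # requirement D


def unmet_requirements(claim: LiabilityClaim) -> List[str]:
    """Return the labels (A-D) of the requirements that the claim fails."""
    unmet = []
    if not (claim.effective_individualized_damage and claim.no_legal_duty_to_bear):
        unmet.append("A")
    if not claim.attributable_to_administration:
        unmet.append("B")
    if claim.force_majeure:
        unmet.append("C")
    # Assumption: the 1-year period is approximated as 365 days.
    deadline = claim.injury_date + timedelta(days=365)
    if not (claim.injury_date <= claim.filing_date <= deadline):
        unmet.append("D")
    return unmet


# Invented values: only the missing causal link (B) mirrors the court's conclusion.
example = LiabilityClaim(
    effective_individualized_damage=True,
    no_legal_duty_to_bear=True,
    attributable_to_administration=False,
    force_majeure=False,
    injury_date=date(2020, 1, 1),
    filing_date=date(2020, 6, 1),
)
print(unmet_requirements(example))  # ['B']
```

Because the requirements are cumulative, a single failed label is enough for the claim not to succeed, which is how the court reasons when it finds no causal relationship.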
The plaintiff considers that a set of actions of the administration
determined the damage that has been caused, including the extension
of the concession, the investigation of the system through the hiring of the Aranzadi
study society, the declaration as LHI and the delimitation of the Alkerdi Berroberría
area as LHI, including the Alkerdi quarry. The appellant highlights that the delimitation
of the protection zone and the non-continuation of the exploitation are not produced
by operation of the law but are a decision of the administration and both actions
are those that constitute the normal or abnormal action of the administration causing the
damage that it has suffered, consisting of the expenses incurred to continue with the
mining operation after the renewal of the initial concession and the loss of profit
that results from not being able to continue with the activity.
However, the requirements laid down by law and jurisprudence for the
claim to succeed are not met.
First of all, it is necessary to indicate that the registration of the Alkerdi
Berroberría System as an archaeological zone LHI is an administrative action in accordance
with the law. That question, together with the analysis of the procedure followed for
the declaration of the LHI, its extent and its incompatibility with
the blasting system for the extraction of mineral, was argued in procedure ORD
525/2016 of this Chamber and resolved by the final sentence 200/2018 of May 29,
which reasons in this regard:
SIXTH. On the Declaration of LHI by the Ministry of Law and the Protection of the Alkerdi Berroberría System
The plaintiff also challenges the appealed resolution, alleging that
the LHI declaration by operation of the Law contradicts art. 93 of Law 39/2015, which
prohibits the Administration from initiating a material enforcement action that limits
the rights of individuals without previously adopting the resolution that serves as its
legal basis, so that the Administration has acted by de facto way, causing defencelessness.
Art. 93 of Law 30/1992 (although the plaintiff mistakenly cites art. 93
of Law 39/2015, which has no relation to the issue under discussion) establishes
that: “1. Public Administrations shall not initiate any material action of execution
of resolutions that limit the rights of individuals without having previously adopted
the resolution that serves as a legal basis. 2. The body that orders an act of material
execution of resolutions will be obliged to notify the interested individual of the
resolution that authorizes the administrative action”.
The precept is framed in the TV, referring to the execution of administrative acts,
and is not applicable in this case because the registration as LHI of the Alkerdi
Berroberría System is not an execution of a previous administrative act. Rather,
once the existence of rock art has been verified, the inscription is carried out in the
General Register of Listed Heritage Items of the Ministry of Education, Culture and
Sports, with the category of archaeological zone, in compliance with the provisions of
art. 15 and the Second Additional Provision of Foral Law 14/2005, of November 22, on the
Cultural Heritage of Navarra, which establishes that “The following are declared Listed
Heritage Items by operation of this Foral Law: a) Caves, shelters and places that
contain manifestations of rock art, as well as prehistoric megalithic manifestations”.
The Administration has verified that there is rock art in the Alkerdi Berroberría
System and, in compliance with legal provisions, has proceeded to request its
registration as an LHI, in the exercise of its powers (art. 4 of the Foral Law 14/2005).
This does not mean enshrining the impunity of the Administration, because
said registration is subject to the control of the Courts, as is the case with this
contentious-administrative appeal. Nor has material defencelessness been
caused to the plaintiff, which is the only thing that could determine the nullity of the
administrative action, following the constant doctrine of the Constitutional Court,
which has held that defencelessness of constitutional legal significance
occurs only when the interested party is unjustifiably unable to seek the
judicial protection of their rights and legitimate interests, or when the violation of
procedural norms entails the deprivation of the right of
defence, with the consequent real and effective damage to the interests of the affected
party, who is deprived of the right to allege, prove and, where appropriate,
reply to the contrary arguments (Constitutional Court Sentences 31/1984, 48/1984,
70/1984, 48/1986, 155/1988, 58/1989 and 161/2001, among many others).
The STS of February 1, 2018 (ROJ: STS 350/2018 – ECLI: ES:TS:2018:350),
Sentence 139/2018, Appeal 3218/2015, Rapporteur Maria Del Pilar Teso Gamella,
recalls that “the jurisprudence of this Chamber has been especially restrictive
regarding the treatment of this ground for full nullity, based on article 62.1.e) of Law
30/1992, declaring that the formal defects necessary to apply this radical nullity
must be of such a dimension that the procedure has been completely and
absolutely dispensed with, the omission of any one of its steps not being enough.
What must be singularly assessed are the consequences produced by such omission for the
interested party, the lack of defence that has really been caused and, above all, what
could have changed in the original administrative act had the omitted step
been observed (SSTS of October 17, 1991 and May 31, 2000)” (STS of May 5,
2008).
Art. 19 of the Foral Law 14/2005 establishes the procedure for the declaration of
Listed Heritage Items: the initiation agreement; a 30-day public
information process in the case of real estate; the provisional application to the
affected assets of the protection regime established in the provincial law; the hearing
of the interested parties, of the town councils in whose municipal area the real estate
is located and of the Administration Departments of the Provincial Community affected by
reason of their competences; and the issuance of the technical reports necessary for the
description of the property, together with the justification of the relevance and
singular character that determine its declaration as Listed Heritage Item. Likewise,
the mandatory report of the Navarre Council of Culture and the declaration as Listed
Heritage Item by the Government of Navarra, at the proposal of the competent body,
must be included, with the subsequent registration in the Register of Listed Heritage
Items of Navarra, which will be communicated to the General State Administration, the
City Council where the property is located and the interested parties.
This is the general procedure; however, the same law contains the express
provision that caves, shelters and places that contain manifestations of rock
art, as well as prehistoric megalithic manifestations, are Listed Heritage Items
by operation of this Foral Law. This legal provision displaces, in the case of caves
with rock art, the procedure contained in art. 19, and it is the Law itself that
determines that these places, once the existence of rock art has been verified, are
Listed Heritage Items and that, consequently, their inscription in the Register of
Listed Heritage Items of Navarra proceeds.

Therefore, the Provincial Administration has acted in accordance with the
specific provisions for this type of property contained in the Provincial Law 14/2005,
which determines the non-existence of the alleged de facto way.
Regarding the allegation that the technicians of the Administration’s Mining
Section maintain that it is not necessary to protect the entire massif
and that, as rock art has only been found in the Alkerdi and (supposedly) Alkerdi 2
caves, the declaration of LHI by operation of the law can only be understood to cover
those caves, and not the rest of the massif, it must also be rejected in view of the
evidence taken in court. Thus, D. Victorio, a mining
engineer who made a report for the plaintiff, maintains that the karst massif is
irregular, without an established pattern, that the different caves may not be linked,
that a connection study must be carried out between the caves, and that the quarry can
be exploited without blasting and without damaging the heritage. By contrast, D. Jose
María stated that Alkerdi II is the mouth, but that the entire karst system must be
protected, that is, the entire network of connected cavities. They have reached level 3;
level 4, which is where the current river is, is missing. The caves communicate because
there is circulation of air and water. There is painting only in Alkerdi II, and
engravings in Alkerdi I and Alkerdi II. The impact of the quarry is proven because dust,
cut marble fragments, plastics, etc. are entering and water is leaking into the cave.
They are remains of the quarry, and it has already affected the cave. He concludes that
the quarry is incompatible with the cave: even without blasting, the extractive activity
as a whole is incompatible. In the same way, Don Juan Ramón, professor at the school
of mines, states that the caves form a karst system. Dª Estrella, a geologist, declared
that it is a single system and that the entire system must be protected because it is not
fully investigated; quarry activity must not continue. In the square where gravel is
accumulated there are two holes that communicate with the painting room, so, in her
opinion, the operation of the quarry cannot continue, because of the currents of humidity,
air and CO2. There is incompatibility between the quarry and the protection
of the caves. Finally, D. Adriano, a geologist who participated in the study by the
Aranzadi Science Society, stated that Alkerdi II is under the quarry, that the system
is a single one, as are the dynamics of the entire system, and that, therefore, to
protect it, action must be taken on the entire system; he insists, like the other
technicians, that the quarry is incompatible with the conservation of heritage.
Assessing the explanations offered by the different technicians at trial, the
technicians who testified are conclusive and forceful that it is a single karst system
that requires global protection and, therefore, the declaration of the
entire System as Listed Heritage Item. The technician who disagrees with this conclusion,
D. Alonso, says only that the different caves may not be linked; he does not affirm
with verified data that they are not linked and, although he states that, in his opinion,
the quarry can be exploited without blasting and without damaging the heritage,
he does not offer sufficient reasons to prefer his conclusions over
those of all the other technicians who testified in court. The reasons offered
by the experts in geology and archaeology who testified in court endorse
the inscription as a Site of Cultural Interest of the Alkerdi-Berroberría System, not
of isolated caves, and, in this sense, the comparison made by D. Antonio was very
graphic when he pointed out that it would be like protecting an altarpiece in a
church and not protecting the church that contains it. It is not possible to adopt
the restrictive interpretation that the plaintiff postulates in its brief of conclusions,
because the protection of the archaeological heritage must be carried out in an
integral way, and the registration of the entire System as LHI serves
this purpose. The inscription of specific caves such as Alkerdi I or Alkerdi II does
not guarantee the protection of heritage, as it is a single karst system. Due to the
foregoing, this ground of challenge must also be rejected.
That is to say, there was no de facto way in the LHI declaration and it was
necessary, in view of the expert reports carried out, to register the entire Alkerdi
Berroberría system in the Register of Listed Heritage Items of Navarra, including
the Alkerdi quarry. These are actions that the Government of Navarra has carried
out in full compliance with Regional Law 14/2005 of November 22, on the
Cultural Heritage of Navarra, and therefore we are facing a normal operation of
the administration.
Based on the above, the registration of the Alkerdi System as LHI in the Registry
does not produce any more effects than those provided for in LF 14/2005 regarding
the protection regime of the aforementioned assets, which for properties of cultural
interest are included in articles 35 41 of the aforementioned rule. The registration
in itself has no effect on the exploitation of the quarry, which in this case was not
even paralyzed at the time of initiation of the declaration procedure of the LHI
(article 19.1 d) of Foral Law 14/2005: “d) The initiation of the file of declaration of
cultural interest with respect to a real estate will determine the suspension of the
corresponding municipal licenses for subdivision, construction or demolition in the
affected areas, as well as the effects of those already granted. The works that due
to force majeure had to be carried out without postponement in such areas will
require, in any case, the authorization of the Department responsible for culture.”),
since it had been paralyzed by a resolution other than said procedure, namely
Resolution 668/2014 of August 12, a final resolution. Therefore, there is no causal
relationship between the administrative action and the damage that is said to be
borne by the company, since the registration of the LHI does not have as an effect
either the revocation of the extension of the exploitation that the appellant held or
its denial. The revocation of the extension of the concession granted by Resolution
901/2013 of October 10 of the Director General of Industry, Energy and Innovation
of the Government of Navarra was agreed in response to the omission of an essential step
in its processing, namely the submission to public information of the restoration
plan submitted by the applicant company. Subsequently, by Resolution 197/2016 of
October 28 of the General Director of Industry, Energy and Innovation, the extension
of the requested exploitation was denied, since it was understood that the mining
exploitation works on the Alkerdi quarry were incompatible with the preservation of
the LHI System Alkerdi Berroberría, both the Príncipe
de Viana Institution, based on a report prepared by the Aranzadi Studies Society, and
the Environmental Quality and Climate Change Service having issued unfavorable
reports. In other words, the nullity
and denial have been based on issues unrelated to the LHI registration: one of a
procedural nature and the other derived from the assessment of the compatibility
between the specific intended exploitation and the necessary preservation of the
Alkerdi Berroberría System. For this reason, it is necessary to conclude that what
may have generated the damages that are claimed here – expenses incurred
to continue with the exploitation and loss of profits – would in
any case be the nullity of the extension agreed by Foral Order 192/2016 of
September 12 of the Minister of Economic Development and the denial of the extension
by Resolution 197/2016 of October 28 of the General Director of Industry, Energy
and Innovation of the Government of Navarra. Both are the administrative actions
that have affected the rights and expectations that Mármoles Baztán held over the
Alkerdi quarry. This was indicated in the seventh legal basis of sentence 200/2018
issued in ORD 525/16:
SEVENTH. About the Damages Claimed
Finally, the plaintiff alleges that the declaration and registration of the Alkerdi
Berroberría System entails for Mármoles del Baztán SA the paralysis of its
exploitation activity and requests that the Administration compensate it in the
amount of €42,546 per month for the period running from the registration of
the “Alkerdi Berroberría System” in the Register of Cultural Heritage of Navarra
until the ruling is handed down in this procedure, plus the cost of employment
regulation during the same period.
This application must also be rejected because the object of the initial procedure
is whether the registration as LHI of the Alkerdi Berroberría
System in the General Register of Goods of Cultural Interest of the Ministry of
Education, Culture and Sports, with the category of archaeological zone, conforms to
the law, and a declaration of damages could only be upheld in favour of the plaintiff
if the administrative action were contrary to the legal system; as has already been
stated throughout this sentence, the contested administrative action is considered
to be lawful. Furthermore, as the defendant argues, the quantification of the
damages alleged by the plaintiff is not duly accredited, because they have not been
the subject of this proceeding, and they are not damages derived from the LHI
declaration by operation of law but, where appropriate, from the revocation of the
extension of the concession, which is the object of the OP O12/2017 followed before
this Chamber. This determines the rejection of this ground of appeal and, with it,
of the complaint filed, as the challenged administrative action is in accordance with the
Legal System”.
Indeed, this is also how the plaintiff itself appears to understand it, since at the
same time as it filed the claim for patrimonial liability that has given rise to this
litigation, it filed another claim for the damages that, this time indeed, had been
caused to it by Foral Order 192/2016 of September 12 of the Economic Development
Counselor, cancelling the extension of the exploitation concession that had been granted
in 2014. This claim has led to the ordinary procedure 313/2018, which is pending.

In view of the foregoing and reiterating that there is no causal relationship
between the registration of the Alkerdi Berroberría System as LHI and the damages
claimed, the lawsuit cannot succeed, and must be dismissed in this regard.
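The causal reasoning of this legal basis can be sketched as a small directed graph in which an edge means that the court accepts one event as a possible cause of another. This is a minimal illustration rather than a model proposed in the sentence: the node labels, the dictionary `ACCEPTED_CAUSES` and the function `is_cause` are hypothetical names chosen for the sketch, and the edges encode only the relations the court itself accepts.

```python
from typing import Dict, List

# Edges encode only the causal relations accepted by the court in this legal basis.
# Node labels are hypothetical shorthand chosen for this sketch.
ACCEPTED_CAUSES: Dict[str, List[str]] = {
    "LHI registration": ["protection regime (LF 14/2005)"],
    "Resolution 668/2014": ["suspension of blasting"],
    "Foral Order 192/2016": ["nullity of extension"],
    "Resolution 197/2016": ["denial of extension"],
    "nullity of extension": ["claimed damages"],
    "denial of extension": ["claimed damages"],
}


def is_cause(graph: Dict[str, List[str]], source: str, target: str) -> bool:
    """Return True if target is reachable from source along accepted causal edges."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


print(is_cause(ACCEPTED_CAUSES, "LHI registration", "claimed damages"))      # False
print(is_cause(ACCEPTED_CAUSES, "Foral Order 192/2016", "claimed damages"))  # True
```

Under these assumptions, no path leads from the registration to the claimed damages, whereas both the nullity and the denial of the extension do reach them, which matches the conclusion that the causal relationship required for the claim is missing.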
FIFTH. Costs
As for the costs, their imposition is not appropriate in view of the partial upholding of
the claim.
In the name of His Majesty The King and by the authority conferred by The
Spanish Nation,

We Rule
That we must partially uphold the present contentious administrative appeal filed
by the attorney SR LEACHE on behalf of MÁRMOLES DEL BAZTÁN SA
against the agreement already identified in the heading of this resolution, which is
annulled for not being in accordance with the law, rejecting the claim of patrimonial
responsibility filed for the damages derived from the registration of the Alkerdi
Berroberría System as a Listed Heritage Item.
Without costs.
Notify this Judicial Resolution in accordance with article 248 of the Organic Law
of the Judicial Power, stating that against it, it is only possible to lodge a cassation
appeal before the corresponding Chamber, solely and exclusively in the event that there
is objective cassation interest and subject to the established legal requirements,
all in accordance with articles 86 and following of the Law of the Contentious
Administrative Jurisdiction in the wording given by Organic Law 7/2015 of July
21.
Said appeal must be prepared before this Chamber of the Superior Court of
Justice of Navarra within the period of 30 days following the notification of this
Sentence.
The parties are informed that, in all cassation appeals that are
presented, all the writings relating to the corresponding cassation appeal must
strictly comply with the extrinsic conditions and requirements that have been
approved by Agreement of the Chamber of Government of the Supreme Court
and of this Superior Court of Justice of Navarra on 21-4-2016 and 27-6-2016
respectively.
These Agreements are posted on the notice board of this Superior Court of
Justice and published on the website of the General Council of the Judiciary
(www.poderjudicial.es) for general public knowledge.
Thus, by this our sentence, judging definitively, we pronounce, order and sign
it.


References
Aguirrezabala, M., & Fanduzzi, N. P. (2012). Selección de herramientas discursivas para el análisis
del lenguaje jurídico. Foro, Nueva época, 15(2), 105–123.
Armstrong, D. M. (1999). The open door. In H. Sankey (Ed.), Causation and laws of nature (pp.
175–185). Kluwer Academic Publishers.
Barros, D. B. (2013). Negative causation in causal and mechanistical explanation. Synthese, 190,
449–569.
Bosque, I., & Demonte, V. (dir.) (1999). Gramática descriptiva de la lengua española. Espasa
Calpe S.A.
Cummins, D. D., Lubart, T., Alksnis, O., et al. (1991). Conditional reasoning and causation.
Memory & Cognition, 19, 274–282.
Dowe, P. (2000). Physical causation. University of Cambridge Press. https://doi.org/10.1017/
CBO9780511570650
Farzindar, A., & Lapalme, G. (2004). LetSum, an automatic legal text summarizing system. In
T. Gordon (Ed.), Legal knowledge and information systems, Jurix, 2004 (pp. 11–18). The 6th
Annual Conference IOS Press.
Glennan, S. S. (2009). Productivity, relevance and natural selection. Biology and Philosophy, 24,
325–339.
Hart, H. L. A., & Honoré, A. M. (1959). Causation in the law. Oxford University Press.
Hoekstra, R., & Breuker, J. (2007). Commonsense causal explanation in a legal domain. Artificial
Intelligence and Law, 15(3), 281–299.
Honoré, T, & Gardner, J. (2019). Causation in the law. Stanford Encyclopedia of Philosophy
(Fall 2019 edition), Edward N. Zalta (Ed.), https://plato.stanford.edu/archives/fall2019/entries/
causation-law/.
Lagnado, D. A., & Gerstenberg, T. (2017). Causation in legal and moral reasoning. In M. R.
Waldmann (Ed.), The Oxford handbook of causal reasoning (pp. 565–601). Oxford University
Press.
Lehmann, J. & Breuker, J., (2000). On automatic causal reasoning. In J. Breuker, et al. (Eds.),
Legal knowledge and information systems. Jurix 2000 (pp. 123–134). The Thiteenth Annual
Conference. IOS Press.
Li, S. (2017). A corpus-based study of vague language in legislative texts: Strategic use of vague
terms. English for Specific Purposes, 45, 98–109.
Mackie, J. L. (1980). The cement of the Universe: A study of causation. Clarendon Press.
Martí Sánchez, M. (2004). La compleja identidad del léxico jurídico. Estudios de Lingüística
Universidad de Alicante (ELUA), 18, 169–189.
McDermott, M. (1995). Redundant causation. The British Journal for the Philosophy of Science,
46(4), 523–544.
Montolío Durán, E. (dir.) (2011). Estudio de campo: Lenguaje escrito. Comisión para la
modernización del lenguaje jurídico. Ministerio de Justicia.
Pearl, J. (2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge University
Press.
Pérez Saldanya, M. (2014). Oraciones causales. En Company, C. (dir.), Sintaxis histórica de
la lengua española. Tercera parte: Adverbios, preposiciones y conjunciones. Relaciones
interoracionales. Volumen 3 (pp. 3449–3609). México: FCE, UNAM.
Real Academia Española. (2020). Diccionario panhispánico del español jurídico. Consultado en
https://dpej.rae.es/
Ruiz, H., & de Loizaga, F. J. (1992). Algunas Consideraciones en torno al complemento agente.
Revista Española de Lingüística., 22(2), 339–359.
Stapleton, J. (2015). An ‘extended but-for’ test for the causal relation in the law of obligations.
Oxford Journal of Legal Studies, 35(4), 697–726. https://doi.org/10.1093/ojls/gqv005

4 A Causal Model Application to a Cultural Heritage Sentence Analysis

91

Taranilla, R. (2015). El género de la sentencia judicial: Un análisis contrastivo del relato de hechos
probados en el orden civil y en el orden penal. Ibérica, Revista de la Asociación Europea de
Lenguas para Fines Específicos, núm, 29, 63–82.
Wright, R. W. (1985). Causation in tort law. California Law Review, 73, 1735–1828.

Chapter 5

What Archaeological Texts Argue About:
Denotations and Ontological Proxies
Cesar Gonzalez-Perez

Abstract Argumentation-oriented discourse analysis usually focuses on what is
being said and how, following the text under analysis quite literally, and paying
little attention to the things in the world to which the text refers. However,
to perform argumentation-oriented discourse analysis, one must assume certain
conceptualisations by the author in order to interpret and reconstruct propositions
and argumentation structures. These conceptualisations are rarely captured as a
product of the analysis process. In this chapter, we argue that considering the
ontology to which a discourse refers as well as the text itself provides a richer
and more useful representation of the discourse and its argumentation structures,
facilitates intertextual analysis, and improves understandability of the analysis
products. To this end, we propose the notions of ontological proxies and denotations,
i.e. the conceptual artefacts that connect elements in the argumentation structure to
the associated ontology elements, and the propositional segments that anchor these
to the text, respectively.
Keywords Ontological proxies · Argumentation · Discourse modelling ·
Conceptual modelling · Ontologies

C. Gonzalez-Perez
Incipit CSIC, Santiago de Compostela, Spain
e-mail: cesar.gonzalez-perez@incipit.csic.es
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_5

5.1 Introduction
Discourse analysis helps us understand the structure, content and objectives of texts,
contributing to better insights into how people say what they say, how they justify
their claims and, overall, how we construct knowledge. Usually, discourse analysis
focuses on “saying, doing and being” (Gee, 2014), where saying refers to what is
said, doing to the practice of speaking by the author, and being to his or her social
roles. Different discourse analysis techniques such as RST (Rhetorical Structure
Theory) (Mann & Thompson, 1988) or IAT (Inference Anchoring Theory) (Janier
et al., 2016; Reed & Budzynska, 2010) serve different purposes, one of them being
the identification and study of argumentation structures. Argument-oriented
discourse analysis usually proceeds by breaking down a text into meaningful
chunks, such as locutions or utterances, and then constructing a model of how these
chunks are related to each other in terms of argumentation schemes or coherence
relations (Centre for Argument Technology, 2018). The final products of
argument-oriented discourse analysis, in this manner, are diagrams and accompanying
texts that describe what argumentation devices, such as inferences, conflicts or
rephrasings, are being employed by the author.
Naturally, argument-oriented discourse analysis focuses on what is being said
and follows the source text as literally as possible. This is a desirable property,
as being faithful to the text minimises unwanted biases and spurious information
that the analyst might otherwise inject. However, this also has the consequence
that little or no attention is paid to the actual things in the world to which the text
refers. But the analyst must necessarily develop a mental model of what entities are
being referred to by the text in order to understand it, resolve references, construct
meaning and, in general, make sense of the words. In particular, proposition
reconstruction (i.e. rewording the literal locutions in the text so that standalone
propositions can be obtained) often plays a central part in argument-oriented
discourse analysis, as illustrated by the IAT Guidelines (Centre for Argument
Technology, 2018). And reconstructing propositions requires the analyst to guess
or unveil what was in the mind of the authors, so that their words make sense. This
mental model of the discourse domain that the analyst constructs is rarely mentioned
in the discourse analysis literature, despite its apparent centrality. Consequently, it
is rarely captured as a product of the analysis process, and usually lost forever.
Readers or users of the analysis products must re-create this mental model in their
heads again, possibly diverging from the interpretation adopted by the analyst, and
thus hindering the communication and utility of the analysis products.
When the text being analysed involves a situation in which two or more agents
exchange arguments, this issue becomes even more important. The analysis of
dialogical texts, as well as the analysis of independent but intertextually connected
texts, requires the analyst to discover the common ontology shared by the different
authors and interpret their utterances in relation to it. A shared ontology between
authors must exist; otherwise, no communication would be possible. For example,
if an author publishes a note as a response to a paper by someone else, this second
author must share an ontology with the first one in order to respond to him or
her. But, again, this shared ontology is rarely documented, and the products of the
discourse analysis rarely refer to it. In this manner, the reconstructed propositions
and argumentation relations are only anchored on the text but not on the world
external to it, leaving to each reader or user the task of re-creating this ontology in
their heads, hoping that they got it right, and re-interpreting the analysis products in
relation to it.
In this chapter we argue that the mental model that the analyst develops in
relation to the discourse being analysed should be captured during argument-oriented
discourse analysis, and documented as a proper analysis product, so that
users of the diagrams or other artefacts that result from the analysis can refer to it
as necessary. To do this, we propose the use of conceptual models to represent the
relevant parts of the world that the text refers to. In addition, we argue that detailed
connections should be made between the conventional products of argument-oriented
discourse analysis, usually diagrams, and these conceptual models, so
that tracing between discourse and world becomes feasible. These connections are
mediated by conceptual artefacts named ontological proxies. Finally, we argue that
ontological proxies must be anchored to the propositions that refer to them via
denotations.
This chapter is based on a previous article (Gonzalez-Perez, 2020), which is now
updated and presented with an archaeological focus.

5.2 Proposed Approach
The approach followed in this chapter is based on conceptual modelling. This
means that we consider that the product of an argument-oriented discourse analysis
effort is a conceptual model, i.e. a formalised representation of a part of the world
in terms of concepts as dictated by a given formalism or modelling language.
Conceptual models are powerful because they represent a part of the world through
controlled simplification so that we can reason on them and apply the results of
our reasoning back to the part of the world being represented (Gonzalez-Perez,
2018). For example, we can represent the geography of a place through a digital
map in a Geographical Information System, reason on the digital map (for example,
by measuring the distance between two villages), and then apply the conclusions
of our reasoning back to the physical world (we expect these villages to be at the
measured distance). Conceptual models are composed of modelling elements, which
are formalised concepts that adhere to a given formalism or modelling language.
This modelling language is usually described through a metamodel, which defines
what kinds of modelling elements, or primitives, can be used, and how they may
be connected. For example, many modelling languages such as ConML (Incipit,
2020) or UML (OMG, 2017a) establish that the world is to be described in terms
of primitives such as Type and Instance, or equivalent ones, adopting a classical
type/token (Wetzel, 2018) stance.
In our case, the part of the world being modelled is the discourse under analysis,
and the modelling language is a more or less explicit collection of primitives from
which the analysis products are constructed. In our work we use an extended
version of IAT (Janier et al., 2016; Reed & Budzynska, 2010), which defines basic
modelling primitives such as Locution, Proposition, Inference and Illocutionary
Force, as well as specific relationships between them. Even though IAT has not been
described through an explicit metamodel, its major “building blocks” (locutions,
propositions, inferences, etc.) can be readily characterised from the literature. In
this manner, performing a discourse analysis with IAT entails re-expressing what
the text says in terms of IAT’s primitives, i.e. what locutions there are, how they are
reconstructed into propositions, how illocutionary forces anchor each proposition
onto a locution, how inferences connect propositions to drive the argumentation
from premises to conclusions, and so on. In this manner, the final product of
an argumentation-oriented discourse analysis effort is a conceptual model of the
discourse, which describes the discourse in terms of the above-mentioned modelling
primitives. We will call this model a discourse model.
In addition, the central thesis in this chapter is the need for every discourse model
to be accompanied by a conceptual model of the discourse domain, or part of the
world to which the text refers. We will call this model a domain model.
At this point, we must make a clarification. Within information technologies,
the representation of the world has been approached from two different disciplinary
traditions and has thus generated two different sets of terms and assumptions. In the
world of software engineering, the term “conceptual model” is often used, whereas
in the tradition of artificial intelligence and computer systems, the term “ontology”
is more common. The commonalities between conceptual models and ontologies are
far more numerous than their differences (Atkinson et al., 2006; Gonzalez-Perez,
2017; Henderson-Sellers, 2011), so we will use “conceptual model” in this chapter
despite the fact that “ontology” should work equally well.
The fact that the discourse model and the domain model are both conceptual
models allows for their homogeneous treatment as well as their interconnection,
as we explain in further sections. Figure 5.1 summarises our approach.

Fig. 5.1 An author produces a discourse (bottom) referring to a part of the world in their mind
(right-hand side). By looking only at the discourse, an analyst creates a discourse model to
represent the discourse (top left), plus a domain model to represent the associated domain (top
right). Since the discourse refers to the domain (thick arrow, bottom), the discourse model must
somehow refer to the domain model (dashed arrow, top)


There is an extensive body of literature on conceptual modelling (as well as
ontologies), and conceptual modelling is practised today through the use of many
techniques, languages and tools, such as ConML (Gonzalez-Perez, 2018; Incipit,
2020), OntoUML (Suchánek, 2018), OWL (World Wide Web Consortium, 2012)
or even UML (OMG, 2017b). To express discourse models, as introduced above,
we employ a modified and formalised version of IAT (Janier et al., 2016; Reed &
Budzynska, 2010), supplemented with details from the Periodic Table of Arguments
(Wagemans, 2019; Wagemans, 2020), which we call IAT/ML (Gonzalez-Perez &
Pereira-Fariña, 2021). IAT/ML is large and complex, and a thorough description
is out of the scope of this chapter, but this should not matter for the current
discussion, as the approach that we propose is independent of any particular
modelling formalism.
On the other hand, we chose ConML to express domain models, as it is
especially suited to the representation of soft issues such as vagueness, temporality
and subjectivity (Gonzalez-Perez, 2013), which are often important in discourse
analysis. A full description of ConML is out of the scope of this chapter, but we
can offer a brief description. ConML is a general-purpose conceptual modelling
language especially oriented towards the humanities and social sciences. It is based
on the object-oriented paradigm, so its metamodel defines modelling primitives such
as Class, Attribute, Association, Object and Link (Gonzalez-Perez, 2018; Incipit,
2020). This means that ConML models represent parts of the world in terms of
what categories of things (classes) there are, what properties they have (attributes),
how they relate to each other (associations), what particular entities exist (objects),
and how they are connected to one another (links).
Even though the discourse and domain models are both conceptual models, they
are expressed in terms of different languages (IAT/ML and ConML, respectively),
and thus they must be considered two separate models rather than one. Keeping
these models separate also makes sense for modularity reasons. For example, an
intertextuality study addressing commonalities and differences between a set of
related texts may want to use a common domain model for the whole collection
of texts, but obviously one discourse model for each of them. In this manner, the
relationship between discourse models and domain models (top of Fig. 5.1) is
many-to-one.
An example may help. Consider the following excerpt from an archaeological
assessment report (Angove, 2020):
The hedgerows bounding the site to the south-east are shown on the Charlestown Tithe
Map and are therefore historically important using the criteria of the 1997 Hedgerow
Regulations.

Here, the author is describing the fact that some hedgerows are historically
important because they appear in a certain historical document, which, according
to some criteria, constitutes grounds to consider them so. Note that the fragment
does not mention the particular criteria, but only makes a general reference to some
regulations. Similarly, the specific site that the author is discussing is not mentioned,
although we can determine what it is from previous sections in the report. Using
IAT/ML, we would model this fragment as depicted in Fig. 5.2.

Fig. 5.2 An IAT/ML diagram showing the text fragment mentioned above. Locutions are shown
as large boxes on the right-hand side, whereas propositions are shown as large boxes on the left.
Note that an inference, labelled IN18, indicates how propositions are argumentatively related. (The
diagram was prepared with LogosLink, a software tool developed by the author)

The diagram in Fig. 5.2 constitutes a small part of a larger discourse model. To
construct this model, the analyst had to interpret what the author meant. Expressions
such as “the site” or “the criteria” need some reconstruction, as the fragment bears
no reference to what site or criteria are being discussed. In the absence of an explicit
domain model, the discourse model depicted above fails to convey the necessary
information to the reader, who must interpret the diagram themselves and, with luck,
arrive at the same mental model as the analyst who created it.
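
The structure that Fig. 5.2 depicts can be approximated in code. The following Python sketch is only an illustration and is not part of IAT/ML or LogosLink: the class names mirror the IAT primitives named above, but the field names, the locution identifier LO1, the wording of PR12 and the choice of PR10 as the premise of IN18 are assumptions made for the example. Only the locution text and the content of PR10 are taken from the chapter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Locution:
    identifier: str
    text: str  # literal words of the analysed text


@dataclass(frozen=True)
class Proposition:
    identifier: str
    content: str      # standalone rewording of (part of) a locution
    anchored_on: str  # identifier of the locution it was reconstructed from
    force: str = "Asserting"  # illocutionary force; this label is an assumption


@dataclass(frozen=True)
class Inference:
    identifier: str
    premises: Tuple[str, ...]  # identifiers of the premise propositions
    conclusion: str            # identifier of the concluded proposition


locution = Locution(
    "LO1",  # hypothetical identifier
    "The hedgerows bounding the site to the south-east are shown on the "
    "Charlestown Tithe Map and are therefore historically important using "
    "the criteria of the 1997 Hedgerow Regulations.",
)

propositions = {
    p.identifier: p
    for p in (
        Proposition(
            "PR10",
            "The hedgerows bounding the site to the south-east are shown "
            "on the Charlestown Tithe Map.",
            anchored_on="LO1",
        ),
        Proposition(
            "PR12",
            # Assumed wording: the chapter only quotes a fragment of PR12.
            "The hedgerows bounding the site to the south-east are "
            "historically important.",
            anchored_on="LO1",
        ),
    )
}

# Premise and conclusion roles are assumed from the word "therefore".
in18 = Inference("IN18", premises=("PR10",), conclusion="PR12")


def describe(inference: Inference) -> str:
    """Render an inference as 'premise(s) => conclusion' using proposition texts."""
    premises = " + ".join(propositions[i].content for i in inference.premises)
    return f"{premises} => {propositions[inference.conclusion].content}"


print(describe(in18))
```

Nothing in this structure says which site or which criteria are meant, which is the gap that the domain model is intended to fill.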
A domain model of this text fragment would look like the one depicted in Fig.
5.3. This domain model represents the major things that are explicitly mentioned by
the author, such as the site or the Charlestown Tithe Map. It also represents other
things that do not appear in the text but we know about, such as the site’s name
(which is mentioned by the author in previous locutions) or the fact that the 1997
Hedgerow Regulations apply in fact to the site (which is implied by the author). All
in all, this domain model captures the interpretation that the analyst made of the
discourse and can be used as a reference to better understand the discourse model.
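
The content of Fig. 5.3 can also be written down as data. The Python sketch below is a plain stand-in for a ConML model, not ConML itself: the DomainObject and Link classes and the linked_from helper are hypothetical names introduced here, and the endpoints and direction of the two links are assumed from the surrounding text (the regulations apply to the site; the map shows the hedgerows). The identifiers, categories and values are those shown in the figure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DomainObject:
    identifier: str
    category: str  # the class of the object, e.g. "Site"
    values: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Link:
    association: str
    source: str  # identifier of the object the link starts from
    target: str  # identifier of the object the link points to


objects = {
    o.identifier: o
    for o in (
        DomainObject("TheSite", "Site", {"Name": "Land off Mill Lane"}),
        DomainObject("HR1997", "CompoundNorm",
                     {"Title": "Hedgerow Regulations", "Year": 1997}),
        DomainObject("Hedgerows", "ConstructiveElement",
                     {"Location": "SE", "HistoricallyImportant": True}),
        DomainObject("TitheMap", "Map", {"Name": "Charlestown Tithe Map"}),
    )
}

links = [
    Link("AppliesTo", "HR1997", "TheSite"),       # endpoints assumed
    Link("Represents", "TitheMap", "Hedgerows"),  # endpoints assumed
]


def linked_from(identifier: str, association: str) -> List[DomainObject]:
    """Return the objects reached from `identifier` through `association`."""
    return [
        objects[link.target]
        for link in links
        if link.source == identifier and link.association == association
    ]


for obj in linked_from("TitheMap", "Represents"):
    print(obj.identifier, obj.values)
```

Keeping every object addressable by its identifier is the property that matters for what follows, since identifiers are what a discourse model can point at.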
At this point, the question remains as to how elements in the discourse model
should be connected to elements in the domain model, as depicted by Fig. 5.4.
The discourse and domain models are different models, each using a different
language, so there is no common formalism that may establish the rules for the
necessary connection. In other words, neither the IAT/ML metamodel nor the ConML
metamodel can represent both propositions and entities in the world. In addition,
IAT offers no modelling primitive to represent fragments of a proposition, such as
“are historically important” or “The 1997 Hedgerow Regulations” in Fig. 5.4. To
address these issues, we propose the notion of ontological proxy, as well as the
related notion of denotation.

[Figure 5.3: ConML object diagram with four entities: TheSite: Site (Name = “Land off Mill Lane”);
HR1997: CompoundNorm (Title = “Hedgerow Regulations”, Year = 1997); Hedgerows:
ConstructiveElement (Location = SE, HistoricallyImportant = true); TitheMap: Map (Name =
“Charlestown Tithe Map”). Links are labelled AppliesTo and Represents.]
Fig. 5.3 A ConML diagram showing a domain model for the text fragment mentioned above.
Boxes represent entities in the world. For each one, an identifier and a category are given, separated
by a colon. For some entities, values are stated, such as in the case of Location = SE for Hedgerows.
Lines connecting boxes stand for links between entities and are labelled accordingly

Fig. 5.4 Diagram fragments for the discourse and domain models are displayed here. Blue
arrows connecting them stand for the expected connections between elements in the discourse
and elements in the domain. Discourse fragments have been highlighted in different shades for
clarity. For example, the words “the Charlestown Tithe Map” in proposition PR10 (top left) must
be connected to the TitheMap: Map entity (bottom right)


5.3 Results
An ontological proxy is an element in a discourse model that stands for another
element in the associated domain model, and which may be referenced by multiple
propositions. Let us unpack this definition and explore its consequences.
• Ontological proxies are model elements. This means that, like any other model
elements, they are formalised concepts in the mind of the analyst (Gonzalez-Perez,
2018), and are usually communicated via depictions in diagrams or other media.
• Ontological proxies are elements in the discourse model. This means that the
IAT/ML metamodel must contain suitable modelling primitives to accommodate
them. In other words, the IAT/ML metamodel must define primitives for ontological proxies as well as locutions, propositions and inferences.
• Every discourse model must have an associated domain model. As we introduced
above, a common domain model may be shared by multiple discourse models,
but every discourse model must have one and only one domain model.
• Each ontological proxy stands for one element in the associated domain model.
By “stand for” here we mean that they can work as simpler replacements of the
referred-to domain elements, since both an ontological proxy and the associated
domain element represent the same thing in the world. It is for this reason that
they are called “proxies”.
• Ontological proxies must be simpler than the associated domain elements;
otherwise, there would be no point in using them. Also, and for the sake of
modularity, ontological proxies must be as independent as possible from the
modelling language employed to express the domain model. For these two
reasons, ontological proxies must be lightweight and minimal.
• Each ontological proxy may be referenced by multiple propositions. Actually, it
is fragments of propositions that refer to ontological proxies, as highlighted in
Fig. 5.4. Each proposition fragment that refers to an ontological proxy is called a
denotation.
These consequences have been used as design criteria to extend the IAT/ML
metamodel and incorporate the necessary constructs to support ontological proxies.
The following subsections describe these criteria and the associated implementation
in greater detail.

5.3.1 IAT/ML Metamodel
As described above, the IAT/ML metamodel must provide modelling primitives to
express ontological proxies and denotations. Figure 5.5 shows the relevant part of
the metamodel.

[Figure 5.5: class diagram. Classes shown: Model; Proposition; Denotation (Range: 1 con TextRange,
Content: 1 Text); Ontology (Identifier: 1 Text, Name: 1 Text); OntologyElement (A) (Identifier:
1 Text). Denotation is connected to OntologyElement through the RefersTo association, with role
Target.]
Fig. 5.5 Diagram depicting a section of the IAT/ML metamodel. Model on the top right refers to
discourse models. Ontology refers to domain models (but see text below for details)

According to the metamodel, every discourse model (simply called Model in
Fig. 5.5) has an associated domain model (called Ontology in the figure). We said
in previous sections that multiple discourse models can share a common domain
model. However, the Ontology class in Fig. 5.5 does not represent domain models
themselves, but the proxy image of a domain model that is kept by a discourse
model. In other words, and from the perspective of a discourse model (Model in Fig.
5.5), Ontology represents a private and simplified copy of the associated ontology.
Consequently, this relationship has been modelled as a one-to-one whole/part
association.
Furthermore, every private and simplified ontology contains a number of ontological proxies, called ontology elements in the metamodel. OntologyElement is an
abstract class, as indicated by the “(A)” marker in Fig. 5.5. This means that it has
a number of subtypes representing different kinds of ontological proxies, which we
discuss below.
Reading now from left to right in the diagram, every proposition has a number of
denotations. A denotation is a fragment of a proposition that refers to an ontology
element. The concept of denotation allows us to pick specific words or phrases in a
proposition that clearly refer to an element in the ontology, such as “are historically
important” in PR12 or “The 1997 Hedgerow Regulations” in PR14 in Fig. 5.4.
Figure 5.6 depicts a sample instance model conforming to the metamodel in
Fig. 5.5. In the figure, the ontological proxies are the objects of type
OntologyElement. These objects have an Identifier value whose contents match the
identifiers of elements in the domain model. This matching relationship is what makes
ontological proxies work precisely as proxies. Note that, in the diagram, proxy
relationships are shown as blue arrows between the associated elements, but they do
not exist as formal relationships as such, since, as we explained above, the discourse
and domain models are expressed using different languages. In any case, both human
users of the models and computers processing them can easily find these matches and
thus navigate the proxy relationships.
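
A minimal sketch of how the instance model of Fig. 5.6 could be held in code, and of how a proxy relationship can be navigated by identifier matching, is given below. It is written in Python with hypothetical class, field and function names and does not reproduce the IAT/ML metamodel or any LogosLink API. It assumes that Range holds zero-based, inclusive character offsets into the proposition content, which is consistent with the values 0..48 and 67..87 in the figure, and it stands in for the domain model with a plain dictionary keyed by identifier.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class OntologyElement:
    """Ontological proxy: a lightweight stand-in for a domain model element."""
    proxy_id: str    # identifier within the discourse model, e.g. "AT3"
    identifier: str  # must match an identifier in the domain model


@dataclass(frozen=True)
class Denotation:
    denotation_id: str
    start: int  # first character offset (assumed inclusive, zero-based)
    end: int    # last character offset (assumed inclusive)
    content: str
    refers_to: OntologyElement


@dataclass(frozen=True)
class Proposition:
    proposition_id: str
    content: str
    denotations: Tuple[Denotation, ...]


at3 = OntologyElement("AT3", "Hedgerows")
at4 = OntologyElement("AT4", "TitheMap")

pr10 = Proposition(
    "PR10",
    "The hedgerows bounding the site to the south-east are shown on the "
    "Charlestown Tithe Map.",
    (
        Denotation("DN20", 0, 48,
                   "The hedgerows bounding the site to the south-east", at3),
        Denotation("DN21", 67, 87, "Charlestown Tithe Map", at4),
    ),
)

# Stand-in for the separate domain model (a ConML model in the chapter).
domain_model: Dict[str, Dict[str, Any]] = {
    "Hedgerows": {"category": "ConstructiveElement",
                  "Location": "SE", "HistoricallyImportant": True},
    "TitheMap": {"category": "Map", "Name": "Charlestown Tithe Map"},
}


def ranges_are_consistent(prop: Proposition) -> bool:
    """Check that every denotation's content equals the text at its range."""
    return all(prop.content[d.start:d.end + 1] == d.content
               for d in prop.denotations)


def resolve(denotation: Denotation) -> Optional[Dict[str, Any]]:
    """Follow a proxy to the domain element with the matching identifier."""
    return domain_model.get(denotation.refers_to.identifier)


print(ranges_are_consistent(pr10))
for d in pr10.denotations:
    print(d.denotation_id, "->", resolve(d))
```

Because the connection is only a shared identifier, a dangling proxy (one whose identifier has no counterpart in the domain model) is not prevented by construction; resolve returns None in that case, and a tool would need to check for it.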
As we said above, and as depicted in Fig. 5.5, OntologyElement is an abstract
class and has a number of subtypes, corresponding to the different kinds of ontology
elements that are common in domain models. Of course, there are many languages
that one could use to express a domain model, so the IAT/ML metamodel must be
generic enough to cater for as many as possible. For this purpose, we decided
to implement a small but varied range of subtypes of OntologyElement, with the
design goal that at least languages such as ConML, OntoUML and OWL should be
supported. Most conceptual modelling languages adopt an object-oriented approach
and hence include primitives such as Class, Attribute, Object and Link. However,
terminology varies between languages, and the specific semantics of the major
primitives are also slightly different. Most languages, however, share the fact that
they distinguish clearly between types and instances (or categories and entities,
depending on the terminology used) as a major architectural principle around which
their metamodels are organised. This means that ontological elements could also be
organised along these lines. However, we felt that adopting a multilevel modelling
approach (Atkinson & Kühne, 2001; Clark et al., 2014) would entail little extra
complexity and provide a much richer and more expressive ontological infrastructure.
Multilevel modelling allows chains of type/instance relationships of arbitrary length,
thus enabling the homogeneous treatment of types and instances for many common
purposes and supporting higher-order types with a rather simple structure. For these
reasons, we adopted the multilevel modelling principles sketched by Almeida et al.
(2018) and designed the OntologyElement subtype hierarchy shown in Fig. 5.7.

[Figure 5.6: instance diagram. Discourse model (above the line): LandOffMillLane: Model;
PR10: Proposition (Content = “The hedgerows bounding the site to the south-east are shown on
the Charlestown Tithe Map.”); DN20: Denotation (Range = 0..48, Content = “The hedgerows
bounding the site to the south-east”); DN21: Denotation (Range = 67..87, Content = “Charlestown
Tithe Map”); Default: Ontology; AT3: OntologyElement (Identifier = “Hedgerows”); AT4:
OntologyElement (Identifier = “TitheMap”). Each denotation is connected to its ontology element
by RefersTo, with role Target. Domain model (below the line): TitheMap: Map (Name =
“Charlestown Tithe Map”); Hedgerows: ConstructiveElement (Location = SE,
HistoricallyImportant = true).]
Fig. 5.6 Diagram depicting how ontological proxies work. Above the line, an instance model
conforming to the metamodel in Fig. 5.5 is shown, stating that proposition PR10 has two
denotations for “The hedgerows ...” and “Charlestown Tithe Map”. Each denotation refers to
a particular ontological element of the discourse model’s associated domain model (ontology).
Below the line, a fragment of the associated domain model from Fig. 5.3 is shown. Blue arrows
across the line depict the fact that ontological elements work as proxies to elements in the domain
model, as shown by the matching identifiers “Hedgerows” and “TitheMap”

The first subtype of OntologyElement is Entity, which represents things in the
world such as the computer I am using, my house, the Second World War, or the
5/2016 Act on Cultural Heritage, for example. Anything in the world may be an
entity. Entities are characterised through facets of two kinds: values and references.
Values represent atomic qualities or quantities of entities, such as the fact that I
am 53 years old or that the Second World War began in 1939. References, in turn,
represent connections between entities, such as the fact that I (an entity) work at
Incipit CSIC (another entity), or that the 5/2016 Act on Cultural Heritage (an entity)
applies in Galicia, Spain (another entity).

[Figure 5.7: class diagram. Classes shown: OntologyElement (A), Entity, Atom, Category,
Facet (A), Value, Reference, Feature (A), Property and Association. Associations in the diagram
are labelled IsOfType, isSubTypeOf, RefersTo and Opposite.]
Fig. 5.7 Part of the IAT/ML metamodel showing the class hierarchy under OntologyElement.
Please see the text below for a detailed description of each model element

Entities come in two kinds, depending on whether or not they can be instantiated,
as described in the multilevel modelling literature (Almeida et al., 2018; Clark et al.,
2014). Some entities are not instantiable, that is, they cannot work as templates for
other entities. These are called “particulars” (and sometimes “atoms”) in philosophy,
“ur-elements” in mathematics, or “objects” in the object-oriented approach in
software engineering. We call them atoms. Some examples of atoms include myself,
the Second World War, or the 5/2016 Act on Cultural Heritage.
Some other entities, as opposed to the previous ones, can be instantiated into other
entities, working as templates for them, and usually corresponding to generic
concepts or ideas. For example, the notion of Tree can be instantiated into individual
trees, such as each of the trees I can see through the window as I type this sentence.
Similarly, the notion of Person is instantiated into each individual person. These

C. Gonzalez-Perez

instantiable entities are called “universals” in philosophy or “classes” in object-oriented software engineering. We call them categories. In general, we can say that
every entity has a category as type, since, in the words of George Lakoff, “There
is nothing more basic than categorization to our thought, perception, action, and
speech” (Lakoff, 1990). For example, I am of the Person category, the Second World
War is of the ArmedConflict category, and the 5/2016 Act on Cultural Heritage is
of the Law category. In practice, and especially when constructing ontologies with
some degree of uncertainty, we do not know or are not interested in the category of
some entities, so specifying them is not mandatory.
Now, since categories are also entities, they can have values and references. In
addition, they can be characterised through two extra kinds of features: properties
and associations. Properties define possible values of the entities of the category.
For example, since every person has a value for their age, then we can capture this
fact by stating that the Person category has an Age property. Similarly, associations
define possible references of the entities of the category. For example, since every
person has been born in a particular place, then we can capture this fact by stating
that the Person category has a WasBornIn association towards the Place category.
In this manner, the IAT/ML metamodel supports ontological proxies of six
concrete kinds: atoms, values, references, categories, properties, and associations.
Although some types of modelling primitives are not covered (such as OntoUML
non sortals, for example), these six kinds map nicely to the major modelling
primitives of almost any conceptual modelling language, as exemplified by Table
5.1.
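
As a rough illustration, the six concrete kinds described above can be sketched as plain Python data classes. This is a minimal sketch rather than the IAT/ML metamodel itself: the class layout and the field names (`name`, `content`, `target`, `category`, and so on) are assumptions made for readability, loosely following Fig. 5.7.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Property:
    """Defines a possible value of the entities of a category."""
    name: str


@dataclass
class Association:
    """Defines a possible reference of the entities of a category."""
    name: str
    target: "Category"


@dataclass
class Value:
    """An atomic quality or quantity of an entity."""
    name: str
    content: object


@dataclass
class Reference:
    """A connection from one entity to another."""
    name: str
    target: "Entity"


@dataclass
class Entity:
    name: str
    category: Optional["Category"] = None  # optional: the category may be unknown
    values: List[Value] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)


@dataclass
class Atom(Entity):
    """An entity that cannot be instantiated."""


@dataclass
class Category(Entity):
    """An instantiable entity; as an entity, it may also carry values and references."""
    properties: List[Property] = field(default_factory=list)
    associations: List[Association] = field(default_factory=list)


# The Person example from the text (identifiers are illustrative).
place = Category("Place")
person = Category("Person")
person.properties.append(Property("Age"))
person.associations.append(Association("WasBornIn", place))

author = Atom("Author", category=person)
author.values.append(Value("Age", 53))
```

In this sketch the category of an entity is optional, reflecting the remark that specifying it is not mandatory.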
We must also remark that the notation used in Fig. 5.6 is convenient to visualise
the details of the data structures implementing the models. However, we suggest a
different notation for most practical purposes, which is shown in Fig. 5.8.
The following sections provide guidance on how to find ontological proxies as
well as some examples to illustrate how they can be used in practice.
Table 5.1 Mappings between IAT/ML ontology element subtypes and modelling primitives of
common conceptual modelling languages

IAT/ML       ConML             OntoUML          OWL
Atom         Object            (Not supported)  Individual
Value        Value             (Not supported)  DataProperty
Reference    Reference         (Not supported)  ObjectProperty
Category     Class             RigidSortal      Class
Property     Attribute         Property         (Handled through axioms)
Association  Semi-association  Relation         (Handled through axioms)

[Figure 5.8: discourse model with Atom proxies Hedgerows and TitheMap; domain model with Hedgerows: ConstructiveElement (Location = SE, HistoricallyImportant = true) and TitheMap: Map (Name = “Charlestown Tithe Map”).]

Fig. 5.8 This depicts the same situation that was shown in Fig. 5.6, but using the IAT/ML notation
introduced earlier plus some additional lines and symbols. Ellipses represent ontological proxies,
that is, instances of OntologyElement in Fig. 5.7. Matching elements in the domain model are
shown to the right

5.3.2 Constructing Ontological Proxies
As we described in previous sections, ontological proxies are model elements. This
means that they are mental constructs that adhere to a well-known formalism or
modelling language. In this section we tackle the issue of how ontological proxies,
as model elements, are constructed.
As explained above, ontological proxies are referred to by fragments of propositions. In Fig. 5.8, for example, the fragments “The hedgerows bounding the site to
the south-east” and “the Charlestown Tithe Map” are highlighted to indicate that
they correspond to denotations, each of them referring to an ontological proxy.
So, in order to determine what ontological proxies must be constructed for a given
proposition, we must take into account the following guidelines.
First, it is important to acknowledge that conceptual modelling is always done
for a purpose, i.e. it is a situated activity driven by a goal. Two models of the same
part of the world but pursuing different goals are likely to be very different. In
addition, conceptual modelling, as a concept-creation process, is clearly dependent
on subjective traits of the analyst such as academic and cultural background or
personal preferences. Consequently, it is impossible to provide clear-cut rules as
to how to construct ontological proxies; only approximate guides can be offered.
Having said this, it is safe to say that the process to construct ontological proxies
is often driven by an examination of the lexicon and grammar employed by the
proposition at hand, with the goal of answering the question “what is this sentence
talking about?”. For example, in “The hedgerows bounding the site to the south-east are shown on the Charlestown Tithe Map” in Fig. 5.8, we can observe the
following:
• The subject “The hedgerows bounding the site to the south-east” refers to some
hedgerows.
• The verb “are shown on” indicates a representation relationship between these
hedgerows and a map.
• The complement “the Charlestown Tithe Map” refers to the medium supporting
this representation.

[Figure 5.9: domain model in which TitheMap: Map (Name = “Charlestown Tithe Map”) is linked by a Represents association to Hedgerows: ConstructiveElement (Location = SE, HistoricallyImportant = true).]

Fig. 5.9 Domain model depicting the observations made from the analysis of proposition PR10 in
Fig. 5.8

This means that the proposition contains three denotations, which in turn hint
at three potential entities: some hedgerows, which we can conceptualise as a
constructive element; a representation relationship; and a medium on which this
representation is captured, i.e. the map. It also expresses connections between them:
“The hedgerows . . . ” points at the thing being represented, and “the Charlestown
Tithe Map” points at the thing doing the representation. We can depict this by the
domain model in Fig. 5.9.
Note that, in the domain model, we state that the map represents the hedgerows,
rather than saying that the hedgerows are shown on the map, which would be
closer to what the text literally says. Reconstructions like these, which do not
alter the semantics significantly, can be safely done when domain modelling if
they produce models that are clearer and easier to understand. In our case, we are
using the Cultural Heritage Abstract Reference Model (CHARM) (Gonzalez-Perez
et al., 2018; Gonzalez-Perez & Parcero Oubiña, 2011; Incipit, 2016) as guidance for
domain modelling, so some category and association names are taken from it and
adapted as necessary for better interoperability of the resulting model.
The domain model, as is, contains two entities, named by the identifiers
Hedgerows and TitheMap. Note also that we have chosen particular categories
for these entities: Hedgerows is a ConstructiveElement and TitheMap is a Map.
These categories are taken from CHARM, but other options may be also valid. For
example, stating that the hedgerows referred to by proposition PR10 constitute a
constructive element may not be shared by everyone, as it responds to a particular
conceptualisation of the landscape. If we suspected that model users may not share
this ontology, then we should rather employ a different category such as the noncommittal StructureEntity. Choosing the right category is not always easy, as often
there is not much information in the text about what “right” means in this context.
Using a domain-specific reference model or ontology, as we did with CHARM in
this example, can be useful, as it provides a catalogue of common concepts in the
domain to choose from.
In this example, all the denotations refer to entities in the world, or atoms in
our domain model. Other propositions may refer to other kinds of ontological
elements, such as values or references. For example, PR12 in Fig. 5.4 states that
“The hedgerows bounding the site to the south-east are historically important”.
Here, the fragment “are historically important” can be interpreted as denoting a

value for the entity denoted by “Hedgerows”, namely the predication that they are
historically important (represented by the HistoricallyImportant value in the figure).
In general, proper nouns or qualified noun phrases, such as “the hedgerows” or
“the 1997 Hedgerow Regulations” usually denote material or immaterial entities.
Verbal phrases headed by dynamic verbs such as “excavate” or “reconstruct” (not in
our example) usually denote processes or activities. Both can be modelled through
Entity ontological proxies. Verbal phrases with stative or passive-mode verbs, such
as “are shown on” or “state” often denote predications of values or references on
the subject entity, which can be modelled through Value and Reference ontological
proxies. Adjectival clauses such as “historically important” usually denote the
content of values or references. A special mention should be made of phrases with
the verb “to be”, as this verb may carry different meanings in many languages.
In English, for example, “to be” may indicate either existence (e.g. “there is a
site”), which would be modelled through an Entity; identity (e.g. “this area is the
destination of mass migrations”), which can be also modelled as an Entity plus a
Reference; predication (e.g. “the artefact is 12 cm long”), which is best modelled as
a Value or a Reference; classification (“this is a post hole”), which can be modelled
through an Entity and a Category; or subsumption (“a house is a structure”), which
should be modelled through two related instances of Category (Gonzalez-Perez,
2018). Sentences containing “to be” must be carefully analysed.
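
The senses of “to be” listed above can be summarised as a small lookup table. The following is a minimal sketch, assuming that the analyst has already decided which sense applies to a sentence; the dictionary and function names are hypothetical and only restate the suggestions made in the text.

```python
# Senses of "to be" and the model elements suggested for each, as listed in the text.
TO_BE_MODELLING = {
    "existence": ["Entity"],
    "identity": ["Entity", "Reference"],
    "predication": ["Value or Reference"],
    "classification": ["Entity", "Category"],
    "subsumption": ["Category", "Category"],
}

# Example sentences from the text, tagged by hand with their sense.
EXAMPLES = [
    ("there is a site", "existence"),
    ("this area is the destination of mass migrations", "identity"),
    ("the artefact is 12 cm long", "predication"),
    ("this is a post hole", "classification"),
    ("a house is a structure", "subsumption"),
]


def suggest_elements(sense):
    """Return the model elements suggested for a given sense of 'to be'."""
    if sense not in TO_BE_MODELLING:
        raise ValueError(f"unknown sense of 'to be': {sense}")
    return TO_BE_MODELLING[sense]


for sentence, sense in EXAMPLES:
    print(f"{sentence!r}: {sense} -> {', '.join(suggest_elements(sense))}")
```

The table does not decide the sense itself; that judgement remains with the analyst.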
Note that this lexical and grammatical analysis of propositions allows us to
define elements in a domain model, rather than the ontological proxies themselves.
Ontological proxies, by definition, are lightweight replacements for elements in the
domain model, so once this model has been created and is stable, an ontological
proxy can be constructed for each model element. Going back to the example in
Fig. 5.9, we would construct three ontological proxies: an Atom for Hedgerows,
another Atom for TitheMap, and a Value for HistoricallyImportant = true.
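
The step from domain model to ontological proxies can be sketched in a few lines. This is a minimal sketch under stated assumptions: the domain model is reduced to nested dictionaries, proxies are built only for the elements that the text actually denotes, and the names `Proxy`, `domain_model`, `denotations`, and `build_proxy` are hypothetical rather than part of IAT/ML or LogosLink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    kind: str   # "Atom", "Value", ...
    label: str


# Domain model from Fig. 5.9, reduced to nested dictionaries (an assumption of this sketch).
domain_model = {
    "Hedgerows": {
        "category": "ConstructiveElement",
        "values": {"Location": "SE", "HistoricallyImportant": True},
    },
    "TitheMap": {
        "category": "Map",
        "values": {"Name": "Charlestown Tithe Map"},
    },
}

# Denotations found by the analyst: text fragment -> (entity, value name or None).
denotations = {
    "The hedgerows bounding the site to the south-east": ("Hedgerows", None),
    "the Charlestown Tithe Map": ("TitheMap", None),
    "are historically important": ("Hedgerows", "HistoricallyImportant"),
}


def build_proxy(entity, value_name=None):
    """Build a lightweight proxy for an entity or for one of its values."""
    element = domain_model[entity]  # KeyError if the element is not in the model
    if value_name is None:
        return Proxy("Atom", entity)
    content = element["values"][value_name]
    return Proxy("Value", f"{value_name} = {content}")


proxies = {fragment: build_proxy(*target) for fragment, target in denotations.items()}
for fragment, proxy in proxies.items():
    print(f"{fragment!r} -> {proxy.kind} {proxy.label}")
```

The proxies carry only a kind and a label, which is what keeps them lightweight compared with the full domain model elements.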
As we proceed to analyse more propositions in the same discourse, we would
be adding to the domain model, or altering it to accommodate new elements.
For example, it is likely that another proposition tells us something relevant to
characterise the tithe map involved in the argumentation, or to locate the hedgerows
in Fig. 5.9 in relation to the site, or even to add extra details to any of these entities.
Conceptual modelling is usually an iterative and incremental task, which eventually
converges to a stable resolution.

5.3.3 Examples of Use
Let us look at some examples of ontological proxies in practice. Firstly, let us
focus on the issue of how ontological proxies may help us to document particular
interpretations of the discourse. Consider the following fragment:

Alice: The 5/2016 law says that you cannot build close to a protected site.
Bob: But the law also says that I have the right to buy and possess any land.

A first approach to analysing this fragment may interpret the exchange as a conflict,
since “the law” in Bob’s line refers to the same thing as “The 5/2016 law” in Alice’s.
In fact, the “But” lexical marker heading Bob’s retort is a usual indicator of conflict.
This interpretation is captured by the models depicted in Fig. 5.10.
However, an alternative interpretation is possible. The denotation “the law” in
Bob’s line may refer to the general laws and regulations that apply, rather than
the 5/2016 Heritage Act in particular. If this is the case, then Bob is saying that
regulations, in general, allow you to buy and possess any land, which may not be
a conflict with Alice’s proposition after all, as the 5/2016 Heritage Act could be
making an exception to the general right to buy and possess land. This alternative
interpretation is captured in Fig. 5.11.
Here, two ontological proxies exist, capturing the fact that the 5/2016 Heritage
Act is part of a larger set of overall regulations. Once this interpretation has been
established, it is clear that there is no necessary conflict between propositions PR10

[Figure 5.10: discourse model with a single Atom proxy, HeritageAct5_2016; domain model with HeritageAct5_2016: CompoundNorm (Name = “5/2016 Heritage Act”).]

Fig. 5.10 Discourse and domain models for the interpretation that “The 5/2016 law” and “the
law” refer to the same thing

[Figure 5.11: discourse model with two Atom proxies, HeritageAct5_2016 and Regulations; domain model with HeritageAct5_2016: CompoundNorm (Name = “5/2016 Heritage Act”) and Regulations: CompoundNorm (Name = “Overall regulations”).]

Fig. 5.11 Discourse and domain models for the interpretation that “The 5/2016 law” and “the
law” refer to different but related things

and PR12, as shown. Note that, in the absence of ontological proxies, the two
discourse diagrams (corresponding to the boxes displayed on a grid) from Figs.
5.10 and 5.11 would show different options but with no associated explanation. A
reader of these models would find no information as to why a conflict was or was not
described between the propositions. Once we incorporate the ontological proxies,
however, and even in the absence of the domain model, the interpretation of the
discourse becomes clear.
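
The difference between the two interpretations can be made explicit by recording which proxy each denotation points to. The following is a minimal sketch, assuming that an interpretation is simply a mapping from text fragments to proxy labels; the variable and function names are hypothetical.

```python
# Each interpretation maps denotations (text fragments) to ontological proxy labels.
same_law = {
    "The 5/2016 law": "HeritageAct5_2016",
    "the law": "HeritageAct5_2016",
}
different_laws = {
    "The 5/2016 law": "HeritageAct5_2016",
    "the law": "Regulations",
}


def co_refer(interpretation, first, second):
    """True when two denotations point to the same ontological proxy."""
    return interpretation[first] == interpretation[second]


for label, interpretation in [("Fig. 5.10", same_law), ("Fig. 5.11", different_laws)]:
    if co_refer(interpretation, "The 5/2016 law", "the law"):
        verdict = "same proxy"
    else:
        verdict = "different proxies"
    print(f"{label}: {verdict}")
```

Co-reference alone does not establish a conflict; it only documents the reading on which the analyst's decision about conflict rests.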
Let us now move to a different example and focus on how ontological proxies
can work to assist in lexical/semantic studies. Consider the following text (Ruiz
Mantilla, 2020):
People tend to go down South, where there is wealth and work. And they expel the Muslim
population. The North was hard, and they got rumours about Al Andalus being like an Eden.

Here, two terms, “the South” and “Al Andalus”, are being used to refer to the same
thing. This interpretation is shown in Fig. 5.12.
First, note that propositions PR24 and PR26 use “South” or “the South” to refer
to the southern region of Spain, whereas PR43 uses “Al Andalus” to refer to the

[Figure 5.12: discourse model with a single Atom proxy, TheSouth; domain model with TheSouth: NonMaterialPlace (Name = “The South”, “Al Andalus”).]

Fig. 5.12 Discourse and domain models for the interpretation that “the South” and “Al Andalus”
refer to the same thing

same place. This interpretation is clearly documented by the single ontological
proxy labelled TheSouth. Once this has been established, it is easy to see why PR43
works as a premise (together with PR30) for inference IN573, leading to the
conclusion PR24: living in the North was hard, and since people got rumours that
Al Andalus was like an Eden, they moved there. This argument only makes sense if
we assume that Al Andalus and the South are the same thing. Again, this assumption
is clearly documented through ontological proxies and thus works as grounding to
support inference IN573.
Finally, let us consider how ontological proxies may be useful to intertextual
studies. Consider the following fragments, taken from different documents (Angove,
2020; Historic Environment Service, Cornwall Council, 2012):
Angove: The hedgerows bounding the site to the south-east are shown on the Charlestown
Tithe Map.
HESCC: The tithe map of 1842 illustrates how the settlement had expanded since the 1825
survey.

Here, speakers Angove and HESCC (the Historic Environment Service of Cornwall Council) are not engaged in a dialogue, and they may not even know about
each other. But both are discussing the Charlestown Tithe Map, albeit by using
different ways to denote it. Angove uses the denotation “the Charlestown Tithe
Map” whereas HESCC uses “The tithe map”. Figure 5.13 depicts the models for
both fragments.
In this example, the denotation “the Charlestown Tithe Map” in discourse model 1,
as well as the denotation “The tithe map” in discourse model 2, both point to atoms
labelled TitheMap. Furthermore, discourse model 2 contains a denotation pointing
to a Year value for this atom, with contents “1842”. The domain model is shared
between the two discourse models. In it, we can see a single object TitheMap with a
subjectively marked value for the Year attribute, corresponding to the Year value in
discourse model 2. The modelling of subjectivity is out of the scope of this chapter,
but a brief introduction can be found in (Gonzalez-Perez, 2013). Essentially, the
line starting with “Year” in the TitheMap box in the domain model stands for a value
given to this object by a particular agent, in our case, HESCC, and which may not be
shared by other agents. We chose to use the subjective marker to show that the 1842
attribution of the map is provided only by the second text, and not mentioned by the
first one. In this manner, two discourse models that were in principle disconnected
and structurally unrelated are now linked together through a common domain model
that documents the associated speaker perspectives. This captures the fact that both
discourses are referring to a common set of concepts in the world, namely the 1842
tithe map of Charlestown. This example only involves two discourse models, but
this approach can be applied with any number of discourse models as long as all of
them refer to a common set of things in the world.
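
The way a shared domain model links otherwise unrelated discourse models can be sketched as follows. This is a minimal sketch, not ConML: the classes `DomainValue` and `DomainObject`, the `agent` field standing in for the subjective marker, and the `values_for` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainValue:
    name: str
    content: object
    agent: Optional[str] = None  # None means the value is not subjectively marked


@dataclass
class DomainObject:
    identifier: str
    category: str
    values: List[DomainValue] = field(default_factory=list)

    def values_for(self, agent: str) -> dict:
        """Unmarked values plus those attributed to the given agent."""
        return {v.name: v.content for v in self.values if v.agent in (None, agent)}


# Shared domain model: the Year value is attributed to HESCC only.
tithe_map = DomainObject("TitheMap", "Map", [
    DomainValue("Name", "Charlestown Tithe Map"),
    DomainValue("Year", 1842, agent="HESCC"),
])

# Each discourse model maps its denotations to proxy labels (simplified).
discourse_model_1 = {
    "The hedgerows bounding the site to the south-east": "Hedgerows",
    "the Charlestown Tithe Map": "TitheMap",
}
discourse_model_2 = {"The tithe map": "TitheMap"}

# Proxies common to both discourse models are the points where they connect.
shared = set(discourse_model_1.values()) & set(discourse_model_2.values())
print(shared)
print(tithe_map.values_for("HESCC"))
print(tithe_map.values_for("Angove"))
```

Adding a third discourse model would only require one more mapping, since the link is always made through the shared domain elements.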

[Figure 5.13: discourse model 1 with Atom proxies Hedgerows and TitheMap; discourse model 2 with Atom proxies TitheMap and Charlestown and a Value proxy Year = 1842; shared domain model with TitheMap: Map (Name = “Charlestown Tithe Map”, Year $HESCC = 1842).]

Fig. 5.13 Discourse and domain models for the fragments above. Note that the two discourse
models share a common domain model

5.4 Conclusions
The previous sections have presented the notions of ontological proxy and denotation, and described how ontological proxies and denotations can be used to better
express domain facts that are relevant to the discourse being analysed.
Various aspects must be highlighted. Firstly, ontological proxies are independent
of the specific languages or approaches that one employs for discourse or domain
modelling. We have chosen IAT/ML and ConML, but ontological proxies do not
rely on these choices. Rather, they are an abstract device that mediates between
a discourse model and a domain model, whatever formalisms are used to express
them. As we previously stated, the six concrete kinds of ontological proxies (atoms,
values, references, categories, properties, and associations) map nicely to the major
modelling primitives of almost any conceptual modelling language.
Secondly, ontological proxies are part of the discourse model. This means that
the discourse model is autonomous and does not need an accompanying domain
model to stay expressive. In fact, we could remove the right-hand side in every
figure in the previous section, and the diagrams would still be understandable. Of
course, ontological proxies are proxies, and therefore lightweight, so they do not
contain every detail that the full domain model can offer. This is especially clear, for
example, in Fig. 5.13, where the fact that there is a subjective year attribution of the

Charlestown tithe map can be seen only in the domain model. Still, ontological
proxies provide a good balance between expressiveness and conciseness, which
arguably would minimise the need to retrieve and examine the domain model in
most situations. In addition, the fact that the connections between discourse and
domain models are established via lightweight elements acknowledges the principle
of modularity that has been crucial in software engineering since at least the 1980s
(Meyer, 1997). According to this principle, discourse and domain models are kept
separate (they are different “modules”) but connected through few and weak links,
namely, the mappings between ontological proxies and elements in the domain
model. This allows each of these two artefacts to live separately, using whichever
formalism is required for each one, but still be connected when needed.
Another relevant issue is limited expressiveness. Since ontological
proxies are simpler replacements for domain model elements, they are limited by
how expressive the chosen modelling language is. In this chapter we have used
ConML, which is capable, for example, of representing different subjective views on
the same things, or temporal change, with minimum burden, as it provides specific
mechanisms to do it. Not all modelling languages do this. If the chosen domain
modelling language does not offer a similar mechanism to represent subjective
views, for example, propositions such as “As opposed to the local government,
tourists often think that the cathedral urgently needs repairs” would be difficult
to analyse and express, as the opposed subjective views described by it could
not be satisfactorily represented by any primitive in the language. In this regard,
and despite the fact that ConML is highly expressive (Gonzalez-Perez, 2013), it
still lacks support for irrealis modalities such as conditionals or imperatives, so
ontological proxies for denotations using these modalities are difficult or impossible
to represent properly.
The theoretical proposal introduced in this chapter has been implemented in the
LogosLink software tool, as previously mentioned, and has been applied to the
analysis of selected texts from a corpus of over 800 articles on COVID-19 from the
Spanish edition of The Conversation (The Conversation, 2020). It is also being used
to analyse a number of documents on archaeological sites related to Mansilla de
la Sierra (La Rioja, Spain), the Portico of Glory at the Cathedral of Santiago de
Compostela (Spain), and other areas.
Future research directions include the following. The ConML language will
be extended to support inequality predication, so that facts such as “The site is
wider than 120 m” can be captured. Also, ConML will be extended to support
various additional modalities such as deontic or hypothetical structures. This will
allow domain models to become much richer and more expressive, as described above.
The subclasses of OntologyElement in IAT/ML will be extended likewise so that
propositions containing constructs like these can be adequately linked to domain
elements. Additional extensions will be made to allow denotations to refer not only
to specific ontological proxies, but also to the changes associated with them. This will
make it possible, for example, to cater for statements expressing persuasion or change of mind,
such as “I was convinced that the cathedral was fine, but now I see that it needs
some repairs”.

Finally, a comprehensive specification of IAT/ML, including a proper graphical
notation, will be prepared and published. From the point of view of tool implementation, LogosLink will be updated with the new additions to IAT/ML, and
support will be added for multi-model projects so that additional analytical options
become possible, especially in relation to intertextual analysis. This material will
be made available through the IAT/ML web site (Gonzalez-Perez, Pereira-Fariña &
Calderon-Cerrato, 2021).

References
Almeida, J. P. A., Frank, U., & Kühne, T. (2018). Multi-level modelling (Dagstuhl seminar 17492).
Wadern, Germany. https://doi.org/10.4230/DagRep.7.12.18.
Angove, A. (2020). Land off Mill Lane, Charlestown, Cornwall – Archaeological Assessment.
https://doi.org/10.5284/1084120.
Atkinson, C., & Kühne, T. (2001). The essence of multilevel metamodelling. In «UML» 2001:
Modeling languages, concepts and tools (Vol. 2185, pp. 19–33). Springer.
Atkinson, C., Gutheil, M., & Kiko, K. (2006). On the relationship of ontologies and models. In
Proceedings of the 2nd International workshop on meta-modelling (WoMM) (Vol. LNI 96, pp.
47–60).
Centre for Argument Technology. (2018). Annotation guidelines for Inference Anchoring Theory
(IAT) with support for Conventional Implicatures (CIs). [Online]. Available: https://typo.unikonstanz.de/add-up/wp-content/uploads/2018/04/IAT-CI-Guidelines.pdf
Clark, T., Gonzalez-Perez, C., & Henderson-Sellers, B. (2014). A foundation for multi-level
modelling. In C. Atkinson, G. Grossmann, T. Kühne, & J. de Lara (Eds.), Proceedings of the
workshop on multi-level modelling co-located with ACM/IEEE 17th international conference
on model driven engineering languages & systems (MoDELS 2014) (Vol. 1286, pp. 43–52).
CEUR-WS.org.
Gee, J. P. (2014). An introduction to discourse analysis: Theory and method. Routledge.
Gonzalez-Perez, C. (2013). Modelling temporality and subjectivity in ConML. In R. Wieringa & S.
Nurcan (Eds.), 7th IEEE International conference on research challenges in information science
(RCIS 2013) (pp. 1–6). IEEE Computer Society.
Gonzalez-Perez, C. (2017). How ontologies can help in software engineering. In J. Cunha, J.
P. Fernandes, R. Lämmel, J. Saraiva, & V. Zaytsev (Eds.), Grand timely topics in software
engineering (Vol. 10223: LNCS) (pp. 26–44). Springer.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology. Springer.
Gonzalez-Perez, C. (2020, November). Connecting discourse and domain models in discourse analysis through ontological proxies. Electronics, 9(11), 1955. https://doi.org/10.3390/electronics9111955
Gonzalez-Perez, C., & Parcero Oubiña, C. (2011). A conceptual model for cultural heritage
definition and motivation. In M. Zhou, I. Romanowska, Z. Wu, P. Xu, & P. Verhagen (Eds.),
Revive the past: Proceeding of the 39th conference on computer applications and quantitative
methods in archaeology (pp. 234–244). Amsterdam University Press.
Gonzalez-Perez, C., Pereira-Fariña, M., & Calderon-Cerrato, B. (2021). IAT/ML. http://www.iatml.org/
Gonzalez-Perez, C., Martín-Rodilla, P., & Pereira-Fariña, M. (2018). Computer-assisted analysis
of combined argumentation and ontology in archaeological discourse. In 46th computer
applications and quantitative methods in archaeology (CAA 2018). Tübingen.
Henderson-Sellers, B. (2011). Bridging metamodels and ontologies in software engineering.
Journal of Systems and Software, 84(2), 301–313. https://doi.org/10.1016/j.jss.2010.10.025

Historic Environment Service, Cornwall Council. (2012). Charlestown conservation area
character appraisal & management plan. [Online]. Available: https://map.cornwall.gov.uk/reports_conservation_areas/Charlestown.pdf
Incipit. (2016). CHARM white paper. Incipit, CSIC. [Online]. Available: http://www.charminfo.org/Resources/Technical.aspx
Incipit. (2020). ConML technical specification. Incipit CSIC. [Online]. Available: http://www.conml.org/Resources/TechSpec.aspx
Janier, M., Aakhus, M., Budzynska, K., & Reed, C. (2016). Modeling argumentative activity
with inference anchoring theory. In D. Mohammed & M. Lewinski (Eds.), Argumentation and
reasoned action. Volume I Proceedings of the 1st European conference on argumentation (Vol.
1, no. 62). College Publications.
Lakoff, G. (1990). Women, fire, and dangerous things. University of Chicago Press.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory
of text organization. Text – Interdisciplinary Journal for the Study of Discourse, 8(3). https://doi.org/10.1515/text.1.1988.8.3.243
Meyer, B. (1997). Object-oriented software construction (2nd ed.). Prentice-Hall.
OMG. (2017a). Unified modeling language 2.5.1. [Online]. Available: https://www.omg.org/spec/UML/
OMG. (2017b). Unified modeling language. Object Management Group.
Reed, C., & Budzynska, K. (2010). How dialogues create arguments. In ISSA Proceedings 2010. [Online]. Available: http://rozenbergquarterly.com/issa-proceedings-2010-howdialogues-create-arguments/
Ruiz Mantilla, J. (2020, February 23). Peridis: En comarcas de la montaña palentina nacen ya más
osos que niños. El País.
Suchánek, M. (2018). OntoUML specification. https://ontouml.readthedocs.io/. Accessed 9 Oct 2020.
The Conversation, Spanish Edition. (2020). https://theconversation.com/es. Accessed 16 Oct 2020.
Wagemans, J. (2019). Four basic argument forms. Research in Language, 17(1), 57–69. https://doi.org/10.2478/rela-2019-0005
Wagemans, J. (2020). Periodic table of arguments. https://periodic-table-of-arguments.org/.
Accessed 16 Oct 2020.
Wetzel, L. (2018). Types and tokens. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy.
Metaphysics Research Lab/Stanford University.
World Wide Web Consortium. (2012). OWL 2 Web Ontology Language. World Wide Web
Consortium. [Online]. Available: http://www.w3.org/TR/2012/REC-owl2-overview-20121211/

Chapter 6

The Social Production of Discourse in Archaeology
Isto Huvila

Abstract Archaeology is a profoundly social and collaborative enterprise. Even if
it is a discipline of things, archaeology is also a discipline of discourses of things.
The making of new archaeological information and knowledge both leans on and
weaves a conversation of the past that is fundamentally as social as it is material.
These conversations traverse an immense spectrum of archaeological practices and
contexts far beyond archaeology itself. This chapter provides an overview of how
discourses are produced in archaeology, their characteristics and contemporary
facets, and how studying the social production of archaeological discourse(s) is
helpful for understanding archaeology and archaeological knowledge. Discourse
refers not only to talking or writing about archaeology but also to documenting, communicating and conveying archaeology, archaeological information and knowledge
by diverse means, and by doing that, influencing archaeological practices and
the production of archaeological knowledge. The chapter starts by asking where
contemporary archaeological discourse is produced and continues by inquiring into
who participates and who is left out, how to analyse and explain archaeological
discourses, what characterises them, and finally, why understanding the social
production of archaeological discourse can be useful for archaeologists and non-archaeologists.

6.1 Introduction
Archaeology is a profoundly social and collaborative enterprise. Even if it is a
discipline of things (Olsen, 2012), archaeology is also a discipline of discourses of
things. The making of new archaeological information and knowledge both leans on
and weaves a conversation of the past that is fundamentally as social as it is material.

I. Huvila (✉)
Department of ALM, Uppsala University, Uppsala, Sweden
e-mail: isto.huvila@abm.uu.se
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_6

These conversations traverse an immense spectrum of archaeological practices and
contexts far beyond archaeology itself.
This chapter aims to provide an overview of how discourses are produced
in archaeology, their characteristics and contemporary facets, and how studying
the social production of archaeological discourse(s) is helpful for understanding
archaeology and archaeological knowledge. Here, as in the earlier chapters of this
volume, discourse refers not only to talking or writing about archaeology but also to documenting, communicating and conveying archaeology, archaeological information
and knowledge through diverse means, and by doing that, influencing archaeological
practices and the production of archaeological knowledge. More precisely, the word
discourse is used in the following in three different senses. First, archaeological discourse
in the singular refers to the entirety of how archaeology is discussed, communicated,
talked and written about and conveyed in society. Second, as this entirety consists
of multiple parallel, partly overlapping and in many cases conflicting ways of
thinking and communicating about archaeology, references to (archaeological)
discourses in the plural refer to this myriad. Third, throughout the chapter, when
discussing various discourse theories and notions like Smith’s authorised heritage
discourse, the terms discourse and discourses refer to the particular matters stipulated
in the specific parts of the literature.
In this chapter, we start the inquiry into the social production of archaeological
discourse and discourses by asking where contemporary archaeological discourse
is produced and continue by inquiring into who participates and who is left out,
how to analyse and explain archaeological discourses, what characterises them, and
finally, why understanding the social production of archaeological discourse can be
useful for archaeologists and non-archaeologists.

6.2 Approaching Archaeological Discourses
6.2.1 Whereabouts of Archaeological Discourses
The extent and variety of archaeological work conducted in different countries
around the world, the diversity of practices across the different branches of
archaeology and archaeological and archaeology-inspired theorising, and the broad
implications of archaeological knowledge in society mean that archaeology
is discussed in multiple languages in a large variety of scholarly, professional and
non-professional outlets. Some areas of archaeology are more international than
others but, as Aitchison (2017) notes of archaeological practices, archaeological
knowledge and knowledge-making are also characterised by a certain parochiality and
a proliferation of spaces and places where they are discussed rather than by a cosmopolitanism of predominant discourses and venues that would be comparable, for
instance, to the sciences.


Even if scholarly texts and academic publications carry considerable weight
as principal means of conveying archaeological discourse, discussing archaeology is not confined to the literature. By quantity, the most extensive corpus of
archaeological knowledge is inscribed in investigation reports. They are a key
source of information about specific investigations and localities even if a single
report seldom attracts an especially wide audience (Börjesson, 2015, 2016a, b).
At the same time, they play a measurable role as a site where archaeological
knowledge is conveyed, discussed and deliberated. Recent research in particular
has also started to emphasise the role of oral and informal information
exchange both in the field (e.g. Morgan & Wright, 2018) and, in general, in archaeological information work (Huvila, 2014). While archaeologists have always had
their social networks, much of the contemporary quasi-formal professional and
academic archaeological exchange and reflection takes place online in social media
(Walker, 2014; Huvila, 2014; Kansa et al., 2011). Another significant context of
negotiating and framing archaeological information and knowledge highlighted
in the scholarship is administrative and policy documents, including heritage
conventions and information standards (e.g. Börjesson, 2016a, b; Lafrenz Samuels,
2016; Enqvist, 2016). Their impact might be somewhat invisible but in practice,
they play a prodigious role in steering both contemporary and future archaeological
discourses (Lafrenz Samuels, 2016) by amplifying and silencing specific aspects
and perspectives in archaeological discourse and consequently, what is known about
and within archaeology.
Besides professional and administrative outlets, popular and popular science
media often report archaeological research results, and amateur archaeologists’
magazines, local archaeological and historical societies and, recently, especially
social media (e.g. García-Ceballos et al., 2021; Wakefield, 2020; Huvila, 2013)
extend the reach and diversity of how and where archaeology is discussed and
debated. The rapid and profound digitalisation of social exchange in society has
also broadened the reach and diversity of both professional and public archaeological discourse. Some of the outlets are less obvious than others. Major social
media sites such as Twitter and Facebook have developed into significant arenas
for archaeology-related exchange (Huvila, 2013; Richardson, 2014; Wakefield,
2020). Digital games have also developed into considerable sites of producing
and conveying archaeological knowledge (Morgan, 2016). In a somewhat more
confined sense, previously expensive technologies that have become available to
both ordinary archaeologists and hobbyists have broadened the scope and reach
of archaeological and archaeology-related social exchange. As, for instance, the
proliferation of metal-detecting (Dobat et al., 2020) and the amateur use of Google
Earth (Liang et al., 2018) for detecting archaeological sites show, they can
provide new opportunities for non-archaeologists to engage in archaeological
and archaeology-related knowledge-making and emerge as new sites for diverse
archaeological and archaeology-related discourses. Similarly, even if Wikipedia
is hardly classifiable as an archaeological medium, its popularity as a source of
colloquial information means that its significance for communicating archaeological
knowledge is far from negligible (Grillo & Contreras, 2019). In the non-digital
contexts, museum exhibitions and major archaeological sites are obvious conveyors
of archaeological knowledge but as Högberg’s (2012) discourse analytical study of
information boards shows, even the physical world contains simultaneously obvious
and invisible scenes where archaeological knowledge is communicated and formed.
In summary, even if there are certain established venues where a substantial part
of archaeological discourse takes place, the diversity of outlets where archaeology
and archaeology-related matters are discussed in contemporary society is
significant. Less familiar sites and contexts, as well as those that are only
emerging as arenas for archaeology-related exchange, are obviously easy to miss.

6.2.2 Discussants in Archaeological Discourses
A pertinent follow-up question to where archaeological discourse takes place is
who participates in the discussion. While archaeologists certainly are both the loudest and most numerous contributors to the production of archaeological discourse,
they are not the only ones to talk about archaeology (Hamilakis & Anagnostopoulos,
2009). Huvila and Huggett (2018) make a useful distinction between archaeological
and archaeology-related practices by drawing a line between doings that pertain
to archaeology proper and diverse activities that are associated with archaeology,
archaeological work and information. While archaeological practices range from
professional and academic fieldwork to research, education and archaeological
heritage administration, archaeology-related practices extend to such areas as land development,
tourism, popular education and history enactment. On the perimeters of archaeology, the scholarship has had an evident propensity to focus on education and
non-professionals whereas professional archaeology-related discourses and actors,
like land developers (Huvila, 2017b) or administrators (Huvila, 2016b), and their
engagements with archaeology have not been discussed and problematised to a
comparable degree.
Laužikas et al. (2018) review archaeology-related communities and the degrees
of their creolisation in relation to archaeology. Their study identifies ten peripheral
spaces between archaeology and other domains including arts and design, travel
and tourism, branding, crime, identity work, alternative archaeologies, museums
and heritage, amateur-archaeology, education and public policy, and proposes a
model for studying them. Even if none of these archaeology-related communities
and discourses is novel per se, the upsurge of the digital sphere has revolutionised
the volume, topographies and reach of especially non-professional participation in
both archaeological and archaeology-related discourses. The studies conducted so
far evince a large heterogeneity of means and modes of engaging with archaeology. Even if many archaeology-related communities and discourses concern local
archaeology (Deeley et al., 2014), many of them have become ‘glocal’ in the sense
that interest-driven communities traverse boundaries and unite groups across large
geographical areas (Laužikas et al., 2018).


As a whole, a glance at the popular and professional archaeological literature
and discourse across the digital and non-digital outlets shows that the predominant
voice in archaeological discourse is that of professional and academic archaeologists. A parallel closer look at the proliferating variety of emerging, especially
digital, venues where archaeological matters are debated today reveals, however,
a rapid widening and, as Laužikas et al. (2018) express it, creolisation of both
archaeological discourse and its discussants.

6.2.3 Approaches to Analysing Discourses
Multiple methods can be used to investigate and follow the social production
of archaeological discourse. Much of the earlier work has been conceptual and
theoretical. It has tended to follow the major shifts in archaeological theory and
the philosophy of science (see e.g. Trigger, 1989; Hodder, 2001; Harris & Cipolla,
2017). Evidence-based research has been typically conducted using qualitative
methods. Different varieties of discourse analysis and studies of knowledge-making
alike are typically based on a close reading of written texts. Lucas (2019) highlights
Hodder’s (1989) study of site reports and Tilley’s (1989a) analysis of Cambridge
inaugural lectures as two key inquiries into archaeological textual discourses. Even
if the study of archaeological discourses goes farther back in time (e.g. Gardin,
1967; Wylie, 1985; Barrett, 1988), the year 1989 with the texts of Hodder, Tilley
and Fahnestock (1989) marks a certain beginning of a recognisable broader interest
in empirical analysis of archaeological texts and discourse as a social phenomenon.
This interest has since then broadened to a certain extent to cover other types of
texts, media and discourses and approaches with a distinct focus on, for instance,
narratives, argumentation and discourse (for a partial overview, see Lucas, 2019).
This chapter does not attempt to provide a systematic overview of all possible
approaches that have been used to analyse (archaeological) discourses or would be
useful to that end, but by providing a selection of examples (Table 6.1), it offers a
brief glimpse of the variety of alternatives.
In an archaeological context, Martín-Rodilla (2015) has analysed discourses in
archaeological documents using Hobbs’ (1985) discourse-analytical approach to
argumentation relations between clauses in text. Discourse analysis and close
reading can be applied to other forms of texts as well. Huvila (2011) has used Laclau
and Mouffe’s (2001) discourse theoretical approach to analyse interview records for
identifying power dynamics and frictions between different ways of conceptualising
the role and value of archaeological reports. This variant of discourse theory can
help to understand, for example, how different expectations and aims of writing
archaeological field reports clash and co-exist with each other in the documents,
and how the presence of multiple discourses not only leads to conflicts and
dissatisfaction with the usefulness of the documentation but also helps the reports to inform
and be used by multiple communities with widely different needs and expectations
(Huvila, 2011, 2012).

Table 6.1 Examples of approaches to analysing discourses

| Approach | Key aspects | References | Examples of uses in archaeology-related contexts |
|---|---|---|---|
| Hobbs’ discourse-analytical approach | Structure of discourse | Hobbs (1985) | Martín-Rodilla (2015) |
| Laclau and Mouffe’s discourse theory | Politics of discourses, hegemony, antagonism, articulations | Laclau and Mouffe (2001) | Huvila (2011) |
| Critical Discourse Analysis | Language as a social practice, power asymmetries | Fairclough (1992) | Smith (2012) and Enqvist (2014) |
| Foucault | Knowledge, power, social practices | Foucault (1979, 1998, 2002) | Waterman (2014), Olsson (2016), and Bapty (2014) |
| Derrida | Deconstruction, internal logic of texts or discourses | Derrida (1967) | Bapty (2014) and Tilley (1994) |
| Lacan | Subjectivity, four discourses (fundamental types) | Lacan (1966) | Nordbladh and Yates (2014) and Shanks and Tilley (1988) |
| White | Narratives, forms of discourse, strategies of explanation | White (1975, 1987) | Pluciennik (1999) |
| Giddens | Structuration (of e.g. discourse), role of language in structuration | Giddens (1984) | Mizoguchi (1997) |
| Bakhtin | Dialogue, history of discourse, language, heteroglossia | Bakhtin (1981) | Joyce (2002) |
| ‘What’s the Problem Represented to be’ | Representation of problems, problem representations, policy analysis, implications of representations to different groups | Bacchi (2012) | |
| Discourse Historical Approach | Identity construction, discrimination, historical dimensions of discourse formation | Reisigl and Wodak (2009) | |
| Bibliometrics and informetrics | Quantitative analysis of publications, information and discourse | Borgman and Furner (2002) and Groth and Gurney (2010) | Hutson (2002) and Jørgensen (2015) |
| Visual discourse analysis | Visuals, visual information, visual media | Jancsary et al. (2016) and Albers (2013) | |
| Wetherell and Potter’s discourse analysis | Discursive psychology, interpretative repertoires | Wetherell and Potter (1988) | Smith and Campbell (2017) |
| Body and discourse | Body, embodiment | Coupland and Gwyn (2003) | Goodwin (2003), Huvila (2019b), and Olsson (2016) |


Fairclough’s (1992) Critical Discourse Analysis has been used especially in
critical heritage studies to inquire into archaeological heritage discourses (e.g.
Smith, 2012; Enqvist, 2014) and how archaeologists and archaeology produce particular understandings of ‘archaeological heritage’. In addition to specific discourse
analytical frameworks, investigations into archaeological discourses (e.g. Hodder,
1989; Edgeworth, 1991; Bapty & Yates, 2014) have followed less specific content
analytical approaches and historical method and found inspiration in several
different discourse theorists including Foucault, Derrida and Lacan (incl. Foucault,
1979, 1998, 2002; Derrida, 1967; Lacan, 1966; see e.g. Bapty & Yates, 2014;
Thomas, 1993; Smith, 2004), White (incl. White, 1975, 1987; see e.g. Pluciennik,
1999), Giddens (Giddens, 1984; e.g. in Mizoguchi, 1997), Bakhtin (e.g. Bakhtin,
1981 in Joyce, 2002) and others. Lucas’s (2019) recent contribution to the inquiry
into literary knowledge production in archaeology draws on a broad array of
theorists in linguistics, composition studies and rhetoric. Examples of variants
of discourse analysis that have not made their way into the analysis of archaeological
discourses but that could well be useful are, for instance, Bacchi’s (2012) ‘What’s
the Problem Represented to be’ and the Discourse Historical Approach (Reisigl
& Wodak, 2009). Bacchi’s method traces problems that are embedded in discourses
and often remain unarticulated behind articulated solutions and priorities. The Discourse
Historical Approach builds on Critical Discourse Analysis while emphasising
that the interaction between discourse structures and social structures is mediated
rather than determined by the latter. In addition to qualitative text analytical
methods, discourses can also be investigated using quantitative text analysis. For
example, Jackson et al. (2020) have studied how archaeological texts conceptualise
bone material, suggesting that the two main ways of referring to bones are to
discuss them as objects and as related to bodies.
Apart from inquiring into texts, the social production of archaeological discourse
can also be followed in other types of traces. In the literature, in parallel to text,
discourses can be traced using bibliometric and informetric methods by investigating how authors cite different texts in their work (e.g. Hutson, 2002; Jørgensen,
2015). Discourse is also carried by images and can be approached using visual
discourse analysis (Jancsary et al., 2016; Albers, 2013). Further, discourses and
their related interpretative repertoires can also be uncovered in quantitative survey
data by interpreting responses as indicative of different ways of thinking and talking
about particular matters (e.g. Huvila, 2020b). Goodwin’s (2003) study of young
archaeologists who are learning to excavate and interpret their findings showcases
further how it is possible to study the social production of bodily discourses in
action, and how it can provide critical insights into how archaeology is learned and
how archaeological knowledge is produced in highly material terms. The material
and bodily dimensions of archaeological discourse have been analysed further by
others, drawing for instance from genre theory (Huvila, 2019a), and the work of
Foucault and Fairclough (Olsson, 2016).
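The bibliometric idea mentioned above, tracing discourse through how authors cite texts, can be illustrated with a small sketch. The following Python example counts how often pairs of works are cited together in the same publication, which is one common informetric signal of works belonging to a shared conversation. The function name `co_citation_counts`, the data layout and the toy corpus are assumptions made purely for illustration; they do not reproduce the procedure of any of the studies cited in this section.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(reference_lists):
    """Count how often each pair of works is cited in the same publication.

    reference_lists maps a publication id to the list of works it cites
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for refs in reference_lists.values():
        # Deduplicate and sort so every unordered pair has one canonical form.
        unique_refs = sorted(set(refs))
        counts.update(combinations(unique_refs, 2))
    return counts


# Hypothetical toy corpus: three publications and the works they cite.
toy_corpus = {
    "publication_a": ["work_1", "work_2", "work_3"],
    "publication_b": ["work_1", "work_2"],
    "publication_c": ["work_3", "work_1"],
}

pairs = co_citation_counts(toy_corpus)
for (first, second), n in sorted(pairs.items()):
    print(f"{first} and {second}: cited together in {n} publication(s)")
```

In a real study the reference lists would come from a citation database or from parsed bibliographies, and the resulting pair counts would typically be analysed as a network instead of being read off directly.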
As a whole, the brief summary of examples of discourse analytical and theoretical approaches and perspectives in this section shows first and foremost the diversity
and sheer number of ways in which archaeological discourse and discourses can
be studied. Needless to say, there are many others worth considering that,
similarly to the ones discussed here, provide means to analyse the social production
of both the broader archaeological discourse and the multiplicity of archaeological
and archaeology-related discourses.

6.3 Characteristics of Archaeological Discourses
After a brief survey of where archaeological discourses unfold and how they can
be studied, in the following, we will proceed to provide a short exposé of three
aspects of the social production of archaeological discourse that characterise the
contemporary and past exchange of archaeological ideas and how the discourse
unfolds in the social fabric of archaeological and archaeology-related practices. We
will consider the social and societal underpinnings that influence archaeological
discourse, the questions of power and mandate to make authoritative claims about
archaeological matters, and finally the structural and infrastructural scaffolding that
conveys and sustains archaeological knowledge work.

6.3.1 Social and Societal Underpinnings
A key question in approaching the unfolding of archaeological discourse
is what drives archaeological knowledge production. On a broad societal
level, the question is obviously about the general rationale of studying the human
past. On the level of the formation of specific archaeological discourses, it is more
of an issue of what drives the interest in particular issues about the past. Throughout
the history of science, the development of scientific and scholarly disciplines has
been explained to a varying degree by internal and external influences. Even if
there is hardly a consensus, a popular tendency in the recent scholarship has been to
espouse, in broad terms, a contextualist standpoint that emphasises the influence and
interplay of both societal and intra-disciplinary influences (Brush, 1995; Schnapp,
2012; Salminen, 2020). For a contextualist, it is obvious that even if individual
archaeologists and their personal interests and yearning for knowledge should not
be dismissed as instigators of new archaeological knowledge (Farid, 2015), both
popular interest in archaeology (Trigger, 1995) and, in many cases, glaringly political
interests have, and have had, a major impact on the discourse and production
of new archaeological knowledge (e.g. Kohl & Fawcett, 1995a; Mizoguchi, 1997;
Hegardt & Källén, 2011; Bernbeck, 2012).
Apart from the interests that spur the perceived significance of archaeology
and archaeological knowledge, there are multiple socio-structural factors within
and in the close vicinity of archaeology that influence how the intra-disciplinary
discourse unfolds. Archaeology is shaped by transdisciplinary influences from other
scientific and scholarly fields. Even if there has always been exchange between
scientific and scholarly fields (Díaz-Andreu & Coltofean-Arizancu, 2020), the
recent rapid adaptation of scientific and data-intensive digital analysis methods
to archaeological work has had a profound, arguably to a degree unprecedented,
impact (Kristiansen, 2014a). However, in parallel to external influence, archaeology
is also shaped by archaeology itself. The archetypal forms of archaeological work
(Moser, 2007) and the social and practical organisation of archaeological practices shape
and enable archaeological knowledge production (Shanks, 2012) together with
paradigmatic theories (Kristiansen, 2014b) and the structures of organising and
exhibiting archaeological knowledge (Coye, 2009) and information (Huvila, 2019c)
in different forms and modalities. Even if archaeology has never been a solitary
undertaking, discourse and knowledge production have traditionally been a concern
of individual archaeologists who have directed fieldwork and research projects
(Huvila, 2017a). Undoubtedly the most well-known approach to direct attention
to the relevance of increasing the multivocality of archaeological discourse and
knowledge production is the reflexive archaeology envisioned and developed by
Hodder and colleagues (Hodder, 2000) that summons everyone wielding a trowel to
take part in the reflexive practice (Berggren & Hodder, 2003) through participating
both in the bodily discourse of hands-on practice (cf. Olsson, 2015; Huvila, 2019b)
and ‘in conversations at the edge of the trench’ (Morgan & Wright, 2018, p. 146).
Reflexive archaeology and how it has been applied and developed since the early
1990s deserve credit for directing focus on the fundamentally social nature
of how archaeological discourse develops and archaeological knowledge unfolds,
but they evince at the same time the multiplicity of challenges (Hamilakis, 1999)
of how to maintain and capture it in its broad diversity. The growing number
of public archaeology (e.g. Okamura & Matsuda, 2011; Wakefield, 2020; papers
in Williams et al., 2019) and community archaeology initiatives (e.g. Miroff &
Versaggi, 2020; Bromberg et al., 2017) provides parallel examples of engagements
with non-archaeologists. The outcomes of individual endeavours can always be
debated (Simpson & Williams, 2008; Emerson & Hoffman, 2019) and even
the most successful ones tend to have some rough edges, like archaeological work
in general (Silliman, 2018), but there are signs that the archaeological community
is taking significant steps towards increasing the multivocality of participation in
archaeological discourse and knowledge production.
Besides the social structures of archaeological work, collective society-level
priorities and arrangements have an equally consequential albeit somewhat more
indirect impact on archaeological discourse. Early archaeology and archaeological
discourse were characterised by antiquarian and national interests in the
past and its material, primarily monumental, remains (Kohl & Fawcett, 1995b).
Internationally, archaeology became a public, societal matter only towards the mid and late
twentieth century. The commercialisation of the sector and the introduction of new
public management principles in development-led archaeology from the 1990s
onwards in a number of countries have contributed to further change and to the framing
of archaeology as a business (Rostock, 2007) rather than as a branch of scholarship
or public good. This has stirred up critique of the increasing influence of profit-oriented ideologies on archaeological work (e.g. Zorzin, 2015; Demoule, 2012)
and discourse (Smith, 2004). In parallel, even if contemporary national political
agendas are not always as blatant as they were, for instance, in early twentieth-century
colonial Africa (e.g. Conde et al., 2016), National Socialist Germany (Arnold
& Hassmann, 1995) or the Soviet Union (Shnirelman, 1995), there is plenty of
evidence (e.g. Gustafsson & Karlsson, 2011; Stylianou-Lambert & Bounia, 2016)
that the usefulness of archaeology has not escaped the attention of present-day
politicians either.
Besides public policy, the framing of archaeology in the popular debate, and
not least in popular culture, has a continuing impact on archaeology and how
archaeological discourse evolves (Matthews, 2004). It has both direct and indirect
repercussions for funding, for the perceived value of particular aspects of archaeological
heritage and, consequently, for the research agendas adopted in archaeology and, not least,
for archaeologists’ self-understanding of their identity and role as professionals and
cognitive authorities.
In summary, even if archaeologists play a decisive role in the formation of
archaeological discourse, professional and academic archaeology are not isolated
from the society where the discourse takes place. Archaeology is used by extra-archaeological actors in society as much as archaeological discourse as a whole
and particular perspectives on archaeological and archaeology-related matters stem
from the society where the discourse and discourses take place.

6.3.2 Power and Mandate
There are many reasons why particular perspectives gain precedence in archaeological discourses and why certain discourses prevail and others pass away. The
general societal and discursive power structures that are conventionally used to
explain inequalities also operate in archaeology. What is important, however, is to
be sensitive to both their influence and the lack of it, to avoid turning the critique itself into a
structure that generates new kinds of inequities. A non-negligible reason why certain
positions persist is the propensity of societal regimes – whether they are professional
and academic disciplines, political ideologies or public authorities – to essentialise
discourses.
In the heritage field, one of the most prominent descriptions of how this can
happen is Smith’s (2006) well-known analysis of how heritage itself can be seen as
a discourse and how a very particular institutionalised authorised heritage discourse
has become predominant in defining what heritage is and how it should be acted
upon. In a comparable sense, there are authoritative archaeological discourses
that stipulate what pertains to archaeology and what remains outside of its
perimeters. Enqvist’s (2016) study of archaeological heritage professionals’ framing
of what counts as archaeological sites and monuments exemplifies the influence of
their sayings and doings on how archaeology is understood in society in practice.
The mandate to make authoritative claims about archaeology and archaeological knowledge has traditionally been tightly intertwined with the structural hierarchies of
archaeological work. Not only the privilege to interpret and publish but also finds,
documentation and entire sites have been routinely attributed to the individuals who
direct excavations and fieldwork. In addition to the social nature of archaeological
knowledge production, reflexive archaeology (Hodder, 2000) has also emphasised
that knowledge-making should not be limited to encompass merely the post-survey
phase interpretation of field documentation carried out by a field director or in some
cases, a small group of senior archaeologists (cf. Bradley, 2003; Lucas, 2001; Tilley,
1989b).
Even if archaeological knowledge-making has irrefutably become a more social
and collective enterprise than before, many archaeologists still remain silent in the
archaeological record (Lucas, 2001). The deliberate naming and non-naming of
subjects is a powerful mechanism that includes and excludes. It helps archaeological
information to traverse different archaeological and archaeology-related domains or
discourses (Huvila, 2017a) and gives authority to ‘archaeological’ propositions and
makes them harder to refute. These findings are similar to Enqvist’s (2016) observations of the Finnish authoritative heritage discourse where the social significance
of managing archaeological heritage is depersonalised and reduced to obeying the
Antiquities Act. At the same time, however, selective naming and non-naming
also mean that ‘a very large group of anonymous and silent archaeologists’
still have no voice in archaeological discourse (Lucas, 2001, p. 12).
Unsurprisingly, the dividing lines between silence and non-silence have tended to
follow not only those of merit and experience but also those of gender, social status
and origins (Berggren & Hodder, 2003) – the boundaries between privileged and
disadvantaged. Feminist archaeology and feminist research on archaeology have
demonstrated that archaeology, like scientific and scholarly research in
general, has a long history of being dominated by men and stereotypically
masculine perspectives. Much archaeological work has also focused on public rather
than domestic matters and projected onto past societies assumptions about the
gender roles that dominated in the Western cultural hemisphere during the last
couple of centuries (Conkey, 2003).
In parallel to pure gender bias, feminist theorising has also directed attention
to the general insensitivity in archaeological discourse to the multiplicity of accountabilities, scales and perspectives (Wylie, 2007). Besides gender, this applies to
indigenous (e.g. Marliac, 2005), popular culture (Holtorf, 2005) and, in general, non-predominant and non-professional perspectives, and to the broader framing of research,
for instance, concerning the mutual influence of micro- and macro-level factors and
phenomena (Conkey, 2003). Western archaeologists have also been criticised for
a tendency to orientalise (in the sense of Said, 1979) non-Western views of archaeological
heritage as exotic and less creditable (Starzmann, 2012) and for neo-colonialism
enacted through the global appropriation of local heritage as world heritage (Stobiecka,
2020).
Overall, it is easy to both exaggerate and underestimate the influence of power
relations in the formation of discourses. The recent archaeological literature has
taken significant steps towards disclosing predominant hierarchies, deconstructing
traditional narratives of archaeology and the past, and identifying silences and
insensitivities. At the same time, it is apparent that the power structures and
multiplicities highlighted in the contemporary debate are only examples – many
of them flagrant ones – but still mere instances of the complex dynamics of how the
mandate of having a say in archaeological matters affects archaeological knowledge
production.

6.3.3 Structural and Infrastructural Scaffolding
The previous sections provided a glimpse of how archaeological discourse is
influenced by its underpinning intra-disciplinary and broader societal fluctuations
and conscious and unconscious acts of seizing and maintaining control. At the
same time, it is shaped by supporting and ‘scaffolding’ (Wylie, 2017) technical
and material structures and infrastructures. This has become perhaps especially
apparent with the digitalisation of discursive infrastructures. Apart from enabling
exchange, diverse digital and non-digital technologies influence how archaeologists
and non-archaeologists alike can participate in archaeological discourses and
knowledge-making. Critics have drawn attention to emerging digital inequalities
between large well-funded and smaller archaeological projects (Watson, 2019;
Chadwick, 2003) and warned of a substantial risk that new infrastructures reinforce
existing detrimental hierarchies (Taylor & Gibson, 2017) and erect new unwanted
ones instead of contributing to the advancement and quality of the archaeological
enterprise as a whole.
Besides having or not having access to particular technologies, the unfolding
of archaeological discourse depends on their specific qualities and characteristics.
Digital geographic data and 3D visualisations are probably the most intensely
debated examples of contemporary technologies used by both professional archaeologists and others to communicate archaeology and to contribute to archaeological
knowledge-making. Instead of being neutral, they both have their own language
(Manovich, 2001, also e.g. Cochrane & Russell, 2007) that influences interpretations (Copplestone & Dunne, 2017) and the on-going discourse, and that is sometimes
instrumental in introducing new ones. The same applies to documentation
equipment. Moving from paper-based to digital field documentation influences
not only what is captured and how but also how the documented phenomena
are framed and described (Huvila, 2019b).
In parallel to technologies, the making of discourse is similarly shaped by
different literary, non-literary (Bakhtin, 1982) and social genres (Miller, 1984) of
their (re)presentation. Lucas (2019) proposes that, rather than genres, the shaping goes further
back to text types or, perhaps in a broader sense, to information, its function
and especially its structure. In a sociotechnical sense, borrowing from Pickering
(1995), discourse can be described as unfolding in a ‘dance’ of agency between
technologies, their users (Gunnarsson, 2020) and information bearers (Huvila,
2016a). Each of them puts the others to work (Huvila, 2018) to advance a particular
set of goals and aspirations that are partly consciously instituted and partly inherent
to the constituents of the discursive dance.
Even if there is no real doubt about the critical influence of technologies and
technical infrastructures on the unfolding of discourse, as Lucas (2019) remarks,
the proliferation of new forms of media, technologies and genres of (re)presentation
does not necessarily mean that types of writing (or discussing) archaeology would
change to a radical extent. In parallel to following the social and societal base,
power relations, and structural and infrastructural underpinnings of how archaeological
discourse unfolds, it is equally important to follow the discourse itself and its
forms. Archaeological discourse can still be enacted as narrative, argumentation
or persuasion, exposition or, for instance, conversation (Lucas, 2019), even if its
genres, technologies and language of expression change shape and the
societal and social premises of discussing archaeology become different.

6.4 Understanding the Discursive Production of Archaeological Knowledge Matters
After reviewing the constituents and some of the predominant characteristics
of archaeological discourse, it is appropriate to summarise key implications of
understanding how archaeological discourse comes into being. Theoretical and
evidence-based exploration of the social production of archaeological discourse and
its premises informs both archaeological inquiry itself and the making and use of
archaeological information and knowledge in a broader and more indirect sense
both within archaeology and neighbouring disciplines and contexts.
First and fundamentally, a general understanding and empirical descriptions of
the social enactment of archaeological discourse can help to understand archaeological knowledge production and to debunk inaccurate conceptions of how
archaeological knowledge comes into being. A direct consequence of the social
nature and multiplicity of communities and venues where archaeological discourse
is enacted is that it is hardly justified to refer to archaeological discourse in the
singular. There are multiple parallel, partly overlapping, national and thematic
discourse communities in academic and professional archaeology alone (Venclova,
2007) with their own local knowledges and systems of knowing (Huvila, 2020a).
While some archaeology-related discourses and practices have explicit and implicit
influence on how archaeology is discussed and practised, many of them are
excluded. In parallel to how individuals and groups of people are excluded from
participating in the archaeological discourse by consciously and unconsciously
silencing their voices, other voices and perspectives are suppressed by designating
them non-archaeological or archaeologically less relevant. In this respect, an
analytical distinction between archaeological and archaeology-related (as in Huvila
& Huggett, 2018) should not be treated as a question of the value of knowledge
and knowledge claims. Instead, the relation of discourses to archaeology and the
difference between the two could be seen in terms of centred sets as a question of
how far a discourse is from the nucleus of ‘archaeology’ and other parallel epistemic
cores (as in Huvila, 2019d), for instance, of entertainment, land development,
identity-building or commercial interests.
Second, a thorough understanding of the social production of archaeological
discourse and discourses is a necessary precondition to any meaningful attempts
to pluralise archaeological knowledge production. Decolonisation and engagement
with the past and present social injustices (Starzmann, 2012) of archaeological
practices and discourses are impossible without a comprehensive understanding of
the ideological and pragmatic dimensions of archaeological work (Schlanger, 2012),
discursive mechanisms of colonisation, and mechanisms of silencing, making invisible, taking over and expropriating discourses. Understanding and acknowledging
multiple perspectives and discourses is merely a starting point (Lucas, 2019) but
a necessary premise of fostering meaningful techniques of participation, increased
epistemic openness (Marila, 2020) and, for instance, envisioning, developing and
using new tools together with local communities (Palmer, 2013) that do not backfire
and lead to new inequalities. The efforts to pluralise archaeological discourse
and studies following such attempts have simultaneously identified both parallel
developments that go against the striving for broader participation in knowledge-making across the spectrum of archaeological and creolised communities and the
general difficulty of realising multivocality in practice. Attitudes are changing
slowly, if at all, and not necessarily in directions that embrace multivocality as an ideal.
The increasing professional and scholarly specialisation, adoption of increasingly
complex technologies (Watson, 2019), excessive confidence in digital data (Bevan,
2015) and failure to understand and account for the contexts of knowledge production
(Huvila, 2019c) come with a real risk that participating in archaeological
knowledge-making becomes exceedingly difficult.
Third, a comprehensive insight into the social production of archaeological
discourse is of vital importance in understanding and developing archaeological information work. Without a thorough understanding of how archaeology is
discussed, archaeological information remains difficult to find and use, and it is
demanding to manage. Producing information and documentation that is easily
understood by its actual and potential contemporary and future users is equally
challenging, if possible at all. While some would deny the possibility of developing
a general language for representing archaeological knowledge along the lines of what, for
instance, Gardin’s (1980, 1999) logicist programme aimed at achieving (also e.g.
Djindjian, 2004), even narrower standardisation and interoperability of information
require an in-depth understanding of how archaeological discourse unfolds and
how it has had a strong tendency to resist settling for shared concepts and
terms (e.g. Pavel, 2010; Oikarinen & Kortelainen, 2013). So far, there is still relatively little empirical and theoretical work, from either historical or contemporary
perspectives, on how archaeological materiality is translated into words (Lucas,
2019) and other conveyers of information. Salminen’s (2020) critique that historical
research has been overly concerned with political influences and neglected the role
of individual researchers in their contemporary contexts applies also to a certain
extent to the literature on the production of contemporary archaeological practices
and discourses.
Fourth and finally, a better understanding of the social production of archaeological discourse is necessary for understanding and increasing its diversity. Using the
discourse analytic concept of Wetherell and Potter (1988), there are multiple interpretative repertoires of how archaeology and archaeological matters are conceived
by archaeologists and non-archaeologists across the broad variety of archaeology-related (e.g. those in Laužikas et al., 2018, and others) and archaeological local
and global communities. They all come with their own arrangements or regimes
of truth (Foucault, 1975), value (Boltanski & Thévenot, 2006) and what counts as
information (Ekbia & Evans, 2009). Differences and contradictions do not mean
that any particular perspective would be inherently or necessarily wrong; rather, they
bestow an epistemic responsibility to inquire into them, their internal and
external plausibility and implications, and how and where they emerge. As Lucas
(2019) reminds us, the tendency to contrast (Western) scientific discourse with other discourses –
especially indigenous ones – risks lumping together widely different epistemic
cultures, reinforcing old stereotypical dichotomies and pushing towards a compelled
choice between archaeology and ‘non-archaeology’ not as a mere analytical division
but as a distinction of legitimacy. Lucas (2019) seconds Sillitoe’s (2002) proposal
to focus on epistemic practices, specific problems and issues as means to elicit
dialogue and collaboration instead of underlining premisory epistemological differences. Instead of treating differences as a question of demarcations (Sommerlund,
2002) and ending up engaging in mere face-work (Clauss, 2016), a more productive
approach could be to focus on distances and proximities between communities
(Huvila, 2020a), their practices and matters of concern, what they perceive as real
options and, most importantly, what implications and concrete consequences the
different discourses have.

6.5 Conclusions
A review of the social production of archaeological discourse offers several key takeaways. Even if archaeology and archaeological discourse
are social and enacted in the thick of a large number of overlapping, parallel and
often distant communities, it does not mean that archaeological discourse would
necessarily be inclusive or dialogic. Understanding this is the first necessary
step in that direction, just as it is a precondition to understanding how
archaeological knowledge unfolds, to advancing archaeological information work,
developing better and more meaningful tools, documentation and infrastructures
for archaeological knowledge production, and engaging with the broad and diverse
archaeological field of discursivity.
A popular contemporary suggestion for academics and professionals is to reach
out and engage in the discourse outside of archaeology proper. The relevance of participating in public archaeology-related discourse in social media, and for instance,
of contributing to Wikipedia has been broadly acknowledged (e.g. Harding, 2007;
Scherzler, 2010) but at the same time embraced by relatively few in archaeology and
in the scholarly and professional community as a whole (e.g. AtallahBidart, 2020; Cyron,
2017). Acknowledging and understanding that archaeological discourse is social
does not, however, mean that every archaeologist would need to do outreach in
every conceivable community in person. Sometimes the expectations of community
participation can be over-dimensioned (Chirikure et al., 2010). What is probably
more important is to be knowledgeable of on-going discourse, to be able to position
oneself – in an anthropological sense (cf. Hamilakis & Anagnostopoulos, 2009) – in
the thick of archaeological things, and to ensure that the field of discursivity itself is inclusive,
open and dialogic not only for an archaeological discourse, but for discourses in the
plural.

References
Aitchison, K. (2017). On the outside looking in: What will Brexit mean for European archaeology?
The Historic Environment, 8(3), 194–198.
Albers, P. (2013). Visual discourse analysis. In P. Albers, T. Holbrook, & A. Flint (Eds.), New
methods of literacy research (p. 8). Routledge.
Arnold, B., & Hassmann, H. (1995). Archaeology in Nazi Germany: The legacy of the Faustian
bargain. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics, and the practice of
archaeology. Cambridge University Press.
AtallahBidart, S. (2020). Collaborer sur wikipédia pour co-construire une société de la connaissance. Revue française des sciences de l’information et de la communication, 20.
Bacchi, C. (2012). Introducing the ‘what’s the problem represented to be?’ Approach. In A. Bletsas
& C. Beasley (Eds.), Engaging with Carol Bacchi: Strategic interventions and exchanges (pp.
21–24). University of Adelaide Press.
Bakhtin, M. M. (1981). Dialogic imagination: Four essays. University of Texas Press.
Bakhtin, M. M. (1982). L’oeuvre de François Rabelais et la culture populaire au Moyen Age et
sous la Renaissance. Gallimard.
Bapty, I. (2014). Nietzsche, Derrida and Foucault: Re-excavating the meaning of archaeology. In I.
Bapty & T. Yates (Eds.), Archaeology after structuralism: Post structuralism and the practice
of archaeology (pp. 214–276). Routledge.
Bapty, I., & Yates, T. (Eds.). (2014). Archaeology after structuralism: Post structuralism and the
practice of archaeology. Routledge.
Barrett, J. C. (1988). Fields of discourse: Reconstituting a social archaeology. Critique of
Anthropology, 7(3), 5–16.
Berggren, A., & Hodder, I. (2003). Social Practice, Method, and Some Problems of Field
Archaeology. American Antiquity, 68(3), 421–434. http://www.jstor.org/stable/3557102
Bernbeck, R. (2012). The political dimension of archaeological practices. In D. T. Potts (Ed.), A
companion to the archaeology of the ancient Near East (pp. 87–105). Wiley-Blackwell.
Bevan, A. (2015). The data deluge. Antiquity, 89(348), 1473–1484.
Boltanski, L., & Thévenot, L. (2006). On justification. Princeton University Press.
Borgman, C. L., & Furner, J. (2002). Scholarly communication and bibliometrics. ARIST, 36(1),
2–72.
Börjesson, L. (2015). Grey literature – Grey sources? Nuancing the view on professional
documentation: The case of Swedish archaeology. Journal of Documentation, 71(6), 1158–
1182.
Börjesson, L. (2016a). Beyond information policy: Conflicting documentation ideals in extra-academic knowledge making practices. Journal of Documentation, 72(4), 674–695.
Börjesson, L. (2016b). Research outside academia? An analysis of resources in extra-academic
report writing. In Proceedings of the 2016 ASIS&T annual meeting, Copenhagen (pp. 1–10).
Bradley, R. (2003). Seeing things: Perception, experience and the constraints of excavation.
Journal of Social Archaeology, 3(2), 151–168. http://jsa.sagepub.com/content/3/2/151.abstract
Bromberg, F., Cressey, P., Fesler, G., Nasca, P., & Reeder, R. (2017). We dig Alexandria: A
reflection on more than fifty years of community archaeology. In Urban archaeology, municipal
government and local planning (pp. 203–225). Springer.
Brush, S. G. (1995). Scientists as historians. Osiris, 10(1), 214–231.
Chadwick, A. (2003). Post-processualism, professionalization and archaeological methodologies.
Towards reflective and radical practice. Archaeological Dialogues, 10(1), 97–117.
Chirikure, S., Manyanga, M., Ndoro, W., & Pwiti, G. (2010). Unfulfilled promises? Heritage management and community participation at some of Africa’s cultural heritage sites. International
Journal of Heritage Studies, 16(1–2), 30–44.
Clauss, L. R. (2016). Betwixt and between: Archaeology’s liminality and activism’s transformative
promise. In S. Atalay (Ed.), Transforming archaeology (pp. 29–44). Routledge.
Cochrane, A., & Russell, I. (2007). Visualizing archaeologies: A manifesto. Cambridge Archaeological Journal, 17(1), 3–19.
Conde, P., Senna-Martínez, J. C., & Martins, A. C. (2016). Archeological connections: Tracking
and tracing international relations throughout Portuguese colonialism. In G. Delley, M.
Díaz-Andreu, F. Djindjian, V. M. Fernández, A. Guidi, & M.-A. Kaeser (Eds.), History of
archaeology: International perspectives (pp. 51–62). Archaeopress.
Conkey, M. W. (2003). Has feminism changed archaeology? Signs, 28(3), 867–880.
Copplestone, T., & Dunne, D. (2017). Digital media, creativity, narrative structure and heritage.
Internet Archaeology, 44.
Coupland, J., & Gwyn, R. (Eds.). (2003). Discourse, the body, and identity. Palgrave Macmillan.
Coye, N. (2009). Collections, musées, paysages. Les Nouvelles de l’archéologie, 117, 3–5.
Cyron, M. (2017). Wikipedia. macht. archäologie. Archäologische Informationen, 40.
Deeley, K., Pruitt, B., Skolnik, B. A., & Leone, M. P. (2014). Local discourses in archaeology. In
C. Smith (Ed.), Encyclopedia of global archaeology (pp. 4540–4545). Springer. https://doi.org/
10.1007/978-1-4419-0465-2_1556
Demoule, J.-P. (2012). Rescue archaeology: A European view. Annual Review of Anthropology,
41, 611–626.
Derrida, J. (1967). De la grammatologie. Les Éditions de Minuit.
Díaz-Andreu, M., & Coltofean-Arizancu, L. (2020). Interdisciplinarity in archaeology – A
historical introduction. In L. Coltofean-Arizancu & M. Díaz-Andreu (Eds.), Interdisciplinarity
and archaeology: Scientific interactions in nineteenth- and twentieth-century archaeology (pp.
1–21). Oxbow.
Djindjian, F. (2004). La publication scientifique en langue naturelle est-elle en archéologie
un discours logique? Essai de conception d’un langage cognitif d’aide à la publication.
Archeologia e calcolatori, 15, 51–61.
Dobat, A. S., Deckers, P., Heeren, S., Lewis, M., Thomas, S., & Wessman, A. (2020). Towards a
cooperative approach to hobby metal detecting: The European public finds recording network
(EPFRN) vision statement. European Journal of Archaeology, 23(2), 272–292.
Edgeworth, M. (1991). The act of discovery: An ethnography of the subject-object relation in
archaeological practice. Ph.D. thesis, University of Durham.
Ekbia, H. R., & Evans, T. P. (2009). Regimes of information: Land use, management, and policy.
The Information Society, 25(5), 328–343.
Emerson, P., & Hoffman, N. (2019). Technical, political, and social issues in archaeological
collections data management. Advances in Archaeological Practice, 7(3), 258–266.
Enqvist, J. (2014). The new heritage: A missing link between Finnish archaeology and contemporary society? Fennoscandia Archaeologica, XXXI, 101–123.
Enqvist, J. (2016). Suojellut muistot: Arkeologisen perinnön hallinnan kieli, käsitteet ja ideologia.
Doctoral dissertation, University of Helsinki.
Fahnestock, J. (1989). Arguing in different forums: The Bering crossover controversy. Science,
Technology & Human Values, 14(1), 26–42.
Fairclough, N. (1992). Discourse and social change. Polity.
Farid, S. (2015). ‘Proportional representation’: Multiple voices in archaeological interpretation at
Çatalhöyük. In R. Chapman & A. Wylie (Eds.), Material evidence: Learning from archaeological
practice (pp. 59–78). Routledge.
Foucault, M. (1975). Surveiller et punir, naissance de la prison. Gallimard.
Foucault, M. (1979). My body, this paper, this fire. Oxford Literary Review, 4(1), 9–28.
Foucault, M. (1998). What is an author? In J. D. Faubion (Ed.), Aesthetics, method and
epistemology (pp. 205–222). The New Press.
Foucault, M. (2002). The archaeology of knowledge. Routledge. L’archéologie du savoir first
published 1969 by Éditions Gallimard.
García-Ceballos, S., Rivero, P., Molina-Puche, S., & Navarro-Neri, I. (2021). Educommunication
and archaeological heritage in Italy and Spain: An analysis of institutions’ use of Twitter,
sustainability, and citizen participation. Sustainability, 13(4), 1602.
Gardin, J. C. (1967). Methods for the descriptive analysis of archaeological material. American
Antiquity, 32(1), 13–30.
Gardin, J.-C. (1980). Archaeological constructs: An aspect of theoretical archaeology. Cambridge
University Press.
Gardin, J.-C. (1999). Archéologie, formalisation et sciences sociales. Sociologie et sociétés, 31(1),
119–127. http://www.erudit.org/revue/socsoc/1999/v31/n1/001282ar.pdf
Giddens, A. (1984). The constitution of society: Outline of the theory of structuration. Polity.
Goodwin, C. (2003). The body in action. In J. Coupland & R. Gwyn (Eds.), Discourse, the body,
and identity. Palgrave Macmillan. http://site.ebrary.com/id/10076971
Grillo, K. M., & Contreras, D. A. (2019). Public archaeology’s mammoth in the room: Engaging
Wikipedia as a tool for teaching and outreach. Advances in Archaeological Practice, 7(4), 435–
442.
Groth, P., & Gurney, T. (2010). Studying scientific discourse on the web using bibliometrics: A
chemistry blogging case study. In Proceedings of the WebSci10: extending the frontiers of
society on-line. Web Science Trust.
Gunnarsson, F. (2020). Digitalisation and its impact on archaeological knowledge production. In
J. Hansson & J. Svensson (Eds.), Doing digital humanities: Concepts, approaches, cases (pp.
27–44). Linnaeus University Press.
Gustafsson, A., & Karlsson, H. (2011). A spectre is haunting Swedish archaeology – The spectre
of politics: Archaeology, cultural heritage and the present political situation in Sweden. Current
Swedish Archaeology, 19(1), 11–36.
Hamilakis, Y. (1999). La trahison des archeologues? Archaeological practice as intellectual activity
in postmodernity. Journal of Mediterranean Archaeology, 12(1), 60–79.
Hamilakis, Y., & Anagnostopoulos, A. (2009). What is archaeological ethnography? Public
Archaeology, 8(2–3), 65–87.
Harding, A. (2007). Communication in Archaeology. European Journal of Archaeology, 10(2–3),
119–133. http://eja.sagepub.com/cgi/content/abstract/10/2-3/119
Harris, O. J., & Cipolla, C. (2017). Archaeological theory in the new millennium. Routledge.
Hegardt, J., & Källén, A. (2011). Being through the past: Reflections on Swedish archaeology
and heritage management. In L. R. Lozny (Ed.), Comparative archaeologies (pp. 109–135).
Springer.
Hobbs, J. R. (1985). On the coherence and structure of discourse. Technical report, Center for the
Study of Language and Information (CSLI).
Hodder, I. (1989). Writing archaeology: Site reports in context. Antiquity, 63(239), 268–274.
Hodder, I. (2000). Towards reflexive method in archaeology: The example at Çatalhöyük. McDonald Institute for Archaeological Research.
Hodder, I. (Ed.). (2001). Archaeological theory today. Polity.
Högberg, A. (2012). The voice of the authorized heritage discourse: A critical analysis of signs at
ancient monuments in Skåne, Southern Sweden. Current Swedish Archaeology, 20, 131–167.
http://www.arkeologiskasamfundet.se/csa/
Holtorf, C. (2005). Beyond crusades: How (not) to engage with alternative archaeologies. World
Archaeology, 37(4), 544–551.
Hutson, S. R. (2002). Gendered citation practices in American Antiquity and other archaeology
journals. American Antiquity, 67(2), 331–342. http://www.jstor.org/stable/2694570
Huvila, I. (2011). The politics of boundary objects: Hegemonic interventions and the making of a
document. Journal of the Association for Information Science and Technology, 62(12), 2528–
2539.
Huvila, I. (2012). Authorship and documentary boundary objects. In 45th Hawaii international
conference on system science (HICSS) (pp. 1636–1645). IEEE Computer Society.
Huvila, I. (2013). Engagement has its consequences: The emergence of the representations of
archaeology in social media. Archäologische Informationen, 36, 21–30.
Huvila, I. (2014). Archaeologists and their information sources. In I. Huvila (Ed.), Perspectives to
archaeological information in the digital society (pp. 25–54). Department of ALM, Uppsala
University.
Huvila, I. (2016a). Awkwardness of becoming a boundary object: Mangle and materialities of
reports, documentation data and the archaeological work. The Information Society, 32(4), 280–
297.
Huvila, I. (2016b). ‘If we just knew who should do it’, or the social organization of the archiving of
archaeology in Sweden. Information Research, 21(2), Paper 713. http://www.informationr.net/
ir/21-2/paper713.html
Huvila, I. (2017a). Archaeology of no names? The social productivity of anonymity in the
archaeological information process. ephemera, 17(2), 351–376.
Huvila, I. (2017b). Land developers and archaeological information. Open Information Science,
1(1), 71–90.
Huvila, I. (2018). Putting to (information) work: A Stengersian perspective on how information
technologies and people influence information practices. The Information Society, 34(4), 229–
243.
Huvila, I. (2019a). Genres and situational appropriation of information. Journal of Documentation,
75(6), 1503–1515.
Huvila, I. (2019b). Learning to work between information infrastructures. Information Research,
24(2), paper 819. http://www.informationr.net/ir/24-2/paper819.html
Huvila, I. (2019c). Management of archaeological information and knowledge in digital environment. In M. Handzic (Ed.), Knowledge management, arts and humanities (pp. 147–169).
Springer.
Huvila, I. (2019d). Rethinking context in information research: Bounded versus centred
sets. Information Research, 24(4), paper colis1912. http://www.informationr.net/ir/24-4/colis/
colis1912.html
Huvila, I. (2020a). Information-making-related information needs and the credibility of information. Information Research, 25(4), paper isic2002. http://informationr.net/ir/25-4/isic2020/
isic2002.html
Huvila, I. (2020b). Librarians on user participation in five European countries / Perspectives de
bibliothécaires sur la participation des utilisateurs dans cinq pays européens. Canadian Journal
of Information and Library Science, 43(2), 127–157.
Huvila, I., & Huggett, J. (2018). Archaeological practices, knowledge work and digitalisation.
Journal of Computer Applications in Archaeology, 1(1), 88–100.
Jackson, S. E., Richissin, C. E., McCabe, E. E., & Lee, J. J. (2020). Data-informed tools
for archaeological reflexivity: Examining the substance of bone through a meta-analysis of
academic texts. Internet Archaeology, 55.
Jancsary, D., Höllerer, M. A., & Meyer, R. E. (2016). Critical analysis of visual and multimodal
texts. In Methods of critical discourse studies (pp. 180–204). SAGE.
Jørgensen, E. K. (2015). Typifying scientific output: A bibliometric analysis of archaeological publishing across the science/humanities spectrum (2009–2013). Danish Journal of Archaeology,
4(2), 125–139.
Joyce, R. A. (2002). The languages of archaeology: Dialogue, narrative, and writing. Blackwell.
Kansa, E. C., Kansa, S. W., & Watrall, E. (Eds.). (2011). Archaeology 2.0: New approaches to
communication and collaboration. Cotsen Institute of Archaeology, UC Los Angeles.
Kohl, P. L., & Fawcett, C. (1995a). Archaeology in the service of the state: Theoretical
considerations. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics, and the practice
of archaeology (pp. 3–18). Cambridge University Press.
Kohl, P. L., & Fawcett, C. P. (Eds.). (1995b). Nationalism, politics, and the practice of archaeology.
Cambridge University Press.
Kristiansen, K. (2014a). Towards a new paradigm? The third science revolution and its possible
consequences in archaeology. Current Swedish Archaeology, 22, 11–34.
Kristiansen, K. (2014b). What is in a paradigm? Reply to comments. Current Swedish Archaeology,
22, 65–71.
Lacan, J. (1966). Écrits. Éditions du Seuil.
Laclau, E., & Mouffe, C. (2001). Hegemony and socialist strategy: Towards a radical democratic
politics (2nd ed.). Verso.
Lafrenz Samuels, K. (2016). Transnational turns for archaeological heritage: From conservation to
development, governments to governance. Journal of Field Archaeology, 41(3), 355–367.
Laužikas, R., Dallas, C., Thomas, S., Kelpšienė, I., Huvila, I., Luengo, P., Nobre, H., Toumpouri,
M., & Vaitkevičius, V. (2018). Archaeological knowledge production and global communities:
Boundaries and structure of the field. Open Archaeology, 4(1), 350–364.
Liang, J., Gong, J., & Li, W. (2018). Applications and impacts of Google Earth: A decadal review
(2006–2016). ISPRS Journal of Photogrammetry and Remote Sensing, 146, 91–107.
Lucas, G. (2001). Critical approaches to fieldwork: Contemporary and historical archaeological
practice. Routledge.
Lucas, G. (2019). Writing the past: Knowledge and literary production in archaeology. Routledge.
Manovich, L. (2001). The language of new media. MIT Press.
Marila, M. (2020). Introductory notes to a speculative epistemology of archaeology. Ph.D. thesis,
University of Helsinki.
Marliac, A. (2005). Scientific discourse and local discourses: The case of African archaeology.
International Journal of Historical Archaeology, 9(1), 57–70.
Martín-Rodilla, P. (2015). An empirical approach to the analysis of archaeological discourse.
In A. Traviglia (Ed.), Across space and time: Papers from the 41st conference on computer
applications and quantitative methods in archaeology, Perth, 25–28 March 2013 (pp. 319–
325). Amsterdam University Press.
Matthews, C. N. (2004). Public significance and imagined archaeologists: Authoring pasts in
context. International Journal of Historical Archaeology, 8, 1–25.
Miller, C. R. (1984). Genre as social action. The Quarterly Journal of Speech, 70(2), 151–167.
Miroff, L. E., & Versaggi, N. M. (2020). Community archaeology at the trowel’s edge. Advances
in Archaeological Practice, 1–11.
Mizoguchi, K. (1997). The reproduction of archaeological discourse: The case of Japan. Journal
of European Archaeology, 5(2), 149–165.
Morgan, C. (2016). Video games and archaeology. SAA Archaeological Record, 16(5), 9–10.
Morgan, C., & Wright, H. (2018). Pencils and pixels: Drawing and digital media in archaeological
field recording. Journal of Field Archaeology, 43(2), 136–151.
Moser, S. (2007). On disciplinary culture: Archaeology as fieldwork and its gendered associations.
Journal of Archaeological Method and Theory, 14(3), 235–263.
Nordbladh, J., & Yates, T. (2014). This perfect body, this virgin text: Between sex and gender in
archaeology. In I. Bapty & T. Yates (Eds.), Archaeology after structuralism: Post structuralism
and the practice of archaeology (pp. 222–237). Routledge.
Oikarinen, T., & Kortelainen, T. (2013). Challenges of diversity, consistency, and globality in
indexing of local archeological artifacts. Knowledge Organization, 40(2), 123–135.
Okamura, K., & Matsuda, A. (2011). New perspectives in global public archaeology. Springer.
Olsen, B. (2012). Archaeology the discipline of things. University of California Press.
Olsson, M. (2015). Making sense of the past: The information practices of field archaeologists. In
Presentation at the i3 conference, Aberdeen, Scotland.
Olsson, M. (2016). Making sense of the past: The embodied information practices of field
archaeologists. Journal of Information Science, 42(3), 410–419.
Palmer, M. H. (2013). (In)digitizing Cá uigú historical geographies: Technoscience as a postcolonial discourse. In A. Lünen & C. Travis (Eds.), History and GIS: Epistemologies,
considerations and reflections (pp. 39–58). Springer.
Pavel, C. (2010). Describing and interpreting the past: European and American approaches to the
written record of the excavation. Editura Universitatii din Bucuresti.
Pickering, A. (1995). The mangle of practice: Time, agency, and science. University of Chicago
Press.
Pluciennik, M. (1999). Archaeological narratives and other ways of telling. Current Anthropology,
40(5), 653–678.
Reisigl, M., & Wodak, R. (2009). The discourse-historical approach (DHA). In R. Wodak & M.
Meyer (Eds.), Methods of critical discourse studies (2nd ed., pp. 87–121). SAGE.
Richardson, L.-J. (2014). Public archaeology in a digital age. Ph.D. thesis, UCL.
Rostock, J. (2007). Arkæologi som forretning – om en diskurs med uheldige konsekvenser.
Arkæologisk Forum, 17, 33–39.
Said, E. W. (1979). Orientalism. Vintage Books.
Salminen, T. (2020). Arkeologian historia: tehtyä ja tehtävää. Muinaistutkija, 1, 35–47.
Scherzler, D. (2010). Das Ende des Frontalunterrichts Beobachtungen zu Archäologie und
Web 2.0 im Frühling 2011. Archäologische Informationen, 33(1), 99–111. http://www.dianescherzler.de/downloads/AI_33_Scherzler.pdf
Schlanger, N. (2012). Situations archéologiques, expériences coloniales. Les Nouvelles de
larchéologie, 128, 41–46.
Schnapp, A. (2012). La crise de l’archéologie, de ses lointaines origines à aujourd’hui. Les
Nouvelles de l’archéologie, 128, 3–6.
Shanks, M. (2012). The archaeological imagination. Left Coast Press.
Shanks, M., & Tilley, C. (1988). Social theory and archaeology. University of New Mexico Press.
Shnirelman, V. A. (1995). From internationalism to nationalism: Forgotten pages of Soviet
archaeology in the 1930s and 1940s. In P. L. Kohl & C. P. Fawcett (Eds.), Nationalism, politics,
and the practice of archaeology (pp. 120–138). Cambridge University Press.
Silliman, S. W. (2018). Engaging archaeology: 25 case studies in research practice. WiIey.
Sillitoe, P. (2002). Globalizing indigenous knowledge. In P. Sillitoe, A. Bicker, & J. Pottier (Eds.),
Participating in development: Approaches to indigenous knowledge (pp. 108–138). Routledge.
Simpson, F., & Williams, H. (2008). Evaluating community archaeology in the uk. Public
Archaeology, 7(2), 69–90.
Smith, L. (2004). Archaeological theory and the politics of cultural heritage. Routledge.
Smith, L. (2006). Uses of heritage. Routledge.
Smith, L. (2012). Discourses of heritage: Implications for archaeological community practice.
Nuevo mundo mundos nuevos.
Smith, L., & Campbell, G. (2017). The tautology of ‘intangible values’ and the misrecognition of
intangible cultural heritage. Heritage & Society, 10(1), 26–44.
Sommerlund, J. (2002). Demarcations and boundary objects: Scientific balancing acts in molecular microbial ecology. Ph.D. thesis, Copenhagen Business School.
Starzmann, M. T. (2012). Archaeological fieldwork in the Middle East: Academic agendas, labour
politics and neo-colonialism. In N. Schlanger, S. van der Linde, M. van den Dries, & C.
Slappendel (Eds.), European archaeology abroad: Global settings, comparative perspectives.
Sidestone Press.
Stobiecka, M. (2020). Archaeological heritage in the age of digital colonialism. Archaeological
Dialogues, 27(2), 113–125.
Stylianou-Lambert, T., & Bounia, A. (2016). The political museum. Routledge.

136

I. Huvila

Taylor, J., & Gibson, L. K. (2017). Digitisation, digital interaction and social media: Embedded
barriers to democratic heritage. International Journal of Heritage Studies, 23(5), 408–420.
https://doi.org/10.1080/13527258.2016.1171245
Thomas, J. (1993). Discourse, totalization and ‘the neolithic’. In C. Y. Tilley (Ed.), Interpretative
archaeology (pp. 357–394). Berg.
Tilley, C. (1989a). Discourse and power: The genre of the cambridge inaugural lecture. In D.
Miller, M. Rowlands, & C. Tilley (Eds.), Domination and resistance (pp. 40–62). Routledge.
Tilley, C. (1989b). Excavation as theatre. Antiquity, 63(239), 275–280.
Tilley, C. (1994). Interpreting material culture. In S. M. Pearce (Ed.), Interpreting objects and
collections (pp. 67–75). Routledge.
Trigger, B. G. (1989). A history of archaeological thought. Cambridge University Press.
Trigger, B. G. (1995). Romanticism, nationalism, and archaeology. In P. L. Kohl & C. P. Fawcett
(Eds.), Nationalism, politics, and the practice of archaeology (pp. 263–279). Cambridge
University Press.
Venclova, N. (2007). Communication within archaeology: Do we understand each other? European
Journal of Archaeology, 10(2–3), 207–222.
Wakefield, C. (2020). Digital public archaeology at must farm: A critical assessment of social
media use for archaeological engagement. Internet Archaeology, 55.
Walker, D. (2014). Decentering the discipline? Archaeology, museums and social media. AP:
Online Journal in Public Archaeology, S1, 77–102.
Waterman, S. (2014). Discourse and domination: Michel Foucault and the problem of ideology. In
I. Bapty & T. Yates (Eds.), Archaeology after structuralism: Post structuralism and the practice
of archaeology (pp. 79–103). Routledge.
Watson, S. (2019). Whither archaeologists? Continuing challenges to field practice. Antiquity,
93(372), 1643–1652.
Wetherell, M., & Potter, J. (1988). Discourse analysis and the identification of interpretative
repertoires. In C. Antaki (Ed.), Analysing everyday explanation: A casebook of methods (pp.
168–183). Sage.
White, H. (1975). Metahistory. Johns Hopkins University Press.
White, H. (1987). The content of form: Narrative discourse and historical representation. Johns
Hopkins University Press.
Williams, H., Pudney, C., & Ezzeldin, A. (2019). Public archaeology arts of engagement.
Archaeopress.
Wylie, A. (1985). Between philosophy and archaeology. American Antiquity, 50(2), 478–490. http:/
/www.jstor.org/stable/280505
Wylie, A. (2007). Doing archaeology as a feminist: Introduction. Journal of Archaeological
Method and Theory, 14(3), 209–216.
Wylie, A. (2017). How archaeological evidence bites back: Strategies for putting old data to work
in new ways. Science, Technology & Human Values, 42(2), 203–225.
Zorzin, N. (2015). Dystopian archaeologies: The implementation of the logic of capital in heritage
management. International Journal of Historical Archaeology, 19(4), 791–809.

Chapter 7

Dealing with Vagueness in Archaeological
Discourses
Cesar Gonzalez-Perez, Martín Pereira-Fariña, Patricia Martín-Rodilla,
and Leticia Tobalina-Pulido

Abstract Vagueness is an intriguing topic, especially in the humanities. It has been
treated as a problem that contaminates information and makes research harder,
but also as an expression of human subjectivity that enriches our accounts of the
world. Vagueness is studied by philosophers, treated by computer scientists, and
used by archaeologists intentionally or unintentionally. This chapter aims to provide
a comprehensive overview of how vagueness has been treated in philosophy
and computer science, and to offer a synthetic theoretical framework to operationalise
vagueness in archaeological discourses that can be applied for practical purposes.
To illustrate this, an empirical study is described.
Vagueness is everywhere, and archaeology is no exception. From quantitative
measurements or datings to uncertain function or use assessments, archaeologists
deal with imprecise, inaccurate and uncertain information all the time. In addition,
vagueness is strongly embedded in language. Human language contains a number
of mechanisms to express vagueness, such as hedges (“approximately”, “might”)
or ranges (“between 12 and 15”). This means that any archaeological discourse is
likely to employ devices like these to describe relevant information.
Vagueness is also a computational challenge. Common representations
of knowledge that are stored on computer systems discard vagueness, thus losing
nuance and richness. Computer scientists have tried to incorporate these aspects into
data through approaches such as fuzzy logic or protoforms, which provide richer
accounts of what is being represented, but at the cost of higher complexity.
This chapter begins with a philosophical introduction to vagueness, and then
describes how vagueness has been treated in computer science. A vagueness
framework is then proposed on that basis, and an empirical study is described
to illustrate practical applications.
Keywords Vagueness · Archaeological discourse · Imprecision · Inaccuracy ·
Uncertainty · Error

C. Gonzalez-Perez (✉) · L. Tobalina-Pulido
Incipit CSIC, Santiago de Compostela, Spain
e-mail: cesar.gonzalez-perez@incipit.csic.es; leticia.tobalina-pulido@incipit.csic.es
M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: martin.pereira@usc.es
P. Martín-Rodilla
Department of Computer Science and Information Technologies, University of A Coruña, A
Coruña, Spain
e-mail: patricia.martin.rodilla@udc.es
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_7

7.1 Philosophical Groundings of Vagueness
If you remove grains of rice from a pile one by one, at what point will the
pile cease being a pile? Where does Mount Everest start? How many hairs
must be removed from a head for it to become bald? What is the height of a tall
person? How many parts can be substituted in a classic car before it is no longer
original? Questions of this sort have been a matter of philosophical reflection since
Ancient Greece, when Eubulides of Miletus wondered what constitutes a “stone
heap” (van Deemter, 2010). This is the so-called Sorites paradox, which has been
formulated in a variety of ways; a typical one is the following:
1 stone does not make a heap.
If 1 stone does not make a heap, then 2 stones do not either.
If 2 stones do not make a heap, then 3 stones do not either.
...
Therefore, no finite number of stones will make a heap.
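
The chain of premises can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function names and the cut-off of 100 stones are hypothetical and do not come from the literature cited here. It contrasts a literal application of the premises with a crisp threshold, which avoids the paradox only by stipulating an arbitrary boundary.

```python
def tolerance_step(previous_is_heap: bool) -> bool:
    """Inductive premise: if n stones do not make a heap, n + 1 do not either."""
    return previous_is_heap


def is_heap_by_induction(stones: int) -> bool:
    """Apply the premises literally, one stone at a time."""
    verdict = False  # base premise: 1 stone does not make a heap
    for _ in range(1, stones):
        verdict = tolerance_step(verdict)
    return verdict


def is_heap_by_threshold(stones: int, threshold: int = 100) -> bool:
    """Crisp alternative; the threshold is an arbitrary, hypothetical cut-off."""
    return stones >= threshold
```

Under the premises, `is_heap_by_induction` never returns `True`, however large the number of stones. The threshold version does, but only because a single stone (the hundredth, in this sketch) is made to carry the whole difference between a heap and a non-heap.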

This paradox shows the key point of vagueness, a phenomenon that is usually
defined as the existence of borderline cases (van Deemter, 2010; Hyde, 2008; Keefe,
2000; Williamson, 1996). This is a sort of grey area where there is no clear cut
between what is the case and what is not, such as in the case of “heap” above.
Vagueness is a very common phenomenon in natural language, which should not be
confused with other linguistic phenomena such as ambiguity, meaninglessness or
uncertainty.
Next, we will introduce an overview of the concept of vagueness and the key
points of four philosophical approaches to vagueness.

7.1.1 Philosophical Approaches to Vagueness
Vagueness is an ambiguous concept, and is itself vague. In a technical sense (Hyde,
2008), it should not be confused with the concepts of ambiguity, meaninglessness
or uncertainty. Thus, a statement such as “I saw you in the bank” is not vague but
ambiguous, because it has well-defined meanings (the speaker could have seen the
hearer either in a bank office or sitting on the bank of the river having a picnic) but
we do not know which one the speaker is referring to. Regarding meaninglessness, we
say that an expression such as “boomrap” is meaningless because its reference is
unknown, and this does not entail the existence of borderline cases. Lastly, a sentence
such as “she will arrive on time or 10 minutes late” is uncertain: it shows that we do
not know exactly when a future event is going to happen, but we still have a clear
criterion to determine whether the statement is true or false.
Vagueness is complex as well. Different kinds of vagueness can be recognised
in natural language (Hyde, 2008), such as degree-vagueness and combinatory-vagueness, or vagueness in concept application or individuation, but all of them show a certain
degree of resemblance. On the one hand, we can talk about absolute borderline
cases, which refer to those predicates for which, no matter how much conceptual
analysis or empirical investigation we do, we will never be able to determine
whether the predicate can be applied or not (e.g., “bald”, “heap”, etc.). On the
other hand, we can talk about compositional borderline cases, which derive from
a combination of a variety of (eventually) crisp features or conditions that generates
a grey area of borderline cases. Consider, for instance, the concept of country: although we
can provide a more or less well-defined list of features that a country must satisfy,
there are certain territories whose status is debatable, despite their satisfying each
criterion independently. Some well-known examples are Hong Kong, Kosovo, or
Palestine.
For the sake of simplicity, we just focus on absolute borderline cases, such as
predicates like “tall”, “heap”, “bald”; hedges like “approximately”, “more or less”,
etc.; quantifiers like “many”, “most”, etc. This is precisely the starting point of
the contemporary studies of vagueness. Russell, in his seminal paper Vagueness
(Russell, 1923), set the grounds for the contemporary studies of vagueness arguing
that vagueness is a purely linguistic phenomenon, a semantic one. The main function
of natural language is the representation of the world but, as a representational
system, it is imperfect, and vagueness is some kind of defect in it (Hyde, 2008).
Well defined formal languages, such as classical logic, are free of this problem;
therefore, according to this view, the only right way to address vagueness is to
eliminate it. This approach is also compatible with Russell’s logical atomism project,
which rejected the existence of vague entities and held that neither classical
logic nor semantics is suitable for the analysis of natural language.
Russell’s philosophical project lost its prevalence when a new conception of
natural language emerged in the 1950s: ordinary language is useful for much more
than just representing or talking about the world; actually, when we are saying
something, we are doing something (Austin, 1989; Wittgenstein, 1989). Under
this new framework, new views of vagueness arose, the epistemic and the
pragmatic being the most relevant ones.
The epistemic view (Williamson, 1996) argues that vagueness is not a semantic
problem but an epistemic one. Its central point is to reject the existence of borderline
cases: predicates always have sharp boundaries, but we just do not know where they lie.
Therefore, linguistic vagueness is derived from a certain type of ignorance, namely
lack of information. Like the semantic view, the epistemic view assumes that reality
is crisp, i.e. there are no vague objects (Evans, 1978).
The pragmatic view (van Deemter, 2010) assumes the existence of borderline
cases, but rejects that this constitutes a semantic or epistemic problem. Vagueness
arises as a feature (not a defect) of the relation between the users of language
and language itself. Therefore, vagueness cannot be studied independently of its
communicative function. Moreover, vagueness is not just present in language, but
everywhere. It appears in many other realms, such as the classification of species
(van Deemter, 2010, Chap. 2) or attempts to define what obesity means (van Deemter,
2010, Chap. 3). Thus, the aim of this view is not to eliminate vagueness but to understand
how we handle it when it appears and how we adopt specific conventions to interpret it
according to context (Keefe, 2000). With respect to the existence of vague entities in the world,
the pragmatic approach presupposes the existence of vague properties, objects and
even identities (Akiba, 2014).

7.1.2 Theories of Vagueness
As we said, Russell’s goal was to eliminate vagueness from natural language
because it was a defect. However, today we know that this is impossible, and more
recent approaches to vagueness aim to deal with it as a feature of language. The
three most relevant theories for this are supervaluationism, subvaluationism, and
gradability.
Supervaluationism is based on bivalued classical semantics, that is, having true
and false as the only proper truth values. It assumes that the truth value of a vague
statement can be indeterminate, i.e., vagueness generates truth value gaps (Keefe,
2000). This leads to a non-classical semantics, since, for instance, the truth
value of a sentence does not depend only on its constituents (solving, in this way,
the Sorites paradox by rejecting the conjunction of its premises), although the semantics can be
classical for individual specifications. Therefore, supervaluationism is a sort of non-classical metalanguage compatible with a classical logic study of language.
Subvaluationism (Hyde, 2008) is the counterpart of supervaluationism and a
form of paraconsistent logic (a conjunction of contradictory statements can be
collectively true while not being all individually true). Instead of admitting truth
value gaps, this theory admits truth value gluts, i.e., predicates or statements
expressing borderline cases can be true and false simultaneously. As opposed to
supervaluationism, subvaluationism rejects classical semantics and the law of the
excluded middle; therefore, it is not compatible with classical logic. To avoid this
criticism, subvaluationism argues that accepting the overlapping of truth values in
a sentence does not entail accepting it for every sentence, but still, it has detractors
(Keefe, 2000).
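
The contrast between truth value gaps and truth value gluts can be sketched computationally. The Python below is a minimal illustration, not an implementation of either theory: it assumes that the admissible specifications of the predicate “tall” can be enumerated as a set of height cut-offs, and the range of cut-offs and the function names are hypothetical. Each function returns a pair (counts as true, counts as false).

```python
# Hypothetical admissible cut-offs (in cm) for "tall"; for illustration only.
ADMISSIBLE_CUTOFFS_CM = range(175, 191)


def verdicts(height_cm: float) -> list[bool]:
    """Classical verdict of 'is tall' under each admissible specification."""
    return [height_cm >= cutoff for cutoff in ADMISSIBLE_CUTOFFS_CM]


def supervaluation(height_cm: float) -> tuple[bool, bool]:
    """True only if true under all specifications; false only if false under all."""
    v = verdicts(height_cm)
    return all(v), not any(v)


def subvaluation(height_cm: float) -> tuple[bool, bool]:
    """True if true under some specification; false if false under some."""
    v = verdicts(height_cm)
    return any(v), not all(v)
```

For a height of 180 cm, which some cut-offs accept and others reject, `supervaluation` returns `(False, False)`, a gap, whereas `subvaluation` returns `(True, True)`, a glut. For clear cases such as 160 cm or 200 cm the two functions agree.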
Supervaluationism and subvaluationism are theories of vagueness that reject the
epistemic view of vagueness, as they consider vagueness to be not a matter of
ignorance but a semantic phenomenon. However, they agree with the epistemic view
regarding the non-existence of vagueness of the world (i.e. there are no vague objects):
there is vagueness in the world only as a result of how our representational tools and
skills work.
Gradability, the third main theory of vagueness, defends the existence of vague
properties and vague objects. Vague predicates, such as “tall” or “small”, are vague
because they allow us to talk about properties that are essentially vague. Vague
nouns, such as “mountain”, or determiners, such as “many”, denote vague objects
or situations because there is no way to define perfect boundaries for them. However,
the fact that a predicate is vague does not mean that it can only refer to borderline
cases; on the contrary, crisp cases can exist as well. For instance, if I am at the top of
a mountain, I am clearly on the mountain; if I am in the valley next to the mountain
and I can see the mountain, I am clearly not on the mountain. If I start to walk towards
the mountain, I will be on the mountain at some point without being aware of when that
change happened; in other words, there is no dichotomy between being on and off
the mountain, but a continuum (van Deemter, 2010). Thus, gradability is a matter
of “to what extent?” and assumes that truth values constitute a continuum. This
is the foundation of many-valued logics, such as fuzzy logic (discussed in further
sections of this chapter), which define a functional semantics that rejects the law
of the excluded middle but allows us to define the meaning of expressions such as
“almost true”.
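
A small sketch may help to see what a continuum of truth values means in practice. The Python below assumes, purely for illustration, that the degree to which a walker is on the mountain grows linearly with elevation between a valley floor and a summit; the elevations, the 0.9 cut-off for “almost true” and the function names are hypothetical. Negation and disjunction follow the usual min/max operators of fuzzy logic.

```python
def on_mountain_degree(elevation_m: float,
                       valley_m: float = 400.0,
                       summit_m: float = 1200.0) -> float:
    """Piecewise-linear degree of truth: 0 in the valley, 1 at the summit."""
    if elevation_m <= valley_m:
        return 0.0
    if elevation_m >= summit_m:
        return 1.0
    return (elevation_m - valley_m) / (summit_m - valley_m)


def fuzzy_not(degree: float) -> float:
    return 1.0 - degree


def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)


def is_almost_true(degree: float, cutoff: float = 0.9) -> bool:
    """Hypothetical reading of 'almost true' as a high but not full degree."""
    return cutoff <= degree < 1.0
```

Halfway up, at 800 m, the degree is 0.5, and “on the mountain or not on the mountain” evaluates to `fuzzy_or(0.5, fuzzy_not(0.5))`, that is 0.5 rather than 1. This is the sense in which such semantics give up the law of the excluded middle.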
The pragmatic view is the main supporter of the gradability approach. The
assignment of a truth value to a specific function is a matter of context and social
convention, in other words, a matter of how the predicate is being used by people.
This limits its logical study in classical terms and requires a more empirical study
based on speakers and language use in different contexts. Defining a representative
sample for studies like this constitutes a challenging issue. In spite of that, current
studies suggest that contextual circumstances and background information make
users choose the crisp definition that is most suitable for each communicative
situation, although this can change if the circumstances change (van Deemter, 2010).
An open question affecting all of these theories is higher-order vagueness (Hyde,
2008): Is it possible to define a clear-cut boundary for the border of borderline cases?
Of course, this question can be recursively formulated ad infinitum. This generates
additional questions, such as how many truth values are necessary to handle vague
predicates, but these kinds of questions are beyond the scope of this book.
Next, we describe how these different views have been used for the
development of mathematical and computational tools for handling vagueness in
practical applications.

7.2 Computational Treatments of Vagueness
The theory of computation has inherited many of the approaches analysed in the
previous section for applications in computer systems, which has allowed us to
incorporate vagueness into algorithms and decision-making systems. Note that the
works in this area belong to one (or several) specific disciplines within computer
science and computation theory. For this reason, the purpose of this section is not to
make an exhaustive review of all of them, but to provide the reader with an overview
of the most outstanding computational studies and applications of vagueness, paying
particular attention to the approaches applied to archaeological discourses.
Firstly, we can find strongly mathematical treatments of vagueness. Most of
them are based on a margin-of-error paradigm, such as the Interval Predictor Model (Lacerda & Crespo, 2017), which estimates uncertainty regions
for the information contained. Statistical-based approaches are also common in
mathematical-computational modelling of vagueness, which generally associate
probability functions with especially vague attributes (features of each data type) of
the information that we are modelling. The probability functions (Fermüller et al.,
2017) can be indicators of the precision (used in inferential statistics) or the certainty
degree of the attribute values (that is, measures of error for a given value). These
solutions explicitly model different aspects or dimensions of vagueness and have
been widely applied for years in archaeological contexts, such as in studies with GIS
data (Lieskovský et al., 2013; Runz et al., 2007) or 3D reconstructions (Nicolucci
& Hermon, 2010; de Runz et al., 2013), or structured data and classification
mechanisms (Hermon & Niccolucci, 2002). However, these models treat the
vagueness of the information as a function of margin of error, a semantic paradigm
which assumes that some dimension of the vagueness is always due to the deviation
of our knowledge from a “true” reference value, and tries to compute that
deviation (Martin-Rodilla et al., 2019b). This approach is hardly generalizable
when the data sources are unstructured, expressed in natural language and from
humanistic or social domains (Martin-Rodilla et al., 2019b; PROVIDEH, 2018).
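
The margin-of-error paradigm can be illustrated with a very small data structure. The sketch below is an assumption-laden example, not a reproduction of any of the cited models: the class name, the field names and the figures used in the usage note are hypothetical. A value is stored together with the half-width of its error margin, and two values are considered compatible when their intervals overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float   # best estimate of the assumed "true" reference value
    error: float   # half-width of the margin of error, in the same unit

    @property
    def interval(self) -> tuple[float, float]:
        return (self.value - self.error, self.value + self.error)

    def compatible_with(self, other: "Measurement") -> bool:
        """Two measurements are compatible if their intervals overlap."""
        low_a, high_a = self.interval
        low_b, high_b = other.interval
        return low_a <= high_b and low_b <= high_a
```

For example, `Measurement(2450, 40)` and `Measurement(2500, 30)` overlap, while `Measurement(2450, 40)` and `Measurement(2600, 20)` do not. The representation works well for structured, numerical data, but it has no obvious place for a statement such as “the structure was probably a dwelling”, which is the limitation discussed above.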
Secondly, there are mathematical accounts of vagueness grounded on philosophical ideas (such as those of Black, 1937). These works laid the foundations
for the modelling of the complex phenomenon of vagueness at a computational
level. Years later, and based on Black’s works, Zadeh’s fuzzy logic-based theory
(Zadeh, 1996) constituted a milestone at the computational level, allowing for the
first time the computational representation and treatment of vagueness. It is a less
error-focused approach (Zadeh, 1996, 2010), which develops specific techniques
(for example, fuzzy sets and degrees of probability, rule bases, linguistic summaries
such as fuzzy descriptions of variables or fuzzy quantifiers, and similarity measures)
to model vague aspects of information. Since then, fuzzy logic has been widely
used as a method of representing vagueness in computer systems, and implementations
of fuzzy logic of different kinds can be found (de Silva, 1995; Syropoulos,
2016), including some attempts in archaeology (Baxter, 2009; Reeler, 1999; Taheri et
al., 2019). We can also find some novel approaches from computational theory itself,
closer to classical programming language representations (Coletti, 2020) or
developing specific informational ambiguity metrics (Fabbrini et al., 2001; Fantechi
et al., 2018).
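
As an illustration of what a linguistic summary with a fuzzy quantifier looks like, the sketch below evaluates the sentence “most vessels are large” over a list of diameters. It follows the commonly used scheme in which the average membership in the fuzzy set is passed through the membership function of the quantifier. The breakpoints of “large” and “most”, the function names and the data are hypothetical and chosen only for the example.

```python
def ramp(x: float, low: float, high: float) -> float:
    """Linear membership rising from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def large(diameter_cm: float) -> float:
    """Hypothetical fuzzy set 'large': fully large from 20 cm upwards."""
    return ramp(diameter_cm, 10.0, 20.0)


def most(proportion: float) -> float:
    """Hypothetical fuzzy quantifier 'most' over a proportion in [0, 1]."""
    return ramp(proportion, 0.3, 0.8)


def truth_of_most_are_large(diameters_cm: list[float]) -> float:
    """Degree of truth of the summary 'most vessels are large'."""
    if not diameters_cm:
        raise ValueError("the summary is undefined for an empty collection")
    proportion = sum(large(d) for d in diameters_cm) / len(diameters_cm)
    return most(proportion)
```

With the diameters `[20, 25, 30, 5]`, three of the four vessels are fully large, the proportion is 0.75, and the summary holds to a degree of about 0.9. The result is a degree of truth for a sentence, not an error estimate for a value.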
In recent years, some comprehensive approaches have been developed in
software engineering, including vagueness aspects within conceptual models of
software applications (Abualdenien & Borrmann, 2020; Martin-Rodilla et al.,
2019b), especially after various attempts to identify features for which vague information can be conceptually modelled, such as set membership, interval membership,
incompleteness, and related issues (Jing et al., 2008). At the software level, this
conceptualization has been translated into implementations in both relational and non-relational databases (Martin-Rodilla et al., 2019a).
All these approaches have been successfully applied in heterogeneous domains:
bio-medical sciences (He & Smit, 2021), logistics (Ottomanelli & Wong, 2011), e-government and infrastructures, energy resources, etc., and in the so-called
expert or decision-making assistance systems (de Silva, 1995). Their implementation involves, in most cases, certain adaptations of fuzzy logic depending on
the nature of the domain data. However, the intrinsic presence of vagueness in
natural language is still a challenge for the formal representation of fuzzy theory
in computation, especially when the source we wish to deal with at a computational
level is discourse in humanistic and social science domains, characterized by a
great presence of vagueness in the produced narratives (Hermon & Niccolucci,
2002; Martin-Rodilla et al., 2019b). In order to address this challenge, many of the
current fuzzy-level approaches in computing incorporate certain interdisciplinary
and hybrid approaches that specifically attempt to improve the computational
treatment of vagueness in natural language narratives. We can find, for example,
implementations of fuzzy logic with linguistic characteristics, such as HFLTS (Ashtiani & Azgomi, 2016), Fuzzy Natural Logic (Novak, 2017) or hybrid approaches
for incorporating linguistic aspects into new protoforms (abstracted models whose
instances represent knowledge about data (Zadeh, 2002)) in fuzzy logic (Ramos-Soto & Martin-Rodilla, 2021), and specific studies of the performance of these
models in applications that use natural language, such as automated question
answering (Q&A) systems (Gupta et al., 2018) or automatic generation of
language (Ramos-Soto & Martin-Rodilla, 2021). Also, linguistic-analytical studies
are common in this area, determining metrics to quantify aspects of vagueness in
specific linguistic categories or expressions (in different languages) such as markers
(Malyuga & McCarthy, 2018) or adjectives (Gasmi & Bourahla, 2017; Lassiter
& Goodman, 2017), and even analytically selecting the most determining aspect
of vagueness given an expression in natural language (that can express or reflect
various aspects of vagueness at the same time) (Raskin & Taylor, 2014).
Continuing with the hybrid approaches with a linguistic component, interdisciplinary research lines from linguistics apply a corpus-based approach to the
problem of the representation of vagueness at a computational level, analysing
large volumes of texts in different languages and then automating the detection
of patterns and/or specific expressions with a vague component (Lebanoff & Liu,
2018; Rashkin et al., 2017). This approach is used both with specific languages,
such as Romanian, English or German (Dinu et al., 2017; Leto Russo, 2019; Li,
2019; Quammie-Wallen, 2021), and with multi-language parallel corpora such
as Russian/English (Malyuga & McCarthy, 2018) or German/Spanish/Mandarin
(Cutting, 2019), among others, and is applied with different corpus sources and
goals, such as medical corpora, analysis of fake news (Rashkin et al., 2017), political
discourses (Leto Russo, 2019; Rashkin et al., 2017) or historical corpora (Dinu et
al., 2017; Toledo, 2017). Finally, there are computational implementations arising
from formal theories of discourse, such as Rhetorical Structure Theory (RST) (Mann
& Thompson, 1987), and/or formalizations of argumentation, such as Inference
Anchoring Theory (Janier et al., 2016). However, their specifications do not deal
with aspects of vagueness explicitly, leaving the categorization of
vague expressions for the modeller to tackle by hand. Also, some of the attempts
to automate RST (for example, Joty et al., 2015) apply automatic classification
models to discourse analysis, but their implementation does not explicitly
include vagueness support.
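
To give a flavour of what automating the detection of vague expressions can involve at its simplest, the sketch below scans a text for a few hedges and for numeric ranges. It is a toy example under stated assumptions, not one of the cited systems: the hedge list only reuses expressions mentioned earlier in this chapter, the patterns are hypothetical, and real corpus-based approaches rely on much richer linguistic resources and on statistical or machine learning models.

```python
import re

# Illustrative hedge list; a real lexicon would be far larger and language-specific.
HEDGES = ["approximately", "more or less", "might"]

HEDGE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(h) for h in HEDGES) + r")\b",
    re.IGNORECASE,
)
# Numeric ranges such as "between 12 and 15".
RANGE_PATTERN = re.compile(
    r"\bbetween\s+\d+(?:\.\d+)?\s+and\s+\d+(?:\.\d+)?\b",
    re.IGNORECASE,
)


def find_vague_expressions(text: str) -> list[tuple[str, str]]:
    """Return (kind, expression) pairs in order of appearance in the text."""
    found = [(m.start(), "hedge", m.group(0)) for m in HEDGE_PATTERN.finditer(text)]
    found += [(m.start(), "range", m.group(0)) for m in RANGE_PATTERN.finditer(text)]
    return [(kind, expression) for _, kind, expression in sorted(found)]
```

Applied to the sentence “The wall is approximately 3 m long and might contain between 12 and 15 courses”, the function reports two hedges and one range. It says nothing about which kind of vagueness each expression conveys, which is precisely the decision that current discourse formalisms leave to the modeller.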
In summary, the computational treatment of vagueness was in the past, and
continues to be in the present, deeply influenced by mathematical models based
on margin-of-error paradigms or by fuzzy logic models, the latter recently incorporating
updated and hybridized variants. In the case of margin-of-error models, their
semantics makes it difficult to apply them to archaeological discourses and
their vagueness, due to the subjective, narrative-based and vague nature of these
discourses. In the case of the fuzzy logic-based models, the new approaches to hybrid intelligence
and hybrid corpus methodologies with machine learning algorithms show certain
promising results (Gupta et al., 2018), although their application in humanities
domains (and in particular with archaeological discourses as sources) is still novel,
marginal, and not standardized across projects.
Further progress in this area would be possible with formal theories that allow a
certain degree of standardization of the computational treatment of vagueness in
archaeological discourses, flexible enough to be applied across projects and to include
the different dimensions of vagueness in archaeological discourse analysis, together
with algorithms and multilingual techniques for treating them. The following sections
address this need and propose an
approach for the specific conceptualization of vagueness based on the previous
philosophical foundations, and especially oriented to application in archaeology
through the computational techniques described.

7.3 Concept of Vagueness as a Conceptual Modelling Issue
Having described how vagueness has been tackled in philosophy and
computation, we now propose a conceptualisation of vagueness that is grounded on
conceptual modelling (Gonzalez-Perez, 2018; Olivé, 2007). Conceptual modelling
is a discipline that aims to provide theories and techniques to represent the world in a
manner that is useful for humans to build information systems. It is closely related to
ontologies and ontology engineering, but the latter emphasises machine processing
whereas conceptual modelling has traditionally emphasised human communication
(Gonzalez-Perez, 2017). Conceptual models are often expressed in terms of types
and tokens (Wetzel, 2018). Types correspond to what categories exist in the world
(often called classes), what properties these have (attributes), and how they relate to
one another (via associations), whereas tokens correspond to what class instances
exist (objects), what properties they exhibit (values), and how they connect to each
other (links).
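As a minimal sketch of the type/token distinction, the following Python fragment declares two hypothetical classes with attributes and an association (types), and then creates objects with values and a link (tokens). The class names, attribute names and values are illustrative assumptions and do not come from any published model.

```python
from dataclasses import dataclass, field
from typing import List


# Types: classes, their attributes, and the associations between classes.
@dataclass
class Structure:
    name: str        # attribute
    height_m: float  # attribute


@dataclass
class Site:
    name: str                                                   # attribute
    structures: List[Structure] = field(default_factory=list)  # association


# Tokens: objects, the values they exhibit, and the links between objects.
tower = Structure(name="Main tower", height_m=22.7)  # object with two values
site = Site(name="Example site")                     # object with one value
site.structures.append(tower)                        # link from site to tower
```

In these terms, vagueness as discussed next would affect the token level: whether `tower` exists, what its `height_m` value is, or whether it is linked to `site`.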
In this context, we see vagueness as a property related to the imperfection, lack
of clarity, lack of detail, doubt, and unreliability of object existence, values or links.
Vagueness cannot be defined comprehensively, as it encompasses phenomena of
very different nature. It is not a classical category having a criteria-based definition,
but rather a Lakoffian radial category (Lakoff, 1990) comprising phenomena having
a rough family resemblance.

7.3.1 Sources of Vagueness
In previous works, we have argued that vagueness originates mostly from two kinds
of sources: ontological and epistemic (Gonzalez-Perez, 2018; Tobalina-Pulido &
Gonzalez-Perez, 2020).
Ontological Vagueness comes from the fact that some things in the world do not
have clear-cut or perfectly defined boundaries but rather exhibit gradual change.
A typical example, often used in philosophy and presented in a previous section
of this chapter, is that of the limits of a hill. If we are sitting at the top of a hill,
we can certainly state that we are on the hill. Similarly, if we are sitting at the
bottom of the valley, we can safely state that we are not on the hill. However,
if we start walking from the top of the hill downward towards the valley, there
is no clear boundary at which we switch from being on the hill to not being on
it. Rather, it is a gradual change, and therefore we say that the limits of the hill
are ontologically vague. Future or imaginary events are also a common source of
ontological vagueness. For example, the precise duration of a task planned for the
future has not been established yet (since the task has not yet been carried out), and
therefore any statement about it is ontologically vague.
Epistemic Vagueness comes from the fact that our knowledge about the world is
usually imperfect and incomplete. For example, we may not be sure about how many
children Alexander the Great had. Certainly, he either had none or one or two, etc.
But since we are not sure, any statement that we make about it will necessarily be
epistemically vague. Also, epistemic vagueness is sometimes purposefully injected
into a statement by blurring or hedging what we say for semantic purposes. For
example, we may say that there were between 15 and 20 houses in a village that we
visited last year, as we cannot recall the specific number. Using an interval instead
of committing to a particular number indicates our lack of perfect knowledge for the
sake of being correct.


7.3.2 Vagueness Variables
In order to study vagueness, we must operationalise it in the form of different
variables that can be described and measured. We propose the following.
7.3.2.1 Imprecision

Imprecision is the absence of detail in a statement in relation to what is being
represented. In other words, a statement is very precise if it contains a lot of detail.
For example, if we say that “Mount Everest is 8800 m high”, we are being quite
imprecise, as Everest is in fact 8848.86 m high according to the latest measurements.
Expressions such as “more or less”, “roughly” or “approximately” usually
indicate imprecision. However, imprecision can be present even in the absence of
any lexical marker, like in the previous example. It is common that, especially in
colloquial language, detail is removed for simplicity and conciseness.
Imprecision is a property of statements, regardless of what they aim to represent.
Imprecision may originate from:
• The inherent ontological vagueness of the entity being described. For example,
a statement such as “the Himalayas is 700 km long” is imprecise due to the
ontological vagueness of the length of any mountain range. No matter how
good our instruments and technologies are, we cannot obtain a meaningful
measurement that is much more precise.
• The limitation of our instruments and technologies. For example, a common field
thermometer can tell us that the temperature is 19 °C, or perhaps 19.3 °C. These
measurements are likely to be imprecise in the sense that expressing the actual
temperature may require more significant figures. This kind of limitation is often
called the systematic error of the instrument or technology, and should not be
confused with the measurement error, as described in further sections.
• The intentional removal of detail. For example, if we are unsure about how many
children Alexander the Great had, we can say that “he had between 4 and 6”. This
lacks detail but, as described in further sections, is a more reliable expression than
a more detailed one.
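For numeric statements, one rough way to make "amount of detail" tangible is to count significant figures. The sketch below does this under two assumptions of ours that the chapter does not state: that significant figures are an acceptable proxy for detail, and that trailing zeros in a whole number (as in 8800) carry no detail. The function name is hypothetical.

```python
from decimal import Decimal


def significant_figures(stated: str) -> int:
    """Count significant figures, treating trailing zeros as not significant."""
    digits = Decimal(stated).normalize().as_tuple().digits
    return len(digits)


# Figures quoted in the examples above: Everest heights and thermometer readings.
for stated in ("8800", "8848.86", "19", "19.3"):
    print(stated, significant_figures(stated))
```

Fewer significant figures mean less detail and hence more imprecision. Interval statements such as “between 4 and 6” would need a different proxy, for instance the width of the interval.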
7.3.2.2 Inaccuracy

Inaccuracy is the difference between the content of a statement and the entity being
represented by it. In other words, a statement is very accurate if it describes this
entity very faithfully. For example, if we state that “there are 21 post holes in this
area” when in fact there are 21 post holes, we are being fully accurate. However, if
in reality there are 16 post holes, then we are being quite inaccurate.
Inaccuracy is a property of the relationship between a statement and that which
it aims to represent. For this reason, inaccuracy can be difficult to detect by looking
at the statements only. We need to compare what is being said to what is being
described in order to ascertain inaccuracy.
Inaccuracy usually originates from epistemic vagueness. In other words, it is our
lack of knowledge that makes our statements inaccurate.
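When both the statement and the entity it describes can be reduced to a single number, inaccuracy can be sketched as the absolute difference between the two. The function name and this numeric reduction are illustrative assumptions, and the actual value is rarely available in practice.

```python
def inaccuracy(stated: float, actual: float) -> float:
    """Absolute difference between what is stated and what is actually the case."""
    return abs(stated - actual)


# "There are 21 post holes in this area"
print(inaccuracy(21, 21))  # 0: fully accurate
print(inaccuracy(21, 16))  # 5: quite inaccurate
```

Note that the second argument comes from the world, not from the text, which is why inaccuracy cannot be established by reading the statement alone.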
7.3.2.3 Uncertainty

Uncertainty is the degree of doubt that we possess about a statement. In other words,
a very certain statement is one about which we feel highly confident. For
example, in a statement such as “I think this should be a burial site”, the lexical
markers “I think” and “should” clearly point at uncertainty and indicate that the
speaker harbours some doubt about its content.
Markers such as “perhaps”, “maybe” or “I think” usually indicate uncertainty.
Uncertainty is a property of statements, regardless of what they aim to represent.
Uncertainty usually originates from epistemic vagueness, that is, our imperfect
knowledge about the world.
7.3.2.4 Error

Error is the difference between the contents of two or more statements that aim
to represent the same thing. In other words, an error-free statement is one that
coincides with other statements representing the same thing. For example, if we
measure the length of a tomb twice and obtain 14.2 m and 13.9 m as results, the
error is given by the fact that the two measurements do not coincide. The larger the
difference, the larger the error.
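For numeric statements about the same thing, this definition can be sketched as the largest pairwise difference among them. Taking the maximum is our assumption; the text only says that a larger difference means a larger error.

```python
from itertools import combinations


def error(measurements):
    """Largest pairwise difference among statements that represent the same thing."""
    return max(abs(a - b) for a, b in combinations(measurements, 2))


tomb_lengths_m = [14.2, 13.9]  # the two measurements of the tomb
print(round(error(tomb_lengths_m), 2))  # 0.3
```

Unlike inaccuracy, this computation needs no knowledge of the actual length of the tomb: it compares statements with each other only.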
Note that error only makes sense when the two (or more) statements represent the
same thing. However, determining whether two statements represent the same thing
or not can be tricky. For example, imagine two specialists producing independent
reports on the conservation status of a monument, the first one concluding that the
monument does not need restoration and the second one concluding that it does. This
can only be considered erroneous if the specialists used the same techniques and
approaches to assess the state of the monument and share the same goals. As soon as
significant subjective issues are introduced, it can be argued that the two statements
do not represent the same thing and, therefore, no error exists. For example, if two
neighbours provide their respective opinions on the worth and value of a derelict
monument risking demolition, we should not appeal to error when they disagree, as
their opinions cannot be considered representations of the monument itself but of
the particular values and preferences of each neighbour.
Error, as defined by this proposal, includes measurement error, which is a
property of how instruments and techniques are used for a specific task. In our
previous example about measuring a tomb twice, perhaps the person making
the measurements did not secure the measurement tape correctly and it slipped,
producing an error. However, error in this proposal excludes systematic error, which
is a property of the instruments and technologies that we use rather than our use of
them, and which we consider to be part of imprecision as described in a previous
section.

7.3.3 Relationships Between Vagueness Variables
Several relationships exist between vagueness variables.
7.3.3.1 Imprecision Decreases Uncertainty

In general, the more imprecise a statement is, the less uncertain it becomes. In other
words, removing detail from a statement tends to make it more certain, because the
less detail it contains, the more possibilities it covers, the weaker the commitment it entails,
and the better the chances that it is correct.
For example, the statement “Alexander the Great had 5 children” is very precise,
as it conveys a specific figure. However, the only state of affairs that would make it
true is if Alexander the Great had indeed 5 children, so we are reasonably uncertain
about it. Changing the statement to “Alexander the Great had 4 or 5 children”
removes some detail and makes the statement more imprecise, but also makes it
more certain as we are now covering more options than before. Changing it further
to “Alexander the Great had between 0 and 50 children” makes it extremely certain
but also extremely imprecise.
Similarly, the statement “the site was abandoned in 2257 BCE” is quite precise
but probably very uncertain, as it is unlikely that we have reliable information about
an event that happened over 4000 years ago. Hedging the statement as “the site was
abandoned around 2200 BCE” removes detail but gains in certainty. And adding a
margin, as in “the site was abandoned in 2200 BCE ± 300”, makes it much more
certain, albeit less precise.
Precision makes a statement informative, whereas certainty makes it reliable. A
statement such as “the site was abandoned in 2257 BCE” is very informative, but
it can be unreliable if we are not certain of what it says. At the opposite end of the
spectrum, a statement such as “Alexander the Great had between 0 and 50 children”
provides nearly no information, but is highly reliable as it is almost certainly true.
Often, we must find a useful balance between imprecision and uncertainty, for
example by purposefully removing detail to gain certainty, which is a common
technique in, for example, archaeological dating.
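The trade-off can be illustrated with a toy model in which our knowledge is a set of degrees of belief over possible values, and the certainty of an interval statement is the share of belief that it covers. Both the model and the belief figures are invented for illustration; they are not historical data nor part of the proposed theory.

```python
# Hypothetical degrees of belief about how many children Alexander the Great had.
# These numbers are invented for illustration only and sum to 1.
belief = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.20, 4: 0.20, 5: 0.15, 6: 0.10, 7: 0.05}


def certainty(low, high):
    """Share of belief covered by the statement 'between low and high children'."""
    return sum(p for n, p in belief.items() if low <= n <= high)


def imprecision(low, high):
    """Number of distinct values that the statement leaves open."""
    return high - low + 1


# "5 children", "4 or 5 children", "between 0 and 50 children"
for low, high in [(5, 5), (4, 5), (0, 50)]:
    print(f"{low}-{high}: imprecision={imprecision(low, high)}, "
          f"certainty={certainty(low, high):.2f}")
```

Each widening of the interval raises both figures at once: the statement becomes more reliable and less informative.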
7.3.3.2 Imprecision Increases Inaccuracy

In general, the more imprecise a statement is, the more inaccurate it becomes. In
other words, removing detail from a statement tends to make it less faithful to what
it aims to describe, as the absence of detail moves the stated content towards nice
round figures which, under the usual interpretation, are less likely to be accurate.
For example, the statement “the main tower is 22.7 m tall” is quite precise, and
probably quite accurate if the measurement was properly done. Instead, “the main
tower is 22 m tall” is more imprecise and, consequently, more inaccurate, as the
removal of detail is moving the conveyed information away from the actual height of
the tower. And “the main tower is roughly 20 m tall” is very imprecise and probably
very inaccurate, as the tower is unlikely to be exactly 20 m tall.
Consider, however, that a very precise statement can be extremely inaccurate. In
other words, a high amount of detail does not entail a faithful representation. For
example, imagine an archaeological site having a protection perimeter of 2.15 km².
A statement such as “the protection perimeter is 4.45 km²” provides a quite detailed
description but is wildly inaccurate. However, a statement such as “the protection
perimeter is roughly 2 km²” is more imprecise but much more accurate. In other
words, precision is useless if we are not accurate.
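Both examples can be checked numerically by treating inaccuracy as the absolute difference between the stated and the actual value. The sketch assumes that the tower is actually 22.7 m tall, which the text only presents as probable, and takes the 2.15 km² figure from the example above.

```python
def inaccuracy(stated, actual):
    """Absolute difference between the stated value and the actual one."""
    return abs(stated - actual)


ACTUAL_HEIGHT_M = 22.7  # assumption: the precise statement is also the true height
for stated in (22.7, 22, 20):
    # Removing detail moves the stated height away from the actual one.
    print(stated, round(inaccuracy(stated, ACTUAL_HEIGHT_M), 2))

ACTUAL_AREA_KM2 = 2.15
for stated in (4.45, 2):
    # The more detailed figure (4.45) is the less accurate of the two.
    print(stated, round(inaccuracy(stated, ACTUAL_AREA_KM2), 2))
```

The first loop shows the general tendency, and the second shows why it is only a tendency: detail and faithfulness are independent properties.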
7.3.3.3 Error Increases Uncertainty

In general, the more error we have, the more uncertain we are. In other words,
detecting frequent and large discrepancies between descriptions tends to make us
less sure about what we say, as it is difficult to choose between them.
For example, imagine that a set of pottery fragments is dated using two different
techniques. If one yields an estimated date of 3200 BCE and the other estimates it
as 3100 BCE, we can be quite confident about the pottery’s age. However, if the
dating techniques yield estimates of 3200 BCE and 4500 BCE, we cannot be sure
of which number is better.

7.4 Empirical Study
In order to test the proposed vagueness framework, an empirical study was carried
out. The aim was to determine and characterise how readers perceive the vagueness
variables expressed in a text, under the
hypothesis that vagueness perception obtained empirically from specialists should
correlate with vagueness variables as measured in the texts.
Two well-known archaeological elements were selected: the Roman Villa of
Liédena (Navarra, Spain), henceforth referred to as “Villa”, and the Visigoth bone
brooch of Santa María de Hito (Cantabria, Spain), referred to as “Brooch”. For each
of these, four text fragments in Spanish between 200 and 1000 words each were
extracted from various publications and labelled A to H, as shown in Table 7.1.
The experiment was organised as follows. Imprecision and uncertainty were
measured for each text fragment by counting lexical markers, as described below.
Then, different archaeologists were asked to read the texts and score each in relation
to these variables through an online survey. Finally, the assigned scores were
compared against the measured values for each text, looking for correlations.

Table 7.1 Selected text fragments for the empirical study

Element  Text  Reference
Villa    A     Altadill, J. 1921. “Los mosaicos romanos de Liédena”, Boletín de la Comisión de Monumentos Históricos y Artísticos de Navarra, 62–63.
Villa    B     Mezquíriz Irujo, M. A. 2009. “Las villae tardorromanas del Valle del Ebro”, Trabajos de Arqueología Navarra, 21, 222–223.
Villa    C     Taracena Aguirre, B. 1950. “Excavaciones en Navarra, La villa romana de Liédena”, Príncipe de Viana, 38–39, 14–15.
Villa    D     Vizcaíno León, D. et al. 2013. “La reconstrucción virtual del patrimonio arqueológico al servicio de la divulgación y puesta en valor de la Villa Romana de Liédena (Navarra, España)”, VAR, 4, 104–108.
Brooch   E     Gimeno García-Lomas, R. 1978. “Hallazgo de un broche alto medieval trabajado en hueso”, Boletín del Seminario de Estudios de Arte y Arqueología, 44, 430–432.
Brooch   F     García Guinea, M. A. 2006. “Broche de cinturón (Necrópolis de Santa María de Hito), Apocalipsis: el ciclo histórico de Beato de Liébana: catálogo de la exposición”, Santillana del Mar.
Brooch   G     Europapress, 2015. “Un broche de hueso encontrado en la necrópolis medieval de Santa María de Hito, Pieza del Mes de la UC”. https://www.europapress.es/cantabria/cultura-deporte-00760/noticia-broche-hueso-encontrado-necropolis-medieval-santa-maria-hito-pieza-mes-uc-20150306171046.html
Brooch   H     Gutiérrez Cuenca, E. & Hierro Gárate, J. A. 2018. “Broche de cinturón de Santa María de Hito” La pieza del mes 2014–2016, Museo de Prehistoria y Arqueología de Cantabria, 24–25.

The actual text contents are not shown for brevity

7.4.1 Measuring Imprecision and Uncertainty
Imprecision and uncertainty were measured by counting the number of relevant
lexical markers in each text, dividing this count by the text word count, and
multiplying by 1000 to obtain an appropriately scaled index. For imprecision,
markers included qualifiers such as “some”, “quite”, “approximately” or “much”;
approximate dating expressions such as “Constantine coin” or “Later Roman
Empire”; and expressions spanning a range, such as “first to third centuries”. Table 7.2 shows
the results.
In the case of uncertainty, markers included subjectivity expressions such as “I
believe that” or “I think”; hedges such as “maybe” or “it seems likely”; and question
marks or similar signs indicating doubt associated with data, such as in “1.5 m?”. Table
7.3 shows the results.
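The counting procedure can be sketched as follows. The marker lists are small English subsets taken from the examples quoted above, whereas the study worked on Spanish texts with a fuller inventory that is not reproduced here. Whole-word, case-insensitive matching and whitespace-based word counting are our assumptions, and the function names and sample sentence are hypothetical. Doubt signs attached to data, such as “1.5 m?”, are not handled.

```python
import re

# Illustrative subsets only; not the marker inventory used in the study.
IMPRECISION_MARKERS = ["some", "quite", "approximately", "much"]
UNCERTAINTY_MARKERS = ["I believe that", "I think", "maybe", "it seems likely"]


def count_markers(text, markers):
    """Count case-insensitive, whole-word occurrences of each marker phrase."""
    total = 0
    for marker in markers:
        pattern = r"\b" + re.escape(marker) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def vagueness_index(text, markers):
    """Markers per 1000 words: 1000 * MarkerCount / WordCount."""
    word_count = len(text.split())
    return 1000 * count_markers(text, markers) / word_count


sample = "I think the wall is approximately 3 m long, and maybe some stones are missing."
print(count_markers(sample, IMPRECISION_MARKERS))  # 2
print(count_markers(sample, UNCERTAINTY_MARKERS))  # 2
```

Simple matching of this kind counts every occurrence of a marker regardless of context, so the resulting figures would normally be reviewed by hand.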

Table 7.2 Imprecision measurements

Element  Text  Marker count  Word count  Imprecision
Villa    A     27            769         35.11
Villa    B     29            654         44.34
Villa    C     42            1056        39.77
Villa    D     13            190         68.42
Brooch   E     12            562         21.35
Brooch   F     12            215         55.81
Brooch   G     8             202         39.60
Brooch   H     17            528         32.20

Imprecision is calculated as 1000 × MarkerCount/WordCount

Table 7.3 Uncertainty measurements

Element  Text  Marker count  Word count  Uncertainty
Villa    A     10            769         13.00
Villa    B     11            654         16.82
Villa    C     12            1056        11.36
Villa    D     2             190         10.53
Brooch   E     5             562         8.90
Brooch   F     6             215         27.91
Brooch   G     3             202         14.85
Brooch   H     7             528         13.26

Uncertainty is calculated as 1000 × MarkerCount/WordCount
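The indices in Tables 7.2 and 7.3 can be recomputed from the marker and word counts with the stated formula. The sketch below does so; the variable and function names are ours, and rounding to two decimals is assumed from the way the tables are printed.

```python
# Counts copied from Tables 7.2 and 7.3, keyed by text label.
WORD_COUNT = {"A": 769, "B": 654, "C": 1056, "D": 190,
              "E": 562, "F": 215, "G": 202, "H": 528}
IMPRECISION_COUNT = {"A": 27, "B": 29, "C": 42, "D": 13,
                     "E": 12, "F": 12, "G": 8, "H": 17}
UNCERTAINTY_COUNT = {"A": 10, "B": 11, "C": 12, "D": 2,
                     "E": 5, "F": 6, "G": 3, "H": 7}


def index(marker_count, word_count):
    """Markers per 1000 words, rounded to two decimals as in the tables."""
    return round(1000 * marker_count / word_count, 2)


imprecision = {t: index(IMPRECISION_COUNT[t], WORD_COUNT[t]) for t in WORD_COUNT}
uncertainty = {t: index(UNCERTAINTY_COUNT[t], WORD_COUNT[t]) for t in WORD_COUNT}

print(imprecision["D"])  # 68.42
print(uncertainty["F"])  # 27.91
```

Scaling by word count matters here because the fragments differ greatly in length: text D has few markers in absolute terms but the highest imprecision index, since it is only 190 words long.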

7.4.2 Survey and Assigned Scores
Two online surveys were created, one for each archaeological element, using the
Google Forms platform. Both surveys had the same structure of four parts. The
first part gathered demographic data of respondents, in terms of work experience
in archaeology and their degree of experience in the subject of the texts (either
Roman villae or Visigoth brooches). The second part gathered information on how
familiar each respondent was with each archaeological element, in terms of having
visited, read about, or worked on it. The third part gathered information on perceived
precision and certainty for each text, asking the respondent to assess the precision
and certainty of each text fragment by using a 5-point Likert scale quantified on
a scale of 0–10. In addition, a third variable was added as an additional check:
respondents were asked to assess the absence of relevant details and information
from each text. Finally, the fourth part of the survey allowed respondents to make
comments on the survey, make suggestions for improvement, and leave their email
address to be informed of the results.
It must be highlighted that the surveys were phrased in terms of precision and
certainty, whereas vagueness measurements were made in terms of imprecision
and uncertainty. Although the vagueness framework presented in this chapter
conceptualises variables in “negative” form, a survey phrased in “positive” form
was considered easier to understand for specialists not familiar with this vagueness
theory.
The first survey, corresponding to the Villa, was answered by 26 archaeologists.
The second survey, corresponding to the Brooch, was answered by 15 archaeologists. Table 7.4 shows the survey results.

7.4.3 Discussion of Results
The major hypothesis of the study was that correlations should be observed between
vagueness measurements (as shown in Tables 7.2 and 7.3) and vagueness reported
by specialists (as shown in Table 7.4). In particular, measured imprecision was
expected to negatively correlate with perceived precision, and measured uncertainty
was expected to negatively correlate with perceived certainty. In addition to studying
these, correlations were also investigated between all additional pairs of measured
against reported variables.
Figure 7.1 shows the results for the primary hypothesis. For the Villa element,
reported precision negatively correlated with measured imprecision, but very weakly
(R² = 0.0274). However, reported certainty negatively correlated with measured
uncertainty to a greater extent (R² = 0.2582). For the Brooch element, reported
precision negatively correlated with measured imprecision more strongly (R² = 0.355),
and reported certainty negatively correlated with measured uncertainty much more
weakly (R² = 0.0692). The direction of all four correlations is in line with the
proposed hypothesis.
For the Villa element, a weak positive correlation was also found between
reported certainty and measured imprecision (R² = 0.2057). For the Brooch
element, weak negative correlations were found across variables, that is, between
reported precision and measured uncertainty (R² = 0.1734) and between reported
certainty and measured imprecision (R² = 0.1908).
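The correlations for the primary hypothesis can be approximately reproduced from the tabulated values. The sketch assumes that the reported R² is the squared Pearson coefficient of a simple linear fit, which the chapter does not state explicitly. It takes the reported averages from Table 7.4 and the measured indices from Tables 7.2 and 7.3. Because those inputs are rounded, small differences from the published figures are to be expected.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (reported averages, measured indices) for texts A-D (Villa) and E-H (Brooch).
pairs = {
    "Villa precision/imprecision": ([4.10, 5.30, 7.10, 4.80], [35.11, 44.34, 39.77, 68.42]),
    "Villa certainty/uncertainty": ([3.30, 4.90, 7.20, 6.40], [13.00, 16.82, 11.36, 10.53]),
    "Brooch precision/imprecision": ([7.33, 5.17, 3.33, 7.17], [21.35, 55.81, 39.60, 32.20]),
    "Brooch certainty/uncertainty": ([7.00, 6.17, 4.67, 7.50], [8.90, 27.91, 14.85, 13.26]),
}

results = {label: pearson_r(xs, ys) for label, (xs, ys) in pairs.items()}
for label, r in results.items():
    print(f"{label}: r = {r:.3f}, R2 = {r * r:.4f}")
```

All four coefficients come out negative, and their squares are close to the values reported for Fig. 7.1. With only four texts per element, each coefficient rests on four data points, so none of them should be read as more than indicative.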

Table 7.4 Average and standard deviation for perceived precision, certainty and absence as reported by specialists through the surveys, on a scale of 0–10

Element  Text  Precision  Precision  Certainty  Certainty  Absence  Absence
               average    std. dev   average    std. dev   average  std. dev
Villa    A     4.10       2.22       3.30       2.32       7.20     2.68
Villa    B     5.30       2.58       4.90       2.50       5.50     2.65
Villa    C     7.10       2.31       7.20       2.16       4.10     2.63
Villa    D     4.80       2.23       6.40       2.65       5.50     2.65
Brooch   E     7.33       1.93       7.00       2.27       6.50     2.55
Brooch   F     5.17       2.13       6.17       2.01       7.00     2.45
Brooch   G     3.33       2.69       4.67       3.14       8.00     2.08
Brooch   H     7.17       1.25       7.50       1.29       4.17     1.49

Fig. 7.1 Reported vs. measured (im)precision (left) and (un)certainty (right) for the Villa (top)
and Brooch (bottom) elements. [Four scatter plots with trend lines. Villa (im)precision: measured
imprecision against reported precision, R² = 0.0274. Villa (un)certainty: measured uncertainty
against reported certainty, R² = 0.2582. Brooch (im)precision: measured imprecision against
reported precision, R² = 0.355. Brooch (un)certainty: measured uncertainty against reported
certainty, R² = 0.0692.]

Overall, the results seem to support the hypothesis. However, the low number
of texts employed and the weak correlations do not allow us to draw strong
conclusions. At this stage, we believe that there may exist additional factors,
beyond the lexical markers that were considered in this study, that influence how
a text is perceived regarding precision and certainty. One candidate factor may
be the style of the text. For example, text A scored the lowest for both reported
precision and certainty for the Villa element. This is a text dating back to 1921,
which employs a language that most people would consider too convoluted and
old-fashioned today. This may be contributing to the low reported values for this text.
Another candidate factor may be the preconceptions and previous experiences of
the specialists assessing the texts. For someone who knows much about an element
such as the Villa or the Brooch, a text may look imprecise or uncertain even if other
specialists with not as much experience with this particular element would see it as
quite precise or certain. Finally, other linguistic devices in addition to the lexicon
may be contributing to the specialists' reported scores, such as sentence length and
complexity or connector use.
These observations are possible within the vagueness theoretical framework
described in this chapter. For example, the fact that imprecision and uncertainty
are separately defined and characterised allowed us to study how people perceive
them through different sets of markers.


7.5 Conclusions
In this chapter we have described how vagueness is treated in philosophy and
computer science. We have also proposed an operationalised theory for vagueness
variables, and illustrated it through an empirical study.
By managing vagueness explicitly, as proposed in this chapter, archaeological
texts can be explicit about their imprecision, inaccuracy, uncertainty and error.
This allows more faithful representations of the archaeological record and more
nuanced interpretations of their accounts. However, dealing with vagueness comes
at a cost, as it increases the complexity of the associated information systems
and even the associated field methodologies (Tobalina-Pulido & Gonzalez-Perez,
2020). We will keep working to find an acceptable balance between expressivity
and complexity, and better approaches and techniques to implement this vagueness
theory in archaeological software tools.

References
Abualdenien, J., & Borrmann, A. (2020). Vagueness visualization in building models across different design stages. Advanced Engineering Informatics, 45, 101107. https://doi.org/10.1016/
j.aei.2020.101107
Akiba, K. (2014). Vague objects and vague identity (Vol. 33). Springer.
Ashtiani, M., & Azgomi, M. A. (2016). A hesitant fuzzy model of computational trust considering
hesitancy, vagueness and uncertainty. Applied Soft Computing, 42, 18–37. https://doi.org/
10.1016/j.asoc.2016.01.023
Austin, J. L. (1989). How to do things with words: The William James lectures delivered at Harvard
University in 1955 (2nd ed.). University Press.
Baxter, M. J. (2009). Archaeological data analysis and fuzzy clustering. Archaeometry, 51(6),
1035–1054. https://doi.org/10.1111/j.1475-4754.2008.00449.x
Black, M. (1937). Vagueness. An exercise in logical analysis. Philosophy of Science, 4(4), 427–
455. [Online]. Available: http://www.jstor.org/stable/184414
Coletti, G. (2020). Decision Rules Under Vague and Uncertain Information. In Fuzzy Approaches
for Soft Computing and Approximate Reasoning: Theories and Applications: Dedicated to
Bernadette Bouchon-Meunier (pp. 85–97). Cham: Springer International Publishing. https://
doi.org/10.1007/978-3-030-54341-9_8
Cutting, J. (2019). German, Spanish and Mandarin speakers’ metapragmatic awareness of
vague language compared. Journal of Pragmatics, 151, 128–140. https://doi.org/10.1016/
j.pragma.2019.03.011
de Runz, C., Desjardin, E., Piantoni, F., & Herbin, M. (2013). Using fuzzy logic to manage
uncertain multi-modal data in an archaeological GIS. In International Symposium on Spatial
Data. Quality-ISSDQ 13-15th June 2007. Enschede, the Netherlands, vol. 7, 2007.
de Silva, C. W. (1995). Intelligent control. Routledge.
Dinu, A., Hahn, W. v., & Vertan, C. (2017, November). On the annotation of vague expressions:
A case study on Romanian historical texts. In Proceedings of the workshop on language
technology for Digital Humanities in Central and (South-)Eastern Europe, pp. 24–31, https://
doi.org/10.26615/978-954-452-046-5_004.
Evans, G. (1978). Can there be vague objects? Analysis, 38(4), 208. https://doi.org/10.1093/analys/38.4.208
Fabbrini, F., Fusani, M., Gnesi, S., & Lami, G. (2001). An automatic quality evaluation for
natural language requirements. In Proceedings of the seventh international workshop on RE
Foundation for Software Quality (REFSQ’2001), pp. 4–5.
Fantechi, A., Ferrari, A., Gnesi, S., & Semini, L. (2018, August). Requirement engineering
of software product lines: Extracting variability using NLP. In 2018 IEEE 26th international Requirements Engineering conference (RE), pp. 418–423. https://doi.org/10.1109/
RE.2018.00053
Fermüller, C. G., Hofer, M., & Ortiz, M. (2017). Querying with vague quantifiers using probabilistic semantics. In Flexible Query Answering Systems: 12th International Conference, FQAS
2017, London, UK, June 21-22, 2017, Proceedings 12 (pp. 15–27). Springer International
Publishing.
Gasmi, M., & Bourahla, M. (2017). Reasoning with vague concepts in description logics.
International Journal of Fuzzy System Applications, 6(2), 43–58. https://doi.org/10.4018/
IJFSA.2017040103
Gonzalez-Perez, C. (2017). How ontologies can help in software engineering. In J. Cunha, J.
P. Fernandes, R. Lämmel, J. Saraiva, & V. Zaytsev (Eds.), Grand timely topics in software
engineering (LNCS) (Vol. 10223, pp. 26–44). Springer.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology. Springer.
Gupta, C., Jain, A., & Joshi, N. (2018). Fuzzy logic in natural language processing – A closer view.
Procedia Computer Science, 132, 1375–1384. https://doi.org/10.1016/j.procs.2018.05.052
He, L., & Smit, E. (2021). Vague language in online medical consultation. European Journal of
Health Communication, 2(1), 1–28. https://doi.org/10.47368/ejhc.2021.001
Hermon, S., & Niccolucci, F. (2002). Estimating subjectivity of typologists and typological
classification with fuzzy logic. Archeologia e Calcolatori, 13, 217–232.
Hyde, D. (2008). Vagueness, logic and ontology. Ashgate.
Janier, M., Aakhus, M., Budzynska, K., & Reed, C. (2016). Modeling argumentative activity
with inference anchoring theory. In D. Mohhamed & M. Lewinski (Eds.), Argumentation and
reasoned action. Volume I proceedings of the 1st European conference on argumentation (Vol.
1, no. 62). College Publications.
Jing, X., Pinel, P., Pi, L., Aranega, V., & Baron, C. (2008). Modeling uncertain and imprecise
information in process modeling with UML.
Joty, S., Carenini, G., & Ng, R. T. (2015). CODRA: A novel discriminative framework
for rhetorical analysis. Computational Linguistics, 41(3), 385–435. https://doi.org/10.1162/
COLI_a_00226
Keefe, R. (2000). Theories of vagueness. Cambridge University Press.
Lacerda, M. J., & Crespo, L. G. (2017, May). Interval predictor models for data with measurement
uncertainty. In 2017 American Control Conference (ACC), pp. 1487–1492. https://doi.org/
10.23919/ACC.2017.7963163
Lakoff, G. (1990). Women, fire, and dangerous things. University of Chicago Press.
Lassiter, D., & Goodman, N. D. (2017). Adjectival vagueness in a Bayesian model of interpretation.
Synthese, 194(10), 3801–3836. https://doi.org/10.1007/s11229-015-0786-1
Lebanoff, L., & Liu, F. (2018, August). Automatic detection of vague words and sentences in
privacy policies. [Online]. Available: http://arxiv.org/abs/1808.06219
Leto Russo, P. G. (2019). A corpus-based study of vague language in political discourse: Trump
and the strategic use of vague terms. Università degli Studi di Modena e Reggio Emilia.
Li, S. (2019). Communicative significance of vague language: A diachronic corpus-based study
of legislative texts. English for Specific Purposes, 53, 104–117. https://doi.org/10.1016/
j.esp.2018.11.001
Lieskovský, T., Duračiová, R., & Karell, L. (2013). Selected mathematical principles of archaeological predictive models creation and validation in the GIS environment. Interdisciplinarity
and Archaeology – Natural Science in Archaeology, IV(2), 177–190. https://doi.org/10.24916/iansa.2013.2.4
Malyuga, E., & McCarthy, M. (2018). English and Russian vague category markers in business
discourse: Linguistic identity aspects. Journal of Pragmatics, 135, 39–52. https://doi.org/
10.1016/j.pragma.2018.07.011
Mann, W. C., & Thompson, S. A. (1987). Rhetorical structure theory: Description and construction
of text structures. In Natural language generation (pp. 85–95). Springer.
Martin-Rodilla, P., & Gonzalez-Perez, C. (2019a). Conceptualization and non-relational implementation of
ontological and epistemic vagueness of information in digital humanities. Informatics, 6(2),
20. https://doi.org/10.3390/informatics6020020
Martin-Rodilla, P., Pereira-Fariña, M., & Gonzalez-Perez, C. (2019b). Qualifying and quantifying
uncertainty in digital humanities: A fuzzy-logic approach. In ACM international conference
proceeding series (pp. 788–794). https://doi.org/10.1145/3362789.3362833
Nicolucci, F., & Hermon, S. (2010). A fuzzy logic approach to reliability in archaeological virtual
reconstruction, in: Nicolucci, F., & S. Hermon (eds.), Beyond the Artifact. Digital Interpretation
of the Past. Proceedings of CAA2004, Prato 13–17 April 2004. Archaeolingua, Budapest, pp.
28–35.
Novak, V. (2017). Fuzzy logic in natural language processing. In 2017 IEEE international conference on fuzzy systems (FUZZ-IEEE) (pp. 1–6). https://doi.org/10.1109/FUZZIEEE.2017.8015405
Olivé, A. (2007). Conceptual modeling of information systems. Springer.
Ottomanelli, M., & Wong, C. K. (2011). Modelling uncertainty in traffic and transportation
systems. Transportmetrica, 7(1), 1–3. https://doi.org/10.1080/18128600903244636
PROVIDEH. (2018). CHIST-ERA call 2016 – VADMU topic. http://www.chistera.eu/projects/
providedh
Quammie-Wallen, P. (2021). Vague language in Hong Kong English, ‘something like that’. English
Today, 37(1), 13–25. https://doi.org/10.1017/S0266078419000415
Ramos-Soto, A., & Martin-Rodilla, P. (2021). Enriching linguistic descriptions of data: A
framework for composite protoforms. Fuzzy Sets and Systems, 407, 1–26. https://doi.org/
10.1016/j.fss.2019.11.013
Rashkin, H., Choi, E., Jang, J. Y., Volkova, S., & Choi, Y. (2017). Truth of varying shades:
Analyzing language in fake news and political fact-checking. In Proceedings of the 2017
conference on empirical methods in natural language processing (pp. 2931–2937). https://
doi.org/10.18653/v1/D17-1317
Raskin, V., & Taylor, J. M. (2014, June). Fuzziness, uncertainty, vagueness, possibility, and
probability in natural language. In 2014 IEEE Conference on Norbert Wiener in the 21st
Century (21CW) (pp. 1–6). https://doi.org/10.1109/NORBERT.2014.6893868
Reeler, C. (1999). Neural networks and fuzzy logic analysis in archaeology. In L. Dingwall, S.
Exon, V. Gaffney, S. Laflin, & M. van Leusen (Eds.), Proceedings of the 25th anniversary
conference, University of Birmingham, April 1997. Archaeopress.
Runz, C. D., Desjardin, E., Piantoni, F., & Herbin, M. (2007). USING fuzzy logic to manage
uncertain multi-modal data in an archaeological GIS.
Russell, B. A. W. (1923). Vagueness. Australasian Journal of Psychology and Philosophy, 1, 84–
92.
Syropoulos, A. (2016). A (basis for a) philosophy of a theory of fuzzy computation. https://doi.org/
10.2478/kjps-2018-0009.
Taheri, S. M., Ghadim, F. I., & Kabirian, M. (2019, January). Application of fuzzy inference
systems in archaeology. In 2019 7th Iranian Joint Congress on Fuzzy and Intelligent Systems
(CFIS) (pp. 1–4). https://doi.org/10.1109/CFIS.2019.8692167
Tobalina-Pulido, L., & Gonzalez-Perez, C. (2020). Valoración de la calidad de los datos arqueológicos a través de la gestión de su vaguedad. Aplicación al estudio del poblamiento tardorromano.
Complutum, 31(2), 341–358. https://doi.org/10.5209/cmpl.72488
Toledo, E. Q. (2017). Vague language in the corpus of historical English texts (Vol. 2).
van Deemter, K. (2010). Not exactly. In Praise of vagueness. Oxford University Press.

7 Dealing with Vagueness in Archaeological Discourses

157

Wetzel, L. (2018). Types and tokens. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy
(Fall 201). Metaphysics Research Lab, Stanford University.
Williamson, T. (1996). Vagueness (Paperback). Routledge.
Wittgenstein, L. (1989). Philosophical investigations (3rd ed. re. ed.). Blackwell.
Zadeh, L. A. (1996). Fuzzy logic = Computing with words. IEEE Transactions on Fuzzy Systems,
4(2), 103–111. https://doi.org/10.1109/91.493904
Zadeh, L. A. (2002). A prototype-centered approach to adding deduction capability to search
engines-the concept of protoform. In 2002 annual meeting of the North American fuzzy
information processing society proceedings. NAFIPS-FLINT 2002 (Cat. No. 02TH8622) (pp.
523–525). https://doi.org/10.1109/NAFIPS.2002.1018115
Zadeh, L. A. (2010, August). A summary and update of ‘fuzzy logic’. In 2010 IEEE international
conference on granular computing (pp. 42–44). https://doi.org/10.1109/GrC.2010.144

Chapter 8

Extending Discourse Analysis in Archaeology: A Multimodal Approach
Jeremy Huggett

Abstract Archaeology is a highly visual discipline, reliant on observation as well
as description, and consequently makes extensive use of diagrams, maps, plans,
illustrations and photography as well as textual narratives in communicating its
interpretations of past material culture. If discourse analysis is to shed light on the
construction of archaeological knowledge it therefore should seek to incorporate the
visual alongside the textual, but at present discussions of the two modes are largely
independent of each other, with an emphasis on the text. A case study examines the
interrelationships and interdependencies that exist between text and illustrations in
archaeological grey literature, and argues that a multimodal approach to knowledge
creation is called for which better reflects the different modes and media used in
archaeology.
Keywords Text · Visualisation · Multimodal · Knowledge · Grey literature

J. Huggett
Archaeology, School of Humanities, University of Glasgow, Glasgow, UK
e-mail: jeremy.huggett@glasgow.ac.uk
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_8

8.1 Archaeology and Discourse Analysis
Discourse analysis is frequently defined quite specifically in terms of the analysis
of text or speech. Harris originally defined it as “a formal method for the analysis
of connected speech or writing based on its linguistic components in order to obtain
new information about the text under study” (Harris, 1952, p. 1), and similar definitions
are found in contributions to this volume (for example, Pereira-Fariña, Gonzalez-Perez,
Martin-Rodilla, Lawrence et al., and Castiello) and elsewhere. For instance,
“Discourse analysis examines patterns of language across texts and considers the
relationship between language and the social and cultural contexts in which it is
used” (Paltridge, 2012, p. 2) and “ . . . discourse analysis is a view of language
at the level of text. Discourse analysis is also a view of language in use . . . ”
(Paltridge, 2012, p. 7). Discourse analysis is often defined in terms of ‘language’
rather than purely text, and while this might be extended to non-linguistic forms
of communication the context frequently indicates a more restrictive interpretation.
For instance, Schiffrin et al. (2001a, p. 1) categorise discourse as anything beyond
the sentence, as language in use, and as a broader range of social practice
which includes non-linguistic and non-specific instances of language. This third
category would appear to admit visual languages, for example, although the volume
(Schiffrin et al., 2001b) is fundamentally textual in outlook and while the content
of the second edition of the volume (Tannen et al., 2015) is significantly different,
the textual emphasis across the contributions is largely unchanged. Similarly, Gee
(2011, p. 7) describes discourse analysis as the study of language-in-use, and
suggests that “a Discourse is a ‘dance’ that exists in the abstract as a coordinated
pattern of words, deeds, values, beliefs, symbols, tools, objects, times, and places”
(Gee, 2011, p. 36). While this might potentially allow non-textual representations
as a form of linguistic symbolism, the emphasis throughout remains on language
as text and speech. Discourse is associated with language and in turn language is
restricted to text and speech.
But to what extent is archaeology a primarily textual discipline? It is common
for archaeology to be described as “writing history”, creating “narrative accounts of
past cultures” (QAA, 2014, p. 8), for example, which would imply the centrality
of a textual approach. The close association between archaeology and text is
evidenced through terminology such as the ‘archaeological record’, the decoration
on a pot categorised as a ‘grammar’, the life history of an artefact described as a
‘biography’, and so on. The discipline of archaeology has itself been traditionally
divided in textual terms, with historical archaeology focusing on ‘literate’ societies
and leaving ‘illiterate’ societies to prehistoric archaeology: what Hawkes (1954, pp.
156–57) termed “text-aided” archaeology as opposed to “text-free” archaeology.
The presence of texts in historical archaeology has been seen by some as limiting the
potential for archaeological theory, analysis, and interpretation, making archaeology
subservient to history, a perspective leading some historical archaeologists to
attempt to set texts aside and treat their studies as ahistorical (e.g. Moreland, 2006,
pp. 136–37). At the same time, paradoxically, prehistoric archaeologists might
bemoan the absence of texts (see, for example, Andrén, 1998, pp. 2–4; Moreland,
2003, pp. 9–13) since an archaeology without texts was seen to limit the degree to
which past cultural activities could be reconstructed (Hawkes, 1954, p. 160ff).
A textual focus is equally concerned with how archaeology and its associated
knowledge is presented and communicated. The task of the archaeologist is to
‘write’ archaeology, from desk-based assessments and project proposals, to fieldwork with its inscriptions made in field notebooks, diaries, and context records, to
the preparation of the final report and publication, and its subsequent incorporation
in works of synthesis. Consequently “Writing long-form linear text remains the
privileged form in which our research is shared” (Tringham & Danis, 2019, p. 62).
As Joyce argued, archaeological discourse is dialogic in nature:

The formation of marked genres – including site reports and more popular media, such
as museum exhibits – are formalizations of specific dialogues, amenable to analysis as
genres. Archaeology is a textual practice from the field through the lab and into all forms of
dissemination. (Joyce, 2002, p. 3)

This apparent textual pre-eminence in archaeology was reinforced during the
structural and post-structural theoretical debates in archaeology in the 1980s and
1990s concerning text-based approaches to understanding the past. As Hodder
observed, the idea that material culture was itself a text that could be ‘read’ was
tacitly assumed in archaeology and the challenge lay not so much in the idea that
artefacts could ‘speak’ but in understanding what they meant (Hodder, 1986, p.
122ff; see also Hodder & Hutson, 2003, p. 167ff). Shanks and Tilley, jointly and
separately, similarly sought to treat material culture as text, arguing that material
culture was
. . . a communicative medium of considerable importance for transmitting, storing and
preserving social knowledge and as a symbolic medium for orientating people in their
natural and social environment because of the relative permanence of material culture vis
a vis speech acts. So material culture can be regarded in oral societies as a form of writing
and discourse inscribed in a material medium in just the same way as words in chirographic
and typographic cultures are inscribed on a page. (Shanks & Tilley, 1987, pp. 96–97; see
also Shanks, 1992; Tilley, 1991, for example)

Although Hodder initially argued that material culture was easier to decipher
than texts where the language was not known, it was quickly recognised that the
relationship was not straightforward and material culture did not communicate
in the same way as writing. For example, Barrett (1988) rejected the idea of an
archaeological ‘record’ left by the past that could be ‘read’ like a text; instead,
the past consisted of fragmentary traces of social practices whose presences and
relationships could best be understood as a field of discourse. Elsewhere, Olsen
remarked that
. . . we come to ignore the differences between things and text: that material culture is in the
world and plays a fundamentally different constitutive role for our being in this world than
texts and language. Things do far more than just speak and express meanings . . . (Olsen,
2003, p. 90)

Others simply observed that the linearity associated with text made it a poor analogy
for the spatial and temporal complexities of archaeological evidence (for example,
Renfrew, 1989, pp. 35–36; see Preucel, 2006, pp. 138–42 for a useful overview).

8.2 From Text to Visualisation
Archaeology and text have therefore been entangled in complex practical and theoretical ways over the years, further complicated by association with Foucault’s use
of ‘archaeology’ as a means of analysing written and spoken discourse (Foucault,
1989). However, although the textual analogy retains considerable influence, its
pre-eminence is over-stated. An alternative perspective argues that archaeology
is a “profoundly visual discipline” (James, 2015, p. 1189); it is “an explicitly
visual science . . . [which] has from its very beginnings developed a distinctive
visual language that it has used to communicate theories, technical principles,
and data” (Moser, 1996, p. 185); a discipline in which its “illustrative traditions
are central” (Moser, 2001, p. 280) and where its everyday discourse is reliant
on visual communication (Bateman, 2006, p. 68). Indeed, “There can be no
doubt that archaeological ‘imagination’ has always been visual to a large extent”
(Hussain, 2021, p. 140). Visual representations have long been a key means of
archaeological communication, as seen in Carter’s volumes on the Tutankhamun
excavations, Flinders Petrie’s Ten Years’ Digging in Egypt, or Gardner’s Ancient
Athens, for instance (see Thornton, 2018). So much of archaeology is predicated
on observation that visual representations can in some senses be seen as ‘natural’ –
for example, Hope-Taylor proposed the existence of a universal visual language for
archaeological evidence:
Translate such data into words, and not only are they removed one step further from reality
but also their meaning is put internationally at risk . . . Since we can all understand each
other’s drawings and photographs whatever language we happen to speak, it must always
be folly to verbalize where we could visualize. (Hope-Taylor, 1967, p. 181)

So if the visual component of archaeology is so important, why has the textual
emphasis remained so dominant?
One characteristic of archaeological publications is that visual representations
are often demoted to an accompanying role to the text, illustrating but otherwise
contributing little to the discourse. For example, James points to what he calls
“a widespread ‘logocentrism’ and ‘iconophobia’ . . . based on the notion that the
more pictures a work has, the less seriously it is taken” (James, 1997, p. 24).
More recently, Opgenhaffen (2021, pp. 354–55) has observed how few illustrations
accompany more theoretically-inclined archaeological publications, whereas an
extensively illustrated student textbook on archaeological theories, methods and
practice contains no section on visualisation practice, emphasising the niche
character of visualisation in archaeological communication. Pétursdóttir argues that
. . . this attitude towards visual material is . . . anchored in a more deep-seated discrimination between word and image, between the articulated and the artistic and, more generally,
in a semiotics of suspicion that has permeated the humanities and social sciences throughout
the 20th century. (Pétursdóttir, 2020, p. 102)

Part of this is bound up in a traditional distrust of images as lacking appropriate
objectivity and transparency, a position inherited from logical empiricism which
saw visualisation as inferior to text in a form of linguistic determinism (for example,
Baigrie, 1996; Giere, 1996; Topper, 1996). This assumes that visual representations
are at best illustrations in support of the text and denies their capacity to carry
information, even evidence, in their own right. The power of the visual in discourse
is nevertheless important:
“You doubt what I say? I’ll show you.” And, without moving more than a few inches, I
unfold in front of your eyes figures, diagrams, plates, texts, silhouettes, and then and there
present things that are far away and with which some sort of two-way connection has now
been established. (Latour, 1990, p. 36)

However, in such a scenario it would clearly still be possible to see visual
representations as little more than props in an argument without necessarily having
evidential value.
The use of visual representations in archaeology has been seen as rather different
to the standard approaches to scientific images because of the way that they do
represent evidence in their own right (for example, Bueno, 2016, p. 15; Lopes, 2009;
Hussain, 2021). This is not to suggest that they are not selective, or indeed, that they
are objective, but such selection is part of the process of archaeological knowledge
construction. In the field, for example, drawings are fundamental to the process of
knowledge creation:
The reiterative process through which site drawings are transformed into illustrations for
publication gradually separates the image from the subjective interpretive process that was
at the root of its inception. The conscious and unconscious decisions that were part of the
image’s creation become embedded more deeply within the knowledge authority structures
of the discipline. The fuzzily drawn lines are sharpened and the hesitantly drawn boundaries
are strengthened and defined through the repeated tracing and redrawing of the original field
drawing. (Bateman, 2006, p. 78)

The means by which archaeological representations achieve this evidential status
is through the use of conventions: social or symbolic practices which ensure commonality of understanding and enable comparability between visualisations (see, for
example, Lopes, 2009, p. 12; Moser, 2001, pp. 268–69). The conventions determine
the information to be included and frequently the way it is to be represented, creating
what are effectively technical drawings of artefacts, maps, plans, stratigraphic
sections, and the like. These conventions represent visual sets of rules, both tacit and
explicit, which are accepted and understood by the archaeological community, if not
beyond. Conventions differ between modes of visualisation (drawing, map, plan, etc.)
and subject (lithics, pottery, etc.), and between media (photograph, drawing, etc.),
providing different ways of seeing and representation. Such conventions
. . . work to imbue visualisations with the quality of objectivity (which brings together other
qualities such as transparency, scientific-ness and facticity). This produces the impression
that visualisations are showing the facts, telling it like it is, offering windows onto data.
(Kennedy et al., 2016, p. 716)

For example, the preference in archaeological field drawings for two-dimensional
presentations – either top-down (as in maps and plans) or frontal view (as in section
drawings) – may have its origins in field practice (recording via the two-dimensional
permatrace sheet or computer screen, for instance), but it also carries with it an
implicit objectivity (although it is not objective) and may present an impression of control
and authority through a ‘god-like’ perspective (Kennedy et al., 2016, p. 723). Even
when three-dimensional data are collected, as in structure-from-motion imaging of
stratigraphic sections, they are frequently represented as two-dimensional images
or tracings.

Classically, visual representations are seen to have a supporting role to the text:
. . . the employment of a graphical feature, photograph, map, or other representational
device to elucidate, explain, or show something in a text . . . the illustration is meant to
summarize an argument, provide a reference point, or corroborate the text (Burdick et al.,
2012, p. 43)

The text retains priority in such a scenario: the image provides data or backing
for an argument while the detail of the specific position is expressed textually.
Indeed, if an illustration simply recapitulates the text, the necessity of its inclusion
may legitimately be open to question (e.g. Candea, 2019, pp. 65–66). That aside,
images are seen as a means of improving the readability and understandability of
the text through their capacity to summarise and communicate information more
economically, although their success is dependent on skilful presentation and often –
ironically – on appropriate labelling and captioning. Even if they are not necessarily
peripheral to the presentation of argument (cf. Moser, 1996, p. 186), they can appear
to add a spurious level of authority by virtue of their inclusion, with their ‘scientific’
air of objectivity and transparency. Furthermore, there may be unrecognised, even
hidden implications embedded in the visualisation which go beyond the intentions
of the author – for example, drawn elements such as circles imply closure, solid lines
suggest clear boundaries (Candea, 2019, p. 76), and, of course, the range of doubts
and uncertainties inherent in archaeological field drawings are frequently resolved
in their final publication form. Like texts, images can also mislead
. . . through the constant ambiguity between what is being figured and what is merely a
convenient way to draw something. Is the distance between these two forms, their respective
size, or the thickness of the line meant to be relevant, or is it merely the clearest way to
arrange a picture on the page? (Candea, 2019, pp. 76–77)

Crucially, text and image are different modes of expression, employing different
languages and conventions: words and grammar producing sentences on the one
hand, with shape, colour, size and space on the other, although aspects such as page
layout and typography blur the distinction between the two. A consequence of these
different modes is that there is always a semantic gap between text and image that
is bridged through interpretation of the relationships between them. There may be
a degree of functional equivalency between visual and textual arguments, but the
means by which they present their information and their relationship with author
and reader differ (e.g. van den Hoven, 2012, p. 258ff).
The question remains, however, that if visual representations provide more
than simply decoration or “bravura display” (Flanders, 1998, p. 309) for the
accompanying text, should they not be incorporated as part of the analysis of a
discourse, rather than that analysis focussing solely on the text? Even the most basic
archaeological grey literature fieldwork reports contain often substantial graphical
components alongside their texts (for example, see Fig. 8.1). Subsequently applying
optical character recognition to extract the text for analytical purposes wrenches the
text from that intimate relationship with the graphical and image components and
restores the division between text and illustration to the state prior to the preparation
of the final report, changing the interplay between text and image in the process.

Fig. 8.1 Snapshot of the Inverkeithing Friary excavation report (Beckett, 2018), excluding the
cover and content pages and appendices

This raises important questions concerning the role and function of the images and
graphics, and the extent to which the text in the report is reliant on or independent of
them. Are the visuals in archaeological texts critical to the discourse on the page, and
can a discourse analysis focused on the text alone adequately capture the knowledge
represented?
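
One rough way to gauge how far a report's text leans on its illustrations is to count the figure cross-references that survive in the extracted text. The following is a minimal sketch under stated assumptions, not a method used in the case study: the citation pattern ("Fig. 4", "Figure 5", "Figs 6"), the function name `figure_references` and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Assumed citation style: "Fig. 8.1", "Figure 3", "Figs 4". Real reports vary,
# so this pattern is illustrative only.
FIGURE_REF = re.compile(r"\bFig(?:ure|s)?\.?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def figure_references(text: str) -> Counter:
    """Count how often each figure number is cited in a block of text."""
    return Counter(FIGURE_REF.findall(text))


# Hypothetical sentence standing in for OCR output from a fieldwork report.
sample = (
    "The wall (Fig. 4) was cut by a later ditch, visible in section (Figure 5) "
    "and on the site plan (Fig. 4)."
)
refs = figure_references(sample)
print(refs)
```

A count of this kind only shows that the text points at an image. It says nothing about what the image contributes, which is the question a multimodal analysis would need to address.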

8.3 Discourse Analysis and Visualisation
Understanding visualisation communication entails understanding the underlying
codes – codes we may already know, at least implicitly, without necessarily knowing
what we know or how we ‘read’ an image (Kress & van Leeuwen, 2006, pp. 32–
33). Such codes provide the vehicle through which a visualisation creates meaning.

Kjeldsen (2018, p. 79) describes visual images as providing a thick and rich but
ambiguous representation because of the range of dimensions and visual details they
provide, whereas text is seen as providing unambiguous but thin information. For
instance, an archaeological statement such as ‘layer X is cut by layer Y’ describes
a stratigraphic condition in a straightforward if abstract manner, whereas a matrix
diagram demonstrates this visually along with other relationships that either layer
might be involved in, a section diagram shows the relationship visually along
with details of the shape and extent of the cut, and a photograph may show this
together with an indication of the basis for the distinction between the layers based
on the colour and texture differentiation, for example, often with other contextual
elements visible in the background. Whether text can legitimately be described as
unambiguous is also open to question: the relative regularity and clarity of textual
codes might be mistaken for a lack of ambiguity, and differences in phrasing and
shading can be used to imply uncertainty or lack of clarity in a description in much
the same way as they can be represented visually.
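
The contrast between a single stratigraphic statement and a matrix diagram can be sketched in code. This is a minimal illustration only, assuming hypothetical context labels and reading 'X is cut by Y' simply as 'X is earlier than Y'; the names `statements`, `later_than` and `all_later` are introduced here for the example.

```python
from collections import defaultdict

# Each tuple reads "(earlier, later)": the statement "layer X is cut by layer Y"
# becomes ("X", "Y"). The labels are hypothetical.
statements = [("X", "Y"), ("W", "X"), ("X", "Z")]

# Build a directed graph: earlier context -> contexts recorded as later than it.
later_than = defaultdict(set)
for earlier, later in statements:
    later_than[earlier].add(later)


def all_later(context: str) -> set:
    """Return every context that is directly or indirectly later than `context`."""
    found = set()
    stack = [context]
    while stack:
        for nxt in later_than.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


print(sorted(all_later("W")))
```

Each textual statement contributes one edge, whereas the matrix presents all of the edges at once, so indirect relationships become visible. The shape of the cut shown in a section drawing, or the colour and texture visible in a photograph, have no place in this structure at all.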
How meaning is communicated through visual representation has been categorised in numerous ways (see summary in Engelhardt, 2007, for example). For
instance, Engebretsen and Weber (2018, pp. 277–78) identify a series of ‘graphic
modes’ within the broader set of modes or semiotic resources. These graphic modes
include typography, layout, maps, diagrams, drawings, and photographs, and each
offers different semiotic affordances and employs different conventions. Furthermore,
each of these modes consists of a set of what they call semiotic elements, or
sub-modes (for example, font, size, colour, shape, spatial arrangement, etc.), each
of which in turn can be broken down further into more specific characteristics
(for example, hue, saturation, luminance, texture, etc.). In an alternative approach,
Drucker (2014, pp. 65–66) categorises visualisation according to different parameters which can be combined in different ways to different ends. For example, there
may be different graphical formats (maps, plans, timelines, charts, photographs,
etc.), they may have different purposes (mapping, data presentation, calculation,
etc.), they may have different types of content (spatial, temporal, quantitative,
qualitative, interpretative, etc.), they may structure meaning differently (by analogy,
through comparison, connection, in 2D, 3D, etc.), or may differ according to
their disciplinary origins (geographical maps, geological sections, statistical charts,
genealogical trees, etc.). In both cases, while some aspects of a visual code may
be shared between different (sub)modes, others may be unique and indeed, may
specifically characterise a particular visualisation method.
There is also an analytical division in terms of the methodology used to
expose the workings of visual representations as meaning-making devices. On
the one hand, methods may be derived from linguistic analysis, based on the
notion that visualisations possess a ‘grammar’ which enables them to be treated
analogously to texts (based on Kress & van Leeuwen, 2006, for example). On the
other hand, methods may be derived from information visualisation studies, itself
concerned with the design of visual representations to facilitate understanding,
employing graphical analytics and using visualisations to uncover relationships in
other visualisations (see Kilchör & Lehmann, 2021; Uggla, 2021, for example).

Table 8.1 Some of the semiotic elements or sub-modes associated with visual representations
organised according to the ideational, interpersonal, and compositional metafunctions

What is being communicated?
  Type of data presentation (graph, chart, map, flowchart, network . . . )
  Type of data (quantitative, qualitative . . . )
  Type of information (facts, process, classification, structure, concept . . . )
  Type of representation (comparison in size, ranking, distribution, correlation, space/location change over time . . . )
  Type of subject (event, action, people, objects . . . )
  Style (pictorial, non-pictorial . . . )
  Basic semiotic resources (lines, points, circles . . . )
  Visual variables used (size, shape, colour, texture, surface, volume, duration, order, perspective . . . )
  What is not shown or omitted?
  Are other visuals integrated with this one? (embedded drawings, photos, etc.)

How is it presented?
  Style of visualisation (scientific, hand-drawn, cartoon, standard software template . . . )
  Purpose (narrative, descriptive, explanatory, argumentative, exploratory . . . )
  User engagement (degree of interactivity . . . )
  Relationship between author/reader (top-down, bottom-up, linear, non-linear, narrative, exploratory . . . )
  Distance (small – ‘showing’ mode, large – ‘telling’ mode, viewpoint – 2D/3D, ‘god’ view, immersive . . . )
  Attitude (professional, casual, sensational, impartial, objective, subjective, factual . . . )
  Knowledge required (level of visual literacy)
  Framing (fact-based or not? Is uncertainty shown? Can different visualisations be chosen?)
  Appearance of trustworthiness or reliability
  Balance between aesthetics and ethics

How is meaning created?
  Grouping of units (proximity, spatial arrangement, foreground, background . . . )
  Salience or emphasis (through size, colour, shape, contrast, repetition, dynamics . . . )
  Framing of units (through axes, legend, caption, text boxes, frames, connecting lines, space, colour . . . )
  Positioning of units (horizontal, vertical, radial, circular, top, bottom, centre . . . )
  Nature of layout (2D/3D, gridded, alignment, contrast, consistency, symmetry, balance, margin . . . )
  Navigation
  Hierarchy (information architecture, information layers . . . )
  Reader guidance (defined reading path: left to right etc., no predefined path . . . )
  Usability (information density and complexity, interactivity, accessibility, inclusion . . . )
  Causal relations? (arrows, nodes and connectors . . . )

Adapted from Weber (2019, Tables 1, 2 and 3)

Fundamentally, all approaches can be seen to build from three metafunctions
originally defined by Halliday: ideational, interpersonal, and textual (e.g. Halliday
& Matthiessen, 2006, p. 511ff). These essentially ask what is being communicated,
how the content is presented to the ‘reader’, and how the composition is used to
create meaning. For example, Weber (2019) has usefully structured a framework
of textual and graphical modes of visualisation around these three metafunctions,
and the semiotic elements or sub-modes associated with visual representations are
summarised in Table 8.1.
This analytical framework illustrates a common problem with many forms of
discourse-related visualisation analysis: it is highly descriptive and consequently
labour-intensive to apply since the process of recording essentially constitutes a
form of entextualisation, a translation of the visual characteristics into textual
description (see, for example, Jones, 2021, p. 10ff), which tends to imply a high
degree of human intervention in the process. This complexity may legitimately raise
questions as to the scalability of such approaches to large corpora, and in turn, how
digital tools might be brought to bear on visualisation analysis.
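
The recording burden described here can be made concrete with a sketch of what one such entextualised record might look like. This is an illustrative assumption rather than an implementation of Weber's framework: the class name `VisualRecord`, its field names and the example values are hypothetical, and only the three-way grouping follows Table 8.1.

```python
from dataclasses import dataclass, field


@dataclass
class VisualRecord:
    """One hand-made description of a single illustration.

    The three dictionaries follow the three questions in Table 8.1; the key
    names used below are illustrative, not a fixed vocabulary.
    """
    figure_id: str
    communicated: dict = field(default_factory=dict)  # what is being communicated?
    presented: dict = field(default_factory=dict)     # how is it presented?
    composed: dict = field(default_factory=dict)      # how is meaning created?

    def fields_recorded(self) -> int:
        """Number of entries an analyst had to fill in for this image."""
        return len(self.communicated) + len(self.presented) + len(self.composed)


# Hypothetical record for a published section drawing.
section = VisualRecord(
    figure_id="Fig. 5",
    communicated={"data presentation": "section drawing", "style": "non-pictorial"},
    presented={"viewpoint": "2D frontal", "uncertainty shown": "no"},
    composed={"framing": "caption and scale bar"},
)
print(section.fields_recorded())
```

Every value in such a record is a judgement made and typed in by a person. Multiplying even a handful of entries by every illustration in a corpus gives a sense of why scalability is a concern.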

8.4 Multimodal Discourse Analysis
Although discourse studies more generally have privileged language, virtually
equating one with the other, as Rheindorf notes,
. . . if the ultimate aim of critically studying discourse is to reveal the ways in which it
constitutes, maintains, and transforms social reality and relations . . . such logocentrism
is a severe limitation: to focus only on the linguistic elements risks ignoring a significant
portion of the meaning potential of texts . . . . (Rheindorf, 2019, p. 93)

That said, the term ‘text’ has been stretched to cover other analytical objects, as in
the case of archaeological approaches to material culture. Bateman et al. (2017, p.
52) observed that anything subjected to semiotic analysis could be treated as a ‘text’,
one effect of which was in many instances to inappropriately associate the properties
of texts with non-textual objects. Consequently terms like ‘visual language’, ‘visual
grammar’, and ‘visual literacy’ potentially run the risk of mis-associating textual
properties with visual representations and presuming that non-textual objects perform
in a similar manner to texts. This can present problems when the non-textual is
categorised and described in primarily textual terms.
The logocentric nature of discourse analysis began to change with the recognition
by Kress and van Leeuwen of what they called a “communicational ensemble”
(2001, p. 111), acknowledging that meaning was created in many different ways
through different modes and media coming together. Consequently, they argued,
the idea that “language is the central means of representing and communicating
even though there are ‘extra-linguistic’, ‘para-linguistic’ things going on as well –
is simply no longer tenable, that it never really was, and certainly is not now.”
(Kress & van Leeuwen, 2001, p. 111). They subsequently produced what they
called a ‘grammar’ of images (Kress & van Leeuwen, 2006) but this multimodal
approach to the creation of meaning extended beyond text and images, ranging
across layouts, music, gestures, video and film, soundtracks, 3D objects, artefacts,
space, architecture, etc. Kress (2010, p. 79) describes these multiple modes or
different semiotic resources as presenting a challenge to notions of language since
the different modes offer different potentials which affect the choice of modes used
in specific instances of communication. Hence, for example, a multimodal analysis
may involve the examination of the words and their presentation on the page in
conjunction with the function and meaning of visual images, and the way the two
semiotic resources are integrated with each other (e.g. O’Halloran, 2004, p. 1). For
instance, the text and image may refer to each other through cross-references in
the text to the image, representations within the image of aspects of the text, and
so on. In combination, therefore, they provide different possibilities for
meaning-making and, further, expand what it is possible to express using one or
other mode alone (Bateman, 2011, p. 17). While certain properties may be closely
associated with a particular mode, they are not necessarily restricted to it and may
operate across different modes. For example, ‘framing’ in an image context may
refer to the boundedness of the image, but it may equally refer to the layout of a
text, or the divisions between architectural spaces, or the intervals in film or music
(Kress & van Leeuwen, 2001, pp. 2–3).
Although multimodal analysis implies a unified multidimensional approach
across all modes, this is not always the case: each mode may be analysed
individually. While this clearly cuts across the objective of multimodal analysis
and hence restricts potential outcomes, it recognises that a ‘true’ multimodal
discourse analysis is highly complex. For example, the presentation of images,
graphics, words, typography, and their spatial arrangement on a page represents
an intricate tapestry of interrelationships, further complicated in a digital arena
with the introduction of hyperlinks, sound, animations and moving images, making
the treatment of all the semiotic modes as a single entity extremely challenging.
Consequently, while it may be possible to treat the verbal-visual complex as a
single analytical unit, alternatively it may be feasible to separate out each mode
and analyse it on its own, perhaps drawing them all together as a final step (e.g.
Bednarek & Caple, 2017, p. 9). To illustrate this, Bednarek and Caple developed
a topology which allows any analysis to be positioned relative to choices about
the unit of analysis and the semiotic mode (the two axes in Fig. 8.2) (Bednarek &
Caple, 2017, pp. 9–12). For example, an analysis might be monomodal, focussing
on a single mode within a single text (bottom right in Fig. 8.2), or a single
mode across several texts (top right in Fig. 8.2), potentially repeating the study
examining a different mode and ultimately combining both to generate a multimodal
analysis. Alternatively, an analysis might be multimodal from the outset, looking at
a combination of different modes across several texts (top left in Fig. 8.2) or within
a single text (bottom left in Fig. 8.2).
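The positioning described above can be expressed as a minimal sketch, assuming an analysis is characterised only by the set of modes it examines and the number of texts it covers. The `Analysis` class and `position` function are hypothetical names introduced for illustration; they are not part of Bednarek and Caple's topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One study, described by the semiotic modes and texts it examines."""
    modes: frozenset  # e.g. frozenset({"language", "image"})
    n_texts: int      # number of texts analysed


def position(analysis: Analysis) -> str:
    """Return the quadrant of the topology in which the analysis falls."""
    if not analysis.modes or analysis.n_texts < 1:
        raise ValueError("an analysis needs at least one mode and one text")
    # Horizontal axis: multimodal studies on the left, monomodal on the right.
    horizontal = "left" if len(analysis.modes) > 1 else "right"
    # Vertical axis: several texts at the top, a single text at the bottom.
    vertical = "top" if analysis.n_texts > 1 else "bottom"
    return f"{vertical} {horizontal}"
```

A study of language alone in one report would fall in the bottom right quadrant. Repeating it for images and then combining the two would move the overall analysis to the left-hand side.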

8.5 Multimodal Analysis and Archaeological Discourse
Archaeological scholarship on the written text itself has been primarily monomodal,
focussing on the textual component and saying little about other means of communication that can be used in conjunction with text. One of the most extensive
discussions of archaeology and text is that by Lucas (2019), a volume which itself
only contains one figure (and hence reinforces Opgenhaffen’s (2021, pp. 354–
55) observation about the lack of illustration in such texts). Lucas’ discussion
of the role of text in archaeological knowledge production says nothing about
visual representation as a contributing factor, although it is interesting to consider
the relationship and role of his figure illustrating the Folkton Drums in relation
to the accompanying textual discussion of the drums (Lucas, 2019, pp. 147–49).
By way of comparison, Fagan (2016) briefly refers to illustrations in a book that
otherwise focuses on text, although Connah (2010) includes a full chapter on visual
explanation in his book on writing in archaeology. Overall, however, there is only
limited consideration of the multimodal nature of the writing process in archaeology.

Fig. 8.2 A topology of semiotic resources and analytical units, focusing on choices surrounding
the analysis of language and/or images in texts. (Adapted from Bednarek & Caple, 2017, Figure
1.3)

There is a considerable body of scholarship looking at different aspects of the
nature and role of visualisation in archaeology, in addition to the manuals and guides
defining methods and conventions (for example, Adkins & Adkins, 2009). There
are discussions of analog and digital field drawing (e.g. Bateman, 2006; Morgan
& Wright, 2018; Morgan et al., 2021), drawings and visual representations (e.g.
Molyneaux, 1997; Moser, 1996, 2001, 2014; Perry & Johnson, 2014; Hussain,
2021), 2D and 3D digital imagery (e.g. Frischer & Dakouri-Hild, 2008; Garstki,
2017), photography (e.g. Carter, 2015; contributions in McFadyen & Hicks, 2019;
Morgan, 2016; Shanks & Svabo, 2013), mapping (e.g. Gillings et al., 2019),
and aerial and satellite imagery (e.g. Hanson & Oltean, 2013; Parcak, 2009), as
well as a range of image-related contributions to Smiles and Moser (2005), for
example. There is also a vast archaeological literature on material culture discourse
associated with artefacts, structures and monuments, for example, which is not
considered here. Like their textual equivalents, most discussions of archaeological
visualisation are primarily monomodal and focus on the particular type of visual
representation concerned and say relatively little about the relationship of that mode
with the broader context in which it may be reproduced. Some exceptions to this
include, for example, Morgan’s study of photography at Çatalhöyük with its use
of framing and semiotic codes (Morgan, 2016), Baird’s analysis of photography
at Dura-Europos (Baird, 2011, 2019), Hussain’s comparative analysis of French
and Anglophone lithic imagery (Hussain, 2021), Carter’s examination of the use
of scales in archaeological site photography (Carter, 2015), or indeed, many of
the studies in McFadyen and Hicks (2019). These would likely in other contexts
be recognised as discourse studies wherein they primarily examine a single, if
non-textual, mode. In some instances, however, they also refer to accompanying
modes: for example, Carter describes how the importance of a photograph only
becomes apparent from the accompanying text and comments on the arrangement
of the images on the page (Carter, 2015, p. 9). Monomodal or multimodal, much
of the work represented in the archaeological discussions of visualisation provides
a valuable grounding for a wider discourse-based analysis examining the way in
which archaeologists integrate linguistic and non-linguistic aspects within their
discourses.
Like texts, archaeological visualisations are often several steps removed from the
phenomena they represent or organise: a field drawing may be one step removed,
while a final publication drawing will be several steps further removed as a consequence of intervening interpretation, redrawing, and reconfiguration. The same can
be said for photographs, with original images subject to subsequent enhancement,
cropping, and resizing, for example. Such processes place the eventual viewer as
observer at potentially some distance from the representation and the processes it
has undergone. As Drucker observes: “The interpretative acts that become encoded
in graphical formats may disappear from final view in the process, but they are the
persistent ghosts in the visual scheme, rhetorical elements of generative artefacts”
(Drucker, 2014, p. 66).
To investigate the effects of this and examine the degree to which archaeological
texts are reliant on accompanying images, maps, and diagrams for their meaning,
a selection of archaeological grey literature reports can be examined. These are
derived from the Archaeology Data Service Grey Literature Archive, which is
particularly appropriate in this context given that the archive is increasingly being
used as a corpus for discourse-style analysis and natural language processing (e.g.
Richards et al., 2011, see also Wright and Evans, this volume). The example reports
chosen here for the two case studies have been quasi-randomly selected; they are
understood to be broadly representative of their type and their discussion should
not be construed as criticism of the reports or their authors. Case Study 2 will
be discussed in less detail since the focus will be on significant differences in
presentation from Case Study 1, bearing in mind that the two reports were produced by different
commercial archaeological organisations.

8.5.1 Case Study 1: Excavation Report
The Inverkeithing excavation report by Northlight Heritage (Beckett, 2018) is a
data structure report (DSR), a required output of any archaeological intervention
in Scotland and intended to provide the basis for further analysis and archiving. The
structure of a DSR broadly corresponds to the reporting requirements laid down by
professional bodies (e.g. CIfA, 2020b, pp. 13–15) and consists of a narrative account
of the intervention accompanied by maps, plans and diagrams as required together
with lists of data. This report concerns a small-scale excavation undertaken over a
period of 12 days on the site of the former Franciscan Friary at Inverkeithing, in Fife,
Scotland, in the area of what is currently a park garden. In this example, up to 40%
of the main body of the report as produced (excluding the cover, contents pages, and
appendices) consists of a mixture of photographs, maps, plans and section drawings
(see Fig. 8.1).
One feature that is emphasised early in the report is that this was in part a
community project: ‘Back in the Habit – Digging for Inverkeithing’s Medieval
Friary’. Many of the photographs provided in the report (see Fig. 8.3) quite literally
flesh out the brief, factual statement on community engagement and education in
the text which focuses on the number of volunteers, the number of school children,
and the number of visitors on the open day, together with a brief description of
what the school children did on their visit. Five out of the six images show people
working or training, emphasising both the active engagement and nature of work
activities undertaken. In most cases, faces are hidden or obscured, providing a degree
of anonymity to those depicted. While it would be difficult to argue that the images
contain information crucial to the report, they offer a useful flavour of the public
engagement activities and in combination with the raw numbers of participants
noted in the text provide a valuable indicator of public interest for audiences such as
the funders of the project. This is underlined by a number of textual visitor accounts
of their memories of the site. Finally, a photograph of a number of the volunteers
and staff following a day of backfilling is provided at the end of the main report.
This is one of the few images where faces are clearly visible: they are presumably
among the list of volunteers provided in the acknowledgements, though none are
identified.

Fig. 8.3 Inverkeithing Friary: examples of photographs. Volunteers excavating (top left and right);
school children learning (middle left and right); volunteers training (bottom left and right) (a
composite of Beckett, 2018, Plates 2, 3 and 5)

Fig. 8.4 Inverkeithing Friary: site plan, showing trench locations (after Beckett, 2018, Figure 2)

A key illustration is the site plan, showing the location of buildings, trenches,
and other features (Fig. 8.4). Apart from the site’s map coordinates provided in the
text, and a description of its location relative to other modern contemporary streets
and buildings, there are no other textual details of the location of the site provided,
which underlines the significance of the site plan in conjunction with the earlier
location map. The plan incorporates a standard north arrow and scale bar, and the
bounding border shows the overlying map grid with coordinates, making the scale
bar slightly redundant. The boxed key distinguishes the flower bed and lawns by
colour (although the empty delineated area to the lower right of the plan is actually
lawn but not coded as such). The excavation trenches are delineated with dot-dash
lines, and provided with labels that float outside each trench, lacking connectors
to link them unambiguously although the relationship is visually clear. A wall is
labelled, again without a connecting line and the label itself is well-separated from
its colour coded area, which may introduce some ambiguity. By contrast, the
label referring to a projected wall line does have a connector, but the resolution is
such that this might relate to the dashed line extending from the hospitium building
(and otherwise unidentified), or alternatively to the faint dotted lines which link
back to the wall in trench 2. Reference to the textual description in the report makes
it clear that the latter applies. The interiors of trenches 1 and 2 contain a series of
areas bounded by dashed lines, labelled S1 to S10, whereas trench 3 contains two
areas demarcated by dotted lines with numeric labels contained in square brackets.
Trench 4 contains demarcated but unlabelled areas, including one that is colour
coded. Given its similar treatment to the coded area within trench 2 which is labelled
as a wall, the same meaning could be assumed to apply here. None of the codes used
in the labels are explained in the key or elsewhere on the plan, and their meaning
is only found by reference to the text: ‘S’ stands for sondage, the square bracketed
numbers represent cuts, while the coded area in trench 4 is a partial floor surface
rather than a wall. In the text itself, trench locations are identified in general terms:
‘SW corner of the Friary Gardens’, ‘Eastern edge of the site’, for example, and
the locations of the sondages are described in similar terms relative to their trench.
Details such as depth and contents of the trenches are entirely contained within
the textual descriptions since these would overload the two-dimensional plan. This
brief overview highlights the considerable degree to which the textual component
and the illustration are inter-dependent. Elements are contained within each that are
not common to both, while other elements are only understandable by reference
from one to the other. For example, the colour code in the site plan applied to the
standing buildings and to the wall and floor features in the excavation is not shown
in the key provided, but reading the text it becomes clear that this is used to denote
structures in general. Similarly, the shared coded representation of the structures in
trenches 2 and 4 could legitimately imply both are walls, given the label in trench
2, but the text description of trench 4 indicates that this is not the case. It is clear,
therefore, that ambiguities in either the text or the plan require to be resolved by
reference to the other.

Fig. 8.5 Inverkeithing Friary: (a) SE-facing section of Sondage 6; (b) Plan of wall within Sondage
6/7 (a composite of Beckett, 2018, Figures 4 and 5)

The section drawing in Fig. 8.5a is evidently diagrammatic in format, as indicated
by the representation of grass on the ground surface and the hatching of the mortar
layer (009), for example. A key is provided to the shading used, and a scalebar
is shown. Each of the layers is shown as a bounded area identified by a numeric
code in brackets, but beyond the indications from the key, the nature of each of
the layers is dependent on the textual description. For example, (001) and (028) are
described as topsoil/landscaping overburden, while (029) is a deposit of sandy loam
containing oyster shell (indicated from the key) and some stones (not apparent from
the drawing). The section drawing shows that layer (010) contained stone and oyster
shell but the textual description indicates it also included green-glazed pottery and
butchered animal bone. What is unclear is the extent to which the coded objects
shown in the layers represent the actual position and shape of stones, charcoal,
etc. or are purely representative. The stratigraphic relationships between layers are
implicit within the section plan, and also specified in the context information table in
the text appendices. In the text, the interpretation of the layers is separated by some
distance from their description, reserved for a discussion/summary section. For
example, both (029) and (009) in the section are interpreted as material left behind
from the robbing out of the wall. The markers labelled b and b′ presumably
represent the section datum line used in the creation of the original field drawing
but site coordinates for these are not provided so the precise location, orientation,
and height of the section are not known. It is also difficult to tie the section drawing
in with the plan provided (Fig. 8.5b), despite both relating to the same area. The
plan in Fig. 8.5b contains a north arrow, scalebar, and a key to the shading used. The
wall (017) is shown in some detail, although again the presentation is diagrammatic
rather than artistic since the outlines of the stones conventionally represent where
stone meets the surrounding matrix rather than a ‘true’ representation of the stone
from above. The two sondages are shown and labelled with connectors, but their
numbers are not shown. It is possible that the sondage in the top left is S6 on the
basis that the section drawing of S6 is SE-facing (from its caption), the S6 section
shows layer (031) to the right of the wall (017), and although layer (030) is not shown, its
description in the report appendix indicates that (030) is under (010), which is shown.
This demonstrates that ambiguities in visual representations may be resolved by
reference to other visualisations, as well as to the accompanying text. Similarly,
lack of detail in the text can often be resolved through information provided in the
visual representations.

8.5.2 Case Study 2: Field Evaluation and Watching Brief
The Wind Hill archaeological evaluation and watching brief report by AOC
Archaeology (Walker, 2020) is a report structured according to the professional
standards defined for reporting field evaluations (CIfA, 2020b, pp. 13–15) and watching
briefs (CIfA, 2020a, pp. 14–15). The Wind Hill evaluation was undertaken in advance
of the construction of a parking area and driveway, with the objective of establishing
the presence or otherwise of any archaeological remains that might be encountered
during the groundworks, and, if found, evaluating their extent, preservation, date,
and significance. The report consists of a number of narrative sections followed
by selected maps, drawings and photographs, with context summary tables in an
appendix. Almost 50% of the main body of the report (excluding the cover, contents
pages, and appendices) consists of maps, plans, sections and photographs. However,
there is a sharp differentiation between text and image, with all the figures and
plates placed in sections at the end of the report rather than embedded in the text
at appropriate locations. Alongside the report is a digital site data archive (Walker,
2021) which includes the report, a set of photographs (only 9 of which appear in the
report), and low resolution scans of original site records, including drawing sheets,
registers of levels, photos, and finds, together with trench records and some selected
context sheets.

Fig. 8.6 Wind Hill: site plan, showing the standardised template format and test pit locations.
(After Walker, 2020, Figure 2)

Figure 8.6 demonstrates the use of a standard template consisting of a large
bounding box enclosing the drawn area and a series of small bounding boxes
containing the figure number, north arrow, key, scale bar, and company logo. This
could be seen as heightening an impression of professional reliability, perhaps
reinforced by signs that the plan is digital rather than hand-drawn. The plan is not
gridded and lacks coordinates so locational information for the site is limited to the
description and national grid reference in the accompanying text. The plan provides
the arrangement of test pits within the monitored area, adding more specific detail
to the general locational information in the text which places test pits relative to
the garden area (the proposed car park) and the proposed driveway to the south.
Identification of the bounded areas beyond the monitored area is unclear from both
plan and text, although Plates 1 and 8 (Walker, 2021) provide some contextual
information, together with additional photographs in the digital archive.

Fig. 8.7 Wind Hill: (a) Final section drawing for test pit 4 (extracted from Walker, 2020, Figure
3); (b) Field drawing of test pit 4 section (reconstructed from Walker, 2021, drawing sheet 4)

Figure 8.7a shows one example of a section drawing, extracted from a composite
illustration of selected sections which uses the standard template incorporating
scalebar, key, and logo etc. All the layers are demarcated with firm, strong
boundaries with the exception of the interfaces between 4/006, 4/002 and 4/004
which are shown with dashed lines. Apart from labels for the areas representing
layers and the cardinal points at the corners of the section, areas of natural deposit
and an animal burrow are also labelled. Crosses mark the location of a horizontal
datum line though no locational information is provided (although the height of
the datum can be calculated from the levels register in the digital archive). All the
information concerning the nature of the layers visible in the section is reliant on
the accompanying text which indicates that the dashed lines represent interfaces
between layers that were difficult to define clearly, and discusses the interpretation
of 4/004 as a possible pit (subsequently interpreted as root/animal disturbance)
and cut 4/005 as a linear feature (thought to be modern disturbance). No distinction
between layers and cuts is shown in the presentation of the context numbers, other
than the use of connecting lines. The section drawing itself is a simplified, summary
diagram of what was recorded in the field: reference to the archived field drawing
(see Fig. 8.7b) shows the presence of annotations and the representation of stones in
the sections, together with the location of pottery, marked as a strong black line in
4/006 to the left of the animal burrow and identified as glazed post-medieval pottery
in the archived record (Walker, 2021). The interface between 4/001 and 4/006 is
shown as a firm boundary in the final drawing but is evidently a dashed line in
the field drawing, and is noted as a diffuse horizon in the archive record (Walker,
2021). This example highlights not only the interrelationship of illustration and text
but also the relationship between report and archive, in the way that it sheds light
on details in the field records that do not make it through into the final report as a
consequence of decisions made during the post-excavation process which are only
revealed by virtue of access to the digital archive.
These two case studies underline the inter-relationships between visual representation and textual description and interpretation in archaeological reports.
At times, the visual expands on what is included in the text, at other times it
simply illustrates what the text says. On other occasions, the text explains what
the visual is showing, and in others again, the text and visual(s) interact to
resolve questions about both. Other reports will not necessarily share the same
ambiguities or the same sets of relationships, since these are in part related to
convention, organisational and individual custom, and circumstances surrounding
the archaeological intervention concerned. Again, the observations made here are
not criticisms of the reports or their authors, but are derived from a close reading
of the selected visual representations and their associated texts. The outcome of
this overview demonstrates that, for some purposes at least, visual representations
from archaeological texts cannot easily be ignored, as they do more than simply
accompany the text or illustrate what is already clear from the textual descriptions.
Of course, for certain approaches – for instance, natural language processing to
extract the what, where, and when of archaeological sites (for example, see Wright
and Evans, this volume) – the visual representations are not required since adequate
information can usually be found in the text. However, if more detailed processing
of the text is undertaken to extract information about features, contexts, objects, and
so on, then it is very likely that the visual representations will provide important
information, both supplementing and providing new information to add to that
which can be derived from the text. Seeking to understand the archaeological
data, and the subsequent warrants and claims concerning a site, without
considering the visual representations accompanying the textual report will
demonstrably risk inaccuracy and error. A multimodal analysis that incorporates
all the modes embedded within such archaeological reports is therefore a necessity
in all but the simplest of cases.

8.6 Digital Multimodal Discourse Analysis
A multimodal analysis incorporating all the modes of communication present at
the same time can be a highly complex task, a complexity compounded by the
reliance on describing and categorising non-linguistic modes in textual terms.
Questions about the extent to which this kind of analysis can be automated and
conducted digitally with minimal human intervention have been asked for some
time, largely because computers are commonly seen to be more compatible with
linguistic rather than graphical forms of analysis. Optical character recognition
enabled large bodies of text to be consumed digitally, while natural language
processing techniques are capable of automating the annotation of texts. However,
Salway (2010, p. 50) highlighted that “A major obstacle to the computer-based
analysis of multimodal texts is the current limit on what can be achieved with
automatic image and video analysis techniques, compared with text analysis.”
Similarly, Thomas (2017, p. 2) describes illustrations as “the pictorial obstacles
that computational tools come up against”. The primary problem identified is that
“images . . . cannot be treated in the same computational way as texts: they cannot
be marked up, retrieved or ‘mined’ like words.” (Thomas, 2017, p. 2). Text is seen
to be more computationally tractable because words and sequences of words form
explicit meaning-bearing units, whereas visual representations have no equivalent
accessible units of meaning (e.g. Salway, 2010, pp. 51–52; Kirschenbaum, 2003,
pp. 145–46). As a consequence, computer-based analyses of visual representations
have relied on textual descriptions of selected characteristics or the detection of
words and phrases within the textual component that refer in various ways to
the accompanying images. However, this approach creates a significant semantic
gap between a coded description of a graphical representation and what that
visualisation actually contains (and correspondingly, what a ‘reader’ would see).
The textual characterisation is a poor substitute for the original, and further, inserts
an interpretative layer incorporating a specific theoretical perspective into the
analysis.
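The second strategy mentioned above, detecting words and phrases in the text that refer to accompanying images, can be illustrated with a minimal sketch. The regular expression and both function names are assumptions made for illustration. They cover only simple forms such as ‘Fig. 8.4’, ‘Figure 2’ or ‘Plate 3’, and in a phrase like ‘Plates 1 and 8’ only the first number is captured.

```python
import re

# Hypothetical pattern for simple figure and plate references in report text.
CROSS_REF = re.compile(
    r"\b(Fig(?:ure)?s?|Plates?)\.?\s+(\d+(?:\.\d+)?[a-z]?)",
    re.IGNORECASE,
)


def find_cross_references(text):
    """Return (kind, label) pairs for each figure or plate mentioned in text."""
    refs = []
    for match in CROSS_REF.finditer(text):
        kind = "plate" if match.group(1).lower().startswith("plate") else "figure"
        refs.append((kind, match.group(2)))
    return refs


def unreferenced(kind, labels, text):
    """Return the labels of the given kind that the text never mentions."""
    mentioned = {label for k, label in find_cross_references(text) if k == kind}
    return sorted(set(labels) - mentioned)
```

Applied to a report, `unreferenced("figure", ["2", "3", "4"], report_text)` would list figures that are never cited in the narrative. Such a check locates where text points to image, but says nothing about what the image contains, which is precisely the semantic gap noted above.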
A method for automating the handling of images that has been increasingly
investigated is the use of neural networks: deep learning computer systems which
are capable of recognising objects and categorising images, and which can be used
to automatically annotate an image collection. In supervised neural networks, a
representation of the elements sought within the image dataset has to be coded
or metadata constructed in advance (Arnold & Tilton, 2019, p. i4). To do
this, either an existing training dataset which has already been labelled may be
employed, or a new training dataset has to be created which entails coding a set
of images in advance. In either event, a degree of manual tagging or annotation is
required before the trained network can be applied automatically across the larger image
set. Alternatively, in unsupervised neural networks, the
algorithm discovers patterns and groupings without the need for pre-labelled data.
The complexity and work entailed in creating a training dataset can be considerable
(for example, Hiippala et al., 2021, p. 673ff), which means that most analysts
employ a pre-trained neural network and apply it to the data in question. Many of
these pre-trained networks are based on the ImageNet dataset, consisting of over 14
million labelled images across over 20,000 categories (see Crawford & Paglen, 2019
for a critical overview). For example, Arnold and Tilton (2019, p. i10) employed
an ImageNet-trained neural network on a collection of 170,000 images and were
able to categorise the images on the basis of the dominant objects represented in
each image. However, Wevers and Smits (2020) employed an ImageNet-trained
neural network to categorise images within newspapers, but found that it did not
work well on historic images. This is partly because ImageNet employs images
scraped from the Internet, hence focuses on contemporary objects captured using
high-resolution photography (Wevers & Smits, 2020, p. 200). The same problem
has been experienced elsewhere; for example:
. . . networks are often challenged by unknown, often pre-modern object categories or
objects defamiliarized by style properties. This is mainly because detection networks were
trained on real photos and therefore have never seen instances of swords, medieval clothing
or objects deformed by Cubism . . . (Lang & Ommer, 2021, p. 7)
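
The difference between the supervised and unsupervised approaches outlined above can be sketched with toy data. This is not a neural network: a nearest-centroid classifier and a one-dimensional two-means clustering stand in for the far more complex models discussed here, and each image is assumed to be reduced to a single hypothetical numeric feature.

```python
from statistics import mean


def train_centroids(labelled):
    """Supervised step: average the feature values under each manual label."""
    by_label = {}
    for value, label in labelled:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(value, centroids):
    """Assign the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised step: split unlabelled values into two groups (k-means, k=2)."""
    low, high = min(values), max(values)
    group_low, group_high = list(values), []
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:  # all values identical, nothing to separate
            break
        low, high = mean(group_low), mean(group_high)
    return sorted(group_low), sorted(group_high)
```

In the supervised case the labels (for instance ‘drawing’ and ‘photograph’) have to be supplied by hand before `classify` can be used. In the unsupervised case `two_means` returns two groups without names, and interpreting what each group represents is left to the analyst.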

Other related problems arising with such pre-trained networks include the fact that
resources such as ImageNet are primarily photographic in nature (Chávez Heras &
Blanke, 2021, p. 1155), which limits their value for other non-linguistic data. So,
although Wevers and Smits were able to use their neural network to differentiate
between images and illustrations, for example, their subsequent analyses focus on
images alone (Wevers & Smits, 2020, p. 197). Furthermore, images in ImageNet and
other similar training datasets are labelled in terms of nouns, so although objects
may be detected, conceptual descriptions are not incorporated, which is a limitation
when it comes to understanding meaning-making. Pre-trained networks are also
susceptible to the biases within the training data (see, for example, Crawford &
Paglen, 2019) as well as what Offert and Bell call ‘perceptual bias’, defined as “the
difference between the assumed ‘ways of seeing’ of a machine vision system, our
reasonable expectations regarding its way of representing the visual world, and its
actual perceptual topology” (Offert & Bell, 2021, pp. 1133–1134). While Arnold
and Tilton (2019) describe the use of neural networks as “distant viewing”, in which
the ‘viewing’ is undertaken by the network, Offert and Bell argue for ‘close reading’
of feature visualisations. Using the output images generated by the neural network
in response to the inputs, rather than the input images themselves, they are able to
show that the original dataset is often heavily biased towards specific, misleading,
depictions. For example, the ‘fence’ class “not only picked up the general geometric
structure of the fence but also the fact that many photos of fences in the original
dataset . . . seem to contain people confined behind these fences . . . this also means
that images of people behind fences will appear more fence-like to the classifier”
(Offert & Bell, 2021, p. 1141). This underlines that algorithms do not look at an
image in the way humans do: they handle images as matrices of pixel values (Wevers
& Smits, 2020, p. 196).
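
The last two points can be made concrete with a minimal sketch in Python. It is an illustration under stated assumptions, not a reconstruction of any of the systems cited: the 4 × 4 greyscale "image", the two features (fence pattern, person), the training examples and the function names are all hypothetical, and a deliberately simple nearest-centroid scorer stands in for a neural network. It shows, first, that software handles an image as a matrix of numbers and, second, how a class learned from examples in which fences and people co-occur can end up scoring "person behind a fence" as more fence-like than a fence alone.

```python
# Toy illustration only: a nearest-centroid scorer stands in for a neural network.
# All data, feature names and functions are hypothetical.

# 1. To software, an image is a matrix of pixel values (0 = black, 255 = white).
image = [
    [255, 0, 255, 0],
    [255, 0, 255, 0],
    [255, 0, 255, 0],
    [255, 0, 255, 0],
]


def mean_brightness(pixels):
    """Average pixel value: the kind of arithmetic a model performs on an image."""
    values = [value for row in pixels for value in row]
    return sum(values) / len(values)


# 2. Assumed training examples for the class 'fence', each described by two
#    features: (fence_pattern, person). Three of the four also contain a person.
fence_examples = [(1, 1), (1, 1), (1, 1), (1, 0)]


def centroid(examples):
    """Mean feature vector of a set of examples."""
    n = len(examples)
    return tuple(sum(e[i] for e in examples) / n for i in range(len(examples[0])))


def fence_likeness(features, class_centroid):
    """Negative Euclidean distance: values nearer 0 are more 'fence-like'."""
    distance = sum((f - c) ** 2 for f, c in zip(features, class_centroid)) ** 0.5
    return -distance


fence_centroid = centroid(fence_examples)  # (1.0, 0.75)
score_fence_only = fence_likeness((1, 0), fence_centroid)
score_fence_person = fence_likeness((1, 1), fence_centroid)
```

Because most of the assumed training examples pair a fence with a person, the learned centroid sits closer to "fence plus person" than to "fence alone", so `score_fence_person` is higher than `score_fence_only`. The bias originates in the data rather than in any rule written into the scorer.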
A fundamental problem with neural networks, however, is their lack of interpretability (cf. Huggett, 2021, p. 424ff; see also Offert & Bell, 2021, pp. 1135–1136).
How the different layers of a neural network actually generate the identification is largely opaque, or, if examined closely, is largely uninterpretable to
the human eye. A range of approaches to the interpretability of machine learning
have been proposed: for example, building models that have explainability designed
into them from the outset, post-hoc methods which seek to approximate the model
in a way that is more easily explainable, and interactive methods which allow a
clearer functioning of the model at each stage (Selbst & Barocas, 2018, p. 1110). All
have limitations built into them, whether it is over-simplifying the model to make it
understandable or limiting the range of variables under consideration, for example.
Hiippala (2021, pp. 144–47) writes of the cascading risks in applying computational
models, ranging from the selection of data and its annotation, through the model selected
and the training choices made, to the subsequent deployment of the model, and argues
that these require an understanding of the underlying assumptions at each stage,

J. Huggett

which underlines the importance of expertise spanning the humanities and computer
science. In the end, however, computational models are not yet at the stage where
they can meet Kirschenbaum’s challenge:

Whereas it might be possible to imagine a pattern-matching algorithm that could distinguish
between shepherds and sheep, how could a computer ever hope to recognize the difference
between shepherds and, say, philosophers? (Kirschenbaum, 2003, p. 147)
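
One of the post-hoc strategies mentioned above can be sketched in a few lines of Python. This is a hedged illustration rather than the method of any author cited here: it assumes a hypothetical black-box function, `model_score`, operating on a small pixel matrix, and uses occlusion (blanking one pixel at a time and recording how far the score drops) to approximate which parts of the input the model relies on. The scoring function, the image and all names are invented for the purpose.

```python
# Toy post-hoc explanation by occlusion. The 'model' is a hypothetical stand-in.


def model_score(pixels):
    """Assumed black box: responds only to brightness in the left-hand column."""
    return sum(row[0] for row in pixels) / (255 * len(pixels))


def occlusion_map(pixels, score_fn, fill=0):
    """Blank each pixel in turn and record the resulting drop in score."""
    baseline = score_fn(pixels)
    drops = []
    for r, row in enumerate(pixels):
        drop_row = []
        for c, _ in enumerate(row):
            occluded = [list(line) for line in pixels]  # copy the image
            occluded[r][c] = fill  # hide a single pixel
            drop_row.append(baseline - score_fn(occluded))
        drops.append(drop_row)
    return drops


image = [
    [255, 10, 200],
    [255, 90, 30],
    [255, 40, 120],
]
importance = occlusion_map(image, model_score)
```

Here only the left-hand column receives a non-zero importance, which matches what the toy model actually does. With a real network such a map is only an approximation of the model's behaviour, and it inherits the limitations of post-hoc explanation noted above.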

8.7 Conclusions
A recent discussion of information-making in archaeological field reports looks at
how archaeologists document their information work practices within their reports,
and finds that evidence for this occurs throughout the typical report (Huvila et
al., 2021, p. 1120). Interestingly, the use of visual representations other than
photographs as a means of documenting practice is not considered in what is
primarily a textual review of report writing, essentially relying on whether or not
the ‘event’ of drawing or photography is referred to in the narrative text rather than
evidence of its presence in the report (Huvila et al., 2021, p. 1113). However, they do
note that most reports contain photographs which may be used to depict the context
of the site and its details along with images of archaeologists at work (Huvila et al.,
2021, p. 1115), as also seen in the case studies discussed above. Clearly, including
the different kinds of visual representations used in archaeological reports – maps,
plans, sections, etc. as well as photographs – in a multimodal study as discussed
here would be a natural extension to this kind of work.
However, the use of discourse analysis in archaeology (including the discussion
here) has, perhaps inevitably, focused on finished products: the final textual reports
of archaeological interventions. If a key objective of discourse analysis is to understand archaeological knowledge creation rather than to simply enhance the ability to
categorise and locate archaeological reports, then this emphasis on the end products
of archaeological practice is only part of the story. Archaeological reports are
constructed from the products of fieldwork: the databases, excavation diaries, textual
records, plans, sections, drawings, photographs, as well as the objects themselves.
It is the creation of these and the way these are subsequently incorporated into the
final narrative that constitutes the process of archaeological knowledge creation.
This process is broadly equivalent to the acts of translation, transduction, and
transformation defined by Kress (2010, pp. 124–30) in relation to the movement
of meaning and meaning change. Translation essentially constitutes the movement
of meaning from one mode to another, effectively from one ‘language’ to another
(Kress, 2010, p. 124). This includes the shift from image to writing, from descriptive
record to drawing, for example, such as takes place during the post-excavation phase
of archaeological reporting. Kress calls this ‘transduction’: “the re-articulation
of meaning from the entities of one mode into the entities of the new mode”
(Kress, 2010, p. 125), emphasising the shift from words to visual representation
or from photographic image to textual description. Kress also defines the process of
‘transformation’, which entails the reordering of elements in a text or other semiotic
object while remaining in the same mode and without ontological change (Kress,
2010, p. 129). For instance, using textual descriptions and interpretations from field
records and incorporating them in narrative text would constitute transformation,
whereas digitising a pencil field drawing would be transduction, moving from one
material mode to another even whilst retaining its identity as a drawing. The same
processes of translation, transduction and transformation might equally be applied
to the archaeological recording of the material evidence in the first place, but this
would be beyond the scope of this chapter and would further extend the discourse
analysis into speech and gesture, for example (see Edgeworth, 2003, 2006, 2012,
for instance). In relation to field recording, however, the work by Mickel (2015) and
Sandoval (2020) on excavation diaries, Morgan et al.’s (2021) study of drawing and
knowledge construction, and especially Sandoval’s examination of context records
and their accompanying sketches (Sandoval, 2021) show how a close reading of
such records can shed light on the knowledge creation process (see also Huggett,
2020, p. 10ff). The value of a multimodal analysis is the way in which it would
draw all these threads (and more) together.
In the same way that an understanding of knowledge creation in archaeology is
limited by a discourse analysis focused primarily on finished texts, examinations
of knowledge creation are, perhaps by definition, largely focused on creation rather
than the consumption and consequent understanding of knowledge. For example,
“Scholars studying multimodal discourse have mainly focused on meaning-making
as the primary property of the text and as the result of the intention of the maker
rather than as the result of the inference process carried out by the receiver of
a multimodal text” (Tseronis & Pollaroli, 2018, p. 150). To a degree, this task is
taken up by studies of argumentation, both within discourse studies and beyond (for
example, see the approaches outlined in many of the contributions to this volume).
However, the methods of analysis used in argumentation studies such as these are
primarily textual in outlook and there is comparatively little consideration of the
use of non-linguistic modes in archaeological argumentation. Beyond discourse
studies, non-linguistic modes similarly lack discussion in the context of ‘reading’
archaeology: for example, Gibbon (2014) only refers to images in the mind, not
on the page, while elsewhere the traditional emphasis remains on reading the
past as if it were a text (for example, Hodder & Hutson, 2003). Groarke, for
example, emphasises the range of semiotic resources that may be incorporated
within argumentation, employed by the narrator and received by the reader:

. . . there are modes of arguing that employ visuals of many different sorts (diagrams,
graphs, photographs, videos, paintings, observation, etc.), tactile sensations, musical notes,
non-verbal sounds, and a wide variety of other non-verbal elements. . . . In the age of
print, an overwhelming emphasis on the verbal mode of arguing may have been adequate
and appropriate. In a digital age, we need a set of modes that accommodates digital
communication and the ease with which it embraces images and sounds of all sorts.
(Groarke, 2015, p. 142)

Clearly, coming to an understanding of how archaeological knowledge is created
is an important endeavour, but an appreciation of how that knowledge is received
and how the different modes incorporated in its communication are employed is
also critical to that understanding, and to subsequent developments in knowledge
creation and communication. So while analyses of discourses that look beyond texts
are important, analyses that consider all parties to a discourse, rather than just those
who created it, are equally so.
This emphasis on the need for multimodal analysis has been a core proposition
in this chapter, with a particular stress on the importance of visual representations as
one of the key vehicles in archaeological discourse. Consequently, it has been argued
that limiting discourse analysis to text, especially when visualisations are present,
is a significant restriction in understanding the nature of that discourse. Archaeological visual representations are of more than secondary interest: “they intimately
resonate with broader concerns of knowledge production and archaeological theory”
(Hussain, 2021, p. 155). Unpicking those intimate relationships within the collective
ensemble of semiotic resources used in archaeological knowledge creation is a task
requiring a multimodal, rather than monomodal, approach.

References
Adkins, L., & Adkins, R. (2009). Archaeological illustration (Cambridge Manuals in Archaeology). Cambridge University Press.
Andrén, A. (1998). Between artifacts and texts: Historical archaeology in global perspective
(Contributions to Global Historical Archaeology). Springer. https://doi.org/10.1007/978-1-4757-9409-0
Arnold, T., & Tilton, L. (2019). Distant viewing: Analyzing large visual corpora. Digital
Scholarship in the Humanities, 34(Supplement_1), i3–i16. https://doi.org/10.1093/llc/fqz013
Baigrie, B. (1996). Introduction. In B. Baigrie (Ed.), Picturing knowledge: Historical and
philosophical problems concerning the use of art in science (pp. xvii–xxiv). University of
Toronto Press.
Baird, J. A. (2011). Photographing Dura-Europos, 1928–1937: An archaeology of the archive.
American Journal of Archaeology, 115(3), 427–466. https://doi.org/10.3764/aja.115.3.0427
Baird, J. A. (2019). Exposing archaeology: Time in archaeological photographs. In L. McFadyen
& D. Hicks (Eds.), Archaeology and photography: Time, objectivity and archive (pp. 73–95).
Bloomsbury Visual Arts.
Barrett, J. C. (1988). Fields of discourse: Reconstituting a social archaeology. Critique of
Anthropology, 7(3), 5–16. https://doi.org/10.1177/0308275X8800700301
Bateman, J. (2006). Pictures, ideas, and things: The production and currency of archaeological
images. In M. Edgeworth (Ed.), Ethnographies of archaeological practice: Cultural encounters, material transformations (pp. 68–80). AltaMira Press.
Bateman, J. A. (2011). The decomposability of semiotic modes. In K. L. O’Halloran & B. A. Smith
(Eds.), Multimodal studies: Exploring issues and domains (Routledge Studies in Multimodality
2) (pp. 17–38). Routledge. https://doi.org/10.4324/9780203828847
Bateman, J. A., Wildfeuer, J., & Hiippala, T. (2017). Multimodality: Foundations, research
and analysis – problem-oriented introduction. De Gruyter. https://doi.org/10.1515/9783110479898
Beckett, A. (2018). Inverkeithing Friary archaeological excavation. (Northlight Heritage). Archaeology Data Service [distributor]. https://doi.org/10.5284/1058630

Bednarek, M., & Caple, H. (2017). The discourse of news values: How news organizations create newsworthiness. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780190653934.001.0001
Bueno, O. (2016). Visual reasoning in science and mathematics. In L. Magnani & C. Casadio
(Eds.), Model-based reasoning in science and technology (Studies in applied philosophy,
epistemology and rational ethics) (pp. 3–19). Springer. https://doi.org/10.1007/978-3-319-38983-7_1
Burdick, A., Drucker, J., Lunenfeld, P., Presner, T., & Schnapp, J. (2012). Digital humanities. MIT
Press. https://doi.org/10.7551/mitpress/9248.001.0001
Candea, M. (2019). On visual coherence and visual excess: Writing, diagrams, and anthropological
form. Social Analysis: The International Journal of Cultural and Social Practice, 63(4), 63–88.
https://doi.org/10.3167/sa.2019.630404
Carter, C. (2015). The development of the scientific aesthetic in archaeological site photography?
Bulletin of the History of Archaeology, 25(2), Art. 4. https://doi.org/10.5334/bha.258
Chávez Heras, D., & Blanke, T. (2021). On machine vision and photographic imagination. AI &
Society, 36, 1153–1165. https://doi.org/10.1007/s00146-020-01091-y
CIfA. (2020a). Standard and guidance for an archaeological watching brief. Chartered Institute
for Archaeologists. https://www.archaeologists.net/codes/cifa
CIfA. (2020b). Standard and guidance for archaeological field evaluation. Chartered Institute for
Archaeologists. https://www.archaeologists.net/codes/cifa
Connah, G. (2010). Writing about archaeology. Cambridge University Press.
https://doi.org/10.1017/CBO9780511845383
Crawford, K., & Paglen, T. (2019, September 19). Excavating AI: The politics of images in
machine learning training sets. Excavating AI. https://excavating.ai
Drucker, J. (2014). Graphesis: Visual forms of knowledge production. Harvard University Press.
Edgeworth, M. (2003). Acts of discovery: An ethnography of archaeological practice (British
Archaeological Reports International Series 1131). Archaeopress.
Edgeworth, M. (Ed.). (2006). Ethnographies of archaeological practice: Cultural encounters,
material transformations (Worlds of Archaeology Series). AltaMira Press.
Edgeworth, M. (2012). Follow the cut, follow the rhythm, follow the material. Norwegian
Archaeological Review, 45(1), 76–92. https://doi.org/10.1080/00293652.2012.669995
Engebretsen, M., & Weber, W. (2018). Graphic modes: The visual representation of data. In
C. Cotter & D. Perrin (Eds.), The Routledge handbook of language and media (Routledge
handbooks in linguistics) (pp. 277–295). Routledge. https://doi.org/10.4324/9781315673134
Engelhardt, Y. (2007). Syntactic structures in graphics. IMAGE. Zeitschrift Für Interdisziplinäre
Bildwissenschaft, 5(3:1), 23–35. https://doi.org/10.25969/MEDIAREP/16745
Fagan, B. (2016). Writing archaeology: Telling stories about the past. Routledge.
https://doi.org/10.4324/9781315415611
Flanders, J. (1998). Trusting the electronic edition. Computers and the Humanities, 31(4), 301–310.
Foucault, M. (1989). Archaeology of knowledge (A. M. Sheridan Smith, Trans.). Routledge.
https://doi.org/10.4324/9780203604168
Frischer, B., & Dakouri-Hild, A. (Eds.). (2008). Beyond illustration: 2D and 3D technologies as
tools for discovery in archaeology (British Archaeological Reports International Series 1805).
Archaeopress.
Garstki, K. (2017). Virtual representation: The production of 3D digital artifacts. Journal of
Archaeological Method and Theory, 24(3), 726–750. https://doi.org/10.1007/s10816-016-9285-z
Gee, J. P. (2011). An introduction to discourse analysis: Theory and method (3rd ed.). Routledge.
https://doi.org/10.4324/9780203847886
Gibbon, G. E. (2014). Critically reading the theory and methods of archaeology: An introductory
guide. AltaMira Press.

Giere, R. (1996). Visual models and scientific judgement. In B. Baigrie (Ed.), Picturing knowledge:
Historical and philosophical problems concerning the use of art in science (pp. 269–302).
University of Toronto Press.
Gillings, M., Hacigüzeller, P., & Lock, G. (Eds.). (2019). Re-mapping archaeology: Critical
perspectives, alternative mappings. Routledge. https://doi.org/10.4324/9781351267724
Groarke, L. (2015). Going multimodal: What is a mode of arguing and why does it matter?
Argumentation, 29(2), 133–155. https://doi.org/10.1007/s10503-014-9336-0
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2006). Construing experience through meaning:
A language-based approach to cognition. Open Linguistics Series. Continuum.
Hanson, W. S., & Oltean, I. A. (Eds.). (2013). Archaeology from historical aerial and satellite
archives. Springer. https://doi.org/10.1007/978-1-4614-4505-0
Harris, Z. S. (1952). Discourse analysis. Language, 28(1), 1–30. https://doi.org/10.2307/409987
Hawkes, C. (1954). Archeological theory and method: Some suggestions from the Old World.
American Anthropologist, 56(2), 155–168.
Hiippala, T. (2021). Distant viewing and multimodality theory: Prospects and challenges. Multimodality & Society, 1(2), 134–152. https://doi.org/10.1177/26349795211007094
Hiippala, T., Alikhani, M., Haverinen, J., Kalliokoski, T., Logacheva, E., Orekhova, S., Tuomainen,
A., Stone, M., & Bateman, J. A. (2021). AI2D-RST: A multimodal corpus of 1000 primary
school science diagrams. Language Resources and Evaluation, 55(3), 661–688.
https://doi.org/10.1007/s10579-020-09517-1
Hodder, I. (1986). Reading the past: Current approaches to interpretation in archaeology (1st ed.).
Cambridge University Press.
Hodder, I., & Hutson, S. (2003). Reading the past: Current approaches to interpretation in archaeology (3rd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511814211
Hope-Taylor, B. (1967). Archaeological Draughtsmanship: Part III. Antiquity, 41(163), 181–189.
Huggett, J. (2020). Capturing the silences in digital archaeological knowledge. Information, 11(5),
278. https://doi.org/10.3390/info11050278
Huggett, J. (2021). Algorithmic agency and autonomy in archaeological practice. Open Archaeology, 7(1), 417–434. https://doi.org/10.1515/opar-2020-0136
Hussain, S. T. (2021). Compelling image-worlds: A pictorial perspective on the epistemology of
stone artefact analysis in Palaeolithic archaeology. In S. A. de Beaune, A. Guidi, O. M. Abadía,
& M. Tarantini (Eds.), New advances in the history of archaeology (Proceedings of the XVIII
UISPP World Congress (4–9 June 2018, Paris, France)) (Vol. 16, pp. 138–170). Archaeopress.
Huvila, I., Sköld, O., & Börjesson, L. (2021). Documenting information making in archaeological
field reports. Journal of Documentation, 77(5), 1107–1127. https://doi.org/10.1108/JD-11-2020-0188
James, S. (1997). Drawing inferences: Visual reconstructions in theory and practice. In B.
Molyneaux (Ed.), The cultural life of images: Visual representation in archaeology (pp. 22–48).
Routledge. https://doi.org/10.4324/9781315888460
James, S. (2015). “Visual competence” in archaeology: A problem hiding in plain sight. Antiquity,
89(347), 1189–1202. https://doi.org/10.15184/aqy.2015.60
Jones, R. H. (2021). Data collection and transcription in discourse analysis: A technological
history. In K. Hyland, B. Paltridge, & L. L. C. Wong (Eds.), The Bloomsbury handbook of
discourse analysis (pp. 9–20). Bloomsbury Academic. https://doi.org/10.5040/9781350156111
Joyce, R. A. (2002). The languages of archaeology: Dialogue, narrative, and writing. Blackwell
Publishers. https://doi.org/10.1002/9780470693520
Kennedy, H., Hill, R. L., Aiello, G., & Allen, W. (2016). The work that visualisation conventions do. Information, Communication & Society, 19(6), 715–735. https://doi.org/10.1080/1369118X.2016.1153126
Kilchör, F., & Lehmann, J. (2021). Graphical viewing at a distance: Graphical analytics as a
method for the investigation of illustrated books. Visual Communication, 20(3), 415–436.
https://doi.org/10.1177/1470357220972165

Kirschenbaum, M. (2003). The word as image in an age of digital reproduction. In M. E. Hocks
& M. R. Kendrick (Eds.), Eloquent images: Word and image in the age of new media (pp.
137–156). MIT Press. https://doi.org/10.7551/mitpress/2694.001.0001
Kjeldsen, J. E. (2018). Visual rhetorical argumentation. Semiotica: Journal of the International Association for Semiotic Studies/Revue de l’Association Internationale de Sémiotique,
220(January), 69–94. https://doi.org/10.1515/sem-2015-0136
Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication.
Routledge. https://doi.org/10.4324/9780203970034
Kress, G., & van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary communication. Arnold.
Kress, G., & van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.).
Routledge.
Lang, S., & Ommer, B. (2021). Transforming information into knowledge: How computational
methods reshape art history. Digital Humanities Quarterly, 15(3).
http://digitalhumanities.org/dhq/vol/15/3/000560/000560.html
Latour, B. (1990). Drawing things together. In M. Lynch & S. Woolgar (Eds.), Representation in
scientific practice (pp. 19–68). MIT Press.
Lopes, D. (2009). Drawing in a social science: Lithic illustration. Perspectives on Science, 17(1),
5–25.
Lucas, G. (2019). Writing the past: Knowledge and literary production in archaeology. Routledge.
https://doi.org/10.4324/9780429444487
McFadyen, L., & Hicks, D. (Eds.). (2019). Archaeology and photography: Time, objectivity and
archive. Bloomsbury Visual Arts. https://doi.org/10.4324/9781003103325
Mickel, A. (2015). Reasons for redundancy in reflexivity: The role of diaries in archaeological epistemology. Journal of Field Archaeology, 40(3), 300–309. https://doi.org/10.1179/2042458214Y.0000000002
Molyneaux, B. (Ed.). (1997). The cultural life of images: Visual representation in archaeology.
Routledge. https://doi.org/10.4324/9781315888460
Moreland, J. (2003). Archaeology and text (Duckworth Debates in Archaeology). Duckworth.
Moreland, J. (2006). Archaeology and texts: Subservience or enlightenment. Annual Review of
Anthropology, 35(1), 135–151. https://doi.org/10.1146/annurev.anthro.35.081705.123132
Morgan, C. (2016, September). Analog to digital: Transitions in theory and practice in archaeological photography at Çatalhöyük. Internet Archaeology, 42. https://doi.org/10.11141/ia.42.7
Morgan, C., & Wright, H. (2018). Pencils and pixels: Drawing and digital media in archaeological field recording. Journal of Field Archaeology, 43(2), 136–151. https://doi.org/10.1080/00934690.2018.1428488
Morgan, C., Petrie, H., Wright, H., & Taylor, J. S. (2021). Drawing and knowledge construction in
archaeology: The Aide Mémoire Project. Journal of Field Archaeology, 46(8), 614–628.
https://doi.org/10.1080/00934690.2021.1985304
Moser, S. (1996). Visual representation in depicting the missing-link origins. In B. Baigrie (Ed.),
Picturing knowledge: Historical and philosophical problems concerning the use of art in
science (pp. 185–214). University of Toronto Press.
Moser, S. (2001). Archaeological representation: The visual conventions for constructing knowledge about the past. In I. Hodder (Ed.), Archaeological theory today (pp. 262–283). Polity
Press.
Moser, S. (2014). Making expert knowledge through the image: Connections between antiquarian
and early modern scientific illustration. Isis, 105(1), 58–99. https://doi.org/10.1086/675551
O’Halloran, K. L. (2004). Introduction. In K. L. O’Halloran (Ed.), Multimodal discourse analysis:
Systemic-functional perspectives (Open Linguistics Series) (pp. 1–7). Continuum.
Offert, F., & Bell, P. (2021). Perceptual bias and technical metapictures: Critical machine vision
as a humanities challenge. AI & Society, 36, 1133–1144. https://doi.org/10.1007/s00146-020-01058-z
Olsen, B. (2003). Material culture after text: Re-membering things. Norwegian Archaeological
Review, 36(2), 87–104. https://doi.org/10.1080/00293650310000650

Opgenhaffen, L. (2021). Visualizing archaeologists: A reflexive history of visualization practice in
archaeology. Open Archaeology, 7(1), 353–377. https://doi.org/10.1515/opar-2020-0138
Paltridge, B. (2012). Discourse analysis: An introduction (2nd ed.). Bloomsbury Academic.
Parcak, S. H. (2009). Satellite remote sensing for archaeology. Routledge. https://doi.org/10.4324/9780203881460
Perry, S., & Johnson, M. (2014). Reconstruction art and disciplinary practice: Alan Sorrell and the
negotiation of the archaeological record. The Antiquaries Journal, 94, 323–352.
https://doi.org/10.1017/S0003581514000249
Pétursdóttir, Þ. (2020). Visual essays: Different ways of knowing and communicating the
archaeological. Norwegian Archaeological Review, 53(2), 101–103.
https://doi.org/10.1080/00293652.2020.1860119
Preucel, R. W. (2006). Archaeological semiotics (Social Archaeology). Blackwell.
QAA. (2014). ‘Subject benchmark statement: Archaeology’. UK quality code for higher education.
Quality Assurance Agency for Higher Education.
Renfrew, C. (1989). Comments on archaeology into the 1990s. Norwegian Archaeological Review,
22(1), 33–41. https://doi.org/10.1080/00293652.1989.9965488
Rheindorf, M. (2019). Revisiting the toolbox of discourse studies: New trajectories in methodology,
open data and visualization. Palgrave Macmillan. https://doi.org/10.1007/978-3-030-19369-0
Richards, J., Jeffrey, S., Waller, S., Ciravegna, F., Chapman, S., & Zhang, Z. (2011). The
archaeology data service and the Archaeotools project: Faceted classification and natural
language processing. In E. C. Kansa, S. W. Kansa, & E. Watrall (Eds.), Archaeology 2.0: New
approaches to communication and collaboration (pp. 27–56). Cotsen Institute of Archaeology
Press.
Salway, A. (2010). The computer-based analysis of narrative and multimodality. In E. Ruth (Ed.),
New perspectives on narrative and multimodality (Routledge Studies in Multimodality) (pp.
50–64). Routledge. https://doi.org/10.4324/9780203869437
Sandoval, G. (2020). In pursuit of a reflexive recording. An epistemic analysis of excavation diaries
from the Çatalhöyük Research Project. Norwegian Archaeological Review, 53, 135–153.
https://doi.org/10.1080/00293652.2020.1854338
Sandoval, G. (2021). Single-context recording, field interpretation and reflexivity: An analysis of
primary data in context sheets. Journal of Field Archaeology, 46(7), 496–512.
https://doi.org/10.1080/00934690.2021.1926700
Schiffrin, D., Tannen, D., & Hamilton, H. E. (2001a). Introduction. In D. Schiffrin, D. Tannen, &
H. E. Hamilton (Eds.), The handbook of discourse analysis (1st ed.). Blackwell.
https://doi.org/10.1002/9780470753460
Schiffrin, D., Tannen, D., & Hamilton, H. E. (Eds.). (2001b). The handbook of discourse analysis
(1st ed.). Blackwell. https://doi.org/10.1002/9780470753460
Selbst, A. D., & Barocas, S. (2018). The intuitive appeal of explainable machines. Fordham Law
Review, 87, 1085–1139. https://doi.org/10.2139/ssrn.3126971
Shanks, M. (1992). Experiencing the past: On the character of archaeology. Routledge.
https://doi.org/10.4324/9780203973639
Shanks, M., & Svabo, C. (2013). Archaeology and photography: A pragmatology. In A. González-Ruibal (Ed.), Reclaiming archaeology: Beyond the tropes of modernity (Archaeological
Orientations) (pp. 89–102). Routledge. https://doi.org/10.4324/9780203068632
Shanks, M., & Tilley, C. (1987). Social theory and archaeology. Polity Press.
Smiles, S., & Moser, S. (Eds.). (2005). Envisioning the past: Archaeology and the image.
Blackwell. https://doi.org/10.1002/9780470774830
Tannen, D., Hamilton, H. E., & Schiffrin, D. (Eds.). (2015). The handbook of discourse analysis
(2nd ed.). Wiley. https://doi.org/10.1002/9781118584194
Thomas, J. (2017). Nineteenth-century illustration and the digital. Springer.
https://doi.org/10.1007/978-3-319-58148-4
Thornton, A. (2018). Archaeologists in print: Publishing for the people. UCL Press.
https://doi.org/10.14324/111.9781787352575
Tilley, C. (1991). Material culture and text. Routledge. https://doi.org/10.4324/9781315746883

Topper, D. (1996). Towards an epistemology of scientific illustration. In B. Baigrie (Ed.), Picturing
knowledge: Historical and philosophical problems concerning the use of art in science (pp.
215–249). University of Toronto Press.
Tringham, R., & Danis, A. (2019). Doing sensory archaeology. In R. Skeates & J. Day (Eds.), The
Routledge handbook of sensory archaeology (1st ed., pp. 48–75). Routledge.
https://doi.org/10.4324/9781315560175-4
Tseronis, A., & Pollaroli, C. (2018). Introduction: Pragmatic insights for multimodal argumentation. International Review of Pragmatics, 10(2), 147–157. https://doi.org/10.1163/18773109-01002001
Uggla, K. (2021). Interpreting information visualization. In S. Petersson (Ed.), Digital human
sciences: New objects – New approaches (pp. 103–126). Stockholm University Press.
https://doi.org/10.16993/bbk/
van den Hoven, P. (2012). The narrator and the interpreter in visual and verbal argumentation.
In F. H. van Eemeren & B. Garssen (Eds.), Topical themes in argumentation theory: Twenty
exploratory studies (Argumentation Library) (pp. 257–271). Springer.
https://doi.org/10.1007/978-94-007-4041-9_17
Walker, M. (2020). Wind Hill, Bransdale, North Yorkshire – Archaeological evaluation and watching brief report (AOC Archaeology Group 52051). Archaeology Data Service [distributor].
https://doi.org/10.5284/1085027
Walker, M. (2021). Site data from an archaeological evaluation and watching brief at Wind
Hill, Bransdale, North Yorkshire (AOC Archaeology Group). Archaeology Data Service
[distributor]. https://doi.org/10.5284/1085027
Weber, W. (2019). Towards a semiotics of data visualization – An inventory of graphic resources.
In 2019 23rd international conference information visualisation (IV) (pp. 323–328).
https://doi.org/10.1109/IV.2019.00061
Wevers, M., & Smits, T. (2020). The visual digital turn: Using neural networks to study historical
images. Digital Scholarship in the Humanities, 35(1), 194–207.
https://doi.org/10.1093/llc/fqy085

Part II

Computational Techniques

Chapter 9

Computer Processing of Language: Where Archaeological Discourse and Computers Meet
Patricia Martín-Rodilla

Abstract Archaeological practice produces a vast amount of documentation about
our past in the form of archaeological narratives in free-format texts (internal reports,
academic publications or dissemination activities). This huge amount of unstructured textual documentation has in recent years generated increasing interest
in applying computational natural language processing as part of
archaeological research and practice.
Advancing the understanding, analysis, processing and exploitation of these
archaeological narratives by machines requires in-depth training in natural language
processing methods and techniques drawn from other areas, such as linguistics,
software engineering or artificial intelligence. This chapter provides a brief
historical and typological review of the different computational approaches to
natural language processing (which will be addressed in subsequent chapters of
the volume), and then focuses on the computational processing of archaeological
discourse and its possibilities. Archaeological discourse, in the form of free-text
narratives, constitutes the expression and reflection of the archaeological knowledge
produced. Thus, computational treatment at the discourse level is necessary to advance
the relationship between archaeologists and computers. The chapter reviews the different
computational approaches adopted at the discourse level and the kinds of applications
that are possible in archaeology.
Keywords Natural language processing · Archaeology · Discourse parsing ·
Computational discourse analysis

P. Martín-Rodilla (✉)
Department of Computer Science and Information Technologies, University of A Coruña,
A Coruña, Spain
e-mail: patricia.martin.rodilla@udc.es
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_9

9.1 Introduction
Documenting archaeological practice is one of the most important activities
of any archaeologist. The vast amount of raw data and information processed as a
result of fieldwork campaigns, qualitative work (surveys, anthropological studies
in archaeology, etc.) or quantitative approaches (e.g. archaeometric studies or geographical, network-based or dating analyses, among others) is commonly interpreted
and reported in the form of complex and elaborate natural language narratives about
these findings, either in reports (the so-called grey literature) or in academic and
dissemination publications (scientific journals, books, teaching materials, etc.).
The presence of all this knowledge in free-style texts and the need to process it,
together with the high degree of techno-mediation of some processes and methodologies in
archaeology (Huggett, 2004), have caused an increasing interest among the archaeological
community in the computational processing of natural language (especially from
textual sources) and its possibilities in archaeology. We can find, for example, large
archaeological projects at the European level that study the possibilities of NLP or that
apply NLP to specific archaeological contexts and sources, such as
ARIADNE (Vlachidis et al., 2017) or, more recently, SEADDA (SEADDA Project,
2020).
With these recent works in mind, the aim of this chapter is twofold. First,
the chapter gives the reader a non-exhaustive but multifaceted and up-to-date
overview of the natural language processing area. It is important to highlight here
that natural language processing, as an interdisciplinary area, covers a vast field of
knowledge that is impossible to address in one book chapter. Therefore, the goal of
this first part of the chapter is not to cover all the theories, methods or techniques available
in NLP, but rather to briefly organize the historical evolution of the discipline and
to provide a small typology of the techniques, tools and problems approached with
NLP at the computational level. This first part serves as the basis for the rest
of the chapters of the volume that deal in depth with some of these problems
and/or methods and their application in archaeology. Thus, this first section of the
chapter takes a short historical journey through the discipline of Natural Language
Processing (hereafter NLP) and the type of information that we can currently extract
and analyze using automatic and semi-automatic NLP methods. This information is
extracted at different linguistic levels (lexical, grammatical, etc.) and these levels
determine the type of application and research goals at the archaeological level that
can be proposed. This tour also illustrates some real projects that extract linguistic
information from archaeological sources.
Secondly, the chapter focuses on the computational processing of archaeological
discourse and its possibilities. In recent years, discourse analysis has been integrated
into the human-machine relationship, allowing some automation and software
assistance in the identification of discursive elements, referenced ontological entities and inferential relationships. Intrinsically narrative domains, such as historical,
archaeological, or anthropological studies, need this level of computational treatment
in order to perform a real analysis of the knowledge produced, which is
expressed in free texts.
This bidirectional relationship between discourse and archaeological knowledge
(a cross-cutting idea in this volume) is explored in this chapter from a more computational point of view, addressing techniques, methods and tools for automatic
and semi-automatic analysis of discourse and its application to current and future
archaeological research.

9.1.1 A Brief History of NLP
We can situate Natural Language Processing (hereafter NLP) as an interdisciplinary study area, since (1) it draws on linguistics, cognitive science,
computer science and artificial intelligence, and (2) it is an area clearly
defined by its problem orientation: its main objective is to improve the interaction
between machines and humans using natural language, with machines being capable of
understanding, processing, and producing expressions in natural language. This last
aspect, the production of natural language by machines, has given rise to a subarea called natural language generation, which is outside the scope of this chapter.
We will therefore focus on the relevant advances that have been made in the
relation between machines and humans in natural language, in which humans
produce natural language data and machines try to understand and process them.
Although earlier formal linguistic theories clearly
influenced the field, the 1950s are often cited as the beginning of the discipline of
NLP as we know it today, specifically the work of Alan Turing on the definition
of intelligent machines and their relationship with language in Computing Machinery and
Intelligence (Turing, 2009) and the work of Noam Chomsky in his 1957 book Syntactic Structures
(Chomsky, 2002), as foundations for the study of the relations between
natural language and computers.
At this point, Chomsky created a style of grammar called Phrase-Structure
Grammar, which methodically translated natural language sentences into a format
that is usable by computers. Chomsky’s subsequent work in generative grammar
resulted in real developments during the 1960s and 1970s, such as SHRDLU (Winograd,
1971), a natural language system that allowed users to give commands to the computer
within “blocks worlds” with restricted vocabularies (using memory as contextual
information for the system), or ELIZA (Weizenbaum, 1966), a basic conversational
simulator written by Joseph Weizenbaum between 1964 and 1966. It was also at this
time, in 1964, that the U.S. National Research Council (NRC) created the Automatic
Language Processing Advisory Committee (ALPAC) with the mission of evaluating
the progress of natural language processing research.
ALPAC analyzed 12 years of research and $20 million invested in
NLP, especially in developments related to machine translation. The ALPAC report (Pierce
& Carroll, 1966) was issued in 1966 and constitutes a turning point in
NLP research. The report was highly critical of the research done in machine
translation so far and emphasized the need for basic research in the NLP area. Its
publication caused the U.S. government to reduce its funding of the topic
dramatically and led to a drastic worldwide drop in NLP research.
In the early 1980s there was a resurgence of NLP studies, thanks to symbolic
approaches and some grammatical extensions (Indurkhya & Damerau, 2010). NLP
became a highly specialized area, with well-defined tasks that
divided the effort and obtained better results at particular linguistic levels. For example,
the study of reference and coreference problems, the rule-based parsing approach and
Rhetorical Structure Theory at the discourse level were important NLP advances of
this time (Indurkhya & Damerau, 2010; Lesk, 1986; Mann & Thompson, 1988).
These advances succeeded only in specific contexts or domains,
almost as ad hoc applications. To try to overcome this barrier to generalization in NLP,
the 1990s were dedicated to applying purely grammar-based approaches (later called
rule-based approaches). Rule-based NLP solutions were evaluated using
statistical methods, with the aim of comparing system performance and achieving
a certain degree of generalization in the developments provided. Some quantitative
models became popular for some NLP applications, such as N-gram models
(Rosenfeld, 2000): an N-gram model predicts the occurrence of a word based on
the occurrence of the N-1 previous words in the text.
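The N-gram idea can be illustrated with a minimal sketch. It assumes a maximum-likelihood estimate without smoothing and uses a toy corpus invented for this example; the function names are hypothetical and do not come from any of the cited works.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n=2):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def next_word_probability(counts, context, word):
    """Maximum-likelihood estimate of P(word | context); 0.0 for unseen contexts."""
    followers = counts.get(tuple(context))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())

# Toy corpus invented for illustration only.
corpus = "the pit was cut by the ditch and the ditch was cut by the wall".split()
model = train_ngram(corpus, n=2)

# "the" is followed by pit, ditch, ditch and wall, so P(ditch | the) = 2/4.
print(next_word_probability(model, ["the"], "ditch"))
```

With n=2 the context is a single word (a bigram model); a larger n conditions on more of the preceding words but needs far more text to estimate reliably.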
Some of the most relevant successes of purely statistical NLP are in the subfield
of automatic translation (Hirschberg & Manning, 2015; Manning & Schutze, 1999),
whereas it presents modest results in other specific NLP problems and tasks. Seeking
to improve NLP in those tasks where the symbolic, rule-based and statistical
approaches did not offer good results, neural network models began to be applied,
notably LSTM recurrent neural networks, introduced in 1997 (Khurana et al., 2017), which
would constitute the beginning of the machine learning approaches (as opposed
to the rule-based approaches).
Currently, advances in representation learning and deep machine learning methods are commonly used in natural language processing, with results showing that
such methods can achieve state-of-the-art results in many NLP tasks.
Thus, today we can find supervised and unsupervised machine learning
approaches to NLP. The supervised approach is based on a training dataset, generally
annotated by human experts, which the machine uses to learn
generalizations and data patterns and then to recognize them in a larger textual corpus.
Supervised methods require the annotation of a small corpus of training documents,
a time-consuming task, but one that is less labor-intensive than the creation of
hand-crafted rule-based systems. The unsupervised approach refers to machine learning
methods without any human intervention in the learning process. Here,
probabilistic clustering techniques are employed to create the training
dataset (without human annotations) and to obtain an output result, which is
subsequently applied to a larger corpus.
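As a rough illustration of the supervised setting, the sketch below trains a small Naive Bayes classifier on four hand-labelled sentences and then applies it to an unseen one. Naive Bayes is chosen here only because it is compact; the chapter does not prescribe it. The labels "finding" and "method", the sentences and the function names are all invented for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_sentences):
    """Learn per-label word counts from (sentence, label) pairs annotated by a human."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for sentence, label in labelled_sentences:
        tokens = sentence.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def classify(sentence, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in sentence.lower().split():
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated training corpus (labels invented for illustration).
training = [
    ("sherds of pottery were recovered from the ditch", "finding"),
    ("a bronze fibula was found in the pit", "finding"),
    ("the trench was excavated by hand", "method"),
    ("the area was surveyed with a magnetometer", "method"),
]
model = train_naive_bayes(training)
print(classify("pottery was found in the pit", model))
```

The human effort lies entirely in producing the labelled pairs; the generalization to new sentences is learned from the counts rather than written by hand.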
This dichotomy between rule-based and machine learning approaches continues
today, with different theoretical positions among NLP researchers (Hirschberg &
Manning, 2015; Indurkhya & Damerau, 2010). The advantages of the machine
learning approach include a greater generalization of the solutions achieved and the use
of the volume of textual data currently available, with adequate precision.
The advantages of the rule-based approach include high customization
of the solution and higher accuracy rates in some NLP tasks, as well as
no need for previously annotated corpora or heavy computational processing. In
recent years, approaches based on green computing, concerned about the impact
of computing and its practices on the environment, have warned of the huge
amount of energy and resources of all kinds used in training machine learning
systems for any task. In this sense, for tasks in which the rule-based approach achieves
precision similar to that of learning approaches, it can constitute a sustainable alternative for many
NLP applications (Strubell et al., 2019). Current research trends include hybrid,
sustainable NLP methods that make efficient use of resources, running training
phases only when necessary and weighing the risks, pros and cons of each
approach. This hybrid approach mainly affects NLP applications with
large volumes of unstructured information (textual or oral) and heavy processing requirements
(as is the case for applications in the heritage, cultural and archaeological domains). Note
that the application of the different NLP approaches to textual, humanistic, cultural,
and archaeological domains will be discussed in the following sections of this chapter.
After this historical review of NLP, the next section details the language
levels and types of task that NLP currently addresses.

9.2 Natural Language: Understanding Levels and Formal
Structure for Computational Analysis
As seen in the previous historical tour, NLP has gone through different
stages of development at a technological and methodological level. Generally, the
transition from one stage to another has been marked by a technological
advance in information extraction or in information analysis methods. These
advances did not necessarily occur within the NLP community; rather, the
community adopted general approaches and evaluated them to verify that they
offered good results in the various tasks that make up NLP as a discipline.
We can therefore identify four current paradigms that coexist in the development
of NLP:
• Rule-based methods: These involve the production of sets of linguistic symbols or
sets of rules and grammars (usually handwritten), relying on the manual encoding
of linguistic (and world) knowledge. The rules allow the identification, extraction
and analysis of information from text at different linguistic levels, as well as the application
of derivation rules that expand the results (Boufaden et al., 2002; Indurkhya &
Damerau, 2010; Polanyi et al., 2004).
• Statistical methods: These involve the formalization of the NLP problem as
a statistical problem, and the subsequent application of statistical models to
confirm or reject the significance of the model’s results for the specific NLP
task (Manning & Schutze, 1999; Rosenfeld, 2000).
• Learning methods (neural): These involve the application of automatic algorithms
that learn how to solve the NLP task at hand. Both supervised and unsupervised learning methods are applied, always relying on the capability of
the computer to learn linguistic knowledge from a large volume of linguistic
information (in the form of corpora) (Khurana et al., 2017; Liu et al., 2020).
• Hybrid intelligence methods: These involve the injection of deep and structured
linguistic knowledge (defined by humans as formal knowledge models, not
just annotated texts) into learning models (in machine learning or deep learning approaches) to develop hybrid approaches for NLP tasks. In these hybrid
approaches, abstract and structured knowledge from specialists can be used
not just as training data to learn uninterpretable black-box models, but also to
design the models themselves, making them more transparent, easier for humans
to interpret, and more efficient for specific purposes (Gamallo et al., 2020).
This paradigm also shares some foundations with cognitive
science (Mishra & Bhattacharyya, 2018; Sharp & Delmonte, 2015), in which
cognitive information about language (studies with eye-tracking or sensors
of brain mechanisms, etc.) is also included in some computational models.
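To make the first of these paradigms concrete, the following sketch shows what a handful of hand-written rules might look like when expressed as regular expressions. The three rules, their labels and the example sentence are assumptions made for illustration; a real rule set for chronological expressions would be far larger and language-specific.

```python
import re

# Hand-written rules (hypothetical): each pairs a label with a pattern.
RULES = [
    ("CENTURY", re.compile(r"\b\d{1,2}(?:st|nd|rd|th) century (?:BC|AD)\b")),
    ("DECADE", re.compile(r"\b\d{4}s\b")),
    ("YEAR", re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")),
]

def apply_rules(text):
    """Return (label, matched text) pairs in order of appearance."""
    matches = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            matches.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(matches)]

# Example sentence invented for illustration.
text = ("The village, occupied since the 2nd century BC, was flooded "
        "in the 1960s; the report dates from 1966.")
print(apply_rules(text))
```

No training data is needed, but every new way of writing a date requires a new rule, which reflects the trade-off between customization and generalization discussed for this paradigm.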
As previously seen, each paradigm has its advantages and disadvantages, as well
as implications for technological and project-organization decisions, for the
characteristics of the results obtained and for sustainability.
It is also possible to combine paradigms or to focus on one paradigm for each
specific NLP task defined. For more details on this comparison of paradigms, see
Chowdhury (2003), Hirschberg and Manning (2015), Indurkhya and Damerau
(2010), and Khurana et al. (2017).
But the technological and methodological paradigm is not the only thing
we must select when faced with an NLP challenge. Depending on the goal and
the kind of information that we need to extract and analyze in each NLP project, we
must also decide at what natural language processing level we want to work. NLP
has been subdivided into a typology of tasks (with different levels of
complexity and abstraction in their definition), with the aim of solving NLP
problems at different linguistic levels.
Although there are other categorizations of language levels in Linguistics, it
is common to use the categorizations proposed by Liddy and Feldman (Feldman,
1999; Liddy, 1998) to structurally organize the language and later define tasks and
challenges in NLP within each level. Thus, it is possible to extract meaning from
a written text (or spoken language) at seven levels (Feldman, 1999; Liddy, 1998),
from lower to higher level of abstraction:
• Phonological level: analysis of pronunciation and prosody aspects, including
phoneme recognition or similarity tasks, etc. It is common to combine NLP
applications with voice recognition and audio speech technologies in order to
apply NLP at this level (Chaudhary et al., 2018).
• Morphological level: analysis of the smallest pieces of language that carry
meaning. This includes dealing with stems, suffixes and prefixes or lemmas
(Balakrishnan & Lloyd-Yemoh, 2014; Lovins, 1968). Determining the part
of speech (POS) of each word is also an important task at this level (Indurkhya
& Damerau, 2010).
• Lexical level: analysis of the lexical meaning of words and part-of-speech interpretations, such as determining the polarity of a given
word in sentiment analysis, determining whether a certain word is a proper name (the NER task, e.g., proper
names of persons or places) (Indurkhya & Damerau, 2010), or performing
disambiguation at the word level (Lesk, 1986). The lexical level may or may not appear
depending on the linguistic classification; other classifications swap some of its
tasks with the semantic level.
• Syntactic level: analysis of sentence structure and sentence-based roles. It implies
performing grammatical parsing and interpreting the results of that analysis
(Indurkhya & Damerau, 2010; Soricut & Marcu, 2003).
• Semantic level: analysis of words in the context of the sentence. Examples include
determining polarity in sentiment analysis in a given context or the Information
Extraction (IE) task (Andersen et al., 1992), which extracts specific predefined
information from the text, especially triples in the form of subjects, objects and
their relationships.
• Discourse level: analysis above the sentence. This allows paragraphs
or complete documents to be analyzed, trying to extract structural and semantic information
by applying discourse analysis methodologies. The relationships between sentences
allow causal or argumentative analysis (among others) at the document level. It
also includes aspects of discursive intention and coherence (Harris, 1981; Kurdi,
2017; Mann & Thompson, 1988).
• Pragmatic level: analysis of the use of linguistic structures in specific situations,
depending on the context of use. It generally requires analyses at some of the previous
levels and additional human knowledge that can sometimes be provided to the
machine (Kurdi, 2017).
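As a small illustration of the morphological level in the list above, the sketch below strips suffixes to reduce related word forms to a common stem. The suffix list is an assumption made for this example and is deliberately naive; it is not the Lovins algorithm or any other published stemmer.

```python
# Illustrative suffix list, longest first; not a published stemming rule set.
SUFFIXES = ["ations", "ation", "ating", "ated", "ings", "ing", "ed", "es", "s"]

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["excavations", "excavated", "excavating", "ditches", "pits"]
print([naive_stem(w) for w in words])
```

Grouping "excavations", "excavated" and "excavating" under one stem is what lets later levels treat them as occurrences of the same term.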

Note that the lexical and semantic levels may share or overlap some tasks in other
linguistic classifications. In the full classification above, our lexical
level corresponds to analyses more focused on the word itself as a minimum
unit, with its internal structure, its self-contained meaning and its specific lexical
typology. As an example, consider this pair of expressions:
1. “Cold pizza drives me crazy”
2. “You have to be crazy to eat one of those pieces of cold pizza”
At the lexical level, we apply strategies at the word level. For instance, in a
sentiment analysis application, we work with polarity at the word level, where
the term “crazy” could be assigned a negative polarity, although it is clearly positive in the
first context and negative in the second. The success of lexical strategies
depends on their ability to generalize and balance individual values for each unit or
word. This reasoning applies not only to sentiment analysis, but also to any algorithm
or NLP strategy that operates on the same lexical basis.
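A word-level strategy of this kind can be sketched as follows. The polarity lexicon and its single value are invented for illustration; real sentiment lexicons are much larger and their values come from annotation or corpus statistics.

```python
# Hypothetical word-level polarity lexicon (value invented for illustration).
POLARITY = {"crazy": -1.0}

def lexical_polarity(sentence):
    """Sum context-free word polarities; unknown words count as neutral."""
    return sum(POLARITY.get(token, 0.0) for token in sentence.lower().split())

s1 = "Cold pizza drives me crazy"
s2 = "You have to be crazy to eat one of those pieces of cold pizza"
print(lexical_polarity(s1), lexical_polarity(s2))
```

Both sentences receive the same score because the word "crazy" carries the same value wherever it occurs, which is exactly the limitation of working at the lexical level alone.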
Regarding the semantic level, we include at this level applications that still
take the word as a minimum unit but also analyze its context of use, its
meaning in that context and the auxiliary role the word plays in it. Looking back at
the previous sentences and the sentiment analysis example:
1. “Cold pizza drives me crazy”
2. “You have to be crazy to eat one of those pieces of cold pizza”
In this case, we would have algorithms or NLP strategies based not only on the
individual polarity of each element, but also on the role it plays in the specific
sentence (for example, if we have information about the expression “to drive someone
crazy” and its common use), and we employ this information to reason
about the polarity of sentences. As explained for the lexical level, some
authors do not make this distinction and work by default at a semantic level, subsuming
the lexical approach into the semantic level for some NLP algorithms
and strategies. The rest of the levels of the classification used here are delimited
in a more standardized way.
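The context-sensitive behaviour described for the semantic level can be sketched by letting known multi-word expressions override word-level values. The expression table, its polarity values and the function name are assumptions made for this example, following the reading of the two sentences given in the text.

```python
# Hypothetical word-level lexicon and expression table (values invented).
WORD_POLARITY = {"crazy": -1.0}
EXPRESSION_POLARITY = {("drives", "me", "crazy"): 1.0}

def contextual_polarity(sentence):
    """Score a sentence, preferring known expressions over single words."""
    tokens = sentence.lower().split()
    score, i = 0.0, 0
    while i < len(tokens):
        for expression, value in EXPRESSION_POLARITY.items():
            if tuple(tokens[i:i + len(expression)]) == expression:
                score += value          # the expression overrides its words
                i += len(expression)
                break
        else:
            score += WORD_POLARITY.get(tokens[i], 0.0)
            i += 1
    return score

print(contextual_polarity("Cold pizza drives me crazy"))
print(contextual_polarity("You have to be crazy to eat one of those pieces of cold pizza"))
```

Here the two sentences are separated: the first is scored through the expression and the second through the isolated word, so their polarities differ.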
With these fundamentals, it is possible to see the discipline of NLP as a matrix
structure in which the rows are the different language levels and their subtasks, and
the columns are the paradigms of methods and techniques in NLP (we can find
subdivisions of both rows and columns in other detailed classifications, and both
could be expanded in the future). Figure 9.1 summarizes the matrix structure
that we propose here. In Fig. 9.1, the most relevant NLP tasks are categorized by
Liddy and Feldman levels (Feldman, 1999; Liddy, 1998). Since it is beyond the scope
of this chapter to detail all possible NLP tasks, we only list the best-known
NLP tasks so that the reader has an idea of what type of linguistic information
can be extracted and what depth of analysis can be carried out at each level.
The plus icon indicates the paradigm-linguistic level combinations that occur
most commonly in the literature (where there are more developments and results
achieved). This is not to say that no work has been developed in the empty cells.
Note that the more we raise the level of linguistic abstraction, the more work on
learning and hybrid paradigms has been done, due in part to the complexity of the
tasks at the higher levels of abstraction. In the combinations of the matrix without
a symbol, the number of works or results is smaller, although it is possible to find
NLP development attempts in almost all combinations.
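For readers who want to handle this matrix programmatically, one possible representation is sketched below. The level and paradigm names follow the text; the helper functions are hypothetical, and the single marked cell is a placeholder for demonstration, not a transcription of the plus icons in Fig. 9.1.

```python
LEVELS = ["phonological", "morphological", "lexical", "syntactic",
          "semantic", "discourse", "pragmatic"]
PARADIGMS = ["rule-based", "statistical", "learning", "hybrid"]

# Every cell starts unmarked; marks must be filled in from the figure or the literature.
matrix = {level: {paradigm: False for paradigm in PARADIGMS} for level in LEVELS}

def mark(level, paradigm):
    """Flag a level/paradigm combination as commonly found in the literature."""
    matrix[level][paradigm] = True

def common_paradigms(level):
    """List the paradigms flagged for a given linguistic level."""
    return [p for p in PARADIGMS if matrix[level][p]]

# Placeholder mark for demonstration only.
mark("discourse", "learning")
print(common_paradigms("discourse"))
```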
The proposed NLP structure is not intended to establish a standard typology
that covers every one of the methods, techniques and paradigms adopted, but rather to
summarize the NLP area for the reader, who can easily extend the approach if
their needs are more specific.

9.2.1 An Overview on Natural Language Computational
Analysis in Archaeology
Not all previous language levels and NLP paradigms have been explored in the
specific domain of archaeology. In many cases, it is the archaeologists who have
decided what textual information is relevant and useful to extract
and analyze for a specific archaeological investigation, and at what level.

Fig. 9.1 NLP summary matrix structure. NLP paradigms as columns and linguistic levels (and
some NLP task examples) as rows

NLP paradigms used in archaeology have followed a historical development similar to that of the discipline of NLP itself, from initial rule-based approaches to
the gradual exploration of statistical and learning approaches. Recently, an active
research community in archaeology has explored the possibilities of the most advanced
learning methods on archaeological sources (Alex et al., 2020). Meanwhile, hybrid
intelligence approaches (including cognitive approaches to NLP and cognitive-based research questions) are still at an early stage at the NLP level and have not been
widely applied in archaeology.
It is important to highlight here that the implications of
adopting any NLP paradigm (i.e., rule-based vs. statistical or
learning methods) are also valid for the archaeological domain, or are even slightly
magnified in this field. Specifically, rule-based methods are usually employed in
small and ad hoc applications (the rules approach is difficult to generalize) and the
other approaches are chosen in larger projects with a greater volume of information
(but professionals need to be trained).
In archaeological NLP applications, for example, the rule-based
approach is mostly adopted in small applications, since the
high variability of textual sources sometimes makes the developments difficult
to generalize. Meanwhile, statistical, learning-based or hybrid approaches require
archaeologists to receive specific training in formal, statistical, and algorithmic methods (sometimes with
a significant learning curve), or the configuration of interdisciplinary teams to carry out these larger projects.
It is also necessary to consider that, in the most up-to-date approaches, there may
be other factors that influence both the choice of NLP paradigm and its results,
such as the degree of familiarity with the texts of the researchers who carry out the
study. The lack of annotated corpora for
training supervised methods is also a handicap, and an even greater one in domains such as
archaeology.
Regarding the different linguistic levels treated, the motivations and goals of the
archaeological community when working on one or more of the previous levels are
numerous and varied. We can find, for example, a whole set of applications at the
lexical level motivated by the existence and previous development in archaeology of
thesauri, domain typologies and controlled vocabularies, which lay an ideal basis
for NLP at the morphological and lexical levels (Felicetti, 2017; Vlachidis et al.,
2017). As will be seen in later chapters, NLP at these levels has yielded successful results in
some applications in archaeology.
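The way a controlled vocabulary supports lexical-level processing can be sketched with a simple term lookup. The miniature vocabulary, its concept labels and the example sentence are invented for illustration and do not come from any real archaeological thesaurus.

```python
import re

# Hypothetical mini-vocabulary mapping surface terms to a preferred concept label.
VOCABULARY = {
    "hillfort": "FORTIFIED SETTLEMENT",
    "castro": "FORTIFIED SETTLEMENT",
    "post hole": "POSTHOLE",
    "posthole": "POSTHOLE",
    "sherd": "POTTERY FRAGMENT",
}

def tag_terms(text):
    """Return (matched term, concept) pairs in order of appearance."""
    found = []
    lowered = text.lower()
    for term, concept in VOCABULARY.items():
        # An optional trailing "s" is a crude stand-in for plural handling.
        for m in re.finditer(r"\b" + re.escape(term) + r"s?\b", lowered):
            found.append((m.start(), m.group(), concept))
    return [(term, concept) for _, term, concept in sorted(found)]

print(tag_terms("Several sherds were found in a post hole inside the castro."))
```

Because variant spellings and synonyms point to the same concept label, the vocabulary does part of the normalization work that would otherwise require training data.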
At the syntactic level, applications in archaeology decrease considerably,
although the studies carried out help us to detect domain-specific challenges.
For example, some initial efforts have been made to integrate grammatical analysis
into the inference of fieldwork methodologies from grey literature (Epure et al., 2015).
Also, Jeffrey et al. (2011) and Jockers and Underwood (2015) defined some structural
challenges in NLP and text mining in the humanities, and the reports
of the ARIADNE project on NLP in archaeology already collect some challenges
in NLP for archaeology that range from the syntactic to the pragmatic level, such
as the development of multilingual grammars, the role of negation in archaeology
or some special cases of archaeological ontological ambiguity in ontology learning
processes (Vlachidis et al., 2017).
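The negation challenge mentioned above can be illustrated with a deliberately simple rule: a term counts as negated if a negation cue appears within a few tokens before it. The cue list, the window size and the example sentences are assumptions made for this sketch, which is not the method used in the cited project.

```python
import re

# Hypothetical negation cues; a real system would need a curated, language-specific list.
NEGATION_CUES = re.compile(r"\b(no|not|without|absence of|lack of)\b", re.IGNORECASE)

def is_negated(sentence, term, window=5):
    """True if a negation cue occurs within `window` tokens before `term`."""
    lowered = sentence.lower()
    position = lowered.find(term.lower())
    if position == -1:
        return False
    preceding_tokens = lowered[:position].split()[-window:]
    return bool(NEGATION_CUES.search(" ".join(preceding_tokens)))

print(is_negated("No evidence of Roman pottery was found in trench 2.", "Roman pottery"))
print(is_negated("Roman pottery was found in trench 2.", "Roman pottery"))
```

A fixed window is crude: a cue that belongs to a different clause can still fall inside it, which is one reason negation is listed as an open challenge rather than a solved task.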
The levels that have been developed in the archaeological domain will be treated
in specific chapters of this volume. Thus, the rest of this chapter is dedicated to the
level of discourse, with chapters for the lexical (Chap. 9), syntactic and semantic
(Chap. 10) and pragmatic (Chaps. 11 and 12) levels.
The previous historical and typological tour has detailed what kinds of problems
NLP can solve as a discipline, what subtasks are addressed, and what approaches
we can use. The rest of the chapter uses those foundations to focus on one of the
levels of highest abstraction and interest in the archaeological domain: the discourse
level. The following sections detail the work done so far on automatic and semi-automatic
discourse processing, why it is interesting and necessary in archaeology,
and how it has been applied so far to archaeological contexts.

9.3 Where Archaeological Discourse and Computers Meet
Among many other functions, language acts as a bridge: it connects people,
transmitting the most unstructured, complex or instinctive ideas from underlying
models. Furthermore, it serves as a pathway between people and machines (e.g.,
programming languages or semantic models). Natural language processing and text
mining techniques have improved this human-machine relationship, both in analysis
(with automatic parsers, language recognizers or prediction models) and in natural
language production (in the form of chatbots or data-to-text applications).
However, many of these advanced models reproduce textual syntactic and/or
semantic patterns in an algorithmic way, ignoring information from higher levels
of abstraction such as speaker intention, composition and discursive coherence
(Martín-Rodilla, 2015).
In recent years, discourse analysis has been integrated into the human-machine
relationship, allowing the automation of some discursive extraction and
software assistance in the identification of discursive elements, referenced
ontological entities and inferential relationships. Inherently narrative domains, such
as archaeology, need software support at the discursive level, mainly due to intrinsic
characteristics such as the subjectivity or ambiguity present in the domain (González-Pérez, 2018). Let us take an archaeological case study as motivation: why are the
discursive level and its computational treatment interesting for archaeology?
Let us imagine that we want to carry out an archaeological project that aims to
investigate and recover archaeological heritage at sites in ancient rural areas that
were flooded for dam construction. Numerous studies have addressed the
number of villages flooded due to the construction of dams during the 1960s and
1970s in the USA, Spain (during Franco’s dictatorship) (del Romero Renau, 2013) or
Portugal (Arcà et al., 2001), and the resulting underwater heritage.
Given the temporal proximity of the context, we will have abundant grey
literature (official reports, local news in newspapers, publications about these
archaeological sites, etc.). All this textual material contains narratives produced over
time about the material evidence that we can find (What are the different
chronologies involved?), the motivations and opinions about their flooding, conservation, or destruction (Should we conserve what was flooded?), and the implications
it had for the local population (How do they perceive that heritage now?) (Fig. 9.2).
In order to answer these research questions, the automatic or semi-automatic
analysis of the textual sources allows us a systematic treatment of the narratives
produced about this archaeological heritage. Using natural language processing
(NLP), we can, for example, (1) extract from the reports and publications the
types of entities and archaeological findings that have been found in previous
investigations, (2) establish metadata for all the existing documentation and that

204

P. Martín-Rodilla

Fig. 9.2 Example of an archaeological project that illustrates the different levels of NLP information of interest and the absence of discourse-level treatment of the textual sources

allow its optimal computational search, or (3) apply sentiment analysis to the
newspapers to track the opinions of the local press over time. It is
possible to perform all these analyses with NLP, working at the lexical, grammatical,
or semantic level. However, let us look again at the research questions we have
proposed: What are the different chronologies involved? Should we conserve what
was flooded? How do local people perceive that heritage now?
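To make the lexical and semantic levels concrete, the following Python sketch counts entity mentions and computes a crude opinion score for one sentence. It is only a toy illustration under stated assumptions: the term lists, the sample sentence and the function names are hypothetical, and a real project would rely on domain thesauri and trained models rather than hand-written word lists.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer of archaeological entity types (assumption).
ENTITY_TERMS = {
    "structure": {"church", "bridge", "mill"},
    "find": {"pottery", "coin", "tool"},
}
# Hypothetical opinion lexicon for a crude sentiment score (assumption).
POSITIVE = {"valuable", "proud", "preserve"}
NEGATIVE = {"lost", "destroyed", "forgotten"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_entities(text):
    """Count gazetteer hits per entity type (lexical-level extraction)."""
    counts = Counter()
    for token in tokenize(text):
        for entity_type, terms in ENTITY_TERMS.items():
            if token in terms:
                counts[entity_type] += 1
    return dict(counts)


def sentiment_score(text):
    """Return positive minus negative lexicon hits; 0 means neutral or unknown."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


# Invented sample sentence, used only to exercise the functions.
sample = "The church and the mill were lost, but the pottery is valuable."
print(extract_entities(sample))
print(sentiment_score(sample))
```

Neither output says why the heritage is judged lost or valuable: the counts and the score describe what is mentioned and how it is coloured, not the reasoning behind it.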
As we can see in Fig. 9.2, working at these linguistic levels gives us only a
partial answer to the first question (that is, the types of archaeological entities
that previous scientific reports or publications deal with) and some sentiment
analysis results on opinions, but it does not answer the rest of the questions.
Based on what evidence did the archaeologists reach these conclusions regarding
the chronologies involved? What discourses support the conservation or destruction
of the flooded heritage? What reasons do the local population use to perceive this
heritage as positive or negative?
All these sub-questions belong to a deeper level of analysis. It is the discourse-level
treatment of the reports, publications, or news about these sites that
will answer the research questions raised.
Of course, it is possible to carry out all this work without software assistance
(not automating any aspect of the analysis process) but, as is evident, the increase
in textual sources of information and in formal methods to perform discourse

9 Computer Processing of Language: Where Archaeological Discourse. . .

analysis (connected with the rest of the NLP levels) facilitates the treatment of these
sources and the systematization of the work at a methodological level, and improves the
replicability and consistency of the study. We will therefore be able to find out what
material evidence supports the chronology of the flooded fields, the different reasons
why that village was chosen for flooding, the different reasons why the local population
holds positive or negative opinions about the flooded heritage, etc.
As the example above illustrates, there is a
connection inside the archaeological textual sources (grey literature, publications,
dissemination materials, etc.) between the discourse employed by the archaeologists
to explain their findings and reasoning and the archaeological knowledge
that is produced inside the text. The vast amount of textual material on
archaeological investigations currently available makes the computational approach
at all levels (storage and access, extraction, treatment, and analysis) a real necessity.
At the discourse level, the computational approach allows us to cover many textual
sources with the same formal and theoretical representation, comparing different
approaches and applying metrics that support deeper analyses of the
discursive aspects that interest us. In the previous case study, for example,
it allows us to analyze causality across all sources, looking for the different
reasons why that village was chosen for flooding, or to carry out a contrastive
discourse analysis of the opinions expressed.
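As a rough illustration of the causality analysis just mentioned, the sketch below pulls cause and effect spans out of sentences using two English cue phrases. It is a deliberately naive example: the cue list, the `find_causal_relations` name and the test sentences are hypothetical, and the discourse parsers reviewed in this chapter do considerably more than pattern matching.

```python
import re

# Hypothetical cue phrases; each pattern names which side is the cause.
CAUSAL_PATTERNS = [
    # "<effect> because <cause>"
    (re.compile(r"^(?P<effect>.+?)\s+because\s+(?P<cause>.+?)\.?$", re.I), "because"),
    # "<cause>, therefore <effect>"
    (re.compile(r"^(?P<cause>.+?),?\s+therefore\s+(?P<effect>.+?)\.?$", re.I), "therefore"),
]


def find_causal_relations(sentences):
    """Return one dict per sentence that matches a causal cue pattern."""
    relations = []
    for sentence in sentences:
        for pattern, cue in CAUSAL_PATTERNS:
            match = pattern.match(sentence.strip())
            if match:
                relations.append({
                    "cue": cue,
                    "cause": match.group("cause").strip(),
                    "effect": match.group("effect").strip(),
                })
                break  # keep only the first matching cue per sentence
    return relations


# Invented sentences in the spirit of the flooded-village case study.
relations = find_causal_relations([
    "The village was flooded because the dam needed a larger reservoir.",
    "The valley was sparsely populated, therefore the site was chosen.",
    "The church is still visible in summer.",
])
for relation in relations:
    print(relation)
```

Applied to many sources at once, even such a simple extractor would let us tabulate the reasons given for a flooding decision, although implicit causality (with no cue phrase) would be missed entirely.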
The development of formal methods of representation, automatic treatment and
specific NLP algorithms at the discursive level opens the door to a treatment of
archaeological textual sources that is much richer and better integrated into the research
questions of the archaeological domain. Despite these advantages, the use of semi-automatic or automatic treatment of textual sources in archaeology for analysis at the
discursive level is still residual. The next section reviews the computational
analysis of discourse and then details the works that have used it in
archaeological contexts, where further possible applications can be seen.

9.3.1 Computational Analysis of Discourse
The meaning and referents of the term “discourse” have changed over time,
and its overlaps with terms such as speech, text or context have been studied (Gordon, 2009;
Kurdi, 2017). Currently, we can find at least two interdisciplinary approaches
that deal with the concept of discourse, considering the linguistic phenomenon
and its content in different ways. First, the tradition of linguistic,
semantic and communicative studies allows us to define discourse as the underlying
conceptualization in a communicative act (spoken or written), which has domain-specific vocabularies and structural elements. Second, discourse can be defined
as a linguistic construct made up of statements, allowing the discourse creator to
assign “meaning to words and to communicate repeatable semantic relations to,
between, and among the statements, objects, or subjects of the discourse” (Foucault
& Kremer-Marietti, 1969).

We can therefore identify in these two approaches the objects of a discourse
analysis (writing, conversation, any communicative event), a discipline that allows
subdividing the sub-elements of discourse (at different levels) and formalizing them,
either only attending to linguistic criteria, or expanding the concept and conducting
a discourse analysis that includes socio-psychological aspects of the authors.
Thus, discourse analysis constitutes a methodology of textual analysis (or oral
discourse) based on the subdivision into lower-level elements (i.e., sentences), and
their characterization, to analyze the meaning and internal connections between
the elements (Harris, 1981). It is possible to carry out semi-automatic and
automatic NLP work under both approaches to discourse analysis as a methodology,
although with differences in levels of complexity and applications.
In this chapter we review existing computational approaches to discourse
analysis, focusing on the linguistic approach and including some extensions covering intention, coherence, and other formal metrics that already work at the discourse level
(Kurdi, 2017). The review thus excludes conceptualizations of discourse
built around ideas, beliefs or relations between knowledge and power (Foucault &
Kremer-Marietti, 1969; Lessa, 2006) because of the reduced number of computational implementations of them. However, computational applications of discourse analysis
are expanding steadily and influencing areas such
as argument mining. More information about sociological and philosophical aspects
of discourse and some computational approximations can be found in other chapters
of this volume (e.g., Chaps. 1 and 3).
At a computational level, discourse structure has received increasing
attention in recent years due to the benefits it offers in some NLP tasks,
such as automatic summarization or question answering (Atutxa et al., 2019). Some
discursive phenomena, such as topic modelling or anaphora, are more advanced in terms
of computational approaches (Kurdi, 2017; Wiseman et al., 2016). However, the
most important issue in any computational approach to
textual sources is the formalization of the discourse itself. Formalization allows
the subsequent recognition, extraction, and application of computational methods
in textual analysis. The first formalizations at the level of discourse organized
discourse around units called utterances. These minimal units of discourse were connected to each other both logically and topologically (Kurdi, 2017). The subsequent
development of this seminal idea gave rise to formalizations of discursive structures
that allow computational advances in discourse analysis. Hobbs's formalization
constitutes the first formal discourse theory that uses a tree as the underlying text structure
(Hobbs, 1985), and it is possible to find current applications based on Hobbs's
approach (Dutta et al., 2008). Later, with the formulation of Rhetorical
Structure Theory (hereafter RST), a theory of discourse structure formalized
by Mann and Thompson (1988), computational approaches at the discourse
level became widespread. RST internally represents the structure of any discourse
as a tree of discourse units, which are related to each other by rhetorical criteria
(analyzing content at a functional and semantic level within the discourse elements
and their relationships) (Marcu, 2000). Given a textual source, we can therefore
obtain the underlying discourse tree with its discourse elements, whether they are central
(nuclei) or peripheral (satellites) within the discourse, and whether they are related
at a causal, contrastive, elaborative level, etc. (Taboada & Mann, 2006).
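The nucleus and satellite vocabulary can be pictured with a small data structure. The Python sketch below is a simplified, hypothetical representation (the `DiscourseNode` class, its field names and the sample sentence are our own assumptions, not part of RST or of any cited parser); it only shows how a tree of discourse units with roles and relation labels might be stored and queried.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """A node of a simplified RST-style tree (illustrative, not a full RST model)."""
    role: str                        # "root", "nucleus" or "satellite"
    relation: Optional[str] = None   # relation a satellite holds towards its nucleus
    text: Optional[str] = None       # only leaves (elementary units) carry text
    children: List["DiscourseNode"] = field(default_factory=list)

    def leaves(self):
        """Yield (role, relation, text) for every elementary unit, left to right."""
        if not self.children:
            yield (self.role, self.relation, self.text)
        for child in self.children:
            yield from child.leaves()


# Invented two-unit example: a central claim and the reason given for it.
tree = DiscourseNode(role="root", children=[
    DiscourseNode(role="nucleus", text="The village was flooded"),
    DiscourseNode(role="satellite", relation="cause",
                  text="because the dam required a larger reservoir"),
])

# Collect the peripheral units, e.g. to list every stated reason in a report.
satellites = [text for role, _, text in tree.leaves() if role == "satellite"]
print(satellites)
```

Filtering leaves by role or by relation label is the kind of query that makes such trees useful for questions about evidence and reasons.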
RST is taken as the base formalization by almost all computational approaches
that currently work at the discursive linguistic level, although some approaches
carry out RST adaptations or use different computational techniques for their parsers
(Kurdi, 2017). Marcu (2000) defined the first method to follow for the construction
of discourse parsers. This method involves a first segmentation of the discourse and
a subsequent construction of the RST discourse tree. Current parsers follow
this methodology, although they carry out the construction of the tree employing
different methods: either through rule-based NLP (Boufaden et al., 2002; Polanyi
et al., 2004; Soricut & Marcu, 2003), or using statistical or machine learning
approaches (Heilman & Sagae, 2015; Joty et al., 2013; Li et al., 2014) and exploring
different source languages (Liu et al., 2020).
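The two-stage shape of such a parser (segment first, then build the tree) can be sketched in a few lines of Python. This is not Marcu's algorithm nor any of the cited parsers: the marker list, the function names and the fixed left-to-right attachment rule are simplifying assumptions chosen to keep the example short.

```python
import re

# Hypothetical discourse markers mapped to a relation label (assumption).
MARKERS = {"because": "cause", "but": "contrast", "although": "concession"}


def segment(text):
    """Step 1: split text into elementary units before each known marker."""
    pattern = r"\s*,?\s+(?=(?:%s)\b)" % "|".join(MARKERS)
    return [unit.strip(" .") for unit in re.split(pattern, text) if unit.strip(" .")]


def build_tree(units):
    """Step 2: attach each marker-initial unit as a satellite of what precedes it.

    Returns nested tuples (relation, nucleus, satellite); a real parser would
    score many candidate trees instead of using this fixed left-to-right rule.
    """
    tree = units[0]
    for unit in units[1:]:
        marker = unit.split()[0].lower()
        tree = (MARKERS.get(marker, "elaboration"), tree, unit)
    return tree


# Invented sentence used only to exercise both steps.
units = segment("The site was excavated because the reservoir level fell, but funding ended.")
print(units)
print(build_tree(units))
```

Rule-based parsers refine both steps with syntactic and lexical cues, while statistical and machine learning parsers replace the hand-written rules with models trained on annotated data.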
Most of the existing parsers are high-quality developments in terms of internal
formalization, offering real automation results from textual sources. However, because they were
developed within research projects without continued funding,
many of them are hard to access or have steep learning curves (and
are therefore also difficult to modify and improve through further research). This
could be one of the reasons for the limited awareness of this type of development
outside the academic environment or outside the field of computational linguistics.
In order to promote research and application of natural language processing at
the discourse level, numerous computational resources have been developed in recent
years, especially corpora annotated using RST. These annotated corpora act
as a gold standard for evaluating algorithms, methods or novel techniques,
as well as training data when new algorithms based on
supervised methods are developed. We can find RST-based treebanks and annotated corpora in
different languages (Cao, 2018; Cao et al., 2018; Das & Stede, 2018; Iruskieta et
al., 2013; Mann & Taboada, 2005–2021).
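To show what using an annotated corpus as a gold standard can mean in practice, the sketch below compares the segment boundaries proposed by a parser with those marked by annotators, using precision, recall and F1. The boundary offsets are invented and the `boundary_scores` function is a hypothetical helper; published evaluations of RST parsers use more elaborate measures that also score the tree structure and relation labels.

```python
def boundary_scores(gold, predicted):
    """Precision, recall and F1 over sets of segment-boundary token offsets."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical boundaries (token offsets) for one document.
gold_boundaries = [4, 9, 15, 22]      # from the annotated corpus
predicted_boundaries = [4, 9, 17]     # from the parser under evaluation

precision, recall, f1 = boundary_scores(gold_boundaries, predicted_boundaries)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here two of the three predicted boundaries agree with the annotators, and two of the four annotated boundaries are missed.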
Using these parsers as an extraction basis, applications of automatic or semi-automatic analysis of discourse have been carried out in diverse domains (in addition
to the applications that improve NLP tasks), such as medicine (Paulino et al.,
2018), media (in fake news detection or hate speech recognition, among others)
(Fortuna & Nunes, 2018; Karimi & Tang, 2019; Kolhatkar & Taboada, 2017) or the
legal domain (Gamallo et al., 2019; Kurdi, 2017; Moens et al., 2007).
Another area of complementary development has been that of software strategies for displaying information from discourse analysis, along two lines: (1) the
discourse analysis carried out manually by experts and their collective annotation
(an extensive review of applications is presented in Martin-Rodilla and Sánchez,
2020) and (2) information visualization techniques for visualizing the results of
the parsers (Martin-Rodilla & Sánchez, 2020; Zhao et al., 2012).
Apart from RST, in recent years some structural theories of discourse have been
developed using graphs as the underlying formalization (Radev, 2000; Webber,
2004; Webber & Joshi, 2012). A complete review of the current state of RST
applications and a comparison with existing graph-based approaches to discourse
automation can be found in Hou et al. (2020).

In summary, the computational approaches that allow an automatic analysis of
discursive structures from textual sources have changed greatly in recent
years. Theories such as RST or the novel approaches based on graphs offer us the
necessary formalization to create parsers that automate discursive analysis from
textual sources. The existence of complementary tools at the software level, such as
editors for expert annotation of texts, treebanks as large collections of analyzed texts,
and advances in the visualization of results, allows the application of discursive
analysis at the linguistic level to a wide variety of domains.
However, the limited accessibility of the tools, the steep learning curve of the
methods and tools, and the need for interdisciplinary teams in this type of analysis
constitute barriers to its wider application. The following section
briefly details the efforts made in this regard in the archaeological context.

9.3.2 Applications in Archaeological Discourse
Although the study of archaeological discourse from a methodological perspective
by researchers is a constant in archaeological research, the application of
computational methods that allow its formalization, assistance via software or even
automation is still residual. At the topic modeling level, there are initial applications
of existing algorithms (Borgo Ton, 2019), but without formalizing aspects of
discourse from the base textual sources.
At a formal level (which, as previously seen, is necessary to advance in
the application of automatic parsing and deep computational analysis), the first
formalizations of archaeological discourse can be found in the works of Gardin
(1980), with ramifications in current applications (Dallas, 2016; Moscati, 2016).
These works are the basis for the application of computational discourse analysis in
archaeology, although they constitute theoretical frameworks not fully implemented
in computational algorithms.
Some more advanced efforts to recover the linguistic aspect
of archaeological discourse with formalizations based on Hobbs or RST are in a similar state (Martín-Rodilla, 2015, 2018; Martín-Rodilla & González-Pérez, 2014). In addition, empirical
studies with professionals in archaeology and related areas of knowledge show
greater satisfaction when these methods take the form of software
assistance or semi-automation (Martin-Rodilla, 2018; Rodilla & González-Pérez,
2017), concluding that some human know-how and interdisciplinary teams are
necessary to undertake more ambitious efforts in computational discourse analysis
in archaeology (González-Pérez, 2021).
In summary, the archaeological domain has not yet fully tested the advances
detailed above in automatic and semi-automatic discourse analysis, either because
they have not left the academic-linguistic environment or because of an excessive fear of automation among archaeologists, although there are some promising formalizations
and studies in this area that would allow, in the future, the advances
detailed in this chapter to be combined with real applications in archaeological contexts.

9.4 Computer Processing of Language and Discourse in Archaeology: Current and Future Perspectives
In summary, this chapter consists of two separate thematic parts. The first makes an extensive
tour through the history of natural language processing as a discipline, its
paradigms (the most common methods and tasks) and the linguistic levels at which
NLP can work. In addition, the current NLP situation in archaeology is briefly
contextualized (a more in-depth study is addressed in subsequent chapters).
At the NLP paradigm level, rule-based approaches are well established, and
learning-based approaches are currently expanding. Future steps possibly include
more attempts at hybrid intelligence approaches in archaeology. Some recent projects
focus more on cognitive aspects of the discipline, such as XSCAPE, awarded
in the ERC Synergy Grant 2020 call (Incipit-CSIC, 2020), which investigates
whether the material structures of archaeological settlements, buildings,
roads, and artefacts actively change brain and mind patterns of thought and attention.
This could also have ramifications for the language and discourse produced
by archaeologists in their textual narratives, and it gives us an idea of the future
importance of cognitive studies in archaeology. This trend in cognitive studies will
probably also advance the application of hybrid NLP and cognitive
approaches to textual sources in archaeology, but this is only a
prediction that we must wait to see fulfilled.
In the second part, the linguistic level of discourse and its computational
possibilities in archaeology are specifically addressed. As has been seen, the
discursive level is fundamental at a theoretical level (since it deals
with vital aspects of archaeological discourse such as causality, negation, subjectivity,
or ambiguity, among others). In recent decades, there have been important advances
in computational formalization and NLP applications at the discourse level.
However, software assistance and NLP approaches at the discourse level are still
residual in archaeology. Current and future advances in representation, extraction
and analysis techniques at the discursive level will allow a generalization of the
approach and its possible systematic application to textual sources in archaeology.
Acknowledgments This research has received financial support from the Saving European
Archaeology from the Digital Dark Age (SEADDA) 2019–2023 COST Action CA18128, and from the
Xunta de Galicia (“Consellería de Cultura, Educación e Universidade”) and the ERDF (“Centro
Singular de Investigación de Galicia” accreditation ED431G 2019/01).

References
Alex, B., Kramer, I. C., Verschoof-van der Vaart, W. B., Orengo, H. A., Garcia-Molsosa,
A., & Conesa, F. C. (2020). Machine learning in archaeological research; challenges and
opportunities. Session 5 at 48th computer applications and quantitative methods in archaeology
(CAA) conference, Oxford, UK.

Andersen, P. M., Hayes, P. J., Weinstein, S. P., Huettner, A. K., Schmandt, L. M., & Nirenburg,
I. (1992). Automatic extraction of facts from press releases to generate news stories. In Third
conference on applied natural language processing, pp. 170–177.
Arcà, A., Bednarik, R. G., Fossati, A., Jaffe, L., & Abreu, M. S. (2001). Damned dams again: The
plight of Portuguese rock art. Rock Art Research, 18, i–iv.
Atutxa, A., Bengoetxea, K., de Ilarraza, A. D., & Iruskieta, M. (2019). Towards a top-down
approach for an automatic discourse analysis for Basque: Segmentation and Central Unit
detection tool. PLOS ONE, 14(9), e0221639.
Balakrishnan, V., & Lloyd-Yemoh, E. (2014). Stemming and lemmatization: A comparison of
retrieval performances. Lecture Notes on Software Engineering, 2, 262–267.
Borgo Ton, M. (2019). Magic lantern shows through a macroscopic lens: Topic modelling and
mapping as methods for media archaeology. Early Popular Visual Culture, 17(3–4), 341–360.
Boufaden, N., Lapalme, G., & Bengio, Y. (2002). Segmentation en thèmes de conversations
téléphoniques: traitement en amont pour l’extraction d’information. In Actes de la 9ème
conférence sur le Traitement Automatique des Langues Naturelles (TALN) 2002.
Cao, S. (2018). Elaboration of a RST Chinese treebank. http://hdl.handle.net/10810/26206
Cao, S., da Cunha, I., & Iruskieta, M. (2018). The RST Spanish-Chinese treebank. In Proceedings
of the joint workshop on linguistic annotation, multiword expressions and constructions (LAW-MWE-CxG-2018), pp. 156–166.
Chaudhary, A., Zhou, C., Levin, L., Neubig, G., Mortensen, D. R., & Carbonell, J. G. (2018).
Adapting word embeddings to new languages with morphological and phonological subword
representations. arXiv preprint arXiv:1808.09500.
Chomsky, N. (2002). Syntactic structures. Walter de Gruyter.
Chowdhury, G. G. (2003). Natural language processing. Annual Review of Information Science
and Technology, 37(1), 51–89.
Dallas, C. (2016). Jean-Claude Gardin on archaeological data, representation and knowledge:
implications for digital archaeology. Journal of Archaeological Method and Theory, 23(1),
305–330.
Das, D., & Stede, M. (2018). Developing the Bangla RST discourse treebank. In Proceedings of
the eleventh international conference on language resources and evaluation (LREC 2018).
del Romero Renau, L. (2013). La construcción de sociedades hidráulicas: El caso de España y del
Oeste de EE. UU. Cuadernos de geografía, 93, 53–77.
Dutta, K., Prakash, N., & Kaushik, S. (2008). Resolving pronominal anaphora in Hindi using Hobbs
algorithm. Web Journal of Formal Computation and Cognitive Linguistics, 1(10), 5607–5611.
Epure, E. V., Martín-Rodilla, P., Hug, C., Deneckère, R., & Salinesi, C. (2015). Automatic process
model discovery from textual methodologies. In 2015 IEEE 9th international conference on
research challenges in information science (RCIS), pp. 19–30.
Feldman, S. (1999). NLP meets the Jabberwocky: Natural language processing in information
retrieval. Online-Weston Then Wilton, 23, 62–73.
Felicetti, A. (2017). Teaching archaeology to machines: Extracting semantic knowledge from free
text excavation reports. Digital Humanities, p. 9.
Fortuna, P., & Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM
Computing Surveys (CSUR), 51(4), 1–30.
Foucault, M., & Kremer-Marietti, A. (1969). L’archéologie du savoir (Vol. 1). Gallimard.
Gamallo, P., Martín-Rodilla, P., & Calderón, B. (2019). Identifying causal relations in legal
documents with dependency syntactic analysis. In 8th symposium on languages, applications
and technologies (SLATE 2019).
Gamallo, P., García, M., Martin-Rodilla, P., & Pereira-Fariña, M. (2020). Workshop on hybrid
intelligence for natural language processing tasks (co-located at ECAI-2020). March 2021.
Available at https://hi4nlp.pages.citius.usc.es/
Gardin, J. C. (1980). Archaeological constructs: an aspect of theoretical archaeology. Cambridge
University Press.
González-Pérez, C. (2018). Information modelling for archaeology and anthropology. Software
engineering principles for cultural heritage. Springer.

González-Pérez, C. (2021). Heritage 3.0 project: Argumentation and conceptual modelling
for enhanced cultural heritage participation and management policies. Grant PID2020-114758RB-I00. Funder: Spanish National Agency for Research Funding
(Agencia Estatal de Investigación). Available at http://www.incipit.csic.es/en/project/acme
Gordon, C. (2009). Making meanings, creating family: Intertextuality and framing in family
interaction. OUP.
Harris, Z. S. (1981). Discourse analysis. In Papers on syntax (pp. 107–142). Springer.
Heilman, M., & Sagae, K. (2015). Fast rhetorical structure theory discourse parsing. arXiv preprint
arXiv:1505.02425.
Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science,
349(6245), 261–266.
Hobbs, J. R. (1985). On the coherence and structure of discourse. CSLI Publications.
Hou, S., Zhang, S., & Fei, C. (2020). Rhetorical structure theory: A comprehensive review of
theory, parsing methods and applications. Expert Systems with Applications, 157, 113421.
Huggett, J. (2004). Archaeology and the new technological fetishism. Archeologia e Calcolatori,
15, 81–92.
Incipit-CSIC. (2020). XSCAPE Material Minds Project (ERC-2020-SyG 951631 – XSCAPE).
08/03/2021; Available at http://www.incipit.csic.es/en/project/xscape
Indurkhya, N., & Damerau, F. J. (2010). Handbook of natural language processing (Vol. 2). CRC
Press.
Iruskieta, M., Aranzabe, M. J., Diaz de Ilarraza, A., Gonzalez, I., Lersundi, M., & Lopez de Lacalle,
O. (2013). The RST Basque TreeBank: an online search interface to check rhetorical relations.
In 4th workshop RST and discourse studies 2013, pp. 40–49.
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S., Zhang, Z., & Austin, A. (2011).
When ontology and reality collide: The Archaeotools project, faceted classification and natural
language processing in an archaeological context. In Proceedings of the 36th international
conference, Budapest, 2–6 April 2008, pp. 285–290.
Jockers, M. L., & Underwood, T. (2015). Text-mining the humanities. In A new companion to
digital humanities (pp. 291–306). Wiley.
Joty, S., Carenini, G., Ng, R., & Mehdad, Y. (2013). Combining intra-and multi-sentential
rhetorical parsing for document-level discourse analysis. In Proceedings of the 51st annual
meeting of the Association for Computational Linguistics (Volume 1: Long papers), pp. 486–
496.
Karimi, H., & Tang, J. (2019). Learning hierarchical discourse-level structure for fake news
detection. arXiv preprint arXiv:1903.07389.
Khurana, D., Koli, A., Khatter, K., & Singh, S. (2017). Natural language processing: State of the
art, current trends and challenges. arXiv preprint arXiv:1708.05148.
Kolhatkar, V., & Taboada, M. (2017). Constructive language in news comments. In Proceedings of
the first workshop on abusive language online, pp. 11–17.
Kurdi, M. Z. (2017). Natural language processing and computational linguistics 2: semantics,
discourse and applications (Vol. 2). Wiley.
Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell
a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference
on systems documentation (pp. 24–26). Association for Computing Machinery.
Lessa, I. (2006). Discursive struggles within social welfare: Restaging teen motherhood. British
Journal of Social Work, 36(2), 283–298.
Li, J., Li, R., & Hovy, E. (2014). Recursive deep models for discourse parsing. In Proceedings
of the 2014 conference on empirical methods in natural language processing (EMNLP), pp.
2061–2069.
Liddy, E. D. (1998). Enhanced text retrieval using natural language processing. Bulletin of the
American Society for Information Science and Technology, 24(4), 14–16.
Liu, Z., Shi, K., & Chen, N. F. (2020). Multilingual neural RST discourse parsing. arXiv preprint
arXiv:2012.01704.

Lovins, J. B. (1968). Development of a stemming algorithm. Mechanical Translation and
Computational Linguistics, 11(1–2), 22–31.
Mann, W. C., & Taboada, M. (2005–2021). RST tools for analysts. [12/03/2021]; Available at
https://www.sfu.ca/rst/06tools/index.html
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory
of text organization. Text, 8(3), 243–281.
Manning, C., & Schutze, H. (1999). Foundations of statistical natural language processing. MIT
Press.
Marcu, D. (2000). The theory and practice of discourse parsing and summarization. MIT Press.
Martín-Rodilla, P. (2015). An empirical approach to the analysis of archaeological discourse.
In Across Space and Time. Papers from the 41st Conference on Computer Applications and
Quantitative Methods in Archaeology, Perth 25–28, March 2013 (vol. 319).
https://doi.org/10.5117/9789089647153
Martin-Rodilla, P. (2018). Digging into software knowledge generation in cultural heritage.
Springer.
Martín-Rodilla, P., & Gonzalez-Perez, C. (2014). An ISO/IEC 24744-derived modelling language
for discourse analysis. In 2014 IEEE eighth international conference on research challenges in
information science (RCIS), pp. 1–10.
Martin-Rodilla, P., & Sánchez, M. (2020). Software support for discourse-based textual information analysis: A systematic literature review and software guidelines in practice. Information,
11(5), 256.
Mishra, A., & Bhattacharyya, P. (2018). Cognitively inspired natural language processing: An
investigation based on eye-tracking. Springer.
Moens, M.-F., Boiy, E., Palau, R. M., & Reed, C. (2007). Automatic detection of arguments in
legal texts. In Proceedings of the 11th international conference on artificial intelligence and
law (pp. 225–230). Association for Computing Machinery.
Moscati, P. (2016). Jean-Claude Gardin and the evolution of archaeological computing. Les
nouvelles de l’archéologie, 144, 10–13.
Paulino, A., Sierra, G., Hernández-Domínguez, L., da Cunha, I., & Bel-Enguix, G. (2018).
Rhetorical relations in the speech of Alzheimer’s patients and healthy elderly subjects: An
approach from the RST. Computación y Sistemas, 22(3), 895–905.
Pierce, J. R., & Carroll, J. B. (1966). Language and machines: Computers in translation and
linguistics (ALPAC report). National Academy of Sciences/National Research Council.
Polanyi, L., Culy, C., Van Den Berg, M., Thione, G. L., & Ahn, D. (2004). A rule based approach
to discourse parsing. In Proceedings of the 5th SIGdial workshop on discourse and dialogue at
HLT-NAACL 2004, pp. 108–117.
Radev, D. (2000). A common theory of information fusion from multiple text sources step one:
Cross-document structure. In 1st SIGdial workshop on discourse and dialogue, pp. 74–83.
Rodilla, P. M., & González-Pérez, C. (2017). A modelling language for discourse analysis in
humanities: Definition, design, validation and first experiences. Revista de Humanidades
Digitales, 1, 368–378.
Rosenfeld, R. (2000). Two decades of statistical language modeling: Where do we go from here?
Proceedings of the IEEE, 88(8), 1270–1278.
SEADDA Project. (2020). SEADDA ACTION COST CA18128 – Saving European archaeology
from the digital dark age 08/03/2021; Available at https://www.seadda.eu/
Sharp, B., & Delmonte, R. (2015). Natural language processing and cognitive science. De Gruyter.
Soricut, R., & Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical
information. In Proceedings of the 2003 human language technology conference of the North
American chapter of the Association for Computational Linguistics, pp. 228–235.
Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep
learning in NLP. arXiv preprint arXiv:1906.02243.
Taboada, M., & Mann, W. C. (2006). Rhetorical structure theory: Looking back and moving ahead.
Discourse Studies, 8(3), 423–459.

Turing, A. M. (2009). Computing machinery and intelligence. In Parsing the Turing test (pp. 23–
65). Springer.
Vlachidis, A., Tudhope, D., Wansleeben, M., Azzopardi, J., Green, K., Xia, L., & Wright, H.
(2017). D16.4: Final report on natural language processing. Ariadne.
Webber, B. (2004). D-LTAG: Extending lexicalized TAG to discourse. Cognitive Science, 28(5),
751–779.
Webber, B., & Joshi, A. (2012). Discourse structure and computation: Past, present and future. In
Proceedings of the ACL-2012 special workshop on rediscovering 50 years of discoveries, pp.
42–54.
Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.
Winograd, T. (1971). Procedures as a representation for data in a computer program for
understanding natural language. Massachusetts Institute of Technology Cambridge Project
Mac.
Wiseman, S., Rush, A. M., & Shieber, S. M. (2016). Learning global features for coreference
resolution. arXiv preprint arXiv:1604.03035.
Zhao, J., Chevalier, F., Collins, C., & Balakrishnan, R. (2012). Facilitating discourse analysis with
interactive visualization. IEEE Transactions on Visualization and Computer Graphics, 18(12),
2639–2648.

Chapter 10

NLP and Archaeology: A View from a Digital Archive
Holly Wright, Tim N. L. Evans, and Katie Green

Abstract The Archaeology Data Service (ADS) has been experimenting with
Natural Language Processing (NLP) methodologies for over 12 years. As an
accredited digital repository, the focus has been to explore how NLP techniques
can be used to augment the basic metadata of any digital object and to begin to facilitate
increased human and machine access. Thus, the words used within the ADS archive
catalogue and archaeological reports have added value; they provide detail, context
and understanding, but conversely, they can also be ambiguous. The NLP techniques
studied go beyond allowing a user to search a PDF file, to building a classification
for the user, and then continuing to improve the rules behind the method(s). While
these experiments solidified our view that NLP has an important role to play in our
core services, our ability to implement them in a robust way has remained elusive.
This chapter presents our journey from an archaeological perspective, in the hope
that it will be useful to researchers who wish to engage with NLP methodologies in the Social Sciences
and Humanities, while also giving the point of view of a trusted digital repository.
It also reports ADS efforts to implement NLP within our collections, and discusses
why this remains elusive and what challenges lie ahead.
Keywords Archaeology data service · Natural language processing · Named
entity recognition · Archive · Metadata

H. Wright (✉) · T. N. L. Evans · K. Green
Archaeology Data Service, University of York, York, UK
e-mail: holly.wright@york.ac.uk; tim.evans@york.ac.uk; katie.green@york.ac.uk
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_10

10.1 Introduction
The Archaeology Data Service (ADS) has been experimenting with Natural Language Processing (NLP) technologies and methodologies for over 12 years. As an
accredited digital repository that actively curates over two million digital objects,
the focus of the NLP work at the ADS has been to explore how these techniques can
be used to augment the basic metadata that describes a digital object, and to begin
to facilitate increased human and machine access. In simple terms, the words used
within the ADS archive catalogue and archaeological reports have added value;
they provide detail, context and understanding, but conversely, they can also be
ambiguous. The NLP techniques studied go beyond allowing a user to search a PDF
file (for example) for matching words and phrases, to building a classification for
the user, and then continuing to improve and refine the rules behind the method(s).
While these experiments solidified our view that NLP has an important role to
play in our core services, our ability to implement them in a robust way has
remained elusive. This chapter presents our journey from an archaeological
domain perspective rather than a technical one, in the hope that it will be
useful to researchers who wish to engage with NLP methodologies using Social
Sciences and Humanities data, while also giving a user needs perspective from the
point of view of a trusted digital repository and its users. It sets out the
history of the many directions in which the ADS has attempted to implement NLP
functionality within our collections, discusses why this remains elusive, and outlines the
challenges we hope to address in future.

10.2 Archaeology and Unpublished Fieldwork Reports
Countries like the UK, with a large development-led archaeology sector, have a
problem. Every time a developer or a government department decides to undertake
a project that may result in the destruction of an archaeological resource, they
must hire a commercial field unit to assess, and potentially ‘offset’, the loss of the
archaeological resource in a way that advances understanding and provides public
benefit (Thomas, 2019). In most instances, this results in a single synthetic output:
the unpublished fieldwork report (also referred to as grey literature). This report is
meant to satisfy the requirements of the local government authority, by documenting
and describing what was found, and what its significance might be. These reports
are often quite mundane but form a critical part of the corpus archaeologists must
consult prior to undertaking any work nearby, or for academic research focussed
beyond the site level (Fulford & Holbrook, 2018). As such, archaeologists must
access these reports, to understand any prior archaeological interventions and
include them in their planning (Evans, 2015).
This used to mean travel to local authority offices to consult the lone paper
copy of the report, resulting in time and expense impacting the already tight profit
margins to which most archaeologists must work (Bradley, 2006). To mitigate this
in the UK, the OASIS system (https://oasis.ac.uk/) was developed to automate these
compliance procedures, where practitioners must provide information about their
investigations to local Historic Environment Records (HERs) or national heritage
bodies in digital form (Richards & Hardman, 2008). The ADS was able to add
digital preservation and dissemination of these reports to this workflow, resulting in
over 62,000 reports now freely available online through a designated interface called
the ADS Library of Unpublished Fieldwork Reports (n.d.), latterly incorporated
within a larger application called the ADS Library (https://archaeologydataservice.ac.uk/library/).
To understand the impact of the open dissemination of these reports
in digital form, an economic assessment was undertaken, resulting in the primary
conclusion that a significant number of commercial archaeological field units in
the UK now make considerable use of ADS held reports within their costing and
business models, and the resultant savings are now a critical part of their commercial
workflows (Beagrie & Houghton, 2013).

10.3 Metadata Challenges
As important as this resource is to most archaeological work in the UK, the actual
content within unpublished fieldwork reports is notoriously difficult to access. While
most reports from the 2000s to the present were “born digital” and created using a
word processing program, HERs continue to digitise the report backlogs, many of
which are scanned from typewritten pages which OCR software still struggles to
read. The purpose of these reports is to inform the description and classification
of the individual heritage asset, so the creation of resource discovery metadata to
allow their content to be searched alongside other reports has become an ongoing
challenge for the ADS. When Natural Language Processing (NLP) as a form of
automated metadata extraction first came on the scene, ADS staff quickly saw a
potential solution they were eager to explore.
Use of controlled vocabularies was not originally a major feature of OASIS,
but as time went on the value of their incorporation became obvious. Work was
undertaken to standardise controlled vocabularies for the heritage sector in Great
Britain (Binding & Tudhope, 2016) along with international standards relevant
to archaeology, such as the Getty Art and Architecture Thesaurus and the Getty
Thesaurus of Geographical Names (Cobb, 2015). Incorporating
Named Entity Recognition (NER), in combination with NLP more broadly, might bring not only
greater Findability and Accessibility but also Interoperability (the F, A and I in the
FAIR Principles; Wilkinson et al., 2016) to the Library of Unpublished Fieldwork
Reports.

10.4 The Archaeotools Project
In 2007 the ADS made its first attempt to use NLP to address this challenge. In
partnership with the Natural Language Processing (NLP) Research Group at the
University of Sheffield the ADS undertook the Archaeotools project, funded under
the UK Arts and Humanities e-Science Initiative. For perspective, at the start of
Archaeotools the Library of Unpublished Fieldwork Reports totalled around 2300 reports,
with a growth rate of 50–100 each month (Jeffrey et al., 2009), and even at that
point the challenges in creating robust metadata were considered significant enough
to look to NLP for help. Over 2 years, the Archaeotools project worked to implement
Information Extraction (IE) over a corpus of around 1000 unstructured Unpublished
Fieldwork Reports, using very simple NER to map the results to subject (What),
location (Where) or temporal designation (When) and to Dublin Core (DC) entities
for publication information:
Subject (topics covered, findings mentioned) mapped to What
Location (place names related to events and findings) mapped to Where
Temporal (temporal information related to findings) mapped to When
Grid reference mapped to Where
Report title, creator, publisher, publisher contact, publication date mapped to DC
Event dates mapped to DC
Bibliography and references mapped to DC
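The mapping listed above can be pictured as a simple lookup table. The following is a minimal sketch only: the key names and both helper functions are hypothetical, chosen for readability, and are not identifiers from the Archaeotools code base.

```python
# Illustrative lookup of extracted entity types to their target facet.
# All key names are hypothetical; only the What/Where/When/DC targets
# come from the mapping described in the text.
ENTITY_TO_FACET = {
    "subject": "What",
    "location": "Where",
    "grid_reference": "Where",
    "temporal": "When",
    "report_title": "DC",
    "creator": "DC",
    "publisher": "DC",
    "publisher_contact": "DC",
    "publication_date": "DC",
    "event_dates": "DC",
    "bibliography": "DC",
}


def facet_for(entity_type):
    """Return the facet for an entity type, or 'Unmapped' if it is unknown."""
    return ENTITY_TO_FACET.get(entity_type, "Unmapped")


def group_by_facet(entities):
    """Group (entity_type, value) pairs under What / Where / When / DC."""
    grouped = {}
    for entity_type, value in entities:
        grouped.setdefault(facet_for(entity_type), []).append(value)
    return grouped
```

Keeping the mapping in one table means that a new entity type only needs a single new entry, and anything unexpected surfaces as `Unmapped` rather than being silently dropped.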
Archaeotools employed both a knowledge engineering approach (KE) and an
automatic training (AT) approach to the ADS Library of Unpublished Fieldwork
Reports. For the parts of the data that appeared in standardised contexts, such
as the title of the report, the KE approach was applied. For the heterogeneous
and irregular data, such as placenames and subjects, both approaches were then
combined for IE. This produced mixed results. For example, when archaeologists
discuss a site, they invariably also discuss other sites that are relevant to the site
under investigation. This caused problems for Archaeotools, as the NLP was unable
to distinguish between them, and determine Where the site was located amongst all
the extracted placenames. This was mitigated by only returning the placenames found
in the summary or, where there was no summary, in the first 10% of the document. Even so,
the correct placename could not be identified in 162 of the 960 reports
(Richards et al., 2011), which speaks to the lack of structure in archaeological
reports generally.
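The summary-or-first-10% heuristic might be sketched as follows. This is an illustration under stated assumptions, not the Archaeotools implementation: it assumes a summary begins at a line reading "Summary" and ends at the next blank line, and it measures the 10% in characters, neither of which is confirmed by the source. The function name is hypothetical.

```python
import re

# Assumption: a summary is a block that starts at a line reading "Summary"
# and runs until the next blank line. Real reports vary widely.
SUMMARY = re.compile(
    r"^summary[ \t]*\n(.+?)(?:\n[ \t]*\n|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def placename_search_region(report_text, percent=10):
    """Return the text in which placenames should be looked for.

    Uses the summary when one is found; otherwise falls back to the first
    `percent` of the document (measured in characters here).
    """
    match = SUMMARY.search(report_text)
    if match:
        return match.group(1).strip()
    cutoff = len(report_text) * percent // 100
    return report_text[:cutoff]
```

A placename recogniser would then be run over the returned region only, which reduces the chance of picking up comparison sites mentioned later in the report.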
Upon completion of the Archaeotools project, it was agreed that there was potential
for applying automated data and metadata extraction to Unpublished Fieldwork
Reports, and the combined approach was generally felt to be successful.
Archaeotools was much more successful when applied to structured data within
the ADS Archsearch search interface and helped build the Solr index upon which
Archsearch was based. It was also successful in identifying trends in the use of terms
in other types of unstructured text held by the ADS, such as the Proceedings of the
Society of Antiquaries of Scotland (PSAS). The PSAS have been published since
1851, and the early volumes do not follow modern forms of sentence structure and syntax, but
Archaeotools was still able to find useful patterns.
The following section of text from an early PSAS paper illustrates the named
entities that could be extracted using NLP (Bateman & Jeffrey, 2011):
The bronze ring inscribed with runic characters, presented to the Society, was found in
the year 1849, in the Abbey Park, in the immediate neighbourhood of St Andrews. It is a
large bronze finger ring inscribed on the two faces in Anglo-Saxon runes, and is of peculiar
interest, as being, it is believed, the only example of the Paleography of our Anglo-Saxon
forefathers hitherto found in Scotland, with the single, but most important exception of the
noble monument at Ruthwell, Dumfriesshire. (Wilson, 1851)

What – Bronze Ring, Runic Inscription (also ‘the monument at Ruthwell’)
Where – Abbey Park, St Andrews, (also Ruthwell, Dumfriesshire)
When – Anglo-Saxon (also ‘found 1849’)
Who – Wilson, D.
Media – PSAS (PDF)
This early attempt allowed the ADS to see what might be possible, but moving
forward was reliant on access to highly specialised research expertise, which was
not available after completion of the Archaeotools project. We also became aware
that, while we could clearly see how useful this might be for our users, and the
archaeologists in the project considered it largely successful, the NLP researchers
found our desired application, extracting richer resource discovery metadata
from unstructured text, very mundane in comparison to their other research
areas within computer science.

10.5 NLP and the ARIADNE Infrastructure
The next opportunity to advance NLP and NER capabilities for the ADS Unpublished Fieldwork Reports didn’t arrive until 2014 with the Advanced Research
Infrastructure for Archaeological Dataset Networking in Europe (ARIADNE),
which was funded under the European Community’s Seventh Framework Programme. The primary output of the project was the ARIADNE Portal (https://portal.ariadne-infrastructure.eu/),
but a range of research pilots were also key to
the project, including NLP.
For ARIADNE, partners built on previous work done within the Semantic
Technologies for Archaeological Resources (STAR) project (Vlachidis et al., 2010)
using rule-based NLP methods and the GATE toolkit developed at Sheffield (https://gate.ac.uk/).
Work done in English within STAR was expanded within ARIADNE
to determine whether it could be adapted for Dutch and Swedish grey literature.
This work was undertaken in collaboration with ARIADNE partners at Leiden
University, DANS (Dutch reports) and the Swedish National Data Service (Swedish
reports) and made use of glossaries and thesauri from the Dutch and Swedish
partners, importing the thesauri into GATE, and analysing their suitability and performance for NER use. The NER experimentation focused on the following entity types:
• Archaeological Context
• Material
• Physical Object (Finds)
• Monument
• Place
• Temporal (Time Appellation)

Fig. 10.1 Screenshot of the suite of named entity recognizer pipelines developed within the
ARIADNE project, available on GATEcloud (https://cloud.gate.ac.uk/shopfront)

Once extracted, these were mapped to native vocabularies, CIDOC CRM subjects
or Getty Art and Architecture Thesaurus concepts. Additional thematic case studies
were undertaken, including a numismatic study, and a dendrochronology study
(Vlachidis et al., 2017). The archaeology and dendrochronology NER pipelines are
openly available via GATEcloud (Fig. 10.1).
With the hiring of an Applications Developer from the NLP Research Group at
Sheffield, the ADS was also able to experiment with machine learning-based NLP
techniques for our NLP contribution to ARIADNE. The ADS built upon the lessons
learned from Archaeotools and attempted once again to use NLP tools to unlock the
potential of its Unpublished Fieldwork Reports. This text typically exists in PDF,
MS Word, or plain text files within the ADS Library of Unpublished Fieldwork
Reports. Training data developed by Archaeotools was used to train a classifier (a
machine learning tool that assigns data items to classes; training yields
a statistical model that can then be used to extract entities from text). Several classifiers were
tested. In the end, a Conditional Random Field (CRF) classifier was chosen, not because it produced better
results than other classifiers, but because it was easier to integrate into an API and
required less computing time to produce results (Vlachidis et al., 2017).
The models were built by the classifier using gazetteers, which were then directly
applied to data from the Unpublished Fieldwork Reports. As there was no Gold
Standard for Unpublished Fieldwork Reports, a group of reports from the North
Yorkshire region which had not been part of any previous Archaeotools training data
were chosen and manually scored. The gazetteers improved extraction performance,
confirming substantial overlap of information from various corpora within the grey
literature.

Fig. 10.2 Screen shot of the prototype ADS API showing original text on the left and metadata
entities extracted from the text on the right

To train the CRF classifier, a window size of five surrounding tokens and the
following feature set were used:
• N-grams with a maximum length of six tokens (i.e., contiguous sequences of words)
• Exact token string
• Features from the previous word class sequence
• Archaeological gazetteer
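The feature set above could be turned into per-token feature dictionaries of the kind commonly fed to CRF toolkits. The sketch below is a plain-Python illustration, not the ADS code: it reads "window size of five" as five tokens on each side, uses a tiny made-up gazetteer, and all function and feature names are hypothetical.

```python
WINDOW = 5      # context tokens on each side (one reading of "window size of five")
MAX_NGRAM = 6   # longest n-gram, in tokens

# Hypothetical gazetteer entries, for illustration only.
GAZETTEER = {"villa", "farm", "iron age", "roman"}


def token_features(tokens, i, prev_labels):
    """Build a feature dict for tokens[i], given the labels assigned so far."""
    feats = {"token": tokens[i]}  # exact token string

    # Surrounding tokens within the window, keyed by signed offset.
    for offset in range(-WINDOW, WINDOW + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[f"tok[{offset:+d}]"] = tokens[j].lower()

    # N-grams (2 to MAX_NGRAM tokens) starting at the current token.
    for n in range(2, MAX_NGRAM + 1):
        if i + n <= len(tokens):
            feats[f"ngram{n}"] = " ".join(tokens[i:i + n]).lower()

    # The previous word class sequence (the last two labels here).
    feats["prev_labels"] = "|".join(prev_labels[-2:]) if prev_labels else "START"

    # Gazetteer lookup: does any phrase starting at this token appear in it?
    feats["in_gazetteer"] = any(
        " ".join(tokens[i:i + n]).lower() in GAZETTEER
        for n in range(1, MAX_NGRAM + 1)
        if i + n <= len(tokens)
    )
    return feats
```

A CRF library would consume one such dictionary per token, together with the gold labels, during training.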

As a proof-of-concept, a prototype API was developed (Fig. 10.2). The prototype
allowed domain experts to annotate reports, generate resource discovery metadata
where none existed, and generate metadata which could be used to further train
the classifiers. While only a prototype, the interface showed how an API might
be visualised if implemented in an existing interface, allowing users to correct the
results, ensuring the creation of better-quality metadata.
The API included the Named Entities to which the ADS would map the
extracted metadata, using the thesauri created within the Semantic ENrichment
Enabling Sustainability of arCHaeological Links (SENESCHAL) project (Binding
& Tudhope, 2016). Text was entered into the “input text area” allowing entities
to be extracted using the CRF classifier. The extracted entities were displayed as
suggested metadata to the right of the entered text, allowing users to assess the
relevance of the extracted entities. The API also detected and extracted UK grid
references using manually crafted regular expressions, which were automatically
verified using UK Geospatial data held within a spatial database. By clicking on
a magnifying glass icon beside each generated entity, users could jump directly
to the word in the text from which the result was derived, allowing easy manual
verification or correction (Vlachidis et al., 2017).
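Grid reference detection of the kind described above can be approximated with a regular expression. The pattern below is a simplified sketch, not the expressions used in the ADS prototype: it assumes two-letter Ordnance Survey square codes followed by an even number of digits, with or without a space between easting and northing, and it omits the verification step against spatial data. The example reference in the usage note is made up.

```python
import re

# Simplified pattern: a two-letter square, then either one run of digits
# or two space-separated runs (easting and northing).
GRID_REF = re.compile(r"\b([HNOST][A-HJ-Z])\s?(\d{2,10})(?:\s(\d{2,5}))?\b")


def find_grid_references(text):
    """Return (normalised_reference, start, end) for each plausible grid reference."""
    found = []
    for match in GRID_REF.finditer(text):
        square, first, second = match.groups()
        if second is None:
            # Unspaced form: split the digit run into equal halves.
            if len(first) % 2:
                continue
            half = len(first) // 2
            easting, northing = first[:half], first[half:]
        else:
            easting, northing = first, second
        # Easting and northing must have the same precision (1 to 5 digits).
        if len(easting) == len(northing) and 1 <= len(easting) <= 5:
            found.append((f"{square} {easting} {northing}", match.start(), match.end()))
    return found
```

For instance, `find_grid_references("The trench lay at SE 6050 5180.")` yields one normalised reference with its character offsets. A production version would still need to check each candidate against real geospatial data, as the prototype did.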
The entities extracted by the NER module with this method, using a relatively
short piece of summary text, produced good results. The small number of entities
returned were easy to view and manage within the API, although this became
more complicated when tested with a larger body of text. Development time was
focussed on creating and refining a version of the API that allowed external users to
submit NER tasks to an ADS server, which then returned a set of terms, including
their category and offsets, which developers could incorporate into their existing
interfaces. The API was a RESTful HTTP web service: clients submitted
a task by POSTing JSON to an API endpoint and, if the request was successful, the API
returned JSON in the response. Depending on the complexity of the task and length
of the content, the API might return the result asynchronously, in which case the
results were not immediately available, and it was envisioned that the developer would
have to implement a delay after the task was submitted.
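From a client developer's side, the submit-then-wait pattern described above might look like the following sketch. Everything specific here is an assumption made for illustration: the endpoint URL, the `text` field name and the shape of the result are hypothetical and do not document the actual ADS prototype. The request is only constructed, not sent.

```python
import json
import time
import urllib.request

API_URL = "https://example.org/ner/tasks"  # hypothetical endpoint


def build_task_request(text, url=API_URL):
    """Build (but do not send) a POST request carrying the text as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")  # "text" is an assumed field name
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def wait_for_result(fetch, delay_seconds=2.0, max_attempts=5):
    """Poll `fetch` until it returns a result, pausing between attempts.

    `fetch` is any callable that returns the parsed JSON result, or None
    while the task is still running.
    """
    for _ in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(delay_seconds)
    raise TimeoutError("NER task did not finish in time")
```

Passing the polling step in as a callable keeps the delay logic independent of how the result is actually retrieved, which also makes it easy to exercise without a live server.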
Here is an example of the kind of text tested with the API:
The various sites that Butser Ancient Farm occupied over the years were all, in one way
or another, based on the concept of demonstrating what a farm, which would have existed
in the British Iron Age circa 300 BC, might have been like. It was founded in 1972 as the
Butser Ancient Farm Project and occupied sites on Little Butser Hill, Hampshire UK, the
so-called Demonstration Site in the grounds of Queen Elizabeth Country Park, Hampshire
and finally it moved to its present site at Bascomb Down in 1991. The work was extended
to include the construction of a Roman Villa in 2002.

Using the simple example text, the API returned the following response aligned to
pre-defined general categories such as location, subject and period (what, where,
when):
Placename: Butser Ancient Farm
Placename: Little Butser Hill
Placename: Hampshire
Placename: Bascomb Down
Subject: farm
Subject: Villa
Temporal: British Iron Age circa 300 BC
Temporal: Roman
Temporal: 2002
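A developer receiving terms like those listed above, each with a category and offsets, could group them into what/where/when and use the offsets to highlight the source text. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, and the offsets are recomputed with a naive first-occurrence search rather than taken from a real API response.

```python
from collections import defaultdict

CATEGORY_TO_FACET = {"Subject": "what", "Placename": "where", "Temporal": "when"}


def with_offsets(text, entities):
    """Attach character offsets to (category, term) pairs found in the text.

    Only the first occurrence of each term is located; terms that do not
    occur in the text are skipped.
    """
    located = []
    for category, term in entities:
        start = text.find(term)
        if start != -1:
            located.append(
                {"category": category, "term": term, "start": start, "end": start + len(term)}
            )
    return located


def facets_from_entities(located):
    """Group located terms under what / where / when."""
    facets = defaultdict(list)
    for entity in located:
        facets[CATEGORY_TO_FACET[entity["category"]]].append(entity["term"])
    return dict(facets)
```

Because each entity carries `start` and `end`, an interface can slice the original text at those positions to show the user exactly where a suggested term came from.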
The ADS planned to test the API as part of the redevelopment of the OASIS
system. The aim was to allow an archaeologist to upload a report to OASIS, and
by choosing to use the NER service, automatically extract suggested metadata for
the report. The metadata could then be accepted or rejected by the user and then
automatically populated into the correct fields within OASIS. OASIS is intended
to be easy to use and the process of uploading a report, creating the relevant
metadata, and submitting it to the system was meant to be quick to complete. Time
was needed to assess how long the call-response method would take in real use,
as OASIS was already quite a process-intensive application. Adding
equally intensive NER functionality, together with the time needed to approve or reject
suggested matches, would likely try users’ patience. In addition, the user base of
OASIS was expanding beyond commercial archaeologists, with community users
also being a key demographic, making ease and speed of use even more important.
Ensuring the updates to the OASIS system were fit for purpose meant the
deployment was delayed, and it was not possible to continue the work within the
ARIADNE project, but the API was circulated informally to ARIADNE partners
for internal review to test the service and provide feedback. It was found that while
the service did not return any false positives, it failed to return all potential positives.
This indicated that while the metadata generated by the service was reliable
(high precision), it was not complete (lower recall). It was determined that this was likely due to a need for more
training data, and/or an adjustment to the algorithm. The ADS hoped to continue
to work on the service beyond the completion of the ARIADNE project, to see if
further improvements were possible, but this was curtailed due to capacity issues
and staffing changes.
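The distinction between reliable and complete results corresponds to the standard measures of precision and recall, which the F-measure combines. The following sketch shows the arithmetic; the counts in the usage note are invented purely to illustrate the pattern and are not results from the ARIADNE review.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw counts, using 0.0 for empty cases."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With hypothetical counts of 6 correct entities, 0 false positives and 4 missed entities, precision is 1.0 while recall is 0.6: everything returned is right, but not everything is returned, and the F-measure falls between the two.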

10.6 The ADS and NLP at the University of York
The ADS got another opportunity to work toward our goal of using NLP to enhance
the metadata in our Unpublished Fieldwork Reports after an approach by our own
Department of Computer Science at the University of York. After a few meetings it
was decided that this would be a fruitful research avenue which we should pursue
together. Once again, the interests of the computer scientists were not necessarily
the interests of the archaeologists, and after further discussion, the importance of
applying NLP to enhance the metadata within our Unpublished Fieldwork reports
over more cutting-edge, but less useful avenues, was agreed upon.
This led to opportunities for MSc students in Computer Science and Archaeological Information Systems (AIS). The AIS student was interested in applying
NLP to identify Zooarchaeological data for her dissertation, and the implementation
provided a dissertation topic for the Computer Science student. The students worked
together, guided by Computer Science and ADS staff, with additional Zooarchaeology domain expertise and supervision provided by staff in the Department of
Archaeology. Working directly with an archaeologist with domain expertise was
particularly important to the usefulness of the outcome.
At this point, the ADS had experience of working with quite a few computer
scientists interested in Machine Learning generally, and NLP in particular. As an
open access digital archive, the ADS is also approached fairly regularly by computer
scientists from a range of countries and institutions wanting to use our data for their
own research. This is of course encouraged, but when the results have been shared with
us, it has been clear that the researchers did not understand the kinds of questions that are of importance
to archaeologists (even when that was their expressed intention for wanting to
use the data). Because they did not work alongside archaeologists during the entire process,
the results were at best of little use, and at worst lacking an important ethical
understanding of how to interact with data about the human past. For a project where
the ADS would be investing time and capacity, we were insistent that for this new
collaboration, archaeologists with critical domain expertise and an understanding of
the theoretical underpinnings of our discipline were included in all phases of the
work.
As the Computer Science staff expertise was in deep learning neural networks,
the students were advised to use the Keras open-source neural network library
(Keras, 2017) as Keras allowed easy switching between different backend engines.
The AIS student used a statistical approach to evaluate the performance of the NER
tool, and to evaluate its usefulness to archaeologists, she created a questionnaire.
To create the NER Gold Standard for Zooarchaeological data, a diverse range
of Zooarchaeological reports were chosen from the (at the time) over 42,000
reports held by the ADS. This ensured a wide range of animal taxa, locations
and archaeological time periods were represented. These were then annotated by
Zooarchaeology specialists to create the Test Set, which was then checked by the
Zooarchaeology domain expert at York as the superannotator. The Test Set was then
transformed into XML using GATE, resulting in a Gold Standard consisting of over
2000 annotations in 97 different classes (Talboom, 2017).
As this work was undertaken as MSc dissertation projects, it was not possible to
fully develop the NER tool, but some promising outcomes were evident. The
tool was able to tag and annotate entities correctly; however, it
also tagged words that did not have a tag in the original annotation, or tagged
the incorrect term. These errors could have been reduced with further development.
Rather than focussing on the F-measure, which is invariably disappointing for
computer scientists when dealing with archaeological data (a lower F-measure often
still represents a very useful result for domain users), the AIS student focussed on
understanding the usefulness for archaeological researchers. Using the Likert scale,
users were asked to evaluate their agreement with the following statements, along
with some qualitative queries:
• The tool retrieved the required information
• The majority of the retrieved documents was relevant
• I trust the retrieved information
• The tool found all relevant documents within the repository
• What percentage of the received documents do you see as relevant?
• Other perceptions?

Once the testing was completed, users were then asked to evaluate their agreement
with these additional statements/queries:
• The tool is time saving
• The tool is facilitating my searching for information
• The tool is relevant to my work tasks
• I would use this tool in my future research
• What type of queries did you use?
• Other perceptions?

Fig. 10.3 Bar chart showing the mean results of the usefulness of osteoarchaeological entity
search, using the seven-point Likert scale. (Reproduced from Talks, 2019)
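Mean agreement per statement, as plotted in a chart of this kind, can be computed along the following lines. This is a minimal sketch assuming responses coded 1 to 7 on a seven-point scale; the function name is hypothetical and the scores in the usage note are invented, not taken from either dissertation.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # assumed coding of the seven-point Likert scale


def mean_likert(responses):
    """Return the mean score per statement.

    `responses` maps each statement to a list of integer scores; a score
    outside the scale raises ValueError.
    """
    means = {}
    for statement, scores in responses.items():
        for score in scores:
            if not SCALE_MIN <= score <= SCALE_MAX:
                raise ValueError(f"score {score} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
        means[statement] = mean(scores)
    return means
```

For example, `mean_likert({"The tool is time saving": [7, 5, 6]})` gives a mean of 6 for that statement. Validating the range first guards against mis-coded questionnaire data skewing the averages.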

While the NER tool could not be completed within the timeframe of the MSc, the
creation of this evaluation structure, undertaken with the domain expert, was a step
in the right direction. The following year, this work was used by another student
as the basis for his BSc and then MSc dissertations in Archaeological Science,
expanding the NER tool for osteoarchaeology. This work used the same technical
approach but expanded the testing of the tool’s usefulness to include domain experts
in osteoarchaeology, archaeologists, archaeology students, and non-archaeologists
in the UK (Talks, 2019). While the evaluation was quite general, it shows we are
moving to a point where we can start asking these key questions in a robust way
(Fig. 10.3).
As with our other attempts, however, the capacity to move this into ADS technical
workflows was lost when the researchers and students moved on, but our discussions
continue, and an effective, efficient, and useful solution will be found.

10.7 Conclusion
The way the ADS currently views the potential of NLP, particularly regarding its
use as part of OASIS, focusses on two main challenges. It should be relatively
simple to extract accurate Dublin Core-type metadata, so the first challenge is
making it possible to drill down within unstructured text to pull out significance
beyond the literal identification of terminologies (such as henge). This is particularly
important, as archaeological grey literature does not exist in a vacuum. It links
to and from extant inventory records such as the Heritage Gateway (https://www.heritagegateway.org.uk/gateway/)
in England, Canmore (https://canmore.org.uk/)
in Scotland, and the ARIADNE Portal (https://portal.ariadne-infrastructure.eu/)
internationally. Users of these aggregation interfaces may well come across ADS
reports by searching for a particular site-type, but may not realise there is additional,
more detailed information available because it isn’t represented in the metadata.
Associated with this would be an ambition to allow greater understanding and a
thematic sense of how this digital object contributes to a research topic or question.
For example, rather than searching for Mesolithic and/or Neolithic, this may allow
resource discovery related to the Mesolithic to Neolithic transition, and therefore
evidence of the adoption of agriculture. Pragmatically this would also improve
the range of vocabularies to encompass specialisms within the archaeological
community such as Zooarchaeology; turning the research questions themselves into
defined concepts.
Within the UK, a new generation of Research Frameworks (e.g. https://scarf.scot/)
have begun to turn common questions and fields of study into clear entities,
and more generally, other disciplines and initiatives outside of archaeology, such as
the Marine Environment (https://vocab.nerc.ac.uk/collection/) have taken steps to
define their classifications and understandings to help the researcher identify how
something contributes to their specialism.
Another example would be the pottery reporting and classification work currently
underway by Historic England (Barclay et al., 2016; Medieval Pottery Research
Group, 2019). An NLP approach that can augment digital objects to show a user
why and how they are relevant is an ambitious but necessary step for us, particularly
as the ADS Library of Unpublished Fieldwork Reports continues to grow at such a
rapid rate.
It also has the benefit of being a dynamic exercise. Unlike the production of
metadata at the point of data creation or deposition within an archive, it allows
continued re-evaluation of a digital object, based on new ideas and interpretations.
The ongoing curation of metadata through the implementation of NLP tools may be
key to fulfilling the ambition of a living, engaging and relevant digital archive.
The second challenge is the historic backlog of Unpublished Fieldwork Reports.
The ADS currently holds around 62,000 reports, which we estimate represents perhaps
half (optimistically) of the reports produced in England alone since 1990 (where
roughly 4000 fieldwork events take place per year, on average). A large factor
limiting the deposition and online dissemination of these Unpublished Fieldwork
Reports is the time it would require to manually create metadata. The ability to
auto-generate syntactically meaningful metadata for this backlog could roughly double the
usefulness of the resource by significantly increasing findability, accessibility, and
interoperability. This would also provide a pathway for dealing with other types of
unstructured text the ADS holds, such as backruns of archaeological journals.
The ADS has learned many lessons about the difficulties of incorporating NLP
into our workflows in any practical way over the last 12 years, but our belief in the
potential it could unlock for our unstructured data remains undiminished. If NLP is
going to be implemented in substantive ways, it needs to be supported in the same
ways as other forms of vital technical infrastructure. At the same time, the barriers
to implementation are not technical. We understand the technology and how to use
it, but for smaller organisations, such as the ADS, it is nearly impossible to retain
NLP specialists who can both implement and maintain these types of systems in the
long term, and it is currently unclear how this problem can be solved. The project-to-project funding landscape and the transient nature of academia also contribute to
this issue. It is difficult to make the case for these needs when it is equally difficult
to create viable proofs-of-concept to even demonstrate their potential. We will keep
working to find a solution.

References
Barclay, A., Knight, D., Booth, P., Evans, J., Brown, D. H., & Wood, I. (2016). Standards for
pottery studies in archaeology. Medieval Pottery Research Group. https://romanpotterystudy.org.uk/wp-content/uploads/2016/06/Standard_for_Pottery_Studies_in_Archaeology.pdf
Bateman, J., & Jeffrey, S. (2011). What matters about the monument: Reconstructing historical
classification. Internet Archaeology, 29. https://doi.org/10.11141/ia.29.6
Beagrie, N., & Houghton, J. (2013). The value and impact of the archaeology data service. JISC
Website. http://repository.jisc.ac.uk/5509/1/ADSReport_final.pdf
Binding, C., & Tudhope, D. (2016). Improving interoperability using vocabulary linked data.
International Journal on Digital Libraries, 17(1), 5–21.
Bradley, R. (2006, September). Bridging the two cultures – Commercial archaeology and the study
of prehistoric Britain. The Antiquaries Journal, 86, 1–13.
Cobb, J. (2015). The journey to linked open data: The Getty vocabularies. Journal of Library
Metadata, 15(3-4), 142–156.
Evans, T. N. L. (2015). A reassessment of archaeological Grey literature: Semantics and paradoxes.
Internet Archaeology, 40. https://doi.org/10.11141/ia.40.6
Fulford, M., & Holbrook, N. (2018). Relevant beyond the Roman period: Approaches to the investigation, analysis and dissemination of archaeological investigations of the rural settlements
and landscapes of Roman Britain. Archaeological Journal, 175(2), 214–230.
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S., & Zhang, Z. (2009). The
Archaeotools project: Faceted classification and natural language processing in an archaeological context. Philosophical Transactions of the Royal Society A: Mathematical, Physical
and Engineering Sciences, 367(1897), 2507–2519.
Keras. (2017, June 10). Keras. https://keras.io/
Library of Unpublished Fieldwork Reports. (n.d.). https://archaeologydataservice.ac.uk/archives/view/greylit/. Accessed 15 Apr 2021.
Medieval Pottery Research Group. (2019, December 21). A guide to the classification of
Medieval ceramic forms digitisation project completed! https://medievalpottery.org.uk/2019/12/21/a-guide-to-the-classification-of-medieval-ceramic-forms-digitisation-project-completed/
Richards, J. D., & Hardman, C. S. (2008). Stepping back from the trench edge: An archaeological
perspective on the development of standards for recording and publication. In M. Greengrass
& L. Hughes (Eds.), The virtual representation of the past. Digital research in the arts &
humanities (pp. 101–112). Ashgate.
Richards, J., Jeffrey, S., Waller, S., Ciravegna, F., Chapman, S., & Zhang, Z. (2011). The
archaeology data service and the Archaeotools project. Archaeology, 2. https://doi.org/10.2307/j.ctvhhhfgw.11
Talboom, L. (2017). Improving the discoverability of zooarchaeological data with the help of
natural language processing (MSc in archaeological information systems). University of York.
Talks, A. (2019). An exploration of NLP and NER for enhanced search in osteoarchaeological and
palaeopathological textual resources (Bachelor of science in bioarchaeology). University of
York.
Thomas, R. (2019). It’s not mitigation! Policy and practice in development-led archaeology in
England. The Historic Environment: Policy & Practice, 10(3-4), 328–344.
Vlachidis, A., Binding, C., Tudhope, D., & May, K. (2010, July). Excavating grey literature: A
case study on the rich indexing of archaeological documents via natural language-processing
techniques and knowledge-based resources. ASLIB Proceedings, 39, 88.
Vlachidis, A., Tudhope, D., Wansleeben, M., Azzopardi, J., Green, K., Xia, L., & Wright, H.
(2017). D16.4 final report on natural language processing. http://legacy.ariadne-infrastructure.eu/wp-content/uploads/2019/01/D16.4_Final_Report_on_Natural_Language_Processing_Final.pdf
Wilkinson, M. D., Michel, D., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., Blomberg,
N., et al. (2016). The FAIR guiding principles for scientific data management and stewardship.
Scientific Data, 3(March), 160018.
Wilson, D. (1851). Inscribed ring. Proceedings of the Society of Antiquaries of Scotland, 1, 1851–
1854.

Chapter 11

Information Extraction and Machine Learning for Archaeological Texts
Alex Brandsen

Abstract Archaeologists are creating ever-increasing amounts of textual data. So
much in fact, that manual reading and inspection has become practically impossible.
By leveraging computational approaches, it is possible to extract relevant information from this big data, allowing for more efficient research and new analyses. In
this chapter, methods and techniques to extract information from archaeological
texts through Machine Learning are introduced and discussed, with a focus on
practical examples. After reading the chapter, you should have a clear grasp on the
possibilities of text mining in archaeology, the current state of research, and enough
information to start your own text analyses.
Keywords Information extraction · Text mining · Machine learning · Data
science

11.1 Introduction
In the last ten years or so, archaeologists have started generating ‘big data’:
information assets characterised by the four V’s: Volume, Velocity, Veracity, and
Variety. Volume simply means the size of the data, generally meaning many
gigabytes or terabytes of data. Velocity is the speed at which data updates,
and Veracity is a measure of how trustworthy data is; these two V’s are generally less
relevant to archaeology. Variety speaks to the level of heterogeneity in the data, and
how fuzzy or unclear data is, something we do encounter regularly in archaeology.
In short, big data is so unwieldy that it is not feasible to analyse it with
conventional methods. This problem of having too much data has been described
by multiple authors, with Bevan calling it a “data deluge” (Bevan, 2015, p. 1) and
Vince noting “we are drowning in our own data” (Vince, 1996, p. 1). Dealing with
A. Brandsen
Faculty of Archaeology, Leiden University, Leiden, Netherlands
e-mail: a.brandsen@arch.leidenuniv.nl
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_11

structured data—such as databases and geospatial data—has received a fair share
of our attention, but much less research is being done on processing and analysing
unstructured information: the documents that archaeologists write (Bevan, 2015).
These texts do contain a wealth of information, and by using computational tools
to access, extract, and combine information in the documents, we can perform new
synthesising research on large scales. Due to the amount of text data, computational
methods almost become a necessity: in the Netherlands alone more than 4000
excavation reports are produced each year, not to mention thousands of books,
papers, and preprints as well. When we extrapolate that to the situation across the
world, it quickly becomes clear that manual inspection of these texts is unfeasible.
In this chapter, methods and techniques to extract information from archaeological texts through Machine Learning are introduced and discussed, with a focus on
practical examples. After reading the chapter, you should have a clear grasp on the
possibilities of text mining in archaeology, the current state of research, and enough
information to start your own text analyses.

11.2 Information Extraction Techniques
In this section, an overview is given of techniques that are useful for Information
Extraction. The focus here is on explaining what the methods do and what use they
have, while the following sections go more into the technical details on how to
practically apply these methods. The first part of this section explains some general
concepts, and the subsections deal with specific techniques.
Natural Language Processing (NLP) is a research field which explores how
computers can be used to understand and manipulate natural language, i.e. speech
and written text in human language (as opposed to formal/constructed language such
as programming languages) (Chowdhury, 2005). A document collection that we can
analyse with NLP is called a corpus.
Text Mining is a subfield of NLP, and is a group of tasks all related to analysing
written text (Feldman & Sanger, 2007). The most common task is Information
Extraction (IE): extracting information from unstructured text.
In essence, IE is text simplification: turning unstructured text into a structured
view of the information present in the text. A number of techniques
fall under IE; the most used ones are listed here:
• Named Entity Recognition (NER), detection of entities (or concepts) in text.
For example, finding all archaeological artefacts or time periods mentioned in a
document.
• Document Classification, the process of automatically assigning labels to a text.
For example, assigning subject metadata to a document by classifying it into one,
or a number of categories.
• Topic Modelling, automatically clustering documents into distinct groups based
on their content.
• Relationship Extraction, the identification of relations between entities. For
example, finding relations such as [artefact] is found in [context].
• Coreference Resolution, detection of coreference between entities. For example, in the sentence “We found an arrow head, it is dated to the Neolithic”, it is
useful to know that “it” and “arrow head” refer to the same real world entity.
• Terminology Extraction or ontology extraction, a method of automatically
constructing a thesaurus by analysing a large corpus.
At the moment in archaeology, NER, document classification, and topic modelling are being researched the most, and we will focus mainly on these techniques
so we can illustrate the methods with archaeological examples.
Machine Learning (ML) is a form of Artificial Intelligence, which uses relations
between data points in large data sets to create statistical models which can be used
for various purposes. Generally, a Machine Learning algorithm will be able to take
a human-annotated set of data (e.g. labelled entities in text) and create a statistical
model which can predict new, unlabelled data (e.g. predict entities in text). Another
way to make predictions is to use handcrafted rules: a Rule-Based approach. Here,
an expert manually creates rules that can predict labels, e.g., “if a word is in an
artefact word list, label it as an artefact”. Both approaches have been used with
various degrees of success in archaeological text mining. More detailed information
about ML can be found in Sect. 11.5.
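
The word-list rule quoted above can be sketched in a few lines of Python. This is only a minimal illustration of the Rule-Based approach, not the method of any of the cited projects: the two word lists are hypothetical, and the labels ART and MAT simply follow the abbreviations used in Fig. 11.1.

```python
import re

# Hypothetical mini word lists (assumption); a real project would load
# these from an archaeological thesaurus.
ARTEFACTS = {"axe", "arrowhead", "pottery"}
MATERIALS = {"flint", "bronze", "charcoal"}


def label_tokens(text):
    """Return (token, label) pairs; the label is 'ART', 'MAT' or 'O' (no entity)."""
    labelled = []
    for token in re.findall(r"\w+", text):
        word = token.lower()
        if word in ARTEFACTS:
            labelled.append((token, "ART"))
        elif word in MATERIALS:
            labelled.append((token, "MAT"))
        else:
            labelled.append((token, "O"))
    return labelled


labelled = label_tokens("We found a flint axe in the pit.")
entities = [pair for pair in labelled if pair[1] != "O"]
print(entities)  # [('flint', 'MAT'), ('axe', 'ART')]
```

Such rules are transparent and need no training data, but every word that is missing from the lists is silently skipped, which is one reason why Machine Learning is often preferred once annotated data is available.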
The term Grey Literature is used to describe documents which are not published
in the traditional sense of the word by academic or commercial publishing houses,
such as field reports and theses. A lot of research in archaeological text mining is
focused on using this type of literature, as it is generally the most prevalent and the
least studied.

11.2.1 Named Entity Recognition
Named Entity Recognition is the process of finding different categories of named
entities (or concepts) in text. Quite often, the categories of entities are persons,
organisations, locations, time periods, and quantities, as defined in CoNLL-2002
(Conference on Natural Language Learning), the most used NER benchmark (Tjong
Kim Sang, 2002). For archaeology, these entities are not as relevant, except for time
periods and locations. Generally, archaeologists are interested in entity types such
as artefacts, materials, contexts, species, locations, and time periods. An example
sentence with marked archaeological named entities is shown in Fig. 11.1.
NER can be useful for a range of applications. In archaeology, it is mainly used to
automatically generate metadata, i.e. descriptions of data (Jeffrey et al., 2009; Byrne
& Klein, 2010; Vlachidis, 2012; Niccolucci & Richards, 2013; Vlachidis et al.,
2017). A lot of archaeological texts have limited or no metadata at all, and to be
able to find these texts for research, it is useful to have some description of the data,
such as which time periods, places, and artefacts are mentioned in the text. Instead of
using all the detected entities, often a selection is made of the most important ones to
serve as metadata. It is also possible to connect the detected entities to entries in thesauri, to further improve interoperability between data sets (Tudhope et al., 2011).

Fig. 11.1 Example sentence with named entities marked. Entity types have been shortened: MAT
= material, ART = artefact, PER = time period, and CON = context

Instead of using a selection of entities as metadata, it is also possible to index
all the entities in a search engine, together with the full text of the documents
(Brandsen et al., 2019, 2020). This makes it possible for researchers to do advanced
searches and find more relevant documents to their research. Recent research by
Brandsen and Lippok (2021) shows that using such an intelligent search engine
leads to more data and new insights. With either search on entities, or automatic
metadata generation, the goal is to make the data more FAIR (Findable, Accessible,
Interoperable and Reusable, Wilkinson et al., 2016).
Another approach that is made possible by entity extraction is pattern mining:
using algorithms to automatically extract meaningful patterns from data. In other
domains this has been researched extensively, but in archaeology it is quite rare.
One example of pattern mining is the work by Wilcke et al. (2019), but they worked
with hand-created XML (eXtensible Markup Language) data, not extracted entities.
Their results are perhaps a bit lacking, but this might be partly due to the small
amount of data. Once these methods are applied to thousands or millions of entities
extracted from big data, more meaningful patterns might emerge.

11.2.2 Document Classification
Unlike NER where extraction of entities is the goal, document classification aims
to assign one or more labels to a text. But similar to NER, this is often done to
create metadata (Brandsen & Koole, 2021). Another approach is to label documents
as relevant or irrelevant for a particular research question (Fischer et al., 2021). In
archaeology, the focus has mainly been on NER, so there are not many examples.
However, there are many possible applications, for example: determining whether
or not tweets or reviews are about (a particular kind of) archaeology, or classifying
a large amount of papers into certain categories for further study.
There are three variants of document classification:
1. Binary classification, each document is classified as either belonging, or not
belonging, to one class (is a document relevant or not?)
2. Multi-class classification, each document is classified as belonging to exactly one
of multiple classes (which time period is a document about?)
3. Multi-label classification, each document can be classified as belonging to one
or more classes (which subject(s) are discussed in this document?)
Generally, when assigning metadata, we would be looking for multiple classes
for each document, so multi-label classification is the most common.
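
The difference between the three variants lies mainly in the shape of the output. The sketch below makes this concrete with a deliberately simple keyword count instead of a trained model; the subject names and keyword lists are hypothetical and only serve as an illustration.

```python
import re

# Hypothetical subject keywords (assumption); a trained classifier would
# normally take the place of these hand-written lists.
SUBJECT_KEYWORDS = {
    "lithics": {"flint", "axe", "scraper"},
    "ceramics": {"pottery", "sherd", "vessel"},
    "dating": {"c14", "radiocarbon", "dendrochronology"},
}


def subject_scores(text):
    """Count how many keywords of each subject occur in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return {s: len(words & kws) for s, kws in SUBJECT_KEYWORDS.items()}


def is_relevant(text, subject):
    """Binary: the document belongs to this one class, or it does not."""
    return subject_scores(text)[subject] > 0


def multi_class(text):
    """Multi-class: exactly one subject, the one with the most keyword hits."""
    scores = subject_scores(text)
    return max(scores, key=scores.get)  # ties go to the first subject listed


def multi_label(text):
    """Multi-label: every subject with at least one keyword hit."""
    return sorted(s for s, n in subject_scores(text).items() if n > 0)


report = "The pit contained a flint axe, a flint scraper and one pottery sherd."
print(is_relevant(report, "dating"))  # False
print(multi_class(report))            # lithics
print(multi_label(report))            # ['ceramics', 'lithics']
```

The same document thus yields a yes/no answer, a single class, or a list of classes, depending on which variant is chosen.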
A particular type of document classification worth mentioning here is sentiment
analysis, a task where the goal is to determine whether a text is positive or negative
about a certain topic (Turney, 2002). This task is quite popular, receiving a lot of
research interest, mainly in eCommerce and social media settings, where finding
out whether a post or review is positive or negative is useful information. In an
archaeological setting, it has been used, for example, to study the reactions to the destruction
of heritage sites by ISIS (Cunliffe & Curini, 2018) and how tourists interact with
monuments (Paolanti et al., 2019).

11.2.3 Topic Modelling
Topic modelling is a Machine Learning technique that can be used to cluster a
collection of documents into groups, based on the word content of those documents.
It is an unsupervised Machine Learning technique, as it does not require data
annotated by humans (see Sect. 11.5.1). Because of this, it is a quick and easy way
to start analysing a corpus. However, it is difficult to get accurate or meaningful
results, which is why document classification is often more worthwhile. That being
said, some potential uses for topic modelling include automatically grouping papers
about a certain topic into subtopics to decide which will be manually read, and
investigating changes in language use (or even theoretical trends) in archaeological
literature over time and/or space (Plets et al., 2021; Jackson et al., 2020). An
example of a topic model is shown in Fig. 11.4.

11.2.4 Information Retrieval
Related to Information Extraction is Information Retrieval (IR): methods to retrieve
a set of documents based on a user defined query. In essence, IR is building search
engines. IR is a research area of its own, with conferences and journals dedicated to
the topic, and an in-depth discussion is beyond the scope of this chapter. However, it
is worth briefly discussing IR in the context of Information Extraction.
Finding relevant literature for research is of course a common problem across all
of science, and archaeology is no exception. But currently, most literature search is
done using metadata search: searching through the title, description, and sometimes
keywords manually entered by the author or archival service. Such metadata cannot
fully capture all the information present in a document, and as such relevant
information can be missed. More advanced search systems, including full-text
search and named entity search, have been explored to some extent (Paijmans &
Brandsen, 2010; Gibbs & Colley, 2012; Brandsen et al., 2019), but more research is
needed to create better search engines for archaeologists.
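
To give an impression of what full-text search involves at its simplest, the following sketch builds an inverted index: a mapping from each word to the documents that contain it. It is a toy example under our own assumptions (the report ids and texts are made up), and it leaves out everything a real search engine adds, such as ranking and entity-aware search.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical report ids and texts, for illustration only.
reports = {
    "report_1": "A Neolithic flint axe was found in a pit.",
    "report_2": "The ditch contained medieval pottery.",
    "report_3": "Neolithic pottery was recovered from the post hole.",
}
index = build_index(reports)
print(sorted(search(index, "neolithic pottery")))  # ['report_3']
```

A search on title and keywords alone would only find these reports if someone had entered "Neolithic" or "pottery" as metadata by hand.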

11.3 Previous Research on Information Extraction in the
Archaeology Domain
As mentioned by Richards et al. (2015), archaeological texts have excellent potential
for text mining, due to their relatively well-controlled vocabulary. Much work has
gone into producing thesauri (controlled word lists) in multiple languages (Gilman
& Newman, 2007; Brandt et al., 1992), which we can leverage to extract information
from text. In the last fifteen years, a range of projects have been undertaken which
have attempted to use text mining within archaeology, starting with rule-based
methods, and gradually moving towards Machine Learning based methods. In this
section, we provide a brief overview of these text mining studies.
Amrani et al. (2008) created a workflow allowing archaeologists to extract
information from English texts, but in a quite specialised way on a small collection.
At the same time, the OpenBoek project (Paijmans & Brandsen, 2010) used
Machine Learning to automatically label time periods and locations in Dutch field
reports, which were searchable together with the full text in a web application.
Byrne and Klein (2010) experimented with extracting archaeological events and
converting them to RDF (Resource Description Framework) triples, to increase the
interconnectivity between data sets from different sources.
The Archaeotools project used a combination of rule-based and Machine Learning approaches to automatically generate location, time period, and subject metadata
for a small selection of reports. This generated metadata could then be used for
searching in a faceted interface (Jeffrey et al., 2009). In the OPTIMA project,
Vlachidis (2012) applied rule-based techniques to perform NER and express entities
in the CIDOC-CRM schema [1]. The output of this research was further built upon
in the STAR and STELLAR projects, where Tudhope et al. (2011) created a search
demonstrator which searches through extracted entities from text and five excavation
databases at the same time.
As part of the international ARIADNE project, some experiments were undertaken with NLP on grey literature. The ADS (Archaeology Data Service) in the
UK created a prototype web application which uses NER to automatically create
metadata for English reports, and experimented with rule-based NER for Dutch and
Swedish reports as well (Vlachidis et al., 2017).

1 The International Documentation Committee—Conceptual Reference Model (a way to model
information) for cultural heritage and museum documentation, as defined by the International
Committee for Documentation (CIDOC) (2014).

In her Master’s thesis, Talboom (2017) specifically targeted zooarchaeological
entities in reports, using Machine Learning to perform NER. Building on her work,
Talks (2019) added more entity types and did an extensive evaluation with users.
Very recently, Fischer et al. (2021) used text mining as part of their research on
ruralisation in the Netherlands. They created a term document matrix and compared
this with a list of keywords related to the topic of ruralisation, to assess the
usefulness of a large number of reports for a number of topics.
In a slightly different direction, Plets et al. (2021) describe research on grey
literature from Belgium, looking at theoretical trends over time. They successfully
used text mining to find these trends and chart the decrease in text
quality due to developer-led archaeology. Similarly, Jackson et al. (2020) used topic
modelling techniques on English data to see if there are patterned ways in which
archaeologists write about osteology.
In the Netherlands, the AGNES (Archaeological Grey literature Named Entity
Search) project has been working to create a search engine for Dutch excavation
reports, which leverages Machine Learning NER to make more efficient and detailed
search possible (Brandsen et al., 2019). This project will be extended to also include
English and German documents, and include more document types (such as books,
papers, etc) over the next four years.
From this overview it is evident that there is a clear focus on grey literature, presumably due to its ubiquity and potential for Information Extraction.
A lot of research also focuses on making data more FAIR (Wilkinson et al.,
2016), by automatically creating more metadata, by building search engines, and
by expressing unstructured text information in machine-readable formats to
increase interoperability.
Generally, the aim is to assist archaeologists in their research by making big data
sets that are difficult to navigate more manageable and searchable. The hope is that
by harnessing computer power to analyse and summarise big data, we can do better
synthesising research at large scales, leading to a better view of the past.

11.4 Preprocessing
As mentioned in the introduction, text is unstructured data. This means that there
is no external structure added to the data which allows computers to easily process
it. For example, in a database table, each number or string is stored in a cell, which
is in a specific column and row. This column/row structure allows computers to
‘understand’ the data and perform analyses. Humans can easily make sense of text
by reading it, because we have an incredible amount of background knowledge: we
know the world we exist in, we know which words describe which concepts in that
world, we know the language the text is written in, and we have the ability to read
words and process them into meaningful information in our minds.
However, to computers, text is just a sequence of individual symbols with no
inherent meaning, and they do not know the language or the concepts in the real
world that the text describes. This makes it a lot more difficult to work with text
data than it is to work with structured data. To convert text into a format where a
computer can work with it, we need to do some preprocessing. During this process,
we can also help our analyses by excluding or transforming words (further detailed
below). While preprocessing is not the most exciting part of a text analysis, the
choices made in this part of the process can make big differences in the outcome of
an analysis. In addition, it does tend to take up a substantial amount of time: often
more time is spent on defining and fine-tuning the preprocessing methods than on
the actual analysis itself. In the next couple of sections, an overview is given of
common preprocessing tasks, how to perform them, and what effect they (can) have
on the results of an analysis.

Note
Most of the software we reference in this chapter is Linux and Python based,
but all steps and methods can be done with other software on other platforms
as well.

11.4.1 Converting to Plain Text
For many data sets, the first step is to convert the files to plain text (.txt files). Most
often, text data sets in archaeology are collections of PDF (Portable Document
Format) or Microsoft Word files. These formats are not ideal for computational approaches,
as they also encode style information (among other details), which we
generally do not need in our analyses and which just causes unwanted noise.
A lot of tools exist to convert PDF files to plain text. Commonly used tools are
pdftotext [2], a command-line utility for Linux distributions, and the PDFMiner [3]
and PyPDF2 [4] packages in Python.
For Word files, the most used tools are docx2txt [5] and the Python library
textract [6], which can extract text from a range of file formats, also including
image and sound files. Choosing which tool is best for a use case depends on the
end goal of the analysis, and which software you are using, but any tool that creates
plain text should be sufficient.
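
As an illustration, the sketch below calls pdftotext from Python for every PDF in a folder. It assumes that pdftotext is installed and on the PATH; the function names and the example path are our own and not part of any package, and the -enc option is used to request UTF-8 output.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path):
    """Build the pdftotext call that writes a UTF-8 .txt file next to the PDF."""
    pdf_path = Path(pdf_path)
    txt_path = pdf_path.with_suffix(".txt")
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)]


def convert_folder(folder):
    """Convert every PDF in a folder and return the text files written."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext is not installed or not on the PATH")
    written = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        command = pdftotext_command(pdf_path)
        subprocess.run(command, check=True)  # raises if the conversion fails
        written.append(Path(command[-1]))
    return written


# Hypothetical path, shown only to illustrate the command that would be run.
print(pdftotext_command("reports/example.pdf"))
```

Calling convert_folder on a directory of reports would then leave one .txt file next to each PDF, ready for the next preprocessing steps.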

2 https://www.xpdfreader.com/pdftotext-man.html.
3 https://pdfminersix.readthedocs.io/.
4 https://pypi.org/project/PyPDF2/.
5 http://docx2txt.sourceforge.net/.
6 https://textract.readthedocs.io/en/stable/.

11.4.2 Optical Character Recognition
Most documents we deal with nowadays are ‘born digital’, which means they were
created using computer software. Born digital documents will have the text encoded
as actual characters which we can extract using the methods mentioned above.
However, some files will be scanned pictures of existing hard copy documents; this
is mainly the case for older documents (before the 2000s). In this case, the file does
not contain actual computer readable characters, but just a grid of pixels in varying
colours as far as the computer is concerned. To extract computer-readable text, the
process of Optical Character Recognition (OCR) is needed (Merali & Smith, 1985).
This method ‘reads’ the image of the text, and uses pattern matching and/or Machine
Learning methods to translate it into machine-readable text. OCR is never 100%
accurate, and as such you should expect noise to be introduced in this phase, with
the level of noise largely dependent on the quality of the original print and the quality
of the scans. But once the computer readable text is available, we can continue with
the rest of the preprocessing.

11.4.3 Sentence Boundary Detection
Most methods and analyses require one sentence per line in the text file, but often
this is not what the plain text conversion provides. The first step is to do sentence
boundary detection (also called sentence boundary disambiguation): automatically
detecting where sentences begin and end (Riley, 1989). This might seem trivial,
as sentences are normally ended by a full stop, exclamation mark or question mark,
but in practice this is quite challenging due to the potential ambiguity of punctuation
marks. A full stop for example, can be a part of an abbreviation, an email address, or
be a decimal point, all instances where we should not end a sentence. The following
sentences illustrate the problem:

We found a Neolithic(?) flint axe in pit no. 2, but didn’t find any pottery
. An adjacent post hole yielded enough charcoal for a C14 dating.

Here, a number of potential problems are highlighted: the question mark after
“Neolithic” and the full stop after “pit no” are not the ends of the sentence. Also note
the full stop on the next line: this is not a typo, but a common occurrence in text
created by PDF conversion and/or OCR. The correct sentence split is on the full
stop after “pottery”:

We found a Neolithic(?) flint axe in pit no. 2, but didn’t find any pottery.
An adjacent post hole yielded enough charcoal for a C14 dating.

Sentence boundary detection is mainly done by using rules of varying complexity, but can also be tackled by Machine Learning. In Python, the most commonly
used method is the NLTK (Natural Language ToolKit) package, which also performs
a large number of other NLP tasks (Bird et al., 2009).
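
To show what such rules can look like, here is a minimal rule-based splitter written with the Python standard library only. It is a sketch under our own assumptions, not the algorithm used by NLTK: a sentence ends at a full stop, exclamation mark or question mark that is followed by whitespace and a capital letter, unless the full stop belongs to a word in a small, hypothetical abbreviation list.

```python
import re

# Hypothetical abbreviation list (assumption); extend it for your own corpus.
ABBREVIATIONS = {"no", "fig", "e.g", "i.e", "cf", "ca"}

# Candidate boundary: ., ! or ? followed by whitespace and an uppercase letter.
BOUNDARY = re.compile(r"[.!?](?=\s+[A-Z])")


def split_sentences(text):
    """Split text into sentences using two simple hand-written rules."""
    text = " ".join(text.split())  # undo hard line breaks from PDF conversion
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        words_before = text[start:match.start()].split()
        last_word = words_before[-1].lower() if words_before else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # the full stop is part of an abbreviation
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


example = ("We found a Neolithic(?) flint axe in pit no. 2, but didn't find any pottery.\n"
           "An adjacent post hole yielded enough charcoal for a C14 dating.")
for sentence in split_sentences(example):
    print(sentence)
```

On the example this prints two lines, with the split after "pottery". Real corpora contain many more exceptions than these two rules cover, which is why an established package is usually the better choice.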

Note
The first sentence in the box above will be used to illustrate all the following
steps, to give a view of the full process.

11.4.4 Tokenisation
As mentioned earlier in this section, computers see text as a sequence of
symbols with no inherent meaning. This also means that computers do not know
what words are, or how to distinguish where a word starts and ends. To convert
a sentence into a sequence of words, we use tokenisation, which returns a list of
tokens. Tokens are similar to words, and a token often is a word, but not always. A
token is defined as an instance of a sequence of characters that are grouped together
as a useful unit for processing (Manning et al., 2008). This difference between words
and tokens can be illustrated by tokenising our example sentence:

We found a Neolithic ( ? ) flint axe in pit no . 2 , but did n’t find any pottery .

In this example, most of the tokens are indeed words, but punctuation marks
have also become individual tokens and “didn’t” has been converted to two separate
tokens. This tokenisation process is important as it removes noise from words (such
as the brackets and question mark after ‘Neolithic’) and turns sentences into chunks
of information that can be processed further.
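
A tokeniser of this kind can be approximated with a single regular expression. The sketch below is our own simplification and not the tokeniser of any particular package: it separates punctuation marks from words and splits contractions ending in "n't" into two tokens.

```python
import re

# The alternatives are tried in order: the contraction "n't" itself, a word
# that is directly followed by "n't", any other run of word characters, and
# finally any single punctuation mark.
TOKEN = re.compile(r"n['’]t\b|\w+(?=n['’]t\b)|\w+|[^\w\s]")


def tokenise(sentence):
    """Return the list of tokens in a sentence."""
    return TOKEN.findall(sentence)


sentence = "We found a Neolithic(?) flint axe in pit no. 2, but didn't find any pottery."
print(" ".join(tokenise(sentence)))
# We found a Neolithic ( ? ) flint axe in pit no . 2 , but did n't find any pottery .
```

Established tokenisers handle many more cases (URLs, hyphenated words, decimal numbers), so this pattern is best seen as a way to understand the principle rather than as a replacement.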

11.4.5 Normalisation
Once we have a list of tokens, we can normalise and clean the text. There are a lot of
different methods that can be applied at this stage; the most common
steps are discussed here: lowercasing, removing words, stripping characters, stemming,
and lemmatisation.

Important
All normalisation preprocessing steps described below can affect the end
result of the analysis both positively and negatively. Depending on the data,
the methods used and the end goal, each normalisation technique should be
individually considered.

11.4.5.1 Lowercasing

This is pretty much what the title suggests: changing all uppercase characters to
their respective lowercase versions. Lowercasing is useful for most analyses, as it
decreases the number of different tokens in your data set, and merges the uppercase
and lowercase versions of a token into one. This intuitively makes sense as there is
no semantic difference between e.g. “Axe” and “axe”, but to a computer, these are
two different strings, and will be analysed separately.
There are some exceptions in which it is better to keep the uppercase
characters. A good example is Named Entity Recognition (NER), a method for
automatically finding and labelling certain concepts such as person names and place
names. To be able to recognise such a name, having the casing intact is useful, as
names will most often be capitalised, making it easier to distinguish between the last
name “Flint” and the material “flint”. Lowercasing is a common function in most
text analysis software and programming languages, e.g. in Python the lower()
string method can be used. Here is our example sentence with lowercasing applied:

we found a neolithic ( ? ) flint axe in pit no . 2 , but did n’t find any pottery .
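A minimal Python sketch of this step, assuming the sentence has already been tokenised into a list (the variable names are illustrative):

```python
# Tokens of the example sentence, as produced by the tokenisation step.
tokens = ["We", "found", "a", "Neolithic", "(", "?", ")", "flint", "axe",
          "in", "pit", "no", ".", "2", ",", "but", "did", "n’t", "find",
          "any", "pottery", "."]

# str.lower() returns a lowercased copy of each token.
lowercased = [token.lower() for token in tokens]
print(" ".join(lowercased))

# Casing can carry information: after lowercasing, the surname "Flint"
# and the material "flint" become the same string.
print("Flint".lower() == "flint")  # True
```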

11.4.5.2 Removing Words
Quite often, certain words are uninformative for an analysis, and removing them
will reduce noise: dropping words that carry little meaning gives more weight to
the words that do. Besides this effect, removing common words also reduces the size
of the data, and thus reduces the training time of Machine Learning algorithms.
A method that is often used is to remove so-called ‘stop words’. Stop words
are the most common words in a language, like articles, prepositions, pronouns,
conjunctions, etc. Some examples in English include “the”, “a”, “so”, “is” and
“that”. The words that should be deleted are specified in a manually compiled stop
word list. Luckily, most, if not all, text analysis software provides such a list for
English, and often many more languages too. Here, we have used the NLTK stop
word list to remove them from our example:

found neolithic ( ? ) flint axe pit . 2 , find pottery .

Another way to reduce the total number of different tokens is to remove the
n most common tokens, which is very similar to removing a predefined list of stop
words. Conversely, it is also possible to remove tokens that occur at most
n times in the data set, with n often being a number between 1 and 3. This way we
remove tokens that are uncommon, and thus uninformative for some tasks.
Do keep in mind that for certain types of analyses, having stop words or
uncommon words in your data can be useful. An example is sentiment analysis,
where words like “not” are indicative of a negative sentiment.
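Both kinds of removal can be sketched in a few lines of Python. The stop word set below is a hypothetical, hand-picked list that only covers this sentence (it is not the NLTK list), and `remove_rare` is an illustrative helper name:

```python
from collections import Counter

tokens = ["we", "found", "a", "neolithic", "(", "?", ")", "flint", "axe",
          "in", "pit", "no", ".", "2", ",", "but", "did", "n’t", "find",
          "any", "pottery", "."]

# Hypothetical mini stop word list, hand-picked for this sentence only;
# a real analysis would load a full list from a text analysis library.
stop_words = {"we", "a", "in", "no", "but", "did", "n’t", "any"}

without_stop_words = [t for t in tokens if t not in stop_words]
print(" ".join(without_stop_words))

def remove_rare(docs, max_count=1):
    """Drop tokens that occur max_count times or fewer across all docs."""
    counts = Counter(token for doc in docs for token in doc)
    return [[t for t in doc if counts[t] > max_count] for doc in docs]

# Two hypothetical preprocessed documents.
docs = [["flint", "axe", "pit"], ["flint", "pottery", "pit"]]
print(remove_rare(docs))  # [['flint', 'pit'], ['flint', 'pit']]
```

Whether rare tokens should really be dropped depends on the task: in a small corpus, a token that occurs once may be exactly the find type you are looking for.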

11.4.5.3 Stripping Characters
Besides removing words, we can also remove other types of tokens, such as punctuation, numbers and symbols. Doing all three on our example sentence leads to:

found neolithic flint axe pit find pottery

Quite often these types of tokens are not informative, but there are exceptions. If
you are trying to find C14 dates in text, removing all symbols will also remove “±”,
which is a very strong indicator of a C14 date in archaeological texts.
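As a rough illustration, the sketch below strips every token that is not purely alphabetic, and then shows why this can hurt when looking for radiocarbon dates. The regular expression and the example sentence are assumptions made for this sketch, not a pattern taken from an existing tool:

```python
import re

tokens = ["found", "neolithic", "(", "?", ")", "flint", "axe", "pit",
          ".", "2", ",", "find", "pottery", "."]

# Keep only tokens made up entirely of letters; this drops punctuation,
# numbers and symbols in one pass.
stripped = [t for t in tokens if t.isalpha()]
print(" ".join(stripped))  # found neolithic flint axe pit find pottery

# Illustrative pattern for a radiocarbon date such as "3450 ± 40 BP".
c14_pattern = re.compile(r"\d+\s*±\s*\d+\s*BP")
text = "Charcoal from the pit was dated to 3450 ± 40 BP."
print(c14_pattern.findall(text))                   # ['3450 ± 40 BP']
print(c14_pattern.findall(text.replace("±", "")))  # []
```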

11.4.5.4 Stemming
Stemming is the process of reducing words to their stem, i.e. removing the suffix of
the word. For example, “house”, “houses” and “housing” all have the same stem:
“hous”. As you can see, the stem does not need to be an actual word, although it
often is. It is sufficient if all related words are reduced to the same stem (even if
that stem is not a word), as to a computer there is no semantic difference. Stemming
groups related words into one representation, again reducing the variety in the data.
If we apply stemming using the Porter stemmer (Porter, 1980) from NLTK to our
example sentence, we end up with:

found neolith flint axe pit find potteri

While stemming in general does reduce the variety of tokens, in this case it has
not: “found” and “find” have been assigned separate stems, while really they have a
very similar meaning. Stemming does not take into account the actual meaning of a
word, but uses rules to remove suffixes.
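To make the idea of rule-based suffix removal concrete, here is a deliberately tiny stemmer. It is not the Porter algorithm: the suffix list and the minimum stem length are hypothetical choices that happen to work for the examples in this section.

```python
# Toy suffix-stripping stemmer. This is NOT the Porter algorithm: the
# suffix list below is a hypothetical, hand-picked rule set.
SUFFIXES = ("ing", "es", "s", "e")

def toy_stem(word):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["house", "houses", "housing"]])
# ['hous', 'hous', 'hous']

# Irregular verb forms share no suffix rule, so they stay apart.
print(toy_stem("found"), toy_stem("find"))  # found find
```

A real stemmer applies many more rules in several passes, but the limitation is the same: without knowledge of meaning, “found” and “find” cannot be merged.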

11.4.5.5 Lemmatisation
Lemmatisation is similar to stemming, but a bit more advanced. It reduces a word
not to its stem, but to its lemma: the dictionary form of a word. Instead of chopping
off a word’s suffix, it uses linguistic features to determine the Part Of Speech (POS)
and semantic meaning of a word, and subsequently finds the corresponding lemma.
This means that the lemma of “axing” is “ax” and the lemma of “axe” is “axe”,
indicating the semantic difference between the two. If we use lemmatisation instead
of stemming, our example sentence looks like this:

find neolithic flint axe pit find pottery

Here we see that “found” and “find” are both assigned the same lemma:
“find”, unlike with stemming. Depending on your application, either stemming
or lemmatisation can be more appropriate, but something to keep in mind is that
lemmatisation is a more difficult task than stemming, and as such can be more error-prone.

11.4.5.6 Normalisation and Information Loss
As already indicated with examples for e.g. lowercasing, stripping characters, and
removing words, not all normalisation techniques are useful for every analysis. This
is because the goal of normalisation is to reduce complexity, i.e. to simplify the data.
However, when data is simplified, some information is lost. The trick is
finding a balance between normalising the text to such an extent that classifiers can
more easily learn statistical patterns, while not removing any information that might
be useful for that classifier.

Generally, there are certain types of preprocessing that are commonly used for
each type of analysis, but every data set is different and requires thorough
consideration: inspect the data and compare the effect of different normalisation
steps. In archaeology,
we often deal with particular types of fuzziness and ambiguity in our data, when
compared to other domains. This means that when working with archaeological
data, careful consideration is needed from both the computer science side and
the archaeology side, to make optimal choices regarding normalisation and other
steps in the development of text mining tools.

11.4.6 Adding Structure
At this point we have preprocessed our text, and we are at the final step before we
can start our analysis: adding structure, so a computer can do something with our
data. The easiest way to do this is the so-called Bag of Words (BoW) approach
(Manning et al., 2009). Here we simply create a table with a column for each word,
and each row representing a sentence (or document). In the cells, we store how often
a word occurs in each sentence. This word count is called Term Frequency (TF). See
Table 11.1 for an example using the two sentences we introduced in Sect. 11.4.3.
Note that the order of the words in the original sentences is lost. This is why it
is called a Bag of Words: all the words end up in a ‘bag’, shuffled and without
ordering. Most text analysis software will do this data transformation automatically.
While here the BoW is represented as a table for clarity, in reality each sentence
is stored as a vector: a list of numbers. In Python, this would look like:
[0, 1, 0, 0, 0, 2, 1, 0, 1, 1, 0, 1, 0]
At this point, the computer does not know which TF stands for which word,
because it does not need this information. Based purely on the vectors of a large
number of sentences (or documents), it can extract statistical relationships and
make predictions based on those. For example, if we are interested in automatically
finding sentences about the Neolithic, an algorithm would infer that if the TF of
‘neolithic’ is not 0, it has the label Neolithic. Of course this is not a great example
as just looking for the term ‘neolithic’ would be enough to find that out, but
relationships between other (less literal) words can also be used to make predictions.
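The transformation into a Bag of Words can be sketched with the Python standard library alone. The first token list follows the lemmatised example sentence; the second is reconstructed from the counts in Table 11.1, so its token order is an assumption, and `to_vector` is an illustrative helper name:

```python
from collections import Counter

# Preprocessed tokens of the two example sentences. The second list is
# reconstructed from the counts in Table 11.1; its order is an assumption.
sentences = [
    ["find", "neolithic", "flint", "axe", "pit", "find", "pottery"],
    ["post", "hole", "adjacent", "yield", "charcoal", "c14", "dating"],
]

# The vocabulary fixes which vector position belongs to which word.
vocabulary = sorted({token for sentence in sentences for token in sentence})

def to_vector(tokens, vocabulary):
    """Return the term frequency of every vocabulary word in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vectors = [to_vector(sentence, vocabulary) for sentence in sentences]
print(vocabulary)
print(vectors[0])  # [0, 1, 0, 0, 0, 2, 1, 0, 1, 1, 0, 1, 0]
```

Only the vocabulary links positions back to words; the vectors themselves are plain lists of counts.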
Table 11.1 Example of two sentences converted into a Bag of Words table, after preprocessing

Sentence  Adjacent  Axe  c14  Charcoal  Dating  Find  Flint  Hole  Neolithic  Pit  Post  Pottery  Yield
1         0         1    0    0         0       2     1      0     1          1    0     1        0
2         1         0    1    1         1       0     0      1     0          0    1     0        1

11.4.6.1 Term Frequency and Inverse Document Frequency
In the above example, we used the Term Frequency to create the vector. While this
is an easy way to create a vector, it is not always ideal. Some words are simply
more frequent in general, but that does not mean they are actually more important
or relevant. To counteract this problem, we can use the Term Frequency–Inverse
Document Frequency (TF-IDF), which lowers the value if a word occurs in many
documents (Manning et al., 2009). TF-IDF is currently the most used statistical
measure for information retrieval and text mining (Beel et al., 2016).
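A small worked example may help here. The sketch below uses one common, unsmoothed variant of the weighting (term frequency multiplied by the natural logarithm of the number of documents divided by the document frequency); libraries often apply smoothed or normalised variants, so their numbers will differ. The three-document corpus is hypothetical.

```python
import math

# Hypothetical mini corpus of three preprocessed documents.
docs = [
    ["flint", "axe", "flint"],
    ["flint", "pottery"],
    ["flint", "charcoal", "charcoal"],
]

def tf_idf(term, doc, docs):
    """Term frequency in doc, scaled by log(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

# "flint" is frequent but occurs in every document, so its weight is 0.
print(tf_idf("flint", docs[0], docs))     # 0.0
# "charcoal" occurs in one document only, so it keeps a high weight.
print(tf_idf("charcoal", docs[2], docs))  # 2 * ln(3), about 2.197
```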

11.4.7 Selecting Preprocessing Steps
All the preprocessing steps discussed here have different effects on the eventual
input data for Machine Learning, and can greatly affect the outcome. It is always
worth considering which steps will help for a particular analysis, as not all steps are
always applicable.
In general though, when doing document classification and topic modelling, most
of these steps will help increase the performance, as they decrease the variety in the
text and group different forms of semantically similar words together, making it
easier to generalise over the data. On the other hand, for NER (and also e.g. word
embeddings, see Sect. 11.5.5), it is wise to only perform sentence detection and
tokenisation and none of the normalisation steps, as differences in e.g. casing and
symbols can be key indicators for entities. Lastly, another option is to simply try all
possible combinations of preprocessing steps in a brute force method, and select the
best performing combinations (Brandsen & Koole, 2021).

11.5 Machine Learning
Once the data has been selected, preprocessed, and converted into the right format,
the actual analysis can be performed, e.g. NER or document classification. Most
information extraction methods used today are based on Machine Learning (ML),
a subfield of Artificial Intelligence. ML can be defined as the study of algorithms
that automatically improve through experience (Mitchell, 1997). This means that
these algorithms can build models based on training (or sample) data, without being
programmed by a human to do so, in this way ‘learning’ by themselves how to
predict labels for unseen data. Machine Learning is ubiquitous in modern life, being
used in everything from predicting the spam status of emails to preventing traffic
accidents in cars by automatically detecting obstacles.
Within archaeological research, ML is also becoming more popular, and is being
used for a wide range of problems. Some examples are the automatic detection of
archaeological features in LiDAR (Light Detection And Ranging) data (Verschoof-van der Vaart et al., 2020; Trier et al., 2018), classification of pottery types
based on photos (Gualandi et al., 2021; Pawlowicz & Downum, 2021), analysing
projectile point typology (Nash & Prewitt, 2016), and differentiating between lithic
assemblages (Grove & Blinkhorn, 2020). For a more in depth overview of ML in
archaeology and cultural heritage, see Bickler (2021) and Fiorucci et al. (2020).
Machine Learning has also been applied to textual data, both modern and
ancient. Some examples of the analysis of ancient texts are the translation of
cuneiform script using an app (Sanders, 2018) and the reconstruction of missing
pieces of ancient Greek text (Sommerschield, 2020). But mostly, ML is used to
analyse modern texts about archaeology: e.g. books, papers, theses, and field reports
written by archaeologists in the last couple of decades. Some examples include
codifying semantically consistent definitions of archaeological concepts (Davis,
2020), Named Entity Recognition (Paijmans & Brandsen, 2010; Vlachidis et al.,
2017; Tudhope et al., 2011; Talboom, 2017; Brandsen et al., 2019; Vlachidis et al.,
2021), classifying reports on time period, location and/or subject (Jeffrey et al.,
2009; Brandsen & Koole, 2021), topic modelling (Jackson et al., 2020), creating a
list of relevant documents for certain topics (Fischer et al., 2021) and investigating
theoretical trends over time (Plets et al., 2021).
Machine Learning is often juxtaposed with rule-based approaches: methods
where a researcher defines a set of rules by hand, which are used to predict labels.
These rule-based methods have been successful in many cases, but we see that ML
approaches are being used more and more, as they are generally more effective at
learning patterns in complex data (Richards et al., 2015; Bickler, 2021). This is also
why this chapter will mainly focus on ML methods. That being said, rule-based
approaches still have a place in current research, especially for problems where
there is not a lot of training data, and can be used together with ML methods in
many cases.

11.5.1 Supervised and Unsupervised Learning
Machine Learning can be subdivided into two main types: supervised and unsupervised learning. The difference is that supervised learning uses data that has been
labelled by humans, while unsupervised learning uses raw, unlabelled data. In effect,
supervised learning is where an algorithm learns patterns between the raw data and
true labels, while unsupervised learning detects patterns in the raw data itself.
To give an example of supervised learning in text, we can take the automatic
labelling of papers with topics. Imagine a stack of thousands of archaeology papers
with no information on topic (no assigned keywords in the metadata). It would
be useful to know the topic, so we can make a selection of which papers to read.
It is possible to create a classifier model to predict the topic (or class) of a paper,
by feeding a supervised Machine Learning algorithm a collection of data that has
been labelled by an archaeologist. See Table 11.2 for a simplified example with two
possible subjects: Neolithic or Bronze Age. The first four rows are training data,
with a label assigned by a human. The ‘Content’ column contains the titles of the
papers, preprocessed as discussed in Sect. 11.4.

Table 11.2 Simplified example of Machine Learning document classification, with four human
labelled training examples and one unlabelled document in the bottom row

Type        Content                                  Class (or label)
Training    Flint domestication use wear analysis    Neolithic
Training    Knapping flint wheat harvest             Neolithic
Training    Flint bronze sickle knapping             Bronze Age
Training    Bronze axe wood use wear analysis        Bronze Age
Prediction  Domestication flint knapping microscope  ???

Table 11.3 Example prediction based on how often terms occur in a class in the training data.
The percentages show in which proportion of documents from a class this term occurs. The label
‘Neolithic’ can be assigned with a 66.6% certainty

Term           Neolithic %  Bronze Age %
Domestication  50%          0%
Flint          100%         50%
Knapping       50%          50%
Microscope     n/a          n/a
Average        66.6%        33.3%

Try it Yourself
Based on the information in this table, a human would be able to predict the
label of the last row. If you want to do some human brain powered ‘machine’
learning, you can try it yourself: which label do you predict for the last row?

By reading the words in the examples, humans can figure out that the terms
“domestication”, “flint”, and “knapping” are indicators that the predicted label
should be “Neolithic”, even if they have no prior knowledge of archaeology.
Computers can do this too, but mathematically. Imagine each term being assigned a
score between 0 and 1, based on which documents the terms occur in. A score of 0
means it only occurs in Neolithic, a score of 1 means it only occurs in Bronze Age,
and a score between 0 and 1 means it occurs in both to some degree. For a new,
unlabelled document, we can then calculate the average of all the term scores and
predict a label based on whether it is above or below 0.5. This process is illustrated
in Table 11.3.
For each term, we calculate in what percentage of documents it occurs for
each label, and then average those scores to get a final prediction. The term
“domestication” occurs in 50% of Neolithic documents, and not at all in Bronze Age
documents, meaning it is an indicator (or feature) of a document belonging to the
Neolithic class. “flint” occurs in both classes but more in Neolithic and “knapping”
occurs in both equally, meaning it does not indicate either class. Then finally,
“microscope” does not occur in either class, so also does not affect the classification.

Table 11.4 Simplified example of four documents, with term frequencies for the terms ‘flint’ and
‘bronze’

Document number  Document content                   Flint TF  Bronze TF
1                Flint bronze flint bronze flint    3         2
2                Flint flint flint flint            4         0
3                Bronze flint bronze bronze bronze  1         4
4                Bronze bronze bronze               0         3

When the scores are averaged, we can see that the label “Neolithic” is predicted
with 66.6% certainty. Of course, this is a very simplified model of classification, but
should give an insight into how Machine Learning algorithms deal with text data. In
real world examples, there are often many more possible labels, many more terms
to take into account, and possibly bias due to differences in document size, all of
which complicate matters.
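The calculation behind Table 11.3 can be reproduced in a few lines of Python. This is a sketch of the simplified scoring scheme described above, not a real classifier; the function and variable names are illustrative.

```python
# Training titles and labels from Table 11.2.
training = [
    (["flint", "domestication", "use", "wear", "analysis"], "Neolithic"),
    (["knapping", "flint", "wheat", "harvest"], "Neolithic"),
    (["flint", "bronze", "sickle", "knapping"], "Bronze Age"),
    (["bronze", "axe", "wood", "use", "wear", "analysis"], "Bronze Age"),
]
new_document = ["domestication", "flint", "knapping", "microscope"]

def share_of_documents(term, label):
    """Proportion of training documents with this label that contain term."""
    docs = [tokens for tokens, doc_label in training if doc_label == label]
    return sum(term in tokens for tokens in docs) / len(docs)

labels = ["Neolithic", "Bronze Age"]
# Terms that never occur in the training data ("microscope") are skipped.
seen = {term for tokens, _ in training for term in tokens}
known_terms = [t for t in new_document if t in seen]

scores = {
    label: sum(share_of_documents(t, label) for t in known_terms) / len(known_terms)
    for label in labels
}
prediction = max(scores, key=scores.get)
print(scores)      # Neolithic about 0.667, Bronze Age about 0.333
print(prediction)  # Neolithic
```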
Unsupervised learning does not use any labelled data, but can still make
subdivisions in data. In essence, most unsupervised learning methods are some
variation of a clustering algorithm. Of course, archaeologists are very familiar with
clustering algorithms, and we have been using these methods for at least 40 years
(Doran & Hodson, 1975). Some examples include geospatial clustering of finds
(Bogdanovic, 2015) and clustering artefacts into a typology (Gilboa et al., 2004). It
is possible to do the same with text data, after transforming the text into a vector (as
discussed in Sect. 11.4.6). The most used unsupervised learning technique for
archaeological texts is topic modelling: automatically creating a number of clusters,
each with a certain topic, defined by which words are most frequent in that cluster.
A simplified example is provided in Table 11.4, where four documents are
preprocessed to only contain the terms ‘flint’ and ‘bronze’, and the term frequencies
are shown for each. At this point, the documents have been vectorised: for each
document there is a vector with two dimensions (the dimensions being flint and
bronze). This can also be expressed as a list of vectors (here displayed in Python
syntax):
{
    1: [3, 2],
    2: [4, 0],
    3: [1, 4],
    4: [0, 3]
}
For each document number, there is a corresponding list containing two numbers
(a two-dimensional vector). By treating these numbers as x and y values, we can
easily plot this as a scatter plot to visualise the data: see Fig. 11.2. Here, the
document vectors are plotted in two-dimensional vector space, and an algorithm has
been applied to cluster the points into two groups based on their position in the plot.
In essence, this is how clustering text data works, although normally the vectors used
have more than two dimensions, often hundreds or even thousands, which makes
these high-dimensional vector spaces difficult to illustrate intuitively (see footnote 7). The group
label and colour have been manually assigned, and this is an important point: any
clustering algorithm will return a number of clusters, but it cannot assign a label.
This has to be done manually afterwards by inspecting the data.

Fig. 11.2 Scatter plot of the data from Table 11.4. Points have been clustered and assigned a label
and colour
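The clustering step can be sketched with a bare-bones k-means loop on the four vectors from Table 11.4. The starting centroids are fixed here (an assumption made so that the result is repeatable), and the cluster numbers 0 and 1 that come out carry no meaning of their own.

```python
# Document vectors from Table 11.4: [flint TF, bronze TF].
vectors = {1: [3, 2], 2: [4, 0], 3: [1, 4], 4: [0, 3]}

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(values) / len(points) for values in zip(*points)]

def k_means(vectors, centroids, iterations=10):
    """Assign each vector to its nearest centroid, then move the centroids."""
    assignment = {}
    for _ in range(iterations):
        assignment = {
            doc: min(range(len(centroids)),
                     key=lambda i: squared_distance(vec, centroids[i]))
            for doc, vec in vectors.items()
        }
        for i in range(len(centroids)):
            members = [vectors[d] for d, c in assignment.items() if c == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return assignment

# Fixed starting centroids (an assumption made here for repeatability).
clusters = k_means(vectors, centroids=[[4, 0], [0, 3]])
print(clusters)  # {1: 0, 2: 0, 3: 1, 4: 1}
```

Deciding that cluster 0 stands for the flint-heavy documents and cluster 1 for the bronze-heavy ones is exactly the manual labelling step.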
In this example, it was very easy to assign topic labels, as there are two
well-defined groups with different content, but this is not always the case. An
example is the work by Plets et al. (2021), who used topic modelling to try and
detect changes in theoretical thought in archaeology over time. Unfortunately, the
clusters presented by the algorithm could not be assigned to different schools
of thought. Another problem with (some) clustering algorithms is that they are
non-deterministic, i.e. running the same analysis on the same data with the same
settings can produce different results on each run. The size of the difference can
be small or substantial, and any conclusion based on the method will have to take
this into account.
As unsupervised learning does not provide actual labels for our data, it is not
often used. Therefore, the rest of this chapter will mainly focus on the characteristics
of supervised learning.

7 There are possibilities to display multi-dimensional data in two or three dimensions: an often
used method is Principal Component Analysis (Wold et al., 1987) which ‘flattens’ data, but also
loses complexity.

11.5.2 Training Data and Validation
For any supervised Machine Learning method, training data with annotated labels is
required for the algorithm to learn from. This training data is sometimes also called
the ‘ground truth’. Depending on the task, different types of labels are needed. In the
case of document classification, one or more labels is needed for each document. For
Named Entity Recognition, a label is required for each token. Such a combination
of an observation (a document or a token) and a corresponding label is called a
sample. It is important that the training data is representative of the entire data set,
so the algorithm can learn—and deal with—the variety that exists in the data.
Once the labelled data has been created, it needs to be split into a train
set and a test set. The train set is used to train the model, so the algorithm uses these
samples to create statistical relations. The test set is then used to evaluate how well
the model is performing. This is done by letting the model predict labels on the test
set, and then comparing them to the ground truth labels to calculate a performance
metric (see Sect. 11.5.4). It is important that the model does not ‘see’ the test set
during training, as that would give an unfair advantage, and the performance score
would not reflect the effectiveness it will have on unlabelled data.
Often, the data is split into 80% train set and 20% test set, also called an 80/20
split. But other splits with more or less test data can be useful, depending on the
task and the amount of data available. However, such a static split does come with a
caveat: if the test set coincidentally happens to be very easy or hard to predict, this
does not truly reflect how well the model would perform on new data. To prevent
this, it is often better to perform k-fold cross-validation. This means that the
data is split into k equal sized chunks, and the model is trained k times, each time
using one of the chunks as the test set, and the rest of the chunks (k-1) as training
data. Afterwards, the performance metrics are averaged across the k runs to provide
a more well-rounded indication of the model’s quality. Leave-one-out cross-validation
is the extreme case in which k equals the number of samples, so that each test set
contains a single sample.
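The splitting logic can be sketched as follows. `k_fold_splits` is an illustrative helper, and `evaluate` is a hypothetical placeholder for training and scoring a real model; the ten samples are made up.

```python
def k_fold_splits(samples, k):
    """Yield (train, test) pairs; each chunk is the test set exactly once."""
    chunks = [samples[i::k] for i in range(k)]
    for i in range(k):
        test = chunks[i]
        train = [s for j, chunk in enumerate(chunks) if j != i for s in chunk]
        yield train, test

def evaluate(train, test):
    """Placeholder for training a model and scoring it on the test chunk.

    Hypothetical stand-in: it returns the share of test samples whose
    label also occurs in the training data.
    """
    train_labels = {label for _, label in train}
    return sum(label in train_labels for _, label in test) / len(test)

# Ten hypothetical (document, label) samples.
samples = [(f"doc{i}", "Neolithic" if i % 2 else "Bronze Age") for i in range(10)]

scores = [evaluate(train, test) for train, test in k_fold_splits(samples, k=5)]
print(sum(scores) / len(scores))  # average over the k runs
```

In practice the samples are usually shuffled before splitting, so that each chunk contains a mix of labels.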
For any task, a relatively large number of samples is needed for the algorithm
to be effective. Unfortunately, there is no predefined number of samples which
would guarantee good performance: each task is different and has varying levels
of complexity, which influences the amount of data needed. For some simpler tasks
with just two possible labels (a binary task), 300 to 500 samples might be enough,
but for e.g. complex NER, thousands of examples are needed for each target entity.
One way to determine if more data will improve the performance, is by again
splitting the data into k chunks (with k often being 10), and running the algorithm
k-1 times, starting with 2 chunks (1 train, 1 test) and every time adding one chunk
of data (which becomes the test set). The performance scores can then be plotted
in a line graph to judge whether adding more labelled data would help. A curve
that flattens out means adding more data will probably not help, but a curve that
has not flattened out yet indicates more data will probably increase the performance
(Brandsen et al., 2020).

11.5.3 Commonly Used Algorithms for Information Extraction
Many algorithms have been developed for Machine Learning, each with different
strengths and weaknesses. To give an idea of which are useful for Information
Extraction, some commonly used methods are discussed here for each type of task.
This list is far from exhaustive; for a more complete list see Mohri et al. (2018).
For text classification, the most commonly used algorithm used to be Naive
Bayes (NB), which uses the probabilities of known events to predict new events.
In fact, the example in Table 11.3 is a simplified form of NB. It is particularly useful when
working with small training data, as it learns quickly compared to other methods
which require more data. However, this method is not very powerful or good at
handling complex data, and has been largely superseded by the Support Vector
Machines (SVM) algorithm (Cortes & Vapnik, 1995).
SVM works by plotting vectors in a space (like in Fig. 11.2), and drawing a line
(called a hyperplane in multidimensional space) dividing the points so that the distance
between the hyperplane and the nearest points of each class (the margin) is maximised.
Any new vector will be
assigned a label depending on which side of the hyperplane it is plotted. To illustrate
this, a hyperplane has been added in Fig. 11.3, and a new, unlabelled point is added
(green square). The new point, based on this hyperplane, would be classified as
‘Neolithic’ by the SVM. In practice, the dividing boundary is often not straight: with a
non-linear kernel it can bend
around the vector points in high-dimensional space. This can be calculated, but
unfortunately not easily visualised.
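The role of the hyperplane can be illustrated without training anything. In the sketch below the separating line is chosen by hand (a hypothetical choice, not the line an SVM would learn), and the assumption is made that flint-heavy documents are the ‘Neolithic’ group. An SVM would search for the line that makes the smallest distance computed here as large as possible.

```python
import math

# Hypothetical separating line in the (flint TF, bronze TF) plane:
# w . x + b = 0 with w = (1, -1) and b = 0, i.e. the diagonal where both
# term frequencies are equal. A trained SVM would learn w and b from data.
w = (1.0, -1.0)
b = 0.0

def decision_value(x):
    """Signed value of w . x + b; the sign gives the side of the line."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_line(x):
    """Perpendicular distance from point x to the line."""
    return abs(decision_value(x)) / math.hypot(*w)

def classify(x):
    # Assumption: the flint-heavy side is the 'Neolithic' group.
    return "Neolithic" if decision_value(x) > 0 else "Bronze Age"

training = {1: (3, 2), 2: (4, 0), 3: (1, 4), 4: (0, 3)}  # Table 11.4
margin = min(distance_to_line(x) for x in training.values())
print(round(margin, 3))  # 0.707, set by document 1
print(classify((2, 1)))  # hypothetical new document -> Neolithic
```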

Fig. 11.3 Scatter plot of the data from Table 11.4. Points have been clustered and assigned a
label and colour. A hyperplane dividing the two groups of points has been added in green. A new,
unlabelled vector is displayed (green square)

For NB and SVM, the order of the samples does not matter and is not taken into
account. However, for Named Entity Recognition, a lot of information is encoded
in the order of, and context around, a token. Think of e.g. the time period entity:
it is very likely that time periods are preceded by the tokens “around” or “from”,
for example “we found a house from 1800 BCE”. Having information about tokens
before and after the current token the algorithm is trying to label is very useful, and
so for NER, other algorithms are more effective. The most well-known one is the
Conditional Random Fields (CRF) algorithm, which is generally the starting point for
any sequence classification problem (Joachims, 1998). It is relatively easy to use,
does not require much computing power or time to run, and it generally produces
good results. NB and SVM are available via the scikit-learn Python library
(Pedregosa et al., 2011), among others; CRF is not part of scikit-learn itself, but can
be used through companion packages such as sklearn-crfsuite.
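Sequence models such as CRF are typically given a description of each token together with its neighbours. The sketch below shows one way such context features could be built; the feature names and the boundary placeholders are assumptions made for this illustration, not the input format of a particular library.

```python
def token_features(tokens, i):
    """Describe token i by itself and by its direct neighbours."""
    token = tokens[i]
    return {
        "token": token.lower(),
        "is_digit": token.isdigit(),
        "is_capitalised": token[:1].isupper(),
        # Sentence boundaries get placeholder values (an assumption).
        "previous": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

sentence = ["we", "found", "a", "house", "from", "1800", "BCE"]
features = [token_features(sentence, i) for i in range(len(sentence))]

# The token "1800" is preceded by "from" and followed by "bce": context
# that a sequence model such as a CRF can use to label it as a time period.
print(features[5])
```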
For both document classification and NER, neural networks (also known as Deep
Learning) have seen an increase in popularity over the last decade. As they are
able to capture complexity more accurately than ‘traditional’ algorithms, they can
provide state-of-the-art performance. For document classification, Recurrent Neural
Networks (RNNs) and Convolutional Neural Networks (CNNs) are often used. For
NER, the Bidirectional Long Short Term Memory (BiLSTM) algorithm is popular,
as well as the Bidirectional Encoder Representations from Transformers (BERT)
architecture. BERT is discussed in more detail in Sect. 11.5.6 below.
Clustering is often performed using the k-means algorithm. Also used extensively with other types of data, k-means aims to group vectors into k clusters
by minimising the within-cluster variance. Specifically for topic modelling, LDA
(Latent Dirichlet Allocation) is often used; implementations are available in Python
libraries such as gensim and scikit-learn, and the resulting topic model can relatively
easily be explored with the pyLDAvis Python library (Sievert & Shirley, 2014). In Fig. 11.4 an
example is shown of the output of pyLDAvis, displaying ten clusters of documents
about ancient fire use. Topic number 8 has been highlighted, and the top relevant
terms for that topic are displayed on the right. Judging from the top terms, this particular
cluster seems to be about burning bones and/or cremations.

11.5.4 Evaluation and Performance Metrics
To see how well an algorithm performs, we need to evaluate the output. For
unsupervised learning, there are no target labels to compare against, so a qualitative
evaluation is needed by manually inspecting
the outcome. For supervised learning, it is possible to quantitatively measure the
output, as we can compare the labels predicted by the algorithm to the true labels
assigned by human annotators, and calculate performance metrics. Due to
the fuzziness and ambiguity in archaeological data, a manual inspection
of the predicted labels is sometimes also warranted, to see in detail where the algorithm is correct
and incorrect (or nearly correct). Nevertheless, a performance metric should always be
calculated when possible, as this gives an overview of the performance over the
entire test set, makes it possible to compare different methods
on the same data, and promotes reproducible open science.

Fig. 11.4 Topic model visualisation by pyLDAvis

Table 11.5 Illustrating the true/false positive/negative categories

              Prediction: True  Prediction: False
Label: True   tp                fn
Label: False  fp                tn

In the rest of this section, the most common metrics for text mining are discussed,
but many more exist, and it is worth investigating which one is most suitable for a
given task. Most metrics are ratios between the numbers of correctly and
incorrectly classified items. A label is predicted by the algorithm for each item in the
test set, and those predicted labels are compared to the true labels. Each prediction
can then be assigned to one of the categories listed below. The categories and
metrics are further explained with an archaeological example: imagine a document
classification task where the goal is to automatically label a large set of papers
as being relevant or irrelevant to a certain research topic, e.g. Early Medieval
cremations in Europe. As the number of possibly relevant papers is too large to
manually inspect, using a Machine Learning algorithm to make a preselection could
be useful.
• True positive (tp). When a paper is relevant, and the label is correctly predicted
as ‘relevant’.
• True negative (tn). When a paper is irrelevant, and the label is correctly predicted
as ‘irrelevant’.
• False negative (fn). When a paper is relevant, but the label is incorrectly predicted
as ‘irrelevant’. More simply put: a paper that has not been recognised as relevant
by the system.
• False positive (fp). When a paper is not relevant, but the label is incorrectly
predicted as ‘relevant’. More simply put: the system thinks a paper is relevant
when it is not.
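Counting the four categories is straightforward once the true and predicted labels are available as two lists. The helper name and the six example papers below are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels, positive="relevant"):
    """Count tp, tn, fp and fn for a binary task."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, prediction in zip(true_labels, predicted_labels):
        if prediction == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

# Hypothetical labels for six papers.
truth = ["relevant", "relevant", "relevant",
         "irrelevant", "irrelevant", "irrelevant"]
predicted = ["relevant", "relevant", "irrelevant",
             "relevant", "irrelevant", "irrelevant"]

print(confusion_counts(truth, predicted))
# {'tp': 2, 'tn': 2, 'fp': 1, 'fn': 1}
```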
These categories are further illustrated in Table 11.5. Once all the items have
been assigned a category, it is possible to calculate performance metrics. The most
used measures in Machine Learning in general are recall, precision and F1 score.
Recall shows what proportion of actual positives was identified correctly. For
our example, it indicates out of all the relevant papers, what percentage have been
correctly labelled as ‘relevant’. It can also be viewed as the percentage of papers
that have been found. It is defined as follows:

$$\text{Recall} = \frac{tp}{tp + fn} \tag{11.1}$$

Precision shows what proportion of positive identifications was actually correct.
For our example, it indicates out of all the papers labelled as ‘relevant’, what
percentage was actually relevant. In essence, it shows how often the
algorithm is right when it predicts the positive label. It is defined as follows:

$$\text{Precision} = \frac{tp}{tp + fp} \tag{11.2}$$

The F1 score (or F measure) combines recall and precision to provide an overall
evaluation metric. More specifically, it is the harmonic mean of precision and recall,
and is defined as:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{11.3}$$

The 1 in F1 means that recall and precision are equally important (and thus
equally weighted) when calculating the harmonic mean. In some cases, however, either
recall or precision is more important, in which case the F score can be weighted
to favour one over the other. This is done by changing to the F0.5 score
(precision is weighted twice as heavily as recall) or the F2 score (recall is weighted
twice as heavily as precision). For example, Brandsen et al. (2019) show that when
searching for documents, Dutch archaeologists are more interested in getting as
many relevant documents as possible, even if this means getting more irrelevant
documents. This means that recall is more important, and the F2 score would be
more suited for that task.
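
As a minimal sketch of Eqs. 11.1 to 11.3, the snippet below computes recall, precision and a weighted F score from raw counts. The counts are invented for illustration, and the general F-beta formula used here (with beta = 1 giving the F1 score) is the standard textbook definition rather than something spelled out in this chapter.

```python
def recall(tp: int, fn: int) -> float:
    """Proportion of actual positives that were found (Eq. 11.1)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Proportion of predicted positives that are correct (Eq. 11.2)."""
    return tp / (tp + fp)


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 reproduces Eq. 11.3; beta > 1 favours recall,
    beta < 1 favours precision.
    """
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical counts for the 'relevant paper' example (not from a real study).
tp, fp, fn = 40, 20, 10

r = recall(tp, fn)       # 40 / 50
p = precision(tp, fp)    # 40 / 60
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
f05 = f_beta(p, r, beta=0.5)

print(f"recall={r:.3f} precision={p:.3f} F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")
```

Because recall is higher than precision in this invented example, the recall-oriented F2 score comes out above F1, and the precision-oriented F0.5 score below it.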
Other metrics are less popular, but can be useful in certain situations. These
include the ROC (Receiver Operating Characteristic) curve, the related AUC
(Area Under the ROC Curve), and the MCC (Matthews Correlation Coefficient)
(Verschoof-van der Vaart & Landauer, 2021). If a less popular metric is chosen, it is
useful to also include the most common metric(s) for the task as well, to be able to
compare algorithms between studies.
Generally, these metrics are not calculated manually. Most Machine Learning
libraries have functions that calculate them automatically from
an input of predicted labels and correct labels. For Python, the metrics module
of the scikit-learn library provides the metrics discussed here, among
many others.
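
The following sketch shows how such a library call might look with scikit-learn, assuming the package is installed. The two label lists are made up for illustration; `precision_score`, `recall_score`, `f1_score` and `fbeta_score` are functions from `sklearn.metrics`, and `pos_label` tells them which label counts as the positive class.

```python
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

# Hypothetical gold-standard labels and system predictions for eight papers.
y_true = ["relevant"] * 4 + ["irrelevant"] * 4
y_pred = [
    "relevant", "relevant", "relevant", "irrelevant",      # 3 tp, 1 fn
    "relevant", "relevant", "irrelevant", "irrelevant",    # 2 fp, 2 tn
]

p = precision_score(y_true, y_pred, pos_label="relevant")
r = recall_score(y_true, y_pred, pos_label="relevant")
f1 = f1_score(y_true, y_pred, pos_label="relevant")
f2 = fbeta_score(y_true, y_pred, beta=2, pos_label="relevant")

print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} F2={f2:.3f}")
```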

11.5.5 Word Embeddings
Word embeddings are a different way to represent tokens. Instead of using the
actual string (or a number assigned to that string), word embedding algorithms

11 Information Extraction and Machine Learning for Archaeological Texts

convert tokens into vectors. Instead of creating a vector for each document (like
in Sect. 11.4.6), a vector is created for each token in the document. The vectors are
created by the word embeddings algorithm in such a way that words which occur in
similar contexts (i.e. have similar words near them in sentences) have similar vectors
(i.e. are near each other in the vector space). This is based on the distributional
hypothesis: words that have similar contexts will have similar meanings (Harris,
1954). Once the vectors for the individual tokens have been calculated, a single vector for the document can be created (for example by averaging all the token vectors).
Word embeddings are useful because to a computer, “axe” and “adze” are two
completely unrelated strings; the computer does not know they are semantically
similar. However, if the vectors of these two words are near each other, the computer
can use this information to understand that they are similar. To illustrate this, the
following two sentences (before and after preprocessing) would have substantially
different vectors when using the method from Sect. 11.4.6:
• “The axe was used to chop wood” (axe used chop wood)
• “The birch was carved with an adze” (birch carved adze)
In fact, after preprocessing, none of the tokens overlap between sentences. Yet
it is clear to humans that these sentences have substantial semantic similarity.
Assuming the word embeddings have been created correctly, these two sentences
would be quite similar: axe is similar to adze, carved is similar to chop, and birch
is similar to wood. When the token vectors are averaged into document vectors, the
computer can therefore treat these sentences as similar, even though the tokens are completely
different. This makes word embeddings incredibly powerful for dealing with text
data, and consequently they have been applied with great success to many tasks: from
document classification and NER, to automatically expanding search queries and
tracking the change of meanings of words over time.
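
The axe/adze example can be sketched in a few lines of plain Python. The three-dimensional vectors below are hand-made stand-ins chosen so that related words point in similar directions; they are an assumption for illustration only, not the output of a trained embedding model, and real embeddings typically have hundreds of dimensions.

```python
import math
from collections import Counter

# Hand-made toy vectors (hypothetical): similar words get similar directions.
TOY_VECTORS = {
    "axe":    [0.9, 0.1, 0.0],
    "adze":   [0.8, 0.2, 0.1],
    "chop":   [0.1, 0.9, 0.1],
    "carved": [0.2, 0.8, 0.0],
    "wood":   [0.0, 0.1, 0.9],
    "birch":  [0.1, 0.0, 0.9],
    "used":   [0.3, 0.3, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity of bag-of-words count vectors (token overlap only)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


def average_vector(tokens):
    """Document vector obtained by averaging the token vectors."""
    vectors = [TOY_VECTORS[t] for t in tokens]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


doc1 = ["axe", "used", "chop", "wood"]   # "The axe was used to chop wood"
doc2 = ["birch", "carved", "adze"]       # "The birch was carved with an adze"

bow_sim = bow_cosine(doc1, doc2)
emb_sim = cosine(average_vector(doc1), average_vector(doc2))
print(f"bag-of-words similarity: {bow_sim:.2f}")
print(f"embedding similarity:    {emb_sim:.2f}")
```

With no shared tokens the bag-of-words similarity is zero, while the averaged toy vectors are close to each other. In such a low-dimensional toy space almost any averaged documents end up fairly similar, so the numbers should be read as an illustration of the mechanism only.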
Word embeddings can be created by multiple algorithms; the most popular
currently are word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014)
and FastText (Bojanowski et al., 2016). Instead of just averaging word vectors
to get to a document vector, it is also possible to use the doc2vec model (Le &
Mikolov, 2014) to create more sophisticated document vectors. All these models
can be implemented in Python using the gensim library (Rehurek & Sojka, 2010).

11.5.6 Transfer Learning
While word embeddings are a significant improvement over the bag-of-words
model, the current state of the art in NLP is transfer learning, specifically transformer-based methods. These Deep Learning algorithms can ‘learn’ language
from extremely large unlabelled text collections (billions of tokens) to create a
language model, and can then use this model to better perform specific tasks.
The idea behind these language models is that they mimic human behaviour: by
already knowing a language, it is easier to try and predict classes. The best-known
architecture is called BERT (Bidirectional Encoder Representations from
Transformers), developed by researchers at Google (Devlin et al., 2019).
Similar to word embeddings, BERT also creates vectors for tokens. However,
traditional word embeddings such as word2vec are context independent, meaning
a token will always have the same vector, regardless of its context. For example,
the word ‘flint’ will have the same vector in “a flint axe” and “Mr. Flint”, even though
the two uses are semantically very different. BERT produces context-dependent embeddings,
meaning the vector of a token is different if it occurs in a different context. This
means that BERT is particularly useful for tasks where synonymy and polysemy are
a problem, and can handle more complex tasks with higher performance.
While BERT is being used extensively in other domains for a wide range of tasks,
in archaeology it has not been used much yet. An exception to this is the work by
Brandsen et al. (2021), who used BERT for NER, showing substantial improvement
over a CRF-based method.
BERT does have large potential for use in archaeology, as it leverages unlabelled
data to train the neural net. Normally for deep learning algorithms, a very large
amount of labelled data is needed to train the network, which often is not available
in our domain. By creating a language model with unlabelled data, only a modest
amount of labelled data is needed to fine-tune the model on a specific NLP task.
The unlabelled training data does not necessarily need to be archaeological data
either, as long as it is in the same language; this is why the approach is called ‘transfer
learning’: knowledge is transferred from one domain to another. However, research
does show that using unlabelled training data from the domain itself can lead to
modest increases in performance (Lee et al., 2019; Beltagy et al., 2020; Brandsen
et al., 2021).

11.6 Conclusions
This chapter has described various Information Extraction methods, how to perform
these using Machine Learning, and given an introduction to data preprocessing and
the evaluation of text mining algorithms, with a focus on practical archaeological
examples. This provides a snapshot of the current state of research, as well as some
ideas and inspiration for future directions.
Even though a large proportion of this chapter is dedicated to machine automation, computers are not going to replace archaeologists any time soon, as also noted
by other archaeologists working with Machine Learning (Verschoof-van der Vaart
et al., 2020; Traviglia et al., 2016). While computers are great at calculating answers,
they are not able to ask any questions: formulating research ideas and analysing
the output of algorithms will still have to be done by humans. A certain level of
creativity and ability to ‘connect the dots’ is needed in science, which we need
human brains for. While neural networks are getting increasingly complex and are
starting to mimic human learning, they are still rudimentary when compared to the
incredible ability of humans to learn from scratch, connect ideas, and think outside
the box, while algorithms are (quite literally) bound by their ‘box’, or the limits of
the programming that created them.
Instead, computational tools are meant to further enhance the archaeologist’s
ability to draw meaningful conclusions from raw data and to make this process more
efficient. Outsourcing menial tasks to e.g. students and volunteers has a long history
in archaeology, and science as a whole. The more we can replace this valuable
human time with relatively inexpensive computing time, the more we can focus
on the interesting parts of archaeology: drawing conclusions and building theories
relating to past human behaviour.
However, this new big data paradigm (Löwenborg, 2018) and the associated techniques also pose new challenges (Kintigh et al., 2014; Gattiglia, 2015). An example
is the reliability of data. While data has always been central to archaeological knowledge, in this new paradigm large data sets can be presumed to be unproblematic, and
any problems with quality or reliability to be overcome purely by the quantity of
data (Huggett, 2020). This can cause the conceptual understanding of the creation
of archaeological data—gained over decades of discussion—to be overlooked when
performing these large scale syntheses (Cunningham & MacEachern, 2016). At the
same time, discussions around big data have seen a renewed interest in the relation
between data and the knowledge created from this data (Leonelli, 2015).
As big data is getting increasingly ubiquitous in archaeology, it seems inevitable
that computational methods to find, combine, and analyse nuggets of information
from large data sets will become increasingly commonplace. As other domains—
and specifically computer science scholars—push the state of the art of NLP towards
ever-increasing performance, we as archaeologists can use and adapt these new
tools with relative ease, or collaborate with experts. Using these methods and
applying them to our own data, for our own research questions, we can perform
better synthesising research at larger scales, leading to a better, more thorough
understanding of the past.

References
Amrani, A., Abajian, V., & Kodratoff, Y. (2008). A chain of text-mining to extract information
in archaeology. In Information and communication technologies: From theory to applications,
ICTTA 2008, Damascus, Syria (pp. 1–5). https://doi.org/10.1109/ICTTA.2008.4529905
Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: A
literature survey. International Journal on Digital Libraries, 17(4), 305–338. https://doi.org/
10.1007/S00799-015-0156-0
Beltagy, I., Lo, K., & Cohan, A. (2020). SCIBERT: A pretrained language model for scientific text.
In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings
of the Conference. Hong Kong: Association for Computational Linguistics. https://doi.org/10.
18653/v1/d19-1371
Bevan, A. (2015). The data deluge. Antiquity 89(348), 1473–1484. https://doi.org/10.15184/aqy.
2015.102

Bickler, S. H. (2021). Machine learning arrives in archaeology. Advances in Archaeological
Practice, 9(2), 186–191. https://doi.org/10.1017/aap.2021.6
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. Sebastopol:
O’Reilly.
Bogdanovic, I. (2015). Spatial cluster detection in archaeology: Current theory and practice. In
Mathematics and archaeology (pp. 366–382). Boca Raton: CRC Press.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword
information. Transactions of the Association for Computational Linguistics, 5(1), 135–146.
Brandsen, A., & Koole, M. (2021). Labelling the past: Data set creation and multi-label classification of Dutch archaeological excavation reports. Language Resources and Evaluation, 56,
543–572. https://doi.org/10.1007/s10579-021-09552-6
Brandsen, A., Lambers, K., Verberne, S., & Wansleeben, M. (2019). User requirement solicitation
for an information retrieval system applied to Dutch grey literature in the archaeology domain.
Journal of Computer Applications in Archaeology, 2(1), 21–30. https://doi.org/10.5334/jcaa.33
Brandsen, A., & Lippok, F. (2021). A burning question – Using an intelligent grey literature search
engine to change our views on early medieval burial practices in the Netherlands. Journal of
Archaeological Science, 133, 105456. https://doi.org/10.1016/j.jas.2021.105456
Brandsen, A., Verberne, S., Lambers, K., & Wansleeben, M. (2021). Can BERT dig it? - Named
entity recognition for information retrieval in the archaeology domain. http://arxiv.org/abs/
2106.07742
Brandsen, A., Verberne, S., Wansleeben, M., & Lambers, K. (2020). Creating a dataset for named
entity recognition in the archaeology domain. In Proceedings of the 12th Language Resources
and Evaluation Conference (pp. 4573–4577). Marseille: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.562/
Brandt, R., Drenth, E., Montforts, M., Proos, R., Roorda, I., & Wiemer, R. (1992). Archeologisch
Basisregister. Tech. Rep., Rijksdienst voor Cultureel Erfgoed, Amersfoort.
Byrne, K., & Klein, E. (2010). Automatic extraction of archaeological events from text. In B.
Frischer, J. Crawford, & D. Koller (Eds.), Making history interactive: Computer applications
and quantitative methods in archaeology 2009. BAR International Series (vol. 2079, pp.
48–56). Oxford.
Chowdhury, G. G. (2005). Natural language processing. Annual Review of Information Science
and Technology, 37(1), 51–89. https://doi.org/10.1002/aris.1440370103
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
https://doi.org/10.1007/BF00994018
Cunliffe, E., & Curini, L. (2018). ISIS and heritage destruction: A sentiment analysis. Antiquity,
92(364), 1094–1111. https://doi.org/10.15184/AQY.2018.134
Cunningham, J. J., & MacEachern, S. (2016). Ethnoarchaeology as slow science. World Archaeology, 48(5), 628–641.
Davis, D. S. (2020). Defining what we study: The contribution of machine automation in
archaeological research. Digital Applications in Archaeology and Cultural Heritage, 18,
e00152. https://doi.org/10.1016/J.DAACH.2020.E00152
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep
bidirectional transformers for language understanding. In Proceedings of the 2019 Conference
of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies (vol. 1, pp. 4171–4186). Minnesota: Association for Computational
Linguistics. https://doi.org/10.18653/v1/N19-1423
Doran, J., & Hodson, F. (1975). Mathematics and computers in archaeology. Harvard: Harvard
University Press.
Feldman, R., & Sanger, J. (2007). The text mining handbook: Advanced approaches in analyzing
unstructured data. Cambridge: Cambridge University Press.
Fiorucci, M., Khoroshiltseva, M., Pontil, M., Traviglia, A., Del Bue, A., & James, S. (2020).
Machine learning for cultural heritage: A survey. Pattern Recognition Letters, 133, 102–108.
https://doi.org/10.1016/j.patrec.2020.02.017

Fischer, A., Londen, H. V., Bercken, A. B. V. D., Visser, R., & Renes, J. (2021). NAR 68 Urban
farming and ruralisation in the Netherlands (1250 up to the nineteenth century), unravelling
farming practice and the use of (open) space by synthesising archaeological reports using text
mining. Nederlandse Archeologische Rapporten (NAR) 68.
Gattiglia, G. (2015). Think big about data: Archaeology and the big data challenge. Archäologische
Informationen, 38(1), 113–124. https://doi.org/10.11588/ai.2015.1.26155
Gibbs, M., & Colley, S. (2012). Digital preservation: Online access and historical archaeology
’grey literature’ from New South Wales, Australia. Australian Archaeology, 75, 95–103. https://
doi.org/10.1080/03122417.2012.11681957
Gilboa, A., Karasik, A., Sharon, I., & Smilansky, U. (2004). Towards computerized typology and
classification of ceramics. Journal of Archaeological Science, 31(6), 681–694. https://doi.org/
10.1016/j.jas.2003.10.013
Gilman, P., & Newman, M. (2007). Informing the future of the past: Guidelines for historic
environment records (2nd edn.). Tech. Rep., ADS, ALGAO UK, English Heritage, Historic
Scotland, RCAHMS and RCAHMW.
Grove, M., & Blinkhorn, J. (2020). Neural networks differentiate between middle and later stone
age lithic assemblages in eastern Africa. PloS One, 15(8), e0237528.
Gualandi, M. L., Gattiglia, G., & Anichini, F. (2021). An open system for collection and automatic
recognition of pottery through neural network algorithms. Heritage, 4(1), 140–159.
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.
Huggett, J. (2020). Is big digital data different? Towards a new archaeological paradigm. Journal
of Field Archaeology, 45(suppl. 1), S8–S17. https://doi.org/10.1080/00934690.2020.1713281
International Committee for Documentation (CIDOC). (2014). Information and documentation: A reference ontology for the interchange of cultural heritage information (ISO Standard No.
21127:2014). Tech. Rep., International Organization for Standardization. https://www.iso.org/
standard/57832.html
Jackson, S., Richissin, C. E., McCabe, E. E., & Lee, J. J. (2020). Data-informed tools for
archaeological reflexivity: Examining the substance of bone through a meta-analysis of
academic texts. Internet Archaeology, 55. https://doi.org/10.11141/ia.55.12
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S., & Zhang, Z. (2009). The
Archaeotools project: Faceted classification and natural language processing in an archaeological context. Philosophical Transactions Series A, Mathematical, Physical, and Engineering
Sciences, 367(1897), 2507–19. https://doi.org/10.1098/rsta.2009.0038
Joachims, T. (1998). Text categorization with support vector machines: Learning with many
relevant features. In Machine learning: ECML-98 (pp. 137–142). Berlin: Springer.
Kintigh, K. W., Altschul, J. H., Beaudry, M. C., Drennan, R. D., Kinzig, A. P., Kohler, T. A.,
Limp, W. F., Maschner, H. D., Michener, W. K., Pauketat, T. R., Peregrine, P., Sabloff, J. A.,
Wilkinson, T. J., Wright, H. T., & Zeder, M. A. (2014). Grand challenges for archaeology.
American Antiquity, 79(1), 5–24.
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In
International Conference on Machine Learning. Proceedings of Machine Learning Research
(pp. 1188–1196).
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: A pre-trained
biomedical language representation model for biomedical text mining. Bioinformatics, 36(4),
1234–1240. https://doi.org/10.1093/bioinformatics/btz682
Leonelli, S. (2015). What counts as scientific data? A relational framework. Philosophy of Science,
82(5), 810–821.
Löwenborg, D. (2018). Knowledge production with data from archaeological excavations. In
Archaeology and archaeological information in the digital society (pp. 37–53). Milton Park:
Routledge.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval.
Cambridge: Cambridge University Press.
Manning, C. D., Raghavan, P., & Schütze, H. (2009). An introduction to information retrieval.
Cambridge: Cambridge University Press. https://doi.org/10.1109/LPT.2009.2020494

Merali, Z., & Smith, J. (1985). Optical character recognition: The technology and its application
in information units and libraries. Wetherby: Boston Spa.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations
in vector space. In 1st International Conference on Learning Representations, ICLR 2013 Workshop Track Proceedings.
Mitchell, T. (1997). Machine learning. New York: McGraw Hill.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning (2nd
edn.). Cambridge: MIT Press.
Nash, B. S., & Prewitt, E. R. (2016). The use of artificial neural networks in projectile point
typology. Lithic Technology, 41(3), 194–211.
Niccolucci, F., & Richards, J. D. (2013). ARIADNE: Advanced research infrastructures for
archaeological dataset networking in Europe. International Journal of Humanities and Arts
Computing, 7(1–2), 70–88. https://doi.org/10.3366/ijhac.2013.0082
Paijmans, H., & Brandsen, A. (2010). Searching in archaeological texts: Problems and
solutions using an artificial intelligence approach. PalArch’s Journal of Archaeology of
Egypt/Egyptology, 7(2), 1–6.
Paolanti, M., Pierdicca, R., Martini, M., Felicetti, A., Malinverni, E., Frontoni, E., & Zingaretti,
P. (2019). Deep convolutional neural networks for sentiment analysis of cultural heritage.
ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information
Sciences, 4215, 871–878.
Pawlowicz, L. M., & Downum, C. E. (2021). Applications of deep learning to decorated ceramic
typology and classification: A case study using Tusayan White Ware from Northeast Arizona.
Journal of Archaeological Science, 130, 105375. https://doi.org/10.1016/j.jas.2021.105375
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher,
M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in python. Journal of
Machine Learning Research, 12, 2825–2830.
Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing (EMNLP) (pp. 1532–1543).
Plets, G., Huijnen, P., & van Oeveren, D. (2021). Excavating archaeological texts: Applying digital
humanities to the study of archaeological thought and banal nationalism. Journal of Field
Archaeology, 46, 289–302. https://doi.org/10.1080/00934690.2021.1899889
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In
Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–
50). Valletta: ELRA.
Richards, J., Tudhope, D., & Vlachidis, A. (2015). Text mining in archaeology: Extracting
information from archaeological reports. In J. A. Barcelo & I. Bogdanovic (Eds.), Mathematics
and archaeology (pp. 240–254). Boca Raton: CRC Press. https://doi.org/10.1201/b18530-15
Riley, M. D. (1989). Some applications of tree-based modelling to speech and language. In
Proceedings of the Workshop on Speech and Natural Language, Association for Computational
Linguistics (ACL) (pp. 339–352). https://doi.org/10.3115/1075434.1075492
Sanders, D. H. (2018). Neural networks, AI, phone-based VR, machine learning, computer vision
and the CUNAT automated translation app–not your father’s archaeological toolkit. In 2018
3rd Digital Heritage International Congress (DigitalHERITAGE) Held Jointly with 2018
24th International Conference on Virtual Systems & Multimedia (VSMM 2018) (pp. 1–5).
Piscataway: IEEE.
Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces
(pp. 63–70).
Sommerschield, T. (2020). Ralegh radford Rome awards: Restoring ancient text using machine
learning: A case-study on Greek and Latin epigraphy. Papers of the British School at Rome,
88, 387–388. https://doi.org/10.1017/S0068246220000240

Talboom, L. (2017). Improving the discoverability of zooarchaeological data with the help of
Natural Language Processing. Master’s thesis, University of York.
Talks, A. (2019). An exploration of NLP and NER for enhanced search in osteoarchaeological and
palaeopathological textual resources. Master’s Thesis, University of York.
Tjong Kim Sang, E. F. (2002). Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In COLING-02: The 6th Conference on Natural
Language Learning 2002 (CoNLL-2002).
Traviglia, A., Cowley, D., & Lambers, K. (2016). Finding common ground: Human and computer
vision in archaeological prospection. AARGnews-The Newsletter of the Aerial Archaeology
Research Group, 53, 11–24.
Trier, Ø. D., Salberg, A. B., & Pilø, L. H. (2018). Semi-automatic mapping of charcoal kilns from
airborne laser scanning data using deep learning. In CAA2016: Oceans of Data. Proceedings of
the 44th Conference on Computer Applications and Quantitative Methods in Archaeology (pp.
219–231). Oxford: Archaeopress.
Tudhope, D., May, K., Binding, C., & Vlachidis, A. (2011). Connecting archaeological data
and grey literature via semantic cross search. Internet Archaeology, 30(30). https://doi.org/10.
11141/ia.30.5
Turney, P. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised
classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (pp. 417–424).
Verschoof-van der Vaart, W. B., Lambers, K., Kowalczyk, W., & Bourgeois, Q. P. (2020).
Combining deep learning and location-based ranking for large-scale archaeological prospection
of LiDAR data from the Netherlands. ISPRS International Journal of Geo-Information, 9(5),
293. https://doi.org/10.3390/ijgi9050293
Verschoof-van der Vaart, W. B., & Landauer, J. (2021). Using CarcassonNet to automatically detect
and trace hollow roads in LiDAR data from the Netherlands. Journal of Cultural Heritage, 47,
143–154. https://doi.org/10.1016/j.culher.2020.10.009
Vince, A. (1996). Editorial. Internet Archaeology, 1. https://doi.org/10.11141/ia.1.7
Vlachidis, A. (2012). Semantic indexing via knowledge organization systems: Applying the
CIDOC-CRM to archaeological grey literature. Unpublished PhD Thesis, University of South
Wales (USW).
Vlachidis, A., Tudhope, D., & Wansleeben, M. (2021). Knowledge-based named entity recognition
of archaeological concepts in Dutch. In E. Garoufallou & M. A. Ovalle-Perandones (Eds.), 14th
International Conference on Metadata and Semantic Research (pp. 53–64). Cham: Springer.
https://doi.org/10.1007/978-3-030-71903-6_6
Vlachidis, A., Tudhope, D., Wansleeben, M., Azzopardi, J., Green, K., Xia, L., &
Wright, H. (2017). D16.4: Final report on natural language processing. Tech. Rep.,
ARIADNE. http://legacy.ariadne-infrastructure.eu/wp-content/uploads/2019/01/D16.4_Final_
Report_on_Natural_Language_Processing_Final.pdf
Wilcke, W. X., de Boer, V., de Kleijn, M. T., van Harmelen, F. A., & Scholten, H. J. (2019). User-centric pattern mining on knowledge graphs: An archaeological case study. Journal of Web
Semantics, 59, 1–10. https://doi.org/10.1016/j.websem.2018.12.004
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A.,
Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes,
A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., ...
Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship.
Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and
Intelligent Laboratory Systems, 2(1–3), 37–52.

Chapter 12

Argument Mining and Analytics in
Archaeology
John Lawrence, Martín Pereira-Fariña, and Jacky Visser

Abstract The ever increasing volume of textual data ripe for analysis has driven
computational efforts to unlock the wealth of information contained within. The
automated reconstruction of the argumentative structure of texts, Argument Mining, meets this challenge by not only showing what claims are being advanced
(conclusion), but also why (premises). In this chapter, we start by surveying some
of the foundations and state-of-the-art of argument mining and how they can be
applied in domain-specific tasks in different research contexts, such as archaeology.
After that, we discuss two central themes in argumentation critical for argument
mining: argument schemes (common patterns of reasoning) and discourse markers
(that function as argumentative indicators). Next, we describe how to create specific
datasets for argument mining systems by means of annotated text corpora and how
to store them using the Argument Interchange Format ontology. We conclude by explaining
Argument Analytics, a visual way to deliver the output of argument mining systems
to their potential users.
Keywords Argument mining · Argument scheme · Corpus · Analytics ·
Ontology

12.1 Introduction
The ever increasing volume of textual data ripe for analysis has driven computational efforts to unlock the wealth of information contained within. Automated
techniques such as Opinion Mining and Sentiment Analysis make it possible to
identify the views expressed in a piece of text—for example, whether a product
review is positive or negative (Pang & Lee, 2008). While these well-established
techniques can be effectively used to determine the stance of argumentative texts,
they stop short of reconstructing the reasoning advanced in support of that stance.
The automated reconstruction of the argumentative structure of texts, Argument
Mining, meets this challenge by not only showing what claims are being advanced,
but also why.

J. Lawrence · J. Visser (✉)
Department of Computing, University of Dundee, Dundee, UK
e-mail: j.lawrence@dundee.ac.uk; j.visser@dundee.ac.uk
M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela,
Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_12

The earliest approaches to argument mining (Moens et al., 2007; Palau & Moens,
2009) attempted to detect the argumentative parts of a text by first splitting it
into sentences and using features of these sentences to classify each as either
Argument or Non-Argument, and then classifying each Argument sentence as either
premise or conclusion. Whilst much recent work in this area builds on these
concepts and techniques, the range of tasks and technologies available has grown
dramatically, as have the application areas. The tasks addressed can be broadly
categorised as: identifying argument components, including boundary detection and
argument/non-argument classification; identifying clausal properties, both intrinsic,
such as whether the clause is factual or opinion-based, and contextual, such as
whether the clause is the conclusion to an argument; and identifying relational
properties, from simple premise/conclusion relationships, to whether a set of clauses
form an instance of an argument scheme.
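
The two-stage pipeline of the earliest approaches can be sketched as follows. This is only a toy under stated assumptions: the early systems used classifiers trained on sentence features, whereas the stand-in below uses a small, hypothetical list of discourse markers to decide (1) whether a sentence is argumentative and (2) whether it acts as a premise or a conclusion. The marker lists, function names and example text are all invented for illustration.

```python
import re

# Hypothetical marker lists; a real system would learn such cues from annotated data.
CONCLUSION_MARKERS = re.compile(r"\b(therefore|thus|hence|consequently)\b", re.IGNORECASE)
PREMISE_MARKERS = re.compile(r"\b(because|since)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_argument(sentence):
    """Stage 1: argument vs. non-argument, based on the presence of any marker."""
    return bool(CONCLUSION_MARKERS.search(sentence) or PREMISE_MARKERS.search(sentence))


def role(sentence):
    """Stage 2: label an argumentative sentence as premise or conclusion."""
    return "conclusion" if CONCLUSION_MARKERS.search(sentence) else "premise"


def label_sentences(text):
    """Run both stages over every sentence in the text."""
    return [
        (s, role(s) if is_argument(s) else "non-argument")
        for s in split_sentences(text)
    ]


text = (
    "The trench was excavated in 2015. "
    "This matters because the hearth contained charred cereal grains. "
    "Therefore, the settlement was probably agricultural."
)

labelled = label_sentences(text)
for sentence, label in labelled:
    print(f"{label:>12}: {sentence}")
```

Keyword matching of this kind misses arguments that carry no explicit marker, which is one reason the feature-based and later approaches described in this chapter go well beyond it.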
In this chapter, we will survey some of the foundations and state-of-the-art of
argument mining: the automated reconstruction of the reasons advanced in defence
(or attack) of a disputed claim (Lawrence & Reed, 2020). Argument mining tools
can be fine-tuned for application on a variety of text genres and domains. While we
will mostly discuss general argument mining, the methods can be utilised to great
effect on specialised domain-specific tasks in, e.g., research contexts—whether in
archaeology or beyond.
After outlining the application of argument mining techniques in the archaeology
domain (Sect. 12.2), we will use the remainder of this chapter to first discuss
two central themes in argument mining and argumentation broadly (Sect. 12.3):
argument schemes as common patterns of reasoning that can be identified and
used to determine the structure of argumentation (Sect. 12.3.1), and discourse
markers that function as argumentative indicators (Sect. 12.3.2). Next, we turn to
argumentation datasets, focusing on both annotated and generated text corpora
developed specifically for argument mining (Sect. 12.4). In the next section, we
briefly look at the Argument Interchange Format, a shared ontology for representing
argumentative concepts and labels (Sect. 12.5) that can be employed to increase
the compatibility and reusability of annotated datasets. In the last part of the
chapter, we will explain Argument Analytics, which can take the output of argument
mining systems to provide quantitative metrics and visualisations. These data-driven
infographics give users an overview of complex evidential reasoning and debates,
supporting the sense-making process on otherwise hard to track argumentative cases
(Sect. 12.6).

12 Argument Mining and Analytics in Archaeology


12.2 Applying Argument Technologies in Archaeology
Lucas (2019) defends an argumentative model of reasoning as one of the possible
theoretical frameworks for knowledge production in archaeology. This type of
knowledge does not stabilise until it is elaborated in complex texts (although it
remains mobile even then) and, because archaeological knowledge consists of claims
about the past that cannot be indubitably verified, there are always competing
alternative ways of interpreting the archaeological record. The most recent
work studying how archaeological knowledge is produced (Chapman & Wylie,
2016) is mainly inspired by Toulmin’s (1958) model of argumentation, which has
several limitations in capturing the variety and richness of the different ways in which
people build different types of arguments in the field.
We argue, however, in favour of using annotated text corpora (see Sect. 12.4)
and Argument Analytics (see Sect. 12.6) in the study of knowledge production
in archaeology as an alternative to the Toulmin model. Knowledge production in
archaeology requires different text types. In addition to the documents generated
during any research (reports, field diaries, etc.), the two main text types are journal
articles and books (Lucas, 2019). Because journal articles usually contain more
argumentation than books, which are more descriptive and expository, journal
articles should constitute the raw data for creating annotated corpora. In addition, since
they use a formal linguistic register and are carefully written, annotating
them should be easier and more reliable.
On the other hand, different texts can provide alternative views of the same
underlying data. Therefore, the texts must develop alternative lines of reasoning.
Texts aligned with the same view or defending the same main claim can be put
together into the same set of raw corpus data to be annotated, which allows us to
explore their inter-textual relationships (Chandler, 2003; Visser et al., 2018a); i.e.,
to discover how texts and authors are connected to one another and how the content
of various texts cross-references and relies on the meaning of others. As a result,
arguments are not isolated islands but, following Lucas’ program (Lucas, 2019, p.
61), “arguments work best not as linear chains but as multiple strands which work
to triangulate around the same chain”.
Efforts to create good annotated text corpora will make it possible to address
some of the current epistemic issues in the field of archaeology, because such corpora unpack
how different researchers argue in favour of their competing interpretative views
about the past. A standard ontology and annotation scheme (see Sect. 12.5), and
notions such as argument schemes (see Sect. 12.3.1), have a higher expressive
power than Toulmin’s model, and they allow us to better capture the argumentative
richness of texts. In addition, combining these manual methods with automated
techniques for Argument Mining and Argument Analytics makes it possible to
capture the theoretical pluralism and overcome that inner tension with an argumentative model of reasoning. As we describe in Sect. 12.6, annotated corpora not only
provide information about the content of the debates but also about their dynamics.
This allows us to observe both the interaction between speakers and their main


J. Lawrence et al.

points of agreement and disagreement. As a result, we will be able to join together
the different theories with respect to a particular issue and how they are defended.
In conclusion, we argue that it is worth making an effort to create good annotated
text corpora in the archaeological field. Other fields where such corpora have already
been created, such as political debate, education, legal texts, and newspapers, have
already benefited from argument technologies. For instance, Visser et al. (2020a)
have successfully deployed argument technology to all secondary schools in the UK
to instil critical literacy skills in distinguishing fake news from genuine. Similarly,
the project The Morality of Abortion1 shows how different debaters in the BBC
Radio 4 programme The Moral Maze argue their different positions with respect to
abortion laws and what their main points of conflict were. It is our contention that
the same can be achieved in the archaeological field.

12.3 Dimensions of Argument Mining
Many of the earliest approaches to argument mining focused on applying existing
computational linguistic techniques to identify specific facets of the argumentative
structure contained within a text. Instead of requiring an a priori defined set of rules
that the software applies to a given example of argumentative text, these techniques
train machine learning algorithms on manually annotated argument datasets (see
Sect. 12.4) to produce models capable of automatically classifying text. By feeding
in a sufficient volume of text spans along with their annotated class—e.g. part of
a sentence with the annotated label as ‘premise’—the model learns to associate
specific linguistic cues with this class and can then predict to which class new text
spans should be assigned. In this way, the system learns a model on the basis of an
appropriately labelled set of examples (the training data), which is then applied to an
as yet unseen set of unlabelled examples (the test data) to test how well the system
performs.
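
To make the train-and-test procedure concrete, the following is a minimal sketch rather than any of the systems cited in this chapter. It trains a small multinomial Naive Bayes classifier (chosen here purely for illustration; the text does not prescribe an algorithm) on a handful of labelled spans and then checks its predictions on held-out spans. The training and test sentences, the function names `train` and `predict`, and the two labels are all assumptions invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it on whitespace, ignoring commas and full stops."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def train(examples):
    """Collect label and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: the training set and the unseen test set.
TRAIN = [
    ("because the visitors alter the humidity", "premise"),
    ("since the pigments are fragile", "premise"),
    ("because it houses public institutions", "premise"),
    ("therefore the cave should stay closed", "conclusion"),
    ("so the palace should not be demolished", "conclusion"),
    ("thus the site should be protected", "conclusion"),
]
TEST = [
    ("because the drawings are fragile", "premise"),
    ("therefore the site should stay closed", "conclusion"),
]

model = train(TRAIN)
predictions = [predict(model, text) for text, _ in TEST]
accuracy = sum(p == gold for p, (_, gold) in zip(predictions, TEST)) / len(TEST)
print(predictions, accuracy)
```

With so little data the classifier is effectively learning that words such as "because" and "therefore" signal a class, which is exactly the kind of association between linguistic cues and labels described above.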
As advances are made in the performance of machine learning models, this
strategy continues to deliver incrementally improving results (Galassi et al., 2020).
More recent advances have started to explore characteristic features of natural
language that are specifically related to argumentative intent or to particular
application domains. By combining such features and the accompanying techniques
in a concerted approach, the insights from various disciplines and perspectives can
be leveraged to achieve the best results (Lawrence & Reed, 2015). In this section, we
discuss two of these central dimensions: argument schemes (12.3.1) and linguistic
features (12.3.2).

1 https://bbc.arg.tech/.


12.3.1 Argument Schemes
Argument schemes capture persuasive structures of (typically presumptive) inference from a set of premises to a conclusion, relying on stereotypical patterns of
human reasoning.2 As such, argument schemes represent a historical descendant
of the topics of Aristotle (Aristotle, 1958) and, much like Aristotle’s topics, play
an equally valuable role in the production, analysis, and evaluation of arguments—
whether by human arguers or by automated software.
Several attempts have been made at creating taxonomies of the most commonly
used schemes—to give just a small sample of the existing scholarship: Hastings
(1963), Perelman and Olbrechts-Tyteca (1969), Kienpointner (1992), van Eemeren
and Grootendorst (1992), Pollock (1995), Walton (1996), Grennan (1997), Katzav
and Reed (2004), Walton et al. (2008), and Wagemans (2016). Although these sets
of schemes overlap in many places, the number of schemes identified and their
granularity varies greatly. As a result, most argument analyses tend to contain
examples from only one scheme set, with various permutations of the schemes
described by Douglas Walton being the most commonly used in computational
approaches.
Whilst the majority of these argument schemes are general in their nature,
applying to any situation where argument can be found, some schemes are specific
to a certain context or domain. For example, Wyner et al. (2012) identify a
‘consumer argumentation scheme’ to represent the arguments made in product
reviews relative to the preferences and values of the potential buyer. Similarly,
Green (2015) identifies ten custom argument schemes targeted at genetics research
articles. For example, one of the schemes presented, ‘Failed to Observe Effect
of Hypothesized Cause’, looks for situations where specific properties were not
observed, and where it is assumed that a specific condition that would result in those
properties is present, leading to the conclusion that the condition may not be present.
Green (2018) further argues for schemes expressed in terms of domain concepts
rather than by generic definitions like those used by Walton et al. (2008). She carries
out a pilot annotation study of schemes for 15 arguments in the Results/Discussion
section of biological/biomedical journal articles. To the best of our knowledge, no
domain-specific argument schemes have yet been specified for Archaeology.
Understanding the argument scheme, whether general or domain-specific, instantiated in a piece of natural language text can help us understand its persuasive force
beyond what many existing automated techniques for extracting meaning offer. If
we consider the argument in Example (1), then sentiment analysis techniques, for
instance, allow us to understand at a high level what views are being presented—

2 The study of argument schemes has a long history, ranging from Antiquity to modern academic
research (Garssen, 2001). In the literature, various authors use different terms to signify the
same general idea (with small variations): e.g., ‘argument scheme’ (van Eemeren and Grootendorst, 1992), ‘argumentation scheme’ (Walton, 1996), and ‘argumentative scheme’ (Perelman &
Olbrechts-Tyteca, 1969). In this chapter, we will use the term ‘argument scheme’.


that the speaker is against opening the Cave of Altamira in Spain, for example—but
they are unable to provide details on exactly why this standpoint is held.3
(1) Opening the Cave of Altamira to the public may damage the drawings,
because environmental stability is essential to their preservation, and the
visitors risk altering the environmental parameters.

Looking at the structure of the argumentation in this example, we can see that
the propositions “environmental stability is essential to their preservation” and
“the visitors risk altering the environmental parameters” are working together as
a linked argument (Snoeck Henkemans, 1992; Freeman, 2011) to support the
conclusion “Opening the Cave of Altamira to the public may damage the drawings”.
Furthermore, we can see that the link between the premises and conclusion is an
instance of Argument from Cause to Effect (Walton et al., 2008).
In Walton’s approach to argument schemes, a particular label is often assigned
to each component part of a scheme instance. For the Argument from Cause to
Effect in Example (1), the scheme components are only labelled as major and minor
premise, as follows:
Major Premise: environmental stability is essential to the preservation of the
drawings
Minor Premise: the visitors risk altering the environmental parameters
Conclusion: opening the Cave of Altamira to the public may damage the
drawings
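
As an illustration only, a scheme instance with labelled components could be represented in code along the following lines. The class name `SchemeInstance` and the `render` helper are hypothetical; the propositions are those of Example (1).

```python
from dataclasses import dataclass


@dataclass
class SchemeInstance:
    """One instantiation of an argument scheme (hypothetical representation)."""
    scheme: str
    premises: dict      # role label -> proposition text
    conclusion: str

    def roles(self):
        """Return the premise role labels in alphabetical order."""
        return sorted(self.premises)


altamira = SchemeInstance(
    scheme="Argument from Cause to Effect",
    premises={
        "Major Premise": "environmental stability is essential to the "
                         "preservation of the drawings",
        "Minor Premise": "the visitors risk altering the environmental parameters",
    },
    conclusion="opening the Cave of Altamira to the public may damage the drawings",
)


def render(instance):
    """Format the instance as labelled lines, premises first."""
    lines = [f"{role}: {text}" for role, text in instance.premises.items()]
    lines.append(f"Conclusion: {instance.conclusion}")
    return "\n".join(lines)


print(render(altamira))
```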
The features of these common patterns of argument provide us with a way in which
to both identify that an argument is advanced and determine its structure. By using
the specific nature of each component proposition in a scheme, we can identify
where a particular scheme is being used and classify the propositions accordingly,
thereby gaining a deeper understanding of the reasoning expressed in a piece of text.
Argument schemes can be a strong feature in argument mining and in the
reconstruction of enthymemes (understood narrowly as arguments with unexpressed
premises) (Feng & Hirst, 2011). To maximise their efficacy, the number and
variation of individual schemes in annotated argument corpora should be as large as
possible. Existing annotations, however, tend to use restricted sets of scheme types,
while struggling to obtain reliable annotation results. For example, Duschl (2007)
initially adopts a selection of nine argument schemes described by Walton (1996)
for his annotation of transcribed middle-school student interviews about science fair
projects. Later, however, he collapses several schemes into four more general classes
no longer directly related to particular scheme types. This deviation from Walton’s
typology appears to be motivated by the need to improve annotation agreement.
Similarly, Song et al. (2014) base their annotation on a modification of Walton’s
typology, settling on a restricted set of three more general schemes: policy, causal,
3 This example was adapted and translated from the Cultural Heritage—Altamira corpus, available
at http://corpora.aifdb.org/Altamira.


and sample, while Anthony and Kim (2015) employ a bespoke set of nine coding
labels modified from the categories used by Duschl (2007) and nine schemes
described in a textbook by Walton (2006).
Visser et al. (2020b) develop an annotation procedure that aims to stay close to
Walton’s original typology, while facilitating the reliable annotation of a broad range
of argument schemes. The main principle guiding the annotation is the clustering
of argument schemes on the basis of intuitively clear features recognisable to
annotators. Due to the strong reliance on the distinctive properties of arguments that
are characteristic for a particular scheme, the annotation procedure bears a striking
resemblance to methods for biological taxonomy—the identification of organisms in
the various sub-fields of biology (see, e.g., Voss, 1952; Pankhurst, 1978). Drawing
on the biological analogue and building on the guidelines used by Visser et al.
(2018b), they develop a taxonomic key for the identification of argument schemes
in accordance with Walton’s typology: the Argument Scheme Key (ASK).
The ASK is a dichotomous identification key that leads the analyst through
a series of disjunctive choices based on the distinctive features of a ‘species’
of argument scheme to the particular type. Starting from the distinction between
source-based and other arguments, each further choice in the key leads to either a
particular argument scheme or to a further distinction. The distinctive characteristics
are numbered, listing between brackets the number of any not directly preceding
previous characteristic that led to this particular point in the key.
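
The traversal of a dichotomous key can be sketched as a walk through a binary tree. The sketch below is not the ASK itself: only the first distinction (source-based versus other arguments) comes from the description above, whereas the remaining questions are placeholders, and the placement of the four leaf schemes in this toy key is hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Choice:
    """A yes/no question leading either to a scheme name or to a further choice."""
    question: str
    yes: Union["Choice", str]
    no: Union["Choice", str]


# Toy key for illustration; only the first question reflects the text above.
TOY_KEY = Choice(
    "Does the argument rely on a source?",
    yes=Choice(
        "Is the source presented as an expert?",          # hypothetical question
        yes="Argument from Expert Opinion",
        no="Argument from Position to Know",
    ),
    no=Choice(
        "Does the argument rely on a causal relation?",   # hypothetical question
        yes="Argument from Cause to Effect",
        no="Argument from Analogy",
    ),
)


def identify(key, answers):
    """Follow a sequence of True/False answers through the key.

    Returns the scheme reached and the list of (question, answer) pairs taken.
    """
    node = key
    path = []
    for answer in answers:
        if isinstance(node, str):
            break
        path.append((node.question, answer))
        node = node.yes if answer else node.no
    if not isinstance(node, str):
        raise ValueError("ran out of answers before reaching a scheme")
    return node, path


scheme, path = identify(TOY_KEY, [False, True])
print(scheme)
```

Recording the path alongside the result keeps the annotator's sequence of choices available for later inspection, which is one plausible way of making scheme annotations easier to audit.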
To further simplify the annotation of argument schemes, Lawrence et al. (2019)
develop a software solution that takes the user through the ASK in a series of
binary choices to result in a suggested scheme. This ASK Assistant is integrated
in the Online Visualisation of Argument (OVA) tool (Janier et al., 2014), a web
browser based application for analysing and annotating the argumentative structure
of natural language text. OVA4 has over 3000 individual users in 38 countries,
analysing argumentative texts ranging from online discussions (Lawrence et al.,
2017) to election debates (Visser et al., 2019).

12.3.2 Linguistic Features
Discourse markers are explicitly stated linguistic expressions of the relationship
between statements (Webber et al., 2011), and, when present, provide strong
indicators of argumentative structure (van Eemeren et al., 2007). For instance, if
we consider Example (2), then this can be split into two separate propositions “the
Palace of Culture and Science in Warsaw should not be demolished” and “it [the
Palace of Culture and Science in Warsaw] houses many public institutions”.5 The

4 http://ova.arg.tech.
5 Example adapted from Budzynska et al. (2021).


presence of the discourse marker “because” between these two propositions is a
clear indication that the second is being employed as a reason for the first.
(2) The Palace of Culture and Science in Warsaw should not be demolished,
because it houses many public institutions.

Discourse indicators have been successfully leveraged as a component of argument
mining techniques. For example, Stab and Gurevych (2014b) used indicators as
a feature in multiclass classification of argument components, with each clause
classified as a major claim, claim or premise, or as non-argumentative. Similar
indicators are used by Wyner et al. (2012), along with domain terminology (e.g.
product names and properties), to highlight potential argumentative sections of
online product reviews. However, there has been little study of how well indicators
perform on their own, how frequently they occur in real-world text, and how well
different individual indicators map to specific argumentative relations.
There are many different ways in which indicators can appear, and a wide
range of relations which they can suggest (Knott, 1996). Lawrence and Reed
(2017) limit their search to specific terms indicating support or attack relations
between a pair of propositions. Specifically, they consider those indicators which
show an argumentative relation between sequential propositions of the form A
[indicator] B (as we saw in Example (2)) or [indicator] A B (e.g. “Because the
Palace of Culture and Science in Warsaw houses many public institutions, it should
not be demolished”). They also consider the relationship between indicators and
the directionality of the argumentative connections (e.g. A because B suggests a
support relation from the premise B to the conclusion A,
whereas A therefore B suggests a support relation from A to B). In this
work, two sources of candidate discourse indicators were used: an aggregation of
those found in existing literature (Groarke et al., 1997; Knott, 1996), and a
domain-specific list extracted from relations in the US2016 corpora (Visser et al., 2019).
In each case, these lists were extended by including synonyms identified using
WordNet (Miller, 1995). The indicators drawn from existing literature are shown
in Table 12.1.

Table 12.1 Argumentative discourse indicators from existing literature

| Relation type | Direction | Indicators |
|---|---|---|
| support | A → B | So, therefore, accordingly, then, thus, consequently, hence, ergo |
| support | A ← B | Because, since, as |
| conflict | A → B | But, however, nonetheless, nevertheless, still, yet, though, whereas |
| conflict | A ← B | Although, except, despite, albeit |

Surprisingly, the results show that indicators which are commonly mentioned in
the literature as being useful for identifying argumentative structure rarely occur in
the examined data. The indicator “therefore”, for instance, only occurs once within
the entire US2016G1tv corpus (where it does indeed connect two inferentially linked
text spans).
Of those indicators which do appear more frequently in US2016G1tv, most
provide little information. For example, whilst there were 30 instances of the
indicator “so” occurring between adjacent spans, only 37.5% of these instances were
between spans where a support relation exists. A possible explanation for this can
be found in the spoken genre of the US2016G1tv corpus, in which “so” may instead be used
as a linguistic device signalling turn-taking.
The one exception here is the indicator “because”. This indicator appears
between spans 71 times and, of these, 87.3% were connected by a support
relationship. Whilst this is a promising result, and suggests that, in those cases where
“because” occurs, it can tell us with high accuracy the type of connection, it is also
shown that using this method on its own would leave approximately 80% of support
relations (as well as all conflict relations) unidentified.
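
A minimal sketch of indicator-based detection follows, assuming the two surface forms discussed above (A [indicator] B and [indicator] A B) and only the support indicators listed in Table 12.1. The function name `find_support` and the regular expressions are simplifications made for this example, not the implementation of Lawrence and Reed (2017); in particular, short indicators such as "as" and "so" will over-match in real text.

```python
import re

# Support indicators from Table 12.1.
SUPPORT_FORWARD = ("so", "therefore", "accordingly", "then", "thus",
                   "consequently", "hence", "ergo")   # A [ind] B: A supports B
SUPPORT_BACKWARD = ("because", "since", "as")         # A [ind] B: B supports A


def find_support(sentence):
    """Return a dict with premise, conclusion and indicator, or None."""
    text = sentence.strip().rstrip(".")
    for ind in SUPPORT_BACKWARD:
        # "[indicator] A, B": the premise A precedes the conclusion B.
        m = re.match(rf"{ind}\s+(.+?),\s+(.+)$", text, flags=re.IGNORECASE)
        if m:
            return {"premise": m.group(1), "conclusion": m.group(2), "indicator": ind}
        # "A [indicator] B": the premise B follows the conclusion A.
        m = re.match(rf"(.+?),?\s+{ind}\s+(.+)$", text, flags=re.IGNORECASE)
        if m:
            return {"premise": m.group(2), "conclusion": m.group(1), "indicator": ind}
    for ind in SUPPORT_FORWARD:
        # "A [indicator] B": the premise A precedes the conclusion B.
        m = re.match(rf"(.+?),?\s+{ind},?\s+(.+)$", text, flags=re.IGNORECASE)
        if m:
            return {"premise": m.group(1), "conclusion": m.group(2), "indicator": ind}
    return None


example_2 = ("The Palace of Culture and Science in Warsaw should not be "
             "demolished, because it houses many public institutions.")
result = find_support(example_2)
print(result["premise"], "=>", result["conclusion"])
```

A sentence without any listed indicator returns `None`, which is the situation that leaves most support relations unidentified when indicators are used on their own.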
These results are supported by those of earlier work carried out on the Araucaria
corpus (Reed et al., 2008). Focusing on the thirteen most reliable support indicators
and eleven most reliable conflict indicators, Lawrence and Reed (2015) achieved
an overall precision of 0.89, but a recall of only 0.04, concluding that: “discourse
indicators may provide a useful component in an argument mining approach, but,
unless supplemented by other methods, are inadequate for identifying even a small
percentage of the argumentative structure”.
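
The combination of high precision and very low recall can be illustrated with a small calculation. The counts below are hypothetical and were chosen only so that the result mirrors the reported figures; they are not the actual counts from the Araucaria study.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for two collections of relation identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: 1000 annotated relations, of which the indicators find 40,
# plus 5 spurious matches that correspond to no annotated relation.
gold = {("relation", i) for i in range(1000)}
predicted = {("relation", i) for i in range(40)} | {("spurious", i) for i in range(5)}

precision, recall = precision_recall(predicted, gold)
print(round(precision, 2), round(recall, 2))
```

Almost everything the indicators flag is correct (40 of 45), yet they reach only 40 of the 1000 annotated relations, so the method is reliable where it fires but silent almost everywhere else.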

12.4 Annotated Corpora of Argumentation
One of the challenges faced by current approaches to argument mining is the lack
of suitably large quantities of appropriately labelled (or annotated) arguments to
serve as training and test data. Techniques based on neural networks and
deep learning in particular require vast quantities of data to perform well and to prevent the
system from over-fitting to the data, that is, fitting to a limited arbitrary text sample at the
expense of wider applicability. Several recent efforts have been made to improve this
situation through the creation of annotated text corpora and argument datasets across
a range of different communicative domains. These efforts can be broken down into
two main categories: manually annotated corpora of argumentative components and
structure found in natural language text; and manually or automatically generated
corpora.
In this section, we will discuss some of the prototypical and most widely used
annotated and generated text corpora for argument mining. We will inevitably
leave out a great many alternatives, but chose this sample to give a reasonable
introduction—a wider overview is presented by, e.g., Lawrence and Reed (2020),
and some of the datasets not discussed independently in this chapter can still be
found in Table 12.2.

Table 12.2 Significant argumentation datasets available online

| Name | Description | Size | IAA | URL | Reference |
|---|---|---|---|---|---|
| AIFdb Corpora | | | | | |
| Argumentation schemes | Examples of occurrences of Walton’s argumentation schemes found in episodes of the BBC Moral Maze Radio 4 programme. | 6704 words | Single annotator | http://corpora.aifdb.org/schemes | Lawrence and Reed (2016) |
| Digging by debating | Collection of analyses of nineteenth century philosophical texts from the Hathi Trust collection. | 35,789 words | Single annotator | http://corpora.aifdb.org/dbyd | Murdock et al. (2017) |
| Dispute mediation | Argument maps of mediation session transcripts | 26,923 words | κ = 0.68 | http://corpora.aifdb.org/mediation | Janier and Reed (2016) |
| MM2012 | Analyses of all episodes from the 2012 summer season of the BBC Moral Maze Radio 4 programme. | 29,068 words | κ = 0.55 (types), 0.61 (relations) | http://corpora.aifdb.org/mm2012 | Budzynska et al. (2014) |
| US2016 | 2016 US presidential elections: annotations of selected excerpts of primary and general election debates, combined with annotations of selected excerpts of corresponding Reddit comments. | 87,064 words | κ = 0.75 | http://corpora.aifdb.org/US2016 | Visser et al. (2018a) |
| AraucariaDB | An import of 661 argument analyses produced using Araucaria and stored in the Araucaria database. | 62,881 words | Single annotator | http://corpora.aifdb.org/araucaria | Reed (2006) |
| AraucariaDBpl | A selection of over 50 Polish language analyses created using the Polish version of Araucaria. | 2,654 words | Single annotator | http://corpora.aifdb.org/araucariapl | Budzynska (2011) |
| Imported into AIFdb | | | | | |
| eRulemaking | Argument maps of 67 comment threads from regulationroom.org. | 26,083 words | κ = 0.73 | http://corpora.aifdb.org/RRD | Park and Cardie (2014) |
| Internet argument corpus (IAC) | Consisting of 11,000 discussions and developed for research in political debate on internet forums. Subsets of the data have been annotated for topic, stance, agreement, sarcasm, and nastiness among others. | 1,031,398 words | κ = 0.22-0.60, κ̄ ≈ 0.47 | http://corpora.aifdb.org/IAC | Walker et al. (2012) |
| Language of opposition | Used in Rutgers for the SALTS project (http://salts.rutgers.edu/). | 48,666 words | Not reported | http://corpora.aifdb.org/looc1 | Ghosh et al. (2014) |
| Microtext | 112 manually created, short texts with explicit argumentation, and little argumentatively irrelevant material. | 7828 words | κ = 0.83 | http://corpora.aifdb.org/Microtext | Peldszus (2014) |
| Available elsewhere | | | | | |
| Argument annotated essays | The corpus consists of argument annotated persuasive essays including annotations of argument components and argumentative relations. | 147,271 words | κ = 0.64-0.88 (types), 0.71-0.74 (relations) | https://bit.ly/2OlRZnt | Stab and Gurevych (2017) |
| Argument annotated user-generated web discourse | User comments, forum posts, blogs and newspaper articles annotated with an argument scheme based on an extended Toulmin model | 84,673 words | α_U = 0.51-0.80 | https://bit.ly/2vdkHOD | Habernal and Gurevych (2017) |
| Consumer debt collection practices (CDCP) | User comments about rule proposals by the Consumer Financial Protection Bureau collected from an eRulemaking website | ∼88,000 words | α = 0.65 (types), 0.44 (relations) | http://joonsuk.org | Niculae et al. (2017) |
| Internet argument corpus (IAC) 2 | Corpus for research in political debate on internet forums. It includes topic annotations, response characterizations, and stance. | ∼500,000 forum posts | Not reported | https://nlds.soe.ucsc.edu/iac2 | Abbott et al. (2016) |
| IBM project debater datasets | Collection of annotated data sets developed as part of Project Debater to facilitate this research. Organized by research sub-fields. | Various | Various | https://ibm.co/2OlqieA | Rinott et al. (2015), Levy et al. (2017) etc. |

The Internet Argument Corpus (IAC) consists of ∼390,000 posts in ∼11,000
online discussions, totalling some 73,000,000 words (Walker et al., 2012). Subsets
of the data have been annotated with a variety of labels such as topic, stance,
agreement, sarcasm, and nastiness. The IAC is further developed in the IAC Version
2 (Abbott et al., 2016), a collection of corpora for research in political debate on
internet forums. It consists of three datasets: 4forums (∼414K posts), ConvinceMe
(∼65K posts), and a sample from CreateDebate (∼3K posts). The annotation
includes topics, response characterisations, and stance classification. The detail of
argument annotation in both IAC datasets is still rather limited in comparison to that
available in other datasets.
One of the ways in which others have succeeded in creating corpora with more
detailed argument annotation is by narrowing down the scope and focusing on
a specific domain. Green (2014), for instance, creates a freely available corpus
of open-access, full-text scientific articles from the biomedical genetics research
literature, annotated to support argument mining applications. However, there
are challenges to creating such corpora, such as the extensive use of biological,
chemical, and clinical terminology in the BioNLP domain requiring specialist
annotators trained in this field (Green, 2015). Legal texts constitute another highly
specialised domain. Walker et al. (2014) mark up successful and unsuccessful
patterns of argument in U.S. judicial decisions. Building on a corpus of vaccine-injury compensation cases that report fact-finding about causation, based on both
scientific and non-scientific evidence and reasoning, patterns of reasoning are
identified and used to illustrate the difficulty of developing a type or annotation
system for characterising these patterns.
In the development of the Argument Annotated Essays Corpus (AAEC), Stab
and Gurevych (2014a) leverage the inherent argumentative nature of a particular
text genre. The AAEC consists of argument-annotated persuasive essays, featuring
not just topic and stance identification, and annotation of argument components and
relations, but also persuasiveness scores for (a selection of) the arguments. Drawn
from 90 English language essays, the initial AAEC corpus comprises 90 major
claims, 429 claims, and 1033 premises, connected by 1312 support and 161 attack
relations—with the second version of the AAEC (Stab and Gurevych, 2017) further
extending this to 402 essays, 751 major claims, 1506 claims, and 3832 premises,
connected by 3613 support and 219 attack relations. The persuasiveness annotation
by Carlile et al. (2018) also includes scores for attributes that potentially impact
persuasiveness: Eloquence, Specificity, Relevance, and Evidence, and the means of
persuasion—Ethos, Pathos and Logos. The usefulness of this addition to AAEC has
been demonstrated in the development of automated methods for persuasiveness
scoring of essays (Ke et al., 2018).
Another corpus focusing on persuasive essays is the generated corpus of argumentative “microtexts”. Peldszus (2014) creates this corpus by tasking participants
to write approximately five segments in which: all segments are argumentatively
relevant; there is a segment acting as the main claim of the text; all other segments
are supporting/attacking the main claim or another segment; and at least one
possible objection to the claim is considered in the text. Whilst this method of
generating textual data produces very clear examples of argumentation, the artificial
nature of its construction means that results obtained on the dataset may not


generalise well to naturally occurring unrestricted text. Nonetheless, the Microtext
corpus provides a valuable resource for controlled ‘laboratory’ testing of argument
mining techniques.
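
The construction constraints listed above can be read as a checklist and verified mechanically. The following sketch assumes a simple representation in which every segment has a role and, unless it is the main claim, a target segment; the `Segment` class, the role names, and the example text are illustrative assumptions, not the format of the Microtext corpus. It also treats an attack segment as the sign that an objection is considered and requires exactly one main claim, both of which are simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    id: int
    text: str
    role: str                      # "main_claim", "support", or "attack"
    target: Optional[int] = None   # id of the segment that is supported/attacked


def check_microtext(segments):
    """Return a list of violated constraints (empty if the text is well-formed)."""
    problems = []
    ids = {s.id for s in segments}
    if sum(s.role == "main_claim" for s in segments) != 1:
        problems.append("expected exactly one main claim")
    for s in segments:
        if s.role in ("support", "attack"):
            if s.target not in ids or s.target == s.id:
                problems.append(f"segment {s.id} has no valid target")
        elif s.role != "main_claim":
            problems.append(f"segment {s.id} is not argumentatively relevant")
    if not any(s.role == "attack" for s in segments):
        problems.append("no objection is considered")
    return problems


# Invented five-segment example.
example = [
    Segment(1, "The cave should remain closed to the public.", "main_claim"),
    Segment(2, "Visitors alter the environmental parameters.", "support", 1),
    Segment(3, "Stable conditions are essential to the drawings.", "support", 2),
    Segment(4, "Closure limits public access to heritage.", "attack", 1),
    Segment(5, "A replica offers access without the risk.", "attack", 4),
]
print(check_microtext(example))
```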
Whilst the previously discussed datasets can be viewed as fully structured
argument data, there is an increasing usage of larger semi-structured sources. One
source for such data is the ChangeMyView6 (CMV) Reddit subcommunity, the
argumentative nature of which has been successfully leveraged for gathering semi-structured data by, amongst others, Hidey and McKeown (2018). The data takes the
form of discussion threads where the original poster of a thread provides a viewpoint
on a specific topic, and other users reply with comments aiming to change this view.
If the original poster finds that a comment succeeds in changing their viewpoint,
they can reply with a ∆ (delta) indicating this. The textual CMV data contains strong
indicators of arguments and counterarguments (Hua & Wang, 2017).
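
As a rough sketch of how such semi-structured threads might be exploited, the code below walks a comment tree and collects the comments that the original poster answered with a ∆. The `Comment` class and the toy thread are assumptions for illustration; they do not reflect the Reddit API or the data format used by Hidey and McKeown (2018).

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    replies: list = field(default_factory=list)


def persuasive_comments(original_poster, comments):
    """Return every comment that received a ∆ reply from the original poster."""
    found = []
    for comment in comments:
        if any(r.author == original_poster and "∆" in r.text
               for r in comment.replies):
            found.append(comment)
        # Deltas can also be awarded deeper in the thread.
        found.extend(persuasive_comments(original_poster, comment.replies))
    return found


# Invented toy thread with two top-level comments.
thread = [
    Comment("user_a", "Replicas give access without the risk.", replies=[
        Comment("op", "∆ I had not considered replicas."),
    ]),
    Comment("user_b", "Caves belong to everyone.", replies=[
        Comment("op", "That does not address preservation."),
    ]),
]

winners = persuasive_comments("op", thread)
print([c.author for c in winners])
```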
To support the creation and curation of argument datasets, software infrastructure
has been developed, including tools for argument annotation and online repositories.
AIFdb7 is an online, freely accessible database of annotated argumentative texts
(Lawrence et al., 2012). Arguably the most comprehensively annotated collection
of such data, AIFdb contains a range of independent annotated corpora, comprising
over 2.2m words and 200,000 claims in fourteen different languages8 and over
20,000 argument maps compliant with the Argument Interchange Format (AIF, see
next Section) (Chesñevar et al., 2006). In Table 12.2, we survey some of the corpora
contained in AIFdb—both native AIF and imported—as well as some of the main
online corpora available elsewhere.

12.5 Argument Interchange Format
Argumentation theory is a large and diverse field stretching from analytical philosophy to communication theory and social psychology. The computational investigation of the space has multiplied that spectrum by a diversity of its own in semantics,
logics and inferential systems. One of the problems associated with the diversity
and productivity of the field, however, is fragmentation: with many researchers
from various backgrounds focusing on different aspects of argumentation, it is
increasingly difficult to reintegrate results into a coherent whole. This in turn makes
it difficult for new research to build upon old. Furthermore, the large variation
in theoretical interpretations of argumentative concepts leads to idiosyncratic
labels in annotated datasets. To tackle such problems, the computational argument
community built a common ontology for argument to support interchange between

6 https://www.reddit.com/r/changemyview/.
7 http://www.aifdb.org.
8 Amharic, Chinese, Dutch, English, French, German, Hindi, Italian, Japanese, Polish, Portuguese,
Russian, Spanish and Ukrainian.

different research projects and applications in the area: the Argument Interchange
Format (AIF) (Chesñevar et al., 2006).
Owing to its roots in computational argumentation, a main aspiration of the AIF
is to facilitate data interchange among various tools and methods for argument
analysis, manipulation and visualisation. Whilst the ideal of a single format might
not be feasible in such a diverse field, a common consensus on the standards and
technologies employed is desirable. Furthermore, the AIF project aims to develop
a commonly agreed-upon core ontology that specifies the basic concepts used to
express argumentative information and relations. The purpose of this ontology is
not to replace other languages for expressing argument but rather to serve as an
abstract interlingua that acts as the centrepiece to multiple individual languages for
argumentation. These argument languages can be, for example, logical languages
(e.g. ASPIC’s defeasible logic; Prakken, 2010), visual languages (e.g. Araucaria’s
AML format for diagrams; Reed & Rowe, 2004) or natural language (e.g. as used in
the pragma-dialectical theory; van Eemeren, 2018).
The AIF can be seen as a representation scheme constructed in three layers.
At the most abstract layer, the AIF provides a hierarchy of concepts which can
be used to describe argument structure. This hierarchy describes an argument by
conceiving of it as a network of connected nodes that are of two types: information
nodes that capture data (such as datum and claim nodes in an analysis using the
Toulmin (1958) model, or premises and conclusions in a box-and-arrow analysis
in the style of Freeman (1991), for example), and scheme nodes that describe
passage between information nodes (similar to the application of warrants or rules of
inference). Scheme nodes in turn come in several different guises, including scheme
nodes that correspond to support or inference (or ‘rule application nodes’), scheme
nodes that correspond to conflict or refutation (or ‘conflict application nodes’),
scheme nodes that correspond to rephrase and scheme nodes that correspond
to value judgements or preference orderings (or ‘preference application nodes’).
At this topmost layer, there are various constraints on how components interact:
information nodes, for example, can only be connected to other information nodes
via scheme nodes of one sort or another. Scheme nodes, on the other hand, can be
connected to other scheme nodes directly (in cases, for example, of arguments that
have inferential components as conclusions, e.g. in patterns such as Kienpointner’s
(1992) ‘warrant-establishing arguments’). Inference captured by multiple incoming
scheme nodes thus naturally corresponds to convergent argumentation; that covered
by multiple premises supporting a single incoming scheme node corresponds to
linked argumentation (Walton, 2006).
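The upper-layer constraints described above can be sketched as a small graph structure. The following is a minimal illustration, not the AIF specification or any AIFdb API: the class `AIFGraph`, its method names and the node identifiers are hypothetical, and the classification of support is deliberately simplified to four cases.

```python
from dataclasses import dataclass, field

I_NODE = "I"
SCHEME_NODES = {"RA", "CA", "MA", "PA"}  # inference, conflict, rephrase, preference


@dataclass
class AIFGraph:
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, target id) pairs

    def add_node(self, node_id, node_type):
        if node_type != I_NODE and node_type not in SCHEME_NODES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, source, target):
        # Upper-layer constraint: information nodes connect only via scheme nodes.
        if self.nodes[source] == I_NODE and self.nodes[target] == I_NODE:
            raise ValueError("I-nodes must be connected through a scheme node")
        self.edges.append((source, target))

    def sources(self, target, node_type):
        """Ids of nodes of the given type with an edge into `target`."""
        return [s for s, t in self.edges if t == target and self.nodes[s] == node_type]

    def support_structure(self, claim):
        """Classify the support for an I-node (simplified to four cases)."""
        inferences = self.sources(claim, "RA")
        if not inferences:
            return "unsupported"
        if len(inferences) > 1:
            return "convergent"  # several independent inferences
        premises = self.sources(inferences[0], I_NODE)
        return "linked" if len(premises) > 1 else "single"


graph = AIFGraph()
for node_id in ("c1", "c2", "p1", "p2", "p3", "p4"):
    graph.add_node(node_id, I_NODE)
for node_id in ("ra1", "ra2", "ra3"):
    graph.add_node(node_id, "RA")

# c1 is supported by two separate inferences; c2 by one inference with two premises.
for edge in [("p1", "ra1"), ("ra1", "c1"), ("p2", "ra2"), ("ra2", "c1"),
             ("p3", "ra3"), ("p4", "ra3"), ("ra3", "c2")]:
    graph.add_edge(*edge)

print(graph.support_structure("c1"), graph.support_structure("c2"))
```

Scheme-to-scheme edges are permitted by `add_edge`, which mirrors the case of arguments whose conclusions are themselves inferential components.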
A second, intermediate layer provides a set of specific argumentation schemes
(and value hierarchies, and conflict patterns). Thus, the uppermost layer in the
AIF ontology lays out that presumptive argumentation schemes are types of rule
application nodes, but it is the intermediate layer that cashes those presumptive
argumentation schemes out into Argument from Consequences, Argument from
Cause to Effect and so on. At this layer, the form of specific argumentation schemes
is defined: each will have a conclusion description (such as ‘A may plausibly be
taken to be true’) and one or more premise descriptions (such as ‘E is an expert
in domain D’). Walton’s schemes (Walton, 1996; Walton et al., 2008) have been
developed in full for the AIF (Rahwan et al., 2007).
Finally, the third and most concrete level supports the integration of actual
fragments of argument, with individual argument components (such as strings of
text) instantiating elements of the layer above. At this third layer, an instance of
a given scheme is represented as a rule application node: RA for applications
of rules of inference, CA for conflict scheme applications, MA for rephrases or
transformations, PA for preference schemes, etc. These rule application nodes are
said to fulfil the presumptive argumentation scheme descriptors at the level above.
As a result of this fulfilment relation, premises of the rule application node fulfil the
premise descriptors, the conclusion fulfils the conclusion descriptor, presumptions
can fulfil presumption descriptors, and conflicts can be instantiated via instances
of conflict schemes that fulfil the conflict scheme descriptors at the level above.
Rephrase plays a slightly different role, that of connecting information nodes of
similar propositional content. Again, all the constraints at the intermediate layer
are inherited, and new constraints are introduced by virtue of the structure of the
argument at hand.
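The fulfilment relation between the intermediate and the concrete layer can be illustrated as follows. This is a sketch under stated assumptions: the classes `Scheme` and `SchemeApplication` are hypothetical, the second and third premise descriptors paraphrase the usual statement of the expert-opinion scheme, and the example sentences are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Intermediate layer: the form of one argumentation scheme."""
    name: str
    premise_descriptors: tuple
    conclusion_descriptor: str


@dataclass
class SchemeApplication:
    """Concrete layer: an RA-node instantiating a scheme with actual text."""
    scheme: Scheme
    premises: dict   # premise descriptor -> text of the I-node fulfilling it
    conclusion: str  # text of the I-node fulfilling the conclusion descriptor

    def unfulfilled(self):
        """Premise descriptors that no premise of this application fulfils."""
        return [d for d in self.scheme.premise_descriptors if d not in self.premises]

    def is_complete(self):
        return not self.unfulfilled()


EXPERT_OPINION = Scheme(
    name="Argument from Expert Opinion",
    premise_descriptors=(
        "E is an expert in domain D",
        "E asserts that A is true",
        "A is within domain D",
    ),
    conclusion_descriptor="A may plausibly be taken to be true",
)

# Invented example text; one premise is left implicit by the arguer.
application = SchemeApplication(
    scheme=EXPERT_OPINION,
    premises={
        "E is an expert in domain D": "The excavation director is an expert in ceramics",
        "E asserts that A is true": "The director says the sherds are Bronze Age",
    },
    conclusion="The sherds are Bronze Age",
)
missing = application.unfulfilled()
print(missing)
```

Listing unfulfilled descriptors in this way is one possible means of surfacing implicit premises in an analysed argument.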

12.6 Argument Analytics
Argument Analytics9 provides a suite of automated techniques for quantitatively
processing and visualising characteristics of large sets of analysed argumentative
data (Lawrence et al., 2016). More specifically, the developed methods work with
any data conforming to the Argument Interchange Format, be it pre-annotated data
(from AIFdb, for instance), or the output of argument mining software. Argument
Analytics components range from the detailed statistics required for discourse
analysis or argument mining, to infographic-style representations, offering insights
in a way that is accessible to a general audience. The extendable set of modules
currently comprises: simple statistical data, which provides both an overview of
the argument structure and frequencies of patterns such as argumentation schemes;
dialogical data highlighting the behaviour of participants of the dialogue; and
real-time data allowing for the graphical representation of an argument structure
developing over time. Together these analytics open an avenue to giving feedback
on live debates, generating hypotheses from large sets of evidence, producing
summaries of citizen science, and more.
The Argument Analytics platform is designed specifically for making sense
out of argument data represented according to the AIF such as the data stored in
the AIFdb10 database (Lawrence et al., 2012). AIFdb Corpora enables Argument
Analytics to display the interpretations of data, whether on a single AIF argument

9 http://analytics.arg.tech.
10 http://www.aifdb.org.

map (stored in AIFdb as a NodeSet), or a large corpus containing hundreds or
thousands of such AIF representations.

12.6.1 Simple Statistics
The simple statistics modules allow an analyst to quickly make sense of a large
amount of annotated argument data. Although these calculations are straightforward
and relatively easy to automate, they nevertheless provide interesting insights into
the data.
The overview page shows a range of statistics, offering a rapidly digested
summary of the overall argumentative structure. The number of Information nodes
provides an indication of the overall size of the analysis. The average number of
words per Information Node illustrates the complexity of the ideas presented, and
how succinctly they are expressed. The numbers of inference (RA) and conflict (CA)
nodes give a suggestion as to the nature of the dialogue, which is further expanded
by showing the ratios of RA to CA (capturing how diverse the perspectives in
the debate are) and RA to I (how dense the argumentation is). From these ratios it is
possible to get an idea of three things: how closely related the points being
made are, with low ratios of RA and CA to I-nodes suggesting an argument that is quite loose
and fragmented; the levels of conflict and agreement; and, perhaps, how contentious
the issue being discussed is, with a higher ratio of CA to RA suggesting a more
contentious issue.
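The overview statistics are simple to compute once the nodes of an analysis are available. The sketch below assumes a hypothetical input format, a list of `(node_type, text)` pairs, rather than the actual AIFdb export; the node texts are invented.

```python
def overview(nodes):
    """Compute overview statistics from (node_type, text) pairs."""
    i_texts = [text for node_type, text in nodes if node_type == "I"]
    ra = sum(1 for node_type, _ in nodes if node_type == "RA")
    ca = sum(1 for node_type, _ in nodes if node_type == "CA")

    def ratio(a, b):
        return a / b if b else None  # undefined when the denominator is zero

    words = sum(len(text.split()) for text in i_texts)
    return {
        "i_nodes": len(i_texts),
        "avg_words_per_i_node": ratio(words, len(i_texts)),
        "ra_nodes": ra,
        "ca_nodes": ca,
        "ra_to_ca": ratio(ra, ca),
        "ra_to_i": ratio(ra, len(i_texts)),
        "ca_to_ra": ratio(ca, ra),
    }


nodes = [
    ("I", "the wall is Roman"),
    ("I", "the mortar is lime-based"),
    ("I", "the coins date later"),
    ("I", "the site was reused"),
    ("RA", ""), ("RA", ""), ("CA", ""),
]
stats = overview(nodes)
print(stats)
```

In this toy analysis there are two inferences for every conflict and one inference for every two information nodes.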
The Pattern Count modules expand on the overview to give detailed statistics
suitable for more in-depth argument and discourse analysis. They provide the
frequencies of commonly occurring patterns, split into two categories. Firstly,
argumentative and illocutionary patterns which describe both the nature of the
interactions, for example levels of agreement and disagreement, and the way in
which participants have expressed themselves and interacted with each other, such
as how frequently a participant questions the statements of others compared to how
frequently they assert their own views. The second category, dialogical patterns,
illustrates the flow of the discourse and gives an indication of any dialogical rules,
either explicit or implicit, to which the participants are conforming. Such dialogical
patterns are also useful, for instance, to show cross-cultural differences in dialogue,
or differences in the formality and setting of dialogues.

12.6.2 Comparative Statistics
Comparative statistics modules (Duthie et al., 2016) allow for the validation of
both manual and automatic argument analysis. Such calculations enable comparison
between two manual analyses to determine the efficacy of annotation guidelines via
inter-annotator agreement (e.g. Cohen’s kappa; Cohen, 1960), or the comparison
of results from automatic techniques to a manually created gold standard (e.g.
precision, recall and F1-score; van Rijsbergen, 1979). The examples given in this
section refer to two human annotators, but in each case the same calculations could
be applied with one of these being an annotation produced by an automatic system.
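For reference, the two families of measures mentioned here can be computed as in the following sketch. It uses the standard definitions of Cohen's kappa and of precision, recall and F1; the label sequences and relation pairs are invented, and the code is not taken from the Argument Analytics implementation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both annotators must label the same non-empty item list")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's label distribution.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def precision_recall_f1(predicted, gold):
    """Compare a set of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two annotators labelling the relation between six pairs of segments.
annotator_1 = ["RA", "RA", "CA", "none", "RA", "CA"]
annotator_2 = ["RA", "CA", "CA", "none", "RA", "RA"]
kappa = cohens_kappa(annotator_1, annotator_2)

# Inference relations (premise, conclusion) found automatically versus by hand.
system = {("p1", "c1"), ("p2", "c1"), ("p3", "c2")}
manual = {("p1", "c1"), ("p3", "c2"), ("p4", "c2"), ("p5", "c3")}
precision, recall, f1 = precision_recall_f1(system, manual)
print(round(kappa, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```

Both functions presuppose that the two analyses refer to the same units, which is precisely the assumption that differing segmentations call into question.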
There are a number of considerations that must be taken into account when
calculating agreement, such as what effect a differing segmentation of the original
text, in two separate annotations, may have on the assignment of inference and
conflict in an argument structure. To account for this, the agreement and results calculations were split into smaller sub-calculations covering segmentation similarity,
propositional contents (inference and conflict) and dialogical contents (locutions).
Calculating agreement for segmentation of argumentative units is a challenging task
(Wacholder et al., 2014). The modular architecture of Argument Analytics allows
for a range of measures to be displayed, and currently differences are accounted for
using various segmentation similarity algorithms, which give an overall normalised
score for the similarity. Propositional contents are compared by separating nodes
from the text and using the Levenshtein distance to match nodes across the two analyses.
Dialogical contents are compared in the same way, with word ordering added to the
Levenshtein distance for node matching, together with further calculations
for the intricacies of dialogue (see Duthie et al., 2016, for an in-depth description of
the comparative statistics module).
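Node matching by Levenshtein distance can be sketched as follows. The edit-distance function is the standard dynamic-programming algorithm; the greedy pairing strategy and the `max_ratio` threshold are assumptions made for illustration and should not be read as the procedure of Duthie et al. (2016).

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_nodes(nodes_a, nodes_b, max_ratio=0.3):
    """Greedily pair each text in nodes_a with its closest unused text in nodes_b.

    Returns a dict mapping indices in nodes_a to indices in nodes_b. A pair is
    accepted only if the distance, normalised by the longer text, is at most
    max_ratio (an assumed threshold).
    """
    matches = {}
    unused = set(range(len(nodes_b)))
    for i, text_a in enumerate(nodes_a):
        best = None
        for j in sorted(unused):
            distance = levenshtein(text_a, nodes_b[j])
            ratio = distance / max(len(text_a), len(nodes_b[j]), 1)
            if ratio <= max_ratio and (best is None or ratio < best[0]):
                best = (ratio, j)
        if best is not None:
            matches[i] = best[1]
            unused.remove(best[1])
    return matches


# Two annotators segmented the same (invented) text slightly differently.
analysis_1 = ["the wall is Roman", "the coins are later"]
analysis_2 = ["coins are later", "the wall is roman"]
matches = match_nodes(analysis_1, analysis_2)
print(matches)
```

Once nodes are paired in this way, the relations attached to them can be compared with the agreement measures discussed earlier in this section.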

12.6.3 Dialogue Statistics
For those argument analyses where there is a dialogue taking place between
multiple participants, a range of dialogue analytics modules are able to provide
insights into the dynamics of the discourse, and make these complex interactions
accessible to a general audience. There is growing demand to present complex
argumentative structures to a broad audience in ways which are both intuitive
and interactive. Whilst there is some progress towards this goal, for example, the
Election Debate Visualisation Project (Plüss & De Liddo, 2015), many of these
approaches rely on custom, genre-specific interfaces for both the elicitation and
display of argumentative structure. Dialogue oriented analytics modules make use of
both the locution details stored in AIFdb, as well as the participant details provided
by the Argument Web social layer (Snaith et al., 2017).
Each of the modules in this section is illustrated using data from an episode of
the BBC Radio 4 programme Moral Maze.11 These examples show how such graphical
displays of information can take the technical details captured in the argumentative
structure of a complex debate, and present them in ways which are easily processed
by a general audience.

11 http://www.bbc.co.uk/programmes/b006qk11.

Fig. 12.1 Graphical representations of the relative involvement of each participant in a dialogue,
and how stimulating the points made by each participant are

The structural statistics modules extract particular facets of the argumentative
structure in order to display data such as who is speaking most, which pairs
of participants are interacting most and who is making the most well supported
arguments. As such, they provide a greater insight into the argumentative structure
than that which is afforded by looking at a simple argument map of the same data.
For each participant, the number of locutions they have made is counted
and represented in a bar chart. This provides an easy way of identifying which
participants were most, and least, dominant within a dialogue. An example can be
seen in Fig. 12.1, which shows that Jan Macvarish was the most active participant in
this dialogue with twenty-three locutions, whereas Matthew Taylor was least active
with only one locution made.
A point of debate is stimulating if it receives responses, either to agree or
disagree. From the analysed argument structure, we count the number of locutions
which each participant has made that have at least one response, and those which
have been ignored by the other participants. The example in Fig. 12.1 shows that
whilst Claire Fox has only made three locutions, they have all been responded to
in some way, whereas, of the six locutions made by Clifford Longley, only two
received any attention from the other participants.
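Both counts can be derived from a list of locutions with response links. In the sketch below, the record format (dictionaries with `id`, `speaker` and `reply_to` keys) and the anonymous speakers are assumptions for illustration; a participant's reply to their own locution is not counted as a response.

```python
from collections import Counter


def participation(locutions):
    """Per speaker: locutions made, and how many were responded to or ignored."""
    speaker_of = {loc["id"]: loc["speaker"] for loc in locutions}
    # A locution is responded to if another participant replied to it.
    responded_ids = {
        loc["reply_to"]
        for loc in locutions
        if loc["reply_to"] in speaker_of and speaker_of[loc["reply_to"]] != loc["speaker"]
    }
    made = Counter(loc["speaker"] for loc in locutions)
    responded = Counter(loc["speaker"] for loc in locutions if loc["id"] in responded_ids)
    return {
        speaker: {
            "locutions": total,
            "responded": responded[speaker],
            "ignored": total - responded[speaker],
        }
        for speaker, total in made.items()
    }


locutions = [
    {"id": "l1", "speaker": "A", "reply_to": None},
    {"id": "l2", "speaker": "B", "reply_to": "l1"},
    {"id": "l3", "speaker": "A", "reply_to": "l2"},
    {"id": "l4", "speaker": "C", "reply_to": None},
    {"id": "l5", "speaker": "A", "reply_to": "l3"},  # self-reply, not a response
]
summary = participation(locutions)
print(summary)
```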
The chord diagram shows the interaction between participants. A chord diagram
is a graphical method of displaying the inter-relationships between data in a matrix.
The data is arranged radially around a circle with the relationships between the
points drawn as arcs connecting the data together. In this case, the arcs represent
interaction between participants, with the width of the arc at each end representing
the number of locutions made by that participant to which the connected participant
has responded. Viewing the interactions in this way makes it easy to identify, for
example, cliques. An example chord diagram can be seen in Fig. 12.2. Clicking
on a specific participant emphasises their connections with other participants. For

Fig. 12.2 Chord diagrams showing the frequency of interactions between participants. The
diagram on the right shows Melanie Philips selected, highlighting just those interactions in which
she is involved

Fig. 12.3 Graphical representation of the turn structure in a dialogue, highlighting the way in
which each participant introduces themselves, followed by direct interactions between two pairs of
participants

example, with Melanie Philips selected (as shown on the right of the figure), we can
see that the majority of her interactions were with Jan Macvarish, reflecting the fact
that, for a period of the dialogue, Melanie was questioning Jan.
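The matrix underlying such a chord diagram can be built as in the following sketch, where `matrix[p][q]` is the number of locutions made by participant `p` to which participant `q` has responded. The locution record format and the speakers are hypothetical, and rendering the diagram itself is left to a plotting library.

```python
from collections import defaultdict


def interaction_matrix(locutions):
    """matrix[p][q] = number of locutions by p that q responded to."""
    speaker_of = {loc["id"]: loc["speaker"] for loc in locutions}
    pairs = set()
    for loc in locutions:
        target = loc["reply_to"]
        if target in speaker_of and speaker_of[target] != loc["speaker"]:
            # Count each locution at most once per responding participant.
            pairs.add((target, loc["speaker"]))
    matrix = defaultdict(lambda: defaultdict(int))
    for target, responder in pairs:
        matrix[speaker_of[target]][responder] += 1
    return {speaker: dict(row) for speaker, row in matrix.items()}


locutions = [
    {"id": "l1", "speaker": "A", "reply_to": None},
    {"id": "l2", "speaker": "B", "reply_to": "l1"},
    {"id": "l3", "speaker": "B", "reply_to": "l1"},  # second reply by B to l1
    {"id": "l4", "speaker": "A", "reply_to": "l2"},
    {"id": "l5", "speaker": "C", "reply_to": "l1"},
]
matrix = interaction_matrix(locutions)
print(matrix)
```

Because the matrix is not symmetric, the two ends of an arc between `p` and `q` can have different widths, as in the diagrams described above.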
Similar to the average number of words per I-node presented in the overview,
verbosity shows a comparison of the average length of locutions made by each
participant. By comparing in this way, we are able to see not just the overall
complexity of the ideas expressed, but also how prolix or concise each participant
is in presenting their ideas.
Temporal statistics use the time-stamping of locutions provided by AIFdb to
show how the state of a dialogue has altered as it has progressed. These statistics
provide clues, not easily discernible from an argument map, as to when individual
participants have been most involved in the dialogue, when conflict has arisen, and
changes in topic that have occurred as the dialogue progresses.
Using the timestamping of locutions provided by AIFdb, a graphical representation of the turn structure in a dialogue is created. This visualisation provides a
quick overview of when each participant has been most active, suggesting details of
any pre-defined turn-taking rules. The example shown in Fig. 12.3 reflects the turn
structure in a Moral Maze episode. As the episode begins, each of the four regular
panelists speaks briefly about the topic being discussed. A guest witness is then
introduced and, after providing their own views on the topic, is questioned
first by one of the panelists and then by a second.
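A turn structure of this kind can be recovered by grouping consecutive locutions by speaker. The sketch below assumes locutions are available as `(timestamp, speaker)` pairs, with timestamps given in seconds from the start of the recording; both the format and the data are invented for illustration.

```python
from itertools import groupby


def turns(locutions):
    """Group time-ordered locutions into turns.

    Returns a list of (speaker, start, end, number_of_locutions) tuples, where
    a turn is a maximal run of consecutive locutions by the same speaker.
    """
    ordered = sorted(locutions, key=lambda loc: loc[0])
    result = []
    for speaker, run in groupby(ordered, key=lambda loc: loc[1]):
        run = list(run)
        result.append((speaker, run[0][0], run[-1][0], len(run)))
    return result


locutions = [
    (0, "panelist 1"), (5, "panelist 1"), (12, "panelist 2"),
    (20, "panelist 3"), (31, "witness"), (40, "panelist 1"),
    (44, "witness"), (50, "witness"),
]
turn_structure = turns(locutions)
for turn in turn_structure:
    print(turn)
```

Plotting each turn as a horizontal bar from its start to its end time, on one row per participant, gives a simple version of the visualisation described above.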

Semantics-based analytics use Dung-style semantics (Dung, 1995) to determine
the acceptability of a participant’s arguments. An AIF graph is translated into
ASPIC+; then, using TOAST, a Dung-style abstract argumentation framework is
derived and evaluated.
The defended points in a dialogue are those where conflicting points have been
made, but these conflicting points have, in turn, been attacked. It is easy in a
wide-ranging and complex dialogue for points to be made which are not challenged, either
because they go unnoticed or because they are simply dismissed. By looking at those points which
are challenged and then later defended we gain an insight into both the validity of a
point, and how crucial it is to the argument which a participant is making.
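To make the evaluation step concrete, the sketch below computes the grounded extension of a Dung-style abstract argumentation framework and then selects the defended points. It does not reproduce the translation to ASPIC+ or the TOAST system; the choice of grounded semantics, the function names and the toy framework are assumptions for illustration only.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of the framework (arguments, attacks).

    `attacks` is a set of (attacker, target) pairs. The extension is obtained
    by iterating the characteristic function from the empty set until a fixed
    point is reached.
    """
    attackers = {a: {x for x, y in attacks if y == a} for a in arguments}
    accepted = set()
    while True:
        defeated = {y for x, y in attacks if x in accepted}
        # An argument is acceptable if all of its attackers are defeated.
        acceptable = {a for a in arguments if attackers[a] <= defeated}
        if acceptable == accepted:
            return accepted
        accepted = acceptable


def defended_points(arguments, attacks):
    """Accepted arguments that were attacked at least once."""
    attacked = {y for _, y in attacks}
    return grounded_extension(arguments, attacks) & attacked


# Toy framework: b attacks a, c attacks b, d is unchallenged,
# and e and f attack each other.
arguments = {"a", "b", "c", "d", "e", "f"}
attacks = {("b", "a"), ("c", "b"), ("e", "f"), ("f", "e")}

accepted = grounded_extension(arguments, attacks)
defended = defended_points(arguments, attacks)
print(sorted(accepted), sorted(defended))
```

Here `a` is a defended point, since it was challenged by `b` and `b` was in turn attacked by `c`, whereas `d` is accepted only because nobody challenged it. The mutually attacking `e` and `f` are left undecided under grounded semantics.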
Where one participant has more acceptable arguments than another, the former is
said to carry more sway. This value is calculated for each participant, and displayed
as the relative balance in sway between each pair of the most commonly interacting
participants. This can, to some extent, be viewed as showing who is winning a debate:
the participant who best supports their own points and best attacks the points made by the other
participants in the dialogue.

12.6.4 Real-Time Statistics
Many of the modules used in Argument Analytics are able not only to
display data on a fixed, pre-analysed argument structure, but also to update in real time as the structure evolves. This functionality has been used, for example,
in a tool developed for the Built Environment for Social inclusion through the
Digital Economy (BESiDE) project,12 to facilitate round table discussions between
architects working on the design of care environments, and the various stakeholders
involved in the design process.
As the discussion is taking place, the audio is recorded and an analyst uses a
custom-designed interface to segment the dialogue when either the topic or the
speaker changes. A simple dialogue protocol is used, allowing participants to make
moves of various types (e.g. asking questions, agreeing with another participant, and
offering their own opinion), and relating to a set of predefined topics relevant to the
design project.
Throughout the discussion, the dialogue overview shown in Fig. 12.4 is displayed
for all participants to see. This overview includes a transcript of the dialogue on the
right-hand side and, on the left, analytics modules displaying how much each participant has
spoken and which topics have been discussed. In testing these interfaces,
it is interesting to see that they serve not only an informative function, but actually
impact the dynamics of the dialogue. When a participant can see that they are talking
more than everyone else, they tend to let others speak more. When someone hasn’t
spoken yet, the other participants notice this, and make an effort to direct questions

12 http://beside.ac.uk/.

Fig. 12.4 Real-time Argument Analytics highlighting the involvement of individual participants
and the topics discussed

at them. And, when one topic has been less explored than the others, there is a
noticeable shift towards that area in both the questions asked and the points raised.
This ability of the argumentative and dialogical structure not only to represent
the outcome of a discussion, but also to inform the participants and help ensure that
all areas are fully explored, has wide-ranging potential applications. The current
limitation on providing this kind of interface more widely is the ability to perform
real-time analysis. However, as automatic argument mining techniques develop and as tools
improve, such as the Analysis Wall (Bex et al., 2013), which has been used to analyse several
hour-long radio programmes in real time, it is easy to imagine such a live
display accompanying activities such as debates, meetings and media coverage.

12.7 Conclusion
The recent rapid growth in argument mining shows that there is an increasing
demand for the automated extraction of deeper meaning from the vast amounts
of data that we currently produce and consume. Although techniques in opinion
mining are able to tell us which conclusions are drawn, they do not tell us how
those conclusions are supported. There is substantial commercial opportunity here
as businesses increasingly want to build on the data that they gather in order to know
more about the thoughts and behaviours of their customers, and it is unsurprising
that many of the large players in the field are engaging with argument mining, most visibly to date IBM.
One of the primary remaining challenges faced by argument mining is the lack
of annotated argument data represented in a consistent ontology. Much recent
work has focused on producing annotation guidelines targeted at specific domains
(e.g. Kirschner et al., 2015, Walker et al., 2014, and Kiesel et al., 2015), and
whilst this has shown that data from these fields can be consistently annotated,
the use of specific annotation schemes aimed at individual areas means that any
techniques developed using this data are limited to that domain. The volume of
data, particularly data annotated at the most fine-grained level, is still far below
what would be required to apply many of the techniques previously discussed in
a domain independent manner. Attempts are being made to overcome this lack of
data, including the use of crowdsourced annotation (Ghosh et al., 2014; Skeppstedt
et al., 2018) and automatic methods to extend the data currently annotated (Bilu
et al., 2015). As these efforts combine with increasing attention to manual analysis,
the volume of data available should increase rapidly. Schulz et al. (2018) also offer
some solace in this regard, showing how multi-task learning (training models across
datasets from different domains) can improve results in domains where limited
domain-specific annotated data is available. To the best of our knowledge, no large-scale corpora are available yet for Archaeology or Cultural Heritage.
Even in cases where there is a greater volume of data, conflicting notions of
argument are often problematic. In a qualitative analysis of six different, widely
used, argument datasets, Daxenberger et al. (2017) show that each dataset appears
to conceptualise claims quite differently. These results clearly highlight the need for
greater effort in building a framework in which argument mining tasks are carried
out, covering all aspects from agreement on the argument theoretical concepts
being identified, through to uniform presentation of results and data. The Argument
Interchange Format (Chesñevar et al., 2006) is a constructive proposal in this
direction to arrive at a shared ontology of argumentation.
Argument mining remains profoundly challenging, and traditional methods on
their own seem to need to be complemented by stronger, knowledge-driven analysis
and processing. However, the pieces required to successfully automate the process
of turning unstructured data into structured argument are starting to take shape. As
the volume of analysed argument continues to increase, and existing techniques are
further developed and brought together, rapid progress can be expected.
In addition to argument mining, we discussed the Argument Analytics suite,
which provides a comprehensive range of analytic tools from the detailed statistics
required for discourse analysis, to graphic visual representations making the same
data accessible to a general audience. The existing modules which we have
described offer solutions to a broad range of potential user groups, including
those involved in argument analysis and critical discourse analysis, those working
on argument mining applications, scientists working with complex evidence sets,
people performing political or social studies, and members of the general public
who wish to get a greater understanding of the issues and dynamics of a complex
debate.
Acknowledgments This research has received financial support from the grant “Heritage 3.0: Argumentation and Conceptual Modelling for Enhanced Cultural Heritage
Participation and Management Policies” (ACME), grant number PID2020-114758RB-I00, and the project “Deflationist Views in
Ontology and Metaontology”, grant number PID2020-115482GB-I00, both funded by
MCIN/AEI/10.13039/501100011033.

References
Abbott, R., Ecker, B., Anand, P., & Walker, M. A. (2016). Internet argument corpus 2.0: An SQL
schema for dialogic social media and the corpora to go with it. In Proceedings of the 10th
International Conference on Language Resources and Evaluation (LREC), Portoroz (pp. 4445–
4452).
Anthony, R., & Kim, M. (2015). Challenges and remedies for identifying and classifying
argumentation schemes. Argumentation, 29(1), 81–113.
Aristotle. (1958). Topics. Oxford: Oxford University Press.
Bex, F., Lawrence, J., Snaith, M., & Reed, C. (2013). Implementing the argument web. Communications of the ACM, 56(10), 66–73.
Bilu, Y., Hershcovich, D., & Slonim, N. (2015). Automatic claim negation: Why, how and
when. In Proceedings of the 2nd Workshop on Argumentation Mining (pp. 84–93). Denver:
Association for Computational Linguistics.
Budzynska, K. (2011). Araucaria-PL: Software for teaching argumentation theory. In Proceedings
of the Third International Congress on Tools for Teaching Logic (TICTTL 2011), Salamanca
(pp. 30–37).
Budzynska, K., Janier, M., Reed, C., Saint-Dizier, P., Stede, M., & Yaskorska, O. (2014). A model
for processing illocutionary structures and argumentation in debates. In Proceedings of the 9th
Edition of the Language Resources and Evaluation Conference (LREC), Reykjavik (pp. 917–
924).
Budzynska, K., Koszowy, M., & Pereira-Fariña, M. (2021). Associating ethos with objects:
Reasoning from character of public figures to actions in the world. Argumentation, 35(4), 519–
549.
Carlile, W., Gurrapadi, N., Ke, Z., & Ng, V. (2018). Give me more feedback: Annotating
argument persuasiveness and related attributes in student essays. In Proceedings of the 56th
Annual Meeting of the Association for Computational Linguistics. Melbourne: Association for
Computational Linguistics.
Chandler, D. (2003). Semiotics: The basics (1st publ. repr edition). London: Routledge.
Chapman, R., & Wylie, A. (2016). Evidential reasoning in archaeology. London: Bloomsbury
Academic.
Chesñevar, C., McGinnis, J., Modgil, S., Rahwan, I., Reed, C., Simari, G., South, M., Vreeswijk,
G., & Willmott, S. (2006). Towards an argument interchange format. The Knowledge Engineering Review, 21(04), 293–316.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20(1), 37–46.
Daxenberger, J., Eger, S., Habernal, I., Stab, C., & Gurevych, I. (2017). What is the essence of a
claim? Cross-domain claim identification. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing (pp. 2055–2066). Copenhagen: Association for
Computational Linguistics.
Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic
reasoning, logic programming and n-person games. Artificial Intelligence, 77(2), 321–357.
Duschl, R. A. (2007). Quality argumentation and epistemic criteria. In Argumentation in science
education (pp. 159–175). Berlin: Springer.
Duthie, R., Lawrence, J., Budzynska, K., & Reed, C. (2016). The CASS technique for evaluating
the performance of argument mining. In Proceedings of the 3rd Workshop on Argumentation
Mining (pp. 40–49). Berlin: Association for Computational Linguistics.

Feng, V. W., & Hirst, G. (2011). Classifying arguments by scheme. In Proceedings of the
49th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies-Volume 1 (pp. 987–996). Portland: Association for Computational Linguistics.
Freeman, J. B. (1991). Dialectics and the macrostructure of arguments: A theory of argument
structure (vol. 10). Berlin: Walter de Gruyter.
Freeman, J. B. (2011). Argument structure: Representation and theory. Berlin: Springer.
Galassi, A., Kersting, K., Lippi, M., Shao, X., & Torroni, P. (2020). Neural-symbolic argumentation
mining: An argument in favor of deep learning and reasoning. Frontiers in Big Data, 2, 52.
Garssen, B. J. (2001). Argument schemes. In F. H. van Eemeren (ed.), Crucial concepts in
argumentation theory (pp. 81–99). Amsterdam: Amsterdam University Press.
Ghosh, D., Muresan, S., Wacholder, N., Aakhus, M., & Mitsui, M. (2014). Analyzing argumentative discourse units in online interactions. In Proceedings of the First Workshop on
Argumentation Mining (pp. 39–48). Baltimore: Association for Computational Linguistics.
Green, N. (2014). Towards creation of a corpus for argumentation mining the biomedical genetics
research literature. In Proceedings of the First Workshop on Argumentation Mining (pp. 11–18).
Baltimore: Association for Computational Linguistics.
Green, N. (2015). Identifying argumentation schemes in genetics research articles. In Proceedings
of the 2nd Workshop on Argumentation Mining (pp. 12–21). Denver: Association for Computational Linguistics.
Green, N. (2018). Proposed method for annotation of scientific arguments in terms of semantic
relations and argument schemes. In Proceedings of the 5th Workshop on Argument Mining.
Brussels: Association for Computational Linguistics.
Grennan, W. (1997). Informal logic: Issues and techniques. Montreal: McGill-Queen’s Press-MQUP.
Groarke, L., Tindale, C., & Fisher, L. (1997). Good reasoning matters! A constructive approach to
critical thinking. Toronto: Oxford University Press.
Habernal, I., & Gurevych, I. (2017). Argumentation mining in user-generated web discourse.
Computational Linguistics, 43(1), 125–179.
Hastings, A. C. (1963). A reformulation of the modes of reasoning in argumentation. Ph.D. Thesis,
Northwestern University.
Hidey, C., & McKeown, K. (2018). Persuasive influence detection: The role of argument
sequencing. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence,
New Orleans.
Hua, X., & Wang, L. (2017). Neural argument generation augmented with externally retrieved
evidence. In Proceedings of the 55th Annual Meeting of the Association for Computational
Linguistics (pp. 219–230). Vancouver: Association for Computational Linguistics.
Janier, M., Lawrence, J., & Reed, C. (2014). OVA+: An argument analysis interface. In S. Parsons,
N. Oren, C. Reed, & F. Cerutti (Eds.), Proceedings of the Fifth International Conference on
Computational Models of Argument (COMMA 2014) (pp. 463–464). Pitlochry: IOS Press.
Janier, M., & Reed, C. (2016). Corpus resources for dispute mediation discourse. In Proceedings
of the 10th International Conference on Language Resources and Evaluation (LREC), Portoroz
(pp. 1014–1021).
Katzav, J., & Reed, C. (2004). On argumentation schemes and the natural classification of
arguments. Argumentation, 18(2), 239–259.
Ke, Z., Carlile, W., Gurrapadi, N., & Ng, V. (2018). Learning to give feedback: Modeling
attributes affecting argument persuasiveness in student essays. In Proceedings of the TwentySeventh International Joint Conference on Artificial Intelligence, IJCAI-18 (pp. 4130–4136).
Stockholm: International Joint Conferences on Artificial Intelligence Organization.
Kienpointner, M. (1992). Alltagslogik: Struktur und funktion von argumentationsmustern.
Stuttgart: Frommann-Holzboog.
Kiesel, J., Al Khatib, K., Hagen, M., & Stein, B. (2015). A shared task on argumentation mining
in newspaper editorials. In Proceedings of the 2nd Workshop on Argumentation Mining (pp.
35–38). Denver: Association for Computational Linguistics.

288

J. Lawrence et al.

Kirschner, C., Eckle-Kohler, J., & Gurevych, I. (2015). Linking the thoughts: Analysis of
argumentation structures in scientific publications. In Proceedings of the 2nd Workshop on
Argumentation Mining (pp. 1–11). Denver: Association for Computational Linguistics.
Knott, A. (1996). A data-driven methodology for motivating a set of coherence relations. Ph.D.
Thesis, Department of Artificial Intelligence, University of Edinburgh.
Lawrence, J., Bex, F., Reed, C., & Snaith, M. (2012). AIFdb: Infrastructure for the argument web.
In Proceedings of the Fourth International Conference on Computational Models of Argument
(COMMA 2012) (pp. 515–516).Vienna: IOS Press.
Lawrence, J., Duthie, R., Budzysnka, K., & Reed, C. (2016). Argument analytics. In P. Baroni,
M. Stede, & T. Gordon (Eds.), Proceedings of the Sixth International Conference on Computational Models of Argument (COMMA 2016) (pp. 371–378). Berlin. IOS Press.
Lawrence, J., Park, J., Budzynska, K., Cardie, C., Konat, B., & Reed, C. (2017). Using argumentative structure to interpret debates in online deliberative democracy and erulemaking. ACM
Transactions on Internet Technology, 17(3), 25.
Lawrence, J., & Reed, C. (2015). Combining argument mining techniques. In: Proceedings
of the 2nd Workshop on Argumentation Mining (pp. 127–136). Denver: Association for
Computational Linguistics.
Lawrence, J., & Reed, C. (2016). Argument mining using argumentation scheme structures. In P.
Baroni, M. Stede, & T. Gordon (Eds.), Proceedings of the Sixth International Conference on
Computational Models of Argument (COMMA 2016) (pp. 379–390). Potsdam: IOS Press.
Lawrence, J., & Reed, C. (2017). Mining argumentative structure from natural language text using
automatically generated premise-conclusion topic models. In Proceedings of the 4th Workshop
on Argument Mining (pp. 39–48). Copenhagen: Association for Computational Linguistics.
Lawrence, J., & Reed, C. (2020). Argument mining: A survey. Computational Linguistics, 45(4),
765–818.
Lawrence, J., Visser, J., & Reed, C. (2019). An online annotation assistant for argument schemes. In
Proceedings of the 13th Linguistic Annotation Workshop (pp. 100–107). Florence: Association
for Computational Linguistics.
Levy, R., Gretz, S., Sznajder, B., Hummel, S., Aharonov, R., & Slonim, N. (2017). Unsupervised
corpus–wide claim detection. In Proceedings of the 4th Workshop on Argument Mining (pp.
79–84). Copenhagen: Association for Computational Linguistics.
Lucas, G. (2019). Writing the past, 1 edn. Milton: Routledge.
Miller, G. A. (1995). Wordnet: A lexical database for english. Communications of the ACM, 38(11),
39–41.
Moens, M.-F., Boiy, E., Palau, R. M., & Reed, C. (2007). Automatic detection of arguments in
legal texts. In Proceedings of the 11th International Conference on Artificial Intelligence and
Law (pp. 225–230). Stanford: ACM.
Murdock, J., Allen, C., Borner, K., Light, R., McAlister, S., Ravenscroft, A., Rose, R., Rose, D.,
Otsuka, J., Bourget, D., Lawrence, J., & Reed, C. (2017). Multi-level computational methods
for interdisciplinary research in the hathitrust digital library. PLOS ONE, 12(9), 1–21.
Niculae, V., Park, J., & Cardie, C. (2017). Argument mining with structured SVMS and RNNS. In
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 985–995). Vancouver: Association for Computational Linguistics.
Palau, R. M., & Moens, M.-F. (2009). Argumentation mining: The detection, classification and
structure of arguments in text. In Proceedings of the 12th International Conference on Artificial
Intelligence and Law (pp. 98–107). Barcelona: ACM.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval, 2, 1–135.
Pankhurst, R. J. (1978). Biological identification. London: Edward Arnold.
Park, J., & Cardie, C. (2014). Identifying appropriate support for propositions in online user
comments. In Proceedings of the First Workshop on Argumentation Mining (pp. 29–38).
Baltimore: Association for Computational Linguistics.

12 Argument Mining and Analytics in Archaeology

289

Peldszus, A. (2014). Towards segment-based recognition of argumentation structure in short
texts. In Proceedings of the First Workshop on Argumentation Mining (pp. 88–97). Baltimore:
Association for Computational Linguistics.
Perelman, C., & Olbrechts-Tyteca, L. (1969). The new rhetoric: A treatise on argumentation. Notre
Dame: University of Notre Dame Press.
Plüss, B., & De Liddo, A. (2015). Engaging citizens with televised election debates through
online interactive replays. In Proceedings of the ACM International Conference on Interactive
Experiences for TV and Online Video (pp. 179–184). New York: ACM.
Pollock, J. L. (1995). Cognitive carpentry: A blueprint for how to build a person. Cambridge: MIT
Press.
Prakken, H. (2010). An abstract framework for argumentation with structured arguments. Argument and Computation, 1(1), 93–124.
Rahwan, I., Zablith, F., & Reed, C. (2007). Laying the foundations for a world wide argument web.
Artificial Intelligence, 171, 897–921.
Reed, C. (2006). Preliminary results from an argument corpus. In E. M. Bermúdez & L. R. Miyares
(Eds.), Linguistics in the twenty-first century (pp. 185–196). Cambridge: Cambridge Scholars
Press.
Reed, C., Mochales Palau, R., Rowe, G., & Moens, M.-F. (2008). Language resources for studying
argument. In Proceedings of the 6th Language Resources and Evaluation Conference (LREC2008), Marrakech (pp. 91–100).
Reed, C., & Rowe, G. (2004). Araucaria: Software for argument analysis, diagramming and
representation. International Journal on Artificial Intelligence Tools, 13(4), 961–980.
Rinott, R., Dankin, L., Perez, C. A., Khapra, M. M., Aharoni, E., & Slonim, N. (2015). Show me
your evidence-an automatic method for context dependent evidence detection. In Proceedings
of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon (pp.
440–450).
Schulz, C., Eger, S., Daxenberger, J., Kahse, T., & Gurevych, I. (2018). Multi-task learning
for argumentation mining in low-resource settings. In Proceedings of the 2018 Conference
of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 2 (Short Papers) (pp. 35–41). New Orleans: Association for
Computational Linguistics.
Skeppstedt, M., Peldszus, A., & Stede, M. (2018). More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing. In Proceedings of the 5th Workshop
on Argument Mining (pp. 155–163). Brussels: Association for Computational Linguistics.
Snaith, M., Medellin, R., Lawrence, J., & Reed, C. (2017). Arguers and the argument web. In
F. Bex, F. Grasso, N. Green, F. Paglieri, & C. Reed (Eds.), Argument technologies: Theory,
analysis & applications (pp. 57–72). College Publications.
Snoeck Henkemans, A. F. (1992). Analyzing complex argumentation. SicSat.
Song, Y., Heilman, M., Beigman Klebanov, B., & Deane, P. (2014). Applying argumentation
schemes for essay scoring. In Proceedings of the First Workshop on Argumentation Mining
(pp. 69–78). Association for Computational Linguistics.
Stab, C., & Gurevych, I. (2014a). Annotating argument components and relations in persuasive
essays. In Proceedings of the 25th International Conference on Computational Linguistics,
Dublin (pp. 1501–1510).
Stab, C., & Gurevych, I. (2014b). Identifying argumentative discourse structures in persuasive
essays. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing (EMNLP) (pp. 46–56). Doha: Association for Computational Linguistics.
Stab, C., & Gurevych, I. (2017). Parsing argumentation structures in persuasive essays. Computational Linguistics, 43(3), 619–659.
Toulmin, S. E. (1958). The uses of argument. Cambridge: Cambridge University Press.
van Eemeren, F. H. (2018). Argumentation theory: A pragma-dialectical perspective. Argumentation Library. Berlin: Springer.
van Eemeren, F. H., & Grootendorst, R. (1992). Argumentation, communication, and fallacies: A
pragma-dialectical perspective. Mahwah: Lawrence Erlbaum Associates.

290

J. Lawrence et al.

van Eemeren, F. H., Houtlosser, P., & Snoeck Henkemans, A. F. (2007). Argumentative indicators
in discourse: A pragma-dialectical study. Argumentation Library. Berlin: Springer.
van Rijsbergen, C. J. (1979). Information retrieval. Butterworth.
Visser, J., Duthie, R., Lawrence, J., & Reed, C. (2018a). Intertextual correspondence for integrating
corpora. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara,
B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.),
Proceedings of the Eleventh International Conference on Language Resources and Evaluation
(LREC 2018) (pp. 3511–3517). Miyazaki: European Language Resources Association (ELRA).
Visser, J., Konat, B., Duthie, R., Koszowy, M., Budzynska, K., & Reed, C. (2020). Argumentation
in the 2016 US presidential elections: annotated corpora of television debates and social media
reaction. Language Resources and Evaluation, 54(1), 123–154.
Visser, J., Lawrence, J., & Reed, C. (2020a). Reason-checking fake news. Communications of the
ACM, 63(11), 38–40.
Visser, J., Lawrence, J., Reed, C., Wagemans, J., & Walton, D. (2021). Annotating Argument
Schemes. Argumentation, 35(1), 101–139.
Visser, J., Lawrence, J., Wagemans, J., & Reed, C. (2018b). Revisiting computational models of
argument schemes: Classification, annotation, comparison. In S. Modgil, K. Budzynska, &
J. Lawrence (Eds.), Proceedings of the Seventh International Conference on Computational
Models of Argument (COMMA 2018) (pp. 313–324). Warsaw: IOS Press.
Voss, E. G. (1952). The history of keys and phylogenetic trees in systematic biology. Journal of
the Science Laboratories, Denison University, 43(1), 1–25.
Wacholder, N., Muresan, S., Ghosh, D., & Aakhus, M. (2014). Annotating multiparty discourse:
Challenges for agreement metrics. LAW VIII, p. 120.
Wagemans, J. (2016). Constructing a periodic table of arguments. In P. Bondy & L. Benacquista
(Eds.), Argumentation, Objectivity, and Bias: Proceedings OSSA 11 (pp. 1–12). OSSA.
Walker, M. A., Tree, J. E. F., Anand, P., Abbott, R., & King, J. (2012). A corpus for research
on deliberation and debate. In Proceedings of the 8th International Conference on Language
Resources and Evaluation (LREC), Istanbul (pp. 812–817).
Walker, V., Vazirova, K., & Sanford, C. (2014). Annotating patterns of reasoning about medical
theories of causation in vaccine cases: Toward a type system for arguments. In Proceedings
of the First Workshop on Argumentation Mining (pp. 1–10). Baltimore: Association for
Computational Linguistics.
Walton, D. (1996). Argumentation schemes for presumptive reasoning. Mahwah: Lawrence
Erlbaum Associates.
Walton, D. (2006). Fundamentals of critical argumentation. Cambridge: Cambridge University
Press.
Walton, D., Reed, C., & Macagno, F. (2008). Argumentation schemes. Cambridge: Cambridge
University Press.
Webber, B., Egg, M., & Kordoni, V. (2011). Discourse structure and language technology. Natural
Language Engineering, 18(4), 437–490.
Wyner, A., Schneider, J., Atkinson, K., & Bench-Capon, T. (2012). Semi-automated argumentative
analysis of online product reviews. In Proceedings of the Fourth International Conference on
Computational Models of Argument (COMMA 2012) (pp. 43–50). Vienna: IOS Press.

Chapter 13

Computational Processing of Language Vagueness for Archaeological Site Modelling
Maria Elena Castiello

Abstract This chapter aims to outline the challenge of language uncertainty and
vagueness for the construction of predictive models in archaeology. It includes
methods and examples to deal with the issue of uncertainty and vagueness arising
from archaeological datasets and elaborates quantitative tools to process and
integrate it in a Machine Learning (ML) framework. In particular, the chapter is
focused on the combination of a fuzzy set approach with the well-known ensemble
algorithm of Random Forest (RF). On this basis, Archaeological Predictive Maps
(APM) for two case studies are produced and an uncertainty visualization strategy is
defined, based on statistics and cognitive theory methods. A procedure is suggested
in order to visually represent and communicate the uncertainty in the final output of
the modeling procedure.
A four-step methodology is described here to consistently estimate and process
language vagueness without increasing the computational cost of an ML environment, so
that APMs incorporating confidence intervals and subjective values can be produced. The
goal is to provide archaeologists with the necessary theoretical and methodological
infrastructure to critically evaluate and compute the various levels of uncertainty and
vagueness that are inherent in archaeological databases, to design best practices for
establishing the scientific transparency of the results, and to improve the efficiency
of APMs in research and decision-making processes within cultural heritage management
and archaeological research.
Keywords Archaeology · Uncertainty · Machine learning · Fuzzy theory ·
Predictive maps

M. E. Castiello (✉)
Institute of Archaeological Sciences, University of Bern, Bern, Switzerland
e-mail: maria.castiello@faculty.unibe.ch
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_13

13.1 Introduction
Many taxonomies have been proposed for structuring uncertainty (an example is shown
in Fig. 13.1). Smithson’s taxonomy indicates that uncertainty itself can be internally
differentiated (Smithson, 1989). More recently, Spiegelhalter (2017) provided a short
categorization of uncertainty, focused on the statistical modelling of risks, which
distinguishes three kinds:
– aleatoric (irreducible randomness inherent in a process)
– epistemic (uncertainty from a lack of knowledge that could theoretically be
reduced given more information)
– ontological (uncertainty about how accurately the modelling describes reality,
which can only be described subjectively)
Epistemic uncertainty, understood as a ‘lack of information’, is an unavoidable aspect
of life and is inherent in most data. In this chapter, the term uncertainty refers to
epistemic uncertainty, which can be quantified and visualized, most commonly as a
probability distribution, a priori to any archaeological (predictive) modelling
procedure. Moreover, since the future is by definition uncertain, because we only have
limited information about it, predictions are generally uncertain as well, and this
uncertainty can only be resolved by comparing the prediction with the actual outcome
(Piotrowski, 2019).
Thus, using models to project probable futures, based on current information and
understanding, can entail additional uncertainties. Evans (2012) noted that uncertainty
can arise from various sources throughout the life cycle of any modelling exercise:
(i) input data; (ii) model choice; (iii) model parameters; and (iv) model outputs.
Input data in particular can introduce uncertainty by various means, such as errors in
data measurement, overlooked data, or the choice of inappropriate sample sizes or
inappropriate discretization measures. Choices in model settings might also introduce
uncertainty in different ways, in particular through the selection of variables, scale,
parameters, algorithms, or mathematical transformations (Espig et al., 2020; Brouwer
Burg et al., 2012).

Fig. 13.1 Smithson’s taxonomy of ignorance. (Source: Smithson, 1989: 9)

Epistemic uncertainty, however, concerns not only the future but also, and mainly, the
past and everything connected with it. Fusco and de Runz (2020) have rightly pointed out
that “what we perceive about past human activities is limited to the materials that
cross the ages to reach us, and obviously to the areas investigated and methods used”.
This archaeological incompleteness causes uncertainty (often described
linguistically/verbally) at different research stages, starting from the excavation
databases and the assembly of inventory collections that are used today for a multitude
of quantitative analyses (e.g. spatial and statistical analyses, site distribution maps,
3D reconstructions). As already highlighted by Niccolucci and Hermon (2003),
“Archaeological (spatial) concepts are often defined in an imprecise way”, including the
archaeological “site” itself, and epistemic vagueness may occur “when trying to define
what exactly is a settlement or when struggling to ascribe an archaeological object to a
precise epoch”, as Gonzalez-Perez (2018) pointed out (Fig. 13.2). Scholars have long
acknowledged and extensively discussed this problem (see for example: Taheri et al.,
2019; Ramos-Soto et al., 2017; van der Leeuw, 2016; Barceló & Bogdanovic, 2015;
Niccolucci & Hermon, 2015; Lieskovský et al., 2013; Evans, 2012; Mink et al., 2009;
De Runz et al., 2007; Refsgaard et al., 2007; Niccolucci & Hermon, 2004; Ducke, 2003).
In 1996, Lock and Harris underlined the general weight of uncertainty in archaeology,
where data are mostly fragmentary and difficult to date. Bevan et al. (2013) addressed
the same theme: “One final, complicating factor for archaeologists is the fact that
archaeological observations are very partial, imperfect records of past activity. Much
of the variability in our observed spatial patterns is due to patchy levels of
archaeological preservation and investigation”. The uncertainty or bias contained in
archaeological datasets is clearly an endemic factor and can often be traced back to the
survey and storage/digitization methods.
Hermon and Niccolucci (2003) pointed out that the elaboration of a database is
essentially based on maps (sometimes in the traditional paper format, which needs to be
digitized), photos (which need to be adjusted with photogrammetry programs) and drawings
(both hand-made and computer-made). In building it, archaeologists perform a
simplification that in some cases may become an oversimplification of the past reality.
Likewise, the typological classification of an archaeological object presents several
aspects that may affect the accuracy of the archiving procedure. If we look closely at
excavation activity, the uncovered objects or structures are often only the remains of
what probably constituted the foundations of ancient buildings. This scarce information
needs to be completed by analyzing sparse material, by using ethnographic parallels and
textual descriptions, or by comparison with better preserved sites (if they exist). The
rest of the interpretation has to be completed by the researcher on the basis of their
own knowledge or, ultimately, left incomplete. When an excavation database is created,
the interpretations of analytical results are always prone to an unknown degree of
uncertainty. Similarly, many uncertainties surround the interpretive possibilities of an
archaeological record as well as the subsequent contextualization and modelling of human
behavior.

Fig. 13.2 A conceptual model of uncertainty in spatial data. (Adapted from Fischer et al., 2006)

The problem of uncertain, inaccurate archaeological information has been approached in
different ways over time, in the belief that tackling it would strengthen modelling
accuracy. Fusco and de Runz (2020) in fact point out that “if swept under the carpet,
data imperfection spreads throughout analyses, results and interpretation. It then grows
out of control, and prevents us from assessing the validity of our conclusions or from
directly comparing situations and phenomena.” The integration of uncertainty moreover
makes it possible to question how models can be compared, and provides more information
about the data and the model itself (Morrison, 2015).
As seen so far, any modelling exercise is sensitive to the quality and quantity of the
underlying data. At the same time, however, defining what is meant by quality of
information can be very difficult, given the diversity of dimensions that this concept
takes on (Fusco, 2016). According to Goodchild (2003):
Quality [...] is a measure of the difference between the data and the reality that they
represent, and becomes poorer as the data and the corresponding reality diverges. Thus,
if data are of poor quality, and tell us little about the geographic world, then they have little
value.

The predictive models used in archaeology are no exception to such rules. Archaeological
Predictive Maps (APMs) have been abundantly described in the literature and are
understood here as exploratory data tools that can help identify suitable locations of
specific types of human activity and their archaeological remains, and trace and
highlight patterns of settlement preference in the landscape (to cite a few examples:
Castiello, 2022; Castiello & Tonini, 2021; Rogers et al., 2014; Verhagen, 2007;
Kamermans et al., 2005; Van Leusen, 2002; Kvamme, 1990). As the result of computational
models, APMs are subject to an unknown degree of uncertainty and reliability throughout
their design process. However, they have the clear advantage of being (1) formal and
(2) executable, and can thus be automatically tested against large amounts of actual
linguistic assertions (Piotrowski, 2019). According to Piotrowski (2019), “uncertainty,
when processed and manipulated in computational models, can be preserved through the
modeling exercise and can be formalized and made explicit in the visual results”.
The problem of managing uncertain data in computer science has generally received much
more attention in mathematics and the natural sciences than in the humanities, and
mathematicians and statisticians have developed various methods and approaches to deal
with the different types of uncertainty. This is probably motivated by the need to
provide effective real-world applications (e.g. the need to deal with uncertain data
from environmental sensors, medical data, anonymized statistical data, or
spatio-temporal data extracted from mobile applications) (Piotrowski, 2019). The
literature indeed provides several examples of computational approaches for dealing with
uncertainty that draw heavily on probability theory, statistics and information theory
(see e.g. Nagypál & Motik, 2003; McBurney & Parsons, 2001; Shannon, 1948; Dempster,
1967; Shafer, 1976; Zadeh, 1965).
In archaeological research, Brouwer Burg et al. (2012) noticed that, despite the growing
body of literature on archaeological modelling and, more recently, on computational
modelling, the question of uncertainty and model validation is still rarely addressed.
According to Gonzalez-Perez (2018), “revealing all doubts, as well as the general model
limitations to the reader, can be seen as a matter of scientific ethics, at least as
important as compiling a convincing story.” Attempts to resolve this conundrum of
structuring and integrating uncertainty in archaeological (predictive) modelling
procedures have long relied mostly on a ‘classic’ probabilistic framework.
The Archäoprognose Brandenburg project carried out by Ducke (2014), and the resulting
APM, represent a first attempt in this direction. A key concept of this study was the
management of the uncertainty introduced by missing data, incomplete datasets, errors,
and the diverse sources of information. The procedure selected relied on the
Dempster-Shafer Theory of Evidence (DST) (Van Leusen et al., 2009; Ducke & Münch, 2005;
Ejstrud, 2005; Ducke, 2003; Ejstrud, 2003). In this approach, the author assigns an
explicit set of values to the data, e.g. by weighting the data or variables used in the
modelling procedure on the basis of expert judgment or subjective knowledge.
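
To make the DST idea more concrete, the following minimal Python sketch shows Dempster's
rule of combination for two independent sources of evidence. It is an illustration
only, not the implementation used in the cited projects: the hypothesis labels, the two
evidence sources and all mass values are hypothetical, chosen to show how belief can be
assigned to "site", to "no site", or left uncommitted on the whole frame to express
ignorance.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}


SITE = frozenset({"site"})
NO_SITE = frozenset({"no_site"})
EITHER = SITE | NO_SITE  # mass placed here expresses ignorance

# Hypothetical expert-weighted evidence from two variables.
slope_evidence = {SITE: 0.6, EITHER: 0.4}
water_evidence = {SITE: 0.5, NO_SITE: 0.2, EITHER: 0.3}

belief = combine(slope_evidence, water_evidence)
```

In this toy case the combined masses again sum to 1, and the share left on `EITHER`
remains available as an explicit measure of what the two sources do not decide.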
Chronological uncertainty has meanwhile been approached probabilistically with Bayesian
statistics (Buck et al., 1996). Desachy (2012), for example, produced a first
chronological model to deal with the interpretation of stratigraphic sequences and with
chronological uncertainty, tasks generally handled during archaeological excavations.
Crema et al. (2010) addressed the similar issue of intrinsic temporal uncertainty in
archaeological datasets by adopting a probabilistic approach and a diachronic analysis
to develop a distribution model of the Middle to Late Jomon pithouses in Japan, trying
“to make best use of the available information by integrating different degrees of
knowledge”. The authors ultimately suggested an “environment where comparisons between
alternative hypotheses are made easier”. Similarly, Bevan et al. (2013) explored
probabilistic and spatial-statistical methods for assigning pottery artifacts discovered
during intensive excavations carried out on Antikythera island (Greece) to particular
chronological periods. The authors suggested a belief-based approach to quantify local,
intra-site uncertainty and to compare the chronology assigned across different
excavations and sites “by considering the degree to which the uncertainty associated
with one period is linked to the uncertainty associated with another”.
In archaeological predictive modelling, by contrast, scholars gradually moved away from
the ‘classical’ probabilistic framework, which was considered not best suited to
modelling uncertainty and vagueness in predictive archaeology (Fusco & de Runz, 2020).
For instance, Conolly and Lake (2006) stated that:
Prediction is probabilistic. Very few, if any, models predict site occurrence with the absolute
certainty of presence or absence. Consequently it usually makes sense to talk about the
model correctly predicting site presence at some specified probability p between 0.0 and
1.0.

The most recent trends show a strong preference for a fuzzy logic approach. In brief,
fuzzy logic (a branch of mathematics that evolved out of “fuzzy set” theory) is a
technique that takes uncertainty into account by ranking the “truth” or accuracy of the
modelled data by degree or percentage rather than treating it as binary (true/false)
information (Zadeh, 1965; Hájek, 1998; Yager & Filev, 1994; Halpern, 2003). Gacôgne’s
(2003) definition is a useful reminder of fuzzy theory: “Fuzzy logic, or more generally
the treatment of uncertainties, is to study the representation of imprecise knowledge
and reasoning approached”. Zadeh introduced fuzzy set theory in 1965. A fuzzy set is
characterized by a membership function that may take any value between 0 and 1, rather
than only the two extreme values, as is the case for ordinary (crisp) sets.
As defined by Fisher (2006), a fuzzy set $F$ is a pair $(\Omega, \mu_F)$, where $\Omega$
is a set and $\mu_F$ is the mapping of $\Omega$ to the unit interval $[0, 1]$:

$$\mu_F : \Omega \to [0, 1]$$

where $\mu_F(\omega)$, for $\omega \in \Omega$, is the degree of membership of $\omega$ in $F$.
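
As a minimal illustration of such a membership function, the Python sketch below
contrasts a crisp set with a trapezoidal fuzzy set. The trapezoidal shape, the
function names and the numeric bounds of the hypothetical "Period X" are assumptions
made for the example; the source does not prescribe any particular shape.

```python
def crisp_membership(x, lower, upper):
    """Ordinary (crisp) set: an element is either in (1.0) or out (0.0)."""
    return 1.0 if lower <= x <= upper else 0.0


def trapezoidal_membership(x, a, b, c, d):
    """Fuzzy set with a trapezoidal membership function, assuming a < b <= c < d.

    Membership is 0 outside (a, d), 1 on the core [b, c], and changes
    linearly on the two flanks.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical "Period X" on an arbitrary time axis:
# certainly within the period between 100 and 300, possibly between 0 and 400.
dates = [50, 200, 350, 500]
fuzzy_degrees = [trapezoidal_membership(t, 0, 100, 300, 400) for t in dates]
crisp_degrees = [crisp_membership(t, 100, 300) for t in dates]
```

Here a date of 50 or 350 is excluded by the crisp set but retains a partial degree of
membership in the fuzzy one, which is the property the definition above formalizes.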

Although this approach has not yet been widely adopted in archaeology (Fusco, 2016;
Evans, 2012; Refsgaard et al., 2007), its theoretical and methodological framework is
particularly well suited to modelling human behavior over space and time, in an attempt
to analyze and predict precisely the spatial preferences, choices, and movements of
individuals. In the predictive modelling procedure, such an approach, as pointed out by
Fusco (2016), has the advantage of keeping all the available data rather than
considering only those estimated to be “reliable” and eliminating the “unreliable” ones
from the databases. This pre-selection could in fact be counterproductive: in addition
to losing data, it could lead the analysis to overestimate the quality and certainty of
the data considered to be reliable. The technique has been applied in many research
fields since its formulation (Ragin, 2000; Roberts, 1986; Moraczewski, 1993; Sattler,
1996) but, again, very few explorations have been undertaken in archaeology (Verhagen,
2007; Hatzinikolaou et al., 2003; Niccolucci et al., 2001; Barceló & Pallarés, 1998).
For instance, the ArchaeDyn I research project (Favory & Nuninger, 2008) developed a
method for weighting the incompleteness and uncertainty of archaeological data through
“Confidence Maps”. This approach highlighted the biases and reliability of inventory
data and, using a fuzzy logic approach, suggested ways to interpret the absence or low
number of archaeological data in a particular investigated area. The study helped raise
awareness of the importance of uncertainty visualization in archaeological data
processing. “Considering information in a fuzzy dimension offers an alternative
method which prevents us from making restrictive choices in modelling and/or
forcing us to reject all unreliable data”, as Fusco and de Runz (2020) stressed.
They have very recently suggested an exploratory method to model the spatiotemporal
structures and dynamics of settlements during the Bronze Age in the Syrian Fertile
Crescent, which were described mainly by imperfect archaeological data. The authors
developed a fuzzy set approach to tackle data imperfection and to make estimates and
assumptions about potential settlement locations in unsurveyed areas, setting up an
archaeological predictive modelling framework that integrates uncertainty.
Strong emphasis has been placed on the advantage of incorporating such qualitative
predication into the predictive modelling procedure not only textually but also
numerically (for example, by calculating an index of reliability that ranges from
“totally sure”, that is, “this is what we believe”, to pure uncertainty, that is, “we
have no proof whatsoever”) and visually. Doing so becomes central to a robust analysis
and strengthens the integrity of the research results (Gonzalez-Perez, 2018; Balla et
al., 2013; Jaroslaw & Hildebrandt-Radke, 2009; Vaughn & Crawford, 2009).
A quantification protocol allows us to assign quantitative uncertainty values to
qualitative linguistic labels (such as “unsure” or “unknown”) in a systematic
manner. Once obtained, the quantitative values can be further processed algorithmically
in the ML framework.
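
A quantification protocol of this kind can be sketched as a simple lookup from labels
to numbers. The example below is a hedged illustration in Python: the label set, the
numeric values and the record layout are all hypothetical and stand in for whatever
scale a given project agrees on.

```python
# Hypothetical scale: 1.0 means "totally sure", 0.0 means no evidence at all.
CERTAINTY_SCALE = {
    "certain": 1.0,
    "probable": 0.75,
    "unsure": 0.5,
    "doubtful": 0.25,
    "unknown": 0.0,
}


def quantify(label, scale=CERTAINTY_SCALE):
    """Translate a linguistic certainty label into a value in [0, 1]."""
    key = label.strip().lower()
    if key not in scale:
        raise KeyError(f"no quantitative value defined for label {label!r}")
    return scale[key]


# Hypothetical database records: (site identifier, dating certainty as recorded).
records = [("S001", "Certain"), ("S002", " unsure "), ("S003", "unknown")]
quantified = [(site_id, quantify(label)) for site_id, label in records]
```

Raising an error on an unlisted label, rather than guessing a value, keeps the mapping
systematic: every number that reaches the ML stage can be traced to an agreed label.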
The approach developed in this chapter is part of an exploratory line of reasoning in
which one first tries to identify the different levels of uncertainty expressed by the
data, going from the general to the specific, before modelling uncertain sites through
the ML approach. According to Farinetti et al. (2004) as well, uncertainty should
ideally not be considered only a posteriori, once the results of the modelling procedure
have been obtained, but should be integrated at the time of data collection, not only as
an attribute of the artifact or site in question (such as its epoch or typology) but as
the point of view, or even the theoretical basis, from which the data are considered.

M. E. Castiello

It is argued here that combining Machine Learning algorithms based on a binary
logic (“site”/“no site”) with an upstream fuzzy approach represents an innovative
and interesting solution and “can enable more rigorous research practice, and
attune archaeologists to data-centric imperfections processing in archaeological
data” (Gupta, 2020).
At the same time, it is also necessary to frame and communicate this uncertainty
effectively, and thus to choose the right visualization method. The communication
of uncertainty can play an equally significant role in the quantification and
processing of archaeological information, although research on it has been
relatively limited and its importance often underestimated. Only recently has this
research topic begun to emerge as one of great importance.
Significant contributions to the communication of uncertain data and information
originate from combining techniques that draw on cognitive theory and on statistical
methods from psychology (Padilla et al., 2021; Hullman et al., 2015, 2018;
Fernandes et al., 2018; Zikmund-Fisher et al., 2014). In this specific context, the
most accredited method for visually representing uncertainty belongs to the group
of visual encoding channels (Munzner, 2014; MacEachren et al., 2012; Brodlie et al.,
2012). As described by Padilla et al. (2021):
Visual encoding channels define the appearance of marks using controls such as color,
position, and transparency. Techniques that use encoding channels have the added benefit
of adjusting a mark that is already in use, such as making a mark more transparent if the
uncertainty is high (Fig. 13.3).

Since the way in which people reason with uncertainty is nonintuitive, a difficulty
that can be intensified when uncertainty information is communicated visually
(Padilla et al., 2021), the encoding channel method is believed to be more efficient
in evoking uncertainty associations, notably in geographic information systems and
cartography (Kinkeldey et al., 2014, 2017; MacEachren et al., 2018). It has therefore
been selected in this Chapter to visually represent uncertainty in the final predictive
maps. It is argued that a visualization technique based on encoding channels
can encourage archaeologists to incorporate uncertainty in their decision-making
process and, indirectly, to use this uncertainty information rather than
eliminating it.

Fig. 13.3 Examples of encoding channels. (Source: Padilla et al., 2021)

13 Computational Processing of Language Vagueness for Archaeological Site. . .

13.2 Case Studies
Two case studies were selected and analyzed for uncertainty quantification
processing and predictive modeling computations. The institutional archaeological
databases of the Cantons of Aargau (AG) and Geneva (GE), two regions located in
northern and southern Switzerland respectively (Figs. 13.4 and 13.5), were provided
by the local Archaeological Departments (footnote 1) in the form of digital tables
listing the surveys carried out in the regions over the last decades and inventoried
until 2015 (for Aargau: 3101 entries; for Geneva: 865 entries). Both databases
contain information belonging to different epochs, spanning from the Mesolithic to
the Middle Ages, with several attribute fields and details about the discoveries made
in the regions. As is often the case, the recorded information varies considerably in
structure and quantity from one region to another. Hence, two newly constructed
geospatial databases were developed in the ArcGIS environment (Esri, release 10.7)
in order to establish a reproducible approach to data management.
Since one of the goals of the modeling procedure explained in this study is
to identify areas in the landscape likely to contain still undiscovered Roman
settlements (based on uncertain information), analyses were focused on the
“settlement” category, which is often referred to as building-housing-living spaces.
As shown in Fig. 13.6, the entries defined as belonging to the Roman
epoch account for roughly 10–30% of all data points in the Cantons of Aargau and
Geneva.
On closer inspection of the databases, many fields carry doubtful information,
numerous entries lack precise geographical coordinates, and several rows were
left blank. The inventoried records mainly express uncertainty or vagueness through
linguistic statements: archaeologists qualitatively evaluated the objects discovered
and recorded a degree of subjective reliability by adding attributes to the fields,
such as “sure”, “unsure”, “unknown”, “undefined”, “possible”, “potential.” A new
database architecture was then designed to minimize potential errors during new
data entry or modification and to maximize the flexibility of the database and its
potential for further research based on the same data. Entries with no coordinates
and those left blank were removed. Coordinates were checked and adjusted to
comply with the requirements of the new system in use. Descriptive information
about each entry was cross-referenced.
In the specific case of the AG database, the uncertainty is manifold. At times, it
relates to the chronology and to the typological characteristics. The degree of
uncertainty is originally stored in both the Typology and Datation fields with
qualitative expressions such as sure, unsure, unknown. In the Typology field we
can find, for example, the following entries: Settlement – unsure, Religious site –
sure, Grave – unknown, etc.
1 Kanton Aargau, Departement Bildung, Kultur und Sport, Abteilung Kultur, Kantonsarchäologie. République et Canton de Genève, Office du patrimoine et des sites, Service cantonale
d’archéologie.

Fig. 13.4 Roman archaeological sites in the Canton of Aargau

Fig. 13.5 Roman archaeological sites in the Canton of Geneva

The same is true for the Datation field, where information is stored as follows:
Roman – sure, Medieval – unsure, Neolithic – unknown, etc.
In the GE database, uncertainty is not differentiated between typology and datation
but is assigned to each entry as a whole, e.g. a Roman settlement is considered as
sure, or a certain area is considered as a potential-suspected Roman religious site.
The qualifying terms (sure, unsure, unknown, potential, suspected) are inexact
concepts whose meanings are fuzzy. Thus, uncertainty arising from the
interpretation and processing of these inexact concepts has nothing to do with
randomness but is directly related to fuzziness. Since interaction exists between
those linguistic variables, with varying degrees of intensity, conventional binary
representation is usually considered inadequate (Leung, 1983).
Indeed, the uncertainty contained in the two databases analyzed is subjective and

Fig. 13.6 Percentage of entries labeled as belonging to the Roman epoch in the Aargau and
Geneva databases. (Source: Castiello, 2022)

Fig. 13.7 The four steps of the proposed methodology

directly related to the opinion of the agent/researcher who compiled them in the first
instance, and the reliability of the data depends on said agent’s state of knowledge.
Fuzzy theory and fuzzy sets can help to address and handle this problem better
because they do not require sharp boundaries that distinguish members of a
set from non-members. On the contrary, fuzzy membership reflects a degree of
belonging (Zadeh, 1965). A fuzzy set A of a universe X is defined by a function
that assigns to each object x in X a membership degree of x in A, a value in the
interval [0, 1].
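
The definition above can be illustrated with a minimal Python sketch. It is only an
illustration under stated assumptions: the entry names, the degrees, and the helper
functions make_fuzzy_set, degree and is_crisp are hypothetical and come neither
from the chapter's databases nor from any library. The sketch represents a fuzzy
set as a mapping from each object x to its membership degree, and shows that a
classical (crisp) set is the special case in which every degree is 0 or 1.

```python
from typing import Dict


def make_fuzzy_set(membership: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the mapping after checking each degree lies in [0, 1]."""
    for element, value in membership.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"degree for {element!r} must be in [0, 1], got {value}")
    return dict(membership)


def degree(fuzzy_set: Dict[str, float], element: str) -> float:
    """Membership degree of an element; objects not listed have degree 0."""
    return fuzzy_set.get(element, 0.0)


def is_crisp(fuzzy_set: Dict[str, float]) -> bool:
    """True when all degrees are 0 or 1, i.e. the set has sharp boundaries."""
    return all(value in (0.0, 1.0) for value in fuzzy_set.values())


# Hypothetical universe X of three database entries and a fuzzy set A
# ("is a Roman settlement"); the degrees are illustrative only.
roman_settlement = make_fuzzy_set(
    {"entry_a": 1.0, "entry_b": 0.75, "entry_c": 0.0}
)

print(degree(roman_settlement, "entry_b"))  # partial membership
print(is_crisp(roman_settlement))           # entry_b is neither fully in nor out
```

Treating unlisted objects as having degree 0 is a design choice of this sketch; it
keeps absence of evidence distinct from an explicit partial membership.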

13.3 Method
Four sequential steps were followed for the exploration and processing of
archaeological language uncertainty: (i) identification, (ii) quantification,
(iii) modelling and (iv) visualization, as schematically illustrated in Fig. 13.7.
The methodological procedure essentially aims to explore the effectiveness of a
fuzzy set theory approach in quantifying the uncertainty of archaeological data and
to develop an ML model framework for predicting archaeological settlements in given
areas. In particular, it suggests a way to visually represent the quantified uncertainty
and to incorporate it into the modeling outputs. The approach selected thus starts
from what we know, within a computed degree of certainty (site presences and their
characteristics), and leads to modelling what we want to know (where the probability
of discovering archaeological sites is highest).

13.3.1 Identification
Following the more recent literature (Fusco & de Runz, 2020; Martin-Rodilla
et al., 2019; Gonzalez-Perez, 2018; Fusco, 2016; Niccolucci & Hermon, 2015;
Oštir et al., 2007; Hatzinikolaou, 2006; Niccolucci & Hermon, 2004), a fuzzy
quantification procedure was defined after a first screening and pre-processing of
the databases. The modelling procedure was then extended to compute the
“numerical confidence values” or “numerical degree of membership” assigned to the
settlement presences in both databases, which express the subjective level of
‘confidence’ in the assignment under consideration (Quality uncertainty). Although
this coefficient of membership is indeed assigned “subjectively”, the procedure is
rooted in a long tradition whose major exponents include De Finetti (1970) for
probability theory and Savage (1972) for statistics. The numeric values are thus the
numerical expression of a series of subjectively evaluated elements in which the
experience and scientific correctness of the research converge (Hermon &
Niccolucci, 2003), with the aim of giving the scientific status of measurability and
verifiability to the problem of reliability.

13.3.2 Quantification
As shown in Table 13.1, a numeric value intended as a coefficient of membership
was first assigned to the Typology of the discovery as classified and stored in the
database. As mentioned above, this study focuses on Roman settlements. Therefore,
site types corresponding to settlements are assigned the maximum membership
value of 1. Findings that may hint at the presence of settlements, but are not defined
as such, are assigned a value of 0.75, and single and other findings (e.g. ceramic
sherds, coins, etc.) the value 0.5.
Considering that, for example, fortifications, water infrastructure and religious sites
are often discovered in close proximity to a settlement, the coefficient of membership
(or degree of reliability) assigned to these categories was defined as the highest, and
so on for the others. The second step was the assignment of a coefficient of
membership to the Datation field entries, as shown in Table 13.2. Unsurprisingly,
datations corresponding to the Roman period are assigned the value 1. A value
of 0.25 was assigned to the medieval epoch, as settlements of that period are
likely to have developed in continuity with former Roman settlements.

Table 13.1 Coefficient memberships assigned to each class of the AG database

Type                    Uncertainty
Settlement              1
Fortification           1
Water infrastructure    1
Religious sites         1
Graves                  0.75
Roads                   0.75
Bridges                 0.5
Quarry                  0.5
Single finds            0.5
Others                  0.5
Unknown                 0.5

Table 13.2 Coefficient memberships assigned to each chronological definition as
defined in the AG database

Period                  Uncertainty
Roman                   1
Roman Empire            1
Medieval                0.25
Others                  0

Table 13.3 The quantification of uncertain linguistic variables expressed with a
degree of reliability value

Quality                 Uncertainty
Sure                    1
Unsure                  0.75
Unknown                 0.5

As mentioned above, the typological classification as well as the datation of each
entry is labeled with a degree of reliability expressed by a qualitative expression.
A degree of reliability value was assigned to the linguistic uncertainty terms that
accompany the Typology and Datation definitions, as shown in Table 13.3.
Figure 13.8 shows the Typological uncertainty quantification, which results from
multiplying the coefficient of membership of Type by the degree of reliability
Quality of that definition, while Fig. 13.9 shows the Datation uncertainty, derived
by multiplying the coefficient of membership of Period by the degree of reliability
Quality of that definition. Finally, the Total uncertainty is calculated by multiplying
the Typological uncertainty by the Datation uncertainty (Fig. 13.10).
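
The multiplication scheme described above can be sketched in Python as follows.
This is an illustrative reading, not the author's implementation (the chapter
processed the databases in GIS and R): the coefficients are those of Tables 13.1
to 13.3, while the dictionary names, the function names and the two example
entries are assumptions introduced here.

```python
# Coefficients of membership per Type (Table 13.1).
TYPE_MEMBERSHIP = {
    "Settlement": 1.0, "Fortification": 1.0, "Water infrastructure": 1.0,
    "Religious sites": 1.0, "Graves": 0.75, "Roads": 0.75, "Bridges": 0.5,
    "Quarry": 0.5, "Single finds": 0.5, "Others": 0.5, "Unknown": 0.5,
}

# Coefficients of membership per Period (Table 13.2).
PERIOD_MEMBERSHIP = {
    "Roman": 1.0, "Roman Empire": 1.0, "Medieval": 0.25, "Others": 0.0,
}

# Degree of reliability per linguistic label (Table 13.3).
QUALITY = {"sure": 1.0, "unsure": 0.75, "unknown": 0.5}


def typological_uncertainty(site_type: str, quality: str) -> float:
    """Coefficient of membership of Type times its reliability label."""
    return TYPE_MEMBERSHIP[site_type] * QUALITY[quality]


def datation_uncertainty(period: str, quality: str) -> float:
    """Coefficient of membership of Period times its reliability label."""
    return PERIOD_MEMBERSHIP[period] * QUALITY[quality]


def total_uncertainty(site_type: str, type_quality: str,
                      period: str, period_quality: str) -> float:
    """Product of the typological and the datation uncertainty."""
    return (typological_uncertainty(site_type, type_quality)
            * datation_uncertainty(period, period_quality))


# Hypothetical entry "Settlement - unsure" dated "Roman - sure".
print(total_uncertainty("Settlement", "unsure", "Roman", "sure"))   # 0.75
# Hypothetical entry "Graves - sure" dated "Medieval - unsure".
print(total_uncertainty("Graves", "sure", "Medieval", "unsure"))    # 0.140625
```

Because the three factors are multiplied, any entry dated to "Others" receives a
total value of 0 regardless of its typology, which matches the zero coefficient in
Table 13.2.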
A similar, although simpler, procedure was set up for computing the uncertainty in
the GE database, as this database provides only one-dimensional information about
uncertainty. Table 13.4 shows the uncertainty values for the linguistic variable
contained in the database, giving an appreciation of the reliability of the
interpretation for each entry.

Fig. 13.8 Typological uncertainty quantification for the AG database

Fig. 13.9 Datation uncertainty quantification for the AG database

Fig. 13.10 Total uncertainty quantification for the AG database

Table 13.4 General uncertainty quantification for the GE database

Quality uncertainty for Roman settlements in GE    Label        Uncertainty
Known and/or excavated site                        Sure         1
Potential site extension (around known site)       Potential    0.75
Presumed site                                      Unsure       0.5
Other epochs or no findings                        Absence      0

13.3.3 Modelling
Both the AG and GE databases were processed first in a GIS (Geographical
Information System) environment and then within R, a software environment for
statistical computing and graphics (R Core Team, 2018). The newly elaborated
datasets, which carried only the quantified uncertainty, were integrated into the RF
predictive modelling procedure. Specifically, the package randomForest (Liaw &
Wiener, 2002) was used for probability mapping.

The Random Forest (RF) algorithm (Breiman et al., 2018; Breiman & Cutler, 2010),
a machine-learning-based approach capable of handling discrete values, was adapted
here to estimate the probability of discovering Roman archaeological settlements in
the two regions analyzed (footnote 2). Generally, RF regression-based computations
involve the use of several geo-environmental proxies. A short list of these proxies,
which are prone to influence site location, comprised: Digital Elevation Model (DEM;
altitude) and derivatives (Slope, Northness and Eastness); Distance to water (lakes
and rivers); Agricultural suitability; Depth of vegetal soil; Soil skeleton; Water
saturation and Water storage capacity; Permeability; and Nutrient storage capacity.
They were combined with the pre-processed archaeological data (site presences and
site pseudo-absences) (Lotfian, 2016; Kulkarni & Sinha, 2012; Breiman, 2001; Liaw &
Wiener, 2002). The parameters for the RF regression models were then calibrated in
the same way for both regions, following the protocol described in Castiello (2022):
• Define the input training and testing datasets, including predictor variables and
response variables (1380 real presences and pseudo-absences for AG and 241 real
presences and pseudo-absences for GE)
• Perform an external cross-validation to add an additional accuracy measurement
(spatial k-fold cross-validation; Valavi et al., 2019)
• Select the number of trees to develop (1000)
• Select the number of predictor variables for creating the binary rules for each
split (4)
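
The external cross-validation step in the protocol above relies on keeping training
and testing data spatially separated. The sketch below illustrates that idea in plain
Python by grouping points into square blocks and dealing whole blocks out to k
folds. It is a simplified stand-in, neither the procedure of Valavi et al. (2019) nor
the author's R workflow; the function names, the block size and the coordinates are
hypothetical.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def spatial_block_folds(points: Sequence[Point], block_size: float,
                        k: int) -> List[List[int]]:
    """Assign point indices to k folds so that each square block stays whole."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # Points falling in the same grid cell share a block identifier.
        block_id = (int(x // block_size), int(y // block_size))
        blocks[block_id].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    # Blocks are dealt out to the folds in turn, in sorted order.
    for n, block_id in enumerate(sorted(blocks)):
        folds[n % k].extend(blocks[block_id])
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical coordinates (arbitrary units) of presences and pseudo-absences.
points = [(0.5, 0.5), (0.9, 0.1), (1.5, 0.5), (2.5, 2.5), (0.2, 1.7)]
folds = spatial_block_folds(points, block_size=1.0, k=2)
for train, test in train_test_splits(folds):
    print("train:", sorted(train), "test:", sorted(test))
```

Holding out whole blocks rather than random points means that neighbouring
observations cannot appear on both sides of a split, which is the property that
distinguishes spatial from ordinary k-fold cross-validation.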

13.3.4 Visualization
The outputs of the regression predictive models are visually expressed by the maps
in Figs. 13.11 and 13.12. As mentioned, the regression returned predictions on
continuous values in the interval [0, 1]. These APMs show the probability that each
pixel of the rasters contains a Roman settlement. The probability values are
expressed with a gradient color scale that goes from 0 to 1 and from light green to
dark green, where dark green corresponds to a “sure” prediction of Roman
settlements, light green to an “unsure” prediction and white to a “sure” prediction
of the absence of Roman settlements. The scale of uncertainty is thus reproduced by
the color intensity, which can be blurred in proportion to the uncertainty related to
the settlements’ Typology and Datation. As a result, the most uncertain values and
areas all appear as the same shade of green. The range
absence-unknown-unsure-presumed-potential-sure is integrated and reproduced in
the prediction outputs.
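
The color scale described above can be sketched as a simple piecewise-linear ramp.
The following Python fragment is an illustration only: the RGB values, the position
of the light-green stop at 0.5 and the function names are assumptions, since the
chapter does not specify how the gradient was parameterised.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

# Assumed color stops: white (sure absence), light green (unsure),
# dark green (sure presence).
WHITE: RGB = (255, 255, 255)
LIGHT_GREEN: RGB = (144, 238, 144)
DARK_GREEN: RGB = (0, 100, 0)


def lerp_color(start: RGB, end: RGB, t: float) -> RGB:
    """Linearly interpolate between two RGB colors for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def probability_to_rgb(p: float) -> RGB:
    """Map a predicted probability in [0, 1] to a white-to-dark-green ramp."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {p}")
    if p <= 0.5:
        return lerp_color(WHITE, LIGHT_GREEN, p / 0.5)
    return lerp_color(LIGHT_GREEN, DARK_GREEN, (p - 0.5) / 0.5)


for p in (0.0, 0.5, 1.0):
    print(p, probability_to_rgb(p))
```

A single-hue ramp of this kind varies only the intensity of one color, which is how
the encoding-channel approach lets uncertainty modify a mark that is already on the
map.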

2 A comprehensive and detailed description of how RF works and of its application in archaeological
contexts and predictive modelling can be found, for example, in Castiello (2022) and Castiello
and Tonini (2021).

Fig. 13.11 Predictive map for the Roman archaeological settlements in Canton Aargau

13.4 Conclusion
This Chapter explored the concept of processing and modeling uncertain information expressed through linguistic variables in an archaeological context. It sets
out how to explore, describe, quantify, process, and finally visualize uncertainty as
crucial steps of the archaeological research process.
The aim was to provide an overview of processing and quantification approaches
and to further integrate uncertain values within an innovative predictive modeling

Fig. 13.12 Predictive map for the Roman archaeological settlements in Canton Geneva

framework. The methodology developed accounts for the effects of quantified
uncertainty in the institutional archaeological databases of the two selected case
studies through a fuzzy approach, and presents accessible methods for modelling
the location of Roman archaeological sites using cutting-edge Machine Learning
technologies.
Given a set of environmental features selected as influential factors in site
location preferences, an innovative application of the Random Forest algorithm was
proposed for computing the probability of discovering Roman archaeological sites
in the two regions analyzed.
Finally, a procedure for uncertainty visualization was selected from the
well-known techniques in use in the natural sciences, to help carry uncertain
archaeological information through to the final prediction output in an effective
manner.
First, different definitions and categorizations of uncertainty and ignorance were
presented to better frame the uncertainty in the original databases, essentially
expressed as vague information. Second, previous approaches to archaeological
uncertainty were examined, and a suitable mathematical method for quantifying an
expert’s belief based on limited knowledge was selected, described and applied to
archaeological scenarios. Fuzzy set theory was explained in detail and applied
to the archaeological case studies. Third, the uncertainty quantification procedure
was integrated into the predictive modeling framework based on the Random Forest
algorithm. The regression analysis produced advanced predictive maps
highlighting the zones with the highest and lowest probability of discovering Roman
archaeological settlements, given a set of environmental proxies. The various levels
of uncertainty expressed by linguistic terms in the databases are integrated into the
calculations and carried over into the final result by visualizing the degree
of uncertainty with color grades.
While an extensive literature on spatial and geo-historical data imprecision exists,
the literature on uncertainty in archaeological research contexts highlights some
cumbersome issues. Quantification, processing and visualization of archaeological
uncertainty are still in their infancy and at a conceptual stage when
compared to other disciplines. Reasoning with uncertainty or with imperfection
is undeniably difficult, but in line with recent contributions to uncertainty
quantification research, it is argued that accounting for the various dimensions
of archaeological data imperfection will prevent us from adopting hypotheses on
past settlement patterns that are too rigid and restrictive. Indeed, as recent studies
have revealed, some types of modelling, for example through fuzzy logic and
fuzzy set theory, can broaden the horizons of archaeological research, and the
right visualization methods can improve decision-making in a variety of
contexts, from hazard forecasting to healthcare communication and certainly also in
archaeology and cultural heritage management.

References
Balla, A., Pavlogeorgatos, G., Tsiafakis, D., & Pavlidis, G. (2013). Locating Macedonian tombs
using predictive modelling. Journal of Cultural Heritage, 14(5), 403–410.
Barceló, A., & Bogdanovic, I. (Eds.). (2015). Mathematics and archaeology. Taylor & Francis.
Barceló, J. A., & Pallarés, M. (1998). Beyond GIS: The archaeology of social spaces. Archaeologia
e Calcolatori, 1, 47–80.
Bevan, A., Crema, E. R., Li, X., & Palmisano, A. (2013). Intensities, interactions and uncertainties:
Some new approaches to archaeological distributions. In A. Bevan & M. Lake (Eds.),
Computational approaches to archaeological space (pp. 27–52). Left Coast Press.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., & Cutler, A. (2010). Random forests. Available at: http://www.stat.berkeley.edu/
~breiman/RandomForests/
Breiman, L, Cutler, A, Liaw, A., & Wiener, M. (2018). Breiman and Cutler’s Random
Forests for Classification and Regression. R package version 4.6–14. https://doi.org/10.1023/
A:1010933404324
Brodlie, K., Allendes, R. O., & Lopes, A. (2012). A review of uncertainty in data visualization. In
J. Dill et al. (Eds.), Expanding the frontiers of visual analytics and visualization (pp. 81–109).
Springer.
Brouwer Burg, M., Peeters, H., & Lovis, W. A. (Eds.). (2012). Uncertainty and sensitivity analysis
in archaeological computational modeling. Springer/University of California.
Buck, C. E., Cavanagh, W., & Litton, C. D. (1996). Bayesian approach to interpreting archaeological data. Wiley.
Castiello, M. E. (2022). Computational and machine learning tools for archeological site
modeling. Springer. ISBN : 978-3-030-88566-3
Castiello, M. E., & Tonini, M. (2021). An explorative application of random forest algorithm for
archaeological predictive modelling. A Swiss case study. Journal of Computer Applications in
Archaeology, 4, 110–125.
Conolly, J., & Lake, M. (2006). Geographical information systems in archaeology (p. 338).
Cambridge University Press.
Crema, E. R., Bevan, A., & Lake, M. (2010). A probabilistic framework for assessing spatiotemporal point patterns in the archaeological record. Journal of Archaeological Science, 37(5),
1118–1130.
De Finetti, B. (1970). Teoria delle probabilità, Sintesi introduttiva con appendice critica. Einaudi.
De Runz, C., Desjardin, E., Piantoni, F., & Herbin, M. (2007). Using fuzzy logic to manage uncertain
multi-modal data in an archaeological GIS. International Symposium on Spatial Data Quality,
Enschede, Netherlands.
Dempster, A. P. (1967). Upper and lower probabilities induced by a multi-valued mapping. Annals
of Mathematical Statistics, 38, 325–339.
Desachy, B. (2012). Formaliser le raisonnement chronologique et son incertitude en archeologie de
terrain. Cybergeo: European Journal of Geography, Systemes, Modelisation, Geostatistiques,
document 597.
Ducke, B. (2003). Archaeological predictive modelling in intelligent network structure. In M.
Doerr & A. Sarris (Eds.), Proceedings of the 29th conference of the computer applications
in archaeology (pp. 267–273). Hellenic Ministry of Culture.
Ducke, B. (2014). An integrative approach to archaeological landscape evaluation: Locational
preferences, site preservation and uncertainty mapping. The Archaeology of Erosion, the
Erosion of Archaeology, 1, 13–22.
Ducke, B., & Münch, U. (2005). Predictive modelling and the archaeological heritage of
Brandenburg (Germany) (M. van Leusen & H. Kamermans, Eds.) (pp. 93–107).
Ejstrud, B. (2003). Indicative models in landscape management: Testing the methods. The
archaeology of landscapes and geographic information systems. Predictive maps, settlement
dynamics and space and time in prehistory (J. Kunow & J. Müller, Eds.) (pp. 119–134).
Ejstrud, B. (2005). Taphonomic models. Using Dempster-Shafer theory to assess the quality of
archaeological data and indicative models (H. Kamermans & M. van Leusen, Eds.) (pp. 189–198).
Espig, M., Finlay-Smits, S. C., Meenken, E. D., Wheeler, D. M., Sharifi, M., & Shah, M. (2020).
Understanding and communicating uncertainty in data-rich environments: Towards a
transdisciplinary approach. In C. L. Christensen, D. J. Horne, & R. Singh (Eds.), Nutrient
management in farmed landscapes (Occasional Report No. 33). Farmed Landscapes Research
Centre, Massey University.

Evans, A. (2012). Uncertainty and error. In A. J. Heppenstall, A. Crooks, L. M. See, & M. Batty
(Eds.), Agent-based models for geographical systems. Springer.
Farinetti, E., Hermon, S., & Niccolucci, F. (2004). Fuzzy logic application to artefact surface survey
data. In F. Niccolucci & S. Hermon (Eds.), Beyond the artifact: Digital interpretation of the
past: Proceedings of CAA 2004 (pp. 125–129). Budapest.
Favory, F., & Nuninger, L. (2008). ArchaeDyn. Dynamique spatiale du peuplement et ressources
naturelles: vers une analyse intégrée dans le long terme, de la Préhistoire au Moyen Age,
ArchaeDyn, Rapport d’activité scientifique 2005–2007, p. 71.
Fernandes, M., Walls, L., Munson, S., et al. (2018). Uncertainty displays using quantile dotplots or
CDFs improve transit decision-making. In Proceedings of the 2018 CHI conference on human
factors in Computing Systems, ACM, p. 144.
Fischer, P., Comber, A., Wadsworth, R. (2006). Approaches to Uncertainty in Spatial Data. In
R. Devillers & R. Jeansoulin (Eds.) Fundamentals of Spatial Data Quality. Wiley, ISBN:
9780470612156
Fisher, P. F. (2006). Models of uncertainty in spatial data. In P. A. Longley, M. F. Goodchild, D.
J. Maguire, & D. W. Rhind (Eds.), Geographical information systems: Principles, techniques,
management and applications (pp. 191–205). Wiley.
Fusco, J. (2016). Analyse des dynamiques spatio-temporelles des systèmes de peuplement dans un
contexte d’incertitude: Application à l’archéologie spatiale. University Nice Sophia Antipolis.
Retrieved from https://tel.archives-ouvertes.fr/tel-01341554
Fusco, J., & de Runz, C. (2020). Spatial fuzzy sets. In M. Gillings, P. Hacıgüzeller, & G. Lock
(Eds.), Archaeological spatial analysis. A methodological guide. Routledge.
Gacôgne, L. (2003). Logique floue et applications (p. 128). Institut d’informatique d’entreprise
d’Evry.
Gonzalez-Perez, C. (2018). Information modelling for archaeology and anthropology. Software
engineering principles for cultural heritage. Springer.
Goodchild, M. F. (2003). The nature and value of geographic information. In M. Duckham, M. F.
Goodchild, & M. Worboys (Eds.), Foundations of geographic information science (pp. 18–30).
Taylor & Francis.
Gupta, N. (2020). Preparing archaeological data for spatial analysis. In M. Gillings, P. Hacıgüzeller,
& G. Lock (Eds.), Archaeological spatial analysis. A methodological guide. Routledge.
Hájek, P. (1998). Metamathematics of fuzzy logic. Kluwer.
Halpern, J. Y. (2003). Reasoning about uncertainty. MIT Press.
Hatzinikolaou, E. G. (2006). Quantitative methods in archaeological prediction: From binary to
fuzzy logic. In M. W. Mehrer & K. L. Wescott (Eds.), GIS and archaeological site location
modelling (pp. 437–446). Taylor & Francis.
Hatzinikolaou, E. G., Hatzichristos, T., Siolas, A., & Mantzourani, E. (2003). Predicting archaeological site locations using GIS and fuzzy logic. In M. Doerr & A. Sarris (Eds.), The digital
heritage in archaeology: Computer applications and quantitative methods in archaeology (pp.
169–178). Archive of Monuments and Publications, Hellenic Ministry of Culture.
Hermon, S., & Niccolucci, F. (2003). A Fuzzy Logic Approach to Typology in Archaeological
Research. In M. Doerr and A. Sarris (Eds), The digital Heritage of Archaeology. Athens,
Archive of Monuments and Publications. 307–310.
Hullman, J., Resnick, P., & Adar, E. (2015). Hypothetical outcome plots outperform error bars and
violin plots for inferences about reliability of variable ordering. PLoS One, 10(11), e0142444.
Hullman, J., Qiao, X., Correll, M., et al. (2018). In pursuit of error: A survey of uncertainty
visualization evaluation. IEEE Transactions on Visualization and Computer Graphics, 25(1),
903–913.
Jaroslaw, J., & Hildebrandt-Radke, I. (2009). Using multivariate statistics and fuzzy logic system
to analyse settlement preferences in lowland areas of the temperate zone: An example from the
Polish Lowlands. Journal of Archaeological Science, 36(10), 2096–2107.
Kamermans, H., Deeben, J., Hallewas, D., Zoetbrood, P., van Leusen, M., & Verhagen, P.
(2005). Project proposal. In M. van Leusen & H. Kamermans (Eds.), Predictive modelling
for archaeological heritage management: A research agenda (Nederlandse Archeologische
Rapporten 29) (pp. 13–23). Rijksdienst voor het Oudheidkundig Bodemonderzoek.

Kinkeldey, C., MacEachren, A. M., & Schiewe, J. (2014). How to assess visual communication
of uncertainty? A systematic review of geospatial uncertainty visualisation user studies.
The Cartographic Journal, 51(4), 372–386.
Kinkeldey, C., MacEachren, A. M., Riveiro, M., & Schiewe, J. (2017). Evaluating the effect
of visually represented geodata uncertainty on decision-making: Systematic review, lessons
learned, and recommendations. Cartography and Geographic Information Science, 44(1), 1–
21. https://doi.org/10.1080/15230406.2015.1089792
Kulkarni, V. Y., & Sinha, P. K. (2012). Pruning of Random Forest classifiers: A survey and future
directions. In International Conference on Data Science & Engineering (ICDSE), Cochin,
Kerala, 2012 (pp. 64–68). https://doi.org/10.1109/ICDSE.2012.6282329
Kvamme, K. L. (1990). The fundamental principles and practice of predictive archaeological
modeling. In A. Voorrips (Ed.), Mathematics and information science in archaeology: A flexible
framework (pp. 275–295). HOLOSVerlag.
Leung, Y. (1983). Fuzzy sets approach to spatial analysis and planning, a nontechnical evaluation.
Geografiska Annaler. Series B, Human Geography, 65(2), 65–75.
Liaw, A., & Wiener, M. (2002). Classification and regression by Random Forest. R News, 2(3),
18–22.
Lieskovský, T., Ďuračiová, R., & Karell, L. (2013). Selected mathematical principles of archaeological predictive models creation and validation in the GIS environment. Interdisciplinaria
archaeologica. Natural Sciences in Archaeology, 4(2), 33–46.
Lock, G., & Harris, T. M. (1996). Danebury revisited: An English iron age hillfort in a digital
landscape. In M. Aldenderfer & H. D. G. Maschner (Eds.), Anthropology, space and geographic
information systems (pp. 214–240). Oxford University Press.
Lotfian, M. 2016. Urban climate modeling, case study of Milan city. Master thesis, Politecnico di
Milano.
MacEachren, A. M., Roth, R. E., O’Brien, J., et al. (2012). Visual semiotics & uncertainty
visualization: An empirical study. IEEE Transactions on Visualization and Computer Graphics,
18(12), 2496–2505. https://doi.org/10.1109/TVCG.2012.279
MacEachren, A. M., Roth, R. E., O’Brien, J., et al. (2018). Visual semiotics & uncertainty
visualization: An empirical study. IEEE Transactions on Visualization and Computer Graphics,
18(12), 2496–2505. https://doi.org/10.1109/TVCG.2012.279
Martin-Rodilla, P., Pereira-Fariña, M., & Gonzalez-Perez, C. (2019). Qualifying and quantifying
uncertainty in digital humanities: A fuzzy-logic approach. In Seventh international conference
on technological ecosystems for enhancing multiculturality, 16–18 October 2019, Leon.
McBurney, P., & Parsons, S. (2001). Representing epistemic uncertainty by means of dialectical
argumentation. Annals of Mathematics and Artificial Intelligence, 32(1–4), 125–169.
Mink, P., Ripy, J., Bailey, K., & Grossardt, T. H. (2009). Predictive archaeological modeling
using GIS-based fuzzy set estimation: A case study in Woodford County, Kentucky (Kentucky
Transportation Center Faculty and Researcher Publications. 12). https://uknowledge.uky.edu/
ktc_facpub/12
Moraczewski, I. R. (1993). Fuzzy logic for phytosociology II. Generalizations and predictions.
Vegetatio, 106(1), 13–20.
Morrison, M. S. (2015). Reconstructing reality: Models, mathematics, and simulations. Oxford
University Press.
Munzner, T. (2014). Visualization analysis and design. CRC Press.
Nagypál, G., & Motik, B. (2003). A fuzzy model for representing uncertain, subjective, and vague
temporal knowledge in ontologies. In R. Meersman, Z. Tari, & D. C. Schmidt (Eds.), On the
move to meaningful internet systems. Springer.
Niccolucci, F., & Hermon, S. (2003). La logica fuzzy e le sue applicazioni alla ricerca archeologica.
Archeologia e Calcolatori, 14, 97–110.
Niccolucci, F., & Hermon, S. (2004). A fuzzy logic approach to reliability in archaeological virtual
reconstruction, in Proceedings of the 2004 Computer Applications in Archaeology (CAA)
Conference

314

M. E. Castiello

Niccolucci, F., & Hermon, S. (2015). Time, chronology and classification. In J. A. Barceló & I.
Bogdanovic (Eds.), Mathematics and archaeology. Taylor & Francis.
Niccolucci, F., D’Andrea, A., & Crescioli, M., 2001. Archaeological applications of fuzzy
databases. In Z. Stančič & T. Veljanovski (Eds.), Computing archaeology for understanding the
past. CAA 2000. Computer applications and quantitative methods in archaeology. Proceedings
of the 28th conference, Ljubljana, April 2000, pp. 107–116.
Oštir, K., Kokalj, Ž., Saligny, L., Tolle, F., Nunninger, L., avec la collaboration de F. Pennors
et K. Zaksek. (2007). Confidence maps: A tool to evaluate archaeological data’s relevance in
spatial analysis. In Layers of perception. Proceedings of the 35th computer applications and
quantitative methods in archaeology conference, Berlin, Germany, April 2–6, 2007, Bonn, pp.
272–277.
Padilla, L. M. K, Powell, M, Kay, M., & Hullman, J. (2021). Uncertain About Uncertainty: How
Qualitative Expressions of Forecaster Confidence Impact Decision-Making With Uncertainty
Visualizations. Front. Psychol. 11:579267. https://doi.org/10.3389/fpsyg.2020.579267
Piotrowski, M. (2019). Accepting and modeling uncertainty. In v. A. Kuczera, T. Wübbena, &
T. Kollatz (Eds.), Die Modellierung des Zweifels – Schlüsselideen und -konzepte zur graphbasierten Modellierung von Unsicherheiten (Zeitschrift für digitale Geisteswissenschaften, 4).
Ragin, C. C. (2000). Fuzzy-set social science. University of Chicago Press.
Ramos-Soto, A., Alonso, J. M., Reiter, E., & van Deemter, K. (2017). An empirical approach for
modeling fuzzy geographical descriptors. IEEE.
Refsgaard, J. C., van der Sluijs, J. P., Etejberg, A. L., & Vanrollegham, P. A. (2007). Uncertainty
in the environmental modeling process—A framework and guidance. Environmental Modeling
and Software, 22, 1543–1556.
Roberts, D. W. (1986). Ordination on the basis of fuzzy set theory. Vegetatio, 66, 123–131.
Rogers, S. R., Fischer, M., & Huss, M. (2014). Combining glaciological and archaeological
methods for gauging glacial archaeological potential. Journal of Archaeological Science, 52,
410–420. https://doi.org/10.1016/j.jas.2014.09.010
R Core Team. (2018). R: A language and environment for statistical computing. Vienna, Austria:
R Foundation for Statistical Computing. Available at http://www.R-project. org/
Sattler, R. (1996). Classical morphology and continuum morphology: Opposition and continuum.
Annals of Botany, 78, 577–581.
Savage, L. (1972). The foundation of statistics. Dover.
Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press.
Shannon, C. E. (1948). A mathematical theory of communications. The Bell System Technical
Journal, 27, 379–432.
Smithson, M. (1989). Ignorance and Uncertainty: Emerging Paradigms. New York: SpringerVerlag. https://doi.org/10.1007/978-1-4612-3628-3
Spiegelhalter, D. (2017). Risk and uncertainty communication. Annual Review of Statistical
Applications, 4, 31–60.
Taheri, S. M., Ghadim, F. I., & Kabirian, M. (2019). Application of fuzzy inference systems in
archaeology. In 7th Iranian joint congress on Fuzzy and Intelligent System, Iran, Bojnurd, 29–
31 January 2019.
Valavi, R., Elith, J., & Guillera-Arroita, G. (2019). blockCV: An r package for generating spatially
or environmentally separated folds for k-fold cross validation of species distribution models.
Methods in Ecology and Evolution, 10(2), 225–232. https://doi.org/10.1111/2041-210X.13107
van der Leeuw, S. 2016. Uncertainties. In: Brouwer Burg, M Peeters J and Lovis W (Eds.)
Uncertainty and sensitivity analysis in archaeological computational modeling. Springer.
Van Leusen, P. M. (2002). Pattern to process: Methodological investigations into the formation
and interpretation of spatial patterns in archaeological landscapes. PhD thesis, Faculty of
Arts. Available at: http://dissertations.ub.rug.nl/faculties/arts/2002/
Van Leusen, M., Millard, A. R., & Ducke, B. (2009). Dealing with uncertainties in archaeological
prediction. In H. Kamermans, M. van Leusen, & P. Verhagen (Eds.), Archaeological prediction
and risk management: Alternatives to current practice. (pp. 123–160). Leiden: Leiden University Press.

13 Computational Processing of Language Vagueness for Archaeological Site. . .

315

Vaughn, S., & Crawford, T. (2009). A predictive model of archaeological potential: An example
from northwestern Belize. Applied Geography, 29(4), 542–555.
Verhagen, P. (2007). Case studies in archaeological predictive modelling. PhD thesis, Leiden
University Press.
Yager, R. R., & Filev, D. P. (1994). Essentials of fuzzy modeling and control. Wiley.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–355.
Zikmund-Fisher, B. J., Witteman, H. O., Dickson, M., et al. (2014). Blocks, ovals, or people? Icon
type affects risk perceptions and recall of pictographs. Medical Decision Making, 34(4), 443–
453.

Part III

The Future

Chapter 14

Future Directions
Cesar Gonzalez-Perez, Martín Pereira-Fariña, and Patricia Martín-Rodilla

Now we reach the end of the book. Over 13 chapters, we have described a
number of conceptual approaches and computational techniques for discourse and
argumentation analysis in archaeology. Our aim has been to offer a consolidated and
integrated view of various works and research lines, which are often found scattered
across different fields. By combining the expertise of specialists in discourse
analysis, argumentative analysis, natural language processing, archaeology, and
digital humanities, we hope to have succeeded in our goal.
In Part I, composed of 7 chapters, we have addressed the ways in which
the production and understanding of different genres of archaeological discourse
(technical reports, scientific papers, dissemination documents) can benefit from
incorporating discourse analysis methodologies and principles. In Part II, composed
of 5 chapters, we have described a sample of computational techniques that can be
applied to partially or fully automate most of the conceptual approaches described
in Part I.

C. Gonzalez-Perez (✉)
Incipit CSIC, Santiago de Compostela, Spain
e-mail: cesar.gonzalez-perez@incipit.csic.es
M. Pereira-Fariña
Department of Philosophy, University of Santiago de Compostela, Santiago de Compostela, Spain
e-mail: martin.pereira@usc.es
P. Martín-Rodilla
Department of Computer Science and Information Technologies, University of A Coruña,
A Coruña, Spain
e-mail: patricia.martin.rodilla@udc.es
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Gonzalez-Perez et al. (eds.), Discourse and Argumentation in Archaeology:
Conceptual and Computational Approaches, Quantitative Archaeology
and Archaeological Modelling, https://doi.org/10.1007/978-3-031-37156-1_14

14.1 Areas to Develop
This book is just a starting point. Although computer systems are widely used today
to store and process data, their usage for the storage and processing of discourses is
still uncommon. We hope that this book can serve as a basis for developing guidelines
and protocols that aid the adoption of the proposed approaches and techniques
by archaeologists, and that it helps in the implementation of computer systems
that are capable of processing discourses and arguments as their primary kind of
information.
To achieve this, a few areas need to be developed further. First, we believe that
natural language processing and other language technologies are being primarily
applied to the analysis of plain text, that is, text that has been decoupled from its
context and purpose. Current approaches and algorithms are capable of parsing,
analysing and manipulating the elements in a piece of text, but this is not sufficient.
We argue that language technologies like these should aim to raise their level of
abstraction so that they can deal with discourses rather than texts, that is, with
language as used by particular agents with specific purposes and in specific social
contexts. In this manner, the context where a discourse is produced, the intention
of the speaker, and the relationships between the discourse and the entities in the
world would become objects of study in addition to the lexicon, structure and
meaning of the text itself. Incorporating these contextual aspects will require new
conceptualisations and new computational techniques.
Second, visualisation and dissemination techniques must be improved. Displaying the result of an argument analysis on a computer screen, for example, requires
a large amount of screen real estate, and still fails to convey the necessary details
for a comprehensive and deep understanding of what is being represented. We need
to devise new visualisation techniques that can clearly present the large and complex
networks of discourse elements on a two-dimensional screen, whether for researchers
or, as a dissemination vehicle, for the general public. Adding interaction
to these visual devices would bring an extra layer of value to the exploration and
comprehension of argumentation structures. This interaction can be implemented
in different ways, such as by means of the automatic production of linguistic
summaries as a response to queries formulated by users, or even conversational
agents that would allow users to engage in a conversation that dynamically navigates
the results of the argument analysis.
A third area that needs further development is that of the applicability of these
conceptual approaches and computational techniques. This book has been specifically
oriented towards archaeology, but nothing precludes the presented approaches and
techniques from also being applied to anthropology, history, and other related
disciplines. Doing this will probably require new conceptual developments as well
as discipline-specific trials and experiments.

14.2 The Future
Of course, anything that we say about the future is quite speculative, but we want
to conclude by examining some plausible scenarios where we may find ourselves
in the near future. Once the areas described in the previous section are properly
developed, we will be able to attain new standards with regard to discourse and
argumentation analysis in archaeology. First and foremost, open and public science
will be truly achievable, as the justification of knowledge will be almost universally
accessible. In other words, if we are able to unpack, visualise and explain the
complete argumentative structure of any claim, anyone will be able to assess how
well supported that knowledge is.
Second, a new kind of scientific repository will become possible. We are familiar with
dataset repositories, which store and make available datasets for public reuse, and with
document repositories, which store and serve documents. By developing the areas
described above, however, we would be able to create argument repositories, which
would contain large meshes of interconnected entities, claims about these entities,
and argumentation relationships (such as inferences, conflicts or rephrasings)
between them. These meshes would be constructed from many different works
by many different authors, so a truly multi-vocal and intertextual account of the
archaeological record would be possible. Tasks that today are cumbersome and
difficult, such as gathering a comprehensive bibliography on a particular site or find,
would become extremely easy.
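To make the idea of an argument repository slightly more concrete, the following is a minimal sketch in Python, assuming a simple in-memory structure. All names (`Claim`, `Relation`, `ArgumentRepository`, `sources_for`, `conflicts_about`) and the sample data are hypothetical illustrations and do not correspond to any existing system; a real repository would also need persistence, provenance and richer entity modelling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """A statement made about one entity in one work (hypothetical schema)."""
    claim_id: str
    entity: str   # identifier of the site, find, etc. the claim is about
    text: str
    author: str
    source: str   # bibliographic reference of the work making the claim


@dataclass(frozen=True)
class Relation:
    """An argumentation relationship between two claims."""
    kind: str     # "inference", "conflict" or "rephrase"
    from_claim: str
    to_claim: str


@dataclass
class ArgumentRepository:
    claims: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_claim(self, claim):
        self.claims[claim.claim_id] = claim

    def relate(self, kind, from_claim, to_claim):
        if kind not in {"inference", "conflict", "rephrase"}:
            raise ValueError(f"unknown relation kind: {kind}")
        # Both endpoints must already be stored claims.
        if from_claim not in self.claims or to_claim not in self.claims:
            raise KeyError("both claims must be in the repository")
        self.relations.append(Relation(kind, from_claim, to_claim))

    def sources_for(self, entity):
        """Sorted list of distinct works containing a claim about the entity."""
        return sorted({c.source for c in self.claims.values() if c.entity == entity})

    def conflicts_about(self, entity):
        """Pairs of claim ids linked by a conflict whose origin concerns the entity."""
        return [(r.from_claim, r.to_claim) for r in self.relations
                if r.kind == "conflict" and self.claims[r.from_claim].entity == entity]


# Invented example data: two fictional works make claims about the same fictional site.
repo = ArgumentRepository()
repo.add_claim(Claim("c1", "site-A", "The enclosure is defensive.", "Author X", "Fictional report 1"))
repo.add_claim(Claim("c2", "site-A", "The enclosure is ceremonial.", "Author Y", "Fictional paper 2"))
repo.add_claim(Claim("c3", "site-B", "The mound is a burial.", "Author X", "Fictional report 1"))
repo.relate("conflict", "c2", "c1")

print(repo.sources_for("site-A"))      # bibliography for one site
print(repo.conflicts_about("site-A"))  # disagreements between authors
```

In this sketch, gathering the bibliography on a site reduces to a single lookup over the stored claims, and disagreements between authors are explicit links rather than something a reader has to reconstruct from separate texts.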
Third, knowledge generation from argument repositories through artificial intelligence (AI) would become possible. Current AI techniques are capable of detecting
patterns, mining for hidden relationships, and learning about them. Once argument
repositories are available, current and future AI approaches will be able to generate
new knowledge in an automated or semi-automated manner. This will also allow the
scientific community to detect and fight fraudulent research in which data or
evidence is falsified or strongly biased in favour of a given agenda.
In addition to these advances, some new challenges and issues will appear as
well. First, open argument repositories will be vulnerable to security and safety
threats, so proper measures will need to be implemented. Given the fact that these
repositories will be expected to constitute a reliable source of knowledge, their
robustness against tampering or accidental misuse should be a critical concern.
Second, bias will be injected into argument repositories. Human knowledge is
never free of bias, which would certainly be captured and “fossilised” in
repositories, as it is today captured and preserved in reports or books. It is difficult
to assess how big a problem this will be. On the one hand, argument repositories
as described above will be much larger and more complex than any book or resource that
exists today, so that bias will accumulate in larger quantities. On the other hand,
argument repositories will contain specific features to deal with multi-vocality and
subjective perspectives, so that the fact that any particular claim is subjectively biased
may not constitute a significant problem after all. In any case, we need to be aware
of bias injection, and design systems that can handle it adequately.

Third, new knowledge generated through the application of AI to argument
repositories could result, on some occasions, in morally challenging or even
unacceptable situations. We can imagine, for example, that an AI system reaches
the conclusion, by working on an argument repository, that some highly valued
archaeological monument must be abandoned, destroyed, or interpreted according
to some specific perspective to the detriment of others. Finding a moral justification
for this kind of result will be difficult. Explainable AI is an active area of research
today, so we may expect to see AI approaches in the near future that are better at
explaining why a result is the way it is, thus allowing us to find improved moral
justifications for such results.
Fourth, this envisioned future may entail a threat to creativity in archaeology.
If computer-based argument repositories are taken as a reliable major source
of knowledge, the human role in devising new research questions, proposing
hypotheses, making interpretations and overall producing knowledge is likely to
be challenged. Even without being aware of it, archaeologists may become biased
by the very information in the repositories. Again, it is difficult to forecast how
much the built-in multi-vocality and subjectivity management features of our future
knowledge systems will be able to guard us against this.
We leave you with these thoughts. Language is perhaps the most human of traits
and, in this book, we have argued that discourse and argumentation in archaeology
can be somehow tamed through analysis and computerisation. How far can we go
down this road? How far should we venture?