Research Studies

Abstracts - Project List by Title

Title	Code
AI in records and archives management in China: Interaction and Co-evolution	GS03
AI in the Middle ages-Arrangement of ancient documents via appearance-based recognition	AD03
AI literacy for record management and Archives	MA02
AI tools supporting records creation and recordkeeping in Brazil and Mexico	03
AI Tutorials	GS01
AI-Assisted Digitization of Archives and Documentary Heritage Materials	RA03
Analyzing Public Data Sets	AD02
Artificial Intelligence Applied to Cultural Heritage: Automatic Construction and Open Source Use of Trustworthy Knowledge Graph of Ancient Chinese and Italian Medical Texts	AD06
Artificial Intelligence, Warfare, and Evidence in International Contexts	CU09
Building & Creating a Digital Twin for Preservation	CU04
Case Study on Extraction and Identification of Records containing Personal Data and Sensitive Personal Data for Long Term Preservation	RA02
Comparative assessment of ethical codes of the archival/records management and artificial intelligence communities	MA01
Competencies from training programs to market needs	MA10
Emergency Communications and Critical Incident Response Data	CU01
Employing AI for Retention & Disposition in Digital Information and Recordkeeping Systems (DIRS)	AA01
Enterprise Master Data Management and the Role of Metadata	RP02
Exploratory Study on Veracity and Truth Discovery of Multimedia Data in the Era of Generative AI	GS04
Exploring Possible Uses of AI to Support Research in Archives	RA09
Feasibility study: Creating a web archives in universities in Iran and South Africa	RP05
Gamification of archival experience for users	RA04
Identification of critical archival challenges which are the best candidates for improvement by AI technologies in the context of retention and preservation of digital records	RP01
Increasing Access to Photos, Videos and Social Media records through AI-generated Descriptive Metadata	RA01
Inova HFA-Using AI to better manage health records and patient outcomes	CU06
Intensional logic-based AI for the Records in Contexts Conceptual Model (RiC-CM) and Ontology (RiC-O)	AD04
Investigating the Use of AI technologies in the Realm of E-Government Development	CU07
ITrust AI Metastudy	MA06
Language identification for digital preservation workflows	RP06
Maturity Assessment for Appraisal in the AI Age	AA02
Metadata and Metadata Models in the AI Context	MA07
Personal Information Content Assessment	MA05
Preserving AI Techniques as Paradata	RP04
Privacy in Digital Records Containing Personally Identifiable Information (PII)-An Exploration of Current Status and the potential of AI Tools and Techniques	RA06
Recordkeeping practices of creators using AI to generate images	CU08
Records Classification using Natural Language Processing Techniques to Support Trustworthy Public Digital Record	CU02
Representing Archival Record Sets for Machine Learning Experts	MA08
Research on the process of declassifying personal information using AI tools - Israel State Archives case study	RA07
Smart Grid Data Communication and Analytics	CU03
Tatuoca Magnetic Observatory: from paper to intelligent bytes	AD05
Teachable AI for Arrangement and Description	AD01
Terminology Database	GS02
The Development of Ethical Guidelines for AI use with Records - SUSPENDED	MA03
The role of AI in identifying or reconstituting archival aggregations of digital records and enriching metadata schemas	CU05
The Role of Records and RM in Environments Where Trustworthy AI is the Focus	MA04
Traditional AI in Records Creation through Acquisition	GS05
Trusted access/use of archives and AI: a conceptual model	RA08
Use of AI Tools in Local Governments and its Implications on Public Records Management	MA09
User approaches and behaviours in accessing records and archives in the perspective of AI-A global user study	RA05

Creation and Use

Title: Emergency Communications and Critical Incident Response Data (CU01)

Lead Researcher: Michael Stiber, University of Washington, Bothell

Timeline: September 2021-August 2026

Abstract: This study will explore fundamental challenges associated with a particular use case for AI in records and archives: data from critical systems surrounding incidents such as natural disasters or actions by bad actors. This study leverages work ongoing at the University of Washington Bothell (UWB) and at the University of British Columbia (UBC). At UWB the work has focused on developing simulations of emergency public communications systems, such as "next generation 911" (NG911) in North America, and on managing such work via a system that collects, visualizes, and manages provenance information from simulation artifacts. These systems, Graphitti and Workbench, are described at https://depts.washington.edu/biocomp/index.html. At UBC the focus is on building and preserving digital disaster archives in Japan, such as the 2011 Great East Japan Earthquake archives, from the point of view of archival studies.
Title: Smart Grid Data Communication and Analytics (CU03)

Lead Researcher: Tracey Lauriault, Carleton University

Timeline: 2021-2026

Abstract: With the popularity of smart electrical appliances and home energy management systems, massive amounts of data are generated about electricity consumption. These data are beneficial for utility companies because they provide information about the behaviour patterns of consumers and can inform decisions on how to optimize the load on the grid. The data obtained from the communication system are stored in a database hosted in the cloud. Our aim is to build the data communication system between the transformer agent (TA), attached to a neighbourhood’s electric transformer, and its customer agents (CAs), attached to each house, using inexpensive and common-use devices and modules, and to process these data to help utility companies design better demand side management (DSM) programs for the efficient transmission and distribution of energy. This addresses the problem of balancing electric demand and supply at the grid and reduces peak demands, which helps lower electricity bills for consumers. In this context, we analyze household electricity consumption data to forecast energy consumption over the short term (hours/days ahead) and the long term (weeks/months ahead). What records are generated in these systems, what should be preserved and how, and what role will artificial intelligence play?
Title: Records Classification using Natural Language Processing Techniques to Support Trustworthy Public Digital Record (CU02)

Lead Researcher: Umi Mokhtar, Universiti Kebangsaan

Timeline: 2021-2026

Abstract: In the library field, classification is used for retrieval and searching, whereas in records management, classification is designed to be used for preservation purposes and to maintain required records characteristics. The authenticity, reliability, integrity, and usability of records must be maintained throughout their lifecycle. This study will extend the Functional Model for Classification: The Records Management Approach developed by the lead researcher in 2015 to embed AI techniques to automate the classification of records.
Title: Building & Creating a Digital Twin for Preservation (CU04)

Lead Researcher: Tracey P. Lauriault, Associate Professor, Critical Media and Big Data, School of Journalism and Communication, Carleton University, Ottawa CA

Timeline: 2021-2026

Abstract: The Imagining the Canada Digital Twin (ICDT) project funded by the Canadian New Frontiers in Research Fund proposes a national, inclusive, and multidisciplinary research consortium for the creation of a technical, cultural, and ethical framework to build Canada’s digital twin. ICDT will focus on the built environment, concentrating on the Architecture, Engineering, Construction, and Owner Operator (AECOO) industry. ICDT is led by the Carleton Immersive Media Studio (CIMS), which is developing a DT prototype of the Montréal-Ottawa-Toronto corridor using a simulated, distributed server network. A DT is an ecosystem of multi-dimensional and interoperable subsystems made up of physical things in the real world, digital versions of those real things, synchronized data connections between them, and the people, organizations and institutions involved in creating, managing, and using these. For this I Trust AI study, an interdisciplinary research team of architects, data scientists, engineers, building scientists, archival professionals and critical data studies scholars from Carleton University CDN, Luleå University of Technology SE, the Swedish Transport Administration and the University of Florence IT will develop a Use and Creation preservation case study and a test bed to preserve DT artifacts created as part of the SUSTAIN and the Carleton Digital Campus Innovation (DCI) project integrating Building Performance Simulation (BPS) technologies with BIM on a campus scale. This will involve building information management systems (BIM), Asset Management Systems (AMS), visualizations as part of the Unreal Game Engine, VR and modelling, AI/ML, and real-time data for decision making. Our research questions are: Can a digital twin be preserved, and what is required at the point of creation to ensure that it can be? Can the AI, automation and real-time data involved in this complex data, social and technological system be preserved? And what might the role of AI/ML be in creating an archival package to ingest a digital twin? The outputs of this research will provide empirical data to meet the objectives of the I Trust AI Project and to support data governance research as part of the ICDT project; will provide Carleton University with the opportunity to test the preservation of Campus DT records in its institutional archives; and will inform the technology sectors involved in the creation of DTs, a nascent technological system and sector.
Title: The role of AI in identifying or reconstituting archival aggregations of digital records and enriching metadata schemas (CU05)

Lead Researcher: Mariella Guercio, Associazione Nazionale Archivistica Italiana-ANAI, and Stefano Allegrezza, Università di Bologna

Timeline: January 2022-July 2023

Abstract: The uncontrolled creation of huge numbers of current records lacking the metadata necessary to ensure the reliability, trustworthiness, quality and sustainability of appraisal and acquisition is a common and complex problem today. It includes: 1. Records managed by ERMS without the full set of information required for proper records creation; 2. Business systems that create and manage records with only partial identification and procedural information; 3. Records created by systems without metadata and without being integrated in ERMS, including email repositories. This study will use case studies to explore the question: Can we use AI tools to constitute or reconstitute archival aggregations and create metadata schemas for them? It will assess existing AI technologies that can address the problem of non-aggregated, unarranged, or de-contextualized records in both the current and semi-current phases of their lifecycle in order to ensure an accurate appraisal and guarantee managed and controlled transfer procedures. Further, it will identify archival requirements for new tools, which should be developed according to archival concepts and principles. In particular the study aims to assess: - the possibility of using AI tools to re-establish the archival bond among a multitude of de-contextualized records; - the possibility of using AI tools to integrate incomplete recordkeeping metadata schemas; - the capability of existing AI technologies to address the critical archival issues mentioned above; and - the ability of archival concepts and principles to inform new AI tools aligned with the archival needs named above.
Title: Inova HFA-Using AI to better manage health records and patient outcomes (CU06)

Lead Researcher: Claudio Gottschalg-Duque

Timeline: September 2021-December 2023

Abstract: This study will investigate the use of AI (NLP), DLT and Smart Contracts to solve electronic health record (EHR) management problems, including how to use the health records register to improve patients' health outcomes (early recognition of clinical deterioration), while respecting the General Personal Data Protection Law (LGPD, Brazilian law number 13.709-2018). Can AI help researchers, clinicians and administrators improve their services efficiently and effectively using patient data without invading patient privacy? The objective is to implement Hospital 4.0 in the HFA with the help of academia and IT processes.
Title: Investigating the Use of AI technologies in the Realm of E-Government Development (CU07)

Lead Researcher: Proscovia Svärd

Timeline: February-October 2022

Abstract: This study will explore what AI technologies are being used within the realm of e-government development and what recordkeeping challenges can be identified, specifically: a) At what points in the processes where AI is being deployed are records created? b) How are they created and captured for use? c) How are they pluralized? d) Are there any challenges that can be addressed by AI? By conducting a systematic literature review, the study team will: 1. Establish legislative and regulatory guidelines in three selected countries (Sweden, Finland and South Africa) that inform e-government development pertaining to different AI technologies and the creation and use of records. 2. Identify, in the different countries, key trendsetters (national government agencies/municipalities) that utilise AI towards e-government development. The focus is to determine what AI these trendsetting organisations utilise towards e-government, and to what extent. 3. Identify recordkeeping challenges during the utilisation of AI within the realm of e-government development.
Title: Recordkeeping practices of creators using AI to generate images (CU08)

Lead Researcher: Jessica Bushey

Timeline: April 2023-April 2025

Abstract: This study will explore common tools and technologies for AI image creation, workflows and/or best practices, the application of these AI images in the areas of news reportage, law enforcement, and medical diagnostics, and the preservation of AI images (and paradata about their generation) for the long term. This study will contribute a new perspective to the discourse on photographic representation and the trustworthiness of AI-generated images, and will assist archivists in determining whether AI-generated images (and paradata) are being created as reliable records and are being managed and preserved in a manner that ensures their authenticity for the long term. The outcomes of this study can assist creators and preservers in implementing controls over creation, use and preservation to address potential risks posed by proprietary AI tools and technologies to the continuing access and longevity of AI-generated image collections.
Title: Artificial Intelligence, Warfare, and Evidence in International Contexts (CU09)

Lead Researcher: James Lowry

Timeline: March 2024-March 2026

Abstract: Archival and legal concepts of evidentiality in records are constantly tested by developments in information technologies; artificial intelligence presents particular problems for evidentiality because of the responsibility gap created by machine learning. This project considers such problems in the context of warfare, where the use of autonomous weapons systems has called into question the robustness of international law for the regulation of just war and the prosecution of war crimes. What can archival perspectives on record-making in the battlespace bring to our understanding of evidentiality in the context of international jurisprudence? This project is being conducted by the Archival Technologies Lab at the City University of New York as part of the iTrustAI Creation and Use Working Group.

Appraisal and Acquisition

Title: Employing AI for Retention & Disposition in Digital Information and Recordkeeping Systems (DIRS) (AA01)

Lead Researcher: Pat Franks, San Jose State University

Timeline: October 2021-September 2024

Abstract: Not all records that would benefit from storage in a trusted digital repository (or other electronic storage solution) must be preserved indefinitely. We must, therefore, trust not only that our records are being preserved for access as long as necessary but also that those records that must be disposed of can be disposed of according to a defensible retention and disposition schedule. The records management function provides controls related to records disposition following an approved retention schedule. AI may provide the means to dispose of such records accurately and efficiently even if they are stored in trusted digital repositories that were not designed to facilitate disposition. This study will investigate how AI can be used not only to implement but also to create retention schedules, enable litigation controls, provide PII security, and ensure consistency with organization-wide policies and procedures.
Title: Maturity Assessment for Appraisal in the AI Age (AA02)

Lead Researcher: Basma Makhlouf Shabou

Timeline: March 2023-February 2025

Abstract: Assessing the maturity of appraisal processes and tools will allow us to identify the archival, technical, technological, cultural, and strategic barriers and facilitators to effectively applying AI tools to appraisal processes. This study will develop a model to assess appraisal readiness for AI and address the following questions: 1) How defensible are current appraisal decisions? 2) How stable, coherent, and reproducible are appraisal practices? 3) What are the prerequisite conditions of AI integration into a given appraisal practice? 4) How are data, records and archives prepared to be appraised automatically and/or “smartly”? 5) What are the complementary actions to upgrade appraisal practices for the AI facilities?
Title: AI tools supporting records creation and recordkeeping in Brazil and Mexico (03)

Lead Researcher: Alicia Barnard and Claudia Lacombe

Timeline: January 2025-December 2025

Abstract: The literature review carried out in the AA01 study highlighted that little has been written specifically on the use of AI for retention and disposition activities. However, various AI tools are being successfully used to aid other recordkeeping activities, such as classification. Could some of these solutions be used, adapted or improved to support appraisal, for example by determining retention periods and/or disposition requirements? Taking into account that classification impacts archival appraisal and selection activities, it may be relevant to look in more detail at possible activities in records creation and recordkeeping that can later improve the use of AI for appraisal in the Latin American context. Goals: 1- identify whether there are activities in the ongoing creation and recordkeeping processes that may impact records appraisal and disposition; 2- identify whether and what ML or other AI tools are being used to support the creation and recordkeeping processes, such as classification, selection and data protection.

Arrangement and Description

Title: Teachable AI for Arrangement and Description (AD01)

Lead Researcher: Richard Arias Hernandez, University of British Columbia

Timeline: October 2021-September 2023

Abstract: This study aligns with ITrust AI’s overall goal "to design, develop, and leverage Artificial Intelligence to support the ongoing availability and accessibility of trustworthy public records" by focusing on creating lesson plans and educational materials that enable archival students, archivists, and records managers to at least leverage (and possibly "design") Artificial Intelligence to support the ongoing availability and accessibility of trustworthy public records in the areas of archival description and arrangement. This project can be the basis for, or join, a bigger project that focuses on broader curriculum development of AI for Archival Science.
Title: Analyzing Public Data Sets (AD02)

Lead Researcher: Ozgur Kulcu, Hacettepe University

Timeline: October 2021-September 2023

Abstract: This study will explore how to analyze archival contents and what meaningful results can be obtained by using AI technologies including data mining and machine learning. The study will use digital archival content produced by different public institutions in Turkey. Topics to be investigated include the support that unstructured big data archives can provide for institutional decision-making, detection of missing and incorrect information in data archives, automatic classification, topic generation and subject detection, and automatic process management.
Title: AI in the Middle ages-Arrangement of ancient documents via appearance-based recognition (AD03)

Lead Researcher: Benedetto Luigi Compagnoni

Timeline: 2021-2026

Abstract: Interest in applying Artificial Intelligence to image data analysis is growing, and scientists are increasingly using it as a powerful, complex tool for statistical inference. Computer-based image analysis provides an objective method of scoring visual content independent of subjective manual interpretation, while potentially being more sensitive, consistent and accurate. This study will test AI tools for appearance-based recognition of the "signum tabellionis" of ancient parchments and documents. The approach, based on neural networks, aims to reduce manual annotation while using the manual annotation that remains as a form of continuous learning. The whole system needs manual tagging of large training data. All manually verified data will be used for continuous learning and will be maintained as training datasets. A deep neural network based on an object detector will be used to recognise the "signum tabellionis" of the parchments. This system concerns not only the recognition and classification of the objects present in the images, but also the location of each of them. The study expands the implementation of AI in archival science with a method that could be reproduced by many other Archives and for different types of documents.
Title: Intensional logic-based AI for the Records in Contexts Conceptual Model (RiC-CM) and Ontology (RiC-O) (AD04)

Lead Researcher: Hugolin Bergier

Timeline: TBD

Abstract: The purpose of this study is to establish mutual understanding between the archival and AI fields within the context of archival arrangement and description. The team will identify specific AI technologies that can address critical archival challenges: the research will apply an enriched intensional formalism using logic-based AI to analyze and enrich the Records in Contexts Ontology (RiC-O).
Title: Tatuoca Magnetic Observatory: from paper to intelligent bytes (AD05)

Lead Researcher: Cristian Berrio Zapata

Timeline: April 2022-March 2026

Abstract: Tatuoca Magnetic Observatory (TTB) is located in the state of Pará, in the Brazilian Amazon, and is one of the few observatories of its kind on the equator. It began in 1933, when the International Polar Year Commission subsidized the installation of magnetographs on the island of Tatuoca, in response to the requirements of the Brazilian National Observatory. In 1957, the Brazilian government built a permanent observatory on the island that generated data records on paper until 2007, when reporting became digital. This analogue collection represents the scientific patrimony of the Brazilian Amazon and vital information about the magnetic field of the earth, captured in the tropical zone. The documentary corpus includes approximately 38,656 pages, divided between magnetograms (A2 sheets of Kodak photographic paper containing printed curves representing magnetic registers) and tables with daily reports of magnetic field values (A4 sheets on cellulose paper). The first stages of the project involve securing and digitizing this analogue corpus, and the third stage will model and apply AI tools to extract data from the document images.
Title: Artificial Intelligence Applied to Cultural Heritage: Automatic Construction and Open Source Use of Trustworthy Knowledge Graph of Ancient Chinese and Italian Medical Texts (AD06)

Lead Researcher: Emanuele Frontoni

Timeline: 2023-2026

Abstract: Both China and Italy emphasize the conservation, protection, and use of cultural heritage. Yet there are many issues with the existing methods for the use of cultural heritage, e.g., information islands, knowledge fragmentation, and the lack of cross-language service. Knowledge Graph is a foundational technology for Artificial Intelligence (AI), and it has been used in many fields for various purposes, including the protection of cultural heritage. This study, conducted by researchers at the University of Macerata in Italy and Tianjin Normal University in China, aims to collaboratively perform text mining and semantic network analysis on Chinese and Italian ancient medical texts, based on a closer reexamination of the concepts, theories, methodologies, and technologies underlying a trustworthy, explainable, and open Knowledge Graph oriented toward users’ cognition and language. More specifically, on the basis of the concept of trustworthiness and deep learning technology, and for the dual purposes of pandemic prevention and regimens, this project will: first, investigate the underlying generation mechanism of Knowledge Graph from the perspective of linguistic lexical chains and user cognitive “attention” theory; then, mine knowledge within ancient Chinese and Italian medical texts and establish their networks by identifying and extracting entities and entity relationships in these texts and automatically linking them to knowledge bases such as BabelNet (or DBpedia), Thesaurus of Chinese Medicine, etc.; and finally, aggregate relevant knowledge, establish semantic equivalents among medical terminologies in Chinese, Italian and English, and develop a relevant open source medical terminology service for public use. It is expected that this study will provide better technological support for the protection and use of ancient cultural heritage and for cultural exchange between the East and West.

Retention and Preservation

Title: Identification of critical archival challenges which are the best candidates for improvement by AI technologies in the context of retention and preservation of digital records (RP01)

Lead Researcher: Hrvoje Stancic, University of Zagreb

Timeline: October 2021-May 2022

Abstract: This study aims to identify critical archival challenges in the context of retention and preservation of digital records. The issues arising from the implementation of OAIS-based and other digital archive solutions will be investigated. For example, the research will look for repetitive tasks, tasks that involve large quantities of digital records, and other tasks that are the best candidates for improvement by AI technologies. The research will also aim to identify challenges arising from digital preservation risks. Once the critical challenges are recognized, the specific factors within them will be identified and mapped, and ways to address them with AI technologies will be proposed. The study has two research objectives: 1. identify critical challenges to be addressed by AI, and 2. identify within each critical challenge the specific factors to be addressed and how AI might address them.
Title: Enterprise Master Data Management and the Role of Metadata (RP02)

Lead Researcher: Alex Richmond, Bank of Canada

Timeline: 2021-2026

Abstract: As many public agencies are investing in research and infrastructure to advance their data and analytic capabilities, the challenge of mastering enterprise data has surfaced as a key pain point. By combining domain expertise from archival science, specifically descriptive standards and the use of metadata, with recent thinking in data warehouse and data lake architectures and object modeling, the Bank of Canada is proposing a research study centered around a proof of concept at the Bank using the Legal Entity Identifier standard to create an enterprise master data set for financial institutions. In this study we will be looking at various AI technologies that can contribute to the utilization of metadata in the development and maintenance of enterprise master data sets. Further, we will be looking at virtual graph technologies to link various data sets and information assets to increase their accessibility and reusability by stakeholders. Finally, we aim to produce a set of best practices and guidelines for the application of metadata in the creation and maintenance of enterprise master data, which will aid in their findability, access and preservation over time.
Title: Preserving AI Techniques as Paradata (RP04)

Lead Researcher: Pat Franks, San Jose State University and Babak Hamidzadeh, University of Maryland

Timeline: October 2021-September 2024

Abstract: If an AI technique is used to facilitate or automate an archival, recordkeeping, or other process, how much of that AI technique, its code, the data (probably a subset of existing records) we use to train it, test cases and test results to examine its efficacy, its parameters and their values at or over the time of application, the technical environment in which it is executed, and the records it (the AI technique) is applied to for automation purposes, should be preserved? The aim is not to preserve AI techniques for their own sake, but to preserve them as contextual materials/information in support of preserving the records they are applied to. As such, are they preserved as part of procedural context, technological context, a combination of the two, or other contexts? How do we preserve the pieces that constitute the AI technique (code, training data, test cases, parameters, etc.)? How reproducible should what we preserve be? If there is non-determinism or randomness in any of these AI techniques, how do we identify, characterize, and preserve it? If there is/are human(s) intertwined with the AI technique in the decision-making process, how is the human’s role and his/her relationship with the AI technique captured and preserved? This study will explore these questions, gathering data about the present state of practice, and proposing best practices and solutions.
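
To make the list of candidate components concrete, here is a minimal sketch of a manifest that records fixity values for the code and training data, the parameter values, the random seed, and the execution environment. The field names and the `build_manifest` helper are hypothetical. The study itself does not prescribe a schema, and this is only one possible shape for such a record.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field
from typing import Optional


def _sha256(data: bytes) -> str:
    """Return a hex digest usable as a fixity value."""
    return hashlib.sha256(data).hexdigest()


def _describe_environment() -> dict:
    """Capture a small, illustrative slice of the technical environment."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


@dataclass
class ParadataManifest:
    # All field names are assumptions made for this illustration.
    technique_name: str
    code_sha256: str
    training_data_sha256: str
    parameters: dict
    random_seed: Optional[int]
    environment: dict = field(default_factory=_describe_environment)


def build_manifest(technique_name: str, code: bytes, training_data: bytes,
                   parameters: dict, random_seed: Optional[int] = None) -> str:
    """Serialize the manifest as JSON with sorted keys for stable output."""
    manifest = ParadataManifest(
        technique_name=technique_name,
        code_sha256=_sha256(code),
        training_data_sha256=_sha256(training_data),
        parameters=parameters,
        random_seed=random_seed,
    )
    return json.dumps(asdict(manifest), indent=2, sort_keys=True)
```

Recording the seed addresses only one source of non-determinism. Hardware, library versions, and human intervention would need their own documentation.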
Title: Feasibility study: Creating a web archives in universities in Iran and South Africa (RP05)

Lead Researcher: Amir Reza Asnafi

Timeline: March 2022-2026

Abstract: Websites are powerful tools for sharing information, and important to the history and identity of an organization or individual. It is important, therefore, to be able to archive essential and valuable information from websites. Universities have an increasing volume of digital content online, including key information available only on the Internet, and necessary measures must be taken to store and protect that content against the risks of web instability. This study examines the feasibility of integrating the web space and preserving the intellectual and cultural heritage, as well as meeting the needs of researchers and the sustainability of information in the digital environment, in the top 20 universities in each of Iran and South Africa (Webometrics Ranking) using AI tools. The study will incorporate patterns of classification, regression and deep learning to address: 1. the refresh rate of the web pages of these universities, 2. recommendations to users when searching, 3. website content storage, 4. intelligent ranking of results. Applications of this research include: • Determining the status of universities in terms of feasibility of creating a web archive, • Preserving intellectual heritage, • Meeting the needs of researchers and the user community, • Creating an integrated environment on the websites of top universities using artificial intelligence components, • Ensuring sustainability of information in the digital environment and continuous access to it.
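
The refresh rate mentioned above can be approximated without any learning component by comparing content hashes of successive captures. The sketch below is a simple baseline, not the classification or regression approach the study proposes. It assumes snapshots arrive as hypothetical `(datetime, page_text)` pairs.

```python
import hashlib
from datetime import datetime
from typing import List, Tuple


def estimate_changes_per_day(snapshots: List[Tuple[datetime, str]]) -> float:
    """Estimate how often a page changes from timestamped captures.

    Counts the captures whose content hash differs from the previous one
    and divides by the observed time span in days.
    """
    ordered = sorted(snapshots, key=lambda snap: snap[0])
    if len(ordered) < 2:
        return 0.0
    digests = [hashlib.sha256(text.encode("utf-8")).hexdigest()
               for _, text in ordered]
    changes = sum(a != b for a, b in zip(digests, digests[1:]))
    span_days = (ordered[-1][0] - ordered[0][0]).total_seconds() / 86400
    return changes / span_days if span_days > 0 else 0.0
```

Because changes that occur between two captures go unseen, the result is a lower bound on the true refresh rate.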
Title: Language identification for digital preservation workflows (RP06)

Lead Researcher: Muhammad Abdul-Mageed, Justin Simpson

Timeline: September 2023-April 2024

Abstract: A common problem for archivists and digital preservationists is identifying the language(s) of records in a corpus of material designated for preservation, description, and access. This study will review any existing language identification tools and investigate their appropriateness to be incorporated into an archival or preservation workflow to enable preservation, description, and access to archival materials, and develop tools as required.
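
For readers unfamiliar with the task, the following toy sketch shows the basic idea behind language identification by counting common function words. The word lists and the `identify_language` function are assumptions made for illustration. Production tools typically rely on character n-gram or neural models and cover far more languages than this.

```python
import re

# Tiny, hand-picked stopword lists; illustrative only.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "for"},
    "french": {"le", "la", "les", "et", "des", "est", "dans", "pour"},
    "spanish": {"el", "los", "y", "es", "en", "para", "que", "las"},
}


def identify_language(text, min_hits=2):
    """Return the language whose stopwords occur most often, or None.

    None is returned when fewer than `min_hits` stopwords match, which
    flags the record for manual review rather than guessing.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None
```

Returning `None` below a threshold mirrors a common workflow choice: uncertain records are routed to a person instead of being labeled automatically.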

Management and Administration

Title: Comparative assessment of ethical codes of the archival/records management and artificial intelligence communities (MA01)

Lead Researcher: Jim Suderman

Timeline: 2021-2025

Abstract: This study is based on the archival concept of authentic records being reliable evidence of the past. Archival ethics focus on the application of archival concepts and principles in ethical ways so that archivists and archival organizations are trusted to preserve authentic records and make them available to users in the context in which the records were created. As AI technologies are increasingly used by records professionals and archives, archival and AI ethical issues overlap and intersect. The study asks the questions: 1. What similarities and differences exist between representative ethical codes of the two communities? 2. In what ways might or should the ethical codes of each community influence the other? It will complement the Ethical Guidelines for AI use in Private Sector Organizations (MA03).
Title: AI literacy for record management and Archives (MA02)

Lead Researcher: Moises Rockembach, Universidade Federal do Rio Grande do Sul

Timeline: October 2021-April 2023

Abstract: The uses of artificial intelligence to support records management activities involve not only the development of AI applications, but human-machine interaction. If AI can help us with records management, we need to develop AI literacy to work together and adopt new solutions. How can we develop an AI literacy in the context of records management and archives? This study will identify competencies for critical AI evaluation; analyze the digital transition scenario, and the impacts on labor dynamics; identify the challenges involving communication/interaction between humans and AI; and propose ways to engage in AI-based records management and archives solutions.
Title: The Development of Ethical Guidelines for AI use with Records - SUSPENDED (MA03)

Lead Researcher: Mia Steinberg, Collabware

Timeline: September 2021-September 2023

Abstract: Organizations, especially in the tech sector, will make use of AI for analysis and data comprehension. It would be prudent to have an established set of guidelines for these organizations to use to create in-house ethics review committees and ensure that their AI use is responsible and transparent. The objective of this study is two-fold. First, we seek to identify and create a set of reasonable and prudent practices for the development of an ethics review process within an organization. Once established, we will create a meaningful plan on how these processes would be applied. Therefore, the objectives cover both the procedure of creating an ethics review process as well as its outcome. Records should be authentic evidence and their use and access are already governed by ethical guidelines within the archival profession; this study extends that ethical framework to focus specifically on the application of AI to records, and brings a multidisciplinary approach that gives consideration to the complexities of the technology. This study complements the Comparative assessment of ethical codes of the archival/records management and artificial intelligence communities (MA01).
Title: The Role of Records and RM in Environments Where Trustworthy AI is the Focus (MA04)

Lead Researcher: Sherry Xie, Renmin University of China

Timeline: 2021-2026

Abstract: This study explores the meaning of “trustworthy artificial intelligence” (https://digital-strategy.ec.europa.eu/en/policies/artificial-intelligence) and the roles of digital records and records management in the EU's AI strategy for building trustworthy AI. It will assess the EU's "trustworthy artificial intelligence" initiatives with current developments in (digital) RM in the context of Asia, North America, and the G20. If current (digital) RM developments are not aligned with or represented in the EU's initiative, the study team will establish recommendations on next steps for the RM profession.
Title: Personal Information Content Assessment (MA05)

Lead Researcher: Jim Suderman

Timeline: October 2021-December 2022

Abstract: Privacy protection is a central legal responsibility of public sector organizations. For some such organizations privacy protection is already a risk-based process. Semi-structured digital records, e.g., email, present a significant challenge to assessing risks of privacy breaches because every record must be reviewed for personal information before it can be made generally accessible. Confidence and trust in public sector organizations will be improved if a robust, AI-supported means to assess the scope, type, and location of personal information can help the human experts charged with protecting privacy focus their attention where the risks are highest. This grounds the study not in one or more specific archival principles but in the role of archival organizations and archivists as trusted preservers. To continue to be trusted, archivists, including those managing archival collections, need to be aware of the information in their collection, their legal obligations for administering it, and the ethical and moral responsibilities to assess how changing contexts can affect their responsibilities.
Title: ITrust AI Metastudy (MA06)

Lead Researcher: Ken Thibodeau

Timeline: June 2022-March 2026

Abstract: This study is a review of ITrust-AI proposed and approved studies to identify what AI methods and products they use in relation to which archival purposes. On this basis, a lessons-learned report will be produced summarizing the successes and problems encountered in the use of AI in different archival functions. If supported, the results could be used to generate recommendations regarding AI use in archives.
Title: Metadata and Metadata Models in the AI Context (MA07)

Lead Researcher: Joe Tennis

Timeline: August 2022-October 2024

Abstract: This study will review the InterPARES Authenticity Metadata (IPAM) for sufficiency to be used to describe records generated by AI, and investigate, in the context of artificial intelligence systems, what are appropriate models of record creation, preservation, and use, that distinguish between identity and integrity metadata and other forms of documentation.
Title: Representing Archival Record Sets for Machine Learning Experts (MA08)

Lead Researcher: Pat Moore, Isto Huvila

Timeline: January 2023-December 2024

Abstract: This study will investigate how metadata documented according to archival metadata standards map to the information needs of the machine learning community (differences, similarities, opportunities to improve their compatibility, caveats). It will explore what major epistemic or ontological differences exist between the archival and machine learning communities (i.e., how things are known differently, how the same terms are used for different concepts, what concepts have primacy in different fields, etc.).
Title: Use of AI Tools in Local Governments and its Implications on Public Records Management (MA09)

Lead Researcher: Yo Hashimoto

Timeline: March 2023-March 2025

Abstract: Japanese law uses the term “administrative documents” to designate official records. It is often narrowly construed, leaving records created by new technologies out of the official recordkeeping system. In such a recordkeeping context, AI tools are being introduced in local governments for a variety of services to achieve higher operational efficiency, often without considering their implications for the management of public records. The aim of this study is threefold. Firstly, it examines what kinds of tools using what kinds of AI technologies are being deployed in local governments and what kinds of data and information such tools produce in the conduct of official business. Secondly, it identifies the records created by AI tools and verifies if they are being managed as “administrative documents.” Lastly, it explores the implications of the use of AI tools for public records management in the present Japanese legal and administrative framework.
Title: Competencies from training programs to market needs (MA10)

Lead Researcher: Anahí Casadesús de Mingo

Timeline: October 2023-March 2026

Abstract: The study will analyze whether or not there are new competencies required for the archivist and records managers due to the emergence of AI technologies and their use in the field. The study will also analyze if market needs are being satisfied by new information professionals from different education programs in different countries. Ultimately this will provide information to evaluate the need for new competencies in these programs.

Reference and Access

Title: Case Study on Extraction and Identification of Records containing Personal Data and Sensitive Personal Data for Long Term Preservation (RA02)

Lead Researcher: Alicia Barnard

Timeline: January 2022-June 2023

Abstract: This study aims to develop an algorithm for recognizing and extracting unstructured information that is personal data and sensitive personal data in digitized records (PDF with OCR) by applying artificial intelligence (AI) techniques, in particular machine learning (ML) algorithms, and to look for possible requirements for, or equivalents of, trustworthiness (accuracy, reliability and authenticity) in the resulting AI and ML product.
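
A rule-based baseline gives a sense of what such extraction involves before any machine learning is applied. The sketch below scans OCR text for two pattern types, email addresses and numeric dates. The patterns and the `find_personal_data` function are assumptions made for illustration and are not the algorithm the study sets out to develop.

```python
import re

# Deliberately simple patterns; real records need many more categories.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_personal_data(text):
    """Return (label, matched_text, start_offset) tuples in document order."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])
```

Patterns like these miss names, addresses, and most sensitive categories, and OCR errors reduce recall further. That gap is one reason a learned model is of interest here.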
Title: AI-Assisted Digitization of Archives and Documentary Heritage Materials (RA03)

Lead Researcher: Eng Sengsavang, UNESCO

Timeline: 2022-2026

Abstract: This study will explore the following questions: ● What key archival functions and best practices are carried out in effective digitization projects? ● What AI-based tools are currently being developed and/or used by practitioners and vendors for digitization activities? ● What digitization projects have been implemented, completed, or are in progress that have used AI technologies? ● What are the benefits and risks, limitations and potential biases when using such AI technologies in digitization projects? ● What AI-based tools might be developed in the future, particularly to assist archival functions during the digitization process (pre-digitization, digitization, post-digitization), and particularly solutions that are low-cost or less resource-intensive? The study will accomplish the above by modeling key archival functions carried out during digitization projects that may benefit from AI technologies, whether already developed, emerging, or to be developed in the future. The study will also explore potential low-cost AI solutions, recognizing that digitization is a resource-intensive process, and that resource-strapped organizations, groups, and least developed countries in particular face barriers to digitization.
Title: Gamification of archival experience for users (RA04)

Lead Researcher: Demet Soylu, Hacettepe University

Timeline: 2021-2026

Abstract: This study explores problems related to the user experience during the online access and retrieval process in digital archives. It challenges the existing traditional approach to the access and retrieval process utilized within archives and aims to promote a user-centric approach to online access and retrieval. The study also aims to enable the easy facilitation of archival services for key target group(s) as identified through other ITrust AI research studies. The focus of this study is to improve the technical features related to online access and retrieval of archival records through the synthesis of gamification as a component of machine learning within the AI context.
Title: User approaches and behaviours in accessing records and archives in the perspective of AI-A global user study (RA05)

Lead Researcher: Pierluigi Feliciati

Timeline: October 2021-March 2023

Abstract: This preliminary user study aims to bring the users' perspective to support the definition of requirements and guidelines for developing trustworthy, valuable, and viable AI tools to improve archival reference and access. Data is lacking internationally on the actual UX of accessing records and archives. In the absence of shared protocols and metrics, studies of users' behaviour and satisfaction (quality of access) are conducted at the “local” level, limited to specific services. How much do we know about how users perform their research? Do they use personal names? Places? Dates? Functions? Subjects? Are they comfortable with the language of interfaces and records? Are they willing to share their research data to improve archival services? Data collected from a sample of end users could reveal more about satisfaction with existing digital archival reference and access services (access to digital records + access to digitized records and documents + access to digital finding aids and reference and access tools), and the actual expectations and concerns about the application of AI to reference and access for archival records. Finally, this study provides an opportunity to better define the reference and access function in an AI context (articulating it in activities, processes, and quality indicators).
Title: Privacy in Digital Records Containing Personally Identifiable Information (PII)-An Exploration of Current Status and the potential of AI Tools and Techniques (RA06)

Lead Researcher: Georg Gaenser, European Free Trade Association

Timeline: 2021-2026

Abstract: This study explores a key barrier (i.e., risk to privacy) to providing open access to digital archival records. It will investigate how archival institutions are protecting privacy in digital records containing PII when providing access to them, how AI tools and techniques could help address the challenges faced by archival institutions in providing access to these kinds of records, and what the implications are of using AI tools and techniques to deal with privacy issues in records. Specific objectives will be: to identify the main needs/obligations of archival institutions concerning the protection of privacy in the provision of access to digital records (derived from legal/regulatory requirements, institutional policies, standards, social expectations, etc.); to describe current approaches, processes, techniques, and/or tools used by archival institutions to protect PII in digital records when providing access to them; to identify gaps between the needs/obligations that archival institutions face regarding the protection of privacy in digital records and the current approaches, processes, techniques, and tools being used to provide access to them; to identify AI tools and/or techniques that could help archival institutions to comply with regulatory/institutional/social requirements when providing access to records; and to explore uses, issues, and challenges associated with the use of the identified AI tools and techniques in the archival field, as well as new tools that could be developed to that end.
Title: Research on the process of declassifying personal information using AI tools - Israel State Archives case study (RA07)

Lead Researcher: Silvia Schenkolewski-Kroll, Bar Ilan University, and Assaf Tractinsky, Israel State Archives

Timeline: November 2021-November 2023

Abstract: Archivists and records managers around the world deal with the process of declassifying information in paper and digital form, a process made difficult mainly by the quantities involved and the time invested. The problem grows when digital information needs to be declassified, due among other things to its large quantities and its structure. This study proposes to explore the nature of declassification in the paper and digital environments to identify and develop guidelines for future use of AI tools for declassification. The main purpose of the study is to review the literature and practice in archival institutions and the process of declassification in the paper and digital environments, and to create a framework and guidelines that will serve as a basis for future declassification using AI tools.
Title: Increasing Access to Photos, Videos and Social Media records through AI-generated Descriptive Metadata (RA01)

Lead Researcher: Adam Jansen

Timeline: January 2022-December 2025

Abstract: The study explores how AI and Machine Learning can be used to increase discoverability and improve access to non-textual records, such as digital photographs, films and videos, and social media posts. The study explores how AI can be used to analyze a digital photograph and add searchable descriptive elements that the record creator may not have supplied. The study will determine to what extent AI can improve user access to the content of films and videos - and may also help determine how AI can improve archival description of films and videos - through: automatic speech recognition; machine translation of film dialogue; automated time-coding of scenes; multimodal video summarization. The study also explores how AI and Machine Learning can be used to increase discoverability of social media posts through content and semantic analysis of text, images, and videos.
Title: Trusted access/use of archives and AI: a conceptual model (RA08)

Lead Researcher: Pierluigi Feliciati, University of Macerata

Timeline: October 2023-April 2026

Abstract: Existing theoretical and functional models and ontologies do not conceptualize access/use of records and archives, taking for granted that it is just the “right or permission to find, retrieve, or use documents or information”. Nevertheless, the quality of access/use depends on many factors and other archival functions: what is accessible, by whom, how and when. The adoption of AI tools to create, appraise, manage, preserve and describe trustworthy records and archives increases the necessity of a conceptual model where entities, relations and functions involved in access/use procedures are correctly conceptualized. This study will consider all the existing models and reference documents on records and archives creation, management, appraisal, preservation, and description, such as OAIS, RiC-O, ISO 15489, ISO 23081, InterPARES ontologies, the InterPARES Terminology database, the ICA Code of Ethics, the ICA Principles of Access, MoReq, and generally accepted accessibility and usability models. Moreover, the results of the first half of ITrust AI project studies (Ethics, Paradata, Information intersections among functions, gaming, enhanced access to multimedia, etc.) will be considered with a view to extending the existing standards, sharpening the principles related to access, and finally refining and releasing a robust conceptual model.
Title: Exploring Possible Uses of AI to Support Research in Archives (RA09)

Lead Researcher: Ken Thibodeau

Timeline: July 2024-June 2026

Abstract: The general question addressed in this research is: are there ways that AI can enrich possibilities for using archives as sources of information in research? The research will consider both actual and possible use of different AI approaches. The study aims to put archival institutions in a better position to help researchers to find records and record sets relevant to their interests and to define, find, extract — and possibly to generate — data responsive to their research questions. The difference between data extraction and data generation is that extraction maps to data types that archival institutions use, such as those defined in descriptive standards, whereas data generation extends to data types that are not needed or relevant in the performance of archival functions but can be obtained from digital archives through application of AI.

General Studies

Title: AI Tutorials (GS01)

Lead Researcher: Muhammad Abdul-Mageed

Timeline: 2021-2026

Abstract: This AI and ML Tutorial Repository (https://github.com/UBC-NLP/itrustai-tutorials) houses tutorials created for the iTrustAI SSHRC Partnership Grant. These tutorials will grow over time and will be used in hands-on workshops for training purposes for archivists and records professionals, as well as anyone interested in learning about different AI tools and techniques. These will include Natural Language Processing (NLP) and its core tasks (Part of Speech (POS) Tagging, Named Entity Recognition (NER), Text Classification, Machine Translation); Speech Processing; Image Processing; Practical Machine Learning.
Title: Terminology Database (GS02)

Lead Researcher: Basma Makhlouf Shabou

Timeline: June 2021-March 2026

Abstract: Building on the terminology database developed in InterPARES Trust, this study will research and add terms from Artificial Intelligence and Machine Learning that are relevant to archives and records professionals.
Title: AI in records and archives management in China: Interaction and Co-evolution (GS03)

Lead Researcher: Weimei Pan and Linqing Ma

Timeline: May 2023-December 2024

Abstract: This study will examine the interaction between AI and records and archives management in China, and research how AI is being used currently for records and archives management work, how AI can be used, how the use of AI will transform the Chinese archival field, including its theories, methodologies, and practices, how theories and methodologies in the archival field can contribute to the development of AI technology, and if and how the use of AI accelerates the digital transition. The study will also examine if AI technologies have the capabilities to transcend existing disputes in the field or if the successful use and adoption of AI require good records and archives management in the first place.
Title: Exploratory Study on Veracity and Truth Discovery of Multimedia Data in the Era of Generative AI (GS04)

Lead Researcher: Michel Barbeau, Carleton University

Timeline: TBD

Abstract: Generative AI leverages artificial intelligence to produce synthetic multimedia documents such as text, images, audio and video. While multimedia generally encompasses video and audio content, mulsemedia (multiple sensorial media) also includes haptic, gustatory, and olfactory media, and media involving more than two senses. Massive production of AI-generated multimedia and mulsemedia data is expected in the upcoming years. Veracity and truth assessment tools such as detectors of AI-generated content will become increasingly critical for archivists and users of archives in the current and evolving context of digital data. The field of archives currently lacks the tools to assess vast volumes of multimedia data. This study will explore various aspects of data veracity, truth discovery of multimedia data, and the challenges emerging due to the novel AI generative techniques, and identify promising research directions and approaches. The results of this study are expected to have implications for software vendors, archives, and others who must work with large volumes of multimedia data where veracity is a concern.
Title: Traditional AI in Records Creation through Acquisition (GS05)

Lead Researcher: Fred Cohen

Timeline: March 2024-March 2026

Abstract: This study will examine traditional AI currently and historically used in archives, identify areas within creation through acquisition where traditional AI is not yet used and reasonably could be used, and work with an archives to prototype and demonstrate the use of previously unused methods in applications where time savings, cost savings, or other advantages of traditional AI to the archives could be exploited.