
Scientific LargeScale Infrastructure for Computing/Communication Experimental Studies – Starting Community

FAIR Maturity and Reproducibility in the focus of RDA20 Plenary 21-23 March 2023 by Yuri Demchenko

What is here for SLICES-RI?

Last week, on 21-23 March 2023, the Research Data Alliance (RDA) held its 20th Plenary[1] in Gothenburg. The Plenary is the biannual meeting of the RDA community. Since its foundation in 2013, RDA has become a primary forum for research data management experts and practitioners to share experiences and establish best practices in sharing research data. Many initiatives related to Open Science and research data sharing originated in RDA and are now widely adopted by the research community and supported by industry. These include, but are not limited to, metadata definitions and metadata registries, trusted data repositories, FAIR data principles, and best practices in different scientific domains.

The Scientific Large-scale Infrastructure for Computing/Communication Experimental Studies (SLICES) is a distributed Digital Infrastructure designed to support large-scale experimental research focused on networking protocols, radio technologies, services, data collection, parallel and distributed computing and, in particular, cloud and edge-based computing architectures and services. Experimental research automation and reproducibility, together with the supporting data management infrastructure, are topics of special interest for SLICES-RI and are actively developed by the multi-disciplinary RDA community.

The RDA20 Plenary featured a number of plenary sessions that provided a venue for wide community discussion of RDA's achievements and experience after 10 years of existence, and of its vision for the future of science and research data sharing over the next five years (2024-2028). The main topics of the plenary sessions were: changes in the research data sharing landscape over the last 10 years and RDA's contribution; further developments in seamless interoperability between data, software, and high performance computing as a vision for future science; and the capacity of Open Science and research data sharing to solve the grand challenges of society. The plenary discussion also covered a new framework, “The Value of the Research Data Alliance to Industry”, released in December 2022, which aims to accelerate the partnership between RDA and industry for the benefit of the global data community.

A number of breakout meetings were focused on technical and infrastructure aspects of implementing FAIR data principles in different scientific and application domains. A short summary and references are provided below[2].

FAIR data principles implementation and maturity

The following breakout sessions took place related to FAIR data principles implementation and maturity:

1) The Way to FAIR: from data collection to citation[3] (organised by FAIR Data Maturity Model WG and Persistent Identification of Instruments WG)

This session discussed the application of FAIR principles across the whole data lifecycle, starting from data collection from different data sources, including sensor/IoT networks and instruments, and continuing through to data publication and citation. To enable this, RDA proposed persistent identifiers for instruments (for details, refer to the Persistent Identification of Instruments WG wikipage).

This work is a part of the FAIR Data Maturity Model WG that develops a common set of core assessment criteria for FAIRness and a generic and expandable self-assessment model for measuring the maturity level of a dataset in combination with the maturity of the data management operator and supporting infrastructure. The prospective recommendations should increase the coherence and interoperability of existing and emerging FAIR assessment frameworks.

2) Defining the roadmap towards FAIR for Machine Learning[4] (organized by FAIR for Machine Learning (FAIR4ML) IG)

The FAIR for Machine Learning IG is at the stage of building its community and setting up its activities. The goal of the session was to discuss where FAIR should apply in the ML data pipeline, taking into account the work of other working groups and identifying which existing solutions can be used at different pipeline stages.

Data Management Planning

3) Data Management Planning: where are we and where do we want to be?[5] (organized by Active Data Management Plans IG, DMP Common Standards WG, Discipline-specific Guidance for Data Management Plans WG)

In most projects, research data management plans (DMPs) created at the proposal stage do not evolve, and their implementation is not monitored. The Active Data Management Plans IG develops recommendations for the active DMP, which is defined at the planning stage and evolves through the entire dataset and project lifecycle, ensuring that data is appropriately managed, archived, preserved and available for re-use. This also led to the introduction of the machine-readable and machine-actionable DMP format, referred to as mrDMP or maDMP.

The IG and the meeting focused on constructing DMPs from templates in XML or JSON for machine readability. However, machine actionability was not specifically discussed. It may include DMP construction for derivative datasets, DMP monitoring, and implementation assessment. For SLICES-RI, such aspects or actions can be identified for DMPs specified for the whole RI, at the organisational level, at the experiment level, and for the special case of provisioning a Virtual Research Environment (VRE) on demand.
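The difference between machine readability and machine actionability can be illustrated with a minimal sketch. It assumes a simplified, hypothetical DMP structure: the field names (`title`, `level`, `datasets`, `license`, `repository`) and the helper `unmet_requirements` are invented for illustration and do not reproduce any RDA schema or template.

```python
import json

# Hypothetical, simplified DMP record. Field names are illustrative only
# and are not taken from an official RDA template or schema.
dmp = {
    "title": "Example experiment DMP",
    "level": "experiment",  # e.g. RI, organisation, experiment, VRE
    "datasets": [
        {"name": "raw-measurements", "license": "CC-BY-4.0", "repository": "example-repo"},
        {"name": "derived-plots", "license": None, "repository": None},
    ],
}

# Assumed policy for this sketch: every dataset must name a license and a repository.
REQUIRED_FIELDS = ("license", "repository")


def unmet_requirements(plan):
    """Return {dataset name: [missing fields]} for datasets lacking required fields."""
    issues = {}
    for dataset in plan["datasets"]:
        missing = [f for f in REQUIRED_FIELDS if not dataset.get(f)]
        if missing:
            issues[dataset["name"]] = missing
    return issues


# Machine readable: the plan round-trips through JSON without loss.
serialized = json.dumps(dmp, indent=2)

# Machine actionable: a program can monitor the plan and report gaps.
report = unmet_requirements(json.loads(serialized))
print(report)  # {'derived-plots': ['license', 'repository']}
```

A check of this kind could run whenever a derivative dataset is added to the plan, which is one possible reading of DMP monitoring and implementation assessment.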

 

Research Reproducibility

4) BoF: Computational Reproducibility: What’s Next for RDA?[6] (organized by CURE-FAIR WG)

RDA has been addressing research reproducibility from the very beginning. This work was coordinated by the CURE-FAIR WG (Curating for FAIR and Reproducible Research), now a historical group, which aimed to establish standards-based guidelines for curating reproducible and FAIR data and code. The WG produced the guidelines “10 Things for Curating Reproducible and FAIR Research”[7], which provide a wide view of research reproducibility and cover Completeness, Organization, Economy, Transparency, Documentation, Access, Provenance, Metadata, Automation, and Review. The work was done in coordination with the work on FAIR and metadata for scientific software.

The session on computational reproducibility discussed RDA's possible contribution, whether there is a need to coordinate efforts around computational reproducibility at RDA, and whether the RDA Reproducibility IG should be restarted. The session hosted presentations on existing reproducibility frameworks and initiatives, including the Association for Computing Machinery (ACM) EIG on Reproducibility; Reproducibility Networks; ReScience journals; the Turing Way; Open Science communities; PLOS; and cascad.

FAIR Digital Object and PID

This topic included three inter-related meetings that presented the concepts of the FAIR Digital Object, the Open Science Graph, and Persistent Identifiers as infrastructure components for data sharing and discovery.

5) IG FAIR Digital Object Fabric: Discussing the FAIR DO Concept[8] (organized by FAIR Digital Object Fabric IG)

FAIR Digital Objects (FAIR DOs) are a concept of virtual data objects that has been developed and used by RDA in various Working and Interest Groups. FAIR DOs may represent data, software, or other research resources. They are uniquely identified by a Persistent Identifier (PID) and described by metadata rich enough to enable them to be reliably found, used and cited.
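The idea of an object that is identified by a PID and carries enough metadata to be found and cited can be sketched as follows. This is an illustration only, not an FDO specification: the class names, the metadata keys (`creator`, `year`, `title`) and the PID value are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Illustrative stand-in for a FAIR DO: a PID plus descriptive metadata."""
    pid: str   # persistent identifier (placeholder values are used below)
    kind: str  # e.g. "data", "software"
    metadata: dict = field(default_factory=dict)


class Registry:
    """Toy in-memory resolver that maps each PID to exactly one object."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        # A PID must identify one object only, so duplicates are rejected.
        if obj.pid in self._objects:
            raise ValueError(f"PID already registered: {obj.pid}")
        self._objects[obj.pid] = obj

    def resolve(self, pid):
        return self._objects[pid]


def citation(obj):
    """Build a citation string from the (assumed) metadata keys and the PID."""
    m = obj.metadata
    return f'{m["creator"]} ({m["year"]}). {m["title"]}. {obj.pid}'


registry = Registry()
registry.register(
    DigitalObject(
        pid="hdl:00.0000/example-1",  # placeholder, not a real handle
        kind="data",
        metadata={"creator": "Example Lab", "year": 2023, "title": "Example dataset"},
    )
)
print(citation(registry.resolve("hdl:00.0000/example-1")))
# Example Lab (2023). Example dataset. hdl:00.0000/example-1
```

In a real infrastructure, resolution would be delegated to a PID service rather than an in-memory dictionary; the sketch only shows why the PID and the metadata are both needed for citation.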

FAIR DOs are currently discussed world-wide, in many RDA groups, in the European Open Science Cloud, the FDO Forum (https://fairdo.org), as well as in other initiatives with the need to design and develop large integrative research data infrastructures to facilitate findability, accessibility, interoperability and reusability (FAIR).

The session provided an update to the RDA community on ongoing work in the FAIR DO communities and discussed further coordination activities for wider FAIR DO adoption.

6) Open Science Graph – Interoperability Framework[9] (organized by Open Science Graphs for FAIR Data IG that focuses on open challenges in Open Science Graphs for FAIR Data)

An Open Science Graph (OSG) is an information space that describes, through metadata, one or more entities and actors involved in the research lifecycle and knowledge production (e.g., publications, data, software, projects, funding, researchers, organisations, and services). The session discussed interoperability issues and the role of a robust PID infrastructure.
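Why PIDs matter for exchanging data between graphs can be shown with a small sketch. It assumes a hypothetical representation in which each graph is a dictionary of nodes keyed by PID plus a list of `(source, relation, target)` edges; the PIDs, relation names and the `merge_graphs` helper are invented for illustration and are not part of the OSG-IF.

```python
def merge_graphs(a, b):
    """Merge two graphs keyed by PID.

    Entities that share a PID are treated as the same node and their
    metadata is combined; edges are deduplicated.
    """
    nodes = {}
    for graph in (a, b):
        for pid, attrs in graph["nodes"].items():
            nodes.setdefault(pid, {}).update(attrs)
    edges = sorted(set(a["edges"]) | set(b["edges"]))
    return {"nodes": nodes, "edges": edges}


# Two hypothetical graphs from different providers (placeholder PIDs).
graph_a = {
    "nodes": {
        "pid:pub-1": {"type": "publication"},
        "pid:data-1": {"type": "dataset"},
    },
    "edges": [("pid:pub-1", "cites", "pid:data-1")],
}
graph_b = {
    "nodes": {
        "pid:data-1": {"title": "Example dataset"},
        "pid:sw-1": {"type": "software"},
    },
    "edges": [("pid:sw-1", "produced", "pid:data-1")],
}

merged = merge_graphs(graph_a, graph_b)
print(len(merged["nodes"]))           # 3
print(merged["nodes"]["pid:data-1"])  # {'type': 'dataset', 'title': 'Example dataset'}
```

Because both providers use the same PID for the dataset, the merge needs no heuristic matching on titles or names. Without shared identifiers, the two descriptions would remain separate nodes.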

The definition of guidelines towards an Interoperability Framework (IF) that enables seamless exchange of data across diverse OSGs is organized into four Task Forces (TFs), each focused on a different aspect of implementing the Open Science Graph – Interoperability Framework (OSG-IF): OSG Core Information Model; OSG Data Exchange Commons; OSG Access Protocol Commons; OSG Profiles.

7) Pathways to national PID strategies: guidance to facilitate uptake and alignment[10] (organised by National PID Strategies WG)

This session expanded on wider PID adoption and the supporting infrastructure, which should include the PID strategies of national and thematic RIs and the corresponding infrastructure. Building a European PID infrastructure has been a topic of several Horizon Europe e-Infrastructure calls. This topic is also addressed in the current projects EOSC Future and EOSC FAIR-IMPACT.

Infrastructure elements to support FAIR data sharing

RDA also addresses the infrastructure aspects of building a robust data infrastructure for FAIR data sharing, and these aspects were represented in the following breakout sessions. Infrastructure-related IGs and WGs receive active contributions from EOSC, RI and NREN projects related to data infrastructure services. The topics are highly relevant to the SLICES-RI architecture and design.

8) The Role of Middleware in Data and Metadata Management[11] (organized by Research Data Architectures in Research Institutions IG)

9) Draft recommendations for making VREs FAIR and FAIR-enabling[12] (organized by FAIR for Virtual Research Environments WG)

10) Trusted Research Environments for Sensitive Data: FAIRness for “Closed” Data and Processes[13]

 

About RDA (https://www.rd-alliance.org/)

RDA was launched as a community-driven initiative in 2013 by the European Commission, the United States Government’s National Science Foundation and National Institute of Standards and Technology, and the Australian Government’s Department of Innovation with the goal of building the social and technical infrastructure to enable open sharing and re-use of data.

RDA has a grass-roots, inclusive approach covering all data lifecycle stages, engaging data producers, users and stewards, and addressing data exchange, processing, and storage. It has succeeded in creating a neutral social platform where international research data experts meet to exchange views and to agree on topics including social hurdles to data sharing, education and training challenges, data management plans and certification of data repositories, disciplinary and interdisciplinary interoperability, as well as technological aspects.

 

[1] https://www.rd-alliance.org/, https://www.rd-alliance.org/rdas-20th-plenary-draft-programme-0

[2] How to use the information below: (1) meetings are organized by existing or proposed Interest Groups or Working Groups to discuss specific topics related to the IG/WG workplan or documents; (2) refer to the meeting agenda and collaborative notes via the link; (3) check the specific IG/WG wikipage for additional information; (4) contribution to an IG/WG is open, and ongoing activity is coordinated via mailing lists.

[3] https://www.rd-alliance.org/way-fair-data-collection-citation

[4] https://www.rd-alliance.org/plenaries/rda-20th-plenary-meeting-gothenburg-hybrid/defining-roadmap-towards-fair-machine-learning

[5] https://www.rd-alliance.org/data-management-planning-where-are-we-and-where-do-we-want-be

[6] https://www.rd-alliance.org/computational-reproducibility-what%E2%80%99s-next-rda

[7] https://curating4reproducibility.org/10things/

[8] https://www.rd-alliance.org/plenaries/rda-20th-plenary-meeting-gothenburg-hybrid/ig-fair-digital-object-fabric-discussing-fair

[9] https://www.rd-alliance.org/plenaries/rda-20th-plenary-meeting-gothenburg-hybrid/open-science-graph-interoperability-framework

[10] https://www.rd-alliance.org/plenaries/rda-20th-plenary-meeting-gothenburg-hybrid/pathways-national-pid-strategy-guide-facilitate

[11] https://www.rd-alliance.org/plenaries/rda-20th-plenary-meeting-gothenburg-hybrid/role-middleware-data-and-metadata-management

[12] https://www.rd-alliance.org/plenaries/rda-20th-plenary-meeting-gothenburg-hybrid/draft-recommendations-making-vres-fair-and-fair

[13] https://www.rd-alliance.org/trusted-research-environments-sensitive-data-fairness-closed-data-and-processes