GreenDIGIT

Greener future digital research infrastructures

Introduction

Lowering the environmental impact of digital services and technologies has to become a priority for both the operation of existing digital services and the design of future digital infrastructures. Energy consumption and carbon footprint are the two most discussed environmental impacts: digital infrastructures today contribute 3 to 4% of the world's total greenhouse gas (GHG) emissions, with a growth of 8% per year. Networking infrastructure alone is responsible for 2 to 14% of digital impacts, depending on the source, mainly due to its electricity consumption. Digital Research Infrastructures (RIs) are a fundamental tool in the development of research from design to market. Moreover, digital services are becoming a ‘+1 instrument’ for many of the thematic (i.e. scientific-discipline-specific) ESFRIs, also because of the big data challenges triggered by modern scientific instruments.

GreenDIGIT will approach the problem of reducing the environmental impact of RIs from a generic conceptual point of view by addressing all major factors that define and influence the environmental and climate impact of an RI. This includes (i) the physical digital infrastructure, including data centres and computer networks, which are the main source of energy consumption in digital infrastructures; (ii) provider tools that allow for energy and efficiency monitoring on digital infrastructures; (iii) user tools and individual research environments, which can strongly reduce energy consumption through optimal design of scientific workflows, minimised resource usage through support for research reproducibility, and impact-aware research data management.

To address these factors, GreenDIGIT will develop a reference architecture for future sustainable RIs that covers all factors influencing energy efficiency, environmental impact, and sustainable RI operation and evolution throughout the whole RI lifecycle. The proposed reference architecture will provide a basis for designing RI components, defining operational models and metrics for energy and impact assessment, and developing an end-to-end optimisation model for different workloads and applications. To guarantee the take-up of the identified solutions by the stakeholders of the involved DIGIT RIs and by other ESFRI RIs, GreenDIGIT will develop general methodologies for RI impact assessment, as well as technical and policy recommendations for RI operators and decision makers. GreenDIGIT will also develop model curricula and deliver a set of training modules on general and specific sustainability skills for different categories of organisational roles and actors in the RIs. All project activities will build on an RI survey and landscape analysis carried out at the beginning of the project, which will align GreenDIGIT development with the existing practices and real needs of the target RIs. This conceptual approach is intended to make it possible to achieve the defined goals and objectives and to deliver the expected outcomes.

Research reproducibility as a way to increase the efficiency and sustainability of RIs

Open Science has the potential to reduce the energy, time and resources spent on research by sharing results, data and experience from the early stages of research environment setup and operation. Research reproducibility is an important factor in making Open Science actionable. It is essential for improving the efficiency, quality and environmental impact of research in the IoT - 5G/6G - Edge Cloud continuum, which spans communication, computation and storage. It is also crucial for ensuring that scientific findings are accurate, reliable, and can be independently verified. Research Infrastructures for experimental research need the ability to reproduce the whole experiment environment and equipment setup. Suitable approaches that meet the requirements for experimental research reproducibility generally combine a range of technologies: distributed Git-based version control for experiment deployment and operation, Jupyter Notebooks, and support for workflow management.

The heterogeneous architecture of edge cloud research infrastructure poses challenges because it involves non-uniform and special-purpose hardware and software stacks. Approaches and solutions from SLICES-RI address the on-demand provisioning of research environments and cover two aspects: the experiment management and orchestration platform pos (plain orchestration service), and federated data management infrastructures that support effective data sharing and ensure data quality.

Basing experiments on suitable reproducibility frameworks makes them portable, so that they can be executed on various existing RIs. This enables better utilisation of available resources and allows for resource-optimal placement. As a positive side effect, results can be assessed for their robustness across different execution environments.
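As an illustration of what resource-optimal placement of a portable experiment could look like, the following minimal Python sketch picks the site with the lowest estimated emissions. It is not part of GreenDIGIT or any RI tooling: the `Site` class, its fields, the function names and all numeric values are hypothetical, and the estimate simply multiplies assumed power draw, assumed runtime and assumed grid carbon intensity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one RI site able to run the experiment."""
    name: str
    power_kw: float            # assumed average power draw of the experiment
    runtime_h: float           # assumed runtime of the experiment on this site
    grid_gco2_per_kwh: float   # assumed carbon intensity of the local grid


def estimated_emissions_g(site: Site) -> float:
    # Energy (kWh) = power (kW) * time (h); emissions (g) = energy * intensity.
    return site.power_kw * site.runtime_h * site.grid_gco2_per_kwh


def pick_site(sites: list[Site]) -> Site:
    # Choose the site with the lowest estimated emissions.
    if not sites:
        raise ValueError("at least one candidate site is required")
    return min(sites, key=estimated_emissions_g)


# Illustrative values only; they are not measurements from any real RI.
CANDIDATES = [
    Site("site-a", power_kw=0.5, runtime_h=2.0, grid_gco2_per_kwh=300.0),
    Site("site-b", power_kw=0.5, runtime_h=2.5, grid_gco2_per_kwh=100.0),
]
```

With these made-up numbers, `pick_site(CANDIDATES)` selects `site-b`: the experiment runs longer there, but the lower assumed carbon intensity outweighs the extra runtime. A real placement decision would also have to account for hardware availability, data locality and measurement uncertainty.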

Research artefacts with a high level of trustworthiness and reproducibility are sustainable because experiments do not need to be repeated merely to verify results. Instead, available data from reproducible experiments can be made publicly accessible and reused without recreating the original results, thereby reducing the environmental footprint and speeding up research.

GreenDIGIT aims to improve experiment tooling and frameworks for better reproducibility with less overhead for developers. Additionally, the publication, archiving, and accessibility of artefacts should be partially automated and integrated into typical research and development workflows. This simplifies compliance with open science practices.

GreenDIGIT's goals include the development of a set of tools, platforms and best practices that enable resource efficiency when federated Research Infrastructures are used for reproducible experiments, thereby establishing a Reproducibility as a Service (RaaS) ecosystem. This ecosystem is expected to save time, energy and resources while accelerating research and facilitating collaboration. This can result in substantial cost savings, reduce the carbon footprint of research, and promote sustainable research practices by means of virtual research environments, including the following services:

  • Standardised experiment and workflow description: Establishing standardised protocols and procedures for experimental design, data collection, data analysis, and data sharing can help to ensure consistency and accuracy in the data collected and improve reproducibility. This includes the use of appropriate experimental controls to minimise the impact of extraneous variables and to ensure that any observed effects are due to the manipulation of the independent variable.
  • Experiment automation: Automating experimental procedures can help to ensure consistency and accuracy in the data collected. This can be done by using software tools that automate tasks such as data collection, data analysis, and data visualisation.
  • Experiment deployment: Creating a platform that allows for easy deployment of experiments can help to ensure that experiments can be run in different environments and conditions. This can be done by using container and container orchestration technology, such as Docker and Kubernetes, to automate the provisioning, deployment, networking, scaling, availability, and lifecycle management of experiments.
  • Experiment sharing and reusability: Creating a platform that allows for easy sharing and reuse of experiments can help to improve reproducibility. This can be done by using tools such as Git or GitHub to version and share code, and by using tools such as Jupyter Notebook to share code, data, and documentation.
  • Experiment data management: Implementing open science practices for experimental research data management, backed by a robust data management system, helps to ensure that data is properly stored, tracked, and shared, so that it is accessible to others and can be used for replication studies. This can be done by using tools such as OpenRefine or Dataverse to clean, organise, and share data, and by using tools such as Zenodo or DataCite to assign unique identifiers and track data citations.
  • Collaboration and sharing: Collaborating with other researchers and sharing resources, such as experimental materials, protocols, and data, can help to improve reproducibility and increase the impact of research.
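The first and fourth services above can be sketched together in a few lines of Python: a standardised experiment description gets a deterministic fingerprint, and a result store reuses an archived result when the same description is submitted again instead of re-running the experiment. This is a minimal sketch under stated assumptions, not a GreenDIGIT or pos interface: `ExperimentSpec`, `ResultStore`, their fields and `run_or_reuse` are hypothetical names, and the in-memory dictionary stands in for a shared archive.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical, minimal experiment description (not a GreenDIGIT format)."""
    name: str
    code_revision: str       # e.g. a Git commit hash
    container_image: str     # e.g. an image reference pinned by digest
    parameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that equal
        # descriptions always produce the same SHA-256 digest.
        canonical = json.dumps(asdict(self), sort_keys=True,
                               separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResultStore:
    """In-memory stand-in for a shared archive of experiment results."""

    def __init__(self):
        self._results = {}

    def run_or_reuse(self, spec, run):
        """Return (result, reused); call run(spec) only for unseen specs."""
        key = spec.fingerprint()
        if key in self._results:
            return self._results[key], True
        result = run(spec)
        self._results[key] = result
        return result, False
```

A caller would pass its own `run` function, for example one that deploys the pinned container image and collects measurements. Two descriptions that differ only in the ordering of their parameters map to the same fingerprint, so the second submission is served from the store; this only saves work safely if the description really captures everything that influences the result.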

Overall, promoting research reproducibility can yield significant savings in time, energy, and resources, while also accelerating research and facilitating collaboration. A platform for experiment automation, deployment, sharing and reuse can improve the reproducibility of experimental research by ensuring consistency and accuracy in the data collected and by making it easy to share and reuse experiments. Additionally, the platform should be designed with a robust data management system to ensure that data is properly stored, tracked, and shared. These approaches can help to increase transparency, consistency, and accuracy in the research process. All these factors contribute to reducing overall energy and resource consumption and to lowering the GHG/CO2 footprint.

Research objectives

  • O1: Assess the status and trends of low-impact computing within four DIGIT RIs (EGI, SLICES, SoBigData, EBRAINS) and in the broader digital service provider community of ESFRIs, to produce recommendations and roadmaps for providers for the duration of the project and beyond.
  • O2: Provide reference architecture and design principles, as well as an actionable model for RIs about environmental impact assessment and monitoring, reflecting on the whole RI lifecycle and including the digital infrastructure components and their interaction with the broader environment.
  • O3: Develop and validate new and innovative technologies, methods, and tools for digital service providers within European Research Infrastructures, through which they can reduce their energy consumption and overall environmental impact.
  • O4: Develop and provide for researchers technical tools that assist them in the design, execution and sharing of environmental impact aware digital applications with reproducibility, Open Science and FAIR data management considerations. Description of objective: GreenDIGIT will provide researchers with virtual research environments and open
  • O5: Educate and support digital service providers in the RI communities about good practices on environmental impact conscious lifecycle management and operation of infrastructures and services.

Open and running student theses

Author | Title | Type | Advisors | Year | Links
Konstantin Kissel | Token-based Resource Management - A Currency for Scientific Testbeds | BA | Holger Kinkelin, Sebastian Gallenmüller, Henning Stubbe | 2023 |