The birth of the Research Cloud in South Africa

The research cloud in South Africa has its origins in the development of the African Research Cloud (ARC). The ARC was created in 2015 as a research partnership between North-West University and the University of Cape Town. It provides a cloud-based infrastructure hosted by the partner institutions. The ARC gave rise to the Astronomy Proof of Concept, led by researchers from IDIA. The ARC tested different models of data management, storage and transfer, with the aim of supporting the data processing steps required to transform raw MeerKAT data into scientific data products.

In September 2015 a formal partnership was agreed between the University of the Western Cape (UWC), the University of Cape Town (UCT) and North-West University (NWU), which led to the official establishment of the Inter-University Institute for Data Intensive Astronomy (IDIA). In February 2016 the University of Pretoria (UP) joined the partnership. In that same year, six partner institutions submitted a successful grant proposal for Ilifu to the Data Intensive Research Initiative of South Africa (DIRISA), through the National Integrated Cyberinfrastructure System (NICIS), to build a data-centric computing system that will provide computing power and data storage for projects in the strategic fields of astronomy and bioinformatics. From this point on, IDIA resources and personnel supported both astronomy and bioinformatics users.

When IDIA launched the African Research Cloud (ARC) in November 2016, it was the first African institute to launch a cloud-based data centre. Two proof-of-concept (POC) projects were run on the ARC facility: radio astronomy in the Western Cape and genomics at North-West University. The radio astronomy pilot was called ARCADE, “African Resource Cloud Astronomy Development”. It was used for astronomy data processing and training, including in UCT classrooms as part of students’ education. The Astronomy POC involved the development of a data-intensive calibration and imaging pipeline for radio telescopes, with an emphasis on MeerKAT.

The success of the ARC ultimately led to the large-scale deployment of the pipeline on the IDIA facility, which was built in anticipation of the deluge of MeerKAT data from different Large Survey Projects (LSPs). This deployment of the pipeline is known as the IDIA Research Cloud. It became the platform used by IDIA researchers for early MeerKAT data and for MeerLICHT observations. Officially launched in July 2018, the IDIA Research Cloud is currently used by researchers across seven countries to collaborate on very large data sets coming from the MeerKAT telescope.

The IDIA Research Cloud is intended to provide sufficient capacity for persistent storage of the aggregated LSP visibility data from the MeerKAT data store over the lifetime of the project, as well as storage of intermediate science data products and post-processing products. The storage capacity increases as data from MeerKAT and other SKA pathfinder projects are ingested. The facility is designed to be an agile development platform for pipeline and post-processing algorithms, and for analytics and data mining. The goal is to foster coordinated development of pipelines among the LSPs, to identify common processing needs and to take advantage of expertise across the LSPs. The facility is managed by the university partners with the participation of the community of research users. As a pipeline development platform, it serves as a testbed for cloud-based provision of resources, tools and platforms for data-intensive research. Since the establishment of the African Research Cloud, IDIA has added further nodes, and the result became known as the IDIA Research Cloud.

The IDIA Research Cloud, which has served as a test case for the use of cloud technology for collaborative research, is currently used by researchers across seven countries to collaborate on the huge data sets coming off the MeerKAT telescope. The IDIA cloud is, however, not big enough to meet the requirements for the strategic science domains of astronomy and bioinformatics.

Since 2016, IDIA has driven the development of the next generation of research cloud. As science projects are commissioned on MeerKAT and SKA, the telescopes are producing even larger data sets than those currently available. Biological data sets are undergoing similar exponential growth. Joining forces with the bioinformatics community, IDIA has led the establishment of ilifu, a bigger research cloud infrastructure designed and built to serve the astronomy and bioinformatics research communities. ilifu is a partnership of six institutions.

In 2016 the six partner institutions put in a successful bid to DIRISA through its National Integrated Cyberinfrastructure System (NICIS) to build a data-centric computing system that will provide computing power and data storage for projects in the strategic fields of astronomy and bioinformatics.

The expansion of the IDIA Research Cloud into the Ilifu infrastructure has been further supported by investments from IDIA and H3ABioNet, the Pan African Bioinformatics Network for H3Africa. IDIA is the lead organization in the Ilifu project. Ilifu is a data-centric high-performance computing facility focused on providing data-intensive research capacity for astronomy and bioinformatics, as part of a national tiered, distributed infrastructure within the Data Intensive Research Initiative of South Africa (DIRISA). IDIA leads the development and implementation of astronomy-focused data-intensive research solutions and of more general system access and data distribution tools. A major goal is to provide the infrastructure and software systems for the execution of MeerKAT Large Survey Projects.

Ilifu is a regional node, known as a Tier II node, in a national infrastructure, and is partly funded by the Department of Science and Innovation (DSI) through its Data Intensive Research Initiative of South Africa (DIRISA). It brings together the existing infrastructure and expertise of the six partner institutions.

As of 2019, the IDIA Research Cloud is fully integrated into ilifu. The year 2019 will also see the first attempts to create a research cloud federation between South Africa (IDIA) and Europe, through EGI. EGI is a federated e-infrastructure set up to provide advanced computing services for research and innovation. It is publicly funded and comprises hundreds of data centres and cloud providers spread across Europe and worldwide.