Training Infrastructure as a Service

Abstract

Background

Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses.

Findings

Galaxy Europe and the Gallantries project, together with the Galaxy community, have developed Training Infrastructure-as-a-Service (TIaaS), which aims to provide user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure. This ensures that jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress.
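The queue placement described above can be illustrated with a minimal Python sketch. It is not the TIaaS or Galaxy implementation: the `training-` group prefix, the `TrainingQueue` class, the `route_job` function, and the slot counts are all hypothetical names chosen for illustration. The sketch assumes that trainees are identified by membership in a per-event group, and that jobs fall back to the main queue when the private queue is full.

```python
from dataclasses import dataclass

# Assumed naming convention for per-event groups (hypothetical).
TRAINING_PREFIX = "training-"


@dataclass
class TrainingQueue:
    """Hypothetical private queue reserved for one training event."""
    event_id: str
    slots: int        # concurrent jobs the reserved machines can run
    running: int = 0  # jobs currently occupying a slot

    def has_capacity(self) -> bool:
        return self.running < self.slots


def route_job(user_groups, queues):
    """Return the name of the queue a new job should be sent to.

    user_groups: group names the submitting user belongs to
    queues: mapping of event id -> TrainingQueue
    """
    for group in user_groups:
        if not group.startswith(TRAINING_PREFIX):
            continue
        event_id = group[len(TRAINING_PREFIX):]
        queue = queues.get(event_id)
        if queue is not None and queue.has_capacity():
            queue.running += 1
            return f"training:{event_id}"
    # Not a trainee, or the private queue is full: use the main queue.
    return "main"


if __name__ == "__main__":
    queues = {"rnaseq-course": TrainingQueue("rnaseq-course", slots=1)}
    print(route_job(["training-rnaseq-course"], queues))  # private queue
    print(route_job(["training-rnaseq-course"], queues))  # spills over to main
    print(route_job(["staff"], queues))                   # regular user
```

The trainee never selects a queue: group membership alone decides the routing, which is what makes the placement transparent from the learner's point of view.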

Conclusions

TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.

Article activity feed

  1. Background

    Hands-on training, whether in bioinformatics or other scientific domains, requires significant resources and knowledge to set up and run. Trainers must have access to infrastructure that can support the sudden spike in usage, with classes of 30 or more trainees simultaneously running resource-intensive tools. For efficient classes, the jobs must run quickly, without queuing delays, lest they disrupt the timetable set out for the class. Often this is achieved by running on a private server where there is no contention for the queue, and therefore little or no waiting time. However, this requires the teacher or trainer to have the technical knowledge to manage compute infrastructure, in addition to their didactic responsibilities. This places significant burdens on potential training events, in terms of infrastructure cost, person-hours of preparation, technical knowledge, and available staff to manage such events.

    Findings

    Galaxy Europe has developed Training Infrastructure as a Service (TIaaS), which we provide to the scientific community as a service built on top of the Galaxy platform. Training event organisers request a training, and Galaxy administrators can allocate private queues specifically for it. Trainees are transparently placed in a private queue where their jobs run without delay. Trainers access the TIaaS dashboard and can remotely follow the progress of their trainees without in-person interactions.

    Conclusions

    TIaaS on Galaxy Europe provides reusable and fast infrastructure for Galaxy training. The instructor dashboard provides visibility into class progress, making in-person trainings more efficient and remote training possible. In the past 24 months, more than 110 trainings with over 3000 trainees have used this infrastructure for training, across scientific domains, all enjoying the accessibility and reproducibility of Galaxy for training the next generation of bioinformaticians. TIaaS itself is an extension to Galaxy which can be deployed by any Galaxy administrator to provide similar benefits for their users. https://galaxyproject.eu/tiaas

    This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad048), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Elizabeth Ryder

    This technical note is an informative explanation of Training-Infrastructure-as-a-Service, which is a free service available to facilitate Galaxy training sessions. The service provides an easy way for instructors to set up infrastructure for trainings, enables learners to make progress through the training without long waiting times, and includes a dashboard through which instructors can easily monitor progress of learners. The article provides data showing the large number of events and locations that have benefited from using TIaaS. Because of the utility and general applicability of TIaaS, the article will be of interest to the readers of GigaScience.

    Minor suggestions:

    In the Development section: As a practical matter, it would be useful to know the typical timeline for approval of a training session. Also, can anyone who uses Galaxy become an instructor and request this service?

    In the Usage section, there is a sentence that reads, 'Class sizes have ranged considerably, from the median of 25 participants (std. dev 121) to a maximum of 1500 registrants for a fully asynchronous (self-paced) course.' It's a little unusual to talk about a median and standard deviation, since medians are non-parametric measures and SDs are parametric and measured with respect to the mean. I'd suggest using the median and interquartile range instead. I think a histogram of class size distribution would be informative, similar to the event distributions in Fig. 4.

    Grammatical / spelling errors:

    I'm not sure why 'Findings' appears before 'Background' - perhaps an editing error?

    p. 2: 'a limiting factor for events with large number of participants,' should read 'with a large number of participants'

    p. 2: 'by it's design' should read 'by its design'

    p. 2: 'which to to preference' should read 'which to preference'

    p. 4: 'univeristy' should read 'university'

    p. 5: This sentence is hard to scan as written; I think it needs a semi-colon after 'cluster' to make sense.

    Galaxy Europe uses it with HTCondor, and job rules that allow spill over to the main cluster, new machines are brought up in an OpenStack cluster specifically for training events and destroyed afterwards.
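    The reviewer's point about pairing the median with the interquartile range can be illustrated with a short Python sketch. The class sizes below are made up for illustration and are not data from the article, and `summarize` is a hypothetical helper name. The sketch shows how a single very large course inflates the standard deviation while leaving the median and interquartile range largely unaffected.

```python
import statistics


def summarize(sizes):
    """Return parametric and non-parametric summaries of class sizes."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    # (default "exclusive" method).
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return {
        "median": statistics.median(sizes),
        "iqr": q3 - q1,
        "mean": statistics.mean(sizes),
        "stdev": statistics.stdev(sizes),
    }


if __name__ == "__main__":
    # Made-up class sizes, including one very large self-paced course.
    class_sizes = [12, 18, 20, 25, 25, 30, 40, 60, 1500]
    for name, value in summarize(class_sizes).items():
        print(f"{name}: {value:.1f}")
```

    With skewed data like this, the standard deviation is dominated by the one outlier, so it says little about the spread of typical classes around the median.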

  2. Background

    Hands-on training, whether in bioinformatics or other scientific domains, requires significant resources and knowledge to set up and run. Trainers must have access to infrastructure that can support the sudden spike in usage, with classes of 30 or more trainees simultaneously running resource-intensive tools. For efficient classes, the jobs must run quickly, without queuing delays, lest they disrupt the timetable set out for the class. Often this is achieved by running on a private server where there is no contention for the queue, and therefore little or no waiting time. However, this requires the teacher or trainer to have the technical knowledge to manage compute infrastructure, in addition to their didactic responsibilities. This places significant burdens on potential training events, in terms of infrastructure cost, person-hours of preparation, technical knowledge, and available staff to manage such events.

    Findings

    Galaxy Europe has developed Training Infrastructure as a Service (TIaaS), which we provide to the scientific community as a service built on top of the Galaxy platform. Training event organisers request a training, and Galaxy administrators can allocate private queues specifically for it. Trainees are transparently placed in a private queue where their jobs run without delay. Trainers access the TIaaS dashboard and can remotely follow the progress of their trainees without in-person interactions.

    Conclusions

    TIaaS on Galaxy Europe provides reusable and fast infrastructure for Galaxy training. The instructor dashboard provides visibility into class progress, making in-person trainings more efficient and remote training possible. In the past 24 months, more than 110 trainings with over 3000 trainees have used this infrastructure for training, across scientific domains, all enjoying the accessibility and reproducibility of Galaxy for training the next generation of bioinformaticians. TIaaS itself is an extension to Galaxy which can be deployed by any Galaxy administrator to provide similar benefits for their users. https://galaxyproject.eu/tiaas

    This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad048), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Azza Ahmed

    The paper is well-written and neatly reports on the development of Training-Infrastructure-as-a-Service (TIaaS), a free infrastructure resource originally developed by Galaxy Europe and the Gallantries project together with the Galaxy community. TIaaS is a step towards democratizing bioinformatics training, where infrastructure can be a major barrier - even in advanced and well-developed countries.

    I especially appreciate the value of this resource for instructors and students in low- and middle-income countries, where infrastructure limitations may be exacerbated by the limited availability of well-trained system administrators able to cater to specific training needs. It was indeed gratifying to see training events using TIaaS in such countries in the figure 3 map - especially since it is not clear that TIaaS is deployed in such countries. The utility of the resource is self-evident: 438 training events in 48 months targeting > 19000 students. Thus, overall, I congratulate the authors for the success of their project, and the community for having such a great free resource at their disposal.