Cisco Canada Blog

Bioinformatics – How Cisco and McMaster University are using the power of data in the fight against COVID-19

August 12, 2020

Over the past 10 years, technology has had a transformative and far-reaching impact on our society. This has never been more apparent than over the past few months. During the current crisis, organizations have turned to Cisco’s technology to help them stay safe, secure and connected – and, most importantly, to help combat COVID-19.

Partnership with the McMaster University Research Community

In the past decade Cisco has developed close partnerships with many of the top Canadian research universities, sponsoring innovation programs and University Chairs across the country including McMaster University, based in Hamilton, Ontario. McMaster is internationally recognized for its leadership in health sciences, particularly in the fields of bioinformatics, functional genomics, and computational biology.

Our partnership with McMaster goes beyond the typical vendor/customer relationship – over the last six years Cisco has played a role in several of its health science programs, including supporting two research Chairs.

Now, with the donation of a $375,000 high performance Unified Computing System (UCS) from the Cisco Foundation, we are supporting a remarkable new project: the COVID-19 Genotyping Tool (CGT), led by Dr. Andrew McArthur, associate professor of biochemistry and biomedical science at McMaster and past Cisco Chair in Bioinformatics. The CGT app is rooted in big data analytics and will enable scientists worldwide to track changes in the genetic makeup of the virus that causes COVID-19, in support of the search for a vaccine.

The COVID-19 Genotyping Tool (CGT)

In partnership with the Ontario Vector Institute and Sunnybrook Health Sciences in Toronto, Dr. McArthur and his research team began looking at ways bioinformatics could help track fluctuations in the coronavirus by analyzing its genome as it spreads around the world. As the virus is passed from person to person, its genome picks up minor changes that give researchers clues about how it spread and where it originated. Analysis of these genetic fluctuations allows the research teams to trace the trajectory of the virus and project where it’s heading next. This is a vital element in contact tracing – it helps identify how a localized outbreak may have started.
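To make the idea of tracking minor genome changes concrete, here is a minimal Python sketch. It is an illustration only, not the CGT's actual method: the sequences are made-up toy fragments, they are assumed to be already aligned to the same length, and the function name `point_differences` is hypothetical.

```python
from itertools import combinations


def point_differences(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, sample base) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(ref, sample))
        if r != s
    ]


# Toy, made-up aligned fragments (not real viral sequence data).
genomes = {
    "reference": "ATGGCTAGCTAGGATC",
    "sample_a": "ATGGCTAGTTAGGATC",
    "sample_b": "ATGGCTAGTTAGGCTC",
}

# Pairwise count of differing positions between every pair of samples.
distances = {
    (a, b): len(point_differences(genomes[a], genomes[b]))
    for a, b in combinations(genomes, 2)
}

for pair, count in distances.items():
    print(pair, count)
```

In this toy data, sample_b carries the same change as sample_a plus one more, which is the kind of pattern that hints at one case being downstream of another.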

A key element of this research is the COVID-19 Genotyping Tool (CGT), an artificial intelligence/machine learning analytics platform that allows researchers, hospitals and public agencies around the globe to upload their COVID-19 data and contextualize it with available sources in the public domain. Using AI dimensionality reduction techniques such as UMAP, the CGT is able to identify small differences in the virus genome, allowing it to be classified and compared against other known strains. The results are expected to yield better insights into where transmission events likely occurred and when outbreaks happened, and to trigger alerts about key changes in the genetic makeup of the virus, which determines how infectious it is.
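Dimensionality reduction techniques such as UMAP work on numeric vectors, so genomes first have to be turned into numbers. The post does not say how the CGT encodes genomes; the sketch below assumes one common choice, counting overlapping k-mers (short substrings of length k). The function name `kmer_vector` and the toy sequences are hypothetical, and the reduction step itself is not shown.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(sequence: str, k: int = 3) -> list[int]:
    """Count every overlapping k-mer and return the counts in a fixed order.

    The vector has 4**k entries, one per possible k-mer over A, C, G, T.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts["".join(kmer)] for kmer in product("ACGT", repeat=k)]


# Toy, made-up fragments (not real viral sequence data).
vec_a = kmer_vector("ATGGCTAGCTAGGATC")
vec_b = kmer_vector("ATGGCTAGTTAGGATC")

# Sequences that differ by a single base still yield different vectors,
# which is what lets a downstream reducer separate closely related genomes.
print(len(vec_a), math.dist(vec_a, vec_b))
```

Vectors like these, one per genome, are the kind of input a reducer such as UMAP could project to two or three dimensions for clustering and visualization.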

Powering Data Analysis

Sequencing and analyzing the viral genome from a single infected patient involves millions of data points. As with any big data system, speed of analysis is key. The more data you have, the longer it takes to analyze – an especially critical point considering the vast amount of global COVID-19 infection data streaming in daily. Processing data on this scale requires memory – specifically, in-memory systems. An in-memory framework allows fast data processing and avoids the latency incurred when transferring data to and from backend storage systems, or even local disks.

To be useful, the CGT needs to turn around these massive AI/ML tasks quickly and efficiently. This is only possible if the underlying compute system supports both complex AI/ML processing at high speed and low-latency access to the datasets, which requires large amounts of fast memory.

With the Cisco UCS in place, the McMaster team is now able to process on the order of 150 genomes per hour – an impressive accomplishment considering the millions of data points and complex AI/ML algorithms involved, while at the same time ensuring sensitive patient data is stored securely. This has direct implications for vaccine design, drug development and collective efforts to combat COVID-19 worldwide.

I know I speak for all my Cisco colleagues when I say how proud we are to be part of a project with such huge potential for positive impact at both the local and global levels. Across the globe, governments, NGOs, and businesses are coming together to harness the power of data in the fight against COVID-19 and we are truly humbled to play a role through such an incredible project.
