How the pandemic reminds us to improve our science infrastructure
Kyle Tretina, Ph.D.
Scientific networks deal with unpredictable flows of elephant-sized traffic intensity.
As the COVID-19 pandemic showed, the U.S. wasn’t ready for a single large-scale event that required massive amounts of diagnostic testing. Although U.S. intelligence agencies warned about the coronavirus threat in January, leaving sufficient time for the country to ramp up testing, many states are still above the 10% test positivity threshold set by the World Health Organization, indicating that as a country we are under-testing in several regions. If there were ever a time when we should be able to admit as a country that we need better public health infrastructure, it is now.
However, COVID-19 testing is just a microcosm of the data traffic intensity problems that science often faces. For example, data from High Energy Physics (HEP) experiments (like data from many research experiments) pose a few challenges for users that commercial data do not. First, the HEP field handles data flows that are up to ten million times bigger than commercial data flows and that can last days, as opposed to the tiny data flow and the 14 hours and 19 minutes it would take you to watch three seasons of The Umbrella Academy on Netflix. Second, data transfer rates on these scientific networks are very sensitive, since a single HEP researcher can cause a large bump in the whole network, raising traffic intensity by 50–60% over the baseline with just a single experiment. Third, researchers need near-perfect data transfer fidelity, since even minuscule amounts of data loss can have an immense impact on data interpretation for some workflows. Netflix, on the other hand, has a relatively high number of interruptions for users and is doing very well as a business.
The scale of what scientific computing can do can be overwhelming to imagine. As measured in floating-point operations per second (a.k.a. FLOPS; see Table 1), the GPU inside the redesigned Mac Pro can reach up to 56 teraflops (56 × 10¹² FLOPS), and one petaflop (10¹⁵ FLOPS) falls in the middle of what might be called commodity high-performance computing (HPC). This year (2020), China is expected to debut the world’s first exascale computer, which can carry out one billion billion (10¹⁸) floating-point operations per second. This has been argued to be nearly the performance capacity of the human brain. Unfortunately, these computers are scarce, and a small set of large corporations, universities, and government institutions dominates the supercomputer supply chain for very specific problems.
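To make the gap between these scales concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the peak figures quoted above and assumes, unrealistically, that performance adds up perfectly across devices; the variable names are illustrative.

```python
# Orders of magnitude quoted in the text, in floating-point operations per second.
TERA = 10**12
PETA = 10**15
EXA = 10**18

mac_pro_gpu = 56 * TERA  # peak figure quoted for the Mac Pro GPU
exascale = 1 * EXA       # one billion billion operations per second

# How many such GPUs would it take to match one exaflop,
# assuming (unrealistically) perfect scaling?
gpus_per_exaflop = exascale / mac_pro_gpu

print(f"One petaflop is {PETA // TERA:,} teraflops")
print(f"One exaflop is roughly {gpus_per_exaflop:,.0f} of these GPUs")
```

In practice, real workloads rarely reach peak figures, so this is only a sense of scale, not a performance estimate.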
Access to scientific equipment and services should not be hindered by geography.
Despite a drastic increase in the nation’s testing capacity, there are barriers to testing for specific populations in the U.S. For many people, where they live and other demographic characteristics can determine their access to COVID-19 tests. This is hard for many people to believe, because in some parts of the country, thousands of tests are going unused. Nevertheless, the complexity of the technology and the training required to use scientific equipment increase the need for infrastructure that makes it accessible everywhere there is demand.
While the scientific community has enough equipment to support unprecedented data capacity and data flow, many of its devices are not accessible, even to most researchers. For example, as of 2020 there are only between 100 and 200 cryo-electron microscopes in the world, each of which can cost over $7 million to buy and more than $10,000 per day to operate. The quality and quantity of data from these instruments are excellent and in demand by labs that have money to pay for their services. However, most structural biologists have no access whatsoever, resulting in a divide between the “haves” and the “have-nots”. As the supply of these instruments increases in the short term, democratizing access to them will be essential for promoting the best science.
Science needs networks to not only connect the scientific world, but also empower improvements in experience, performance and conduct.
Improving the use of network-connected devices embedded in their spatial environments is part of what is called the Internet of Things (IoT). Most scientists use antiquated approaches to finding and using scientific equipment and services, which typically involve cycles of verbal recommendations, reading data sheets, personal contacts, and going to exhibitions with poster boards and trinket handouts. As end users have become increasingly important for sales decisions, the IoT can become even more critical for improving the way in which scientific equipment is found and used. This could include increasing connectivity at the instrument, at the interfaces between the instrument, network and user, and in the cloud. In the best-case scenario, this also includes augmented intelligence to improve the entire user experience, including supply/demand aggregation, experimental design, data analysis, and communication.
Scientific workflows should be entirely modular and programmable.
As the pandemic moves into a new phase where people are trying to return to public spaces, one thing is very clear: employers are not equipped to make rational decisions to develop a comprehensive back-to-work plan. That is understandable. The choices of tests, lab suppliers, medical expertise, test site management, legal support, disease management when on-site positive cases are found, and compliance are many, diverse, and complicated. In a sense, employers are running into the same problem that scientists run into when using scientific equipment and services: they need help in identifying and navigating the maze of issues to develop a successful approach.
So what does it look like to dream of a world where the use of scientific instruments is a true “internet of instruments”? It goes beyond new capabilities that weren’t possible before. While there are many answers to this question from a technological perspective, I’d like to discuss what is known as the application layer: the user experience.
First, at a minimum this means that the scientific equipment that we use will finally be accessible and controllable beyond the metal, glass or ivory gates that it sits behind. In an ideal world, we would also get real-time data about the status and performance of each instrument (temperature, speed of operation, progress, current users, etc.). Networks that aggregate these instruments can also make pricing more transparent and easily comparable, sparing users the weeks now spent emailing PDF quotes back and forth.
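As a minimal sketch of what such real-time status and transparent pricing could look like to a user, the Python below models a status record and a price comparison. Everything here is hypothetical: the `InstrumentStatus` fields, the `cheapest_free` helper, and the example records are made up for illustration and do not describe any existing network or instrument.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstrumentStatus:
    """Hypothetical status record an instrument might publish to a network."""
    name: str
    temperature_c: float      # current operating temperature
    progress: float           # fraction of the current run completed (0.0 to 1.0)
    current_users: int        # number of users currently on the instrument
    price_per_day_usd: float  # listed daily price

    def is_free(self) -> bool:
        """An instrument counts as free when nobody is using it."""
        return self.current_users == 0


def cheapest_free(instruments: List[InstrumentStatus]) -> Optional[InstrumentStatus]:
    """Return the lowest-priced instrument with no current users, or None."""
    free = [i for i in instruments if i.is_free()]
    return min(free, key=lambda i: i.price_per_day_usd, default=None)


# Made-up example records, for illustration only.
fleet = [
    InstrumentStatus("scope-a", -180.0, 0.4, 1, 9000.0),
    InstrumentStatus("scope-b", -179.5, 0.0, 0, 11000.0),
    InstrumentStatus("scope-c", -181.2, 0.0, 0, 10500.0),
]

best = cheapest_free(fleet)
print(best.name if best else "none free")
```

With a shared record like this, comparing availability and price becomes a single query rather than a chain of emails.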
Second, not only is the IoT applicable to individual instruments, it can also be applied to workflows. In the same way that rescue teams coordinate to find a missing person, or regions of the brain interact to orchestrate human behavior, the IoT can be used to turn scientific workflows into modules that are completely automated and interchangeable.
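One way to picture modular, interchangeable workflow steps is as small functions that share a common interface and can be chained or swapped. The Python sketch below is only an illustration under that assumption: the step names (`prepare_sample`, `acquire_on_instrument_a`, `acquire_on_instrument_b`, `analyze`), the `run_workflow` helper, and the readings are all hypothetical.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Step = Callable[[State], State]  # every module takes a state and returns a new one


def prepare_sample(state: State) -> State:
    return {**state, "prepared": True}


def acquire_on_instrument_a(state: State) -> State:
    # Placeholder readings standing in for data from one instrument.
    return {**state, "readings": [1.0, 2.0, 3.0]}


def acquire_on_instrument_b(state: State) -> State:
    # A drop-in replacement module with the same interface.
    return {**state, "readings": [2.0, 4.0, 6.0]}


def analyze(state: State) -> State:
    readings = state["readings"]
    return {**state, "mean": sum(readings) / len(readings)}


def run_workflow(steps: List[Step], state: Optional[State] = None) -> State:
    """Run each step in order, passing the state from one to the next."""
    return reduce(lambda s, step: step(s), steps, dict(state or {}))


# Swapping one module for another leaves the rest of the workflow untouched.
result_a = run_workflow([prepare_sample, acquire_on_instrument_a, analyze])
result_b = run_workflow([prepare_sample, acquire_on_instrument_b, analyze])
print(result_a["mean"], result_b["mean"])
```

Because each step only depends on the shared state, a different instrument or analysis can be substituted without rewriting the workflow around it.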
Finally, as Mark Weiser, Chief Technologist at Xerox PARC in the 1990s, articulated, “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” The cost and scale of making these improvements in scientific networks are finally approachable, and we only need people with the vision to make the practice of science, not just the technology, truly modern.
*Figure design by Jessica Beasley