Redefining the computer virus

Thanks to the increased globalization of travel, the potential for a national epidemic is often just a plane ride away.

The possibility that your competition for the armrest is carrying a dangerous disease strain is compounded by the fact that this strain is likely to be evolving at a rapid rate.

By the time those wheels hit the tarmac, the virus could look completely different; and it’s unlikely to be alone — there could be numerous other strains hiding in its shadow, preparing to trigger the next public health crisis.

It’s a risk that health researchers and professionals have been dealing with for some time — from the great influenza pandemic of 1918, to more recent outbreaks of SARS and avian flu in Asia.

With recent advances in medical technology, however, these groups are closing the gap on evolving viruses and taking steps to nip potential epidemics, like China’s current H7N9 outbreak, in the bud.

Leading the charge is the Advanced Molecular Detection program at the Centers for Disease Control and Prevention in Atlanta. This initiative leverages next-generation genome sequencing technology, supercomputers and bioinformatics to sequence the genetic code of individual virus strains. Unlike earlier equipment, which could only read the genome of the most prominent strain, the new machinery can detect all of the virus substrains in a patient sample and clue lab analysts in as to how the disease is evolving.
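A toy per-position count shows the difference between reading only the most prominent strain and detecting every substrain in a sample. The sketch below assumes the reads are already aligned to one another and are all the same length. The 5% `min_fraction` cutoff is an arbitrary illustrative value. Real pipelines align reads to a reference and must also separate genuine variants from sequencing errors, and neither function here is part of the CDC's actual software.

```python
from collections import Counter

def consensus(reads):
    """Majority base at each position: roughly what reading only the dominant strain gives you."""
    return "".join(
        Counter(read[pos] for read in reads).most_common(1)[0][0]
        for pos in range(len(reads[0]))
    )

def variant_frequencies(reads, min_fraction=0.05):
    """Return {position: {base: fraction}} for every position where more than one base
    reaches min_fraction of the reads, i.e. where a minority substrain shows up."""
    report = {}
    for pos in range(len(reads[0])):
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        calls = {base: n / total for base, n in counts.items() if n / total >= min_fraction}
        if len(calls) > 1:  # more than one base present: a mixed position
            report[pos] = calls
    return report

reads = ["ACGT"] * 18 + ["ACTT"] * 2  # 90% of reads carry G at position 2, 10% carry T
print(consensus(reads))               # ACGT (the minority T disappears)
print(variant_frequencies(reads))     # {2: {'G': 0.9, 'T': 0.1}}
```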

“We can take the fragment of DNA, with the supercomputer, [and] put them back together like a jigsaw puzzle with tens of thousands of pieces to figure out where the connections are, whether it’s resistant, how it’s spreading and whether it’s becoming more viral,” CDC Director Thomas Frieden said at a congressional hearing last month.

Solving the genetic riddle

It’s a process being used by the CDC’s influenza division to monitor the outbreak of H7N9 influenza in China. The CDC’s labs have already received two patient samples of H7N9 from Shanghai and are working to engineer a new clinical vaccine.

Before work could begin, however, the influenza division had to sequence the virus’s genome. To accomplish this task, they relied on technology used by the AMD program: the Ion Torrent PGM, PacBio RS and Illumina MiSeq. Each of these machines can sequence the genomes within a particular sample, but each uses a different method.

The first step in reading a virus’s DNA with the Ion Torrent PGM is deciding what you want the machine to sequence, whether the complete genetic code or just a small part of it, said Andy Felton, head of product management at Ion Torrent. This step matters because reading the entire genome of a sample requires first breaking that DNA into smaller, more manageable pieces with a set of specialized enzymes.

Once you have these fragments, the sample enters the template preparation phase. In this step, short nucleotide adapter sequences are bonded to the ends of each DNA segment. These adapters help each fragment attach to a tiny bead that sits inside a well of an Ion Torrent semiconductor chip, Felton said. These chips are built in the same factory as iPhone camera chips, but instead of detecting light and building a photographic image, they detect protons, or hydrogen ions, and use those signals to reconstruct the sample’s DNA sequence.
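The fragment-then-tag idea can be sketched in a few lines. This is a toy model under stated assumptions: random cut sizes stand in for enzymatic fragmentation, and `ADAPTER_LEFT`/`ADAPTER_RIGHT` are made-up sequences, not Ion Torrent's real adapters. Because the sketch cuts a single copy of the genome, its pieces do not overlap. Real libraries are made from many copies of the genome, which is what later lets overlapping reads be stitched back together.

```python
import random

# Hypothetical adapter sequences for illustration only, not Ion Torrent's real adapters.
ADAPTER_LEFT = "AAACCC"
ADAPTER_RIGHT = "GGGTTT"

def fragment(genome, min_len, max_len, rng):
    """Cut one copy of a genome into consecutive pieces of random length
    (a stand-in for enzymatic fragmentation)."""
    pieces = []
    start = 0
    while start < len(genome):
        size = rng.randint(min_len, max_len)
        pieces.append(genome[start:start + size])
        start += size
    return pieces

def add_adapters(pieces):
    """Attach an adapter to both ends of every fragment, as in template preparation."""
    return [ADAPTER_LEFT + piece + ADAPTER_RIGHT for piece in pieces]

rng = random.Random(7)          # fixed seed so the example is repeatable
genome = "ATGCGTACGTTAGC" * 5   # a made-up 70-base "genome"
library = add_adapters(fragment(genome, 10, 25, rng))
for molecule in library:
    print(molecule)
```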

The technology to sequence a sample’s DNA is actually found in Ion Torrent’s tiny semiconductor chip, not the larger PGM machine.
(Photo courtesy of Ion Torrent)

It is this tiny semiconductor chip that serves as the sequencing device for Ion Torrent technologies, not the PGM machine, which weighs approximately 65 pounds and is roughly the size of a large printer. Put simply, if the chip were a piece of art, the PGM device would be the frame, Felton said.

Once inside the PGM, the sequencing process can begin. In this case, that means the DNA sample will undergo cycles of exposure to liquid solutions of A, T, G and C nucleotides.

“If they’re complementary — if they’re A and we have a complementary T — that joins, you get a reaction,” Felton said. “We get a proton release, we get a signal and we know in that particular well, at that particular time, there was a T sequence … then we flow an A and then we flow a C and then we flow a G and then we repeat that cycle over and over again…until you have the strand sequenced.”
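Felton's description of flowing one nucleotide at a time and watching for a proton signal can be modeled as an idealized "flowgram." In this sketch the flow order T, A, C, G follows his quote, each signal is a whole number of incorporated bases (real signals are noisy analog measurements), and strand orientation is ignored. A run of identical bases, such as the two A's in the example, is added in a single flow and produces a proportionally larger signal.

```python
# Flow order taken from Felton's description; real instruments use their own flow orders.
FLOW_ORDER = "TACG"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def flowgram(template, n_flows):
    """Idealized flow-based run on one template strand.

    Returns (nucleotide flowed, signal) pairs, where the signal is the number of
    bases incorporated during that flow (0 if the nucleotide was not complementary).
    """
    growing = "".join(COMPLEMENT[b] for b in template)  # strand the polymerase builds
    pos = 0
    signals = []
    for i in range(n_flows):
        nucleotide = FLOW_ORDER[i % len(FLOW_ORDER)]
        count = 0
        # A homopolymer run (e.g. "AA") is extended in a single flow.
        while pos < len(growing) and growing[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals

def decode(signals):
    """Rebuild the synthesized sequence from the flowgram."""
    return "".join(nucleotide * count for nucleotide, count in signals)

flows = flowgram("ATTGC", 8)
print(flows)
# [('T', 1), ('A', 2), ('C', 1), ('G', 1), ('T', 0), ('A', 0), ('C', 0), ('G', 0)]
print(decode(flows))  # TAACG, the complement of the template ATTGC
```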

The Ion Torrent PGM floods the semiconductor chip with fluids that aid the sequencing process.
(Photo courtesy of Ion Torrent)

When this cycle is complete, you are left with a giant jigsaw puzzle of DNA data. But by matching up overlapping sequence regions, the strands can be put back together in the proper order to give an accurate picture of the entire genome for that sample, Felton said. In all, sequencing can take as little as three to four hours, depending on the length of the genome you want to read.
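The jigsaw step is sequence assembly. A minimal sketch uses the classic greedy overlap method: repeatedly join the two pieces whose end and start overlap the most. This textbook simplification is chosen for illustration and is not necessarily what the Ion Torrent software does. Production assemblers use overlap or de Bruijn graphs, tolerate sequencing errors, and often map reads to a known reference instead. The `min_len` threshold of 3 is an arbitrary assumption.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of pieces with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # leftover pieces do not overlap; keep them as separate contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

reads = ["AGGCT", "TACAGG", "GATTAC"]  # overlapping pieces of GATTACAGGCT, out of order
print(greedy_assemble(reads))          # ['GATTACAGGCT']
```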

The smallest Ion Torrent chip can produce 500,000 sequence reads, while the larger chips can produce anywhere from a few million to 80 to 90 million reads. It’s this scalability, along with the speed and simplicity of the device, Felton said, that sets the Ion Torrent PGM apart.

Borrowing from biology

The PacBio RS, on the other hand, is set apart by its ability to read individual molecules of DNA. It does this by placing the DNA within Pacific Biosciences’ proprietary single-molecule real-time sequencing cell, or SMRT Cell. The inside of a SMRT Cell is lined with 150,000 tiny holes called zero-mode waveguides, which serve as nanoscale observation chambers during sequencing. These waveguides also contain an enzyme that catalyzes the polymerization, or bonding, of single nucleotides into a DNA strand.

Pacific Biosciences’ SMRT Cells contain enzymes that aid in the reconstruction of a sample’s DNA.
(Photo courtesy of Pacific Biosciences)

According to Jonas Korlach, chief scientific officer at Pacific Biosciences, the PacBio RS reads a sample’s DNA “by borrowing from the natural biological process by which cells duplicate their genome before they divide.”

In this process, an enzyme called DNA polymerase works its way along one strand of a DNA double helix. As it slides down this twisted ladder, the enzyme reads the string of nucleotides and, by adding the complementary nucleotide for each one, builds a new strand that pairs with the original, so that together the two form an exact copy of the original double helix.

“PacBio’s single molecule real-time sequencing eavesdrops on this process by monitoring the progression of individual DNA polymerase enzymes on DNA in real time, detecting and identifying which of the four letters in the DNA alphabet are sequentially added to the growing complementary DNA chain,” Korlach said.
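The idea of eavesdropping on a polymerase can be illustrated with a toy event stream. In the sketch below, one simulated polymerase adds the complementary base for each template position, and each addition is recorded as a timestamped pulse. Sorting the pulses by time recovers the new strand, and complementing it again gives back the template. The random waiting times are an assumption standing in for real enzyme kinetics, and strand orientation is ignored.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def observe_polymerase(template, rng):
    """Yield (time, base) pulses as one polymerase builds the complementary strand."""
    t = 0.0
    for base in template:
        t += rng.expovariate(1.0)  # assumed random wait before each incorporation
        yield t, COMPLEMENT[base]

def read_from_pulses(pulses):
    """Put detected pulses in time order to recover the newly built strand."""
    return "".join(base for _, base in sorted(pulses, key=lambda p: p[0]))

pulses = list(observe_polymerase("GATTACA", random.Random(1)))
new_strand = read_from_pulses(pulses)
template = "".join(COMPLEMENT[b] for b in new_strand)
print(new_strand, template)  # CTAATGT GATTACA
```

In a SMRT Cell this happens in many waveguides at once, which is what the simultaneous monitoring described next refers to.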

Because many polymerase enzymes can be monitored simultaneously, the PacBio RS can produce 50,000 to 70,000 reads in runtimes as short as 45 minutes to two hours. These reads also average 3,000 to 5,000 base pairs in length, much longer than those of other sequencing technologies, Korlach said. The PacBio RS II, launched in April of this year, doubles this throughput and cuts previous runtimes in half.

The PacBio RS machine mimics the natural biological process by which cells duplicate their DNA.
(Photo courtesy of Pacific Biosciences)

With the Illumina MiSeq technology, once the DNA sample has been split into smaller pieces, as in the Ion Torrent process, the ends of these fragments are fitted with adapters that prepare them to attach to the inside of proprietary flow cells.

According to Jeremy Preston, the director of product marketing for Illumina, these cells are essentially glass slides that are engraved with a number of channels. The surfaces of these channels are coated with millions of very small pieces of DNA that help the larger DNA fragments attach to the surface of the flow cell and lock them in place for the sequencing process.

The first step in the Illumina sequencing process is cluster generation, which uses a clonal amplification technique called bridge amplification to create thousands of copies of the original DNA fragment.

To read this large amount of DNA, Illumina uses its sequencing by synthesis, or SBS, technology and its proprietary reversible terminator chemistry.

“What that means is…a sequencing primer is flooded into the flow cells through the channels and it attaches to the fragments,” said Preston. Next, the DNA fragments are flooded with a mixture of A, T, C and G nucleotides that carry reversible dye-terminators, each labeled with a different fluorescent color.

“Every time we flood the system with the four nucleotides, the complementary nucleotide binds, we excite the nucleotide with a light source — could be a laser, could be an LED light — and then the fluorescent signal that comes off that is detected by a camera and an image is taken. And that’s captured that one base there,” Preston said. “Because we flood all four nucleotides together at the same time, it’s almost like a competitive reaction; only the base that is complementary to the base that’s there will hybridize and then the others are washed away.”
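Preston's description amounts to a four-way comparison each cycle. Whichever color is brightest marks the base that was added, and the reversible terminator ensures that only one base is added per cycle. The sketch below assumes per-cycle intensities for a single cluster have already been extracted from the camera images. The channel-to-base mapping and the brightest-over-total confidence score are simplifications for illustration, not Illumina's actual base caller or quality scores.

```python
# Assumed mapping of fluorescent channels to bases, for illustration only.
CHANNELS = ("A", "C", "G", "T")

def call_bases(cycles):
    """Call one base per cycle for a single cluster.

    cycles: a list of dicts, one per cycle, mapping each channel to its measured
    intensity. Returns the called sequence and a crude per-base confidence
    (brightest intensity divided by the total across all four channels).
    """
    sequence = []
    confidence = []
    for intensities in cycles:
        brightest = max(CHANNELS, key=lambda ch: intensities[ch])
        total = sum(intensities[ch] for ch in CHANNELS)
        sequence.append(brightest)
        confidence.append(intensities[brightest] / total if total else 0.0)
    return "".join(sequence), confidence

cycles = [
    {"A": 900, "C": 50, "G": 30, "T": 20},   # cycle 1: A channel dominates
    {"A": 40, "C": 60, "G": 20, "T": 880},   # cycle 2: T channel dominates
    {"A": 10, "C": 700, "G": 200, "T": 90},  # cycle 3: C wins over a weaker G signal
]
seq, conf = call_bases(cycles)
print(seq, conf)  # ATC [0.9, 0.88, 0.7]
```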

The Illumina MiSeq machine uses sequencing by synthesis to read large amounts of DNA.
(Photo courtesy of Illumina)

The current Illumina platform repeats this cycle enough times to generate 15 million reads of up to 250 base pairs each in less than 20 hours. But what really sets the MiSeq apart, Preston said, is that cluster generation and sequencing are carried out back to back in the same fully automated device, with no manual intervention or outside steps.
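The throughput figures quoted for the MiSeq and the PacBio RS are easier to compare as total bases per run, which is simply reads multiplied by read length. The MiSeq value is an upper bound because it assumes every read reaches the full 250 bases. The PacBio range multiplies the quoted read counts by the quoted average read lengths.

```python
def run_yield(reads, read_length_bp):
    """Total bases sequenced in one run: number of reads times read length."""
    return reads * read_length_bp

# Figures quoted in the article; the MiSeq number is an upper bound (every read at 250 bp).
miseq = run_yield(15_000_000, 250)
pacbio_low = run_yield(50_000, 3_000)
pacbio_high = run_yield(70_000, 5_000)

print(f"MiSeq:     up to {miseq / 1e9:.2f} gigabases per run")                      # 3.75
print(f"PacBio RS: about {pacbio_low / 1e6:.0f}-{pacbio_high / 1e6:.0f} megabases per run")  # 150-350
```

By this rough measure a MiSeq run yields roughly 10 to 25 times more sequence, though over a much longer runtime (up to 20 hours versus as little as 45 minutes to two hours). The PacBio RS trades volume for reads many times longer, which makes the assembly puzzle easier to solve.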

“All of [these machines] give you this population-level information; they do the data generation, the sequencing of the specimen, using different technology,” said Dr. Michael Shaw, the associate director of laboratory science for the CDC’s influenza division, in a recent interview. “The trick for all of them is having the computing power to do the bioinformatics analysis on the data that comes off, because what you’re looking at is not just gigabytes of information, but terabytes and petabytes, which requires massive computing power.”

Not only do you need supercomputers with advanced processing power and storage capacity, Shaw said, you also need high-speed networking to transfer the sequencing data from the machines doing the sequencing to the computers that run the analysis.

Putting the pieces together

Once this data transfer is made, the AMD initiative shifts to a reliance on human brainpower. CDC programmers design software that can detect specific markers, or genetic red flags, while filtering out irrelevant or benign sequences.
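The article does not describe how the CDC's marker-detection software works, so the following is only a hypothetical sketch of the general idea. It translates a gene into its protein sequence and checks specific positions against a table of red-flag substitutions. The `MARKERS` entry, an R-to-K change at position 3 of a toy protein, is invented for illustration and is not a real influenza marker. The codon table is the standard genetic code.

```python
# Standard genetic code: codons enumerated in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate a coding sequence into amino acids, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))

# Hypothetical marker table: (1-based position, reference residue, flagged residue, note).
MARKERS = [(3, "R", "K", "hypothetical drug-resistance marker")]

def flag_markers(cds, markers=MARKERS):
    """Return a description of every flagged substitution found in the translated gene."""
    protein = translate(cds)
    return [
        f"{ref}{pos}{alt}: {note}"
        for pos, ref, alt, note in markers
        if pos <= len(protein) and protein[pos - 1] == alt
    ]

print(translate("ATGGCTAAAGGT"))     # MAKG
print(flag_markers("ATGGCTAAAGGT"))  # ['R3K: hypothetical drug-resistance marker']
print(flag_markers("ATGGCTCGTGGT"))  # [] (reference residue R, nothing flagged)
```

A real pipeline would also need a curated marker list, alignment to a reference so that positions line up, and handling of ambiguous or incomplete codons.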

Overall, Shaw said, AMD technology allows researchers to process anywhere from 100 to 1,000 specimens a day. With the large amount of data this sequencing produces, analysts can determine whether a particular virus strain is developing drug resistance or the ability to spread more easily from person to person. They can also see the path along which a virus appears to be evolving, and engineer a vaccine that will protect against forthcoming strains.

With older methods, the CDC labs had to wait until they could get a sample of the virus, grow it in a lab setting, analyze it and filter out the unnecessary sequences by eye. With AMD, researchers are nearly two weeks ahead of schedule, “and with something that spreads as quick as influenza, two weeks can make a big difference,” Shaw said.

There is already one vaccine that is “fairly far along” in the battle against H7N9, according to Shaw. “[It’s] undergoing testing now and is close to being ready to make available to the vaccine manufacturers, so they can see how it works in their systems.”

But AMD doesn’t stop with influenza; it’s also being used to combat hospital-acquired infections, polio, foodborne outbreaks and unknown pathogen outbreaks.

“I think it’s got a tremendous future, because we’re seeing more and more of these cases where a zoonotic infection has been jumping from animals to humans. Sometimes they remain localized, but other times they evolve into a major human health risk,” Shaw said. “Being able to find out what’s going on in an outbreak at the very beginning gives you a greater chance to contain it and not let it become a larger health problem.”

Medill Today | June 11, 2025
