Why does sequencing virus proteins take time?



According to the paper below, the coronavirus spike protein sequence was available to scientists between the end of February and the beginning of March 2020. Why does sequencing a virus protein take so much time? (I am a software engineer currently studying molecular biology to better understand the pandemic, so I know very little about this!) I ask because, in another epidemic, we might again need to spend a lot of time sequencing the protein. This could be a real concern!

Sequencing a viral genome requires isolation of the virus, propagation in cell culture, extraction of nucleic acids, and preparation of a sequencing library. Once sequences are obtained, a genome can be assembled de novo using (untargeted) shotgun reads, and gaps in the genome can be spanned with (targeted) Sanger sequencing. This process can take weeks to months depending on the availability of resources and the growth characteristics of the virus.
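To make "assembled de novo" concrete for readers from software: the core idea is stitching overlapping reads into longer contigs. Below is a toy greedy overlap assembler in Python. It is a minimal sketch that assumes error-free reads from a single strand and an illustrative `min_overlap` of 3; the function names and read strings are hypothetical. Real assemblers use graph-based methods that tolerate sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_overlap)."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap; return what remains."""
    # Drop duplicate reads and reads fully contained in another read.
    # Sorting makes tie-breaking deterministic.
    contigs = [r for r in sorted(set(reads))
               if not any(r != other and r in other for other in reads)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: these contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]  # hypothetical shotgun reads
print(greedy_assemble(reads))
```

Where no overlap bridges two contigs, the sketch returns them separately; those are the kind of gaps that, per the answer above, can be spanned with targeted Sanger sequencing.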

For details on how the 2019-nCoV genome was sequenced, see Zhu et al., published January 2020:

A Novel Coronavirus from Patients with Pneumonia in China, 2019

Sequencing DNA does not take very long. (For the COVID-19 virus, which is an RNA virus, the RNA genome was first reverse-transcribed into DNA and then sequenced.) A NextSeq instrument can provide results in under 24 hours. Protein sequences are then inferred from the DNA sequence using gene/ORF-calling software.
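At its simplest, inferring a protein from DNA means scanning for an open reading frame (an ATG start codon followed by codons up to a stop codon) and translating each codon with the genetic code. This is a minimal Python sketch that assumes the standard genetic code, the forward strand only, and ATG as the sole start codon. Real gene-calling tools also scan the reverse complement, use statistical models, and handle special cases. `find_orfs` and `min_aa` are illustrative names.

```python
# Standard genetic code; codons enumerated with bases in the order T, C, A, G
# for the first, second and third positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(dna: str, min_aa: int = 30) -> list[tuple[int, str]]:
    """Return (start index, protein) for every ATG...stop ORF on the forward strand."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        protein = []
        for pos in range(start, len(dna) - 2, 3):
            # "X" stands in for codons containing ambiguous bases such as N.
            aa = CODON_TABLE.get(dna[pos:pos + 3], "X")
            if aa == "*":
                if len(protein) >= min_aa:
                    orfs.append((start, "".join(protein)))
                break
            protein.append(aa)
        # ORFs that run off the end without a stop codon are not reported.
    return orfs


print(find_orfs("CCATGAAATTTGGGTAGCC", min_aa=1))
```

Because every ATG is tried as a start, an ORF that contains internal ATGs is also reported in shorter, nested forms.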

Some of this is suggested by @acvill, though I think they are pessimistic about the time estimates. Viral genomes are pretty easy to assemble (a matter of hours). In principle you can skip the isolation step too, though that makes the analysis more complex.

What takes time is everything around the sequencing itself: the medical and bioinformatic work, such as isolation and analysis, and the ethics paperwork (necessary!).

The sequencing of COVID-19

Dr Charlotte Houldcroft emerges briefly from her tent within a lab on the Cambridge Biomedical Campus.


Labs across the country have converted to the genetic sequencing of coronavirus samples to help track its mutation and spread. The initiative, COG-UK, is being led by Cambridge. We spoke to one of the scientists lending their time and expertise.  

Prof Ian Goodfellow with Dr Charlotte Houldcroft and other COG-UK volunteers during pre-lockdown training (prior to social distancing directives).


The scientists generally work in pairs, swapping tasks to provide relief, and double-checking each other’s work. Virus samples arrive the day after being taken, attached to an anonymised barcode. Once the sample is prepped, it gets pipetted into the injection port of the latest handheld MinION sequencer.

“The chemical properties of each DNA base change the electrical current in the machine, so it reads DNA really fast,” explains Houldcroft. “Instead of taking up to two days, it takes one to eight hours.” This means her lab can sequence the genomes of between 24 and 70 virus samples a day.  

‘Wind the clock back’

A virus is essentially a parasitic packet of genetics programmed to copy itself inside a host. Coronaviruses are encased in a layer of fat, which is why soap is so effective, says Houldcroft. “It breaks down fat, and the genetic guts of the virus spill out.”

As SARS-CoV-2, the virus that causes COVID-19, replicates within a host, mistakes get made. Most of these little genetic mutations make no difference to the effectiveness of the virus. They can, however, be tracked by scientists through virus genome sequencing.

The minor mutations lead to subtly different lineages. This can be seen in the RNA sequence, and used to determine the phylogenetics: the coronavirus family tree, as it splits and diversifies. Very roughly, a mutation occurs every 20 “transmission events” or about once every two weeks.  

Anonymised tubes of coronavirus RNA to be prepared for sequencing by the COG-UK team.


“You look at the diversity in the genome and try to wind the clock back: working out what are mutations, when they occurred and so where this strain emerged and how it fits into the UK pattern,” says Houldcroft.
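Houldcroft's "wind the clock back" idea can be illustrated, very crudely, by counting the differences between two aligned genomes and applying the rough rate quoted above (about one mutation every two weeks). This is a hypothetical sketch: it assumes the genomes are already aligned to the same length, that the rate is constant, and that every mutation persists. Real phylogenetic analyses fit tree and substitution models to many genomes rather than using a single pairwise count.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of positions where two aligned, equal-length sequences differ.

    Positions with an ambiguous base (N) in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b)
               if a != b and "N" not in (a, b))


def rough_weeks_since_common_ancestor(differences: int,
                                      weeks_per_mutation: float = 2.0) -> float:
    """Crude molecular-clock estimate (assumed constant rate).

    After two lineages split, each accumulates mutations independently,
    so the observed differences are spread over two branches.
    """
    return differences / 2 * weeks_per_mutation


diffs = count_differences("ACGTACGT", "ACGAACGN")  # hypothetical aligned fragments
print(diffs, rough_weeks_since_common_ancestor(diffs))
```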

“Some of the first samples of the virus in the UK, the Chinese visitors in York and the cluster in Brighton, are not related that closely to anything we’re seeing in the country now. It suggests the tracking and tracing at the beginning worked.”

“It was not until it started to spread across Europe and the US that we get multiple introductions across the UK. Virus clusters in Wales, for example, had contributions from all over the world. There’s not a single Welsh strain.”

Unsurprisingly, perhaps, many UK strains are closely related to those in neighbouring countries, such as France and Holland, with London seeing the most viral diversity.

The sequencing data from across the nation is uploaded several times a day, feeding into a big picture that gets pored over at senior levels of the COG consortium. This can be used in mathematical models to provide better indications of infection rates.

It can also flag if part of the country starts “behaving weirdly,” says Houldcroft. “If you have a sudden expansion of a single viral lineage somewhere, you know you need to look closely at that area’s containment measures, focus resources.”

“If you have a COVID ward with multiple examples of the same lineage, you might have a local outbreak within a particular area, community or perhaps even hospital. Whereas diversity of viruses suggests pre-lockdown infection.” This genetic detective work can help locate transmission “hot spots”, or reassure that steps to control infection are working.

One of the handheld MinION DNA sequencers used by Charlotte and the COG-UK team.


One of the truly unnerving features of the new coronavirus is its unpredictability. Many suffer almost no symptoms, while some young and seemingly healthy people end up with pneumonia or worse. As well as short-term detection, COG-UK is building an invaluable resource for long-term prevention.

“Using electronic health records, we will ultimately be able to see if changes in the viral genome are associated with more or less severe symptoms, or cause problems for those with particular underlying conditions,” says Houldcroft. 

Herpes and Corona

SARS-CoV-2’s closest known relative was found in bats seven years ago in China. It is believed the new coronavirus jumped into the human “reservoir” via an intermediary species, possibly the much-persecuted pangolin, traded for its scales.

Members of this virus family – named for their protruding proteins resembling a spiky crown – have long troubled humans, causing childhood coughs and colds. Houldcroft points out that the genetics of familiar coronas date back to the Middle Ages.

“We get them as kids, build an immunity, and then the only reservoir available to the virus is the next generation. But COVID-19 is new to humans: no one has immunity.”    

“Mutations in the spike proteins, probably in the bat ancestor, is what made this virus so successful – helping it access the cells in our lungs. We’re vigilant for new mutations within these proteins, which could affect vaccination strategies.” 

COVID-19 vaccines focus on the spike protein – but here's another target


The latest results from the phase 3 COVID-19 vaccine trials have been very positive. These have shown that vaccinating people with the gene for the SARS-CoV-2 spike protein can induce excellent protective immunity.

The spike protein is the focus of most COVID-19 vaccines as it is the part of the virus that enables it to enter our cells. Virus replication only happens inside cells, so blocking entry prevents more virus being made. If a person has antibodies that can recognize the spike protein, this should stop the virus in its tracks.

The three most advanced vaccines (from Oxford/AstraZeneca, Pfizer/BioNTech and Moderna) all work by getting our own cells to make copies of the virus spike protein. The Oxford vaccine achieves this by introducing the spike protein gene via a harmless adenovirus vector. The other two vaccines deliver the spike protein gene directly as mRNA wrapped in a nanoparticle. When our own cells make the spike protein, our immune response will recognize it as foreign and start making antibodies and T cells that specifically target it.

However, the SARS-CoV-2 virus is more complicated than just a spike protein. There are, in fact, four different proteins that form the overall structure of the virus particle: spike, envelope (E), membrane (M) and nucleocapsid (N). In a natural infection, our immune system recognizes all of these proteins to varying degrees. So how important are immune responses to these different proteins, and does it matter that the first vaccines will not replicate these?

Following SARS-CoV-2 infection, researchers have discovered that we actually make the most antibodies to the N protein – not the spike protein. This is the same for many different viruses that also have N proteins. But how N protein antibodies protect us from infection has been a long-standing mystery. This is because N protein is only found inside the virus particle, wrapped around the RNA. Therefore, N protein antibodies cannot block virus entry, will not be measured in neutralization assays that test for this in the lab, and so have largely been overlooked.

Parts of the coronavirus, including the N protein. Credit: OSweetNature/Shutterstock

New mechanism discovered

Our latest work from the MRC Laboratory of Molecular Biology in Cambridge has revealed a new mechanism for how N protein antibodies can protect against viral disease. We have studied another virus containing an N protein called lymphocytic choriomeningitis virus and shown a surprising role for an unusual antibody receptor called TRIM21.

Whereas antibodies are typically thought to only work outside of cells, TRIM21 is only found inside cells. We have shown that N protein antibodies that get inside cells are recognized by TRIM21, which then shreds the associated N protein. Tiny fragments of N protein are then displayed on the surface of infected cells. T cells recognize these fragments, identify cells as infected, then kill the cell and consequently any virus.

We expect that this newly identified role for N protein antibodies in protecting against virus infection is important for SARS-CoV-2, and work is ongoing to explore this further. This suggests that vaccines that induce N protein antibodies, as well as spike antibodies, could be valuable, as they would stimulate another way by which our immune response can eliminate SARS-CoV-2.

Adding N protein to SARS-CoV-2 vaccines could also be useful because N protein is very similar between different coronaviruses, much more so than the spike protein. This means it's possible that a protective immune response against SARS-CoV-2 N protein could also offer some protection against other related coronaviruses, such as MERS.

Another potential benefit that may arise from including N protein in SARS-CoV-2 vaccines is due to the low mutation rates seen in the N protein sequence. Some changes to the sequence of SARS-CoV-2 have been reported over the course of this pandemic, with the most significant changes occurring in the spike protein. There is some concern that if the spike sequence alters too much, then new vaccines will be required. This could be similar to the current need for annual updating of influenza vaccines. However, as the N protein sequence is much more stable than the spike, vaccines that include a component targeting the N protein are likely to be effective for longer.

The first wave of SARS-CoV-2 vaccines brings genuine hope that this virus can be controlled by vaccination. From here it will be an ongoing quest to develop even better vaccines and ones that can remain effective in the face of an evolving virus. Future vaccines will probably focus on more than just the spike protein of SARS-CoV-2, and the N protein is a promising target to add to the current strategies being considered.

This article is republished from The Conversation under a Creative Commons license.



Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

  • Use the information below to generate a citation.
    • Authors: Julianne Zedalis, John Eggebrecht
    • Publisher/website: OpenStax
    • Book title: Biology for AP® Courses
    • Publication date: Mar 8, 2018
    • Location: Houston, Texas
    • Book URL:
    • Section URL:

    © Jan 12, 2021 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

    Cell-Surface Receptors

    Cell-surface receptors, also known as transmembrane receptors, are cell surface, membrane-anchored (integral) proteins that bind to external ligand molecules. This type of receptor spans the plasma membrane and performs signal transduction, in which an extracellular signal is converted into an intracellular signal. Ligands that interact with cell-surface receptors do not have to enter the cell that they affect. Cell-surface receptors are also called cell-specific proteins or markers because they are specific to individual cell types.

    Each cell-surface receptor has three main components: an external ligand-binding domain, a hydrophobic membrane-spanning region, and an intracellular domain inside the cell. The ligand-binding domain is also called the extracellular domain. The size and extent of each of these domains vary widely, depending on the type of receptor.

    Because cell-surface receptor proteins are fundamental to normal cell functioning, it should come as no surprise that a malfunction in any one of these proteins could have severe consequences. Errors in the protein structures of certain receptor molecules have been shown to play a role in hypertension (high blood pressure), asthma, heart disease, and cancer.

    How Viruses Recognize a Host

    Unlike living cells, many viruses do not have a plasma membrane or any of the structures necessary to sustain life. Some viruses are simply composed of an inert protein shell containing DNA or RNA. To reproduce, viruses must invade a living cell, which serves as a host, and then take over the host’s cellular apparatus. But how does a virus recognize its host?

    Viruses often bind to cell-surface receptors on the host cell. For example, the virus that causes human influenza (flu) binds specifically to receptors on membranes of cells of the respiratory system. Chemical differences in the cell-surface receptors among hosts mean that a virus that infects a specific species (for example, humans) cannot infect another species (for example, chickens).

    However, viruses have very small amounts of DNA or RNA compared to humans, and, as a result, viral reproduction can occur rapidly. Viral reproduction invariably produces errors that can lead to changes in newly produced viruses; these changes mean that the viral proteins that interact with cell-surface receptors may evolve in such a way that they can bind to receptors in a new host. Such changes happen randomly and quite often in the reproductive cycle of a virus, but the changes only matter if a virus with new binding properties comes into contact with a suitable host. In the case of influenza, this situation can occur in settings where animals and people are in close contact, such as poultry and swine farms. Once a virus jumps to a new host, it can spread quickly. Scientists watch newly appearing viruses (called emerging viruses) closely in the hope that such monitoring can reduce the likelihood of global viral epidemics.

    Cell-surface receptors are involved in most of the signaling in multicellular organisms. There are three general categories of cell-surface receptors: ion channel-linked receptors, G-protein-linked receptors, and enzyme-linked receptors.

    Figure 2. Gated ion channels form a pore through the plasma membrane that opens when the signaling molecule binds. The open pore then allows ions to flow into or out of the cell.

    Ion channel-linked receptors bind a ligand and open a channel through the membrane that allows specific ions to pass through. To form a channel, this type of cell-surface receptor has an extensive membrane-spanning region. In order to interact with the phospholipid fatty acid tails that form the center of the plasma membrane, many of the amino acids in the membrane-spanning region are hydrophobic in nature. Conversely, the amino acids that line the inside of the channel are hydrophilic to allow for the passage of water or ions. When a ligand binds to the extracellular region of the channel, there is a conformational change in the protein's structure that allows ions such as sodium, calcium, magnesium, and hydrogen to pass through (Figure 2).

    G-protein-linked receptors bind a ligand and activate a membrane protein called a G-protein. The activated G-protein then interacts with either an ion channel or an enzyme in the membrane (Figure 3). All G-protein-linked receptors have seven transmembrane domains, but each receptor has its own specific extracellular domain and G-protein-binding site.

    Cell signaling using G-protein-linked receptors occurs as a cyclic series of events. When the ligand binds, it reveals a new site on the receptor to which the inactive G-protein can then bind. Once the G-protein binds to the receptor, the resultant shape change activates the G-protein, which releases GDP and picks up GTP. The subunits of the G-protein then split into the α subunit and the βγ subunit. One or both of these G-protein fragments may be able to activate other proteins as a result. After a while, the GTP on the active α subunit of the G-protein is hydrolyzed to GDP and the βγ subunit is deactivated. The subunits reassociate to form the inactive G-protein and the cycle begins anew.
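For readers coming from software, the cycle described above can be modelled as a three-state machine. This is a deliberately simplified, hypothetical model: the state names are our own, and it ignores timing, the downstream targets of each subunit, and any regulatory proteins.

```python
from enum import Enum, auto


class GProteinState(Enum):
    INACTIVE_GDP = auto()       # alpha-beta-gamma heterotrimer with GDP bound
    BOUND_TO_RECEPTOR = auto()  # docked at the receptor site revealed by ligand binding
    ACTIVE_SPLIT = auto()       # GDP swapped for GTP; alpha and beta-gamma separated


def next_state(state: GProteinState, ligand_bound: bool) -> GProteinState:
    """Advance the simplified G-protein cycle by one step."""
    if state is GProteinState.INACTIVE_GDP:
        # Docking is only possible once a bound ligand has revealed the site.
        return GProteinState.BOUND_TO_RECEPTOR if ligand_bound else state
    if state is GProteinState.BOUND_TO_RECEPTOR:
        # Shape change: GDP is released, GTP is picked up, subunits split.
        return GProteinState.ACTIVE_SPLIT
    # GTP on the alpha subunit is hydrolyzed to GDP; subunits reassociate.
    return GProteinState.INACTIVE_GDP


state = GProteinState.INACTIVE_GDP
for _ in range(3):
    state = next_state(state, ligand_bound=True)
    print(state.name)
```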

    Figure 3. Heterotrimeric G proteins have three subunits: α, β, and γ. When a signaling molecule binds to a G-protein-coupled receptor in the plasma membrane, a GDP molecule associated with the α subunit is exchanged for GTP. The β and γ subunits dissociate from the α subunit, and a cellular response is triggered either by the α subunit or the dissociated βγ pair. Hydrolysis of GTP to GDP terminates the signal.

    G-protein-linked receptors have been extensively studied and much has been learned about their roles in maintaining health. Bacteria that are pathogenic to humans can release poisons that interrupt specific G-protein-linked receptor function, leading to illnesses such as pertussis, botulism, and cholera.

    Figure 4. Transmitted primarily through contaminated drinking water, cholera is a major cause of death in the developing world and in areas where natural disasters interrupt the availability of clean water. (credit: New York City Sanitary Commission)

    In cholera (Figure 4), for example, the water-borne bacterium Vibrio cholerae produces a toxin, choleragen, that binds to cells lining the small intestine. The toxin then enters these intestinal cells, where it modifies a G-protein that controls the opening of a chloride channel and causes it to remain continuously active, resulting in large losses of fluids from the body and potentially fatal dehydration.

    Modern sanitation largely eliminates the threat of cholera outbreaks, such as the one that swept through New York City in 1866. This poster from that era shows that, at that time, how the disease was transmitted was not understood.

    Enzyme-linked receptors are cell-surface receptors with intracellular domains that are associated with an enzyme. In some cases, the intracellular domain of the receptor itself is an enzyme. Other enzyme-linked receptors have a small intracellular domain that interacts directly with an enzyme. The enzyme-linked receptors normally have large extracellular and intracellular domains, but the membrane-spanning region consists of a single alpha-helical region of the peptide strand. When a ligand binds to the extracellular domain, a signal is transferred through the membrane, activating the enzyme. Activation of the enzyme sets off a chain of events within the cell that eventually leads to a response. One example of this type of enzyme-linked receptor is the tyrosine kinase receptor (Figure 5). A kinase is an enzyme that transfers phosphate groups from ATP to another protein. The tyrosine kinase receptor transfers phosphate groups to tyrosine molecules (tyrosine residues). First, signaling molecules bind to the extracellular domain of two nearby tyrosine kinase receptors. The two neighboring receptors then bond together, or dimerize. Phosphates are then added to tyrosine residues on the intracellular domain of the receptors (phosphorylation). The phosphorylated residues can then transmit the signal to the next messenger within the cytoplasm.

    Practice Question

    Figure 5. A receptor tyrosine kinase is an enzyme-linked receptor with a single transmembrane region, and extracellular and intracellular domains. Binding of a signaling molecule to the extracellular domain causes the receptor to dimerize. Tyrosine residues on the intracellular domain are then autophosphorylated, triggering a downstream cellular response. The signal is terminated by a phosphatase that removes the phosphates from the phosphotyrosine residues.

    HER2 is a receptor tyrosine kinase. In 30 percent of human breast cancers, HER2 is permanently activated, resulting in unregulated cell division. Lapatinib, a drug used to treat breast cancer, inhibits HER2 receptor tyrosine kinase autophosphorylation (the process by which the receptor adds phosphates onto itself), thus reducing tumor growth by 50 percent. Besides autophosphorylation, which of the following steps would be inhibited by Lapatinib?

    Viruses revealed to be a major driver of human evolution

    The constant battle between pathogens and their hosts has long been recognized as a key driver of evolution, but until now scientists have not had the tools to look at these patterns globally across species and genomes. In a new study, researchers apply big-data analysis to reveal the full extent of viruses' impact on the evolution of humans and other mammals.

    Their findings suggest an astonishing 30 percent of all protein adaptations since humans' divergence from chimpanzees have been driven by viruses.

    "When you have a pandemic or an epidemic at some point in evolution, the population that is targeted by the virus either adapts, or goes extinct. We knew that, but what really surprised us is the strength and clarity of the pattern we found," said David Enard, Ph.D., a postdoctoral fellow at Stanford University and the study's first author. "This is the first time that viruses have been shown to have such a strong impact on adaptation."

    The study was recently published in the journal eLife and will be presented at The Allied Genetics Conference, a meeting hosted by the Genetics Society of America, on July 14.

    Proteins perform a vast array of functions that keep our cells ticking. By revealing how small tweaks in protein shape and composition have helped humans and other mammals respond to viruses, the study could help researchers find new therapeutic leads against today's viral threats.

    "We're learning which parts of the cell have been used to fight viruses in the past, presumably without detrimental effects on the organism," said the study's senior author, Dmitri Petrov, Ph.D., Michelle and Kevin Douglas Professor of Biology and Associate Chair of the Biology Department at Stanford. "That should give us an insight on the pressure points and help us find proteins to investigate for new therapies."

    Previous research on the interactions between viruses and proteins has focused almost exclusively on individual proteins that are directly involved in the immune response -- the most logical place you would expect to find adaptations driven by viruses. This is the first study to take a global look at all types of proteins.

    "The big advancement here is that it's not only very specialized immune proteins that adapt against viruses," said Enard. "Pretty much any type of protein that comes into contact with viruses can participate in the adaptation against viruses. It turns out that there is at least as much adaptation outside of the immune response as within it."

    The team's first step was to identify all the proteins that are known to physically interact with viruses. After painstakingly reviewing tens of thousands of scientific abstracts, Enard culled the list to about 1,300 proteins of interest. His next step was to build big-data algorithms to scour genomic databases and compare the evolution of virus-interacting proteins to that of other proteins.

    The results revealed that adaptations have occurred three times as frequently in virus-interacting proteins compared with other proteins.

    "We're all interested in how it is that we and other organisms have evolved, and in the pressures that made us what we are," said Petrov. "The discovery that this constant battle with viruses has shaped us in every aspect -- not just the few proteins that fight infections, but everything -- is profound. All organisms have been living with viruses for billions of years; this work shows that those interactions have affected every part of the cell."

    Viruses hijack nearly every function of a host organism's cells in order to replicate and spread, so it makes sense that they would drive the evolution of the cellular machinery to a greater extent than other evolutionary pressures such as predation or environmental conditions. The study sheds light on some longstanding biological mysteries, such as why closely-related species have evolved different machinery to perform identical cellular functions, like DNA replication or the production of membranes. Researchers previously did not know what evolutionary force could have caused such changes. "This paper is the first with data that is large enough and clean enough to explain a lot of these puzzles in one fell swoop," said Petrov.

    The team is now using the findings to dig deeper into past viral epidemics, hoping for insights to help fight disease today. For example, HIV-like viruses have swept through the populations of our ancestors as well as other animal species at multiple points throughout evolutionary history. Looking at the effects of such viruses on specific populations could yield a new understanding of our constant war with viruses -- and how we might win the next big battle.

    Influenza, commonly known as the flu, is caused by a virus that attacks the respiratory tract (the nose, throat and lungs). Cold and dry weather allows the virus to survive longer outside the body than warm weather does. Therefore, in temperate regions like North America, the time of year when we are planning to enjoy Halloween, Thanksgiving, or Christmas is also when we or our family members have a higher chance of getting the flu.

    There are three types of influenza virus: A, B and C. Type A can infect humans, other mammals and birds, and can spread fast and affect many people. Types B and C affect only humans, and type C causes only a mild infection. Influenza type A viruses are subtyped according to two proteins on the surface of the virus, hemagglutinin and neuraminidase; for example, H1N1 and H3N2 are two type A subtypes. The virus uses the hemagglutinin protein (often abbreviated "H" or "HA") to latch on to the host's cell and uses the neuraminidase protein (often abbreviated "N" or "NA") to spread the infection. Types A and B viruses continually evolve genetically, with changes being made to the amino acid sequences of the H and N proteins. Because hosts recognize the H and N surface proteins to identify and attack the virus, by changing these proteins a little at a time the virus prevents hosts from enjoying prolonged protection against it.

    When a person is vaccinated with the influenza vaccine, it should stimulate a protective immune response, particularly against the viral surface proteins of the strains used to make that vaccine. The influenza vaccine typically contains three virus strains: two type A subtypes and one type B strain. Type C is not included in the vaccine because it only causes a mild illness and does not lead to epidemics. To make the influenza vaccine, gene fragments that encode the H and N viral surface proteins are taken from each strain. For the vaccine to give a person good protection against the virus, the protein sequences of the H and N proteins used in the vaccine should closely match the sequences in the strains the person may be exposed to. Every February, the World Health Organization (WHO), based on analyses from laboratories across the globe, decides which influenza virus strains to include in the vaccine for the coming flu season.

    How can scientists check that the protein sequences of the H and N proteins used in the vaccine match the ones in the virus strains they want to protect people against? If you imagine that you can hold the H or N protein with both hands and stretch it out, you will then have a linear protein sequence in your hands. A protein sequence is made up of amino acids. Whereas the English alphabet has 26 letters, proteins are "spelled" with 20 standard amino acids. In English, it is easy to align two words and compare their spellings. Even so, there is often more than one possible alignment, as shown in Figures 1 and 2. In Figure 1, one possible alignment of the words "strawberry" and "blueberry" is shown, where the only matching letter, "r," is highlighted in red.

    s t r a w b e r r y
    b l u e b e r r y _
    Figure 1. One possible alignment of the words "strawberry" and "blueberry," showing the matching single letter "r" in this alignment highlighted in red.

    In Figure 2, another possible alignment of these words is shown, where several matching letters, spelling "berry," are highlighted in red.

    s t r a w b e r r y
    _ b l u e b e r r y
    Figure 2. A second possible alignment of the words "strawberry" and "blueberry," showing the matching letters "berry" in this alignment highlighted in red.

    For the words "strawberry" and "blueberry," the alignment in Figure 2 clearly gives a greater number of matched letters. Similarly, you can take two protein sequences and compare how alike their spellings are; in bioinformatics, this is called sequence alignment.
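    To make the comparison of the two figures concrete, here is a minimal Python sketch that counts the matching columns in an alignment that has already been written out. It assumes both rows have the same length and that "_" marks a gap, as in Figures 1 and 2; the function name `count_matches` is only illustrative.

```python
def count_matches(top: str, bottom: str, gap: str = "_") -> int:
    """Count columns where both aligned rows hold the same (non-gap) letter."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(top, bottom) if a == b and a != gap)


# Figures 1 and 2, written without the spaces between letters.
print(count_matches("strawberry", "blueberry_"))  # 1  (Figure 1: only "r")
print(count_matches("strawberry", "_blueberry"))  # 5  (Figure 2: "berry")
```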

    This example is simple enough to align by hand. Protein sequences, however, can be over 100 letters long, which makes aligning them manually much more difficult and time-consuming. Luckily, bioinformatics comes to the rescue. Bioinformatics is the collection and analysis of large amounts of biological data using computers and computational and statistical methods.
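    A common way for a computer to find the best alignment, rather than just scoring one written by hand, is dynamic programming. The sketch below is a basic Needleman-Wunsch global alignment (end to end). The scores (+1 for a match, -1 for a mismatch or a gap) are arbitrary assumptions for illustration; real protein tools typically use substitution matrices such as BLOSUM62 and separate gap-opening and gap-extension penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b), with "_" marking gaps.
    """
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("_")
            i -= 1
        else:
            top.append("_")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best_score, row1, row2 = global_align("strawberry", "blueberry")
print(best_score)  # 0
print(row1)
print(row2)
```

    With these scores, no alignment of "strawberry" and "blueberry" beats the one in Figure 2 (five matches, four mismatches and one gap, for a total of 0), but several alignments tie with it, so which one is printed depends on how ties are broken during the traceback.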

    A powerful Internet-based bioinformatics tool for aligning sequences is BLAST, which stands for Basic Local Alignment Search Tool. It aligns your query sequence either against a collection of sequences stored in a database or against a specific second sequence you are interested in, and it reports which sequences or segments are similar to your query.
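    The "Local" in BLAST means it looks for the best-matching segments of two sequences instead of aligning them end to end. BLAST itself uses a fast heuristic (it finds short exact "seed" matches and extends them) rather than filling in a full table, so the code below is not how BLAST works internally. It is a minimal sketch of Smith-Waterman local alignment, the exact method whose kind of result BLAST approximates; the scoring values are again illustrative assumptions.

```python
def local_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Smith-Waterman local alignment: best-scoring segment pair, "_" = gap."""
    n, m = len(a), len(b)
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1];
    # the floor of 0 lets an alignment start fresh anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("_")
            i -= 1
        else:
            top.append("_")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("strawberry", "blueberry"))  # (5, 'berry', 'berry')
```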

    All else being equal, we would expect a strong match between the H and/or N protein sequences used in the vaccine and the corresponding sequences in the "wild" virus to result in good protection against that virus, and a poor match to result in weak protection. But to create a strong match, the WHO must accurately predict which strains people should be vaccinated against for the upcoming flu season. Is the prediction always accurate? How often is there a good match, and how often does the prediction fail, leaving the vaccine unable to protect well against the season's common strains? In this genetics and genomics science project, you will use BLAST to measure the quality of the match and estimate the effectiveness of a vaccine against different viruses.
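    One simple way to put a number on "match quality" is percent identity: the share of alignment columns in which both sequences carry the same residue. BLAST reports a similar "Identities" value for each hit. The sketch below assumes the sequences are already aligned; the ten-letter fragments are invented for illustration and are not real hemagglutinin or neuraminidase sequences.

```python
def percent_identity(top: str, bottom: str, gap: str = "_") -> float:
    """Identical columns as a percentage of alignment length (gap columns count)."""
    if len(top) != len(bottom) or not top:
        raise ValueError("need two non-empty aligned rows of equal length")
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != gap)
    return 100 * identical / len(top)


# Hypothetical ten-residue fragments differing at one position (I vs V).
vaccine_fragment = "ACDEFGHIKL"
wild_fragment = "ACDEFGHVKL"
print(percent_identity(vaccine_fragment, wild_fragment))  # 90.0
print(percent_identity("strawberry", "_blueberry"))  # 50.0 (Figure 2)
```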

    Epidemiology in the Virology Laboratory

    During the outbreak in 1993, definitive proof that the agent causing HPS was a novel hantavirus was obtained using a genetic detection assay. Oligonucleotide primers were designed on the basis of regions of the M segment (G2 coding region) that are conserved among hantaviruses, and were used in a nested RT-PCR assay to amplify hantavirus-specific DNA fragments from RNA extracted from patients' tissues. The amplified DNA fragments were then sequenced. Comparative and phylogenetic analyses of the resulting sequence data showed that the hantavirus associated with the HPS outbreak (SNV) was a novel virus most closely related to Prospect Hill virus (PHV). In addition, a direct genetic link was made between the human HPS cases and the virus harbored by peridomestic P. maniculatus rodents: hantaviral genetic sequences recovered from human tissues were identical to those from rodents captured at the site of the patient's presumed infection. This characterization has continued to help identify the site of infection when more than one possible site exists, and therefore to focus the public health response. These techniques also allow a specific rodent host to be implicated in areas where several host species overlap.
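    Designing primers on regions conserved among known hantaviruses makes it more likely that they will also bind a related but previously unseen virus. The sketch below shows only the first step of that idea: scanning a multiple alignment for runs of columns where every sequence carries the same base. The aligned sequences are made up, `min_len` is an arbitrary threshold, and real primer design also weighs factors such as melting temperature and GC content.

```python
def conserved_runs(aligned: list, min_len: int, gap: str = "_") -> list:
    """Return (start, end) column ranges, end exclusive, where all sequences agree."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    runs, start = [], None
    for col in range(length):
        column = {s[col] for s in aligned}
        conserved = len(column) == 1 and gap not in column
        if conserved and start is None:
            start = col  # a conserved run begins
        elif not conserved and start is not None:
            if col - start >= min_len:  # keep runs long enough to matter
                runs.append((start, col))
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))  # run reaching the end of the alignment
    return runs


# Hypothetical aligned fragments from three related viruses.
seqs = [
    "ATGGCTAGCTTAGGACCT",
    "ATGGCTAGCATAGGACCT",
    "ATGGCTAGCTTAGGACGT",
]
runs = conserved_runs(seqs, min_len=6)
print(runs)  # [(0, 9), (10, 16)]
for start, end in runs:
    print(seqs[0][start:end])  # candidate primer regions: ATGGCTAGC, TAGGAC
```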

    Vaccine vs Spike Protein

    Given how crucial the spike protein is to the virus, many antiviral vaccines or drugs are targeted to viral glycoproteins.

    For SARS-CoV-2, the vaccines produced by Pfizer/BioNTech and Moderna give instructions to our immune system to make our own version of the spike protein, which happens shortly following immunisation. Production of the spike inside our cells then starts the process of protective antibody and T cell production.

    (Image: The SARS-CoV-2 virus is changing over time. Credit: NIAID-RML, CC BY)

    One of the most concerning features of the SARS-CoV-2 spike protein is how it changes over time as the virus evolves. Because the spike is encoded within the viral genome, mutations can alter its biochemical properties.

    Most mutations will not be beneficial and either stop the spike protein from working or have no effect on its function. But some may cause changes that give the new version of the virus a selective advantage by making it more transmissible or infectious.
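    Such mutations are usually written as the original amino acid, its position in the protein, and the new amino acid; for example, D614G in the SARS-CoV-2 spike means that aspartate (D) at position 614 was replaced by glycine (G). Below is a minimal sketch that lists substitutions between a reference and a variant sequence, assuming the two are already aligned with no insertions or deletions. The short sequences are invented, and the `start` parameter is a hypothetical convenience for numbering a fragment from its position in the full protein.

```python
def substitutions(reference: str, variant: str, start: int = 1) -> list:
    """List substitutions as e.g. 'F5W' (reference residue, position, new residue)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned without insertions or deletions")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=start)
        if ref != alt
    ]


# Hypothetical reference and variant fragments.
print(substitutions("ACDEFGHIKL", "ACDEWGHIKY"))  # ['F5W', 'L10Y']
print(substitutions("ACDEFGHIKL", "ACDEWGHIKY", start=100))  # ['F104W', 'L109Y']
```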

    One way this could occur is through a mutation on a part of the spike protein that prevents protective antibodies from binding to it. Another way would be to make the spikes “stickier” for our cells.

    This is why new mutations that alter how the spike functions are of particular concern – they may impact how we control the spread of SARS-CoV-2. The new variants found in the UK and elsewhere have mutations across spike and in parts of the protein involved in getting inside your cells.

    Experiments will have to be conducted in the lab to ascertain if – and how – these mutations significantly change the spike, and whether our current control measures remain effective.


    Connor Bamford is a Research Fellow, Virology, Queen’s University Belfast. Bamford is a virologist with over a decade of experience in studying how the immune system defends humans and other animals against disease-causing microbes, including viruses such as hepatitis C virus, influenza virus and Zika virus. Bamford recently moved to Queen’s University Belfast as a ‘Wellcome Trust Institutional Strategic Support Fund (ISSF) Early Career Research Fellow’ to continue his research into the human immune system and antiviral proteins called ‘interferons’. He obtained his PhD in 2014 in molecular virology studying the mumps virus before carrying out his postdoctoral research at the MRC-University of Glasgow Centre for Virus Research (CVR) in Scotland, UK.