A Look Into Bioinformatics

Sahasra Pokkunuri
6 min read · Dec 2, 2020
(Image by iStock)

Imagine that your family has a history of muscular dystrophy, a condition characterized by progressive weakening of the muscles that can be fatal and is more common in males. If you had a son, you might be concerned for his health and want to find out whether he has muscular dystrophy.

How would you do this? Well, the answer lies in bioinformatics.

Bioinformatics allows a person’s DNA to be sequenced and compared with the DNA of a healthy individual to look for differences. If a specialist detects a change linked to muscular dystrophy, there is a chance that the person being tested will develop symptoms of the condition. Although it’s crazy to think about, bioinformatics has been a reality since the 1970s!
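To make the comparison idea concrete, here is a minimal sketch in Python. It assumes the two sequences are already the same length and lined up, which real tools handle with an alignment step that is skipped here. The function name `find_differences` and both sequences are made up for illustration and are not real gene data.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. This sketch assumes the sequences are already
    aligned and equal in length; real pipelines align them first.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length for this simple comparison")
    pairs = zip(reference.upper(), sample.upper())
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(pairs, start=1)
        if ref_base != sample_base
    ]


# Made-up sequences for illustration only.
reference = "ATGCTTTGGTGGGAAGAAGTA"
sample = "ATGCTTTGGTAGGAAGAAGTA"

differences = find_differences(reference, sample)
for position, ref_base, sample_base in differences:
    print(f"Position {position}: {ref_base} in reference, {sample_base} in sample")
```

A difference found this way is only a starting point: a specialist would still need to check whether that particular change is actually linked to a condition.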

Sequencing Methods in Bioinformatics

Bioinformatics is the use of software tools to understand and interpret biological and genomic data. In the situation above, sequencing anyone’s DNA manually would take years of grueling work, which would also mean higher expenses. Thanks to advances in sequencing machinery, the human genome can now be sequenced for roughly $100, which is a huge difference compared to the cost of the Human Genome Project, which was completed in 2003.

The machinery currently used for genome sequencing is clearly effective: it has not only reduced costs dramatically but also made sequencing much faster. Before we look into the recent technologies used to sequence DNA, we first need to understand the techniques these devices rely on. In bioinformatics, there are two important types of sequencing methods: Sanger sequencing and next-generation sequencing.

Sanger Sequencing

In any sequencing method, we first need to identify a region of interest. In Sanger sequencing, a primer binds next to the target region, which prompts DNA polymerase to start adding complementary nucleotides along the target DNA strand (similar to DNA replication). Left alone, however, the polymerase would simply copy the whole region without telling us which base sits at each position, so the extension has to be stopped at known points.

To stop the process, Sanger (the scientist who created this method) used modified nucleotides that lack the hydroxyl (OH) group at the 3' position of the sugar, known as dideoxynucleotides. Once one of these is incorporated, polymerase cannot add any further nucleotides, so extension of that strand ends there. Repeated across many copies of the template, this process results in “extensions” of DNA of many different lengths, each with a fluorescently marked chain-terminating nucleotide at the 3' end.
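Here is a small Python sketch of the chain-termination idea. It is a simplification under stated assumptions: the template is written in the order the polymerase copies it, the primer is left out, and instead of simulating random termination it simply lists every fragment that could result from a dideoxynucleotide stopping extension at each position. The function names and the template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Build the strand a polymerase would synthesize, base by base.

    Assumption: the template is written in the order it is copied,
    so strand direction is ignored in this sketch.
    """
    return "".join(COMPLEMENT[base] for base in template.upper())


def chain_terminated_fragments(template: str) -> list[str]:
    """List every fragment that ends where a dideoxynucleotide could stop extension."""
    new_strand = complementary_strand(template)
    return [new_strand[:length] for length in range(1, len(new_strand) + 1)]


template = "TACGGA"  # made-up template, for illustration only
fragments = chain_terminated_fragments(template)
for fragment in fragments:
    # The "*" stands in for the fluorescent label on the terminating base.
    print(fragment + "*")
```

In a real reaction the fragments come out as a random mixture rather than a tidy list, which is why a separation step is needed afterwards.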

To read the target sequence, the generated extensions are separated by size through capillary electrophoresis (CE). The extension products are injected into a thin glass capillary filled with a gel polymer, and an electric field is applied across it. Because DNA is negatively charged, every fragment migrates toward the positive end, and shorter fragments move through the gel faster than longer ones.

When the extension products reach the end of the capillary, a laser excites the fluorescent dye on the chain-terminating nucleotide of each DNA fragment. Each of the four chain-terminating nucleotides (A, G, T, and C) carries a dye that emits a different wavelength, so once a light sensor records the wavelength, bioinformaticists can use software to identify the base. Because the fragments arrive in order of length, the series of identified bases gives the sequence, and with additional software we can view its electropherogram.
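The read-out step can be sketched in a few lines of Python. The assumption here is that the detector gives us one (fragment length, detected base) pair per fragment; real instruments record light intensity over time rather than tidy pairs, and the `detections` list below is invented for illustration.

```python
def read_sequence(detections: list[tuple[int, str]]) -> str:
    """Rebuild a sequence from (fragment_length, terminal_base) pairs.

    Shorter fragments reach the detector first, so sorting by length
    mimics the order in which the bases are seen.
    """
    ordered = sorted(detections, key=lambda detection: detection[0])
    return "".join(base for _, base in ordered)


# Made-up detections, listed in a jumbled order on purpose.
detections = [(4, "C"), (1, "A"), (6, "T"), (2, "T"), (5, "C"), (3, "G")]

sequence = read_sequence(detections)
print(sequence)
```

The key design point is that the length of each fragment tells us the position of its final base, so sorting by length is all it takes to put the bases in order.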

A sequencing chromatogram comparing a patient’s DNA with their mother’s (Image by Walter Hunziker).
A diagram of capillary electrophoresis (Image by Michel G. Gauthier).

Next-Generation Sequencing (NGS)

The term “next-generation sequencing” refers to any modern sequencing technology that uses massively parallel sequencing. Although it bears some resemblance to the Sanger method, there are major differences that make NGS more cost-efficient and time-saving.

Most importantly, NGS allows bioinformaticists to analyze several million DNA sequences at once; unlike Sanger sequencing, it doesn’t read one sequence at a time and thus takes far less time. Despite the promise of next-generation sequencing, however, it has yet to become a “conventional reality” everywhere.

Massively parallel sequencing is not yet commonly used in forensic laboratories, since the design and workflow of these devices differ greatly from conventional DNA sequencing methods, which makes some people wary. The use of NGS in crime units is still a contentious idea right now, but this might change sometime in the future!

An image of genes with fluorescent markers from next-generation sequencing (Image by iStock).

Technologies Used in Bioinformatics

Bioinformatics is an interdisciplinary field that combines software tools with biology to understand biological data, so technology is involved at every step. Here are some of the key technological components in this field, along with their purposes and some examples:

Software

Whether it is used to analyze an unidentified DNA sequence or to serve as a repository for the sequences of specific people, software plays a significant role in bioinformatics.

After the DNA fragments go through capillary electrophoresis in Sanger sequencing, the raw data is just a set of unlabeled signal graphs, which isn’t useful on its own. Therefore, we need software to analyze these graphs and assign a nucleotide base to each peak based on its characteristics. In this situation, machine learning could be especially helpful for identifying the prominent features of chromatogram peaks for adenine, thymine, cytosine, and guanine. There are several apps on the market, such as 4Peaks (Mac) and Chromas Lite (Windows), that make this process more convenient.
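As a rough illustration of what “assigning a base” can mean, here is a Python sketch that uses a plain rule rather than machine learning: at each peak, pick the channel with the strongest signal, and write "N" (the standard code for an unknown base) when no channel clearly wins. The intensity values, the `min_ratio` threshold, and the function names are all assumptions made up for this example; this is not how 4Peaks or Chromas Lite are known to work internally.

```python
def call_base(peak: dict[str, float], min_ratio: float = 2.0) -> str:
    """Pick the base whose channel is clearly strongest at one peak.

    Returns "N" when the top signal is not at least `min_ratio` times
    the runner-up (an arbitrary threshold chosen for this sketch).
    """
    ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_signal), (_, runner_up_signal) = ranked[0], ranked[1]
    if top_signal <= 0:
        return "N"
    if top_signal >= min_ratio * runner_up_signal:
        return top_base
    return "N"


def call_bases(trace: list[dict[str, float]]) -> str:
    """Call one base per peak and join the calls into a sequence."""
    return "".join(call_base(peak) for peak in trace)


# Made-up fluorescence intensities for five peaks, one value per channel.
trace = [
    {"A": 950, "C": 40, "G": 30, "T": 20},
    {"A": 25, "C": 60, "G": 35, "T": 880},
    {"A": 30, "C": 45, "G": 910, "T": 50},
    {"A": 410, "C": 20, "G": 390, "T": 15},  # two channels nearly tied
    {"A": 10, "C": 870, "G": 55, "T": 35},
]

called = call_bases(trace)
print(called)
```

The fourth peak is where a simple rule like this gives up, and it is exactly the kind of ambiguous case where a smarter model could help.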

Software in bioinformatics can also serve as a storage unit for sequences collected for medical reasons. Let’s say you’re an oncologist who wants to research possible ways of inhibiting leukemia, and you’re not sure where to start. Luckily, there are several databases with growing numbers of DNA sequences from individuals with leukemia, allowing you to perform research with much more ease. These “DNA hubs,” as I like to call them, are usually developed by large research facilities whose goals are similar to those of your oncology research.

(Image by Adobe Stock)

Devices

Sequencing DNA manually is far too costly and time-consuming, and the general advancement of technology has brought major gifts to the bioinformatics industry, so scientists rely on machines to sequence DNA. These DNA sequencers use either the Sanger sequencing method or a next-generation sequencing method to produce a DNA sequence that can be read from a text file. In 1987, Applied Biosystems introduced the first automated DNA sequencer, based on the work of Lloyd Smith and his colleagues, helping to kickstart a revolutionary change in the way DNA is sequenced.
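To show what reading a sequence from a text file can look like, here is a minimal Python sketch. The article doesn’t say which file format sequencers produce, so this assumes FASTA, a widely used plain-text format in which a line starting with ">" names a sequence and the lines after it hold the bases. The record names and sequences below are made up.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {sequence_id: sequence}.

    The ID is the first word after ">"; sequence lines that follow
    are joined together until the next header line.
    """
    chunks: dict[str, list[str]] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header_words = line[1:].split()
            current_id = header_words[0] if header_words else ""
            chunks[current_id] = []
        elif current_id is not None:
            chunks[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


# Made-up FASTA text; in practice this would be read from a file.
fasta_text = """>sample_1 hypothetical read
ATGCTTTGG
TGGGAAG
>sample_2
ATGC
"""

records = parse_fasta(fasta_text)
for seq_id, seq in records.items():
    print(seq_id, len(seq), seq)
```

For real work, an established library is a safer choice than a hand-written parser, since actual files come with more edge cases than this sketch handles.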

Notable manufacturers of DNA sequencers include:

  • Roche
  • Illumina
  • Life Technologies
An image of DNA sequencers (Image by Applied Biosystems).

Key Takeaways

  • Bioinformatics is the field that uses software tools to analyze biological data.
  • With respect to genome sequencing, bioinformatics uses two critical methods to sequence DNA: Sanger sequencing and next-generation sequencing.
  • The forms of technology involved in bioinformatics include software (to either store genetic data or analyze it) and devices (particularly DNA sequencers).

Sahasra Pokkunuri

I’m 17 and like writing and reading, but more writing.