Effect of a doubling of the start codon in a gene

I am learning about frameshift mutations. Frameshifts can occur due to a nucleotide deletion. Suppose that due to a frameshift, because of a deletion somewhere upstream from the original start codon, two additional start codons are generated, just before the stop codon in the new reading frame. What would happen in terms of translation?

AUG-GCC-AUA-AUG--------UAA Start Start then stop

There is a basic misconception in the question you have asked, which @biogirl has explained. Each open reading frame has one start codon, and that codon defines the reading frame.

All other AUGs in the open reading frame are simply codons that encode the amino acid methionine and play no role in starting translation. Factors other than the AUG itself determine where translation starts.

So a frameshift that gives you an additional AUG only means that a different amino acid is encoded at that position in the resulting polypeptide. A frameshift will generally alter the protein product of the gene completely. If, however, the frameshift disrupts the start codon, it is unlikely that you will get any translation whatsoever, because the other elements needed to determine the start of translation will probably not be present elsewhere in the coding sequence. In prokaryotes, a Shine-Dalgarno sequence is needed to initiate translation. In eukaryotes, not all of the factors governing translation start are well understood, but many genes carry a Kozak sequence that indicates to the ribosome the start of the open reading frame.

The more important codons to look for are newly introduced stop codons. The three stop codons, UAA, UAG, and UGA, have no tRNAs with complementary anticodons (for the most part, since tRNA genes can also sustain mutations that change their anticodon), so translation terminates whenever the shifted frame leads the ribosome to read one of them in frame.

AUG functions as a start codon only when it is at the first position of the open reading frame. Wherever AUG appears within the reading frame, it codes for the amino acid methionine. Go through the basics of translation in a good textbook.

The start codon is not sufficient to start translation. A ribosomal binding site is also required. It's likely that translation would initiate at its normal location and then simply proceed through any additional start codons in the new ORF until it reaches the first stop codon.
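The point made in the answers above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and takes the initiation position as given (the `start` argument), since the answers stress that a ribosome binding site, not the AUG alone, decides where translation begins. The function name `translate_from` and the test sequences are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_from(mrna, start):
    """Translate mrna from index `start` until the first in-frame stop codon."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

# The example from the question: AUG-GCC-AUA-AUG-UAA.
mrna = "AUGGCCAUAAUGUAA"
print(translate_from(mrna, mrna.find("AUG")))      # internal AUG is read as Met

# Deleting one nucleotide after the start codon shifts the frame.
shifted = mrna[:3] + mrna[4:]
print(translate_from(shifted, shifted.find("AUG")))
```

In the first call the second AUG simply adds a methionine to the chain. In the second call the shifted frame runs into a stop codon early, which is the outcome the answers describe as the more important consequence of a frameshift.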

TITER: predicting translation initiation sites by deep learning

Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), the translation process may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.

Methods: We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.

Results: Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of the upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.

Availability and implementation: TITER is available as an open-source software and can be downloaded from .


Supplementary information: Supplementary data are available at Bioinformatics online.



Our finding that replacing the AUA start codon of nAuORF2 with the non-cognate triplet AAA abolished β-galactosidase production from the nAuORF-lacZ construct supports the conclusion reached from ribosome profiling (17) that the AUA start codon of GCN4 nAuORF2 is recognized in vivo. At the same time, it suggests that the UUG start codon of nAuORF1 is utilized very poorly, if at all, under the conditions of our experiments. These conclusions are consistent with the fact that the sequence context of the nAuORF2 start codon, $A_{-3}A_{-2}A_{-1}$AUA$U_{+4}$, conforms well to the preferred sequence context defined recently by Chen et al. for a naturally occurring UUG initiation codon at the yeast GRS1 gene, $A_{-3}A_{-2}(A/G)_{-1}$UUG$A_{+4}$, with the A at −3 exerting the greatest effect and the A at +4 the least effect on initiation frequency (35). In contrast, the sequence context of the GCN4 nAuORF1 start codon, $U_{-3}U_{-2}U_{-1}$UUG$C_{+4}$, diverges at all four positions flanking the UUG from the consensus sequence proposed by Chen et al.
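The context comparison in this paragraph can be made concrete with a short Python sketch. It counts how many of the four flanking positions (−3, −2, −1, +4) match the preferred context A, A, A/G, A quoted from Chen et al. The unweighted count is a simplifying assumption made here, since the text notes that position −3 matters most and +4 least, and the name `context_matches` is hypothetical.

```python
# Preferred context from the text: A(-3) A(-2) A/G(-1) <start codon> A(+4).
CONSENSUS = {-3: "A", -2: "A", -1: "AG", +4: "A"}

def context_matches(upstream, downstream):
    """Count flanking positions (out of 4) that match the consensus.

    upstream:   the three nucleotides at positions -3, -2, -1
    downstream: the single nucleotide at position +4
    """
    observed = {-3: upstream[0], -2: upstream[1], -1: upstream[2], +4: downstream}
    return sum(observed[pos] in allowed for pos, allowed in CONSENSUS.items())

# Flanking nucleotides as given in the text for the two GCN4 elements.
print("nAuORF2 (AUA):", context_matches("AAA", "U"))
print("nAuORF1 (UUG):", context_matches("UUU", "C"))
```

Under this simple count, the nAuORF2 context matches the consensus at the three upstream positions, while the nAuORF1 context matches at none, in line with the statement that it diverges at all four positions.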

Using the ribosome occupancy data of Ingolia et al. (17), we estimated that the average ribosome density in nAuORF2 is ∼5-fold higher than that of nAuORF1 under the starvation conditions employed in their study. If we equate average ribosome density with translation rate, and note that the nAuORF1^CUC-lacZ reporter (lacking the nAuORF1 start codon) conferred 25 units of β-galactosidase in histidine-starved cells, then we might expect to have observed ∼5 units of β-galactosidase (25/5 units) expressed from the nAuORF2^AAA-lacZ reporter (lacking the nAuORF2 start codon) in 3-AT-treated cells, resulting from translation of the nAuORF1-lacZ fusion. However, <1 unit of activity could be attributed to initiation at the UUG start codon of nAuORF1, calculated as the difference in expression between the nAuORF2^AAA-lacZ and nAuORF^CUC,AAA-lacZ reporters (1.3−1.0 units). To explain our inability to detect translation of nAuORF1 in histidine-limited cells, it could be proposed that the fusion of lacZ coding sequences to the nAuORF altered the structure of the nAuORF1 initiation region in a manner that impairs recognition of the UUG start codon without similarly reducing recognition of the AUA initiation site at nAuORF2. This seems unlikely considering that the fusion junction is ∼100 nt downstream of the nAuORF1 start codon and only ∼25 nt 3′ of the nAuORF2 initiation site. Alternatively, it is possible that the fusion of lacZ sequences activates recognition of the nAuORF2 start codon in a manner that does not occur at the nAuORF1 start site further upstream. This might occur if the 5′-end of the lacZ sequences forms a structure that evokes ribosome pausing specifically in the initiation region of nAuORF2. This mechanism also seems unlikely, however, as Kozak demonstrated that the distance between the start codon and the base of a secondary structure able to compensate for a poor initiation sequence context must be ≤14 nt, the approximate distance between the leading edge of the ribosome and the start codon positioned in the ribosomal P-site (36). Thus, the junction with lacZ sequences in our nAuORF-lacZ fusion is probably located too far downstream (∼25 nt) from the AUA start codon to activate nAuORF2 translation by this pausing mechanism, although we cannot rule out the possibility that lacZ sequences base pair with GCN4 sequences located just downstream of the AUA start codon to form the requisite structure.

Another discrepancy between our results using lacZ reporters and the ribosome profiling data of Ingolia et al. (17) concerns the relative translational rates of nAuORF2 and uORF1. Estimating the average ribosome densities of nAuORF2 and uORF1 from their profiling data suggests that the uORF1-lacZ fusion should be translated at a rate only ∼3.8-fold higher than that of the nAuORF1^CUC-lacZ reporter (lacking the nAuORF1 start codon) under starvation conditions, whereas the actual difference measured here for 3-AT-treated cells is 15-fold (Figure 2B and C). The ribosome occupancy of nAuORF2 measured by Ingolia et al. is about 4.5-fold lower under non-starvation versus starvation conditions, whereas the occupancy of uORF1 is relatively higher in non-starved cells, leading to the prediction that the uORF1-lacZ fusion should be translated at a rate ∼20-fold higher than that of nAuORF1^CUC-lacZ in non-starved cells, which actually agrees well with our measurements under these conditions (Figure 2B and C). Thus, the main discrepancy between our data and those of Ingolia et al. regarding the relative translation rates of uORF1 versus nAuORF2 is that we observed only a small (∼1.7-fold) increase in translation initiation from the AUA start codon of nAuORF2 (the nAuORF1^CUC-lacZ reporter) in response to histidine starvation, compared to the ∼4.5-fold increase observed in starved cells by ribosomal profiling. We also did not observe increased initiation at a UUG versus AUG start codon for a HIS4-lacZ fusion in response to histidine limitation by 3-AT. Thus, the prediction made from ribosomal profiling data that the rate of initiation at non-AUG codons is considerably higher in starved versus non-starved cells probably should be treated with caution.

Although our results on the nAuORF-lacZ construct support the conclusion that nAuORF2 is translated in vivo, we did not observe any consequence of eliminating translation of this element by replacing its AUA start codon with the non-cognate AAA triplet. Neither complementation of the amino acid analog sensitivity of a gcn4Δ mutant, nor induction of native Gcn4 protein, nor the regulated expression of a GCN4-lacZ reporter was detectably perturbed by the AUA-to-AAA replacement in nAuORF2, by the UUG-to-CUC replacement in the start codon of nAuORF1, or by the double mutation. Thus, it seems clear that nAuORFs 1 and 2 are both dispensable for wild-type repression of GCN4 mRNA translation in non-starvation conditions, and for derepression of GCN4 translation in response to histidine limitation imposed with 3-AT, nutritional shift-down of an amino acid auxotroph, or treatment with rapamycin, methyl methanesulfonate or hydrogen peroxide.

Considering the evidence presented here that nAuORF2 is translated under starvation conditions, it might seem surprising that eliminating its AUA start codon would have no detectable impact on GCN4 expression. However, a comparison of the amount of β-galactosidase produced by the nAuORF-lacZ fusion (∼25 U) to that given by the uORF1-lacZ (∼400 U) or the uORF-less GCN4-lacZ construct (∼700 U) in 3-AT-treated cells (Figure 2B and C) suggests that only a small fraction (∼5%) of the 43S complexes that can scan from the cap and initiate at the AUG of uORF1 or the GCN4 ORF, when present as the 5′-proximal AUG, are able to initiate at the AUA of nAuORF2. This implies, in turn, that ∼95% of the 43S complexes scanning from the cap will leaky-scan past the nAuORF2 AUA and continue downstream to uORF1, where they can engage in the regulated reinitiation process responsible for GCN4 translational control. Thus, even if the entire 5% of the scanning 43S complexes that translate nAuORF2 fail to resume scanning downstream, this would reduce the level of GCN4 translation by only 5%, which might be difficult to detect by western analysis of Gcn4 or by assaying the GCN4-lacZ reporter.
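The ∼5% estimate above can be checked with a few lines of Python. The sketch uses only the approximate reporter activities quoted in this paragraph (25, 400 and 700 units) and assumes, as the text does, that reporter activity is proportional to the fraction of scanning 43S complexes that initiate. The variable and function names are illustrative.

```python
# Approximate beta-galactosidase activities quoted in the text (3-AT-treated cells).
NAUORF_LACZ_UNITS = 25.0          # initiation at the AUA of nAuORF2
UORF1_LACZ_UNITS = 400.0          # initiation at the AUG of uORF1
UORFLESS_GCN4_LACZ_UNITS = 700.0  # initiation at the GCN4 AUG with no uORFs

def initiating_fraction(test_units, reference_units):
    """Fraction of scanning complexes initiating at the test codon,
    assuming reporter activity scales with initiation frequency."""
    return test_units / reference_units

vs_uorf1 = initiating_fraction(NAUORF_LACZ_UNITS, UORF1_LACZ_UNITS)
vs_uorfless = initiating_fraction(NAUORF_LACZ_UNITS, UORFLESS_GCN4_LACZ_UNITS)

print(f"relative to uORF1-lacZ:          {vs_uorf1:.1%}")
print(f"relative to uORF-less GCN4-lacZ: {vs_uorfless:.1%}")
# The complement is the fraction that leaky-scans past the nAuORF2 AUA.
print(f"leaky scanning (uORF1 basis):    {1 - vs_uorf1:.1%}")
```

The two ratios bracket the ∼5% figure used in the text, so the worst-case loss of GCN4 translation from nAuORF2 initiation stays at a few percent under either reference.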

A final interesting point to consider is that, besides the UUG and AUA start codons of nAuORFs 1 and 2, the GCN4 mRNA leader contains 7 other potential near-cognate start codons with a perfect consensus at the −1 to −3 positions as defined by Chen et al. (35). It is thus unclear why 80S ribosome occupancies comparable to those seen for nAuORF2 were not observed at any of these other locations by Ingolia et al. (17), particularly the $A_{-3}A_{-2}A_{-1}$AUU$A_{+4}$ and $A_{-3}A_{-2}A_{-1}$AUC$A_{+4}$UU sequences present just upstream from uORF1 (Figure 1, −382 to −376, and −375 to −369). Perhaps the sequences immediately downstream from the AUA start codon of nAuORF2 produce a secondary structure that pauses the 43S complex with the AUA in the P-site, enhancing recognition of this particular near-cognate start codon in the GCN4 leader. The initiation at multiple near-cognate start codons in the 5′-UTRs of other yeast genes detected by Ingolia et al. might involve a similar mechanism.


Definition

CpG is shorthand for 5'-C-phosphate-G-3', that is, cytosine and guanine separated by only one phosphate group; phosphate links any two nucleosides together in DNA. The CpG notation is used to distinguish this single-stranded linear sequence from the CG base-pairing of cytosine and guanine in double-stranded sequences. The CpG notation is therefore to be interpreted as the cytosine being 5 prime to the guanine base. CpG should not be confused with GpC, the latter meaning that a guanine is followed by a cytosine in the 5' → 3' direction of a single-stranded sequence.
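The distinction between CpG and GpC is easy to see in code. The following Python sketch treats a sequence as a single strand written 5' → 3' and counts each dinucleotide separately. The helper name `count_dinucleotide` and the example sequence are made up for illustration.

```python
def count_dinucleotide(seq, first, second):
    """Count occurrences of `first` immediately followed by `second`,
    reading the single strand in the 5' -> 3' direction."""
    seq = seq.upper()
    return sum(1 for a, b in zip(seq, seq[1:]) if a == first and b == second)

strand = "ACGCGT"  # written 5' -> 3'
print("CpG sites:", count_dinucleotide(strand, "C", "G"))
print("GpC sites:", count_dinucleotide(strand, "G", "C"))
```

The same six bases contain two CpG sites but only one GpC site, because the order along the strand is what defines each dinucleotide.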

Under-representation

CpG dinucleotides have long been observed to occur with a much lower frequency in the sequence of vertebrate genomes than would be expected due to random chance. For example, in the human genome, which has a 42% GC content, [4] a pair of nucleotides consisting of cytosine followed by guanine would be expected to occur 0.21 × 0.21 = 4.41% of the time. The frequency of CpG dinucleotides in human genomes is less than one-fifth of the expected frequency. [5] This underrepresentation is a consequence of the high mutation rate of methylated CpG sites: the spontaneously occurring deamination of a methylated cytosine results in a thymine, and the resulting G:T mismatched bases are often improperly resolved to A:T, whereas the deamination of unmethylated cytosine results in a uracil, which as a foreign base is quickly replaced by a cytosine by the base excision repair mechanism. The C to T transition rate at methylated CpG sites is about 10-fold higher than at unmethylated sites. [6] [7] [8] [9]
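The expected-frequency arithmetic in this paragraph can be reproduced with a small Python sketch. It assumes, as the 0.21 × 0.21 calculation implicitly does, that C and G each make up half of the GC content and that neighboring bases are independent. The function name is illustrative.

```python
def expected_cpg_frequency(gc_content):
    """Expected CpG dinucleotide frequency if C and G are equally common
    and adjacent bases are independent of each other."""
    c_freq = g_freq = gc_content / 2
    return c_freq * g_freq

expected = expected_cpg_frequency(0.42)  # human genome GC content from the text
print(f"expected CpG frequency: {expected:.2%}")
# The text states the observed frequency is less than one-fifth of this value.
print(f"one-fifth of expected:  {expected / 5:.2%}")
```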

Genomic distribution

Figure: Distribution of CpG sites (left panel, in red) and GpC sites (right panel, in green) in the human APRT gene. CpG are more abundant in the upstream region of the gene, where they form a CpG island, whereas GpC are more evenly distributed. The 5 exons of the APRT gene are indicated (blue), and the start (ATG) and stop (TGA) codons are emphasized (bold blue).

CpG dinucleotides frequently occur in CpG islands (see definition of CpG islands, below). There are 28,890 CpG islands in the human genome (50,267 if one includes CpG islands in repeat sequences). [10] This is in agreement with the 28,519 CpG islands found by Venter et al., [11] since the Venter et al. genome sequence did not include the interiors of highly similar repetitive elements and the extremely dense repeat regions near the centromeres. [12] Since CpG islands contain multiple CpG dinucleotide sequences, there appear to be more than 20 million CpG dinucleotides in the human genome.

CpG islands (or CG islands) are regions with a high frequency of CpG sites. Though objective definitions for CpG islands are limited, the usual formal definition is a region of at least 200 bp with a GC percentage greater than 50% and an observed-to-expected CpG ratio greater than 60%. The "observed-to-expected CpG ratio" is derived by taking the observed value as $\text{number of CpGs}$ and the expected value as $(\text{number of C} \times \text{number of G}) / \text{length of sequence}$ [13] or $((\text{number of C} + \text{number of G})/2)^2 / \text{length of sequence}$. [14]
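The formal definition above translates directly into code. The Python sketch below computes the GC fraction and the observed-to-expected CpG ratio with either of the two expected-value formulas, then applies the thresholds quoted in the text (at least 200 bp, GC above 50%, ratio above 0.6). The function names and the `method` argument are assumptions made for this example and are not taken from any particular tool.

```python
def cpg_observed_expected(seq, method="product"):
    """Return (gc_fraction, observed/expected CpG ratio) for one sequence.

    method="product": expected = (#C * #G) / length
    method="mean":    expected = ((#C + #G) / 2) ** 2 / length
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    n_c, n_g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if method == "product":
        expected = n_c * n_g / length
    else:
        expected = ((n_c + n_g) / 2) ** 2 / length
    ratio = observed / expected if expected > 0 else 0.0
    return (n_c + n_g) / length, ratio

def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the usual formal thresholds quoted in the text."""
    gc_fraction, ratio = cpg_observed_expected(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and ratio > min_ratio

print(cpg_observed_expected("CGCGAT"))
print(looks_like_cpg_island("CG" * 100))  # 200 bp of pure CpG repeats
print(looks_like_cpg_island("AT" * 100))  # no C or G at all
```

Real island finders scan a genome with a sliding window and merge adjacent hits, which this sketch does not attempt. It only evaluates a single candidate region.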

Many genes in mammalian genomes have CpG islands associated with the start of the gene [15] (promoter regions). Because of this, the presence of a CpG island is used to help in the prediction and annotation of genes.

In mammalian genomes, CpG islands are typically 300–3,000 base pairs in length, and have been found in or near approximately 40% of promoters of mammalian genes. [16] Over 60% of human genes and almost all house-keeping genes have their promoters embedded in CpG islands. [17] Given the frequency of GC two-nucleotide sequences, the number of CpG dinucleotides is much lower than would be expected. [14]

A 2002 study revised the rules of CpG island prediction to exclude other GC-rich genomic sequences such as Alu repeats. Based on an extensive search on the complete sequences of human chromosomes 21 and 22, DNA regions greater than 500 bp were found more likely to be the "true" CpG islands associated with the 5' regions of genes if they had a GC content greater than 55%, and an observed-to-expected CpG ratio of 65%. [18]

CpG islands are characterized by CpG dinucleotide content of at least 60% of that which would be statistically expected (~4–6%), whereas the rest of the genome has much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances the CpG sites in the CpG islands of promoters are unmethylated if the genes are expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit gene expression. Methylation, along with histone modification, is central to imprinting. [19] Most of the methylation differences between tissues, or between normal and cancer samples, occur a short distance from the CpG islands (at "CpG island shores") rather than in the islands themselves. [20]

CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates. [14] A C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over time methylated cytosines tend to turn into thymines because of spontaneous deamination. There is a special enzyme in humans (Thymine-DNA glycosylase, or TDG) that specifically replaces T's from T/G mismatches. However, due to the rarity of CpGs, it is theorised to be insufficiently effective in preventing a possibly rapid mutation of the dinucleotides. The existence of CpG islands is usually explained by the existence of selective forces for relatively high CpG content, or low levels of methylation in that genomic area, perhaps having to do with the regulation of gene expression. A 2011 study showed that most CpG islands are a result of non-selective forces. [21]

CpG islands in promoters

In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island. [2] [3]

Distal promoter elements also frequently contain CpG islands. An example is the DNA repair gene ERCC1, where the CpG island-containing element is located about 5,400 nucleotides upstream of the transcription start site of the ERCC1 gene. [22] CpG islands also occur frequently in promoters for functional noncoding RNAs such as microRNAs. [23]

Methylation of CpG islands stably silences genes

In humans, DNA methylation occurs at the 5 position of the pyrimidine ring of the cytosine residues within CpG sites to form 5-methylcytosines. The presence of multiple methylated CpG sites in CpG islands of promoters causes stable silencing of genes. [24] Silencing of a gene may be initiated by other mechanisms, but this is often followed by methylation of CpG sites in the promoter CpG island to cause the stable silencing of the gene. [24]

Promoter CpG hyper/hypo-methylation in cancer

In cancers, loss of expression of genes occurs about 10 times more frequently by hypermethylation of promoter CpG islands than by mutations. For example, in a colorectal cancer there are usually about 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. [25] In contrast, in one study of colon tumors compared to adjacent normal-appearing colonic mucosa, 1,734 CpG islands were heavily methylated in tumors whereas these CpG islands were not methylated in the adjacent mucosa. [26] Half of the CpG islands were in promoters of annotated protein coding genes, [26] suggesting that about 867 genes in a colon tumor have lost expression due to CpG island methylation. A separate study found an average of 1,549 differentially methylated regions (hypermethylated or hypomethylated) in the genomes of six colon cancers (compared to adjacent mucosa), of which 629 were in known promoter regions of genes. [27] A third study found more than 2,000 genes differentially methylated between colon cancers and adjacent mucosa. Using gene set enrichment analysis, 569 out of 938 gene sets were hypermethylated and 369 were hypomethylated in cancers. [28] Hypomethylation of CpG islands in promoters results in overexpression of the genes or gene sets affected.

One 2012 study [29] listed 147 specific genes with colon cancer-associated hypermethylated promoters, along with the frequency with which these hypermethylations were found in colon cancers. At least 10 of those genes had hypermethylated promoters in nearly 100% of colon cancers. They also indicated 11 microRNAs whose promoters were hypermethylated in colon cancers at frequencies between 50% and 100% of cancers. MicroRNAs (miRNAs) are small endogenous RNAs that pair with sequences in messenger RNAs to direct post-transcriptional repression. On average, each microRNA represses several hundred target genes. [30] Thus microRNAs with hypermethylated promoters may be allowing over-expression of hundreds to thousands of genes in a cancer.

The information above shows that, in cancers, promoter CpG hyper/hypo-methylation of genes and of microRNAs causes loss of expression (or sometimes increased expression) of far more genes than does mutation.

DNA repair genes with hyper/hypo-methylated promoters in cancers

DNA repair genes are frequently repressed in cancers due to hypermethylation of CpG islands within their promoters. In head and neck squamous cell carcinomas, at least 15 DNA repair genes have frequently hypermethylated promoters: XRCC1, MLH3, PMS1, RAD51B, XRCC3, RAD54B, BRCA1, SHFM1, GEN1, FANCE, FAAP20, SPRTN, SETMAR, HUS1, and PER1. [31] About seventeen types of cancer are frequently deficient in one or more DNA repair genes due to hypermethylation of their promoters. [32] As an example, promoter hypermethylation of the DNA repair gene MGMT occurs in 93% of bladder cancers, 88% of stomach cancers, 74% of thyroid cancers, 40%–90% of colorectal cancers and 50% of brain cancers. Promoter hypermethylation of LIG4 occurs in 82% of colorectal cancers. Promoter hypermethylation of NEIL1 occurs in 62% of head and neck cancers and in 42% of non-small-cell lung cancers. Promoter hypermethylation of ATM occurs in 47% of non-small-cell lung cancers. Promoter hypermethylation of MLH1 occurs in 48% of non-small-cell lung cancer squamous cell carcinomas. Promoter hypermethylation of FANCB occurs in 46% of head and neck cancers.

On the other hand, the promoters of two genes, PARP1 and FEN1, were hypomethylated, and these genes were over-expressed in numerous cancers. PARP1 and FEN1 are essential genes in the error-prone and mutagenic DNA repair pathway microhomology-mediated end joining. If this pathway is over-expressed, the excess mutations it causes can lead to cancer. PARP1 is over-expressed in tyrosine kinase-activated leukemias, [33] in neuroblastoma, [34] in testicular and other germ cell tumors, [35] and in Ewing's sarcoma. [36] FEN1 is over-expressed in the majority of cancers of the breast, [37] prostate, [38] stomach, [39] [40] pancreas, [42] and lung, [43] and in neuroblastomas. [41]

DNA damage appears to be the primary underlying cause of cancer. [44] [45] If accurate DNA repair is deficient, DNA damages tend to accumulate. Such excess DNA damage can increase mutational errors during DNA replication due to error-prone translesion synthesis. Excess DNA damage can also increase epigenetic alterations due to errors during DNA repair. [46] [47] Such mutations and epigenetic alterations can give rise to cancer (see malignant neoplasms). Thus, CpG island hyper/hypo-methylation in the promoters of DNA repair genes is likely central to progression to cancer.

Methylation of CpG sites with age

Since age has a strong effect on DNA methylation levels on tens of thousands of CpG sites, one can define a highly accurate biological clock (referred to as epigenetic clock or DNA methylation age) in humans and chimpanzees. [48]

Unmethylated sites

Unmethylated CpG dinucleotide sites can be detected by Toll-like receptor 9 [49] (TLR 9) on plasmacytoid dendritic cells, monocytes, natural killer (NK) cells, and B cells in humans. This is used to detect intracellular viral infection.

In mammals, DNA methyltransferases (which add methyl groups to DNA bases) exhibit a sequence preference for cytosines within CpG sites. [50] In the mouse brain, 4.2% of all cytosines are methylated, primarily in the context of CpG sites, forming 5mCpG. [51] Most hypermethylated 5mCpG sites increase the repression of associated genes. [51]

As reviewed by Duke et al., neuron DNA methylation (repressing expression of particular genes) is altered by neuronal activity. Neuron DNA methylation is required for synaptic plasticity and is modified by experiences, and active DNA methylation and demethylation are required for memory formation and maintenance. [52]

In 2016 Halder et al., [53] using mice, and in 2017 Duke et al., [52] using rats, subjected the rodents to contextual fear conditioning, causing an especially strong long-term memory to form. At 24 hours after the conditioning, in the hippocampus brain region of rats, the expression of 1,048 genes was down-regulated (usually associated with 5mCpG in gene promoters) and the expression of 564 genes was up-regulated (often associated with hypomethylation of CpG sites in gene promoters). At 24 hours after training, 9.2% of the genes in the rat genome of hippocampus neurons were differentially methylated. However, while the hippocampus is essential for learning new information, it does not store information itself. In the mouse experiments of Halder et al., 1,206 differentially methylated genes were seen in the hippocampus one hour after contextual fear conditioning, but these altered methylations were reversed and not seen after four weeks. In contrast with the absence of long-term CpG methylation changes in the hippocampus, substantial differential CpG methylation could be detected in cortical neurons during memory maintenance. There were 1,223 differentially methylated genes in the anterior cingulate cortex of mice four weeks after contextual fear conditioning.

Demethylation at CpG sites requires ROS activity

In adult somatic cells, DNA methylation typically occurs in the context of CpG dinucleotides (CpG sites), forming 5-methylcytosine-pG, or 5mCpG. Reactive oxygen species (ROS) may attack guanine at the dinucleotide site, forming 8-hydroxy-2'-deoxyguanosine (8-OHdG) and resulting in a 5mCp-8-OHdG dinucleotide site. The base excision repair enzyme OGG1 targets 8-OHdG and binds to the lesion without immediate excision. OGG1, present at a 5mCp-8-OHdG site, recruits TET1, and TET1 oxidizes the 5mC adjacent to the 8-OHdG. This initiates demethylation of 5mC. [54]

As reviewed in 2018, [55] in brain neurons, 5mC is oxidized by the ten-eleven translocation (TET) family of dioxygenases (TET1, TET2, TET3) to generate 5-hydroxymethylcytosine (5hmC). In successive steps TET enzymes further hydroxylate 5hmC to generate 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). Thymine-DNA glycosylase (TDG) recognizes the intermediate bases 5fC and 5caC and excises the glycosidic bond resulting in an apyrimidinic site (AP site). In an alternative oxidative deamination pathway, 5hmC can be oxidatively deaminated by activity-induced cytidine deaminase/apolipoprotein B mRNA editing complex (AID/APOBEC) deaminases to form 5-hydroxymethyluracil (5hmU) or 5mC can be converted to thymine (Thy). 5hmU can be cleaved by TDG, single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1), Nei-Like DNA Glycosylase 1 (NEIL1), or methyl-CpG binding protein 4 (MBD4). AP sites and T:G mismatches are then repaired by base excision repair (BER) enzymes to yield cytosine (Cyt).

Two reviews [56] [57] summarize the large body of evidence for the critical and essential role of ROS in memory formation. The DNA demethylation of thousands of CpG sites during memory formation depends on initiation by ROS. In 2016, Zhou et al. [54] showed that ROS have a central role in DNA demethylation.

TET1 is a key enzyme involved in demethylating 5mCpG. However, TET1 is only able to act on 5mCpG if an ROS has first acted on the guanine to form 8-hydroxy-2'-deoxyguanosine (8-OHdG), resulting in a 5mCp-8-OHdG dinucleotide (see first figure in this section). [54] After formation of 5mCp-8-OHdG, the base excision repair enzyme OGG1 binds to the 8-OHdG lesion without immediate excision. Adherence of OGG1 to the 5mCp-8-OHdG site recruits TET1, allowing TET1 to oxidize the 5mC adjacent to 8-OHdG, as shown in the first figure in this section. This initiates the demethylation pathway shown in the second figure in this section.

Altered protein expression in neurons, controlled by ROS-dependent demethylation of CpG sites in gene promoters within neuron DNA, is central to memory formation. [58]

CpG depletion has been observed in connection with DNA methylation of transposable elements (TEs): TEs are responsible not only for genome expansion but also for CpG loss in host DNA. TEs can act as "methylation centers": once a TE is inserted into host DNA, methylation spreads from it into the flanking DNA. This spreading may subsequently result in CpG loss over evolutionary time. Flanking DNA around evolutionarily older insertions shows higher CpG loss than flanking DNA around younger ones. DNA methylation can therefore eventually lead to a noticeable loss of CpG sites in neighboring DNA. [59]

Genome size and CpG ratio are negatively correlated

Previous studies have confirmed that genome size varies among species, with invertebrates and vertebrates having both smaller and larger genomes than humans. Genome size is strongly connected to the number of transposable elements. In addition, there is a negative correlation between the amount of TE methylation and the amount of CpG. This negative correlation reflects CpG depletion caused by intergenic DNA methylation, which is mostly attributed to the methylation of TEs. Overall, this contributes to a noticeable amount of CpG loss in the genomes of different species. [59]
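The text does not say how the CpG ratio is measured. One commonly used measure is the observed/expected CpG ratio, sketched below under that assumption; the function name cpg_obs_exp is hypothetical and the input is assumed to be a plain DNA string.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (number of CG dinucleotides * length) / (C count * G count).

    Assumption: this is one common definition; the source text does not specify which ratio it uses.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both C and G
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so a plain count is safe
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_obs_exp("CGCGCG"))
    print(cpg_obs_exp("CATGCATG"))
```

Values well below 1 indicate fewer CpG sites than base composition alone would predict, which is the kind of depletion the paragraph above describes.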

Alu elements as promoters of CpG loss

Alu elements are known as the most abundant type of transposable element. Some studies have used Alu elements to investigate which factors are responsible for genome expansion. Unlike LINEs and ERVs, Alu elements are CpG-rich over a longer stretch of sequence. Alus can act as methylation centers: insertion into host DNA can trigger DNA methylation that spreads into the flanking DNA. This spreading is why there is considerable CpG loss and a considerable increase in genome expansion. [59] The effect accumulates over time, as older Alu elements show more CpG loss in neighboring DNA than younger ones.

Effect of a doubling of the start codon in a gene - Biology

The central dogma of molecular biology describes how the information in a gene is converted into a protein. Specific sequences of DNA act as a template to synthesize mRNA, which is then translated.

The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and a modified Met (fMet) in prokaryotes. The most common start codon is AUG.

The start codon is often preceded by a 5' untranslated region (5' UTR). In prokaryotes this includes the ribosome binding site.

Alternate start codons (non-AUG) are very rare in eukaryotic genomes. Mitochondrial genomes and prokaryotes use alternate start codons more often (mainly GUG and UUG). For example, E. coli uses 83% AUG (3542/4284), 14% (612) GUG, 3% (103) UUG [1] and one or two others (e.g., an AUU and possibly a CUG). [2] [3] Bioinformatics programs usually allow for alternate start codons when searching for protein coding genes.
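As a quick arithmetic check, the shares can be recomputed from the counts quoted above. This is a minimal sketch; the variable names are illustrative. Recomputing gives roughly 82.7% for AUG and 14.3% for GUG, while the quoted UUG count works out to about 2.4% rather than 3%, and the three counts together leave 27 of the 4284 genes unaccounted for.

```python
# Counts exactly as quoted in the text above.
total_genes = 4284
start_codon_counts = {"AUG": 3542, "GUG": 612, "UUG": 103}

# Share of each start codon as a percentage of all genes.
shares = {codon: 100 * count / total_genes for codon, count in start_codon_counts.items()}

# Genes not covered by the three quoted counts.
unaccounted = total_genes - sum(start_codon_counts.values())

if __name__ == "__main__":
    for codon, share in shares.items():
        print(f"{codon}: {start_codon_counts[codon]}/{total_genes} = {share:.1f}%")
    print(f"not covered by the three counts: {unaccounted}")
```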

Note that these alternate start codons are still translated as Met when they are at the start of a protein (even if the codon encodes a different amino acid otherwise). This is because a separate transfer RNA (tRNA) is used for initiation.
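The rule above can be illustrated with a short sketch. It assumes the standard genetic code and a known start position; translate_orf and its parameters are hypothetical names, and real initiation also depends on sequence context that this toy function ignores.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(mrna, start_index=0, start_codons=("AUG", "GUG", "UUG")):
    """Translate from a given start codon to the first in-frame stop codon."""
    first = mrna[start_index:start_index + 3]
    if first not in start_codons:
        raise ValueError(f"{first!r} is not an accepted start codon")
    # The initiator tRNA inserts Met (fMet in prokaryotes) whatever the start codon is.
    protein = ["M"]
    for pos in range(start_index + 3, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_orf("GUGGCCAUAAUGUAA"))  # GUG used as the start codon
    print(translate_orf("AUGGUGUAA"))        # GUG read as an internal codon
```

In the first call GUG is read as Met because it initiates translation, and the later AUG simply adds another Met. In the second call the internal GUG is read as valine.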

Well-known coding regions that do not have AUG initiation codons are those of lacI (GUG) [4] [5] and lacA (UUG) [6] in the E. coli lac operon.


The nucleotide sequence around the translational initiation site is an important cis-acting element for post-transcriptional regulation. However, it has not been fully understood how the sequence context at the 5′-untranslated region (5′-UTR) affects the translational efficiency of individual mRNAs. In this study, we provide evidence that the 5′-UTRs of Arabidopsis genes showing a great difference in the nucleotide sequence vary greatly in translational efficiency with more than a 200-fold difference. Of the four types of nucleotides, the A residue was the most favourable nucleotide from positions −1 to −21 of the 5′-UTRs in Arabidopsis genes. In particular, the A residue in the 5′-UTR from positions −1 to −5 was required for a high-level translational efficiency. In contrast, the T residue in the 5′-UTR from positions −1 to −5 was the least favourable nucleotide in translational efficiency. Furthermore, the effect of the sequence context in the −1 to −21 region of the 5′-UTR was conserved in different plant species. Based on these observations, we propose that the sequence context immediately upstream of the AUG initiation codon plays a crucial role in determining the translational efficiency of plant genes.

Progress and challenges in the biology of FNDC5 and irisin

In 2002, a transmembrane protein now known as FNDC5 was discovered and shown to be expressed in skeletal muscle, heart and brain. It was virtually ignored for 10 years, until a study in 2012 proposed that, in response to exercise, the ectodomain of skeletal muscle FNDC5 was cleaved, traveled to white adipose tissue and induced browning. The wasted energy of this browning raised the possibility that this myokine, named irisin, might mediate some beneficial effects of exercise. Since then, more than 1,000 papers have been published exploring the roles of irisin. A major interest has been on adipose tissue and metabolism, following up the major proposal from 2012. Many studies correlating plasma irisin levels with physiological conditions have been questioned for their use of flawed assays for irisin concentration. However, experiments altering irisin levels by injecting recombinant irisin or by gene knockout are more promising. Recent discoveries have suggested potential roles of irisin in bone remodeling and in the brain, with effects potentially related to Alzheimer's disease. We also discuss some discrepancies between research groups and mechanisms that need to be determined. Some important questions raised in the initial discovery of irisin, such as the role of the mutant start codon of human FNDC5 and the mechanism of ectodomain cleavage, remain to be answered. Apart from these specific questions, a promising new tool has been developed: mice with a global or tissue-specific knockout of FNDC5. In this review, we critically examine the current knowledge and delineate potential solutions to resolve existing ambiguities.

Keywords: FNDC5; bone; brain; irisin; metabolism; myokine.

© The Author(s) 2021. Published by Oxford University Press on behalf of the Endocrine Society.

Presence of ATG triplets in 5' untranslated regions of eukaryotic cDNAs correlates with a 'weak' context of the start codon

Motivation: The context of the start codon (typically, AUG) and the features of the 5' Untranslated Regions (5' UTRs) are important for understanding translation regulation in eukaryotic mRNAs and for accurate prediction of the coding region in genomic and cDNA sequences. The presence of AUG triplets in 5' UTRs (upstream AUGs) might affect the initiation rate and, in the context of gene prediction, could reduce the accuracy of the identification of the authentic start. To reveal potential connections between the presence of upstream AUGs and other features of 5' UTRs, such as their length and the start codon context, we undertook a systematic analysis of the available eukaryotic 5' UTR sequences.

Results: We show that a large fraction of 5' UTRs in the available cDNA sequences, 15-53% depending on the organism, contain upstream ATGs. A negative correlation was observed between the information content of the translation start signal and the length of the 5' UTR. Similarly, a negative correlation exists between the 'strength' of the start context and the number of upstream ATGs. Typically, cDNAs containing long 5' UTRs with multiple upstream ATGs have a 'weak' start context, and in contrast, cDNAs containing short 5' UTRs without ATGs have 'strong' starts. These counter-intuitive results may be interpreted in terms of upstream AUGs having an important role in the regulation of translation efficiency by ensuring a low basal translation level via double negative control and creating the potential for additional regulatory mechanisms. One such mechanism, supported by experimental studies of some mRNAs, involves removal of the AUG-containing portion of the 5' UTR by alternative splicing.
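Finding upstream ATGs in a cDNA is straightforward once the position of the annotated start codon is known. The sketch below is not the method used in the study; upstream_atgs and cds_start are hypothetical names, and the example sequence is made up for illustration.

```python
def upstream_atgs(cdna, cds_start):
    """List ATG triplets in the 5' UTR as (position, in_frame_with_main_start) pairs.

    cds_start is the 0-based index of the annotated start codon (an assumed input).
    """
    utr = cdna[:cds_start].upper()
    hits = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] == "ATG":
            # An upstream ATG is in frame if it lies a multiple of 3 bases before the main start.
            hits.append((i, (cds_start - i) % 3 == 0))
    return hits


if __name__ == "__main__":
    # Made-up example: a 15-base 5' UTR followed by a short coding region.
    example = "GCCATGCCTGATGAA" + "ATGGCTTAA"
    for position, in_frame in upstream_atgs(example, cds_start=15):
        print(position, "in frame" if in_frame else "out of frame")
```

Counting such hits per transcript gives the "number of upstream ATGs" that the abstract relates to the strength of the start context.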
