6.14: Making a complete eukaryote

Up to this point we have touched on only a few of the ways that prokaryotes (bacteria and archaea) differ from eukaryotes. In eukaryotes, the systems associated with aerobic respiration are located in the inner membranes of double-membrane-bound cytoplasmic organelles known as mitochondria. Photosynthetic eukaryotes (algae and plants) have a second type of cytoplasmic organelle (in addition to mitochondria), known as chloroplasts. In fact, detailed analysis of the genes and proteins involved suggests that the electron transport/ATP synthesis systems of eukaryotic mitochondria are homologous to those of α-proteobacteria, while the light harvesting/reaction center complexes, electron transport chains and ATP synthesis proteins of photosynthetic eukaryotes (algae and plants) appear to be homologous to those of a second type of bacteria, the photosynthetic cyanobacteria [188]. How do we make sense of these observations?

Clearly, when a eukaryotic cell divides it must also have replicated its mitochondria and chloroplasts; otherwise they would eventually be lost through dilution. In 1883, Andreas Schimper (1856-1901) noticed that chloroplasts divided independently of their host cells. Building on Schimper's observation, Konstantin Merezhkovsky (1855-1921) proposed that chloroplasts were originally independent organisms and that plant cells were chimeras, really two independent organisms living together. In a similar vein, in 1925 Ivan Wallin (1883-1969) proposed that the mitochondria of eukaryotic cells were derived from bacteria. This “endosymbiotic hypothesis” for the origins of eukaryotic mitochondria and chloroplasts fell out of favor, in large part because the molecular methods needed to unambiguously resolve its implications were not available. A breakthrough came with the work of Lynn Margulis (1938-2011), and the hypothesis was further bolstered when it was found that both the mitochondrial and chloroplast protein synthesis machineries were sensitive to drugs that inhibited bacterial but not eukaryotic protein synthesis. In addition, it was discovered that mitochondria and chloroplasts contained circular DNA molecules organized in a manner similar to the DNA molecules found in bacteria (we will consider DNA and its organization soon).

All eukaryotes appear to have mitochondria. Earlier suggestions that some eukaryotes, such as the human anaerobic parasites Giardia intestinalis, Trichomonas vaginalis and Entamoeba histolytica [189], lack mitochondria failed to recognize that the cytoplasmic organelles known as mitosomes are degenerate mitochondria. Based on these and other data, it now appears likely that all eukaryotes are derived from an ancestor that engulfed an aerobic α-proteobacteria-like bacterium. Instead of being killed and digested, the engulfed bacteria (or perhaps even a single bacterium) survived within the eukaryotic cell, replicated, and were distributed into the progeny cells when the parent cell divided. This process resulted in the engulfed bacterium becoming an endosymbiont, which over time became the mitochondrion. At the same time the engulfing cell became dependent upon the presence of the endosymbiont, initially to detoxify molecular oxygen, and then to utilize molecular oxygen as an electron acceptor so as to maximize the energy that could be derived from the breakdown of complex molecules. All eukaryotes (including us) are descended from this mitochondria-containing eukaryotic ancestor, which appeared around 2 billion years ago. The second endosymbiotic event in eukaryotic evolution occurred when a cyanobacteria-like bacterium formed a relationship with a mitochondria-containing eukaryote. This lineage gave rise to the glaucophytes and the red and green algae. The green algae, in turn, gave rise to the plants.

As we look through modern organisms there are a number of examples of similar events, that is, one organism becoming inextricably linked to another through endosymbiotic processes. There are also examples of close couplings between organisms that are more akin to parasitism than to a mutually beneficial interaction (symbiosis) [190]. For example, a number of insects have intracellular bacterial parasites, and some pathogens and parasites live inside human cells [191]. In some cases, even these parasites can have parasites. Consider the mealybug Planococcus citri, a multicellular eukaryote; this organism contains cells known as bacteriocytes. Within these cells are Tremblaya princeps-type β-proteobacteria. Surprisingly, within these Tremblaya bacterial cells, which lie within the mealybug cells, live Moranella endobia-type γ-proteobacteria [192]. In another example, after the initial endosymbiotic event that formed the proto-algal cell, the ancestor of the red and green algae and the plants, there have been endocytic events in which a eukaryotic cell has engulfed and formed an endosymbiotic relationship with eukaryotic green algal cells, to form a “secondary” endosymbiont. Similarly, secondary endosymbionts have been engulfed by yet another eukaryote, to form a tertiary endosymbiont [193]. The conclusion is that there are combinations of cells that can survive better in a particular ecological niche than either could alone. In these phenomena we see the power of evolutionary processes to populate extremely obscure ecological niches in rather surprising ways.

A new view of the tree of life

The tree of life is one of the most important organizing principles in biology 1 . Gene surveys suggest the existence of an enormous number of branches 2 , but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships 3–5 or on the known, well-classified diversity of life with an emphasis on eukaryotes 6 . These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts 7,8 . Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.

Early approaches to describe the tree of life distinguished organisms based on their physical characteristics and metabolic features. Molecular methods dramatically broadened the diversity that could be included in the tree because they circumvented the need for direct observation and experimentation by relying on sequenced genes as markers for lineages. Gene surveys, typically using the small subunit ribosomal RNA (SSU rRNA) gene, provided a remarkable and novel view of the biological world 1,9,10 , but questions about the structure and extent of diversity remain. Organisms from novel lineages have eluded surveys, because many are invisible to these methods due to sequence divergence relative to the primers commonly used for gene amplification 7,11 . Furthermore, unusual sequences, including those with unexpected insertions, may be discarded as artefacts 7 .

Whole genome reconstruction was first accomplished in 1995 (ref. 12), with a near-exponential increase in the number of draft genomes reported each subsequent year. There are 30,437 genomes from all three domains of life—Bacteria, Archaea and Eukarya—which are currently available in the Joint Genome Institute's Integrated Microbial Genomes database (accessed 24 September 2015). Contributing to this expansion in genome numbers are single cell genomics 13 and metagenomics studies. Metagenomics is a shotgun sequencing-based method in which DNA isolated directly from the environment is sequenced, and the reconstructed genome fragments are assigned to draft genomes 14 . New bioinformatics methods yield complete and near-complete genome sequences, without a reliance on cultivation or reference genomes 7,15 . These genome- (rather than gene) based approaches provide information about metabolic potential and a variety of phylogenetically informative sequences that can be used to classify organisms 16 . Here, we have constructed a tree of life by making use of genomes from public databases and 1,011 newly reconstructed genomes that we recovered from a variety of environments (see Methods).

To render this tree of life, we aligned and concatenated a set of 16 ribosomal protein sequences from each organism. This approach yields a higher-resolution tree than is obtained from a single gene, such as the widely used 16S rRNA gene 16 . The use of ribosomal proteins avoids artefacts that would arise from phylogenies constructed using genes with unrelated functions and subject to different evolutionary processes. Another important advantage of the chosen ribosomal proteins is that they tend to be syntenic and co-located in a small genomic region in Bacteria and Archaea, reducing binning errors that could substantially perturb the geometry of the tree. Included in this tree is one representative per genus for all genera for which high-quality draft and complete genomes exist (3,083 organisms in total).
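The concatenation step described above can be illustrated with a minimal Python sketch. It assumes each ribosomal protein has already been aligned separately and is held as a dictionary mapping organism name to aligned sequence; the function name `concatenate_alignments`, the toy protein alignments, and the choice to pad missing organisms with gap characters are illustrative assumptions, not details taken from the paper's Methods.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-protein alignments end to end into one alignment per organism.

    Organisms missing from a given protein alignment are padded with gaps
    (an assumption made for this sketch).
    """
    taxa = sorted({name for aln in alignments for name in aln})
    parts: Dict[str, List[str]] = {name: [] for name in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for name in taxa:
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical toy alignments for two ribosomal proteins.
protein_one = {"A": "MK-L", "B": "MKQL"}
protein_two = {"A": "GTA", "C": "GSA"}
supermatrix = concatenate_alignments([protein_one, protein_two])
```

The resulting concatenated alignment would then be passed to a tree-inference program; that step is outside the scope of this sketch.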

Despite the methodological challenges, we have included representatives of all three domains of life. Our primary focus relates to the status of Bacteria and Archaea, as these organisms have been most difficult to profile using macroscopic approaches, and substantial progress has been made recently with acquisition of new genome sequences 7,8,13 . The placement of Eukarya relative to Bacteria and Archaea is controversial 1,4,5,17,18 . Eukaryotes are believed to be evolutionary chimaeras that arose via endosymbiotic fusion, probably involving bacterial and archaeal cells 19 . Here, we do not attempt to confidently resolve the placement of the Eukarya. We position them using sequences of a subset of their nuclear-encoded ribosomal proteins, an approach that classifies them based on the inheritance of their information systems as opposed to lipid or other cellular structures 5 .

Figure 1 presents a new view of the tree of life. This is one of a relatively small number of three-domain trees constructed from molecular information so far, and the first comprehensive tree to be published since the development of genome-resolved metagenomics. We highlight all major lineages with genomic representation, most of which are phylum-level branches (see Supplementary Fig. 1 for full bootstrap support values). However, we separately identify the Classes of the Proteobacteria, because the phylum is not monophyletic (for example, the Deltaproteobacteria branch away from the other Proteobacteria, as previously reported 2,20 ).

The tree includes 92 named bacterial phyla, 26 archaeal phyla and all five of the Eukaryotic supergroups. Major lineages are assigned arbitrary colours and named, with well-characterized lineage names, in italics. Lineages lacking an isolated representative are highlighted with non-italicized names and red dots. For details on taxon sampling and tree inference, see Methods. The names Tenericutes and Thermodesulfobacteria are bracketed to indicate that these lineages branch within the Firmicutes and the Deltaproteobacteria, respectively. Eukaryotic supergroups are noted, but not otherwise delineated due to the low resolution of these lineages. The CPR phyla are assigned a single colour as they are composed entirely of organisms without isolated representatives, and are still in the process of definition at lower taxonomic levels. The complete ribosomal protein tree is available in rectangular format with full bootstrap values as Supplementary Fig. 1 and in Newick format in Supplementary Dataset 2.

The tree in Fig. 1 recapitulates expected organism groupings at most taxonomic levels and is largely congruent with the tree calculated using traditional SSU rRNA gene sequence information (Supplementary Fig. 2). The support values for taxonomic groups are strong at the Species through Class levels (>85%), with moderate-to-strong support for Phyla (>75% in most cases), but the branching order of the deepest branches cannot be confidently resolved (Supplementary Fig. 1). The lower support for deep branch placements is a consequence of our prioritization of taxon sampling over number of genes used for tree construction. As proposed recently, the Eukarya, a group that includes protists, fungi, plants and animals, branches within the Archaea, specifically within the TACK superphylum 21 and sibling to the Lokiarchaeota 22 . Interestingly, this placement is not evident in the SSU rRNA tree, which has the three-domain topology proposed by Woese and co-workers in 1990 1 (Supplementary Fig. 2). The two-domain Eocyte tree and the three-domain tree are competing hypotheses for the origin of Eukarya 5 ; further analyses to resolve these and other deep relationships will be strengthened with the availability of genomes for a greater diversity of organisms. Important advantages of the ribosomal protein tree compared with the SSU rRNA gene tree are that it includes organisms with incomplete or unavailable SSU rRNA gene sequences and more strongly resolves the deeper radiations. Ribosomal proteins have been shown to contain compositional biases across the three domains, driven by thermophilic, mesophilic and halophilic lifestyles as well as by a primitive genetic code 23 . Continued expansion of the number of genome sequences for non-extremophile Archaea, such as the DPANN lineages 8,13 , may allow clarification of these compositional biases.

A striking feature of this tree is the large number of major lineages without isolated representatives (red dots in Fig. 1). Many of these lineages are clustered together into discrete regions of the tree. Of particular note is the Candidate Phyla Radiation (CPR) 7 , highlighted in purple in Fig. 1. Based on information available from hundreds of genomes from genome-resolved metagenomics and single-cell genomics methods to date, all members have relatively small genomes and most have somewhat (if not highly) restricted metabolic capacities 7,13,24 . Many are inferred (and some have been shown) to be symbionts 7,25,26 . Thus far, all cells lack complete citric acid cycles and respiratory chains and most have limited or no ability to synthesize nucleotides and amino acids. It remains unclear whether these reduced metabolisms are a consequence of superphylum-wide loss of capacities or if these are inherited characteristics that hint at an early metabolic platform for life. If inherited, then adoption of symbiotic lifestyles may have been a later innovation by these organisms once more complex organisms appeared.

Figure 2 presents another perspective, where the major lineages of the tree are defined using evolutionary distance, so that the main groups become apparent without bias arising from historical naming conventions. This depiction uses the same inferred tree as in Fig. 1, but with groups defined on the basis of average branch length to the leaf taxa. We chose an average branch length that best recapitulated the current taxonomy (smaller values fragmented many currently accepted phyla and larger values collapsed accepted phyla into very few lineages, see Methods). Evident in Fig. 2 is the enormous extent of evolution that has occurred within the CPR. The diversity within the CPR could be a result of the early emergence of this group and/or a consequence of rapid evolution related to symbiotic lifestyles. The CPR is early-emerging on the ribosomal protein tree (Fig. 1), but not in the SSU rRNA tree (Supplementary Fig. 2). Regardless of branching order, the CPR, in combination with other lineages that lack isolated representatives (red dots in Fig. 2), clearly comprises the majority of life's current diversity.

The threshold for groups (coloured wedges) was an average branch length of <0.65 substitutions per site. Notably, some well-accepted phyla become single groups and others are split into multiple distinct groups. We undertook this analysis to provide perspective on the structure of the tree, and do not propose the resulting groups to have special taxonomic status. The massive scale of diversity in the CPR and the large fraction of major lineages that lack isolated representatives (red dots) are apparent from this analysis. Bootstrap support values are indicated by circles on nodes—black for support of 85% and above, grey for support from 50 to 84%. The complete ribosomal protein tree is available in rectangular format with full bootstrap values as Supplementary Fig. 1 and in Newick format in Supplementary Dataset 2.
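One way to picture the grouping by average branch length is the Python sketch below. It is only an assumed reading of the description above, not the procedure from the paper's Methods: starting at the root, a clade is kept as one group when the mean distance from its base to its leaves falls below the threshold, and is otherwise split into its child clades. The `Node` class, the function names and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> List[float]:
    """Distances from this node down to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [child.length + d for child in node.children for d in leaf_distances(child)]


def leaf_names(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def group_by_average_branch_length(root: Node, threshold: float = 0.65) -> List[List[str]]:
    """Split the tree top-down into clades whose mean leaf distance is below threshold."""
    groups: List[List[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        distances = leaf_distances(node)
        if sum(distances) / len(distances) < threshold:
            groups.append(sorted(leaf_names(node)))
        else:
            stack.extend(node.children)
    return groups


# Hypothetical toy tree: two clades sitting on long internal branches.
toy = Node(children=[
    Node(length=0.5, children=[Node("a", 0.1), Node("b", 0.2)]),
    Node(length=0.9, children=[Node("c", 0.3), Node("d", 0.4)]),
])
groups = group_by_average_branch_length(toy, threshold=0.65)
```

Raising the threshold merges clades into fewer groups and lowering it fragments them, which mirrors the trade-off the authors describe when choosing a value that best recapitulates current taxonomy.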

Domain Bacteria includes more major lineages of organisms than the other Domains. We do not attribute the smaller scope of the Archaea relative to Bacteria to sampling bias because metagenomics and single-cell genomics methods detect members of both domains equally well. Consistent with this view, Archaea are less prominent and less diverse in many ecosystems (for example, seawater 27 , hydrothermal vents 28 , the terrestrial subsurface 15 and human-associated microbiomes 29 ). The lower apparent phylogenetic diversity of Eukarya is fully expected, based on their comparatively recent evolution.

The tree of life as we know it has dramatically expanded due to new genomic sampling of previously enigmatic or unknown microbial lineages. This depiction of the tree captures the current genomic sampling of life, illustrating the progress that has been made in the last two decades following the first published genome. What emerges from analysis of this tree is the depth of evolutionary history that is contained within the Bacteria, in part due to the CPR, which appears to subdivide the domain. Most importantly, the analysis highlights the large fraction of diversity that is currently only accessible via cultivation-independent genome-resolved approaches.

Initiation of Transcription in Prokaryotes

RNA polymerase initiates transcription at specific DNA sequences called promoters.

Learning Objectives

Summarize the initial steps of transcription in prokaryotes

Key Takeaways

Key Points

  • Transcription of mRNA begins at the initiation site.
  • Two promoter consensus sequences are at the -10 and -35 regions upstream of the initiation site.
  • The σ subunit of RNA polymerase recognizes and binds the -35 region.
  • Five subunits (α, α, β, β’, and σ) make up the complete RNA polymerase holoenzyme.

Key Terms

  • holoenzyme: a fully functioning enzyme, composed of all its subunits
  • promoter: the section of DNA that controls the initiation of RNA transcription

Prokaryotic RNA Polymerase

Prokaryotes use the same RNA polymerase to transcribe all of their genes. In E. coli, the polymerase is composed of five polypeptide subunits, two of which are identical. Four of these subunits, denoted α, α, β, and β’, comprise the polymerase core enzyme. These subunits assemble each time a gene is transcribed, and they disassemble once transcription is complete. Each subunit has a unique role: the two α-subunits are necessary to assemble the polymerase on the DNA; the β-subunit binds to the ribonucleoside triphosphate that will become part of the nascent (“recently born”) mRNA molecule; and the β’-subunit binds the DNA template strand. The fifth subunit, σ, is involved only in transcription initiation. It confers transcriptional specificity such that the polymerase begins to synthesize mRNA from an appropriate initiation site. Without σ, the core enzyme would transcribe from random sites and would produce mRNA molecules that specified protein gibberish. The polymerase composed of all five subunits is called the holoenzyme.

Prokaryotic Promoters and Initiation of Transcription

The nucleotide pair in the DNA double helix that corresponds to the site from which the first 5′ mRNA nucleotide is transcribed is called the +1 site, or the initiation site. Nucleotides preceding the initiation site are given negative numbers and are designated upstream. Conversely, nucleotides following the initiation site are denoted with “+” numbering and are called downstream nucleotides.

A promoter is a DNA sequence onto which the transcription machinery binds and initiates transcription. In most cases, promoters exist upstream of the genes they regulate. The specific sequence of a promoter is very important because it determines whether the corresponding gene is transcribed all the time, some of the time, or infrequently. Although promoters vary among prokaryotic genomes, a few elements are conserved. At the -10 and -35 regions upstream of the initiation site, there are two promoter consensus sequences, or regions that are similar across all promoters and across various bacterial species. The -10 consensus sequence, called the -10 region, is TATAAT. The -35 sequence, TTGACA, is recognized and bound by σ. Once this interaction is made, the subunits of the core enzyme bind to the site. The A–T-rich -10 region facilitates unwinding of the DNA template, and several phosphodiester bonds are made. The transcription initiation phase ends with the production of abortive transcripts, which are polymers of approximately 10 nucleotides that are made and released.
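As a rough illustration of how the two consensus sequences define a promoter, the following Python sketch scans a DNA string for a TTGACA-like hexamer followed by a TATAAT-like hexamer. The function names, the mismatch tolerance and the 16 to 18 base-pair spacer between the two hexamers are assumptions added for this example (the spacer length is not stated in the text above), and real promoter prediction is considerably more involved.

```python
from typing import List, Tuple

MINUS_35 = "TTGACA"  # -35 consensus recognized by sigma
MINUS_10 = "TATAAT"  # -10 consensus


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(dna: str, max_mismatches: int = 1,
                   spacers: range = range(16, 19)) -> List[Tuple[int, int, int]]:
    """Return (start of -35 box, start of -10 box, spacer length) for each candidate."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        if hamming(dna[i:i + 6], MINUS_35) > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box = dna[j:j + 6]
            if len(box) == 6 and hamming(box, MINUS_10) <= max_mismatches:
                hits.append((i, j, spacer))
    return hits


# Hypothetical test sequence: both consensus boxes separated by a 17-bp spacer.
sequence = "GGCC" + MINUS_35 + "GC" * 8 + "G" + MINUS_10 + "GGCATGCA"
candidates = find_promoters(sequence, max_mismatches=0)
```

Positions are zero-based string indices; converting them to the -35/-10/+1 numbering used in the text would require knowing the initiation site, which this sketch does not attempt to predict.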

Promoter: The σ subunit of prokaryotic RNA polymerase recognizes consensus sequences found in the promoter region upstream of the transcription start site. The σ subunit dissociates from the polymerase after transcription has been initiated.


A eukaryote is a cell or organism that possesses a clearly defined nucleus. The cells of all multicellular organisms (plants, animals, and fungi) are eukaryotic. Algae and protists also are eukaryotic organisms.

The nucleus of a eukaryotic cell is surrounded by a nuclear membrane and contains well-defined chromosomes (bodies containing hereditary material). Eukaryotic cells also contain membrane-bound organelles, which include mitochondria, sausage-shaped bodies that produce energy; the Golgi complex (or Golgi apparatus), a membranous structure that helps export newly formed proteins and lipids from the cell; the endoplasmic reticulum, a canal-like system of membranes that functions in protein and lipid synthesis; and lysosomes, vesicles that help break down materials. Plant and algal cells also contain organelles called plastids, including chloroplasts, which play a key role in photosynthesis.

Has LGT of prokaryotic origin significantly contributed to current eukaryotic gene sets?

The extent of prokaryotic LGT to eukaryotes is the question that Ku and Martin decided to tackle in their recent article in BMC Biology [8]. Their idea is that if LGT from prokaryotes to eukaryotes is continuous and prevalent, traces of recent LGT must be detectable in eukaryote genomes. To assess this, they have re-analyzed their 2015 dataset made up of 2600 phylogenetic trees encompassing 55 eukaryotes from diverse lineages and 2000 prokaryote species. While they identify many prokaryote to prokaryote LGT candidates with high similarity between donor and receiver genes, indicative of recent transfer, they found a paucity of highly similar prokaryote to eukaryote LGT candidates. Moreover, while in prokaryotes recent LGT candidates are present in multiple species in the receiver clade, this is much more rarely observed in eukaryotes. Furthermore, if the candidate eukaryotic acquisitions from plastid and mitochondrial ancestors are excluded from the analysis, there remain only a few species-specific recent candidate LGT events. Because these few recent candidates are specific to one or a few species and highly similar to their prokaryotic candidate donors, they cannot easily be distinguished from bacterial contamination. On the basis of these observations, the authors conclude that there is a lack of evidence for recent LGT of prokaryotic origin in eukaryotic genomes and that this phenomenon is neither continuous nor prevalent. They further propose that any protein-coding gene in a eukaryotic genome with ≥70 % identity to prokaryotic homologs should be first considered as likely contamination rather than candidate LGT.

Obviously, several confounding factors could also contribute to this paucity of candidate recent LGT in eukaryote genomes. First, in their dataset, the authors include almost 40 times more bacterial species (including closely related species or different strains of the same species) than eukaryotic species (none of which are closely related). This can partly contribute to the paucity of recent candidate LGT conserved between multiple eukaryote species within a receiver clade. One could also argue that true candidate prokaryotic LGT donors have not been sampled because most are probably uncultured bacteria distant from anything that has been sequenced. Finally, the removal of everything highly similar to bacterial genes prior to eukaryotic genome annotation (or assembly) could also contribute to this deficiency of putative recent LGT. In many genome projects these highly similar sequences are considered as contaminants and are not visible in the final set of predicted protein-coding genes. However, as stated by the authors, these features probably account for only a minor part of the huge difference between prokaryote–prokaryote and prokaryote–eukaryote distribution of similarity between candidate donor and receiver genes. It is almost certain that the contribution of LGT of prokaryote origin to the making of a eukaryotic nuclear genome is several orders of magnitude less important than for prokaryotes.

What we can conclude from this recent paper and the tardigrade controversy is that any claim of prokaryote–eukaryote LGT (and particularly those with high identity to prokaryote candidate donors) must be taken with caution and, ideally, additional supporting evidence should be gathered. In addition to phylogenetic analysis, features such as presence of bona fide eukaryotic genes on the same contigs as the candidate LGT sequences, the presence of spliceosomal introns, conservation of the LGT candidate in sister species, and transcriptional support all provide additional evidence for LGT rather than contamination.
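The screening logic discussed above can be sketched in Python as a simple triage step. This is an illustrative assumption of how the ≥70 % identity rule and the four lines of supporting evidence might be combined; the `Candidate` fields, the labels and the requirement of at least two supporting features are hypothetical choices, not a procedure specified by the authors.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene_id: str
    percent_identity: float                       # identity to best prokaryotic homolog
    eukaryotic_neighbors_on_contig: bool = False  # bona fide eukaryotic genes on same contig
    has_spliceosomal_introns: bool = False
    conserved_in_sister_species: bool = False
    has_transcript_support: bool = False


def supporting_evidence(c: Candidate) -> int:
    """Count how many of the four independent lines of evidence are present."""
    return sum([
        c.eukaryotic_neighbors_on_contig,
        c.has_spliceosomal_introns,
        c.conserved_in_sister_species,
        c.has_transcript_support,
    ])


def triage(c: Candidate, identity_cutoff: float = 70.0, min_evidence: int = 2) -> str:
    """Label a candidate; min_evidence is an arbitrary value chosen for this sketch."""
    if supporting_evidence(c) >= min_evidence:
        return "supported candidate LGT"
    if c.percent_identity >= identity_cutoff:
        return "likely contamination"
    return "unsupported candidate LGT"


# Hypothetical candidates.
bare_hit = Candidate("gene_1", 95.0)
backed_hit = Candidate("gene_2", 95.0,
                       eukaryotic_neighbors_on_contig=True,
                       has_spliceosomal_introns=True)
labels = [triage(bare_hit), triage(backed_hit)]
```

The design point is that high identity alone defaults to contamination, and only independent genomic or transcriptional evidence moves a candidate out of that category.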

The Similarities

Cells come in many specialized forms, such as neurons, epithelial cells, and muscle cells, but every cell is built on one of two basic plans: prokaryotic or eukaryotic. The following points cover the main similarities between the two.

  • Both contain DNA as their genetic material.
  • Both contain RNA.
  • Both are enclosed by a cell membrane.
  • They resemble each other in basic chemical composition. Both are made up of carbohydrates, proteins, nucleic acids, minerals, fats, and vitamins.
  • Both have ribosomes, which make proteins.
  • Both regulate the flow of nutrients and waste matter into and out of the cell.
  • Both carry out basic life processes such as reproduction, and some members of each group carry out photosynthesis.
  • Both need an energy supply to survive.
  • Both have ‘chemical noses’ that keep them updated and aware of the reactions that occur within them and in the surrounding environment.
  • Both have a fluid-like matrix called the cytoplasm that fills the cell.
  • Both have a cytoskeleton within the cell to support them.
  • Both can have thin extensions of the plasma membrane that are supported by the cytoskeleton.
  • Flagella and cilia are found in eukaryotes; likewise, endoflagella, fimbriae, pili and flagella are found in prokaryotes. They are used for motility, for adhering to surfaces, or for moving matter outside the cell.
  • Some prokaryotic and eukaryotic cells have a glycocalyx. This is a sticky, sugar-based structure that helps cells anchor to each other, thus giving them some protection.
  • Both have a lipid bilayer, known as the plasma membrane, that forms the boundary between the inside and outside of the cell.

There are also many differences between them, chiefly in evolutionary age and structure. Scientists believe that eukaryotic cells evolved from prokaryotic cells. In short, both are the smallest units of life.

Pre-mRNAs Are Cleaved at Specific 3′ Sites and Rapidly Polyadenylated

In animal cells, all mRNAs, except histone mRNAs, have a 3′ poly(A) tail. Early studies of pulse-labeled adenovirus and SV40 RNA demonstrated that the viral primary transcripts extend beyond the poly(A) site in the viral mRNAs. These results suggested that A residues are added to a 3′ hydroxyl generated by endonucleolytic cleavage, but the predicted downstream RNA fragments are degraded so rapidly in vivo that they cannot be detected. However, this cleavage mechanism was firmly established by detection of both predicted cleavage products in in vitro processing reactions performed with extracts of HeLa-cell nuclei.

Early sequencing of cDNA clones from animal cells showed that nearly all mRNAs contain the sequence AAUAAA 10–35 nucleotides upstream from the poly(A) tail. Polyadenylation of RNA transcripts from transfected genes is virtually eliminated when template DNA encoding the AAUAAA sequence is mutated to any other sequence except one encoding AUUAAA. The unprocessed RNA transcripts produced from such mutant templates do not accumulate in nuclei, but are rapidly degraded. Further mutagenesis of sequences within a few hundred bases of poly(A) sites revealed that a second signal downstream from the cleavage site is required for efficient cleavage and polyadenylation of most pre-mRNAs in animal cells. This downstream poly(A) signal is not a specific sequence but rather a GU-rich or simply a U-rich region within about 50 nucleotides of the cleavage site.
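The upstream signal described above lends itself to a short Python sketch that looks for AAUAAA (or the tolerated variant AUUAAA) upstream of a known cleavage site. The function name, the zero-based coordinates and the default 10 to 35 nucleotide distance window are assumptions made for illustration; the sketch does not model the GU- or U-rich downstream element.

```python
from typing import List, Tuple

SIGNALS = ("AAUAAA", "AUUAAA")  # canonical signal and the one tolerated variant


def find_polya_signals(rna: str, cleavage_site: int,
                       min_dist: int = 10, max_dist: int = 35) -> List[Tuple[int, str, int]]:
    """Return (start index, hexamer, distance to cleavage site) for each upstream signal.

    cleavage_site is the zero-based index of the first nucleotide after the cut;
    distance counts the nucleotides between the end of the hexamer and the cut.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for start in range(max(0, cleavage_site - max_dist - 6), cleavage_site):
        hexamer = rna[start:start + 6]
        distance = cleavage_site - (start + 6)
        if hexamer in SIGNALS and min_dist <= distance <= max_dist:
            hits.append((start, hexamer, distance))
    return hits


# Hypothetical transcript: signal, 20-nt spacer, then a GU-rich downstream region.
transcript = "GGG" + "AAUAAA" + "C" * 20 + "GUGUGUGUUU"
signals = find_polya_signals(transcript, cleavage_site=29)
```

A fuller model would also score the downstream region for GU or U richness before calling a site, since the text notes that both elements are needed for efficient processing.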

Identification and purification of the proteins required for cleavage and polyadenylation of pre-mRNA has led to the model shown in Figure 11-12. According to this model, a 360-kDa cleavage and polyadenylation specificity factor (CPSF), composed of four different polypeptides, first forms an unstable complex with the upstream AU-rich poly(A) signal. Then at least three additional proteins bind to the CPSF-RNA complex: a 200-kDa heterotrimer called cleavage stimulatory factor (CStF), a 150-kDa heterotrimer called cleavage factor I (CFI), and a second cleavage factor (CFII), as yet poorly characterized. Interaction between CStF and the GU- or U-rich downstream poly(A) signal stabilizes the multiprotein complex. Finally, a poly(A) polymerase (PAP) binds to the complex before cleavage can occur. This requirement for PAP binding links cleavage and polyadenylation, so that the free 3′ ends generated are rapidly polyadenylated. Assembly of this large, multiprotein cleavage-polyadenylation complex around the AU-rich poly(A) signal in a pre-mRNA is analogous in many ways to formation of the transcription-initiation complex at the AT-rich TATA box of a template DNA molecule (see Figure 10-50). In both cases, multiprotein complexes assemble cooperatively through a network of specific protein-nucleic acid and protein-protein interactions.

Figure 11-12

Model for cleavage and polyadenylation of pre-mRNAs in mammalian cells. Cleavage-and-polyadenylation specificity factor (CPSF) binds to an upstream AAUAAA polyadenylation signal. CStF interacts with a downstream GU- or U-rich sequence and with bound CPSF…

Following cleavage at the poly(A) site, polyadenylation proceeds in two phases. Addition of the first 12 or so A residues occurs slowly, followed by rapid addition of up to 200–250 more A residues. The rapid phase requires the binding of multiple copies of a poly(A)-binding protein containing the RNP motif. This protein is designated PABII to distinguish it from the poly(A)-binding protein that binds to the poly(A) tail of cytoplasmic mRNAs. PABII binds to the short A tail initially added by PAP, stimulating polymerization of additional A residues by PAP (see Figure 11-12). PABII is also responsible for signaling poly(A) polymerase to terminate polymerization when the poly(A) tail reaches a length of 200–250 residues, although the mechanism for measuring this length is not yet understood.


iGEM Wageningen UR, Wageningen University, Wageningen, The Netherlands

Matthijn C Hesselman, Jasper J Koehorst, Thijs Slijkhuis, Dorett I Odoni, Floor Hugenholtz & Mark W J van Passel

Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands

Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands


A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

Background: Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.

Results: We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. 
Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.
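The notion of a phyletic pattern used in these results can be made concrete with a small Python sketch. It encodes each KOG as a presence/absence string over the seven genomes, then flags KOGs represented in six or seven species and single-member, pan-eukaryotic KOGs. The species labels, function names, and toy KOG entries are all hypothetical and invented for illustration. They are not data from the study.

```python
# Shorthand labels for the seven genomes named in the abstract (label choice
# and ordering are assumptions made for this sketch).
SPECIES = (
    "C. elegans", "D. melanogaster", "H. sapiens", "A. thaliana",
    "S. cerevisiae", "S. pombe", "E. cuniculi",
)


def phyletic_pattern(counts):
    """Presence/absence string over SPECIES; `counts` maps species to gene count."""
    return "".join("1" if counts.get(s, 0) > 0 else "0" for s in SPECIES)


def is_widely_conserved(counts, min_species=6):
    """True if the KOG is represented in at least `min_species` genomes."""
    return phyletic_pattern(counts).count("1") >= min_species


def is_single_member_pan_eukaryotic(counts):
    """True if every genome contributes exactly one member to the KOG."""
    return all(counts.get(s, 0) == 1 for s in SPECIES)


def fraction_widely_conserved(kogs, min_species=6):
    """Share of KOGs present in at least `min_species` genomes."""
    if not kogs:
        return 0.0
    hits = sum(is_widely_conserved(c, min_species) for c in kogs.values())
    return hits / len(kogs)


# Toy input, invented for illustration only.
toy_kogs = {
    "toy_KOG_1": {s: 1 for s in SPECIES},
    "toy_KOG_2": {"H. sapiens": 3, "D. melanogaster": 2},
    "toy_KOG_3": {**{s: 1 for s in SPECIES[:6]}, "A. thaliana": 4},
}

for name, counts in toy_kogs.items():
    print(name, phyletic_pattern(counts), is_single_member_pan_eukaryotic(counts))
```

Patterns in this form are the usual input for parsimony reconstructions of gene gain and loss, which would additionally require a species tree and are not attempted here.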

Conclusions: The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.


Assignment of proteins from each of the seven analyzed eukaryotic genomes to KOGs…

Distribution of the KOGs by the number of paralogs in each of the…

Functional breakdown of the KOGs. Designations of functional categories: A, RNA processing and…

Variation of amino-acid substitution rates among KOGs. (a) Probability-density function for the distribution…

Parsimonious scenarios of loss and emergence of genes (KOGs) in eukaryotic evolution. (a)…

Correspondence between eukaryotic and prokaryotic orthologous gene sets. (a) Representation of prokaryotic counterparts…

Gene dispensability in yeast and worm and phyletic patterns of the respective KOGs.…

Section Summary

In both prokaryotic and eukaryotic cell division, the genomic DNA is replicated and each copy is allocated to a daughter cell. The cytoplasmic contents are also divided between the daughter cells. However, there are many differences between prokaryotic and eukaryotic cell division. Bacteria have a single, circular DNA chromosome and no nucleus, so mitosis is not necessary in bacterial cell division. Bacterial cytokinesis is directed by a ring composed of a protein called FtsZ. Ingrowth of membrane and cell-wall material from the periphery of the cell produces a septum that eventually forms the separate cell walls of the daughter cells.
