dbACP: A Comprehensive Database of Anti-Cancer Peptides

dbacp03663

General Description

Peptide name : Large ribosomal subunit protein mL40

Source/Organism : Human

Linear/Cyclic : Not found

Chirality : Not found

Sequence Information

Sequence : MTASVLRSISLALRPTSGLLGTWQTQLRETHQRASLLSFWELIPMRSEPLRKKKKVDPKKDQEAKERLKRKIRKLEKATQELIPIEDFITPLKFLDKARERPQVELTFEETERRALLLKKWSLYKQQERKMERDTIRAMLEAQQEALEELQLESPKLHAEAIKRDPNLFPFEKEGPHYTPPIPNYQPPEGRYNDITKVYTQVEFKR

Peptide length: 206

C-terminal modification: Not found

N-terminal modification : Not found

Non-natural peptide information: None

Activity Information

Assay type : Antibody-based assay

Assay time : 48h

Activity : Not found

Cell line : HaCaT

Cancer type : Not specified

Other activity : Not found

Physicochemical Properties

Amino acid composition bar chart : (figure not included)

Molecular mass : 24490.0558 Dalton

Aliphatic index : 0.810
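The listed aliphatic index appears to be on a 0-1 scale rather than the more common percent scale. The sketch below assumes the standard Ikai formula (alanine, valine weighted 2.9, isoleucine plus leucine weighted 3.9) and is not confirmed to be how dbACP computes the field; the names `SEQ` and `aliphatic_index` are illustrative.

```python
# Sequence copied from the "Sequence" field of this entry.
SEQ = (
    "MTASVLRSISLALRPTSGLLGTWQTQLRETHQRASLLSFWELIPMRSEPL"
    "RKKKKVDPKKDQEAKERLKRKIRKLEKATQELIPIEDFITPLKFLDKARE"
    "RPQVELTFEETERRALLLKKWSLYKQQERKMERDTIRAMLEAQQEALEEL"
    "QLESPKLHAEAIKRDPNLFPFEKEGPHYTPPIPNYQPPEGRYNDITKVYT"
    "QVEFKR"
)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index on the percent scale (assumed Ikai formula)."""
    n = len(seq)
    x_ala = seq.count("A") / n
    x_val = seq.count("V") / n
    x_ile_leu = (seq.count("I") + seq.count("L")) / n
    return 100.0 * (x_ala + 2.9 * x_val + 3.9 * x_ile_leu)


ai = aliphatic_index(SEQ)
# Dividing by 100 gives the 0-1 scale that this record seems to use.
print(f"percent scale: {ai:.2f}  fraction scale: {ai / 100:.3f}")
```

Under these assumptions the fraction-scale result agrees with the 0.810 listed above, which corresponds to roughly 81 on the percent scale.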

Instability index : 46.0733

Hydrophobicity (GRAVY) : -0.912
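GRAVY is conventionally the mean Kyte-Doolittle hydropathy over all residues. The sketch below assumes that definition and the standard Kyte-Doolittle scale; `KD` and `gravy` are illustrative names, not part of any dbACP tooling.

```python
# Sequence copied from the "Sequence" field of this entry.
SEQ = (
    "MTASVLRSISLALRPTSGLLGTWQTQLRETHQRASLLSFWELIPMRSEPL"
    "RKKKKVDPKKDQEAKERLKRKIRKLEKATQELIPIEDFITPLKFLDKARE"
    "RPQVELTFEETERRALLLKKWSLYKQQERKMERDTIRAMLEAQQEALEEL"
    "QLESPKLHAEAIKRDPNLFPFEKEGPHYTPPIPNYQPPEGRYNDITKVYT"
    "QVEFKR"
)

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean hydropathy per residue (grand average of hydropathy)."""
    return sum(KD[aa] for aa in seq) / len(seq)


print(f"GRAVY: {gravy(SEQ):.4f}")
```

The result is about -0.9126, so the -0.912 shown in this record looks truncated rather than rounded.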

Isoelectric point : 9.6185

Charge (pH 7) : 8.8128

Aromaticity : 0.072

Molar extinction coefficient (cysteine, cystine): (23950, 23950)
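The two extinction values are identical because the sequence has no cysteine, so no cystine term can contribute. A minimal sketch, assuming the ProtParam-style coefficients at 280 nm (5500 per Trp, 1490 per Tyr, 125 per cystine); the function name is illustrative.

```python
# Sequence copied from the "Sequence" field of this entry.
SEQ = (
    "MTASVLRSISLALRPTSGLLGTWQTQLRETHQRASLLSFWELIPMRSEPL"
    "RKKKKVDPKKDQEAKERLKRKIRKLEKATQELIPIEDFITPLKFLDKARE"
    "RPQVELTFEETERRALLLKKWSLYKQQERKMERDTIRAMLEAQQEALEEL"
    "QLESPKLHAEAIKRDPNLFPFEKEGPHYTPPIPNYQPPEGRYNDITKVYT"
    "QVEFKR"
)


def extinction_coefficients(seq: str) -> tuple[int, int]:
    """Return (all Cys reduced, all Cys paired as cystine) in M^-1 cm^-1."""
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    # Each disulfide-bonded pair of cysteines adds 125 (assumed coefficient).
    with_cystine = base + 125 * (seq.count("C") // 2)
    return base, with_cystine


print(extinction_coefficients(SEQ))
```

With 3 Trp and 5 Tyr this gives 3 x 5500 + 5 x 1490 = 23950 for both entries, matching the listed pair.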

Hydrophobic/hydrophilic ratio : 0.73109243
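The record does not say which residues count as hydrophobic. The sketch below assumes the grouping A, V, I, L, M, F, W, P, G as hydrophobic and everything else as hydrophilic; this grouping is a guess that happens to reproduce the listed ratio, not a documented dbACP definition.

```python
# Sequence copied from the "Sequence" field of this entry.
SEQ = (
    "MTASVLRSISLALRPTSGLLGTWQTQLRETHQRASLLSFWELIPMRSEPL"
    "RKKKKVDPKKDQEAKERLKRKIRKLEKATQELIPIEDFITPLKFLDKARE"
    "RPQVELTFEETERRALLLKKWSLYKQQERKMERDTIRAMLEAQQEALEEL"
    "QLESPKLHAEAIKRDPNLFPFEKEGPHYTPPIPNYQPPEGRYNDITKVYT"
    "QVEFKR"
)

# Assumed hydrophobic class; not stated anywhere in this entry.
HYDROPHOBIC = set("AVILMFWPG")


def hydro_ratio(seq: str) -> tuple[int, int, float]:
    """Return (hydrophobic count, hydrophilic count, their ratio)."""
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    hydrophilic = len(seq) - hydrophobic
    # Assumes at least one hydrophilic residue, which holds for this sequence.
    return hydrophobic, hydrophilic, hydrophobic / hydrophilic


n_phobic, n_philic, ratio = hydro_ratio(SEQ)
print(n_phobic, n_philic, f"{ratio:.8f}")
```

Under this grouping the counts are 87 hydrophobic and 119 hydrophilic residues, and 87/119 is approximately 0.73109244, in line with the value above.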

Hydrophobic moment : -0.246

Missing amino acid : C

Most occurring amino acid : L

Most occurring amino acid frequency : 26

Least occurring amino acid : W

Least occurring amino acid frequency : 3
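The composition fields above (missing, most occurring, least occurring) can be rechecked directly from the sequence. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

# Sequence copied from the "Sequence" field of this entry.
SEQ = (
    "MTASVLRSISLALRPTSGLLGTWQTQLRETHQRASLLSFWELIPMRSEPL"
    "RKKKKVDPKKDQEAKERLKRKIRKLEKATQELIPIEDFITPLKFLDKARE"
    "RPQVELTFEETERRALLLKKWSLYKQQERKMERDTIRAMLEAQQEALEEL"
    "QLESPKLHAEAIKRDPNLFPFEKEGPHYTPPIPNYQPPEGRYNDITKVYT"
    "QVEFKR"
)

STANDARD = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

counts = Counter(SEQ)

# Standard residues that never occur in the sequence.
missing = [aa for aa in STANDARD if counts[aa] == 0]

# Most frequent residue and its count.
most_aa, most_n = counts.most_common(1)[0]

# All residues sharing the lowest non-zero count (ties are kept together).
least_n = min(counts.values())
least = sorted(aa for aa, n in counts.items() if n == least_n)

print("length:", len(SEQ))
print("missing:", missing)
print("most occurring:", most_aa, most_n)
print("least occurring:", least, least_n)
```

Because ties are kept together, the least-occurring list can contain more than one residue, whereas the record lists a single representative.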

Structural Information

3D structure : (figure not included)

Secondary structure fraction (Helix, Turn, Sheet): (0.4, 0.1, 0.3)

SMILES Notation: CC[C@H](C)[C@H](NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](C)NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCCN)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CO)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCSC)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCSC)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CO)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](NC(=O)[C@@H]1CCCN1C(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](C)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H](NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@@H]1CCCN1C(=O)[C@H]
(CC(=O)O)NC(=O)[C@@H](NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CO)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCSC)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CO)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CO)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](NC(=O)CNC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(C)C)NC(=O)CNC(=O)[C@H](CO)NC(=O)[C@@H](NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CO)NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@@H](N)CCSC)[C@@H](C)O)C(C)C)[C@@H](C)CC)[C@@H](C)O)[C@@H](C)O)[C@@H](C)O)[C@@H](C)O)[C@@H](C)CC)C(C)C)[C@@H](C)CC)[C@@H](C)O)[C@@H](C)CC)[C@@H](C)CC)[C@@H](C)CC)[C@@H](C)O)C(C)C)[C@@H](C)O)[C@@H](C)O)[C@@H](C)O)[C@@H](C)CC)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(=O)O)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](Cc1ccccc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(=O)O)C(=O)NCC(=O)N1CCC[C@H]1C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@H](C(=O)N1CCC[C@H]1C(=O)N1CCC[C@H]1C(=O)N[C@H](C(=O)N1CCC[C@H]1C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCC(N)=O)C(=O)N1CCC[C@H]1C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(=O)O)C(=O)NCC(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@H](C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1cc
ccc1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(=N)N)C(=O)O)C(C)C)[C@@H](C)O)C(C)C)[C@@H](C)O)[C@@H](C)CC)[C@@H](C)CC)[C@@H](C)O

Secondary Structure :

Method Prediction
GOR EHHHEHEEEEEEECCTTEEEEEEEHHHHHHHHHHHHHHHHHHCHHHHHHHHHHHHCCHTHHHHHHHHHHHHHHHHHHHHHHHCHHHHHHCHHHHHHHHHHCTTHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHTCTTCCHHTTTCCCCCCCCCCCCCCTTCEETEEEEEEHHHHHH
Chou-Fasman (CF) CEEEEEEEECCCCCCEEEEEEEEEEHHHHCHHHHEEEHHHHEECCCHHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHHHHHEECCCEEECHHHHHHHHCCCCCCHHHHHHHHHHHHHHHHEEHHHHHHHHHHEEEHHHHHHHHHHHHHHHHHCCHHHHHHHHCCCCCCHHHHCCCEECCCCCCCCCCCCCCEEEEEEEEECCCCCC
Neural Network (NN) HHHHHHHHHHHCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCHCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEHHHCCC
Joint/Consensus CCCCEEEEECCCCCCCCEEEEEEEHHHHHHHHHHHHHHHHHHCCCCHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCC

Molecular Descriptors and ADMET Properties

Molecular Descriptors: Not available.

ADMET Properties: Not available.

Cross Referencing databases

CancerPPD : Not available

ApIAPDB : Not available

CancerPPD2 ID : Not available

Reference

1 : Collins JE, et al. A genome annotation-driven approach to cloning the human ORFeome. Genome Biol. 2004; 5:R84. doi: 10.1186/gb-2004-5-10-r84

2 : Ota T, et al. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet. 2004; 36:40-5. doi: 10.1038/ng1285

3 : Burkard TR, et al. Initial characterization of the human central proteome. BMC Syst Biol. 2011; 5:17. doi: 10.1186/1752-0509-5-17

4 : Amunts A, et al. Ribosome. The structure of the human mitochondrial ribosome. Science. 2015; 348:95-98. doi: 10.1126/science.aaa1193

5 : Brown A, et al. Structure of the large ribosomal subunit from human mitochondria. Science. 2014; 346:718-722. doi: 10.1126/science.1258026

6 : Funke B, et al. Isolation and characterization of a human gene containing a nuclear localization signal from the critical region for velo-cardio-facial syndrome on 22q11. Genomics. 1998; 53:146-54. doi: 10.1006/geno.1998.5488

7 : Hildebrandt T, et al. Identification of URIM, a novel gene up-regulated in metastasis. Anticancer Res. 1999; 19:525-30.

8 : Gerhard DS, et al. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res. 2004; 14:2121-7. doi: 10.1101/gr.2596504

9 : Brown A, et al. Structures of the human mitochondrial ribosome in native states of assembly. Nat Struct Mol Biol. 2017; 24:866-869. doi: 10.1038/nsmb.3464

10 : Vaca Jacome AS, et al. N-terminome analysis of the human mitochondrial proteome. Proteomics. 2015; 15:2519-24. doi: 10.1002/pmic.201400617

Literature

Paper title : A genome annotation-driven approach to cloning the human ORFeome.

Doi : https://doi.org/10.1186/gb-2004-5-10-r84

Abstract : We have developed a systematic approach to generating cDNA clones containing full-length open reading frames (ORFs), exploiting knowledge of gene structure from genomic sequence. Each ORF was amplified by PCR from a pool of primary cDNAs, cloned and confirmed by sequencing. We obtained clones representing 70% of genes on human chromosome 22, whereas searching available cDNA clone collections found at best 48% from a single collection and 60% for all collections combined.

Paper title : Complete sequencing and characterization of 21,243 full-length human cDNAs.

Doi : https://doi.org/10.1038/ng1285

Abstract : As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at approximately 58% compared with a peak at approximately 42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at approximately 42%, relatively low compared with that of protein-coding cDNAs.

Paper title : Initial characterization of the human central proteome.

Doi : https://doi.org/10.1186/1752-0509-5-17

Abstract : BACKGROUND: On the basis of large proteomics datasets measured from seven human cell lines we consider their intersection as an approximation of the human central proteome, which is the set of proteins ubiquitously expressed in all human cells. Composition and properties of the central proteome are investigated through bioinformatics analyses. RESULTS: We experimentally identify a central proteome comprising 1,124 proteins that are ubiquitously and abundantly expressed in human cells using state of the art mass spectrometry and protein identification bioinformatics. The main represented functions are proteostasis, primary metabolism and proliferation. We further characterize the central proteome considering gene structures, conservation, interaction networks, pathways, drug targets, and coordination of biological processes. Among other new findings, we show that the central proteome is encoded by exon-rich genes, indicating an increased regulatory flexibility through alternative splicing to adapt to multiple environments, and that the protein interaction network linking the central proteome is very efficient for synchronizing translation with other biological processes. Surprisingly, at least 10% of the central proteome has no or very limited functional annotation. CONCLUSIONS: Our data and analysis provide a new and deeper description of the human central proteome compared to previous results thereby extending and complementing our knowledge of commonly expressed human proteins. All the data are made publicly available to help other researchers who, for instance, need to compare or link focused datasets to a common background.

Paper title : Ribosome. The structure of the human mitochondrial ribosome.

Doi : https://doi.org/10.1126/science.aaa1193

Abstract : The highly divergent ribosomes of human mitochondria (mitoribosomes) synthesize 13 essential proteins of oxidative phosphorylation complexes. We have determined the structure of the intact mitoribosome to 3.5 angstrom resolution by means of single-particle electron cryogenic microscopy. It reveals 80 extensively interconnected proteins, 36 of which are specific to mitochondria, and three ribosomal RNA molecules. The head domain of the small subunit, particularly the messenger (mRNA) channel, is highly remodeled. Many intersubunit bridges are specific to the mitoribosome, which adopts conformations involving ratcheting or rolling of the small subunit that are distinct from those seen in bacteria or eukaryotes. An intrinsic guanosine triphosphatase mediates a contact between the head and central protuberance. The structure provides a reference for analysis of mutations that cause severe pathologies and for future drug design.

Paper title : Structure of the large ribosomal subunit from human mitochondria.

Doi : https://doi.org/10.1126/science.1258026

Abstract : Human mitochondrial ribosomes are highly divergent from all other known ribosomes and are specialized to exclusively translate membrane proteins. They are linked with hereditary mitochondrial diseases and are often the unintended targets of various clinically useful antibiotics. Using single-particle cryogenic electron microscopy, we have determined the structure of its large subunit to 3.4 angstrom resolution, revealing 48 proteins, 21 of which are specific to mitochondria. The structure unveils an adaptation of the exit tunnel for hydrophobic nascent peptides, extensive remodeling of the central protuberance, including recruitment of mitochondrial valine transfer RNA (tRNA(Val)) to play an integral structural role, and changes in the tRNA binding sites related to the unusual characteristics of mitochondrial tRNAs.

Paper title : Isolation and characterization of a human gene containing a nuclear localization signal from the critical region for velo-cardio-facial syndrome on 22q11.

Doi : https://doi.org/10.1006/geno.1998.5488

Abstract : Velo-cardio-facial syndrome (VCFS) and DiGeorge syndrome are congenital disorders characterized by craniofacial anomalies, conotruncal heart defects, immune deficiencies, and learning disabilities. Both diseases are associated with similar hemizygous 22q11 deletions, indicating that haploinsufficiency of a gene(s) in 22q11 is responsible for their etiology. We describe here a new gene called NLVCF, which maps to the critical region for VCFS on 22q11 between the genes HIRA and UFD1L. NLVCF encodes a putative protein of 206 amino acids. The coding region encompasses four exons that span a genomic interval of 3.4 kb. Coding sequence analysis revealed that NLVCF is a novel gene that contains two consensus sequences for nuclear localization signals. The Nlvcf mouse homolog is 75% identical in amino acid sequence and maps to the orthologous region on mouse chromosome 16. The human NLVCF transcript is 1.3 kb in size and is expressed at varying levels in many fetal and adult tissues. Whole-mount in situ hybridization showed that Nlvcf is expressed in most structures of 9.5-dpc mouse embryos, with especially high expression in the head as well as in the first and second pharyngeal arches. NLVCF and HIRA are divergently transcribed, and their start codons lie approximately 1 kb apart in both humans and mice. Interestingly, the two genes exhibit a similar expression pattern in mouse embryos, suggesting that they may share common regulatory elements. The pattern of expression of NLVCF and its localization in the critical region suggest that NLVCF may contribute to the etiology of VCFS.

Paper title : Identification of URIM, a novel gene up-regulated in metastasis.

Doi : Not available

Abstract : URIM (Up-Regulated In Metastasis) was identified as a new gene by Differential Display Technique during investigation of the transcriptional pattern of a metastasizing (NMCL-1) and a non-metastasizing (530) human melanoma cell line. The protein is encoded by 206 amino acids with an isoelectric point of 10.4. In addition, URIM displays a putative nuclear localization signal and a putative leucine zipper suggesting that URIM may function as a nuclear protein. Expression of URIM in several normal tissues and tumor cell lines was studied by Northern blotting. Surprisingly, 17 fold increased steady-state mRNA levels for URIM were detected in three cell lines derived from bone marrow micrometastasis of mammary carcinoma and one mammary carcinoma cell line derived from ascites fluid compared to normal epithelial cells from mammary gland and two cell lines derived from primary mammary carcinoma. These findings indicate that expression of URIM might be deregulated in metastases of different types of tumors.

Paper title : The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

Doi : https://doi.org/10.1101/gr.2596504

Abstract : The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.

Paper title : Structures of the human mitochondrial ribosome in native states of assembly.

Doi : https://doi.org/10.1038/nsmb.3464

Abstract : Mammalian mitochondrial ribosomes (mitoribosomes) have less rRNA content and 36 additional proteins compared with the evolutionarily related bacterial ribosome. These differences make the assembly of mitoribosomes more complex than the assembly of bacterial ribosomes, but the molecular details of mitoribosomal biogenesis remain elusive. Here, we report the structures of two late-stage assembly intermediates of the human mitoribosomal large subunit (mt-LSU) isolated from a native pool within a human cell line and solved by cryo-EM to ∼3-Å resolution. Comparison of the structures reveals insights into the timing of rRNA folding and protein incorporation during the final steps of ribosomal maturation and the evolutionary adaptations that are required to preserve biogenesis after the structural diversification of mitoribosomes. Furthermore, the structures redefine the ribosome silencing factor (RsfS) family as multifunctional biogenesis factors and identify two new assembly factors (L0R8F8 and mt-ACP) not previously implicated in mitoribosomal biogenesis.

Paper title : N-terminome analysis of the human mitochondrial proteome.

Doi : https://doi.org/10.1002/pmic.201400617

Abstract : The high throughput characterization of protein N-termini is becoming an emerging challenge in the proteomics and proteogenomics fields. The present study describes the free N-terminome analysis of human mitochondria-enriched samples using trimethoxyphenyl phosphonium (TMPP) labelling approaches. Owing to the extent of protein import and cleavage for mitochondrial proteins, determining the new N-termini generated after translocation/processing events for mitochondrial proteins is crucial to understand the transformation of precursors to mature proteins. The doublet N-terminal oriented proteomics (dN-TOP) strategy based on a double light/heavy TMPP labelling has been optimized in order to improve and automate the workflow for efficient, fast and reliable high throughput N-terminome analysis. A total of 2714 proteins were identified and 897 N-terminal peptides were characterized (424 N-α-acetylated and 473 TMPP-labelled peptides). These results allowed the precise identification of the N-terminus of 693 unique proteins corresponding to 26% of all identified proteins. Overall, 120 already annotated processing cleavage sites were confirmed while 302 new cleavage sites were characterized. The accumulation of experimental evidence of mature N-termini should allow increasing the knowledge of processing mechanisms and consequently also enhance cleavage sites prediction algorithms. Complete datasets have been deposited to the ProteomeXchange Consortium with identifiers PXD001521, PXD001522 and PXD001523 (http://proteomecentral.proteomexchange.org/dataset/PXD001521, http://proteomecentral.proteomexchange.org/dataset/PXD0001522 and http://proteomecentral.proteomexchange.org/dataset/PXD001523, respectively).