

CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine

Lei Kong, Yong Zhang, Zhi-Qiang Ye, Xiao-Qiao Liu, Shu-Qi Zhao, Liping Wei* and Ge Gao*

Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P. R. China

Received January 30, 2007; Revised April 13, 2007; Accepted May 1, 2007

ABSTRACT

Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but function as noncoding RNAs (ncRNAs) instead. As millions of transcripts are generated by large-scale cDNA and EST sequencing projects every year, there is a need for automatic methods to distinguish protein-coding RNAs from noncoding RNAs accurately and quickly. We developed a support vector machine-based classifier, named Coding Potential Calculator (CPC), to assess the protein-coding potential of a transcript based on six biologically meaningful sequence features. Tenfold cross-validation on the training dataset and further testing on several large datasets showed that CPC can discriminate coding from noncoding transcripts with high accuracy. Furthermore, CPC also runs an order-of-magnitude faster than a previous state-of-the-art tool and has higher accuracy. We developed a user-friendly web-based interface of CPC at http://cpc.cbi.pku.edu.cn. In addition to predicting the coding potential of the input transcripts, the CPC web server also graphically displays detailed sequence features and additional annotations of the transcript that may facilitate users’ further investigation.

INTRODUCTION

Recent transcriptome studies have revealed that a large number of transcripts in mammals and other organisms do not encode proteins but function as noncoding RNAs (ncRNAs) instead.
In vivo experiments have demonstrated important biological roles of noncoding RNAs, including regulation of transcription and translation, RNA modification and epigenetic modification of chromatin structure (1–3). There is immense interest within the biological community to identify and study new noncoding RNAs.

As millions of transcripts are generated by large-scale cDNA and EST sequencing projects every year, there is a need for automatic methods to accurately and quickly distinguish protein-coding RNAs from noncoding RNAs. Since to date no web server and few standalone tools have been designed for this purpose, researchers sometimes used tools developed for other purposes such as cDNA annotation and functional domain identification (4–12). However, these methods showed varied performance on different datasets (12,13). Recently a new algorithm and standalone software named CONC was published that classifies transcripts as ‘coding’ or ‘noncoding’ using machine learning methods (13). CONC showed improved performance over previous tools such as ESTScan (6). However, CONC is slow for large datasets and does not have a web-server interface, limiting its usefulness. It works well with high-quality transcripts but may suffer from errors such as frameshifts, which are common in ESTs and even occur occasionally in full-length cDNAs (11). Furthermore, CONC only outputs the ‘coding’/‘noncoding’ classification but does not provide an explanation or related information. New tools are desired that are more accurate, run faster, and have a more user-friendly web-based interface.

METHODS

To assess a transcript’s coding potential, we extract six features from the transcript’s nucleotide sequence. A true protein-coding transcript is more likely to have a long and high-quality Open Reading Frame (ORF) compared with a noncoding transcript. Thus, our first three features assess the extent and quality of the ORF in a transcript. We use the framefinder software (14) to identify the

 