Clustering and Classification Methods Used in Biosequence Analysis

Year: 2011
Authors: Cagin Kandemir-Cavas and Efendi Nasibov
Publisher: LAP Lambert Academic Publishing
Since human genome studies have produced a huge volume of biosequence data, computational techniques have been developed to reduce the cost and time of managing these data. This book studies new approaches to clustering and classification methods in the analysis of biosequences (protein and enzyme sequences). Classification is a supervised learning task that aims to assign class labels to a set of patterns under the supervision of an expert. Accordingly, the prediction of the subcellular location of proteins and the classification of enzymes have been tackled using data mining techniques. Clustering is an unsupervised learning technique that aims to decompose a given set of elements into clusters based on similarity. Because protein sequences are evolutionarily related, all protein sequences can be organized in terms of their sequence similarity. A graphical illustration called a phylogenetic tree can summarize the…