Classification and prediction of three-branched and multi-branched stigmas in saffron using statistical unsupervised machine learning tools

Article type: Research article

Author

Assistant Professor, Department of Biology, Faculty of Basic Sciences, University of Qom

Abstract

Saffron is a sterile triploid plant that is used in all countries as a spice and a medicinal plant. The stigma is the most important part of the saffron plant. To date, no reliable molecular method has been proposed for identifying and predicting species with three-branched and multi-branched stigmas. In this study, new methods for predicting saffron stigma type are presented, based on sequence-related amplified polymorphism (SRAP) molecular markers and several bioinformatics algorithms. Five alleles, M131400, M151200, M151100, M10850 and G6500, were selected by Attribute Weighting models as the most important classifiers with high prediction accuracy; they have high potential for clustering and for distinguishing three-branched from multi-branched stigmas. Unsupervised classification based on the K-Means and K-Medoids algorithms was able to cluster saffron stigmas correctly. The results show, for the first time, that data mining methods can serve as a highly effective approach, with accuracy and precision above 90 percent, for the genetic differentiation of three-branched from multi-branched stigmas. These methods can be used in gene mapping and biomarker-assisted selection.



Article title [English]

Classification and prediction of three-branched and multi-branched stigmas in saffron using statistical unsupervised machine learning tools

Author [English]

  • Amir Hosein Beiki
Assistant Professor, Department of Biology, Faculty of Science, Qom University
Abstract [English]

Saffron is a sterile triploid plant used as a spice and medicinal plant in all countries. The stigma is the most important part of saffron. So far, no reliable molecular method has been available to identify and predict species with three-branched or multi-branched stigmas. In this study, new prediction tools based on Sequence-Related Amplified Polymorphism (SRAP) molecular markers are presented, using different bioinformatics algorithms. Five alleles (M131400, M151200, M151100, M10850 and G6500) were selected by Attribute Weighting models as the most important classifiers; they have the potential to cluster stigmas and to distinguish three-branched from multi-branched ones. The unsupervised K-Means and K-Medoids clustering algorithms were fully able to assign each genotype to the correct class. Our results show, for the first time, that data mining techniques can be used effectively for genetic differentiation between three-branched and multi-branched stigmas, with accuracy and precision above 90 percent. These methods can be used in gene mapping and biomarker-assisted selection.
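
As a rough illustration of how K-Medoids clustering can separate genotypes scored for marker bands, the following minimal sketch clusters a toy presence/absence matrix. Everything in it is an assumption made for illustration: the band scores are invented, Hamming distance is one plausible dissimilarity for binary band data, and the exhaustive medoid search is a simple stand-in that only works on tiny data sets. It is not the pipeline used in the study.

```python
from itertools import combinations

def hamming(a, b):
    """Number of marker positions at which two band profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def total_cost(profiles, medoids):
    """Sum of distances from every profile to its nearest medoid."""
    return sum(min(hamming(p, profiles[m]) for m in medoids) for p in profiles)

def k_medoids(profiles, k):
    """Exhaustive k-medoids: try every set of k medoids and keep the cheapest.

    Practical only for toy data; real tools use iterative heuristics.
    """
    best = min(combinations(range(len(profiles)), k),
               key=lambda medoids: total_cost(profiles, medoids))
    # Assign each profile to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: hamming(p, profiles[best[c]]))
              for p in profiles]
    return list(best), labels

# Hypothetical data: rows are plants, columns are presence (1) or absence (0)
# of five marker bands. The values are invented for this example.
profiles = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]

medoids, labels = k_medoids(profiles, 2)
print(medoids, labels)
```

On this toy matrix the first three plants and the last three plants end up in separate clusters, each represented by one actual plant (the medoid). Using real observations as cluster centres is what distinguishes K-Medoids from K-Means, which averages the profiles instead.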

Keywords [English]

  • Classifier
  • Machine learning
  • Molecular marker
  • Sequence-related amplified polymorphism
