Finding the Best Classification Threshold in Imbalanced Classification

Title

Finding the Best Classification Threshold in Imbalanced Classification

Pages

7

Publication year

2016

Format

PDF

Publisher

Elsevier

Affiliation

School of Computer Science and Technology, Tianjin University, China

Related fields

Mathematics and Statistics, Computer Science

Keywords

Receiver operating characteristic (ROC), protein remote homology detection, imbalanced data, F-score

Related subfields

Data mining, Operations research

Journal

Big Data Research

Abstract

Classification with imbalanced class distributions is a major problem in machine learning, and researchers have given considerable attention to its applications in many real-world scenarios. Although several works have used the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers in imbalanced classification, few studies have been devoted to finding the classification threshold for testing or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole measure in imbalanced data classification problems. In addition, we propose a novel framework for finding the best classification threshold. Experiments with SCOP v.1.53 data reveal that our proposed framework achieves a 20.63% improvement in F-score compared with more commonly used methods that keep the default threshold of 0.5. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available via datamining.xmu.edu.cn/prht/ or prht.sinaapp.com/.
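The abstract argues that a fixed 0.5 cutoff is usually unsuitable for imbalanced data. A generic threshold sweep illustrates the idea. Score a held-out validation set, try each distinct score as the cutoff, and keep the one with the highest F-score.

This is a minimal sketch under our own assumptions, not the authors' framework:
- Labels are 1 (minority, positive) or 0.
- Higher scores mean "more likely positive".
- The toy scores are invented for illustration.
- The helper names `f1_at_threshold` and `best_threshold` are hypothetical.

```python
from typing import Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 when every sample with score >= threshold is predicted positive.

    Assumption of this sketch: label 1 = positive (minority) class, 0 = negative.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when there are no true positives.
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical helper: try 0.5 and every distinct score as a cutoff; keep the best F1."""
    best_t, best_f1 = 0.5, f1_at_threshold(scores, labels, 0.5)
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:  # strict '>' keeps the first cutoff found on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Invented validation scores: 8 negatives, 2 positives (imbalanced).
val_scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.40, 0.35, 0.42]
val_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(f1_at_threshold(val_scores, val_labels, 0.5))  # 0.0: no score reaches 0.5
print(best_threshold(val_scores, val_labels))        # cutoff 0.35 gives F1 = 0.8
```

In practice the cutoff would be chosen on validation or cross-validation scores and then fixed before the test set is scored. Choosing it with the test labels themselves would overstate performance.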

5. Conclusions

This study explored the disadvantage of using AUC for protein remote homology detection and proposed a novel method for finding the proper prediction probability threshold for a testing set. Experimental evaluation on an established benchmark showed that the proposed method can effectively improve prediction performance over more commonly employed methods. In the future, we intend to explore the efficiency of classifying a testing set with a function rather than a single threshold, and we expect that a linear function will achieve better performance. Other approaches for finding the proper prediction probability threshold should also be explored. Examples include neural-like computing models [40–43] and Hadoop-based methods [44,45], which have been widely used in pattern recognition.
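A toy example makes the disadvantage of AUC concrete. AUC depends only on how positives are ranked relative to negatives. A classifier can therefore reach AUC = 1.0 while predicting no positives at all at the default 0.5 cutoff.

The following is a minimal sketch under stated assumptions:
- The scores are invented and are not data from the paper.
- AUC is computed with its pairwise-ranking definition, with ties counted as 1/2.
- The helper names `auc` and `f1_at_threshold` are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via its rank interpretation: the fraction of (positive, negative)
    pairs in which the positive sample scores higher, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 when samples with score >= threshold are predicted positive (label 1)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Invented scores: every positive outranks every negative, but all scores are below 0.5.
scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.28, 0.30, 0.35]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(auc(scores, labels))                    # 1.0: perfect ranking
print(f1_at_threshold(scores, labels, 0.5))   # 0.0: default cutoff predicts no positives
print(f1_at_threshold(scores, labels, 0.30))  # 1.0: a lower cutoff separates the classes
```

AUC summarizes all possible cutoffs at once, so it cannot by itself say which cutoff to use on a testing set. The threshold has to be chosen separately.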
