Self-optimized cost sensitive classifiers for imbalanced datasets and its applications in storage systems failure prediction

Perry, Todd (2015) Self-optimized cost sensitive classifiers for imbalanced datasets and its applications in storage systems failure prediction. BSc dissertation, University of Portsmouth.

    Abstract

    In many real-world scenarios, datasets tend to suffer from the class imbalance problem, where the number of records belonging to one class is far larger than the number of records belonging to one or more other classes. Class imbalance has been shown to adversely affect the performance of classifiers. Several approaches have been proposed that attempt to improve the performance of imbalanced classification by either modifying the dataset (resampling) or assigning misclassification costs to the classes (cost matrix). These methods have been shown to improve performance, but they come with many parameters that need to be set, which usually requires a lengthy exhaustive search. This paper proposes three algorithms: Genetically Optimized Cost Matrix (GOCM), Genetically Optimized Cost Sensitive Random Forest (GOCRF) and Genetically Optimized Cost Sensitive Random Forest with Undersampling (GOCRFU). Each of these algorithms improves performance by self-optimizing the parameters of the cost matrix (GOCM), as well as some parameters related to Random Forests (GOCRF, GOCRFU) and parameters related to resampling (GOCRFU). The proposed approach is compared against unoptimized classifiers and other techniques (AdaBoost, SMOTE) using a variety of datasets. In addition, failure prediction datasets provided by Seagate UK are used as a case study of a real-world problem involving imbalanced classification.
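
To make the idea of tuning a cost matrix with a genetic search concrete, the following is a minimal sketch only. It is not the dissertation's GOCM implementation, which is not reproduced on this page. Everything in it is an assumption made for illustration: the two-class setting, the single tunable false-negative cost, the G-mean fitness, the selection, crossover and mutation operators, and the synthetic scores that stand in for a classifier's probability estimates.

```python
import random


def predict(p_minority, fn_cost, fp_cost=1.0):
    """Minimum-expected-cost decision for a two-class problem.

    Predict the minority class (1) when the expected cost of missing it
    (p * fn_cost) exceeds the expected cost of a false alarm
    ((1 - p) * fp_cost).
    """
    return 1 if p_minority * fn_cost > (1.0 - p_minority) * fp_cost else 0


def g_mean(labels, scores, fn_cost):
    """Geometric mean of sensitivity and specificity (assumed fitness)."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, scores):
        pred = predict(p, fn_cost)
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (sens * spec) ** 0.5


def evolve_fn_cost(labels, scores, generations=20, pop_size=12, seed=0):
    """Search for a false-negative cost with a very small genetic algorithm."""
    rng = random.Random(seed)
    population = [rng.uniform(1.0, 50.0) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda c: g_mean(labels, scores, c),
                        reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2.0            # arithmetic crossover
            child *= rng.uniform(0.8, 1.25)  # multiplicative mutation
            children.append(max(1.0, child))
        population = parents + children
    return max(population, key=lambda c: g_mean(labels, scores, c))


def make_data(n=500, minority_rate=0.05, seed=1):
    """Synthetic imbalanced labels plus stand-in probability estimates."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < minority_rate else 0
        # Minority records tend to receive higher scores than majority ones.
        p = rng.betavariate(2, 5) if y else rng.betavariate(1, 12)
        labels.append(y)
        scores.append(p)
    return labels, scores


if __name__ == "__main__":
    labels, scores = make_data()
    best = evolve_fn_cost(labels, scores)
    print("G-mean with unit costs:", g_mean(labels, scores, 1.0))
    print("Evolved false-negative cost:", best)
    print("G-mean with evolved cost:", g_mean(labels, scores, best))
```

With equal costs the decision rule reduces to a 0.5 probability threshold, which rarely fires for the minority class when its scores are low; raising the false-negative cost lowers that threshold to 1 / (1 + fn_cost). A genetic search replaces the exhaustive sweep over candidate costs, and the same pattern would extend to further parameters (for example forest size or an undersampling ratio) by widening each individual from a single number to a vector.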

    Item Type: Dissertation
    Departments/Research Groups: Faculty of Technology > School of Computing
    Depositing User: Jane Polwin
    Date Deposited: 03 Dec 2015 16:48
    Last Modified: 03 Dec 2015 16:48
    URI: http://eprints.port.ac.uk/id/eprint/19076
