Performance analysis of small scale Hadoop clusters

Windsor, Christopher (2015) Performance analysis of small scale Hadoop clusters. BSc dissertation, University of Portsmouth.

    Abstract

    The requirement to identify trends and patterns in large amounts of data within a reasonable time frame has brought about the invention of the term 'Big Data'. Apache Hadoop, a framework built on MapReduce, was created to fulfil this requirement, and there have been many studies to distinguish the factors that affect the performance of Hadoop clusters. The majority of these studies use a large number of nodes (10 or more) and datasets of approximately 100GB or larger, with very few investigating performance on small scale clusters or datasets. With no agreed limit on what size of dataset can be branded as Big Data, this leaves a big knowledge gap in the business market as to how the performance of small Hadoop clusters scales with small datasets.
    With this in mind, the project aims to investigate the performance of small scale Hadoop clusters on small datasets ranging from 1GB to 500GB, using a variety of benchmarking tools provided by Apache Hadoop.

    Item Type: Dissertation
    Departments/Research Groups: Faculty of Technology > School of Computing
    Depositing User: Jane Polwin
    Date Deposited: 03 Dec 2015 14:54
    Last Modified: 03 Dec 2015 14:54
    URI: http://eprints.port.ac.uk/id/eprint/19060
