A MapReduce-like Data-Intensive Processing Framework for Native Data Storage and Formats
Title: Dr.
Phone: (937) 433-2886
Email: gsabin@rnet-tech.com
Title: Dr.
Phone: (937) 433-2886
Email: vnagarajan@rnet-tech.com
Type: Nonprofit College or University
MapReduce is a popular data analytics framework that is widely used in both industry and scientific research. Despite its popularity, several obstacles stand in the way of applying it to some commercial and scientific data analysis applications. The project will develop a Native data FOrmat MapREDuce-like framework, iNFORMER, based on OSU's SciMate architecture. The framework allows MapReduce-like applications to be executed over data stored in a native data format, without first loading the data into the framework. This addresses a major limitation of existing MapReduce-like implementations: they require that data be loaded into specialized file systems, e.g., the Hadoop Distributed File System (HDFS). The overheads and additional data management processes required for this translation can prevent MapReduce from being used in many commercial and scientific environments.
Commercial Applications and Other Benefits: Two large classes of users will benefit from the iNFORMER product. The first is current users of MapReduce-like frameworks. MapReduce is used extensively in commercial applications and is a major component of many Cloud infrastructures. As an example of the extensive use of MapReduce, the LinkedIn group Hadoop Users currently has more than 30,000 members. The second class includes users who currently use alternative data layouts and want to use a MapReduce-like framework. These users are likely to process their data with alternative frameworks at present, because converting the data into a format suitable for MapReduce processing is too expensive. Therefore, we expect a subset of these users to be interested in iNFORMER. These low-level data formats have an extensive user base of potential customers.
For instance, HDF and NetCDF users include groups from academia, industry, and national laboratories that could potentially benefit from iNFORMER. HDF is used by over 600 organizations, with over 200 different data types, and has millions of users.
* Information listed above is at the time of submission. *