Term
|
Definition
Hadoop Distributed File System (HDFS) - high-performance distributed file system that stores data across the nodes of a cluster. |
|
|
Term
|
Definition
MapReduce 2.0 (YARN) - splits the two major functions of the JobTracker, resource management and job scheduling/monitoring, into a global ResourceManager and a per-application ApplicationMaster. |
|
|
Term
|
Definition
Tool for transferring data between structured data stores (such as relational databases) and HDFS/Hadoop storage
|
|
Term
|
Definition
High-level data analysis language layered over MapReduce; its scripts are translated into MapReduce jobs
|
|
Term
|
Definition
Data warehouse that facilitates querying and managing large datasets - uses SQL-like syntax that mimics relational databases
|
|
Term
|
Definition
Utility to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer
|
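The scripts in a streaming job only have to read lines from standard input and write tab-separated key/value lines to standard output. The following is a minimal sketch that assumes a word-count job written in Python. The mapper and the reducer are plain functions over lines, and `run_locally` is a hypothetical helper that imitates the framework's sort between the two phases, so the logic can be checked without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(text: str) -> list[str]:
    """Hypothetical local driver: map, sort (stands in for the shuffle), reduce."""
    mapped = sorted(mapper(text.splitlines()))
    return list(reducer(mapped))


if __name__ == "__main__":
    for line in run_locally("the cat\nthe hat"):
        print(line)
```

In a real job, each function would go in its own script that loops over `sys.stdin`. The scripts would then be passed to the streaming jar through its `-mapper` and `-reducer` options.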
|
Term
|
Definition
Distributed, scalable big data store - stores data as sorted key/value pairs whose keys consist of a row and a column - used for fast lookups
|
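The toy in-memory model below shows why sorted row/column keys make lookups fast. It is a hypothetical class, `SortedKVStore`, and not the HBase client API. Because the `(row, column)` keys stay sorted, a binary search finds any single cell. All cells of one row also sit next to each other, so reading a row takes one contiguous scan.

```python
import bisect
from typing import Optional


class SortedKVStore:
    """Toy sorted key/value store keyed by (row, column); not the HBase API."""

    def __init__(self) -> None:
        self._keys: list[tuple[str, str]] = []  # always kept sorted
        self._values: list[str] = []

    def put(self, row: str, column: str, value: str) -> None:
        key = (row, column)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing cell
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, row: str, column: str) -> Optional[str]:
        # Binary search over sorted keys: O(log n) point lookup.
        key = (row, column)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan_row(self, row: str) -> dict[str, str]:
        # (row, "") sorts before every real column of the row, so the
        # scan starts at the row's first cell and stops at the next row.
        start = bisect.bisect_left(self._keys, (row, ""))
        cells = {}
        for i in range(start, len(self._keys)):
            r, column = self._keys[i]
            if r != row:
                break
            cells[column] = self._values[i]
        return cells
```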
|
Term
|
Definition
Robust, scalable, high-performance key/value store for data storage and retrieval, with cell-based access controls |
|
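With cell-based access control, each individual key/value pair carries its own visibility label. A scan returns only the cells the reader is authorized to see. The sketch below models this with a hypothetical `Cell` type. It assumes a simplified rule where the reader must hold every label on a cell, whereas Accumulo's real visibility expressions also support `|` (OR) and parentheses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    visibility: frozenset[str]  # simplification: ALL labels are required


def visible_cells(cells: list[Cell], authorizations: set[str]) -> list[Cell]:
    """Return only the cells whose labels are all held by the reader."""
    return [c for c in cells if c.visibility <= authorizations]


if __name__ == "__main__":
    cells = [
        Cell("r1", "name", "Al", frozenset()),
        Cell("r1", "ssn", "123", frozenset({"pii"})),
    ]
    print([c.column for c in visible_cells(cells, set())])
```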
|
Term
|
Definition
Serialization framework that serializes data into a compact (optionally compressed) binary format for storage or transfer. Relies heavily on schemas: data is always written and read with a schema |
|
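Avro's binary encoding shows why it depends so heavily on schemas. An encoded record contains only the field values, written in schema order, with no field names or type tags. The sketch below hand-codes two encoding rules from the Avro specification. A `long` is zig-zag encoded and then written as a variable-length base-128 number. A `string` is its length written as a `long`, followed by its UTF-8 bytes. This is not the `avro` library's API, and `USER_SCHEMA` and the helper names are hypothetical.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then variable-length base-128."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z & ~0x7F:          # more than 7 bits left: set continuation bit
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Encode a record whose fields are all 'int', 'long', or 'string'.

    Only values are written, in schema field order.
    """
    encoders = {"int": encode_long, "long": encode_long, "string": encode_string}
    out = b""
    for field in schema["fields"]:
        out += encoders[field["type"]](record[field["name"]])
    return out


USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
    ],
}

if __name__ == "__main__":
    print(encode_record(USER_SCHEMA, {"name": "Al", "age": 30}).hex())
```

With this schema, the record `{"name": "Al", "age": 30}` encodes to just four bytes, `04 41 6c 3c`. Without the schema, a reader cannot tell that the first byte is a string length or that the last byte is a number.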
|
Term
|
Definition
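Columnar storage format for Hadoop - stores the values of each column together instead of storing whole rows, so queries that read only a few columns scan less data. |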
|
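The difference between row and columnar layouts fits in a few lines. Once records are pivoted into one list per column, a query over a single column reads only that list, and runs of similar neighboring values compress well. This is a conceptual illustration only, using the hypothetical helpers `to_columnar` and `run_length_encode`. Real Parquet files add row groups, per-column encodings, and compression on top of this idea.

```python
from itertools import groupby
from typing import Any


def to_columnar(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list of values per column.

    Assumes every row has the same fields as the first row.
    """
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Collapse runs of repeated values into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"id": 1, "city": "Oslo", "amount": 10},
        {"id": 2, "city": "Oslo", "amount": 5},
        {"id": 3, "city": "Rome", "amount": 7},
    ]
    columns = to_columnar(rows)
    print(sum(columns["amount"]))  # reads only the "amount" column
    print(run_length_encode(columns["city"]))
```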
|
Term
|
Definition
Library of scalable machine learning algorithms implemented on top of Hadoop MapReduce
|
|
Term
|
Definition
Distributed real-time computation system - processes streaming data in memory as it arrives, which keeps processing latency low
|
|
Term
|
Definition
Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
|
|
Term
|
Definition
Open-source in-memory key/value stores |
|
|
Term
|
Definition
Fast, general-purpose engine for large-scale data processing
|
|
Term
|
Definition
Workflow scheduler for running batch Hadoop jobs
|
|
Term
|
Definition
NoSQL database for managing large amounts of structured, semi-structured, and unstructured data |
|
|
Term
|
Definition
Design pattern - group records together by a field or set of fields and calculate a numerical aggregate (such as count, sum, min, max, or average) per group...
Structure: mapper, partitioner, reducer |
|
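A local simulation helps show how the mapper, the partitioner, and the reducer divide the work. This sketch assumes hypothetical records with `user` and `duration` fields. The mapper emits `(user, duration)`, and a deterministic hash partitioner sends every value for a given user to the same reducer. The reducer then computes the min, max, count, and mean for its group. These Python functions stand in for framework classes and are not Hadoop APIs.

```python
import zlib
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(record: dict) -> Iterator[tuple[str, float]]:
    """Emit (group key, value to aggregate) for one input record."""
    yield record["user"], record["duration"]


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner: equal keys always reach the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key: str, values: Iterable[float]) -> dict:
    """Compute the numerical aggregates for one group."""
    vals = list(values)
    return {"min": min(vals), "max": max(vals), "count": len(vals),
            "mean": sum(vals) / len(vals)}


def run_job(records: Iterable[dict], num_reducers: int = 2) -> dict[str, dict]:
    # Shuffle: route every (key, value) pair to its reducer, grouped by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in mapper(record):
            partitions[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order.
    results = {}
    for part in partitions:
        for key in sorted(part):
            results[key] = reducer(key, part[key])
    return results


if __name__ == "__main__":
    records = [{"user": "a", "duration": 3}, {"user": "b", "duration": 10},
               {"user": "a", "duration": 5}]
    print(run_job(records))
```

Min, max, and count are associative, so a combiner could pre-aggregate them on the map side. The mean is not associative, so a combiner would have to pass along partial sums and counts instead.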
|
Term
|
Definition
Design pattern - generate an index from a data set to enable fast searches or data enrichment. Building the index takes time, but it greatly reduces search times, and the output can be ingested into a key/value store |
|
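The inverted index pattern maps each term to the documents that contain it. This minimal sketch uses hypothetical document ids and a simple lowercase alphanumeric tokenizer. The mapper emits `(term, doc_id)`, a dictionary stands in for the shuffle, and the reducer turns each group into a sorted, de-duplicated posting list.

```python
import re
from collections import defaultdict
from typing import Iterator


def mapper(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Emit (term, document id) once per distinct term in the document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def reducer(term: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Turn all ids seen for a term into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs: dict[str, str]) -> dict[str, list[str]]:
    grouped: dict[str, list[str]] = defaultdict(list)  # stands in for the shuffle
    for doc_id, text in docs.items():
        for term, d in mapper(doc_id, text):
            grouped[term].append(d)
    return dict(reducer(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    index = build_index({"d1": "Hadoop stores data.", "d2": "Spark processes data"})
    print(index["data"])
```

Each resulting `term -> [doc ids]` pair has the shape that could be loaded into a key/value store for fast lookups.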
|
Term
|
Definition
Design pattern - used to perform concatenation prior to the reduce phase
|
|
Term
|
Definition
Design pattern - uses the MapReduce framework's counter utility to calculate a global sum entirely on the map side, producing no output |
|
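Counting with counters replaces the reduce phase with the framework's own bookkeeping. Each map task increments named counters and writes no records, and the framework adds up the counters from all tasks when the job finishes. The sketch simulates this with `collections.Counter` and assumes hypothetical records that carry a `state` field. In a Java job, the equivalent increment would go through `context.getCounter(group, name).increment(1)`.

```python
from collections import Counter
from typing import Iterable


def map_task(records: Iterable[dict], counters: Counter) -> None:
    """Increment one counter per record's state; write no output records."""
    for record in records:
        state = record.get("state") or "UNKNOWN"
        counters[("STATE", state)] += 1  # key is (group, counter name)


def run_job(splits: list[list[dict]]) -> Counter:
    """Run one map task per input split, then sum every task's counters.

    The summing step stands in for the framework's counter aggregation;
    no reducer runs.
    """
    totals: Counter = Counter()
    for split in splits:
        task_counters: Counter = Counter()
        map_task(split, task_counters)
        totals.update(task_counters)
    return totals


if __name__ == "__main__":
    splits = [[{"state": "CA"}, {"state": "NY"}], [{"state": "CA"}, {}]]
    for (group, name), value in sorted(run_job(splits).items()):
        print(group, name, value)
```

The framework holds counters in memory, so this pattern suits a small, bounded number of distinct counter names (for example, one per state) rather than one per user.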
|
Term
|
Definition
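Filtering pattern - map-side filtering: each mapper evaluates every record and keeps only those that meet a condition, so no reduce phase is needed |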
|
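Because each record is evaluated on its own, filtering needs only a mapper. The sketch below is a hypothetical map-only "distributed grep" in Python. `filter_mapper` passes through every record for which a predicate returns true, and `grep` builds such a predicate from a regular expression.

```python
import re
from typing import Callable, Iterable, Iterator


def filter_mapper(records: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Map-only filtering: test each record alone, emit it unchanged or drop it."""
    for record in records:
        if keep(record):
            yield record


def grep(pattern: str) -> Callable[[str], bool]:
    """Build a predicate that is true when a record matches the regex."""
    compiled = re.compile(pattern)
    return lambda record: compiled.search(record) is not None


if __name__ == "__main__":
    logs = ["INFO job started", "ERROR disk full", "WARN slow node", "ERROR timeout"]
    print(list(filter_mapper(logs, grep(r"^ERROR"))))
```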
|
Term
|
Definition
Filtering pattern - keeps records that are members of a large, predefined set of values, with a tiny possibility of false positives. Example: filtering out comments that don't contain a keyword |
|
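A Bloom filter tests set membership using a fixed-size bit array and several hash functions. It can return false positives but never false negatives, which is the tiny error the definition refers to. The sketch below is a minimal hand-written filter, the hypothetical `BloomFilter` class, which derives its bit positions from one SHA-256 digest. A mapper then keeps the comments that contain at least one keyword from the filter. In the MapReduce pattern, the filter would normally be built beforehand and shipped to every mapper.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Minimal Bloom filter: k bit positions per item in an m-bit array.

    might_contain() can return false positives but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k indexes from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def bloom_filter_mapper(comments: Iterable[str], keywords: BloomFilter) -> Iterator[str]:
    """Keep comments with at least one word the filter may contain."""
    for comment in comments:
        if any(keywords.might_contain(word) for word in comment.lower().split()):
            yield comment


if __name__ == "__main__":
    hot_words = BloomFilter(num_bits=1024, num_hashes=3)
    for keyword in ("hadoop", "spark"):
        hot_words.add(keyword)
    comments = ["I love Hadoop", "nice weather today", "spark is fast"]
    print(list(bloom_filter_mapper(comments, hot_words)))
```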
|