Term
|
Definition
- On-line transaction processing
- Traditional workloads, 'bread and butter' processing
- Volumes of data, transactions grow, networks getting larger
|
|
|
Term
|
Definition
- On-line analytical processing
- Includes the use of data warehouses
- Multidimensional databases
- Data analysis
- The name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exagetical, contemplative and formulaic data analysis models (exagesis=critical explanation, complentative=asking what if questions, formulaic=which parameters must be varied in order to achieve a given outcome)
|
|
|
Term
What are the 12 rules of OLAP? |
|
Definition
- Multidimensional conceptual view
- Transparency
- Accessibility
- Consistent reporting
- Client server architecture
- Generic dimensionality
- Dynamic sparse matrix handling
- Multi-user support
- Unrestricted cross-dimensioal operations
- Intuitive data manipulation
- Flexible reporting
- Unlimited dimensions and aggregation levels
|
|
|
Term
What are the key concepts of Data Mining? |
|
Definition
- The process of discovering hidden patterns and relations in large databases using a variety of advanced analytical techniques
- Attempts to use the computer to discover relationships that can be used to make predictions
- Data mining tools often find unsuspected relationships in data that other techniques would overlook
|
|
|
Term
What are some data mining approaches? |
|
Definition
- Rule-based analysis
- Neural networks
- Fuzzy logic
- K-nearest neighbour
- Genetic algorithms
- Advanced visualisation
- Combination of any of the above
|
|
|
Term
What is the data warehouse? |
|
Definition
- A subject-oriented, ingegrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making
- Subject oriented - The data is organized according to subject data instead of application and contains only the information necessary for 'decision support' processing
- Integrated - Data encoding is made uniform (e.g. sex=f or m, 1 or 2, b or g. Data naming is made consistent)
- Time variant - Data is collected over time and can then be used for comparisons, trends and forecasting
- Non-volatile - The data is not updated or changed once in the data warehouse, but is simply loaded and then accessed. The data warehouse is held quite separately from the operational databse, which supports OLTP
|
|
|
Term
Why is it that a separate warehouse is needed? |
|
Definition
- Performance
- Operational databases are optimized to support known transactions and workloads
- Special data organization, access methods and implementation methods are needed
- Complex OLAP queries would degrade performance for operational transactions
- Missing data
- Decision support requires historical data, which operational databases do not typically maintain
- Data consolidation
- Decision support requires consolidation (aggregation summarization) of data from many heterogeneous resources, including operational databases and external sources
- Data quality
- Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
|
|
|
Term
Show a diagram on how data is extracted |
|
Definition
|
|
Term
How would a data warehouse may be realised? |
|
Definition
- Via a front end to existend databases and files
- In a fresh relational database
- In a multi-dimensional database (MDDB)
- In a proprietary databse format
- Using a mixture of the above
|
|
|
Term
How can data from a data warehouse may be accessed? |
|
Definition
- Decision support systems (DSS)
- Executive Information Systems (EIS)
- Data mining
- On-line analytical processing
|
|
|
Term
What are the characteristics of Data Marts? |
|
Definition
- Focus on either:
- Only one subject area or
- Only one group of users
- An organization can have:
- One enterprise data warehouse
- Many data marts
- Do not contain operational data
- Are more easily understood and navigated
|
|
|
Term
What are the characteristics of Multidimensional analysis? |
|
Definition
- Need to examine data in various ways
- Produce views of multidimensional data for users
- Slice
- Dice
- Pivot
- Drill Down
- Roll Up
|
|
|
Term
What are the characteristics of slice in multidimensional analysis? |
|
Definition
|
|
Term
What are the characteristics of dice in multidimensional analysis? |
|
Definition
|
|
Term
What are the characteristics of pivot in multidimensional analysis? |
|
Definition
|
|
Term
What are the characteristics of drill down in multidimensional analysis? |
|
Definition
|
|
Term
What are the characteristics of roll up in multidimensional analysis? |
|
Definition
|
|
Term
What are some internal aspects of data warehouses? |
|
Definition
- Schemas
- Star schema
- Snowflake schema
- Fact constellation schema
- Aggregated data
- Specialised indexes
- Bitmap indexes
- Join indexes
- Specialised join methods
|
|
|
Term
Give an example of star schema |
|
Definition
|
|
Term
Give an example of a fact table |
|
Definition
|
|
Term
Show an example of a snowflake schema |
|
Definition
|
|
Term
What are some applications/types of Data Warehouse Databases? |
|
Definition
- Relational and specialised RDBMSs
- Specialised indexing techniques, join and scan methods
- Relational OLAP (ROLAP) servers
- Explicitly developed to use a relational engine to support OLAP
- Include aggregation navigation logic, the ability to generate multi-statements SQL, and other additional services
- Multidimensional OLAP (MOLAP) servers
- The storage model is an n-dimensional array
- May use a 2-level approach, with 2D dense arrays indexed by B-trees
- Time is often one of the dimensions
|
|
|