Term
|
Definition
Database used for reporting and data analysis. Foundation of most BI. Allows accessing and integrating relevant data in a form that is consistent and readily available.
Single version of the truth.
|
|
|
Term
4 Major Characteristics of Data Warehousing |
|
Definition
- Subject-Oriented
- Integrated
- Time-Varient
- Nonvolatile
|
|
|
Term
Characteristic of Data Warehousing:
Subject-Oriented |
|
Definition
Organized by topics
Best for providing a more comprehensive view of the organization.
Says why a business is operating, not just how.
Ex: Sales, products, customers, etc. |
|
|
Term
Characteristic of Data Warehousing:
Integrated |
|
Definition
Different sources stored in one consistent format.
Gives clarity.
|
|
|
Term
Characteristic of Data Warehousing:
Time Varient |
|
Definition
Provides data at various points in time.
Daily, weekly, monthly, etc.
Both historic and current data so you can analyze trends.
Every data warehouse should have a time variable.
Ex: LSU enrollment, retention, etc. |
|
|
Term
Characteristic of Data Warehousing:
Nonvolatile |
|
Definition
Users cannot change the data once entered into the DW. Only used for reference, but unnecessary info can be deleted |
|
|
Term
Additional Chacteristics of DW |
|
Definition
- Designed for web-based usage
- Has relational/multidimensional structure
- Uses client/server layout to provide easy access to end-user
- For newer DWs, allows for real-time and active data access and analysis
- Contains metadata which is info that describes your data (data about data)
|
|
|
Term
3 Main Types of DW's:
Data Mart
(2 Subsets) |
|
Definition
Usually consists of one subject area
Ex: Sales
- Dependent Data Mart- created directly from the dw. Ensures that the user is viewing the same data available to all users. EDW must be constructed first.
- Independent Data Mart- Small warehouse designed for a department or strategic business unit (SBU). It's source is not an EDW.
|
|
|
Term
3 Main Types of DW's:
Operational Data Stores
(ODS) |
|
Definition
Type of database used as a staging area for a dw, especially for customer info. files (CIF).
Data are updated frequently as opposed to the static contents of the dw. (short-term memory) |
|
|
Term
3 Main Types of DW's:
Enterprise Data Warehouses
(EDW) |
|
Definition
Large-scale dw used across the company for decision support.
Integrates data in a standard format from many sources.
Ex: DirecTV and Enterprise Rental use EDW |
|
|
Term
|
Definition
Data about data.
Describes the contents/structure of a dw and why it's used.
(field name, data type, ect.)
|
|
|
Term
Major Component of the DW process:
Data Sources |
|
Definition
Transactional data (OLTP) such as CRM, external dta (ex: census data), Access, etc.. |
|
|
Term
Major Component of the DW process:
ETL Process
(Extraction, Transformation, Load) |
|
Definition
Data is taken from external sources, maintained in a staging area where transformed and integrated, then loaded into the DW or DM (data mart). |
|
|
Term
Major Component of the DW process:
Comprehensive Database |
|
Definition
The EDW used to support all decision analyses. |
|
|
Term
Major Component of the DW process:
Metadata |
|
Definition
Maintained so it can be used by IT users.
Includes software programs with rules for organizing data that can be indexed and searched. |
|
|
Term
Major Component of the DW process:
Middleware Tools |
|
Definition
Tools that access the contents of dw.
Ex: data mining, queries, predictive analysis, ect. |
|
|
Term
|
Definition
Computer hardware that provides a specific service |
|
|
Term
DW Architectures:
Application Server |
|
Definition
Computer hardware responsible for the efficient execution of procedures (programs) |
|
|
Term
DW Architectures:
Database Server |
|
Definition
Sometimes referred to as the "back-end"
Holds the DW |
|
|
Term
DW Architectures:
Client Software |
|
Definition
Allows users to request a server's content or function. |
|
|
Term
Two-Tiered Architecture
(Includes Pros and Cons) |
|
Definition
- Client Workstation- allows end user to request both the application functions and data content from 1 server.
- Application Server and DW run on the same server
Pros- more economical than three-tiered
Cons- Can have performance problems for large dw using complicated applications
(Picture in notes) |
|
|
Term
Three-Tiered Architecture
(Includes Pros and Cons) |
|
Definition
- Client Workstation- allows the end user to request application functions (ex: Access) and request data content (ex: Access database files)
- Application Server- responsible for execution of programs
- Database Server- houses the database or data warehouse
Pros: Seperates the application and database functions, allows for greater capacity and performance of the respective servers
Cons: Increased cost due to more hardware requirements |
|
|
Term
Issues Considered When Deciding on the Architecture to use (4) |
|
Definition
- Which Database Management System (DBMS) should be used?
- Will parallel processing/partitioning be used? (Parallel partitioning referes to splitting data tables into smaller tables for efficient access)
- Will data migration tools be used to load the data warehouse?
- What tools will be used to support data retrieval and analysis?
|
|
|
Term
Alternative DW Architectures:
Independent Data Mart |
|
Definition
Contains data coming directly from the operational data sources |
|
|
Term
Alternative DW Architectures:
Data Mart Bus |
|
Definition
Tightly integrated data marts that get their power from conformed dimensions and fact tables.
Conformed dimension is defined one time and used everywhere |
|
|
Term
Alternative DW Architectures:
Hub-and-Spoke |
|
Definition
|
|
Term
Alternative DW Architectures:
Centralized Enterprise DW |
|
Definition
Collaboration of the company data extracts. |
|
|
Term
Alternative DW Architectures:
Federated DW |
|
Definition
|
|
Term
10 Factors that Potentially Affect the Architecture Selection Decision |
|
Definition
- Info interdependence between organizational units
- Upper management's info needs
- Urgency of need for a dw
- Nature of end-user tasks
- Constraints on resources
- Strategic view of the data warehouse prior to implementation
- Compatibility with existing systems
- Percieved ability of the in-house IT staff
- Technical issues
- Social/political factors
|
|
|
Term
|
Definition
Supports the central DW architecture |
|
|
Term
3 Major Processes of Data Integration |
|
Definition
Data Integration- combining data from different sources, providing users with a unified view of the data.
- Data access
- Data federation
- Change capture
|
|
|
Term
3 Major Processes of Data Integration:
Data Access |
|
Definition
Refers to the ability to access and extract data form any data source |
|
|
Term
3 Major Processes of Data Integration:
Data Federation |
|
Definition
Exists when one application is able to treat multiple data stores as one entity |
|
|
Term
3 Major Processes of Data Integration:
Change Capture |
|
Definition
Process of capturing changes made at the data source and applying them throughout the entire enterprise. Ensures data consistency. |
|
|
Term
|
Definition
Major technical side of DW process.
Takes up 70% of time used to build DW.
DW would not exist without ETL, ensures data quality.
Includes Extraction, Transformation, and Load |
|
|
Term
|
Definition
Reading data from one or more databases.
ex: spreadsheets |
|
|
Term
|
Definition
Converting data from their original form to whatever form the DW needs.
Removes errors, splits fields, recodes, etc. |
|
|
Term
|
Definition
Putting the transformed data into the DW |
|
|
Term
What a company must do to ensure a successful DW implementation proccess |
|
Definition
- Define the plan (business objectives and strategies)
- Gather support from managers and end-users
- Set reasonable time frames and budgets
- Manage expectations
|
|
|
Term
The Imnom Model
(DW Approach) |
|
Definition
Ultimately results in an EDW
Created by Bill Inmon (Father of DW)
Top-down approach, the enterprise has one dw and the data marts are created from the dw. |
|
|
Term
Kimball Model
(Data Mart Approach) |
|
Definition
Results in an EDW
Created by Ralph Kimble
Bottom-up approach, the dw is a collection of all data marts across the enterprise
built one dm at a time
Data mart is subject-oriented and focuses on the departmental level (ex: sales, inventory, etc.) |
|
|
Term
Additional DW Developmental Considerations |
|
Definition
Outsourcing to have a seperate company maintain the DW
security/privacy concerns |
|
|
Term
The Star Schema
(Representation of data in a dw) |
|
Definition
Design is based upon a concept called
dimensional modeling (retrieval-based system that allows for querying data tables, usually in terms of location, time period, etc)
The schema consists of a central FACT table surrounded by several DIMENSION tables
The FACT table is your data to be analyzed. Contains foreign keys (FK) to be linked to the dimensional tables
The DIMENSION table contains classification information used to define how your facts/data are to be summarized
-dimension tables have one-to-many relationships with rows in the fact table
Examples in notes |
|
|
Term
|
Definition
OLAP: Captures and stores day-to-day business operations
OLTP: Can perform ad-hoc analysis and complex multidimensional queries |
|
|
Term
|
Definition
A subset of a multidimensional array of corresponding to a single value on ONE of the dimensions.
ex: (in notes) The single value on one dimension was alcohol involvement= yes
|
|
|
Term
|
Definition
A slice on more than 2 dimensions
(same as slicing the slice of a slice) |
|
|
Term
OLAP Operations:
Drill Down/Up |
|
Definition
Technique where the user navigaes among levels of data ranging from the most summarized (up) to the most detailed (down) |
|
|
Term
|
Definition
Involves computing totals across all levels of a dimension (or multiple dimensions) |
|
|
Term
|
Definition
Changing the dimensional orientation of a cube
(ex: rotating the cube) |
|
|
Term
|
Definition
Refers to the highest level of detail.
ex: sales by region vs sales by state
If granularity is too high, then the dw may not support requests to drill down into the data.
ex: if sales are only stored by department, you cannot request data by sales rep |
|
|
Term
|
Definition
The ability to accommodate when informtion increases and maintain performance. |
|
|
Term
Real-time DW (RDW)
(aka-active data warehouse (ADW)) |
|
Definition
Proccess of loading and providing data by way of a dw as they become more available
Data used for decision making is updated on an ongoing basis
Becoming more useful for companies who interact directly with customers and suppliers and need up-to-date info |
|
|
Term
Evolutionary Process of Increased Requirement of the DW |
|
Definition
- At a very basic level, the DW says what happened in terms of reports. (ex: daily sales reports)
- Next level, DW analyzes why something happened (ex: sales dropped due to defective billboards)
- As business increases, DW predict what will happen.
- Technology advances then to describe what is happening now. Allows for companies to react quickly
|
|
|
Term
|
Definition
Maintains the DW and has skills that surpass a traditional database administrator (DBA). Besides technical skills, they should have:
- Know high-performance hardware, software, and networking technologies
- Solid business insight, to understand the DW purpose
- Familiarity with business decision-making processes to understand how the DW will be use
- Excellent communication skills
|
|
|
Term
4 Main Areas of DW Security |
|
Definition
- Establishing effective corporate and security policies and procedures
- Implementing logical security procedures to restrict access (ex: authenication, encryption, etc)
- Limiting physical access to the data center environment
- Establishing an effective internal review process for security and privacy
|
|
|