Term
** Section 1. Total quality management approach to managing and improving data quality in an organization. |
|
Definition
Total Data Quality Management (TDQM) Cycle:
1. Define
2. Measure
3. Analyze
4. Improve |
|
|
Term
Section 1. Common Measures of Data Quality (AB-CACTI) |
|
Definition
1. Accuracy
2. Believability
3. Completeness
4. Accessibility
5. Consistency
6. Timeliness
7. Interpretability |
|
|
Term
Section 1. Data governance establishes a formal structure and process for all important issues surrounding data. Data governance helps define MOANA: |
|
Definition
1. Mechanisms for sharing data
2. Ownership of data
3. Access rights of data
4. Necessary quality management
5. Audits of Data |
|
|
Term
Section 1. Data governance requires PACTS: |
|
Definition
1. Processes
2. Accountability
3. Commitment
4. Technology
5. Structure |
|
|
Term
Section 1. Data-driven business values |
|
Definition
Data visibility
Data accessibility
Data analytics capability
Information velocity
Individual productivity and organization performance
Data quality = the totality of features that determine whether the data can satisfy its intended purposes. |
|
|
Term
Section 1. Mitigating the bullwhip effect through data/information sharing in the supply chain. |
|
Definition
Lack of trust among channel partners keeps them from sharing data - if they share, the bullwhip effect is reduced
Without shared data, partners can't plan production efficiently and effectively |
|
|
Term
Section 1. Business-driven analysis of IT needs in an Organization |
|
Definition
1. Business Objectives
2. Business strategies, activities, and operations
3. Information requirement
4. Core systems functionalities
5. Information systems need prioritization and evaluation |
|
|
Term
Section 2. Data mining for business intelligence. What is data mining? |
|
Definition
Technological approach to extract business intelligence from vast amounts of (high quality) data |
|
|
Term
Section 2. Techniques of data mining and what it does |
|
Definition
Combining the top-down concept-driven approach and the bottom-up data-driven approach
Gives a high-level view |
|
|
Term
**Section 2. Data Mining: General Process |
|
Definition
1. Define the problem
2. Select relevant data - looking for patterns
3. Clean data
4. Transform data into usable format
a. Bottom-up - we think the data is going to say X
b. Top-down - we know the data is going to say this. Closing the loop with data mining: using data to reinforce an assumption.
c. Don't rely on the gut-feeling approach alone; use data to validate the gut feeling.
5. Make Business decisions
a. Data mining: a high-level view
b. Business question
c. Prepare data through data mining
d. Analyze results
e. Business action
|
|
|
Term
Section 2. Common data patterns and their respective applications |
|
Definition
Cluster Analysis - divides a data set into mutually exclusive, distinct groups such that members of each group are as close as possible to one another, and the different groups are as far apart as possible
Association Pattern/Rule Analysis - reveals the degree to which variables in a data set are associated with one another, in terms of intensity and frequency
Classification Analysis - assigns an instance (example) to one of the predefined outcome classes
Statistical Analysis - correlation, distribution, variance analysis |
|
|
Term
Section 2. Web Mining and Business applications |
|
Definition
Discovery and analysis of useful patterns and information from the Web
Better understand customer behaviors, evaluate the effectiveness of a website, or manage marketing campaigns |
|
|
Term
Section 2. Data mining pitfalls and complexity |
|
Definition
Pitfalls
Not understanding business needs and problems
Lack of data mining model development and validation
Insufficient participation by business domain experts
Complexity
Scalability (advances in data generation and collection)
High dimensionality (data sets w/ thousands of attributes)
Heterogeneous and complex data
Data ownership and distribution
Non-traditional analysis (desire to automate the process of hypothesis generation and evaluation) |
|
|
Term
**Section 3. Association Pattern/Rule Analysis
Definitions of Itemset; Cardinality of an itemset; Support of an itemset |
|
Definition
Itemset - a set of items
Cardinality - the exact number of items in an itemset
Support - the ratio between the number of transactions that include an itemset and the total number of transactions under analysis |
|
|
Term
**Section 3. Support and confidence of an association pattern |
|
Definition
Support - the ratio between the number of transactions that include an itemset and the total number of transactions under analysis
Confidence - an indication of how often the rule is found to be true: for a rule X -> Y, the proportion of transactions containing X that also contain Y |
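Support and confidence can be checked by hand on a small transaction list. The following is a minimal sketch only: the transactions are made up, and the function names `support` and `confidence` are illustrative, not taken from any library.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# {diapers, beer} appears in 2 of the 5 transactions.
print(support({"diapers", "beer"}, transactions))
# 2 of the 3 transactions that contain diapers also contain beer.
print(confidence({"diapers"}, {"beer"}, transactions))
```

Note that support is measured against all transactions, while confidence is measured only against the transactions that contain the left-hand side of the rule.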
|
|
Term
Section 3. Example database |
|
Definition
|
|
Term
Section 3. Apriori Algorithm for association pattern analysis |
|
Definition
The Apriori algorithm relies on the downward closure property of support: when the support of itemset X is less than the specified minimum support, any itemset that contains X will also fail to meet the specified minimum support.
|
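The pruning idea behind downward closure can be sketched in a few lines. This is a simplified illustration, not an optimized implementation: the function name `apriori`, the sample transactions, and the 0.4 threshold are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset that meets min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: keep the single items that are frequent on their own.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): discard a candidate if any of its
        # (k-1)-subsets is not frequent, without counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions and threshold.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]
result = apriori(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Because "chips" alone falls below the threshold, no itemset containing "chips" is ever generated as a candidate, which is where the savings come from.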
|
|
Term
Section 3. Definitions: Association pattern, Sequential, Classification, Clustering, Insurance |
|
Definition
Association pattern - diapers and beer example. Support = number of transactions containing the itemset (top) divided by the total number of transactions (bottom), e.g., 40/100. Confidence is computed on part of the data: only the transactions that contain the left-hand side of the rule.
Sequential - events are linked over time; the sequence in which things are bought.
Classification - groups are set up in advance. Supervised learning: a decision tree leads each instance to a decision (class).
Clustering - used when no groups have been defined. Unsupervised learning.
Insurance - take a group of people, cluster them by gender (M/F), income, and education (e.g., with k-means), then classify them.
|
|
|
Term
**Section 4. Clustering Analysis and Classification Analysis
Clustering (k-means) and business applications |
|
Definition
A process to segment a group of objects into multiple distinct subgroups such that members in one cluster are similar to each other and distinctively different from the members of any other cluster.
No pre-classified data
|
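A bare-bones k-means loop shows how clusters form without any pre-classified data. This is a sketch under simplifying assumptions: the function name `kmeans` and the customer data are hypothetical, and the centroids are seeded with the first k points instead of the random or k-means++ seeding that real tools typically use.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical customers described by (age, annual spend in $1000s).
customers = [(25, 4), (27, 5), (24, 3), (52, 20), (55, 22), (50, 19)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The number of clusters k is an input here; choosing it is a separate judgment call.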
|
|
Term
**Section 4. Basic Steps for Market Segmentation |
|
Definition
1. Formulate the problem and select the variables that we want to use for the basis of clustering
2. Compute the distance between customers along the selected variables
3. Apply the clustering procedure to the chosen distance measure
4. Decide the number of clusters
5. Map and interpret clusters and draw conclusions – perceptual maps |
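Step 2 can be illustrated with a pairwise distance table. The sketch below assumes Euclidean distance and adds a standardization step so that a variable on a large scale (income) does not swamp one on a small scale (age); both choices, the function names, and the customer data are assumptions for illustration.

```python
import math
import statistics


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores).

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def distance_matrix(rows):
    """Pairwise Euclidean distances between all rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical customers: (age, annual income in $1000s).
customers = [(25, 40), (30, 42), (55, 120)]
matrix = distance_matrix(standardize(customers))
for row in matrix:
    print([round(d, 2) for d in row])
```

The first two customers end up much closer to each other than either is to the third, which is the kind of structure the clustering procedure in step 3 then picks up.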
|
|
Term
**Section 4. Classification analysis/ Accuracy |
|
Definition
A process that establishes classes with attributes from a set of instances.
Accuracy = overall correctness of the model, calculated as the number of correct classifications divided by the total number of classifications. |
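The accuracy calculation is simple enough to write out directly. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def accuracy(actual, predicted):
    """Share of instances whose predicted class matches the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical risk classes for five insurance applicants.
actual = ["low", "high", "low", "low", "high"]
predicted = ["low", "high", "high", "low", "high"]
print(accuracy(actual, predicted))  # 4 correct out of 5
```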
|
|
Term
Section 5. Management of Business Data and Information
**Common Problems inherent to file-processing approach for managing business data
Essential characteristics |
|
Definition
1. Redundancy
2. Quality
3. Limited Data visibility
4. Inconsistency
5. Limited data integration
Essential:
1. Self-describing collection of data
2. Related data
3. Integrated data
4. Shared data |
|
|
Term
Section 5. Advantages to database vs file-processing |
|
Definition
Database systems:
1. Reduced redundancy
2. Consistency
3. Data sharing
4. Accessibility
5. Cheaper to maintain
6. Increase Workflow
Disadvantages: More specialized staff and maintenance |
|
|
Term
Section 5. Data Warehouse |
|
Definition
- Designed and optimized for analysis and quick response to queries.
- Are nonvolatile; when data are stored, they can be read only and rarely deleted so that they can be used for comparison with newer data.
- Online analytic processing system (OLAP)
- Subject-oriented
|
|
|
Term
**Section 6: Database design: Overall Process of design |
|
Definition
Data Requirements
↓
Conceptual database design (Entity-Relationship data model; this is a common tool for conceptual design)
↓
Logical database design (Relational data model)
↓
Physical database design (File organization and access path)
↓
Database implementation
|
|
|
Term
Section 6. E-R data model: Relationship model |
|
Definition
|
|
Term
Section 7. Process Management
What is a business process? |
|
Definition
|
|
Term
**Section 7. Why an organization should model (document) its important business processes? |
|
Definition
- Measure
- Monitor
- Everyone knows their role
- Consistency
- And improve business processes
- Internal perspective - Time, Quality, Cost
|
|
|
Term
**Section 7. STAR-BEE - Why should a firm document its processes? |
|
Definition
Standardize processes to increase service consistency/quality
Transparency improvement – each person sees his/her role in the business process
Automate using workflow systems, increase organizational readiness for ERP
Retain and increase essential process knowledge at the organizational level
Better connect the organization’s core competence and its important business processes
Enable process improvement, redesign, or reengineering
Enhance organization performance by knowing exactly what we do and how well we are doing in each important process
|
|
|
Term
Section 7. Measuring performance of processes with quantitative metrics. |
|
Definition
- Throughput time
- Process time
- Quality score
- Cost
- Customer ratings
- Retention
|
|
|
Term
Section 7 - Process benchmarking |
|
Definition
- The process of searching for the best methods, practices, & processes, and adopting or adapting the good features to become the “best of the best”.
- Allows the firm to identify the concepts underlying what world-class companies do, understanding how they do it, and adapting what we have learned to our own situation.
|
|
|
Term
**Section 8 - Process Modeling
DFD Basic Constructs |
|
Definition
Data flow
Data store
Process
External Entity |
|
|
Term
|
Definition
DFD Rules
- No process can have only output data (no miracles)
- No process can have only input data (no black holes)
- Data cannot move directly from an external entity to a data store (must be moved by a process)
- Data cannot move directly from a data store to an external entity
- Data cannot move directly from a data store to another data store
- Data cannot move directly from an external entity to another external entity
- Data cannot go directly back to the same process it leaves
Balancing = conserving the input and output of a process in the data flow diagram (DFD) when the process is functionally decomposed to a lower level
Data dictionary = an organized, cross-referenced listing of the definition and structure for the data flows, data stores, and decomposable data elements contained in a system
|
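The DFD rules are mechanical enough to check in code. The sketch below is one possible way to do it, not a standard tool: the function `check_dfd`, the way nodes and flows are represented, and the complaint-handling example are all assumptions made for illustration.

```python
# Node kinds used in a data flow diagram.
PROCESS, STORE, ENTITY = "process", "data store", "external entity"


def check_dfd(nodes, flows):
    """Return a list of rule violations.

    nodes: dict mapping node name -> kind (PROCESS, STORE, or ENTITY)
    flows: list of (source, target) pairs, one per data flow
    """
    problems = []
    for source, target in flows:
        if nodes[source] != PROCESS and nodes[target] != PROCESS:
            # Covers entity->store, store->entity, store->store, entity->entity.
            problems.append(
                f"{source} -> {target}: {nodes[source]} to {nodes[target]} "
                "needs a process in between"
            )
        elif source == target:
            problems.append(f"{source}: data flows straight back to the same process")
    for name, kind in nodes.items():
        if kind != PROCESS:
            continue
        has_input = any(t == name for _, t in flows)
        has_output = any(s == name for s, _ in flows)
        if has_output and not has_input:
            problems.append(f"{name}: only outputs (miracle)")
        if has_input and not has_output:
            problems.append(f"{name}: only inputs (black hole)")
    return problems


# Hypothetical diagram with one deliberate mistake.
nodes = {
    "Customer": ENTITY,
    "Record complaint": PROCESS,
    "Complaints": STORE,
    "Archive": STORE,
}
flows = [
    ("Customer", "Record complaint"),
    ("Record complaint", "Complaints"),
    ("Complaints", "Archive"),  # breaks the store-to-store rule
]
for problem in check_dfd(nodes, flows):
    print(problem)
```

Every flow must have a process on at least one end, which is why the four "cannot move directly" rules collapse into a single check.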
|
|
Term
Section 8 - DFD modeling example |
|
Definition
|
|
Term
Section 8 - Customer Complaint Process |
|
Definition
|
|
Term
Section 9 - IT Investment Management
Benefits
IT productivity paradox |
|
Definition
Direct versus indirect, quantifiable versus qualitative, tangibles versus intangibles
“IT Does Not Matter” |
|
|