Term
|
Definition
A software system that enable users to define, create, maintain and control access to the database. |
|
|
Term
|
Definition
An collection of related records or files consolidated into a common pool that provides data for one or more multiple uses. |
|
|
Term
|
Definition
- Database
- Database Administrator
- Application Programs
- Hardware
|
|
|
Term
|
Definition
- Data sharing
- Centralized control
- Redundancy control (normalization)
- Data integrity
- Data security
- Views
- Data independence
|
|
|
Term
Logical Data Independence (LDI) |
|
Definition
- Conceptual schema is a description of data and relationships
- HARD to achieve
|
|
|
Term
Physical Data Independence (PDI) |
|
Definition
- Physical storage can be changed without bothering users
- EASIER to achieve than LDI
|
|
|
Term
|
Definition
- DB design is complex
- Cost (hardware/software)
- Availability (single point of failure)
- Speed vs. Flexibility
|
|
|
Term
|
Definition
- Data Description Language - turns conceptual schema into physical DB description
- Data Manipulation Language - insert/delete data
- Query Language - ask questions
- Data Control Language - security
|
|
|
Term
|
Definition
The overall description of a database |
|
|
Term
|
Definition
A description of DBMS components and their interconnections |
|
|
Term
ANSI/SPARC (External Level) |
|
Definition
Describes end-user perspectives on the data |
|
|
Term
ANSI/SPARC (Conceptual Level) |
|
Definition
Defines field groupings, relationships between data, etc. |
|
|
Term
ANSI/SPARC (Internal Level) |
|
Definition
Defines record sizes, field representations, types of indices, etc. |
|
|
Term
External-Conceptual mapping |
|
Definition
Allows field renamings and provides LDI |
|
|
Term
Conceptual-Internal mapping |
|
Definition
Converts logical structure to physical representations, and provides PDI |
|
|
Term
Sources of Read/Write Delay |
|
Definition
- Seek time
- Rotational delay
- Transfer time
|
|
|
Term
|
Definition
Advantages
- can operate with a failed drive
Disadvantages
|
|
|
Term
|
Definition
Advantages
- performance (parallelism)
Disadvantages
- no data replication
- increased prob. of disk failure
|
|
|
Term
|
Definition
- Non-Redundant
- striping only
- good write performance
- no data redundancy
|
|
|
Term
|
Definition
- Mirrored
- no striping
- opportunity for parallel reads
- 2x the number disks ($$$)
|
|
|
Term
|
Definition
- Bit-Interleaved Parity
- striping unit = 1 bit
- can recover from disk failure
|
|
|
Term
|
Definition
- Block-Interleaved Distributed Parity
- striping unit = 1 block
- parity blocks are distributed across all disks
- fairly complex controller design
|
|
|
Term
|
Definition
The number of whole records that can be stored within a single block |
|
|
Term
|
Definition
A file consisting of structured references to records of another file |
|
|
Term
|
Definition
A key that can uniquely identify a record |
|
|
Term
|
Definition
|
|
Term
|
Definition
Any other field besides the primary key field |
|
|
Term
|
Definition
The field on which the records are sorted |
|
|
Term
|
Definition
- Indexed field is a candidate key
- Indexed records are sorted on the CK
- The DB file records are sorted on the CK
- Can have only one per DB file
|
|
|
Term
|
Definition
- The indexed field is not a CK
- The indexed records are sorted on the key
- The DB records are sorted on the key
|
|
|
Term
|
Definition
- The indexed field may be any field
- The index records are sorted
- The DB records are not sorted
- Can have as many secondary indices as needed
|
|
|
Term
|
Definition
- Holds one record for each in DB file
- Can do 'existence check' on just the index
|
|
|
Term
|
Definition
- Indexes only a subset of the field values
- Smaller, therefore faster searching
|
|
|
Term
|
Definition
- Each node contains at most 2M keys
- Each node (excepting the root) has at least M keys
- A non-leaf node has at least 2 children
- All leaf nodes are at the same level
- A non-leaf node of n keys has n+1 children
|
|
|
Term
|
Definition
- Each key is stored in a leaf node
- Copies of some keys occupy internal nodes
- Leaf nodes are linked sequentially left to right to create a sequence set
- Linear searching
- No extra storage required
|
|
|
Term
|
Definition
- Has built-in disk pointers for keys in leaves
- Supports exact-match and range queries
|
|
|
Term
Disadvantages of a B+-Tree |
|
Definition
- 'Wastes' storage capacity of internal nodes (relatively minimal)
- Insertions and deletions are a bit more complex
|
|
|
Term
|
Definition
A marker that indicates that a field does not have a value |
|
|
Term
|
Definition
A field in one file whose values are drawn from the primary key field of another file |
|
|
Term
|
Definition
- Requirements Analysis - talk to the client
- Conceptual Design - create conceptual schema
- Logical Design - convert to DBMS' implementation data model (normalization)
- Physical Design - physical storage requirements
|
|
|
Term
|
Definition
An instance of a general classification |
|
|
Term
|
Definition
An association between a set of entities |
|
|
Term
|
Definition
- Schema is tree-structured
- Only 1:M relationships (1 parent & M children)
|
|
|
Term
|
Definition
- Graph-based
- A std. theory of DB Systems
|
|
|
Term
|
Definition
- Theoretical foundation: set theory
- Uses no pointers
- No physical/logical schema distinction
|
|
|
Term
|
Definition
- Could map object data into a tuple (awkward)
- Object persistence
|
|
|
Term
What are the names and salaries of the people in department #5 (TRC)? |
|
Definition
{ e.surname, e.givenname, e.salary | Employee(e) Λ e.deptid = 5 } |
|
|
Term
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (TRC)? |
|
Definition
{ p.Pname | P(p) Λ ∃q (SP(q) Λ p.P# = q.P#
Λ q.Qty > 200) } |
|
|
Term
What are the names of the active suppliers of nuts (TRC)? |
|
Definition
{ s.sname | S(s) Λ s.status = 'Active' Λ
(∃q)(SP(q) Λ s.S# = q.S# Λ
(∃p)(P(p) Λ q.P# = p.P# Λ
p.pname = 'Nut')) } |
|
|
Term
What is the content of the Employee Relation (DRC)? |
|
Definition
{<efghi> | <efghi> ∈ Employee } |
|
|
Term
What are the names and salaries of the people in department #5 (DRC)? |
|
Definition
{ <fei> | (∃h)(<efghi> ∈ Employee Λ h=5) } |
|
|
Term
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (DRC)? |
|
Definition
{ <o> | (∃n)(<nopqr> ∈ P Λ
(∃tu)(<stu> ∈ SP Λ n=t Λ
u > 200)) } |
|
|
Term
What are the names of the active suppliers of nuts (DRC)? |
|
Definition
{ <k> | (∃jl)(<jklm> ∈ S Λ l='Active' Λ
(∃st)(<stu> ∈ SP Λ j=s Λ
(∃no)(<nopqr> ∈ P Λ n=t Λ
o='Nut'))) } |
|
|
Term
Find the names & salaries of the employees in department #5 (Relational Alg.) |
|
Definition
∏givenname,surname,salary (σdept#=5(Employee)) |
|
|
Term
What are the names of the active suppliers of nuts (Relational Alg.)? |
|
Definition
∏sname (σstatus>0 ∧ pname='Nut'
(σS.S#=SP.S# (σP.P#=SP.P#
(S × (SP × P))))) |
|
|
Term
What are the names of the parts that can be supplied by individual suppliers in qty > 200 (Relational Alg.)? |
|
Definition
∏pname (σqty>200 (σSP.P#=P.P# (SP × P)))
or
∏pname (σqty>200 (SP ⋈SP.P#=P.P# P)
|
|
|
Term
|
Definition
r ⋈θ s where θ is any PK-FK comparison
|
|
|
Term
|
Definition
A theta join where θ is an equality comparison between PK & FK |
|
|
Term
|
Definition
An equijoin in which columns with matching names in both tables are compared, and only one of the matching columns will appear in the resulting table |
|
|
Term
If we have one of each part in a box, how much does the content weight (SQL)? |
|
Definition
SELECT SUM(weight) from p; |
|
|
Term
What are the average quantities in which suppliers are supplying parts (SQL)? |
|
Definition
SELECT sno, AVG(qty)
FROM spj
GROUP BY sno; |
|
|
Term
|
Definition
- Left Outer Join: retains unmatched tuples from left relation
- Right Outer Join: retains unmatched tuples from right relation
- Full Outer Join: retains all unmatched tuples
|
|
|
Term
|
Definition
CREATE TABLE <name> (
<attr. name> <data type> [NOT NULL],
...
[PRIMARY KEY (<attr>)]
);
|
|
|
Term
|
Definition
CREATE VIEW <name>
[(<attr. list>)]
AS <select stmt>;
CREATE VIEW supplierpart("Supplier Name","Part #")
AS SELECT DISTINCT sname, pno
FROM s, spj
WHERE s.sno = spj.sno;
|
|
|
Term
Updating Content of Tuples (SQL) |
|
Definition
UPDATE <relation name>
SET <attr. name> = <expression> [,...]
[FROM <relation list>]
[WHERE <condition>];
Ex:
UPDATE m SET name = 'Ray' where id = 1; |
|
|
Term
|
Definition
DELETE FROM <relation name>
WHERE <condition>; |
|
|
Term
|
Definition
DROP [TABLE|INDEX|VIEW|DATABASE] <name>; |
|
|
Term
|
Definition
A group of one or more operations treated as a single, indivisible unit of work |
|
|
Term
ACID Properties of Transactions |
|
Definition
- Atomicity: Xacts are all or nothing
- Consistency: An Xact's actions retain the consistency of the DB
- Isolation: Each Xact thinks it's the only Xact in the system
- Durability: A completed Xact's changes are permanent - won't be lost
|
|
|
Term
|
Definition
- Active: Xact is being executed
- Partially Committed: Xact's actions are complete but changes may not yet be on disk
- Committed: Xact's changes are on disk
- Failed: The Xact cannot finish, but partial changes still linger
- Aborted: All changes have been cleared away
|
|
|
Term
|
Definition
- Follow the ECA model:
- Event - causes the activation
- Condition - if true, action is performed
- Action - SQL statement(s)
|
|
|
Term
Disadvantages of Triggers |
|
Definition
- Hard to write appropriate actions
- Specified separately from relations
- Can reduce concurrency of the DBMS
- Trigger interaction
- Rule Termination
|
|
|
Term
Basic Trigger Syntax (Oracle) |
|
Definition
CREATE TRIGGER <name>
{BEFORE/AFTER} {INSERT/DELETE/UPDATE}
ON <relation>
[ [FOR EACH ROW] WHEN (<condition>) ]
<PL/SQL block>; |
|
|
Term
|
Definition
The set of attributes X functionally determines the set of attributes Y iff whenever any two tuples of the relation agree on their X values, they must also agree on their Y values. |
|
|
Term
Closure (of a set of attributes) |
|
Definition
Let A be a set of attributes. The closure of A, denoted A+, is the set of attributes that A functionally determines with respect to a set of FD's. |
|
|
Term
Closure (of a set of FD's) |
|
Definition
Let G be a set of FD's. The closure of G, denoted G+, is the set of FD's that are implied by G, including those of G itself. |
|
|
Term
|
Definition
- Reflexivity: If B ⊆ A, then A→B
- Augmentation: If C→D, then:
- C→CD
- CE→D
- CE→DE
- Transitivity: If F→G and G→H, then F→H
|
|
|
Term
|
Definition
Let G and H be sets of FD's. G covers H if every FD in H is also in G+. |
|
|
Term
Equivalence (of sets of FD's) |
|
Definition
Let G and H be sets of FD's. G and H are equivalent if G+ = H+. |
|
|
Term
|
Definition
A set of FD's is minimal if all of the following hold:
- All FD's have only 1 attribute on their RHS
- No attribute can be removed from the LHS of any FD w/o changing the closure
- All of the FD's in the set are needed to retain the closure
|
|
|
Term
Minimal Cover (of a set of FD's) |
|
Definition
A minimal cover of a set of FD's G is a minimal set of FD's that is equivalent to G. |
|
|
Term
Finding a Minimal Cover of a FD Set |
|
Definition
- Put FD's in Standard Form
- Minimize the LHS of the FD's
- Remove redundant FD's
|
|
|
Term
|
Definition
The process of decomposing relations into relations of lower degree that no longer posses certain undesirable properties. |
|
|
Term
|
Definition
A relation is in 1NF if its attributes are not set-valued. |
|
|
Term
Full Functional Dependency |
|
Definition
An FD X→Y is a full functional dependency if removing any attribute from X destroys the dependency. |
|
|
Term
|
Definition
An attribute that is a member of any CK of the relation |
|
|
Term
|
Definition
A relation R is in 2NF if every non-prime attribute of R is fully functionally dependent upon every CK of R. |
|
|
Term
Trivial Functional Dependency |
|
Definition
An FD X→A is a trivial FD when A ⊆ X |
|
|
Term
|
Definition
A superkey is a set of attributes that includes a candidate key. |
|
|
Term
|
Definition
A relation R is in 3NF if, for every non-trivial FD X→A that holds in R, either:
- X is a superkey of R, or
- A is a prime attribute of R.
|
|
|
Term
Boyce-Codd Normal Form (BCNF) |
|
Definition
A relation R is in BCNF if, for every non-trivial FD X→A in R, X is a superkey of R. |
|
|
Term
|
Definition
Let R be a relational schema, and let A,B & C be subsets of R's attributes. B is multidependent on A (or A multidetermines B) iff the set of B values matching a given (A,C) pair of values depends only on the A set and is independent of the C set. |
|
|
Term
|
Definition
A relational schema is in 4NF iff whenever there exist attribute subsets A and B of R such that A multidetermines B, then all attributes of R are also functionally dependent on A. |
|
|
Term
|
Definition
CREATE USER <username> [<options>];
- PASSWORD or IDENTIFIED BY <password>
- quotas
- limits on access (e.g. time)
- capabilities (to create DB's, to add users, etc.)
|
|
|
Term
|
Definition
GRANT <privilege>
[ON <object>]
TO <user>
[WITH GRANT OPTION];
Ex:
GRANT CREATE VIEW to smithj;
GRANT SELECT ON supplier TO PUBLIC; |
|
|
Term
|
Definition
REVOKE <privilege>
[ON <object>]
FROM <user>;
Ex:
REVOKE ALL PRIVILEGES
ON supplier
FROM PUBLIC; |
|
|
Term
Hierarchy of Security Classes (MAC) |
|
Definition
TS - Top Secret
S - Secret
C - Classified
U - Unclassified
TS > S > C > U |
|
|
Term
|
Definition
Subjects (users, groups, etc.) have clearances; Objects (tables, tuples, etc.) have classifications
- Simple Security Property: S can read O only when class(S) ≥ class(O)
- Star Property: S can write O only when class(S) ≤ class(O)
|
|
|
Term
|
Definition
- Availability - to authorized users only
- Confidentiality - preserve secrecy
- Integrity - prevent data corruption and/or loss
|
|
|
Term
|
Definition
Provide security from a more 'external' perspective, via:
- Quality passwords
- Data encryption (public-key encryption, digital signatures, etc.)
|
|
|
Term
|
Definition
Be Able to recover DB's after accidents/disasters:
- Maintain off-site backups
- Log/journal DBMS actions
- Use a non-0-level of RAID
|
|
|
Term
|
Definition
A DBMS attack in which a user tries to add (inject) SQL into an incomplete query to get the DBMS to reveal additional information |
|
|
Term
|
Definition
- Sanity-check all user input (i.e. range-checking)
- Trim excess characters from input strings
- Accept raw input only when necessary (i.e. use option lists)
|
|
|
Term
|
Definition
A DBMS of data from many sources used to support an organization's decision-making |
|
|
Term
|
Definition
A small data warehouse supporting a portion of an organization |
|
|
Term
Characteristics of a Data Warehouse |
|
Definition
- Usually orders of magnitude larger than an OLTP system
- Used to generate data summaries and to identify trends
- Often known as On-line Analytical Processing (OLAP)
- Serves as a key input source for data mining algorithms
|
|
|
Term
|
Definition
- Cleaning - attempts to detect/correct errors and omissions in the data
- Reformatting - structures the data to match the schema of the DW
- Metadata ("data about data") - shows the origins of the DW's content
- Backflushing - returning cleaned data to the source DBMSes for re-integration
|
|
|
Term
Data Design for Data Warehouses |
|
Definition
- The basic OLAP schema is the Star Schema
- Fact table is at the center
- A variation: The Snowflake Schema
- Becomes more generalized away from the center
|
|
|
Term
Querying a Data Warehouse |
|
Definition
Slicing: An equality selection in 1 or more dimensions Dicing: A range selection in 1 or more dimensions Pivoting: Changing the orientation (viewpoint) by rotating the n-dimensional array to 'hide' a dimension.
Roll-up: Moves from the current level in a dimension to the next more general level (ex. Dept→College)
Drill-down: Moves from the current level to the next more specific level (ex. Vegetable→Carrot) |
|
|
Term
|
Definition
The discovery of patterns in, or rules about, the content of a data warehouse |
|
|
Term
|
Definition
- Predictions of future events (DANGEROUS)
- Identification of events (ex. system intrusion)
- Classification (ex. Which discounts appeal to which customers?)
|
|
|
Term
Applications of Data Mining |
|
Definition
- Marketing - customer behavior, targetted emailing
- Finance - credit ratings, fraud detection
- Manufacturing - optimizing processes & resources
- Health care - drug side-effects, treatment effectiveness
|
|
|
Term
|
Definition
The items contained within a grouping of selected items, ignoring quantities |
|
|
Term
|
Definition
The percentage of all transactions that contain the itemset |
|
|
Term
Support (of Association Rules) |
|
Definition
The support of a rule is the fraction of transactions that satisfy the rule.
support(L=>R) = support(L∪R) |
|
|
Term
Confidence (of Association Rules) |
|
Definition
The confidence of a rule is the fraction of itemsets containing L that also contain R.
conf(L=>R) = supp(L∪R) / supp(L)
|
|
|
Term
Order of Operations (Query Trees) |
|
Definition
- Join(s)
- Select(s)
- Project(s)
|
|
|
Term
|
Definition
The process of generating and selecting different potential query operation orderings |
|
|
Term
Algorithms for Project and Select |
|
Definition
Project: Sequential Scan of relation
Select:
- Sequential Scan (full table scan, linear search)
- Binary Search
- Index Scan
|
|
|
Term
|
Definition
- Nested-Loops Join (NLJ)
- Sort-Merge Join (SMJ)
- Hash Join
|
|
|
Term
Influencing the Query Optimizer |
|
Definition
- Change operators (ex. Do joins w/ nested queries)
- Insist on a join ordering (ex. Replace "from s,sp" with "s JOIN sp")
|
|
|
Term
|
Definition
/ CPU Registers \
/ Cache Memory \
/ Main Memory (RAM) \
/ Solid-State Devices (SSDs) \
/ Hard (Magnetic Disk) Drives \
/ Tapes/Optical Disks (Libraries) \
|
|
|