Term
What are several assumptions that traditional DBMS make? |
|
Definition
- Persistent Data storage
- Relatively static records
- (Typically) no predefined notion of time
- Complex one-off queries
|
|
|
Term
What are some requirements in applications that make Data Streams Useful? Give some application domains |
|
Definition
- Data arrives in real time
- Data is ordered (implicitly by arrival time or explicitly by timestamp)
- Too much data to store
- Data never stops coming
- Ongoing analysis of rapidly changing data
Application domains
- Network monitoring and traffic engineering
- Sensor networks, RFID tags
- Telecommunication call records
- Financial applications
- Web logs and click streams
- Manufacturing processes
|
|
|
Term
What are the key principles of Data Streams? |
|
Definition
- A potentially unbounded sequence of tuples
- Transactional data streams: log interaction between entities
- Credit card: purchases by consumers from merchants
- Telecommunications: phone calls by callers to dialed parties
- Web: accesses by clients of resources at servers
- Measurement data streams: monitor evolution of entity states
- Sensor networks: physical phenomena, road traffic
- IP network: traffic at router interfaces
- Earth climate: temperature, moisture, etc.
|
|
|
Term
Give a comparison of One-time vs Continuous Queries |
|
Definition
- One-time queries
- Run once to completion over the current dataset
- Continuous queries
- Issued once and then continuously evaluated over a data stream
- Notify me when the temperature drops below X
- tell me when prices of stock Y > 300
|
|
|
Term
Give a comparison of DBMS vs DSMS |
|
Definition
DBMS |
DBMS |
Persistent relations (relatively ststic, stored) |
Transient streams (on-line analysis) |
One-time queries |
Continuous Queries (CQ) |
Random Access |
Sequential Access |
Unbounded disk store |
Bounded main memory |
Only current state matters |
Historical data is important |
No real-time services |
Real TIme requirements |
Relatively low update rate |
Data at fine granularity |
assume precise data |
Data scale/imprecise |
Access plan determined by query processor, physical db design |
Unpredictable/variable data arrival and characteristics |
|
|
|
Term
What are the architectural issues of DBMS vs DSMS? |
|
Definition
DBMS |
DBMS |
Resource (memory, disk, per-tuple computation) rich |
Resource limited |
Extremely sophisticated query processing |
Reasonably complex, near real time query processing |
Useful to audit query results of data streams |
Useful to identify what data to populate in database |
Query evaluation: Arbitrary |
Query evaluation: One pass |
Query Plan: Fixed |
Query Plan: Adaptive |
|
|
|
Term
What are the concepts Continuous Query Language is based on? |
|
Definition
- Queries produce/refer to relations and streams
- Based on SQL, with the addition of:
- Streams as new data type
- Continuous instead of one-time semantics
- Windows on streams (derived from SQL-99)
- Sampling on streams (basic)
|
|
|
Term
What are the principles of Query Processing? |
|
Definition
- Construct query plan based on relational operators, as in an RDBMS
- Selection
- Projection
- Join
- Aggregation (group by)
- Combine plans form continuous queries (reduce redundancy)
- Stream tuples through the resulting network of operators
|
|
|
Term
What are tuple-at-a-time operators? |
|
Definition
Evaluation only requires consideration of one tuple at a time (Selection and projection) |
|
|
Term
What are the concepts of full relation operators? |
|
Definition
- Some full relation operators can work on a tuple at a time
- Count, sum, average, min, max (even with group by)
- (Order by, however, can't)
[image]
|
|
|
Term
What are some things that full relation operators can't do |
|
Definition
Intersection, difference, product, join
(Union, however, can be evaluated tuple-by-tuple) |
|
|
Term
What relational operators may block when applied to streams? |
|
Definition
- No output until entire input seen, but streams are unbounded
- Joins may need to join tuples that are aritrarily far apart
|
|
|
Term
What does relation/stream translation consist of? |
|
Definition
- Some relational operators can work directly on streams
- Selection, projection, union, some aggregates
- Some relational operators need to work on relations
- Join, product, difference, interseciton, other aggregates
- Stream-to-relation operators
- Relation-to-stream operators
- Istream, Dstream, Rstream
|
|
|
Term
What does Windowing consist of in stream operations? |
|
Definition
- Mechanism for extracting a finite relation (synopsis) from an infinite stream
- Various window proposals for restricting operator scope.
- Windows absed on ordering attribute (e.g. last 5 minutes of tuples)
- Windows absed on tuple counts (e.g. last 1000 tuples)
- Windows based on explicit markers e.g. punctuations)
- Variants (e.g. partitioning tuples in a window)
- Various window behaviours
|
|
|
Term
Show a diagram of a sliding window |
|
Definition
|
|
Term
What are some characteristics of join evaluation? |
|
Definition
- Consider a stream-based join operation
- a conventional join over a pair of windows on the input streams
- outputs a stream of tuples joined from the input streams
|
|
|
Term
What is the scalability and completeness of DBMS and DSMS? |
|
Definition
- DBMS deals with finite relations
- Query evaluation should produce all results for a given query
- DSMS deal with unbounded data streams
- May not be possible to return all results for a given query
- Trade-off between resource use and comlpeteness of result set
- Size of buffers used for windows is one example of a parameter that affects resource use and completeness
- Can further reduce resource by randomly sampling from streams
|
|
|
Term
What are some Relation-to-Stream Operators? |
|
Definition
- Insert Stream (Istream)
- Whenever a tuple is inserted into the relation, emit it on the stream
- Delete Stream (Dstream)
- Whenever a tuple is deleted from the releation, emit it on the stream
- Relation Stream (Rstream)
- At every time instant, emit every tuple in the relation on the stream
|
|
|
Term
Give an example CQL query (uses window) |
|
Definition
SELECT Istream(*)
FROM S [rows unbounded]
WHERE S.A > 10
S is converted into a relation (of unbounded size)
Resulting relation is converted back to a stream via Istream |
|
|
Term
Given an example of CQL query (No window) |
|
Definition
SELECT *
FROM S
WHERE S.A > 10
S is a stream - query plan involves only selection, so window is now unnecessary |
|
|
Term
Give an example of a CQL query (window specified on streams) |
|
Definition
SELECT *
FROM S1 [rows 1000], S2[range 2 minutes]
WHERE S1.A > S2.A and S1.A > 10
Windows specified on streams:
- Tuple based sliding window - [rows 1000]
- Time-based sliding window - [range 2 minutes]
|
|
|
Term
Give an example of a CQL query (Probing stored table) |
|
Definition
SELECT Rstream(S.A, R.B)
FROM S1 [now], R
WHERE S1.A = R.A
- Query probes a stored table R based on each tuple in streams S and streams the result
- [now] - time based sliding window containing tuples in the last time step
|
|
|
Term
What are some key concepts on query optimization? |
|
Definition
- Traditionally relation cardinalities used in query optimizer
- Minimize the size of intermediate results
- Problematic in a streaming environment
- All streams are unbounded = infinite size
- Need novel optimization objectives that are relevant when input sources are streams
- Stream rate based (e.g. NiagaraCQ)
- Resource-based (e.g. STREAM)
- QoS based (e.g. Aurora)
- Solution: Continuous adaptive optimization
|
|
|