GII Doctoral School on Advances in
Databases - 2009
University of Calabria, Rende
& Hotel S. Michele, Cetraro – Italy, September 7-18, 2009
Program

Data exchange and integration (Maurizio Lenzerini)
Abstract: "Data exchange and integration" is the problem of exchanging
and combining data residing at different sources, and providing the user with
a unified view of these data. The problem of designing data exchange and
integration systems is important in current real world applications, and is
characterized by a number of issues that are interesting from both a theoretical
and a practical point of view. In recent years, there has been a huge
amount of research work on this topic, and a precise, clear picture of a
systematic approach to the problem is now available. This course will present
an overview of the research work carried out in the area of data exchange and
integration, with emphasis on the theoretical results that are relevant for
the development of data integration solutions. Special attention will be
devoted to the following aspects: architectures for data exchange and
integration, modeling a data integration
application, moving data in data exchange, processing queries in data
integration, dealing with inconsistent data sources, and reasoning on mappings
and queries.

Data privacy and security (Elisa Bertino)
Abstract: The first part of the course will review basic notions of access
control and will present the most significant models (DAC,
MAC, Chinese Wall, RBAC). The second part will
focus on access control for relational database systems. The access control
of System R will be presented together with relevant extensions, such as
positive/negative authorizations and non-cascading revoke operations. The
third part will focus on access control for advanced data management systems,
like complex object data management systems and XML data. The Author-X model
will be discussed in detail, including an encryption-based access control
model used for push-based information dissemination and an architecture for
secure third-party publishing of XML data. The fourth part will focus on
privacy issues in database systems and will discuss access control models
specifically tailored to privacy.

Knowledge discovery in databases (George Karypis)
Abstract: Data mining is the process of automatically extracting new and useful
knowledge hidden in large datasets. This emerging discipline is becoming
increasingly important as advances in data collection have led to
explosive growth in the amount of available data. Despite the field's relatively
short history, data mining techniques have been successfully deployed in a
wide range of domains including consumer relation management, document
management and analysis, financial services, social sciences, physical
sciences, engineering, and life sciences. This sequence of lectures is
designed to achieve two primary objectives. The first is to provide an
overview of the field of data mining and present some of the fundamental
techniques associated with data pre-processing, pattern discovery,
classification, and clustering. The second is to present some of the methods
and areas that represent the current leading-edge research topics that are
being investigated by the data mining community.

Relational databases, logic, and complexity (Phokion G. Kolaitis)
Abstract: Since the introduction of the relational data model forty years ago,
there has been a continuous and fruitful interaction between databases and
logic. The aim of this course is to present an overview of logic-based
relational query languages with emphasis on their expressive power and
computational aspects. Topics to be
covered include: relational algebra and relational calculus; conjunctive
queries and their variants; the homomorphism theorem and connections to
constraint satisfaction; recursive queries, Datalog,
and deductive databases; set semantics vs. bag semantics; database
dependencies and the chase procedure.

Foundations of XML (Frank Neven)
Abstract: While logic plays a crucial role in foundational research around
query languages for the relational model (and extensions thereof), the advent
of XML opened the gateway to the field of formal language theory. This course
will provide an introduction to techniques and methods from formal language
theory (and a bit of logic) as a toolbox for analyzing W3C
standards (like the XML specification, XML Schema, XSLT, and XPath) as
well as building blocks for advanced XML research. In particular, we will
consider finite state machines, regular expressions, (extended) context-free
grammars, tree (walking) automata and monadic second-order logic. We will
apply these to study the expressiveness and complexity of DTDs,
XSDs, XSLT, and XPath. In addition, I will give a couple of concrete
examples of how the considered formalisms are used as building blocks in
recent XML research (schema learning, schema design, query optimization).

Stream databases (Carlo Zaniolo)
Abstract: In the age of the Internet, massive amounts of information are continuously
exchanged as data streams that are then processed by on-line applications of
increasing complexity. For such
applications, a store-now and process-later approach cannot be used
because of real-time (or quasi real-time) requirements and the excessive data
rates. Therefore, current research seeks to develop a new generation of
information management systems, called Data Stream Management Systems (DSMS), that can support complex applications on massive
data streams with Quality of Service (QoS) guarantees.
This work has produced novel techniques, research prototypes, startup companies, and the successful deployment of DSMS in many applications, including network traffic
analysis, transaction log analysis, intrusion detection, credit-card fraud
detection, click stream analysis, and algorithmic trading. Since many such
applications involve both streaming data and stored data, the approach taken
by most DSMS consists in expressing continuous
queries on data streams using extensions of DBMS query languages. However,
significant changes in query languages
and their enabling technology are needed; indeed, DSMS
must support persistent queries on ordered streams of transient tuples, rather than
transient queries on unordered sets of persistent tuples
as in relational DBMS. In particular, only monotonic queries and non-blocking
operators can be used, and this limits the expressive power of continuous
query languages and their effectiveness in more complex applications, such as
mining data streams. Since only synopses of the unbounded streams can be
kept, basic operators, such as joins and aggregates, must also be redefined
to operate over windows. At the implementation level, there are new query optimization
techniques that seek to minimize response time and memory utilization. Load
shedding techniques based on samples and sketches are used to achieve QoS under overload conditions. Along with these topics, we will discuss
recent DSMS advances to support data stream mining
applications.

Uncertain and probabilistic databases (V.S. Subrahmanian)
Abstract: Uncertainty is omnipresent in this day and age. Whether the
uncertainty arises due to sensor data that is unreliable, or because of
uncertainty in schema matching during data integration, there is a need to
answer queries over uncertain data.
This series of talks will include (i) an
overview of uncertainty in relational databases, (ii) probabilistic data
management in object oriented databases, (iii) uncertainty management in spatio-temporal data, (iv) uncertainty management arising
in data integration, and (v) probabilistic logics in AI. The coverage will
include query algebras, data models, and query processing algorithms in such
environments.