GII Doctoral School on Advances in Databases - 2009

 

University of Calabria, Rende & Hotel S. Michele, Cetraro – Italy

September 7-18, 2009


 


Program

 

Data exchange and integration (Maurizio Lenzerini)

Abstract: "Data exchange and integration" is the problem of exchanging and combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data exchange and integration systems is important in current real-world applications, and is characterized by a number of issues that are interesting from both a theoretical and a practical point of view. In recent years, there has been a large body of research on this topic, and a precise, clear picture of a systematic approach to the problem is now available. This course will present an overview of the research carried out in the area of data exchange and integration, with emphasis on the theoretical results that are relevant for the development of data integration solutions. Special attention will be devoted to the following aspects: architectures for data exchange and integration, modeling a data integration application, moving data in data exchange, processing queries in data integration, dealing with inconsistent data sources, and reasoning on mappings and queries.

 

Data privacy and security (Elisa Bertino)

Abstract: The first part of the course will review basic notions of access control and will present the most significant models (DAC, MAC, Chinese Wall, RBAC). The second part will focus on access control for relational database systems. The access control of System R will be presented together with relevant extensions, such as positive/negative authorizations and non-cascading revoke operations. The third part will focus on access control for advanced data management systems, such as complex object data management systems and XML data. The Author-X model will be discussed in detail, including an encryption-based access control model used for push-based information dissemination and an architecture for secure third-party publishing of XML data. The fourth part will focus on privacy issues in database systems and will discuss access control models specifically tailored to privacy.

 

Knowledge discovery in databases (George Karypis)

Abstract: Data mining is the process of automatically extracting new and useful knowledge hidden in large datasets. This emerging discipline is becoming increasingly important as advances in data collection have led to explosive growth in the amount of available data. Despite the field's relatively short history, data mining techniques have been successfully deployed in a wide range of domains including customer relationship management, document management and analysis, financial services, social sciences, physical sciences, engineering, and life sciences. This sequence of lectures is designed to achieve two primary objectives. The first is to provide an overview of the field of data mining and present some of the fundamental techniques associated with data pre-processing, pattern discovery, classification, and clustering. The second is to present some of the methods and areas that represent the current leading-edge research topics being investigated by the data mining community.

 

Relational databases, logic, and complexity (Phokion G. Kolaitis)

Abstract: Since the introduction of the relational data model forty years ago, there has been a continuous and fruitful interaction between databases and logic. The aim of this course is to present an overview of logic-based relational query languages with emphasis on their expressive power and computational aspects.  Topics to be covered include: relational algebra and relational calculus; conjunctive queries and their variants; the homomorphism theorem and connections to constraint satisfaction; recursive queries, Datalog, and deductive databases; set semantics vs. bag semantics; database dependencies and the chase procedure.

 

Foundations of XML (Frank Neven)

Abstract: While logic plays a crucial role in foundational research around query languages for the relational model (and extensions thereof), the advent of XML opened the gateway to the field of formal language theory. This course will provide an introduction to techniques and methods from formal language theory (and a bit of logic) as a toolbox for analyzing W3C standards (like the XML specification, XML Schema, XSLT,  and XPath) as well as building blocks for advanced XML research. In particular, we will consider finite state machines, regular expressions, (extended) context-free grammars, tree (walking) automata and monadic second-order logic. We will apply these to study the expressiveness and complexity of DTDs, XSDs, XSLT, and XPath. In addition, I will give a couple of concrete examples of how the considered formalisms are used as building blocks in recent XML research (schema learning, schema design, query optimization).

 

Stream databases (Carlo Zaniolo)

Abstract: In the age of the Internet, massive amounts of information are continuously exchanged as data streams that are then processed by on-line applications of increasing complexity. For such applications, a store-now and process-later approach cannot be used because of real-time (or quasi real-time) requirements and the excessive data rates. Therefore, current research seeks to develop a new generation of information management systems, called Data Stream Management Systems (DSMS), that can support complex applications on massive data streams with Quality of Service (QoS) guarantees. This work has produced novel techniques, research prototypes, startup companies, and the successful deployment of DSMS in many applications, including network traffic analysis, transaction log analysis, intrusion detection, credit-card fraud detection, click stream analysis, and algorithmic trading. Since many such applications involve both streaming data and stored data, the approach taken by most DSMS consists of expressing continuous queries on data streams using extensions of DBMS query languages. However, significant changes in query languages and their enabling technology are needed; indeed, DSMS must support persistent queries on ordered streams of transient tuples, rather than transient queries on unordered sets of persistent tuples as in relational DBMS. In particular, only monotonic queries and non-blocking operators can be used, and this limits the expressive power of continuous query languages and their effectiveness in more complex applications, such as mining data streams. Since only synopses of the unbounded streams can be kept, basic operators, such as joins and aggregates, must also be revised to work on windows. At the implementation level, new query optimization techniques seek to minimize response time and memory utilization. Load shedding techniques based on samples and sketches are used to achieve QoS under overload conditions. Along with these topics, we will discuss recent DSMS advances to support data stream mining applications.
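As a rough illustration of why aggregates are redefined over windows, the sketch below computes a running average over the last few tuples of a stream. It is not taken from the course material: the function name `sliding_window_avg`, the count-based window, and the sample readings are all assumptions chosen for illustration. The operator is non-blocking in the sense that it emits a result after every arrival and keeps only a bounded window in memory, so it never has to wait for the end of an unbounded stream.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_window_avg(stream: Iterable[float], size: int) -> Iterator[float]:
    """Yield the average of the most recent `size` values after each arrival.

    Hypothetical helper for illustration only: a count-based sliding window.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    window: Deque[float] = deque()  # bounded state: at most `size` values
    total = 0.0
    for value in stream:
        window.append(value)
        total += value
        if len(window) > size:
            # Evict the oldest value so memory use stays bounded.
            total -= window.popleft()
        # Emit immediately; no need to see the rest of the stream.
        yield total / len(window)


# Made-up sample readings, standing in for an unbounded stream.
readings = [10.0, 20.0, 30.0, 40.0]
averages = list(sliding_window_avg(readings, size=2))
print(averages)
```

Because the function is a generator, it can also consume an endless source lazily; a plain average over the whole stream, by contrast, would be a blocking aggregate that can only answer once the input ends.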

 

Uncertain and probabilistic databases (V.S. Subrahmanian)

Abstract: Uncertainty is omnipresent in this day and age. Whether the uncertainty arises due to sensor data that is unreliable, or because of uncertainty in schema matching during data integration, there is a need to answer queries over uncertain data.  This series of talks will include (i) an overview of uncertainty in relational databases, (ii) probabilistic data management in object oriented databases, (iii) uncertainty management in spatio-temporal data, (iv) uncertainty management arising in data integration, and (v) probabilistic logics in AI. The coverage will include query algebras, data models, and query processing algorithms in such environments.

 

 

 

