Literature for the Seminar
"Data Warehousing and Data Mining" (summer semester 1998)
The following URLs provide literature lists for nearly all seminar topics (for further research):
Data Warehousing:
* http://www.informatik.th-darmstadt.de/DVS1/staff/wu/dw.html
* http://www.informatik.uni-freiburg.de/~seitter/DataWarehousing.html
Data Mining:
In addition, the books listed below for the overview talks on Data Warehousing (Topic 1) and Data Mining (Topic 8) offer additional material for nearly all seminar topics.
Topic Area I: Data Warehousing
Topic 1: Introduction: Terminology, Architectures, ...
* Inmon, W.H.: Building the Data Warehouse, Wiley Computer Publishing, 1996 (2nd ed.)
* Anahory, S.; Murray, D.: Data Warehouse - Planung, Implementierung und Administration, Addison-Wesley, 1997
* Chaudhuri, S.; Dayal, U.: An Overview of Data Warehousing and OLAP Technology, SIGMOD Record 26 (1), March 1997
* Tresch, M.; Rys, M.: Data Warehousing Architektur für Online Analytical Processing, Theorie u. Praxis der Wirtschaftsinformatik, Nr. 195(34), Hüthig Verlag 1997
* Wu, M.-C.; Buchmann, A.P.: Research Issues in Data Warehousing, Proc. BTW, pp. 61-82, 1997
* Widom, J.: Research Problems in Data Warehousing. Proc. CIKM 1995, pp. 25-30, Baltimore, Maryland, 1995
Topic 2: Data Extraction and Cleansing
* Squire, C.: Data Extraction and Transformation for the Data Warehouse, Proc. SIGMOD Conf., 1995
* Jagadish, H. V. et al.: Incremental Organization for Data Recording and Warehousing, Proc. VLDB, 1997
* Weiss, S. M.; Indurkhya, N.: Predictive Data Mining, Morgan Kaufmann, 1998: Ch. 3 (Preparing the Data)
* Labio, W. J.; Garcia-Molina, H.: Comparing Very Large Database Snapshots, TechReport CS-TN-95-27, Stanford Univ., 1995
* Labio, W. J.; Garcia-Molina, H.: Efficient Snapshot Differential Algorithms for Data Warehousing, Proc. VLDB, 1996
* Hurwicz, M.: Take your Data to the Cleaners, Byte Magazine 1, 1997
* various white papers from tool vendors
Topic 3: Schema Integration and Metadata
* Conrad, S.: Föderierte Datenbanksysteme, Springer-Verlag, 1997
* Anahory (see Topic 1): Ch. 9: Metadata
* Zhou, G. et al.: Data Integration and Warehousing Using H2O, DataEng.Bull 18(2), 1995
* Brackett, Michael H.: The Data Warehouse Challenge, Ch. 18, Wiley Computer Publishing, 1996
* Musick, R.; Miller, Ch.: Report on the 2nd IEEE Metadata Conference (Metadata '97)
http://computer.org/conferen/proceed/meta97/
the HTML versions of the conference papers are also available there: .../list_papers.html
* Satya Sachdeva: Metadata for Data Warehouse (SYBASE):
http://www.sybase.com/services/dwpractice/meta.html
* Literature list (Höfling, FORWISS):
http://www.forwiss.tu-muenchen.de/~system42/public/Line42/Literatur/REPOSIT.html
(e.g. 'What is Metadata?', 'Standardizing Metadata', 'Guiding Users through disparate data layers' ...)
Topic 4: Materialized Views
Selection and Creation:
* Labio, W. J.; Quass, D.; Adelberg, B.: Physical Database Design for Data Warehouses, Proc. ICDE, 1997
* Baralis, E.; Paraboschi, S.; Teniente, E.: Materialized View Selection in a Multidimensional Database, Proc. VLDB, 1997
* Theodoratos, D.; Sellis, T.: Data Warehouse Configuration, Proc. VLDB, 1997
* Yang, J.; Karlapalem, K.; Li, Q.: Algorithm for Materialized View Design in Data Warehousing Environment, Proc. VLDB, 1997
Maintenance:
* Gupta, A.; Mumick, I. S.: Maintenance of Materialized Views: Problems, Techniques, and Applications, IEEE Data Engineering 6, 1995
* Baekgaard, L.; Roussopoulos, N.: Efficient Refreshment of Data Warehouse Views, TechReport CS-TR-3642, Univ. Maryland, 1996
* Huyn, N.: Multiple-View Self-Maintenance in Data Warehousing Environments, Proc. VLDB, 1997
* Zhuge, Y. et al.: View Maintenance in a Warehousing Environment, Proc. SIGMOD Conf., 1995
* Quass, D.; Widom, J.: On-Line Warehouse View Maintenance, Proc. SIGMOD Conf., 1997
Topic 5: Data Warehouse Design
Modeling:
* Agrawal, Gupta et al.: Modeling Multidimensional Databases, IBM Research Report, Almaden, San José, 1995
* Raden, N.: Modeling the Data Warehouse:
http://user.aol.com/nraden/iw0196_1.htm
* various survey literature on DW covering star/snowflake schemas, ...
* DW-Architektur für das Web, Datenbank Focus, Jan. 1998 (ROLAP vs. MOLAP)
Indexing Techniques:
* Sarawagi, S.: Indexing OLAP data, Data Eng. Bulletin 20(1), 3/97
* Leslie, H. et al.: Efficient Search of Multidimensional B-Trees, Proc. VLDB, Zürich 1995
* Literature list of M.-C. Wu, TH Darmstadt (indexing techniques, e.g. 'Encoded Bitmap Indexing for Data Warehouses', ...):
http://www.informatik.th-darmstadt.de/DVS1/staff/wu.german.html
* Johnson, Th.; Shasha, D.: Some Approaches to Index Design for Cube Forests, Data Eng. Bulletin 20 (1), 3/97
* Sybase IQ - Optimizing Interactive Performance for the Data Warehouse,
(available on the Sybase web pages, http://www.sybase.com/)
Topic 6: Query Processing
OLAP:
* Pilot-Software OLAP White Paper (OLAP 'course'):
http://www.pilotsw.com/olap/olap.htm
* OLAP Benchmark Study, OLAP Council:
http://www.olapcouncil.org/bmark.html
* existing OLAP and DSS tools from various vendors
Aggregation, Cube Operator:
* Gray, J. et al.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, ..., Data Mining and Knowledge Discovery 1, Kluwer Academic Publishers, 1997
* Agarwal, S. et al.: On the Computation of Multidimensional Aggregates, Proc. VLDB, Mumbai, India, 1996
* Harinarayan, V.; Rajaraman, A.; Ullman, J. D.: Implementing Data Cubes Efficiently, Proc. SIGMOD Conf., 1996
* Deshpande, P.M. et al.: Cubing Algorithms, Storage Estimation, and Storage and Processing Alternatives for OLAP, Data Eng. Bulletin 20 (1), March 1997
Specialized or technical:
* Gupta, A. et al.: Aggregate-Query Processing in Data Warehousing Environments, Proc. VLDB, Zürich 1995
* Ross, Srivastava: Fast Computation of Sparse Datacubes, Proc. VLDB, 1997
* Zhao, Y. et al.: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates, Proc. SIGMOD Conf, Tucson, Arizona, 1997 (SIGMOD Record 26 (2))
Topic 7: Research Projects, Implementations
Research Projects:
* Whips: Data Warehousing at Stanford University
http://www-db.stanford.edu/warehousing/warehouse.html
* The Maryland ADMS Project
* Supporting Data Integration and Warehousing Using H2O
... all described in DataEng.Bull. 18(2), 1995
Commercial DW Solutions:
* various white papers from vendors
* French, C. D.: "One Size Fits All" Database Architectures Do Not Work For DSS, Proc. SIGMOD Conf., 1995
Topic Area II: Data Mining
Topic 8: Overview
* Holsheimer, Siebes: Data Mining: the search for knowledge in databases, TechReport CS-R9406, CWI Amsterdam, 1994
* Decker, Focardi: Technology Overview: A Report on Data Mining, TechReport TR-95-02, CSCS-ETH, 1995
* Chen, Han, Yu: Data Mining: An Overview from Database Perspective, IEEE TKDE 8 (6), 1996
* Fayyad, U. M. et al.: Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996: Ch. I (Foundations) and Ch. VII (KDD Applications)
* Weiss, S. M.; Indurkhya, N.: Predictive Data Mining, Morgan Kaufmann, 1998
Topic 9: Association Rules, Spatio-Temporal Patterns
Association Rules:
* Fayyad et al. (see Topic 8): Ch. IV (Dependency Derivation)
* Srikant, R.; Agrawal, R.: Mining Generalized Association Rules, Proc. VLDB, 1995
* Cheung, D. W. et al.: Maintenance of discovered association rules in large databases, Proc. ICDE, 1996
* Han, J.; Kamber, M.; Chiang, J.: Mining Multi-Dimensional Association Rules Using Data Cubes, TechReport CMPT-TR-97-06, Simon Fraser Univ., Burnaby, 1997
* Klemettinen, M. et al.: Finding Interesting Rules from Large Sets of Discovered Association Rules, Proc. CIKM, 1994
* Mueller, A.: Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison, TechReport CS-TR-3515, Maryland Univ., 1995
Pattern Recognition and Trend Analysis:
* Fayyad et al. (see Topic 8): Ch. III (Trend and Deviation Analysis)
* Faloutsos, C.; Ranganathan, M.; Manolopoulos, Y.: Fast Subsequence Matching in Time-Series Databases, Proc. SIGMOD Conf., 1994
* Agrawal, R. et al.: Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, Proc. VLDB, 1995
* Li, C.-S.; Yu, P. S.; Castelli, V.: HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences, Proc. ICDE, 1996
Topic 10: Classification, Clustering
Classification:
* Agrawal, R. et al.: An Interval Classifier for Database Mining Applications, Proc. VLDB, 1992
* Lu, H.; Setiono, R.; Liu, H.: NeuroRule: A Connectionist Approach to Data Mining, Proc. VLDB, 1995
Clustering:
* Ng, R.; Han, J.: Efficient and effective clustering method for spatial data mining, Proc. VLDB, 1994
* Zhang, T.; Ramakrishnan, R.; Livny, M.: BIRCH: an efficient data clustering method for very large databases, Proc. SIGMOD Conf., 1996
* Fisher, D.: Optimization and simplification of hierarchical clusterings, Proc. KDD, 1995
* Ester, M.; Kriegel, H.-P.; Xu, X.: Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification, Proc. SSD, 1995