Data Warehouses

  • Consolidate data from many sources in one large repository
  • Requires an initial load, then periodic synchronization of the replicated data

Common architecture

  • Databases at branches handle OLTP workloads
  • Local store databases copied to a central warehouse overnight
  • Analysts use the warehouse for OLAP and data mining
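
The architecture above can be sketched in a few lines. This is only an illustration, assuming SQLite in-memory databases stand in for the branch and warehouse systems; the table names (sales, sales_fact), the branch names, and the helper functions are hypothetical and not part of the original material.

```python
import sqlite3

def make_branch(rows):
    """Create an in-memory branch (OLTP) database with a hypothetical sales table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    db.commit()
    return db

def nightly_load(warehouse, branches):
    """Copy every branch's sales rows into the warehouse, tagged with the branch name."""
    for name, db in branches.items():
        rows = db.execute("SELECT item, amount FROM sales").fetchall()
        warehouse.executemany(
            "INSERT INTO sales_fact (branch, item, amount) VALUES (?, ?, ?)",
            [(name, item, amount) for item, amount in rows],
        )
    warehouse.commit()

def sales_by_item(warehouse):
    """OLAP-style aggregate: total sales per item across all branches."""
    return warehouse.execute(
        "SELECT item, SUM(amount) FROM sales_fact GROUP BY item ORDER BY item"
    ).fetchall()

# Two hypothetical branches, each with its own local database.
branches = {
    "north": make_branch([("pen", 2.0), ("ink", 5.0)]),
    "south": make_branch([("pen", 3.0)]),
}

# The central warehouse holds one consolidated fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (branch TEXT, item TEXT, amount REAL)")

nightly_load(warehouse, branches)
totals = sales_by_item(warehouse)
```

The branches keep serving their own transactions, while the analyst's aggregate query runs only against the consolidated copy.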

Warehousing Challenges

  • Semantic integration: Eliminate mismatches when integrating data from multiple sources
  • Heterogeneous sources: Must fit the schemas of different sources together
  • Load, Refresh, Purge: Must load data, periodically refresh it, and then purge old data
  • Metadata Management: Must keep track of source, loading time, and other information
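
A minimal sketch of three of these challenges, again assuming SQLite as a stand-in warehouse. The sources (store_a reporting cents, store_b reporting dollars), the column names, and the load and purge helpers are hypothetical. The sketch converts amounts to one unit on load (semantic integration), records the source and loading time with each row (metadata), and deletes rows older than a cutoff (purge). Refresh is not shown.

```python
import sqlite3
from datetime import datetime

wh = sqlite3.connect(":memory:")
wh.execute(
    """CREATE TABLE sales_fact (
           item TEXT, amount_usd REAL, sold_on TEXT,  -- integrated data
           source TEXT, loaded_at TEXT                -- metadata
       )"""
)

def load(wh, source, rows, to_usd, loaded_at):
    """Load rows from one source, converting amounts to dollars and
    recording where and when each row was loaded."""
    wh.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
        [(item, to_usd(amount), sold_on, source, loaded_at.isoformat())
         for item, amount, sold_on in rows],
    )
    wh.commit()

def purge(wh, cutoff):
    """Delete facts sold before the cutoff date (ISO strings sort by date).
    Returns the number of rows removed."""
    cur = wh.execute("DELETE FROM sales_fact WHERE sold_on < ?", (cutoff,))
    wh.commit()
    return cur.rowcount

load_time = datetime(2024, 1, 2, 3, 0)  # fixed timestamp keeps the example repeatable

# store_a reports amounts in cents; store_b already reports dollars.
load(wh, "store_a", [("pen", 250, "2023-12-30"), ("ink", 500, "2021-05-01")],
     lambda cents: cents / 100, load_time)
load(wh, "store_b", [("pen", 3.0, "2023-12-31")],
     lambda usd: usd, load_time)

removed = purge(wh, "2022-01-01")
remaining = wh.execute(
    "SELECT item, amount_usd, source FROM sales_fact ORDER BY source"
).fetchall()
```

Keeping source and loaded_at on every row is one simple way to trace a fact back to its origin; a real warehouse would more likely keep such metadata in separate catalog tables.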