data warehouse [English]

Syndetic Relationships

InterPARES Definition

n. ~ A repository that stores information, taken from diverse sources, that has been extracted, normalized, and transformed into a common schema and that supports pre-defined analysis and reporting.
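The definition above names three steps: information is extracted from diverse sources, normalized and transformed into a common schema, and then used for pre-defined analysis and reporting. The following is a minimal sketch of that flow, assuming two hypothetical source systems (a sales system and a legacy text export, both invented for illustration) and using an in-memory SQLite database as a stand-in for the warehouse. All function, table, and field names here are hypothetical.

```python
import sqlite3
from datetime import datetime

# Hypothetical source 1: a sales system that exports dicts with its own field names.
SALES_SYSTEM = [
    {"order_id": "A-1", "amt_cents": 1250, "ts": "2017-03-01"},
    {"order_id": "A-2", "amt_cents": 800, "ts": "2017-04-15"},
]

# Hypothetical source 2: a legacy system that exports semicolon-delimited text.
LEGACY_SYSTEM = [
    "B-7;9.99;01/03/2017",
    "B-8;20.00;15/05/2017",
]

def from_sales(rec):
    """Normalize a sales-system record into the common schema."""
    # Common schema: (source, order_id, amount in currency units, ISO 8601 date)
    return ("sales", rec["order_id"], rec["amt_cents"] / 100.0, rec["ts"])

def from_legacy(line):
    """Parse a legacy-system line and normalize it into the common schema."""
    order_id, amount, day = line.split(";")
    iso_day = datetime.strptime(day, "%d/%m/%Y").date().isoformat()
    return ("legacy", order_id, float(amount), iso_day)

def build_warehouse():
    """Extract from both sources, transform, and load into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (source TEXT, order_id TEXT, amount REAL, order_date TEXT)"
    )
    rows = [from_sales(r) for r in SALES_SYSTEM] + [from_legacy(l) for l in LEGACY_SYSTEM]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn

def quarterly_report(conn):
    """A pre-defined report: total order amount per calendar quarter."""
    query = """
        SELECT substr(order_date, 1, 4) || '-Q' ||
               ((CAST(substr(order_date, 6, 2) AS INTEGER) + 2) / 3) AS quarter,
               ROUND(SUM(amount), 2)
        FROM orders
        GROUP BY quarter
        ORDER BY quarter
    """
    return conn.execute(query).fetchall()

print(quarterly_report(build_warehouse()))
```

Because both sources are reduced to one schema before loading, the report query needs no knowledge of where each row came from; that separation is the point the definition makes about a "common schema."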


  • Amazon 2017 (†868 ): The data warehouse functions as a central repository of information coming from one or more data sources. Data flows into a data warehouse from transactional systems and other relational databases, and typically includes structured, semi-structured, and unstructured data. This data is processed, transformed, and ingested at a regular cadence. Users including data scientists, business analysts, and decision-makers access the processed data in the data warehouse through business intelligence tools, SQL clients, and spreadsheets. (†2608)
  • Campbell 2016 (†865 ): Wikipedia defines Data Warehouses as: “…central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” This is a very high level definition that describes the purpose of a data warehouse but doesn’t explain how the purpose is achieved. I would go on to add that a data warehouse has the following properties: · It represents an abstracted picture of the business organized by subject area. · It is highly transformed and structured. · Data is not loaded to the data warehouse until the use for it has been defined. (†2601)
  • Gartner IT Glossary (†298 s.v. "data warehouse"): A storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs. ¶ The five components of a data warehouse are: · production data sources · data extraction and conversion · the data warehouse database management system · data warehouse administration · business intelligence (BI) tools ¶ A data warehouse contains data arranged into abstracted subject areas with time-variant versions of the same records, with an appropriate level of data grain or detail to make it useful across two or more different types of analyses most often deployed with tendencies to third normal form. A data mart contains similarly time-variant and subject-oriented data, but with relationships implying dimensional use of data wherein facts are distinctly separate from dimension data, thus making them more appropriate for single categories of analysis. (†2606)
  • Wikipedia (†387 s.v. "data warehouse"): In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place and are used for creating analytical reports for knowledge workers throughout the enterprise. ¶ The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing for additional operations to ensure data quality before it is used in the DW for reporting. (†2609)
  • Wikipedia (†387 s.v. "data warehouse"): The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations, it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "data marts" that were tailored for ready access by users. (†2610)
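The Gartner entry distinguishes a data mart by its dimensional arrangement, "wherein facts are distinctly separate from dimension data." A minimal sketch of that arrangement (often called a star schema) follows, assuming an in-memory SQLite database and hypothetical tables `dim_date`, `dim_product`, and `fact_sales` with invented sample rows. It illustrates the separation of facts from dimensions only and is not drawn from any of the sources quoted above.

```python
import sqlite3

def build_sales_mart():
    """Create a small dimensional data mart: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold descriptive context.
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        -- The fact table holds numeric measures plus keys into the dimensions.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
        INSERT INTO dim_date VALUES (20170301, 2017, 1), (20170415, 2017, 2);
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
        INSERT INTO fact_sales VALUES
            (20170301, 1, 10.0), (20170301, 2, 5.0), (20170415, 1, 7.5);
    """)
    return conn

def sales_by_quarter_and_category(conn):
    """A single category of analysis: sum the fact measure over two dimensions."""
    return conn.execute("""
        SELECT d.year, d.quarter, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY d.year, d.quarter, p.category
        ORDER BY d.year, d.quarter, p.category
    """).fetchall()

print(sales_by_quarter_and_category(build_sales_mart()))
```

Keeping measures in the fact table and descriptions in the dimension tables means a new way of slicing the same sales (for example by region) would add a dimension rather than change the facts, which is one reason such a layout suits a focused category of analysis.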