Citations

Existing Citations

  • data lake : Pentaho CTO James Dixon has generally been credited with coining the term “data lake”. He describes a data mart (a subset of a data warehouse) as akin to a bottle of water… “cleansed, packaged and structured for easy consumption” while a data lake is more like a body of water in its natural state. Data flows from the streams (the source systems) to the lake. Users have access to the lake to examine, take samples or dive in. ¶ Data Lakes in a Modern Data Architecture ¶ This is also a fairly imprecise definition. Let's add a few specific properties of a data lake: · All data is loaded from source systems. No data is turned away. · Data is stored at the leaf level in an untransformed or nearly untransformed state. · Data is transformed and schema is applied to fulfill the needs of analysis. ¶ Next, let's highlight five key differentiators of a data lake and how they contrast with the data warehouse approach. 1. Data Lakes Retain All Data . . . . 2. Data Lakes Support All Data Types . . . . 3. Data Lakes Support All Users 4. Data Lakes Adapt Easily to Changes . . . . 5. Data Lakes Provide Faster Insights (†2602)
  • data warehouse : Wikipedia defines data warehouses as: “…central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.” This is a very high-level definition that describes the purpose of a data warehouse but doesn’t explain how that purpose is achieved. I would go on to add that a data warehouse has the following properties: · It represents an abstracted picture of the business organized by subject area. · It is highly transformed and structured. · Data is not loaded to the data warehouse until the use for it has been defined. (†2601)
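
The contrast drawn in the two citations above (a lake loads everything untransformed and applies schema at analysis time, while a warehouse loads only data whose use is already defined) can be sketched in a few lines. This is a minimal illustration and not code from either cited source: the sample records, the field names, and the functions `load_lake`, `load_warehouse`, and `read_lake_with_schema` are all hypothetical.

```python
import json

# Hypothetical raw records as they might arrive from source systems.
RAW_EVENTS = [
    '{"user": "ann", "amount": "12.50", "channel": "web"}',
    '{"user": "bob", "amount": "7.25"}',
    '{"sensor": "t-1", "reading": 21.4}',
]


def load_lake(raw_lines):
    """Lake-style load: keep every record untransformed; nothing is turned away."""
    return list(raw_lines)


def load_warehouse(raw_lines):
    """Warehouse-style load: apply a predefined schema at load time.

    Records that do not match the (assumed) sales schema are not loaded.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if "user" in record and "amount" in record:
            rows.append({"user": record["user"], "amount": float(record["amount"])})
    return rows


def read_lake_with_schema(lake, fields):
    """Schema-on-read: parse and project the requested fields at analysis time."""
    rows = []
    for line in lake:
        record = json.loads(line)
        if all(field in record for field in fields):
            rows.append({field: record[field] for field in fields})
    return rows


lake = load_lake(RAW_EVENTS)
warehouse = load_warehouse(RAW_EVENTS)
sensor_rows = read_lake_with_schema(lake, ["sensor", "reading"])
```

In this sketch the sensor record survives only in the lake, so a question about sensor readings that nobody anticipated at load time can still be answered later by applying a new schema on read.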