Data Lake vs. Data Mart vs. Data Warehouse
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” – James Dixon, the founder and CTO of Pentaho
The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas a data warehouse has enterprise-wide scope, the information in a data mart pertains to a single department. The data warehouse can store only part of the data (shown in orange in the original figure), while the data lake can store all of it, orange and blue alike.
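The relationship among the three stores can be sketched in a few lines of Python. This is an illustrative toy, not a real storage system: the `WAREHOUSE_SCHEMA`, `DataLake`, `load_warehouse`, and `build_mart` names are hypothetical, and a simple schema filter stands in for the cleansing and structuring a real warehouse load would perform. The lake accepts any record, the warehouse keeps only the schema-conforming subset, and the mart is one department's slice of the warehouse.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema the warehouse enforces (an assumption for illustration).
WAREHOUSE_SCHEMA = {"department": str, "amount": float}


@dataclass
class DataLake:
    """Accepts any record as-is: structured rows, free text, odd shapes."""
    records: list[Any] = field(default_factory=list)

    def ingest(self, record: Any) -> None:
        self.records.append(record)


def conforms(record: Any) -> bool:
    """True if the record is a dict whose fields match WAREHOUSE_SCHEMA exactly."""
    return (
        isinstance(record, dict)
        and set(record) == set(WAREHOUSE_SCHEMA)
        and all(isinstance(record[k], t) for k, t in WAREHOUSE_SCHEMA.items())
    )


def load_warehouse(lake: DataLake) -> list[dict]:
    """The warehouse keeps only the structured, schema-conforming subset of the lake."""
    return [r for r in lake.records if conforms(r)]


def build_mart(warehouse: list[dict], department: str) -> list[dict]:
    """A data mart is a single department's slice of the warehouse."""
    return [r for r in warehouse if r["department"] == department]


lake = DataLake()
for rec in [
    {"department": "sales", "amount": 120.0},
    {"department": "finance", "amount": 75.5},
    "free-text support ticket",                           # unstructured: lake only
    {"department": "sales", "note": "raw clickstream"},   # wrong shape: lake only
]:
    lake.ingest(rec)

warehouse = load_warehouse(lake)
sales_mart = build_mart(warehouse, "sales")
print(len(lake.records), len(warehouse), len(sales_mart))  # 4 2 1
```

Every record lands in the lake, only two survive into the warehouse, and the sales mart holds just one of those.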
Data latency is the delay between when data is produced in a source system and when a business user can retrieve it from a data warehouse or business intelligence (BI) dashboard.
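A minimal sketch of measuring data latency, assuming it is taken as the time from when a record is created at the source to when it becomes queryable in the warehouse. The timestamp pairs and the `latency` helper are hypothetical, chosen only to show the arithmetic.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical (created_at_source, available_in_warehouse) timestamp pairs.
events = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 15)),
    (datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 9, 35)),
    (datetime(2024, 1, 1, 9, 10), datetime(2024, 1, 1, 9, 55)),
]


def latency(created: datetime, available: datetime) -> timedelta:
    """Delay between a record being created at the source and being queryable."""
    if available < created:
        raise ValueError("record cannot be available before it is created")
    return available - created


latencies = [latency(c, a) for c, a in events]
avg_minutes = mean(d.total_seconds() for d in latencies) / 60
worst = max(latencies)
print(f"average: {avg_minutes:.1f} min, worst: {worst}")
# prints "average: 30.0 min, worst: 0:45:00"
```

Reporting both the average and the worst case matters in practice: a dashboard can look fresh on average while individual records still arrive late.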
A Data Scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.
Probability vs. Statistics
Probability deals with predicting the likelihood of future events from a known model, while statistics analyzes the frequency of past events to infer the model that produced them.
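A coin-flip sketch makes the contrast concrete. In the probability half, the model is assumed known (a fair coin, p = 0.5) and we compute the chance of a future outcome with the binomial formula. In the statistics half, the model is unknown and we estimate p from a hypothetical record of past flips using the observed frequency, which is the maximum-likelihood estimate for this model.

```python
from math import comb


# Probability: the model is known (fair coin, p = 0.5); predict a future outcome.
def prob_k_heads(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k heads in n flips with heads-probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(f"P(7 heads in 10 fair flips) = {prob_k_heads(10, 7, 0.5):.4f}")  # 0.1172

# Statistics: the model is unknown; infer p from observed (past) flips.
observed = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical data, 1 = heads
p_hat = sum(observed) / len(observed)      # observed frequency = MLE of p
print(f"Estimated p from data = {p_hat:.2f}")  # 0.70
```

The same event (7 heads in 10 flips) appears in both halves: probability asks how likely it is under a given p, statistics asks what p best explains it.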