Analytics Glossary

Data Lake vs. Data Mart vs. Data Warehouse

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” – James Dixon, the founder and CTO of Pentaho

The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas a data warehouse spans the entire enterprise, the information in a data mart pertains to a single department. A data warehouse can store only part of an organization's data (the structured, processed portion), while a data lake can store all of it, including raw and unstructured data.
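To make the "subset of the warehouse" relationship concrete, here is a minimal sketch that assumes a hypothetical in-memory warehouse table with a `department` column; the record fields and the `build_data_mart` helper are illustrative only. In practice a data mart is usually a separate schema or database populated by ETL jobs, not a simple filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesRecord:
    """One row of a hypothetical enterprise-wide warehouse fact table."""
    order_id: int
    department: str
    region: str
    amount: float


# Hypothetical warehouse table covering every department in the enterprise.
WAREHOUSE = [
    SalesRecord(1, "marketing", "EU", 120.0),
    SalesRecord(2, "finance", "US", 75.5),
    SalesRecord(3, "marketing", "US", 300.0),
    SalesRecord(4, "operations", "EU", 42.0),
]


def build_data_mart(warehouse, department):
    """Return only the rows relevant to one department (a simplified data mart)."""
    return [row for row in warehouse if row.department == department]


marketing_mart = build_data_mart(WAREHOUSE, "marketing")
print([row.order_id for row in marketing_mart])  # [1, 3]
```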

Data Latency

Data latency is the delay between when data is created in a source system and when a business user can retrieve it from a data warehouse or business intelligence dashboard.
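As a rough illustration, latency can be measured as the gap between a record's creation timestamp in the source system and the moment it becomes queryable downstream. This is a minimal sketch; the `data_latency` function and the nightly-load timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_latency(created_at: datetime, available_at: datetime) -> timedelta:
    """Delay between an event occurring at the source and the moment users can query it."""
    if available_at < created_at:
        raise ValueError("data cannot be available before it is created")
    return available_at - created_at


# Hypothetical example: an order placed at 09:00 becomes queryable
# after a nightly batch load finishes at 02:30 the next day.
created = datetime(2024, 1, 15, 9, 0)
loaded = datetime(2024, 1, 16, 2, 30)
print(data_latency(created, loaded))  # 17:30:00
```

A streaming pipeline would shrink this gap to seconds or minutes; a nightly batch load, as here, keeps it in the range of hours.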

Data Scientist

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Probability vs. Statistics

Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.
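The distinction shows up clearly with coin flips. In this minimal sketch, the probability function starts from a known model (a coin with heads probability `p`) and predicts future outcomes, while the statistics function works backwards from a hypothetical record of past flips to estimate `p`. Both function names and the data are illustrative.

```python
from math import comb


def binomial_probability(k: int, n: int, p: float) -> float:
    """Probability: chance of exactly k successes in n future trials, given a known p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def estimate_p(outcomes: list[int]) -> float:
    """Statistics: estimate p from the observed frequency of past successes (1) and failures (0)."""
    if not outcomes:
        raise ValueError("need at least one observation")
    return sum(outcomes) / len(outcomes)


# Probability: a fair coin (p = 0.5); chance of exactly 3 heads in 5 flips.
print(binomial_probability(3, 5, 0.5))  # 0.3125

# Statistics: hypothetical record of 10 past flips (1 = heads); estimate p.
past_flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(estimate_p(past_flips))  # 0.7
```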

Edge Analytics: The collection, processing, and analysis of data at the edge of a network, either at or close to a sensor, a network switch, or some other connected device.
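A minimal sketch of the idea, assuming a hypothetical sensor that buffers readings locally: the device computes a summary and flags anomalies itself, so only a small payload, rather than every raw reading, needs to cross the network. The function name, threshold, and readings are illustrative.

```python
from statistics import mean


def summarize_at_edge(readings: list[float], threshold: float) -> dict:
    """Analyze raw readings on the device and return a compact summary to send upstream."""
    if not readings:
        raise ValueError("no readings to summarize")
    # Flag readings above the threshold locally instead of shipping every value.
    anomalies = [r for r in readings if r > threshold]
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "anomalies": anomalies,
    }


# Hypothetical temperature readings (degrees C) sampled by one sensor.
raw = [21.0, 21.5, 22.0, 35.0, 21.5]
summary = summarize_at_edge(raw, threshold=30.0)
print(summary)
```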

Machine Learning: Algorithms that learn patterns from example data and use that pattern recognition to make predictions about new data.
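One of the simplest pattern-recognition predictors is k-nearest neighbours: it labels a new point by looking at the most similar labelled examples. This sketch uses it only because it is short; it is one technique among many, and the churn features and labels below are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote among the k closest training examples."""
    nearest = sorted(training, key=lambda example: dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (hours of product use per week, support tickets) -> outcome.
training = [
    ((1.0, 5.0), "churn"),
    ((2.0, 4.0), "churn"),
    ((1.5, 6.0), "churn"),
    ((8.0, 0.0), "stay"),
    ((9.0, 1.0), "stay"),
    ((7.5, 0.5), "stay"),
]

print(knn_predict(training, (1.2, 5.5)))  # churn
print(knn_predict(training, (8.5, 0.2)))  # stay
```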

Deep Learning: A form of machine learning that uses layered artificial neural networks, a computing model inspired by the structure of the brain, and typically requires less human supervision than traditional machine learning (for example, less manual feature engineering). Deep learning isn’t an application; it’s a technology that makes many applications smarter and more natural as they learn from experience. Deep learning is a subset of machine learning, and machine learning is a subset of AI, an umbrella term for any computer program that does something smart.

Propensity Modeling

Propensity models are often used to identify those most likely to respond to an offer, or to focus retention activity on those most likely to churn.

The model can be applied to your database to score all of your customers or prospects. You can then select only those most likely to exhibit the predicted behaviour (for example, responding to an offer) and focus your mailing activity on them.
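The score-then-select workflow might look like the following sketch. It assumes an already-fitted logistic regression, a common choice for propensity models but not the only one; the intercept, weights, feature names, and customers are all hypothetical.

```python
from math import exp

# Hypothetical coefficients of an already-fitted logistic regression propensity model.
INTERCEPT = -3.0
WEIGHTS = {"past_purchases": 0.8, "emails_opened": 0.3, "months_inactive": -0.5}


def propensity(customer: dict) -> float:
    """Return the modelled probability (0 to 1) that this customer responds to the offer."""
    z = INTERCEPT + sum(w * customer[feature] for feature, w in WEIGHTS.items())
    return 1.0 / (1.0 + exp(-z))


def select_for_mailing(customers: list[dict], top_n: int) -> list[str]:
    """Score every customer and keep the IDs of the top_n most likely responders."""
    ranked = sorted(customers, key=propensity, reverse=True)
    return [c["id"] for c in ranked[:top_n]]


# Hypothetical customer database.
customers = [
    {"id": "C1", "past_purchases": 5, "emails_opened": 4, "months_inactive": 0},
    {"id": "C2", "past_purchases": 0, "emails_opened": 1, "months_inactive": 6},
    {"id": "C3", "past_purchases": 2, "emails_opened": 6, "months_inactive": 1},
    {"id": "C4", "past_purchases": 1, "emails_opened": 0, "months_inactive": 3},
]
print(select_for_mailing(customers, top_n=2))  # ['C1', 'C3']
```

Selecting a fixed top N suits a mailing with a fixed budget; a probability cutoff (for example, mail everyone above 0.5) is a common alternative.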