the nature of data | - data is a collection of facts (obtained through experiences, observations, or experiments)
- data can consist of numbers, words, or images
- data is the lowest level of abstraction: data --> information --> knowledge
- data is the source of information
- data quality and integrity --> critical to analytics
- structured data (numbers)
- unstructured data (text, images) |
definitions of data | data: facts obtained through experiments, observations, sensors, or transactions |
What are the types of structured data? | structured data is what computers typically process
- categorical
• nominal: descriptive, non-numeric labels with no natural order (color of a phone)
• ordinal: ordered categories (first, second, third; low, medium, high)
- numerical
• interval data: differences between values are meaningful, but zero is arbitrary
(IQ score, temperature in °C or °F)
• ratio data: has a true zero, so ratios are meaningful (weight, height) |
What is unstructured data? | - textual
- multimedia (audio, image, video) |
What is Data preprocessing? | Data preprocessing: getting data ready for analysis |
What are the data preprocessing steps? | 1. data consolidation - sourcing
• collect relevant data
2. data cleaning - quality
• eliminate incorrect values, impute missing values
3. data transformation - put data in the correct form for processing
• e.g., convert numerical variables into categorical ones (bin numbers into low, medium, high)
4. data reduction
• reduce the number of variables and/or records to what the analysis needs |
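The four preprocessing steps can be sketched in Python as a small pipeline. This is a minimal illustration under assumptions: the two record lists, the `sales` field, mean imputation, and the 200/400 cut-offs for low/medium/high are all hypothetical choices, not a prescribed method.

```python
from statistics import mean

# Hypothetical raw records from two sources.
store_a = [{"id": 1, "sales": 120.0}, {"id": 2, "sales": None}]
store_b = [{"id": 3, "sales": -5.0}, {"id": 4, "sales": 480.0}, {"id": 5, "sales": 300.0}]


def consolidate(*sources):
    """1. Data consolidation: combine copies of the records from every source."""
    return [dict(record) for source in sources for record in source]


def clean(records):
    """2. Data cleaning: drop impossible negative sales, impute missing sales with the mean."""
    valid = [r for r in records if r["sales"] is None or r["sales"] >= 0]
    fill = mean(r["sales"] for r in valid if r["sales"] is not None)
    for r in valid:
        if r["sales"] is None:
            r["sales"] = fill
    return valid


def transform(records, low=200.0, high=400.0):
    """3. Data transformation: bin numeric sales into low / medium / high (hypothetical cut-offs)."""
    for r in records:
        s = r["sales"]
        r["level"] = "low" if s < low else "medium" if s < high else "high"
    return records


def reduce_variables(records, keep=("id", "level")):
    """4. Data reduction: keep only the variables the analysis needs."""
    return [{k: r[k] for k in keep} for r in records]


result = reduce_variables(transform(clean(consolidate(store_a, store_b))))
print(result)
```

Record 3 is dropped as an incorrect value, record 2's missing sales are filled with the mean of the remaining values (300.0), and only `id` and the derived `level` survive the reduction step.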
what is RFID? | Radio Frequency Identification
a tag that transmits the data stored on it when it is scanned by a reader
For example, ski resorts use it to let skiers with passes through checkpoints before they board the chairlift |
what is statistics? | A collection of mathematical techniques to characterize and interpret data |
what is descriptive statistics? | Describing characteristics of the data (as it is)
Used for descriptive analytics |
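Descriptive statistics summarize the data exactly as collected. A minimal sketch with Python's standard `statistics` module, assuming a small hypothetical sample of daily sales:

```python
import statistics

# Hypothetical sample of daily sales (units).
sales = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
    "stdev": statistics.stdev(sales),   # sample standard deviation
    "range": max(sales) - min(sales),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean (21) is above the median (18), which is above the mode (15): the single large value of 45 pulls the mean to the right, the pattern listed for positive skewness.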
what is inferential statistics? | Drawing insights about a population based on
sample data |
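A small simulation shows the inferential idea: look only at a sample, then estimate something about the whole population. This is a sketch under assumptions: the population is simulated, and the 95% interval uses the normal approximation (z of about 1.96), which is reasonable for a sample of 200; with only a few observations a t-based multiplier would be the usual choice.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical population of 100,000 values centred near 50.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Inferential step: use only a random sample to estimate the population mean.
sample = rng.sample(population, 200)
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
z = NormalDist().inv_cdf(0.975)         # about 1.96 for a two-sided 95% interval
margin = z * s / n ** 0.5
lower, upper = x_bar - margin, x_bar + margin

print(f"sample mean {x_bar:.2f}, approx. 95% CI ({lower:.2f}, {upper:.2f})")
print(f"actual population mean {statistics.mean(population):.2f}")
```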
describe the characteristics of negative skewness | - long tail extends to the left
- typically mode > median > mean |
describe the characteristics of positive skewness | - long tail extends to the right
- typically mode < median < mean |
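The sign of the skewness coefficient matches the direction of the long tail. A minimal sketch, assuming the population-form moment coefficient g1 = m3 / m2^1.5 (other textbook variants add small-sample corrections) and a hypothetical data set mirrored to produce the left-tailed case:

```python
import statistics


def skewness(xs):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5 (population form)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical data with one large value
left_tailed = [-x for x in right_tailed]   # mirror image: long tail to the left

for name, data in [("right-tailed", right_tailed), ("left-tailed", left_tailed)]:
    print(f"{name}: skewness={skewness(data):+.2f}, "
          f"mean={statistics.mean(data):.2f}, median={statistics.median(data):.2f}")
```

The right-tailed data has positive skewness and mean > median; the mirrored data has negative skewness and mean < median.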
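describe the characteristics of kurtosis | for a normal distribution, kurtosis = 3 (excess kurtosis = kurtosis - 3 = 0)
- is associated with the height (peakedness) and flatness of a distribution
- lower (negative excess kurtosis): flatter/shorter than normal
- higher (positive excess kurtosis): more peaked/taller than normal |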
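Kurtosis can be computed from the fourth moment about the mean. A minimal sketch, assuming the population-form moment kurtosis m4 / m2^2 (the normal distribution has kurtosis 3, so excess kurtosis is the value minus 3) and two hypothetical data sets:

```python
def kurtosis(xs):
    """Moment kurtosis m4 / m2**2 (population form); the normal distribution has 3."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / m2 ** 2


flat = list(range(1, 11))     # evenly spread values: flatter than normal
peaked = [5] * 8 + [0, 10]    # values piled at the centre: more peaked than normal

for name, data in [("flat", flat), ("peaked", peaked)]:
    k = kurtosis(data)
    print(f"{name}: kurtosis={k:.2f}, excess={k - 3:+.2f}")
```

The evenly spread data gives kurtosis below 3 (negative excess), while the centre-heavy data gives exactly 5 (excess of +2).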
Simple Regression versus Multiple Regression | Simple regression has one independent (input) variable, while multiple regression has more than one
SEE MORE ON SLIDES |
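The same least-squares idea covers both cases; only the number of input columns changes. A sketch, assuming a hand-rolled solver for the normal equations (X^T X)b = X^T y so the example needs only the standard library; `fit_linear` and the data are hypothetical, and real analyses would normally use a statistics tool or library.

```python
def fit_linear(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Each row of X holds the k input values for one observation:
    k = 1 is simple regression, k > 1 is multiple regression.
    """
    rows = [[1.0] + list(r) for r in X]   # prepend a column of 1s for the intercept
    p = len(rows[0])
    # Augmented normal equations (X^T X | X^T y).
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Simple regression: one input variable (data follow y = 1 + 2x).
simple = fit_linear([[1], [2], [3], [4], [5]], [3, 5, 7, 9, 11])
# Multiple regression: two input variables (data follow y = 1 + 2*x1 + 3*x2).
multiple = fit_linear([[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]], [3, 4, 6, 8, 7])
print([round(b, 6) for b in simple])     # [b0, b1]
print([round(b, 6) for b in multiple])   # [b0, b1, b2]
```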
interpreting regression analysis | The Multiple R is the correlation coefficient that
measures the strength of the linear relationship: between the two variables in
simple regression (where it equals the absolute value of r), or between the observed
and predicted y-values in multiple regression. The larger the value,
the stronger the relationship. For the correlation coefficient r itself:
• 1 means a perfect positive linear relationship
• -1 means a perfect negative linear relationship
• 0 means no linear relationship
• R Square is the Coefficient of Determination, a goodness-of-fit measure: it shows how well the regression model fits the data. In our example, the value of R Square is 0.97, which is an excellent fit. In other words, 97% of the variation in the dependent variable (y-values) is explained by the independent variables (x-values).
• Adjusted R Square is a modified version of R Square that adjusts for the number of predictors, penalizing predictors
that add little explanatory power to the regression model.
• Standard error (of the regression) is also a goodness-of-fit measure: the typical distance between observed and predicted y-values, so smaller is better. |
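The four measures above can be computed directly from the actual and predicted y-values. A minimal sketch, assuming a least-squares fit with an intercept and k predictors, using the standard formulas R² = 1 - SS_res/SS_tot, Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), and standard error = sqrt(SS_res/(n - k - 1)); the data and the fitted line are hypothetical.

```python
import math


def regression_fit_stats(y, y_hat, k):
    """Goodness-of-fit measures for a least-squares regression with k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
    r2 = 1 - ss_res / ss_tot
    return {
        "multiple_r": math.sqrt(r2),
        "r_square": r2,
        "adjusted_r_square": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "standard_error": math.sqrt(ss_res / (n - k - 1)),
    }


# Hypothetical data with its least-squares line y_hat = 2.2 + 0.6x (fitted beforehand).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

stats = regression_fit_stats(y, y_hat, k=1)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

For this data R Square is 0.60 (60% of the variation in y is explained), and Adjusted R Square drops to about 0.47 because the sample is small relative to the number of predictors.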
what is a time series? | A time series is a sequence of data points of a variable of
interest over a period of time. The data points must be
evenly spaced, e.g., quarterly sales over several years.
SEE MORE ON SLIDES |
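Difference Between MAD, MSE, and MAPE | MAD (mean absolute deviation) = the average of the absolute forecast errors
MAPE (mean absolute percentage error) = the average absolute error as a percentage of the actual values
MSE (mean squared error) = the average of the squared forecast errors, so large errors count more heavily |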
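All three error measures start from the same forecast errors (actual minus forecast). A minimal sketch, assuming hypothetical demand and forecast values:

```python
def forecast_errors(actual, forecast):
    """Return (MAD, MSE, MAPE in percent) for paired actual and forecast values.

    MAPE divides by the actual value, so it is undefined if any actual value is 0.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mad, mse, mape


actual = [100, 110, 120, 130]     # hypothetical demand
forecast = [90, 115, 120, 140]    # hypothetical forecasts
mad, mse, mape = forecast_errors(actual, forecast)
print(f"MAD={mad:.2f}  MSE={mse:.2f}  MAPE={mape:.2f}%")
```

The errors are 10, -5, 0 and -10, giving MAD = 6.25 and MSE = 56.25: squaring makes the two 10-unit misses dominate MSE, while MAPE (about 5.56%) expresses the error relative to the size of demand.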
what is a business report? | Business Report: Information is presented in a useful
form for business decision makers |
what is a business report's purpose? | Purpose:
- to improve managerial decisions
- Persuade: argument with supporting evidence
- Inform: provide information, analysis, etc.
- Empower the user to act |
Time Series Naive Approach | Assumes demand in the next period is the same as
demand in the most recent period
– e.g., if January sales were 68, then February
sales are predicted to be 68 |
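The naive approach needs nothing but the last observation. A minimal sketch; the November and December figures are hypothetical, and only January's 68 comes from the example above:

```python
def naive_forecast(history):
    """Naive approach: the next period's forecast equals the most recent actual value."""
    if not history:
        raise ValueError("need at least one observation")
    return history[-1]


monthly_sales = [60, 64, 68]   # hypothetical November and December, then January = 68
print(naive_forecast(monthly_sales))  # February forecast: 68
```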
Time Series Moving Average Method | A moving average is a series of arithmetic means of the most recent n periods
• All data points are equally weighted
• Used if little or no trend
• Used often for smoothing |
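A moving average can be used both to forecast the next period and to smooth the whole series. A minimal sketch, assuming a hypothetical sales series and a 3-period window:

```python
def moving_average_forecast(history, n):
    """Forecast the next period as the plain average of the last n actual values."""
    if not 1 <= n <= len(history):
        raise ValueError("need 1 <= n <= len(history)")
    return sum(history[-n:]) / n


def moving_average_series(history, n):
    """Smoothed series: the average of every run of n consecutive values."""
    return [sum(history[i - n + 1:i + 1]) / n for i in range(n - 1, len(history))]


sales = [10, 12, 13, 16, 19, 23, 26]   # hypothetical monthly sales
print(round(moving_average_forecast(sales, 3), 2))            # forecast for month 8
print([round(v, 2) for v in moving_average_series(sales, 3)])
```

Every value in the window gets the same weight (1/n), which is why the forecast (about 22.67) lags behind the upward trend in this data.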
Time Series Weighted Moving Average Method | Used when some trend might be present
– Older data usually less important
• Weights based on experience and intuition
• More recent data weighted more heavily than older data |
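A weighted moving average multiplies each recent value by a weight and divides by the sum of the weights. A minimal sketch, assuming hypothetical weights of 1, 2 and 3 (oldest to newest) chosen "from experience", applied to a hypothetical sales series:

```python
def weighted_moving_average(history, weights):
    """Forecast = sum(w_i * x_i) / sum(w_i) over the last len(weights) values.

    Weights are listed oldest-to-newest, so the last weight applies to the latest value.
    """
    n = len(weights)
    if n == 0 or len(history) < n:
        raise ValueError("need 1 <= len(weights) <= len(history)")
    recent = history[-n:]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


sales = [10, 12, 13, 16, 19, 23, 26]   # hypothetical monthly sales
weights = [1, 2, 3]                     # hypothetical: the newest month counts three times as much as the oldest
print(round(weighted_moving_average(sales, weights), 2))
```

The forecast is (1*19 + 2*23 + 3*26) / 6, about 23.83, closer to the recent upward trend than an equally weighted 3-period average of the same values (about 22.67).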
Time Series exponential smoothing method | Form of weighted moving average
– Most recent data weighted most
– Weights decline exponentially
• Requires smoothing constant (α)
– Ranges from 0 to 1
– Subjectively chosen
– The higher the value of α, the more weight is placed on
more recent data
• Advantage: involves little record keeping of past data |
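Exponential smoothing updates the previous forecast by a fraction α of its error: F(t+1) = F(t) + α(A(t) - F(t)), equivalently αA(t) + (1 - α)F(t). A minimal sketch, assuming a hypothetical starting forecast of 142, actual values of 153 and 150, and α = 0.2:

```python
def exponential_smoothing(actuals, alpha, initial_forecast):
    """Return [F_1, ..., F_{n+1}] using F_{t+1} = F_t + alpha * (A_t - F_t)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    forecasts = [initial_forecast]
    for actual in actuals:
        previous = forecasts[-1]           # only the latest forecast is needed
        forecasts.append(previous + alpha * (actual - previous))
    return forecasts


# Hypothetical: first forecast 142, then actual values of 153 and 150, alpha = 0.2.
print(exponential_smoothing([153, 150], alpha=0.2, initial_forecast=142))
```

Each step uses only the latest forecast and the latest actual value, which is the "little record keeping" advantage: 142 + 0.2 * (153 - 142) = 144.2, then 144.2 + 0.2 * (150 - 144.2) = 145.36.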
what kind of chart is this? | Line chart |
what kind of chart is this? | Bar chart |
what kind of chart is this? | Multivariable chart |
What kind of chart is this? | Stacked bar chart |
What kind of chart is this? | scatter plot |
What kind of chart is this? | histogram
like a bar chart, but shows the frequency
distribution of a continuous variable |
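A histogram's bars are counts of how many values fall into each interval (bin) of the continuous variable. A minimal sketch, assuming hypothetical heights and 10 cm bins; it treats each bin as [lower, upper) except the last, which also includes its upper edge (the same convention `numpy.histogram` uses):

```python
def histogram_counts(values, bin_edges):
    """Count values per bin [edge_i, edge_i+1); the last bin also includes its right edge."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break                      # values outside all bins are ignored
    return counts


heights_cm = [152.0, 158.5, 160.2, 163.7, 165.0, 167.3, 171.8, 174.4, 180.0]  # hypothetical
edges = [150, 160, 170, 180]
for lo, hi, c in zip(edges, edges[1:], histogram_counts(heights_cm, edges)):
    print(f"{lo}-{hi} cm: {'#' * c}")
```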