What is the formula for the mean? | The mean, or average, is calculated by adding up all values in a dataset and then dividing by the number of values. It represents the central value or typical amount in the data. |
How would you describe the median? | The median is the middle value in a sorted dataset. If there’s an odd number of values, it’s the exact middle number. If there’s an even number, it’s the average of the two middle numbers. The median is useful for showing the center without being affected by extremely high or low values. |
Explain what the mode of a dataset is. | The mode is the most frequently occurring value in a dataset. If two or more values appear with the same highest frequency, there can be multiple modes. |
Describe how to find the range of a dataset. | The range is found by subtracting the smallest value from the largest value in a dataset. It tells us how spread out the data is from the lowest to the highest point. |
What is variance and how do you interpret it? | Variance measures how much the values in a dataset differ from the average value. It’s calculated by finding the average of the squared differences between each value and the mean. A larger variance means the data points are more spread out. |
How is standard deviation related to variance? | Standard deviation is the square root of the variance. It gives a measure of spread in the same units as the data, so it’s easier to interpret. A higher standard deviation means the data is more spread out from the average. |
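The summary measures described in the cards above can be checked on a small dataset. The following is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration and are not part of the original cards.

```python
import statistics

# Illustrative values only (an assumption, not taken from the cards).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)                 # sum of values / number of values
median = statistics.median(data)             # average of the two middle values (even count)
modes = statistics.multimode(data)           # a list, because several modes are possible
data_range = max(data) - min(data)           # largest value minus smallest value
pop_variance = statistics.pvariance(data)    # average squared deviation (divide by n)
sample_variance = statistics.variance(data)  # divide by n - 1 instead
pop_sd = statistics.pstdev(data)             # square root of the population variance

print(mean, median, modes, data_range, pop_variance, pop_sd)
```

For this dataset the population variance is 4 and the standard deviation is 2, which shows how the standard deviation returns to the original units.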
Explain the standard error in simple terms. | The standard error of the mean tells us how much we expect the average value from a sample to differ from the true average value in the population. A smaller standard error means the sample average is likely to be closer to the true population average. |
When does the Central Limit Theorem apply? | The Central Limit Theorem says that if you take a large enough sample, the distribution of the sample average (across repeated samples) will be approximately normal (bell-shaped), no matter the shape of the original data. As a rule of thumb, this works well when the sample size is greater than 30. |
How would you explain a 95 percent Confidence Interval? | A 95 percent confidence interval gives a range that we are 95 percent confident contains the true population average. To calculate it, start with the sample average, then add and subtract a margin based on how much variation there is and the size of the sample. |
When do we use the t-distribution for confidence intervals? | We use the t-distribution instead of the normal distribution when the sample size is small (typically less than 30) and we don’t know the population standard deviation. The t-distribution adjusts for the extra uncertainty that comes with small samples. |
Describe what a p-value tells us | A p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis (the hypothesis we’re testing against) is true. If this probability is very low (often less than 5 percent), the data provide strong evidence against that hypothesis. |
What’s the difference between a one-tailed and a two-tailed test? | In a one-tailed test, we are only interested in deviations in one direction, either higher or lower than expected. In a two-tailed test, we look for any deviation, whether higher or lower, and need to consider both directions. |
Explain the purpose of a hypothesis test. | A hypothesis test is a way to check if the data supports a specific claim about a population. For example, you might test if the average height in a population is different from a certain number or if a treatment has an effect. |
What are the common Z-scores used for 90, 95, and 99 percent confidence levels? | For a 90 percent confidence level, the Z-score is about 1.645. For a 95 percent level, it’s about 1.96, and for 99 percent, it’s around 2.576. |
When should we use the Student’s t-distribution instead of the normal distribution? | The Student’s t-distribution is used when the sample size is small (less than 30) and the population standard deviation is unknown. It adjusts for the extra uncertainty with small samples. |
What is an outlier and how is it detected? | An outlier is a value that is significantly different from the rest of the data. To find outliers, calculate the interquartile range (IQR), which is the distance between the first and third quartiles. Values that fall more than 1.5 times the IQR below the first quartile or above the third quartile are considered outliers. |
What are deciles and quartiles in a dataset? | Deciles divide a dataset into ten equal parts, while quartiles divide it into four equal parts. The first quartile marks the 25th percentile, the median is the 50th percentile, and the third quartile is the 75th percentile. |
How would you describe the null hypothesis and alternative hypothesis? | The null hypothesis is the default assumption that there’s no effect or no difference. The alternative hypothesis is what we want to prove, showing there is an effect or difference. |
How do you calculate and interpret the interquartile range (IQR) in a dataset? | Find the first quartile (25th percentile) and third quartile (75th percentile) of the data.
Subtract the first quartile from the third quartile. The IQR shows the middle 50% of the data and is useful for detecting outliers. |
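A short sketch of the IQR calculation and the 1.5 * IQR outlier rule follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default "exclusive" method; textbooks and spreadsheet software may use a slightly different quartile convention and give slightly different numbers.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                    # spread of the middle 50% of the data
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers

outliers = [x for x in data if x < lower_fence or x > upper_fence]
non_outliers = [x for x in data if lower_fence <= x <= upper_fence]

print(q1, q3, iqr, outliers, min(non_outliers), max(non_outliers))
```

Here the only flagged value is 50, so the lowest and highest non-outlying values are 1 and 10.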
How do you calculate and interpret the Z-score of a value? | Subtract the mean of the dataset from the value.
Divide by the standard deviation. The Z-score shows how many standard deviations a value is from the mean, with positive scores indicating above the mean and negative scores below. |
How do you approach an exercise on population proportion testing? | State the null hypothesis that the proportion is equal to a specific value.
Use the sample proportion and calculate the standard error.
Find the Z-score for the difference between the sample and hypothesized proportion, then interpret it. This approach tests if the observed proportion is significantly different from a hypothesized value. |
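The proportion test can be sketched as below. The helper name `proportion_z_test` and the numbers are illustrative assumptions. The sketch also assumes the common convention of computing the standard error from the hypothesized proportion, which the card does not spell out.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-sided Z-test of H0: population proportion equals p0."""
    p_hat = successes / n                  # sample proportion
    se = sqrt(p0 * (1 - p0) / n)           # standard error under H0 (assumed convention)
    z = (p_hat - p0) / se                  # standardized difference
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value


# Hypothetical exercise: 60 successes out of 100, H0: proportion = 0.5.
z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

With these numbers the Z-score is about 2, which is just beyond the 1.96 cut-off for a 5% two-tailed test.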
What is the process for calculating probability between two values in a normal distribution? | Find the Z-scores for both values.
Use the Z-table to find the cumulative probability for each Z-score.
Subtract the smaller cumulative probability from the larger one. This gives the probability that a value falls between those two points in a normal distribution. |
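The three steps above can be sketched with `statistics.NormalDist`, whose `cdf` method plays the role of the Z-table. The function name `prob_between` and the example values (mean 5, standard deviation 2, interval 5 to 7) are chosen for illustration.

```python
from statistics import NormalDist


def prob_between(low, high, mean, sd):
    """P(low < X < high) for a normal variable with the given mean and sd."""
    z_low = (low - mean) / sd      # step 1: Z-score of the lower value
    z_high = (high - mean) / sd    # step 1: Z-score of the upper value
    table = NormalDist()           # standard normal: mean 0, sd 1
    # steps 2 and 3: cumulative probabilities, then their difference
    return table.cdf(z_high) - table.cdf(z_low)


p = prob_between(5, 7, mean=5, sd=2)
print(round(p, 4))
```

The Z-scores here are 0 and 1, so the result is the area between the mean and one standard deviation above it, roughly 0.34.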
How do you calculate and interpret the margin of error? | Multiply the critical Z-score or t-score by the standard error. The margin of error shows the range above and below the sample statistic that likely contains the population parameter with a given confidence level. |
How is variance calculated? | Subtract the mean from each value, square the result, and average these squared deviations. For a sample, divide by one less than the number of values. |
How do you determine the coefficient of variation? | Divide the standard deviation by the mean and multiply by 100 to express as a percentage. This helps compare variability across datasets with different units or scales. |
How do you construct a confidence interval when variance is known? | Take the sample mean and add/subtract the product of the Z-score and standard error. This gives a range that likely contains the population mean. |
What’s the process for creating a confidence interval with unknown variance? | Use the sample mean plus/minus the t-score times the standard error, based on sample size minus one degrees of freedom. |
How do you calculate a confidence interval for a proportion? | Multiply the critical Z-score by the square root of the sample proportion times one minus the sample proportion, divided by the sample size. Add/subtract this margin from the sample proportion. |
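A minimal sketch of the proportion interval, assuming a hypothetical sample proportion of 0.6 from 100 observations. The function name `proportion_ci` is illustrative, and the critical Z-score comes from `NormalDist().inv_cdf` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Confidence interval for a proportion using the normal approximation."""
    # Critical Z-score: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.6, 100)
print(round(low, 3), round(high, 3))
```

For this example the interval runs from roughly 0.50 to 0.70.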
What’s the general process for hypothesis testing? | Define null and alternative hypotheses.
Choose a significance level (alpha).
Calculate the test statistic (e.g., Z or t).
Compare to critical value or calculate p-value.
If the test statistic exceeds the critical value (or p < alpha), reject the null hypothesis. |
How do you conduct a two-sample t-test? | State hypotheses about the difference between means.
Calculate the t-statistic using the difference in sample means, pooled standard deviation, and sample sizes.
Use degrees of freedom to find the critical t-value.
If the t-statistic exceeds the critical value, reject the null hypothesis. |
What is the purpose of a one-tailed test, and how is it conducted? | A one-tailed test checks if a parameter is greater or less than a hypothesized value. Calculate the test statistic, then compare to a one-tailed critical value. Only reject the null if the test statistic falls in the direction of the hypothesis. |
When do we take 1 - (table value) for a z-score and when do we not? | The standard normal table lists left-tail probabilities, P(Z < z), for positive z-scores only.
Take 1 - (table value) when:
* Left tail with negative z-score: If the question asks for the left tail (less than a certain value) and the z-score is negative, convert it to a positive value, since there are no negative numbers in the standard table. The table value for the positive z-score is the area to its left, so the left tail below the negative z-score is 1 - (table value).
* Right tail with positive z-score: When finding the right tail, e.g. P(Z > 0.58), the table only provides left-tail values, so take 1 - (table value).
Do NOT take 1 - (table value) when:
* Right tail with negative z-score: If you're looking for the right tail and the z-score is negative, simply convert it to a positive value and use the table value directly. The bell curve is symmetric, so there is no need for 1 - (table value).
* Left tail with positive z-score: If you need the left tail and the z-score is positive, just use the table value directly. |
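The four tail cases can be checked numerically. This sketch assumes a table that gives left-tail probabilities for positive z-scores only, mimicked here by the helper `table_value` (a hypothetical name), and uses 0.58 as the example z-score.

```python
from statistics import NormalDist


def table_value(a):
    """Left-tail probability P(Z < a), as a positive-only Z-table would list it."""
    if a < 0:
        raise ValueError("the table only lists positive z-scores")
    return NormalDist().cdf(a)


a = 0.58
left_tail_positive = table_value(a)       # P(Z < 0.58): read directly
right_tail_positive = 1 - table_value(a)  # P(Z > 0.58): take 1 - table value
left_tail_negative = 1 - table_value(a)   # P(Z < -0.58): take 1 - table value
right_tail_negative = table_value(a)      # P(Z > -0.58): read directly (symmetry)

print(round(left_tail_positive, 4), round(right_tail_positive, 4))
```

The subtraction is needed exactly when the requested area is the smaller piece, which is below 0.5.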
How do we find the cut-off value | 1. Convert the tail percentage into a cumulative probability. For example, if the cut-off leaves 2.5% in the upper tail, the cumulative probability is 1 - 0.025 = 0.975, which is the value to look for in the standard normal table.
2. Look up the z-score for that probability. For 0.975 it is 1.96.
3. To find the cut-off value, unstandardize 1.96: x = 1.96 * st deviation + mu. This is the reverse of z = (x - mu) / st deviation. |
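A small sketch of the cut-off calculation follows. The function name `cutoff` and the mean of 100 and standard deviation of 15 are hypothetical, and `inv_cdf` stands in for the reverse table lookup.

```python
from statistics import NormalDist


def cutoff(upper_tail, mean, sd):
    """Value x that leaves `upper_tail` probability above it."""
    z = NormalDist().inv_cdf(1 - upper_tail)  # e.g. 0.975 gives about 1.96
    return mean + z * sd                      # unstandardize: x = mu + z * sigma


x = cutoff(0.025, mean=100, sd=15)
print(round(x, 1))
```

Standardizing the result again, (x - 100) / 15, gives back the z-score of about 1.96.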
How do you calculate the probability that X lies between two values? | 1. Standardize each value: subtract the mean (expected value) and divide by the standard deviation. Ex: X lies between 5 and 7, the mean is 5 and the st dev is 2, so the Z-scores are (5-5)/2 = 0 and (7-5)/2 = 1. 2. Look up the cumulative probability for each Z-score in the standard normal table (here 0.5000 and 0.8413). 3. Subtract the smaller probability from the larger one: 0.8413 - 0.5000 = 0.3413. |
How do you calculate the deciles of the standard normal distribution (that is, the 10th, 20th, 30th, ..., 90th percentiles)? | To calculate the deciles of the standard normal distribution, start with the standard normal table. Find the Z-scores corresponding to the cumulative probabilities 0.50, 0.60, 0.70, 0.80, and 0.90. For example, the Z-score for 0.50 is 0 and the Z-score for 0.90 is approximately 1.28.
Since the standard normal distribution is symmetric, the Z-scores for the lower percentiles (0.10, 0.20, etc.) are the negatives of the Z-scores for the matching upper percentiles (0.90, 0.80, etc.). Specifically:
* The Z-score for the 10th percentile (0.10) is -1.28 (the negative of the Z-score for 0.90).
* The Z-score for the 20th percentile (0.20) is -0.84 (the negative of the Z-score for 0.80).
* This symmetry continues for the other percentiles as well.
Thus, you only need to find the Z-scores for the upper half and apply the negative sign for the corresponding lower percentiles. |
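The decile lookup can be reproduced without a printed table. This sketch uses `NormalDist().inv_cdf`, which returns the Z-score for a given cumulative probability; the dictionary name `deciles` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Map each percentile (10, 20, ..., 90) to its Z-score.
deciles = {k * 10: standard_normal.inv_cdf(k / 10) for k in range(1, 10)}

for percentile, z in deciles.items():
    print(percentile, round(z, 2))
```

The printed values show the symmetry described above: the 10th and 90th percentiles differ only in sign, as do the 20th and 80th.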
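How do you calculate the probability of a sample mean? | For a sample of more than 30 observations, the sample mean is approximately normal. Subtract the population mean from the sample mean (X-bar) and divide by the standard error, st dev / sqrt of n (where st dev is the sqrt of the variance). This gives the Z-score. The Z-score is not the probability itself: look it up in the standard normal table to get the probability. |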
How do you calculate st deviation, st error and variance? | St error: st deviation / sqrt of n. For a population proportion (pi), it is sqrt of pi*(1-pi)/n.
St deviation: sqrt of variance, that is, sqrt of [sum of (xi-xbar)^2 / (n-1)] for a sample.
Variance: sum of (xi-xbar)^2 divided by n-1 for a sample (divide by n for a population). The variance of the sample mean is σ^2 divided by n. |
What is the difference between σ^2, Var(x) and s^2? | σ^2 is the variance of a population. Var(x) is the variance of a random variable. s^2 is the (adjusted) variance of a sample. |
What are the quartiles, and how do we find the lowest and the highest non-outlying value? | The quartiles are the 25th, 50th and 75th percentiles. The IQR is the 75th percentile minus the 25th percentile. The lower limit is the 25th percentile - 1.5 * IQR, and the lowest non-outlying value is the smallest observation at or above that limit. The upper limit is the 75th percentile + 1.5 * IQR, and the highest non-outlying value is the largest observation at or below that limit. |
How do you calculate a confidence interval for a population proportion (pi)? | Take the sample proportion (pi-hat), then add and subtract the Z-score (1.645, 1.96 or 2.575 for 90%, 95% or 99% confidence) multiplied by the sqrt of pi-hat*(1-pi-hat)/n, where n is the sample size. |
How do you calculate a confidence interval with either unknown σ^2 or known σ^2? | When σ^2 (the population variance) is known, the confidence interval is
xbar -/+ z-score (1.96, 1.645 etc.) * σ / sqrt of n.
When σ^2 is unknown, the formula depends on the sample size.
If the sample is over 120, we can use the normal distribution, and the formula is xbar -/+ z-score * s / sqrt of n.
If the sample is under 120, we use xbar -/+ t* * s / sqrt of n, with n-1 degrees of freedom. |
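Both interval formulas share the same shape, so one helper can cover them. This is a sketch with hypothetical numbers; the function name `mean_ci` is illustrative, and the t* value of 2.064 (95% confidence, 24 degrees of freedom) is typed in from a t-table because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(xbar, sd, n, critical):
    """xbar -/+ critical value * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return xbar - margin, xbar + margin


# Known population st deviation (sigma = 10, n = 100): use the z-score.
z_star = NormalDist().inv_cdf(0.975)           # about 1.96 for 95% confidence
known = mean_ci(50, 10, 100, z_star)

# Unknown variance, small sample (s = 10, n = 25): use t* with n - 1 = 24 df.
t_star = 2.064                                 # assumed table value, not computed
unknown = mean_ci(50, 10, 25, t_star)

print([round(v, 2) for v in known], [round(v, 2) for v in unknown])
```

The second interval is wider both because the sample is smaller and because t* is larger than the z-score.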
How do you find the t*? | To find t* with Excel’s T.INV formula, start by converting the confidence level to a cumulative probability. This requires identifying the percentage in each tail of the distribution. For example, for a 95% confidence level, the remaining 5% is split between the two tails (2.5% in each tail). To find the cumulative probability up to the upper bound, add 95% and 2.5%, resulting in 97.5% (or 0.975). This is the probability you enter into the formula, along with the degrees of freedom (n-1). For instance, with 24 degrees of freedom, you would use =T.INV(0.975, 24), giving t* of about 2.064. |
When do you use the Excel formulas T.INV, T.DIST.2T, STDEV.S, STDEV.P, VAR.P and VAR.S, and how do you use them? | T.INV: Finds t* for a CI. Ex: for a 95% confidence interval, add the 2.5% lower tail to get 0.975. Enter that first and then n-1. Example: =T.INV(0.975, 19).
T.DIST.2T: Finds the two-tailed p-value for hypothesis testing. For example, an exercise tests at the 5% significance level. We calculate the test statistic and get a value, e.g. 2.04. We enter this value in the formula together with the degrees of freedom, n-1. Example: =T.DIST.2T(2.04, 19). Excel returns the p-value as a decimal, which we read as a percentage (a result of 0.0077, for instance, means 0.77%). If the p-value is smaller than 5%, we reject the null hypothesis.
STDEV.S: Finds the adjusted sample standard deviation s, used when the population variance is unknown. Select the range of values to get s.
STDEV.P: Finds the population standard deviation. Select the range of values to get the st dev.
VAR.P: Finds the population variance. Select the range of values to get the variance.
VAR.S: Finds the adjusted sample variance. Select the range of values to get s^2. |
What is alpha and what is p-value? | The alpha level is a cutoff chosen by the researcher (like 5%) that shows the risk of mistakenly rejecting a true null hypothesis. The p-value is calculated from the data and tells us the probability of seeing an effect as strong as the one in our results, assuming the null hypothesis is true. If the p-value is less than the alpha level, we usually reject the null hypothesis. |
How do you calculate the test statistic for a hypothesis test? | Take the sample mean (X-bar), subtract the hypothesized value, and then divide by st deviation / sqrt of n. |
What are the three ways to check a hypothesis test? | 1. Test statistic -> (sample mean - hypothesized value) divided by s / sqrt of n, compared with the critical value.
2. P-value -> use the formula T.DIST.2T and plug in the test statistic and n-1. If the p-value is lower than the significance level, we reject the null hypothesis.
3. Confidence interval: xbar +/- z-value (or t-value) * s / sqrt of n. The formula depends on the sample size. If the hypothesized value lies outside the interval, we reject the null hypothesis. |
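The three checks lead to the same decision, which can be seen on one example. The numbers below are hypothetical, and the sketch assumes a large sample so that the normal distribution can stand in for the t-distribution (in Excel the p-value would come from T.DIST.2T instead).

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical exercise: H0: mean = 50, tested at the 5% level.
xbar, mu0, s, n, alpha = 52, 50, 10, 100, 0.05

se = s / sqrt(n)                               # standard error of the mean
z = (xbar - mu0) / se                          # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value

# 1. Compare the test statistic with the critical value.
reject_by_statistic = abs(z) > z_crit

# 2. Compare the two-tailed p-value with alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_by_p_value = p_value < alpha

# 3. Check whether the hypothesized value falls outside the confidence interval.
ci = (xbar - z_crit * se, xbar + z_crit * se)
reject_by_interval = not (ci[0] <= mu0 <= ci[1])

print(reject_by_statistic, reject_by_p_value, reject_by_interval)
```

All three flags agree because each one compares the same distance between the sample mean and the hypothesized value against the same cut-off, just expressed on a different scale.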
How do you do a t-test for two independent populations with large samples? | Take the mean of each group and subtract one from the other. Then divide the difference by the square root of (s1^2 / n1 + s2^2 / n2), where s1 and n1 are the standard deviation and size of the first sample and s2 and n2 are those of the second. |
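The large-sample test statistic can be sketched as a single function. The function name `two_sample_statistic` and the group summaries are made-up illustrations, not results from any exercise in these cards.

```python
from math import sqrt


def two_sample_statistic(mean1, s1, n1, mean2, s2, n2):
    """Difference in sample means divided by its standard error."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical groups: means 10 and 9, both with s = 2 and n = 50.
stat = two_sample_statistic(10, 2, 50, 9, 2, 50)
print(round(stat, 2))
```

With these values the standard error is 0.4, so a difference of 1 gives a statistic of 2.5, beyond the 1.96 cut-off for a 5% two-tailed test.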
With this picture in mind, where a hypothesis test has a significance level of 10%: where do we reject the null hypothesis? | At the 10% level the critical value is 1.645. The standard normal distribution in the picture is symmetrical, so the cut-off is -1.645 on the left side and 1.645 on the right side. The middle part, the blue area, is the non-rejection region. If the test statistic is between those values, the null hypothesis is not rejected; if it is lower than -1.645 or higher than 1.645, it is rejected. |
What are the critical values at the 1%, 5% and 10% significance levels? | For a two-tailed test: 1% = 2.575, 5% = 1.96, 10% = 1.645. |
When do we reject or not reject the null hypothesis based on the p-value? | A low p-value (below α) suggests that the observed data would be very unlikely if H0 were true, so we treat H0 as potentially false and reject it.
A high p-value indicates that the data are consistent with H0, and we do not have enough evidence to claim a significant effect. |
Comment on the output in the picture. The answer should include specific reference to the role of the sample used and to the point and the interval estimate (i.e. confidence interval) of the population mean... | The observed sample consists of 442 tennis players. The surveyed players report having experienced on average about 1 episode of tennis elbow during their career (the sample average is 1.3; let’s treat the variable as continuous). The sample information can be used to make inferences about the underlying population. According to our sample, the average number of episodes of tennis elbow in the general population of tennis players is between 1.2 and 1.5 (with a 90% confidence level). |
Please refer to the results in the table and briefly comment on the difference in age between the two groups (patients and control). | Each of the two samples (patients and control individuals) consists of 133 individuals. The average age in both samples is 67.6 years. A t-test is used to test whether there is any significant difference in the average age between the two underlying populations (from which the two samples are drawn). The Null Hypothesis that the average age is the same in the two populations (i.e. their difference is equal to zero) is tested against the Alternative Hypothesis that it is not (i.e. the difference is not equal to zero). According to the p-value reported in the table (=1), the Null Hypothesis is not rejected at any significance level. Thus, based on the observed sample information, no significant difference exists between the average age of the population of patients waiting for total hip or knee replacement and that of the general population. |