Statistical Analysis
From data to insights — the core of R
📖 Concept Recap
R was built for statistics. Its core functions cover everything from descriptive stats to hypothesis testing:
- Descriptive: mean(), median(), sd(), var(), summary()
- Correlation: cor(x, y) returns Pearson's r, between −1 and +1
- t-test: t.test(x, y) compares two group means; check the p-value
- Linear regression: lm(y ~ x, data = df), then summary(model) for R², coefficients, and p-values
- Simulation: rnorm(n, mean, sd), runif(n, min, max), and set.seed() for reproducibility
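The descriptive and simulation functions listed above can be tried together in a few lines. This is a minimal sketch: the variable names (heights, waits) and the chosen means, SDs, and ranges are illustrative assumptions, not values from the lab.

```r
# Resetting the seed before each draw reproduces exactly the same random numbers
set.seed(42)
heights <- rnorm(200, mean = 170, sd = 10)        # hypothetical heights in cm
set.seed(42)
heights_again <- rnorm(200, mean = 170, sd = 10)
same_draws <- identical(heights, heights_again)   # TRUE when the seed matches

# Uniform draws stay inside [min, max]
waits <- runif(200, min = 0, max = 15)            # hypothetical waiting times in minutes

# Descriptive statistics for the simulated heights
desc <- c(mean = mean(heights), median = median(heights),
          sd = sd(heights), var = var(heights))
print(round(desc, 2))

# summary() reports min, quartiles, median, mean, and max in one call
print(summary(waits))
cat("Same draws after resetting the seed:", same_draws, "\n")
```

Note that var() is simply sd() squared, so the two always agree on the spread of the data.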
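A p-value < 0.05 is conventionally called “statistically significant”: if the null hypothesis were true, a result at least this extreme would be expected less than 5% of the time. It is not the probability that the null hypothesis is true. R² is the proportion of the variance in y explained by the model (by x alone, in a simple regression).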
👀 Worked Example
Simulating data, computing correlation, and fitting a linear regression:
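One way this worked example could look is sketched below. The true intercept (50), slope (4), and noise level are assumptions chosen for illustration; only the functions come from the recap.

```r
set.seed(123)                      # make the simulation reproducible
n <- 100
x <- runif(n, min = 0, max = 10)   # predictor
# Assumed data-generating process: a linear signal plus normal noise
y <- 50 + 4 * x + rnorm(n, mean = 0, sd = 8)
df <- data.frame(x, y)

# Pearson correlation between x and y
r <- cor(df$x, df$y)

# Simple linear regression of y on x
model <- lm(y ~ x, data = df)
model_summary <- summary(model)

# Pull the key numbers out of the summary object
r_squared <- model_summary$r.squared
slope     <- coef(model)[["x"]]
slope_p   <- coef(model_summary)["x", "Pr(>|t|)"]

cat(sprintf("r = %.3f, R^2 = %.3f\n", r, r_squared))
cat(sprintf("slope = %.3f, p-value = %.3g\n", slope, slope_p))
```

In a regression with a single predictor, R² equals the squared Pearson correlation, so the two printed values describe the same relationship.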
Exercise 1 — Two-Group Comparison & t-Test
Run this complete group comparison. Study the t-test output and Cohen’s d calculation, then try changing the group means or SDs to see how the p-value responds.
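A possible version of the comparison is sketched here. The group sizes, means, and SDs are assumed starting values meant to be changed, and Cohen's d is computed with the pooled standard deviation, which is one common definition.

```r
set.seed(1)
# Two simulated groups; edit the means or SDs and rerun to see the p-value move
group_a <- rnorm(40, mean = 100, sd = 15)
group_b <- rnorm(40, mean = 110, sd = 15)

# t.test() runs Welch's t-test by default (it does not assume equal variances)
result <- t.test(group_a, group_b)
print(result)

# Cohen's d: difference in means divided by the pooled standard deviation
n_a <- length(group_a)
n_b <- length(group_b)
pooled_sd <- sqrt(((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
                    (n_a + n_b - 2))
cohens_d <- (mean(group_b) - mean(group_a)) / pooled_sd

cat(sprintf("p-value = %.4f, Cohen's d = %.2f\n", result$p.value, cohens_d))
```

The p-value depends on sample size as well as the size of the difference, while Cohen's d expresses the difference in SD units regardless of n. Reporting both gives a fuller picture.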
Exercise 2 — Multiple Regression
Create a 100-student dataset with study_hours, sleep_hours, and exam_score. Compute a correlation matrix, fit a multiple regression, and interpret the coefficients in plain English comments.
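A starting point for this exercise might look like the sketch below. The coefficients used to generate exam_score (3 points per study hour, 2 per sleep hour) are assumptions made up for the simulation.

```r
set.seed(2024)
n <- 100
study_hours <- runif(n, min = 0, max = 10)
sleep_hours <- rnorm(n, mean = 7, sd = 1)
# Assumed data-generating process for the simulated exam scores
exam_score <- 40 + 3 * study_hours + 2 * sleep_hours + rnorm(n, mean = 0, sd = 6)
df <- data.frame(study_hours, sleep_hours, exam_score)

# Pairwise correlations for all three numeric columns
cor_matrix <- cor(df)
print(round(cor_matrix, 2))

# Multiple regression: both predictors enter the model together
model <- lm(exam_score ~ study_hours + sleep_hours, data = df)
print(summary(model))

coefs <- coef(model)
# Interpretation in plain English:
# - study_hours: expected change in exam_score for one extra hour of study,
#   holding sleep_hours constant
# - sleep_hours: expected change in exam_score for one extra hour of sleep,
#   holding study_hours constant
cat(sprintf("Each extra study hour: %+.2f points; each extra sleep hour: %+.2f points\n",
            coefs[["study_hours"]], coefs[["sleep_hours"]]))
```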
cor(df) computes pairwise correlations for all numeric columns. In summary(model), look at the Estimate column for coefficients and Pr(>|t|) for p-values. Stars (*) indicate significance levels.
Exercise 3 — Coin Flip Simulation & False Positives
Simulate coin flipping to understand false positive rates: (1) flip a fair coin 30 times using rbinom(30, 1, 0.5) and test for fairness with binom.test(), (2) repeat this 100 times, and (3) report the percentage of trials where p < 0.05 even though the coin is fair.
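The three steps can be sketched with replicate(), as below. The seed and the variable names are arbitrary choices for illustration.

```r
set.seed(7)
n_trials <- 100   # number of repeated experiments
n_flips  <- 30    # flips per experiment

# Each trial: flip a fair coin 30 times, then test whether it looks fair
p_values <- replicate(n_trials, {
  flips <- rbinom(n_flips, size = 1, prob = 0.5)        # 1 = heads, 0 = tails
  binom.test(sum(flips), n_flips, p = 0.5)$p.value
})

# Share of trials that look "significant" even though the coin is fair
false_positive_pct <- 100 * mean(p_values < 0.05)
cat(sprintf("False positives: %.0f%% of %d trials\n",
            false_positive_pct, n_trials))
```

Every rejection here is a false positive, because the coin really is fair. Across many runs the rate should sit near 5%, and often a little below it, since the exact binomial test is conservative when the number of flips is small.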
Department Salary Study
Analyze a simulated salary dataset: compute overall and by-department stats, run a t-test between two departments, fit a regression of salary on years of experience, and write a plain-English summary using cat().
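One way to structure this study is sketched below. The department names, base salaries, raise per year of experience, and noise level are all invented for the simulation.

```r
set.seed(99)
# Simulated dataset: 40 employees in each of three hypothetical departments
dept <- rep(c("Engineering", "Marketing", "Sales"), each = 40)
n <- length(dept)
years_exp <- round(runif(n, min = 0, max = 20), 1)
base <- c(Engineering = 70000, Marketing = 55000, Sales = 50000)  # assumed base pay
salary <- unname(base[dept]) + 2000 * years_exp + rnorm(n, mean = 0, sd = 8000)
df <- data.frame(dept, years_exp, salary)

# Overall and by-department statistics
overall_mean <- mean(df$salary)
dept_stats <- tapply(df$salary, df$dept,
                     function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

# Welch t-test between two departments
eng <- df$salary[df$dept == "Engineering"]
mkt <- df$salary[df$dept == "Marketing"]
tt <- t.test(eng, mkt)

# Regression of salary on years of experience
model <- lm(salary ~ years_exp, data = df)
slope <- coef(model)[["years_exp"]]

# Plain-English summary
cat(sprintf("Overall mean salary: %.0f\n", overall_mean))
cat(sprintf("Engineering mean: %.0f, Marketing mean: %.0f (t-test p = %.4g)\n",
            dept_stats[["Engineering"]][["mean"]],
            dept_stats[["Marketing"]][["mean"]], tt$p.value))
cat(sprintf("Each extra year of experience is associated with about %.0f more in salary\n",
            slope))
```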
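tapply(salary, dept, function(x) ...) applies a function to salary values grouped by department. When that function returns several values (for example a mean and an SD), the result is a list-like array; access elements with dept_stats[["Engineering"]]. When it returns a single number, the result is a named vector instead.
✅ Lab 7 Complete!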
You’ve run t-tests, fitted regression models, interpreted R² and p-values, and understood simulation-based false positive rates. These are the statistical foundations of evidence-based analysis.