Learn Without Walls
Lab 7 of 10

Statistical Analysis

From data to insights — the core of R

← Lab 6: ggplot2 Lab 7 of 10 Lab 8: Tidyr →

📖 Concept Recap

R was built for statistics. Its core functions cover everything from descriptive stats to hypothesis testing:

A p-value below 0.05 is conventionally called “statistically significant”: it means the observed result would be unlikely if the null hypothesis were true. R-squared (R²) tells you how much of the variance in y is explained by x.

👀 Worked Example

Simulating data, computing correlation, and fitting a linear regression:

set.seed(42)
n <- 50
study_hours <- runif(n, 1, 8)
exam_score <- pmin(pmax(50 + 5 * study_hours + rnorm(n, 0, 8), 0), 100)
df <- data.frame(study_hours, exam_score)

cat("Correlation:", round(cor(study_hours, exam_score), 3), "\n")

model <- lm(exam_score ~ study_hours, data = df)
cat("Intercept:", round(coef(model)[1], 2), "\n")
cat("Slope:", round(coef(model)[2], 2), "\n")
cat("R-squared:", round(summary(model)$r.squared, 3), "\n")
✏️ Guided

Exercise 1 — Two-Group Comparison & t-Test

Run this complete group comparison. Study the t-test output and Cohen’s d calculation, then try changing the group means or SDs to see how the p-value responds.

💡 Hint: Cohen’s d measures practical significance (effect size), not just statistical significance. A small p-value with a tiny Cohen’s d means the difference is statistically detectable but may not matter in practice.
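A minimal sketch of the kind of comparison this exercise asks for, using invented group means and SDs (the actual lab data may differ):

```r
# Two simulated groups: means and SDs here are assumptions for illustration
set.seed(123)
group_a <- rnorm(40, mean = 75, sd = 10)   # e.g. control group scores
group_b <- rnorm(40, mean = 80, sd = 10)   # e.g. treatment group scores

# Welch two-sample t-test (the default for t.test with two vectors)
result <- t.test(group_a, group_b)
cat("p-value:", round(result$p.value, 4), "\n")

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd <- sqrt((var(group_a) + var(group_b)) / 2)
cohens_d <- (mean(group_b) - mean(group_a)) / pooled_sd
cat("Cohen's d:", round(cohens_d, 2), "\n")
```

Try raising the gap between the two `mean` arguments or shrinking the `sd` values and rerun: the p-value should drop and Cohen's d should grow.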
💪 Independent

Exercise 2 — Multiple Regression

Create a 100-student dataset with study_hours, sleep_hours, and exam_score. Compute a correlation matrix, fit a multiple regression, and interpret the coefficients in plain English comments.

💡 Hint: cor(df) computes pairwise correlations for all numeric columns. In summary(model), look at the Estimate column for coefficients and Pr(>|t|) for p-values. Stars (*) indicate significance levels.
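One possible shape for this exercise, with invented coefficients for the simulated data (not the lab's reference solution):

```r
# Hypothetical 100-student dataset; the true effect sizes are assumptions
set.seed(1)
n <- 100
study_hours <- runif(n, 1, 8)
sleep_hours <- runif(n, 4, 9)
exam_score  <- pmin(pmax(45 + 4 * study_hours + 2 * sleep_hours +
                         rnorm(n, 0, 7), 0), 100)
df <- data.frame(study_hours, sleep_hours, exam_score)

# Pairwise correlations for all numeric columns
print(round(cor(df), 2))

# Multiple regression: both predictors at once
model <- lm(exam_score ~ study_hours + sleep_hours, data = df)
print(summary(model))

# Each coefficient is the expected change in exam_score for a one-unit
# increase in that predictor, holding the other predictor constant.
```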
🔥 Challenge

Exercise 3 — Coin Flip Simulation & False Positives

Simulate coin flipping to understand false positive rates: (1) flip a fair coin 30 times using rbinom(30, 1, 0.5) and test for fairness with binom.test(), (2) repeat this 100 times, and (3) report the percentage of trials where p < 0.05 even though the coin is fair.

💡 Hint: This demonstrates the Type I error rate. At a significance level of 0.05, about 5% of tests on truly null effects will appear “significant” by chance alone. This is why multiple testing correction matters!
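A sketch of the simulation described above (seed and structure are assumptions; note the rate can fall a bit below 5% because the binomial test is discrete):

```r
set.seed(7)
n_trials <- 100
p_values <- replicate(n_trials, {
  flips <- rbinom(30, 1, 0.5)                    # 30 flips of a fair coin
  binom.test(sum(flips), 30, p = 0.5)$p.value    # test H0: p = 0.5
})

# Fraction of fair-coin trials that nonetheless look "significant"
false_positive_rate <- mean(p_values < 0.05)
cat("False positives:", false_positive_rate * 100, "% of trials\n")
```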
🏆 Mini Project — Salary Analysis

Department Salary Study

Analyze a simulated salary dataset: compute overall and by-department stats, run a t-test between two departments, fit a regression of salary on years of experience, and write a plain-English summary using cat().

💡 Hint: tapply(salary, dept, function(x) ...) applies a function to salary values grouped by department. The result is a list — access elements with dept_stats[["Engineering"]].
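A minimal sketch of the project, assuming invented departments, sample size, and salary formula (your simulated dataset will differ):

```r
set.seed(99)
n <- 120
dept   <- sample(c("Engineering", "Sales", "HR"), n, replace = TRUE)
years  <- runif(n, 0, 20)
salary <- 50000 + 2000 * years +
          ifelse(dept == "Engineering", 10000, 0) + rnorm(n, 0, 8000)

# By-department means; with a single-value function tapply returns a
# named vector (a multi-value summary function would give a list)
dept_means <- tapply(salary, dept, mean)
print(round(dept_means))

# t-test between two departments
t_res <- t.test(salary[dept == "Engineering"], salary[dept == "Sales"])
cat("Engineering vs Sales p-value:", round(t_res$p.value, 4), "\n")

# Regression of salary on years of experience
model <- lm(salary ~ years)
cat("Each extra year of experience adds about $",
    round(coef(model)[2]), " on average.\n", sep = "")
```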

✅ Lab 7 Complete!

You’ve run t-tests, fitted regression models, interpreted R² and p-values, and understood simulation-based false positive rates. These are the statistical foundations of evidence-based analysis.

Continue to Lab 8: Tidyr & Data Reshaping →
