Full Statistical Analysis in R
Everything together — a complete analysis workflow
📌 Before You Start
What you need: All 7 previous modules completed (or strong familiarity with descriptive stats, distributions, CLT, confidence intervals, hypothesis testing, and regression).
What you’ll do: A complete statistical analysis from raw data to written conclusion, applying every technique from this course in a realistic research scenario.
📖 Everything You’ve Built — Applied Here
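This capstone integrates all 7 modules. You’ll apply descriptive statistics, probability distributions, the Central Limit Theorem, confidence intervals, hypothesis testing, correlation, and regression.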
🏛️ Your Scenario: University Research Analyst
You are a research analyst at a university. A faculty committee wants to understand what factors predict student exam performance. They’ve provided data on 120 students including GPA, study habits, sleep, anxiety, major, year, and first-generation status.
Your job: produce a complete statistical report. The committee needs actionable insights, not just numbers. Work through the 6 tasks in order, ending with your written conclusion in Task 6.
Important: Run Task 0 (Setup) first. The dataset persists across tasks within the same browser session.
Task 0. Setup: Create the Student Dataset
Run this first. It creates the students data frame (120 observations, 8 variables) that every later task depends on.
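The course's own setup code is not reproduced here, so the following is only a simulated stand-in with the same shape (120 rows, 8 columns). The seed, the distributions, the coefficients, and the column names `major`, `year`, and `first_gen` are all assumptions made for illustration; only `gpa`, `study_hours`, `sleep_hours`, `anxiety_score`, and `exam_score` are named in the tasks.

```r
set.seed(42)  # arbitrary seed (assumption), only for reproducibility
n <- 120

# Seven simulated predictors; every distribution below is an assumption
students <- data.frame(
  gpa           = round(pmin(pmax(rnorm(n, mean = 3.0, sd = 0.4), 0), 4), 2),
  study_hours   = round(pmax(rnorm(n, mean = 12, sd = 4), 0), 1),
  sleep_hours   = round(pmax(rnorm(n, mean = 7, sd = 1), 3), 1),
  anxiety_score = round(pmin(pmax(rnorm(n, mean = 50, sd = 12), 0), 100)),
  major         = sample(c("STEM", "non-STEM"), n, replace = TRUE),      # hypothetical name and coding
  year          = factor(sample(1:4, n, replace = TRUE)),                # hypothetical name
  first_gen     = sample(c(TRUE, FALSE), n, replace = TRUE,
                         prob = c(0.3, 0.7))                             # hypothetical name
)

# Eighth variable: exam score built from the predictors plus noise
# (invented coefficients), clamped to the 0-100 range
students$exam_score <- with(students, round(pmin(pmax(
  30 + 10 * gpa + 1.2 * study_hours + 1.5 * sleep_hours -
    0.15 * anxiety_score + rnorm(n, sd = 6), 0), 100), 1))

str(students)
```

Because the data are simulated, any numbers this produces will differ from those of the course dataset.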
Task 1. Descriptive Statistics: Get to Know the Data
Compute summary statistics for all numeric variables. Which variables have the most variability (highest CV = SD/mean)? Are any variables notably skewed?
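One possible way to tabulate the mean, SD, CV, and skewness of every numeric column is sketched below. Base R has no skewness function, so the `skew()` helper is a hypothetical moment-based definition written for this example. The small stand-in data frame is used only when `students` is not already defined.

```r
# Stand-in data (assumption), used only if `students` does not exist yet
if (!exists("students")) {
  set.seed(1)
  students <- data.frame(
    gpa         = rnorm(120, mean = 3, sd = 0.4),
    study_hours = rexp(120, rate = 1 / 10),   # deliberately right-skewed
    exam_score  = rnorm(120, mean = 75, sd = 10)
  )
}

num <- students[sapply(students, is.numeric)]   # keep numeric columns only
print(summary(num))

# Hypothetical helper: moment-based skewness (0 for a symmetric distribution)
skew <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

desc <- data.frame(
  mean = sapply(num, mean),
  sd   = sapply(num, sd),
  cv   = sapply(num, sd) / sapply(num, mean),   # CV = SD / mean
  skew = sapply(num, skew)
)
desc <- desc[order(-desc$cv), ]                 # most variable first
print(round(desc, 3))
```

The CV is only meaningful for variables measured on a ratio scale with a positive mean, so treat it with care for any score that can be zero or negative.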
Task 2. Normality Check: Is GPA Normally Distributed?
Produce a histogram of GPA and a Q-Q plot. If GPA is approximately normal, the Q-Q plot points should lie close to the diagonal line.
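A minimal sketch of the two plots using base graphics follows. The number of histogram bins and the colour of the reference line are arbitrary choices, and the stand-in data are used only when `students` is not already defined.

```r
# Stand-in data (assumption), used only if `students` does not exist yet
if (!exists("students")) {
  set.seed(1)
  students <- data.frame(gpa = rnorm(120, mean = 3, sd = 0.4))
}

op <- par(mfrow = c(1, 2))   # two plots side by side

# Histogram: look for a roughly symmetric, bell-shaped outline
hist(students$gpa, breaks = 15, main = "Histogram of GPA", xlab = "GPA")

# Q-Q plot: sample quantiles against theoretical normal quantiles
qq <- qqnorm(students$gpa, main = "Normal Q-Q plot of GPA")
qqline(students$gpa, col = "red")   # reference line through the quartiles

par(op)                      # restore the previous plot layout
```

Systematic curvature away from the reference line suggests skew, while points peeling off at both ends suggest heavy or light tails.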
Task 3. Hypothesis Test: Do STEM Students Score Higher?
The committee hypothesizes that STEM students score higher on exams. Test this formally against H₀: STEM mean = non-STEM mean. Report the t-statistic, the p-value, and a plain-English conclusion.
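The test could be run along these lines. The sketch assumes a column named `major` whose STEM rows are coded exactly as "STEM" (both are assumptions), and it uses a one-sided alternative because the committee's claim is directional. Dropping the `alternative` argument gives the two-sided default. `t.test()` performs Welch's unequal-variance test unless `var.equal = TRUE` is set.

```r
# Stand-in data (assumption), used only if `students` does not exist yet
if (!exists("students")) {
  set.seed(1)
  students <- data.frame(major = rep(c("STEM", "non-STEM"), each = 60))
  students$exam_score <- rnorm(
    120, mean = ifelse(students$major == "STEM", 78, 74), sd = 10)
}

# Split exam scores by group ("STEM" coding is an assumption)
stem     <- students$exam_score[students$major == "STEM"]
non_stem <- students$exam_score[students$major != "STEM"]

# H0: means are equal; H1: STEM mean is greater (one-sided Welch t-test)
tt <- t.test(stem, non_stem, alternative = "greater")

cat(sprintf("t = %.2f, df = %.1f, p = %.4f\n",
            tt$statistic, tt$parameter, tt$p.value))

# Plain-English conclusion at the conventional 5% level (a convention, not a rule)
if (tt$p.value < 0.05) {
  cat("The data support the claim that STEM students score higher on average.\n")
} else {
  cat("The data do not provide enough evidence that STEM students score higher.\n")
}
```

Splitting the scores into two vectors avoids depending on the order of factor levels, which would otherwise decide the direction of the one-sided test.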
Task 4. Correlation Matrix: Which Variables Relate to Exam Score?
Compute a correlation matrix for all numeric variables. Which predictors correlate most strongly with exam_score? Are any predictors strongly correlated with each other (a sign of multicollinearity)?
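The following sketch computes the matrix, ranks the predictors by the strength of their correlation with `exam_score`, and lists predictor pairs above a cutoff. The cutoff of |r| > 0.7 is an assumed rule of thumb, not a value taken from the course, and the stand-in data are used only when `students` is not already defined.

```r
# Stand-in data (assumption), used only if `students` does not exist yet
if (!exists("students")) {
  set.seed(1)
  students <- data.frame(
    gpa = rnorm(120, 3, 0.4), study_hours = rnorm(120, 12, 4),
    sleep_hours = rnorm(120, 7, 1), anxiety_score = rnorm(120, 50, 12))
  students$exam_score <- with(students,
    30 + 10 * gpa + 1.2 * study_hours - 0.15 * anxiety_score + rnorm(120, sd = 6))
}

num <- students[sapply(students, is.numeric)]
r   <- cor(num)                       # Pearson correlations
print(round(r, 2))

# Predictors ranked by absolute correlation with exam_score
with_exam <- r[rownames(r) != "exam_score", "exam_score"]
print(round(with_exam[order(-abs(with_exam))], 2))

# Predictor-predictor pairs above the assumed cutoff (possible multicollinearity)
pred <- r[rownames(r) != "exam_score", colnames(r) != "exam_score", drop = FALSE]
high <- which(abs(pred) > 0.7 & upper.tri(pred), arr.ind = TRUE)
if (nrow(high) == 0) {
  cat("No predictor pair exceeds |r| = 0.7\n")
} else {
  print(data.frame(var1 = rownames(pred)[high[, 1]],
                   var2 = colnames(pred)[high[, 2]],
                   r    = round(pred[high], 2)))
}
```

Pairwise correlations only catch multicollinearity between two predictors at a time, so a clean result here does not rule it out entirely.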
Task 5. Multiple Regression: Predict Exam Score
Fit a multiple regression model: exam_score ~ study_hours + sleep_hours + anxiety_score + gpa. Interpret each coefficient. Check model fit and residuals.
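A sketch of fitting the stated model and running two common residual checks is shown below. The choice of diagnostic plots is one option among several, and the stand-in data are used only when `students` is not already defined.

```r
# Stand-in data (assumption), used only if `students` does not exist yet
if (!exists("students")) {
  set.seed(1)
  students <- data.frame(
    gpa = rnorm(120, 3, 0.4), study_hours = rnorm(120, 12, 4),
    sleep_hours = rnorm(120, 7, 1), anxiety_score = rnorm(120, 50, 12))
  students$exam_score <- with(students,
    30 + 10 * gpa + 1.2 * study_hours - 0.15 * anxiety_score + rnorm(120, sd = 6))
}

# Fit the model given in the task
fit <- lm(exam_score ~ study_hours + sleep_hours + anxiety_score + gpa,
          data = students)

# Coefficients, standard errors, p-values, R-squared, and overall F-test
print(summary(fit))

# 95% confidence intervals for each coefficient
print(round(confint(fit), 2))

# Residual checks: no visible pattern on the left, roughly straight on the right
op <- par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(resid(fit), main = "Q-Q plot of residuals")
qqline(resid(fit))
par(op)
```

Each slope is read as the expected change in exam score for a one-unit increase in that predictor, holding the other three predictors fixed.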
Task 6. Write Your Report: Plain-English Conclusions
Write a 6-sentence plain-English report of your findings using cat(). Structure: (1) Data summary. (2) Normality finding. (3) STEM vs. non-STEM result. (4) Top correlates with exam score. (5) Regression model summary. (6) One actionable recommendation for the committee.
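One way to assemble the six sentences is to compute the figures first and pass them to `sprintf()`, so the report cannot drift from the analysis. This is only a template: sentences (2) and (6) are left as bracketed placeholders because they depend on your own reading of the plots and results. The `major` column and its "STEM" coding are assumptions, and the stand-in data are used only when `students` is not already defined.

```r
# Stand-in data (assumption), used only if `students` does not exist yet
if (!exists("students")) {
  set.seed(1)
  students <- data.frame(
    gpa = rnorm(120, 3, 0.4), study_hours = rnorm(120, 12, 4),
    sleep_hours = rnorm(120, 7, 1), anxiety_score = rnorm(120, 50, 12),
    major = rep(c("STEM", "non-STEM"), each = 60))
  students$exam_score <- with(students,
    30 + 10 * gpa + 1.2 * study_hours - 0.15 * anxiety_score + rnorm(120, sd = 6))
}

# Recompute the quantities the report quotes
stem     <- students$exam_score[students$major == "STEM"]
non_stem <- students$exam_score[students$major != "STEM"]
tt  <- t.test(stem, non_stem, alternative = "greater")   # one-sided, as an assumption
num <- students[sapply(students, is.numeric)]
r   <- cor(num)[, "exam_score"]
r   <- r[names(r) != "exam_score"]
top <- names(r)[which.max(abs(r))]                        # strongest correlate
fit <- lm(exam_score ~ study_hours + sleep_hours + anxiety_score + gpa,
          data = students)

report <- c(
  sprintf("(1) The data cover %d students; the mean exam score is %.1f (SD %.1f).",
          nrow(students), mean(students$exam_score), sd(students$exam_score)),
  "(2) [Describe what the GPA histogram and Q-Q plot showed.]",
  sprintf("(3) STEM students averaged %.1f versus %.1f for non-STEM students (t = %.2f, p = %.3f).",
          mean(stem), mean(non_stem), tt$statistic, tt$p.value),
  sprintf("(4) The strongest correlate of exam score is %s (r = %.2f).", top, r[top]),
  sprintf("(5) The regression model explains %.0f%% of the variance in exam score.",
          100 * summary(fit)$r.squared),
  "(6) [State one actionable recommendation for the committee.]"
)
cat(report, sep = "\n")
```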
💡 Sample Solution: All Tasks Together
This shows one complete approach to all 6 tasks. Your analysis may differ — that’s expected! Statistics involves judgment.
🧠 Final Brain Break
You just did a complete statistical analysis — the same workflow used in real research, policy analysis, and data science.
Reflect: Which part of the analysis felt most uncertain? Statistical analysis always involves judgment calls — which variables to include, how to interpret results, what to recommend. The numbers only tell part of the story.
✅ Key Takeaway
A complete statistical analysis follows a workflow: (1) explore and describe, (2) check assumptions, (3) test hypotheses, (4) examine relationships, (5) build a model, (6) communicate findings. The technical skills matter — but so does the plain-English interpretation that makes results actionable.
🎓 You’ve Completed Statistics with R!
You can now run complete statistical analyses in R — from raw data to written conclusions — using the tools that statisticians, researchers, and data scientists use every day.
You’ve mastered: descriptive statistics, probability distributions, the Central Limit Theorem, confidence intervals, hypothesis testing, correlation, simple and multiple regression, and full analysis workflows. All in R. All in the browser.