Analyzing Your Data
Analysis is where data starts to tell a story — the goal is to find patterns and answer your research question
📌 Before You Start
Prerequisites: Modules 1–5. No statistical software needed for this module.
Estimated time: ~45 minutes including the mini-analysis exercise.
What you need: Pen and paper or a calculator. The small dataset in the Your Turn section.
By the end of this module you will be able to describe quantitative analysis basics, conduct a mini thematic analysis, and identify common analytical mistakes.
💡 The Big Idea
Analysis is where your data starts to tell a story. The goal is to find patterns, make meaning, and answer your research question — while being honest about uncertainty and the limits of what your data can actually show.
🔍 Deep Dive
Quantitative Analysis: Descriptive Statistics
Before any complex analysis, describe your data. Descriptive statistics summarize the basic features of your dataset.
| Measure | What it tells you | When to use |
|---|---|---|
| Mean | The arithmetic average. Sensitive to outliers. | Symmetric, roughly normal distributions. Heights, test scores (without extreme outliers). |
| Median | The middle value when sorted. Resistant to outliers. | Skewed distributions. Better for income, housing prices, or anything with extreme values. |
| Mode | The most frequent value. | Categorical data. "What is the most common response?" |
| Standard Deviation (SD) | How spread out the data is around the mean. Larger SD = more spread. | Always report alongside the mean. A mean without an SD is incomplete. |
| Frequency / Percentage | How often each value or category appears. | Categorical data (yes/no, gender, major, response options). |
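As an optional illustration (the module itself needs no software), the measures in the table can be computed with Python's standard `statistics` module. This is a minimal sketch: the income figures and the list of majors are invented for the example and do not come from any dataset in this module.

```python
import statistics
from collections import Counter

# Hypothetical annual incomes in thousands of dollars (invented for illustration).
# The last value is a deliberate outlier.
incomes = [30, 32, 35, 35, 38, 40, 250]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # middle value when sorted
mode_income = statistics.mode(incomes)      # most frequent value
sd_income = statistics.stdev(incomes)       # sample standard deviation

print(f"mean   = {mean_income:.1f}")
print(f"median = {median_income}")
print(f"mode   = {mode_income}")
print(f"SD     = {sd_income:.1f}")

# Frequency and percentage for categorical data (hypothetical majors).
majors = ["Biology", "History", "Biology", "Nursing", "Biology"]
counts = Counter(majors)
percentages = {major: 100 * n / len(majors) for major, n in counts.items()}
print(counts)
print(percentages)
```

With the outlier included, the mean lands well above the median, which is why the median is the safer summary for skewed data such as income.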
Comparing Groups: Is the Difference Real?
If your research question compares groups (e.g., "Do students who receive tutoring perform better than those who don't?"), you need to determine whether the difference you observe is real or just due to random chance.
Basic inferential statistics (concepts only — no formulas here):
| Test | When to use it |
|---|---|
| t-test | Comparing means of two groups. Example: Do tutored students score higher than non-tutored students? |
| Chi-square (χ²) | Comparing frequencies or proportions of categories. Example: Are women more likely than men to report financial stress? |
| Correlation | Measuring the strength and direction of a relationship between two continuous variables. Example: Is there a relationship between study hours and GPA? |
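The "real or just chance?" question can be made concrete with a small permutation test. Note that this is a different technique from the t-test in the table, chosen here only because it needs no formulas: it reshuffles the group labels in every possible way and asks how often a difference at least as large as the observed one appears. The scores below are hypothetical, and the function names are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical exam scores (invented for illustration).
tutored = [78, 85, 90, 88, 84]
untutored = [72, 75, 80, 70, 79]

def mean_diff(a, b):
    """Difference in group means, kept exact with fractions."""
    return Fraction(sum(a), len(a)) - Fraction(sum(b), len(b))

def permutation_p_value(group_a, group_b):
    """Share of all relabelings whose mean difference is at least
    as large (in absolute value) as the observed one."""
    observed = abs(mean_diff(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mean_diff(a, b)) >= observed:
            extreme += 1
    return extreme / total

observed_diff = float(mean_diff(tutored, untutored))
p_value = permutation_p_value(tutored, untutored)
print(f"observed difference in means: {observed_diff:.1f}")
print(f"permutation p-value: {p_value:.3f}")
```

A small p-value means that random relabeling rarely produces a gap this large, so chance alone is an unlikely explanation. It does not say why the groups differ.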
Visualizing Quantitative Data
The right chart depends on what you are trying to show:
| Chart Type | Best for |
|---|---|
| Bar chart | Comparing categories (e.g., average GPA by major) |
| Histogram | Showing the distribution of a continuous variable (e.g., distribution of study hours) |
| Scatter plot | Showing the relationship between two continuous variables (e.g., sleep vs. GPA) |
| Pie / Donut chart | Showing proportions of a whole (use sparingly — bar charts are often clearer) |
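To see what a histogram does, here is a rough text-only sketch that needs no plotting library. It uses the sleep hours of the 10 students from the Your Turn dataset later in this module; the function name `text_histogram` is an assumption made for this example. A plotting library would draw the same counts as bars.

```python
from collections import Counter

# Sleep hours per night for the 10 students in the Your Turn dataset.
sleep_hours = [6, 7, 5, 8, 7, 6, 5, 8, 4, 7]

def text_histogram(values):
    """Return one line per whole-number value, with one '#' per observation."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>2} | {'#' * counts[value]}")
    return lines

for line in text_histogram(sleep_hours):
    print(line)
```

Each row is one bin and the length of the bar is the number of students in that bin, so the overall shape of the distribution is visible at a glance.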
Qualitative Analysis: Thematic Analysis
Thematic analysis is the most common qualitative method for analyzing interviews, open-ended survey responses, or documents. It involves identifying patterns (themes) in text.
Mixed Methods: The Best of Both
Mixed methods combines quantitative and qualitative approaches in the same study. Common patterns:
- Explain: Survey data shows what is happening (quantitative); interviews explain why (qualitative).
- Explore: Interviews identify themes (qualitative); a survey tests how widespread those themes are (quantitative).
Common Analytical Mistakes
📋 Real Example: A Mini Thematic Analysis
Survey question: "What is the biggest challenge you face as a college student?"
Here are responses from 8 students (condensed). Codes are shown in brackets.
1. "Balancing work and classes is exhausting. I work 30 hours a week." [work-life balance] [fatigue]
2. "I never feel like I belong here. Everyone seems to already know what they're doing." [belonging] [imposter syndrome]
3. "Money. Always money. I stress about rent every month." [financial stress]
4. "I'm the first in my family to go to college. No one can help me navigate this." [first-generation] [lack of support]
5. "Working nights means I miss office hours and study groups." [work-life balance] [isolation]
6. "I have ADHD and the lecture format doesn't work for me." [disability] [learning environment]
7. "I feel like I'm always behind financially. I can't afford the calculator we need for class." [financial stress]
8. "Sometimes I think everyone else gets this except me." [imposter syndrome] [belonging]
Emerging themes:
- Theme 1: Financial precarity (responses 3, 7, and echoed in 1)
- Theme 2: Work-study conflict (responses 1, 5)
- Theme 3: Belonging and imposter syndrome (responses 2, 4, 8)
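Interpretation: Financial stress, work-study conflict, and belonging/imposter syndrome each recur across several responses. Only one respondent identifies as first-generation (response 4), so any claim about first-generation students as a group would be premature. Response 6 (disability, learning environment) does not yet fit any of the three themes. Note: With only 8 responses, these themes are preliminary; a larger dataset is needed before claiming saturation.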
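Counting codes by hand works for 8 responses. As an optional sketch of the same tally in Python, the snippet below records the bracketed codes from the responses above and counts how often each appears. The variable names are assumptions made for this example, and a tally supports the analyst's judgment about themes rather than replacing it.

```python
from collections import Counter

# Codes assigned to each of the 8 responses above, keyed by response number.
coded_responses = {
    1: ["work-life balance", "fatigue"],
    2: ["belonging", "imposter syndrome"],
    3: ["financial stress"],
    4: ["first-generation", "lack of support"],
    5: ["work-life balance", "isolation"],
    6: ["disability", "learning environment"],
    7: ["financial stress"],
    8: ["imposter syndrome", "belonging"],
}

# How many responses carry each code.
code_counts = Counter(
    code for codes in coded_responses.values() for code in codes
)

# Which responses carry each code, useful when grouping codes into themes.
responses_by_code = {}
for number, codes in coded_responses.items():
    for code in codes:
        responses_by_code.setdefault(code, []).append(number)

for code, count in code_counts.most_common():
    print(f"{code}: {count} -> responses {responses_by_code[code]}")
```

Four codes appear twice and the rest appear once, a reminder of how thin the evidence for any single theme is at this sample size.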
🖐️ Your Turn
What you need: A calculator or paper. About 15 minutes.
Here is a small dataset of 10 students. For each student, you have: study hours per day, hours of sleep per night, and GPA (on a 4.0 scale).
| Student | Study Hours/Day | Sleep Hours/Night | GPA |
|---|---|---|---|
| 1 | 2 | 6 | 2.8 |
| 2 | 4 | 7 | 3.4 |
| 3 | 1 | 5 | 2.2 |
| 4 | 5 | 8 | 3.8 |
| 5 | 3 | 7 | 3.1 |
| 6 | 6 | 6 | 3.6 |
| 7 | 2 | 5 | 2.5 |
| 8 | 4 | 8 | 3.5 |
| 9 | 1 | 4 | 1.9 |
| 10 | 5 | 7 | 3.7 |
- Calculate the mean for study hours, sleep hours, and GPA across all 10 students.
- Find the highest and lowest GPA in the dataset. Which students have them?
- Looking at the data, what pattern do you observe between study hours and GPA? Between sleep and GPA?
- Important caution: With only 10 students and no controlled experiment, can you conclude that studying more causes a higher GPA? What confounding variables might explain the pattern?
Mean GPA answer to check your work: 3.05
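If you would like to check your hand calculations afterwards, the optional sketch below computes the three means and a Pearson correlation coefficient for the dataset above. The helper `pearson_r` is written out for this example rather than taken from a library, and it assumes two lists of equal length with some variation in each.

```python
import math
import statistics

# The 10-student dataset from the table above.
study = [2, 4, 1, 5, 3, 6, 2, 4, 1, 5]
sleep = [6, 7, 5, 8, 7, 6, 5, 8, 4, 7]
gpa = [2.8, 3.4, 2.2, 3.8, 3.1, 3.6, 2.5, 3.5, 1.9, 3.7]

def pearson_r(xs, ys):
    """Pearson correlation: from -1 (perfect negative) to +1 (perfect positive)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

print(f"mean study hours: {statistics.mean(study):.2f}")
print(f"mean sleep hours: {statistics.mean(sleep):.2f}")
print(f"mean GPA:         {statistics.mean(gpa):.2f}")

r_study_gpa = pearson_r(study, gpa)
r_sleep_gpa = pearson_r(sleep, gpa)
r_study_sleep = pearson_r(study, sleep)
print(f"r(study, GPA)   = {r_study_gpa:.2f}")
print(f"r(sleep, GPA)   = {r_sleep_gpa:.2f}")
print(f"r(study, sleep) = {r_study_sleep:.2f}")
```

All three coefficients come out positive. Because study hours and sleep hours are also correlated with each other, this data alone cannot separate their effects, and a strong correlation still says nothing about what causes what.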
🧠 Brain Break — 2 Minutes
Think about a statistic you have seen recently.
A news headline. A product claim. A political argument. "X% of people believe..." or "Y is linked to Z..."
Ask: Is that a mean or a median? What is the sample size? Could there be a confounding variable? Does the claim describe a correlation, or does it imply causation? Now you have the tools to ask these questions every time.
✅ Key Takeaways
- Descriptive statistics (mean, median, mode, SD) summarize your data before you do any deeper analysis. Always start here.
- Inferential statistics (t-tests, chi-square, correlation) help you determine whether patterns in your sample likely reflect something real in the broader population.
- Thematic analysis is the core tool for qualitative data: read → code → find themes → interpret.
- Always distinguish correlation from causation; confusing the two is one of the most common mistakes in research communication.
- Cherry-picking results, p-hacking, and ignoring outliers are all forms of bad (and sometimes unethical) research practice.
🎯 Module 6 Complete!
You can now make sense of data. In Module 7, you will learn the ethical rules that protect participants and the integrity of the research process.
Continue to Module 7: Research Ethics →