Learn Without Walls

Module 1 Practice Problems

Practice Makes Perfect!

These 15 problems cover all topics from Module 1. Work through them to reinforce your learning!

Part 1: What is Statistics?

Problem 1: Descriptive vs. Inferential (Easy)

For each scenario below, identify whether it uses descriptive statistics or inferential statistics. Explain your reasoning.

  (a) A teacher calculates that the average grade in her class was 82%.
  (b) Based on a survey of 500 voters, a news outlet predicts that 52% of all voters will support Candidate A.
  (c) A hospital reports that 45 babies were born there last month: 23 boys and 22 girls.
  (d) A pharmaceutical company tests a new drug on 2,000 patients and concludes it will reduce symptoms in the general population.

Solution:

(a) Descriptive Statistics

Explanation: The teacher is simply summarizing data she already has (her class's grades). She's not making predictions or generalizations beyond her specific class.

(b) Inferential Statistics

Explanation: The news outlet is using data from a sample (500 voters) to make a prediction about a larger population (all voters). They're inferring from the sample to the population.

(c) Descriptive Statistics

Explanation: The hospital is describing what happened—summarizing the births at their specific location. No predictions or generalizations are being made.

(d) Inferential Statistics

Explanation: The company tested a sample (2,000 patients) and is drawing conclusions about the general population. They're using sample data to infer what will happen for everyone.

Key Concept: Descriptive statistics summarize what you have. Inferential statistics predict or generalize beyond your data.
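
To make the contrast concrete, here is a minimal Python sketch. The class grades and the voter population are hypothetical values invented for illustration (chosen to echo scenarios (a) and (b)); they are not data from this module.

```python
import random
from statistics import mean

# Descriptive: summarize the data we actually have (hypothetical class grades).
class_grades = [78, 85, 90, 72, 85]
class_average = mean(class_grades)  # describes this class only

# Inferential: use a sample to estimate something about a larger population.
# Hypothetical population of 100,000 voters, 52% of whom support Candidate A.
population = [1] * 52_000 + [0] * 48_000
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 500)
sample_support = sum(sample) / len(sample)  # an estimate, not the exact truth

print(f"Class average (descriptive): {class_average}")
print(f"Estimated support among all voters (inferential): {sample_support:.1%}")
```

The class average is exact for that class. The sample figure is only an estimate of the population share, and a different random sample would give a slightly different number.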

Problem 2: Real-World Applications (Easy)

Give three examples from YOUR life in the past week where you encountered statistics. For each example:

  • Describe the statistic you saw
  • Explain what it was measuring
  • Describe how it might have influenced a decision (yours or someone else's)

Sample Answers:

Example 1: Weather Forecast

Statistic: "70% chance of rain tomorrow"

What it measures: The probability (likelihood) that it will rain based on historical weather patterns and current conditions

Decision impact: I decided to bring an umbrella to campus because 70% is a high probability

Example 2: Product Reviews

Statistic: "4.5 stars based on 1,247 reviews" for a laptop I was considering

What it measures: Average customer satisfaction based on ratings from 1,247 people who purchased the laptop

Decision impact: The high rating (4.5/5) and large number of reviews (1,247) gave me confidence to purchase that laptop over others with fewer reviews

Example 3: Social Media Engagement

Statistic: "Your post reached 523 people" on Instagram

What it measures: The number of unique users who saw my post in their feed

Decision impact: Seeing this engagement stat made me post more content at that time of day when my posts get more reach

Note: Your examples will be different! The point is to recognize how often you encounter statistics and how they influence decisions.

Problem 3: Correlation vs. Causation (Medium)

A researcher finds that people who own more books tend to have higher incomes. The correlation is strong and statistically significant.

Questions:

  (a) Can we conclude that buying more books causes higher income? Why or why not?
  (b) Can we conclude that having higher income causes people to buy more books? Why or why not?
  (c) What are some other possible explanations for this correlation?

Solution:

(a) No, we cannot conclude that buying books causes higher income.

Reasoning: This is an observational study (not an experiment), so we can't establish causation. Just because two things are correlated doesn't mean one causes the other. It's unlikely that simply purchasing books would increase your income!

(b) Possibly, but we still can't prove causation from this study alone.

Reasoning: While it's more plausible that higher income → buying more books (you have more disposable income), we still only have correlation, not causation. We'd need an experiment to prove it.

(c) Alternative explanations (confounding variables):

  • Education level: People with more education tend to have higher incomes AND tend to own more books. Education could be causing both!
  • Reading interest/culture: People who value learning may both pursue higher-paying careers and collect books
  • Age: Older people have had more time to accumulate both wealth and book collections
  • Location: People in cities may have higher incomes and more access to bookstores

Key Lesson: Correlation ≠ Causation. A "third variable" (a confounder) can often explain the relationship, so always look for one before drawing conclusions!
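
The education explanation can be illustrated with a small simulation. This is a sketch under invented assumptions: the coefficients, noise levels, and the helper `pearson_r` are all hypothetical, and in this made-up world books and income each depend on education only, never on each other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
people = []
for _ in range(2000):
    education = rng.uniform(8, 20)                    # years of schooling
    books = 10 * education + rng.gauss(0, 30)         # driven by education only
    income = 3000 * education + rng.gauss(0, 10_000)  # driven by education only
    people.append((education, books, income))

# Across everyone, books and income are clearly correlated.
r_all = pearson_r([p[1] for p in people], [p[2] for p in people])

# Hold the confounder roughly fixed: only people with 12 to 13 years of education.
band = [p for p in people if 12 <= p[0] < 13]
r_band = pearson_r([p[1] for p in band], [p[2] for p in band])

print(f"All people: r = {r_all:.2f}")
print(f"Similar education only: r = {r_band:.2f}")
```

Under these assumptions the overall correlation is clearly positive, while the correlation among people with similar education should be close to zero, even though neither variable causes the other.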

Part 2: Types of Data

Problem 4: Classify the Data (Easy)

For each variable, identify whether it is quantitative or qualitative. If quantitative, specify whether it is discrete or continuous.

  (a) Number of siblings a student has
  (b) Brand of smartphone (Apple, Samsung, Google, etc.)
  (c) Time spent studying per day (in hours)
  (d) T-shirt size (Small, Medium, Large, XL)
  (e) Heart rate (beats per minute)
  (f) Number of courses a student is taking
  (g) Temperature in degrees Fahrenheit
  (h) Marital status (Single, Married, Divorced)

Solution:

(a) Quantitative - Discrete

Reasoning: Number of siblings is numerical (quantitative) and can only be whole numbers (0, 1, 2, 3...). You can't have 2.5 siblings!

(b) Qualitative (Categorical)

Reasoning: Brand names are categories, not numbers. Can't do math with "Apple + Samsung"!

(c) Quantitative - Continuous

Reasoning: Time is numerical (quantitative) and can take any value—2.5 hours, 2.53 hours, etc. It's measured, not counted.

(d) Qualitative (Categorical - Ordinal)

Reasoning: T-shirt sizes are categories. Even though there's an order (S < M < L < XL), they're not numerical values.

(e) Quantitative - Continuous

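Reasoning: Heart rate is a measured rate, so it can take fractional values (72 bpm, 72.3 bpm, etc.). If it is recorded only as a whole-number count of beats in one minute, some textbooks treat it as discrete, so state how it was measured when you justify your answer.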

(f) Quantitative - Discrete

Reasoning: Number of courses is numerical and can only be whole numbers (1, 2, 3, 4 courses, not 3.7 courses).

(g) Quantitative - Continuous

Reasoning: Temperature is numerical and can take any value (72.5°F, 72.53°F, etc.).

(h) Qualitative (Categorical - Nominal)

Reasoning: Marital status categories have no inherent numerical meaning or order.

Quick Check: If you can COUNT it (only whole numbers) → Discrete. If you MEASURE it (any value) → Continuous.

Problem 5: Levels of Measurement (Hard)

Identify the level of measurement (Nominal, Ordinal, Interval, or Ratio) for each variable. Justify your answer.

  (a) Movie ratings: terrible, poor, good, excellent
  (b) Distance driven to school (in miles)
  (c) Year of graduation (2023, 2024, 2025, 2026)
  (d) Student ID numbers

Hint: Ask yourself: (1) Is there an order? (2) Are intervals equal? (3) Is there a true zero?

Solution:

(a) Ordinal

Justification: The ratings have a meaningful order (terrible < poor < good < excellent), but the "distance" between ratings isn't necessarily equal. The jump from terrible to poor might feel different than poor to good. You can rank them but can't do meaningful math (can't say "excellent is twice as good as poor").

(b) Ratio

Justification: Distance has equal intervals (1 mile = 1 mile everywhere) AND a true zero (0 miles = no distance driven). You can meaningfully say "10 miles is twice as far as 5 miles." This is the highest level of measurement—all math operations are valid!

(c) Interval

Justification: Years have equal intervals (2024 - 2023 is the same as 2025 - 2024), but there's no true "zero year" that means "absence of time." The calendar's starting point is arbitrary (time existed before it!), so ratios are meaningless: you can't say "2024 is twice as much as 1012."

(d) Nominal

Justification: Student ID numbers are just labels/names. Even though they're written as numbers, they have no mathematical meaning. ID #5000 isn't "bigger" or "better" than ID #2500—they're just identifiers. No order, no intervals, no zero point.

Summary Hierarchy: Nominal < Ordinal < Interval < Ratio (each level has all properties of the ones below it plus more)
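
The three hint questions can be read as a small decision procedure. The sketch below encodes it in Python; the function name `measurement_level` and its parameters are hypothetical, chosen only for this illustration.

```python
def measurement_level(has_order: bool, equal_intervals: bool, true_zero: bool) -> str:
    """Ask the three hint questions in sequence and stop at the first 'no'."""
    if not has_order:
        return "Nominal"
    if not equal_intervals:
        return "Ordinal"
    if not true_zero:
        return "Interval"
    return "Ratio"

# The four variables from this problem, answered question by question.
answers = {
    "Movie ratings": measurement_level(True, False, False),
    "Distance driven": measurement_level(True, True, True),
    "Year of graduation": measurement_level(True, True, False),
    "Student ID numbers": measurement_level(False, False, False),
}

for variable, level in answers.items():
    print(f"{variable}: {level}")
```

Because each level keeps the properties of the ones below it, the first question answered "no" determines the level.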

Problem 6: Tricky Cases (Medium)

Explain why each of these variables, which LOOK quantitative, is actually qualitative:

  (a) Zip codes (90405, 10001, etc.)
  (b) Sports jersey numbers (#23, #12, etc.)
  (c) Social Security Numbers

Solution:

All three are qualitative (categorical) even though they're written as numbers!

(a) Zip Codes

Why qualitative: Zip codes are geographic identifiers—labels for locations, not quantities. You can't meaningfully average zip codes or say 90405 is "greater than" 90402. Even though they're digits, they function as categories (like names).

(b) Jersey Numbers

Why qualitative: Jersey numbers identify players—they're labels, not measurements. A player wearing #23 isn't "better than" or "more than" a player wearing #12. You can't average jersey numbers meaningfully. They're categorical identifiers.

(c) Social Security Numbers

Why qualitative: SSNs are unique identifiers for people. They have no numerical meaning. You wouldn't say someone with SSN 555-12-3456 is "greater than" someone with SSN 123-45-6789. They're nominal categorical data—just using numbers as labels.

The Test: Ask yourself: "Can I do meaningful math with these numbers?" If adding, averaging, or comparing them doesn't make sense, they're probably qualitative labels, not quantitative measurements!
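
A short Python sketch of "The Test", using a hypothetical list of zip codes from survey responses. Averaging them runs without error but produces a number that labels no real place, while counting them as categories gives a useful summary.

```python
from collections import Counter
from statistics import mean

zip_codes = ["90405", "10001", "90405", "60614"]  # hypothetical responses

# Treating zip codes as numbers: the arithmetic works, the result means nothing.
numeric_average = mean(int(z) for z in zip_codes)

# Treating them as categories: count how often each label appears.
counts = Counter(zip_codes)
most_common_zip, frequency = counts.most_common(1)[0]

print(f"'Average' zip code: {numeric_average}")
print(f"Most common zip code: {most_common_zip} ({frequency} responses)")
```

Storing such identifiers as text rather than numbers is one way to avoid doing arithmetic on them by accident.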

Part 3: Data Collection Methods

Problem 7: Identify the Method (Easy)

For each scenario, identify whether it is an observational study, survey, or experiment.

  (a) Researchers randomly assign 100 patients to receive either a new medication or a placebo, then compare outcomes.
  (b) A psychologist observes children's behavior on a playground without interfering.
  (c) A company emails customers asking them to rate their satisfaction on a scale of 1-10.
  (d) Scientists record the temperature and ice cream sales each day for a month to see if they're related.

Solution:

(a) Experiment

Reasoning: Researchers are actively manipulating a variable (medication vs. placebo) and using random assignment. This is the hallmark of an experiment. Only experiments can establish causation!

(b) Observational Study

Reasoning: The psychologist is watching and recording what happens naturally without interfering or manipulating anything. Classic observational study.

(c) Survey

Reasoning: The company is asking people questions to collect data about their opinions/experiences. This is a survey (questionnaire method).

(d) Observational Study

Reasoning: The scientists are observing and recording data that occurs naturally (temperature and sales) without manipulating anything. They're looking for an association, not testing causation.

Key Distinction: If researchers actively manipulate/control something → Experiment. If they just observe/ask → Observational/Survey.

Problem 8: Sampling Methods (Medium)

A university with 20,000 students wants to survey students about campus dining. Identify the sampling method used in each scenario:

  (a) Researchers use a random number generator to select 500 students from the complete list of all 20,000 students.
  (b) Researchers randomly select 10 dormitories and survey ALL students living in those 10 dorms.
  (c) Researchers survey the first 500 students who walk into the dining hall on Monday.
  (d) Researchers ensure they get 125 freshmen, 125 sophomores, 125 juniors, and 125 seniors by randomly sampling from each class year.

Solution:

(a) Simple Random Sample

Reasoning: The 500 students are drawn at random from the complete list, so every student (and every possible group of 500) has the same chance of being selected. This is the gold standard for unbiased sampling!

(b) Cluster Sample

Reasoning: The population was divided into natural groups (dorms), entire clusters were randomly selected, and everyone in those clusters was surveyed. Classic cluster sampling!

(c) Convenience Sample

Reasoning: This is BIASED sampling! They're just surveying whoever is easy to reach (first people in the dining hall). This group might not represent all students—what about students who don't eat at the dining hall, or who come at different times?

(d) Stratified Random Sample

Reasoning: The population was divided into strata (class years), and random samples were taken from each stratum to ensure all groups are represented equally.

Which is best? (a), (b), and (d) are all valid methods. Method (c) is problematic because it's likely biased—avoid convenience sampling in research!
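
The four methods can be sketched in a few lines of Python. The roster below is hypothetical: it assumes 40 dorms of 500 students each and 5,000 students per class year, none of which is stated in the problem.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
YEARS = ["freshman", "sophomore", "junior", "senior"]

# Hypothetical roster of 20,000 students.
students = [
    {"id": i, "dorm": i // 500, "year": YEARS[i % 4]}
    for i in range(20_000)
]

# (a) Simple random sample: 500 drawn at random from the full list.
simple_random = rng.sample(students, 500)

# (b) Cluster sample: pick 10 dorms at random, keep everyone living in them.
chosen_dorms = set(rng.sample(range(40), 10))
cluster = [s for s in students if s["dorm"] in chosen_dorms]

# (c) Convenience sample: the first 500 on the list, standing in for
#     "the first 500 who walk in". No randomness is involved.
convenience = students[:500]

# (d) Stratified random sample: 125 drawn at random from each class year.
stratified = []
for year in YEARS:
    stratum = [s for s in students if s["year"] == year]
    stratified.extend(rng.sample(stratum, 125))

print(len(simple_random), len(cluster), len(convenience), len(stratified))
```

Note that only (c) never calls the random number generator, which is what makes it vulnerable to bias.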

Problem 9: Identifying Bias (Hard)

A local gym wants to know if members are satisfied with their facilities. They post a survey on their website that anyone can fill out voluntarily. 200 people respond, with 85% saying they're satisfied.

Questions:

  (a) Identify at least THREE sources of bias in this survey design.
  (b) For each source of bias you identified, explain how it might affect the results.
  (c) Suggest a better way to collect this data that would reduce bias.

Solution:

(a) Three Sources of Bias:

  1. Voluntary Response Bias: People self-select whether to respond
  2. Non-Response Bias: Only 200 responded (out of how many members?)
  3. Selection Bias: Only members who visit the gym's website see the survey

(b) How Each Bias Affects Results:

1. Voluntary Response Bias: People with strong opinions (very happy OR very unhappy) are more likely to respond. However, in this case, people who are SATISFIED are probably more willing to take time to fill out their gym's survey, while dissatisfied members might have already quit or ignore the survey. This could inflate the satisfaction rate.

2. Non-Response Bias: The 85% satisfaction only reflects the 200 people who responded. What about the members who didn't respond? We don't even know how many there are. They might be less satisfied, too busy, or disengaged, so we could be missing a large portion of the membership's opinions.

3. Selection Bias: Only people who regularly check the gym's website will see the survey. Members who don't use the website (perhaps less engaged members?) never get the chance to respond. This excludes certain groups entirely.

(c) Better Data Collection Method:

Recommended approach: Stratified Random Sampling with follow-up

  • Get a complete list of all current gym members
  • Randomly select 300-500 members (ensuring different age groups and membership types are represented)
  • Contact them via email AND text with survey link
  • Follow up with non-responders with phone calls or in-person requests
  • Aim for at least 60-70% response rate
  • Consider offering a small incentive (raffle entry) to increase response rate

Why this is better: Random selection reduces bias, stratification ensures all groups are represented, and follow-up reduces non-response bias. You get a more accurate picture of true member satisfaction!

Problem 10: Experiments and Causation (Hard)

A researcher wants to test whether a new study technique improves exam scores. She has two options:

Option A: Survey 200 students, asking them what study techniques they use and comparing their exam scores.

Option B: Randomly assign 100 students to use the new technique and 100 students to use traditional methods, then compare exam scores.

Questions:

  (a) Which option is an observational study and which is an experiment?
  (b) Which option can establish causation (that the technique causes better scores)? Why?
  (c) What confounding variables might affect Option A but not Option B?

Solution:

(a) Identifying the Methods:

Option A: Observational Study (Survey) - Just asking and observing, no manipulation

Option B: Experiment - Actively assigning treatments (study techniques) to groups

(b) Which Can Establish Causation?

Option B (Experiment) can establish causation.

Why: Because of random assignment! When you randomly assign students to groups, those groups should be similar on average in every way except the study technique. They should have similar:

  • Prior knowledge and ability
  • Motivation levels
  • Time available for studying
  • Test anxiety levels
  • Any other factors that could affect scores

If the new technique group scores clearly higher, it's reasonable to conclude the technique caused the improvement, since the groups were otherwise comparable!

Option A (Observational) cannot establish causation because students chose their own study techniques—they weren't randomly assigned. Differences in scores could be due to the technique OR due to differences in the students themselves.

(c) Confounding Variables in Option A:

  • Student motivation: More motivated students might both seek out new study techniques AND study more hours, leading to higher scores
  • Prior academic performance: Better students might be more likely to experiment with new techniques
  • Time availability: Students with more free time might use more elaborate study techniques
  • Learning style: Different techniques might work better for different learners
  • Course difficulty: Students in different sections or majors might face different exam difficulty levels

Why Option B Avoids These: Random assignment distributes all these confounding variables evenly across both groups. Some motivated students will be in each group, some struggling students in each group, etc. The technique is the ONLY systematic difference!

Bottom Line: If you want to prove X causes Y, you need a well-designed experiment with random assignment. Observational studies can show correlation, but never causation!
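
Random assignment as in Option B can be sketched as a shuffle followed by a split. The "motivation" scores below are hypothetical and exist only to show that a variable the researcher never controls ends up roughly balanced between the groups.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical students, each with a motivation score the researcher cannot control.
students = [{"id": i, "motivation": rng.gauss(50, 10)} for i in range(200)]

# Random assignment: shuffle the list, then split it 100 / 100.
rng.shuffle(students)
new_technique = students[:100]
traditional = students[100:]

# Compare the groups on the uncontrolled variable.
gap = mean(s["motivation"] for s in new_technique) - mean(
    s["motivation"] for s in traditional
)
print(f"Difference in average motivation between groups: {gap:.2f}")
```

The gap will rarely be exactly zero, but it should be small relative to the spread of the scores, which is what "similar on average" means in practice.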

Part 4: Data Visualization

Problem 11: Choose the Right Graph (Easy)

For each scenario, identify the BEST type of graph to use and explain why.

  (a) Showing how a city's population has changed from 1950 to 2020 (measured every 10 years)
  (b) Comparing the number of students majoring in Biology, Chemistry, Physics, and Math
  (c) Showing the distribution of heights of all students in a school
  (d) Investigating whether there's a relationship between hours slept and test performance
  (e) Showing what percentage of a school's budget goes to salaries, facilities, programs, and administration

Solution:

(a) Line Graph

Why: Population is changing over TIME (1950-2020). Line graphs are perfect for showing trends over time. The line connecting the points shows the continuous growth pattern.

(b) Bar Chart

Why: We're comparing quantities across different CATEGORIES (the four majors). Bar charts make it easy to compare the heights of bars to see which major has more students.

(c) Histogram

Why: Height is continuous numerical data, and we want to see the DISTRIBUTION (shape) of the data. A histogram groups heights into ranges (bins) like 5'0"-5'2", 5'2"-5'4", etc., and shows how many students fall in each range. This reveals whether heights are normally distributed, skewed, etc.

(d) Scatterplot

Why: We want to see the RELATIONSHIP between two quantitative variables (hours slept and test score). Each point represents one student with their sleep hours on x-axis and test score on y-axis. The pattern of points reveals whether there's a correlation.

(e) Pie Chart

Why: We're showing PARTS OF A WHOLE where the percentages add to 100%. A pie chart visually shows how the total budget is divided among categories. (Note: A bar chart would also work here and might be easier to read!)

Problem 12: Interpreting Graphs (Medium)

Imagine a histogram showing the distribution of exam scores in a class. The histogram shows:

  • 50-60: 2 students
  • 60-70: 5 students
  • 70-80: 18 students
  • 80-90: 20 students
  • 90-100: 5 students

Based on this histogram, answer:

  (a) How many students took the exam?
  (b) What range contains the most students?
  (c) Is this distribution symmetric, skewed left, or skewed right?
  (d) Approximately what percentage of students scored 80% or higher?

Solution:

(a) Total number of students: 50 students

Calculation: Add all the frequencies: 2 + 5 + 18 + 20 + 5 = 50

(b) Range with most students: 80-90 (20 students)

Reasoning: This bin has the tallest bar with 20 students.

(c) Distribution shape: Skewed left (negatively skewed)

Reasoning: The bulk of the data is on the RIGHT (higher scores), with a tail extending to the LEFT (lower scores). Most students did well (70-90 range), but a few students scored lower, creating the left tail. Think of it as: the tail points to the skew direction!

(d) Percentage who scored 80% or higher: 50%

Calculation:

  • Students with 80% or higher: 20 (from 80-90) + 5 (from 90-100) = 25 students
  • Percentage: (25 / 50) × 100% = 50%

Half the class scored 80% or higher!

Histogram Reading Tips:

  • Add up all bar heights to get total count
  • The tallest bar is the mode (most common range)
  • Where the tail points = direction of skew
  • Sum specific bins to answer "how many" or "what percent" questions
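
The same calculations can be checked with a few lines of Python, using the bin counts given in the problem. The variable names are arbitrary.

```python
# Bin counts from the histogram in this problem.
bins = {"50-60": 2, "60-70": 5, "70-80": 18, "80-90": 20, "90-100": 5}

total = sum(bins.values())                      # (a) add up all bar heights
modal_bin = max(bins, key=bins.get)             # (b) the tallest bar
at_least_80 = bins["80-90"] + bins["90-100"]    # (d) sum the relevant bins
percent_at_least_80 = at_least_80 / total * 100

print(total, modal_bin, percent_at_least_80)
```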

Problem 13: Misleading Visualizations (Medium)

A company creates a bar chart to show their sales "skyrocketing" from 2023 to 2024:

  • 2023: $98,000 in sales
  • 2024: $102,000 in sales

However, the y-axis starts at $95,000 (not zero), making the 2024 bar appear more than twice as tall as the 2023 bar.

Questions:

  (a) What is the actual percentage increase in sales from 2023 to 2024?
  (b) Why is starting the y-axis at $95,000 misleading?
  (c) How would the graph look different if the y-axis started at zero?
  (d) Is there ever a time when NOT starting at zero is acceptable? Explain.

Solution:

(a) Actual percentage increase:

Calculation:

  • Increase amount: $102,000 - $98,000 = $4,000
  • Percentage: ($4,000 / $98,000) × 100% ≈ 4.08%

That's about a 4% increase—modest, not "skyrocketing"!

(b) Why it's misleading:

When the y-axis starts at $95,000 instead of zero:

  • The 2023 bar only goes from $95k to $98k (3 units tall)
  • The 2024 bar goes from $95k to $102k (7 units tall)
  • Visually, the 2024 bar is more than TWICE as tall (7/3 ≈ 2.33 times the height), as if sales had more than doubled
  • In reality, sales increased by only 4%!

The truncated axis exaggerates the visual difference, creating a false impression of dramatic growth.

(c) How it would look with y-axis at zero:

If the y-axis started at $0:

  • 2023 bar: 98 units tall
  • 2024 bar: 102 units tall
  • The bars would look almost the same height (because they ARE almost the same!)
  • The 4% increase would be visible but appropriately small

This would accurately represent the modest growth.
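
The arithmetic in parts (a) through (c) can be reproduced with a short Python sketch. The helper `bar_height_ratio` is a hypothetical name introduced here; it compares the drawn bar heights for any chosen y-axis baseline.

```python
sales_2023 = 98_000
sales_2024 = 102_000

# (a) Actual percentage increase.
percent_increase = (sales_2024 - sales_2023) / sales_2023 * 100

def bar_height_ratio(baseline):
    """Ratio of the 2024 bar's drawn height to the 2023 bar's, for a given axis start."""
    return (sales_2024 - baseline) / (sales_2023 - baseline)

truncated_ratio = bar_height_ratio(95_000)  # (b) y-axis starts at $95,000
honest_ratio = bar_height_ratio(0)          # (c) y-axis starts at zero

print(f"Actual increase: {percent_increase:.2f}%")
print(f"Bar height ratio, axis from $95,000: {truncated_ratio:.2f}")
print(f"Bar height ratio, axis from $0: {honest_ratio:.2f}")
```

With the axis at zero, the height ratio matches the real ratio of the sales figures; with the truncated axis, it does not.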

(d) When is NOT starting at zero acceptable?

LINE GRAPHS (sometimes acceptable):

  • When tracking small changes in large values over time (e.g., daily stock prices from $150-$155)
  • When the focus is on TREND rather than total magnitude
  • When the y-axis is clearly labeled so readers understand the scale
  • Example: A line graph of body temperature over a day (98°F-101°F) doesn't need to start at 0°F

BAR CHARTS (rarely acceptable):

  • Bar charts should ALMOST ALWAYS start at zero because they show magnitude/quantity
  • The length of the bar represents the value, so truncating distorts the comparison

Best practice: When in doubt, start at zero. If you must truncate, include a "break" symbol in the axis to show the scale is not starting at zero, and make this VERY clear to readers!

Problem 14: Bar Chart vs. Histogram (Easy)

Explain the difference between a bar chart and a histogram. Include:

  (a) What type of data each is used for
  (b) Whether the bars touch or have gaps
  (c) One example of when you'd use each

Solution:

| Feature | Bar Chart | Histogram |
|---|---|---|
| Data type | Categorical (qualitative) data | Continuous quantitative data |
| Do bars touch? | No: bars have gaps between them | Yes: bars touch each other |
| Why? | Gaps show categories are distinct and separate | Touching bars show data is continuous (no gaps in values) |
| X-axis shows | Category names (ice cream flavors, majors, etc.) | Numerical ranges/bins (0-10, 10-20, 20-30, etc.) |
| Y-axis shows | Frequency or count for each category | Frequency (how many values fall in each bin) |
| Example use | Comparing number of students in different majors (Biology: 150, Chemistry: 120, Physics: 90) | Showing distribution of student ages (18-19: 500 students, 19-20: 600 students, 20-21: 450 students) |

Key Memory Trick:

Bar chart = Bars have Breaks (gaps) → for Categories

Histogram = bars Hug each other (touch) → for Continuous data

Visual Difference Example:

Bar Chart (Favorite Colors):

[Red]___[Blue]___[Green]___[Yellow]

← Notice gaps between bars for distinct categories

Histogram (Test Scores):

[50-60][60-70][70-80][80-90][90-100]

← Notice bars touching because scores are continuous
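
The difference also shows up in how the counts behind each graph are prepared. In this Python sketch, both data lists are hypothetical: categories are simply counted by label, while numeric scores are grouped into adjacent bins that share their edges.

```python
from collections import Counter

# Bar chart input: categorical labels, counted per category.
favorite_colors = ["Red", "Blue", "Blue", "Green", "Red", "Blue"]  # hypothetical
category_counts = Counter(favorite_colors)

# Histogram input: numeric values grouped into adjacent bins of width 10.
test_scores = [55, 62.5, 68, 71, 74.5, 79, 80, 83, 88, 95]  # hypothetical
bin_edges = [50, 60, 70, 80, 90, 100]

bin_counts = [0] * (len(bin_edges) - 1)
for score in test_scores:
    for i in range(len(bin_counts)):
        last_bin = i == len(bin_counts) - 1
        # Each bin includes its lower edge; only the last bin includes its upper edge.
        if bin_edges[i] <= score < bin_edges[i + 1] or (last_bin and score == bin_edges[-1]):
            bin_counts[i] += 1
            break

print(dict(category_counts))
print(bin_counts)
```

A score of exactly 80 has to land in one bin or the other (here, 80-90), which is why histogram bars touch: every value between the edges belongs somewhere.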

Problem 15: Creating Effective Visualizations (Medium)

You've collected data on study hours per week and GPA for 50 students. You want to create a visualization to see if there's a relationship.

Questions:

  (a) What type of graph should you use? Why?
  (b) What should be on the x-axis and what should be on the y-axis?
  (c) List 5 best practices you should follow when creating this graph.
  (d) If you see that points generally trend upward from left to right, what does that tell you? What does it NOT tell you?

Solution:

(a) Graph type: Scatterplot

Why: You want to see the RELATIONSHIP between two quantitative variables (study hours and GPA). A scatterplot displays each student as a point, with one variable on each axis, revealing any correlation patterns.

(b) Axis assignment:

  • X-axis: Study hours per week (independent/explanatory variable)
  • Y-axis: GPA (dependent/response variable)

Why this order: We're investigating whether study hours might affect GPA, so hours go on x-axis and GPA on y-axis. Convention is: if one variable might influence the other, the influencing variable goes on x.

(c) 5 Best Practices:

  1. Clear, descriptive title: "Relationship Between Study Hours and GPA for 50 College Students"
  2. Label both axes with units: X-axis: "Study Hours per Week", Y-axis: "GPA (0.0-4.0 scale)"
  3. Use appropriate scales: Start axes at reasonable values (GPA from 0.0-4.0, study hours from 0)
  4. Make points visible: Use clear markers/dots that don't overlap too much; if overlap is an issue, use semi-transparent points
  5. Include sample size and source: Note "n=50 students" and "Data collected: Fall 2024"

Bonus practices:

  • Keep it simple—no unnecessary decorations or 3D effects
  • Use color purposefully (or keep it monochrome)
  • Ensure axes have reasonable increments (not too many tick marks, not too few)

(d) Interpretation:

What upward trend DOES tell you:

  • There is a positive correlation between study hours and GPA
  • Students who study more hours tend to have higher GPAs
  • As study hours increase, GPA tends to increase too
  • There's an association between the two variables

What upward trend does NOT tell you:

  • It does NOT prove causation! You can't conclude that studying more causes higher GPA
  • Why not? This is observational data, not an experiment. Many confounding variables could explain the relationship:
    • More motivated students might both study more AND have higher ability
    • Students taking easier courses might have more time to study and higher grades
    • Better study techniques (not just hours) might explain the difference
    • Prior knowledge from high school might affect both study efficiency and grades
  • The correlation doesn't tell you the exact GPA for any given study hours (it's a trend, not a perfect prediction)
  • It doesn't tell you if the relationship is linear or curved

Bottom line: A scatterplot can reveal correlation, but remember: Correlation ≠ Causation! To prove studying more causes higher GPA, you'd need a controlled experiment.

Great Work!

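You've completed all 15 practice problems! These problems covered all four parts of Module 1: What is Statistics?, Types of Data, Data Collection Methods, and Data Visualization.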

Next step: Test your knowledge with the Module 1 Quiz!