Learn Without Walls

Lesson 3: Data Collection Methods

Estimated time: 30-35 minutes

Learning Objectives

By the end of this lesson, you will be able to:

Why Data Collection Matters

You've probably heard the saying "Garbage in, garbage out." In statistics, this is absolutely critical: no amount of fancy analysis can fix bad data.

The quality of your conclusions depends entirely on the quality of your data.

Even with perfect statistical methods, biased or poorly collected data will lead to wrong conclusions. That's why understanding data collection is so important!

Key Terms

  • Population: The entire group you want to study (e.g., all college students in the US)
  • Sample: A subset of the population that you actually collect data from (e.g., 1,000 students from 50 colleges)
  • Parameter: A numerical summary of the population (usually unknown)
  • Statistic: A numerical summary of the sample (what we calculate from our data)

Example: We want to know the average height of all UCLA students (parameter), so we measure 200 students (sample) and find their average is 5'7" (statistic).
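The difference between a parameter and a statistic can be sketched in a short simulation. The population below is invented for illustration (30,000 heights drawn from a bell curve centered at 67 inches); it is not real UCLA data. In practice the parameter is usually unknown, and the simulation only reveals it because we generated the whole population ourselves.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (in inches) of 30,000 students.
population = [random.gauss(67, 4) for _ in range(30_000)]

# Parameter: the mean of the whole population (normally unknown).
parameter = statistics.mean(population)

# Sample: 200 students chosen at random. Statistic: their mean height.
sample = random.sample(population, 200)
statistic = statistics.mean(sample)

print(f"Parameter (population mean): {parameter:.2f} in")
print(f"Statistic (sample mean):     {statistic:.2f} in")
```

The two numbers should be close but not identical, which is why a statistic is treated as an estimate of the parameter.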

Three Main Data Collection Methods

Observational Studies

What it is: Researchers observe and record data without manipulating or interfering with the subjects. You're just watching what happens naturally.

Examples:

  • Recording how many people wear masks in a grocery store
  • Tracking students' study habits and their GPAs
  • Observing wildlife behavior in their natural habitat
  • Analyzing medical records to find health patterns

Advantages

  • Studies real-world behavior
  • Ethical when experiments wouldn't be
  • Less expensive
  • Can study large populations

Limitations

  • Can't establish causation
  • Many confounding variables
  • Observer bias possible
  • Can't control for other factors

Example

Study: Researchers observe that students who eat breakfast score higher on tests.

Conclusion: There's an association between breakfast and test scores, but we can't say breakfast causes higher scores. Maybe students who eat breakfast also sleep more, have more family support, or are generally more organized—we can't separate these factors in an observational study.
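A small simulation can show how a confounding variable produces this kind of association. The model below is an assumption made up for illustration: sleep affects both breakfast habits and test scores, and breakfast has no direct effect on scores at all. All the numbers are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumed model: sleep drives BOTH breakfast habits and test scores.
# Breakfast has no direct effect on the score in this simulation.
students = []
for _ in range(10_000):
    sleep_hours = random.gauss(7, 1)
    # Well-rested students are more likely to eat breakfast.
    eats_breakfast = random.random() < (0.8 if sleep_hours > 7 else 0.3)
    # The score depends only on sleep plus random noise.
    score = 60 + 4 * sleep_hours + random.gauss(0, 5)
    students.append((eats_breakfast, score))

breakfast_mean = statistics.mean([s for b, s in students if b])
no_breakfast_mean = statistics.mean([s for b, s in students if not b])

print(f"Mean score, breakfast:    {breakfast_mean:.1f}")
print(f"Mean score, no breakfast: {no_breakfast_mean:.1f}")
```

Breakfast eaters score higher on average even though breakfast does nothing in this model. An observer who only recorded breakfast and scores could not tell this apart from a real breakfast effect.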

Surveys

What it is: Researchers ask people questions to collect data about opinions, behaviors, or characteristics. Surveys can be conducted through questionnaires, interviews, or polls.

Examples:

  • Election polls asking who you'll vote for
  • Customer satisfaction surveys
  • Health questionnaires at the doctor's office
  • Course evaluation forms

Advantages

  • Can reach large samples quickly
  • Collect data on attitudes/opinions
  • Relatively inexpensive
  • Get direct responses

Limitations

  • People may lie or misremember
  • Response bias (social desirability)
  • Low response rates
  • Question wording affects answers

Common Survey Biases

  • Leading questions: "Don't you agree that..." pushes people toward an answer
  • Social desirability bias: People give answers that make them look good (over-report voting, under-report unhealthy behaviors)
  • Non-response bias: People who don't respond may differ from those who do
  • Question order effects: Earlier questions can influence later answers

Experiments

What it is: Researchers actively manipulate one or more variables (treatment) and observe the effect on another variable (outcome), while controlling other factors. This is the ONLY method that can establish causation!

Key components:

  • Treatment/Independent variable: What the researcher manipulates
  • Outcome/Dependent variable: What is measured
  • Control group: Group that doesn't receive treatment (for comparison)
  • Random assignment: Participants randomly assigned to groups

Examples:

  • Testing if a new drug reduces symptoms (drug vs. placebo)
  • Comparing two teaching methods for effectiveness
  • Testing whether fertilizer increases crop yield

Advantages

  • CAN establish causation!
  • Control for confounding variables
  • Can isolate specific effects
  • Strong evidence for conclusions

Limitations

  • Often expensive and time-consuming
  • Sometimes unethical (can't give people cancer to study it!)
  • Artificial lab settings may not reflect real life
  • Smaller sample sizes

Example: The Gold Standard

Research question: Does a new medication reduce headache pain?

Experiment design:

  1. Recruit 200 people with chronic headaches
  2. Randomly assign 100 to medication group, 100 to placebo group
  3. Make it double-blind (neither patients nor doctors know who got what)
  4. After 30 days, measure headache frequency in both groups
  5. Compare results

Why it works: Random assignment means the groups should be similar, on average, in every way except the medication. If the medication group then has meaningfully fewer headaches than chance alone would explain, we can conclude the medication caused the improvement!
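The random assignment in step 2 could be carried out along these lines. This is a minimal sketch: the participant IDs are hypothetical placeholders, and a real trial would use a documented randomization procedure.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical IDs for the 200 recruited participants.
participants = [f"P{i:03d}" for i in range(1, 201)]

# Shuffle a copy, then split it in half: chance alone decides the groups.
shuffled = participants[:]
random.shuffle(shuffled)
medication_group = shuffled[:100]
placebo_group = shuffled[100:]

print("Medication group size:", len(medication_group))
print("Placebo group size:   ", len(placebo_group))
```

Because neither the researcher nor the participant chooses the group, traits such as age or headache severity tend to balance out between the two groups.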

Sampling Methods: How Do We Choose Who to Study?

Since we usually can't study an entire population, we take a sample. But how we choose that sample makes a huge difference in whether our results are reliable!

Simple Random Sample

Every member of the population has an equal chance of being selected, and every possible group of the chosen size is equally likely to be picked (like drawing names from a hat).

Best when: You want unbiased representation

Example: Using a random number generator to select 100 students from a list of all 5,000 students
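As a rough sketch of this example, Python's built-in `random.sample` draws without replacement, so no student is picked twice. The student IDs 1 to 5,000 are a stand-in for a real roster.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical roster: student IDs 1 through 5,000.
student_ids = list(range(1, 5001))

# Draw 100 distinct students; every ID is equally likely to be chosen.
srs = random.sample(student_ids, 100)

print(sorted(srs)[:10])  # peek at a few of the selected IDs
```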

Stratified Sample

Divide population into groups (strata), then randomly sample from each group.

Best when: You want to ensure all subgroups are represented

Example: Randomly sample 50 freshmen, 50 sophomores, 50 juniors, and 50 seniors to ensure all years are represented
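One way to sketch a stratified sample in code is to group the roster by class year and then take a random sample inside each group. The roster and the helper `stratified_sample` below are hypothetical, written only for this illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

years = ["freshman", "sophomore", "junior", "senior"]

# Hypothetical roster of 5,000 students as (student_id, year) pairs.
roster = [(i, random.choice(years)) for i in range(1, 5001)]

def stratified_sample(roster, per_stratum):
    """Group students by year, then randomly sample within each year."""
    strata = {}
    for student_id, year in roster:
        strata.setdefault(year, []).append(student_id)
    return {year: random.sample(ids, per_stratum)
            for year, ids in strata.items()}

sample = stratified_sample(roster, 50)
for year in years:
    print(year, len(sample[year]))
```

Each year contributes exactly 50 students, which a simple random sample of 200 would not guarantee.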

Cluster Sample

Divide population into clusters, randomly select entire clusters, study everyone in those clusters.

Best when: Population is naturally grouped and spread out geographically

Example: Randomly select 10 schools and survey all students in those 10 schools
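A cluster sample can be sketched the same way. The district of 40 schools below is invented for illustration. The key difference from stratified sampling is that the randomness happens at the school level, and then everyone in a chosen school is included.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical district: 40 schools, each with 200 to 600 students.
schools = {
    f"School {n}": [f"S{n}-{k}" for k in range(random.randint(200, 600))]
    for n in range(1, 41)
}

# Randomly choose 10 whole schools (the clusters).
chosen_schools = random.sample(list(schools), 10)

# Survey every student in the chosen schools.
cluster_sample = [student
                  for school in chosen_schools
                  for student in schools[school]]

print("Schools chosen:", len(chosen_schools))
print("Students surveyed:", len(cluster_sample))
```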

Systematic Sample

Pick a random starting point, then select every kth member of the population (e.g., every 10th person on a list).

Best when: You have an ordered list and want even coverage

Example: Survey every 5th customer who enters a store
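Here is a minimal sketch of systematic sampling, assuming 200 customers numbered in the order they entered the store. The helper `systematic_sample` is hypothetical, and it picks the starting position at random, a common refinement so that the first person on the list is not always included.

```python
import random

random.seed(5)  # fixed seed so the sketch is reproducible

def systematic_sample(items, k):
    """Pick a random start among the first k items, then take every kth."""
    start = random.randrange(k)
    return items[start::k]

# Hypothetical list: customers 1 to 200 in order of arrival.
customers = list(range(1, 201))
sample = systematic_sample(customers, 5)

print("Sample size:", len(sample))
print("First few customers surveyed:", sample[:5])
```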

Convenience Sample

Sample whoever is easiest to reach.

Problem: BIASED! Not representative of population

Example: Surveying only your friends, or only people in the library

Voluntary Response Sample

People choose whether to participate (self-selection).

Problem: VERY BIASED! People with strong opinions respond

Example: Online polls, call-in surveys, Yelp reviews

The Two Big Sampling Mistakes to Avoid

  1. Convenience Sampling: Sampling whoever is easy to reach creates bias. Just because 90% of people AT THE GYM support more fitness funding doesn't mean 90% of ALL people do!
  2. Voluntary Response: When people self-select into a survey, you get extreme opinions. Customers with a very good or very bad experience are far more likely to leave a review than those in the middle, so online ratings over-represent the extremes!
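The effect of self-selection can be sketched with a simulation. Everything here is an assumption made up for illustration: 10,000 customers with ratings from 1 to 5 spread evenly, and response probabilities chosen so that customers with extreme ratings are much more likely to post a review.

```python
import random

random.seed(11)  # fixed seed so the sketch is reproducible

# Hypothetical customers: true ratings from 1 (angry) to 5 (delighted).
ratings = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# Assumed chance of leaving a review, by rating: extremes respond more.
respond_prob = {1: 0.30, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.30}
reviews = [r for r in ratings if random.random() < respond_prob[r]]

def extreme_share(values):
    """Fraction of ratings that are 1 or 5."""
    return sum(v in (1, 5) for v in values) / len(values)

extreme_share_all = extreme_share(ratings)
extreme_share_reviews = extreme_share(reviews)

print(f"Extreme ratings among all customers: {extreme_share_all:.0%}")
print(f"Extreme ratings among reviewers:     {extreme_share_reviews:.0%}")
```

The reviews contain a much larger share of 1s and 5s than the full customer base, even though nobody lied. The distortion comes entirely from who chose to respond.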

Evaluate These Scenarios

Read each scenario and identify the problems.

Scenario 1: A news website posts a poll asking "Should taxes be raised?" Visitors can click Yes or No. 10,000 people respond, with 78% saying No. The website reports "78% of Americans oppose tax increases."

Problems:
  • Voluntary response bias: Only people with strong opinions click—likely those who really hate taxes
  • Undercoverage: Visitors to one news website aren't representative of all Americans
  • Question wording: Could be leading depending on context

Better approach: Random digit dialing to sample all Americans, with neutral question wording

Scenario 2: A professor wants to know if students like the new textbook. She asks the 5 students who always sit in the front row. All 5 say they love it, so she concludes the textbook is great.

Problems:
  • Convenience sample: Only asking easy-to-reach students (front row)
  • Sample too small: Only 5 students out of whole class
  • Selection bias: Front-row students may be more engaged and positive

Better approach: Randomly select 30-40 students from the entire class roster

Scenario 3: A gym wants to survey members about satisfaction. They email all 2,000 members. Only 150 respond (7.5% response rate). Of those, 85% say they're satisfied.

Problems:
  • Non-response bias: Very low response rate—who responded?
  • Unknown direction of bias: Satisfied members may have been more likely to respond, or unhappy ones may have been; with so few responses we can't tell
  • Missing the unhappy customers: The 92.5% who didn't respond might tell a different story

Better approach: Stratified random sampling with follow-up calls/texts to increase response rate to 60%+
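The arithmetic behind Scenario 3 can be worked through in a few lines. This sketch uses only the numbers given in the scenario and asks: if we knew nothing about the non-respondents, how low or how high could the true satisfaction rate among all 2,000 members be?

```python
members = 2000
respondents = 150
satisfied_rate_among_respondents = 0.85

response_rate = respondents / members
non_respondents = members - respondents

# Roughly 127 respondents said they were satisfied.
satisfied_respondents = satisfied_rate_among_respondents * respondents

# Worst case: every non-respondent is dissatisfied.
lowest = satisfied_respondents / members
# Best case: every non-respondent is satisfied.
highest = (satisfied_respondents + non_respondents) / members

print(f"Response rate: {response_rate:.1%}")
print(f"True satisfaction could be as low as {lowest:.1%}")
print(f"True satisfaction could be as high as {highest:.1%}")
```

The possible range runs from roughly 6% to roughly 99%, so the reported 85% says very little about the membership as a whole.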

Scenario 4: A school wants to know if students support a uniform policy. They survey students during lunch in the cafeteria by approaching tables and asking everyone there.

Problems:
  • Convenience sample: Only students who eat in cafeteria (what about students who leave campus or bring lunch?)
  • Time bias: Only one lunch period surveyed (different groups at different times)
  • Peer pressure: Answering in front of friends may influence responses

Better approach: Stratified random sample of students from all grades, with anonymous survey during homeroom/advisory

Quick Reference: Comparing Methods

Method              | Can Establish Causation?  | Cost          | Best For
Observational Study | No (association only)     | Low to Medium | Studying things as they naturally occur
Survey              | No (association only)     | Low           | Collecting opinions, self-reported behaviors
Experiment          | Yes (can show causation!) | High          | Testing cause-and-effect relationships

Key Takeaways

Ready for More?

Next Lesson

In Lesson 4, you'll learn about data visualization—how to create and interpret graphs, and how visuals can be misleading!
