Lesson 3: Data Collection Methods

Estimated time: 30-35 minutes

Learning Objectives

By the end of this lesson, you will be able to:

Distinguish between observational studies, surveys, and experiments
Identify different sampling methods and their appropriate uses
Recognize common sources of bias in data collection
Evaluate the quality of data based on collection methods
Understand the difference between population and sample

Why Data Collection Matters

You've probably heard the saying "Garbage in, garbage out." In statistics, this is absolutely critical: no amount of fancy analysis can fix bad data.

The quality of your conclusions depends entirely on the quality of your data.

Even with perfect statistical methods, biased or poorly collected data will lead to wrong conclusions. That's why understanding data collection is so important!

Key Terms

Population: The entire group you want to study (e.g., all college students in the US)
Sample: A subset of the population that you actually collect data from (e.g., 1,000 students from 50 colleges)
Parameter: A numerical summary of the population (usually unknown)
Statistic: A numerical summary of the sample (what we calculate from our data)

Example: We want to know the average height of all UCLA students (parameter), so we measure 200 students (sample) and find their average is 5'7" (statistic).
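To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population is simulated (made-up heights in inches, not real UCLA data), and the names `population`, `parameter`, and `statistic` are chosen just for this illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in inches for 30,000 students (made-up data).
population = [rng.gauss(67, 3) for _ in range(30_000)]

# Parameter: a summary of the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Sample: 200 students drawn at random, without replacement.
sample = rng.sample(population, 200)

# Statistic: the same summary, computed from the sample only.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f} in")
print(f"statistic (sample mean):     {statistic:.2f} in")
```

The two numbers should be close but not identical. In real studies we only ever see the statistic and use it to estimate the parameter.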

Three Main Data Collection Methods

Observational Studies

What it is: Researchers observe and record data without manipulating or interfering with the subjects. You're just watching what happens naturally.

Examples:

Recording how many people wear masks in a grocery store
Tracking students' study habits and their GPAs
Observing wildlife behavior in their natural habitat
Analyzing medical records to find health patterns

Advantages

Studies real-world behavior
Ethical when experiments wouldn't be
Less expensive
Can study large populations

Limitations

Can't establish causation
Many confounding variables
Observer bias possible
Can't control for other factors

Example

Study: Researchers observe that students who eat breakfast score higher on tests.

Conclusion: There's an association between breakfast and test scores, but we can't say breakfast causes higher scores. Maybe students who eat breakfast also sleep more, have more family support, or are generally more organized—we can't separate these factors in an observational study.
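The breakfast example can be imitated with a small simulation. This is a sketch under invented assumptions: a hidden trait called `organized` raises both the chance of eating breakfast and the test score, while breakfast itself has no effect on scores at all. All numbers are made up for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

students = []
for _ in range(5_000):
    organized = rng.random() < 0.5  # hidden confounder
    # Organized students are more likely to eat breakfast...
    eats_breakfast = rng.random() < (0.8 if organized else 0.3)
    # ...and they score higher, but breakfast adds nothing to the score here.
    score = rng.gauss(80 if organized else 70, 5)
    students.append((organized, eats_breakfast, score))

def mean_score(rows):
    """Average test score for a list of (organized, eats_breakfast, score) rows."""
    return statistics.mean([row[2] for row in rows])

# Naive comparison: breakfast eaters vs. everyone else.
gap = mean_score([s for s in students if s[1]]) - mean_score(
    [s for s in students if not s[1]]
)

# Same comparison, but only among organized students (confounder held fixed).
organized_only = [s for s in students if s[0]]
gap_within = mean_score([s for s in organized_only if s[1]]) - mean_score(
    [s for s in organized_only if not s[1]]
)

print(f"gap across all students:      {gap:.1f} points")
print(f"gap among organized students: {gap_within:.1f} points")
```

The naive gap is several points even though breakfast does nothing in this simulation, and it nearly vanishes once the confounder is held fixed. In a real observational study we usually can't see or measure every confounder this way.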

Surveys

What it is: Researchers ask people questions to collect data about opinions, behaviors, or characteristics. Can be done through questionnaires, interviews, or polls.

Examples:

Election polls asking who you'll vote for
Customer satisfaction surveys
Health questionnaires at the doctor's office
Course evaluation forms

Advantages

Can reach large samples quickly
Collect data on attitudes/opinions
Relatively inexpensive
Get direct responses

Limitations

People may lie or misremember
Response bias (social desirability)
Low response rates
Question wording affects answers

Common Survey Biases

Leading questions: "Don't you agree that..." pushes people toward an answer
Social desirability bias: People give answers that make them look good (over-report voting, under-report unhealthy behaviors)
Non-response bias: People who don't respond may differ from those who do
Question order effects: Earlier questions can influence later answers

Experiments

What it is: Researchers actively manipulate one or more variables (treatment) and observe the effect on another variable (outcome), while controlling other factors. This is the ONLY method that can establish causation!

Key components:

Treatment/Independent variable: What the researcher manipulates
Outcome/Dependent variable: What is measured
Control group: Group that doesn't receive treatment (for comparison)
Random assignment: Participants randomly assigned to groups

Examples:

Testing if a new drug reduces symptoms (drug vs. placebo)
Comparing two teaching methods for effectiveness
Testing whether fertilizer increases crop yield

Advantages

CAN establish causation!
Control for confounding variables
Can isolate specific effects
Strong evidence for conclusions

Limitations

Often expensive and time-consuming
Sometimes unethical (can't give people cancer to study it!)
Artificial lab settings may not reflect real life
Smaller sample sizes

Example: The Gold Standard

Research question: Does a new medication reduce how often headaches occur?

Experiment design:

Recruit 200 people with chronic headaches
Randomly assign 100 to medication group, 100 to placebo group
Make it double-blind (neither patients nor doctors know who got what)
After 30 days, measure headache frequency in both groups
Compare results

Why it works: Random assignment means the groups should be similar, on average, in every way except the medication. If the medication group ends up with noticeably fewer headaches than chance alone would explain, we can conclude the medication caused the improvement!
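The random assignment step in the design above can be sketched in a few lines of Python. The participant IDs are hypothetical placeholders; a real trial would use its own enrollment list and a documented randomization procedure.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical IDs for the 200 recruited participants.
participants = [f"P{i:03d}" for i in range(1, 201)]

# Shuffle a copy, then split it down the middle: chance alone decides
# who gets the medication and who gets the placebo.
shuffled = participants[:]
rng.shuffle(shuffled)
medication_group = shuffled[:100]
placebo_group = shuffled[100:]

print(len(medication_group), len(placebo_group))
```

Because neither the researcher nor the participant picks the group, traits like age or headache severity should balance out across the two groups on average.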

Sampling Methods: How Do We Choose Who to Study?

Since we usually can't study an entire population, we take a sample. But how we choose that sample makes a huge difference in whether our results are reliable!

Simple Random Sample

Every member of the population has an equal chance of being selected, and every possible group of the chosen size is equally likely to be the sample (like drawing names from a hat).

Best when: You want unbiased representation

Example: Using a random number generator to select 100 students from a list of all 5,000 students
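A minimal sketch of that example in Python, assuming the student list is just ID numbers 1 through 5,000 (a stand-in for a real roster):

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical roster: student ID numbers 1 through 5,000.
roster = list(range(1, 5001))

# random.sample draws without replacement, so no student is picked twice.
srs = rng.sample(roster, 100)

print(sorted(srs)[:10])  # peek at a few of the selected IDs
```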

Stratified Sample

Divide population into groups (strata), then randomly sample from each group.

Best when: You want to ensure all subgroups are represented

Example: Randomly sample 50 freshmen, 50 sophomores, 50 juniors, and 50 seniors so that every class year is represented
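Here is one way stratified sampling by class year could look in Python. The roster is invented for illustration, and `stratified_sample` is a hypothetical helper written for this sketch, not a library function.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical roster of (student_id, class_year) pairs.
years = ["freshman", "sophomore", "junior", "senior"]
roster = [(student_id, years[student_id % 4]) for student_id in range(5000)]

def stratified_sample(roster, per_stratum, rng):
    """Group students by year (the strata), then draw a random sample from each."""
    strata = {}
    for student_id, year in roster:
        strata.setdefault(year, []).append(student_id)
    return {year: rng.sample(ids, per_stratum) for year, ids in strata.items()}

by_year = stratified_sample(roster, 50, rng)

for year, ids in by_year.items():
    print(year, len(ids))
```

The key difference from a simple random sample is that the random draw happens separately inside each group, so no class year can be left out by bad luck.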

Cluster Sample

Divide population into clusters, randomly select entire clusters, study everyone in those clusters.

Best when: Population is naturally grouped and spread out geographically

Example: Randomly select 10 schools and survey all students in those 10 schools
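A cluster sample of schools might be sketched like this. The 40 schools and their enrollments are made up for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical district: 40 schools, each with 100 to 300 students.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_{i:03d}" for i in range(rng.randint(100, 300))]
    for s in range(40)
}

# Step 1: randomly select whole clusters (schools), not individual students.
chosen_schools = rng.sample(sorted(schools), 10)

# Step 2: include every student in each chosen school.
cluster_sample = [
    student for school in chosen_schools for student in schools[school]
]

print(len(chosen_schools), "schools,", len(cluster_sample), "students")
```

Notice that randomness enters only when choosing schools. That is what makes this cheaper than sampling students scattered across all 40 schools.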

Systematic Sample

Select every kth member of the population (e.g., every 10th person on a list), ideally starting from a randomly chosen point.

Best when: You have an ordered list and want even coverage

Example: Survey every 5th customer who enters a store
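A short sketch of systematic sampling, assuming 100 customers numbered in arrival order. The random starting point is a common refinement added here as an assumption, and `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return items[start::k]

rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Hypothetical customers, numbered in the order they entered the store.
customers = list(range(1, 101))
picked = systematic_sample(customers, 5, rng)

print(picked)
```

With 100 customers and k = 5, this always yields 20 customers spaced exactly 5 apart; only the starting customer changes from run to run.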

Convenience Sample

Sample whoever is easiest to reach.

Problem: BIASED! Not representative of population

Example: Surveying only your friends, or only people in the library

Voluntary Response Sample

People choose whether to participate (self-selection).

Problem: VERY BIASED! People with strong opinions respond

Example: Online polls, call-in surveys, Yelp reviews

The Two Big Sampling Mistakes to Avoid

Convenience Sampling: Sampling whoever is easy to reach creates bias. Just because 90% of people AT THE GYM support more fitness funding doesn't mean 90% of ALL people do!
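Voluntary Response: When people self-select into a survey, you mostly hear from those with strong opinions. Customers with an ordinary, middle-of-the-road experience rarely bother to leave a review, so online ratings over-represent the extremes and can be a poor guide to the typical customer!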
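The voluntary response problem can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the ratings, their mix in the population, and the response probabilities in `respond_prob` are invented, not measured.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical population: satisfaction ratings from 1 to 5, mostly moderate.
population = rng.choices([1, 2, 3, 4, 5], weights=[5, 15, 40, 30, 10], k=20_000)

# Assumed chance that a person with each rating bothers to respond:
# people at the extremes respond far more often than people in the middle.
respond_prob = {1: 0.30, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.20}
respondents = [r for r in population if rng.random() < respond_prob[r]]

def extreme_share(ratings):
    """Fraction of ratings that are at either extreme (1 or 5)."""
    return sum(r in (1, 5) for r in ratings) / len(ratings)

print(f"extreme ratings in population:  {extreme_share(population):.0%}")
print(f"extreme ratings among replies:  {extreme_share(respondents):.0%}")
```

Under these assumptions, extreme ratings are a small minority of the population but a much larger share of the replies, which is the distortion self-selection creates.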

Evaluate These Scenarios

Read each scenario and identify the problems before reading the analysis.

Scenario 1: A news website posts a poll asking "Should taxes be raised?" Visitors can click Yes or No. 10,000 people respond, with 78% saying No. The website reports "78% of Americans oppose tax increases."

Problems:

Voluntary response bias: Visitors choose whether to click, so people with strong opinions about taxes are over-represented
Undercoverage: The site's visitors aren't representative of all Americans, and people who never visit the site can't be sampled at all
Question wording: Could be leading depending on context

Better approach: Random digit dialing to sample all Americans, with neutral question wording

Scenario 2: A professor wants to know if students like the new textbook. She asks the 5 students who always sit in the front row. All 5 say they love it, so she concludes the textbook is great.

Problems:

Convenience sample: Only asking easy-to-reach students (front row)
Sample too small: Only 5 students out of the whole class
Selection bias: Front-row students may be more engaged and positive

Better approach: Randomly select 30-40 students from the entire class roster

Scenario 3: A gym wants to survey members about satisfaction. They email all 2,000 members. Only 150 respond (7.5% response rate). Of those, about 85% say they're satisfied.

Problems:

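Non-response bias: Only 7.5% of members responded, and the people who responded may differ from those who didn't
Unknown direction of bias: Respondents could skew satisfied (engaged members who like the gym) or dissatisfied (members with complaints), and the data alone can't tell us which
Missing voices: The 92.5% who didn't respond might tell a different story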

Better approach: Stratified random sampling with follow-up calls/texts to increase response rate to 60%+
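A quick calculation shows how little the gym's result pins down. This sketch uses the numbers from Scenario 3 and treats the 85% as a rounded share (85% of 150 is not a whole number of people); the two extreme cases are assumptions used only to bound the answer.

```python
members = 2000
respondents = 150
satisfied_share = 0.85  # share of respondents who said they were satisfied

response_rate = respondents / members
satisfied_respondents = satisfied_share * respondents
non_respondents = members - respondents

# Worst case: every member who didn't respond is dissatisfied.
lower = satisfied_respondents / members

# Best case: every member who didn't respond is satisfied.
upper = (satisfied_respondents + non_respondents) / members

print(f"response rate: {response_rate:.1%}")                        # 7.5%
print(f"possible overall satisfaction: {lower:.1%} to {upper:.1%}")  # about 6.4% to 98.9%
```

With 92.5% of members silent, true satisfaction could fall almost anywhere between roughly 6% and 99%. Raising the response rate narrows that range.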

Scenario 4: A school wants to know if students support a uniform policy. They survey students during lunch in the cafeteria by approaching tables and asking everyone there.

Problems:

Convenience sample: Only reaches students who eat in the cafeteria (what about students who leave campus or eat elsewhere?)
Time bias: If only one lunch period is surveyed, groups that eat at other times are missed
Peer pressure: Answering in front of friends may influence responses

Better approach: Stratified random sample of students from all grades, with anonymous survey during homeroom/advisory

Quick Reference: Comparing Methods

Method	Can Establish Causation?	Cost	Best For
Observational Study	No - only association	Low to Medium	Studying things as they naturally occur
Survey	No - only association	Low	Collecting opinions, self-reported behaviors
Experiment	Yes - can show causation!	High	Testing cause-and-effect relationships

Key Takeaways

Experiments are the ONLY method that can establish causation (because of random assignment and control)
Observational studies and surveys can only show association/correlation
Random sampling is crucial for unbiased, representative data
Convenience samples and voluntary response are almost always biased—avoid them!
Population = entire group of interest; Sample = subset you actually study
Bad data collection = bad conclusions, no matter how good your analysis is

Next Lesson

In Lesson 4, you'll learn about data visualization—how to create and interpret graphs, and how visuals can be misleading!