Collecting Data
How you collect data determines what questions you can answer — bad collection equals bad conclusions
📌 Before You Start
Prerequisites: Modules 1–4. You should have a research question and basic sense of your design.
Estimated time: ~45 minutes including the survey design exercise.
What you need: Pen and paper or a Google Doc. Optional: Google Forms account.
By the end of this module you will be able to design a survey, plan an interview, understand sampling methods, and identify common sources of data bias.
💡 The Big Idea
How you collect data determines what questions you can answer. Bad data collection leads to bad conclusions, and no amount of sophisticated analysis can fix data that was collected carelessly, with bias, or unethically.
🔍 Deep Dive
Surveys and Questionnaires
Surveys are the most common data collection method in social science research. Done well, they can efficiently gather data from hundreds or thousands of people. Done poorly, they produce misleading results.
Writing good survey questions:
- One idea per question. Never ask two things at once.
- Use plain, neutral language. Avoid loaded words that push the respondent toward an answer.
- Avoid double negatives. ("Don't you agree that not exercising is unhealthy?" — confusing.)
- Provide a balanced range of response options.
- Pilot test your survey on 3–5 people before distributing widely.
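Some of the rules above can be turned into a rough automated first pass. The sketch below is only a keyword check, not a substitute for pilot testing: the function name `flag_question` and the phrase lists are illustrative assumptions, and a flag means "look again," not "this question is wrong."

```python
import re

# Hypothetical phrase lists for illustration; a real checklist would be longer.
LEADING_PHRASES = ("don't you think", "don't you agree", "wouldn't you say")
NEGATIONS = ("not", "no", "never", "don't", "doesn't", "isn't", "aren't", "won't")


def flag_question(question: str) -> list[str]:
    """Return possible wording problems found by simple keyword checks."""
    text = question.lower().replace("’", "'")  # normalize curly apostrophes
    words = re.findall(r"[a-z']+", text)
    flags = []

    # Phrases that push the respondent toward agreeing.
    if any(phrase in text for phrase in LEADING_PHRASES):
        flags.append("leading")
    # "and" often joins two topics; it can also be harmless, so this is a hint.
    if "and" in words:
        flags.append("possibly double-barreled")
    # Two or more negation words in one question are hard to parse.
    if sum(word in NEGATIONS for word in words) >= 2:
        flags.append("possible double negative")
    return flags


if __name__ == "__main__":
    example = "Don't you agree that not exercising is unhealthy?"
    print(flag_question(example))
```

Run on the double-negative example from the list, the check reports it as both leading and a possible double negative. A clean result does not prove a question is good; it only means none of these particular keywords appeared.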
❌ Poorly written questions
"Don't you think that frequent social media use is harmful to your health and your grades?"
Problems: Leading ("don't you think"), double-barreled (two topics: health AND grades)
✅ Better versions
"How many hours per day do you typically spend on social media?"
"On a scale of 1–5, how much do you feel social media use affects your academic performance?"
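Likert scales are the most common format for measuring attitudes and opinions. Respondents rate a statement on an ordered scale, typically five points running from "Strongly disagree" to "Strongly agree."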
Open vs. closed questions:
| Type | Example | Best for |
|---|---|---|
| Closed | "How many hours per day do you sleep? (under 5 / 5–6 / 7–8 / over 8)" | Quantitative analysis. Easy to compare across respondents. |
| Open | "Describe what a typical night of sleep looks like for you." | Qualitative depth. Captures nuance you wouldn't have predicted. |
Response biases to watch for:
- Social desirability bias: People answer how they think they should behave, not how they actually behave. (Asked "Do you exercise regularly?", more people say yes than actually do.)
- Acquiescence bias: Some people tend to agree with everything. To catch this, mix positively and negatively worded items so that agreeing with every statement produces contradictory answers.
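One common way to handle mixed-direction items is reverse-coding: before averaging, flip the scores of the negatively worded items so that a high number always means the same thing. The sketch below assumes a 1 to 5 agreement scale; the item names (`worry`, `secure`) and the `agree_threshold` cutoff are made up for illustration.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = Strongly disagree, 5 = Strongly agree


def reverse_code(score: int) -> int:
    """Flip a score on the scale: 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - score


def scale_score(responses: dict[str, int], reversed_items: set[str]) -> float:
    """Average the items after flipping the reverse-worded ones."""
    adjusted = [
        reverse_code(score) if item in reversed_items else score
        for item, score in responses.items()
    ]
    return sum(adjusted) / len(adjusted)


def agrees_with_everything(responses: dict[str, int], agree_threshold: int = 4) -> bool:
    """True if every raw answer is at or above the 'agree' end of the scale."""
    return all(score >= agree_threshold for score in responses.values())


if __name__ == "__main__":
    # Hypothetical items: "worry" = "I often worry about money",
    # "secure" = "I feel financially secure" (worded in the opposite direction).
    reversed_items = {"secure"}
    consistent = {"worry": 5, "secure": 1}
    yea_sayer = {"worry": 5, "secure": 5}

    for name, answers in [("consistent", consistent), ("yea-sayer", yea_sayer)]:
        print(name, scale_score(answers, reversed_items), agrees_with_everything(answers))
```

When the scale contains items worded in both directions, a respondent who agrees with all of them has given answers that contradict each other, which is a signal worth reviewing rather than proof of careless responding.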
Free tools: Google Forms and SurveyMonkey (free tier) are excellent for creating and distributing surveys. Google Forms automatically produces summary charts.
Interviews
Interviews gather rich, detailed data through direct conversation. They are the primary tool of qualitative research.
| Type | What it is | Best for |
|---|---|---|
| Structured | Fixed questions asked in the same order to every participant. Like a spoken survey. | Comparing responses across many participants. |
| Semi-Structured | Guide questions with flexibility to probe deeper based on responses. | Most common in qualitative research. Balances consistency with depth. |
| Unstructured | A conversation with minimal predetermined questions. Follows the participant's lead. | Exploratory research. Understanding an experience you know little about. |
Ask open-ended questions that invite a story rather than a yes/no answer.
Instead of: "Do you feel stressed about money?"
Try: "Can you tell me about a time when financial stress affected your daily life?"
Recording and transcribing: Always ask for permission before recording. Most researchers transcribe recordings verbatim for analysis. Tools like Otter.ai or Google Docs voice typing can help.
Observation
Sometimes the best data comes from watching what people actually do, not what they say they do.
| Type | What it means |
|---|---|
| Participant Observation | The researcher joins the group being studied. Common in ethnography. Rich data, but risks researcher influence. |
| Non-Participant Observation | The researcher observes from the outside without joining. Less influence on the setting. |
Field notes are the record of your observations. Good field notes include: what happened (description), when and where, the physical setting, participants' behaviors, direct quotes when possible, and your own reflections and interpretations (kept separate from description).
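If you keep field notes digitally, a small record structure can enforce the separation between description and interpretation. This is one possible layout, not a standard format: the `FieldNote` class, its field names, and the sample note are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FieldNote:
    when: datetime
    where: str
    setting: str       # the physical setting
    description: str   # what happened, including participants' behaviors
    quotes: list[str] = field(default_factory=list)       # direct quotes
    reflections: list[str] = field(default_factory=list)  # your interpretations

    def observed(self) -> str:
        """Return only what was seen and heard, without interpretation."""
        lines = [self.description] + [f'Quote: "{q}"' for q in self.quotes]
        return "\n".join(lines)


# A made-up example note.
note = FieldNote(
    when=datetime(2024, 3, 5, 12, 30),
    where="Campus dining hall",
    setting="Lunch rush, most tables full",
    description="Three students compared prices before ordering.",
    quotes=["I'll just get the fries."],
    reflections=["Cost may be driving meal choices."],
)

if __name__ == "__main__":
    print(note.observed())
    print("Reflections:", note.reflections)
```

Storing reflections in their own field means you can later reread the raw observations without your first impressions mixed in.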
Existing Data
You do not always need to collect new data. Enormous public datasets are available free of charge:
- data.census.gov — U.S. Census data on demographics, income, education
- data.gov — U.S. government datasets across many agencies
- WHO Global Health Observatory — International health data
- World Bank Open Data — Economic and development data by country
- ICPSR (icpsr.umich.edu) — Thousands of social science datasets
Sampling: Who Do You Study?
You almost never study an entire population (everyone who fits your criteria). Instead, you study a sample — a subset — and use it to make inferences about the population.
| Sampling Method | How it works | Strengths / Weaknesses |
|---|---|---|
| Random Sampling | Every member of the population has an equal chance of being selected. | Gold standard for representativeness. Difficult to implement in practice. |
| Stratified Sampling | Divide population into subgroups (strata), then randomly sample from each. | Ensures representation of key subgroups. More complex to execute. |
| Convenience Sampling | Sample whoever is easiest to reach (classmates, volunteers, social media followers). | Easy and cheap. Often biased — not representative of the broader population. |
| Purposive Sampling | Intentionally select participants who meet specific criteria relevant to your question. | Appropriate for qualitative research. Not intended to be representative. |
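The first three methods in the table can be compared with a short simulation. The sketch below uses a hypothetical population of 1,000 students (800 undergraduates, 200 graduates) stored so that undergraduates come first; the function names and the 10% sampling fraction are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical population: 1,000 students, 800 undergraduates then 200 graduates.
population = [
    {"id": i, "level": "undergrad" if i < 800 else "grad"}
    for i in range(1000)
]


def simple_random_sample(people, n, rng):
    """Every person has the same chance of being selected."""
    return rng.sample(people, n)


def stratified_sample(people, stratum_of, fraction, rng):
    """Randomly sample the same fraction from each subgroup (stratum)."""
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


def convenience_sample(people, n):
    """Take whoever comes first in the list, e.g. the easiest people to reach."""
    return people[:n]


def share_of_grads(sample):
    """Fraction of the sample who are graduate students."""
    return sum(p["level"] == "grad" for p in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    print("population: ", share_of_grads(population))
    print("random:     ", share_of_grads(simple_random_sample(population, 100, rng)))
    print("stratified: ", share_of_grads(
        stratified_sample(population, lambda p: p["level"], 0.10, rng)))
    print("convenience:", share_of_grads(convenience_sample(population, 100)))
```

In this setup the stratified sample matches the population's 20% graduate share exactly, the simple random sample lands near it by chance, and the convenience sample contains no graduate students at all. Taking a larger convenience sample from the front of the list would not help until it grew past the 800 undergraduates, which illustrates why sample size alone does not fix a biased method.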
📋 Real Example: Designing a Financial Stress Survey
Research question: "Do college students feel financially stressed, and does financial stress affect their academic performance?"
Here are five survey questions. The first is shown in a poorly written version and then fixed; the other four are done well:
❌ Q1 (Poor): "Do you worry excessively about money all the time and does it affect your grades?"
Problems: Double-barreled (two topics: worry AND grades), loaded language ("excessively"), and absolute wording ("all the time").
✅ Q1 (Fixed): "How often do you worry about money? (Never / Rarely / Sometimes / Often / Always)"
✅ Q2: "In the past month, how often did financial concerns prevent you from focusing on studying? (Never / 1–2 times / 3–5 times / More than 5 times)"
✅ Q3: "On a scale of 1 (Not at all stressed) to 5 (Extremely stressed), how financially stressed do you feel right now?"
✅ Q4: "Have you ever missed a class or assignment deadline because you needed to work? (Yes / No)"
✅ Q5 (Open): "In your own words, describe how your financial situation has affected your experience as a student this semester." (Optional)
Notice: One idea per question. Neutral language. Balanced scales. The open question invites depth without forcing it.
🖐️ Your Turn
What you need: Pen and paper or a Google Doc. About 15–20 minutes.
Using your research question from Module 2, design a short 5-question survey. For each question:
- Write the question.
- Identify the type (closed / open / Likert scale).
- Explain briefly how you avoided one potential source of bias in that question.
Bonus: What sampling method would you use to recruit participants for your study? Who is your target population, and how would you reach a representative sample?
You will use this survey in the Module 8 capstone as your proposed data collection method.
🧠 Brain Break — 2 Minutes
Have you ever answered a survey that felt off?
Maybe the questions seemed leading, the response options didn't fit your experience, or you found yourself answering what you thought the researcher wanted to hear rather than the truth.
That feeling is your research intuition activating. Trust it — and design your own surveys so that others don't feel that way.
✅ Key Takeaways
- Good surveys have one idea per question, use neutral language, and avoid leading or double-barreled questions.
- Likert scales measure attitudes; open questions capture depth. Use both strategically.
- Interviews range from structured to unstructured. Semi-structured interviews are the most common in qualitative research.
- Random sampling gives the most representative results; convenience sampling is easy but often biased.
- A large sample size does not compensate for a biased sampling method.
- Never underestimate existing data — census data, government databases, and published datasets can answer many research questions without collecting a single new data point.
🎯 Module 5 Complete!
You now know how to design data collection. In Module 6, you will learn how to make sense of the data once you have it.