Module 1 Study Guide
Introduction to Statistics & Data
Free Statistics Learning Platform • Safaa Dabagh
1. What is Statistics?
Statistics: The science of collecting, organizing, analyzing, interpreting, and presenting data to help make informed decisions based on evidence.
Two Branches of Statistics
Descriptive Statistics
Purpose: Summarize and describe data you've already collected
What it does: Uses numbers, graphs, and tables to paint a picture of your data
Example: "The average test score in our class was 82%"
Think of it as: "What happened?"
Inferential Statistics
Purpose: Make predictions or generalizations about a larger population based on a sample
What it does: Uses probability and mathematical techniques to draw conclusions beyond your data
Example: "Based on polling 1,000 voters, we predict 52% of all voters will support Candidate A"
Think of it as: "What can we conclude or predict?"
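To make the polling example concrete, here is a minimal sketch of how uncertainty could be attached to the 52% estimate. The margin-of-error formula and the 1.96 multiplier are a standard textbook approximation for a simple random sample, not something this module covers in detail, so treat them as assumptions.

```python
import math

# Numbers taken from the polling example above.
sample_size = 1000
sample_proportion = 0.52  # share of polled voters supporting Candidate A

# Standard error of a sample proportion (assumes a simple random sample).
standard_error = math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)

# 1.96 is the usual multiplier for an approximate 95% confidence level.
margin_of_error = 1.96 * standard_error

low = sample_proportion - margin_of_error
high = sample_proportion + margin_of_error
print(f"Estimate: {sample_proportion:.0%}, plausible range: {low:.1%} to {high:.1%}")
# → Estimate: 52%, plausible range: 48.9% to 55.1%
```

The range dips below 50%, so under these assumptions the poll alone does not settle whether Candidate A has majority support.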
CRITICAL CONCEPT: Correlation ≠ Causation
Just because two things are related doesn't mean one causes the other!
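A small simulation can show how two things end up related without either causing the other. In this sketch, the variables (temperature, ice cream sales, pool visits) and all the numbers are made up for illustration, and `pearson_r` is a hand-written helper for the usual correlation coefficient.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(1)  # fixed seed so the simulation is repeatable

# Hypothetical lurking variable: daily temperature in °F.
temperatures = [random.uniform(50, 100) for _ in range(365)]

# Both outcomes depend on temperature plus independent noise.
# Neither one is computed from the other.
ice_cream_sales = [2.0 * t + random.gauss(0, 10) for t in temperatures]
pool_visits = [1.5 * t + random.gauss(0, 10) for t in temperatures]

r = pearson_r(ice_cream_sales, pool_visits)
print(f"Correlation between ice cream sales and pool visits: {r:.2f}")
```

The correlation comes out strongly positive even though the code never lets ice cream sales influence pool visits. The shared dependence on temperature produces the relationship.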
2. Types of Data
Main Categories
| Quantitative (Numerical) | Qualitative (Categorical) |
|---|---|
| Data measured with numbers. Can do math operations. | Data that describes characteristics. Categories/labels. |
| Examples: Height, age, weight, temperature, test scores | Examples: Eye color, gender, major, zip code, car type |
Quantitative Data Subtypes
| Discrete | Continuous |
|---|---|
| Only specific values (usually whole numbers). You count it. | Any value in a range (including decimals). You measure it. |
| Examples: Number of students (25, 26, 27), dice rolls, cars in lot | Examples: Height (5'8.2"), weight (150.3 lbs), temperature (72.5°F) |
Levels of Measurement
| Level | Description | Examples |
|---|---|---|
| Nominal | Categories with no order | Eye color, zip codes, student ID |
| Ordinal | Categories with a meaningful order, but unequal or unknown gaps between them | Class rank (1st, 2nd, 3rd), satisfaction (low, medium, high) |
| Interval | Equal intervals, NO true zero | Temperature (°F, °C), IQ scores, calendar years |
| Ratio | Equal intervals + true zero | Height, weight, age, income (zero = none) |
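One way to see why a true zero matters is to check whether a ratio survives a change of units. The sketch below uses made-up values (80°F vs. 40°F, 70 in vs. 35 in) and two small conversion helpers written for this example.

```python
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32) * 5 / 9

def inches_to_cm(length_in):
    return length_in * 2.54

# Interval data: the ratio changes when the unit changes,
# so "twice as hot" is not a meaningful statement.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(f"80°F vs 40°F: ratio {ratio_f:.1f} in °F, {ratio_c:.1f} in °C")
# → 80°F vs 40°F: ratio 2.0 in °F, 6.0 in °C

# Ratio data: the ratio is the same in any unit,
# so "twice as tall" is meaningful.
ratio_in = 70 / 35
ratio_cm = inches_to_cm(70) / inches_to_cm(35)
print(f"70 in vs 35 in: ratio {ratio_in:.1f} in inches, {ratio_cm:.1f} in cm")
# → 70 in vs 35 in: ratio 2.0 in inches, 2.0 in cm
```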
3. Data Collection Methods
Three Main Methods
| Method | Description | Can Establish Causation? |
|---|---|---|
| Observational Study | Observe without interfering | No - only association |
| Survey | Ask people questions | No - only association |
| Experiment | Manipulate variables with random assignment | YES - can show causation! |
Only EXPERIMENTS with random assignment can establish cause-and-effect relationships!
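Random assignment means chance alone decides who gets the treatment. A minimal sketch, assuming a hypothetical list of 20 participant IDs and a helper named `randomly_assign` written for this example:

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs.
participants = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(participants, seed=42)
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Because the shuffle ignores everything about the participants, other characteristics (age, motivation, health) tend to balance out across the two groups.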
Sampling Methods
| Method | Description | Quality |
|---|---|---|
| Simple Random | Every member has an equal chance of selection | Good |
| Stratified | Divide into groups, randomly sample from each | Good |
| Cluster | Randomly select entire groups | Good |
| Systematic | Select every kth member of an ordered list, starting from a random point | Usually good |
| Convenience | Sample whoever is easy to reach | BIASED |
| Voluntary Response | People self-select to participate | VERY BIASED |
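The four probability-based methods in the table can be sketched with Python's `random` module. The population here (100 students across 4 class years), the group sizes, and the sample sizes are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so results are repeatable

# Hypothetical population: 100 students, each in one of 4 class years.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
population = [(f"S{i:03d}", years[i % 4]) for i in range(100)]

# Simple random: every student has an equal chance of selection.
simple_random = rng.sample(population, 20)

# Stratified: split by class year, then randomly sample 5 from each year.
strata = {year: [s for s in population if s[1] == year] for year in years}
stratified = [s for group in strata.values() for s in rng.sample(group, 5)]

# Cluster: split into 10 groups of 10 (e.g. classrooms), pick 2 whole groups.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [s for group in rng.sample(clusters, 2) for s in group]

# Systematic: random starting point, then every k-th student.
k = 5
start = rng.randrange(k)
systematic = population[start::k]

for name, sample in [("Simple random", simple_random), ("Stratified", stratified),
                     ("Cluster", cluster_sample), ("Systematic", systematic)]:
    print(f"{name}: {len(sample)} students")
```

Convenience and voluntary response samples have no counterpart here because they involve no random selection step, which is the source of their bias.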
Key Terms
Population: The entire group you want to study
Sample: A subset of the population that you actually collect data from
Parameter: A numerical summary of the population (usually unknown)
Statistic: A numerical summary of the sample (what we calculate)
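The difference between a parameter and a statistic is easier to see with a simulated population, where (unlike in real life) the parameter can be computed directly. The heights below are randomly generated and purely hypothetical.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: heights (in inches) of 10,000 adults.
population = [rng.gauss(67, 4) for _ in range(10_000)]

# Parameter: describes the whole population. In practice it is usually unknown.
parameter = statistics.mean(population)

# Statistic: describes only the sample we actually collected.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The two values should be close but not identical. A different random sample would give a slightly different statistic, while the parameter stays fixed.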
4. Data Visualization
Choosing the Right Graph
| Graph Type | Data Type | Purpose |
|---|---|---|
| Bar Chart | Categorical | Compare categories (bars have gaps) |
| Histogram | Continuous numerical | Show distribution (bars touch!) |
| Pie Chart | Categorical (percentages) | Show parts of whole (adds to 100%) |
| Line Graph | Time series | Show trends over time |
| Scatterplot | Two quantitative variables | Show relationship between variables |
Bar Chart vs. Histogram:
• Bar charts: Bars have GAPS (categorical data)
• Histograms: Bars TOUCH (continuous data)
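The gap-versus-touch rule follows from the data behind each graph. This sketch builds the counts for both using only the standard library and prints rough text versions. The eye colors, heights, and the 5-inch bin width are made-up values.

```python
from collections import Counter

# Categorical data -> bar chart: one bar per category, order is arbitrary.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]
category_counts = Counter(eye_colors)

# Continuous data -> histogram: one bar per adjacent numeric bin.
# Each bin includes its left edge and excludes its right edge.
heights_in = [61.2, 63.5, 64.1, 66.8, 67.0, 67.9, 68.3, 70.4, 71.6, 74.2]
bin_width = 5
bin_start = 60
bin_counts = Counter(
    bin_start + bin_width * int((h - bin_start) // bin_width) for h in heights_in
)

print("Bar chart data:")
for color, count in category_counts.items():
    print(f"  {color:<6} {'#' * count}")

print("Histogram data:")
for left in sorted(bin_counts):
    print(f"  {left}-{left + bin_width} in  {'#' * bin_counts[left]}")
```

The histogram bins share edges (60-65, 65-70, 70-75), so their bars touch. The eye color categories have no numeric edges to share, so their bars are drawn with gaps.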
Misleading Graph Techniques (Watch Out!)
- Truncated Y-Axis: Starting a bar chart's axis above zero exaggerates differences
- Cherry-Picking Time Periods: Showing only part of the data to tell a specific story
- 3D Effects: Distorting proportions with unnecessary visual effects
- Inappropriate Graph Type: Using the wrong graph type makes comparisons difficult or misleading
- Dual Y-Axes Manipulation: Scaling the two axes so that unrelated trends appear correlated
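The truncated y-axis effect comes down to simple arithmetic: a bar's drawn height is its value minus the point where the axis starts. A quick sketch with made-up sales figures (50 and 55 units) and a hypothetical helper `apparent_ratio`:

```python
def apparent_ratio(smaller, larger, axis_start=0):
    """How many times taller the larger bar looks, given where the axis starts."""
    return (larger - axis_start) / (smaller - axis_start)

# Hypothetical values: two products with sales of 50 and 55 units.
honest = apparent_ratio(50, 55, axis_start=0)
truncated = apparent_ratio(50, 55, axis_start=48)

print(f"Axis starts at 0:  larger bar looks {honest:.1f}x as tall")
print(f"Axis starts at 48: larger bar looks {truncated:.1f}x as tall")
# → Axis starts at 0:  larger bar looks 1.1x as tall
# → Axis starts at 48: larger bar looks 3.5x as tall
```

A 10% difference in the data looks like a 250% difference once the axis starts at 48.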
Best Practices
- Start bar charts at zero
- Label everything clearly (axes, units, title, legend)
- Use the appropriate graph type for your data
- Keep it simple - avoid unnecessary decorations
- Show full context - don't cherry-pick
- Include data source and sample size
Quick Reference: Key Formulas & Concepts
Mean (Average)
Mean = (Sum of all values) ÷ (Number of values)
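As a worked example of the formula, using four made-up test scores:

```python
# Hypothetical test scores for a small class.
scores = [78, 85, 92, 73]

mean = sum(scores) / len(scores)  # (sum of all values) ÷ (number of values)
print(f"Mean score: {mean}")  # → Mean score: 82.0
```

Here the sum is 328 and there are 4 values, so the mean is 328 ÷ 4 = 82.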
Important Reminders
1. Correlation ≠ Causation
2. Only experiments with random assignment can establish causation
3. Convenience and voluntary response samples are biased
4. Bar chart bars have gaps; histogram bars touch
5. Always question: "How was this data collected?"
Free Statistics Learning Platform • Safaa Dabagh • sdabagh.github.io
© 2025 • Part of UCLA Dissertation Research