Wednesday, February 25, 2026

Histograms

I. Intro

A histogram is a type of data visualization that shows the distribution of a continuous variable. While it looks similar to a bar chart, it serves a different purpose: instead of comparing discrete categories (like "Apples" vs. "Oranges"), it groups numerical data into ranges called bins. It is essentially a visual representation of a frequency table for continuous data.


The Horizontal Axis (x-axis): The Bins
This shows the variable you measured (for example, test scores, heights, or ages). The axis is divided into contiguous groups called classes, intervals, or bins that together cover the range of the data. These bins are drawn as adjacent bars, usually of equal width (e.g., intervals of 10 units). The bars touch because there is no gap between the end of one numerical range and the start of the next (the bars in a bar chart, by contrast, usually do not touch).

The Vertical Axis (y-axis): The Frequency
The y-axis represents the count (or frequency): how many data points from your set fall into each bin. It can show either Frequency (the raw count) or Relative Frequency (the count as a fraction or percentage of the total). The shape of the graph is identical whichever you choose, because every bar is scaled by the same factor.
  • Formula: Relative Frequency = Frequency / Total Number of Data Values
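As a quick illustration of the formula, here is a minimal Python sketch. The function name relative_frequencies and the bin counts are made up for this example; they are not part of any data set in this post.

```python
def relative_frequencies(counts):
    """Convert raw bin counts into fractions of the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute relative frequencies of an empty data set")
    return [count / total for count in counts]


# Hypothetical bin counts, for illustration only.
counts = [2, 5, 3]
rel = relative_frequencies(counts)
print(rel)  # [0.2, 0.5, 0.3]
```

Every count is divided by the same total, so the bars keep their proportions. That is why the histogram has the same shape on either scale.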


II. Constructing a Histogram

1. Sort Data 
  • Sort your data in ascending order. This identifies the range and makes counting easier. 
2. Define Bins
  • Decide how many bins to use. Depending on the distribution of the data and the goals of the analysis, different numbers of bins (and therefore different bin widths) may be appropriate. There are various useful guidelines and rules of thumb that we will go over below.
  • Once you know the number of bins, you need to determine the bin width. To ensure your bars are consistent, calculate the width of each interval:
    • Bin Width = (Maximum Value - Minimum Value) / Number of Bins. Round to a convenient value as needed.
3. Create a Frequency Table
  • List your intervals and tally how many data points fall into each. Add a relative frequency column if your histogram displays relative frequency.

4. Draw and Label the Axes
  • Horizontal Axis (X-axis): Label this with what the data represents (e.g., "Height in Inches" or "Test Scores") and mark the bin boundaries.
  • Vertical Axis (Y-axis): Label this as "Frequency" or "Relative Frequency." Ensure the scale starts at zero to avoid distorting the data.
5. Draw the Bars
  • The Height: Matches the frequency of that bin. 
  • The Width: Spans the entire interval on the X-axis.
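Steps 1 to 3 can be sketched in a few lines of Python. This is only one possible approach, not a standard routine: the helper name build_frequency_table and the sample values are hypothetical, and it uses the unrounded bin width, starting the first bin at the minimum value.

```python
def build_frequency_table(data, num_bins):
    """Return a list of (lower_edge, upper_edge, count) for equal-width bins."""
    if not data or num_bins < 1:
        raise ValueError("need at least one data value and one bin")

    ordered = sorted(data)                      # Step 1: sort the data
    lowest, highest = ordered[0], ordered[-1]

    width = (highest - lowest) / num_bins       # Step 2: bin width
    if width == 0:
        width = 1  # all values identical; fall back to an arbitrary unit width

    counts = [0] * num_bins
    for value in ordered:                       # Step 3: tally each value
        index = int((value - lowest) / width)
        index = min(index, num_bins - 1)        # the maximum goes in the last bin
        counts[index] += 1

    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(num_bins)
    ]


# Hypothetical measurements, for illustration only.
sample = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]
table = build_frequency_table(sample, 3)
for lower, upper, count in table:
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

Each bin includes its lower edge and excludes its upper edge, except the last bin, which also includes the maximum. Whatever convention you pick, apply it consistently so no value is counted twice.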

Example:
A manufacturer of AAA batteries had their quality control department test the lifespan of their batteries. Forty-two batteries were randomly selected and tested, with the number of hours they lasted listed below.

The Raw Data Set: 
108 125 137 110 167 158 142
168 163 121 134 146 135 163
148 153 169 154 156 142 160
147 119 124 145 167 161 155
138 126 149 168 151 129 157
115 124 165 152 159 144 163

Step 1: Sort Data

First, we take the scattered numbers and arrange them in ascending order. This allows us to easily find the Minimum (108) and Maximum (169) values to determine the spread. 

Sorted
108, 110, 115, 119, 121, 124, 124, 125, 126, 129, 134, 135, 137, 138, 142, 142, 144, 145, 146, 147, 148, 149, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 163, 163, 163, 165, 167, 167, 168, 168, 169


Step 2: Define Bins

We need to group these into intervals (bins). Let's aim for 7 bins.

Range = Largest value - Smallest value = 169 - 108 = 61

Bin Width = Range / Number of Bins = 61 / 7 ≈ 8.7

To keep it simple for the graph, we’ll round the bin width up to 10 and start our first bin at 100. This gives seven bins: 100–109, 110–119, and so on up to 160–169.
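The arithmetic for this step can be checked with a short Python sketch. Rounding to a multiple of 10 is an assumption made here to match the choices in this example; other data may call for a different rounding rule.

```python
import math

minimum, maximum = 108, 169   # smallest and largest values in the sorted data
num_bins = 7

data_range = maximum - minimum       # 169 - 108 = 61
raw_width = data_range / num_bins    # 61 / 7, about 8.7

# Assumption: round the width up, and the starting point down, to a multiple of 10.
bin_width = math.ceil(raw_width / 10) * 10
first_edge = math.floor(minimum / 10) * 10

edges = [first_edge + i * bin_width for i in range(num_bins + 1)]

print(round(raw_width, 1))  # 8.7
print(edges)                # [100, 110, 120, 130, 140, 150, 160, 170]

# Sanity check: the bins must cover every data value.
assert edges[0] <= minimum and maximum < edges[-1]
```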

Step 3: Create a Frequency Table

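Now we tally the sorted data into our defined intervals.

Bin (Hours)    Frequency
100–109        1
110–119        3
120–129        6
130–139        4
140–149        8
150–159        9
160–169        11
Total          42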
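The tally can also be done programmatically. Here is a minimal sketch using the battery data above. The variable names are chosen for this example, and it assumes bins of width 10 starting at 100, as decided in Step 2.

```python
from collections import Counter

# Battery lifespans in hours, copied from the raw data set above.
battery_hours = [
    108, 125, 137, 110, 167, 158, 142,
    168, 163, 121, 134, 146, 135, 163,
    148, 153, 169, 154, 156, 142, 160,
    147, 119, 124, 145, 167, 161, 155,
    138, 126, 149, 168, 151, 129, 157,
    115, 124, 165, 152, 159, 144, 163,
]

first_edge, bin_width, num_bins = 100, 10, 7

# Integer division maps each value to the index of its bin (0 for 100-109, and so on).
tally = Counter((hours - first_edge) // bin_width for hours in battery_hours)
counts = [tally[i] for i in range(num_bins)]

for i, count in enumerate(counts):
    lower = first_edge + i * bin_width
    upper = lower + bin_width - 1
    share = count / len(battery_hours)  # relative frequency
    print(f"{lower}-{upper}: {count:2d}  ({share:.1%})")

print(counts)  # [1, 3, 6, 4, 8, 9, 11]
```

The counts add up to 42, which is a handy check that no battery was missed or counted twice.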





Step 4: Draw and Label the Axes

  • Horizontal Axis (x-axis): Label as "Time (Hours)" and mark the bin boundaries (100 to 170).
  • Vertical Axis (y-axis): Label as "Number of Batteries" and scale it from 0 to 12.

Step 5: Draw the Bars

Each bar is drawn to the height of its bin's frequency. Notice that the bars touch each other, indicating a continuous scale of data. The highest frequency occurs in the 160–169 bin with 11 values.
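For a rough preview of the shape without any plotting tools, the bars can be printed as text. This is only a sketch: the function name render_histogram is made up for this example, the frequencies come from tallying the battery data into bins of width 10, and the bars run horizontally rather than vertically.

```python
bin_labels = ["100-109", "110-119", "120-129", "130-139",
              "140-149", "150-159", "160-169"]
frequencies = [1, 3, 6, 4, 8, 9, 11]  # counts from tallying the battery data


def render_histogram(labels, counts, symbol="#"):
    """Return one text line per bin, with a bar as long as the bin's count."""
    lines = [f"{label} | {symbol * count} {count}"
             for label, count in zip(labels, counts)]
    return "\n".join(lines)


print(render_histogram(bin_labels, frequencies))

# Find the bin with the highest frequency.
tallest = max(range(len(frequencies)), key=frequencies.__getitem__)
print(bin_labels[tallest])  # 160-169
```

The text version already shows the key feature of this data set: the counts pile up toward the longer lifespans, with the tallest bar at 160-169 hours.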











Gemini AI
