Given a numerical data set, a histogram represents the data from the
point of view of its distribution. The data set is first divided into
ranges of values. The histogram then shows the ranges on the abscissa (x axis)
and, on the y axis, how many values fall into each range.
In R, a histogram can be built with the function hist.
hist(x, ...)
## Default S3 method:
hist(x, breaks = "Sturges",
     freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = NULL, border = NULL,
     main = paste("Histogram of" , xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, warn.unused = TRUE, ...)
The most interesting thing about this syntax is that a single argument,
x (a vector of values), is enough for hist to produce a histogram.
The form hist(x, ...) means that x is required, while the optional "..." stands for further arguments.
The code below:
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29)
hist(x)
will generate a histogram.
From the result we see that hist automatically divided the data into 5 ranges.
For each range it calculates the frequency:
- for 0-10 frequency=3, there are 3 numbers in this range (7,9,5)
- for 10-20 frequency=2, there are 2 numbers in this range (12,14)
- etc
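The frequencies listed above can also be read directly from the object that hist returns. The following is a minimal sketch, assuming only base R; plot = FALSE suppresses the drawing, and the variable name h is chosen here purely for illustration.

```r
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29)
# plot = FALSE returns the histogram object without drawing it
h <- hist(x, plot = FALSE)
h$breaks   # boundaries of the ranges: 0 10 20 30 40 50
h$counts   # frequency per range: 3 2 3 4 1
```

Each range includes its upper boundary (right = TRUE in the signature), so a value of exactly 10 would be counted in the 0-10 range.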
In contrast to this simplicity, the signature ends with "...)", which means hist can still accept further arguments, typically the usual graphical parameters.
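As an illustration of such extra arguments, the sketch below passes two standard graphical parameters, las and cex.axis, through "...". The particular values are arbitrary choices made for this example, not something the histogram requires.

```r
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29)
# las = 1 draws the axis tick labels horizontally,
# cex.axis = 0.8 shrinks them slightly; both are handed on via "..."
h <- hist(x, las = 1, cex.axis = 0.8)
```

The bars are the same as with plain hist(x); only the appearance of the axes changes.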
The x axis label is taken automatically from the name of the input vector.
The y axis label is automatically set to "Frequency".
The code below sets the axis labels explicitly:
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29)
hist(x, xlab = "Years", ylab="Number of entrepreneurs" )
This will generate the same histogram with the given axis labels.
Whatever the number of values in the input vector x, the histogram gets only about five automatically generated labels on the x axis. For example, the code below:
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82)
hist(x)
We see there are still about 5 x axis labels, but the values are now divided into 9 ranges (0 to 90 in steps of 10).
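How the ranges are chosen can be inspected without looking at the plot. The sketch below assumes the default breaks = "Sturges": nclass.Sturges suggests a number of ranges from the length of x, and hist then rounds to "pretty" break points, so the number of ranges actually used may differ from the suggestion. The variable names n_suggested and h are chosen here for illustration.

```r
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82)
# suggestion from Sturges' rule: ceiling(log2(length(x)) + 1)
n_suggested <- nclass.Sturges(x)
# compute the histogram without drawing it
h <- hist(x, plot = FALSE)
n_suggested        # suggested number of ranges
h$breaks           # break points actually used
length(h$counts)   # number of ranges actually used
```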
As another example related to the ranges and the number of autogenerated x axis labels, let's run the code:
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82, 123,435,562,577,788,987,675,561,998,889,8)
hist(x)
This will generate a histogram that again has only a handful of x axis labels and, this time, only 5 ranges, despite the fact that the x vector holds many more input values.
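If the automatic choice is too coarse, the ranges can be set explicitly through the breaks argument listed in the signature. The following sketch assumes a width of 100 per range, picked here only for illustration. A vector of break points is used as given, whereas a single number would only be treated as a suggestion.

```r
x <- c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82,
       123,435,562,577,788,987,675,561,998,889,8)
# explicit break points: 0, 100, 200, ..., 1000
h <- hist(x, breaks = seq(0, 1000, by = 100))
length(h$counts)   # number of ranges now used
```

The break points must cover the whole range of x, otherwise hist stops with an error.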