Histograms in R, explained. Part I.

Given a numerical data set, a histogram represents the data from the
point of view of its distribution. The data set is first divided into
ranges of values. The histogram then shows the ranges on the abscissa (x axis)
and, on the y axis, how many values fall in each range.
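
As a rough illustration of this idea, the counting behind a histogram can be sketched by hand with cut() and table(). The sample values and the range boundaries below are assumptions chosen only for this example:

```r
# Hypothetical sample values, chosen only for illustration.
values <- c(3, 8, 12, 17, 25, 28)

# Assumed range boundaries: 0-10, 10-20, 20-30.
boundaries <- c(0, 10, 20, 30)

# cut() assigns each value to a range (right-closed by default),
# table() then counts how many values fall in each range.
ranges <- cut(values, breaks = boundaries)
counts <- table(ranges)
print(counts)
```

Each count is what a histogram would draw as the height of the bar for that range.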

A histogram in R can be built with the function hist.

hist(x, ...)
## Default S3 method:
hist(x, breaks = "Sturges",
     freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = NULL, border = NULL,
     main = paste("Histogram of" , xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, warn.unused = TRUE, ...)

The most interesting thing about this syntax is that hist needs just
one argument, x (a vector of values), to produce a plot.
In hist(x, ...), x is required, and the optional “...” stands for other arguments.
The code below:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29)
hist(x)

will generate:

This result shows that hist automatically divides the data into 5 ranges.
For each range it calculates the frequency:

  • for 0-10 frequency=3, there are 3 numbers in this range (7,9,5)
  • for 10-20 frequency=2, there are 2 numbers in this range (12,14)
  • etc
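
The ranges and frequencies can also be checked without reading them off the plot. This is a minimal sketch that relies on hist returning its computed values when called with plot = FALSE; the variable name h is introduced here only for illustration:

```r
x <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29)

# plot = FALSE returns the computed histogram without drawing it.
h <- hist(x, plot = FALSE)

print(h$breaks)  # range boundaries: 0 10 20 30 40 50
print(h$counts)  # frequency per range: 3 2 3 4 1

# The values counted in the first range (0-10).
print(x[x > 0 & x <= 10])
```

With the default right = TRUE, a value that falls exactly on a boundary is counted in the range to its left.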

In contrast with this simplicity, the signature ends with “...)”, which means hist can still accept more arguments, typically the usual graphical parameters.
The x axis label is automatically taken from the name of the input vector.
The y axis label is automatically set to “Frequency”.
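
To see where the default x axis label comes from, the sketch below passes a vector with a different name and inspects the xname element of the returned object. The vector name years is an assumption made only for this example:

```r
# Same values as before, stored under a hypothetical name.
years <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(years, plot = FALSE)

# xname holds the name of the argument; it is the default for xlab.
print(h$xname)

# The default title is built the same way as in the signature above.
default_main <- paste("Histogram of", h$xname)
print(default_main)
```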
Related to this, the code below specifies the axis labels explicitly:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29)
hist(x, xlab = "Years", ylab="Number of entrepreneurs" )

This will generate:

Whatever the number of values in the input vector x, hist generates only a handful of labels on the x axis (about five), for example with the code below:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82)
hist(x)

The x axis still shows only about five labels, but the values are now divided into 9 ranges (0 to 90 in steps of 10).

Here is another example related to ranges and the number of autogenerated x axis labels; let’s run the code:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82, 123,435,562,577,788,987,675,561,998,889,8)
hist(x)

This will generate:

This time the x axis again has only about five labels, and the data is divided into just 5 ranges, despite the fact that the x vector has many more input values.
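
One way to see where the ranges come from: with the default breaks = "Sturges", hist takes a suggested number of classes from the length of x and then rounds the boundaries to "pretty" values. The sketch below reproduces this by hand, assuming the default hist method behaves as documented; the names x3, n_classes and candidate are introduced here only for illustration:

```r
x3 <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29, 53, 76, 87, 82,
        123, 435, 562, 577, 788, 987, 675, 561, 998, 889, 8)

# Sturges' rule: suggested number of classes from the number of values.
n_classes <- nclass.Sturges(x3)   # ceiling(log2(length(x3)) + 1)
print(n_classes)

# pretty() turns the suggestion into round boundaries covering range(x3).
candidate <- pretty(range(x3), n = n_classes, min.n = 1)
print(candidate)

# Compare with the boundaries hist actually chooses.
h <- hist(x3, plot = FALSE)
print(h$breaks)
print(h$counts)
```

The number of classes is only a suggestion, so the final number of ranges depends on how the boundaries round, not directly on how many values x contains.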