Histograms in R, explained. Part I.

Given a numerical data set, a histogram represents the data from the
point of view of its distribution. The data set is first divided into
ranges of values. The histogram then shows the ranges on the abscissa
(x axis) and, on the y axis, how many values fall into each range.
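
To make the idea concrete, here is a minimal sketch that divides a small vector into ranges and counts the values by hand, before any plotting. The vector values and the break points are made up for illustration only; cut and table are base R functions.

```r
# Hypothetical data, chosen only for illustration
values <- c(3, 8, 12, 17, 25, 28, 29)

# Assumed break points: three ranges of width 10
breaks <- c(0, 10, 20, 30)

# cut() assigns each value to a range, table() counts the values per range
bins <- cut(values, breaks = breaks)
counts <- table(bins)
print(counts)
```

A histogram is essentially a bar chart of these counts, with the ranges laid out along the x axis.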

A histogram in R can be built with the function hist.

hist(x, ...)
# S3 method for default
hist(x, breaks = "Sturges",
     freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = NULL, border = NULL,
     main = paste("Histogram of", xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, warn.unused = TRUE, ...)

The most interesting thing about this syntax is that the hist function needs
just one argument, x (a vector of values), to produce a result.
In hist(x, ...), x is required and the optional “...” stands for other arguments.
For example, the code below:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29)
hist(x)

will generate a histogram.

In the result, hist has automatically divided the data into 5 ranges.
For each range it calculates the frequency:

  • for 0-10 frequency=3, there are 3 numbers in this range (7,9,5)
  • for 10-20 frequency=2, there are 2 numbers in this range (12,14)
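  • for 20-30 frequency=3, there are 3 numbers in this range (21,22,29)
  • for 30-40 frequency=4, there are 4 numbers in this range (35,31,37,33)
  • for 40-50 frequency=1, there is 1 number in this range (42)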
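
The same frequencies can be read directly from the object that hist returns. This is a small sketch: plot = FALSE (an argument listed in the signature above) suppresses the drawing, and the returned list has breaks and counts components.

```r
x <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, plot = FALSE)

h$breaks  # boundaries of the ranges: 0 10 20 30 40 50
h$counts  # number of values per range: 3 2 3 4 1
```

With the default right = TRUE, a value that sits exactly on a boundary, for example 10, is counted in the lower range (0-10).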

In contrast to this simplicity, the syntax ends with “...)”, which means hist can accept more arguments, such as the usual graphical parameters.
The x axis label is automatically taken from the name of the input vector.
The y axis label is automatically set to “Frequency”.
The code below specifies the axis labels explicitly:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29)
hist(x, xlab = "Years", ylab="Number of entrepreneurs" )

This will generate the same histogram, now with “Years” on the x axis and “Number of entrepreneurs” on the y axis.
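
Other arguments from the signature can be set in the same way. The sketch below changes the title and the bar colours; main, col and border all appear in the hist signature, while the title text and the colour names are arbitrary choices of mine.

```r
x <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29)

# The title text and colours are arbitrary, illustrative values.
# hist also returns the histogram object (invisibly), kept here in h.
h <- hist(x,
          main = "Entrepreneurs by years of activity",
          xlab = "Years",
          ylab = "Number of entrepreneurs",
          col = "lightblue",
          border = "white")
```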

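However many values the input vector x contains, hist keeps the number of automatically generated labels on the x axis small, about 5 in these examples. The labels are chosen by R's axis routine and do not have to match the number of ranges. For example, with the code below: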

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82)
hist(x)

There are still 5 labels on the x axis, but the values are now divided into 9 ranges of width 10, from 0 to 90.
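
The way the ranges are chosen can be checked from the console. As far as I understand the default method, breaks = "Sturges" only suggests a number of classes, and pretty() then rounds the break points to convenient values, so the final number of ranges may differ from the suggestion. A sketch under that assumption:

```r
x <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29, 53, 76, 87, 82)

# Suggested number of classes from Sturges' rule: ceiling(log2(n) + 1)
n_suggested <- nclass.Sturges(x)

# Break points after rounding to "pretty" values
brk <- pretty(range(x), n = n_suggested)

# The break points hist actually uses, for comparison
h <- hist(x, plot = FALSE)
n_ranges <- length(h$breaks) - 1

n_suggested
brk
n_ranges
```

Comparing n_suggested with n_ranges shows how far the rounding moved the final number of ranges away from the suggestion.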

Here is another example of how the ranges and the automatically generated x axis labels behave. Let’s run this code:

x <-  c(7,9,12,21,5,35,31,22,14,42,37,33,29,53,76,87,82, 123,435,562,577,788,987,675,561,998,889,8)
hist(x)

This will generate a histogram with only a handful of x axis labels and just 5 ranges, even though the x vector now contains many more input values.
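
If 5 ranges are too coarse for this data, the breaks argument from the signature can be set explicitly. This is a minimal sketch; the width of 100 and the suggestion of 20 classes are arbitrary choices for illustration.

```r
x <- c(7, 9, 12, 21, 5, 35, 31, 22, 14, 42, 37, 33, 29, 53, 76, 87, 82,
       123, 435, 562, 577, 788, 987, 675, 561, 998, 889, 8)

# Explicit break points: 10 ranges of width 100 (an arbitrary choice)
h <- hist(x, breaks = seq(0, 1000, by = 100), plot = FALSE)
h$counts

# A single number is only a suggestion; hist rounds the breaks via pretty()
h20 <- hist(x, breaks = 20, plot = FALSE)
length(h20$breaks) - 1
```

A vector of break points is used exactly as given, as long as it covers the whole range of x.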

Installing R version 4.0.2 on Windows

  • Go to https://www.r-project.org/
  • Click on the link “download R” or “CRAN mirror”
  • Click on the link for an appropriate mirror
  • In the “Download and Install R” section click the link “Download R for Windows”
  • Under “Subdirectories:” click on the link “base”
  • Click on the link “Download R 4.0.2 for Windows”, save the file “R-4.0.2-win.exe” on your PC
  • Double click on “R-4.0.2-win.exe”
  • Select the language, for example English
  • Read the GNU General Public License, click Next
  • Select the Destination Location, for example C:\Program Files\R\R-4.0.2, then click Next
  • Select the components to install. As my PC is 64-bit, I chose “Core Files”, “64-bit Files” and “Message translations”. I did not check “32-bit Files”, because “64-bit Files” is enough for my use on a 64-bit PC. Click Next
  • For “Startup options” choose No (accept defaults) for the moment and click Next
  • For “Start Menu Folder” accept “R”, click Next
  • On the “Select Additional Tasks” page, tick the tasks you want and click Next
  • When the installation ends, click Finish

To start R:

  • click on the shortcut “R x64 4.0.2” on the desktop
  • or click Start > R > “R x64 4.0.2”
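
Once the R console is open, a quick check that the installation works could look like the sketch below. The small test vector is made up; on the installation described above, the version string should mention 4.0.2.

```r
# Print the version string of the running R session
R.version.string

# Draw a quick histogram to confirm that graphics work (made-up values)
h <- hist(c(1, 2, 2, 3, 3, 3))
```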