Using the statistical package ‘R’ for data analysis

I have recently taken some online courses in the statistical package called ‘R’, which is freely available as open-source software. After a little effort, I was able to install it on my MacBook. It creates beautiful plots and is a versatile and powerful aid to understanding large data sets.

After installing RStudio and the necessary packages, I was able to get to work on understanding it. I decided to start with the free online courses on edX. Data Science: R Basics is an introduction to the nuts and bolts of R, including data frames, data types, function calls, conditionals and for loops. Data Science: Visualization is an introduction to ggplot2 and its features, and explains how to select and summarize data from a data frame according to logical conditions.
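To give a flavour of those basics, here is a minimal sketch in base R. It is not taken from the course material: the `people` data frame, its values and the `age_group` function are all made up for illustration.

```r
# Hypothetical toy data frame; the names and values are invented for illustration.
people <- data.frame(
  name   = c("Ann", "Bob", "Cara", "Dev"),
  age    = c(34, 12, 58, 9),
  height = c(165, 150, 172, 131),
  stringsAsFactors = FALSE
)

# Data types: each column is a vector with its own class.
name_class <- class(people$name)
age_class  <- class(people$age)

# A function call wrapping a conditional.
age_group <- function(age) {
  if (age < 18) "child" else "adult"
}

# A for loop that fills a new column one row at a time.
people$group <- character(nrow(people))
for (i in seq_len(nrow(people))) {
  people$group[i] <- age_group(people$age[i])
}

# Select rows by a logical condition, then summarize them.
adults <- people[people$group == "adult", ]
mean_adult_height <- mean(adults$height)
```

In everyday R the loop would usually be replaced by a vectorized call such as `ifelse(people$age < 18, "child", "adult")`; the loop is shown only because for loops are one of the topics listed above.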

I thoroughly recommend the courses on edX, as they track your progress unit by unit and include videos and explanatory material.

[Figure: A scatter plot of life expectancy versus fertility by region.]
[Figure: Stacked density plot of income by region for 1970 and 2010.]

Above are some sample plots made from data sets containing life expectancy, fertility and income statistics. Such data sets are available free of charge as add-on packages, as is R itself.
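A scatter plot of this kind can be sketched in a few lines of ggplot2. The code below is an illustration under assumptions rather than the exact code behind the figure: it assumes the data come from the `gapminder` data frame in the `dslabs` add-on package, and the choice of the year 2010 is mine.

```r
library(ggplot2)
library(dslabs)  # assumed data source; provides the `gapminder` data frame

# Keep a single year and drop rows with missing values
# (selecting rows by logical conditions).
gap_2010 <- subset(
  gapminder,
  year == 2010 & !is.na(fertility) & !is.na(life_expectancy)
)

# Scatter plot of life expectancy versus fertility, coloured by region.
p <- ggplot(gap_2010, aes(x = fertility, y = life_expectancy, colour = region)) +
  geom_point() +
  labs(x = "Fertility (children per woman)", y = "Life expectancy (years)")

print(p)
```

Both packages can be installed from CRAN with `install.packages(c("ggplot2", "dslabs"))`.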

Here is one plot that I made after a few days studying the online material.

[Figure: The heights of boys and girls in the USA by age – a stacked density plot using the NHANES data.]