R Tutorials by : Er Karan Arora : Founder & CEO - ITRONIX SOLUTIONS

What is R Programming Language?

R is an open-source programming language that many data analysts, data scientists, and statisticians use to analyze data and perform statistical analysis with graphs and other forms of visualization.

  • With the help of R, one can perform various statistical operations.
  • You can obtain it for free from the website www.r-project.org.
  • It is driven from the command line.
  • Each command is executed when the user enters it at the prompt.
  • Packages are available for free from CRAN (the Comprehensive R Archive Network), which hosts over 10,000 R packages.

Introduction to R Programming

R is a programming language and software environment for statistical analysis, graphics, and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for operating systems such as Linux, Windows, and Mac. The language was named R after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka). It is one of the most popular languages used by statisticians, data analysts, researchers, and marketers to retrieve, clean, analyze, visualize, and present data.

R is widely used by data scientists and by major corporations such as Google, Airbnb, Facebook, Twitter, Microsoft, and Uber for data analysis. This is a complete course on R for beginners, covering everything from the basics to advanced topics such as machine learning algorithms, linear regression, time series, and statistical inference.

Features of R Programming

  • R is a comprehensive programming language that provides support for procedural programming involving functions as well as object-oriented programming with generic functions.
  • There are more than 10,000 packages in the repository of R programming. With these packages, one can make use of functions to facilitate easier programming.
  • Because R is an interpreted language, R code is machine-independent and portable. Interpretation also makes it easier to debug errors in the code.
  • R facilitates complex operations with vectors, arrays, data frames as well as other data objects that have varying sizes.
  • R provides robust facilities for data handling and storage.
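The points above about vectors, data frames, and generic functions can be sketched in a few lines of base R. This is only a minimal illustration: the variable names and the describe() generic are hypothetical and were made up for this example.

```r
# Vectorized arithmetic: the division applies to every element at once.
heights_cm <- c(150, 165, 180)
heights_m <- heights_cm / 100
print(heights_m)

# A data frame holds columns of different types side by side.
people <- data.frame(
  name = c("Asha", "Ben", "Chen"),
  height_m = heights_m,
  stringsAsFactors = FALSE
)
print(nrow(people))

# A generic function (hypothetical name) that dispatches on the class of x.
describe <- function(x, ...) UseMethod("describe")
describe.numeric <- function(x, ...) paste("numeric vector of length", length(x))
describe.character <- function(x, ...) paste("character vector of length", length(x))
describe.default <- function(x, ...) "some other object"

print(describe(heights_m))
print(describe(people$name))
```

The same call, describe(), runs different code depending on the class of its argument, which is what "object-oriented programming with generic functions" refers to.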

Advantages of R Programming

The main benefits of the R language are listed below:

1. Open Source: R is an open-source programming language. This means that anyone can work with R without any need for a license or a fee. Furthermore, you can contribute to the development of R by customizing its packages, developing new ones, and resolving issues.

2. Exemplary Support for Data Wrangling: R provides exemplary support for data wrangling. Packages such as dplyr and readr are capable of transforming messy data into a structured form.

3. The Array of Packages: R has a vast array of packages. The CRAN repository holds over 10,000 packages, and the number is constantly growing. These packages cater to all areas of industry.

4. Quality Plotting and Graphing: R facilitates quality plotting and graphing. Popular libraries such as ggplot2 and plotly produce aesthetic and visually appealing graphs that set R apart from other programming languages.

5. Platform Independent: R is a cross-platform programming language, meaning that it runs on Windows, Linux, and Mac.

6. Machine Learning Operations: R provides various facilities for carrying out machine learning operations such as classification and regression, and also provides features for developing artificial neural networks.

7. Statistics: R is widely known as the lingua franca of statistics. This is the main reason why R is dominant among programming languages for developing statistical tools.

8. Continuously Growing: R is a constantly evolving, state-of-the-art programming language that receives updates whenever new features are added.

Disadvantages of R Programming

1. Weak Origin: R shares its origin with a much older programming language, S. As a result, base R has little built-in support for dynamic or 3D graphics, although packages such as plotly help fill the gap.

2. Data Handling: R stores objects in physical memory and generally needs the entire data set to be loaded into memory at once. It can also use more memory than Python for comparable work. Therefore, it is not an ideal option when dealing with Big Data.

3. Basic Security: R lacks basic security features, which are an essential part of most programming languages like Python. Because of this, R has several restrictions, and embedding it into a web application is difficult.

4. Complicated Language: R has a steep learning curve, so people without prior programming experience may find it difficult to learn.

5. Lesser Speed: The R language and its packages can be much slower than other languages such as MATLAB and Python.

The algorithms in R are spread across different packages. Programmers without prior knowledge of packages may find it difficult to implement algorithms.

Real-Life Use Cases of R Language

To appreciate what R can do, it helps to see how people and companies are actually using the R programming language.

  • Facebook – Facebook uses R to analyze status updates and its social network graph. It also uses R to predict colleague interactions.
  • Ford Motor Company – Ford relies on Hadoop. It also relies on R for statistical analysis as well as carrying out data-driven support for decision making.
  • Google – Google uses R to calculate ROI on advertising campaigns and to predict the economic activity and also to improve the efficiency of online advertising.
  • Foursquare – R is an important part of the stack behind Foursquare’s famed recommendation engine.
  • John Deere – Statisticians at John Deere use R for time series modeling and also geospatial analysis in a reliable and reproducible way. The results are then integrated with Excel and SAP.
  • Microsoft – Microsoft uses R for the Xbox matchmaking service and also as a statistical engine within the Azure ML framework.
  • Mozilla – Mozilla, the foundation behind the Firefox web browser, uses R to visualize web activity.
  • New York Times – R is used in the news cycle at The New York Times to crunch data and prepare graphics before they go for printing.
  • Thomas Cook – Thomas Cook uses R for prediction and also Fuzzy Logic Systems to automate price settings of their last-minute offers.
  • National Weather Service – The National Weather Service uses R at its River Forecast Centers, where it generates graphics for flood forecasting.
  • Twitter – R is part of Twitter’s Data Science toolbox for sophisticated statistical modeling.
  • Trulia – Trulia, the real-estate analysis website, uses R for predicting house prices and local crime rates.
  • ANZ Bank – ANZ, the fourth largest bank in Australia, uses R for its credit risk analysis.

Let's Start Coding in R

In [1]:
print("Hello World")
[1] "Hello World"

Comments in R

  • R does not support multi-line comments
In [2]:
# This is an example of Single Line Comment
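Since there is no block-comment syntax, a longer note is usually written as several consecutive # lines. The short sketch below shows that pattern along with a trailing comment on a line of code; the variable name total is just an illustration.

```r
# A longer explanation is written as several consecutive lines,
# each of which starts with its own '#' character.
total <- 2 + 3  # a comment can also follow code on the same line
print(total)
```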

Basic Types

  • Logical :: TRUE, FALSE
  • Numeric :: 12.3, 5, 999
  • Integer :: 2L, 34L, 0L
  • Complex :: 3 + 2i
  • Character :: 'a', "good", "TRUE", '23.4'
  • Raw :: "Hello" is stored as 48 65 6c 6c 6f

A decimal value such as 3.15 is called a numeric. A whole number written with the L suffix, such as 4L, is an integer; a plain 4 without the suffix is stored as a numeric (double). Integers are also numerics. TRUE and FALSE are Boolean values called logicals. Values inside " " or ' ' are text (strings) and are called characters.
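The difference between a plain whole number and one written with the L suffix is easy to check at the prompt. The following sketch uses only base R functions; the variable names are illustrative.

```r
x_double <- 4     # no suffix: stored as a double
x_integer <- 4L   # L suffix: stored as an integer

print(typeof(x_double))      # "double"
print(typeof(x_integer))     # "integer"
print(class(x_double))       # "numeric"
print(class(x_integer))      # "integer"

# Integers still count as numeric values.
print(is.numeric(x_integer)) # TRUE

# Quoting a number turns it into text.
print(class("3.15"))         # "character"
```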

First way to declare a variable: use the <-

name_of_variable <- value

Second way to declare a variable: use the =

name_of_variable = value

In [19]:
# Numeric
x <- 28
print(x)
class(x)
[1] 28
'numeric'
In [17]:
# Numeric
x = 28
print(x)
class(x)
[1] 28
'numeric'
In [9]:
# String
y <- "Itronix Solutions"
print(y)
class(y)
[1] "Itronix Solutions"
'character'
In [10]:
# Boolean
z <- TRUE
print(z)
class(z)
[1] TRUE
'logical'
In [21]:
# Complex
v <- 2+5i
print(v)
print(class(v))
[1] 2+5i
[1] "complex"
In [23]:
# Raw
v <- charToRaw("Hello")
print(v)
print(class(v))
[1] 48 65 6c 6c 6f
[1] "raw"

Finding Variables

To list all the variables currently available in the workspace, use the ls() function. ls() can also take a pattern to match variable names.

In [24]:
print(ls())
[1] "a" "v" "x" "y" "z"
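Pattern matching with ls() can be tried as follows. To keep the example independent of whatever is already in your workspace, this sketch creates a separate environment (demo_env, a name chosen for this example) and lists its contents; in everyday use you would simply call ls(pattern = "...") on the workspace.

```r
# Build a small, isolated environment with known variables.
demo_env <- new.env()
assign("sales_2023", 100, envir = demo_env)
assign("sales_2024", 120, envir = demo_env)
assign("region", "north", envir = demo_env)
assign(".hidden", 1, envir = demo_env)

# All visible names (names starting with a dot are hidden by default).
print(ls(envir = demo_env))

# Only the names that match a regular expression.
print(ls(envir = demo_env, pattern = "^sales"))

# all.names = TRUE also lists names that start with a dot.
print(".hidden" %in% ls(envir = demo_env, all.names = TRUE))
```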

Deleting Variables

Variables can be deleted by using the rm() function

In [29]:
a="hello"
print(a)
rm(a)
print(a)
[1] "hello"
Error in print(a): object 'a' not found
Traceback:

1. print(a)
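To avoid the error shown above, you can check whether a variable still exists before using it. The sketch below assumes two throwaway variables (temp_greeting and temp_count, names made up for this example) and uses the base functions exists() and rm().

```r
temp_greeting <- "hello"
temp_count <- 42

print(exists("temp_greeting"))  # TRUE
rm(temp_greeting)
print(exists("temp_greeting"))  # FALSE

# rm() also accepts a character vector of names through its list argument.
rm(list = c("temp_count"))
print(exists("temp_count"))     # FALSE

# rm(list = ls()) would remove every variable in the workspace, so use it with care.
```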

Advanced Print Methods

In [31]:
# We can use the print() function
print("Hello World!")

# Quotes can be suppressed in the output
print("Hello World!", quote = FALSE)

# If there is more than one item, we can concatenate them using paste()
print(paste("How","are","you?"))
[1] "Hello World!"
[1] Hello World!
[1] "How are you?"
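Besides print() and paste(), base R offers a few other ways to build and display text. The sketch below shows cat(), paste0(), sprintf(), and the sep argument of paste(); the values used are invented for illustration.

```r
# cat() writes text without the [1] prefix or the quotes.
cat("Hello", "World!", "\n")

# paste0() joins items with no separator.
label <- paste0("version-", 4)
print(label)      # "version-4"

# paste() can use a custom separator.
dashed <- paste("a", "b", "c", sep = "-")
print(dashed)     # "a-b-c"

# sprintf() fills placeholders: %s for text, %d for integers, %.1f for one decimal.
summary_text <- sprintf("%s scored %d points (%.1f%%)", "Asha", 42L, 87.456)
print(summary_text)  # "Asha scored 42 points (87.5%)"
```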
In [1]:
typeof(5)
typeof(45L)
typeof('Itronix')
typeof(TRUE)
typeof(3+2i)
'double'
'integer'
'character'
'logical'
'complex'

Read input from user by using readline() function

In [37]:
a=readline()
print(a)
hello
[1] "hello"
In [40]:
a=readline(prompt="Enter name: ")
print(a)
print(class(a))
typeof(a)
Enter name: Itronix Solutions
[1] "Itronix Solutions"
[1] "character"
'character'
In [44]:
a=readline(prompt="Enter Integer: ")
b=readline(prompt="Enter Float: ")
c=readline(prompt="Enter String: ")
typeof(a)
typeof(b)
typeof(c)
Enter Integer: 34
Enter Float: 3.643
Enter String: Itronix Solutions
'character'
'character'
'character'
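As the output above shows, readline() always returns a character value, whatever the user types. One more thing to keep in mind is that readline() only waits for input in an interactive session; when a script is run non-interactively it returns an empty string straight away. A minimal sketch of one way to handle this, using a hypothetical helper named ask_or_default():

```r
# Hypothetical helper: prompt only when a user is present, otherwise fall back.
ask_or_default <- function(prompt_text, default) {
  if (interactive()) {
    answer <- readline(prompt = prompt_text)
    # nzchar() is TRUE when the user typed at least one character.
    if (nzchar(answer)) {
      return(answer)
    }
  }
  default
}

user_name <- ask_or_default("Enter name: ", default = "guest")
print(user_name)
```

In an interactive session this prints whatever you type; in a non-interactive run it prints the default.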

Convert Character to Integer

In [46]:
a=readline(prompt="Enter Integer: ")
a=as.integer(a)
print(a)
typeof(a)
Enter Integer: 34
[1] 34
'integer'
In [51]:
a=as.integer(readline(prompt="Enter Integer: "))
print(a)
typeof(a)
Enter Integer: 12121
[1] 12121
'integer'
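Text that is not a valid number cannot be converted, and as.integer() then returns NA with a warning. The sketch below uses fixed strings in place of readline() so that it runs without user input; to_integer() is a hypothetical helper written for this example.

```r
# Hypothetical helper: convert text to an integer, giving NA for invalid input
# and hiding the coercion warning.
to_integer <- function(text) {
  suppressWarnings(as.integer(text))
}

print(to_integer("34"))     # 34
print(to_integer("abc"))    # NA

# is.na() lets you test for a failed conversion before using the value.
print(is.na(to_integer("abc")))  # TRUE

# as.numeric() keeps the fractional part; as.integer() drops it.
print(as.numeric("3.643"))  # 3.643
print(as.integer("3.643"))  # 3
```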