Loading a .csv file in R is straightforward.
All you need to do is call the read.csv() function and specify the path to the file.
house <- read.csv("C:/house.csv")
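A minimal sketch of the same call that runs on any machine. It assumes a small made-up table written to a temporary file; the column names price and rooms are illustrative and are not taken from the original house.csv.

```r
# Write a small illustrative CSV to a temporary file (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("price,rooms", "250000,3", "410000,5"), csv_path)

# Read it back; header = TRUE is the default for read.csv().
house <- read.csv(csv_path)
str(house)
```

The first line of the file is treated as the column header, so the result is a data frame with two rows.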
1. Data layer
2. Aesthetics layer
3. Geometry layer
4. Facet layer
5. Coordinate layer
6. Themes layer
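The six layers listed above map onto a ggplot2 call roughly as follows. This is only a sketch: it assumes the ggplot2 package is installed, uses the built-in mtcars data set, and the chosen columns and axis limits are illustrative.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

p <- ggplot(mtcars,                       # 1. data layer
            aes(x = wt, y = mpg)) +       # 2. aesthetics layer
  geom_point() +                          # 3. geometry layer
  facet_wrap(~ cyl) +                     # 4. facet layer
  coord_cartesian(xlim = c(1, 6)) +       # 5. coordinate layer
  theme_minimal()                         # 6. themes layer

print(p)
```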
R Markdown is a reporting tool available in R through the rmarkdown package. With R Markdown, you can create high-quality reports that combine your R code, its output, and narrative text.
The output format of an R Markdown document can be:
1. HTML
2. PDF
3. Word
1. mice
2. Amelia
3. missForest
4. Hmisc
5. mi
6. imputeR
Name some functions available in the "dplyr" package.
1. filter
2. select
3. mutate
4. arrange
5. count
Answer) Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.
Some packages used for data mining in R:
1. data.table: provides fast reading of large files.
2. rpart and caret: for machine learning models.
3. arules: for association rule learning.
4. ggplot2: provides various data visualization plots.
5. tm: to perform text mining.
6. forecast: provides functions for time series analysis.
Answer) Rattle is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modeled, builds both unsupervised and supervised machine learning models from the data, presents the performance of models graphically, and scores new datasets for deployment into production. A key feature is that all of your interactions through the graphical user interface are captured as an R script that can be readily executed in R independently of the Rattle interface.
Answer)
1. traceback()
2. debug()
3. browser()
4. trace()
5. recover()
Answer) This should be an easy one for data science job applicants. R is an open-source language and environment for statistical computing and analysis, or for our purposes, data science.
Answer) Again, this is an easy—but crucial—one to nail. For the most part, this can be demonstrated through any other code you might write for other R interview questions, but sometimes this is asked as a standalone. Some of the basic syntax for R that’s used most often might include:
# introduces a comment, as in many other languages. It tells the interpreter not to process the rest of the line, so it can be used to make code more readable by reminding future readers what blocks of code are intended to do.
Double quotes ("") operate as one might expect; they denote a string (character) data type in R.
<- is the conventional assignment operator, one of the quirks of R. The more familiar = also works for assignment in most contexts, but <- is what most R code uses. This is essential for anyone using R to know, so it is good to display your knowledge of it if the question comes up.
\ (the backslash, or reverse virgule) is the escape character in R. An escape character is used to "escape" the special meaning of certain characters and treat them literally, or to introduce special sequences such as \n for a newline.
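The syntax elements above can be seen together in a short sketch. The variable names are made up for illustration.

```r
# This whole line is a comment and is ignored by R.
greeting <- "Hello, R"            # <- assigns; quotes create a string
quoted <- "She said \"hi\""       # \" escapes a quote inside a string
two_lines <- "first\nsecond"      # \n is an escaped newline

cat(quoted, "\n")
cat(two_lines, "\n")
```

Using cat() rather than print() shows the strings with their escape sequences already interpreted.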
Answer) It’s important to be familiar with the advantages and disadvantages of certain languages and ecosystems. R is no exception.
Its open-source nature. This qualifies as both an advantage and disadvantage for various reasons, but being open source means it’s widely accessible, free to use, and extensible.
Its package ecosystem. The built-in functionality available via R packages means you don’t have to spend a ton of time reinventing the wheel as a data scientist.
Its graphical and statistical aptitude. By many people’s accounts, R’s graphing capabilities are unmatched.
Answer) Just as you should know what R does well, you should understand its failings.
Memory and performance.
In comparison to Python, R is often said to be the lesser language in terms of memory and performance.
This is disputable, and many think it’s no longer relevant as 64-bit systems dominate the marketplace.
Open-source. Being open-source has its disadvantages as well as its advantages. For one, no commercial vendor stands behind R, so there's no single source for support or quality control. This also means that the quality of packages developed for R varies.
Security. R was not built with security in mind, so it must rely on external resources to fill these gaps.
Answer) In just about any interview for a position that involves coding, companies will ask you to accomplish a specific task by actually writing code. Facebook and Google both do as much. Because it's difficult to predict what task an interviewer will set you, just be prepared to write "whiteboard code" on the fly.
Answer) This is another good opportunity to show that you know R, and you're not winging it. Unlike statically typed languages such as C, R doesn't ask users to declare a data type when assigning a variable. Instead, every value in R is an R data object. When you assign a variable in R, you assign it a data object, and that object's data type determines the data type of the variable. The most commonly used data objects include:
1. Vectors
2. Matrices
3. Lists
4. Arrays
5. Factors
6. Data frames
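A short sketch of the six data objects listed above, with made-up values. Note that no type is declared anywhere; each variable takes its class from the object assigned to it.

```r
v  <- c(1.5, 2.5, 3.5)                          # vector
m  <- matrix(1:6, nrow = 2)                     # matrix
l  <- list(name = "a", values = 1:3)            # list
a  <- array(1:24, dim = c(2, 3, 4))             # array
f  <- factor(c("low", "high", "low"))           # factor
df <- data.frame(id = 1:2, label = c("x", "y")) # data frame

# Report the (first) class of each object.
sapply(list(v = v, m = m, l = l, a = a, f = f, df = df),
       function(obj) class(obj)[1])
```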
Answer) This question is meant to gather a sense of your experiences in R. Simply think about some recent work you’ve done in R and explain the data objects you use most often. If you use arrays frequently, explain why and how you’ve used them.
Answer) This is a variant of the “advantages of R” question. Reasons to use R include its open-source nature and the fact that it’s a versatile tool for statistical plotting, analysis, and portrayal. Don’t be afraid to give some personal reasons as well. Maybe you simply love the assignment operator in R or feel that it’s more elegant than other languages—but always remember to explicate. You should be answering follow-up questions before they’re even asked.
Answer) As a user of R, you should be able to come up with some functions on the spot and describe them. Functions that save time and, as a result, money will always be something an interviewer likes to hear about.
Answer) A factor variable is a categorical variable whose values can be numeric or character strings. The most salient reason to use a factor is that R's statistical modeling functions treat it as categorical data rather than as numbers or free text. Another reason is that factors can be more memory efficient than character vectors, because each distinct level is stored once and the values are stored as integer codes.
Simply use the factor() function to create a factor variable.
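A minimal sketch of factor() in use, assuming some hypothetical survey responses. The explicit levels argument is optional and is used here only to fix the order of the categories.

```r
# Hypothetical survey responses stored as plain strings.
sizes <- c("small", "large", "medium", "small", "large")

# Convert to a factor with an explicit level order.
size_f <- factor(sizes, levels = c("small", "medium", "large"))

levels(size_f)      # the distinct categories
as.integer(size_f)  # the integer codes stored internally
table(size_f)       # counts per category
```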
Answer) The Factor data objects in R are used to store and process categorical data in R.
Answer) The command getwd() gives the current working directory in the R environment.
Answer) A valid variable name consists of letters, numbers, and the dot or underscore characters. It must start with a letter, or with a dot that is not followed by a number.
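One way to check these rules is to compare a candidate name with what make.names() returns, since that function leaves syntactically valid names unchanged. The helper is_valid_name and the candidate names below are hypothetical.

```r
# A name is syntactically valid if make.names() leaves it unchanged.
is_valid_name <- function(x) x == make.names(x)

candidates <- c("total_sales", ".hidden", "var.2", "2fast", ".2x", "_tmp")
data.frame(name = candidates, valid = is_valid_name(candidates))
```

The first three candidates pass. The last three fail because they start with a digit, a dot followed by a digit, and an underscore respectively.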
Answer) A matrix is always two-dimensional, as it has only rows and columns. An array can have any number of dimensions. For example, a 3x3x2 array can be viewed as 2 matrices, each of dimension 3×3.
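The difference is easy to see from dim(). A small sketch with illustrative values:

```r
m <- matrix(1:9, nrow = 3)            # 3 x 3 matrix
a <- array(1:18, dim = c(3, 3, 2))    # 3 x 3 x 2 array

dim(m)       # two dimensions
dim(a)       # three dimensions
a[, , 2]     # the second 3 x 3 slice of the array
```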
Answer) When two vectors of different lengths are involved in an operation, the elements of the shorter vector are reused to complete the operation. This is called element recycling. For example, if v1 <- c(4,1,0,6) and v2 <- c(2,4), then v1*v2 gives (8,4,0,24). The elements 2 and 4 are repeated.
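The example above can be run directly. The second expression is a sketch of what recycling does behind the scenes, using rep() to stretch the shorter vector by hand.

```r
v1 <- c(4, 1, 0, 6)
v2 <- c(2, 4)

# v2 is recycled to c(2, 4, 2, 4) before the element-wise product.
v1 * v2
identical(v1 * v2, v1 * rep(v2, length.out = length(v1)))
```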
Answer) Lazy evaluation of a function means that an argument is evaluated only if it is used inside the body of the function. If there is no reference to the argument in the body of the function, then it is simply ignored.
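A minimal sketch of this behaviour. The function double_it is hypothetical; its second argument would raise an error if R evaluated it, but it never does because the body does not refer to it.

```r
# `unused` is never referenced in the body, so it is never evaluated.
double_it <- function(x, unused) {
  x * 2
}

# The second argument would raise an error if R evaluated it eagerly.
result <- double_it(5, stop("this argument is never evaluated"))
result
```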
Answer) The package named “XML” is used to read and process the XML files.
Answer) The general expression to create a matrix in R is – matrix(data, nrow, ncol, byrow, dimnames)
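A sketch of that call with every argument filled in. The values and the row and column names are illustrative.

```r
m <- matrix(
  data = 1:6,
  nrow = 2,
  ncol = 3,
  byrow = TRUE,                                    # fill row by row
  dimnames = list(c("r1", "r2"), c("a", "b", "c"))
)
m
```

With byrow = TRUE the first row holds 1, 2, 3. With the default byrow = FALSE the values would fill column by column instead.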
Answer) In R the data objects can be converted from one form to another. For example, we can create a data frame by merging many lists. This involves a series of R commands to bring the data into the new format. This is called data reshaping.
Answer) It converts a list to a vector.
Answer) Using the function as.data.frame()
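A short sketch of both conversions. It assumes the list-to-vector answer above refers to unlist(), which the text does not name; the data is made up.

```r
scores <- list(math = 90, physics = 85)      # hypothetical data
score_vec <- unlist(scores)                  # named numeric vector

records <- list(id = 1:3, grade = c("A", "B", "A"))
records_df <- as.data.frame(records)         # one column per list element

score_vec
records_df
```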
Answer) It is used to apply the same function over the rows or columns of a matrix or array. For example, finding the mean of every row.
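Assuming the function in question is apply(), a row-wise and column-wise mean might look like this. The second argument selects the margin: 1 for rows, 2 for columns.

```r
m <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)
row_means <- apply(m, 1, mean)
col_means <- apply(m, 2, mean)

row_means
col_means
```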
Answer) ?NA
Answer) sd(x, na.rm=TRUE)
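A quick sketch of why na.rm matters, using a made-up vector with one missing value:

```r
x <- c(2, 4, NA, 6)

sd(x)                 # NA, because the missing value propagates
sd(x, na.rm = TRUE)   # standard deviation of 2, 4, 6
```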
Answer) setwd("Path")
Answer) “%%” gives the remainder of the division of the first vector with second while “%/%” gives the quotient of the division of the first vector with the second.
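Both operators work element by element. A small sketch with illustrative vectors:

```r
a <- c(17, 9, 20)
b <- c(5, 2, 6)

a %% b    # remainders: 2 1 2
a %/% b   # integer quotients: 3 4 3
```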
Answer) It finds the column that has the maximum value in each row.
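Assuming the function being described is max.col(), which the text does not name, a sketch looks like this. The ties.method argument is set to "first" so the result is deterministic; the default breaks ties at random.

```r
m <- matrix(c(1, 9, 3,
              7, 2, 5), nrow = 2, byrow = TRUE)

# Column index of the largest value in each row.
max.col(m, ties.method = "first")   # 2 1
```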
Answer) hist()
Answer) rm(x)
Answer) data(package = "MASS")
Answer) data(package = .packages(all.available = TRUE))
Ans) It is used to install an R package from a local directory by browsing and selecting the file.
Ans) The “next” statement in R programming language is useful when we want to skip the current iteration of a loop without terminating it.
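A minimal sketch of next inside a for loop. Odd values are skipped, but the loop itself keeps running.

```r
evens <- c()
for (i in 1:6) {
  if (i %% 2 == 1) next   # skip odd values, keep looping
  evens <- c(evens, i)
}
evens   # 2 4 6
```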
Two vectors X and Y are defined as follows: X <- c(3, 2, 4) and Y <- c(1, 2).
Ans) In R, when the vectors have different lengths, the elements of the shorter vector are recycled until every element of the longer vector has been multiplied. Because the length of X is not a multiple of the length of Y, R also issues a warning.
The result of X * Y will be:
3 4 4
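The sketch below reproduces this case and also captures the warning R raises when the longer length is not a multiple of the shorter one. The name warning_text is made up for illustration.

```r
X <- c(3, 2, 4)
Y <- c(1, 2)

# Capture the recycling warning instead of printing it.
warning_text <- NULL
Z <- withCallingHandlers(
  X * Y,
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

Z              # 3 4 4
warning_text
```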
Answer) The CRAN package ecosystem has more than 6000 packages. The best way for beginners to answer this question is to mention that they would look for a package that follows good software development principles. The next thing would be to look for user reviews and find out if other data scientists or analysts have been able to solve a similar problem.
Answer) Transposing with t() is the easiest method for reshaping the data before analysis.
Answer) The with() function evaluates an expression using the columns of a given dataset, and the by() function applies a function to each level of a factor.
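A short sketch of both functions on the built-in mtcars data set. The choice of the mpg and cyl columns is illustrative.

```r
# with(): refer to columns of mtcars without the mtcars$ prefix.
overall_mean <- with(mtcars, mean(mpg))

# by(): apply mean() to mpg separately for each level of cyl.
mean_by_cyl <- by(mtcars$mpg, mtcars$cyl, mean)

overall_mean
mean_by_cyl
```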
The dplyr package is used to speed up data frame management code. Which package can be integrated with dplyr for large, fast tables?
Answer) data.table