{"id":3475,"date":"2019-11-18T09:29:23","date_gmt":"2019-11-18T09:29:23","guid":{"rendered":"https:\/\/prwatech.in\/blog\/?p=3475"},"modified":"2023-07-20T05:26:48","modified_gmt":"2023-07-20T05:26:48","slug":"machine-learning-interview-questions-answers-2","status":"publish","type":"post","link":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/","title":{"rendered":"Top 50 Machine Learning Interview Questions and Answers"},"content":{"rendered":"<p>&nbsp;<\/p>\n<h1>Top 50 Machine Learning Interview Questions and Answers<\/h1>\n<p>&nbsp;<\/p>\n<h3>Q1. You are given a training data set having 1000 columns and 1 million rows. The data set is based on a classification problem. Your manager has asked you to reduce the dimension of this data so that model computation time can be reduced. Your machine has memory constraints. What would you do?<\/h3>\n<p><strong>Answer:<\/strong> Processing high-dimensional data on a machine with limited memory is a strenuous task, and your interviewer would be fully aware of that. The following are the methods you can use to tackle such a situation:<br \/>\nSince we have low RAM, we should close all other applications on our machine, including the web browser, so that most of the memory can be put to use.<br \/>\nWe can randomly sample the data set. This means we can create a smaller data set, let\u2019s say, having 1000 variables and 300000 rows and do the computations.<br \/>\nTo reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated variables. For numerical variables, we\u2019ll use correlation. 
For categorical variables, we\u2019ll use the chi-square test.<br \/>\nAlso, we can use PCA and pick the components which can explain the maximum variance in the data set.<br \/>\nUsing online learning algorithms like Vowpal Wabbit (available in Python) is a possible option.<br \/>\nBuilding a linear model using Stochastic Gradient Descent is also helpful.<br \/>\nWe can also apply our business understanding to estimate which predictors can impact the response variable. But this is an intuitive approach; failing to identify useful predictors might result in a significant loss of information.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q2. Is rotation necessary in PCA? If yes, why? What will happen if you don\u2019t rotate the components?<\/h3>\n<p><strong>Answer:\u00a0<\/strong>Yes, rotation (orthogonal) is necessary because it maximizes the difference between the variance captured by the components. This makes the components easier to interpret. Not to forget, that\u2019s the motive of doing PCA, where we aim to select fewer components (than features) which can explain the maximum variance in the data set. Rotation doesn\u2019t change the relative location of the components; it only changes the actual coordinates of the\u00a0points.<br \/>\nIf we don\u2019t rotate the components, the effect of PCA will diminish and we\u2019ll have to select a larger number of components to explain the variance in the data set.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q3. You are given a data set. The data set has missing values that spread along 1 standard deviation from the median. What percentage of data would remain unaffected? Why?<\/h3>\n<p><strong>Answer:<\/strong> This question has enough hints for you to start thinking! Since the data is spread across the median, let\u2019s assume it\u2019s a normal distribution. We know that, in a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (or mode, median), which leaves ~32% of the data unaffected. 
Therefore, ~32% of the data would remain unaffected by missing values.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q4. You are given a data set on cancer detection. You\u2019ve built a classification model and achieved an accuracy of 96%. Why shouldn\u2019t you be happy with your model performance? What can you do about it?<\/h3>\n<p><strong>Answer:<\/strong> If you have worked on <a href=\"https:\/\/www.youtube.com\/watch?v=8lfBjNUq6Ow&amp;t=825s\">enough data sets<\/a>, you should deduce that cancer detection results in imbalanced data. In an imbalanced data set, accuracy should not be used as a measure of performance because the 96% (as given) might only reflect correct predictions for the majority class, but our class of interest is the minority class (4%), which is the people who actually got diagnosed with cancer. Hence, in order to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate) and F measure to determine the class-wise performance of the classifier. If the minority class performance is found to be poor, we can undertake the following steps:<br \/>\nWe can use undersampling, oversampling or SMOTE to make the data balanced.<br \/>\nWe can alter the prediction threshold value by doing probability calibration and finding an optimal threshold using the AUC-ROC curve.<br \/>\nWe can assign a weight to classes such that the minority classes get larger weight.<br \/>\nWe can also use anomaly detection.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q5. Why is naive Bayes so \u2018naive\u2019?<\/h3>\n<p><strong>Answer:<\/strong> Naive Bayes is so \u2018naive\u2019 because it assumes that all of the features in a data set are equally important and independent. As we know, these assumptions are rarely true in a real-world scenario.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q6. Explain prior probability, likelihood and marginal likelihood in the context of the naive Bayes algorithm.<\/h3>\n<p><strong>Answer:<\/strong> Prior probability is nothing but the proportion of the dependent (binary) variable in the data set. 
It is the closest guess you can make about a class, without any further information.<br \/>\nFor example: In a data set, the dependent variable is binary (1 and 0). The proportion of 1 (spam) is 70% and 0 (not spam) is 30%. Hence, we can estimate that there is a 70% chance that any new email would be classified as spam.<br \/>\nThe likelihood is the probability of classifying a given observation as 1 in the presence of some other variable.<br \/>\nFor example, the probability that the word \u2018FREE\u2019 is used in previous spam messages is a likelihood. The marginal likelihood is the probability that the word \u2018FREE\u2019 is used in any message.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q7. You are working on a time series data set. Your manager has asked you to build a high accuracy model. You start with the decision tree algorithm since you know it works fairly well on all kinds of data. Later, you tried a time series regression model and got higher accuracy than the decision tree model. Can this happen? Why?<\/h3>\n<p><strong>Answer:<\/strong> Time series data is known to possess linearity. On the other hand, a decision tree algorithm is known to work best at detecting non-linear interactions. The decision tree failed to provide robust predictions because it couldn\u2019t map the linear relationship as well as a regression model did. Therefore, we learned that a linear regression model can provide robust predictions, given that the data set satisfies its linearity assumptions.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q8. You are assigned a new project which involves helping a food delivery company to save more money. The problem is, the company\u2019s delivery team isn\u2019t able to deliver food on time. As a result, their customers get unhappy. And, to keep them happy, they end up delivering food for free. Which machine learning algorithm can save them?<\/h3>\n<p><strong>Answer:<\/strong> You might have started hopping through the list of ML algorithms in your mind. 
But, wait! Such questions are asked to test your machine learning fundamentals. This is not a machine learning problem. This is a route optimization problem. A machine learning problem consists of three things:<br \/>\n1. There exists a pattern.<br \/>\n2. You cannot solve it mathematically (even by writing exponential equations).<br \/>\n3. You have data on it.<br \/>\nAlways look for these three factors to decide if machine learning is a tool to solve a particular problem.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q9. You came to know that your model is suffering from low bias and high variance. Which algorithm should you use to tackle it? Why?<\/h3>\n<p><strong>Answer:<\/strong> Low bias occurs when the model\u2019s predicted values are near to actual values. In other words, the model becomes flexible enough to mimic the training data distribution. While that sounds like a great achievement, don\u2019t forget that a flexible model has no generalization capabilities. It means that when this model is tested on unseen data, it gives disappointing results.<br \/>\nIn such situations, we can use a bagging algorithm (like random forest) to tackle high variance problems. Bagging algorithms divide a data set into subsets made with repeated randomized sampling. Then, these samples are used to generate a set of models using a single learning algorithm. Later, the model predictions are combined using voting (classification) or averaging (regression).<br \/>\nAlso, to combat high variance, we can:<br \/>\nUse regularization techniques, where higher model coefficients get penalized, hence lowering model complexity.<br \/>\nUse the top n features from the variable importance chart. Maybe, with all the variables in the data set, the algorithm is having difficulty in finding a meaningful signal.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q10. You are given a data set. The data set contains many variables, some of which are highly correlated and you know about it. Your manager has asked you to run PCA. 
Would you remove correlated variables first? Why?<\/h3>\n<p><strong>Answer:<\/strong> Chances are, you might be tempted to say No, but that would be incorrect. Discarding correlated variables has a substantial effect on PCA because, in the presence of correlated variables, the variance explained by a particular component gets inflated.<br \/>\nFor example, you have 3 variables in a data set, of which 2 are correlated. If you run PCA on this data set, the first principal component would exhibit twice the variance that it would exhibit with uncorrelated variables. Also, adding correlated variables lets PCA put more importance on those variables, which is misleading.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q11. After spending several hours, you are now anxious to build a high accuracy model. As a result, you build 5 GBM models, thinking a boosting algorithm would do the magic. Unfortunately, none of the models could perform better than the benchmark score. Finally, you decided to combine those models. Though ensembled models are known to return high accuracy, you are unfortunate. Where did you miss it?<\/h3>\n<p>Answer: As we know, ensemble learners are based on the idea of combining weak learners to create strong learners. But these learners provide superior results only when the combined models are uncorrelated. Since we have used 5 GBM models and got no accuracy improvement, it suggests that the models are correlated. The problem with correlated models is that all the models provide the same information.<br \/>\nFor example: If model 1 has classified User1122 as 1, there are high chances model 2 and model 3 would have done the same, even if its actual value is 0. Therefore, ensemble learners are built over the premise of combining weak uncorrelated models to obtain better predictions.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q12. 
How is kNN different from kmeans clustering?<\/h3>\n<p><strong>Answer:<\/strong> Don\u2019t get misled by the \u2018k\u2019 in their names. You should know that the fundamental difference between these algorithms is that kmeans is unsupervised in nature and kNN is supervised in nature. kmeans is a clustering algorithm. kNN is a classification (or regression) algorithm.<br \/>\nThe kmeans algorithm partitions a data set into clusters such that each cluster formed is homogeneous and the points in each cluster are close to each other. The algorithm tries to maintain enough separability between these clusters. Due to its unsupervised nature, the clusters have no labels. The kNN algorithm tries to classify an unlabeled observation based on its k (can be any number) surrounding neighbors. It is also known as a lazy learner because it involves minimal training of the model. Hence, it doesn\u2019t use training data to make a generalization on the unseen data sets.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q13. How are True Positive Rate and Recall related? Write the equation.<\/h3>\n<p><strong>Answer:<\/strong> True Positive Rate = Recall. Yes, they are equal, with the formula TP \/ (TP + FN).<\/p>\n<p>&nbsp;<\/p>\n<h3>Q14. You have built a multiple regression model. Your model R\u00b2 isn\u2019t as good as you\u00a0wanted. For improvement, you remove the intercept term, and your model R\u00b2 becomes 0.8 from 0.3. Is it possible? How?<\/h3>\n<p>Answer: Yes, it is possible. We need to understand the significance of the intercept term in a regression model. The intercept term shows the model prediction without any independent variable, i.e. the mean prediction.<br \/>\nThe formula of R\u00b2 is:<br \/>\nR\u00b2 = 1 \u2013 \u03a3(y \u2013 y\u00b4)\u00b2\/\u03a3(y \u2013 ymean)\u00b2<br \/>\nwhere y\u00b4 is the predicted value.<\/p>\n<p>When the intercept term is present, the R\u00b2 value evaluates your model with respect to the mean model. 
In the absence of the intercept term (ymean), the model can make no such evaluation. With a larger denominator, the value of \u03a3(y \u2013 y\u00b4)\u00b2\/\u03a3(y)\u00b2 becomes smaller than it actually is, resulting in a higher R\u00b2.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q15. After analyzing the model, your manager has informed you that your regression model is suffering from multicollinearity. How would you check if he\u2019s right? Without losing any information, can you still build a better model?<\/h3>\n<p><strong>Answer:<\/strong> To check multicollinearity, we can create a correlation matrix to identify &amp; remove variables having a correlation above 75% (deciding a threshold is subjective). In addition, we can calculate VIF (variance inflation factor) to check the presence of multicollinearity.<br \/>\nA VIF value &lt;= 4 suggests no multicollinearity whereas a value &gt;= 10 implies serious multicollinearity. Also, we can use tolerance as an indicator of multicollinearity.<br \/>\nBut removing correlated variables might lead to loss of information. In order to retain those variables, we can use penalized regression models like ridge or lasso regression. Also, we can add some random noise to the correlated variables so that the variables become different from each other. But adding noise might affect the prediction accuracy, hence this approach should be used carefully.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q16. When is Ridge regression favorable over Lasso regression?<\/h3>\n<p><strong>Answer:<\/strong> You can quote ISLR\u2019s authors Hastie, Tibshirani who asserted that, in the presence of few variables with medium \/ large sized effects, use lasso regression. In the presence of many variables with small \/ medium sized effects, use ridge regression.<br \/>\nConceptually, we can say that lasso regression (L1) does both variable selection and parameter shrinkage, whereas ridge regression only does parameter shrinkage and ends up including all the coefficients in the model. 
In the presence of correlated variables, ridge regression might be the preferred choice. Also, ridge regression works best in situations where the least square estimates have higher variance. Therefore, it depends on our model objective.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q17. The rise in global average temperature led to a decrease in the number of pirates around the world. Does that mean that a decrease in the number of pirates caused climate change?<\/h3>\n<p><strong>Answer:<\/strong> After reading this question, you should have understood that this is a classic case of \u201ccausation and correlation\u201d. No, we can\u2019t conclude that the decrease in the number of pirates caused climate change because there might be other factors (lurking or confounding variables) influencing this phenomenon. Therefore, there might be a correlation between global average temperature and the number of pirates, but based on this information we can\u2019t say that pirates died because of the rise in global average temperature.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q18. While working on a data set, how do you select important variables? Explain your methods.<\/h3>\n<p>Answer:<br \/>\nFollowing are the methods of variable selection you can use:<br \/>\n1. Remove the correlated variables prior to selecting important variables<br \/>\n2. Use linear regression and select variables based on p values<br \/>\n3. Use Forward Selection, Backward Selection, Stepwise Selection<br \/>\n4. Use Random Forest, Xgboost and plot variable importance chart<br \/>\n5. Use Lasso Regression<br \/>\n6. Measure information gain for the available set of features and select top n features accordingly.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q19. What is the difference between covariance and correlation?<\/h3>\n<p>Answer:<br \/>\nCorrelation is the standardized form of covariance.<br \/>\nCovariances are difficult to compare. 
For example: if we calculate the covariances of salary ($) and age (years), we\u2019ll get different covariances that can\u2019t be compared because the variables have unequal scales. To combat such a situation, we calculate correlation to get a value between -1 and 1, irrespective of their respective scales.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q20. Is it possible to capture the correlation between continuous and categorical variables? If yes, how?<\/h3>\n<p>Answer:<br \/>\nYes, we can use the ANCOVA (analysis of covariance) technique to capture the association between continuous and categorical variables.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q21. Both being tree-based algorithms, how is random forest different from the Gradient boosting algorithm (GBM)?<\/h3>\n<p>Answer:<br \/>\nThe fundamental difference is that random forest uses bagging techniques to make predictions, while GBM uses boosting techniques to make predictions.<br \/>\nIn the bagging technique, a data set is divided into n samples using randomized sampling. Then, using a single learning algorithm, a model is built on all samples. Later, the resultant predictions are combined using voting or averaging. Bagging is done in parallel. In boosting, after the first round of predictions, the algorithm weighs misclassified predictions higher, such that they can be corrected in the succeeding round. This sequential process of giving higher weights to misclassified predictions continues until a stopping criterion is reached.<br \/>\nRandom forest improves model accuracy by reducing variance (mainly). The trees grown are uncorrelated to maximize the decrease in variance. On the other hand, GBM improves accuracy by reducing both bias and variance in a model.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q22. Running a binary classification tree algorithm is the easy part. Do you know how tree splitting takes place, i.e. 
how does the tree decide which variable to split at the root node and succeeding nodes?<\/h3>\n<p>Answer:<br \/>\nA classification tree makes the decision based on the Gini Index and Node Entropy. In simple words, the tree algorithm finds the best possible feature which can divide the data set into the purest possible children nodes.<br \/>\nGini index says that if we select two items from a population at random, then they must be of the same class, and the probability for this is 1 if the population is pure. We can calculate Gini as follows:<br \/>\n1. Calculate Gini for sub-nodes, using the formula sum of the square of probability for success and failure (p^2+q^2).<br \/>\n2. Calculate Gini for the split using the weighted Gini score of each node of that split.<br \/>\nEntropy is the measure of impurity as given by (for binary class):<br \/>\nEntropy = \u2212p log\u2082(p) \u2212 q log\u2082(q)<\/p>\n<p>Here p and q are the probabilities of success and failure respectively in that node. Entropy is zero when a node is homogeneous. It is maximum when both the classes are present in a node at 50% \u2013 50%. Lower entropy is desirable.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q23. You\u2019ve built a random forest model with 10000 trees. You got delighted after getting training error as 0.00. But, the validation error is 34.23. What is going on? Haven\u2019t you trained your model perfectly?<\/h3>\n<p>Answer:<br \/>\nThe model has overfitted. Training error 0.00 means the classifier has mimicked the training data patterns to such an extent that they are not available in the unseen data. Hence, when this classifier was run on an unseen sample, it couldn\u2019t find those patterns and returned a prediction with higher error. In a random forest, it happens when we use a larger number of trees than necessary. Hence, to avoid these situations, we should tune the number of trees using cross-validation.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q24. You\u2019ve got a data set to work with, having p (no. of variables) &gt; n (no. of observations). 
Why is OLS a bad option to work with? Which techniques would be best to use? Why?<\/h3>\n<p>Answer: In such high dimensional data sets, we can\u2019t use classical regression techniques, since their assumptions tend to fail. When p &gt; n, we can no longer calculate a unique least-square coefficient estimate and the variances become infinite, so OLS cannot be used at all.<br \/>\nTo combat this situation, we can use penalized regression methods like lasso, LARS and ridge, which can shrink the coefficients to reduce variance. Precisely, ridge regression works best in situations where the least square estimates have higher variance.<br \/>\nOther methods include subset regression and forward stepwise regression.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q25. What is the convex hull? (Hint: Think SVM)<\/h3>\n<p>Answer: In the case of linearly separable data, the convex hull represents the outer boundaries of the two groups of data points. Once the convex hull is created, we get the maximum margin hyperplane (MMH) as a perpendicular bisector between the two convex hulls.<\/p>\n<p>MMH is the line which attempts to create the greatest separation between the two groups.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q26. We know that one hot encoding increases the dimensionality of a data set. But, label encoding doesn\u2019t. How?<\/h3>\n<p>Answer:<br \/>\nDon\u2019t get baffled by this question. It\u2019s a simple question asking the difference between the two.<br \/>\nUsing one hot encoding, the dimensionality (a.k.a. features) of a data set increases because it creates a new variable for each level present in categorical variables. For example: let\u2019s say we have a variable \u2018color\u2019. The variable has 3 levels namely Red, Blue, and Green. One hot encoding the \u2018color\u2019 variable will generate three new variables as Color. 
Red, Color.Blue and Color.Green containing 0 and 1 values.<br \/>\nIn label encoding, the levels of a categorical variable get encoded as integers (0, 1, 2, and so on) within the same variable, so no new variable is created. Label encoding is mostly used for binary variables.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q27. What cross-validation technique would you use on a time series data set? Is it k-fold or LOOCV?<\/h3>\n<p>Answer:<br \/>\nNeither. In a time series problem, k-fold can be troublesome because there might be some pattern in year 4 or 5 which is not in year 3. Resampling the data set will separate these trends, and we might end up validating on past years, which is incorrect. Instead, we can use a forward chaining strategy with 5 folds as shown below:<\/p>\n<p>fold 1: training [1], test [2]<br \/>\nfold 2: training [1 2], test [3]<br \/>\nfold 3: training [1 2 3], test [4]<br \/>\nfold 4: training [1 2 3 4], test [5]<br \/>\nfold 5: training [1 2 3 4 5], test [6]<br \/>\nwhere 1,2,3,4,5,6 represents \u201cyear\u201d.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q28. You are given a data set consisting of variables having more than 30% missing values. Let\u2019s say, out of 50 variables, 8 variables have missing values higher than 30%. How will you deal with them?<\/h3>\n<p>Answer:<br \/>\nWe can deal with them in the following ways:<br \/>\n1. Assign a unique category to missing values; who knows, the missing values might reveal some trend.<br \/>\n2. We can remove them blatantly.<br \/>\n3. Or, we can sensibly check their distribution with the target variable, and if we find any pattern we\u2019ll keep those missing values and assign them a new category while removing others.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q29. \u2018People who bought this, also bought\u2026\u2019 recommendations seen on Amazon are a result of which algorithm?<\/h3>\n<p>Answer: The basic idea for this kind of recommendation engine comes from a collaborative filtering algorithm that considers \u201cUser Behavior\u201d for recommending items. 
They exploit the behavior of other users and items in terms of transaction history, ratings, selection, and purchase information. Other users&#8217; behavior and preferences over the items are used to recommend items to new users. In this case, features of the items are not known.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q30. What do you understand by Type I vs Type II error?<\/h3>\n<p>Answer:<br \/>\nType I error is committed when the null hypothesis is true and we reject it, also known as a \u2018False Positive\u2019. Type II error is committed when the null hypothesis is false and we fail to reject it, also known as a \u2018False Negative\u2019. In the context of the confusion matrix, we can say Type I error occurs when we classify a value as positive (1) when it is actually negative (0). Type II error occurs when we classify a value as negative (0) when it is actually positive (1).<\/p>\n<p>&nbsp;<\/p>\n<h3>Q31. You are working on a classification problem. For validation purposes, you\u2019ve randomly sampled the training data set into train and validation. You are confident that your model will work incredibly well on unseen data since your validation accuracy is high. However, you get shocked after getting poor test accuracy. What went wrong?<\/h3>\n<p>Answer:<br \/>\nIn the case of classification problems, we should always use stratified sampling instead of random sampling. Random sampling doesn\u2019t take into consideration the proportion of target classes. On the contrary, stratified sampling helps to maintain the distribution of the target variable in the resultant samples as well.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q32. You have been asked to evaluate a regression model based on R\u00b2, adjusted R\u00b2, and tolerance. What will be your criteria?<\/h3>\n<p>Answer:<br \/>\nTolerance (1 \/ VIF) is used as an indicator of multicollinearity. It indicates the percentage of the variance in a predictor that cannot be accounted for by other predictors. 
Large values of tolerance are desirable.<br \/>\nWe will consider adjusted R\u00b2 as opposed to R\u00b2 to evaluate model fit because R\u00b2 increases irrespective of improvement in prediction accuracy as we add more variables. But adjusted R\u00b2 would only increase if an additional variable improves the accuracy of the model; otherwise, it stays the same or decreases. It is difficult to commit to a general threshold value for adjusted R\u00b2 because it varies between data sets.<\/p>\n<p>For example, a gene mutation data set might result in lower adjusted R\u00b2 and still provide fairly good predictions, as compared to stock market data where lower adjusted R\u00b2 implies that the model is not good.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q33. In k-means or kNN, we use euclidean distance to calculate the distance between nearest neighbors. Why not manhattan distance?<\/h3>\n<p>Answer:<br \/>\nWe don\u2019t use manhattan distance because it calculates distance horizontally or vertically only. It has dimension restrictions. On the other hand, the euclidean metric can be used in any space to calculate distance. Since the data points can be present in any dimension, euclidean distance is a more viable option.<br \/>\nExample: Think of a chess board. The movement made by a rook is measured by manhattan distance because it moves only vertically &amp; horizontally.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q34. Explain machine learning to me like a 5-year-old.<\/h3>\n<p>Answer:<br \/>\nIt\u2019s simple. It\u2019s just like how babies learn to walk. Every time they fall down, they learn (unconsciously) &amp; realize that their legs should be straight and not in a bent position. The next time they fall down, they feel pain. They cry. But, they learn \u2018not to stand like that again\u2019. In order to avoid that pain, they try harder. 
To succeed, they even seek support from the door or wall or anything near them, which helps them stand firm.<br \/>\nThis is how a machine works &amp; develops intuition from its environment.<br \/>\nNote: The interviewer is only trying to test whether you have the ability to explain complex concepts in simple terms.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q35. I know that a linear regression model is generally evaluated using Adjusted R\u00b2 or F value. How would you evaluate a logistic regression model?<\/h3>\n<p>Answer:<br \/>\nWe can use the following methods:<br \/>\n1. Since logistic regression is used to predict probabilities, we can use the AUC-ROC curve along with the confusion matrix to determine its performance.<br \/>\n2. Also, the analogous metric of adjusted R\u00b2 in logistic regression is AIC. AIC is the measure of fit which penalizes the model for the number of model coefficients. Therefore, we always prefer the model with the minimum AIC value.<br \/>\n3. Null deviance indicates the response predicted by a model with nothing but an intercept. The lower the value, the better the model. Residual deviance indicates the response predicted by a model on adding independent variables. The lower the value, the better the model.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q36. Considering the long list of machine learning algorithms, given a data set, how do you decide which one to use?<\/h3>\n<p>Answer:<br \/>\nYou should say that the choice of machine learning algorithm solely depends on the type of data. If you are given a data set which exhibits linearity, then linear regression would be the best algorithm to use. If you are given images or audio to work on, then neural networks would help you to build a robust model.<br \/>\nIf the data comprises nonlinear interactions, then a boosting or bagging algorithm should be the choice. 
If the business requirement is to build a model that can be deployed, then we\u2019ll use regression or a decision tree model (easy to interpret and explain) instead of black-box algorithms like SVM, GBM, etc.<br \/>\nIn short, there is no one master algorithm for all situations. We must be scrupulous enough to understand which algorithm to use.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q37. Do you suggest that treating a categorical variable as a continuous variable would result in a better predictive model?<\/h3>\n<p>Answer:<br \/>\nFor better predictions, a categorical variable can be considered as a continuous variable only when the variable is ordinal in nature.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q38. When does regularization become necessary in Machine Learning?<\/h3>\n<p>Answer:<br \/>\nRegularization becomes necessary when the model begins to overfit. This technique adds a cost term to the objective function for bringing in more features. Hence, it tries to push the coefficients for many variables to zero and so reduce the cost term. This helps to reduce model complexity so that the model can become better at predicting (generalizing).<\/p>\n<p>&nbsp;<\/p>\n<h3>Q39. What do you understand by Bias Variance trade-off?<\/h3>\n<p>Answer:<br \/>\nThe error emerging from any model can be broken down into three components mathematically.<br \/>\nThe following are these components:<br \/>\nError = Bias\u00b2 + Variance + Irreducible Error<\/p>\n<p>Bias error quantifies how much, on average, the predicted values differ from the actual values. A high bias error means we have an under-performing model that keeps on missing important trends. Variance, on the other side, quantifies how much the predictions made on the same observation differ from each other. A high variance model will over-fit on your training population and perform badly on any observation beyond training.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q40. OLS is to linear regression what maximum likelihood is to logistic regression. 
Explain the statement.<\/h3>\n<p>Answer:<br \/>\nOLS and maximum likelihood are the methods used by the respective regression methods to estimate the unknown parameter (coefficient) values. In simple words, ordinary least squares (OLS) is a method used in linear regression that chooses the parameters minimizing the sum of squared differences between actual and predicted values. Maximum likelihood chooses the parameter values that maximize the likelihood of producing the observed data.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q41. Difference between ARIMA and SARIMA Model?<\/h3>\n<p>Ans:<br \/>\nAutoregressive Integrated Moving Average, or ARIMA, is a forecasting method for univariate time series data.<br \/>\nAs its name suggests, it supports both autoregressive and moving average elements. The integrated element refers to differencing, which allows the method to support time-series data with a trend.<br \/>\nA problem with ARIMA is that it does not support seasonal data, that is, a time series with a repeating cycle.<br \/>\nARIMA expects data that is either not seasonal or has the seasonal component removed, e.g. seasonally adjusted via methods such as seasonal differencing.<br \/>\nThe parameters of the ARIMA model are defined as follows:<br \/>\n\u2022 p: The number of lag observations included in the model, also called the lag order.<br \/>\n\u2022 d: The number of times that the raw observations are differenced, also called the degree of differencing.<br \/>\n\u2022 q: The size of the moving average window, also called the order of moving average.<br \/>\nSARIMA (Seasonal ARIMA) extends ARIMA with seasonal autoregressive, differencing, and moving average terms (P, D, Q) and a seasonal period m, so it can model seasonal data directly.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q42. Difference between AIC and BIC?<\/h3>\n<p>Ans.<br \/>\nAkaike information criterion (AIC) (Akaike, 1974) is a technique based on in-sample fit that estimates how well a model is likely to predict\/estimate future values.<br \/>\nA good model is the one that has the minimum AIC among all the candidate models. 
The AIC can be used to select between the additive and multiplicative Holt-Winters models.<br \/>\nBayesian information criterion (BIC) (Stone, 1979) is another criterion for model selection that measures the trade-off between model fit and complexity of the model. A lower AIC or BIC value indicates a better fit.<br \/>\nAIC and BIC are both penalized-likelihood criteria. Both are of the form \u201cmeasure of fit + complexity penalty\u201d:<br \/>\nAIC = -2*ln(likelihood) + 2*p, and BIC = -2*ln(likelihood) + ln(N)*p,<br \/>\nwhere p = number of estimated parameters, N = sample size.<br \/>\n\u2022 AIC is best for prediction as it is asymptotically equivalent to cross-validation.<br \/>\n\u2022 BIC is best for explanation as it allows consistent estimation of the underlying data generating process.<br \/>\nAsymptotically, AIC is equivalent to leave-one-out cross-validation, while BIC corresponds to a form of leave-k-out cross-validation.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q43. Difference between AUC and ROC?<\/h3>\n<p>Ans. In Machine Learning, performance measurement is an essential task. So when it comes to a classification problem, we can count on an AUC-ROC curve.<br \/>\nWhen we need to check or visualize the performance of a multi-class classification problem, we use the AUC (Area Under The Curve) ROC (Receiver Operating Characteristics) curve. It is one of the most important evaluation metrics for checking any classification model\u2019s performance. It is also written as AUROC (Area Under the Receiver Operating Characteristics).<br \/>\nThe ROC curve is plotted with the true positive rate (TPR) against the false positive rate (FPR), where TPR is on the y-axis and FPR is on the x-axis. ROC is the curve itself, traced out across classification thresholds, while AUC is the area under that curve, a single number that summarizes how well the model separates the classes.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q44. What is the Confusion Matrix and why do you need it?<\/h3>\n<p>It is a performance measurement for machine learning classification problems where the output can be two or more classes. 
For a binary classifier, it is a table with the 4 different combinations of predicted and actual values (true positives, false positives, true negatives, and false negatives).<\/p>\n<p>It is extremely useful for measuring Recall, Precision, Specificity, Accuracy and, most importantly, the AUC-ROC Curve.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q45. Explain Naive Bayes: when is it used, and how?<\/h3>\n<p>Ans. Naive Bayes performs well when we have multiple classes and when working with text classification. The advantages of Naive Bayes algorithms are:<br \/>\nIt is simple, and if the conditional independence assumption actually holds, a Naive Bayes classifier will converge quicker than discriminative models like logistic regression, so you need less training data. Even if the NB assumption doesn\u2019t hold, an NB classifier often still performs well in practice.<br \/>\nIt requires less model training time.<br \/>\nThe main difference between Naive Bayes (NB) and Random Forest (RF) is their model size. Naive Bayes model size is low and quite constant with respect to the data. NB models cannot represent complex behavior, so they are less prone to overfitting. On the other hand, the Random Forest model size is very large and, if not carefully built, it results in overfitting. So, when your data is dynamic and keeps changing, NB can adapt quickly to the changes and new data, whereas with an RF you would have to rebuild the forest every time something changes.<br \/>\nIn scikit-learn, the Gaussian variant can be imported with: from sklearn.naive_bayes import GaussianNB<\/p>\n<p>&nbsp;<\/p>\n<h3>Q46. Difference between k-means clustering and the KNN algorithm?<\/h3>\n<p>K-nearest neighbors algorithm (k-NN) is a supervised method used for classification and regression problems. It is most widely used for classification problems. 
It makes predictions <a href=\"https:\/\/prwatech.in\/\">by learning from<\/a> the past available data.<br \/>\nSupervised Technique<br \/>\nUsed for Classification or Regression<br \/>\nUsed for classification and regression of known data where usually the target attribute\/variable is known beforehand.<br \/>\nKNN needs labeled points<\/p>\n<p>K-Means clustering is used for analyzing and grouping data which does not include a pre-labeled class or even a class attribute at all.<br \/>\nUnsupervised Technique<br \/>\nUsed for Clustering<br \/>\nUsed for scenarios like understanding the population demographics, social media trends, anomaly detection, etc.<br \/>\nK-Means doesn\u2019t require labeled points<\/p>\n<p>&nbsp;<\/p>\n<h3>Q47. How does the K-means algorithm work?<\/h3>\n<p>In unsupervised learning, the data is not labeled. Consider a set of unlabeled data points; our task is to group the data into two clusters.<\/p>\n<p>The first thing we do is randomly initialize two points, called the cluster centroids (call them the red and the green centroid).<\/p>\n<p>In k-means we do two things. First is a cluster assignment step and second is a move centroid step.<\/p>\n<p>In the first step, the algorithm goes through each of the data points and assigns it to a cluster, depending on whether it is closer to the red cluster centroid or the green cluster centroid.<\/p>\n<p>In the second step, we move the centroids. We compute the mean of all the red points and move the red cluster centroid there. We do the same thing for the green cluster.<br \/>\nThese two steps are repeated until the cluster centroids no longer move and the colors of the points no longer change.<\/p>\n<p>KNN is a supervised learning algorithm, which means the training data is labeled. Consider the task of classifying a green circle between class 1 and class 2.<\/p>\n<p>If we choose k=1, then the green circle will go into class 1 as its nearest neighbor belongs to class 1. 
If k=3, then the three nearest neighbors are \u2018two\u2019 class 2 objects and \u2018one\u2019 class 1 object. So KNN will classify the green circle in class 2 as it forms the majority.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q48. How will you avoid overfitting and underfitting and hence build a robust model?<\/h3>\n<p>To avoid overfitting:<br \/>\nCross-Validation: A standard way to find out-of-sample prediction error is to use 5-fold cross-validation.<br \/>\nEarly Stopping: Its rules provide guidance as to how many iterations can be run before the learner begins to over-fit.<br \/>\nPruning: Pruning is extensively used while building tree-based models. It simply removes the nodes which add little predictive power for the problem in hand.<br \/>\nRegularization: It adds a penalty (cost) term to the objective function for bringing in more features. Hence it tries to push the coefficients of many variables toward zero and thereby reduce the cost term.<\/p>\n<p>&nbsp;<\/p>\n<h3>Q49. How is Random Forest different from GBM, both being tree based?<\/h3>\n<p>Ans. GBM and RF are both ensemble learning methods and predict (regression or classification) by combining the outputs of individual trees.<br \/>\nRFs train each tree independently, using a random sample of the data. This randomness helps to make the model more robust than a single decision tree, and less likely to overfit on the training data. GBM, by contrast, builds trees sequentially, with each new tree correcting the errors of the trees before it.<br \/>\nRF is much easier to tune than GBM. There are typically two parameters in RF: number of trees and number of features to be selected at each node.<br \/>\nRF is harder to overfit than GBM.<br \/>\nThe main limitation of the Random Forests algorithm is that a large number of trees may make the algorithm slow for real-time prediction.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Top 50 Machine Learning Interview Questions and Answers &nbsp; Q1) You are given a train data set having 1000 columns and 1 million rows. The data set is based on a classification problem. 
Your manager has asked you to reduce the dimension of this data so that model computation time can be reduced. Your [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3474,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36,1709],"tags":[87],"class_list":["post-3475","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-interview-questions","category-interview-questions-interview-questions","tag-machine-learning-interview-questions-and-answers"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Top 50 Machine Learning Interview Questions and Answers | Prwatech<\/title>\n<meta name=\"description\" content=\"Here is the List of Top Rated 50 Machine Learning Interview Questions and Answers with Examples,learn advanced tutorials from us today itself.\" \/>\n<meta name=\"robots\" content=\"noindex, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Top 50 Machine Learning Interview Questions and Answers | Prwatech\" \/>\n<meta property=\"og:description\" content=\"Here is the List of Top Rated 50 Machine Learning Interview Questions and Answers with Examples,learn advanced tutorials from us today itself.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/\" \/>\n<meta property=\"og:site_name\" content=\"Prwatech\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/prwatech.in\/\" \/>\n<meta property=\"article:published_time\" content=\"2019-11-18T09:29:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-20T05:26:48+00:00\" \/>\n<meta 
property=\"og:image\" content=\"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"960\" \/>\n\t<meta property=\"og:image:height\" content=\"550\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Prwatech\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Eduprwatech\" \/>\n<meta name=\"twitter:site\" content=\"@Eduprwatech\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Prwatech\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"30 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/\",\"url\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/\",\"name\":\"Top 50 Machine Learning Interview Questions and Answers | Prwatech\",\"isPartOf\":{\"@id\":\"https:\/\/prwatech.in\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg\",\"datePublished\":\"2019-11-18T09:29:23+00:00\",\"dateModified\":\"2023-07-20T05:26:48+00:00\",\"author\":{\"@id\":\"https:\/\/prwatech.in\/blog\/#\/schema\/person\/db90baff7744090b2288bbc98fea87f3\"},\"description\":\"Here is the List of Top Rated 50 Machine Learning Interview Questions 
and Answers with Examples,learn advanced tutorials from us today itself.\",\"breadcrumb\":{\"@id\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#primaryimage\",\"url\":\"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg\",\"contentUrl\":\"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg\",\"width\":960,\"height\":550,\"caption\":\"Machine Learning Questions and Answers\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/prwatech.in\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Top 50 Machine Learning Interview Questions and Answers\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/prwatech.in\/blog\/#website\",\"url\":\"https:\/\/prwatech.in\/blog\/\",\"name\":\"Prwatech\",\"description\":\"Share Ideas, Start Something 
Good.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/prwatech.in\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/prwatech.in\/blog\/#\/schema\/person\/db90baff7744090b2288bbc98fea87f3\",\"name\":\"Prwatech\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/prwatech.in\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/c00bafc1b04045f31eda917de39891456c44fa47c092b9bb6be0f860a3a30a2f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/c00bafc1b04045f31eda917de39891456c44fa47c092b9bb6be0f860a3a30a2f?s=96&d=mm&r=g\",\"caption\":\"Prwatech\"},\"url\":\"https:\/\/prwatech.in\/blog\/author\/prwatech123\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Top 50 Machine Learning Interview Questions and Answers | Prwatech","description":"Here is the List of Top Rated 50 Machine Learning Interview Questions and Answers with Examples,learn advanced tutorials from us today itself.","robots":{"index":"noindex","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_US","og_type":"article","og_title":"Top 50 Machine Learning Interview Questions and Answers | Prwatech","og_description":"Here is the List of Top Rated 50 Machine Learning Interview Questions and Answers with Examples,learn advanced tutorials from us today 
itself.","og_url":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/","og_site_name":"Prwatech","article_publisher":"https:\/\/www.facebook.com\/prwatech.in\/","article_published_time":"2019-11-18T09:29:23+00:00","article_modified_time":"2023-07-20T05:26:48+00:00","og_image":[{"width":960,"height":550,"url":"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg","type":"image\/jpeg"}],"author":"Prwatech","twitter_card":"summary_large_image","twitter_creator":"@Eduprwatech","twitter_site":"@Eduprwatech","twitter_misc":{"Written by":"Prwatech","Est. reading time":"30 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/","url":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/","name":"Top 50 Machine Learning Interview Questions and Answers | Prwatech","isPartOf":{"@id":"https:\/\/prwatech.in\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#primaryimage"},"image":{"@id":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#primaryimage"},"thumbnailUrl":"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg","datePublished":"2019-11-18T09:29:23+00:00","dateModified":"2023-07-20T05:26:48+00:00","author":{"@id":"https:\/\/prwatech.in\/blog\/#\/schema\/person\/db90baff7744090b2288bbc98fea87f3"},"description":"Here is the List of Top Rated 50 Machine Learning Interview Questions and Answers with Examples,learn advanced tutorials from us today 
itself.","breadcrumb":{"@id":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#primaryimage","url":"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg","contentUrl":"https:\/\/prwatech.in\/blog\/wp-content\/uploads\/2019\/11\/Machine-Learning-Questions-and-Answers.jpg","width":960,"height":550,"caption":"Machine Learning Questions and Answers"},{"@type":"BreadcrumbList","@id":"https:\/\/prwatech.in\/blog\/interview-questions\/machine-learning-interview-questions-answers-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/prwatech.in\/blog\/"},{"@type":"ListItem","position":2,"name":"Top 50 Machine Learning Interview Questions and Answers"}]},{"@type":"WebSite","@id":"https:\/\/prwatech.in\/blog\/#website","url":"https:\/\/prwatech.in\/blog\/","name":"Prwatech","description":"Share Ideas, Start Something 
Good.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/prwatech.in\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/prwatech.in\/blog\/#\/schema\/person\/db90baff7744090b2288bbc98fea87f3","name":"Prwatech","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/prwatech.in\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/c00bafc1b04045f31eda917de39891456c44fa47c092b9bb6be0f860a3a30a2f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c00bafc1b04045f31eda917de39891456c44fa47c092b9bb6be0f860a3a30a2f?s=96&d=mm&r=g","caption":"Prwatech"},"url":"https:\/\/prwatech.in\/blog\/author\/prwatech123\/"}]}},"_links":{"self":[{"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/posts\/3475","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/comments?post=3475"}],"version-history":[{"count":3,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/posts\/3475\/revisions"}],"predecessor-version":[{"id":3478,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/posts\/3475\/revisions\/3478"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/media\/3474"}],"wp:attachment":[{"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/media?parent=3475"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/categories?post=3475"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/prwatech.in\/blog\/wp-json\/wp\/v2\/
tags?post=3475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}