Data Science & Machine Learning
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
Some interview questions related to data science

1- What is the difference between structured and unstructured data?

2- What is multicollinearity, and how do you remove it?

3- Which algorithms do you use to find the most correlated features in a dataset?

4- Define entropy.

5- What is the workflow of principal component analysis?

6- What are the applications of principal component analysis other than dimensionality reduction?

7- What is a convolutional neural network? Explain how it works.
๐Ÿ‘8โค5
Decision trees and Random forests?

A decision tree is a supervised learning algorithm (it needs a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator among the input variables.
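To make "most significant splitter" concrete, here is a minimal sketch of how one split could be chosen on a single numeric feature. It assumes Gini impurity as the homogeneity measure (the post does not name one), and the `ages` / `bought` data and function names are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        # Weight each side's impurity by its share of the samples.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical toy data: customer age vs. whether they bought a product.
ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)  # 32.5 0.0
```

A full tree would repeat this search over every feature and then recurse into each of the two resulting subsets.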

Random forest is a versatile machine learning method that handles both regression and classification tasks. It is an ensemble learning method: many decision trees, each trained on a random sample of the data, are combined into one stronger model. It can also help with dimensionality reduction through its feature-importance scores, is fairly robust to outliers, and in some implementations handles missing values.
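The ensemble idea can be sketched in a few lines. This is a deliberately simplified illustration, not a real random forest: each "tree" here is a one-split stump on one randomly chosen feature, trained on a bootstrap sample, and the prediction is a majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, feature):
    """Fit a one-split tree on one feature: (feature, threshold, left_label, right_label)."""
    best = None
    values = sorted(set(row[feature] for row in rows))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # feature is constant in this sample: always predict the majority
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # pick one feature at random
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Hypothetical toy data: two well-separated classes in two features.
rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = train_forest(rows, labels)
print(predict(forest, [1.5, 1.5]), predict(forest, [8.5, 8.5]))
```

Real implementations grow deeper trees and sample a random subset of features at every split, but the bootstrap-then-vote structure is the same.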
๐Ÿ‘9
😉 5 Machine Learning Algorithms with Project Ideas

📉 Linear Regression -> House Price Prediction
📈 Logistic Regression -> Loan Default Prediction
🗞️ SVM -> News Classification
🛍️ KNN -> Breast Cancer Classification
🧮 Naive Bayes -> Text Classification
๐Ÿ‘18โค8
๐Ÿ‘4๐Ÿ˜2
You are given a data set whose missing values are spread within 1 standard deviation of the median. What percentage of the data would remain unaffected? Why?

Answer: This question has enough hints for you to start thinking! Since the data is spread around the median, let's assume it's a normal distribution. In a normal distribution, the mean, median and mode coincide, and ~68% of the data lies within 1 standard deviation of the mean. The missing values fall inside that range, which leaves the ~32% of the data outside it unaffected. Therefore, ~32% of the data would remain unaffected by missing values.
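The ~68% / ~32% figures can be checked quickly with Python's standard library. This is just a sanity check under the same normality assumption the answer makes; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of the data within 1 standard deviation of the mean.
within = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
# Share of the data outside that range, i.e. untouched by the missing values.
outside = 1.0 - within

print(f"within 1 SD:  {within:.4f}")   # within 1 SD:  0.6827
print(f"outside 1 SD: {outside:.4f}")  # outside 1 SD: 0.3173
```

If the data is not close to normal, these percentages no longer hold, so the assumption is worth stating in an interview.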