Pandas is one of the most powerful Python libraries for data science and analysis. It offers a large number of functions, methods and attributes that are comparatively simple in syntax and flexible in nature. So a data scientist, or anyone who wants certain insights from a huge set of data, prefers it and lets their […]
The Decision Tree Classification Algorithm is used for classification-based machine learning problems. It is one of the most popular algorithms because the final decision tree is quite easy to interpret and explain. More advanced ensemble methods like random forest, bagging and gradient boosting have their roots in the decision tree algorithm. Here we will try to […]
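As a quick taste of the idea, here is a minimal scikit-learn sketch of decision tree classification; the iris data set and the chosen depth are illustrative assumptions, not taken from the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Minimal sketch: fit a shallow decision tree and check held-out accuracy.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow tree stays easy to interpret
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on the held-out split
```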
If we have figured out that our problem needs a regression approach to build a predictive model, then we generally adopt a linear model to start. Why? They are easy to interpret, they train quickly, and optimization is straightforward.
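A minimal sketch of that starting point, on made-up one-dimensional data (the slope of 3 and intercept of 2 are just assumed for the illustration): the fitted coefficients are directly interpretable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical linear relation plus noise, then a quick linear fit.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=50)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # should land close to the slope 3 and intercept 2
```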
KNN, or K-Nearest Neighbours, is one of the powerful algorithms used in classification problems to make categorical predictions. Scikit-Learn gives us a built-in implementation that makes the process easier once we have the data. But here we will write the KNN code mathematically, without any inbuilt library, to figure […]
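A rough from-scratch sketch of that idea (not the article's full code), on a tiny made-up data set: for a query point, find the k closest training points by Euclidean distance and predict the majority label among them.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Euclidean distance to each training point
    nearest = np.argsort(distances)[:k]                          # indices of the k nearest points
    votes = Counter(y_train[nearest])                            # count labels among the neighbours
    return votes.most_common(1)[0][0]                            # majority vote

# Two small 2-D clusters with labels 0 and 1 (hypothetical data).
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([8, 7])))  # expected: 1
```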
Decision Tree is a machine learning algorithm that works on the concept of dividing data into subsets. It aims to give you subsets that represent only one category, or values within a particular range. But there are certain drawbacks that need to be discussed.
In every situation we face numerous options, and the same is true here. Whenever we decide to work on ML, we get stuck in the thought process of selecting an algorithm that suits our needs. It is also a bit complicated to know all the minor details of each one of them. Then how can we decide X is […]
Variation in a dataset is actually the information in the dataset, and this is what PCA uses. In simple terms, PCA, or Principal Component Analysis, is a process that emphasises the variation in a data set and extracts strong patterns out of it. We can work through the whole concept in 3 points as follows: […]
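As a minimal scikit-learn sketch of that idea (the correlated toy data below is an assumption for illustration): the first principal components capture most of the variation, i.e. most of the information, in the data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data where one feature is strongly tied to another, so a few
# directions carry most of the variation.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)        # project onto the 2 strongest directions
print(pca.explained_variance_ratio_)    # share of total variation captured by each component
```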
Variance – it is the average of the squared differences from the mean. To calculate it we follow the steps below: calculate the average of the numbers; for each number, subtract the mean and square the result; finally, calculate the average of those squared differences, i.e. the variance.
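Here is a minimal Python sketch of those three steps on a small made-up list of numbers (population variance, i.e. dividing by n):

```python
numbers = [4, 8, 6, 5, 3]                            # hypothetical example data
mean = sum(numbers) / len(numbers)                   # step 1: average of the numbers
squared_diffs = [(x - mean) ** 2 for x in numbers]   # step 2: squared difference from the mean
variance = sum(squared_diffs) / len(numbers)         # step 3: average of those squared differences
print(mean, variance)
```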
In Principal Component Analysis we project our data points onto a vector pointing in the direction of maximum variance, in order to reduce the number of components. We take the eigenvector of the covariance matrix as that direction of maximum variance. In this article we look into the proof of why […]
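The proof itself is in the article; as a small numerical sketch of the statement (the 2-D toy data below is assumed for illustration), the variance of the data projected onto the top eigenvector of the covariance matrix equals the largest eigenvalue.

```python
import numpy as np

# Hypothetical correlated 2-D data, centred before computing the covariance.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)

cov = np.cov(X, rowvar=False)               # 2x2 covariance matrix
eig_vals, eig_vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
top_vec = eig_vecs[:, -1]                   # eigenvector with the largest eigenvalue

projected = X @ top_vec                     # 1-D projection along the max-variance direction
print(projected.var(ddof=1), eig_vals[-1])  # variance of the projection matches the largest eigenvalue
```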
An eigenvector is a direction in a coordinate space, defined with respect to a matrix, that does not change its direction under that matrix transformation. An eigenvalue is a scalar which, when multiplied with the eigenvector, gives the same result as multiplying the eigenvector by the matrix.
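A minimal numerical check of that definition, A @ v == lam * v, using an arbitrary 2x2 matrix chosen only for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eig_vals, eig_vecs = np.linalg.eig(A)
v = eig_vecs[:, 0]      # first eigenvector (columns of eig_vecs are eigenvectors)
lam = eig_vals[0]       # its eigenvalue

print(A @ v)            # transforming v with the matrix ...
print(lam * v)          # ... only scales it by lam; the direction is unchanged
```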