If we are dealing with text documents and want to perform machine learning on text, we can’t directly work with raw text. We first need to convert the text into numbers or vectors of numbers. In this article, […]
Author: Niranjan B Subramanian
Introduction to Logistic Regression
In this article, we’ll discuss a supervised machine learning algorithm: logistic regression. Logistic regression is a probabilistic classifier. It outputs the probability of a point belonging to a specific class, so the output lies in the interval [0, 1]. Logistic […]
Assumptions Made by Ordinary Least Squares (OLS)
Introduction: Ordinary Least Squares (OLS) is a commonly used technique for linear regression analysis. OLS makes certain assumptions about the data, such as linearity, no multicollinearity, no autocorrelation, homoscedasticity, and normally distributed errors. Violating these assumptions may reduce […]
Introduction to K-Nearest Neighbors
The K-nearest neighbor (K-NN) classifier is one of the easiest classification methods to understand and one of the most basic classification models available. K-NN is a non-parametric method that classifies based on the distance to the […]
Linear Regression Using Statsmodels
Introduction: In this tutorial, we’ll discuss how to build a linear regression model using statsmodels. Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as […]
Machine Learning Pipeline
A machine learning pipeline bundles a sequence of steps into a single unit. For example, in text classification, the documents go through a required sequence of steps such as tokenizing, cleaning, feature extraction, and training. […]
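As a rough illustration of the idea, here is a minimal pure-Python sketch that chains tokenizing, cleaning, and feature extraction into one unit. The `Pipeline` class and the step functions are hypothetical names made up for this example, not the API of any library, and the training step is left out; in practice the resulting word counts would be passed to a classifier.

```python
from collections import Counter


class Pipeline:
    """Hypothetical helper: runs a list of (name, function) steps in order."""

    def __init__(self, steps):
        self.steps = steps

    def run(self, data):
        # Feed the output of each step into the next one.
        for _name, step in self.steps:
            data = step(data)
        return data


def tokenize(docs):
    # Split each document on whitespace.
    return [doc.split() for doc in docs]


def clean(token_lists):
    # Lowercase tokens, strip surrounding punctuation, drop empty tokens.
    cleaned = []
    for tokens in token_lists:
        words = [t.strip(".,!?").lower() for t in tokens]
        cleaned.append([w for w in words if w])
    return cleaned


def extract_features(token_lists):
    # Bag-of-words counts, one Counter per document.
    return [Counter(tokens) for tokens in token_lists]


text_pipeline = Pipeline([
    ("tokenize", tokenize),
    ("clean", clean),
    ("features", extract_features),
])

features = text_pipeline.run(["Good movie, good cast!", "Bad plot."])
print(features[0]["good"])  # count of "good" in the first document
```

Bundling the steps this way means the same sequence is applied, in the same order, to every batch of documents.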
Cross Validation
Cross-validation is an important evaluation technique used to assess the generalization performance of a machine learning model. It helps us measure how well a model generalizes to data it was not trained on. There are two main […]
Pandas Introduction
In this article, we’ll discuss Pandas, one of the most popular Python data analysis libraries. Data analysis is a process of cleaning, exploring, organizing, describing, and visualizing data. Pandas is mainly used for cleaning and exploring […]
Box and Whiskers Plot
A box plot, also known as a box-and-whisker plot, helps us study the distribution of the data and spot outliers effectively. It is a very convenient way to visualize the spread and skew of the […]
Introduction to Support Vector Machine (SVM)
Introduction: Support Vector Machine (SVM) is a popular and powerful supervised machine learning algorithm used for both classification and regression. However, it is more extensively used in addressing the classification problems of […]