## Introduction to Hyperparameter Tuning: GridSearchCV and RandomizedSearchCV

Most machine learning algorithms have a number of hyperparameters that we can tune to improve the model’s performance. Hyperparameters control the behavior of the algorithm and can have an enormous impact on the results. […]

## Measure of Central Tendency and Measure of Spread

Summarizing quantitative data can help us understand it better. In this article, we’ll see various methods to summarize quantitative data by the measure of central tendency (such as mean, median, and mode) and by the measure […]

## Why Is High-Dimensional Data a Curse?

In the era of big data, massive datasets (large not only in the number of samples collected but also in the number of features) are increasingly prevalent in many disciplines and are often difficult to interpret. In […]

## Converting Raw Text to Numerical Vectors using Bag of Words, N-Grams and TF-IDF

If we are dealing with text documents and want to perform machine learning on text, we can’t directly work with raw text. We first need to convert the text into numbers or vectors of numbers. In this article, […]

## Introduction to Logistic Regression

In this article, we’ll discuss logistic regression, a supervised machine learning algorithm. Logistic regression is a probabilistic classifier. It outputs the probability of a point belonging to a specific class. The output lies in the interval [0, 1]. Logistic […]

## Assumptions Made by Ordinary Least Squares (OLS)

Introduction: Ordinary Least Squares (OLS) is a commonly used technique for linear regression analysis. OLS makes certain assumptions about the data, such as linearity, no multicollinearity, no autocorrelation, homoscedasticity, and normally distributed errors. Violating these assumptions may reduce […]

## Introduction to K-Nearest Neighbors

The K-nearest neighbor (K-NN) classifier is one of the easiest classification methods to understand and one of the most basic classification models available. K-NN is a non-parametric method that classifies based on the distance to the […]

## Linear Regression Using Statsmodels

Introduction: In this tutorial, we’ll discuss how to build a linear regression model using statsmodels. Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as […]

## Machine Learning Pipeline

A machine learning pipeline bundles a sequence of steps into a single unit. For example, in text classification, the documents go through a required sequence of steps such as tokenizing, cleaning, feature extraction, and training. […]