# Introduction to T-SNE with implementation in python

## INTRODUCTION to T – SNE:

T-SNE is a non-linear dimensionality reduction technique used to visualize high-dimensional data in two or more dimensions.

Unlike PCA which preserves only the global structure of the data T-SNE preserves both the local and global structure.

It uses the local relationship between data to map the high-dimensional data in two dimensions.

Since T-SNE is just an advancement of Stochastic Neighbor Embedding(SNE), let’s first try to understand what SNE is?

## STOCHASTIC NEIGHBOR EMBEDDING:

Stochastic neighbor embedding is a probabilistic approach to visualize high-dimensional data. It converts high dimensional Euclidean distances between points into conditional probabilities.

SNE makes an assumption that the distances in both the high and low dimension are Gaussian distributed.

The conditional probability pj|i in the high dimension that a data point xi would pick xj as its neighbor is given by The conditional probability qj|i for the low-dimension is given by The variance σi is associated with the parameter called perplexity which is a measure of the number of neighbors. SNE performs a binary search to find the value σi which produces Pi with a fixed perplexity specified by the user.

Perplexity is defined as, where H(Pi) is the Shannon entropy of Pi measured in bits Since we are calculating the pairwise similarities, the probability that a point will choose itself as a neighbor is zero.

That is the values of Pi|i and Qj|j are equal to zero

Once the probabilities are constructed in both the high and low dimensions, the algorithm aims to reduce the differences between Pj|i and

Qj|i so that the low dimensional mapping will be similar to the high dimensional one.

This is achieved by minimizing a cost function which is a sum of Kullback-Leibler divergences. The cost function C is given by, The minimization of the cost function is done using the gradient descent method.

Since the KL divergence is not symmetric, it emphasizes preserving the local structure.

It preserves the local structure by penalizing heavily when the points which are near in the high dimension are mapped far away in the low dimension, but the penalty is small for mapping widely separated data points close together.

## DRAWBACKS OF SNE:

There are two significant drawbacks in Stochastic Neighbor Embedding.

1. The cost function used is difficult to optimize.

2. Crowding problem, where the moderately-distant data points and the points which are nearby are clumped together to fit in the 2-dimensional space.

## T-SNE:

As the cost function used in SNE is difficult to optimize T-SNE uses a cost function which is a symmetric version of SNE, i.e., pi|j = pj|i and qi|j = qj|i

The cost function is given by with pii = qii = 0

Here, the pairwise similarities in the high dimensional space are defined by For low dimensional mapping, instead of a normal distribution, we will use a heavy-tailed student t-distribution with one degree of freedom.

This heavy-tailed distribution helps to alleviate the crowding problem.

Using this distribution, low dimensional representation is given by, Let’s implement T-SNE using scikit-learn As you can see in the above plot, TSNE constructs an excellent visualization. The cluster of classes are well separated, and there are fewer overlaps.

## SUMMARY:

In this post, I have discussed one of the powerful dimensionality reduction techniques called t-SNE which has limitless applications.

In the beginning, we discussed Stochastic Neighbor Embedding(SNE) which is a predecessor of T-SNE.

We also discussed the drawbacks of SNE and how T-SNE overcomes it. Finally, we implement T-SNE using sklearn.

I hope this post has given you a better understanding of how tsne works and for those who want to delve deep into this topic I suggest you read the original paper and this blog post.