In this article, we are going to see how to create a word cloud in Python.
A word cloud is a collage of the most prominent words in a given text. The size of each word reflects how often the word appears in the text.
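To make the link between frequency and size concrete, here is a minimal sketch that counts words with the standard library and scales each count to a font size. The function name `relative_sizes`, the sample sentence, and the linear scaling are assumptions made for illustration; the wordcloud package uses its own tokenizer and layout logic.

```python
import re
from collections import Counter


def relative_sizes(text, max_font_size=100):
    """Map each word to a font size proportional to its frequency.

    Illustrative only: a linear scale relative to the most frequent word.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: round(max_font_size * n / top) for word, n in counts.items()}


sizes = relative_sizes("learning a function from data is supervised learning")
print(sizes["learning"], sizes["function"])
```

Here the most frequent word gets the largest size and every other word is scaled down in proportion to its count.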
In Python, we can generate a word cloud with the wordcloud package.
If you don’t have it installed, you can install it using the following command:
pip install wordcloud
Let’s see how to create a word cloud:
Python
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

# Text from Wikipedia
text = (
    'Supervised learning is the machine learning task of learning a function that maps an input to '
    'an output based on example input-output pairs.[1] It infers a function from labeled training data '
    'consisting of a set of training examples.[2] In supervised learning, each example is a pair consisting '
    'of an input object (typically a vector) and a desired output value (also called the supervisory signal). '
    'A supervised learning algorithm analyzes the training data and produces an inferred function, which can '
    'be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine '
    'the class labels for unseen instances. This requires the learning algorithm to generalize from the training '
    'data to unseen situations in a "reasonable" way (see inductive bias).'
)

# Build the word cloud on a white background, ignoring common stop words
wordcloud = WordCloud(background_color='white', stopwords=set(STOPWORDS)).generate(text)
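The stopwords argument tells the package which common words to leave out of the cloud. The sketch below imitates that filtering in plain Python so the effect is easy to see; the tiny `SAMPLE_STOPWORDS` set and the `count_without_stopwords` helper are hypothetical stand-ins, not the package's `STOPWORDS` list or its internal logic.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
SAMPLE_STOPWORDS = {"a", "an", "the", "is", "of", "to", "that", "and"}


def count_without_stopwords(text, stopwords=SAMPLE_STOPWORDS):
    """Count the words in text, skipping any word found in stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)


counts = count_without_stopwords(
    "Supervised learning is the machine learning task of learning a function"
)
print(counts.most_common(3))
```

Without the filter, words such as "the" and "of" would compete with "learning" for space even though they say nothing about the topic.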
Now we can use matplotlib to display the word cloud:
Python
plt.imshow(wordcloud.recolor(random_state=2020))
plt.title('Most Frequent Words')
plt.axis("off")
plt.show()
We can also create a word cloud in any shape. All we have to do is provide an image to use as a mask.
Let’s create a word cloud using the following image as the mask image.
Let’s load the image using the Image module from the Pillow library.
To install Pillow, use the following command:
pip install Pillow
Python
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
from PIL import Image

text = (
    'Supervised learning is the machine learning task of learning a function that maps an input to '
    'an output based on example input-output pairs.[1] It infers a function from labeled training data '
    'consisting of a set of training examples.[2] In supervised learning, each example is a pair consisting '
    'of an input object (typically a vector) and a desired output value (also called the supervisory signal). '
    'A supervised learning algorithm analyzes the training data and produces an inferred function, which can '
    'be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine '
    'the class labels for unseen instances. This requires the learning algorithm to generalize from the training '
    'data to unseen situations in a "reasonable" way (see inductive bias).'
)

# The word cloud requires the mask image as a NumPy array
mask_img = np.array(Image.open("image.jpg"))

wordcloud = WordCloud(
    background_color='white',
    mask=mask_img,
    stopwords=set(STOPWORDS),
).generate(text)

# Save the word cloud to the computer
wordcloud.to_file('word_cloud.png')

plt.imshow(wordcloud.recolor(random_state=2020))
plt.title('Most Frequent Words')
plt.axis("off")
plt.show()
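If no suitable picture is at hand, a mask can also be built directly as an array. As far as the wordcloud documentation describes it, pure white (255) entries in the mask are treated as masked out and the remaining entries are free to draw on. The sketch below builds a circular mask with NumPy under that assumption; the function name `circular_mask` and the 300-pixel size are made up for illustration.

```python
import numpy as np


def circular_mask(size=300):
    """Return a size x size uint8 array: 0 inside a centered circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size * 0.45
    # True for every pixel that lies outside the circle
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # White (255) marks the area where no words should be drawn
    return (255 * outside).astype(np.uint8)


mask_img = circular_mask()
print(mask_img.shape, mask_img.dtype)
```

The resulting array could then be passed as mask=mask_img in place of the array loaded from image.jpg.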