Solega Co. Done For Your E-Commerce solutions.
Implement Multinomial and Bernoulli Naive Bayes classifiers in python | by Pankaj Agrawal | Dec, 2024

December 26, 2024
in Artificial Intelligence
Reading Time: 44 mins read
[Figure: An illustration comparing Multinomial and Bernoulli Naive Bayes classifiers. The left side depicts Multinomial Naive Bayes with word-frequency bars, while the right shows Bernoulli Naive Bayes with a binary presence/absence vector.]

example_train.csv contains all the training sentences (the train dataset).

example_test.csv contains all the test sentences (the test dataset).
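A minimal sketch of loading the data with pandas; the file names come from the article, but the column names ("Document", "Class") and the sample rows below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in rows; with the real files you would instead run
#   train = pd.read_csv("example_train.csv")
#   test = pd.read_csv("example_test.csv")
# (column names "Document" and "Class" are assumptions, not from the source)
train = pd.DataFrame({
    "Document": [
        "good movie with good script",
        "great story and great acting in this movie",
        "educational content about physics",
        "a textbook on education and ethics",
    ],
    "Class": ["movie", "movie", "education", "education"],
})
print(train)
```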


Next, convert the class label to a numerical variable:
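A sketch of the label conversion, assuming the two classes are "movie" and "education" (only the movie class is confirmed by the article; the second class name is a guess):

```python
import pandas as pd

train = pd.DataFrame({
    "Document": ["good movie with good script",
                 "educational content about physics"],
    "Class": ["movie", "education"],  # "education" is an assumed second class
})
# Map the string class to a numeric label: movie -> 1, education -> 0
train["label"] = train["Class"].map({"movie": 1, "education": 0})
print(train["label"].tolist())  # [1, 0]
```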


Then split the dataframe into the feature column X and the label column y:
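The split itself is one line per column (column names assumed as above):

```python
import pandas as pd

train = pd.DataFrame({
    "Document": ["good movie with good script",
                 "educational content about physics"],
    "Class": ["movie", "education"],  # assumed class names
})
train["label"] = train["Class"].map({"movie": 1, "education": 0})

X = train["Document"]  # the raw sentences (features)
y = train["label"]     # the numeric class labels (target)
```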


Now we have to convert the data into a format which can be used for training the model.

We’ll use the bag of words representation for each sentence (document).

Imagine breaking X into individual words and putting them all in a bag. Then we pick all the unique words from the bag one by one and make a dictionary of unique words.

This is called vectorization of words. scikit-learn provides the CountVectorizer() class to vectorize the words.

Here vec is an object of the CountVectorizer() class. Its fit() method converts a corpus of documents into a matrix of tokens.

CountVectorizer() converts the documents into a set of unique words, alphabetically sorted and indexed.

The statement above returns a vector containing all the words; the size of this vector is 39.

So what are stop words?

We can see a few trivial words such as "and", "is", "of", etc. These words don't really make any difference in classifying a document. They are called stop words, so it is recommended to get rid of them.

We can remove them by passing the parameter stop_words='english' while instantiating CountVectorizer(), as mentioned above:

This eliminates all stop words, and the vector size drops from 39 to 23.

So our final dictionary is made of 23 words (after discarding the stop words). Now, to do classification, we need to represent all the documents with these words (or tokens) as features.

Every document will be converted into a feature vector representing the presence of these words in that document. Let's convert each of our training documents into a feature vector.

Next, convert this sparse matrix into a more easily interpretable array:
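transform() returns a SciPy sparse matrix; toarray() densifies it (toy corpus assumed):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["good movie good script", "educational story"]
vec = CountVectorizer()
X = vec.fit_transform(docs)   # sparse document-term matrix
dense = X.toarray()           # dense array: one row per document
print(dense)
```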


To make the dataset more readable, let us examine the vocabulary and the document-term matrix together in a pandas dataframe:


Now import and transform the test data.

Our test data contains:


Convert the label to a numerical variable:


Convert the labels to a NumPy array:
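A sketch of the test-side label handling (the sample row and the "education" class are assumptions):

```python
import pandas as pd

test = pd.DataFrame({
    "Document": ["name some good movie with good script"],
    "Class": ["movie"],
})
test["label"] = test["Class"].map({"movie": 1, "education": 0})
y_test = test["label"].to_numpy()   # plain NumPy array for scikit-learn
print(y_test)
```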


Transform the test data

For Train dataset

  • vect.fit(train): learns the vocabulary of the training data
  • vect.transform(train): uses the fitted vocabulary to build a document-term matrix from the training data

For Test dataset

  • vect.transform(test): uses the fitted vocabulary to build a document-term matrix from the testing data (and ignores tokens it hasn't seen before)


This shows that, under the Multinomial Naive Bayes classifier, the statement "Name some good movie with good script." belongs to the movie class.
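A runnable sketch of the Multinomial step on a hypothetical toy corpus (the labels 1 = movie, 0 = education are assumptions; the article's real training data will differ):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = [
    "good movie with good script",
    "great story and great acting in this movie",
    "educational content about physics",
    "a textbook on education",
]
y = [1, 1, 0, 0]  # 1 = movie, 0 = education (assumed labels)

vec = CountVectorizer(stop_words="english")
X_train = vec.fit_transform(train_docs)

mnb = MultinomialNB()            # likelihoods built from word *counts*
mnb.fit(X_train, y)

query = vec.transform(["Name some good movie with good script."])
print(mnb.predict(query))        # predicts the movie class on this toy data
```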


Bernoulli Naive Bayes likewise classifies the statement "Name some good movie with good script." into the movie class.
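The Bernoulli variant looks only at word presence/absence; a sketch on the same hypothetical corpus (BernoulliNB binarizes the counts internally via its binarize=0.0 default):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

train_docs = [
    "good movie with good script",
    "great story and great acting in this movie",
    "educational content about physics",
    "a textbook on education",
]
y = [1, 1, 0, 0]  # 1 = movie, 0 = education (assumed labels)

vec = CountVectorizer(stop_words="english")
X_train = vec.fit_transform(train_docs)

bnb = BernoulliNB()              # likelihoods from word *presence/absence*
bnb.fit(X_train, y)              # counts are binarized internally

query = vec.transform(["Name some good movie with good script."])
print(bnb.predict(query))        # also predicts the movie class here
```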



© 2024 Solega, LLC. All Rights Reserved | Solega.co
