Text

The text package offers two main function which are clean_sentence, de_contract, generate_ngrams and generate_bigrams

The clean_sentence helper function clean the sentence by removing the noise from it.

from helperfns.text import *
# cleans the sentence
print(clean_sentence("text 1 # https://url.com/bla1/blah1/"))

Here is the table of arguments for the clean_sentence function:

Argument	Description	Type
sent	Input sentence	str
lower	Flag to convert to lower case (default: True)	bool

If you want to get a list of all english words you can do it as follows:

# list of all english words
print(english_words)

You can use the de_contract method to de-contract strings as follows:

# converts strings like `I'm` to 'I am'
print(de_contract("I'm"))

Here is the table of arguments for the de_contract function:

Argument	Description	Type
word	Word to de-contract	str

You can also generate bigrams using the generate_bigrams as follows:

# generate bigrams from a list of word
print(text.generate_bigrams(['This', 'film', 'is', 'terrible']))

Here is the table of arguments for the generate_bigrams function:

Argument	Description	Type
x	List of input elements	list

Apart from generating bigrams helperfns.text also provides you with a utility to generate n-grams using the generate_ngrams. Here is an example of how you can use this function

# generates n-grams from a list of words
print(text.generate_ngrams(['This', 'film', 'is', 'terrible']))

Here is a table of arguments for the generate_ngrams function.

Argument	Description	Type
x	List of input elements	list
grams	Number of grams for generating n-grams (default: 3)	int