Text
The text package offers four main functions: clean_sentence, de_contract, generate_ngrams and generate_bigrams.
The clean_sentence helper function cleans a sentence by removing noise from it.
from helperfns.text import *
# cleans the sentence
print(clean_sentence("text 1 # https://url.com/bla1/blah1/"))
Here is the table of arguments for the clean_sentence function:
| Argument | Description | Type |
|---|---|---|
| sent | Input sentence | str |
| lower | Flag to convert to lower case (default: True) | bool |
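
To make "removing the noise" more concrete, here is a minimal sketch of the kind of cleaning such a helper might perform. The name `clean_sentence_sketch` and the specific rules (dropping URLs, dropping non-alphanumeric characters, collapsing whitespace) are assumptions for illustration only and are not taken from the helperfns source; the real function may behave differently.

```python
import re


def clean_sentence_sketch(sent: str, lower: bool = True) -> str:
    """Hypothetical stand-in for illustration, not the helperfns implementation."""
    # Drop URLs first so their punctuation is not left behind.
    sent = re.sub(r"https?://\S+", " ", sent)
    # Replace anything that is not a letter, digit or whitespace with a space.
    sent = re.sub(r"[^A-Za-z0-9\s]", " ", sent)
    # Collapse runs of whitespace and trim both ends.
    sent = re.sub(r"\s+", " ", sent).strip()
    return sent.lower() if lower else sent


print(clean_sentence_sketch("Text 1 # https://url.com/bla1/blah1/"))
```

The `lower` flag mirrors the argument listed in the table above; passing `lower=False` keeps the original casing.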
If you want a list of all English words, you can get it as follows:
# list of all english words
print(english_words)
You can use the de_contract function to expand contractions in strings as follows:
# converts strings like `I'm` to 'I am'
print(de_contract("I'm"))
Here is the table of arguments for the de_contract function:
| Argument | Description | Type |
|---|---|---|
| word | Word to de-contract | str |
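
As a rough illustration of the idea, de-contraction can be treated as a dictionary lookup. The sketch below uses a hypothetical `de_contract_sketch` function and a tiny hand-written `CONTRACTIONS` table; both are assumptions for illustration, and the actual helperfns function may cover many more cases or work differently.

```python
# Hypothetical lookup table; a real implementation would cover far more contractions.
CONTRACTIONS = {
    "i'm": "I am",
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
}


def de_contract_sketch(word: str) -> str:
    """Illustrative only: expand a contraction if it is in the table."""
    # Look up case-insensitively and fall back to the word unchanged.
    return CONTRACTIONS.get(word.lower(), word)


print(de_contract_sketch("I'm"))
print(de_contract_sketch("film"))
```

Words that are not in the table are returned unchanged, which keeps the function safe to map over every token of a sentence.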
You can also generate bigrams using the generate_bigrams function as follows:
# generate bigrams from a list of words
print(generate_bigrams(['This', 'film', 'is', 'terrible']))
Here is the table of arguments for the generate_bigrams function:
| Argument | Description | Type |
|---|---|---|
| x | List of input elements | list |
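
For readers unfamiliar with bigrams, the following sketch shows one common way to build them: pair each element with its successor. The function name `bigrams_sketch` is hypothetical, and the output format (space-joined strings in order) is an assumption; the library function may order, deduplicate, or return its results differently.

```python
def bigrams_sketch(x: list) -> list:
    """Illustrative only: adjacent pairs of a token list, joined by a space."""
    # zip stops at the shorter input, so the last element has no partner.
    return [" ".join(pair) for pair in zip(x, x[1:])]


print(bigrams_sketch(['This', 'film', 'is', 'terrible']))
```

A list with fewer than two elements simply yields an empty result.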
Apart from generating bigrams, helperfns.text also provides a utility for generating n-grams, generate_ngrams. Here is an example of how you can use this function:
# generates n-grams from a list of words
print(generate_ngrams(['This', 'film', 'is', 'terrible']))
Here is a table of arguments for the generate_ngrams function.
| Argument | Description | Type |
|---|---|---|
| x | List of input elements | list |
| grams | Number of grams for generating n-grams (default: 3) | int |
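
The same idea generalizes from bigrams to n-grams with a sliding window of width `grams`. The sketch below is a minimal illustration under the assumption that n-grams are space-joined strings returned in order; `ngrams_sketch` is a hypothetical name, and the library function may format or combine its results differently.

```python
def ngrams_sketch(x: list, grams: int = 3) -> list:
    """Illustrative only: contiguous windows of `grams` tokens, joined by a space."""
    if grams < 1:
        raise ValueError("grams must be a positive integer")
    # Slide a window of width `grams` across the list; a list that is
    # shorter than the window produces no n-grams.
    return [" ".join(x[i:i + grams]) for i in range(len(x) - grams + 1)]


words = ['This', 'film', 'is', 'terrible']
print(ngrams_sketch(words))           # default window of 3
print(ngrams_sketch(words, grams=2))  # equivalent to bigrams
```

Using a default of 3 mirrors the default listed in the table above.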