Tokenisation
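Tokenisation splits a piece of text into individual tokens (usually words) so that each message can be processed word by word. First, load a few sample SMS messages into a pandas DataFrame: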
```python
import pandas as pd

# A small sample of SMS messages
content = ['hey boss!', 'great..', 'nice@']
df = pd.DataFrame(content, columns=['Sms content'])
```
```python
df
```

|   | Sms content |
|---|---|
| 0 | hey boss! |
| 1 | great.. |
| 2 | nice@ |
```python
import re

def tokenize(text):
    # Split on runs of non-word characters (anything other than letters, digits, or '_')
    tokens = re.split(r'\W+', text)
    return tokens

df['tokenized_text'] = df['Sms content'].apply(lambda row: tokenize(row.lower()))
```
```python
df.head()
```

|   | Sms content | tokenized_text |
|---|---|---|
| 0 | hey boss! | [hey, boss, ] |
| 1 | great.. | [great, ] |
| 2 | nice@ | [nice, ] |
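Because each message ends in punctuation, `re.split` leaves an empty string at the end of every token list. One possible refinement (not part of the original walkthrough) is to filter those empty tokens out:

```python
def tokenize(text):
    # Split on non-word characters, then discard empty strings
    return [token for token in re.split(r'\W+', text) if token]

df['tokenized_text'] = df['Sms content'].apply(lambda row: tokenize(row.lower()))
# 'hey boss!' now tokenises to ['hey', 'boss'] with no trailing empty token
```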