Tech Studio Online

Twitter Data streaming by using pipeline in PySpark

- November 06, 2021

Twitter data analysis using PySpark along with Pipeline

We process Twitter data using PySpark and have tried several methods to understand it. The tweet text is parsed in sequential stages, which is why we chain those stages in a pipeline. Calling the fit function on the pipeline trains the model, after which the computations are done.

from pyspark import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.streaming import StreamingContext
import pyspark.sql.types as tp
from pyspark.ml import Pipeline
# OneHotEncoderEstimator is the Spark 2.x name; Spark 3.0 renamed it to OneHotEncoder
from pyspark.ml.feature import StringIndexer, OneHotEncoderEstimator, VectorAssembler
from pyspark.ml.feature import StopWordsRemover, Word2Vec, RegexTokenizer
from pyspark.ml.classification import LogisticRegression
from pyspark.sql import Row, Column
import sys

# define the function to get the predicted sentiment on the data received
def get_prediction(tweet_text):
    t...
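The fit-then-transform flow this post describes can be sketched in plain Python, without a Spark installation. This is only a stand-in for the pattern, not the PySpark API and not the post's own code: the class names (`Tokenizer`, `StopWordFilter`, `WordVoteClassifier`, `Pipeline`), the `predict` helper, the tiny stop-word list, the training tweets, and the label convention (1 = positive, 0 = negative) are all assumptions made for illustration.

```python
# Plain-Python stand-in for the fit/transform pipeline pattern (NOT the PySpark API).
# All names and data below are hypothetical and exist only for illustration.
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "this", "i", "so"}  # tiny illustrative list


class Tokenizer:
    """Transformer: splits raw text into lowercase word tokens."""

    def transform(self, rows):
        return [dict(r, tokens=re.findall(r"\w+", r["text"].lower())) for r in rows]


class StopWordFilter:
    """Transformer: drops common words that carry little sentiment."""

    def transform(self, rows):
        return [dict(r, tokens=[t for t in r["tokens"] if t not in STOP_WORDS])
                for r in rows]


class WordVoteClassifier:
    """Estimator: counts words per label in fit, then predicts by summed counts."""

    def fit(self, rows):
        self.counts = {0: Counter(), 1: Counter()}
        for r in rows:
            self.counts[r["label"]].update(r["tokens"])
        return self  # the fitted object doubles as the model in this sketch

    def transform(self, rows):
        out = []
        for r in rows:
            pos = sum(self.counts[1][t] for t in r["tokens"])
            neg = sum(self.counts[0][t] for t in r["tokens"])
            out.append(dict(r, prediction=1 if pos > neg else 0))
        return out


class Pipeline:
    """Runs stages in order; fit trains any stage that has a fit method."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage.fit(rows)
            rows = stage.transform(rows)  # output of one stage feeds the next
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# Made-up training tweets; 1 = positive, 0 = negative (an assumption of this sketch).
train = [
    {"text": "I love this phone", "label": 1},
    {"text": "This is great", "label": 1},
    {"text": "I hate this phone so much", "label": 0},
    {"text": "This is terrible", "label": 0},
]

model = Pipeline([Tokenizer(), StopWordFilter(), WordVoteClassifier()]).fit(train)


def predict(text):
    """Push one piece of text through every fitted stage and return its label."""
    return model.transform([{"text": text}])[0]["prediction"]


print(predict("I love it"))       # → 1
print(predict("terrible phone"))  # → 0
```

The point of the sketch is the ordering: each stage consumes the previous stage's output, so fitting the whole pipeline once is enough to prepare every stage for new text.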