
Showing posts with the label Python

Twitter Data Streaming Using a Pipeline in PySpark

Twitter data analysis using PySpark with a Pipeline

We are processing Twitter data using PySpark, and we have tried to use every available method to understand it. The tweets are processed in three sequential stages, which is why we chain these stages together in a pipeline. Calling fit on the pipeline trains the model, after which the predictions are computed.

from pyspark import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.streaming import StreamingContext
import pyspark.sql.types as tp
from pyspark.ml import Pipeline
# OneHotEncoderEstimator exists only in Spark 2.3 and 2.4; Spark 3.x renamed it OneHotEncoder
from pyspark.ml.feature import StringIndexer, OneHotEncoderEstimator, VectorAssembler
from pyspark.ml.feature import StopWordsRemover, Word2Vec, RegexTokenizer
from pyspark.ml.classification import LogisticRegression
from pyspark.sql import Row, Column
import sys

# define the function to get the predicted sentiment on the data received
def get_prediction(tweet_text):
    t
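The pipeline idea above can be illustrated without a Spark cluster. The sketch below is a plain-Python analogue, not PySpark: every class name in it is hypothetical and only mimics the fit/transform contract of pyspark.ml stages. A regex tokenizer and a stop-word remover act as transformers, and a toy word-count classifier stands in for the Word2Vec plus LogisticRegression stages. Calling fit on the pipeline runs the stages in order and fits each estimator on the output of the earlier stages, which is why the stages have to be sequential.

```python
import re
from collections import Counter

# Hypothetical plain-Python stand-ins that mimic the fit/transform contract of
# pyspark.ml stages. This is NOT the PySpark API; it only shows the pattern.


class RegexTokenizerStage:
    """Transformer: lowercases the text and splits it on a regex."""

    def __init__(self, pattern=r"\W+"):
        self.pattern = pattern

    def transform(self, rows):
        return [
            dict(r, tokens=[t for t in re.split(self.pattern, r["text"].lower()) if t])
            for r in rows
        ]


class StopWordsStage:
    """Transformer: removes common words (hypothetical short stop-word list)."""

    def __init__(self, stop_words=("a", "and", "is", "the", "this")):
        self.stop_words = set(stop_words)

    def transform(self, rows):
        return [
            dict(r, filtered=[t for t in r["tokens"] if t not in self.stop_words])
            for r in rows
        ]


class WordCountClassifier:
    """Estimator: fit() counts words per label and returns a fitted model."""

    def fit(self, rows):
        counts = {}
        for r in rows:
            counts.setdefault(r["label"], Counter()).update(r["filtered"])
        return WordCountModel(counts)


class WordCountModel:
    """Fitted model: predicts the label whose training words overlap most."""

    def __init__(self, counts):
        self.counts = counts

    def transform(self, rows):
        out = []
        for r in rows:
            scores = {lbl: sum(c[w] for w in r["filtered"]) for lbl, c in self.counts.items()}
            out.append(dict(r, prediction=max(scores, key=scores.get)))
        return out


class ToyPipeline:
    """Runs stages in order; each estimator is fitted on the earlier stages' output."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator -> fitted model
            rows = stage.transform(rows)  # pass transformed rows to the next stage
            fitted.append(stage)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    """Fitted pipeline: applies every fitted stage in the same order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [
    {"text": "I love this phone", "label": "positive"},
    {"text": "great day and great food", "label": "positive"},
    {"text": "I hate the traffic", "label": "negative"},
    {"text": "this is awful and terrible", "label": "negative"},
]
pipeline = ToyPipeline([RegexTokenizerStage(), StopWordsStage(), WordCountClassifier()])
model = pipeline.fit(train)  # trains the model, as fit does on a Spark Pipeline
predictions = model.transform([{"text": "What a great phone"}, {"text": "Awful traffic today"}])
print([p["prediction"] for p in predictions])
```

In real PySpark the same shape would be a pyspark.ml.Pipeline whose stages list holds RegexTokenizer, StopWordsRemover, Word2Vec and LogisticRegression; the toy version just makes the ordering and the fit-then-transform hand-off visible.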

Basics of Streaming Data and Spark Streaming in PySpark

Here we will try to understand how to use a machine learning model to make predictions on streaming data using PySpark.

Overview of this post:
Streaming data is a fast-growing concept in the machine learning space.
We will learn how to use a machine learning model (such as logistic regression) to make predictions on streaming data using PySpark.
We'll cover the basics of streaming data and Spark Streaming.

Every second, more than 8,500 Tweets are sent, more than 900 photos are uploaded to Instagram, more than 4,200 Skype calls are made, more than 78,000 Google searches happen, and more than 2 million emails are sent (according to Internet Live Stats), and in 2021 these figures are much higher.

This raises two questions. How do we collect data at this scale? And how do we ensure that our machine learning pipeline continues to churn out results as soon as the data is generated and collected? These are significant challenges the industry is facing, and they are why the concept of streaming data is gaining more
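The overview above mentions applying an already trained model, such as logistic regression, to streaming data. Spark Streaming handles a stream as a series of small micro-batches, with the batch interval chosen when the StreamingContext is created. The sketch below imitates that idea in plain Python rather than Spark: it groups an incoming iterator of tweets into fixed-size micro-batches (a simplification, since Spark batches by time interval, not by count) and scores each batch with a logistic regression. The WEIGHTS and BIAS values are hypothetical and stand in for a model trained offline.

```python
import math
import re
from itertools import islice

# Hypothetical weights of an already-trained logistic regression model
# (positive weight -> pushes towards label 1, "positive").
WEIGHTS = {"love": 1.5, "great": 1.2, "hate": -1.8, "awful": -1.6}
BIAS = 0.1


def predict_proba(text):
    """P(label 1 | text) = sigmoid(bias + sum of word weights), word counts as features."""
    words = re.findall(r"\w+", text.lower())
    z = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


def micro_batches(stream, batch_size):
    """Lazily group a possibly unbounded iterator into lists of up to batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def score_stream(stream, batch_size=2, threshold=0.5):
    """Apply the model to each micro-batch as soon as that batch is complete."""
    for batch_id, batch in enumerate(micro_batches(stream, batch_size)):
        labels = [int(predict_proba(t) >= threshold) for t in batch]
        yield batch_id, labels


tweets = ["I love PySpark", "awful latency today", "great talk", "I hate waiting"]
results = list(score_stream(tweets, batch_size=2))
print(results)
```

Because score_stream is a generator, each batch's predictions are available before later tweets arrive, which is the property the post asks for: results that keep coming as soon as the data is generated.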