What is Big Data?

Big Data
======
How can we explain Big Data to someone who is completely new to it?
Big Data can be described using the 4 V's:
* Volume: the size of the data
* Velocity: the speed at which the data is generated
* Variety: the different forms the data takes, for example CCTV footage, video, and emails
* Veracity: the trustworthiness of the data (i.e., whether we can use it for making inferences)

So in short, Big Data is:
Large volumes of data, generated at high speed and in many varieties, that are trustworthy enough to draw inferences from.

We can store 10 MB of data on our local system, but what if the data is measured in petabytes (PB) or zettabytes (ZB)?
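
To get a feel for that jump in scale, here is a rough back-of-the-envelope sketch in Python. It assumes decimal (SI) units and a hypothetical 1 TB disk as the yardstick; the `disks_needed` helper is made up for illustration only.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "ZB": 10**21}

def disks_needed(amount, unit, disk_bytes=10**12):
    """Return how many disks of `disk_bytes` each are needed to hold the data.

    The default disk size (1 TB) is an assumption chosen for illustration.
    """
    total_bytes = amount * UNITS[unit]
    # Ceiling division: a partially filled disk still counts as one disk.
    return -(-total_bytes // disk_bytes)

for amount, unit in [(10, "MB"), (1, "PB"), (1, "ZB")]:
    print(f"{amount} {unit} -> {disks_needed(amount, unit):,} disk(s) of 1 TB")
```

Under these assumptions, 10 MB fits on a single disk, 1 PB needs a thousand of them, and 1 ZB needs a billion, which is clearly more than one machine can hold.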

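We need more infrastructure, right? More infrastructure means more money, so here Hadoop comes to our rescue: it is designed to run on clusters of inexpensive commodity machines rather than on costly specialized hardware.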

Hadoop is an open-source framework for storing and processing Big Data.
