Posts

Showing posts from August, 2021

Word count program using a MapReduce job

This is a simple MapReduce job that processes any text file and outputs each word with its number of occurrences. Program: package com.dpq.retail; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.GenericOptionsParser; public class WordCountDriver { public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException { Configuration c = new Configuration(); String[] fi
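The driver above is cut off mid-declaration, so here is a minimal, self-contained sketch of the same kind of word-count job, assuming the standard Hadoop Mapper/Reducer pattern; the mapper and reducer class names below are illustrative, not necessarily those of the original program.

package com.dpq.retail;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Mapper: emit (word, 1) for every token in the input line
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job like this would typically be submitted with something like hadoop jar wordcount.jar com.dpq.retail.WordCountDriver <input path> <output path> (the jar name here is hypothetical).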

Analysing stock data and getting the maximum selling price from a stock dataset using a MapReduce job

MaxClosingPricingMapReduceApp Here we get the maximum selling price for each stock symbol over the last 20 years. I have provided a small sample dataset; I also ran the same program with 10 GB of data on a cluster with 10 mappers, and it took around 35 seconds to process the data. We added a Partitioner just to understand how the data is partitioned and how a mapper is assigned to process a particular partition. We used a MapReduce job and HDFS storage to get the above stats; later I plan to compare it with Spark and see how Spark speeds up the process and reduces I/O. The sample data is also present in the inputFile directory in the same project, so you can see what is inside the data. Sample Data: ABCSE,B7J,2009-08-14,7.93,7.94,7.70,7.85,64600,7.68 ABCSE,B6J,2009-08-14,7.93,7.94,7.70,4.55,64600,6.68 ABCSE,B8J,2009-08-14,7.93,7.94,7.70,6.85,64600,4.68 ABCSE,B9J,2009-08-14,7.93,7.94,7.70,8.85,64600,73.68 ABCSE,A7J,2009-08-14,7.93,7.94,7.70,9.85,64600,7.68 ABCSE,S7J,2009-08-14,7
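Since the program itself is not shown in the excerpt, below is a minimal sketch of a MapReduce job that computes the maximum closing price per stock symbol; it assumes the symbol is the second comma-separated field and the closing price the seventh, as in the sample rows above, and the class names are illustrative. The custom Partitioner mentioned above is omitted to keep the sketch short.

package com.dpq.stocks;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxClosingPriceDriver {

    // Mapper: emit (symbol, closing price) for every CSV record
    public static class StockMapper
            extends Mapper<LongWritable, Text, Text, FloatWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length >= 7) {
                // fields[1] = stock symbol, fields[6] = closing price
                context.write(new Text(fields[1]),
                        new FloatWritable(Float.parseFloat(fields[6])));
            }
        }
    }

    // Reducer: keep the maximum closing price seen for each symbol
    public static class MaxReducer
            extends Reducer<Text, FloatWritable, Text, FloatWritable> {
        @Override
        protected void reduce(Text key, Iterable<FloatWritable> values, Context context)
                throws IOException, InterruptedException {
            float max = Float.NEGATIVE_INFINITY;
            for (FloatWritable v : values) {
                max = Math.max(max, v.get());
            }
            context.write(key, new FloatWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max closing price");
        job.setJarByClass(MaxClosingPriceDriver.class);
        job.setMapperClass(StockMapper.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}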

Program for n’th node from the end of a Linked List

Given a Linked List and a number n, write a function that returns the value at the n’th node from the end of the Linked List. For example, if the input is the list below and n = 3, then the output is “B”. Way 1 (Use length of linked list): 1) Calculate the length of the Linked List. Let the length be len. 2) Print the (len – n + 1)’th node from the beginning of the Linked List. // Simple Java program to find n'th node from end of linked list class LinkedList { Node head; // head of the list /* Linked List node */ class Node { int data; Node next; Node(int d) { data = d; next = null; } } /* Function to get the nth node from the last of a linked list */ void printNthFromLast(int n) { int len = 0; Node temp = head; // 1) count the number of nodes in Linked List while (temp != null) { temp = temp.next; len++;
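The code above is cut off after the length count, so here is a complete, minimal sketch of Way 1; the push() helper and the sample values in main() are only there to make the example runnable.

// Simple Java program to find the n'th node from the end of a linked list
class LinkedList {
    Node head; // head of the list

    /* Linked List node */
    class Node {
        int data;
        Node next;
        Node(int d) { data = d; next = null; }
    }

    /* Print the n'th node from the end of the list */
    void printNthFromLast(int n) {
        // 1) count the number of nodes in the list
        int len = 0;
        Node temp = head;
        while (temp != null) {
            temp = temp.next;
            len++;
        }

        // if n is larger than the list length, there is no such node
        if (len < n) {
            return;
        }

        // 2) walk to the (len - n + 1)'th node from the beginning
        temp = head;
        for (int i = 1; i < len - n + 1; i++) {
            temp = temp.next;
        }
        System.out.println(temp.data);
    }

    /* Push a new node at the front of the list */
    void push(int newData) {
        Node newNode = new Node(newData);
        newNode.next = head;
        head = newNode;
    }

    public static void main(String[] args) {
        LinkedList list = new LinkedList();
        list.push(20);
        list.push(4);
        list.push(15);
        list.push(35);
        list.printNthFromLast(4); // list is 35->15->4->20, so this prints 35
    }
}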

Implement Stack using Queues

We are given a Queue data structure that supports standard operations like enqueue() and dequeue(). We need to implement a Stack data structure using only instances of Queue and the queue operations allowed on those instances. A stack can be implemented using two queues. Let the stack to be implemented be ‘s’ and the queues used to implement it be ‘q1’ and ‘q2’. Stack ‘s’ can be implemented in two ways: Method 1 (By making the push operation costly) This method makes sure that the newly entered element is always at the front of ‘q1’, so that the pop operation just dequeues from ‘q1’. ‘q2’ is used to put every new element at the front of ‘q1’. The push(s, x) operation’s steps are described below: Enqueue x to q2. One by one, dequeue everything from q1 and enqueue it to q2. Swap the names of q1 and q2. The pop(s) operation’s steps are described below: Dequeue an item from q1 and return it. Below is the implementation of the above approach: /* Java Program to implement a stack using two queues */ import java.util.*; class GfG {
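The class above is cut off at its opening brace, so here is a minimal, self-contained sketch of Method 1 (costly push) built on java.util.Queue; the class name StackUsingQueues and the top() helper are illustrative additions.

import java.util.LinkedList;
import java.util.Queue;

class StackUsingQueues {
    Queue<Integer> q1 = new LinkedList<>();
    Queue<Integer> q2 = new LinkedList<>();

    // push: enqueue x to q2, drain q1 into q2, then swap the two queues
    // so the newest element is always at the front of q1
    void push(int x) {
        q2.add(x);
        while (!q1.isEmpty()) {
            q2.add(q1.remove());
        }
        Queue<Integer> tmp = q1;
        q1 = q2;
        q2 = tmp;
    }

    // pop: the newest element is at the front of q1, so just dequeue it
    int pop() {
        if (q1.isEmpty()) {
            throw new RuntimeException("Stack is empty");
        }
        return q1.remove();
    }

    // top: peek at the newest element without removing it
    int top() {
        if (q1.isEmpty()) {
            throw new RuntimeException("Stack is empty");
        }
        return q1.peek();
    }

    public static void main(String[] args) {
        StackUsingQueues s = new StackUsingQueues();
        s.push(1);
        s.push(2);
        s.push(3);
        System.out.println(s.pop()); // 3
        System.out.println(s.pop()); // 2
        System.out.println(s.top()); // 1
    }
}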

Implement queue using Stacks

We are given a stack data structure with push and pop operations; the task is to implement a queue using instances of the stack data structure and operations on them. A queue can be implemented using two stacks. Let the queue to be implemented be q and the stacks used to implement it be stack1 and stack2. q can be implemented in two ways: Way 1 (By making the enQueue operation costly): This way makes sure that the oldest entered element is always at the top of stack1, so that the deQueue operation just pops from stack1. To put an element at the top of stack1, stack2 is used. enQueue(q, x): While stack1 is not empty, push everything from stack1 to stack2. Push x to stack1 (assuming the size of the stacks is unlimited). Push everything back to stack1. Here the time complexity is O(n). deQueue(q): If stack1 is empty then raise an error. Otherwise pop an item from stack1 and return it. Here the time complexity is O(1). // Java program to implement Queue using // two stacks with costly enQueue() import java.util.*; class GFG { static class Que
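The excerpt ends before the implementation, so here is a minimal sketch of Way 1 (costly enQueue) built on java.util.Stack; the class name QueueUsingStacks is an illustrative choice.

import java.util.Stack;

class QueueUsingStacks {
    Stack<Integer> stack1 = new Stack<>();
    Stack<Integer> stack2 = new Stack<>();

    // enQueue: move everything to stack2, push x, then move everything back,
    // so the oldest element stays on top of stack1 (O(n))
    void enQueue(int x) {
        while (!stack1.isEmpty()) {
            stack2.push(stack1.pop());
        }
        stack1.push(x);
        while (!stack2.isEmpty()) {
            stack1.push(stack2.pop());
        }
    }

    // deQueue: the oldest element is on top of stack1, so a single pop suffices (O(1))
    int deQueue() {
        if (stack1.isEmpty()) {
            throw new RuntimeException("Queue is empty");
        }
        return stack1.pop();
    }

    public static void main(String[] args) {
        QueueUsingStacks q = new QueueUsingStacks();
        q.enQueue(1);
        q.enQueue(2);
        q.enQueue(3);
        System.out.println(q.deQueue()); // 1
        System.out.println(q.deQueue()); // 2
    }
}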

HDFS important and useful commands

To check the Hadoop version: $ hadoop version
Creating a user home directory: hadoop fs -mkdir -p /user/dpq (-p creates the directory if it is not present; if the directory is already present it won't throw an exception) hadoop fs -mkdir /user/retails (without -p, if the directory is already present it will throw an exception)
List all directories: hdfs dfs -ls / hdfs dfs -ls /user
Copy a file from local to HDFS: hadoop fs -copyFromLocal /etc/data/retail_data.csv /user/retails hadoop fs -put /etc/data/retail_data.csv /user/retails (if 'retail_data.csv' is already present inside the '/user/retails' HDFS directory then it will throw the exception 'File already present') hadoop fs -put -f /etc/data/retail_data.csv /user/retails (-f forcefully overrides retail_data.csv if it is already present)
Checking data in HDFS: hdfs dfs -cat /user/retails/retail_data.csv
Create an empty file on HDFS: hdfs dfs -touchz /user/retails/empty.csv
Copy a file from HDFS: hdfs dfs -get /user/retails/retail_data.csv hdfs dfs -

Apache Kafka®, Kafka Streams, and ksqlDB to demonstrate real use cases - 4 (Kafka-console-consumer-read-specific-offsets-partitions)

How to read from a specific offset and partition with the Kafka Console Consumer Question: How do I read from a specific offset and partition of a Kafka topic? Example use case: You are confirming record arrivals and you’d like to read from a specific offset in a topic partition. In this tutorial you’ll learn how to use the Kafka console consumer to quickly debug issues by reading from a specific offset as well as control the number of records you read. Short Answer Use the kafka-console-consumer command with the --partition and --offset flags to read from a specific partition and offset. kafka-console-consumer --topic example-topic --bootstrap-server broker:9092 \ --property print.key=true \ --property key.separator="-" \ --partition 1 \ --offset 6 Initialize the project To get started, make a new directory anywhere you’d like for this project: mkdir console-consumer-read-specific-offsets-partition && cd console-consumer-read-specific-offsets-partition 2 Ge