Tuesday, July 15, 2014

Big Data Hadoop Hive Getting Max of a Count


This Example about a query to get Max of a count.

Pre-Requiste
- Basic Hadoop Knowledge
- Basic Hive Knowledge
- Basic SQL Knowledge


My Input Data is something like as below,

1,iphone,2000,abc
2,iphone,3000,abc1
3,nokia,4000,abc2
4,sony,5000,abc3
5,nokia,6000,abc4
6,iphone,7000,abc5
7,nokia,8500,abc6
Problem:
In Hive we can perform group by as we do in ANSI SQL Example as below,

select d.phnName,count(*) from phnDetails d group by d.phnName

The output of the above query is as below,

iphone 3
nokia 3
sony 1

You might have a scenario something like to retrieve only values for equal to Max

Example if you need output like as below,

iphone 3
nokia 3

Resolution:

We need to use multiple Sub Queries to perform this operation

select c.phnName, c.counter 
from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c 
join 
(select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f) g 
where c.counter = g.countmax;

Output is as below,



Queries built in multiple iterations as below,

CREATE TABLE phnDetails ( id INT, phnName STRING, price INT, details STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/training/Phone/phones.txt' OVERWRITE INTO TABLE phnDetails;

select * from phnDetails;


select d.phnName, count(*) from phnDetails d group by d.phnName;


select c.phnName, c.counter from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c ;

select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f ;

select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f ;

select c.phnName, c.counter 
from 
(select d.phnName as phnName, count(*) as counter from phnDetails d group by d.phnName ) c 
join 
(select max(f.counter) as countmax from
(select cnt.phnName as phnName, count(*) as counter from phnDetails cnt group by cnt.phnName ) f) g 
where c.counter = g.countmax;