Channel: SreeRam Hadoop Notes
Browsing all 25 articles

Pig : Data types and Operators

Data types:
simple data types:
---------------------
  int --> 32 bit integer.
  long ---> 64 bit integer.
  float --> 32 bit float [ not available in latest version ]
  double -->...

Pig : How to perform grouping by Multiple Columns

how to perform grouping by multiple columns.
-------------------------------------------
task: multiple grouping with multiple aggregations.
sql:
 select dno, sex, sum(sal),
        count(*),...
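
The grouping task above can be sketched with plain Scala collections, without a cluster. This is only an illustration: the object name `MultiGroup`, the `Emp` case class, and the helper names are hypothetical, and the field order (id, name, sal, sex, dno) is assumed from the emp schema used elsewhere in these notes.

```scala
object MultiGroup {
  // One employee record; the field order id, name, sal, sex, dno is an assumption.
  final case class Emp(id: Int, name: String, sal: Int, sex: String, dno: Int)

  // Parse one comma-separated line into an Emp.
  def parse(line: String): Emp = {
    val w = line.split(",")
    Emp(w(0).toInt, w(1), w(2).toInt, w(3), w(4).toInt)
  }

  // Group by the composite key (dno, sex), then compute sum(sal) and count(*) per group.
  def sumAndCount(emps: Seq[Emp]): Map[(Int, String), (Int, Int)] =
    emps.groupBy(e => (e.dno, e.sex)).map { case (key, rows) =>
      (key, (rows.map(_.sal).sum, rows.size))
    }

  // A few sample rows for illustration.
  val sample: Seq[Emp] = Seq(
    "101,aaaa,40000,m,11",
    "102,bbbbbb,50000,f,12",
    "103,cccc,50000,m,12",
    "105,ee,10000,m,12"
  ).map(parse)
}
```

Using a tuple as the grouping key is what makes this a multi-column grouping; each group then yields both aggregates in one pass over its rows.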

Pig : Entire Column Aggregations

Entire column aggregations.
select sum(sal) from emp;
grunt> describe emp
emp: {id: int,name: chararray,sal: int,sex: chararray,dno: int}
grunt> esal = foreach emp generate sal;
grunt> rsum =...

Pig : Word Count Using Pig Data Flow

Word Count Using Pig DataFlow:
[cloudera@quickstart ~]$ cat comment
hadoop is great spark is great hadoop and spark combination is great
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal comment...
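
For comparison, the same word-count data flow (tokenize, group by word, count) can be sketched in plain Scala. This is not the Pig script from the article, which is truncated here. The object name `WordCount` is hypothetical, and the split of the sample text into three lines is an assumption.

```scala
object WordCount {
  // Split every line on whitespace, drop empty tokens, and count occurrences per word.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, hits) => (word, hits.size) }

  // Text of the `comment` file; the line breaks are assumed.
  val comment: Seq[String] = Seq(
    "hadoop is great",
    "spark is great",
    "hadoop and spark combination is great"
  )
}
```

The counts do not depend on where the line breaks fall, since every line is flattened into individual words before grouping.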

Spark : Entire Column Aggregations

Entire Column Aggregations:
sql:
  select sum(sal) from emp;
scala> val emp = sc.textFile("/user/cloudera/spLab/emp")
emp: org.apache.spark.rdd.RDD[String] = /user/cloudera/spLab/emp...
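
The rest of the session is truncated, so the following is only a sketch of the idea on a local collection rather than an RDD: project the sal field out of each line and add the values up. The object name `ColumnSum`, the sample rows, and the assumption that sal is the third comma-separated field are illustrative.

```scala
object ColumnSum {
  // Keep only the sal field (assumed to be the third column) and sum it,
  // mirroring `select sum(sal) from emp`.
  def totalSal(lines: Seq[String]): Long =
    lines.map(_.split(",")(2).toLong).sum

  // Sample rows for illustration.
  val sample: Seq[String] = Seq(
    "101,aaaa,40000,m,11",
    "102,bbbbbb,50000,f,12",
    "103,cccc,50000,m,12"
  )
}
```

On an RDD of lines, the same projection with `map` could be followed by `reduce(_ + _)` to get the total.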

Spark : Handling CSV files .. Removing Headers

scala> val l = List(10,20,30,40,50,56,67)
scala> val r2 = r.collect.reverse.take(3)
r2: Array[Int] = Array(67, 56, 50)
scala> val r2 = sc.parallelize(r.collect.reverse.take(3))
r2:...

Spark : Conditional Transformations

Conditional Transformations:
val trans = emp.map{ x =>
      val w = x.split(",");
      val sal = w(2).toInt
      val grade = if(sal>=70000) "A" else
                   if(sal>=50000)...
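
Since the snippet above is cut off, here is a self-contained sketch of the same pattern: derive a grade from sal with a chained if/else and append it to the record. The thresholds 70000 and 50000 come from the snippet; the grade labels "B" and "C", the object name `Grades`, and the output layout are assumptions.

```scala
object Grades {
  // Thresholds follow the snippet; the labels "B" and "C" are assumed.
  def grade(sal: Int): String =
    if (sal >= 70000) "A"
    else if (sal >= 50000) "B"
    else "C"

  // Split a record, read sal from the third field, and append the derived grade.
  def transform(line: String): String = {
    val w = line.split(",")
    val sal = w(2).toInt
    (w :+ grade(sal)).mkString(",")
  }
}
```

For example, `Grades.transform("104,dd,90000,f,13")` yields `104,dd,90000,f,13,A`. The same function body could be passed to `map` on a collection of lines.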

Pig : CoGroup examples Vs Union Examples

-- co grouping
grunt> cat...

Spark : Union and Distinct

Unions in spark.
val l1 = List(10,20,30,40,50)
val l2 = List(100,200,300,400,500)
val r1 = sc.parallelize(l1)
val r2 = sc.parallelize(l2)
val r = r1.union(r2)
scala> r.collect.foreach(println)
[Stage...
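
To illustrate how union and distinct differ, here is a minimal sketch on local Scala lists instead of RDDs. The object name `UnionDemo` and the helper names are hypothetical; `l1` and `l2` are the lists from the snippet above.

```scala
object UnionDemo {
  val l1 = List(10, 20, 30, 40, 50)
  val l2 = List(100, 200, 300, 400, 500)

  // Union keeps every element of both inputs, duplicates included.
  def union(a: List[Int], b: List[Int]): List[Int] = a ++ b

  // Distinct removes repeated values, keeping the first occurrence of each.
  def unionDistinct(a: List[Int], b: List[Int]): List[Int] = (a ++ b).distinct
}
```

An RDD union likewise keeps duplicates, so a separate `distinct` step is needed to drop them. Unlike the local list version, the element order after a distributed `distinct` should not be relied on.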

Spark : CoGroup And Handling Empty Compact Buffers

Co Grouping using Spark:
--------------------------
scala>...

Pig : load Operator

Load Operator:
--------------
to load data from file to relation.
[cloudera@quickstart ~]$ cat > samp1
10020030040050090010012023123900800
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal samp1...

Pig : Subsetting using Filter, Limit, Sample

Techniques of subsetting relations:
 i) filter: used for conditional filtering.
 ii) limit : takes first n number of tuples.
 iii) sample: to take random sample sets.
     " with replace " model.
filter:...

Pig : Foreach Operator

Foreach Operator:
-------------------
grunt> emp = load 'piglab/emp' using PigStorage(',')
>>  as (id:int, name:chararray, sal:int,
>>     sex:chararray, dno:int);
i) to copy data from one...

Spark : Joins

[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp spLab/e
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal dept spLab/d
[cloudera@quickstart ~]$ hadoop fs -cat...

Spark : Joins 2

Denormalizing datasets using Joins
[cloudera@quickstart ~]$ cat > children
c101,p101,Ravi,34
c102,p101,Rani,24
c103,p102,Mani,20
c104,p103,Giri,22
c105,p102,Vani,22
[cloudera@quickstart ~]$ cat >...

Pig : Order [ Sorting ] , exec, run , pig

order :-
  to sort data (tuples) in ascending or descending order.
emp = load 'piglab/emp'
    using PigStorage(',')
    as (id:int, name:chararray,
       sal:int, sex:chararray, dno:int);
e1 = order emp...

Pig : Joins

[cloudera@quickstart ~]$ hadoop fs -cat spLab/e
101,aaaa,40000,m,11
102,bbbbbb,50000,f,12
103,cccc,50000,m,12
104,dd,90000,f,13
105,ee,10000,m,12
106,dkd,40000,m,12
107,sdkfj,80000,f,13...

Pig : Cross Operator for Cartesian Product

Cross:
-----
  used for cartesian product.
  each element of left set joins with each element of right set.
 ds1 --> (a)
         (b)
         (c)
 ds2 --> (1)
         (2)
 x = cross ds1, ds3...
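
What a cross produces can be sketched in a few lines of plain Scala: every element of the left set is paired with every element of the right set. The object name `CrossDemo` and the `cross` helper are hypothetical; `ds1` and `ds2` hold the values listed above.

```scala
object CrossDemo {
  // Pair every element of the left collection with every element of the right one.
  def cross[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] =
    for (l <- left; r <- right) yield (l, r)

  val ds1: Seq[String] = Seq("a", "b", "c")
  val ds2: Seq[Int] = Seq(1, 2)
}
```

With 3 elements on the left and 2 on the right the result has 3 x 2 = 6 pairs, which is why a cross on large relations grows quickly and is usually followed by a filter.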

Pig : UDFs

Pig UDFS
----------
 UDF ---> user defined functions.
    adv:
      i)  custom functionalities.
     ii)  reusability.
Pig UDFs can be developed by
   java
   python
   ruby
   c++
   javascript...

Spark : Spark streaming and Kafka Integration

steps:
 1) start zookeeper server
 2) start Kafka brokers [ one or more ]
 3) create topic.
 4) start console producer [ to write messages into topic ]
 5) start console consumer [ to test, whether...
