之前部门实现row_number是使用的transform,我觉得用UDF实现后,平时的使用会更方便,免去了transform相对繁琐的语法。
<wbr><wbr><wbr>用到的测试表为:</wbr></wbr></wbr>
hive> desc row_number_test;
OK
id1<wbr><wbr><wbr><wbr>int<br>
id2<wbr><wbr><wbr><wbr>string<br>
age<wbr><wbr><wbr><wbr>int<br>
score<wbr><wbr> double<br>
name<wbr><wbr><wbr>string</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
hive> select * from row_number_test;
OK
2<wbr><wbr><wbr><wbr><wbr><wbr>t04<wbr><wbr><wbr><wbr>25<wbr><wbr><wbr><wbr><wbr>60.0<wbr><wbr><wbr>youlia<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t01<wbr><wbr><wbr><wbr>20<wbr><wbr><wbr><wbr><wbr>85.0<wbr><wbr><wbr>liujiannan<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>30<wbr><wbr><wbr><wbr><wbr>88.0<wbr><wbr><wbr>hongqu<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>27<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>yongqi<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>19<wbr><wbr><wbr><wbr><wbr>75.0<wbr><wbr><wbr>wangdong<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
使用时要先在子查询中进行分区与排序,比如oracle中这样一句SQL:
select row_number() over (partition by id1 order by age desc)from row_number_test;
转换为hive语句应该是:
select row_number(id1) from<wbr><wbr>--partition by的字段传到row_number函数中去</wbr></wbr>
<wbr><wbr><wbr> (select *from row_number_test distribute by id1 sort by id1,age desc) a;</wbr></wbr></wbr>
<wbr></wbr>
如果partition by 两个字段:
select row_number() over (partition by id1,id2 orderby<wbr>score) from row_number_test;</wbr>
转换为hive语句应该是:
select row_number(id1,id2)<wbr><wbr>--partition by的字段传到row_number函数中去</wbr></wbr>
<wbr><wbr><wbr> from(select * from row_number_test distribute by id1,id2 sort byid1,id2,score) a;</wbr></wbr></wbr>
<wbr></wbr>
展示一下查询结果:
1.
select id1,id2,age,score,name,row_number(id1) rn from (select *from row_number_test distribute by id1 sort by id1,age desc) a;
<wbr></wbr>
OK
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>30<wbr><wbr><wbr><wbr><wbr>88.0<wbr><wbr><wbr>hongqu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>27<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>yongqi<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>2<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t04<wbr><wbr><wbr><wbr>25<wbr><wbr><wbr><wbr><wbr>60.0<wbr><wbr><wbr>youlia<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>3<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>2<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t01<wbr><wbr><wbr><wbr>20<wbr><wbr><wbr><wbr><wbr>85.0<wbr><wbr><wbr>liujiannan<wbr><wbr><wbr><wbr><wbr>3<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>19<wbr><wbr><wbr><wbr><wbr>75.0<wbr><wbr><wbr>wangdong<wbr><wbr><wbr><wbr><wbr><wbr><wbr>4</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
2.
select id1,id2,age,score,name,row_number(id1,id2) rn from(select * from row_number_test distribute by id1,id2 sortby id1,id2,score) a;
<wbr></wbr>
OK
2<wbr><wbr><wbr><wbr><wbr><wbr>t04<wbr><wbr><wbr><wbr>25<wbr><wbr><wbr><wbr><wbr>60.0<wbr><wbr><wbr>youlia<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>zengqiu<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>27<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr>yongqi<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>1<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>24<wbr><wbr><wbr><wbr><wbr>70.0<wbr><wbr><wbr> zengqiu<wbr><wbr><wbr><wbr> 2<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t02<wbr><wbr><wbr><wbr>19<wbr><wbr><wbr><wbr><wbr>75.0<wbr><wbr><wbr> wangdong<wbr><wbr><wbr><wbr>3<br>
1<wbr><wbr><wbr><wbr><wbr><wbr>t01<wbr><wbr><wbr><wbr>20<wbr><wbr><wbr><wbr><wbr>85.0<wbr><wbr><wbr>liujiannan<wbr><wbr><wbr><wbr><wbr>1<br>
2<wbr><wbr><wbr><wbr><wbr><wbr>t03<wbr><wbr><wbr><wbr>30<wbr><wbr><wbr><wbr><wbr>88.0<wbr><wbr><wbr> hongqu<wbr><wbr><wbr><wbr><wbr>2</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>
<wbr></wbr>
下面是代码,只实现了接收1个参数和2个参数的evaluator方法,参数再多的照搬代码就可以了,代码仅供参考:
package org.rowincrement;
import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
public class RowIncrement extends GenericUDTF {
Object[] result = new Object[1];
@Override
public void close() throws HiveException {
}
@Override
public StructObjectInspector initialize(ObjectInspector[] args)
throws UDFArgumentException {
if (args.length != 1) {
throw new UDFArgumentLengthException("RowIncrement takes only one argument");
}
if (!args[0].getTypeName().equals("int")) {
throw new UDFArgumentException("RowIncrement only takes an integer as a parameter");
}
ArrayList<String> fieldNames = new ArrayList<String>();
ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
fieldNames.add("col1");
fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,fieldOIs);
}
@Override
public void process(Object[] args) throws HiveException {
try
{
int n = Integer.parseInt(args[0].toString());
for (int i=0;i<n;i++)
{
result[0] = i+1;
forward(result);
}
}
catch (Exception e) {
throw new HiveException("RowIncrement has an exception");
}
}
public static void main(String args[])
{
Row_number t = new Row_number();
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(123));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1234));
System.out.println(t.evaluate(1235));
}
}
分享到:
相关推荐
02.hive内置函数--窗口分析函数--row_number_over.mp4
hive的udf函数实现
简单介绍了hive自定义函数的编写步骤以及使用。
udf函数,用户自定义函数,可以直接在sql语句中计算的函数 优点: 允许实现模块化的程序设计、方便修改代码、增加函数 UDF的执行速度很快,通过缓存计划在语句重复执行时降低代码的编译开销,比存储方法的执行效率...
hive中分组取topN、row_number、rank和dense_rank使用介绍
NULL 博文链接:https://chengjianxiaoxue.iteye.com/blog/2235666
Spark不能使用hive自定义函数
05.hive中如何自定义函数--json解析函数示例.mp4
hive数仓、hive SQL 、 hive自定义函数 、hive参数深入浅出
udf开发–做个简单脱敏udf保留前5位,后面全部替换成*****
大数据分析工具 hive 高级分析函数的使用与优化,笔记整理!
自定义 hive udf udaf 有url解析,获取网站主域名,根据ip获取区域码,有rownum,列聚合以及一些业务实现udf。
hive-udfhive自定义函数主要实现hive3种自定义函数1,udf函数,主要用于处理一对一数据处理2,udtf函数,主要用于处理一对多数据处理2,udaf函数,主要用与处理多对一数据聚合处理
地址转换成经纬度+两地址间距离计算+省市区位置解析(Java代码) Hive自定义函数的封装
hive自定义函数demo
大数据的hive资源的详细代码设计以及分享,望博友相互交流
* 功能:对一些敏感信息进行脱敏处理,替换方式可选择自定义替换,如'#','*'等,,如不指定脱敏符号,使用个随机字符替换 * 脱敏位置可自定义,不指定位置,会对数据进行全脱敏 * 例如身份证信息: ...
Hive自定义函数 一. UDF(user defined function) 背景 系统内置函数无法解决所有的实际业务问题,需要开发者自己编写函数实现自身的业务实现诉求。 应用场景非常多,面临的业务不同导致个性化实现很多,故udf...
利用Hive进行复杂用户行为大数据分析及优化案例(全套视频+课件+代码+讲义+工具软件),具体内容包括: 01_自动批量加载数据到hive 02_Hive表批量加载数据的脚本实现(一) ...17_Hive中使用Python脚本进行预处理
hive按月份加减udf范例,http://blog.csdn.net/xiaowenK/article/details/54290354