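Problem description
I have recently been using Java Arrow for data processing and ran into some problems related to slicing. In my test class I define a BufferAllocator and a data-producing function, prepareData.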
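import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class TestAllocator {
    private final RootAllocator rootAllocator = new RootAllocator();
    // PropertyUtil is assumed to be the project's own configuration helper;
    // it supplies the child allocator's initial reservation.
    private final BufferAllocator allocator = rootAllocator
            .newChildAllocator("test", PropertyUtil.getLongProperty("child.allocator.init"), Long.MAX_VALUE);

    // Builds a 10-row VectorSchemaRoot with five columns, all backed by `allocator`.
    public VectorSchemaRoot prepareData() {
        List<FieldVector> fieldVectors = new ArrayList<>();
        BigIntVector fieldVector1 = new BigIntVector("totalPrice", allocator);
        IntVector fieldVector2 = new IntVector("ordNum", allocator);
        Float8Vector fieldVector3 = new Float8Vector("avgPrice", allocator);
        VarCharVector fieldVector4 = new VarCharVector("poiName", allocator);
        VarCharVector fieldVector5 = new VarCharVector("cityName", allocator);
        List<String> cityNames = Arrays.asList(
                "beijing", "shanghai", "guangzhou", "shenzhen", "chengdu");
        int rowNum = 10;
        String a = UUID.randomUUID().toString();
        for (int i = 0; i < rowNum; i++) {
            fieldVector1.setSafe(i, 100000L);
            fieldVector2.setSafe(i, 3456);
            fieldVector3.setSafe(i, 26.45);
            fieldVector4.setSafe(i, a.getBytes(StandardCharsets.UTF_8));
            fieldVector5.setSafe(i, cityNames.get(i % 5).getBytes(StandardCharsets.UTF_8));
        }
        fieldVector1.setValueCount(rowNum);
        fieldVector2.setValueCount(rowNum);
        fieldVector3.setValueCount(rowNum);
        fieldVector4.setValueCount(rowNum);
        fieldVector5.setValueCount(rowNum);
        fieldVectors.add(fieldVector1);
        fieldVectors.add(fieldVector2);
        fieldVectors.add(fieldVector3);
        fieldVectors.add(fieldVector4);
        fieldVectors.add(fieldVector5);
        return new VectorSchemaRoot(fieldVectors);
    }
}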
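Here are the points I want to describe, with examples.
- When a child VectorSchemaRoot is sliced from a parent VectorSchemaRoot, closing the parent does not affect the child, and the off-heap memory is not released. The off-heap memory is released only once both the parent and the child VectorSchemaRoot have been closed.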
@Test
public void test1() {
    System.out.println(allocator.toString());
    System.out.println("**************************");
    VectorSchemaRoot root = prepareData();
    System.out.println("root row count is " + root.getRowCount());
    System.out.println(allocator.toString());
    System.out.println("**************************");
    VectorSchemaRoot root2 = root.slice(0, 9);
    root.close();
    // Report the slice's row count, not the row count of the closed parent.
    System.out.println("root2 row count is " + root2.getRowCount());
    System.out.println(allocator.toString());
    System.out.println("**************************");
    root2.close();
    System.out.println(this.allocator.toString());
    System.out.println("**************************");
}
The output is:
Allocator(test) 104857600/0/0/9223372036854775807 (res/actual/peak/limit)
**************************
root row count is 10
Allocator(test) 104857600/180224/180224/9223372036854775807 (res/actual/peak/limit)
**************************
root2 row count is 9
Allocator(test) 104857600/180224/180224/9223372036854775807 (res/actual/peak/limit)
**************************
Allocator(test) 104857600/0/180224/9223372036854775807 (res/actual/peak/limit)
**************************
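- However, when the child VectorSchemaRoot is a full slice of the parent (for example slice(0)), calling close on the parent VectorSchemaRoot has the same effect as calling clear on the child VectorSchemaRoot.
Change the line VectorSchemaRoot root2 = root.slice(0,9); in test1 to VectorSchemaRoot root2 = root.slice(0);. This run also printed the contents of root2 after the parent was closed, and every column comes back empty. The output is: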
Allocator(test) 104857600/0/0/9223372036854775807 (res/actual/peak/limit)
**************************
root row count is 10
Allocator(test) 104857600/180224/180224/9223372036854775807 (res/actual/peak/limit)
**************************
root2 row count is 10
root2 is:
totalPrice:
ordNum:
avgPrice:
poiName:
cityName:
Allocator(test) 104857600/0/180224/9223372036854775807 (res/actual/peak/limit)
**************************
Allocator(test) 104857600/0/180224/9223372036854775807 (res/actual/peak/limit)
**************************
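One way to read the two outputs together is as reference counting on a shared buffer. The sketch below is a toy model in plain Java, not Arrow code: Allocator, Buffer, and Root are hypothetical stand-ins, and the rule that a full-range slice hands back the parent itself is an assumption inferred from the second output, not something the post confirms about the library.

```java
import java.util.Arrays;

public class SliceModel {
    // Toy allocator: only tracks how many bytes are currently accounted for.
    static final class Allocator {
        long allocated;
    }

    // Toy off-heap buffer with a reference count.
    static final class Buffer {
        private final Allocator owner;
        private final long size;
        private int refCount = 1;

        Buffer(Allocator owner, long size) {
            this.owner = owner;
            this.size = size;
            owner.allocated += size;
        }

        void retain() {
            refCount++;
        }

        // Memory goes back to the allocator only when the last holder releases.
        void release() {
            if (--refCount == 0) {
                owner.allocated -= size;
            }
        }
    }

    // Toy root: one buffer plus a row count.
    static final class Root implements AutoCloseable {
        private Buffer buffer;
        private final int rowCount;

        Root(Buffer buffer, int rowCount) {
            this.buffer = buffer;
            this.rowCount = rowCount;
        }

        Root slice(int index, int length) {
            // ASSUMPTION: a slice covering every row returns the parent itself.
            if (index == 0 && length == rowCount) {
                return this;
            }
            // A partial slice shares the parent's buffer and bumps its count.
            buffer.retain();
            return new Root(buffer, length);
        }

        // Model only: slice from index to the last row.
        Root slice(int index) {
            return slice(index, rowCount - index);
        }

        @Override
        public void close() {
            if (buffer != null) {
                buffer.release();
                buffer = null;
            }
        }
    }

    // Returns {bytes held after closing the parent, bytes held after closing the child}.
    static long[] partialSlice() {
        Allocator a = new Allocator();
        Root root = new Root(new Buffer(a, 180224), 10);
        Root child = root.slice(0, 9);
        root.close();
        long afterParentClose = a.allocated;
        child.close();
        return new long[] {afterParentClose, a.allocated};
    }

    static long[] fullSlice() {
        Allocator a = new Allocator();
        Root root = new Root(new Buffer(a, 180224), 10);
        Root child = root.slice(0); // same object as root in this model
        root.close();
        long afterParentClose = a.allocated;
        child.close(); // harmless second close
        return new long[] {afterParentClose, a.allocated};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partialSlice())); // [180224, 0]
        System.out.println(Arrays.toString(fullSlice()));    // [0, 0]
    }
}
```

In this model the partial slice keeps the 180224 bytes alive until the child is closed, while the full slice is released as soon as the parent is closed, which mirrors the actual column in the two allocator printouts.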
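- Whether or not the child VectorSchemaRoot is explicitly assigned to a variable, every call to slice must be followed by a call to close, otherwise memory leaks. The best practice is therefore to assign the child VectorSchemaRoot of each slice to a named variable and to call close on it after use.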
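A minimal sketch of that practice, assuming the Arrow Java vector and memory modules are on the classpath: because VectorSchemaRoot implements AutoCloseable, both the parent and the slice can be declared in a single try-with-resources statement so that neither close call can be forgotten. The class name SliceCloseExample and the method allocatedAfterClose are made up for illustration, and the example uses one column instead of the five in prepareData to stay short.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;

public class SliceCloseExample {
    // Returns the bytes still allocated after the parent and the slice are closed.
    static long allocatedAfterClose() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
            IntVector ordNum = new IntVector("ordNum", allocator);
            for (int i = 0; i < 10; i++) {
                ordNum.setSafe(i, 3456);
            }
            ordNum.setValueCount(10);

            // Resources are closed in reverse order: the slice first, then the parent.
            try (VectorSchemaRoot root = VectorSchemaRoot.of(ordNum);
                 VectorSchemaRoot part = root.slice(0, 9)) {
                System.out.println("slice row count is " + part.getRowCount());
            }
            return allocator.getAllocatedMemory();
        }
    }

    public static void main(String[] args) {
        System.out.println("allocated after close: " + allocatedAfterClose());
    }
}
```

Naming the slice in the resource list is the point of the design: a slice used only as a temporary expression leaves nothing to call close on.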