Problem description
benchmark_x benchmark_y ref_point_x ref_point_y
0 525039.140 175445.518 525039.145 175445.539
1 525039.022 175445.542 525039.032 175445.568
2 525038.944 175445.558 525038.954 175445.588
3 525038.855 175445.576 525038.859 175445.576
4 525038.797 175445.587 525038.794 175445.559
5 525038.689 175445.609 525038.679 175445.551
6 525038.551 175445.637 525038.544 175445.577
7 525038.473 175445.653 525038.459 175445.594
8 525038.385 175445.670 525038.374 175445.610
9 525038.306 175445.686 525038.289 175445.626
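I'm trying to find the shortest distance from the line to the benchmark. If the line is above the benchmark, the distance should be positive; if it is below the benchmark, the distance should be negative. See the figure below:
I used KDTree from scipy like this: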
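import pandas as pd
from scipy.spatial import KDTree

# build a KD-tree over the benchmark points
tree=KDTree(df[["benchmark_x","benchmark_y"]])
# for each ref point, query() returns (distance to nearest benchmark point, row index of that point)
test = df.apply(lambda row: tree.query(row[["ref_point_x","ref_point_y"]]),axis=1)
test=test.apply(pd.Series,index=["distance","index"])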
This seems to work, except that it can't produce negative values where the line is below the benchmark.
Solution
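Here is a minimal sketch of why the sign is lost. When `KDTree.query` receives an (n, 2) array, it returns the Euclidean distances and nearest-point indices for all rows at once, so the row-wise `df.apply` is not strictly needed. A Euclidean distance can never be negative. The three points below are the first rows of the sample data.

```python
import numpy as np
from scipy.spatial import KDTree

# First three rows of the sample data from the question
benchmark = np.array([[525039.140, 175445.518],
                      [525039.022, 175445.542],
                      [525038.944, 175445.558]])
ref_points = np.array([[525039.145, 175445.539],
                       [525039.032, 175445.568],
                       [525038.954, 175445.588]])

tree = KDTree(benchmark)
# Querying an (n, 2) array returns two length-n arrays: the Euclidean distance
# to the nearest benchmark point, and that point's row index in `benchmark`.
distances, indices = tree.query(ref_points)

# A Euclidean norm is never negative, so "above/below" needs a separate test.
print(distances)
print(indices)
```

Because of this, the sign has to come from a separate comparison of the two lines' y values.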
import numpy as np
import pandas as pd

# recreating your example
columns = "benchmark_x benchmark_y ref_point_x ref_point_y".split(" ")
data = """525039.140 175445.518 525039.145 175445.539
525039.022 175445.542 525039.032 175445.568
525038.944 175445.558 525038.954 175445.588
525038.855 175445.576 525038.859 175445.576
525038.797 175445.587 525038.794 175445.559
525038.689 175445.609 525038.679 175445.551
525038.551 175445.637 525038.544 175445.577
525038.473 175445.653 525038.459 175445.594
525038.385 175445.670 525038.374 175445.610
525038.306 175445.686 525038.289 175445.626"""
data = [float(x) for x in data.replace("\n"," ").split(" ") if len(x)>0]
arr = np.array(data).reshape(-1,4)
df = pd.DataFrame(arr,columns=columns)
# adding your two new columns to the df
from scipy.spatial import KDTree
tree=KDTree(df[["benchmark_x","benchmark_y"]])
df["distance"],df["index"] = tree.query(df[["ref_point_x","ref_point_y"]])
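Now, to check whether one line lies above the other, we have to evaluate both y values at the same x position. So we interpolate the ref line's y values at the benchmark line's x positions.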
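The interpolation step is shown below as a small sketch with made-up numbers. These arrays are illustrative and do not come from the question. `np.interp(x, xp, fp)` assumes `xp` is increasing and does not check this, which is why the samples are sorted first. Any x outside the sampled range is clamped to the nearest end value rather than extrapolated.

```python
import numpy as np

# A tiny "ref" polyline whose x samples are not in increasing order
ref_x = np.array([3.0, 1.0, 2.0])
ref_y = np.array([30.0, 10.0, 20.0])

# np.interp expects increasing xp and silently misbehaves otherwise, so sort first
order = np.argsort(ref_x)
xp, fp = ref_x[order], ref_y[order]

# x positions of the other ("benchmark") line where we want the ref line's y
bench_x = np.array([1.5, 2.5, 4.0])
ref_y_at_bench_x = np.interp(bench_x, xp, fp)
print(ref_y_at_bench_x)  # → [15. 25. 30.]  (4.0 is outside the range, so it is clamped to 30)
```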
df = df.sort_values("ref_point_x") # np.interp expects the x samples (xp) in increasing order
xy_refpoint = df[["ref_point_x","ref_point_y"]].values
df["ref_point_y_at_benchmark_x"] = np.interp(df["benchmark_x"],xy_refpoint[:,0],xy_refpoint[:,1])
Finally, you can evaluate and apply your criterion:
df["distance"] = np.where(df["ref_point_y_at_benchmark_x"] < df["benchmark_y"],-df["distance"],df["distance"])
# or swap < for >, <= or >= depending on the sign convention you want
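The steps above can be combined into one helper, sketched below. The name `signed_nearest_distance` is hypothetical. Unlike the answer, it sorts the ref line with `np.argsort` instead of reordering the DataFrame. Like the answer, it measures distance to the nearest benchmark *point*, not to the benchmark polyline between points. It also pairs each row's distance with the sign test at that same row's benchmark x. Both choices are assumptions worth checking against your data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def signed_nearest_distance(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper combining the KDTree, interpolation and sign steps.

    Distance: Euclidean distance from each ref point to its nearest benchmark
    vertex (not to the segments between vertices).
    Sign: negative where the interpolated ref line is below the benchmark at
    that row's benchmark_x, positive otherwise.
    """
    bench = df[["benchmark_x", "benchmark_y"]].to_numpy()
    ref = df[["ref_point_x", "ref_point_y"]].to_numpy()

    distance, _ = KDTree(bench).query(ref)

    # np.interp needs increasing xp; sort the ref samples without reordering df
    order = np.argsort(ref[:, 0])
    ref_y_at_bench_x = np.interp(bench[:, 0], ref[order, 0], ref[order, 1])

    signed = np.where(ref_y_at_bench_x < bench[:, 1], -distance, distance)
    return pd.Series(signed, index=df.index, name="signed_distance")


# Sample data from the question
rows = [
    (525039.140, 175445.518, 525039.145, 175445.539),
    (525039.022, 175445.542, 525039.032, 175445.568),
    (525038.944, 175445.558, 525038.954, 175445.588),
    (525038.855, 175445.576, 525038.859, 175445.576),
    (525038.797, 175445.587, 525038.794, 175445.559),
    (525038.689, 175445.609, 525038.679, 175445.551),
    (525038.551, 175445.637, 525038.544, 175445.577),
    (525038.473, 175445.653, 525038.459, 175445.594),
    (525038.385, 175445.670, 525038.374, 175445.610),
    (525038.306, 175445.686, 525038.289, 175445.626),
]
df = pd.DataFrame(rows, columns=["benchmark_x", "benchmark_y", "ref_point_x", "ref_point_y"])
result = signed_nearest_distance(df)
print(result)
```

Keeping `df` in its original row order makes the result easier to join back onto other columns. If you need true point-to-line distances, you would have to replace the KDTree step with a point-to-segment computation.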