Nested iteration over two large DataFrames in Python

Question 1

Suppose I have the following data frames:

# data frame circles
    ID   x   y
    1    4   5
    2    5   6
# data frame points
    ID   x   y
    1    2   1
    2    1   2

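I want to check, for every point, whether it lies inside each circle. If a point lies inside a circle (according to some calculation), I take the circle's ID and save it in a separate list.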

# output of the list (ID of the circles)
[1]
[1 2]

This means that point 1 lies inside circle 1, and point 2 lies inside both circles.

So far, I have written the following code to do the lookup.

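    from geopy import distance  # assumption: great_circle(...).km matches geopy's API

    # one list of matching circle IDs per point
    circles_per_point = [[] for _ in range(len(points))]

    for i in range(len(points)):
        point_coords = (points.at[i, "x"], points.at[i, "y"])
        for j in range(len(circles)):
            circle_coords = (circles.at[j, "x"], circles.at[j, "y"])
            # call the library's function to calculate the distance;
            # the result gets its own name so the `distance` module is not overwritten
            dist_km = distance.great_circle(point_coords, circle_coords).km
            if dist_km <= 5:
                circles_per_point[i].append(circles.at[j, "ID"])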

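This is a nested loop that iterates over every point and checks all circles one by one.

The problem: the original data frames have more than 100000 rows, so the iteration takes forever.

I read some posts about using apply to handle large data, so I tried the following, but it did not work (error: "The truth value of a Series is ambiguous").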

    for i in range(len(circles)):
        newlist = newDataFrame['result'].apply(get_distance_function(circles.loc[i].at["x"], circles.loc[i].at["y"], points['x'], points['y']))

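But I think this is still a problem, because I only removed the inner for loop: I still need to iterate 100000 times instead of 100000*100000 times.

So, is there a better idea? Or is this approach already the shortest one, and I should just fix the error?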

Answer 1

import pandas as pd
points = pd.DataFrame({"ID": [1, 2], "x": [4, 5], "y": [5, 6]})
circle = pd.DataFrame({"ID": [1, 2], "x": [2, 1], "y": [1, 2]})

To get all (point, circle) combinations, we can do a cross join.

new_df = points.merge(circle, how='cross', suffixes=["_point", "_circle"])
new_df

    ID_point   x_point  y_point ID_circle   x_circle    y_circle
0          1         4        5         1          2           1
1          1         4        5         2          1           2
2          2         5        6         1          2           1
3          2         5        6         2          1           2
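
A note on scale: a cross join materializes len(points) * len(circle) rows, so two frames of 100000 rows each (the size mentioned in the question) would produce 10^10 rows, which is unlikely to fit in memory. A minimal sketch of one way around this, assuming Euclidean distance as used later in this answer: process the points in chunks and let NumPy broadcasting compare each chunk against all circles at once. The function name `circles_containing_points` and the `chunk_size` parameter are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd


def circles_containing_points(points, circles, radius, chunk_size=100):
    """Return {point ID: [IDs of circles whose centre is within `radius`]}."""
    cx = circles["x"].to_numpy(dtype=float)
    cy = circles["y"].to_numpy(dtype=float)
    circle_ids = circles["ID"].to_numpy()

    result = {}
    for start in range(0, len(points), chunk_size):
        chunk = points.iloc[start:start + chunk_size]
        # Column vectors, so that subtraction broadcasts to (chunk rows, n_circles)
        px = chunk["x"].to_numpy(dtype=float)[:, None]
        py = chunk["y"].to_numpy(dtype=float)[:, None]
        # Distance matrix for this chunk only, never for the full cross product
        dist = np.hypot(px - cx[None, :], py - cy[None, :])
        within = dist <= radius
        for point_id, mask in zip(chunk["ID"], within):
            result[point_id] = circle_ids[mask].tolist()
    return result


points = pd.DataFrame({"ID": [1, 2], "x": [4, 5], "y": [5, 6]})
circle = pd.DataFrame({"ID": [1, 2], "x": [2, 1], "y": [1, 2]})
print(circles_containing_points(points, circle, radius=5))
```

The total number of distance computations is unchanged; `chunk_size` only bounds how many of them are held in memory at once, so it trades memory for fewer Python-level iterations.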

This way, we can compare one point against one circle on each row. We use apply at the row level (axis=1), compute the distance, and add it as a new column.

import math

# this is a Euclidean distance function (feel free to change it to suit your need)
def get_distance_function(x1, y1, x2, y2):
    return math.sqrt((x1-x2)**2 + (y1-y2)**2)
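
If `x` and `y` are geographic coordinates, as the `great_circle(...).km` call in the question suggests, the Euclidean helper could be swapped for a haversine formula that works on whole columns. A minimal sketch, assuming NumPy, coordinates in degrees, and a spherical Earth with a mean radius of 6371 km; `haversine_km` and `EARTH_RADIUS_KM` are illustrative names, and the result may differ slightly from the library function used in the question.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays/Series."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula, evaluated element-wise
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

The question does not say which of `x` and `y` holds the latitude, so the argument order has to be chosen accordingly when passing the `x_point`, `y_point`, `x_circle`, and `y_circle` columns.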


new_df["distance"] = new_df.apply(lambda row: get_distance_function(row["x_point"], row["y_point"], row["x_circle"], row["y_circle"]), axis=1)
new_df

    ID_point    x_point y_point ID_circle   x_circle    y_circle    distance
0          1          4       5         1          2           1    4.472136
1          1          4       5         2          1           2    4.242641
2          2          5       6         1          2           1    5.830952
3          2          5       6         2          1           2    5.656854
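
The row-wise `apply` calls a Python function once per row, which tends to be slow on large frames. A minimal sketch of the same Euclidean distance computed with whole-column arithmetic, assuming NumPy is available; `np.hypot` is a swap-in for the `math.sqrt` helper and is not part of the original answer.

```python
import numpy as np
import pandas as pd

points = pd.DataFrame({"ID": [1, 2], "x": [4, 5], "y": [5, 6]})
circle = pd.DataFrame({"ID": [1, 2], "x": [2, 1], "y": [1, 2]})

new_df = points.merge(circle, how="cross", suffixes=["_point", "_circle"])

# Column-wise arithmetic: one vectorized operation instead of one Python call per row
new_df["distance"] = np.hypot(
    new_df["x_point"] - new_df["x_circle"],
    new_df["y_point"] - new_df["y_circle"],
)
print(new_df)
```

The resulting `distance` column should match the one produced by `apply`, since both compute the square root of the summed squared differences.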

With that distance, we can check whether it is within the radius (set to 5 in this example) and group ID_point by ID_circle to collect the points into a list.

radius = 5
new_df[new_df["distance"]<=radius].groupby("ID_circle")["ID_point"].apply(list).reset_index()

   ID_circle    ID_point 
0          1         [1]
1          2         [1]
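
The question asks for the reverse grouping: for each point, the list of circle IDs that contain it. A minimal sketch, assuming the same sample frames and the Euclidean helper from this answer; `circles_per_point` is an illustrative name, and points that fall inside no circle get an empty list.

```python
import math

import pandas as pd

points = pd.DataFrame({"ID": [1, 2], "x": [4, 5], "y": [5, 6]})
circle = pd.DataFrame({"ID": [1, 2], "x": [2, 1], "y": [1, 2]})
radius = 5


def get_distance_function(x1, y1, x2, y2):
    # Euclidean distance, as in the answer above
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


new_df = points.merge(circle, how="cross", suffixes=["_point", "_circle"])
new_df["distance"] = new_df.apply(
    lambda row: get_distance_function(
        row["x_point"], row["y_point"], row["x_circle"], row["y_circle"]
    ),
    axis=1,
)

# Group the circle IDs by point instead of the point IDs by circle
inside = new_df[new_df["distance"] <= radius]
grouped = inside.groupby("ID_point")["ID_circle"].apply(list).to_dict()

# Points with no matching circle are missing from `grouped`; give them []
circles_per_point = {point_id: grouped.get(point_id, []) for point_id in points["ID"]}
print(circles_per_point)
```

With the sample data in this answer, point 1 is within the radius of both circles and point 2 is within the radius of neither, as the distance table above shows.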
