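Merge 2 dfs using string contains and multiple columns

I have two DFs that I want to merge, but I need to merge them based on a string-contains match that uses multiple columns.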

df_1

    IN          Start_Time          Description                                                                     Per_Extr
0   IN7305517   2022-07-24 00:06:59 ABEND JOB PP_BRAI_VAR_CARTAO_IND_IBI_D and JOB_STREAM_NAME P26_BRAI_RS2...      FROM : 2022/01/08 TO : 2022/12/09
1   IN7305465   2022-07-24 00:09:49 ABEND JOB PP_AAAR_4898_POUP_MOV_TDCH_D and JOB_STREAM_NAME P26_AAAR_006_TSA...  FROM : 2022/01/08 TO : 2022/12/09
2   IN7305466   2022-07-24 00:10:16 ABEND JOB PP_AAAR_4898_POUPMOV_D and JOB_STREAM_NAME P26_AAAR_006_TSA...        FROM : 2022/01/08 TO : 2022/12/09
3   IN7305493   2022-07-24 00:20:27 ABEND JOB PP_BGDTPRODHBACMS102020_01_M and JOB_STREAM_NAME P26_BGDTDCHF_PUM...  FROM : 2022/01/08 TO : 2022/12/09

df_2

    JOB_STREAM_NAME     JOB_NAME
NaN P26_BRAI_RS2        PP_BRAI_VAR_CARTAO_IND_IBI_D
NaN P26_BRAI_VAR_TOD    PP_BRAI_VAR_CARTAO_IND_IBI_D
NaN P26_AAAR_006_TSA    PP_AAAR_4898_POUP_MOV_TDCH_D
NaN P26_AAAR_006_TSA    PP_AAAR_4898_POUPMOV_D
NaN P26_BGDTDCHF_PUM    PP_BGDTPRODHBACMS102020_01_M

The Description column contains both the JOB_NAME and the JOB_STREAM_NAME.

My goal is a df like this: merged_df

    IN          JOB_STREAM_NAME     JOB_NAME                        Start_Time          Description                                                                     Per_Extr
0   IN7305517   P26_BRAI_RS2        PP_BRAI_VAR_CARTAO_IND_IBI_D    2022-07-24 00:06:59 ABEND JOB PP_BRAI_VAR_CARTAO_IND_IBI_D and JOB_STREAM_NAME P26_BRAI_RS2...      FROM : 2022/01/08 TO : 2022/12/09
1   NaN         P26_BRAI_VAR_TOD    PP_BRAI_VAR_CARTAO_IND_IBI_D    NaN                 NaN                                                                             NaN
2   IN7305465   P26_AAAR_006_TSA    PP_AAAR_4898_POUP_MOV_TDCH_D    2022-07-24 00:09:49 ABEND JOB PP_AAAR_4898_POUP_MOV_TDCH_D and JOB_STREAM_NAME P26_AAAR_006_TSA...  FROM : 2022/01/08 TO : 2022/12/09
3   IN7305466   P26_AAAR_006_TSA    PP_AAAR_4898_POUPMOV_D          2022-07-24 00:10:16 ABEND JOB PP_AAAR_4898_POUPMOV_D and JOB_STREAM_NAME P26_AAAR_006_TSA...        FROM : 2022/01/08 TO : 2022/12/09
4   IN7305493   P26_BGDTDCHF_PUM    PP_BGDTPRODHBACMS102020_01_M    2022-07-24 00:20:27 ABEND JOB PP_BGDTPRODHBACMS102020_01_M and JOB_STREAM_NAME P26_BGDTDCHF_PUM...  FROM : 2022/01/08 TO : 2022/12/09

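Note that the job PP_BRAI_VAR_CARTAO_IND_IBI_D appears under 2 JOB_STREAM_NAMEs, and only one of them has an IN. That is why, in merged_df, the job under JOB_STREAM_NAME=P26_BRAI_VAR_TOD has no IN (NaN).

I was shown how to do this with a single column, but I don't know how to do the same with multiple columns.

With a single column, I used this approach: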

jobs_list = "|".join(map(str, df_2['JOB_NAME']))
new_df = df_1.copy()
new_df.insert(0, 'merge_key', new_df['Description'].str.extract("(" + jobs_list + ")", expand=False))
df_merged = new_df.merge(df_2, how='right', left_on='merge_key', right_on='JOB_NAME').drop('merge_key', axis=1)

Can you help me?

Best answer:

You need a key to merge the two frames on, so we extract the keys from Description and merge on them.

# extract the keys from the description and create additional columns
# you can always drop these afterwards
df_1[['JOB_NAME', 'JOB_STREAM_NAME']] = df_1['Description'].str.extract(r'JOB\s\b(\w+)\b.*?JOB_STREAM_NAME\s\b(\w+)\b')

# merge on stream name and job name; since the key columns share names, they are not repeated
df3 = df_2.merge(df_1, on=['JOB_STREAM_NAME', 'JOB_NAME'], how='left')

# drop the additional columns from df_1 (df3 is not affected)
df_1 = df_1.drop(columns=['JOB_STREAM_NAME', 'JOB_NAME'])

df3
    JOB_STREAM_NAME   JOB_NAME                      IN         Start_Time           Description                                        Per_Extr
0   P26_BRAI_RS2      PP_BRAI_VAR_CARTAO_IND_IBI_D  IN7305517  2022-07-24 00:06:59  ABEND JOB PP_BRAI_VAR_CARTAO_IND_IBI_D and JOB...  FROM : 2022/01/08 TO : 2022/12/09
1   P26_BRAI_VAR_TOD  PP_BRAI_VAR_CARTAO_IND_IBI_D  NaN        NaN                  NaN                                                NaN
2   P26_AAAR_006_TSA  PP_AAAR_4898_POUP_MOV_TDCH_D  IN7305465  2022-07-24 00:09:49  ABEND JOB PP_AAAR_4898_POUP_MOV_TDCH_D and JOB...  FROM : 2022/01/08 TO : 2022/12/09
3   P26_AAAR_006_TSA  PP_AAAR_4898_POUPMOV_D        IN7305466  2022-07-24 00:10:16  ABEND JOB PP_AAAR_4898_POUPMOV_D and JOB_STREA...  FROM : 2022/01/08 TO : 2022/12/09
4   P26_BGDTDCHF_PUM  PP_BGDTPRODHBACMS102020_01_M  IN7305493  2022-07-24 00:20:27  ABEND JOB PP_BGDTPRODHBACMS102020_01_M and JOB...  FROM : 2022/01/08 TO : 2022/12/09
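
To try the extract-then-merge approach end to end, here is a minimal self-contained sketch. It rebuilds the two frames from the sample rows in the question (the Description strings are truncated with "..." there and are kept truncated here). The names keys and keyed are illustrative only, and joining the keys onto a copy instead of adding columns to df_1 is a small variation, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; Description values stay truncated.
df_1 = pd.DataFrame({
    "IN": ["IN7305517", "IN7305465", "IN7305466", "IN7305493"],
    "Start_Time": pd.to_datetime([
        "2022-07-24 00:06:59", "2022-07-24 00:09:49",
        "2022-07-24 00:10:16", "2022-07-24 00:20:27",
    ]),
    "Description": [
        "ABEND JOB PP_BRAI_VAR_CARTAO_IND_IBI_D and JOB_STREAM_NAME P26_BRAI_RS2...",
        "ABEND JOB PP_AAAR_4898_POUP_MOV_TDCH_D and JOB_STREAM_NAME P26_AAAR_006_TSA...",
        "ABEND JOB PP_AAAR_4898_POUPMOV_D and JOB_STREAM_NAME P26_AAAR_006_TSA...",
        "ABEND JOB PP_BGDTPRODHBACMS102020_01_M and JOB_STREAM_NAME P26_BGDTDCHF_PUM...",
    ],
    "Per_Extr": ["FROM : 2022/01/08 TO : 2022/12/09"] * 4,
})

df_2 = pd.DataFrame({
    "JOB_STREAM_NAME": [
        "P26_BRAI_RS2", "P26_BRAI_VAR_TOD", "P26_AAAR_006_TSA",
        "P26_AAAR_006_TSA", "P26_BGDTDCHF_PUM",
    ],
    "JOB_NAME": [
        "PP_BRAI_VAR_CARTAO_IND_IBI_D", "PP_BRAI_VAR_CARTAO_IND_IBI_D",
        "PP_AAAR_4898_POUP_MOV_TDCH_D", "PP_AAAR_4898_POUPMOV_D",
        "PP_BGDTPRODHBACMS102020_01_M",
    ],
})

# Two capture groups give a two-column frame: job name, then stream name.
pattern = r"JOB\s\b(\w+)\b.*?JOB_STREAM_NAME\s\b(\w+)\b"
keys = df_1["Description"].str.extract(pattern)
keys.columns = ["JOB_NAME", "JOB_STREAM_NAME"]

# Attach the keys to a copy (index-aligned join) so df_1 stays unchanged.
keyed = df_1.join(keys)

# Left merge from df_2 keeps every (stream, job) pair, matched or not.
merged_df = df_2.merge(keyed, on=["JOB_STREAM_NAME", "JOB_NAME"], how="left")
print(merged_df[["JOB_STREAM_NAME", "JOB_NAME", "IN"]])
```

Because the merge is a left join from df_2, the P26_BRAI_VAR_TOD row is kept with NaN in the columns that come from df_1. Passing indicator=True to merge adds a _merge column that marks such unmatched rows explicitly.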
Regex breakdown:
JOB\s : match the literal JOB followed by \s (a whitespace character)
\b : word boundary
(\w+)\b : capture one or more word characters followed by a word boundary (this is the job name)
.*? : match any characters, as few as possible (non-greedy)
JOB_STREAM_NAME\s\b : match the literal followed by whitespace, followed by a word boundary
(\w+)\b : capture one or more word characters followed by a word boundary (this is the job stream name)
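
The pattern can also be checked on its own with Python's re module before using it in str.extract. The sketch below runs it against one of the truncated Description values from the question. The second input string is a hypothetical example, added only to show what happens when JOB_STREAM_NAME is missing.

```python
import re

pattern = re.compile(r"JOB\s\b(\w+)\b.*?JOB_STREAM_NAME\s\b(\w+)\b")

# One Description value from the question (truncated with "..." there).
description = "ABEND JOB PP_AAAR_4898_POUPMOV_D and JOB_STREAM_NAME P26_AAAR_006_TSA..."
match = pattern.search(description)
job_name, job_stream_name = match.groups()
print(job_name)         # PP_AAAR_4898_POUPMOV_D
print(job_stream_name)  # P26_AAAR_006_TSA

# Hypothetical input without a JOB_STREAM_NAME part: the pattern needs both
# markers, so there is no match. str.extract would return NaN for such a row.
no_match = pattern.search("ABEND JOB PP_AAAR_4898_POUPMOV_D")
print(no_match)         # None
```

Since \w does not match ".", the second capture stops before the trailing "..." of the truncated description.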