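I have two DataFrames that I want to merge wherever a value in the words column of df1 contains a value from the keywords column of df2. I have been trying to use str.extract, but so far I have had no luck getting the expected result. An example follows: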
df1:
[{'id': 1, 'words': 'chellomedia', 'languages': nan},
{'id': 2, 'words': 'Moien Welt!', 'languages': 'Luxemburgish'},
{'id': 3, 'words': 'Ahoj světe!', 'languages': 'Czech'},
{'id': 4, 'words': 'hello world', 'languages': nan},
{'id': 5, 'words': '?Hola Mundo!', 'languages': 'Spanish'},
{'id': 6, 'words': 'hello kitty', 'languages': 'English'},
{'id': 7, 'words': 'Ciao mondo!', 'languages': 'Italian'},
{'id': 8, 'words': 'hola world', 'languages': nan}]
df2:
[{'code': 1, 'keywords': 'Hello'},
{'code': 2, 'keywords': 'hola'},
{'code': 3, 'keywords': 'world'}]
My attempt:
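import re
import pandas as pd
# Lowercase both columns so the match is case-insensitive
df1['words'] = df1['words'].str.lower()
df2['keywords'] = df2['keywords'].str.lower()
pat = '|'.join([re.escape(x) for x in df2.keywords])
# str.extract keeps only the first match found in each row
df1.insert(0, 'keywords', df1['words'].str.extract('(' + pat + ')', expand=False))
pd.merge(df1, df2, on='keywords', how='left')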
Out:
keywords id words languages code
0 hello 1 chellomedia NaN 1.0
1 NaN 2 moien welt! Luxemburgish NaN
2 NaN 3 ahoj světe! Czech NaN
3 hello 4 hello world NaN 1.0
4 hola 5 ?hola mundo! Spanish 2.0
5 hello 6 hello kitty English 1.0
6 NaN 7 ciao mondo! Italian NaN
7 hola 8 hola world NaN 2.0
But the desired output should look like this:
keywords id words languages code
0 hello 1 chellomedia NaN 1.0
1 NaN 2 moien welt! Luxemburgish NaN
2 NaN 3 ahoj světe! Czech NaN
3 hello 4 hello world NaN 1.0
4 world 4 hello world NaN 3.0 ---> should be generated in df
5 hola 5 ?hola mundo! Spanish 2.0
6 hello 6 hello kitty English 1.0
7 NaN 7 ciao mondo! Italian NaN
8 hola 8 hola world NaN 2.0
9 world 8 hola world NaN 3.0 ---> should be generated in df
How can I produce the expected result? Thanks.
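You have to use findall and explode instead of extract. str.extract keeps only the first keyword matched in each row, whereas str.findall returns every match and explode then gives each match its own row before the merge. The output is exactly what you need :)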
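The answer's original code did not survive, so the following is only a minimal sketch of the findall and explode approach it describes, not necessarily the answerer's exact code. It rebuilds df1 and df2 from the question's sample data; the names matches and out are introduced here purely for illustration.

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'words': ['chellomedia', 'Moien Welt!', 'Ahoj světe!', 'hello world',
              '?Hola Mundo!', 'hello kitty', 'Ciao mondo!', 'hola world'],
    'languages': [np.nan, 'Luxemburgish', 'Czech', np.nan,
                  'Spanish', 'English', 'Italian', np.nan],
})
df2 = pd.DataFrame({'code': [1, 2, 3], 'keywords': ['Hello', 'hola', 'world']})

# Lowercase both sides so the match is case-insensitive
df1['words'] = df1['words'].str.lower()
df2['keywords'] = df2['keywords'].str.lower()

# Alternation pattern built from the escaped keywords
pat = '|'.join(re.escape(x) for x in df2['keywords'])

# findall returns a list of every match per row (empty list when nothing matches)
matches = df1['words'].str.findall(pat)

# explode gives each matched keyword its own row; empty lists become NaN,
# so rows without any keyword are kept by the left merge
out = (
    df1.assign(keywords=matches)
       .explode('keywords')
       .merge(df2, on='keywords', how='left')
       [['keywords', 'id', 'words', 'languages', 'code']]
)
print(out)
```

With the sample data this should yield ten rows, with ids 4 and 8 each appearing twice (once per matched keyword), in line with the desired table in the question.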