Split rows in a CSV by comma-separated values

Question 1

I have a CSV file whose columns hold an ID and some text, as in the sample below:

1, Šildomos grindys
2, Šildomos grindys, Rekuperacinė sistema
3, 
4, Skalbimo mašina, Su baldais, Šaldytuvas, Šildomos grindys

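The output I want puts each ID on its own row, paired with one of its text values (for loading into a database). The CSV file is very large, so here is just a small part to show what I mean: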

| ID             | Features   
+----------------+-------------
| 1              | Šildomos grindys
| 2              | Šildomos grindys
| 2              | Rekuperacinė sistema
| 3              | null
| 4              | Skalbimo mašina
| 4              | Su baldais
| 4              | Šaldytuvas
| 4              | Šildomos grindys

How can I do this in Python? Thanks.

Answer 1

Here is one way to do what you asked:

# Build one [ID, feature] record per feature; an empty feature becomes 'null'.
with open('infoo.txt', 'r', encoding="utf-8") as f:
    records = []
    # Split every line on commas and trim surrounding whitespace.
    # Note: a plain split(',') does not handle quoted fields that contain commas.
    rows = [[x.strip() for x in row.split(',')] for row in f]
    for row in rows:
        # row[0] is the ID; each remaining field is one feature.
        for i in range(1, len(row)):
            records.append([row[0], row[i] if row[i] else 'null'])
    with open('outfoo.txt', 'w', encoding="utf-8") as g:
        g.write('ID,Features\n')
        for record in records:
            g.write(','.join(record) + '\n')

# check the output file:
with open('outfoo.txt', 'r', encoding="utf-8") as f:
    print('contents of output file:')
    for row in f:
        print(row.rstrip('\n'))

Output:

contents of output file:
ID,Features
1,Šildomos grindys
2,Šildomos grindys
2,Rekuperacinė sistema
3,null
4,Skalbimo mašina
4,Su baldais
4,Šaldytuvas
4,Šildomos grindys
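Because the file is described as very large, and because a plain `split(',')` would also cut a quoted field such as `"a, b"` in two, a variant built on the standard-library `csv` module may be worth considering. This is a sketch, not part of the original answer. It assumes the input has no header row and that each comma may be followed by a space (hence `skipinitialspace=True`). The helper names `explode_rows`, `write_exploded` and `convert` are hypothetical. It streams rows instead of collecting them in lists, and it differs slightly from the code above: empty fields are dropped, and an ID with no features at all still gets a single `null` row.

```python
import csv
import io
from typing import Iterator, List, TextIO


def explode_rows(src: TextIO) -> Iterator[List[str]]:
    """Yield one [ID, feature] pair per feature, reading the input lazily.

    Hypothetical helper (not from the original answer). Assumes there is no
    header row. Empty fields are dropped, and an ID with no features at all
    gets a single 'null' feature so it still appears in the output.
    """
    # skipinitialspace ignores the blank after each comma, so a quoted field
    # such as  "a, b"  is still recognised and kept whole.
    for row in csv.reader(src, skipinitialspace=True):
        if not row:
            continue  # completely blank line
        id_, *features = (field.strip() for field in row)
        for feature in [f for f in features if f] or ['null']:
            yield [id_, feature]


def write_exploded(src: TextIO, dst: TextIO) -> None:
    """Write an 'ID,Features' header, then one row per (ID, feature) pair."""
    writer = csv.writer(dst)
    writer.writerow(['ID', 'Features'])
    writer.writerows(explode_rows(src))


def convert(in_path: str, out_path: str) -> None:
    """File-based wrapper; newline='' is what the csv module expects."""
    with open(in_path, newline='', encoding='utf-8') as src, \
            open(out_path, 'w', newline='', encoding='utf-8') as dst:
        write_exploded(src, dst)


# Small in-memory demo instead of real files (the quoted field is made up).
demo = io.StringIO('1, Šildomos grindys\n2, Šildomos grindys, "a, b"\n3, \n')
for pair in explode_rows(demo):
    print(pair)
```

With the file names from the answer above, `convert('infoo.txt', 'outfoo.txt')` writes the same `ID,Features` layout. Note that `csv.writer` ends lines with `\r\n` by default; pass `lineterminator='\n'` to `csv.writer` if that matters for the database import.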

UPDATE:

Another approach is to use pandas (docs). Pandas offers many powerful ways to work with tabular data, though it has a bit of a learning curve:

import pandas as pd

with open('infoo.txt', 'r', encoding="utf-8") as f:
    # Split every line on commas and trim surrounding whitespace.
    rows = [[x.strip() for x in row.split(',')] for row in f]

# One row per input line: the ID plus a list of its features.
df = pd.DataFrame([[row[0], row[1:]] for row in rows], columns=['ID', 'Feature'])
print('Dataframe read from input file:')
print(df)

# explode() gives each list element its own row and repeats the ID.
df = df.explode('Feature').reset_index(drop=True)
print('Dataframe with one Feature per row:')
print(df)
df.to_csv('outfoo.txt', index=False)

# check the output file:
df2 = pd.read_csv('outfoo.txt')
print('Dataframe re-read from output file:')
print(df2)

Output

Dataframe read from input file:
  ID                                            Feature
0  1                                 [Šildomos grindys]
1  2           [Šildomos grindys, Rekuperacinė sistema]
2  3                                                 []
3  4  [Skalbimo mašina, Su baldais, Šaldytuvas, Šild...
Dataframe with one Feature per row:
  ID               Feature
0  1      Šildomos grindys
1  2      Šildomos grindys
2  2  Rekuperacinė sistema
3  3
4  4       Skalbimo mašina
5  4            Su baldais
6  4            Šaldytuvas
7  4      Šildomos grindys
Dataframe re-read from output file:
   ID               Feature
0   1      Šildomos grindys
1   2      Šildomos grindys
2   2  Rekuperacinė sistema
3   3                   NaN
4   4       Skalbimo mašina
5   4            Su baldais
6   4            Šaldytuvas
7   4      Šildomos grindys

For details, see the pandas documentation.
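Since the question mentions loading the rows into a database, it may help to store a missing feature as SQL NULL rather than as the text `null` or an empty string. Below is a minimal sketch using the standard-library `sqlite3` module. The table name `features`, its columns, the helper `to_db_rows`, and the in-memory database are all hypothetical choices for illustration, and the sketch assumes every ID is an integer. Other databases would need their own driver and may use a different placeholder style.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


def to_db_rows(lines: Iterable[str]) -> List[Tuple[int, Optional[str]]]:
    """Turn 'id, f1, f2, ...' lines into (id, feature) tuples for insertion.

    Hypothetical helper using the same split(',') parsing as the answer above.
    Empty features map to None, which sqlite3 stores as SQL NULL; an ID with
    no features at all still gets one (id, None) row. Assumes integer IDs.
    """
    rows: List[Tuple[int, Optional[str]]] = []
    for line in lines:
        fields = [x.strip() for x in line.split(',')]
        if not fields[0]:
            continue  # skip blank lines
        id_, features = int(fields[0]), fields[1:]
        for feature in features or ['']:
            rows.append((id_, feature or None))
    return rows


# In-memory database for the demo; 'features' is a hypothetical table name.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE features (id INTEGER NOT NULL, feature TEXT)')

sample = ['1, Šildomos grindys', '2, Šildomos grindys, Rekuperacinė sistema', '3, ']
with conn:  # the connection's context manager commits on success
    conn.executemany('INSERT INTO features (id, feature) VALUES (?, ?)',
                     to_db_rows(sample))

for row in conn.execute('SELECT id, feature FROM features ORDER BY rowid'):
    print(row)
```

If you stay with the pandas version, `DataFrame.to_sql` also accepts a `sqlite3` connection. The exploded frame holds an empty string for ID 3, though, so it would be stored as `''` rather than NULL unless that value is first replaced with a missing value.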
