pandas groupby: mean and sum under a specific condition

Question 1

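I am trying to generate summary statistics for a large DataFrame with hundreds of columns and summarise how the columns relate to an "outcome" of interest. A simplified DataFrame can be generated with the following code: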

import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

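What I want to do is determine how each column relates to the outcome feature in terms of proportion (mean) and sum.

At the moment I only do a few columns at a time, as follows: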

df1 = df1.groupby("outcome")[["time1", "time2", "time3"]].agg(["mean", "sum"]).reset_index()
      
df1[df1["outcome"] == 1].T

This results in a rather messy DataFrame, like this:

                1
outcome       1.0
time1   mean  0.0
        sum   0.0
time2   mean  0.5
        sum   1.0
time3   mean  0.5
        sum   1.0

How can I improve this output to show the mean and the sum for each column? Something like the output shown below.

       mean  sum
time1     0    0
time2   0.5    1
time3   0.5    1

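Ideally, I would like to do this for hundreds of columns in the DataFrame and assess how they relate to the outcome.

So, can anyone point me to a solution that lets me do this for hundreds of columns (without typing each name individually, which is the whole point) and produces a clean DataFrame like the example output above? Many thanks!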

Answer 1

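As @sammywemmy mentioned, we can unstack after computing these values. We can also use loc instead of reset_index to select outcome==1 from the index. Each snippet below starts from the original df1 and stores its output in result, so df1 is not overwritten:

result = (
    df1.groupby("outcome")
        .agg(["mean", "sum"])  # Aggregate every non-grouping column
        .loc[1]  # Select outcome==1 from the index
        .unstack()  # Move the inner index level (mean/sum) to columns
)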

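We can also filter first, then groupby agg, then stack and droplevel:

result = (
    df1[df1["outcome"] == 1]  # Keep only rows where outcome==1
        .groupby("outcome")  # Group (a single group remains)
        .agg(["mean", "sum"])  # Aggregate every non-grouping column
        .stack(0)  # Move the column names (time1, ...) to rows
        .droplevel(0)  # Drop the outcome level from the index
)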

Or set_index + stack first, then groupby agg on the index levels:

result = (
    df1.set_index('outcome').stack()  # Convert the time columns to rows
        .groupby(level=[0, 1])  # Group by outcome and column name
        .agg(['mean', 'sum'])  # Aggregate each group
        .loc[1]  # Select outcome==1 from the index
)
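
The same long-format idea can be sketched with melt instead of set_index + stack, keeping the summary for every outcome value rather than only outcome==1. This is a minimal sketch, not part of the original answer; the names df, long_form and summary_all are assumptions chosen for illustration, and "variable" / "value" are the default column names that melt produces.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                   "time2": [1, 0, 0, 0, 1],
                   "time3": [0, 0, 0, 1, 0],
                   "outcome": [1, 0, 0, 1, 0]})

# Reshape to long format: one row per (outcome, original column, value).
# Every column other than "outcome" is melted, so no names are typed by hand.
long_form = df.melt(id_vars="outcome")

# Mean and sum for each (outcome, original column) pair.
summary_all = (
    long_form.groupby(["outcome", "variable"])["value"]
        .agg(["mean", "sum"])
)

print(summary_all)          # all outcome values
print(summary_all.loc[1])   # only outcome == 1
```

Selecting summary_all.loc[1] gives the table requested in the question, while summary_all.loc[0] gives the same summary for the other outcome value.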

Or use pivot_table with multiple aggregation functions:

result = (
    df1.pivot_table(index='outcome', aggfunc=['mean', 'sum'])
        .loc[1]  # Select outcome==1 from the index
        .unstack(0)  # Move the outer index level (mean/sum) to columns
)

All produce the following (in some variants the sum column may display as integers rather than floats):

       mean  sum
time1   0.0  0.0
time2   0.5  1.0
time3   0.5  1.0
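
If only one outcome value is of interest, the groupby can be skipped altogether. The following is a minimal sketch of that alternative (a plain boolean filter followed by DataFrame.agg), not one of the approaches from the answer above; the helper name summarize_for_outcome and its parameters are hypothetical.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                   "time2": [1, 0, 0, 0, 1],
                   "time3": [0, 0, 0, 1, 0],
                   "outcome": [1, 0, 0, 1, 0]})


def summarize_for_outcome(frame, outcome_col="outcome", value=1):
    """Mean and sum of every other column for rows where outcome_col == value."""
    # Keep the matching rows and drop the outcome column itself.
    subset = frame.loc[frame[outcome_col] == value].drop(columns=outcome_col)
    # agg with a list of functions returns one row per function,
    # so transpose to get one row per original column.
    return subset.agg(["mean", "sum"]).T


summary = summarize_for_outcome(df)
print(summary)
```

Because the helper works on whatever columns remain after dropping the outcome column, it applies unchanged to a frame with hundreds of columns, provided they are all numeric.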
