Python Dataframe – How to Use Polars When-Then-Otherwise on Multiple Output Columns at Once

dataframepythonpython-polars

Assume I have this dataframe

import polars as pl

df = pl.DataFrame({
    'item':         ['CASH', 'CHECK', 'DEBT', 'CHECK', 'CREDIT', 'CASH'],
    'quantity':     [100, -20, 0, 10, 0, 0],
    'value':        [99, 47, None, 90, None, 120],
    'value_other':  [97, 57, None, 91, None, 110],
    'value_other2': [94, 37, None, 93, None, 115],
})
┌────────┬──────────┬───────┬─────────────┬──────────────┐
│ item   ┆ quantity ┆ value ┆ value_other ┆ value_other2 │
│ ---    ┆ ---      ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64      ┆ i64   ┆ i64         ┆ i64          │
╞════════╪══════════╪═══════╪═════════════╪══════════════╡
│ CASH   ┆ 100      ┆ 99    ┆ 97          ┆ 94           │
│ CHECK  ┆ -20      ┆ 47    ┆ 57          ┆ 37           │
│ DEBT   ┆ 0        ┆ null  ┆ null        ┆ null         │
│ CHECK  ┆ 10       ┆ 90    ┆ 91          ┆ 93           │
│ CREDIT ┆ 0        ┆ null  ┆ null        ┆ null         │
│ CASH   ┆ 0        ┆ 120   ┆ 110         ┆ 115          │
└────────┴──────────┴───────┴─────────────┴──────────────┘

Now I want to set all value columns to 0 for all rows where value is null and quantity == 0.

Right now I have this solution

cols = ['value', 'value_other', 'value_other2']
df   = df.with_columns([
    pl.when(pl.col('value').is_null() & (pl.col('quantity') == 0))
    .then(0)
    .otherwise(pl.col(col))
    .alias(col)
    for col in cols
])

which correctly gives

┌────────┬──────────┬───────┬─────────────┬──────────────┐
│ item   ┆ quantity ┆ value ┆ value_other ┆ value_other2 │
│ ---    ┆ ---      ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64      ┆ i64   ┆ i64         ┆ i64          │
╞════════╪══════════╪═══════╪═════════════╪══════════════╡
│ CASH   ┆ 100      ┆ 99    ┆ 97          ┆ 94           │
│ CHECK  ┆ -20      ┆ 47    ┆ 57          ┆ 37           │
│ DEBT   ┆ 0        ┆ 0     ┆ 0           ┆ 0            │
│ CHECK  ┆ 10       ┆ 90    ┆ 91          ┆ 93           │
│ CREDIT ┆ 0        ┆ 0     ┆ 0           ┆ 0            │
│ CASH   ┆ 0        ┆ 120   ┆ 110         ┆ 115          │
└────────┴──────────┴───────┴─────────────┴──────────────┘

However, I feel this is very inefficient as my when condition is executed for every value column. Is there a way to achieve this using only polar internal functions & without the native for-loop?

Best Answer

Apply the condition to all columns at once. Also replace alias by name.keep:

cols = ['value', 'value_other', 'value_other2']

(df.with_columns(
    pl.when(pl.col('value').is_null() & (pl.col('quantity') == 0))
    .then(0)
    .otherwise(pl.col(cols))
    .name.keep()
))

Output:

┌────────┬──────────┬───────┬─────────────┬──────────────┐
│ item   ┆ quantity ┆ value ┆ value_other ┆ value_other2 │
│ ---    ┆ ---      ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64      ┆ i64   ┆ i64         ┆ i64          │
╞════════╪══════════╪═══════╪═════════════╪══════════════╡
│ CASH   ┆ 100      ┆ 99    ┆ 97          ┆ 94           │
│ CHECK  ┆ -20      ┆ 47    ┆ 57          ┆ 37           │
│ DEBT   ┆ 0        ┆ 0     ┆ 0           ┆ 0            │
│ CHECK  ┆ 10       ┆ 90    ┆ 91          ┆ 93           │
│ CREDIT ┆ 0        ┆ 0     ┆ 0           ┆ 0            │
│ CASH   ┆ 0        ┆ 120   ┆ 110         ┆ 115          │
└────────┴──────────┴───────┴─────────────┴──────────────┘