R dplyr Sum – How to Add Two Columns Multiple Times with dplyr


Assume I have a tibble (only four columns in the example here). In reality I have two hourly time series, and therefore 2 × 24 columns:

s1.x,...,s24.x, s1.y,...,s24.y

Now I want to add column s1.x to s1.y, s2.x to s2.y, and so on up to s24.x and s24.y.

library(dplyr)  # provides tibble(), mutate() and %>%

a <- tibble(s1.x=2:7, s2.x=3:8, s1.y=4:9, s2.y=5:10)

a %>% 
  mutate(s1.tot=s1.x+s1.y, s2.tot=s2.x+s2.y)

How can I do this in a short and elegant way for all 24 hours?

Best Answer

A base R solution

cbind(
    a,
    lapply(
        split.default(a, sub("\\..*", ".tot", names(a))),
        rowSums
    )
)

or even shorter

cbind(a, t(rowsum(t(a), sub("\\..*", ".tot", names(a)))))

which gives

  s1.x s2.x s1.y s2.y s1.tot s2.tot
1    2    3    4    5      6      8
2    3    4    5    6      8     10
3    4    5    6    7     10     12
4    5    6    7    8     12     14
5    6    7    8    9     14     16
6    7    8    9   10     16     18
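
Both variants rely on the grouping key: sub() replaces everything from the first dot onward with ".tot", so each .x column shares a key with its .y partner. Below is a minimal sketch of that step at the full 24-hour width. The data frame a24 and its values are made up for illustration, and a plain data.frame stands in for the tibble. One caveat it tries to show: with a plain character key, split.default() returns the groups in sorted level order rather than hour order, so the sketch passes a factor whose levels follow first appearance.

```r
# Hypothetical 24-hour data: columns s1.x..s24.x and s1.y..s24.y, 3 rows
hours  <- 1:24
x_part <- setNames(as.data.frame(matrix(1:72, nrow = 3)),
                   paste0("s", hours, ".x"))
y_part <- setNames(as.data.frame(matrix(101:172, nrow = 3)),
                   paste0("s", hours, ".y"))
a24 <- cbind(x_part, y_part)

# Grouping key: "s1.x" and "s1.y" both become "s1.tot", and so on
key <- sub("\\..*", ".tot", names(a24))

# With a character key the groups are sorted as strings,
# which is not the numeric hour order once there are 10+ hours
sorted_names <- names(split.default(a24, key))

# A factor with levels in order of first appearance keeps s1, s2, ..., s24
key_ordered <- factor(key, levels = unique(key))
totals <- lapply(split.default(a24, key_ordered), rowSums)

# Append the 24 total columns to the original 48 columns
result <- cbind(a24, totals)
```

If the order of the total columns does not matter, the plain character key from the answer is enough.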

Benchmarking

If you care about speed, here is a comparison of the approaches:

library(dplyr)
library(tidyr)
library(microbenchmark)

edward <- function() {
    a %>%
        mutate(id = 1:nrow(a)) %>%
        pivot_longer(-id,
            names_to = c("s", ".value"),
            names_pattern = "(s\\d+)\\.(.)"
        ) |>
        mutate(tot = x + y, .by = id) |>
        pivot_wider(names_from = s, values_from = c(x, y, tot), names_glue = "{s}.{.value}") %>%
        select(-id)
}


mael <- function() {
    a |>
        mutate(across(matches("\\.x$"), .names = "{gsub('x', '', .col)}tot") +
            across(matches("\\.y$")))
}

tic1 <- function() {
    cbind(
        a,
        lapply(
            split.default(a, sub("\\..*", ".tot", names(a))),
            rowSums
        )
    )
}

tic2 <- function() {
    cbind(a, t(rowsum(t(a), sub("\\..*", ".tot", names(a)))))
}

microbenchmark(
    edward(),
    mael(),
    tic1(),
    tic2(),
    unit = "relative",
    check = "equivalent"
)

shows

Unit: relative
     expr       min        lq      mean    median        uq      max neval
 edward() 43.286275 33.290854 29.937941 32.085324 30.763830 8.997682   100
   mael()  8.465686  7.081354  6.652635  6.930750  6.867252 2.572953   100
   tic1()  1.673039  1.469692  1.527453  1.417756  1.315485 1.425672   100
   tic2()  1.000000  1.000000  1.000000  1.000000  1.000000 1.000000   100

Related Solutions

R Sorting DataFrame – How to Sort Rows by Multiple Columns

You can use the order() function directly without resorting to add-on tools. This simpler answer uses a trick right from the top of the example(order) code:

R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:

R> dd[order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

rather than using the name of the column (and with() for easier/more direct access).
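
The calls above need dd to exist. The definition below is an assumption reconstructed from the printed output; in particular, b is taken to be an ordered factor with levels Low < Med < Hi, which is what makes Med sort before Hi. Treat it as a sketch rather than the original data.

```r
# Assumed data, reconstructed from the printed output above
dd <- data.frame(
    b = factor(c("Hi", "Med", "Hi", "Low"),
               levels = c("Low", "Med", "Hi"), ordered = TRUE),
    x = c("A", "D", "A", "C"),
    y = c(8, 3, 9, 9),
    z = c(1, 1, 1, 2)
)

# Sort by z descending, then by b ascending (by factor level), using names
by_name <- dd[with(dd, order(-z, b)), ]

# Same sort by column position: column 4 is z, column 1 is b
by_index <- dd[order(-dd[, 4], dd[, 1]), ]

# The minus sign only works for numeric columns; as an alternative,
# order() accepts one decreasing flag per key when method = "radix"
by_radix <- dd[order(dd$z, dd$b,
                     decreasing = c(TRUE, FALSE), method = "radix"), ]
```

All three calls should give the same row order; the radix form is useful when a descending key is a character column, where negation is not available.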

R – How to Sum a Variable by Group

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

aggregate also has a formula interface (incorporating @thelatemail's comment):

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")), 
                    Frequency=c(10,15,5,2,14,20,3))
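
Because the data appears after the calls, here is the same material as one runnable script, with the data first and then the three approaches. The result names (by_list, by_formula, by_tapply) are only for illustration.

```r
x <- data.frame(
    Category  = factor(c("First", "First", "First", "Second",
                         "Third", "Third", "Second")),
    Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# by= interface: the summed column is named "x" in the result
by_list <- aggregate(x$Frequency, by = list(Category = x$Category), FUN = sum)

# Formula interface: the summed column keeps its name, "Frequency"
by_formula <- aggregate(Frequency ~ Category, data = x, FUN = sum)

# tapply returns a named array instead of a data frame
by_tapply <- tapply(x$Frequency, x$Category, FUN = sum)
```

The main practical difference is the return type: aggregate gives a data frame that is easy to merge or plot, while tapply gives a named array that is convenient for quick lookups by category name.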
