R Regex – How to Replace Variables in a Formula with Their Definitions [Closed]

r regex

I have a list of variables and their corresponding definitions. I also have formulas written in terms of those variables, and I would like a translation of each formula (as shown in the example) to better understand what the formula is saying. My example is simplistic, but I typically have about 100 variables and 100 formulas.

I tried to find similar questions but could not find the answer.

structure(list(variable = c("cs", "csp", "cb", "cc", "ccel", 
"ccrt"), definition = c("cost of salad", "cost of soup", "cost of bread", 
"cost of chicken", "cost of celery", "cost of carrot"), formula = c("cs=cb+ccel+cc", 
"csp=cc+ccel+crt", NA, NA, NA, NA), Translation = c("cost of salad=cost of bread+cost of celery+cost of chicken", 
"cost of soup=cost of chicken+cost of celery+cost of carrot", 
NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

Best Answer

1) Using language objects. It is common in R to convert expressions to language objects and then process them.

To do that, define a function subst which

  • inputs a character vector of expressions, expr, and converts it to a list of language objects, expr2
  • converts the first two columns of the data frame argument, defs, to a named list of symbols, defs2
  • performs the substitutions using substitute
  • converts back to a character vector using deparse1
  • removes the backticks that deparse1 puts around names containing spaces
  • converts "NA" to NA

Thus, using the input in the Note at the end:

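subst <- function(expr, defs) {
  # parse each formula string into a language object
  expr2 <- lapply(expr, str2lang)
  # named list: variable -> definition as a symbol (deframe is from tibble)
  defs2 <- defs[1:2] |> deframe() |> as.list() |> lapply(as.name)
  # substitute the symbols, then deparse back to a string
  s <- sapply(expr2, \(expr) deparse1(do.call(substitute, list(expr, defs2))))
  # drop the backticks around names containing spaces
  s <- gsub("`", "", s)
  # NA formulas come back as the string "NA"; restore them to NA
  ifelse(s == "NA", NA, s)
}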

# test run
DF %>%
  mutate(translation = subst(formula, pick(variable, definition)))

giving

# A tibble: 6 × 4
  variable definition      formula        translation                           
  <chr>    <chr>           <chr>          <chr>                                 
1 cs       cost of salad   cs=cb+ccel+cc  cost of salad = cost of bread + cost …
2 csp      cost of soup    csp=cc+cel+crt cost of soup = cost of chicken + cel …
3 cb       cost of bread   <NA>           <NA>                                  
4 cc       cost of chicken <NA>           <NA>                                  
5 ccel     cost of celery  <NA>           <NA>                                  
6 ccrt     cost of carrot  <NA>           <NA>                                  
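
To see what each step inside subst does, the same process can be traced on a single formula using base R only. This is just an illustrative sketch: the small lookup list below is typed in by hand rather than built from DF, and the names e, e2, s and translation are made up for the example.

```r
# Hand-written lookup (illustrative): variable -> definition as a symbol
defs2 <- lapply(
  list(cs = "cost of salad", cb = "cost of bread",
       ccel = "cost of celery", cc = "cost of chicken"),
  as.name
)

# 1. parse the string; the result is a call to `=`
e <- str2lang("cs=cb+ccel+cc")
is.call(e)

# 2. replace every symbol that appears in defs2 by its definition
e2 <- do.call(substitute, list(e, defs2))

# 3. deparse back to one string; names containing spaces come back in backticks
s <- deparse1(e2)

# 4. strip the backticks
translation <- gsub("`", "", s)
translation
```

Because the formula is parsed and deparsed, the spacing around = and + in the result comes from the deparser, not from the original string.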

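2) Alternatively, we could use gsubfn to perform the substitutions. We assume that the variables consist entirely of word characters (letters, digits and underscore), so that each variable is matched as a whole by \\w+. This gives the same translations, except that no spaces are inserted around the operators because the formula is edited as a string rather than parsed and deparsed.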

library(gsubfn)

subst2 <- function(expr, defs) {
  # named list: variable -> definition
  defs2 <- defs[1:2] |> deframe() |> as.list()
  # look up each run of word characters in defs2 and replace it
  gsubfn("\\w+", defs2, expr)
}
 
DF |>
  mutate(translation = subst2(formula, pick(variable, definition)))
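
If adding a package is not an option, the same token-lookup idea can be sketched in base R with gregexpr and regmatches. This is not the gsubfn implementation, only an approximation of its effect under the same assumption that variables consist of word characters. The function name subst_base and the named vector lookup are hypothetical, and NA formulas are skipped explicitly.

```r
# Hypothetical helper: replace each \w+ token found in `lookup` (base R only)
subst_base <- function(expr, lookup) {
  ok <- !is.na(expr)                       # leave NA formulas untouched
  x <- expr[ok]
  m <- gregexpr("\\w+", x)                 # positions of all word tokens
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    hit <- tokens %in% names(lookup)       # tokens that have a definition
    tokens[hit] <- lookup[tokens[hit]]     # swap them in; others stay as is
    tokens
  })
  expr[ok] <- x
  expr
}

# Illustrative lookup typed in by hand (same variables as in the Note)
lookup <- c(cs = "cost of salad", csp = "cost of soup", cb = "cost of bread",
            cc = "cost of chicken", ccel = "cost of celery",
            ccrt = "cost of carrot")

res <- subst_base(c("cs=cb+ccel+cc", "csp=cc+cel+crt", NA), lookup)
res
```

Tokens without a definition, such as cel and crt in the second formula, are left unchanged, which makes misspelled variable names easy to spot in the translation.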

Note

library(dplyr)
library(tibble)

DF <- tibble(
  variable = c("cs", "csp", "cb", "cc", "ccel", "ccrt"),
  definition = c(
    "cost of salad", "cost of soup", "cost of bread", "cost of chicken",
    "cost of celery", "cost of carrot"
  ),
  formula = c("cs=cb+ccel+cc", "csp=cc+cel+crt", NA, NA, NA, NA),
)