camr_transform_redcap
This is a proposal for a new function; nothing is implemented yet.
camr_transform_redcap(df, df_metadata, pattern, transform, rename, derived, regex)
Where pattern is a regex used to match column names, e.g. pattern='^health_adhd_([0-9]|1[1-3])$'
Assert items matching the pattern have compatible coding.
- Do this by inspecting REDCap metadata for the matching items in df.
- Throw an error if a matching item in df does not exist in df_metadata.
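A minimal sketch of how this check might look, assuming df_metadata is a REDCap data dictionary export with the columns field_name, field_type and select_choices_or_calculations. The helper name check_compatible_coding is hypothetical, and checkbox columns (whose names carry a ___N suffix that does not appear in field_name) are not handled here.

```r
# Hypothetical helper, not part of any existing package.
check_compatible_coding <- function(df, df_metadata, pattern) {
  items <- grep(pattern, names(df), value = TRUE)

  # Every matching column must have a row in the metadata.
  missing <- setdiff(items, df_metadata$field_name)
  if (length(missing) > 0) {
    stop("Not found in df_metadata: ", paste(missing, collapse = ", "))
  }

  # All matching items must share one field type and one set of choices.
  meta <- df_metadata[df_metadata$field_name %in% items, ]
  coding <- unique(meta[, c("field_type", "select_choices_or_calculations")])
  if (nrow(coding) > 1) {
    stop("Items matching '", pattern, "' do not share a single coding.")
  }
  invisible(items)
}

# Made-up data for illustration only.
df <- data.frame(apss_1 = c(0, 1), apss_2 = c(2, 3), id = 1:2)
meta <- data.frame(
  field_name = c("apss_1", "apss_2"),
  field_type = "radio",
  select_choices_or_calculations = "0, Never | 1, Sometimes | 2, Often"
)
matched <- check_compatible_coding(df, meta, "^apss_[1-7]$")
```

Comparing the raw choices string is deliberately strict; a real implementation might parse the choices and compare codes only.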
Assert type and range on items.
- Using the REDCap metadata, verify that every unmodified item falls within its expected range.
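One possible shape for the range assertion, assuming the bounds come from the text_validation_min and text_validation_max columns of the data dictionary. This is a sketch only: check_ranges is a hypothetical name, and items whose range is implied by their choice list would need separate handling.

```r
# Hypothetical helper: compare each matching column against the
# validation bounds recorded in the data dictionary.
check_ranges <- function(df, df_metadata, pattern) {
  items <- grep(pattern, names(df), value = TRUE)
  for (item in items) {
    row <- df_metadata[df_metadata$field_name == item, ]
    # Bounds are often exported as text; blank entries become NA.
    lo <- suppressWarnings(as.numeric(row$text_validation_min))
    hi <- suppressWarnings(as.numeric(row$text_validation_max))
    x <- as.numeric(df[[item]])
    below <- !is.na(lo) & !is.na(x) & x < lo
    above <- !is.na(hi) & !is.na(x) & x > hi
    if (any(below | above)) {
      stop("Out-of-range values in ", item)
    }
  }
  invisible(TRUE)
}

# Made-up data for illustration only.
df_age <- data.frame(age = c(18, 25, 40))
meta_age <- data.frame(
  field_name = "age",
  text_validation_min = "18",
  text_validation_max = "65"
)
ok <- check_ranges(df_age, meta_age, "^age$")
```

Missing values are skipped rather than flagged, which is an assumption about the desired behaviour.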
Transform all items with a function.
- A single transformation on all columns e.g. transform = ~ as.numeric(.) / 2
- A list of transformations keyed by regexes that match column names, e.g. transform = list('^qtn_item[1-6]$' = as.integer, '^qtn_item([7-9]|10)$' = ~ -as.integer(.)), regex = TRUE. Here the second entry reverse codes items 7-10.
- A list of transformations for individual columns, e.g. transform = list('qtn_item1' = as.integer, 'qtn_item2' = as.integer, 'qtn_item3' = ~ -as.integer(.)), regex = FALSE (the default).
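The three forms of transform could be dispatched roughly as follows. This is a base R sketch under stated assumptions: apply_transforms and as_fn are hypothetical helpers, and as_fn stands in for what rlang::as_function or purrr::as_mapper would normally do with a one-sided formula.

```r
# Hypothetical helper: turn a one-sided formula such as ~ -as.integer(.)
# into a function of one argument; plain functions pass through unchanged.
as_fn <- function(f) {
  if (is.function(f)) return(f)
  function(x) eval(f[[2]], list(. = x), environment(f))
}

# Hypothetical helper covering the three forms of `transform`.
apply_transforms <- function(df, transform, regex = FALSE) {
  if (!is.list(transform)) {
    # A single function or formula applies to every column.
    df[] <- lapply(df, as_fn(transform))
    return(df)
  }
  for (key in names(transform)) {
    fn <- as_fn(transform[[key]])
    # Keys are either regexes or exact column names.
    cols <- if (regex) grep(key, names(df), value = TRUE) else intersect(key, names(df))
    for (col in cols) df[[col]] <- fn(df[[col]])
  }
  df
}

# Made-up data for illustration only.
df_items <- data.frame(qtn_item1 = c("1", "2"), qtn_item7 = c("3", "4"))
out <- apply_transforms(
  df_items,
  list("^qtn_item[1-6]$" = as.integer, "^qtn_item([7-9]|10)$" = ~ -as.integer(.)),
  regex = TRUE
)
```

Keys that match no column are silently ignored in this sketch; the real function might prefer to raise an error.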
Rename all items
- A function that renames, e.g. rename = ~ paste0('INV.DBL.APSS.Q', str_extract(., '\\d+$'))
- A list of individual columns to rename. e.g. rename = list('INV.LGL.Health.Risk' = 'health_risk', ...)
- Assert that the type abbreviation in each new name (e.g. LGL, INT, DBL) matches the type of the transformed column.
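A rough sketch of the renaming step and the name/type check. Both helpers are hypothetical, formula input is left out for brevity, and the check assumes the type abbreviation is the second dot-separated part of the new name (INT, DBL or LGL), which is inferred from the examples in this proposal rather than from a documented convention.

```r
# Hypothetical helper: `rename` is either a function of the old names
# or a named list of the form list(new_name = "old_name").
apply_rename <- function(df, rename) {
  old <- names(df)
  if (is.list(rename)) {
    new <- old
    idx <- match(unlist(rename), old)
    if (anyNA(idx)) stop("Unknown column(s) in rename.")
    new[idx] <- names(rename)
  } else {
    new <- rename(old)
  }
  if (anyDuplicated(new)) stop("Renaming would create duplicate column names.")
  names(df) <- new
  df
}

# Hypothetical check of the assumed <PREFIX>.<TYPE>.<...> naming scheme.
check_name_types <- function(df) {
  expected <- c(INT = "integer", DBL = "double", LGL = "logical")
  for (col in names(df)) {
    tag <- strsplit(col, ".", fixed = TRUE)[[1]][2]
    if (!is.na(tag) && tag %in% names(expected) &&
        typeof(df[[col]]) != expected[[tag]]) {
      stop("Column ", col, " is not of type ", expected[[tag]])
    }
  }
  invisible(TRUE)
}

# Made-up data for illustration only.
df_apss <- data.frame(apss_1 = 1:2, apss_2 = 3:4)
renamed <- apply_rename(
  df_apss,
  function(x) paste0("INV.INT.APSS.Q", regmatches(x, regexpr("[0-9]+$", x)))
)
types_ok <- check_name_types(renamed)
```

Columns whose names do not follow the assumed scheme are skipped by the type check rather than rejected.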
Derive variables
- Common ones are built-in, specified with a string: e.g. derived = c('total', 'mean')
- Subscale scoring or more complex scoring. e.g. derived = list(INV.INT.PHQ4.Anxiety ~ phq4_3 + phq4_4)
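Derived variables might be computed along these lines. The helper apply_derived is hypothetical, and naming the built-in output columns total and mean is an assumption; the proposal does not say what the built-ins should be called in the result.

```r
# Hypothetical helper: `derived` is a character vector of built-ins,
# a list of two-sided formulas (new_name ~ expression), or a mix of both.
apply_derived <- function(df, derived) {
  items <- df  # snapshot, so derived columns are never summed into later ones
  for (d in derived) {
    if (is.character(d)) {
      df[[d]] <- switch(d,
        total = unname(rowSums(items)),
        mean = unname(rowMeans(items)),
        stop("Unknown built-in: ", d)
      )
    } else {
      # Left-hand side names the new column; right-hand side is
      # evaluated with the data frame's columns in scope.
      df[[as.character(d[[2]])]] <- eval(d[[3]], df, environment(d))
    }
  }
  df
}

# Made-up data for illustration only.
df_phq <- data.frame(
  phq4_1 = c(0L, 1L), phq4_2 = c(1L, 1L),
  phq4_3 = c(2L, 0L), phq4_4 = c(3L, 1L)
)
scored <- apply_derived(df_phq, list("total", INV.INT.PHQ4.Anxiety ~ phq4_3 + phq4_4))
```

Taking the snapshot before the loop keeps total and mean restricted to the original item columns, whatever order the derived entries are given in.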
Example:
df_redcap |>
  camr_transform_redcap(
    df_dictionary, "^health_12mo_supports___([0-9]|1[0-3])$",
    transform = ~ as.logical(as.integer(.)),
    rename = ~ paste0('INV.LGL.Health.Supports.Q', str_extract(., '\\d+$'))
  ) |>
  camr_transform_redcap(
    df_dictionary, "^apss_[1-7]$",
    transform = as.integer,
    rename = ~ paste0('INV.INT.APSS.Q', str_extract(., '\\d+$'))
  )
@kp390 @lvn3 @be931 @ab1167 Would something like this be useful in your workflow?