| 1 | +--- |
| 2 | +description: 'R language and document formats (R, Rmd, Quarto): coding standards and Copilot guidance for idiomatic, safe, and consistent code generation.' |
| 3 | +applyTo: '**/*.R, **/*.r, **/*.Rmd, **/*.rmd, **/*.qmd' |
| 4 | +--- |
| 5 | + |
| 6 | +# R Programming Language Instructions |
| 7 | + |
| 8 | +## Purpose |
| 9 | + |
| 10 | +Help GitHub Copilot generate idiomatic, safe, and maintainable R code across projects. |
| 11 | + |
| 12 | +## Core Conventions |
| 13 | + |
| 14 | +- **Match the project’s style.** If the file shows a preference (tidyverse vs. base R, `%>%` vs. `|>`), follow it. |
| 15 | +- **Prefer clear, vectorized code.** Keep functions small and avoid hidden side effects. |
| 16 | +- **Qualify non-base functions in examples/snippets**, e.g., `dplyr::mutate()`, `stringr::str_detect()`. In project code, using `library()` is acceptable when that’s the repo norm. |
| 17 | +- **Naming:** `lower_snake_case` for objects/files; avoid dots in names. |
| 18 | +- **Side effects:** Never call `setwd()`; prefer project-relative paths (e.g., `here::here()`). |
| 19 | +- **Reproducibility:** Set seeds locally around stochastic operations using `withr::with_seed()`. |
| 20 | +- **Validation:** Validate and constrain user inputs; use typed checks and allowlists where possible. |
| 21 | +- **Safety:** Avoid `eval(parse())`, unvalidated shell calls, and unparameterized SQL. |
| 22 | + |
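As a minimal sketch of the validation convention above, the following base R function combines typed checks with an allowlist. The helper name `summarise_column()` and its arguments are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: validate arguments before doing any work.
summarise_column <- function(data, column, method = c("mean", "median")) {
  # Typed checks: fail early if the inputs have the wrong shape.
  stopifnot(
    is.data.frame(data),
    is.character(column),
    length(column) == 1L
  )
  # Allowlist: `column` must name an existing numeric column.
  if (!column %in% names(data) || !is.numeric(data[[column]])) {
    stop("`column` must name a numeric column of `data`.", call. = FALSE)
  }
  # match.arg() restricts `method` to the choices listed in the signature.
  method <- match.arg(method)
  switch(method,
    mean = mean(data[[column]], na.rm = TRUE),
    median = stats::median(data[[column]], na.rm = TRUE)
  )
}

scores <- data.frame(id = 1:5, x = c(1, 3, 2, 5, 4))
summarise_column(scores, "x", method = "median")
```

Unknown columns and unsupported methods stop with an error instead of being passed further into the computation.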
| 23 | +### Pipe Operators |
| 24 | + |
| 25 | +- **Native pipe `|>` (R ≥ 4.1.0):** Prefer when the project targets R ≥ 4.1; it adds no dependency. |
| 26 | +- **Magrittr pipe `%>%`:** Continue using in projects already committed to magrittr or when you need features like the `.` placeholder, `%T>%`, or `%$%`. |
| 27 | +- **Be consistent:** Don't mix `|>` and `%>%` within the same script unless there's a clear technical reason. |
| 28 | + |
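A short sketch of the native pipe, assuming R ≥ 4.1 and base R only. The variable names are illustrative.

```r
# Native pipe (R >= 4.1): the left-hand side becomes the first argument.
x <- c(1, 4, 9)
total <- x |> sqrt() |> sum()

# To use the piped value anywhere other than the first argument, wrap the
# step in an anonymous function; the extra parentheses are required.
centered <- x |> (\(v) v - mean(v))()
```

With magrittr, the second step could instead use the `.` placeholder. The native pipe has its own `_` placeholder from R 4.2 onward, but it can only be used with a named argument.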
| 29 | +## Performance Considerations |
| 30 | + |
| 31 | +- **Large datasets:** consider `data.table`; benchmark with your workload. |
| 32 | +- **dplyr compatibility:** Use `dtplyr` (`dtplyr::lazy_dt()`) to write dplyr syntax that is translated to data.table operations. This is often faster on large data, but confirm with a benchmark. |
| 33 | +- **Profiling:** Use `profvis::profvis()` to identify performance bottlenecks in your code. Profile before optimizing. |
| 34 | +- **Caching:** Use `memoise::memoise()` to cache expensive function results. Particularly useful for repeated API calls or complex computations. |
| 35 | +- **Vectorization:** Prefer vectorized operations over loops. For iteration that cannot be vectorized, use the `purrr::map_*()` family or the `apply()` family (preferably type-stable `vapply()`). |
| 36 | + |
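To illustrate the vectorization point, here is a small base R comparison. The function names are hypothetical, and actual speed differences depend on the workload, so benchmark before drawing conclusions.

```r
# Loop version: correct, but longer and typically slower on large inputs.
celsius_loop <- function(fahrenheit) {
  out <- numeric(length(fahrenheit))  # preallocate instead of growing
  for (i in seq_along(fahrenheit)) {
    out[i] <- (fahrenheit[i] - 32) * 5 / 9
  }
  out
}

# Vectorized version: arithmetic already operates elementwise.
celsius_vec <- function(fahrenheit) (fahrenheit - 32) * 5 / 9

temps <- c(32, 212, 98.6)
celsius_vec(temps)
```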
| 37 | +## Tooling & Quality |
| 38 | + |
| 39 | +- **Formatting:** `styler` (tidyverse style), two-space indents, ~100-char lines. |
| 40 | +- **Linting:** `lintr` configured via `.lintr`. |
| 41 | +- **Pre-commit:** consider `precommit` hooks to lint/format automatically. |
| 42 | +- **Docs:** roxygen2 for exported functions (`@param`, `@return`, `@examples`). |
| 43 | +- **Tests:** prefer small, pure, composable functions that are easy to unit test. |
| 44 | +- **Dependencies:** manage with `renv`; snapshot after adding packages. |
| 45 | +- **Paths:** prefer `fs` and `here` for portability. |
| 46 | + |
| 47 | +## Data Wrangling & I/O |
| 48 | + |
| 49 | +- **Data frames:** prefer tibbles in tidyverse-heavy files; otherwise base `data.frame()` is fine. |
| 50 | +- **Iteration:** use `purrr` in tidyverse code. In base-style code, prefer type-stable functionals such as `vapply()` |
| 51 | + (for atomic outputs) or `Map()` (for elementwise operations over several inputs) instead of explicit `for` loops when they improve clarity or performance. |
| 52 | +- **Strings & Dates:** use `stringr`/`lubridate` where already present; otherwise use clear base helpers (e.g., `nchar()`, `substr()`, `as.Date()` with explicit format). |
| 53 | +- **I/O:** prefer explicit, typed readers (e.g., `readr::read_csv()`); make parsing assumptions explicit. |
| 54 | + |
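The base R helpers named above can be sketched as follows. The data and variable names are made up for illustration.

```r
words <- c("alpha", "be", "gamma")

# vapply() declares the result type, so an unexpected type is an error
# rather than a silently different structure (as can happen with sapply()).
word_lengths <- vapply(words, nchar, integer(1), USE.NAMES = FALSE)

# Map() walks several vectors in parallel and always returns a list.
labels <- Map(function(word, n) paste0(word, " (", n, ")"), words, word_lengths)

# Parse dates with an explicit format instead of relying on guessing.
dates <- as.Date(c("31/01/2024", "15/02/2024"), format = "%d/%m/%Y")
```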
| 55 | +## Plotting |
| 56 | + |
| 57 | +- Prefer `ggplot2` for publication-quality plots. Keep layers readable and label axes and units. |
| 58 | + |
| 59 | +## Error Handling |
| 60 | + |
| 61 | +- In tidyverse contexts, use `rlang::abort()` / `rlang::warn()` for structured conditions; in base-only code, use `stop()` / `warning()`. |
| 62 | +- For recoverable operations: |
| 63 | +  - Use `purrr::possibly()` when a typed fallback value is enough (simpler). |
| 64 | +  - Use `purrr::safely()` when you need to capture both results and errors for later inspection or logging. |
| 65 | +  - Use `tryCatch()` in base R for fine-grained control or compatibility with non-tidyverse code. |
| 66 | +- Prefer consistent return structures—typed outputs for normal flows, structured lists only when error details are required. |
| 67 | + |
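The two return styles described above can be sketched in base R with `tryCatch()`. Both helper names (`parse_amount()`, `try_capture()`) are hypothetical. The second only imitates the result/error shape of `purrr::safely()` and is not its implementation.

```r
# Typed fallback: returns a double, or NA_real_ when coercion warns.
# Intended for a single value at a time.
parse_amount <- function(x) {
  tryCatch(
    as.numeric(x),
    warning = function(w) NA_real_
  )
}

# Structured result: keep both the value and the error for later inspection.
try_capture <- function(fn, ...) {
  tryCatch(
    list(result = fn(...), error = NULL),
    error = function(e) list(result = NULL, error = e)
  )
}

ok <- try_capture(log, 10)
bad <- try_capture(log, "ten")
conditionMessage(bad$error)
```

The typed fallback keeps downstream code simple. The structured list is worth the extra unpacking only when the error details will actually be logged or inspected.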
| 68 | +## Security Best Practices |
| 69 | + |
| 70 | +- **Command execution:** Prefer `processx::run()` or `sys::exec_wait()` over `system()`; validate and sanitize all arguments. |
| 71 | +- **Database queries:** Use parameterized `DBI` queries to prevent SQL injection. |
| 72 | +- **File paths:** Sanitize user-provided file names (e.g., `fs::path_sanitize()`), normalize the resulting path, and check that it stays inside an allowlisted directory (e.g., `fs::path_has_parent()`). |
| 73 | +- **Credentials:** Never hardcode secrets. Use env vars (`Sys.getenv()`), config outside VCS, or `keyring`. |
| 74 | + |
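For the credentials point, a minimal base R sketch follows. The helper `get_secret()` and the variable name `DEMO_API_TOKEN` are hypothetical.

```r
# Hypothetical helper: read a required secret from an environment variable.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  # Treat both "unset" and "set but empty" as missing.
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable `", name, "` is not set.", call. = FALSE)
  }
  value
}

# For demonstration only: in practice the variable is set outside the code,
# for example in the shell or in an untracked .Renviron file.
Sys.setenv(DEMO_API_TOKEN = "example-value")
token <- get_secret("DEMO_API_TOKEN")
```

Failing loudly when the variable is missing avoids silently sending an empty credential to a downstream service.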
| 75 | +## Shiny |
| 76 | + |
| 77 | +- Modularize UI and server logic for non-trivial apps. Use `eventReactive()` / `observeEvent()` for explicit dependencies. |
| 78 | +- Guard required inputs with `req()`; use `validate()` / `need()` to show clear, user-friendly messages. |
| 79 | +- Use connection pooling (`pool`) for databases; avoid long-lived global objects. |
| 80 | +- Isolate expensive computations and prefer `reactiveVal()` / `reactiveValues()` for small state. |
| 81 | + |
| 82 | +## R Markdown / Quarto |
| 83 | + |
| 84 | +- Keep chunks focused; prefer explicit chunk options (`echo`, `message`, `warning`). |
| 85 | +- Avoid global state; prefer local helpers. Use `withr::with_seed()` for deterministic chunks. |
| 86 | + |
| 87 | +## Copilot-Specific Guidance |
| 88 | + |
| 89 | +- If the current file uses tidyverse, **suggest tidyverse-first patterns** (e.g., `dplyr::across()` instead of superseded scoped verbs such as `mutate_at()`). If base-R style is present, **use base idioms**. |
| 90 | +- Qualify non-base calls in suggestions (e.g., `dplyr::mutate()`). |
| 91 | +- Suggest vectorized or tidy solutions over loops when idiomatic. |
| 92 | +- Prefer small helper functions over long pipelines. |
| 93 | +- When multiple approaches are equivalent, prefer readability and type stability and explain the trade-offs. |
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +## Minimal Examples |
| 98 | + |
| 99 | +```r |
| 100 | +# Base R variant |
| 101 | +scores <- data.frame(id = 1:5, x = c(1, 3, 2, 5, 4)) |
| 102 | +safe_log <- function(x) tryCatch(log(x), error = function(e) NA_real_) |
| 103 | +scores$z <- vapply(scores$x, safe_log, numeric(1)) |
| 104 | + |
| 105 | +# Tidyverse variant (if this file uses tidyverse) |
| 106 | +result <- tibble::tibble(id = 1:5, x = c(1, 3, 2, 5, 4)) |> |
| 107 | +  dplyr::mutate(z = purrr::map_dbl(x, purrr::possibly(log, otherwise = NA_real_))) |> |
| 108 | +  dplyr::filter(z > 0) |
| 109 | + |
| 110 | +# Example reusable helper with roxygen2 doc |
| 111 | +#' Compute the z-score of a numeric vector |
| 112 | +#' @param x A numeric vector |
| 113 | +#' @return Numeric vector of z-scores |
| 114 | +#' @examples z_score(c(1, 2, 3)) |
| 115 | +z_score <- function(x) (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE) |
| 116 | +``` |