@strengejacke
Member

@strengejacke strengejacke commented Nov 21, 2024

The message shown when standardizing failed due to "problematic formulas" was not always clear. Now it should be clearer to users why a model cannot be standardized.
Inspired by this SO post: https://stackoverflow.com/questions/79207876/variable-names-and-easystats-reports

data(mtcars)
m <- lm(mpg ~ hp, data = mtcars)
datawizard::standardise(m)
#> 
#> Call:
#> lm(formula = mpg ~ hp, data = data_std)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>  -3.149e-17   -7.762e-01

colnames(mtcars)[1] <- "1_mpg"
m <- lm(`1_mpg` ~ hp, data = mtcars)
datawizard::standardise(m)
#> Warning: Looks like you are using invalid syntactically variables names, quoted
#>   in backticks: `1_mpg`. This may result in unexpected behaviour. Please
#>   rename your variables (e.g., `X1_mpg` instead of `1_mpg`) and fit the
#>   model again.
#> Model cannot be standardized.
#> 
#> Call:
#> lm(formula = `1_mpg` ~ hp, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823

data(mtcars)
m <- lm(mtcars$mpg ~ mtcars$hp)
datawizard::standardise(m)
#> Warning: Using `$` in model formulas can produce unexpected results. Specify your
#>   model using the `data` argument instead.
#>   Try: mpg ~ hp, data = mtcars
#> Model cannot be standardized.
#> 
#> Call:
#> lm(formula = mtcars$mpg ~ mtcars$hp)
#> 
#> Coefficients:
#> (Intercept)    mtcars$hp  
#>    30.09886     -0.06823

m <- lm(mtcars[, 1] ~ hp, data = mtcars)
datawizard::standardise(m)
#> Warning: Using indexed data frames, such as `df[, 5]`, as model response can
#>   produce unexpected results. Specify your model using the literal name of
#>   the response variable instead.
#> Model cannot be standardized.
#> 
#> Call:
#> lm(formula = mtcars[, 1] ~ hp, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823

Created on 2024-11-21 with reprex v2.1.1
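
For the backtick case, the warning suggests renaming the variables and refitting. A minimal sketch of that workaround, assuming base R's `make.names()` is an acceptable way to get valid names (the copy `dat` is only used for illustration and is not part of this PR):

```r
# Work on a copy so the built-in dataset is left untouched
dat <- datasets::mtcars
colnames(dat)[1] <- "1_mpg"   # reproduce the syntactically invalid name

# make.names() prefixes names that start with a digit with "X"
colnames(dat) <- make.names(colnames(dat))
colnames(dat)[1]
#> [1] "X1_mpg"

# Refit with the valid name; backticks are no longer needed
m <- lm(X1_mpg ~ hp, data = dat)
```

After refitting, `datawizard::standardise(m)` should behave like the first example above.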

@strengejacke strengejacke changed the title Warn user for invalif formula Warn user for invalid formula Nov 21, 2024
@strengejacke strengejacke marked this pull request as ready for review November 21, 2024 15:22
Member

@etiennebacher etiennebacher left a comment

Nice messages, very clear! I have a doubt about the behavior when the formula is not OK, and there are some tests to fix, but otherwise LGTM.


Edit: actually shouldn't it be "using syntactically invalid variables names" instead of "using invalid syntactically variables names"?

@strengejacke
Member Author

"using syntactically invalid variables names" instead of "using invalid syntactically variables names"?

yeah, but that's an issue in insight ;-) will fix.

@strengejacke
Member Author

Just saw that some tests are out of date, because those lines of code are no longer reached for these exceptions. Will look at it.

@strengejacke
Member Author

data(mtcars)
m <- lm(mpg ~ hp, data = mtcars)
datawizard::standardise(m)
#> 
#> Call:
#> lm(formula = mpg ~ hp, data = data_std)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>  -3.149e-17   -7.762e-01

colnames(mtcars)[1] <- "1_mpg"
m <- lm(`1_mpg` ~ hp, data = mtcars)
datawizard::standardise(m)
#> Error: Model cannot be standardized.
#>   Looks like you are using syntactically invalid variables names, quoted
#>   in backticks: `1_mpg`. This may result in unexpected behaviour. Please
#>   rename your variables (e.g., `X1_mpg` instead of `1_mpg`) and fit the
#>   model again.

data(mtcars)
m <- lm(mtcars$mpg ~ mtcars$hp)
datawizard::standardise(m)
#> Error: Model cannot be standardized.
#>   Using `$` in model formulas can produce unexpected results. Specify your
#>   model using the `data` argument instead.
#>   Try: mpg ~ hp, data = mtcars

m <- lm(mtcars[, 1] ~ hp, data = mtcars)
datawizard::standardise(m)
#> Error: Model cannot be standardized.
#>   Using indexed data frames, such as `df[, 5]`, as model response can
#>   produce unexpected results. Specify your model using the literal name of
#>   the response variable instead.

Created on 2024-11-21 with reprex v2.1.1
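
For the `$` and indexing cases, the messages point to specifying the model with literal variable names and the `data` argument. A small base R sketch of that rewrite (the object names `dat`, `m_dollar`, `m_index` and `m_ok` are made up for illustration; the `all.vars()` comparison only hints at why such formulas are hard to process and is not a description of what insight checks internally):

```r
dat <- datasets::mtcars

# Formulas of the kind that now raise the error above
m_dollar <- lm(dat$mpg ~ dat$hp)
m_index  <- lm(dat[, 1] ~ hp, data = dat)

# Equivalent model with literal names and the `data` argument
m_ok <- lm(mpg ~ hp, data = dat)

# The fitted coefficients are the same ...
all.equal(unname(coef(m_dollar)), unname(coef(m_ok)))
#> [1] TRUE

# ... but only m_ok refers to plain column names
all.vars(formula(m_dollar))
all.vars(formula(m_ok))
```

Refitting as `m_ok` is the form suggested by the "Try: mpg ~ hp, data = mtcars" hint.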

@etiennebacher
Member

Thanks!

@etiennebacher etiennebacher merged commit 2741cdc into main Nov 21, 2024
21 of 22 checks passed
@etiennebacher etiennebacher deleted the check_formula branch November 21, 2024 18:55
@strengejacke
Member Author

I fixed "variables names" in insight, should be "variable names" in the error message.

3 participants