@df: Passing column name as string overflows stack #562

@ron-wolf

I was trying to plot this CSV dataset using StatsPlots.@df. I wrote and ran the following code in a fresh Pluto notebook:

using Downloads, CSV, DataFrames, StatsPlots
lifesat = CSV.read(Downloads.download("https://github.com/ageron/data/raw/main/lifesat/lifesat.csv"), DataFrame)
@df lifesat plot(cols("GDP per capita (USD)"), cols("Life satisfaction"))

I expected this to work, as per the README, but I was instead greeted by a StackOverflowError. Running again line-by-line in the Julia REPL yielded this stack trace:

 [1] add_sym!(cols::Vector{Symbol}, s::Char, names::Tuple{Symbol, Symbol, Symbol}) (repeats 79984 times)
   @ StatsPlots ~/.julia/packages/StatsPlots/cStOe/src/df.jl:199

However, this isn't the whole story, as expanding the macro by itself with macroexpand yields the following code (cleaned up):

(d -> begin
  (var1, var2), names = StatsPlots.extract_columns_and_names(d, "GDP per capita (USD)", "Life satisfaction")
  StatsPlots.add_label([StatsPlots.compute_name(names, "GDP per capita (USD)"), StatsPlots.compute_name(names, "Life satisfaction")], plot, var1, var2)
end)(lifesat)

And sure enough, attempting to call that first function extract_columns_and_names with those arguments yields the same error. The culprit is the following line of code in that function:

selected_cols = add_sym!(Symbol[], syms, names)

This call delegates to add_sym!, which then recurses indefinitely on the first character of the first string, as follows:

add_sym!(Symbol[], "GDP per capita (USD)", (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
add_sym!(Symbol[], 'G', (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
add_sym!(Symbol[], 'G', (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
add_sym!(Symbol[], 'G', (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction")))
# ...
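
The recursion most likely never bottoms out because a Char is itself iterable in Julia and yields only itself, so a generic "iterate over the argument" fallback called on 'G' just calls itself with 'G' again. A minimal check of that behavior, independent of StatsPlots (the variable names are illustrative only):

```julia
c = 'G'

# Iterating a Char visits exactly one element: the Char itself.
visited = Char[]
for x in c
    push!(visited, x)
end

# Both checks hold, so recursing "into" a Char makes no progress.
println(visited == ['G'])
println(first(c) === c)
```
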

The solution is to special-case strings in the definition for add_sym!, making sure not to iterate over their characters individually, but instead to convert them to symbols first.
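
A minimal sketch of that special case follows. It assumes add_sym! is built from an integer method, a symbol method, and a generic iterating fallback. The function collect_sym! below is a hypothetical, simplified stand-in written for illustration and is not the actual StatsPlots source.

```julia
# Hypothetical stand-in for add_sym!; names and structure are assumptions.

# An integer selects a column by position.
collect_sym!(cols, i::Integer, names) = push!(cols, names[i])

# A symbol is kept only if it names an existing column.
collect_sym!(cols, s::Symbol, names) = s in names ? push!(cols, s) : cols

# Proposed special case: convert a string to a Symbol as a whole,
# instead of letting the fallback iterate over its characters.
collect_sym!(cols, s::AbstractString, names) = collect_sym!(cols, Symbol(s), names)

# Generic fallback for tuples or arrays of selectors.
function collect_sym!(cols, xs, names)
    for x in xs
        collect_sym!(cols, x, names)
    end
    return cols
end

colnames = (:Country, Symbol("GDP per capita (USD)"), Symbol("Life satisfaction"))
selected = collect_sym!(Symbol[], ("GDP per capita (USD)", "Life satisfaction"), colnames)
println(selected)
```

Because the AbstractString method is more specific than the untyped fallback, dispatch picks it for string arguments, and the per-character recursion is never reached.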
