Skip to content

Error to read parquet with latest parquet-go #61

@tanyaofei

Description

@tanyaofei

  1. Create a file with python pandas
dataframe = pandas.DataFrame({
        "A": ["a", "b", "c", "d"],
        "B": [2, 3, 4, 1],
        "C": [10, 20, None, None]
    })

dataframe.to_parquet("1.parquet")

This file looks like:
image

  1. Read this file
func main() {
    ctx := context.Background()
    fr, _ := local.NewLocalFileReader("1.parquet")
    df, err := imports.LoadFromParquet(ctx, fr)
    if err != nil {
        panic(err)
    }
    fmt.Println(df)
}
  1. Got a unique name error
panic: names of series must be unique: 

goroutine 1 [running]:
github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
        .../rocketlaunchr/[email protected]/dataframe.go:41 +0x33c
github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
        .../go/pkg/mod/github.com/rocketlaunchr/[email protected]/imports/parquet.go:110 +0x8ae
main.main()
        .../main.go:13 +0x78
  1. Following the stack, I found some useful informations
  • All series in method imports.LoadFromParquet with empty names

image

  • goFieldNameToActual
    each keys in this map with prefix "Scheme", but goName didn't, may be it's the reason why can't not find a name from this map

image

image

This's the first time I use golang to read parquet files. It is an error cause by parquet-go breaking changes or something else ?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions