
Use patternProperties to match columns with a certain prefix #176

@agusinac

In my samplesheet.csv I want to include additional columns, such as those prefixed with CONTRAST_ or VARIABLE_, that will be used downstream in post-analysis.

In my R package I use something like the following, which does perform pattern matching. Unfortunately, this is not supported here: samplesheetToList ignores all patternProperties, even when additionalProperties is set to true.

"patternProperties": {
    "^CONTRAST_": {
        "anyOf": [
            { "type": "string", "pattern": "^(\\S*)$" },
            { "type": "number" }
        ]
    },
    "^VARIABLE_": {
        "anyOf": [
            { "type": "string", "pattern": "^(\\S*)$" },
            { "type": "number" }
        ]
    }
},
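
To make the expected behaviour concrete, here is a minimal Groovy sketch of how samplesheet headers could be matched against the patternProperties keys. It is not taken from nf-schema: `matchPatternColumns` and the example header names are hypothetical. It relies on the fact that JSON Schema regular expressions are not implicitly anchored, so a search (`find()`) is used rather than a full match.

```groovy
// Hypothetical helper (not part of nf-schema): for each patternProperties
// regex, collect the headers whose name it matches.
def matchPatternColumns(List<String> headers, List<String> patterns) {
    patterns.collectEntries { p ->
        // JSON Schema patterns are not implicitly anchored, so search with
        // find() instead of requiring the whole header to match.
        [(p): headers.findAll { h -> (h =~ p).find() }]
    }
}

// Example headers are assumptions for illustration only.
def headers = ['sample', 'fastq_1', 'CONTRAST_treatment', 'VARIABLE_age', 'MY_CONTRAST_x']
def matched = matchPatternColumns(headers, ['^CONTRAST_', '^VARIABLE_'])
println matched
```

Because both patterns start with `^`, a header such as `MY_CONTRAST_x` is not picked up; only headers that begin with the prefix are.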

At the moment I have written my own samplesheetToMetadata function that simply takes everything and returns it as a list. Of course, this is not the ideal approach.

def samplesheetToMetadata(input) {
    def rows = []
    input.withReader { reader ->
        // The first line holds the column names
        def headers = reader.readLine().split(',').collect { it.trim() }
        reader.eachLine { line ->
            // A limit of -1 keeps trailing empty fields, so values stay aligned with headers
            def values = line.split(',', -1).collect { it.trim() }
            def row = [:]
            headers.eachWithIndex { h, i -> row[h] = values[i] }
            rows << row
        }
    }

    // True if the string looks like an integer or a decimal number
    def isNumeric = { str ->
        str ==~ /^-?\d+(\.\d+)?$/
    }

    // Infer a type for each column (note: columnTypes is computed but not returned)
    def columnTypes = [:]
    if (rows) {
        def headers = rows[0].keySet()
        headers.each { col ->
            def values = rows.collect { it[col] }
            def allNumeric = values.every { v -> isNumeric(v) }
            def allString = values.every { v -> v instanceof String && !isNumeric(v) }
            columnTypes[col] = allNumeric ? 'numeric' : (allString ? 'string' : 'mixed')
        }
    }

    // The rows are wrapped in a single-element list
    return [rows]
}
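
The function above infers `columnTypes` but does not use it. As a rough sketch of what could be done with that information, the hypothetical helper below (`castRows` is not part of the original function or of nf-schema) converts values in columns inferred as 'numeric' to numbers and leaves all other columns as strings. The example rows are made up for illustration.

```groovy
// Hypothetical helper: convert values of columns inferred as 'numeric'
// to BigDecimal; 'string' and 'mixed' columns are left untouched.
def castRows(List<Map> rows, Map columnTypes) {
    rows.collect { row ->
        row.collectEntries { col, v ->
            [(col): columnTypes[col] == 'numeric' ? v.toBigDecimal() : v]
        }
    }
}

// Example data, assumed for illustration only.
def rows = [
    [sample: 's1', VARIABLE_age: '42'],
    [sample: 's2', VARIABLE_age: '37.5'],
]
def types = [sample: 'string', VARIABLE_age: 'numeric']
def typed = castRows(rows, types)
println typed
```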

I didn't look into samplesheetToList because I needed a quick fix to move on, but I might look into this myself soon if nobody else is working on it.

On another note, I noticed that when I try to use the same schema_input.json in R, it doesn't follow correct JSON syntax according to a JSON validator.


Labels: enhancement (New feature or request)
