
SortedTableMap fails with large datasets (30GB CSV) AssertionError #1057

@jankod


AssertionError in flushPage when creating SortedTableMap from large CSV (30GB)

I am encountering an AssertionError in the flushPage method of SortedTableMap when creating a database from a 30GB CSV file. The issue does not occur with smaller CSV files (e.g., < 500MB).

Volume volume = MappedFileVol.FACTORY.makeVolume(dbPath, false);
SortedTableMap.Sink<Integer, byte[]> sink =
    SortedTableMap.create(
            volume,
            Serializer.INTEGER,
            Serializer.BYTE_ARRAY
        )
        .createFromSink();

BufferedReader reader = new BufferedReader(new FileReader(groupedCsvPath));
String line;

while ((line = reader.readLine()) != null) {
    // Each row is "<mass>,<rest of row>"; split on the first comma only
    String[] parts = line.split(",", 2);

    // Key: mass scaled by 10000 (4 decimal places) and rounded to an int
    double mass = Double.parseDouble(parts[0]);
    int massInt = (int) Math.round(mass * 10000);
    String seqAccs = parts[1];
    byte[] seqAccsBytes = BinaryPeptideDbUtil.writeGroupedRow(seqAccs);
    sink.put(massInt, seqAccsBytes);
}
SortedTableMap<Integer, byte[]> map = sink.create();
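
As a side note for anyone trying to reproduce this, here is a minimal sketch of how the loop above derives its keys. It is not a diagnosis: whether any of this is related to the AssertionError is unknown. `MassKey` and `toKeyChecked` are hypothetical names introduced only for illustration; they are not part of MapDB or of the code above. The sketch shows two properties of the `(int) Math.round(mass * 10000)` conversion: distinct masses that agree to four decimal places map to the same key, and a mass above roughly 214748.3647 would silently wrap to a negative int.

```java
class MassKey {
    // Same conversion as in the loop above: scale by 10000, round to long, narrow to int.
    static int toKey(double mass) {
        return (int) Math.round(mass * 10000);
    }

    // Hypothetical stricter variant: throws ArithmeticException instead of wrapping
    // when the rounded value does not fit in an int.
    static int toKeyChecked(double mass) {
        return Math.toIntExact(Math.round(mass * 10000));
    }

    public static void main(String[] args) {
        // Two different masses that round to the same key.
        System.out.println(toKey(1000.00001)); // 10000000
        System.out.println(toKey(1000.00004)); // 10000000

        // 300000.0 * 10000 = 3,000,000,000 exceeds Integer.MAX_VALUE and wraps.
        System.out.println(toKey(300000.0));   // -1294967296
    }
}
```

Swapping `toKey` for something like `toKeyChecked` in a test run would at least rule out overflowed keys in the input file.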

Exception:

Exception in thread "main" java.lang.AssertionError
    at org.mapdb.SortedTableMap$Companion$createFromSink$1.flushPage(SortedTableMap.kt:213)
    at org.mapdb.SortedTableMap$Companion$createFromSink$1.pairsToNodes(SortedTableMap.kt:191)
    at org.mapdb.SortedTableMap$Companion$createFromSink$1.put(SortedTableMap.kt:139)
    at org.mapdb.SortedTableMap$Companion$createFromSink$1.put(SortedTableMap.kt:117)
    at org.mapdb.SortedTableMap$Sink.put(SortedTableMap.kt:24)

MapDB Version: 3.1.0
OS: macOS (SSD) and Ubuntu Linux (HDD)
JVM Version: 23
