Roadmap comments from DAI perspective #2700
Replies: 4 comments 3 replies
-
|
Any news on the time estimates? |
-
|
regarding:
My understanding: right now we load data in DTBL format and do the munging, but before pushing the data into LGBM we need to convert it into the format LGBM expects (i.e., make a copy of the data). @arnocandel proposed integrating DTBL directly into LGBM so that it can accept data in DTBL format.
The ability to use a MOJO in a "map" call on a data frame, similar to how H2O-3/Spark score with a MOJO: split the data into partitions and score the partitions in parallel. |
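To make the "split into partitions and score in parallel" idea concrete, here is a minimal sketch in plain Python. It is not datatable or MOJO API: `split_into_partitions`, `score_partition`, and `map_score` are hypothetical names, and the scorer is a stand-in that just sums each row's features. A real MOJO scorer would replace it.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_partitions):
    """Split a list of rows into at most n_partitions contiguous chunks."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be >= 1")
    # Ceiling division so that every row lands in some chunk.
    size = max(1, -(-len(rows) // n_partitions))
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def score_partition(partition):
    """Hypothetical stand-in for a MOJO scorer: one prediction per row."""
    return [sum(row) for row in partition]


def map_score(rows, n_partitions=4, scorer=score_partition):
    """Score partitions in parallel and return predictions in row order."""
    partitions = split_into_partitions(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map yields results in the order the partitions were submitted,
        # so the flattened output lines up with the input rows.
        scored = pool.map(scorer, partitions)
        return [pred for part in scored for pred in part]


rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
predictions = map_score(rows, n_partitions=2)
```

Threads are used here only to keep the sketch short. Whether threads or processes are the better fit would depend on whether the actual scorer releases the GIL.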
-
|
Actions
|
-
|
Current priority from H2O:
From community perspective:
|
-
This is the list of issues, annotated by priority from the DAI perspective:
🚨 Top Priority
Goal: optimize computation speed, minimize data conversions
to_arrow docs)
High Priority
Eliminate "Rereading" pass in fread? #1843, streaming dt.fread(path).to_jay() #1750) (DAI: HIGH, time needed: ?)
Other
Full support of datasets with >2B rows (Implement sorting of columns with >2B rows #2336) (DAI: MEDIUM, time needed: ?)
Inner/outer joins (Implement inner/outer joins for non-keyed frames #1080) (DAI: LOW)
Ability to score a dataframe using MOJO in an efficient way (f.score(mojo)) (DAI: MEDIUM, time needed: ?) [clarify]
Fread improvements:
Eliminate "Rereading" pass in fread? #1843, streaming dt.fread(path).to_jay() #1750) (DAI: LOW, time needed: ?)
fread may sometimes detect incorrect newline character #1343, Unable to parse attached dataset. #1045, File containing a single unescaped " out-of-sample is read incorrectly #1036, Improve headers detection logic when all columns are of "string" type #946, If last field has unclosed quote, then it will not be parsed properly #934, fread should not detect sep within quoted fields #922, fread erroneously guesses sep=' ' #518) (DAI: LOW, time needed: ?)
Functions for string manipulations (DAI: MEDIUM, time needed: ?)
*)
Functions for date/time manipulations (DAI: LOW, time needed: ?)
Rolling windows support (Rolling aggregate support based on windows within a DT #1500) (DAI: LOW, time needed: ?)
New proposals