Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yes, I might have misworded my question. My question is in relation to this paragraph on page 12:

"Note that the SQL ORDER BY directive can be modelled as a non-linear aggregate function that emits a list. However, such an implementation is not efficiently incrementalizable in DBSP. We leave the efficient handling of ORDER BY to future work."

My understanding is that Feldera does indeed support ORDER BY, which I imagine it does incrementally, thus my question.

The statement in the paper that ordering is not efficiently incrementalisable seems to makes sense to me. It is clear that even though Z-sets are not naively able to represent diffs of ordered relations (since Z-sets are unordered), ordering can be modelled as an aggregate that first builds up the first row, then the second row, and so on. Even as formulated this way however, I fail to see how the entire "incrementalised" computation would still be practically incremental, in the sense that the size of the output diff (Z-set) is small as long as the diff of the input is small.

For example, consider the query `select x from y order by x asc`, and the following values respectively occur in the stream of x: 5, 4, 3, 2, 1. Now, consider the incremental diff for the last value of 1. If presumably one models order by a list aggregation, then the Z-set for the entire computation seems to be

    - [ 2, 3, 4, 5 ]
    + [ 1, 2, 3, 4, 5 ]
which grows with the size of the output set rather than the size of the input diff. If presumably one models order by e.g. adding an order column, the diff would be

    - 2 (index: 0)
    - 3 (index: 1)
    - 4 (index: 2)
    - 5 (index: 3)
    + 1 (index: 0)
    + 2 (index: 1)
    + 3 (index: 2)
    + 4 (index: 3)
    + 5 (index: 4)
which once again varies with the size of the output set.


Your explanation of why ORDER BY is not efficiently incrementalizable is spot on. At the moment Feldera ignores the outermost ORDER BY clause, unless it is part of the ORDER BY ... LIMIT pattern, which is SQL's way to express the top-k query.

So, a better solution is still future work :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: