2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -206,7 +206,7 @@ datafusion-execution = { git = "https://github.com/auron-project/datafusion.git"
datafusion-optimizer = { git = "https://github.com/auron-project/datafusion.git", rev = "9034aeffb"}
datafusion-physical-expr = { git = "https://github.com/auron-project/datafusion.git", rev = "9034aeffb"}
datafusion-spark = { git = "https://github.com/auron-project/datafusion.git", rev = "9034aeffb"}
-orc-rust = { git = "https://github.com/auron-project/datafusion-orc.git", rev = "9beb12c"}
+orc-rust = { git = "https://github.com/auron-project/datafusion-orc.git", rev = "59bcd29"}

# arrow: branch=v55.2.0-blaze
arrow = { git = "https://github.com/auron-project/arrow-rs.git", rev = "5de02520c"}
@@ -1014,4 +1014,23 @@ class AuronQuerySuite extends AuronQueryTest with BaseAuronSQLSuite with AuronSQ
|FROM t_filter_agg_2289""".stripMargin)
    }
  }

  test("test not null filter for orc table") {
    withTable("orc_string_filter") {
      sql("create table orc_string_filter(id int, b string) using orc")
      sql("insert into orc_string_filter values (1, 'abc'), (2, null), (3, 'def')")

Contributor

This test reproduces #2005 (the Not / is not null row-group path). However, the short values here ('abc', 'def') stay well under ORC's string-statistics truncation threshold, so they don't exercise the #2042 fix (#79, truncated string stats); that path needs min/max values long enough to be truncated. Since this PR also closes #2042, would it be worth adding a second case with long string values and a comparison or equality predicate, so the truncated-stats fix is guarded against regressions the same way the not-null case is?

Contributor Author

added unit test for #2042

      checkSparkAnswerAndOperator("select * from orc_string_filter where b is not null")
    }
  }

  test("test string filter for orc table with truncated stats") {
    withTable("orc_string_trunc") {
      sql("create table orc_string_trunc(id int, b string) using orc")
      val longA = "a" * 2000 // > 1024 bytes -> column min is truncated (lower_bound set)
      val longB = "b" * 2000 // > 1024 bytes, but 'mid' sorts above it, so the column max stays 'mid'
      sql(s"insert into orc_string_trunc values (1, '$longA'), (2, '$longB'), (3, 'mid')")
      checkSparkAnswerAndOperator(s"select * from orc_string_trunc where b = '$longA'")
      checkSparkAnswerAndOperator("select * from orc_string_trunc where b > 'a'")
    }
  }
}
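
For readers unfamiliar with truncated string statistics, the following is a minimal Rust sketch of why they need care during row-group pruning. It is not the orc-rust implementation or the actual fix from this PR: `Bound`, `StringStats`, `collect_stats`, `may_contain_eq`, and `may_contain_gt` are hypothetical names, the 1024-byte limit is taken from the comments in the test above, and ASCII input is assumed so that bumping the last byte of a truncated maximum cannot overflow.

```rust
// Hypothetical model of string column statistics with truncation.
// Limit taken from the test comments above; ASCII input is assumed.
const MAX_RECORDED_BYTES: usize = 1024;

#[derive(Debug, Clone)]
struct Bound {
    bytes: Vec<u8>,
    truncated: bool, // true when `bytes` is a bound rather than the exact value
}

#[derive(Debug, Clone)]
struct StringStats {
    lower: Bound,
    upper: Bound,
}

fn lower_bound(min: &[u8]) -> Bound {
    if min.len() <= MAX_RECORDED_BYTES {
        return Bound { bytes: min.to_vec(), truncated: false };
    }
    // A strict prefix always sorts below the full value, so it is a valid lower bound.
    Bound { bytes: min[..MAX_RECORDED_BYTES].to_vec(), truncated: true }
}

fn upper_bound(max: &[u8]) -> Bound {
    if max.len() <= MAX_RECORDED_BYTES {
        return Bound { bytes: max.to_vec(), truncated: false };
    }
    let mut bytes = max[..MAX_RECORDED_BYTES].to_vec();
    // A bare prefix would sort BELOW the real maximum, so bump the last byte
    // to make the stored value sort above it (safe under the ASCII assumption).
    let last = bytes.last_mut().expect("prefix is non-empty");
    debug_assert!(last.is_ascii());
    *last += 1;
    Bound { bytes, truncated: true }
}

fn collect_stats(values: &[&str]) -> Option<StringStats> {
    // Byte-wise lexicographic ordering; returns None for an empty input.
    let min = values.iter().copied().map(str::as_bytes).min()?;
    let max = values.iter().copied().map(str::as_bytes).max()?;
    Some(StringStats { lower: lower_bound(min), upper: upper_bound(max) })
}

/// `col = literal` can match only if the literal lies within [lower, upper].
fn may_contain_eq(stats: &StringStats, literal: &str) -> bool {
    let lit = literal.as_bytes();
    stats.lower.bytes.as_slice() <= lit && lit <= stats.upper.bytes.as_slice()
}

/// `col > literal` can match only if the upper bound sorts above the literal.
fn may_contain_gt(stats: &StringStats, literal: &str) -> bool {
    stats.upper.bytes.as_slice() > literal.as_bytes()
}

fn main() {
    let long_a = "a".repeat(2000);
    let long_b = "b".repeat(2000);
    let stats = collect_stats(&[long_a.as_str(), long_b.as_str(), "mid"]).unwrap();
    println!(
        "lower truncated: {}, upper truncated: {}",
        stats.lower.truncated, stats.upper.truncated
    );
    println!("b = longA may match: {}", may_contain_eq(&stats, &long_a));
    println!("b > 'a' may match: {}", may_contain_gt(&stats, "a"));
}
```

Under these assumptions, the three values from the test produce a truncated lower bound but an exact maximum, because 'mid' sorts above the 2000-character run of 'b'. Exercising a truncated upper bound as well would need data whose largest value is itself longer than the limit, for example only the two long strings.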