[COMPRESS-727] Provide a symlink-resistant and concurrency-safe archive Extractor#776
[COMPRESS-727] Provide a symlink-resistant and concurrency-safe archive Extractor#776tomasilluminati wants to merge 2 commits into
Conversation
…ve Extractor Adds a security-first Extractor under archivers. extractor with a SecureDirectoryStream-based race-safe path on Linux and a best-effort base extractor elsewhere. Symlink and special-file policies, safe hard-link resolution, tests and a changes.xml entry are included.
|
@ppkarwasz PR Opened. |
Compare Path values instead of toString() so the platform separator does not break the test.
|
@ppkarwasz Fixed the Windows Test Errors. |
garydgregory
left a comment
There was a problem hiding this comment.
Hello @tomasilluminati
This is interesting and I am hoping we can extract something more generic on this theme.
The background nugget is hidden in Apache Commons Text in the package-private class PathFence.
I extracted the class for addition in Apache Commons IO in PR 805 but it might not be good enough. The IO PR is something I had started to iterate with @ppkarwasz but other things popped up since and now seems like a good time to revist.
There are two concerns: resolving relative paths and dealing with various types of file system links. Handling the relative path issue is "simple", the links less so. At first glance it looks like you've covered many cases here.
I'm not sure if both concerns should be intermingled in one class or composed.
Can you see a PathFence class as something that can be improved on and reused here?
The PathFence could also be reused in Apache Commons Configuration for example.
Looking forward to your thoughts.
TY!
|
@garydgregory The syntactic part is a clean fit. toAbsolutePath().normalize() plus the within-root check is generic, and Compress, Configuration and Text can all share it as-is. Links are the tricky bit, and I'd lean toward composing rather than folding them in. Part of why they resist living in a path predicate is that during extraction the entry doesn't exist yet, so I can't just toRealPath() the target. The real protection ends up being operational, resolve each component no-follow, create with CREATE_NEW so an existing link is never followed, and on Linux write relative to the directory handle (SecureDirectoryStream) to close the TOCTOU window. That's bound to the act of creating files, not to judging a path. Where @ppkarwasz point lands well is the already-on-disk case, like a trusted config dir. There a resolving check (toRealPath then the same within-root test) makes sense, and that could be an optional mode in PathFence. The extractor would lean on the syntactic decision and keep its no-follow creation layer on top. So my vote is composed. PathFence owns the containment decision, extraction-specific link safety sits around it, and PathFence stays light enough for Configuration to reuse. Happy to dig into the IO PR with you and Piotr whenever it's a good time, and to bring over the link cases I already cover as tests. Thanks. |
Adds a security-first Extractor under archivers. extractor with a SecureDirectoryStream-based race-safe path on Linux and a best-effort base extractor elsewhere. Symlink and special-file policies, safe hard-link resolution, tests and a changes.xml entry are included.