-
Couldn't load subscription status.
- Fork 6k
fix[apply-patch]: preserve original CRLF/LF line endings on text updates #5587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpjet64 thanks for digging into this! Really appreciate the investigation here. My primary concern is whether this change will be sufficiently deterministic.
If you're interested in implementing follow-ups to the review, we're happy to continue the discussion as you iterate, or we can just build off the work you've contributed so far. But we're eager to land a fix here to improve the Windows experience!
codex-rs/apply-patch/src/lib.rs
Outdated
| Crlf, | ||
| } | ||
|
|
||
| fn detect_eol(s: &str) -> Eol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this will sufficiently cover all cases, e.g. how do we expect to handle newly created files in a repo that uses CRLF?
We might want to consider a more deterministic approach for detect_eol. We could check git configuration.
Curious if you have thoughts on adding logic to collect_git_info and passing using a cli flag to inform this setting instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dylan-hurd-oai Thanks for the review. I kept this PR intentionally small to limit the review surface. If you’re okay with a little more LOC I’ll add a follow-up commit on this same branch (fix/win-crlf-preserve) to make new-file EOL behavior deterministic while leaving edits to existing files unchanged.
Proposal (follow-up commit)
Behavior
-
Existing files: unchanged — detect EOL from on-disk contents and preserve (fast, backward compatible).
-
New files (deterministic precedence):
.gitattributesfor the path (usegit check-attr eol -- <path>where available)git config(core.eolthencore.autocrlf)- CLI/env override:
--assume-eol=(git|crlf|lf|detect)orAPPLY_PATCH_ASSUME_EOL - Fallback to the current default behavior (what this writer already uses)
Wiring
- Add a
decide_eol()helper that implements the precedence and call it at the final write site before normalizing text. - Add a
--assume-eolCLI flag + optional env var for explicit overrides. - Keep diff/preview logic unchanged (previews remain LF); normalization only affects final disk writes.
Common edge cases & how this follow-up handles them
-
Repo not a git repo /
gitunavailable- Fall back to existing behavior.
gitis best-effort; failures are non-fatal and logged at debug level.
- Fall back to existing behavior.
-
.gitattributespattern matching complexity- Prefer
git check-attrto avoid reimplementing matching. Ifgitis missing, fall back to a best-effort.gitattributesparse (documented).
- Prefer
-
core.autocrlfsemantics / cross-machine variance- Precedence:
.gitattributes>core.eol>core.autocrlf. - Interpret
core.autocrlf=true→ CRLF,input→ LF,false→ unknown..gitattributesis the repo-level guarantee.
- Precedence:
-
Binary files /
-textattribute- If
git check-attrreportsbinaryor-text, skip normalization — don’t rewrite binaries.
- If
-
Mixed line endings in existing files
- Preserve current behavior: detect dominant EOL and preserve. No auto-fixing of mixed files.
-
Large files / performance
- We only read content for existing files (current behavior). A streaming heuristic can be added later if needed.
-
Paths with spaces / special characters
- Use
std::process::Command(no shell) and pass path args directly togitto avoid escaping issues.
- Use
-
Files outside repo / submodules / worktrees
- If file sits outside the repo root passed to the writer, skip repo checks and fall back. Submodule/worktree behavior depends on repo root provided; failures fall back safely.
-
CI vs local dev differences
.gitattributesis authoritative for CI; that’s why it’s first in precedence.
-
Old
gitwithoutcheck-attr- Fall back to parsing
.gitattributesifgit check-attrisn’t available; document this as weaker behavior.
- Fall back to parsing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough proposal and consideration here! At first glance this feels reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also considered adding a TUI slash command to auto-detect CRLF/LF, force CRLF, or force LF, but it would have been a much larger addition. I think most users would never use it, so it risked becoming bloat.
Context: * Windows edits sometimes flipped CRLF files to LF or created mixed endings. Change: * Detect the original file EOL (treat any \r\n as CRLF; else LF) and normalize the final text to match only for Update File writes. * Added small helper (detect_eol/normalize_to_eol) and a single call at the final write site. * Added focused tests to verify CRLF/LF preservation. Risk: * Minimal and localized. Only affects the text update write path; binary paths and new-file adds are unchanged. Tests: * cargo fmt * cargo clippy -p codex-apply-patch -- -D warnings (clean) * cargo test -p codex-apply-patch (all tests passed; includes new EOL tests) Verification: * Verified on local runs that updating CRLF files preserves CRLF and LF files preserve LF. New-file behavior remains unchanged.
1822f8e to
0c56804
Compare
…it config > CLI; preserve EOF newline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dylan-hurd-oai Commit pushed. Please let me know if you have any questions or want any changes! Super happy to help resolve any and all Windows issues!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like the direction here! 1 strong opinion on the precedence, otherwise mostly structural requestsx
codex-rs/apply-patch/src/lib.rs
Outdated
| AssumeEol::Lf => return Eol::Lf, | ||
| AssumeEol::Crlf => return Eol::Crlf, | ||
| _ => {} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO cli args should take precedence, even for existing files. If the client of this command explicitly sets their preferene, we should consistently adhere to it. e.g. if we wanted to use this option in core, we would also want it to flow through and ignore the split on existing files.
codex-rs/apply-patch/src/lib.rs
Outdated
| // Existing files: detect from on-disk bytes | ||
| if !is_new_file { | ||
| if let Ok(bytes) = std::fs::read(path) { | ||
| let e = detect_eol_from_bytes(&bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this fallback should just be based on the patch contents here (i.e. use the previous logic). Re-reading the entire file is going to be too much of a performance hit, and if a file is already mixed then adding more based on the line endings near the edit seems reasonable / equivalently deterministic.
| let _argv0 = args.next(); | ||
| // Allow optional flags, then a single positional PATCH argument; otherwise read from stdin. | ||
| let args = std::env::args().skip(1).collect::<Vec<String>>(); | ||
| let mut patch_arg: Option<String> = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have more than 1 arg, I would prefer to switch to using clap for arg parsing here
codex-rs/apply-patch/src/lib.rs
Outdated
| .arg("check-attr") | ||
| .arg("eol") | ||
| .arg("text") | ||
| .arg("binary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function call here suggests it's generic, but it parses out specific attributes for line endings. Can we make either move this into git_check_attr_eol to simplify, or make this logic generic?
codex-rs/apply-patch/src/lib.rs
Outdated
|
|
||
| fn os_native_eol() -> Eol { | ||
| if cfg!(windows) { Eol::Crlf } else { Eol::Lf } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib.rs has now crossed 2000 lines. We should have done this already, but can we start splitting this up? Let's move the eol functions into their own eol mod, please
codex-rs/apply-patch/src/lib.rs
Outdated
| _ => {} | ||
| } | ||
|
|
||
| // 2) .gitattributes, core.eol, core.autocrlf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documenting that I'm a bit conflicted about whether this config or patch contents should take precedence for existing files. I think the performance of not executing git commands on every patch is likely worth it, but given how core apply_patch is I'd like to be exceedingly thoughtful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dylan-hurd-oai Would you have a issue with some simple caching? Heres what I am thinking:
Precedence and rationale:
For existing files, CLI/env wins; otherwise we use the original file’s EOL from the patch/original buffer (no git), which preserves intent and avoids I/O.
For new files, CLI/env → .gitattributes (path‑specific) → core.eol → core.autocrlf → OS native → detect; this matches repo policy and CI while staying deterministic.
Performance:
Hot path for existing files does zero git calls and no extra file re‑reads (we infer from bytes already in memory or the hunk’s adjacent lines).
New‑file policy does at most two cheap git queries per path (attrs + config), memoized per run.
Edge handling:
Mixed EOL in existing files keeps the dominant EOL (we don’t auto‑“fix” mixed files); binaries or -text skip normalization; trailing newline presence is preserved exactly; previews stay LF‑normalized—only the final write normalizes.
Caching safety:
Per‑process, in‑memory caches keyed by (repo_root, rel_path) with invalidation if the patch touches any .gitattributes; on any git error we fall back by precedence (no risk of stale/corrupt state).
If this works I can easily roll it in.
…ted caching
Context:
* Address Dylan’s review on EOL handling: extract logic into eol.rs, add a clear CLI/env override, and ensure precedence covers .gitattributes, core.eol, core.autocrlf, and OS native. Preserve trailing newline presence exactly and skip normalization for binary/-text. Add an opt‑in caching layer to reduce repeated git calls per run.
Change:
* Extracted EOL types + helpers into src/eol.rs (detect, normalize, git helpers, OS native).
* Implemented precedence: CLI/env > .gitattributes (typed; binary/-text -> Unknown) > core.eol > core.autocrlf > OS native; plus detect‑from‑patch for new files in non‑git dirs or when --assume-eol=detect.
* Standalone binary now uses clap; new -E/--assume-eol {lf|crlf|git|detect}; CLI wins over env.
* Added process‑local, feature‑gated caches for git lookups (core.eol, core.autocrlf, check‑attr) with invalidation on .gitattributes Add/Update/Delete.
* Strengthened Windows handling: repo root canonicalization, forward‑slash rel paths, and resilient rel calculation; ensured attrs win absent overrides.
* Tests: added/updated unit + integration tests for detection, normalization, precedence, detect fallback, CLI/env interaction, Windows path behavior, and caching hit/miss + invalidation.
Risk:
* Low; behavior identical except where explicitly specified (detect fallback for new files). Caches are in‑memory and feature‑gated (default on). No sandbox or shared crates touched.
Tests:
* cargo test -p codex-apply-patch (with cache): 54 passed.
* cargo test -p codex-apply-patch --no-default-features (cache off): 52 passed.
* Workspace run shows an existing app-server user-agent test failure; unrelated to apply‑patch.
Verification:
* Ran cargo fmt, cargo clippy (scoped), and tests as above. Confirmed invalidation triggers on .gitattributes changes and that CLI ‘detect’ overrides repo policy as intended.
|
@dylan-hurd-oai Commit pushed with requested changes. Please let me know if I am on the right track or if theres anything else you would prefer. EOL Policy Implementation
|
Preserve CRLF/LF newline consistency on Windows edits
Problem & intent
Edits on Windows sometimes converted CRLF files to LF or produced mixed endings, creating noisy diffs and IDE warnings. This PR ensures newline preservation during file writes.
Before / After
Before
The text writer emitted LF regardless of a file’s original newline convention.
CRLF files could flip to LF or end up mixed.
After
Scope of change
codex-rs/apply-patch/src/lib.rsEol,detect_eol,normalize_to_eol.No changes to binary paths, new-file adds, TUI paste logic, or PATH/PATHEXT/env behavior.
Security effects
None. Only affects how edited text is serialized to disk.
Testing evidence
Static
cargo fmtcargo clippy -p codex-apply-patch -- -D warnings→ cleancargo test -p codex-apply-patch→ added CRLF/LF preservation tests pass; full suite green.Manual
Notes
Checklist
cargo fmt/clippyclean