
Fixed Process Watcher garbled text on Windows with non-UTF-8 locales. #9457 #9749

Merged
akshay-joshi merged 1 commit into pgadmin-org:master from RohitBhati8269:GH-9457
Mar 16, 2026


Conversation

@RohitBhati8269 (Contributor) commented Mar 16, 2026

Detail info: #9457

Summary by CodeRabbit

  • Bug Fixes
    • Fixed corrupted/garbled text in process logs on Windows with non-UTF-8 locales by detecting the subprocess output encoding (falling back to UTF-8) and storing and displaying logs as UTF-8, so subprocess output appears correctly across platforms.

coderabbitai bot commented Mar 16, 2026

Walkthrough

Detect subprocess output encoding at startup using locale.getpreferredencoding(False) (with UTF-8 fallback) and store it in _subprocess_encoding. ProcessLogger now decodes incoming subprocess output bytes with that encoding (errors replaced), strips leading line breaks, and re-encodes to UTF-8 before writing.

Changes

Cohort / File(s) Summary
Subprocess Output Encoding Handling
web/pgadmin/misc/bgprocess/process_executor.py
Added locale import and module-level _subprocess_encoding. Initialize _subprocess_encoding on startup using locale.getpreferredencoding(False) with UTF-8 fallback. ProcessLogger log handling: strip leading line breaks, decode bytes using detected encoding with error replacement, then re-encode to UTF-8 before writing. Updated docstrings/comments to reflect bytes input and encoding conversion.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly identifies the main change — fixing garbled Process Watcher text on Windows with non-UTF-8 locales — which aligns with the subprocess encoding detection and UTF-8 conversion in the code.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
web/pgadmin/misc/bgprocess/process_executor.py (1)

215-223: Consider using codecs.lookup() for robust UTF-8 detection.

The check _subprocess_encoding.lower() not in ('utf-8', 'utf8') may miss other UTF-8 aliases. For example, on Windows, 'cp65001' is UTF-8 but wouldn't match, causing unnecessary decode/encode cycles.

Also, with the 'replace' error handler on line 219, UnicodeDecodeError should never be raised—only LookupError can occur if the encoding name is invalid.

♻️ Suggested improvement using codecs normalization
+import codecs
+
+def _is_utf8_encoding(encoding):
+    """Check if encoding is a UTF-8 variant."""
+    try:
+        return codecs.lookup(encoding).name == 'utf-8'
+    except LookupError:
+        return False
+
 # In the log method:
-                if _subprocess_encoding and _subprocess_encoding.lower() not in (
-                        'utf-8', 'utf8'):
+                if _subprocess_encoding and not _is_utf8_encoding(
+                        _subprocess_encoding):
                     try:
                         msg = msg.decode(
                             _subprocess_encoding, 'replace'
                         ).encode('utf-8')
-                    except (UnicodeDecodeError, LookupError):
+                    except LookupError:
                         # If decoding fails, write as-is
                         pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 215 - 223, The
current encoding check on _subprocess_encoding can miss UTF-8 aliases and
unnecessarily re-encode; use codecs.lookup(_subprocess_encoding).name to get the
canonical encoding name and compare that to 'utf-8' (or compare lowercased
canonical name) before doing the decode/encode, and only catch LookupError from
codecs.lookup (since using 'replace' prevents UnicodeDecodeError); update the
block that decodes msg (the lines referencing _subprocess_encoding and
msg.decode(...).encode('utf-8')) to perform a codecs.lookup check and remove
UnicodeDecodeError from the except clause, keeping only LookupError handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 396bf3b0-c2b1-477e-a23c-c6636d52126a

📥 Commits

Reviewing files that changed from the base of the PR and between b301bdc and 91f8e8c.

📒 Files selected for processing (1)
  • web/pgadmin/misc/bgprocess/process_executor.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
web/pgadmin/misc/bgprocess/process_executor.py (3)

215-223: Consider handling the _subprocess_encoding is None case more explicitly.

While the current code safely falls through (no conversion) when _subprocess_encoding is None, this could happen if the module is imported/used outside the if __name__ == '__main__': block. A defensive comment or explicit handling might improve clarity.

Optional: Add a comment for clarity
                 # Convert subprocess output from system encoding to UTF-8
                 # This fixes garbled text on Windows with non-UTF-8 locales
                 # (e.g., Japanese CP932, Chinese GBK)
                 msg = msg.lstrip(b'\r\n' if _IS_WIN else b'\n')
-                if _subprocess_encoding and \
+                # _subprocess_encoding is set at startup; if None, skip conversion
+                if _subprocess_encoding and \
                         _subprocess_encoding.lower() not in ('utf-8', 'utf8'):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 215 - 223, The
branch that converts msg based on _subprocess_encoding should explicitly handle
the None case for clarity: update the conditional around _subprocess_encoding in
process_executor.py (the block referencing _subprocess_encoding and the
msg.decode(...).encode('utf-8') logic) to either add an explicit early-check for
_subprocess_encoding is None (and skip conversion) or add a concise defensive
comment explaining that None means no conversion is performed; ensure the check
remains: if _subprocess_encoding and _subprocess_encoding.lower() not in
('utf-8', 'utf8'): and document the None behavior so future imports/usages
outside __main__ are clear.

214-214: Verify the necessity of lstrip() on subprocess output.

The lstrip(b'\r\n' if _IS_WIN else b'\n') strips all leading newline characters. Since readline() in the run() method returns lines with trailing (not leading) newlines, this might be addressing a specific edge case. However, lstrip() could inadvertently remove meaningful empty lines if the subprocess output begins with intentional blank lines.

Was this added to handle a specific scenario with garbled output, or could it be narrowed to a more targeted fix?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/misc/bgprocess/process_executor.py` at line 214, The code
currently uses msg = msg.lstrip(b'\r\n' if _IS_WIN else b'\n') which strips all
leading newlines and can remove intentional blank lines; in the run() method
narrow this to only remove a single leading newline sequence when present
(rather than all leading newline characters). Locate the msg variable in
process_executor.py (inside run()) and replace the lstrip call with logic that
checks startswith(b'\r\n') on Windows or startswith(b'\n') otherwise and then
slices off only that one sequence (e.g., msg = msg[len(prefix):] when present),
so you preserve multiple intentional leading blank lines while handling the
specific garbled-leading-newline case.
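The behavioural difference this comment points at, in runnable form. `strip_one_leading_newline` is a hypothetical narrower helper sketched from the prompt above, not code from the PR:

```python
def strip_one_leading_newline(msg: bytes, is_win: bool = False) -> bytes:
    """Remove at most one leading line-break sequence."""
    prefix = b'\r\n' if is_win else b'\n'
    return msg[len(prefix):] if msg.startswith(prefix) else msg

data = b'\n\n\nreport body'
# lstrip() removes every leading newline, collapsing intentional blank lines:
print(data.lstrip(b'\n'))               # b'report body'
# the narrower helper removes only one:
print(strip_one_leading_newline(data))  # b'\n\nreport body'
```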

191-229: Consider adding test coverage for encoding conversion.

This fix addresses a platform-specific issue (Windows with non-UTF-8 locales like CP932, GBK). Manual testing across different locales is difficult. Consider adding unit tests that mock _subprocess_encoding to verify:

  1. UTF-8 input passes through unchanged
  2. Non-UTF-8 input (e.g., CP932 bytes) is correctly converted to UTF-8
  3. Invalid/malformed bytes are handled gracefully with 'replace'

Would you like me to help draft test cases for this encoding conversion logic?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/misc/bgprocess/process_executor.py` around lines 191 - 229, Add
unit tests for process_executor.log to cover encoding conversion: create a
ProcessExecutor instance (or the class under test) with a writable in-memory
logger and mock get_current_time to a fixed value, then set the module-global
_subprocess_encoding to each scenario and call log(msg) with representative byte
inputs; verify the logger content contains the fixed timestamp prefix and the
UTF-8 payload. Test cases: (1) _subprocess_encoding set to 'utf-8' and msg
already UTF-8 — assert bytes passed through unchanged (minus leading newline
stripping); (2) set _subprocess_encoding to a non-UTF8 encoding (e.g., 'cp932')
and provide bytes that decode under that encoding — assert output is converted
to UTF-8; (3) provide malformed bytes and set _subprocess_encoding to a non-UTF8
value to force the except path — assert log still writes something (no
exception) and uses replacement characters. Ensure tests restore/cleanup module
globals (_subprocess_encoding, _IS_WIN if modified) and the in-memory logger
between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/misc/bgprocess/process_executor.py`:
- Around line 210-225: The log writer currently converts subprocess output using
_subprocess_encoding and then writes bytes via self.logger.write(msg), which
leads to log files encoded as UTF-8 but read back using sys.getdefaultencoding()
in read_log(), causing garbled text on Windows non-UTF-8 locales; fix by
normalizing to UTF-8 at the write side and making the reader explicitly use
UTF-8: in process_executor.py ensure the msg path always encodes to UTF-8 (i.e.,
after any decoding/replacement do .encode('utf-8') unconditionally before
calling self.logger.write), and update read_log() in processes.py to decode
using 'utf-8' (not sys.getdefaultencoding()) so both writer (msg handling) and
reader (read_log) consistently use UTF-8.
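The writer/reader symmetry this comment asks for can be illustrated with a plain file round-trip. This is illustrative only; pgAdmin's actual read_log lives in processes.py and is not reproduced here:

```python
import tempfile

# Writer side: after conversion, the log file always holds UTF-8 bytes.
payload = '日本語ログ'.encode('utf-8')
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# Reader side: decode explicitly as UTF-8 rather than relying on the
# platform default, so a non-UTF-8 Windows locale cannot garble what
# the writer stored.
with open(path, encoding='utf-8') as f:
    assert f.read() == '日本語ログ'
```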


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 006b5aef-d407-4a91-9c1a-c65905e12f47

📥 Commits

Reviewing files that changed from the base of the PR and between 91f8e8c and 9b92c45.

📒 Files selected for processing (1)
  • web/pgadmin/misc/bgprocess/process_executor.py

@akshay-joshi akshay-joshi merged commit 74b1a3b into pgadmin-org:master Mar 16, 2026
33 checks passed