Multi-Serial & Rich Multimodal Data (Embeddings, Visual, Audio) by dam2452 · Pull Request #113 · dam2452/RanchBot

dam2452 · 2026-02-03T08:20:53Z

No description provided.

Add an approval check to the build_and_test_staging job (if: github.event.review.state == 'approved') so it only runs after a PR review is approved. Also export a comprehensive set of environment variables from repository secrets (Elasticsearch, Postgres, Telegram, JWT, test DB creds, feature flags, ports, and other runtime settings) so the self-hosted staging job has the credentials and configuration it needs to build and test/deploy.

.github/workflows/build_deploy_test.yml

skelly37

Slowly moving forward.

bot/adapters/telegram/telegram_responder.py

bot/services/reindex/zip_extractor.py

bot/services/reindex/video_path_transformer.py

bot/services/reindex/reindex_service.py

bot/handlers/not_sending_videos/serial_context_handler.py

bot/handlers/not_sending_videos/search_list_handler.py

Co-authored-by: Kamil <skelly37@protonmail.com>

Standardize and refine type hints across multiple modules by replacing bare built-ins (list/dict/tuple) and PEP 585 inline generics with explicit typing generics (List, Dict, Tuple, Any, Optional) and adding required imports. Updated function signatures, dataclass field types, and several helper methods to improve static typing and readability without changing runtime behavior. Affected areas include bot adapters/handlers/services and many preprocessor modules (e.g. telegram_responder, database_manager, bot handlers, reindex, zip_extractor, CLI commands, config, embeddings, transcription importer, utils, validation, and frame subprocessors).
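The annotation style described above can be sketched as follows. `ClipRequest` and `split_interval` are hypothetical names invented for illustration and are not taken from the repository; only the `typing` generics themselves come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ClipRequest:  # hypothetical dataclass, not from the repository
    # Before: `tags: list = field(default_factory=list)`
    tags: List[str] = field(default_factory=list)
    # Before: `metadata: dict = field(default_factory=dict)`
    metadata: Dict[str, Any] = field(default_factory=dict)


def split_interval(raw: str) -> Optional[Tuple[float, float]]:
    """Parse "start-end" into a pair of floats, or return None if malformed."""
    parts = raw.split("-")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None
```

The annotations are not enforced at runtime, which is consistent with the commit's claim that behavior is unchanged.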

Replace generic Dict[str, Any] annotations with the concrete ElasticsearchSegment type across multiple handlers to improve type safety and clarity. Also import LastClip where needed and update signatures/return types (e.g. __get_segment_and_clip now returns Optional[ElasticsearchSegment], Optional[LastClip]; inline handler methods accept List[ElasticsearchSegment]). Files updated: bot/handlers/not_sending_videos/save_clip_handler.py, bot/handlers/sending_videos/adjust_video_clip_handler.py, bot/handlers/sending_videos/inline_clip_handler.py. No behavioural changes intended, only typing/signature adjustments.

Introduce new video-related TypedDicts in bot/types.py (SceneDict, FrameRequest, HashResult, Detection, VideoMetadata, SceneTimestamp*, SceneTimestampsData, CharacterDetectionInFrame, ObjectDetectionInFrame) and update function signatures across the codebase to use these types. Changes include stronger typing for document lists in reindex_service, multiple episode/scene/embedding handlers in elastic_document_generator, segment/word structures in sound_separator, transcription and report helpers in validation modules, frame request handling in frame_exporter, and scene dict/metadata shapes in scene_detector. These adjustments improve static type checking and clarify data structures passed between preprocessors and services.
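A minimal sketch of what such a TypedDict could look like. Only the name `SceneDict` comes from the commit message; `TimePoint`, every field name, and `scene_durations` are assumptions made for illustration, and the real definitions in bot/types.py may differ.

```python
from typing import List, TypedDict


class TimePoint(TypedDict):  # hypothetical helper type; fields are assumed
    frame: int
    seconds: float


class SceneDict(TypedDict):  # name from the commit message; fields are assumed
    scene_number: int
    start: TimePoint
    end: TimePoint


def scene_durations(scenes: List[SceneDict]) -> List[float]:
    """Return the length of each scene in seconds."""
    return [s["end"]["seconds"] - s["start"]["seconds"] for s in scenes]
```

At runtime a TypedDict is an ordinary dict, so the benefit is entirely in static checking: a type checker can flag a misspelled key that `Dict[str, Any]` would let through.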

Rename multiple internal helper functions to private names (single- or double-underscore prefix) to improve encapsulation and clarity across the codebase. Notable changes: auth helpers (verify_password -> _verify_password), JWT helpers (generate_token/verify_jwt_token -> _generate_token/_verify_jwt_token), many DatabaseManager connection/row helpers (get_db_connection/_row_to_video_clip -> __get_db_connection/__row_to_video_clip) and numerous factory/handler methods (create_* -> _create_* or __create_*). Also: remove an unused TelegramInlineQuery sender accessor, clean up imports and types in several handlers and the preprocessor CLI (embedding/hash/print helpers renamed to private), apply minor whitespace and formatting fixes, and add small typing annotations to the reindex service. These are refactors only; no business logic changes are intended.
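The difference between the two prefixes matters in Python: a double-underscore method name is mangled with the class name, while a single underscore is a convention only. The sketch below uses a simplified stand-in for `DatabaseManager`; the method bodies and the `fetch` and `_row_to_clip` helpers are assumptions, not the repository's code.

```python
class DatabaseManager:  # simplified stand-in for the real class
    def __get_db_connection(self) -> str:
        # Stored on the class as `_DatabaseManager__get_db_connection`.
        return "connection"

    def _row_to_clip(self, row: tuple) -> dict:
        # Single underscore: private by convention, no name mangling.
        return {"id": row[0]}

    def fetch(self) -> str:
        # Inside the class body the short name resolves to the mangled one.
        return self.__get_db_connection()
```

One consequence is that subclasses and external callers can no longer reach `__get_db_connection` under its short name, which is worth keeping in mind when renaming helpers that tests or subclasses rely on.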

Rename public service/progress attributes in handlers to private names (_reindex_service, _last_progress_time, _progress_message, _serial_manager) and update all references to match. Simplify search_list_handler to use _get_user_active_series for determining the series name. Change SerialContextManager to import settings and return settings.DEFAULT_SERIES instead of the hardcoded "ranczo" fallback. Add DEFAULT_SERIES to settings with default "ranczo". These changes standardize naming, centralize the default series value, and clean up progress handling logic.
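The centralized fallback might look roughly like this. `Settings` here is a hypothetical stand-in for the project's settings object, and the in-memory storage and method names are assumptions; only `DEFAULT_SERIES` and its default value "ranczo" come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Settings:  # hypothetical stand-in for the project's settings object
    DEFAULT_SERIES: str = "ranczo"


settings = Settings()


class SerialContextManager:  # simplified; storage is assumed to be in memory
    def __init__(self) -> None:
        self._active: Dict[int, str] = {}

    def set_user_active_series(self, user_id: int, series: str) -> None:
        self._active[user_id] = series

    def get_user_active_series(self, user_id: int) -> str:
        stored: Optional[str] = self._active.get(user_id)
        # Fall back to the configured default instead of a hardcoded literal.
        return stored if stored is not None else settings.DEFAULT_SERIES
```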

Add documentation and tooling for refactor work: PLAN_MULTI_SERIES_SUPPORT.md, RAW_STRINGS_* analysis and guides, analyze_raw_strings.py and rename_episodes.py, plus preprocessor/utils/constants.py. Apply widespread preparatory updates across preprocessor and bot (handlers, responders, search, transcription, embeddings, CLI commands, whisper engine, reindex, database, utils, etc.) to support upcoming multi-series output/input path changes and to centralize raw-string handling. This commit is mainly organizational and preparatory (plans, analysis, constants, and code adjustments) to enable the multi-series migration and string-constant extraction.

Introduce async context manager for ReindexService and update ReindexHandler to use `async with`, moving cleanup into __aexit__. Add get_sender_id to TelegramInlineQuery. Simplify ZipExtractor filename type detection (mapping + Optional return). Remove a large SQL migration block from init_db.sql. Misc: remove unused variables/imports, adjust typing imports (add Any/Optional), relocate/import console in elastic document generator, add pylint duplicate-code disables in constants, tighten imports in scene_detector and transcoder, and add small lint comments in ManualClipHandler. These changes improve resource management, type clarity, and code cleanliness.
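A minimal sketch of the async context manager pattern, assuming a much simpler `ReindexService` than the real one. The `events` list, the `reindex` method, and `run_reindex` are illustrative inventions; only the use of `async with` and `__aexit__` for cleanup is taken from the commit message.

```python
import asyncio
from types import TracebackType
from typing import List, Optional, Type


class ReindexService:  # simplified stand-in for the real service
    def __init__(self) -> None:
        self.events: List[str] = []

    async def __aenter__(self) -> "ReindexService":
        self.events.append("open")
        return self

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Cleanup runs whether or not the body raised; returning None
        # lets any exception propagate to the caller.
        self.events.append("close")

    async def reindex(self, series: str) -> None:
        self.events.append(f"reindex:{series}")


async def run_reindex(series: str) -> List[str]:
    service = ReindexService()
    async with service:
        await service.reindex(series)
    return service.events
```

Calling `asyncio.run(run_reindex("ranczo"))` exercises the open, work, close sequence without the handler having to call a cleanup method explicitly.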

Add a guard in AdjustVideoClipHandler to detect invalid intervals (start >= end): reply with an error, log an INFO system message, and abort processing. Also import the new response helpers for the invalid-interval message and log. Update the expected test artifact hash for "merged_clip_1 2 3.mp4" to match the regenerated file.
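The guard amounts to a single comparison before any processing starts. The helper below is a hypothetical illustration; its name and message text are not taken from the repository.

```python
from typing import Optional


def validate_interval(start: float, end: float) -> Optional[str]:
    """Return an error message for an invalid interval, or None if it is usable."""
    if start >= end:
        # Zero-length and reversed intervals are both rejected.
        return f"Invalid interval: start ({start}) must be before end ({end})."
    return None
```

A handler would reply with the message, log it, and return early when the result is not None.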

skelly37

I think this is the last iteration.

bot/utils/constants.py

bot/handlers/bot_message_handler.py

bot/handlers/sending_videos/send_clip_handler.py

bot/handlers/sending_videos/manual_clip_handler.py

bot/adapters/telegram/telegram_responder.py

bot/handlers/not_sending_videos/serial_context_handler.py

Change send_video to raise exceptions instead of returning bool and standardize reply methods across handlers.
- Introduces bot.exceptions with VideoTooLargeException and CompilationTooLargeException to represent oversized video errors.
- Updates AbstractResponder.send_video signature to return None and updates RestResponder/TelegramResponder: TelegramResponder now checks file size, raises VideoTooLargeException, and always cleans up files; RestResponder removed boolean return.
- BotMessageHandler now catches VideoTooLargeException and CompilationTooLargeException, centralizes too-large user message, and introduces private helper methods (_reply, _reply_error, _handle_ffmpeg_exception, _compile_and_send_video).
- compile flow adjusted: compilation raises CompilationTooLargeException when constituent send_video raises VideoTooLargeException.
- Bulk update across handlers to use _reply/_reply_error and to stop relying on send_video boolean returns; logging calls updated accordingly.
This refactor centralizes error handling for large videos, simplifies responder behavior, and makes handler code consistent.
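The shift from boolean returns to exceptions can be sketched as below. The exception names come from the commit message; `MAX_VIDEO_BYTES`, the size-only `send_video` signature, and `handle_clip` are assumptions made to keep the example self-contained.

```python
MAX_VIDEO_BYTES = 50 * 1024 * 1024  # assumed limit, for illustration only


class VideoTooLargeException(Exception):
    """Raised when a single clip exceeds the size limit."""


class CompilationTooLargeException(Exception):
    """Raised when a compiled clip exceeds the size limit."""


def send_video(size_bytes: int) -> None:
    """Hypothetical responder call: returns None on success, raises on failure."""
    if size_bytes > MAX_VIDEO_BYTES:
        raise VideoTooLargeException(f"{size_bytes} bytes exceeds the limit")


def handle_clip(size_bytes: int) -> str:
    """Hypothetical handler: one except clause replaces per-call boolean checks."""
    try:
        send_video(size_bytes)
    except (VideoTooLargeException, CompilationTooLargeException):
        return "too_large"
    return "sent"
```

With this shape a caller cannot silently ignore a failed send, which was possible when the result was a bool that nobody checked.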

Introduce bot.exceptions package export and update callers to use it. Add bot/exceptions/__init__.py to re-export VideoException, VideoTooLargeException and CompilationTooLargeException. Update imports in telegram_responder and bot_message_handler to import from bot.exceptions. Use exception chaining (raise ... from exc) when wrapping TelegramEntityTooLarge and when re-raising CompilationTooLargeException to preserve original context.
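Exception chaining with `raise ... from exc` keeps the original error available on `__cause__`. In the sketch below the three exception names come from the commit message, while the class hierarchy, the `limit` parameter, and `compile_and_send` are assumptions for illustration.

```python
from typing import List


class VideoException(Exception):
    """Assumed common base for video-related errors."""


class VideoTooLargeException(VideoException):
    pass


class CompilationTooLargeException(VideoException):
    pass


def send_video(size_bytes: int, limit: int = 100) -> None:
    # Hypothetical size check standing in for the real responder.
    if size_bytes > limit:
        raise VideoTooLargeException(f"{size_bytes} > {limit}")


def compile_and_send(clip_sizes: List[int]) -> None:
    try:
        send_video(sum(clip_sizes))
    except VideoTooLargeException as exc:
        # `from exc` records the original error as __cause__,
        # so tracebacks show both failures.
        raise CompilationTooLargeException("compiled clip is too large") from exc
```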

skelly37

Overall it looks good; drop those 5 functions (or however many there are) and it can be merged 😎

bot/adapters/telegram/telegram_responder.py

Move auth-related exceptions into bot/exceptions/auth_exceptions.py (renamed from exceptions.py) and re-export TooManyActiveTokensError from bot.exceptions.__init__.py. Update __all__ to include the token error so existing imports continue to work; this separates auth exceptions from video exceptions for clearer organization.

Replace BotMessageHandler wrapper send methods with direct calls to AbstractResponder.send_* and update handlers accordingly. Removed unused imports and response helpers (Path import, get_clip_size_exceed_log_message, get_clip_size_log_message, get_video_sent_log_message) and deleted _answer/_answer_markdown/_answer_photo/_answer_video/_answer_document implementations. Updated EpisodeListHandler and InlineClipHandler to use self._responder.send_markdown/send_text/send_document and adjusted clip-duration handling to call the responder directly.

skelly37 added 30 commits January 24, 2026 16:23

test inline mode

a2a5b5f

fixed abstract

5774679

inline handlers

fac1a4d

inline send actually

e59befa

skip tests

45ae703

skip tests

1d69c84

pylint + added untracked file

3c08ffa

pylint

7b7a9f1

test inline mode

0c5bb4f

fixed abstract

04f4041

inline handlers

93463c4

inline send actually

7d48738

skip tests

b026902

skip tests

7deed14

pylint + added untracked file

2b21025

heavy wip just to address the conflicts

6de7e0f

Merge branch 'inline' of https://github.com/dam2452/RANCZO_KLIPY into…

cad2162

… inline

conflicts with main

3fda478

Merge branch 'main' into inline

4e721c8

test now

7fcf9d9

fix env

cd95e37

fix results passing

ae72bf6

split into functions, still far away from perfection

22b4f99

fix inline + disable broad-exception-caught inspection

ade5ed6

permissions check

0f39047

error handling

3f12cd7

try after errors

62a993f

new logs

33a7479

debug logs

3bfe045

use log_system_message instead of logger

fbb91cc

dam2452 added 2 commits February 7, 2026 15:40

Merge branch 'new-data' of https://github.com/dam2452/RanchBot into n…

635afaa

…ew-data

skelly37 reviewed Feb 7, 2026

.github/workflows/build_deploy_test.yml (outdated)

skelly37 and others added 4 commits February 7, 2026 15:54

Fix manual tests

1c38ad4

Update build_deploy_test.yml

0b4d4bb

Merge branch 'new-data' of https://github.com/dam2452/RanchBot into n…

80456d0

…ew-data

Update expected_file_hashes.json

961076c

skelly37 requested changes Feb 7, 2026

dam2452 and others added 11 commits February 7, 2026 23:03

Update bot/adapters/telegram/telegram_responder.py

e9f6e96

Co-authored-by: Kamil <skelly37@protonmail.com>

.

57b7035

.

d861ecc

skelly37 requested changes Feb 8, 2026

dam2452 added 3 commits February 8, 2026 16:40

Update __init__.py

3fc498c

skelly37 approved these changes Feb 8, 2026

bot/adapters/telegram/telegram_responder.py

dam2452 added 2 commits February 8, 2026 17:09

skelly37 enabled auto-merge (squash) February 8, 2026 16:36

skelly37 approved these changes Feb 8, 2026

skelly37 merged commit f72e2cc into main Feb 8, 2026
5 checks passed

skelly37 deleted the new-data branch February 8, 2026 16:40

Conversation

dam2452 commented Feb 3, 2026

skelly37 left a comment

skelly37 left a comment

skelly37 left a comment

3 participants