
Multi-Serial & Rich Multimodal Data (Embeddings, Visual, Audio)#113

Merged
skelly37 merged 160 commits into main from new-data
Feb 8, 2026


@dam2452
Owner

@dam2452 dam2452 commented Feb 3, 2026

No description provided.

Add an approval check to the build_and_test_staging job (if: github.event.review.state == 'approved') so it only runs after a PR review is approved. Also export a comprehensive set of environment variables from repository secrets (Elasticsearch, Postgres, Telegram, JWT, test DB creds, feature flags, ports, and other runtime settings) so the self-hosted staging job has the credentials and configuration it needs to build and test/deploy.
Collaborator

@skelly37 skelly37 left a comment

Slowly moving forward.

dam2452 and others added 11 commits February 7, 2026 23:03
Co-authored-by: Kamil <skelly37@protonmail.com>
Standardize and refine type hints across multiple modules by replacing bare built-ins (list/dict/tuple) and PEP 585 inline generics with explicit typing generics (List, Dict, Tuple, Any, Optional) and adding required imports. Updated function signatures, dataclass field types, and several helper methods to improve static typing and readability without changing runtime behavior. Affected areas include bot adapters/handlers/services and many preprocessor modules (e.g. telegram_responder, database_manager, bot handlers, reindex, zip_extractor, CLI commands, config, embeddings, transcription importer, utils, validation, and frame subprocessors).
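As a rough illustration of the annotation style this commit describes, the sketch below uses explicit `typing` generics in signatures and local variables. The function names and data shapes are hypothetical and not taken from the repository; only the style (`List`, `Dict`, `Tuple`, `Optional` imported from `typing`) reflects the commit message.

```python
from typing import Dict, List, Optional, Tuple


def group_frames_by_episode(rows: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group (episode_id, frame_number) pairs by episode (hypothetical helper)."""
    grouped: Dict[str, List[int]] = {}
    for episode_id, frame_number in rows:
        grouped.setdefault(episode_id, []).append(frame_number)
    return grouped


def first_frame(grouped: Dict[str, List[int]], episode_id: str) -> Optional[int]:
    """Return the lowest frame number for an episode, or None if it is unknown."""
    frames = grouped.get(episode_id)
    return min(frames) if frames else None
```

The annotations only affect static checkers; the functions behave the same at runtime as their unannotated equivalents, which matches the "without changing runtime behavior" note in the commit.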
Replace generic Dict[str, Any] annotations with the concrete ElasticsearchSegment type across multiple handlers to improve type safety and clarity. Also import LastClip where needed and update signatures/return types (e.g. __get_segment_and_clip now returns Optional[ElasticsearchSegment], Optional[LastClip]; inline handler methods accept List[ElasticsearchSegment]). Files updated: bot/handlers/not_sending_videos/save_clip_handler.py, bot/handlers/sending_videos/adjust_video_clip_handler.py, bot/handlers/sending_videos/inline_clip_handler.py. No behavioural changes intended, only typing/signature adjustments.
Introduce new video-related TypedDicts in bot/types.py (SceneDict, FrameRequest, HashResult, Detection, VideoMetadata, SceneTimestamp*, SceneTimestampsData, CharacterDetectionInFrame, ObjectDetectionInFrame) and update function signatures across the codebase to use these types. Changes include stronger typing for document lists in reindex_service, multiple episode/scene/embedding handlers in elastic_document_generator, segment/word structures in sound_separator, transcription and report helpers in validation modules, frame request handling in frame_exporter, and scene dict/metadata shapes in scene_detector. These adjustments improve static type checking and clarify data structures passed between preprocessors and services.
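A minimal sketch of how a `TypedDict` such as `SceneDict` can replace a loosely typed dictionary. The type name comes from the commit message, but the field names below are assumptions made for illustration; the actual definitions in bot/types.py may differ.

```python
from typing import List, TypedDict


class SceneDict(TypedDict):
    # Field names are illustrative assumptions, not the PR's real schema.
    scene_number: int
    start_seconds: float
    end_seconds: float


def total_duration(scenes: List[SceneDict]) -> float:
    """Sum the length of every scene; a type checker can verify the keys used."""
    return sum(scene["end_seconds"] - scene["start_seconds"] for scene in scenes)
```

At runtime a `SceneDict` is still a plain `dict`, so the change costs nothing when the code runs; the benefit is that a static checker can flag a misspelled key or a wrongly typed value before the code is executed.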
Rename multiple internal helper functions to private (single- or double-underscore) to improve encapsulation and clarity across the codebase. Notable changes: auth helpers (verify_password -> _verify_password), JWT helpers (generate_token/verify_jwt_token -> _generate_token/_verify_jwt_token), many DatabaseManager connection/row helpers (get_db_connection/_row_to_video_clip -> __get_db_connection/__row_to_video_clip) and numerous factory/handler methods (create_* -> _create_* or __create_*). Also: remove an unused TelegramInlineQuery sender accessor, move/import and type cleanup in several handlers and preprocessor CLI (renamed embedding/hash/print helpers to private), minor whitespace and formatting fixes, and small typing additions to reindex service. These are refactors only — no business logic changes intended.
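One practical consequence of the double-underscore renames is Python's name mangling: inside a class body, an attribute written as `__name` is stored as `_ClassName__name`. The sketch below borrows the `DatabaseManager` and `__get_db_connection` names from the commit message, but the method bodies are stand-ins, not the project's code.

```python
class DatabaseManager:
    @classmethod
    def __get_db_connection(cls) -> str:
        # Stand-in for opening a real connection.
        return "connection"

    @classmethod
    def fetch(cls) -> str:
        # Inside the class body this reference is mangled automatically.
        return cls.__get_db_connection()


# Outside the class, the helper exists only under its mangled name.
has_plain_name = hasattr(DatabaseManager, "__get_db_connection")
has_mangled_name = hasattr(DatabaseManager, "_DatabaseManager__get_db_connection")
```

Single-underscore names such as `_verify_password` are private by convention only, whereas double-underscore names are harder to call by accident from outside the class or from a subclass.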
Rename public service/progress attributes in handlers to private names (_reindex_service, _last_progress_time, _progress_message, _serial_manager) and update all references to match. Simplify search_list_handler to use _get_user_active_series for determining the series name. Change SerialContextManager to import settings and return settings.DEFAULT_SERIES instead of the hardcoded "ranczo" fallback. Add DEFAULT_SERIES to settings with default "ranczo". These changes standardize naming, centralize the default series value, and clean up progress handling logic.
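The centralized default could look roughly like the sketch below. `DEFAULT_SERIES`, `SerialContextManager`, and the "ranczo" default come from the commit message; the `Settings` dataclass, the in-memory storage, and the method names are simplifying assumptions (the real project likely loads settings from the environment and stores the active series elsewhere).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Settings:
    # Assumed shape; the commit only states that DEFAULT_SERIES defaults to "ranczo".
    DEFAULT_SERIES: str = "ranczo"


settings = Settings()


class SerialContextManager:
    def __init__(self) -> None:
        self._active_series: Dict[int, str] = {}  # hypothetical in-memory store

    def set_user_active_series(self, user_id: int, series: str) -> None:
        self._active_series[user_id] = series

    def get_user_active_series(self, user_id: int) -> str:
        # Fall back to the configured default instead of a hardcoded literal.
        return self._active_series.get(user_id, settings.DEFAULT_SERIES)
```

Keeping the fallback in settings means a deployment for a different series only needs a configuration change rather than a code edit.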
Add documentation and tooling for refactor work: PLAN_MULTI_SERIES_SUPPORT.md, RAW_STRINGS_* analysis and guides, analyze_raw_strings.py and rename_episodes.py, plus preprocessor/utils/constants.py. Apply widespread preparatory updates across preprocessor and bot (handlers, responders, search, transcription, embeddings, CLI commands, whisper engine, reindex, database, utils, etc.) to support upcoming multi-series output/input path changes and to centralize/raw-string handling. This commit is mainly organizational and preparatory (plans, analysis, constants, and code adjustments) to enable the multi-series migration and string-constant extraction.
Introduce async context manager for ReindexService and update ReindexHandler to use `async with`, moving cleanup into __aexit__. Add get_sender_id to TelegramInlineQuery. Simplify ZipExtractor filename type detection (mapping + Optional return). Remove a large SQL migration block from init_db.sql. Misc: remove unused variables/imports, adjust typing imports (add Any/Optional), relocate/import console in elastic document generator, add pylint duplicate-code disables in constants, tighten imports in scene_detector and transcoder, and add small lint comments in ManualClipHandler. These changes improve resource management, type clarity, and code cleanliness.
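A minimal sketch of the `async with` pattern mentioned above, with cleanup moved into `__aexit__`. Only the `ReindexService` name and the use of `__aexit__` come from the commit message; `close`, `reindex_all`, and `handle_reindex` are hypothetical stand-ins.

```python
import asyncio
from types import TracebackType
from typing import Optional, Type


class ReindexService:
    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "ReindexService":
        return self

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Runs on normal exit and when the body raises.
        await self.close()

    async def close(self) -> None:
        self.closed = True  # stand-in for closing real connections

    async def reindex_all(self) -> int:
        await asyncio.sleep(0)  # stand-in for real work
        return 0


async def handle_reindex() -> ReindexService:
    async with ReindexService() as service:
        await service.reindex_all()
    return service
```

Because `__aexit__` returns `None`, exceptions raised inside the `async with` body still propagate to the caller after cleanup has run.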
Add a guard in AdjustVideoClipHandler to detect invalid intervals (start >= end): reply with an error, log an INFO system message, and abort processing. Also import the new response helpers for the invalid-interval message and log. Update the expected test artifact hash for "merged_clip_1 2 3.mp4" to match the regenerated file.
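The guard described above might be sketched as follows. The condition (`start >= end`), the error reply, the INFO-level log, and the early abort come from the commit message; the function name, the message wording, and the list used to collect replies are assumptions for illustration.

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger("adjust_video_clip")

# Assumed wording; the real text comes from the project's response helpers.
INVALID_INTERVAL_MESSAGE = "Invalid interval: the clip must start before it ends."


def adjust_interval(
    start: float, end: float, replies: List[str]
) -> Optional[Tuple[float, float]]:
    """Return the interval if it is valid; otherwise reply, log, and abort."""
    if start >= end:
        replies.append(INVALID_INTERVAL_MESSAGE)
        logger.info("Rejected clip interval: start=%s end=%s", start, end)
        return None
    return start, end
```

Checking the interval before any video processing starts avoids handing an empty or negative-length range to the clip extraction step.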
Collaborator

@skelly37 skelly37 left a comment

I think this is the final iteration.

Change send_video to raise exceptions instead of returning bool and standardize reply methods across handlers.

- Introduces bot.exceptions with VideoTooLargeException and CompilationTooLargeException to represent oversized video errors.
- Updates AbstractResponder.send_video signature to return None and updates RestResponder/TelegramResponder: TelegramResponder now checks file size, raises VideoTooLargeException, and always cleans up files; RestResponder removed boolean return.
- BotMessageHandler now catches VideoTooLargeException and CompilationTooLargeException, centralizes too-large user message, and introduces private helper methods (_reply, _reply_error, _handle_ffmpeg_exception, _compile_and_send_video).
- compile flow adjusted: compilation raises CompilationTooLargeException when constituent send_video raises VideoTooLargeException.
- Bulk update across handlers to use _reply/_reply_error and to stop relying on send_video boolean returns; logging calls updated accordingly.

This refactor centralizes error handling for large videos, simplifies responder behavior, and makes handler code consistent.
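The flow summarized above can be sketched as follows: the responder raises instead of returning a bool, always removes the file, and the handler maps both exceptions to a single user-facing message. The exception names come from the commit message; the shared `VideoException` base, the size limit, the message text, and the function names are assumptions.

```python
import tempfile
from pathlib import Path


class VideoException(Exception):
    """Assumed common base class for video errors."""


class VideoTooLargeException(VideoException):
    pass


class CompilationTooLargeException(VideoException):
    pass


MAX_VIDEO_BYTES = 50 * 1024 * 1024  # assumed limit, not taken from the PR
TOO_LARGE_MESSAGE = "The clip is too large to send."  # assumed wording


def send_video(path: Path, max_bytes: int = MAX_VIDEO_BYTES) -> None:
    """Raise on oversized files instead of returning False; always clean up."""
    try:
        if path.stat().st_size > max_bytes:
            raise VideoTooLargeException(f"{path.name} exceeds {max_bytes} bytes")
        # The actual upload would happen here.
    finally:
        path.unlink(missing_ok=True)


def handle_clip_request(path: Path, max_bytes: int = MAX_VIDEO_BYTES) -> str:
    """Central place where too-large errors become one user message."""
    try:
        send_video(path, max_bytes)
    except (VideoTooLargeException, CompilationTooLargeException):
        return TOO_LARGE_MESSAGE
    return "sent"


# Usage: a 10-byte file against a 5-byte limit is rejected and still cleaned up.
with tempfile.TemporaryDirectory() as tmp:
    clip = Path(tmp) / "clip.mp4"
    clip.write_bytes(b"x" * 10)
    outcome = handle_clip_request(clip, max_bytes=5)
    cleaned_up = not clip.exists()
```

With exceptions, individual handlers no longer need to check a return value after every `send_video` call, which is the consistency gain the commit describes.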
Introduce bot.exceptions package export and update callers to use it. Add bot/exceptions/__init__.py to re-export VideoException, VideoTooLargeException and CompilationTooLargeException. Update imports in telegram_responder and bot_message_handler to import from bot.exceptions. Use exception chaining (raise ... from exc) when wrapping TelegramEntityTooLarge and when re-raising CompilationTooLargeException to preserve original context.
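To illustrate the exception chaining mentioned above, the sketch below wraps a lower-level error with `raise ... from exc` so the original stays reachable through `__cause__`. The two exception names come from the PR; the functions and the byte sizes are hypothetical.

```python
from typing import List, Optional


class VideoTooLargeException(Exception):
    pass


class CompilationTooLargeException(Exception):
    pass


def send_video(size_bytes: int, limit_bytes: int) -> None:
    """Stand-in sender that rejects oversized payloads."""
    if size_bytes > limit_bytes:
        raise VideoTooLargeException(f"{size_bytes} > {limit_bytes}")


def compile_and_send(clip_sizes: List[int], limit_bytes: int) -> None:
    """Send a compilation; translate the low-level error, keeping its context."""
    try:
        send_video(sum(clip_sizes), limit_bytes)
    except VideoTooLargeException as exc:
        raise CompilationTooLargeException("compilation too large") from exc


def cause_of_failure(clip_sizes: List[int], limit_bytes: int) -> Optional[BaseException]:
    """Return the chained original exception, or None if sending succeeded."""
    try:
        compile_and_send(clip_sizes, limit_bytes)
    except CompilationTooLargeException as err:
        return err.__cause__
    return None
```

When a chained exception is logged with a traceback, Python prints both errors joined by "The above exception was the direct cause of the following exception", which keeps the original failure visible during debugging.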
Collaborator

@skelly37 skelly37 left a comment

Overall it looks good; remove those five functions (or however many there are) and it can be merged 😎

Move auth-related exceptions into bot/exceptions/auth_exceptions.py (renamed from exceptions.py) and re-export TooManyActiveTokensError from bot.exceptions.__init__.py. Update __all__ to include the token error so existing imports continue to work; this separates auth exceptions from video exceptions for clearer organization.
Replace BotMessageHandler wrapper send methods with direct calls to AbstractResponder.send_* and update handlers accordingly. Removed unused imports and response helpers (Path import, get_clip_size_exceed_log_message, get_clip_size_log_message, get_video_sent_log_message) and deleted _answer/_answer_markdown/_answer_photo/_answer_video/_answer_document implementations. Updated EpisodeListHandler and InlineClipHandler to use self._responder.send_markdown/send_text/send_document and adjusted clip-duration handling to call the responder directly.
@skelly37 skelly37 enabled auto-merge (squash) February 8, 2026 16:36
@skelly37 skelly37 merged commit f72e2cc into main Feb 8, 2026
5 checks passed
@skelly37 skelly37 deleted the new-data branch February 8, 2026 16:40