Closed
23 commits
4defe5c
docs(awesome): add midscene java sdk (#1324)
yuyutaotao Oct 20, 2025
13b4f1d
fix(core): support number type for aiInput value field (#1339)
quanru Oct 20, 2025
0af151a
fix(report): prevent sidebar jitter when expanding case selector (#1344)
quanru Oct 20, 2025
db3a27e
refactor(core): unify cache config parameters (#1346)
quanru Oct 21, 2025
652f29a
release: v0.30.5
github-actions[bot] Oct 21, 2025
693f76f
docs(site): optimize v0.30 changelog with user-focused improvements (…
quanru Oct 21, 2025
567e5fa
fix(ios): correct horizontal scroll direction and improve swipe imple…
quanru Oct 23, 2025
efbc2d3
feat(android-playground): enable alwaysFetchScreenInfo for AndroidDev…
quanru Oct 23, 2025
ba3849a
fix(core): handle ZodEffects and ZodUnion in schema parsing (#1359)
quanru Oct 23, 2025
a871e67
feat(playground): implement task cancellation for Android/iOS playgro…
quanru Oct 23, 2025
c19cb5b
fix(yaml): skip environment variable interpolation in YAML comments (…
Copilot Oct 23, 2025
ca6a22a
fix(core): handle null data in WaitFor and support array keyName in K…
quanru Oct 23, 2025
5657c82
perf(android): optimize clearInput performance by batching keyevents …
quanru Oct 23, 2025
13132cd
release: v0.30.6
github-actions[bot] Oct 23, 2025
4761a6c
fix(core): improve Assert task error handling (#1374)
quanru Oct 24, 2025
c4072b8
fix(web-integration): add wait logic to both beforeInvokeAction and a…
quanru Oct 24, 2025
bd29efb
release: v0.30.7
github-actions[bot] Oct 24, 2025
6fbb34a
feat(core): set a delay after invoking interaction (#1393)
yuyutaotao Oct 28, 2025
99c528d
Add midscene-pc projects to Awesome MidScene (#1390)
Copilot Oct 28, 2025
5b061ab
release: v0.30.8
github-actions[bot] Nov 4, 2025
e920bd3
refactor(core): replace @langchain/core with native template literals…
lilac Dec 3, 2025
f353300
release: v0.30.9
github-actions[bot] Dec 4, 2025
b90c10d
perf(core): move JSON.stringify out of loop
lufy90 Dec 5, 2025
3 changes: 3 additions & 0 deletions README.md
@@ -130,7 +130,10 @@ There are so many UI automation tools out there, and each one seems to be all-po
Community projects that extend Midscene.js capabilities:

* [midscene-ios](https://github.com/lhuanyu/midscene-ios) - iOS automation support for Midscene
* [midscene-pc](https://github.com/Mofangbao/midscene-pc) - PC operation device for Windows, macOS, and Linux
* [midscene-pc-docker](https://github.com/Mofangbao/midscene-pc-docker) - Docker container image with MidScene-PC server pre-installed
* [Midscene-Python](https://github.com/Python51888/Midscene-Python) - Python SDK for Midscene automation
* [midscene-java](https://github.com/Master-Frank/midscene-java) - Java SDK that brings Midscene automation features to JVM projects


## 📝 Credits
3 changes: 3 additions & 0 deletions README.zh.md
@@ -132,7 +132,10 @@ for (const record of recordList) {
Community projects built on Midscene.js:

* [midscene-ios](https://github.com/lhuanyu/midscene-ios) - iOS device automation tool
* [midscene-pc](https://github.com/Mofangbao/midscene-pc) - PC operation device for Windows, macOS, and Linux
* [midscene-pc-docker](https://github.com/Mofangbao/midscene-pc-docker) - Docker container image with MidScene-PC server pre-installed
* [Midscene-Python](https://github.com/Python51888/Midscene-Python) - Python version of the Midscene SDK
* [midscene-java](https://github.com/Master-Frank/midscene-java) - Java version of the Midscene SDK, making the automation capabilities easy to use in JVM projects

## 📝 Credits

2 changes: 1 addition & 1 deletion apps/chrome-extension/static/manifest.json
@@ -1,7 +1,7 @@
{
"name": "Midscene.js",
"description": "Open-source SDK for automating web pages using natural language through AI.",
- "version": "0.135",
+ "version": "0.140",
"manifest_version": 3,
"permissions": [
"activeTab",
@@ -7,6 +7,7 @@
padding: 0 12px;
background: #F2F4F7;
border-radius: 8px;
border: 1px solid transparent;
cursor: pointer;
display: flex;
align-items: center;
14 changes: 14 additions & 0 deletions apps/site/docs/en/awesome-midscene.md
@@ -9,11 +9,25 @@ A curated list of community projects that extend Midscene.js capabilities across
- Enables automated testing and interaction with iOS applications
- Extends Midscene's cross-platform capabilities to Apple's mobile ecosystem

### PC automation
- **[midscene-pc](https://github.com/Mofangbao/midscene-pc)** - PC operation device for Windows, macOS, and Linux
- Enables automated testing and interaction with desktop applications across all major platforms
- Supports both local and remote operation capabilities
- **[midscene-pc-docker](https://github.com/Mofangbao/midscene-pc-docker)** - Docker container image with MidScene-PC server pre-installed
- Based on Ubuntu 20 with GNOME desktop for maximum application compatibility
- Includes built-in VNC service for browser-based desktop monitoring
- Deploy automation client directly on standard servers with a single command

### Python SDK
- **[Midscene-Python](https://github.com/Python51888/Midscene-Python)** - Python SDK for Midscene automation
- Brings Midscene's AI-powered automation capabilities to Python developers
- Allows integration with existing Python testing and automation workflows

### Java SDK
- **[midscene-java](https://github.com/Master-Frank/midscene-java)** - Java SDK for Midscene automation
- Offers a JVM-friendly way to script Midscene experiences similar to the Python SDK
- Fits easily into existing Java automation or testing pipelines

## Contributing

Have you created a project that extends Midscene.js? We'd love to feature it here!
54 changes: 54 additions & 0 deletions apps/site/docs/en/changelog.mdx
@@ -2,6 +2,60 @@

> For the complete changelog, please refer to: [Midscene Releases](https://github.com/web-infra-dev/midscene/releases)

## v0.30 - 🎯 Cache management upgrade and mobile experience optimization

### 🎯 More flexible cache strategy

v0.30 improves the cache system, allowing you to control cache behavior based on actual needs:

- **Multiple cache modes available**: Supports read-only, write-only, and read-write strategies. For example, use read-only mode in CI environments to reuse cache, and use write-only mode in local development to update cache
- **Automatic cleanup of unused cache**: Agent can automatically clean up unused cache records when destroyed, preventing cache files from accumulating
- **Simplified unified configuration**: Cache configuration parameters for CLI and Agent are now unified, no need to remember different configurations
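
The three strategies and the cleanup step can be pictured with a small, self-contained model. This is only a sketch: `SketchCache`, `CacheStrategy`, and `pruneUnused` are hypothetical names invented for illustration and are not Midscene's API. Only the strategy names themselves come from the changelog entry above.

```typescript
// Hypothetical model of the cache strategies named in the changelog.
// None of these names are Midscene APIs; they only illustrate the semantics.
type CacheStrategy = 'read-only' | 'write-only' | 'read-write';

class SketchCache {
  private records = new Map<string, string>();
  private used = new Set<string>();

  constructor(
    private readonly strategy: CacheStrategy,
    seed: Record<string, string> = {},
  ) {
    for (const key of Object.keys(seed)) {
      this.records.set(key, seed[key]);
    }
  }

  // Reads are skipped entirely in write-only mode.
  get(key: string): string | undefined {
    if (this.strategy === 'write-only') return undefined;
    const hit = this.records.get(key);
    if (hit !== undefined) this.used.add(key);
    return hit;
  }

  // Writes are ignored in read-only mode, so a CI run cannot alter the cache.
  set(key: string, value: string): void {
    if (this.strategy === 'read-only') return;
    this.records.set(key, value);
    this.used.add(key);
  }

  // Drop records that were neither read nor written during this run.
  pruneUnused(): number {
    let removed = 0;
    for (const key of Array.from(this.records.keys())) {
      if (!this.used.has(key)) {
        this.records.delete(key);
        removed += 1;
      }
    }
    return removed;
  }

  size(): number {
    return this.records.size;
  }
}

const ciCache = new SketchCache('read-only', { 'click login': 'plan-a' });
ciCache.set('click logout', 'plan-b'); // ignored in read-only mode
console.log(ciCache.size()); // 1, because the write was ignored
```

In this model a CI job would construct the cache as `'read-only'`, while a local run that should regenerate entries would use `'write-only'`. Whether Midscene prunes in every mode is not stated in the changelog, so the sketch leaves `pruneUnused` independent of the strategy.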

### 📊 More convenient report management

- **Support for merging multiple reports**: Beyond Playwright scenarios, all scenarios now support merging multiple automation execution reports into a single file for centralized viewing and sharing of test results

### 📱 Mobile automation optimization

#### iOS platform improvements
- **Real device support improvement**: Removed simctl check restriction, making iOS real device automation smoother
- **Auto-adapt device display**: Implemented automatic device pixel ratio detection, ensuring accurate element positioning on different iOS devices

#### Android platform enhancements
- **Flexible screenshot optimization**: Added the `screenshotResizeScale` option, which lets you customize the screenshot size while preserving visual recognition accuracy, reducing network transmission and storage overhead
- **Screen info cache control**: Use `alwaysRefreshScreenInfo` option to control whether to fetch screen information each time, allowing cache reuse in stable environments to improve performance
- **Direct ADB command execution**: AndroidAgent added `runAdbCommand` method for convenient execution of custom device control commands

#### Cross-platform consistency
- **ClearInput support on all platforms**: Solves the problem of AI being unable to accurately plan clear input operations across platforms

### 🔧 Feature enhancements

- **Failure classification**: CLI execution results can now distinguish between "skipped failures" and "actual failures", helping locate issue causes
- **aiInput append mode**: Added `append` option to append input while preserving existing content, suitable for editing scenarios
- **Chrome extension improvements**:
- Popup mode preference saved to localStorage, remembering your choice on next open
- Bridge mode supports auto-connect, reducing manual operations
- Support for GPT-4o and non-visual language models

### 🛡️ Type safety improvements

- **Zod schema validation**: Introduced type checking for action parameters, detecting parameter errors during development to avoid runtime issues
- **Number type support**: Fixed `aiInput` support for number type values, making type handling more robust
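
The combined effect of the `append` option and number-typed values for `aiInput` can be sketched as a single pure function. `resolveInputText` and `InputOptions` are hypothetical names used only for illustration; the changelog does not describe how Midscene implements this internally.

```typescript
// Hypothetical helper, not Midscene's implementation: models how a number
// value and an `append` flag could be resolved into the final field text.
interface InputOptions {
  append?: boolean;
}

function resolveInputText(
  existing: string,
  value: string | number,
  options: InputOptions = {},
): string {
  const text = String(value); // numbers are normalised to their string form
  // With `append`, the current content is preserved; otherwise it is replaced.
  return options.append ? existing + text : text;
}

console.log(resolveInputText('hello', 42)); // "42"
console.log(resolveInputText('hello', ' world', { append: true })); // "hello world"
```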

### 🐞 Bug fixes

- Fixed potential issues caused by Playwright circular dependencies
- Fixed issue where `aiWaitFor` as the first statement could not generate reports
- Improved video recorder delay logic to ensure the last frame is captured
- Optimized report display logic to view both error information and element positioning information simultaneously
- Fixed issue where `cacheable` option in `aiAction` subtasks was not properly passed

### 📚 Community

- Added the [midscene-java](./awesome-midscene.md) community project to the Awesome Midscene section

## v0.29 - 📱 iOS platform support added

### 🚀 iOS platform support added
1 change: 1 addition & 0 deletions apps/site/docs/en/integrate-with-android.mdx
@@ -129,6 +129,7 @@ The AndroidDevice constructor supports the following parameters:
- `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` - Optional, controls when Midscene invokes [yadb](https://github.com/ysbing/YADB) to input text. `'yadb-for-non-ascii'` uses yadb only for text that contains non-ASCII characters, while `'always-yadb'` forces yadb for every input task. Try switching between these strategies if the default configuration fails to input text. (Default: 'yadb-for-non-ascii')
- `displayId?: number` - Optional, the display id to use. (Default: undefined, means use the current display)
- `screenshotResizeScale?: number` - Optional, controls the size of the screenshot Midscene sends to the AI model. Default is `1 / devicePixelRatio`, so a 1200×800 display with a device pixel ratio of 3 sends an image of roughly 400×267 to the model. Adjusting this value manually is not recommended.
- `alwaysRefreshScreenInfo?: boolean` - Optional, whether to re-fetch screen size and orientation information every time. Default is false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
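
The default-scale arithmetic quoted for `screenshotResizeScale` can be checked with a few lines. `scaledScreenshotSize` is a hypothetical helper written for this illustration, and rounding to the nearest pixel is an assumption; the docs above only say "roughly 400×267".

```typescript
// Sketch of the default-scale arithmetic from the parameter description.
// `scaledScreenshotSize` is a hypothetical helper, not a Midscene export.
interface Size {
  width: number;
  height: number;
}

function scaledScreenshotSize(
  display: Size,
  devicePixelRatio: number,
  // Documented default: 1 / devicePixelRatio
  screenshotResizeScale: number = 1 / devicePixelRatio,
): Size {
  return {
    // Rounding to whole pixels is an assumption made for this sketch.
    width: Math.round(display.width * screenshotResizeScale),
    height: Math.round(display.height * screenshotResizeScale),
  };
}

// The example from the docs: a 1200×800 display with a device pixel ratio of 3.
console.log(scaledScreenshotSize({ width: 1200, height: 800 }, 3));
// { width: 400, height: 267 }
```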

### Additional Android Agent Interfaces

2 changes: 1 addition & 1 deletion apps/site/docs/en/mcp-android.mdx
@@ -76,7 +76,7 @@ Midscene MCP provides the following Android device automation tools:
Parameters:
- deviceId: (Optional) Device ID to connect to. If not provided, uses the first available device.
- displayId: (Optional) Display ID for multi-display Android devices (e.g., 0, 1, 2). When specified, all ADB input operations will target this specific display.
- - alwaysFetchScreenInfo: (Optional) Whether to always fetch screen size and orientation from the device on each call. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
+ - alwaysRefreshScreenInfo: (Optional) Whether to always fetch screen size and orientation from the device on each call. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
```
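
The trade-off behind `alwaysRefreshScreenInfo` (cached values are cheaper but can go stale after a rotation) can be modelled in isolation. `ScreenInfoReader` and `fakeDevice` below are hypothetical stand-ins, and the fetch is synchronous only to keep the sketch short; a real ADB query would be asynchronous.

```typescript
// Hypothetical model of the caching behaviour described for
// `alwaysRefreshScreenInfo`; `ScreenInfoReader` is not a Midscene class.
interface ScreenInfo {
  width: number;
  height: number;
  orientation: number;
}

class ScreenInfoReader {
  private cached: ScreenInfo | undefined;
  public fetchCount = 0;

  constructor(
    private readonly fetchFromDevice: () => ScreenInfo,
    private readonly alwaysRefreshScreenInfo = false,
  ) {}

  read(): ScreenInfo {
    // Reuse the cached value unless the caller asked for a refresh every time.
    if (this.cached && !this.alwaysRefreshScreenInfo) return this.cached;
    this.fetchCount += 1;
    this.cached = this.fetchFromDevice();
    return this.cached;
  }
}

// A fake device that reports a different orientation once it is rotated.
let rotated = false;
const fakeDevice = (): ScreenInfo =>
  rotated
    ? { width: 1920, height: 1080, orientation: 1 }
    : { width: 1080, height: 1920, orientation: 0 };

const cachedReader = new ScreenInfoReader(fakeDevice);
const freshReader = new ScreenInfoReader(fakeDevice, true);
cachedReader.read();
freshReader.read();

rotated = true;
console.log(cachedReader.read().orientation); // 0, stale value served from the cache
console.log(freshReader.read().orientation); // 1, re-fetched after the rotation
```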

### App control
16 changes: 15 additions & 1 deletion apps/site/docs/zh/awesome-midscene.md
@@ -9,11 +9,25 @@
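- Enables automated testing and interaction with iOS applications
- Extends Midscene's cross-platform capabilities to Apple's mobile ecosystem

### PC automation
- **[midscene-pc](https://github.com/Mofangbao/midscene-pc)** - PC operation device for Windows, macOS, and Linux
- Enables automated testing and interaction with desktop applications across all major platforms
- Supports both local and remote operation
- **[midscene-pc-docker](https://github.com/Mofangbao/midscene-pc-docker)** - Docker container image with MidScene-PC server pre-installed
- Based on Ubuntu 20 with the GNOME desktop for maximum application compatibility
- Includes a built-in VNC service for monitoring desktop operations through a browser
- Deploys the automation client on a standard server with a single command

### Python SDK
- **[Midscene-Python](https://github.com/Python51888/Midscene-Python)** - Python version of the Midscene SDK
- Brings Midscene's AI-powered automation capabilities to Python developers
- Supports integration with existing Python testing and automation workflows

### Java SDK
- **[midscene-java](https://github.com/Master-Frank/midscene-java)** - Java version of the Midscene SDK
- Offers an experience similar to the Python version, adapted to the JVM ecosystem
- Integrates easily into existing Java automation or testing pipelines

## Contributing

Have you created a project that extends Midscene.js? We'd love to feature it here!
@@ -30,4 +44,4 @@ Awesome Midscene projects should meet:

---

- *Don't see support for your favorite platform or language? Consider creating a community project or contributing to an existing one!*
+ *Don't see support for your favorite platform or language? Consider creating a community project or contributing to an existing one!*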
54 changes: 54 additions & 0 deletions apps/site/docs/zh/changelog.mdx
@@ -2,6 +2,60 @@

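> For the complete changelog, please refer to: [Midscene Releases](https://github.com/web-infra-dev/midscene/releases)

## v0.30 - 🎯 Cache management upgrade and mobile experience optimization

### 🎯 More flexible cache strategy

v0.30 improves the cache system so you can control cache behavior according to your actual needs:

- **Multiple cache modes available**: Supports read-only, write-only, and read-write strategies. For example, use read-only mode in CI environments to reuse the cache, and use write-only mode in local development to update it
- **Automatic cleanup of unused cache**: The Agent can automatically clean up unused cache records when it is destroyed, so cache files do not keep accumulating
- **Simpler, unified configuration**: Cache configuration parameters for the CLI and the Agent are now unified, so there is no need to remember different configuration styles

### 📊 More convenient report management

- **Support for merging multiple reports**: Beyond Playwright scenarios, every scenario can now merge the reports from multiple automation runs into a single file, making it easier to view and share test results in one place

### 📱 Mobile automation optimization

#### iOS platform improvements
- **Improved real device support**: Removed the simctl check restriction, so automation on real iOS devices runs more smoothly
- **Automatic display adaptation**: Device pixel ratio is now detected automatically, ensuring accurate element positioning across different iOS devices

#### Android platform enhancements
- **Flexible screenshot optimization**: Added the `screenshotResizeScale` option, which lets you customize the screenshot size while preserving visual recognition accuracy, reducing network transmission and storage overhead
- **Screen info cache control**: Use the `alwaysRefreshScreenInfo` option to control whether screen information is fetched every time; in stable environments the cache can be reused to improve performance
- **Direct ADB command execution**: AndroidAgent adds a `runAdbCommand` method for conveniently running custom device control commands

#### Cross-platform consistency
- **ClearInput support on all platforms**: Resolves the issue where the AI could not accurately plan clear-input operations on each platform

### 🔧 Feature enhancements

- **Failure classification**: CLI execution results now distinguish "skipped failures" from "actual failures", helping you locate the cause of a problem
- **aiInput append mode**: Added the `append` option to append input while preserving existing content, which suits editing scenarios
- **Chrome extension improvements**:
- Popup mode preference is saved to localStorage and remembered the next time you open the extension
- Bridge mode supports auto-connect, reducing manual operations
- Support for GPT-4o and non-visual language models

### 🛡️ Type safety improvements

- **Zod schema validation**: Introduced type checking for action parameters, catching parameter errors during development to avoid runtime issues
- **Number type support**: Fixed `aiInput` support for number-typed values, making type handling more robust

### 🐞 Bug fixes

- Fixed potential issues caused by Playwright circular dependencies
- Fixed an issue where no report was generated when `aiWaitFor` was the first statement
- Improved the video recorder delay logic to ensure the last frame is captured
- Optimized the report display logic so that error information and element positioning information can be viewed at the same time
- Fixed an issue where the `cacheable` option was not passed correctly to `aiAction` subtasks

### 📚 Community

- Added the [midscene-java](./awesome-midscene.md) community project to the Awesome Midscene section

## v0.29 - 📱 iOS platform support added

### 🚀 iOS platform support added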
1 change: 1 addition & 0 deletions apps/site/docs/zh/integrate-with-android.mdx
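@@ -128,6 +128,7 @@ The AndroidDevice constructor supports the following parameters:
- `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` - Optional, controls when Midscene invokes [yadb](https://github.com/ysbing/YADB) to input text. `'yadb-for-non-ascii'` enables yadb only when the input contains non-ASCII text, while `'always-yadb'` uses yadb for every input task. If the default configuration fails to input text correctly, try switching between these two strategies. Default is 'yadb-for-non-ascii'.
- `displayId?: number` - Optional, the display ID to use. Default is undefined, which means the current display is used.
- `screenshotResizeScale?: number` - Optional, controls the size of the screenshot sent to the AI model. Default is `1 / devicePixelRatio`, so for a 1200×800 screen with a device pixel ratio (DPR) of 3, the image sent to the model is roughly 400×267. Changing this parameter manually is not recommended.
- `alwaysRefreshScreenInfo?: boolean` - Optional, whether to re-fetch screen size and orientation information every time. Default is false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.

### Additional Android Agent Interfaces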

2 changes: 1 addition & 1 deletion apps/site/docs/zh/mcp-android.mdx
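@@ -76,7 +76,7 @@ Midscene MCP provides the following Android device automation tools:
Parameters:
- deviceId: (Optional) Device ID to connect to. If not provided, the first available device is used.
- displayId: (Optional) Display ID for multi-display Android devices (e.g., 0, 1, 2). When specified, all ADB input operations target this specific display.
- - alwaysFetchScreenInfo: (Optional) Whether to re-fetch screen size and orientation information every time. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
+ - alwaysRefreshScreenInfo: (Optional) Whether to re-fetch screen size and orientation information every time. Defaults to false (uses cache for better performance). Set to true if the device may rotate or you need real-time screen information.
```

### App control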
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "midscene",
"private": true,
- "version": "0.30.4",
+ "version": "0.30.9",
"scripts": {
"dev": "nx run-many --target=build:watch --exclude=android-playground,chrome-extension,@midscene/report,doc --verbose --parallel=6",
"build": "nx run-many --target=build --exclude=doc --verbose",
2 changes: 1 addition & 1 deletion packages/android-playground/package.json
@@ -1,6 +1,6 @@
{
"name": "@midscene/android-playground",
- "version": "0.30.4",
+ "version": "0.30.9",
"description": "Android playground for Midscene",
"main": "./dist/lib/index.js",
"types": "./dist/types/index.d.ts",
14 changes: 10 additions & 4 deletions packages/android-playground/src/bin.ts
@@ -119,11 +119,17 @@ const main = async () => {
const selectedDeviceId = await selectDevice();
console.log(`✅ Selected device: ${selectedDeviceId}`);

- // Create device and agent instances with selected device
- const device = new AndroidDevice(selectedDeviceId);
- const agent = new AndroidAgent(device);
+ // Create PlaygroundServer with agent factory
+ const playgroundServer = new PlaygroundServer(
+ // Agent factory - creates new agent with device each time
+ async () => {
+ const device = new AndroidDevice(selectedDeviceId);
+ await device.connect();
+ return new AndroidAgent(device);
+ },
+ staticDir,
+ );

- const playgroundServer = new PlaygroundServer(device, agent, staticDir);
const scrcpyServer = new ScrcpyServer();

// Set the selected device in scrcpy server
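
The switch from passing a ready-made device and agent to passing an agent factory can be illustrated without the real classes. `FakeDevice`, `FakeAgent`, `SketchServer`, and the device id `'emulator-5554'` are hypothetical stand-ins; only the shape of the factory (construct a device, connect it, wrap it in an agent) mirrors the hunk above. The motivation noted in the comments (a fresh agent per task) is an interpretation of the diff, not something the PR states.

```typescript
// Hypothetical stand-ins for AndroidDevice / AndroidAgent / PlaygroundServer;
// only the factory shape mirrors the hunk above.
class FakeDevice {
  connected = false;
  constructor(readonly deviceId: string) {}
  async connect(): Promise<void> {
    this.connected = true;
  }
}

class FakeAgent {
  constructor(readonly device: FakeDevice) {}
}

type AgentFactory = () => Promise<FakeAgent>;

class SketchServer {
  constructor(private readonly createAgent: AgentFactory) {}

  // Each task asks the factory for a new agent instead of sharing one
  // long-lived instance (an assumed motivation for the change).
  async runTask(): Promise<FakeAgent> {
    return this.createAgent();
  }
}

let created = 0;
const server = new SketchServer(async () => {
  created += 1; // runs synchronously, before the first await
  const device = new FakeDevice('emulator-5554'); // hypothetical device id
  await device.connect();
  return new FakeAgent(device);
});

Promise.all([server.runTask(), server.runTask()]).then(([first, second]) => {
  // Two tasks produce two distinct agents, each with a connected device.
  console.log(first !== second, first.device.connected, second.device.connected);
});
```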
2 changes: 1 addition & 1 deletion packages/android/package.json
@@ -1,6 +1,6 @@
{
"name": "@midscene/android",
- "version": "0.30.4",
+ "version": "0.30.9",
"description": "Android automation library for Midscene",
"keywords": [
"Android UI automation",
2 changes: 1 addition & 1 deletion packages/android/src/agent.ts
@@ -45,7 +45,7 @@ export async function agentFromAdbDevice(
usePhysicalDisplayIdForDisplayLookup:
opts?.usePhysicalDisplayIdForDisplayLookup,
screenshotResizeScale: opts?.screenshotResizeScale,
- alwaysFetchScreenInfo: opts?.alwaysFetchScreenInfo,
+ alwaysRefreshScreenInfo: opts?.alwaysRefreshScreenInfo,
});

await device.connect();