Skip to content

Conversation

ncarbon
Copy link
Collaborator

@ncarbon ncarbon commented Sep 23, 2025

Description

This pull request introduces more validation and rendering improvements for handling faker method arguments in the mock data generator modal. The main changes include adding comprehensive validation logic for faker arguments and method names, updating UI components to display faker arguments, and refactoring validation logic out of the module to a shared utility.

Screenshot 2025-09-25 at 10 55 11 AM Screenshot 2025-09-25 at 1 55 11 PM Screenshot 2025-09-25 at 1 03 48 PM

Validation and Utility Enhancements:

  • Added areFakerArgsValid and improved isValidFakerMethod utility functions in utils.ts to thoroughly validate faker arguments (including nested and object types) and method names, enforcing limits on argument length, string size, and value types.
  • Refactored the schema validation logic in collection-tab.ts to use the new isValidFakerMethod utility, ensuring faker method and argument validation is consistent and centralized. [1] [2] [3]

Type and Interface Updates:

  • Extended the FakerArg type to support nested arrays, enabling more flexible representation of faker arguments.

UI and Rendering Improvements:

  • Updated FakerMappingSelector to render TextInput components for each faker argument, supporting various types (string, number, boolean, arrays, and objects) and displaying them in a read-only format. [1] [2] [3]
  • Updated FakerSchemaEditorContent to pass the active faker arguments to the mapping selector for display. [1] [2]

Testing:

  • Added comprehensive unit tests for areFakerArgsValid and isValidFakerMethod in utils.spec.ts, covering edge cases such as deeply nested structures, invalid argument types, and method existence.
  • Added an integration test to ensure that excessively large faker arguments are not displayed in the modal.

Checklist

  • New tests and/or benchmarks are included
  • Documentation is changed or added
  • If this change updates the UI, screenshots/videos are added and a design review is requested
  • I have signed the MongoDB Contributor License Agreement (https://www.mongodb.com/legal/contributor-agreement)

Motivation and Context

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Open Questions

Dependents

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

@ncarbon ncarbon added the no release notes Fix or feature not for release notes label Sep 23, 2025
@github-actions github-actions bot added the feat label Sep 23, 2025
@ncarbon ncarbon marked this pull request as ready for review September 23, 2025 17:15
@ncarbon ncarbon requested a review from a team as a code owner September 23, 2025 17:15
@ncarbon ncarbon requested review from mabaasit and Copilot September 23, 2025 17:15
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the mock data generator modal with comprehensive validation and rendering improvements for faker method arguments. The changes focus on improving argument validation, centralizing validation logic, and enabling UI display of faker arguments.

  • Adds comprehensive validation for faker arguments including nested structures, type checking, and size limits
  • Refactors validation logic from collection-tab.ts to a shared utility module for better maintainability
  • Updates UI components to display faker arguments as read-only text inputs with proper type handling

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
collection-tab.ts Refactors validation to use centralized utility function
utils.ts Introduces comprehensive faker argument and method validation utilities
utils.spec.ts Adds extensive unit tests for validation functions
script-generation-utils.ts Extends FakerArg type to support nested arrays
mock-data-generator-modal.spec.tsx Adds integration test for oversized argument handling
faker-schema-editor-screen.tsx Passes faker arguments to mapping selector
faker-mapping-selector.tsx Implements UI rendering for faker arguments

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

Assigned gagik for team compass-developers because mabaasit is out of office.

expect(result.fakerArgs).to.deep.equal([]);
});

it('returns false for valid method with invalid arguments and fallback fails', () => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be invalid arguments with valid method falls back to true and strips args? That's what's happening I think, which is why this test is failing right now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, forgot to update it.

expect(result.fakerArgs).to.deep.equal([]);
});

it('returns false for method with invalid fakerArgs', () => {
Copy link
Collaborator

@jcobis jcobis Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment applies here too I believe. Both tests should expect isValid: true and fakerArgs: []

} else if (typeof arg === 'boolean') {
// booleans are always valid, continue
} else if (Array.isArray(arg)) {
if (!areFakerArgsValid(arg)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we limit the depth of this to prevent potential infinite recursion? We could pass a depth param and limit to say, 3 levels

callable();
return { isValid: true, fakerArgs: [] };
}
} catch {
Copy link
Collaborator

@kpamaran kpamaran Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A warning when there's an exception or the args are invalid would be valuable for fine tuning the prompt

@kpamaran
Copy link
Collaborator

The faker arg form experience itself is interesting, I wonder how well users can intuit that the args are positional. Could be a follow-up. Maybe the user can be shown an inline preview of the faker call (cc @jcobis), right now they'd need to make the connection by reading the Script output

@ncarbon
Copy link
Collaborator Author

ncarbon commented Sep 23, 2025

The faker arg form experience itself is interesting, I wonder how well users can intuit that the args are positional. Could be a follow-up. Maybe the user can be shown an inline preview of the faker call (cc @jcobis), right now they'd need to make the connection by reading the Script output

Yeah, good question. If we can't show a specific label for each faker argument, an inline preview of the faker call would be good UI to show.

@jcobis
Copy link
Collaborator

jcobis commented Sep 24, 2025

The faker arg form experience itself is interesting, I wonder how well users can intuit that the args are positional. Could be a follow-up. Maybe the user can be shown an inline preview of the faker call (cc @jcobis), right now they'd need to make the connection by reading the Script output

Yeah, good question. If we can't show a specific label for each faker argument, an inline preview of the faker call would be good UI to show.

What if: if there are args, we just display the whole sample function call with args in a readonly text input? This is what I had in mind. Thoughts? @ncarbon @kpamaran

@ncarbon ncarbon changed the title feat(compass-collection): CLOUDP-347048 LLM Output Validation Followup - Mock Data Generator feat(compass-collection): Output Validation Followup - Mock Data Generator CLOUDP-347048 LLM Sep 25, 2025
@kpamaran
Copy link
Collaborator

What if: if there are args, we just display the whole sample function call with args in a readonly text input? This is what I had in mind. Thoughts?

I have confidence the sample call should be displayed. I'm not sure if it should be in an input, because the preview is not part of the form data (non-semantic usage). It could be some inline text above the field set. @jcobis @ncarbon

const MAX_FAKER_ARGS_LENGTH = 10;
const MAX_FAKER_STRING_LENGTH = 1000;
const MAX_FAKER_ARGS_DEPTH = 3;
const MAX_FAKER_NUMBER_SIZE = 1000;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on these values? @jcobis

Copy link
Collaborator

@jcobis jcobis Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we could bump MAX_FAKER_NUMBER_SIZE to 10000 at least

@ncarbon ncarbon changed the title feat(compass-collection): Output Validation Followup - Mock Data Generator CLOUDP-347048 LLM feat(compass-collection): Output Validation Followup - Mock Data Generator LLM CLOUDP-347048 Sep 25, 2025
@ncarbon ncarbon requested a review from kpamaran September 25, 2025 20:57
fontWeight: 600,
});

const parseFakerArg = (arg: FakerArg): string => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there's a parseFakerArgs with a different behavior. One should be renamed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find another function you mentioned, but the name still doesn't make too much sense: it seems to be the opposite of parse, takes the arguments and serializes them to string

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming to stringifyFakerArg

@ncarbon ncarbon changed the title feat(compass-collection): Output Validation Followup - Mock Data Generator LLM CLOUDP-347048 feat(compass-collection): Output Validation Followup - Mock Data Generator LLM CLOUDP-347048 Sep 26, 2025
@ncarbon ncarbon requested a review from gribnoysup September 29, 2025 16:17
fontWeight: 600,
});

const parseFakerArg = (arg: FakerArg): string => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find another function you mentioned, but the name still doesn't make too much sense: it seems to be the opposite of parse, takes the arguments and serializes them to string

Comment on lines 31 to 38
if (typeof arg === 'object' && arg !== null && 'json' in arg) {
try {
return JSON.stringify(JSON.parse(arg.json));
} catch {
return '';
}
}
return arg.toString();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the type definition, doesn't look like the array case is being handled well at all. If arg is FakerArg[] (which is part of the FakerArg type, it recursively references itself), you will just call prototype toString on it. Is this a mistake in how types are defined or in the method logic?

Copy link
Collaborator

@jcobis jcobis Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, going to handle arrays recursively

Comment on lines 121 to 127
function isAllowedHelper(moduleName: string, methodName: string) {
return (
moduleName !== 'helpers' ||
methodName === 'arrayElement' ||
methodName === 'arrayElements'
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to understand the intent here reading this expression, also function name doesn't match what's going on here, we're not validating only the helpers, but all method names. I would suggest to restructure it to be more explicit about what we're actually checking:

Suggested change
function isAllowedHelper(moduleName: string, methodName: string) {
return (
moduleName !== 'helpers' ||
methodName === 'arrayElement' ||
methodName === 'arrayElements'
);
}
function isAllowedFakerFn(moduleName: string, methodName: string) {
if (moduleName !== 'helpers') {
return true // all non-helper modules are allowed unconditionally
}
return methodName === 'arrayElement' || methodName === 'arrayElements' // only array helpers are allowed
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored for readability

if (typeof arg === 'object' && arg !== null && 'json' in arg) {
return JSON.parse((arg as { json: string }).json);
}
return arg;
Copy link
Collaborator

@gribnoysup gribnoysup Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakerArg is recursively an array of the same type and this is not handled. Is this type actually defined correctly? There is a lot of code here now that uses FakerArg[], but FakerArg type already includes an array, so either marking it as an array is redundant, or the type is wrong and it's always supporsed to be a flat array of arguments

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recursive type definition is to support nested arrays.

import type { FakerArg } from './script-generation-utils';
import { faker } from '@faker-js/faker/locale/en';

const MAX_FAKER_ARGS_LENGTH = 10;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the docs I can't see a case where it's ever more than 1 argument actually, any reason not to limit this more aggresively?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some examples of 2 arguments, such as fromCharacters and arrayElements, but none more than that. Will reduce this to 2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see the issue, this is just an imprecise variable name. Should be MAX_ARRAY_LENGTH, will rename. I can also add validation to ensure top-level args are limited to 2 at most.

Comment on lines 51 to 54
typeof (arg as { json?: unknown }).json === 'string'
) {
try {
const parsedJson = JSON.parse((arg as { json: string }).json);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use an in check here which will also make it so that typescript know that property exists, so that you don't need to do extra assertions later:

Suggested change
typeof (arg as { json?: unknown }).json === 'string'
) {
try {
const parsedJson = JSON.parse((arg as { json: string }).json);
'json' in arg && typeof arg.json === 'string'
) {
try {
const parsedJson = JSON.parse(arg.json);

function prepareFakerArgs(args: FakerArg[]) {
return args.map((arg) => {
if (typeof arg === 'object' && arg !== null && 'json' in arg) {
return JSON.parse((arg as { json: string }).json);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant assertion

Suggested change
return JSON.parse((arg as { json: string }).json);
return JSON.parse(arg.json);

@jcobis jcobis requested a review from gribnoysup October 6, 2025 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat no release notes Fix or feature not for release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants