tools/agentic_mcp_evaluation/mcp_evaluation_tasks.xml
Lines changed: 1 addition & 28 deletions
@@ -15,44 +15,17 @@
 <task>
 <prompt>Which neighborhoods have the highest share of users seeking a Long-term relationship? Rank by percentage.</prompt>
 </task>
-<task>
-<prompt>Show the top 10 hobbies among users aged 25–35 in Manhattan, with counts.</prompt>
-</task>
 <task>
 <prompt>Break down gender distribution by 3-year age buckets (18–20, 21–23, …) across all profiles.</prompt>
 </task>
 <task>
 <prompt>Among profiles that mention pets, which locations have the highest concentration of pet owners?</prompt>
 </task>
 <task>
-<prompt>List all conversations flagged as bad actors and summarize counts by primary_concern and behavior_severity.</prompt>
-</task>
-<task>
-<prompt>For moderation reports where escalation_observed = true, what are the most common primary_concern values and recommended_action outcomes?</prompt>
-</task>
-<!-- <task>
 <prompt>Identify repeat offenders: which user IDs appear most often as the primary_bad_actor? Return top 10 with counts and common concerns.</prompt>
 </task>
-<task>
-<prompt>What recommended_action is most frequently applied per risk category (e.g., harassment_risk, scam_fraud_risk)? Show a table mapping risk → top action with counts.</prompt>
-</task>
-<task>
-<prompt>Find profiles whose ideal_partner description includes the word "cat" and who are located in Manhattan. Return profile_id, full_name, location, pets.</prompt>
-</task>
-<task>
-<prompt>Compare the distribution of "looking_for" categories across Manhattan, Brooklyn, and Queens. Show counts by borough.</prompt>
-</task>
-<task>
-<prompt>From the moderation_report, list the top 5 conversations with harassment_risk = true ordered by behavior_severity, including conversation_summary (limit length) and recommended_action.</prompt>
-</task>
-<task>
-<prompt>Compute the proportion of moderation reports with any risk flag set (at least one risk=true) that also have conversation_flagged_as_bad_actor = true.</prompt>
-</task>
-<task>
-<prompt>For users aged 26–32 looking for a Long-term relationship, which occupations are most common? Return top 10 with counts.</prompt>
-</task>
 <task>
 <prompt>Are there patterns between behavior_severity and recommended_action? Provide counts for each (behavior_severity, recommended_action) pair.</prompt>