Visualizations for the queries on deleted pages, automated edits, and reverted, rolled-back, or undone edits in Wikimedia, across multiple Wikipedia language editions (tewiki, hiwiki, mlwiki) #8
Number_of_Automated_Edits.sql
Outdated
JOIN user_groups
ON actor.actor_user = user_groups.ug_user -- Join user_groups and actor tables
WHERE user_groups.ug_group = 'bot' -- Filter for bot user group
AND revision.rev_timestamp BETWEEN '20230101' AND '20240301'; -- Filter by specific date range
Don't filter on the date in the WHERE clause; with a hard-coded range you won't be able to filter by date on the dashboard.
Add the date to the SELECT statement instead. Also, count distinct revision IDs.
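A minimal sketch of what this could look like, run against a toy in-memory SQLite database so it can be tried locally. The sample rows are made up, the `rev_actor = actor_id` join is assumed from the standard MediaWiki schema, and `substr(..., 1, 8)` stands in for the `LEFT(..., 8)` used elsewhere in this PR because SQLite has no `LEFT()`.

```python
import sqlite3

# Toy schema holding only the columns the query touches (illustrative data).
bot_conn = sqlite3.connect(":memory:")
bot_conn.executescript("""
CREATE TABLE revision (rev_id INTEGER, rev_actor INTEGER, rev_timestamp TEXT);
CREATE TABLE actor (actor_id INTEGER, actor_user INTEGER);
CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);

INSERT INTO actor VALUES (1, 10), (2, 20);
INSERT INTO user_groups VALUES (10, 'bot'), (20, 'sysop');
INSERT INTO revision VALUES
    (100, 1, '20240101120000'),
    (101, 1, '20240101130000'),
    (102, 2, '20240101140000'),
    (103, 1, '20240102090000');
""")

# The date is a SELECT column and the grouping key, so the dashboard can
# apply its own date filter. No date range is hard-coded in WHERE.
AUTOMATED_EDITS_SQL = """
SELECT substr(revision.rev_timestamp, 1, 8) AS edit_date,
       COUNT(DISTINCT revision.rev_id)     AS automated_edits
FROM revision
JOIN actor       ON revision.rev_actor = actor.actor_id
JOIN user_groups ON actor.actor_user = user_groups.ug_user
WHERE user_groups.ug_group = 'bot'
GROUP BY edit_date
ORDER BY edit_date;
"""

automated_rows = bot_conn.execute(AUTOMATED_EDITS_SQL).fetchall()
for edit_date, automated_edits in automated_rows:
    print(edit_date, automated_edits)
```

On the wiki replicas the same shape should work with `LEFT(rev_timestamp, 8)` in place of `substr`.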
SELECT
(SELECT COUNT(*)
FROM revision -- For edits we use revision table
WHERE LEFT(rev_timestamp, 8) BETWEEN '20240101' AND '20240301') AS total_edits, -- Filtered between specific dates
We only need deleted pages, not revisions. This can be removed.
(SELECT COUNT(*)
FROM archive -- For deleted pages we use archive table
WHERE LEFT(ar_timestamp, 8) BETWEEN '20240101' AND '20240301') AS deleted_pages; -- Filtered between specific dates
Similar to above: don't filter on the date in the WHERE clause. Add the date to the SELECT statement instead.
SELECT
(SELECT COUNT(*)
FROM archive -- For deleted edits we use archive table
WHERE LEFT(ar_timestamp, 8) BETWEEN '20230301' AND '20231212') AS deleted_edits, -- Filtering between specific dates
Deleted edits are not required, only deleted pages; this can be removed.
(SELECT COUNT(*)
FROM revision r -- For reverted or rollback edits we use comment table
JOIN comment c ON r.rev_comment_id = c.comment_id
WHERE LEFT(r.rev_timestamp, 8) BETWEEN '20230301' AND '20231212' -- Filtering between specific dates
Remove the timestamp filter from the WHERE clause and add the timestamp to the SELECT statement as a date.
AND (c.comment_text LIKE '%revert%'
OR c.comment_text LIKE '%rollback%'
OR c.comment_text LIKE '%undid%')) AS reverted_edits; -- Comparing strings i.e revert or rollback with column comment_text
This is not a good way to get reverted edits: matching on free-text edit summaries is unreliable. You have to join the change_tag and change_tag_def tables and check whether the tag name (ctd_name) is mw-reverted, mw-rollback or mw-undo.
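A rough sketch of the tag-based approach, again on a toy in-memory SQLite database with made-up rows. The column names (`ct_rev_id`, `ct_tag_id`, `ctd_id`, `ctd_name`) are taken from the standard MediaWiki change-tag schema and should be checked against the replica you query; `substr` replaces `LEFT` only because SQLite lacks it.

```python
import sqlite3

# Toy schema with only the columns needed for the join (illustrative data).
tag_conn = sqlite3.connect(":memory:")
tag_conn.executescript("""
CREATE TABLE revision (rev_id INTEGER, rev_timestamp TEXT);
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER);
CREATE TABLE change_tag_def (ctd_id INTEGER, ctd_name TEXT);

INSERT INTO change_tag_def VALUES
    (1, 'mw-reverted'), (2, 'mw-rollback'), (3, 'mw-undo'), (4, 'mobile edit');
INSERT INTO revision VALUES
    (200, '20240105080000'),
    (201, '20240105090000'),
    (202, '20240106100000'),
    (203, '20240106110000');
-- Revision 202 carries two revert-related tags; 203 carries none.
INSERT INTO change_tag VALUES
    (200, 1), (201, 2), (201, 4), (202, 3), (202, 1), (203, 4);
""")

# COUNT(DISTINCT ...) keeps a revision with several matching tags from
# being counted more than once. The date is selected, not filtered.
REVERTED_EDITS_SQL = """
SELECT substr(r.rev_timestamp, 1, 8) AS edit_date,
       COUNT(DISTINCT r.rev_id)      AS reverted_edits
FROM revision r
JOIN change_tag ct      ON ct.ct_rev_id = r.rev_id
JOIN change_tag_def ctd ON ctd.ctd_id = ct.ct_tag_id
WHERE ctd.ctd_name IN ('mw-reverted', 'mw-rollback', 'mw-undo')
GROUP BY edit_date
ORDER BY edit_date;
"""

reverted_rows = tag_conn.execute(REVERTED_EDITS_SQL).fetchall()
for edit_date, reverted_edits in reverted_rows:
    print(edit_date, reverted_edits)
```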
This graph looks very cluttered. For continuous data such as counts over time, line plots are cleaner than bar plots. With a line plot, this graph would have just 3 lines, each in its own color, and it would show the ups and downs over time without the clutter, which makes the insights easier to read.
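One way the line-plot version might be put together, as a sketch only: the helper names `to_series` and `plot_lines` are hypothetical, the `(wiki, date, count)` row layout is an assumption about how the query results are shaped, and the numbers are invented. matplotlib is imported inside `plot_lines`, so the reshaping step runs without it.

```python
from collections import defaultdict


def to_series(rows):
    """Group (wiki, date, count) rows into one date-sorted series per wiki."""
    grouped = defaultdict(list)
    for wiki, date, count in rows:
        grouped[wiki].append((date, count))
    return {
        wiki: ([d for d, _ in sorted(points)], [c for _, c in sorted(points)])
        for wiki, points in grouped.items()
    }


def plot_lines(series, ylabel="Edits"):
    """Draw one line per wiki on shared axes and return the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots(figsize=(10, 4))
    for wiki, (dates, counts) in sorted(series.items()):
        ax.plot(dates, counts, marker="o", label=wiki)
    ax.set_xlabel("Date")
    ax.set_ylabel(ylabel)
    ax.legend(title="Wiki")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig


# Made-up sample rows, deliberately out of date order.
sample_rows = [
    ("tewiki", "2024-01-02", 5), ("tewiki", "2024-01-01", 3),
    ("hiwiki", "2024-01-01", 7), ("hiwiki", "2024-01-02", 4),
    ("mlwiki", "2024-01-01", 2), ("mlwiki", "2024-01-02", 6),
]
series = to_series(sample_rows)
```

Calling `plot_lines(series)` in the notebook would then give three lines, one per wiki, instead of grouped bars.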
Line #22. if all_dfs:
Box plots are not the ideal way of representing this. Histograms, bar plots (including count plots), and line plots are effective for visualizing data where "number of counts" is a key metric. Histograms are ideal for displaying the distribution of numerical data, while bar plots, especially count plots, are suitable for showing the frequency of categorical variables.
Please review my updated Visualizations notebook in ReviewNB.
Metric 1 : Number of automated edits
Metric 2 : Number of deleted pages and edits
Metric 3 : Number of edits deleted, reverted or rolled back