Commit 3acc760

[#49] [Feature] Use of GitLeaks for GitLab projects (#56)
* refactor: GitHub - preconditions
  Signed-off-by: Pierre-Yves Lapersonne <[email protected]>
* feat: #49 - look for leaks with GitLeaks in GitLab projects
  Signed-off-by: Pierre-Yves Lapersonne <[email protected]>
1 parent 2d20ed6 commit 3acc760

6 files changed

+267
-10
lines changed

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -5,6 +5,7 @@
55
### Features
66

77
- [#32](https://github.com/Orange-OpenSource/floss-toolbox/issues/32) GitLab Auto Backup
8+
- [#49](https://github.com/Orange-OpenSource/floss-toolbox/issues/49) Look for leaks (GitLab)
89

910
### Bugs
1011

@@ -14,7 +15,7 @@
1415

1516
### Features
1617

17-
- [#44](https://github.com/Orange-OpenSource/floss-toolbox/issues/44) Look for leaks
18+
- [#44](https://github.com/Orange-OpenSource/floss-toolbox/issues/44) Look for leaks (GitHub)
1819
- [#29](https://github.com/Orange-OpenSource/floss-toolbox/issues/29) Dry run
1920

2021
### Refactoring

README.md

Lines changed: 28 additions & 1 deletion
@@ -427,7 +427,7 @@ brew install gitleaks
427427

428428
You need to define in the _configuration.rb_ files the Github organisation at **GITHUB_ORGANIZATION_NAME** and also your GitHub personal token at ** GITHUB_PERSONAL_ACCESS_TOKEN**.
429429

430-
**You should also have your _git_ environment ready i.e. add your SSH private key if you clone by SSH for example. _gh_ must be installed, and _python3_ be ready. Obvisously _gitleaks_ must be installed**
430+
**You should also have your _git_ environment ready i.e. add your SSH private key if you clone by SSH for example. _gh_ must be installed, and _python3_ be ready. Obviously _gitleaks_ must be installed**
431431

432432
# Play with GitLab web API
433433

@@ -464,3 +464,30 @@ You need to define in the _configuration.rb_ files the GitLab organisation ID at
464464
You have to also define the location to store clones at **REPOSITORIES_CLONE_LOCATION_PATH** and the access token at **GILAB_PERSONAL_ACCESS_TOKEN**.
465465

466466
**You should also have your _git_ environment ready, i.e. add your SSH private key if you clone by SSH for example.**
467+
468+
### Check if there are leaks in organisation repositories (using gitleaks)
469+
470+
_Keywords: #organisation #GitLab #repositories #leaks #gitleaks_
471+
472+
**Warning: This operation can take a long time because both Git histories and file trees are parsed**
473+
474+
This feature checks all repositories of the GitLab organisation for leaks using the _gitleaks_ tool.
475+
476+
Run the following command:
477+
```shell
478+
bash GitLabWizard.sh look-for-leaks
479+
```
480+
481+
This script needs a GitLab personal access token to make requests to the GitLab API, and also the GitLab group ID to get the projects under it.
482+
The wizard Shell script picks configuration details from the Ruby configuration file and triggers another Shell script to process the data. A Python script is also called to process the JSON sent by the GitLab API.
483+
484+
The [gitleaks](https://github.com/zricethezav/gitleaks) tool will be used to look inside the repository. To install it:
485+
486+
```shell
487+
brew install gitleaks
488+
```
489+
490+
You need to define in the _configuration.rb_ files the GitLab organisation ID at **GITLAB_ORGANIZATION_ID**.
491+
You have to also define the location to store clones at **REPOSITORIES_CLONE_LOCATION_PATH** and the access token at **GILAB_PERSONAL_ACCESS_TOKEN**.
492+
493+
**You should also have your _git_ environment ready i.e. add your SSH private key if you clone by SSH for example. _gh_ must be installed, and _python3_ be ready. Obviously _gitleaks_ must be installed**

toolbox/dry-run.sh

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -144,7 +144,8 @@ echo -e "\nCheck files..."
144144
CheckIfFileExists "gitlab/configuration.rb"
145145
CheckIfFileExists "gitlab/GitLabWizard.sh"
146146
CheckIfFileExists "gitlab/utils/dump-git-repositories-from-gitlab.sh"
147-
CheckIfFileExists "github/utils/extract-repos-field-from-json.py" # Stored in github folder but used by ump-git-repositories-from-gitlab.sh
147+
CheckIfFileExists "github/utils/extract-repos-field-from-json.py" # Stored in github folder but used by dump-git-repositories-from-gitlab.sh
148+
CheckIfFileExists "github/utils/count-leaks-nodes.py" # Stored in github folder but used by check-leaks-from-gitlab.sh
148149

149150
# Runtimes and tools
150151
# ------------------

toolbox/github/utils/check-leaks-from-github.sh

Lines changed: 3 additions & 3 deletions
@@ -75,14 +75,14 @@ if [ -z "$organisation_name" -o "$organisation_name" == "" ]; then
7575
fi
7676

7777
cloning_url_key=$2
78-
if [ -z "$cloning_url_key" -o "$organisation_name" == "" ]; then
78+
if [ -z "$cloning_url_key" -o "$cloning_url_key" == "" ]; then
7979
echo "ERROR: No JSON key for URL. Exits now."
8080
UsageAndExit
8181
exit $EXIT_BAD_ARGUMENTS
8282
fi
8383

8484
dump_folder_name=$3
85-
if [ -z "$dump_folder_name" -o "$organisation_name" == "" ]; then
85+
if [ -z "$dump_folder_name" -o "$dump_folder_name" == "" ]; then
8686
echo "ERROR: No dump folder name defined. Exits now."
8787
UsageAndExit
8888
exit $EXIT_BAD_ARGUMENTS
@@ -195,7 +195,7 @@ while read url_line; do
195195

196196
done < "$dir_before_dump/$url_for_cloning"
197197

198-
echo "Looking done!"
198+
echo "Scanning done!"
199199

200200
# Step 6 - Clean up
201201
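
The two corrected checks above show how easily a copy-pasted condition ends up testing the wrong variable. A minimal sketch of one way to reduce that risk, assuming a hypothetical `require_argument` helper that is not part of this commit: each check then names its variable exactly once. Note also that `[ -z "$value" ]` alone already covers the empty-string case, so the extra `-o "$value" == ""` clause is redundant.

```shell
#!/bin/bash
EXIT_BAD_ARGUMENTS=1

# Hypothetical helper (not defined in the commit): fails when a required
# argument is empty or unset.
# $1: human-readable name of the argument, $2: its value.
require_argument() {
    if [ -z "$2" ]; then
        echo "ERROR: No $1 defined. Exits now." >&2
        return "$EXIT_BAD_ARGUMENTS"
    fi
}
```

A call site could then read `require_argument "dump folder name" "$dump_folder_name" || exit $EXIT_BAD_ARGUMENTS`, keeping the script's existing exit code.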

toolbox/gitlab/GitLabWizard.sh

Lines changed: 15 additions & 4 deletions
@@ -18,6 +18,7 @@ VERSION="1.0.0"
1818

1919
RUBY_CONFIGURATION_FILE="./configuration.rb"
2020
SHELL_REPOSITORIES_DUMPER="./utils/dump-git-repositories-from-gitlab.sh"
21+
SHELL_REPOSITORIES_LEAKS_SCANNER="./utils/check-leaks-from-gitlab.sh"
2122

2223
# Exit codes
2324
# ----------
@@ -37,6 +38,7 @@ UsageAndExit(){
3738
echo "bash GitLabWizard.sh feature-to-launch"
3839
echo "with feature-to-launch:"
3940
echo -e "\t backup-all-repositories-from-org...............: Dump all repositories in GitHub to a specific location in the disk"
41+
echo -e "\t look-for-leaks.................................: Checks with gitleaks if there are leaks in all repositories"
4042
echo "About exit codes:"
4143
echo -e "\t 0................: Normal exit"
4244
echo -e "\t 1................: Bad arguments given to the script"
@@ -68,7 +70,7 @@ if [ -z "$feature_to_run" ]; then
6870
exit $EXIT_NO_FEATURE
6971
fi
7072

71-
if [ $feature_to_run != "backup-all-repositories-from-org" ]; then
73+
if [ $feature_to_run != "backup-all-repositories-from-org" -a $feature_to_run != "look-for-leaks" ]; then
7274
echo "ERROR: '$feature_to_run' is unknown feature. Exit now"
7375
UsageAndExit
7476
exit $EXIT_UNKNOWN_FEATURE
@@ -88,7 +90,8 @@ if [ ! -f "$RUBY_CONFIGURATION_FILE" ]; then
8890
exit $EXIT_BAD_SETUP
8991
fi
9092

91-
if [ $feature_to_run == "backup-all-repositories-from-org" ]; then
93+
# Features: backup-all-repositories-from-org, look-for-leaks
94+
if [ $feature_to_run == "backup-all-repositories-from-org" -o $feature_to_run == "look-for-leaks" ]; then
9295

9396
if [ ! -f "$SHELL_REPOSITORIES_DUMPER" ]; then
9497
echo "ERROR: SHELL_REPOSITORIES_DUMPER does not exist. Exits now."
@@ -125,9 +128,17 @@ if [ $feature_to_run == "backup-all-repositories-from-org" ]; then
125128
exit $EXIT_BAD_SETUP
126129
fi
127130

128-
echo "Start Shell script ($SHELL_REPOSITORIES_DUMPER) for feature to dump repositories of '$GITLAB_ORGANIZATION_ID' to '$REPOSITORIES_CLONE_LOCATION_PATH'"
129131
start_time_seconds=`date +%s`
130-
./$SHELL_REPOSITORIES_DUMPER $CLONING_URL_JSON_KEY $GITLAB_ORGANIZATION_ID $RESULTS_PER_PAGE $REPOSITORIES_CLONE_LOCATION_PATH $GILAB_PERSONAL_ACCESS_TOKEN
132+
133+
if [ $feature_to_run == "backup-all-repositories-from-org" ]; then
134+
echo "Start Shell script ($SHELL_REPOSITORIES_DUMPER) for feature to dump repositories of '$GITLAB_ORGANIZATION_ID' to '$REPOSITORIES_CLONE_LOCATION_PATH'"
135+
./$SHELL_REPOSITORIES_DUMPER $CLONING_URL_JSON_KEY $GITLAB_ORGANIZATION_ID $RESULTS_PER_PAGE $REPOSITORIES_CLONE_LOCATION_PATH $GILAB_PERSONAL_ACCESS_TOKEN
136+
fi
137+
138+
if [ $feature_to_run == "look-for-leaks" ]; then
139+
echo "Start Shell script ($SHELL_REPOSITORIES_LEAKS_SCANNER) to look for leaks in repositories of '$GITLAB_ORGANIZATION_ID'"
140+
./$SHELL_REPOSITORIES_LEAKS_SCANNER $GITLAB_ORGANIZATION_ID $CLONING_URL_JSON_KEY $RESULTS_PER_PAGE $GILAB_PERSONAL_ACCESS_TOKEN
141+
fi
131142
fi
132143

133144
# Stats & bye
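
The checks above repeat the feature names across several `[ ... -a ... ]` and `[ ... -o ... ]` tests, with `$feature_to_run` left unquoted. As an illustration only, a `case` statement could centralise the mapping from feature name to script. `resolve_feature_script` is a hypothetical name that the commit does not define; the two paths are the values assigned to `SHELL_REPOSITORIES_DUMPER` and `SHELL_REPOSITORIES_LEAKS_SCANNER`.

```shell
#!/bin/bash

# Hypothetical helper (not in the commit): prints the script that implements
# a feature, or returns 1 when the feature name is unknown.
resolve_feature_script() {
    case "$1" in
        backup-all-repositories-from-org)
            echo "./utils/dump-git-repositories-from-gitlab.sh" ;;
        look-for-leaks)
            echo "./utils/check-leaks-from-gitlab.sh" ;;
        *)
            return 1 ;;
    esac
}
```

With this shape, adding a feature means adding one `case` branch, and quoting `"$1"` keeps the test safe if the feature name contains spaces.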
toolbox/gitlab/utils/check-leaks-from-gitlab.sh

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
1+
#!/bin/bash
2+
# Software Name: floss-toolbox
3+
# SPDX-FileCopyrightText: Copyright (c) 2021-2022 Orange
4+
# SPDX-License-Identifier: Apache-2.0
5+
#
6+
# This software is distributed under the Apache 2.0 license.
7+
#
8+
# Author: Pierre-Yves LAPERSONNE <pierreyves(dot)lapersonne(at)orange(dot)com> et al.
9+
10+
# Since...............: 09/03/2022
11+
# Description.........: Check if there are leaks thanks to gitleaks in GitLab projects
12+
13+
#set -euxo pipefail
14+
VERSION="1.0.0"
15+
16+
# Config
17+
# ------
18+
19+
EXIT_OK=0
20+
EXIT_BAD_ARGUMENTS=1
21+
EXIT_BAD_SETUP=2
22+
23+
URL_EXTRACTER_FILE="./../github/utils/extract-repos-field-from-json.py" # TODO: Extract this Python script to common files
24+
LEAKS_PARSER="./../github/utils/count-leaks-nodes.py" # TODO: Extract this Python script to common files
25+
GITLEAKS_FINAL_REPORT="$$_gitleaks-final_report-count.csv"
26+
27+
# Functions
28+
# ---------
29+
30+
UsageAndExit(){
31+
echo "check-leaks-from-gitlab.sh - Version $VERSION"
32+
echo "USAGE:"
33+
echo "bash check-leaks-from-gitlab.sh ORGANISATION_ID KEY PAGINATION TOKEN"
34+
echo "with ORGANISATION_ID: GitLab organisation ID"
35+
echo "with KEY: JSON key to use for cloning URL"
36+
echo "with PAGINATION: number of items per page"
37+
echo "with TOKEN: GitLab access token"
38+
echo "About exit codes:"
39+
echo -e "\t 0................: Normal exit"
40+
echo -e "\t 1................: Bad arguments given to the script"
41+
echo -e "\t 2................: Bad setup for the script or undefined LEAKS_PARSER file"
42+
exit $EXIT_OK
43+
}
44+
45+
# Check setup
46+
# -----------
47+
48+
if [ "$#" -eq 0 ]; then
49+
UsageAndExit
50+
exit $EXIT_OK
51+
fi
52+
53+
if [ "$#" -ne 4 ]; then
54+
echo "ERROR: Bad arguments number. Exits now"
55+
UsageAndExit
56+
exit $EXIT_BAD_ARGUMENTS
57+
fi
58+
59+
if [ ! -f "$URL_EXTRACTER_FILE" ]; then
60+
echo "ERROR: Bad set up for URL extracter. Exits now"
61+
UsageAndExit
62+
exit $EXIT_BAD_SETUP
63+
fi
64+
65+
if [ ! -f "$LEAKS_PARSER" ]; then
66+
echo "ERROR: Bad set up for leaks parser. Exits now"
67+
UsageAndExit
68+
exit $EXIT_BAD_SETUP
69+
fi
70+
71+
organisation_id=$1
72+
if [ -z "$organisation_id" -o "$organisation_id" == "" ]; then
73+
echo "ERROR: No organisation ID defined. Exits now."
74+
UsageAndExit
75+
exit $EXIT_BAD_ARGUMENTS
76+
fi
77+
78+
cloning_url_key=$2
79+
if [ -z "$cloning_url_key" -o "$cloning_url_key" == "" ]; then
80+
echo "ERROR: No JSON key for URL. Exits now."
81+
UsageAndExit
82+
exit $EXIT_BAD_ARGUMENTS
83+
fi
84+
85+
pagination=$3
86+
if [ -z "$pagination" ]; then
87+
echo "ERROR: No pagination defined. Exits now."
88+
UsageAndExit
89+
exit $EXIT_BAD_ARGUMENTS
90+
fi
91+
92+
access_token=$4
93+
if [ -z "$access_token" ]; then
94+
echo "ERROR: No access token is defined. Exits now."
95+
UsageAndExit
96+
exit $EXIT_BAD_ARGUMENTS
97+
fi
98+
99+
# Run
100+
# ---
101+
102+
echo "---------------------------------------------"
103+
echo "check-leaks-from-gitlab.sh - Version $VERSION"
104+
echo "---------------------------------------------"
105+
106+
# Step 1 - Get all groups and subgroups projects
107+
108+
max_number_of_pages=10 # TODO: Remove magic number for max number of pages
109+
echo "Get all projects of groups and subgroups with $pagination items per page and arbitrary $max_number_of_pages pages max..."
110+
111+
gitlab_projects_dump_file_raw="./data/.gitlab-projects-dump.raw.json"
112+
gitlab_projects_dump_file_clean="./data/.gitlab-projects-dump.clean.json"
113+
if [ -f "$gitlab_projects_dump_file_raw" ]; then
114+
rm $gitlab_projects_dump_file_raw
115+
fi
116+
117+
for page in `seq 1 $max_number_of_pages`
118+
do
119+
curl --silent --header "Authorization: Bearer $access_token" --location --request GET "https://gitlab.com/api/v4/groups/$organisation_id/projects?include_subgroups=true&per_page=$pagination&page=$page" >> $gitlab_projects_dump_file_raw
120+
done
121+
122+
# Step 2 - Extract repositories URL
123+
124+
# Because of pagination (max 100 items per page, arbitrary 10 pages here), raw pages are concatenated in one file.
125+
# So we have several JSON arrays pasted in one file.
126+
# Consecutive arrays meet with the pattern ][. Merge them into one array by replacing ][ with ,
127+
# For empty pages, the empty arrays [] yield runs of , so we squeeze each run into a single ,
128+
# There then remains a useless trailing , in the final array: the pattern },] is replaced by }]
129+
cat $gitlab_projects_dump_file_raw | sed -e "s/\]\[/,/g" | tr -s ',' | sed -e "s/\}\,\]/\}\]/g" > $gitlab_projects_dump_file_clean
130+
131+
url_for_cloning="./data/.url-for-cloning.txt"
132+
echo "Extract cloning URLs from results (using '$cloning_url_key' as JSON key)..."
133+
python3 "$URL_EXTRACTER_FILE" --field $cloning_url_key --source $gitlab_projects_dump_file_clean > $url_for_cloning
134+
repo_count=`cat $url_for_cloning | wc -l | sed 's/ //g'`
135+
echo "Extraction done. Found '$repo_count' items."
136+
137+
# Step 3 - Clone repositories
138+
139+
dir_before_dump=`pwd`
140+
echo "Creating dump directory..."
141+
directory_name=$(date '+%Y-%m-%d')
142+
cd "$repositories_location"
143+
if [ -d "$directory_name" ]; then
144+
echo "Removing old directory with the same name"
145+
rm -rf $directory_name
146+
fi
147+
mkdir $directory_name
148+
cd $directory_name
149+
echo "Dump directory created with name '$directory_name' at location `pwd`."
150+
151+
# Step 4 - For each repository, clone it and make a scan
152+
153+
number_of_url=`cat "$dir_before_dump/$url_for_cloning" | wc | awk {'print $1 '}`
154+
cpt=1
155+
echo "Dumping of $number_of_url repositories..."
156+
while read url_line; do
157+
158+
# Step 4.1 - Clone
159+
# WARNING: gitleaks looks inside files and git histories, so for old and big projects it can take a very long time!
160+
161+
echo "Cloning ($cpt / $number_of_url) '$url_line'..."
162+
git clone "$url_line"
163+
164+
# Step 4.2 - Extract new folder name
165+
166+
target_folder_name=`basename -s .git $(echo "$url_line")`
167+
echo "Cloned in folder '$target_folder_name'"
168+
169+
# Step 4.3 - Look for leaks
170+
171+
gitleaks_file_name="$target_folder_name".gitleaks.json
172+
gitleaks detect --report-format json --report-path "$gitleaks_file_name" --source "$target_folder_name" || true # gitleaks returns 1 if leaks found
173+
174+
# In the JSON report, a project has no leak if the result file contains an empty JSON array, i.e. only the line
175+
# []
176+
if [ -f "$gitleaks_file_name" ]; then
177+
pwd
178+
count=`python3 "../$LEAKS_PARSER" --file "$gitleaks_file_name"`
179+
180+
if [ "$count" -eq "0" ]; then
181+
echo "✅;$target_folder_name;$count" >> $GITLEAKS_FINAL_REPORT
182+
echo "✅ Gitleaks did not find leaks for '$target_folder_name'"
183+
cpt_clean_repo=$((cpt_clean_repo+1))
184+
else
185+
echo "🚨;$target_folder_name;$count" >> $GITLEAKS_FINAL_REPORT
186+
echo "🚨 WARNING! gitleaks may have found '$count' leaks for '$target_folder_name'"
187+
cpt_dirty_repo=$((cpt_dirty_repo+1))
188+
fi
189+
else
190+
echo "💥 ERROR: The file '$gitleaks_file_name' does not exist, something has failed with gitleaks!"
191+
fi
192+
193+
rm -rf "$target_folder_name"
194+
195+
cpt=$((cpt+1))
196+
197+
done < "$dir_before_dump/$url_for_cloning"
198+
199+
echo "Scanning done!"
200+
201+
# Step 6 - Clean up
202+
203+
git config --global diff.renameLimit $previous_git_diff_rename_limit # (default seems to be 0)
204+
205+
mv $GITLEAKS_FINAL_REPORT "$dir_before_dump"
206+
echo "GitLab organisation ID...............: '$organisation_id'"
207+
echo "Total number of projects.............: '$number_of_url'"
208+
echo "Number of projects with alerts.......: '$cpt_dirty_repo'"
209+
echo "Number of projects without alerts....: '$cpt_clean_repo'"
210+
echo "Final report is......................: '$GITLEAKS_FINAL_REPORT'"
211+
212+
rm -rf "$target_folder_name"
213+
rm -rf "$dir_before_dump/$url_for_cloning"
214+
cd "$dir_before_dump"
215+
rm -f $url_for_cloning
216+
217+
echo "Check done!"
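
Step 2 of the script above merges the concatenated API pages with a `sed | tr | sed` chain. The sketch below isolates that chain in a function so it can be tried on a small input. `merge_json_pages` and the sample pages are hypothetical and only serve the illustration; the three commands are the ones used in the script.

```shell
#!/bin/bash

# Merge concatenated JSON arrays read on stdin into a single array,
# using the same three steps as Step 2:
#   1. replace each "][" boundary between two pages by ","
#   2. squeeze runs of "," left by empty pages into a single ","
#   3. drop the trailing "," before the final "]"
merge_json_pages() {
    sed -e 's/\]\[/,/g' | tr -s ',' | sed -e 's/},\]/}]/g'
}

# Hypothetical sample: two non-empty pages followed by two empty pages.
printf '%s' '[{"id":1},{"id":2}]' '[{"id":3}]' '[]' '[]' | merge_json_pages
echo
```

On this sample the function prints `[{"id":1},{"id":2},{"id":3}]`. Two limits of the approach are worth keeping in mind: `tr -s ','` also squeezes repeated commas inside string values, and an input made only of empty pages yields `[,]`, which is not valid JSON. A JSON-aware merge (for example in the Python helper) would avoid both, at the cost of an extra processing step.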
