Overview: Perform PCA and MDS on the University Rankings dataset obtained from Kaggle (https://www.kaggle.com/joeshamen/world-university-rankings-2020). Visualize the data using scatterplots and a scatterplot matrix.
Assignment Tasks:
Task 1: data clustering and decimation (30 points)
- implement random sampling and stratified sampling (remove 75% of data)
- stratified sampling requires k-means clustering (choose k using the elbow method)
Task 2: dimension reduction on the original data and both types of reduced data (30 points)
- find the intrinsic dimensionality of the data using PCA
- produce scree plot visualization and mark the intrinsic dimensionality
- show the scree plots before/after sampling to assess the bias introduced
- obtain the three attributes with the highest PCA loadings
Task 3: visualization of the original data and both types of reduced data (40 points)
- visualize the data projected onto the top two principal components via a 2D scatterplot
- visualize the data via MDS (Euclidean and correlation distance) in 2D scatterplots
- visualize the scatterplot matrix of the three attributes with the highest PCA loadings
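
Example sketch: the three tasks can be outlined in plain NumPy as below. This is an illustration under stated assumptions, not the assignment's reference solution. The data is synthetic (the real Kaggle columns are not reproduced), all function names are hypothetical, the elbow is picked by a simple second-difference heuristic instead of by eye, the intrinsic dimensionality is taken at an assumed 75% cumulative-variance threshold, "highest PCA loadings" is read as the sum of squared loadings over the retained components, correlation distance is computed between rows, and classical (Torgerson) MDS stands in for whichever MDS variant is intended. Plotting is omitted; the arrays in `results` are what the scatterplots would draw.

```python
import numpy as np

# Synthetic stand-in for the numeric ranking attributes (assumption: the real
# Kaggle columns are not reproduced here): 400 rows, 6 attributes, 3 groups.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 6))
X = np.vstack([c + rng.normal(size=(n, 6)) for c, n in zip(centers, (200, 120, 80))])

def standardize(X):
    """Scale every column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and the within-cluster SSE."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d.min(axis=1).sum()

def random_sample(n, frac=0.25, seed=0):
    """Row indices of a uniform sample that keeps `frac` of the data."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n, size=int(round(frac * n)), replace=False))

def stratified_sample(labels, frac=0.25, seed=0):
    """Keep `frac` of every cluster so cluster proportions are preserved."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        n = max(1, int(round(frac * len(members))))
        keep.append(rng.choice(members, size=n, replace=False))
    return np.sort(np.concatenate(keep))

def pca(X):
    """PCA via SVD; returns variance ratios, components (rows), and scores."""
    Z = standardize(X)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return S**2 / np.sum(S**2), Vt, Z @ Vt.T

def intrinsic_dim(ratio, threshold=0.75):
    """Smallest number of components reaching the cumulative threshold."""
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)

def top_attributes(components, d, n=3):
    """Attribute indices with the largest squared loadings over d components."""
    score = (components[:d] ** 2).sum(axis=0)
    return np.argsort(score)[::-1][:n]

def euclidean_dist(Z):
    """Pairwise Euclidean distances between rows."""
    sq = (Z**2).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0))

def classical_mds(D, dims=2):
    """Classical MDS: eigendecompose the double-centered squared distances."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(-0.5 * J @ (D**2) @ J)
    order = np.argsort(vals)[::-1][:dims]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Task 1: elbow heuristic (largest second difference of the SSE curve), then sample.
Z = standardize(X)
sse = np.array([kmeans(Z, k)[1] for k in range(1, 9)])
elbow_k = int(np.argmax(sse[:-2] - 2 * sse[1:-1] + sse[2:]) + 2)
labels, _ = kmeans(Z, elbow_k)
rand_idx = random_sample(len(X))
strat_idx = stratified_sample(labels)

# Tasks 2 and 3: the same analysis on the original and both reduced datasets.
results = {}
datasets = {"original": X, "random": X[rand_idx], "stratified": X[strat_idx]}
for name, data in datasets.items():
    ratio, components, scores = pca(data)   # `ratio` is the scree-plot data
    d = intrinsic_dim(ratio)
    Zs = standardize(data)
    results[name] = {
        "ratio": ratio,
        "d": d,
        "top3": top_attributes(components, d),
        "pca2d": scores[:, :2],
        "mds_euc": classical_mds(euclidean_dist(Zs)),
        "mds_cor": classical_mds(1.0 - np.corrcoef(Zs)),
    }
    print(name, "intrinsic dim:", d, "top attributes:", results[name]["top3"])
```

Comparing `results[name]["ratio"]` across the three datasets gives the before/after scree comparison; a large shift in the curve or in `d` after sampling would suggest the sample is biased. In practice a library such as scikit-learn could replace the hand-written k-means, PCA, and MDS.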