
Commit 7c1ac58

author
Kaleb D Ruscitti
committed
Upload to github
0 parents  commit 7c1ac58

File tree

793 files changed

+200252
-0
lines changed


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
.ipynb_checkpoints/*
2+
__pycache__/*
3+
mapperplayground.ipynb

DemoV0.1.ipynb

Lines changed: 1183 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
## (Unclassified//OUO) Mapper' V.0.1 - June '24
2+
This is a mostly-working version of the mapper' code for generating graphs & sankey diagrams of timestamped data.
3+
Direct questions to Kaleb D. Ruscitti ([email protected] before Aug 23 2024, or [email protected] after that).
4+
5+
### Components
6+
There are three subcomponents:
7+
1. `temporal_grapher` - this is the original code which implements mapper' and builds the graph object.
8+
2. `weighted fast_hdbscan` - HDBSCAN but the points are weighted; thanks Patsy Greenwood for writing this code :)
9+
3. `modified holoviews` - A copy of holoviews with the Sankey diagram code mangled to work with mass creation & destruction.
10+
11+
### Conda Env
12+
The environment.yml file contains a dump of my conda env when I tested this code.
13+
Stack Overflow tells me that you can make a copy of the environment by running
14+
`conda env create -f environment.yml`
15+
Worst case scenario, you can just manually install all the usual packages. Just don't install holoviews or HDBSCAN, to
16+
avoid collisions with the modified copies included here.
17+
18+
### Usage
19+
The file `DemoV0.1.ipynb` is a start-to-finish example of how to generate a Sankey diagram with this package.
20+
21+
### Parameters
22+
Since temporal grapher is mostly undocumented, let me quickly mention a few choices you can make.
23+
24+
`HDBSCAN(min_cluster_size=n)`
25+
This is the usual HDBSCAN parameter, but because the points are now weighted, and the weights are strictly less than 1, you generally want to set it a bit lower than you usually would.
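
To see why a lower value helps, here is a minimal sketch. It assumes (the README does not confirm this) that the weighted HDBSCAN measures a cluster's size by the sum of its point weights rather than by counting points. The helper `effective_size` is hypothetical and not part of the package.

```python
import numpy as np

def effective_size(weights):
    """Hypothetical helper: cluster size measured as the sum of point weights.

    Assumption: the weighted HDBSCAN compares this sum against
    min_cluster_size instead of the raw point count.
    """
    return float(np.sum(weights))

# 20 points that all carry a weight below 1
weights = np.full(20, 0.5)

raw_count = len(weights)             # what unweighted HDBSCAN would see
weighted = effective_size(weights)   # what a weighted version might see
print(raw_count, weighted)
```

Under this assumption, a group of 20 points only counts as size 10, so a `min_cluster_size` tuned for unweighted data would reject clusters you probably want to keep.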
26+
27+
#### `tm.TemporalGraph()` parameters
28+
29+
##### Checkpoints
30+
In mapper' there are no slices anymore. Slices have been replaced by checkpoints, which are single time points around
31+
which you wish to cluster. You can either pass `tm.TemporalGraph()` a list of checkpoints via `checkpoints = arrayLike`,
32+
or you can use the `N_checkpoints = int` and `slice_method = str` parameters to have it generate checkpoints for you.
33+
34+
`slice_method` takes either 'time' or 'data'. The time option generates checkpoints evenly spaced in time, and the data
35+
option generates checkpoints evenly spaced in the number of data points.
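
The difference between the two options can be sketched with plain NumPy. This is an illustration of the idea only, not the package's implementation: the function `make_checkpoints` is hypothetical, and placing checkpoints strictly inside the time range (rather than on its endpoints) is an assumption.

```python
import numpy as np

def make_checkpoints(times, n_checkpoints, slice_method="time"):
    """Hypothetical sketch of the two checkpoint-generation strategies."""
    times = np.sort(np.asarray(times, dtype=float))
    # Interior fractions 1/(n+1), ..., n/(n+1); excluding the endpoints is an assumption.
    fractions = np.arange(1, n_checkpoints + 1) / (n_checkpoints + 1)
    if slice_method == "time":
        # Evenly spaced along the time axis, regardless of where the data sits.
        return times[0] + fractions * (times[-1] - times[0])
    if slice_method == "data":
        # Evenly spaced in number of data points, i.e. quantiles of the timestamps.
        return np.quantile(times, fractions)
    raise ValueError("slice_method must be 'time' or 'data'")

# 90 points crowded into [0, 1], then 10 points spread over [1, 10]
times = np.concatenate([np.linspace(0, 1, 90), np.linspace(1, 10, 10)])

time_cp = make_checkpoints(times, 3, slice_method="time")
data_cp = make_checkpoints(times, 3, slice_method="data")
print(time_cp)
print(data_cp)
```

With bursty data like this, the 'time' checkpoints land mostly in the sparse tail, while the 'data' checkpoints all fall inside the dense early interval.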
36+
37+
#### Temporal kernel parameters
38+
The temporal kernel is used to give the points weight in time. You can pass a kernel function to `tm.TemporalGraph` via
39+
`kernel=myFunc`. The default is `temporal_mapper.weighted_clusters.gaussian`, which is a Gaussian kernel. If your kernel
40+
function takes parameters, you can pass `kernel_params = (param1, param2, ...)`.
41+
42+
If you want to recover the original mapper, you can pass `kernel = temporal_mapper.weighted_clusters.square`. This has a
43+
required parameter `kernel_params=(overlap,)`, which is the amount of overlap between slices. If in doubt, set it to 1.
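
As a rough illustration of how the two kernel shapes differ, the sketch below weights a few timestamps around one checkpoint. The functions and their `(t, checkpoint, width)` signature are hypothetical; the real signatures, normalisation, and the meaning of `overlap` in `temporal_mapper.weighted_clusters` may differ.

```python
import numpy as np

def gaussian_kernel(t, checkpoint, width):
    """Hypothetical Gaussian kernel: weight decays smoothly with distance in time."""
    return np.exp(-0.5 * ((t - checkpoint) / width) ** 2)

def square_kernel(t, checkpoint, width):
    """Hypothetical square kernel: weight 1 inside a window, 0 outside.

    A hard window like this behaves like a classic mapper slice.
    """
    return (np.abs(t - checkpoint) <= width).astype(float)

times = np.array([0.0, 0.5, 1.0, 2.0, 4.0])

g = gaussian_kernel(times, checkpoint=1.0, width=1.0)
s = square_kernel(times, checkpoint=1.0, width=1.0)
print(g)
print(s)
```

The Gaussian version lets far-away points contribute a little, whereas the square version treats every point as either fully in or fully out of the slice.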
44+
45+
The parameter `rate_sensitivity` can be any number >= 0, or -1. This controls how sensitive the temporal kernel is to
46+
changes in the temporal density of your data. It acts as an exponent: at the default setting (1.0), points
47+
with double the temporal density will have a kernel that is half as wide. At sensitivity 2, double density gives 1/4 as
48+
wide, and so on. The option -1 sets the scale to be logarithmic: 10x as dense gives 1/2 as wide.
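
The scaling described above can be written out as a small sketch. The function `kernel_width` is hypothetical, and the formula used for the -1 option is only one reading that matches "10x as dense, 1/2 as wide"; the package may compute it differently.

```python
import math

def kernel_width(base_width, density_ratio, rate_sensitivity=1.0):
    """Hypothetical sketch: shrink the kernel width as temporal density grows.

    density_ratio is the local temporal density relative to a reference (> 0).
    """
    if rate_sensitivity == -1:
        # Logarithmic option (assumed form): each factor of 10 in density halves the width.
        return base_width * 0.5 ** math.log10(density_ratio)
    if rate_sensitivity < 0:
        raise ValueError("rate_sensitivity must be >= 0, or exactly -1")
    # Power-law option: the sensitivity is the exponent on the density ratio.
    return base_width / density_ratio ** rate_sensitivity

print(kernel_width(1.0, 2.0, rate_sensitivity=1.0))    # double density, sensitivity 1
print(kernel_width(1.0, 2.0, rate_sensitivity=2.0))    # double density, sensitivity 2
print(kernel_width(1.0, 10.0, rate_sensitivity=-1))    # 10x density, logarithmic option
```

Setting `rate_sensitivity=0` in this sketch leaves the width unchanged, so the kernel ignores density entirely.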
49+
50+
### A note on the Holoviews code
51+
Sometimes, when you try to plot the Sankey diagram, it throws an error like `no plotting option Sankey`
52+
(I can't recreate it and I don't remember exactly what the error says). If this happens, just restart the kernel and run
53+
the code again top-down. I'm not sure what causes this.

data/.ipynb_checkpoints/data_gen-checkpoint.ipynb

Lines changed: 473 additions & 0 deletions
Large diffs are not rendered by default.

data/ai_arxiv_coordinates.npy

78.3 KB
Binary file not shown.

data/ai_arxiv_data.feather

4.01 MB
Binary file not shown.

data/ai_arxiv_vectors.npy

29.3 MB
Binary file not shown.

data/data_gen.ipynb

Lines changed: 478 additions & 0 deletions
Large diffs are not rendered by default.

data/dtest_data.npy

61.1 KB
Binary file not shown.

diagnostic.ipynb

Lines changed: 285 additions & 0 deletions
Large diffs are not rendered by default.
