| 1 | +## (Unclassified//OUO) Mapper' V.0.1 - June '24 |
| 2 | +This is a mostly-working version of the mapper' code for generating graphs & sankey diagrams of timestamped data. |
| 3 | +Direct questions to Kaleb D. Ruscitti ([email protected] before Aug 23 2024, or [email protected] after that). |
| 4 | + |
| 5 | +### Components |
| 6 | +There are three subcomponents: |
| 7 | +1. `temporal_grapher` - this is the original code which implements mapper' and builds the graph object. |
| 8 | +2. `weighted fast_hdbscan` - HDBSCAN but the points are weighted; thanks Patsy Greenwood for writing this code :) |
| 9 | +3. `modified holoviews` - A copy of holoviews with the Sankey diagram code mangled to work with mass creation & destruction. |
| 10 | + |
| 11 | +### Conda Env |
| 12 | +The environment.yml file contains a dump of my conda env when I tested this code. |
| 13 | +Stack Overflow tells me that you can make a copy of the environment by running |
| 14 | +`conda env create -f environment.yml` |
| 15 | +Worst case, you can manually install all the usual packages. Just don't install holoviews or HDBSCAN, to |
| 16 | +avoid collisions with the modified copies listed above. |
| 17 | + |
| 18 | +### Usage |
| 19 | +The file `DemoV0.1.ipynb` is a start-to-finish example of how to generate a Sankey diagram with this package. |
| 20 | + |
| 21 | +### Parameters |
| 22 | +Since `temporal_grapher` is mostly undocumented, let me quickly mention a few choices you can make. |
| 23 | + |
| 24 | +`HDBSCAN(min_cluster_size=n)` |
| 25 | +This is the usual HDBSCAN parameter, but because the points are now weighted and the weights are strictly less than 1, you generally want to set it a bit lower than you usually would. |
| 26 | + |
| 27 | +#### `tm.TemporalGraph()` parameters |
| 28 | + |
| 29 | +##### Checkpoints |
| 30 | +In mapper' there are no slices anymore. Slices have been replaced by checkpoints, which are single time points around |
| 31 | +which you wish to cluster. You can either pass `tm.TemporalGraph()` a list of checkpoints, `checkpoints = arrayLike`, |
| 32 | +or use the `N_checkpoints = int` and `slice_method = str` parameters to have it generate checkpoints for you. |
| 33 | + |
| 34 | +`slice_method` takes either 'time' or 'data'. The 'time' option generates checkpoints evenly spaced in time, and the 'data' |
| 35 | +option generates checkpoints evenly spaced in the number of data points. |
| 36 | + |
| 37 | +#### Temporal kernel parameters |
| 38 | +The temporal kernel is used to give the points weight in time. You can pass a kernel function to `tm.TemporalGraph()` as |
| 39 | +`kernel=myFunc`. The default is `temporal_mapper.weighted_clusters.gaussian`, which is a Gaussian kernel. If your kernel |
| 40 | +function takes parameters, you can pass `kernel_params = (param1, param2, ...)`. |
| 41 | + |
| 42 | +If you want to recover the original mapper, you can pass `kernel = temporal_mapper.weighted_clusters.square`. This has a |
| 43 | +required parameter `kernel_params=(overlap,)`, which is the amount of overlap between slices. If in doubt, set it to 1. |
| 44 | + |
| 45 | +The parameter `rate_sensitivity` can be any number >= 0, or -1. It controls how sensitive the temporal kernel is to |
| 46 | +changes in the temporal density of your data. It acts as an exponent: at the default setting (1.0), points |
| 47 | +with double the temporal density will have a kernel that is half as wide. At sensitivity 2, double density gives 1/4 as |
| 48 | +wide, and so on. The option -1 sets the scale to be logarithmic: 10x as dense = 1/2 as wide. |
| 49 | + |
| 50 | +### A note on the Holoviews code |
| 51 | +Sometimes when you try to plot the Sankey diagram it throws an error like `no plotting option Sankey` |
| 52 | +(I can't recreate it and I don't remember exactly what the error says). If this happens, restart the kernel and run |
| 53 | +the code again from the top. I don't know what causes it. |
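
As a rough illustration of the checkpoint and `rate_sensitivity` descriptions under Parameters, here is a minimal NumPy sketch. It is not the `temporal_grapher` implementation: the helper names `make_checkpoints` and `relative_kernel_width` are hypothetical. It assumes checkpoints include the first and last timestamps, and it only covers `rate_sensitivity >= 0`, since the exact formula behind the `-1` (logarithmic) option is not stated here.

```python
import numpy as np


def make_checkpoints(times, n_checkpoints, slice_method="time"):
    """Hypothetical helper: choose checkpoint times from an array of timestamps."""
    times = np.sort(np.asarray(times, dtype=float))
    if slice_method == "time":
        # Evenly spaced in time between the first and last timestamp
        # (including both endpoints is an assumption of this sketch).
        return np.linspace(times[0], times[-1], n_checkpoints)
    if slice_method == "data":
        # Evenly spaced in data count: equally spaced quantiles of the timestamps.
        return np.quantile(times, np.linspace(0.0, 1.0, n_checkpoints))
    raise ValueError("slice_method must be 'time' or 'data'")


def relative_kernel_width(density_ratio, rate_sensitivity=1.0):
    """Hypothetical helper: kernel width relative to a point at baseline density."""
    if rate_sensitivity < 0:
        # The -1 (logarithmic) option is deliberately left out of this sketch.
        raise ValueError("only rate_sensitivity >= 0 is sketched here")
    # Power-law scaling: double density with sensitivity 1 gives half the width.
    return float(density_ratio) ** (-rate_sensitivity)


# Four early points and one late point.
times = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
time_cps = make_checkpoints(times, 3, "time")  # [0., 5., 10.]
data_cps = make_checkpoints(times, 3, "data")  # [0., 2., 10.]

half = relative_kernel_width(2, rate_sensitivity=1.0)     # 0.5
quarter = relative_kernel_width(2, rate_sensitivity=2.0)  # 0.25
```

With uneven data like this, the 'data' option pulls the middle checkpoint toward where most of the points are, while the 'time' option ignores where the points fall.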