Cluster centroids #21

Open
awohns wants to merge 1 commit into main from cluster_centroids

awohns (Owner) commented Mar 1, 2022

Adds code to run Louvain community detection and create cluster centroids from the results.
One issue is that clusters with more than roughly 10,000 nodes are difficult to handle, because creating the genotype matrix is extremely expensive (10,000 nodes × 1,000,000 sites is 10^10 entries). For now, a max_cluster_size parameter randomly subsamples large clusters.
More generally, we avoid the expense of creating huge genotype matrices by first calling tskit.TreeSequence.simplify(), which quickly cuts the tree sequence down and makes it easier to work with.
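
The diff itself is not shown in this thread, so the following is only a rough sketch of what a max_cluster_size cap could look like, together with the size arithmetic quoted above. The names subsample_cluster, genotype_matrix_entries, and seed are hypothetical and are not taken from the PR; only the standard library is used.

```python
import random


def subsample_cluster(nodes, max_cluster_size, seed=None):
    """Return at most max_cluster_size node IDs, drawn uniformly without replacement.

    Hypothetical helper: clusters at or below the cap are returned unchanged.
    """
    nodes = list(nodes)
    if len(nodes) <= max_cluster_size:
        return nodes
    return random.Random(seed).sample(nodes, max_cluster_size)


def genotype_matrix_entries(num_nodes, num_sites):
    """Number of entries in a dense (nodes x sites) genotype matrix."""
    return num_nodes * num_sites


# A 25,000-node cluster is capped at 10,000 randomly chosen nodes.
kept = subsample_cluster(range(25_000), max_cluster_size=10_000, seed=42)
print(len(kept))  # 10000

# Even at the cap, a dense matrix over 1,000,000 sites has 10^10 entries.
print(genotype_matrix_entries(10_000, 1_000_000))  # 10000000000
```

Passing a seed keeps the subsample reproducible between runs, which may matter if centroids are compared across invocations.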

awohns force-pushed the cluster_centroids branch from 271422d to 02ead76 (March 1, 2022 05:12)
awohns force-pushed the cluster_centroids branch from 02ead76 to 2600564 (March 1, 2022 05:17)
awohns (Owner, Author) commented Mar 1, 2022

On second thought, this might be better as two different functions: one to do the clustering and another to compute the centroids.
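
One possible shape for that split, sketched with hypothetical names (assign_clusters, compute_centroids, community_fn). Two assumptions are made here that the thread does not confirm: a centroid is taken to be the per-site mean genotype of a cluster's nodes, and the clustering step accepts any community detector as a callable. To keep the sketch runnable with only the standard library, a connected-components stand-in is used in place of Louvain.

```python
from collections import defaultdict


def assign_clusters(edges, community_fn):
    """Clustering step: return {node: cluster_label} from any community detector."""
    communities = community_fn(edges)
    return {node: label for label, nodes in enumerate(communities) for node in nodes}


def compute_centroids(genotypes, labels):
    """Centroid step (assumed definition): per-site mean genotype per cluster."""
    members = defaultdict(list)
    for node, label in labels.items():
        members[label].append(genotypes[node])
    return {
        label: [sum(site) / len(rows) for site in zip(*rows)]
        for label, rows in members.items()
    }


def connected_components(edges):
    """Stand-in detector (NOT Louvain): groups nodes joined by any path."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Toy data: two groups of nodes, two sites per node.
edges = [(0, 1), (1, 2), (3, 4)]
genotypes = {0: [0, 1], 1: [1, 1], 2: [1, 0], 3: [0, 0], 4: [0, 1]}

labels = assign_clusters(edges, connected_components)
centroids = compute_centroids(genotypes, labels)
print(labels)
print(centroids)
```

Keeping the two steps separate would let the cluster labels be inspected, cached, or subsampled before any genotype data is touched.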
