edu.uci.ics.jung.algorithms.cluster
Class VoltageClusterer<V,E>

java.lang.Object
  extended by edu.uci.ics.jung.algorithms.cluster.VoltageClusterer<V,E>

public class VoltageClusterer<V,E>
extends Object

Clusters vertices of a Graph based on their ranks as calculated by VoltageScorer. This algorithm is based on, but not identical with, the method described in the paper below. The primary difference is that Wu and Huberman assume a priori that the clusters are of approximately the same size, and therefore use a more complex method than k-means (which is used here) for determining cluster membership based on co-occurrence data.

The algorithm proceeds as follows:

NOTE: Depending on how the co-occurrence data splits the data into clusters, the number of clusters returned by this algorithm may be less than the number of clusters requested. The number of clusters will never be more than the number requested, however.

Author:
Joshua O'Madadhain
See Also:
"'Finding communities in linear time: a physics approach', Fang Wu and Bernardo Huberman, http://www.hpl.hp.com/research/idl/papers/linear/", VoltageScorer, KMeansClusterer

Nested Class Summary
protected  class VoltageClusterer.MapValueArrayComparator
           
 
Field Summary
protected  Graph<V,E> g
           
protected  KMeansClusterer<V> kmc
           
protected  int num_candidates
           
protected  Random rand
           
 
Constructor Summary
VoltageClusterer(Graph<V,E> g, int num_candidates)
          Creates an instance of a VoltageCluster with the specified parameters.
 
Method Summary
protected  void addOneCandidateCluster(LinkedList<Set<V>> candidates, Map<V,double[]> voltage_ranks)
          alternative to addTwoCandidateClusters(): cluster vertices by voltages into 2 clusters.
protected  void addTwoCandidateClusters(LinkedList<Set<V>> candidates, Map<V,double[]> voltage_ranks)
          Do k-means with three intervals and pick the smaller two clusters (presumed to be on the ends); this is closer to the Wu-Huberman method.
protected  Collection<Set<V>> cluster_internal(V origin, int num_clusters)
          Does the work of getCommunity and cluster.
 Collection<Set<V>> cluster(int num_clusters)
          Clusters the vertices of g into num_clusters clusters, based on their connectivity.
 Collection<Set<V>> getCommunity(V v)
          Returns a community (cluster) centered around v.
protected  Map<V,double[]> getObjectCounts(Collection<Set<V>> candidates, V seed)
           
protected  List<V> getSeedCandidates(Collection<Set<V>> candidates)
          Returns an array of cluster seeds, ranked in decreasing order of number of appearances in the specified collection of candidate clusters.
protected  void setRandomSeed(int random_seed)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

num_candidates

protected int num_candidates

kmc

protected KMeansClusterer<V> kmc

rand

protected Random rand

g

protected Graph<V,E> g
Constructor Detail

VoltageClusterer

public VoltageClusterer(Graph<V,E> g,
                        int num_candidates)
Creates an instance of a VoltageCluster with the specified parameters. These are mostly parameters that are passed directly to VoltageScorer and KMeansClusterer.

Parameters:
num_candidates - the number of candidate clusters to create
Method Detail

setRandomSeed

protected void setRandomSeed(int random_seed)

getCommunity

public Collection<Set<V>> getCommunity(V v)
Returns a community (cluster) centered around v.

Parameters:
v - the vertex whose community we wish to discover

cluster

public Collection<Set<V>> cluster(int num_clusters)
Clusters the vertices of g into num_clusters clusters, based on their connectivity.

Parameters:
num_clusters - the number of clusters to identify

cluster_internal

protected Collection<Set<V>> cluster_internal(V origin,
                                              int num_clusters)
Does the work of getCommunity and cluster.

Parameters:
origin - the vertex around which clustering is to be done
num_clusters - the (maximum) number of clusters to find

addTwoCandidateClusters

protected void addTwoCandidateClusters(LinkedList<Set<V>> candidates,
                                       Map<V,double[]> voltage_ranks)
Do k-means with three intervals and pick the smaller two clusters (presumed to be on the ends); this is closer to the Wu-Huberman method.

Parameters:
candidates -
voltage_ranks -

addOneCandidateCluster

protected void addOneCandidateCluster(LinkedList<Set<V>> candidates,
                                      Map<V,double[]> voltage_ranks)
alternative to addTwoCandidateClusters(): cluster vertices by voltages into 2 clusters. We only consider the smaller of the two clusters returned by k-means to be a 'true' cluster candidate; the other is a garbage cluster.

Parameters:
candidates -
voltage_ranks -

getSeedCandidates

protected List<V> getSeedCandidates(Collection<Set<V>> candidates)
Returns an array of cluster seeds, ranked in decreasing order of number of appearances in the specified collection of candidate clusters.

Parameters:
candidates -

getObjectCounts

protected Map<V,double[]> getObjectCounts(Collection<Set<V>> candidates,
                                          V seed)


Copyright © 2009. All Rights Reserved.