Compute a hierarchical or k-means cluster analysis and return the group association for each observation as a vector.
sjc.cluster(data, groupcount = NULL, method = c("hclust", "kmeans"), distance = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), agglomeration = c("ward", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"), iter.max = 20, algorithm = c("Hartigan-Wong", "Lloyd", "MacQueen"))
data | A data frame with variables that should be used for the cluster analysis. |
---|---|
groupcount | Number of groups (clusters) used for the cluster solution. May also be a set of initial (distinct) cluster centres, in case method = "kmeans". |
method | Method for computing the cluster analysis. By default ("hclust"), a hierarchical cluster analysis is computed; use "kmeans" for a k-means cluster analysis. |
distance | Distance measure to be used when method = "hclust". |
agglomeration | Agglomeration method to be used when method = "hclust". |
iter.max | Maximum number of iterations allowed. Only applies if method = "kmeans". |
algorithm | Algorithm used for calculating the k-means clusters. Only applies if method = "kmeans". |
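As a rough illustration of what the hierarchical arguments control, the sketch below uses only base R (dist, hclust and cutree). That sjc.cluster follows a comparable pipeline internally is an assumption, not something this page states, and the variable names are illustrative.

```r
# Base-R sketch of a hierarchical cluster solution
# (an assumed analogue of method = "hclust", not the sjc.cluster source)
d  <- dist(mtcars, method = "euclidean")   # plays the role of the `distance` argument
hc <- hclust(d, method = "ward.D2")        # plays the role of the `agglomeration` argument
groups <- cutree(hc, k = 5)                # plays the role of `groupcount`

table(groups)                              # number of observations per cluster
```

Changing the method passed to dist or hclust in this sketch is the base-R counterpart of changing distance or agglomeration in the call above.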
The group classification for each observation, as a vector. This group classification can be passed to the sjc.grpdisc function to check the goodness of classification. The returned vector includes missing values, so it can be appended to the original data frame data.
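The idea of a result vector that keeps missing values in place can be sketched in base R as follows. How sjc.cluster handles this internally is not shown on this page; the data frame df, the injected missing values and the choice of three clusters are assumptions made for illustration only.

```r
# Sketch: keep NA positions so the result lines up with the original rows
df <- mtcars[, c("mpg", "hp", "wt")]
df$hp[c(3, 10)] <- NA                      # introduce missing values for illustration

complete <- complete.cases(df)             # rows that can actually be clustered
hc <- hclust(dist(df[complete, ]), method = "ward.D2")

groups <- rep(NA_integer_, nrow(df))       # one slot per original row
groups[complete] <- cutree(hc, k = 3)      # fill in only the clustered rows

df$group <- groups                         # same length as the data frame, so it can be appended
```

Rows with missing values keep NA as their group, which is what makes the vector safe to bind back to the original data.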
In R versions later than 3.0.3, the "ward" option has been replaced by "ward.D" and "ward.D2", so use one of these values. If "ward" is supplied, it is replaced by "ward.D2".
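Because the two replacement options are not interchangeable, it can help to compare them on the same data. The following base-R sketch calls hclust directly rather than sjc.cluster; the five-cluster cut is an arbitrary choice for illustration.

```r
# Compare the two Ward variants offered by stats::hclust
d <- dist(mtcars)

g_d  <- cutree(hclust(d, method = "ward.D"),  k = 5)
g_d2 <- cutree(hclust(d, method = "ward.D2"), k = 5)

# Cross-tabulate the two solutions to see where they agree or differ
table(ward.D = g_d, ward.D2 = g_d2)
```

According to the hclust documentation, "ward.D2" squares the dissimilarities before updating clusters, while "ward.D" does not, so the two solutions may differ.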
To get results similar to the SPSS Quick Cluster function, the following points have to be considered:
1. Use the /PRINT INITIAL option for SPSS Quick Cluster to get a table with initial cluster centers.
2. Create a matrix from this table by consecutively copying the values, one row after another, from the SPSS output into a matrix, and specify the nrow and ncol arguments.
3. Use algorithm="Lloyd".
4. Use the same value for iter.max both in SPSS and in this function.
This ensures a fixed initial set of cluster centers (as in SPSS), whereas kmeans in R otherwise selects the initial cluster centers randomly.
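The steps above can be sketched with base-R kmeans as shown below. The center values are hypothetical stand-ins for numbers copied from an SPSS table, and the sketch assumes that table lists variables in rows and clusters in columns, so that the default column-wise fill of matrix yields one row per cluster.

```r
# Two variables, three clusters; all center values below are made up for illustration
x <- mtcars[, c("mpg", "hp")]

# Values copied row after row from the (hypothetical) SPSS table
spss_values <- c(30,  20,  15,    # "mpg" row: clusters 1, 2, 3
                 70, 120, 250)    # "hp" row:  clusters 1, 2, 3

# Default column-wise fill: rows = clusters, columns = variables
centers <- matrix(spss_values, nrow = 3, ncol = 2)

# Fixed starting centers, Lloyd's algorithm, explicit iteration limit
fit <- kmeans(x, centers = centers, iter.max = 20, algorithm = "Lloyd")
fit$cluster
```

With fixed starting centers the result no longer depends on a random start, so repeated runs give the same assignment. If the copied table has a different orientation, the matrix would need to be transposed before use.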
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.
# Hierarchical clustering of mtcars-dataset
groups <- sjc.cluster(mtcars, 5)

# K-means clustering of mtcars-dataset
groups <- sjc.cluster(mtcars, 5, method="k")