k-means_clustering_using_r

Differences

This shows you the differences between two versions of the page.

k-means_clustering_using_r [2015/07/24 15:24]
vincenzo
k-means_clustering_using_r [2015/07/24 15:25] (current)
vincenzo
Line 1: Line 1:
 Load libraries

-<code>
+<code python>
 library(caret)
 library(ykmeans)
Line 9: Line 9:
 </code>
 Load data
-<code>
+<code python>
 sample <- read.csv("./clusters.csv")
 </code>
 Drop the first column, which contains text, then check for near-zero-variance columns
-<code>
+<code python>
 sample <- sample[,-1]
 nzv <- nearZeroVar(sample)
Line 20: Line 20:
 </code>
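The near-zero-variance check above relies on caret's `nearZeroVar`, which judges columns by how often their values repeat. As a rough illustration of the idea only, the sketch below uses a simpler stand-in: it flags columns whose variance falls below a hypothetical tolerance. The toy data frame, the `1e-8` tolerance, and all variable names are assumptions made for this example.

```r
# Toy data (made up for illustration); 'constant' never varies.
toy <- data.frame(
  a = c(1.2, 3.4, 2.2, 5.1, 4.0),
  constant = c(7, 7, 7, 7, 7),
  b = c(10, 12, 9, 15, 11)
)

# Simplified stand-in for nearZeroVar: flag columns with variance below a
# hypothetical tolerance. caret's real rule is based on value frequencies.
low_var <- which(sapply(toy, var) < 1e-8)

# Drop the flagged columns, guarding against the empty case
# (toy[, -integer(0)] would drop every column).
toy_clean <- if (length(low_var) > 0) toy[, -low_var, drop = FALSE] else toy
names(toy_clean)
```

The guard matters in practice: when no column is flagged, indexing with an empty negative vector removes everything instead of nothing.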
 Create a data matrix
-<code>
+<code python>
 sample1 <- data.matrix(sample)
 samplecor <- cor(sample1)
Line 40: Line 40:
 Clustering, from 3 to 6 clusters

-<code>
+<code python>
 samplepca <- data.frame(samplenocorbase$x)
 keys <- names(samplepca)
Line 47: Line 47:
  
 Check the deviation to infer the number of clusters
-<code>
+<code python>
 table(samplekm$cluster)
 </code>
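One common way to compare candidate cluster counts is to look at the total within-cluster sum of squares for each k. The sketch below does this with base R's `kmeans` rather than `ykmeans`, so it may not match how `ykmeans` selects k. The toy data, the seed, and the variable names are assumptions made for illustration.

```r
set.seed(42)  # make the random toy data and k-means starts reproducible

# Toy 2-D data with three loose groups (made up for illustration).
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for each candidate k from 3 to 6.
ks <- 3:6
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 10)$tot.withinss)
names(wss) <- ks
wss

# Cluster sizes for one choice of k, analogous to table(samplekm$cluster).
fit <- kmeans(toy, centers = 3, nstart = 10)
table(fit$cluster)
```

A sharp drop in `wss` that flattens out after some k is the usual hint that adding more clusters is no longer paying off.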
Line 53: Line 53:
 Plot the clusters!

-<code>
+<code python>
 samplekm <- ykmeans(samplepca, keys, "PC1", 6)
 ggplot(samplekm, aes(x=PC1, y=PC2, col=as.factor(cluster), shape=as.factor(cluster))) + geom_point()
Line 62: Line 62:
  
 Add the cluster column
-<code>
+<code python>
 sample$cluster <- samplekm$cluster
 </code>
 Basically, pivot on cluster, taking the average of each column. (I could have melted and cast the data instead.)
-<code>
+<code python>
 samplecenter <- aggregate(sample, by=list(sample$cluster), FUN=mean)
 samplecenter$cluster <- NULL
Line 84: Line 84:
 </code>
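The per-cluster averaging step above can be tried on a small example. This is a minimal sketch using the formula interface of base R's `aggregate`, a slightly different call than the one in the page. The toy data frame and its column names are assumptions made for illustration.

```r
# Toy data with a cluster label per row (made up for illustration).
toy <- data.frame(
  height = c(1, 3, 10, 14, 5),
  weight = c(2, 4, 20, 24, 6),
  cluster = c(1, 1, 2, 2, 1)
)

# Formula interface: average every other column within each cluster.
centers <- aggregate(. ~ cluster, data = toy, FUN = mean)
centers
```

With the formula form the grouping column appears once, so there is no extra averaged `cluster` column to remove afterwards, as there is with the `by=list(...)` form.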
 Plot the radarchart.
-<code>
+<code python>
 ##par(family="HiraKakuProN-W3")
 radarchart(sampleradar, seg=5, plty=4, plwd=4, pcol=rainbow(5))
k-means_clustering_using_r.txt · Last modified: 2015/07/24 15:25 by vincenzo