Title: | The Online Regularized K-Means Clustering Algorithm |
---|---|
Description: | Algorithm of online regularized k-means to deal with online multi(single) view data. The philosophy of the package is described in Guo G. (2020) <doi:10.1080/02331888.2020.1823979>. |
Authors: | Guangbao Guo [aut, cre] |
Maintainer: | Guangbao Guo <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.8.0.0 |
Built: | 2025-01-31 02:37:30 UTC |
Source: | https://github.com/cran/ORKM |
Implements the online regularized k-means algorithm for clustering online multi-view (and single-view) data. The methodology of the package is described in Guo G. (2020) <doi:10.1080/02331888.2020.1823979>.
The DESCRIPTION file:
Package: | ORKM |
Title: | The Online Regularized K-Means Clustering Algorithm |
Date: | 2024-5-5 |
Version: | 0.8.0.0 |
Authors@R: | c(person("Guangbao", "Guo",role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0002-4115-6218")), person("Miao", "Yu", role="aut"), person("Haoyue", "Song", role="aut"), person("Ruiling", "Niu", role="aut")) |
Description: | Algorithm of online regularized k-means to deal with online multi(single) view data. The philosophy of the package is described in Guo G. (2020) <doi:10.1080/02331888.2020.1823979>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.0 |
Author: | Guangbao Guo [aut, cre] (<https://orcid.org/0000-0002-4115-6218>), Miao Yu [aut], Haoyue Song [aut], Ruiling Niu [aut] |
Maintainer: | Guangbao Guo <[email protected]> |
Suggests: | testthat (>= 3.0.0) |
Imports: | MASS, Matrix, stats, |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-05-05 19:28:49 UTC; 14482 |
Depends: | R (>= 3.5.0) |
Date/Publication: | 2024-05-05 21:50:03 UTC |
Repository: | https://guangbaog.r-universe.dev |
RemoteUrl: | https://github.com/cran/ORKM |
RemoteRef: | HEAD |
RemoteSha: | 8a8c25be9b85485df132c5f76f01a9a56eae76b7 |
Index of help topics:
DMC                     Deep matrix clustering algorithm for multi-view data
INDEX                   Calculate evaluation indices for the clustering functions
KMeans                  K-means clustering algorithm for multi/single-view data
OGD                     Online gradient descent algorithm for online single-view data clustering
OMU                     Online multiplicative update algorithm for online multi-view data clustering
ORKM-package            The Online Regularized K-Means Clustering Algorithm
ORKMeans                Online regularized K-means clustering algorithm for online multi-view data
PKMeans                 Power K-means clustering algorithm for single-view data
QCM                     The QCM data set with K=5.
RKMeans                 Regularized K-means clustering algorithm for multi-view data
Washington_cites        The first view of the Washington data set.
Washington_content      The second view of the Washington data set.
Washington_inbound      The third view of the Washington data set.
Washington_outbound     The fourth view of the Washington data set.
Wisconsin_cites         The first view of the Wisconsin data set.
Wisconsin_content       The second view of the Wisconsin data set.
Wisconsin_inbound       The third view of the Wisconsin data set.
Wisconsin_outbound      The fourth view of the Wisconsin data set.
cora_view1              The first view of the Cora data set.
cora_view2              The second view of the Cora data set.
cora_view3              The third view of the Cora data set.
cora_view4              The fourth view of the Cora data set.
cornell_cites           The first view of the Cornell data set.
cornell_content         The second view of the Cornell data set.
cornell_inbound         The third view of the Cornell data set.
cornell_outbound        The fourth view of the Cornell data set.
labelTexas              True clustering labels for the Texas data set.
labelWashington         True clustering labels for the Washington data set.
labelWisconsin          True clustering labels for the Wisconsin data set.
labelcora               True clustering labels for the Cora data set.
labelcornell            True clustering labels for the Cornell data set.
movie_1                 The first view of the Movie data set.
movie_2                 The second view of the Movie data set.
seed                    A single-view data set named Seeds.
sobar                   A single-view data set named Sobar.
texas_cites             The first view of the Texas data set.
texas_content           The second view of the Texas data set.
texas_inbound           The third view of the Texas data set.
texas_outbound          The fourth view of the Texas data set.
turelabel               True labels of the Movie data set.
You can use this package for online multi-view clustering; the data sets and their true labels are also provided with the package.
Guangbao Guo [aut, cre] (<https://orcid.org/0000-0002-4115-6218>), Miao Yu [aut], Haoyue Song [aut], Ruiling Niu [aut]
Maintainer: Guangbao Guo <[email protected]>
Guangbao Guo, Miao Yu, Guoqi Qian (2023). ORKM: Online Regularized K-Means Clustering for Online Multi-View Data.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4484209
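Before the full simulated example below, a minimal sketch of loading the package and inspecting the bundled data sets (it assumes only that ORKM is installed; the seed data set is described later in this manual):

library(ORKM)
data(package = "ORKM")   # list every data set shipped with the package
data(seed)               # load one bundled single-view data set
str(seed)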
library(MASS)
library(Matrix)
# simulate a single-view data set with three Gaussian clusters and true labels
yita <- 0.5; V <- 2; chushi <- 100; K <- 3; r <- 0.5; max.iter <- 10
n1 <- n2 <- n3 <- 70; gamma <- 0.1; alpha <- 0.98; epsilon <- 1
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
truere <- data[, 2]
X <- matrix(data[, 1], n1 + n2 + n3, 1)
# build two candidate views from an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1)
Y1 <- C1 / D1
view <- V1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
ORKMeans(X = X1, K = K, V = V, r = r, chushi = chushi, yita = yita,
         gamma = gamma, epsilon = epsilon, max.iter = max.iter,
         truere = truere, method = 0)
This data matrix is the first view (the keyword view) of the multi-view data set called Cora. Cora is a multi-view data set of machine learning papers with 4 views, a sample size of nearly 3000, 1500 features, and K=4 clusters.
data("cora_view1")
data("cora_view1")
The format is: num [1:2708, 1:2708] 0 0 0 0 0 0 0 0 1 0 ...
The Cora data set includes a keyword view, inbound and outbound link views, and a citation network view. This view takes the form of a sparse matrix with 2708 samples and 2708 features.
http://www.cs.umd.edu/projects/linqs/projects/lbc/
http://www.cs.umd.edu/projects/linqs/projects/lbc/
data(cora_view1); str(cora_view1)
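A short, purely illustrative sketch that loads all four Cora views together with the label vector and checks that their dimensions agree:

data(cora_view1); data(cora_view2); data(cora_view3); data(cora_view4)
data(labelcora)
sapply(list(cora_view1, cora_view2, cora_view3, cora_view4), dim)
table(labelcora)    # class sizes implied by the true labels
length(labelcora)   # should equal the number of rows in each view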
This data matrix is the second view of the Cora data set, called the citation network view; it takes the form of a sparse matrix with 2708 samples and 1433 features. Cora is a multi-view data set of machine learning papers with 4 views, a sample size of nearly 3000, 1500 features, and K=4 clusters.
data("cora_view2")
data("cora_view2")
The format is: num [1:2708, 1:1433] 0 0 0 0 0 0 0 0 0 0 ...
The second view of Cora data set.
http://www.cs.umd.edu/projects/linqs/projects/lbc/
http://www.cs.umd.edu/projects/linqs/projects/lbc/
data(cora_view2); str(cora_view2)
This data matrix is the third view of the Cora data set, called the inbound link view; it takes the form of a sparse matrix with 2708 samples and 2708 features. Cora is a multi-view data set of machine learning papers with 4 views, a sample size of nearly 3000, 1500 features, and K=4 clusters.
data("cora_view3")
data("cora_view3")
The format is: num [1:2708, 1:2708] 0 0 0 0 0 0 0 0 0 0 ...
The third view of Cora data set.
http://www.cs.umd.edu/projects/linqs/projects/lbc/
http://www.cs.umd.edu/projects/linqs/projects/lbc/
data(cora_view3); str(cora_view3)
The fourth view (the outbound link view) of the Cora data set. Cora is a multi-view data set of machine learning papers with 4 views, a sample size of nearly 3000, 1500 features, and K=4 clusters.
data("cora_view4")
data("cora_view4")
The format is: num [1:2708, 1:2708] 0 0 0 0 0 0 0 0 1 0 ...
The fourth view of Cora data set.
http://www.cs.umd.edu/projects/linqs/projects/lbc/
http://www.cs.umd.edu/projects/linqs/projects/lbc/
data(cora_view4); str(cora_view4)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("cornell_cites")
data("cornell_cites")
The format is: num [1:195, 1:195] 0 0 0 0 0 0 0 0 0 0 ...
The Cornell data set contains four views and 5 clusters. This data set is the first view, with a sample size of 195 and 195 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(cornell_cites); str(cornell_cites)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("cornell_content")
data("cornell_content")
The format is: num [1:195, 1:1703] 0 0 0 0 0 0 0 0 0 0 ...
The Cornell data set contains four views and 5 clusters. This data set is the second view, with a sample size of 195 and 1703 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(cornell_content); str(cornell_content)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("cornell_inbound")
data("cornell_inbound")
The format is: num [1:195, 1:195] 0 0 0 0 0 0 0 0 0 0 ...
The Cornell data set contains four views and 5 clusters. This data set is the third view, with a sample size of 195 and 195 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(cornell_inbound); str(cornell_inbound)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("cornell_outbound")
data("cornell_outbound")
The format is: num [1:195, 1:195] 0 0 0 0 0 0 0 0 0 0 ...
The Cornell data set contains four views and 5 clusters. This data set is the fourth view, with a sample size of 195 and 195 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(cornell_outbound); str(cornell_outbound)
This algorithm decomposes the multi-view data matrix into representative subspaces layer by layer and generates a clustering at each layer. To enhance the diversity between the generated clusterings, a redundancy term based on the proximity between samples in these subspaces is minimised. An iterative optimisation process is then used to seek multiple clusterings that are simultaneously of high quality and diverse.
DMC(X, K, V, r, lamda, truere, max.iter, method = 0)
X | data matrix |
K | number of clusters |
V | number of views |
r | first balance parameter |
lamda | second balance parameter |
truere | true cluster labels |
max.iter | maximum number of iterations |
method | calculate the NMI index |
NMI, Alpha1, center, result
Miao Yu
library(MASS)
# simulate a single-view data set with three Gaussian clusters and true labels
V <- 2; lamda <- 0.5; K <- 3; r <- 0.5; max.iter <- 10; n1 <- n2 <- n3 <- 70
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
truere <- data[, 2]
X <- matrix(data[, 1], n1 + n2 + n3, 1)
# project the data onto two views via an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda0 <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda0)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1) %*% t(X)
Y1 <- C1 / D1
view <- V1 %*% Y1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
DMC(X = X1, K = K, V = V, lamda = lamda, r = r, max.iter = max.iter,
    truere = truere, method = 0)
This function computes clustering evaluation metrics, specifically Purity, NMI, F-score, RI, Precision, and Recall, which are used to evaluate the clustering results of the functions above. The method argument selects the metric: method=0, purity; method=1, precision; method=2, recall; method=3, F-score; method=4, RI.
INDEX(vec1, vec2, method = 0, mybeta = 0)
vec1 | the clustering result produced by an algorithm |
vec2 | the true cluster labels |
method | selects which indicator to calculate |
mybeta | additional parameter used when calculating the index |
accuracy
P1 <- c(1, 1, 1, 2, 3, 2, 1)         # clustering result from an algorithm
truelabel <- c(1, 1, 1, 2, 2, 2, 3)  # true cluster labels
INDEX(P1, truelabel, method = 0)     # purity
INDEX(P1, truelabel, method = 2)     # recall
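As a hedged extension of the example above, the same pair of label vectors can be scored under every documented method value (0 = purity, 1 = precision, 2 = recall, 3 = F-score, 4 = RI):

P1 <- c(1, 1, 1, 2, 3, 2, 1)
truelabel <- c(1, 1, 1, 2, 2, 2, 3)
sapply(0:4, function(m) INDEX(P1, truelabel, method = m))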
The K-means clustering algorithm is a common clustering algorithm that divides a data set into K clusters, with each cluster represented by the mean of the samples it contains; this mean is the cluster centre. The algorithm is unsupervised: the categories are not known in advance, and similar objects are automatically grouped into the same cluster. K-means clusters the data by computing the distance between each point and the centroid of each cluster and assigning the point to the nearest cluster. The algorithm is simple and easy to implement, but it is sensitive to the initial centroids, may produce empty clusters, and may converge to a local minimum. Clustering can be used to discover groups of users for precision marketing, to segment documents, to find people in the same circle in social networks, and to handle anomalous data.
KMeans(X, K, V, r, max.iter, truere, method = 0)
X | data matrix |
K | number of clusters |
V | number of views |
r | balance parameter |
truere | true cluster labels |
max.iter | maximum number of iterations |
method | calculate the NMI index |
NMI, weight, center, result
Miao Yu
library(MASS)
library(Matrix)
# simulate a single-view data set with three Gaussian clusters and true labels
V <- 2; K <- 3; r <- 0.5; max.iter <- 10; n1 <- n2 <- n3 <- 70
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
truere <- data[, 2]
X <- matrix(data[, 1], n1 + n2 + n3, 1)
# build two candidate views from an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1)
Y1 <- C1 / D1
view <- V1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
KMeans(X = X1, K = K, V = V, r = r, max.iter = max.iter,
       truere = truere, method = 0)
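A further sketch applies KMeans to the bundled seed data set. It assumes, following the seed help entry, that the first seven columns are features and the eighth column holds the true labels; the tuning values are illustrative only:

data(seed)
Xseed <- as.matrix(seed[, 1:7])   # seven measured features
lab <- seed[, 8]                  # assumed true-label column
KMeans(X = Xseed, K = 3, V = 1, r = 0.5, max.iter = 10, truere = lab, method = 0)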
True clustering labels for the Cora data set, which can be applied to all 4 views.
data("labelcora")
data("labelcora")
The format is: chr [1:2708] "1" "2" "3" "3" "4" "4" "5" "1" "1" "5" "1" "6" "4" "7" ...
True clustering labels for the Cora data set, which can be applied to all 4 views.
http://www.cs.umd.edu/projects/linqs/projects/lbc/
http://www.cs.umd.edu/projects/linqs/projects/lbc/
data(labelcora)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("labelcornell")
data("labelcornell")
The format is: int [1:195, 1] 1 1 2 3 3 3 2 4 3 3 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr "V1"
The Cornell data set contains four views and 5 clusters. You can use these true labels to calculate your clustering accuracy.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(labelcornell); str(labelcornell)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("labelTexas")
data("labelTexas")
The format is: num [1:187] 1 2 3 1 4 3 3 3 4 1 ...
The Texas data set contains four views and 5 clusters. You can use these true labels to calculate your clustering accuracy.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(labelTexas); str(labelTexas)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("labelWashington")
data("labelWashington")
The format is: num [1:230] 1 2 2 2 2 2 2 2 2 2 ...
The Washington data set contains four views and 5 clusters. You can use these true labels to calculate your clustering accuracy.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(labelWashington); str(labelWashington)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("labelWisconsin")
data("labelWisconsin")
The format is: num [1:265] 1 2 3 3 1 1 1 1 1 1 ...
The Wisconsin data set contains four views and 5 clusters. You can use these true labels to calculate your clustering accuracy.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(labelWisconsin); str(labelWisconsin)
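These true labels can be paired with any of the Wisconsin views. The sketch below clusters the content view with KMeans; the parameter values are illustrative and not tuned:

data(Wisconsin_content)
data(labelWisconsin)
KMeans(X = Wisconsin_content, K = 5, V = 1, r = 0.5, max.iter = 10,
       truere = labelWisconsin, method = 0)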
The first view (the keyword view) of the Movie data set. The Movie data set contains 2 views of 617 instances, and the number of clusters is K = 17; this relatively large number of clusters makes the data difficult to cluster. The data set was extracted from IMDb, and the main objective is to find the movie genres by combining the two view matrices.
data("movie_1")
data("movie_1")
The format is: num [1:617, 1:1878] 1 0 0 0 0 0 0 0 0 0 ...
The first view of the Movie data set.
https://lig-membres.imag.fr/grimal/data.html.
C. Grimal. the multi-view movie data set. 2010. URL https://lig-membres.imag.fr/grimal/data.html.
data(movie_1); str(movie_1)
The second view (the participant view) of the Movie data set. The Movie data set contains 2 views of 617 instances, and the number of clusters is K = 17; this relatively large number of clusters makes the data difficult to cluster. The data set was extracted from IMDb, and the main objective is to find the movie genres by combining the two view matrices.
data("movie_2")
data("movie_2")
The format is: num [1:617, 1:1398] 1 0 0 0 0 0 0 0 0 0 ...
The second view of the Movie data set.
https://lig-membres.imag.fr/grimal/data.html.
C. Grimal. the multi-view movie data set. 2010. URL https://lig-membres.imag.fr/grimal/data.html.
data(movie_2); str(movie_2)
Online gradient descent is an optimisation algorithm used in machine learning when the data are too large to process all at once. The model parameters are updated from a single training sample at a time rather than from the entire training set. The direction of each update is determined by the gradient at the current sample, so whether the algorithm reaches a local or a global optimum depends on the order in which the samples arrive. Compared with the batch gradient descent (BGD) algorithm, online gradient descent can process a data stream and update the model as the data arrive, which makes it more efficient for large-scale data. It should, however, only be used when the data arrive as a continuously updated stream.
OGD(X, K, gamma, max.m, chushi, yita, epsilon, truere, method = 0)
X | data matrix |
K | number of clusters |
gamma | step size |
yita | the regularized parameter |
truere | true cluster labels |
max.m | maximum number of iterations |
epsilon | the parameter epsilon |
chushi | the initial value |
method | calculate the NMI index |
result, NMI, M
Miao Yu
# simulate a single-view data stream with three Gaussian clusters and true labels
yita <- 0.5; V <- 2; K <- 3; chushi <- 100; epsilon <- 1; gamma <- 0.1
max.m <- 10; n1 <- n2 <- n3 <- 70
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
X <- matrix(data[, 1], n1 + n2 + n3, 1)
truere <- data[, 2]
# build two candidate views from an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1)
Y1 <- C1 / D1
view <- V1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
OGD(X = X1, K = K, gamma = gamma, max.m = max.m, chushi = chushi,
    yita = yita, epsilon = epsilon, truere = truere, method = 0)
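The per-sample update idea can also be illustrated outside the package with a few lines of base R. This is a conceptual sketch of a single online centroid update, not the internal implementation of OGD():

# move the centre nearest to the incoming sample x_t a step of size gamma towards it
online_step <- function(centers, x_t, gamma = 0.1) {
  d <- colSums((t(centers) - x_t)^2)   # squared distance to each centre
  j <- which.min(d)                    # index of the nearest centre
  centers[j, ] <- centers[j, ] + gamma * (x_t - centers[j, ])
  centers
}
centers <- matrix(c(20, 25, 30), ncol = 1)    # K = 3 centres, one feature
centers <- online_step(centers, x_t = 24.2)   # process one streaming sample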
This algorithm integrates a multiplicative normalisation factor as an additional term in the original additive update rule; the two terms usually point in approximately opposite directions, so the improved iteration rule can easily be converted into a multiplicative version. Non-negativity is maintained after each iteration.
OMU(X, K, V, chushi, yita, r, max.iter, epsilon, truere, method = 0)
X | data matrix |
K | number of clusters |
V | number of views |
chushi | the initial value |
yita | the regularized parameter |
r | balance parameter |
max.iter | maximum number of iterations |
epsilon | the parameter epsilon |
truere | true cluster labels |
method | calculate the NMI index |
NMI, result, M
# simulate a single-view data set with three Gaussian clusters and true labels
yita <- 0.5; V <- 2; chushi <- 100; K <- 3; r <- 0.5; max.iter <- 10
n1 <- n2 <- n3 <- 70; epsilon <- 1
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
truere <- data[, 2]
X <- matrix(data[, 1], n1 + n2 + n3, 1)
# project the data onto two views via an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1) %*% t(X)
Y1 <- C1 / D1
view <- V1 %*% Y1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
OMU(X = X1, K = K, V = V, chushi = chushi, yita = yita, r = r,
    max.iter = max.iter, epsilon = epsilon, truere = truere, method = 0)
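The non-negativity-preserving multiplicative step can be illustrated with a generic NMF-style update in base R. This is a conceptual sketch, not the update used inside OMU():

set.seed(1)
X <- matrix(runif(20), 4, 5)    # non-negative data
W <- matrix(runif(8), 4, 2)     # non-negative basis factor
H <- matrix(runif(10), 2, 5)    # non-negative coefficient factor
for (it in 1:50) {
  # multiplicative update for H in X ~ W %*% H; a ratio of non-negative terms
  H <- H * (t(W) %*% X) / (t(W) %*% W %*% H + 1e-9)
}
min(H)   # stays >= 0 after every iteration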
For the online clustering problem, this function implements the Online Regularized K-means Clustering (ORKMC) method for online multi-view data. For the clustering of multi-view data, a non-negative matrix factorisation is used as the starting point of the model to find the indicator matrix and the cluster centres of each cluster; for online updating, a projected gradient descent step is used to improve the accuracy and speed of the clustering; and a regularisation term is added to avoid overfitting. Because the choice of the regularisation parameter strongly affects the performance of ORKMC and varies across data sets, a suitable range of regularisation and model parameters is suggested. The effectiveness of the ORKMC algorithm has been examined in an extensive study of multi-view and single-view data.
ORKMeans(X, K, V, chushi, r, yita, gamma, alpha, epsilon, truere, max.iter, method = 0)
X | the online single/multi-view data matrix |
K | the number of clusters |
V | the number of views of X |
chushi | the initial value for the online setting |
yita | the regularized parameter |
r | the balance parameter |
gamma | the step size |
alpha | used to calculate the weight of each view |
epsilon | the parameter epsilon |
truere | the true labels in the data set |
max.iter | the maximum number of iterations |
method | calculate the NMI index |
NMI, weight, center, result
Miao Yu
library(MASS)
library(Matrix)
# simulate a single-view data set with three Gaussian clusters and true labels
yita <- 0.5; V <- 2; chushi <- 100; K <- 3; r <- 0.5; max.iter <- 10
n1 <- n2 <- n3 <- 70; gamma <- 0.1; alpha <- 0.98; epsilon <- 1
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
truere <- data[, 2]
X <- matrix(data[, 1], n1 + n2 + n3, 1)
# build two candidate views from an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1)
Y1 <- C1 / D1
view <- V1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
ORKMeans(X = X1, K = K, V = V, r = r, chushi = chushi, yita = yita,
         gamma = gamma, epsilon = epsilon, max.iter = max.iter,
         truere = truere, method = 0)
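Another hedged sketch runs ORKMeans on the bundled seed data set, assuming that the eighth column of seed holds the true labels and that a single-view matrix with V = 1 is acceptable; all tuning values, including alpha from the usage line, are illustrative:

data(seed)
Xseed <- as.matrix(seed[, 1:7])
lab <- seed[, 8]
ORKMeans(X = Xseed, K = 3, V = 1, chushi = 100, r = 0.5, yita = 0.5,
         gamma = 0.1, alpha = 0.98, epsilon = 1, truere = lab,
         max.iter = 10, method = 0)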
The power K-means algorithm is a generalisation of the Lloyd algorithm that approximates the ordinary K-means problem by majorization-minimization while retaining the descent property and low complexity of the Lloyd algorithm. Power K-means embeds the K-means problem into a family of better-behaved problems whose smoother objective functions tend to guide the clustering towards a global minimum of the K-means objective. The method has the same per-iteration complexity as Lloyd's algorithm, reduces sensitivity to initialisation, and greatly improves performance in the high-dimensional case.
PKMeans(X, K, yitapower, sm, max.m, truere, method = 0)
X | the data matrix |
K | the number of clusters |
yitapower | the regularized parameter |
sm | the balance parameter |
max.m | the maximum number of iterations |
truere | the true labels in the data set |
method | calculate the NMI index |
center, NMI, result
Miao Yu
library(MASS)
# simulate a single-view data set with three Gaussian clusters and true labels
yitapower <- 0.5; K <- 3; sm <- 0.5; max.m <- 100; n1 <- n2 <- n3 <- 70
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
truere <- data[, 2]
X11 <- matrix(data[, 1], n1 + n2 + n3, 1)
PKMeans(X = X11, K = K, yitapower = yitapower, sm = sm, max.m = max.m,
        truere = truere, method = 0)
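A hedged variant of the example applies PKMeans to the bundled seed data set (same column-layout assumption as elsewhere in this manual; the parameter values are illustrative):

data(seed)
Xseed <- as.matrix(seed[, 1:7])
lab <- seed[, 8]
PKMeans(X = Xseed, K = 3, yitapower = 0.5, sm = 0.5, max.m = 100,
        truere = lab, method = 0)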
Five different QCM gas sensors were used and five different gas measurements were made for each sensor (1-octanol, 1-propanol, 2-butanol, 2-propanol and 1-isobutanol).
data("QCM")
data("QCM")
The format is: num [1:125, 1:15] -10.06 -9.69 -12.07 -14.21 -16.57 ...
The QCM data set with K=5.
https://www.sciencedirect.com/science/article/pii/S2215098619303337.
M. F. Adak, P. Lieberzeit, P. Jarujamrus, and N. Yumusak. Classification of alcohols obtained by QCM sensors with different characteristics using ABC based neural network. Engineering Science and Technology, an International Journal, 23(3):463-469, 2020. ISSN 2215-0986. doi:10.1016/j.jestch.2019.06.011. URL https://www.sciencedirect.com/science/article/pii/S2215098619303337.
data(QCM); str(QCM)
This function implements the regularized K-means clustering (RKMC) algorithm for the multi-view data clustering problem. Specifically, a regularisation term is added to the K-means objective to avoid overfitting the data. Numerical analysis shows that the RKMC algorithm significantly improves clustering performance compared with other methods. In addition, to reveal the structure of real data as faithfully as possible, to improve the clustering accuracy on high-dimensional data, and to balance the weight of each view, the RKMC algorithm assigns a learnable weight to each view, reflecting the relationships and compatibility of the views more flexibly.
RKMeans(X, K, V, yita, r, max.iter, truere, method = 0)
X | the data matrix |
K | the number of clusters |
V | the number of views of X |
yita | the regularized parameter |
r | the balance parameter |
max.iter | the maximum number of iterations |
truere | the true labels in the data set |
method | calculate the NMI index |
NMI, weight, center, result
Miao Yu
library(MASS)
library(Matrix)
# simulate a single-view data set with three Gaussian clusters and true labels
yita <- 0.5; V <- 2; K <- 3; r <- 0.5; max.iter <- 10; n1 <- n2 <- n3 <- 70
X1 <- rnorm(n1, 20, 2); X2 <- rnorm(n2, 25, 1.5); X3 <- rnorm(n3, 30, 2)
Xv <- c(X1, X2, X3)
data <- matrix(Xv, n1 + n2 + n3, 2)
data[1:70, 2] <- 1; data[71:140, 2] <- 2; data[141:210, 2] <- 3
X <- matrix(data[, 1], n1 + n2 + n3, 1)
truere <- data[, 2]
# build two candidate views from an SVD of the weight vector (lamda1, lamda2)
lamda1 <- 0.2; lamda2 <- 0.8
lamda <- matrix(c(lamda1, lamda2), nrow = 1, ncol = 2)
sol.svd <- svd(lamda)
U1 <- sol.svd$u; D1 <- sol.svd$d; V1 <- sol.svd$v
C1 <- t(U1)
Y1 <- C1 / D1
view <- V1
view1 <- matrix(view[1, ])
view2 <- matrix(view[2, ])
X1 <- matrix(view1, n1 + n2 + n3, 1)
X2 <- matrix(view2, n1 + n2 + n3, 1)
RKMeans(X = X1, K = K, V = V, yita = yita, r = r, max.iter = max.iter,
        truere = truere, method = 0)
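A further sketch runs RKMeans on the bundled sobar data set. It assumes that the last (20th) column of sobar is the class label from the original UCI source, which should be checked before relying on the result:

data(sobar)
Xs <- as.matrix(sobar[, 1:19])
lab <- sobar[, 20]
RKMeans(X = Xs, K = 2, V = 1, yita = 0.5, r = 0.5, max.iter = 10,
        truere = lab, method = 0)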
The Seeds data set holds measurements of the area, perimeter, compactness, kernel length, kernel width, asymmetry coefficient, and length of the kernel groove for different varieties of wheat seeds, together with a category label. The data set contains 210 records, 7 features, and one label with 3 categories.
data("seed")
data("seed")
The format is: num [1:210, 1:8] 15.3 14.9 14.3 13.8 16.1 ...
A single-view data set named seed.
http://archive.ics.uci.edu/ml/datasets/seeds
http://archive.ics.uci.edu/ml/datasets/seeds
data(seed); str(seed)
A single-view data set named Sobar. The Sobar data set records behavioural risk factors for cervical cancer and has 2 clusters.
data("sobar")
data("sobar")
The format is: num [1:72, 1:20] 10 10 10 10 8 10 10 8 10 7 ...
A single-view data set named sobar.
http://archive.ics.uci.edu/ml/datasets/Cervical+Cancer+Behavior+Risk
http://archive.ics.uci.edu/ml/datasets/Cervical+Cancer+Behavior+Risk
data(sobar); str(sobar)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("texas_cites")
data("texas_cites")
The format is: num [1:187, 1:187] 0 1 1 1 0 1 1 0 1 0 ...
The Texas data set contains four views and 5 clusters. This data set is the first view, with a sample size of 187 and 187 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(texas_cites); str(texas_cites)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("texas_content")
data("texas_content")
The format is: num [1:187, 1:1703] 0 0 0 0 0 0 0 0 0 0 ...
The Texas data set contains four views and 5 clusters. This data set is the second view, with a sample size of 187 and 1703 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(texas_content); str(texas_content)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("texas_inbound")
data("texas_inbound")
The format is: num [1:187, 1:187] 0 0 0 0 0 0 0 0 0 0 ...
The Texas data set contains four views and 5 clusters. This data set is the third view, with a sample size of 187 and 187 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(texas_inbound); str(texas_inbound)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("texas_outbound")
data("texas_outbound")
The format is: num [1:187, 1:187] 0 1 1 1 0 1 1 0 1 0 ...
The Texas data set contains four views and 5 clusters. This data set is the fourth view, with a sample size of 187 and 187 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(texas_outbound); str(texas_outbound)
True labels of the Movie data set. You can use them to calculate the accuracy of the clustering results.
data("turelabel")
data("turelabel")
A data frame with 617 observations on the following variable.
V1 | a numeric vector |
True labels of the Movie data set.
https://lig-membres.imag.fr/grimal/data.html.
C. Grimal. the multi-view movie data set. 2010. URL https://lig-membres.imag.fr/grimal/data.html.
data(turelabel); str(turelabel)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Washington_cites")
data("Washington_cites")
The format is: num [1:230, 1:230] 2 0 0 0 0 0 0 0 0 0 ...
The Washington data set contains four views and 5 clusters. This data set is the first view, with a sample size of 230 and 230 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Washington_cites); str(Washington_cites)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Washington_content")
data("Washington_content")
The format is: num [1:230, 1:1703] 0 0 0 0 0 0 0 0 0 0 ...
The Washington data set contains four views and 5 clusters. This data set is the second view, with a sample size of 230 and 1703 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Washington_content); str(Washington_content)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Washington_inbound")
data("Washington_inbound")
The format is: num [1:230, 1:230] 1 0 0 0 0 0 0 0 0 0 ...
The Washington data set contains four views and 5 clusters. This data set is the third view, with a sample size of 230 and 230 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Washington_inbound); str(Washington_inbound)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Washington_outbound")
data("Washington_outbound")
The format is: num [1:230, 1:230] 1 0 0 0 0 0 0 0 0 0 ...
The Washington data set contains four views and 5 clusters. This data set is the fourth view, with a sample size of 230 and 230 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Washington_outbound); str(Washington_outbound)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Wisconsin_cites")
data("Wisconsin_cites")
The format is: num [1:265, 1:265] 0 1 0 1 0 0 0 0 0 0 ...
The Wisconsin data set contains four views and 5 clusters. This data set is the first view, with a sample size of 265 and 265 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Wisconsin_cites); str(Wisconsin_cites)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Wisconsin_content")
data("Wisconsin_content")
The format is: num [1:265, 1:1703] 0 0 0 0 0 0 0 0 0 0 ...
The Wisconsin data set contains four views and 5 clusters. This data set is the second view, with a sample size of 265 and 1703 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Wisconsin_content); str(Wisconsin_content)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Wisconsin_inbound")
data("Wisconsin_inbound")
The format is: num [1:265, 1:265] 0 1 0 1 0 0 0 0 0 0 ...
The Wisconsin data set contains four views and 5 clusters. This data set is the third view, with a sample size of 265 and 265 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Wisconsin_inbound); str(Wisconsin_inbound)
Webkb data set contains web pages from four universities, with the corresponding clusters categorised as Professor, Student, Program, or Other pages. The data set contains four subsets of data, Cornell data set, Texas data set, Washington data set, and Wisconsin data set.
data("Wisconsin_outbound")
data("Wisconsin_outbound")
The format is: num [1:265, 1:265] 0 0 0 0 0 0 0 0 0 0 ...
The Wisconsin data set contains four views and 5 clusters. This data set is the fourth view, with a sample size of 265 and 265 features.
http://www.cs.cmu.edu/~webkb/
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98).
data(Wisconsin_outbound); str(Wisconsin_outbound)