
Consistent Biclusteringvia Fractional 0–1 Programming

Panos Pardalos, Stanislav Busygin and Oleg Prokopyev

Center for Applied OptimizationDepartment of Industrial & Systems Engineering

University of Florida


Outline: Introduction · Consistent Biclustering · Conclusions

Massive Datasets

The proliferation of massive datasets brings with it a series of special computational challenges. This data avalanche arises in a wide range of scientific and commercial applications.

In particular, microarray technology allows one to grasp simultaneously thousands of gene expressions throughout the entire genome. To extract useful information from such datasets, a sophisticated data mining algorithm is required.


Abello, J.; Pardalos, P.M.; Resende, M.G. (Eds.), Handbook of Massive Data Sets, Series: Massive Computing, Vol. 4, Kluwer, 2002.


Data Representation

A dataset (e.g., from microarray experiments) is normally given as a rectangular m × n matrix A, where each column represents a data sample (e.g., patient) and each row represents a feature (e.g., gene):

A = (a_ij)_{m×n},

where the value a_ij is the expression of the i-th feature in the j-th sample.
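As a minimal illustration (the values are hypothetical, not from the slides), such a data matrix can be held as a NumPy array with rows indexing features and columns indexing samples:

```python
import numpy as np

# Hypothetical toy data set: m = 3 features (rows), n = 4 samples (columns).
A = np.array([
    [5.1, 4.8, 0.3, 0.7],   # feature 0 (e.g., a gene)
    [4.2, 5.5, 1.0, 0.2],   # feature 1
    [0.4, 0.9, 6.1, 5.3],   # feature 2
])

m, n = A.shape   # a_ij = A[i, j]: expression of feature i in sample j
assert (m, n) == (3, 4)
```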


Major Data Mining Problems

Clustering (Unsupervised): Given a set of samples, partition them into groups of similar samples according to some similarity criteria.

Classification (Supervised Clustering): Determine the classes of the test samples using a known classification of the training data set.

Feature Selection: For each of the classes, select a subset of features responsible for creating the condition corresponding to the class (this is also a specific type of dimensionality reduction).

Outlier Detection: Some of the samples are not good representatives of any of the classes. Therefore, it is better to disregard them while performing data mining.


Major challenges in Data Mining

The typical noisiness of data arising in many data mining applications complicates the solution of data mining problems.

The high dimensionality of data makes complete search in most data mining problems computationally infeasible.

Some data values may be inaccurate or missing.

The available data may not be sufficient to obtain statistically significant conclusions.


Biclustering

Biclustering is a methodology allowing for feature set and test set clustering (supervised or unsupervised) simultaneously.

It finds clusters of samples possessing similar characteristics together with the features creating these similarities.

The required consistency of sample and feature classification gives biclustering an advantage over other methodologies, which treat the samples and features of a dataset separately from each other.


Figure: Partitioning of samples and features into 2 classes.


Survey on Biclustering Methodologies

“Direct Clustering” (Hartigan)

The algorithm begins with the entire data as a single block and then iteratively finds the row and column split of every block into two pieces. The splits are made so that the total variance within the blocks is minimized.

The whole partitioning procedure can be represented in a hierarchical manner by trees.

Drawback: this method does NOT optimize a global objective function.


Cheng & Church’s algorithm

The algorithm constructs one bicluster at a time using a statistical criterion: a low mean squared residue (the variance of the set of all elements in the bicluster, plus the mean row variance and the mean column variance).

Once a bicluster is created, its entries are replaced by random numbers, and the procedure is repeated iteratively.


Graph Bipartitioning

Define a bipartite graph G(F, S, E), where F is the set of data set features, S is the set of data set samples, and E is the set of weighted edges such that the weight E_ij = a_ij for the edge connecting i ∈ F with j ∈ S. A biclustering corresponds to a partitioning of the graph into bicliques.


Given vertex subsets V1 and V2, define

cut(V1, V2) = Σ_{i∈V1} Σ_{j∈V2} a_ij,

and for k vertex subsets V1, V2, ..., Vk,

cut(V1, V2, ..., Vk) = Σ_{i<j} cut(Vi, Vj).
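As a small sketch (function names and the toy matrix are my own, not from the slides), the cut values can be computed directly from A, since in the bipartite graph all edges run between features and samples:

```python
import numpy as np

def cut(A, V1, V2):
    """Total weight of edges between vertex subsets V1 and V2.
    Each V is a pair (feature_indices, sample_indices); only
    feature-sample edges exist, with weight a_ij."""
    f1, s1 = V1
    f2, s2 = V2
    return (sum(A[i, j] for i in f1 for j in s2)
            + sum(A[i, j] for i in f2 for j in s1))

def total_cut(A, parts):
    """cut(V1, ..., Vk) = sum of pairwise cuts over i < j."""
    return sum(cut(A, parts[i], parts[j])
               for i in range(len(parts)) for j in range(i + 1, len(parts)))

# Hypothetical 4x5 matrix with a rough checkerboard structure.
A = np.array([
    [5.0, 6.0, 1.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 5.0, 4.0, 5.0],
])
parts = [([0, 1], [0, 1]), ([2, 3], [2, 3, 4])]
print(total_cut(A, parts))  # prints 5.0: the weight crossing the two biclusters
```

Minimizing this quantity over partitions is exactly the biclustering objective stated on the next slide.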


Biclustering may be performed as

min_{V1, V2, ..., Vk} cut(V1, V2, ..., Vk)

on G, or with some modification of the definition of cut to favor balanced clusters.

This problem is NP-hard, but spectral heuristics show good performance [Dhillon].


Biclustering: Applications

Biological and Medical:

Microarray data analysis

Analysis of drug activity, Liu and Wang (2003)

Analysis of nutritional data, Lazzeroni et al. (2000)


Text Mining: Dhillon (2001, 2003)

Marketing: Gaul and Schader (1996)

Dimensionality Reduction in Databases: Agrawal et al. (1998)

Others:

electoral data - Hartigan (1972)
currency exchange - Lazzeroni et al. (2000)


Biclustering: Surveys

S. Madeira, A.L. Oliveira, Biclustering Algorithms for Biological Data Analysis: A Survey, 2004.

A. Tanay, R. Sharan, R. Shamir, Biclustering Algorithms: A Survey, 2004.

D. Jiang, C. Tang, A. Zhang, Cluster Analysis for Gene Expression Data: A Survey, 2004.


Conception of Consistent Biclustering · Supervised Biclustering · Unsupervised Biclustering

Definitions

A data set of n samples and m features is a matrix

A = (a_ij)_{m×n},

where the value a_ij is the expression of the i-th feature in the j-th sample.

We consider a classification of the samples into classes

S1, S2, ..., Sr,  Sk ⊆ {1, ..., n},  k = 1, ..., r,

S1 ∪ S2 ∪ ... ∪ Sr = {1, ..., n},

Sk ∩ Sℓ = ∅,  k, ℓ = 1, ..., r,  k ≠ ℓ.


This classification should be done so that samples from the same class share certain common properties. Correspondingly, a feature i may be assigned to one of the feature classes

F1, F2, ..., Fr,  Fk ⊆ {1, ..., m},  k = 1, ..., r,

F1 ∪ F2 ∪ ... ∪ Fr = {1, ..., m},

Fk ∩ Fℓ = ∅,  k, ℓ = 1, ..., r,  k ≠ ℓ,

in such a way that features of the class Fk are “responsible” for creating the class of samples Sk.


For microarray data, this may mean, for example, strong up-regulation of certain genes under a cancer condition of a particular type (whose samples constitute one class of the data set). Such a simultaneous classification of samples and features is called biclustering (or co-clustering).


Definition
A biclustering of a data set is a collection of pairs of sample and feature subsets

B = ((S1, F1), (S2, F2), ..., (Sr, Fr))

such that the collection (S1, S2, ..., Sr) forms a partition of the set of samples, and the collection (F1, F2, ..., Fr) forms a partition of the set of features.


Our Approach: Intuition

Let us distribute the features among the classes of the training set such that each feature belongs to the class where its average expression among the training samples is highest.

Now, if we transpose the matrix, take the feature classification as given, and re-classify the training samples according to the highest average expression values in the feature classes, will we obtain the same training set classification?

If yes, we will say that we obtained a consistent biclustering.


Consistent Biclustering

Let each sample be already assigned somehow to one of the classes S1, S2, ..., Sr. Introduce a 0–1 matrix S = (s_jk)_{n×r} such that s_jk = 1 if j ∈ Sk, and s_jk = 0 otherwise.

The sample class centroids can be computed as the matrix C = (c_ik)_{m×r}:

C = A S (S^T S)^{-1},

whose k-th column represents the centroid of the class Sk.
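A quick numerical check of the centroid formula (toy values of my own): since S^T S is the diagonal matrix of class sizes, the k-th column of C is simply the mean of the sample columns assigned to S_k:

```python
import numpy as np

# Hypothetical toy data: m = 4 features, n = 5 samples.
A = np.array([
    [5.0, 6.0, 1.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 5.0, 4.0, 5.0],
])

# Sample classification: samples 0,1 -> class 0; samples 2,3,4 -> class 1.
S = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
])

# Centroids C = A S (S^T S)^{-1}; column k is the centroid of class S_k.
C = A @ S @ np.linalg.inv(S.T @ S)

# S^T S is diagonal with the class sizes, so column k of C equals the
# mean of the sample columns assigned to class k.
assert np.allclose(C[:, 0], A[:, :2].mean(axis=1))
assert np.allclose(C[:, 1], A[:, 2:].mean(axis=1))
```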


Consider a row i of the matrix C. Each value in it gives us the average expression of the i-th feature in one of the sample classes. As we want to identify the checkerboard pattern in the data, we have to assign the feature to the class where it is most expressed. So, let us classify the i-th feature to the class k̂ with the maximal value c_ik̂:

i ∈ F_k̂ ⇒ c_ik̂ > c_ik,  ∀k = 1, ..., r, k ≠ k̂.


Using the classification of all features into classes F1, F2, ..., Fr, let us construct a classification of samples using the same principle of maximal average expression. We construct a 0–1 matrix F = (f_ik)_{m×r} such that f_ik = 1 if i ∈ Fk and f_ik = 0 otherwise. Then the feature class centroids can be computed in the form of the matrix D = (d_jk)_{n×r}:

D = A^T F (F^T F)^{-1},

whose k-th column represents the centroid of the class Fk.


The condition on sample classification we need to verify is

j ∈ S_k̂ ⇒ d_jk̂ > d_jk,  ∀k = 1, ..., r, k ≠ k̂.


Definition
A biclustering B will be called consistent if the following relations hold:

i ∈ F_k̂ ⇒ c_ik̂ > c_ik,  ∀k = 1, ..., r, k ≠ k̂,

j ∈ S_k̂ ⇒ d_jk̂ > d_jk,  ∀k = 1, ..., r, k ≠ k̂.
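These two conditions can be checked mechanically. Here is a small sketch (the function name and toy data are my own, not from the slides) that builds the indicator matrices, computes C and D, and verifies the strict-maximum conditions:

```python
import numpy as np

def is_consistent(A, sample_class, feature_class, r):
    """Check biclustering consistency: every feature attains its strict
    maximum average expression in its own sample class, and every sample
    attains its strict maximum average expression in its own feature class."""
    m, n = A.shape
    S = np.zeros((n, r)); S[np.arange(n), sample_class] = 1
    F = np.zeros((m, r)); F[np.arange(m), feature_class] = 1
    C = A @ S @ np.linalg.inv(S.T @ S)    # feature-by-sample-class averages
    D = A.T @ F @ np.linalg.inv(F.T @ F)  # sample-by-feature-class averages
    feat_ok = all(C[i, k] > np.delete(C[i], k).max()
                  for i, k in enumerate(feature_class))
    samp_ok = all(D[j, k] > np.delete(D[j], k).max()
                  for j, k in enumerate(sample_class))
    return feat_ok and samp_ok

# Checkerboard toy data: this partition should come out consistent.
A = np.array([
    [5.0, 6.0, 1.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 5.0, 4.0, 5.0],
])
assert is_consistent(A, [0, 0, 1, 1, 1], [0, 0, 1, 1], r=2)
```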


Definition
A data set is biclustering-admitting if some consistent biclustering for it exists.

Definition
A data set will be called conditionally biclustering-admitting with respect to a given (partial) classification of some samples and/or features if there exists a consistent biclustering preserving the given (partial) classification.


A consistent biclustering implies separability of the classes by convex cones.

Theorem
Let B be a consistent biclustering. Then there exist convex cones P1, P2, ..., Pr ⊆ R^m such that all samples from Sk belong to the cone Pk and no other sample belongs to it, k = 1, ..., r. Similarly, there exist convex cones Q1, Q2, ..., Qr ⊆ R^n such that all features from Fk belong to the cone Qk and no other feature belongs to it, k = 1, ..., r.


Conic Separability

Proof.

Let Pk be the conic hull of the samples of Sk. Suppose a sample ĵ ∈ Sℓ, ℓ ≠ k, belongs to Pk. Then

a_·ĵ = Σ_{j∈Sk} γ_j a_·j,

where γ_j ≥ 0. Biclustering consistency implies that d_ĵℓ > d_ĵk, that is,

(Σ_{i∈Fℓ} a_iĵ) / |Fℓ| > (Σ_{i∈Fk} a_iĵ) / |Fk|.


Proof (cont’d).

Plugging in the conic representation of a_iĵ, we obtain

Σ_{j∈Sk} γ_j d_jℓ > Σ_{j∈Sk} γ_j d_jk,

which contradicts d_jℓ < d_jk (also implied by biclustering consistency).

Similarly, we can show that the formulated conic separability holds for the feature classes.


Supervised Biclustering

One of the most important problems for real-life data mining applications is the supervised classification of test samples on the basis of information provided by training data.

A supervised classification method consists of two routines: the first derives classification criteria while processing the training samples, and the second applies these criteria to the test samples.


In genomic and proteomic data analysis, as well as in other data mining applications where only a small subset of features is expected to be relevant to the classification of interest, the classification criteria should involve dimensionality reduction and feature selection.

We handle such a task utilizing the notion of consistent biclustering. Namely, we select a subset of features of the original data set in such a way that the obtained subset of data becomes conditionally biclustering-admitting with respect to the given classification of training samples.


Fractional 0–1 Programming Formulation

Formally, let us introduce a vector of 0–1 variables x = (x_i)_{i=1...m} and consider the i-th feature selected if x_i = 1.

The condition of biclustering consistency, when only the selected features are used, becomes

(Σ_{i=1}^m a_ij f_ik̂ x_i) / (Σ_{i=1}^m f_ik̂ x_i) > (Σ_{i=1}^m a_ij f_ik x_i) / (Σ_{i=1}^m f_ik x_i),

∀ j ∈ S_k̂,  k̂, k = 1, ..., r,  k ≠ k̂.


We will use the fractional relations as constraints of an optimization problem selecting the feature set. It may incorporate various objective functions over x, depending on the desirable properties of the selected features, but one general choice is to select the maximal possible number of features in order to lose the minimal amount of information provided by the training set. In this case, the objective function is

max Σ_{i=1}^m x_i.


One of the possible fractional 0–1 formulations based on the biclustering criterion:

max_{x∈B^m} Σ_{i=1}^m x_i,

s.t.

(Σ_{i=1}^m a_ij f_ik̂ x_i) / (Σ_{i=1}^m f_ik̂ x_i) ≥ (1 + t) · (Σ_{i=1}^m a_ij f_ik x_i) / (Σ_{i=1}^m f_ik x_i),

∀ j ∈ S_k̂,  k̂, k = 1, ..., r,  k ≠ k̂,

where t is a class separation parameter.
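For a tiny instance, this program can be solved by brute-force enumeration over x ∈ {0,1}^m (a sketch with my own names and toy data; only practical for very small m, which is exactly why the linearization discussed next matters):

```python
import itertools
import numpy as np

def select_features(A, sample_class, feature_class, r, t=0.0):
    """Brute-force the fractional 0-1 program: maximize the number of
    selected features subject to the (1+t)-separation constraints."""
    m, n = A.shape
    best = None
    for bits in itertools.product([0, 1], repeat=m):
        x = np.array(bits)
        # each feature class must retain at least one selected feature
        sizes = [x[[i for i in range(m) if feature_class[i] == k]].sum()
                 for k in range(r)]
        if min(sizes) == 0:
            continue
        # avg[k][j]: average over selected features of class k in sample j
        avg = [[sum(A[i, j] for i in range(m)
                    if feature_class[i] == k and x[i]) / sizes[k]
                for j in range(n)] for k in range(r)]
        ok = all(avg[sample_class[j]][j] >= (1 + t) * avg[k][j]
                 for j in range(n) for k in range(r) if k != sample_class[j])
        if ok and (best is None or x.sum() > best.sum()):
            best = x
    return best

A = np.array([
    [5.0, 6.0, 1.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 5.0, 6.0],
    [1.0, 0.0, 5.0, 4.0, 5.0],
])
# On this checkerboard toy instance all 4 features survive selection.
x = select_features(A, [0, 0, 1, 1, 1], [0, 0, 1, 1], r=2, t=0.5)
```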


Generally, in the framework of fractional 0–1 programming we consider problems where we optimize a multiple-ratio fractional 0–1 function subject to a set of linear constraints.

Here we have a new class of fractional 0–1 programming problems, where the fractional terms are not in the objective function but in the constraints, i.e., we optimize a linear objective function subject to fractional constraints.

How do we solve such a fractionally constrained 0–1 programming problem?


Linear Mixed 0–1 Formulation

We can reduce our problem to a linear mixed 0–1 programming problem by applying an approach similar to the one used to linearize problems with a fractional 0–1 objective function:

T.-H. Wu, A note on a global approach for general 0–1 fractional programming, European J. Oper. Res. 101 (1997) 220–223.


Theorem
A polynomial mixed 0–1 term z = xy, where x is a 0–1 variable and y is a continuous variable, can be represented by the following linear inequalities:
(1) z ≤ Ux;
(2) z ≤ y + L(x − 1);
(3) z ≥ y + U(x − 1);
(4) z ≥ Lx,
where U and L are upper and lower bounds on the variable y, i.e., L ≤ y ≤ U.
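A quick sanity check of the theorem (the bounds and grid below are arbitrary hypothetical choices): for x ∈ {0, 1} and L ≤ y ≤ U, the four inequalities leave z = xy as the only feasible value among the grid candidates:

```python
# Hypothetical bounds L <= y <= U and a small test grid of z-candidates.
L, U = -2.0, 3.0
ys = [-2.0, -0.5, 0.0, 1.5, 3.0]
grid = sorted(set(ys + [0.0]))

for x in (0, 1):
    for y in ys:
        feasible = [z for z in grid
                    if z <= U * x                  # (1) z <= Ux
                    and z <= y + L * (x - 1)       # (2) z <= y + L(x-1)
                    and z >= y + U * (x - 1)       # (3) z >= y + U(x-1)
                    and z >= L * x]                # (4) z >= Lx
        # For x = 0 the system forces z = 0; for x = 1 it forces z = y.
        assert feasible == [x * y], (x, y, feasible)
```

Intuitively, (1) and (4) pin z to 0 when x = 0, while (2) and (3) pin z to y when x = 1.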


To linearize the fractional 0–1 program, we need to introduce new variables y_k:

y_k = 1 / (Σ_{ℓ=1}^m f_ℓk x_ℓ),  k = 1, ..., r.


In terms of the new variables, the fractional constraints are replaced by

Σ_{i=1}^m a_ij f_ik̂ x_i y_k̂ ≥ (1 + t) Σ_{i=1}^m a_ij f_ik x_i y_k.


Next, observe that the term x_i y_k is present if and only if f_ik = 1, i.e., i ∈ Fk. So, there are in total only m such products, and hence we can introduce m variables z_i = x_i y_k, i ∈ Fk:

z_i = x_i / (Σ_{ℓ=1}^m f_ℓk x_ℓ),  i ∈ Fk.


In terms of z_i we have the following constraints:

Σ_{i=1}^m f_ik z_i = 1,  k = 1, ..., r,

Σ_{i=1}^m a_ij f_ik̂ z_i ≥ (1 + t) Σ_{i=1}^m a_ij f_ik z_i,  ∀ j ∈ S_k̂,  k̂, k = 1, ..., r,  k ≠ k̂,

y_k − z_i ≤ 1 − x_i,  z_i ≤ y_k,  z_i ≤ x_i,  z_i ≥ 0,  i ∈ Fk.


Supervised Biclustering

Unfortunately, while the linearization works nicely for small-size problems, for larger problems it often creates instances where the gap between the integer programming optimum and the linear programming relaxation optimum is very big. As a consequence, such an instance cannot be solved in a reasonable time even with the best techniques implemented in modern integer programming solvers.

HuGE Index data set: about 7000 features

ALL vs. AML data set: about 7000 features

GBM vs. AO data set: about 12000 features


Heuristic

If we know that no more than m_k features can be selected for class Fk, then we can impose

x_i ≤ m_k z_i,  x_i ≥ z_i,  i ∈ Fk.


Algorithm 1
1. Assign m_k := |F_k|, k = 1, ..., r.
2. Solve the mixed 0–1 programming formulation using the inequalities
   x_i ≤ m_k z_i,  x_i ≥ z_i,  i ∈ Fk,
   instead of
   y_k − z_i ≤ 1 − x_i,  z_i ≤ y_k,  z_i ≤ x_i,  z_i ≥ 0,  i ∈ Fk.
3. If m_k = Σ_{i=1}^m f_ik x_i for all k = 1, ..., r, go to 6.
4. Assign m_k := Σ_{i=1}^m f_ik x_i for all k = 1, ..., r.
5. Go to 2.
6. STOP.


After the feature selection is done, we perform classification of test samples according to the following procedure.

If $b = (b_i)_{i=1 \ldots m}$ is a test sample, we assign it to the class $F_{\hat{k}}$ satisfying

$$\frac{\sum_{i=1}^m b_i f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} > \frac{\sum_{i=1}^m b_i f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i}, \quad k = 1 \ldots r, \; k \ne \hat{k}.$$
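The rule can be sketched directly in code; the names below (`classify`, arguments `b`, `F`, `x`) are illustrative, not from the original:

```python
# Sketch of the classification rule: assign test sample b to the class whose
# selected features give the largest average expression of b.
# F[i][k] is the 0-1 feature indicator, x[i] the 0-1 feature-selection value.

def classify(b, F, x):
    r = len(F[0])

    def class_average(k):
        num = sum(b[i] * F[i][k] * x[i] for i in range(len(b)))
        den = sum(F[i][k] * x[i] for i in range(len(b)))
        return num / den

    return max(range(r), key=class_average)
```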


HuGE index data set: Feature Selection

A computational experiment that we conducted was on feature selection for consistent biclustering of the Human Gene Expression (HuGE) Index data set. The purpose of the HuGE project is to provide a comprehensive database of gene expressions in normal tissues of different parts of the human body and to highlight similarities and differences among the organ systems.

The number of selected features (genes) is 6889 (out of 7070).


Figure: HuGE Index heatmap.


ALL vs. AML data set

T. Golub et al. (1999) considered a dataset containing 47 samples from ALL patients and 25 samples from AML patients. The dataset was obtained with Affymetrix GeneChips. Our biclustering algorithm selected 3439 features for class ALL and 3242 features for class AML. The subsequent classification contained only one error: the AML sample 66 was classified into the ALL class. The SVM approach delivers up to 5 classification errors depending on how the parameters of the method are tuned; perfect classification was obtained only with one specific set of parameter values.


Figure: ALL vs. AML heatmap.


GBM vs. AO data set

The algorithm selected 3875 features for the class GBM and 2398 features for the class AO. The obtained classification contained only 4 errors: two GBM samples (Brain NG 1 and Brain NG 2) were classified into the AO class, and two AO samples (Brain NO 14 and Brain NO 8) were classified into the GBM class.


Figure: GBM vs. AO heatmap.


References

S. Busygin, P.M. Pardalos, O.A. Prokopyev, "Feature selection for consistent biclustering via fractional 0–1 programming," Journal of Combinatorial Optimization, Vol. 10/1 (2005), pp. 7–21.

P.M. Pardalos, S. Busygin, O.A. Prokopyev, "On Biclustering with Feature Selection for Microarray Data Sets," BIOMAT 2005 – International Symposium on Mathematical and Computational Biology, R. Mondaini (ed.), World Scientific (2006), pp. 367–378.


Unsupervised Biclustering

Suppose we want to assign each sample to one of the classes

$$S_1, S_2, \ldots, S_r.$$

We introduce a 0–1 matrix $S = (s_{jk})_{n \times r}$ such that $s_{jk} = 1$ if $j \in S_k$, and $s_{jk} = 0$ otherwise.

We also want to classify all features into classes

$$F_1, F_2, \ldots, F_r.$$

Let us introduce a 0–1 matrix $F = (f_{ik})_{m \times r}$ such that $f_{ik} = 1$ if $i \in F_k$ and $f_{ik} = 0$ otherwise.
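Such indicator matrices can be built from index sets with a small helper (an illustrative sketch, not code from the talk):

```python
# Build a 0-1 indicator matrix M = (m_ik) with m_ik = 1 iff item i is in
# classes[k]; works for both F (features) and S (samples).

def indicator_matrix(n_items, classes):
    M = [[0] * len(classes) for _ in range(n_items)]
    for k, members in enumerate(classes):
        for i in members:
            M[i][k] = 1
    return M
```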


We have the following constraints on biclustering consistency:

$$s_{j\hat{k}} \left( \frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}}}{\sum_{i=1}^m f_{i\hat{k}}} - (1+t) \frac{\sum_{i=1}^m a_{ij} f_{ik}}{\sum_{i=1}^m f_{ik}} \right) \ge 0 \quad \forall j, \hat{k}, \; k = 1 \ldots r, \; k \ne \hat{k}$$

$$f_{i\hat{k}} \left( \frac{\sum_{j=1}^n a_{ij} s_{j\hat{k}}}{\sum_{j=1}^n s_{j\hat{k}}} - (1+t) \frac{\sum_{j=1}^n a_{ij} s_{jk}}{\sum_{j=1}^n s_{jk}} \right) \ge 0 \quad \forall i, \hat{k}, \; k = 1 \ldots r, \; k \ne \hat{k}$$
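For small instances these constraints can be verified by direct enumeration. The sketch below (names assumed) takes the $m \times n$ data matrix $A$, the 0–1 matrices $S$ and $F$, and the separation parameter $t$:

```python
def is_consistent(A, S, F, t=0.0):
    """Check the biclustering consistency constraints by enumeration."""
    m, n, r = len(A), len(A[0]), len(F[0])

    def feat_avg(j, k):   # average of sample j over features of class F_k
        den = sum(F[i][k] for i in range(m))
        return sum(A[i][j] * F[i][k] for i in range(m)) / den

    def samp_avg(i, k):   # average of feature i over samples of class S_k
        den = sum(S[j][k] for j in range(n))
        return sum(A[i][j] * S[j][k] for j in range(n)) / den

    # each classified sample must dominate (by factor 1+t) in its own class
    for j in range(n):
        for kh in range(r):
            if S[j][kh] and any(
                feat_avg(j, kh) < (1 + t) * feat_avg(j, k)
                for k in range(r) if k != kh
            ):
                return False
    # and symmetrically for each classified feature
    for i in range(m):
        for kh in range(r):
            if F[i][kh] and any(
                samp_avg(i, kh) < (1 + t) * samp_avg(i, k)
                for k in range(r) if k != kh
            ):
                return False
    return True
```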


These constraints are equivalent to

$$\frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}}}{\sum_{i=1}^m f_{i\hat{k}}} - (1+t) \frac{\sum_{i=1}^m a_{ij} f_{ik}}{\sum_{i=1}^m f_{ik}} \ge -L^s_j (1 - s_{j\hat{k}})$$

$$\frac{\sum_{j=1}^n a_{ij} s_{j\hat{k}}}{\sum_{j=1}^n s_{j\hat{k}}} - (1+t) \frac{\sum_{j=1}^n a_{ij} s_{jk}}{\sum_{j=1}^n s_{jk}} \ge -L^f_i (1 - f_{i\hat{k}})$$


$L^f_i$ and $L^s_j$ are large enough constants, which can be chosen as

$$L^s_j = \max_i a_{ij} - \min_i a_{ij}$$

$$L^f_i = \max_j a_{ij} - \min_j a_{ij}$$
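These big-M constants are easy to compute from the data matrix (sketch; `big_m_constants` is an illustrative name):

```python
# Big-M constants for an m x n data matrix A (features x samples):
# L^s_j is the range of column j, L^f_i is the range of row i.

def big_m_constants(A):
    Ls = [max(col) - min(col) for col in zip(*A)]   # per sample j
    Lf = [max(row) - min(row) for row in A]         # per feature i
    return Ls, Lf
```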


Linear Mixed 0–1 Reformulation

Let us introduce new variables

$$u_k = \frac{1}{\sum_{i=1}^m f_{ik}}, \quad k = 1 \ldots r,$$

$$v_k = \frac{1}{\sum_{j=1}^n s_{jk}}, \quad k = 1 \ldots r,$$

$$z_{ik} = \frac{f_{ik}}{\sum_{\ell=1}^m f_{\ell k}}, \quad i = 1 \ldots m, \; k = 1 \ldots r,$$

$$y_{jk} = \frac{s_{jk}}{\sum_{\ell=1}^n s_{\ell k}}, \quad j = 1 \ldots n, \; k = 1 \ldots r.$$


$$\sum_{i=1}^m a_{ij} z_{i\hat{k}} - (1+t) \sum_{i=1}^m a_{ij} z_{ik} \ge -L^s_j (1 - s_{j\hat{k}}) \quad \forall j, \hat{k}, \; k = 1 \ldots r, \; k \ne \hat{k},$$

$$\sum_{j=1}^n a_{ij} y_{j\hat{k}} - (1+t) \sum_{j=1}^n a_{ij} y_{jk} \ge -L^f_i (1 - f_{i\hat{k}}) \quad \forall i, \hat{k}, \; k = 1 \ldots r, \; k \ne \hat{k},$$

$$\sum_{i=1}^m z_{ik} = 1, \quad \sum_{j=1}^n y_{jk} = 1, \quad k = 1 \ldots r,$$

$$u_k - z_{ik} \le 1 - f_{ik}, \quad z_{ik} \le u_k, \quad z_{ik} \le f_{ik}, \quad z_{ik} \ge 0, \quad \forall i, \; k = 1 \ldots r,$$

$$v_k - y_{jk} \le 1 - s_{jk}, \quad y_{jk} \le v_k, \quad y_{jk} \le s_{jk}, \quad y_{jk} \ge 0, \quad \forall j, \; k = 1 \ldots r.$$


The number of new variables $u_k$, $v_k$ is $2r$.

The number of new linearization variables $z_{ik}$, $y_{jk}$ is $(m + n)r$ (each takes only the values $0$, $u_k$, or $v_k$, so they can be treated as continuous).

The total number of new variables is

$$2r + (m + n)r.$$


Additional Constraints

Each feature can be selected to at most one class:

$$\sum_{k=1}^r f_{ik} \le 1 \quad \forall i$$

Each sample can be assigned to at most one class (unassigned samples are treated as outliers):

$$\sum_{k=1}^r s_{jk} \le 1 \quad \forall j$$


Each class must contain at least one feature:

$$\sum_{i=1}^m f_{ik} \ge 1 \quad \forall k$$

Each class must contain at least one sample:

$$\sum_{j=1}^n s_{jk} \ge 1 \quad \forall k$$
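All four structural constraints are simple row and column sums; a direct check (sketch, names assumed):

```python
# Check the assignment constraints on indicator matrices F (m x r), S (n x r):
# every row sums to at most 1 (at most one class per feature/sample),
# every column sums to at least 1 (no empty class).

def structure_ok(F, S):
    rows_ok = all(sum(row) <= 1 for row in F + S)
    cols_ok = all(sum(col) >= 1 for col in list(zip(*F)) + list(zip(*S)))
    return rows_ok and cols_ok
```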


Objective Function

We formulate the biclustering problem with feature selection and outlier detection as an optimization task and use the objective function to minimize the information loss. In other words, the goal is to select as many features and samples as possible while at the same time satisfying the constraints on biclustering consistency. The objective function may be expressed as

$$\max \; m \cdot \sum_{k=1}^r \sum_{j=1}^n s_{jk} + n \cdot \sum_{k=1}^r \sum_{i=1}^m f_{ik}$$
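The objective simply counts classified samples and selected features, weighted so that neither term dominates the other; as a sketch (illustrative name):

```python
# Objective value m * (number of classified samples) + n * (number of
# selected features) for 0-1 matrices F (m x r) and S (n x r).

def objective_value(F, S):
    m, n = len(F), len(S)
    return m * sum(map(sum, S)) + n * sum(map(sum, F))
```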


Random Data Simulation Results

We studied the existence of large biclustering patterns in random data sets ($n = 30$ and $m = 30$).

One would expect that such patterns would be extremely rare, since the consistent biclustering criterion is rather strong.


Surprisingly, the numerical experiments showed that for a small number of classes ($r \le 3$) the checkerboard pattern can be obtained on the basis of almost the entire data set (in the case of $r = 2$), or at least on the basis of half of the data set ($r = 3$).


This result questions the general value of unsupervised biclustering techniques with a small number of classes. Unless some specific, strongly expressed pattern exists in the data, unsupervised biclustering with a small number of classes can find a partitioning of the data set with no relevance to the phenomenon of interest.


Challenges

This formulation is currently computationally intractable for data sets with more than a few hundred samples/features.

New methods for solving fractionally constrained 0–1 optimization problems are needed!


Alternative Computational Approach

Similarly to clustering algorithms such as k-means and SOM, we can try to achieve consistent biclustering by an iterative process.

1. Start from a random partition of samples into k groups.

2. Put each feature into the class where its average expression value is largest with respect to the partition of samples.

3. Put each sample into the class where its average expression value is largest with respect to the partition of features.

4. If at least one sample or feature was moved, go to 2.
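The alternating procedure above can be sketched in Python. Two assumptions in this sketch deviate from the slides: the initial partition is passed in explicitly (the slides start from a random one), and ties are broken toward the lower class index:

```python
def iterative_bicluster(A, r, s_init, max_iters=100):
    """Alternate feature and sample reassignment until nothing moves.
    A is the m x n data matrix; s_init[j] is the initial class of sample j."""
    m, n = len(A), len(A[0])
    s, f = list(s_init), [0] * m

    def best_class(values, labels, size):
        # class with the largest average of `values` over items labelled k
        def avg(k):
            group = [values[t] for t in range(size) if labels[t] == k]
            return sum(group) / len(group) if group else float("-inf")
        return max(range(r), key=avg)

    for _ in range(max_iters):
        f_new = [best_class(A[i], s, n) for i in range(m)]          # step 2
        s_new = [best_class([A[i][j] for i in range(m)], f_new, m)  # step 3
                 for j in range(n)]
        if f_new == f and s_new == s:                               # step 4
            break
        f, s = f_new, s_new
    return f, s
```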


The convergence of the procedure is not guaranteed, but in some instances it delivers plausible results.

The procedure cannot perform feature selection and outlier detection explicitly, but some of the created clusters may be easily recognized as "junk" if their separation is weak.

On the HuGE dataset, the procedure clearly designates the classes BRA, LI, and MU.


HuGE index data set: Unsupervised Result

Figure: HuGE Index heatmap.


Conclusions-I

We proposed a data mining methodology that utilizes both sample and feature patterns and is able to perform feature selection, classification, and unsupervised learning.

In contrast to other biclustering schemes, consistent biclustering is justified by the conic separation property.


Conclusions-II

The obtained fractional 0–1 programming problem for supervised biclustering is tractable via a relaxation-based heuristic. The method requires the user to provide only one parameter (t, a class separation parameter), which is particularly attractive for biomedical researchers who are not experts in data mining.

The consistent biclustering framework is also viable for unsupervised learning, though the fractional 0–1 programming formulation becomes intractable for real-life datasets. Alternative approaches are possible.


Conclusions-III

A general challenge for data mining research is not to be "fooled by randomness": revealed patterns should have a negligible probability of appearing in random data. Unfortunately, this is not the case for unsupervised clustering into a small number of classes.
