
Clustering and Matlab - QuestionInBox


8/19/2019 Clustering and Matlab - QuestionInBox

http://slidepdf.com/reader/full/clustering-and-matlab-questioninbox 1/6


clustering and matlab

Visits(95) Posted by AmroGarrith Graham 3.2k

Hi, I'm trying to cluster some data I have from the KDD Cup 1999 dataset.

The output from the file looks like this:

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.

with 48 thousand different records in that format. I have cleaned the data up and removed the text, keeping only the numbers. The output looks like this now:

I created a comma-delimited file in Excel and saved it as a CSV file, then created a data source from the CSV file in MATLAB. I've tried running it through the FCM toolbox in MATLAB (findcluster outputs 38 data types, which is expected with 38 columns).

The clusters, however, don't look like clusters, or it's not accepting the data and working the way I need it to.

Could anyone help with finding the clusters? I'm new to MATLAB, so I don't have any experience, and I'm also new to clustering.

The method:

1. Choose the number of clusters (K)
2. Initialize centroids (K patterns randomly chosen from the data set)
3. Assign each pattern to the cluster with the closest centroid
4. Calculate the mean of each cluster to be its new centroid
5. Repeat from step 3 until a stopping criterion is met (no pattern moves to another cluster)
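The five steps above can be sketched in plain MATLAB roughly as follows. This is only an illustration under assumptions: the toy data X, the value of K, and the maxIter safety cap are made up for the example and are not taken from the question, and the sketch is plain k-means, not what the fcm function does internally.

```matlab
% Minimal k-means sketch following the five steps listed above.
% X, K and maxIter are illustrative assumptions, not values from the question.
X = [randn(50, 2); randn(50, 2) + 5];     % toy data: two well-separated blobs
K = 2;                                    % step 1: choose the number of clusters
maxIter = 100;                            % safety cap on the number of passes

n = size(X, 1);
perm = randperm(n);
centroids = X(perm(1:K), :);              % step 2: K random patterns as centroids
labels = zeros(n, 1);                     % 0 means "not assigned yet"

for iter = 1:maxIter
    % step 3: squared Euclidean distance from every pattern to every centroid
    D = zeros(n, K);
    for k = 1:K
        diffs = X - repmat(centroids(k, :), n, 1);
        D(:, k) = sum(diffs .^ 2, 2);
    end
    [~, newLabels] = min(D, [], 2);       % index of the closest centroid per row

    % step 5: stop once no pattern moves to another cluster
    if isequal(newLabels, labels)
        break;
    end
    labels = newLabels;

    % step 4: each centroid becomes the mean of its assigned patterns
    for k = 1:K
        members = (labels == k);
        if any(members)                   % leave an empty cluster's centroid unchanged
            centroids(k, :) = mean(X(members, :), 1);
        end
    end
end
```

After the loop, labels holds a cluster index between 1 and K for every row of X and centroids holds one row per cluster. Because the starting centroids are random, different runs can end in different partitions.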

This is what I'm trying to achieve:


This is what I'm getting:

load kddcup1.dat
plot(kddcup1(:,1),kddcup1(:,2),'o')
[center,U,objFcn] = fcm(kddcup1,2);
Iteration count = 1, obj. fcn = 253224062681230720.000000
Iteration count = 2, obj. fcn = 241493132059137410.000000
Iteration count = 3, obj. fcn = 241484544542298110.000000
Iteration count = 4, obj. fcn = 241439204971005280.000000
Iteration count = 5, obj. fcn = 241090628742523840.000000
Iteration count = 6, obj. fcn = 239363408546874750.000000
Iteration count = 7, obj. fcn = 238580863900727680.000000
Iteration count = 8, obj. fcn = 238346826370420990.000000
Iteration count = 9, obj. fcn = 237617756429912510.000000
Iteration count = 10, obj. fcn = 226364785036628320.000000
Iteration count = 11, obj. fcn = 94590774984961184.000000
Iteration count = 12, obj. fcn = 2220521449216102.500000
Iteration count = 13, obj. fcn = 2220521273191876.200000
Iteration count = 14, obj. fcn = 2220521273191876.700000
Iteration count = 15, obj. fcn = 2220521273191876.700000

figure
plot(objFcn)
title('Objective Function Values')
xlabel('Iteration Count')
ylabel('Objective Function Value')

maxU = max(U);
index1 = find(U(1, :) == maxU);
index2 = find(U(2, :) == maxU);
figure
line(kddcup1(index1, 1), kddcup1(index1, 2), 'linestyle',...
    'none','marker', 'o','color','g');
line(kddcup1(index2,1),kddcup1(index2,2),'linestyle',...
    'none','marker', 'x','color','r');
hold on
plot(center(1,1),center(1,2),'ko','markersize',15,'LineWidth',2)
plot(center(2,1),center(2,2),'kx','markersize',15,'LineWidth',2)

Answer (1) matlab machine-learning cluster-analysis data-mining fuzzy

# 1

Answered by Amro


Since you are new to machine learning/data mining, you shouldn't tackle such advanced problems. After all, the data you are working with was used in a competition (KDD Cup'99), so don't expect it to be easy!

Besides, the data was intended for a classification task (supervised learning), where the goal is to predict the correct class (bad/good connection). You seem to be interested in clustering (unsupervised learning), which is generally more difficult.

This sort of dataset requires a lot of preprocessing and clever feature extraction. People usually employ domain knowledge (network intrusion detection) to obtain better features from the raw data. Directly applying simple algorithms like K-means will generally yield poor results.

For starters, you need to normalize the attributes to be of the same scale: when computing the Euclidean distance as part of step 3 in your method, the features with values such as 239 and 486 will dominate over the other features with small values such as 0.05, thus disrupting the result.
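As a rough illustration of that normalization step, here is a minimal z-score sketch in plain MATLAB. The matrix X is made up for the example (only 239, 486 and 0.05 echo the record quoted in the question), and z-scoring is just one common choice of rescaling, not the only one.

```matlab
% Toy rows in the spirit of the question's data. Only 239, 486 and 0.05 come
% from the quoted record; every other value is an illustrative assumption.
X = [239  486 0.05 0;
     235 1337 0.00 0;
     219 1337 1.00 0;
     217 2032 0.50 0];

n = size(X, 1);
mu = mean(X, 1);                 % per-column mean
sigma = std(X, 0, 1);            % per-column standard deviation
sigma(sigma == 0) = 1;           % constant columns: avoid dividing by zero

% z-score: every column ends up with zero mean and, unless constant, unit std
Z = (X - repmat(mu, n, 1)) ./ repmat(sigma, n, 1);
```

After this, a column that originally ranged over the hundreds and a column that ranged over 0 to 1 contribute on a comparable scale to the Euclidean distance. If the Statistics Toolbox is available, its zscore function does much the same thing.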

Another point to remember is that too many attributes can be a bad thing (curse of dimensionality). Thus you should look into feature selection or dimensionality reduction techniques.
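One widely used dimensionality reduction technique is principal component analysis (PCA). The sketch below does it with the built-in svd function, so no toolbox is needed. It is an assumption-laden illustration: the toy data, the 95% variance threshold, and the variable names are invented for the example, and PCA is only one of several techniques that could be used here.

```matlab
% Toy data: 100 observations, 5 features (illustrative, not from the question).
n = 100;
t = linspace(0, 1, n)';
X = [t, 2*t, sin(2*pi*t), 0.01*cos(7*t), 0.01*t.^2];

Xc = X - repmat(mean(X, 1), n, 1);       % center each feature
[~, S, V] = svd(Xc, 'econ');             % columns of V are the principal directions
variances = diag(S) .^ 2 / (n - 1);      % variance along each direction
explained = cumsum(variances) / sum(variances);

d = find(explained >= 0.95, 1);          % smallest d keeping 95% of the variance (assumed threshold)
Y = Xc * V(:, 1:d);                      % n-by-d reduced representation
```

The reduced matrix Y can then be passed to the clustering step in place of the full data. On real data the features would normally be rescaled first, since PCA is itself sensitive to the scale of each column.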

Finally, I suggest you familiarize yourself with a simpler dataset...

Related questions

1. clustering and matlabHi im trying to cluster some data I have from the kdd 1999 cup dataset the output from the file looks like this:0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,nor...

2. fuzzy c- means categorical datacan the fuzzy c-means applied on non numerical data sets ? i.e categorical or mixed numerical and categorical..if yes (I hope so :( ): how we calculate cluster centers ? If NO , what is the alternative .. how to fuzzy clusters these data ? I need the resp...

3. Fuzzy c-means tcp dump clustering in matlabHi I have some data thats represented like this:0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal. Its from thekdd cup 1999 which was based on the darpa set....

4. Fuzzy K-modes clustering how to find the cluster centersI'm trying to understand fuzzy k-modes algorithm (look mainly at page 3) in order to implement it. I'm stuck at the calculation of cluster centers they said asshown in the pic I need to know whether the following is true or false and please correct me In...

5. fuzzy k-mode clustering membership value calculationI was searching for a clustering algorithm to fuzzy cluster categorical attributes and I found the k-modes algorithm I've got the way it works but I'm notunderstanding if the membership or belonging matrix is calculated the same way as this matrix in fuz...

6. matlab clustering and data formatsLeading on from a previous question FCM Clustering numeric data and csv/excel file Im now trying to figure out how to take the outputed information andcreate a workable .dat file for use with clustering in matlab. %# read the list of featuresfid = fopen(...

7. is there a discretized method available in matlab?I have a set attributes like so in my data file: The selected attributes consists of both discrete and continuous attribute types. The attributes Protocol Type andService are of type discrete and the attribute Src Bytes, Dst Bytes, Count are of continuou...

8. is there a discretized method available in matlab?I have a set attributes like so in my data file: The selected attributes consists of both discrete and continuous attribute types. The attributes Protocol Type andService are of type discrete and the attribute Src Bytes, Dst Bytes, Count are of continuou...

9. Clustering strings in R (is it possible?)I have a dataset with a column that is currently being treated as a factor with 1000+ levels. These are values for the column. I would like to clean up thisdata.Some values are strings like "-18 + 5 = -13" and "5 - 18 = -13", I would like the clustering ...

10. FCM Clustering numeric data and csv/excel fileHi I asked a previous question that gave a reasonable answer and I thought I was back on track, Fuzzy c-means tcp dump clustering in matlab the problem isthe preprocessing stage of the below tcp/udp data that I would like to run through matlabs fcm clust...

11. Selecting an appropriate similarity metric & assessing the validity of a k-means clustering modelI have implemented k-means clustering for determining the clusters in 300 objects. Each of my objecthas about 30 dimensions. The distance is calculatedusing the Euclidean metric. I need to know How would I determine if my algorithms works correctly? I ca...

12. input must be empty or a format stringHi I keep getting an error with this: %% generate sample dataK = 3;numObservarations = 12000;dimensions = 20;data = fopen('M.dat','rt');C = textscan(data,[numObservarations dimensions]); ??? Error using == textscanSecond input must be empty or a format st...

13. .dat file how to create one based on excel documentI have a .csv file in my matlab folder with 38 columns and about 48 thousand entries. I was hoping on using the findcluster gui but it only accepts .dat files.How do I create a .dat file in matlab or specifically how do I convert the .csv file into a .da...

14. Kmeans going exceptionally slow when clustering more than 3 documents [closed]I'm trying to use kmeans to cluster similar documents to each other. I am using NLTK's KMeans. When I only cluster 3 documents, it takes less than 5seconds. But once I add in a fourth document, it doesn't finish (I cut it out after 10 minutes). When ther...

15. What customizable machine learning toolkits are available?I'm looking for a machine learning toolkit that will allow me to specify custom similarity measures as well as choose my own representations for the data. Cananyone point me to any such toolkits? Preferably Python or Java. Thank you. ...

16. Seed selection strategies for K-meansI wonder what kind of seed selection methods I can apply to K-means algorithm. Google search wasn't that helpful. Any suggestions? ...

17. Classifying a classifier I've implemented a classifier which Each iteration receives a parameter object to classify, some objects share a classifiable "property" like a color name.Classification parameters could change, so they are parametrized tooand passed to this classifier a...

18. Classifying a classifier I've implemented a classifier which Each iteration receives a parameter object to classify, some objects share a classifiable "property" like a color name.Classification parameters could change, so they are parametrized tooand passed to this classifier a...

19. Training data for sentiment analysisWhere can I get a corpus of documents that have already been classified as positive/negative for sentiment in the corporate domain? I want a large corpus of documents that provide reviews for companies, like reviews of companies provided by analysts and m...

20. Data clustering in KMeans Algorithm using binary tree structureI am having trouble in generating code for KMeans clustering in java. I have already known the algorithm but it's very hard to write in in java code.Myassignment is to retrieve data from database then run the Clustering with KMeans, in this case, the dat...

21. Cluster adjacency matrix of different sizesI have created adjacency matrix for directed graphs of different sizes. I have around 30,000 matrices, each on a separate text file. How can I cluster them, isthere any tools available. What is the best way to represent a directed graph for clustering. T...

22. K Means Clustering using MahoutI'm using the clustering technique given here for clustering a large dataset, which is given in Mahout examples. However, when I visualize the particular clustering I get the following figure. I'm really struggling to understand what this actually means a...

23. Opensource data mine tools, searching for a good option (GNU data mining apps) [closed]

Page 4: Clustering and Matlab - QuestionInBox

8/19/2019 Clustering and Matlab - QuestionInBox

http://slidepdf.com/reader/full/clustering-and-matlab-questioninbox 4/6

I want to test some apps for data mining in GNU/Linux Debian, I downloaded "Gnome Data Mine Tools" fromhttp://www.togaware.com/datamining/gdatamine/ ; I followed the instructions, I installed the app(s) and then it says that you should run the command: g...

24. Ways to determine a group of units in RTSLooking for an algorithm that can be used to determine groups of units that move together as a squad in a real time strategy game like StarCraft. The directionthat I am currently look at is a clustering algorithm but having a hard time finding which one ...

25. Combining different similarities to build one final similarityIm pretty much new to data mining and recommendation systems, now trying to build some kind of rec system for users that have such parameters: cityeducation interest To calculate similarity between them im gonna apply cosine similarity and discrete simil...

26. How clustering works, especially String clustering?I heard about clustering to group similar data. I want to know how it works in the specific case for String. I have a table with more than different 100,000words. I want to identify the same word with some differences (eg.: house, house!!, hooouse, HoUse...

27. Cluster center mean of DBSCAN in R?Using dbscan in package fpc I am able to get an output of: dbscan Pts=322 MinPts=20 eps=0.005 0 1seed 0 233border 87 2total 87 235 but I need to find the

cluster center (mean of cluster with most seeds). Can anyone show me how to proceed with this? ...28. What is the difference between a Confusion Matrix and Contingency Table?

I'm writting a piece of code to evaluate my Clustering Algorithm and I find that every kind of evaluation method needs the basic data from a m*n matrix likeA = aij where aij is the number of data points that are members of class ci and elements of clus...

29. example to train hidden markov model using mallet ( machine learning for langauge engineering)I need to have a library of hmm for sequence modeling to model labeling of sentences in text. For this i explored to example of MALLET source code"TrainHMM" but stucked with not having training and testing file reference and description.Please help.. Reg...

30. Find statistical correlations in a relational databaseI have a large SQL database of associations between state features and a reward metric. e.g. A ^ B ^ C ^ D ^ Action(E) = 0.1F ^ G ^ W ^ D ^ Action(R,P,H) =0.9A ^ T ^ U ^ Y ^ Action(A,S) = 0.2 My features may be discrete, continuous, or nominal. I'm tryin...

31. The approach to calculating 'similar' objects based on certain weighted criteriaI have a site that has multiple Project objects. Each project has (for example): multiple tags multiple categories a size multiple types etc. I would like to write amethod to grab all 'similar' projects based on the above criteria. I can easily retrieve ...

32. sequence mining for time and product predictionI am facing a tricky problem about sequence mining, say I have 10 products, I have million records of these products are purchased. Each user may have only1 record or 100 records..such as : user 1, p1, t1user 1, p1, t2user 1, p2, t3user 1, p3, t4user 1, ...

33. Text classification categorisation pointersi am trying to develop a very simple program for classifying and categorising documents using various algorithms. My problem, since i am a beginner is that icannot find good articles or websites for simple tutorials of how to get started with it. I have ...

34. Predicting Values with k-Means Clustering AlgorithmI'm messing around with machine learning, and I've written a K Means algorithm implementation in Python. It takes a two dimensional data and organisesthem into clusters. Each data point also has a class value of either a 0 or a 1. What confuses me about ...

35. Good data set for Pre-processingI am enrolled in an under-graduate course in Data Mining and I've got an assignment to code a Data Mining Pre-processor. I have the liberty to choose the

programming language and the data set. I was wondering if anybody could suggest a good data set to us...36. How to store large number of ngrams efficently?

I am extracting 4-grams from binary items in hexadecimal form, this mean I can have at most 65535 different grams per item. I want to associate every item toit's grams and their frequency but I am puzzled on how to store everything – this is my first dat...

37. Similarity matrix -> feature vectors algorithm?If we have a set of M words, and know the similarity of the meaning of each pair of words in advance (have a M x M matrix of similarities), which algorithmcan we use to make one k-dimensional bit vector for each word, so that each pair of words can be co...

38. How to approach number guessing game(with a twist) algorithm?I am learning programming (python and algo’s) and was trying to work on a project that I find interesting. I have created a few basic python scripts but I’m not

sure how to approach a solution to a game I am trying to build. Here’s how the game will work:...39. NLP and Machine learning for sentiment analysis

I'm trying to write a program that takes text(article) as input and outputs the polarity of this text, weather its a positive or a negative sentiment. I've readextensively about different approaches but i am still confused. I read about many techniques l...

40. Data mining for significant variables (numerical): Where to start?I have a trading strategy on the foreign exchange market that I am attempting to improve upon. I have a huge table (100k+ rows) that represent every possibletrade in the market, the type of trade (buy or sell), the profit/loss after that trade closed, an...

41. Data mining for significant variables (numerical): Where to start?I have a trading strategy on the foreign exchange market that I am attempting to improve upon. I have a huge table (100k+ rows) that represent every possibletrade in the market, the type of trade (buy or sell), the profit/loss after that trade closed, an...

42. kmeans matlab code feed own data sourceI want to try this K-means clustering code on my own file how do I change it so it doesnt create random information but reads it from my own data source?%% generate sample dataK = 3;numObservarations = 100;dimensions = 3;data = rand([numObservarations di...

43. Techniques for finding repeat transactions between customers with misspellings or other change in information?This isn't a SQL Server specific question; but there might be tSQL specific options here. I've got a bunch of customer details; many of them cancel and resignup for their service. They get an entirely new account; and our datavalidation is sketchy at bes...

44. Weighted Naive Bayes Classifier in Apache MahoutI am using Naive Bayes classifier for my sentiment analysis on customer support. But unfortunately I don't have huge annotated data sets in the customer support domain. But I have a little amount of annotated data in the same domain(around 100 positive an...

45. Customer support data sets for e-mail sentiment analysisI am looking for an annotated data set in the customer support domain for a sentiment analysis, to train my Naive Bayes Classifier. Are there any such datasets available on the internet? I am unable to find any so far. How do I go about this. ...

46. Sentimental analysis using apache mahout [closed]I am planning to develop a system that would predict the mood of a given text(sentiment analysis in short). I would also prefer apache mahout because, it isseriously huge data and the system should be scalable realtime. Kindly suggest me algorithms that ...

47. Sentiment analysis in other languagesMy CSE graduation project I chose to be a simulation of a search engine that uses sentiment analysis to evaluate whether comments/reviews is

positive/negative/neutral I am not sure how would I be doing this yet, But I understood that it uses classifying a...48. Sentiment analysis in other languages

My CSE graduation project I chose to be a simulation of a search engine that uses sentiment analysis to evaluate whether comments/reviews is positive/negative/neutral I am not sure how would I be doing this yet, But I understood that it uses classifying a...

49. Identifying the entity in sentiment analysis using LingpipeI have implemented sentiment analysis using the sentiment analysis module of Lingpipe. I know that they use a Dynamic LR model for this. It just tells me if

the test string is a positive sentiment or negative sentiment. What ideas could I use to determine...50. Identifying the entity in sentiment analysis using Lingpipe

I have implemented sentiment analysis using the sentiment analysis module of Lingpipe. I know that they use a Dynamic LR model for this. It just tells me if the test string is a positive sentiment or negative sentiment. What ideas could I use to determine...

51. clusterdata Matlab functionI am using Matlab clusterdata function to classify my data (noise and non-noise) into 2 categories: noise and non-noise groups. The function works well exceptthat sometimes it names all noise data as group 1 and all non-noise data as group 2. Sometimes i...

52. Agglomerative Clustering in Matlab

Page 5: Clustering and Matlab - QuestionInBox

8/19/2019 Clustering and Matlab - QuestionInBox

http://slidepdf.com/reader/full/clustering-and-matlab-questioninbox 5/6

I have a simple 2-dimensional dataset that I wish to cluster in an agglomerative manner (not knowing the optimal number of clusters to use). The only way I've been able to cluster my data successfully is by giving the function a 'maxclust' value. For simp...

53. clustering data outputs irregular plot graphOk I will run down what im trying to achieve and how I tryed to achieve it then I will explain why I tryed this method. I have data from the KDD cup 1999 inits original format the data has 494k of rows with 42 columns. My goal is trying to cluster this d...

54. script error relating to naming conventionI have some data stored in a mat file spreadsheet when i try to run my kmeans.m script I get this error and I cant work out whats going on? Attempt to executeSCRIPT kmeans as a function Error in == kmeans at 10 [clustIDX, clusters, interClustSum, Dist] =...

55. Recommendations for log analysis tools for ejabberd logs [closed]I'm looking at a massive set of ejabberd logs, and I'm trying to pry out some useful information from them. Are there any existing tools that can help me getsome of the work done, or am I left to roll my own? ...

56. Which is a better method? libsvm or svmclassify?I have been recently trying to use svm for feature classification. While i was doing so, a question came to my mind. Which would be a better method to use,

LIBSVM or svmclassify ? What I mean by svmclassify is to use in-built functions in MATLAB such as s...57. Classification with Matlab. Recognize classes in the test set

I have a situation that seems trivial but I can't figure it out. I have a dataset in Matlab that has categorical values. For example:Outlook,Temperature,Humidity,Windy,Playsunny,hot,high,false,nosunny,hot,high,true,noovercast,hot,high,false,yesrainy,mild...

58. Unable to instantiate a Weka class in MATLABI'm trying to convert data X in MATLAB into a Weka Instance class. I'm using Weka 3.7.5 and MATLAB 7.10 (2010a). I've tried the following:

javaaddpath([WEKA_HOME 'weka.jar']);import weka.core.*;N = 3;inst = Instance( N ); And I receive the error ??? No co...59. How to start SVM training on MATLAB

I have a set of facial features that i have obtained and would like to classify using SVM. I intend to use libsvm package and use MATLAB to carry out thetraining.I have already read up on SVM by watching the Stanford lecture. But I am not sure how to use...

60. How to use libsvm in Matlab?I am new to matlab and don't know how to use libsvm. Is there any sample code for classifying some data (with 2 features) with a SVM and then visualize theresult? How about with kernel (RBF, Polynomial, and Sigmoid )?I saw that readme file in libsvm pack...

61. What is the algorithm used in learn_dmm.m in Kevin Murphy HMM toolbok?i'm going to rewrite a MATLAB script that use the Kevin Murphy's toolbok in Python. I know that there are some HMM algos implementation in python(Viterbi, Baum Welch, Backword Forward) so i think that i have everything i need to do the porting matlab--py...

62. Implementing Naïve Bayes algorithm in MATLAB - Need some guidanceI have a Binary classification problem that I need to do in MATLAB. There are two classes and the training data and testing data problems are from twoclasses and they are 2d coordinates drawn from Gaussian distributions. The samples are 2D points and the...

63. Problems using ezplot with implicit functionI am trying to visualize the decision boundary when using a Bayesian classifier in MATLAB. To do this, I have written an implicit functions which usestraining data to determine which of two classes a datapoint P=(x,y) belongs to. This is done be evaluati...

64. k-nearest-neighbor classifier in matlabI'm completely new to the k-nearest neighbor classifier algorithm. Can someone please give me a link to a good tutorial/lecture that gives a dataset so that Ican apply k-nearest neighbor to it. I really really need to learn this but due to lack of exampl...

65. Matlab Covariance Matrix Computation for Different ClassesI've got 2 different files, one of them is an input matrix (X) which has 3823*63 elements (3823 input and 63 features), the other one is a class vector (R) whichhas 3823*1 elements; those elements have values from 0 to 9 (there are 10 classes). I have to...

66. Feasibility of Machine Learning techniques for Network Intrusion DetectionIs there a machine learning concept (algorithm or multi-classifier system) that can detect the variance of network attacks(or try to). One of the biggest

problems for signature based intrusion detection systems is the inability to detect new or variant at...67. Conditional Random Fields

Is there a training and optimization algorithm for 2-D (two dimensional) conditional random fields (CRF) suited for classification of imagery? Has anyone

used CRF package in R ( http://crf.r-forge.r-project.org/html/CRF-package.html ) for image classifica...68. using precomputed kernels with libsvm

I'm currently working on classifying images with different image-descriptors. Since they have their own metrics, I am using precomputed kernels. So giventhese NxN kernel-matrices (for a total of N images) i want to train and test a SVM. I'm not very expe...

69. How to use random forests in R with missing values?I would like to fit a random forest model, but when I call library(randomForest)cars$speed[1] - NA # to simulate missing valuemodel - randomForest(speed ~.,data=cars) I get the following error Error in na.fail.default(list(speed = c(NA, 4, 7, 7, 8, 9, 10...

70. How do I form a feature vector for a classifier targeted at Named Entity Recognition?I have a set of tags (different from the conventional Name, Place, Object etc.). In my case, they are domain-specific and I call them: Entity, Action, Incident. Iwant to use these as a seed for extracting more named-entities. I came across this paper: " ...

71. Histogram approximation for streaming dataThis question is a slight extension of the one answered here . I am working on re-implementing a version of the histogram approximation found in Section 2.1of this paper , and I would like to get all my ducks in a row before beginning this process again....

72. How can I evaluate my technique?I am dealing with a problem of text summarization i.e. given a large chunk(s) of text, I want to find the most representative "topics" or the subject of the text.For this, I used various information theoretic measures such as TF-IDF, Residual IDF and Poi...

73. What's a good database for fast and frequent retrieval of large cross-sections of data?Basically, I have about 100K items that don't quite fit in memory (though they can if they absolutely have to), and I want to make a lot of comparisons of largesets of these items. For instance, imagine this was a database of user behavior, and I wanted ...

74. Dynamic text-pattern detection algorithm? [closed]I was wondering if such algorithm exists. I have a bunch of text documents and would like to find a pattern among all these documents, if a pattern exists.Please note im NOT trying to classify the documents all i want to do is find a pattern if it exists...

75. What does correlation coefficient actually represent [closed]What does correlation coefficient intuitively mean? If I have a series of X and then a series of Y, and if I input these two into Weka multilayer perceptrontreating Y as the output and X as input, I get a correleation coefficient as 0.76. What does this ...

76. What does correlation coefficient actually represent [closed]What does correlation coefficient intuitively mean? If I have a series of X and then a series of Y, and if I input these two into Weka multilayer perceptrontreating Y as the output and X as input, I get a correleation coefficient as 0.76. What does this ...

77. Reinforcement learning with neo4j: make 2 copies of the graph vs store 2 copies of all values on 1 graphI'm planning on running a machine learning algorithm that learns node values and edge weights. The algorithm is very similar to the value iteration algorithmhere . Each node represents a location and each edge is a path to a new location. Each node and e...

78. C++ Reinforcement Learning LibraryI have been looking for a C++ Library that implements Reinforcement Learning Algorithms but was not very satisfied with the results. I found the

Reinforcement Learning Toolbox 2.0 from the TU Graz but unfortunately this project is very old and I was unabl...79. What is the preferred machine learning technique for building a real-time game player simulator? [closed]

I've set out to build an AI-engine that learns to play Tetris, i.e. an engine that can improve it's performance, perhaps by adjusting its heuristics, and so forth.Let's say that I've got the GUI out of the way--where would I begin in building the engine?...

80. Reading in high dimensional data into R without use of data frameI have very sparse high dimensional (40k observations, 20k dimensions) text data in ARFF format generated by WEKA . There are 2 ARFF readers availablein R via RWeka and foreign packages. Problem with both these arff readers is that they read in the arff ...

81. need an idea about text mining for mining data from bulk of files

Page 6: Clustering and Matlab - QuestionInBox

8/19/2019 Clustering and Matlab - QuestionInBox

http://slidepdf.com/reader/full/clustering-and-matlab-questioninbox 6/6

I am new for data mining. I am doing my B.Tech final year, my final year project title is "Extraction and analysis of faculty performance of managementdiscipline from student feedback using text mining". Here we will have number of files which contains f...

82. Algorithms/methods to compile forum discussions into categorized articles or information?I'm designing and coding a knowledge based community sharing system (forum, QA, article sharing between students, professors and experts) in Java, for theweb. I need to use some data mining/text processing techniques/algorithms to analyse the discussions...

83. Search webpage that contain specific linksSuppose I wan to search the web pages that contain the links I want. I would normally use the link as the query and search it(Like in Google) Note here, I justneed to pages that contain the link. But normally, the search engine would return results that ...

84. URL path similarity/string similarity algorithmMy problem is that I need to compare URL paths and deduce if they are similar. Below I provide example data to process: # GROUP 1/robots.txt# GROUP2/bot.html# GROUP 3/phpMyAdmin-2.5.6-rc1/scripts/setup.php/phpMyAdmin-2.5.6-rc2/scripts/setup.php/phpMyAdmi...

85. Data Mining situation

Suppose I have the data as mentioned below. 11AM user1 Brush 11:05AM user1 Prep Brakfast 11:10AM user1 eat Breakfast 11:15AM user1 Take bath 11:30AM user1 Leave for office 12PM user2 Brush 12:05PM user2 Prep Brakfast 12:10PM user2 eat Breakfast 12:15PM us...

86. exception in thread on KMeans clustering [error]

I have encountered a problem with KMeans clustering. I need to cluster data input from Notepad into some clusters, but I get an exception and the code is not working well. I kindly need help with this error: Exception in thread "main" java.lang.NullPoin...

87. Generating clusters from adjacency matrix / edge list in R

I am trying to find potential clusters or groups of nodes (forum messages, in this case). In the current data, each node (message) has been tentatively grouped together with n other messages, and that group given a name. So, we know that msg ID 1 has been...

88. Group n points in k clusters of equal size [duplicate]

Possible Duplicate: K-means algorithm variation with equal cluster size EDIT: as casperOne pointed out to me, this question is a duplicate. Anyway, here is a more generalized question that covers this one: http://stats.stackexchange.com/questions/8744/cl...

89. Friend Grouping

I am writing a program that fetches the links between friends on Facebook and then creates friendship groups from these links. I have got as far as creating the data structure, which is something like [ friend_id:[ mutual_friend_id, mutual_friend_id, mutual...

90. Java library method or algorithm to estimate aggregate string similarity?

I have responses from users to multiple choice questions, e.g. (roughly): Married/Single, Male/Female, American/Latin American/European/Asian/African. What I want is to estimate similarity by aggregating all responses into a single field which can be compared ...

91. Markov Clustering Algorithm

I've been working through the following example of the details of the Markov Clustering algorithm: http://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf I feel like I have accurately represented the algorithm but I am not getting th...

92. Clustering a list using boundary function

Given a list, I'd like to divide it into clusters using a "boundary function". Such a function would take two consecutive elements of the list and decide whether or not they should belong to the same cluster. So essentially, I want something like this: clus...

93. Get point IDs after clustering, using python [duplicate]

Possible Duplicate: Python k-means algorithm I want to cluster 10000 indexed points based on their feature vectors and get their ids after clustering, i.e. cluster1:[p1, p3, p100, ...], cluster2:[...] ... Is there any way to do this in Python? Thx~ P.s. Th...

94. Clustering a sparse dataset of binary vectors

If I have a sparse dataset where each data point is described by a vector of 1000 elements, and each element of this vector can be either 0 or 1 (a lot of 0s and some 1s), do you know any distance function that could help me to cluster them? Is something like euclid...

95. R: Unused argument “label” in hclust

I'm using the following code to build a hierarchical cluster: dat <- read.table(textConnection("pdb PA EHSS 1avd_model.pdb 3028.0 3920.0 1ave_model.pdb 3083.0 4019.0 1ij8_model.pdb 2958.0 3830.0 1ldo_model.pdb 2889.0 3754.0 1ldq_model.pdb 2758.0 3590.0 1lel_m...

96. Clustering algorithm to cluster objects based on their relation weight

I have n words and their relatedness weight that gives me an n*n matrix. I'm going to use this for a search algorithm but the problem is I need to cluster the entered keywords based on their pairwise relation. So let's say if the keywords are tennis,feder...

97. How to best do server-side geo clustering?

I want to do pre-clustering for a set of approx. 500,000 points. I haven't started yet but this is what I had thought I would do: store all points in a local SOLR index; determine "natural cluster positions" according to some administrative information (big...

98. Same result from K-means and sequential K-means?

Do we obtain the same result if we apply K-means and sequential K-means methods to the same dataset with the same initial settings? Explain your reasons. Personally I think the answer is No. The result obtained by sequential K-means depends on the present...

99. Excel 2010 - Create Cluster Graph

Is there a way to create cluster graphs within Excel 2010? More specifically, I am looking for the type of cluster graph which resembles a scatter graph as opposed to a bar chart. I am working with k-means and the best I can achieve in Excel at the moment...

100. Google Maps Clustering Markers

I have a list of markers but I want to change them to addresses. var data = "loc": ["longitude": -81.81718856098772, "latitude": 26.278657439364583 ,"longitude": -81.81291211952795, "latitude": 26.199298735114475,"longitude": -81.74875180993064, "lat...

Copyright (c) 2015 questioninbox.com. All rights reserved.