Measurement of Software Similarity



MEASUREMENT OF SOFTWARE SIMILARITY

UNDER THE SUPERVISION OF

PROF. ARITRA PAN

In the Year of 2010

Group No: 13

SAMRAT GUPTA (ROLL-071531012037)

SANJUKTA MITRA (ROLL-071531012009)

MOU MONDAL (ROLL-071531012064)

MOITRAYEE MONDAL (ROLL-071531012090)

SUSHOVAN POLLEY (ROLL-071531012065)

SYAMAPRASAD INSTITUTE OF TECHNOLOGY & MANAGEMENT
7, Raja Ram Mohon Ray Road, Kolkata: 41, West Bengal, India

    Syamaprasad Institute of Technology and Management


SYAMAPRASAD INSTITUTE OF TECHNOLOGY & MANAGEMENT
7, Raja Ram Mohon Ray Road, Kolkata: 41, West Bengal, India

Certificate

The work presented in this report is the united effort of Sanjukta Mitra, Samrat Gupta, Mou Mondal, Moitrayee Mondal and Sushovan Polley. Any work of others that was used during the execution of the project, or is included in the report, has been suitably acknowledged through the standard practice of citing references and stating appropriate acknowledgements.

We hereby forward the project entitled MEASUREMENT OF SOFTWARE SIMILARITY, presented by Samrat Gupta (Roll No: 071531012037, Reg. No: 071531012101037), Sanjukta Mitra (Roll No: 071531012009, Reg. No: 071531012201009), Mou Mondal (Roll No: 071531012064, Reg. No: 071531012201064), Moitrayee Mondal (Roll No: 071531012090, Reg. No: 071531012201090) and Sushovan Polley (Roll No: 071531012065, Reg. No: 071531012101065) of 2007-2008, 6th semester, Bachelor of Computer Application, under the guidance of Prof. Aritra Pan, in partial fulfillment of the requirements for the degree of Bachelor of Computer Application of this college.

Prof. Aritra Pan
Syamaprasad Institute of Technology & Management


(Project Supervisor) Associate Professor, Dept. of BCA, SITM

SYAMAPRASAD INSTITUTE OF TECHNOLOGY & MANAGEMENT
7, Raja Ram Mohon Ray Road, Kolkata: 41, West Bengal, India

Certificate of Approval

The foregoing project report is hereby approved as a creditable study of Bachelor in Computer Application, in a manner satisfactory to warrant its acceptance as a prerequisite to the degree for which it has been submitted. It is understood that by this approval the undersigned do not necessarily endorse or approve any statement made, opinion expressed or conclusion therein, but approve this project report only for the purpose for which it is submitted.

(External Examiners)

Prof. Aritra Pan                  Prof. Manikaustabh Goswami
(Project Supervisor)              Teacher In-Charge
Associate Professor               SITM
Syamaprasad Institute of Technology & Management


4. FLOW CHART .................................... 28-29

4.1. FLOW CHART OF THE CHARACTER MATCHING ........ 30-31

4.2. PROGRAM OF THE CHARACTER MATCHING ........... 32-33

4.3. FLOW CHART OF THE STRING MATCHING ........... 34-35

4.4. PROGRAM OF THE STRING MATCHING .............. 36-37

5. HARDWARE & SOFTWARE ...........................

5.1. NECESSITY OF HARDWARE AND SOFTWARE .......... 38

6. ADVANTAGES .................................... 39

7. FUTURE SCOPE .................................. 40

8. PROBLEMS ...................................... 41

9. REFERENCES .................................... 42-54

10. CONCLUSION ................................... 55


    ACKNOWLEDGMENT

We would like to thank our project supervisor, Prof. Aritra Pan, for her moral support and guidance in completing our synopsis on time.

We express our gratitude to all our friends and classmates for their support and help in this project.

Last but not least, we wish to express our gratitude to God Almighty for his abundant blessings, without which this synopsis would not have been successful.


    ABSTRACT

Program assignments are traditionally an area of serious concern in maintaining the integrity of the educational process. Systematic inspection of all solutions for possible plagiarism has generally required unrealistic amounts of time and effort. The Measure Of Software Similarity tool developed by Alex Aiken at UC Berkeley makes it possible to objectively and automatically check all solutions for evidence of plagiarism. We have used MOSS in several large sections of a C programming course. (MOSS can also handle a variety of other languages.) We feel that MOSS is a major innovation for faculty who teach programming and recommend that it be used routinely to screen for plagiarism.


    1. INTRODUCTION

Probably every instructor of a programming course has been concerned about possible plagiarism in the program solutions turned in by students. Instances of cheating are found, but traditionally only on an ad hoc basis. For example, the instructor may notice that two programs have the same idiosyncrasy in their I/O interface, or the same pattern of failures with certain test cases. With suspicions raised, the programs may be examined further and the plagiarism discovered. Obviously, this leaves much to chance. The larger the class, and the more different people involved in the grading, the less the chance that a given instance of plagiarism will be detected. For students who know about various instances of cheating, which instances are detected and which are not may seem (in fact, may be) random. A policy of comparing all pairs of solutions against each other for evidence of plagiarism seems like the correct approach. But a simple file diff would of course detect only the most obvious attempts at cheating. The standard dumb attempt at cheating on a program assignment is to obtain a copy of a working program and then change statement spacing, variable names, I/O prompts and comments. This has been enough to require a careful manual comparison for detection, which simply becomes infeasible for large classes with regular assignments. Thus, programming classes have been in need of an automated tool which allows reliable and objective detection of plagiarism.


1.1 What is Moss

Moss (for Measure Of Software Similarity) is an automatic system for determining the similarity of programs. To date, the main application of Moss has been in detecting plagiarism in programming classes. Since its development in 1994, Moss has been very effective in this role. The algorithm behind Moss is a significant improvement over other cheating detection algorithms (at least, over those known to us). Measure Of Software Similarity (MOSS) is a tool for determining similarities among software programs. As of now, MOSS can be used to detect similarities in C, C++, Java, Pascal, Ada, ML, Lisp and Scheme programs. MOSS is primarily used for detecting plagiarism in programming assignments in computer science and other engineering courses, though several text formats are supported as well. The latest MOSS script can be downloaded from the MOSS site. MOSS can execute on all UNIX and Linux systems which have Perl, mail, etc. After downloading the MOSS script, copy it to the directory containing the student programs, then run the script in that directory. After execution, the script sends the data to the MOSS server at Berkeley. The MOSS server sends back a webpage address, which is displayed at the prompt. This webpage contains the results, which remain available on the MOSS server for 14 days. The script can also be run with one or more options which handle more complicated situations, like comparing programs from different directories, excluding certain parts of a program from the comparison, etc.

Our project is developed in C program code. At first we develop a C program based on string similarity, which checks whether two strings in two separate arrays are similar or not. When this is done successfully,


the same checking is done on two separate files using C program code.

1.2 FEATURES

This Measure of Software Similarity program can detect the similarity of any program, text, or file.

This project can be used in any institute to prevent the copying of assignments from others.

This program is also applicable in searching for duplicate information when the same program is executing on different machines connected to the main server.

It is also applicable to online duplication.

This project can also be used to avoid plagiarism.

It can also be used to eliminate redundancy of data.

It also helps to reduce the cost of a particular project.

This project, namely Measurement of Software Similarity, helps to detect data redundancy of any software, program, text or file. One of the biggest disadvantages of data redundancy is that it increases the size of the database unnecessarily. Also, data redundancy might cause the same result to be returned as multiple search results when searching the database, causing confusion in the results. This also wastes a lot of space, thus incurring extra cost.

Another problem is plagiarism, the act of taking credit for someone else's work. This particular project helps to eliminate this drawback.


1.3 Plagiarism

Plagiarism, as defined in the 1995 Random House Compact Unabridged Dictionary, is the "use or close imitation of the language and thoughts of another author and the representation of them as one's own original work". Within academia, plagiarism by students, professors, or researchers is considered academic dishonesty or academic fraud, and offenders are subject to academic censure, up to and including expulsion. In journalism, plagiarism is considered a breach of journalistic ethics, and reporters caught plagiarizing typically face disciplinary measures ranging from suspension to termination of employment. Some individuals caught plagiarizing in academic or journalistic contexts claim that they plagiarized unintentionally, by failing to include quotations or give the appropriate citation. While plagiarism in scholarship and journalism has a centuries-old history, the development of the Internet, where articles appear as electronic text, has made the physical act of copying the work of others much easier.

Plagiarism is not the same as copyright infringement. While both terms may apply to a particular act, they are different transgressions. Copyright infringement is a violation of the rights of a copyright holder, when material protected by copyright is used without consent. On the other hand, plagiarism is concerned with the unearned increment to the plagiarizing author's reputation that is achieved through false claims of authorship.


1.3.1 Plagiarism prevention

Plagiarism cannot be eliminated completely, but some preventive measures may reduce it to a minimum. There are three main strategies to prevent plagiarism. First is the Trust Method, wherein the students are told that we trust them and that they are mature enough to know that the test is for their benefit, and that cheating will spoil their chances to see how well they have mastered a particular concept. Thus, the Trust Method trusts the learners to obey the rules and is implemented by making the learners sign an honor code before appearing for the test. Second is the Fence Method, which aims at making cheating impossible. This is implemented by tightening the security during tests, setting different questions for different students, etc. Third is the Threat Method, which threatens the learners with the punishments that they will have to face if plagiarism is detected. This is done by announcing the penalty before the assignment submissions or tests have started. Ideally, one or more of the above methods can be used as a preventive measure. The instructor has to decide which method or methods to adopt based on the purpose of the test. If the test is part of the final exam of a course or degree, then the Fence Method or Threat Method or both could be implemented. If the test were a practice test for the self-assessment of learners, the Trust Method would be the best. These methods are to be implemented before the commencement of the test or assignment submission. Additionally, preventive measures can also be taken while conducting the test. If we have a test running in parallel at a number of remote centers, we can have authorised proctors inspect the exam at the respective centers. These proctors can make sure that only authorised students are taking the test at the proper time, without any unauthorized


help. The tests at these centers can also be supervised by observing each center by video conferencing from a coordinating center.

1.3.2 Plagiarism Detection

In the previous section we saw the preventive measures for plagiarism. Now, after the test is conducted and the results are out, the tough job starts. The questions becoming increasingly important in this context are: can we trust the results that the machine has given us, i.e., does it mean that a student has understood a concept just because his score says so? If not, then how do we differentiate between genuine attempts and copies? In other words, how do we detect copies among the number of assignments submitted? Detecting plagiarism in a test for which n students had appeared involves comparing each solution with the other n-1 solutions, and this is not a trivial task. Let us see some attempts to detect plagiarism in programming tests, which have evolved over time. Traditional attempts to detect plagiarism have been ad hoc, typically involving manual checking of programming assignments for plagiarism. This manual checking, too, mostly happens only for suspected programs, like two programs failing for the same test cases, two programs looking very similar by structure, etc. Also, the plagiarism detection is limited to programs which look alike or are verbatim copies. Manually checking all the programs in all possible combinations of plagiarism requires a fair amount of time and manpower, especially when the number of programs to be tested is large. Inspecting all the possible combinations for more complex attempts of plagiarism (beyond verbatim copy), in such a scenario, is a tougher job. The inconvenience and limitations of traditional attempts at detecting plagiarism led instructors to exploit some advanced methods to do the same. Instructors eventually started using available


tools (e.g. Unix utilities like diff, cmp, etc.) to automate the task of detecting all possible combinations of plagiarism among a large set of programs. Use of these tools minimized the time and effort; however, plagiarism detection was still limited to verbatim copies.

1.4 Ways to handle technology-enhanced cheating

Focus on the process of writing: observe and coach the process. Require a thesis statement, an initial bibliography, an outline, notes, a first draft, etc.

Avoid "choose any topic" papers. Tie the topic to the goals of the course.

Use a few papers from "cheat sites" as examples. Provide a grade for these and use them as reference material. Students will be hesitant to use a service you know about.

Be clear and comprehensive regarding plagiarism policies. The more students know, the less likely they will be to attempt plagiarism.

Require students to use material from class lectures, presentations, discussions, etc. in their graded assignments. This makes finding "matching" papers more difficult.

Require students to conduct an original survey or interview as part of the assignment. The survey or transcripts of the interview are included as an appendix.

Require an annotated bibliography as part of the process of writing the assignment. These are difficult to plagiarize.

Require an abstract of the paper where appropriate. Writing an accurate synopsis of a plagiarized paper is difficult.

Require a description of the research process with the final draft.

Get to know your students. Require a writing sample during the first week of class. Have the students do this in their "best written style"


and make it personalized and customized to them individually. Keep this on record for comparison purposes.

Use Plagiarism.org or Plagiarism.com to check submitted work (links below).

Use MOSS (Measure of Software Similarity), which detects plagiarism in programming classes (link below).

Make assignments relatively difficult. This makes it more difficult to get casual, though ongoing, help during the semester.

Frequent assessments also make getting help logistically difficult.

Use mastery-type questions and case studies rather than "memorization" questions.

If using online quizzes, give different questions to different students, i.e. use a test bank. Add a short answer question that will be graded by hand.

If using online tests or quizzes, limit the amount of time the test is available.

Use alternate means of assessment, portfolios and multiple measures of mastery.

Use proctored exams (only if absolutely necessary).

If you suspect plagiarism, look carefully at the paper and gently confront the student with your concerns. Frequently this is enough to uncover or deter plagiarism.

Require raw materials of the research process, for example copies of the cited works.


1.5 Teaching activities to prevent cheating

Quizzes: Create regular, frequent (weekly or daily) quizzes for students.

Discussion: Create discussions and use participation in discussion as an aid in measuring student progress.

Request feedback: Randomly e-mail all the students in the class and request a comment or two on some subject.

Variance analysis: Check the regular quiz scores to see if there is a sudden change. For example, a student flunks five quizzes and then hires someone to take the final online exam and gets an A.

Spot calls: If a teacher has any concerns about a particular individual, she or he can call the student and have a short discussion. It will quickly reveal whether the student knows the course material.

Online chat exams: The instructor can conduct an oral chat room exam with each student to interactively test the student's knowledge of the course material.


1.6 Additional security techniques

First, many of the same problems regarding the authenticity of a student's work and plagiarism exist in the traditional classroom as well. To get someone's help through an entire online program would take substantial effort. For most students it is just not possible to have consistent help through many tests at many different times. Besides, who would consent to putting in so much work for someone else and not get credit for it?

Use a log-in/password system (but of course, a student could just give the username and password to someone else).

Make exercises difficult enough so that a person who hasn't done the previous work in your course will not be able to complete the assignment.

Give many short exams that are embedded in class exercises so that it would be difficult for a student to have "help" there all the time.

Ask mastery-type questions so that a student must know the material himself/herself in order to answer the question (i.e. case studies vs. memorization questions).

Ask students to relate the subject matter to their own personal/professional/life experiences so their answers are personalized and difficult to replicate.

Require students to submit an outline and rough draft of term papers and essays before the final paper is due. This way, a professor can see the work in progress.


Give different questions to different students: construct a large set of questions from which an automated testing program can randomly select (i.e. a database of 50 questions with 10 randomly chosen).

Limit the times when the online test is available; ensure that the test is taken in a certain amount of time. Some automated testing programs allow this feature.

Provide online exam practice sample questions, self-study questions with answers and feedback, and require a proctored, non-online examination for course credit (i.e. on campus, at a testing center, library, etc.).

Finally, remember that testing should never be the only means by which you assess the abilities of your students. If they are evaluated with various different methods, you have the best way of ensuring that there is real learning taking place. As with a traditional classroom, the best way to assess student and course progress is to know the students through the students' work and pay attention to student feedback.

The American Association of Higher Education has devised nine principles of good practice for assessing student learning. These can also be helpful when thinking about how to avoid plagiarism and cheating in online courses. The principles are:

1. The assessment of student learning begins with educational values. Assessment is not an end in itself but a vehicle for educational improvement. Its effective practice, then, begins with and enacts a vision of the kinds of learning we most value for students and strive to help them achieve. Educational values should drive not only what we choose to assess but also how we do so. Where questions about educational mission and values are skipped over, assessment threatens to be an exercise in measuring what's easy, rather than a process of improving what we really care about.

2. Assessment is most effective when it reflects an understanding of learning as multidimensional, integrated, and revealed in performance over time. Learning is a complex process. It entails not only what students know but what they can do with what they know; it involves not only knowledge and abilities but values, attitudes, and habits of mind that affect both academic success and performance beyond the classroom. Assessment should reflect these understandings by


employing a diverse array of methods, including those that call for actual performance, using them over time so as to reveal change, growth, and increasing degrees of integration. Such an approach aims for a more complete and accurate picture of learning, and therefore firmer bases for improving our students' educational experience.

3. Assessment works best when the programs it seeks to improve have clear, explicitly stated purposes. Assessment is a goal-oriented process. It entails comparing educational performance with educational purposes and expectations -- those derived from the institution's mission, from faculty intentions in program and course design, and from knowledge of students' own goals. Where program purposes lack specificity or agreement, assessment as a process pushes a campus toward clarity about where to aim and what standards to apply; assessment also prompts attention to where and how program goals will be taught and learned. Clear, shared, implementable goals are the cornerstone for assessment that is focused and useful.

4. Assessment requires attention to outcomes but also to the experiences that lead to those outcomes. Information about outcomes is of high importance; where students "end up" matters greatly. But to improve outcomes, we need to know about student experience along the way -- about the curricula, teaching, and kind of student effort that lead to particular outcomes. Assessment can help us understand which students learn best under what conditions; with such knowledge comes the capacity to improve the whole of their learning.

5. Assessment works best when it is ongoing, not episodic. Assessment is a process whose power is cumulative. Though isolated, "one-shot" assessment can be better than none, improvement is best fostered when assessment entails a linked series of activities undertaken over time. This may mean tracking the process of individual students, or of cohorts of students; it may mean collecting the same examples of student performance or using the same instrument semester after semester. The point is to monitor progress toward intended goals in a spirit of continuous improvement. Along the way, the assessment process itself should be evaluated and refined in light of emerging insights.

6. Assessment fosters wider improvement when representatives from across the educational community are involved. Student learning is a campus-wide responsibility, and assessment is a way of enacting that responsibility. Thus, while assessment efforts may start small, the aim


over time is to involve people from across the educational community. Faculty members play an especially important role, but assessment's questions can't be fully addressed without participation by student affairs educators, librarians, administrators, and students. Assessment may also involve individuals from beyond the campus (alumni/ae, trustees, employers) whose experience can enrich the sense of appropriate aims and standards for learning. Thus understood, assessment is not a task for small groups of experts but a collaborative activity; its aim is wider, better-informed attention to student learning by all parties with a stake in its improvement.

7. Assessment makes a difference when it begins with issues of use and illuminates questions that people really care about. Assessment recognizes the value of information in the process of improvement. But to be useful, information must be connected to issues or questions that people really care about. This implies assessment approaches that produce evidence that relevant parties will find credible, suggestive, and applicable to decisions that need to be made. It means thinking in advance about how the information will be used, and by whom. The point of assessment is not to gather data and return "results"; it is a process that starts with the questions of decision-makers, that involves them in the gathering and interpreting of data, and that informs and helps guide continuous improvement.

8. Assessment is most likely to lead to improvement when it is part of a larger set of conditions that promote change. Assessment alone changes little. Its greatest contribution comes on campuses where the quality of teaching and learning is visibly valued and worked at. On such campuses, the push to improve educational performance is a visible and primary goal of leadership; improving the quality of undergraduate education is central to the institution's planning, budgeting, and personnel decisions. On such campuses, information about learning outcomes is seen as an integral part of decision making, and avidly sought.

9. Through assessment, educators meet responsibilities to students and to the public. There is a compelling public stake in education. As educators, we have a responsibility to the public that supports or depends on us to provide information about the ways in which our students meet goals and expectations. But that responsibility goes beyond the reporting of such information; our deeper obligation -- to ourselves, our students, and society -- is to improve. Those to whom


educators are accountable have a corresponding obligation to support such attempts at improvement.

2. Platform

Procedural programming can sometimes be used as a synonym for imperative programming (specifying the steps the program must take to reach the desired state), but can also refer (as in this article) to a programming paradigm, derived from structured programming, based upon the concept of the procedure call. Procedures, also known as routines, subroutines, methods, or functions (not to be confused with mathematical functions, but similar to those used in functional programming), simply contain a series of computational steps to be carried out. Any given procedure might be called at any point during a program's execution, including by other procedures or itself. A procedural programming language provides a programmer a means to define precisely each step in the performance of a task. The programmer knows what is to be accomplished and provides through the language step-by-step instructions on how the task is to be done. Using a procedural language, the programmer specifies language statements to perform a sequence of algorithmic steps. Procedural programming is often a better choice than simple sequential or unstructured programming in many situations which involve moderate complexity or which require significant ease of maintainability.

Possible benefits:

The ability to re-use the same code at different places in the program without copying it.

An easier way to keep track of program flow than a collection of "GOTO" or "JUMP" statements (which can turn a large, complicated program into spaghetti code).

The ability to be strongly modular or structured.

Emphasis is on doing things algorithmically.

Employs a top-down approach in program design.


Large programs are divided into smaller programs known as functions.

    3. What is Algorithm

In mathematics, computer science, and related subjects, an algorithm is an effective method for solving a problem using a finite sequence of instructions. Algorithms are used for calculation, data processing, and many other fields. Each algorithm is a list of well-defined instructions for completing a task. Starting from an initial state, the instructions describe a computation that proceeds through a well-defined series of successive states, eventually terminating in a final ending state. The transition from one state to the

next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate randomness. If you sit down in front of a computer and try to write a program to solve a problem, you will be trying to do four out of five things at once.

These are:

1. ANALYSE THE PROBLEM

2. DESIGN A SOLUTION/PROGRAM

3. CODE/ENTER THE PROGRAM

4. TEST THE PROGRAM

5. EVALUATE THE SOLUTION


To begin with, we will look at three methods used in creating an algorithm. These are:

    STEPPING

    LOOPING

    CHOOSING

3.1 ALGORITHM OF THE CHARACTER MATCHING

    STEP 1: Begin

STEP 2: We take two file names in two pointer variables fn1 and fn2

    STEP 3: fopen(fn1)

    STEP 4: If fn1 not opened then

    Print Cannot open first file

    Return

    Else

    Print File is open

    STEP 5: c=0

STEP 6: Repeat steps 7 to 15 as long as !feof(f1)

    STEP 7: str1= fgetc(f1)


STEP 8: f=0

STEP 9: For i=0 to c-1: if match[i]=str1 then f=1

STEP 10: If f=0 then match[c]=str1, c=c+1, flag=0, i=0, fopen(fn2)

STEP 11: If fn2 not opened then
Print Cannot open second file
Return

STEP 12: Repeat steps 13 and 14 as long as !feof(f2)

STEP 13: str2=fgetc(f2)

STEP 14: If (str1=str2) and (str1>=0) then
flag=1
i=i+1

STEP 15: If flag=1 then
Print match

    STEP 16: fclose(f1)

    STEP 17: fclose(f2)

    STEP 18: END


    3.2 Algorithm of the string Matching

    Step 1: Begin

Step 2: We take two file names in two pointer variables fn1 & fn2

    Step 3: fopen (fn1)

Step 4: If fn1 not opened then
Print cannot open first file
Return
Else
Print file is open

Step 5: Repeat steps 6 to 24 as long as !feof(f1)

    Step 6: i =0

    Step 7: str1=NULL

    Step 8: ch= fgetc (f1)

Step 9: Repeat steps 10 to 12 as long as ch != ' ' and ch != '\n' and ch != EOF


    Step 10: str1[i]= ch

    Step 11: ch=fgetc(f1)

    Step 12: i=i+1

    Step 13: fopen(fn2)

Step 14: If fn2 not opened then
Print cannot open second file
Exit

Step 15: Repeat steps 16 to 24 as long as !feof(f2)

    Step 16: i=0

    Step 17: str2=NULL

    Step 18: ch=fgetc (f2)

Step 19: Repeat steps 20 to 22 as long as ch != ' ' and ch != '\n' and ch != EOF

    Step 20:str2 [i] =ch

    Step 21: ch=fgetc (f2)

    Step 22: i=i+1

Step 23: If str1 = str2 then
Print match

    Step 24: fclose (f1)

    Step 25: fclose (f2)

    Step 26: END


    4. FLOW-CHART

What is a Flow-Chart

A flowchart is a pictorial representation of an algorithm. It is the layout, in a visual, two-dimensional format, of the plan to be followed when the corresponding algorithm is converted into a program by writing it in a programming language. It acts like a roadmap for a programmer and guides him/her on how to go from the starting point to the final point while converting the algorithm into a computer program. A flowchart is the pictorial representation of the separate steps of a process. Using a flowchart, one can easily design, analyze, prepare documentation for, or manage a process running in a system.

Why we use a Flow-Chart

Normally, an algorithm is first represented in the form of a flowchart, and the flowchart is then expressed in some programming language to prepare a computer program. The main advantage of this two-step approach to program writing is that, while drawing the flowchart, a programmer is not concerned with the details of the elements of the programming language. Hence, he/she can fully concentrate on the logic of the procedure. Moreover, since a flowchart shows the flow of operations in pictorial form, any error in the logic of the procedure can be detected more easily than in the case of a program. Once the


flowchart is ready, the programmer can forget about the logic and concentrate only on coding the operations in each box of the flowchart in terms of the statements of the programming language. This will normally ensure an error-free program.

    The symbols used in a Flow-Chart are shown below:

SYMBOL NAME - DESCRIPTION

Terminator - indicates the start or stop of the flowchart.

Input/Output - takes any input or shows any output in the flowchart.

Process - represents a running process in the flowchart.

Decision Box - makes a decision.

Connector - connects one part of the flowchart to another, or continues a flowchart from one page to the next.

4.1 Flow chart of Character Matching

[Flowchart of the character matching procedure of Section 3.1: start; input fn1 and fn2; open fn1; c = 0; read each character of file 1, skipping characters already processed, and count its occurrences in file 2; print each match; close both files; stop.]

4.2 PROGRAM OF THE CHARACTER MATCHING


#include <stdio.h>
#include <conio.h>

void main()
{
    FILE *f1, *f2;
    char fn1[80], fn2[80], str1, str2, match[256];
    int i, flag, c, f;
    clrscr();
    printf("\n\tEnter 1st file name with extension: ");
    gets(fn1);
    if ((f1 = fopen(fn1, "r")) == NULL)
    {
        printf("Cannot open first file.\n");
        getch();
        return;
    }
    else
        printf("%s file is opened", fn1);
    fflush(stdin);
    printf("\n\tEnter 2nd file name with extension: ");
    gets(fn2);
    c = 0;
    while (!feof(f1))
    {
        str1 = fgetc(f1);
        /* has this character been processed already? */
        f = 0;
        for (i = 0; i < c; i++)
            if (match[i] == str1)
                f = 1;
        if (f == 0)
        {
            match[c++] = str1;   /* remember it for later passes */
            flag = 0;
            i = 0;
            if ((f2 = fopen(fn2, "r")) == NULL)
            {
                printf("Cannot open second file.\n");
                getch();
                return;
            }
            while (!feof(f2))
            {
                str2 = fgetc(f2);
                if (str1 == str2 && str1 >= 0)
                {
                    flag = 1;
                    i++;
                }
            }
            fclose(f2);
            if (flag == 1)
                printf("\n\n\tMatch = %c; appeared %d times", str1, i);
        }
    }
    printf("\n\n\t\tEnd of the program.");
    fclose(f1);
    fflush(stdin);
    getch();
}

4.3 Flow chart of the String Matching

[Flowchart of the string matching procedure of Section 3.2: start; input fn1 and fn2; open fn1; read file 1 word by word, stopping each word at ' ', '\n' or EOF; for each word, open fn2, read file 2 word by word, and print a match when the two strings are equal; close both files; stop.]

    4.4 Program of the String Matching

#include <stdio.h>
#include <conio.h>
#include <string.h>

void main()
{
    FILE *f1, *f2;
    char ch, fn1[80], fn2[80], str1[80], str2[80];
    int i;
    clrscr();
    printf("\n\tEnter 1st file name with extension : ");
    gets(fn1);
    if ((f1 = fopen(fn1, "r")) == NULL)
    {
        printf("Cannot open first file.\n");
        getch();
        return;
    }
    else
        printf("%s file is opened", fn1);
    fflush(stdin);
    printf("\n\tEnter 2nd file name with extension : ");
    gets(fn2);
    while (!feof(f1))
    {
        /* read the next whitespace-delimited word of file 1 */
        i = 0;
        ch = fgetc(f1);
        while (ch != ' ' && ch != '\n' && ch != EOF)
        {
            str1[i++] = ch;
            ch = fgetc(f1);
        }
        str1[i] = '\0';
        printf("\nFile 1 string : %s", str1);
        if ((f2 = fopen(fn2, "r")) == NULL)
        {
            printf("Cannot open second file.\n");
            getch();
            return;
        }
        while (!feof(f2))
        {
            /* read the next whitespace-delimited word of file 2 */
            i = 0;
            ch = fgetc(f2);
            while (ch != ' ' && ch != '\n' && ch != EOF)
            {
                str2[i++] = ch;
                ch = fgetc(f2);
            }
            str2[i] = '\0';
            printf("\nFile 2 string : %s ", str2);
            if (strcmp(str1, str2) == 0)
                printf("\n\n\tMatch = %s", str2);
        }
        fclose(f2);
    }
    printf("\n\n\t\tEnd of the program....");
    fclose(f1);
    fflush(stdin);
    getch();
}

    5.1 Necessary hardware

The project is designed to be compatible with a server-based machine running Windows (XP, Windows Server 2000). Moreover, the project has been computerized, because a computerized system offers clear advantages.

The hardware requirements for the project are as follows:

Motherboard: Intel original
Processor: Core 2 Quad
Operating System: Windows Server 2000
RAM: DDR3 4 GB, 800 MHz
HDD: 1 TB


    5.2 Necessary software

Operating System: Windows XP

Software: Turbo C++ compiler (full installation requires a minimum of 5 MB).

6. Advantages

MOSS (Measure Of Software Similarity) is an automatic system for determining the similarity of programs.

The system allows for a variety of more complicated situations. For example, it allows for a base file. The base file might be a program outline or partial solution handed out by the instructor.

MOSS makes it easy to examine the corresponding portions of a program pair. Clicking on a program pair in the results summary brings up side-by-side frames containing the program sources.

MOSS just as easily uncovers more sophisticated attempts at cheating. Multiple distinct similar sections separated by sections with differences are still found and given color-coded highlighting.

Traditional attempts to detect plagiarism have been ad hoc, typically involving manual checking of programming assignments for plagiarism.

There was therefore a strong need for more sophisticated mechanisms which would automate the task to a large extent as well as detect fairly complex attempts at copying.


7. Future Scope

The future scope of the project is that it enables the detection of several pieces of software which are very similar to each other. Hence it helps to increase the efficiency of the project and to prevent the duplication of any software. Analogy-based software estimation rests on the assumption that similar software projects require similar software effort; however, it must contend with incomplete and noisy data, uncertainty in measurement and similarity assessment, complex interactions between attributes, and data measured on ordinal and nominal scales.

8. Problems

Two projects that may seem similar may indeed be different in a critical way. The uncertainty in assessing similarities and differences means that two different estimators could develop significantly different views and effort estimates.

The uncertainty stems from:

The data collection tool.

The type of information available.

Attribute measurement.

The skill of the estimator.


9. REFERENCES

    Help with Cheating

Plagiarism.org includes software to detect plagiarism and allows a free trial. http://www.plagiarism.org/

Plagiarism.com is more plagiarism software; it also has a self-detection test (http://www.plagiarism.com/self.detect.htm) to help students spot plagiarism in their work. http://www.plagiarism.com/

    Plagiarism Webliography for Faculty

An extensive list of websites, resources and detection tools: http://www.utpb.edu/library/plagiarism.html

MOSS (Measure of Software Similarity) detects plagiarism in programming classes: http://theory.stanford.edu/~aiken/moss/

Word Check Systems checks keyword uses and keyword frequencies in electronic documents and presents a "percentage of match" between compared data. http://www.wordchecksystems.com/

    Cheat Sites

Direct Essays: http://directessays.com/


    A 1 Term Paper: http://www.a1-termpaper.com/

    Fast Papers: http://www.fastpapers.com/

    Student Network Resources: http://www.snrinfo.com/

    Schoolsucks: http://www.schoolsucks.com/

    Cheathouse: http://www.cheathouse.com/

    EZwrite: http://www.ezwrite.com/

    Term Papers on File: http://www.termpapers-on-file.com/

Research Assistance: http://www.research-assistance.com/


C. Alexander. Notes on the Synthesis of Form. Harvard U. Press, 1964.

Edward B. Allen, Sampath Gottipati, Rajiv Govindarajan. Measuring size, complexity, and coupling of hypergraph abstractions of software: An information-theory approach. Software Quality Control, 15(2):179-212, June 2007. doi:10.1007/s11219-006-9010-3

Tom Arbuckle. Visually Summarising Software Change. Proceedings of the 12th International Conference on Information Visualisation, pp. 559-568, July 2008. doi:10.1109/IV.2008.58

T. Arbuckle, A. Balaban, D. K. Peters, and M. Lawford. Software documents: Comparison and measurement. In Proc. 18th Int. Conf. on Software Eng. & Knowledge Eng., pp. 740-745, July 2007.

C. H. Bennett, P. Gács, M. Li, P. Vitányi, and W. H. Zurek. Information distance. IEEE Trans. Information Theory, 44(4):1407-1423, 1998.

M. Cebrián, M. Alfonseca, and A. Ortega. Common pitfalls using the normalized compression distance: What to watch out for in a compressor. Comms. Info. Sys., 5(4):367-384, 2005.

Gregory J. Chaitin. On the Length of Programs

    Syamaprasad Institute Of Technology & Managemen

    http://en.wikipedia.org/w/index.php?title=Cluster_analysis&action=edit&section=24http://en.wikipedia.org/wiki/Wikipedia:External_linkshttp://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#Wikipedia_is_not_a_mirror_or_a_repository_of_links.2C_images.2C_or_media_fileshttp://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not#Wikipedia_is_not_a_mirror_or_a_repository_of_links.2C_images.2C_or_media_fileshttp://en.wikipedia.org/wiki/Wikipedia:External_linkshttp://en.wikipedia.org/wiki/Wikipedia:External_linkshttp://en.wikipedia.org/w/index.php?title=Cluster_analysis&action=edithttp://en.wikipedia.org/wiki/Wikipedia:Citing_sourceshttp://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422http://citeseer.ist.psu.edu/berkhin02survey.htmlhttp://citeseer.ist.psu.edu/jain99data.htmlhttp://home.dei.polimi.it/matteucc/Clustering/tutorial_html/index.htmlhttp://en.wikipedia.org/wiki/Normal_distributionhttp://en.wikipedia.org/wiki/Normal_distributionhttp://www.csse.monash.edu.au/~dld/cluster.htmlhttp://gauss.nmsu.edu/~lludeman/video/ch6pr.htmlhttp://www.inference.phy.cam.ac.uk/mackay/itila/http://en.wikipedia.org/wiki/David_J.C._MacKayhttp://en.wikipedia.org/wiki/David_J.C._MacKayhttp://people.revoledu.com/kardi/tutorial/Clustering/index.htmlhttp://cran.r-project.org/web/packages/kernlab/index.htmlhttp://home.dei.polimi.it/matteucc/Clustering/tutorial_html/http://www.dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/http://en.wikipedia.org/wiki/Open_Directory_Projecthttp://www.dmoz.org/Artificial_Intelligence/Machine_Learning/Software/http://en.wikipedia.org/wiki/Open_Directory_Projecthttp://homepages.feis.herts.ac.uk/~nngroup/software.phphttp://factominer.free.fr/http://en.wikipedia.org/wiki/R_programming_languagehttp://ai4r.rubyforge.org/index.htmlhttp://www.dmg.org/v4-0/ClusteringModel.htmlhttp://portal.acm.org/citation.cfm?id=1232687&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1232687&dl=GUIDE&c




    Conclusion

    Our project is implemented in C. We first developed a program based on string

    similarity, which checks whether two strings, stored in two separate arrays, are similar or

    not. Once this worked correctly, the same check was applied to the contents of two separate

    files, again using C code.

