THE STAT PROJECT
Requirement Analysis: Milestone 1 Report




QUESTIONS

To design a framework, how many variations do we need to protect? How many functionalities do we need to provide to support all these variations?


Variations for importing datasets (file sources)


Variations for importing datasets (file formats)


Variations for importing datasets (schemas)

Even if we only consider datasets in XML, each dataset may have its own schema.


Reuters dataset example


Simplified approach

One approach: high-level reader classes, for example
- ReutersReader
- RCV1Reader
Once written, a reader can be shared with the community.

Observation: for the sake of comparison, researchers usually work with a few well-known datasets (e.g., Reuters, RCV1).
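The high-level reader idea can be sketched in Java as follows. `DatasetReader`, `ReutersReader`, and `ReutersDocument` are hypothetical names, and the parsing is deliberately simplified: it assumes Reuters-21578-style SGML in which each article is wrapped in `<REUTERS ...>` and carries `<TITLE>` and `<BODY>` tags, and it uses regular expressions where a reader meant for sharing would need a proper SGML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical raw record: every field is a String, as STAT requires for raw documents.
class ReutersDocument {
    final String title;
    final String body;

    ReutersDocument(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

// Hypothetical reader abstraction: one implementation per well-known dataset.
interface DatasetReader<T> {
    List<T> read(String content);
}

// Simplified reader for Reuters-21578-style SGML. It only extracts <TITLE> and <BODY>
// with regular expressions; a production reader would use a real parser.
class ReutersReader implements DatasetReader<ReutersDocument> {
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<TITLE>(.*?)</TITLE>", Pattern.DOTALL);
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    @Override
    public List<ReutersDocument> read(String content) {
        List<ReutersDocument> docs = new ArrayList<>();
        Matcher article = ARTICLE.matcher(content);
        while (article.find()) {
            String text = article.group(1);
            docs.add(new ReutersDocument(extract(TITLE, text), extract(BODY, text)));
        }
        return docs;
    }

    // Returns the trimmed content of the first matching tag, or "" if the tag is absent.
    private static String extract(Pattern tag, String text) {
        Matcher m = tag.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }
}

class ReaderDemo {
    public static void main(String[] args) {
        // Made-up sample input for illustration only.
        String sgml = "<REUTERS NEWID=\"1\"><TITLE>Cocoa prices rise</TITLE>"
                + "<BODY>Cocoa futures rose today.</BODY></REUTERS>"
                + "<REUTERS NEWID=\"2\"><TITLE>Grain exports</TITLE></REUTERS>";
        for (ReutersDocument d : new ReutersReader().read(sgml)) {
            System.out.println(d.title + " | " + d.body);
        }
    }
}
```

Supporting RCV1 would mean adding another `DatasetReader` implementation with its own parsing logic, while the code downstream of the reader stays unchanged.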


Able to persist in-memory objects and read them back
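One possible way to meet this requirement in Java is the JDK's built-in object serialization. The sketch assumes that STAT objects implement `java.io.Serializable`; `SimpleDocument` and `Persistence` are hypothetical names, and byte arrays stand in for files. A real framework might prefer a portable format (e.g., XML or JSON) so that saved objects survive class changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory object; implementing Serializable lets the JDK persist it.
class SimpleDocument implements Serializable {
    private static final long serialVersionUID = 1L;
    final String title;
    final String body;

    SimpleDocument(String title, String body) {
        this.title = title;
        this.body = body;
    }
}

final class Persistence {
    private Persistence() {}

    // Writes a Serializable object graph to bytes; a FileOutputStream would work the same way.
    static byte[] save(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reads the object graph back; the caller casts it to the expected type.
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}

class PersistenceDemo {
    public static void main(String[] args) throws Exception {
        ArrayList<SimpleDocument> corpus = new ArrayList<>();
        corpus.add(new SimpleDocument("t1", "b1"));
        byte[] data = Persistence.save(corpus); // ArrayList itself is Serializable
        @SuppressWarnings("unchecked")
        List<SimpleDocument> restored = (List<SimpleDocument>) Persistence.load(data);
        System.out.println(restored.get(0).title); // prints "t1"
    }
}
```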


Able to visualize in-memory objects


STAT (brief) Domain Model

Note: Labels on connectors are omitted for brevity, and some connections are not drawn because of space limitations.


STAT framework sample code (conceptual)
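As a conceptual sketch of how the domain concepts described in this report could be wired together, the Java below passes a tiny corpus through a Processor, Trainer, Classifier, and Evaluator. All type names and signatures are assumptions for illustration. The Trainer is a majority-class baseline chosen only for brevity, and the sketch evaluates on the training data purely to show the data flow; a real experiment would evaluate on a held-out test split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for RawDocument: unprocessed text plus a gold label (strings only).
class RawDoc {
    final String text;
    final String label;
    RawDoc(String text, String label) { this.text = text; this.label = label; }
}

// Hypothetical stand-in for Document: processed tokens plus the gold label.
class Doc {
    final List<String> tokens;
    final String label;
    Doc(List<String> tokens, String label) { this.tokens = tokens; this.label = label; }
}

// Model: the parameters a Trainer learns (here, only the most frequent label).
class Model {
    final String majorityLabel;
    Model(String majorityLabel) { this.majorityLabel = majorityLabel; }
}

interface Processor { List<Doc> process(List<RawDoc> rawCorpus); }
interface Trainer { Model train(List<Doc> corpus); }
interface Classifier { List<String> classify(List<Doc> corpus, Model model); }
interface Evaluator { double evaluate(List<String> prediction, List<Doc> corpus); }

class Pipeline {
    // RawCorpus -> Processor -> Corpus -> Trainer -> Model -> Classifier -> Prediction
    // -> Evaluator; returns the accuracy as a one-number Evaluation.
    static double run(List<RawDoc> rawCorpus) {
        Processor tokenizer = rawDocs -> {
            List<Doc> out = new ArrayList<>();
            for (RawDoc r : rawDocs) {
                out.add(new Doc(Arrays.asList(r.text.toLowerCase().split("\\s+")), r.label));
            }
            return out;
        };
        Trainer majority = docs -> {
            Map<String, Integer> counts = new HashMap<>();
            for (Doc d : docs) counts.merge(d.label, 1, Integer::sum);
            String best = null;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (best == null || e.getValue() > counts.get(best)) best = e.getKey();
            }
            return new Model(best);
        };
        Classifier classifier = (docs, m) -> {
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) labels.add(m.majorityLabel);
            return labels;
        };
        Evaluator accuracy = (predicted, docs) -> {
            int correct = 0;
            for (int i = 0; i < docs.size(); i++) {
                if (predicted.get(i).equals(docs.get(i).label)) correct++;
            }
            return docs.isEmpty() ? 0.0 : (double) correct / docs.size();
        };

        List<Doc> corpus = tokenizer.process(rawCorpus);
        Model model = majority.train(corpus);
        List<String> prediction = classifier.classify(corpus, model);
        return accuracy.evaluate(prediction, corpus);
    }

    public static void main(String[] args) {
        List<RawDoc> raw = Arrays.asList(
                new RawDoc("Stocks rise", "finance"),
                new RawDoc("Bonds fall", "finance"),
                new RawDoc("Team wins", "sports"));
        System.out.println(run(raw));
    }
}
```

Because each stage depends only on an interface, a real Tokenizer, learning algorithm, or metric could replace the lambdas without changing `run`.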


Domain Concept: RawCorpus

A collection of RawDocument objects, supporting collection operations:
- Add a new RawDocument element
- Remove an existing RawDocument element
- Access elements in the collection
- …


Domain Concept: RawCorpus

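import java.util.ArrayList;
import java.util.List;

// T is the concrete RawDocument subclass stored in this corpus.
abstract class RawCorpus<T extends RawDocument> {
    protected final List<T> rawDocuments = new ArrayList<>();
    void addDocument(T doc) { rawDocuments.add(doc); }
    T getDocument(int index) { return rawDocuments.get(index); }
    void setDocument(int index, T doc) { rawDocuments.set(index, doc); }
    void removeDocument(int index) { rawDocuments.remove(index); }
}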


Domain Concept: RawDocument

An object with one or more string fields, serving as an unprocessed, in-memory representation of a document unit:
- Like a Java bean, with getters and setters
- All fields must be of type String, even numeric ones


Domain Concept: RawDocument

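// A concrete RawDocument: every field is a String, even numeric ones such as numOfClicks.
class MyRawDocument extends RawDocument {
    String title;
    String author;
    String body;
    String date;
    String numOfClicks;
    String topicType;
    // …
}

// Base class for all raw documents.
abstract class RawDocument {
    public RawDocument() {}
}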


Domain Concept: Processor

An object that processes a RawCorpus and produces a Corpus, for example:
- Linguistic: Tokenizer, Stemmer, StopRemover, PosTagger, …
- Machine learning: feature-specific, document-specific
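A minimal sketch of linguistic processors, assuming a hypothetical `TokenProcessor` interface that works on plain string lists rather than full RawCorpus and Corpus objects: a `Tokenizer`, a `StopRemover` with a caller-supplied stop list, and a `ProcessorChain` that applies them in order. A Stemmer or PosTagger could be added as another `TokenProcessor` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical processor interface: each step maps a list of strings to a new list.
interface TokenProcessor {
    List<String> apply(List<String> input);
}

// Lower-cases each input text and splits it on runs of non-letter, non-digit characters.
class Tokenizer implements TokenProcessor {
    @Override
    public List<String> apply(List<String> texts) {
        List<String> out = new ArrayList<>();
        for (String text : texts) {
            for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!piece.isEmpty()) out.add(piece);
            }
        }
        return out;
    }
}

// Drops tokens that appear in the stop list supplied by the caller.
class StopRemover implements TokenProcessor {
    private final Set<String> stopWords;

    StopRemover(Set<String> stopWords) { this.stopWords = stopWords; }

    @Override
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) out.add(t);
        }
        return out;
    }
}

// Runs processors in order, feeding each step's output into the next.
class ProcessorChain {
    private final List<TokenProcessor> steps;

    ProcessorChain(List<TokenProcessor> steps) { this.steps = steps; }

    List<String> run(List<String> input) {
        List<String> current = input;
        for (TokenProcessor step : steps) current = step.apply(current);
        return current;
    }
}

class ProcessorDemo {
    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "of")); // illustrative stop list
        ProcessorChain chain = new ProcessorChain(
                Arrays.<TokenProcessor>asList(new Tokenizer(), new StopRemover(stop)));
        System.out.println(chain.run(Arrays.asList("The cat, the HAT!"))); // prints [cat, hat]
    }
}
```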


Domain Concept: Corpus

An object representing a collection of Document objects for use by the machine learning side of the framework. It provides the commonly used notion of splits (e.g., train and test).
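The notion of splits could look like the following sketch. The generic `Corpus` class, its `randomSplit` and `getSplit` methods, and the split names "train" and "test" are assumptions for illustration; a split is simply a named list of document indices, produced by a seeded shuffle so that experiments are reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical Corpus sketch: documents plus named splits that store document indices.
class Corpus<D> {
    private final List<D> documents;
    private final Map<String, List<Integer>> splits = new HashMap<>();

    Corpus(List<D> documents) { this.documents = new ArrayList<>(documents); }

    int size() { return documents.size(); }

    // Shuffles the indices with a fixed seed, then assigns the first trainFraction
    // of them to "train" and the remainder to "test".
    void randomSplit(double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(trainFraction * indices.size());
        splits.put("train", new ArrayList<>(indices.subList(0, cut)));
        splits.put("test", new ArrayList<>(indices.subList(cut, indices.size())));
    }

    // Returns the documents that belong to a named split.
    List<D> getSplit(String name) {
        List<Integer> idx = splits.get(name);
        if (idx == null) throw new IllegalArgumentException("unknown split: " + name);
        List<D> out = new ArrayList<>();
        for (int i : idx) out.add(documents.get(i));
        return out;
    }
}

class CorpusDemo {
    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) docs.add("doc" + i);
        Corpus<String> corpus = new Corpus<>(docs);
        corpus.randomSplit(0.8, 42L);
        System.out.println(corpus.getSplit("train").size() + " train / "
                + corpus.getSplit("test").size() + " test"); // prints "8 train / 2 test"
    }
}
```

Storing indices rather than copies keeps one authoritative list of documents, so other split schemes (e.g., cross-validation folds) could be added as more named index lists.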


Domain Concept: Trainer

A representation of a machine learning algorithm, which can learn from a Corpus and produce a Model.


Domain Concept: Model

An object that a machine learning algorithm (i.e., a Trainer) creates to store the parameters "learned" from the data (i.e., a Corpus).
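To make the Trainer/Model separation concrete, the sketch below assumes multinomial naive Bayes with add-one (Laplace) smoothing as the learning algorithm. `NaiveBayesTrainer`, `NaiveBayesModel`, and `LabeledDoc` are hypothetical names. The Trainer does all the counting, while the Model only holds the learned parameters: the log prior log P(c) and the smoothed log likelihood log P(t | c), where P(t | c) = (count(t, c) + 1) / (number of tokens in class c + |V|) and |V| is the vocabulary size.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical training example: a token list plus its gold label.
class LabeledDoc {
    final List<String> tokens;
    final String label;
    LabeledDoc(List<String> tokens, String label) { this.tokens = tokens; this.label = label; }
}

// Model: stores only the learned parameters (log priors and log likelihoods).
class NaiveBayesModel {
    final Map<String, Double> logPrior = new HashMap<>();
    final Map<String, Map<String, Double>> logLikelihood = new HashMap<>();
}

// Trainer: multinomial naive Bayes with add-one (Laplace) smoothing.
class NaiveBayesTrainer {
    NaiveBayesModel train(List<LabeledDoc> corpus) {
        Map<String, Integer> docCount = new HashMap<>();
        Map<String, Map<String, Integer>> tokenCount = new HashMap<>();
        Map<String, Integer> totalTokens = new HashMap<>();
        Set<String> vocabulary = new HashSet<>();

        for (LabeledDoc d : corpus) {
            docCount.merge(d.label, 1, Integer::sum);
            Map<String, Integer> counts = tokenCount.computeIfAbsent(d.label, k -> new HashMap<>());
            for (String t : d.tokens) {
                counts.merge(t, 1, Integer::sum);
                totalTokens.merge(d.label, 1, Integer::sum);
                vocabulary.add(t);
            }
        }

        NaiveBayesModel model = new NaiveBayesModel();
        int v = vocabulary.size();
        for (String label : docCount.keySet()) {
            model.logPrior.put(label, Math.log((double) docCount.get(label) / corpus.size()));
            double denom = totalTokens.getOrDefault(label, 0) + v;
            Map<String, Double> ll = new HashMap<>();
            for (String t : vocabulary) {
                int c = tokenCount.get(label).getOrDefault(t, 0);
                ll.put(t, Math.log((c + 1.0) / denom));
            }
            model.logLikelihood.put(label, ll);
        }
        return model;
    }
}

class TrainerDemo {
    public static void main(String[] args) {
        List<LabeledDoc> corpus = Arrays.asList(
                new LabeledDoc(Arrays.asList("good", "great"), "pos"),
                new LabeledDoc(Arrays.asList("bad"), "neg"),
                new LabeledDoc(Arrays.asList("good"), "pos"));
        NaiveBayesModel model = new NaiveBayesTrainer().train(corpus);
        // With |V| = 3: P(good | pos) = (2 + 1) / (3 + 3) = 0.5.
        System.out.println(Math.exp(model.logLikelihood.get("pos").get("good")));
    }
}
```

Keeping the Model free of training logic means it could be persisted and later handed to a Classifier without the Trainer or the training Corpus.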


Domain Concept: Classifier

An object that maps Documents to target values (label, number, probability). It takes a Corpus and a Model as inputs and produces a Prediction for the Corpus according to the Model.
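The Classifier's contract (Corpus and Model in, Prediction out) can be sketched independently of how the Model was trained. This sketch assumes a hypothetical linear Model with per-label token weights plus a per-label bias; `LinearModel` and `LinearClassifier` are illustrative names, each document is reduced to a token list, and the Prediction is a list of labels aligned with the input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical linear Model: per-label token weights plus a per-label bias.
class LinearModel {
    final Map<String, Map<String, Double>> weights = new HashMap<>();
    final Map<String, Double> bias = new HashMap<>();
}

// Classifier: assigns each document the label with the highest score under the Model.
// The returned list is the Prediction, in the same order as the input documents.
class LinearClassifier {
    List<String> classify(List<List<String>> corpus, LinearModel model) {
        List<String> prediction = new ArrayList<>();
        for (List<String> doc : corpus) {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Double> b : model.bias.entrySet()) {
                String label = b.getKey();
                Map<String, Double> w = model.weights.getOrDefault(label, Collections.emptyMap());
                double score = b.getValue();
                for (String token : doc) score += w.getOrDefault(token, 0.0);
                if (score > bestScore) { // ties keep whichever label was seen first
                    bestScore = score;
                    best = label;
                }
            }
            prediction.add(best);
        }
        return prediction;
    }
}

class ClassifierDemo {
    public static void main(String[] args) {
        LinearModel model = new LinearModel(); // hand-set weights for illustration only
        model.bias.put("pos", 0.0);
        model.bias.put("neg", 0.0);
        model.weights.put("pos", new HashMap<>(Collections.singletonMap("good", 1.0)));
        model.weights.put("neg", new HashMap<>(Collections.singletonMap("bad", 1.0)));
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("good", "movie"), Arrays.asList("bad", "plot"));
        System.out.println(new LinearClassifier().classify(corpus, model)); // prints [pos, neg]
    }
}
```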


Domain Concept: Prediction

A collection of target values (label, number, probability) associated with a Corpus, i.e., a collection of Document objects.


Domain Concept: Evaluator

An object that compares a Prediction against its associated Corpus and generates an Evaluation.


Domain Concept: Evaluation

A summarized representation of the evaluation result produced by an Evaluator.
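An Evaluator and its summarized Evaluation might look like the sketch below, which assumes single-label classification and reports accuracy plus per-label precision and recall. `Evaluator` and `Evaluation` follow the domain-model names, but their fields and methods are assumptions; the gold labels are passed as a list assumed to be taken from the Corpus, aligned with the Prediction.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Evaluation: a summary of how well a Prediction matches the gold labels.
class Evaluation {
    final double accuracy;
    final Map<String, Double> precision;
    final Map<String, Double> recall;

    Evaluation(double accuracy, Map<String, Double> precision, Map<String, Double> recall) {
        this.accuracy = accuracy;
        this.precision = precision;
        this.recall = recall;
    }

    @Override
    public String toString() {
        return "accuracy=" + accuracy + " precision=" + precision + " recall=" + recall;
    }
}

// Evaluator: compares predicted labels with the gold labels, position by position.
class Evaluator {
    Evaluation evaluate(List<String> predicted, List<String> gold) {
        if (predicted.size() != gold.size()) {
            throw new IllegalArgumentException("prediction and corpus sizes differ");
        }
        Set<String> labels = new TreeSet<>(gold); // sorted for a stable report order
        labels.addAll(predicted);

        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (predicted.get(i).equals(gold.get(i))) correct++;
        }

        Map<String, Double> precision = new LinkedHashMap<>();
        Map<String, Double> recall = new LinkedHashMap<>();
        for (String label : labels) {
            int tp = 0, predictedAsLabel = 0, goldIsLabel = 0;
            for (int i = 0; i < gold.size(); i++) {
                boolean p = predicted.get(i).equals(label);
                boolean g = gold.get(i).equals(label);
                if (p) predictedAsLabel++;
                if (g) goldIsLabel++;
                if (p && g) tp++;
            }
            // Assumed convention: a metric with an empty denominator is reported as 0.0.
            precision.put(label, predictedAsLabel == 0 ? 0.0 : (double) tp / predictedAsLabel);
            recall.put(label, goldIsLabel == 0 ? 0.0 : (double) tp / goldIsLabel);
        }
        double accuracy = gold.isEmpty() ? 0.0 : (double) correct / gold.size();
        return new Evaluation(accuracy, precision, recall);
    }
}

class EvaluatorDemo {
    public static void main(String[] args) {
        List<String> gold = Arrays.asList("pos", "pos", "neg", "neg");
        List<String> predicted = Arrays.asList("pos", "neg", "neg", "neg");
        System.out.println(new Evaluator().evaluate(predicted, gold));
    }
}
```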

THE STAT PROJECT

Thanks

STAT (brief) Domain Model

Figure: STAT (brief) domain model with the concepts Reader, Writer, RawCorpus, Processor, Corpus, Vocabulary, Trainer, Model, Classifier, Prediction, Evaluator, and Evaluation.

Note: Labels on connectors are omitted for brevity, and some connections are not drawn because of space limitations.

STAT Domain Model

Figure: STAT domain model with the concepts Reader, Writer, RawCorpus, Processor, Corpus, Trainer, Model, Classifier, Prediction, Evaluator, and Evaluation.

Note: Labels above the connecting lines are omitted for brevity.

STAT Domain Model

Figure: STAT domain model with the concepts Reader, RawDocument, RawCorpus, Processor, Document, Corpus, Trainer, Model, Classifier, Prediction, Evaluator, and Evaluation.

Note: Labels above the connecting lines are omitted for brevity.