Research Methods for Computational Statistics

Lecture notes for STIS students

Research Methods for Computational Statistics (Metodologi Penelitian Komputasi Statistik)

Setia Pramana

Educational Background

Hasselt Universiteit, Belgium, MSc in Applied Statistics 2005-2006.

Hasselt Universiteit, Belgium, MSc in Biostatistics 2006-2007.

Hasselt Universiteit, Belgium, PhD Statistical Bioinformatics, 2007-2011.

Medical Epidemiology And Biostatistics Dept. Karolinska Institutet, Sweden, Postdoctoral, 2011-2014

Biostatistics

The study of statistics as applied to biological areas such as biological laboratory experiments, medical research (including clinical research), and public health services research.

“Biostatistics, far from being an unrelated mathematical science, is a discipline essential to modern medicine – a pillar in its edifice” (Journal of the American Medical Association, 1966).

Bioinformatics

Bioinformatics is a science straddling the domains of biomedicine, informatics, mathematics, and statistics.

Applying computational techniques to biological data

Functional Genomics

Proteomics

Sequence Analysis

Phylogenetics

Etc.

“Informatics” in Bioinformatics

Databases: building, querying, object DB

Text string comparison: text search

Finding patterns: AI / machine learning, clustering, data mining

etc

Current Research

Statistical methods for high-throughput data analyses particularly in Next generation sequencing (NGS) data (Whole genome-seq, Exome-seq and RNA-seq).

RNA microarray expression studies and GWAS in cancer and cardiovascular diseases.

Classification in NGS data.

R-Graphical User Interface (R-GUI) for high-throughput data analyses.

Course Outline

Basic concepts of research

Problem identification and hypothesis

Literature Review

Research Design

Quantitative research

Writing a scientific report/paper

Survival Data Analysis

Course Workload

40% Theory, 60% practice

Group Project (5 students)

Presentation every week

Slides can be seen at : http://www.slideshare.net/hafidztio/

Setia Pramana

Research

An organized, systematic, data-based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the purpose of finding answers or solutions to it.

It provides the needed information that guides managers to make informed decisions to successfully deal with problems.

The information provided could be the result of a careful analysis of data gathered firsthand or of data that are already available (in the company, industry, archives, etc.).

Purposes of Research

Review or synthesize existing knowledge

Investigate existing situations or problems

Provide solutions to problems

Explore and analyze more general issues

Construct or create new procedures or systems

Explain new phenomena

Generate new knowledge

or a combination of any of the above!

Research Outcome

1. Product or Innovation directly used by Industry

2. Patent

3. International Publication

Types of Research, by Purpose

Basic Research

Applied Research

Evaluation Research

Research and Development

Types of Research, by Time

Cross-Sectional Research

Longitudinal Research

Types of Research, by Method

Quantitative research:

- Descriptive research

- Correlational research

- Causal-comparative research

- Experimental research

- Single-subject research

Qualitative Research:

Narrative research

Deductive Reasoning

Starts out with a general statement, or hypothesis, and examines the possibilities to reach a specific, logical conclusion.

The scientific method uses deduction to test hypotheses and theories.

Ex: "All men are mortal. Harold is a man. Therefore, Harold is mortal."

[Diagram: Theory → Hypothesis → Observation → Confirmation]

Inductive Reasoning

The opposite of deductive reasoning.

Makes broad generalizations from specific observations.

Ex: "Harold is a grandfather. Harold is bald. Therefore, all grandfathers are bald."

[Diagram labels: Theory, Tentative Hypothesis, Pattern, Confirmation]

Deductive/Inductive Research

Basic Steps

1. Develop a research question

2. Conduct thorough literature review

3. Refine the research question/hypothesis

4. Design research methodology/study

5. Create research proposal

6. Apply for funding

7. Apply for ethics approval

8. Collect and analyze data / develop and test software

9. Draw conclusions and relate findings

Research Question Development

Problem Identification

Limit the research scope

Research Question Identification

Goals Identification

Hypothesis

Statistical Hypothesis

Hypothetical Statement

Building block of Science

Possible Source of RQs

Observational Research

Discussions, brainstorming

Experts, academics and industry

Bibliographies, journals, research reports, popular science magazines, etc.

A Research Question Should

Have research value: Original, can be tested/evaluated.

Be feasible: can be answered, data are available, and the work can be completed within the available budget and time.

Match the researcher's qualifications.

FINER criteria for RQ

F (Feasible): adequate number of subjects; adequate technical expertise; affordable in time and money; manageable in scope

I (Interesting): getting the answer intrigues the investigator, peers, and community

N (Novel): confirms, refutes, or extends previous findings

E (Ethical): amenable to a study that the institutional review board will approve

R (Relevant): to scientific knowledge; to clinical and health policy; to future research

Hulley S, Cummings S, Browner W, et al. Designing clinical research. 3rd ed. Philadelphia (PA): Lippincott Williams and Wilkins; 2007.

Research Hypothesis

Hypothesis Definition

The primary research question should be driven by the hypothesis rather than the data.

The research question and hypothesis should be developed before the start of the study.

A good hypothesis must be based on a good research question at the start of a study and drive data collection for the study.

Hypothesis

Is a clear statement of what is intended to be investigated.

It should be specified before research is conducted and openly stated in reporting the results.

It allows one to identify:

- the research objectives

- the key abstract concepts involved in the research

- its relationship to both the problem statement and the literature review

Source of Hypothesis

Environment

Literature

Other Empirical Data

Personal Experience

Type of Hypothesis

Null Hypothesis

Alternative Hypothesis

Example

There is no significant gain between the pre-test and post-test scores of students exposed to Computer-Aided Instruction in Analytic Geometry.

Special Consideration for Null Hypothesis

Hypothesis Testing:

1-sided or 2-sided hypotheses?

A 2-sided hypothesis states that there is a difference between the experimental group and the control group, but it does not specify in advance the expected direction of the difference.

A 1-sided hypothesis states a specific direction (e.g., there is an improvement in outcomes with computer-assisted surgery).

A 2-sided hypothesis should be used unless there is a good justification for using a 1-sided hypothesis.
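To make the 1-sided versus 2-sided distinction concrete, the following is a minimal sketch in Python using an exact sign-flip (permutation) test on paired pre-test/post-test differences. The test choice, the scores, and the helper name `sign_flip_p_values` are assumptions made for illustration; they are not taken from the lecture.

```python
from itertools import product

def sign_flip_p_values(pre, post):
    """Exact sign-flip test for paired data (hypothetical helper).

    Under the null hypothesis of no gain, each paired difference is
    equally likely to be positive or negative, so every sign pattern
    is enumerated and its sum compared with the observed sum.
    Returns (one_sided_p, two_sided_p); the 1-sided alternative is
    "post-test scores are higher".
    """
    diffs = [b - a for a, b in zip(pre, post)]
    observed = sum(diffs)
    greater = 0   # sign patterns with sum >= observed (1-sided)
    extreme = 0   # sign patterns with |sum| >= |observed| (2-sided)
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        s = sum(sign * d for sign, d in zip(signs, diffs))
        total += 1
        if s >= observed:
            greater += 1
        if abs(s) >= abs(observed):
            extreme += 1
    return greater / total, extreme / total

# Invented scores for 8 students (illustrative data only).
pre_scores = [60, 55, 70, 65, 58, 62, 68, 59]
post_scores = [66, 57, 75, 64, 63, 70, 71, 65]

p_one, p_two = sign_flip_p_values(pre_scores, post_scores)
print(f"1-sided p = {p_one:.4f}, 2-sided p = {p_two:.4f}")
```

Because the sign-flip distribution is symmetric, the 2-sided p-value is twice the 1-sided one here, which is why a 1-sided test needs a justification made before seeing the data.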

Error Type

Research objective

The primary objective should be coupled with the hypothesis of the study.

Study objectives define the specific aims of the study and should be clearly stated in the introduction of the research protocol.

Example: if the hypothesis is that there is no difference in functional outcomes between computer-assisted acetabular component placement and free-hand placement, the primary objective can be stated as follows: this study will compare the functional outcomes of computer-assisted acetabular component insertion versus free-hand placement in patients undergoing total hip arthroplasty.

The study objective is an active statement about how the study is going to answer the specific research question.

Objectives state exactly which outcome measures are going to be used within their statements.

They are important to not only guide the development of the protocol and design of study but also play a role in sample size calculations and determining the power of the study.
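As an illustration of how the objective feeds into a sample size calculation, the sketch below uses the standard normal-approximation formula for comparing two group means with a 2-sided test. The function name `sample_size_per_group` and the input values (a difference of 5 points, a standard deviation of 10) are hypothetical; a real study would take them from the chosen outcome measure and prior literature.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for comparing two means (2-sided test).

    Normal-approximation formula:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / delta**2
    delta is the smallest difference worth detecting; sd is the
    assumed common standard deviation of the outcome.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)

# Assumed inputs for illustration only.
n_per_group = sample_size_per_group(delta=5.0, sd=10.0)
print(n_per_group)
```

Halving the difference to be detected roughly quadruples the required sample size, which is one reason the outcome measure should be fixed when the objective is written.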

Literature Review

Is an evaluative report of studies found in the literature related to your selected area.

Should describe, summarize, evaluate and clarify this literature.

Should give a theoretical basis for the research and help you determine the nature of your own research.

Select a limited number of works that are central to your area rather than trying to collect a large number of works that are not as closely connected to your topic area.

Boote, D.N. & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational Researcher 34/6, 3-15.

Literature Review Purpose

Provide a context for the research

Justify the research

Ensure the research hasn't been done before (or that it is not just a "replication study")

Show where the research fits into the existing body of knowledge

Enable the researcher to learn from previous theory on the subject

Illustrate how the subject has been studied previously

Highlight flaws in previous research

Outline gaps in previous research

Show that the work is adding to the understanding and knowledge of the field

Help refine, refocus or even change the topic

Strategies

Kirby, S., Greaves, L. & Reid, C. (2006). Searching the Literature. In Experience research social change: Methods beyond the mainstream

Literature Review in a thesis

The cycle

Hasibuan, 2007, Metode Penelitian Komputasi

What you should do

Compare

Contrast

Criticize

Synthesize

Summarize

Hasibuan, 2007, Metode Penelitian Komputasi

Sources

Articles in international journals

Theses

Dissertations

Proceedings

Magazines

Abstract books

Websites

Literature Citation

Whenever you quote, summarize, paraphrase or refer to the work of another person you need to cite it.

This gives credit to the original author for any information that you learn through your research process and share with readers.

Citing is the way to give credit to others' work when you use it in your papers, speeches, and projects.

Citing others' work is a very important step in the academic writing process and the best way to avoid plagiarism.

Two ways:

- Use a sentence that introduces the author

- Add the author’s name at the end of the sentence

We must provide the last name and year of publication.

Paraphrase with a signal phrase:

“According to Smith (2004), the cost of treating alcoholism is increasing dramatically.”

Direct quote:

“The cost of treating alcoholism is exceeded only by the cost of treating illness from tobacco use, and is increasing exponentially” (Smith, 2004).

Research Design

A plan or strategy for conducting the research

Spells out the basic strategies that researchers adopt to develop evidence that is accurate and interpretable.

Deals with matters such as selecting participants for the research and preparing for data collection.

Purposes of Research Design

1. To provide answers to research questions

2. To control variance

Characteristics for good research design

1. Freedom from bias

2. Freedom from confounding

3. Control of extraneous variables

4. Statistical correctness for testing hypothesis

TYPES OF RESEARCH

1. Experimental research – involves manipulating conditions and studying effects (IPO: Input-Process-Output).

2. Correlational research – involves studying relationships among variables within a single group, and frequently suggests the possibility of cause and effect.

3. Survey research – involves describing the characteristics of a group by means of such instruments as interview schedules, questionnaires, and tests.

4. Ethnographic research – concentrates on documenting or portraying the everyday experiences of people using observation and interviews.

Involves how well, how much, or how efficiently something is done, and what knowledge, attitudes, opinions, or the like exist.

5. Case study – is a detailed analysis of one or a few individuals.

6. Historical research – involves studying some aspect of the past.

7. Action research – is a type of research by practitioners designed to help improve their practice.

GENERAL RESEARCH TYPES

It is useful to consider the various research methodologies we have described as falling within one or more general research categories:

Descriptive

Associational

Intervention-type Studies

1. DESCRIPTIVE STUDIES

They describe a given state of affairs as fully and carefully as possible.

Examples:

- In biology, where each variety of plant and animal species is meticulously described and information is organized into useful taxonomic categories.

- In educational research, the most common descriptive methodology is the survey, as when researchers summarize the characteristics (abilities, preferences, behaviors, and so on) of individuals or groups, or of a physical environment (a school).

2. ASSOCIATIONAL RESEARCH

Research that investigates relationships is often referred to as associational research.

Correlational and causal-comparative methodologies are the principal examples of associational research.

Example: studying the relationship

(a) between achievement and attitude

(b) between childhood experiences and adult characteristics

(c) between teacher characteristics and student achievement

(d) between methods of instruction and achievement (comparing students who have been taught by each method)

(e) between gender and attitude (comparing attitudes of males and females)
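A correlational question such as example (a) is usually summarized with a correlation coefficient. The following is a minimal sketch of the Pearson coefficient in plain Python; the `pearson_r` helper and the six pairs of scores are invented for illustration and do not come from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical attitude and achievement scores for six students.
attitude = [3, 5, 2, 4, 1, 5]
achievement = [65, 80, 60, 72, 55, 85]

r = pearson_r(attitude, achievement)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear association, but on its own it does not establish cause and effect.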

Descriptive research alone is not satisfying, since most researchers want a more complete understanding of people and things, which requires further analysis beyond mere description.

Associational studies, too, are ultimately unsatisfying:

- because they do not permit researchers to “do something” to influence or change outcomes.

- Simply determining interest or achievement of students does not tell us how to change or improve either interest or achievement.

3. INTERVENTION STUDIES

To find out whether one thing will have an effect on something else, researchers need to conduct some form of intervention study.

A particular treatment is expected to influence one or more outcomes.

Such studies enable researchers to assess, for example:

- the effectiveness of various teaching methods,

- curriculum models,

- classroom arrangements

- and other efforts to influence the characteristics of individuals or groups.

The experiment is the primary methodology used in intervention research.

Some types of research may combine these three general types.

Quantitative vs. qualitative research

Goals

- Quantitative: theory testing, establishing facts, statistical description, prediction, relationships between variables

- Qualitative: sensitizing concepts, describing multiple realities, grounded theory, developing understanding

Design

- Quantitative: structured, predetermined, formal, specific detailed plan of operation

- Qualitative: evolving, flexible

Data

- Quantitative: quantitative, quantifiable coding, counts, measures, operationalized variables, statistics

- Qualitative: descriptive, personal documents, field notes, photographs, people’s own words, official documents

Sample

- Quantitative: large, stratified, control groups, precise, random, control of extraneous variables

- Qualitative: small, non-representative, focused, purposeful, convenient

Techniques or methods

- Quantitative: experiments, surveys, structured interviewing, structured observation

- Qualitative: observation, participant observation, review of documents, open-ended interviewing, first-person accounts

Relationship with subjects

- Quantitative: detached, short term, distant, subject-researcher restricted

- Qualitative: empathy, emphasis on trust, democratic

Data analysis

- Quantitative: deductive, statistical

- Qualitative: ongoing, models, themes, concepts, inductive, analytic, constant comparative

Problems

- Quantitative: controlling other variables, validity, reliability

- Qualitative: time consuming, data reduction difficulties, procedures not standardized, difficult to study large populations

Research Types under Quantitative & Qualitative

Quantitative

1. Experimental Research

2. Single-Subject Research

3. Correlational Research

4. Causal-Comparative Research

5. Survey Research

Qualitative

1. Ethnographic Research

2. Historical Research

IDENTIFY WHAT TYPE OF RESEARCH

A historical study of college entrance requirements over time that examines the relationship between those requirements and achievement in mathematics.

An ethnographic study that describes in detail the daily activities of an inner-city high school and also finds a relationship between media attention and teacher morale in the school.

An investigation of the effects of different teaching methods on concept learning and gender.

We can classify designs into a simple threefold classification by asking some key questions.

This threefold classification is especially useful for describing the design with respect to internal validity.

A randomized experiment generally is the strongest of the three designs when your interest is in establishing a cause-effect relationship.

A non-experiment is generally the weakest, but only with respect to internal validity or causal assessment.

In fact, the simplest form of non-experiment is a one-shot survey design that consists of nothing but a single observation O.

The most common forms of research are descriptive ones.
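What makes an experiment "randomized" is that chance alone decides which participants receive the treatment. The sketch below shows one simple way to do this in Python; the `randomly_assign` helper, the participant IDs, and the seed are hypothetical and serve only as an illustration.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of
    (nearly) equal size; with an odd count the control group
    receives the extra participant."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs.
students = [f"S{i:02d}" for i in range(1, 21)]
groups = randomly_assign(students, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Fixing the seed makes the assignment reproducible for auditing; in a real study the seed would be recorded in the protocol rather than chosen after looking at the groups.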

What research type would be appropriate for these research problems?

1. How do parents feel about the elementary school counseling program?

2. Do students who have high scores on reading tests also have high scores on writing tests?

3. What effect does the gender of a counselor have on how he or she is “received by counselees”?

4. How can Tom Adams be helped to learn to read?

ANSWER

1. ETHNOGRAPHIC STUDY

2. CORRELATIONAL STUDY

3. CAUSAL-COMPARATIVE STUDY/INTERVENTION STUDY

4. EXPERIMENT/CORRELATIONAL OR ASSOCIATIONAL-INTERVENTION STUDY

Sampling Methods

What exactly IS a “sample”?

A subset of the population, selected by either “probability” or “non-probability” methods. If you have a “probability sample”, you simply know the likelihood of any member of the population being included (not necessarily that it is “random”).

SAMPLING

A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005)

Why sample?

- Resources (time, money) and workload

- Gives results with known accuracy that can be calculated mathematically

The sampling frame is the list from which the potential respondents are drawn

- Registrar’s office

- Class rosters

- Must assess sampling frame errors

Three factors that influence sample representativeness

- Sampling procedure

- Sample size

- Participation (response)

When might you sample the entire population?

- When your population is very small

- When you have extensive resources

- When you don’t expect a very high response

Assumptions of quantitative sampling

We want to generalize to the population.

Random events are predictable.

Therefore…We can compare random events to our results.

Probability sampling is the best approach.

SAMPLING BREAKDOWN

TARGET POPULATION

STUDY POPULATION

SAMPLE

Process

The sampling process comprises several stages:

- Defining the population of concern

- Specifying a sampling frame, a set of items or events possible to measure

- Specifying a sampling method for selecting items or events from the frame

- Determining the sample size

- Implementing the sampling plan

- Sampling and data collecting

- Reviewing the sampling process

Assumptions of qualitative sampling

Social actors are not predictable like objects.

Randomized events are irrelevant to social life.

Probability sampling is expensive and inefficient.

Therefore…

Non-probability sampling is the best approach.

Types of samples

Probability (Random) Samples

- Simple random sample

- Systematic random sample

- Stratified random sample

- Multistage sample

- Multiphase sample

- Cluster sample

Non-Probability Samples

- Convenience sample

- Purposive sample

- Quota sample

Simple Random Sample

1. Get a list or “sampling frame”

a. This is the hard part! It must not systematically exclude anyone.

b. Remember the famous sampling mistake?

2. Generate random numbers

3. Select one person per random number
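The three steps above can be sketched in a few lines of Python. The frame of 200 student IDs, the sample size of 20, and the `simple_random_sample` helper are assumptions for illustration; `random.sample` draws without replacement, so every unit in the frame has the same chance of selection.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit in the frame has
    inclusion probability n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 200 student IDs.
frame = [f"student_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, n=20, seed=1)
inclusion_probability = 20 / len(frame)
print(sample[:3], inclusion_probability)
```

The code is the easy part; as step 1 notes, the quality of the sample depends on the frame not systematically excluding anyone.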

SIMPLE RANDOM SAMPLING

Estimates are easy to calculate.

Simple random sampling is always an equal probability of selection (EPS) design, but not all EPS designs are simple random sampling.

Disadvantages

If the sampling frame is large, this method is impracticable.

Minority subgroups of interest in population may not be present in sample in sufficient numbers for study.

Systematic Random Sample

1. Select a random number, which will be known as k

2. Get a list of people, or observe a flow of people (e.g., pedestrians on a corner)

3. Select every kth person

a. Be careful that there is no systematic rhythm to the flow or list of people.

b. If every 4th person on the list is, say, “rich” or “senior” or some other consistent pattern, avoid this method
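A minimal sketch of systematic selection follows. It assumes a common textbook variant in which k is the fixed sampling interval and the starting position is chosen at random among the first k units; the slide's wording about choosing k differs slightly, so treat this as one possible reading. The `systematic_sample` helper and the list of 100 people are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th unit after a random start among the first k units."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start position in 0..k-1
    return frame[start::k]

# Hypothetical ordered list of 100 people, numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=7)
print(sample)
```

If the list has a hidden cycle whose length matches k (the "every 4th person" warning above), every selected unit falls at the same point of the cycle and the sample is biased.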

SYSTEMATIC SAMPLING

ADVANTAGES:

Sample easy to select

Suitable sampling frame can be identified easily

Sample evenly spread over entire reference population

DISADVANTAGES:

Sample may be biased if hidden periodicity in population coincides with that of selection.

Difficult to assess precision of estimate from one survey.

Stratified Random Sample

1. Separate your population into groups or “strata”

2. Do either a simple random sample or systematic random sample from there

a. Note you must know easily what the “strata” are before attempting this

b. If your sampling frame is sorted by, say, school district, then you’re able to use this method
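The two steps can be sketched as follows: group the frame by stratum, then draw a simple random sample inside each group. The sketch assumes proportional allocation (the same sampling fraction in every stratum), which is only one of several allocation choices; the `stratified_sample` helper and the district data are invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Simple random sample of the same fraction within every stratum
    (at least one unit per stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical frame: (student_id, school_district) pairs.
frame = [(i, "north") for i in range(60)] + [(i, "south") for i in range(60, 100)]
sample = stratified_sample(frame, stratum_of=lambda u: u[1], fraction=0.1, seed=3)
print(sample)
```

With a 10% fraction the sample contains 6 units from the larger district and 4 from the smaller one, so each district is represented in proportion to its size.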

STRATIFIED SAMPLING

Drawbacks to using stratified sampling:

First, a sampling frame of the entire population has to be prepared separately for each stratum.

Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata.

Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods

Multi-stage Cluster Sample

1. Get a list of “clusters,” e.g., branches of a company

2. Randomly sample clusters from that list

3. Have a list of, say, 10 branches

4. Randomly sample people within those branches

a. This method is complex and expensive!
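A two-stage version of this procedure can be sketched as below: first sample branches, then sample people inside each selected branch. The company structure, the numbers of branches and employees, and the `two_stage_cluster_sample` helper are all hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly sample clusters.
    Stage 2: randomly sample people within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical company with 10 branches of 30 employees each.
branches = {
    f"branch_{b:02d}": [f"b{b:02d}_emp{e:02d}" for e in range(30)]
    for b in range(1, 11)
}
sample = two_stage_cluster_sample(branches, n_clusters=3, n_per_cluster=5, seed=11)
for name, people in sample.items():
    print(name, people)
```

Only the selected branches need a full employee list, which is the practical appeal of the design even though the analysis is more complex.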

The Convenience Sample

1. Find some people that are easy to find

The Snowball Sample

1. Find a few people that are relevant to your topic.

2. Ask them to refer you to more of them.

The Quota Sample

1. Determine what the population looks like in terms of specific qualities.

2. Create “quotas” based on those qualities.

3. Select people for each quota.
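The quota procedure can be sketched as a filter over people in the order they are encountered: each person is accepted only while the quota for their category is still open. No random selection is involved, which is why this is a non-probability method. The `quota_sample` helper, the passers-by, and the quotas are invented for illustration.

```python
def quota_sample(stream, category_of, quotas):
    """Accept people in arrival order until each category's quota is full."""
    remaining = dict(quotas)
    selected = []
    for person in stream:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota is filled
    return selected

# Hypothetical passers-by as (name, gender) pairs, in the order encountered.
passers_by = [("A", "f"), ("B", "f"), ("C", "m"), ("D", "f"),
              ("E", "m"), ("F", "m"), ("G", "f"), ("H", "m")]
sample = quota_sample(passers_by, category_of=lambda p: p[1],
                      quotas={"f": 2, "m": 2})
print([name for name, _ in sample])
```

Person D is skipped because the quota for that category is already full, and collection stops as soon as both quotas are met.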

The Theoretical Sample

Types of Research for the Computational Statistics Undergraduate Thesis (Skripsi) at STIS

Development of statistical information systems

Computer-based information systems developed to support activities in the statistical domain. Examples: a Statistical Reference Information System, a Geographic Information System that uses (processed) statistical data, a Statistical Dissemination Information System, and a Data Entry and Monitoring Information System for statistical data collection activities.

Development of statistical applications

Application programs built to support problem solving in the field of statistics.

The program must be written by the student, and the problem must not yet be solvable with existing statistical data processing packages; or the program may be built on existing libraries that do not yet have an interface; or the problem can be solved with an existing package, but the process/procedure is not (yet) efficient, so an integrated application needs to be built.

Examples: Development of a Regression Fitting Application; an Application for Hypothesis Testing Using the Permutation Test in Resampling.

Technology studies in the field of computational statistics

Studies carried out across these two disciplines whose results can benefit the development of computer science or statistics.

Research themes that do not fall under the first or second type may be placed in this third type if the theme is judged to have originality and innovation and to make a high level of contribution to the development of computer science or statistics, to Badan Pusat Statistik (BPS, Statistics Indonesia), or to society.

Examples: Development of a Database-Driven Expert System Inference Engine (Case Study: Determining the Method for Compiling Price and Production Indices); Development of a Statistical Search Engine Based on Supervised Learning and Relevance Feedback.

Methods, Techniques, and Instruments in Research

Research Instruments:

Tools for gathering data:

- Questionnaires

- Interviews

Questionnaires

The most common research instrument for obtaining data beyond the physical reach of the observer. It takes one of two forms:

Closed form / Closed-ended

Open form / Open-ended

Questionnaires

Clarity of language

Singleness of purpose

Relevant to the objective of the study

Correct grammar

Questionnaire: Advantages

Facilitates data gathering

Is easy to test data for reliability and validity

Is less time-consuming than interview and observation

Preserves the anonymity and confidentiality of the respondents’ reactions and answers

Questionnaire: Disadvantages

Printing and mailing are costly

Response rate may be low

Respondents may provide only socially acceptable answers

There is less chance to clarify ambiguous answers

Respondents must be literate and have no physical handicaps

Rate of retrieval can be low because retrieval itself is difficult

Interview

Purpose:

to verify information gathered from written sources

to clarify points of information

to update information and

to collect data

Interview: Types

Screening interview

Panel or Group Interview

Telephone interview

How to measure the instruments?

Validity: the instrument measures what it intends to measure

- External validity: can the results of a study be generalized from a sample to a population?

- Content validity: the appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what you want to know?

Reliability: stability in maintaining consistent measurement in a test administered twice

- Inter-Rater/Observer Reliability: the degree to which different raters/observers give consistent answers or estimates.

- Test-Retest Reliability: the consistency of a measure evaluated over time.

- Parallel-Forms Reliability: the reliability of two tests constructed the same way, from the same content.

- Internal Consistency Reliability: the consistency of results across items, often measured with Cronbach’s alpha.
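Cronbach's alpha can be computed directly from its usual definition, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below does this with the Python standard library; the `cronbach_alpha` helper and the 3-item, 5-respondent data set are invented for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one list per item, each holding every respondent's
    score on that item (same respondent order in every list).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical 3-item questionnaire answered by 5 respondents.
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together across respondents; what counts as acceptable depends on the field and the purpose of the instrument.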
