Graduate Research Methods and Scholarly Writing in the Social Sciences: Government and History Harvard Summer School: SSCI S-100b Section 2 (32761) Joe Bond 6/24/2013

  • Slide 1

Slide 2
Graduate Research Methods and Scholarly Writing in the Social Sciences: Government and History
Harvard Summer School: SSCI S-100b Section 2 (32761)
Joe Bond
6/24/2013

Slide 3 Introduction to the Course, Social Science Approach, ALM Context and Proseminar Objectives
  • Course Requirements and Grading
      • Facilitation (1 minimum) & participation in class discussions: 7%
      • In-class exercises (8 of 10 required, but NOT graded): 8%
      • Argument writing assignment (1st paper): 10%
      • Book review writing assignment (2nd paper): 15%
      • Literature review writing assignment (3rd paper): 25%
      • Mid-term exam: 10%
      • Research design writing assignment (4th paper): 20%
      • Final class presentation: 5%
  • Harvard Extension School is not a traditional graduate program. Explain.
  • Volunteers for next week's facilitation

Slide 4 Introductions

Slide 5 Basics (will be brief, but the .ppt will be posted on the course website)
  • Qualitative vs. Quantitative vs. Mixed Methods
      • Largely a moot debate (more and more studies utilize mixed methods)
      • Your questions should always determine your methodological approach, not the reverse
  • When and why to use A versus B depends on the question:
      • Is there a relationship between regime type and violent conflict?
      • What are the odds that the nation of Fester will fail in the next 5 years?
      • What is the role of political culture as it relates to negotiations?
      • What would have happened if Germany had refrained from invading Poland in 1939?
  • Your choice of methods depends on how you operationalize your variables (e.g., how do you intend to measure political culture?)

Slide 6 Variables
  • Independent variables (IVs) are those variables that help explain a dependent variable
  • Independent variables must be antecedent to dependent variables (e.g., the relationship between education and income)
  • Dependent variables (DVs) are the things you are trying to explain
  • Example: relationship between SAT scores (IV) and success in college (DV)
  • The dependent variable should always be plotted along the y-axis of a graph

Slide 7 Level of Measurement
  • Why is it important?
  • Nominal (measures not ranked: gender, religion, etc.)
  • Ordinal (measures rank ordered: economic class)
  • Interval (measures with equal intervals between values: income)
  • Ratio, as characterized in the social sciences (the measure has an absolute zero: mass, length, time)
  • Think Nominal, Ordinal, Interval, Ratio (NOIR)!

Slide 8 Association
  • An association between two variables: the values of one variable tend to coincide (vary or covary) with the values of another
  • Example 1: the relationship between sex education and teen pregnancy
      • Teen pregnancy is the DV, sex education an IV (note: in this example we treat the latter as antecedent to the former)
      • We might hypothesize that increased exposure to sex education programs helps reduce the incidence of teen pregnancy (i.e., they vary inversely: as X goes up, Y goes down)
  • Example 2: the relationship between education (IV) and income (DV)
      • We might hypothesize that the more education one has, the higher one's future income will be (i.e., they covary: as X goes up, Y goes up)
  • An anomaly. Wrong career trajectory.

Slide 9 Correlation
  • A statistical term that indicates the strength and direction of a linear relationship between variables (e.g., the relationship between education and income)
  • IMPORTANT!
  • Association or correlation DOES NOT imply causation
      • Example 1: drowning (DV) and consumption of ice cream (IV) covary (as ice cream consumption goes up, incidents of drowning increase)
      • Example 2: children's shoe size (IV) and math performance (DV) covary (as shoe size gets bigger, math skills go up)
  • Example 2 also highlights the importance of definitions, operationalization and transparency
      • Example 1: ice cream consumption is a proxy for temperature
      • Example 2: shoe size is a proxy for age

Slide 10 More on Correlation
  • Correlation is a measure of the direction and degree of strength of the relationship between two or more variables
  • A correlation coefficient (r, or Pearson's r) is a numerical index of that relationship
  • The coefficient ranges from -1 to +1; its magnitude indicates the strength of the relationship between variables
  • +1 means a perfect positive correlation (the variables covary), while -1 means a perfect negative correlation (they vary inversely)
  • The closer the correlation coefficient is to +1 or -1, the stronger the relationship
  • But even a strong [negative or positive] correlation is meaningless if the level of error (significance) is large (e.g., p < 0.5 vs. p < 0.01)

Slide 11 Hypotheses & Null Hypotheses
  • H1: as education increases, likelihood of voting increases
  • H0: education has no effect on the likelihood of a person voting
  • Why do we test the null hypothesis?
      • The strongest proof is the inability to disprove
      • Error cannot be eliminated
      • Like it or not, facts change
  • Avoid phrases like "this proves" or "this is irrefutable proof"; instead, use "supports," "lends support to," etc.
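A minimal Python sketch of the correlation coefficient (Pearson's r) described on the More on Correlation slide. The function name `pearson_r` and every data value below are hypothetical, chosen only to illustrate one positive and one negative relationship; none of them come from the course materials. The sketch covers only the direction and strength of r, not the significance (p-value) test the slide also calls for.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data (illustration only, not real observations).
years_of_education = [10, 12, 14, 16, 18]   # IV
income_thousands = [25, 32, 41, 50, 58]     # DV, covaries with the IV

sex_ed_hours = [0, 5, 10, 15, 20]           # IV
teen_pregnancy_rate = [50, 44, 39, 31, 27]  # DV, varies inversely with the IV

r_pos = pearson_r(years_of_education, income_thousands)
r_neg = pearson_r(sex_ed_hours, teen_pregnancy_rate)
print(f"education vs. income: r = {r_pos:.3f}")
print(f"sex education vs. teen pregnancy: r = {r_neg:.3f}")
```

The sign of r gives the direction (positive: X and Y rise together; negative: Y falls as X rises) and its distance from zero gives the strength. A value near +1 or -1 on made-up data like this says nothing about causation.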
Slide 12 Types of Analysis
  • Analysis may have exploratory, descriptive, explanatory, and predictive objectives, or some combination of these aims
  • Evaluation research is a fifth type that is not discussed here, although it is no less important

Slide 13 Exploratory Research
  • Undertaken when very little is known about a phenomenon
  • Forms the foundation for subsequent descriptive and explanatory research
  • In the early 1980s we did not have a good handle on how many Americans were infected with HIV/AIDS or even what caused it
  • This sort of research is often linked with activism

Slide 14 Descriptive Research
  • Serves to identify important areas of inquiry
  • Often serves as the first step in explanatory inquiry
  • Addresses whether a phenomenon is a common occurrence or a rare event
  • Describe the U.S. electorate and electoral behavior:
      • Jewish Americans tend to vote for Democrats
      • Catholics tended to vote Democratic, but the abortion issue has created a rift
      • Latinos tended to vote overwhelmingly Democratic, but this began to change in 1999 and swung back again in 2008
  • Examples: Observational Research, Historical Research, etc.

Slide 15 Explanatory Research
  • Scientific inquiry usually does not end with description but proceeds to explanation
  • Descriptive findings are likely to lead to the investigation of the factors associated with the outcome and to attempts to understand how these factors contribute to the occurrence of the outcome
  • Understanding how something works allows us to better predict the future (applies to both qualitative and quantitative research)
  • Examples: Lessons Learned, Counterfactual Thought Experiments, Regression Analysis, etc.

Slide 16 Prediction: optimistic/happy pop hits predict a bull market six months in advance
  • Typically follows explanatory research, but not always!
  • State Failures, Stock Predictions, etc.
  • Model, below, yields between 50% and 55% excess returns, with no compounding, using events data

Slide 17 Émile Durkheim's Suicide (1897): An Example of the Research Process

Slide 18 Durkheim's Variables
  • Inductive Approach or Theory Building
  • Dependent Variable(s) (what is he trying to explain): RATES of SUICIDE in Europe (1800s)
  • Independent Variables (those things that help explain the Dependent Variable(s)): CLIMATE, AGE, GENDER, POLITICAL TURMOIL, RELIGION (limited to Christianity), MARITAL STATUS, DEPENDENTS, ETC.
  • Recall Levels of Measurement (NOIR)
      • Nominal (can't be ranked)
      • Ordinal (ranked with unequal or arbitrary intervals)
      • Interval (equal intervals)
      • Ratio (as interval, with true zero)

Slide 19 Some of Durkheim's Descriptive Findings
  • Suicide rates are higher for widowed, single and divorced men than for married men
  • Suicide rates are higher for people without children than for those with children
  • Suicide is more pronounced in colder climates
  • Suicide rates are higher among Protestants than Catholics

Slide 20 Differences between Protestants and Catholics
  • Suicide is [more of] a sin for Catholics
  • Role of coroners: if no suicide note is left, it comes down to the coroner's interpretation (circa 1897)
  • Differences in social integration: Catholics tend to have higher levels of social integration (think of the movie My Big Fat Greek Wedding)
Slide 21 The Notion of Integration: Going Beyond Religion
  • Catholic countries tend to be more integrated than Protestant countries, with closer family ties
      • This is why people who are married and/or have children commit suicide less often: simply put, they have more to live for
      • This is even reflected in physical proximity when speaking with others
  • Social bonds are composed of two factors:
      • Social integration: attachment to other individuals within society
      • Social regulation: attachment to society's norms
  • Suicide rates may increase when extremes in these factors occur

Slide 22 Building a Theory: Social Integration
  • Abnormally high or low levels of social integration may result in increased suicide rates
  • Low levels of social integration result in disorganized society (chaos)
  • High levels of social integration drive some to suicide in order to avoid becoming burdens on society

Slide 23 Durkheim's Suicide Typology
  • Egoistic suicide
      • Ties attaching the individual to society are weak
      • Few social ties to keep the individual from taking his or her own life (Why not?)
  • Altruistic suicide
      • Individuals are extremely attached to society and have no life of their own (self-emulation)
      • They believe their death can bring about a benefit to the society
  • Anomic suicide
      • Weak social regulation between the society's norms and the individual (life becomes too unpredictable and uncertain)
      • Often brought on by dramatic changes in economic and/or social circumstances (e.g., wars, recessions and other turmoil)
  • Fatalistic suicide
      • Social regulation is completely instilled in the individual (suicide bombers)
      • No hope of change against an oppressive society

Slide 24 Research Cycle as an Iterative Process
  • Durkheim used an inductive approach, moving from steps #2 & #3 to build step #1 (observation → theory)
  • Most quantitative research involves deductive research (i.e., theory → empirical testing)

Slide 25 Group Exercise (groups of 2 or 3)
  1. Form groups of 3 or 4
  2. For each group,
      1. define one of the four concepts, below
      2. operationalize the concept (i.e., how would you measure the concept in your research?)
  3. Reconvene in 5-8 minutes (max)
  • Concepts: Attractiveness, Democracy, Leadership, Love

Slide 26 State Failures
  • State Failure project (1994)
      • Objective: then-VP Gore asked the CIA to predict which states would fail 5 years out
      • Analyzed thousands of [structural] variables
      • Found that 3 variables could predict failures 85% of the time looking out 5 years:
          • infant mortality (a proxy? for what?)
          • level of democratization
          • openness to trade
      • Other salient factors: youth bulge, religious distributions, etc.
  • We will return to this later on in the semester

Slide 27 Africa Prospects: Predicting State Failure with Structural Data

Slide 28 Africa Prospects
  • Purpose: to assess the vulnerability of countries to conflict escalation based on each country's profile, or set of structural indicators.
  • Overall Accuracy: the ratio of correct classifications (C) to all classifications (A). Accuracy = C/A × 100%.
  • Recall: the ratio of correct classifications (C) to the observed classifications (O). Recall = C/O × 100%. It represents the ability of the algorithm to classify the conflicts as they were observed.
  • Precision: the ratio of correct classifications (C) to correct (C) plus incorrect (I) classifications. Precision = C/(C+I) × 100%. It illuminates the algorithm's false positives; specifically, the higher the ratio, the fewer the false positives.
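The three Africa Prospects metrics can be sketched in Python for a simple two-class forecast (unstable vs. not unstable). Mapping the slide's C, A, O and I onto true/false positives and negatives is an assumption made here for illustration, as are the function name `forecast_metrics` and the counts; none of these come from the project itself.

```python
def forecast_metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, precision) as percentages.

    Assumed mapping onto the slide's terms (not from the source):
      tp: conflicts forecast and observed (correct classifications)
      fp: conflicts forecast but not observed (false positives, I)
      fn: conflicts observed but not forecast (misses)
      tn: no conflict forecast and none observed (also correct)
    """
    total = tp + fp + fn + tn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("metrics are undefined for these counts")
    accuracy = 100.0 * (tp + tn) / total   # C / A over all classifications
    recall = 100.0 * tp / (tp + fn)        # C / O over observed conflicts
    precision = 100.0 * tp / (tp + fp)     # C / (C + I) over forecast conflicts
    return accuracy, recall, precision


# Hypothetical counts for 100 country-years (illustration only).
accuracy, recall, precision = forecast_metrics(tp=18, fp=6, fn=2, tn=74)
print(f"accuracy={accuracy:.1f}%  recall={recall:.1f}%  precision={precision:.1f}%")
```

With these made-up counts the forecast scores 92% accuracy, 90% recall and 75% precision. Forecasting instability everywhere, e.g. `forecast_metrics(tp=20, fp=80, fn=0, tn=0)`, pushes recall to 100% while precision and accuracy drop to 20%, which is why the three measures are read together.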
Slide 29 Forecasting Performance Metrics: Definitions and Illustrations
  • Near-perfect forecast model
      • High recall = 99%
      • High precision = 100%
      • High accuracy = 99.5%
  • Bad forecast model #1 (almost every country will be unstable): NET IS CAST TOO BROADLY
      • High recall = 99% (1% miss rate)
      • Low precision = 5% (95% false positive rate)
      • Low accuracy = 40-50%
  • Bad forecast model #2 (few countries will be unstable): NET IS CAST TOO NARROWLY
      • Low recall = 5% (95% miss rate)
      • High precision = 100% (0% false positives)
      • Low accuracy = 40-50%
  • Overall Accuracy = (# of correct predictions) / (# of predictions made)
  • Recall = (# of correctly predicted conflicts) / (# of conflicts that occurred)
  • Precision = (# of correctly predicted conflicts) / (# of conflicts predicted to occur)
  • [Diagram labels: countries forecast to be unstable at some level of intensity; countries that experience instability; countries that DO NOT experience instability; false positives; misses]

Slide 30 5-15 Year Validation of Forecasting
  • [Chart: Average Performance Scores for Different Training Sets / Forecast Periods; axis label: Forecast Period; series: Accuracy, Recall, Precision]
  • Low precision scores (high false positive rates) in the out years indicate that the world was more stable than would have been expected given macro-structural conditions.
  • However, high recall scores indicate the net is cast wide enough to correctly forecast conflicts that DO occur (errors fall on the side of caution).

Slide 31 Independent Variables
  • Caloric Intake: Estimate of the average number of calories consumed per person, per day.
  • GDP per Capita: Annual gross domestic product per person, measured in constant 1995 U.S. dollars.
  • Male/Female Infant Mortality: Number of deaths of male and female children under 1 year of age per 1,000 live births.
  • Life Expectancy: Average life expectancy (males and females combined).
  • Youth Bulge: Ratio of population aged 15-29 to those aged 30-69.
  • Among others.
Slide 32

Slide 33 Instability Levels
  • Dependent Variable: Index of Instability
  • 3 levels of instability intensity:
      • High intensity (if combined probability > 67%)
      • Moderate intensity (if combined probability > 67%)
      • None/Low intensity (if combined probability > 67%)
  • Maximum level/intensity of conflict per country-year; source: KOSIMO Data Project, Heidelberg Institute for International Conflict Research (HIIK), 1975-2003. http://www.hiik.de/de/index_d.htm
  • Represents a high threshold of instability
  • Key Assumption: a country is unstable if (and only if) the government or its opponent(s) threatens or initiates a conflict to restore equilibrium or harmony with respect to its internal or external relations.

Slide 34 Steps
  1. Compile a time series data set of the selected target variable's intensity
  2. Compile a time series data set of candidate indicators associated with the target's intensity
  3. Train an algorithm that explains the historical target intensities with the candidate indicators
  4. Calculate performance measures of the explanation from a time series of historical test data
  5. Generate vulnerability scores based on projections from the current value of the indicators
  6. Calculate a confidence level for the forecasts using the likelihood of occurrence at each intensity
  7. Iterate from step #1 for experimentation with alternative target variables
  8. Iterate from step #2 for experimentation with alternative explanatory variables or indicators

Slide 35 In-Class Writing Exercise 1, June 24, 2013
  • Educating Sergeant Pantzke (7:35)
  • Should take no longer than 10-15 minutes
  • On the opposite side of this paper only, take a position: The U.S. government should [should not] decide which schools can receive GI Bill funding. For example, veterans working their way through Harvard should be able to use GI Bill funds, whereas vets working on a degree at the University of Phoenix should be prohibited from funding their education through the GI Bill.
  • Include any evaluation criteria that come to mind if you take the position that some schools but not others should qualify for GI Bill funding.