Guidelines for data preparation

Social Science Data Archives: creating, depositing and using data

Edinburgh, 2 April 2004

Louise Corti

ESRC Datasets Policy: what is expected of award holders?

• to preserve and share data from ESRC funded research

• funding is allowed to prepare data for archiving

• all award-holders must offer data for deposit to the ESDS within 3 months of the end of the award

• any potential problems should be notified to the ESDS at the earliest opportunity

• final payment will be withheld if the dataset has not been deposited within 3 months of the end of the award, except where a waiver has been agreed in advance

• ESRC Datasets Policy http://www.esrc.ac.uk/esrccontent/researchfunding/sec17.asp
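The three-month deadline is simple date arithmetic, but uneven month lengths make it easy to get wrong. The Python sketch below assumes that "within 3 months of the end of the award" means on or before the same calendar day three months later, moved back to the month's last day where that day does not exist. The function names are illustrative and are not part of any ESRC or ESDS system.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`, clamping the day
    to the last day of the target month (e.g. 30 Nov + 3 months -> 28/29 Feb)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def deposit_deadline(award_end: date) -> date:
    # Assumption: "within 3 months of the end of the award" is read as
    # "on or before the same calendar day three months later".
    return add_months(award_end, 3)


print(deposit_deadline(date(2004, 3, 31)))   # 2004-06-30
print(deposit_deadline(date(2003, 11, 30)))  # 2004-02-29 (leap year)
```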

Depositing data

• data should be deposited to a standard that would enable them to be used by a third party, including the provision of adequate documentation

– Good housekeeping = good research = good archives

• any potential problems in archiving the data should be discussed with ESDS acquisitions as soon as possible

• issues of consent and confidentiality that affect archiving should be included in the project management plan and addressed before data collection starts

• unless a waiver on deposit has been agreed with the ESDS and the ESRC, researchers should not make commitments to informants that preclude archiving their data.

Data creation and deposit: best practice

• Early advice to data creators, so that:

– data and documentation are of high quality

– consent and ethical issues are taken on board

– IPR (intellectual property rights) issues are considered

• Promoting standards in:

– research design

– transcription techniques

– data and project management

– documenting data collection & analysis

Characteristics of a “good” archived research collection

• intellectual content

• accurate data, well organised and labelled files

• supporting documentation created

– major stages of research recorded

• data that can be stored in user-friendly “dissemination” formats, but can also be archived in a future-proof “preservation” format

• consent, confidentiality & copyright resolved

Intellectual content

• builds on previous research

• addresses new issues

• topics not too specific or narrowly focused

• innovative approach to discipline and methodology

Extensive raw data

• Types of research data assembled

– survey data

– in-depth interviews

– focus groups

– field notes / participant observation

– case study notes

• images and sound recordings

• range of material – broad focus

Supporting Documentation

• Examples

– Funding application

– Description of methodology

– Communication with informants on confidentiality

– Coding schemes / themes

– Technical details of equipment

– Interview schedules

– End of award report

– Documentation from CAQDAS software packages, e.g. analytical memos

– Bibliographies, resulting publications

• Anything that adds insight or aids understanding and secondary usage

Why so crucial? Documentation supports ESDS Qualidata's key activities:

• enhanced user guides and digital samplers

• exemplars and case studies of re-use

• online access to qualitative data

• user support and training activities to support secondary analysis of data

Qualitative data: enhanced user guides and digital samplers

providing a better understanding of the study and research methods

• enhanced user guides – detailed notes on study methodology and re-use; ‘behind the scenes’ interviews with depositors; FAQs

• thematic pages – combining interviews

• digital samplers of classic sociology collections

Qualitative data: exemplars and case studies of re-use

providing guidance on data resources and how to re-use them

• overview of ways of re-using data

• case studies of re-use, including reflections and commentary, e.g. ‘how it was really done’ documentation of methods; team “discussions” about coding

• full bibliography of re-use articles

• online ‘packaged’ training resources

• user support and training programme

Online access to qualitative data

• new emphasis on providing direct access to collection content

– supports more powerful resource discovery

– greater scope for searching and browsing content of data (supplementary to higher level study-related metadata)

– users can search and explore content directly, and so can retrieve data immediately

• providing access to qualitative data via a common interface (ESDS Qualidata Online)

• supporting tools for searching, retrieval, and analysis across different datasets
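Cross-dataset searching can be illustrated with a keyword-in-context (KWIC) search, a common way of exploring qualitative text. This is a minimal Python sketch under stated assumptions, not a description of how ESDS Qualidata Online works: the transcripts are held as plain strings keyed by a hypothetical identifier, and matching is case-insensitive on whole words.

```python
import re
from typing import Dict, Iterator, Tuple


def keyword_in_context(
    transcripts: Dict[str, str], keyword: str, width: int = 30
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (transcript_id, left context, match, right context) for every
    case-insensitive, whole-word occurrence of `keyword` in each transcript."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for tid, text in transcripts.items():
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            yield tid, left, m.group(0), right


# Hypothetical extracts from two interviews.
transcripts = {
    "int01": "I left school at fifteen and went straight to work.",
    "int02": "Work was hard to find after the mill closed.",
}
for tid, left, match, right in keyword_in_context(transcripts, "work", width=10):
    print(f"{tid}: ...{left}[{match}]{right}...")
```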

Q: Transcribing Research

• integrated into the ongoing research – budget accordingly

• full transcriptions or summaries

• costs and benefits:

– self transcription

– internal team transcription

– external transcription

• full transcriptions:

– consistent layout

– speaker tags

– line breaks

– header with identifier / other details

– checked for errors
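The layout conventions listed above (consistent layout, speaker tags, line breaks and an identifying header) can be applied mechanically when transcripts are typed up or converted. A minimal Python sketch, assuming hypothetical header fields (identifier, date, interviewer) and speaker tags such as IV and R; a real project would define its own conventions and record them in its documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TranscriptHeader:
    # Hypothetical header fields; each project will choose its own.
    identifier: str
    interview_date: str
    interviewer: str


def format_transcript(header: TranscriptHeader, turns: List[Tuple[str, str]]) -> str:
    """Lay out a transcript consistently: a header block, then one
    speaker-tagged turn per paragraph, with a blank line between turns."""
    lines = [
        f"Identifier: {header.identifier}",
        f"Date: {header.interview_date}",
        f"Interviewer: {header.interviewer}",
        "",
    ]
    for speaker, utterance in turns:
        lines.append(f"{speaker}: {utterance.strip()}")  # speaker tag
        lines.append("")  # line break between turns
    return "\n".join(lines).rstrip() + "\n"


text = format_transcript(
    TranscriptHeader("int01", "2004-03-12", "LC"),
    [("IV", "How long have you lived here?"), ("R", " About twenty years. ")],
)
print(text)
```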

Q: Identifiers removed

• Confidentiality respected

• Anonymisation?

• Problems of anonymisation:

– Applied too weakly

– Applied too strongly

– Timing

– Potential for distortion

– Examples

• User undertakings

• Appropriate and sympathetic
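Where anonymisation is chosen, one mechanical step is replacing agreed identifiers with pseudonyms consistently across transcripts. The Python sketch below is an illustration only, assuming a hand-built mapping from identifying strings to replacements (the names shown are hypothetical). It does not address the judgement calls listed above, such as how strongly to anonymise or the potential for distortion, and it will miss identifiers that are not in the mapping.

```python
import re
from typing import Dict


def pseudonymise(text: str, replacements: Dict[str, str]) -> str:
    """Replace each identifying string with its agreed pseudonym.

    Matching is case-sensitive and on whole words only, so 'Ann' does not
    alter 'Annual'. Longer keys are tried first, so 'Mary Smith' is replaced
    as a unit before 'Mary' on its own."""
    if not replacements:
        return text
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(0)], text)


# Hypothetical identifiers and pseudonyms.
mapping = {
    "Mary Smith": "[Jane Brown]",
    "Mary": "[Jane]",
    "Hartley Mill": "[a local mill]",
}
print(pseudonymise("Mary Smith worked at Hartley Mill with Mary.", mapping))
```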

Q: Labelling and listing Research

• list of contents of research collection

• acts as a point of entry for secondary user

• qualitative data: template approach – interviewee/case study characteristics

• See example in pack
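A template-based list of contents can be kept as a simple delimited file with one row per interview or case. This Python sketch assumes hypothetical template fields (identifier, file name, gender, age group, occupation, interview date) and rejects rows with empty fields so that gaps are noticed before deposit.

```python
import csv
import io
from typing import Dict, List

# Hypothetical template fields: one column per interviewee/case characteristic.
FIELDS = ["id", "file", "gender", "age_group", "occupation", "interview_date"]


def contents_list(rows: List[Dict[str, str]]) -> str:
    """Return a CSV list of contents, one row per interview or case,
    refusing rows that leave any template field empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [f for f in FIELDS if not row.get(f)]
        if missing:
            raise ValueError(f"{row.get('id', '?')}: missing {', '.join(missing)}")
        writer.writerow(row)
    return buf.getvalue()


rows = [
    {"id": "int01", "file": "int01.rtf", "gender": "female", "age_group": "60-69",
     "occupation": "mill worker", "interview_date": "2004-03-12"},
]
print(contents_list(rows))
```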

Qu: Accuracy of data: validation checks

Computer aided surveys (CAPI, CATI or CAWI)

• these are the most accurate way of gathering survey data, but the software (e.g. Blaise) and hardware (e.g. a laptop for every interviewer) may be beyond project resources 

• computer aided surveys allow as many logical checks as possible, on both question routing and responses, to be built in at the point of data creation

Non computer aided surveys

• less control over initial responses, but checks can be performed:

– at the point of data entry/transcription, if “data entry” software is used. However, there are few cheap data entry packages around

– the only feasible option may be to enter data without checks directly into a spreadsheet style interface (e.g. Excel worksheet, SPSS data view), and perform validation checks afterwards - via command files in statistical packages or Visual Basic code in Excel or Access
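Checks run after data entry can be written in any scripting language, not only as statistical-package command files or Visual Basic. The Python sketch below shows the two kinds of check mentioned above: valid codes and ranges for responses, and question routing. The variable p1sex and its codes follow the labelling example in this pack; p1age, q11, q12 and their rules are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[int]]

# Hypothetical rules for illustration; real ones come from the questionnaire.
VALID_CODES = {"p1sex": {1, 2, -8, -9}}  # 1 male, 2 female, -8 don't know, -9 not answered
RANGES = {"p1age": (16, 110)}            # plausible range for age in years
MISSING = {-8, -9}


def validate(record: Record) -> List[str]:
    """Return the problems found in one record (an empty list if none)."""
    problems = []
    for var, codes in VALID_CODES.items():
        if record.get(var) not in codes:
            problems.append(f"{var}: invalid code {record.get(var)!r}")
    for var, (lo, hi) in RANGES.items():
        value = record.get(var)
        if value in MISSING:
            continue  # missing-value codes are allowed
        if value is None or not lo <= value <= hi:
            problems.append(f"{var}: {value!r} outside {lo}-{hi}")
    # Routing check: q12 should be answered if and only if q11 == 1.
    if record.get("q11") == 1 and record.get("q12") is None:
        problems.append("q12: missing although q11 == 1")
    if record.get("q11") != 1 and record.get("q12") is not None:
        problems.append("q12: answered although it should have been skipped")
    return problems


records = [
    {"p1sex": 1, "p1age": 34, "q11": 2, "q12": None},
    {"p1sex": 3, "p1age": 150, "q11": 1, "q12": None},
]
for i, rec in enumerate(records, start=1):
    for problem in validate(rec):
        print(f"record {i}: {problem}")
```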

Qu: An example of data seemingly untouched by the human eye:

Originating error in text variables:

– Occupation: ‘sole trader’

– Description of occupation: ‘purveyor of seafood’

Propagated error in derived numeric variables:

• the respondent was coded under the Standard Industrial Classification (SIC) code relating to food retailers:

– 52.2 Retail sale of food, beverages and tobacco in specialised stores

Qu: Labelling of data

• all variables should be named. Variable names should not exceed 8 characters where possible, as the most common format for disseminating data is SPSS

• all variables should be labelled. Labels should be brief (preferably < 80 characters), but precise and always make explicit the unit of measurement for continuous (interval) variables. Where possible, all variable labels should reference the question number (and if necessary questionnaire). For example, the variable q11bhexc might have the label “q11b: hours spent taking physical exercise in a typical week”. This gives the unit of measurement and a reference to the question number (q11b), so the user can quickly and easily cross-reference to it

• for categorical variables, all codes (values) should be given a brief label (preferably < 60 characters). For example, p1sex (gender of person 1) might have these value labels: 1 = male, 2 = female, -8 = don’t know, -9 = not answered

• where possible, all such labelling should be created and supplied to the UKDA as part of the data file itself. This is the expectation with data supplied in one of the three major statistical packages - SPSS, STATA or SAS.
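The rules of thumb for names and labels can be checked automatically before deposit. A minimal Python sketch, assuming variable metadata is held in a simple hypothetical Variable record rather than read from an SPSS, STATA or SAS file; the examples reuse q11bhexc and p1sex from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variable:
    name: str
    label: str
    value_labels: Dict[int, str] = field(default_factory=dict)


def check_labelling(var: Variable) -> List[str]:
    """Apply the rules of thumb above: names of at most 8 characters,
    variable labels under 80 characters, value labels under 60 characters."""
    issues = []
    if len(var.name) > 8:
        issues.append(f"{var.name}: name longer than 8 characters")
    if not var.label:
        issues.append(f"{var.name}: no variable label")
    elif len(var.label) >= 80:
        issues.append(f"{var.name}: label is {len(var.label)} characters")
    for code, text in var.value_labels.items():
        if len(text) >= 60:
            issues.append(f"{var.name}={code}: value label is {len(text)} characters")
    return issues


q11bhexc = Variable("q11bhexc", "q11b: hours spent taking physical exercise in a typical week")
p1sex = Variable("p1sex", "gender of person 1",
                 {1: "male", 2: "female", -8: "don't know", -9: "not answered"})
print(check_labelling(q11bhexc), check_labelling(p1sex))  # [] []
```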

Qu: Documentation

Core documentation:

• Questionnaire.

• Methodology: details of sample design, response rate, etc.

• “Codebook”, i.e. a comprehensive list of variable names, variable descriptions, code names and variable formatting information. This is essential if the package being used for data management does not allow the sort of variable and code labelling to be stored within the data file.

• Technical report describing the research project.

Other useful documentation that is seldom supplied:

• Code used to create derived variables or check data (e.g. SPSS, STATA or SAS “command files”).
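Where labels cannot be stored in the data file, a codebook can be generated from the same metadata rather than typed by hand, which keeps the two consistent. This Python sketch assumes a hypothetical tuple layout of name, description, format and value labels; F2.0 and F3.0 are SPSS-style numeric format strings used as examples only.

```python
from typing import Dict, List, Tuple

# (name, description, format, value labels); the layout is hypothetical.
Entry = Tuple[str, str, str, Dict[int, str]]


def codebook(entries: List[Entry]) -> str:
    """Render a plain-text codebook listing each variable's name, description,
    format and, for categorical variables, every code with its label."""
    out = []
    for name, description, fmt, values in entries:
        out.append(f"{name}  {description}  [{fmt}]")
        for code, text in sorted(values.items()):
            out.append(f"    {code:>3} = {text}")
    return "\n".join(out) + "\n"


entries = [
    ("p1sex", "gender of person 1", "F2.0",
     {1: "male", 2: "female", -8: "don't know", -9: "not answered"}),
    ("q11bhexc", "q11b: hours spent taking physical exercise in a typical week", "F3.0", {}),
]
print(codebook(entries))
```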

Qu: Good and bad data and documentation formats

Data held in a statistical package

– preferred: SPSS portable (.por) or system (.sav) file

– acceptable: STATA; SAS (with formats information); delimited text

– problematic: fixed-width (undelimited) text format

Data held in a spreadsheet

– preferred or acceptable: delimited text (tab delimited or comma separated), Excel, Lotus

– problematic: Quattro Pro

Data held in a database

– preferred or acceptable: delimited text with SQL data definition statements, MS Access, dBase, FoxPro, SIR export, XML

– problematic: Filemaker Pro, Paradox, fixed-width (undelimited) text format

Documentation (e.g. questionnaires, codebooks, interviewers’ instructions, project description, etc.)

– preferred: Microsoft Word, Adobe PDF, Rich Text Format (RTF)

– acceptable: SGML, HTML, XML, WordPerfect

– problematic: hard copy (paper)

See ESDS web site for full table
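For data held in a database, delimited text with SQL data definition statements is among the recommended options. The Python sketch below writes tab-delimited data with a header line alongside a matching CREATE TABLE statement, then loads both into an in-memory SQLite database as a quick round-trip check. The table name, columns and SQL types are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Sequence, Tuple


def export_delimited(
    table: str,
    columns: Sequence[Tuple[str, str]],
    rows: List[Sequence],
    delimiter: str = "\t",
) -> Tuple[str, str]:
    """Return (data, ddl): the rows as delimited text with a header line,
    and a CREATE TABLE statement recording each column's SQL type."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([name for name, _ in columns])  # header line
    writer.writerows(rows)
    column_defs = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE TABLE {table} (\n{column_defs}\n);"
    return buf.getvalue(), ddl


# Hypothetical table, columns and types.
data, ddl = export_delimited(
    "survey",
    [("serial", "INTEGER"), ("p1sex", "INTEGER"), ("occup", "VARCHAR(40)")],
    [(1, 2, "sole trader"), (2, 1, "teacher")],
)

# Round trip: load the export into an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute(ddl)
reader = csv.reader(io.StringIO(data), delimiter="\t")
next(reader)  # skip the header line
con.executemany("INSERT INTO survey VALUES (?, ?, ?)", reader)
print(con.execute("SELECT COUNT(*) FROM survey").fetchone()[0])  # 2
```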

Consent for archiving

• anonymity and privacy of research participants should be respected

• explicit ‘informed’ consent gained

• information for research participants should be clear and coherent and include:

– purpose of research

– what is involved in participation

– benefits and risks

– storage and access to data

– usage of data (current and future uses)

– withdrawal of consent at any time

– Data Protection & Copyright Acts

• N.B. Additional measures are needed when participants are unable to consent through incapacity or age

• reflect needs and views of all

• works in practice

Ethical practice - history

• The Nuremberg Code

– principles for conducting research with human subjects that arose from the Nuremberg Trials, which took place after the Second World War

– sets out statements of certain moral, ethical and legal principles relating to research involving human subjects

• Declaration of Helsinki

– adopted in 1964 by the World Medical Assembly

– provides guidance for physicians in biomedical research with human subjects; revised several times since, including in 1989, 1996 and 2000

• the ethical guidelines of many professional organisations endorse these principles

Legal issues in data preparation

• ‘Duty of confidentiality’

• Law of Defamation

• Data Protection Act 1998 and EU Directive

• Copyright, Designs and Patents Act 1988

Duty of Confidentiality

• disclosure of information may constitute a breach of confidentiality and possibly a breach of contract

• not governed by an Act of Parliament

• not necessarily in writing

• can be a legal contractual obligation

• exemptions are:

– relevant police investigations or proceedings

– disclosure by court order

– ‘public interest’, as defined by the courts

– ethical obligations in cases of disclosure of child abuse

Law of Defamation

• a defamatory statement is one which may injure the reputation of another person, company or business

Data Protection Act 1998

• eight principles:

– Fairly and lawfully processed

– Processed for limited purposes

– Adequate, relevant and not excessive

– Accurate

– Not kept longer than necessary

– Processed in accordance with the data subject's rights

– Secure

– Not transferred to countries without adequate protection

• allows for secondary use of data for research purposes under certain conditions

Options for preserving confidentiality

• anonymisation

• consent to archive at the time of field work

• researcher contacts informants retrospectively

• user undertakings

• in exceptional circumstances - permission to use or closure of material

Copyright, Designs and Patents Act 1988

• developed for the broadcasting industry, not research!

• protection of author’s rights

• multiple copyrights apply:

– copyright in the spoken words is automatically assigned to the speaker

– the researcher holds the copyright in the sound recording of an interview: obtain a written assignment of copyright from the interviewee, or an oral agreement (licence) to use

– the employer holds the copyright in research data: obtain copyright clearance from the employer

• copyright lasts for 70 years after the end of the year in which the author dies

• copying a work is an infringement unless it is for the purposes of research, private study, criticism or review, or reporting current events, and the use can be regarded as ‘fair dealing’

• seek legal advice on problem issues
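The term rule above is simple arithmetic: protection is counted from the end of the year of death, so it lasts until 31 December of the year of death plus 70. This Python sketch covers only that rule as stated on the slide; it does not model the different terms that apply to other kinds of work, such as sound recordings.

```python
def last_protected_year(death_year: int, term: int = 70) -> int:
    """Copyright runs for `term` years from the end of the year of death,
    i.e. until 31 December of (death_year + term)."""
    return death_year + term


def in_copyright(year: int, death_year: int, term: int = 70) -> bool:
    """True if the work is still protected at any point during `year`."""
    return year <= last_protected_year(death_year, term)


# An author who died in 1950: protected to the end of 2020, free from 1 January 2021.
print(last_protected_year(1950))                           # 2020
print(in_copyright(2004, 1950), in_copyright(2021, 1950))  # True False
```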

Conclusion: archivable research

• suitable for electronic dissemination

• suitable formats for re-use and long-term preservation

• in-house data processing

– ‘cleaning up’ and documenting research

– repairing minor errors

– meeting user expectations

• meeting users’ needs

– building an expansive and varied data portfolio

– creating online exploratory/data browsing systems

good housekeeping = good research = good archives

Depositing data with ESDS

• Provide details of all data collected, together with three samples of qualitative data, if applicable

• to do this, complete the Data Submission form on the ‘Deposit’ pages on UKDA website

• dataset will then be reviewed for archiving by the UKDA Acquisitions Review Committee

• if accepted, complete the Archive’s Deposit and Licence forms, and send the data, documentation and forms to the UK Data Archive within the required time-scales

• you will be notified when your data are being released via the UKDA online catalogue

• access to data will be granted to registered bona fide researchers only

Creating or Depositing Data

www.esds.ac.uk/create


Susan Cadogan and Gill Backhouse