DIGITAL REPOSITORIES: ESSENTIAL INFORMATION FOR ACADEMIC LIBRARIANS
Auraria Library, February 2015
Title is ambiguous.

1 Outline
- Terminology
- Institutional Repositories
- IRs in Colorado
- IR software
- Standard identifiers for digital objects in repositories
- Digital preservation for IRs
- Disciplinary repositories
- Data repositories
- OAI-PMH
- The future
2 Terminology
- Institutional repository (IR)
- Disciplinary repository (Subject repository)
- Green open-access
- Post-print
- Author's accepted manuscript (AAM)
- SPARC author addendum
- Embargo period
- Pre-print server
- Sherpa Romeo
- Dark archive
An institutional repository is an open-access (OA) repository sponsored by an institution, usually a university or college. Most of its content is open access, but some may be embargoed and some may be dark archived.
Green open access refers to author self-archiving of a post-print of a published work (published in a toll-access journal) in an open-access repository. The repository can be institutional or disciplinary. The advantage to the author is that he or she gets to publish in a top toll-access journal while the content is also freely available through the repository. There are many disadvantages to green OA. Because you sign over copyright to the publisher, you need their permission to post the content in the repository. If they grant this permission, they grant it only for the Word version, which is not the version that they copyedit and not the version for which they enhance the images, tables, etc. Many also impose an embargo of six months, one year, or two years before the author can post the document. Some publishers allow green OA only for institutional repositories; that is, disciplinary repositories are excluded.
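As a small illustration of the embargo arithmetic, the sketch below computes the earliest date a post-print could be posted, given a publication date and an embargo length in months. The function name `embargo_end` and the sample dates are hypothetical, and real publisher policies may count the embargo differently (for example, from online-first publication), so treat this as a rough aid rather than a statement of any publisher's rules.

```python
import calendar
from datetime import date


def embargo_end(published: date, months: int) -> date:
    """Return the date that falls `months` calendar months after `published`.

    Hypothetical helper: assumes the embargo runs from the publication date.
    If the target month is shorter, the day is clamped to its last day.
    """
    total = published.month - 1 + months
    year = published.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(published.day, last_day))


# The three embargo lengths mentioned above: six months, one year, two years.
for months in (6, 12, 24):
    print(months, "months:", embargo_end(date(2015, 2, 15), months))
```

A repository manager could run a check like this at deposit time to decide whether an item should stay dark until the embargo expires.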
A post-print is the author's last version of the paper that he or she sends to the journal. It is usually a Word document and incorporates all the changes suggested by peer reviewers. The term author's accepted manuscript (AAM) is synonymous.
The SPARC author addendum: The form provides a templated request by authors to add to the copyright transfer agreement which the publisher sends to the author upon acceptance of their work for publication. Authors who use the form typically retain the rights to use their own work without restriction, receive attribution, and to self-archive. The form gives the publisher the right to obtain a non-exclusive right to distribute a work for profit and to receive attribution as the journal of first publication. (From Wikipedia.)
arXiv is a preprint server. This tradition started in the particle physics field. In the pre-internet days, because of the long lag time between submitting a manuscript and its eventual publication in a journal, physicists would create mimeographed copies of their manuscripts, or pre-prints, and share them with colleagues via the mail or at conferences. Eventually these became photocopies, and eventually they became available through telnet and gopher. I can remember helping set up a database at Harvard in 1991 or 1992 that was called the Physics Preprint database, and it held metadata for all the preprints. Then the internet came and changed everything. Today the physics preprint database is known as arXiv, and it's still called a pre-print server, but many people submit papers to it and then never submit them to any journal. So it has morphed into a type of publisher. Similar initiatives are being started in other fields. The problem is that much of the content is not peer-reviewed. We know that the major publishers make articles available soon after they are accepted, generally using names like "articles in press" or something like that, and this is an attempt to compete with pre-print servers.
Sherpa Romeo is a free database that collects green OA policy statements for journals. Authors can use it to determine what they can do with their post-prints.
A dark archive is one that is generally not accessible at all. It may include embargoed material or material being stored for cooperative preservation.
3 Institutional Repositories
Directory: http://www.opendoar.org/
First we'll talk about institutional repositories. They are often referred to as IRs. OpenDOAR is a directory of them.

4 Institutional Repositories: Local instances
Colorado / Wyoming Institutional Repositories (selected)
- University of Colorado Boulder, University of Colorado Colorado Springs, Anschutz Medical Campus, Colorado School of Mines, Colorado Mesa University, and Colorado State University are still using Digital Collections of Colorado
- Wyoming Scholars Repository (Digital Commons)
- University of Northern Colorado, University of Denver, Colorado College, and others use the Colorado Alliance's repository service, which is an Islandora implementation
- Fort Lewis College has Fort Works, an EPrints implementation
To give some local context, I gathered information about IRs in this region.

5 Institutional Repositories: Institutional Repository Software / Hosting
- Digital Commons
- DSpace
- EPrints
- Fedora
- Islandora
- Invenio / TIND
- Greenstone
- SobekCM
Here are some of the principal IR companies. Explain hosted versus software. Some of these are open source. Explain TIND.
6 Institutional Repositories: Digital Preservation
"The Academic Preservation Trust (APTrust) is committed to the creation and management of a sustainable environment for digital preservation. APTrust's aggregated repository will solve one of the greatest challenges facing research libraries and their parent institutions: preventing the permanent loss of scholarship and cultural records being produced today."
"The Digital Preservation Network (DPN) was formed to ensure that the complete scholarly record is preserved for future generations. DPN uses a federated approach to preservation. The higher education community has created many digital repositories to provide long-term preservation and access. By replicating multiple dark copies of these collections in diverse nodes, DPN protects against the risk of catastrophic loss due to technology, organizational or natural disasters."
There are two cooperatives for digital preservation for institutional repositories. Basically, they work by having several other libraries host all your content in a dark archive on their servers, and you do the same in return. Academic Preservation Trust is based at UVA. Its members include:
- Columbia University
- Indiana University
- Johns Hopkins University
- North Carolina State University
- Penn State University
- Syracuse University
- University of Chicago
- University of Cincinnati
- University of Connecticut
- University of Maryland
- University of Miami
- University of Michigan
- University of North Carolina
- University of Notre Dame
- University of Virginia
- Virginia Tech
The Digital Preservation Network does not indicate where it is based, but it gives a 434 area code for its telephone number, which covers the Lynchburg, Virginia, area, so it looks like Virginia is the hotspot for digital preservation. It has these members:
Member Listing
- Arizona State University
- Brigham Young University
- Brown University
- California Institute of Technology
- Columbia University
- Cornell University
- Dartmouth College
- Duke University
- Emory University
- Harvard University
- Indiana University
- Iowa State University
- Johns Hopkins University
- Kansas State University
- Massachusetts Institute of Technology
- Michigan State University
- New York University
- Northwestern University
- North Carolina State University
- Ohio State University
- Pennsylvania State University
- Princeton University
- Purdue University
- Rutgers University
- Stanford University
- Syracuse University
- Texas A&M
- Texas Tech University
- Tufts University
- Tulane University
- University of Alabama
- University of Arizona
- University of Buffalo
- University of California San Diego
- University of Chicago
- University of Florida
- University of Illinois at Chicago
- University of Illinois at Urbana-Champaign
- University of Iowa
- University of Kansas
- University of Kentucky
- University of Maryland
- University of Miami
- University of Michigan
- University of Minnesota
- University of Nebraska
- University of New Mexico
- University of North Carolina
- University of Notre Dame
- University of Tennessee
- University of Texas
- University of Utah
- University of Virginia
- University of Washington
- University of Wisconsin
- Utah State University
- Vanderbilt University
- Virginia Polytechnic Institute and State University
- Yale University
- Texas Digital Library
- California Digital Library
- John D. Evans Foundation
- American Council on Education
Figshare is unique because it markets to individual scholars. It does also market to institutions. It's owned by Digital Science, which is owned by Macmillan Publishers Limited.

8 DataCite
https://www.datacite.org/
There is an organization called DataCite that focuses on citing digital objects. They have something called the Metadata Store where you can buy DOIs and assign them to the digital objects in your repository. Increasingly, the quality of a repository will be judged by whether it provides DOIs for its objects and digital preservation for its content. The sponsors of repositories essentially become publishers, and publishers have responsibilities. Publishing is much more than just mounting PDFs or images on the internet; there are many activities that must be carried out to support publishing, if you want to do it right.

9 Disciplinary Repositories
Directory of disciplinary repositories (Simmons College) = http://oad.simmons.edu/oadwiki/Disciplinary_repositories
Some major disciplinary repositories:
- SSRN (Social Science Research Network)
- RePEc (Research Papers in Economics)
- E-LIS (Eprints in Library and Information Science)
- PMC (PubMed Central)
- AgEcon Search (University of Minnesota)
Now let's talk about disciplinary repositories. There is one directory of them that I know of; it covers most fields and is hosted on the Simmons College OA wiki. Some of the major subject repositories include these.

10 Disciplinary repository screenshots
Here are screenshots of SSRN and RePEc, which I think is pronounced "REE Peck." I don't completely understand SSRN. It is starting to act more like a business than a repository. Indeed, it's owned by a company called Social Science Electronic Publishing, Inc. It may also do some publishing. It also hosts preprints. It uses number of downloads as a metric to measure individual researchers. RePEc is sponsored by the Research Division of the Federal Reserve Bank of St. Louis.

11 Focus: PubMed Central (PMC)
PMC (PubMed Central) launched in 2000 as a free archive for full-text biomedical and life sciences journal articles. PMC serves as a digital counterpart to the NLM extensive print journal collection; it is a repository for journal literature deposited by participating publishers, as well as for author manuscripts that have been submitted in compliance with the NIH Public Access Policy and similar policies of other research funding agencies. Some PMC journals are also MEDLINE journals. For publishers, there are a number of ways to participate and deposit their content in this archive, explained on the NLM Web pages "Add a Journal to PMC" and "PMC Policies." Journals must be in scope according to the NLM Collection Development Manual. Although free access is a requirement for PMC deposit, publishers and individual authors may continue to hold copyright on the material in PMC, and publishers can delay the release of their material in PMC for a short period after publication. There are reciprocal links between the full text in PMC and corresponding citations in PubMed. PubMed citations are created for content not already in the MEDLINE database. Some PMC content, such as book reviews, is not cited in PubMed.
What is the Difference between PubMed Central and PubMed?
The basic difference is that PubMed is a database of metadata, and PMC is a database of full-text scholarly articles. The two databases are often confused. PMC has an HTML reader and a classic reader, and in many cases the publishers' PDFs are also available.
Both PubMed and PMC are made available by the National Center for Biotechnology Information (NCBI), which is part of the U.S. National Library of Medicine. Many funding agencies in the biomedical sciences require that research completed using their funding be made freely available, and PMC is one place where this is often done.

13 Data Repositories
Directories of Data Repositories
- Data repositories (Simmons College, OA Directory)
- Registry of Research Data Repositories
- Databib: "Databib is a searchable catalog / registry / directory / bibliography of research data repositories."
Data repositories publish much more than just numerical or statistical data. They also publish genomic data, structured textual data, image data, and more.

14 Focus: Dryad Digital Repository
- Works with journals
- Requires use of the CC0 license
- Located at http://datadryad.org/
- Costs $90
DataDryad.org is a curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad has integrated data submission for a growing list of journals; submission of data from other publications is also welcome -- http://datadryad.org/
Mention CC0 license. Started in North Carolina with grant funding. One of the ideas is that people can use the published data to generate new research. They can also re-do the experiments and see if they get the same results.

15 Focus: GitHub
- A collection of software repositories
- Used for sharing code, programs, software
- Has paid and free options; free option used for open source
GitHub is the largest code host on the planet with over 19.4 million repositories. Large or small, every repository comes with the same powerful tools. These tools are open to the community for public projects and secure for private projects.
16 DMP = Data management plan
From the Wikipedia article, "Data management plan":
- Description of the data
- How / When / Where data will be acquired
- How the data will be processed
- What file formats the data will be in, naming conventions
- Version control
- Metadata
- Policies for access, sharing, and re-use
- Long-term storage and data management
- Budget

Review of OAI-PMH
- Open Archives Initiative Protocol for Metadata Harvesting
- Provides a way to create a "union catalog" of resources in digital repositories
- The metadata is indexed in WorldCat (including WCL), updated quarterly
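To make the harvesting idea concrete, here is a minimal sketch of what a harvester does: it builds a ListRecords request and reads identifiers and Dublin Core titles out of the XML response. The endpoint URL, record identifiers, and sample response are made up for illustration, and the helper names `list_records_url` and `parse_records` are hypothetical. The sketch does not contact a real repository and ignores resumption tokens and error responses. Repositories that track deletions can flag them with a `status="deleted"` attribute on the record header, but that support is optional in the protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title, deleted) tuples from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        identifier = header.findtext("oai:identifier", namespaces=NS)
        deleted = header.get("status") == "deleted"
        title = record.findtext(".//dc:title", namespaces=NS)  # None if absent
        results.append((identifier, title, deleted))
    return results


# Made-up response from a hypothetical repository, trimmed to the essentials.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:123</identifier>
        <datestamp>2015-02-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.edu:124</identifier>
        <datestamp>2015-02-02</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.edu/oai"))
for identifier, title, deleted in parse_records(SAMPLE):
    print(identifier, title, "(deleted)" if deleted else "")
```

A union catalog is then a matter of running this kind of harvest against many repositories and indexing the combined metadata in one place.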
It started at the University of Michigan. It doesn't work well for items that are removed. ResourceSync is a prototype replacement. It aims to synchronize metadata with the objects they describe.

18 Conclusion
- Institutional repositories convert libraries into publishers, and this has many long-term legal, ethical, and financial implications.
- Repositories exist in sort of a digital version of the Wild West.
- Repositories with strong digital preservation practices and that use and maintain standard identifiers for the digital objects they publish will stand out from others.
- Most repositories will contain material of secondary or local-only importance, but a few gems will exist here and there.
- Libraries are competing with scholarly publishers (Odlyzko, 2013).
4th bullet point: I've heard the term "publications ghetto" used to refer to institutional repositories, specifically referring to green open access articles, which are Word versions of documents or a PDF derivative of such.

19 Coda
Investigate the possibility of constructing the world's first all-scholarship repository (ASR). [...] Conversations are currently ongoing on this matter. The Department of Energy has authorized the Los Alamos National Laboratory (LANL) to build the prototype ASR. SOURCE
This is an initiative of the National Science Communication Institute. It would be centralized and would make things like OAIster obsolete. In other words, it would centralize all IR content rather than just the metadata.