
The Ryerson Index

INDEXING FROM NEWSPAPER WEBSITES

Table of Contents

Introduction
Saving the Notices
Ryerson Notice Repository (RNR)
Updating the RNR
Naming Conventions within the RNR
How Often Should I Save Notices?
Saving Notices from the NewsCorp Website
Saving Notices from an ACM Website
Indexing Digital Notices from the RNR
The Windows Snipping Tool

Version 1 – 25 Jun 2020 – initial document

Version 2 – 27 Jun 2020 – revised and simplified ACM access

Version 3 – 28 Jun 2020 – new section on how often to save notices, and the importance of expanding “more…” before saving


Introduction

For more than 21 years, Ryerson had resisted using newspaper websites as a source for indexing. Wherever possible, we used either a copy of the printed paper, or a .pdf of the paper. There were two reasons for this.

The first reason was that we found, when checking the print version against the newspaper website, that there were discrepancies between the two sources. Sometimes notices which appeared on the newspaper website were not included in the print edition. Sometimes notices in the print edition did not appear on the website. There appeared to be no logical reason why some notices were missed, but they were – hence the website was considered an unreliable source when compared to the print edition (which, unlike the website, would still be available many years hence).

The second reason developed slowly, as newspapers began to concentrate more and more on their websites. We found that notices were being listed on the website with a “published” date on which the paper was not printed. We tracked this down to the newspapers (most particularly APN papers) considering that a notice was “published” when it was added to the website, not when it appeared in print. Because we could not rely on the publication date being correct, we could not index any of these notices.

That situation remained the case until April 2020.

The COVID-19 worldwide pandemic caused huge disruption to most countries. In Australia (and many other countries), businesses were forced to close, personal movement was severely restricted, and working from home became the norm for a large number of people – all with the aim of stopping the spread of COVID-19. Large-scale business closures, and the reduction in spending by the general populace, had a severe impact on the newspaper industry. Almost overnight, the advertising income stream for most publications dried up.

During the second week of April 2020, the two major newspaper publishers (News Corp and Australian Community Media (ACM)) announced the suspension of many of their smaller mastheads until at least the end of June. The stated intention of both publishers was that the "suspensions" would only be temporary.

News Corp was the first to break ranks, announcing on 28 May the closure of a number of regional and suburban publications, and the switch of many more to become digital only. Thus a very large number of print mastheads have ceased to exist, with their final print edition, in many cases, appearing in the second week of April 2020.

At this time, ACM has reiterated its intention to re-open all suspended mastheads at the end of June – but their intentions regarding print v. digital are not yet clear.

To compensate for the closure of so many papers, the Ryerson Committee took the decision that, for digital-only papers, we would:

1. Save each notice published on the newspaper website in the RNR, and

2. Index from the saved notices.


Our reasoning was that this switch from print to digital was only the beginning, and eventually most papers would follow suit. If we stuck to our previous position of not indexing from a newspaper website, then Ryerson would wither and die, as the number of notices appearing in print gradually reduced to nil, or very close to nil.

We considered that saving each notice to the RNR would provide us with a source which we could produce on request from a researcher if, sometime in the future, the particular notice had disappeared from the newspaper website. It could even be possible that sometime in the future, the RNR would be the sole repository of some notices.

We also found that NewsCorp’s online database of notices could be searched by newspaper, with search results displayed in chronological order. This meant the process of saving the notices from NewsCorp papers became quite efficient – remember, we are only saving notices from the small papers which have become digital-only, many having zero or one notice per week.

ACM has also updated their Tributes website – they now use the www.legacy.com platform, which provides similar searching capabilities, and efficiencies in saving, to the NewsCorp platform.

One important point to note with the NewsCorp platform – sometimes only an abbreviated version of the notice is displayed, with the word “more …” at the bottom.

Please ensure you click “More” to expand the notice before you save it. A lot of family information useful to a researcher could be hidden behind “more”, and we want to save it!


Saving the Notices

To control what could potentially become a very large collection of notices, some standardisation of the process is necessary, both in how the notices are saved, and how they are stored.

Ryerson Notice Repository (RNR)

The RNR is a folder stored in Dropbox and accessible to all indexers on a read-only basis. It contains all the notices we have saved over the years – approximately 2 million – and is also the place where notices saved from newspaper websites will be stored.

The RNR has a rigid structure of folders and sub-folders. The standard three-level folder structure is:

    Newspaper Name
        Year
            Month

As an example,

    Sydney Morning Herald
        2019
            2019-01    Notices for each day of Jan 2019
            2019-02    Notices for each day of Feb 2019
            ……
            2019-12    Notices for each day of Dec 2019
        2020
            2020-01    Notices for each day of Jan 2020
            etc

There is a simplified folder structure for those smaller papers (with fewer than approximately 100 notices per year) to allow for the fact that monthly folders may quite often be empty. The simplified standard looks like this:

    Newspaper Name
        Year

As an example,

    Kiama Independent
        2019    Notices for all of 2019
        2020    Notices for all of 2020
        etc


Updating the RNR

Because indexers have read-only access to the RNR (at 250+ GB, it will not fit into the usual free Dropbox space allocation), we needed to create a structure to allow indexers to “update” the RNR. We did this by giving each indexer their own RNR update folder within Dropbox. This folder is shared by the indexer and those with RNR update access (currently 3 people). Periodically, one of those people with update access will clear out the indexer’s update folder by copying the contents into the RNR.

The copy process is such that all sub-folders within the indexer’s update folder are copied directly into the RNR in a single move. Consequently, the folder structure within the indexer’s update folder must mimic EXACTLY the folder structure within the RNR for the paper(s) you are saving, to ensure correct placement of the notices in the RNR. Further details of the naming conventions follow in the next section.

It is absolutely crucial that indexers, when creating a new folder within their RNR update folder, create the folder with a name that fits the paper’s naming conventions, otherwise the files copied will not end up where they are supposed to.

Individual update folders will be named “Ryerson Digital and (indexer name)”, and indexers will have shared access only to their own folder. This is to avoid the possibility of multiple indexers submitting lots of notices at the same time, and filling up other indexers’ Dropbox space allocation.
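For indexers comfortable with a little scripting, the requirement that an update folder mimic the RNR structure can be checked before submitting. The following is only a minimal sketch in Python, not part of the Ryerson process: the function name `check_update_path` is hypothetical, and it assumes each saved notice sits at either newspaper/year/month/day/file (standard structure) or newspaper/year/day/file (simplified structure), using the yyyy, yyyy-mm and yyyy-mm-dd folder names described in the next section. It does not check the newspaper name or the file name.

```python
import re
from pathlib import PurePosixPath

# Folder-name patterns assumed from the RNR naming conventions.
YEAR = re.compile(r"\d{4}")
MONTH = re.compile(r"\d{4}-\d{2}")
DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_update_path(relative_path: str) -> list[str]:
    """Return a list of problems for one file path inside an update folder.

    The path is relative to the update folder and uses "/" separators.
    An empty list means the folder names look consistent.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) == 5:        # standard: paper/year/month/day/file
        _paper, year, month, day, _file = parts
    elif len(parts) == 4:      # simplified: paper/year/day/file
        _paper, year, day, _file = parts
        month = None
    else:
        return [f"expected 4 or 5 path levels, found {len(parts)}"]

    problems = []
    if not YEAR.fullmatch(year):
        problems.append(f"year folder '{year}' is not in yyyy format")

    month_ok = month is not None and MONTH.fullmatch(month) is not None
    if month is not None:
        if not month_ok:
            problems.append(f"month folder '{month}' is not in yyyy-mm format")
        elif not month.startswith(year + "-"):
            problems.append(f"month folder '{month}' does not belong under '{year}'")

    if not DAY.fullmatch(day):
        problems.append(f"daily folder '{day}' is not in yyyy-mm-dd format")
    else:
        # Compare the day against its month folder when that is usable,
        # otherwise against the year folder.
        parent = month if month_ok else year
        if not day.startswith(parent + "-"):
            problems.append(f"daily folder '{day}' does not belong under '{parent}'")
    return problems
```

Calling `check_update_path("Kiama Independent/2020/2020-06-19/notice.jpg")` returns an empty list, while a month folder typed as `2019.12` is reported as a problem.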

Naming Conventions within the RNR

There are two sets of naming conventions, to cater for the standard and simplified structures.

1. The Standard Folder Structure

Within each monthly folder, we create daily folders with a name in the format yyyy-mm-dd. There will quite often be no notices created for a day, so these folders can be created by the indexer as required. There is no need to create a folder for a day on which there are no notices.

Within each daily folder, the notices are saved with a name in the format set out in (3) below.

2. The Simplified Folder Structure

Within each yearly folder, we create daily folders with a name in the format yyyy-mm-dd. There will in a large number of cases be no notices saved for a day, so these folders can be created by the indexer as required. Again, there is no need to create a folder for a day on which there are no notices.

Within each daily folder, the notices are saved with a name in the format set out in (3) below.

3. Naming Each Notice

There are three items of information which are vital when we are retrieving a notice following a lookup request:

    The name of the deceased
    The publication date
    The newspaper

Consequently each notice should be named in the following format:

    Newspaper code_publication date_surname

(ie with each field separated by an underscore)

While it may seem that the newspaper code is redundant information, given that the notice is being stored in a subfolder under a specific folder for the newspaper, including the code is insurance for us to ensure the notices are filed in the correct place.

The publication date should be in the format yyyy-mm-dd, to ensure notices are displayed within the folder in chronological sequence.

The surname should be in the same case as you would use when indexing (eg SMITHERS, de BORTOLI).

In the rare case of these three fields not providing a unique filename, add a suffix of _#n to the filename, where n starts at 2.
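The conventions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official Ryerson tool: the function names `daily_folder` and `notice_filename` are hypothetical, the `.jpg` extension is taken from the example filename used later in this document, and placing the _#n suffix before the extension is an assumption.

```python
from datetime import date
from pathlib import PurePosixPath


def daily_folder(newspaper: str, published: date, simplified: bool = False) -> PurePosixPath:
    """Build the daily folder path for a notice, relative to the RNR root."""
    year = f"{published:%Y}"
    day = published.isoformat()              # yyyy-mm-dd
    if simplified:
        # Simplified structure: Newspaper Name / Year / yyyy-mm-dd
        return PurePosixPath(newspaper, year, day)
    # Standard structure: Newspaper Name / Year / yyyy-mm / yyyy-mm-dd
    return PurePosixPath(newspaper, year, f"{published:%Y-%m}", day)


def notice_filename(paper_code: str, published: date, surname: str,
                    existing=(), extension: str = ".jpg") -> str:
    """Build 'code_yyyy-mm-dd_SURNAME', adding _#2, _#3, ... if the name is taken.

    `existing` is the collection of filenames already in the daily folder.
    """
    stem = f"{paper_code}_{published.isoformat()}_{surname}"
    taken = set(existing)
    candidate = stem + extension
    n = 2                                    # the suffix starts at 2
    while candidate in taken:
        candidate = f"{stem}_#{n}{extension}"
        n += 1
    return candidate
```

For example, `notice_filename("GP", date(2020, 6, 19), "WELSH")` gives `GP_2020-06-19_WELSH.jpg`, and a second notice with the same three fields in the same folder would become `GP_2020-06-19_WELSH_#2.jpg`.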

How Often Should I Save Notices?

This is entirely up to you, with the proviso it is done at least once each calendar month. Some papers might have only two or three notices in a month, so time can be wasted if the website is checked daily and no new notices found. Other papers might have two or three notices each day, so that a daily or weekly check spreads the load, and doesn’t leave a large job at the end of the month. The choice is yours – after all, you know your papers better than anyone else.


Saving Notices from the NewsCorp Website

Access to all NewsCorp notices can be gained from ANY of their newspaper websites – all notices are stored in a common database. Fortunately we are able to search these notices chronologically by newspaper.

The search process is as follows (using www.dailytelegraph.com.au as a sample starting paper):

1. Locate the classifieds section of the paper. This can be found in the toolbar immediately below the masthead:

When the “Classifieds” tab is not showing, click the > at the right of the screen to uncover further options:

2. Click the “Classifieds” tab to go to the next level of search:


3. Click the “Tributes” tab to go to the next level:

From this screen, you will need to conduct three separate searches – one for death notices, one for funeral notices and one for obituaries. The process for each is identical.

4. Click the “Death Notices” tab, and the following screen will appear:

Rather annoyingly, this screen has two important failings. Firstly, it tries to be too clever by saying something like “the Daily Telegraph is a Sydney paper, so I’ll put ‘Sydney’ into the location field” – which has the unfortunate result of only showing notices which include the word “Sydney”. Secondly, it shows a list of recent notices from ALL papers – but it doesn’t show the filter fields to allow you to select a particular paper.

To correct these problems, you need to carry out the following steps in the order shown – if not, your location correction will be ignored.


5. To find the filter which allows you to select a specific newspaper, you need to click on the funny little symbol immediately to the left of the GO button. This will bring up a pop-up panel like this:

6. Click the “All Publications” tab, and scroll through the list until you find the paper you wish to index:


7. Now correct the location problem by clicking on the drop-down menu on the Location box and selecting the entry at the top of the list – “All Locations”. The pop-up panel should now look like this:

8. Click the “Update Search” button. This will bring up a list of all notices of the required type for the selected paper – but not in chronological order! At the top of the returned search results, you will see this:

which shows the number of notices found (4 in this example), the number of notices displayed per page (20), and the current sort sequence (“Relevance”, whatever that means!). To sort the notices into chronological sequence, click the funny little symbol which appears between “Relevance” and “Results”, and select “sort by date”. Clicking on the bottom half of the symbol will sort the notices into descending chronological sequence, which is what we want (ie the most recent at the top).

9. You now have ALL the notices of the selected type in descending chronological order. It now becomes a simple case of checking the publication date for any notices published since you last indexed this paper, and saving any new notices.

10. Repeat steps 4-9 for Funeral Notices, and again for Obituaries.


This process might appear complicated at first, but after a few uses it will become second nature. The website could obviously be improved to make searching easier, but at this stage we have to work with what we have.

Remember, it is absolutely imperative to follow these steps in the correct order – no short cuts, otherwise you will miss notices – guaranteed!

As you get familiar with the process, you might want to use the date-pickers to set a particular date range for your results. Personally, I find it faster to sort by date and only look at the top of the returned list of search results.


Saving Notices from an ACM Website

ACM has outsourced the website storage and display of its notices to www.legacy.com. This is a huge help to us, as we can now easily locate all the ACM papers we want, from a single screen.

At this stage, the legacy.com site has notices dating back to the second half of April 2020, ie shortly after the newspapers were suspended.

The situation with ACM is also fluid at this time – we do not know which papers will be coming back in print at the end of June, and which will come back as digital only. These instructions are being prepared on the basis that there will be at least some digital-only ACM papers. However, it is important to note that these instructions should not be used if a paper has a print version, for the reasons detailed in the Introduction.

To save the notices for a particular paper, follow these steps (which use the Goulburn Post as an example):

1. Go to https://www.legacy.com/search-international-newspaper-obituaries/#australia – the screen should look like this:


2. Page down to the newspaper you want and click on the link. The screen should look like this:

This screen is divided into two parts. The top section lists an index to the most recent 21 notices alphabetically by name, while the bottom section details every notice, in reverse chronological order (ie latest first).


3. Ignore the top list of names in boxes, and scroll down to the “Recent Notices”.

Here we have what appears to be all the notices in the database (starting from 27 April). It is now a simple matter to snip each notice, and file it according to the naming conventions described earlier.

As an example, file GP_2020-06-19_WELSH.jpg would look like this:

It is important to snip the newspaper name and publication date with each notice – this is our proof of “publication”.


Indexing Digital Notices from the RNR

Having saved the notices, we now need to index them.

To continue using the Ryerson update process without having to make significant changes, we will continue to rely on the “publication date” as listed with each notice, and included in the filename for each notice. However, this introduces a further complication for papers where the pre-suspension publication frequency was not daily.

The objective is not to increase the workload of indexers more than is absolutely necessary, in particular by not requiring the submission of a NIL return for days on which there are no notices published.

Now that notices will have a “publication date” of the date when the notice went onto the website, we can no longer schedule the expected arrival of files to happen at regular intervals, as in the past (on, say, every Wednesday, and not on any other day). Notices for what was a weekly paper can now be published on any weekday (for simplicity, I am assuming no Saturday or Sunday “publishing” – it remains to be seen whether this is a valid assumption or not).

So to minimise the extra work for indexers, we have separated all digital-only papers into two groups – those which were published on FOUR or MORE days each week, and those which were published on THREE or FEWER days each week.

Those papers in the FOUR or MORE group will require a NIL day to be submitted for any day Mon-Sat on which a digital notice is not published.

Those papers in the THREE or FEWER group will not require a NIL day to be submitted for any day on which a digital notice is not published.

As all indexers covering the digital-only papers were previously indexing the print version, you will know into which group your paper fits. If in doubt, you can check the newspaper page (via the “Newspaper Coverage” link on the website) to see the pre-suspension publication frequency.
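The NIL-day rule for the two groups can be expressed as a short Python sketch. It is illustrative only: the function name `nil_days` and its parameters are hypothetical, and it assumes the indexer supplies the pre-suspension number of print days per week together with the set of dates for which notices were saved.

```python
from datetime import date, timedelta


def nil_days(first: date, last: date, notice_dates, print_days_per_week: int) -> list[date]:
    """List the dates in [first, last] that need a NIL day submitted.

    Papers formerly printed on THREE or FEWER days a week never need a NIL day.
    Papers formerly printed on FOUR or MORE days need one for every
    Monday-to-Saturday date on which no digital notice was published.
    """
    if print_days_per_week <= 3:
        return []
    have_notices = set(notice_dates)
    result = []
    day = first
    while day <= last:
        is_sunday = day.weekday() == 6       # Monday is 0, Sunday is 6
        if not is_sunday and day not in have_notices:
            result.append(day)
        day += timedelta(days=1)
    return result
```

For the week of Monday 22 June to Sunday 28 June 2020, with notices saved only for 22 and 24 June, a paper formerly printed five days a week would need NIL days for 23, 25, 26 and 27 June, while a former weekly would need none.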


The Windows Snipping Tool

By mid-2020, the majority of indexers are using Windows 10, which has a very simple snipping tool as part of the standard OS. Where possible, indexers are encouraged to use this to save notices – but if you have another preferred snipping tool, keep using it – we are concerned about the final destination, not the journey!

The simplest way to invoke the Snipping Tool under Win10 is to press the Windows+Shift+S keys simultaneously. An alternative method is to pin it to your Windows taskbar – that way, it is accessible when required via a single mouse-click. To do this,

1. Click the Windows Start icon (bottom left)

2. Scroll through the list of apps until you find “Snipping Tool” or “Snip & Sketch”

3. Right-click on the app name

4. Select “pin to taskbar” (if not present, select “More”, then select “pin to taskbar”)

You will now see the little circular snipping icon in your taskbar. It looks like this:

To save a notice using the snipping tool,

1. Locate the notice

2. Click the snipping tool icon in the taskbar, or use the Windows+Shift+S keys

3. Click “New” – the window will go grey, and the cursor will change to a +

4. Outline the notice to be saved

5. Click the “Save As” icon

6. Select the destination folder, and a name for your file

7. Click “Save”

8. Close the snipping tool