A client engaged HKA Data to deliver content from obituary data. Many obituaries request donations to charities, typically ones that support a cause connected to the deceased. The client intended to aggregate these names and sell them to the respective charities for fundraising purposes. As is often the case, considerably more value can be realized once the data are captured, cleaned and analysed. The system built by HKA was unique and extremely cost effective, and the data captured yielded a wealth of information.

The Challenge
The task was to isolate the data in the publications and key, with 100% accuracy, the contents of about 4000 obituaries daily, then select subsets of the data, reformat them per the client’s needs and deliver them in a daily file. Obituaries published by public sources are not encumbered by copyright and are free to use if credit is given to the source. Thousands of obituaries are published every day in Canada’s newspapers. Our job was to capture these data and format them per the requirements of the client. Accuracy was paramount, so from a data keying standpoint, the data had to be verified. This is usually done by proofreading or by double key entry, in which a second operator keys the same data independently and the two versions are compared. The latter is the standard for the highest accuracy.

Data Capture – an Innovative Solution
The obvious answer is to buy the newspapers and double-key the data needed: start with 200 newspapers on subscription, locate the needed data and key it twice. The obvious resources are a manager, herds of keyers and a system to organize, reformat and export the data required. We designed an alternative: a highly scalable, 100% accurate system that required 3 people.

The Daily Procedure
A file was made of all the newspapers and their web site addresses. A program was developed to go through the list of papers daily, access each web site, navigate to the obituary section and copy all the data into a file stored by HKA. If a web site was unavailable or its address had changed, the application was adjusted and rerun. Once the file was stored at HKA, it was parsed programmatically into individual obituaries, and every word in the obituary copy was annotated with its sequence number, displayed above it as a superscript. The annotated obituary copy was displayed and a clerk noted the numbers above the words that had to be extracted from the copy. E.g. the name of the deceased may have been words 3 to 5, the date of death word 9 and the funeral home words 175 to 178. The required data could now be represented as a string of numbers.
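The word-numbering step can be illustrated with a minimal Python sketch. It is not HKA's actual program: the function names are hypothetical, words are assumed to be separated by whitespace, and the number is shown in brackets after each word rather than as a superscript above it.

```python
def number_words(obituary_text: str) -> list[tuple[int, str]]:
    """Pair every whitespace-separated word with its 1-based position."""
    return list(enumerate(obituary_text.split(), start=1))


def render_annotated(obituary_text: str) -> str:
    """Return the copy with each word followed by its number in brackets.

    Assumption: a plain-text stand-in for the superscript display
    described in the text.
    """
    return " ".join(f"{word}[{n}]" for n, word in number_words(obituary_text))
```

For the copy "In loving memory of John Smith", `render_annotated` would show `John[5] Smith[6]`, so a clerk could record the name as words 5 to 6 without typing it.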

The Data Extraction
The string of numbers was used to extract the underlying words from the obituary, which were then stored in the relevant fields. The resulting data were programmatically verified (e.g. dates and times had to follow certain formats, and funeral homes were checked against a master database of funeral homes that was built by the system). Since no words were keyed, there were no data capture errors.
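A minimal Python sketch of this extraction and verification step follows. It is an illustration under stated assumptions, not the system HKA built: the field names, the ISO date format, the sample copy and the contents of the funeral home list are all hypothetical.

```python
from datetime import datetime

# Hypothetical master list; in the text this database was built by the system.
KNOWN_FUNERAL_HOMES = {"Maple Grove Funeral Home"}


def extract_fields(obituary_text: str, spans: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Pull each field out of the copy using 1-based, inclusive word ranges."""
    words = obituary_text.split()
    fields = {}
    for name, (first, last) in spans.items():
        if not 1 <= first <= last <= len(words):
            raise ValueError(f"bad word range for {name}: {first}-{last}")
        fields[name] = " ".join(words[first - 1:last])
    return fields


def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Check a date against one assumed format (real copy would need several)."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


SAMPLE = ("SMITH John Andrew passed away on 2004-03-12 at home "
          "with arrangements by Maple Grove Funeral Home")
# The clerk's "string of numbers", expressed as ranges per field.
SPANS = {"name": (1, 3), "date_of_death": (7, 7), "funeral_home": (13, 16)}
```

Calling `extract_fields(SAMPLE, SPANS)` copies the words straight from the source text, which is why no keying errors can enter the fields; the only operator error possible is a wrong number, and checks such as `is_valid_date` or a lookup in `KNOWN_FUNERAL_HOMES` are there to catch those.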

The Hidden Value
At the time, no database of obituaries existed; with this system, death notices became searchable. The authorities found it very useful as a research tool, and alumni associations used it to solicit funds, as did countless charities. At a macro level it monitored deaths by area and by chronic disease, and it could be used for pandemic surveillance. At the individual level it could identify spouses and relatives. Funeral choices implied ethnicity, and an accurate list of funeral homes and cemeteries was built. These data were also sold to third parties.

The HKA Advantage
Built on knowledge of what can be done and an innovative approach, this scalable solution was run by one clerk who oversaw the operation and two data entry operators who merely keyed numbers. Over 4000 letter-perfect obituaries, broken down into about 30 fields, were delivered daily at about one-tenth the cost of a typical data capture team. The accuracy: perfect. The client: thrilled.

