[IEEE 2011 45th Asilomar Conference on Signals, Systems and Computers - Pacific Grove, CA, USA (2011.11.6-2011.11.9)] 2011 Conference Record of the Forty Fifth Asilomar Conference

Implementation of NICHD diagnostic criteria for feature extraction and classification of fetal heart rate signals

Shishir Dash,∗ Jolene Muscat,† J. Gerald Quirk,† and Petar M. Djuric∗

∗Department of Electrical and Computer Engineering

†Department of Obstetrics, Gynecology and Reproductive Medicine

Stony Brook University, Stony Brook, NY 11790

Abstract—We present a computerized approach to the interpretation of fetal heart rate (FHR) patterns that uses the 2008 Fetal Heart Rate Monitoring Guidelines published by the National Institute of Child Health and Human Development. The study addresses accurate feature extraction that gives programmatic meaning to morphological changes in FHR that have so far been judged only visually and that may indicate compromised fetal status. We develop methods for estimating the baseline heart rate and detecting morphological changes such as decelerations, as well as a novel method for assessing FHR variability. A strong motivation for this effort was to imitate physician judgement by following the standardized guidelines as closely as possible. We demonstrate the accuracy of this expert system on a database of 30 traces that had been labeled by two independent physicians.

I. INTRODUCTION

Electronic fetal heart rate monitoring (EFM) has now permeated obstetrical practice almost completely, whereas it was used only for complicated pregnancies when the technology was first introduced in the 1960s [1]. In 2002, nearly 85% of live births in the US (3.5 million women) underwent EFM [2]. The technology offers obvious advantages over older methods such as periodic auscultation with a fetoscope. Typically, the clinical recording consists of a fetal heart rate (FHR) signal acquired via maternal (abdominal) ECG, via fetal ECG from electrodes attached to the baby’s scalp, or via Doppler ultrasound. Simultaneously, the intra-uterine pressure (IUP) signal is obtained by measuring the amniotic fluid pressure with a catheter and strain-gauge combination [1]. However, traditional methods of EFM have always been prone to substantial inter- and intra-observer variability. The FHR signal is far from a stationary, “well-behaved” signal, and the morphological changes associated with common symptoms of fetal compromise vary widely across subjects. In addition, particularly for Doppler data, there are substantial problems with noise (e.g., movement artifacts) that may hide important deviations from the baseline. The situation is further complicated by the lack of agreement on the term “fetal distress”, which has led to broad terms like “reassuring” or “non-reassuring” that in many cases do not capture fine details of the trace. Hence, physicians themselves are often uncertain of the overall message of an EFM trace.

Since 1997, the National Institute of Child Health and Human Development (NICHD) has periodically published updated standardized guidelines for interpreting EFM traces in clinical evaluations. The latest version was published in 2008 [3]. It describes (a) general rules for visual assessment of morphological features in the FHR-IUP recording, (b) a guide to qualitative clustering of these features into reasonably well-defined categories that are indicative of fetal health, and (c) if-then rules for the final diagnosis of the trace given the appearance of specific combinations of features.

In this paper, we exploit the NICHD rules to build an expert system that performs consistent diagnoses and emulates physician decision making in the clinic. We note that computerized systems have been implemented previously, e.g., [4], [5]. Building such systems requires feature extraction that yields information equivalent to that obtained by visual assessment in the clinic. According to the NICHD rules, there are five main informative features for diagnosing fetal health: the baseline rate, the uterine contraction frequency, the baseline FHR variability, the presence and types of decelerations, and the presence of accelerations. We have developed algorithms to assess each of these, as described in the sequel.

In accordance with the NICHD rules, we also used simple rules to categorize a continuous-valued feature (such as baseline variability) into “types” (such as “marked” or “minimal”). Finally, we perform a diagnosis based on which combinations of feature types occur for a given subject. We have tested our algorithm on a database of 30 recordings from 9 subjects whose data were collected at the Stony Brook University Hospital and annotated by two independent physicians.

The paper is organized as follows. In the next section, we describe the features and their extraction from the acquired signals. In Section III, we present the feature categorization, and in Section IV, the diagnostic decision flow. In Section V, we show results and provide a brief discussion. We conclude the paper with some final remarks in Section VI.

II. FEATURE EXTRACTION

We have recordings from 9 subjects. Each recording was first segmented into 20-min epochs, giving a total of 830 segments over the 9 subjects. The sampling rate of the data (both FHR and IUP) was 4 Hz, so each segment contained 20 × 60 × 4 = 4,800 samples. Two independent physicians labeled 30 of these segments, and we considered their annotations as the gold standard for the diagnostic performance analysis.

978-1-4673-0323-1/11/$26.00 ©2011 IEEE, Asilomar 2011

A. Preprocessing

Prior to feature extraction, we performed preprocessing to remove various artifacts, including those due to movement, using a method similar to that employed in [5]. The FHR time series, which were acquired via the Doppler-autocorrelation method with sampling frequency $f_s = 4$ Hz, were processed to remove so-called “spiky” artifacts. These were defined as FHR segments in which successive HR differences greater than 25 bpm were detected. Whenever such a jump was detected, linear interpolation was performed between the first detection and the first subsequent “stable” segment, defined as a group of five samples whose beat-to-beat differences did not exceed 10 bpm. We also kept a record of the interpolated periods, which was used during feature extraction. While looking for specific features such as decelerations, we typically had to search certain sub-segments of the 20-min epoch. If the total duration of interpolated periods in a sub-segment exceeded 30% of the sub-segment duration, we excluded that sub-segment from the search entirely.
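The spike-removal rule above can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes the FHR is a plain list of bpm values, treats the sample just before a jump as the last good value, and, when no stable run follows, holds that value to the end of the record (the text does not say what happens in that case). The helper names `remove_spikes` and `usable_subsegment` are hypothetical.

```python
from typing import List, Tuple


def remove_spikes(h: List[float], spike_bpm: float = 25.0, stable_bpm: float = 10.0,
                  stable_len: int = 5) -> Tuple[List[float], List[bool]]:
    """Interpolate over "spiky" FHR artifacts; return (cleaned signal, interpolation mask)."""
    x = [float(v) for v in h]
    n = len(x)
    interpolated = [False] * n
    i = 1
    while i < n:
        if abs(x[i] - x[i - 1]) <= spike_bpm:
            i += 1
            continue
        start = i - 1  # last sample before the jump, used as the left anchor
        # First subsequent run of stable_len samples whose successive differences are <= stable_bpm.
        stop = next((j for j in range(i, n - stable_len + 1)
                     if all(abs(x[j + k + 1] - x[j + k]) <= stable_bpm
                            for k in range(stable_len - 1))), None)
        if stop is None:
            # Assumption: no stable run remains, so hold the last good value to the end.
            for k in range(i, n):
                x[k], interpolated[k] = x[start], True
            break
        # Linear interpolation between the anchors x[start] and x[stop].
        for k in range(start + 1, stop):
            x[k] = x[start] + (x[stop] - x[start]) * (k - start) / (stop - start)
            interpolated[k] = True
        i = stop + 1
    return x, interpolated


def usable_subsegment(mask: List[bool], lo: int, hi: int, max_frac: float = 0.30) -> bool:
    """A sub-segment [lo, hi) is searched only if at most 30% of it was interpolated."""
    seg = mask[lo:hi]
    return len(seg) > 0 and sum(seg) <= max_frac * len(seg)
```

The returned mask is the “record of interpolated periods”; the 30% test is then a simple count over any candidate sub-segment.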

The IUP signal was generally cleaner. It was smoothed by a simple moving-average filter of length 17 samples. The preprocessed FHR and the smoothed IUP signals were then passed to the feature extraction block.
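A centered moving average of length 17 is one straightforward way to realize this smoothing. The sketch below assumes a centered window that is truncated at the record edges, which the text does not specify; `moving_average` is a hypothetical helper name.

```python
from typing import List


def moving_average(u: List[float], width: int = 17) -> List[float]:
    """Centered moving average; the window is truncated near the edges (assumption)."""
    half = width // 2
    out = []
    for n in range(len(u)):
        lo, hi = max(0, n - half), min(len(u), n + half + 1)
        out.append(sum(u[lo:hi]) / (hi - lo))
    return out
```

At 4 Hz, 17 samples span about 4 s, short compared with a uterine contraction, so the filter mainly removes sample-to-sample noise.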

In the following, we denote the thresholds for the various onset, return, and peak/nadir detections by $\theta$, the corresponding detection times by $n$, and the actual FHR and IUP values by $h$ and $u$, respectively. Superscripts $A$ and $D$ denote accelerations and decelerations, respectively.

B. Uterine Contractions

The IUP signal was first scaled to values between 0 and 100 (percentage scale), and a “baseline” IUP was then estimated as the mode of its distribution. We used a Gaussian kernel method to estimate the probability mass function (pmf), with bin centers at $\{0.5, 1.5, \ldots, 99.5\}$. The kernel width was calculated using the method of [6] as follows, where $u[n]$ is the IUP signal and $N = 1200$ s (20 min), so that $N f_s = 4800$ samples:

$$S = 0.9\,\min\{\sigma,\ 1.4826\,M_u\},$$

where $\mathbf{u} = \{u[1], \ldots, u[N f_s]\}$, $\sigma^2$ is the variance of $\mathbf{u}$, and $M_u$ is the median absolute deviation of $\mathbf{u}$.
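The mode estimate can be sketched as follows: evaluate an unnormalized Gaussian kernel density at the 100 bin centers and take the center with the largest value, which is the baseline $b_u$ used below. This is a minimal sketch that assumes the IUP is already on the 0 to 100 scale and reads $M_u$ as the median absolute deviation (the quantity the 1.4826 factor normally accompanies). The 0.5 floor on the kernel width, which guards against nearly flat records whose median absolute deviation is zero, is our own assumption. Function names are hypothetical.

```python
import math
import statistics
from typing import List


def kernel_width(u: List[float]) -> float:
    """S = 0.9 * min(sigma, 1.4826 * MAD), with MAD the median absolute deviation."""
    med = statistics.median(u)
    mad = statistics.median([abs(x - med) for x in u])
    return 0.9 * min(statistics.pstdev(u), 1.4826 * mad)


def iup_baseline_mode(u: List[float], min_width: float = 0.5) -> float:
    """Bin center (0.5, 1.5, ..., 99.5) at which the kernel-smoothed pmf is largest."""
    s = max(kernel_width(u), min_width)  # the floor is an assumption that avoids S = 0
    centers = [k + 0.5 for k in range(100)]

    def density(c: float) -> float:
        # Unnormalized Gaussian kernel sum; normalization does not change the argmax.
        return sum(math.exp(-0.5 * ((c - x) / s) ** 2) for x in u)

    return max(centers, key=density)
```

Because contractions occupy a minority of a 20-minute record, the mode stays at the resting pressure even when the mean is pulled upward by contractions.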

The value of $u$ at which the pmf was maximized was taken as the baseline $b_u$. A uterine contraction onset was detected whenever $u[n]$ exceeded $b_u$ by at least $\theta^u_s = 3\%$, and that time instant was denoted $n^u_s$. For each such onset candidate, we detected the return time $n^u_r$. If the duration of the contraction, $L_u = n^u_r - n^u_s$, exceeded the threshold $\theta^u_L = 185 f_s$, we recalculated the mode by the above procedure and repeated the onset and return detections until we obtained a valid contraction. Finally, once a valid contraction was detected, we computed its peak time $n^u_p$. For diagnosis, the feature of interest was the contraction frequency $F_u$, defined as the number of detected contractions in a 20-minute period. An example of contraction detection, including the recursive mode computation, is shown in Fig. 1.

Fig. 1. An example of uterine contraction detection (IUP in % versus time in s), with onsets and ends annotated by filled circles. The mode was calculated recursively for the second contraction, as described in the text.
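One way to implement the onset/return detection with recursive mode re-estimation is sketched below. It is a minimal sketch resting on several assumptions that the text does not state: the return time is the first sample at which $u[n]$ falls back to $b_u + \theta^u_s$ or below, the mode is recomputed only over the over-long episode, the recursion depth is capped, and the peak is the first sample of maximum pressure. For self-containment it repeats a compact version of the kernel-mode estimate; all names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def _kernel_mode(u: List[float]) -> float:
    """Compact Gaussian-kernel mode over bin centers 0.5..99.5 (width floored at 0.5)."""
    med = statistics.median(u)
    mad = statistics.median([abs(x - med) for x in u])
    s = max(0.9 * min(statistics.pstdev(u), 1.4826 * mad), 0.5)
    return max((k + 0.5 for k in range(100)),
               key=lambda c: sum(math.exp(-0.5 * ((c - x) / s) ** 2) for x in u))


def detect_contractions(u: List[float], fs: float = 4.0, theta_s: float = 3.0,
                        max_len_s: float = 185.0,
                        max_depth: int = 5) -> List[Tuple[int, int, int]]:
    """Return (onset, peak, return) sample indices of contractions in a 0-100 IUP record."""
    events: List[Tuple[int, int, int]] = []

    def scan(lo: int, hi: int, depth: int) -> None:
        b_u = _kernel_mode(u[lo:hi])          # baseline IUP for this stretch
        n = lo
        while n < hi:
            if u[n] <= b_u + theta_s:
                n += 1
                continue
            n_s = n                           # onset: first sample above b_u + theta_s
            while n < hi and u[n] > b_u + theta_s:
                n += 1
            n_r = n                           # return: first sample back at or below threshold
            if n_r - n_s > max_len_s * fs and depth < max_depth:
                scan(n_s, n_r, depth + 1)     # too long: re-estimate the mode inside the episode
            else:
                n_p = max(range(n_s, n_r), key=lambda k: u[k])
                events.append((n_s, n_p, n_r))

    scan(0, len(u), 0)
    return sorted(events)
```

The contraction frequency $F_u$ is then simply `len(detect_contractions(u))` for a 20-minute record.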

C. Baseline rate

Clinically, the baseline is defined as the average heart rate over FHR periods free from episodic deviations such as accelerations, decelerations, and periods of marked variability. However, the episodic deviations are themselves defined with reference to a previously determined baseline FHR, which makes the definition circular. Despite this, clinicians estimate the baseline visually without much trouble, since they do not need its precise value. For a programmatic description, however, we need more concrete rules.

In our approach, the baseline FHR is estimated with a windowed median filter. We used simulations to measure the performance of median filtering for various window lengths and found a five-minute window appropriate for accurate baseline estimation. The key to good estimates is a window short enough to follow important slow changes in the FHR trend (during periods free of episodes) yet long enough to reject shorter, episode-related deviations. The baseline signal, denoted $b_h[n]$, is defined over the same time interval as $h[n]$. The feature of interest from a diagnostic perspective is the median baseline FHR, denoted $B_h$. An example of baseline estimation can be seen in Fig. 2.
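A windowed median baseline can be sketched as below, assuming a centered five-minute window (1,200 samples at 4 Hz) that is truncated at the record edges; the text does not say how edges were handled. `median_baseline` is a hypothetical name, and $B_h$ is returned as the median of $b_h[n]$.

```python
import statistics
from typing import List, Tuple


def median_baseline(h: List[float], fs: float = 4.0,
                    window_s: float = 300.0) -> Tuple[List[float], float]:
    """Return (b_h[n], B_h): a running median of the FHR and its overall median."""
    half = int(window_s * fs) // 2
    b = [statistics.median(h[max(0, n - half): n + half + 1]) for n in range(len(h))]
    return b, statistics.median(b)
```

Because the median ignores up to half of the samples in each window, an episode occupying less than half of a window cannot pull $b_h[n]$ to its own level, which is the trade-off between tracking slow trends and rejecting episodes discussed above.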

D. Accelerations

Clinically, accelerations are defined as “visually apparent abrupt increases from baseline”. Once the baseline FHR was estimated, we detected the onset times of accelerations as the first sample indices $n^A_s$ at which the FHR $h[n]$ deviated upward from $b_h[n]$ by at least $\theta^A_s = 1$ bpm. For each onset candidate, we detected the return time $n^A_r$ and defined the duration as $L_A = n^A_r - n^A_s$. If $L_A > \theta^A_L$ $(= 15 f_s)$, we found the peak time $n^A_p$ of the candidate and the corresponding deviation $h^A_p$ from the baseline at that location. Because the FHR is not a smooth signal, detecting an obvious peak is typically difficult. Hence, we detected the first “significant” peak, defined as the first local maximum of the FHR deviations during the acceleration that falls within their top 20% (i.e., above their 80th percentile). If there was no such local maximum, we simply took the maximum over the acceleration. Finally, a candidate was accepted as a valid acceleration when it satisfied the following three conditions:

$$n^A_p - n^A_s < \theta^A_p = 30 f_s,$$

$$h^A_p > \theta^A_h = 15 \text{ bpm},$$

$$L_A \in [15 f_s,\ 600 f_s].$$

Fig. 2. An example of deceleration detection and estimated baseline (FHR in bpm versus time in s; legend: FHR, Baseline). As desired, the large artifact toward the beginning of the tracing was not treated as a deceleration.
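The acceleration rules can be sketched as follows. This is a minimal illustration under stated assumptions: the return time is the first sample at which the deviation drops below $\theta^A_s$, the “top 20%” is computed with `statistics.quantiles` (its default exclusive method), and plateau samples count as local maxima. The helpers `runs_above`, `first_significant_peak`, and `detect_accelerations` are hypothetical names.

```python
import statistics
from typing import List, Tuple


def runs_above(dev: List[float], theta: float) -> List[Tuple[int, int]]:
    """(onset, return) index pairs of maximal runs where dev[n] >= theta."""
    runs, n = [], 0
    while n < len(dev):
        if dev[n] >= theta:
            start = n
            while n < len(dev) and dev[n] >= theta:
                n += 1
            runs.append((start, n))
        else:
            n += 1
    return runs


def first_significant_peak(d: List[float]) -> int:
    """First local maximum within the top 20% of d; falls back to the global maximum."""
    q80 = statistics.quantiles(d, n=5)[-1] if len(d) > 1 else d[0]
    for k, v in enumerate(d):
        left = d[k - 1] if k > 0 else float("-inf")
        right = d[k + 1] if k + 1 < len(d) else float("-inf")
        if v >= q80 and v >= left and v >= right:
            return k
    return max(range(len(d)), key=d.__getitem__)


def detect_accelerations(h: List[float], b: List[float], fs: float = 4.0):
    """Return (n_s, n_p, n_r, h_p) for candidates meeting the three conditions."""
    dev = [x - y for x, y in zip(h, b)]
    found = []
    for n_s, n_r in runs_above(dev, 1.0):                # theta_s^A = 1 bpm
        L = n_r - n_s
        if L <= 15 * fs:                                 # theta_L^A = 15 f_s
            continue
        n_p = n_s + first_significant_peak(dev[n_s:n_r])
        h_p = dev[n_p]
        if n_p - n_s < 30 * fs and h_p > 15.0 and 15 * fs <= L <= 600 * fs:
            found.append((n_s, n_p, n_r, h_p))
    return found
```

A slow 40-s ramp to +40 bpm is rejected here because its first significant peak is reached more than 30 s after onset, while a 40-s step of +20 bpm is accepted.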

E. Decelerations

Clinically, decelerations are defined as visually apparent abrupt or gradual decreases from baseline. Once the baseline FHR was estimated, we detected the onset times of decelerations as the first sample indices $n^D_s$ at which the FHR $h[n]$ deviated downward from $b_h[n]$ by at least $\theta^D_s = 1$ bpm. For each onset candidate, we detected the return time $n^D_r$ and defined the duration as $L_D = n^D_r - n^D_s$. If $L_D > \theta^D_L$ $(= 15 f_s)$, we found the nadir time $n^D_p$ of the candidate and the corresponding deviation $h^D_p$ from the baseline at that location. To detect only the first significant nadir, we used a procedure similar to the one for accelerations. In addition, we observed that deceleration detection was particularly prone to false positives because of the higher degree of noise caused by electrode movement or drop-off. In such instances, the signal would suddenly dip below the threshold and could take some time to return to baseline, artificially increasing the “abruptness” of the episode. Hence, we used another threshold, $\theta^D_p = 3 f_s$, to distinguish true decelerations from such false episodes: a candidate deceleration had to take at least 3 s from onset to nadir to count as valid. An example detection is shown in Fig. 2.

Fig. 3. An example of variability estimation (FHR minus baseline, in bpm, versus time in s). The middle dashed line shows the zero value, and the two dot-dashed lines are the crossing thresholds ($\pm\theta_\delta$).
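The deceleration rules mirror those for accelerations, with the added onset-to-nadir check. The sketch below is a minimal illustration, assuming the same run and “first significant extremum” definitions as for accelerations (applied to the depth below baseline) and applying the 30% interpolation rule to each candidate, which is our reading rather than something the text states. It reports $h^D_p$ as the signed deviation $h - b_h$ at the nadir; all names are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple


def runs_at_least(d: List[float], theta: float) -> List[Tuple[int, int]]:
    """(onset, return) index pairs of maximal runs where d[n] >= theta."""
    runs, n = [], 0
    while n < len(d):
        if d[n] >= theta:
            start = n
            while n < len(d) and d[n] >= theta:
                n += 1
            runs.append((start, n))
        else:
            n += 1
    return runs


def first_significant_extremum(d: List[float]) -> int:
    """First local maximum of d within its top 20%; falls back to the global maximum."""
    q80 = statistics.quantiles(d, n=5)[-1] if len(d) > 1 else d[0]
    for k, v in enumerate(d):
        left = d[k - 1] if k > 0 else float("-inf")
        right = d[k + 1] if k + 1 < len(d) else float("-inf")
        if v >= q80 and v >= left and v >= right:
            return k
    return max(range(len(d)), key=d.__getitem__)


def detect_decelerations(h: List[float], b: List[float], fs: float = 4.0,
                         interpolated: Optional[List[bool]] = None,
                         max_interp: float = 0.30):
    """Return (n_s, n_p, n_r, h_p) for valid decelerations; h_p = h - b at the nadir."""
    depth = [y - x for x, y in zip(h, b)]            # positive when the FHR is below baseline
    found = []
    for n_s, n_r in runs_at_least(depth, 1.0):       # theta_s^D = 1 bpm
        L = n_r - n_s
        if L <= 15 * fs:                             # theta_L^D = 15 f_s
            continue
        if interpolated is not None and sum(interpolated[n_s:n_r]) > max_interp * L:
            continue                                 # assumption: 30% rule applied per candidate
        n_p = n_s + first_significant_extremum(depth[n_s:n_r])
        if n_p - n_s >= 3 * fs:                      # theta_p^D = 3 f_s rejects artifact drops
            found.append((n_s, n_p, n_r, -depth[n_p]))
    return found
```

A rectangular drop whose nadir is reached at the very first sample is discarded as an electrode artifact, while a dip that takes several seconds to reach its nadir is kept.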

F. Baseline Variability

The variability of the FHR signal is considered one of the most important features for detecting fetal distress. Whereas there is a rich literature on adult heart rate variability, with more or less agreed-upon standards of measurement [7], there is no such agreement for fetal heart rate studies. In fact, the 2008 NICHD guidelines make no distinction between beat-to-beat and long-term variability, “because in actual practice they are visually determined as a unit.”

To stay close to physicians’ interpretation of variability, and to keep feature extraction as non-parametric as possible, we did not use traditional methods of estimating variability, such as power spectral densities or entropy measures. Instead, we used a simple threshold-crossing method, defined as follows. We first found sub-segments of the FHR series $\mathbf{h} = \{h[1], \ldots, h[N f_s]\}$ that are free of accelerations, decelerations, and noise (as defined previously). Each such sub-segment was de-baselined (using $b_h[n]$) and divided into non-overlapping one-minute segments. From the resulting signal $h_v[n]$, we counted the number of times the signal went above the threshold $\theta_\delta$ (respectively, below $-\theta_\delta$) and denoted the result $k_v$. This was taken as an estimate of the number of FHR cycles around the baseline. If the per-minute cycle frequency ($= k_v$) exceeded the clinical threshold for a valid variability signal (2 cycles/min), we calculated the feature of interest as follows. For each detected cycle, we obtained the crest-to-trough range and took the median of these values as the variability $V^{(m)}_h$ of the $m$-th one-minute sub-segment. Finally, to obtain a variability value for the full 20-minute signal, we took the median of all the $V^{(m)}_h$ over that period and denoted it $V_h$. An example of variability estimation for a one-minute sub-segment is shown in Fig. 3.
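The threshold-crossing estimator can be sketched as below. This is a minimal sketch under several assumptions: a cycle is one crest excursion (above $+\theta_\delta$) paired with the following trough excursion (below $-\theta_\delta$), samples inside the dead band keep the current state, trailing partial minutes are dropped, the 2 cycles/min criterion is treated as inclusive, and $V_h$ is reported as absent (`None`) when fewer than half of the minutes contain valid cycles. The paper does not give a numerical $\theta_\delta$; the default of 1.0 bpm below is purely hypothetical, as are the function names.

```python
import statistics
from typing import List, Optional, Tuple


def cycle_ranges(x: List[float], theta: float) -> List[float]:
    """Crest-to-trough ranges of cycles defined by crossings of +theta and -theta."""
    extremes: List[float] = []   # alternating crest/trough values
    state, ext = 0, 0.0          # state: +1 above +theta, -1 below -theta, 0 not yet crossed
    for v in x:
        if v > theta:
            if state == 1:
                ext = max(ext, v)
            else:
                if state == -1:
                    extremes.append(ext)   # close the previous trough
                state, ext = 1, v
        elif v < -theta:
            if state == -1:
                ext = min(ext, v)
            else:
                if state == 1:
                    extremes.append(ext)   # close the previous crest
                state, ext = -1, v
    if state != 0:
        extremes.append(ext)               # close the last excursion at the segment end
    # Pair consecutive crest/trough extremes into cycles.
    return [abs(extremes[i] - extremes[i + 1]) for i in range(0, len(extremes) - 1, 2)]


def baseline_variability(h: List[float], b: List[float], segments: List[Tuple[int, int]],
                         fs: float = 4.0, theta: float = 1.0,
                         min_cycles: int = 2) -> Optional[float]:
    """Median of per-minute median cycle ranges over episode-free segments (None = Absent)."""
    win = int(60 * fs)
    per_minute: List[Optional[float]] = []
    for lo, hi in segments:
        x = [h[n] - b[n] for n in range(lo, hi)]
        for s in range(0, len(x) - win + 1, win):          # whole minutes only
            r = cycle_ranges(x[s:s + win], theta)
            per_minute.append(statistics.median(r) if len(r) >= min_cycles else None)
    valid = [v for v in per_minute if v is not None]
    if not per_minute or 2 * len(valid) < len(per_minute):
        return None                                        # most minutes have no valid cycles
    return statistics.median(valid)
```

The dead band between $-\theta_\delta$ and $+\theta_\delta$ is what lets small fluctuations around a nearly flat trace be ignored, rather than counted as zero-crossings.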

III. FEATURE CATEGORIZATION

Clearly, numerical values of the above features alone are not very helpful for clinical diagnosis. To be of value to physicians, we needed to define appropriate categories that map continuous-valued features into “types” or “quality” measures. These categories were also defined following the guidelines in [3].

First, we labeled the uterine contraction rate as Tachysystole whenever the contraction frequency, calculated over the 20-minute segment, exceeded 0.5 contractions per minute (i.e., $F_u > 10$ contractions per 20 minutes); otherwise, we labeled it Normal.

Next, we looked at the baseline FHR value. If $B_h$ was less than 110 bpm, we labeled it Bradycardia, and if it was greater than 160 bpm, Tachycardia. A baseline FHR in the intermediate range was considered Normal.

Baseline variability $V_h$ was considered Marked if it exceeded 25 bpm. Values of $V_h$ between 5 and 25 bpm were called Moderate, and those between 2 and 5 bpm, Minimal. Absent variability corresponded to the situation where the variability detection algorithm could not find a single valid “cycle” in most of the one-minute epochs in the data set (thus making the median $V_h$ null-valued). In clinical practice, when physicians see a so-called “flat-line” trace in the FHR record, variability is considered absent. However, the “flat-line” criterion does not seem to be applied strictly, and we found instances where physicians classified a segment as having absent variability even when small fluctuations could be perceived. This was one of the reasons we used the thresholding approach in the variability algorithm instead of simply looking for zero-crossings.
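The three categorization rules above reduce to simple comparisons, sketched below. Boundary handling is our assumption where the text says “between” (5 bpm and 25 bpm are treated as Moderate, 2 bpm as Minimal), and `"Unclassified"` is a hypothetical label for values below 2 bpm, which the text does not cover. $F_u$ is the count of contractions in the 20-minute segment.

```python
from typing import Optional


def contraction_category(F_u: int, minutes: float = 20.0) -> str:
    """Tachysystole if more than 0.5 contractions per minute over the segment."""
    return "Tachysystole" if F_u / minutes > 0.5 else "Normal"


def baseline_category(B_h: float) -> str:
    if B_h < 110:
        return "Bradycardia"
    if B_h > 160:
        return "Tachycardia"
    return "Normal"


def variability_category(V_h: Optional[float]) -> str:
    if V_h is None:
        return "Absent"          # no valid cycles in most one-minute epochs
    if V_h > 25:
        return "Marked"
    if V_h >= 5:
        return "Moderate"
    if V_h >= 2:
        return "Minimal"
    return "Unclassified"        # hypothetical label: below 2 bpm is not covered by the text
```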

Accelerations can be classified as either Normal or Prolonged, depending on whether the total duration of the episode (from onset to return) is less than two minutes. The case of decelerations is slightly more complicated. In clinical practice, four different deceleration-related features are assessed: (a) the time until the deceleration nadir, (b) the timing of each deceleration with respect to the associated uterine contractions, (c) the number of decelerations associated with uterine contractions, and (d) the total duration of each deceleration. Thus, for the expert system implementation, one first needs a set of rules to decide which uterine contractions are associated with a deceleration. In our implementation, this is done simply by finding any contraction that overlaps the deceleration by at least 25%. In this way, some contractions may be associated with more than one deceleration, or vice versa.
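The association rule can be sketched with interval arithmetic. This assumes the 25% is measured relative to the deceleration’s own onset-to-return duration, which the text leaves open; `overlap` and `associate` are hypothetical names.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (onset, return) sample indices


def overlap(a: Interval, b: Interval) -> int:
    """Number of samples shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def associate(decels: List[Interval], contractions: List[Interval],
              min_frac: float = 0.25) -> Dict[int, List[int]]:
    """Map each deceleration index to the contractions covering >= 25% of its duration."""
    return {
        i: [j for j, c in enumerate(contractions)
            if overlap(d, c) > 0 and overlap(d, c) >= min_frac * (d[1] - d[0])]
        for i, d in enumerate(decels)
    }
```

Since each deceleration is checked against every contraction independently, one contraction can appear in several lists, matching the many-to-many association described above.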

The first classification (covering cases (a) and (b)) is shown as pseudocode in Fig. 4. Every detected deceleration is first classified by this algorithm into one of three types (Early, Late, or Variable). We note that in some cases none of the if conditions is satisfied by a given deceleration (for instance, if the deceleration is gradual but has no associated contraction). Since clinical guidelines do not explicitly state how to deal with such cases, we classify such decelerations as type Unknown.

procedure CLASSIFYDECEL(D, U)
    dType ← “Unknown”                          ▷ Default when no rule applies
    n_dip ← n^D_p − n^D_s                       ▷ Time to nadir
    n_dur ← L_D                                 ▷ Time from deceleration onset to return
    if n_dip > θ^dip_n then                     ▷ If the dip is gradual...
        if U is not empty then
            n_coinc ← |n^U_p − n^D_p|           ▷ Time difference between peaks
            if n_coinc ≤ θ^coinc_n then
                dType ← “Early”
            else
                dType ← “Late”
            end if
        end if
    else                                        ▷ If the dip is abrupt...
        if h^D_p < θ AND n_dur ∈ [θ^D_dur1, θ^D_dur2] then   ▷ If it is a big dip with a normal duration
            dType ← “Variable”
        end if
    end if
    return dType
end procedure

Fig. 4. Algorithm to classify decelerations depending on the abruptness of the FHR decrease and the timing of the deceleration nadir with respect to the associated contraction’s peak. D is a structure containing deceleration information, while U contains onset, peak, and return information for the contractions associated with this deceleration.
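A runnable rendering of the classification in Fig. 4 is sketched below. The paper does not give numerical values for $\theta^{dip}_n$, $\theta^{coinc}_n$, $\theta$, or the duration range, so they are required keyword arguments here; when several contractions are associated with one deceleration, we assume the contraction peak closest to the nadir is used. The `Decel` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Decel:
    n_s: int      # onset sample
    n_p: int      # nadir sample
    n_r: int      # return sample
    h_p: float    # signed deviation from baseline at the nadir (negative for a dip)


def classify_decel(d: Decel, contraction_peaks: List[int], *, theta_dip: int,
                   theta_coinc: int, theta_h: float, dur_range: Tuple[int, int]) -> str:
    """Early / Late / Variable / Unknown, following the branch structure of Fig. 4."""
    n_dip = d.n_p - d.n_s                        # time to nadir
    n_dur = d.n_r - d.n_s                        # onset-to-return duration (L_D)
    if n_dip > theta_dip:                        # gradual dip
        if not contraction_peaks:
            return "Unknown"                     # no associated contraction
        # Assumption: with several contractions, use the peak closest to the nadir.
        n_coinc = min(abs(p - d.n_p) for p in contraction_peaks)
        return "Early" if n_coinc <= theta_coinc else "Late"
    if d.h_p < theta_h and dur_range[0] <= n_dur <= dur_range[1]:
        return "Variable"                        # abrupt, deep, and of normal duration
    return "Unknown"
```

For example, with illustrative thresholds at $f_s = 4$ Hz such as `theta_dip=120` (30 s), `theta_coinc=40`, `theta_h=-15.0`, and `dur_range=(60, 480)`, a dip reaching its nadir 50 s after onset and 2.5 s from a contraction peak is labeled Early; these numbers are ours, not the paper’s.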

For deceleration-related information of type (c), our goal is to determine whether each type of detected deceleration is (in clinical parlance) Recurrent. We illustrate this with an example. Suppose that for some FHR trace each detected deceleration is of one of three types: Early, Variable, or Late. The program then counts how many Variable decelerations were associated with contractions and divides this number by the total number of contractions detected in the trace. If this fraction $R^D_v$ exceeds a threshold $\theta^D_R$ $(= 0.5)$, the program outputs a decision that Recurrent Variable decelerations were detected. Similarly, the program decides whether Recurrent Late or Recurrent Early decelerations were detected using the corresponding fractions $R^D_l$ and $R^D_e$, respectively.

Finally, the program needs to decide whether any Prolonged decelerations were detected. It does so by checking whether any deceleration had a total onset-to-return duration $L_D$ greater than the threshold $\theta^D_{prol}$ $(= 120 f_s)$.
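The recurrence fractions and the prolonged check can be sketched as follows. This minimal sketch takes the per-deceleration types and the deceleration-to-contraction association as plain Python structures (an assumption about representation) and uses a strict “exceeds” comparison as in the text; the function names are hypothetical.

```python
from typing import Dict, List


def recurrent_types(types: List[str], assoc: Dict[int, List[int]],
                    n_contractions: int, theta_R: float = 0.5) -> Dict[str, bool]:
    """Recurrent flag per type: (# decels of that type with a contraction) / (# contractions) > theta_R."""
    flags = {}
    for t in ("Early", "Late", "Variable"):
        k = sum(1 for i, ty in enumerate(types) if ty == t and assoc.get(i))
        flags[t] = n_contractions > 0 and k / n_contractions > theta_R
    return flags


def has_prolonged(durations: List[int], fs: float = 4.0) -> bool:
    """True if any deceleration's onset-to-return duration L_D exceeds 120 f_s (2 min)."""
    return any(L > 120 * fs for L in durations)
```

With three contraction-associated Late decelerations in a trace containing four contractions, $R^D_l = 0.75 > 0.5$, so Recurrent Late decelerations are reported.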

IV. DIAGNOSTIC DECISION FLOW

Based on the feature categories obtained from the procedure described in the previous section, we can use the NICHD diagnostic criteria to classify a 20-minute trace into one of three categories: Category 1 corresponds to Normal traces, Category 2 to Indeterminate, and Category 3 to Abnormal. From a clinical perspective, detection of abnormalities is of paramount importance, followed by Category 2 (where there may be some evidence of compromise, but not convincing enough) and then Category 1. Our program checks the categories in the same order.


A. Category 3 conditions

A trace is diagnosed as Category 3 when the following conditions are satisfied:

1) Absent baseline variability AND

• Any Recurrent Variable OR Recurrent Late decelerations OR

• Baseline Rate is Bradycardia.

B. Category 2 conditions

If the above symptom combinations are not present, we check the conditions for Category 2. When any one or more of the following conditions are satisfied, we categorize the tracing as Category 2.

1) Baseline rate is Bradycardia AND variability is not Absent

2) Baseline rate is Tachycardia

3) Baseline variability is Minimal

4) Baseline variability is Absent AND any Recurrent decelerations present

5) Baseline variability is Marked

6) Presence of Recurrent Variable decelerations AND variability is Minimal OR Moderate

7) Presence of Recurrent Late decelerations AND variability is Moderate

8) Presence of Prolonged decelerations.

C. Category 1 conditions

The last check is for Category 1 conditions. When all of the following conditions are satisfied, we categorize the tracing as Category 1:

1) Baseline rate is Normal

2) Baseline variability is Moderate

3) No Recurrent Variable or Recurrent Late decelerations detected.
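The three rule sets can be applied in the stated order (Category 3, then 2, then 1) as sketched below. This assumes the Section III labels are passed as strings and the recurrence and prolonged checks as booleans. Returning `None` when no rule fires is our own choice: as written, the rules do not cover, for example, Absent variability with a Normal rate and no recurrent decelerations. Sinusoidal patterns are not handled, consistent with the system described here.

```python
from typing import Optional


def diagnose(rate: str, variability: str, rec_variable: bool, rec_late: bool,
             rec_early: bool, prolonged: bool) -> Optional[int]:
    """NICHD-style category (3, 2, or 1) from categorized features, checked in that order."""
    absent = variability == "Absent"
    # Category 3 is checked first.
    if absent and (rec_variable or rec_late or rate == "Bradycardia"):
        return 3
    category2 = (
        (rate == "Bradycardia" and not absent)
        or rate == "Tachycardia"
        or variability == "Minimal"
        or (absent and (rec_variable or rec_late or rec_early))
        or variability == "Marked"
        or (rec_variable and variability in ("Minimal", "Moderate"))
        or (rec_late and variability == "Moderate")
        or prolonged
    )
    if category2:
        return 2
    if rate == "Normal" and variability == "Moderate" and not (rec_variable or rec_late):
        return 1
    return None   # assumption: combinations not covered by the listed rules
```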

V. RESULTS AND DISCUSSION

We tested our system on a database of thirty 20-minute FHR-IUP recordings collected from 9 subjects at the Stony Brook University Hospital. All consent and approval guidelines were followed rigorously. Each record was independently labeled by two physicians, and for all tracings except one, there was no disagreement in categorization between them. The only disputed record was diagnosed as Category 2 by one physician and as Category 3 by the other; it was agreed to take Category 2 as the gold-standard label for this record. Table I shows the confusion matrix of the program’s classification: 81% (13 of 16) of Category 1 recordings were classified as Category 1, while 80% (4 of 5) of Category 3 tracings were classified as Category 3.
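The percentages quoted above follow directly from the counts in Table I. The short sketch below recomputes them, treating each physician category (column) as the reference class; the same counts give all 9 Category 2 recordings correct and an overall agreement of 26 of 30 (about 87%). The helper names are hypothetical.

```python
from typing import List

# Table I: rows are the expert-system (ES) category, columns the physician category.
TABLE_I = [
    [13, 0, 0],
    [3, 9, 1],
    [0, 0, 4],
]


def per_class_recall(cm: List[List[int]]) -> List[float]:
    """Fraction of each physician category (column) that the ES labeled identically."""
    k = len(cm)
    return [cm[j][j] / sum(cm[i][j] for i in range(k)) for j in range(k)]


def overall_agreement(cm: List[List[int]]) -> float:
    """Diagonal total divided by the number of recordings."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


print(per_class_recall(TABLE_I))   # [0.8125, 1.0, 0.8]
print(overall_agreement(TABLE_I))
```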

TABLE I
CONFUSION MATRIX OF THE EXPERT SYSTEM CLASSIFICATION. ‘ES’ STANDS FOR “EXPERT SYSTEM CLASSIFICATION”, WHILE ‘PHYSICIAN’ DENOTES THE TRUE PHYSICIAN LABELING.

                 Physician
                 1     2     3
ES      1       13     0     0
        2        3     9     1
        3        0     0     4

We point out that the FHR and IUP signals alone cannot provide the full picture of fetal health. Nevertheless, one should still aim to reduce inter- and intra-observer variability in interpreting the information at hand. We note that in previous attempts, e.g., the Oxford-Sonicaid system [4], there was no provision for detecting Category 2 readings, whereas in practice Category 2 readings comprise a substantial percentage of cases.

Finally, our current system does not yet include the detection of sinusoidal rhythms, which are an important symptom for diagnosing Category 3 readings. This work is in progress.

VI. CONCLUSIONS

Effective translation of clinical rules for biomedical signal interpretation is a strong focus of research in signal processing. For electronic fetal heart rate monitoring, the problem has been compounded by the existence of multiple interpretation guidelines, even within the medical community. We developed an expert system based on the standardized NICHD rules and on FHR-IUP signals. The rules were applied systematically, so that the diagnostic criteria were used consistently. We tested the performance of the system on a small database and obtained encouraging results. We expect to improve the performance of the system by working with much larger databases; research on this is ongoing.

REFERENCES

[1] F. G. Cunningham, K. J. Leveno, S. L. Bloom, J. C. Hauth, D. J. Rouse, and C. Y. Spong, Williams Obstetrics, D. M. Twickler and G. D. Wendel, Eds. McGraw-Hill Medical, 2010.

[2] J. A. Martin, B. E. Hamilton, P. D. Sutton, S. J. Ventura, F. Menacker, and M. L. Munson, “Births: Final data for 2002,” National Vital Statistics Reports, vol. 52, p. 1, 2003.

[3] G. A. Macones, G. D. V. Hankins, C. Y. Spong, J. Hauth, and T. Moore, “The 2008 National Institute of Child Health and Human Development workshop report on electronic fetal monitoring: Update on definitions, interpretation, and research guidelines,” Journal of Obstetric, Gynecologic, & Neonatal Nursing, vol. 37, no. 5, pp. 510–515, 2008.

[4] J. Pardey, M. Moulden, and C. W. Redman, “A computer system for the numerical analysis of nonstress tests,” American Journal of Obstetrics and Gynecology, vol. 186, no. 5, pp. 1095–1103, 2002.

[5] D. Ayres-de Campos, J. Bernardes, A. Garrido, J. Marques-de-Sa, and L. Pereira-Leite, “SisPorto 2.0: A program for automated analysis of cardiotocograms,” The Journal of Maternal-Fetal Medicine, vol. 9, no. 5, pp. 311–318, 2000.

[6] D. R. Bickel, “Robust and efficient estimation of the mode of continuous data: The mode as a viable measure of central tendency,” Journal of Statistical Computation and Simulation, vol. 73, no. 12, pp. 899–912, Dec. 2003.

[7] M. Malik, “Heart rate variability,” Annals of Noninvasive Electrocardiology, vol. 1, no. 2, pp. 151–181, 1996.
