BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM STUDENT FORM …

THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT FORM AS A

PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH

AN ABSTRACT

SUBMITTED ON THE FOURTEENTH DAY OF JULY 2016

TO THE DEPARTMENT OF PSYCHOLOGY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

OF THE SCHOOL OF SCIENCE AND ENGINEERING

OF TULANE UNIVERSITY

FOR THE DEGREE

OF

DOCTOR OF PHILOSOPHY

Kathryn M. Jones, M.S., M.A.

APPROVED:

Chair

Constance Patterson, Ph.D.

AN ABSTRACT

Early identification and intervention are key to decreasing the short- and long-term

negative outcomes associated with behavioral and emotional difficulties in youth.

Universal screening in schools has been found to be an effective and proactive means of

identifying youth at-risk for or currently experiencing behavioral and emotional

difficulties (Burke et al., 2012). It is imperative that schools have access to measurement

tools that are capable of making accurate predictions regarding youth outcomes that can

inform tailored prevention and intervention efforts. One such tool is the BASC-2

Behavioral and Emotional Screening System – Student Form (BESS SF; Kamphaus &

Reynolds, 2007). Although support for the predictive validity of the BESS SF overall risk

score was found, examinations of classification accuracy for suspensions and absences

call its effectiveness at predicting negative behavioral outcomes into question as the

BESS SF was a better predictor of which students were not at risk for negative behavioral

outcomes than of which students were at risk for such outcomes. Initial support for the

utility of alternate behavioral outcomes (e.g., Major Discipline Citations, Positive

Behaviors) was found but concerns over the reliability of the teacher-collected outcome

data merit further investigation. Two BESS SF domain-specific factors

(Inattention/Hyperactivity and School Problems) were found to predict behavioral

outcomes, indicating room for improvement in the precision of BESS SF predictions. At

this time, caution is urged regarding the use of the BESS SF to identify low income African

American students in need of prevention and intervention efforts until further validation

can be completed.

Key words: universal screening, BESS SF, predictive validity, classification accuracy, domain-specific factors, behavioral outcomes, African American youth

BESS-SF AS A PREDICTOR OF BEHAVIOR

THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT FORM AS A

PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH

A DISSERTATION

SUBMITTED ON THE FOURTEENTH DAY OF JULY 2016

TO THE DEPARTMENT OF PSYCHOLOGY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

OF THE SCHOOL OF SCIENCE AND ENGINEERING

OF TULANE UNIVERSITY

FOR THE DEGREE

OF

DOCTOR OF PHILOSOPHY

Chair

Kathryn M. Jones, M.S., M.A.

APPROVED:

Constance Patterson, Ph.D.

© Copyright by Kathryn M. Jones, 2016 All Rights Reserved

TABLE OF CONTENTS

LIST OF TABLES iv

Chapter

I. THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT FORM AS A PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH 1

Evidence Supporting the Use of the BESS Student Form as a Universal Screener 4

The BASC-2 Behavioral and Emotional Screening System (BESS) 5

Criterion-Related Validity of BESS SF 7

Factor Structure of the BESS SF 17

Current Study 24

II. METHODS 29

Participants 29

Procedure 29

Measures 30

BESS Student Form 30

Behavioral Outcome Variables 32

Suspensions 32

Absences 32

Specific Behavioral Outcomes 33

Major Discipline Citations 34

Minor Discipline Citations 34

Positive Behaviors 34

III. RESULTS 35

Data Screening 35

Descriptive Analyses 36

Aim One: Predictive Validity of the BESS SF Overall Risk Score 37

Aim Two: Classification Accuracy of the BESS SF 38

Absences 39

Suspensions 39

Aim Three: Predictive Utility of the Four-Factor Bifactor Model of the BESS SF 40

Absences 42

Suspensions 42

Minor Discipline Citations 42

Major Discipline Citations 43

Positive Behaviors 43

IV. DISCUSSION 44

Limitations 56

Implications and Future Directions 59

TABLES 71

APPENDIX 80

LIST OF REFERENCES 82

LIST OF TABLES

Table 1. Descriptive Statistics 71

Table 2. Correlations Between Demographic, Predictor, and Outcome Variables 72

Table 3. Prediction of Absences by Overall BESS SF Risk Score 73

Table 4. Prediction of Minor Discipline Citations by Overall BESS SF Risk Score 73

Table 5. Prediction of Number of Suspensions by Overall BESS SF Risk Score 73

Table 6. Prediction of Days Suspended by Overall BESS SF Risk Score 74

Table 7. Prediction of Major Discipline Citations by Overall BESS Risk Score 74

Table 8. Prediction of Positive Behaviors by Overall BESS SF Risk Score 74

Table 9. Classification Accuracy Using BESS SF Overall Score 75

Table 10. BESS SF Bifactor Model Standardized Weight Estimates 76

Table 11. Prediction of Absences by the BESS Domain-Specific Factors 77

Table 12. Prediction of Number of Suspensions by the BESS Domain-Specific Factors 77

Table 13. Prediction of Days Suspended by the BESS Domain-Specific Factors 78

Table 14. Prediction of Minor Discipline Citations by the BESS Domain-Specific Factors 78

Table 15. Prediction of Major Discipline Citations by the BESS Domain-Specific Factors 79

Table 16. Prediction of Positive Behaviors by the BESS Domain-Specific Factors 79

I. THE BEHAVIORAL AND EMOTIONAL SCREENING SYSTEM – STUDENT

FORM AS A PREDICTOR OF BEHAVIORAL OUTCOMES IN YOUTH

By emphasizing the importance of behavioral and emotional skill development,

schools can help prepare students to function effectively within the larger society, while

giving them tools that strengthen their academic performance at the same time (Zins,

Bloodworth, Weissberg, & Walberg, 2007). One important component of promoting

behavioral and emotional health is effectively and efficiently attending to students who

are at-risk for behavioral and emotional problems or are already experiencing difficulties

(Burke et al., 2012; Walker, Nishioka, Zeller, Severson, & Feil, 2000). As early

identification and intervention are key to decreasing the short- and long-term negative

outcomes associated with behavioral and emotional difficulties in youth, it is important

that schools implement methods that best facilitate timely identification of youth in need

of additional assessment and services.

Traditionally, referrals for social, emotional and behavioral services in schools

have relied on teacher reports and the use of disciplinary records such as office discipline

referrals (ODRs) and suspension data (King & Reschly, 2014; Renshaw et al., 2009;

Walker, Cheney, Stage, & Blum, 2005). One major problem with traditional

identification methods is that they utilize a wait-to-fail approach where students are not

identified as in need of intervention until they are already presenting problem behaviors

(Burke et al., 2012; Schanding & Nowell, 2013; Walker, 2010). Additionally, use of

traditional methods may over-identify those exhibiting overt problem behaviors while

missing students struggling with internalizing symptoms (Walker et al., 2005). More

recently, there has been a call for increased use of universal screening with standardized tools

designed to proactively identify students in need of additional supports to promote

positive behavioral and emotional functioning (e.g., Glover & Albers, 2007; Walker et

al., 2000; Walker et al., 2005). Universal screening allows educators to take a proactive

approach by identifying youth who are at-risk for behavioral and emotional difficulties as

well as those who may already be experiencing problems but have not come to the

attention of school staff (Albers & Kettler, 2014; Burke et al., 2012; Walker et al., 2000).

In order to promote school use of universal screening, it is imperative that schools be able

to select measurement tools that have been thoroughly evaluated psychometrically, are

appropriate to use with the demographic population of the school, and fit with the

practical needs of the school (e.g., time constraints, administrative ease, cost; Glover &

Albers, 2007; Young, Sabbah, Young, Reiser, & Richardson, 2010).

One such tool is the BASC-2 Behavioral and Emotional Screening System

(BESS; Kamphaus & Reynolds, 2007). The BESS was designed to assess behavioral and

emotional functioning in youth, using selected items from the Behavior Assessment

System for Children, Second Edition (BASC-2; Reynolds & Kamphaus, 2004) that reflect the

domains of Inattention/Hyperactivity, Internalizing Problems, School Problems, and

Personal Adjustment. The overall risk score produced by the BESS demonstrates

concurrent and longitudinal relationships with important student outcomes including

increased school disciplinary actions (e.g., office discipline referrals [ODRs] and

suspensions), decreased academic performance, and decreased academic engagement

(e.g., teacher ratings of work effort and cooperation; Chin, Dowdy, & Quirk, 2013;

Kamphaus, DiStefano, Dowdy, Eklund, & Dunn, 2010; King & Reschly, 2014; King,

Reschly, & Appleton, 2012; Renshaw et al., 2009).

Although the overall risk score has some utility, it provides limited information

regarding the specific nature of an individual’s behavioral and emotional difficulties that

could guide next steps in assessment and allow for the use of targeted interventions.

Factor analytic studies provide preliminary data to support a hierarchical factor structure

of the BESS (Chen, West, & Sousa, 2006; Naser, Hitti, & Overstreet, 2016; Schanding &

Nowell, 2013; Wiesner & Schanding, 2013), but to date neither the predictive utility of those

factors nor whether they predict outcomes over and above the general factor has been

studied. Therefore, an examination of the unique contributions of lower order factors as

predictors of external criteria is an important next step in the evaluation of the BESS as a

universal screening tool. The current study sought to examine the utility of a hierarchical

factor structure applied to the student report form of the BESS in predicting behavioral

outcomes. Although there is initial support for this type of factor structure (Naser et al.,

2016), there have been no studies examining the predictive ability of the factor scores

over and above the overall risk score produced by the BESS. This study provides an

important initial investigation of the utility of domain-specific factors as predictors of

behavioral outcomes.

The current study utilized student reports because students are able to provide

important information related to how they act and emote in different situational contexts

as opposed to teachers, who only interact with students within the school context

(Achenbach, McConaughy, & Howell, 1987; King & Reschly, 2014). Additionally, the

exclusive use of teacher measures may result in less consistent identification of students

with internalizing problems than externalizing problems (Achenbach et al., 1987). It is

also possible that ethnic and racial disproportionality in special education services may be

reduced through the use of self-report universal screening tools by removing the

influence of implicit teacher bias on referrals (Raines, Dever, Kamphaus, & Roach,

2012). Although research has shown that valuable information can be gained from the use

of the BESS TF (e.g., Chin et al., 2013; King & Reschly, 2014; Wiesner & Schanding,

2013), the BESS SF may provide unique information about student risk status. It is,

therefore, imperative that the BESS SF be fully validated as a tool for obtaining

information regarding youth behavioral and emotional functioning in order to provide a

complete view of overall personal functioning.

The following literature review provides an overview of the BESS, summarizes

studies examining the predictive ability of the BESS SF overall risk score, and reviews

results from factor analytic studies. Based on those findings, a rationale for examining the

predictive ability of domain-specific factor scores is presented, followed by a description

of the proposed study.

Evidence Supporting the Use of the BESS Student Form as a Universal Screener

In order to make informed decisions, school personnel need access to information

that allows them to evaluate how universal screening tools fit their individual needs and

contextual considerations, including the relevance of results to guiding identification and

intervention as well as the practicalities of administration and scoring (Feeney-Kettler,

Kratochwill, Kaiser, Hemmeter, & Kettler, 2010; Glover & Albers, 2007). Although the

current research base regarding the BESS is limited, the research that does exist

demonstrates its promise as a universal screening tool. This section will review the

characteristics of the BESS before summarizing the currently available research

regarding the predictive validity and factor structure of the BESS Student Form (SF).

The BASC-2 Behavioral and Emotional Screening System (BESS)

The BESS (Kamphaus & Reynolds, 2007) is a brief (5 – 10 minutes), broadband

screener of behavioral and emotional risk for youth populations. There are

complementary parent, teacher, and student report versions. Informants use a 4-point

Likert scale ranging from 1 (never) through 4 (almost always) to indicate the frequency

that the student experiences different behavioral and emotional problems using items

taken from the parent, teacher, and student versions of the Behavior Assessment System

for Children – Second Edition (BASC-2; Reynolds & Kamphaus, 2004). In order to create

the BESS SF, unrotated principal components analyses (PCAs) were performed on the

items composing the four composite scales of the BASC-2 Self-Report of Personality

(BASC-2 SRP; Internalizing Problems, Inattention/Hyperactivity, Personal Adjustment,

and School Problems). Items chosen for inclusion were those that best represented each

composite scale according to the PCA while also covering the range of content

represented by each composite. The internal consistency of the items representing each

composite was also evaluated, and additional items were added until every dimension

reached an internal consistency of at least .80. As a result, the BESS SF is a self-report measure consisting

of 30 items (six Inattention/Hyperactivity, six School Problems, eight Personal

Adjustment, and ten Internalizing Problems).

Despite the presence of items representing four separate constructs, the official

BESS manual does not provide instructions on scoring or interpreting any subscales;

rather, for each version of the BESS, an overall risk score in the form of a T-score is

obtained that signifies overall behavioral and emotional risk status (Kamphaus &

Reynolds, 2007). Individuals scoring at or below 60 are classified as normal risk and are

not considered at-risk for behavioral or emotional difficulties. Those scoring between 61

and 70 are considered at elevated risk and those scoring 71 or above are considered at

extremely elevated risk. The severity of an individual’s current risk status serves to guide

next steps in assessment and intervention.
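
As an illustration only, the cut points described above can be written as a small function. The sketch below assumes whole-number T-scores; the function name `bess_risk_level` and the label strings are hypothetical and are not part of the BESS manual or its scoring software.

```python
def bess_risk_level(t_score: int) -> str:
    """Map an overall BESS T-score to the risk band described in the text.

    Assumption: T-scores are whole numbers, so 60 and 61 are adjacent values.
    """
    if t_score <= 60:
        return "normal"              # at or below 60
    if t_score <= 70:
        return "elevated"            # 61 through 70
    return "extremely elevated"      # 71 or above


# Hypothetical scores, one on each side of every cut point.
for score in (60, 61, 70, 71):
    print(score, bess_risk_level(score))
```

A function of this kind only labels the severity band; decisions about follow-up assessment would still rest with school personnel.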

Although the usability and acceptability of the BESS to schools, parents, and

other stakeholders have not been formally assessed, it seems that the BESS meets many

of the standards set by Glover and Albers (2007) regarding the evaluation of universal

screeners. With the limited time and financial resources available in schools to address

mental health concerns, it is very important that universal screeners be able to meet the

needs of a school in a cost- and time-efficient manner (Glover & Albers, 2007). As a brief

broadband measure, the BESS is designed to provide information regarding overall

behavioral and emotional difficulties and the student version can be administered by

teachers in classroom settings in approximately 5 to 10 minutes (Kamphaus & Reynolds,

2007). There are many benefits associated with the use of this tool, including the

availability of forms for multiple informants allowing for comparisons of risk status

across settings, Spanish-language and audio versions, and availability of electronic

scoring programs. As an efficient provider of risk status information that can be used to

guide further assessment and intervention, the BESS is a strong candidate for an

appropriate and practical universal screener within school settings.

Criterion-Related Validity of BESS SF

Another important factor that schools must consider when choosing a universal

screener is the psychometric adequacy of the screener (Glover & Albers, 2007). BESS SF

norms were developed using the BASC-2 SRP normative sample, which was designed to

be representative of the population of the United States (Kamphaus & Reynolds, 2007).

Initial evaluations of the BESS SF conducted during test development found it to exhibit

strong internal consistency and test-retest reliability (Kamphaus & Reynolds, 2007). The

overall BESS SF risk score was strongly correlated in the expected directions with

BASC-2 and the Achenbach System of Empirically Based Assessment – Youth Self

Report (ASEBA YSR; Achenbach & Rescorla, 2001) scales representing internalizing

and externalizing concerns (Kamphaus & Reynolds, 2007). Based on this information,

the BESS SF is both a reliable and valid method of assessing youth behavioral and

emotional risk, but continued validation of the BESS SF by outside researchers is needed

to provide further evidence for its psychometric adequacy.

Criterion-related validity is how accurately an assessment tool predicts

performance on another related tool or measure (Glover & Albers, 2007). There are two

types of criterion-related validity: concurrent validity and predictive validity. Concurrent

validity concerns how well an assessment tool predicts current performance on related

outcome measures (Glover & Albers, 2007; Michel, Schultze-Lutter, & Schimmelmann,

2014). Although universal screeners are important as tools of early identification of

behavioral and emotional difficulties that can inform prevention and intervention efforts

employed by schools, it is also important to identify youth who are already engaging in

risky behavior or experiencing emotional difficulties so that appropriate interventions can

be instituted in a timely manner.

One investigation of the utility of the BESS SF as a tool for identifying those

currently engaging in problematic behaviors was conducted by Dowdy, Furlong, and

Sharkey (2012). They examined the concurrent validity of the BESS SF with a primarily

Latino (64.8%) sample of 3,331 students (51.5% female, 48.5% male) in eighth, tenth,

and twelfth grades in four school districts in California. Engagement in problematic

behaviors was assessed using the California Healthy Kids Survey (CHKS; California

Department of Education, 2010), which measured the frequency of eight specific

behaviors in the past 30 days (i.e., cigarette use, alcohol use, binge drinking, marijuana

use, and skipping school out of fear) or the past year (i.e., fighting at school, being

injured or threatened with a weapon at school, and contemplation of suicide).

The survey also assessed feelings of chronic sadness (i.e., 2 weeks of sadness in the past

year). Scores on the BESS SF and chronic sadness measure served as mental health

indicators and were represented dichotomously (i.e., no sadness reported/sadness reported

and normal/elevated risk) and were used to predict problematic behaviors, which were

also represented dichotomously (i.e., presence or absence of each specific behavior).

Students categorized as at-risk based on the BESS score were more likely than

students not at-risk to engage in all eight problematic behaviors assessed even after

controlling for the presence of chronic sadness (Dowdy et al., 2012). In fact, odds ratios

revealed that at-risk status was a stronger predictor than chronic sadness of seven of the

eight problematic behaviors with contemplation of suicide being the lone exception. At-

risk status was an even stronger predictor when used in combination with chronic

sadness; students who were categorized both as at-risk and chronically sad were more

likely to engage in problematic behaviors than youth with either elevated risk or chronic

sadness alone. This study highlights the utility of the BESS SF in the identification of

youth who are already engaging in risky behavior in order to begin intervention services

or make appropriate changes to services already in place. This study could be improved

by examining the utility of the BESS in making predictions of future student outcomes

rather than looking strictly at concurrent behaviors. The addition of outcomes reported by

someone other than the student could improve this study by decreasing the possible

impact of social desirability on responding.
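
To make the odds-ratio comparison concrete, the following minimal sketch computes an unadjusted odds ratio from a 2 x 2 table of risk status by behavior. The counts are hypothetical and are not taken from Dowdy et al. (2012), and the function name is an assumption. Estimates that control for chronic sadness, as in that study, would require a model-based or stratified analysis rather than this simple ratio.

```python
def odds_ratio(at_risk_yes, at_risk_no, normal_yes, normal_no):
    """Unadjusted odds ratio for a behavior, at-risk group vs. normal-risk group.

    Each argument is a count of students; "yes"/"no" refer to whether the
    behavior was reported. All four counts must be greater than zero.
    """
    odds_at_risk = at_risk_yes / at_risk_no   # odds of the behavior if at-risk
    odds_normal = normal_yes / normal_no      # odds of the behavior if normal risk
    return odds_at_risk / odds_normal


# Hypothetical counts for illustration only.
example = odds_ratio(at_risk_yes=40, at_risk_no=60, normal_yes=50, normal_no=350)
print(f"Odds ratio: {example:.2f}")
```

An odds ratio above 1 indicates that the odds of the behavior are higher in the at-risk group; a value of 1 indicates no difference between groups.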

Predictive validity has been identified as particularly important in relation to

universal screening as it concerns how well an assessment tool predicts future

performance (Glover & Albers, 2007; Michel et al., 2014). As the goal of behavioral and

emotional universal screening is to identify youth at risk for future difficulties, it is

important that schools use measures that are proven in their ability to identify youth who

later go on to experience said difficulties as well as correctly identifying those youth who

are not in need of services. Classification accuracy is an important component of

predictive validity and indicates the degree to which the assignment of test takers to

specific categories is accurate and avoids false positives and false negatives. Four

statistical concepts are routinely used to evaluate classification accuracy: sensitivity,

specificity, positive predictive power/value, and negative predictive power/value (Glaros

& Kline, 1988; Glover & Albers, 2007; Hill, Lochman, Coie, Greenberg, & The Conduct

Problems Prevention Research Group, 2004; Levitt, Saka, Romanelli, & Hoagwood,

2007; Streiner, 2003).

Sensitivity is “the capacity of an assessment instrument to yield a positive result

for a person with the attribute of interest” (Glaros & Kline, 1988, p. 1014). Specificity is

“the capacity of an assessment instrument to yield a negative result for a person without”

the attribute of interest (Glaros & Kline, 1988, p. 1014). By examining sensitivity and

specificity with respect to a specific behavioral outcome, it is possible to determine how

well the BESS identified youth who did and did not engage in problem behaviors.

Although this is useful in the examination of validity of a measure when outcomes are

clearly known, screening to determine risk status for potential outcomes does not involve

such concrete attributes (Glaros & Kline, 1988). Instead, at the time of screening, the outcomes

that members of each group will experience are unknown. Since provision of prevention and

intervention efforts is contingent upon screening outcomes, it is important to determine

the likelihood that youth have been correctly sorted into their respective groups, which

can be assessed by examining positive and negative predictive powers/values (Glaros &

Kline, 1988). Positive predictive power “is the likelihood that a person with a positive

test finding actually has the predicted attribute” (Glaros & Kline, 1988, p. 1016).

Negative predictive power “is the likelihood that a person with a negative test sign does

not” have the predicted attribute of interest (Glaros & Kline, 1988, p. 1016). By

examining these statistics, researchers and schools can examine what proportion of kids

identified as at-risk on the BESS go on to exhibit problem behaviors and what proportion

of kids identified as normal risk do not go on to exhibit problem behaviors.
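
The four statistics defined above can all be computed from the same 2 x 2 table crossing the screening result (at-risk vs. normal risk) with the outcome (problem behavior present vs. absent). The following sketch is illustrative: the counts are hypothetical, and the function and key names are assumptions introduced here.

```python
def classification_accuracy(tp, fn, fp, tn):
    """Return the four classification statistics from 2 x 2 cell counts.

    tp: screened at-risk, showed the problem behavior (true positives)
    fn: screened normal risk, showed the problem behavior (false negatives)
    fp: screened at-risk, did not show the behavior (false positives)
    tn: screened normal risk, did not show the behavior (true negatives)
    Every row and column total must be greater than zero.
    """
    return {
        # Of youth WITH the behavior, the share screened at-risk.
        "sensitivity": tp / (tp + fn),
        # Of youth WITHOUT the behavior, the share screened normal risk.
        "specificity": tn / (tn + fp),
        # Of youth screened at-risk, the share who showed the behavior.
        "positive_predictive_power": tp / (tp + fp),
        # Of youth screened normal risk, the share who did not show it.
        "negative_predictive_power": tn / (tn + fn),
    }


# Hypothetical counts for a school of 300 students.
stats = classification_accuracy(tp=30, fn=20, fp=70, tn=180)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Note that sensitivity and specificity condition on the outcome, whereas the two predictive powers condition on the screening result, which is why the latter pair speaks more directly to decisions made at the time of screening.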

Due to the high risk of negative outcomes for at-risk youth who are not accurately

identified by screening tools, some authors have proposed that schools may prefer to use

measures that tend to over-identify rather than under-identify at-risk youth (Glover &

Albers, 2007; Levitt et al., 2007). Consistent with this purpose, acceptable psychometric

standards for classification accuracy of screening instruments are generally lower than those for

diagnostic tests, ranging from 70% through 80% (American Academy of Pediatrics,

2012; Glover & Albers, 2007). Glover and Albers (2007) argue that low positive

predictive power and high sensitivity may be ideal for screening instruments as this

decreases the risk of missing youth who are in need of prevention and intervention,

although it also increases the risk of identifying youth who are not in need. As a screener

meant to be used as part of a comprehensive system rather than a diagnostic tool, the

BESS SF was designed with this standard in mind and is meant to err on the side of over-

identifying youth (Kamphaus & Reynolds, 2007; King et al., 2012). As this may result in

increased need for follow-up assessments and potential misapplication of limited

resources to youth who do not need services, it is important for schools to consider their

ability to deal with the consequences of over-identification before choosing to use the

BESS SF as part of their system of identification (Glover & Albers, 2007). With these

considerations and when used as part of a comprehensive screening and intervention

system, the BESS SF can be considered a valid tool for screening for at-risk youth with

limited false negatives despite its tendency to over-identify youth who are at-risk of

negative outcomes (Glover & Albers, 2007; Kamphaus & Reynolds, 2007; Levitt et al.,

2007). Even though the importance of validity is widely recognized, studies examining

the predictive validity of the BESS SF are few and far between. The studies that do exist

demonstrate the utility of the BESS SF overall risk score as a predictor of future

behavioral and emotional difficulties although there is room for improvement in its

identification of youth who are in need of services.
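
One reason a screener can combine high sensitivity with low positive predictive power is that the predictive powers depend on how common the outcome is, not only on the test itself. The sketch below applies Bayes' rule to hypothetical sensitivity, specificity, and base-rate values; none of these numbers come from the BESS literature, and the function name is an assumption.

```python
def predictive_powers(sensitivity, specificity, base_rate):
    """Positive and negative predictive power via Bayes' rule.

    base_rate is the proportion of youth who show the outcome (0 < base_rate < 1).
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


# Same hypothetical screener applied to outcomes of differing frequency.
for base_rate in (0.05, 0.15, 0.30):
    ppp, npp = predictive_powers(0.80, 0.75, base_rate)
    print(f"base rate {base_rate:.0%}: PPP = {ppp:.1%}, NPP = {npp:.1%}")
```

With sensitivity and specificity held fixed, positive predictive power falls as the outcome becomes rarer, so a rare outcome will tend to produce many false positives even for a reasonably sensitive screener.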

As part of a larger evaluation of the BESS as a universal screening system,

King et al. (2012) examined the predictive validity of the BESS SF on student behavioral

outcomes with elementary-aged students. The study was conducted in a rural community

with 207 students attending third through fifth grade at one elementary school. The

authors do not provide specific information on the ethnicity, gender, or socioeconomic

status (SES) for the youth who completed the BESS SF, but the overall sample was

primarily European American (64.7%), gender was evenly split (52.4% female), and

68.3% were from low SES homes as indicated by receiving free or reduced lunch. BESS

data was collected 10 weeks into the school year. Behavior outcome data (e.g., ODRs,

attendance, suspensions) was collected midway through the school year. Academic

performance data was assessed using benchmark measures of oral reading fluency

obtained in November of the school year of administration.

Using Spearman’s rho correlations, BESS SF T-scores were positively correlated

with ODRs and negatively correlated with attendance and oral reading fluency. The

overall risk score on the BESS SF was not significantly correlated with suspensions.

Next, similar to Dowdy et al. (2012), the authors collapsed the elevated and extremely

elevated groups into a single at-risk group, as their goal was to evaluate the utility of the

BESS as a screener to identify students who were at risk of negative outcomes (King et

al., 2012). Nonparametric Independent Samples Kruskal-Wallis Tests were used to

predict behavioral and academic outcomes using BESS SF classification as normal or at-

risk. At-risk youth were found to have significantly higher rates of ODRs and

significantly lower oral reading fluency and attendance than youth who were not at-risk.

Suspension rates approached significance with at-risk youth tending to have higher rates

of suspension.
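
For readers unfamiliar with Spearman's rho, the sketch below shows how the coefficient is obtained: both variables are converted to ranks (tied values receive the average of their ranks) and a Pearson correlation is computed on the ranks. The data are hypothetical and are not from King et al. (2012); the function names are assumptions introduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical T-scores and office discipline referral counts.
t_scores = [45, 52, 58, 63, 71]
odrs = [0, 0, 1, 2, 5]
print(f"rho = {spearman_rho(t_scores, odrs):.3f}")
```

Because it relies only on rank order, the coefficient is suited to skewed count outcomes such as referrals, where many students have zero events.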

To provide information on classification accuracy, the authors used the presence

of an office disciplinary referral as the indicator of problematic behavior (King et al.,

2012). Of those students who did not demonstrate the problematic behavior by mid-year,

73.184% were identified as being at normal risk at the beginning of the year (specificity).

Similarly, of those students who were identified as being at normal risk at the beginning

of the year, 90.345% did not demonstrate the problematic behavior by mid-year (negative

predictive power). These results indicate that the BESS does a good job of identifying

students who are unlikely to go on to develop problematic behaviors. However, the

classification accuracy of the BESS is not as strong for students who develop problematic

behaviors. For example, of those students who demonstrated problematic behavior at

mid-year, just 57.576% were identified as being at elevated risk at the beginning of the

year (sensitivity). Furthermore, of those students who were identified as being at elevated

risk for problem behaviors, only 28.358% went on to demonstrate problematic behavior

by mid-year (positive predictive power). Despite the fact that the BESS SF predicted

ODRs, attendance, and oral reading fluency in the expected directions, additional analysis

of ODRs as a behavioral outcome indicated that the overall score of the BESS SF was

better at identifying those youth who were not at-risk of negative behaviors than

identifying those who were at-risk (King et al., 2012).

In their examination of the BESS SF and TF as predictors of student outcomes,

Chin et al. (2013) expanded on King et al. (2012) by using the BESS SF to predict

outcomes over a full school year. They used a sample of 694 sixth and seventh grade

students (age 11 through 14; 46.5% female). Participants were primarily Latino (88.3%)

and attended a single middle school in Southern California. The BESS SF and TF were

administered universally to all students in the fall. Dichotomized behavioral outcomes

(e.g., ODRs, suspensions, unsatisfactory behavioral grades) represent data from the full

academic year. Logistic regressions were used to examine the relationship between BESS

scores and behavior outcomes.

The overall T-score on the BESS SF was found to significantly predict all

behavioral outcomes (Chin et al., 2013). Specifically, as the BESS SF score increased, so

did the likelihood of students receiving one or more ODRs and/or one or more

suspensions during the examined school year. Additionally, higher overall scores were

associated with unsatisfactory behavioral grades reflecting poorer work habits and

cooperation. When divided into risk status groups based on their overall BESS SF risk

score, students in the extremely elevated risk group had the worst outcomes (i.e., ODRs,

suspensions, unsatisfactory behavioral grades), followed by those in the elevated group,

then those in the normal risk group.

Although data on classification accuracy was not specifically provided by Chin et

al. (2013), information provided in the manuscript was used to evaluate classification

accuracy of the BESS SF. The presence of an office discipline referral and total number

of suspensions during the course of the school year were used to indicate the

demonstration of problematic behavior. Similar to the findings reported by King et al.

(2012), the BESS SF showed the best accuracy in classifying students at normal risk for

both outcomes; for ODRs, 89.231% of students who did not demonstrate problematic

behavior were identified as normal risk at the beginning of the year (specificity) and

79.700% of students identified as normal risk at the beginning of the year did not

demonstrate problematic behavior (negative predictive power); for suspensions, a

specificity of 85.107% and a negative predictive power of 94.300% were obtained.

Accuracy for at-risk students was not as strong; for ODRs, just 32.107% of students who

demonstrated problematic behavior over the course of the year were identified as being at

elevated risk at the beginning of the year and only 50.018% of students identified as

being at elevated risk at the beginning of the year went on to demonstrate problematic

behaviors; for suspensions, a sensitivity of 32.485% and positive predictive power of

14.252% were obtained. Similar to King et al. (2012), although the BESS SF predicted

ODRs, suspensions, and unsatisfactory behavior grades in the expected directions,

additional analysis of ODRs and suspensions as behavioral outcomes indicated that the

overall score of the BESS SF was better at accurately identifying those youth who were

not at-risk of negative behaviors than identifying those who are at-risk.
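
The four classification accuracy indices discussed above can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from King et al. (2012) or Chin et al. (2013); the class and function names are likewise assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-tabulation of fall screening status against a year-end outcome."""
    true_pos: int   # screened at-risk, demonstrated the problem behavior
    false_pos: int  # screened at-risk, did not demonstrate the behavior
    false_neg: int  # screened normal risk, demonstrated the behavior
    true_neg: int   # screened normal risk, did not demonstrate the behavior


def classification_accuracy(c: ConfusionCounts) -> dict:
    """Return the four indices as proportions between 0 and 1."""
    return {
        # Of students who showed the behavior, proportion screened at-risk.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Of students who did not show the behavior, proportion screened normal.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Of students screened at-risk, proportion who showed the behavior.
        "positive_predictive_power": c.true_pos / (c.true_pos + c.false_pos),
        # Of students screened normal, proportion who did not show the behavior.
        "negative_predictive_power": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts for illustration only.
example = ConfusionCounts(true_pos=20, false_pos=30, false_neg=40, true_neg=210)
metrics = classification_accuracy(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts of this shape, specificity and negative predictive power can be high while sensitivity and positive predictive power remain low, which is the pattern described for the BESS SF.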

Although the studies discussed above demonstrate initial evidence of the BESS

SF as a predictor of behavioral outcomes for youth, the current evidence is less than

definitive. When ODRs served as the behavioral outcome, the BESS SF was actually

better at identifying youth who were not at-risk than youth who were. While this may be

due to characteristics inherent in the BESS itself, classification accuracy has to do with

both the test and the outcome (Glaros & Kline, 1988; Streiner, 2003). If the outcome is

not the most reliable or if the specific cut point is not clinically meaningful, then

classification accuracy can appear to be worse than if another outcome and/or cut

point were chosen. Therefore, in addition to considering the validity of the BESS SF

itself, we need to examine what we should be using as markers of problem behaviors. It is

possible that the tendency of the BESS SF to be better at classifying “normal” youth than

at-risk youth is a remnant of the decision of King et al. (2012) and Chin et al. (2013) to

use the presence of at least one office discipline referral as a marker for problem behavior

and, in the case of Chin et al. (2013), the receipt of one or more suspensions. A cut point

based on one occurrence of a behavior or an event may be meaningful for a variable like

suspensions, which is indicative of a serious infraction, but may not be meaningful for

variables like ODRs or school absences, which could be associated with less serious risk.

In other words, a present/absent cut point for such variables may be too low to truly be

reflective of problem behavior. Instead, it may be necessary to identify a more “clinically

significant” cut point to indicate the presence of the behavior at a meaningful level. In the

current study, this will be examined with respect to absences. In the state where the

participating school is located, students who miss more than 10 days of school are

considered ineligible for promotion to the next grade (Louisiana Department of

Education, n.d.). Therefore, a cut point of 10 absences was chosen as a “clinically

significant” indicator of absences.

In addition to examining ways to improve classification accuracy, the current

study also expands the literature on predictive validity by examining the association

between the BESS and important future outcomes and strategies for strengthening those

associations. One possible strategy for doing so involves improving the strength of the

BESS SF as a predictor variable. Findings from Dowdy et al. (2012) suggest

that although the BESS SF was a strong predictor of problematic behaviors, it became even

stronger when more specific information about sadness was added to it. This

demonstrates that there is room for improvement in the predictive ability of the BESS

overall score. Recent thinking about the hierarchical structure of the BESS could make it

possible to use the BESS itself to refine the predictor variable. Through the addition of

the domain-specific factors, the ability of the BESS SF overall risk score to predict

outcomes may be improved upon. The current study will examine the utility of using

lower order, domain-specific factors to predict experiencing negative behavioral

outcomes to enhance the predictions made using the overall BESS SF risk score.

Factor Structure of the BESS SF

As the BESS SF was specifically designed to reflect the four main composites of

the BASC-2 SRP (Internalizing Problems, Inattention/Hyperactivity, School Problems,

Adaptive Behavior; Kamphaus & Reynolds, 2007), it is possible that the BESS has an

underlying factor structure that reflects these components beyond the unidimensional

factor represented by the overall score. In fact, studies examining the factor structure of

the BESS SF and TF have found evidence supporting the existence of an underlying

factor structure (Dowdy et al., 2011; Harrell-Williams, Raines, Kamphaus, & Dever,

2015; Naser et al., 2016; Wiesner & Schanding, 2013). These studies have used a variety

of statistical methodologies to investigate the factor structure of the BESS, including

nonhierarchical factor analytic techniques and more complex methodologies designed to

explore hierarchical structures.

Using a combination of exploratory factor analysis and confirmatory factor

analyses, Dowdy et al. (2011) tested the factor structure of the BESS SF using three

different samples, two from the BASC-2 norming sample (N = 994 and N = 1,466, both

representative of U.S. population, ages 6 through 11) and an independent verification

sample (N = 273, 81.4% Latino, ages 7 through 12). An exploratory factor analysis

(EFA) was completed with the first sample, investigating model solutions with one

through six factors. A unidimensional model was identified and found to have adequate

factor loadings; however, analyses most strongly supported a four-factor solution

consistent with the BASC-2 scales from which items were drawn (Personal Adjustment,

Inattention/Hyperactivity, Internalizing Problems, School Problems). In order to achieve

best fit, three items (9, 11, and 22) were removed, resulting in a final model based on 27

rather than 30 BESS SF items. Next, two confirmatory factor analyses (CFAs) were

conducted, examining the fit of the four-factor solution using the second sample from the

norming group and the independent verification sample. Both CFAs supported the

previously identified four-factor structure, with a range in goodness of fit from acceptable

to good in all three analyses. Internal consistency for each factor was found to be

acceptable. Dowdy et al. (2011) concluded that the BESS successfully measures the

constructs it was designed to assess. As many of these analyses were in the acceptable

range, additional research is warranted, especially research that employs independent

samples (Dowdy et al., 2011).

Answering the call for continuing validation studies of the factor structure of the

BESS SF, Harrell-Williams et al. (2015) used CFAs to evaluate the four-factor solution

previously found by Dowdy and colleagues with three high school samples, one from

southern California (N = 1,688, 94% Latino), one from central Georgia (N = 1,857,

72.8% African American), and one subsample of the BESS national norming sample (N

= 1,261, representative of U.S. population). The authors tested a one-factor model and a

four-factor model using all 30 BESS SF items, unlike the final model identified by

Dowdy et al. (2011) that excluded three items. Harrell-Williams et al. (2015) chose not to

remove these items after an item analysis conducted using all three samples combined did

not find that it was warranted. Using chi-square difference tests, the four-factor solution

was determined to fit the BESS SF better than the unidimensional model for all three

samples. Consistent with Dowdy et al. (2011), these four factors aligned with the original

BASC-2 SRP composite scales from which the BESS SF items were drawn: Internalizing

Problems, Hyperactivity/Inattention, School Problems, and Personal Adjustment.

However, only the Internalizing Problems factor demonstrated adequate expected a

posteriori over person variance (EAP/PV) reliability estimates across all three samples

using the .80 threshold recommended by Lance, Butts, and Michels (2006).

In response to these findings, Harrell-Williams et al. (2015) endorsed the

continued use of the overall BESS SF score for universal screening rather than the

individual factor scores due to concerns about the reliability and usefulness of the factors.

The authors were concerned that by relying on the predictive ability of the individual

factors rather than the overall risk score, schools may overly narrow their screening to

look at specific concerns. As a result, they worried that schools may be less likely to

identify all at-risk youth and/or use interventions that are too limited in scope based on

the “diagnosis” provided by the BESS SF (Harrell-Williams et al., 2015). Given the

preliminary nature of the findings and lack of guidance for schools to switch to a new

method of using the BESS SF for screening, it is reasonable and appropriate to suggest

that practitioners continue to use the BESS SF overall risk score to guide prevention and

intervention decisions.

However, researchers should view these findings as a way to expand our

understanding of the BESS and explore whether domain-specific factors can provide

more subtle information that could help guide further assessment and intervention. Rather

than dismissing the possibility that factors may provide additional useful information in

the identification of at-risk youth and treating these factor reliabilities as evidence that

factor scores should not be utilized to predict behavior, it is imperative that researchers

actually test the ability of the factors to predict student outcomes. It is possible that

predicting outcomes using factor scores could serve to refine identification and

intervention procedures as part of a comprehensive system. In order to facilitate this

process, further research should be conducted examining the utility of domain-specific

factors as predictors of student outcomes, which may be useful in guiding interventions

and further assessment as well as choosing outcomes for progress monitoring.

Furthermore, the strength of the Internalizing Problems factor, an area that is often under-

identified by traditional methods of screening, merits additional investigation.

Researchers must investigate whether or not alternate factor structures, such as

hierarchical and bifactor models, can produce more reliable factor structures.

When researchers and practitioners are interested in both a general construct as

well as domain-specific constructs, bifactor models may provide a better structural

approach than non-hierarchical or unidimensional models (Brunner, Nagy, & Wilhelm,

2012; Chen et al., 2006; Naser et al., 2016; Reise, 2012). Bifactor models allow modeling

of constructs that are thought to be both hierarchical and multifaceted. These models

consist of a general factor explaining common variance between all items as well as

orthogonal lower order, domain-specific factors that include items with more

conceptually specific content clusters (Brunner et al., 2012; Chen et al., 2006; Reise,

2012). This combination is ideal for psychological measurement tools designed to assess

psychological constructs that are theoretically multidimensional, like the behavioral and

emotional risk construct assessed by the BESS SF (Brunner et al., 2012; Chen et al.,

2006; Naser et al., 2016; Reise, 2012; Wiesner & Schanding, 2013). By utilizing a

bifactor model to examine the BESS SF, it is possible to separate out the common

variance accounted for by the overall factor from the unique variance associated with

each specific factor (Chen et al., 2006; Reise, 2012; Wiesner & Schanding, 2013).

Theoretically, this allows more specific predictions of student outcomes to be made

through the examination of the predictive validity of the factors controlling for the effects

of the other factors.
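
The bifactor structure described above can be written compactly as a measurement model. The notation below is an assumption introduced for illustration and does not come from the cited studies: x_{ij} is the response of student i to item j, G_i is the general risk factor, F_{i,s(j)} is the single domain-specific factor to which item j belongs, the lambdas are loadings, and epsilon_{ij} is the item residual.

```latex
x_{ij} = \lambda^{G}_{j}\, G_i + \lambda^{S}_{j}\, F_{i,s(j)} + \varepsilon_{ij},
\qquad
\operatorname{Cov}(G, F_s) = 0,
\quad
\operatorname{Cov}(F_s, F_{s'}) = 0 \;\; (s \neq s')
```

Each item loads on the general factor and on exactly one specific factor. The zero covariances express the orthogonality that lets the variance attributable to a specific factor be interpreted separately from the general factor.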

As universal screening of behavior in youth aims to use a single tool to identify

risk associated with related but unique areas of behavioral and emotional functioning,

bifactor modeling is a theoretically appealing option (Chen et al., 2006; Naser et al.,

2016; Wiesner & Schanding, 2013). In fact, bifactor models have been used to model

another commonly utilized universal screening tool for youth behavioral and emotional

functioning, the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997). Using

a nonclinical population of Hungarian youth ages 8 through 13, Kóbor, Takács, and

Urbán (2013) found that a five-factor bifactor model yielded excellent fit statistics. The

model consisted of a higher order factor, labeled “General Problems”, that reflected

overall behavioral functioning, and five lower order factors that generally corresponded

with the five scales of the SDQ (i.e., Emotional Symptoms, Behavioral Problems,

Hyperactivity, Peer Problems, and Prosocial). By examining the factor scores in addition

to the overall score on the SDQ, the authors believe that a stronger picture of the

behavioral and emotional strengths and weaknesses of individual children can be

developed than by relying on the overall score alone.

At this time, the application of bifactor modeling to the BESS has been limited to

two studies. Wiesner and Schanding (2013) explored non-hierarchical unidimensional

and multidimensional models, as well as hierarchical multidimensional models, to

identify the most appropriate and meaningful structure of the BESS TF. A total of 1,885

first through fifth grade students attending a suburban Southern school district were

screened using the BESS TF. CFAs testing a unidimensional model, a four-factor model,

and a higher order CFA model were not found to fit the data adequately. In contrast, two

other measurement models did adequately fit the data, one of which was a bifactor model

with three factors. This model consisted of a general factor, Maladaptive Problems, and

three domain-specific factors: Internalizing Problems, Externalizing Problems, and

combined Low Adaptive Skills/School Problems. Although a non-hierarchical four-factor

model also fit the data adequately, the authors deemed the bifactor model as the best fit

due to the inclusion of a global factor accounting for substantial covariation among

screener items that corresponds with the overall T-score obtained by the BESS TF, as

well as the existence of the orthogonal (i.e., uncorrelated) specific factors that represent

distinct concepts being measured by the BESS separate from the overall risk factor. It is

imperative that researchers examine the overall and specific factors as predictors of youth

behavioral outcomes in order to assess their relationship and usefulness in identifying at-

risk youth within the screening process.

These results indicate that traditional CFA may not be the best statistical approach

to examine the factor structure of the BESS TF (and, by extension, the BESS SF;

Wiesner & Schanding, 2013). Instead, additional factor analyses should be completed

beyond EFAs and CFAs, including a wider variety of statistical techniques. Specifically,

bifactor models are a theoretically ideal fit to psychological assessment tools such as the

BESS as they show support for an overall factor that can be used for identification of all

at-risk youth as well as providing more specific information regarding the particular areas

in which individual children may be in most need of intervention through the existence of

domain-specific factors (Chen et al., 2006; Naser et al., 2016; Wiesner & Schanding,

2013).

At this time, only one known study has been conducted applying bifactor

modeling to the BESS SF. Naser et al. (2016) utilized BESS SF scores obtained at the

beginning of the academic year from 893 African American fourth through eighth grade

students attending two urban public schools in a Southeastern state. The authors tested

three model types: unidimensional, multidimensional, and hierarchical, multidimensional.

Each of the multidimensional models utilized a factor structure based upon the four

BASC-2 SRP composite scales (Inattention/Hyperactivity, School Problems, Personal

Adjustment, and Internalizing Problems) from which the items were originally drawn.

The unidimensional model demonstrated a poor fit, while the fit of the nonhierarchical

multidimensional model was acceptable. However, the bifactor model representing a

hierarchical, multidimensional structure had the best fit out of the tested models. This

model consisted of an overall, general factor, which corresponds with the overall risk

score, and four orthogonal domain-specific factors, which correspond with the BASC-2

SRP scales from which the BESS SF items were drawn. These findings extend the

previous research on the factor structure of the BESS SF by applying advanced statistical

models that are more theoretically appropriate to evaluating the factor structure of

universal screeners than unidimensional and nonhierarchical multidimensional models

(Brunner et al., 2012; Chen et al., 2006; Naser et al., 2016; Reise, 2012; Wiesner &

Schanding, 2013). The next step is to look at the predictive validity of the overall and

domain-specific factors in order to determine their utility in predicting behavioral and

emotional difficulties in youth. In fact, behavioral predictions made using domain-

specific factor scores may provide more insight into the type of behavioral outcomes that

students with different risk presentations are vulnerable for, facilitating the selection of

appropriate prevention and intervention efforts.

Current Study

The BESS SF has proven to be a strong candidate for universal screening in

schools, allowing for the identification of youth in need of intervention to proactively

work towards prevention of negative behavioral outcomes. However, we need more

information on classification accuracy, especially within African American populations.

The majority of the previous research regarding the BESS SF has been conducted with

Latino and European American samples. Although a differential item function analysis

conducted by Harrell-Williams et al. (2015) did not find evidence of measurement bias

based on ethnicity, socioeconomic status, or language proficiency, a lack of differential

item functioning does not rule out differential predictive validity for different groups of

students (cf. Helms, 1992, 2006). Therefore, it is imperative that the BESS SF be fully

investigated with diverse populations as the decisions made based on it can greatly

impact youth short- and long-term outcomes. Additional studies examining the predictive

validity and classification accuracy of the BESS SF with African American populations

are greatly needed in order to determine its appropriateness as a screener with this

population. By using a similar sample to Naser et al. (2016), this study continues to

expand the research base by examining the predictive validity of the BESS SF for a

sample of African American youth.

The first aim of the current study was to examine the predictive validity of the

BESS SF overall risk score for behavioral outcomes via longitudinal associations. The

current study sought to replicate past studies by utilizing commonly used variables.

Consistent with the work of King et al. (2012) and Chin et al. (2013), this study examined

the ability of the BESS SF to predict student suspensions and absences. Based on prior

findings, it was hypothesized that students demonstrating higher risk scores on the BESS

would exhibit higher rates of absences and suspensions than those with lower risk scores.

In order to extend past research, the current study also examined the ability of the

BESS SF to predict student behavior based on citations for minor and major discipline

violations (Educational and Community Supports, 2016; Gion, McIntosh, & Horner,

2014) as well as citations for positive behavior. Prior research has only examined total

number of discipline referrals, which tells us nothing about the severity of the behavior

the child demonstrated. Depending on school policies, which can vary a great deal, an

ODR could be for anything from a uniform violation to fighting. The current study

sought to use more specific behavioral outcomes by examining the severity of student

behaviors and including both positive and negative behaviors. To do so, data gathered as

part of the school-wide positive behavioral intervention and supports system (SWPBIS)

was utilized. SWPBIS is intended to promote positive student outcomes and decrease

behavior problems through the emphasis of reinforcement for positive behaviors instead

of punishment for inappropriate behaviors (e.g., Bear, 2008; Positive Behavioral

Interventions and Supports [PBIS], 2016). Teachers recorded instances of positive

behaviors and violations of school rules electronically using Kickboard (Kickboard,

2016) with students receiving positive or negative points for specified behaviors (e.g.,

High Academic Achievement = positive points, Off Task = negative points). Students

were then able to “purchase” rewards with their earned points. Using this data had two

important advantages: 1) it was already being gathered resulting in no additional burden

to teachers and 2) it is clinically meaningful to the school as the behaviors represent

school-endorsed rules and values.

One way to group behavioral citations into meaningful categories is based upon

the severity of the demonstrated behavior (Educational and Community Supports, 2016;

Gion et al., 2014). Behavioral violations can be considered Major Discipline Citations

(Bullying/Taunting, Stealing, Lying, Willful Disobedience) or Minor Discipline Citations

(Off Task, Not Following Directions, Gossiping/Ribbing), with Major offenses being

generally consistent with behaviors for which students can be suspended according to the

statutes of the state in which the study was conducted (Child Trends & EMT Associates,

Inc., 2016). By using more specific behavior outcomes than simple counts of ODRs, the

BESS SF may gain more predictive power. It was hypothesized that students with higher

risk scores on the BESS SF would receive more Major and Minor Discipline Citations

than those with lower risk scores. As Major Discipline Citations are granted for more

extreme, less common behaviors than Minor Discipline Citations, Major Discipline

Citations are likely a more clinically meaningful indicator of negative behavioral

outcomes. Therefore, it was also predicted that the BESS SF would be a better predictor of

Major Discipline Citations than Minor Discipline Citations.

In addition to improving our understanding of problematic student behavior, it is

beneficial to examine the ability of the BESS SF to predict positive student behaviors

(Kaufman et al., 2010). As students are expected to engage in positive behaviors such as

achieving academically, participating in class, and being a leader, failing to exhibit

positive behaviors may represent another way to conceptualize risk. In these cases, youth

with lower BESS SF scores might engage in more positive behaviors while those with

elevated risk may engage in fewer positive behaviors. It was hypothesized that higher risk

scores on the BESS SF would be associated with lower rates of citations for Positive

Behavior than lower risk scores.

The second aim of the study was to examine classification accuracy by calculating

the sensitivity, specificity, positive predictive power, and negative predictive power of

the BESS SF. This study sought to utilize more “clinically significant” outcomes than

past studies by using cut points to indicate severity of problem behavior. As suspensions

are indicative of engaging in serious problematic behavior, a cut point of one was

selected for suspensions, which is consistent with prior research (Chin et al., 2013). For

absences, a cut point of 10 was selected as that is the point at which students become

ineligible for promotion to the next grade in the state in which the study was conducted

(Louisiana Department of Education, n.d.). It was hypothesized that measures of

classification accuracy would demonstrate improvements upon those obtained by King et

al. (2012) and Chin et al. (2013) due to the use of more clinically significant cut points.

Despite this improvement, the use of risk group status based on the overall risk T-score was

predicted to produce sensitivity and positive predictive power scores below acceptable

limits (e.g., 70%) due to problems with the lack of precision inherent in the overall score

on the BESS SF.

The third aim of the current study was to investigate the predictive utility of the

four-factor bifactor model developed by Naser et al. (2016). Although past research

found that the BESS SF overall risk score performed better at identifying youth who are

not at risk of poor behavioral outcomes than those who are (Chin et al., 2013; King et al.,

2012), it is possible that its predictive abilities may be enhanced through the use of its

underlying factor structure. The current study sought to examine the predictive

relationship between the domain-specific factors of the BESS SF (Internalizing Problems,

Inattention/Hyperactivity, School Problems, and Personal Adjustment) and student

outcomes over and above the general risk score produced by the BESS. It was

hypothesized that the BESS SF factors would predict behavioral outcomes above and

beyond what is predicted by the overall BESS SF score.

II. METHODS

Participants

Participants for this study were drawn from archival data collected at an urban

public charter school in a Southeastern state during the 2013 – 2014 academic year. Out

of the 447 students enrolled across kindergarten through eighth grade that school year,

97.3% identified as African American, 1.6% as Latino/Hispanic, 0.4% as Caucasian,

0.4% as Hawaiian/Pacific Islander, and 0.2% as Multi-Racial (New Orleans Parents’

Guide, 2014). The majority of the student body (94.2%) qualified for free or reduced

lunch. All students in fourth through eighth grade completed the BESS SF as part of the

school’s fall universal screening data collection. A total of 230 (92% response rate)

students completed the BESS SF. After removing participants with missing data (see

below for more information), 220 students were included in the final sample (52.273%

female, age 8 – 15, M = 11.430, SD = 1.666). Students from fourth through eighth grade

were equally represented in the sample (20.909% in fourth grade, 17.273% in fifth,

19.091% in sixth, 20.909% in seventh, and 21.818% in eighth).

Procedure

The participating school provided archival data for use in this study. As the data

were de-identified and there was minimal risk to participants, the Institutional Review

Board of the sponsoring university deemed this study exempt from human subjects

review. Ethical standards endorsed by the American Psychological Association guided

the collection and handling of obtained data.

All students in fourth through eighth grade completed the BESS SF as part of the

school’s fall universal screening program in October 2013. Passive consent procedures

were utilized. Administration of the BESS SF occurred approximately two months into

the school year. Within one week after the initial administration, school representatives

attempted to administer the BESS SF to all students who were absent on the day of initial

administration; those who were unable to complete the measure within this timeframe

were not included in this study.

Students completed the measure in a group format during their enrichment period;

surveys were read aloud and students followed along. Student responses were recorded

on BESS SF Scantron forms, which were reviewed for completion and readability. These

forms were scored using BESS software, which produced an overall BESS SF risk score

for each student and scores for each individual item. Archival student records including

suspensions, absences, and student behavioral outcomes were provided at the conclusion

of the 2013 – 2014 school year.

Measures

BESS Student Form. The BESS SF (Kamphaus & Reynolds, 2007) is a 30-item

broadband screener of behavioral and emotional risk for youth populations. Items

representing four composite scales (six Inattention/Hyperactivity, six School Problems,

eight Personal Adjustment, and ten Internalizing Problems) were taken from the BASC-2

SRP to create the BESS SF (Reynolds & Kamphaus, 2004). Students respond to items on

a 4-point Likert scale ranging from 1 (never) through 4 (almost always) to indicate the

frequency of different behaviors and emotions. An overall risk score in the form of a T-

score is obtained that signifies overall behavioral and emotional risk status. Individuals

scoring at or below 60 are classified as being at “normal risk” for behavioral or emotional

difficulties. Those scoring between 61 and 70 are considered at elevated risk and those

scoring 71 or above are considered at extremely elevated risk.

Initial evaluations of the BESS SF conducted during test development found

strong internal consistency (split-half reliability = .90 - .93) and test-retest reliability (.80;

Kamphaus & Reynolds, 2007). The overall BESS SF risk score was strongly correlated in

the expected directions with BASC-2 and the Achenbach System of Empirically Based

Assessment – Youth Self Report (ASEBA YSR; Achenbach & Rescorla, 2001) scales

representing internalizing and externalizing concerns (Kamphaus & Reynolds, 2007).

Analysis revealed strong internal consistency (α = .878) of the BESS SF for the

current sample. Risk status groups (e.g., Normal, Elevated, and Extremely Elevated) were

determined based on T-scores as recommended by Kamphaus and Reynolds (2007), with

185 participants being classified as normal risk (84.091%), 28 classified as elevated

(12.727%), and 7 classified as extremely elevated (3.182%). The elevated and extremely

elevated categories were collapsed into one “At-Risk” group (n = 35; 15.909% of the

sample) following the procedure used by Dowdy et al. (2012) and King et al. (2012).
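
The risk status grouping described above can be sketched as follows. The cut points and group counts come from the text; the function names are hypothetical, and the sketch assumes whole-number T-scores.

```python
def classify_risk(t_score: int) -> str:
    """Map an overall BESS SF T-score to a risk status group.

    Cut points follow the text: 60 or below is Normal, 61 through 70 is
    Elevated, and 71 or above is Extremely Elevated.
    """
    if t_score <= 60:
        return "Normal"
    if t_score <= 70:
        return "Elevated"
    return "Extremely Elevated"


def collapse_to_at_risk(status: str) -> str:
    """Collapse Elevated and Extremely Elevated into a single At-Risk group."""
    return "Normal" if status == "Normal" else "At-Risk"


# Group counts reported for the current sample.
group_counts = {"Normal": 185, "Elevated": 28, "Extremely Elevated": 7}
total = sum(group_counts.values())
percentages = {g: round(100 * n / total, 3) for g, n in group_counts.items()}

# Size of the collapsed At-Risk group.
at_risk_n = sum(
    n for g, n in group_counts.items() if collapse_to_at_risk(g) == "At-Risk"
)
at_risk_pct = round(100 * at_risk_n / total, 3)
```

Recomputing the percentages from the counts is a quick consistency check on the reported group sizes.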

To investigate the underlying factor structure of the BESS SF, four

domain-specific factors were constructed that align with the BASC-2 composites from which the

items were drawn (Internalizing Problems, Inattention/Hyperactivity, School Problems,

Personal Adjustment; Kamphaus & Reynolds, 2007). These factors also align with the

structure identified by Naser et al. (2016). Specific factors were computed by summing

the items representing each composite. All factors demonstrated acceptable internal

consistency: Personal Adjustment (α = .744; possible score range = 8 – 32),

Inattention/Hyperactivity (α = .705; possible score range = 6 – 24), Internalizing

Problems (α = .816; possible score range = 10 – 40), School Problems (α = .811;

possible score range = 6 – 24). T-scores were computed individually for each factor for

use in regression analyses. Please see the discussion of Aim Three in the Results section

for additional discussion of the factor development process.
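
The construction of summed factor scores, their internal consistency, and their conversion to T-scores can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the analyses: the responses are hypothetical, the internal consistency index is assumed to be Cronbach's alpha, and T-scores are assumed to be standardized within the sample to a mean of 50 and a standard deviation of 10.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def to_t_scores(raw_scores):
    """Standardize raw scores within the sample (assumed): T = 50 + 10z."""
    m, sd = mean(raw_scores), stdev(raw_scores)
    return [50 + 10 * (x - m) / sd for x in raw_scores]


# Hypothetical responses from five students to a six-item factor, each item
# scored 1 (never) through 4 (almost always); possible sum range is 6 to 24.
responses = [
    [1, 1, 2, 1, 2, 1],
    [2, 2, 2, 3, 2, 2],
    [3, 3, 3, 3, 4, 3],
    [4, 4, 3, 4, 4, 4],
    [2, 3, 2, 2, 3, 2],
]

factor_sums = [sum(r) for r in responses]  # summed factor score per student
alpha = cronbach_alpha(responses)
factor_t = to_t_scores(factor_sums)
```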

Behavioral outcome variables. Behavioral outcome variables included

suspensions, absences, and behavioral indicators derived from the school-wide positive

behavioral intervention and supports system (SWPBIS) employed by the study school. As

the goal of this study was to predict longitudinal behavior outcomes, only outcome data

representing quarters two through four were used in analyses, thereby restricting

behavioral outcomes to those occurring after BESS SF administration.

Suspensions. Suspensions reflect out-of-school suspensions. They were utilized

in three ways: two continuous variables were computed to represent total number of

suspensions (Number of Suspensions) and total days of suspensions (Days Suspended); a

dichotomous variable was created to represent students with no suspensions and students

with one or more.

Absences. Absences were utilized in two ways: a continuous variable was

computed to represent total number of absences and a dichotomous variable was

computed to represent students with excessive absences (10+) and students with non-

excessive absences (less than 10). Students who miss more than 10 days of school are

considered ineligible for promotion to the next grade (Louisiana Department of

Education, n.d.).

Specific Behavioral Outcomes. Specific behavioral outcomes were derived from

the Kickboard (Kickboard, 2016) electronic recording system as part of the school-wide

positive behavioral intervention and supports system (SWPBIS). Kickboard allows

teachers and administrators to instantly award students points for positive behavior and

subtract points for inappropriate behavior. The system tracks students’ total points,

allowing students to earn rewards over time with the intention of improving overall

student behavior.

Teachers recorded student behavior on Kickboard each day. A total of 57

behaviors were tracked by the study school, representing both appropriate (e.g., Doing

More Than Asked, High Academic Achievement) and inappropriate behaviors (e.g.,

Causing Distractions/Disturbances, No Homework, Throwing). The 57 behaviors were

categorized into Positive Behavior as well as Minor and Major Discipline Citations based

on the classification system that has been incorporated into the School-Wide Information

System Suite (SWIS), an electronic system designed by Horner and colleagues at the

University of Oregon to assist schools in implementing positive behavioral interventions

(see Appendix for categorizations; Educational and Community Supports, 2016). Major

Discipline Citations (e.g., Bullying/Taunting, Stealing, Lying, Willful Disobedience)

were generally consistent with suspendable offenses according to the statutes of the state

in which the study was conducted (Child Trends & EMT Associates, Inc., 2016;

Educational and Community Supports, 2016; Gion et al., 2014). Minor Discipline

Citations (e.g., Off Task, Not Following Directions, Gossiping/Ribbing) generally related

to disrespect, disruption, and dress code violations.

Behaviors tracked on Kickboard were examined for fit with Major versus Minor

categories and a newly created category for Positive Behavior. Two variables were

removed because they did not represent student behavior (Parental Involvement, Signed

Paycheck), one variable because it was not defined (LTS), and one because the school did

not start tracking this variable until the fourth quarter (Kindness, Empathy, and Respect).

The three behavior categories are described below. Descriptions do not include internal

consistency estimates because these behaviors represent discrete events that are not

assessing a specific construct (e.g., Gray, Litz, Hsu, & Lombardo, 2004; Netland, 2001).

Major Discipline Citations. Major Discipline Citations includes 13 behaviors that

demonstrate severe behavioral violations (e.g., Bullying/Taunting, Stealing, Lying,

Willful Disobedience). Instances of the 13 behaviors during quarters two through four

were summed to represent frequency of Major Discipline Citations; total scores were

converted to z-scores for use in analyses.

Minor Discipline Citations. Minor Discipline Citations includes 29 behaviors that

demonstrate minor behavioral violations (e.g., Off Task, Not Following Directions,

Gossiping/Ribbing). Instances of the 29 behaviors during quarters two through four were

summed to represent frequency of Minor Discipline Citations; total scores were

converted to z-scores for use in analyses.

Positive Behaviors. Positive Behaviors included 11 behaviors representing

positive school values (e.g., High Academic Achievement, Exemplary Effort, High

Enthusiasm). Instances of the 11 behaviors during quarters two through four were

summed to represent frequency of Positive Behaviors; total scores were converted to z-

scores for use in analyses.
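
The scoring of each behavior category (summing instances across the behaviors in the category, then standardizing) can be sketched as below. The counts are hypothetical, and the use of the sample standard deviation is an assumption, as the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def category_totals(counts_by_behavior):
    """Sum instances across all behaviors in a category, per student."""
    return [sum(student) for student in counts_by_behavior]


def z_scores(values):
    """Standardize values to mean 0 and SD 1 (sample SD; an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical counts: 4 students x 3 behaviors within one category.
toy_counts = [[0, 1, 0], [2, 0, 1], [5, 3, 2], [1, 1, 0]]
totals = category_totals(toy_counts)
standardized = z_scores(totals)
```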

III. RESULTS

Data Screening

Data screening was completed prior to statistical analyses in order to assess the

overall accuracy of the data and results (Tabachnick & Fidell, 2007). In terms of missing

data, screening revealed that ten students lacked attendance data (4.348%), and eight of

these students also lacked at least one quarter of Kickboard data, suggesting that they did

not complete the school year at the study school. As this represented less than 5% of

cases, a decision was made to remove these cases from the sample (Tabachnick & Fidell,

2007). An examination of the BESS data revealed that 20 students (9.091%) were

missing at least one item score, with eight of those cases missing two item scores; there

was no obvious pattern to the missing data. The BESS scoring system was used to

compute overall T-scores and risk group status prior to estimation of missing data as the

software is capable of doing so with up to two missing scores (Kamphaus & Reynolds,

2007). To compute factor scores, mean substitution for each missing BESS item was

chosen to estimate the missing data (Tabachnick & Fidell, 2007), allowing for the

calculation of domain-specific factor scores with no missing items.
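
A minimal sketch of mean substitution follows. It assumes that each missing item score is replaced with the mean of the observed scores on that same item across students; the text does not state whether item means or person means were used, so this choice, like the data, is hypothetical.

```python
from statistics import mean


def mean_substitute(responses):
    """Replace None with the mean of the observed scores for that item."""
    k = len(responses[0])
    item_means = [
        mean(row[i] for row in responses if row[i] is not None)
        for i in range(k)
    ]
    return [
        [item_means[i] if row[i] is None else row[i] for i in range(k)]
        for row in responses
    ]


# Hypothetical data: None marks a missing item score.
toy = [[1, 2, None], [3, None, 4], [2, 4, 2]]
filled = mean_substitute(toy)
```

Because the substituted value is the item mean, the item's mean is unchanged while its variance is reduced, which is a known trade-off of this approach.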

Outliers (i.e., values over three standard deviations from the mean) were identified

for composite variables (e.g., BESS scores, behavioral outcomes). Outlier cases were

maintained in the data set through the use of winsorizing procedures (Field, 2013).

Specifically, values over three standard deviations from the mean were replaced with the

next highest obtained value. After two rounds of winsorizing, no variables exhibited

scores farther than three standard deviations from the mean. Despite these efforts, there

continued to be some evidence that some cases were demonstrating stronger than

expected influence over the predictive values (e.g., high leverage, high Mahalanobis

distances; Field, 2013). As the goal of this study is to investigate the predictions of

students who are at-risk for behavioral outcomes, including those with high levels of risk

and extreme behavioral outcomes, the decision was made to not delete any further

participants from the data set.
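
The winsorizing procedure described above can be sketched as follows. The sketch interprets "next highest obtained value" as the most extreme observed value that still falls within three standard deviations of the mean, which is an assumption about the exact rule applied. The data and function names are hypothetical.

```python
from statistics import mean, stdev


def winsorize_once(values, k=3.0):
    """Replace values beyond k SDs with the nearest value inside the bounds."""
    m, s = mean(values), stdev(values)
    lo, hi = m - k * s, m + k * s
    inside = [v for v in values if lo <= v <= hi]
    cap_hi, cap_lo = max(inside), min(inside)
    return [cap_hi if v > hi else cap_lo if v < lo else v for v in values]


def winsorize(values, k=3.0, max_rounds=10):
    """Repeat winsorizing until no value changes; return data and rounds used."""
    current, rounds = list(values), 0
    while rounds < max_rounds:
        updated = winsorize_once(current, k)
        if updated == current:
            break
        current, rounds = updated, rounds + 1
    return current, rounds


# Hypothetical scores with one extreme value.
toy = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 40]
cleaned, rounds_used = winsorize(toy)
```

More than one round can be needed because replacing an extreme value shrinks the standard deviation, which may bring additional values outside the recomputed bounds.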

Descriptive Analyses

Means, standard deviations, and the observed range for continuous variables are

presented in Table 1. Although several variables demonstrated significant skew and/or

kurtosis (p < .05), the impact of skewness and kurtosis is reduced in sample sizes over

200 (Tabachnick & Fidell, 2007) and therefore, no corrections were made.

During quarters two through four, participants were absent on average 7.709 days

(SD = 5.639) with 32.727% of students being reported as absent from school 10 days or

more in this time period. In comparison, 15.9% of fifth through eighth grade students in

public schools within the city were reported to have missed 10 or more days of school

during the 2013 – 2014 school year (Sims & Vaughn, 2014). Participants averaged 0.982

suspensions (SD = 1.567) for an average of 1.864 days total (SD = 3.136). Out of all

participants, 39.545% were suspended on at least one occasion during this time period.

On average, schools within the city suspended 9.998% of students within the full school

year (New Orleans Parents’ Guide, 2015).

Correlation analyses were completed in order to assess the relationship between

demographic variables, predictors, and outcome variables (see Table 2). With respect to

demographic variables, gender was significantly correlated with Minor Discipline

Citations, Major Discipline Citations, and Positive Behaviors. Age was significantly

positively correlated with School Problems, Absences, Number of Suspensions, Days

Suspended, Minor Discipline Citations, Major Discipline Citations, and Positive

Behaviors. Based on these analyses and Kaufman et al.’s (2010) findings that gender and

age are strongly associated with school disciplinary experiences, gender and age were

controlled for in regression analyses.

Aim One: Predictive Validity of the BESS SF Overall Risk Score

Two types of analyses were conducted to assess the relationship between the

BESS SF overall risk score and behavioral outcomes. First, correlation analyses were

completed to assess associations between the BESS SF overall risk score and outcome

variables. Next, the predictive ability of the BESS SF overall risk score was assessed

through the use of linear regressions, controlling for gender and age.

The BESS overall risk score was positively correlated with Number of

Suspensions and Days Suspended and negatively correlated with Positive Behaviors (see

Table 2). No significant association was found between the BESS overall risk score and

Absences, Minor Discipline Citations, or Major Discipline Citations.

Next, linear regressions were completed to assess the predictive power of the

BESS SF overall risk score. Due to past findings about the influence of age and gender

on behavioral outcomes in schools (Kaufman et al., 2010) and their significant correlation

with many of the outcome variables (see Table 2), age and gender were controlled for in

all linear regressions. See Tables 3 through 8 for detailed results.
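
As an illustration of the analytic approach, and not a reproduction of the study's analyses, the sketch below fits an ordinary least squares model with age, gender, and a risk score as predictors and computes the overall F statistic. With n = 220 and three predictors, the degrees of freedom are 3 and 220 - 3 - 1 = 216, as in the F [3, 216] values reported for these models. All data are synthetic, and the variable ranges and effect sizes are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_f_test(predictors, y):
    """Regress y on an intercept plus predictors; return coefficients, F, df."""
    n, p = len(y), len(predictors[0])
    x = [[1.0] + list(row) for row in predictors]
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p + 1)] for i in range(p + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p + 1)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in x]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    df1, df2 = p, n - p - 1
    return coef, (ss_reg / df1) / (ss_res / df2), (df1, df2)


# Synthetic data only: 220 cases with age, gender (0/1), and a risk T-score.
random.seed(0)
rows, outcome = [], []
for _ in range(220):
    age, gender, risk = random.randint(10, 14), random.randint(0, 1), random.gauss(50, 10)
    rows.append([age, gender, risk])
    outcome.append(0.2 * age + 0.03 * risk + random.gauss(0, 1))

coef, f_stat, df = ols_f_test(rows, outcome)
```

Entering age and gender alongside the risk score means that the coefficient for risk reflects its association with the outcome after the covariates are held constant.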

Consistent with correlational results, the BESS SF overall risk score was not

found to significantly predict Absences (F [3, 216] = 2.422, p > .05; see Table 3) or

Minor Discipline Citations (F [3, 216] = 23.718, p < .001; β = .097, p > .05; see Table

4) after controlling for age and gender. However, the BESS SF overall risk score was

found to be a significant predictor of Number of Suspensions (F [3, 216] = 14.893, p <

.001; see Table 5), Days Suspended (F [3, 216] = 13.525, p < .001; see Table 6), Major

Discipline Citations (F [3, 216] = 18.602, p < .001; see Table 7), and Positive Behaviors (F [3,

216] = 6.814, p < .001; see Table 8) after controlling for age and gender. As the overall

risk score on the BESS SF increased, participants were suspended more frequently and

for longer periods of time during quarters two through four, providing support for the

hypothesis that the BESS SF overall risk score predicts suspensions. The hypotheses that

the overall risk score would predict Absences and Minor Discipline Citations were not

supported.

Aim Two: Classification Accuracy of the BESS SF

To assess the ability of the BESS SF to predict whether or not students will

exhibit problematic behavioral outcomes, indices of classification accuracy were

calculated using the two risk groups (Normal and Elevated) based on the BESS SF

overall risk score as the predictor variable and the dichotomized Absences and

Suspensions variables as outcome variables (see Table 9). Logistic regressions were also

conducted to further examine the ability of the BESS SF overall risk score to predict

whether or not participants exhibited excessive absences or suspensions, supplementing

the classification accuracy analyses, which require the use of a categorical rather than a

continuous predictor. Age and gender were controlled for in both logistic regressions.

Absences. For Absences, 82.432% of participants who did not demonstrate

excessive absences were identified as being at normal risk at the beginning of the year

(specificity). Similarly, 65.946% of those students who were identified as being at normal

risk at the beginning of the year did not demonstrate excessive absences (negative

predictive power). In contrast, only 12.500% of those students who demonstrated

excessive absences were identified as being at elevated risk at the beginning of the year

(sensitivity). Furthermore, only 25.714% of those students who were identified as being

at elevated risk went on to demonstrate excessive absences (positive predictive power).

Similar to previous studies (Chin et al., 2013; King et al., 2012), the BESS SF was better

at predicting students who did not demonstrate negative outcomes than predicting those

who did demonstrate negative outcomes. The use of excessive versus non-excessive

absences as an indicator of negative behavioral outcomes did not demonstrate a

meaningful improvement in the classification accuracy of the BESS SF.

Logistic regression revealed that the Overall BESS SF risk score did not

significantly predict group membership based on Absences (χ2 = 5.739, p > .05) when

controlling for age and gender. For students with non-excessive absence rates, 99.324%

were correctly classified in contrast to only 4.167% of those with excessive absences,

representing an overall classification accuracy of 68.182%. This is consistent with

classification accuracy statistics in previous research and the current study, which found

that the BESS SF is better at predicting who will not have problematic outcomes than

who will have problematic outcomes.

Suspensions. For Suspensions, 85.714% of participants who were not suspended

were identified as being at normal risk at the beginning of the year (specificity).

Similarly, 61.622% of those students who were identified as being at normal risk at the

beginning of the year were not suspended (negative predictive power). In contrast, just

18.391% of those students who were suspended were identified as being at elevated risk

at the beginning of the year (sensitivity). Furthermore, only 45.714% of those students

who were identified as being at elevated risk went on to be suspended (positive

predictive power). The BESS SF was better at predicting students who did not

demonstrate negative outcomes than predicting those who did demonstrate negative

outcomes.
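
The four indices reported above for Absences and Suspensions follow from a 2 x 2 table of risk group by outcome. The cell counts in the sketch below are not reported in the text. They are inferred here from the reported percentages together with n = 220 and the 35 students in the elevated-risk group, and should be treated as an assumption.

```python
def classification_indices(tp, fn, fp, tn):
    """Return screening indices (as percentages) from 2 x 2 cell counts.

    tp: elevated risk, negative outcome occurred
    fn: normal risk, negative outcome occurred
    fp: elevated risk, no negative outcome
    tn: normal risk, no negative outcome
    """
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "positive_predictive_power": 100 * tp / (tp + fp),
        "negative_predictive_power": 100 * tn / (tn + fn),
    }


# Cell counts inferred from the reported percentages (an assumption).
absences = classification_indices(tp=9, fn=63, fp=26, tn=122)
suspensions = classification_indices(tp=16, fn=71, fp=19, tn=114)
```

Sensitivity and specificity are conditioned on the observed outcome, whereas positive and negative predictive power are conditioned on the screening result, so the latter two shift with the base rate of the outcome in the sample.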

Using logistic regression, the Overall BESS SF risk score was able to significantly

predict group membership based on Suspensions (χ2 = 29.736, p < .001, R2 = .171

[Nagelkerke]) when controlling for age and gender. Students with higher overall risk

scores were more likely to have been suspended than those with lower overall risk scores

(β = .035, Wald p < .05). The classification table revealed that although overall risk

predicted those who received zero suspensions at a rate of 86.466%, only 42.529% of

those who were suspended were correctly classified. Overall, this model accurately

predicted group membership for 69.091% of participants. Although the overall BESS SF

risk score significantly predicted group membership for suspensions, supporting the

hypothesis regarding the classification accuracy of the BESS, further analysis revealed

that the BESS SF is better at predicting who will not be suspended than who will

be suspended, consistent with the results found above.
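
The overall classification accuracy from each logistic regression is the pooled proportion of correct classifications across both outcome groups, so it is weighted toward the larger group. The sketch below makes this explicit. The counts are inferred from the reported percentages and n = 220 and are therefore an assumption. The baseline figure, which is not reported in the text, is the accuracy obtained by assigning every student to the more common outcome.

```python
def accuracy_summary(correct_neg, n_neg, correct_pos, n_pos):
    """Overall accuracy and majority-class baseline, both as percentages."""
    total = n_neg + n_pos
    return {
        "overall": 100 * (correct_neg + correct_pos) / total,
        # Accuracy of always predicting the more common outcome.
        "baseline": 100 * max(n_neg, n_pos) / total,
    }


# Counts inferred from the reported percentages (an assumption).
absences = accuracy_summary(correct_neg=147, n_neg=148, correct_pos=3, n_pos=72)
suspensions = accuracy_summary(correct_neg=115, n_neg=133, correct_pos=37, n_pos=87)
```

Under these inferred counts, the absences model improves on the baseline of always predicting non-excessive absences by less than one percentage point, which is consistent with its nonsignificant result.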

Aim Three: Predictive Utility of the Four-Factor Bifactor Model of the BESS SF

A confirmatory factor analysis was run using AMOS 18 in order to assess the fit

of the bifactor model of the BESS SF identified by Naser et al. (2016) to the current data

set. This was done prior to the exclusion of 10 cases that were missing absence and

suspension data (see above), resulting in the inclusion of 230 rather than 220 participants.

This model consisted of an overall factor and four orthogonal domain-specific factors

consistent with the BASC-2 composites (Internalizing Problems,

Inattention/Hyperactivity, School Problems, Personal Adjustment). Although the factor

loadings were generally consistent with Naser et al. (2016; see Table 10), the model did

not demonstrate acceptable fit to the current data, χ2 (376) = 787.841, p < .001, RMSEA

= .141, 90% CI = (.136, .147), pclose < .001, CFI = .000, and TLI = .000.

The unacceptable fit of the bifactor model precludes testing Aim 3 as proposed. It

is not appropriate to include the overall risk score and domain-specific factors in the same

linear regression due to multicollinearity. Instead, the decision was made to examine the

predictive ability of the domain-specific factors separately from the overall risk score. To

do this, four domain-specific factors were constructed that align with the BASC-2

composites from which the items were drawn (Internalizing Problems,

Inattention/Hyperactivity, School Problems, Personal Adjustment; Kamphaus &

Reynolds, 2007). These factors also align with the structure identified by Naser et al.

(2016). As described in the Methods section, specific factors were computed by summing

the items representing each composite. All factors demonstrated acceptable internal

consistency: Personal Adjustment (α = .744; possible score range = 8 – 32),

Inattention/Hyperactivity (α = .705; possible score range = 6 – 24), Internalizing

Problems (α = .816; possible score range = 10 – 40), School Problems (α = .811;

possible score range = 6 – 24). T-scores were computed individually for each factor for

use in regression analyses.

Linear regressions were completed to assess the predictive power of the BESS SF

domain-specific factors. Due to past findings about the influence of age and gender on

behavioral outcomes in schools (Kaufman et al., 2010) and their significant correlation

with many of the outcome variables (see Table 2), age and gender were controlled for in

all linear regressions.

Absences. The model with all four domain-specific factors was not found to

significantly predict Absences (F [6, 213] = 1.508, p > .05; see Table 11) after

controlling for age and gender.

Suspensions. The model with all four domain-specific factors was found to be a

significant predictor of Number of Suspensions (F [6, 213] = 10.005, p < .001; see Table

12) and Days Suspended (F [6, 213] = 8.743, p < .001; see Table 13), after controlling

for age and gender. Results indicated that the significant variance accounted for in the

model for both Suspension variables was due to Inattention/Hyperactivity (Number β =

.285, p < .001 and Days β = .270, p < .001). None of the other domain-specific factors

accounted for a significant amount of variance in the model. As the t-score on the

Inattention/Hyperactivity factor increased, participants were suspended more frequently

and for longer periods of time during quarters two through four.

Minor Discipline Citations. The model with all four domain-specific factors was

found to be a significant predictor of Minor Discipline Citations (F [6, 213] = 16.540, p

< .001; see Table 14) after controlling for age and gender. Results indicated that the

significant variance accounted for in the model was due to Inattention/Hyperactivity (β =

.263, p < .001). None of the other domain-specific factors accounted for a significant

amount of variance in the model. As the t-score on the Inattention/Hyperactivity factor

increased, participants received more Minor Discipline Citations.

Major Discipline Citations. The model with all four domain-specific factors was

found to be a significant predictor of Major Discipline Citations (F [3, 216] = 12.199, p <

.001; see Table 15) when controlling for age and gender. Results indicated that the

significant variance accounted for in the model was due to Inattention/Hyperactivity (β =

.270, p < .001). None of the other domain-specific factors accounted for a significant

amount of variance in the model. As the t-score on the Inattention/Hyperactivity factor

increased, participants received more Major Discipline Citations.

Positive Behaviors. The model with all four domain-specific factors was found to

be a significant predictor of Positive Behaviors (F [3, 216] = 5.513, p < .001; see Table

16) when controlling for age and gender. Results indicated that the significant variance

accounted for in the model was due to Inattention/Hyperactivity (β = -.201, p < .01) and

School Problems (β = -.170, p < .05). As the t-scores on Inattention/Hyperactivity and

School Problems increased, participants received fewer citations for Positive Behaviors.

IV. DISCUSSION

As schools strive to improve proactive identification and intervention with youth

at risk for negative behavioral outcomes, it is imperative that they have access to

validated universal screeners that fit their population needs. The current study sought to

assess the predictive validity and classification accuracy of the BESS SF overall risk

score within a school serving a largely African American student population.

Additionally, this study sought to identify how improving the specificity of outcome

variables impacted the overall predictive validity of the BESS SF. Finally, this study

sought to investigate the ability of the BESS SF factors to predict behavioral outcomes

above and beyond what is predicted by the overall BESS SF score through the application

of the bifactor model identified by Naser et al. (2016).

Despite the high rates of poverty at the school of interest and known connections

between living in poor, urban environments and high rates of life stressors and traumatic

experiences (New Orleans Parents’ Guide, 2014; Overstreet & Mazza, 2003), the number

of students identified as at-risk in this study is generally on par with what is expected

using a three-tiered model of socioemotional functioning (e.g., Splett, Fowler, Weist,

McDaniel, & Dvorsky, 2013). Based on these models, it is expected that 80% of students

in any given school exhibit normal levels of risk, 15% are at-risk and/or exhibiting low

levels of problematic behavior, and 5% exhibit significant behavioral problems. In

comparison, in the current study 84.091% were classified as normal risk, 12.727% were

classified as elevated risk, and 3.182% were classified as extremely elevated risk on the

BESS SF. These relatively normative rates of students at risk for negative behavioral and

emotional outcomes may be indicative of high levels of resilience in the face of stress

amongst these students. However, students’ self-reported resilience does not seem to be

reflected in their teachers’ responses to them in the school environment. The high rates of

suspension and citations for major and minor behaviors indicate that the school

environment may be particularly “reactive.” This may be due to implicit bias of the

teachers in the interpretation of the

behavior of their students as African American students frequently receive

disproportionate rates of disciplinary actions even for similar actions (Gregory, Skiba, &

Noguera, 2010; Skiba, Michael, Nardo, & Peterson, 2002; Skiba et al., 2011; U.S.

Department of Education, 2016). At a system-wide level, emphasis on rules and

discipline in school policies may create an environment where teachers feel required to

issue high levels of disciplinary citations as part of administrator efforts to promote

student behavior through development of strict rules (e.g., American Psychological

Association [APA] Zero Tolerance Policy Task Force, 2008; Bear, 2008; Fenning &

Rose, 2007). Therefore, behavioral outcomes may have more to do with the adults in the

environment and their perception of and reaction to behavior than with the student’s own

reported risk. Nevertheless, student reported risk was predictive of some important

outcomes.

There was some support for the predictive validity of the BESS SF overall risk

score and behavioral outcomes as demonstrated via longitudinal associations. Consistent

with the work of King et al. (2012) and Chin et al. (2013), the BESS SF overall risk score

obtained at the beginning of the school year significantly predicted the number of times

that students were suspended throughout the rest of the school year. Building on past

research, the BESS SF was also found to predict the total number of days that students

were suspended. Students with higher risk were suspended more frequently and for more

days than students with lower risk, supporting the study hypothesis and providing

evidence of the ability of the BESS SF to predict negative student outcomes. Providing

evidence in support of expanding conceptualizations of behavioral outcomes used in

validation studies, students scoring higher on the BESS SF were found to receive more

Major Discipline Citations than those with low levels of risk. As indicated by their name,

Major Discipline Citations represent severe behaviors that are consistent with

suspendable offenses (Child Trends & EMT Associates Inc., 2016; Educational and

Community Supports, 2016; Gion et al., 2014). Although hypervigilance may play a part

in increasing the overall number of suspensions and Major Discipline Citations within

this population, the severity of many of the behaviors necessary to warrant these actions

(e.g., Bullying/Taunting, Defacing School Property, Stealing) indicates that students are

likely exhibiting behaviors that warrant concern. The serious nature of these infractions is

associated with difficulties with behavioral and emotional control even if less severe

disciplinary actions may have been appropriate, allowing for the prediction of these

behaviors based on overall risk. Together, these results support study hypotheses and

provide evidence that the BESS SF is able to predict “serious” and clinically meaningful

negative behavioral outcomes, including suspensions and major violations of school

behavioral expectations.

Consistent with this finding and in support of the study hypothesis, overall risk

was negatively associated with Positive Behaviors. This paints a picture of students who

are cited for negative behaviors while receiving fewer commendations for positive

behaviors. It is possible that children exhibiting inappropriate behaviors are not likely to

receive as much positive attention and recognition from their teachers. Instead, they are

disciplined which may, in turn, cause them to disengage further from school, resulting in

a reciprocal relationship between disciplinary actions and engagement (Wang &

Fredricks, 2014). As they become more disengaged from school, they may be less prone

to exhibiting positive behaviors, falling into a dangerous cycle of problematic behavior,

discipline, and disengagement from school.

Although these data provide support for the predictive validity of the BESS SF

overall risk score, examinations of classification accuracy call its effectiveness at

predicting negative behavioral outcomes into question. Similar to past research (e.g.,

King et al., 2012; Chin et al., 2013), the BESS SF was a better predictor of which

students were not at risk for negative behavioral outcomes than of which students were at

risk for such outcomes. This result was consistent across classification accuracy statistics

and logistic regressions. Although the BESS SF overall risk score had better positive

predictive power than in the Chin et al. (2013) study with respect to suspensions (Current:

45.714%, Chin: 14.252%), results in the current study demonstrated lower sensitivity and

negative predictive power than found by Chin and colleagues (Current: 18.391% and

61.622%, Chin: 32.485% and 94.300%). Specificity was almost identical for suspensions

in both studies (Current: 85.714%, Chin: 85.107%). These results do not support the

utility of the BESS SF at identifying at-risk youth within this population.

With the exception of specificity, the classification accuracy statistics for

suspensions fall below the recommended standards of 70% to 80% classification

accuracy for screening instruments (American Academy of Pediatrics, 2012; Glover &

Albers, 2007), indicating problems with the use of the BESS SF as a clinical measure of

negative behavioral outcomes in youth despite the predictive ability of the overall BESS

SF score. High suspension rates in the current study may have decreased the

meaningfulness of suspensions as an indicator of negative behavioral outcomes as it

failed to represent a clinically meaningful cut-off point (Glaros & Kline, 1988; Streiner,

2003). In contrast to Chin et al. (2013) who reported that 7.080% of participants were

suspended one or more times, 39.545% of students participating in the current study were

suspended at least once. Due to the high rate of suspensions within the population, a

higher cut-off than one suspension may be necessary to distinguish at-risk students from

those who are not at-risk.

Although hypervigilance towards behavior may not have inhibited the prediction

of severe disciplinary actions due to the serious nature of the infractions and associated

difficulty with behavioral and emotional control, the same may not be true for more

minor disciplinary concerns. This is reflected in the inability of the BESS SF to predict

Minor Discipline Citations which indicates that some factor or factors outside of the child

may better account for these disciplinary actions. One way that student control over

outcomes may be restricted is through teacher hypervigilance to minor behaviors,

resulting in the issuance of citations based on low thresholds of inappropriate behavior

and/or misinterpretation of behaviors as problematic. It is possible that implicit bias on

behalf of teachers played a role in the high rates of disciplinary actions. Past research has

found that African American students are disciplined at higher rates than their peers, even

for similar actions (Gregory et al., 2010; Skiba et al., 2002; Skiba et al., 2011; U.S.

Department of Education, 2016). Unlike the severe behaviors resulting in suspensions

and Major Discipline Citations, those resulting in Minor Discipline Citations tend to be

more normative rule violations (e.g., Causing Distractions/Disturbances, Gum Chewing,

Running in Hallway/Stairway) that cannot be treated as true indicators of risk without knowing the

duration and intensity of these events. Using Off Task as an example, a student who is

given a citation for being off task for briefly daydreaming who still managed to complete

his/her work demonstrates a different severity of behavior from a student who received a

citation for being off task who was unfocused for long periods of time, resulting in work

incompletion. If a teacher immediately cites both students, the teacher’s hypervigilance to

the behavior may obscure the utility of the citation as an indicator of risk. If a student is

cited as Not Following Directions for being out of his/her seat without the teacher

recognizing that he/she was retrieving a dropped pencil, this behavior is misinterpreted as

problematic when it is not. It is also important to consider the larger system in which the

teachers are operating due to the influence of school policies on the behavior of

individual teachers (Bronfenbrenner, 1977, 1986; Fenning & Rose, 2007; Foreman &

Zins, 2008; Nastasi, Moore, & Varjas, 2004). School-wide emphasis on strict behavioral

codes may result in hostile school environments with high rates of discipline citations

such as seen here (e.g., APA Zero Tolerance Policy Task Force, 2008; Bear, 2008).

Additionally, some of the included minor behaviors may be more indicative of

life circumstances outside of the control of students that commonly impact youth from

low socioeconomic status backgrounds (Overstreet & Mazza, 2003). For example, a

student may be out of uniform due to lack of funds to complete laundry that week or

sleeping during class because noise in their neighborhood kept them up the night before.

Rather than looking solely at the risk associated with factors internal to students, it is

important to consider how external factors including teacher interpretation of behavior

through a lens of implicit bias, school policies, and socioeconomic status (SES) influence

student behavior.

A lack of student control in determining school attendance may also explain the

fact that contrary to past research (King et al., 2012), overall risk did not predict school

absences despite the fact that 32.727% of students were absent from school for 10 or more

days during quarters two through four. In comparison, 15.9% of fifth- through

eighth-grade students attending public schools within the same city missed 10 or more

days during the entire school year (Sims & Vaughn, 2014). Rather than representing

behavioral risks specific to the child, the high absence rate may be influenced by factors

outside of their control, such as factors related to SES (Chang & Romero, 2008;

Morrissey, Hutchison, & Winsler, 2014). Children growing up in low SES families may

have more difficulty attending school regularly due to several concerns, including

housing instability and neighborhood safety. Additionally, many low-SES families do not

have access to reliable transportation. In cities where neighborhood-based schools have

been replaced with charter systems, as has happened in the city where this study was

conducted (Kamenetz, 2014), families can experience problems with school access due to

transportation issues. For example, if something happens to their method of

transportation (e.g., car breaks down, bus arrives early), students may have to miss a day

of school due to the lack of an alternate means of getting there. Therefore, absences may mean

something different in the current study than in King et al. (2012); rather than

representing attendance problems related to self-reported risk (e.g., skipping), absences

for this population may reflect the influence of factors outside of student control such as

SES and associated difficulties.

Examinations of classification accuracy were consistent in that excessive

absences (10+) versus non-excessive absences were not found to be a clinically

meaningful indicator of behavioral risk, reaching an acceptable standard for specificity

only (82.432%). Although absences had better specificity than ODRs in King et al.

(2012; Current: 85.714%, King: 73.184%) and better positive predictive power than past

examinations of suspensions (Current: 25.714%, Chin et al., 2013: 14.252%), it was

weaker on all other previously gathered classification accuracy measures. Due to the high

rate of absences within the population, a higher cut-off than 10 absences may be

necessary to distinguish students at risk for behavioral outcomes from those who are not

at risk. Another possible explanation for the differential findings is the

operationalization of absences/attendance in the current study versus King et al. (2012). In

the current study, the variable of interest was number of absences, looking specifically at

quarters two through four, while King et al. (2012) looked at the percentage of days that

students attended school on time, including tardies. As a result, the outcome variable in

King et al. (2012) may differ in nature from the one employed in the current

study.
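
The classification accuracy statistics discussed above all derive from a two-by-two table crossing screener status (at risk versus not at risk) with outcome status (e.g., 10 or more absences versus fewer). The following is a minimal sketch of that arithmetic, not the analysis code used in this study. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_accuracy(tp, fp, fn, tn):
    """Return classification accuracy statistics as percentages.

    tp: flagged at risk by the screener AND showed the outcome
    fp: flagged at risk but did NOT show the outcome
    fn: not flagged but showed the outcome
    tn: not flagged and did not show the outcome
    """
    return {
        # Of students with the outcome, the share the screener flagged
        "sensitivity": 100 * tp / (tp + fn),
        # Of students without the outcome, the share the screener did not flag
        "specificity": 100 * tn / (tn + fp),
        # Of flagged students, the share who showed the outcome
        "ppv": 100 * tp / (tp + fp),
        # Of non-flagged students, the share who did not show the outcome
        "npv": 100 * tn / (tn + fn),
    }


# Hypothetical counts for illustration only (NOT the study's data)
stats = classification_accuracy(tp=10, fp=30, fn=20, tn=140)
for name, value in stats.items():
    print(f"{name}: {value:.3f}%")
```

With these illustrative counts, specificity and negative predictive power are high while sensitivity and positive predictive power are low. This is the same general pattern described here: a screener that is better at identifying who will not show the outcome than who will.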

In summary, although linear regressions indicate that the BESS SF is able to

predict engagement in severe inappropriate behaviors (e.g., Suspensions, Major

Discipline Citations), further investigation into its classification accuracy revealed that

the BESS SF is better at predicting who will not demonstrate problematic outcomes than

predicting those who will, which is consistent with past research (Chin et al., 2013; King

et al., 2012). Glover and Albers (2007) argue that low positive predictive power and high

sensitivity may be ideal for screening instruments as this decreases the risk of missing

youth who are in need of prevention and intervention, but the current results were far

below the 70 – 80% standard for classification accuracy (American Academy of

Pediatrics, 2012). Although the BESS SF was designed to over-identify youth so as to

decrease the possibility of failing to identify youth in need of prevention and intervention

(Kamphaus & Reynolds, 2007; King et al., 2012), use of the BESS SF in the current

population as a screening tool may overwhelm the resources of the school to complete

necessary follow-up assessments. As the purpose of universal screeners is to provide

efficient and effective means of proactively identifying youth in need of prevention and

intervention efforts in order to decrease the possibility of negative behavioral outcomes,

the current study calls into question the acceptability of the BESS SF as a screener

for use within a comprehensive system for low SES, African American youth.

One way to address possible concerns about the predictive validity of the BESS

SF overall score is to examine how the underlying factor structure of the BESS can be

used to enhance its ability to predict behavioral outcomes. This study intended to

investigate the predictive utility of the four-factor bifactor model developed by Naser et

al. (2016) but was precluded from doing so due to the unacceptable fit of the bifactor

model to the current data. Instead, the current study examined the predictive ability of the

domain-specific factors based on the BASC-2 composites from which the items were

drawn (Internalizing Problems, Inattention/Hyperactivity, School Problems, Personal

Adjustment; Kamphaus & Reynolds, 2007) separately from the overall risk score.

Through these analyses, Inattention/Hyperactivity emerged as the main predictor

of student behavioral outcomes, serving as the sole predictive factor for Suspensions and

Major and Minor Discipline Citations. On examination, it is clear that the items

composing the Major and Minor Discipline Citations categories are largely associated

with externalizing behaviors (e.g., Causing Distractions/Disturbances, Throwing,

Bullying/Taunting). Although the specific incidences for which students were suspended

are unavailable, a review of state-endorsed suspendable behaviors also reflects an emphasis

on externalizing behaviors (e.g., Cursing/Vulgar Language, Fighting, Willful

Disobedience; Child Trends & EMT Associates, Inc., 2016). As the

Inattention/Hyperactivity factor represents risk related to externalizing behaviors such as

talking while others are talking and having difficulty staying still, it is not surprising that

student-endorsement of risk related to this factor would be associated with these

outcomes. In fact, these items specifically address school-related behaviors such as having

trouble paying attention to the teacher and standing in lines. Student response to such

questions may reflect their experiences being corrected in school for these specific

behaviors whether or not students agree that their particular behaviors are problematic

(Phares & Compas, 1990). At this time, research examining the relationship between

youth-reported risk related to specific domains of functioning and behavioral outcomes is

lacking, with the majority of available research on school-occurring externalizing

behaviors and behavioral consequences relying solely upon teacher-reported information

(e.g., McIntosh, Campbell, Carter, & Zumbo, 2009). Although past research has found

that correlations between teacher- and youth-reported externalizing behaviors/risk tend to

be at the moderate level at best (Achenbach et al., 1987; De Los Reyes & Kazdin, 2005;

Salbach-Andrae, Lenz, & Lehmkuhl, 2009), this research has largely been conducted

using longer scales with a wider variety of questions than those included on the domain-

specific factors of the BESS SF. By restricting the included questions to those more

specifically relevant to school settings as occurs on the Inattention/Hyperactivity domain-

specific factor, the reliability of youth reports of their behavior may have been enhanced,

resulting in the observed associations. It is imperative that future researchers make efforts

to include assessments of youth-reported risk such as the BESS SF rather than relying

solely on teacher- or parent-reported risk related to externalizing behaviors to assess this

possibility.

Inattention/Hyperactivity and School Problems were both significant predictors of

Positive Behaviors. One way to make sense of this relationship is through the lens of

school engagement (e.g., Fredricks, Blumenfeld, & Paris, 2004; Wang & Fredricks,

2014). There are multiple related domains composing the construct of school

engagement, including behavioral engagement (e.g., following school rules, exhibiting

academically relevant behaviors such as effort and enthusiasm, participating in activities),

emotional engagement (e.g., identification with and affect towards school), and cognitive

engagement (e.g., desire and focus on learning). As items on the School Problems factor

are largely associated with lack of engagement and enjoyment of school, it can be

conceptualized as a measure of emotional engagement in school.

Inattention/Hyperactivity can be conceptualized as a measure of behavioral engagement

as it focuses on difficulties paying attention and conforming to school behavioral

expectations. The items composing the Positive Behavior composite, such as Exemplary

Effort and Doing More Than Asked, can also be conceptualized as indicators of student

engagement. As indicators of various aspects of school engagement, the association

between Inattention/Hyperactivity, School Problems, and Positive Behavior makes

conceptual sense. Although research specifically examining the relationship between

school engagement and behaviors outside of academic achievement is in its infancy,

initial findings indicate a negative and reciprocal association between school engagement

and problematic behavior in adolescents (e.g., substance use, delinquency; Wang &

Fredricks, 2014). In the current study, students who received fewer citations for positive

behaviors were also found to receive more negative disciplinary actions. Applying the

reciprocal relationship found by Wang and Fredricks (2014), students who demonstrate

fewer positive behaviors associated with school engagement demonstrate more

problematic behaviors, for which they are disciplined. Disciplinary actions may involve

missed class time and/or negative interactions with the school, which then further

decreases school engagement and the cycle continues. By tapping into school

engagement, low citations for Positive Behaviors may provide an alternate way to

identify students who are at risk for negative behavioral outcomes outside of the

examination of disciplinary actions.

None of the domain-specific factors significantly predicted student absences. This

was unexpected, especially with respect to Internalizing Problems as youth with

internalizing problems frequently have higher rates of absences from school (e.g., Zolog

et al., 2011) than youth without internalizing problems. The fact that absences were not

predicted by Internalizing Problems provides further evidence of the need to examine the

role of influences outside of student control on absence rates within this population as

discussed above.

In summary, the domain-specific factors demonstrate the potential to enhance the

predictive ability of the BESS SF. Specifically, Inattention/Hyperactivity and School

Problems may serve to predict those at-risk for specific types of behavioral outcomes. As

classification accuracy associated with the domain-specific factors was not investigated,

it remains to be seen whether or not domain-specific risk demonstrates an improvement

in prediction of who is at-risk versus who is not at-risk over that obtained using the BESS

SF overall risk score. Although further validation studies are necessary as are

investigations into factors that impact the relationship between risk and observed

behavioral outcomes, the current study represents an important step in this line of

research.

Limitations

A major limitation for this study was the relatively small sample size, which

limited the ability to fit a bifactor model to the BESS SF data. Although bifactor models

can be run successfully with sample sizes similar to the current study (N = 230), smaller

sample sizes are more prone to estimation problems (Brunner et al., 2012; MacCallum,

Widaman, Zhang, & Hong, 1999; Yang & Green, 2010). It is possible that with a larger

sample size the bifactor model would have fit to the current data. Alternatively, it is

possible that an alternate bifactor structure would have been a better fit to the current data

than the model found by Naser et al. (2016). By conducting a confirmatory factor

analysis without performing any exploratory factor analyses, possible alternative models

may have been overlooked. The lack of a bifactor model with acceptable fit indices

limited the ability to test the predictive validity of the domain-specific factors over and

above the overall BESS SF risk score as it was not possible to weight the factors.

Therefore, the factors were not orthogonal with the overall risk score resulting in

multicollinearity. Although it was possible to complete analyses examining the predictive

ability of the domain-specific factors without the inclusion of the overall factor, the

sample size and statistical composition of the data thwarted true examination of the

predictive power of a bifactor model of the BESS SF.

The predictive ability of the domain-specific factors may also have been hindered

by reliance on outcome variables that focused on observable behaviors recorded by

teachers, which generally consisted of behaviors associated with externalizing rather than

internalizing concerns. Although one of the advantages of using universal screening over

traditional methods of identification of students in need of prevention and intervention

efforts is the inclusion of items assessing risk for internalizing behaviors (Achenbach et

al., 1987; Walker et al., 2005), it is not possible to assess the ability of the BESS SF to

predict internalizing problems without representing them among outcomes.

The current study also lacked an indicator of academic performance such as grade

point average. Due to the connection between behavioral and emotional risk and impaired

academic performance, identifying students in need of socioemotional interventions

through universal screening can facilitate the provision of interventions that also serve to

improve academic performance (e.g., Eklund & Dowdy, 2014; King et al., 2012; Zins et

al., 2007). Although some indicators of academic performance were included as part of

the Positive Behavior variable (e.g., High Academic Achievement), the failure to include

a specific academic indicator inhibits the ability to evaluate the BESS SF as a predictor of

academic performance in African American youth.

Efforts to improve the clinical meaningfulness of behavioral outcomes in the

current study may have been hampered by the high rates of absences, suspensions, and

citations for minor and major behaviors amongst the participants. As a result, chosen

suspension and absence cut-off points may have lost their clinical significance (Glaros &

Kline, 1988; Streiner, 2003), resulting in the poor classification accuracy of the BESS SF.

This is despite the fact that the current study may actually underrepresent suspensions and

absences as data represented quarters two through four rather than the whole school year.

Although the goal of this study was to make longitudinal predictions of negative

behavioral outcomes, concurrent validity is also important to identifying students in need

of prevention and intervention (Dowdy et al., 2012; Glover & Albers, 2007). Future

studies should seek to assess both concurrent and predictive validity of the BESS SF.

Another area of limitation for this study is the use of teacher-gathered data for the

Major and Minor Discipline Citations and Positive Behaviors. As teachers are responsible

for the education and supervision of large groups of students, it is highly likely that they

miss occurrences of behaviors as they work to complete the large variety of tasks that are

required as part of their job (e.g., Putnam, Luiselli, Handler, & Jefferson, 2003). Even the

most conscientious teacher is not going to be able to observe and record every instance of

every behavior included within the Kickboard system. Additionally, if individual teachers

conceptualize behaviors differently or have different thresholds for issuing citations, the

integrity of the data could be compromised (Education and Community Supports, 2016;

Kaufman et al., 2010; McIntosh et al., 2009; Putnam et al., 2003). For example, one

teacher may include both passive and active behaviors in their consideration of whether

or not a student is off task, while another teacher may focus only on active off-task

behaviors. As classification into Major or Minor categories was completed retroactively

based on behavioral categorization without specific knowledge of the behavior leading to

the citation, differences in conceptualization of categories could result in incorrect

assumptions regarding severity of incident. For example, one teacher could cite a

noncompliant student for Not Following Directions, a Minor Discipline Citation, while

another one views that same behavior as Willful Disobedience, a Major Discipline

Citation. Finally, one of the proposed reasons for using a self-report measure such as the

BESS SF instead of teacher report screeners is to reduce the influence of implicit teacher

bias on referrals (Raines et al., 2012). By focusing mainly on outcomes that are recorded

by teachers and other school employees, the element of bias is introduced back into the

equation. Hypervigilance to behaviors, whether due to individual bias or application of

school policies, may result in misinterpretation of non-problematic behaviors as

problematic and citations for behaviors at such a low threshold that they fail to indicate

true risk. This likely impacted Minor Discipline Citations more than Major ones as the

severity of behavior necessary to warrant Major Discipline Citations may indicate

difficulties with behavioral and emotional control even if less severe disciplinary actions

may have been appropriate. Therefore, for all these reasons, relying on teachers to input

data constitutes a limitation for the current study.

Implications and Future Directions

As schools seek ways to proactively identify students who are at-risk for negative

behavioral and emotional outcomes through universal screening, it is imperative that they

have access to measures that are appropriate to their needs and have been validated for

their population (Glover & Albers, 2007; Young et al., 2010). Towards this goal, the

current study sought to examine the predictive validity of the BESS SF with a low

socioeconomic status, African American population attending a public charter school in a

Southeastern city. Although the BESS SF was able to predict disciplinary actions related

to severe problematic behaviors (e.g., Suspensions, Major Discipline Citations), it proved

better at predicting those who will not demonstrate problematic outcomes than predicting

those who will as demonstrated by problematically low classification accuracy for

suspensions and attendance. This calls into question its utility as an effective and efficient

tool for use as part of a comprehensive system for identifying students in need of

prevention and intervention services. To some degree, the low classification accuracy of

the BESS SF is intentional in order to limit false negatives that result in students in need

not receiving appropriate services with the intention being that follow-up assessment as

part of a comprehensive screening and intervention will separate the students who are

truly at-risk from those who were falsely identified (Glover & Albers, 2007; Kamphaus

& Reynolds, 2007; Levitt et al., 2007). The BESS SF is not intended to be the sole

decision point for service provision. However, the high rates of over-identification

indicated in the current study cause concerns about the possibility of overwhelming

schools that are already strapped for resources (Glover & Albers, 2007), as many urban

schools are. As the BESS SF demonstrated the ability to predict behavioral outcomes

using regressions, it is imperative that researchers continue to investigate why this does

not translate to acceptable classification accuracy.

One area for further investigation is the clinical meaningfulness of indicators of

behavioral outcomes. Determinations of classification accuracy are dependent on the

clinical meaningfulness of included variables, both those that are used to predict

outcomes and those that are used to measure outcomes (Glaros & Kline, 1988; Streiner,

2003). If the specific cut point chosen to separate those exhibiting problematic behavior

from those who do not is not clinically meaningful, then classification accuracy can

appear to be worse than if another outcome and/or cut point were chosen. Rather than

representing a failure of the BESS SF to identify at-risk students, the low classification

accuracy statistics for absences and suspensions could be due to the decision to use cut

points that are too low to indicate clinically meaningful problematic behavior outcomes.

When 39.545% of students were suspended at least once and 32.727% were absent at least

ten days, cut points of one or more suspensions and ten or more absences may no longer

separate problematic from non-problematic behavioral outcomes. Instead, it may be

necessary to employ stricter standards of what constitutes problematic behavioral

outcomes in this population. For example, Chang and Romero (2008) advocate

conceptualizing “chronic absence as missing 10 percent or more of the school year …

regardless of whether absences are excused or unexcused” (p. 3). Alternatively, rather

than relying on predetermined cut-off points, future studies can employ statistical

methodologies such as the application of receiver operating characteristic (ROC) curves to

identify clinically meaningful cut-off points for suspensions and absences that are

relevant to the specific population of interest (see Burke et al., 2012 for a demonstration

of this process). Additionally, this process can be used to determine clinically meaningful

cut points for new conceptualizations of problematic behavioral outcomes such as Major

and Minor Discipline Citations and Positive Behaviors. Once validated and normed, the

classification accuracy of the BESS SF domain-specific factors should also be

investigated.
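
One way such a cut-point search could look is sketched below. This is only an illustration under stated assumptions and does not reproduce the procedure demonstrated by Burke et al. (2012). It treats the number of absences as the score, treats screener at-risk status as the reference classification, and selects the cut-off that maximizes Youden's J (sensitivity + specificity − 1). The choice of Youden's J as the criterion is an assumption of this sketch, the function names are hypothetical, and the data are invented for illustration.

```python
def roc_points(counts, reference):
    """Return (cut-off, sensitivity, specificity) for each observed count.

    A student is classified as showing the outcome when count >= cut-off.
    `reference` holds 1 (at risk) or 0 (not at risk) for each student.
    """
    n_pos = sum(reference)
    n_neg = len(reference) - n_pos
    points = []
    for cutoff in sorted(set(counts)):
        tp = sum(1 for x, r in zip(counts, reference) if x >= cutoff and r)
        fp = sum(1 for x, r in zip(counts, reference) if x >= cutoff and not r)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        points.append((cutoff, sensitivity, specificity))
    return points


def best_cutoff(points):
    """Pick the point with the largest Youden's J (assumed criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


# Invented data for illustration only (NOT the study's data)
absences = [2, 4, 5, 8, 9, 11, 12, 15, 18, 22]
at_risk = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

cutoff, sens, spec = best_cutoff(roc_points(absences, at_risk))
print(f"cut-off: {cutoff}+ absences, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented sample the selected cut-off is higher than ten absences. This mirrors the possibility raised above that a stricter standard may be needed in populations with high base rates of absence. Other criteria, such as the point closest to the upper-left corner of the ROC plot, could select a different cut-off.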

As part of investigations of the appropriateness and clinical utility of chosen

outcome variables, it is important that future researchers investigate other factors that

could impact the observed relationship between student-reported risk and problematic

behavioral outcomes. Despite the high likelihood of exposure to life stressors and

traumatic experiences connected to growing up in poor, urban environments (New

Orleans Parents’ Guide, 2014; Overstreet & Mazza, 2003), students were identified as at-

risk at generally normative rates based on a three-tiered model (e.g., Splett et al., 2013).

This could indicate that the BESS SF is not an appropriate tool for universal screening in

schools serving urban, low socioeconomic status (SES), African American youth.

Alternatively, low self-reported risk could also indicate true resiliency on part of these

students; however, students’ self-reported resilience does not seem to be reflected in their

teachers’ responses to them in the school environment as indicated by the high rates of

disciplinary actions. Instead, the school environment may be particularly “reactive”

and characterized by teacher hypervigilance to behavioral infractions. Although the use

of self-report measures may serve to decrease the over-identification of at-risk youth

(Raines et al., 2012), the assessed outcomes with the exception of absences were

determined by teachers, and, therefore, are subject to potential bias. If teachers hold

unconscious biases, they may be more likely to perceive behaviors as violations worthy

of disciplinary citations for African American students that would not result in

disciplinary action for European American students, resulting in higher rates of office

discipline referrals and suspension for African American students (Gregory et al., 2010;

Skiba et al., 2002; Skiba et al., 2011; U.S. Department of Education, 2016). As a result,

even students with low risk of problematic behavioral outcomes may receive high

numbers of disciplinary actions, explaining the problems with classification accuracy for

suspensions observed in this study. In order to assess the role of teacher perceptions of

behaviors, future research should investigate differential classification accuracy and

predictive validity of the BESS SF using a variety of behavioral outcomes as mediated by

implicit bias of teachers. It is also possible that students see themselves as more resilient

than teachers perceive them to be. This possibility is supported by past findings of low

correlations between BESS SF and TF overall risk scores (interrater reliability = .393, p

< .01; King et al., 2012). Future studies could benefit from including assessments of risk

from multiple informants to evaluate differences in perceived risk and the impact of these

differences on demonstrated behavioral outcomes.

Future research should utilize systems-based approaches to consider how school

disciplinary policies impact the disciplinary behavior of individual teachers

(Bronfenbrenner, 1977, 1986; Bear, 2008; Fenning & Rose, 2007; Foreman & Zins, 2008;

Nastasi et al., 2004). Even if teachers do not perceive certain policies as necessary or fair,

they may feel pressured to conform to administrative policies, resulting in hypervigilance

to behaviors. This school-wide emphasis on strict behavioral codes may result in hostile

school environments with high rates of discipline citations such as found in the current

study (e.g., APA Zero Tolerance Policy Task Force, 2008; Bear, 2008; Fenning & Rose,

2007). Future studies should assess the role of administrative policies emphasizing strict

application of rules on the relationship between student risk status and behavioral

outcomes.

In addition to the race/ethnicity of the students, this sample differed from many

others due to the high representation of students from low socioeconomic backgrounds.

Children growing up in low SES families are exposed to circumstances outside of their

control that may impact their ability to comply with behavioral expectations (Overstreet

& Mazza, 2003). Noncompliance with several of the behaviors resulting in Minor

Disciplinary Citations may be more indicative of such circumstances than of personal

risk. For example, students may be cited for “No Learning Supplies” not because they

forgot to bring pencils to class or because they do not care about school, but because all

their pencils are broken and their families cannot afford to buy new ones. The context of

growing up in poverty may also impact student absences as transportation issues, housing

instability, and neighborhood safety can make it difficult to get to school (Chang &

Romero, 2008; Morrissey et al., 2014). By learning more about the contextual factors

influencing student behavioral outcomes, appropriate prevention and intervention

strategies can be implemented that target the true cause of their behaviors rather than

focusing on behavioral and emotional risk factors specific to the student.

Another area for future investigation is the role of school engagement as a

potential mediator of the relationship between risk and problematic behaviors. On

examination of the questions composing the BESS SF domain-specific factors, it was

possible to make comparisons between Inattention/Hyperactivity and School Problems

and the concepts of behavioral engagement and emotional engagement, respectively.

These two factors were significant predictors of Positive Behaviors, which can be

conceptualized as reflective of overall engagement in school. The reciprocal association

between low school engagement and problematic behavior is concerning (Chang &

Romero, 2008; Fredricks et al., 2004; Wang & Fredricks, 2014), especially in light of the

high disciplinary actions in this school. Students who experience high rates of

disciplinary action miss academic time and may become further disengaged from school,

resulting in escalating problematic behaviors (Gregory et al., 2010; Wang & Fredricks,

2014). In contrast, engaged students may exhibit more positive behaviors, for which they

receive praise rather than discipline, which may, in turn, increase their engagement in

school. By tapping into school engagement, Positive Behaviors may provide an

alternate way to identify students who are at risk for negative behavioral outcomes

outside of the examination of disciplinary actions. Future studies should examine the role

that student-reported school engagement plays in the relationship between risk status and

problematic behavioral outcomes including low receipt of citations for Positive

Behaviors.

The high number of disciplinary actions taken within this school is concerning

as restrictive discipline policies are associated with higher rather than lower rates of

problematic behaviors (Gregory et al., 2010; Wang & Fredricks, 2014; Way, 2011). As it

is known that positive reinforcement of appropriate behaviors is more effective at

promoting engagement in positive behaviors than punishment of inappropriate behaviors,

one possible way to provide prevention and intervention to students at-risk for

problematic behaviors may be to focus on providing specific instruction in the behaviors

required to earn merit awards. Instead of implementing strict disciplinary policies,

increasing positive behavioral supports through policies focusing on the importance

of positive reinforcement in schools and clarity of expectations may help students

experience improved connections with teachers and the school as a whole resulting in

increased engagement and motivation to perform academically and behaviorally (APA

Zero Tolerance Policy Task Force, 2008; Bear, 2008; Fredricks et al., 2004).

Additionally, instruction in social-emotional and behavioral self-management within the

context of a supportive system of reinforcements and rewards for all students should help

reduce high levels of disciplinary action seen in this school. By reducing hypervigilance

to undesired behaviors, schools may enhance the clinical meaningfulness of disciplinary

actions as indicators of problematic outcomes as their issuance will better reflect

occurrences of problematic behaviors. As specific behavioral outcome data including

Positive Behaviors for the current study were collected as part of the school-wide positive

behavioral intervention and supports system (SWPBIS), it is clear that the school is

attempting to implement such a system, but the fidelity with which it is being

implemented is unclear at this time. Future research should seek to explore how SWPBIS

implementation can impact the relationship between risk status and problematic behavior

outcomes, specifically examining how reinforcement of positive behaviors and

disciplinary actions impact school engagement.

The current study provided initial evidence for new conceptualizations of

problematic behaviors, especially Major Discipline Citations and Positive Behaviors.

Despite this, the reliance on teacher-gathered data may have negatively impacted the

reliability of these data due to the influence of implicit bias, school policy issues, potential

differences in variable conceptualization, and difficulty gathering data in light of other

responsibilities (Education and Community Supports, 2016; Fenning & Rose, 2007;

Gregory et al., 2010; Kaufman et al., 2010; McIntosh et al., 2009; Putnam et al., 2003;

Skiba et al., 2002; Skiba et al., 2011). One way to lessen the impact of this problem

would be to provide teacher trainings on the operationalization of variables, including a

focus on appropriate use of the Kickboard system with implementation integrity checks

occurring in classrooms throughout the year. Special attention to implicit bias

and hypervigilance to behaviors can be included as part of these trainings. Consultation

and trainings must also include school administration and other stakeholders who have a

role in determining school policies in order to improve buy-in and support of teachers as

they strive to change the way they discipline students and improve their cultural

competence (Bear, 2008; Fenning & Rose, 2007; Foreman & Zins, 2008; Nastasi et al.,

2004). Without institutional acceptance of the recommended changes, teachers may

receive conflicting messages regarding disciplinary expectations, decreasing the

effectiveness of any consultation efforts. Securing such institutional support may also

improve the overall implementation of SWPBIS, which, in turn, should serve to increase student

engagement and motivation to perform academically and behaviorally (APA Zero

Tolerance Policy Task Force, 2008; Bear, 2008). The perceived need for high rates of

disciplinary actions may be decreased through efforts to improve the reliability of data

collection.

Additionally, work can be done to improve the specificity of the Kickboard

variables themselves. For example, behaviors could be classified as Major or Minor at the

time of incident to decrease the possibility of misclassification. Alternative categorization

schemas should also be tested. The Major versus Minor classification system based on

the work of Horner and colleagues (Educational and Community Supports, 2016) utilized

by this study is only one way to categorize types of behavioral citations that can be issued

to students. Instead of categorizing referrals based on severity of incident, type of

behavior could be used to guide classification. For example, Putnam et al. (2003)

recorded whether behaviors resulting in ODRs were considered aggressive, disruptive,

disrespectful, noncompliant, or other. Using another system, Kaufman et al. (2010)

examined the occurrence of ODRs related to attendance, delinquency, aggression, and

disrespect. Future research should explore the relative merits of classifying behaviors by

severity versus behavior type with respect to feasibility of application and as clinically

meaningful indicators of problematic behavioral outcomes.
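
As a rough illustration of the comparison proposed above, the Python sketch below tallies the same citation log under a severity scheme and under a behavior-type scheme. The severity labels follow the appendix of this study. The behavior-type labels borrow the category names of Putnam et al. (2003), but the assignment of individual Kickboard citations to those categories, the `tally` helper, and the sample log are hypothetical and serve only to show how the two schemes could be compared on identical data.

```python
from collections import Counter

# Severity labels taken from the study's appendix (Major vs. Minor).
SEVERITY = {
    "Bullying/Taunting": "Major",
    "Willful Disobedience": "Major",
    "Stealing": "Major",
    "Horseplay/Play Fighting": "Minor",
    "Off Task": "Minor",
    "Not Following Directions": "Minor",
}

# ASSUMPTION: this mapping of citations to behavior types is illustrative only;
# the category names come from Putnam et al. (2003), the assignments do not.
BEHAVIOR_TYPE = {
    "Bullying/Taunting": "aggressive",
    "Willful Disobedience": "noncompliant",
    "Stealing": "other",
    "Horseplay/Play Fighting": "disruptive",
    "Off Task": "other",
    "Not Following Directions": "noncompliant",
}


def tally(citations, scheme):
    """Count citations per category; unmapped names are counted as 'unclassified'."""
    return Counter(scheme.get(name, "unclassified") for name in citations)


# Hypothetical citation log for one student.
log = ["Off Task", "Bullying/Taunting", "Not Following Directions", "Off Task"]

by_severity = tally(log, SEVERITY)
by_type = tally(log, BEHAVIOR_TYPE)
print(dict(by_severity))
print(dict(by_type))
```

Under the severity scheme this student's record is dominated by Minor citations, whereas the behavior-type scheme separates the single aggressive incident from noncompliance and other behaviors, which is the kind of contrast a feasibility comparison would need to weigh.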

In order to address concerns over the predictive validity of the BESS SF overall

score, efforts should be made to improve its precision. One way to do this is to examine

how the application of factors representing the underlying structure of the BESS can

function to enhance its utility. Efforts to examine the predictive ability of a four-factor

bifactor model of the BESS SF were hindered by the poor fit of the bifactor model

found by Naser et al. (2016) to the current data set. As a result, it was not possible to

determine the predictive ability of its specific factors above and beyond the BESS SF overall risk score.

Future validation studies with larger sample sizes applying the bifactor model obtained

by Naser et al. (2016) and using exploratory factor analyses to determine model structure

should be completed in order to examine the predictive ability of a bifactor model of the

BESS SF. Despite this, analyses of the domain-specific factors consistent with the

BASC-2 domains from which the BESS items are drawn (Kamphaus & Reynolds, 2007)

demonstrated initial evidence of their utility as predictors of negative behavioral

outcomes. Specifically, these analyses demonstrated that students endorsing risk related

to Inattention/Hyperactivity have higher occurrences of negative behavioral outcomes in

school including suspensions and disciplinary citations focused on externalizing

concerns. These students may benefit from interventions designed to improve their

attention and behavioral control. Additionally, students at risk for

Inattention/Hyperactivity and/or School Problems received fewer citations for positive

behaviors than those who were not at risk on these factors. Therefore, students exhibiting

high scores on the BESS SF may benefit from interventions designed to encourage more

positive behaviors through efforts to improve their overall school engagement and the

application of SWPBIS as discussed above. Validation of the domain-specific factors of

the BESS SF as predictors of student behavioral outcomes, and of their utility in

developing targeted interventions based on specific areas of risk, represents an

important area for further research.

The significant association between the BESS SF and Positive Behaviors

provided evidence for expanding the behavioral outcomes examined beyond those related to

discipline and absences. Special attention should be paid to identifying

outcomes that may be predicted by the domain-specific factors, as true examinations of

the predictive ability and classification accuracy of a predictor cannot be made unless

appropriate and clinically meaningful indicators of outcomes are used (Glaros & Kline,

1988). For example, the predictive ability of the Internalizing Problems factor can only

be examined through the presence of an outcome associated with internalizing concerns.

Although such measures may not be feasible for individual schools to implement on a large

scale, future research could include specific measures designed to assess outcomes related

to anxiety and depression. Such efforts are necessary to complete validation studies of the BESS SF.

In conclusion, the current study represents another step towards the validation of

the BESS SF as a universal screening tool that can be used as part of a comprehensive

system of identification of students at risk for negative behavioral and emotional

outcomes. Although the association of the overall BESS SF score with specific

behavioral outcomes seems promising, classification accuracy statistics continue to be

lacking, leading to the recommendation of caution when using the BESS SF to identify

students in need of prevention and intervention efforts. It should not be used, nor is it

intended to be used, without a comprehensive system in place to help separate those who

are truly at risk from the false positives. Despite this, the potential for improvement is

there. Further validation studies must be completed to determine whether meaningful and

appropriate cut-off points for behavioral outcomes can be established and to explore

alternative behavioral outcomes such as positive behaviors and internalizing concerns.

Other factors that may account for the high rates of negative behavior outcomes despite

the low levels of overall risk, such as implicit teacher bias, school policies, and

socioeconomic status, should be explored. The predictive utility of a bifactor BESS model

remains to be seen; however, the predictive utility of the Inattention/Hyperactivity and

School Problems factors shows potential for the usefulness of domain-specific factors. As

early identification and intervention through the use of universal screeners such as the

BESS SF is key to decreasing the short- and long-term negative outcomes associated with

behavioral and emotional difficulties in youth, it is imperative that these validation efforts

be continued.

TABLES

Table 1 Descriptive Statistics

Variable                      M          SD        Lowest    Highest
Age                           11.430     1.666     8.000     15.000
Overall Risk Score            51.200     9.603     31.000    81.000
Minor Discipline Citations    64.209     53.265    0.000     221.000
Major Discipline Citations    9.614      10.470    0.000     40.000
Positive Behaviors            168.286    69.459    38.000    364.000
Inattention/Hyperactivity     11.716     3.608     6.000     23.000
School Problems               12.081     4.168     6.000     23.000
Internalizing Problems        18.449     5.626     10.000    36.000
Personal Adjustment           14.904     4.474     8.000     29.000
Absences (Days)               7.709      5.639     0.000     26.000
Number of Suspensions         0.982      1.567     0.000     6.000
Days Suspended                1.864      3.136     0.000     12.000

Table 2 Correlations Between Demographic, Predictor, and Outcome Variables

Variables                       2        3        4        5        6        7        8        9        10       11       12       13
1. Gender^a                     .081     .008     -.008    .003     .033     -.020    .103     -.058    -.053    -.168*   -.151*   .196**
2. Age                                   -.006    .208**   -.041    -.130    .127     .150*    .375***  .359***  .443***  .396***  .165*
3. Overall Risk Score                             .684***  .686***  .858***  .650***  .041     .147*    .147*    .093     .118     -.159*
4. School Problems                                         .236***  .415***  .442**   .101     .181**   .157*    .239***  .188**   -.171*
5. Personal Adjustment                                              .536***  .202**   .026     .010     .035     -.055    -.016    -.038
6. Internalizing Problems                                                    .406***  .009     .014     .021     -.067    -.006    -.075
7. Inattention/Hyperactivity                                                          -.010    .310***  .291***  .298***  .300***  -.210**
8. Absences                                                                                    .251***  .252***  .183**   .210**   -.314***
9. Number of Suspensions                                                                                .937***  .674***  .660***  -.435***
10. Days Suspended                                                                                               .601***  .611***  -.427***
11. Minor Discipline Citations                                                                                            .892***  -.477***
12. Major Discipline Citations                                                                                                     -.409***
13. Positive Behaviors

^a 0 = Boys, 1 = Girls

Table 3 Prediction of Absences by Overall BESS SF Risk Score

Variable                        b        SE b     β          ΔR2
Step 1                                                       .031*
  Gender                        1.025    0.091    .091
  Age                           0.484    0.143    .143*
Step 2                                                       .002
  Overall BESS SF Risk Score    0.024    0.039    .041

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .033, F (3, 216) = 2.422, p > .05

Table 4 Prediction of Minor Discipline Citations by Overall BESS SF Risk Score

Variable                        b        SE b     β          ΔR2
Step 1                                                       .238***
  Gender                        -0.410   0.119    -.205**
  Age                           0.276    0.036    .460***
Step 2                                                       .009
  Overall BESS SF Risk Score    0.010    0.006    .097

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .248, F (3, 216) = 23.718, p < .001

Table 5 Prediction of Number of Suspensions by Overall BESS SF Risk Score

Variable                        b        SE b     β          ΔR2
Step 1                                                       .149***
  Gender                        -0.278   0.197    -.089
  Age                           0.360    0.059    .383***
Step 2                                                       .023*
  Overall BESS SF Risk Score    0.025    0.010    .151*

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .171, F (3, 216) = 14.893, p < .001

Table 6 Prediction of Days Suspended by Overall BESS SF Risk Score

Variable                        b        SE b     β          ΔR2
Step 1                                                       .136***
  Gender                        -0.520   0.397    -.083
  Age                           0.689    0.119    .366***
Step 2                                                       .022*
  Overall BESS SF Risk Score    0.049    0.020    .150*

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .146, F (3, 216) = 13.525, p < .001

Table 7 Prediction of Major Discipline Citations by Overall BESS SF Risk Score

Variable                        b        SE b     β          ΔR2
Step 1                                                       .191***
  Gender                        -0.369   0.122    -.185**
  Age                           0.247    0.037    .411***
Step 2                                                       .015*
  Overall BESS SF Risk Score    0.013    0.006    .121*

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .205, F (3, 216) = 18.602, p < .001

Table 8 Prediction of Positive Behaviors by Overall BESS SF Risk Score

Variable                        b        SE b     β          ΔR2
Step 1                                                       .061**
  Gender                        0.368    0.132    .184**
  Age                           0.090    0.040    .150*
Step 2                                                       .025*
  Overall BESS SF Risk Score    -0.016   0.007    -.159*

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .086, F (3, 216) = 6.814, p < .001
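
The final model statistics reported for the overall-score regressions can be cross-checked against one another, because the overall F statistic of a regression is a function of R2 and the degrees of freedom. The Python sketch below assumes ordinary least squares with an intercept and a sample of 220 students; that sample size is inferred here from the reported degrees of freedom, F (3, 216), and is not stated in these tables. The function name is illustrative.

```python
def f_from_r2(r2: float, n_predictors: int, n_obs: int) -> float:
    """Overall F statistic implied by R^2 for an OLS model with an intercept.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k predictors and n observations.
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


# Table 4 final model: R2 = .248 with F (3, 216).
# ASSUMPTION: n = 220, inferred from the residual degrees of freedom (216 + 3 + 1).
f_table4 = f_from_r2(0.248, n_predictors=3, n_obs=220)
print(round(f_table4, 2))
```

For Table 4 this yields a value of roughly 23.7, close to the reported 23.718; a small gap is expected because R2 is reported to only three decimals.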

Table 9 Classification Accuracy Using BESS SF Overall Score

Outcome                  Sensitivity   Specificity   Positive Predictive Power   Negative Predictive Power
Absences                 12.500%       82.432%       25.714%                     65.945%
Number of Suspensions    18.391%       85.714%       45.714%                     61.622%
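
The four indices in Table 9 all derive from a 2 x 2 cross-classification of screening status by outcome status. The Python sketch below shows the arithmetic. The cell counts are not reported in the table; they were back-calculated from the published percentages under the assumption of 220 students (inferred from the regression degrees of freedom), so they should be read as an illustration and not as the study's data. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # screened at risk and had the outcome
    fp: int  # screened at risk but did not have the outcome
    tn: int  # screened not at risk and did not have the outcome
    fn: int  # screened not at risk but had the outcome

    def stats(self) -> dict:
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),  # positive predictive power
            "npv": self.tn / (self.tn + self.fn),  # negative predictive power
        }


# ASSUMPTION: counts reconstructed from the Table 9 percentages with N = 220;
# they are not reported in the source.
absences = Confusion(tp=9, fp=26, tn=122, fn=63)
suspensions = Confusion(tp=16, fp=19, tn=114, fn=71)

for name, table in [("Absences", absences), ("Number of Suspensions", suspensions)]:
    percentages = {key: round(100 * value, 3) for key, value in table.stats().items()}
    print(name, percentages)
```

With these counts both outcomes show low sensitivity alongside comparatively high specificity, matching the pattern in Table 9: the screener does better at ruling students out than at flagging those who go on to have the outcome.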

Table 10 BESS SF Bifactor Model Standardized Weight Estimates

Item Description                                               Overall   Specific Factor

Personal Adjustment
9. I am liked by others.                                       .457      .654
21. People think I'm fun to be with.                           .328      .574
30. Others have respect for me.                                .414      .459
15. My parents trust me.                                       .333      .287
26. My parents are proud of me.                                .384      .274
4. I like the way I look.                                      .246      .273
18. My parents listen to what I say.                           .435      .122
1. I am good at making decisions.                              .381      .045

Inattention/Hyperactivity
8. I have trouble paying attention to the teacher.             .360      .538
25. I get into trouble for not paying attention.               .329      .525
2. I talk while other people are talking.                      .272      .480
28. I have trouble standing still in lines.                    .262      .447
11. I have trouble sitting still.                              .294      .363
24. People tell me that I am too noisy.                        .309      .344

Internalizing Problems
13. I feel like people are out to get me.                      .496      .873
14. I worry about what is going to happen.                     .405      .250
3. I worry but I don't know why.                               .441      .155
10. I feel like my life is getting worse and worse.            .714      .148
7. People get mad at me, even when I don't do anything wrong.  .505      .126
16. I am left out of things.                                   .534      .117
27. Even when I try hard, I fail.                              .617      -.082
23. I get blamed for things I can't help.                      .601      -.027
5. I feel out of place around people.                          .588      .025
20. I want to do better but can't.                             .489

School Problems
12. School is boring.                                          .212      .767
17. I hate school.                                             .380      .687
19. Teachers are unfair.                                       .363      .573
29. My school feels good to me.                                .286      .500
6. I feel like I want to quit school.                          .472      .395
22. Teachers make me feel stupid.                              .545      .210

Table 11 Prediction of Absences by the BESS Domain-Specific Factors

Variable                        b        SE b     β          ΔR2
Step 1                                                       .031*
  Gender                        1.025    0.755    .091
  Age                           0.484    0.227    .143*
Step 2                                                       .010
  Internalizing Problems        -0.001   0.051    -.001
  School Problems               0.057    0.045    .102
  Inattention/Hyperactivity     -0.042   0.044    -.074
  Personal Adjustment           0.013    0.045    .023

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .041, F (6, 213) = 1.508, p > .05

Table 12 Prediction of Number of Suspensions by the BESS Domain-Specific Factors

Variable                        b        SE b     β          ΔR2
Step 1                                                       .149***
  Gender                        -0.278   0.197    -.089
  Age                           0.360    0.059    .383***
Step 2                                                       .071**
  Internalizing Problems        -0.009   0.013    -.057
  School Problems               0.001    0.011    .009
  Inattention/Hyperactivity     0.045    0.011    .285***
  Personal Adjustment           -0.001   0.011    -.006

* p < .05, ** p < .01, *** p < .001
Final model statistics: R2 = .220, F (6, 213) = 10.005, p < .001

Table 13 Prediction of Days Suspended by the BESS Domain-Specific Factors

Variable                        b        SE b     β          ΔR2
Step 1                                                       .136***
  Gender                        -0.520   0.397    -.083
  Age                           0.689    0.119    .366***
Step 2                                                       .062**
  Internalizing Problems        -0.015   0.026    -.049
  School Problems               -0.005   0.023    -.016
  Inattention/Hyperactivity     0.085    0.022    .270***
  Personal Adjustment           0.008    0.023    .024


Table 14 Prediction of Minor Discipline Citations by the BESS Domain-Specific Factors

Variable                        b        SE b     β          ΔR2
Step 1                                                       .238***
  Gender                        -0.410   0.119    -.205**
  Age                           0.276    0.036    .460***
Step 2                                                       .079***
  Internalizing Problems        -0.014   0.008    -.141
  School Problems               0.011    0.007    .110
  Inattention/Hyperactivity     0.026    0.007    .263***
  Personal Adjustment           -0.004   0.007    -.043


Table 15 Prediction of Major Discipline Citations by the BESS Domain-Specific Factors

Variable                        b        SE b     β          ΔR2
Step 1                                                       .191***
  Gender                        -0.369   0.122    -.185**
  Age                           0.247    0.037    .411***
Step 2                                                       .065**
  Internalizing Problems        -0.006   0.008    -.058
  School Problems               0.002    0.007    .023
  Inattention/Hyperactivity     0.027    0.007    .270***
  Personal Adjustment           -0.003   0.007    -.029


Table 16 Prediction of Positive Behaviors by the BESS Domain-Specific Factors

Variable                        b        SE b     β          ΔR2
Step 1                                                       .061**
  Gender                        0.368    0.132    .184**
  Age                           0.090    0.040    .150*
Step 2                                                       .073**
  Internalizing Problems        0.010    0.009    .103
  School Problems               -0.017   0.008    -.170*
  Inattention/Hyperactivity     -0.020   0.007    -.201**
  Personal Adjustment           0.000    0.008    -.003


APPENDIX

Development of Specific Behavioral Outcomes Based on Kickboard Data

Category: Major Discipline Citations

Referral Types (Educational and Community Supports, 2016): Abusive Language/Inappropriate Language/Profanity, Bullying, Defiance/Insubordination/Non-Compliance, Fighting, Forgery/Theft/Plagiarism, Inappropriate Location/Out of Bounds Area, Lying/Cheating, Property Damage/Vandalism, Skip Class

Specific Citations: Bullying/Taunting; Cursing/Vulgar Language; Defacing School Property; Forgery; Improper Touching; Leaving Early; Lying; Skipping; Skipping Detention; Stealing; Throwing; Unauthorized Area; Willful Disobedience

Category: Minor Discipline Citations

Referral Types (Educational and Community Supports, 2016): Defiance, Disrespect, Disruption, Dress Code Violation, Property Misuse, Technology Violation

Specific Citations: Causing Distractions/Disturbances; Cell Phone or Electronic Device; Disrespect to Adults, Peers, or Property; Eating in Computer Lab; Giving Up/Making Excuses; Gossiping/Ribbing; Gum Chewing; Horseplay/Play Fighting; Improper Use of Materials; Incomplete Work; Littering; Low/No Participation; No Do Now; No Homework; No Learning Supplies; Not Following Directions; Off Task; Running in Hallway/Stairway; Safe - Lining Up Incorrectly; Sleeping; Talking During Level 0; Tardy to Class; Unauthorized Food or Drinks; Uniform Violation - Jacket Coat Sweater; Uniform Violation - Shirt; Uniform Violation - Shirt Untucked; Uniform Violation - Shoes/Sneakers; Uniform Violation - Socks; Uniform Violation - Wearing Hood

Category: Positive Behaviors

Referral Types: None

Specific Citations: Dedication and Drive; Doing More than Asked; Exemplary Effort; Exemplary Leadership; Exemplary Service to Others; Exemplary Work; High Academic Achievement; High Enthusiasm; Major Improvement; Responsible - Lining Up Correctly; Taking Responsibility for Actions

Category: Removed: Ambiguous

Referral Types: None

Specific Citations: LTS; Parental Involvement; Signed Paycheck

Category: Removed: Not used until 4th quarter

Referral Types: None

Specific Citations: Kindness, Empathy, and Respect for Others

LIST OF REFERENCES

Achenbach, T.M., McConaughy, S.H., & Howell, C.T. (1987). Child/adolescent

behavioral and emotional problems: Implications of cross-informant correlations

for situational specificity. Psychological Bulletin, 101, 213 – 232. doi:

10.1037/0033-2909.101.2.213

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms

& Profiles. Burlington, VT: University of Vermont, Research Center for Children,

Youth, & Families.

Albers, C.A., & Kettler, R.J. (2014). Best practices in universal screening. In P. Harrison

and A. Thomas (Eds.), Best practices in school psychology: Data-based and

collaborative decision making (pp. 121 – 131). Bethesda, MD: NASP.

American Academy of Pediatrics (2012). Addressing mental health concerns in primary

care: A clinician’s toolkit. Mental health screening and assessment tools for

primary care. Retrieved from: https://www.aap.org/en-us/advocacy-and-

policy/aap-health-initiatives/Mental-Health/Documents/MH_ScreeningChart.pdf

American Psychological Association Zero Tolerance Task Force (2008). Are zero

tolerance policies effective in the schools? An evidentiary review and

recommendations. American Psychologist, 63, 852 – 862. doi: 10.1037/0003-

066X.63.9.852

Bear, G.G. (2008). School-wide approaches to behavior problems. In B. Doll and J.A.

Cummings (Eds.), Transforming school mental health services: Population-based

approaches to promoting the competency and wellness of children (pp. 103 –

141). Thousand Oaks, CA: NASP and Corwin Press.

Bronfenbrenner, U. (1977). Toward an experimental ecology of human development.

American Psychologist, 32, 513-531. doi: 10.1037/0003-066X.32.7.513

Bronfenbrenner, U. (1986). Ecology of the family as context for human development:

Research perspectives. Developmental Psychology, 22, 723-742. doi:

10.1037/0012-1649.22.6.723

Brunner, M., Nagy, G., & Wilhelm, O. (2012). A tutorial on hierarchically structured

constructs. Journal of Personality, 80, 796 – 846. doi: 10.1111/j.1467-

6494.2011.00749.x

Burke, M.D., Davis, J.L., Lee, Y.-H., Hagan-Burke, S., Kwok, O.-M., & Sugai, G.

(2012). Universal screening for behavioral risk in elementary schools using

SWPBS expectations. Journal of Emotional and Behavioral Disorders, 20, 38 –

54. doi: 10.1177/1063426610377328

California Department of Education (2010). California Healthy Kids Survey. Retrieved

from http://chks.wested.org

Chang, H.N., & Romero, M. (2008). Present, engaged and accounted for: The critical

importance of addressing chronic absence in the early grades. Retrieved from

http://www.nccp.org/publications/pdf/text_837.pdf

Chen, F.F., West, S.G., & Sousa, K.H. (2006). A comparison of bifactor and second-

order models of quality of life. Multivariate Behavioral Research, 41, 189 – 225.

doi:10.1207/s15327906mbr4102_5

Child Trends & EMT Associates, Inc. (2016). Louisiana compilation of school discipline

laws and regulations. Retrieved from https://safesupportivelearning.ed.gov/

sites/default/files/ discipline-

compendium/Louisiana%20School%20Discipline%20Laws%20and%20

Regulations.pdf

Chin, J.K., Dowdy, E., & Quirk, M.P. (2013). Universal screening in middle school:

Examining the Behavioral and Emotional Screening System. Journal of

Psychoeducational Assessment, 31, 53 – 60. doi: 10.1177/0734282912448137.

De Los Reyes, A., & Kazdin, A.E. (2005). Informant discrepancies in the assessment of

childhood psychopathology: A critical review, theoretical framework, and

recommendations for further study. Psychological Bulletin, 131, 483 – 509. doi:

10.1037/0033-2909.131.4.483

Dowdy, E., Furlong, M.J., & Sharkey, J.D. (2012). Using surveillance of mental health to

increase understanding of youth involvement in high-risk behaviors: A value-

added analysis. Journal of Emotional and Behavioral Disorders, 21, 33 – 44. doi:

10.1177/1063426611416817

Dowdy, E., Twyford, J.M., Chin, J.K., DiStefano, C.A., Kamphaus, R.W., & Mays, K.L.

(2011). Factor structure of the BASC-2 Behavioral and Emotional Screening

System Student Form. Psychological Assessment, 23, 379 – 387. doi:

10.1037/a0021843

Educational and Community Supports (2016). PBIS apps. Retrieved from

https://www.pbisapps.org/Pages/Default.aspx

Eklund, K., & Dowdy, E. (2014). Screening for behavioral and emotional risk versus

traditional school identification methods. School Mental Health, 6, 40 – 49. doi:

10.1007/s12310-013-9109-1

Feeney-Kettler, K.A., Kratochwill, T.R., Kaiser, A.P., Hemmeter, M.L., & Kettler, R.J.

(2010). Screening young children’s risk for mental health problems: A review of

four measures. Assessment for Effective Intervention, 35, 218 – 230. doi:

10.1177/1534508410380557

Fenning, P., & Rose, J. (2007). Overrepresentation of African American students in

exclusionary discipline: The role of school policy. Urban Education, 42, 536 –

559. doi: 10.1177/0042085907305039

Field, A. (2013). Discovering statistics using IBM SPSS Statistics and sex and drugs and

rock ‘n’ roll (4th ed.). Washington, DC: Sage Publications Ltd.

Foreman, S.F., & Zins, J.E. (2008). Section commentary: Evidence-based consultation:

The importance of context and the consultee. In W.P. Erchul and S.M. Sheridan

(Eds.), Handbook of research in school consultation (pp. 361 – 371). New York:

Routledge.

Fredricks, J.A., Blumenfeld, P.C., & Paris, A.H. (2004). School engagement: Potential of

the concept, state of the evidence. Review of Educational Research, 74, 59 – 109.

doi: 10.3102/00346543074001059

Gion, C.M., McIntosh, K., & Horner, R. (2014). Patterns of minor office discipline

referrals in schools using SWIS. Retrieved from https://www.pbis.org/blueprint/

evaluation-briefs/patterns-of-minor-odrs

Glaros, A.G., & Kline, R.B. (1988). Assessing the accuracy of tests with cutting scores:

The sensitivity, specificity, and predictive value model. Journal of Clinical

Psychology, 44, 1013 – 1023. doi: 10.1002/1097-4679(198811)44:63.0.C);2-Z

Glover, T.A., & Albers, C.A. (2007). Considerations for evaluating universal screening

assessments. Journal of School Psychology, 45, 117 – 135. doi:

10.1016/j.jsp.2006.05.020

Gray, M.J., Litz, B.T., Hsu, J.L., & Lombardo, T.W. (2004). Psychometric properties of

the Life Events Checklist. Assessment, 11, 330 – 341. doi:

10.1177/1073191104269954

Gregory, A., Skiba, R.J., & Noguera, P.A. (2010). The achievement gap and the

discipline gap: Two sides of the same coin? Educational Researcher, 39, 59 – 68.

doi: 10.3102/0013189X09357621

Goodman, R. (1997). The Strengths and Difficulties Questionnaire: A research note.

Journal of Child Psychology and Psychiatry, and Allied Disciplines, 38, 581 –

586. doi: 10.1111/j.1469-7610.1997.tb01545.x

Harrell-Williams, L.M., Raines, T.C., Kamphaus, R.W., & Dever, B.V. (2015).

Psychometric analysis of the BASC-2 Behavioral and Emotional Screening

System (BESS) Student Form: Results from high school student samples.

Psychological Assessment. Advance online publication.

http://dx.doi.org/10.1037/pas0000079

Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized

cognitive ability testing? American Psychologist, 47, 1083 – 1101. doi:

10.1037/0003-066X.47.9.1083

Helms, J.E. (2006). Fairness is not validity or cultural bias in racial-group assessment: A

quantitative perspective. American Psychologist, 61, 845 – 859. doi:

10.1037/0003-066X.61.8.845

Hill, L.G., Lochman, J.E., Coie, J.D., Greenberg, M.T., & the Conduct Problems

Prevention Research Group (2004). Effectiveness of early screening for

externalizing problems: Screening accuracy and utility. Journal of Consulting and

Clinical Psychology, 72, 809 – 820. doi: 10.1037/0022-006X.72.5.809

Kamenetz, A. (2014). The end of neighborhood schools. Retrieved from

http://apps.npr.org/the-end-of-neighborhood-schools/

Kamphaus, R.W., DiStefano, C., Dowdy, E., Eklund, K., & Dunn, A.R. (2010).

Determining the presence of a problem: Comparing two approaches for detecting

youth behavioral risk. School Psychology Review, 39, 395 – 407.

Kamphaus, R.W., & Reynolds, C.R. (2007). BASC-2 Behavioral and Emotional

Screening System manual. Minneapolis, MN: Pearson.

Kaufman, J.S., Jaser, S.S., Vaughan, E.L., Reynolds, J.S., Di Donato, J., Bernard, S.N., &

Hernandez-Brereton, M. (2010). Patterns in office referral data by grade,

race/ethnicity, and gender. Journal of Positive Behavior Interventions, 12, 44 –

54. doi: 10.1177/1098300708329710

Kickboard (2016). Kickboard. Retrieved from https://www.kickboardforschools.com/

King, K.R., & Reschly, A.L. (2014). A comparison of screening instruments: Predictive

validity of the BESS and the BSC. Journal of Psychoeducational Assessment, 32,

687 – 698. doi: 10.1177/0734282914531714

King, K., Reschly, A.L., & Appleton, J.J. (2012). An examination of the validity of the

Behavioral and Emotional Screening System in a rural elementary school:

Validity of the BESS. Journal of Psychoeducational Assessment, 30, 527 – 538.

doi: 10.1177/0734282912440673

Kóbor, A., Takács, Á., & Urbán, R. (2013). The bifactor model of the Strengths and

Difficulties Questionnaire. European Journal of Psychological Assessment, 29,

299 – 307. doi: 10.1027/1015-5759/a000160

Lance, C.E., Butts, M.M., & Michels, L.C. (2006). The sources of four commonly

reported cutoff criteria: What did they really say? Organizational Research

Methods, 9, 202 – 220. doi: 10.1177/1094428105284919

Levitt, J.M., Saka, N., Romanelli, L.H., & Hoagwood, K. (2007). Early identification of

mental health problems in schools: The status of instrumentation. Journal of

School Psychology, 45, 163 – 191. doi: 10.1016/j.jsp.2006.11.005

Louisiana Department of Education (n.d.). Louisiana believes: Attendance requirements.

Retrieved from https://www.louisianabelieves.com/courses/attendance-

requirements

MacCallum, R.C., Widaman, K.F., Zhang, S., & Hong, S. (1999). Sample size in factor

analysis. Psychological Methods, 4, 84 – 99. doi: 10.1037/1082-989X.4.1.84

McIntosh, K., Campbell, A.L., Carter, D.R., & Zumbo, B.D. (2009). Concurrent validity

of office discipline referrals and cut points used in schoolwide positive behavior

support. Behavioral Disorders, 34, 100 – 113.

Michel, C., Schultze-Lutter, F., & Schimmelmann, B.G. (2014). Screening instruments in

child and adolescent psychiatry: General and methodological considerations.

European Child & Adolescent Psychiatry, 23, 725 – 727. doi:

10.1007/s00787-014-0608-x

Morrissey, T.W., Hutchison, L., & Winsler, A. (2014). Family income, school

attendance, and academic achievement in elementary school. Developmental

Psychology, 50, 741 – 753. doi: 10.1037/a0033848

Naser, S., Hitti, A., & Overstreet, S. (2016). The Behavioral and Emotional Screening

System Student Form: Is there evidence of a global at-risk factor in a sample of

African American youth? Manuscript submitted for publication.

Netland, M. (2001). Assessment of exposure to political violence and other potentially

traumatizing events. A critical review. Journal of Traumatic Stress, 14, 311 –

326. doi: 10.1023/A:1011164901867

New Orleans Parents’ Guide (2014). New Orleans parents’ guide to public schools:

Spring 2014 Edition. Retrieved from http://neworleansparentsguide.org/files/

NOPG2014.pdf

New Orleans Parents’ Guide (2015). New Orleans parents’ guide to public schools:

Spring 2015 Edition. Retrieved from http://neworleansparentsguide.org/

Overstreet, S., & Mazza, J.J. (2003). An ecological-transactional understanding of

community violence: Theoretical perspectives. School Psychology Quarterly, 18,

66 – 87. doi: 10.1521/scpq.18.1.66.20874

Phares, V., & Compas, B.E. (1990). Adolescents’ subjective distress over their

emotional/behavioral problems. Journal of Consulting and Clinical Psychology,

58, 596 – 603. doi: 10.1037/0022-006X.58.5.596

Positive Behavioral Interventions & Supports (PBIS; 2016). PBIS: Positive behavioral

interventions & supports: OSEP technical assistance center. Retrieved from

http://www.pbis.org/

Putnam, R.F., Luiselli, J.K., Handler, M.W., & Jefferson, G.L. (2003). Evaluating student

discipline practices in a public school through behavioral assessment of office

referrals. Behavior Modification, 27, 505 – 523. doi: 10.1177/0145445503255569

Raines, T.C., Dever, B.V., Kamphaus, R.W., & Roach, A.T. (2012). Universal screening

for behavioral and emotional risk: A promising method for reducing

disproportionate placement in special education. The Journal of Negro Education,

81, 283 – 296.

Reise, S.P. (2012). The rediscovery of bifactor measurement models. Multivariate

Behavioral Research, 47, 667 – 696. doi: 10.1080/00273171.2012.715555

Renshaw, T.L., Eklund, K., Dowdy, E., Jimerson, S.R., Hart, S.R., Earhart, Jr., J., &

Jones, C.N. (2009). Examining the relationship between scores on the Behavioral

and Emotional Screening System and student academic, behavioral, and

engagement outcomes: An investigation of concurrent validity in elementary

school. The California School Psychologist, 14, 81 – 88. doi:

10.1007/BF03340953

Reynolds, C.R., & Kamphaus, R.W. (2004). Behavior Assessment System for Children –

second edition (BASC-2). Circle Pines, MN: AGS.

Nastasi, B.K., Moore, R.B., & Varjas, K.M. (2004). School-based mental health services:

Creating comprehensive and culturally specific programs. Washington, DC:

American Psychological Association.

Salbach-Andrae, H., Lenz, K., & Lehmkuhl, U. (2009). Patterns of agreement among

parent, teacher and youth ratings in a referred sample. European Psychiatry, 24,

345 – 351. doi: 10.1016/j.eurpsy.2008.07.008

Schanding, Jr., G.T., & Nowell, K.P. (2013). Universal screening for emotional and

behavioral problems: Fitting a population-based model. Journal of Applied School

Psychology, 29, 104 – 119. doi: 10.1080/15377903.2013.751479

Sims, P., & Vaughn, D. (2014). The state of public education in New Orleans: 2014

report. Retrieved from www.speno2014.com/wpcontent/uploads/2014/08/

SPENO-HQ.pdf

Skiba, R.J., Horner, R.H., Chung, C.-G., Rausch, M.K., May, S.L., & Tobin, T. (2011).

Race is not neutral: A national investigation of African American and Latino

disproportionality in school discipline. School Psychology Review, 40, 85-107.

Skiba, R.J., Michael, R.S., Nardo, A.C., & Peterson, R.L. (2002). The color of discipline:

Sources of racial and gender disproportionality in school punishment. The Urban

Review, 34, 317 – 342. doi: 10.1023/A:1021320817372

Splett, J.W., Fowler, J., Weist, M.D., McDaniel, H., & Dvorsky, M. (2013). The critical

role of school psychology in the school mental health movement. Psychology in

the Schools, 50, 245 – 258. doi: 10.1002/pits.21677

Streiner, D.L. (2003). Diagnosing tests: Using and misusing diagnostic and screening

tests. Journal of Personality Assessment, 81, 209 – 219. doi:

10.1207/S15327752JPA8103_03

Tabachnick, B.G., & Fidell, L.S. (2007). Using multivariate statistics (5th ed.). Boston,

MA: Pearson Education.

U.S. Department of Education: Office for Civil Rights. (2016). 2013 – 2014 civil rights

data collection: A first look. Retrieved from http://www2.ed.gov/about/offices

/list/ocr/docs/CRDC2013-14-first-look.pdf

Walker, B.A. (2010). Effective schoolwide screening to identify students at risk for social

and behavioral problems. Intervention in School and Clinic, 46, 104 – 110. doi:

10.1177/1053451210374989

Walker, B., Cheney, D., Stage, S., & Blum, C. (2005). Schoolwide screening and positive

behavior supports: Identifying and supporting students at risk for school failure.

Journal of Positive Behavior Interventions, 7, 194 – 204. doi:

10.1177/10983007050070040101

Walker, H.M., Nishioka, V.M., Zeller, R., Severson, H.H., & Feil, E.G. (2000). Causal

factors and potential solutions for the persistent underidentification of students

having emotional or behavioral disorders in the context of schooling. Assessment

for Effective Intervention, 26, 29 – 39. doi: 10.1177/073724770002600105

Wang, M.-T., & Fredricks, J.A. (2014). The reciprocal links between school engagement,

youth problem behaviors, and school dropout during adolescence. Child

Development, 85, 722 – 737. doi: 10.1111/cdev.12138

Way, S.M. (2011). School discipline and disruptive classroom behavior: The moderating

effects of student perceptions. The Sociological Quarterly, 52, 346 – 375. doi:

10.1111/j.1533-8525.2011.01210.x

Wiesner, M., & Schanding, G.T. (2013). Exploratory structural equation modeling,

bifactor models, and standard confirmatory factor analysis models: Application to

the BASC-2 Behavioral and Emotional Screening System Teacher Form. Journal

of School Psychology, 51, 751 – 763. doi: 10.1016/j.jsp.2013.09.001

Yang, Y., & Green, S.B. (2010). A note on structural equation modeling estimates of

reliability. Structural Equation Modeling: A Multidisciplinary Journal, 17, 66 –

81. doi: 10.1080/10705510903438963

Young, E.L., Sabbah, H.Y., Young, B.J., Reiser, M.L., & Richardson, M.J. (2010).

Gender differences and similarities in a screening process for emotional and

behavioral risks in secondary schools. Journal of Emotional and Behavioral

Disorders, 18, 225 – 235. doi: 10.1177/1063426609338858

Zins, J.E., Bloodworth, M.R., Weissberg, R.P., & Walberg, H.J. (2007). The scientific

base linking social and emotional learning to school success. Journal of

Educational and Psychological Consultation, 17, 191 – 210. doi:

10.1080/10474410701413145

Zolog, T.C., Jane-Ballabriga, M.C., Bonillo-Martin, A., Canals-Sans, J., Hernandez-

Martinez, C., Romero-Acosta, K., & Domenech-Llaberia, E. (2011). Somatic

complaints and symptoms of anxiety and depression in a school-based sample of

preadolescents and early adolescents, functional impairment and implications for

treatment. Journal of Cognitive and Behavioral Psychotherapies, 11, 191 – 208.

BIOGRAPHY

Kathryn Jones received her Bachelor of Science in Psychology, Bachelor of Arts in

Sociology, and Master of Science in Psychology from Tulane University. She also has a

Master of Arts in Forensic Psychology from Marymount University. She is currently a

doctoral candidate in School Psychology at Tulane University. She completed her

internship through the Psychological Services Center with the Illinois School Psychology

Internship Consortium. On internship, she had the opportunity to work in schools and in

primary care, which allowed her to work across settings to benefit the mental health of

youth and their families. Her research interests focus on the relationship between

psychosocial and physical factors in the development and maintenance of pediatric

somatic symptoms. Additionally, she is interested in the role of universal screenings in

schools and medical settings to improve identification, prevention, and intervention of

behavioral and emotional difficulties in youth. Kathryn will complete her post-doctoral

training in integrated pediatric primary care with Geisinger Medical Systems starting in

August 2016.