Why are teachers evaluated?

Likewise, the true positive rate would go up and the true negative rate would go down. Much of the concern and caution about the use of value-added has focused on the frequency of false negatives, i.e., truly effective teachers who are incorrectly identified as ineffective. But framing the problem in terms of false negatives places the focus almost entirely on the interests of the individual being evaluated rather than on the students being served.

It is easy to identify with the good teacher who wants to avoid dismissal for being incorrectly labeled a bad teacher. However, an evaluation system that results in tenure and advancement for almost every teacher, and thus has a very low rate of false negatives, generates a high rate of false positives, i.e., ineffective teachers who are nonetheless rated as effective and retained.

These teachers drag down the performance of schools and do not serve students as well as more effective teachers would. In the simplest of scenarios involving tenure of novice teachers, it is in the best interest of students to raise the cut point, thereby increasing the proportion of truly effective teachers staffing classrooms, whereas it is in the best interest of novice teachers to lower the cut point, thereby making it more likely that they will be granted tenure.
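
The trade-off can be made concrete with a small simulation. The sketch below is in Python, with entirely hypothetical numbers for the effectiveness distribution and the noise in the evaluation score; it only illustrates how moving the tenure cut point trades false negatives against false positives and changes the quality of the tenured workforce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of novice teachers: true effectiveness plus a noisy
# evaluation score. The distributions are illustrative assumptions, not
# estimates from any study.
n = 100_000
true_effect = rng.normal(0.0, 1.0, n)            # true effectiveness
score = true_effect + rng.normal(0.0, 1.0, n)    # noisy evaluation measure
effective = true_effect > 0                      # "truly effective" label

for cut in (-1.0, 0.0, 1.0):                     # tenure granted if score > cut
    tenured = score > cut
    false_neg = np.mean(effective & ~tenured)    # effective teachers denied tenure (share of all)
    false_pos = np.mean(~effective & tenured)    # ineffective teachers granted tenure (share of all)
    quality = effective[tenured].mean()          # share of tenured teachers who are effective
    print(f"cut={cut:+.1f}  false negatives={false_neg:.2f}  "
          f"false positives={false_pos:.2f}  effective share of tenured={quality:.2f}")
```

In this toy setup, raising the cut point increases the false-negative share while shrinking the false-positive share and raising the effectiveness of the tenured group, which is the students' side of the trade-off described above.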

Our message is that the interests of students and the interests of teachers in classification errors are not always congruent, and that a system that generates a fairly high rate of false negatives could still produce better outcomes for students by raising the overall quality of the teacher workforce.

Discussions of teacher evaluation at the policy and technical levels often proceed in isolation from experience and evidence in other related fields. But we know a lot about performance evaluation in other labor markets, knowledge that should inform debates about value-added and teacher evaluation in general. The correlation in test-based measures of teaching effectiveness between one school year and the next is modest: roughly ten percent of teachers in the bottom quartile one year would appear in the top quartile the next.
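
A short simulation shows how a moderate year-to-year correlation produces exactly this kind of quartile jumping. The correlation value used below (0.4) is an assumption chosen for illustration, not the figure reported in the studies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two years of scores for the same teachers, generated so that the
# year-to-year correlation is roughly r = 0.4 (an illustrative value).
n, r = 100_000, 0.4
year1 = rng.normal(size=n)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.normal(size=n)

bottom_q_year1 = year1 < np.quantile(year1, 0.25)   # bottom quartile in year 1
top_q_year2 = year2 > np.quantile(year2, 0.75)      # top quartile in year 2

print("year-to-year correlation:", round(float(np.corrcoef(year1, year2)[0, 1]), 2))
print("share of year-1 bottom-quartile teachers in the year-2 top quartile:",
      round(float(np.mean(top_q_year2[bottom_q_year1])), 2))
```

With a correlation of about 0.4, roughly one bottom-quartile teacher in ten lands in the top quartile the following year, on the order of the instability described in the text.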

It is instructive to look at other sectors of the economy as a gauge for judging the stability of value-added measures. The use of imprecise measures to make high-stakes decisions that place societal or institutional interests above those of individuals is widespread and accepted in fields outside of teaching. SAT and ACT scores, for example, predict college performance only modestly; nevertheless, nearly all selective colleges use them as a heavily weighted component of their admission decisions, even though doing so produces a substantial rate of false negatives: students who could have succeeded but are denied entry.

Why would colleges use such a flawed selection instrument? Presumably because, imperfect as it is, it predicts success better than the readily available alternatives. In health care, patient volume and patient mortality rates for surgeons and hospitals are publicly reported on an annual basis by private organizations and federal agencies and have been formally approved as quality measures by national organizations. These measures, too, are noisy and influenced by factors outside providers' control.

Nevertheless, these measures are used by patients and health care purchasers to select providers because they are able to predict larger differences across medical providers in patient outcomes than other available measures. In a similar vein, the volume of home sales for realtors; returns on investment funds; productivity of field-service personnel for utility companies; output of sewing machine operators; and baseball batting averages predict future performance only modestly.

A meta-analysis [12] of 22 studies of objective performance measures found that the year-to-year correlations in high complexity jobs ranged from 0.

The between-season correlation in batting averages for professional baseball players is. We should not set unrealistic expectations for the reliability or stability of value-added. Value-added evaluations are as reliable as those used for high stakes decisions in many other fields.

We also know a good deal about how other means of classifying teachers perform relative to value-added. Rather than asking value-added to measure up to an arbitrary standard of perfection, it would be productive to ask how it performs compared to classification based on other forms of information available about teachers. Here the research is quite clear: if student test achievement is the outcome, [14] value-added is superior to other existing methods of classifying teachers. Classification that relies on other measurable characteristics of teachers, e.g., experience or certification status, predicts student achievement far less well.

Consider a particular example that has arisen as a consequence of the deep recession: the need of districts to lay off teachers as a result of budget shortfalls. Managers in most industries would attempt to target layoffs so as to cause as little damage as possible to productivity — less productive workers would be dismissed or furloughed before more productive workers.

Suppose school district leaders were similarly motivated and had flexibility in deciding how to proceed. Imagine three possible approaches for deciding who should be dismissed. The first approach would employ the existing teacher evaluation system based on principal ratings, which identifies a few teachers as unsatisfactory but categorizes the vast majority as satisfactory. The second approach would employ teacher experience, which has been found in a number of studies to have a statistically significant positive association with student achievement.

The third approach would use teacher value-added scores to identify the lowest performing teachers. Researchers have compared these three approaches using data from fourth and fifth grade public school teachers in New York City and simulating the elimination of enough teachers to reduce the budget by 5 percent.

The researchers plotted the results with teacher effectiveness, as indexed by student gains, on the horizontal axis and the number of teachers on the vertical axis. The effectiveness scores are those regularly calculated by the NYC public schools and can encompass teacher performance going back as far as four years.

Note that if teachers were laid off based on seniority, they would be distributed across the full range of effectiveness in raising student test scores, whereas teachers laid off based on low value-added scores would be concentrated at the bottom of the distribution. In other words, many more effective teachers would be retained if layoffs were based on value-added than if they were based on seniority. Principal ratings, not shown in the graph, perform better than teacher seniority in identifying teachers with low effectiveness in raising student achievement, but not nearly as well as value-added scores.
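
The logic of that comparison can be sketched with a toy simulation. Everything below is invented for illustration: the effectiveness distribution, the amount of noise in the value-added estimate, and the assumption that seniority is unrelated to true effectiveness.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy workforce: true effectiveness, a noisy value-added estimate, and a
# seniority ranking that (by assumption) is unrelated to effectiveness.
n, layoff_share = 10_000, 0.05
effect = rng.normal(0.0, 1.0, n)                  # true effectiveness
value_added = effect + rng.normal(0.0, 0.7, n)    # noisy value-added estimate
seniority = rng.permutation(n)                    # years of service, shuffled

k = int(layoff_share * n)
cut_by_seniority = np.argsort(seniority)[:k]      # least senior laid off first
cut_by_value_added = np.argsort(value_added)[:k]  # lowest value-added laid off first

def retained_mean_effect(laid_off: np.ndarray) -> float:
    keep = np.ones(n, dtype=bool)
    keep[laid_off] = False
    return float(effect[keep].mean())

print("mean true effectiveness retained, seniority-based layoffs: ",
      round(retained_mean_effect(cut_by_seniority), 3))
print("mean true effectiveness retained, value-added-based layoffs:",
      round(retained_mean_effect(cut_by_value_added), 3))
```

Because seniority is unrelated to effectiveness in this toy setup, seniority-based layoffs leave the retained workforce essentially unchanged, while even a noisy value-added ranking removes disproportionately weak teachers and raises the average effectiveness of those retained.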

The question, then, is not whether evaluations of teacher effectiveness based on value-added are perfect or close to it: they are not. The question, instead, is whether and how the information from value-added compares with other sources of information available to schools when difficult and important personnel decisions must be made. For example, keeping ineffective teachers on the job while dismissing far better teachers is something most school leaders, parents, and the general public would want to avoid.

Value-added is a better tool for that purpose than other measures such as teacher experience, certification status, seniority, and principal ratings, even though it is imperfect. We have a lot to learn about how to improve the reliability of value-added and other sources of information on teacher effectiveness, as well as how to build useful personnel policies around such information. However, too much of the debate about value-added assessment of teacher effectiveness has proceeded without consideration of the alternatives and by conflating objectionable personnel policies with value-added information itself.

When teacher evaluation that incorporates value-added data is compared against an abstract ideal, it can easily be found wanting in that it provides only a fuzzy signal. Compared with the alternatives on which schools actually rely, however, it holds up well.

Teachers differ dramatically in their performance, with large consequences for students. Staffing policies that ignore this lose one of the strongest levers for lifting the performance of schools and students. That is why there is great interest in establishing teacher evaluation systems that meaningfully differentiate performance.

Teaching is a complex task, and value-added captures only a portion of the impact of differences in teacher effectiveness. Thus high-stakes decisions based on value-added measures of teacher performance will be imperfect. We do not advocate using value-added measures alone when making decisions about hiring, firing, tenure, compensation, placement, or professional development, but surely value-added information ought to be in the mix, given the empirical evidence that it predicts more about what students will learn from the teachers to whom they are assigned than any other source of information.

Four other objectives -- to make tenure and promotion decisions, to discharge incompetent teachers, to help teachers define standards for their peers, and to determine teachers' pay levels -- are summative goals involving personnel decisions.

The ninth objective, to give administrators greater control over teacher job performance, does not fit into either category. Most teachers perceive that evaluations at their school are used to promote the development of improved teaching skills rather than to assist administrators and other teachers in making judgments that affect personnel decisions.

Furthermore, most teachers do not believe that the latter goals should be objectives of performance evaluations at their schools (Figure 6).

Formative Goals

A majority of teachers reported that formative goals, that is, goals associated with professional development (guiding improvement of teaching skills, recognizing and reinforcing teaching excellence, helping teachers focus on student outcomes, and planning in-service education activities), should be an objective to a great extent in teacher performance evaluations (Table 6 and Figure 6).

However, approximately 20 percentage points fewer teachers reported that each of these four goals had been an objective to a great extent at their school when they were last evaluated. For example, guiding improvement of teaching skills was cited by 81 percent of teachers as being an appropriate objective to a great extent, but only 61 percent of teachers said that it actually was an objective to a great extent in their last evaluation.

While 70 percent of teachers believe that recognizing and reinforcing teaching excellence should be an objective of teacher performance evaluations to a great extent, only 51 percent reported that it actually was an objective to a great extent when they were last evaluated.

Smaller percentages of teachers cited goals associated with personnel decisions for teachers as having been an objective at their school.

Furthermore, smaller percentages of teachers felt that summative goals should be objectives of teacher performance evaluations to a great extent than felt that way about formative goals. The summative goals measured in this survey are to make tenure and promotion decisions, to discharge incompetent teachers, to help teachers define standards for their peers, and to determine teachers' pay levels. Despite greater congruence between teachers' opinions and school objectives on these goals, there was a significant difference between teachers' perceptions of whether these should be objectives to a great extent and whether they actually were objectives to a great extent at the time of the last evaluation. This was true for each goal except making tenure and promotion decisions.

For instance, 45 percent of teachers thought the goal of discharging incompetent teachers should be an objective of teacher performance evaluations to a great extent, but only 18 percent reported that it actually was an objective to a great extent when they were last evaluated.

Administrative Control

The goal of giving administrators greater control over teacher job performance showed a different pattern from the other objectives. Eleven percent of teachers believed that giving administrators greater control over teacher job performance should be an objective to a great extent; however, 15 percent reported that it actually was an objective to a great extent when they were last evaluated. Perhaps one of the most striking findings is that 75 percent of teachers reported that determining teachers' pay levels was not at all an objective at their school when they were last evaluated, and 50 percent agreed that it should not be an objective to any extent (Table 6).

This opinion varies with years of service in the current school.

Some policymakers, meanwhile, have focused more closely on the prospect of identifying and removing bad teachers quickly and efficiently. Federal intervention gave muscle to the focus on teacher evaluations.

The U.S. Department of Education began the Race to the Top competition, offering grants to states that agreed to make certain policy changes. Among the prescribed changes was the requirement to develop and implement new teacher-evaluation systems that differentiated among at least three levels of performance and took student achievement into account. Major philanthropies also helped to fuel activity around teacher evaluation. By one tally, 28 states had moved to require teachers to be evaluated annually, up from 15, and 41 states required consideration of student-achievement data, up from 15.

Because teacher evaluation remains a state and local priority, all of the policies are drafted at those levels. District collective bargaining agreements can add further nuances.

As legislators overhauled the systems, some states also took steps to connect the new evaluation systems to other policies, including teacher compensation, promotion, and dismissal. A Colorado law, for instance, permits schools to return tenured teachers who receive several poor evaluations to probationary status.

The new evaluation systems are far more complex than previously used checklists. They consist of several components, each scored individually. Most of them heavily weigh periodic observations of teachers keyed to teaching standards, such as the well-known Framework for Teaching developed by consultant Charlotte Danielson. Districts and states differ in how frequently they require teachers to be observed, whether the observations must be announced beforehand, and who conducts them.

Policymakers also sought more objective measures in the system because of concerns that personal relationships made it more difficult for principals to grade teachers accurately. The inclusion of student test scores was a requirement under the federal initiatives, for example. The most sophisticated approach uses a statistical technique known as a value-added model, which attempts to filter out sources of bias in test-score growth so as to arrive at an estimate of how much each teacher contributed to student learning.

Critics of the approach point to studies showing that the estimates are, in the words of one U. By , the governing bodies of both the National Education Association and the American Federation of Teachers had issued new policy statements on teacher evaluation.

Concerns over the use of test scores in evaluations have fueled more than a dozen lawsuits targeting the new evaluation systems. And indeed, standardized testing appears to have become more frequent as a result of evaluation pressures. Because only about 15 percent to 30 percent of teachers instruct in grades and subjects in which standardized-test-score data are available, some states and districts have devised additional tests. The new evaluations were also rolled out alongside the Common Core State Standards and related exams, leaving teachers concerned about how the harder tests will affect their performance evaluations in the future.

As a result of such concerns, some states, with federal approval, have pushed back the dates for attaching consequences to the reviews. But Democrats, typically champions of labor priorities, have been among the supporters of the new teacher-evaluation systems. A few studies do show some preliminary evidence that teachers who receive high-quality feedback subsequently go on to boost student performance.

For that reason, teacher evaluation is likely to remain a contentious and central topic in K-12 education.

Collective Bargaining: The process by which a district and a union representing teachers arrive at a contract spelling out work hours and conditions, salary, benefits, and processes for handling grievances.

Often, contracts also set out details on professional development and other school initiatives, or supplement state law governing teachers.

Contracts are legally binding.

Teacher Observations: Most teacher-evaluation systems require teachers to be observed several times. State and local policies determine such details as the length of the observations, the mix of formal and informal visits, whether they must be accompanied by pre- or post-observation conferences, and who conducts them.

Tenure: Most states have probationary periods of three years before a teacher can earn tenure. In general, tenured teachers can be fired only for a reason listed in state law. Districts must prove that they have met this standard during a due-process hearing.

Due-process procedures typically differ based on whether the charges deal with misconduct or poor performance.

Value-Added Model (VAM): In the context of teacher evaluation, value-added modeling is a statistical method of analyzing growth in student test scores to estimate how much a teacher has contributed to student-achievement growth.

In general, VAMs factor in the gains a student was expected to make based on past performance and, in some cases, control for elements such as peer characteristics and background, including poverty level and family education.
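
As a rough sketch of the kind of adjustment described in this entry, the snippet below predicts each student's expected score from a prior score and then attributes the average surplus or shortfall of a teacher's students to that teacher. The column names and the single-prior-score specification are hypothetical choices made for illustration; actual state models use more controls, multiple years of data, and shrinkage.

```python
import numpy as np
import pandas as pd

def simple_value_added(df: pd.DataFrame) -> pd.Series:
    """Crude value-added estimate: the mean residual of a teacher's students
    after regressing current scores on prior scores.

    Expects hypothetical columns 'teacher_id', 'prior_score', 'current_score'.
    """
    # Fit current_score = a + b * prior_score by ordinary least squares.
    X = np.column_stack([np.ones(len(df)), df["prior_score"].to_numpy()])
    y = df["current_score"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # A student's residual is how far they landed above or below expectation.
    residuals = y - X @ coef

    # The teacher's estimate is the average residual of their students.
    return df.assign(residual=residuals).groupby("teacher_id")["residual"].mean()

# Tiny synthetic example.
data = pd.DataFrame({
    "teacher_id":    ["A", "A", "B", "B", "C", "C"],
    "prior_score":   [50, 60, 55, 65, 45, 70],
    "current_score": [58, 66, 52, 60, 55, 80],
})
print(simple_value_added(data))
```

Adding background controls, as some models do, amounts to including additional predictor columns in the regression before averaging residuals by teacher.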


