Understanding SRTR's 3-Tier Outcome Assessment

Note regarding the 5-tier outcome assessment released on December 20, 2016

In December 2016, the Scientific Registry of Transplant Recipients (SRTR) replaced the “3-tier” public rating assessment of transplant center performance on its website with a “5-tier” assessment, with the goal of improving the usefulness of outcome information for transplant patients. 

In response to feedback from members of the transplant community regarding the lack of adequate time to review the new 5-tier rating system before implementation, HRSA requested that SRTR transfer the 5-tier rating to an alternate, publicly available beta site to undergo further review and identification of areas for improvement. The SRTR website’s outcome assessment information reverted to the 3-tier system on Tuesday, Feb. 21, 2017. Comments about the 5-tier system on the beta site and/or the 3-tier system are welcome and may be submitted to SRTR by contacting us.

HRSA and SRTR remain committed to seeking and incorporating input from all stakeholders, especially patients, so that we can continually improve the SRTR website and make outcome information more transparent and understandable for patients and their caregivers.


Summarizing transplant program performance using a 3-tier system

SRTR assigns each transplant program an outcome assessment (“worse than expected”, “as expected”, or “better than expected”) based on a risk-adjusted assessment of first-year success, defined as being alive with a functioning transplant one year after transplant. The outcome assessment is displayed in the program search results and in the more detailed results shown for the program. This guide is meant to help you understand why SRTR assigns the outcome assessment, how SRTR calculates the assessment, and how to interpret the assessment.

A Guide for Patients

Outcome assessment

The outcome assessment tells you whether the program’s 1-year success rate is better or worse than expected for that program after adjustment for how sick its patients are and the quality of donor organs used in the transplants.

Why does the assessment show how outcomes compare to what is “expected”?

Many things affect whether a transplant recipient has a good outcome after transplant. Some patients are sicker than others and some organs come from healthier donors. Some programs may perform more transplants in sicker patients, and these programs would “expect” to have more complications. The comparison to “expected” tells you how a program’s outcomes compare to national predicted outcomes for similar patients transplanted with similar donors.

What do the ratings mean?

After taking into account the health of the recipient and donors at each program:

  • “Better than Expected” means that we are at least 97.5% certain that the program’s success rate is better than expected.
  • “As Expected” means that we are not at least 97.5% certain that the program’s success rate is better or worse than expected.
  • “Worse than Expected” means that we are at least 97.5% certain that the program’s success rate is worse than expected.

What should I think about when interpreting the outcome assessment?

A program’s assessment might be higher than others’ assessments, but this may not be the most important factor. You may also want to consider:

  • Waiting times.
  • Distance from home.
  • Insurance coverage.

In some cases, the risk of a complication after surgery is much lower than the risk of becoming too sick to undergo transplant while on the waiting list. Your care team can discuss the risks and how they may affect your decisions.

Why does SRTR assign an outcome assessment?

The reporting requirements of the OPTN Final Rule state that OPTN and SRTR, as appropriate, shall "Make available to the public timely and accurate program-specific information on the performance of transplant programs. This shall include free dissemination over the Internet, and shall be presented, explained, and organized as necessary to understand, interpret, and use the information accurately and efficiently" (OPTN Final Rule 121.11(b)(iv)). Further, in fulfillment of the Final Rule, the SRTR contractor must identify transplant programs and organ procurement organizations with better or worse outcomes (SRTR Task 3.9.1).

To fulfill its contractual obligation, SRTR currently evaluates transplant outcomes at three time points: 1 month, 1 year, and 3 years after transplant. In addition, SRTR evaluates two different outcomes: 1) survival with a functioning transplanted organ, and 2) survival regardless of whether the organ continues to function. SRTR uses complex statistical methods to perform these evaluations. These methods attempt to adjust for the case mix at the transplant program so programs that perform transplants in sicker patients, or accept higher-risk donors than other programs, are not penalized in their evaluations.

These evaluations employ a Bayesian statistical methodology that results in an estimated "hazard ratio" for every program, for each outcome evaluated. The hazard ratio tells us how each program's outcomes compare with what we expected to happen given the types of patients and the types of donors the program accepts. A hazard ratio of 1.0 indicates that the program's results were exactly as expected, whereas a value of 2.0 indicates failure rates, i.e., death or failure of the transplanted organ, that are twice as high as expected, and, conversely, a value of 0.5 indicates failure rates that are half what would be expected.

In addition, a level of certainty is associated with each estimated hazard ratio. Larger programs have more data available, and therefore we are generally more certain about their hazard ratios; we have less certainty about smaller programs because they have less data. Therefore, even if two programs have the same hazard ratio, we may be more certain about one than the other. These are difficult statistical concepts. SRTR uses the 3-tiered outcome assessment to translate program outcomes for people without statistical training.
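As a simplified, hypothetical illustration (not SRTR's actual code), the hazard ratio can be read as the ratio of observed to expected failures; the function name and strings below are invented for this sketch:

```python
# Hypothetical sketch: reading a hazard ratio as observed / expected failures.
# This is an illustration of the interpretation described above, not SRTR's method.

def interpret_hazard_ratio(observed_failures: float, expected_failures: float) -> str:
    """Return a plain-language reading of a program's hazard ratio."""
    hr = observed_failures / expected_failures
    if hr < 1.0:
        return f"HR = {hr:.2f}: failure rate is {(1 - hr) * 100:.0f}% lower than expected"
    if hr > 1.0:
        return f"HR = {hr:.2f}: failure rate is {(hr - 1) * 100:.0f}% higher than expected"
    return "HR = 1.00: results exactly as expected"

print(interpret_hazard_ratio(10, 20))  # half the expected failures -> HR = 0.50
print(interpret_hazard_ratio(20, 10))  # twice the expected failures -> HR = 2.00
```

This matches the text above: 1.0 means exactly as expected, 0.5 means half the expected failures, and 2.0 means twice the expected failures.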

How is the outcome assessment calculated?

Step 1. Estimate the Program's Hazard Ratio for First-Year Graft Failure

Figure 1. Visual depiction of the posterior probability density of the program's hazard ratio for first-year graft failure.

The outcome assessment is calculated from SRTR's evaluation of the program's first-year graft failure rate. In other words, we start with our assessment of how often patients at the program die or lose function of their transplanted organ (the graft) within the first year after transplant. This evaluation results in an estimate of the program's hazard ratio, which is provided in the full program-specific report in Figure C3 for adult transplant recipients and Figure C9 for pediatric transplant recipients.

The hazard ratio is a measure of how many patients did not make it through the first year with a functioning graft relative to how many we expected not to make it. So, a hazard ratio of 1.0 means that we observed exactly the number of graft failures we expected at the program (after taking into account the types of patients and the donors the program accepts). A hazard ratio of 0.5 means that the program experienced half the failures we expected, and a hazard ratio of 2.0 means that the program experienced double the failures we expected.

Estimating a program's performance always involves some degree of uncertainty. Therefore, we calculate a bell-shaped curve like the one shown in Figure 1 that describes the likely location of the program's hazard ratio. A narrower bell curve indicates more certainty about the estimate, and a wider bell curve indicates less certainty. Technical information on the estimation of a program's hazard ratio was published in the American Journal of Transplantation.

Step 2. Determine Whether We Have at Least 97.5% Certainty That the Program's Hazard Ratio Is Better or Worse Than Expected

In step 2, we determine whether or not at least 97.5% of the bell-shaped curve is lower than 1.0 (better than expected) or higher than 1.0 (worse than expected). We do this by identifying where on the left end of the bell-shaped curve we have 2.5% of the area under the curve to the left, and where on the right end of the curve we have 2.5% of the area under the curve to the right.

Step 3. Assign the Rating

The final step in the process is to assign the rating of “worse than expected”, “as expected”, or “better than expected”. If at least 97.5% of the bell-shaped curve is to the left of 1.0, we assign a rating of “Better than Expected”. If at least 97.5% of the bell-shaped curve is to the right of 1.0, we assign a rating of “Worse than Expected”. Otherwise, we assign a rating of “As Expected”.
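The three steps above can be sketched in code. This is an illustration only, not SRTR's production model: it assumes (hypothetically) that the bell-shaped posterior curve for the log hazard ratio is normal with a given mean and standard deviation, both of which are made-up parameters here:

```python
# Illustrative sketch of Steps 1-3, not SRTR's actual code. We assume the
# posterior "bell-shaped curve" of log(hazard ratio) is normal.
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def assign_rating(log_hr_mean: float, log_hr_sd: float) -> str:
    # P(HR < 1.0) is the area of the curve to the left of log(1.0) = 0.
    p_better = normal_cdf(0.0, log_hr_mean, log_hr_sd)
    if p_better >= 0.975:            # at least 97.5% of the curve left of 1.0
        return "Better than Expected"
    if 1.0 - p_better >= 0.975:      # at least 97.5% of the curve right of 1.0
        return "Worse than Expected"
    return "As Expected"

# Same point estimate (HR = exp(-0.5), about 0.61), different certainty:
print(assign_rating(log_hr_mean=-0.5, log_hr_sd=0.15))  # narrow curve: "Better than Expected"
print(assign_rating(log_hr_mean=-0.5, log_hr_sd=0.40))  # wide curve: "As Expected"
```

The two example calls show why certainty matters: with the same estimated hazard ratio, a narrow curve clears the 97.5% threshold while a wide curve does not.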

Frequently Asked Questions

Why did SRTR change the 3-tier transplant program outcome assessment system to a 5-tier system?

SRTR is charged with reporting the proportion of transplanted organs and recipients that survive after transplantation at each program in the United States. SRTR attempts to provide this information over the internet in ways that are easy to understand, interpret, and use, and updates it every 6 months. In 2002, SRTR began publishing transplant program-specific reports on its public website. SRTR presented a 3-tier assessment of how many transplants survived the first year after transplantation at each program: “as expected”, “better than expected”, or “worse than expected”. From the early 2000s through late 2014, these assessments were based on a statistical hypothesis test and required extremely strong statistical evidence to place a program outside the “as expected” category. Specifically, programs were placed in the “as expected” category unless there was at least a 97.5% chance that their outcomes were better or worse than national norms. As a result of these stringent requirements, almost all programs were “as expected” regardless of observed performance, so the “as expected” category failed to identify important differences in program outcomes. For example, the worst program in the “as expected” category typically had a transplant failure rate over four times higher than the best program in that category. Given these facts, SRTR decided to explore alternative systems that would better convey differences in program outcomes to the general public.

In 2012, SRTR, OPTN, and HRSA hosted a Consensus Conference on Transplant Program Quality and Surveillance. One of the recommendations from this conference was to explore ways to make the transplant outcome assessments more understandable for patients and the general public. In 2013, SRTR began working with its oversight committee, now called the SRTR Visiting Committee, on a different program outcome assessment system. SRTR considered many alternatives and best practices in public reporting as recommended by the Agency for Healthcare Research and Quality (AHRQ). After exploring several different options, SRTR and its Visiting Committee recommended a 5-tier assessment system. This system was presented and discussed at OPTN committees and national conferences over a period of 3 years before it was released in December 2016.

The 5-tier assessment system was designed to better inform patients about program performance by identifying differences in posttransplant outcomes. Specifically, the 5-tier assessment places only 30% of programs in the “as expected” tier (Table 1), while 96% of programs are “as expected” in the 3-tier assessment (Table 2). By more evenly distributing programs across each category, the 5-tier assessment better identifies programs with similar posttransplant outcomes (Figure 1). Therefore, the 5-tier assessment will better inform the general public of transplant program performance.

After the new 5-tier system was released, a number of transplant programs and transplant surgeons expressed to HRSA that they had not had enough time to examine and understand the new system. As a result, HRSA directed SRTR to temporarily return to reporting program outcomes with the 3-tier system, while at the same time placing the new 5-tier system on a separate publicly accessible beta website so that everyone can compare the 3-tier system with the 5-tier system and have adequate time to understand the differences in the two systems.

Table 1. Numbers of adult transplant programs in each of the 5-tier assessment system categories.

Transplant Type | Tier 1 (Worse than Expected) | Tier 2 (Somewhat Worse than Expected) | Tier 3 (Good, As Expected) | Tier 4 (Somewhat Better than Expected) | Tier 5 (Better than Expected)
Heart           | 8  | 16 | 44 | 47 | 8
Kidney          | 12 | 52 | 78 | 61 | 30
Liver           | 5  | 32 | 40 | 37 | 10
Lung            | 3  | 17 | 22 | 20 | 5

Table 2. Numbers of adult transplant programs in each of the 3-tier assessment system categories.

Transplant Type | Worse than Expected | As Expected | Better than Expected
Heart           | 1 | 121 | 1
Kidney          | 7 | 218 | 8
Liver           | 0 | 121 | 3
Lung            | 1 | 65  | 1

Figure 1. The range of program outcomes within each tier under the 3-tier and 5-tier systems. A tier with wider dashed lines (or whiskers) indicates less similar program performance within the tier. For example, the “as expected” category in the 3-tier system fails to capture important differences in program outcomes, with the worst program having a transplant failure rate over four times higher than the best program (2.0/0.5 = 4).

What is the 3-tiered assessment system?

One of SRTR's functions is to provide information to the public on the performance of transplant programs, as mandated in the OPTN Final Rule. The metrics SRTR developed to assess program performance are necessarily complex, involving statistical methods that are often difficult to explain. The 3-tiered system therefore summarizes and communicates these complex concepts to the general public.

Why did the SRTR change from the 5-tier system back to the old 3-tier system?

In response to feedback from members of the transplant community regarding the lack of adequate time to review the new 5-tier rating system before implementation, HRSA requested that SRTR transfer the 5-tier rating to an alternate, publicly available beta site to undergo further review and identification of areas for improvement. The SRTR website’s outcome assessment information reverted to the 3-tier system on Tuesday, Feb. 21, 2017. Comments about the 5-tier system on the beta site and/or the 3-tier system are welcome and may be submitted to SRTR by contacting us.

HRSA and SRTR remain committed to seeking and incorporating input from all stakeholders, especially patients, so that we can continually improve the SRTR website and make outcome information more transparent and understandable for patients and their caregivers.

How can the community comment on the ratings, either the 3-tier or the 5-tier system?

SRTR and HRSA encourage feedback on both systems. You may provide comments to SRTR at SRTR@SRTR.org.

Doesn't a 3-tiered system lose important information?

Whenever we transform a continuous quantity, such as a program's hazard ratio, which can range from just above 0 to positive infinity, into a 3-tiered system, we do lose information. For example, we lose whether the program's score was close to a rating boundary. Furthermore, we lose the sense of how certain we are of the overall evaluation, something that is conveyed by the size and shape of the bell-shaped estimate of the program's hazard ratio. SRTR continues to make the full evaluation of each program available to interested readers in the full program-specific report on our website.

What is the interpretation of ‘as expected’ in the 3-tier system?

The 3-tier system is derived from a common approach used in medical research called statistical hypothesis testing, which identifies outcomes that are extremely unlikely to occur for truly average programs. Because statistical hypothesis testing was not designed for program classification, programs with ‘as expected’ performance do not necessarily have similar observed performance. For example, Hospital A could have a graft failure rate that is 50% lower than expected and be classified with ‘as expected’ outcomes; in contrast, Hospital B could have a graft failure rate that is 100% higher than expected and also be classified with ‘as expected’ outcomes. Indeed, programs’ hazard ratios within the “as expected” rating currently range from approximately 50% better than expected to 100% worse than expected. Thus, the ‘as expected’ tier in the 3-tier system indicates only that we do not have strong enough statistical evidence that the program’s outcomes are above or below the national norm. The full evaluation for each program remains available to interested readers in the full program-specific reports on our website.

Does this evaluation system cause programs to avoid performing higher risk transplants?

SRTR’s evaluations are “risk adjusted” so that programs taking on higher-risk patients and/or using higher-risk donor organs are not penalized for having lower survival rates. For each assessment, SRTR takes into account many recipient and donor characteristics in an effort to make a fairer comparison of program outcomes. The factors considered in each evaluation are available for review, and a publication detailing the process of arriving at the risk-adjustment models is also available. By performing these adjustments for risk, SRTR is comparing the outcomes achieved at a program with outcomes achieved nationally for similar levels of risk. SRTR has also recently shown that there is no relationship between the percentage of high-risk transplants performed at a program and its likelihood of having a poor outcome assessment. SRTR continues to work with the OPTN to advocate for improved data collection to enhance future versions of the risk-adjustment models.

Are the models good enough to support this type of evaluation?

SRTR follows a published process for building and maintaining the risk-adjustment models used to account for recipient and donor characteristics when evaluating programs (Snyder JJ, Salkowski N, Kim SJ, Zaun D, Xiong H, Israni AK, Kasiske BL. Developing statistical models to assess transplant outcomes using national registries: The process in the United States. Transplantation. 2016;100:288-294). In addition, we are continually working in partnership with the OPTN community to discuss ways to collect better data to allow for better prediction of outcomes. 

How does the 3-tier assessment handle small vs. large programs?

Smaller programs have less precise estimates of their hazard ratios because less data are available. Therefore, smaller programs have wider bell-shaped curves, and curves that are more likely to be centered near 1.0, because the Bayesian evaluation system pulls estimates toward the national average when data are limited. This makes it likely that small programs will receive an “As Expected” rating.
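The effect of program size can be sketched with a toy Bayesian update. This is a hypothetical illustration, not SRTR's actual model: it uses a textbook normal-normal update on the log hazard ratio, and the prior and per-case standard deviations are invented for demonstration:

```python
# Hypothetical sketch of Bayesian shrinkage, not SRTR's model: a normal prior
# on log(hazard ratio) centered at 0 (HR = 1.0, the national norm) is combined
# with program data whose precision grows with the number of transplants.
# prior_sd and per_case_sd are made-up illustration values.

def posterior_log_hr(observed_log_hr: float, n_transplants: int,
                     prior_sd: float = 0.5, per_case_sd: float = 2.0):
    """Conjugate normal-normal update; returns (posterior mean, posterior sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_sd = per_case_sd / n_transplants ** 0.5   # more cases -> tighter evidence
    data_precision = 1.0 / data_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (data_precision * observed_log_hr) / post_precision  # pulled toward 0
    return post_mean, post_precision ** -0.5

# Same observed log-hazard ratio, different program sizes:
small = posterior_log_hr(-0.5, n_transplants=10)   # mean pulled toward 0, wider curve
large = posterior_log_hr(-0.5, n_transplants=200)  # mean stays near -0.5, narrower curve
```

With identical observed performance, the small program's estimate sits closer to 1.0 with a wider curve, which is why small programs tend to land in "As Expected".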

Why not show actual survival achieved at each program?

The actual survival percentage achieved at a program is not a useful measure of the program’s outcomes compared with other programs, because some programs take on more risk than others. For example, Hospital A may have a lower survival percentage than Hospital B simply because Hospital A takes on riskier patients. It is also possible that Hospital A performs very well with risky transplants and could receive a “Better than Expected” rating even though it has lower absolute survival than Hospital B. Therefore, it would be misleading to consumers to present actual patient survival as the basis for a quality metric to rank transplant programs. Risk-adjusted assessments are necessary to allow programs to take on varying degrees of risk without fear of poor outcome evaluations. Actual patient survival percentages are available in the program reports, but they do not form the sole basis of the quality evaluation.