Witness Testimony of Mr. Daniel Bertoni, Director, Education, Workforce, and Income Security, U.S. Government Accountability Office
Chairman Miller, Ranking Member Michaud, Members of the Committee,
I am pleased to discuss the Department of Veterans Affairs’ (VA) quality assurance activities for ensuring accurate and consistent decisions on veterans’ disability benefit claims. Through its disability compensation program, VA provides cash compensation to veterans for disabling conditions incurred or aggravated while in military service. In fiscal year 2013, VA paid $53.6 billion in disability compensation to 3.6 million veterans. For this same time period, VA reported a 96-percent issue-based accuracy rate and a 90-percent claim-based accuracy rate, while setting a goal of 98 percent for both measures in fiscal year 2015. However, the Veterans Benefits Administration (VBA) within VA has faced difficulties in improving the accuracy and consistency of its claims decisions. Accurate initial claims decisions can help ensure that VA is paying disability benefits only to those entitled to such benefits, and in the correct amounts. Meanwhile, consistent decisions help ensure that veterans’ claims receive comparable treatment, regardless of which VBA staff member or regional office processes the claim. GAO and VA’s Office of Inspector General (OIG) have previously reported on shortcomings in VBA’s quality assurance activities. Concerns also have been raised about the lack of transparency related to recent changes in the calculation of VBA’s national accuracy rate for compensation claims—which is based on its national Systematic Technical Accuracy Review (STAR)—and whether recent changes reflect reliable measures of accuracy. My remarks today are based on ongoing work requested by this committee. Specifically, we examined: (1) the extent to which VBA effectively measures and reports the accuracy of compensation claim decision-making, and (2) the extent to which VBA’s other quality assurance activities are complementary and coordinated.
To inform our work, we reviewed STAR guidance, reports, and methods for sampling and estimating accuracy of claims decisions, and analyzed STAR and VBA claims data for claims processed in fiscal year 2013, the most recent year for which complete data are available. We also assessed VBA’s methods against accepted statistical practices. In addition, we assessed the reliability of STAR and VBA claims data used for all our analyses and determined that they were sufficiently reliable for the purposes of providing information on trends in claims decisions. We also reviewed relevant VBA reports and practices for reporting accuracy and compared these against requirements for agency performance reporting and related GAO work. To determine the extent to which STAR and other key quality assurance activities are complementary and coordinated, we reviewed relevant guidance and policy documents, interviewed cognizant VBA officials, and visited four regional offices to gain a range of perspectives on how quality assurance activities are implemented at the regional office level, as well as how information is shared across quality assurance activities. We reviewed VBA’s methods for designing and implementing its consistency reviews against generally accepted practices in survey and questionnaire development. We also compared VBA’s quality assurance practices against its internal guidance and Standards for Internal Control in the Federal Government (AIMD-00-21.3.1). We intend to produce a report later this year that will provide our final results.
Our work was conducted in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions.
VA pays monthly disability compensation to veterans with service-connected disabilities (i.e., injuries or diseases incurred or aggravated while on active military duty) according to the severity of the disability. VBA staff in 57 regional offices process disability compensation claims. These claims processors include Veterans Service Representatives who gather evidence needed to determine entitlement, including VA and military medical records, and Rating Veterans Service Representatives who decide entitlement and the rating percentage. Each claim requires a determination on one or more medical conditions. In fiscal year 2013, VBA decided more than 1 million compensation claims.
Since fiscal year 1999, VBA has used the STAR program to review the decisional accuracy of disability compensation claims. Under the STAR program, VBA reviews a stratified random sample of completed claims, and certified reviewers use a checklist to assess specific aspects of each claim. Specifically, for each of the 57 regional offices, completed claims are randomly sampled each month and the data are used to produce estimates of the accuracy of all completed claims. VA reports national estimates of accuracy from its STAR program to Congress and the public through its annual performance and accountability report and annual budget submission. VBA also produces regional office accuracy estimates, which it uses to manage the program. These and its national accuracy rates are reported in a publicly-available performance database, the ASPIRE dashboard.
Prior to October 2012, VBA’s estimates of accuracy were claim-based; that is, claims with zero errors were considered accurate and, conversely, claims with one or more errors were considered inaccurate. Beginning in October 2012, VBA also began using STAR data to produce issue-based estimates of accuracy that measure the accuracy of decisions on the individual medical conditions within each claim. For example, a veteran could submit one claim seeking disability compensation for five disabling medical conditions. If VA made an incorrect decision on one of those conditions, the claim would be counted as 80 percent accurate under the issue-based measure. By comparison, under the claim-based measure, the claim would be counted as 0 percent accurate. In March 2014, VBA reported a national estimate of issue-based accuracy in its fiscal year 2015 annual budget submission and plans to update this estimate in VA’s next performance and accountability report. VBA also produces issue-based estimates by regional office, and reports them in the ASPIRE dashboard. For fiscal year 2013, the regional office claim-based accuracy rates ranged from 78.4 percent to 96.8 percent, and the issue-based accuracy rates ranged from 87.0 percent to 98.7 percent.
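The difference between the two measures can be illustrated with a short calculation. The following is a minimal sketch, not VBA's actual STAR computation: the function names are hypothetical, and pooling issues across claims is an assumption, since this statement does not describe how VBA aggregates issue-level results.

```python
from typing import List

# Each claim is a list of booleans, one per medical condition (issue):
# True means the decision on that issue was correct.
Claim = List[bool]

def claim_based_accuracy(claims: List[Claim]) -> float:
    """Share of claims in which every issue was decided correctly."""
    return sum(all(issues) for issues in claims) / len(claims)

def issue_based_accuracy(claims: List[Claim]) -> float:
    """Share of individual issues decided correctly.

    Assumption: issues are pooled across all claims before dividing.
    """
    total_issues = sum(len(issues) for issues in claims)
    correct_issues = sum(sum(issues) for issues in claims)
    return correct_issues / total_issues

# The example from the text: one claim, five conditions, one wrong decision.
five_issue_claim = [[True, True, True, True, False]]
print(claim_based_accuracy(five_issue_claim))  # 0.0
print(issue_based_accuracy(five_issue_claim))  # 0.8
```

For any set of claims, the issue-based figure computed this way is at least as high as the claim-based figure, because a claim with one error still contributes its correct issues.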
Beyond STAR, VBA has programs for conducting regional office quality reviews and for measuring the consistency of decisions. In March 2012, VBA implemented a quality review team (QRT) at each regional office, consisting of staff dedicated to conducting quality reviews. A QRT conducts individual quality reviews of claims processing staff members’ work, for performance assessment purposes. The QRT also conducts in-process reviews before claims are finalized to help prevent inaccurate decisions by identifying specific types of common errors. Such reviews also serve as learning experiences for staff members. Since fiscal year 2008, VBA has also conducted studies to assess the consistency of disability claims decisions across regional offices. Initially, these studies included inter-rater reliability studies that assess the extent to which a cross section of claims processors from all regional offices agree on an eligibility determination when reviewing the entire body of evidence from the same claim. In 2013, VBA moved beyond the inter-rater reliability studies and introduced consistency questionnaires as its primary means for assessing consistency. A questionnaire includes a brief scenario on a specific medical condition for which claims processors must correctly answer several multiple-choice questions.
When calculating accuracy rates, VBA does not follow generally accepted statistical practices. For example, VBA does not calculate the margin of error associated with each estimate that it generates, which prevents a complete understanding of trends over time and comparisons among offices. In addition, VBA does not weight the results of its STAR reviews to reflect its approach to selecting claims by regional office, which can affect the accuracy of estimates. According to our analysis of VBA data, weighting would have resulted in a small change to VBA’s nationwide claim-based accuracy rate for fiscal year 2013: from 89.5 percent to 89.1 percent. However, 29 of the 57 regional offices would have experienced a somewhat greater increase or decrease in their accuracy rates. According to VBA officials we interviewed, although STAR management used a statistician initially to help develop the way in which they measure accuracy, it currently does not use a statistician to, for example, weight STAR results and calculate margins of error for accuracy estimates. Further, VBA officials said they did not consult a statistician when developing the new issue-based accuracy measure, but rather relied on the same sampling methodology and approach for estimating accuracy as for the claim-based measure. We have previously reported that to be useful, performance information must meet users’ needs for completeness, accuracy, consistency, and validity among other factors.
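To illustrate why weighting and margins of error matter, the sketch below compares an unweighted pooled rate with a workload-weighted rate for two offices, and computes a 95 percent margin of error for one office. All office figures are hypothetical, and the formulas are the standard textbook ones for a stratified sample of proportions; they are not taken from VBA's or GAO's actual methodology.

```python
import math
from typing import List, Tuple

Z_95 = 1.96  # normal critical value for a 95 percent confidence level

# (claims completed, claims sampled, sampled claims found accurate)
Office = Tuple[int, int, int]

def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Approximate margin of error for an estimated proportion p from n reviews."""
    return z * math.sqrt(p * (1.0 - p) / n)

def unweighted_rate(offices: List[Office]) -> float:
    """Pool all sampled claims as if they were one simple random sample."""
    sampled = sum(n for _, n, _ in offices)
    accurate = sum(a for _, _, a in offices)
    return accurate / sampled

def weighted_rate(offices: List[Office]) -> float:
    """Weight each office's sample rate by its share of completed claims."""
    total_completed = sum(big_n for big_n, _, _ in offices)
    return sum((big_n / total_completed) * (a / n) for big_n, n, a in offices)

# Hypothetical data: a large office at 84 percent and a small office at
# 94 percent, each with the same number of claims sampled.
offices = [(30_000, 250, 210), (3_000, 250, 235)]

print(round(unweighted_rate(offices), 3))  # 0.89
print(round(weighted_rate(offices), 3))    # 0.849
print(round(margin_of_error(210 / 250, 250), 3))
```

Because both offices contribute equally to the pooled sample, the small office's higher rate pulls the unweighted figure upward; weighting by workload corrects for that overrepresentation.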
In addition, VBA’s accuracy reporting lacks methodological details that would help users understand the distinction between its two accuracy measures and their associated limitations. While VBA’s new issue-based measure provides additional perspective on quality of claims decisions, to date VBA has not fully explained in its public reports how the issue-based and claim-based measures differ. VBA began reporting the issue-based measure in its ASPIRE dashboard in 2013. The issue-based measure tends to be higher than the claim-based measure because the former allows for claims to be considered partially correct, whereas the claim-based measure does not. According to VBA officials, the issue-based estimate provides a better measure of quality because veterans’ claims have increasingly included multiple medical issues. Our analysis of STAR data confirms that, as the number of issues per claim increases, the chance of at least one issue being decided incorrectly within a single claim increases because there are more opportunities for error (see fig. 1). However, VA did not report in its fiscal year 2015 budget request how these measures are calculated and why the issue-based measure might be higher than the claim-based measure.
Figure 1: Claim-Based and Issue-Based Accuracy Rates by Number of Issues Claimed, Fiscal Year 2013
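The relationship between the number of issues and claim-level accuracy can be sketched under a simplifying assumption that is not drawn from our analysis: that each issue is decided correctly with the same probability, independently of the others. The function name is hypothetical, and the 0.96 value is used only because it matches the reported national issue-based rate.

```python
def claim_accuracy_under_independence(issue_accuracy: float, n_issues: int) -> float:
    """Probability that all n_issues are decided correctly.

    Assumption: issues are independent and share one accuracy rate.
    """
    return issue_accuracy ** n_issues

# With 96 percent accuracy per issue, the chance of an error-free claim
# falls as the number of claimed issues grows.
for n in (1, 2, 5, 10):
    print(n, round(claim_accuracy_under_independence(0.96, n), 3))
```

Real claims need not satisfy the independence assumption, so this shows only the direction of the effect, not its measured size.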
Further, VA has not explained in public reports that its accuracy measures are estimates which have distinct margins of error and limitations. These margins of error are necessary for users to make meaningful comparisons, for example, between the two measures or over time for the same measure. Further, each accuracy measure has distinct limitations, but VA does not report this information. For example, the claim-based measure does not provide a sense of the proportion of issues that the agency decides correctly because the measure counts an entire claim as incorrect if any error is found. On the other hand, the issue-based measure does not provide a sense of the proportion of claims that the agency decides with no errors. In prior work, we identified clarity as a key attribute to a successful performance measure, meaning that the measure is clearly stated and the associated methodology for the measure is identified. Measures that lack clarity may confuse or mislead users. We have also reported on best practices in implementing related federal performance reporting requirements, such as those in the GPRA Modernization Act of 2010. Specifically, agencies must disclose information about the accuracy and validity of their performance information in their performance plans, including the sources for their data and actions to address any limitations.
Finally, VBA’s approach to measuring accuracy is inefficient because it reviews more claims than needed to estimate accuracy. VBA randomly selects about 21 claims per month from each of its regional offices for STAR review, regardless of the offices’ varying workloads and historical accuracy rates. According to VBA, this uniform approach allows the agency to achieve a desired level of precision of its accuracy estimates for each regional office. However, this approach leads VBA to select more claims for review than are needed at regional offices where the number of claims processed has been relatively small or accuracy has been high. According to our analysis of fiscal year 2013 regional office workload and accuracy results, VBA could reduce the overall number of claims it reviewed annually by about 39 percent (over 5,000 claims) and still achieve its desired precision for its regional office accuracy estimates. Only for one regional office did we find that VBA would need to increase the number of claims currently reviewed to achieve its confidence level goal for that office. More efficient sampling could allow VBA to select fewer cases for review and free up limited resources for other important quality assurance activities, such as additional targeted accuracy reviews on specific types of error-prone or complex claims. Specifically, reviewing about 5,000 fewer claims could free up about 1,000 staff days because, according to VBA officials, STAR staff review at least 5 claims per day.
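The sampling arithmetic can be sketched with the standard sample-size formula for a proportion. The planning values footnoted in this statement (an assumed accuracy rate of 80 percent and precision of plus or minus 5 percentage points at the 95 percent confidence level) yield 246 claims per office per year. The finite population correction and the alternative inputs below are illustrative assumptions, not the method we used in our analysis.

```python
import math
from typing import Optional

def required_sample_size(
    assumed_accuracy: float,
    margin: float,
    z: float = 1.96,                   # 95 percent confidence level
    population: Optional[int] = None,  # claims completed by the office, if known
) -> int:
    """Reviews needed to estimate a proportion within +/- margin."""
    n0 = z ** 2 * assumed_accuracy * (1.0 - assumed_accuracy) / margin ** 2
    if population is not None:
        # Finite population correction: small offices need fewer reviews.
        n0 = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n0)

# Planning values cited in this statement.
print(required_sample_size(0.80, 0.05))  # 246

# Hypothetical office with a higher historical accuracy rate.
print(required_sample_size(0.95, 0.05))  # 73

# Hypothetical small office completing 2,000 claims a year.
print(required_sample_size(0.80, 0.05, population=2_000))
```

The required sample shrinks as expected accuracy moves away from 50 percent or as an office's workload gets smaller, which is why a uniform sample of about 21 claims per month can exceed what some offices need.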
Our review of STAR is ongoing. Specifically, we plan to analyze STAR data and processes to assess whether any types of claims are systematically underrepresented in accuracy reviews. In addition, we are further considering the implications of a more efficient sampling methodology on developing issue-based accuracy estimates.
VBA Has Enhanced and Coordinated Its Quality Assurance Activities, Although Gaps in Effectiveness Exist
VBA Enhanced Other Quality Assurance Activities, but Shortcomings Exist and Effectiveness is Unclear
In addition to its STAR reviews, VBA’s quality assurance framework includes other complementary activities, some of which have been enhanced to help meet its 98-percent goal for fiscal year 2015. For example, VBA established QRTs in regional offices as a means of strengthening its focus on quality at regional offices, where claims are processed. QRT personnel—like STAR reviewers—are required to pass an annual skills certification test. In addition to conducting individual quality reviews to determine whether claims processors are achieving individual accuracy targets, QRT personnel are charged with conducting in-process reviews of claims not yet finalized, looking for specific types of errors commonly made. Quality reviewers are also responsible for providing feedback to claims processors on the results of their quality reviews, including formal feedback from the results of individual quality reviews and more informal feedback from the results of in-process reviews. Typically, feedback on quality reviews is provided as reviews are completed. In addition, at the four offices we contacted, quality reviewers are available to answer questions and provide guidance to claims processors as needed.
During our site visits, however, we identified shortcomings in QRT practices and implementation that could reduce their effectiveness. With respect to assessing individual performance, we learned that three of the four offices we contacted have agreements with their local unions that prevent QRT personnel from reviewing claims processed during overtime. As a result, regional offices would be limited in their ability to address issues with the quality of work performed during overtime. VBA officials told GAO they do not know how many regional offices include or exclude claims processed during overtime, or the extent to which excluding cases worked during overtime occurs nationally. According to VBA data, claims processed on overtime represented about 10 percent of rating-related claims completed nationally in fiscal year 2013. In our ongoing work, we plan to review the extent to which this practice is followed at other regional offices and how claims processed during overtime are identified. In addition, regional offices we contacted told us that they face a challenge in conducting in-process reviews as required because VBA’s Veterans Benefits Management System lacks the capability to pause the process and prevent claims from being completed while a quality review is still underway. In our ongoing work, we will continue to review the extent to which claims are decided before in-process reviews are finalized and whether this causes rework or other problems.
VBA’s efforts to assess consistency of claims decisions have also expanded in recent years. The inter-rater reliability (IRR) reviews that VBA largely relied on to assess consistency since 2007 were time consuming: claims processors required about 4 hours to review an entire claim, the process was administered by proctors in the regional offices, and the results were hand-graded by national VBA staff. Given the resources involved, IRR reviews were typically limited to 300 to 500 claims processors (about 25 to 30 percent), randomly selected from each regional office. Since VA expanded its consistency program in 2009 to include consistency questionnaires, it now relies more heavily on this streamlined approach to assess consistency. The questionnaires require less staff time to complete because, in addition to a brief scenario on a specific condition, participants have 10 or fewer multiple choice questions to answer. The questionnaires are administered electronically through the VA Talent Management System, which has allowed VBA to increase employee participation and administer the studies more frequently. For example, a recent consistency questionnaire was taken by about 3,000 claims processing employees—representing all employees responsible for rating claims.
Although VBA has enhanced its approach to measuring consistency, VA officials told us that consistency questionnaires to date have been developed and implemented without any prior pre-testing which would allow VA to examine the clarity of questions or the validity of the expected questionnaire results. Pre-testing is a generally accepted practice in sound survey/questionnaire development and would help determine whether the test questions are appropriate for field staff and are accurately measuring consistency. A quality assurance official told us that by July 2014, VBA plans to begin pre-testing consistency questionnaires with national quality assurance staff that have claims processing experience. However, VBA has not yet developed a concrete plan or written policies for pre-testing the questionnaires.
VBA’s efforts to evaluate the effectiveness of its quality assurance activities have been limited, although VBA officials told us that they have seen some improvements in accuracy following certain quality assurance enhancements. Specifically, VBA officials told us that, although they have not seen an increase in the national accuracy rate in the current fiscal year, the number of errors related to claim development (specifically, ensuring the claim reflected sufficient and appropriate medical examinations) has declined, showing the success of QRT efforts in targeting these errors through in-process reviews and providing related training. On the other hand, VBA central office has only begun to receive data from regional office individual quality reviews and expects to begin receiving additional data and identifying accuracy trends in the summer of 2014. With respect to consistency studies, VBA has not evaluated and lacks plans to evaluate the efficacy of its new approach for conducting consistency reviews to determine its relative effectiveness. Evaluation can help to determine the “value added” of the expenditure of federal resources or to learn how to improve performance—or both. It can also play a key role in strategic planning and in program management, informing both program design and execution. In our ongoing work, we plan to further review how VBA uses the results of its consistency studies to improve quality and whether and how VBA could further leverage its vast amount of claims data to identify error trends and opportunities for improvement.
VBA Has Taken Steps
VA has taken steps to coordinate its quality assurance efforts, in part, by systematically disseminating information on national accuracy and consistency trends to regional office management and QRTs, which in turn used this information to provide feedback. With respect to STAR, regional offices receive quarterly reports on their STAR accuracy performance, and QRT reviewers receive periodic conference call updates from STAR staff to discuss error trend information. Managers or QRT members at each of the regional offices we contacted noted that they also share STAR data with claims processors through periodic training focused on STAR error trends. With respect to consistency studies, regional offices receive national results; regional office-specific results; and, since February 2014, individual staff results. Officials at each of the four regional offices we visited told us they discuss the results of consistency studies and inform claims processors of the correct answers to the questions.
Based on error trends identified through STAR and other quality assurance activities, QRT personnel are also expected to disseminate guidance and provide input into, and sometimes conduct, regional office training. For example, two of the four offices we contacted cited instances where they have used consistency study results for training purposes. In general, at each of the four offices, reviewers conduct, or work with regional office training coordinators to conduct, periodic training forums for claims processors. Regional offices we contacted also supplement training with other communications informed by quality review results. For example, QRTs at three of the four regional offices we contacted produce periodic newsletters for regional office claims processors, including guidance based on errors found in all types of reviews.
In addition to sharing STAR results for regional office training purposes, VA uses STAR to guide other quality assurance efforts. For example, according to VBA officials, the agency has used STAR data to identify error trends associated with specific medical issues, which in turn was used to target efforts to assess consistency of decisionmaking related to those issues. Recent examples are the August 2013 inter-rater reliability study, which examined rating percentages and effective dates assigned for diabetes mellitus (including peripheral neuropathy); and a February 2014 study on obtaining correct disability evaluations on certain musculoskeletal and respiratory conditions. In addition, according to VBA, the focus of in-process reviews has been guided by STAR error trend data. VBA established in-process reviews to help identify and prevent claim development errors based on medical examination and opinion-related errors, which it described as the most common error type. More recently, VBA has added two more common error types—incorrect rating percentages and incorrect effective benefit dates—to its in-process review efforts. VBA officials stated that they may add other common error types based on future STAR error analyses.
While regional offices receive a lot of information on accuracy and error trends, some quality review staff expressed concern that there was too much information. At the same time, staff in all four offices said that key supports were not sufficiently updated to help quality review staff and claims processors efficiently and effectively do their jobs. Staff at these offices consistently described problems with data systems, central guidance and training.
· Regarding data systems, at the regional office level quality assurance information is input into three different systems. Staff at all four offices we contacted said that these systems lack functionality to create reports on error trends, so they maintain spreadsheets to track additional information to allow them to assess error trends from their office’s quality reviews. At the national level, VA central office has made improvements in reporting capabilities to obtain information on errors by diagnostic code, but has not made this available to regional office quality staff.
· Regarding guidance, regional office quality review staff also said they face challenges locating the most current guidance from all of the information they are provided. Managers or staff at each of the regional offices we contacted said that VBA’s policy manuals are outdated. As a result, staff must search numerous sources of guidance to locate current policy, which is time-consuming and difficult. VBA officials acknowledged that there are several ways it provides guidance to regional offices—such as guidance letters; periodic quality calls (and notes from those calls); various bulletins; and training materials maintained on VBA’s Intranet site—and that this could be confusing to staff. VBA officials also noted that they face challenges in updating the policy manual to ensure it is as current as possible.
· Regarding training, staff in the offices we contacted also said that in some cases national training has not been updated to reflect the most current guidance. This makes it difficult to provide claims processors with the information they need to avoid future errors. For example, staff from one regional office noted that training modules on one error prone issue—Individual Unemployability and related effective dates of benefits— had not been updated to reflect all new guidance, the sources of which included conference calls, guidance letters, and Frequently Asked Questions compiled by VBA’s central office. VBA officials stated that they are continually updating national training to reflect new guidance, but that it takes time. Further, according to officials at the regional offices we contacted, VBA restricts regional offices’ flexibility to tailor course materials to address office-specific error trends. We will follow up with VBA’s national training office to discuss challenges to updating training and reasons behind restrictions on regional offices’ ability to modify training materials.
In conclusion, VBA has made important enhancements to its quality assurance program, but has missed key opportunities to fully cement its commitment to quality. Although VBA’s dual approach for measuring accuracy provides additional information on error trends, which can allow VBA to better target quality improvement efforts, VBA is producing imprecise estimates of accuracy that, while not completely reliable, are being used by program managers to guide improvement efforts. VBA also missed an opportunity to win the public’s trust when it introduced a new accuracy measure without full explanation of its meaning and limitations. At the same time, VBA is expending more resources than needed to produce its accuracy estimates—resources that could be better used to achieve more precise estimates or drill down on error trends to guide improvement efforts. VBA has bolstered regional attention to quality and national consistency efforts, but several shortcomings—such as excluding claims worked during overtime from regional office quality reviews, or the lack of pretesting of consistency questionnaires—may detract from their overall effectiveness. Moreover, VBA does not systematically track the impact of these efforts on accuracy rates, again creating a gap in information that could help focus future quality assurance efforts. Finally, while VBA has made a concerted effort to leverage and share information resulting from its national STAR and consistency studies and to disseminate relevant guidance, VBA has not taken the final step of centralizing its guidance or promptly updating national training. Claims decisions are highly complex and subjective, and staff who need to sift through a patchwork of guidance and training to make those decisions may be vulnerable to repeating prior errors. As we complete our ongoing work, we will consider any recommendations needed to address these issues.
Chairman Miller, Ranking Member Michaud, and Members of the Committee, this concludes my prepared statement. I will be pleased to answer any questions that you or other members of the Committee may have.
 Department of Veterans Affairs, Office of Inspector General, Systemic Issues Reported During Inspections at VA Regional Offices, 11-00510-167 (Washington, D.C.: May 18, 2011)
GAO, Veterans’ Disability Benefits: VA Has Improved Its Programs for Measuring Accuracy and Consistency, but Challenges Remain, GAO‑10‑530T (Washington, D.C.: March 24, 2010)
We focused on claims VBA identifies as rating-related, which require decisions on claimants’ entitlement to disability compensation, and the amounts of monthly benefits. We did not review authorization decisions, where entitlement to compensation is not an issue; for example, cases where beneficiaries have died, and VA is required to terminate compensation. Our work does not include a review of quality assurance activities associated with VBA pension claims or appealed claims.
We conducted site visits with the Newark, Oakland, Nashville and Waco VA regional offices. We selected these offices based on a range of criteria, including: (1) number of claims processed annually, (2) geography (at least one regional office in each of VA’s four areas), and (3) accuracy rates. At each office, we spoke with service center managers and quality assurance staff, as well as representatives of local veteran service organizations.
38 U.S.C. §1101 et seq. VA’s ratings are awarded in 10-percent increments, from 0 to 100 percent. Generally, VA does not pay disability compensation for disabilities rated at 0 percent. As of December 2013, basic monthly payments ranged from $130.94 for a veteran with a 10-percent disability rating and no dependents to $3,134 for a veteran with a 100-percent disability rating, a spouse and one child.
For quality assurance purposes, VBA counts one of its sub-offices as a separate regional office, in addition to its 56 regional offices. Thus, for reporting purposes, we refer to 57 offices.
The STAR review has two major components. The benefit entitlement review assesses whether the correct steps were followed in addressing all issues in the claim, collecting appropriate evidence, and whether the resulting decision was correct, including effective dates and payment rates. Accuracy performance measures are calculated based on the results of the benefit entitlement review. The STAR review also assesses whether claims processors appropriately documented the decision and notified claimants.
 The ASPIRE dashboard is an online report of VBA’s performance by program. Data are updated monthly and available by regional office and nationally. See http://www.benefits.va.gov/REPORTS/aspire_dashboard.asp.
 STAR accuracy estimates are derived from sample data and have sampling error associated with them. The confidence interval is a range of values around the estimate which is likely to include the actual population value, and helps determine whether different estimates are significantly different from a statistical perspective.
 VBA samples about the same number of claims from each regional office regardless of the offices’ varying sizes. Smaller regional offices are disproportionately represented. Thus, the set of all claims reviewed nationally does not comprise a random sample of all claims. Weighting accounts for this fact and yields more correct estimates.
In comparing the weighted accuracy estimates that we computed to the unweighted estimates that VBA reported for regional offices in fiscal year 2013, we found that weighting would increase the accuracy rate more than 0.4 percent for 17 offices and decrease the accuracy rate more than 0.4 percent for 12 offices. Weighting would increase the accuracy estimates for regional offices by as much as 2.1 percent and decrease the estimates by as much as 3.6 percent.
VBA arrived at its sample size—246 rating claims per regional office per year—based on an assumed accuracy rate of 80 percent for each regional office, and a desired precision that reflects sampling error of plus or minus 5 percentage points at the 95 percent level of confidence in accuracy estimates for each regional office.
 QRT reviewers review an average of 5 randomly-selected claims per claims processing staff member per month. For claims processing staff members found in need of accuracy improvement, 10 reviews per claims processing staff member per month may be performed.
To help reduce its claims backlog, VBA has required claims processors to work 20 hours per month of mandatory overtime during portions of fiscal years 2013 and 2014.
A regional office is expected to perform in-process reviews equivalent to 10 percent of its expected claims decisions per month, according to VBA guidance.
VBA’s Veterans Benefits Management System is intended to help streamline the claims process by allowing for paperless claims processing, including electronic claims files.
 The STAR review assesses whether adequate evidence was developed to support the rating decision. Possible development errors include failure to obtain sufficient medical records, including a medical examination or opinion.
Individual Unemployability is a part of VA’s disability compensation program that allows VA to pay benefits at the 100 percent level to veterans whose service-connected disabilities prevent them from maintaining substantial gainful employment.