San Onofre steam generators – honest error driven by search for perfection
Mitsubishi Heavy Industries (MHI), the supplier that sold four new steam generators to Southern California Edison (SCE) for the San Onofre Nuclear Generating Station (SONGS), has issued a redacted version of its root cause analysis of the u-tube failures that have kept both of the station’s 1100 MWe units shut down since January 31, 2012. My analysis of the report tells me that a large, skilled, experienced team of engineers (both at the supplier and at the utility) made a design choice that resulted in unexpected and unintended consequences.
Early last month, two technically unqualified politicians (that phrase is a bit redundant, isn’t it) – Senator Boxer and Congressman Markey – took it upon themselves to demonize selected nuclear energy professionals. They extracted a few isolated phrases from a version of the root cause analysis that was not publicly available and proclaimed to the world that they had found a smoking gun “proving” that SCE had knowingly installed faulty equipment.
Aside from the fact that such an assertion was absurd – why on earth would any corporation take the risk of installing components known to be faulty into a vital, multi-billion dollar production facility capable of producing between $1 million and $10 million in daily revenue – it exposed a visceral dislike of a power source that has been cleanly and safely supplying 20% of the electricity in the United States for several decades.
It also exposed a profound distrust of one of the most squeaky clean industries in the United States; say what you want to about nuclear energy, but it does not take much time in the industry to realize just how differently it is led compared to all other money making enterprises.
One of the major difficulties in this saga is the fact that politicians rarely understand engineering, especially the constant need to make informed decisions and to balance competing requirements. No mechanical system is flawless and no material is perfectly matched to its environment. That statement is especially true when the environment is a complex heat exchanger required to operate over a wide range of temperatures in a variable mix of fluid conditions over several decades.
After many months of investigation, tens of thousands of hours of analysis, and hundreds of millions of dollars worth of lost production time, it is now clear that the two steam generators installed in San Onofre Unit 3 contained a minor manufacturing feature that resulted in a “perfect pitch” harmonic. At just the wrong condition – 100% steam flow – a combination of relatively dry steam, precisely manufactured anti-vibration bars (AVB), and densely packed u-tubes resulted in a few hundred (out of nearly 10,000) tubes vibrating with a large enough amplitude to make contact. The unexpected vibration and contact resulted in accelerated wear and caused one tube to fail while the steam generator was operating.
If Unit 3 had remained in operation, other tubes in the same area of the steam generator would have likely failed. The same problem does not exist in Unit 2.
(See MHI’s document number L5-04GA588(0) titled San Onofre Nuclear Generating Station, Units 2 & 3 Replacement Steam Generators Supplementary Technical Evaluation Report Fig 2.1-1 (1/2 and 2/2) and 2.1-2 (1/2 and 2/2) to understand the basis for that statement.)
Update (Posted on March 10, 2013 at 08:25) The full document Root Cause Analysis Report for tube wear identified in the Unit 2 and Unit 3 Steam Generators of San Onofre Nuclear Generating Station is now available from the NRC ADAMS database. End Update.
Surprisingly enough, the reason that the condition does not exist in Unit 2 is that the anti-vibration bars (AVB) in Unit 2 were made with somewhat less precision, and that imprecision prevented the perfect pitch situation. Instead of being virtually perfectly round holes through which the steam generator tubes could penetrate with tight tolerance but no contact, the AVBs in Unit 2 had enough manufacturing variation that they made contact with the penetrating tubes with an average force that was twice as high as the minor, incidental contact achieved in Unit 3.
All of the TTW tubes are located in the region of highest average void fraction, where velocities are highest and damping is lowest. Both Unit 2 and Unit 3 have the same thermal hydraulic conditions. The tube-to-AVB contact forces in the Unit 3 RSGs are smaller by a factor of two than those of the Unit 2 RSGs. Almost all of the TTW tubes were found in the Unit 3 RSGs. The difference in the contact forces explains this large difference between the two units.
(Section 2.4, Conclusion)
That extra contact force, which was considered to be undesirable by the designers at the time they designed and manufactured the tubes, provided enough unplanned disruption to the tube bundle that the harmonic vibration could not get started and could not reach enough of an amplitude to cause tube to tube wear (TTW).
It is instructive to learn that the tighter tolerances in Unit 3 were purposely chosen because the supplier was seeking continuous process improvement. MHI engineers had determined that a small change in the manufacturing process could improve the repeatability of the AVB holes. The design team agreed that the tighter tolerances resulted in a design that was “significantly more conservative than previous designs in addressing U-bend tube vibration and wear.” (page 48 of MHI’s root cause analysis)
Because the computer models used for the design process were not perfect fidelity reproductions of the complete environment of the steam generator, simulation runs did not reveal the potentially detrimental effect of the tighter tolerances.
Though the report does not come out and say this, I believe it is likely that the cause of the slight simulation infidelity was a limited data set. The San Onofre steam generators are different from any other steam generator that the company has produced; the only ones that are remotely similar are those at Fort Calhoun. Even those generators have major differences, including substantially different sizes. MHI has manufactured dozens of nuclear power plant steam generators that are reliably operating around the world, but they use a design that could not meet the requirements for installation into the San Onofre units.
Misunderstood technical details
The phrases that Boxer and Markey used to assert that there was malfeasance on the part of the utility and the supplier came in places where the root cause analysis team was discussing the design process and the reasons why certain options were determined to be less than optimal. As any reasonably experienced engineer knows, design is fundamentally about tradeoffs among many different measures of effectiveness. A choice that might improve performance in one area often imposes a major degradation in other areas – what the leaked phrases referred to as “unacceptable consequences.”
It is simply part of the design process to propose, evaluate and reject several different alternatives, even if they appear to have advantages on one or two measures. In fact, design is fundamentally the process of rejecting an entire universe of alternatives and picking the ONE that is determined to be the best choice when all measures are considered.
I am sure that, with the advantage of hindsight, the engineers at MHI will never again produce a steam generator that experiences tube to tube wear (TTW) caused by nearly perfect anti-vibration bars (AVB) in a steam generator with high quality steam. I am also sure that human engineers will continue to produce better and better products that occasionally fail to perform as expected. Just look at the Boeing Dreamliner situation or even your own last automobile purchase for examples of fine engineering that is not quite as perfect as we might like.
There are many aspects of the San Onofre saga that sadden me. It is a terrible waste of resources; operating fossil fuel powered replacement generating plants has made the air dirtier and the people in Southern California a little less safe; the situation has given people like Markey and Boxer another reason to battle against the use of nuclear energy (despite the fact that there was no public health risk or impact); and decisions made in the wake of the leak have put an important area of my home country at risk of severe power disruptions in the near future.
Most of this could have been avoided if there were not such a culture of nuclear exceptionalism in which, of all energy sources, only nuclear must be perfect. It should be enough that nuclear fission is far superior to all other choices, but we still have a long way to go before the public will accept that message. We are not being well led by technically ignorant politicians and we are not being well served by a culture of “zero tolerance” within the industry itself.
Both of the units at San Onofre, by all rights, should be operating today, supplying reliable, emission free electricity. Instead, there is no current hope that the start up will happen any time soon. The replacement fuel pump meter continues to run and fill the coffers of the competitive fuel suppliers. The 700 or so SONGS employees who have already been laid off will most likely be joined by more colleagues in the near future.
It is too idealistic to hope that either Markey or Boxer will decide that they should apologize for their accusations and their continued efforts to muddy the waters and delay (or prevent) the restoration of two of the most useful and safe electricity producing units in Southern California.
Rest of the story
I think it is important to remind everyone exactly what happened. On January 31, 2012 (more than 13 months ago), San Onofre Unit 3 operators received indications that one of the two steam generators of the plant they were running was leaking and causing a tiny, but measurable, increase in the radioactivity of the normally non-radioactive water in the secondary (steam) side of the steam generators.
When they recognized the indication, the operators took the conservative course of action and shut down the nuclear plant, even though it turns out that the leak rate (84 gallons of coolant per day) was below the plant’s allowed technical specification (150 gallons of coolant per day per steam generator).
Aside: For the true nuclear professional geeks reading this, the reference for that number is LCO 3.4.13, RCS Operational Leakage. End Aside.
According to information obtained several months later, the MAXIMUM potential dose of radiation to anyone was 5.2E-5 millirem (0.000000052 rem) which is one billion times lower than the annual limit for radiation workers at the time that I first became a nuclear energy professional.
I would also like to remind people that steam generators have periodically experienced leaks since the very first pressurized water reactor was built. The devices do a great job of keeping the slightly radioactive primary coolant contained. They are important on board submarines, where there is no place for any radioactivity to go, but on land, steam generators are more of a choice than a requirement. About 1/3 of all light water reactors do not even bother with them; boiling water reactors send primary coolant directly to the steam turbines.
The small u-tube leak never represented any risk to the public. Unit 2 was not operating at the time and never experienced any steam generator u-tube leaks. Neither unit was in violation of its operating license; before submitting a recovery plan to the NRC that committed the utility to a course of action that included asking the NRC for permission to restart, there was no legal requirement for the plant operators to ask permission to fix their plant and start it back up again.
Unfortunately, nuclear plant operators, perhaps especially those in areas represented by politicians that will take every opportunity they can get to criticize the technology, run their companies with an abundance of conservatism. They believe that they will be better off practically genuflecting in front of regulators than by aggressively defending their rights to properly decide how to run their power plants.
I know I do not speak for my employer when I say this, but that is often not the correct or safe course of action. The regulators have a job to do; they need to ask hard questions. However, the plant operator should know far more than the regulator and should willingly accept all of the responsibilities associated with owning and operating electrical power plants in the safest and most reliable manner possible.
As this situation and several others show, there is a great financial reward associated with safety and reliability.
Atomic Power Review – (March 9, 2012) San Onofre: MHI document release by NRC and what it really means
Yes Vermont Yankee – San Onofre Thoughts and Future. I told you so.
Idaho Samizdat – Reactions to reactor restart remarks about San Onofre
Something very similar happened to fuel elements – grid to rod fretting and vibration caused mechanical damage and corrosion. This was the leading cause of fuel failure for most PWR/BWR fuels. Recently, effective solutions have been adopted to solve the problem.
Fuel elements are partly removed every 12-24 months. It’s not quite so easy to replace a large fraction of the steam generator tubing, so any vibration problems in steam generators are much more troublesome than fuel vibration damage, interestingly.
Rod – What makes you think that the people of Southern California are bearing any of the ecological burden from running fossil-fuel plants to make up the difference? I’m sure that these people couldn’t care less. That’s why they keep reelecting nitwits like Boxer.
California imports more electricity than any other state. If you want to know where the bulk of the replacement power is coming from, google “Path 46.”
Though I understand your comment, neither you nor I can effectively toss stones at CA for buying its electricity from plants located in other states.
Rod – I know what you’re saying, but I’ve been a strong supporter for a third reactor at North Anna (preferably a 1650 MW EPR 😉 ) for about a decade now.
How many Southern Californians can make a similar claim?
While I agree with most of your post, I am still questioning why SCE did not go with a more experienced company for fabrication of these larger SGs (e.g., KEPCO/W or Ansaldo). Yes there may have been a price issue, but as we see here the financial and regulatory consequences are magnified when there is an error (especially in California!).
When things are going well, utilities sometimes have a tendency to cut corners. Penny wise, pound foolish, as it sometimes turns out. It also happened recently with a containment job where the post-tensioned cables in the concrete had to be removed; it was done incorrectly and the damage to the plant (and lost revenues) was astronomical. Don’t recall which nuclear plant it was.
It wasn’t because it was cheap; it was a job done incorrectly. A cheap job or item doesn’t always mean faulty, nor vice versa.
Whilst not exactly the same, this is very similar to what happened at Flamanville; however, the reason was not cost-cutting by EDF but by the construction company (using people who had not received adequate training to do the soldering, for example). Unfortunately, deep cost-cutting has become the standard by which all construction companies operate in France (EDF changed provider for some of the recent components but here too had to have them rebuilt because they were defective).
“Don’t recall which nuclear plant it was.”
Cyril- You are probably referring to the Crystal River steam generator replacement project in Florida. The containment concrete cracked, and then cracked again when they tried a fix. The utility recently decided to permanently close the plant.
Not quite, once the true cost of the fix was known and the source for that money is understood, the “utility” (either Duke or Progress) no longer has control of the decision. Thus Duke announced the decision, but it was made by Exelon. Research how NEIL, Ltd functions and is owned. Exelon has 18 votes. The same process will apply to SONGS, however the decision is currently being “deferred” pending NRC approval of SONGS2 “test”. My prediction is SONGS3 will never run again, SONGS2 is iffy, but probably never also.
Crystal River, in FL
I haven’t followed the SONGS situation closely, but one number in your post gets to me.
84 gallons a day seemed not so small an amount to me, and I wanted a comparison with amounts that mean anything to me. I converted the gallons/day figure to ml/minute and came to 220 ml a minute; that is less than my morning cup of coffee! If my tap gave me that, I would call my water supplier to complain about bad service!
Am I right that the utility shut down two reactors because of a leak of a cup of hardly radioactive water a minute? Words fail.
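For readers who want to check that conversion, here is a minimal sketch in Python. It assumes the leak rate is quoted in US liquid gallons (the post does not say which gallon is meant), and the function name is made up purely for illustration. For comparison it also converts the 150 gallons per day technical specification limit mentioned in the post.

```python
# Convert the leak rates quoted in the post from gallons/day to ml/minute.
# Assumption: "gallons" means US liquid gallons (3785.411784 ml each).
ML_PER_US_GALLON = 3785.411784
MINUTES_PER_DAY = 24 * 60


def gallons_per_day_to_ml_per_minute(gallons_per_day: float) -> float:
    """Hypothetical helper: volumetric flow in gal/day expressed as ml/min."""
    return gallons_per_day * ML_PER_US_GALLON / MINUTES_PER_DAY


leak = gallons_per_day_to_ml_per_minute(84)    # observed leak rate from the post
limit = gallons_per_day_to_ml_per_minute(150)  # tech spec limit per steam generator

print(f"leak:  {leak:.0f} ml/min")
print(f"limit: {limit:.0f} ml/min")
```

Under that assumption the result lands close to the 220 ml a minute quoted above, and the allowed limit works out to a bit under 400 ml a minute.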
Senator Boxer and Congressman Markey need to read Henry Petroski’s book To Engineer Is Human, The Role of Failure in Successful Design. It shows that nothing is perfect, that failure leads to successful design, though successful design can become too conservative, which is also detrimental.
A positive message from the steam generator failure at SONGS is that an important engineering lesson was learned. And the lesson was learned without endangering public safety. Those who want to keep SONGS closed are endangering the public now, as the replacement power is coming from more dangerous sources.
re: “Aside from the fact that such an assertion was absurd – why on earth would any corporation take the risk of installing components known to be faulty into a vital, multi-billion dollar production facility”…
For the same reason TMI Unit 2 didn’t install a $35,000 automatic condensate bypass valve as they had on Unit 1.
That is an invalid comparison. You are comparing an alleged failure to install a backup system to an allegation that SCE deliberately chose to install a major component of production that was known to be faulty.
Deciding against installing a piece of backup equipment that might never be used is different from deciding to knowingly install a faulty piece of vital equipment that is in continuous use. In the first case, the bet might pay off if the situation where the backup MIGHT be used never occurs. In the other case a faulty piece of major equipment would be likely to fail and its failure would inevitably result in a complete loss of production.
I might win if I choose not to pay extra money to install airbags in a car that was built without them. I am almost sure to lose if I purchase bald tires and plan to keep driving 24 hours per day. No rational decision maker would make the second choice, but might make the first.
OK, you missed my point. The reasons are the same: they believed the fault would never make an appearance that mattered.
No, I did not miss your point. Boxer and Markey have asserted that SCE knew that the steam generators were faulty, not that they contained some kind of low probability chance of being faulty.
That is completely different from deciding NOT to install an unnecessary piece of back up equipment on the off chance that it might be required under special circumstances.
You wrote the question “why on earth would any corporation take the risk of installing components known to be faulty into a vital, multi-billion dollar production facility?”
I answered that question, not any connection to Boxer and Markey. It’s the same reason that fire protection and fireproofing were installed knowing that it was faulty. Or not replacing it 30 years later, or the NRC ignoring it for 10 years.
They believe the likelihood is so low that they sleep well at night.
Furthermore, if you think that automatic condensate bypass valves are only needed at an “off chance” under “special circumstances” then you are overlooking things too.
Again, I disagree with your analogy. There is a huge difference between NOT installing components that your engineers do not think are necessary – and have shown why they do not believe they are necessary – and installing faulty components that you KNOW are vitally important for the functioning of the production system.
In my view, simpler is better. Prove to me that those automatic condensate by-pass valves that you are discussing are needed in anything other than an unlikely event like the one that initiated the TMI accident. Prove to me that they are the only way to address that rare event.
I have the same issue with your assertion about fire protection. Where is the evidence that the hundreds of millions spent on fire protection and improved fireproofing have done anything to improve the reliability of the plants? There was only one significant fire at a nuclear plant that I know of and it was started by a preventable, stupid human error. Just stop using a candle for inspections.
I did not make an analogy; you’re the one who continues to do that, and thereby miss my point, which answered the question you asked. You’re trapped in your own mindset.
Of course you made an analogy. You compared one situation to another, implying some kind of similarity in the thought processes involved.
I am trapped in a mindset of understanding reality, not a fantasy world in which corporations are evil even though they are run by normal, reasonable, human beings who are generally more logical and analytical than the average anonymous mudslinger on the Internet.
I did not imply a thought process, I stated that it IS the thought process. And the same thought process appears in numerous situations. There is one thought process. Now go back and reread my first post.
I never mention “evil corporations.” BTW, the Kemeny Commission was most concerned about the “mindset” of the industry. You are a good example of that at times. Stating things like no one has been harmed at Fukushima. 1. You don’t know that. 2. A million people lost their homes to the exclusion zone. If that’s mudslinging then it’s in the eye of the beholder. 😉
Also, to change the argument of needing fireproofing improvements to one of reliability, and not safety, is a real eye-opener.
We should debate on national radio some time! You are a worthy opponent.
A million people in the exclusion zone? The reported number is around 110,000 people evacuated, with a significant part from areas that are ready to be resettled now.
Thanks for the blog – definitely a refreshing read as opposed to all of the factually ignorant, scientifically illiterate, anti-nuke propaganda that passes for news these days. And I certainly agree that Boxer and Markey are two of the most vile, loathsome TOOLS in Washington DC these days – and that is definitely saying a lot! However even as pro-industry as I am (like you), I still have a hard time believing that SCE is simply the victim of unfortunate circumstances in this case, or that their troubles are a result of “setting the bar too high.”
I spent the better part of 16 years working at SONGS, the last 4-5 of that doing Root Cause Evaluations (RCEs) myself too. As you probably know, SONGS does not have a good track record when it comes to caring for their brand new, shiny, high-dollar, capital replacement equipment. Back in 2001 they wiped the newly manufactured and installed Turbine Rotor (also on U3 coincidentally) just after coming out of an outage. And while just as in this case there were a plethora of reasons in the RCE as to why this appeared to be an explainable, one-time event, I believe the underlying cause goes much deeper. Using SCE’s own processes and words, we should be looking at a Common Cause Evaluation (CCE) here, not an RCE.
Just like the new S/G’s, the new Turbine Rotors were of a highly-optimized, reworked design meant to squeeze a few extra megawatts out of the Steam Flow – yet which SCE (and the manufacturer) claimed during the process were simply “in kind” replacement parts. In fact, the entire project to “Upgrade” U2&3 (new Rotors, S/G’s, Reactor Heads, etc) was a decades long effort to increase efficiency and boost electrical output (and thus of course revenue) from the aging units – ostensibly by taking advantage of better manufacturing techniques and improved material/mechanical efficiencies – yet all of which somehow did not require a license amendment. Now I admit I am no expert on the 50.59 process, but it just seems to me that this innocent effort to wring out a few more percent MWE has resulted largely in cost overruns, lengthy delays, blown schedules, and expensive damage control and mitigation.
In all cases, SCE could (should?) have simply dusted off the old 1970’s blueprints for the components in question and shopped around for someone to build them to original spec. Instead, they opted for “new and improved” stuff which was passed off to Regulators, Shareholders, and the Public as “in kind”, but which we have all now witnessed through painful hindsight as having led to nothing but trouble. And add to this that all of these events took place during the years of SONGS’ spectacular, precipitous fall from INPO 1 to INPO 4 leads me to wonder even more. Maybe it’s just me, but this seems much more than just coincidence…
How so very true. Also costing over 2000 local jobs and dumping 1500+ houses on an already strained housing market.
How expensive is the fix?
Is the fix rocket science?
Why not give a clean, simple, comprehensive answer, and forget attacking Barbara Boxer?
The business challenge is that there is no way to estimate the cost of a fix that will be considered acceptable by the NRC, which is driven to attempt to give everyone a chance to express their opinion before making a decision. If the plant was a coal, oil or gas burner with some defects in its boilers, the owners would have found the defective tubes and inserted plugs. The plant would have been running within a week or so of its initial shutdown and the cost would have been measured in the tens of thousands of dollars.
I have read the detailed reports from the engineering firms called in to do the investigation. A safe, technically sound repair path could have been to simply plug the offending tubes and then restart the plants. It might have been prudent to remind the operators to pay a little closer attention to the radiation monitors that detect primary to secondary leaks, but as a former operator (submarines) I expect they would do that automatically. Besides, the events in January 2012 indicate that the San Onofre operators were already well trained in what to do if a tube leak develops.
All of the wailing and gnashing of teeth by both the industry and its opponents overlooks the fact that no complex system is perfect, but some are pretty darned good.
@Rod. Isn’t this exactly what the NRC recommended?
Actions for Unit 2 and 3 (p. 2 and 3):
– pressure testing of tubes.
– preventative plugging.
– corrective actions (for retainer bar-related tube wear and tube-to-tube interaction).
– protocol for inspections and operational limits.
The only addition to this was from SCE seeking a license amendment to operate Unit 2 for a test period of 1 – 2 years at reduced power. Apparently, as was stated by the company, this was to recoup costs until a solution could be implemented to mitigate the design flaws and slow down excessive tube wear. The NRC responded as anticipated and scheduled a public hearing. NRC never suggested that a technically sound repair path could not be developed in this instance (one that would restart the plants and run them on a cost effective basis at 100% power). Are you suggesting there should be a different set of rules for SCE than for other license holders with the NRC?
Ultimately, it was costs that were the deciding factor here. And these costs could have been minimized by seeking the normal process for design upgrades, and review of technical requirements and changes to existing designs. They were not in this instance, and SCE is paying a heavy price for their decision (ratepayers and consumers too). I don’t see where NRC is to blame for this. The generator design was faulty, and SCE apparently had a good sense of the risks involved (and took them anyway). Nobody is to blame for this except MHI engineers and SCE management. The regulatory process could have prevented these mistakes, and appears to be quite sound.
I am not going to disagree too much. I have a great deal of respect for the staff at the NRC. They are almost universally well trained and well educated. Their technical recommendations are generally spot on.
My disagreement with you comes in not accepting the notion that the current process is acceptable. It takes way too much time. The NRC has been politically forced into a position of being unable to consider the expense associated with delay. They bend over backwards to listen to public commentary – especially from groups that have expressed opposition. The problem with that approach is that it ignores the fact that at least SOME of the opponents MIGHT include competitors that have an economic interest in stretching the clock and the cost as far as possible.
Please understand that the ticking clock at SONGS was costing somewhere between $1 and $5 million per day, depending on gas prices and assumptions. That includes weekends and holidays. Something tells me that there was not much overtime approved at the NRC, so the bureaucratic capacity factor working the issue was about 30% (40-50 hours per week out of 168 available).
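The 30% figure is easy to check with a back-of-the-envelope calculation. The sketch below is in Python; the 40 to 50 hour work week and the $1 million to $5 million daily cost are the numbers from the comment above, while the function name and the idea of totaling the cost over a seven day week are illustrative assumptions added here.

```python
# Rough check of the "bureaucratic capacity factor" estimate.
HOURS_PER_WEEK = 7 * 24  # 168 hours available in a week


def working_fraction(hours_worked_per_week: float) -> float:
    """Hypothetical helper: share of the week actually spent working the issue."""
    return hours_worked_per_week / HOURS_PER_WEEK


low = working_fraction(40)   # standard work week
high = working_fraction(50)  # work week with some extra hours

# Replacement power cost keeps accruing every day, weekends included.
weekly_cost_low = 7 * 1_000_000   # dollars, at $1 million per day
weekly_cost_high = 7 * 5_000_000  # dollars, at $5 million per day

print(f"working fraction: {low:.1%} to {high:.1%}")
print(f"weekly cost: ${weekly_cost_low:,} to ${weekly_cost_high:,}")
```

On these inputs the working fraction comes out between roughly 24% and 30%, so "about 30%" sits at the generous end of the range.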
Boxer and pot smoking hippies living off the State eventually closed SONGS. Hope they have solar iPod chargers when their power bills double next year.
Sorry, but the people that decided to close SONGS occupy corner offices and boardroom seats. They are NOT pot smoking hippies, but financially motivated (aka greedy) corporate “leaders”. I will agree that they live – rather well – by collecting taxpayer (aka ratepayer) money using what amounts to government power.
By forcing me out of Fullerton my family is treating me as if there has been a meltdown at San Onofre. I am not economically self sufficient because I have cerebral palsy. Sometimes I wonder if I am being punished for making pro nuclear posts on the Internet. I was born in Fullerton and I lived in Fullerton almost all of my life. I told a few things about myself on the blog Friends for Fullertons Future. Unfortunately the blog was closed to further comments as of Feb. 27, 2013. People with disabilities are hurt by high energy prices more than anyone else. I knew virtually nothing about nuclear energy before I gained Internet access in 2004.