In the second and final part of this series, EPRA president Alan Ross speaks to EPRA Fellow Jack Nicholas, a giant in the reliability industry and author of several landmark books that helped define the industry.
Full interview (including transcript): Click here for full interview
EPRA Fellows: Click here to learn more about Jack Nicholas
Join EPRA: Click here for membership information
"Reliability is often placed under maintenance--but it should not be. You cannot maintain your way to levels of reliability that aren't designed into the asset." - Jack Nicholas
"The impact of the elements of industry 4.0 and digitalization will be profound... if we manage them effectively." - Jack Nicholas
"It's very important that we capture that tribal knowledge in one way or another, and then blend it with the new knowledge that we have about modern technology and digitalization." - Jack Nicholas
Jack R. Nicholas, Jr., P.E. (California), CMRP, CRL, has been a project manager and developer of predictive and condition monitoring technologies and of maintenance and reliability programs. He worked first as a senior civilian engineer on U.S. Navy nuclear submarines and surface warships for 17 years, and then, for the past 25 years, as a consultant to and trainer of key personnel in government facilities, oil field services, mining, refining, utilities and manufacturing firms in North America (Canada, USA), Australia, Asia (China, Japan, South Korea, Hong Kong) and the Caribbean.
He has advised and trained US and UK government agency personnel (Defense, Energy, Space and Nuclear Regulation) and the Electric Power Research Institute (EPRI) on best maintenance practices, including RCM, predictive condition monitoring and related matters. In 1989 he founded what is now PdMA Inc., where he conceived its first motor circuit analysis suite and helped recruit and lead its R&D team. He helped bring the product to commercial success in the mid-1990s.
He is the author, co-author, and/or editor of twelve books on maintenance and reliability subjects and a contributor to many others. He served the Society for Maintenance and Reliability Professionals Certifying Organization (SMRPCO) for seven years from its startup in 2000 as Board Member, Exam Director, Team Leader, and Chairman of both the Certification and Accreditation Committees.
He was active for eight years in the Association for Iron & Steel Technology's Reliability Achievement Task Force and Operations & Maintenance Committee and their predecessors, serving twice as Task Force Chairman.
Specialties: Author of numerous professional articles on subjects such as RCM, PdM/ACM Management, Business Policies, Processes, Plans & Procedures (P4) Management, RCFA, Maintenance, and Reliability. Workshop and training seminar organizer and leader on the above subjects. Auditor of ACM/PdM and Policy, Plans, Processes and Procedures Programs.
Alan Ross (EPRA):
Today, our guest is someone who has made a difference in the entire reliability world, in my life, and, I know, in the lives of many, many other people: Jack Nicholas. Jack, welcome. Thank you for being with us today. [Jack: Thank you. Happy to be here.] So let me give you a quick background. If I were to give you Jack's entire bio, that would take the first 20 minutes of the podcast. We're going to do two separate podcasts, part one and part two. In the first one, I really want to focus on how Jack got to where he is. Jack is a public speaker and a consultant. He is self-employed, but I know he does a lot of different things for a lot of different organizations. Jack was one of the founding members of the CMRP program at SMRP.
I know he had a lot to do with that. He is a frequent speaker at SMRP and at Reliabilityweb conferences. I have heard him probably a half a dozen times, and every time it's been a delight. He has been a project manager and developer of predictive and condition monitoring technologies and maintenance and reliability programs. Now that's a mouthful. After serving in the Navy (he'll share some of that with us), Jack worked as a senior civilian engineer on U.S. Navy nuclear submarines and warships for 17 years. Now, that's a great place to learn reliability, right, Jack? [Jack: Absolutely. Yeah.] He has traveled all over the world. He now works with companies, helping people with their overall reliability programs, and he is clearly one of the thought leaders in our industry. Jack has authored or coauthored 12 books on maintenance and reliability subjects and has contributed to many other books.
I know he's, again, affected my life. But Jack, what I want to do is get into a little flow here, because I know some of your background, and I want our listeners, especially the young folks who think they know something about reliability, to know where it all started. Tell me how you got into this world of reliability, starting back in the Navy.
I started in reliability by incredible luck. I was assigned as a fresh ensign to a Navy destroyer. It was extremely helpful to listen to the enlisted and officer personnel, right from the first few hours on board and for the 38 months thereafter on that ship. I'll give you an example. I landed on that ship from a helicopter, having flown over from an aircraft carrier, and that was after having come through six different countries to get to the ship in the middle of the Lebanon crisis in the Mediterranean in 1958. As I touched the deck, the chief boatswain's mate grabbed my legs and brought me down, then sent me to the bridge, telling me my sea bag would be on my bunk and I'd see it in a few hours. Little did I know when I got to the ship that they had left port so quickly after the beginning of the Lebanon crisis that half of the wardroom and about a third of the crew didn't make it to the ship in time. So they were very short of people, and everybody on board at that point was exhausted from back-to-back watches and long hours at general quarters, awaiting any kind of reaction from anyone they might encounter in the Eastern Mediterranean.
When I got to the bridge, the commanding officer explained all that to me. Then he said he was going to turn over the deck to me, given that I had some experience: I had told him that during my first-class cruise I had stood watches on the bridge and knew what was going on. So he basically turned the bridge over to me. I was the only officer there, but I was in the hands of some very capable enlisted personnel, and right from the start they protected me. They made me do the right things. So right away I got involved in the ship. I was assigned to the engineering department, but I stood watches at sea on the bridge and was assigned battle stations in a gun director for air and surface warfare and on the bridge for anti-submarine warfare. This was an incredible learning experience right from the start. After about 20 months serving in all the engineering divisions, I was promoted to chief engineer, with about 80 personnel in the department, which was responsible for all the propulsion, auxiliary and hotel services on board.
The ship had been run hard and put away wet on too many occasions, and by that I really mean put away wet. For example, during my tour as chief engineer, we had to replace the uptakes that went to the ship's stacks, because they hadn't always put the covers on. Rain caused the steel in the uptakes to weather and eventually rust to the point where there were holes in it. The whole uptake system had to be replaced: we removed the stacks and put in new uptakes made of stainless steel. Thereafter, of course, we religiously put the covers back on the stacks whenever we were in port and the boilers were shut down. But we never missed a commitment. We were deployed overseas for about 50% of my time on board, and we had our share of difficulties. They had to do with the reliability of the equipment, which was very simple but often prone to failure, so we had to learn how to fix it quickly, which was our specialty. We really didn't have any kind of predictive maintenance or asset condition monitoring tools at that time. We went with our gut and the experience of some really fine chief petty officers I had on board, who kept us out of trouble even though we had troubles to deal with all the time. That's where I really got my start in reliability, from the ground up, with an extraordinary crew.
Alan Ross (EPRA):
The idea that you did it under fire is really something when you think about it. I don't want to draw too close a parallel with a time of crisis or potential battle, but we put a lot of people in charge of the reliability of old plants all the time, and they may not even have an idea of what's going on at the plant; suddenly they're thrown into it. And it is that CPO, that person in the trenches doing the testing and maintenance every day, who really can help a reliability engineer, a newly appointed electrical reliability engineer. Many times it is the relationships they develop with the plant personnel, the on-floor personnel, that lead to either success or failure, because those folks have a lot of legacy knowledge, native knowledge. They understand the equipment. So that's a great point you make: you learn from others regardless of position. I want to go back and focus on the time you were on a nuclear sub. I've heard your story, and I love it. I think the ship you were on actually hit an aircraft carrier, or came close. Talk to me about that.
Well, first of all, I was selected for the naval nuclear propulsion program by Admiral Rickover in 1961. He's considered the father of the nuclear Navy. At that time, there were only about 12 nuclear submarines in commission. Twenty-five years later, when I retired from civil service and from the Naval Reserve in the same month, we had 128 nuclear submarines, and at that point I had responsibility, which I'll talk about later, for 122 of those 128 ships from the standpoint of asset condition monitoring. I was also assigned to the world's first nuclear-powered ship, USS Nautilus, an attack submarine. Here again, I was fortunate to have another really helpful group of enlisted personnel and officers, who aided in my qualification in submarines and, near the end of my 44-month tour, my qualification for command of submarines. During that tour, the ship was assigned to the Portsmouth Naval Shipyard about 10 months after the loss of the USS Thresher with 129 souls on board. Thresher had been conducting sea trials from the shipyard and had a major casualty that caused her to lose buoyancy and sink to the bottom [inaudible]. That's where I really got my hands-on experience in reliability, specifically as the Navy's SUBSAFE program was being developed as a result of the loss of the Thresher. Our initial time in the shipyard was supposed to be about 14 months, but because of the loss of the Thresher and the strong desire to modify submarines to rectify some of the problems associated with that loss, they extended our shipyard time to 27 months. During that time I had access to the then top-secret report on the loss of the Thresher and to some of the people who served on the board.
That way I could learn as much as possible about that loss and what we could do to ensure we didn't suffer the same problems. The culture of the naval nuclear propulsion program was one of constant emphasis on the reliability of personnel and machinery, and that continues to resonate to this day, mostly throughout the submarine force, but also in the nuclear propulsion departments of our aircraft carriers and, to some extent, in the commercial nuclear power industry. So I had a good opportunity there to learn, again, from people who had actually experienced a loss of reliability, and how to avoid it in the future. With respect to the collision you referred to, we actually did collide with the USS Essex (CVS-9), one of the ships that came out of World War II, a heroic ship. We were making simulated torpedo attacks on her during an exercise off the Atlantic coast in 1966.
We were submerged and had been conducting simulated attacks on her for several hours. I was directly involved in the events leading up to the collision, which occurred in spite of a recommendation of mine that, if followed, could have avoided it. I was actually the officer of the deck making the approach on the ship when we wound up colliding. I had noticed, while watching some of the indicators from sonar, that the patterns were quite different from all the previous attacks we had conducted that day. Having looked at that, and at the plot, and seeing other things, I honestly admitted to having lost the picture. I couldn't figure out where we were, but I was afraid we were too close. I recommended to the commanding officer, who was there with me, and to the second in command, the executive officer and fire control coordinator, that we pass under the ship and attack her from the other side.
They declined to take that recommendation. The commanding officer relieved me of the conn and decided to go up and take a look. To try to get a range, we used our active sonar for the first time; up to that point we had not used it, in order to preserve our stealth. It turns out we were so close that the range didn't register on the sonar. So we really didn't know where we were. We'd all lost the picture. So we went up, and the commanding officer was basically looking at rivets on the ship. Because we had trained our crew so effectively, a two-word order got the ship much deeper than the periscope depth we were at, and much faster than normal. We'd practiced that time and time again. And because those enlisted personnel, eight in this case, did exactly the right thing, we got down deep enough that we were hit only at the top of the sail instead of in the hull. If we had been hit in the hull, it would have been an instant catastrophic failure and the loss of the whole ship. Recognize that this was only three years after the loss of the Thresher and about three years before the loss of the Scorpion, the second nuclear-powered submarine we lost in that decade. The loss of the Nautilus would have made three out of the 12 or so ships we had in commission at the time, which would have been disastrous for the submarine force. We were a submarine that used World War II weapons and tactics with a generation-jumping propulsion system. We were learning new things all the time, but we hadn't learned them fast enough to avoid that particular collision.
Alan Ross (EPRA):
Okay. My next question has got to be: what were the two words that got you to dive, to get down?
The two words were "Emergency deep." They were sent to the diving party, which was one deck below us, through an open hatch and stairway. He used the voice tube to get that message down, but you could hear him all over the ship, or at least in the attack center. And because all of the people involved knew exactly what to do, we got down faster than you would ordinarily change the ship's depth. That basically saved us from the catastrophic collision that could have occurred. As a result, of course, I learned a great many lessons, which I have put into a multimedia keynote speech that I've given about six times, to audiences of up to a thousand or more. It has been really well received, because the life of a submariner and the things you do on submarines are so foreign to almost everybody in any audience I've spoken to that it's fascinating for them to hear what I have to say.
Alan Ross (EPRA):
It seems like everything you've talked about is so far advanced. It's like aviation got to reliability before the steel industry did, and I know you've been involved with AIST. There are industries still coming into it. Commercially, we realized that data centers are not necessarily reliable. They rely on redundancy, and as they are now being pushed to cut back on costs, they have to move from redundancy to reliability. There are a lot of different industries that are just now going from a maintenance mindset to a reliability mindset, which we'll talk about in a minute. But I want to talk about you. You are a prolific author; you've published a lot of books, and you've got one that has just been published, or is about to be published, with Reliabilityweb. Talk to me about your writing. When you write a book, or a chapter of a book, what do you want readers to get from it?
Well, first of all, I write for people on the mill deck and their immediate supervisors, who speak a language that's quite different from that of people in the C-suites of an organization. My intention is to enhance understanding between the highest and lowest personnel so there's alignment between their positions in support of the overall organization. The book "Asset Condition Monitoring Management," published in 2016, is really a follow-on to a book called "Predictive Maintenance Management," which had been in print for about 15 years at that time and focused on the technical aspects of the technologies used in any predictive maintenance program. I relegated that material to an appendix of the new book, which really concentrates on the management issues associated with getting from zero to a viable, award-winning asset condition monitoring program.
I actually changed the title to asset condition monitoring because people started using that term, and it looked like the wave of the future, and it still is. What I tried to do was align the book with the newly minted ISO 55000 series of standards and bring in things like big data management, advanced analytics (which I covered in some detail in the current edition), artificial intelligence, machine learning, and line-of-sight alignment of policies, plans, processes, and procedures from the top to the bottom of an organization, along with all the related subjects that help you become world-class in that particular segment of maintenance and reliability. Also, I just finished the list of changes and corrections for the 10th edition of volume two of a set of four books titled "Motor Electrical Predictive Maintenance and Testing."
All of the above keeps me current on the subjects of the books. I have the luxury of being semi-retired and able to do this. The internet has been a great source of new, up-to-date information, but I also get many ideas from print media, which I'm constantly reading. So I have a lot of ideas. I hope I can live long enough to work on all of them. That's where I'm at.
Alan Ross (EPRA):
Semi-retired doesn't sound like you're retired. But you're an example of what we call legacy knowledge that we need to capture, and you're capturing it in books, in your talks, in keynote speeches, and in a plethora of different ways. So we're going to end this part of our conversation and take a short break.
Welcome back to our podcast with Jack Nicholas. This is part two of our podcast series. One of the things we're doing at EPRA is creating podcasts that we call Legends, with people like Jack, or John McDonald from GE, some of the incredibly gifted people who are passing on their knowledge from one generation to the next.
I hope you listened to part one, because of the whole concept of knowledge being passed from one generation to the next. Jack covered receiving knowledge from the people on deck, the CPOs and the enlisted men, when he was in the Navy, and now he is passing that knowledge on to people on the mill floor through his writings. But I want to switch gears and talk about reliability. You mentioned that, from the very beginning, the reliability of the nuclear submarine fleet was incredibly critical, especially after the loss of the Thresher. I've been in a different part of the world for most of my life, first as a mechanical engineer and then as a business leader, and as a C-level business leader I always considered maintenance to be a cost center.
I'm embarrassed to say that one of the things I constantly said was, "We've got to make sure we hold down our costs, hold down our costs, hold down our costs," and cost centers are the place we ought to look. So reliability has become a subset of maintenance, and that has been a potential liability for reliability professionals, since maintenance is a cost center and people are trying to cut costs, whereas reliability should be considered an investment in asset integrity. I have created a term, return on asset reliability, and return on asset reliability is different from return on assets. If you look at the investment, it is asset integrity, especially in the electrical system. Jack, I know you're not an electrical reliability expert, but you are a reliability expert, and you do know a lot about motors, which to me are the end result, the medium-voltage and low-voltage end of it. At EPRA, we primarily focus on the high-voltage end. Talk to me about how you cannot maintain your way to any sort of reliability, especially in an electrical system, and about where you see reliability in this world of SMRP and Reliabilityweb.
Well, first of all, let me say that reliability is often placed under maintenance, but it should not be. You cannot maintain your way to levels of reliability that aren't designed into the asset. In other words, reliability is a function of design and of other factors having little to do with maintenance, for example, how you construct the asset or production line you're trying to maintain. Once you have developed a maintenance program for a given system, you can often bring it up only to its operational capability and its inherent design reliability. The only way to get above that is to redesign or modify the system to meet the new challenges revealed by failures, which show where you have the least margin to maximum reliability. However, once operations begin, maintenance personnel can often determine what that level of reliability actually is and pinpoint where design improvements can enhance it.
Maintenance personnel can also greatly enhance root cause analysis investigations and help implement the corrective actions needed to mitigate or eliminate problems. But everyone must cooperate if improvement in reliability is to be achieved. For example, the C-level must provide the resources to make the design improvements. Engineering must provide the specifications for design changes. Procurement must make the proper purchases of specified items, and contractors responsible for installation must do their jobs properly so everything comes out right. Maintenance must adjust its processes and procedures to accommodate the changes in assets and anticipate needs as changes are implemented in the redesigned system, however small that redesign might be. My attitude toward this is that reliability directly affects the economics and/or mission readiness of an organization, and so does maintenance, whether in commercial or nonprofit pursuits. As such, it should be, and often is, of direct interest to the leader and, by extension, to all levels below him or her. Some of the best, most enlightened leaders have reliability managers reporting directly to them or to their subordinates.
There should be ongoing efforts to educate and train various subject matter experts on the techniques and processes that ensure continuous sustainment and improvement of reliability as assets change with age, because most assets are designed or procured on the basis of least cost, with all the competitors trying to sell you the things you put into your systems. You also need to take into account changes due to operating conditions, production target changes, or other influences. For example, I've been working with SKF recently on a pro bono basis, developing a chapter of a free book they intend to put on the internet about improving reliability as you try to ramp up production of new things like masks, face shields, and other items needed by medical personnel to offset the coronavirus pandemic.
There should be ongoing efforts to meet the production target changes that are going to occur in all of the existing installations, or in the startups of new installations, aimed at helping us mitigate this problem. These efforts include, but are not limited to, the application of methodologies such as reliability-centered maintenance, defect elimination, and risk threshold investigations; the disciplined use of processes and procedures for operations and maintenance; proper maintenance planning and scheduling; and the development of plans to enhance the overall maintenance effort. There's so much to do that one person in many organizations can't be expected to succeed without knowledgeable helpers, who inevitably will pay for themselves many times over as they do their jobs.
Alan Ross (EPRA):
Jack, you just described what we're calling the systems map. In electrical system reliability, the first and most important step is C-level sponsorship. It is critically important because the electrical system has been the forgotten stepchild. You know, the transformer's behind the fence, it's out there, it's worked for 40 years; everything's old, but it works. Unfortunately, that's not true anymore. It's old, and many times it doesn't work. And when we make an asset change, say we replace a transformer because it's old, we might actually, because of the design, reduce the reliability of the system: the whole flow of power from the transformer down to the motor, and from the motor into whatever line it's running. So we're putting in place what we call the systems map, an electric power reliability systems map.
It actually follows all of your steps. You've got to have a team of people. You have to have procurement, plant operations, and maintenance. You've got to have a lot of people buy into the process and the program. The other components, the technical components of the testing and maintenance programs, data management, and data collection, are all a separate part, because that's more the technology of electric power reliability, whereas the organizational structure is actually the most important part. Designing that EPR team is the most critical task that someone responsible for the reliability of a plant can assign to the person given responsibility for electrical power reliability at that plant. That's the perfect segue into the next major topic, which you mentioned in part one when you talked about your role and what you've done: we're moving into this world of artificial intelligence, machine learning, digital twins, maintenance 4.0, you name it.
There's a new buzzword for every change going on in manufacturing, in commercial settings, in a lot of different places. And I'm going to add that there is a huge emphasis on automated systems, which is changing power quality demands more than ever. We have a case study we've done. In that case study, a food processing company installed new freezer systems, a multimillion-dollar replacement of old freezers in which the food at the top of the freezer would be at a different temperature than the food at the bottom, so they had to manually pull out trays and check on it. The new system did it automatically: it moved fans up and down so the temperature stayed constant. Beautiful system. However, when they turned it on, it began to fail randomly; different freezers failed at random.
They could not figure it out. They brought in the freezer company, a whole group of engineers, and after days of this they decided: we're going to take these out, take them back, and bring the old system back in. One of the staff electricians saw the group of white-shirted people over there, some with ties, went over, listened a little bit, and said, "Can I make a recommendation? There have always been harmonics in the direct feed from that transformer to the motor control system that's controlling these freezers. You might want to check that." They did. The small harmonics, which cost $1,500 to fix, had been there all along but had never been a problem with the old freezers. With the new automated systems, that little bit of harmonics in the line, which is present in a lot of places, shut down these freezers. What could have been a million-dollar problem was solved with $1,500. It was a power quality issue. So you add automated machinery into a system, like robots and these automated self-controlled units, and you add this whole AI, ML, digital twins, and maintenance 4.0. What do you see as their impact on reliability in the future?
Well, first, I think the impact of the elements of Industry 4.0 and digitalization will be profound, if we manage them effectively. For example, experts in this field have often lamented the loss of tribal knowledge as older members of the workforce retire, like that person who understood how those harmonics affected the modern electronics you were trying to make use of. This is exacerbated by automation and by reductions in the labor force that maintains these operating assets. As machine learning and properly maintained digital twins take hold, much of the tribal knowledge can be captured in perpetuity for an asset. More importantly, the wisdom we gain from these and other currently short-memory sources can be continually enhanced as assets age or are subjected to changed operating conditions. How to manage the data and the results of analysis using Industry 4.0 and the concepts made possible by digitalization will be addressed in part by one of the chapters under development for the second edition of the book I mentioned above, "Asset Condition Monitoring Management." So I think digital twins, artificial intelligence, machine learning, and all the elements of maintenance 4.0 will have profound effects on reliability in the future. But again, they've got to be properly managed by people who understand the details and some of the tribal knowledge that would otherwise have been lost had we not put it into these elements of Industry 4.0.
Alan Ross (EPRA):
Jack, I want to ask you one more question, based on what you've said here. There are generational changes taking place, not only with AI, ML, and digitalization; there's also a generational change as boomers retire, and not everybody is semi-retiring like you and continuing to work and give to the industry. A lot of people are retiring, and they leave with that tribal knowledge. And the next generation is much more prone to using technology than maybe my generation is. I joke with my sons: they say they're digital and accuse me of being analog, and I say, "Son, I am not analog. I'm Amish. I'm still trying to figure out this entire world." But talk a little bit about your experience and your recommendations, not to our generation, but to the next generation of young people coming in. What advice would you give them about how to capture as much knowledge as they can and how they can make a positive impact on reliability? Go back to the time when you were that young ensign and talk to the next group of young ensigns.
Those people will learn a lot if they learn to listen. Listen to the people who are going out the door and try to capture as much of their knowledge as possible, because they've got a lot of it and you may need it in the future. It's very important that we capture that tribal knowledge in one way or another, and then blend it with the new knowledge that we have about modern technology and digitalization and the other elements that will make you successful in the business you're in. So you want to blend the wisdom you can gain from those people with your own knowledge of what's new on the horizon, or of what you're dealing with in your day-to-day jobs. I don't see today's entry into the business of maintenance and reliability as any less challenging than when I started. It's even more so because of all the new technologies that are available. So take it on board, learn how to use it, and apply it, and we'll be successful as a country and in our individual businesses, particularly in the electrical power reliability area.
Alan Ross (EPRA):
You know, I think it's funny, Jack, that you had to leave it with "take it on board." So this has been a podcast from [inaudible], and our guest has been Jack Nicholas. I want to end this podcast with one challenge. You've just been listening to one of the legacy leaders of reliability, Jack Nicholas. I would challenge all of the members of EPRA, and those listening who may not be members (it doesn't really matter), to be leaders: take a role that says, "I can take what I've learned, learn more, and then share it," because our philosophy at EPRA is: learn it, do it, then teach it.