
This isn’t very big news. Issues occur during bring-up often. Linde’s processes are possibly so power intensive that failing over to generator power is not possible. TSMC is right to put Linde on notice since Linde should have a PFMEA and control plan to eliminate any root causes for downtime. I suspect in the long term TSMC has plans to insource this if the issue persists. Scrap happens sometimes during manufacturing. If the writer only has journalism experience and no manufacturing experience, then they may not have a conceptual understanding of acceptable first pass yield. After all, the TSMC logo features failing parts!


In many ways I agree with you, but the problem statement (constrained/exhausted gas supply from vendor) makes it seem like this was not just line down, but the whole factory stopped for a few hours. Line down is a miserable migraine but still manageable... while a whole factory stoppage makes a lobotomy seem like a good idea. It also sounds like there was not enough forewarning to park critical customer wafers in a "safe" stage of the process.

Even so, I would still call this another Monday at a semiconductor factory. Welcome! Here we play a nearly endless game of whack-a-mole. Here's your mallet and your towel. Now whack enough of the moles hard enough until they stop coming back (at least through the same holes). Beware the alpha moles.

By any road, I am surprised to see even this high-level perspective on a quality event disclosed to the mainstream public; I thought this was not standard practice. I enjoyed the read.


Just curious, would a full factory stoppage require recalibration or revalidation of certain equipment? Or is it more like an atmospheric issue that only affects the product?


I'm also curious. It's not like the power went out and machines shut down unsafely, though.


Sorry for the delay friend, I missed your message.

The number of issues that a semiconductor factory stoppage would cause stretches one's imagination, and it is worse if you cannot bring the material to a "safe" spot on the line. I will try to capture a few of them, off the top of my head.

As you alluded to, contamination is the big one. You really need power to keep things clean. But also, the process that runs in the factory is just assumed by default to run all the time, and you optimize the process around that assumption. In a system with thousands of operations (and many suboperations within each operation), the process window is just too small to tolerate much deviation, and the process window is certainly not explored around a hard restart like this. We want to prevent it from running under these conditions at all!

Now for some more details:

- If your fab air handling/pumping system stops, particle counts will explode. This in turn causes killer defects on the process material.

- You also can't keep your tools evacuated at high vacuum / ultra-high vacuum levels (effectively, atomically pure). Pumping down to this level is not trivial and can take weeks of work to restore if the vacuum chamber is badly contaminated. Fab air is much better than the labs I used pumps in, but it is still a big job to keep these chambers pristine.

- Many tools are implicitly dependent on continuous operation and consumption of feedstock and workpieces (often called tool conditioning). For example, letting a dry etch chamber idle means it will inevitably develop some kind of contamination layer over the previous chamber-wall conditioning layer. This can happen very fast (think ~30 min) even when the tool is idling under ideal conditions, and it often forces process module friends to run "dummy" conditioning wafers to manage the issue. Now imagine what might happen under non-ideal conditions.

- Feedstock / consumables can go bad very fast. There are wet and gaseous feedstocks trapped in the lines of every single tool, and most modules don't characterize what happens to the feedstock quality when the tool is shut down, at all. Related, I remember a story where a lab was having a terrible time replicating what was happening in a foundry due to particle contamination from wet cleans/etch. It turns out that the particulate was coming from the plastic jugs holding the wet chemistry. The explanation was that the fab used that chemistry so much and so fast that the particulate contamination was never a problem, while the lab might have held the half-full jugs for months, causing plastic bits to build up in the chemistry.

- The engineers must prove that their tools/segments work as spec'd post restart. This is exhausting and painstaking work. Bringing tools back up to production in the course of normal operation is already tiresome enough. But you cannot just run critical material and hope for the best! So now you must spend days validating the entire process line again.

- You can try to shelve / store key material to avert true disaster, but there are critical segments where this is impossible due to reactivity or sensitivity or whatever. You have a finite amount of time to get your material out of those high risk segments, and if the gas supplier only gives you an hour of forewarning, all that material might be totally screwed and there is virtually nothing you can do except cross your fingers. The material would likely be scrapped anyways since the risk is known to be too high to bother processing it further.

- There is also a finite amount of time that the wafers can spend in stores, even if they are pulled off the line in "safe" segments of the process. They will still collect particles, they will oxidize, and surface quality will degrade as long as they are not in optimal conditions. Cleans are an option, but you must be sure those cleans do address the specific types of contamination the wafers collected while in the stocks.

OK, that's what I could immediately think of off the top of my head in the time I have available. Hope that satiates your curiosity for the moment.


> After all, the TSMC logo features failing parts!

I'm not sure about that. I think the blank spaces are just parts that have been picked: the dies have been cut and the good ones are being removed.


Normally a wafer would have die-sized spaces for test structures used for optical, electrical, chemical and other tests. Think the TV test card https://en.wikipedia.org/wiki/Test_card


    > This isn’t very big news.
The opening paragraph feels a bit pearl clutching to me.

    > the company had to scrap thousands of wafers that were in production for clients at the site which include Apple, Nvidia, and AMD.
Eh. So what? I am sure they scrap thousands of wafers for all kinds of other reasons. It would be better to know the cost per hour of a total plant shutdown. (Of course, I'm sure the author doesn't have this information.)

    > After all, the TSMC logo features failing parts!
Final hat tip here. I never knew that.



