
Analog Failures on RF Product Cause Production Surprise



A factory is a machine. It takes a fixed set of inputs (circuit boards, plastic enclosures, optimism) and produces a fixed set of outputs in the form of assembled products. Sometimes it is made up of real machines (see any recent video of a Tesla assembly line), but more often it’s a mixture of mechanical machines and meaty humans working together. Regardless of the exact balance, the factory machine is designed by a production engineer and goes through the same design, iteration, and polish cycle as the rest of the product (in this sense, product development is somewhat fractal). Last year [Michael Ossmann] ran into a surprise production problem that is both a chilling tale of a nasty hardware bug and a great reminder of how fragile manufacturing can be. It’s a natural fit for this year’s theme of going to production.

Surprise VCC glitching causing CPU reset

The saga begins with [Michael] receiving an urgent message from the factory: an existing product that had been in production for years was failing at such a high rate that they had stopped the production line. There are few worse notes to get from a factory! The issue was apparently “failure to program,” and Great Scott Gadgets immediately requested samples from their manufacturer to debug. What follows is a carefully described and very educational debug session from hell, involving reverse engineering ROMs, probing errant voltage rails, and testing large sample sizes. [Michael] doesn’t say how long it took to isolate the fault, but given how subtle the root cause was, we’d bet it was a long, long time.

The post stands alone as an exemplar for debugging nasty hardware glitches, but we’d like to call attention to the second root cause buried near the end of the post. What stopped the manufacturer wasn’t the hardware problem so much as the process issue it exposed. It turned out the bug had always been reproducible in about 3% of units, but the factory had never mentioned it. Why? We suspect [Michael]’s guess is correct: the operators who happened to perform the failing step had discovered a workaround years ago and quietly smoothed the failure over. Then there was a staff change, and the new operator started flagging the failure instead of fixing it. Arguably this is what should have been happening the entire time, but in this one tiny corner the manufacturing process had drifted slightly from its intended procedure. For a little more color, check out episode #440.2 of the Amp Hour to hear [Chris Gammell] talk about it with [Michael]. It’s a good reminder that a product is only as reliable as the process that builds it, and that process isn’t always as reliable as it seems.


