Saturday, May 29, 2010

The Toyota Mess

I am a big fan of Toyota's products. And I talk about them. A LOT! Around the office I am known as the Toyota fan boy. I do try to be balanced, but in this regard I had the good fortune of owning a particular model that simply never broke down, and that skewed my perspective completely.

As a result, the trouble they're having initially took me by surprise, and I found myself firmly in the company of the other fan boys. I was convinced that the entire thing was probably just pedal misapplication, and that a conspiracy to discredit a successful manufacturer (likely from Government Motors circles) was not entirely out of the question.

But a lot of other bugs have since come out of the woodwork. Some of them are a little silly, while others are cause for concern. Let me give two examples.

When all this started, a big production was made out of the fact that there was only one laptop in the entire USA that could read the black box in the cars. This, apparently, has an entirely plausible explanation, which, if I understand it correctly, goes like this. New legislation requires these devices to be in all new vehicles by 2012. Many vehicles already have them, but Toyota wanted to leave the actual manufacture of the hardware needed to read them to others. They also wanted to protect their intellectual property, so the rights to build these readers had to be covered by some kind of license. Sorting out such things takes time, so in the meantime all they had were prototypes, and only one of those was in the USA. If this is true, the entire matter could be explained by a simple lack of foresight rather than any intended malice, or as the saying goes: never ascribe to malice what can be adequately explained by incompetence.

At the same time, however, there is the matter of the Tundra's steering. We don't have the Tundra in South Africa, but it is basically a pickup truck, probably not unlike the Hilux, though larger. In Japan, there was a recall because of a broken part in the steering system. The engineers said that the problems were likely caused by the much smaller parking spaces and more crowded roads in Japan, which mean much more lock-to-lock steering and therefore more stress on the components involved. They therefore decided not to do the same recall in America, despite there being a limited number of reports of the same thing happening there. This, I think, does indicate a tendency to place cost savings and profit ahead of a good product. It shouldn't break in the first place, and once it was established that it does break, hiding behind an excuse that comes down to "it only breaks under extreme circumstances" just doesn't work for me. These vehicles are famous for not breaking down, even under extreme circumstances (see the Top Gear episodes where they tried to destroy a Hilux). This means that there is at least SOME truth in the allegations, and indeed in the admissions of Mr. Toyoda, that Toyota became too focused on growth and took its eye off the quality ball.

Be that as it may, what I actually want to get to is the matter of sudden unintended acceleration (SUA).

The US National Highway Traffic Safety Administration (NHTSA) has now linked 89 deaths to possible cases of SUA. As many other people have noted, there is a marked statistical skew towards older drivers, and a couple of similarities with the mid-1980s cases of SUA in the Audi 5000. As was the case with the Audi, the car is being blamed, this time in the form of some kind of electronic gremlin. While both companies initially blamed the drivers, both of them wised up and realised that it is a bad idea to insult your customers. It is therefore unlikely that driver error will receive the attention it probably deserves. Nevertheless, let's take a look at the possible causes.

In the case of Mark Saylor and the loaner Lexus, I believe the floor mats were to blame. Toyota issued a TSB (technical service bulletin) regarding the floor mat problem in 2007. The problem with TSBs is that they are aimed at dealers, not customers. Basically, Toyota was telling its dealers to stop being idiots: not to fit all-weather winter mats over the existing mats, and to secure them properly. Interestingly, this TSB is now being used as proof that Toyota knew there was a problem a full two years before it issued a recall... but that of course raises the question of whether Toyota could have known that addressing the dealers would not be sufficient, and that it also needed to tell customers not to be idiots. It really isn't that clear-cut.

Sticking with Mr. Saylor, bless his soul, there is one thing that keeps bothering me about the defense that he was a Highway Patrolman who made his living driving a car and was therefore a skilled driver. The automatic gearbox in those vehicles is still a mechanical contraption. The brakes in those vehicles are still hydraulic and completely separate from the propulsion system (unlike the Prius, where some integration exists). The brakes on every single car on the road are stronger than the engine and WILL stop the vehicle even if the throttle is wide open (a small exception is possible here, which I will get to later). Shifting into Neutral WILL disengage the engine and allow you to stop the vehicle. It seems to me that the skilled-patrolman defense has a problem, as there appears to have been some level of panic, unfamiliarity with the vehicle, or simply freezing up, as most of us would. However you look at it, even though this accident may have been caused by some kind of problem with the throttle system, the driver's level of training does not by itself prove that this must be the case.

The lawyers handling the class-action suit (and lawyers are a weird bunch, it must be said) insist that over 80% of SUA cases are not caused by floor mats and must necessarily be the result of an electronic gremlin. But if Toyota had a 40% share of the SUA statistics in 2008 (which amounts to 51 cases), and we know that part of that is down to older drivers and floor mats, is it entirely inconceivable that the share left unexplained could in fact be less than 30%? Perhaps not, but how can we be so CERTAIN that it must necessarily be an electronic gremlin?

To make matters worse for those who insist on an electronic problem (most likely software, although it's all the same thing to the layperson), there are no reported cases in Europe, Africa or Japan. This could in part be explained by market share: North America makes up 30% of Toyota's market. But Europe makes up about 20%, so if America has between 51 and 89 cases, should we not have at least one or two in Europe? Indeed, Der Spiegel notes that while no other country in the world has comparable problems with SUA, in America it seems to be a mass phenomenon. The article goes on to say that there were also cases of stuck throttles, but those drivers managed not to wreck themselves and simply had the defect repaired. Some speculate that Europeans' ability to stay alive might be due to the prevalence of manual transmissions there, and the instinct, built into every driver of a manual vehicle, to simply disengage the clutch.
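Here is a quick back-of-the-envelope version of that argument, written as a minimal C sketch. It makes one big assumption: incident counts scale linearly with each region's share of Toyota's market. It ignores fleet age, driver demographics, transmission mix and reporting culture. The 51 and 89 figures also measure different things (2008 cases versus cumulative deaths), so the result is an order of magnitude and nothing more. The function name is hypothetical.

```c
/* Naive proportional scaling of incident counts between two markets.
   Assumption: incidents are proportional to market share. Shares may be
   given in any unit (percent here), as long as both use the same unit. */
double expected_cases(double observed_cases,
                      double observed_share, double other_share) {
    return observed_cases * other_share / observed_share;
}
```

With the shares quoted above, expected_cases(51, 30, 20) gives 34 and expected_cases(89, 30, 20) gives about 59. Either figure is a long way from zero.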

But knowing how to avoid being killed by your car and having a car that will try to kill you are two entirely different things. So, could there be some truth in the matter after all? Unfortunately for me, the long-time Toyota loyalist, there could.

Suppose there is a problem in the ETC (Electronic Throttle Control) system, a system that uses sensors to measure the pedal position, more sensors to measure the actual throttle position, an electric actuator to open the throttle, and a computer that ties it all together. Suppose this problem causes the throttle to flip wide open. Then there is a way in which this could suck quite badly.

The problem is in fact the lack of sucking, or vacuum. Petrol-engined vehicles use manifold vacuum to amplify brake force. They store a small reserve of vacuum so that you can still brake once or twice even if the engine fails. In my personal experience you get to brake 4 or 5 times before the vacuum runs out, but the point is that the vacuum MAY run out unless it is replenished. There is, however, very little manifold vacuum available in an engine with a wide-open throttle. Now, if you don't panic and don't pump away your reserve vacuum, you will always be able to stop the vehicle. But if you do waste the vacuum, the brake pedal becomes hard, and unless you know the technical details, you could be forgiven for thinking that your brakes have failed. This could account for the reports of the brakes failing at the same time. It could also be another reason why Europe doesn't seem to be plagued by this problem: diesels are more prevalent there, and a diesel does not use intake vacuum because it usually has little or none. Instead it uses a dedicated vacuum pump that keeps working regardless of throttle position (as long as the engine is turning).
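The vacuum argument can be made concrete with a toy model in C. Everything here is a simplifying assumption. The reserve is counted in whole assisted brake applications (4 here, in line with the 4 or 5 mentioned above). Each press-and-release uses one application. The reserve is refilled only while the engine produces manifold vacuum. The names Booster, brake_press and assisted_presses are hypothetical.

```c
#include <stdbool.h>

/* Toy vacuum booster: reserve measured in assisted brake applications. */
typedef struct {
    int reserve;   /* assisted applications left */
    int capacity;  /* reserve when fully replenished */
} Booster;

/* One press-and-release of the brake pedal. Returns true if the press
   was power-assisted. The press uses up one unit of reserve; the reserve
   is then refilled only if the engine is producing manifold vacuum
   (i.e. the throttle is NOT wide open). */
bool brake_press(Booster *b, bool manifold_vacuum_available) {
    bool assisted = b->reserve > 0;
    if (assisted) {
        b->reserve--;
    }
    if (manifold_vacuum_available) {
        b->reserve = b->capacity;
    }
    return assisted;
}

/* Count how many of `presses` consecutive presses get power assistance. */
int assisted_presses(int capacity, bool manifold_vacuum_available, int presses) {
    Booster b = { capacity, capacity };
    int count = 0;
    for (int i = 0; i < presses; i++) {
        if (brake_press(&b, manifold_vacuum_available)) {
            count++;
        }
    }
    return count;
}
```

At wide-open throttle, assisted_presses(4, false, 10) returns 4, so the fifth press meets a hard pedal. With manifold vacuum available, all 10 presses are assisted. In this model, holding the pedal down firmly counts as a single press, which is the whole point of not pumping.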

The problem for the complainant in the above scenario is that it still involves a measure of driver error. The driver does not know 1) to hold the start/stop button for three seconds to turn off the engine (this is documented in the manual, which they did not read), 2) that shifting into Neutral will take power away from the wheels, or 3) not to pump the brakes.
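For the first point, the logic behind such a button might look something like the minimal C sketch below. The three-second hold comes from the text. Three things are assumptions for illustration, not Toyota's documented implementation: the function name, the stationary-vehicle rule, and the idea that short presses are deliberately ignored while moving.

```c
#include <stdbool.h>
#include <stdint.h>

#define EMERGENCY_HOLD_MS 3000u  /* "hold for three seconds", per the text */

/* Hypothetical push-button stop logic: returns true if the engine
   should be switched off. */
bool engine_stop_requested(bool vehicle_moving, uint32_t button_held_ms) {
    if (!vehicle_moving) {
        /* Assumption: when stationary, any press stops the engine. */
        return button_held_ms > 0;
    }
    /* While moving, brief presses are ignored; only a long hold counts. */
    return button_held_ms >= EMERGENCY_HOLD_MS;
}
```

A plausible reason for the long hold is to stop an accidental brush of the button from killing the engine at speed. That same rule means a panicking driver who merely taps the button gets nowhere.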

Unfortunately, the above scenario also holds a problem for the manufacturer, one that could take the shape of a bug in the ETC firmware. To understand this, you have to understand how the system works.

The Toyota accelerator pedal (and indeed those used by most other manufacturers that employ drive-by-wire) has two sensors attached to it. These are Hall-effect sensors, which make no physical contact and therefore do not wear. They are set up much like a voltage divider, so that increased pressure on the pedal causes increased voltages to show up at the ETC on the other side of the connecting wires. The sensors each have their own set of wires and do not share connections, not even a ground wire. The wires entering the pedal assembly are arranged so that a short between them is likely to cause a detectable fault condition. In addition, the two Hall sensors produce different output voltages for the same pedal position. A short between the signal lines will therefore push one or both readings out of range, enabling the ETC to detect the problem. You may recall that a certain Professor Gilbert managed to get around this by inserting a 200k resistor between the two signal lines. Those lines are situated at opposite ends of the pedal assembly, so it is incredibly unlikely that this will ever happen by accident, but I digress.
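The pedal cross-checking described above can be sketched in C. The voltages are invented for illustration, because the text gives no real values. Sensor A outputs 500 mV plus 3 mV per permille of pedal travel. Sensor B has the same slope from a 1300 mV offset. The tolerances are guesses. read_pedal and the constant names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical transfer functions: voltage = offset + 3 mV per permille.
   Pedal position is in permille (0 = released, 1000 = floored). */
#define A_OFFSET_MV            500
#define B_OFFSET_MV           1300
#define MV_PER_PERMILLE          3
#define RANGE_MARGIN_MV        100  /* slop allowed outside the nominal span */
#define MAX_DISAGREE_PERMILLE   50  /* 5 % of pedal travel */

/* Is a reading inside the nominal span for a sensor with this offset? */
static bool in_range(int mv, int offset_mv) {
    return mv >= offset_mv - RANGE_MARGIN_MV &&
           mv <= offset_mv + 1000 * MV_PER_PERMILLE + RANGE_MARGIN_MV;
}

/* Returns true and stores the pedal position if both readings are valid
   and agree; returns false (fault) otherwise. */
bool read_pedal(int sensor_a_mv, int sensor_b_mv, int *pedal_permille) {
    /* Open circuit or short to ground/supply: reading out of range. */
    if (!in_range(sensor_a_mv, A_OFFSET_MV) || !in_range(sensor_b_mv, B_OFFSET_MV)) {
        return false;
    }
    int pos_a = (sensor_a_mv - A_OFFSET_MV) / MV_PER_PERMILLE;
    int pos_b = (sensor_b_mv - B_OFFSET_MV) / MV_PER_PERMILLE;
    /* Signal lines shorted together: equal voltages decode to positions
       that differ by the offset gap (800 mV, about 266 permille). */
    if (abs(pos_a - pos_b) > MAX_DISAGREE_PERMILLE) {
        return false;
    }
    int pos = (pos_a + pos_b) / 2;
    if (pos < 0) pos = 0;
    if (pos > 1000) pos = 1000;
    *pedal_permille = pos;
    return true;
}
```

With these numbers, a half-pressed pedal reads 2000 mV and 2800 mV and decodes to 500 permille. If the two signal lines are shorted together they read the same voltage. Because the offsets differ by 800 mV, the decoded positions then disagree by more than 260 permille, far outside the 50-permille tolerance, so the fault is caught.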

The two values read from the pedal sensors are fed to a computer, which calculates how much current to send to the electric motor that operates the throttle. Strictly speaking it uses pulse-width modulation, but current is close enough for explanation purposes. The throttle body itself has two sensors that read the throttle position and send their values back to the computer. The computer therefore has two values for what the driver wants and two values for what is actually happening at the engine. These are all supposed to match up; if they don't, the engine is simply killed.
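Here is a minimal sketch of one control cycle of such a cross-check, in C. It is not Toyota's algorithm. Several details are assumptions: the tolerances, the ten-cycle grace period that lets the motor catch up, the toy proportional controller, and the signed duty cycle (where the sign selects motor direction). Both pedal and throttle positions are in permille, and the type and function names are hypothetical.

```c
#include <stdlib.h>

typedef enum { ETC_OK, ETC_ENGINE_CUT } EtcStatus;

typedef struct {
    int tracking_errors;  /* consecutive cycles with throttle far from target */
} EtcState;

#define PAIR_TOLERANCE       50  /* max disagreement within a sensor pair */
#define TRACKING_TOLERANCE  100  /* max gap between pedal and throttle */
#define MAX_TRACKING_ERRORS  10  /* grace period for the motor to catch up */

/* One control cycle. Inputs are positions in permille (0..1000).
   Writes a motor duty cycle in -100..100 (sign = direction). */
EtcStatus etc_step(EtcState *s, int pedal1, int pedal2, int tps1, int tps2,
                   int *duty_percent) {
    *duty_percent = 0;

    /* Each redundant sensor must agree with its partner. */
    if (abs(pedal1 - pedal2) > PAIR_TOLERANCE || abs(tps1 - tps2) > PAIR_TOLERANCE) {
        return ETC_ENGINE_CUT;
    }

    int wanted = (pedal1 + pedal2) / 2;  /* what the driver asks for */
    int actual = (tps1 + tps2) / 2;      /* what the throttle is doing */
    int error = wanted - actual;

    /* The throttle must follow the pedal within a short grace period. */
    if (abs(error) > TRACKING_TOLERANCE) {
        if (++s->tracking_errors > MAX_TRACKING_ERRORS) {
            return ETC_ENGINE_CUT;
        }
    } else {
        s->tracking_errors = 0;
    }

    /* Toy proportional controller, clamped to the PWM range. */
    int duty = error / 5;
    if (duty > 100) duty = 100;
    if (duty < -100) duty = -100;
    *duty_percent = duty;
    return ETC_OK;
}
```

A stuck or runaway throttle shows up as tracking errors. After MAX_TRACKING_ERRORS consecutive cycles out of tolerance, etc_step returns ETC_ENGINE_CUT. The grace period is the design trade-off. If it is too short, normal motor lag trips the check; if it is too long, a real fault goes unnoticed for longer.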

Is there a way that this might all go wrong? Dijkstra famously observed that testing can show the presence of bugs, but never their absence. In other words, yes. My first idea was a buffer overflow, but then I realised that the ETC probably does little or no string processing, and that properly laid-out integer values in memory should never overrun. But then I remembered the old MP9 we had in the Citi Golf...

To explain: the MP9 ECU also has a throttle position sensor. It has only one, though, since the throttle is still mechanical and the sensor is there for information purposes only. Among other things, the ECU reads the resting position of the throttle and stores it in internal flash memory. It needs this to idle the engine properly, because a petrol engine will not idle well unless the mixture is rich, but once again I'm going off topic. Over time, the resting throttle position usually increases as dirt accumulates in the throttle body. But one rainy day a piece of that dirt might drop out, and the resting position might decrease a bit.

Somewhere in its calculations, it subtracts the stored resting position from another value (presumably the current reading). When the resting position suddenly decreases, that result should be negative. But because of a simple bug in the software (using an unsigned variable), the value wraps around to an extremely large positive number. In the case of the MP9, there is extra validation code that detects this, logs an error, and falls back to some kind of safe value so that your car stays drivable, although it might not idle properly.
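The bug and an MP9-style guard can be sketched in a few lines of C. A few details are assumptions for illustration: the function names, the 16-bit raw sensor counts, and the choice of "closed" as the safe fallback. The text only says that the MP9 detects the condition, logs it, and uses some kind of safe value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy version: the difference is stored in an unsigned 16-bit variable.
   If the live reading drops below the learned rest position, the negative
   result is converted modulo 65536, so -5 becomes 65531. */
uint16_t opening_buggy(uint16_t reading, uint16_t learned_rest) {
    uint16_t opening = reading - learned_rest;
    return opening;
}

/* Guarded version: detect the "impossible" case, flag it, and fall back
   to a safe value (hypothetical choice: treat the throttle as closed). */
uint16_t opening_checked(uint16_t reading, uint16_t learned_rest, bool *fault) {
    if (reading < learned_rest) {
        *fault = true;   /* a real ECU would also log a fault code here */
        return 0;
    }
    *fault = false;
    return (uint16_t)(reading - learned_rest);
}
```

opening_buggy(47, 52) returns 65531 rather than -5, while opening_checked(47, 52, &fault) returns 0 and sets the fault flag. A real ECU might also re-learn the rest position at that point; that step is left out here.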

If you were paying attention, you probably know where this is going. If a similar bug exists in the Toyota ETC, such an extremely large positive number could be a disaster: fed into the throttle calculation, it would look like a demand for full throttle.
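To see why a huge number is so dangerous, consider a hypothetical step that turns "counts above rest" into a throttle target. The 800-count full travel and the function name are made up. The point is that a clamp that protects the actuator from silly values can also turn a wrapped value into 100%, i.e. wide open. An example of such a wrapped value is 65531, which is what a result of -5 becomes in an unsigned 16-bit variable.

```c
#include <stdint.h>

#define FULL_TRAVEL_COUNTS 800u  /* hypothetical: counts from rest to full throttle */

/* Hypothetical mapping from pedal counts above rest to a throttle target
   in percent, clamped to 100 % to protect the actuator. */
unsigned throttle_target_percent(uint16_t counts_above_rest) {
    uint32_t pct = (uint32_t)counts_above_rest * 100u / FULL_TRAVEL_COUNTS;
    return pct > 100u ? 100u : (unsigned)pct;
}
```

throttle_target_percent(400) gives 50, but throttle_target_percent(65531) gives 100. Any plausibility check has to happen before this point, because after the clamp the garbage looks like a perfectly valid request.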

Given the extremely low number of occurrences, even in America, this is probably an edge case that is incredibly unlikely to happen. What we must all hope is that nobody at Toyota is arguing that it will only happen in extremely exceptional circumstances and is therefore not a problem.

Toyota is, however, adamant that the problem is mechanical and NOT electronic. Suppose they actually thought of the edge cases (and I expect they did), and did not use a rookie programmer to write the code (I doubt they did). In that case, the possible software problem described above is either VERY hard to find, or it doesn't exist.

For the moment, I'm watching this with keen interest, because however it turns out, it will either make an extremely interesting case study in the dangers of software bugs, or it will completely vindicate (and restore) my faith.

In the meantime, I still drive a Toyota. I am not scared of it... for it is a manual, it is a diesel, and it does not have a push-button start :-)