SUMMARY: VME board problems on 4/300

From: Chip Campbell (SYSCHIP@wren.sunnybrook.utoronto.ca)
Date: Fri Jul 26 1991 - 21:47:15 CDT


I recently asked about a problem we're having with a specific VME card
on our 4/360: the card quit working when we upgraded that machine's
CPU from a 3/260. I have received, to date, kind and helpful replies
from:

pvo@oce.orst.edu (Paul O'Neill)
vasey@mcc.com (Ron Vasey)
kevins@Aus.Sun.COM (Kevin Sheehan {Consulting Poster Child})
miker@sbcoc.com (Mike Raffety)
curt@ecn.purdue.edu (Curt Freeland)
tg@utstat.toronto.edu
todd@flex.Eng.McMaster.CA (Todd Pfaff)
stern@sunne.East.Sun.COM (Hal Stern - Consultant)

What follows is a summary of these replies. As we investigate the problem, I
will post a further summary of what we find and how (if) we fix it.

My original posting is at the very end.

Chip Campbell
Sunnybrook Health Science Centre
(University of Toronto)
Toronto, Ontario, Canada
campbell@srcl.sunnybrook.utoronto.ca
(416) 480-5718

====================== Summary follows ==============================

>What image processor is this?

MegaVision 1024 XM, from MegaVision Inc. in Goleta, Calif.

>re the 4/110:
>4/110 cpu boards are never bus slaves, always bus masters
>4/110 don't use address lines 28-31

>I'm not a system engineer, but I have read many messages in the past
>detailing timing problems in SUN's VME bus. (Basically, they use
>shorter-than-spec timing on certain WAIT and SYNC signals.) And this
>varies quite a bit between CPUs.

>There was a problem with the 4/300 VME implementation as I remember,
>and you should be able to get Sun to upgrade the board for you. The
>weird thing is that it fails in a 4/280, as that was one of the better
>cage/CPU combinations. Also weird that it worked in the 3/260, as that
>was a legal, if somewhat strange implementation.

>The other interesting thing is the sick CPU - the problem of which I
>speak was only with the VME board, it didn't affect the CPU at all. I
>suspect what you are running into here is the smaller power supply of
>the 4/300, not a VME problem per se.

Hmmm. We didn't change the power supply with the upgrade, so the 4/360
is running with the (larger?) power supply of the 3/260. It does have
the "six-fan tray" that the upgrade requires, due to the higher heat
output of the 4/300 cpu board. The machine also has a TAAC in it, which
draws a lot of power.

[ diag led's reading (downwards, 1=on) 11010001. ]
>This means the video memory failed. This is from the SunOS 4.0.3 PROM
>User's Manual, page 179. Sun unbundled this manual for SunOS 4.1 (;-(.
>Perhaps you DO have an addressing conflict?
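
For anyone looking the pattern up in a PROM manual, it may help to have
it as a number. The sketch below converts the eight LED states to hex.
It assumes nothing about which end of the LED column is the most
significant bit (the posting only says "downwards, 1=on"), so it prints
both readings; the function name led_pattern_to_int is made up for
illustration.

```python
def led_pattern_to_int(bits: str, top_is_msb: bool = True) -> int:
    """Convert eight LED states ('1' = on), listed top to bottom, to an int."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight 0/1 characters")
    # If the top LED is the least significant bit, reverse before parsing.
    ordered = bits if top_is_msb else bits[::-1]
    return int(ordered, 2)


pattern = "11010001"  # as reported, reading downwards
as_read = led_pattern_to_int(pattern)
flipped = led_pattern_to_int(pattern, top_is_msb=False)

print(f"top LED as MSB: 0x{as_read:02X}")
print(f"top LED as LSB: 0x{flipped:02X}")
```

Which of the two values matches the table in the manual depends on the
bit order the manual uses, which is not stated here.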

> Have you
>tried putting the CPU board in "diag" mode with the switch near the serial
>connectors to see what it prints out? In diag mode, the system will tell
>you exactly what tests it is running as it powers up.
>
>Since it appears to be in the memory test portion, I have to ask if you removed
>the 3/200 memory from the bus? I have seen 4/300's get very ill when I forgot
>to remove the "wrong" memory from the bus while swapping CPU's around.

Good point, but yes, the old memory's out.

>I wish I could get a list from Sun on timing differences. We have a couple of
>homegrown boards that have fits on a 4/300 or 4/400, but seem to work fine
>on the 4/200 and below. As far as the backplane types, that should not matter.
>I have 8 4/300 boards in systems ranging from 3/100 to 4/400 backplanes.

>I don't know if this is a related problem but here it is anyway. We have
>a third party VMEbus IO controller which worked fine in a 3/260 and 4/470,
>but doesn't work in a 4/370 (the kernel panics with a data fault when a
>user program accesses the device driver). I wrote the driver for this device
>so it's possible the problem could be with the driver, but I doubt it since it
>worked fine in the other systems. Our Sun service rep thinks the cause may
>be a timing problem and he wants me to send him a crash dump for analysis.

> i'm assuming
>then that the board is a slave only (like a memory board, which
>leaves the jumpers in).

>the sun4 CPUs are obviously much faster than their sun3 brethren.
>this can trip you up if you don't expect setup/probe of the registers
>as quickly as the driver will perform them. that is, on a sun3,
>it may have taken N microseconds to poke the register and read
>back its contents. on a sun4, it could be N/2 or even N/4 microseconds.
>not all hardware can respond that quickly. some of the problems with
>sun's outboard ethernet controllers (ie1, ie2, etc) were caused by
>the intel chip requiring a little "settle" period that wasn't satisfied
>by the driver on a sun4. on a sun3, it wasn't a problem.
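
The settle-time argument above can be illustrated with a toy model. This
is only a sketch under made-up numbers: SimulatedRegister, probe, the
4-microsecond settle time and the 8- and 2-microsecond access times are
all hypothetical, and nothing here is taken from a real Sun driver or
from the MegaVision card. Time is a simple counter, not a real clock.

```python
class SimulatedRegister:
    """Hypothetical device register that needs settle_us after a write."""

    def __init__(self, settle_us: int) -> None:
        self.settle_us = settle_us
        self.now_us = 0        # simulated clock, in microseconds
        self._value = 0
        self._ready_at = 0

    def advance(self, us: int) -> None:
        self.now_us += us

    def write(self, value: int) -> None:
        self._value = value
        self._ready_at = self.now_us + self.settle_us

    def read(self) -> int:
        # Reading before the device has settled returns garbage.
        if self.now_us < self._ready_at:
            return 0xFFFF
        return self._value


def probe(reg: SimulatedRegister, access_us: int,
          settle_delay_us: int = 0, pattern: int = 0xA5A5) -> bool:
    """Poke a pattern, wait access time plus any explicit delay, read back."""
    reg.write(pattern)
    reg.advance(access_us + settle_delay_us)
    return reg.read() == pattern


SETTLE_US = 4  # assumed hardware settle time
slow_cpu_ok = probe(SimulatedRegister(SETTLE_US), access_us=8)    # "sun3": N
fast_cpu_ok = probe(SimulatedRegister(SETTLE_US), access_us=2)    # "sun4": N/4
fast_with_delay_ok = probe(SimulatedRegister(SETTLE_US), access_us=2,
                           settle_delay_us=2)

print(slow_cpu_ok, fast_cpu_ok, fast_with_delay_ok)
```

In this model the same probe passes on the slow CPU, fails on the fast
one, and passes again once the driver waits explicitly, which is the
shape of the fix described for the ie controllers.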

>how do the vendor's diag routines *not* go through a device driver?
>do they just open up /dev/vme16d32 and read and write to it?

Well, uh, I dunno, the vendor just said that this simple routine
was independent of the driver configuration.

================== My original posting follows ======================

We have a pressing problem that maybe a VME-bus expert out there can shed
some light on. We have a hand-wired 6U VME card that drives an external
image processor; this is a low-volume commercial product and not a one-off
thing. It worked fine in our 3/260, but now that we've upgraded that
machine's CPU to a 4/300 (the upgrade included a badge that says 4/360, but
don't expect Sun to know of that model) it doesn't work any more. It also
fails in our 4/280. The vendor claims identical cards are working nicely
in 4/110's, including one they have in-house. The order in which we
swapped the card (3/260 -> 4/280 -> 3/260 -> upgrade) rules out damage to
the card or cabling mistakes, honest. (The "4/280" was itself a 3/400 that
we up(?)graded to a 4/200; this was a shipping error and not a planned
upgrade.)

The card worked in slot 9 of the 3/260; in the 4/360, it fails in slot 9,
and causes the Sun to fail to boot when in slot 6, 5 or 2. I was careful
with the bus jumpers for both this card and the others that I displaced
(this card uses both jumpers "in"). In these slots, the cpu hangs with its
diag led's reading (downwards, 1=on) 11010001. In slot 9, the machine is
totally unable to communicate with the card. Some simple diag routines
supplied by the vendor report "bus error" when attempting to read or write
registers on the card, without going through any device-driver. The card
is on vme24d16 at x280000; we have no other device anywhere near those
addresses.
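
For readers wondering how a diag routine can touch the card without a
device driver, one plausible approach is to open the VME special file,
seek to the card's address and read a register directly. The sketch
below shows that idea only; it is not the vendor's routine. Python is
used for brevity, read_reg16 is a hypothetical helper, and it is an
assumption that /dev/vme24d16 accepts plain seek-and-read access and
that the register is 16 bits wide and big-endian.

```python
import os
import struct

VME24_SIZE = 1 << 24      # 24-bit address space, 16 MB
CARD_BASE = 0x280000      # card address from the posting


def read_reg16(dev_path: str, offset: int) -> int:
    """Read one big-endian 16-bit value at offset from a VME-style file."""
    if offset % 2 or not 0 <= offset <= VME24_SIZE - 2:
        raise ValueError("offset must be even and inside the 24-bit space")
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        data = os.read(fd, 2)   # a bus error would be expected to surface here
    finally:
        os.close(fd)
    if len(data) != 2:
        raise OSError("short read from " + dev_path)
    return struct.unpack(">H", data)[0]


# Only touch the device if it exists on this machine.
if os.path.exists("/dev/vme24d16"):
    try:
        print(hex(read_reg16("/dev/vme24d16", CARD_BASE)))
    except OSError as err:
        print("access failed:", err)
```

A failure at the read step with no driver involved would point at the
bus or the card rather than at the driver configuration.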

The vendor, who claims this is a very generic VME card that should be happy
in most any slot, can only suggest a timing problem, and suggests the
logic-analyzer route, which is not in itself beyond our capabilities but may
be more trouble than I'm allowed to go to. Other options include moving the
processor to a remaining 3/260 (geo-political issues) and putting it on a
PC host, which the vendor supports (requires outlay for board, software).

I do not expect this kind of problem to be easily fixed by e-mail. However, any
insights at all would be greatly appreciated. I see the major questions as
(1) What does that pattern of diag LED's mean? and (2) (Essay question)
What are the known timing and bus differences between the various CPUs
(3/200, 4/100, 4/200, 4/300)? Keep in mind that this is a 4/300 CPU in a
3/200 backplane, and the 4/280 has a 3/400 backplane.

I will happily summarize responses.


