-nvidia upgrade issues

by Bryce Harrington-5

I've been looking into some problems people have been reporting
upgrading to Karmic with -nvidia installed.

One thing I've noticed is aside from whatever issue is occurring with
nvidia, there are bugs elsewhere which are compounding the problems and
leading to some poor user experiences.  A common scenario occurs if for
whatever reason the -nvidia kernel module fails to build in DKMS:

438398 - If DKMS fails to build the kernel module, the package upgrade
does not kick out.  It shows package upgrade as successful.  So this
leads directly to...

451305 - Jockey misses that the driver failed to build, and so is not
letting users know about the potential problem.  It goes ahead and
updates xorg.conf as if the driver was there.  X tries to obey the
configuration settings, but of course they won't work, so it exits on
startup with an error message.  *Normally* bulletproof-X would kick in
at this point, display the error to the user, and give them some tools
to diagnose and/or debug the situation.  Unfortunately...

474806 - The new gdm no longer supports the FailsafeXServer option, so
the diagnostic session no longer can be triggered to come up.  Instead,
gdm tries several times, then gives up, but then...

441638 - The gdm upstart job notices gdm has failed and so restarts it.
X of course continues to fail, gdm tries a few times and continues to
fail, repeat ad infinitum, and the user is just left looking at a
flashing screen.  Ick.


The above appears to be a pretty common scenario that we're getting a
rash of bug reports about.  It's hard to be certain because many of the
bug reports include information only about the failed boot, not about
the failed build.  So I'm not sure if there is just one reason why the
build fails, or several.  However, if we can solve the above bugs it
should give much better visibility into things.


Btw, a workaround for anyone experiencing this issue is to purge your
nvidia (and fglrx) packages, remove /etc/X11/xorg.conf, and reinstall
nvidia (or fglrx).  It appears that in most of the bug reports this gets
the system functioning again.  Doing a full reinstall of Ubuntu rather
than an upgrade also appears to work around the issues.
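
The workaround steps above can be sketched as a small dry-run planner in Python. This is only an illustration: it builds the command lists and executes nothing, and the package names passed in are up to the caller (the names in the usage note are examples, not taken from this thread).

```python
from typing import List

XORG_CONF = "/etc/X11/xorg.conf"  # path named in the message above


def workaround_commands(purge: List[str], reinstall: List[str]) -> List[List[str]]:
    """Build the purge / remove xorg.conf / reinstall command sequence.

    Nothing is executed here; the caller reviews the commands and
    runs them as root if they look right.
    """
    if not purge or not reinstall:
        raise ValueError("both package lists must be non-empty")
    return [
        ["apt-get", "purge", "--yes", *purge],        # drop nvidia and fglrx packages
        ["rm", "-f", XORG_CONF],                      # let X fall back to autodetection
        ["apt-get", "install", "--yes", *reinstall],  # reinstall the driver that is wanted
    ]
```

For instance, `workaround_commands(["nvidia-glx-185", "xorg-driver-fglrx"], ["nvidia-glx-185"])` returns three argument lists, assuming those are the package names installed on the affected system.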

Bryce

--
ubuntu-devel mailing list
ubuntu-devel@...
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel

Re: [ubuntu-x] -nvidia upgrade issues

by Bryce Harrington-5

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences.  A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:

One example is if 'patch' is not installed - bug 434154.  Then if there
are patches to be applied to the kernel module, DKMS fails.

I can't tell yet if this situation is the root cause of most of the
issues or just a corner case, but certainly that's something that could
have been missed during development since most developers *would* have
had patch installed.
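
A check for this kind of missing prerequisite could look roughly like the following Python sketch. Only `patch` comes from the bug mentioned above; `make` and `gcc` in the default list are assumptions about what a DKMS build typically needs.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_build_tools(
    tools: Iterable[str] = ("patch", "make", "gcc"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real system.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty result means every listed tool was found; a result of `["patch"]` would correspond to the situation in bug 434154.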


Re: -nvidia upgrade issues

by Mario Limonciello-2

Hi Bryce:

I've got a couple of comments I'll echo here

On Wed, Nov 4, 2009 at 16:26, Bryce Harrington <bryce@...> wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences.  A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:

It would be very good to try to get a sampling of why the kernel modules are failing to build.  Can you try to get people to collect the failed make.log's in these scenarios?
 

> 438398 - If DKMS fails to build the kernel module, the package upgrade
> does not kick out.  It shows package upgrade as successful.  So this
> leads directly to...


So the problem with declaring the package as failed if the DKMS build failed is that it may actually pass or fail depending on how far along into the updates you are.

Say you are updating to a new linux-headers with a new ABI at the same time as installing the NVIDIA package.

Well if the NVIDIA package is processed first, the headers aren't yet installed, so the package will fail during postinst, but as soon as the headers are loaded, the kernel postinst runs and the modules get successfully built.

Perhaps a potential solution is to look into whether the headers are yet available for this kernel, and if they aren't don't let the DKMS build fail cause the postinst to fail, but in any other scenario let the postinst fail.
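
The decision rule proposed here can be sketched in Python as follows. This is an illustration, not the nvidia package's actual postinst logic; it assumes the usual layout in which `/lib/modules/<release>/build` points at the installed headers for that kernel.

```python
import os


def headers_available(kernel_release: str, modules_root: str = "/lib/modules") -> bool:
    """True if a build tree exists for this kernel release.

    os.path.isdir follows symlinks, so a dangling `build` link
    (headers not yet unpacked) counts as unavailable.
    """
    return os.path.isdir(os.path.join(modules_root, kernel_release, "build"))


def postinst_should_fail(build_succeeded: bool, headers_present: bool) -> bool:
    """Fail the postinst only when the build failed despite headers being present.

    A failed build without headers is tolerated, on the expectation that
    the kernel postinst will rebuild once the headers are installed.
    """
    return (not build_succeeded) and headers_present
```

With this rule, a DKMS failure during a combined kernel and driver upgrade is ignored only in the one case where it is expected to resolve itself later in the same run.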
 
> 451305 - Jockey misses that the driver failed to build, and so is not
> letting users know about the potential problem.  It goes ahead and
> updates xorg.conf as if the driver was there.  X tries to obey the
> configuration settings, but of course they won't work, so it exits on
> startup with an error message.  *Normally* bulletproof-X would kick in
> at this point, display the error to the user, and give them some tools
> to diagnose and/or debug the situation.  Unfortunately...

I see three potential improvements to Jockey for this scenario.

  1. Have Jockey be able to work in an interactive frontend.  If the package install behavior is modified to query if the headers are yet available, then you can more nicely present this information to the user
  2. Have Jockey check for the headers for the current kernel before even starting to install the packages.
  3. Before modifying the xorg.conf, do the equivalent of a modinfo nvidia to determine if the nvidia kernel module is indeed created.  Show a warning/error otherwise.
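
Suggestion 3 could be sketched in Python along these lines. It is not Jockey's actual code; the function name and the injectable `run` parameter are made up for illustration. It relies only on `modinfo` exiting non-zero when it cannot find the module, and on its `-k` option for querying a kernel other than the running one.

```python
import subprocess
from typing import Any, Callable, Optional


def kernel_module_available(
    module: str,
    kernel_release: Optional[str] = None,
    run: Callable[..., Any] = subprocess.run,
) -> bool:
    """Return True if `modinfo` can find `module`.

    `run` defaults to subprocess.run and is injectable for testing.
    """
    cmd = ["modinfo"]
    if kernel_release:
        cmd += ["-k", kernel_release]  # check a specific kernel, e.g. the newly installed one
    cmd.append(module)
    try:
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False  # modinfo itself is missing
    return result.returncode == 0
```

A caller would skip the xorg.conf change and warn the user when `kernel_module_available("nvidia")` returns False.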
 

> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up.  Instead,
> gdm tries several times, then gives up, but then...
>
> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen.  Ick.

This has been a pet peeve of mine too, so I'm glad to see a karmic-updates milestoned task on this bug.
 

--
Mario Limonciello
superm1@...
Sent from Manchester, New Hampshire, United States

Re: -nvidia upgrade issues

by Bryce Harrington-5

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:

> Hi Bryce:
>
> I've got a couple of comments I'll echo here
>
> On Wed, Nov 4, 2009 at 16:26, Bryce Harrington <bryce@...> wrote:
>
> > I've been looking into some problems people have been reporting
> > upgrading to Karmic with -nvidia installed.
> >
> > One thing I've noticed is aside from whatever issue is occurring with
> > nvidia, there are bugs elsewhere which are compounding the problems and
> > leading to some poor user experiences.  A common scenario occurs if for
> > whatever reason the -nvidia kernel module fails to build in DKMS:
> >
>
> It would be very good to try to get a sampling of why the kernel modules are
> failing to build.  Can you try to get people to collect the failed
> make.log's in these scenarios?

Sure.  Maybe we also need to update ubuntu-bug to automatically attach
those files for nvidia bugs.  Let me know if there are any other files
that are useful for debugging -nvidia or dkms issues and I'll add them
in as well.
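
As a rough Python sketch of what such a collection step might look for: DKMS keeps its build output under `/var/lib/dkms/<module>/<version>/build/make.log`, so the candidate files can be found with a glob. How the files would then be attached to a bug report is left out, since that depends on the reporting hook.

```python
import glob
import os
from typing import List


def find_dkms_make_logs(module: str = "nvidia", dkms_root: str = "/var/lib/dkms") -> List[str]:
    """Return every make.log found for `module`, one per registered version."""
    pattern = os.path.join(dkms_root, module, "*", "build", "make.log")
    return sorted(glob.glob(pattern))
```

An empty list means either that the module was never built or that DKMS already cleaned up the build directory after a successful build.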

> > 438398 - If DKMS fails to build the kernel module, the package upgrade
> > does not kick out.  It shows package upgrade as successful.  So this
> > leads directly to...
> >
> >
> So the problem with declaring the package as failed if the DKMS build failed
> is that it may actually pass or fail depending on how far along into the
> updates you are.
>
> Say you are updating to a new linux-headers with a new ABI at the same time
> as installing the NVIDIA package.
>
> Well if the NVIDIA package is processed first, the headers aren't yet
> installed, so the package will fail during postinst, but as soon as the
> headers are loaded, the kernel postinst runs and the modules get
> successfully built.
> Perhaps a potential solution is to look into whether the headers are yet
> available for this kernel, and if they aren't don't let the DKMS build fail
> cause the postinst to fail, but in any other scenario let the postinst fail.

*Nod* Also there is at least one bug report where it is claimed dkms was
doing its thing while gdm was starting up, and since the module hadn't
finished building, boom.  Bug 453365.

> > 451305 - Jockey misses that the driver failed to build, and so is not
> > letting users know about the potential problem.  It goes ahead and
> > updates xorg.conf as if the driver was there.  X tries to obey the
> > configuration settings, but of course they won't work, so it exits on
> > startup with an error message.  *Normally* bulletproof-X would kick in
> > at this point, display the error to the user, and give them some tools
> > to diagnose and/or debug the situation.  Unfortunately...
> >
>
> I see three potential improvements to Jockey for this scenario.
>
>
>    1. Have Jockey be able to work in an interactive frontend.  If the
>    package install behavior is modified to query if the headers are yet
>    available, then you can more nicely present this information to the user
>    2. Have Jockey check for the headers for the current kernel before even
>    starting to install the packages.
>    3. Before modifying the xorg.conf, do the equivalent of a modinfo nvidia
>    to determine if the nvidia kernel module is indeed created.  Show a
>    warning/error otherwise.

Agreed.  All three would be worth having; I would prioritize #3 since it
sounds like it would require the least code change and may be quickest
to get an SRU on.  Pitti, opinions?

> > 474806 - The new gdm no longer supports the FailsafeXServer option, so
> > the diagnostic session no longer can be triggered to come up.  Instead,
> > gdm tries several times, then gives up, but then...
> >
> > 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> > X of course continues to fail, gdm tries a few times and continues to
> > fail, repeat ad infinitum, and the user is just left looking at a
> > flashing screen.  Ick.
> >
> This has been a pet peeve of mine too, so I'm glad to see a karmic-updates
> milestoned task on this bug.

Yeah, I brought this one up pre-release but I guess too late to solve it
before the release was finalized.  I hope we can see an SRU on it soon.

Bryce

Re: -nvidia upgrade issues

by Bryce Harrington-5

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:

> Hi Bryce:
>
> I've got a couple of comments I'll echo here
>
> On Wed, Nov 4, 2009 at 16:26, Bryce Harrington <bryce@...> wrote:
>
> > I've been looking into some problems people have been reporting
> > upgrading to Karmic with -nvidia installed.
> >
> > One thing I've noticed is aside from whatever issue is occurring with
> > nvidia, there are bugs elsewhere which are compounding the problems and
> > leading to some poor user experiences.  A common scenario occurs if for
> > whatever reason the -nvidia kernel module fails to build in DKMS:
> >
>
> It would be very good to try to get a sampling of why the kernel modules are
> failing to build.  Can you try to get people to collect the failed
> make.log's in these scenarios?

Bug 450238 adds some further information as to what might be going
wrong:

"""
Adding Module to DKMS build system
+ dkms add -m nvidia -v

Error! Invalid number of arguments passed.
Usage: add -m <module> -v <module-version>

The reason for this is, that the script uses a variable $CVERSION that
is never defined. Adding it manually works:

dkms add -m nvidia -v 185.18.36

Creating symlink /var/lib/dkms/nvidia/185.18.36/source ->
                 /usr/src/nvidia-185.18.36

DKMS: add Completed.
"""

Bryce

Re: -nvidia upgrade issues

by Bryce Harrington-5

On Wed, Nov 04, 2009 at 04:50:57PM -0800, Bryce Harrington wrote:

> On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:
> > Hi Bryce:
> >
> > I've got a couple of comments I'll echo here
> >
> > On Wed, Nov 4, 2009 at 16:26, Bryce Harrington <bryce@...> wrote:
> >
> > > I've been looking into some problems people have been reporting
> > > upgrading to Karmic with -nvidia installed.
> > >
> > > One thing I've noticed is aside from whatever issue is occurring with
> > > nvidia, there are bugs elsewhere which are compounding the problems and
> > > leading to some poor user experiences.  A common scenario occurs if for
> > > whatever reason the -nvidia kernel module fails to build in DKMS:
> > >
> >
> > It would be very good to try to get a sampling of why the kernel modules are
> > failing to build.  Can you try to get people to collect the failed
> > make.log's in these scenarios?
>
> Bug 450238 adds some further information as to what might be going
> wrong:

Sorry, false alarm: this simply appears to be a dupe of a bug you already
fixed in the package a few weeks ago.

Re: -nvidia upgrade issues

by Steve Langasek-6

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up.  Instead,
> gdm tries several times, then gives up, but then...

> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen.  Ick.

Fixes for both of these are now in the karmic-proposed queue.

Cheers,
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@...                                     vorlon@...


Re: -nvidia upgrade issues

by Bryce Harrington-5

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:

> Hi Bryce:
>
> I've got a couple of comments I'll echo here
>
> On Wed, Nov 4, 2009 at 16:26, Bryce Harrington <bryce@...> wrote:
>
> > I've been looking into some problems people have been reporting
> > upgrading to Karmic with -nvidia installed.
> >
> > One thing I've noticed is aside from whatever issue is occurring with
> > nvidia, there are bugs elsewhere which are compounding the problems and
> > leading to some poor user experiences.  A common scenario occurs if for
> > whatever reason the -nvidia kernel module fails to build in DKMS:
> >
>
> It would be very good to try to get a sampling of why the kernel modules are
> failing to build.  Can you try to get people to collect the failed
> make.log's in these scenarios?

Btw, in poking around in dkms.conf I noticed this:

PACKAGE_VERSION="185.18.31"

Shouldn't that be 185.18.36?  Or am I misunderstanding the purpose of
this file?

Bryce


Re: -nvidia upgrade issues

by Martin Pitt-4

Mario Limonciello [2009-11-04 17:08 -0600]:
> I see three potential improvements to Jockey for this scenario.
>
>    1. Have Jockey be able to work in an interactive frontend.  If the
>    package install behavior is modified to query if the headers are yet
>    available, then you can more nicely present this information to the user

What do you mean by "interactive frontend"? For debconf you mean?
I'm afraid that requires a rewrite of Jockey, since it's currently
frontend <-> dbus <-> backend <-> python-apt, so the backend doesn't
have X access. I'm afraid this isn't SRUable.

>    2. Have Jockey check for the headers for the current kernel before even
>    starting to install the packages.
>    3. Before modifying the xorg.conf, do the equivalent of a modinfo nvidia
>    to determine if the nvidia kernel module is indeed created.  Show a
>    warning/error otherwise.

Those make a lot of sense. I'll see to fixing those ASAP and SRU them.

Thanks,

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: -nvidia upgrade issues

by Amit Kucheria-6

On Thu, Nov 5, 2009 at 12:26 AM, Bryce Harrington <bryce@...> wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>

<snip>

I filed bug 456240 regarding the dkms package failing to compile. My
video still works and I haven't had a chance to track the bug down.

I've attached a log to the bug. I'd be happy to provide more
information if necessary.

/Amit

Re: -nvidia upgrade issues

by Steve Langasek-6

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:
> > 438398 - If DKMS fails to build the kernel module, the package upgrade
> > does not kick out.  It shows package upgrade as successful.  So this
> > leads directly to...

> So the problem with declaring the package as failed if the DKMS build failed
> is that it may actually pass or fail depending on how far along into the
> updates you are.

> Say you are updating to a new linux-headers with a new ABI at the same time
> as installing the NVIDIA package.

> Well if the NVIDIA package is processed first, the headers aren't yet
> installed, so the package will fail during postinst, but as soon as the
> headers are loaded, the kernel postinst runs and the modules get
> successfully built.
> Perhaps a potential solution is to look into whether the headers are yet
> available for this kernel, and if they aren't don't let the DKMS build fail
> cause the postinst to fail, but in any other scenario let the postinst fail.

I wonder if a dpkg trigger wouldn't help here for lucid (not for SRU): each
dkms module package registers its interest in an appropriate file pattern,
and at the end of the corresponding dpkg run the trigger fires to try to do
the module compilation?  This would have the advantage that dpkg would then
have information about exactly which dkms packages failed to build, but I
haven't thought this through completely to be sure it's worth doing and
doesn't have any major design pitfalls.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@...                                     vorlon@...


Re: -nvidia upgrade issues

by Mario Limonciello-2

Hi Bryce:

On Thu, Nov 5, 2009 at 01:44, Bryce Harrington <bryce@...> wrote:
> Btw, in poking around in dkms.conf I noticed this:
>
> PACKAGE_VERSION="185.18.31"
>
> Shouldn't that be 185.18.36?  Or am I misunderstanding the purpose of
> this file?

If you have encountered a scenario where that doesn't reflect the version installed, that's a bug for sure in the nvidia driver package you are working with, and I am certain there will be future problems on such a system.
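
A consistency check of this kind might be sketched in Python as below. It is a simplification: dkms.conf is really a shell fragment, and this sketch only handles a plain `PACKAGE_VERSION=...` assignment, quoted or unquoted, with no variable expansion.

```python
import re
from typing import Optional

# Matches PACKAGE_VERSION="x", PACKAGE_VERSION='x' or PACKAGE_VERSION=x at the start of a line.
_PACKAGE_VERSION = re.compile(
    r'^\s*PACKAGE_VERSION=(["\']?)([^"\'\s#]+)\1', re.MULTILINE
)


def dkms_conf_version(conf_text: str) -> Optional[str]:
    """Extract PACKAGE_VERSION from the text of a dkms.conf, or None if absent."""
    match = _PACKAGE_VERSION.search(conf_text)
    return match.group(2) if match else None


def version_matches(conf_text: str, installed_version: str) -> bool:
    """True if dkms.conf declares exactly the installed driver version."""
    return dkms_conf_version(conf_text) == installed_version
```

With the values quoted above, `version_matches('PACKAGE_VERSION="185.18.31"', "185.18.36")` is False, which is the mismatch being asked about.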



--
Mario Limonciello
superm1@...
Sent from Austin, Texas, United States

Re: -nvidia upgrade issues

by Mario Limonciello-2

Hi Martin:

On Thu, Nov 5, 2009 at 03:43, Martin Pitt <martin.pitt@...> wrote:
> Mario Limonciello [2009-11-04 17:08 -0600]:
> > I see three potential improvements to Jockey for this scenario.
> >
> >    1. Have Jockey be able to work in an interactive frontend.  If the
> >    package install behavior is modified to query if the headers are yet
> >    available, then you can more nicely present this information to the user
>
> What do you mean by "interactive frontend"? For debconf you mean?
> I'm afraid that requires a rewrite of Jockey, since it's currently
> frontend <-> dbus <-> backend <-> python-apt, so the backend doesn't
> have X access. I'm afraid this isn't SRUable.

As I'm sure you are aware, there are other deficiencies with the way things are done now:
1) Not being able to represent whether something failed to install or download
2) Not being able to ask to insert CD media if it's present there

As you said, this doesn't sound SRUable, but for Lucid perhaps the better solution is to use python-aptdaemon.  It can certainly provide more of this information more easily.

--
Mario Limonciello
superm1@...
Sent from Austin, Texas, United States

Re: -nvidia upgrade issues

by Mario Limonciello-2

Hi Steve:

On Thu, Nov 5, 2009 at 06:04, Steve Langasek <steve.langasek@...> wrote:

> I wonder if a dpkg trigger wouldn't help here for lucid (not for SRU): each
> dkms module package registers its interest in an appropriate file pattern,
> and at the end of the corresponding dpkg run the trigger fires to try to do
> the module compilation?  This would have the advantage that dpkg would then
> have information about exactly which dkms packages failed to build, but I
> haven't thought this through completely to be sure it's worth doing and
> doesn't have any major design pitfalls.

Reading through your idea it tentatively sounds like a good way to help things out.  It doesn't even need to be a pattern though.  The dkms_autoinstaller script is able to query what has and hasn't been built yet, and try to build things.  So if a dpkg trigger is set up to just call it at the end as necessary, that would work too.

Bryce:

Have you assembled a spec for Lucid we can talk about at UDS to try to help clean up these problems?

Thanks,
--
Mario Limonciello
superm1@...
Sent from Austin, Texas, United States

Re: -nvidia upgrade issues

by Steve Langasek-6

On Thu, Nov 05, 2009 at 07:35:50AM -0600, Mario Limonciello wrote:
> > I wonder if a dpkg trigger wouldn't help here for lucid (not for SRU): each
> > dkms module package registers its interest in an appropriate file pattern,
> > and at the end of the corresponding dpkg run the trigger fires to try to do
> > the module compilation?  This would have the advantage that dpkg would then
> > have information about exactly which dkms packages failed to build, but I
> > haven't thought this through completely to be sure it's worth doing and
> > doesn't have any major design pitfalls.

> Reading through your idea it tentatively sounds like a good way to help
> things out.  It doesn't even need to be a pattern though.  The
> dkms_autoinstaller script is able to query what has and hasn't been built
> yet, and try to build things.  So if a dpkg trigger is set up to just call
> it at the end as necessary, that would work too.

It does have to be either a pattern, or an explicit trigger invocation from
the kernel package maintainer script; those are the ways dpkg knows which
triggers need to be called.

  http://www.dpkg.org/dpkg/Triggers

And a file pattern would be preferable if it's possible, because that
doesn't require further coordination regarding the contents of the kernel
packages' maintainer scripts.
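
To make the mechanism concrete, here is a hedged Python sketch of the postinst side of such a trigger. When a trigger a package has declared interest in is activated, dpkg runs that package's postinst with `triggered` as the first argument, followed by the trigger names. The trigger name `/lib/modules` used in the usage note and the `rebuild_modules` callback are hypothetical; the thread does not settle on a pattern, and a real maintainer script would normally be shell.

```python
from typing import Callable, List, Sequence


def postinst(argv: Sequence[str], rebuild_modules: Callable[[], List[str]]) -> int:
    """Dispatch on maintainer-script arguments and return an exit status.

    `rebuild_modules` attempts the DKMS builds and returns the names of
    modules that failed; it stands in for something like the
    dkms_autoinstaller call suggested earlier in the thread.
    """
    action = argv[0] if argv else ""
    if action == "triggered":
        failed = rebuild_modules()
        if failed:
            print("DKMS build failed for: " + ", ".join(failed))
            return 1  # non-zero exit lets dpkg record the failure
    return 0  # other actions (configure, abort-upgrade, ...) are not handled here
```

Called as `postinst(["triggered", "/lib/modules"], rebuild)`, it returns 1 only when `rebuild` reports at least one failed module, which is the point where dpkg would learn which package's build went wrong.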

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@...                                     vorlon@...


Re: -nvidia upgrade issues

by Steve Langasek-6

On Thu, Nov 05, 2009 at 05:46:47AM -0800, Steve Langasek wrote:
>   http://www.dpkg.org/dpkg/Triggers

Hmm, that seems to be a terrible link. :)  Let's try

  https://wiki.ubuntu.com/DpkgTriggers

instead.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@...                                     vorlon@...


Re: -nvidia upgrade issues

by Raphael Hertzog

On Thursday, 05 November 2009, Steve Langasek wrote:
> It does have to be either a pattern, or an explicit trigger invocation from
> the kernel package maintainer script; those are the ways dpkg knows which
> triggers need to be called.
>
>   http://www.dpkg.org/dpkg/Triggers

You should not refer to this page as documentation of the dpkg triggers.
The only reliable documentation is the one integrated in dpkg itself.
This blog entry is also interesting:
http://www.seanius.net/blog/2009/09/dpkg-triggers-howto/

Cheers,
--
Raphaël Hertzog -+- http://www.ouaza.com

Freexian: Debian developers at the service of businesses
http://www.freexian.com

Re: -nvidia upgrade issues

by Bryce Harrington-5

On Thu, Nov 05, 2009 at 07:29:04AM -0600, Mario Limonciello wrote:

> Hi Bryce:
>
> On Thu, Nov 5, 2009 at 01:44, Bryce Harrington <bryce@...> wrote:
>
> > Btw, in poking around in dkms.conf I noticed this:
> >
> > PACKAGE_VERSION="185.18.31"
> >
> > Shouldn't that be 185.18.36?  Or am I misunderstanding the purpose of
> > this file?
>
> If you have encountered a scenario where that doesn't reflect the version
> installed, that's a bug for sure in the nvidia driver package you are
> working with, and I am certain there will be future problems on such a
> system.

No, this was just from the copy in the source package.  Sounds like it
gets replaced by the one generated from dkms.conf.in so this dkms.conf
appears to just be a stray.
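For anyone who wants to check an installed system for the mismatch Mario describes, here is a minimal sketch in Python. It only assumes that dkms.conf contains a line of the form shown above; the function name read_package_version is made up for this example and is not part of DKMS.

```python
import re


def read_package_version(dkms_conf_text):
    """Return the PACKAGE_VERSION value found in dkms.conf text, or None.

    Hypothetical helper for illustration; it handles only the simple
    KEY="value" form quoted in this thread.
    """
    match = re.search(
        r'^\s*PACKAGE_VERSION\s*=\s*"?([^"\n]+?)"?\s*$',
        dkms_conf_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# The stray line from the source package, compared with the version
# that was expected in this thread.
sample = 'PACKAGE_VERSION="185.18.31"\n'
expected = "185.18.36"
found = read_package_version(sample)
if found != expected:
    print("dkms.conf says", found, "but expected", expected)
```

On a real system the text would come from the installed dkms.conf rather than from a string literal; its location is not given in this thread, so it is left out here.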

Thanks,
Bryce


(Update) Re: -nvidia upgrade issues

by Bryce Harrington-5

The two worst bugs are fixed, and the other two are at least understood
now, but I could use a bit more advice.  There seems to be a weird race
condition with DKMS/upstart/nvidia, which has cropped up due to faster
boot.  It looks tricky to sort out, so feedback from people with
experience in DKMS/upstart matters would be helpful.

From what I understand, when doing an upgrade it installs both nvidia
and a new kernel (2.6.31).  At that point nvidia.ko is built against the
*old* kernel (2.6.28).  Fine, a nvidia.ko was successfully built so
installation completes without error.  xorg.conf is updated and the
system is ready to run nvidia.  Or so it thinks.

Now the user reboots.

During boot, dkms notes that it needs to build a new nvidia.ko for
2.6.31 and dutifully gets to work.  Meanwhile, since X is being started
early on in the boot cycle, it in fact starts up before dkms has
finished building the new nvidia.ko.  X tries to load nvidia, but since
there is not yet an nvidia.ko for the current kernel, it exits with an
error.

I'm going to see if I can reproduce this synthetically, but meanwhile
does this theory make sense?  If so, is there a dkms/upstart trick we
could do to work around the issue in Karmic?  And for Lucid what would
the "right" solution be?
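To make the theory concrete, here is a minimal sketch in Python of the check it implies: is there an nvidia.ko anywhere in the module tree of the kernel that is actually running? The function name module_built_for is invented for this example. It assumes modules live under /lib/modules/<kernel release>/ as uncompressed .ko files, and it searches the whole tree rather than assuming a particular DKMS subdirectory.

```python
import os
import platform


def module_built_for(kernel_release, module_file="nvidia.ko",
                     modules_root="/lib/modules"):
    """Return True if module_file exists anywhere under
    modules_root/kernel_release.

    Hypothetical helper for illustration; a missing tree simply
    yields False because os.walk() produces nothing for it.
    """
    tree = os.path.join(modules_root, kernel_release)
    for _dirpath, _dirnames, filenames in os.walk(tree):
        if module_file in filenames:
            return True
    return False


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    if module_built_for(running):
        print("nvidia.ko found for", running)
    else:
        print("no nvidia.ko for", running,
              "so X configured for nvidia would be expected to fail")
```

If the race is real, running this early in boot on an affected system should report no module for 2.6.31, even though one exists for 2.6.28, and the result should flip once the build finishes.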


Further notes on the other nvidia issues below...

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:

> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences.  A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:
>
> 438398 - If DKMS fails to build the kernel module, the package upgrade
> does not kick out.  It shows package upgrade as successful.  So this
> leads directly to...

In reviewing instances of nvidia failures, this particular scenario
appears to pop up less frequently in practice than I had initially
assumed, and mostly due to unusual corner cases like not having the
patch package installed, upgrading to Karmic directly from Hardy, etc.
It seems most of these specific issues got fixed during development,
but the bug reports didn't get closed.  The important point though is
that these failures ended up worse than they should have been, due to
the following bugs...

> 451305 - Jockey misses that the driver failed to build, and so is not
> letting users know about the potential problem.  It goes ahead and
> updates xorg.conf as if the driver was there.  X tries to obey the
> configuration settings, but of course they won't work, so it exits on
> startup with an error message.  *Normally* bulletproof-X would kick in
> at this point, display the error to the user, and give them some tools
> to diagnose and/or debug the situation.  Unfortunately...

Elsewhere in this thread several fixes/workarounds to this issue were
identified, which should greatly lessen the severity of these kinds of
error situations.
 
> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up.  Instead,
> gdm tries several times, then gives up, but then...

This is fixed; we now no longer rely on gdm for doing the failsafe but
instead catch it with a simple upstart job and kick into failsafe-x mode.
Thanks Steve!

> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen.  Ick.

Now that we have an upstart job handling this case, the blinking
situation will no longer happen.  This fix is SRU'd and uploaded to
ubuntu-proposed, and will go live before long.

Since this particular situation crops up right now mostly with nvidia,
people installing via the release livecd should be okay - that boots
with open source drivers, and when they choose to install nvidia it will
download that and (I assume) also update xorg to the version that
contains this fix.  So by the time they reboot they'll have the fix.
Steve, can you confirm?

> The above appears to be a pretty common scenario that we're getting a
> rash of bug reports about.  It's hard to be certain because many of the
> bug reports are only including information about the failed boot, not on
> the failed build.  So I'm not sure if it is just one reason why the
> build fails, or several.  However if we can solve the above bugs it
> should give much better visibility into things.
>
>
> Btw, workaround for anyone experiencing this issue is to purge your
> nvidia (and fglrx) packages, remove /etc/X11/xorg.conf, and reinstall
> nvidia (or fglrx).  It appears that in most of the bug reports this gets
> the system functioning again.  Doing a full reinstall of Ubuntu rather
> than an upgrade also appears to work around the issues.

It looks like simply doing a dpkg-reconfigure on the nvidia package is
sufficient to work around the issue, no need for reinstalling it
(although that'll work too).

Bryce


Re: (Update) Re: -nvidia upgrade issues

by Mike Rooney-6

On Fri, Nov 6, 2009 at 4:21 PM, Bryce Harrington <bryce@...> wrote:

> The two worst bugs are fixed, and the other two are at least understood
> now, but I could use a bit more advice.  There seems to be a weird race
> condition with DKMS/upstart/nvidia, which has cropped up due to faster
> boot.  It looks tricky to sort out, so feedback from people with
> experience in DKMS/upstart matters would be helpful.
>
> From what I understand, when doing an upgrade it installs both nvidia
> and a new kernel (2.6.31).  At that point nvidia.ko is built against the
> *old* kernel (2.6.28).  Fine, a nvidia.ko was successfully built so
> installation completes without error.  xorg.conf is updated and the
> system is ready to run nvidia.  Or so it thinks.
>
> Now the user reboots.
>
> During boot, dkms notes that it needs to build a new nvidia.ko for
> 2.6.31 and dutifully gets to work.  Meanwhile, since X is being started
> early on in the boot cycle, it in fact starts up before dkms has
> finished building the new nvidia.ko.  X tries to load nvidia, but since
> there is not yet an nvidia.ko for the current kernel, it exits with an
> error.

Chiming in as a user of -nvidia, I just wanted to share my
upgrade experience as a data point. Indeed nvidia does build
against the old kernel which seemed a little silly to me, but harmless
as I understand it. However I didn't have any issues on reboot. Maybe
it is just a race condition; to the very best of my memory, on a
reboot it built for .31 just fine and continued on. I had literally
zero problems anywhere with my dist upgrade. Though, I may have been
using one of the 9.04 X PPAs, if that is worth anything. Sorry if this
wasn't a useful data point :)

--
Michael Rooney
mrooney@...
