-nvidia upgrade issues

I've been looking into some problems people have been reporting upgrading to Karmic with -nvidia installed.

One thing I've noticed is aside from whatever issue is occurring with nvidia, there are bugs elsewhere which are compounding the problems and leading to some poor user experiences. A common scenario occurs if for whatever reason the -nvidia kernel module fails to build in DKMS:

438398 - If DKMS fails to build the kernel module, the package upgrade does not kick out. It shows package upgrade as successful. So this leads directly to...

451305 - Jockey misses that the driver failed to build, and so is not letting users know about the potential problem. It goes ahead and updates xorg.conf as if the driver was there. X tries to obey the configuration settings, but of course they won't work, so it exits on startup with an error message. *Normally* bulletproof-X would kick in at this point, display the error to the user, and give them some tools to diagnose and/or debug the situation. Unfortunately...

474806 - The new gdm no longer supports the FailsafeXServer option, so the diagnostic session no longer can be triggered to come up. Instead, gdm tries several times, then gives up, but then...

441638 - The gdm upstart job notices gdm has failed and so restarts it. X of course continues to fail, gdm tries a few times and continues to fail, repeat ad infinitum, and the user is just left looking at a flashing screen. Ick.

The above appears to be a pretty common scenario that we're getting a rash of bug reports about. It's hard to be certain because many of the bug reports are only including information about the failed boot, not on the failed build. So I'm not sure if it is just one reason why the build fails, or several. However if we can solve the above bugs it should give much better visibility into things.

Btw, workaround for anyone experiencing this issue is to purge your nvidia (and fglrx) packages, remove /etc/X11/xorg.conf, and reinstall nvidia (or fglrx). It appears that in most of the bug reports this gets the system functioning again. Doing a full reinstall of Ubuntu rather than an upgrade also appears to work around the issues.

Bryce

--
ubuntu-devel mailing list
ubuntu-devel@...
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
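The workaround above (purge the driver packages, remove xorg.conf, reinstall) could be written down as a dry-run plan along these lines. This is only a sketch: the function name is hypothetical, and the package name passed in (for example nvidia-glx-185) is an assumption that has to match whatever driver package is actually installed. Nothing is executed; the function just returns the commands in order.

```python
def workaround_commands(driver_package, xorg_conf="/etc/X11/xorg.conf"):
    """Return the workaround steps as argv lists, in the order to run them.

    driver_package is an assumption supplied by the caller; the thread only
    says "your nvidia (and fglrx) packages" without naming them.
    """
    if not driver_package:
        raise ValueError("driver_package must name the installed driver package")
    return [
        ["apt-get", "purge", driver_package],    # drop the package and its config
        ["rm", "-f", xorg_conf],                 # let X fall back to autodetection
        ["apt-get", "install", driver_package],  # reinstall, which retriggers the DKMS build
    ]


if __name__ == "__main__":
    # Print the plan instead of running it; each step needs root.
    for argv in workaround_commands("nvidia-glx-185"):
        print(" ".join(argv))
```

Printing the plan first makes it easy to review before running each step by hand as root.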
|
|
Re: [ubuntu-x] -nvidia upgrade issues

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences. A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:

One example is if 'patch' is not installed - bug 434154. Then if there are patches to be applied to the kernel module, DKMS fails.

I can't tell yet if this situation is the root cause of most of the issues or just a corner case, but certainly that's something that could have been missed during development since most developers *would* have had patch installed.
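A pre-build check for the missing 'patch' case could look roughly like this. It is a minimal sketch: the function name is hypothetical, and any tool beyond 'patch' that a caller adds to the list is an assumption rather than something the bug report names.

```python
import shutil


def missing_build_tools(tools=("patch",), path=None):
    """Return the names in `tools` that cannot be found on the search path.

    `path` defaults to the PATH environment variable when left as None.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

A package maintainer script could refuse to start the DKMS build, or print a clear warning, when the returned list is not empty.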
|
|
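Re: -nvidia upgrade issues

Hi Bryce:

I've got a couple of comments I'll echo here.

On Wed, Nov 4, 2009 at 16:26, Bryce Harrington <bryce@...> wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences. A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:

It would be very good to try to get a sampling of why the kernel modules are failing to build. Can you try to get people to collect the failed make.log's in these scenarios?

> 438398 - If DKMS fails to build the kernel module, the package upgrade
> does not kick out. It shows package upgrade as successful. So this
> leads directly to...

So the problem with declaring the package as failed if the DKMS build failed is that it may actually pass or fail depending on how far along into the updates you are.

Say you are updating to a new linux-headers with a new ABI at the same time as installing the NVIDIA package.

Well if the NVIDIA package is processed first, the headers aren't yet installed, so the package will fail during postinst, but as soon as the headers are loaded, the kernel postinst runs and the modules get successfully built.

Perhaps a potential solution is to look into whether the headers are yet available for this kernel, and if they aren't don't let the DKMS build failure cause the postinst to fail, but in any other scenario let the postinst fail.

> 451305 - Jockey misses that the driver failed to build, and so is not
> letting users know about the potential problem. It goes ahead and
> updates xorg.conf as if the driver was there. X tries to obey the
> configuration settings, but of course they won't work, so it exits on
> startup with an error message. *Normally* bulletproof-X would kick in
> at this point, display the error to the user, and give them some tools
> to diagnose and/or debug the situation. Unfortunately...

I see three potential improvements to Jockey for this scenario.

1. Have Jockey be able to work in an interactive frontend. If the package install behavior is modified to query if the headers are yet available, then you can more nicely present this information to the user.
2. Have Jockey check for the headers for the current kernel before even starting to install the packages.
3. Before modifying the xorg.conf, do the equivalent of a modinfo nvidia to determine if the nvidia kernel module is indeed created. Show a warning/error otherwise.

> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up. Instead,
> gdm tries several times, then gives up, but then...
>
> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen. Ick.

This has been a pet peeve of mine too, so I'm glad to see a karmic-updates milestoned task on this bug.

--
Mario Limonciello
superm1@...
Sent from Manchester, New Hampshire, United States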
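The postinst rule proposed above (tolerate a DKMS build failure only while the headers for the kernel are not yet available, fail in every other case) reduces to a small decision function. This is a sketch of that rule only, with a hypothetical function name; it is not how the nvidia or dkms packages are actually written.

```python
def postinst_exit_code(build_succeeded, headers_present):
    """Decide the postinst result from the DKMS build outcome.

    Returns 0 (success) or 1 (failure), following the proposal in the thread.
    """
    if build_succeeded:
        return 0
    if not headers_present:
        # Headers are not unpacked yet; the kernel postinst is expected to
        # rebuild the module once they are, so do not fail the package here.
        return 0
    # Headers were available and the build still failed: report it to dpkg.
    return 1
```

The only case treated as a real failure is a failed build with the headers present, which is the case the bug reports say currently passes silently.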
|
|
Re: -nvidia upgrade issues

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:
> It would be very good to try to get a sampling of why the kernel modules are
> failing to build. Can you try to get people to collect the failed
> make.log's in these scenarios?

Sure. Maybe we also need to update ubuntu-bug to automatically attach those files for nvidia bugs. Let me know if there are any other files that are useful for debugging -nvidia or dkms issues and I'll add them in as well.

> > 438398 - If DKMS fails to build the kernel module, the package upgrade
> > does not kick out. It shows package upgrade as successful. So this
> > leads directly to...
>
> So the problem with declaring the package as failed if the DKMS build failed
> is that it may actually pass or fail depending on how far along into the
> updates you are.
>
> Say you are updating to a new linux-headers with a new ABI at the same time
> as installing the NVIDIA package.
>
> Well if the NVIDIA package is processed first, the headers aren't yet
> installed, so the package will fail during postinst, but as soon as the
> headers are loaded, the kernel postinst runs and the modules get
> successfully built.
> Perhaps a potential solution is to look into whether the headers are yet
> available for this kernel, and if they aren't don't let the DKMS build failure
> cause the postinst to fail, but in any other scenario let the postinst fail.

*Nod* Also there is at least one bug report where it is claimed dkms was doing its thing while gdm was starting up, and since the module hadn't finished building, boom. Bug 453365.

> > 451305 - Jockey misses that the driver failed to build, and so is not
> > letting users know about the potential problem. It goes ahead and
> > updates xorg.conf as if the driver was there. X tries to obey the
> > configuration settings, but of course they won't work, so it exits on
> > startup with an error message. *Normally* bulletproof-X would kick in
> > at this point, display the error to the user, and give them some tools
> > to diagnose and/or debug the situation. Unfortunately...
>
> I see three potential improvements to Jockey for this scenario.
>
> 1. Have Jockey be able to work in an interactive frontend. If the
> package install behavior is modified to query if the headers are yet
> available, then you can more nicely present this information to the user
> 2. Have Jockey check for the headers for the current kernel before even
> starting to install the packages.
> 3. Before modifying the xorg.conf, do the equivalent of a modinfo nvidia
> to determine if the nvidia kernel module is indeed created. Show a
> warning/error otherwise.

Agreed. All three would be worth having; I would prioritize #3 since it sounds like it would require the least code change and may be quickest to get an SRU on. Pitti, opinions?

> > 474806 - The new gdm no longer supports the FailsafeXServer option, so
> > the diagnostic session no longer can be triggered to come up. Instead,
> > gdm tries several times, then gives up, but then...
> >
> > 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> > X of course continues to fail, gdm tries a few times and continues to
> > fail, repeat ad infinitum, and the user is just left looking at a
> > flashing screen. Ick.
>
> This has been a pet peeve of mine too, so I'm glad to see a karmic-updates
> milestoned task on this bug.

Yeah, I brought this one up pre-release but I guess too late to solve it before the release was finalized. I hope we can see an SRU on it soon.

Bryce
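Improvement #3 (do the equivalent of a modinfo nvidia before touching xorg.conf) might be sketched as below. This is an illustration under assumptions, not Jockey's actual code: the function name and the injectable `run` parameter are hypothetical, and the only external behaviour relied on is that `modinfo -k <release> <module>` exits non-zero when the module cannot be found for that kernel.

```python
import subprocess


def kernel_module_available(module, kernel_release, run=subprocess.run):
    """Return True if modinfo can find `module` for `kernel_release`.

    `run` is injectable so the check can be exercised without modinfo.
    """
    try:
        result = run(
            ["modinfo", "-k", kernel_release, module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # modinfo itself is not installed; treat the module as unavailable.
        return False
    return result.returncode == 0
```

A caller would pass the release of the kernel that will actually be booted (for the running kernel, `os.uname().release`) and skip the xorg.conf rewrite, with a warning to the user, when the result is False.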
|
|
Re: -nvidia upgrade issues

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:
> It would be very good to try to get a sampling of why the kernel modules are
> failing to build. Can you try to get people to collect the failed
> make.log's in these scenarios?

Bug 450238 adds some further information as to what might be going wrong:

"""
Adding Module to DKMS build system
+ dkms add -m nvidia -v

Error! Invalid number of arguments passed.
Usage: add -m <module> -v <module-version>

The reason for this is, that the script uses a variable $CVERSION that
is never defined. Adding it manually works:

dkms add -m nvidia -v 185.18.36

Creating symlink /var/lib/dkms/nvidia/185.18.36/source ->
/usr/src/nvidia-185.18.36

DKMS: add Completed.
"""

Bryce
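The failure quoted from bug 450238 comes from an empty version reaching `dkms add`. A guard along these lines would make that mistake fail loudly instead of producing a malformed command. It is a sketch with a hypothetical function name; the real maintainer script is shell, and this only illustrates the check.

```python
def dkms_add_command(module, version):
    """Build the argv for `dkms add`, refusing an empty module or version."""
    if not module or not str(module).strip():
        raise ValueError("module name is empty")
    if not version or not str(version).strip():
        # This is the $CVERSION-undefined case from the bug report.
        raise ValueError("module version is empty; refusing to call dkms add")
    return ["dkms", "add", "-m", module, "-v", version]
```

With the values from the report, `dkms_add_command("nvidia", "185.18.36")` yields the command that was shown to work when typed manually.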
|
|
Re: -nvidia upgrade issues

On Wed, Nov 04, 2009 at 04:50:57PM -0800, Bryce Harrington wrote:
> Bug 450238 adds some further information as to what might be going
> wrong:

Sorry, false alarm: this simply appears to be a dupe of a bug you already fixed in the package a few weeks ago.
|
|
Re: -nvidia upgrade issues

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up. Instead,
> gdm tries several times, then gives up, but then...

> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen. Ick.

Fixes for both of these are now in the karmic-proposed queue.

Cheers,
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slangasek@...                                     vorlon@...
|
|
Re: -nvidia upgrade issues

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:
> It would be very good to try to get a sampling of why the kernel modules are
> failing to build. Can you try to get people to collect the failed
> make.log's in these scenarios?

Btw, in poking around in dkms.conf I noticed this:

  PACKAGE_VERSION="185.18.31"

Shouldn't that be 185.18.36? Or am I misunderstanding the purpose of this file?

Bryce
|
|
Re: -nvidia upgrade issues

Mario Limonciello [2009-11-04 17:08 -0600]:
> I see three potential improvements to Jockey for this scenario.
>
> 1. Have Jockey be able to work in an interactive frontend. If the
> package install behavior is modified to query if the headers are yet
> available, then you can more nicely present this information to the user

What do you mean by "interactive frontend"? For debconf you mean? I'm afraid that requires a rewrite of Jockey, since it's currently frontend <-> dbus <-> backend <-> python-apt, so the backend doesn't have X access. I'm afraid this isn't SRUable.

> 2. Have Jockey check for the headers for the current kernel before even
> starting to install the packages.
> 3. Before modifying the xorg.conf, do the equivalent of a modinfo nvidia
> to determine if the nvidia kernel module is indeed created. Show a
> warning/error otherwise.

Those make a lot of sense. I'll see to fixing those ASAP and SRU them.

Thanks,

Martin
--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer (www.debian.org)
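Item 2 (check for the headers of the current kernel before installing anything) could be approximated as follows. This is a sketch, not Jockey's implementation: the function name is hypothetical, and it assumes the conventional layout in which /lib/modules/<release>/build points at the installed headers tree.

```python
import os


def kernel_headers_present(kernel_release=None, modules_root="/lib/modules"):
    """Return True if a build tree exists for the given kernel release.

    Defaults to the running kernel. A dangling `build` symlink counts as
    absent, because os.path.isdir follows the link.
    """
    if kernel_release is None:
        kernel_release = os.uname().release
    build_dir = os.path.join(modules_root, kernel_release, "build")
    return os.path.isdir(build_dir)
```

The `modules_root` parameter exists mainly so the check can be pointed at a test directory; on a real system the default would be used.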
|
|
Re: -nvidia upgrade issues

On Thu, Nov 5, 2009 at 12:26 AM, Bryce Harrington <bryce@...> wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
> <snip>

I filed bug 456240 regarding the dkms package failing to compile. My video still works and I haven't had a chance to track the bug down. I've attached a log to the bug. I'd be happy to provide more information if necessary.

/Amit
|
|
Re: -nvidia upgrade issues

On Wed, Nov 04, 2009 at 05:08:17PM -0600, Mario Limonciello wrote:
> > 438398 - If DKMS fails to build the kernel module, the package upgrade
> > does not kick out. It shows package upgrade as successful. So this
> > leads directly to...

> So the problem with declaring the package as failed if the DKMS build failed
> is that it may actually pass or fail depending on how far along into the
> updates you are.

> Say you are updating to a new linux-headers with a new ABI at the same time
> as installing the NVIDIA package.

> Well if the NVIDIA package is processed first, the headers aren't yet
> installed, so the package will fail during postinst, but as soon as the
> headers are loaded, the kernel postinst runs and the modules get
> successfully built.
> Perhaps a potential solution is to look into whether the headers are yet
> available for this kernel, and if they aren't don't let the DKMS build failure
> cause the postinst to fail, but in any other scenario let the postinst fail.

I wonder if a dpkg trigger wouldn't help here for lucid (not for SRU): each dkms module package registers its interest in an appropriate file pattern, and at the end of the corresponding dpkg run the trigger fires to try to do the module compilation? This would have the advantage that dpkg would then have information about exactly which dkms packages failed to build, but I haven't thought this through completely to be sure it's worth doing and doesn't have any major design pitfalls.

--
Steve Langasek
|
|
Re: -nvidia upgrade issues

Hi Bryce:

On Thu, Nov 5, 2009 at 01:44, Bryce Harrington <bryce@...> wrote:
> Btw, in poking around in dkms.conf I noticed this:
>
>   PACKAGE_VERSION="185.18.31"
>
> Shouldn't that be 185.18.36? Or am I misunderstanding the purpose of
> this file?

If you have encountered a scenario where that doesn't reflect the version installed, that's a bug for sure in the nvidia driver package you are working with, and I am certain there will be future problems on such a system.

--
Mario Limonciello
superm1@...
Sent from Austin, Texas, United States
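The invariant stated here (PACKAGE_VERSION in dkms.conf should reflect the installed driver version) can be checked mechanically. The sketch below uses hypothetical function names and a deliberately simple parser that assumes a plain assignment, optionally in double quotes; dkms.conf is really a shell fragment, so anything fancier, such as variable expansion or single quotes, is out of scope.

```python
import re

# Matches lines such as: PACKAGE_VERSION="185.18.31"
_PACKAGE_VERSION = re.compile(r'^\s*PACKAGE_VERSION\s*=\s*"?([^"\s#]+)"?')


def dkms_package_version(conf_text):
    """Return the PACKAGE_VERSION value found in dkms.conf text, or None."""
    for line in conf_text.splitlines():
        match = _PACKAGE_VERSION.match(line)
        if match:
            return match.group(1)
    return None


def version_matches(conf_text, installed_version):
    """True if dkms.conf declares exactly the installed driver version."""
    return dkms_package_version(conf_text) == installed_version
```

With the values from the thread, a file declaring 185.18.31 checked against an installed 185.18.36 would be reported as a mismatch.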
|
|
Re: -nvidia upgrade issues

Hi Martin:

On Thu, Nov 5, 2009 at 03:43, Martin Pitt <martin.pitt@...> wrote:
> Mario Limonciello [2009-11-04 17:08 -0600]:

As I'm sure you are aware, there are other deficiencies with the way things are done now:

1) Not being able to represent whether something failed to install or download
2) Not being able to ask to insert CD media if it's present there

As you said, this doesn't sound SRUable, but for Lucid perhaps the better solution is to use python-aptdaemon. It can certainly provide more of this information more easily.

--
Mario Limonciello
superm1@...
Sent from Austin, Texas, United States
|
|
Re: -nvidia upgrade issues

Hi Steve:

On Thu, Nov 5, 2009 at 06:04, Steve Langasek <steve.langasek@...> wrote:
> I wonder if a dpkg trigger wouldn't help here for lucid (not for SRU): each
> dkms module package registers its interest in an appropriate file pattern,
> and at the end of the corresponding dpkg run the trigger fires to try to do
> the module compilation? This would have the advantage that dpkg would then
> have information about exactly which dkms packages failed to build, but I
> haven't thought this through completely to be sure it's worth doing and
> doesn't have any major design pitfalls.

Reading through your idea it tentatively sounds like a good way to help things out. It doesn't even need to be a pattern though. The dkms_autoinstaller script is able to query what has and hasn't been built yet, and try to build things. So if a dpkg trigger is set up to just call it at the end as necessary, that would work too.

Bryce: Have you assembled a spec for Lucid we can talk about at UDS to try to help clean up these problems?

Thanks,

--
Mario Limonciello
superm1@...
Sent from Austin, Texas, United States
|
|
Re: -nvidia upgrade issues

On Thu, Nov 05, 2009 at 07:35:50AM -0600, Mario Limonciello wrote:
> > I wonder if a dpkg trigger wouldn't help here for lucid (not for SRU): each
> > dkms module package registers its interest in an appropriate file pattern,
> > and at the end of the corresponding dpkg run the trigger fires to try to do
> > the module compilation? This would have the advantage that dpkg would then
> > have information about exactly which dkms packages failed to build, but I
> > haven't thought this through completely to be sure it's worth doing and
> > doesn't have any major design pitfalls.

> Reading through your idea it tentatively sounds like a good way to help
> things out. It doesn't even need to be a pattern though. The
> dkms_autoinstaller script is able to query what has and hasn't been built
> yet, and try to build things. So if a dpkg trigger is set up to just call
> it at the end as necessary, that would work too.

It does have to be either a pattern, or an explicit trigger invocation from the kernel package maintainer script; those are the ways dpkg knows which triggers need to be called.

http://www.dpkg.org/dpkg/Triggers

And a file pattern would be preferable if it's possible, because that doesn't require further coordination regarding the contents of the kernel packages' maintainer scripts.

--
Steve Langasek
|
|
Re: -nvidia upgrade issues

On Thu, Nov 05, 2009 at 05:46:47AM -0800, Steve Langasek wrote:
> http://www.dpkg.org/dpkg/Triggers

Hmm, that seems to be a terrible link. :) Let's try https://wiki.ubuntu.com/DpkgTriggers instead.

--
Steve Langasek
|
|
Re: -nvidia upgrade issues

On Thursday, 05 November 2009, Steve Langasek wrote:
> It does have to be either a pattern, or an explicit trigger invocation from
> the kernel package maintainer script; those are the ways dpkg knows which
> triggers need to be called.
>
> http://www.dpkg.org/dpkg/Triggers

You should not refer to this page as documentation of the dpkg triggers. The only reliable documentation is the one integrated in dpkg itself.

This blog entry is also interesting:
http://www.seanius.net/blog/2009/09/dpkg-triggers-howto/

Cheers,
--
Raphaël Hertzog -+- http://www.ouaza.com

Freexian: Debian developers at the service of businesses
http://www.freexian.com
|
|
Re: -nvidia upgrade issues

On Thu, Nov 05, 2009 at 07:29:04AM -0600, Mario Limonciello wrote:
> Hi Bryce:
>
> On Thu, Nov 5, 2009 at 01:44, Bryce Harrington <bryce@...> wrote:
>
> > Btw, in poking around in dkms.conf I noticed this:
> >
> > PACKAGE_VERSION="185.18.31"
> >
> > Shouldn't that be 185.18.36? Or am I misunderstanding the purpose of
> > this file?
>
> If you have encountered a scenario where that doesn't reflect the version
> installed, that's a bug for sure in the nvidia driver package you are
> working with, and I am certain there will be future problems on such a
> system.

No, this was just from the copy in the source package. Sounds like it gets replaced by the one generated from dkms.conf.in so this dkms.conf appears to just be a stray.

Thanks,
Bryce
|
|
(Update) Re: -nvidia upgrade issues

The two worst bugs are fixed, and the other two are at least understood now, but I could use a bit more advice. It seems there is a weird race condition with DKMS/upstart/nvidia which has cropped up due to faster boot and looks tricky to get sorted, so feedback from people with experience in DKMS/upstart matters would be helpful.

From what I understand, when doing an upgrade it installs both nvidia and a new kernel (2.6.31). At that point nvidia.ko is built against the *old* kernel (2.6.28). Fine, a nvidia.ko was successfully built so installation completes without error. xorg.conf is updated and the system is ready to run nvidia. Or so it thinks.

Now the user reboots.

During boot, dpkg notes that it needs to build a new nvidia.ko for 2.6.31 and dutifully gets to work. Meanwhile, since X is being started early on in the boot cycle, it in fact starts up before dkms has finished building the new nvidia.ko. X starts booting nvidia but since there is not yet an nvidia.ko for the current kernel it exits with an error.

I'm going to see if I can reproduce this synthetically, but meanwhile does this theory make sense? If so, is there a dkms/upstart trick we could do to work around the issue in Karmic? And for Lucid what would the "right" solution be?

Further notes on the other nvidia issues below...

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
>
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences. A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:
>
> 438398 - If DKMS fails to build the kernel module, the package upgrade
> does not kick out. It shows package upgrade as successful. So this
> leads directly to...

In reviewing instances of nvidia failures, this particular scenario appears to pop up less frequently in practice than I had initially assumed, and mostly due to unusual corner cases like not having patch installed, upgrading to Karmic directly from Hardy, etc. It seems most of these specific issues got fixed during development, just that the bug reports didn't get closed.

The important point though is that these failures ended up worse than they should have been, due to the following bugs...

> 451305 - Jockey misses that the driver failed to build, and so is not
> letting users know about the potential problem. It goes ahead and
> updates xorg.conf as if the driver was there. X tries to obey the
> configuration settings, but of course they won't work, so it exits on
> startup with an error message. *Normally* bulletproof-X would kick in
> at this point, display the error to the user, and give them some tools
> to diagnose and/or debug the situation. Unfortunately...

Elsewhere in this thread several fixes/workarounds to this issue were identified, which should greatly lessen the severity of these kinds of error situations.

> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up. Instead,
> gdm tries several times, then gives up, but then...

This is fixed; we now no longer rely on gdm for doing the failsafe but instead catch it with a simple upstart job and kick into failsafe-x mode. Thanks Steve!

> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen. Ick.

Now that we have an upstart job handling this case, the blinking situation will no longer happen. This fix is SRU'd and uploaded to ubuntu-proposed, and will go live before long.

Since this particular situation crops up right now mostly with nvidia, people installing via the release livecd should be okay - that boots with open source drivers, and when they choose to install nvidia it will download that and (I assume) also update xorg to the version that contains this fix. So by the time they reboot they'll have the fix. Steve, can you confirm?

> The above appears to be a pretty common scenario that we're getting a
> rash of bug reports about. It's hard to be certain because many of the
> bug reports are only including information about the failed boot, not on
> the failed build. So I'm not sure if it is just one reason why the
> build fails, or several. However if we can solve the above bugs it
> should give much better visibility into things.
>
>
> Btw, workaround for anyone experiencing this issue is to purge your
> nvidia (and fglrx) packages, remove /etc/X11/xorg.conf, and reinstall
> nvidia (or fglrx). It appears that in most of the bug reports this gets
> the system functioning again. Doing a full reinstall of Ubuntu rather
> than an upgrade also appears to work around the issues.

It looks like simply doing a dpkg-reconfigure on the nvidia package is sufficient to work around the issue, no need for reinstalling it (although that'll work too).

Bryce
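If the race theory above holds, one conceivable stopgap is to hold off starting X until the module file for the running kernel appears, with a timeout so boot never hangs. The sketch below only illustrates that polling idea: the function name is hypothetical, the caller must supply the module path (where DKMS installs nvidia.ko is an assumption that varies by setup), and file existence alone does not prove the build has finished writing the file.

```python
import os
import time


def wait_for_module(module_path, timeout=60.0, interval=1.0,
                    exists=os.path.exists, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll until `module_path` exists or `timeout` seconds have passed.

    Returns True if the file appeared, False on timeout. The `exists`,
    `sleep` and `clock` hooks are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if exists(module_path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On a False result the sensible fallback would be to start X without the proprietary driver configuration rather than to keep waiting.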
|
|
Re: (Update) Re: -nvidia upgrade issues

On Fri, Nov 6, 2009 at 4:21 PM, Bryce Harrington <bryce@...> wrote:
> The two worst bugs are fixed, and the other two are at least understood
> now, but I could use a bit more advice. It seems there is a weird race
> condition with DKMS/upstart/nvidia which has cropped up due to
> faster boot and looks tricky to get sorted, so feedback from people
> with experience in DKMS/upstart matters would be helpful.
>
> From what I understand, when doing an upgrade it installs both nvidia
> and a new kernel (2.6.31). At that point nvidia.ko is built against the
> *old* kernel (2.6.28). Fine, a nvidia.ko was successfully built so
> installation completes without error. xorg.conf is updated and the
> system is ready to run nvidia. Or so it thinks.
>
> Now the user reboots.
>
> During boot, dpkg notes that it needs to build a new nvidia.ko for
> 2.6.31 and dutifully gets to work. Meanwhile, since X is being started
> early on in the boot cycle, it in fact starts up before dkms has
> finished building the new nvidia.ko. X starts booting nvidia but since
> there is not yet an nvidia.ko for the current kernel it exits with an
> error.

Chiming in as a user of -nvidia, I just wanted to share my upgrade experience as a data point. Indeed nvidia does build against the old kernel, which seemed a little silly to me, but harmless as I understand it.

However I didn't have any issues on reboot. Maybe it is just a race condition; to the very best of my memory, on a reboot it built for .31 just fine and continued on. I had literally zero problems anywhere with my dist upgrade. Though, I may have been using one of the 9.04 X PPAs, if that is worth anything.

Sorry if this wasn't a useful data point :)

--
Michael Rooney
mrooney@...