next/master boot: 177 boots: 1 failed, 176 passed (next-20180808)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180808/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180808/
Tree: next
Branch: master
Git Describe: next-20180808
Git Commit: 6b522b734da2950c368aae668f963b8925fb5545
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 67 unique boards, 26 SoC families, 21 builds out of 200
Boot Regressions Detected:
arm:
multi_v7_defconfig: omap3-beagle-xm: lab-baylibre-seattle: new failure (last pass: next-20180807)
Boot Failure Detected:
arm:
multi_v7_defconfig: omap3-beagle-xm: 1 failed lab
---
For more info write to info@kernelci.org
On Wed, Aug 08, 2018 at 06:42:44AM -0700, kernelci.org bot wrote:
Today's -next failed to boot multi_v7_defconfig on Beagle XM:
multi_v7_defconfig: omap3-beagle-xm: lab-baylibre-seattle: new failure (last pass: next-20180807)
but it booted fine with the OMAP defconfig and in pending-fixes. The MMC doesn't seem terribly happy:
[ 4.797210] mmc0: card never left busy state
[ 4.801727] mmc0: error -110 whilst initialising SD card
though I'm not seeing any other errors. All the information we have including full logs can be found at:
https://kernelci.org/boot/id/5b6ad2e459b5143a9d96ba95/
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.
Hi,
* Mark Brown broonie@kernel.org [180808 14:17]:
On Wed, Aug 08, 2018 at 06:42:44AM -0700, kernelci.org bot wrote:
Today's -next failed to boot multi_v7_defconfig on Beagle XM:
multi_v7_defconfig: omap3-beagle-xm: lab-baylibre-seattle: new failure (last pass: next-20180807)
but it booted fine with the OMAP defconfig and in pending-fixes. The MMC doesn't seem terribly happy:
[ 4.797210] mmc0: card never left busy state
[ 4.801727] mmc0: error -110 whilst initialising SD card
though I'm not seeing any other errors. All the information we have including full logs can be found at:
https://kernelci.org/boot/id/5b6ad2e459b5143a9d96ba95/
Thanks for notifying about that.
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.
Could it be a bad MMC card?
I don't have a beagle xm in my rack right now, but I gave next with m_v7_dc a quick boot test on n900, omap3-evm and logicpd torpedo that are all omap3 based with the twl4030 variant PMICs and I'm not seeing any MMC issues on them.
Regards,
Tony
On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:
* Mark Brown broonie@kernel.org [180808 14:17]:
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.
Could it be a bad MMC card?
It's possible, it's Kevin's lab so he'd need to take a look.
I don't have a beagle xm in my rack right now, but I gave next with m_v7_dc a quick boot test on n900, omap3-evm and logicpd torpedo that are all omap3 based with the twl4030 variant PMICs and I'm not seeing any MMC issues on them.
Yes, there's a bunch of those in kernelci which seem fine.
Mark Brown broonie@kernel.org writes:
On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:
* Mark Brown broonie@kernel.org [180808 14:17]:
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.
Could it be a bad MMC card?
It's possible, it's Kevin's lab so he'd need to take a look.
I changed MMC cards, and it's happier now.
What's strange is that it was successfully loading/booting u-boot from MMC, and the card and partitions etc looked fine in a Linux PC. Anyways, it's changed out now.
Kevin
* Kevin Hilman khilman@baylibre.com [180815 17:14]:
Mark Brown broonie@kernel.org writes:
On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:
* Mark Brown broonie@kernel.org [180808 14:17]:
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.
Could it be a bad MMC card?
It's possible, it's Kevin's lab so he'd need to take a look.
I changed MMC cards, and it's happier now.
OK
What's strange is that it was successfully loading/booting u-boot from MMC, and the card and partitions etc looked fine in a Linux PC. Anyways, it's changed out now.
Weird. I guess it's still possible we have some regression for lower voltage cards then.
Regards,
Tony
Tony Lindgren tony@atomide.com writes:
* Kevin Hilman khilman@baylibre.com [180815 17:14]:
Mark Brown broonie@kernel.org writes:
On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:
* Mark Brown broonie@kernel.org [180808 14:17]:
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.
Could it be a bad MMC card?
It's possible, it's Kevin's lab so he'd need to take a look.
I changed MMC cards, and it's happier now.
OK
What's strange is that it was successfully loading/booting u-boot from MMC, and the card and partitions etc looked fine in a Linux PC. Anyways, it's changed out now.
Weird. I guess it's still possible we have some regression for lower voltage cards then.
Hmm, I think I spoke too soon, and now I don't think it's the MMC card.
I'm still seeing periodic failures on this board soon after the MMC init, but only in mainline and next: https://kernelci.org/boot/omap3-beagle-xm
Also, looking at that URL, you'll see that the failures are only for multi_v7 but not omap2plus_defconfig.
The step that seems to be happening right after MMC init is unused regulators being disabled. Is it possible that multi_v7 is missing some regulator setup?
Also, the last line in the failure case:
leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517
doesn't happen on the successful omap2plus_defconfig boots either.
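The numeric codes in the log lines quoted in this thread (-110 from mmc0 and -517 from leds_pwm) are negated kernel errno values. As a rough aid for reading them, here is a minimal sketch with a hand-written lookup table. The table and the describe() helper are illustrative only and not part of any kernel tooling; 517 is a kernel-internal code, so it does not appear in userspace errno tables. A -517 is what a driver returns when something it depends on has not probed yet, which would fit a PWM provider not being available when leds_pwm probed.

```python
# Values from the Linux kernel headers: ETIMEDOUT is 110 in the generic
# errno table, and EPROBE_DEFER (517) is kernel-internal.
KERNEL_ERRNO = {
    110: ("ETIMEDOUT", "Connection timed out"),
    517: ("EPROBE_DEFER", "Driver requests probe retry"),
}

def describe(code):
    """Turn a negated kernel error code such as -110 into a readable name."""
    name, meaning = KERNEL_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({meaning})"

for code in (-110, -517):
    print(describe(code))
```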
Kevin
Hi,
* Kevin Hilman khilman@baylibre.com [180817 15:47]:
Hmm, I think I spoke too soon, and now I don't think it's the MMC card.
I'm still seeing periodic failures on this board soon after the MMC init, but only in mainline and next: https://kernelci.org/boot/omap3-beagle-xm
Also, looking at that URL, you'll see that the failures are only for multi_v7 but not omap2plus_defconfig.
I was finally able to reproduce this here this morning with v4.18 after about 20 boot attempts. Looks like the system boots up, it just has a long pause. See the timestamps below, where there is a pause of about 185 seconds:
[ 2.307800] mmc0: host does not support reading read-only switch, assuming write-enable
[ 2.318237] mmc0: new high speed SDHC card at address 59b4
[ 2.325592] mmcblk0: mmc0:59b4 SD 14.7 GiB
[ 2.333221] mmcblk0: p1 p2
[ 2.384490] ehci-omap 48064800.ehci: EHCI Host Controller
[ 2.390045] ehci-omap 48064800.ehci: new USB bus registered, assigned bus number 2
[ 2.399261] ehci-omap 48064800.ehci: irq 93, io mem 0x48064800
[ 2.434387] ehci-omap 48064800.ehci: USB 2.0 started, EHCI 1.00
[ 2.441436] hub 2-0:1.0: USB hub found
[ 2.445404] hub 2-0:1.0: 3 ports detected
[ 2.451751] input: gpio_keys as /devices/platform/gpio_keys/input/input0
[ 2.460144] twl_rtc 48070000.i2c:twl@48:rtc: setting system clock to 2000-01-01 00:05:24 UTC (946685124)
[ 2.814422] usb 2-2: new high-speed USB device number 2 using ehci-omap
[ 3.016632] hub 2-2:1.0: USB hub found
[ 3.020751] hub 2-2:1.0: 5 ports detected
[ 3.344421] usb 2-2.1: new high-speed USB device number 3 using ehci-omap
[ 3.498474] smsc95xx v1.0.6
[ 3.595184] smsc95xx 2-2.1:1.0 eth0: register 'smsc95xx' at usb-48064800.ehci-2.1, smsc95xx USB 2.0 Ethernet, 11:22:33:44:55:66
[ 188.843536] random: crng init done
[ 188.951354] smsc95xx 2-2.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 188.959014] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 190.469421] smsc95xx 2-2.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
[ 190.494506] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 190.524627] Sending DHCP requests .., OK
[ 192.948425] IP-Config: Got DHCP answer from 192.168.111.1, my address is 192.168.111.32
[ 192.956604] IP-Config: Complete:
[ 192.959869] device=eth0, hwaddr=02:02:00:a0:1f:e0, ipaddr=192.168.111.32, mask=255.255.255.0, gw=192.168.111.1
[ 192.970428] host=beagleboard-xm, domain=muru.com, nis-domain=(none)
[ 192.977233] bootserver=192.168.111.64, rootserver=192.168.111.64, rootpath=/srv/nfs3/alpine-armhf,v3,rsize=32768,wsize=32768
[ 192.977264] nameserver0=208.69.40.3, nameserver1=208.69.40.4
[ 192.997680] VAUX3: disabling
[ 193.002410] VDAC: disabling
[ 193.007171] VUSB3V1: disabling
[ 193.010711] VPLL2: disabling
[ 193.030761] VFS: Mounted root (nfs filesystem) readonly on device 0:14.
[ 193.038635] devtmpfs: mounted
[ 193.046600] Freeing unused kernel memory: 2048K
The long pause happens already before disabling unused regulators. So it seems more like some regression with timers or interrupts.
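Spotting a stall like this by eye gets tedious over many boot logs. The following is a minimal sketch, not an existing kernelci or kernel tool, that scans printk-style timestamps and reports the largest jump between consecutive lines. The largest_gap() helper and the abridged sample lines are illustrative assumptions; the timestamps themselves are taken from the log above.

```python
import re

# Matches the "[ seconds.micros]" prefix that printk adds to each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

def largest_gap(log_text):
    """Return (gap_seconds, line_before, line_after) for the biggest jump."""
    previous = None
    best = (0.0, None, None)
    for line in log_text.splitlines():
        line = line.strip()
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines that carry no timestamp
        stamp = float(match.group(1))
        if previous is not None and stamp - previous[0] > best[0]:
            best = (stamp - previous[0], previous[1], line)
        previous = (stamp, line)
    return best

# Abridged sample: four lines around the stall in the log above.
sample = """\
[ 3.498474] smsc95xx v1.0.6
[ 3.595184] smsc95xx 2-2.1:1.0 eth0: register 'smsc95xx' at usb-48064800.ehci-2.1
[ 188.843536] random: crng init done
[ 192.997680] VAUX3: disabling
"""

gap, before, after = largest_gap(sample)
print(f"{gap:.1f} s pause between:\n  {before}\n  {after}")
```

On the sample this points at the jump from the smsc95xx registration to "random: crng init done", well before the regulator "disabling" lines.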
The step that seems to be happening right after MMC init is unused regulators being disabled. Is it possible that multi_v7 is missing some regulator setup?
Also, the last line in the failure case:
leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517
doesn't happen on the successful omap2plus_defconfig boots either.
Looks like we're missing these in multi_v7_defconfig:
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
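One way to find options like these is to compare the symbols enabled in the two defconfigs. The sketch below does that on two short excerpts. The excerpts are hypothetical stand-ins written for illustration, not the contents of the real files under arch/arm/configs/, and enabled_symbols() is a helper invented here.

```python
def enabled_symbols(config_text):
    """Collect CONFIG_* symbols set to y or m in defconfig-style text."""
    symbols = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as "# CONFIG_FOO is not set"
        name, value = line.split("=", 1)
        if name.startswith("CONFIG_") and value in ("y", "m"):
            symbols.add(name)
    return symbols

# Hypothetical excerpts; the real defconfigs live under arch/arm/configs/.
omap2plus_excerpt = """\
CONFIG_PWM=y
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
"""
multi_v7_excerpt = """\
CONFIG_PWM=y
# CONFIG_PWM_TWL is not set
"""

# Symbols enabled in the first excerpt but not in the second.
missing = sorted(enabled_symbols(omap2plus_excerpt) - enabled_symbols(multi_v7_excerpt))
print(missing)
```

With real files, the two strings would be read from disk instead; the set difference stays the same.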
Regards,
Tony
Tony Lindgren tony@atomide.com writes:
Hi,
* Kevin Hilman khilman@baylibre.com [180817 15:47]:
Hmm, I think I spoke too soon, and now I don't think it's the MMC card.
I'm still seeing periodic failures on this board soon after the MMC init, but only in mainline and next: https://kernelci.org/boot/omap3-beagle-xm
Also, looking at that URL, you'll see that the failures are only for multi_v7 but not omap2plus_defconfig.
I was finally able to reproduce this here this morning with v4.18 after about 20 boot attempts. Looks like the system boots up, it just has a long pause. See the timestamps below, where there is a pause of about 185 seconds:
[ 2.307800] mmc0: host does not support reading read-only switch, assuming write-enable
[ 2.318237] mmc0: new high speed SDHC card at address 59b4
[ 2.325592] mmcblk0: mmc0:59b4 SD 14.7 GiB
[ 2.333221] mmcblk0: p1 p2
[ 2.384490] ehci-omap 48064800.ehci: EHCI Host Controller
[ 2.390045] ehci-omap 48064800.ehci: new USB bus registered, assigned bus number 2
[ 2.399261] ehci-omap 48064800.ehci: irq 93, io mem 0x48064800
[ 2.434387] ehci-omap 48064800.ehci: USB 2.0 started, EHCI 1.00
[ 2.441436] hub 2-0:1.0: USB hub found
[ 2.445404] hub 2-0:1.0: 3 ports detected
[ 2.451751] input: gpio_keys as /devices/platform/gpio_keys/input/input0
[ 2.460144] twl_rtc 48070000.i2c:twl@48:rtc: setting system clock to 2000-01-01 00:05:24 UTC (946685124)
[ 2.814422] usb 2-2: new high-speed USB device number 2 using ehci-omap
[ 3.016632] hub 2-2:1.0: USB hub found
[ 3.020751] hub 2-2:1.0: 5 ports detected
[ 3.344421] usb 2-2.1: new high-speed USB device number 3 using ehci-omap
[ 3.498474] smsc95xx v1.0.6
[ 3.595184] smsc95xx 2-2.1:1.0 eth0: register 'smsc95xx' at usb-48064800.ehci-2.1, smsc95xx USB 2.0 Ethernet, 11:22:33:44:55:66
[ 188.843536] random: crng init done
[ 188.951354] smsc95xx 2-2.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 188.959014] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 190.469421] smsc95xx 2-2.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
[ 190.494506] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 190.524627] Sending DHCP requests .., OK
[ 192.948425] IP-Config: Got DHCP answer from 192.168.111.1, my address is 192.168.111.32
[ 192.956604] IP-Config: Complete:
[ 192.959869] device=eth0, hwaddr=02:02:00:a0:1f:e0, ipaddr=192.168.111.32, mask=255.255.255.0, gw=192.168.111.1
[ 192.970428] host=beagleboard-xm, domain=muru.com, nis-domain=(none)
[ 192.977233] bootserver=192.168.111.64, rootserver=192.168.111.64, rootpath=/srv/nfs3/alpine-armhf,v3,rsize=32768,wsize=32768
[ 192.977264] nameserver0=208.69.40.3, nameserver1=208.69.40.4
[ 192.997680] VAUX3: disabling
[ 193.002410] VDAC: disabling
[ 193.007171] VUSB3V1: disabling
[ 193.010711] VPLL2: disabling
[ 193.030761] VFS: Mounted root (nfs filesystem) readonly on device 0:14.
[ 193.038635] devtmpfs: mounted
[ 193.046600] Freeing unused kernel memory: 2048K
The long pause happens already before disabling unused regulators. So it seems more like some regression with timers or interrupts.
Ah, that would explain the fails in kernelCI. I think we have a default wait of 200 sec for a full kernel boot.
The step that seems to be happening right after MMC init is unused regulators being disabled. Is it possible that multi_v7 is missing some regulator setup?
Also, the last line in the failure case:
leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517
doesn't happen on the successful omap2plus_defconfig boots either.
Looks like we're missing these in multi_v7_defconfig:
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
Does the absence of these explain the super-long boot delay? Based on your boot, it doesn't seem like it.
Any other ideas for where the delay is coming from?
Kevin
* Kevin Hilman khilman@baylibre.com [180820 21:32]:
Tony Lindgren tony@atomide.com writes:
I was finally able to reproduce this here this morning with v4.18 after about 20 boot attempts. Looks like the system boots up, it just has a long pause. See the timestamps below, where there is a pause of about 185 seconds:
...
The long pause happens already before disabling unused regulators. So it seems more like some regression with timers or interrupts.
Ah, that would explain the fails in kernelCI. I think we have a default wait of 200 sec for a full kernel boot.
...
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
Does the absence of these explain the super-long boot delay? Based on your boot, it doesn't seem like it.
No that's not it.
Any other ideas for where the delay is coming from?
I think v4.16 is OK or at least I did not have any luck reproducing it so far on v4.16. I can reproduce it with v4.17, tried bisecting yesterday but of course it went nowhere as there's no easy way to know if some commit is good without booting it tens of times.
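To put a number on "tens of times": if a bad kernel shows the stall with probability p per boot, it survives n clean boots with probability (1 - p)^n, so declaring a commit good needs enough boots to make that small. The sketch below works this out under two assumptions that the thread does not confirm: boots are independent, and the rate is roughly one stall in 20 attempts (taken from the "about 20 boot attempts" reported earlier). boots_needed() is a helper written for this illustration.

```python
import math

def boots_needed(failure_rate, confidence):
    """Smallest n such that a bad kernel survives n boots with
    probability at most 1 - confidence, i.e. (1 - p)**n <= 1 - confidence."""
    if not 0.0 < failure_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

# Assumed rate: roughly one slow boot in 20 attempts.
for confidence in (0.90, 0.95, 0.99):
    print(confidence, boots_needed(1 / 20, confidence))
```

Under those assumptions, about 59 clean boots are needed per bisection step for 95% confidence that a commit is good, which is why a plain bisect is slow going for an intermittent failure like this one.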
Regards,
Tony
* Tony Lindgren tony@atomide.com [180821 08:58]:
* Kevin Hilman khilman@baylibre.com [180820 21:32]:
Any other ideas for where the delay is coming from?
I think v4.16 is OK or at least I did not have any luck reproducing it so far on v4.16. I can reproduce it with v4.17, tried bisecting yesterday but of course it went nowhere as there's no easy way to know if some commit is good without booting it tens of times.
I think I managed to bisect it down to commit 554c8aa8ecad ("sched: idle: Select idle state before stopping the tick"). And looks like there's a fix being discussed for it at:
https://lore.kernel.org/lkml/2161372.IsD4PDzmmY@aspire.rjw.lan/
At least so far I have not been able to reproduce the issue with the patch being discussed, maybe give it a try on your test system too.
I'm guessing that probably the only reason why it was not caught earlier with omap2plus_defconfig is that it has more devices enabled and produces more interrupts compared to multi_v7_defconfig.
Regards,
Tony
kernel-build-reports@lists.linaro.org