next/master boot: 177 boots: 1 failed, 176 passed (next-20180808)

kernelci.org bot

8 Aug 2018

1:42 p.m.

next/master boot: 177 boots: 1 failed, 176 passed (next-20180808)

Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180808/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180808/

Tree: next
Branch: master
Git Describe: next-20180808
Git Commit: 6b522b734da2950c368aae668f963b8925fb5545
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 67 unique boards, 26 SoC families, 21 builds out of 200

Boot Regressions Detected:

arm:

multi_v7_defconfig:
    omap3-beagle-xm:
        lab-baylibre-seattle: new failure (last pass: next-20180807)

Boot Failure Detected:

arm:

multi_v7_defconfig omap3-beagle-xm: 1 failed lab

---
For more info write to info@kernelci.org

Mark Brown

8 Aug

2:14 p.m.

On Wed, Aug 08, 2018 at 06:42:44AM -0700, kernelci.org bot wrote:

Today's -next failed to boot multi_v7_defconfig on Beagle XM:

...

multi_v7_defconfig:
    omap3-beagle-xm:
        lab-baylibre-seattle: new failure (last pass: next-20180807)

but it booted fine with the OMAP defconfig and in pending-fixes. The MMC doesn't seem terribly happy:

[ 4.797210] mmc0: card never left busy state
[ 4.801727] mmc0: error -110 whilst initialising SD card

though I'm not seeing any other errors. All the information we have including full logs can be found at:

https://kernelci.org/boot/id/5b6ad2e459b5143a9d96ba95/

The diffstat from yesterday is very light, nothing jumps out as being a likely cause.

Tony Lindgren

9 Aug

7:33 a.m.

Hi,

* Mark Brown broonie@kernel.org [180808 14:17]:

...

On Wed, Aug 08, 2018 at 06:42:44AM -0700, kernelci.org bot wrote:

Today's -next failed to boot multi_v7_defconfig on Beagle XM:

...
multi_v7_defconfig:
    omap3-beagle-xm:
        lab-baylibre-seattle: new failure (last pass: next-20180807)
but it booted fine with the OMAP defconfig and in pending-fixes. The MMC doesn't seem terribly happy:

[ 4.797210] mmc0: card never left busy state
[ 4.801727] mmc0: error -110 whilst initialising SD card

though I'm not seeing any other errors. All the information we have including full logs can be found at:
https://kernelci.org/boot/id/5b6ad2e459b5143a9d96ba95/

Thanks for notifying about that.

...

The diffstat from yesterday is very light, nothing jumps out as being a likely cause.

Could it be a bad MMC card?

I don't have a beagle xm in my rack right now, but I gave next with m_v7_dc a quick boot test on n900, omap3-evm and logicpd torpedo that are all omap3 based with the twl4030 variant PMICs and I'm not seeing any MMC issues on them.

Regards,

Tony

Mark Brown

9:28 a.m.

On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:

...

Mark Brown broonie@kernel.org [180808 14:17]:

...

...
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.

...

Could it be a bad MMC card?

It's possible, it's Kevin's lab so he'd need to take a look.

...

I don't have a beagle xm in my rack right now, but I gave next with m_v7_dc a quick boot test on n900, omap3-evm and logicpd torpedo that are all omap3 based with the twl4030 variant PMICs and I'm not seeing any MMC issues on them.

Yes, there's a bunch of those in kernelci which seem fine.

Kevin Hilman

15 Aug

5:10 p.m.

Mark Brown broonie@kernel.org writes:

...

On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:

...

Mark Brown broonie@kernel.org [180808 14:17]:

...
...
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.

...
Could it be a bad MMC card?

It's possible, it's Kevin's lab so he'd need to take a look.

I changed MMC cards, and it's happier now.

What's strange is that it was successfully loading/booting u-boot from MMC, and the card and partitions etc looked fine in a Linux PC. Anyways, it's changed out now.

Kevin

Tony Lindgren

7:33 p.m.

* Kevin Hilman khilman@baylibre.com [180815 17:14]:

...

Mark Brown broonie@kernel.org writes:

...
On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:

...

Mark Brown broonie@kernel.org [180808 14:17]:

...
...
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.

...
Could it be a bad MMC card?

It's possible, it's Kevin's lab so he'd need to take a look.

I changed MMC cards, and it's happier now.

...

What's strange is that it was successfully loading/booting u-boot from MMC, and the card and partitions etc looked fine in a Linux PC. Anyways, it's changed out now.

Weird. I guess it's still possible we have some regression for lower voltage cards then.

Regards,

Tony

Kevin Hilman

17 Aug

3:44 p.m.

Tony Lindgren tony@atomide.com writes:

...

Kevin Hilman khilman@baylibre.com [180815 17:14]:

...
Mark Brown broonie@kernel.org writes:

...
On Thu, Aug 09, 2018 at 12:33:48AM -0700, Tony Lindgren wrote:

...

Mark Brown broonie@kernel.org [180808 14:17]:

...
...
The diffstat from yesterday is very light, nothing jumps out as being a likely cause.

...
Could it be a bad MMC card?

It's possible, it's Kevin's lab so he'd need to take a look.

I changed MMC cards, and it's happier now.

OK

...
What's strange is that it was successfully loading/booting u-boot from MMC, and the card and partitions etc looked fine in a Linux PC. Anyways, it's changed out now.

Weird. I guess it's still possible we have some regression for lower voltage cards then.

Hmm, I think I spoke too soon, and now I don't think it's the MMC card.

I'm still seeing periodic failures on this board soon after the MMC init, but only in mainline and next: https://kernelci.org/boot/omap3-beagle-xm

Also, looking at that URL, you'll see that the failures are only for multi_v7 but not omap2plus_defconfig.

The step that seems to be happening right after MMC init is unused regulators being disabled. Is it possible that multi_v7 is missing some regulator setup?

Also, the last line in the failure case:

leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517

doesn't happen on the successful omap2plus_defconfig boots either.
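For anyone reading the two error lines in this thread, the numeric codes can be decoded as below. This is only a small sketch: on Linux, 110 is ETIMEDOUT and 517 is the kernel-internal EPROBE_DEFER, which a driver returns when a resource it needs (here the PWM) has no provider registered yet. The `annotate` helper and its regex are made up for illustration and are not part of any kernelci tooling.

```python
import re

# Linux errno values for the two messages quoted in this thread.
# 110 is ETIMEDOUT; 517 is the kernel-internal EPROBE_DEFER.
KNOWN = {110: "ETIMEDOUT", 517: "EPROBE_DEFER"}

# A negative integer that follows the word "error" or a colon.
CODE = re.compile(r"(?:error\s+|:\s*)-(\d+)\b")

def annotate(line):
    """Append the symbolic errno name to a log line when it is known."""
    m = CODE.search(line)
    if not m:
        return line
    name = KNOWN.get(int(m.group(1)))
    return f"{line}  ({name})" if name else line

for msg in (
    "mmc0: error -110 whilst initialising SD card",
    "leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517",
):
    print(annotate(msg))
```

A deferred probe on its own does not have to be fatal; it does suggest checking whether the PWM provider driver is built into the failing config.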

Kevin

Tony Lindgren

20 Aug

3:57 p.m.

Hi,

* Kevin Hilman khilman@baylibre.com [180817 15:47]:

...

Hmm, I think I spoke too soon, and now I don't think it's the MMC card.

I'm still seeing periodic failures on this board soon after the MMC init, but only in mainline and next: https://kernelci.org/boot/omap3-beagle-xm

Also, looking at that URL, you'll see that the failures are only for multi_v7 but not omap2plus_defconfig.

I was finally able to reproduce this here this morning with v4.18 after about 20 boot attempts. Looks like the system boots up, it just has a long pause. See the timestamps below where there is about a 185-second pause:

[ 2.307800] mmc0: host does not support reading read-only switch, assuming write-enable
[ 2.318237] mmc0: new high speed SDHC card at address 59b4
[ 2.325592] mmcblk0: mmc0:59b4 SD 14.7 GiB
[ 2.333221] mmcblk0: p1 p2
[ 2.384490] ehci-omap 48064800.ehci: EHCI Host Controller
[ 2.390045] ehci-omap 48064800.ehci: new USB bus registered, assigned bus number 2
[ 2.399261] ehci-omap 48064800.ehci: irq 93, io mem 0x48064800
[ 2.434387] ehci-omap 48064800.ehci: USB 2.0 started, EHCI 1.00
[ 2.441436] hub 2-0:1.0: USB hub found
[ 2.445404] hub 2-0:1.0: 3 ports detected
[ 2.451751] input: gpio_keys as /devices/platform/gpio_keys/input/input0
[ 2.460144] twl_rtc 48070000.i2c:twl@48:rtc: setting system clock to 2000-01-01 00:05:24 UTC (946685124)
[ 2.814422] usb 2-2: new high-speed USB device number 2 using ehci-omap
[ 3.016632] hub 2-2:1.0: USB hub found
[ 3.020751] hub 2-2:1.0: 5 ports detected
[ 3.344421] usb 2-2.1: new high-speed USB device number 3 using ehci-omap
[ 3.498474] smsc95xx v1.0.6
[ 3.595184] smsc95xx 2-2.1:1.0 eth0: register 'smsc95xx' at usb-48064800.ehci-2.1, smsc95xx USB 2.0 Ethernet, 11:22:33:44:55:66
[ 188.843536] random: crng init done
[ 188.951354] smsc95xx 2-2.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 188.959014] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 190.469421] smsc95xx 2-2.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
[ 190.494506] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 190.524627] Sending DHCP requests .., OK
[ 192.948425] IP-Config: Got DHCP answer from 192.168.111.1, my address is 192.168.111.32
[ 192.956604] IP-Config: Complete:
[ 192.959869] device=eth0, hwaddr=02:02:00:a0:1f:e0, ipaddr=192.168.111.32, mask=255.255.255.0, gw=192.168.111.1
[ 192.970428] host=beagleboard-xm, domain=muru.com, nis-domain=(none)
[ 192.977233] bootserver=192.168.111.64, rootserver=192.168.111.64, rootpath=/srv/nfs3/alpine-armhf,v3,rsize=32768,wsize=32768
[ 192.977264] nameserver0=208.69.40.3, nameserver1=208.69.40.4
[ 192.997680] VAUX3: disabling
[ 193.002410] VDAC: disabling
[ 193.007171] VUSB3V1: disabling
[ 193.010711] VPLL2: disabling
[ 193.030761] VFS: Mounted root (nfs filesystem) readonly on device 0:14.
[ 193.038635] devtmpfs: mounted
[ 193.046600] Freeing unused kernel memory: 2048K

The long pause happens already before disabling unused regulators. So it seems more like some regression with timers or interrupts.
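A pause like this is easy to miss by eye in a long log. The sketch below shows one way to locate the largest jump between consecutive printk timestamps; the function name `largest_gap` is hypothetical and the sample is a handful of lines copied from the log in this message.

```python
import re

# Matches the "[ seconds.micros]" prefix that printk puts on each message.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

def largest_gap(log_text):
    """Return (gap_seconds, before, after) for the biggest jump between
    consecutive kernel timestamps, or None if fewer than two are found."""
    stamps = [float(m.group(1)) for m in STAMP.finditer(log_text)]
    if len(stamps) < 2:
        return None
    pairs = zip(stamps, stamps[1:])
    gap, i = max((b - a, i) for i, (a, b) in enumerate(pairs))
    return gap, stamps[i], stamps[i + 1]

# A few lines taken from the boot log quoted in this message.
SAMPLE = """
[ 3.498474] smsc95xx v1.0.6
[ 3.595184] smsc95xx 2-2.1:1.0 eth0: register 'smsc95xx' at usb-48064800.ehci-2.1, smsc95xx USB 2.0 Ethernet, 11:22:33:44:55:66
[ 188.843536] random: crng init done
[ 188.951354] smsc95xx 2-2.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 192.997680] VAUX3: disabling
"""

gap, before, after = largest_gap(SAMPLE)
print(f"largest gap: {gap:.1f} s between {before} and {after}")
```

Because it scans for timestamps anywhere in the text rather than per line, the same helper also works on logs whose line breaks were lost in transit.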

...

The step that seems to be happening right after MMC init is unused regulators being disabled. Is it possible that multi_v7 is missing some regulator setup?

Also, the last line in the failure case:

leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517

doesn't happen on the successful omap2plus_defconfig boots either.

Looks like we're missing these in multi_v7_defconfig:

CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
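A gap like this can be spotted by diffing the options enabled in the two configs. The following is a rough sketch only: the two excerpts are hypothetical stand-ins rather than the real contents of omap2plus_defconfig and multi_v7_defconfig, and the helper names are made up.

```python
def enabled_options(config_text):
    """Collect CONFIG_* names set to y or m in config-style text."""
    opts = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                opts.add(name)
    return opts

def missing_from(reference, candidate):
    """Options enabled in reference but not in candidate, sorted by name."""
    return sorted(enabled_options(reference) - enabled_options(candidate))

# Hypothetical excerpts, NOT the real defconfig contents.
OMAP2PLUS = """
CONFIG_PWM=y
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y
"""
MULTI_V7 = """
CONFIG_PWM=y
# CONFIG_PWM_TWL is not set
"""

print(missing_from(OMAP2PLUS, MULTI_V7))
```

Since a defconfig only records deviations from Kconfig defaults, running the comparison on the generated .config files from each build is likely to be more reliable than comparing the defconfig files themselves.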

Regards,

Tony

Kevin Hilman

9:28 p.m.

Tony Lindgren tony@atomide.com writes:

...

Hi,

Kevin Hilman khilman@baylibre.com [180817 15:47]:

...
Hmm, I think I spoke too soon, and now I don't think it's the MMC card.

I'm still seeing periodic failures on this board soon after the MMC init, but only in mainline and next: https://kernelci.org/boot/omap3-beagle-xm

Also, looking at that URL, you'll see that the failures are only for multi_v7 but not omap2plus_defconfig.

I was finally able to reproduce this here this morning with v4.18 after about 20 boot attempts. Looks like the system boots up, it just has a long pause. See the timestamps below where there is about a 185-second pause:

[ 2.307800] mmc0: host does not support reading read-only switch, assuming write-enable
[ 2.318237] mmc0: new high speed SDHC card at address 59b4
[ 2.325592] mmcblk0: mmc0:59b4 SD 14.7 GiB
[ 2.333221] mmcblk0: p1 p2
[ 2.384490] ehci-omap 48064800.ehci: EHCI Host Controller
[ 2.390045] ehci-omap 48064800.ehci: new USB bus registered, assigned bus number 2
[ 2.399261] ehci-omap 48064800.ehci: irq 93, io mem 0x48064800
[ 2.434387] ehci-omap 48064800.ehci: USB 2.0 started, EHCI 1.00
[ 2.441436] hub 2-0:1.0: USB hub found
[ 2.445404] hub 2-0:1.0: 3 ports detected
[ 2.451751] input: gpio_keys as /devices/platform/gpio_keys/input/input0
[ 2.460144] twl_rtc 48070000.i2c:twl@48:rtc: setting system clock to 2000-01-01 00:05:24 UTC (946685124)
[ 2.814422] usb 2-2: new high-speed USB device number 2 using ehci-omap
[ 3.016632] hub 2-2:1.0: USB hub found
[ 3.020751] hub 2-2:1.0: 5 ports detected
[ 3.344421] usb 2-2.1: new high-speed USB device number 3 using ehci-omap
[ 3.498474] smsc95xx v1.0.6
[ 3.595184] smsc95xx 2-2.1:1.0 eth0: register 'smsc95xx' at usb-48064800.ehci-2.1, smsc95xx USB 2.0 Ethernet, 11:22:33:44:55:66
[ 188.843536] random: crng init done
[ 188.951354] smsc95xx 2-2.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 188.959014] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 190.469421] smsc95xx 2-2.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
[ 190.494506] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 190.524627] Sending DHCP requests .., OK
[ 192.948425] IP-Config: Got DHCP answer from 192.168.111.1, my address is 192.168.111.32
[ 192.956604] IP-Config: Complete:
[ 192.959869] device=eth0, hwaddr=02:02:00:a0:1f:e0, ipaddr=192.168.111.32, mask=255.255.255.0, gw=192.168.111.1
[ 192.970428] host=beagleboard-xm, domain=muru.com, nis-domain=(none)
[ 192.977233] bootserver=192.168.111.64, rootserver=192.168.111.64, rootpath=/srv/nfs3/alpine-armhf,v3,rsize=32768,wsize=32768
[ 192.977264] nameserver0=208.69.40.3, nameserver1=208.69.40.4
[ 192.997680] VAUX3: disabling
[ 193.002410] VDAC: disabling
[ 193.007171] VUSB3V1: disabling
[ 193.010711] VPLL2: disabling
[ 193.030761] VFS: Mounted root (nfs filesystem) readonly on device 0:14.
[ 193.038635] devtmpfs: mounted
[ 193.046600] Freeing unused kernel memory: 2048K

The long pause happens already before disabling unused regulators. So it seems more like some regression with timers or interrupts.

Ah, that would explain the fails in kernelCI. I think we have a default wait of 200 sec for a full kernel boot.

...

...
The step that seems to be happening right after MMC init is unused regulators being disabled. Is it possible that multi_v7 is missing some regulator setup?

Also, the last line in the failure case:

leds_pwm pwmleds: unable to request PWM for beagleboard::pmu_stat: -517

doesn't happen on the successful omap2plus_defconfig boots either.

Looks like we're missing these in multi_v7_defconfig:

CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y

Does the absence of these explain the super-long boot delay? Based on your boot, it doesn't seem like it.

Any other ideas for where the delay is coming from?

Kevin

Tony Lindgren

21 Aug

3:58 p.m.

* Kevin Hilman khilman@baylibre.com [180820 21:32]:

...

Tony Lindgren tony@atomide.com writes:

...
I was finally able to reproduce this here this morning with v4.18 after about 20 boot attempts. Looks like the system boots up, it just has a long pause. See the timestamps below where there is about a 185-second pause:

...

...
The long pause happens already before disabling unused regulators. So it seems more like some regression with timers or interrupts.

Ah, that would explain the fails in kernelCI. I think we have a default wait of 200 sec for a full kernel boot.

...

...
CONFIG_PWM_TWL=y
CONFIG_PWM_TWL_LED=y

Does the absence of these explain the super-long boot delay? Based on your boot, it doesn't seem like it.

No that's not it.

...

Any other ideas for where the delay is coming from?

I think v4.16 is OK or at least I did not have any luck reproducing it so far on v4.16. I can reproduce it with v4.17, tried bisecting yesterday but of course it went nowhere as there's no easy way to know if some commit is good without booting it tens of times.
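To put a number on "tens of times": if a bad kernel shows the pause on roughly one boot in 20 (an assumption, loosely based on the reproduction rate mentioned earlier in the thread) and boots are treated as independent, the count of clean boots needed before calling a commit good can be estimated as below. This is a back-of-the-envelope sketch and `boots_needed` is a made-up name.

```python
import math

def boots_needed(failure_rate, confidence=0.95):
    """Clean boots required so that a commit failing with probability
    failure_rate per boot would have shown at least one failure with
    the given confidence, assuming independent boots."""
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for the smallest n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

# Assumed rate: about one bad boot in 20.
print(boots_needed(1 / 20))  # 59
```

At that rate each bisection step costs around 59 boots for a 95% chance of catching a bad commit, which is why marking a commit good after only a few boots tends to send the bisection in the wrong direction.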

Regards,

Tony

Tony Lindgren

11:22 p.m.

* Tony Lindgren tony@atomide.com [180821 08:58]:

...

Kevin Hilman khilman@baylibre.com [180820 21:32]:

...
Any other ideas for where the delay is coming from?

I think v4.16 is OK or at least I did not have any luck reproducing it so far on v4.16. I can reproduce it with v4.17, tried bisecting yesterday but of course it went nowhere as there's no easy way to know if some commit is good without booting it tens of times.

I think I managed to bisect it down to commit 554c8aa8ecad ("sched: idle: Select idle state before stopping the tick"). And looks like there's a fix being discussed for it at:

https://lore.kernel.org/lkml/2161372.IsD4PDzmmY@aspire.rjw.lan/

At least so far I have not been able to reproduce the issue with the patch being discussed, maybe give it a try on your test system too.

I'm guessing that probably the only reason why it was not caught earlier with omap2plus_defconfig is that it has more devices enabled and produces more interrupts compared to multi_v7_defconfig.

Regards,

Tony

kernel-build-reports@lists.linaro.org

participants (4)

kernelci.org bot
Kevin Hilman
Mark Brown
Tony Lindgren