Hi,
There are several ideas for DT improvements. Please check whether they are reasonable; any comments are welcome. Mingliang (CCed) can share more details if needed.
1) Improve search algorithm performance: would a binary search tree or another algorithm perform better than the current one?
2) Reduce DTB space: when one DTB is used to support multiple boards, the image is quite big (e.g., ~39MB for 100 configurations), and compression can reduce the space a lot (e.g., from 39MB to 7MB). Should we have such compression support for DTB? A more efficient compression algorithm would also help.
3) Define specific rules for properties: the property values (FDT_PROP_DATA) occupy only ~50% of the total DTB space. Properties differ from node to node, and private property names are too long, for example “freq-autodown-baseaddress-num” in dt_strings. It seems more reasonable for property values to occupy more than 70% of the total DTB space. This could probably be achieved by defining rules that restrict property name length, etc.
Thanks, Jammy
Hi Jammy & Mingliang,
On 2/5/21 2:59 AM, Jammy Zhou wrote:
> Hi,
> There are several ideas for DT improvements. Please check whether they are reasonable; any comments are welcome. Mingliang (CCed) can share more details if needed.
> 1) Improve search algorithm performance: would a binary search tree or another algorithm perform better than the current one?
We will need data to show the problem. I suppose this would best be done when unflattening the data at runtime? What is the expected gain in boot time? Are there any measurements of how much time is spent in the search routines today?
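As a starting point for collecting that kind of data, here is a minimal sketch in Python. It does not model libfdt or the kernel's lookup code; it only contrasts a linear scan with a binary search over a sorted index built once (for example at unflatten time). The node names and function names are made up for illustration.

```python
import bisect
import timeit

def linear_find(names, target):
    """Scan names in order, as a walk over a flattened tree would."""
    for i, name in enumerate(names):
        if name == target:
            return i
    return -1

def sorted_find(sorted_names, target):
    """Binary search over a pre-sorted index that is built once up front."""
    i = bisect.bisect_left(sorted_names, target)
    if i < len(sorted_names) and sorted_names[i] == target:
        return i
    return -1

# Hypothetical node names; a real measurement would take names from an actual DTB.
names = [f"node@{n:x}" for n in range(5000)]
sorted_names = sorted(names)
target = names[-1]  # worst case for the linear scan

t_linear = timeit.timeit(lambda: linear_find(names, target), number=200)
t_sorted = timeit.timeit(lambda: sorted_find(sorted_names, target), number=200)
print(f"linear: {t_linear:.4f}s  sorted index: {t_sorted:.4f}s")
```

Numbers from a script like this say nothing about boot time by themselves; the point is only that the comparison is cheap to set up once real node and property names are plugged in.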
> 2) Reduce DTB space: when one DTB is used to support multiple boards, the image is quite big (e.g., ~39MB for 100 configurations), and compression can reduce the space a lot (e.g., from 39MB to 7MB). Should we have such compression support for DTB? A more efficient compression algorithm would also help.
This could be done as an enhancement to the DTB loader instead of the DTB format itself.
Compressing each DTB (boardx.dtb.xz) will get you gains but compressing a set of boards (vmlinux-5.4.0-65-generic-dbt-set-20.tar.xz) might give you more.
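To make the per-file versus set comparison concrete, here is a rough Python sketch. The blobs are synthetic stand-ins (a shared 64 KiB base plus 256 board-specific bytes each), not real DTBs, so the ratio it prints is only illustrative; the file names are hypothetical.

```python
import io
import lzma
import random
import tarfile

rng = random.Random(0)  # fixed seed so the run is repeatable

def rand_bytes(n):
    """Return n pseudo-random (and therefore incompressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(n))

# Assumption: boards of one family share most of their DTB content.
base = rand_bytes(64 * 1024)
boards = {f"board{i}.dtb": base + rand_bytes(256) for i in range(20)}

# Option 1: compress each blob on its own (boardx.dtb.xz).
per_file_total = sum(len(lzma.compress(blob)) for blob in boards.values())

# Option 2: pack all blobs into one tar archive, then compress the archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, blob in boards.items():
        info = tarfile.TarInfo(name)
        info.size = len(blob)
        tar.addfile(info, io.BytesIO(blob))
set_total = len(lzma.compress(buf.getvalue()))

print(f"per-file: {per_file_total} bytes, as a set: {set_total} bytes")
```

Compressing the set lets the compressor reuse content that repeats across boards, which per-file compression cannot do. The cost is that the loader has to decompress at least part of the set to reach one board's DTB.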
To be significant, the number of boards would need to be large and the size of the rootfs would need to be modest. A 200 to 300 MB minimal image would make an interesting comparison point. (A rootfs of tens of MB would probably only target a few boards.)
What is the goal of the use case?
1) Fit in limited storage (e.g., 256MB)
2) Conserve more of a modest storage device for container data (e.g., 1GB eMMC)
3) Improve boot time
For 3), the load time will be reduced but decompression time will be added. These need to be balanced based on the CPU.
One pet peeve I have with most of our boot loaders today is that they do loading and decompression serially. During loading the IO is 100% loaded and the CPU is very lightly loaded. During decompression the CPU is 100% loaded and the IO is at 0%. It makes sense to pipeline / overlap these things, which means that decompression needs to go into the loader. To optimize boot time the decompression algorithm needs to be chosen carefully. On smaller CPUs the time taken to decompress with newer algorithms can greatly outweigh the time taken to load the uncompressed data. Ideally the time to decompress one block == the time to load one block. The dynamics shift with CPU and IO performance.
Today, a lot of people focused on boot speed just use uncompressed data, but I think we could do better if we pipeline.
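The overlap described above can be sketched as a two-stage pipeline. This is only a structural illustration in Python: a thread and a bounded queue stand in for what a boot loader would do with DMA or interrupt-driven IO, zlib is an arbitrary choice of algorithm, and CHUNK, reader and load_pipelined are made-up names.

```python
import io
import queue
import threading
import zlib

CHUNK = 4096  # assumed block size; a loader would match the storage block size

def reader(src, q):
    """Producer: stands in for the IO side, reading one block at a time."""
    while True:
        block = src.read(CHUNK)
        q.put(block)
        if not block:  # an empty block marks end of input
            return

def load_pipelined(src):
    """Consumer: decompress block N while the reader fetches block N+1."""
    q = queue.Queue(maxsize=2)  # small bound keeps IO only slightly ahead
    t = threading.Thread(target=reader, args=(src, q))
    t.start()
    inflater = zlib.decompressobj()
    out = bytearray()
    while True:
        block = q.get()
        if not block:
            break
        out += inflater.decompress(block)
    out += inflater.flush()
    t.join()
    return bytes(out)

# io.BytesIO plays the role of the storage device in this sketch.
original = b'compatible = "vendor,board";\n' * 10000
image = io.BytesIO(zlib.compress(original))
restored = load_pipelined(image)
print(len(restored) == len(original))
```

With a streaming decompressor the consumer never needs the whole compressed image in memory, and the bounded queue is where the "decompress one block == load one block" balance shows up: whichever side is slower sets the pace.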
> 3) Define specific rules for properties: the property values (FDT_PROP_DATA) occupy only ~50% of the total DTB space. Properties differ from node to node, and private property names are too long, for example “freq-autodown-baseaddress-num” in dt_strings. It seems more reasonable for property values to occupy more than 70% of the total DTB space. This could probably be achieved by defining rules that restrict property name length, etc.
This is harder. In 2019 I proposed an ATOM-based DTB enhancement [2]. I was told Frank Rowand had other proposals for format changes.
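For anyone who wants to reproduce the ~50% figure on their own DTBs, here is a small Python sketch that walks the structure block and sums the property value bytes. It assumes a version 17 header (ten big-endian 32-bit fields) and the standard token values; prop_value_share is a made-up name, and the tiny blob at the end is hand-built for illustration, not taken from a real board.

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_NOP, FDT_END = 1, 2, 3, 4, 9

def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3

def prop_value_share(blob):
    """Return (property value bytes, strings block bytes, total size)."""
    (magic, totalsize, off_struct, off_strings, _rsv, _ver, _last,
     _cpu, size_strings, size_struct) = struct.unpack_from(">10I", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a flattened device tree")
    pos, end, value_bytes = off_struct, off_struct + size_struct, 0
    while pos < end:
        (token,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if token == FDT_BEGIN_NODE:
            # Node name is NUL-terminated and padded to a 4-byte boundary.
            pos = align4(blob.index(b"\0", pos) + 1)
        elif token == FDT_PROP:
            length, _nameoff = struct.unpack_from(">II", blob, pos)
            value_bytes += length
            pos = align4(pos + 8 + length)
        elif token == FDT_END:
            break
        elif token not in (FDT_END_NODE, FDT_NOP):
            raise ValueError(f"unexpected token {token}")
    return value_bytes, size_strings, totalsize

# Hand-built one-node tree with a single 4-byte property, for illustration only.
strings = b"freq-autodown-baseaddress-num\0"
value = struct.pack(">I", 4)
structure = (struct.pack(">I", FDT_BEGIN_NODE) + b"\0\0\0\0"  # root, empty name
             + struct.pack(">III", FDT_PROP, len(value), 0) + value
             + struct.pack(">II", FDT_END_NODE, FDT_END))
rsvmap = bytes(16)  # empty memory reservation map (terminator entry only)
off_struct = 40 + len(rsvmap)
off_strings = off_struct + len(structure)
header = struct.pack(">10I", FDT_MAGIC, off_strings + len(strings), off_struct,
                     off_strings, 40, 17, 16, 0, len(strings), len(structure))
blob = header + rsvmap + structure + strings
print(prop_value_share(blob))
```

Running the function over a real multi-board DTB would show how much of the total is property values versus the strings block, which is the data needed before deciding whether naming rules are worth it.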
Thanks, Bill
[2] https://docs.google.com/document/d/19XbxN-zX-GYwOXdF78lGnp0j7UNx1MT3wzyCjait...
> Thanks, Jammy
> _______________________________________________
> boot-architecture mailing list
> boot-architecture@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/boot-architecture
+ zhangpeng, owner of DT in Hisilicon
-----Original Message-----
From: Bill Mills [mailto:bill.mills@linaro.org]
Sent: February 7, 2021 1:39
To: Jammy Zhou <jammy.zhou@linaro.org>; boot-architecture@lists.linaro.org; Frank Rowand <frowand.list@gmail.com>
Cc: Xiamingliang (XML, Hisilicon) <xiamingliang@huawei.com>
Subject: Re: Ideas for DT improvements
Hi Bill,
Thanks very much for your comments. Since we're close to the Chinese New Year holiday, I expect Zhangpeng's response will be delayed.
Regards, Jammy
On Sun, 7 Feb 2021 at 09:35, Xiamingliang (XML, Hisilicon) <xiamingliang@huawei.com> wrote:
> + zhangpeng, owner of DT in Hisilicon
-- Bill Mills Principal Technical Consultant, Linaro +1-240-643-0836 TZ: US Eastern Work Schedule: Tues/Wed/Thur