@m-ildefons
Member

Add node network information to supportbundle. The supportbundle already collects a lot of configs from the node. This adds the network config as well, which aids when debugging network issues.

related-to: harvester/harvester#7714

Signed-off-by: Moritz Röhrich <[email protected]>
@m-ildefons m-ildefons added the enhancement New feature or request label Feb 26, 2025
@m-ildefons m-ildefons requested a review from bk201 February 26, 2025 09:37
@m-ildefons m-ildefons self-assigned this Feb 26, 2025
@m-ildefons
Member Author

@ravarga this should do the trick and help you debug network issues in the future.

  # Generate supportconfig from node
  chroot $HOST_PATH /sbin/supportconfig -c -m -B supportconfig_$SUPPORT_BUNDLE_NODE_NAME \
-   -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK,pharvester_plugin_rke2,pharvester_plugin_console
+   -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NET,NTP,SMART,DISK,pharvester_plugin_rke2,pharvester_plugin_console
Contributor

Will this significantly increase the time it takes to collect supportconfig on a node? We previously had an issue, harvester/harvester#5323, where progress was quite slow because collection in namespaces was failing.

Copy link

It is a valid point and it can indeed happen. Supportconfig is a useful source of information, and it should even run in full, but it should be an optional part of the supportbundle. It may be required when debugging issues, but it often adds time in the middle of an incident where fast action is required.

If we can make supportconfig an optional part of the supportbundle, I believe that would be the best option.

Member Author

This change does not include Kubernetes objects. It just collects the network configuration from the nodes (iptables entries, network interface status, etc.).

On my workstation, a basic supportconfig takes between 30 and 40 seconds. Most of that time is spent looking at the RPM database.

Excluding the two custom modules (i.e. with -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK), the supportconfig takes about 1 minute 10 seconds.
It varies from run to run by about 7 seconds.

With the networking info it's about the same, again varying by several seconds between runs.

In conclusion, I think collecting the networking info via the supportconfig adds an insignificant amount of time.

Copy link

Is it a single node? Imagine running it on a 9-node cluster when storage is misbehaving.

Member Author

So? Then additionally collecting networking info will add an insignificant delay to each of the nine nodes' respective supportconfig runs.
Even if they are collected sequentially and the delays add up, I doubt that the impact will be noticeable.

Contributor

I just did a little test on a single harvester v1.4.1 node I have running here in a VM on my laptop:

As a baseline, grabbing only minimal info:

harvester-node-0:~ # time supportconfig -m 
[...]

==[ DONE ]===================================================================
  Log file tar ball: /var/log/scc_harvester-node-0_250228_0740.txz
  Log file size:     312K
  Log file md5sum:   8ff1b54ccc45a0b3b9db28c44e78afb7-f
=============================================================================

real	0m10.006s
user	0m1.863s
sys	0m0.350s

Minimal + NET somehow added slightly over a minute to the process:

harvester-node-0:~ # time supportconfig -m -i NET
[...]
==[ DONE ]===================================================================
  Log file tar ball: /var/log/scc_harvester-node-0_250228_0742.txz
  Log file size:     570K
  Log file md5sum:   88a649665bb6b31628c1877c1e6a847a-f
=============================================================================

real	1m11.975s
user	0m10.296s
sys	0m5.647s

Contributor

Ah...

# ls -1 /var/log/scc_harvester-node-0_250228_0740
basic-environment.txt
basic-health-check.txt
hardware.txt
messages.txt
messages_config.txt
messages_localwarn.txt
pam.txt
rpm.txt
summary.xml
supportconfig.txt
y2log.txt

# ls -1 /var/log/scc_harvester-node-0_250228_0742
basic-environment.txt
basic-health-check.txt
hardware.txt
messages.txt
messages_config.txt
messages_localwarn.txt
network-cni-08df30b6-4f8a-230c-9808-06a784961dc5.txt
network-cni-0afefa77-b59f-1e8a-9013-1e9c39a7d390.txt
network-cni-0dc411a5-78b9-f874-55f1-1f3ba0c791f3.txt
network-cni-0e1945a5-27db-2013-1a31-a84e8b47ba69.txt
network-cni-104ab7c0-002e-0617-8bd9-1e4faff44587.txt
network-cni-126366c4-835c-13c5-abf5-2f7f0167b912.txt
network-cni-1e7bbb66-28e2-6cfe-d83f-e259f5f22b32.txt
network-cni-2631ced7-76ef-23c2-edec-090a79abe9cc.txt
network-cni-332c4f98-523d-dced-d788-5ee24fef437a.txt
network-cni-334c493b-a0af-90da-2187-9b81093ba4be.txt
network-cni-337244ec-98e1-6f9d-61d3-7a50d6836cdb.txt
network-cni-349066b4-da91-a86b-db90-e18b9bc37c1a.txt
network-cni-36a65244-976b-cd8b-56cb-4c23104160ee.txt
network-cni-4951873c-5f1a-bf4d-4aa5-847feeea8bcc.txt
network-cni-4f01ca2e-b90b-251e-0f54-449d0755c2b3.txt
network-cni-4f8580ab-c56f-170d-85fd-ae8f73f99da2.txt
network-cni-500b77f0-b615-71c8-603f-4531a45e142f.txt
network-cni-50471341-f851-3bbc-7b82-c0606f65011b.txt
network-cni-53af6031-8f56-8025-be9d-238b5c607b37.txt
network-cni-588fb6e3-5f2d-2755-3ddf-a663c4992ffc.txt
network-cni-6411e29b-2676-8e86-2594-6e4f57e1f630.txt
network-cni-643fe3da-bbe9-f475-18a2-3d6e83ad08e7.txt
network-cni-682afd8e-530e-b9db-be47-9e374dc83f90.txt
network-cni-6bc16ca3-e7b0-1d04-df97-b6baf43ca7a3.txt
network-cni-6f27c0f1-403e-e3b3-5ed6-247ad5cd75d4.txt
network-cni-7cf7f389-fc5a-d2b0-9277-bda90c417c5e.txt
network-cni-811cc44d-9bba-a488-5dda-339e352134f2.txt
network-cni-8bd8b90d-0aa2-3bf4-f4c6-5896e6a511f7.txt
network-cni-8c041c8a-9958-e0bb-c6e3-e58009c42e74.txt
network-cni-8cc4df4d-af6e-1671-5a9b-a340c45ec4a9.txt
network-cni-8d4344d9-9861-99cc-aac0-d73359604a39.txt
network-cni-9296a672-f06f-5f37-5788-37c2e8902439.txt
network-cni-95c321a9-06c2-b6b5-ce55-2b7b02956c84.txt
network-cni-9a430727-72d5-fe1c-3fbd-d3da5d0b5a3a.txt
network-cni-a6da6a80-3dd0-3c38-b7ba-0739b8c1582c.txt
network-cni-a8b82c94-0101-40e5-2064-93de0c224a4f.txt
network-cni-af1da483-7cc0-e04d-1bcf-068353d2efe3.txt
network-cni-b3f5b13a-5554-80c4-983e-2899cda863e8.txt
network-cni-b5b409c3-cf23-1675-85f2-7e16bcbffa73.txt
network-cni-bea7547d-497c-fe86-e142-28577d48e834.txt
network-cni-c128a3fc-c92b-3750-680b-88518fddebf9.txt
network-cni-c3871fdc-622c-7eab-311f-0b0c03ee17f8.txt
network-cni-c79ccc77-9079-9566-d78b-ed244c2935d6.txt
network-cni-c835f421-817f-395c-bd7c-f6823215e246.txt
network-cni-d010d64b-8b65-61cf-c2e9-65d110a89e88.txt
network-cni-df9c551b-fb32-9fda-3aa9-dcd13276c33a.txt
network-cni-dffa85f2-2614-0036-85ea-3decc4ae630e.txt
network-cni-ef7b65ed-39de-23d6-00f0-79acc731255c.txt
network-cni-f4f5ae6d-9d3f-bbca-0424-2cb00a3086b9.txt
network-cni-ff0d03aa-a625-9d47-1246-6952e127e8cc.txt
network.txt
pam.txt
rpm.txt
summary.xml
supportconfig.txt
y2log.txt

Member Author

That's about 1s per net-namespace.
hwinfo is not fast for me, but I suspect what is far worse is that supportconfig tries to ping each interface and default route in each net-namespace to find out whether there is connectivity.
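
As a rough check of that per-namespace estimate, the sketch below divides the extra wall-clock time between the two supportconfig runs quoted earlier in this thread (10.006 s vs. 71.975 s) by the 50 network-cni-*.txt files in the second listing. The function name `per_namespace` is made up for illustration, and attributing all of the extra time to the namespaces is an assumption, since network.txt itself also takes some time to collect.

```shell
# Rough estimate only: assumes all extra time is spent on the per-namespace files.
per_namespace() {
  # $1 = baseline seconds, $2 = seconds with NET, $3 = number of net-namespaces
  LC_ALL=C awk -v base="$1" -v net="$2" -v count="$3" \
    'BEGIN { printf "%.2f\n", (net - base) / count }'
}

# Timings from the thread: "supportconfig -m" vs. "supportconfig -m -i NET"
per_namespace 10.006 71.975 50
```

With those inputs the result comes out at roughly 1.24 seconds per namespace, which is in line with the "about 1s" figure.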

Actual measurements on my test system aren't as bad as yours, Tim, but there is still a noticeably larger slow-down than on my dev workstation, due to the number of net-namespaces.

harvester-node-0:/home/rancher # time supportconfig -m -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK,NET
[...]
==[ DONE ]===================================================================
  Log file tar ball: /var/log/scc_harvester-node-0_250228_1015.txz
  Log file size:     782K
  Log file md5sum:   beee0c20d98d78a00635665f4c2330af-f
=============================================================================



real    1m6.404s
user    0m38.196s
sys     0m11.587s

And without network info:

harvester-node-0:/home/rancher # time supportconfig -m -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK
[...]
==[ DONE ]===================================================================
  Log file tar ball: /var/log/scc_harvester-node-0_250228_1020.txz
  Log file size:     509K
  Log file md5sum:   d201e8e202c7efecc92804f9fe4cbe89-f
=============================================================================



real    0m47.713s
user    0m27.219s
sys     0m4.773s
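
For reference, the difference between these two runs can be worked out as below. This is only a sketch on the two `real` values shown above (1m6.404s and 0m47.713s); the helper name `net_overhead` is hypothetical, and a single pair of runs says nothing about run-to-run variance.

```shell
# Absolute and relative overhead of adding NET, from two wall-clock times in seconds.
net_overhead() {
  # $1 = seconds without NET, $2 = seconds with NET
  LC_ALL=C awk -v without="$1" -v with="$2" \
    'BEGIN { d = with - without; printf "%.1f s (%.1f%%)\n", d, 100 * d / without }'
}

# 0m47.713s without NET, 1m6.404s (= 66.404 s) with NET
net_overhead 47.713 66.404
```

On this test system that works out to roughly 19 seconds, or a bit under 40 percent more wall-clock time.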

But I think omitting this data from the supportbundle is the wrong solution. We should rather think about speeding up the supportconfig script or the supportbundle itself.

Copy link

One way of speeding up this part is to execute the supportconfig on every node at the same time. If I'm not mistaken, it currently runs sequentially, one node after another.

Collaborator

Node-level info is already collected by running the collector as a DaemonSet.
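
To put the sequential-versus-parallel question into numbers, here is a minimal sketch. It assumes, purely for illustration, that every node of the 9-node cluster mentioned earlier sees the same NET overhead as the single test node in this thread (66.404 s minus 47.713 s, i.e. 18.691 s). The function name `added_time` is hypothetical, and real nodes will differ depending on how many net-namespaces they host.

```shell
# Assumption: identical per-node overhead; everything else about the bundle is ignored.
added_time() {
  # $1 = per-node NET overhead in seconds, $2 = number of nodes
  LC_ALL=C awk -v overhead="$1" -v nodes="$2" 'BEGIN {
    printf "sequential: %.1f s\n", overhead * nodes   # delays add up
    printf "parallel: %.1f s\n", overhead             # one pod per node, run concurrently
  }'
}

added_time 18.691 9
```

Under that assumption, sequential collection would add close to three minutes in total, while concurrent collection on all nodes adds only the single-node overhead.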

@bk201 bk201 requested a review from tserong February 26, 2025 09:42