support bundle: add node network information #135
base: master
Add node network information to the support bundle. The support bundle already collects many configuration files from the node; this adds the network configuration as well, which helps when debugging network issues.
related-to: harvester/harvester#7714
Signed-off-by: Moritz Röhrich <[email protected]>
@ravarga this should do the trick and help you debug network issues in the future.
  # Generate supportconfig from node
  chroot $HOST_PATH /sbin/supportconfig -c -m -B supportconfig_$SUPPORT_BUNDLE_NODE_NAME \
- -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK,pharvester_plugin_rke2,pharvester_plugin_console
+ -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NET,NTP,SMART,DISK,pharvester_plugin_rke2,pharvester_plugin_console
Will this significantly increase the time it takes to collect supportconfig on a node? We previously had an issue, harvester/harvester#5323, where progress was quite slow due to failing collection in namespaces.
That is a valid point, and it can indeed happen. Supportconfig is a useful source of information and should even run to its full extent, but it should be an optional part of the support bundle. When debugging an issue it may be required, but often it adds time in the middle of an incident where fast action is needed.
With that in mind, I believe making supportconfig an optional part of the SB would be the best option.
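One way to express "optional" is to make the NET module opt-in when building the `-i` list. This is a minimal sketch only: `SUPPORT_BUNDLE_COLLECT_NET` and `supportconfig_modules` are hypothetical names, not existing support bundle settings, and it assumes the order of modules passed to `-i` does not matter (NET is appended after DISK rather than placed between MOD and NTP as in the diff).

```shell
#!/bin/sh
# Sketch: make the NET supportconfig module opt-in.
# ASSUMPTION: SUPPORT_BUNDLE_COLLECT_NET is a hypothetical environment variable,
# not an existing support bundle setting.

BASE_MODULES="BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK"
PLUGINS="pharvester_plugin_rke2,pharvester_plugin_console"

# Print the comma-separated list for `supportconfig -i`; NET is added only when enabled.
supportconfig_modules() {
    modules="$BASE_MODULES"
    if [ "${SUPPORT_BUNDLE_COLLECT_NET:-false}" = "true" ]; then
        modules="$modules,NET"
    fi
    printf '%s,%s\n' "$modules" "$PLUGINS"
}

# Example use, mirroring the script line changed in this PR:
# chroot $HOST_PATH /sbin/supportconfig -c -m -B supportconfig_$SUPPORT_BUNDLE_NODE_NAME \
#     -i "$(supportconfig_modules)"
```

With the variable unset, the output is the pre-PR module list; setting it to `true` adds NET.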
This change does not include Kubernetes objects. It only collects the network configuration of the nodes (iptables entries, network interface status, etc.).
On my workstation, a basic supportconfig run takes between 30 and 40 seconds, most of which is spent querying the RPM database.
Excluding the two custom modules (i.e. with -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK), supportconfig takes about 1 minute 10 seconds, varying by about 7 seconds from run to run.
With the networking info it takes about the same time, again varying by several seconds between runs.
In conclusion, I think collecting the networking info via supportconfig adds an insignificant amount of time.
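Because individual runs vary by several seconds, comparing a single run with and without NET can be misleading. A minimal sketch for averaging a few runs; `mean_runtime` is a hypothetical helper, and it uses `date +%s`, so each measurement has only one-second resolution. Note that every supportconfig run writes its own tarball under /var/log.

```shell
#!/bin/sh
# Sketch: run a command N times and print the mean wall-clock time in whole seconds.
# Usage: mean_runtime <runs> <command> [args...]

mean_runtime() {
    runs="$1"
    shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1          # discard the command's own output
        end=$(date +%s)
        total=$((total + end - start))
        i=$((i + 1))
    done
    echo $((total / runs))            # integer mean, truncated
}

# Example (on a node):
# mean_runtime 3 supportconfig -m -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK
# mean_runtime 3 supportconfig -m -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK,NET
```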
Is it a single node? Imagine running it on a 9-node cluster when storage is misbehaving.
So? Then additionally collecting networking info will add an insignificant delay to each of the nine nodes' respective supportconfig runs.
Even if they are collected sequentially and the delays add up, I doubt it will have a noticeable impact.
I just did a little test on a single Harvester v1.4.1 node I have running in a VM on my laptop:
As a baseline, grabbing only minimal info:
harvester-node-0:~ # time supportconfig -m
[...]
==[ DONE ]===================================================================
Log file tar ball: /var/log/scc_harvester-node-0_250228_0740.txz
Log file size: 312K
Log file md5sum: 8ff1b54ccc45a0b3b9db28c44e78afb7-f
=============================================================================
real 0m10.006s
user 0m1.863s
sys 0m0.350s
Minimal + NET somehow added slightly over a minute to the process:
harvester-node-0:~ # time supportconfig -m -i NET
[...]
==[ DONE ]===================================================================
Log file tar ball: /var/log/scc_harvester-node-0_250228_0742.txz
Log file size: 570K
Log file md5sum: 88a649665bb6b31628c1877c1e6a847a-f
=============================================================================
real 1m11.975s
user 0m10.296s
sys 0m5.647s
Ah...
# ls -1 /var/log/scc_harvester-node-0_250228_0740
basic-environment.txt
basic-health-check.txt
hardware.txt
messages.txt
messages_config.txt
messages_localwarn.txt
pam.txt
rpm.txt
summary.xml
supportconfig.txt
y2log.txt
# ls -1 /var/log/scc_harvester-node-0_250228_0742
basic-environment.txt
basic-health-check.txt
hardware.txt
messages.txt
messages_config.txt
messages_localwarn.txt
network-cni-08df30b6-4f8a-230c-9808-06a784961dc5.txt
network-cni-0afefa77-b59f-1e8a-9013-1e9c39a7d390.txt
network-cni-0dc411a5-78b9-f874-55f1-1f3ba0c791f3.txt
network-cni-0e1945a5-27db-2013-1a31-a84e8b47ba69.txt
network-cni-104ab7c0-002e-0617-8bd9-1e4faff44587.txt
network-cni-126366c4-835c-13c5-abf5-2f7f0167b912.txt
network-cni-1e7bbb66-28e2-6cfe-d83f-e259f5f22b32.txt
network-cni-2631ced7-76ef-23c2-edec-090a79abe9cc.txt
network-cni-332c4f98-523d-dced-d788-5ee24fef437a.txt
network-cni-334c493b-a0af-90da-2187-9b81093ba4be.txt
network-cni-337244ec-98e1-6f9d-61d3-7a50d6836cdb.txt
network-cni-349066b4-da91-a86b-db90-e18b9bc37c1a.txt
network-cni-36a65244-976b-cd8b-56cb-4c23104160ee.txt
network-cni-4951873c-5f1a-bf4d-4aa5-847feeea8bcc.txt
network-cni-4f01ca2e-b90b-251e-0f54-449d0755c2b3.txt
network-cni-4f8580ab-c56f-170d-85fd-ae8f73f99da2.txt
network-cni-500b77f0-b615-71c8-603f-4531a45e142f.txt
network-cni-50471341-f851-3bbc-7b82-c0606f65011b.txt
network-cni-53af6031-8f56-8025-be9d-238b5c607b37.txt
network-cni-588fb6e3-5f2d-2755-3ddf-a663c4992ffc.txt
network-cni-6411e29b-2676-8e86-2594-6e4f57e1f630.txt
network-cni-643fe3da-bbe9-f475-18a2-3d6e83ad08e7.txt
network-cni-682afd8e-530e-b9db-be47-9e374dc83f90.txt
network-cni-6bc16ca3-e7b0-1d04-df97-b6baf43ca7a3.txt
network-cni-6f27c0f1-403e-e3b3-5ed6-247ad5cd75d4.txt
network-cni-7cf7f389-fc5a-d2b0-9277-bda90c417c5e.txt
network-cni-811cc44d-9bba-a488-5dda-339e352134f2.txt
network-cni-8bd8b90d-0aa2-3bf4-f4c6-5896e6a511f7.txt
network-cni-8c041c8a-9958-e0bb-c6e3-e58009c42e74.txt
network-cni-8cc4df4d-af6e-1671-5a9b-a340c45ec4a9.txt
network-cni-8d4344d9-9861-99cc-aac0-d73359604a39.txt
network-cni-9296a672-f06f-5f37-5788-37c2e8902439.txt
network-cni-95c321a9-06c2-b6b5-ce55-2b7b02956c84.txt
network-cni-9a430727-72d5-fe1c-3fbd-d3da5d0b5a3a.txt
network-cni-a6da6a80-3dd0-3c38-b7ba-0739b8c1582c.txt
network-cni-a8b82c94-0101-40e5-2064-93de0c224a4f.txt
network-cni-af1da483-7cc0-e04d-1bcf-068353d2efe3.txt
network-cni-b3f5b13a-5554-80c4-983e-2899cda863e8.txt
network-cni-b5b409c3-cf23-1675-85f2-7e16bcbffa73.txt
network-cni-bea7547d-497c-fe86-e142-28577d48e834.txt
network-cni-c128a3fc-c92b-3750-680b-88518fddebf9.txt
network-cni-c3871fdc-622c-7eab-311f-0b0c03ee17f8.txt
network-cni-c79ccc77-9079-9566-d78b-ed244c2935d6.txt
network-cni-c835f421-817f-395c-bd7c-f6823215e246.txt
network-cni-d010d64b-8b65-61cf-c2e9-65d110a89e88.txt
network-cni-df9c551b-fb32-9fda-3aa9-dcd13276c33a.txt
network-cni-dffa85f2-2614-0036-85ea-3decc4ae630e.txt
network-cni-ef7b65ed-39de-23d6-00f0-79acc731255c.txt
network-cni-f4f5ae6d-9d3f-bbca-0424-2cb00a3086b9.txt
network-cni-ff0d03aa-a625-9d47-1246-6952e127e8cc.txt
network.txt
pam.txt
rpm.txt
summary.xml
supportconfig.txt
y2log.txt
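The second listing has one extra network-cni-<uuid>.txt file per pod network namespace. A minimal sketch for counting those namespaces on a node before running NET; `count_cni_netns` is a hypothetical helper, and it assumes the container runtime creates pod namespaces as /var/run/netns/cni-<uuid> (inferred from the file names, not confirmed in this thread). `ip netns list` on the node lists the same directory.

```shell
#!/bin/sh
# Sketch: count CNI network namespaces, i.e. roughly how many
# network-cni-*.txt files the NET module would produce.
# ASSUMPTION: pod namespaces are named cni-<uuid> under /var/run/netns.

count_cni_netns() {
    dir="${1:-/var/run/netns}"
    count=0
    for ns in "$dir"/cni-*; do
        [ -e "$ns" ] || continue   # the glob stays literal when nothing matches
        count=$((count + 1))
    done
    echo "$count"
}
```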
That's about 1s per net-namespace.
hwinfo is not fast for me, but I suspect that by far the bigger cost is that supportconfig tries to ping each interface and the default route in each net-namespace to find out whether there is connectivity.
Actual measurements on my test system aren't as bad as yours, Tim, but there is still a noticeably larger slow-down than on my dev workstation, due to the number of net-namespaces.
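The "about 1s per net-namespace" estimate can be reproduced from Tim's two `real` times. A minimal sketch, assuming the `XmY.YYYs` format printed by the shell's `time` keyword and treating the 50 network-cni-*.txt files in the listing as 50 namespaces; `to_seconds` and `per_ns_overhead` are hypothetical helpers. The difference also covers network.txt, so the result is an upper bound per namespace rather than an exact cost.

```shell
#!/bin/sh
# Sketch: estimate the per-namespace cost of the NET module from two `time` results.

# Convert e.g. "1m11.975s" to seconds ("71.975").
to_seconds() {
    printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Usage: per_ns_overhead <real without NET> <real with NET> <number of namespaces>
per_ns_overhead() {
    without=$(to_seconds "$1")
    with=$(to_seconds "$2")
    awk -v a="$without" -v b="$with" -v n="$3" 'BEGIN { printf "%.2f\n", (b - a) / n }'
}

# Tim's runs: 0m10.006s (minimal) vs 1m11.975s (minimal + NET), 50 namespaces
# per_ns_overhead 0m10.006s 1m11.975s 50   # prints 1.24
```

For the runs on the test system (0m47.713s without NET, 1m6.404s with NET) the total difference is about 18.7 s; the namespace count on that system is not given in the thread, so no per-namespace figure follows from it.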
harvester-node-0:/home/rancher # time supportconfig -m -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK,NET
[...]
==[ DONE ]===================================================================
Log file tar ball: /var/log/scc_harvester-node-0_250228_1015.txz
Log file size: 782K
Log file md5sum: beee0c20d98d78a00635665f4c2330af-f
=============================================================================
real 1m6.404s
user 0m38.196s
sys 0m11.587s
And without network info:
harvester-node-0:/home/rancher # time supportconfig -m -i BOOT,DAEMONS,ETC,ISCSI,MEM,MOD,NTP,SMART,DISK
[...]
==[ DONE ]===================================================================
Log file tar ball: /var/log/scc_harvester-node-0_250228_1020.txz
Log file size: 509K
Log file md5sum: d201e8e202c7efecc92804f9fe4cbe89-f
=============================================================================
real 0m47.713s
user 0m27.219s
sys 0m4.773s
But I think omitting this data from the support bundle is the wrong solution. Instead, we should think about speeding up the supportconfig script or the support bundle itself.
One way to speed up this part is to run supportconfig on all nodes at the same time. If I'm not mistaken, it currently runs sequentially, one node after another.
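The general idea of running per-node collection concurrently can be sketched with background jobs and `wait`, so total wall time approaches the slowest node rather than the sum of all nodes. This is only an illustration: `collect_node` and `collect_all_parallel` are hypothetical placeholders, and as the reply in this thread notes, node-level collection already runs via a DaemonSet.

```shell
#!/bin/sh
# Sketch: start one collection job per node concurrently instead of one after another.
# ASSUMPTION: collect_node is a hypothetical placeholder for triggering
# supportconfig on a node; here it only sleeps and writes a marker file.

collect_node() {
    node="$1"
    outdir="$2"
    sleep 1                                   # stand-in for the real per-node work
    echo "collected $node" > "$outdir/$node.txt"
}

# Usage: collect_all_parallel <outdir> <node>...
collect_all_parallel() {
    outdir="$1"
    shift
    for node in "$@"; do
        collect_node "$node" "$outdir" &      # run each node's job in the background
    done
    wait                                      # block until all background jobs finish
}
```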
Node-level info is already collected by running the collector as a DaemonSet.