Replies: 5 comments 10 replies
-
Your dtrace command appears to be off by an order of magnitude. Thinking about it, I can't immediately come up with any changes in 2.0 or 2.1 that would drastically change the allocation behavior.
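For reference, a corrected probe (a sketch, assuming the intent was to match the 1 MiB size behind zio_buf_1048576 rather than 10 MiB):

```sh
# Match 1 MiB allocations (1048576 bytes), not 10485760 (10 MiB) as in the original one-liner.
dtrace -n 'fbt::zio_buf_alloc:entry /arg0 == 1048576/ { stack(); }'
```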
-
@bdrewery I have no quick ideas about what the issue could be. I haven't seen TrueNAS reports like that, and our users run plenty of FreeBSD systems with 2.1.x in the wild. We do use custom ZFS builds based on 2.1.x rather than the one from base FreeBSD, but they should be very close.

As for places where 1MB buffers can be allocated: since you are using a 1MB recordsize, they can obviously be data buffers. ARC does not normally keep blocks that big unless tuned to, but instead copies the content into a chain of PAGE_SIZE chunks to free KVA. 1MB allocations are used:

- for data in the small DBUF cache (recently accessed blocks);
- for dirty buffers that were just recently modified and are still being written (though in the first two cases you'd likely see zio_data_buf_1048576, not zio_buf_1048576, since IIRC metadata should not use large blocks unless tuned so);
- for ZIOs in the pipeline; for example, the compression/decompression code allocates linear buffers that way (I see those among your stack traces, and as far as I can see that code always allocates zio_buf_*);
- and finally, if the FreeBSD-specific vdev_geom.c code can't execute an aggregated I/O via a BIO_UNMAPPED GEOM request, it calls abd_borrow_buf()/abd_borrow_buf_copy() to get a linear copy.

I actually see several threads above in abd_borrow_buf_copy(), as I understand it waiting for buffer allocations in order to execute writes. That may be part of normal operation, either because your HBA does not support BIO_UNMAPPED, or simply because some buffers in the aggregated I/O (BTW, I/O aggregation is also done up to 1MB for HDDs) are not page-aligned. So it may or may not be a problem. It does make me wonder, though: what HBA/disk controller/driver are you using? Actually, considering the "block size: 512B configured, 4096B native" I see, your pool likely runs with ashift=9, which means there can be buffers not aligned to PAGE_SIZE. TrueNAS almost always uses ashift=12 for 4K disk compatibility and should almost always be able to use BIO_UNMAPPED, since all disk I/Os are page-aligned. Just thinking about possible differences...
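Two quick checks along these lines (a sketch; the pool name "tank" is a placeholder, and the fbt probe on abd_borrow_buf_copy() is an assumption about what is traceable on your kernel):

```sh
# Confirm the pool's ashift: 9 means 512-byte alignment, so aggregated I/O
# may not be page-aligned; 12 means every block is 4K/page-friendly.
zdb -C tank | grep ashift

# See which code paths fall back to a linear copy, i.e. where vdev_geom
# could not issue the request as BIO_UNMAPPED.
dtrace -n 'fbt::abd_borrow_buf_copy:entry { @[stack()] = count(); }'
```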
-
Do you have any datasets with a recordsize larger than the default 128K?
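One quick way to answer that (a sketch; "tank" is a placeholder pool name):

```sh
# List recordsize for every dataset; anything other than the 128K default
# is a candidate source of 1M buffer allocations.
zfs get -r -o name,value recordsize tank | awk '$2 != "128K"'
```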
-
In a similar vein, I have several FreeBSD-Stable 13.1 OoM crash dumps where several ZFS UMA allocators show insane memory usage. "Insane" in that vmstat -z shows abd_chunk and zio_buf_comb_1048576 with usages that exceed system memory. For example, on a 256G system (processed vmstat -z -M -N output): The recordsize for the I/O-intensive pools on this system is 1M.
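For anyone wanting to reproduce that kind of per-zone accounting from a crash dump, a rough sketch (assuming the usual FreeBSD vmstat -z column layout; the vmcore and kernel paths are placeholders):

```sh
# Total bytes held per UMA zone = SIZE * (USED + FREE), largest zones first.
# After splitting on commas, $1 of each line is "zone name:   item size".
vmstat -z -M /var/crash/vmcore.last -N /boot/kernel/kernel | awk -F',' '
NR > 1 {
    n = split($1, a, ":"); name = a[1]; size = a[2] + 0
    used = $3 + 0; free = $4 + 0
    printf "%-32s %10.1f MiB\n", name, size * (used + free) / 1048576
}' | sort -k2 -rn | head -20
```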
-
Is this the same problem I have? Wired memory explodes, and this is NOT ARC. git seems to push it over the edge quite easily, in some cases 100% of the time, and it goes so far that the kernel kills the entire userland off. Why it doesn't panic I don't know; maybe the workload just gets killed first. It happens fast: in a few seconds it's all gone. I have just 4G of RAM, so it is literally filling at RAM speed, I guess. I don't see any use in the kernel allocating over 95% of RAM in a split second. Whatever ZFS is doing, which I don't know, it seems to keep accepting new calls from userland and never limits them, even when it chokes up. It feels like the disks aren't keeping up, but if storage is not keeping up there should be some brake here, as it could quickly become uncontrollable; I bet one could fill up any amount of RAM as quickly as my machine does. There are no other issues on this machine. I think it's writes, as reads seem fine, except a scrub also seems to push it to the edge. Processes were killed and I was presented with "last pid: 5399; load averages: 0.90, 0.77, 0.59 up 14+02:38:07 08:58:37". Any (Z)FS experts? ARC seems fine; the limits apply (IIRC RAM * 0.6). What else goes on there, I don't know.
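A quick way to tell whether that wired growth is ARC or something else (a sketch using stock FreeBSD sysctls):

```sh
# Current ARC size in bytes.
sysctl kstat.zfs.misc.arcstats.size
# Total wired memory = wired page count * page size.
sysctl vm.stats.vm.v_wire_count hw.pagesize
```

If wired memory is far above the ARC size, the growth is coming from other kernel allocations (for example the zio_buf/abd UMA zones discussed above).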
-
I would open an issue but I don't have enough data yet to describe the problem. I just have a small home server for Nextcloud and a few VMs. Nothing too special.
I've run into numerous panics over the last few weeks with 2.1.4 and 2.1.5 where specifically zio_buf_1048576 explodes from 0 to 20G in what seems like seconds (some data below). This isn't ARC; I had that limited to 12G. Note that I have partially reverted 309c32c to split zio_data_buf back out on its own after running into this problem earlier. When this happened the system went into OOM, swapped out everything, and made no progress on anything. I had to reboot. I did manage to get a kernel dump the last time. Before this I was using whatever ZFS FreeBSD 12 had, not ZoL/zfs 2. It was stable.
Looking at my system now and over the last 24 hours there have been 0 allocations of zio_buf_1048576. I've been running dtrace -n 'fbt::zio_buf_alloc:entry { if (arg0 == 10485760) stack(); }' with no hits. (I do see hits with the proper size 1048576, but the total allocations in vmstat -z remain very low outside of this issue.)
My question is: what exactly would cause zio_buf_1048576, which is normally allocated so rarely, to suddenly be allocated so quickly? How might I find what is using them in the kernel dump?
Current top. The 10G free is close to what pre-explode looks like. I explicitly set vfs.zfs.arc_free_target to keep 10G of memory free. (There is some dedup but it is ancient data and relatively small.)
Some customizations:
From vmstat -z of the last panic, from the dump (parsed). And a previous time, from a dump.
Current. No 1M allocations.
Some interesting threads from gdb.
Is there a way for me to find what is using all of the 1M zio_buf allocations?
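One approach for catching the culprit live rather than post-mortem (a sketch; the 10-second interval and the choice to watch both allocators are assumptions, not anything confirmed in this thread):

```sh
# Aggregate the call stacks of every 1 MiB allocation from either allocator
# and dump/reset the counts every 10 seconds, so a sudden burst shows up
# along with the code paths responsible for it.
dtrace -n '
fbt::zio_buf_alloc:entry, fbt::zio_data_buf_alloc:entry
/arg0 == 1048576/
{ @[probefunc, stack()] = count(); }

tick-10s
{ printa(@); trunc(@); }'
```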