Description
Hello, I'm facing major corruption; luckily I had backups for most things. The scenario: I was moving every single mountpoint from all my LXCs to MooseFS. At the exact moment the problem happened, I was restarting the MooseFS master safely (systemctl restart moosefs-master). I'm on MooseFS Community Edition.
I mounted MooseFS with the Use Block Device feature enabled.
The first problem I'm facing: the plugin doesn't cancel operations on failure. Here is an example where I did a move with delete source, and this happened:
started block device: (/dev/mfs/mfsmaster.main_9421__images_178_vm-178-disk-0->/dev/nbd2 : MFS://images/178/vm-178-disk-0 : 120.000GiB)
Creating filesystem with 31457280 4k blocks and 7864320 inodes
Filesystem UUID: cebb78ff-31d1-4478-899c-213db950b394
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
/dev/rbd1
Number of files: 123,836 (reg: 110,154, dir: 13,680, link: 2)
Number of created files: 123,834 (reg: 110,154, dir: 13,678, link: 2)
Number of deleted files: 0
Number of regular files transferred: 110,154
Total file size: 36,692,106,197 bytes
Total transferred file size: 36,692,106,080 bytes
Literal data: 36,692,106,080 bytes
Matched data: 0 bytes
File list size: 3,997,413
File list generation time: 0.052 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 36,709,001,584
Total bytes received: 2,206,455
sent 36,709,001,584 bytes received 2,206,455 bytes 17,296,211.09 bytes/sec
total size is 36,692,106,197 speedup is 1.00
can't unmap MooseFS device '/images/178/vm-178-disk-0': error receiving data from '/dev/mfs/nbdsock': Connection timed out
volume deactivation failed: mfsBlock:178/vm-178-disk-0
Removing image: 1% complete...
Removing image: 2% complete...
Removing image: 3% complete...
Removing image: 4% complete...
Removing image: 5% complete...
Removing image: 6% complete...
Removing image: 7% complete...
// The source image got completely deleted; I'm truncating the log.
Why is this important?
Not only because it completely nuked the source image (!), but also because, when I moved a second mountpoint from the same LXC (ID 178) 15 minutes later, it overwrote the 'vm-178-disk-0' image. To put it another way:
Both move operations failed with the same log shown above. This could be due to the MooseFS master restarting in the middle of the operation, or simply to the server losing its connection.
Before the migration, my rootfs: was vm-178-disk-0 and my mp0: was vm-178-disk-1. First I started moving the rootfs from my source storage to MooseFS. After some time, the Connection timed out error appeared, but the source image was deleted anyway. As I did not see any error, I started moving mp0:, which overwrote the previous 'vm-178-disk-0' in /mnt/pve/Block/images/178/. I will never know whether the rootfs image was copied successfully before MooseFS overwrote it.
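For illustration, the behaviour I would expect can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the plugin's actual code: `move_volume`, `next_free_disk_name`, and `MoveAborted` are hypothetical names, and it assumes the target images are visible as files under a directory such as /mnt/pve/Block/images/<vmid>/. The two points it shows are that the source is deleted only after every earlier step succeeded, and that a new target name is never one that already exists.

```python
from pathlib import Path
from typing import Callable


class MoveAborted(Exception):
    """Hypothetical error: a step failed, so the source image was left untouched."""


def next_free_disk_name(images_dir: Path, vmid: int) -> str:
    """Return the first vm-<vmid>-disk-<n> name that does not exist on the target."""
    n = 0
    while (images_dir / str(vmid) / f"vm-{vmid}-disk-{n}").exists():
        n += 1
    return f"vm-{vmid}-disk-{n}"


def move_volume(
    copy: Callable[[], None],
    deactivate_target: Callable[[], None],
    delete_source: Callable[[], None],
) -> None:
    """Run the move steps in order; delete the source only if all of them succeeded."""
    for label, step in (("copy", copy), ("deactivate target", deactivate_target)):
        try:
            step()
        except Exception as exc:
            # Any failure (e.g. a timeout while unmapping) aborts before deletion.
            raise MoveAborted(f"{label} failed, source kept: {exc}") from exc
    delete_source()
```

With a guard like this, the "Connection timed out" during unmap would abort the task before "Removing image" starts, and the later mp0: move would be given a name other than 'vm-178-disk-0' because that image already exists on the target.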