Batch size changes output with same images #238
Description
Bug report
Information
Please specify the following information when submitting an issue:
What are your command line arguments?:
CUDA_VISIBLE_DEVICES=0 python -m pdb train.py --num_epochs 301 --continue_training false --dataset dataset --crop_height 352 --crop_width 480 --batch_size 4 --num_val_images 100 --model DeepLabV3_plus --frontend ResNet50
Have you written any custom code?:
I removed data augmentation by adding `return input_image, output_image` right at the beginning of the function, and I removed an empty line so that later line numbers (used for breakpoints) stay the same. I also tried both is_training=False and is_training=True.
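For clarity, the change described above would look roughly like this (a sketch assuming the function's signature; the body shown below it is a placeholder for the repo's actual augmentation code):

```python
def data_augmentation(input_image, output_image):
    # Short-circuit: return the inputs untouched so every run sees
    # exactly the same images.
    return input_image, output_image
    # ...the original cropping/flipping/rotation code below this
    # return is now unreachable...
```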
What have you done to try and solve this issue?:
Googled why this might happen and tried other models.
TensorFlow version?:
1.13.1
Describe the problem
When calling sess.run, the output for the same image differs depending on the size of the batch it was included in.
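One plausible source of this behavior (an assumption on my part, not confirmed from the repo's code) is batch normalization running with batch statistics: with is_training=True, each image is normalized using the mean and variance of its current batch, so its batch companions change its output. A minimal NumPy sketch of that effect:

```python
import numpy as np

def batchnorm_train(x, eps=1e-5):
    # Normalize each feature with the statistics of the *current batch*,
    # as batch norm does in training mode.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
img = rng.normal(size=4)          # stand-in for one image's features
others = rng.normal(size=(3, 4))  # three companion images

out_alone = batchnorm_train(img[None, :])[0]
out_in_batch4 = batchnorm_train(np.vstack([others, img[None, :]]))[-1]

# Same input, different output depending on what else is in the batch:
print(np.abs(out_alone - out_in_batch4).max() > 0)  # True
```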
Source code / logs
Running in pdb, this can be reproduced with a fresh checkout. I originally found it while trying to implement batch inference in predict.py, but doing this in train.py is the quickest way for you to reproduce the problem.
```
(Pdb) break train.py:197
...
(Pdb) output_image_last = sess.run(network, feed_dict={net_input: np.expand_dims(input_image, axis=0)})
(Pdb) output_images = sess.run(network, feed_dict={net_input: input_image_batch})
(Pdb) (input_image - input_image_batch[3]).max()
0.0
(Pdb) (output_image_last - output_images[3]).max()
1.0644385
```
The following is another set of commands, run from the same breakpoint at train.py:197, that you can copy-paste quickly; for these you must remove data augmentation. They manually build batches of size 2 and 4 inside pdb and check that the same input image produces different outputs depending on batch size.
```python
# Run the last image alone, and run the original batch of 4
output_image_last_alone = sess.run(network, feed_dict={net_input: np.expand_dims(input_image, axis=0)})
output_images_orig4 = sess.run(network, feed_dict={net_input: input_image_batch})

# Manually build a batch of size 2 (the last two images of the original batch)
input_image_batch_manual2 = []
index = i * args.batch_size + j - 1
id = id_list[index]
input_image2 = utils.load_image(train_input_names[id])
output_image2 = utils.load_image(train_output_names[id])
index = i * args.batch_size + j
id = id_list[index]
input_image3 = utils.load_image(train_input_names[id])
output_image3 = utils.load_image(train_output_names[id])
input_image2, output_image2 = data_augmentation(input_image2, output_image2)
input_image3, output_image3 = data_augmentation(input_image3, output_image3)
input_image2 = np.float32(input_image2) / 255.0
input_image3 = np.float32(input_image3) / 255.0
input_image_batch_manual2.append(np.expand_dims(input_image2, axis=0))
input_image_batch_manual2.append(np.expand_dims(input_image3, axis=0))
input_image_batch_manual2 = np.squeeze(np.stack(input_image_batch_manual2, axis=1))
output_images_batch2 = sess.run(network, feed_dict={net_input: input_image_batch_manual2})

# Manually build a batch of size 4 (the same four images as the original batch)
input_image_batch_manual4 = []
index = i * args.batch_size + j - 3
id = id_list[index]
input_image0 = utils.load_image(train_input_names[id])
output_image0 = utils.load_image(train_output_names[id])
index = i * args.batch_size + j - 2
id = id_list[index]
input_image1 = utils.load_image(train_input_names[id])
output_image1 = utils.load_image(train_output_names[id])
input_image0, output_image0 = data_augmentation(input_image0, output_image0)
input_image1, output_image1 = data_augmentation(input_image1, output_image1)
input_image0 = np.float32(input_image0) / 255.0
input_image1 = np.float32(input_image1) / 255.0
input_image_batch_manual4.append(np.expand_dims(input_image0, axis=0))
input_image_batch_manual4.append(np.expand_dims(input_image1, axis=0))
index = i * args.batch_size + j - 1
id = id_list[index]
input_image2 = utils.load_image(train_input_names[id])
output_image2 = utils.load_image(train_output_names[id])
index = i * args.batch_size + j
id = id_list[index]
input_image3 = utils.load_image(train_input_names[id])
output_image3 = utils.load_image(train_output_names[id])
input_image2, output_image2 = data_augmentation(input_image2, output_image2)
input_image3, output_image3 = data_augmentation(input_image3, output_image3)
input_image2 = np.float32(input_image2) / 255.0
input_image3 = np.float32(input_image3) / 255.0
input_image_batch_manual4.append(np.expand_dims(input_image2, axis=0))
input_image_batch_manual4.append(np.expand_dims(input_image3, axis=0))
input_image_batch_manual4 = np.squeeze(np.stack(input_image_batch_manual4, axis=1))
output_images_batch4 = sess.run(network, feed_dict={net_input: input_image_batch_manual4})

# The input images are identical across batches...
(input_image - input_image_batch[3]).max()          # input_image is the 4th image in the original batch
(input_image - input_image_batch_manual2[1]).max()  # input_image is the 2nd image in the manual batch of 2
(input_image - input_image_batch_manual4[3]).max()  # input_image is the 4th image in the manual batch of 4
# ...but the outputs differ with batch size
(output_image_last_alone - output_images_orig4[3]).max()   # single-image run differs from the original batch of 4
(output_image_last_alone - output_images_batch2[1]).max()  # single-image run differs from the batch of 2
(output_image_last_alone - output_images_batch4[3]).max()  # single-image run differs from the batch of 4
(output_images_batch2[1] - output_images_batch4[3]).max()  # batch size 2 produces different output than batch size 4
(output_images_orig4 - output_images_batch4).max()         # the manual batch of 4 matches the original batch
```