This repository was archived by the owner on Jul 5, 2021. It is now read-only.

Batch size changes output with same images #238

@RorryB

Description

Bug report

Information

Please specify the following information when submitting an issue:

  • What are your command line arguments?:
    Command line args:
    CUDA_VISIBLE_DEVICES=0 python -m pdb train.py --num_epochs 301 --continue_training false --dataset dataset --crop_height 352 --crop_width 480 --batch_size 4 --num_val_images 100 --model DeepLabV3_plus --frontend ResNet50

  • Have you written any custom code?:
    I disabled data augmentation by adding "return input_image, output_image" at the very beginning of data_augmentation() and removing an empty line so that later line numbers (which I use for breakpoints) do not shift; see the sketch after this list. I also tried both is_training=False and is_training=True.

  • What have you done to try and solve this issue?:
    Googled why this might happen. Tried other models.

  • TensorFlow version?:
    '1.13.1'
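
Sketch of the data_augmentation() bypass mentioned above, assuming the signature used elsewhere in this report (the rest of the function body in train.py is left in place, and one empty line elsewhere is removed so later line numbers used for breakpoints do not shift):

def data_augmentation(input_image, output_image):
    # Early return added for this test: images pass through untouched,
    # so every augmentation step below becomes unreachable.
    return input_image, output_image
    # (original augmentation code follows, now unreachable)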

Describe the problem

When calling sess.run, the output for the same image differs depending on the size of the batch it was included in.

Source code / logs

Running under pdb, the problem can be replicated with a fresh checkout. I originally found it while trying to implement batch inference in predict.py, but doing this in train.py is the quickest way for you to reproduce it.
(Pdb) break train.py:197
...
(Pdb) output_image_last = sess.run(network,feed_dict={net_input:np.expand_dims(input_image, axis=0)})
(Pdb) output_images = sess.run(network,feed_dict={net_input:input_image_batch})
(Pdb) (input_image - input_image_batch[3]).max()
0.0
(Pdb) (output_image_last - output_images[3]).max()
1.0644385
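
To make the comparison easier to read, a small helper can report the maximum absolute difference and whether the tensors match within a tolerance. This is only a sketch, assuming the variables already defined at the breakpoint above (sess, network, net_input, input_image, input_image_batch, output_image_last, output_images):

import numpy as np

def max_abs_diff(a, b):
    # Largest absolute elementwise difference between two same-shaped arrays
    return np.abs(np.asarray(a) - np.asarray(b)).max()

print(max_abs_diff(input_image, input_image_batch[3]))        # 0.0   -> identical inputs
print(max_abs_diff(output_image_last[0], output_images[3]))   # ~1.06 -> different outputs
print(np.allclose(output_image_last[0], output_images[3], atol=1e-5))  # False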

The following is another set of commands I tested from the breakpoint at train.py:197, if you want to copy-paste quickly; for these you must remove data augmentation as described above. They set up batches of size 2 and 4 inside pdb and check that the same input images produce different outputs depending on batch size.

output_image_last_alone = sess.run(network,feed_dict={net_input:np.expand_dims(input_image, axis=0)})
output_images_orig4 = sess.run(network,feed_dict={net_input:input_image_batch})

# Manually build a second batch, of size 2, inside pdb
input_image_batch_manual2 = []

index = i * args.batch_size + j-1
id = id_list[index]
input_image2 = utils.load_image(train_input_names[id])
output_image2 = utils.load_image(train_output_names[id])

index = i * args.batch_size + j
id = id_list[index]
input_image3 = utils.load_image(train_input_names[id])
output_image3 = utils.load_image(train_output_names[id])
input_image2, output_image2 = data_augmentation(input_image2, output_image2)
input_image3, output_image3 = data_augmentation(input_image3, output_image3)
input_image2 = np.float32(input_image2) / 255.0
input_image3 = np.float32(input_image3) / 255.0
input_image_batch_manual2.append(np.expand_dims(input_image2, axis=0))
input_image_batch_manual2.append(np.expand_dims(input_image3, axis=0))
input_image_batch_manual2 = np.squeeze(np.stack(input_image_batch_manual2, axis=1))
output_images_batch2 = sess.run(network,feed_dict={net_input:input_image_batch_manual2})

# Manually build another batch, of size 4, inside pdb
input_image_batch_manual4 = []
index = i * args.batch_size + j-3
id = id_list[index]
input_image0 = utils.load_image(train_input_names[id])
output_image0 = utils.load_image(train_output_names[id])

index = i * args.batch_size + j-2
id = id_list[index]
input_image1 = utils.load_image(train_input_names[id])
output_image1 = utils.load_image(train_output_names[id])
input_image0, output_image0 = data_augmentation(input_image0, output_image0)
input_image1, output_image1 = data_augmentation(input_image1, output_image1)
input_image0 = np.float32(input_image0) / 255.0
input_image1 = np.float32(input_image1) / 255.0
input_image_batch_manual4.append(np.expand_dims(input_image0, axis=0))
input_image_batch_manual4.append(np.expand_dims(input_image1, axis=0))
index = i * args.batch_size + j-1
id = id_list[index]
input_image2 = utils.load_image(train_input_names[id])
output_image2 = utils.load_image(train_output_names[id])

index = i * args.batch_size + j
id = id_list[index]
input_image3 = utils.load_image(train_input_names[id])
output_image3 = utils.load_image(train_output_names[id])
input_image2, output_image2 = data_augmentation(input_image2, output_image2)
input_image3, output_image3 = data_augmentation(input_image3, output_image3)
input_image2 = np.float32(input_image2) / 255.0
input_image3 = np.float32(input_image3) / 255.0
input_image_batch_manual4.append(np.expand_dims(input_image2, axis=0))
input_image_batch_manual4.append(np.expand_dims(input_image3, axis=0))
input_image_batch_manual4 = np.squeeze(np.stack(input_image_batch_manual4, axis=1))
output_images_batch4 = sess.run(network,feed_dict={net_input:input_image_batch_manual4})

(input_image - input_image_batch[3]).max() #input image is the 4th image in the batch
(input_image - input_image_batch_manual2[1]).max() #input image is the 2nd image in this manually loaded batch loaded in pdb
(input_image - input_image_batch_manual4[3]).max() #input image is the 4th image in this manually loaded batch loaded in pdb

(output_image_last_alone - output_images_orig4[3]).max() #the batch-of-1 run differs from the original batch of 4
(output_image_last_alone - output_images_batch2[1]).max() #the batch-of-1 run differs from the manual batch of 2
(output_image_last_alone - output_images_batch4[3]).max() #the batch-of-1 run differs from the manual batch of 4
(output_images_batch2[1] - output_images_batch4[3]).max() #batch size 2 differs from batch size 4

(output_images_orig4 - output_images_batch4).max() #the manually loaded batch produces the same output as the original batch
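
For context, and only as a general illustration rather than a confirmed diagnosis of this repository: any op that computes per-batch statistics, for example batch normalization run with training=True, makes an image's output depend on the other images in its batch. A standalone TensorFlow 1.x sketch of that effect:

import numpy as np
import tensorflow as tf  # 1.x API, matching the reported version 1.13.1

np.random.seed(0)
tf.set_random_seed(0)

x = tf.placeholder(tf.float32, [None, 8])
# Batch norm in training mode normalizes with the statistics of the current batch
y = tf.layers.batch_normalization(tf.layers.dense(x, 8), training=True)

data = np.random.rand(4, 8).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out_single = sess.run(y, feed_dict={x: data[3:4]})  # same image, batch of 1
    out_batch = sess.run(y, feed_dict={x: data})        # same image, batch of 4
    print(np.abs(out_single[0] - out_batch[3]).max())   # non-zero

Whether something like this is actually happening inside DeepLabV3_plus / ResNet50 here would need to be checked in the model code; it is offered only as a plausible mechanism.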
