Why do you need the transpose here:

    _s, state_word, _ = word_attn_model(mini_batch[i,:,:].transpose(0,1), state_word)

and here, in def pad_batch:

    torch.from_numpy(main_matrix).transpose(0,1)

Thanks :)