bug: memory of position_encoding_table is not malloced correctly. #790

@johnson-magic

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:22.12-py3

GPU name

A10

CUDA Driver

535.54.03

Reproduced Steps

1. docker run -ti --gpus all --rm nvcr.io/nvidia/pytorch:22.12-py3 bash
2. git clone --recursive https://github.com/NVIDIA/FasterTransformer.git
3. cd FasterTransformer
4. mkdir build
5. cd build
6. cmake -DSM=86 -DCMAKE_BUILD_TYPE=Release ..
7. make -j14
8. CUDA_VISIBLE_DEVICES=0 ./satrn 1 1 8 64 2048 4022 3 100 576 512 0 0.0 0

Abnormal Phenomena:

The kernel reads the position encoding table as

val = val + position_encoding[step_offset + col_index];

where step_offset advances in strides of hidden_units, so the table needs max_seq_len_ * hidden_units_ elements. The copy, however, is sized with vocab_size_:

cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * vocab_size_);

I think it should be

cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * hidden_units_);
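
To make the size argument concrete, here is a minimal host-side sketch of the bounds reasoning. It is not FasterTransformer code: the helper names are hypothetical, and it assumes step_offset = step * hidden_units with col_index in [0, hidden_units), which is how I read the kernel line quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: largest index touched by
//   position_encoding[step_offset + col_index]
// ASSUMPTION: step_offset = step * hidden_units, step in [0, max_seq_len),
// col_index in [0, hidden_units). Callers must pass non-zero sizes.
constexpr std::size_t max_index_read(std::size_t max_seq_len, std::size_t hidden_units)
{
    return (max_seq_len - 1) * hidden_units + (hidden_units - 1);
}

// Element count the table needs so that every read above stays in bounds.
constexpr std::size_t required_elements(std::size_t max_seq_len, std::size_t hidden_units)
{
    return max_seq_len * hidden_units;
}

// True when a buffer of `allocated` elements covers every index that is read.
// The zero checks short-circuit before max_index_read could underflow.
constexpr bool covers_all_reads(std::size_t allocated,
                                std::size_t max_seq_len,
                                std::size_t hidden_units)
{
    return max_seq_len > 0 && hidden_units > 0
           && allocated > max_index_read(max_seq_len, hidden_units);
}

// Debug-time check that could sit next to the allocation or copy call.
inline void check_table_size(std::size_t allocated,
                             std::size_t max_seq_len,
                             std::size_t hidden_units)
{
    assert(covers_all_reads(allocated, max_seq_len, hidden_units));
}
```

Under these assumptions, a buffer sized max_seq_len * vocab_size only covers the reads when vocab_size >= hidden_units. If vocab_size is smaller, the kernel reads past the end of the table. If it is larger, the allocation and copy are oversized, and the copy could read beyond a source buffer that was sized correctly.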

The same vocab_size_ sizing appears in two other places:

cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * vocab_size_);

deviceMalloc(&weights_ptr[0], max_seq_len_ * vocab_size_);

I have opened a PR to try to fix it. @byshiue
