test_standalone_executor random failure #41182

@zlsh80826

Description

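  • Version and environment information:
       1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1 or a CommitID
       2) CPU/GPU: if you train on GPU, please provide the GPU driver, CUDA, and cuDNN versions
       3) System environment: please describe the OS type and version, e.g. Mac OS 10.14
       4) Python version
       5) GPU memory information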
 $ python3.7 tools/summary_env.py
****************************************
Paddle version: 2.2.2
Paddle With CUDA: False

OS: Ubuntu 18.04
Python version: 3.7.12

CUDA version: None
cuDNN version: None.None.None
Nvidia driver version: None
****************************************
  • Steps to reproduce
$ cat > Dockerfile <<'EOF'
FROM paddlepaddle/paddle:2.2.2-gpu-cuda11.2-cudnn8

RUN export DEBIAN_FRONTEND=noninteractive && \
    apt update && apt install python3 python3-pip -y

# Clone v2.2.2 and append ssl and crypto to BRPC_DEPS.
RUN git clone https://github.com/PaddlePaddle/Paddle.git -b v2.2.2 && \
    sed -i 's/BRPC_DEPS brpc/BRPC_DEPS brpc ssl crypto/g' Paddle/paddle/fluid/framework/CMakeLists.txt

RUN pip install --upgrade pip && \
    pip install -r Paddle/python/requirements.txt

# Configure a Release GPU build (compute capability 7.0) with tests enabled.
RUN cd Paddle && \
    mkdir -p build && \
    cd build && \
    cmake .. \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CUDA_FLAGS="-t0" \
    -DCUDA_ARCH_NAME=Manual \
    -DCUDA_ARCH_BIN="70" \
    -DWITH_INCREMENTAL_COVERAGE=OFF \
    -DWITH_INFERENCE_API_TEST=OFF \
    -DWITH_DISTRIBUTE=OFF \
    -DWITH_COVERAGE=OFF \
    -DWITH_TENSORRT=OFF \
    -DWITH_TESTING=ON \
    -DWITH_CONTRIB=OFF \
    -DWITH_ROCM=OFF \
    -DWITH_RCCL=OFF \
    -DWITH_STRIP=ON \
    -DWITH_MKL=OFF \
    -DWITH_AVX=OFF \
    -DWITH_GPU=ON \
    -DWITH_PYTHON=ON \
    -DPY_VERSION=3.7

# nproc is evaluated inside the image build, not by the host shell.
RUN cd Paddle/build && make -j"$(nproc)"
EOF
$ docker build -t paddle .
$ docker run -it --rm paddle bash -c 'set -xe; cd Paddle/build; for i in `seq 1 5000`; do ctest -R test_standalone_executor --output-on-failure; done'
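Because the failure is intermittent, it can help to keep only the output of the failing run and to record which iteration failed. The following is a minimal sketch, not part of the original report: `repeat_until_failure` is a hypothetical helper name, and the log path in the usage note is an arbitrary choice.

```shell
# Hypothetical helper: rerun a command until it fails or the run budget is spent.
# Usage: repeat_until_failure MAX_RUNS LOG_FILE COMMAND [ARGS...]
repeat_until_failure() {
    local max_runs log_file i
    max_runs=$1
    log_file=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Overwrite the log on every run so only the failing run's output is kept.
        if ! "$@" > "$log_file" 2>&1; then
            echo "run $i failed; output saved to $log_file"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max_runs runs passed"
    return 0
}
```

Inside the container, from `Paddle/build`, it could be invoked as `repeat_until_failure 5000 /tmp/standalone.log ctest -R test_standalone_executor --output-on-failure`.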
  • Problem description: please describe your problem in detail and include the error message and key log/code snippets
    test_standalone_executor fails randomly. The error log follows:
+ for i in `seq 1 5000`
+ ctest -R test_standalone_executor --output-on-failure
Test project /home/Paddle/build
    Start 1242: test_standalone_executor
1/1 Test #1242: test_standalone_executor .........***Failed    6.58 sec
W0330 16:28:45.547430 26695 init.cc:202] AVX is available, Please re-compile on local machine
W0330 16:28:45.693004 26695 init.cc:202] AVX is available, Please re-compile on local machine
W0330 16:28:45.693174 26695 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0330 16:28:45.697403 26695 device_context.cc:465] device: 0, cuDNN Version: 8.1.
W0330 16:28:47.651964 26695 init.cc:202] AVX is available, Please re-compile on local machine


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1648657727 (unix time) try "date -d @1648657727" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 26695 (TID 0x7fb851bfc700) from PID 0 ***]

Segmentation fault


0% tests passed, 1 tests failed out of 1

Total Test time (real) =   6.63 sec

The following tests FAILED:
        1242 - test_standalone_executor (Failed)
Errors while running CTest
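Since the log reports "No stack trace in paddle", one possible way to get a native backtrace is to run the test process under gdb. This is a sketch under assumptions, not a step from the original report: it assumes gdb is installed in the container, `run_under_gdb` is a hypothetical helper name, and `GDB_BIN` is an assumed override hook for choosing the debugger binary.

```shell
# Hypothetical helper: run a command under gdb in batch mode and print a
# backtrace of every thread once the process stops (for example on SIGSEGV).
# GDB_BIN is an assumed override; it defaults to the gdb found on PATH.
run_under_gdb() {
    "${GDB_BIN:-gdb}" -batch \
        -ex run \
        -ex "thread apply all bt" \
        --args "$@"
}
```

A call would look like `run_under_gdb python3.7 path/to/test_standalone_executor.py`, where the script path is a placeholder; `ctest -R test_standalone_executor -V` prints the actual command line that the test runs.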
