Closed
Description
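- Version and environment information:
1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1, or a commit ID
2) CPU/GPU: if you train on a GPU, please provide the GPU driver version and the CUDA and cuDNN version numbers
3) System environment: please describe the OS type and version, e.g. Mac OS 10.14
4) Python version
5) GPU memory information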
$ python3.7 tools/summary_env.py
****************************************
Paddle version: 2.2.2
Paddle With CUDA: False
OS: Ubuntu 18.04
Python version: 3.7.12
CUDA version: None
cuDNN version: None.None.None
Nvidia driver version: None
****************************************
- Steps to reproduce
$ cat > Dockerfile <<'EOF'
FROM paddlepaddle/paddle:2.2.2-gpu-cuda11.2-cudnn8
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt update && apt install python3 python3-pip -y
RUN git clone https://github.com/PaddlePaddle/Paddle.git -b v2.2.2 && \
    sed -i 's/BRPC_DEPS brpc/BRPC_DEPS brpc ssl crypto/g' Paddle/paddle/fluid/framework/CMakeLists.txt
RUN pip install --upgrade pip && \
    pip install -r Paddle/python/requirements.txt
RUN cd Paddle && \
    mkdir -p build && \
    cd build && \
    cmake .. \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CUDA_FLAGS="-t0" \
    -DCUDA_ARCH_NAME=Manual \
    -DCUDA_ARCH_BIN="70" \
    -DWITH_INCREMENTAL_COVERAGE=OFF \
    -DWITH_INFERENCE_API_TEST=OFF \
    -DWITH_DISTRIBUTE=OFF \
    -DWITH_COVERAGE=OFF \
    -DWITH_TENSORRT=OFF \
    -DWITH_TESTING=ON \
    -DWITH_CONTRIB=OFF \
    -DWITH_ROCM=OFF \
    -DWITH_RCCL=OFF \
    -DWITH_STRIP=ON \
    -DWITH_MKL=OFF \
    -DWITH_AVX=OFF \
    -DWITH_GPU=ON \
    -DWITH_PYTHON=ON \
    -DPY_VERSION=3.7
RUN cd Paddle/build && make -j`nproc`
EOF
$ docker build -t paddle .
$ docker run -it --rm paddle bash -c 'set -xe; cd Paddle/build; for i in `seq 1 5000`; do ctest -R test_standalone_executor --output-on-failure; done'
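Because the loop above runs under set -e, it stops at the first failing iteration, so it shows that the test can fail but not how often. A minimal sketch for estimating the failure rate instead follows. The function name count_failures and the run_N.log file names are hypothetical helpers introduced here for illustration; they are not part of Paddle or CTest.

```shell
# Hypothetical helper: run a command a fixed number of times and count failures.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
  cf_runs=$1
  shift
  cf_failures=0
  cf_i=1
  while [ "$cf_i" -le "$cf_runs" ]; do
    # Capture stdout and stderr of this run; keep the log only if the run failed.
    if "$@" > "run_${cf_i}.log" 2>&1; then
      rm -f "run_${cf_i}.log"
    else
      cf_failures=$((cf_failures + 1))
    fi
    cf_i=$((cf_i + 1))
  done
  echo "${cf_failures}/${cf_runs} runs failed"
}

# Example, run from inside Paddle/build (without set -e):
# count_failures 5000 ctest -R test_standalone_executor --output-on-failure
```

Each failing iteration leaves its own log file behind, so the crash output of several failures can be compared afterwards.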
- Problem description: please describe your problem in detail and include the error message and the key log/code snippets
test_standalone_executor fails randomly. The error log is as follows:
+ for i in `seq 1 5000`
+ ctest -R test_standalone_executor --output-on-failure
Test project /home/Paddle/build
Start 1242: test_standalone_executor
1/1 Test #1242: test_standalone_executor .........***Failed 6.58 sec
W0330 16:28:45.547430 26695 init.cc:202] AVX is available, Please re-compile on local machine
W0330 16:28:45.693004 26695 init.cc:202] AVX is available, Please re-compile on local machine
W0330 16:28:45.693174 26695 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0330 16:28:45.697403 26695 device_context.cc:465] device: 0, cuDNN Version: 8.1.
W0330 16:28:47.651964 26695 init.cc:202] AVX is available, Please re-compile on local machine
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1648657727 (unix time) try "date -d @1648657727" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 26695 (TID 0x7fb851bfc700) from PID 0 ***]
Segmentation fault
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 6.63 sec
The following tests FAILED:
1242 - test_standalone_executor (Failed)
Errors while running CTest
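The TimeInfo line in the log carries the crash time as a Unix timestamp. The sketch below extracts it from a saved log and prints it in UTC so it can be lined up with the glog timestamps. The function name abort_time_utc and the file name ctest.log are hypothetical, and date -d requires GNU date, as the log message itself notes.

```shell
# Hypothetical helper: read a log on stdin, find the first "Aborted at <unix time>"
# entry, and print that time in UTC. Returns non-zero if no such entry exists.
abort_time_utc() {
  at_ts=$(sed -n 's/.*Aborted at \([0-9][0-9]*\) (unix time).*/\1/p' | head -n 1)
  [ -n "$at_ts" ] || return 1
  # GNU date: "@N" means N seconds since the epoch.
  date -u -d "@${at_ts}" '+%Y-%m-%d %H:%M:%S UTC'
}

# Example with a saved log (ctest.log is an assumed file name):
# abort_time_utc < ctest.log
```

For the timestamp in the log above, 1648657727, this prints 2022-03-30 16:28:47 UTC. If the container clock is set to UTC, that is the same second as the last glog warning before the crash.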