Closed
Description
On a ppc64el system with this kernel and a WX7100 (Polaris) card, loading the amdgpu module results in a kernel oops. Note that the upstream Linux 4.15 amdgpu module works and allows a full graphical environment to load; the oops is specific to the AMD 4.13 kernel. Oops follows:
[ 89.848698] checking generic (600c280010000 500000) vs hw (6000000000000 10000000)
[ 89.848800] amdgpu 0000:01:00.0: enabling device (0140 -> 0142)
[ 89.915446] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67C4 0x1002:0x0B0D 0x00).
[ 89.965406] [drm] register mmio base: 0x00000000
[ 89.965458] [drm] register mmio size: 262144
[ 89.965502] [drm] PCI I/O BAR is not found.
[ 89.965540] [drm] probing gen 2 caps for device 1014:4c1 = 300104/180001e
[ 89.965584] [drm] probing mlw for device 1014:4c1 = 300104
[ 89.965631] [drm] UVD is enabled in VM mode
[ 89.965658] [drm] VCE enabled in VM mode
[ 90.299090] [drm] PCI I/O BAR is not found. Using MMIO to access ATOM BIOS
[ 90.299092] ATOM BIOS: 113-C9540101-100
[ 90.299103] [drm] GPU post is not needed
[ 90.299130] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
[ 90.299147] amdgpu: No suitable DMA available
[ 92.836890] amdgpu 0000:01:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 92.836969] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 92.837021] [drm] Detected VRAM RAM=8192M, BAR=256M
[ 92.837056] [drm] RAM width 256bits GDDR5
[ 92.837183] [TTM] Zone kernel: Available graphics memory: 7471346 kiB
[ 92.837227] [TTM] Initializing pool allocator
[ 92.837289] [drm] amdgpu: 8192M of VRAM memory ready
[ 92.837325] [drm] amdgpu: 8192M of GTT memory ready.
[ 92.837383] [drm] GART: num cpu pages 65536, num gpu pages 65536
[ 92.837555] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[ 92.837607] amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
[ 92.837651] amdgpu 0000:01:00.0: (-12) create WB bo failed
[ 92.837829] [drm:amdgpu_device_init [amdgpu]] *ERROR* amdgpu_wb_init failed -12
[ 92.837912] amdgpu 0000:01:00.0: amdgpu_init failed
[ 92.838002] Unable to handle kernel paging request for data at address 0xc00c000085a80000
[ 92.838066] Faulting instruction address: 0xc008000005a2f1cc
[ 92.838122] Oops: Kernel access of bad area, sig: 11 [#1]
[ 92.838166] SMP NR_CPUS=2048
[ 92.838168] NUMA
[ 92.838200] PowerNV
[ 92.838257] Modules linked in: amdgpu(+) mfd_core ttm drm_kms_helper drm syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_algo_bit i2c_dev ghash_generic gf128mul ecb snd_hda_codec_hdmi snd_hda_intel xts snd_hda_codec joydev ofpart ctr evdev ipmi_powernv powernv_flash ipmi_devintf cbc snd_hda_core vmx_crypto mtd snd_hwdep ipmi_msghandler at24 opal_prd binfmt_misc snd_aloop snd_pcm snd_timer snd soundcore parport_pc lp parport ip_tables x_tables autofs4 nfsv3 nfs_acl nfs lockd grace sunrpc fscache hid_generic usbhid hid xhci_pci xhci_hcd usbcore tg3 ptp pps_core libphy
[ 92.838719] CPU: 0 PID: 971 Comm: kworker/0:1 Not tainted 4.13.0+ #1
[ 92.838778] Workqueue: events work_for_cpu_fn
[ 92.838823] task: c0000001d35c4700 task.stack: c0000001d35c8000
[ 92.838876] NIP: c008000005a2f1cc LR: c0080000059a036c CTR: c008000005a2f178
[ 92.838940] REGS: c0000001d35cb4c0 TRAP: 0300 Not tainted (4.13.0+)
[ 92.838993] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 92.839003] CR: 28002288 XER: 20040000
[ 92.839079] CFAR: c008000005a2f1ac DAR: c00c000085a80000 DSISR: 42000000 SOFTE: 1
GPR00: c0080000059a036c c0000001d35cb740 c008000005c5bde0 c000000009be0000
GPR04: c00c000085a80000 0000000000000000 0000000000080000 0000000000000000
GPR08: 0000000000000001 c008000005a2f178 0000000000000001 c008000004a1e5d8
GPR12: c008000005a2f178 c00000000fb80000 c000000000128568 c000000009be2f20
GPR16: c000000009be2f28 c000000009be2f18 c000000009be2f38 c000000009be2f40
GPR20: c000000009be2f30 0000000000008000 0000000000000400 c000000009be2f38
GPR24: c000000009be2f40 c000000009be2f30 c000000009be2f18 0000000000000000
GPR28: 0000000000000000 0000000000000000 c00c000085a80000 0000000000080000
[ 92.839738] NIP [c008000005a2f1cc] gmc_v8_0_gart_set_pte_pde+0x54/0x90 [amdgpu]
[ 92.839914] LR [c0080000059a036c] amdgpu_gart_unbind+0xa4/0x130 [amdgpu]
[ 92.839968] Call Trace:
[ 92.839992] [c0000001d35cb740] [c000000009be2720] 0xc000000009be2720 (unreliable)
[ 92.840138] [c0000001d35cb780] [c0080000059a036c] amdgpu_gart_unbind+0xa4/0x130 [amdgpu]
[ 92.840290] [c0000001d35cb800] [c0080000059a06e8] amdgpu_gart_fini+0x40/0x70 [amdgpu]
[ 92.840447] [c0000001d35cb830] [c008000005a30b98] gmc_v8_0_sw_fini+0x50/0x90 [amdgpu]
[ 92.840593] [c0000001d35cb860] [c00800000597f1d0] amdgpu_fini+0x208/0x560 [amdgpu]
[ 92.840741] [c0000001d35cb910] [c008000005985b5c] amdgpu_device_init+0xcc4/0x1590 [amdgpu]
[ 92.840889] [c0000001d35cba30] [c0080000059880fc] amdgpu_driver_load_kms+0xb4/0x2d0 [amdgpu]
[ 92.840976] [c0000001d35cbab0] [c0080000044cab7c] drm_dev_register+0x1d4/0x290 [drm]
[ 92.841121] [c0000001d35cbb50] [c00800000597d880] amdgpu_pci_probe+0x128/0x1f0 [amdgpu]
[ 92.841228] [c0000001d35cbbd0] [c0000000005d851c] local_pci_probe+0x6c/0x140
[ 92.841296] [c0000001d35cbc60] [c0000000001199d8] work_for_cpu_fn+0x38/0x60
[ 92.843968] [c0000001d35cbc90] [c00000000011ead8] process_one_work+0x248/0x520
[ 92.848119] [c0000001d35cbd30] [c00000000011f030] worker_thread+0x280/0x5d0
[ 92.851012] [c0000001d35cbdc0] [c00000000012870c] kthread+0x1ac/0x1c0
[ 92.851102] [c0000001d35cbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c
[ 92.851209] Instruction dump:
[ 92.851231] 7cdf3378 7c9e2378 7cbd2b78 7cfc3b78 48000008 e8410018 7be6c6c4 7bbd1828
[ 92.852725] 78c64602 7fdeea14 7cc6e378 7c0004ac <f8de0000> 39200001 38600000 992d019c
[ 92.852815] ---[ end trace 2915333da62340c0 ]---
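For anyone reading the trace: the sequence in the log appears to be an allocation failure (-12) in amdgpu_wb_init, followed by the teardown path faulting inside gmc_v8_0_gart_set_pte_pde. The sketch below is only a reading aid, not a diagnosis. It decodes the errno and pulls the amdgpu frames out of an excerpt of the log above; the helper names `errno_names` and `call_chain` are made up for illustration.

```python
import errno
import re

# Excerpt copied from the oops above (timestamps trimmed).
LOG = """\
amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
amdgpu 0000:01:00.0: (-12) create WB bo failed
[drm:amdgpu_device_init [amdgpu]] *ERROR* amdgpu_wb_init failed -12
NIP [c008000005a2f1cc] gmc_v8_0_gart_set_pte_pde+0x54/0x90 [amdgpu]
[c0000001d35cb780] [c0080000059a036c] amdgpu_gart_unbind+0xa4/0x130 [amdgpu]
[c0000001d35cb800] [c0080000059a06e8] amdgpu_gart_fini+0x40/0x70 [amdgpu]
[c0000001d35cb830] [c008000005a30b98] gmc_v8_0_sw_fini+0x50/0x90 [amdgpu]
[c0000001d35cb860] [c00800000597f1d0] amdgpu_fini+0x208/0x560 [amdgpu]
[c0000001d35cb910] [c008000005985b5c] amdgpu_device_init+0xcc4/0x1590 [amdgpu]
"""


def errno_names(log):
    """Map each negative error code in the log to its symbolic errno name."""
    codes = set()
    for m in re.finditer(r"\(-(\d+)\)|failed -(\d+)", log):
        codes.add(int(m.group(1) or m.group(2)))
    return {c: errno.errorcode.get(c, "?") for c in sorted(codes)}


def call_chain(log):
    """Return amdgpu symbols in the order they appear (innermost frame first)."""
    return re.findall(r"\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[amdgpu\]", log)


if __name__ == "__main__":
    print(errno_names(log=LOG))
    # Print outermost caller first so the init -> fini -> fault order is visible.
    print(" -> ".join(reversed(call_chain(LOG))))
```

Read outermost-first, the chain shows amdgpu_device_init calling into amdgpu_fini after the failed init, which then reaches the GART unbind code where the fault occurs.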
EDIT: lspci output for the AMD card:
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100]
Flags: fast devsel, IRQ 24, NUMA node 0
Memory at 6000000000000 (64-bit, prefetchable) [size=256M]
Memory at 6000010000000 (64-bit, prefetchable) [size=2M]
I/O ports at <unassigned> [disabled]
Memory at 600c000000000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 600c000040000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] #15
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
Capabilities: [370] L1 PM Substates
Kernel driver in use: amdgpu
Kernel modules: amdgpu
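As a sanity check on the numbers, the address ranges in the dmesg output can be compared with the sizes the driver and lspci report. This is a small sketch using only values copied from the log above; `span_mib` is a hypothetical helper name.

```python
MIB = 1 << 20


def span_mib(start, end):
    """Size in MiB of an inclusive address range [start, end]."""
    return (end - start + 1) // MIB


# VRAM: 0x000000F400000000 - 0x000000F5FFFFFFFF (log reports 8192M)
vram_mib = span_mib(0xF400000000, 0xF5FFFFFFFF)

# GTT: 0x0000000000000000 - 0x000000000FFFFFFF (log reports 256M)
gtt_mib = span_mib(0x0, 0x0FFFFFFF)

# "vs hw (6000000000000 10000000)": the second value is a size in hex,
# which should match the 256M BAR shown by lspci and "BAR=256M" in dmesg.
hw_aperture_mib = 0x10000000 // MIB

if __name__ == "__main__":
    print(vram_mib, gtt_mib, hw_aperture_mib)
```

The ranges agree with the reported VRAM size, GTT size, and the 256M prefetchable BAR, so the BAR assignment itself does not look wrong from these numbers alone.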