### 🐛 Describe the bug
Problem Description:
When working with `torch_geometric.data.Data` objects loaded from `TUDataset` (specifically, the `ENZYMES` dataset), the `__repr__` (the string representation shown when printing a `Data` object) displays a value for the `y` attribute that contradicts the actual value stored in the `y` tensor.
For example, printing `pyg_dataset[idx]` produces output that includes `y=[1]`. However, retrieving the actual value with `pyg_dataset[idx].y.item()` returns `4`. This discrepancy is misleading because the printed representation does not reflect the true data.
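For context when reading `y=[1]`: PyG's `Data.__repr__` summarizes tensor-valued attributes by their size rather than their contents. The pure-Python sketch below mimics that style of summary so the two readings can be told apart. The helpers `shape_of` and `summarize` are hypothetical, and nested lists stand in for tensors; this is an illustration, not PyG's implementation.

```python
from typing import Any, Dict, List


def shape_of(value: Any) -> List[int]:
    """Hypothetical helper: shape of a nested list treated as a dense tensor."""
    shape: List[int] = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def summarize(attrs: Dict[str, Any]) -> str:
    """Hypothetical shape-only repr: each attribute shows its size, not its contents."""
    parts = [f"{key}={shape_of(val)}" for key, val in attrs.items()]
    return "Data(" + ", ".join(parts) + ")"


# A one-element label holding the class 4 is summarized as y=[1].
graph = {"x": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], "y": [4]}
print(summarize(graph))  # Data(x=[2, 3], y=[1])
```

Under this style of summary, `y=[1]` would describe the label tensor's shape, which is consistent with `y.item()` returning `4`.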
Steps to Reproduce:
- Ensure PyTorch Geometric and its dependencies are installed (e.g., in a Google Colab environment).
- Load the `ENZYMES` dataset.
- Access a specific graph from the dataset (e.g., at index 100).
- Print the `Data` object to see its string representation.
- Directly print the raw `y` tensor of the `Data` object.
- Retrieve and print the scalar value of the `y` tensor using `.item()`.
```python
import torch
import os
from torch_geometric.datasets import TUDataset

# Simulate a non-Gradescope environment for local testing if needed
# if 'IS_GRADESCOPE_ENV' in os.environ:
#     del os.environ['IS_GRADESCOPE_ENV']

# Load the ENZYMES dataset
# Ensure you've run the necessary pip installs beforehand (torch_geometric, ogb)
root = './enzymes_data_bug_report'  # Use a distinct path
name = 'ENZYMES'
pyg_dataset = TUDataset(root, name)
print(f"Dataset loaded: {pyg_dataset}")

# The index where the discrepancy was observed
idx = 100
graph_data = pyg_dataset[idx]
print(f"\n--- Analysis for graph at index {idx} ---")
print(f"1. Print of the Data object (pyg_dataset[{idx}]): {graph_data}")
print("   (Expected 'y=' in the output above to be consistent with the actual value)")

# Directly check the 'y' tensor and its type
y_tensor = graph_data.y
print(f"2. Raw 'y' tensor: {y_tensor}")
print(f"3. Type of 'y' tensor: {type(y_tensor)}")

# Get the scalar value from the 'y' tensor
y_item_value = y_tensor.item()
print(f"4. Value obtained from y.item(): {y_item_value}")

print("\n--- Discrepancy Highlight ---")
print("Observed 'y' in the Data object's print (__repr__): y=[1] (This is misleading)")
print(f"Actual 'y' tensor value (from y.item()): {y_item_value}")
if y_item_value != 1:
    print("!!! WARNING: The displayed 'y' in the Data object's __repr__ does NOT match the actual tensor value. !!!")
```
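To make the comparison in the script above mechanical, the bracketed list for a field can be pulled out of the printed string and checked against both the tensor's shape and its value. The sketch below is pure Python; `repr_field` is a hypothetical helper, and `sample` is a hypothetical repr string with made-up sizes, not the actual output for index 100.

```python
import re
from typing import List, Optional


def repr_field(repr_str: str, name: str) -> Optional[List[int]]:
    """Hypothetical helper: return the integers inside `name=[...]`, or None if absent."""
    match = re.search(rf"\b{re.escape(name)}=\[([^\]]*)\]", repr_str)
    if match is None:
        return None
    inner = match.group(1).strip()
    return [int(part) for part in inner.split(",")] if inner else []


sample = "Data(edge_index=[2, 10], x=[5, 3], y=[1])"  # hypothetical repr string
y_shape = [1]  # shape of a one-element label tensor
y_value = 4    # the label reported by y.item()

shown = repr_field(sample, "y")
print(shown == y_shape)    # True: the bracketed list matches the shape
print(shown == [y_value])  # False: it does not match the stored value
```

The `\b` word boundary keeps a lookup for `x` from matching the tail of `edge_index`.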
### Versions
The following user warnings about `torch-scatter` and `torch-sparse` import issues were observed. They might be related to the CUDA environment or to package compilation, but the core issue (the `y`-value display discrepancy) persists regardless.

```
/usr/local/lib/python3.11/dist-packages/torch_geometric/typing.py:86: UserWarning: An issue occurred while importing 'torch-scatter'. Disabling its usage. Stacktrace: /usr/local/lib/python3.11/dist-packages/torch_scatter/_scatter_cuda.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb
  warnings.warn(f"An issue occurred while importing 'torch-scatter'. "
/usr/local/lib/python3.11/dist-packages/torch_geometric/typing.py:124: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: /usr/local/lib/python3.11/dist-packages/torch_sparse/_spmm_cuda.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb
  warnings.warn(f"An issue occurred while importing 'torch-sparse'. "
```
- PyTorch version: 2.4.0+cu121
- PyTorch Geometric version: 2.6.1
- Python: 3.11.13
- Operating System: Google Colab (Free Tier)