
Data object __repr__ shows incorrect 'y' value for TUDataset (e.g., ENZYMES) #10372

@hahaxiao365


### 🐛 Describe the bug

Problem Description:
When working with `torch_geometric.data.Data` objects loaded from `TUDataset` (specifically, the ENZYMES dataset), the `__repr__` output (the string shown when printing a `Data` object) contains a `y` entry that contradicts the value actually stored in the `y` tensor.

For example, printing `pyg_dataset[idx]` shows `y=[1]`, but `pyg_dataset[idx].y.item()` returns `4`. This discrepancy is misleading, because the printed representation does not reflect the true data.
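For reference when triaging: in PyG's default formatting, `Data.__repr__` renders a tensor attribute with one or more dimensions by its size (`list(tensor.size())`) and only a 0-dim tensor by its value. Under that assumption, `y=[1]` describes the shape of a one-element label tensor, not its contents. The torch-only sketch below imitates that rule; `format_attr` is a hypothetical helper written for illustration, not a PyG API.

```python
import torch


def format_attr(key: str, value: torch.Tensor) -> str:
    """Hypothetical helper imitating how PyG's Data.__repr__ renders a tensor attribute."""
    if value.dim() == 0:
        # A 0-dim (scalar) tensor is rendered by its value.
        return f"{key}={value.item()}"
    # A tensor with one or more dimensions is rendered by its shape.
    return f"{key}={list(value.size())}"


# A graph-level label stored as a one-element tensor, as in the report (value 4).
y = torch.tensor([4])
print(format_attr("y", y))  # y=[1]  (shape of the tensor)
print(y.item())             # 4      (the stored label)
```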

Steps to Reproduce:

  1. Ensure PyTorch Geometric and its dependencies are installed (e.g., in a Google Colab environment).
  2. Load the ENZYMES dataset.
  3. Access a specific graph from the dataset (e.g., at index 100).
  4. Print the Data object to see its string representation.
  5. Directly print the raw y tensor of the Data object.
  6. Retrieve and print the scalar value of the y tensor using .item().

```python
import torch
import os
from torch_geometric.datasets import TUDataset

# Simulate a non-Gradescope environment for local testing if needed
# if 'IS_GRADESCOPE_ENV' in os.environ:
#     del os.environ['IS_GRADESCOPE_ENV']

# Load the ENZYMES dataset
# Ensure you've run necessary pip installs beforehand (torch_geometric, ogb)
root = './enzymes_data_bug_report'  # Use a distinct path
name = 'ENZYMES'
pyg_dataset = TUDataset(root, name)

print(f"Dataset loaded: {pyg_dataset}")

# The index where the discrepancy was observed
idx = 100
graph_data = pyg_dataset[idx]

print(f"\n--- Analysis for graph at index {idx} ---")
print(f"1. Print of the Data object (pyg_dataset[{idx}]): {graph_data}")
print("   (Expected 'y=' in output above to be consistent with actual value)")

# Directly check the 'y' tensor and its type
y_tensor = graph_data.y
print(f"2. Raw 'y' tensor: {y_tensor}")
print(f"3. Type of 'y' tensor: {type(y_tensor)}")

# Get the item value from 'y' tensor
y_item_value = y_tensor.item()
print(f"4. Value obtained from y.item(): {y_item_value}")

print("\n--- Discrepancy Highlight ---")
print("Observed 'y' in Data object's print (__repr__): (This is misleading)")
print(f"Actual 'y' tensor value (from y.item()): {y_item_value}")
if y_item_value != 1:
    print("!!! WARNING: The displayed 'y' in Data object's __repr__ does NOT match the actual tensor value. !!!")
```

### Versions

The following `UserWarning`s about importing `torch-scatter` and `torch-sparse` were also observed. They may be related to the CUDA environment or to how those packages were compiled, but the `y` display discrepancy persists regardless:

```
/usr/local/lib/python3.11/dist-packages/torch_geometric/typing.py:86: UserWarning: An issue occurred while importing 'torch-scatter'. Disabling its usage. Stacktrace: /usr/local/lib/python3.11/dist-packages/torch_scatter/_scatter_cuda.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb
  warnings.warn(f"An issue occurred while importing 'torch-scatter'. "
/usr/local/lib/python3.11/dist-packages/torch_geometric/typing.py:124: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: /usr/local/lib/python3.11/dist-packages/torch_sparse/_spmm_cuda.so: undefined symbol: _ZN2at23SavedTensorDefaultHooks11set_tracingEb
  warnings.warn(f"An issue occurred while importing 'torch-sparse'. "
```

PyTorch version: 2.4.0+cu121
PyTorch Geometric version: 2.6.1
Python 3.11.13
Operating System: Google Colab (Free Tier)
