Description
🐛 Describe the bug
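I tried running scripts/unshard.py with --safe-tensors on a sharded checkpoint. The first attempt failed only because I launched it from the logs directory. Running it from the repository root fails with an ImportError raised while importing from olmo_core: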
(base) tchakrabarty@ip-10-3-75-209:~/OLMo/logs$ python3 scripts/unshard.py --safe-tensors "/fsx/home/tchakrabarty/OLMo/fiction-history-midtrain/step11931" "/fsx/home/tchakrabarty/OLMo/fiction-history-midtrain/step11931_unsharded"
python3: can't open file '/fsx/home/tchakrabarty/OLMo/logs/scripts/unshard.py': [Errno 2] No such file or directory
(base) tchakrabarty@ip-10-3-75-209:~/OLMo/logs$ cd ..
(base) tchakrabarty@ip-10-3-75-209:~/OLMo$ python3 scripts/unshard.py --safe-tensors "/fsx/home/tchakrabarty/OLMo/fiction-history-midtrain/step11931" "/fsx/home/tchakrabarty/OLMo/fiction-history-midtrain/step11931_unsharded"
Traceback (most recent call last):
  File "/fsx/home/tchakrabarty/OLMo/scripts/unshard.py", line 108, in <module>
    main(
    ~~~~^
        args.input_dir,
        ^^^^^^^^^^^^^^^
    ...<4 lines>...
        use_shared_mem_impl=args.use_legacy_shared_mem_impl,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/fsx/home/tchakrabarty/OLMo/scripts/unshard.py", line 36, in main
    model_state_dict, optim_state_dict, trainer_state_dict = checkpointer.unshard_checkpoint(
                                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        input_dir,
        ^^^^^^^^^^
        load_optimizer_state=not model_only,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        load_trainer_state=not model_only,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/fsx/home/tchakrabarty/OLMo/olmo/checkpoint.py", line 2002, in unshard_checkpoint
    from olmo_core.distributed.checkpoint import ( # type: ignore
    ...<2 lines>...
    )
ImportError: cannot import name 'unshard_model_state' from 'olmo_core.distributed.checkpoint' (/fsx/home/tchakrabarty/miniconda3/lib/python3.13/site-packages/olmo_core/distributed/checkpoint/__init__.py)
Versions
Python 3.13.2
accelerate==1.8.1
-e git+https://github.com/allenai/OLMo.git@f3dff833c880add075b123df9ddc31423086ef31#egg=ai2_olmo
ai2-olmo-core==2.1.0
ai2-olmo-eval==0.7.1
aiobotocore==2.23.0
aiohappyeyeballs==2.6.1
aiohttp==3.12.13
aioitertools==0.12.0
aiosignal==1.3.2
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1743190139881/work
annotated-types @ file:///work/perseverance-python-buildout/croot/annotated-types_1728386958747/work
antlr4-python3-runtime==4.9.3
archspec @ file:///croot/archspec_1709217642129/work
attrs==25.3.0
beaker-gantry==2.6.2
beaker-py==2.4.4
black==23.12.1
boltons @ file:///croot/boltons_1737061692168/work
boto3==1.38.41
botocore==1.38.46
Brotli @ file:///croot/brotli-split_1736182456865/work
build==1.2.2.post1
cached_path==1.7.3
cachetools==5.5.2
certifi @ file:///croot/certifi_1745939216646/work/certifi
cffi @ file:///croot/cffi_1736182485317/work
charset-normalizer @ file:///croot/charset-normalizer_1721748349566/work
click==8.2.1
click-help-colors==0.9.4
click-option-group==0.5.7
conda @ file:///croot/conda_1743715288469/work/conda-src
conda-anaconda-telemetry @ file:///croot/conda-anaconda-telemetry_1744662537425/work
conda-anaconda-tos @ file:///croot/conda-anaconda-tos_1744823864153/work
conda-content-trust @ file:///work/perseverance-python-buildout/croot/conda-content-trust_1728487058234/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1745607008911/work/src
conda-package-handling @ file:///croot/conda-package-handling_1731369017509/work
conda_package_streaming @ file:///croot/conda-package-streaming_1731366181659/work
cryptography @ file:///croot/cryptography_1740577825284/work
Cython==3.1.2
datasets==3.6.0
dill==0.3.8
distro @ file:///work/perseverance-python-buildout/croot/distro_1728396110052/work
docutils==0.21.2
einops==0.8.1
face==24.0.0
filelock==3.18.0
flash_attn==2.8.0.post2
frozendict @ file:///work/perseverance-python-buildout/croot/frozendict_1728497542215/work
frozenlist==1.7.0
fsspec==2025.3.0
ftfy==6.3.1
gitdb==4.0.12
GitPython==3.1.44
glom==24.11.0
google-api-core==2.25.1
google-auth==2.40.3
google-cloud-core==2.4.3
google-cloud-storage==2.19.0
google-crc32c==1.7.1
google-resumable-media==2.7.2
googleapis-common-protos==1.70.0
grpcio==1.73.0
hf-xet==1.1.5
hf_transfer==0.1.9
huggingface-hub==0.33.0
id==1.5.0
idna @ file:///work/perseverance-python-buildout/croot/idna_1728385935861/work
importlib_resources==6.5.2
iniconfig==2.1.0
isort==5.12.0
jaraco.classes==3.4.0
jaraco.context==6.0.1
jaraco.functools==4.2.1
jeepney==0.9.0
Jinja2==3.1.6
jmespath==1.0.1
joblib==1.5.1
jsonpatch @ file:///work/perseverance-python-buildout/croot/jsonpatch_1728399595941/work
jsonpointer==2.1
keyring==25.6.0
libmambapy @ file:///croot/mamba-split_1734469461757/work/libmambapy
lightning-utilities==0.14.3
markdown-it-py @ file:///work/perseverance-python-buildout/croot/markdown-it-py_1728387994477/work
MarkupSafe==3.0.2
mdurl @ file:///work/perseverance-python-buildout/croot/mdurl_1728387286143/work
menuinst @ file:///croot/menuinst_1738943416351/work
more-itertools==10.7.0
mpmath==1.3.0
msgspec==0.19.0
multidict==6.5.0
multiprocess==0.70.16
mypy==1.3.0
mypy_extensions==1.1.0
necessary==0.4.3
networkx==3.5
nh3==0.2.21
numpy==1.26.4
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
omegaconf==2.3.0
packaging @ file:///croot/packaging_1734472117206/work
pandas==2.3.0
pathspec==0.12.1
petname==2.6
platformdirs @ file:///croot/platformdirs_1744273042065/work
pluggy @ file:///croot/pluggy_1733169602837/work
propcache==0.3.2
proto-plus==1.26.1
protobuf==5.29.5
psutil==7.0.0
pyarrow==20.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.2
pycosat @ file:///croot/pycosat_1736868416091/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic @ file:///croot/pydantic_1734736067156/work
pydantic_core @ file:///croot/pydantic-core_1734726052986/work
Pygments @ file:///croot/pygments_1744664109463/work
pyproject_hooks==1.2.0
PySocks @ file:///work/perseverance-python-buildout/croot/pysocks_1728386193338/work
pytest==8.4.1
pytest-sphinx==0.6.3
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.2
readme_renderer==44.0
regex==2024.11.6
requests @ file:///croot/requests_1730999120400/work
requests-toolbelt==1.0.0
requirements-parser==0.13.0
rfc3986==2.0.0
rich @ file:///croot/rich_1732638981168/work
rsa==4.9.1
ruamel.yaml @ file:///croot/ruamel.yaml_1745960305322/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1745937152469/work
ruff==0.12.0
s3fs==2025.5.1
s3transfer==0.13.0
safetensors==0.5.3
scikit-learn==1.7.0
scipy==1.16.0
SecretStorage==3.3.3
sentry-sdk==2.30.0
setproctitle==1.3.6
setuptools==78.1.1
six==1.17.0
smart-open==7.1.0
smashed==0.21.5
smmap==5.0.2
sympy==1.14.0
threadpoolctl==3.6.0
tokenizers==0.21.1
torch==2.7.1
torchmetrics==1.7.3
tqdm @ file:///croot/tqdm_1738943501192/work
transformers @ file:///tmp/tmpujqajz3k
triton==3.3.1
trouting==0.3.3
truststore @ file:///croot/truststore_1736550121485/work
twine==6.1.0
typing_extensions @ file:///croot/typing_extensions_1734714854207/work
tzdata==2025.2
urllib3 @ file:///croot/urllib3_1737133630106/work
wandb==0.20.1
wcwidth==0.2.13
wheel==0.45.1
wrapt==1.17.2
xxhash==3.5.0
yarl==1.20.1
zstandard @ file:///croot/zstandard_1731356346222/work