This repository was archived by the owner on Nov 1, 2024. It is now read-only.

Commit 48b9b6c

Docs

1 parent 0118e3c commit 48b9b6c

File tree

1 file changed: +10 −1 lines changed

projects/OPT/download_opt175b.md

Lines changed: 10 additions & 1 deletion
@@ -3,7 +3,7 @@
 After receiving an email with a presigned URL to access the model weights, follow the below set of instructions to get started with hosting the model.
 
 ## Download all shards
-Since we trained OPT-175B on 124 hosts, we have 124*8 = 992 files corresponding to the model parameters (8x tensor parallelism). The pre-signed URL that you receive in your email will look something like the following:
+Since we trained OPT-175B on 124 hosts, we have 124\*8 = 992 files corresponding to the model parameters (8x tensor parallelism). The pre-signed URL that you receive in your email will look something like the following:
 
 ```
 https://<cloudfront_url>/175b/checkpoint_last_20220503/stubbed.pt?&<super_long_query_string>
@@ -20,6 +20,15 @@ To consolidate the 992 shards into 8 files model-parallel evaluation, run (assum
 bash metaseq/scripts/reshard_sbatch.sh <directory_where_all_the_shards_are>/checkpoint_last <output_dir>/ 8 1
 ```

+Note that most of our models expect to run with Model (Tensor) Parallelism. For smaller models, some
+users may find it easier to eliminate model parallelism. The checkpoints can be converted
+to eliminate use of MP with the `consolidate_fsdp_shards.py` script:
+
+```bash
+python -m metaseq.scripts.consolidate_fsdp_shards ${FOLDER_PATH}/checkpoint_last --new-arch-name transformer_lm_gpt --save-prefix ${FOLDER_PATH}/consolidated
+```
+
 ## Run the API
 Follow the instructions in the [API docs](../../docs/api.md) to spin up the API. You will need to update the constants in `metaseq/service/constants.py` to point to the right directories.

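The shard count in the diff above (124 hosts × 8-way tensor parallelism = 992 files) can be made concrete with a short enumeration. This is a minimal sketch only: the shard-file naming scheme below is a hypothetical placeholder for illustration, and the base URL and query string are the elided placeholders from the doc — the real names arrive with your presigned-URL email.

```python
# Sketch: enumerate one URL per shard file (124 hosts x 8-way tensor parallelism).
# The filename pattern is an assumption for illustration, NOT metaseq's real scheme;
# BASE and QUERY reuse the placeholders shown in the doc.
BASE = "https://<cloudfront_url>/175b/checkpoint_last_20220503"
QUERY = "<super_long_query_string>"

def shard_urls(num_hosts: int = 124, mp_parts: int = 8) -> list:
    """Return one URL per shard file (num_hosts * mp_parts in total)."""
    urls = []
    for host in range(num_hosts):
        for part in range(mp_parts):
            # Hypothetical naming: one file per (host, model-parallel part).
            fname = "checkpoint_last-model_part-{}-shard{}.pt".format(part, host)
            urls.append("{}/{}?&{}".format(BASE, fname, QUERY))
    return urls

urls = shard_urls()
print(len(urls))  # 124 * 8 = 992, matching the count stated in the doc
```

Each URL could then be fetched with any HTTP client; the point of the sketch is only the 992-file bookkeeping.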
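The "Run the API" step says to update constants in `metaseq/service/constants.py` but the diff does not show which ones. As a hedged sketch, the edits tend to look like the following — the constant names and values here are assumptions for illustration and should be checked against the file in your checkout:

```python
# Hypothetical sketch of the kind of constants to point at your directories
# in metaseq/service/constants.py. Names and values are assumptions, not
# verified against any particular metaseq revision.
MODEL_PARALLEL = 8        # matches the 8-way resharding step above
TOTAL_WORLD_SIZE = 8      # one GPU per model-parallel part
CHECKPOINT_FOLDER = "/path/to/resharded/checkpoints"  # the <output_dir> from reshard_sbatch.sh
```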