
Commit b5b8745

Authored by iofu728, liyucheng09, Starmys, and mydmdm
Feature(MInference): support llama 3.1 (#54)
Co-authored-by: Yucheng Li <[email protected]>
Co-authored-by: Chengruidong Zhang <[email protected]>
Co-authored-by: Yuqing Yang <[email protected]>
1 parent ddfb462 commit b5b8745

File tree: 5 files changed (+10 lines, -3 lines)


README.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,7 @@ https://github.com/microsoft/MInference/assets/30883354/52613efc-738f-4081-8367-
 _Now, you can process **1M context 10x faster in a single A100** using Long-context LLMs like LLaMA-3-8B-1M, GLM-4-1M, with even **better accuracy**, try **MInference 1.0** right now!_
 
 ## News
+- 🥤 [24/07/24] MInference now supports [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
 - 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
 - 📃 [24/07/03] Due to an issue with arXiv, the PDF is currently unavailable there. You can find the paper at this [link](https://export.arxiv.org/pdf/2407.02490).
 - 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
@@ -60,6 +61,7 @@ get_support_models()
 ```
 
 Currently, we support the following LLMs:
+- LLaMA-3.1: [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
 - LLaMA-3: [gradientai/Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k), [gradientai/Llama-3-8B-Instruct-Gradient-4194k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-4194k)
 - GLM-4: [THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)
 - Yi: [01-ai/Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K)
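
The README hunk above registers LLaMA-3.1 in the supported-model list reported by `get_support_models()`. As a quick illustration, here is a minimal sketch of loading the newly supported checkpoint with the MInference patch; it follows the quick-start pattern from the repository README, and the exact `MInference("minference", model_name)` call should be treated as an assumption carried over from that README rather than something introduced by this diff.

```python
# Minimal sketch, assuming the quick-start pattern from the MInference README:
# build a standard transformers pipeline, then patch its model with MInference
# so that long-context prefilling uses the pre-searched sparse attention pattern.
from transformers import pipeline
from minference import MInference

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
pipe = pipeline(
    "text-generation", model=model_name, torch_dtype="auto", device_map="auto"
)

# Apply the MInference patch; the pattern config added in this commit is
# looked up by model name.
minference_patch = MInference("minference", model_name)
pipe.model = minference_patch(pipe.model)

print(pipe("Summarize the following document: ...", max_new_tokens=32))
```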

minference/configs/Llama_3.1_8B_Instruct_128k_kv_out_v32_fit_o_best_pattern.json

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

minference/configs/model2path.py

Lines changed: 3 additions & 0 deletions
@@ -26,6 +26,9 @@
     "THUDM/glm-4-9b-chat-1m": os.path.join(
         BASE_DIR, "GLM_4_9B_1M_instruct_kv_out_v32_fit_o_best_pattern.json"
     ),
+    "meta-llama/Meta-Llama-3.1-8B-Instruct": os.path.join(
+        BASE_DIR, "Llama_3.1_8B_Instruct_128k_kv_out_v32_fit_o_best_pattern.json"
+    ),
 }
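
The entry added above wires the new checkpoint into the config registry in `minference/configs/model2path.py`, which maps a Hugging Face model ID to the offline-searched sparse-pattern JSON shipped with the package. The sketch below only illustrates how such a registry is typically consumed; `get_config_path` and its error message are illustrative assumptions, and only the dictionary entry mirrors the diff.

```python
# Illustrative sketch of consuming the registry above; get_config_path is
# hypothetical, only the dictionary entry mirrors this commit's diff.
import json
import os

BASE_DIR = os.path.dirname(os.path.abspath(__file__))  # assumed: the configs/ directory containing this module

MODEL2PATH = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct": os.path.join(
        BASE_DIR, "Llama_3.1_8B_Instruct_128k_kv_out_v32_fit_o_best_pattern.json"
    ),
}

def get_config_path(model_name: str) -> str:
    # Fail loudly if no pre-searched pattern ships for this model.
    if model_name not in MODEL2PATH:
        raise ValueError(f"No pre-searched MInference pattern config for {model_name}")
    return MODEL2PATH[model_name]

# Judging by the layer_idx lookup in minference_forward.py below, the JSON
# holds one entry per transformer layer describing the chosen head patterns.
config_list = json.load(open(get_config_path("meta-llama/Meta-Llama-3.1-8B-Instruct")))
```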
minference/modules/minference_forward.py

Lines changed: 2 additions & 1 deletion
@@ -8,6 +8,7 @@
 from importlib import import_module
 
 from transformers.models.llama.modeling_llama import *
+from transformers.utils import is_flash_attn_2_available
 from transformers.utils.import_utils import _is_package_available
 
 if _is_package_available("vllm"):
@@ -531,7 +532,7 @@ def forward(
         if os.path.exists(self.config_path):
             config_list = json.load(open(self.config_path))
             if self.layer_idx < len(config_list):
-                assert False
+                assert False, f"Search completed. The config is located in {self.config_path}."
         else:
             config_list = []
         config = {}

minference/version.py

Lines changed: 2 additions & 2 deletions
@@ -5,10 +5,10 @@
 _MINOR = "1"
 # On master and in a nightly release the patch should be one ahead of the last
 # released build.
-_PATCH = "4"
+_PATCH = "5"
 # This is mainly for nightly builds which have the suffix ".dev$DATE". See
 # https://semver.org/#is-v123-a-semantic-version for the semantics.
-_SUFFIX = ".post4"
+_SUFFIX = ""
 
 VERSION_SHORT = "{0}.{1}".format(_MAJOR, _MINOR)
 VERSION = "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, _PATCH, _SUFFIX)
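
For reference, the patch bump and cleared suffix above assemble into the following release string. This is a minimal sketch: `_MAJOR` is not shown in the hunk and is assumed to be "0", matching the package's 0.1.x release line.

```python
# Minimal sketch of the version assembly in minference/version.py after this change.
_MAJOR = "0"  # assumption: not shown in the hunk above
_MINOR = "1"
_PATCH = "5"
_SUFFIX = ""

VERSION_SHORT = "{0}.{1}".format(_MAJOR, _MINOR)                    # "0.1"
VERSION = "{0}.{1}.{2}{3}".format(_MAJOR, _MINOR, _PATCH, _SUFFIX)  # "0.1.5" (was "0.1.4.post4")
```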
