Commit 5991ead

Merge pull request #124 from chaserRen/dev
add long video prompt
2 parents 87b00d4 + aa598f8 commit 5991ead

5 files changed: 149 additions, 85 deletions

dingo/model/llm/llm_long_video_qa.py

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
import json

from dingo.model import Model
from dingo.model.llm.base_openai import BaseOpenAI
from dingo.model.modelres import ModelRes
from dingo.model.prompt.prompt_long_video_qa import PromptLongVideoQa
from dingo.utils import log


# Registered under the name "LLMLongVideoQa" so configs can refer to it by string.
@Model.llm_register("LLMLongVideoQa")
class LLMLongVideoQa(BaseOpenAI):
    prompt = PromptLongVideoQa

    @classmethod
    def process_response(cls, response: str) -> ModelRes:
        # Log the raw LLM output, then return it verbatim: no parsing or
        # validation is done here, so error_status is always False.
        log.info(response)
        result = ModelRes()
        result.error_status = False
        result.type = "text"
        result.name = "qa_pairs"
        result.reason = [response]

        return result
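`@Model.llm_register("LLMLongVideoQa")` makes the class available under the string name that the example configuration uses as its `llm_config` key. The sketch below illustrates the general decorator-based name-to-class registry pattern in plain Python. `Registry`, `register`, `get`, and `LLMLongVideoQaStub` are hypothetical names, and this is not dingo's actual implementation of `Model`.

```python
from typing import Callable, Dict


class Registry:
    """Hypothetical name -> class registry illustrating a decorator-based pattern."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        # Returns a decorator that records the class under `name`
        # and hands the class back unchanged.
        def decorator(cls: type) -> type:
            if name in self._classes:
                raise KeyError(f"{name!r} is already registered")
            self._classes[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> type:
        # Look up a class by the string name used in configuration.
        return self._classes[name]


llm_registry = Registry()


@llm_registry.register("LLMLongVideoQa")
class LLMLongVideoQaStub:
    # Stand-in for the real evaluator class; it only echoes the response.
    @classmethod
    def process_response(cls, response: str) -> str:
        return response


# A config entry such as {"LLMLongVideoQa": {...}} can now be resolved by name.
evaluator_cls = llm_registry.get("LLMLongVideoQa")
print(evaluator_cls.process_response("Question1: ..."))
```

Because the decorator returns the class unchanged, registration has no effect on how the class itself behaves; it only adds a lookup entry.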
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
from dingo.model.model import Model
from dingo.model.prompt.base import BasePrompt


@Model.prompt_register("PromptLongVideoQa", [])
class PromptLongVideoQa(BasePrompt):
    content = """
### Background
You will be given a video summary text that chronologically records the content of the video. Your task is to infer the complete story of events in the video based on the summary content and generate 6 multi-step reasoning Q&A pairs that satisfy the <Output Format>.

### Objective
Multi-step reasoning questions: The questions should require logical reasoning to answer, rather than being based on direct observation or perception. The design of the questions should promote a deep understanding of the entire plot, not just simple recognition of single scenes or objects.
Multi-step reasoning process: Beyond basic event overviews, the answers should be derived through multiple steps of logical thinking and information integration. This means drawing conclusions from given information rather than stating obvious facts.
Combining multiple information sources: While questions and answers can be resolved through visual content alone or by combining video and subtitles, they should not rely solely on subtitle information or everyday common sense. This requires comprehensive consideration of information from different channels to form a complete understanding.
Generation result: You must generate exactly 6 Q&A pairs.

### Question Categories and Multi-step Reasoning Examples
## 1. Event Prediction
Definition: Predict subsequent plot developments based on events that have already occurred in the video.
# Example
Question: The woman in pink was accidentally hurt while trying to break up a fight and suffered a miscarriage. How will this affect the subsequent plot?
Answer: It may lead to a rift between the man in the blue vest and the man in green.
Reasoning process:
1. The woman who tried to break up the fight was accidentally hurt; she is seen lying in bed holding her stomach, and doctors diagnose a miscarriage
2. The woman has a close relationship with the man in the blue vest
3. The man in the blue vest will become angrier with the man in green
4. The man in the blue vest and the man in green will have a falling out

## 2. Hypothetical Reasoning
Definition: Present a hypothetical premise and infer corresponding developments.
# Example
Question: If the characters continue participating in the desert competition, what challenges might they face?
Answer: They might face physical discomfort or even life-threatening challenges.
Reasoning process:
1. The characters are in an arid desert environment with harsh conditions
2. The harsh environment has already caused physical discomfort in some participants
3. Continued competition would likely lead to more severe physical discomfort or life-threatening situations

## 3. Event Attribution
Definition: Determine the cause or purpose of an event in the video.
# Example
Question: Why does the streamer describe Kaveh as a good person?
Answer: Because Kaveh donated all the property he won from the competition to those in need.
Reasoning process:
1. Kaveh won Sachin's property through the competition
2. Kaveh donated all the property he won to those in need
3. Therefore the streamer describes Kaveh as a good person

## 4. Implicit Inference
Definition: Analyze implicit information not explicitly shown, such as character personalities, emotions, relationships, event atmosphere, or situations.
# Example
Question: Why does the streamer share the story about his daughter Rin with viewers?
Answer: Because the character he's using has a snake around its neck, which reminds him of his daughter Rin's story about not being afraid of snakes; he finds the story interesting enough to share.
Reasoning process:
1. The streamer is introducing his character Baizhu, who has a snake around his neck
2. He mentions that his daughter Rin wanted to keep a snake and was not afraid of it even at close range
3. He likely finds this story interesting
4. Therefore he shares it with viewers

## 5. Logical Connection
Definition: Analyze the correlation between two elements in the video and explain their logical relationship, which can also be linked through events serving as intermediate connecting elements.
# Example
Question: What is the relationship between the man in the black jacket and his surroundings?
Answer: He is very familiar with the environment.
Reasoning process (adjust steps as needed):
1. The man in the black jacket appears multiple times, smiling and relaxed
2. People tend to relax in familiar environments
3. Therefore he must be familiar with this environment

## 6. Event Summary
Definition: Pose a summary question about the video content and provide an answer.
# Example
Question: What is the theme of this livestream?
Answer: The streamer completes a Genshin Impact quest in which multiple characters compete, with Kaveh ultimately winning.

## 7. Multi-element Inference
Definition: Infer event transformations after considering multiple conditions, with questions containing computational or counting components (numbers, dates, time points) derived from different elements.
# Example
Question: How many characters did the streamer use in the game?
Answer: The streamer used 4 characters.
Reasoning process:
1. Used Nahida
2. Used Zhongli
3. Used Yae Miko
4. Used Baizhu
5. Total of 4 characters used

### Output Format
Question1: [question]
Answer1: [answer]
Reasoning1: [detailed multi-step reasoning]
Type1: [reasoning type]
(Repeat this block for Question2 through Question6.)

### Workflow
1. Carefully read the provided subtitles and summary.
2. Generate exactly 6 multi-step reasoning Q&A pairs, using a different question category for each pair where possible so the categories are spread evenly.
3. Format answers according to the specified <Output Format>, ensuring each step is supported by logical reasoning derived from the text.

### Provided Text
"""

dingo/model/prompt/prompt_text_quality_multilan.py

Lines changed: 0 additions & 85 deletions
This file was deleted.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
from dingo.exec import Executor
from dingo.io import InputArgs

input_data = {
    "input_path": "../../test/data/test_long_video_qa.jsonl",
    "save_data": True,
    "save_correct": True,
    "dataset": "local",
    "data_format": "jsonl",
    # Each record's "video_id" field is used as its ID and its "summary" field as the text to evaluate.
    "column_id": "video_id",
    "column_content": "summary",
    "custom_config": {
        "prompt_list": ["PromptLongVideoQa"],
        "llm_config": {
            "LLMLongVideoQa": {
                # Fill in the API key and endpoint URL of the LLM service before running.
                "key": "",
                "api_url": "",
            }
        }
    }
}
input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
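The example script leaves `key` and `api_url` as empty strings to be filled in by hand. One way to avoid editing the file, and to avoid accidentally committing a credential, is to read them from environment variables. In this sketch `build_input_data`, `DINGO_LLM_KEY`, and `DINGO_LLM_API_URL` are hypothetical names; the returned dict has the same shape as `input_data` above and would be passed to `InputArgs(**...)` in the same way.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_input_data(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build the example's input_data dict, reading credentials from the environment."""
    env = os.environ if env is None else env
    # Hypothetical variable names; choose whatever fits your deployment.
    key = env.get("DINGO_LLM_KEY", "")
    api_url = env.get("DINGO_LLM_API_URL", "")
    if not key or not api_url:
        # Fail early instead of sending requests with empty credentials.
        raise RuntimeError("set DINGO_LLM_KEY and DINGO_LLM_API_URL before running")
    return {
        "input_path": "../../test/data/test_long_video_qa.jsonl",
        "save_data": True,
        "save_correct": True,
        "dataset": "local",
        "data_format": "jsonl",
        "column_id": "video_id",
        "column_content": "summary",
        "custom_config": {
            "prompt_list": ["PromptLongVideoQa"],
            "llm_config": {"LLMLongVideoQa": {"key": key, "api_url": api_url}},
        },
    }


# Demonstration with an explicit mapping and placeholder values.
demo = build_input_data({"DINGO_LLM_KEY": "placeholder-key",
                         "DINGO_LLM_API_URL": "https://example.invalid/v1"})
print(demo["custom_config"]["llm_config"])
```

Accepting an optional mapping instead of reading `os.environ` directly keeps the function easy to exercise without touching the real environment.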

test/data/test_long_video_qa.jsonl

Lines changed: 1 addition & 0 deletions
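The contents of this test file are not shown. Judging from the example script's `column_id` and `column_content` settings, each JSONL line presumably carries at least a `video_id` and a `summary` field. The sketch below writes and reads back a file of that assumed shape with the standard library; `write_jsonl`, `read_jsonl`, the file name, and the record values are placeholders, not the actual test data.

```python
import json
import tempfile
from pathlib import Path

# Placeholder records: only the field names come from the example config
# ("column_id": "video_id", "column_content": "summary"); the values are invented.
records = [
    {"video_id": "demo_001", "summary": "Placeholder summary text for the first video."},
    {"video_id": "demo_002", "summary": "Placeholder summary text for the second video."},
]


def write_jsonl(path: Path, rows: list) -> None:
    # JSON Lines: one JSON object per line, UTF-8, no enclosing array.
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list:
    # Skip blank lines so a trailing newline does not cause a parse error.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "long_video_qa_sample.jsonl"
    write_jsonl(path, records)
    loaded = read_jsonl(path)
    print([row["video_id"] for row in loaded])
```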