Skip to content

Conversation

@nemonameless
Copy link
Collaborator

cherry-pick #230

@nemonameless nemonameless merged commit 83aa2d1 into PaddlePaddle:release/1.0 Oct 13, 2023
lyuwenyu added a commit that referenced this pull request Feb 20, 2025
## 算子目录

- [1. 转换算子](#1-转换算子)
  - [1.1 llava转换算子](#11-llava转换算子)
    - [1.1.1 llava_convert](#111-llava_convert)
- [2. 过滤算子](#2-过滤算子)
  - [2.1 基础过滤算子](#21-基础过滤算子)
    - [2.1.1 valid_data_filter](#211-valid_data_filter)
- [2.1.1.1 image_compliance_operator](#2111-image_compliance_operator)
- [2.1.1.2
conversation_compliance_operator](#2112-conversation_compliance_operator)
  - [2.2 文本过滤算子](#22-文本过滤算子)
- [2.2.1 conversation_length_filter](#221-conversation_length_filter)
- [2.2.2 average_line_length_filter](#222-average_line_length_filter)
- [2.2.3 maximum_line_length_filter](#223-maximum_line_length_filter)
- [2.2.4
conversation_percentage_filter](#224-conversation_percentage_filter)
    - [2.2.5 token_num_filter](#225-token_num_filter)
    - [2.2.6 alphanumeric_ratio_filter](#226-alphanumeric_ratio_filter)
    - [2.2.7 stopwords_ratio_filter](#227-stopwords_ratio_filter)
    - [2.2.8 special_characters_filter](#228-special_characters_filter)
    - [2.2.9 language_id_filter](#229-language_id_filter)
    - [2.2.10 text_action_filter](#2210-text_action_filter)
- [2.2.11
text_entity_dependency_filter](#2211-text_entity_dependency_filter)
- [2.2.12
char_ngram_repetition_filter](#2212-char_ngram_repetition_filter)
- [2.2.13
word_ngram_repetition_filter](#2213-word_ngram_repetition_filter)
    - [2.2.14 conversation_hash_filter](#2214-conversation_hash_filter)
- [2.2.14.1
simhash_duplicate_operator](#22141-simhash_duplicate_operator)
- [2.2.14.2
minhash_duplicate_operator](#22142-minhash_duplicate_operator)
    - [2.2.15 llm_judge_filter](#2215-llm_judge_filter)
  - [2.3 图像过滤算子](#23-图像过滤算子)
    - [2.3.1 image_filesize_filter](#231-image_filesize_filter)
    - [2.3.2 image_ration_filter](#232-image_ration_filter)
    - [2.3.3 image_resolution_filter](#233-image_resolution_filter)
    - [2.3.4 image_hash_filter](#234-image_hash_filter)
  - [2.4 图文过滤算子](#24-图文过滤算子)
    - [2.4.1 image_clip_filter](#241-image_clip_filter)
- [3. 分析算子](#3-分析算子)
  - [3.1 基础分析算子](#31-基础分析算子)
    - [3.1.1 base_analysis_pipeline](#311-base_analysis_pipeline)
- [3.1.1.1 analyze_dataset_statistics](#3111-analyze_dataset_statistics)
- [3.1.1.2
analyze_language_distribution](#3112-analyze_language_distribution)
      - [3.1.1.3 analyze_image_paths](#3113-analyze_image_paths)
      - [3.1.1.4 analyze_data_anomalies](#3114-analyze_data_anomalies)
- [3.1.1.5
analyze_conversation_tokens](#3115-analyze_conversation_tokens)
  - [3.2 进阶分析算子](#32-进阶分析算子)
    - [3.2.1 description_analysis](#321-description_analysis)
    - [3.2.2 quality_analysis](#322-quality_analysis)
- [4. 可视化算子](#4-可视化算子)
  - [4.1 lda可视化算子](#41-lda可视化算子)
    - [4.1.1 lda_topic_clustering](#411-lda_topic_clustering)
- [5. 生成算子](#5-生成算子)
  - [5.1 多模态生成算子](#51-多模态生成算子)
    - [5.1.1 generate_qna_for_images](#511-generate_qna_for_images)



--- 
- #1055
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants