[Back-End] Configure preprocess, BerTopic initialisation and verbosity #20

@Septimus4

To make the BERTopic plugin configurable and easy to set up, consider the following steps:

1. Configuration File

Using a YAML configuration file is an effective way to manage pre-processing, model setup, and logging options. This allows for easy tweaking without modifying the code. Here's a sample structure in YAML:

preprocessing:
  stopwords: true
  punctuation: true
  lowercase: true

bertopic:
  min_topic_size: 10
  embedding_model: "all-MiniLM-L6-v2"
  ...
  ...

logging:
  verbosity: "info"  # Options: debug, info, warning, error

You can read this configuration file in your Python script with a YAML library such as PyYAML.
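As a minimal sketch of that loading step, assuming PyYAML is installed and the file is named `config.yaml` (both assumptions, not stated in this issue), the snippet below parses the YAML and fills in missing keys from defaults. The names `DEFAULTS`, `parse_config`, and `load_config` are hypothetical, and the default values simply mirror the sample structure above.

```python
import yaml  # PyYAML; assumed to be installed (pip install pyyaml)

# Hypothetical defaults mirroring the sample structure above.
DEFAULTS = {
    "preprocessing": {"stopwords": True, "punctuation": True, "lowercase": True},
    "bertopic": {"min_topic_size": 10, "embedding_model": "all-MiniLM-L6-v2"},
    "logging": {"verbosity": "info"},
}


def parse_config(text):
    """Parse YAML text and fill in missing sections or keys from DEFAULTS."""
    loaded = yaml.safe_load(text) or {}  # safe_load returns None for empty input
    config = {}
    for section, defaults in DEFAULTS.items():
        # Values from the file win over the defaults for the same key.
        config[section] = {**defaults, **(loaded.get(section) or {})}
    return config


def load_config(path="config.yaml"):
    """Read the YAML file at `path` (hypothetical file name) and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_config(handle.read())
```

Note that this sketch only keeps the three sections listed in `DEFAULTS`; any other top-level section in the file is ignored. `yaml.safe_load` is used rather than `yaml.load` so that the file cannot construct arbitrary Python objects.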

2. Script Argument Overrides

Allow command-line arguments to override config values. With argparse, you can expose specific options such as logging verbosity, pre-processing steps, or model settings. When an argument is provided, it takes precedence over the corresponding setting in the config file.

Example argument setup:

import argparse
import yaml

def parse_arguments():
    parser = argparse.ArgumentParser()
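    # default=None marks "not given on the command line", so the config file
    # value is kept unless the flag is passed. Note that store_true flags can
    # only switch an option on, not off.
    parser.add_argument("--stopwords", action="store_true", default=None, help="Enable stopword removal")
    parser.add_argument("--punctuation", action="store_true", default=None, help="Enable punctuation removal")
    parser.add_argument("--verbosity", choices=["debug", "info", "warning", "error"], default=None, help="Set logging verbosity")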
    return parser.parse_args()

args = parse_arguments()
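The override step itself could look roughly like the sketch below. It assumes the config is a nested dict with `preprocessing` and `logging` sections, and that options left unset on the command line are `None`. The names `OVERRIDES`, `build_parser`, and `apply_overrides` are hypothetical. It uses `argparse.BooleanOptionalAction` (Python 3.9+) instead of `store_true`, so that `--no-stopwords` can also switch an option off.

```python
import argparse
import copy

# Hypothetical mapping from CLI option name to (section, key) in the config.
OVERRIDES = {
    "stopwords": ("preprocessing", "stopwords"),
    "punctuation": ("preprocessing", "punctuation"),
    "verbosity": ("logging", "verbosity"),
}


def build_parser():
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction adds both --stopwords and --no-stopwords;
    # default=None means "not given", so the config value is kept.
    parser.add_argument("--stopwords", action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--punctuation", action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--verbosity", choices=["debug", "info", "warning", "error"], default=None)
    return parser


def apply_overrides(config, args):
    """Return a copy of `config` with every explicitly given CLI option applied."""
    merged = copy.deepcopy(config)  # leave the caller's config untouched
    for option, (section, key) in OVERRIDES.items():
        value = getattr(args, option, None)
        if value is not None:
            merged.setdefault(section, {})[key] = value
    return merged
```

For example, running the script with `--no-stopwords --verbosity debug` would turn stopword removal off and raise the verbosity, while leaving the punctuation setting from the config file as it was.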

3. Configurable Logging

For logging, use Python’s logging module. Map the verbosity value from the config or arguments to the corresponding logging level.

import logging

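# Placeholder: in practice this value comes from the CLI arguments or the config file.
verbosity = "info"

logging.basicConfig(level=getattr(logging, verbosity.upper()))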
logger = logging.getLogger(__name__)

logger.info("BerTopic model initialized.")
logger.debug("Debugging details here if verbosity is set to debug.")
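A slightly more defensive variant is sketched below: it maps the four verbosity names from the sample config to logging levels explicitly and rejects anything else, rather than relying on `getattr`. The names `LEVELS`, `resolve_level`, and `configure_logging`, the logger name `bertopic_plugin`, and the log format are all hypothetical choices for illustration.

```python
import logging

# The four verbosity options listed in the sample config.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def resolve_level(verbosity, default="info"):
    """Translate a verbosity name (case-insensitive) into a logging level."""
    name = (verbosity or default).lower()
    if name not in LEVELS:
        raise ValueError(f"Unknown verbosity {verbosity!r}; expected one of {sorted(LEVELS)}")
    return LEVELS[name]


def configure_logging(verbosity=None):
    """Configure root logging once and return a logger for the plugin."""
    logging.basicConfig(
        level=resolve_level(verbosity),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("bertopic_plugin")  # hypothetical logger name
```

Calling `configure_logging("debug")` at startup would then make both the `info` and `debug` messages visible, while an unsupported value fails early with a clear error instead of an `AttributeError`.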
