Add documentation about recent audio descriptors/similarity changes

ffont · ffont · commit 2ba077aa0bb0 · 2025-11-19T12:33:46.000+01:00
diff --git a/DEVELOPERS.md b/DEVELOPERS.md
@@ -261,6 +261,49 @@ The new analysis pipeline uses a job queue based on Celery/RabbitMQ. RabbitMQ co
 for Freesound async tasks other than analysis).
 
 
+### Supported audio descriptors for search and sound metadata
+
+By combining the output of one or several audio analyzers (see the analysis pipeline sections above), Freesound can make audio descriptors
+available for filtering search queries and as sound metadata fields through the API. This way, an API user can make
+queries that filter by both textual sound properties (like tags) and audio descriptors. Also, in the search results returned through the API,
+a user can specify which descriptor values should be returned, much like any other standard sound metadata field (see the API docs for more info).
+
+Which audio descriptors are available as fields/filters is specified through the `settings.CONSOLIDATED_AUDIO_DESCRIPTORS`
+configuration parameter, which defines a list of descriptors together with how to obtain their values from the output of the audio
+analyzers. When sounds are analyzed, the results of each analyzer are saved to disk (either as a .json or a .yaml file).
+A `SoundAnalysis` object is also created for every analyzer/sound pair (see the sections above). Analyzer output is only saved to disk;
+it is not loaded into the corresponding `SoundAnalysis` object. The `Sound` model has a `Sound.consolidate_analysis()` method which, when run, will
+create a new `SoundAnalysis` object of analyzer type `consolidated` (`settings.CONSOLIDATED_ANALYZER_NAME`), collect all the relevant descriptor
+data (following the `settings.CONSOLIDATED_AUDIO_DESCRIPTORS` list) from each individual analyzer output file, and load the collected data into
+the `analysis_data` field of the newly created `SoundAnalysis` object. This way, only the relevant audio descriptor data is loaded into the DB, in the
+consolidated `SoundAnalysis` object. `Sound.consolidate_analysis()` is called every time an analyzer relevant to `settings.CONSOLIDATED_AUDIO_DESCRIPTORS`
+finishes an analysis task, so sounds are updated automatically. There is also a management command, `create_consolidated_sound_analysis_and_sim_vectors`, that
+helps create these objects in bulk.
+
+For every descriptor defined in `settings.CONSOLIDATED_AUDIO_DESCRIPTORS` there are a number of options that can be set, including some value transformations
+and whether the descriptor should be indexed in the search engine. If no options are set, sensible defaults will be used. The current definition of 
+`settings.CONSOLIDATED_AUDIO_DESCRIPTORS` should hopefully be self-explanatory.
+
+When consolidated analyses are loaded into `SoundAnalysis` objects in the DB, adding sounds to the search engine will also include these
+descriptors and therefore these will be available for filtering in search queries. Note that multi-dimensional descriptors will not be indexed
+(they are not useful for filtering), and also descriptors marked with `index: False` will be skipped.
+
+
+### Similarity search and similarity spaces
+
+Similar to how audio descriptors work, Freesound can also define a number of "similarity spaces" that can be used for similarity search
+and that are based on the output of audio analyzers. Similarity spaces are defined through `settings.SIMILARITY_SPACES`. The entries
+of `settings.SIMILARITY_SPACES` define properties such as which analyzer (and which property of its output) the vector should be obtained from,
+whether the vector should be L2-normalized, etc. The current definition of `settings.SIMILARITY_SPACES` should hopefully be self-explanatory.
+
+The `Sound` model has a `Sound.load_similarity_vectors()` method which will create the corresponding `SoundSimilarityVector` objects for each pair of
+sound and similarity space. Once the vectors are loaded in the DB, they can be indexed in the search engine and also be used as targets
+for similarity search. `Sound.load_similarity_vectors()` is called whenever an analyzer relevant to `settings.SIMILARITY_SPACES` finishes analyzing
+a sound, so vectors should be loaded automatically (and also indexed in the search engine, as sounds are marked as "index dirty" when
+new similarity vector objects are created). The management command `create_consolidated_sound_analysis_and_sim_vectors` can be used to create
+`SoundSimilarityVector` objects in bulk.
+
+
 ### Considerations when updating Django version
 
 #### Preparation
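The consolidation flow documented in the DEVELOPERS.md section above works in two steps. First, pick only the configured descriptors out of each analyzer's output. Second, index only the single-valued descriptors not marked `index: False`. This can be sketched in plain Python, as a hedged illustration rather than Freesound's code. The descriptor entry keys (`name`, `analyzer`, `original_name`, `index`), the analyzer names and the `consolidate`/`indexable_descriptors` helpers are all hypothetical. Real `settings.CONSOLIDATED_AUDIO_DESCRIPTORS` entries may use different keys and support value transformations. Treating lists/tuples as "multi-dimensional" is also an assumption.

```python
from typing import Any

# Hypothetical descriptor definitions. Real settings.CONSOLIDATED_AUDIO_DESCRIPTORS
# entries may use other keys and options (e.g. value transformations).
DESCRIPTORS = [
    {"name": "bpm", "analyzer": "analyzer_a", "original_name": "bpm"},
    {"name": "loudness", "analyzer": "analyzer_b", "original_name": "integrated_loudness"},
    {"name": "mfcc", "analyzer": "analyzer_b", "original_name": "mfcc"},
    {"name": "debug_value", "analyzer": "analyzer_a", "original_name": "x", "index": False},
]


def consolidate(analyzers_output: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Keep only the configured descriptors, renamed to their consolidated names."""
    consolidated: dict[str, Any] = {}
    for descriptor in DESCRIPTORS:
        output = analyzers_output.get(descriptor["analyzer"], {})
        original_name = descriptor.get("original_name", descriptor["name"])
        if original_name in output:
            consolidated[descriptor["name"]] = output[original_name]
    return consolidated


def indexable_descriptors(consolidated: dict[str, Any]) -> dict[str, Any]:
    """Drop multi-dimensional values (lists/tuples) and descriptors with index: False."""
    index_flags = {d["name"]: d.get("index", True) for d in DESCRIPTORS}
    return {
        name: value
        for name, value in consolidated.items()
        if index_flags.get(name, True) and not isinstance(value, (list, tuple))
    }


if __name__ == "__main__":
    outputs = {
        "analyzer_a": {"bpm": 120.0, "x": 1, "unused": "ignored"},
        "analyzer_b": {"integrated_loudness": -14.2, "mfcc": [1.0, 2.0]},
    }
    data = consolidate(outputs)  # what would be stored as the consolidated analysis_data
    print(data)  # {'bpm': 120.0, 'loudness': -14.2, 'mfcc': [1.0, 2.0], 'debug_value': 1}
    print(indexable_descriptors(data))  # {'bpm': 120.0, 'loudness': -14.2}
```

Keeping the two steps separate mirrors the documented split between what is stored in the DB (all configured descriptors) and what is sent to the search engine (only filterable ones).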
diff --git a/README.md b/README.md
@@ -45,7 +45,7 @@ Below are instructions for setting up a local Freesound installation for develop
        freesound-data/previews/
        freesound-data/analysis/
 
-4. Download [Freesound development similarity index](https://drive.google.com/file/d/1ydJUUXbQZbHrva4UZd3C05wDcOXI7v1m/view?usp=sharing) and the [Freesound tag recommendation models](https://drive.google.com/file/d/1snaktMysCXdThWKkYuKWoGc_Hk2BElmz/view?usp=sharing) and place their contents under `freesound-data/similarity_index/` and `freesound-data/tag_recommendation_models` directories respectively (you'll need to create the directories). 
+4. Download the [Freesound tag recommendation models](https://drive.google.com/file/d/1snaktMysCXdThWKkYuKWoGc_Hk2BElmz/view?usp=sharing) and place the contents under the `freesound-data/tag_recommendation_models` directory (you'll need to create that directory).
 
 5. Rename `freesound/local_settings.example.py` file, so you can customise Django settings if needed and create a `.env` file with your local user UID and other useful settings. These other settings include `COMPOSE_PROJECT_NAME` and `LOCAL_PORT_PREFIX` which can be used to allow parallel local installations running on the same machine (provided that these to variables are different in the local installations), and `FS_BIND_HOST` which you should set to `0.0.0.0` if you need to access your local Freesound services from a remote machine.
 
@@ -115,7 +115,16 @@ If you a prompted for a password, use `localfreesoundpgpassword`, this is define
 
     Because the `web` container mounts a named volume for the home folder of the user running the shell plus process, command history should be kept between container runs :)
 
-16. (extra step) The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker-compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound similarity, search results clustering and the audio analyzers. To run these services you need to explicitly tell `docker compose` using the `--profile` (note that some services need additional configuration steps (see *Freesound analysis pipeline* section in `DEVELOPERS.md`):
+16. (extra) Load audio descriptors and similarity vectors into the database and reindex the search engine. This is necessary to make audio descriptors available through the API and to make similarity search work. Note that for this to work, you need to have set up the development data folder properly, and you should see some files inside the `freesound-data/analysis` folders, which store the (previously computed) results of Freesound audio analyzers.
+
+        # First, run the following command, which creates the relevant objects in the DB. Note that this can take several minutes.
+        docker compose run --rm web python manage.py create_consolidated_sound_analysis_and_sim_vectors --force
+
+        # Then re-create the search engine sounds index now that the audio descriptor data has been loaded in the DB. You need to explicitly indicate that similarity vectors should be added.
+        docker compose run --rm web python manage.py reindex_search_engine_sounds --include-similarity-vectors
+
+
+The steps above will get Freesound running, but to save resources on your local machine some non-essential services will not be started by default. If you look at the `docker-compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound tag recommendation and the audio analyzers. To run these services you need to explicitly tell `docker compose` using the `--profile` option (note that some services need additional configuration steps; see the *Freesound analysis pipeline* section in `DEVELOPERS.md`):
 
         docker compose --profile analyzers up   # To run all basic services + sound analyzers
         docker compose --profile all up         # To run all services
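Before running the two commands in step 16 of the README above, it may help to confirm that the development analysis data is in place. Below is a minimal sketch. It assumes analyzer results are stored as `.json` or `.yaml` files somewhere under `freesound-data/analysis` (the DEVELOPERS.md text in this commit says results are saved to disk in one of those formats). The `count_analysis_files` helper and the recursive search are illustrative assumptions, not part of Freesound.

```python
from pathlib import Path


def count_analysis_files(data_dir: str = "freesound-data/analysis") -> int:
    """Count analyzer output files (.json or .yaml) anywhere under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file() and p.suffix in {".json", ".yaml"})


if __name__ == "__main__":
    n = count_analysis_files()
    if n == 0:
        print("No analyzer output found; check the freesound-data folder before step 16.")
    else:
        print(f"Found {n} analyzer output files.")
```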
diff --git a/_docs/api/source/resources.rst b/_docs/api/source/resources.rst
@@ -183,6 +183,10 @@ laion_clap             512                    This space is built using LAION-CL
 freesound_classic      100                    This space is built using a combination of low-level acoustic audio features extracted using the ``FreesoundExtractor`` from the Essentia audio analysis library (https://essentia.upf.edu). We currently don't provide code to extract these features from arbitrary audio, but we might do that in the future.
 =====================  =====================  ====================================================================
 
+When using vectors as input for the ``similar_to`` parameter, make sure that the vectors are extracted using the same method as the one used to build the similarity space.
+Note that if the similarity space uses L2-normalized vectors, L2 normalization is automatically applied to input vectors as well.
+If the provided vector is already L2-normalized, this will have no effect.
+
 
 .. _search-weights:
 
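The search backend change in this commit (`solr555pysolr.py`) normalizes a vector passed directly to `similar_to` when the similarity space config sets `l2_norm`. It relies on the new `SoundSimilarityVector.l2_normalize_vector` classmethod. The standalone sketch below mirrors that helper's arithmetic, with the hypothetical name `l2_normalize`. It shows why normalizing an already-normalized vector is effectively a no-op: the result only differs by floating-point rounding.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm > 0:
        return [v / norm for v in vector]
    return list(vector)


if __name__ == "__main__":
    once = l2_normalize([3.0, 4.0])
    twice = l2_normalize(once)
    print(once)  # [0.6, 0.8]
    # Normalizing again leaves the vector unchanged up to floating-point rounding.
    print(all(math.isclose(a, b) for a, b in zip(once, twice)))  # True
```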
diff --git a/sounds/management/commands/create_consolidated_sound_analysis_and_sim_vectors.py b/sounds/management/commands/create_consolidated_sound_analysis_and_sim_vectors.py
@@ -60,13 +60,20 @@ def add_arguments(self, parser):
             dest='chunk_size',
             default=100,
             help='Number of sounds to process in each chunk (default: 100).')
-
+        
         parser.add_argument(
-            '--clear_others',
+            '--skip-consolidated-analysis',
             action='store_true',
-            dest='clear_others',
+            dest='skip_consolidated_analysis',
             default=False,
-            help='If set, clear analysis data from SoundAnalysis obects other than "consolidated" one.')
+            help='If set, skip generating consolidated analysis objects.')
+        
+        parser.add_argument(
+            '--skip-similarity-vectors',
+            action='store_true',
+            dest='skip_similarity_vectors',
+            default=False,
+            help='If set, skip generating similarity vector objects.')
     
 
     def handle(self, *args, **options):
@@ -94,57 +101,58 @@ def handle(self, *args, **options):
         for i in range(0, len(sound_ids_to_process), chunk_size):
             sound_ids = sound_ids_to_process[i:i+chunk_size]
             ss = Sound.objects.filter(id__in=sound_ids)
-            
-            if options['clear_others']:
-                # Clear data from all non-consolidated sound analysis objects related to these sounds
-                ssaa = SoundAnalysis.objects.filter(sound__in=sound_ids).exclude(analyzer=settings.CONSOLIDATED_ANALYZER_NAME)
-                ssaa.update(analysis_data={}) 
 
             # Generate consolidated analyses and load similarity vectors for the chunk of sounds
             consolidated_analyis_objects = []
             similarity_vector_objects = []
             for sound in ss:
-                consolidated_analysis_data, tmp_analyzers_data= sound.consolidate_analysis(no_db_operations=True)
-                
-                consolidated_analyis_objects.append(SoundAnalysis(
-                    sound_id=sound.id,
-                    analyzer=settings.CONSOLIDATED_ANALYZER_NAME,
-                    analysis_data=consolidated_analysis_data,
-                    analysis_status = "OK",
-                    last_analyzer_finished = timezone.now()
-                ))
-
-                for similarity_space_name, similarity_space in settings.SIMILARITY_SPACES.items():
-                    analyzer_data = tmp_analyzers_data.get(similarity_space['analyzer'], {})
-                    if not analyzer_data:
-                        analyzer_data = SoundAnalysis.get_analysis_data_from_file_without_db(sound.id, similarity_space['analyzer'])
-                        if not analyzer_data:
-                            continue
-                    try:
-                        sim_vector = analyzer_data[similarity_space['vector_property_name']]
-                        sim_vector = [float(x) for x in sim_vector] 
-                    except (IndexError, ValueError, KeyError):
-                        continue
-
-                    if len(sim_vector) != similarity_space['vector_size']:
-                        continue
+                if not options['skip_consolidated_analysis']:
+                    consolidated_analysis_data, tmp_analyzers_data = sound.consolidate_analysis(no_db_operations=True)
                     
-                    similarity_vector_objects.append(SoundSimilarityVector(
+                    consolidated_analyis_objects.append(SoundAnalysis(
                         sound_id=sound.id,
-                        similarity_space_name=similarity_space_name,
-                        vector=sim_vector
+                        analyzer=settings.CONSOLIDATED_ANALYZER_NAME,
+                        analysis_data=consolidated_analysis_data,
+                        analysis_status="OK",
+                        last_analyzer_finished=timezone.now()
                     ))
+
+                if not options['skip_similarity_vectors']:
+                    for similarity_space_name, similarity_space in settings.SIMILARITY_SPACES.items():
+                        analyzer_data = {} if options['skip_consolidated_analysis'] else tmp_analyzers_data.get(similarity_space['analyzer'], {})
+                        if not analyzer_data:
+                            analyzer_data = SoundAnalysis.get_analysis_data_from_file_without_db(sound.id, similarity_space['analyzer'])
+                            if not analyzer_data:
+                                continue
+                        try:
+                            sim_vector = analyzer_data[similarity_space['vector_property_name']]
+                            sim_vector = [float(x) for x in sim_vector] 
+                        except (IndexError, ValueError, KeyError, TypeError):
+                            continue
+
+                        if len(sim_vector) != similarity_space['vector_size']:
+                            continue
+                        
+                        similarity_vector_objects.append(SoundSimilarityVector(
+                            sound_id=sound.id,
+                            similarity_space_name=similarity_space_name,
+                            vector=sim_vector
+                        ))
                     
             # Now that we loaded all the data, create the db objcts in bulk
             if options['force']:
                 # If force is set, we delete any existing consolidated analysis or similarity vector for these sounds
                 # before creating the new ones
                 # NOTE: we used ignore_conflicts=True below to avoid issues force is set to False and some objects already exist
-                SoundAnalysis.objects.filter(sound__in=sound_ids, analyzer=settings.CONSOLIDATED_ANALYZER_NAME).delete()
-                SoundSimilarityVector.objects.filter(sound__in=sound_ids).delete()
+                if not options['skip_consolidated_analysis']:
+                    SoundAnalysis.objects.filter(sound__in=sound_ids, analyzer=settings.CONSOLIDATED_ANALYZER_NAME).delete()
+                if not options['skip_similarity_vectors']:
+                    SoundSimilarityVector.objects.filter(sound__in=sound_ids).delete()
             
-            SoundAnalysis.objects.bulk_create(consolidated_analyis_objects, ignore_conflicts=True)
-            SoundSimilarityVector.objects.bulk_create(similarity_vector_objects, ignore_conflicts=True)
+            if not options['skip_consolidated_analysis']:
+                SoundAnalysis.objects.bulk_create(consolidated_analyis_objects, ignore_conflicts=True)
+            if not options['skip_similarity_vectors']:
+                SoundSimilarityVector.objects.bulk_create(similarity_vector_objects, ignore_conflicts=True)
 
             total_done += chunk_size
             elapsed = time.monotonic() - starttime
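The per-sound similarity-vector logic in the management command above works in three steps. It takes the analyzer data already in memory, or falls back to reading the analyzer output file. It then reads `vector_property_name` and coerces the values to floats. Finally, it skips vectors whose length differs from `vector_size`. Isolating this into a pure function makes it testable without a database. This is a hedged sketch: `extract_similarity_vectors`, the example space names and the analyzer names are hypothetical, and the file fallback is injected as a callable instead of calling `SoundAnalysis.get_analysis_data_from_file_without_db`. It also catches `TypeError`, so a non-list property value is skipped rather than raising.

```python
from typing import Any, Callable

# Hypothetical configuration using the keys the command reads:
# 'analyzer', 'vector_property_name' and 'vector_size'.
SIMILARITY_SPACES = {
    "space_a": {"analyzer": "analyzer_a", "vector_property_name": "embedding", "vector_size": 3},
    "space_b": {"analyzer": "analyzer_b", "vector_property_name": "features", "vector_size": 2},
}


def extract_similarity_vectors(
    analyzers_data: dict[str, dict[str, Any]],
    load_from_file: Callable[[str], dict[str, Any]] = lambda analyzer: {},
) -> dict[str, list[float]]:
    """Return {space_name: vector} for every space with a valid vector."""
    vectors: dict[str, list[float]] = {}
    for space_name, space in SIMILARITY_SPACES.items():
        # Prefer analyzer data already in memory, else fall back to the output file.
        analyzer_data = analyzers_data.get(space["analyzer"]) or load_from_file(space["analyzer"])
        if not analyzer_data:
            continue
        try:
            vector = [float(x) for x in analyzer_data[space["vector_property_name"]]]
        except (KeyError, ValueError, TypeError):
            continue  # missing property, non-numeric values or non-iterable value
        if len(vector) != space["vector_size"]:
            continue  # wrong dimensionality for this space
        vectors[space_name] = vector
    return vectors


if __name__ == "__main__":
    data = {
        "analyzer_a": {"embedding": ["0.1", "0.2", "0.3"]},
        "analyzer_b": {"features": [1.0, 2.0, 3.0]},  # wrong size, skipped
    }
    print(extract_similarity_vectors(data))  # {'space_a': [0.1, 0.2, 0.3]}
```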
diff --git a/sounds/models.py b/sounds/models.py
@@ -2299,10 +2299,16 @@ class SoundSimilarityVector(models.Model):
     similarity_space_name = models.CharField(max_length=100)
     vector = ArrayField(models.FloatField())
 
-    def apply_l2_normalization(self, commit=True):
-        norm = math.sqrt(sum([v*v for v in self.vector]))
+    @classmethod
+    def l2_normalize_vector(cls, vector):
+        norm = math.sqrt(sum([v*v for v in vector]))
         if norm > 0:
-            self.vector = [v/norm for v in self.vector]
+            return [v/norm for v in vector]
+        else:
+            return vector
+
+    def apply_l2_normalization(self, commit=True):
+        self.vector = self.l2_normalize_vector(self.vector)
         if commit:
             self.save()
 
diff --git a/utils/search/backends/solr555pysolr.py b/utils/search/backends/solr555pysolr.py
@@ -621,6 +621,9 @@ def search_sounds(self, textual_query='', query_fields=None, query_filter='', fi
                 vector = None
                 if isinstance(similar_to, list):
                     vector = similar_to  # we allow vectors to be passed directly
+                    # If vector needs to be l2 normalized, do it now. Note that if the vector is already normalized, this will have no effect
+                    if config_options.get('l2_norm', False):
+                        vector = SoundSimilarityVector.l2_normalize_vector(vector)
                 else:
                     # similar_to should be a sound_id
                     try: