Commit 2ba077a

Add documentation about recent audio descriptors/similarity changes
1 parent 3a0d38b commit 2ba077a

6 files changed

+118
-45
lines changed

DEVELOPERS.md

Lines changed: 43 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -261,6 +261,49 @@ The new analysis pipeline uses a job queue based on Celery/RabbitMQ. RabbitMQ co
261261
for Freesound async tasks other than analysis).
262262

263263

264+
### Supported audio descriptors for search and sound metadata
265+
266+
By combining the output of one or several audio analyzers (see the sections below), Freesound has a way to make audio descriptors
267+
available for filtering search queries and as sound metadata fields through the API. In this way, an API user is able to make
268+
queries and filter by both textual sound properties (such as tags) and audio descriptors. Also, in the search results returned through the API,
269+
a user can specify which descriptor values should be returned, much like any other standard sound metadata field (see API docs for more info).
270+
271+
The audio descriptors that should be available as fields/filters are specified through the `settings.CONSOLIDATED_AUDIO_DESCRIPTORS`
272+
configuration parameter. There, a list of descriptors is defined together with the way to find their values based on the output of audio
273+
analyzers. When analyzing sounds with audio analyzers, the results of each analyzer will be saved to disk (as either a .json or a .yaml file).
274+
Also, a `SoundAnalysis` object for every analyzer/sound pair will be created (see the sections above). Analyzer output will only be saved to disk,
275+
and will not be loaded in the corresponding `SoundAnalysis` object. The `Sound` model has a `Sound.consolidate_analysis()` method which, when run, will
276+
create a new `SoundAnalysis` object of analyzer type `consolidated` (`settings.CONSOLIDATED_ANALYZER_NAME`), will collect all the relevant descriptors
277+
data (following the `settings.CONSOLIDATED_AUDIO_DESCRIPTORS` list) from each individual analyzer output file, and will load the collected data in
278+
the `analysis_data` field of the newly created `SoundAnalysis` object. This way, only the relevant audio descriptor data is loaded into the DB, in the
279+
consolidated `SoundAnalysis` object. `Sound.consolidate_analysis()` is called every time an analyzer relevant to `settings.CONSOLIDATED_AUDIO_DESCRIPTORS`
280+
finishes an analysis task, so sounds are updated "automatically". There is also a management command, `create_consolidated_sound_analysis_and_sim_vectors`, that
281+
helps create these objects in bulk.
282+
283+
For every descriptor defined in `settings.CONSOLIDATED_AUDIO_DESCRIPTORS` there are a number of options that can be set, including some value transformations
284+
and whether the descriptor should be indexed in the search engine. If no options are set, sensible defaults will be used. The current definition of
285+
`settings.CONSOLIDATED_AUDIO_DESCRIPTORS` should hopefully be self-explanatory.
286+
287+
When consolidated analyses are loaded in `SoundAnalysis` objects in the DB, adding sounds to the search engine will also include these
288+
descriptors and therefore these will be available for filtering in search queries. Note that multi-dimensional descriptors will not be indexed
289+
(they are not useful for filtering), and also descriptors marked with `index: False` will be skipped.
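
The consolidation and indexing rules described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the descriptor entries, their keys (`name`, `analyzer`, `source_key`, `transform`) and the analyzer names are hypothetical and are not taken from the real `settings.CONSOLIDATED_AUDIO_DESCRIPTORS`; only the `index: False` option and the rule that multi-dimensional descriptors are not indexed come from the text.

```python
# Hypothetical descriptor definitions. The real list lives in
# settings.CONSOLIDATED_AUDIO_DESCRIPTORS and may use different keys.
DESCRIPTORS = [
    {"name": "loudness", "analyzer": "analyzer_a", "source_key": "loudness"},
    {"name": "bpm", "analyzer": "analyzer_b", "source_key": "tempo", "transform": round},
    {"name": "mfcc", "analyzer": "analyzer_a", "source_key": "mfcc"},
    {"name": "internal_score", "analyzer": "analyzer_b", "source_key": "score", "index": False},
]


def consolidate(analyzer_outputs, descriptors=DESCRIPTORS):
    """Collect only the configured descriptors from the per-analyzer outputs."""
    consolidated = {}
    for descriptor in descriptors:
        output = analyzer_outputs.get(descriptor["analyzer"], {})
        if descriptor["source_key"] not in output:
            continue  # this analyzer has not produced the value (yet)
        value = output[descriptor["source_key"]]
        transform = descriptor.get("transform")
        consolidated[descriptor["name"]] = transform(value) if transform else value
    return consolidated


def indexable_fields(consolidated, descriptors=DESCRIPTORS):
    """Drop multi-dimensional descriptors and those marked with index: False."""
    not_indexed = {d["name"] for d in descriptors if d.get("index", True) is False}
    return {
        name: value
        for name, value in consolidated.items()
        if name not in not_indexed and not isinstance(value, (list, tuple))
    }


outputs = {
    "analyzer_a": {"loudness": -12.5, "mfcc": [1.0, 2.0]},
    "analyzer_b": {"tempo": 119.6, "score": 0.3},
}
consolidated = consolidate(outputs)
print(consolidated)
print(indexable_fields(consolidated))
```

In this sketch all four descriptors end up in the consolidated data (and would therefore be available as metadata fields), while only the scalar ones not marked `index: False` are kept for filtering.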
290+
291+
292+
### Similarity search and similarity spaces
293+
294+
Similarly to how audio descriptors work, Freesound can also define a number of "similarity spaces" that can be used for similarity search
295+
and that are based on the output of audio analyzers. Similarity spaces are defined through `settings.SIMILARITY_SPACES`. The entries
296+
of `settings.SIMILARITY_SPACES` define a number of properties, such as the analyzer/property from which the vector should be obtained,
297+
whether the vector should be L2-normalized, etc. The current definition of `settings.SIMILARITY_SPACES` should hopefully be self-explanatory.
298+
299+
The `Sound` model has a `Sound.load_similarity_vectors()` method which will create corresponding `SoundSimilarityVector` objects for each pair of
300+
sound and type of similarity space. Once the vectors are loaded in the DB, they can be indexed in the search engine and also be used as targets
301+
for a similarity search. `Sound.load_similarity_vectors()` is called when any analyzer relevant to `settings.SIMILARITY_SPACES` finishes analyzing
302+
a sound, so vectors should be loaded automatically (and also indexed in the search engine, as sounds will be marked as "index dirty" when
303+
new similarity vector objects are created). The management command `create_consolidated_sound_analysis_and_sim_vectors` can be used to create
304+
`SoundSimilarityVector` objects in bulk.
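
As a rough illustration of how a vector might be pulled out of an analyzer's output for a given similarity space, here is a minimal sketch. The keys `analyzer`, `vector_property_name` and `vector_size` mirror those read by the management command in this commit; the space definition itself, the analyzer name and the `extract_similarity_vector` helper are hypothetical.

```python
# Hypothetical similarity space entry, shaped like one value of
# settings.SIMILARITY_SPACES (the real entries may carry more options).
EXAMPLE_SPACE = {
    "analyzer": "example_analyzer",
    "vector_property_name": "embedding",
    "vector_size": 3,
}


def extract_similarity_vector(analyzer_data, space):
    """Return the vector as a list of floats, or None if it is missing or invalid."""
    try:
        vector = [float(x) for x in analyzer_data[space["vector_property_name"]]]
    except (KeyError, TypeError, ValueError):
        return None  # property missing, not iterable, or not numeric
    if len(vector) != space["vector_size"]:
        return None  # wrong dimensionality for this space
    return vector


print(extract_similarity_vector({"embedding": [1, "2", 3.5]}, EXAMPLE_SPACE))
print(extract_similarity_vector({"embedding": [1, 2]}, EXAMPLE_SPACE))
```

Sounds whose analyzer output is missing or has the wrong size simply get no vector for that space, which matches the `continue` branches in the bulk command.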
305+
306+
264307
### Considerations when updating Django version
265308

266309
#### Preparation

README.md

Lines changed: 11 additions & 2 deletions
@@ -45,7 +45,7 @@ Below are instructions for setting up a local Freesound installation for develop
4545
freesound-data/previews/
4646
freesound-data/analysis/
4747

48-
4. Download [Freesound development similarity index](https://drive.google.com/file/d/1ydJUUXbQZbHrva4UZd3C05wDcOXI7v1m/view?usp=sharing) and the [Freesound tag recommendation models](https://drive.google.com/file/d/1snaktMysCXdThWKkYuKWoGc_Hk2BElmz/view?usp=sharing) and place their contents under `freesound-data/similarity_index/` and `freesound-data/tag_recommendation_models` directories respectively (you'll need to create the directories).
48+
4. Download the [Freesound tag recommendation models](https://drive.google.com/file/d/1snaktMysCXdThWKkYuKWoGc_Hk2BElmz/view?usp=sharing) and place the contents under the `freesound-data/tag_recommendation_models` directory (you'll need to create that directory).
4949

5050
5. Rename `freesound/local_settings.example.py` file, so you can customise Django settings if needed and create a `.env` file with your local user UID and other useful settings. These other settings include `COMPOSE_PROJECT_NAME` and `LOCAL_PORT_PREFIX` which can be used to allow parallel local installations running on the same machine (provided that these two variables are different in the local installations), and `FS_BIND_HOST` which you should set to `0.0.0.0` if you need to access your local Freesound services from a remote machine.
5151

@@ -115,7 +115,16 @@ If you are prompted for a password, use `localfreesoundpgpassword`, this is define
115115

116116
Because the `web` container mounts a named volume for the home folder of the user running the shell plus process, command history should be kept between container runs :)
117117

118-
16. (extra step) The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker-compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound similarity, search results clustering and the audio analyzers. To run these services you need to explicitly tell `docker compose` using the `--profile` (note that some services need additional configuration steps (see *Freesound analysis pipeline* section in `DEVELOPERS.md`):
118+
16. (extra) Load audio descriptors and similarity vectors into the database and rebuild the search index. This is necessary to make audio descriptors available through the API and to make similarity search work. Note that for this to work, the development data folder must be set up properly, and you should see some files inside the `freesound-data/analysis` folders, which store the (previously computed) results of the Freesound audio analysers.
119+
120+
# First run the following command which will create relevant objects in the DB. Note that this can take some minutes.
121+
docker compose run --rm web python manage.py create_consolidated_sound_analysis_and_sim_vectors --force
122+
123+
# Then re-create the search engine sounds index after audio descriptors data has been loaded in the DB. You need to specifically indicate that similarity vectors should be added.
124+
docker compose run --rm web python manage.py reindex_search_engine_sounds --include-similarity-vectors
125+
126+
127+
The steps above will get Freesound running, but to save resources in your local machine some non-essential services will not be started by default. If you look at the `docker-compose.yml` file, you'll see that some services are marked with the profile `analyzers` or `all`. These services include sound tag recommendation and the audio analyzers. To run these services you need to explicitly tell `docker compose` using the `--profile` option (note that some services need additional configuration steps; see the *Freesound analysis pipeline* section in `DEVELOPERS.md`):
119128

120129
docker compose --profile analyzers up # To run all basic services + sound analyzers
121130
docker compose --profile all up # To run all services

_docs/api/source/resources.rst

Lines changed: 4 additions & 0 deletions
@@ -183,6 +183,10 @@ laion_clap 512 This space is built using LAION-CL
183183
freesound_classic 100 This space is built using a combination of low-level acoustic audio features extracted using the ``FreesoundExtractor`` from the Essentia audio analysis library (https://essentia.upf.edu). We currently don't provide code to extract these features from arbitrary audio, but we might do that in the future.
184184
===================== ===================== ====================================================================
185185

186+
When using vectors as input for the ``similar_to`` parameter, make sure that the vectors are extracted using the same method as the one used to build the similarity space.
187+
Note that L2-normalisation is automatically applied to input vectors.
188+
If the provided vector is already L2-normalized, this will have no effect.
189+
186190

187191
.. _search-weights:
188192

sounds/management/commands/create_consolidated_sound_analysis_and_sim_vectors.py

Lines changed: 48 additions & 40 deletions
@@ -60,13 +60,20 @@ def add_arguments(self, parser):
6060
dest='chunk_size',
6161
default=100,
6262
help='Number of sounds to process in each chunk (default: 100).')
63-
63+
6464
parser.add_argument(
65-
'--clear_others',
65+
'--skip-consolidated-analysis',
6666
action='store_true',
67-
dest='clear_others',
67+
dest='skip_consolidated_analysis',
6868
default=False,
69-
help='If set, clear analysis data from SoundAnalysis obects other than "consolidated" one.')
69+
help='If set, skip generating consolidated analysis objects.')
70+
71+
parser.add_argument(
72+
'--skip-similarity-vectors',
73+
action='store_true',
74+
dest='skip_similarity_vectors',
75+
default=False,
76+
help='If set, skip generating similarity vector objects.')
7077

7178

7279
def handle(self, *args, **options):
@@ -94,57 +101,58 @@ def handle(self, *args, **options):
94101
for i in range(0, len(sound_ids_to_process), chunk_size):
95102
sound_ids = sound_ids_to_process[i:i+chunk_size]
96103
ss = Sound.objects.filter(id__in=sound_ids)
97-
98-
if options['clear_others']:
99-
# Clear data from all non-consolidated sound analysis objects related to these sounds
100-
ssaa = SoundAnalysis.objects.filter(sound__in=sound_ids).exclude(analyzer=settings.CONSOLIDATED_ANALYZER_NAME)
101-
ssaa.update(analysis_data={})
102104

103105
# Generate consolidated analyses and load similarity vectors for the chunk of sounds
104106
consolidated_analyis_objects = []
105107
similarity_vector_objects = []
106108
for sound in ss:
107-
consolidated_analysis_data, tmp_analyzers_data= sound.consolidate_analysis(no_db_operations=True)
108-
109-
consolidated_analyis_objects.append(SoundAnalysis(
110-
sound_id=sound.id,
111-
analyzer=settings.CONSOLIDATED_ANALYZER_NAME,
112-
analysis_data=consolidated_analysis_data,
113-
analysis_status = "OK",
114-
last_analyzer_finished = timezone.now()
115-
))
116-
117-
for similarity_space_name, similarity_space in settings.SIMILARITY_SPACES.items():
118-
analyzer_data = tmp_analyzers_data.get(similarity_space['analyzer'], {})
119-
if not analyzer_data:
120-
analyzer_data = SoundAnalysis.get_analysis_data_from_file_without_db(sound.id, similarity_space['analyzer'])
121-
if not analyzer_data:
122-
continue
123-
try:
124-
sim_vector = analyzer_data[similarity_space['vector_property_name']]
125-
sim_vector = [float(x) for x in sim_vector]
126-
except (IndexError, ValueError, KeyError):
127-
continue
128-
129-
if len(sim_vector) != similarity_space['vector_size']:
130-
continue
109+
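tmp_analyzers_data = {}  # default, so the similarity step below can still fall back to analyzer files when consolidation is skipped
if not options['skip_consolidated_analysis']: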
110+
consolidated_analysis_data, tmp_analyzers_data= sound.consolidate_analysis(no_db_operations=True)
131111

132-
similarity_vector_objects.append(SoundSimilarityVector(
112+
consolidated_analyis_objects.append(SoundAnalysis(
133113
sound_id=sound.id,
134-
similarity_space_name=similarity_space_name,
135-
vector=sim_vector
114+
analyzer=settings.CONSOLIDATED_ANALYZER_NAME,
115+
analysis_data=consolidated_analysis_data,
116+
analysis_status = "OK",
117+
last_analyzer_finished = timezone.now()
136118
))
119+
120+
if not options['skip_similarity_vectors']:
121+
for similarity_space_name, similarity_space in settings.SIMILARITY_SPACES.items():
122+
analyzer_data = tmp_analyzers_data.get(similarity_space['analyzer'], {})
123+
if not analyzer_data:
124+
analyzer_data = SoundAnalysis.get_analysis_data_from_file_without_db(sound.id, similarity_space['analyzer'])
125+
if not analyzer_data:
126+
continue
127+
try:
128+
sim_vector = analyzer_data[similarity_space['vector_property_name']]
129+
sim_vector = [float(x) for x in sim_vector]
130+
except (IndexError, ValueError, KeyError):
131+
continue
132+
133+
if len(sim_vector) != similarity_space['vector_size']:
134+
continue
135+
136+
similarity_vector_objects.append(SoundSimilarityVector(
137+
sound_id=sound.id,
138+
similarity_space_name=similarity_space_name,
139+
vector=sim_vector
140+
))
137141

138142
# Now that we loaded all the data, create the db objects in bulk
139143
if options['force']:
140144
# If force is set, we delete any existing consolidated analysis or similarity vector for these sounds
141145
# before creating the new ones
142146
# NOTE: we use ignore_conflicts=True below to avoid issues when force is set to False and some objects already exist
143-
SoundAnalysis.objects.filter(sound__in=sound_ids, analyzer=settings.CONSOLIDATED_ANALYZER_NAME).delete()
144-
SoundSimilarityVector.objects.filter(sound__in=sound_ids).delete()
147+
if not options['skip_consolidated_analysis']:
148+
SoundAnalysis.objects.filter(sound__in=sound_ids, analyzer=settings.CONSOLIDATED_ANALYZER_NAME).delete()
149+
if not options['skip_similarity_vectors']:
150+
SoundSimilarityVector.objects.filter(sound__in=sound_ids).delete()
145151

146-
SoundAnalysis.objects.bulk_create(consolidated_analyis_objects, ignore_conflicts=True)
147-
SoundSimilarityVector.objects.bulk_create(similarity_vector_objects, ignore_conflicts=True)
152+
if not options['skip_consolidated_analysis']:
153+
SoundAnalysis.objects.bulk_create(consolidated_analyis_objects, ignore_conflicts=True)
154+
if not options['skip_similarity_vectors']:
155+
SoundSimilarityVector.objects.bulk_create(similarity_vector_objects, ignore_conflicts=True)
148156

149157
total_done += len(sound_ids)  # the last chunk may be smaller than chunk_size
150158
elapsed = time.monotonic() - starttime
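
To make the effect of the two new flags easier to see, the following standalone sketch reproduces the option handling with plain `argparse`, outside Django. The flag names and `dest` values are taken from the diff above; the definition of `--force` is not shown in this hunk, so treating it as a `store_true` flag is an assumption, and `steps_to_run` is a hypothetical helper used only for illustration.

```python
import argparse


def build_parser():
    """Mirror the command-line options touched by this commit."""
    parser = argparse.ArgumentParser()
    # Assumption: --force is a simple boolean flag (its definition is not in this hunk)
    parser.add_argument('--force', action='store_true', dest='force', default=False)
    parser.add_argument('--skip-consolidated-analysis', action='store_true',
                        dest='skip_consolidated_analysis', default=False)
    parser.add_argument('--skip-similarity-vectors', action='store_true',
                        dest='skip_similarity_vectors', default=False)
    return parser


def steps_to_run(options):
    """Return which kinds of objects would be (re)generated for the given options."""
    steps = []
    if not options['skip_consolidated_analysis']:
        steps.append('consolidated_analysis')
    if not options['skip_similarity_vectors']:
        steps.append('similarity_vectors')
    return steps


options = vars(build_parser().parse_args(['--force', '--skip-consolidated-analysis']))
print(steps_to_run(options))
```

With both flags omitted, both kinds of objects are generated, which matches the behaviour of the command before this change.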

sounds/models.py

Lines changed: 9 additions & 3 deletions
@@ -2299,10 +2299,16 @@ class SoundSimilarityVector(models.Model):
22992299
similarity_space_name = models.CharField(max_length=100)
23002300
vector = ArrayField(models.FloatField())
23012301

2302-
def apply_l2_normalization(self, commit=True):
2303-
norm = math.sqrt(sum([v*v for v in self.vector]))
2302+
@classmethod
2303+
def l2_normalize_vector(cls, vector):
2304+
norm = math.sqrt(sum([v*v for v in vector]))
23042305
if norm > 0:
2305-
self.vector = [v/norm for v in self.vector]
2306+
return [v/norm for v in vector]
2307+
else:
2308+
return vector
2309+
2310+
def apply_l2_normalization(self, commit=True):
2311+
self.vector = self.l2_normalize_vector(self.vector)
23062312
if commit:
23072313
self.save()
23082314
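
The claim that normalizing an already normalized vector has no effect can be checked with a standalone copy of the helper. This sketch reimplements the logic of `l2_normalize_vector` as a plain function (no Django model involved), so it is an illustration rather than the project's code.

```python
import math


def l2_normalize_vector(vector):
    """Scale the vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm > 0:
        return [v / norm for v in vector]
    return vector


once = l2_normalize_vector([3.0, 4.0])
twice = l2_normalize_vector(once)

# The second pass leaves the values unchanged up to floating-point rounding
print(once, twice)
print(all(math.isclose(a, b) for a, b in zip(once, twice)))
```

The zero-vector branch matters for the search backend: a user-supplied all-zero vector is passed through as is instead of causing a division by zero.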

utils/search/backends/solr555pysolr.py

Lines changed: 3 additions & 0 deletions
@@ -621,6 +621,9 @@ def search_sounds(self, textual_query='', query_fields=None, query_filter='', fi
621621
vector = None
622622
if isinstance(similar_to, list):
623623
vector = similar_to # we allow vectors to be passed directly
624+
# If vector needs to be l2 normalized, do it now. Note that if the vector is already normalized, this will have no effect
625+
if config_options.get('l2_norm', False):
626+
vector = SoundSimilarityVector.l2_normalize_vector(vector)
624627
else:
625628
# similar_to should be a sound_id
626629
try:
