This repository contains the dataset and source code for the models used in the paper "DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music", presented at ACM MM 2023.
DiVa is an iterative framework for obtaining song labels from user comments. It combines the knowledge in a small number of expert-annotated gold labels with a large volume of user comments to harvest more diverse and valid labels, ultimately yielding a complete label set for each song.
The experimental data used in this research was provided by Tencent Music Entertainment Group. We would like to thank them for their contribution to this project.
We apologize that the code and data for this project are not yet available. We are currently working on preparing them for release and will make them available as soon as possible. Thank you for your patience and understanding.
Coming soon.
We provide three datasets: a training set and two test sets (test1 and test2). Each sample contains the user comments for one song together with its annotated labels. The train and test1 sets contain only gold labels, while test2 is fully annotated. Download links: train, test1, test2. If a download fails, please try this alternative link with the extraction code fjy5.
To download a dataset, click its link to open the download page. Once downloaded, the data can be used for research or analysis. If you use the dataset in your work, please cite its source.
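As a rough illustration of the sample structure described above (user comments for a song plus annotated labels, with full annotation only in test2), here is a minimal Python sketch. The on-disk format is not documented here, so everything about the layout is an assumption: the JSON Lines format, the field names `song_id`, `comments`, `gold_labels`, and `full_labels`, and the helper names `SongSample`, `parse_sample`, and `load_samples` are all hypothetical and should be adapted to the released files.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SongSample:
    """One sample: the user comments for a song plus its annotated labels."""
    song_id: str                # hypothetical field name
    comments: List[str]         # hypothetical field name
    gold_labels: List[str]      # expert-annotated gold labels
    # Assumed to be present only in test2, which is fully annotated.
    full_labels: Optional[List[str]] = None


def parse_sample(line: str) -> SongSample:
    """Parse one JSON line into a SongSample (assumed JSON Lines layout)."""
    record = json.loads(line)
    return SongSample(
        song_id=str(record["song_id"]),
        comments=list(record["comments"]),
        gold_labels=list(record["gold_labels"]),
        full_labels=record.get("full_labels"),
    )


def load_samples(path: str) -> List[SongSample]:
    """Read every non-empty line of a file as one sample."""
    with open(path, encoding="utf-8") as f:
        return [parse_sample(line) for line in f if line.strip()]
```

With this layout, a sample from train or test1 would leave `full_labels` as `None`, while a test2 sample would carry the complete label set, which makes it easy to tell the two kinds of annotation apart when evaluating.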
Coming soon.