GitHub - ivangermanov/openml-tags

This repo contains my Master's thesis project in Computer Science & Engineering.

You can find the thesis PDF here.

Summary

Developed a novel NLP pipeline to automatically generate tags for texts, improving discoverability and organization on the OpenML platform.

Data Preparation: Conducted in-depth exploratory data analysis and data augmentation to improve input quality.
Advanced NLP Techniques: Integrated the LLaMA-3-70b LLM for prompt-based tag generation and a DeBERTa-based zero-shot classifier for tag filtering, achieving nuanced and context-aware tagging.
Model Optimization: Extended and optimized the BERTopic model with advanced embedding (Salesforce/SFR-Embedding-2_R) and dimensionality reduction (UMAP) techniques, fine-tuned using Bayesian optimization.
Automated Evaluation: Achieved a combined NPMI and diversity score of 0.779, outperforming established baselines (LDA, NMF, Top2Vec, CTM) in automated evaluations.
Human Evaluation: Conducted a user study (n=21) and a large-scale automated evaluation (using GPT-4-mini), demonstrating superior performance compared to the baseline and approaching human-level results on multiple metrics.
Cost-Effective Pipeline: Developed a computationally cheaper pipeline configuration maintaining acceptable tag quality.
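
The automated evaluation above relies on NPMI coherence and topic diversity. As a rough illustration of what these two metrics measure, here is a minimal sketch using their common textbook definitions on toy data. It is not the thesis implementation: the function names, the toy documents, and the document-level co-occurrence counting are assumptions for illustration, and how the thesis combines the two metrics into a single score is not shown here.

```python
import math
from itertools import combinations


def topic_diversity(topics):
    """Fraction of unique words among the top words of all topics."""
    all_words = [word for topic in topics for word in topic]
    return len(set(all_words)) / len(all_words)


def npmi(word_a, word_b, documents):
    """NPMI of two words, estimated from document-level co-occurrence.

    `documents` is a list of token sets (an assumption of this sketch).
    """
    n = len(documents)
    p_a = sum(word_a in doc for doc in documents) / n
    p_b = sum(word_b in doc for doc in documents) / n
    p_ab = sum(word_a in doc and word_b in doc for doc in documents) / n
    if p_ab == 0:
        return -1.0  # the words never co-occur
    if p_ab == 1:
        return 1.0  # the words co-occur everywhere; avoids dividing by log(1) = 0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(topic, documents):
    """Mean NPMI over all word pairs of a single topic."""
    pairs = list(combinations(topic, 2))
    if not pairs:
        return 0.0  # a one-word topic has no pairs to score
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


# Toy corpus and topics, invented for this example.
docs = [
    {"image", "pixel", "label"},
    {"image", "pixel"},
    {"gene", "protein"},
    {"gene", "protein", "label"},
]
topics = [["image", "pixel"], ["gene", "protein"]]

diversity = topic_diversity(topics)
coherence = sum(topic_coherence(t, docs) for t in topics) / len(topics)
print(f"diversity={diversity:.3f} coherence={coherence:.3f}")
```

Both metrics are higher-is-better: diversity penalizes topics that repeat the same words, while NPMI rewards topics whose words tend to appear in the same documents.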

Environment Setup

To set up the environment and install the necessary dependencies, please follow these steps.

Prerequisites

Make sure you have Anaconda or Miniconda installed on your system.

Steps

Create a new environment using the environment.yml file:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate openml-tags
```
Verify the environment is working as expected by running:
```
conda list
```

At this point, all required packages should be installed, and you can start using the repository and running the notebooks in the notebooks directory.
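
Beyond `conda list`, a quick check from inside Python can confirm that the right environment is active and that key packages are importable. The sketch below is only an illustration: the module names in `modules_to_check` are assumptions based on the tools mentioned in the summary (the actual contents of environment.yml may differ), and `CONDA_DEFAULT_ENV` is the variable conda sets on activation.

```python
import importlib.util
import os


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    return os.environ.get("CONDA_DEFAULT_ENV")


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed module names; adjust to match what environment.yml actually installs.
modules_to_check = ["bertopic", "umap"]

print("Active environment:", active_conda_env())  # expected: openml-tags
print("Missing modules:", missing_modules(modules_to_check))
```

An empty "Missing modules" list suggests the environment resolved correctly; pass only top-level package names, since dotted names require the parent package to be importable first.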

Folders and files

latex
mvp
notebooks
scrapy
src
.gitattributes
.gitignore
CITATION.CFF
README.md
baseline_survey.pdf
environment.yml
environment_octis.yml
environment_simple.yml
human_generated_survey.pdf
proposed_model_survey.pdf
