Improving Automatic Reviewer Assignment Using Large Language Models for the NBDT Journal
- Mentors
- Titipat Achakulvisut, Daniele Marinazzo, Björn Brembs
- Organization
- INCF
- Technologies
- python, tensorflow, numpy, pytorch, pandas, keras, HuggingFace, Google Colab, LLMs
- Topics
- machine learning, natural language processing
Matching papers to reviewers based on topics is a crucial task for the Neurons,
Behavior, Data Analysis, and Theory (NBDT) journal. However, the current
automatic reviewer assignment tool, which uses SciBERT embeddings, cosine
similarity, and linear programming, may not capture the semantic meaning of
the text accurately. This project aims to address that limitation by
fine-tuning SciBERT on a relevant corpus and selecting appropriate
optimization objectives. SciBERT learns from both the left and right
contexts of words and has a vocabulary better suited to scientific
text than BERT's. The project will involve creating and pre-processing a
training dataset, generating appropriate features, fine-tuning SciBERT,
generating embeddings from the fine-tuned model, choosing suitable
optimization objectives (Contrastive Learning, Learning to Rank Diversely,
or LambdaRank) for the resulting dataset, and evaluating the tool's
performance. The expected
outcome is an improved tool that more accurately matches papers to
reviewers for the NBDT journal and can potentially be useful in other domains
as well.
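The current pipeline described above can be sketched roughly as follows: embed papers and reviewer profiles (here as toy 2-D vectors standing in for SciBERT embeddings), score every paper–reviewer pair by cosine similarity, and solve the assignment as a linear program. The function names and toy data are illustrative assumptions, not the journal's actual implementation.

```python
# Sketch of embedding-similarity reviewer assignment, assuming papers and
# reviewers are already embedded (e.g. with SciBERT). Toy vectors only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_similarity_matrix(paper_vecs, reviewer_vecs):
    """Pairwise cosine similarity between rows of the two matrices."""
    p = paper_vecs / np.linalg.norm(paper_vecs, axis=1, keepdims=True)
    r = reviewer_vecs / np.linalg.norm(reviewer_vecs, axis=1, keepdims=True)
    return p @ r.T

def assign_reviewers(paper_vecs, reviewer_vecs):
    """Map each paper index to a reviewer index, maximizing total similarity
    under a one-reviewer-per-paper constraint (linear assignment problem)."""
    sim = cosine_similarity_matrix(paper_vecs, reviewer_vecs)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return dict(zip(rows.tolist(), cols.tolist()))

# Toy example: 2 papers and 3 reviewers in a 2-D "embedding" space.
papers = np.array([[1.0, 0.0], [0.0, 1.0]])
reviewers = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
print(assign_reviewers(papers, reviewers))  # paper 0 -> reviewer 0, paper 1 -> reviewer 2
```

Fine-tuning SciBERT with a contrastive or ranking objective would change only the embeddings fed into this step; the similarity and assignment machinery stays the same.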