Today we are witnessing the remarkable success of Large Language Models (LLMs). The latest generation of LLMs can maintain meaningful dialogue with a human, solve logical and mathematical tasks, write essays, reviews, and letters, summarize texts, and even write code in programming languages.
The impressive capabilities of LLMs are largely attributed to Reinforcement Learning from Human Feedback (RLHF), which enables them to excel at a wide range of tasks.
Initially, the LLM (a transformer) is pretrained on a vast collection of unstructured and unlabeled texts; fine-tuning through RLHF then further enhances its performance.
The major challenges of Reinforcement Learning from Human Feedback (RLHF)
Second, we extract from the user's text references to specific games, as well as criteria that the user wants the game to satisfy. We then compute similarity scores to the mentioned games and criteria-matching scores (when criteria are present in the user's cues) for the games in the database.
The main score is the semantic similarity between game descriptions in the database and the user’s text; it is calculated using embeddings from the Sentence Transformer model.
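The sketch below illustrates how such a semantic-similarity score can be computed with the sentence-transformers library. The model checkpoint, the sample descriptions, and the user query are illustrative assumptions, not the exact configuration of our system.

```python
# Minimal sketch of the main semantic-similarity score, assuming the
# "all-MiniLM-L6-v2" checkpoint and an in-memory list of game descriptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

# Hypothetical game descriptions from the database.
game_descriptions = [
    "A cooperative deck-building game about exploring dungeons.",
    "A fast-paced party game built around word association.",
]

user_text = "I want a cooperative fantasy game to play with friends."

# Encode the user text and all game descriptions into dense vectors.
game_embeddings = model.encode(game_descriptions, convert_to_tensor=True)
user_embedding = model.encode(user_text, convert_to_tensor=True)

# Cosine similarity between the user text and each game description.
semantic_scores = util.cos_sim(user_embedding, game_embeddings)[0]
print(semantic_scores.tolist())
```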
In addition, we use the games' ratings and the semantic similarity to specific parts of the user's input that correspond to particular aspects of the request. The weighted sum of these scores is called the combo score.
All games in the database are ranked by this combo score, and the top-k games in the ranking are recommended to the user.
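The following sketch shows one way the combo score and top-k selection could be implemented. The weights, the set of partial scores, and the score names are illustrative assumptions rather than the exact values used in our system.

```python
# Sketch of the combo score (weighted sum of partial scores) and top-k ranking.
import numpy as np

def combo_scores(semantic, rating, criteria_match, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of per-game partial scores (all arrays of shape [n_games])."""
    w_sem, w_rat, w_crit = weights  # hypothetical weights
    return w_sem * semantic + w_rat * rating + w_crit * criteria_match

def recommend_top_k(scores, game_ids, k=5):
    """Rank all games by combo score and return the k best game ids."""
    order = np.argsort(scores)[::-1]  # game indices, best first
    return [game_ids[i] for i in order[:k]]

# Toy example with pre-normalized scores for three games.
semantic = np.array([0.81, 0.42, 0.67])   # similarity to the user's text
rating = np.array([0.90, 0.70, 0.80])     # game rating rescaled to [0, 1]
criteria = np.array([1.0, 0.0, 0.5])      # fraction of user criteria matched
scores = combo_scores(semantic, rating, criteria)
print(recommend_top_k(scores, game_ids=["game_a", "game_b", "game_c"], k=2))
```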