[D] Using RLHF beyond preference tuning


trl

13 8,412 9.6 Python

Train transformer language models with reinforcement learning.

The trl repository includes an example that makes GPT output more positive by using a sentiment model as the reward. There are other examples covering toxicity reduction and summarization here: https://github.com/lvwerra/trl/tree/main/examples. It should be fairly simple to modify the sentiment example and try the calculator reward you mentioned above.
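The calculator reward from the original post is not reproduced on this page, so the following is only a rough sketch of what such a reward might look like, assuming the prompt is a plain arithmetic expression and the reward is 1.0 when the first number in the model's response matches its value. The names `safe_eval`, `calculator_reward`, and `batch_rewards` are hypothetical and are not part of trl; how the resulting scores are handed to the trainer depends on the trl version and is not shown here.

```python
import ast
import operator
import re

# Binary operators the hypothetical checker is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def safe_eval(expr: str) -> float:
    """Evaluate a basic arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return _eval(ast.parse(expr.strip(), mode="eval"))


def calculator_reward(query: str, response: str, tol: float = 1e-6) -> float:
    """Return 1.0 if the first number in `response` equals the value of `query`, else 0.0."""
    try:
        target = safe_eval(query)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0  # Unparseable or undefined prompts earn no reward.
    match = _NUMBER.search(response)
    if match is None:
        return 0.0
    return 1.0 if abs(float(match.group()) - target) <= tol else 0.0


def batch_rewards(queries, responses):
    """Score a batch of (query, response) pairs, one scalar reward per pair."""
    return [calculator_reward(q, r) for q, r in zip(queries, responses)]
```

In the sentiment example the scalar reward comes from a classifier's output; under the assumptions above, a function like `calculator_reward` would take that role, producing one score per generated response.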


This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning. Post date: 14 Apr 2023
