Thinking about AI safety
- Massachusetts
- UTC -04:00
- awestover.github.io
Pinned
DQN-maze-solver (public): Investigating whether RL agents can acausally collaborate with other instances of themselves.
Python 1
transformer-shortest-paths (public): Experimentally evaluating transformers' generalization on a synthetic task.
HTML 1
activation-steering-vs-prompting (public): Is activation steering more powerful than prompting at mitigating deception in some current reasoning LLMs?
Jupyter Notebook 1