
Learning from human preferences
Leveraging insights from human preferences to enhance user experiences.

Learning to cooperate, compete, and communicate
Exploring the dynamics of cooperation, competition, and communication.

UCB exploration via Q-ensembles
Using Q-ensembles to drive upper-confidence-bound (UCB) exploration.

OpenAI Baselines: DQN
Exploring the capabilities of OpenAI Baselines, with a focus on the DQN algorithm.

Robots that learn
Exploring the cutting-edge advancements in machine learning for robotics.

Roboschool
Exploring the future of robotics through interactive learning environments.

When Bash Scripts Bite
The potential pitfalls of shell scripts and the common warnings against using them.

Looking for a technical writer
An update on the hiring process: the technical writer position has been filled.

Caveat Configurator: how to replace configs with code, and why you might not want to
Replacing configs with code, and the downsides of doing so.

Equivalence between policy gradients and soft Q-learning
Exploring the equivalence between policy gradients and soft Q-learning.

This is not the performance you were looking for: the tricks systems play on us
How deployment choices affect software performance, and how scheduling policy, affinity, or background workload on a server can erase optimization efforts.

Stochastic Neural Networks for hierarchical reinforcement learning
Exploring the application of stochastic neural networks in hierarchical reinforcement learning.