Eager to Play TicTacToe?

The package neckar contains a fully functional reinforcement learning model for the game TicTacToe that you can train yourself and play against. Here’s how you can simply download and install the result if you have a gitlab account (and can set up a personal access token…):

pip install neckar --extra-index-url https://__token__:personalaccesstoken@gitlab.com/api/v4/projects/37714906/packages/pypi/simple

… and if you don’t: Download the wheel from this link, cd to the directory containing the wheel and run

pip install neckar-0.1.1-py3-none-any.whl

In both cases, you then start playing against the trained model by running

reilingen tictactoe play

The standard model was trained for 20000 rounds before shipping. If the computer is too good for you, you can dumb it down like so:

reilingen tictactoe train --rounds=500

and you should be able to win a game, even though the computer always starts.

How it’s done: Reinforcement Learning

The main part of the code, residing in src/neckar/tictactoe.py, is essentially copied directly from here. The only two major changes I made were to the data structures used: data frames instead of numpy arrays. This is to support the second change: the winning condition is evaluated with regexes rather than with sums. I plan on extending the package to include connect four, and wanted to streamline the detection of winning patterns.
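To illustrate the regex idea, here is a minimal sketch (the function names are hypothetical, not the actual neckar code): every row, column and diagonal is flattened into a string and searched for a streak. A streak length of 3 gives TicTacToe; raising it to 4 is the first step towards connect four.

import re
import pandas as pd

def lines(board: pd.DataFrame):
    """Yield every row, column and diagonal of the board as one string."""
    for _, row in board.iterrows():
        yield "".join(row)
    for _, col in board.items():
        yield "".join(col)
    n = len(board)
    yield "".join(board.iat[i, i] for i in range(n))          # main diagonal
    yield "".join(board.iat[i, n - 1 - i] for i in range(n))  # anti-diagonal

def winner(board: pd.DataFrame, streak: int = 3):
    """Return "X" or "O" if either has a streak in any line, else None."""
    for line in lines(board):
        match = re.search(rf"X{{{streak}}}|O{{{streak}}}", line)
        if match:
            return match.group()[0]
    return None

board = pd.DataFrame([["X", "O", "O"],
                      ["O", "X", "."],
                      [".", ".", "X"]])
print(winner(board))  # -> "X", via the main diagonal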

The mathematical background to the reinforcement learning part is temporal difference learning as described on the wiki. The main formula consists in updating the state values after each run with

V(s_t) ← V(s_t) + α · (γ · reward − V(s_t))

where α is the learning rate and γ the decay factor. In the code, this line reads

self.states_value[st] += self.lr * (self.decay_gamma * reward - self.states_value[st])

and its corresponding method takes up just five of the ~300 lines of code.
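For context, that method typically looks like this (a sketch following the tutorial the code is based on; the exact implementation in neckar may differ): after a finished game, the visited states are walked backwards and the final reward decays through them.

def feed_reward(self, reward):
    """Propagate the end-of-game reward back through the visited states."""
    for st in reversed(self.states):          # states recorded during the game
        if self.states_value.get(st) is None:
            self.states_value[st] = 0
        # the temporal difference update quoted above
        self.states_value[st] += self.lr * (self.decay_gamma * reward - self.states_value[st])
        reward = self.states_value[st]        # new value becomes the next target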

How it’s done: Package set up

As can be seen from the README, the package was set up with PyScaffold. In order to start with a fresh scaffold, I simply ran

pip install pyscaffold
putup neckar

and received a working skeleton of a Python package, with the right structure and some dummy files. From there, I simply replaced the reinforcement learning code in the right places and added sensible unit tests in ./tests for my new winning detection.
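Such a test might look like this (a hypothetical example; the function name winner and the import path are assumptions, not the actual test suite):

import pandas as pd
from neckar.tictactoe import winner  # assumed name of the detection function

def test_detects_row_win():
    board = pd.DataFrame([["X", "X", "X"],
                          ["O", "O", "."],
                          [".", ".", "."]])
    assert winner(board) == "X"

def test_empty_board_has_no_winner():
    board = pd.DataFrame([["."] * 3] * 3)
    assert winner(board) is None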

How it’s done: CLI with fire

The skeleton provided by PyScaffold already comes with an entry point, see setup.cfg:

[options.entry_points]
console_scripts =
     reilingen = neckar.main:run

I use fire to expose the functions play and train to the CLI. This is implemented in main.py and lets the user interact with the application, and even retrain it, as described in Eager to Play TicTacToe?
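A minimal sketch of what main.py could look like (the real file may differ; fire maps the nested structure onto the subcommands reilingen tictactoe play and reilingen tictactoe train):

import fire
from neckar import tictactoe  # module exposing play() and train()

def run():
    # fire turns the dict key into a command group and the module's
    # functions into subcommands with automatic --flag parsing
    fire.Fire({"tictactoe": tictactoe})

if __name__ == "__main__":
    run()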

How it’s done: CI/CD in gitlab

The step from a PyScaffold setup to a CI/CD pipeline in gitlab is simple but has many positive consequences. The pipeline is defined in .gitlab-ci.yml and runs any time there is a new commit or a new tag in the package in gitlab. The run step contains the lines

python -m build
python -m twine upload --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi dist/*

and therefore an entire wheel is built any time a new commit is created. The final packages lie in the package registry and are publicly available as shown in the section Eager to Play TicTacToe?
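Put together, a minimal publishing job in .gitlab-ci.yml could look like this (the job name, image and token-based credentials are assumptions; the actual file may differ):

publish:
  image: python:3.9
  script:
    - pip install build twine
    - python -m build
    # CI_JOB_TOKEN authenticates against the project's own package registry
    - TWINE_USERNAME=gitlab-ci-token TWINE_PASSWORD=${CI_JOB_TOKEN} python -m twine upload --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi dist/*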