Eager to Play TicTacToe?
The package neckar contains a fully functional reinforcement learning model that you can train yourself and play against in the game TicTacToe. Here’s how to simply download and install the result if you have a gitlab account (and can set up a personal access token…):
pip install neckar --extra-index-url https://__token__:personalaccesstoken@gitlab.com/api/v4/projects/37714906/packages/pypi/simple
… and if you don’t: download the wheel from this link, cd to the directory containing the wheel and run
pip install neckar-0.1.1-py3-none-any.whl
In both cases, you then start playing against the trained model by running
reilingen tictactoe play
The standard model was trained for 20,000 rounds before shipping. If the computer is too good for you, you can dumb it down like so:
reilingen tictactoe train --rounds=500
and you should be able to win a game, even though the computer always starts.
How it’s done: Reinforcement Learning
The main part of the code, residing in src/neckar/tictactoe.py, is essentially copied from here. I made only two major changes. The first is the data structure used: data frames instead of numpy arrays. This is to support the second change: the winning condition is evaluated with regexes rather than with sums. I plan on extending the package to include connect four, and wanted to streamline the detection of winning patterns.
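To make the idea concrete, here is a minimal sketch of how a regex-based winning check on a pandas data frame could look. The function name check_winner and the board encoding ('1' and '2' for the players, '0' for empty fields) are assumptions for this illustration, not necessarily the package’s actual API.

import re

import pandas as pd


def check_winner(board: pd.DataFrame):
    """Return 1 or 2 if that player owns a full line, else None.

    The 3x3 board holds the strings '1', '2' or '0' (empty). Every row,
    column and diagonal is flattened into a string and matched against
    a regex that looks for three identical player marks.
    """
    lines = []
    for i in range(3):
        lines.append("".join(board.iloc[i, :]))   # rows
        lines.append("".join(board.iloc[:, i]))   # columns
    lines.append("".join(board.iloc[i, i] for i in range(3)))      # main diagonal
    lines.append("".join(board.iloc[i, 2 - i] for i in range(3)))  # anti-diagonal
    for line in lines:
        match = re.fullmatch(r"([12])\1\1", line)
        if match:
            return int(match.group(1))
    return None


board = pd.DataFrame([["1", "1", "1"],
                      ["2", "2", "0"],
                      ["0", "0", "0"]])
print(check_winner(board))  # prints 1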
The mathematical background to the reinforcement learning part is temporal difference learning as described on the wiki. The main formula consists in updating the state values after each run using the update rule

$$V(S_t) \leftarrow V(S_t) + \alpha \, \big( \gamma \, R - V(S_t) \big),$$

where $\alpha$ is the learning rate, $\gamma$ the decay factor and $R$ the reward propagated back from the end of the game.
In the code, this line reads
self.states_value[st] += self.lr * (self.decay_gamma * reward - self.states_value[st])
and its corresponding method takes up just five of the roughly 300 lines of code.
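To see that line in context, here is a sketch of how the surrounding reward-propagation method could look, following the tutorial the code is based on; the method name feedReward and the attribute self.states (the list of board states visited during the game) are assumptions for this illustration.

def feedReward(self, reward):
    # Walk through the states of the finished game in reverse order and
    # propagate the final reward back with the temporal difference update.
    for st in reversed(self.states):
        if self.states_value.get(st) is None:
            self.states_value[st] = 0
        self.states_value[st] += self.lr * (self.decay_gamma * reward - self.states_value[st])
        # The updated value of this state serves as the reward for the
        # state that preceded it.
        reward = self.states_value[st]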
How it’s done: Package setup
As can be seen from the README, the package was set up with PyScaffold. In order to start with a fresh scaffold, I simply ran
pip install pyscaffold
putup neckar
and received a working skeleton of a Python package, with the right structure and some dummy files. From there, I simply replaced the reinforcement learning code at the right places and added sensible unit tests in ./tests for my new winning detection.
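As an illustration, such a unit test for the regex-based winning detection could look roughly as follows; the imported name check_winner and the board representation are the same assumptions as in the sketch above and may differ from the package’s actual internals.

import pandas as pd

from neckar.tictactoe import check_winner  # hypothetical name, see the sketch above


def test_row_win_is_detected():
    board = pd.DataFrame([["1", "1", "1"],
                          ["2", "2", "0"],
                          ["0", "0", "0"]])
    assert check_winner(board) == 1


def test_empty_board_has_no_winner():
    board = pd.DataFrame([["0"] * 3 for _ in range(3)])
    assert check_winner(board) is None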
How it’s done: CLI with fire
The skeleton provided by PyScaffold already comes with an entry point; see setup.cfg:
[options.entry_points]
console_scripts =
    reilingen = neckar.main:run
I use fire to expose the functions play and train to the CLI. This lets the user interact with the application and even train it; it is implemented in main.py and allows for the usage described in Eager to Play TicTacToe?
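For orientation, a minimal sketch of what such a main.py could look like is given below. Only the exposed names play and train and the entry point reilingen = neckar.main:run are taken from the package; the internal layout (a tictactoe module offering play and train functions) is an assumption for this illustration.

import fire

from neckar import tictactoe  # assumed module layout


def run():
    # fire turns the nested dict into sub-commands, so the console script
    # `reilingen` accepts calls such as `reilingen tictactoe play` or
    # `reilingen tictactoe train --rounds=500`.
    fire.Fire({
        "tictactoe": {
            "play": tictactoe.play,
            "train": tictactoe.train,
        }
    })


if __name__ == "__main__":
    run()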
How it’s done: CI/CD in gitlab
The step from a PyScaffold setup to a CI/CD pipeline in gitlab is simple but has many, many positive consequences. The pipeline is defined in .gitlab-ci.yml, and it is run any time there is a new commit or a new tag in the package in gitlab. The step run contains the lines
python -m build
python -m twine upload --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi dist/*
and therefore a complete wheel is built any time a new commit is created. The final packages end up in the package registry and are publicly available as shown in the section Eager to Play TicTacToe?
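For orientation, a minimal .gitlab-ci.yml around such a step could look roughly like this. The image, the stage layout and the job name run are assumptions for this sketch; only the build and upload commands above are taken from the actual pipeline, and the TWINE_* variables follow GitLab’s documented way of authenticating against a project’s package registry from CI.

image: python:3.10

stages:
  - deploy

run:
  stage: deploy
  variables:
    # Authenticate twine against the project's package registry with the
    # job token provided by gitlab.
    TWINE_USERNAME: gitlab-ci-token
    TWINE_PASSWORD: ${CI_JOB_TOKEN}
  script:
    - pip install build twine
    - python -m build
    - python -m twine upload --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi dist/*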