Fishnet on Armv8 vs. x86_64
Fishnet is Stockfish for Lichess
Fishnet is a distributed system that runs Stockfish NNUE (Efficiently Updatable Neural Network) to analyse games on Lichess. People around the world can donate their CPU time to help others improve their chess: finding out where they blundered and getting better over time. Fishnet is not the first system designed to take advantage of spare consumer CPU cycles; SETI@home was, and Folding@home still is, among the biggest names (that I am aware of) to use this kind of distributed computing.
After I got my Fishnet API key, I could not wait to get it up and running. I have a few servers that currently participate in OpenNIC, run Tor relays, and relay email for me. All of those workloads are network-bound, so I have CPU cycles to spare.
Deploying Fishnet to those servers was incredibly easy, since I already have Ansible playbooks for deploying new Docker containers. I had also finally managed to get one of the newish Oracle A1 ARM “free tier” boxes. I'm not one to shy away from free tiers, and Oracle, being the smallest player in the space, has certainly made its free tier very inviting: 1x ARMv8 4-core, 24GB RAM, 45GB of storage. So once I had finished spamming the Launch instance button for a couple of days (there were no available instances in my “home region”), I finally got one, loaded my Ansible scripts, and proceeded to get a failing Docker container.
Yep, architecture mismatch. The Fishnet Docker images are built for x86_64, not Armv8. “Oh no, whatever shall I do?” I said facetiously. Just add a few extra lines to the Ansible role to build from source, and voilà.
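The fix boils down to picking an install method per architecture. Here is a minimal Python sketch of that decision, assuming (as described above) that the Docker image only covers x86_64 and that everything else falls back to a source build. `choose_install_method` is a hypothetical helper, not part of Fishnet or my Ansible role; it just uses the standard library's `platform.machine()`, which reports `x86_64` on the EPYC boxes and `aarch64` on the A1 under Linux.

```python
import platform

# Assumption: per the post, the Fishnet Docker image targets x86_64 only.
# "AMD64" is how Windows reports the same architecture.
DOCKER_ARCHES = {"x86_64", "amd64"}


def choose_install_method(machine: str) -> str:
    """Hypothetical helper: 'docker' for x86_64 hosts, 'source' for everything else (e.g. aarch64)."""
    if machine.lower() in DOCKER_ARCHES:
        return "docker"
    # Armv8 (aarch64/arm64) and any other architecture: build from source instead.
    return "source"


# Example: decide for the machine this script is running on.
print(platform.machine(), "->", choose_install_method(platform.machine()))
```

In an Ansible role the same branch would usually hang off the gathered architecture fact rather than a Python script; the sketch only shows the decision itself.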
Armv8 vs. x86_64 benchmark
To make the benchmarks as even as possible, each box had 1 core assigned to it (speeds are in knps, thousands of nodes per second).
- Box1: Engine: stockfish-x86-64-avx2 : ~386-400knps : CPU: AMD EPYC 7551
- Box2: Engine: stockfish-x86-64-avx2 : ~388-392knps : CPU: AMD EPYC 7551
- Box3: Engine: stockfish-x86-64-sse41-popcnt : ~489-534knps : CPU: Intel i7-3770K
- Box4: Engine: stockfish-armv8 : ~480-520knps : CPU: Oracle A1
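To compare the boxes at a glance, here is a small Python sketch that reduces each single-core range to its midpoint and expresses it relative to Box1. Using midpoints is my simplifying assumption, since the results above are ranges rather than averages; `midpoint` and `relative_to` are illustrative names.

```python
# Single-core ranges from the benchmark list above, in knps.
BENCHMARKS = {
    "Box1": (386, 400),  # EPYC 7551, stockfish-x86-64-avx2
    "Box2": (388, 392),  # EPYC 7551, stockfish-x86-64-avx2
    "Box3": (489, 534),  # i7-3770K, stockfish-x86-64-sse41-popcnt
    "Box4": (480, 520),  # Oracle A1, stockfish-armv8
}


def midpoint(lo: float, hi: float) -> float:
    """Assumed representative value for a reported knps range."""
    return (lo + hi) / 2


def relative_to(baseline: str, results: dict) -> dict:
    """Ratio of each box's midpoint knps to the baseline box's midpoint knps."""
    base = midpoint(*results[baseline])
    return {name: midpoint(*rng) / base for name, rng in results.items()}


for name, ratio in relative_to("Box1", BENCHMARKS).items():
    print(f"{name}: {ratio:.2f}x")
```

With these numbers the A1 core comes out at roughly 1.27x an EPYC 7551 core, just behind the i7 at about 1.30x.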
ARM showing up to the table with the big players is refreshing. Stockfish's NNUE evaluation, which Fishnet runs, is amazingly well optimised for performance. When I scaled the cores up, performance scaled quite linearly: the A1 processor averages around 850-870knps on 2 cores. Of course, going from 1 to 2 cores is hardly enough to claim linear scaling with any statistical confidence. A friend has one of the newish AMD Ryzen 9 5900X processors (12 cores, 24 threads), and being a great friend, they let me run Fishnet for an hour to put it through the wringer. With 23 “cores” running, we averaged 17000knps! That is ~740knps per core.
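One way to put a number on "quite linear" is scaling efficiency: measured throughput divided by what perfect linear scaling would predict (cores × single-core speed). The Python sketch below assumes the single-core A1 figure is the 500knps midpoint of 480-520 and the 2-core figure is the 860knps midpoint of 850-870. There is no single-thread 5900X baseline here, so for that chip it only computes per-thread throughput. Both function names are illustrative.

```python
def per_core(total_knps: float, cores: int) -> float:
    """Average throughput per core/thread."""
    return total_knps / cores


def scaling_efficiency(total_knps: float, cores: int, single_core_knps: float) -> float:
    """Measured throughput divided by perfect linear scaling (cores x single-core knps)."""
    return total_knps / (cores * single_core_knps)


# Oracle A1: assumed ~500 knps on 1 core, ~860 knps on 2 cores (range midpoints).
a1_eff = scaling_efficiency(860, 2, 500)
# Ryzen 9 5900X: ~17000 knps across 23 threads.
ryzen_per_thread = per_core(17000, 23)

print(f"A1 2-core efficiency: {a1_eff:.0%}")
print(f"5900X per thread: {ryzen_per_thread:.0f} knps")
```

A result of 1.0 (100%) would mean perfectly linear scaling; anything below that reflects shared resources such as caches, memory bandwidth, or SMT siblings competing for the same physical core.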
What makes these numbers significant is the time it takes to analyse a chess game. No one likes waiting too long for a game to be analysed after losing (or winning). So when you start Fishnet, your client is added to one of two pools of clients: the “user waiting on analysis” pool or the “other” pool. A turnaround of 6-8 seconds seems to be a good target; any longer and I would probably stop caring about the result and queue for another bullet game. Going by the Fishnet recommendation, nearly all CPUs are fast enough to help out, as they can reach the recommended ~2 meganodes (2,000,000 nodes) within 6 seconds, which works out to roughly 333knps.
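The arithmetic behind that recommendation is simple enough to sketch in Python. The 2,000,000-node target and 6-second window come from the paragraph above; the per-core speeds (393knps for an EPYC 7551 core, 500knps for an A1 core) are my assumed midpoints of the benchmark ranges, and the function names are illustrative.

```python
TARGET_NODES = 2_000_000  # ~2 meganodes, per the Fishnet recommendation
TARGET_SECONDS = 6.0


def seconds_to_target(knps: float, nodes: int = TARGET_NODES) -> float:
    """Seconds needed to search `nodes` nodes at `knps` thousand nodes per second."""
    return nodes / (knps * 1000)


def required_knps(nodes: int = TARGET_NODES, seconds: float = TARGET_SECONDS) -> float:
    """Minimum sustained speed, in knps, to hit `nodes` within `seconds`."""
    return nodes / seconds / 1000


print(f"Required: {required_knps():.0f} knps")
for name, knps in [("EPYC 7551 core", 393), ("Oracle A1 core", 500)]:
    print(f"{name}: {seconds_to_target(knps):.1f} s")
```

Even a single EPYC 7551 core clears the bar at about 5.1 seconds, and an A1 core needs about 4 seconds.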
I love chess, can I help out too?
If Docker is your thing, you can find my docker-compose file in my fishnet role on git. Otherwise, read more about contributing to Fishnet in its repository, niklasf/fishnet.
Since I have spare CPU capacity, it is a nice feeling to give back to a service I use daily. Contributing the free tier CPU of the cloud moguls to community-run services is the icing on the cake.