Posted 2022-10-03Updated 2022-10-15Data Science

Houyi Project Technical Report - How to Win an Intelligent War in Apex Legends

HOUYI is positioned as a general-purpose aiming assistant for electronic shooting games. It currently supports Apex Legends, abbreviated below as apex. Its runtime framework is a two-stage pipeline of detection and tracking: first, object detection is used to identify enemies on the screen, and then the crosshair position is computed and corrected automatically. The mission of HOUYI is to use computer technology to resist the alienation imposed by electronic shooting games.

When nothing is stored within, forms reveal themselves. Its movement is like water, its stillness like a mirror, its response like an echo.

Zhuangzi, “Under Heaven”

Forms Reveal Themselves

Deep learning algorithms represented by YOLO have excellent performance and mature deployment workflows, making them the mainstream choice for object detection in computer vision today. Houyi uses the PaddlePaddle framework from Baidu to train detection models, and uses Microsoft ONNX Runtime together with Nvidia TensorRT for deployment.

Data Understanding

The main difficulties of object detection in apex are:

Apex has many playable characters, many skins, and complex visual effects
Distinguishing enemies from teammates is difficult
Distant targets, over 30 meters away, occupy very few pixels on screen, which makes this a small-object detection problem
Real-time inference is required even while mainstream GPUs are under game load

Example from the center region of the screen:

Target region:

Data Collection and Annotation

Overall data pipeline HOUYI uses about 13k manually annotated samples as the initial dataset. After the first model version is trained and put into production, online real-world samples are collected to expand the dataset, enabling collaborative evolution between client and cloud.

Data collection strategy HOUYI implements a real-time uniform sampling strategy for screen images under different states, namely hip fire, ADS firing, and ADS aiming. This reduces information loss across states as much as possible and keeps the training distribution favorable for learning and aligned with actual gameplay.

Mode	Sampling Region	Sampling Ratio	Compression
Hip Fire	1080x1080	1/100	Lossless
ADS Firing	640x640	1/100	Lossless
ADS Aiming	640x640	1/500	Lossless

Feature consistency HOUYI ensures online and offline feature consistency in two ways:

Integrated deployment and collection. During live use, before each model inference there is a small probability that the image is cached and uploaded to the server after the session ends. In addition, each crosshair correction record is uploaded as reference material for control algorithm research.
Lossless compression format. PNG is used to guarantee pixel-level consistency.

Automatic annotation HOUYI uses the offline model with the most parameters and highest accuracy to pre-label the online collected samples. After manual inspection and fine-tuning, the samples are added to the training set.

Object Detection Algorithm

PP-YOLOE+ HOUYI mainly uses PP-YOLOE+ as the online inference model. PP-YOLOE+ is an excellent one-stage anchor-free model based on PP-YOLOv2. Its accuracy-fps tradeoff surpasses many popular YOLO variants. On top of that, PP-YOLOE+ converges in only 80 epochs, which greatly reduces training time and cost compared with the more common 300-epoch setting.

Model	Epoch	AP^0.5:0.95	AP^0.5	AP^0.75	AP^small	AP^medium	AP^large	AR^small	AR^medium	AR^large
PP-YOLOE+_s	80	43.7	60.6	47.9	26.5	47.5	59.0	46.7	71.4	81.7
PP-YOLOE+_m	80	49.8	67.1	54.5	31.8	53.9	66.2	53.3	75.0	84.6
PP-YOLOE+_l	80	52.9	70.1	57.9	35.2	57.5	69.1	56.0	77.9	86.9
PP-YOLOE+_x	80	54.7	72.0	59.9	37.9	59.3	70.4	57.0	78.7	87.2

Detection examples:

It Moves Like Water

Once a target has been detected on the screen, the next step is to drag the crosshair onto it, which is actually quite difficult. Player input, weapon recoil, and irregular enemy motion all create control error. In addition, server latency, inference latency, and input latency mean that simply moving toward the target may only chase where the enemy used to be. This problem cannot be solved completely by program optimization alone, so a dedicated control algorithm is needed for a better user experience.

PID

Closed-loop control corrects error using feedback from the controlled system. Moving the crosshair can also be viewed as a closed-loop control problem: take the current crosshair position as the origin, then the enemy coordinate at time $t$ is the error $e(t)$. If the enemy is currently at $(100, 200)$, then the most naive idea is to move the mouse by $(100, 200)$. But such teleport-like behavior is unnatural and not optimal. A more natural idea is to introduce the most mature and widely used closed-loop control algorithm from continuous systems: PID (proportional-integral-derivative controller).

$$ u(t)=K_{\mathrm{p}} e(t)+K_{\mathrm{i}} \int_{0}^{t} e(\tau) \mathrm{d} \tau+K_{\mathrm{d}} \frac{\mathrm{d} e(t)}{\mathrm{d} t} $$

The first term is the proportional term, the second is the integral term, and the third is the derivative term.

The first term is easy to understand: the larger the error, the larger the required correction. But a pure proportional controller quickly runs into trouble. If an enemy moves to the right at a constant speed of 10 and we set $K_p=0.5$, then the steady state becomes $e(t)=20, u(t)=10$.

At that point an integral term is needed to eliminate steady-state error. For example, if the accumulated error over the past second is 20 and we set $K_i=0.25$, then $u(t)=200.5+200.25=15$, which exceeds the enemy’s movement speed and can reduce the distance to the target. A controller with integral action is called a PI controller.

But if i is too small, convergence is slow. If i is too large, jitter appears and convergence also slows down. The best effect is a slight overshoot:

The final derivative term mainly reduces oscillation during control and mitigates overshoot. Since $\mathrm{d}e(t)$ is generally negative, the derivative term acts like a brake. It reduces the pulling amplitude in advance and prevents the crosshair from overshooting the target. When PID is tuned properly:

The hard parts of applying PID are:

The enemy’s deliberately irregular motion during gunfights cannot be predicted by PID, so compensation is limited
Parameter tuning is difficult, which is the classic headache of PID
Different shooting scenarios require different PID parameters

PVAR: vector autoregressive proportional control

If I said that the main reason I wanted to go on the Journey to the West was to visit the Women’s Kingdom, and that reaching India just happened along the way, you probably would not believe me either.

Journey to the West Diary

When I talk about aim assist, what I really want to talk about is time-series forecasting.
In a typical correction process, $e(t)$ decreases rapidly toward zero at first, and then oscillates left and right because of recoil and other factors. The following table shows time-series data from one aiming sequence. enemy_box and confidence are the enemy bounding box and confidence output by the model. ex and ey are the enemy’s horizontal and vertical distances from the crosshair, and ux and uy are the mouse movement commands output by the controller.

	enemy_box_x1	enemy_box_y1	enemy_box_x2	enemy_box_y2	confidence	ex	ey	ux	uy
0	266.21786	304.98212	289.40607	362.9163	0.548098	-42.188035	13.94921	-25.31283	8.369531
1	261.20413	304.77383	285.12802	362.66193	0.705141	-46.833925	13.71788	-29.106307	8.525386
2	273.99313	297.0426	297.64008	360.8898	0.700376	-34.183395	8.9662	-22.454103	5.920422
3	289.20618	288.6216	315.9204	360.6148	0.700991	-17.43671	4.6182	-12.794311	3.41447
4	304.0824	283.4045	331.3549	357.4968	0.622412	-2.28135	0.45065	-3.746427	0.922879
5	311.96606	282.16705	340.17767	357.0965	0.715523	6.071865	-0.368225	1.40623	0.423019
6	313.57156	282.32294	341.7551	357.35828	0.734364	7.66333	-0.15939	2.544032	0.544513
7	313.0578	282.24142	341.4103	357.54837	0.754622	7.23405	-0.105105	2.42217	0.575116
8	311.94815	282.4199	340.3355	356.5956	0.768114	6.141825	-0.49225	2.91703	0.036619
9	311.10672	282.8336	339.47678	356.11987	0.770691	5.29175	-0.523265	2.516004	0.007234
10	309.4765	282.83307	337.44662	356.76465	0.778666	3.46156	-0.20114	2.424042	-0.049513
11	305.26862	290.59354	332.79498	361.52643	0.760621	-0.9682	6.059985	0.129709	3.759036
12	303.50372	291.24722	333.49023	363.27325	0.66165	-1.503025	7.260235	-0.325249	4.665564

Across thousands of measured aiming records, the median $e_x$ and $e_y$ sequences passed the unit-root stationarity test ADFULLER (p<0.0002). Therefore, HOUYI not only corrects the current $e_t$, but also uses a vector autoregression model, VAR, to predict $\hat e_{t+1}$.

From the perspective of vector autoregression,

$$ e_{t}=c+A_{1} e_{t-1}+A_{2} e_{t-2}+\cdots+A_{p} e_{t-p} $$

We then inject prior knowledge to adjust the model:

The current error has a linear relationship with the previous movement command $U_{t-1}$, so a new explanatory variable is introduced
The error does not contain a constant term

The final model used by HOUYI is

$$ e_{t}=A_{1} e_{t-1}+A_{2} e_{t-2}+\cdots+A_{p} e_{t-p} +A_u U_{t-1} $$

VAR and the proportional controller together form HOUYI’s main control algorithm, PVAR.

It Responds Like An Echo

Performance optimization is both the most important and the most labor-intensive part of HOUYI. Low-latency, high-throughput image processing is the foundation of the final experience.

Screen capture There are many ways to capture the screen, but the most efficient one is DXGI, which uses the GPU to process textures directly. Capture speed is therefore almost limited only by GPU rendering speed. On top of that, HOUYI specifically optimizes memory efficiency for partial capture: only data from the required region is read, and data type conversion is performed only for the target area.

Preprocessing The main performance killer during preprocessing is still memory access efficiency. HOUYI removes all unnecessary memory copies and allocates a fixed region in GPU memory for model input, roughly doubling the number of corrections per second.

Inference engine The model is exported to ONNX and then deployed with Nvidia’s black-tech TensorRT.

The core of NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine that performs inference for that network. TensorRT provides API’s via C++ and Python that help to express deep learning models via the Network Definition API or load a pre-defined model via the parsers that allow TensorRT to optimize and run them on an NVIDIA GPU. TensorRT applies graph optimizations, layer fusion, among other optimizations, while also finding the fastest implementation of that model leveraging a diverse collection of highly optimized kernels. TensorRT also supplies a runtime that you can use to execute this network on all of NVIDIA’s GPUs from the Kepler generation onwards. TensorRT also includes optional high speed mixed precision capabilities introduced in the Tegra X1, and extended with the Pascal, Volta, Turing, and NVIDIA Ampere GPU architectures.

It is worth mentioning that TensorRT performs hardware-specific optimization for different GPUs, so the first startup requires about 15 minutes to rebuild the model runtime. In this application scenario, that cost is negligible.

Postscript

If you want to use cheats for the pleasure of casually deleting other players, do not use Houyi;

If you do not approve of using external power to break through human limits, do not use Houyi;

If you think it is cool to use algorithms and compute to surpass human limits when processing the same input data, and if you want to use computer technology to resist the alienation imposed by electronic shooting games, then may the Houyi Project help you win an intelligent war.

References

Houyi Project Technical Report - How to Win an Intelligent War in Apex Legends

https://en.heth.ink/houyi/

Author

Posted on

2022-10-03

Updated on

2022-10-15

Houyi Project Technical Report - How to Win an Intelligent War in Apex Legends

Forms Reveal Themselves

Data Understanding

Data Collection and Annotation

Object Detection Algorithm

It Moves Like Water

PID

PVAR: vector autoregressive proportional control

It Responds Like An Echo

Postscript

Author

Posted on

Updated on

Licensed under

Catalogue

Categories

Recents