Houyi Project Technical Report - How to Win an Intelligent War in Apex Legends

HOUYI is positioned as a general-purpose aiming assistant for electronic shooting games. It currently supports Apex Legends, abbreviated below as apex. Its runtime framework is a two-stage pipeline of detection and tracking: first, object detection is used to identify enemies on the screen, and then the crosshair position is computed and corrected automatically. The mission of HOUYI is to use computer technology to resist the alienation imposed by electronic shooting games.

When nothing is stored within, forms reveal themselves. Its movement is like water, its stillness like a mirror, its response like an echo.

  • Zhuangzi, “Under Heaven”

Forms Reveal Themselves

Deep learning algorithms represented by YOLO have excellent performance and mature deployment workflows, making them the mainstream choice for object detection in computer vision today. Houyi uses the PaddlePaddle framework from Baidu to train detection models, and uses Microsoft ONNX Runtime together with Nvidia TensorRT for deployment.

Data Understanding

The main difficulties of object detection in apex are:

  • Apex has many playable characters, many skins, and complex visual effects
  • Distinguishing enemies from teammates is difficult
  • Distant targets, over 30 meters away, occupy very few pixels on screen, which makes this a small-object detection problem
  • Real-time inference is required even while mainstream GPUs are under game load

Example from the center region of the screen:

Target region:

Data Collection and Annotation

Overall data pipeline: HOUYI uses about 13k manually annotated samples as the initial dataset. After the first model version is trained and put into production, online real-world samples are collected to expand the dataset, enabling collaborative evolution between client and cloud.

Data collection strategy: HOUYI implements a real-time uniform sampling strategy for screen images under different states, namely hip fire, ADS firing, and ADS aiming. This reduces information loss across states as much as possible and keeps the training distribution favorable for learning and aligned with actual gameplay.

Mode       | Sampling Region | Sampling Ratio | Compression
Hip Fire   | 1080x1080       | 1/100          | Lossless
ADS Firing | 640x640         | 1/100          | Lossless
ADS Aiming | 640x640         | 1/500          | Lossless
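The uniform sampling above can be sketched as an independent per-frame draw at each mode's ratio. A minimal sketch: the mode names and ratios come from the table, but the function and dictionary names are illustrative, not HOUYI's actual code.

```python
import random

# Per-mode sampling ratios from the table above: on average,
# one frame is kept per N frames observed in that mode.
SAMPLE_RATIO = {
    "hip_fire": 1 / 100,
    "ads_firing": 1 / 100,
    "ads_aiming": 1 / 500,
}

def should_sample(mode: str, rng: random.Random) -> bool:
    """Decide whether the current frame is cached for later upload."""
    return rng.random() < SAMPLE_RATIO[mode]

# Over many frames, roughly 1% of hip-fire frames are kept.
rng = random.Random(0)
kept = sum(should_sample("hip_fire", rng) for _ in range(100_000))
```

An independent Bernoulli draw per frame gives uniform coverage of each state without buffering frames; stride-based or reservoir sampling would work too.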

Feature consistency: HOUYI ensures online and offline feature consistency in two ways:

  • Integrated deployment and collection: during live use, each frame has a small probability of being cached before model inference and uploaded to the server once the session ends. Each crosshair-correction record is also uploaded as reference material for control-algorithm research.
  • Lossless compression format. PNG is used to guarantee pixel-level consistency.
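PNG's pixel-level guarantee comes from its compression being exactly invertible. A minimal illustration using zlib, the same DEFLATE codec PNG uses internally; the frame buffer here is a synthetic stand-in, not a real capture.

```python
import zlib

# Stand-in for a raw frame buffer; real frames come from screen capture.
frame = bytes(range(256)) * 100

compressed = zlib.compress(frame, level=9)   # smaller on the wire...
restored = zlib.decompress(compressed)       # ...but exactly invertible

assert restored == frame  # byte-for-byte (i.e. pixel-for-pixel) identical
```

A lossy format such as JPEG would break this equality and introduce train/serve skew between the collected samples and the pixels the model actually sees online.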

Automatic annotation: HOUYI uses the offline model with the most parameters and highest accuracy to pre-label the online collected samples. After manual inspection and fine-tuning, the samples are added to the training set.
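The pre-labeling step can be sketched as a confidence filter over the big offline model's detections. The names and the 0.5 threshold below are illustrative assumptions, not HOUYI's actual values.

```python
# Each detection is (box, confidence) as produced by the offline model.
def prelabel(detections, conf_threshold=0.5):
    """Keep only confident detections as draft labels for human review."""
    return [d for d in detections if d[1] >= conf_threshold]

draft = prelabel([((10, 10, 50, 90), 0.91), ((200, 40, 230, 95), 0.32)])
# draft keeps only the 0.91 detection; the 0.32 one falls back to manual labeling.
```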

Object Detection Algorithm

PP-YOLOE+: HOUYI mainly uses PP-YOLOE+ as the online inference model. PP-YOLOE+ is an excellent one-stage anchor-free model based on PP-YOLOv2, and its accuracy/FPS trade-off surpasses many popular YOLO variants. On top of that, PP-YOLOE+ converges in only 80 epochs, which greatly reduces training time and cost compared with the more common 300-epoch schedule.

Model       | Epochs | AP(0.5:0.95) | AP(0.5) | AP(0.75) | AP(small) | AP(medium) | AP(large) | AR(small) | AR(medium) | AR(large)
PP-YOLOE+_s | 80     | 43.7         | 60.6    | 47.9     | 26.5      | 47.5       | 59.0      | 46.7      | 71.4       | 81.7
PP-YOLOE+_m | 80     | 49.8         | 67.1    | 54.5     | 31.8      | 53.9       | 66.2      | 53.3      | 75.0       | 84.6
PP-YOLOE+_l | 80     | 52.9         | 70.1    | 57.9     | 35.2      | 57.5       | 69.1      | 56.0      | 77.9       | 86.9
PP-YOLOE+_x | 80     | 54.7         | 72.0    | 59.9     | 37.9      | 59.3       | 70.4      | 57.0      | 78.7       | 87.2

Detection examples:


It Moves Like Water

Once a target has been detected on the screen, the next step is to drag the crosshair onto it, which is actually quite difficult. Player input, weapon recoil, and irregular enemy motion all create control error. In addition, server latency, inference latency, and input latency mean that simply moving toward the target may only chase where the enemy used to be. This problem cannot be solved completely by program optimization alone, so a dedicated control algorithm is needed for a better user experience.

PID

Closed-loop control corrects error using feedback from the controlled system. Moving the crosshair can also be viewed as a closed-loop control problem: take the current crosshair position as the origin, then the enemy coordinate at time \(t\) is the error \(e(t)\). If the enemy is currently at \((100, 200)\), then the most naive idea is to move the mouse by \((100, 200)\). But such teleport-like behavior is unnatural and not optimal. A more natural idea is to introduce the most mature and widely used closed-loop control algorithm from continuous systems: PID (proportional-integral-derivative controller).

$$ u(t)=K_{\mathrm{p}} e(t)+K_{\mathrm{i}} \int_{0}^{t} e(\tau) \mathrm{d} \tau+K_{\mathrm{d}} \frac{\mathrm{d} e(t)}{\mathrm{d} t} $$

The first term is the proportional term, the second is the integral term, and the third is the derivative term.

The first term is easy to understand: the larger the error, the larger the required correction. But a pure proportional controller quickly runs into trouble. If an enemy moves to the right at a constant speed of 10 and we set \(K_p=0.5\), the system settles into a steady state with \(e(t)=20\) and \(u(t)=10\): the crosshair keeps pace with the target but never closes the remaining gap.

At that point an integral term is needed to eliminate steady-state error. For example, if the accumulated error over the past second is 20 and we set \(K_i=0.25\), then \(u(t)=20\times 0.5+20\times 0.25=15\), which exceeds the enemy’s movement speed and can reduce the distance to the target. A controller with integral action is called a PI controller.



But if \(K_i\) is too small, convergence is slow; if it is too large, jitter appears and convergence also slows down. The best-behaved response has a slight overshoot:

The final derivative term mainly damps oscillation during control and mitigates overshoot. While closing in on the target, \(\mathrm{d}e(t)/\mathrm{d}t\) is generally negative, so the derivative term acts like a brake: it reduces the pull amplitude in advance and keeps the crosshair from overshooting the target. When PID is tuned properly:

The hard parts of applying PID are:

  • The enemy’s deliberately irregular motion during gunfights cannot be predicted by PID, so compensation is limited
  • Parameter tuning is difficult, which is the classic headache of PID
  • Different shooting scenarios require different PID parameters
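The controller itself is short. A minimal discrete sketch follows; the gains, the 10 px/frame target speed, and the toy simulation are assumptions for illustration, not HOUYI's actual tuning. The P-only run reproduces the steady-state error of 20 from the example above, and adding the integral term drives it to zero.

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*sum(e)*dt + Kd*de/dt."""

    def __init__(self, kp, ki=0.0, kd=0.0, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error * self.dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

def simulate(controller, speed=10.0, steps=300):
    """Target drifts by `speed` per frame; the controller moves the crosshair by u."""
    e = 0.0
    for _ in range(steps):
        e += speed - controller.update(e)
    return e

e_p = simulate(PID(kp=0.5))            # P-only: stalls at e = 20
e_pi = simulate(PID(kp=0.5, ki=0.05))  # PI: steady-state error eliminated
```

In the PI run, the integral settles at exactly the value needed to output \(u=10\) with zero residual error, which is the steady-state-elimination argument above in code form.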

PVAR: vector autoregressive proportional control

If I said that the main reason I wanted to go on the Journey to the West was to visit the Women’s Kingdom, and that reaching India just happened along the way, you probably would not believe me either.

  • Journey to the West Diary

When I talk about aim assist, what I really want to talk about is time-series forecasting.
In a typical correction process, \(e(t)\) decreases rapidly toward zero at first, and then oscillates left and right because of recoil and other factors. The following table shows time-series data from one aiming sequence. enemy_box and confidence are the enemy bounding box and confidence output by the model. ex and ey are the enemy’s horizontal and vertical distances from the crosshair, and ux and uy are the mouse movement commands output by the controller.

t  | enemy_box_x1 | enemy_box_y1 | enemy_box_x2 | enemy_box_y2 | confidence | ex         | ey        | ux         | uy
0  | 266.21786    | 304.98212    | 289.40607    | 362.9163     | 0.548098   | -42.188035 | 13.94921  | -25.31283  | 8.369531
1  | 261.20413    | 304.77383    | 285.12802    | 362.66193    | 0.705141   | -46.833925 | 13.71788  | -29.106307 | 8.525386
2  | 273.99313    | 297.0426     | 297.64008    | 360.8898     | 0.700376   | -34.183395 | 8.9662    | -22.454103 | 5.920422
3  | 289.20618    | 288.6216     | 315.9204     | 360.6148     | 0.700991   | -17.43671  | 4.6182    | -12.794311 | 3.41447
4  | 304.0824     | 283.4045     | 331.3549     | 357.4968     | 0.622412   | -2.28135   | 0.45065   | -3.746427  | 0.922879
5  | 311.96606    | 282.16705    | 340.17767    | 357.0965     | 0.715523   | 6.071865   | -0.368225 | 1.40623    | 0.423019
6  | 313.57156    | 282.32294    | 341.7551     | 357.35828    | 0.734364   | 7.66333    | -0.15939  | 2.544032   | 0.544513
7  | 313.0578     | 282.24142    | 341.4103     | 357.54837    | 0.754622   | 7.23405    | -0.105105 | 2.42217    | 0.575116
8  | 311.94815    | 282.4199     | 340.3355     | 356.5956     | 0.768114   | 6.141825   | -0.49225  | 2.91703    | 0.036619
9  | 311.10672    | 282.8336     | 339.47678    | 356.11987    | 0.770691   | 5.29175    | -0.523265 | 2.516004   | 0.007234
10 | 309.4765     | 282.83307    | 337.44662    | 356.76465    | 0.778666   | 3.46156    | -0.20114  | 2.424042   | -0.049513
11 | 305.26862    | 290.59354    | 332.79498    | 361.52643    | 0.760621   | -0.9682    | 6.059985  | 0.129709   | 3.759036
12 | 303.50372    | 291.24722    | 333.49023    | 363.27325    | 0.66165    | -1.503025  | 7.260235  | -0.325249  | 4.665564

Across thousands of recorded aiming sequences, the median \(e_x\) and \(e_y\) series rejected the unit-root null of the augmented Dickey-Fuller (ADF) test (p < 0.0002), i.e. they are stationary. Therefore, HOUYI not only corrects the current error \(e_t\), but also uses a vector autoregression (VAR) model to predict \(\hat e_{t+1}\).

From the perspective of vector autoregression,

$$ e_{t}=c+A_{1} e_{t-1}+A_{2} e_{t-2}+\cdots+A_{p} e_{t-p} $$

We then inject prior knowledge to adjust the model:
  • The current error has a linear relationship with the previous movement command \(U_{t-1}\), so a new explanatory variable is introduced
  • The error does not contain a constant term

The final model used by HOUYI is

$$ e_{t}=A_{1} e_{t-1}+A_{2} e_{t-2}+\cdots+A_{p} e_{t-p} +A_u U_{t-1} $$

VAR and the proportional controller together form HOUYI’s main control algorithm, PVAR.
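A pure-Python sketch of the PVAR idea for one axis, assuming order \(p=1\): fit \(e_t \approx a\,e_{t-1} + b\,u_{t-1}\) by least squares on logged data, then use the fitted model for a one-step-ahead prediction. The log below is synthetic with a known generating rule; HOUYI fits the full vector model on real logs, and the function names here are illustrative.

```python
def fit_two_coef(y, x1, x2):
    """Least squares for y ≈ a*x1 + b*x2 via the 2x2 normal equations."""
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    b1 = sum(p * q for p, q in zip(x1, y))
    b2 = sum(p * q for p, q in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det

# Synthetic log generated by e_t = 0.8*e_{t-1} - 0.5*u_{t-1} (known truth).
u = [1.0, -2.0, 3.0, 0.0, 2.0, -1.0, 4.0, 1.5]
e = [5.0]
for i in range(len(u) - 1):
    e.append(0.8 * e[-1] - 0.5 * u[i])

a, b = fit_two_coef(e[1:], e[:-1], u[:-1])  # recovers a ≈ 0.8, b ≈ -0.5
e_next = a * e[-1] + b * u[-1]              # one-step-ahead prediction
```

The proportional controller then acts on \(\hat e_{t+1}\) instead of \(e_t\), so the crosshair moves toward where the enemy is about to be rather than where it was.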

It Responds Like An Echo

Performance optimization is both the most important and the most labor-intensive part of HOUYI. Low-latency, high-throughput image processing is the foundation of the final experience.

Screen capture: There are many ways to capture the screen, but the most efficient is DXGI, which processes textures directly on the GPU, so capture speed is limited almost only by GPU rendering speed. On top of that, HOUYI specifically optimizes memory efficiency for partial capture: only data from the required region is read, and data type conversion is performed only for the target area.
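The partial-readback idea can be shown without DXGI: a memoryview over a fake frame buffer lets us copy out only the needed sub-rectangle. This is a pure-Python stand-in for illustration; the dimensions and function name are assumptions.

```python
W, H, BPP = 1920, 1080, 4        # fake 1080p BGRA frame
frame = bytearray(W * H * BPP)   # stand-in for the captured texture
view = memoryview(frame)         # zero-copy window onto the buffer

def read_region(x, y, w, h):
    """Copy out only the (x, y, w, h) sub-rectangle, row by row."""
    return b"".join(bytes(view[(row * W + x) * BPP:(row * W + x + w) * BPP])
                    for row in range(y, y + h))

# A centered 640x640 crop touches ~20% of the frame's bytes instead of 100%.
region = read_region((W - 640) // 2, (H - 640) // 2, 640, 640)
```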

Preprocessing: The main performance killer during preprocessing is still memory access efficiency. HOUYI removes all unnecessary memory copies and allocates a fixed region in GPU memory for model input, roughly doubling the number of corrections per second.

Inference engine: The model is exported to ONNX and then deployed with NVIDIA TensorRT.

The core of NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine that performs inference for that network. TensorRT provides API’s via C++ and Python that help to express deep learning models via the Network Definition API or load a pre-defined model via the parsers that allow TensorRT to optimize and run them on an NVIDIA GPU. TensorRT applies graph optimizations, layer fusion, among other optimizations, while also finding the fastest implementation of that model leveraging a diverse collection of highly optimized kernels. TensorRT also supplies a runtime that you can use to execute this network on all of NVIDIA’s GPUs from the Kepler generation onwards. TensorRT also includes optional high speed mixed precision capabilities introduced in the Tegra X1, and extended with the Pascal, Volta, Turing, and NVIDIA Ampere GPU architectures.

It is worth mentioning that TensorRT performs hardware-specific optimization for different GPUs, so the first startup requires about 15 minutes to rebuild the model runtime. In this application scenario, that cost is negligible.

Postscript

If you want to use cheats for the pleasure of casually deleting other players, do not use Houyi;

If you do not approve of using external power to break through human limits, do not use Houyi;

If you think it is cool to use algorithms and compute to surpass human limits when processing the same input data, and if you want to use computer technology to resist the alienation imposed by electronic shooting games, then may the Houyi Project help you win an intelligent war.



https://en.heth.ink/houyi/

Author: YK
Posted on: 2022-10-03
Updated on: 2022-10-15