Whenever you start using a lot of data to backtest a strategy and you would like to use the triple-barrier method, you’ll face the issue of low time efficiency by running a CPU-based computation. This article provides a great Nvidia-GPU-based solution code that you can implement and get much quicker the desired prediction feature. Quicker sounds great, doesn’t it? Let’s dive in!
- What is the Triple-Barrier Method?
- Time Efficiency Limitations When Using the CPU
- Exploring the Synergy Between Rapids and Numba Libraries
- GPU compute hierarchy
- A GPU-based code to create the triple-barrier method prediction feature
What is the Triple-Barrier Method?
The Triple-Barrier Method is a new tool in financial machine learning that offers a dynamic approach to creating a prediction feature based on risk management. This method provides traders with a framework to set a prediction feature. It is based on what a trader would do if she set profit-taking and stop-loss levels that adapt in real-time to changing market conditions.
Unlike traditional trading strategies that use fixed percentages or arbitrary thresholds, the Triple-Barrier Method adjusts profit-taking and stop-loss levels based on price movements and market volatility. It achieves this by employing three distinct barriers around the trade entry point: the upper, lower, and vertical barriers. These barriers determine whether the signal will be long, short, or no position at all.
The upper barrier represents the profit-taking level, indicating when traders should consider closing their position to secure gains. On the other hand, the lower barrier serves as the stop-loss level, signalling when it’s wise to exit the trade to limit potential losses.
What sets the Triple-Barrier Method apart is its incorporation of time through the vertical barrier. This time constraint ensures that profit-taking or stop-loss levels are reached within a specified timeframe; if not, the previous position is held for the next period. You can learn more about it in López de Prado’s (2018) book.
Time Efficiency Limitations When Using the CPU
If you have 1 million price returns to convert into a classification-based prediction feature, you’ll face time efficiency issues while using López de Prado’ (2018) algorithm. Let’s present some CPU limitations regarding that concern.
Time efficiency is an important factor in computing for tasks that range from basic calculations to sophisticated simulations and data processing. Central Processing Units (CPUs) are not without their limitations in terms of time efficiency, particularly when it comes to large-scale and highly parallelizable tasks. Let’s talk about CPU time efficiency constraints and how they affect different kinds of computations.
- Serial Processing: One of the main drawbacks of CPUs is their intrinsic serial processing nature. Conventional CPUs are made to carry out instructions one after the other sequentially. Although this method works well for many tasks, it becomes inefficient when handling highly parallelizable tasks that would be better served by concurrent execution.
- Limited Parallelism: CPUs usually have a finite number of cores, each of which can only handle one thread at a time. Even though modern CPUs come in a variety of core configurations (such as dual, quad, or more), their level of parallelism is still limited compared to other computing devices like GPUs or specialized hardware accelerators.
- Memory Bottlenecks: Another drawback of CPUs is the potential for memory bottlenecks, particularly in tasks requiring frequent access to large datasets. CPUs have limited memory bandwidth, which can be saturated when processing large amounts of data or when several cores are vying for memory access simultaneously.
- Instruction-Level Parallelism (ILP) Constraints: The term “instruction-level parallelism” (ILP) describes a CPU’s capacity to carry out several instructions at once within one thread. The degree of parallelism that can be reached is naturally limited by hardware, resource constraints, and instruction dependencies.
- Context Switching Overhead: Time efficiency may be impacted by context switching overhead, which is the process of preserving and regaining the state of a CPU’s execution context when transferring between threads or processes. Even though efficient scheduling algorithms used in modern operating systems reduce context-switching overhead, it is still something to take into account, especially in multitasking environments.
- Mitigating Time Efficiency Limitations: Although CPUs’ time efficiency is naturally limited, there are several ways to get around these limitations and boost overall performance:
- Multi-Threading: Apply multi-threading techniques to parallelize tasks and efficiently utilize the available CPU cores. Keep in mind potential overhead and contention issues when managing multiple threads. You’re better off using the maximum number of threads available per your CPU cores minus 1 to run your code efficiently.
- Optimized Algorithms: Apply data structures and algorithms specially designed to meet the needs of the given task. This could entail reducing pointless calculations, minimizing memory access patterns, and, when practical, taking advantage of parallelism.
- Distributed Computing: Distribute computational tasks across several CPUs or servers in a distributed computing environment to take advantage of extra processing power and scale horizontally as needed.
Is there another way?
Yes! Using a GPU. GPU is well-designed for parallelism. Here, we present the Nvidia-based solution.
Exploring the Synergy Between Rapids and Numba Libraries
New to GPU usage? New to Rapids? New to Numba?
Don’t worry! We’ve got you covered. Let’s dive into these topics.
When combined, Rapids and Numba, two great libraries in the Python ecosystem, provide a convincing way to speed up tasks involving data science and numerical computing. We’ll go over the fundamentals of how these libraries interact and the advantages they offer computational workflows.
Understanding Rapids
Rapids library is an open-source library suite that uses GPU acceleration to speed up machine learning and data processing tasks. Popular Python data science libraries, such as cuDF (GPU DataFrame), cuML (GPU Machine Learning), cuGraph (GPU Graph Analytics), and others, are available in GPU-accelerated versions thanks to Rapids, which is built on top of CUDA. Rapids significantly speeds up data processing tasks by utilizing the parallel processing power of GPUs. This allows analysts and data scientists to work with larger datasets and produce faster results.
Understanding Numba
Numba is a just-in-time (JIT) Python compiler that optimizes machine code at runtime from Python functions. Numba is an optimization tool for numerical and scientific computing applications that makes Python code perform and compiled languages like C or Fortran. Developers can achieve significant performance gains for computationally demanding tasks by instructing Numba to compile Python functions into efficient machine code by annotating them with the @cuda.jit decorator.
Synergy Between Rapids and Numba
Rapids and Numba work well together because of their complementary abilities to speed up numerical calculations. While Rapids is great at using GPU acceleration for data processing tasks, Numba uses JIT compilation to optimize Python functions to improve CPU-bound computation performance. Developers can use GPU acceleration for data-intensive tasks and maximize performance on CPU-bound computations by combining these Python libraries to get the best of both worlds.
How Rapids and Numba Work Together
The standard workflow when combining Rapids and Numba is to use Rapids to offload data processing tasks to GPUs and use Numba to optimize CPU-bound computations. This is how they collaborate:
Preprocessing Data with Rapids: To load, manipulate, and preprocess big datasets on the GPU, use the Rapids cuDF library. Utilize GPU-accelerated DataFrame operations to carry out tasks like filtering, joining, and aggregating data.
The Numba library offers a decorator called @cuda.jit that makes it possible to compile Python functions into CUDA kernels for NVIDIA GPU parallel execution. Conversely, RAPIDS is a CUDA-based open-source software library and framework suite. To speed up data processing pipelines from start to finish, it offers a selection of GPU-accelerated libraries for data science and data analytics applications.
Various data processing tasks can be accelerated by using CUDA-enabled GPUs in conjunction with RAPIDS when @cuda.jit is used. For example, to perform computations on GPU arrays, you can write CUDA kernels using @cuda.jit (e.g., using NumPy-like syntax). These kernels can then be integrated into RAPIDS workflows for tasks like:
GPU compute hierarchy
Let’s understand how GPU’s hierarchy works. In GPU computing, particularly in frameworks like CUDA (Compute Unified Device Architecture) used by NVIDIA GPUs, these terms are fundamental to understanding parallel processing:
- Thread: A thread is the smallest unit of execution within a GPU. It’s analogous to a single line of code executed in a traditional CPU. Threads are organized into groups called warps (in NVIDIA architecture) or wavefronts (in AMD architecture).
- Block (or Thread Block): A block is a group of threads that execute the same code in parallel. Threads within a block can share data through shared memory and synchronize their execution. The size of a block is limited by the GPU architecture and is typically a multiple of 32 threads (the warp size in NVIDIA GPUs).
- Grid: A grid is an assembly of blocks that share a common kernel or GPU function. It shows how the parallel computation is organized overall. Blocks in grids are frequently arranged along the x, y, and z axes, making them three-dimensional.
So, to summarize:
- Threads execute code.
- Threads are organized into blocks.
- Blocks are organized into grids.
Stay tuned for Part II for steps on creating the triple-barrier method prediction feature.
Originally posted on QuantInsti blog.
Disclosure: Interactive Brokers
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
Join The Conversation
If you have a general question, it may already be covered in our FAQs. If you have an account-specific question or concern, please reach out to Client Services.