Intel’s Flagship ARC Graphics Card With Xe-HPG Alchemist GPU To Tackle AMD RX 6700 XT & NVIDIA RTX 3070 – Wccftech

Intel ARC graphics cards based on the Alchemist Xe-HPG GPUs are set to launch next year, and based on the specifications, they could deliver very competitive performance against AMD and NVIDIA GPUs.

Intel’s Flagship ARC Graphics Cards With Xe-HPG Alchemist GPU To Be Highly Competitive Against NVIDIA GA104 & AMD Navi 22

The first Intel ARC graphics cards will be powered by the Alchemist GPUs based on the Xe-HPG architecture. Intel has so far confirmed that the first discrete graphics cards will hit retail by Q1 2022 and will be based on the TSMC 6nm process node. Intel also detailed the specifications of Alchemist GPUs and the core building blocks which include the Xe-Core.

Intel ARC Xe-HPG Alchemist GPU – The Building Blocks

To round up what we have learned, the Intel Xe-HPG Alchemist GPU is built around the Xe-Core, the fundamental building block of the 1st Gen ARC lineup. The Xe-Core is a compute block composed of 16 Vector Engines (256-bit per engine) and 16 Matrix Engines (1024-bit per engine). Each Vector Engine is composed of 8 ALUs, so in total we are looking at 128 ALUs per Xe-Core. Each Matrix Engine is also referred to as an XMX block and handles tensor operations in both FP16 and INT8 modes. Each Xe-Core also has its own dedicated L1 cache.

Intel fuses four Xe-Cores together to form a Render Slice, which also contains four Ray Tracing Units, four Sampler Units, Geometry/Rasterize/HiZ engines, and two Pixel Backend blocks with 8 units each. These Render Slices are put together to form the full GPU. The flagship uses an 8 Render Slice configuration, which features 32 Xe-Cores, 512 Vector Engines, and 4096 ALUs. There will also be configurations with 2, 4, and 6 Render Slices, but we are focusing on the flagship part in this report.
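
The unit counts above follow from simple multiplication. As a rough sketch in Python, using only the per-block figures quoted in this report (the function and constant names are hypothetical, chosen for illustration), the totals for each Render Slice configuration can be tallied like this:

```python
# Per-block figures as quoted in this report; all names are illustrative only.
VECTOR_ENGINES_PER_XE_CORE = 16
ALUS_PER_VECTOR_ENGINE = 8
XE_CORES_PER_SLICE = 4
RT_UNITS_PER_SLICE = 4
PIXEL_BACKENDS_PER_SLICE = 2
UNITS_PER_PIXEL_BACKEND = 8


def alchemist_config(render_slices: int) -> dict:
    """Tally unit counts for a GPU built from `render_slices` Render Slices."""
    xe_cores = render_slices * XE_CORES_PER_SLICE
    vector_engines = xe_cores * VECTOR_ENGINES_PER_XE_CORE
    return {
        "xe_cores": xe_cores,
        "vector_engines": vector_engines,
        "alus": vector_engines * ALUS_PER_VECTOR_ENGINE,
        "rt_units": render_slices * RT_UNITS_PER_SLICE,
        "pixel_backend_units": (
            render_slices * PIXEL_BACKENDS_PER_SLICE * UNITS_PER_PIXEL_BACKEND
        ),
    }


# Print the totals for each configuration mentioned in the report.
for slices in (2, 4, 6, 8):
    print(slices, alchemist_config(slices))
```

For the 8-slice flagship this reproduces the 32 Xe-Cores, 512 Vector Engines, and 4096 ALUs mentioned above. The totals for the 2, 4, and 6 slice parts are extrapolated with the same per-slice ratios, which Intel has not confirmed for those configurations.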

Intel ARC Alchemist vs NVIDIA GA104 & AMD Navi 22 GPUs

GPU Name | Alchemist DG-512 | NVIDIA GA104 | AMD Navi 22
--- | --- | --- | ---
Architecture | Xe-HPG | Ampere | RDNA 2
Process Node | TSMC 6nm | Samsung 8nm | TSMC 7nm
Flagship Product | ARC (TBA) | GeForce RTX 3070 Ti | Radeon RX 6700 XT
Raster Engines | 8 | 6 | 2
FP32 Cores | 32 Xe Cores | 48 SM Units | 40 Compute Units
FP32 Units | 4096 | 6144 | 2560
FP32 Compute | ~16 TFLOPs | 21.7 TFLOPs | 12.4 TFLOPs
TMUs | 256 | 192 | 160
ROPs | 128 | 96 | 64
RT Cores | 32 RT Units | 48 RT Cores (V2) | 40 RA Units
Tensor Cores | 512 XMX Cores | 192 Tensor Cores (V3) | N/A
Tensor Compute | ~131 TFLOPs FP16, ~262 TOPs INT8 | 87 TFLOPs FP16, 174 TOPs INT8 | 25 TFLOPs FP16, 50 TOPs INT8
L2 Cache | TBA | 4 MB | 3 MB
Additional Cache | 16 MB Smart Cache? | N/A | 96 MB Infinity Cache
Memory Bus | 256-bit | 256-bit | 192-bit
Memory Capacity | 16 GB GDDR6 | 8 GB GDDR6X | 12 GB GDDR6
Launch | Q1 2022 | Q2 2021 | Q1 2021

Intel ARC Xe-HPG Alchemist GPU – Comparing It To NVIDIA’s GA104 & AMD’s Navi 22

3DCenter has put together a rundown of the specifications and a comparison that gives us an idea of the theoretical performance Intel’s new GPU will offer. Right off the bat, Intel’s ARC Xe-HPG Alchemist flagship will offer more TMUs and ROPs than the NVIDIA and AMD competition. The core count of 4096 is higher than that of AMD’s Navi 22 and even the Navi 21-based RX 6800, but lower than that of NVIDIA’s GA104. However, NVIDIA counts both FP32 datapaths in each core, so under the conventional counting method GA104’s figure would be 3072 rather than 6144.
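
To make that counting difference concrete, here is a small Python sketch that uses only the FP32 unit figures from the comparison table. Halving the GA104 number is the adjustment described above, not an official NVIDIA figure, and the variable names are illustrative.

```python
# FP32 unit counts as listed in the comparison table.
fp32_units = {"Alchemist DG-512": 4096, "GA104": 6144, "Navi 22": 2560}

# Assumption: Ampere's quoted count includes both FP32 datapaths per core,
# so halving it gives a figure comparable to the conventional counting method.
normalized = dict(fp32_units)
normalized["GA104"] = fp32_units["GA104"] // 2

# List the chips from the highest normalized count to the lowest.
for name, count in sorted(normalized.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {count}")
```

On that basis the Alchemist flagship would carry the highest core count of the three chips, although core counts alone say little about final gaming performance.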

Intel’s ARC Alchemist GPUs have fewer ray tracing units than the competition, but we don’t know exactly how Intel’s ray tracing implementation works. For example, while Navi 22 offers more ray tracing units than the GA106 Ampere GPUs, the hardware-level integration of NVIDIA’s RT cores is superior in all regards to AMD’s implementation. So the final performance will depend on Intel’s hardware-level integration and software-level optimization for ray tracing applications.

A major advantage Intel could have over the competition is AI-assisted supersampling. NVIDIA is the relevant comparison here, since AMD lacks dedicated hardware in this department. Intel has already showcased an impressive demo of its XeSS technology and, based on the expected numbers, Intel’s XMX architecture could outperform NVIDIA’s Tensor Core implementation (DLSS). Intel is also expected to feature a small but useful game cache on its GPUs and to equip them with up to 16 GB of GDDR6 memory across a 256-bit bus interface. That would be twice as much memory as NVIDIA’s RTX 3070 and RTX 3070 Ti offer, so NVIDIA may have to prepare a refresh to counter it.
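
The XMX throughput estimates in the comparison table can be reproduced with back-of-the-envelope arithmetic. The Python sketch below assumes that each 1024-bit Matrix Engine performs one multiply-accumulate (counted as two operations) per data lane per clock, and that the GPU runs at an assumed 2 GHz. Neither assumption is confirmed by Intel, and the function name is hypothetical.

```python
def xmx_peak_tops(matrix_engines: int, engine_width_bits: int,
                  element_bits: int, clock_ghz: float) -> float:
    """Peak tera-operations per second under the assumptions stated above."""
    lanes = engine_width_bits // element_bits   # elements per engine per clock
    ops_per_clock = matrix_engines * lanes * 2  # multiply + accumulate = 2 ops
    return ops_per_clock * clock_ghz / 1000     # ops/clock * GHz = Gops/s -> Tops/s


# 512 Matrix Engines, 1024 bits wide, at an assumed 2 GHz clock.
fp16 = xmx_peak_tops(512, 1024, 16, 2.0)
int8 = xmx_peak_tops(512, 1024, 8, 2.0)
print(f"FP16: ~{fp16:.0f} TFLOPs, INT8: ~{int8:.0f} TOPs")
```

Under these assumptions the result lands at roughly 131 TFLOPs for FP16 and 262 TOPs for INT8, in line with the Alchemist estimates in the table. Real-world throughput will depend on clocks and on how well software keeps the engines fed.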

Lastly, the theoretical FP32 compute performance is calculated with an expected peak clock rate of 2 GHz. That’s the most likely scenario for TSMC’s 6nm process node given how well clocks scale on TSMC’s 7nm process node. Based on that, the Intel Xe-HPG Alchemist GPU could offer around 16-17 TFLOPs of compute power. This is lower than the 21.7 TFLOPs listed for NVIDIA’s GA104, but it should be noted that FLOPs are not directly comparable across architectures, as gaming GPUs behave very differently from datacenter chips.
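
The FP32 estimate follows the usual peak-throughput formula: ALUs × 2 operations per clock (one fused multiply-add) × clock speed. A minimal Python sketch, assuming the 2 GHz clock mentioned above and using a hypothetical helper name:

```python
def peak_fp32_tflops(alus: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput, counting one FMA as two FLOPs."""
    return alus * 2 * clock_ghz / 1000


# Alchemist flagship (4096 ALUs) at the assumed 2 GHz peak clock.
print(round(peak_fp32_tflops(4096, 2.0), 1))
# The same chip at a hypothetical 2.1 GHz, to show the sensitivity to clocks.
print(round(peak_fp32_tflops(4096, 2.1), 1))
```

At 2 GHz this works out to roughly 16.4 TFLOPs, and a modest clock bump to 2.1 GHz would push it past 17 TFLOPs, which is where the 16-17 TFLOPs range comes from. The final figure depends entirely on the clocks Intel ships.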

Based on these early specifications, we are looking at an Intel graphics card that could end up being faster than AMD’s Radeon RX 6700 XT and NVIDIA’s RTX 3070. To push its 1st Gen graphics cards further into the consumer segment, Intel will likely price them very competitively against established giants like AMD and NVIDIA. Combined with a strong suite of software-level optimizations, Intel might have a winner on its hands, one that will only be pushed forward with future generations of ARC GPUs.

