"Just add more VRAM" is physically wrong: what HBM, CXL, and Unified Memory could not solve
Increase HBM sixfold and the model size you can load only doubles. Double the RTX 5060's VRAM to 16GB and a 70B model still does not fit in full. "If VRAM is short, just add more": that idea ignores the physical trade-offs among bandwidth, capacity, and cost.
HBM, CXL, and Unified Memory are three different approaches to the VRAM wall. Where each one sits in the triangle of bandwidth, capacity, and cost fundamentally changes LLM inference performance.
The memory triangle: bandwidth, capacity, cost
Technology         Bandwidth    Capacity   Cost/GB        Interface
─────────────────────────────────────────────────────────────────────────────────
HBM3E (H200)       4,800 GB/s   141 GB     $10-15         TSV, 1024-bit × 6 stacks
GDDR6 (RTX 4060)   272 GB/s     8 GB       $2.5-4         128-bit, 17 Gbps
CXL 3.1            64 GB/s*     TB-class   $3-5           PCIe 6.0 x16
Unified (M4 Max)   546 GB/s     128 GB     Set by Apple   LPDDR5X, 512-bit
* per direction (128 GB/s bidirectional). This is the rate of a PCIe 5.0 x16 or PCIe 6.0 x8 link; a full PCIe 6.0 x16 link is rated at about double.
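Physical characteristics of each technology:
- HBM3E: Stacked vertically with through-silicon vias (TSVs). Bandwidth is overwhelming, but interposer area and cost are high
- GDDR6: Soldered onto the board. Cheap and dedicated to the GPU, but capacity is limited
- CXL 3.1: Reuses existing PCIe infrastructure. Reaches TB-class capacity, but memory read bandwidth is 1/75 of HBM3E
- Unified Memory: CPU, GPU, and NPU share one memory pool. Copy cost is zero, but shared bandwidth means contention
HBM chose bandwidth, CXL chose capacity, and Unified Memory chose balance. None of them can claim every vertex of the triangle.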
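HBM: king of bandwidth, slave to capacity
HBM's bandwidth comes from a 1024-bit-wide bus per stack, built from more than ~5,000 through-silicon vias (TSVs). The H200 carries six stacks, for a total bus width of 6144 bits and 4.8 TB/s. That is about 18 times the 272 GB/s of GDDR6.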
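The relation between bus width and bandwidth can be checked with a little arithmetic. The sketch below assumes peak bandwidth is simply bus width times per-pin data rate divided by 8. The function names are illustrative, and the H200 per-pin rate is derived from the figures quoted here rather than taken from a datasheet.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a bus where every pin runs at pin_rate_gbps."""
    return bus_width_bits * pin_rate_gbps / 8


def implied_pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gbps) implied by a quoted bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


# GDDR6 on the RTX 4060: 128-bit bus at 17 Gbps per pin
print(peak_bandwidth_gb_s(128, 17))            # 272.0

# H200: 6 stacks x 1024 bits = 6144 bits delivering 4,800 GB/s
print(implied_pin_rate_gbps(4800, 6 * 1024))   # 6.25
```

Under these numbers each HBM pin runs far slower than a GDDR6 pin. The bandwidth advantage comes from width alone, which is why HBM depends on TSVs and an interposer.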
But capacity has a physical ceiling:
HBM3E stack configuration:
1 die = 24 Gbit (3 GB)
8-Hi (8 dies stacked) = 24 GB/stack
12-Hi (next generation) = 36 GB/stack
H200: 6 stacks × 24 GB = 144 GB raw (141 GB official)
Stack unit price: estimated $240-360 ($10-15/GB)
"Why not add more stacks?" This is where area becomes the problem. Each HBM stack occupies ~100 mm² on the interposer. GPU die (~800 mm²) + 6 stacks (~600 mm²) = ~1400 mm². The current CoWoS-S limit is about 2831 mm² (3.3x reticle), so the H200 still has headroom, but enlarging the interposer directly worsens cost and yield.
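The area and capacity arithmetic above can be written out as a small sketch. It uses only the approximate figures quoted in the text. The helper names are made up for illustration, and the area bound ignores routing, yield, and cost, which are the real constraints.

```python
GPU_DIE_MM2 = 800          # approximate GPU die area
HBM_STACK_MM2 = 100        # approximate footprint of one HBM stack
COWOS_S_LIMIT_MM2 = 2831   # about 3.3x reticle


def package_area_mm2(stacks: int) -> int:
    """Silicon area placed on the interposer for a given stack count."""
    return GPU_DIE_MM2 + stacks * HBM_STACK_MM2


def max_stacks_by_area() -> int:
    """Upper bound on stack count from interposer area alone."""
    return (COWOS_S_LIMIT_MM2 - GPU_DIE_MM2) // HBM_STACK_MM2


def raw_capacity_gb(stacks: int, dies_per_stack: int = 8, gb_per_die: int = 3) -> int:
    """Raw HBM capacity: stacks x dies per stack x 3 GB per 24 Gbit die."""
    return stacks * dies_per_stack * gb_per_die


print(package_area_mm2(6))       # 1400
print(max_stacks_by_area())      # 20
print(raw_capacity_gb(6))        # 144 (8-Hi)
print(raw_capacity_gb(6, 12))    # 216 (12-Hi)
```

Area alone would allow far more than six stacks. The limit that bites is the cost and yield of the larger interposer, not the geometry.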
Impact on LLM inference
GPU                     Max model (Q4)   Bandwidth    Notes
──────────────────────────────────────────────────────────────────────────────────────────
H200 (141GB HBM3E)      ~280B            4,800 GB/s   70B at FP16 leaves 1 GB for KV cache
RTX 4060 (8GB GDDR6)    ~13B             272 GB/s     Above 13B, CPU offload is required
RTX 5060 (16GB GDDR7)   ~30B             448 GB/s     Doubling capacity does not double the model
Doubling VRAM does not double the model size you can load, because the KV cache also has to fit. For a 70B FP16 model at 32K context, the KV cache is about 8 GB. The VRAM "left over" after the weights is eaten by the KV cache.
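A rough way to size the KV cache is sketched below. The model shape is an assumption, not taken from the article: a 70B-class model with grouped-query attention, 80 layers, 8 KV heads, and a head dimension of 128. With that shape the estimate lands at about 10 GiB for 32K tokens, the same ballpark as the roughly 8 GB quoted above. The exact value depends on the architecture.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: float = 2) -> float:
    """KV cache size: keys and values for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Assumed 70B-class shape (hypothetical values for illustration), FP16 cache
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      context_len=32 * 1024)
print(f"{size / 2**30:.1f} GiB")   # 10.0 GiB
```

The cache grows linearly with context length, so the same model at 128K tokens needs four times as much.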
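CXL: capacity unleashed, bandwidth sacrificed
CXL (Compute Express Link) is a memory expansion protocol built on top of the PCIe physical layer.
CXL 3.1 builds on the PCIe 6.0 physical layer. This article uses 64 GB/s per direction as the working bandwidth figure, which corresponds to a PCIe 5.0 x16 or PCIe 6.0 x8 link; a full PCIe 6.0 x16 link is rated at about twice that. Latency is 170-400 ns (2-4 times local DDR5). Memory pooling makes capacity theoretically unlimited, but for now CXL is aimed at servers and data centers.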
What happens if you run LLM inference at CXL bandwidth:
Model                CXL (64 GB/s)   GDDR6 (272 GB/s)     HBM3E (4,800 GB/s)
───────────────────────────────────────────────────────────────────────────────────
7B Q4_K_M (4.7GB)    ~13.6 t/s       ~32 t/s (effective)  ~1021 t/s (theoretical)
70B Q4_K_M (40GB)    1.6 t/s         N/A (does not fit)   ~120 t/s (theoretical)
Read 70B Q4 weights over CXL and you get 1.6 t/s. That is no faster than a human reads.
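The theoretical figures in this comparison follow from one assumption: decoding is bandwidth-bound, so every generated token requires reading all the weights once. A minimal sketch under that assumption (the function name is illustrative):

```python
def decode_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed when each token reads all weights once."""
    return bandwidth_gb_s / model_gb


memories = {"CXL": 64, "GDDR6": 272, "HBM3E": 4800}        # GB/s
models = {"7B Q4_K_M": 4.7, "70B Q4_K_M": 40.0}            # GB

for model, size_gb in models.items():
    for memory, bw in memories.items():
        tps = decode_tokens_per_s(bw, size_gb)
        print(f"{model:12s} {memory:6s} {tps:8.1f} t/s")
```

This is a ceiling, not a prediction. The GDDR6 row for 7B comes out near 58 t/s here, while the effective figure in the comparison is 32 t/s, and the 70B model does not fit in 8 GB at all.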
But the real value of CXL is not as a place to store weights.
Tiered memory architecture:
Tier  Memory       Use                        Capacity   Bandwidth       Latency
──────────────────────────────────────────────────────────────────────────────────────
0     GPU SRAM     Activations                24 MB      ~4 TB/s         ~1 ns
1     HBM/GDDR     Weights, active KV cache   8-141 GB   272-4800 GB/s   ~10 ns
2     CXL memory   KV cache overflow          TB-class   64 GB/s         170-400 ns
3     NVMe SSD     Persistent storage         TB-class   7 GB/s          ~10,000 ns
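CXL is not a replacement for VRAM; it is a new tier that fills the gap between VRAM and NVMe. Evict the old tokens of the KV cache (the earliest part of a 128K context) to CXL memory, and VRAM only needs to hold the most recent attention window. As an overflow target for the KV cache, CXL is about 9 times faster than NVMe (64 GB/s versus 7 GB/s).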
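The eviction policy described here can be illustrated with a toy model. This is only a sketch: the class and its names are hypothetical, plain dictionaries stand in for VRAM and CXL memory, and real systems move blocks of tokens rather than single entries.

```python
class TieredKVCache:
    """Keeps the newest entries in a fast tier and spills older ones to a slow tier."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = {}   # stand-in for VRAM: position -> KV entry
        self.slow = {}   # stand-in for CXL memory: position -> KV entry

    def append(self, position: int, kv) -> None:
        """Add the newest token's KV entry, evicting the oldest if the fast tier is full."""
        self.fast[position] = kv
        while len(self.fast) > self.fast_capacity:
            oldest = next(iter(self.fast))       # dicts keep insertion order
            self.slow[oldest] = self.fast.pop(oldest)

    def get(self, position: int):
        """Return (entry, tier name) for a token position."""
        if position in self.fast:
            return self.fast[position], "fast"
        return self.slow[position], "slow"


cache = TieredKVCache(fast_capacity=4)
for pos in range(10):
    cache.append(pos, f"kv{pos}")

print(sorted(cache.fast))   # [6, 7, 8, 9]
print(cache.get(0))         # ('kv0', 'slow')
```

Only reads that reach back to old positions pay the slow-tier cost, which is why the lower bandwidth of that tier is tolerable for overflow but not for weights.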
This tiering is orthogonal to techniques that reduce memory reads (cutting the physical volume of KV cache transferred) and to KV cache quantization (shrinking the data numerically). They can be combined.
Unified Memory: the balance trap
In Apple Silicon's Unified Memory, the CPU, GPU, and NPU share the same physical memory pool.
Chip        Capacity   Bandwidth   Bus width   Shared with
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
M4 Max      128 GB     546 GB/s    512-bit     CPU (12 performance + 4 efficiency cores) + 40-core GPU + 16-core NPU + media engine
M4 (base)   16-32 GB   120 GB/s    128-bit     Same as above (less than half the RTX 4060's 272 GB/s)
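The reality for LLM inference
- M4 Max 128GB: 70B Q4_K_M (40GB) fits entirely with no memory management. The theoretical ceiling is 546/40 = 13.7 t/s, but measured speed is 8-10 t/s. Sharing bandwidth with the CPU, NPU, and I/O is the bottleneck
- M4 32GB: 32B Q4 has a theoretical ceiling of 120/18 = 6.7 t/s and measures 4-5 t/s. The RTX 4060 has the 272 GB/s of its GDDR6 to itself and delivers 10.8 t/s on the same model
Bandwidth sharing is a structural problem. While the GPU runs inference, the CPU keeps accessing memory, and the two compete for bandwidth. macOS memory management and UI rendering also consume bandwidth in the background. Run inference with a heavy page open in Safari and the slowdown is noticeable.
The advantage of Unified Memory is that it eliminates GPU memory management. There is no need for CUDA's cudaMalloc/cudaMemcpy. The data is already there. Copy cost is zero.
But bandwidth is a shared resource and cannot be monopolized. The RTX 4060's GDDR6 gives the GPU effectively exclusive use of 272 GB/s. The base M4 splits 120 GB/s across the whole system.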
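GPU                      Total bandwidth   GPU share                         Effective LLM bandwidth   Inference speed
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
RTX 4060 (8GB GDDR6)     272 GB/s          ~95% (DP output only)             ~258 GB/s                 7B Q4: 32 t/s (58% efficiency)
M4 Max (128GB LPDDR5X)   546 GB/s          Most (contends with CPU/NPU/IO)   ~400 GB/s                 70B Q4: 8-10 t/s
M4 base (16GB LPDDR5X)   120 GB/s          Shared with the whole system      ~78 GB/s                  7B Q4: 14-16 t/s
The RTX 4060 has less bandwidth, but the GPU has it to itself, so it is fastest on small models. The M4 Max has more bandwidth but shares it: it can hold large models, at the price of lower efficiency per unit of bandwidth. The M4 base falls short on both bandwidth and capacity and loses to the RTX 4060 for LLM use.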
Comparing the three approaches
Technology      Bandwidth    Capacity   Cost           Role in LLM inference
──────────────────────────────────────────────────────────────────────────────────────
HBM3E           4,800 GB/s   141 GB     $10-15/GB      Reads weights + KV at high speed
GDDR6           272 GB/s     8-24 GB    $2.5-4/GB      Runs small models fast
CXL 3.1         64 GB/s      TB-class   $3-5/GB        Overflow target for the KV cache
Unified (Max)   546 GB/s     128 GB     Set by Apple   Holds large models with zero copy
NVMe SSD        7 GB/s       TB-class   $0.1/GB        Persistent storage for models
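- HBM (H100/H200): batch inference, many concurrent requests. Bandwidth is shared across requests, so cost efficiency per request is high. For a single request, however, most of the 700W TDP is wasted
- GDDR (RTX 4060/5060): personal use, single request, small to mid-size models. Exclusive GPU bandwidth gives maximum efficiency: 32 t/s at 115W TDP (0.28 t/s/W). On small models it beats a single-request H100 (700W) on power efficiency. The capacity wall remains: 8GB tops out at 7B
- CXL: ultra-long-context inference (128K+), shared memory pools. It relieves VRAM shortage in long contexts where the KV cache swells to tens of GB. But bandwidth is 1/75 of HBM3E, too slow as a home for weights. Servers in 2025-26, consumer hardware in 2028 or later
- Unified Memory (Apple): running large models with little effort, for development and experimentation. It runs 70B Q4 with no memory management. But shared bandwidth makes its speed efficiency worse than exclusive GDDR. Hard to combine with gaming use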
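The power-efficiency figure for the RTX 4060 can be reproduced, and turned into a break-even target for a 700W part, with the sketch below. It treats TDP as a stand-in for actual power draw, which is an assumption; the function names are illustrative.

```python
def tokens_per_watt(tokens_per_s: float, tdp_watts: float) -> float:
    """Throughput per watt, using TDP as a proxy for real power draw."""
    return tokens_per_s / tdp_watts


def break_even_tokens_per_s(reference_tpw: float, tdp_watts: float) -> float:
    """Throughput a device must reach to match a reference tokens-per-watt."""
    return reference_tpw * tdp_watts


rtx_4060 = tokens_per_watt(32, 115)
print(round(rtx_4060, 2))                                # 0.28
print(round(break_even_tokens_per_s(rtx_4060, 700)))     # 195
```

Under these assumptions a 700W accelerator has to sustain roughly 195 t/s in total to match the small card per watt. Batching across requests is how it gets there, which a single request cannot do.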
Practical implications for 8GB VRAM users
Layer 1: Quantization (immediate payoff)
Q4_K_M quantization shrinks a 7B model's weights from 14GB to 4.7GB (3x the capacity efficiency). Supported as standard in llama.cpp and Ollama.
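The 14GB and 3x figures follow from bytes-per-weight arithmetic, sketched below. The parameter count of exactly 7e9 is an assumption, and the effective bits per weight is simply what the two quoted sizes imply, not a property read from the file format.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights in GB at a given precision."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(file_gb: float, n_params: float) -> float:
    """Average bits per weight implied by a file size and parameter count."""
    return file_gb * 1e9 * 8 / n_params


fp16_gb = weights_gb(7e9, 16)
q4_gb = 4.7   # Q4_K_M size quoted in the text

print(fp16_gb)                                          # 14.0
print(round(fp16_gb / q4_gb, 2))                        # 2.98
print(round(effective_bits_per_weight(q4_gb, 7e9), 2))  # 5.37
```

The implied average is above 4 bits because the format stores scales alongside the 4-bit values and keeps some tensors at higher precision.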
Layer 2: KV cache quantization (experimental)
--cache-type-k q4_0 --cache-type-v q8_0 compresses the KV cache to about 1/3 of its FP16 size. This is the key to long contexts. The details were verified in "Dropping the KV cache to Q4 fit a 32K context into 8GB".
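Layer 3: CPU offload (a bandwidth trade-off)
Load part of the model onto the GPU with --n-gpu-layers and even a 32B model runs (slowly, but it runs). An RTX 4060 reaches 10.8 t/s on 32B with the optimal offload split. The bottleneck is the CPU↔GPU link: PCIe 4.0 x8 = 16 GB/s.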
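A starting value for --n-gpu-layers can be estimated before measuring. The sketch below is a rough heuristic, not how llama.cpp chooses anything: it assumes all layers are the same size, and the 18 GB weight size, 64-layer count, and 6.5 GB VRAM budget are illustrative assumptions.

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_budget_gb: float) -> int:
    """Number of equally sized layers that fit in the VRAM budget."""
    per_layer_gb = model_gb / n_layers
    return min(n_layers, int(vram_budget_gb // per_layer_gb))


# Assumed: 32B Q4 weights of about 18 GB across 64 layers, with 6.5 GB
# of an 8 GB card left for weights after KV cache and buffers.
print(layers_that_fit(model_gb=18.0, n_layers=64, vram_budget_gb=6.5))   # 23

# A 4.7 GB model fits entirely, so the result is capped at the layer count.
print(layers_that_fit(model_gb=4.7, n_layers=32, vram_budget_gb=6.5))    # 32
```

The estimate only gives a first guess. The best split is found by trying nearby values and measuring tokens per second.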
Layer 4: CXL (future)
CXL memory modules add memory over PCIe and act as Tier 2 storage for the KV cache. Consumer availability is 2028 or later. The principle resembles today's CPU offload (PCIe at 16 GB/s), but CXL differs in its memory semantics (load/store access, direct GPU addressing).
What you can do today is combine Layers 1-3. Q4 quantization + Q4 KV cache + an optimal GPU offload split = a 32B model × 32K context running on 8GB. If CXL arrives as Layer 4, 128K+ contexts become realistic.
Note that the "extra memory" CXL promises travels over essentially the same PCIe bus as today's CPU offload. The bandwidth ceiling is the same. CXL's advantage is memory semantics (accessible via load/store and directly addressable by the GPU), not higher bandwidth.
The future of memory, as decided by physics
"Does adding VRAM solve the problem?" It does not. Add capacity and you sacrifice either bandwidth or cost.
The triangle of bandwidth, capacity, and cost is governed by physics. No technology gets all three. HBM took bandwidth and gave up capacity and cost. CXL took capacity and gave up bandwidth. Unified Memory took balance and gave up exclusive bandwidth. GDDR took exclusive bandwidth and gave up capacity.
The optimal answer for LLM inference is not to pick one technology but to combine several in a hierarchy.
仿¥ã®RTX 4060ã§å®è¡å¯èœãªæåç:
- éã¿ â VRAMïŒQ4éååã§7-13Bãå šèŒãïŒ
- KVãã£ãã·ã¥ â VRAMïŒQ4/Q8éååã§å®¹éç¯çŽïŒ
- 远å ã¬ã€ã€ãŒ â RAMïŒCPUãªãããŒããPCIe垯åïŒ
- æ°žç¶ã¹ãã¬ãŒãž â NVMe SSD
å°æ¥ã®CXLæèŒã³ã³ã·ã¥ãŒããŒPCã§ã®æåç:
- éã¿ â VRAMïŒQ4éååïŒ
- ã¢ã¯ãã£ãKV â VRAM
- å€ãKV â CXLã¡ã¢ãªïŒ64 GB/sã§ååãªã¢ã¯ã»ã¹é床ïŒ
- æ°žç¶ã¹ãã¬ãŒãž â NVMe SSD
ã¡ã¢ãªã®å£ã¯ãç Žãããã®ã§ã¯ãªããéå±€ã§åé¿ããããã®ã ã
References
- CXL Consortium, "Compute Express Link Specification 3.1" (2024)
- Samsung, "CMM-D: CXL Memory Module for Data Centers" (2024)
- SK hynix, HBM3E specifications, 12-Hi stack architecture
- NVIDIA H200 SXM specifications: 141GB HBM3E, 4.8 TB/s
- Apple M4 Max specifications: 128GB Unified Memory, 546 GB/s
- "Efficient Memory Management for Large Language Model Serving with PagedAttention" (2023), arXiv:2309.06180












