Hyper-Converged Infrastructure

Phison HCI

Phison Hyper-Converged Infrastructure (HCI) software is the core architecture of the next-generation AI Data Platform. It integrates computing, storage, GPU resources, and an AI management platform to provide enterprises with a one-stop AI infrastructure solution.

90%↑ GPU Utilization

vGPU partitioning and intelligent scheduling lift average GPU utilization from 25% to over 90%, turning idle capacity into active inference and improving hardware ROI by 3–4×.
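As a rough sanity check on these figures, the sketch below assumes hardware ROI scales in direct proportion to utilization. That proportionality is a simplifying assumption made here for illustration, not a formula published by Phison. Under it, moving from 25% to 90% utilization gives 3.6×, which sits inside the quoted 3–4× range.

```python
def utilization_gain(before: float, after: float) -> float:
    """Ratio of useful GPU work after vs. before, on the same hardware.

    Assumes ROI is proportional to utilization (illustrative assumption).
    """
    if not 0 < before <= 1 or not 0 < after <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return after / before


gain = utilization_gain(0.25, 0.90)
print(f"{gain:.1f}x")  # 3.6x
```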

60%↓ Lower TTFT

Distributed KV-Cache sharing achieves an 80% hit rate across the cluster, eliminating redundant prefill computation and cutting time-to-first-token (TTFT) by up to 60%.
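To show why a shared cache removes prefill work, here is a minimal sketch of prefix-based KV block reuse. It is not Phison's implementation: the block size, the `block_keys` and `tokens_to_prefill` helpers, and the use of a plain Python set as the "cluster-wide" cache are all hypothetical. Each block key hashes the entire prefix up to that block, so two requests that start with the same system prompt hit the same entries.

```python
import hashlib

BLOCK = 4  # tokens per cache block (tiny, for illustration only)


def block_keys(tokens: list) -> list:
    """One key per full block; each key covers the whole prefix up to it."""
    keys, h = [], hashlib.sha256()
    for i in range(0, len(tokens) - len(tokens) % BLOCK, BLOCK):
        h.update(str(tokens[i:i + BLOCK]).encode())
        keys.append(h.copy().hexdigest())
    return keys


def tokens_to_prefill(tokens: list, shared_cache: set) -> int:
    """Return how many tokens still need prefill, then register new blocks."""
    keys = block_keys(tokens)
    hit_blocks = 0
    for key in keys:
        if key not in shared_cache:
            break  # reuse stops at the first block that is not cached
        hit_blocks += 1
    shared_cache.update(keys[hit_blocks:])
    return len(tokens) - hit_blocks * BLOCK


shared = set()                       # stands in for the cluster-wide cache
system_prompt = list(range(16))      # 16 tokens shared by both requests
first = tokens_to_prefill(system_prompt + [100, 101], shared)
second = tokens_to_prefill(system_prompt + [200, 201, 202], shared)
print(first, second)  # 18 3
```

The second request recomputes only its 3 new tokens because the 16-token prefix was already cached by the first.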

Zero-Downtime Operations

Built-in health monitoring and automatic failover redirect traffic within seconds when a node goes offline, enabling rolling maintenance with zero service interruption.
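The failover behavior described above can be sketched as a heartbeat-based router. This is an illustrative model only: the `Router` class, the 3-second timeout, and the node names are assumptions, and a production system would also drain in-flight requests and probe nodes actively.

```python
import time
from typing import Optional


class Router:
    """Routes requests round-robin across nodes with a recent heartbeat."""

    def __init__(self, nodes: list, timeout_s: float = 3.0):
        self.timeout_s = timeout_s  # assumed detection window
        self.last_seen = {n: time.monotonic() for n in nodes}
        self._next = 0

    def heartbeat(self, node: str, now: Optional[float] = None) -> None:
        self.last_seen[node] = time.monotonic() if now is None else now

    def healthy(self, now: Optional[float] = None) -> list:
        now = time.monotonic() if now is None else now
        return [n for n, t in self.last_seen.items()
                if now - t <= self.timeout_s]

    def pick(self, now: Optional[float] = None) -> str:
        nodes = self.healthy(now)
        if not nodes:
            raise RuntimeError("no healthy nodes")
        self._next += 1
        return nodes[self._next % len(nodes)]


router = Router(["node-a", "node-b"])
later = time.monotonic() + 10           # simulate 10 seconds passing
router.heartbeat("node-a", now=later)   # node-b stays silent
print(router.healthy(now=later))        # ['node-a']
```

Because `pick` only considers nodes returned by `healthy`, traffic moves away from a silent node as soon as its last heartbeat is older than the timeout.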

Enterprise AI Infrastructure Challenges

Enterprises deploying Private AI typically run into the following fundamental challenges, which Phison HCI is designed to address.

Persistently Low GPU Utilization

Full-card or passthrough deployment often reaches only 20–30% average GPU utilization, leaving compute underused and hardware ROI extremely low.

Limited LLM Context Length

GPU HBM often cannot hold the KV Cache that large models need; inference over long documents and conversations degrades quickly or fails to complete.
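A back-of-the-envelope calculation shows how quickly the KV Cache grows with context length. The model shape below (80 layers, 8 KV heads, head dimension 128, FP16 values) is an assumed 70B-class configuration with grouped-query attention, chosen for illustration and not taken from this page.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: int = 2) -> int:
    """KV Cache size for one sequence.

    The leading 2 accounts for storing both keys and values,
    per layer, per KV head, per token.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# Assumed 70B-class shape; FP16 means 2 bytes per value.
per_token = kv_cache_bytes(80, 8, 128, tokens=1)
ctx_128k = kv_cache_bytes(80, 8, 128, tokens=128 * 1024)

print(per_token // 1024, "KiB per token")        # 320 KiB per token
print(ctx_128k / 2**30, "GiB at 128K tokens")    # 40.0 GiB at 128K tokens
```

Under these assumptions a single 128K-token sequence needs about 40 GiB for its cache alone, on top of the model weights, which is why HBM runs out long before the context window does.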

Lengthy Deployment Cycles

From procurement and networking to containers and model launch, traditional flows often take weeks or months, slowing AI innovation.

Fragmented Multi-System Management

Compute, storage, network, containers, and monitoring are managed separately, so IT teams operate across five or more systems, raising labor cost and the risk of configuration drift.

What Core Technologies Does Phison Own?

Phison HCI builds on three self-developed technologies: vGPU partitioning and time-sharing, aiDAPTIVCache tiering from HBM to NVMe, and multi-node tensor/pipeline parallel scale-out. Together they eliminate GPU idle waste, extend effective memory across the cluster, and run large-model inference at production scale.

vGPU Partitioning + Time-Sharing

Split a single GPU into vGPU instances with on-demand compute and memory allocation. Multiple models or tenants time-share the same card with QoS isolation, and quotas adjust dynamically at peak load.
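The quota side of this can be sketched as simple bookkeeping: each tenant holds a compute share and a memory amount on a card, and a request is refused if it would oversubscribe either. The `GpuCard` class and tenant names are hypothetical, and the sketch covers only admission and resizing; actual time-slicing and isolation are enforced by the GPU driver or hypervisor, which is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class GpuCard:
    mem_gb: float
    # tenant -> (compute share in [0, 1], memory in GB)
    slices: dict = field(default_factory=dict)

    def free(self) -> tuple:
        used_c = sum(c for c, _ in self.slices.values())
        used_m = sum(m for _, m in self.slices.values())
        return 1.0 - used_c, self.mem_gb - used_m

    def allocate(self, tenant: str, compute: float, mem_gb: float) -> bool:
        """Create or resize a slice; refuse if the card would be oversubscribed."""
        old = self.slices.pop(tenant, None)
        free_c, free_m = self.free()
        if compute <= free_c + 1e-9 and mem_gb <= free_m + 1e-9:
            self.slices[tenant] = (compute, mem_gb)
            return True
        if old is not None:
            self.slices[tenant] = old  # keep the previous quota
        return False


card = GpuCard(mem_gb=80)
results = [
    card.allocate("chatbot", 0.5, 40),   # True
    card.allocate("batch", 0.5, 40),     # True, card is now full
    card.allocate("rag", 0.25, 10),      # False, nothing left
    card.allocate("batch", 0.25, 20),    # True, batch shrinks off-peak
    card.allocate("rag", 0.25, 10),      # True, fits in the freed share
]
print(results)  # [True, True, False, True, True]
```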

KV Cache Extension Across Multiple Nodes

aiDAPTIVCache tiers cache from GPU HBM to NVMe SSD, expanding effective memory by more than 10×. The KV Cache is shared across nodes so that Prefill results can be reused by multiple Decode nodes, supporting 128K+ token inference without out-of-memory (OOM) failures.
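The tiering idea can be illustrated with a two-level LRU cache: hot KV blocks stay in a small fast tier, cold blocks are demoted to a larger slow tier instead of being discarded, and a hit in the slow tier promotes the block back. This is a generic sketch, not aiDAPTIVCache's actual design; the `TieredKVCache` class is hypothetical, and both tiers are ordinary in-process dictionaries standing in for HBM and NVMe.

```python
from collections import OrderedDict


class TieredKVCache:
    def __init__(self, hbm_slots: int, nvme_slots: int):
        self.hbm = OrderedDict()    # fast tier, least recently used first
        self.nvme = OrderedDict()   # slow tier, least recently used first
        self.hbm_slots, self.nvme_slots = hbm_slots, nvme_slots

    def put(self, key, block) -> None:
        self.nvme.pop(key, None)
        self.hbm[key] = block
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_slots:
            old_key, old_block = self.hbm.popitem(last=False)
            self.nvme[old_key] = old_block          # demote, do not discard
            while len(self.nvme) > self.nvme_slots:
                self.nvme.popitem(last=False)       # drop the coldest block

    def get(self, key):
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key], "hbm"
        if key in self.nvme:
            block = self.nvme.pop(key)
            self.put(key, block)                    # promote to the fast tier
            return block, "nvme"
        return None, "miss"


kv = TieredKVCache(hbm_slots=2, nvme_slots=4)
for i in range(4):
    kv.put(f"blk{i}", b"kv")

print(kv.get("blk3")[1])  # hbm
print(kv.get("blk0")[1])  # nvme
print(kv.get("blk9")[1])  # miss
```

A block served from the slow tier costs a read instead of a full prefill recomputation, which is the trade the tiering makes.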

Multi-Node Scale-Out Architecture

Tensor parallelism and pipeline parallelism split 70B–405B models across nodes. Cross-node KV Cache sharing cuts inter-node traffic, and throughput scales linearly with each node added.
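As a toy illustration of tensor parallelism, the sketch below splits a weight matrix by output columns, lets each "worker" compute its slice independently, and concatenates the partial results. It uses plain Python lists in a single process, so it demonstrates only the arithmetic, not real multi-GPU communication; the function names are hypothetical. Pipeline parallelism, which assigns whole layers to different devices, is not shown.

```python
def matmul(x, w):
    """Plain matrix product of x (m x k) and w (k x n) as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w, parts):
    """Give each worker a contiguous slice of the output columns."""
    n = len(w[0])
    if n % parts:
        raise ValueError("column count must be divisible by parts")
    step = n // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x, w, parts):
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]   # one per worker
    # Concatenate each row's partial outputs (the all-gather step).
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
out = tensor_parallel_matmul(x, w, parts=2)
print(out)  # [[1, 2, 4, 7], [3, 4, 10, 15]]
```

Each worker holds only half of `w`, which is how the technique lets a model larger than one device's memory be served across several.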

Phison HCI Architecture

Through software-hardware integration and modular design, enterprises can quickly deploy AI workstations, Private AI, AI agents, RAG, AI inference, and Edge AI applications, lowering adoption barriers and accelerating AI implementation.

User Surfaces

User Login
AI Workspace
Applications
Compute
Storage
Management

Platform Services

AI Platform

On-prem model upload
OCI Artifacts support
Rapid model deployment
Performance monitoring

Service Deployment & Management

Scheduling & orchestration
Backend services

Compute Resources

CPU · RAM
GPU & vGPU
aiDAPTIVCache
  • Cluster
  • Container
  • VM
  • Multi-Tenancy
  • Access Control
  • Audit
  • Cost
  • Image
  • Monitoring

Unified Management Console

One control plane for Kubernetes, storage, VMs, AI inference, and monitoring: operate your entire AI infrastructure without switching tools.

Core Technology Benefits

Measurable performance and efficiency gains powered by Phison HCI's proprietary technologies.

Lower Inference Cost

Reduce idle resources and directly lower the per-token inference cost.

Linear Scale-Out

Add new nodes to linearly increase throughput without redeploying the model.

Higher Concurrency

Combined with vGPU partitioning, a single host can serve more concurrent requests.

vGPU Resource Partitioning Technology

A single GPU can be divided into multiple virtual GPU instances, allowing different workloads, such as training, inference, and batch processing, to share the same card. The Phison HCI platform dynamically allocates these instances to multiple AI tasks or users, which eliminates idle GPU waste and enables fine-grained resource scheduling.

Core value

  • Maximizes GPU utilization
  • Lowers AI adoption costs
  • Enables multiple workloads to run in parallel
  • Supports secure multi-tenant isolation

Applicable scenarios

  • Shared AI workstations
  • Multi-department AI development
  • AI inference service platforms
  • GPU resource pool management

Multi-Node KV Cache Expansion Technology

Phison's self-developed KV Cache expansion technology uses high-speed NVMe storage as an extension of GPU HBM. It addresses the context-length limitations of large models and supports a shared cache across multiple nodes, significantly reducing GPU VRAM pressure and improving large-model inference efficiency.

Core technical features

  • GPU / DRAM / SSD / Remote SSD hierarchical caching architecture
  • Dynamic KV Cache expansion
  • Support for long-context inference
  • Shared cache resources across multiple nodes

Technical benefits

  • Improves model inference throughput
  • Reduces GPU memory bottlenecks
  • Reduces the need to purchase high-end GPUs
  • Improves overall GPU utilization