Rotary Position Embeddings for Long Context Length
This article is divided into two parts; they are:

• Simple RoPE
• RoPE for Long Context Length

Compared to the sinusoidal position embeddings in the original Transformer paper, RoPE mutates the input tensor using a rotation matrix:

$$
\begin{aligned}
X_{n,i} &= X_{n,i} \cos(n\theta_i) - X_{n,\frac{d}{2}+i} \sin(n\theta_i) \\
X_{n,\frac{d}{2}+i} &= X_{n,i} \sin(n\theta_i) + X_{n,\frac{d}{2}+i} \cos(n\theta_i)
\end{aligned}
$$

where $X_{n,i}$ is the $i$-th element of the vector at the $n$-th position of the sequence of tensor $X$.
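To make the rotation concrete, here is a minimal NumPy sketch (not from the article) that applies this formula to a tensor of shape `(seq_len, d)`. The frequency choice $\theta_i = 10000^{-2i/d}$ follows the original RoPE paper and is an assumption here, as are the names `apply_rope` and `base`.

```python
import numpy as np

def apply_rope(X, base=10000.0):
    """Rotate each position's vector, pairing element i with element d/2 + i,
    as in the formula above. X has shape (seq_len, d) with d even."""
    seq_len, d = X.shape
    half = d // 2
    # Assumed frequencies: theta_i = base^(-2i/d), i = 0 .. d/2 - 1
    theta = base ** (-2.0 * np.arange(half) / d)      # (half,)
    n = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    angles = n * theta[None, :]                       # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = X[:, :half], X[:, half:]                 # X_{n,i} and X_{n,d/2+i}
    return np.concatenate([x1 * cos - x2 * sin,       # rotated first half
                           x1 * sin + x2 * cos],      # rotated second half
                          axis=-1)

# Example: rotate an 8-position, 16-dimensional tensor
X = np.random.randn(8, 16)
X_rot = apply_rope(X)
```

The split into `x1` and `x2` mirrors the pairing in the equation: dimension $i$ is rotated together with dimension $\frac{d}{2}+i$ by the angle $n\theta_i$.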