https://feedx.site
Formula: f(x) = λ⋅ELU(α, x).
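A minimal sketch of the formula in Rust. The function names are hypothetical. The constants are those of the standard SELU activation, which has this same form; the line above gives no values for λ and α, so using them here is an assumption.

```rust
// Assumed values: the usual SELU constants. The formula above does not fix them.
const ALPHA: f64 = 1.6732632423543772;
const LAMBDA: f64 = 1.0507009873554805;

/// ELU(alpha, x): identity for positive inputs, alpha * (e^x - 1) otherwise.
fn elu(alpha: f64, x: f64) -> f64 {
    if x > 0.0 {
        x
    } else {
        // exp_m1 computes e^x - 1 without losing precision near zero.
        alpha * x.exp_m1()
    }
}

/// f(x) = lambda * ELU(alpha, x)
fn scaled_elu(lambda: f64, alpha: f64, x: f64) -> f64 {
    lambda * elu(alpha, x)
}
```

For positive inputs the function is a straight line with slope λ. For very negative inputs it flattens out towards -λ⋅α.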
one nice thing about working with bytes instead of UTF-16 is that a character set - the predicate in each ITE node - can be represented as a 256-bit bitvector, one bit per possible byte value. in the rust version this is just four u64s packed into a struct.
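the struct itself is not reproduced here, so the following is only a sketch of what such a type could look like. the name `ByteSet`, the field name and every method are hypothetical, not the original code.

```rust
/// Hypothetical type: a set of byte values stored as 256 bits in four u64 words.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct ByteSet {
    words: [u64; 4],
}

impl ByteSet {
    /// Set containing every byte in the inclusive range lo..=hi.
    fn from_range(lo: u8, hi: u8) -> Self {
        let mut set = Self::default();
        for b in lo..=hi {
            set.insert(b);
        }
        set
    }

    fn insert(&mut self, b: u8) {
        // The top two bits of the byte pick the word, the low six pick the bit.
        self.words[(b >> 6) as usize] |= 1u64 << (b & 63);
    }

    fn contains(&self, b: u8) -> bool {
        self.words[(b >> 6) as usize] & (1u64 << (b & 63)) != 0
    }

    fn union(self, other: Self) -> Self {
        let mut words = self.words;
        for i in 0..4 {
            words[i] |= other.words[i];
        }
        Self { words }
    }

    fn intersection(self, other: Self) -> Self {
        let mut words = self.words;
        for i in 0..4 {
            words[i] &= other.words[i];
        }
        Self { words }
    }

    fn complement(self) -> Self {
        let mut words = self.words;
        for w in words.iter_mut() {
            *w = !*w;
        }
        Self { words }
    }

    /// Number of bytes in the set.
    fn len(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }
}
```

with this layout, union, intersection and complement are each four word-sized bit operations, and the whole set is 32 bytes and `Copy`, so predicates are cheap to pass around and compare.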
Challenge: Build the smallest transformer that can add two 10-digit numbers with >= 99% accuracy on a held-out 10K test set.
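One way to read the acceptance criterion is as exact-match accuracy over 10,000 random operand pairs. The sketch below is a hypothetical scoring harness, not the challenge's official one. It treats the model as a black-box closure and assumes that "10-digit" means operands in the range 0..10^10. A small xorshift generator keeps it free of external crates; the seed must be nonzero.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Assumption: operands are drawn from 0..10^10 (at most ten decimal digits).
const LIMIT: u64 = 10_000_000_000;

/// Fraction of `n` random pairs for which `adder` returns the exact sum.
fn accuracy<F: Fn(u64, u64) -> u64>(adder: F, n: usize, seed: u64) -> f64 {
    let mut rng = XorShift64(seed);
    let mut correct = 0usize;
    for _ in 0..n {
        // The modulo introduces a negligible bias, which is fine for a sketch.
        let a = rng.next_u64() % LIMIT;
        let b = rng.next_u64() % LIMIT;
        if adder(a, b) == a + b {
            correct += 1;
        }
    }
    correct as f64 / n as f64
}

/// The stated threshold: at least 99% of test pairs answered correctly.
fn meets_threshold(acc: f64) -> bool {
    acc >= 0.99
}
```

A candidate model would be wrapped in a closure that encodes the two operands, runs the transformer, and decodes the predicted digits back into a `u64`. For a genuinely held-out set, the seed would need to differ from whatever generated the training data.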