cma | 12 months ago | on: How has DeepSeek improved the Transformer architec...
Flash attention was built from techniques that were already common in other areas of optimized software, yet the big players weren't applying those optimizations when it came out, and it significantly improved everything.
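For context on the technique being discussed: the core of flash attention is tiling plus an online softmax, so the full N×N score matrix is never materialized. A minimal NumPy sketch of that idea (function names, block size, and shapes are illustrative, not from any particular implementation):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix -- the memory cost
    # that flash attention avoids.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, block=4):
    # Tiled attention with an online softmax: process K/V in blocks,
    # keeping a running row-max (m) and normalizer (l) per query so
    # only a (N, block) score tile exists at any time.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running max of scores per row
    l = np.zeros(N)           # running softmax denominator per row
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                  # (N, block) score tile
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)             # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Both functions compute the same result; the tiled version just trades the quadratic intermediate for a running rescaling, which is what makes it an IO/memory optimization rather than a new model.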
whimsicalism | 12 months ago
Yes, I agree that low-level & infra work is where a lot of DeepSeek's improvement came from.