Inference Software Engineer
Etched · Cupertino, California, US
Job description
About Etched

Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents.

Key responsibilities:
- Contribute to the architecture and design of the Sohu host software stack.
- Implement high-performance, modular code across the complete Etched software stack, which is a mix of Rust, C++ and Python.
- Interface with the firmware and driver teams to deliver the highest-performance HW/SW stack.
- Work with AI model researchers and product-facing teams building out the Etched serving front-end.

Representative projects:
- Build scheduling logic for handling continuous batching and real-time inference.
- Implement inference-time acceleration techniques such as speculative decoding, tree search, and KV cache sharing.
- Implement distributed networking primitives for efficient multi-server inference.

You may be a good fit if you have:
- Experience with C++ and Python...