Together AI's Posts (85)

Senior Software Development Engineer in Test

As a core member of the SDET Team at Together AI, you will be a key player in setting a high quality bar for our users and customers. Your primary focus will be designing and implementing automated testing processes using TypeScript, Golang, and Python. We’re looking for an independent, self-sufficient team member who works closely with stakeholders and engineers to ensure the overall quality of Together AI's products.

- With the SDET Team, develop a sustainable test automation strategy and drive accountability and ownership across relevant teams to maintain these practices
- Identify project needs and establish QA best practices and processes that take into account the team's resources, roadmap, and quality standards
- Uphold high quality standards using user impact as a factor in decisions
- Work closely with engineering and product teams to understand project requirements and align on testing goals, defining strategies and test plans for the project
- Create and maintain robust test automation frameworks using Cypress to increase test efficiency and coverage
- Write, maintain, and execute automated test scripts for functionality, performance, and reliability testing
- Conduct automated regression testing to validate software changes and updates
- Document test automation processes, findings, and results for reference and reporting purposes
- Stay current on emerging testing tools, best practices, and quality assurance trends

- Bachelor's degree in Computer Science, Software Engineering, or a related field, or 5+ years of industry experience
- Proficiency in Golang, Python, or TypeScript
- Proven experience as an SDET or in a similar role in a software development environment
- Strong knowledge of automation testing methodologies, tools, and best practices
- Experience in Cypress, REST API testing, or K6
- Passionately committed to ensuring the highest standards of software quality and dedicated to delivering top-notch products to our users
- Excellent problem-solving skills and attention to detail
- Strong communication and collaboration skills
- Self-motivated and adaptable in a fast-paced startup environment

Preferred Qualifications

- Familiarity with AI and machine learning concepts
- Experience with CI/CD, Argo CD, or GitHub Actions
- Experience testing sites running on AWS and EKS

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $140,000 - $220,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy

Location: San Francisco

Senior Software Engineer - Together Cloud Infrastructure

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure. As a Senior AI Infrastructure Engineer, you will play a key role in building the next generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and provides state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.

Some of what you’ll work on:

- Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning
- Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
- Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining
- Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining

To be successful, you’ll need to be deeply technical and possess excellent communication, collaboration, and diplomacy skills. You have strong fundamental software development skills. In addition, you have strong systems knowledge and troubleshooting abilities.

Requirements

- 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
- 5+ years of experience writing high-performance, well-tested, production-quality code
- Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
- Excellent communication skills: able to write clear design docs and work effectively with both technical and non-technical team members
- Deep experience with Kubernetes internals a big plus, such as implementing non-trivial Kubernetes operators, device/storage/network plugins, custom schedulers, or patches thereon or Kubernetes itself
- Deep experience with VMs/hypervisors a big plus, such as QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, SR-IOV
- Deep experience with DC networking tech and solutions a big plus, such as VLAN, VXLAN, VPN, VPC, OVS/OVN
- Experience with Cluster API or similar a big plus
- Experience working on high-performance compute, networking, and/or storage a big plus
- Experience virtualizing GPUs and/or InfiniBand a big plus
- Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
- Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
- Experience building IaaS or PaaS systems at scale a plus
- Experience with DPUs/SmartNICs a plus
- GPU programming, NCCL, CUDA knowledge a plus

Responsibilities

- Perform architecture and research work for decentralized AI workloads
- Work on the core, open-source Together AI platform
- Create services, tools, and developer documentation
- Create testing frameworks for robustness and fault-tolerance

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy

Location: San Francisco

Senior Software Engineer - Together Cloud Platform

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure. As a Senior Backend Engineer, you will play a key role in building the next generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and provides state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.

Some of what you’ll work on:

- Work on a distributed GPU scheduling system for the on-demand clusters product, Instant Clusters
- Build out a global management plane for managing our data center compute, networking, and storage
- Design and build new customer-facing cloud platform services, delivering killer enterprise AI cloud features

- 5+ years of demonstrated experience in building large-scale, fault-tolerant, distributed systems and API microservices
- Experience designing, analyzing, and improving the efficiency, scalability, and stability of various system resources
- Excellent communication skills: able to write clear design docs and work effectively with both technical and non-technical team members
- Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
- Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
- Experience developing against and managing a relational database, such as PostgreSQL
- Expert-level programmer in one or more programming languages (Golang preferred)
- Proficiency in version control practices and integrating IaC with CI/CD pipelines
- Experience with Kubernetes and containers preferred
- Experience building and operating data infrastructure (Kinesis, Airflow, Kafka, etc.) a plus
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

- Identify, design, and develop foundational backend services that power Together’s commerce platform
- Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
- Partner with product teams to understand functional requirements and deliver solutions that meet business needs
- Write clear, well-tested, and maintainable software and IaC for both new and existing systems
- Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
- Participate in an on-call rotation to address critical incidents when necessary

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy

Location: San Francisco

Site Reliability Engineer (Amsterdam)

As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of pragmatic operator and software engineer who applies sound engineering principles, operational discipline, and mature automation to our operating environments and codebase. You specialize in systems (operating systems, storage subsystems, networking) while implementing best practices for availability, reliability, and scalability, with varied interests in algorithms and distributed systems.

Requirements

- 7+ years of professional SRE or related experience
- Bachelor's degree in Computer Science or a related field, or equivalent work experience
- Expert knowledge of Ansible (roles, playbooks), Terraform, and Kubernetes
- Proficiency in programming/scripting languages
- Direct experience in monitoring and observability practices
- Advanced knowledge of cloud services
- Ability to thrive in a collaborative environment involving different stakeholders and subject matter experts

Responsibilities

- Be on an on-call (PagerDuty) rotation to respond to incidents that impact availability
- Build and run our infrastructure with Ansible, Terraform, and Kubernetes to enable scaling to a massive number of concurrent users
- Build monitoring systems to ensure the highest quality service for our customers
- Design and implement operational processes (such as deployments and upgrades)
- Debug production issues across all services and levels of the stack
- Identify improvements for the product architecture from the reliability, performance, and availability perspectives
- Plan the growth of Together AI’s infrastructure

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy

Location: Amsterdam

AI Native Account Executive

As a Startup Account Executive at Together, you’ll drive AI innovation by securing strategic deals with the fastest growing startups in the world. You’ll develop deep relationships with your clients to help them achieve their ambitious goals, accelerating both innovation & their impact on the world. You’ll work cross-functionally with product, engineering, and research to help deliver the best products for your customers. The ideal candidate will have a passion for entrepreneurship & AI, relationship-driven selling, and a fast-paced environment.

- Generate pipeline & win new business in the startup ecosystem
- Design & execute creative, strategic & customer-centric sales strategies to meet & exceed revenue quotas
- Find creative ways to integrate into the startup ecosystem & become a trusted partner of founders & their teams
- Collaborate on product roadmaps & features by bringing the voice of the customer into Together
- Work closely with the SDR team to help refine outbound approach and inform product-market fit, messaging & value prop for Together products

- 5-10 years of experience in sales, with a track record of exceeding targets
- Technical aptitude, a passion for technology & a desire to work with highly technical teams and products
- An excellent communicator with both clients and internal teams
- Adaptability, coachability, high drive, and a sense of urgency; enjoys working within a fast-paced environment wearing multiple hats
- Enjoys experimenting with the sales pitch/process to achieve company goals
- Experience and success with pipeline generation
- A passion for & experience with AI systems and/or infrastructure / API products highly preferred

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $180K - $250K OTE + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge. This is a hybrid role based in the Bay Area.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy

Location: San Francisco
