Sr. Data Engineer (Python, Data Pipelines, AWS)
CytoTronics
CytoTronics is seeking a highly motivated and experienced Senior Data Engineer with a background in designing and building highly parallelized data processing pipelines. As a member of a small and dynamic team, the ideal candidate will have experience with a variety of data engineering tools and a passion for working closely with cross-functional teams to deliver scalable, efficient, and impactful data solutions.
Who We Are
CytoTronics is disrupting the traditional drug discovery process. We spun out of Harvard after a decade of research, and our proprietary semiconductor-cell interface can deliver high-dimensional functional assessment of live-cell responses at scale.
The Pixel™ family of cloud-enabled cell-plate readers enables high-resolution, multiplexed, real-time assessment of live-cell characteristics, providing a deeper understanding of how chemical or genetic perturbations affect cell function.
The software team provides support to all activities at CytoTronics, from the embedded software to data analysis tools running in the cloud. As an early member of the software team, you will play a key role in shaping our data infrastructure, pipelines, and processing tools, enabling deeper biological insights at scale.
See this 90-second video to get a sense of our technology: Accelerating drug discovery with live cell insights at scale.
The Role
As a Senior Data Engineer at CytoTronics, you will be a hands-on contributor to designing and implementing scalable data pipelines that process large datasets generated by our instruments. You will work closely with the biology, data science, and front-end teams to ensure the seamless flow of data, from raw acquisition to advanced analysis.
You will contribute to our data science toolbox API, which is used by internal and external data scientists. You will also interface with and contribute to the cloud backend that handles the multiple terabytes of data generated by our instruments and processed through our APIs.
In practice, you will be architecting and maintaining high-performance data pipelines that transform terabytes of raw data (e.g., images, videos, time-series measurements) into meaningful biological insights. You will collaborate with others in the software team to help design systems capable of handling the massive scale of our data operations.
Who You Are
- You are a seasoned data engineer with significant experience in designing and building data processing pipelines.
- You have a strong background in Python, with hands-on experience in libraries such as NumPy, Pandas, SciPy, and scikit-learn.
- You thrive in a dynamic startup environment and enjoy solving complex data engineering challenges.
- You have experience with parallel computing frameworks like Dask or similar, and are comfortable working in distributed computing environments.
- You are proactive in optimizing data processes, identifying performance bottlenecks, and implementing creative solutions to ensure reliability and scalability.
- You have a deep understanding of data best practices, such as data validation, error handling, and performance monitoring.
- You are comfortable contributing to and being a part of DevOps activities and are adept at continuously improving CI/CD processes in the context of data infrastructure.
- You have the curiosity to go down the technology stack, identify where to be proactive, and develop creative solutions and improvements that have immediate impact.
Key Requirements
- 7+ years of experience in building highly parallelized data processing pipelines using Python.
- Expertise in data manipulation, transformation, and visualization libraries (NumPy, Pandas, SciPy, scikit-learn, matplotlib, seaborn, TensorFlow, etc.).
- Experience in distributed data processing with frameworks like Dask or similar.
- Strong understanding of cloud-based data storage, processing, and computing (AWS preferred).
Nice to Have
- Signal / Image Processing, Scientific Data Analysis, Bioinformatics, Data Visualization
- Software for scientific / analytical / imaging devices
- Cloud infrastructure DevOps, Docker, Kubernetes, Terraform, etc.
CytoTronics is an equal employment opportunity employer in Boston, United States. We offer a competitive salary and equity compensation package. This full-time role is based in our Boston South End office, with flexible in-person / work-from-home arrangements. This role reports to the Head of Software.