Research Software Engineer - Clinical NLP Specialty (Data Science and AI Institute)
About the position

Responsibilities
• The successful candidates will participate in ground-breaking research projects that need advanced software solutions requiring expertise in software engineering not commonly found in scientific collaborations.
• The projects will require development of state-of-the-art clinical NLP solutions using the latest deep learning libraries trained on state-of-the-art hardware in secure healthcare computing environments.
• Projects will involve analysis of massive data sets either in the cloud or on premises.
• Projects will require development of novel NLP software pipelines for processing of unstructured clinical notes.
• Some projects may require deep engagement, possibly leading to co-authorship on scientific publications, while others may involve a more casual consulting engagement.
• Projects may require software solutions developed from scratch or refactoring of existing solutions to make them conform to industry standards (quality, efficiency, reusability, robustness, portability, documentation, etc.).
• It is a high-level goal of DSAI to translate the work on individual projects into frameworks and template patterns for sustainable scientific infrastructure benefiting future projects.

Requirements
• Strong NLP, LLM, machine learning and deep learning skills.
• Practical experience building NLP models and pipelines in a secure, HIPAA-compliant healthcare environment.
• Expert-level knowledge of multiple modern NLP and LLM libraries and models.
• Hands-on experience adapting and fine-tuning large language models for domain-specific clinical applications, with attention to data efficiency, interpretability, and reproducibility.
• Demonstrated expertise in prompt engineering, evaluation, and benchmarking of large language models, including applying responsible AI principles in clinical or sensitive-data contexts.
• Expert-level knowledge of the Python programming language.
• Familiarity with or willingness to learn C++ or other languages as may be needed.
• Familiarity with software containerization technologies such as Docker and Singularity.
• Familiarity with the Databricks platform.
• Fluency in the Linux operating system and related tools.
• Familiarity with modern software engineering best practices, such as Git source control, peer code review, test-driven development, build automation and continuous integration / continuous delivery.
• Familiarity with cloud development and deployment.
• Demonstrated leadership and self-direction.
• Willingness to teach others both informally and in short course format.
• Willingness to continually learn new tools and techniques as needed.
• Excellent verbal and written communication.
• Master's degree in a quantitative discipline such as computer science, engineering, physics or bioinformatics, with strong scientific computing and/or mathematics background.
• Three years' experience working in software development in large clinical NLP projects in industry or academia.
• Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.

Nice-to-haves
• PhD in a quantitative discipline.
• Five (5) years' experience as above in clinical NLP.
• Experience in CUDA GPU programming.
• Experience authoring open-source Python packages in PyPI.
• Experience in open-source project governance.
• Experience in open-source community adoption initiatives.