Inkit Padhi
ML & NLP Researcher @ IBM Research, New York
Hi! My name is Inkit. I'm a researcher working in the fields of Machine Learning (ML) and Natural Language Processing (NLP). My current work centers on improving the safety, reliability, and trustworthiness of Large Language Models (LLMs).
My primary interests lie in safety and alignment, synthetic data generation, steering, and influence-based attributions within LLMs.
My past research spans various areas of deep learning, including learning representations for diverse modalities, counterfactual generation, interpretability, text style transfer, and unsupervised learning. I began my research journey at USC/ISI under the guidance of Kevin Knight; our work on probing in sequence models was a foundational contribution to the field of interpretability.
Email: $first_name.$last_name@gmail.com
Updates
Dec 2024 | Attending NeurIPS ‘24 to present our work at the Pluralistic Alignment workshop.
---|---
Dec 2024 | Our paper “Granite Guardian” is now available as a preprint on arXiv.
Nov 2024 | I’ll be presenting our work “Value Alignment From Unstructured Text” at EMNLP 2024.
Oct 2024 | Granite Guardian 3.0 is out! It detects risks in both inputs and responses, including various harms and RAG hallucinations.
Sep 2024 | CAST: Check out my exceptional summer intern Bruce Lee’s work on conditional activation steering.
Aug 2024 | Alignment Studio is accepted to IEEE Internet Computing! We introduce an architecture that facilitates aligning LMs to the specific values, norms, and regulations of a given context.