I work in the area of machine learning and information retrieval. I am currently an Applied Science Manager with Amazon Search, headquartered in Palo Alto. Our team works on cold-start and product discovery challenges in Ranking.
Feel free to reach out if you want to collaborate!
Research Interests
Information retrieval, machine learning, text mining, statistical natural language processing, deep learning, data science
Updates
Oct, 2022: Paper accepted at CIKM 2022 on Bayesian methods to address cold-start in Product Search (details)
Mar, 2022: Paper accepted at SIGIR 2022 on the role of clicks in ranking and unbiased learning to rank
Jun, 2021: Paper accepted at CIKM 2021 on seasonal relevance in Product Search
Jan, 2020: Poster paper accepted at WWW 2020 on cold-start in Search
July, 2019: Attended SIGIR 2019 in Paris
July, 2019: Organised the 2nd Edition of the Search Workshop at Amazon ML Conference 2019 in Seattle
May, 2019: Attended WWW 2019 in San Francisco
November, 2018: Moved to the Amazon Search Science and AI team (A9) in Palo Alto.
June, 2018: Invited to co-chair Demo track at CODS-COMAD 2019.
March, 2018: Gave an invited talk at IIT Kharagpur on Machine Learning at Amazon
March, 2018: Gave a couple of guest lectures in the Machine Learning course at IIT Kharagpur (details)
Feb, 2018: Short paper accepted at WWW 2018
Jan, 2018: Attended CODS-COMAD Conference in Goa
Nov, 2017: Gave a talk on Deep Learning at GHCI 2017. (Details)
Sept, 2017: Gave a tutorial on distributed training using MXNet at Amazon India AI Summit in Bangalore.
March, 2017: Joined Core ML team at Amazon as ML Scientist to take on cutting-edge ML research
Jan, 2017: Successfully defended PhD thesis
Dec, 2016: Paper accepted at ECIR 2017 titled "Learning to classify inappropriate query-completions"
Dec, 2016: Paper accepted at Information Processing & Management titled "Continuous space models for CLIR"
Cross-view Embeddings for Information Retrieval [pdf]
Parth Gupta Doctoral Thesis, UPV (Valencia, Spain)
Learning to Rank - Using Bayesian Networks [pdf]
Parth Gupta Master's Thesis, DA-IICT (Gandhinagar, India)
Journals
Continuous space models for CLIR [pdf]
Parth Gupta, Rafael E. Banchs and Paolo Rosso Information Processing & Management, 2016 (Impact Factor: 1.397)
Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language
Parth Gupta, Marc Franco-Salvador, Paolo Rosso and Rafael E. Banchs Knowledge-Based Systems, 2016 (Impact Factor: 3.325)
A deep source-context feature for lexical selection in statistical machine translation
Parth Gupta, Marta R. Costa-Jussà, Rafael E. Banchs and Paolo Rosso Pattern Recognition Letters, 2016 (Impact Factor: 1.586)
Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities
Parth Gupta, Rafael E. Banchs and Paolo Rosso Neurocomputing, 2016 (Impact Factor: 2.005)
Methods for Cross-Language Plagiarism Detection [code]
Alberto Barrón-Cedeño∗, Parth Gupta∗ and Paolo Rosso (∗ Equal contribution) Knowledge-Based Systems Vol. 50, 2013 (Impact Factor: 4.104)
Conferences
Treating Cold Start in Product Search by Priors [pdf]
Parth Gupta, Tommaso Dreossi, Jan Bakus, Yu-hsiang Lin and Vamsi Salaka Proceedings of WWW 2020 (Taipei, Taiwan)
Retrieving Information from Multiple Sources [pdf]
Anurag Roy, Kripabandhu Ghosh, Moumita Basu, Parth Gupta and Saptarshi Ghosh Proceedings of WWW 2018 (Lyon, France)
Learning to classify inappropriate query-completions [pdf]
Parth Gupta and Jose Santos Proceedings of ECIR 2017 (Scotland, UK)
Query Expansion for Mixed-script Information Retrieval [pdf] [code] [demo]
Parth Gupta, Kalika Bali, Rafael Banchs, Monojit Choudhury and Paolo Rosso Proceedings of SIGIR 2014 (Gold Coast, Australia)
Enrichment of Bilingual Dictionary through News Stream Data [data-code]
Ajay Dubey, Parth Gupta, Vasudev Varma and Paolo Rosso Proceedings of LREC 2014 (Reykjavík, Iceland)
Cross-Language Plagiarism Detection using a Multilingual Semantic Network [pdf]
Marc Franco Salvador, Parth Gupta and Paolo Rosso Proceedings of ECIR 2013 (Moscow, Russia)
Expected Divergence based Feature Selection for Learning to Rank [pdf] [poster] [code]
Parth Gupta and Paolo Rosso Proceedings of COLING 2012 (Mumbai, India)
Cross-language High Similarity Search using a Conceptual Thesaurus [pdf] [slides]
Parth Gupta, Alberto Barrón-Cedeño and Paolo Rosso Proceedings of CLEF 2012 (Rome, Italy)
Detection of Paraphrastic Cases of Mono-lingual and Cross-lingual Plagiarism [pdf]
Parth Gupta, Khushboo Singhal, Prasenjit Majumder and Paolo Rosso Proceedings of ICON 2011 (Chennai, India)
Workshops
Modeling of terms across scripts through autoencoders [pdf]
Parth Gupta SIGIR 2014 Doctoral Consortium (Gold Coast, Australia)
English-to-Hindi system description for WMT 2014: Deep Source-Context Features for Moses
Parth Gupta, Marta Costa-jussà Ruiz, Rafael E. Banchs, Paolo Rosso The 9th ACL WMT workshop on Statistical Machine Translation, ACL 2014 (Baltimore, USA)
On Dimensionality Reduction Techniques for Cross-Language Information Retrieval [pdf]
Parth Gupta Future Directions in Information Access Symposium (FDIA) in Conjunction with European Summer School in Information Retrieval (ESSIR), FDIA@ESSIR, 2013 (Granada, Spain)
Text Reuse with ACL: (Upward) Trends [pdf] [slides]
Parth Gupta and Paolo Rosso Workshop on Rediscovering 50 Years of Discoveries, ACL 2012 (Jeju, South Korea)
Multiword Named Entities Extraction from Cross-Language Text Re-use [pdf] [slides]
Parth Gupta, Khushboo Singhal and Paolo Rosso CREDISLAS Workshop, LREC 2012 (Istanbul, Turkey)
Working notes
Mapping Hindi-English Text Re-use Document Pairs [pdf] [slides]
Parth Gupta and Khushboo Singhal Notebook Papers of Forum for Information Retrieval Evaluation, FIRE 2011 (Mumbai, India)
External & Intrinsic Plagiarism Detection: VSM & Discourse Markers based Approach [pdf]
Sameer Rao, Parth Gupta, Khushboo Singhal, and Prasenjit Majumder Notebook Papers of CLEF 2011 LABs and Workshops, CLEF 2011 (Amsterdam, The Netherlands)
External Plagiarism Detection: N-Gram Approach using Named Entity Recognizer [pdf]
Parth Gupta, Sameer Rao, and Prasenjit Majumder Notebook Papers of CLEF 2010 LABs and Workshops, CLEF 2010 (Padua, Italy)
Code
jDNN: Deep learning toolkit with CUDA support (under development and sparsely documented). [java]
Mixed-script Equivalents: The code used in "Query Expansion for Mixed-script Information Retrieval", with trained models. [java]
Xapian: Large-scale search engine library, an open source project with commercial support. I implemented the learning to rank module xapian-letor, which was also extended as a GSoC project in 2012. [c++]
FS-ED for Letor: Expected divergence based feature selection module. The feature selection works really well (statistically significant) when used with ranking algorithms based on large-margin classifiers, e.g. RankSVM. [java]
Terrier Wrapper: A wrapper on top of Terrier 3.5 for various operations, such as collecting term/document statistics, using an off-the-shelf stemmer, and creating a term-document matrix. More details are available on the code page. [java]
Replicated SoftMax (RSM): Implementation of the RSM-type Restricted Boltzmann Machine (RBM); email me for a copy. [octave]
Cross-language PD: Detailed fragment identification algorithm presented in the Knowledge-Based Systems paper. [java]
IR Evaluation Framework: IR evaluation framework with measures like MAP, NDCG@k, MRR, Recall etc. [perl]