I build scalable and delightful Search systems with cutting-edge ML technologies. I manage Search and Ranking tech teams to deliver high-impact, customer-facing results. Over the past 15 years, I have worked on several Search systems, including Learning to Rank, L1/L2 Rankers, Semantic Matching, Query Understanding and Autocompletion, Search Relevance/Defects, Cold-start, Multilingual Search, Product Recommendation and Discovery, at leading search engines (Microsoft Bing, Amazon Product Search, Xapian).
Based in the SF Bay Area, in the middle of the Search revolution.
Research Interests
Information retrieval, machine learning, text mining, statistical natural language processing, deep-learning, data science
Updates
(Not updated actively)
Oct, 2022: Paper accepted at CIKM 2022 on Bayesian methods to address cold-start in Product Search (details)
Mar, 2022: Paper accepted at SIGIR 2022 on the role of clicks in ranking and unbiased learning to rank
Jun, 2021: Paper accepted at CIKM 2021 on seasonal relevance in Product Search
Jan, 2020: Poster paper accepted at WWW 2020 on cold-start in Search
July, 2019: Attended SIGIR 2019 in Paris
July, 2019: Organised the 2nd edition of the Search Workshop at Amazon ML Conference 2019 in Seattle
May, 2019: Attended WWW 2019 in San Francisco
November, 2018: Moved to the Amazon Search Science and AI team (A9) in Palo Alto.
June, 2018: Invited to co-chair Demo track at CODS-COMAD 2019.
March, 2018: Gave an invited talk at IIT Kharagpur on Machine Learning at Amazon
March, 2018: Gave a couple of guest lectures in the Machine Learning course at IIT Kharagpur (details)
Feb, 2018: Short paper accepted at WWW 2018
Jan, 2018: Attended CODS-COMAD Conference in Goa
Nov, 2017: Gave a talk on Deep Learning at GHCI 2017. (Details)
Sept, 2017: Gave a tutorial on distributed training using MXNet at Amazon India AI Summit in Bangalore.
March, 2017: Joined the Core ML team at Amazon as an ML Scientist to take on cutting-edge ML research
Jan, 2017: Successfully defended PhD thesis
Dec, 2016: Paper accepted at ECIR 2017 titled "Learning to classify inappropriate query-completions"
Dec, 2016: Paper accepted at Information Processing & Management titled "Continuous space models for CLIR"
Cross-view Embeddings for Information Retrieval [pdf]
Parth Gupta Doctoral Thesis, UPV (Valencia, Spain)
Learning to Rank - Using Bayesian Networks [pdf]
Parth Gupta Master's Thesis, DA-IICT (Gandhinagar, India)
Journals
Continuous space models for CLIR [pdf]
Parth Gupta, Rafael E. Banchs and Paolo Rosso Information Processing & Management, 2016 (Impact Factor: 1.397)
Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language
Parth Gupta, Marc Franco-Salvador, Paolo Rosso and Rafael E. Banchs Knowledge-Based Systems, 2016 (Impact Factor: 3.325)
A deep source-context feature for lexical selection in statistical machine translation
Parth Gupta, Marta R. Costa-Jussà, Rafael E. Banchs and Paolo Rosso Pattern Recognition Letters, 2016 (Impact Factor: 1.586)
Squeezing bottlenecks: exploring the limits of autoencoder semantic representation capabilities
Parth Gupta, Rafael E. Banchs and Paolo Rosso Neurocomputing, 2016 (Impact Factor: 2.005)
Methods for Cross-Language Plagiarism Detection [code]
Alberto Barrón-Cedeño∗, Parth Gupta∗ and Paolo Rosso (∗ Equal contribution) Knowledge-Based Systems Vol. 50, 2013 (Impact Factor: 4.104)
Conferences
Treating Cold Start in Product Search by Priors [pdf]
Parth Gupta, Tommaso Dreossi, Jan Bakus, Yu-hsiang Lin and Vamsi Salaka Proceedings of WWW 2020 (Taipei, Taiwan)
Retrieving Information from Multiple Sources [pdf]
Anurag Roy, Kripabandhu Ghosh, Moumita Basu, Parth Gupta and Saptarshi Ghosh Proceedings of WWW 2018 (Lyon, France)
Learning to classify inappropriate query-completions [pdf]
Parth Gupta and Jose Santos Proceedings of ECIR 2017 (Scotland, UK)
Query Expansion for Mixed-script Information Retrieval [pdf][code] [demo]
Parth Gupta, Kalika Bali, Rafael Banchs, Monojit Choudhury and Paolo Rosso Proceedings of SIGIR 2014 (Gold Coast, Australia)
Enrichment of Bilingual Dictionary through News Stream Data [data-code]
Ajay Dubey, Parth Gupta, Vasudev Varma and Paolo Rosso Proceedings of LREC 2014 (Reykjavík, Iceland)
Cross-Language Plagiarism Detection using a Multilingual Semantic Network [pdf]
Marc Franco Salvador, Parth Gupta and Paolo Rosso Proceedings of ECIR 2013 (Moscow, Russia)
Expected Divergence based Feature Selection for Learning to Rank [pdf] [poster][code]
Parth Gupta and Paolo Rosso Proceedings of COLING 2012 (Mumbai, India)
Cross-language High Similarity Search using a Conceptual Thesaurus [pdf] [slides]
Parth Gupta, Alberto Barrón-Cedeño and Paolo Rosso Proceedings of CLEF 2012 (Rome, Italy)
Detection of Paraphrastic Cases of Mono-lingual and Cross-lingual Plagiarism [pdf]
Parth Gupta, Khushboo Singhal, Prasenjit Majumder and Paolo Rosso Proceedings of ICON 2011 (Chennai, India)
Workshops
Modeling of terms across scripts through autoencoders [pdf]
Parth Gupta SIGIR 2014 Doctoral Consortium (Gold Coast, Australia)
English-to-Hindi system description for WMT 2014: Deep Source-Context Features for Moses
Parth Gupta, Marta Costa-jussà Ruiz, Rafael E. Banchs, Paolo Rosso The 9th ACL WMT workshop on Statistical Machine Translation, ACL 2014 (Baltimore, USA)
On Dimensionality Reduction Techniques for Cross-Language Information Retrieval [pdf]
Parth Gupta Future Directions in Information Access Symposium (FDIA) in Conjunction with European Summer School in Information Retrieval (ESSIR), FDIA@ESSIR, 2013 (Granada, Spain)
Text Reuse with ACL: (Upward) Trends [pdf] [slides]
Parth Gupta and Paolo Rosso Workshop on Rediscovering 50 Years of Discoveries, ACL 2012 (Jeju, South Korea)
Multiword Named Entities Extraction from Cross-Language Text Re-use [pdf] [slides]
Parth Gupta, Khushboo Singhal and Paolo Rosso CREDISLAS Workshop, LREC 2012 (Istanbul, Turkey)
Working notes
Mapping Hindi-English Text Re-use Document Pairs [pdf] [slides]
Parth Gupta and Khushboo Singhal Notebook Papers of Forum for Information Retrieval Evaluation, FIRE 2011 (Mumbai, India)
External & Intrinsic Plagiarism Detection: VSM & Discourse Markers based Approach [pdf]
Sameer Rao, Parth Gupta, Khushboo Singhal, and Prasenjit Majumder Notebook Papers of CLEF 2011 LABs and Workshops, CLEF 2011 (Amsterdam, The Netherlands)
External Plagiarism Detection: N-Gram Approach using Named Entity Recognizer [pdf]
Parth Gupta, Sameer Rao, and Prasenjit Majumder Notebook Papers of CLEF 2010 LABs and Workshops, CLEF 2010 (Padua, Italy)
Code
jDNN: Deep learning toolkit with CUDA support (under development and sparsely documented). [java]
Mixed-script Equivalents: The code used in "Query Expansion for Mixed-script Information Retrieval", with trained models. [java]
Xapian: Large-scale search engine library, an open source project with commercial support. I implemented the learning to rank module xapian-letor, which was also extended as a GSoC project in 2012. [c++]
FS-ED for Letor: Expected divergence based feature selection module. The feature selection works particularly well (statistically significant gains) when used with a ranking algorithm based on large-margin classifiers, e.g. RankSVM. [java]
Terrier Wrapper: A wrapper on top of Terrier 3.5 to perform various operations, such as collecting term/document statistics, using an off-the-shelf stemmer, creating a term-document matrix, etc. More details are available on the code page. [java]
Replicated SoftMax (RSM): Implementation of the RSM-type Restricted Boltzmann Machine (RBM) (email for a copy). [octave]
Cross-language PD: Detailed fragment identification algorithm presented in the Knowledge-Based Systems paper. [java]
IR Evaluation Framework: IR evaluation framework with measures like MAP, NDCG@k, MRR, Recall etc. [perl]
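For readers unfamiliar with the measures named in the IR Evaluation Framework entry, here is a minimal Python sketch of MRR, average precision (the per-query component of MAP) and NDCG@k. It is not the Perl framework's code; the function names are hypothetical, and it makes two assumptions: NDCG uses the exponential gain `2^rel - 1` (one common variant), and the ideal ranking is built from the retrieved list only.

```python
import math
from typing import Optional, Sequence


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1/rank of the first relevant result, or 0.0 if nothing is relevant."""
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def average_precision(rels: Sequence[int], num_relevant: Optional[int] = None) -> float:
    """Mean of precision@rank over the ranks that hold a relevant result.

    If num_relevant (total relevant documents for the query) is not given,
    the number of relevant results in the list is used instead (assumption).
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    denom = num_relevant if num_relevant is not None else hits
    return total / denom if denom else 0.0


def dcg_at_k(rels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top k, with gain 2^rel - 1."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(rels[:k], start=1)
    )


def ndcg_at_k(rels: Sequence[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the same labels sorted in ideal order."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    """Average a per-query measure over queries (e.g. AP -> MAP, RR -> MRR)."""
    return sum(values) / len(values) if values else 0.0
```

Each function takes the relevance labels of one query's results in ranked order; MAP and MRR are then the `mean` of the per-query values over all queries.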