Ph.D. Candidate in Computer Science • Aug. 2016 – May 2021 (expected)
Member of the Data Mining Group, supervised by Prof. Jiawei Han.
Received the Brian Totty Graduate Fellowship from the Department of Computer Science.
Shanghai Jiao Tong University (SJTU)
B.S.E. in Computer Science and Technology • Sep. 2012 – Jun. 2016
Overall GPA: 3.92/4.0 (91.71/100) • Major GPA: 3.98/4.0 (93.78/100) • Rank: 1/78
Member of the IEEE Honor Class, an elite SJTU program modeled on MIT's educational approach that aims to nurture scientists in computer science, electrical and electronic technology, and information science.
International Exchange Student • Jun. 2014 – Aug. 2014
One of 10 top students selected for full funding by notable alumnus Neil Shen.
Earned a “Certificate of Excellence” at Yale University.
J. Shen, Z. Wu, D. Lei, J. Shang, X. Ren, J. Han, "SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble", accepted to the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2017). [PDF]
X. Ren, J. Shen, M. Qu, X. Wang, Z. Wu, Q. Zhu, M. Jiang, F. Tao, S. Sinha, D. Liem, P. Ping, R. Weinshilboum, J. Han, "Life-iNet: A Structured Network-Based Knowledge Exploration and Analytics System for Life Sciences", accepted to the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), System Demonstrations. [PDF]
J. Shen, Y. Jia, X. Liu, Y. Huang, L. Fu, X. Wang, "Overlapping Community Detection in Temporal Text Networks", submitted to the Tenth ACM International Conference on Web Search and Data Mining (WSDM 2017). [PDF]
J. He, Y. Huang, C. Liu, J. Shen, Y. Jia, X. Wang, "Text Network Exploration via Heterogeneous Web of Topics", accepted to the Sixth IEEE ICDM Workshop on Data Mining in Networks (ICDM 2016). [PDF] [Demo]
J. Shen, Z. Song, S. Li, Z. Tan, Y. Mao, L. Fu, L. Song, X. Wang, "Modeling Topic-level Academic Influence in Scientific Literatures", accepted to the Workshop on Scholarly Big Data at the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016). [PDF] [Slide]
J. Shen, Z. Tan, L. Fu, X. Wang, "Trend Analysis of Top-tier Conferences in Computer Network Field", published in Communications of the China Computer Federation, 11(9), pp. 62–66, 2015. [PDF]
Z. Tan, C. Liu, Y. Mao, J. Shen, B. Wang, L. Fu, L. Song, X. Wang, "AceMap: A Novel Approach towards Displaying Relationship among Academic Literatures", accepted to the 25th International World Wide Web Conference (WWW 2016). [PDF] [System]
Multi-Task Learning for Personal Search with Query Clustering
Google Internship • May 2017 – Aug. 2017
Developed a hierarchical clustering algorithm, based on truncated SVD and varimax rotation, that groups personal-search queries by their query and document attributes.
Proposed a query-dependent deep neural ranking model within a multi-task learning framework.
Improved offline Gmail ranking quality by 0.8% in mean reciprocal rank (MRR) and 1.32% in success@1.
Entity Set Aware Information Retrieval System
Supervised by Prof. Jiawei Han • Aug. 2017 – Present
Propose an unsupervised ranking algorithm for biomedical literature search based on an entity language model.
Parse over 100 GB of PubTator and PubMed data and build a real-time search system on Elasticsearch.
Overlapping Community Detection in Text Network
Supervised by Prof. Xinbing Wang • Feb. 2016 – Jul. 2016
Generated 32 large text networks from the Microsoft Academic Graph and studied their community structures.
Proposed an affiliation graph model that captures community interactions and incorporates information from both link structure and node attributes.
Achieved a 40% improvement in the accuracy of detected communities on 17 real networks.
Topic-based Academic Information Retrieval System
Supervised by Prof. Xinbing Wang • Sep. 2015 – Jun. 2016
Led a team of 7 developing a topic-based search engine. Refactored and configured Solr, an open-source enterprise search platform.
Retrieved papers based on both word-level and topic-level similarity to the user's query.
Ranked papers according to their influence scores as well as their relevance to the query.
Visualized the topic distribution of each paper and the evolution of topics across the whole corpus.
Modeling Academic Influence in Scientific Literatures
Supervised by Prof. Xinbing Wang • Apr. 2015 – Sep. 2015
Devised a generative model, the Reference Topic Model (RefTM), that uses both the textual content and the citation information of scientific literature.
Proposed a fast inference algorithm based on collapsed Gibbs sampling to learn RefTM efficiently.
Introduced J-Index, a quantitative metric of academic influence in scientific literature.
Designed experiments on a collection of over 420,000 research papers to validate the effectiveness of J-Index.
Exponential Interest Aggregation in Named Data Networking
Supervised by Prof. Weijia Jia • Apr. 2015 – Jul. 2015
Proposed Exponential Interest Aggregation (EIA), an adaptive forwarding strategy that addresses the hop-by-hop congestion control problem in Named Data Networking (NDN).
Established a state-transition framework for Interest aggregation in NDN and used it to analyze the effectiveness of EIA mathematically.
Evaluated EIA in simulation, showing that it reduced average delay by 13% and the average number of retransmissions by 25%, and increased the cache hit ratio by 61%.
Accent Classification of English Speakers
Supervised by Prof. Kai Yu.
Decomposed accent classification into four steps: word segmentation, feature extraction, clip classification, and recording classification.
Built a system using Mel-Frequency Cepstral Coefficients (MFCCs) as features and logistic regression as the classifier, achieving an overall classification accuracy of 97%.
PM2.5 Concentration Prediction Using Time-Series-Based Data Mining
Supervised by Prof. Bo Yuan.
Formalized the PM2.5 prediction problem, which is important for controlling and reducing air pollution.
Applied three methods to PM2.5 prediction: the AutoRegressive Moving-Average (ARMA) model, the Stochastic Volatility (SV) model, and the Stock-Watson (SW) model.
Designed a novel model combining the Stock-Watson model with a time-series neural network to achieve better prediction accuracy for PM2.5 concentration over the next 6 hours.
A Map-Generating and Speed-Optimizing Driving System
Supervised by Prof. Xinbing Wang.
Designed and implemented a traffic signal schedule inference model to estimate the durations of traffic light phases.
Integrated this model into CityDrive, a speed-optimizing driving system that reduced total waiting time per vehicle by 98.8% in simulation and saved 58.8% of kinetic energy in real-world tests.
Static Website Construction
Developed the Chinese version of the Shanghai Jiao Tong University President's homepage, based on the Joomla! CMS.
Created several static personal homepages and blogs using GitHub Pages and Jekyll.