Jing Li - Publications

Full List [Google Scholar]

Books

Authors: Xiaoyan Zhu, Jing Li, Yu Hao, Han Xiao and Minlie Huang
Publisher: Publishing House of Electronics Industry
ISBN: 9787121389924
Publication date: June 1, 2020
Links: JD.com, Tmall.com

Preprints

FLM-101B: An Open LLM and How to Train It with $100K Budget

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun and Yequan Wang

PDF Code Abstract BibTex

@article{DBLP:journals/corr/abs-2309-03852,
  author       = {Xiang Li and
                  Yiqun Yao and
                  Xin Jiang and
                  Xuezhi Fang and
                  Xuying Meng and
                  Siqi Fan and
                  Peng Han and
                  Jing Li and
                  Li Du and
                  Bowen Qin and
                  Zheng Zhang and
                  Aixin Sun and
                  Yequan Wang},
  title        = {{FLM-101B:} An Open {LLM} and How to Train It with {\textdollar}100K
                  Budget},
  journal      = {CoRR},
  volume       = {abs/2309.03852},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2309.03852},
  doi          = {10.48550/arXiv.2309.03852},
  eprinttype   = {arXiv},
}

Multi-objective Large Language Model Alignment with Hierarchical Experts

Zhuo Li, Guodong Du, WeiYang Guo, Yigeng Zhou, Xiucheng Li, Wenya Wang, Fangming Liu, Yequan Wang, Deheng Ye, Min Zhang, Jing Li

PDF Code Abstract BibTex

@article{hoe2025,
  title        = {Multi-objective Large Language Model Alignment with Hierarchical Experts},
  author       = {Zhuo Li and Guodong Du and WeiYang Guo and Yigeng Zhou and Xiucheng Li and Wenya Wang and Fangming Liu and Yequan Wang and Deheng Ye and Min Zhang and Jing Li},
  journal      = {CoRR},
  volume       = {abs/2505.20925},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2505.20925},
  doi          = {10.48550/arXiv.2505.20925},
  eprinttype   = {arXiv},
}

Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning

Weiyang Guo, Zesheng Shi, Zhuo Li, Yequan Wang, Xuebo Liu, Wenya Wang, Fangming Liu, Min Zhang, Jing Li

PDF Code Abstract BibTex

@article{jailbreakr12025,
  title        = {Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning},
  author       = {Weiyang Guo and Zesheng Shi and Zhuo Li and Yequan Wang and Xuebo Liu and Wenya Wang and Fangming Liu and Min Zhang and Jing Li},
  journal      = {CoRR},
  volume       = {abs/2505.20925},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2505.20925},
  doi          = {10.48550/arXiv.2309.03852},
  eprinttype   = {arXiv},
}

Knowledge Grafting of Large Language Models

Guodong Du, Xuanning Zhou, Junlin Li, Zhuo Li, Zesheng Shi, Wanyu Lin, Ho-Kin Tang, Xiucheng Li, Fangming Liu, Wenya Wang, Min Zhang, Jing Li

PDF Code Abstract BibTex

@article{crafting2025,
  title        = {Knowledge Grafting of Large Language Models},
  author       = {Guodong Du and Xuanning Zhou and Junlin Li and Zhuo Li and Zesheng Shi and Wanyu Lin and Ho-Kin Tang and Xiucheng Li and Fangming Liu and Wenya Wang and Min Zhang and Jing Li},
  journal      = {CoRR},
  volume       = {abs/2505.18502},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2505.18502},
  doi          = {10.48550/arXiv.2505.18502},
  eprinttype   = {arXiv},
}

2025

Function-to-Style Guidance of LLMs for Code Translation
Longhui Zhang, Bin Wang, Jiahao Wang, Xiaofeng Zhao, Min Zhang, Hao Yang, Meishan Zhang, Yu Li, Jing Li, Jun Yu, Min Zhang
ICML-25- The Forty-Second International Conference on Machine Learning, 2025.
PDF Code Abstract BibTex

Large language models (LLMs) have made significant strides in code translation tasks. However, ensuring both the correctness and readability of translated code remains a challenge, limiting their effective adoption in real-world software development. In this work, we propose F2STrans, a function-to-style guiding paradigm designed to progressively improve the performance of LLMs in code translation. Our approach comprises two key stages: (1) Functional learning, which optimizes translation correctness using high-quality source-target code pairs mined from online programming platforms, and (2) Style learning, which improves translation readability by incorporating both positive and negative style examples. Additionally, we introduce a novel code translation benchmark that includes up-to-date source code, extensive test cases, and manually annotated ground-truth translations, enabling comprehensive functional and stylistic evaluations. Experiments on both our new benchmark and existing datasets demonstrate that our approach significantly improves code translation performance. Notably, our approach enables Qwen-1.5B to outperform prompt-enhanced Qwen-32B and GPT-4 on average across 20 diverse code translation scenarios.

@inproceedings{F2STrans_icml25, title={Function-to-Style Guidance of LLMs for Code Translation}, author = {Longhui Zhang and Bin Wang and Jiahao Wang and Xiaofeng Zhao and Min Zhang and Hao Yang and Meishan Zhang and Yu Li and Jing Li and Jun Yu and Min Zhang}, booktitle = {The Forty-Second International Conference on Machine Learning (ICML)}, year={2025} }

Few-Shot Learner Generalizes Across AI-Generated Image Detection
Shiyu Wu, Jing Liu, Jing Li, Yequan Wang
ICML-25- The Forty-Second International Conference on Machine Learning, 2025.
PDF Code Abstract BibTex

Current fake image detectors trained on large synthetic image datasets perform satisfactorily on the limited set of generative models studied so far. However, these detectors suffer a notable performance decline on unseen models. Besides, collecting adequate training data from online generative models is often expensive or infeasible. To overcome these issues, we propose Few-Shot Detector (FSD), a novel AI-generated image detector which learns a specialized metric space to effectively distinguish unseen fake images by utilizing very few samples. Experiments show that FSD achieves state-of-the-art average accuracy (ACC) on the GenImage dataset. More importantly, our method is better able to capture the intra-category common features in unseen images without further training.

@inproceedings{fsd_icml25, title={Few-Shot Learner Generalizes Across AI-Generated Image Detection}, author = {Shiyu Wu and Jing Liu and Jing Li and Yequan Wang}, booktitle = {The Forty-Second International Conference on Machine Learning (ICML)}, year={2025} }

MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
WeiYang Guo, Jing Li, Wenya Wang, Yu Li, Daojing He, Jun Yu, Min Zhang
ACL-25- The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

The proliferation of jailbreak attacks against large language models (LLMs) highlights the need for robust security measures. However, in multi-round dialogues, malicious intentions may be hidden in interactions, leading LLMs to be more prone to produce harmful responses. In this paper, we propose the Multi-Turn Safety Alignment (MTSA) framework, to address the challenge of securing LLMs in multi-round interactions. It consists of two stages: In the thought-guided attack learning stage, the red-team model learns about thought-guided multi-round jailbreak attacks to generate adversarial prompts. In the adversarial iterative optimization stage, the red-team model and the target model continuously improve their respective capabilities in interaction. Furthermore, we introduce a multi-turn reinforcement learning algorithm based on future rewards to enhance the robustness of safety alignment. Experimental results show that the red-team model exhibits state-of-the-art attack capabilities, while the target model significantly improves its performance on safety benchmarks.

@inproceedings{weiyang_acl25, title={MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming}, author = {WeiYang Guo and Jing Li and Wenya Wang and Yu Li and Daojing He and Jun Yu and Min Zhang}, booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
Junlin Li, Guodong Du, Jing Li, Sim Kuan Goh, Wenya Wang, Yequan Wang, Fangming Liu, Ho-Kin Tang, Saleh Alharbi, Daojing He, Min Zhang
ACL-25- The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Fine-tuning Large Language Models (LLMs) with multimodal encoders on modality-specific data expands the modalities that LLMs can handle, leading to the formation of Multimodal LLMs (MLLMs). However, this paradigm heavily relies on resource-intensive and inflexible fine-tuning from scratch with new multimodal data. In this paper, we propose MMER (Multi-modality Expansion and Retention), a training-free approach that integrates existing MLLMs for effective multimodal expansion while retaining their original performance. Specifically, MMER reuses MLLMs' multimodal encoders while merging their LLM parameters. By comparing original and merged LLM parameters, MMER generates binary masks to approximately separate LLM parameters for each modality. These decoupled parameters can independently process modality-specific inputs, reducing parameter conflicts and preserving original MLLMs' fidelity. MMER can also mitigate catastrophic forgetting by applying a similar process to MLLMs fine-tuned on new tasks. Extensive experiments show significant improvements over baselines, proving that MMER effectively expands LLMs' multimodal capabilities while retaining 99% of the original performance, and also markedly mitigates catastrophic forgetting.

@inproceedings{junlin_acl25, title={Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling}, author = {Junlin Li and Guodong Du and Jing Li and Sim Kuan Goh and Wenya Wang and Yequan Wang and Fangming Liu and Ho-Kin Tang and Saleh Alharbi and Daojing He and Min Zhang}, booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer
Guodong Du, Jing Li, Zitao Fang, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Honghai Liu, Min Zhang
ACL-25- The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Foundation models and their checkpoints have significantly advanced deep learning, boosting performance across various applications. However, fine-tuned models often struggle outside their specific domains and exhibit considerable redundancy. Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate forgetting, reduce interference when merging model parameters across tasks, and improve compression efficiency. In this context, developing an effective pruning strategy for fine-tuned models is crucial. Leveraging the advantages of the task vector mechanism, we preprocess fine-tuned models by calculating the differences between them and the original model. Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS) for slimming down fine-tuned models. This method enhances pruning efficiency by searching through neural parameters of task vectors within low-rank subspaces. Our method has three key applications: enhancing knowledge transfer through pairwise model interpolation, facilitating effective knowledge fusion via model merging, and enabling the deployment of compressed models that retain near-original performance while significantly reducing storage costs. Extensive experiments across vision, NLP, and multi-modal benchmarks demonstrate the effectiveness and robustness of our approach, resulting in substantial performance gains.
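As a rough illustration of the task-vector preprocessing this abstract describes, the following minimal Python sketch computes the difference between fine-tuned and pre-trained weights, prunes it, and adds the result back to the pre-trained weights. The magnitude-based criterion, the `keep_ratio` parameter, and all function names are assumptions made for illustration; this is plain magnitude pruning of a task vector, not the NPS search over low-rank subspaces itself.

```python
from typing import List


def task_vector(finetuned: List[float], pretrained: List[float]) -> List[float]:
    """Element-wise difference between fine-tuned and pre-trained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def prune_by_magnitude(vector: List[float], keep_ratio: float) -> List[float]:
    """Keep the largest-magnitude entries and zero the rest (assumed criterion)."""
    keep = max(1, round(len(vector) * keep_ratio))
    # Indices of the `keep` entries with the largest absolute value.
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    top = set(order[:keep])
    return [v if i in top else 0.0 for i, v in enumerate(vector)]


def apply_task_vector(pretrained: List[float], vector: List[float]) -> List[float]:
    """Add a (possibly pruned) task vector back onto the pre-trained weights."""
    return [p + v for p, v in zip(pretrained, vector)]


# Toy weights, flattened to a single list for simplicity.
pretrained = [1.0, 2.0, 3.0, 4.0]
finetuned = [1.5, 2.0, 2.0, 4.25]

tau = task_vector(finetuned, pretrained)
pruned = prune_by_magnitude(tau, keep_ratio=0.5)
slim = apply_task_vector(pretrained, pruned)
print(tau, pruned, slim)
```

Only the pruned task vector needs to be stored alongside the shared pre-trained weights, which is where the storage saving in this kind of scheme comes from.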

@inproceedings{guodong_acl25, title={Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer}, author = {Guodong Du and Jing Li and Zitao Fang and Junlin Li and Runhua Jiang and Shuyang Yu and Yifei Guo and Yangneng Chen and Sim Kuan Goh and Ho-Kin Tang and Daojing He and Honghai Liu and Min Zhang}, booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Safety Alignment via Constrained Knowledge Unlearning
Zesheng Shi, Yucheng Zhou, Jing Li, Yuxin Jin, Yu Li, Daojing He, Fangming Liu, Saleh Alharbi, Jun Yu, Min Zhang
ACL-25- The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Despite significant progress in safety alignment, large language models (LLMs) remain susceptible to jailbreak attacks. Existing defense mechanisms have not fully deleted harmful knowledge in LLMs, which allows such attacks to bypass safeguards and produce harmful outputs. To address this challenge, we propose a novel safety alignment strategy, Constrained Knowledge Unlearning (CKU), which focuses on two primary objectives: knowledge localization and retention, and unlearning harmful knowledge. CKU works by scoring neurons in specific multilayer perceptron (MLP) layers to identify a subset U of neurons associated with useful knowledge. During the unlearning process, CKU prunes the gradients of neurons in U to preserve valuable knowledge while effectively mitigating harmful content. Experimental results demonstrate that CKU significantly enhances model safety without compromising overall performance, offering a superior balance between safety and utility compared to existing methods. Additionally, our analysis of neuron knowledge sensitivity across various MLP layers provides valuable insights into the mechanics of safety alignment and model knowledge editing.
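The gradient-pruning idea in this abstract can be sketched in a few lines of Python: select a protected subset U from per-neuron scores, then skip the update for those neurons during an unlearning step. The scores, the top-k selection rule, the learning rate, and the function names below are hypothetical, and each list entry stands in for one neuron; the actual CKU scoring and objective are not reproduced here.

```python
from typing import List, Set


def select_protected(scores: List[float], k: int) -> Set[int]:
    """Pick the k highest-scoring neurons as the protected subset U (assumed rule)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def masked_update(
    weights: List[float], grads: List[float], protected: Set[int], lr: float
) -> List[float]:
    """One gradient step in which gradients of protected neurons are zeroed."""
    return [
        w if i in protected else w - lr * g
        for i, (w, g) in enumerate(zip(weights, grads))
    ]


weights = [1.0, 2.0, 3.0]
grads = [0.5, 0.5, 0.5]          # gradients of a hypothetical unlearning loss
scores = [0.1, 0.9, 0.3]         # hypothetical "useful knowledge" scores

protected = select_protected(scores, k=1)
updated = masked_update(weights, grads, protected, lr=1.0)
print(protected, updated)
```

Neuron 1 has the highest score, so it is left unchanged while the other two entries move along the gradient.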

@inproceedings{zesheng_acl25, title={Safety Alignment via Constrained Knowledge Unlearning}, author = {Zesheng Shi and Yucheng Zhou and Jing Li and Yuxin Jin and Yu Li and Daojing He and Fangming Liu and Saleh Alharbi and Jun Yu and Min Zhang}, booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Speed Up Your Code: Progressive Code Acceleration Through Bidirectional Tree Editing
Longhui Zhang, Jiahao Wang, Meishan Zhang, GaoXiong Cao, Ensheng Shi, mayuchi, Jun Yu, Honghai Liu, Jing Li, Min Zhang
ACL-25- The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Large language models (LLMs) have made significant strides in code acceleration (CA) tasks. Current works typically fine-tune LLMs using slow-fast code pairs mined from online programming platforms. Although these methods are widely recognized for their effectiveness, the training data often lack clear code acceleration patterns and offer only limited speed improvements. Moreover, existing training methods, such as direct instruction fine-tuning (IFT), tend to overlook the hierarchical relationships among acceleration patterns. In this work, we introduce BITE, a novel training paradigm designed to improve LLMs' CA capabilities through two key innovations: (1) Bidirectional tree editing, which generates high-quality training data by incrementally transforming given code into both its most efficient and least efficient variants, and (2) Progressive code acceleration learning, which enables LLMs to internalize multi-level CA strategies by learning increasingly sophisticated acceleration patterns. Additionally, we introduce a new CA evaluation benchmark and metric for comprehensive assessment of model performance on CA tasks. Extensive experiments on both our benchmark and existing benchmarks demonstrate the effectiveness of our approach. Notably, BITE enables Qwen-1.5B to outperform prompt-enhanced GPT-4 and current training-based methods on average across five programming languages.

@inproceedings{longhui_acl25, title={Speed Up Your Code: Progressive Code Acceleration Through Bidirectional Tree Editing}, author = {Longhui Zhang and Jiahao Wang and Meishan Zhang and GaoXiong Cao and Ensheng Shi and mayuchi and Jun Yu and Honghai Liu and Jing Li and Min Zhang}, booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jing Li, Min Zhang, Zhaopeng Tu
ACL-25- The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs. Structured pruning reduces model size and speeds up inference but often causes uneven degradation across domains, leading to biased performance. To address this, we propose DRPruning, a method that dynamically adjusts the data distribution during training to restore balanced performance across heterogeneous and multi-tasking data. Experiments in monolingual and multilingual settings show that DRPruning surpasses similarly sized models in both pruning and continued pretraining over perplexity, downstream tasks, and instruction tuning. Further analysis demonstrates the robustness of DRPruning towards various domains and distribution shifts. Furthermore, DRPruning can determine optimal reference losses and data ratios automatically, suggesting potential for broader applications.
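To make the idea of dynamically adjusting the data distribution concrete, here is a minimal Python sketch of an exponentiated-gradient reweighting step in the style of group distributionally robust optimization. It is an assumption-laden illustration, not the DRPruning algorithm: the clipped excess-loss signal, the `step_size` parameter, and the function name are all hypothetical choices.

```python
import math
from typing import List


def update_domain_weights(
    weights: List[float],
    losses: List[float],
    reference_losses: List[float],
    step_size: float,
) -> List[float]:
    """Upweight domains whose loss exceeds its reference, then renormalize."""
    # Excess loss per domain, clipped at zero (assumed signal).
    excess = [max(l - r, 0.0) for l, r in zip(losses, reference_losses)]
    # Multiplicative (exponentiated-gradient) update.
    scaled = [w * math.exp(step_size * e) for w, e in zip(weights, excess)]
    total = sum(scaled)
    return [s / total for s in scaled]


weights = [0.5, 0.5]             # current sampling ratios for two domains
losses = [2.0, 3.0]              # current per-domain losses (toy values)
reference_losses = [2.0, 2.0]    # hypothetical per-domain reference losses

new_weights = update_domain_weights(weights, losses, reference_losses, step_size=1.0)
print(new_weights)
```

The second domain lags its reference loss, so it receives a larger share of the next batch while the ratios still sum to one.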

@inproceedings{hexuan_acl25, title={DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization}, author = {Hexuan Deng and Wenxiang Jiao and Xuebo Liu and Jing Li and Min Zhang and Zhaopeng Tu}, booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Reflection on Knowledge Graph for Large Language Models Reasoning
Yigeng Zhou, Wu Li, Yifan Lu, Jing Li, Fangming Liu, Meishan Zhang, Yequan Wang, Daojing He, Honghai Liu, Min Zhang
ACL-25- Findings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

@inproceedings{yigeng_acl25, title={Reflection on Knowledge Graph for Large Language Models Reasoning}, author = {Yigeng Zhou and Wu Li and Yifan Lu and Jing Li and Fangming Liu and Meishan Zhang and Yequan Wang and Daojing He and Honghai Liu and Min Zhang}, booktitle = {Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing
Yifan Lu, Jing Li, Yigeng Zhou, Yihui Zhang, Wenya Wang, Xiucheng Li, Meishan Zhang, Fangming Liu, Jun Yu, Min Zhang
ACL-25- Findings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Large language models (LLMs) exhibit impressive language capabilities but remain vulnerable to malicious prompts and jailbreaking attacks. Existing knowledge editing methods for LLM detoxification face two major challenges. First, they often rely on entity-specific localization, making them ineffective against adversarial inputs without explicit entities. Second, these methods suffer from over-editing, where detoxified models reject legitimate queries, compromising overall performance. In this paper, we propose ToxEdit, a toxicity-aware knowledge editing approach that dynamically detects toxic activation patterns during forward propagation. It then routes computations through adaptive inter-layer pathways to mitigate toxicity effectively. This design ensures precise toxicity mitigation while preserving LLMs' general capabilities. To more accurately assess over-editing, we also enhance the SafeEdit benchmark by incorporating instruction-following evaluation tasks. Experimental results on multiple LLMs demonstrate that our ToxEdit outperforms previous state-of-the-art methods in both detoxification performance and safeguarding general capabilities of LLMs.

@inproceedings{yifan_acl25, title={Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing}, author = {Yifan Lu and Jing Li and Yigeng Zhou and Yihui Zhang and Wenya Wang and Xiucheng Li and Meishan Zhang and Fangming Liu and Jun Yu and Min Zhang}, booktitle = {Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

LLMs Can Also Do Well! Breaking Barriers in Semantic Role Labeling via Large Language Models
Xinxin Li, Huiyao Chen, Chengjun Liu, Jing Li, Meishan Zhang, Jun Yu, Min Zhang
ACL-25- Findings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Semantic role labeling (SRL) is a crucial task of natural language processing (NLP). Although generative decoder-based large language models (LLMs) have achieved remarkable success across various NLP tasks, they still lag behind state-of-the-art encoder-decoder (BERT-like) models in SRL. In this work, we seek to bridge this gap by equipping LLMs for SRL with two mechanisms: (a) retrieval-augmented generation and (b) self-correction. The first mechanism enables LLMs to leverage external linguistic knowledge such as predicate and argument structure descriptions, while the second allows LLMs to identify and correct inconsistent SRL outputs. We conduct extensive experiments on three widely-used benchmarks of SRL (CPB1.0, CoNLL-2009, and CoNLL-2012). Results demonstrate that our method achieves state-of-the-art performance in both Chinese and English, marking the first successful application of LLMs to surpass encoder-decoder approaches in SRL.

@inproceedings{xinxin_acl25, title={LLMs Can Also Do Well! Breaking Barriers in Semantic Role Labeling via Large Language Models}, author = {Xinxin Li and Huiyao Chen and Chengjun Liu and Jing Li and Meishan Zhang and Jun Yu and Min Zhang}, booktitle = {Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation
Kaiyuan Liu, Youcheng Pan, Yang Xiang, Daojing He, Jing Li, Yexing Du, Tianrun Gao
ACL-25- Findings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
PDF Code Abstract BibTex

Recently, LLM agents have made rapid progress in improving their programming capabilities. However, existing benchmarks cannot automatically evaluate generated code from users' perspective, and their results offer little explainability about LLM agents' code generation capabilities. Thus, we introduce ProjectEval, a new benchmark for the automated evaluation of LLM agents' project-level code generation by simulating user interaction. ProjectEval is constructed by an LLM with human review. It provides inputs at three different levels, as natural language or code skeletons. ProjectEval can evaluate the generated projects by user interaction simulation for execution, and by code similarity through existing objective indicators. Through ProjectEval, we find that systematic engineering of project code, overall understanding of the project, and comprehensive analysis capability are the keys for LLM agents to achieve practical projects. Our findings and benchmark provide valuable insights for developing more effective programming agents that can be deployed in future real-world production.

@inproceedings{kaiyuan_acl25, title={ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation}, author = {Kaiyuan Liu and Youcheng Pan and Yang Xiang and Daojing He and Jing Li and Yexing Du and Tianrun Gao}, booktitle = {Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2025} }

Knowledge Editing with Dynamic Knowledge Graphs for Multi-hop Question Answering
Yifan Lu, Yigeng Zhou, Jing Li, Yequan Wang, Xuebo Liu, Daojing He, Fangming Liu, Min Zhang
AAAI-25- The Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025.
PDF Code Abstract BibTex

Multi-hop question answering (MHQA) poses a significant challenge for large language models (LLMs) due to the extensive knowledge demands involved. Knowledge editing, which aims to precisely modify the LLMs to incorporate specific knowledge without negatively impacting other unrelated knowledge, offers a potential solution for addressing MHQA challenges with LLMs. However, current solutions struggle to effectively resolve issues of knowledge conflicts. Most parameter-preserving editing methods are hindered by inaccurate retrieval and overlook secondary editing issues, which can introduce noise into the reasoning process of LLMs. In this paper, we introduce KEDKG, a novel knowledge editing method that leverages a dynamic knowledge graph for MHQA, designed to ensure the reliability of answers. KEDKG involves two primary steps: dynamic knowledge graph construction and knowledge graph augmented generation. Initially, KEDKG autonomously constructs a dynamic knowledge graph to store revised information while resolving potential knowledge conflicts. Subsequently, it employs a fine-grained retrieval strategy coupled with an entity and relation detector to enhance the accuracy of graph retrieval for LLM generation. Experimental results on benchmarks show that KEDKG surpasses previous state-of-the-art models, delivering more accurate and reliable answers in environments with dynamic information.

@inproceedings{kedkg_aaai_25, title={Knowledge Editing with Dynamic Knowledge Graphs for Multi-hop Question Answering}, author = {Yifan Lu and Yigeng Zhou and Jing Li and Yequan Wang and Xuebo Liu and Daojing He and Fangming Liu and Min Zhang}, booktitle = {The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI)}, year={2025} }

Impromptu Cybercrime Euphemism Detection
Xiang Li, Yucheng Zhou, Laiping Zhao, Jing Li, Fangming Liu
COLING-25- The 31st International Conference on Computational Linguistics, 2025.
PDF Abstract BibTex

Detecting euphemisms is essential for content security on various social media platforms, but existing methods designed for detecting euphemisms are ineffective on impromptu euphemisms. In this work, we make a first attempt to explore impromptu euphemism detection and introduce the Impromptu Cybercrime Euphemisms Detection (ICED) dataset. Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training. Our detection framework mainly consists of a coarse-grained and a fine-grained classification model. The coarse-grained classification model removes most of the harmless content in the corpus to be detected. The fine-grained model, an impromptu euphemism detector, integrates context augmentation and multi-round iterative training to better predict the actual meaning of a masked token. In addition, we leverage ChatGPT to evaluate the model's capability. Experimental results demonstrate that our approach achieves a remarkable 76-fold improvement compared to the previous state-of-the-art euphemism detector.

@inproceedings{xianglicoling25, title={Impromptu Cybercrime Euphemism Detection}, author = {Xiang Li and Yucheng Zhou and Laiping Zhao and Jing Li and Fangming Liu}, booktitle = {The 31st International Conference on Computational Linguistics (COLING)}, year={2025} }

SMSMO: Learning to generate multimodal summary for scientific papers
Xinyi Zhong, Zusheng Tan, Shen Gao, Jing Li, Jiaxing Shen, Jingyu Ji, Jeff Tang, Billy Chiu
ELSEVIER KBS-25- Knowledge-Based Systems, Volume 310, 2025
PDF Abstract BibTex

Nowadays, publishers like Elsevier increasingly use graphical abstracts (i.e., a pictorial paper summary) along with textual abstracts to facilitate scientific paper reading. In such a case, automatically identifying a representative image and generating a suitable textual summary for individual papers can help editors and readers save time, making papers easier to read and understand. To tackle this case, we introduce the dataset for Scientific Multimodal Summarization with Multimodal Output (SMSMO). Unlike other multimodal tasks, which are performed on generic, medium-size contents (e.g., news), SMSMO needs to tackle longer multimodal contents in papers, with finer-grained multimodality interactions and semantic alignments between images and text. For this, we propose a cross-modality, multi-task learning summarizer (CMT-Sum). It captures the intra- and inter-modality interactions between images and text through a cross-fusion module, and models the finer-grained image–text semantic alignment by jointly generating the text summary, selecting the key image and matching the text and image. Extensive experiments conducted on two newly introduced datasets on the SMSMO task showcase our model’s effectiveness.

@article{zhong2024smsmo, title={SMSMO: Learning to generate multimodal summary for scientific papers}, author={Zhong, Xinyi and Tan, Zusheng and Gao, Shen and Li, Jing and Shen, Jiaxing and Ji, Jingyu and Tang, Jeff and Chiu, Billy}, journal={Knowledge-Based Systems}, volume={310}, pages={112908}, year={2025}, publisher={Elsevier} }

2024

Parameter Competition Balancing for Model Merging
Guodong Du, Junlin Lee, Jing Li, Runhua Jiang, Yifei Guo, Shuyang Yu, Hanting Liu, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Min Zhang
NeurIPS-24- The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
PDF Code Abstract BibTex

While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, posing a challenge in effectively balancing parameter competition across various tasks. This paper introduces an innovative technique named PCB-Merging (Parameter Competition Balancing), a lightweight and training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results reveal that our approach achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models, outperforming existing model merging methods.
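The final "drop and rescale" step mentioned in this abstract can be illustrated with a small Python sketch. It assumes that per-parameter importance scores are already available for each task; how PCB-Merging actually derives them through intra- and inter-balancing is not reproduced here, and the `threshold` parameter, the score-weighted average, and the function name are hypothetical.

```python
from typing import List


def merge_with_scores(
    task_vectors: List[List[float]],
    scores: List[List[float]],
    threshold: float,
) -> List[float]:
    """Drop low-score entries, then combine the rest with score-normalized weights."""
    merged = []
    for j in range(len(task_vectors[0])):
        # Keep only (score, value) pairs whose score reaches the threshold.
        kept = [(s[j], t[j]) for s, t in zip(scores, task_vectors) if s[j] >= threshold]
        total = sum(s for s, _ in kept)
        # Rescale by the sum of surviving scores; fall back to 0 if all were dropped.
        merged.append(sum(s * t for s, t in kept) / total if total > 0 else 0.0)
    return merged


# Two toy task vectors (fine-tuned minus pre-trained weights) with three parameters.
task_vectors = [[1.0, 2.0, 0.0], [3.0, -2.0, 4.0]]
scores = [[0.5, 0.25, 0.125], [0.5, 0.75, 0.125]]   # hypothetical importance scores

merged = merge_with_scores(task_vectors, scores, threshold=0.25)
print(merged)
```

In this toy case the third parameter is dropped for both tasks, so the merged model keeps the pre-trained value there, while the first two are score-weighted blends of the two task vectors.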

@inproceedings{guodong24neurips, title={Parameter Competition Balancing for Model Merging}, author = {Guodong Du and Junlin Lee and Jing Li and Runhua Jiang and Yifei Guo and Shuyang Yu and Hanting Liu and Sim Kuan Goh and Ho-Kin Tang and Daojing He and Min Zhang}, booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS)}, year={2024} }

Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee, Yequan Wang, Jing Li and Min Zhang
ACL-24- The 62nd Annual Meeting of the Association for Computational Linguistics, 2024.
PDF Abstract BibTex

Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge within LLMs. Some approaches have sought to mitigate these issues by employing textual knowledge graphs, but their singular modality of knowledge limits comprehensive cross-modal understanding. In this paper, we propose the Multimodal Reasoning with Multimodal Knowledge Graph (MR-MKG) method, which leverages multimodal knowledge graphs (MMKGs) to learn rich and semantic knowledge across modalities, significantly enhancing the multimodal reasoning capabilities of LLMs. In particular, a relation graph attention network is utilized for encoding MMKGs and a cross-modal alignment module is designed for optimizing image-text alignment. An MMKG-grounded dataset is constructed to equip LLMs with initial expertise in multimodal reasoning through pretraining. Remarkably, MR-MKG achieves superior performance while training on only a small fraction of parameters, approximately 2.25% of the LLM's parameter size. Experimental results on multimodal question answering and multimodal analogy reasoning tasks demonstrate that our MR-MKG method outperforms previous state-of-the-art models.

@inproceedings{junlin24acl, title={Multimodal Reasoning with Multimodal Knowledge Graph}, author = {Junlin Lee and Yequan Wang and Jing Li and Min Zhang}, booktitle = {The 62nd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2024} }

Knowledge Fusion By Evolving Weights of Language Models
Guodong Du, Jing Li, Hanting Liu, Runhua Jiang, Shuyang Yu, Yifei Guo, Sim Kuan Goh and Ho-Kin Tang
ACL-24- Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024.
PDF Code Abstract BibTex

Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at https://github.com/duguodong7/model-evolution.

@inproceedings{guodong24acl, title={Knowledge Fusion By Evolving Weights of Language Models}, author = {Guodong Du and Jing Li and Hanting Liu and Runhua Jiang and Shuyang Yu and Yifei Guo and Sim Kuan Goh and Ho-Kin Tang}, booktitle = {Findings of The 62nd Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2024} }

Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao, Zheng Zhang, Jing Li and Yequan Wang
ICLR-24- The Twelfth International Conference on Learning Representations, 2024.
PDF Code Abstract BibTex

Acceleration of large language model pre-training is a critical issue in present NLP research. In this paper, we focus on speeding up pre-training by progressively growing from a small Transformer structure to a large one. There are two main research problems related to progressive growth: growth schedule and growth operator. For growth schedule, existing work has explored multi-stage expansion of depth and feedforward layers. However, the impact of each dimension on the schedule's efficiency is still an open question. For growth operator, existing work relies on the initialization of new weights to inherit knowledge, and achieves only non-strict function preservation, limiting further optimization of training dynamics. To address these issues, we propose Masked Structural Growth (MSG), including growth schedules involving all possible dimensions and strictly function-preserving growth operators that are independent of the initialization of new weights. Experiments show that MSG is significantly faster than related work: we achieve a speed-up of 80% for Bert-base and 120% for Bert-large pre-training. Moreover, MSG is able to improve fine-tuning performances at the same time.

@inproceedings{yao24iclr, title={Masked Structural Growth for 2x Faster Language Model Pre-training}, author = {Yiqun Yao and Zheng Zhang and Jing Li and Yequan Wang}, booktitle = {The Twelfth International Conference on Learning Representations (ICLR)}, year={2024} }

Few-Shot Relation Extraction With Dual Graph Neural Network Interaction
Jing Li, Shanshan Feng and Billy Chiu
IEEE TNNLS-24- IEEE Transactions on Neural Networks and Learning Systems, 35(10): 14396-14408, 2024.
PDF Abstract BibTex

Recent advances in relation extraction with deep neural architectures have achieved excellent performance. However, current models still suffer from two main drawbacks: 1) they require enormous volumes of training data to avoid model overfitting and 2) there is a sharp decrease in performance when the data distribution shifts from one domain to another between training and testing. It is thus vital to reduce the data requirement in training and explicitly model the distribution difference when transferring knowledge from one domain to another. In this work, we concentrate on few-shot relation extraction under domain adaptation settings. Specifically, we propose a novel graph neural network (GNN) based approach for few-shot relation extraction. The approach leverages an edge-labeling dual graph (i.e., an instance graph and a distribution graph) to explicitly model the intraclass similarity and interclass dissimilarity in each individual graph, as well as the instance-level and distribution-level relations across graphs. A dual graph interaction mechanism is proposed to adequately fuse the information between the two graphs in a cyclic flow manner. We extensively evaluate the approach on the FewRel1.0 and FewRel2.0 benchmarks under four few-shot configurations. The experimental results demonstrate that it can match or outperform previously published approaches. We also perform experiments to further investigate the parameter settings and architectural choices, and we offer a qualitative analysis.

@article{jing24dualgraph, title={Few-Shot Relation Extraction With Dual Graph Neural Network Interaction}, author={Li, Jing and Feng, Shanshan and Chiu, Billy}, journal={IEEE Transactions on Neural Networks and Learning Systems (TNNLS)}, volume = {35}, number = {10}, pages = {14396--14408}, year = {2024}, publisher={IEEE} }

2023 and Before

Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction
Xilai Ma, Jing Li and Min Zhang
EMNLP-23- Findings of The 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
PDF Abstract BibTex

Few-shot relation extraction involves identifying the type of relationship between two specific entities within a text, using a limited number of annotated samples. A variety of solutions to this problem have emerged by applying meta-learning and neural graph techniques which typically necessitate a training process for adaptation. Recently, the strategy of in-context learning has been demonstrating notable results without the need for training. A few studies have already utilized in-context learning for zero-shot information extraction. Unfortunately, the evidence for inference is either not considered or implicitly modeled during the construction of chain-of-thought prompts. In this paper, we propose a novel approach for few-shot relation extraction using large language models, named CoT-ER, chain-of-thought with explicit evidence reasoning. In particular, CoT-ER first induces large language models to generate evidence using task-specific and concept-level knowledge. Then this evidence is explicitly incorporated into chain-of-thought prompting for relation extraction. Experimental results demonstrate that our CoT-ER approach (with 0% training data) achieves competitive performance compared to the fully-supervised (with 100% training data) state-of-the-art approach on the FewRel1.0 and FewRel2.0 datasets.

@inproceedings{DBLP:conf/emnlp/MaLZ23a, author = {Xilai Ma and Jing Li and Min Zhang}, title = {Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction}, booktitle = {Findings of the Association for Computational Linguistics (EMNLP)}, pages = {2334--2352}, year = {2023}, url = {https://aclanthology.org/2023.findings-emnlp.153}, }

Rethinking Document-Level Relation Extraction: A Reality Check
Jing Li, Yequan Wang, Shuai Zhang and Min Zhang
ACL-23- Findings of The 61st Annual Meeting of the Association for Computational Linguistics, 2023.
PDF Abstract BibTex

Recently, numerous efforts have continued to push up performance boundaries of document-level relation extraction (DocRE) and have claimed significant progress in DocRE. In this paper, we do not aim at proposing a novel model for DocRE. Instead, we take a closer look at the field to see if these performance gains are actually true. Through a comprehensive literature review and a thorough examination of popular DocRE datasets, we find that these performance gains rest on a common assumption that is strong or even untenable: all named entities are perfectly localized, normalized, and typed in advance. Next, we construct four types of entity mention attacks to examine the robustness of typical DocRE models by behavioral probing. We also closely check model usability in a more realistic setting. Our findings reveal that most current DocRE models are vulnerable to entity mention attacks and difficult to deploy in real-world end-user NLP applications. Our study calls on future research to stop simplifying problem setups, and to model DocRE in the wild rather than in an unrealistic Utopian world.

@inproceedings{li2023rethinking, title={Rethinking Document-Level Relation Extraction: A Reality Check}, author={Li, Jing and Wang, Yequan and Zhang, Shuai and Zhang, Min}, pages= {5715--5730}, booktitle = {Findings of The 61st Annual Meeting of the Association for Computational Linguistics (ACL)}, year={2023} }

Few-Shot Named Entity Recognition via Meta-Learning (Extended Abstract)
Jing Li, Billy Chiu, Shanshan Feng and Hao Wang
ICDE-23- The 39th IEEE International Conference on Data Engineering, 2023.
PDF Abstract BibTex

A Survey on Deep Learning for Named Entity Recognition (Extended Abstract)
Jing Li, Aixin Sun, Jianglei Han and Chenliang Li
ICDE-23- The 39th IEEE International Conference on Data Engineering, 2023.
PDF Abstract BibTex

GRLSTM: Trajectory Similarity Computation with Graph-based Residual LSTM
Silin Zhou, Jing Li, Hao Wang, Shuo Shang, Peng Han
AAAI-23- The Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023.
PDF Abstract BibTex

The computation of trajectory similarity is a crucial task in many spatial data analysis applications. However, existing methods have been designed primarily for trajectories in Euclidean space, which overlooks the fact that real-world trajectories are often generated on road networks. This paper addresses this gap by proposing a novel framework, called GRLSTM (Graph-based Residual LSTM). To jointly capture the properties of trajectories and road networks, the proposed framework incorporates knowledge graph embedding (KGE), graph neural network (GNN), and the residual network into the multi-layer LSTM (Residual-LSTM). Specifically, the framework constructs a point knowledge graph to study the multi-relation of points, as points may belong to both the trajectory and the road network. KGE is introduced to learn point embeddings and relation embeddings to build the point fusion graph, while GNN is used to capture the topology structure information of the point fusion graph. Finally, Residual-LSTM is used to learn the trajectory embeddings. To further enhance the accuracy and robustness of the final trajectory embeddings, we introduce two new neighbor-based point loss functions, namely, graph-based point loss function and trajectory-based point loss function. The GRLSTM is evaluated using two real-world trajectory datasets, and the experimental results demonstrate that GRLSTM outperforms all the state-of-the-art methods significantly.

@inproceedings{DBLP:conf/aaai/Zhou0WS023, author = {Silin Zhou and Jing Li and Hao Wang and Shuo Shang and Peng Han}, editor = {Brian Williams and Yiling Chen and Jennifer Neville}, title = {{GRLSTM:} Trajectory Similarity Computation with Graph-Based Residual {LSTM}}, booktitle = {Thirty-Seventh {AAAI} Conference on Artificial Intelligence, {AAAI} 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, {IAAI} 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence}, pages = {4972--4980}, year = {2023}, }

Sequence Labeling with Meta-Learning
Jing Li, Peng Han, Xiangnan Ren, Jilin Hu, Lisi Chen and Shuo Shang
IEEE TKDE-23- IEEE Transactions on Knowledge and Data Engineering, 35(3): 3072-3086, 2023.
PDF Abstract BibTex

Recent neural architectures in sequence labeling have yielded state-of-the-art performance on single domain data such as newswires. However, they still suffer from (i) requiring massive amounts of training data to avoid overfitting; (ii) huge performance degradation when there is a domain shift in the data distribution between training and testing. In this paper, we investigate the problem of domain adaptation for sequence labeling under homogeneous and heterogeneous settings. We propose MetaSeq, a novel meta-learning approach for domain adaptation in sequence labeling. Specifically, MetaSeq incorporates meta-learning and adversarial training strategies to encourage robust, general and transferable representations for sequence labeling. The key advantage of MetaSeq is that it is capable of adapting to new unseen domains with a small amount of annotated data from those domains. We extensively evaluate MetaSeq on named entity recognition, part-of-speech tagging and slot filling tasks under homogeneous and heterogeneous settings. The experimental results show that MetaSeq achieves state-of-the-art performance against eight baselines. Impressively, MetaSeq surpasses the in-domain performance using only 16.17% and 7% of target domain data on average for homogeneous settings, and 34.76%, 24%, 22.5% of target domain data on average for heterogeneous settings.

@article{jing23seq, author = {Jing Li and Peng Han and Xiangnan Ren and Jilin Hu and Lisi Chen and Shuo Shang}, title = {Sequence Labeling with Meta-Learning}, journal = {IEEE Transactions on Knowledge and Data Engineering (TKDE)}, volume = {35}, number = {3}, pages = {3072--3086}, year = {2023}, url = {https://doi.org/10.1109/TKDE.2021.3118469}, doi = {10.1109/TKDE.2021.3118469}, }

A Dual-Channel Framework for Sarcasm Recognition by Detecting Sentiment Conflict
Yiyi Liu, Yequan Wang, Aixin Sun, Xuying Meng, Jing Li, Jiafeng Guo
NAACL-22- Findings of 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
PDF Abstract BibTex

Sarcasm employs ambivalence, where one says something positive but actually means something negative, and vice versa. The essence of sarcasm, which is also a sufficient and necessary condition, is the conflict between literal and implied sentiments expressed in one sentence. However, it is difficult to recognize such sentiment conflict because the sentiments are mixed or even implicit. As a result, the recognition of sophisticated and obscure sentiment poses a great challenge to sarcasm detection. In this paper, we propose a Dual-Channel Framework by modeling both literal and implied sentiments separately. Based on this dual-channel framework, we design the Dual-Channel Network (DC-Net) to recognize sentiment conflict. Experiments on political debates (i.e., IAC-V1 and IAC-V2) and Twitter datasets show that our proposed DC-Net achieves state-of-the-art performance on sarcasm recognition. Our code is released to support research: https://github.com/yiyi-ict/dual-channel-for-sarcasm.

@inproceedings{DBLP:conf/naacl/LiuWSMLG22, author = {Yiyi Liu and Yequan Wang and Aixin Sun and Xuying Meng and Jing Li and Jiafeng Guo}, editor = {Marine Carpuat and Marie{-}Catherine de Marneffe and Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z}, title = {A Dual-Channel Framework for Sarcasm Recognition by Detecting Sentiment Conflict}, booktitle = {Findings of the Association for Computational Linguistics: {NAACL} 2022, Seattle, WA, United States, July 10-15, 2022}, pages = {1670--1680}, publisher = {Association for Computational Linguistics}, year = {2022}, url = {https://doi.org/10.18653/v1/2022.findings-naacl.126}, doi = {10.18653/v1/2022.findings-naacl.126}, timestamp = {Tue, 31 Jan 2023 17:06:57 +0100}, biburl = {https://dblp.org/rec/conf/naacl/LiuWSMLG22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

Interactive Information Extraction by Semantic Information Graph
Siqi Fan, Yequan Wang, Jing Li, Zheng Zhang, Shuo Shang, Peng Han
IJCAI-ECAI-22- The 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence, 2022. Acceptance rate: 15%.
PDF Abstract BibTex

Information extraction (IE) mainly focuses on three highly correlated subtasks, i.e., entity extraction, relation extraction and event extraction. Recently, there have been studies using Abstract Meaning Representation (AMR) to utilize the intrinsic correlations among these three subtasks. AMR based models are capable of building the relationship of arguments. However, they struggle to deal with relations. In addition, the noise in AMR (i.e., tags unrelated to IE tasks, nodes with unconcerned conception, and edge types with complicated hierarchical structures) disturbs the decoding process of IE. As a result, the decoding process, limited by the AMR, cannot work effectively. To overcome these shortcomings, we propose an Interactive Information Extraction (InterIE) model based on a novel Semantic Information Graph (SIG). SIG can guide our InterIE model to tackle the three subtasks jointly. Furthermore, the well-designed SIG without noise is capable of enriching entity and event trigger representation, and capturing the edge connection between the information types. Experimental results show that our InterIE achieves state-of-the-art performance on all IE subtasks on the benchmark datasets (i.e., ACE05-E+ and ACE05-E). More importantly, the proposed model is not sensitive to the decoding order, which goes beyond the limitations of AMR based methods.

@inproceedings{DBLP:conf/ijcai/FanWLZSH22, author = {Siqi Fan and Yequan Wang and Jing Li and Zheng Zhang and Shuo Shang and Peng Han}, editor = {Luc De Raedt}, title = {Interactive Information Extraction by Semantic Information Graph}, booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI} 2022, Vienna, Austria, 23-29 July 2022}, pages = {4100--4106}, publisher = {ijcai.org}, year = {2022}, url = {https://doi.org/10.24963/ijcai.2022/569}, doi = {10.24963/ijcai.2022/569}, timestamp = {Wed, 27 Jul 2022 16:43:00 +0200}, biburl = {https://dblp.org/rec/conf/ijcai/FanWLZSH22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

FOGS: First-Order Gradient Supervision with Learning-based Graph for Traffic Flow Forecasting
Xuan Rao, Hao Wang, Liang Zhang, Jing Li, Shuo Shang, Peng Han
IJCAI-ECAI-22- The 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence, 2022. Acceptance rate: 15%.
PDF Abstract BibTex

Traffic flow forecasting plays a vital role in the transportation domain. Existing studies usually manually construct correlation graphs and design sophisticated models for learning spatial and temporal features to predict future traffic states. However, manually constructed correlation graphs cannot accurately extract the complex patterns hidden in the traffic data. In addition, it is challenging for the prediction model to fit traffic data due to its irregularly-shaped distribution. To solve the above-mentioned problems, in this paper, we propose a novel learning-based method to learn a spatial-temporal correlation graph, which could make good use of the traffic flow data. Moreover, we propose First-Order Gradient Supervision (FOGS), a novel method for traffic flow forecasting. FOGS utilizes first-order gradients, rather than specific flows, to train the prediction model, which effectively avoids the problem of fitting irregularly-shaped distributions. Comprehensive numerical evaluations on four real-world datasets reveal that the proposed methods achieve state-of-the-art performance and significantly outperform the benchmarks.

@inproceedings{DBLP:conf/ijcai/RaoWZLS022, author = {Xuan Rao and Hao Wang and Liang Zhang and Jing Li and Shuo Shang and Peng Han}, editor = {Luc De Raedt}, title = {{FOGS:} First-Order Gradient Supervision with Learning-based Graph for Traffic Flow Forecasting}, booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI} 2022, Vienna, Austria, 23-29 July 2022}, pages = {3926--3932}, publisher = {ijcai.org}, year = {2022}, url = {https://doi.org/10.24963/ijcai.2022/545}, doi = {10.24963/ijcai.2022/545}, timestamp = {Sun, 02 Oct 2022 16:08:04 +0200}, biburl = {https://dblp.org/rec/conf/ijcai/RaoWZLS022.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

A Survey on Deep Learning for Named Entity Recognition
Jing Li, Aixin Sun, Jianglei Han and Chenliang Li
IEEE TKDE-22- IEEE Transactions on Knowledge and Data Engineering, 34(1): 50-70, 2022.
PDF Abstract BibTex

Named entity recognition (NER) is the task to identify text spans that mention named entities, and to classify them into predefined categories such as person, location, organization etc. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review on existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.

@article{jing22nersurvey, author = {Jing Li and Aixin Sun and Jianglei Han and Chenliang Li}, title = {A Survey on Deep Learning for Named Entity Recognition}, journal = {IEEE Transactions on Knowledge and Data Engineering (TKDE)}, volume = {34}, number = {1}, pages = {50--70}, year = {2022}, url = {https://doi.org/10.1109/TKDE.2020.2981314}, doi = {10.1109/TKDE.2020.2981314}, }

Neural Text Segmentation and Its Application to Sentiment Analysis
Jing Li, Billy Chiu, Shuo Shang and Ling Shao
IEEE TKDE-22- IEEE Transactions on Knowledge and Data Engineering, 34(2): 828-842, 2022.
PDF Abstract BibTex Demo

Text segmentation is a fundamental task in natural language processing. Depending on the levels of granularity, the task can be defined as segmenting a document into topical segments, or segmenting a sentence into elementary discourse units (EDUs). Traditional solutions to the two tasks heavily rely on carefully designed features. The recently proposed neural models do not need manual feature engineering, but they either suffer from sparse boundary tags or cannot efficiently handle the issue of variable size output vocabulary. In light of such limitations, we propose a generic end-to-end segmentation model, namely SEGBOT, which first uses a bidirectional recurrent neural network to encode an input text sequence. SEGBOT then uses another recurrent neural network, together with a pointer network, to select text boundaries in the input sequence. In this way, SEGBOT does not require any hand-crafted features. More importantly, SEGBOT inherently handles the issue of variable size output vocabulary and the issue of sparse boundary tags. In our experiments, SEGBOT outperforms state-of-the-art models on two tasks: document-level topic segmentation and sentence-level EDU segmentation. As a downstream application, we further propose a hierarchical attention model for sentence-level sentiment analysis based on the outcomes of SEGBOT. The hierarchical model can make full use of both word-level and EDU-level information simultaneously for sentence-level sentiment analysis. In particular, it can effectively exploit EDU-level information, such as the inner properties of EDUs, which cannot be fully encoded in word-level features. Experimental results show that our hierarchical model achieves new state-of-the-art results on the Movie Review and Stanford Sentiment Treebank benchmarks.

@article{li22segsenti, author = {Jing Li and Billy Chiu and Shuo Shang and Ling Shao}, title = {Neural Text Segmentation and Its Application to Sentiment Analysis}, journal = {IEEE Transactions on Knowledge and Data Engineering (TKDE)}, volume = {34}, number = {2}, pages = {828--842}, year = {2022}, url = {https://doi.org/10.1109/TKDE.2020.2983360}, doi = {10.1109/TKDE.2020.2983360}, }

Few-Shot Named Entity Recognition via Meta-Learning
Jing Li, Billy Chiu, Shanshan Feng and Hao Wang
IEEE TKDE-22- IEEE Transactions on Knowledge and Data Engineering, 34(9): 4245-4256, 2022.
PDF Abstract BibTex

Few-shot learning under the N-way K-shot setting (i.e., K annotated samples for each of N classes) has been widely studied in relation extraction (e.g., FewRel) and image classification (e.g., Mini-ImageNet). Named entity recognition (NER) is typically framed as a sequence labeling problem where the entity classes are inherently entangled together because the entity number and classes in a sentence are not known in advance, leaving the N-way K-shot NER problem so far unexplored. In this paper, we first formally define a more suitable N-way K-shot setting for NER. Then we propose FewNER, a novel meta-learning approach for few-shot NER. FewNER separates the entire network into a task-independent part and a task-specific part. During training in FewNER, the task-independent part is meta-learned across multiple tasks and a task-specific part is learned for each single task in a low-dimensional space. At test time, FewNER keeps the task-independent part fixed and adapts to a new task via gradient descent by updating only the task-specific part, resulting in it being less prone to overfitting and more computationally efficient. The results demonstrate that FewNER achieves state-of-the-art performance against nine baseline methods by significant margins on three adaptation experiments.

@article{li20fewshot, author = {Jing Li and Billy Chiu and Shanshan Feng and Hao Wang}, title = {Few-Shot Named Entity Recognition via Meta-Learning}, journal = {IEEE Transactions on Knowledge and Data Engineering (TKDE)}, volume = {34}, number = {9}, pages = {4245--4256}, year = {2022}, url = {https://doi.org/10.1109/TKDE.2020.3038670}, doi = {10.1109/TKDE.2020.3038670}, }

Neural Named Entity Boundary Detection
Jing Li, Aixin Sun and Yukun Ma
IEEE TKDE-21- IEEE Transactions on Knowledge and Data Engineering, 33(4): 1790-1795, 2021.
PDF Abstract BibTex Demo

In this paper, we focus on named entity boundary detection, which is to detect the start and end boundaries of an entity mention in text, without predicting its type. The detected entities are input to entity linking or fine-grained typing systems for semantic enrichment. We propose BdryBot, a recurrent neural network encoder-decoder framework with a pointer network to detect entity boundaries from a given sentence. The encoder considers both character-level representations and word-level embeddings to represent the input words. In this way, BdryBot does not require any hand-crafted features. Because of the pointer network, BdryBot overcomes the problem of variable size output vocabulary and the issue of sparse boundary tags. We conduct two sets of experiments, in-domain detection and cross-domain detection, on six datasets. Our results show that BdryBot achieves state-of-the-art performance against five baselines. In addition, our proposed approach can be further enhanced when incorporating contextualized language embeddings into token representations.

@article{li21bdrybot, author = {Jing Li and Aixin Sun and Yukun Ma}, title = {Neural Named Entity Boundary Detection}, journal = {IEEE Transactions on Knowledge and Data Engineering (TKDE)}, volume = {33}, number = {4}, pages = {1790--1795}, year = {2021}, url = {https://doi.org/10.1109/TKDE.2020.2981329}, doi = {10.1109/TKDE.2020.2981329}, }

Domain Generalization for Named Entity Boundary Detection via Meta-Learning
Jing Li, Shuo Shang and Lisi Chen
IEEE TNNLS-21- IEEE Transactions on Neural Networks and Learning Systems, 32(9): 3819-3830, 2021.
PDF Abstract BibTex

Named entity recognition (NER) aims to recognize mentions of rigid designators from text belonging to predefined semantic types, such as person, location, and organization. In this article, we focus on a fundamental subtask of NER, named entity boundary detection, which aims at detecting the start and end boundaries of an entity mention in the text, without predicting its semantic type. The entity boundary detection is essentially a sequence labeling problem. Existing sequence labeling methods either suffer from sparse boundary tags (i.e., entities are rare and nonentities are common) or cannot handle well the issue of variable size output vocabulary (i.e., need to retrain models with respect to different vocabularies). To address these two issues, we propose a novel entity boundary labeling model that leverages pointer networks to effectively infer boundaries depending on the input sequence. On the other hand, training models on source domains that generalize to new target domains at test time is a challenging problem because of the performance degradation. To alleviate this issue, we propose METABDRY, a novel domain generalization approach for entity boundary detection without requiring any access to target domain information. In particular, adversarial learning is adopted to encourage domain-invariant representations. Meanwhile, metalearning is used to explicitly simulate a domain shift during training so that metaknowledge from multiple resource domains can be effectively aggregated. As such, METABDRY explicitly optimizes the capability of "learning to generalize," resulting in a more general and robust model to reduce the domain discrepancy. We first conduct experiments to demonstrate the effectiveness of our novel boundary labeling model. We then extensively evaluate METABDRY on eight data sets under domain generalization settings. The experimental results show that METABDRY achieves state-of-the-art results against seven recent baselines.

@article{li21domaingen, author = {Jing Li and Shuo Shang and Lisi Chen}, title = {Domain Generalization for Named Entity Boundary Detection via Metalearning}, journal = {IEEE Transactions on Neural Networks and Learning Systems (TNNLS)}, volume = {32}, number = {9}, pages = {3819--3830}, year = {2021}, url = {https://doi.org/10.1109/TNNLS.2020.3015912}, doi = {10.1109/TNNLS.2020.3015912}, }

Leveraging Official Content and Social Context to Recommend Software Documentation
Jing Li, Zhenchang Xing and Muhammad Ashad Kabir
IEEE TSC-21- IEEE Transactions on Services Computing, 14(2), 472-486, 2021.
PDF Abstract BibTex

For an unfamiliar Application Programming Interface (API), software developers often access the official documentation to learn its usage, and post questions related to this API on social question and answering (Q&A) sites to seek solutions. The official software documentation often captures the information about functionality and parameters, but lacks detailed descriptions in different usage scenarios. On the contrary, the discussions about APIs on social Q&A sites provide enriching usages. Moreover, existing code search engines and information retrieval systems cannot effectively return relevant software documentation when the issued query does not contain code snippets or API-like terms. In this paper, we present CnCxL2R, a software documentation recommendation strategy incorporating the content of official documentation and the social context on Q&A into a learning-to-rank schema. In the proposed strategy, the content, local context and global context of documentation are considered to select candidate documents. Then four types of features are extracted to learn a ranking model. We conduct a large-scale automatic evaluation on Java documentation recommendation. The results show that CnCxL2R achieves state-of-the-art performance over the eight baseline models. We also compare the CnCxL2R with Google search. The results show that CnCxL2R can recommend more relevant software documentation, and can effectively capture the semantic between the high-level intent in developers’ queries and the low-level implementation in software documentation.

@article{TSCLiXK21, author = {Jing Li and Zhenchang Xing and Muhammad Ashad Kabir}, title = {Leveraging Official Content and Social Context to Recommend Software Documentation}, journal = {{IEEE} Trans. Serv. Comput.}, volume = {14}, number = {2}, pages = {472--486}, year = {2021}, url = {https://doi.org/10.1109/TSC.2018.2812729}, doi = {10.1109/TSC.2018.2812729}, }

HME: A Hyperbolic Metric Embedding Approach for Next-POI Recommendation
Shanshan Feng, Lucas Vinh Tran, Gao Cong, Lisi Chen, Jing Li and Fan Li
SIGIR-20- The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020. Acceptance rate: 147/555 (26%).
PDF Abstract BibTex

With the increasing popularity of location-aware social media services, next-Point-of-Interest (POI) recommendation has gained significant research interest. The key challenge of next-POI recommendation is to precisely learn users' sequential movements from sparse check-in data. To this end, various embedding methods have been proposed to learn the representations of check-in data in the Euclidean space. However, their ability to learn complex patterns, especially hierarchical structures, is limited by the dimensionality of the Euclidean space. To this end, we propose a new research direction that aims to learn the representations of check-in activities in a hyperbolic space, which yields two advantages. First, it can effectively capture the underlying hierarchical structures, which are implied by the power-law distributions of user movements. Second, it provides high representative strength and enables the check-in data to be effectively represented in a low-dimensional space. Specifically, to solve the next-POI recommendation task, we propose a novel hyperbolic metric embedding (HME) model, which projects the check-in data into a hyperbolic space. The HME jointly captures sequential transition, user preference, category and region information in a unified approach by learning embeddings in a shared hyperbolic space. To the best of our knowledge, this is the first study to explore a non-Euclidean embedding model for next-POI recommendation. We conduct extensive experiments on three check-in datasets to demonstrate the superiority of our hyperbolic embedding approach over the state-of-the-art next-POI recommendation algorithms. Moreover, we conduct experiments on another four online transaction datasets for next-item recommendation to further demonstrate the generality of our proposed model.

@inproceedings{DBLP:conf/sigir/FengTCCLL20, author = {Shanshan Feng and Lucas Vinh Tran and Gao Cong and Lisi Chen and Jing Li and Fan Li}, title = {{HME:} {A} Hyperbolic Metric Embedding Approach for Next-POI Recommendation}, booktitle = {Proceedings of the 43rd International {ACM} {SIGIR} conference on research and development in Information Retrieval (SIGIR)}, pages = {1429--1438}, publisher = {{ACM}}, year = {2020}, url = {https://doi.org/10.1145/3397271.3401049}, doi = {10.1145/3397271.3401049}, }

Contextualized Point-of-Interest Recommendation
Peng Han, Zhongxiao Li, Yong Liu, Peilin Zhao, Jing Li, Hao Wang and Shuo Shang
IJCAI-PRICAI-20- The 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence, 2020. Acceptance rate: 592/4717 (12.6%).
PDF Abstract BibTex

Point-of-interest (POI) recommendation has become an increasingly important sub-field of recommendation system research. Previous methods employ various assumptions to exploit the contextual information for improving the recommendation accuracy. The common property among them is that similar users are more likely to visit similar POIs and similar POIs are likely to be visited by the same user. However, none of the existing methods utilize similarity explicitly to make recommendations. In this paper, we propose a new framework for POI recommendation, which explicitly utilizes similarity with contextual information. Specifically, we categorize the context information into two groups, i.e., global and local context, and develop different regularization terms to incorporate them for recommendation. A graph Laplacian regularization term is utilized to exploit the global context information. Moreover, we cluster users into different groups, and let the objective function constrain the users in the same group to have similar predicted POI ratings. An alternating optimization method is developed to optimize our model and get the final rating matrix. The results in our experiments show that our algorithm outperforms all the state-of-the-art methods.

@inproceedings{DBLP:conf/ijcai/HanLLZLWS20, author = {Peng Han and Zhongxiao Li and Yong Liu and Peilin Zhao and Jing Li and Hao Wang and Shuo Shang}, title = {Contextualized Point-of-Interest Recommendation}, booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI)}, pages = {2484--2490}, year = {2020}, url = {https://doi.org/10.24963/ijcai.2020/344}, doi = {10.24963/ijcai.2020/344}, }

MetaNER: Named Entity Recognition with Meta-Learning
Jing Li, Shuo Shang and Ling Shao
WWW-20- The Web Conference, 2020. Acceptance rate: 217/1129 (19.2%).
PDF Abstract BibTex

Recent neural architectures in named entity recognition (NER) have yielded state-of-the-art performance on single domain data such as newswires. However, they still suffer from (i) requiring massive amounts of training data to avoid overfitting; (ii) huge performance degradation when there is a domain shift in the data distribution between training and testing. In this paper, we investigate the problem of domain adaptation for NER under homogeneous and heterogeneous settings. We propose MetaNER, a novel meta-learning approach for domain adaptation in NER. Specifically, MetaNER incorporates meta-learning and adversarial training strategies to encourage robust, general and transferable representations for sequence labeling. The key advantage of MetaNER is that it is capable of adapting to new unseen domains with a small amount of annotated data from those domains. We extensively evaluate MetaNER on multiple datasets under homogeneous and heterogeneous settings. The experimental results show that MetaNER achieves state-of-the-art performance against eight baselines. Impressively, MetaNER surpasses the in-domain performance using only 16.17% and 34.76% of target domain data on average for homogeneous and heterogeneous settings, respectively.

@inproceedings{li20metaner, author = {Jing Li and Shuo Shang and Ling Shao}, title = {MetaNER: Named Entity Recognition with Meta-Learning}, booktitle = {The Web Conference 2020 (WWW)}, pages = {429--440}, year = {2020}, url = {https://doi.org/10.1145/3366423.3380127}, }

Pay Your Trip for Traffic Congestion: Dynamic Pricing in Traffic-Aware Road Networks
Lisi Chen, Shuo Shang, Bin Yao and Jing Li
AAAI-20- The Thirty-Fourth AAAI Conference on Artificial Intelligence. Acceptance rate: 1591/7737 (20.6%).
PDF Abstract BibTex

Pricing is essential in optimizing transportation resource allocation. Congestion pricing is widely used to reduce urban traffic congestion. We propose and investigate a novel Dynamic Pricing Strategy (DPS) to price travelers' trips in intelligent transportation platforms (e.g., DiDi, Lyft, Uber). The trips are charged according to their “congestion contributions” to global urban traffic systems. The dynamic pricing strategy retrieves a matching between n travelers' trips and the potential travel routes (each trip has k potential routes) to minimize the global traffic congestion. We believe that DPS holds the potential to benefit society and the environment, such as reducing traffic congestion and enabling smarter and greener transportation. The DPS problem is challenging due to its high computation complexity (there exist k^n matching possibilities). We develop an efficient and effective approximate matching algorithm based on local search, as well as pruning techniques to further enhance the matching efficiency. The accuracy and efficiency of the dynamic pricing strategy are verified by extensive experiments on real datasets.

@inproceedings{DBLP:conf/aaai/ChenSYL20, author = {Lisi Chen and Shuo Shang and Bin Yao and Jing Li}, title = {Pay Your Trip for Traffic Congestion: Dynamic Pricing in Traffic-Aware Road Networks}, booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence (AAAI)}, pages = {582--589}, year = {2020}, url = {https://aaai.org/ojs/index.php/AAAI/article/view/5397}, }

Adversarial Transfer for Named Entity Boundary Detection with Pointer Networks
Jing Li, Deheng Ye and Shuo Shang
IJCAI-19- The 28th International Joint Conference on Artificial Intelligence, Pages 5053-5059, 2019. Acceptance rate: 850/4752 (17.9%).
PDF Abstract BibTex

In this paper, we focus on named entity boundary detection, which aims to detect the start and end boundaries of an entity mention in text, without predicting its type. A more accurate and robust detection approach is desired to alleviate error propagation in downstream applications, such as entity linking and fine-grained typing systems. Here, we first develop a novel entity boundary labeling approach with pointer networks, where the output dictionary size depends on the input, which is variable. Furthermore, we propose AT-Bdry, which incorporates adversarial transfer learning into an end-to-end sequence labeling model to encourage domain-invariant representations. More importantly, AT-Bdry can reduce domain difference in data distributions between the source and target domains, via an unsupervised transfer learning approach (i.e., no annotated target-domain data is necessary). We conduct Formal Text to Formal Text, Formal Text to Informal Text and ablation evaluations on five benchmark datasets. Experimental results show that AT-Bdry achieves state-of-the-art transferring performance against recent baselines.

@inproceedings{li19advt, author = {Jing Li and Deheng Ye and Shuo Shang}, title = {Adversarial Transfer for Named Entity Boundary Detection with Pointer Networks}, booktitle = {Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI)}, pages = {5053--5059}, year = {2019}, url = {https://doi.org/10.24963/ijcai.2019/702}, }

Neural Discourse Segmentation
Jing Li
IJCAI-19- The 28th International Joint Conference on Artificial Intelligence, Pages 6539-6541, 2019. (Demo)
PDF Abstract BibTex

Identifying discourse structures and coherence relations in a piece of text is a fundamental task in natural language processing. The first step of this process is segmenting sentences into clause-like units called elementary discourse units (EDUs). Traditional solutions to discourse segmentation heavily rely on carefully designed features. In this demonstration, we present SEGBOT, a system to split a given piece of text into a sequence of EDUs by using an end-to-end neural segmentation model. Our model does not require hand-crafted features or external knowledge except word embeddings, yet it outperforms state-of-the-art solutions to discourse segmentation.

@inproceedings{li19segdemo, author = {Jing Li}, title = {Neural Discourse Segmentation}, booktitle = {Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI)}, pages = {6539--6541}, year = {2019}, url = {https://doi.org/10.24963/ijcai.2019/949}, }

LinkLive: Discovering web learning resources for developers from Q&A discussions
Jing Li, Zhenchang Xing and Aixin Sun
WWWJ-19- World Wide Web. 22(4), Pages 1699-1725, Springer, 2019.
PDF Abstract BibTex

Software developers need access to correlated information (e.g., API documentation, Wikipedia pages, Stack Overflow questions and answers) which is often dispersed among different Web resources. This paper is concerned with the situation where a developer is visiting a Web page, but at the same time is willing to explore correlated Web resources to extend his/her knowledge or to satisfy his/her curiosity. Specifically, we present an item-based collaborative filtering technique, named LinkLive, for automatically recommending a list of correlated Web resources for a particular Web page. The recommendation is done by exploiting hyperlink associations from the crowdsourced knowledge on Stack Overflow. We motivate our research using an exploratory study of hyperlink dissemination patterns on Stack Overflow. We then present our LinkLive technique that uses multiple features, including hyperlink co-occurrences in Q&A discussions, locations (e.g., question, answer, or comment) in which hyperlinks are referenced, and votes for posts/comments in which hyperlinks are referenced. Experiments using 7 years of Stack Overflow data show that our technique recommends correlated Web resources with promising accuracy in an open setting. A user study of 6 participants suggests that practitioners find the recommended Web resources useful for Web discovery.

@article{LiXS19, author = {Jing Li and Zhenchang Xing and Aixin Sun}, title = {LinkLive: discovering Web learning resources for developers from Q{\&}A discussions}, journal = {World Wide Web}, volume = {22}, number = {4}, pages = {1699--1725}, year = {2019}, url = {https://doi.org/10.1007/s11280-018-0621-y}, doi = {10.1007/s11280-018-0621-y}, }

DLocRL: A Deep Learning Pipeline for Fine-Grained Location Recognition and Linking in Tweets
Canwen Xu, Jing Li, Xiangyang Luo, Jiaxin Pei, Chenliang Li, Donghong Ji
WWW-19- The Web Conference, Pages 3391-3397, ACM, 2019. (Short)
PDF Abstract BibTex

In recent years, with the prevalence of social media and smart devices, people casually reveal their locations such as shops, hotels, and restaurants in their tweets. Recognizing and linking such fine-grained location mentions to well-defined location profiles are beneficial for retrieval and recommendation systems. In this paper, we propose DLocRL, a new deep learning pipeline for fine-grained location recognition and linking in tweets, and verify its effectiveness on a real-world Twitter dataset.

@inproceedings{DBLP:conf/www/XuLLPLJ19, author = {Canwen Xu and Jing Li and Xiangyang Luo and Jiaxin Pei and Chenliang Li and Donghong Ji}, title = {DLocRL: {A} Deep Learning Pipeline for Fine-Grained Location Recognition and Linking in Tweets}, booktitle = {The World Wide Web Conference (WWW)}, pages = {3391--3397}, year = {2019}, url = {https://doi.org/10.1145/3308558.3313491}, doi = {10.1145/3308558.3313491}, }

Spatial Keyword Search: A Survey
Lisi Chen, Shuo Shang, Chengcheng Yang and Jing Li
GeoInformatica-19- GeoInformatica. Springer, July 2019.
PDF Abstract BibTex

Spatial keyword search has been playing an indispensable role in personalized route recommendation and geo-textual information retrieval. In this light, we conduct a survey on existing studies of spatial keyword search. We categorize existing works of spatial keyword search based on the types of their input data, output results, and methodologies. For each category, we summarize their common features in terms of input data, output result, indexing scheme, and search algorithms. In addition, we provide detailed description regarding each study of spatial keyword search. This survey summarizes the findings of existing spatial keyword search studies, thus uncovering new insights that may guide software engineers as well as further research.

@article{DBLP:journals/geoinformatica/ChenSYL20, author = {Lisi Chen and Shuo Shang and Chengcheng Yang and Jing Li}, title = {Spatial keyword search: a survey}, journal = {GeoInformatica}, volume = {24}, number = {1}, pages = {85--106}, year = {2020}, url = {https://doi.org/10.1007/s10707-019-00373-y}, doi = {10.1007/s10707-019-00373-y}, }

Subtopic-Driven Multi-Document Summarization
Xin Zheng, Aixin Sun, Jing Li and Karthik Muthuswamy
EMNLP-IJCNLP-19- 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Pages 3144-3153, 2019. Acceptance rate: 684/2877 (23.8%).
PDF Abstract BibTex

In multi-document summarization, a set of documents to be summarized is assumed to be on the same topic, known as the underlying topic in this paper. That is, the underlying topic can be collectively represented by all the documents in the set. Meanwhile, different documents may cover various different subtopics and the same subtopic can be across several documents. Inspired by topic model, the underlying topic of a document set can also be viewed as a collection of different subtopics of different importance. In this paper, we propose a summarization model called STDS. The model generates the underlying topic representation from both document view and subtopic view in parallel. The learning objective is to minimize the distance between the representations learned from the two views. The contextual information is encoded through a hierarchical RNN architecture. Sentence salience is estimated in a hierarchical way with subtopic salience and relative sentence salience, by considering the contextual information. Top ranked sentences are then extracted as a summary. Note that the notion of subtopic enables us to bring in additional information (e.g. comments to news articles) that is helpful for document summarization. Experimental results show that the proposed solution outperforms state-of-the-art methods on benchmark datasets.

@inproceedings{DBLP:conf/emnlp/ZhengSLM19, author = {Xin Zheng and Aixin Sun and Jing Li and Karthik Muthuswamy}, title = {Subtopic-driven Multi-Document Summarization}, booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)}, pages = {3151--3160}, publisher = {Association for Computational Linguistics}, year = {2019}, url = {https://doi.org/10.18653/v1/D19-1311}, doi = {10.18653/v1/D19-1311}, }

To Do or Not To Do: Distill Crowdsourced Negative Caveats to Augment API Documentation
Jing Li, Aixin Sun and Zhenchang Xing
JASIST-18- Journal of the Association for Information Science and Technology. Volume 69, Issue 12, Pages 1460-1475, Wiley, 2018.
PDF Abstract BibTex

Negative caveats of application programming interfaces (APIs) are about “how not to use an API,” which are often absent from the official API documentation. When these caveats are overlooked, programming errors may emerge from misusing APIs, leading to heavy discussions on Q&A websites like Stack Overflow. If the overlooked caveats could be mined from these discussions, they would be beneficial for programmers to avoid misuse of APIs. However, it is challenging because the discussions are informal, redundant, and diverse. To this end, we propose Disca, a novel approach for automatically Distilling desirable API negative caveats from unstructured Q&A discussions. Through sentence selection and prominent term clustering, Disca ensures that distilled caveats are context‐independent, prominent, semantically diverse, and nonredundant. Quantitative evaluation in our experiments shows that the proposed Disca significantly outperforms four text‐summarization techniques. We also show that the distilled API negative caveats could greatly augment API documentation through qualitative analysis.

@article{LiSX18, author = {Jing Li and Aixin Sun and Zhenchang Xing}, title = {To Do or Not To Do: Distill crowdsourced negative caveats to augment api documentation}, journal = {J. Assoc. Inf. Sci. Technol.}, volume = {69}, number = {12}, pages = {1460--1475}, year = {2018}, url = {https://doi.org/10.1002/asi.24067}, doi = {10.1002/asi.24067}, }

SegBot: A Generic Neural Text Segmentation Model with Pointer Network
Jing Li, Aixin Sun and Shafiq Joty
IJCAI-18-The 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence. Pages 4166-4172, 2018. Acceptance rate: 710/3470 (20.5%).
PDF Abstract BibTex Demo

Text segmentation is a fundamental task in natural language processing that comes in two levels of granularity: (i) segmenting a document into a sequence of topical segments (topic segmentation), and (ii) segmenting a sentence into a sequence of elementary discourse units (EDU segmentation). Traditional solutions to the two tasks heavily rely on carefully designed features. The recently proposed neural models do not need manual feature engineering, but they either suffer from sparse boundary tags or they cannot well handle the issue of variable size output vocabulary. We propose a generic end-to-end segmentation model called SegBot. SegBot uses a bidirectional recurrent neural network to encode input text sequence. The model then uses another recurrent neural network together with a pointer network to select text boundaries in the input sequence. In this way, SegBot does not require hand-crafted features. More importantly, our model inherently handles the issue of variable size output vocabulary and the issue of sparse boundary tags. In our experiments, SegBot outperforms state-of-the-art models on both topic and EDU segmentation tasks.

@inproceedings{LiSJ18segbot, author = {Jing Li and Aixin Sun and Shafiq R. Joty}, title = {SegBot: {A} Generic Neural Text Segmentation Model with Pointer Network}, booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI)}, pages = {4166--4172}, year = {2018}, url = {https://doi.org/10.24963/ijcai.2018/579}, doi = {10.24963/ijcai.2018/579}, }

API Caveat Explorer: Surfacing Negative Usages from Practice
Jing Li, Aixin Sun, Zhenchang Xing and Lei Han
SIGIR-18-The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 1293-1296. ACM, 2018. (Demo)
PDF Abstract BibTex Demo

Application programming interface (API) documentation well describes an API and how to use it. However, official documentation does not describe "how not to use it" or the different kinds of errors when an API is used wrongly. Programming caveats are negative usages of an API. When these caveats are overlooked, errors may emerge, leading to heavy discussions on Q&A websites like Stack Overflow. In this demonstration, we present API Caveat Explorer, a search system to explore API caveats that are mined from large-scale unstructured discussions on Stack Overflow. API Caveat Explorer takes API-oriented queries such as "HashMap" and retrieves API caveats by text summarization techniques. API caveats are represented by sentences, which are context-independent, prominent, semantically diverse and non-redundant. The system provides a web-based interface that allows users to interactively explore the full picture of all discovered caveats of an API, and the details of each. The potential users of API Caveat Explorer are programmers and educators for learning and teaching APIs.

@inproceedings{LiSXH18, author = {Jing Li and Aixin Sun and Zhenchang Xing and Lei Han}, title = {{API} Caveat Explorer - Surfacing Negative Usages from Practice: An API-oriented Interactive Exploratory Search System for Programmers}, booktitle = {The 41st International {ACM} {SIGIR} Conference on Research {\&} Development in Information Retrieval}, pages = {1293--1296}, year = {2018}, url = {https://doi.org/10.1145/3209978.3210170}, doi = {10.1145/3209978.3210170}, }

Learning to Answer Programming Questions with Software Documentation through Social Context Embedding
Jing Li, Aixin Sun and Zhenchang Xing
INS-18- Information Sciences. Volumes 448–449, Pages 36-52, June 2018, Elsevier.
PDF Abstract BibTex

Official software documentation provides a comprehensive overview of software usages, but not on specific programming tasks or use cases. Often there is a mismatch between the documentation and a question on a specific programming task because of different wordings. We observe from Stack Overflow that the best answers to programmers’ questions often contain links to formal documentation. In this paper, we propose a novel deep-learning-to-answer framework, named QDLinker, for answering programming questions with software documentation. QDLinker learns from the large volume of discussions in community-based question answering site to bridge the semantic gap between programmers’ questions and software documentation. Specifically, QDLinker learns question-documentation semantic representation from these question answering discussions with a four-layer neural network, and incorporates semantic and content features into a learning-to-rank schema. Our approach does not require manual feature engineering or external resources to infer the degree of relevance between a question and documentation. Through extensive experiments, results show that QDLinker effectively answers programming questions with direct links to software documentation. QDLinker significantly outperforms the baselines based on traditional retrieval models and Web search services dedicated for software documentation retrieval. The user study shows that QDLinker effectively bridges the semantic gap between the intent of a programming question and the content of software documentation.

@article{L2ALiSX18, author = {Jing Li and Aixin Sun and Zhenchang Xing}, title = {Learning to answer programming questions with software documentation through social context embedding}, journal = {Information Sciences}, volume = {448-449}, pages = {36--52}, year = {2018}, url = {https://doi.org/10.1016/j.ins.2018.03.014}, doi = {10.1016/j.ins.2018.03.014}, }

HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
Xuejiao Zhao, Zhenchang Xing, Muhammad Ashad Kabir, Naoya Sawada, Jing Li and Shangwei Lin
SANER-17-The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering. Acceptance rate: 34/140 (24.3%).
PDF Abstract BibTex

A knowledge graph is useful for many different domains like search result ranking, recommendation, exploratory search, etc. It integrates structural information of concepts across multiple information sources, and links these concepts together. The extraction of domain specific relation triples (subject, verb phrase, object) is one of the important techniques for domain specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We incorporate the dependency parser with a rule-based method to chunk the relation triple candidates, then we extract advanced features of these candidate relation triples to estimate the domain relevance by a machine learning algorithm. For the evaluation of our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7 respectively) are much higher than those of openIE (0.11 and 0.6 respectively). The performance is particularly efficient in the case of complex sentences. Furthermore, with the self-training technique we used in the classifier, HDSKG can be applied to other domains easily with less training data.

@inproceedings{DBLP:conf/wcre/ZhaoXKSLL17, author = {Xuejiao Zhao and Zhenchang Xing and Muhammad Ashad Kabir and Naoya Sawada and Jing Li and Shang{-}Wei Lin}, editor = {Martin Pinzger and Gabriele Bavota and Andrian Marcus}, title = {{HDSKG:} Harvesting domain specific knowledge graph from content of webpages}, booktitle = {{IEEE} 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)}, pages = {56--67}, publisher = {{IEEE} Computer Society}, year = {2017}, url = {https://doi.org/10.1109/SANER.2017.7884609}, doi = {10.1109/SANER.2017.7884609}, }

From Discussion to Wisdom: Web Resource Recommendation for Hyperlinks in Stack Overflow
Jing Li, Zhenchang Xing, Deheng Ye and Xuejiao Zhao
SAC-16-The 31st ACM Symposium on Applied Computing, 2016. Acceptance rate: 252/1047 (24.07%).
PDF Abstract BibTex

@inproceedings{SACLiXYZ16, author = {Jing Li and Zhenchang Xing and Deheng Ye and Xuejiao Zhao}, editor = {Sascha Ossowski}, title = {From discussion to wisdom: web resource recommendation for hyperlinks in stack overflow}, booktitle = {Proceedings of the 31st Annual {ACM} Symposium on Applied Computing (SAC)}, pages = {1127--1133}, year = {2016}, url = {https://doi.org/10.1145/2851613.2851815}, doi = {10.1145/2851613.2851815}, }

BPMiner: Mining Developers' Behavior Patterns from Screen-Captured Task Videos
Jing Li, Lingfeng Bao, Zhenchang Xing, Xinyu Wang and Bo Zhou
SAC-16-The 31st ACM Symposium on Applied Computing, 2016. Acceptance rate: 252/1047 (24.07%).
PDF Abstract BibTex

Many user studies of software development use screen-capture software to record developers' behavior for post-mortem analysis. However, extracting behavioral patterns from screen-captured videos requires manual transcription and coding of videos, which is often tedious and error-prone. Automatically extracting Human-Computer Interaction (HCI) data from screen-captured videos and systematically analyzing behavioral data will help researchers analyze developers' behavior in software development more effectively and efficiently. In this paper, we present BPMiner, a novel behavior analysis approach to mine developers' behavior patterns from screen-captured videos using computer vision techniques and exploratory sequential pattern analysis. We have implemented a proof-of-concept prototype of BPMiner, and applied the BPMiner prototype to study the developers' online search behavior during software development. Our study suggests that the BPMiner approach can open up new ways to study developers' behavior in software development.

@inproceedings{SACLiBXWZ16,
  author       = {Jing Li and Lingfeng Bao and Zhenchang Xing and Xinyu Wang and Bo Zhou},
  editor       = {Sascha Ossowski},
  title        = {{BPMiner:} mining developers' behavior patterns from screen-captured task videos},
  booktitle    = {Proceedings of the 31st Annual {ACM} Symposium on Applied Computing (SAC)},
  pages        = {1371--1377},
  year         = {2016},
  url          = {https://doi.org/10.1145/2851613.2851771},
  doi          = {10.1145/2851613.2851771},
}

Software-specific Part-of-speech Tagging: An Experimental Study on Stack Overflow
Deheng Ye, Zhenchang Xing, Jing Li and Nachiket Kapre
SAC-16-The 31st ACM Symposium on Applied Computing, 2016. Acceptance rate: 252/1047 (24.07%).
PDF Abstract BibTex

Part-of-speech (POS) tagging performance degrades on out-of-domain data due to the lack of domain knowledge. Software engineering knowledge, embodied in textual documentation, bug reports and online forum discussions, is expressed in natural language, but is full of domain terms, software entities and software-specific informal language. Such software texts call for software-specific POS tagging. In the software engineering community, there have been several attempts to leverage POS tagging techniques to help solve software engineering tasks. However, little work has been done on POS tagging of software natural language texts. In this paper, we build a software-specific POS tagger, called S-POS, for processing the textual discussions on Stack Overflow. We target Stack Overflow because it has become an important developer-generated knowledge repository for software engineering. We define a POS tagset that is suitable for describing software engineering knowledge, select a corpus, develop a custom tokenizer, annotate data, design features for supervised model training, and demonstrate that the tagging accuracy of S-POS outperforms that of the Stanford POS Tagger when tagging software texts. Our work presents a feasible roadmap for building a software-specific POS tagger for the socio-professional content on Stack Overflow, and reveals challenges and opportunities for advanced software-specific information extraction.

@inproceedings{DBLP:conf/sac/YeXLK16,
  author       = {Deheng Ye and Zhenchang Xing and Jing Li and Nachiket Kapre},
  editor       = {Sascha Ossowski},
  title        = {Software-specific part-of-speech tagging: an experimental study on {Stack Overflow}},
  booktitle    = {Proceedings of the 31st Annual {ACM} Symposium on Applied Computing (SAC)},
  pages        = {1378--1385},
  publisher    = {{ACM}},
  year         = {2016},
  url          = {https://doi.org/10.1145/2851613.2851772},
  doi          = {10.1145/2851613.2851772},
}

Extracting and Analyzing Time-Series HCI Data from Screen-Captured Task Videos
Lingfeng Bao, Jing Li, Zhenchang Xing, Xinyu Wang, Xin Xia and Bo Zhou
EMSE-16-Empirical Software Engineering, Springer, Pages 1-41, 2016.
PDF Abstract BibTex

Recent years have witnessed an increasing emphasis on human aspects in software engineering research and practices. Our survey of existing studies on human aspects in software engineering shows that screen-captured videos have been widely used to record developers’ behavior and study software engineering practices. The screen-captured videos provide direct information about which software tools the developers interact with and which content they access or generate during the task. Such Human-Computer Interaction (HCI) data can help researchers and practitioners understand and improve software engineering practices from a human perspective. However, extracting time-series HCI data from screen-captured task videos requires manual transcribing and coding of videos, which is tedious and error-prone. In this paper we report a formative study to understand the challenges in manually transcribing screen-captured videos into time-series HCI data. We then present a computer-vision based video scraping technique to automatically extract time-series HCI data from screen-captured videos. We also present a case study of our scvRipper tool, which implements the video scraping technique, using 29 hours of task videos of 20 developers in two development tasks. The case study not only evaluates the runtime performance and robustness of the tool, but also performs a detailed quantitative analysis of the tool’s ability to extract time-series HCI data from screen-captured task videos. We also study the developers’ micro-level behavior patterns in software development from the quantitative analysis.

@article{DBLP:journals/ese/BaoLXWXZ17,
  author       = {Lingfeng Bao and Jing Li and Zhenchang Xing and Xinyu Wang and Xin Xia and Bo Zhou},
  title        = {Extracting and analyzing time-series {HCI} data from screen-captured task videos},
  journal      = {Empir. Softw. Eng.},
  volume       = {22},
  number       = {1},
  pages        = {134--174},
  year         = {2017},
  url          = {https://doi.org/10.1007/s10664-015-9417-1},
  doi          = {10.1007/s10664-015-9417-1},
}

Learning to Extract API Mentions from Informal Natural Language Discussions
Deheng Ye, Zhenchang Xing, Chee Yong Foo, Jing Li, and Nachiket Kapre
ICSME-16-The 32nd International Conference on Software Maintenance and Evolution. Acceptance rate: 37/125 (29%).
PDF Abstract BibTex

When discussing programming issues on social platforms (e.g., Stack Overflow, Twitter), developers often mention APIs in natural language texts. Extracting API mentions in natural language texts is a prerequisite for effective indexing and searching for API-related information in software engineering social content. However, the informal nature of social discussions creates two fundamental challenges for API extraction: common-word polysemy and sentence-format variations. Common-word polysemy refers to the ambiguity between the API sense of a common word and the normal sense of the word (e.g., append, apply and merge). Sentence-format variations refer to the lack of consistent sentence writing format for inferring API mentions. Existing API extraction techniques fall short of addressing these two challenges, because they assume distinct API naming conventions (e.g., camel case, underscore) or structured sentence format (e.g., code-like phrase, API annotation, or full API name). In this paper, we propose a semi-supervised machine-learning approach that exploits name synonyms and rich semantic context of API mentions to extract API mentions in informal social text. The key innovation of our approach is to exploit two complementary unsupervised language models learned from the abundant unlabeled text to model sentence-format variations and to train a robust model with a small set of labeled data and an iterative self-training process. The evaluation of 1,205 API mentions of the three libraries (Pandas, Numpy, and Matplotlib) in Stack Overflow texts shows that our approach significantly outperforms existing API extraction techniques based on language-convention and sentence-format heuristics and our earlier machine-learning based method for named-entity recognition.

@inproceedings{DBLP:conf/icsm/YeXFLK16,
  author       = {Deheng Ye and Zhenchang Xing and Chee Yong Foo and Jing Li and Nachiket Kapre},
  title        = {Learning to Extract {API} Mentions from Informal Natural Language Discussions},
  booktitle    = {2016 {IEEE} International Conference on Software Maintenance and Evolution (ICSME)},
  pages        = {389--399},
  year         = {2016},
  url          = {https://doi.org/10.1109/ICSME.2016.11},
  doi          = {10.1109/ICSME.2016.11},
}

Software-specific Named Entity Recognition in Software Engineering Social Content
Deheng Ye, Zhenchang Xing, Chee Yong Foo, Zi Qun Ang, Jing Li and Nachiket Kapre
SANER-16-The 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering. Acceptance rate: 52/140 (37%).
PDF Abstract BibTex

Software engineering social content, such as Q&A discussions on Stack Overflow, has become a wealth of information on software engineering. This textual content is centered around software-specific entities, and their usage patterns, issues-solutions, and alternatives. However, existing approaches to analyzing software engineering texts treat software-specific entities in the same way as other content, and thus cannot support the recent advance of entity-centric applications, such as direct answers and knowledge graphs. The first step towards enabling these entity-centric applications for software engineering is to recognize and classify software-specific entities, which is referred to as Named Entity Recognition (NER) in the literature. Existing NER methods are designed for recognizing persons, locations and organizations in formal and social texts, and are not applicable to NER in software engineering. Existing information extraction methods for software engineering are limited to API identification and linking of a particular programming language. In this paper, we formulate the research problem of NER in software engineering. We identify the challenges in designing a software-specific NER system and propose a machine learning based approach applied on software engineering social content. Our NER system, called S-NER, is general for software engineering in that it can recognize a broad category of software entities for a wide range of popular programming languages, platforms, and libraries. We conduct systematic experiments to evaluate our machine learning based S-NER against a well-designed rule-based baseline system, and to study the effectiveness of widely-adopted NER techniques and features in the face of the unique characteristics of software engineering social content.

@inproceedings{DBLP:conf/wcre/YeXFALK16,
  author       = {Deheng Ye and Zhenchang Xing and Chee Yong Foo and Zi Qun Ang and Jing Li and Nachiket Kapre},
  title        = {Software-Specific Named Entity Recognition in Software Engineering Social Content},
  booktitle    = {{IEEE} 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER)},
  pages        = {90--101},
  publisher    = {{IEEE} Computer Society},
  year         = {2016},
  url          = {https://doi.org/10.1109/SANER.2016.10},
  doi          = {10.1109/SANER.2016.10},
}

scvRipper: Video Scraping Tool for Modeling Developers' Behavior Using Interaction Data
Lingfeng Bao, Jing Li, Zhenchang Xing, Xinyu Wang and Bo Zhou
ICSE-15-The 37th International Conference on Software Engineering Tool Demonstrations, Vol.2, Pages 673-676, 2015.
PDF Abstract Demo BibTex

A screen-capture tool can record a user's interaction with software and application content as a stream of screenshots, which is usually stored in a certain video format. Researchers have used screen-captured videos to study the programming activities that developers carry out. In these studies, screen-captured videos had to be manually transcribed to extract software usage and application content data for the study purpose. This paper presents a computer-vision based video scraping tool (called scvRipper) that can automatically transcribe a screen-captured video into time-series interaction data according to the analyst's need. This tool can address the increasing need for automatic behavioral data collection methods in the studies of human aspects of software engineering.

@inproceedings{DBLP:conf/icse/BaoLXWZ15,
  author       = {Lingfeng Bao and Jing Li and Zhenchang Xing and Xinyu Wang and Bo Zhou},
  title        = {scvRipper: Video Scraping Tool for Modeling Developers' Behavior Using Interaction Data},
  booktitle    = {37th {IEEE/ACM} International Conference on Software Engineering, {ICSE} 2015, Florence, Italy, May 16-24, 2015, Volume 2},
  pages        = {673--676},
  publisher    = {{IEEE} Computer Society},
  year         = {2015},
  url          = {https://doi.org/10.1109/ICSE.2015.220},
  doi          = {10.1109/ICSE.2015.220},
}

Reverse Engineering Time-Series Interaction Data from Screen-Captured Videos
Lingfeng Bao, Jing Li, Zhenchang Xing, Xinyu Wang and Bo Zhou
SANER-15-The 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, Pages 399-408, 2015. Acceptance rate: 46/144 (32%).
PDF Abstract BibTex

In recent years the amount of research on human aspects of software engineering has increased. Many studies use screen-capture software (e.g., Snagit) to record developers' behavior as they work on software development tasks. The recorded task videos capture direct information about which activities the developers carry out with which content and in which applications during the task. Such behavioral data can help researchers and practitioners understand and improve software engineering practices from a human perspective. However, extracting time-series interaction data (software usage and application content) from screen-captured videos requires manual transcribing and coding of videos, which is tedious and error-prone. In this paper we present a computer-vision based video scraping technique to automatically reverse-engineer time-series interaction data from screen-captured videos. We report the usefulness, effectiveness and runtime performance of our video scraping technique using a case study of 29 hours of task videos of 20 developers in two development tasks.

@inproceedings{DBLP:conf/wcre/BaoLXWZ15,
  author       = {Lingfeng Bao and Jing Li and Zhenchang Xing and Xinyu Wang and Bo Zhou},
  editor       = {Yann{-}Ga{\"{e}}l Gu{\'{e}}h{\'{e}}neuc and Bram Adams and Alexander Serebrenik},
  title        = {Reverse engineering time-series interaction data from screen-captured videos},
  booktitle    = {22nd {IEEE} International Conference on Software Analysis, Evolution, and Reengineering (SANER)},
  pages        = {399--408},
  year         = {2015},
  url          = {https://doi.org/10.1109/SANER.2015.7081850},
  doi          = {10.1109/SANER.2015.7081850},
}