Software Security

Open source software (OSS) is widely used in both free and proprietary applications. Black Duck reports that 96% of the applications it scanned contain open source components, which account for 57% of the code base on average. At the same time, vulnerabilities embedded in upstream OSS propagate quickly to the applications that depend on it. In addition, cloning or reusing OSS without explicit reference makes it hard for maintainers to track and mitigate vulnerabilities. Our research develops practical techniques for detecting such vulnerabilities, helping to build a more reliable and secure information system infrastructure.

Read more about our work:
Security Patch Identification | Security Patch Classification

Security Patch Identification

Security patches in open source software (OSS) not only fix identified vulnerabilities but also expose the vulnerable code to attackers. Skilled attackers may therefore misuse this information to launch N-day attacks on unpatched OSS versions. The best practice for preventing N-day attacks is to upgrade the software to the latest version as promptly as possible. However, out of concern for their reputation and to simplify software development management, software vendors may choose to patch vulnerabilities secretly in a new version, without reporting them to CVE or even describing them explicitly in their change logs. When skilled attackers identify these secretly patched vulnerabilities, they can turn them into powerful "0-day" attacks that compromise not only unpatched versions of the same software but also similar types of OSS (e.g., SSL libraries) that may contain the same vulnerability because of code clones or similar design and implementation logic. It is therefore critical to identify secret security patches. In our paper, we develop a defense system and implement a toolset to automatically identify secret security patches in OSS. To distinguish security patches from other patches, we first build a security patch database that contains more than 4700 security patches mapped to records in the CVE list. Next, we identify a set of features that help distinguish security patches from non-security ones using machine learning approaches. Finally, we use code clone identification mechanisms to discover similar patches or vulnerabilities in similar types of OSS. The experimental results show that our approach achieves good detection performance. A case study on OpenSSL, LibreSSL, and BoringSSL discovers 12 secret security patches.
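As a rough illustration of the feature-extraction step described above, the sketch below counts a few syntactic properties of a unified diff that a classifier could consume. The feature names, the regular expressions, and the `patch_features` helper are assumptions made for this example; they are not the feature set used in the paper.

```python
import re
from collections import Counter

# Illustrative patterns only; the paper's actual features are not reproduced here.
COND_RE = re.compile(r"\b(if|else|switch|case)\b")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}
MEM_FUNCS = {"malloc", "calloc", "realloc", "free",
             "memcpy", "memset", "strcpy", "strncpy"}


def patch_features(diff_text):
    """Count simple added/removed-line features in a unified diff."""
    feats = Counter()
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            feats["hunks"] += 1
            continue
        # Skip file headers. (A removed line whose content starts with "--"
        # would also be skipped; acceptable for a sketch.)
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            side = "added"
        elif line.startswith("-"):
            side = "removed"
        else:
            continue  # context line
        body = line[1:]
        calls = [c for c in CALL_RE.findall(body) if c not in KEYWORDS]
        feats[f"{side}_lines"] += 1
        feats[f"{side}_conditionals"] += len(COND_RE.findall(body))
        feats[f"{side}_calls"] += len(calls)
        feats[f"{side}_mem_calls"] += sum(c in MEM_FUNCS for c in calls)
    return dict(feats)


# A made-up patch that adds a bounds check before a memory copy.
SAMPLE_DIFF = """\
--- a/buf.c
+++ b/buf.c
@@ -10,3 +10,5 @@
 int copy(char *dst, const char *src, size_t n) {
-    memcpy(dst, src, n);
+    if (n > MAX_LEN)
+        return -1;
+    memcpy(dst, src, n);
     return 0;
 }
"""

if __name__ == "__main__":
    for name, value in sorted(patch_features(SAMPLE_DIFF).items()):
        print(name, value)
```

A patch that adds a conditional check around a memory operation, as in the sample, is the kind of change such counts are meant to surface; a real system would combine many more signals before training a model.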

Published in the IEEE Conference on Dependable Systems and Networks (DSN) 2019.

Download the Paper Dataset

author={X. {Wang} and K. {Sun} and A. {Batcheller} and S. {Jajodia}},  
booktitle={2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},   
title={Detecting "0-Day" Vulnerability: An Empirical Study of Secret Security Patch in OSS},   

Security Patch Classification

With the increasing use of open source software (OSS) in both free and proprietary applications, vulnerabilities embedded in OSS also propagate to the applications that depend on it. It is critical to find the security patches that fix these vulnerabilities, especially those essential to reducing security risk. Unfortunately, there is currently no way to automatically recognize which vulnerability a given security patch fixes. In our paper, we first conduct an empirical study of security patches by type (i.e., the corresponding vulnerability type), using a large-scale dataset collected from the National Vulnerability Database (NVD). Based on the analysis results, we develop a machine learning-based system to help identify the vulnerability type of a given security patch. The evaluation results show that our system achieves good performance.

Published in the IEEE Conference on Communications and Network Security (CNS) 2020.

Download the Paper

author={X. {Wang} and S. {Wang} and K. {Sun} and A. {Batcheller} and S. {Jajodia}},
booktitle={2020 IEEE Conference on Communications and Network Security (CNS)}, 
title={A Machine Learning Approach to Classify Security Patches into Vulnerability Types}, 

PatchDB: A Large-Scale Security Patch Dataset

Security patches, which embed both the vulnerable code and the corresponding fixes, are of great significance to vulnerability detection and software maintenance. A large patch dataset is critical for various patch analysis tasks. However, existing patch datasets suffer from insufficient samples and limited variety. In this paper, we construct a large-scale patch dataset called PatchDB that consists of three datasets: an NVD-based dataset, a wild-based dataset, and a synthetic dataset. The NVD-based dataset is extracted from the patch hyperlinks indexed by the NVD. The wild-based dataset includes security patches that we collect from commits on GitHub. To improve the efficiency of data collection and reduce the effort of manual verification, we develop a new nearest link search method to help find the most promising security patch candidates. Moreover, we provide a synthetic dataset, which uses a new oversampling method to synthesize patches at the source code level, enriching the control flow variants of the original patches. We conduct a set of studies to investigate the effectiveness of the proposed algorithms and evaluate the properties of the collected dataset. The experimental results show that PatchDB can help improve the performance of security patch identification.
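To give a feel for the candidate-selection idea, here is a minimal sketch under one plausible reading of nearest link search: each verified security patch is linked to a distinct unlabeled commit that lies close to it in feature space, and the linked commits become the candidates sent for manual verification. The greedy matching, the Euclidean distance, and the `nearest_link_search` name are assumptions for illustration, not the algorithm as published.

```python
import math


def nearest_link_search(known, candidates):
    """Greedily link each known security patch to a distinct nearby candidate.

    `known` and `candidates` are sequences of equal-length feature vectors.
    Returns a sorted list of (known_index, candidate_index) links.
    """
    # Enumerate every (distance, known, candidate) pair, closest first.
    pairs = sorted(
        (math.dist(k, c), i, j)
        for i, k in enumerate(known)
        for j, c in enumerate(candidates)
    )
    used_known, used_cand, links = set(), set(), []
    for _, i, j in pairs:
        # Each patch and each candidate may take part in at most one link.
        if i in used_known or j in used_cand:
            continue
        used_known.add(i)
        used_cand.add(j)
        links.append((i, j))
    return sorted(links)


# Toy 2-D feature vectors (made up for the example).
known_patches = [(0.0, 0.0), (5.0, 5.0)]
wild_commits = [(0.1, 0.0), (0.2, 0.1), (5.0, 4.0), (9.0, 9.0)]

links = nearest_link_search(known_patches, wild_commits)
# links == [(0, 0), (1, 2)]
```

Requiring distinct candidates keeps one dense cluster of verified patches from repeatedly selecting the same commit, so the number of candidates grows with the number of verified patches. The all-pairs sort is quadratic, so a large corpus would need an indexed or batched search instead.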

Published in the IEEE Conference on Dependable Systems and Networks (DSN) 2021.

Download the Paper Slides Dataset Website

author={X. {Wang} and S. {Wang} and P. {Feng} and K. {Sun} and S. {Jajodia}},  
booktitle={2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},   
title={PatchDB: A Large-Scale Security Patch Dataset},