#COMPLIANCECHECKER #SECURITY #SECOM #COMMITMESSAGES
SECOMlint: A linter for Security Commit Messages
| To Resubmit
january, 2023 .
Sofia Reis,
Corina S. Pasareanu,
Rui Abreu,
Hakan Erdogmus. ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
(ESEC/FSE'23)
- Demo Track
Abstract
Paper
Transparent and efficient vulnerability and patch disclosure are still a challenge
in the security community, essentially because of the poor-quality documentation
stemming from the lack of standards. SECOM is a recently-proposed standard convention
for security commit messages that enables the writing of well-structured and complete
commit messages for security patches. The convention prescribes different bits of
security-related information essential for a better understanding of vulnerabilities
by humans and tools. SECOMlint is an automated and configurable solution to help
security and maintenance teams infer compliance against the SECOM standard when
submitting patches to security vulnerabilities in their source version control
systems. The tool leverages the natural language processing technique Named-Entity
Recognition (NER) to extract security-related information from commit messages and
uses it to match the compliance standards designed. A demonstration of SECOMlint is available at
https://youtu.be/-1hzpMN_uFI; its documentation and source code are available at
https://tqrg.github.io/secomlint/.
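To make the idea of checking a commit message for compliance concrete, here is a minimal sketch. It is not SECOMlint's implementation: it swaps the NER step for plain regular expressions, and the rule names, the patterns, and the placeholder identifier `CVE-0000-0000` are assumptions chosen for illustration rather than the SECOM field list.

```python
import re

# Hypothetical rules: each maps a rule name to a pattern expected in the message.
# Illustrative stand-ins only; not the SECOM fields or SECOMlint's NER model.
RULES = {
    "vulnerability-id": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "weakness-id": re.compile(r"\bCWE-\d+\b"),
    "severity": re.compile(r"\bseverity:\s*(low|medium|high|critical)\b", re.IGNORECASE),
}


def lint(message):
    """Return the sorted names of the rules that the commit message fails."""
    return sorted(name for name, pattern in RULES.items() if not pattern.search(message))


# Made-up commit message with a placeholder vulnerability identifier.
message = """fix: sanitize file names before extraction (CVE-0000-0000)

Weakness: CWE-22
"""

print(lint(message))  # ['severity']
```

Returning the names of failed rules, rather than a single pass/fail flag, lets a team decide per rule whether a violation is a warning or an error.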
#FRAMEWORK #SECURITY #AI #VULNERABILITYDETECTION
Tenet: A Flexible Framework for
Machine-Learning-based Vulnerability Detection
| Under Review
january, 2023 .
Eduard
Pinconschi,
Sofia Reis,
Chi Zhang,
Rui Abreu,
Hakan Erdogmus,
Corina S. Pasareanu and
Limin Jia . International Conference on
Software Engineering for AI
(CAIN'23)
co-located with the International Conference
on Software Engineering (ICSE'23) - Poster
Abstract
Paper
Software vulnerability detection (SVD) aims to identify potential security weaknesses in software.
SVD systems have been rapidly evolving from those being based on testing, static analysis, and
dynamic analysis to those based on machine learning (ML). Many ML-based approaches have been
proposed, but challenges remain: training and testing datasets contain duplicates, and building
customized end-to-end pipelines for SVD is time-consuming. We present Tenet, a modular framework
for building end-to-end, customizable, reusable, and automated pipelines through a plugin-based
architecture that supports SVD for several deep learning (DL) and basic ML models. We demonstrate
the applicability of Tenet by building practical pipelines performing SVD on real-world
vulnerabilities.
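As a rough illustration of a plugin-based pipeline with a duplicate-removal step, consider the sketch below. It does not use Tenet's API: the registry, the `plugin` decorator, the plugin names, and the sample format are all hypothetical, and whitespace-normalized hashing is only one simple way to detect duplicates.

```python
import hashlib

# Hypothetical plugin registry; none of these names come from Tenet itself.
PLUGINS = {}


def plugin(name):
    """Register a function as a pipeline step under the given name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("deduplicate")
def deduplicate(samples):
    """Drop samples whose code is identical after whitespace normalization."""
    seen, unique = set(), []
    for sample in samples:
        normalized = " ".join(sample["code"].split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


@plugin("label_count")
def label_count(samples):
    """Count how many samples carry each label."""
    counts = {}
    for sample in samples:
        counts[sample["label"]] = counts.get(sample["label"], 0) + 1
    return counts


def run_pipeline(steps, data):
    """Feed the output of each named plugin into the next one."""
    for step in steps:
        data = PLUGINS[step](data)
    return data


samples = [
    {"code": "int f() { return 0; }", "label": 0},
    {"code": "int f()  {\n return 0; }", "label": 0},  # same code, different spacing
    {"code": "strcpy(dst, src);", "label": 1},
]

print(run_pipeline(["deduplicate", "label_count"], samples))  # {0: 1, 1: 1}
```

Because steps are looked up by name, a pipeline is just a list of strings, which makes it easy to reorder or swap steps without touching the plugins.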
#LINTER #SECURITY #PATTERNS #INFRASTRUCTURE
#METHODOLOGY
Leveraging Practitioners' Feedback
to Improve a Security Linter
august, 2022 .
Sofia Reis,
Rui Abreu,
Marcelo d'Amorim and
Daniel Fortunato . IEEE/ACM
International Conference on Automated Software
Engineering
(ASE'22)
- Technical Paper Track
Abstract
Paper
Infrastructure-as-Code (IaC) is a technology that
enables the managing, provisioning, and distributing
of infrastructure through code instead of manual
processes. As with any piece of code, IaC scripts
are not immune to defects. A recent Cloud Threat
Report from Palo Alto Networks’ Unit 42 announced
the discovery of over 199K vulnerable IaC templates.
This highlights the importance of tools to prevent
vulnerabilities from reaching production and shift
security left in the development pipeline.
Unfortunately, we observed through a comprehensive
study that security linters for IaC scripts can be
very imprecise. Our approach to address this problem
was to leverage community expertise to improve the
precision of these tools. More precisely, we
interviewed professional developers of Puppet
scripts to collect their feedback on the root causes
of imprecision of the state-of-the-art security
linter for Puppet. From that feedback, we developed
a new linter adjusting 7 rules of the original
linter ruleset and adding 3 new rules. We conducted
a new study with 131 professional developers,
showing an increase in precision from 8% to 83%. The
main message of this paper is that obtaining
professional feedback is feasible and highly
effective and that feedback is key to the creation
of high precision rulesets, which is critical for
the usefulness and adoption of IaC security linters.
#BESTPRACTICES #SECURITYCOMMITSMESSAGES #CONVENTION #SECURITYSTANDARD
SECOM: Towards a convention for
security commit messages
march, 2022 .
Sofia Reis,
Rui Abreu,
Hakan Erdogmus and
Corina Păsăreanu . Mining Software
Repositories Conference
(MSR'22)
-
Industry Track, co-located with the International
Conference on Software
Engineering (ICSE'22)
- Short Paper
Abstract
How to Configure
Website
Paper
One way to detect and assess software
vulnerabilities is by extracting security-related
information from commit messages.
Automating the detection and assessment of
vulnerabilities upon security commit messages is
still challenging due to the
lack of structured and clear messages. We created a
convention, called SECOM, for security commit
messages that structures and
includes bits of security-related information that
are essential for detecting and assessing
vulnerabilities for both humans
and tools. The full convention and details are
available here:
https://tqrg.github.io/secom/.
#SURVEY #STATICANALYSIS #SECURITY #TOOLS #COLLECTION
A Systematic Survey of
Security-oriented Static Analysis Tools |
Under Review
2022 .
Sofia Reis
and Rui
Abreu .
ACM Computing Surveys Journal (CSUR) -
Long Survey
Abstract
Available Soon
Over the past decades, a vast number of static
analysis tools have been studied, designed, and
produced in academia and industry. Static
analysis is a technique capable of examining entire
codebases against coding rules before executing the
source code. These tools
have the potential to address issues early in the
software development lifecycle by pinpointing
software defects such as security
vulnerabilities. Detecting this type of issue early
on decreases the amount of money wasted on
maintenance and makes the software
safer by default. Despite these techniques being
around for a few decades, there are still several
research opportunities and problems
to be solved. With this systematic survey, we aim to
solve one of them: unstructured knowledge. The
knowledge regarding these tools
is scattered across the internet and academic
papers, which makes understanding and adopting them
very difficult.
In this systematic literature review, we organize
and describe the current state of security-oriented
static analysis tools (SoSATs) by
providing a structured overview of previous
approaches, including techniques, programming
languages and weaknesses spectra,
performance, availability, and popularity. This work
is a contribution to both industry and academia:
industry, by providing a complete
description of the tools; and academia, by providing
a set of open research opportunities in the field.
#MAINTAINABLESECURITY #SECURITYPATCHES #IMPACT
Fixing Vulnerabilities Potentially
Hinders Maintainability
september, 2021 .
Sofia Reis,
Rui
Abreu and Luís Cruz .
Published in the Empirical Software Engineering
Journal (EMSE'21) .
Accepted at the International Conference on Software
Maintenance and Evolution (ICSME'21)
- J1 Track for presentation.
Abstract
Replication Package
Paper
Presentation
Security is a requirement of utmost importance to
produce high-quality software. However, there is
still a considerable number of vulnerabilities being
discovered and fixed almost weekly. We hypothesize
that developers affect the maintainability of their
codebases when patching vulnerabilities. This paper
evaluates the impact of patches to improve security
on the maintainability of open-source software.
Maintainability is measured based on the Better Code
Hub's model of 10 guidelines on a dataset,
including 1300 security-related commits. Results
show evidence of a trade-off between security
and maintainability for 41.90% of the cases, i.e.,
developers may hinder software maintainability.
Our analysis shows that 38.29% of patches increased
software complexity and 37.87% of patches
increased the percentage of LOCs per unit. The
implications of our study are that changes to
codebases while patching vulnerabilities need to be
performed with extra care; tools for patch risk
assessment should be integrated into the CI/CD
pipeline; computer science curricula need to be
updated;
and more secure programming languages are
necessary.
#VULNERABILITYDETECTION #MACHINELEARNING #CODE2VEC
On using distributed representations
of source code for the detection of C security
vulnerabilities
july, 2021 .
David Coimbra, Sofia
Reis, Rui
Abreu, Hakan Erdogmus and
Corina Păsăreanu . International
Workshop on Principles of Diagnosis (DX'21)
- Paper
Abstract
Replication Package
Paper
CodeXGLUE Leaderboard: SecurityAware
This paper presents an evaluation of the code
representation model Code2vec
when trained on the task of detecting security
vulnerabilities in C source
code. We leverage the open-source library astminer
to extract path-contexts
from the abstract syntax trees of a corpus of
labeled C functions. Code2vec is
trained on the resulting path-contexts with the task
of classifying a function as
vulnerable or non-vulnerable. Using the CodeXGLUE
benchmark, we show that the
accuracy of Code2vec for this task is comparable to
simple transformer-based
methods such as pre-trained RoBERTa, and outperforms
more naive NLP-based methods.
We achieved an accuracy of 61.43% while maintaining
low computational
requirements relative to larger models.
#DYNAMICSLICING #FAULTLOCALIZATION
Demystifying the Combination of
Dynamic Slicing and Spectrum-based Fault
Localization
may, 2019 .
Sofia Reis,
Rui Abreu
and Marcelo
d'Amorim . International Joint Conference on
Artificial Intelligence (IJCAI'19) - Main
Track
Abstract
Paper
Tool
Several approaches have been proposed to reduce
debugging costs through automated software
fault diagnosis. Dynamic Slicing (DS) and Spectrum-based
Fault Localization (SFL) are popular
fault diagnosis techniques and are normally seen as
complementary. This paper reports on a comprehensive
study to reassess the effects of combining DS with
SFL. With this combination, components that are
often
involved in failing but seldom in passing test runs
could be located and their suspiciousness reduced.
Results show that the DS-SFL combination, coined
Tandem-FL, improves the diagnostic accuracy by up to
73.7%
(13.4% on average). Furthermore, results indicate
that the risk of missing faulty statements, which is
a key limitation of DS, is not high: DS misses faulty
statements in 9% of the 260 cases. To sum up, we
found that the DS-SFL
combination was practical and effective and
encourage new SFL techniques to be evaluated against
that optimization.
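The sketch below shows one simple way to combine the two techniques: rank components with the standard Ochiai suspiciousness formula, then keep only the components that appear in a dynamic slice of the failing run. This is an assumed reading for illustration, not the paper's Tandem-FL algorithm; the function names, the toy spectrum, and the choice of Ochiai as the SFL metric are all assumptions.

```python
from math import sqrt


def ochiai(spectrum, outcomes):
    """Compute Ochiai suspiciousness for every component.

    spectrum: list of sets, one per test run, holding the components it executed.
    outcomes: list of booleans, True when the corresponding run failed.
    """
    total_failed = sum(outcomes)
    components = set().union(*spectrum)
    scores = {}
    for component in components:
        runs = list(zip(spectrum, outcomes))
        failed_hits = sum(1 for run, failed in runs if failed and component in run)
        passed_hits = sum(1 for run, failed in runs if not failed and component in run)
        denominator = sqrt(total_failed * (failed_hits + passed_hits))
        scores[component] = failed_hits / denominator if denominator else 0.0
    return scores


def rank_with_slice(spectrum, outcomes, dynamic_slice):
    """Rank only the components in the dynamic slice, most suspicious first."""
    scores = ochiai(spectrum, outcomes)
    kept = {c: s for c, s in scores.items() if c in dynamic_slice}
    return sorted(kept, key=lambda c: (-kept[c], c))


# Toy data: three runs over statements s1..s3; only the first run fails.
spectrum = [{"s1", "s2", "s3"}, {"s1", "s3"}, {"s1", "s2"}]
outcomes = [True, False, False]
dynamic_slice = {"s1", "s2"}  # hypothetical slice of the failing run

print(rank_with_slice(spectrum, outcomes, dynamic_slice))  # ['s2', 's1']
```

In this toy example `s3` scores as high as `s2` under Ochiai alone, but it is outside the slice and so drops out of the ranking. The flip side is the risk mentioned in the abstract: a faulty statement that the slice misses would be dropped too.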
#SECURITYPATCHES #DATASET #MULTILANGUAGE
SECBENCH: A Database of Real
Security Vulnerabilities
september, 2017 . Sofia Reis
and Rui
Abreu .
International Workshop on Secure Software Engineering in
DevOps and Agile Development (SecSE'17)
co-located with the European Symposium on Research in
Computer Security (ESORICS'17)
- Workshop Paper;
Extended version published at the International Journal
of Secure Software Engineering
(IJSSE)
- Journal Paper (Special Invitation)
Abstract
Paper
Extended Paper
Dataset
Currently, to satisfy the high number of system
requirements, complex software
is created, which makes its development cost-intensive
and more susceptible to security vulnerabilities. In
software
security testing, empirical studies typically use
artificial faulty programs
because of the challenges involved in the extraction
or reproduction of
real security vulnerabilities. Thus, researchers
tend to use hand-seeded
faults or mutations to overcome these issues which
might not be suitable for
software testing techniques since the two approaches
can create
samples that inadvertently differ from the real
vulnerabilities and thus
might lead to misleading assessments of the
capabilities of the tools. Although
there are databases targeting security
vulnerabilities test cases,
just one database contains only real vulnerabilities; the
others are a mix
of real and artificial or even only artificial
samples. Secbench is a database
of real security vulnerabilities mined from GitHub,
which hosts millions of
open-source projects carrying a considerable number
of security vulnerabilities.
We mined 248 projects - accounting for almost 2M
commits -
for 16 different vulnerability patterns, yielding a
database with 682 real
security vulnerabilities.