Downloads

Online code clones

2,289 Online Clones in the Study

The files below contain the 2,289 online code clones reported in the study. The CSV file contains only the clone pairs and their classifications, while the JSON file contains the complete information from the manual investigation (e.g., comments from the investigators, the latest version of the code, licensing, git blame, and modification types).

CSV JSON Python script
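As a sketch of how the CSV file might be processed, the Python function below tallies clone pairs per classification. It assumes the file has a header row and one column holding each pair's classification. The default column name `classification` is a hypothetical placeholder, not taken from the released file, so check the actual header (or the provided Python script) before relying on it.

```python
import csv
from collections import Counter
from typing import IO


def count_classifications(csv_file: IO[str], column: str = "classification") -> Counter:
    """Tally clone pairs per classification label.

    Assumption: the CSV has a header row and a column named `column`
    (hypothetical default) holding the classification of each clone pair.
    """
    reader = csv.DictReader(csv_file)
    # Reading fieldnames consumes the header row; fail loudly if the column is absent.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    counts: Counter = Counter()
    for row in reader:
        # Strip surrounding whitespace so " X" and "X" count as the same label.
        counts[row[column].strip()] += 1
    return counts
```

For example, `with open("clones.csv", newline="", encoding="utf-8") as f: print(count_classifications(f).most_common())`, where `clones.csv` stands in for the name of the downloaded file.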

Web Application of 2,289 Online Clones in the Study

We developed a web application for the manual clone validation and classification phase. The web application stores the details of each clone pair, including its classification, status (normal or outdated), and license. To see the clones in more detail, please visit the web app below.

Go to the web app

Data sets

Stack Overflow code snippets

72,365 Java code snippets in accepted answers on Stack Overflow, extracted from the Stack Exchange data dump of January 2016.

Download here
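Post bodies in the Stack Exchange data dump are stored as HTML, and multi-line code is typically marked up as `<pre><code>...</code></pre>`. The sketch below pulls such blocks out of one post body with Python's standard `html.parser`. It is only an illustration under that markup assumption, not the extraction pipeline used in the study; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PreCodeExtractor(HTMLParser):
    """Collect the text of every <pre><code> block in a post body.

    Assumption: code snippets use <pre><code>...</code></pre> markup.
    Inline <code> elements outside <pre> are ignored.
    """

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &lt; back into "<".
        super().__init__(convert_charrefs=True)
        self.snippets: list[str] = []
        self._in_pre = False
        self._in_code = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
        elif tag == "code" and self._in_pre:
            self._in_code = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "code" and self._in_code:
            self.snippets.append("".join(self._buf))
            self._in_code = False
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_code:
            self._buf.append(data)


def extract_snippets(body_html: str) -> list[str]:
    """Return the code blocks found in one HTML post body."""
    parser = PreCodeExtractor()
    parser.feed(body_html)
    parser.close()
    return parser.snippets
```

Keeping inline `<code>` out of the result reflects the assumption that only block-level code counts as a snippet; a real filter would also need to restrict posts to accepted answers with Java content.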

Qualitas projects

A curated collection of 112 Java software systems for empirical studies of code artefacts.

Download here (v. 20130901r)

Clone and license detection tools

Simian

A text-based clone detector that locates clones at line-level granularity.

Download here
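To illustrate what line-level, text-based detection means, the sketch below normalizes lines by trimming whitespace, drops blank lines, and reports every window of `min_lines` consecutive lines that occurs in more than one place. This is a simplified illustration of the general idea, not Simian's algorithm; the function name and the default window size of 6 are assumptions chosen for the example.

```python
from collections import defaultdict


def find_line_clones(files: dict[str, list[str]], min_lines: int = 6) -> list[list[tuple[str, int]]]:
    """Report windows of `min_lines` consecutive normalized lines seen more than once.

    Each result is a list of (filename, 1-based start line) locations that share
    the same normalized text. Overlapping windows of a longer clone are reported
    separately; a real tool would merge them.
    """
    windows: dict[tuple[str, ...], list[tuple[str, int]]] = defaultdict(list)
    for name, lines in files.items():
        # Keep (original line number, trimmed text) for non-blank lines only.
        kept = [(i + 1, ln.strip()) for i, ln in enumerate(lines) if ln.strip()]
        for k in range(len(kept) - min_lines + 1):
            key = tuple(text for _, text in kept[k:k + min_lines])
            windows[key].append((name, kept[k][0]))
    return [locs for locs in windows.values() if len(locs) > 1]
```

Because indentation and blank lines are ignored, two copies that differ only in layout are still reported, while any change inside a line breaks the match.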

SourcererCC

A scalable token-based clone detector that detects type-1 to type-3 clones.

Download here
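Token-based detection of this kind treats each code block as a bag (multiset) of tokens and compares bags by their overlap, which tolerates renamed identifiers and small edits. The sketch below shows only that similarity check: two blocks count as clones when the size of the multiset intersection is at least ⌈θ · max(|B1|, |B2|)⌉. The regex tokenizer and the threshold θ = 0.7 are illustrative assumptions, and the index-based filtering that lets SourcererCC scale to large corpora is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integer literals, and single
# non-space characters. A real detector would use a language-aware lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_bag(code: str) -> Counter:
    """Return the multiset of tokens in a code fragment."""
    return Counter(TOKEN_RE.findall(code))


def is_clone_pair(code1: str, code2: str, theta: float = 0.7) -> bool:
    """Overlap-similarity check between two code blocks (sketch)."""
    b1, b2 = token_bag(code1), token_bag(code2)
    size1, size2 = sum(b1.values()), sum(b2.values())
    if size1 == 0 or size2 == 0:
        return False
    overlap = sum((b1 & b2).values())  # Counter & Counter = multiset intersection
    return overlap >= math.ceil(theta * max(size1, size2))
```

For instance, `int sum(int a, int b) { return a + b; }` and the same method with `a` renamed to `x` share 14 of 16 tokens, which clears the threshold of ⌈0.7 · 16⌉ = 12.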

Ninka

A lightweight sentence-based automatic license identification tool for source code.

Download here
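The idea behind sentence-based license identification can be sketched as follows: strip comment markers from a file header, split the text into sentences, normalize them, and report each license whose characteristic sentences all appear. The function names and the `rules` mapping (license name to required sentences) are hypothetical and must be supplied by the caller; Ninka's actual sentence matching and license rules are considerably more elaborate.

```python
import re


def comment_text(comment: str) -> str:
    """Strip common comment markers line by line and join the text."""
    lines = []
    for line in comment.splitlines():
        line = line.strip()
        if line.endswith("*/"):
            line = line[:-2]
        lines.append(line.lstrip("/*#").strip())
    return " ".join(l for l in lines if l)


def normalize(sentence: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    return " ".join(sentence.lower().split()).rstrip(".!?")


def identify_licenses(comment: str, rules: dict[str, list[str]]) -> list[str]:
    """Return licenses whose required sentences all occur in the comment."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", comment_text(comment))
    found = {normalize(p) for p in parts if p.strip()}
    matches = []
    for license_name, required in rules.items():
        needed = {normalize(s) for s in required}
        if needed and needed <= found:
            matches.append(license_name)
    return matches
```

Matching whole normalized sentences, rather than searching for keywords, avoids false positives from a header that merely mentions a license name in passing.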