Results

RQ1.1: Stack Overflow Answerers’ Awareness

General information

Table 1: Stack Overflow answerers who took the survey

Reputation Sent emails Answers Rate
963,731-6,999 607 201 33%

Table 2: Experience of Stack Overflow answerers

Experience Amount Percent
Less than a year 1 0.5%
1 – 2 years 1 0.5%
3 – 5 years 30 14.9%
5 – 10 years 58 28.9%
More than 10 years 111 55.2%

Code snippets in answers

Table 3: Frequency of including code snippets in answers

Include code snippets Amount Percent
Very Frequently (81–100% of the time) 84 42%
Frequently (61–80% of the time) 63 31%
Occasionally (41–60% of the time) 40 20%
Rarely (21–40% of the time) 11 6%
Very Rarely (1–20% of the time) 2 1%
Never (0% of the time) 1 1%
Total 201 100%

Figure 2: The sources of code snippets in Stack Overflow answers


RQ 1.1 How often are Stack Overflow answerers aware of the outdated code and licensing conflicts when they answer a question on Stack Overflow?

Outdated code snippets

Table 4: Notifications of outdated code snippets in answers

Notified of outdated code Amount Percent
Very frequently (81–100% of my answers) 2 1%
Frequently (61–80% of my answers) 1 0.5%
Occasionally (41–60% of my answers) 9 4.5%
Rarely (21–40% of my answers) 16 8%
Very rarely (1–20% of my answers) 103 51.5%
Never (0% of my answers) 69 34.5%

License of code snippets

Table 5: Inclusion of software license in answer

Include license? Amount
No. 197
Yes, in code comment 1
Yes, in text surrounding the code 2
Total 200

Table 6: Checking for licensing conflicts with CC BY-SA 3.0

Check license conflicts? Amount Percent
Very Frequently (81–100% of the time) 14 7%
Frequently (61–80% of the time) 7 3.5%
Occasionally (41–60% of the time) 10 5%
Rarely (21–40% of the time) 16 8%
Very rarely (1–20% of the time) 15 7.5%
Never (0% of the time) 138 69%
Total 200 100%

Answer to RQ 1.1

Although most of the Stack Overflow answerers are aware that their code can become outdated, 51.5% of them were very rarely notified and 34.5% were never notified of outdated code in their answers. After being notified, 19.8% of them rarely or never fix the outdated code. Of 200 answerers, 124 (62%) are aware that Stack Overflow’s CC BY-SA 3.0 license applies to code snippets in questions and answers. However, only 3 answerers explicitly include a software license in their answers. Some answerers choose to include the license in their profile page instead. 69% of the answerers never check for licensing conflicts between the code snippets they copy and Stack Overflow’s CC BY-SA 3.0.
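The license-related figures in this answer can be recomputed from the counts in Tables 5 and 6. The following is a minimal sketch in Python; the variable names are illustrative and are not part of the study's materials.

```python
# Counts copied from Table 6 (checking for licensing conflicts).
license_check_counts = {
    "Very frequently": 14,
    "Frequently": 7,
    "Occasionally": 10,
    "Rarely": 16,
    "Very rarely": 15,
    "Never": 138,
}

total = sum(license_check_counts.values())

# Share of each response as a percentage, rounded to one decimal place.
shares = {
    answer: round(100 * count / total, 1)
    for answer, count in license_check_counts.items()
}

# Counts copied from Table 5: license given in a code comment (1)
# or in the text surrounding the code (2).
include_license = 1 + 2

print(total, shares["Never"], include_license)
```

Keeping the raw counts next to the derived percentages makes it easy to spot a figure in the summary that does not match its table.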

RQ1.2: Stack Overflow Visitors’ Awareness

General information

Twenty-four participants (27%) have over 10 years of programming experience and twenty-one (24%) have 5–10 years. A further 19 participants (21%) have 3–5 years, 18 (20%) have 1–2 years, and 7 (8%) have less than a year.

Table 7: Problems from Stack Overflow code snippets

Problem Amount
Mismatched solutions 40
Outdated solutions 39
Incorrect solutions 28
Buggy code 1

Table 8: Frequency of reporting the problems to Stack Overflow posts

Report? Amount Percent
Very Frequently (81–100% of the time) 1 1.8%
Frequently (61–80% of the time) 1 1.8%
Occasionally (41–60% of problematic snippets) 3 5.3%
Rarely (21–40% of problematic snippets) 8 14.0%
Very rarely (1–20% of problematic snippets) 8 14.0%
Never (0% of problematic snippets) 36 63.2%
Total 57 100%

Table 9: Check for licensing conflicts before using Stack Overflow snippets

License check? Amount Percent
Very frequently (81–100% of the time) 0 0.0%
Frequently (61–80% of the time) 7 8.1%
Occasionally (41–60% of the time) 6 6.9%
Rarely (21–40% of the time) 6 6.9%
Very rarely (1–20% of the time) 11 12.6%
Never (0% of the time) 57 65.5%
Total 87 100%

Answer to RQ 1.2

Stack Overflow visitors experienced several issues with Stack Overflow answers, including outdated code. 85% of them are not aware of CC BY-SA 3.0.

RQ2: Online Code Clones

To what extent is source code cloned between Stack Overflow and open source projects?

Table 10: Investigated online clone pairs and corresponding snippets and Qualitas projects

Set Pairs Snippets Projects Cloned ratio
Reported clones 2,289 460 59 53.28%
TP from manual validation 2,063 443 59 54.09%

Answer to RQ 2

We found 2,063 manually confirmed clone pairs between 443 Stack Overflow code snippets and 59 Qualitas projects.

RQ3: Patterns of Online Code Cloning

Why do online code clones occur?

Table 11: Classifications of online clone pairs

Set QS SQ EX UD BP IN AC Total
Before consolidation 247 1 197 107 1,495 16 226 2,289
After consolidation 153 1 109 65 216 9 53 606

Answer to RQ 3

We found strong evidence that 153 pairs were cloned from 23 Qualitas projects to Stack Overflow, 1 pair was cloned from Stack Overflow to Qualitas, and 109 pairs were cloned to Stack Overflow from external sources. However, the largest group of clone pairs between Stack Overflow and Qualitas projects is boiler-plate code (216), followed by 65 clone pairs with no evidence that the code was actually copied, and 9 pairs that are clones because they implement the same interface or inherit from the same class.
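The totals and relative sizes of the categories in Table 11 can be checked with a few lines of Python. This is only a sketch: the dictionary keys reuse the table's own abbreviations, and the helper `shares` is a hypothetical name introduced for illustration.

```python
# Counts copied from Table 11; keys are the table's category abbreviations.
before = {"QS": 247, "SQ": 1, "EX": 197, "UD": 107, "BP": 1495, "IN": 16, "AC": 226}
after = {"QS": 153, "SQ": 1, "EX": 109, "UD": 65, "BP": 216, "IN": 9, "AC": 53}


def shares(counts):
    """Return each category's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


# The per-category counts should add up to the "Total" column.
assert sum(before.values()) == 2289
assert sum(after.values()) == 606

after_shares = shares(after)
largest = max(after, key=after.get)  # category with the most pairs
print(largest, after_shares[largest])
```

After consolidation, the boiler-plate category (BP) remains the largest single group, which matches the summary above.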

RQ4: Outdated Online Code Clones

Are online code clones up-to-date compared to their counterparts in the original projects?

Figure 1: Outdated QS online clone pairs grouped by project


Table 12: Six code modification types found when comparing the outdated clone pairs to their latest versions

Modification Occurrences
Statement modification 50
Statement addition 28
Statement removal 18
Method signature change 16
Method rewriting 15
File deletion 14

Table 13: Examples of the outdated QS online clones (see full results in the Interactive menu)

Post Post date Project File Start End File date Issue ID Type* Change date
2513183 25/3/10 eclipse GenerateToStringAction.java 113 166 5/6/13 Bug 439874 S 17/3/15
22315734 11/3/14 hadoop WritableComparator.java 44 54 25/8/11 HADOOP-11323 S 20/11/14
23520731 7/5/14 hibernate SchemaUpdate.java 115 168 22/5/13 HHH-10458 S 5/2/16
18232672 14/8/13 log4j SMTPAppender.java 207 228 31/3/10 Bug 44644 R 18/10/08
17697173 17/7/13 lucene SlowSynonymFilterFactory.java 38 52 6/4/13 LUCENE-4095 D 31/5/12
21734562 12/2/14 tomcat FormAuthenticator.java 51 61 4/8/10 BZ 59823 R 4/8/16
12593810 26/9/12 poi WorkbookFactory.java 49 60 7/12/09 57593 R 30/4/15
8037824 7/11/11 jasperreports JRVerifier.java 1221 1240 31/5/10 N/A D 20/5/11
3758110 21/9/10 spring DefaultAnnotationHandlerMapping.java 78 92 20/10/10 SPR-14129 D 20/1/12
14019840 24/12/12 struts DefaultActionMapper.java 273 288 17/7/10 WW-4225 S 18/10/13

Note: S: modified/added/deleted statements, D: file has been deleted, R: method has been rewritten completely

Answer to RQ 4

Our results show that 66% (101) of the QS clone pairs on Stack Overflow are outdated. Of these, 86 pairs differ from their newest versions, with changes ranging from modified variable or method names and added or deleted statements to fully rewritten code with new method signatures. The remaining 15 pairs are dead snippets. 47 outdated code snippets are found in 130,703 GitHub projects without evidence of copying, of which 12 were buggy. A toxic code snippet with a race condition was found in two popular projects: deeplearning4j and Apache Hive.
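The headline ratio can be reproduced from the counts reported here. The sketch below assumes that the denominator is the 153 QS pairs that remain after consolidation in Table 11; that link is inferred from the numbers rather than stated explicitly.

```python
qs_pairs = 153       # QS clone pairs after consolidation (Table 11); assumed denominator
modified_pairs = 86  # pairs that differ from their newest versions
dead_pairs = 15      # pairs reported as dead snippets

outdated_pairs = modified_pairs + dead_pairs
outdated_ratio = outdated_pairs / qs_pairs

# Format the ratio as a whole-number percentage.
print(f"{outdated_pairs} of {qs_pairs} QS pairs are outdated ({outdated_ratio:.0%})")
```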

RQ5: Software Licensing Violation

Do licensing conflicts occur between Stack Overflow clones and their originals?

Table 14: License mapping of online clones (file-level)

Type Qualitas Stack Overflow (CC BY-SA 3.0) QS EX UD
Compatible Apache-2 Apache-2 1 - -
Compatible EPLv1 EPLv1 2 - 1
Compatible Proprietary Proprietary - 2 -
Compatible Sun Microsystems Sun Microsystems - 3 -
Compatible No license No license 20 9 2
Compatible No license CC BY-SA 3.0 - 1 -
Total (compatible) 23 15 3
Incompatible AGPLv3/3+ No license 1 - 4
Incompatible Apache-2 No license 46 14 12
Incompatible BSD/BSD3 No license 4 - 1
Incompatible CDDL or GPLv2 No license - - 6
Incompatible EPLv1 No license 10 - 6
Incompatible GPLv2+/3+ No license 8 48 7
Incompatible LesserGPLv2.1+/3+ No license 16 - 9
Incompatible MPLv1.1 No license - - 1
Incompatible Oracle No license - 3 -
Incompatible Proprietary No license - 1 2
Incompatible Sun Microsystems No license - 1 2
Incompatible Unknown No license - 11 -
Incompatible LesserGPLv2.1+ New BSD3 1 - -
Total (incompatible) 86 78 50

Answer to RQ 5

We found 214 code snippets on Stack Overflow that could potentially violate the license of their original software. The majority of them do not contain licensing statements after they have been copied to Stack Overflow. For 164 of them, we were able to identify, with evidence, where the code snippet has been copied from. We found occurrences of 7,112 clones of the 214 license-incompatible code snippets in 2,427 GitHub projects.
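The two headline counts follow from the "Incompatible" totals in Table 14. The sketch below makes one assumption that is inferred from the arithmetic rather than stated in the text: the 164 snippets with an identified origin are taken to be the QS and EX columns together.

```python
# "Incompatible" column totals copied from Table 14.
incompatible = {"QS": 86, "EX": 78, "UD": 50}

# All snippets that could potentially violate the original license.
potential_violations = sum(incompatible.values())

# Assumption: snippets with an identified origin are the QS and EX categories.
with_known_origin = incompatible["QS"] + incompatible["EX"]

# Per-row QS counts from the incompatible rows, to cross-check the column total.
qs_rows = [1, 46, 4, 10, 8, 16, 1]
assert sum(qs_rows) == incompatible["QS"]

print(potential_violations, with_known_origin)
```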