Insphpect

Background Research

Insphpect is part of a Ph.D. project by Thomas Butler at the University of Northampton in the UK.

The tool on this website is a proof of concept of the metric which has been developed as a result of this research. Please try it out for yourself and remember to complete the survey! Your feedback is vital to the final stage of the research!

Project aim

The aim of the Ph.D. project is to develop a metric for analysing source code flexibility by identifying programming practices known to reduce flexibility, such as global variables and singletons.

Insphpect is a tool which has been created to test and evaluate this metric.
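As a brief illustration of why such practices reduce flexibility (the names below are hypothetical, not taken from the tool), consider a function that reaches out to a global variable: its behaviour depends on hidden state, so it cannot be configured, reused or tested in isolation.

    <?php
    // Hypothetical example of a practice the metric flags: formatPrice
    // silently depends on the global $config, so changing the currency for
    // one caller means changing it for every caller.
    $config = ['currency' => 'GBP'];

    function formatPrice(float $amount): string
    {
        global $config; // hidden dependency on mutable global state
        return $config['currency'] . ' ' . number_format($amount, 2);
    }

    echo formatPrice(9.99); // GBP 9.99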

Methodology

First, bad practices were identified through a literature review. During this review, it was discovered that most "bad practices" which are discussed frequently by industry professionals are not discussed in the same terms among academics.

A paper outlining this disconnect between industry and academia, entitled Seven deadly sins of software flexibility, was presented at the 13th China Europe International Symposium on Software Engineering Education in Athens in 2017.

Because the idea of a "bad practice" can be subjective, a meta-analysis was performed to collect developers' opinions on the following practices:

  1. Singleton pattern
  2. Dependency injection
  3. Global variables
  4. Static methods
  5. Inheritance
  6. Service locator
  7. Annotations (for configuration)
  8. Setter injection

Meta-Analysis

Because these bad practices are rarely discussed in academia, the meta-analysis covered all literature, not just academic literature, to determine developers' opinions on these bad practices.

For each bad practice, the first 100 relevant results* from Google were included in the analysis: over 800 articles in total (100 for each practice).

*A *relevant result* is defined as an article written by a single author or organisation describing or discussing the practice in question. Discussion forums, posts on social media and question & answer sites were not included, as these pages contain multiple opinions. Comments sections on articles were omitted for the same reason. Any article which only discussed the practice in passing was also omitted from the analysis.

Controls

Benefits of using Google for the search meta-analysis sample

Each of the 800 articles was then graded on two metrics:

Recommendation score

Each article was given a grade from 1-5 for the recommendation made by the author:

  1. Always favour this practice over alternatives
  2. Favour this practice over alternatives unless specific (described) circumstances apply
  3. Neutral - No recommendation (e.g. a manual page) or no conclusion drawn
  4. Only use this practice in specific (described) circumstances
  5. Always favour alternative approaches

Jadad style score

Differing methodological rigour between sources is a problem which exists in any kind of meta-analysis. When performing meta-analyses of clinical trials, the Cochrane Collaboration considers methodological rigour an important part of the analysis[1].

Rather than simply counting the number of trials which show a positive outcome and the number which show a negative outcome, the trials are weighted by methodological rigour. For example, a meta-analysis of a drug may find 3 trials showing that it is an effective treatment and 8 showing that it is not. Instead of simply counting the numbers on each side, the methodological rigour of each study is used as a factor when drawing conclusions about the overall efficacy of the treatment.

In a meta-analysis of the efficacy of homeopathic treatments, it was found that trials of homeopathy with a poor methodology are much more likely to show a positive outcome, whereas trials with a robust methodology are more likely to conclude that homeopathy is no better than placebo[2].

This is because methodological rigour can affect the outcome. For example, if the healthiest patients are placed in the experimental group and the least healthy in the control group, the experimental group is likely to show a significant improvement over the control group regardless of whether the drug being tested has any effect[3].

For programming articles, analytic rigour can be plotted against whether the article recommends using or avoiding the practice to create a meta-analysis in a similar manner.

It should be possible to draw conclusions such as: *as an article's analytic rigour increases, it is more likely to recommend using the practice in question*.

The created metric was based on the Jadad Scale[4] used for the analysis of clinical trials in medicine. The Jadad Scale is a five-point scale based on a three-question questionnaire which can be used to quickly assess the methodological rigour of a clinical trial. The questions asked are: *Was the study described as randomized?*, *Was the study described as double blind?* and *Was there a description of withdrawals and dropouts?*. These are then used to calculate a score from zero (very poor) to five (rigorous). By citation count, the Jadad Scale is the most widely used method of comparing clinical trials in the world[5].

As the Jadad Scale is not applicable to anything other than clinical trials, a new metric was created based on the principles of the Jadad Scale to determine the analytic rigour of any given article about a programming practice. A seven-point scale was chosen, with a point awarded if the article does each of the following:

  1. Describes how to use the practice
  2. Provides a code example of using the practice
  3. Discusses potential negative/positive implications of using the practice
  4. Describes alternative approaches to the same problem
  5. Provides like for like code samples comparing the practice to alternative approaches
  6. Discusses the pros/cons of the compared approaches
  7. Offers a conclusion on when/where/if the practice is suitable

Using this metric, a manual page that describes a practice and provides a sample of how to use it would score two, whereas an article that discussed the pros/cons of different approaches and made a recommendation would score seven.
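As an illustration of how this rubric can be applied, the sketch below (a hypothetical helper, not the project's actual grading code; the criteria keys are invented for the example) awards one point per criterion met:

    <?php
    // Hypothetical sketch of the seven-point rigour rubric: one point is
    // awarded for each criterion the article satisfies.
    function rigourScore(array $article): int
    {
        $criteria = [
            'describes_practice',      // 1. describes how to use the practice
            'code_example',            // 2. provides a code example
            'discusses_implications',  // 3. discusses negative/positive implications
            'describes_alternatives',  // 4. describes alternative approaches
            'comparative_samples',     // 5. like-for-like comparative code samples
            'compares_pros_cons',      // 6. pros/cons of the compared approaches
            'offers_conclusion',       // 7. conclusion on when/where/if suitable
        ];

        $score = 0;
        foreach ($criteria as $criterion) {
            if (!empty($article[$criterion])) {
                $score++;
            }
        }
        return $score;
    }

    // A manual page that describes the practice and shows how to use it
    // scores two; an article meeting all seven criteria scores seven.
    echo rigourScore(['describes_practice' => true, 'code_example' => true]); // 2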

Each of the 800 articles was given a Jadad-style score and a recommendation score.

Test Methodology

To verify that the suggested meta-analysis methodology produces meaningful results, it was first applied to two practices whose results could be anticipated with a high degree of certainty. If the methodology works as intended, the following hypotheses should hold.

Singleton pattern

The singleton pattern is widely considered a bad practice among developers[6] and therefore acts as a good benchmark for testing the meta-analysis methodology.

Hypothesis - Singleton

Before the results were collected it was expected that articles which had a higher Jadad style score (higher academic rigour) would be more likely to suggest avoiding the practice.

Dependency Injection

Dependency Injection is the antithesis of the Singleton Pattern and is much more flexible. Although there are some practical considerations when using Dependency Injection, and there is widespread discussion about the best way to implement it, it is widely considered the best approach for flexibility[7].
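As an illustration of the contrast (written in PHP 8 since Insphpect targets PHP; all class names here are hypothetical):

    <?php
    interface LoggerInterface
    {
        public function log(string $message): void;
    }

    class FileLogger implements LoggerInterface
    {
        public function log(string $message): void
        {
            file_put_contents('app.log', $message . PHP_EOL, FILE_APPEND);
        }
    }

    // Singleton: a single fixed instance reached through a global access point.
    class Logger
    {
        private static ?Logger $instance = null;
        private function __construct() {}

        public static function getInstance(): Logger
        {
            return self::$instance ??= new Logger();
        }

        public function log(string $message): void { /* ... */ }
    }

    // The singleton hard-wires the dependency inside the class...
    class OrderProcessorWithSingleton
    {
        public function process(int $orderId): void
        {
            Logger::getInstance()->log("Processing order $orderId");
        }
    }

    // ...whereas constructor injection lets the caller supply any
    // implementation: a FileLogger in production, a fake in tests.
    class OrderProcessor
    {
        public function __construct(private LoggerInterface $logger) {}

        public function process(int $orderId): void
        {
            $this->logger->log("Processing order $orderId");
        }
    }

    (new OrderProcessor(new FileLogger()))->process(42);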

Hypothesis - Dependency Injection

Dependency Injection is a well-established method of increasing flexibility in code[8]. Because of this, it is expected that there will be few to no negative recommendations and that, as the *Jadad* style score increases, articles should be more likely to suggest favouring dependency injection over alternative approaches.

Preliminary results

Singleton

Singleton results

Each horizontal line represents an article. The left (orange) bar for each article is the recommendation score, running from 5: always favour alternative approaches (far left) to 1: always favour this practice over alternatives.

The right (blue) bar for each article is the Jadad style score measuring analytic rigour. A score of seven means the article describes the practice, provides code examples, discusses alternative approaches, provides like-for-like code samples, discusses the pros/cons of each approach and makes a recommendation as to which approach should be used.

Article 1 (at the bottom of the chart) has a recommendation score of 3 and a Jadad style score of 1. It does not go into detail and its recommendation is neutral; it doesn't suggest either avoiding or favouring use of the Singleton Pattern.

Article 100 (at the top of the chart), on the other hand, strongly recommends against using the Singleton Pattern and has a Jadad style score of 7: it compares the singleton against alternatives in detail and concludes by strongly recommending against its use (recommendation score of 5).

There is a clear trend: As the Jadad style score increases, the author is more likely to recommend against using the Singleton pattern.

Key Findings - Singleton

Dependency Injection

DI results

Key Findings - Dependency Injection

Evaluation of the methodology

By testing the methodology on practices for which the outcome could be predicted, it was possible to validate this meta-analysis methodology.

The methodology produced the expected results. It was shown that if an author considered alternative approaches, they were more likely to recommend against using the Singleton Pattern; the inverse was true for Dependency Injection.

As these were the expected results, the suggested methodology can be shown to work as intended and to provide an overview of developers' attitudes towards any given practice.

This meta-analysis methodology gives more insight into the overall opinion of programming practices than a simple tally of for/against/neutral because it also accounts for analytic rigour.
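As a minimal sketch of the difference (assuming a simple rigour-weighted average; the published methodology may aggregate differently):

    <?php
    // Each article is a pair: [recommendation score 1-5, Jadad style score 0-7].
    // A simple tally treats every article equally; weighting by rigour gives
    // more influence to articles that actually compared alternatives.
    function weightedRecommendation(array $articles): float
    {
        $weightedSum = 0;
        $totalWeight = 0;
        foreach ($articles as [$recommendation, $rigour]) {
            $weightedSum += $recommendation * $rigour;
            $totalWeight += $rigour;
        }
        return $totalWeight > 0 ? $weightedSum / $totalWeight : 0.0;
    }

    // Two low-rigour articles favour the practice (score 1) and one
    // high-rigour article favours alternatives (score 5): a simple tally
    // favours the practice, but the weighted score leans the other way.
    echo weightedRecommendation([[1, 1], [1, 2], [5, 7]]); // 3.8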

A paper outlining this methodology entitled A methodology for performing meta-analyses of developer attitudes towards programming practices was presented at the 2019 SAI Computing Conference in London.

Results

This methodology was used on each bad practice which had been identified: for each remaining practice, a meta-analysis with a sample size of 100 was conducted.

Global Variables

global variable results

Key Findings

Static Methods

Static methods results

Key Findings

Inheritance

Inheritance results

Methodology notes

Because the term "inheritance" is not exclusive to programming, the search term inheritance class was used to bring up only programming-related results. Similar search terms such as inheritance programming or inheritance oop were considered, but a relevant page may never mention oop or programming, whereas any discussion of class-based inheritance must mention classes.

Using this search term, only 7% of results made a recommendation on whether to use inheritance or not. All 7% argued in favour of alternatives.

As 7 articles is a very small sample size, additional search terms were used to find articles which specifically compare inheritance to alternatives:

Key Findings

Service Locator

Service Locator results

Key Findings

Annotations

Annotations results

Methodology notes

A meta-analysis was performed for annotations using the search term "annotation configuration". It quickly became apparent that this term mostly yielded results demonstrating how annotations are used for configuration in a specific framework (Spring) rather than comparing the use of annotations to alternative approaches.

This search was stopped after 20 results, as most of the results were not relevant to the research.

To find relevant results which discuss the pros/cons of using annotations or alternatives, four new search terms were used:

Searches were stopped after either 50 relevant results or page 10 of the search results, and results that appeared in more than one set of search results were only included once. In total, 110 results were gathered across the four search terms.

Although these search terms have an explicit bias and bring up results specifically discussing annotations against alternatives, searching explicitly for "good practice" and "best practice" should be biased in favour of results where authors talk favourably about annotations. However, the inverse was found to be true: search results containing the terms "good practice" and "best practice" argued against using annotations for configuration whenever they made a recommendation.

Key Findings

Setter Injection

Setter injection results

Key Findings

Meta-analyses: Overall conclusions

Most programming practices are taught using examples, but very few articles regarding any practice discuss alternative approaches or when/where a given practice should be used over another.

This is potentially a serious problem for students and junior developers as they are taught practices without also being taught negative side effects of using those practices or alternative solutions to the same problem.

This is similar to teaching students of carpentry to use a jigsaw without teaching them about hand saws or chainsaws and where each one is useful.

"If all you have is a hammer, everything looks like a nail" - Proverb
Overall discussion

This chart shows each practice broken down by the number of articles that discuss negative implications of the practice, discuss alternative approaches and make a recommendation.

The following conclusions were made based on over 800 articles being analysed across 8 bad practices:

Metric

The findings above confirm that the practices identified are widely considered to be bad practices and have negative effects on code flexibility. A metric was developed to analyse the overall flexibility of a piece of code (a library, an individual project, etc.). It takes into account the size of the project and grades it on a scale of 0-100.

This metric works by scanning the source code for known bad practices (as identified in the research above) and grading the software based on the frequency of bad practices encountered.
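As a simplified sketch of the idea (this is not Insphpect's actual implementation; it only counts uses of the global keyword using PHP's built-in tokenizer):

    <?php
    // Tokenize PHP source and count occurrences of one known bad practice:
    // the `global` keyword. A real scanner would detect many more practices
    // and weight them by frequency and project size.
    function countGlobalKeywords(string $sourceCode): int
    {
        $count = 0;
        foreach (token_get_all($sourceCode) as $token) {
            // Complex tokens are arrays of [id, text, line number].
            if (is_array($token) && $token[0] === T_GLOBAL) {
                $count++;
            }
        }
        return $count;
    }

    $code = '<?php function f() { global $db; return $db->query("..."); }';
    echo countGlobalKeywords($code); // 1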

Insphpect - Proof of concept

The tool on this website is a proof of concept of this metric. Please try it out for yourself and remember to complete the survey! Your feedback is vital to the final stage of the research!

Published Papers

  1. Butler, T. (2017) Seven deadly sins of software flexibility. 13th China Europe International Symposium on Software Engineering Education, Athens.
  2. Butler, T. (2019) A methodology for performing meta-analyses of developer attitudes towards programming practices. SAI Computing Conference, London.

References

  1. Cochrane Collaboration (n.d.) Cochrane [online]. Available from: http://www.cochrane.org/
  2. Mathie, R., Frye, J., Fisher, P. (2015) Homeopathic Oscillococcinum® for preventing and treating influenza and influenza-like illness. Cochrane Database of Systematic Reviews 12.
  3. Goldacre, B. (2010) Bad Science. Fourth Estate. ISBN 978-0-00-724019-7.
  4. Jadad, A., Moore, A., Carroll, D., Jenkinson, C. (1996) Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 17(1), pp. 1-12. Elsevier.
  5. Olivo, S., Macedo, L., Caroline, I., Fuentes, J., Magee, D. (2008) Scales to assess the quality of randomized controlled trials: a systematic review. Physical Therapy 88(2), p. 156.
  6. Knack-Nielsen, T. (2008) What's so bad about the Singleton? [online]. Available from: http://www.sitepoint.com/whats-so-bad-about-the-singleton/
  7. Albert, A. (2013) Why should we use dependency injection? [online]. Available from: http://www.javacreed.com/why-should-we-use-dependency-injection/
  8. Fowler, M. (2004) Inversion of Control Containers and the Dependency Injection pattern [online]. Available from: http://martinfowler.com/articles/injection.html
  9. Wulf, W., Shaw, M. (1973) Global variable considered harmful. ACM SIGPLAN Notices, pp. 28-34.
  10. Judis, S. (2017) The global object in JavaScript: a matter of platforms, unreadable code and not breaking the internet [online]. Available from: https://www.contentful.com/blog/2017/01/17/the-global-object-in-javascript/