Insphpect

Background Research

Insphpect is part of a Ph.D. project by Thomas Butler at the University of Northampton in the UK.

The tool on this website is a proof of concept of the metric which has been developed as a result of this research. Please try it out for yourself and remember to complete the survey! Your feedback is vital to the final stage of the research!

Project aim

The aim of the Ph.D. project is to develop a metric for analysing source code flexibility by identifying programming practices known to reduce flexibility, such as global variables and singletons.

Insphpect is a tool which has been created to test and evaluate this metric.
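As a brief illustration of why such practices reduce flexibility (the names below are hypothetical, not taken from the tool), consider a function that reaches out to a global variable: its behaviour depends on hidden state, so it cannot be configured, reused or tested in isolation.

    <?php
    // Hypothetical example of a practice the metric flags: formatPrice
    // silently depends on the global $config, so changing the currency for
    // one caller means changing it for every caller.
    $config = ['currency' => 'GBP'];

    function formatPrice(float $amount): string
    {
        global $config; // hidden dependency on mutable global state
        return $config['currency'] . ' ' . number_format($amount, 2);
    }

    echo formatPrice(9.99); // GBP 9.99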

Methodology

First, bad practices were identified through a literature review. During this review, it was discovered that most "bad practices" which are discussed frequently by industry professionals are not discussed in the same terms among academics.

A paper outlining this disconnect between industry and academia, entitled Seven deadly sins of software flexibility, was presented at the 13th China Europe International Symposium on Software Engineering Education in Athens in 2017.

Because the idea of a "bad practice" can be subjective, a meta-analysis was performed to collect developers' opinions on the following practices:

  1. Singleton pattern
  2. Dependency injection
  3. Global variables
  4. Static methods
  5. Inheritance
  6. Service locator
  7. Annotations (for configuration)
  8. Setter injection

Meta-Analysis

Because these bad practices are rarely discussed in academia, the meta-analysis covered all literature, not just academic literature, to determine developers' opinions on these bad practices.

For each bad practice, the first 100 relevant results* from Google were included in the analysis: over 800 articles in total (100 for each practice).

*A *relevant result* is defined as an article written by a single author or organisation describing or discussing the practice in question. Discussion forums, posts on social media and question & answer sites were not included, as these pages contain multiple opinions. Comments sections on articles were omitted for the same reason. Any article which only discussed the practice in passing was also omitted from the analysis.

Controls

Benefits of using Google for the search meta-analysis sample

Each of the 800 articles was then graded on two metrics:

Recommendation score

Each article was given a grade from 1-5 for the recommendation made by the author:

  1. Always favour this practice over alternatives
  2. Favour this practice over alternatives unless specific (described) circumstances apply
  3. Neutral - No recommendation (e.g. a manual page) or no conclusion drawn
  4. Only use this practice in specific (described) circumstances
  5. Always favour alternative approaches

Jadad style score

Differing methodological rigour between sources is a problem which exists in any kind of meta-analysis. When performing meta-analyses of clinical trials, the Cochrane Collaboration considers methodological rigour an important part of the analysis[1].

Rather than simply counting the number of trials which show a positive outcome and the number which show a negative outcome, the trials are weighted by methodological rigour. For example, a meta-analysis of a drug may find 3 trials showing that it is an effective treatment and 8 showing that it is not. Instead of simply counting the numbers on each side, the methodological rigour of each study is used as a factor when drawing conclusions about the overall efficacy of the treatment.

In a meta-analysis of the efficacy of homeopathic treatments, it was found that trials of homeopathy with a poor methodology are much more likely to show a positive outcome, whereas trials with a robust methodology are more likely to conclude that homeopathy is no better than placebo[2].

This is because methodological rigour can affect the outcome. For example, if the healthiest patients are placed in the experimental group and the least healthy in the control group, the experimental group is likely to show a significant improvement over the control group regardless of whether the drug being tested has any effect[3].

For programming articles, analytic rigour can be plotted against whether the article recommends using or avoiding the practice to create a meta-analysis in a similar manner.

It should be possible to draw conclusions such as: *as an article's analytic rigour increases, it is more likely to recommend using the practice in question*.

The created metric was based on the Jadad Scale[4] used for the analysis of clinical trials in medicine. The Jadad Scale is a five-point scale based on a three-question questionnaire which can be used to quickly assess the methodological rigour of a clinical trial. The questions asked are: *Was the study described as randomized?*, *Was the study described as double blind?* and *Was there a description of withdrawals and dropouts?*. These are then used to calculate a score from zero (very poor) to five (rigorous). By citation count, the Jadad Scale is the most widely used method of comparing clinical trials in the world[5].

As the Jadad Scale is not applicable to anything other than clinical trials, a new metric was created based on the principles of the Jadad Scale to determine the analytic rigour of any given article about a programming practice. A seven-point scale was chosen, with a point awarded if the article does each of the following:

  1. Describes how to use the practice
  2. Provides a code example of using the practice
  3. Discusses potential negative/positive implications of using the practice
  4. Describes alternative approaches to the same problem
  5. Provides like for like code samples comparing the practice to alternative approaches
  6. Discusses the pros/cons of the compared approaches
  7. Offers a conclusion on when/where/if the practice is suitable

Using this metric, a manual page that describes a practice and provides a sample of how to use it would score two, whereas an article that discussed the pros/cons of different approaches and made a recommendation would score seven.
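As an illustration of how this rubric can be applied, the sketch below (a hypothetical helper, not the project's actual grading code; the criteria keys are invented for the example) awards one point per criterion met:

    <?php
    // Hypothetical sketch of the seven-point rigour rubric: one point is
    // awarded for each criterion the article satisfies.
    function rigourScore(array $article): int
    {
        $criteria = [
            'describes_practice',      // 1. describes how to use the practice
            'code_example',            // 2. provides a code example
            'discusses_implications',  // 3. discusses negative/positive implications
            'describes_alternatives',  // 4. describes alternative approaches
            'comparative_samples',     // 5. like-for-like comparative code samples
            'compares_pros_cons',      // 6. pros/cons of the compared approaches
            'offers_conclusion',       // 7. conclusion on when/where/if suitable
        ];

        $score = 0;
        foreach ($criteria as $criterion) {
            if (!empty($article[$criterion])) {
                $score++;
            }
        }
        return $score;
    }

    // A manual page that describes the practice and shows how to use it
    // scores two; an article meeting all seven criteria scores seven.
    echo rigourScore(['describes_practice' => true, 'code_example' => true]); // 2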

Each of the 800 articles was given a Jadad-style score and a recommendation score.

Test Methodology

To verify that the suggested meta-analysis methodology produces meaningful results, it was first applied to two practices whose results could be anticipated with a high degree of certainty. If the methodology works as intended, the following hypotheses should hold.

Singleton pattern

The singleton pattern is widely considered a bad practice among developers[6] and therefore acts as a good benchmark for testing the meta-analysis methodology.

Hypothesis - Singleton

Before the results were collected it was expected that articles which had a higher Jadad style score (higher academic rigour) would be more likely to suggest avoiding the practice.

Dependency Injection

Dependency Injection is the antithesis of the Singleton Pattern and is much more flexible. Although there are some practical considerations when using Dependency Injection, and there is widespread discussion about the best way to implement it, it is widely considered the best approach for flexibility[7].
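As an illustration of the contrast (written in PHP 8 since Insphpect targets PHP; all class names here are hypothetical):

    <?php
    interface LoggerInterface
    {
        public function log(string $message): void;
    }

    class FileLogger implements LoggerInterface
    {
        public function log(string $message): void
        {
            file_put_contents('app.log', $message . PHP_EOL, FILE_APPEND);
        }
    }

    // Singleton: a single fixed instance reached through a global access point.
    class Logger
    {
        private static ?Logger $instance = null;
        private function __construct() {}

        public static function getInstance(): Logger
        {
            return self::$instance ??= new Logger();
        }

        public function log(string $message): void { /* ... */ }
    }

    // The singleton hard-wires the dependency inside the class...
    class OrderProcessorWithSingleton
    {
        public function process(int $orderId): void
        {
            Logger::getInstance()->log("Processing order $orderId");
        }
    }

    // ...whereas constructor injection lets the caller supply any
    // implementation: a FileLogger in production, a fake in tests.
    class OrderProcessor
    {
        public function __construct(private LoggerInterface $logger) {}

        public function process(int $orderId): void
        {
            $this->logger->log("Processing order $orderId");
        }
    }

    (new OrderProcessor(new FileLogger()))->process(42);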

Hypothesis - Dependency Injection

Dependency Injection is a well-established method of increasing flexibility in code[8]. Because of this, it is expected that there will be few to no negative recommendations and that, as the *Jadad* style score increases, articles should be more likely to suggest favouring dependency injection over alternative approaches.

Preliminary results

Singleton

Singleton results

Each horizontal line represents an article. The left (orange) bar for each article is the recommendation score, running from 5: always favour alternative approaches (far left) to 1: always favour this practice over alternatives.

The right (blue) bar for each article is the Jadad style score measuring analytic rigour. A score of seven means the article describes the practice, provides code examples, discusses alternative approaches, provides like-for-like code samples, discusses the pros/cons of each approach and makes a recommendation as to which approach should be used.

Article 1 (at the bottom of the chart) has a recommendation score of 3 and a Jadad style score of 1. It does not go into detail and its recommendation is neutral; it doesn't suggest either avoiding or favouring use of the Singleton Pattern.

Article 100 (at the top of the chart), on the other hand, strongly recommends against using the Singleton Pattern and has a Jadad style score of 7: it compares the singleton against alternatives in detail and concludes by strongly recommending against its use (recommendation score of 5).

There is a clear trend: As the Jadad style score increases, the author is more likely to recommend against using the Singleton pattern.

Key Findings - Singleton

Dependency Injection

DI results

Key Findings - Dependency Injection

Evaluation of the methodology

By testing the methodology on practices for which the outcome could be predicted, it was possible to validate this meta-analysis methodology.

The methodology produced the expected results. It was shown that if an author considered alternative approaches, they were more likely to recommend against using the Singleton Pattern; the inverse was true for Dependency Injection.

As these were the expected results, the suggested methodology can be shown to work as intended and to provide an overview of developers' attitudes towards any given practice.

This meta-analysis methodology gives more insight into the overall opinion of programming practices than a simple tally of for/against/neutral because it also accounts for analytic rigour.
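As a minimal sketch of the difference (assuming a simple rigour-weighted average; the published methodology may aggregate differently):

    <?php
    // Each article is a pair: [recommendation score 1-5, Jadad style score 0-7].
    // A simple tally treats every article equally; weighting by rigour gives
    // more influence to articles that actually compared alternatives.
    function weightedRecommendation(array $articles): float
    {
        $weightedSum = 0;
        $totalWeight = 0;
        foreach ($articles as [$recommendation, $rigour]) {
            $weightedSum += $recommendation * $rigour;
            $totalWeight += $rigour;
        }
        return $totalWeight > 0 ? $weightedSum / $totalWeight : 0.0;
    }

    // Two low-rigour articles favour the practice (score 1) and one
    // high-rigour article favours alternatives (score 5): a simple tally
    // favours the practice, but the weighted score leans the other way.
    echo weightedRecommendation([[1, 1], [1, 2], [5, 7]]); // 3.8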

A paper outlining this methodology entitled A methodology for performing meta-analyses of developer attitudes towards programming practices was presented at the 2019 SAI Computing Conference in London.

Results

This methodology was used on each bad practice which had been identified: for each remaining practice, a meta-analysis with a sample size of 100 was conducted.

Global Variables

global variable results

Key Findings

Static Methods

Static methods results

Key Findings

Inheritance

Inheritance results

Methodology notes

Because the term "inheritance" is not exclusive to programming, the search term inheritance class was used to bring up only programming-related results. Similar search terms such as inheritance programming or inheritance oop were considered, but a relevant page may never mention oop or programming, whereas any discussion of class-based inheritance must mention classes.

Using this search term, only 7% of results made a recommendation on whether to use inheritance or not. All 7% argued in favour of alternatives.

As 7 articles is a very small sample size, additional search terms were used to find articles which specifically compare inheritance to alternatives:

Key Findings

Service Locator

Service Locator results

Key Findings

Annotations

Annotations results

Methodology notes

A meta-analysis was performed for annotations using the search term "annotation configuration". It quickly became apparent that this term mostly yielded results demonstrating how annotations are used for configuration in a specific framework (Spring) rather than comparing the use of annotations to alternative approaches.

This search was stopped after 20 results, as most of the results were not relevant to the research.

To find relevant results which discuss the pros/cons of using annotations or alternatives, four new search terms were used:

Searches were stopped after either 50 relevant results or page 10 of the search results, and results that appeared in more than one set of search results were only included once. In total, 110 results were gathered across the four search terms.

Although these search terms have an explicit bias and bring up results specifically discussing annotations against alternatives, searching explicitly for "good practice" and "best practice" should be biased in favour of results where authors talk favourably about annotations. However, the inverse was found to be true: search results containing the terms "good practice" and "best practice" argued against using annotations for configuration whenever they made a recommendation.

Key Findings

Setter Injection

Setter injection results

Key Findings

Meta-analyses: Overall conclusions

Most programming practices are taught using examples, but very few articles regarding any practice discuss alternative approaches or when/where a given practice should be used over another.

This is potentially a serious problem for students and junior developers as they are taught practices without also being taught negative side effects of using those practices or alternative solutions to the same problem.

This is similar to teaching students of carpentry to use a jigsaw without teaching them about hand saws or chainsaws and where each one is useful.

"If all you have is a hammer, everything looks like a nail" - Proverb
Overall discussion

This chart shows each practice broken down by the number of articles that discuss negative implications of the practice, discuss alternative approaches and make a recommendation.

The following conclusions were made based on over 800 articles being analysed across 8 bad practices:

Metric

The findings above confirm that the practices identified are widely considered to be bad practices and have negative effects on code flexibility. A metric was developed to analyse the overall flexibility of a piece of code (a library, an individual project, etc.). It takes into account the size of the project and grades it on a scale of 0-100.

This metric works by scanning the source code for known bad practices (as identified in the research above) and grading the software based on the frequency of bad practices encountered.
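As a simplified sketch of the idea (this is not Insphpect's actual implementation; it only counts uses of the global keyword using PHP's built-in tokenizer):

    <?php
    // Tokenize PHP source and count occurrences of one known bad practice:
    // the `global` keyword. A real scanner would detect many more practices
    // and weight them by frequency and project size.
    function countGlobalKeywords(string $sourceCode): int
    {
        $count = 0;
        foreach (token_get_all($sourceCode) as $token) {
            // Complex tokens are arrays of [id, text, line number].
            if (is_array($token) && $token[0] === T_GLOBAL) {
                $count++;
            }
        }
        return $count;
    }

    $code = '<?php function f() { global $db; return $db->query("..."); }';
    echo countGlobalKeywords($code); // 1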

Insphpect - Proof of concept

The tool on this website is a proof of concept of this metric. Please try it out for yourself and remember to complete the survey! Your feedback is vital to the final stage of the research!

Published Papers

  1. Butler, T. (2017) Seven deadly sins of software flexibility. 13th China Europe International Symposium on Software Engineering Education, Athens.
  2. Butler, T. (2019) A methodology for performing meta-analyses of developer attitudes towards programming practices. SAI Computing Conference, London.

References

  1. Cochrane Collaboration (n.d.) Cochrane [online]. Available from: http://www.cochrane.org/
  2. Mathie, R., Frye, J., Fisher, P. (2015) Homeopathic Oscillococcinum® for preventing and treating influenza and influenza-like illness. Cochrane Database of Systematic Reviews 12.
  3. Goldacre, B. (2010) Bad Science. Fourth Estate. ISBN 978-0-00-724019-7.
  4. Jadad, A., Moore, A., Carroll, D., Jenkinson, C. (1996) Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 17(1), pp. 1-12. Elsevier.
  5. Olivo, S., Macedo, L., Caroline, I., Fuentes, J., Magee, D. (2008) Scales to assess the quality of randomized controlled trials: a systematic review. Physical Therapy 88(2), p. 156.
  6. Knack-Nielsen, T. (2008) What's so bad about the Singleton? [online]. Available from: http://www.sitepoint.com/whats-so-bad-about-the-singleton/
  7. Albert, A. (2013) Why should we use dependency injection? [online]. Available from: http://www.javacreed.com/why-should-we-use-dependency-injection/
  8. Fowler, M. (2004) Inversion of Control Containers and the Dependency Injection pattern [online]. Available from: http://martinfowler.com/articles/injection.html
  9. Wulf, W., Shaw, M. (1973) Global variable considered harmful. ACM SIGPLAN Notices, pp. 28-34.
  10. Judis, S. (2017) The global object in JavaScript: a matter of platforms, unreadable code and not breaking the internet [online]. Available from: https://www.contentful.com/blog/2017/01/17/the-global-object-in-javascript/