Quick note, registration for AWS re:Invent is now open! This year it is free and virtual, to be held over 3 weeks starting Nov 30th. Come join us then!
Publishing this newsletter has given me an incredible opportunity to meet many awesomely talented and brilliant technologists. One of those people is Amnon Eden, a UK-based computer scientist specializing in artificial intelligence, machine learning, and software engineering practices.
The newsletter is mostly my voice and my platform to share what I learn from the many smart and experienced people I speak with regularly, on topics ranging from developer culture to engineering principles to digital transformation. My goal, however, was for this publication to be a platform for the voices of everyone in the DEVBIZOPS community.
What ideas are you looking to get out into the world? Let this newsletter be one avenue for sharing them with a global audience. In that spirit, I asked Amnon if it would be ok to share his expanded thoughts on last week’s essay on Developer Documentation, which concluded with how AI will be a major force in shaping how we document code and enable collaboration.
Please do continue reading and if you are inspired to scratch that writer’s itch and share your thoughts, reply to this newsletter and I would be happy to chat more!
Last week’s DevBizOps blog entry (“Developer Documentation: Developers don’t like writing docs, what’s the alternative?”) asked: How can programmers get their answers from documentation, and are there alternatives?
As ever, Mark’s post addresses questions every developer has to ask when trying to understand software. Open source or proprietary, documentation is necessary whether for using, extending, or changing software. The costs of searching for answers are well known to developers and project managers. Empirical literature and decades of research in software comprehension and reverse engineering show that the costs of understanding software are significant. What, then, can be done?
We offer five more lessons in addition to Mark’s post, presented as “formulas”:
Self-Documented Design = Self-Explanatory Design: Self-documented code and self-documented design
Document Value = Accuracy ⁄ Technical Debt: Technical debt diminishes the value of any fixed document
Reverse-Engineered Analysis > Static Documentation: Knowledge extracted from code is much more accurate than any old paper
AI Analysis >> Reverse-Engineering: AI can improve over traditional reverse engineering
AIₙ₊₁ > AIₙ: AI progresses and “the sky is the limit”
1. Self-Documented Design = Self-Explanatory Design
Mark urges developers to write self-documenting code, and I couldn’t agree more: The ultimate ‘documentation’ is the source code itself — it literally defines how the program works. But naming conventions for classes and methods/functions are only the start: Modular, well-designed code is navigated much faster.
A clear and simple architecture, in and of itself, is clear and simple documentation. Developers should strive to make their entire ontology explicit, for example by building the object-oriented class hierarchy so that it mirrors the requirements [Booch OOA&D]. The choice of paradigm matters too: a whopping 90% of the code developed in Java could be eliminated in certain areas by using functional programming, thereby eliminating 95% of the documentation needed. So choose your programming paradigm wisely and use it to simplify your design, because simple design is self-explanatory.
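To make the point concrete, here is a minimal sketch of code whose names and structure carry the explanation. The `Invoice` type and both function names are hypothetical, invented for illustration only; the idea is simply that a declarative expression over well-named parts needs little written commentary.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Invoice:
    """Hypothetical domain type: one class per concept in the requirements."""
    customer: str
    amount_due: float
    is_paid: bool


def is_outstanding(invoice: Invoice) -> bool:
    # The business rule lives in exactly one named place.
    return not invoice.is_paid and invoice.amount_due > 0


def total_outstanding(invoices: Iterable[Invoice]) -> float:
    # Declarative style: the expression reads like the requirement itself.
    return sum(inv.amount_due for inv in invoices if is_outstanding(inv))


# Illustrative data only.
SAMPLE_INVOICES = [
    Invoice("acme", 100.0, False),
    Invoice("globex", 50.0, True),
    Invoice("initech", 25.5, False),
]
```

Calling `total_outstanding(SAMPLE_INVOICES)` sums only the unpaid invoices. A reader can work out what the function does from its name and one line of body, which is the sense in which simple design documents itself.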
2. Document Value = Accuracy ⁄ Technical Debt
It is interesting to learn that 83% of developers use the official documentation. We should also ask: does it help? Does software documentation answer their questions, does it help only a little, or does it make no difference and merely add a burden?
The return on the effort to read and understand documentation diminishes with technical debt, a subject presented in a previous post [Birch 2020]. Since developers’ time is rarely spent on refactoring and streamlining design, the accumulation of technical debt eventually renders the documentation irrelevant, however much effort has been spent on it. Developers could spend considerable time on hundreds (or more) of pages only to realize that the paper they are reading is:
Too abstract: The code is described correctly but only in very general terms
Eroded: The original code is described correctly, but changes made conflict with the documentation (i.e. violate its architectural decisions) [Garlan Architectural Mismatch]
Obsolete: The original code is described correctly but major changes made the documentation irrelevant
What, indeed, is the alternative?
3. Reverse-Engineered Analysis > Static Documentation
Mark’s post mentions Quod AI, which uses machine learning methods to mine for answers to specific questions. Code analysis has merit for several reasons. First, Quod AI’s tool focuses on interactive replies to specific queries: rather than generating documents, it takes a specific query and analyses the code in search of the answer. Second, effective static and dynamic analysis tools could eliminate the need for most trivial documentation:
Static analysis: Create a “picture” of the program’s modules and their dependencies, e.g. by learning class diagrams or Codecharts [Eden: Wiley]
Dynamic analysis: Find out the various possible behaviours of the program, e.g. by learning formulas in temporal logic [the model checking approach]
This approach has been taken by reverse engineering tools since the 1980s with mixed results [Kazman Reengineering]. Design recovery tools can only go so far by generating UML diagrams [Gueheneuc Design Recovery]. Our own team developed a tool for visual navigation in programs and generating roadmaps at varying degrees of abstraction [Gasparis Design Navigator]. Later our team also developed a round-trip engineering tool [Eden Round-Trip]. However, the low-hanging fruit of round-trip engineering is yet to be picked.
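As a small illustration of the static analysis idea described above, the sketch below recovers a class hierarchy directly from source text using Python’s standard `ast` module (`ast.unparse` requires Python 3.9 or later). The function name `class_hierarchy` and the sample source are assumptions made for this example; this is not how the tools cited above work, only the simplest possible instance of extracting a design “picture” from code.

```python
import ast
from typing import Dict, List


def class_hierarchy(source: str) -> Dict[str, List[str]]:
    """Map every class defined in `source` to the names of its base classes."""
    tree = ast.parse(source)
    hierarchy: Dict[str, List[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # ast.unparse turns each base-class expression back into text.
            hierarchy[node.name] = [ast.unparse(base) for base in node.bases]
    return hierarchy


# Illustrative input only: a tiny program to analyse.
SAMPLE_SOURCE = """
class Shape:
    pass

class Circle(Shape):
    pass

class Square(Shape):
    pass
"""
```

Running `class_hierarchy(SAMPLE_SOURCE)` yields a mapping from which a class diagram could be drawn. Because it is computed from the code, it cannot drift out of date the way a hand-written diagram can.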
4. AI Analysis >> Reverse-Engineering
The recent successes of data science raise the obvious question: Can AI help understand programs? There are good reasons to believe that AI can do a great deal to help. If nothing else, it is a method that integrates the information already made available by the existing techniques discussed above, (1) static and (2) dynamic analysis, with three more mechanisms:
Analyze natural language: use NLP to find requirements in written documentation
Analyze drawn diagrams (e.g. UML): use image segmentation and analysis (e.g. translators) to find design decisions in diagrams
Analyze secondary sources: extract information from the versioning history, such as the specific individual who made each change and their comments
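The third mechanism, mining secondary sources, can be sketched in a few lines. The example below assumes the history has been exported as text with `git log --pretty=format:"%H|%an|%s"` (commit hash, author name, subject line) and groups commit comments by author. The `Commit` type, the function names, and the sample log are hypothetical; a real tool would also need to handle author names that contain the `|` separator.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Commit(NamedTuple):
    sha: str
    author: str
    subject: str


def parse_git_log(log_text: str) -> List[Commit]:
    """Parse lines of the assumed 'hash|author|subject' format."""
    commits = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at most twice so a '|' inside the subject is preserved.
        sha, author, subject = line.split("|", 2)
        commits.append(Commit(sha, author, subject))
    return commits


def subjects_by_author(commits: List[Commit]) -> Dict[str, List[str]]:
    """Group commit comments by the individual who made each change."""
    grouped = defaultdict(list)
    for commit in commits:
        grouped[commit.author].append(commit.subject)
    return dict(grouped)


# Illustrative data only: not taken from any real repository.
SAMPLE_LOG = """\
a1b2c3|Dana|Fix rounding in invoice totals
d4e5f6|Lee|Add retry to payment client
0718ab|Dana|Rename Order.total to amount_due
"""
```

With `subjects_by_author(parse_git_log(SAMPLE_LOG))`, a question such as “who changed invoice handling, and why?” can be answered from the history rather than from a separately maintained document.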
As for the more distant future (2-10 years):
5. AIₙ₊₁ > AIₙ (“the sky is the limit”)
Looking forward in time we should ask, by exactly how much will AI be able to help developers? First, keep in mind that the limits on AI’s speed and space keep diminishing, so it could parse and organize very large programs nearly instantaneously. Second, and more importantly, the intelligence of its answers is only limited by the intelligence of the tool. How “intelligent” can we expect AI analysis tools to be?
As much as one can use history to forecast technological impact [e.g. Gartner’s Hype Cycle 2020], the history of AI does show a steady improvement in the intelligence demonstrated by machines. From the General Problem Solver (1957) and ELIZA (1966) to Deep Blue’s victory over the world chess champion (1997), Watson winning the Jeopardy! championship (2011), and AlphaGo Zero beating AlphaGo 100:0 after AlphaGo had beaten Go champions (2017), it is evident that AI has become significantly more “intelligent”. Self-driving cars, personal assistants and chatbots, and AI that has been shown to be as effective as humans in diagnosing a broad range of medical conditions [BMJ 2018] were largely imaginary only a decade ago. In 2020, AI has replaced humans in filtering CVs and in passport control, and it outperforms humans in an increasing number of application areas. Progress in AI has even led leading computer scientists to consider the possibility of machine superintelligence [Eden: Springer].
It is fairly safe to conclude that intelligent use of AI could lead to the paradigm shift in software analysis that is long overdue. So yes, leading AI researchers and computer scientists alike believe that AI is the future for documentation and possibly much more!
References
BMJ 2018 - Loh, Erwin. ‘Medicine and the Rise of the Robots: A Qualitative Review of Recent Advances of Artificial Intelligence in Health’. BMJ Leader 2, no. 2 (June 2018): 59–63. https://doi.org/10.1136/leader-2018-000071.
Birch 2020 - Mark Birch, “Circuit Breaker: How to balance the work to do with the work of improvement”, April 2020, http://devbizops.substack.com/p/circuit-breaker-bb872cc3a47#
Booch OOA&D - Booch, Grady. Object Oriented Design with Applications. Redwood City, CA: Benjamin/Cummings Pub. Co., 1991.
Eden: Wiley - Eden, Amnon H, with contributions from Jon Nicholson. Codecharts: Roadmaps and Blueprints for Object-Oriented Programs. Hoboken, N.J.: Wiley-Blackwell, 2011. https://onlinelibrary.wiley.com/doi/book/10.1002/9780470891032.
Eden: Springer - Eden, Amnon H., James H. Moor, Johnny H. Søraker, and Eric Steinhart, eds. Singularity Hypotheses: A Scientific and Philosophical Assessment. The Frontiers Collection. Springer, 2013. http://www.springer.com/engineering/computational+intelligence+and+complexity/book/978-3-642-32559-5.
Eden Round-Trip - Eden, A.H., E. Gasparis, J. Nicholson, and R. Kazman. ‘Round-Trip Engineering with the Two-Tier Programming Toolkit’. Software Quality Journal 26, no. 2 (1 June 2018): 249–71. https://doi.org/10.1007/s11219-017-9363-9.
Garlan Architectural Mismatch - Garlan, David, Robert Allen, and John Ockerbloom. ‘Architectural Mismatch or Why It’s Hard to Build Systems out of Existing Parts’. In Proceedings of the 17th International Conference on Software Engineering, 179–85. Seattle, Washington, United States: ACM, 1995. https://doi.org/10.1145/225014.225031.
Gasparis Design Navigator - Gasparis, Epameinondas, Amnon H. Eden, Jonathan Nicholson, and Rick Kazman. ‘The Design Navigator: Charting Java Programs’. In Tool Demonstrations, Proc. of 30th IEEE Int’l Conf. on Software Engineering—ICSE 2008. Leipzig, Germany: IEEE Computer Society Press, 2008.
Gueheneuc Design Recovery - Gueheneuc, Y.-G., K. Mens, and R. Wuyts. ‘A Comparative Framework for Design Recovery Tools’, 10 pp. – 134, 2006. https://doi.org/10.1109/CSMR.2006.1.
Kazman Reengineering - Kazman, Rick, Steven G. Woods, and S. Jeromy Carrière. ‘Requirements for Integrating Software Architecture and Reengineering Models: CORUM II’. In Proceedings of the Working Conference on Reverse Engineering (WCRE’98), 154. IEEE Computer Society, 1998. http://portal.acm.org/citation.cfm?id=837030.