Note - Size Matters

Software projects are often too large for any one person to understand completely.

Academic projects range in size from a few hundred Source Lines of Code (SLOCs) to perhaps 50,000 lines for group projects, like those we build in Software Studio. Printed at about 50 lines per page, not counting headings, that's a range of 4 to about 1,000 pages of text.
Professional projects range in size from perhaps 10,000 SLOCs to more than 50,000,000. Windows 10, for example, is claimed to have about 60 million lines. Printed, that's a range of about 200 to more than 1,000,000 pages of text.
Obviously size matters! A lot of our design effort goes into managing this large volume of text, even for some academic projects, and always for professional projects.
So, we need to partition this mass of text into parts that are small enough to be understood by one person. We do this by breaking up our source code into files, called packages, which are the smallest units of compilation.
A well-designed source package will usually have one or two classes, each with about 10 functions that average about 50 lines of code or less. That works out to roughly 500 to 1000 lines per package, so let's assume that the average package size is 1000 SLOCs.
Consequently, an academic project will usually contain from one to about fifty packages, while a professional project will range from about 10 to 50,000 packages.

Project Type - metric	Small Project	Large Project
Academic - line count	200	50,000
Academic - page count	4	1000
Academic - package count	1	50
Professional - line count	10,000	50,000,000
Professional - page count	200	1,000,000
Professional - package count	10	50,000
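
The arithmetic behind this table can be reproduced with a short script. This is a minimal sketch, assuming the figures stated above (about 50 lines per printed page and an average package of 1000 SLOCs). The function name size_metrics and the constant names are hypothetical, chosen only for illustration.

```python
import math

LINES_PER_PAGE = 50       # assumption from the text: about 50 lines per printed page
SLOC_PER_PACKAGE = 1000   # assumption from the text: average package size


def size_metrics(sloc):
    """Return (pages, packages) for a project of `sloc` source lines, rounding up."""
    pages = math.ceil(sloc / LINES_PER_PAGE)
    packages = math.ceil(sloc / SLOC_PER_PACKAGE)
    return pages, packages


# Line counts taken from the table above.
projects = {
    "Academic, small": 200,
    "Academic, large": 50_000,
    "Professional, small": 10_000,
    "Professional, large": 50_000_000,
}

for name, sloc in projects.items():
    pages, packages = size_metrics(sloc)
    print(f"{name}: {sloc:,} SLOC, {pages:,} pages, {packages:,} packages")
```

Rounding up reflects that even a 200-line project occupies at least one package.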

Software Production Effort:

Frederick Brooks, former chair of Computer Science at UNC, Chapel Hill, and former manager of the IBM 360 Hardware project and the IBM 360 System Software Project, has interesting views on the subject of building large software systems. In the following, I've adapted one of the examples from his book "The Mythical Man-Month", 2nd Edition.

You're a software developer for your company, and have been asked, by your team lead, to estimate how much effort, E, it will take you, in person days, to complete a new application. Since you've taken several project-based courses in your graduate program, you use that experience to arrive at a realistic estimate. That's shown by effort E in the top left entry in the table below.

But wait! Perhaps your team lead doesn't want a Program, but instead wants a Program Product. That increases the effort by a factor of three. That's due to the process of turning your program into a product by careful Specification, thorough Unit Testing, and writing a Design Document so the code can be maintained.

But wait again! It's likely that what your team lead really wants is a Program System Product, i.e., your "Program" must be manufactured to professional standards to become a product, and must also integrate with several other programs that are part of a software system your company is developing. All the parts need to understand a common set of communication protocols, must be integrated together, and should have consistent user interfaces across the system. That increases effort again by about a factor of three, i.e., it takes nine times the effort to build a Program System Product compared to what it takes to build a Program.

		Multiply effort E by 3 in this column
	Program (E): construction test; documentation in code comments	Program Product (3*E): add specs & design documentation; add unit test
Multiply effort E by 3 in this row	Program System (3*E): add consistent format; add communication protocols; add integration	Program System Product (9*E): specs & design documentation; consistent format; communication protocols; unit test; integration; integration test
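
The two factors of three in this table compose by multiplication. The following is a minimal sketch of that rule, not a calibrated estimating model. The function name effort, its parameters, and the 30 person-day base estimate are hypothetical values chosen for illustration.

```python
PRODUCT_FACTOR = 3  # Program -> Program Product: specs, design documentation, unit test
SYSTEM_FACTOR = 3   # Program -> Program System: formats, protocols, integration


def effort(base_effort, is_product=False, is_system=False):
    """Scale a Program's effort E by the rule-of-thumb factors in the table."""
    result = base_effort
    if is_product:
        result *= PRODUCT_FACTOR
    if is_system:
        result *= SYSTEM_FACTOR
    return result


E = 30  # hypothetical estimate, in person days, for a plain Program

print("Program:               ", effort(E))
print("Program Product:       ", effort(E, is_product=True))
print("Program System:        ", effort(E, is_system=True))
print("Program System Product:", effort(E, is_product=True, is_system=True))
```

With E = 30 person days, the four cells come to 30, 90, 90, and 270 person days.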

Schedule:

I've written a lot of code over many years, and by now write about 300 source lines of code (SLOCs) a day, including design of the day's code, documentation in code comments, and construction testing. So I should estimate, for building Program System Products, that I would generate one ninth of that, or about 33 lines a day. Since I'm a greybeard, I'll also spend time in meetings and mentoring entry-level developers. So I'll estimate that I'll produce, in a commercial development environment, about 20 source lines of code a day. That productivity was typical of a couple of very well managed software engineering organizations in which I've worked - we kept careful records, on which we based bids for contracts, that indicated this level of productivity.

Now, suppose we estimate a new contract will require about 1 million SLOCs. Then, at 20 lines per day the contract will require:

1 million SLOCs ==> 50,000 person days of effort

Figuring a five-day week, two weeks of vacation, and 10 days of holidays and sick time per person year, offset by 10 days of overtime, that's 260 - 10 - 10 + 10 = 250 working days per person year, which results in:

200 person years

and if we plan a staff of 50 developers, that results in:

4 years to complete

In that time we will produce, based on the first table, about one thousand packages.

1,000,000 SLOC

1000 packages

50 developers

4 years to complete
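
The schedule arithmetic above can be checked with a few lines of code. This is a minimal sketch using only the figures stated in this section. The function names are hypothetical, and it assumes, as the estimate above does, that the work divides evenly among the developers.

```python
def working_days_per_year(days_per_week=5, vacation_days=10,
                          holiday_and_sick_days=10, overtime_days=10):
    """Working days in a person year: 52 weeks, less time off, plus overtime."""
    return 52 * days_per_week - vacation_days - holiday_and_sick_days + overtime_days


def schedule(total_sloc, sloc_per_day, staff, days_per_year):
    """Return (person_days, person_years, calendar_years) for a contract."""
    person_days = total_sloc / sloc_per_day
    person_years = person_days / days_per_year
    calendar_years = person_years / staff  # assumes effort splits evenly across staff
    return person_days, person_years, calendar_years


days = working_days_per_year()  # 260 - 10 - 10 + 10 = 250
person_days, person_years, calendar_years = schedule(
    total_sloc=1_000_000,  # estimated contract size from the text
    sloc_per_day=20,       # commercial productivity figure from the text
    staff=50,
    days_per_year=days,
)

print(f"{person_days:,.0f} person days")
print(f"{person_years:,.0f} person years")
print(f"{calendar_years:.0f} years with 50 developers")
```

Changing sloc_per_day or staff shows how sensitive the calendar time is to the productivity figure.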

No one person can understand that much code. It's critically important that we abstract away a lot of that detail to build models of the software that we can think critically about. Learning to build these models, using diagramming, documentation, and analysis, is what this course is all about.

Conclusions for: Software Size Matters!

Developing Products requires much more effort than building Programs:

Commercial products are much larger than academic projects, entailing perhaps 1 million SLOCs in 1 thousand packages. Many commercial and industrial software systems are much larger than even this.
Commercial products require more documentation, more testing, and much more communication within large development teams.
Abstraction is essential: Concepts, Diagrams, Packages, Prototypes
No one can understand a system of 1 million SLOCs by looking at its source code alone. We need to build models based on standard diagramming techniques, e.g., the Unified Modeling Language (UML), and figure out how to critically analyze and evaluate the packages that make up this product and the structure that results from tying those packages together in an operational system.
Software reuse is important.
Reuse means to use packages without changing any of their text.
Compiler libraries are a good example. But we need to design our own code for reuse where that makes sense. We don't want to have to create 1,000 packages. The C# language and .NET Framework were designed to support reuse, and we will examine how to tap into that support.
Software salvage is important.
Salvage means that we start with existing packages and extend them with additional packages, or modify them, as little as possible, to suit a new application.
Salvage is harder to do gracefully than reuse. We've explored one elegant way to support salvage in our research, using a technology we call Software Matrix.

We will spend a lot of our time learning how to develop, analyze, and manage large distributed software systems. The Final Project will have you develop the architecture for one of these.