Purpose:
One focus area for this course is understanding how to structure and implement big software systems. By "big" we mean systems
that may consist of hundreds or even thousands of packages¹ and perhaps several million lines of code.
We won't be building anything quite that large, but our projects may be considerably bigger than anything you've worked
on before.
To implement big systems successfully, we need to partition code into relatively small parts and thoroughly test
each part before inserting it into the software baseline². As new parts are added to the baseline, and
as we make changes to fix latent errors or performance problems, we will re-run test sequences for those parts
and, perhaps, for the entire baseline. Managing that process efficiently requires effective tools for code analysis
as well as testing. How we do that code analysis is illustrated by the projects for this year.
The projects this Fall focus on building software tools for code analysis. We will emphasize C# code,
but we want our tools to be easily extendable to similar languages such as C++ and Java.
Code analysis consists of extracting lexical content from source code files, analyzing the code's syntax from its lexical
content, and building a Type Table holding the dependency results. Alternatively, you can provide an Abstract Syntax Tree (AST) that holds the results of the analysis. It is then fairly easy to build
several backends that do further analyses on the AST to construct code metrics, search for particular constructs, evaluate
package dependencies, or examine some other interesting feature of the code.
You will find it useful to look at the Parsing blog for a brief introduction
to parsing and code analysis.
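To make the first two stages concrete, here is a minimal sketch of tokenizing and of grouping tokens into semi-expressions, following the rules given for the Tokenizer and SemiExpression packages in the list below. It is written in Python only for brevity; the project itself is to be done in C#. The names `tokenize`, `semi_expressions`, and `TWO_CHAR` are hypothetical, the two-character punctuator list is an illustrative subset, and the 'using' newline rule is left out.

```python
import re

# Illustrative subset of two-character punctuators (an assumption, not a full C# list).
TWO_CHAR = ["==", "!=", "<=", ">=", "&&", "||", "++", "--", "+=", "-=", "=>", "::"]

# Alternatives are tried in order: comments and quoted text first, then
# alphanumeric runs, then two-character and single-character punctuators.
_TOKEN_RE = re.compile(
    "|".join([
        r"(?P<comment>//[^\n]*|/\*.*?\*/)",
        r'(?P<string>"(?:\\.|[^"\\\n])*")',
        r"(?P<char>'(?:\\.|[^'\\\n])')",
        r"(?P<word>\w+)",
        "(?P<punct2>" + "|".join(re.escape(p) for p in TWO_CHAR) + ")",
        r"(?P<punct1>[^\s\w])",
    ]),
    re.DOTALL,
)


def tokenize(source, keep_comments=False):
    """Split source text into tokens; white space separates tokens and is dropped."""
    tokens = []
    for match in _TOKEN_RE.finditer(source):
        if match.lastgroup == "comment" and not keep_comments:
            continue
        tokens.append(match.group())
    return tokens


def semi_expressions(tokens):
    """Group tokens into lists that each end with ';', '{' or '}'.

    Simplification: a 'for(;;)' header is split at its semicolons, and the
    newline-after-'using' terminator is not handled.
    """
    groups, current = [], []
    for token in tokens:
        current.append(token)
        if token in (";", "{", "}"):
            groups.append(current)
            current = []
    if current:  # trailing tokens with no terminator
        groups.append(current)
    return groups
```

For example, `semi_expressions(tokenize("class A : B { int x; }"))` yields three groups: the class header up to the open brace, the field declaration, and the lone closing brace. Each group carries just enough tokens for one rule to inspect.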
In this third project we will build and test a package dependency analyzer in C# that consists of, at least, these packages:
-
Tokenizer
Extracts words, called tokens, from a stream of characters. Token boundaries are white-space characters, transitions between alphanumeric
and punctuator characters, and comment and string boundaries. Certain classes of punctuator characters belong to single-character or
two-character tokens, so they require special rules for extraction.
-
SemiExpression
Groups tokens into sets, each of which contains all the information needed to analyze some
grammatical construct without containing extra tokens that have to be saved for subsequent analyses. SemiExpressions are
determined by special terminating characters: semicolon, open brace, closed brace, and newline when preceded on the same line by
'using'.
-
TypeTable
Provides a container that stores type information needed for dependency analysis.
-
TypeAnalysis
Finds all the types defined in each of a collection of C# source files. It does this by building rules to
detect type definitions - classes, structs, enums, and aliases.
-
DepAnalysis
Finds, for each file in a specified collection, all other files from the collection on which it depends. File A
depends on file B if and only if it uses the name of any type defined in file B. It might do that by calling a
method of a type or by inheriting the type. Note that this intentionally does not record dependencies of a file on files
outside the file set, e.g., language and platform libraries.
-
StrongComponent
A strong component is a largest set of files that are all mutually dependent, that is, a set in which every file can
be reached from every other file in the set by following direct or transitive dependency links. The term 'Strong Component'
comes from the theory of directed graphs. There are a number of algorithms for finding strong components in graphs.
My favorite is the Tarjan Algorithm, nicely described here:
Tarjan Algorithm and pseudo code.
You will need a graph class to implement this. You will find one in the C# Repository:
C# graph class.
-
Display
Uses information in the TypeTable to build an effective display of the dependency relationships between all files in the
selected collection. Note that you are not expected to provide a graphical display. An indented text display will satisfy
these requirements.
-
Tester
Provides code to demonstrate you meet all requirements.
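The TypeTable, TypeAnalysis, DepAnalysis, and Display packages above fit together roughly as in the following sketch. It is in Python for brevity (the project itself must be in C#), and every name in it (`build_type_table`, `find_dependencies`, `format_dependencies`) is hypothetical. It makes several simplifying assumptions: types are found with a regular expression rather than parser rules, aliases and namespaces are ignored, and names that appear inside comments or strings are counted as uses.

```python
import re

# Assumption: a type definition is the keyword class, struct, or enum
# followed by an identifier. Aliases are not handled in this sketch.
_TYPE_DEF = re.compile(r"\b(?:class|struct|enum)\s+([A-Za-z_]\w*)")
_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def build_type_table(sources):
    """Map each defined type name to the file that defines it.

    sources: dict of file name -> source text.
    """
    table = {}
    for file_name, text in sources.items():
        for type_name in _TYPE_DEF.findall(text):
            table[type_name] = file_name
    return table


def find_dependencies(sources, table):
    """File A depends on file B if A uses the name of a type defined in B.

    Names not in the table (library types, for instance) are ignored, so
    dependencies on files outside the collection are never recorded.
    """
    deps = {}
    for file_name, text in sources.items():
        names = set(_IDENTIFIER.findall(text))
        targets = {table[n] for n in names if n in table}
        targets.discard(file_name)  # a file does not depend on itself
        deps[file_name] = sorted(targets)
    return deps


def format_dependencies(deps):
    """Indented text display: each file, then the files it depends on."""
    lines = []
    for file_name in sorted(deps):
        lines.append(file_name)
        for target in deps[file_name]:
            lines.append("    " + target)
    return "\n".join(lines)


sources = {
    "A.cs": "class A : B { C c; }",
    "B.cs": "class B { }",
    "C.cs": "struct C { A a; }",
}
deps = find_dependencies(sources, build_type_table(sources))
print(format_dependencies(deps))
```

Here A.cs depends on B.cs (by inheritance) and on C.cs (by holding a field), C.cs depends on A.cs, and B.cs depends on nothing. Keeping the type table as a separate pass is what lets the second pass resolve names that are defined in files it has not yet scanned.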
Use of the C# Parser, provided in the C# Repository, is recommended.
You will find Project2-InstrSolF2018 helpful.
That code integrates a complete solution of Project #2 with the C# Parser, and provides some of the rules you will need
for Project #3 (not all the Rules and Actions, of course).
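For the StrongComponent package described above, the following is a minimal recursive sketch of Tarjan's algorithm, again in Python for brevity rather than the C# the project requires. The function name `strong_components` and the dictionary-based graph are assumptions for illustration; they stand in for the graph class from the C# Repository, whose interface is not shown here. Recursion depth grows with the longest dependency chain, so a very large file set might need an iterative version.

```python
def strong_components(graph):
    """Return the strong components of a directed graph.

    graph: dict mapping each node to an iterable of nodes it depends on.
    Each component is returned as a sorted list of nodes.
    """
    index_of = {}      # discovery order of each visited node
    lowlink = {}       # smallest index reachable from the node's subtree
    stack = []         # nodes visited but not yet assigned to a component
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)

        for succ in graph.get(node, ()):
            if succ not in index_of:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                # succ belongs to the component currently being built
                lowlink[node] = min(lowlink[node], index_of[succ])

        # node is the root of a component: pop the stack down to it
        if lowlink[node] == index_of[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


# Hypothetical dependency graph: A -> B -> C -> A form a cycle, D only uses A.
dependencies = {
    "A.cs": ["B.cs"],
    "B.cs": ["C.cs"],
    "C.cs": ["A.cs"],
    "D.cs": ["A.cs"],
}
print(strong_components(dependencies))
```

In this example A.cs, B.cs, and C.cs are mutually reachable and so form one strong component, while D.cs forms a component by itself because nothing in the cycle depends on it.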