
Code Artistry - Program Scopes and Global Data

Initial Thoughts:

Global data is often said to be "bad". By global, we mean that the data is subject to some form of non-local sharing. Throughout this discussion we take "data" to mean an instance of some language or user defined type, including language primitives like ints.

Most data is shared, otherwise why create it in the first place? However, the scope of this sharing has deep implications for the ease and effectiveness of the code's implementation and maintenance.

Problems with Global Sharing:

These problems make development, testing, and maintenance of code containing global sharing difficult and often quite unproductive.

Scopes:

For purposes of this discussion we will define a package as a single source code file in C# and Java, and, for C and C++, a pair of source code files - header and matching implementation.

Access to data in the four major development languages, C, C++, Java, and C#, depends on scope. These languages define, for each package, a tree of nested scopes using the symbols "{" and "}". Code that resides outside any "{" ... "}" block is said to be in the program's global scope. There are two kinds of scope:

Namespaces can be defined only at the global level. That is, they may be sequenced and nested in the global scope, but may not be nested within any non-namespace scope. They serve to qualify program names to avoid conflicts when code is developed by more than one person, and also to provide a weak semantic structuring for large bodies of code.

In C# and Java, handles to reference types and instances of value types may be declared only as members of a class or struct or locally in a member function. C and C++ allow type instances to be declared in global scope, in a namespace scope, and as members of a class or struct or locally in a global or member function.
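As a minimal C++ sketch of the declaration sites just listed, the following places one int in each scope. All names (programCount, app, Counter, touchAllScopes) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

int programCount = 0;          // global scope: visible to the whole program

namespace app {
  int namespaceCount = 0;      // namespace scope: reached as app::namespaceCount
}

struct Counter {
  int memberCount = 0;         // member scope: one copy per Counter instance

  int bump() {
    int localStep = 1;         // local scope: exists only inside bump()
    memberCount += localStep;
    return memberCount;
  }
};

// Touches data in each scope and returns the sum of the three counters.
int touchAllScopes() {
  ++programCount;
  ++app::namespaceCount;
  Counter c;                   // a fresh instance, so c.memberCount starts at 0
  c.bump();
  return programCount + app::namespaceCount + c.memberCount;
}
```

In Java or C#, only the member and local declarations would be allowed; the first two would have to move inside a class.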

Data Access:

Access to data is controlled by the scope in which it resides:

What do we mean by Global Sharing?

It's relative. Data defined outside any class, struct, or global function is global to the program. Non-static member data is global to the instance of its class or struct. Static member data is global to the set of all instances of its type. Data defined inside the scope of a function is global to all code in the function after the point of its declaration.
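The difference between static and non-static member data can be sketched in a few lines of C++. The Widget type and its members are hypothetical names used only for this illustration.

```cpp
#include <cassert>

struct Widget {
  static int created;          // static member: shared by every Widget
  int id;                      // non-static member: belongs to one instance

  Widget() : id(++created) {}  // each construction updates the shared count
};

int Widget::created = 0;       // single definition of the shared data
```

After constructing two Widgets, each holds its own id, while Widget::created reflects both constructions because all instances see the same static member.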

I expect that most developers and language enthusiasts mean program global access when they use the term "global". Note that a database provides program global sharing of data. Its use is subject to some of the same problems as raw data, although most database facilities provide safe multi-threaded operation.

So What?

The difficulties we encounter with shared data depend largely on its scope of definition. Data defined in a program's global scope is the most likely to cause difficulties in development and maintenance, as cited in "Problems with Global Sharing", above. As the scope of access is reduced, the problems become easier to manage. For local and non-static member data in small functions and classes they are almost non-existent, because the entire scope is designed and implemented by one person at one time in a page or two of code.

Constant data defined in the global scope causes no problems and has semantic value, providing a program-wide name for an invariant. Global data with a single writer has only one specific problem - a value intended for one reader may be overwritten with a value for another reader before the first is read.
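A small C++ sketch of both cases, under assumed names (kMaxRetries, g_message, post, and readForReaderA are all hypothetical): a global constant that is safe to share, and a single-writer global whose value is overwritten before its intended reader sees it.

```cpp
#include <cassert>
#include <string>

constexpr int kMaxRetries = 3;   // program-wide invariant: safe to read anywhere

std::string g_message;           // mutable global with exactly one writer

// The only writer. Each call replaces whatever value was stored before.
void post(const std::string& text) { g_message = text; }

// A reader that expects the value posted for it.
std::string readForReaderA() { return g_message; }
```

If the writer calls post("value for reader A") and then post("value for reader B") before reader A runs, readForReaderA() returns the value meant for reader B. The first value is simply lost.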

Using global data with multiple writers and readers should almost always be avoided because it's just too hard to understand and manage. Java and C# do not allow data to be declared in global scope at all, which rules out even the beneficial use of global invariants.

Sharing through static member data can be very useful. Here's an example.

We've decided to communicate between two processes in our application using enqueued messages. There may be several parts of the sending process's code that need to create and post messages. If they are in different scopes how do they all access the queue?

It would be hard to make a simple elegant design that passed references to the queue into every scope that needs access. If we write a small wrapper class that holds a static instance of the queue and makes it available, the code in any other scope that needs access can simply declare an instance of the wrapper class to get to the queue. If the queue is thread-safe, then the wrapper becomes very easy to use.
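One way the wrapper might look in C++ is sketched below. This is an illustration under assumptions, not the author's implementation: MessageQueue is a deliberately minimal mutex-protected queue standing in for whatever queue the application uses, and QueueAccess, logStartup, and reportError are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>

// Minimal thread-safe queue (assumed stand-in for the real message queue).
class MessageQueue {
public:
  void enqueue(const std::string& msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push(msg);
  }

  // Non-blocking: returns false if the queue is empty.
  bool tryDequeue(std::string& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return false;
    out = items_.front();
    items_.pop();
    return true;
  }

private:
  std::mutex mutex_;
  std::queue<std::string> items_;
};

// Wrapper: every QueueAccess instance hands out the same static queue.
class QueueAccess {
public:
  MessageQueue& queue() { return queue_; }

private:
  static MessageQueue queue_;
};

MessageQueue QueueAccess::queue_;   // the single shared instance

// Two unrelated scopes reach the queue by declaring a local wrapper.
void logStartup()  { QueueAccess access; access.queue().enqueue("started"); }
void reportError() { QueueAccess access; access.queue().enqueue("error"); }
```

Neither function receives the queue as a parameter, yet both post to the same instance, and a consumer that declares its own QueueAccess dequeues "started" followed by "error".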

In C++ we can declare the wrapper as a template type that takes an integral type as a template parameter. That will create separate types for each value of the parameter, and so we can define queues that are shared by different categories of users, e.g., those that need access to an input queue and those that need access to an output queue.
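A minimal sketch of the template variant follows, assuming an int category parameter. SharedQueue, kInput, kOutput, and the two aliases are hypothetical names, and the queue is left unsynchronized to keep the example short, so a real version would need a thread-safe queue.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>

// Each distinct value of Category produces a distinct type,
// and therefore a distinct static queue.
template <int Category>
class SharedQueue {
public:
  void post(const std::string& msg) { queue_.push(msg); }
  std::size_t size() const { return queue_.size(); }

private:
  static std::queue<std::string> queue_;   // one per value of Category
};

// Definition of the static member for every instantiation.
template <int Category>
std::queue<std::string> SharedQueue<Category>::queue_;

constexpr int kInput = 0;    // assumed category values
constexpr int kOutput = 1;

using InputQueue = SharedQueue<kInput>;
using OutputQueue = SharedQueue<kOutput>;
```

A message posted through one InputQueue instance is visible through every other InputQueue instance, while OutputQueue remains empty because it is a different type with its own static member.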

Summary:

Use program global data only for program invariants, i.e., data that is initialized once and read many times. Use static member data to share between instances and to provide access to a facility, like a queue, across program scopes. But be aware of threading issues with static member data. While virtually all programs beyond the "Hello" variety need shared data, we should attempt to minimize and localize its use as much as possible.

Newhouse