XML Graphs in Program Analysis
Anders Møller | Michael I. Schwartzbach

invited for PEPM 2007

XML graphs have shown to be a simple and effective formalism for representing sets of XML documents in program analysis. It has evolved through a six year period with variants tailored for a range of applications. We present a unified definition, outline the key properties including validation of XML graphs against different XML schema languages, and provide a software package that enables others to make use of these ideas. We also survey four very different applications: XML in Java, Java Servlets and JSP, transformations between XML and non-XML data, and XSLT.

PDF

A Domain-Specific Programming Language for Secure Multiparty Computation
Janus Dam Nielsen | Michael I. Schwartzbach

to appear in PLAS 2007

We present a domain-specific programming language for Secure Multiparty Computation (SMC). Information is a resource of vital importance and considerable economic value to individuals, public administration, and private companies. This means that the confidentiality of information is crucial, but at the same time significant value can often be obtained by combining confidential information from various sources. This fundamental conflict between the benefits of confidentiality and the benefits of information sharing may be overcome using the cryptographic method of SMC where computations are performed on secret values and results are only revealed according to specific protocols. We identify the key linguistic concepts of SMC and bridge the gap between high-level security requirements and low-level cryptographic operations constituting an SMC platform, thus improving the efficiency and security of SMC application development. The language is implemented in a prototype compiler that generates Java code exploiting a distributed cryptographic runtime.

PDF

A Singular Choice for Multiple Choice
Gudmund S. Frandsen | Michael I. Schwartzbach

in SIGCSE Bulletin, Vol 38(4), 2006

How should multiple choice tests be scored and graded, in particular when students are allowed to check several boxes to convey partial knowledge? Many strategies may seem reasonable, but we demonstrate that five self-evident axioms are sufficient to determine completely the correct strategy. We also discuss how to measure robustness of the obtained grades. Our results have practical advantages and also suggest criteria for designing multiple choice questions.

PDF

Static Validation of XSL Transformations
Anders Møller | Mads Østerby Olesen | Michael I. Schwartzbach

to appear in TOPLAS

XSL Transformations (XSLT) is a programming language for defining transformations between XML languages. The structure of these languages is formally described by schemas, for example using DTD, which allows individual documents to be validated. However, existing XSLT tools offer no static guarantees that, under the assumption that the input is valid relative to the input schema, the output of the transformation is valid relative to the output schema.

We present a validation technique for XSLT based on the summary graph formalism introduced in the static analysis of JWIG Web services. Being able to provide static guarantees, we can detect a large class of errors in an XSLT stylesheet at the time it is written instead of later when it has been deployed, and thereby provide benefits similar to those of static type checkers for modern programming languages.

Our analysis takes a pragmatic approach that focuses its precision on the essential language features but still handles the entire XSLT 1.0 language. We evaluate the analysis precision on a range of real stylesheets and demonstrate how it may be useful in practice.

PDF

Contracts for Cooperation between Web Service Programmers and HTML Designers
Henning Böttger | Anders Møller | Michael I. Schwartzbach

in Journal of Web Engineering, Vol 5(1), 2006

Interactive Web services consist of a mixture of HTML fragments and program code. The fragments, which are maintained by designers, are combined to form HTML pages that are shown to the clients. The code, which is maintained by programmers, is executed on the server to handle the business logic. Current Web service frameworks provide little help in separating these constituents, which complicates cooperation between programmers and HTML designers.

We propose a system based on XML templates and formalized contracts allowing a flexible separation of concerns. The contracts act as interfaces between the programmers and the HTML designers and permit tool support for statically checking that both parties fulfill their obligations. This ensures that (1) programmers and HTML designers work more independently focusing on their own expertises, (2) the Web service implementation is better structured and thus easier to develop and maintain, (3) it is guaranteed that only valid HTML is sent to the clients even though it is constructed dynamically, (4) the programmer uses the XML templates consistently, and (5) the form input fields being sent to the client always match the code receiving those values. Additionally, we describe tools that aid in the construction and management of contracts and XML templates.

PDF

Dual Syntax for XML Languages
Claus Brabrand | Anders Møller | Michael I. Schwartzbach

to appear in Information Systems, 2007

XML is successful as a machine processable data interchange format, but it is often too verbose for human use. For this reason, many XML languages permit an alternative more legible non-XML syntax. XSLT stylesheets are often used to convert from the XML syntax to the alternative syntax; however, such transformations are not reversible since no general tool exists to automatically parse the alternative syntax back into XML.

We present XSugar, which makes it possible to manage dual syntax for XML languages. An XSugar specification is built around a context-free grammar that unifies the two syntaxes of a language. Given such a specification, the XSugar tool can translate from alternative syntax to XML and vice versa. Moreover, the tool statically checks that the transformations are reversible and that all XML documents generated from the alternative syntax are valid according to a given XML schema.

PDF

The Design Space of Type Checkers for XML Transformation Languages
Anders Møller | Michael I. Schwartzbach

ICDT 2005

We survey work on statically type checking XML transformations, covering a wide range of notations and ambitions. The concept of type may vary from idealizations of DTD to full-blown XML Schema or even more expressive formalisms. The notion of transformation may vary from clean and simple transductions to domain-specific languages or integration of XML in general-purpose programming languages. Type annotations can be either explicit or implicit, and type checking ranges from exact decidability to pragmatic approximations. We characterize and evaluate existing tools in this design space, including a recent result of the authors providing practical type checking of full unannotated XSLT 1.0 stylesheets given general DTDs that describe the input and output languages.

PDF

Static Analysis of XML Transformations in Java
Christian Kirkegaard | Anders Møller | Michael I. Schwartzbach

in IEEE Transactions on Software Engineering, Vol 30(3), 2004

XML documents generated dynamically by programs are typically represented as text strings or DOM trees. This is a low-level approach for several reasons: 1) traversing and modifying such structures can be tedious and error prone; 2) although schema languages, e.g. DTD, allow classes of XML documents to be defined, there are generally no automatic mechanisms for statically checking that a program transforms from one class to another as intended.

We introduce Xact, a high-level approach for Java using XML templates as a first-class data type with operations for manipulating XML values based on XPath. In addition to an efficient runtime representation, the data type permits static type checking using DTD schemas as types. By specifying schemas for the input and output of a program, our analysis algorithm will statically verify that valid input data is always transformed into valid output data and that the operations are used consistently.

PDF

Precise Analysis of String Expressions
Aske Simon Christensen | Anders Møller | Michael I. Schwartzbach

SAS'03

We perform static analysis of Java programs to answer a simple question: which values may occur as results of string expressions? The answers are summarized for each expression by a regular language that is guaranteed to contain all possible values. We present several applications of this analysis, including statically checking the syntax of dynamically generated expressions, such as SQL queries.

Our analysis constructs flow graphs from class files and generates a context-free grammar with a nonterminal for each string expression. The language of this grammar is then widened into a regular language through a variant of an algorithm previously used for speech recognition.

The collection of resulting regular languages is compactly represented as a special kind of multi-level automaton from which individual answers may be extracted. If a program error is detected, examples of invalid strings are automatically produced. We present extensive benchmarks demonstrating that the analysis is efficient and produces results of useful precision.

PDF

The METAFRONT System: Extensible Parsing and Transformation
Claus Brabrand | Michael I. Schwartzbach | Mads Vangaard

LDTA 2003

We present the metafront tool for specifying flexible, safe, and efficient syntactic transformations between languages defined by context-free grammars. The transformations are guaranteed to terminate and to map grammatically legal input to grammatically legal output.

We rely on a novel parser algorithm that is designed to support gradual extensions of a grammar by allowing productions to remain in a natural style and by statically reporting ambiguities and errors in terms of individual productions as they are being added.

Our tool may be used as a parser generator in which the resulting parser automatically supports a flexible, safe, and efficient macro processor, or as an extensible lightweight compiler generator for domain-specific languages. We show substantial examples of both kinds.

PDF

Static Analysis for Dynamic XML
Aske Simon Christensen | Anders Møller | Michael I. Schwartzbach

PLAN-X 2002

We describe the summary graph lattice for dataflow analysis of programs that dynamically construct XML documents. Summary graphs have successfully been used to provide static guarantees in the JWIG language for programming interactive Web services. In particular, the JWIG compiler is able to check validity of dynamically generated XHTML documents and to type check dynamic form data. In this paper we present summary graphs and indicate their applicability for various scenarios. We also show that summary graphs have exactly the same expressive power as the regular expression types from XDuce, but that the extra structure in summary graphs makes them more suitable for certain program analyses.

PDF

Extending Java for High-Level Web Service Construction
Aske Simon Christensen | Anders Møller | Michael I. Schwartzbach

in TOPLAS, Vol 25(6), 2003

We incorporate innovations from the <bigwig> project into the Java language to provide high-level features for Web service programming. The resulting language, JWIG, contains an advanced session model and a flexible mechanism for dynamic construction of XML documents, in particular XHTML. To support program development we provide a suite of program analyses that at compile-time verify for a given program that no run-time errors can occur while building documents or receiving form input, and that all documents being shown are valid according to the document type definition for XHTML 1.0.

We compare JWIG with Servlets and JSP which are widely used Web service development platforms. Our implementation and evaluation of JWIG indicate that the language extensions can simplify the program structure and that the analyses are sufficiently fast and precise to be practically useful.

PDF

Growing Languages with Metamorphic Syntax Macros
Claus Brabrand | Michael I. Schwartzbach

PEPM'02

We present our experiences with a syntax macro language which we claim forms a general abstraction mechanism for growing (domain-specific) extensions of programming languages. Our syntax macro language is designed to guarantee type safety and termination. A concept of metamorphisms allows the arguments of a macro to be inductively defined in a meta level grammar and morphed into the host language. We also show how the metamorphisms can be made to operate simultaneously on multiple parse trees at once. The result is a highly flexible mechanism for growing new language constructs without resorting to compile-time programming. In fact, whole new languages can be defined at surprisingly low cost. This work is fully implemented as part of the <bigwig> system for defining interactive Web services, but could find use in many other languages.

PDF

Language-Based Caching of Dynamically Generated HTML
Claus Brabrand | Anders Møller | Steffan Olesen | Michael I. Schwartzbach

World Wide Web Journal, Vol 5(4), 2002

Increasingly, HTML documents are dynamically generated by interactive Web services. To ensure that the client is presented with the newest versions of such documents it is customary to disable client caching causing a seemingly inevitable performance penalty. In the <bigwig> system, dynamic HTML documents are composed of higher-order templates that are plugged together to construct complete documents. We show how to exploit this feature to provide an automatic fine-grained caching of document templates, based on the service source code. A <bigwig> service transmits not the full HTML document but instead a compact JavaScript recipe for a client-side construction of the document based on a static collection of fragments that can be cached by the browser in the usual manner. We compare our approach with related techniques and demonstrate on a number of realistic benchmarks that the size of the transmitted data and the latency may be reduced significantly.

PDF

 
Static Validation of Dynamically Generated HTML
Claus Brabrand | Anders Møller | Michael I. Schwartzbach

PASTE'01

We describe a static analysis of <bigwig> programs that efficiently decides if all dynamically computed XHTML documents presented to the client will validate according to the official DTD. We employ two interprocedural flow analyses to construct a graph summarizing the possible documents. This graph is subsequently analyzed to determine validity of those documents. By evaluating the technique on a number of realistic benchmarks, we demonstrate that it is sufficiently fast and precise to be practically useful.

PDF

 
The Pointer Assertion Logic Engine
Anders Møller | Michael I. Schwartzbach

PLDI'01

We present a new framework for verifying partial specifications of programs in order to catch type and memory errors and check data structure invariants. Our technique can verify a large class of data structures, namely all those that can be expressed as graph types. Earlier versions were restricted to simple special cases such as lists or trees. Even so, our current implementation is as fast as the previous specialized tools.

Programs are annotated with partial specifications expressed in Pointer Assertion Logic, a new notation for expressing properties of the program store. We work in the logical tradition by encoding the programs and partial specifications as formulas in monadic second-order logic. Validity of these formulas is checked by the MONA tool, which also can provide explicit counterexamples to invalid formulas.

To make verification decidable, the technique requires explicit loop and function call invariants. In return, the technique is highly modular: every statement of a given program is analyzed only once.

The main target applications are safety-critical data-type algorithms, where the cost of annotating a program with invariants is justified by the value of being able to automatically verify complex properties of the program.

PDF

 
The <bigwig> Project
Claus Brabrand | Anders Møller | Michael I. Schwartzbach

Transaction on Internet Technology, Vol 2(2), 2002

We present the results of the <bigwig> project, which aims to design and implement a high-level domain-specific language for programming interactive Web services.

A fundamental aspect of the development of the World Wide Web during the last decade is the gradual change from static to dynamic generation of Web pages. Generating Web pages dynamically in dialogue with the client has the advantage of providing up-to-date and tailor-made information. The development of systems for constructing such dynamic Web services has emerged as a whole new research area.

The <bigwig> language is designed by analyzing its application domain and identifying fundamental aspects of Web services inspired by problems and solutions in existing Web service development languages. The core of the design consists of a session-centered service model together with a flexible template-based mechanism for dynamic Web page construction. Using specialized program analyses, certain Web specific properties are verified at compile-time, for instance that only valid HTML 4.01 is ever shown to the clients. In addition, the design provides high-level solutions to form field validation, caching of dynamic pages, and temporal-logic based concurrency control, and it proposes syntax macros for making highly domain-specific languages.

The language is implemented via widely available Web technologies, such as Apache on the server-side and JavaScript and Java Applets on the client-side. We conclude with experience and evaluation of the project.

PDF

 
MONA Implementation Secrets
Nils Klarlund | Anders Møller | Michael I. Schwartzbach

CIAA 2000

The MONA tool provides an implementation of the decision procedures for the logics WS1S and WS2S. It has been used for numerous applications, and it is remarkably efficient in practice, even though it faces a theoretically non-elementary worst-case complexity. The implementation has matured over a period of six years. Compared to the first naive version, the present tool is faster by several orders of magnitude. This speedup is obtained from many different contributions working on all levels of the compilation and execution of formulas. We present a selection of implementation "secrets" that have been discovered and tested over the years, including formula reductions, DAGification, guided tree automata, three-valued logic, eager minimization, BDD-based automata representations, and cache-conscious data structures. We describe these techniques and quantify their respective effects by experimenting with separate versions of the MONA tool that in turn omit each of them.

PDF

 
PowerForms: Declarative Client-Side Form Field Validation
Claus Brabrand | Anders Møller | Mikkel Ricky | Michael I. Schwartzbach

World Wide Web Journal, Vol 3(4), 2000

All uses of HTML forms may benefit from validation of the specified input field values. Simple validation matches individual values against specified formats, while more advanced validation may involve interdependencies of form fields. There is currently no standard for specifying or implementing such validation. Today, CGI programmers often use Perl libraries for simple server-side validation or program customized JavaScript solutions for client-side validation.

We present PowerForms, which is an add-on to HTML forms that allows a purely declarative specification of input formats and sophisticated interdependencies of form fields. While our work may be seen as inspiration for a future extension of HTML, it is also available for CGI programmers today through a preprocessor that translates a PowerForms document into a combination of standard HTML and JavaScript that works on all combinations of platforms and browsers.

The definitions of PowerForms formats are syntactically disjoint from the form itself, which allows a modular development where the form is perhaps automatically generated by other tools and the formats and interdependencies are added separately.

PowerForms has a clean semantics defined through a fixed-point process that resolves the interdependencies between all field values. Text fields are equipped with status icons (by default traffic lights) that continuously reflect the validity of the text that has been entered so far, thus providing immediate feed-back for the user. For other GUI components the available options are dynamically filtered to present only the allowed values.

PowerForms are integrated into the <bigwig> system for generating interactive Web services, but is also freely available in an Open Source distribution as a stand-alone package.

PDF

 
DSD: A Schema Language for XML
Nils Klarlund | Anders Møller | Michael I. Schwartzbach

FMSP'00

We present the DSD (Document Structure Description) language, which is a schema language for XML documents. A DSD is itself an XML document, which describes a class of XML application documents.

The expressiveness of DSDs goes far beyond the simple DTD concept, covering advanced features such as conditional constraints, multiple nonterminals for each element, gradual declaration of attributes, support for both ordered and unordered contents, and constraints on reference targets. We also support a declarative mechanism for inserting default elements and attributes that is reminiscent of Cascading Style Sheets. Finally, we include a simple technique for evolving DSD documents through selective redefinitions.

The DSD language is completely self-describable, meaning that the syntax of legal DSD documents together with all static requirements are captured in a special DSD document, the meta-DSD of less than 500 lines.

PDF

 
Compile-Time Debugging of C Programs Working on Trees
Jacob Elgaard | Anders Møller | Michael I. Schwartzbach

ESOP'00

We exhibit a technique for automatically verifying the safety of simple C programs working on tree-shaped data structures. We do not consider the complete behavior of programs, but only attempt to verify that they respect the shape and integrity of the store. A verified program is guaranteed to preserve the tree-shapes of data structures, to avoid pointer errors such as NULL dereferences, leaking memory, and dangling references, and furthermore to satisfy assertions specified in a specialized store logic.

A program is transformed into a single formula in WSRT, a novel extension of WS2S that is decided by the MONA tool. This technique is complete for loop-free code, but for loops and recursive functions we rely on Hoare-style invariants. A default well-formedness invariant is supplied and can be strengthened as needed by programmer annotations. If a program fails to verify, a counterexample in the form of an initial store that leads to an error is automatically generated.

This extends previous work that uses a similar technique to verify a simpler syntax manipulating only list structures. In that case, programs are translated into WS1S formulas. A naive generalization to recursive data-types determines an encoding in WS2S that leads to infeasible computations. To obtain a working tool, we have extended MONA to directly support recursive structures using an encoding that provides a necessary state-space factorization. This extension of MONA defines the new WSRT logic together with its decision procedure.

PDF

 
A Type System for Dynamic Web Documents
Anders Sandholm | Michael I. Schwartzbach

POPL'00

Many interactive web services use the CGI interface for communication with clients. They will dynamically create HTML documents that are presented to the client who then resumes the interaction by submitting data through incorporated form fields. This protocol is difficult to statically type-check if the dynamic document are created by arbitrary script code using printf statements. Previous proposals have suggested using static document templates which trades flexibility for safety. We propose a notion of typed, higher-order templates that simultaneously achieve flexibility and safety. Our type system is based on a flow analysis of which we prove soundness. We present an efficient runtime implementation that respects the semantics of only well-typed programs. This work is fully implemented as part of the <bigwig> system for defining interactive web services.

PDF

 
A Runtime System for Interactive Web Services
Claus Brabrand | Anders Møller | Anders Sandholm | Michael I. Schwartzbach

WWW8, 1999

Interactive web services are increasingly replacing traditional static web pages. Producing web services seems to require a tremendous amount of laborious low-level coding due to the primitive nature of CGI programming. We present ideas for an improved runtime system for interactive web services built on top of CGI running on virtually every combination of browser and HTTP/CGI server. The runtime system has been implemented and used extensively in <bigwig>, a tool for producing interactive web services.

PDF

 
YakYak: Parsing with Logical Side Constraints
Niels Damgaard | Nils Klarlund | Michael I. Schwartzbach

DLT'99

Programming language syntax is often described by means of a context-free grammar that is further restricted by some (logical) side constraints. This division is necessary if the intended syntax is not in a tractable grammar class, or if it would simply become too large and unintuitive without constraints. We present the tool YakYak, which extends Yacc with side constraints that are concisely phrased in a first-order parse tree logic. We show how intensive tree automata calculations turn parse tree formulas into canonical attribute grammars. YakYak is implemented as a preprocessor for Yacc, in which the transitions of the tree automata are automatically merged into the action code of productions. Our technique eliminates the explicit encoding of information flow necessary in attribute grammars or handwritten tree-walking procedures. We provide both practical experience and theoretical results establishing that the YakYak approach works on large grammars.

PDF

 
Distributed Safety Controllers for Web Services
Anders Sandholm | Michael I. Schwartzbach

FASE'98

We show how to use high-level synchronization constraints, written in a version of monadic second-order logic on finite strings, to synthesize safety controllers for interactive web services. We improve on the naïve runtime model to avoid state-space explosions and to increase the flow capacities of services.

PDF

 
A Domain-Specific Language for Regular Sets of Strings and Trees
Nils Klarlund | Michael I. Schwartzbach

DSL'97

We propose a new high-level programming notation, called FIDO, that we have designed to concisely express regular sets of strings or trees. In particular, it can be viewed as a domain-specific language for the expression of finite-state automata on large alphabets (of sometimes astronomical size).
FIDO is based on a combination of mathematical logic and programming language concepts. This combination shares no similarities with usual logic programming languages. FIDO compiles into finite-state string or tree automata, so there is no concept of run-time. It has already been applied to a variety of problems of considerable complexity and practical interest.
In the present paper, we motivate the need for a language like FIDO, and discuss our design and its implementation.
We show how recursive data types, unification, implicit coercions, and subtyping can be merged with a variation of predicate logic, called the Monadic Second-order Logic (M2L) on trees. FIDO is translated first into pure M2L via suitable encodings, and finally into finite-state automata through the MONA tool.

PDF

 
Automatic Verification of Pointer Programs using Monadic Second-Order Logic
Jakob L. Jensen | Michael E. Jørgensen | Nils Klarlund | Michael I. Schwartzbach

PLDI'97

We present a technique for automatic verification of pointer programs based on a decision procedure for the monadic second-order logic on finite strings.
We are concerned with a while-fragment of Pascal, which includes recursively-defined pointer structures but excludes pointer arithmetic.
We define a logic of stores with interesting basic predicates such as pointer equality, tests for nil pointers and garbage cells, as well as reachability along pointers.
We present a complete decision procedure for Hoare triples based on this logic over loop-free code. Combined with explicit loop invariants, the decision procedure allows us to answer surprisingly detailed questions about small but non-trivial programs. If a program fails to satisfy a certain property, then we can automatically supply an initial store that provides a counterexample.
Our technique has been fully and efficiently implemented for linear linked lists, and extends in principle to arbitrary tree structures. The resulting system can be used to verify extensive properties of smaller pointer programs and could be particularly useful in a teaching environment.

PDF

 
Formal Design Constraints
Nils Klarlund | Jari Koistinen | Michael I. Schwartzbach

OOPSLA'96

Large software systems are often built on system platforms that support or enforce specific characteristics of the source code or actual design. These characteristics are either captured informally in design guideline documents or in specialized design and implementation languages.
In our view, both approaches are unsatisfactory. Informal descriptions do not allow automated analysis and lead to vague constraint descriptions. The language-based approach leads to different languages for different platforms and even for different versions of the same basic platform.
Our approach is to describe and name the constraints separately in a design constraint language called CDL, which is based on an extraordinarily concise logic of parse trees. Designs are then annotated with the names of the constraints they are supposed to satisfy.
We discuss how the design constraint language is integrated into a design language environment. We exhibit industrial and experimental evidence that our choice of design constraint language allows us to formalize naturally and succinctly common design characteristics.

PDF

 
Type Inference of Turbo Pascal
Ole I. Hougaard | Michael I. Schwartzbach | Hosein Askari

Software - Concepts and Tools 1995 (16) 160-169

Type inference is generally thought of as being an exclusive property of the functional programming paradigm. We argue that such a feature may be of significant benefit for also standard imperative languages. We present a working tool providing these benefits for a full version of Turbo Pascal. It has the form of a preprocessor that analyzes programs in which the type annotations are only partial or even absent. The resulting program has full type annotations, will be accepted by the standard Turbo Pascal compiler, and has polymorphic use of procedures resolved by means of code expansion.

PDF

 
Efficient Recursive Subtyping
Dexter Kozen | Jens Palsberg | Michael I. Schwartzbach

Mathematical Structures in Computer Science Vol 5(1), 1995; also presented at POPL'93

Subtyping in the presence of recursive types for the lambda-calculus was studied by Amadio and Cardelli in 1991. In that paper they showed that the problem of deciding whether one recursive type is a subtype of another is decidable in exponential time.
In this paper we give an O(n²) algorithm. Our algorithm is based on a simplification of the definition of the subtype relation, which allows us to reduce the problem to the emptiness problem for a certain finite automaton with quadratically many states.
It is known that equality of recursive types and the covariant Böhm order can be decided efficiently by means of finite automata, since they are just language equality and language inclusion, respectively. Our results extend the automata-theoretic approach to handle orderings based on contravariance.

PDF

 
Safety Analysis versus Type Inference
Jens Palsberg | Michael I. Schwartzbach
Information and Computation Vol. 118, No. 1, 1995

Safety analysis is an algorithm for determining if a term in an untyped lambda calculus with constants is safe, i.e., if it does not cause an error during evaluation. This ambition is also shared by algorithms for type inference. Safety analysis and type inference are based on rather different perspectives, however. Safety analysis is global in that it can only analyze a complete program. In contrast, type inference is local in that it can analyze pieces of a program in isolation.
In this paper we prove that safety analysis is sound, relative to both a strict and a lazy operational semantics. We also prove that safety analysis accepts strictly more safe lambda terms than does type inference for simple types. The latter result demonstrates that global program analyses can be more precise than local ones.

PDF

 
Injectivity of Composite Functions
Kim Skak Larsen | Michael I. Schwartzbach

Journal of Symbolic Computation 1994 (17), 393-408

The problem of deciding injectivity of functions is addressed. The functions under consideration are compositions of more basic functions for which information about injectivity properties is available. We present an algorithm which will often be able to prove that such a composite function is injective. This algorithm constructs a set of propositional Horn clause axioms from the function specification and the available information about the basic functions. The existence of a proof of injectivity is then reduced to the problem of propositional Horn clause deduction. Dowling and Gallier have designed several very fast algorithms for this problem, the efficiency of which our algorithm inherits. The proof of correctness of the algorithm amounts to showing soundness and completeness of the generated Horn clause axioms.

PDF

 
Static Typing for Object-Oriented Programming
Jens Palsberg | Michael I. Schwartzbach

Science of Computer Programming 23 (1994) 19-53

We develop a theory of statically typed object-oriented languages. It represents classes as labeled, regular trees, types as finite sets of classes, and subclassing as a partial order on trees. We show that our subclassing order strictly generalizes inheritance, and that a novel genericity mechanism arises as an order-theoretic complement. This mechanism, called class substitution, is pragmatically useful and can be implemented efficiently.

PDF

 
Static Correctness of Hierarchical Procedures
Michael I. Schwartzbach

Theoretical Computer Science 156 (1996) 177-201; also presented at ICALP'90

A system of hierarchical, fully recursive types in a truly imperative language allows program fragments written for small types to be reused for all larger types. To exploit this property to enable type-safe hierarchical procedures, it is necessary to impose a static requirement on procedure calls. We introduce an example language and prove the existence of a sound requirement which preserves static correctness while allowing hierarchical procedures. This requirement is further shown to be optimal, in the sense that it imposes as few restrictions as possible. This establishes the theoretical basis for a general type hierarchy with static type checking, which enables 1st order polymorphism combined with multiple inheritance and specialization in a language with assignments.

We extend the results to include opaque types. An opaque version of a type is different from the original but has the same values and the same order relations to other types. The opaque types allow a more flexible polymorphism and provide the usual pragmatic advantages of distinguishing between intended and unintended type equalities. Opaque types can be viewed as a compromise between synonym types and abstract types.

PDF

 
Efficient Inference of Partial Types
Dexter Kozen | Jens Palsberg | Michael I. Schwartzbach

Journal of Computer and Systems Sciences Vol. 49, No. 2, 1994; also presented at FOCS'92

Partial types for the lambda-calculus were introduced by Thatte in 1988 as a means of typing objects that are not typable with simple types, such as heterogeneous lists and persistent data. In that paper he showed that type inference for partial types was semidecidable. Decidability remained open until quite recently, when O'Keefe and Wand gave an exponential time algorithm for type inference.
In this paper we give an O(n³) algorithm. Our algorithm constructs a certain finite automaton that represents a canonical solution to a given set of type constraints. Moreover, the construction works equally well for recursive types; this solves an open problem stated by O'Keefe and Wand.

PDF

 
Graph Types
Nils Klarlund | Michael I. Schwartzbach

POPL'93

Recursive data structures are abstractions of simple records and pointers. They impose a shape invariant, which is verified at compile-time and exploited to automatically generate code for building, copying, comparing, and traversing values without loss of efficiency. However, such values are always tree shaped, which is a major obstacle to practical use.
We propose a notion of graph types, which allow common shapes, such as doubly-linked lists or threaded trees, to be expressed concisely and efficiently. We define regular languages of routing expressions to specify relative addresses of extra pointers in a canonical spanning tree. An efficient algorithm for computing such addresses is developed. We employ a second-order monadic logic to decide well-formedness of graph type specifications. This logic can also be used for automated reasoning about pointer structures.

PDF

 
Graphs and Decidable Transductions Based on Edge Constraints
Nils Klarlund | Michael I. Schwartzbach

CAAP'94

We give examples to show that not even edNCE, the most general known notion of context-free graph grammar, is suited for the specification of some common data structures.
To overcome this problem, we use monadic second-order logic and introduce edge constraints as a new means of specifying a large class of graph families. Our notion stems from a natural dichotomy found in programming practice between ordinary pointers forming spanning trees and auxiliary pointers cutting across.
Our main result is that for certain transformations of graphs definable in monadic second-order logic, the question of whether a graph family given by a specification A is mapped to a family given by a specification B is decidable. Thus a decidable Hoare logic arises.

PDF

 
Type Inference of Self
Ole Agesen | Jens Palsberg | Michael I. Schwartzbach

Software Practice and Experience 25(9) 975-995, 1995; also presented at ECOOP'93

We have designed and implemented a type inference algorithm for the full Self language. The algorithm can guarantee the safety and disambiguity of message sends, and provide useful information for browsers and optimizing compilers.
Self features objects with dynamic inheritance. This construct has until now been considered incompatible with type inference because it allows the inheritance graph to change dynamically. Our algorithm handles this by deriving and solving type constraints that simultaneously define supersets of both the possible values of expressions and of the possible inheritance graphs. The apparent circularity is resolved by computing a global fixed-point, in polynomial time.
The algorithm has been implemented and can successfully handle the Self benchmark programs, which exist in the ``standard Self world'' of more than 40,000 lines of code.

PDF

 
Binding Time Analysis: Abstract Interpretation versus Type Inference
Jens Palsberg | Michael I. Schwartzbach

ICCL'94

Binding-time analysis is important in partial evaluators. Its task is to determine which parts of a program can be evaluated if some of the expected input is known. Two approaches to do this are abstract interpretation and type inference.
We compare two specific such analyses to see which one determines most program parts to be eliminable. The first is a an abstract interpretation approach based on closure analysis and the second is the type inference approach of Gomard and Jones. Both apply to the pure lambda-calculus. We prove that the abstract interpretation approach is more powerful than that of Gomard and Jones: the former determines the same and possibly more program parts to be eliminable as the latter.

PDF

 
Interpretations of Recursively Defined Types
Michael I. Schwartzbach

Theoretical Computer Science, Vol 106(1), 1993

We consider a type system where types are labeled, regular trees. Equipped with a type ordering, it forms the basis for a polymorphic, imperative programming language. This paper studies interpretations, which are homomorphic, monotonic functions from types to sets of values. We show that they form a partial order with a minimal and a maximal element, and various characterizations of other interpretations are provided. We also briefly consider a unification of types and values.

PDF

A New Formalism for Relational Algebra
Kim Skak Larsen | Michael I. Schwartzbach | Erik M. Schmidt

Information Processing Letters, 41(3):163-168, 1992

We present a single relational operator which, in combination with a simple core language for manipulating atoms and tuples, generalizes all standard unary and binary operators in relational databases, while permitting a more intuitive query style. The new operator, factor, is based on a unique factorization of relations. We present an example language and demonstrate how the usual operators appear as simple and intuitive instances of factor. We further show that many new operators and combinations of old ones can be expressed in concise terms using factor. The factor versions will always be evaluated as efficiently as the originals and will sometimes even lead to a speed-up.

PDF

Type Inference with Inequalities
Michael I. Schwartzbach | CAAP '91

Type inference can be phrased as constraint-solving over types. We consider an implicitly typed language equipped with recursive types, multiple inheritance, 1st order parametric polymorphism, and assignments. Type correctness is expressed as satisfiability of a possibly infinite collection of (monotonic) inequalities on the types of variables and expressions. A general result about systems of inequalities over semilattices yields a solvable form. We distinguish between deciding typability (the existence of solutions) and type inference (the computation of a minimal solution). In our case, both can be solved by means of nondeterministic finite automata; unusually, the two problems have different complexities: polynomial vs. exponential time.

PDF