Model Assisted Software Development

This document and its companions in this folder describe MASD as a methodology — an observation of the external world. They make no reference to any particular product. For how a concrete product instantiates these concepts, see Applied MASD.

MASD is the Model Assisted Software Development methodology developed as part of a PhD programme and documented in full on the Dogen site. Its canonical reference implementation is the code generator Dogen (now decommissioned); the methodology itself outlives the tool. MASD's central claim is that the schematic, repetitive parts of a software system can be treated as projections of a small, language-neutral model, and that those projections can be reconstructed automatically.

The conceptual model

MASD starts from the premise that a software project is a collection of physical artefacts (files and directories) that live on a filesystem, and that these artefacts can be understood as projections from a higher-dimensional logical representation. The distinction is adapted from Lakos's Large-Scale C++ Software Design:

Logical design addresses all the type-related aspects of design — classes and their relationships. Physical design addresses the placement of logical entities, such as classes and functions, into physical ones, such as files and libraries. All design has a physical aspect.

Figure 1 (Figure 3 in Dogen): transforms from logical representation to filesystem. Source: Domain Architecture.

MASD partitions its problem domain into three orthogonal domains, each formalised by its own metamodel, plus a thin input representation:

  • The logical domain — the language-neutral meta-model of domain concepts (classes, enumerations, primitives…). Formalised by the Logical Metamodel (LMM). See Logical Space.
  • The physical domain — the file and folder artefacts that concepts are rendered into. Formalised by the Physical Metamodel (PMM). See Physical Space.
  • The variability domain — the (non-structural) configuration that decides which artefacts are produced and how they are named. Formalised by the Variability Metamodel (VMM). See Variability.
  • The codec model — the simplest external input representation, carrying stereotypes that route each element into its LMM meta-type.

The Logical-Physical Space (LPS)

The LMM, PMM and VMM are distinct dimensions of a single multidimensional space — the Logical-Physical Space — and "only make sense when viewed as a whole". MASD plays a dual role over this space: it defines the composition of all dimensions (their metamodels, elements, and associations), and it defines the framework of projections between them. The reference implementation is the canonical realisation of both.

Figure 2 (Figure 42 in Dogen): high-level view of the MASD Logical-Physical Space. Source: Domain Architecture.

Projections

A projection is a function that takes a point in logical space and maps it to a set of points in physical space. The same logical entity projects into many physical artefacts: a class becomes a header, an implementation file, a serialisation routine, an ORM mapping, and so on — each a distinct expression of the same underlying logical truth. Changing the logical entity (adding a field) propagates to every projection simultaneously through regeneration.
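As a rough illustration of the paragraph above, a projection can be sketched as an ordinary function from one logical entity to a set of artefacts. The names `LogicalClass` and `project`, the two output paths, and the choice of `int` and `INTEGER` for every field are assumptions made for this sketch only; they are not taken from MASD or Dogen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalClass:
    """A point in logical space: a class name and its fields (hypothetical shape)."""
    name: str
    fields: tuple[str, ...]


def project(entity: LogicalClass) -> dict[str, str]:
    """Map one logical entity to several physical artefacts (path -> content)."""
    members = "".join(f"    int {f};\n" for f in entity.fields)
    columns = ", ".join(f"{f} INTEGER" for f in entity.fields)
    return {
        f"include/{entity.name}.hpp": f"struct {entity.name} {{\n{members}}};\n",
        f"sql/{entity.name}.sql": f"CREATE TABLE {entity.name} ({columns});\n",
    }


before = project(LogicalClass("account", ("id",)))
after = project(LogicalClass("account", ("id", "balance")))

# Adding a field changes every projection of the entity on regeneration.
changed = sorted(path for path in after if after[path] != before[path])
print(changed)  # ['include/account.hpp', 'sql/account.sql']
```

Because both artefacts are derived from the same `LogicalClass`, a single change to the logical entity shows up in each of them after regeneration, which is the propagation property described above.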

The projection pathway runs through the dimensions of the LPS:

  1. Codec representation — the simplest input form; stereotypes determine routing (e.g. masd::object → the structural object meta-type).
  2. Logical model — an instantiation of the LMM.
  3. Physical model — regions and archetypes of the PMM.
  4. Filesystem — the rendered files and directories.

Each step is parameterised by non-structural variability (naming, file extensions, feature enablement) drawn from the VMM.
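The four-step pathway can be sketched as a chain of small functions, with the variability settings passed in rather than hard-coded. Everything below is a simplified assumption for illustration: the dictionary shapes, the function names, and the `variability` keys are hypothetical, and only the `masd::object` stereotype and the two archetype addresses come from this document.

```python
# Stereotype routing; only masd::object is named in the text.
STEREOTYPE_TO_META_TYPE = {"masd::object": "object"}


def codec_to_logical(codec_elements):
    """Route each codec element to a logical meta-type via its stereotype."""
    return [
        {"name": e["name"], "meta_type": STEREOTYPE_TO_META_TYPE[e["stereotype"]]}
        for e in codec_elements
    ]


def logical_to_physical(logical, variability):
    """Expand each logical element into one artefact per enabled archetype."""
    return [
        {"element": el["name"], "archetype": arch}
        for el in logical
        for arch in variability["enabled_archetypes"]
    ]


def physical_to_filesystem(physical, variability):
    """Render each artefact to a relative path and its text."""
    files = {}
    for art in physical:
        part = art["archetype"].split(".")[1]  # second address segment
        ext = variability["extensions"][art["archetype"]]
        files[f"{part}/{art['element']}{ext}"] = f"// generated from {art['archetype']}\n"
    return files


# Hypothetical non-structural configuration: what is enabled and how files are named.
variability = {
    "enabled_archetypes": [
        "cpp.include.types.class_header",
        "cpp.src.types.class_implementation",
    ],
    "extensions": {
        "cpp.include.types.class_header": ".hpp",
        "cpp.src.types.class_implementation": ".cpp",
    },
}

codec = [{"name": "account", "stereotype": "masd::object"}]
logical = codec_to_logical(codec)
physical = logical_to_physical(logical, variability)
files = physical_to_filesystem(physical, variability)
print(sorted(files))  # ['include/account.hpp', 'src/account.cpp']
```

Changing only `variability` (for example, dropping one archetype or changing an extension) alters the rendered files without touching the codec or logical representations.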

Figure 3 (Figure 34 in Dogen): projection across the MASD spaces. Source: Domain Architecture.

Model-Driven Engineering distinguishes three categories of transformation, all of which appear in the pathway above:

  • Model-to-Model (M2M) — transforms one model into another at the same or a different abstraction level (codec → logical → physical).
  • Model-to-Text (M2T) — produces a string representation (source code, configuration, documentation) from a model. Code generators occupy this category; it is the final physical → filesystem step.
  • Text-to-Text (T2T) — converts one textual representation into another with no intermediate model.

See Models and Transformations for the formal treatment.

The physical address

Every physical artefact has an address — a point in physical space — written as a four-segment path:

[technical space].[part].[facet].[archetype]

For example cpp.include.types.class_header names the C++ header archetype of the types facet in the include part; cpp.src.types.class_implementation its implementation. A prefix names a region: cpp is all C++ content, cpp.include all C++ artefacts under the include part. The four levels are detailed in Physical Space; in brief:

  • Technical space (TS) — a programming language or platform together with its metamodel (C++, SQL, C#…). The outermost grouping. See Technical Space.
  • Part — a named subdivision of a component grouping related file artefacts (e.g. include and src for C++). A part is an arbitrary modelling choice for partitioning the entities of a problem domain; it is expressed in the technical space's language and conventions but is not a subdivision of the technical space itself.
  • Facet — a container for a set of related file artefacts that all belong to the same TS. Like a part, a facet is an arbitrary partitioning/taxonomy of the logical space (for example grouping all the artefacts that give a type its serialisation, or its hashing). It is the mechanism by which trivial structural functions are composed. See Facet.
  • Archetype — the most granular unit: the generating function and template for a single output file. Each archetype produces exactly one file per logical entity.

Note that parts and facets are not subdivisions of the technical space. They are arbitrary partitionings of the logical space: choices about how to classify a problem domain's entities by the role each artefact plays. They "belong" to a technical space only because the TS's language is used to express the partitioning. See Physical Space for the full treatment.
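A minimal sketch of the addressing scheme follows, assuming addresses are plain dotted strings as written above. The type `PhysicalAddress` and the helpers `parse_address` and `in_region` are hypothetical names introduced for illustration, not part of any MASD tooling.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    technical_space: str
    part: str
    facet: str
    archetype: str


def parse_address(text: str) -> PhysicalAddress:
    """Split a four-segment dotted address into its named levels."""
    segments = text.split(".")
    if len(segments) != 4 or not all(segments):
        raise ValueError(f"expected 4 non-empty segments, got {text!r}")
    return PhysicalAddress(*segments)


def in_region(address: PhysicalAddress, region: str) -> bool:
    """True if the address lies in the region named by a dotted prefix."""
    prefix = tuple(region.split("."))
    # Compare whole segments, so "cpp.inc" does not match "cpp.include".
    return address[: len(prefix)] == prefix


header = parse_address("cpp.include.types.class_header")
impl = parse_address("cpp.src.types.class_implementation")
print(in_region(header, "cpp.include"), in_region(impl, "cpp.include"))  # True False
```

Matching on whole segments rather than on raw string prefixes keeps a region such as `cpp.include` from accidentally capturing a differently named part that merely starts with the same characters.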

The three metamodels and the metamodelling hierarchy

MASD is structured as a classical four-level metamodelling hierarchy — M3 metametamodel (the archetypes) → M2 metamodel (the PMM) → M1 model (the physical model) → M0 (the filesystem). See Physical Space for the M3–M0 table.

Figure 4 (Figure 23 in Dogen): an example MASD metamodel hierarchy (M3 → M0). Source: Domain Architecture.

  • The Logical Metamodel (LMM) houses the meta-elements that model logical entities. Its structural package ranges from traditional meta-types (module, enumeration, builtin, object) up through idiomatic meta-types (exception, primitive, entry point) to design patterns and object templates. See Logical Space.
  • The Physical Metamodel (PMM) encodes the TS→Part→Facet→Archetype geometry as a queryable structure the generator consults at runtime. See Physical Space.
  • The Variability Metamodel (VMM) governs non-structural variability: enabling or disabling regions of physical space and configuring projections. See Variability.

Logical entity types

The LMM defines a fixed set of logical entity types — object, value, enumeration, primitive, exception, concept, module, and builtin. The type determines which facets are available and what artefacts can be produced, and each type has a default active facet set that the variability model can override. See Logical Space for the full type descriptions and the default-facet-per-type table.

Variability

MASD reserves the word variability for non-structural variability — the configuration that does not change the object graph of a model, only how it is rendered. The VMM addresses two things: enabling or disabling regions of physical space (technical spaces, facets, archetypes), and configuring projections (naming, file extensions, feature toggles).

  • A feature is a single configuration point.
  • A feature bundle groups semantically related features.
  • A feature template is an abstract feature instantiated over a domain (e.g. masd.archetype).
  • A binding point is the set of legal meta-entities at which a feature may be used.
  • A profile is a bundle of configuration points bound to logical elements, named after the ability it confers (serializable, hashable…), and reusable across products and product lines.

See Variability for the full treatment.
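One way to picture enabling and disabling regions of physical space is as a lookup over address prefixes, with profiles layered on top of a base configuration. The sketch below rests on several assumptions that are not stated in this document: that the most specific region wins, that everything is enabled by default, and that a facet called `serialization` exists. The names `is_enabled`, `configure`, `profiles`, and `base` are hypothetical.

```python
def is_enabled(address: str, switches: dict[str, bool]) -> bool:
    """Resolve enablement for an address; the most specific region wins (assumed)."""
    segments = address.split(".")
    # Walk from the full address down to the shortest prefix.
    for length in range(len(segments), 0, -1):
        region = ".".join(segments[:length])
        if region in switches:
            return switches[region]
    return True  # assumed default: enabled unless configured otherwise


# A hypothetical "serializable" profile: a named, reusable bundle of settings.
profiles = {
    "serializable": {
        "cpp.include.serialization": True,
        "cpp.src.serialization": True,
    },
}

# Hypothetical base configuration: C++ on, serialisation facet off.
base = {
    "cpp": True,
    "cpp.include.serialization": False,
    "cpp.src.serialization": False,
}


def configure(base: dict[str, bool], applied: list[str]) -> dict[str, bool]:
    """Overlay the named profiles on a copy of the base configuration."""
    merged = dict(base)
    for name in applied:
        merged.update(profiles[name])
    return merged


plain = configure(base, [])
with_profile = configure(base, ["serializable"])
addr = "cpp.include.serialization.class_header"
print(is_enabled(addr, plain), is_enabled(addr, with_profile))  # False True
```

Note that neither configuration changes the model itself; only the set of artefacts that would be rendered differs, which is what makes this variability non-structural.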

Schematic and Repetitive Physical Patterns (SRPP)

The analytical heart of MASD is the Schematic and Repetitive Physical Pattern (SRPP): a class of infrastructure code that varies predictably across entities but is otherwise mechanically identical. Content is judged an SRPP operationally: "one determines if a physical pattern is schematic and repetitive by reproducing it by automated means". The physical modelling process has five steps:

  1. Sampling — collect representative hand-written artefacts.
  2. Decomposition / segmentation — break each into prologue, body, epilogue.
  3. Labelling and classification — assign artefacts to parts and facets.
  4. Reconstruction — write the trivial function (archetype) that regenerates the pattern.
  5. Cataloguing and parameterisation — register the archetype and expose its variability.

A generated artefact "may only be composed of zero or one trivial structural functions and zero or more trivial non-structural functions"; each such function gets its own archetype.
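The operational test, reproducing a pattern by automated means, can be sketched with a small generating function and a hand-written sample. The function `class_header`, the include-guard convention, and the sample `point` header are all assumptions chosen for illustration; they are not actual Dogen archetypes.

```python
def class_header(name: str, fields: list[tuple[str, str]]) -> str:
    """A hypothetical archetype: regenerates a C++ header from (type, name) fields."""
    guard = f"{name.upper()}_HPP"
    # Segmentation from step 2: prologue, body, epilogue.
    prologue = f"#ifndef {guard}\n#define {guard}\n\n"
    body = f"struct {name} {{\n" + "".join(f"    {t} {n};\n" for t, n in fields) + "};\n"
    epilogue = "\n#endif\n"
    return prologue + body + epilogue


# A sampled hand-written artefact (step 1).
HAND_WRITTEN = """\
#ifndef POINT_HPP
#define POINT_HPP

struct point {
    int x;
    int y;
};

#endif
"""

# Reconstruction (step 4): the pattern counts as schematic and repetitive
# if the function reproduces the sample exactly.
reproduced = class_header("point", [("int", "x"), ("int", "y")]) == HAND_WRITTEN
print(reproduced)  # True
```

The only inputs that vary between entities are the name and the field list; everything else in the output is fixed, which is what makes the pattern a candidate for an archetype.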

The six MASD principles

The principles below are defined in full in The MASD Methodology.

  1. Focus Narrowly — target infrastructure code only, never business logic.
  2. Integrate Pervasively — fit the existing toolchain without disruption.
  3. Evolve Gradually — grow coverage incrementally, driven by concrete practice rather than up-front design.
  4. Govern Openly — keep templates, models, and configuration in the open and subject to ordinary review.
  5. Standardise Judiciously — prefer de facto standards at the core; conform physical artefacts to project conventions.
  6. Assist and Guide — span an automation spectrum from stub generation to full product-line generation, assisting rather than replacing the engineer.

The reference-implementation backout strategy

Rather than designing an abstract generator first, MASD extracts the generator from a proven, hand-written implementation — a bootstrapping / dogfooding approach described for Dogen in MASD Reference Implementation:

  1. Commission a reference entity by hand — every layer written without generator involvement.
  2. Identify the SRPPs — patterns that appear identically across the hand-written set are candidates for extraction.
  3. Encode each SRPP as an archetype — a template parameterised by the logical model.
  4. Register the archetypes in the physical metamodel / profile catalogue.
  5. Verify zero drift — run the generator against the reference model and diff against the hand-written files; a zero diff confirms faithful extraction.
  6. Extend to new entities — commissioning a new entity reduces to authoring its model and running the generator.

This avoids the central failure mode of top-down code generation: a generator that produces code nobody has actually used.
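The zero-drift check in step 5 can be approximated with the standard library's `difflib`. This is a sketch under the assumption that both the hand-written and the generated file sets are available as path-to-content mappings; the `drift` helper and the sample contents are hypothetical.

```python
import difflib


def drift(hand_written: dict[str, str], generated: dict[str, str]) -> list[str]:
    """Return unified-diff lines between two path -> content mappings."""
    lines: list[str] = []
    # Include paths present on only one side; a missing file diffs against "".
    for path in sorted(hand_written.keys() | generated.keys()):
        old = hand_written.get(path, "").splitlines(keepends=True)
        new = generated.get(path, "").splitlines(keepends=True)
        lines.extend(difflib.unified_diff(old, new, f"hand/{path}", f"gen/{path}"))
    return lines


reference = {"include/point.hpp": "struct point {\n    int x;\n};\n"}
faithful = dict(reference)
drifted = {"include/point.hpp": "struct point {\n    long x;\n};\n"}

print(len(drift(reference, faithful)))     # 0
print(len(drift(reference, drifted)) > 0)  # True
```

An empty result corresponds to the zero diff described above; any non-empty result pinpoints where the archetype fails to reproduce the reference implementation.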

Automation spectrum

MASD frames automation as a spectrum rather than a binary:

Level  Description
0      Stub generation only; the engineer writes all content
1      One layer generated (e.g. schema only)
2      Multiple layers generated; some manual editing remains
3      A full entity generated with no manual editing
4      Product-line generation: a new entity is just a new model

See also

MASD concept docs (this folder)

  • Logical Space — the LMM: entity types, packages, abstraction levels, and projections.
  • Physical Space — the TS→Part→Facet→Archetype hierarchy, the PMM, and the metamodelling levels.
  • Technical Space — the language/platform-plus-metamodel concept.
  • Facet — the partitioning of artefacts by role; SRPP and archetypes.
  • Variability — features, bundles, templates, profiles, and configuration.

Applied

  • Applied MASD — how a concrete product instantiates every concept on this page.

Dogen site (primary references)
