All,
We've significantly modified the schema for Draft 9. The primary
driver was to improve support for multiple views, and to better
distinguish between the different types of elements that we are
covering in CWE. Thanks to Sean Barnum for figuring out the bulk of
this. The MITRE team took his inputs and made some small tweaks here
and there.
As CWE is at a crossroads with respect to the schema, we welcome any
feedback or alternatives to our current approaches. Specifically,
while we have chosen XML so far, we are open to leveraging other
techniques for storing and working with the data, if those techniques
are more effective. For example, if it makes sense to store CWE in a
database and use an application server to help present and link
everything together, we are open to pursuing that. We also plan to
investigate RDF, XGMML, and other languages that might be more
directly supportive of graph-based relationships.
Please note that even if we stay with XML and related technologies, we
expect that the schema will still need to change a little bit.
However, we believe that one requirement for "CWE 1.0" is to have a
stable schema. In Draft 9, we are definitely a lot closer than we
were. Bob and I will post our requirements for "CWE 1.0" once they've
been finalized.
For Draft 9, some of the highest level schema changes are covered
here:
http://cwe.mitre.org/data/reports/diff_xsd_10_3.0.html
The rest of this document assumes that you read the preceding
document.
Any and all feedback would be appreciated, especially if there are
still outstanding issues in the schema that prevent you from using CWE
as extensively as you would want to.
Schema Evaluation Criteria
--------------------------
Here are some of the criteria that I think we should be applying while
finalizing the schema:
- Expressiveness: we should be able to express everything that we
want to. In Draft 9, some examples of this are the creation of
explicit views, and the requirement for relationships to specify
the views they are part of. But, we still don't have a way of
saying things like "this issue theoretically affects any language
that performs direct memory management, but it's especially common
in C." That's important, because if C is not explicitly mentioned
in an element, then that element won't be part of the C language
view.
- Extraction: it should be as easy as possible for CWE users to
extract the data that they want, using commonly available XML
parsers and related tools. In Draft 9, the relevant data for
named chains are not necessarily easy to extract.
- Maintenance
- minimize maintenance costs: the MITRE team, and outside
contributors, should be able to quickly represent the necessary
information.
- minimize preventable errors in data entry: we want to minimize
errors in the CWE representation that cannot be caught by an XML
validator, but nonetheless require consistency.
- minimize XML "bloat": this is hopefully self-explanatory. The
relationships in Draft 9 might exhibit some bloat, although at
the same time, there's a major benefit to their increased
expressiveness.
- Flexibility: ideally, the schema would remain stable, while
allowing us to build in additional capabilities. For Draft 9, we
believe that we've added flexibility for defining new kinds of
relationships and views. The introduction of compound elements
will hopefully allow us to support other kinds of concepts besides
chains and composites that might arise in the future; for example,
some CWE nodes are really talking about multiple distinct issues
and could be called "loose composites."
In light of these criteria, I wanted to explain some of the rationale
for the schema changes, and what we have left ahead of us for CWE 1.0.
Views
-----
We added a number of views to CWE Draft 9. For the most part, this
involved converting weakness/"groupings" from Draft 8 into the new
Views type for Draft 9.
See
http://cwe.mitre.org/data/index.html for a list of views.
Some views are slices, which are basically lists of elements without
any relationships between them. Membership in a slice can be explicit
or implicit. In
explicit slices, all the relevant entries have some ChildOf
relationship where the View node is the parent; see CWE-630
(Weaknesses Examined by SAMATE) and CWE-635 (Weaknesses Used by NVD)
for examples.
In implicit slices, the slice has some filtering criteria that define
membership, and there aren't any relationships within the XML that are
explicitly defined. For example, CWE-658 is a slice that covers
weaknesses found in the C language. This implicit slice has a Filter
that specifies that member entries have "C" under the
Applicable_Platforms field.
The Comprehensive CWE Dictionary view, CWE-2000, is actually an
implicit slice that selects everything from CWE by using a filter that
always returns true.
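
A minimal Python sketch of implicit slice membership follows. The entry
records, the made-up IDs, and the field name "applicable_platforms" are
illustrative assumptions, not the Draft 9 schema or its filter syntax.

```python
from typing import Callable, Dict, List

# Toy entries with made-up IDs; only the field needed for the example.
ENTRIES: Dict[int, dict] = {
    9001: {"applicable_platforms": ["C", "C++"]},
    9002: {"applicable_platforms": ["Java"]},
    9003: {"applicable_platforms": []},  # no language listed at all
}

Filter = Callable[[dict], bool]


def implicit_slice(entries: Dict[int, dict], keep: Filter) -> List[int]:
    """Return the sorted IDs of all entries accepted by the filter."""
    return sorted(eid for eid, entry in entries.items() if keep(entry))


def lists_c(entry: dict) -> bool:
    """Analogue of a C-language slice: "C" must be listed explicitly."""
    return "C" in entry["applicable_platforms"]


def always_true(entry: dict) -> bool:
    """Analogue of a comprehensive slice: the filter accepts everything."""
    return True


print(implicit_slice(ENTRIES, lists_c))
print(implicit_slice(ENTRIES, always_true))
```

Note that entry 9003 never shows up in the "C" slice, because a filter of
this kind can only see what is explicitly recorded in the entry.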
Views can also be graphs, such as CWE-1000 (Natural Hierarchy).
Currently, graphs are expected to have explicit ChildOf relationships
within the member elements. Before Draft 9, everything was
effectively under the Natural Hierarchy. In Draft 9, however, some of
those elements have been removed from the Natural Hierarchy
altogether, like deprecated nodes and the resource-based view.
We suspect that some individual views might be best described as a
combination of slices *and* graphs, with a combination of implicit or
explicit membership. A view might be best expressed via some set of
explicit relationships (maybe between some implicit slices), then
defaulting to the relationships of a different view at some point.
The most concrete example of this is CWE-631 (Resource-specific
Weaknesses), at:
http://cwe.mitre.org/data/graphs/631.html
The higher-level nodes have explicit relationships defined within View
631. Its children - such as the Category node CWE-632 (Weaknesses
that Affect Files or Directories) - have explicitly specified children
such as CWE-22 (Path Traversal). That is, Path Traversal has an
explicit "ChildOf CWE-632" relationship. However, instead of the
explicit relationships, CWE-632 could potentially be defined as an
implicit slice of "all elements that have an Affected_Resource field
of File/Directory." That would reduce maintenance costs and improve
accuracy, but it is not possible in Draft 9, because CWE-632 is a
Category type - it's *in* a view, but not a view itself.
In addition, the resource-based view, CWE-631, could be more
comprehensive by "view hopping." In Draft 9, CWE-631 stops at CWE-22
(Path Traversal), but there are several children under CWE-22 that
would also match - except those children are only listed under the
natural hierarchy (view CWE-1000). It would probably be quite tedious
and error-prone just to copy all the natural hierarchy relationships
over to this new view. This might be best handled by allowing views
to link to each other, but this is not possible in Draft 9. In
addition, the "hops" might wind up including elements that were not
intended.
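
"View hopping" could be sketched roughly as below. This is only one
possible interpretation: the traversal rule (hop to a fallback view when
a node has no children in the current view) is an assumption, and IDs
9101 and 9102 are made-up stand-ins for children of CWE-22 that are only
recorded in the natural hierarchy.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (child, parent, view) triples for ChildOf relationships.
CHILD_OF: List[Tuple[int, int, int]] = [
    (632, 631, 631),     # category under the resource-based view
    (22, 632, 631),      # Path Traversal under CWE-632, in view 631
    (9101, 22, 1000),    # hypothetical child, natural hierarchy only
    (9102, 9101, 1000),  # hypothetical grandchild, natural hierarchy only
]


def children_by_view(rels) -> Dict[Tuple[int, int], List[int]]:
    """Index children by (parent, view) so lookups are cheap."""
    index = defaultdict(list)
    for child, parent, view in rels:
        index[(parent, view)].append(child)
    return index


def members_with_hop(root: int, view: int, fallback_view: int, rels) -> Set[int]:
    """Collect descendants of root in `view`, hopping to `fallback_view`
    wherever a node has no children in `view`."""
    index = children_by_view(rels)
    seen: Set[int] = set()
    stack = [(root, view)]
    while stack:
        node, active = stack.pop()
        kids = index.get((node, active), [])
        if not kids and active == view:
            # Nothing recorded below this node in the primary view: hop.
            active = fallback_view
            kids = index.get((node, active), [])
        for kid in kids:
            if kid not in seen:
                seen.add(kid)
                stack.append((kid, active))
    return seen


print(sorted(members_with_hop(631, 631, 1000, CHILD_OF)))
```

Everything reachable after the hop is pulled in, which is exactly how
unintended elements could end up in the view.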
Finally, we have encountered some difficulties in generating a
"Comprehensive Graph" that merges all views together - the natural
hierarchy, the resource-based graph, the language-specific slices,
etc. So, there isn't a single graph on the CWE web site that covers
the entire CWE. We do have a PDF file that contains most nodes; it
focuses on the natural hierarchy (CWE-1000), and all other nodes are
effectively "orphans." We don't necessarily have to solve this
problem for a comprehensive view - after all, it's not clear who would
have a need for such a thing - but I thought it was worthwhile to
mention.
Relationships
-------------
The expression of relationships has changed significantly for Draft 9.
Much of this is covered by the schema diff report listed at the top of
this document, but there are some fields that I wanted to highlight.
Relationship_Type:
The Draft 8 version of "Relationship_Type" has been renamed to
"Relationship_Nature". The Draft 9 version of this field is
intended to identify the type of the entry that is being linked to.
Since we now have multiple types of entries in CWE, this field might
be useful in simplifying some extraction and presentation logic for
XSLT stylesheets. We have not needed this field in generating the web
site for Draft 9, although it might be convenient for others. However,
this field is currently maintained manually, and its value was often
incorrect: we changed the types of a number of elements in Draft 9,
which immediately invalidated this field in dozens of relationships.
We are able to perform a consistency check to ensure that these values
are correct before release, but it's still a little bit of labor.
As a result, we will be looking at this field more closely, trying
to balance utility to the community with maintenance costs to the
CWE team.
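
A consistency check of this kind might look roughly like the following
Python sketch. It is not the check we actually run; the IDs, the type
names, and the tuple layout are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entry types, keyed by made-up IDs.
ENTRY_TYPES: Dict[int, str] = {
    9201: "Weakness",
    9202: "Category",
    9203: "View",
}

# (source_id, target_id, recorded_target_type)
RELATIONSHIPS: List[Tuple[int, int, str]] = [
    (9201, 9202, "Category"),  # still correct
    (9201, 9203, "Category"),  # stale: the target is now a View
]


def stale_relationship_types(
    entry_types: Dict[int, str],
    relationships: List[Tuple[int, int, str]],
) -> List[Tuple[int, int, str, Optional[str]]]:
    """List relationships whose recorded target type does not match the
    target's actual type (None if the target does not exist)."""
    problems = []
    for source, target, recorded in relationships:
        actual = entry_types.get(target)
        if actual != recorded:
            problems.append((source, target, recorded, actual))
    return problems


for problem in stale_relationship_types(ENTRY_TYPES, RELATIONSHIPS):
    print(problem)
```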
Relationship_View_IDs:
We anticipate that, in the future, we will have multiple views that
share a lot of the same structure. As one example - CWE's Natural
Hierarchy (CWE-1000) is beginning to diverge more from the Seven
Pernicious Kingdoms (SPK) way of organizing the world, so it might
be reasonable to create a view into CWE that's useful for people who
are knowledgeable about SPK. The Natural Hierarchy and an SPK view
would probably differ in many of the elements near the top of the
tree, but they would share a lot at a lower level.
With closely overlapping views, repeating each shared relationship
once per view would produce a large number of duplicates that might
contribute significantly to XML bloat. The MITRE team decided that
allowing multiple
Relationship_View_IDs would be a useful shorthand that might be
easier to maintain.
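
The shorthand can be illustrated with a small Python sketch that expands
multi-view relationships back into one record per view. The tuple layout
is an assumption, all entry IDs are made up, and 9300 is a made-up ID
for a hypothetical SPK-oriented view.

```python
from typing import List, Tuple

# (child, nature, parent, view_ids): one record can name several views.
COMPACT: List[Tuple[int, str, int, List[int]]] = [
    (9301, "ChildOf", 9302, [1000, 9300]),  # shared by both views
    (9303, "ChildOf", 9302, [1000, 9300]),  # shared by both views
    (9302, "ChildOf", 9304, [1000]),        # natural hierarchy only
]


def expand_views(relationships) -> List[Tuple[int, str, int, int]]:
    """Expand each multi-view relationship into one record per view."""
    return [
        (child, nature, parent, view)
        for child, nature, parent, views in relationships
        for view in views
    ]


expanded = expand_views(COMPACT)
print(len(COMPACT), "compact records stand for", len(expanded), "per-view records")
```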
Current Challenges
------------------
Here are some of the current challenges that we still face, and plan
to resolve by CWE 1.0.
1) The Draft 9 schema does not have the expressiveness to define the
more complex views, and there are some associated maintenance
costs, as outlined in the previous sections.
2) Chains and composites, views, and categories all have some
overlapping uses that we'd like to clarify and, to the degree
possible, unify.
For example, both chains and composites involve a small selection
of entries from CWE, and dictate relationships between them. In
this sense, they can be regarded as views - perhaps micro-views.
Yet, we expect that they will have a distinct and important role
throughout CWE.
As another example, the resource-based view (CWE-631) has children
that are categories. These categories might be best described by
defining what their membership should be, but in Draft 9, this type
of automatic population is only possible through filters in View
elements. So, we had to manually create ChildOf relationships.
3) Relationship Directionality
Some views, like CWE-635 (Weaknesses Used by NVD), are defined more
by external criteria than anything that is implicit within
individual nodes, so these are explicit slices. In terms of
maintenance costs and ease of extraction, it might be best for
CWE-635 to explicitly state what its "members" are. Instead, each
member carries its own ChildOf relationship to CWE-635, with
View_ID=635. Thus, maintenance of the NVD slice is done not by
operating on the slice itself, but by operating on its individual
members. This proved to be moderately expensive for us to do when
we changed the membership of the SAMATE view in Draft 8 - it took
an hour or so to edit some nodes to remove the SAMATE relationship,
and then edit other nodes to add the SAMATE relationship; if we
could just edit the SAMATE list directly, it would have been a
5-minute task. However, as I understand it, one of the mantras of
knowledge management is that data is kept as close to individual
nodes as possible; but relationships "belong" to multiple nodes,
even though in Draft 9 they are only explicit in one node.
It would be possible for us to automate some of those maintenance
tasks, but that would involve additional development.
Also, we have multiple relationships that are mutual, but only
expressed in one direction. For example, "X ChildOf Y" might be
specified in the XML, which implies "Y ParentOf X" - but we have no
ParentOf relationships that are explicitly stated. The same thing
applies for relationships that support chains and composites. As a
result, an entry doesn't explicitly know what its children are, so
extraction logic becomes complicated, hard to maintain, and sometimes
computationally expensive. We have encountered this problem in
various ways while generating web site pages.
One possibility would be to create separate XML files and
representations for the relationships (and maybe for views),
possibly with a separate schema. This might preserve expressiveness
and simplify maintenance, but it might make it more difficult for
some people to extract.
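
To illustrate the one-directional relationships discussed in this item,
here is a minimal Python sketch that derives the implied ParentOf index
in a single pass over the XML. The element and attribute names (Catalog,
Entry, Relationship, Nature, Target, View) are simplified stand-ins
invented for this example, not the Draft 9 schema, and the IDs are made
up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical fragment: only ChildOf is stated, always on the child.
SAMPLE = """
<Catalog>
  <Entry ID="9401"/>
  <Entry ID="9402">
    <Relationship Nature="ChildOf" Target="9401" View="1000"/>
  </Entry>
  <Entry ID="9403">
    <Relationship Nature="ChildOf" Target="9401" View="1000"/>
  </Entry>
</Catalog>
"""


def parent_of_index(xml_text: str) -> Dict[Tuple[int, int], List[int]]:
    """Map (parent_id, view_id) to the list of child IDs, in document order."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for entry in root.iter("Entry"):
        for rel in entry.findall("Relationship"):
            if rel.get("Nature") == "ChildOf":
                key = (int(rel.get("Target")), int(rel.get("View")))
                index[key].append(int(entry.get("ID")))
    return dict(index)


print(parent_of_index(SAMPLE))
```

Building the index once up front avoids rescanning every entry each time
a parent's children are needed.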
4) Named Chains
There are a couple of issues with named chains. See
http://cwe.mitre.org/data/reports/chains_and_composites.html for
background.
All the data that's required to determine the links of a named
chain are within the XML, so there is sufficient expressiveness.
However, extraction is a little more difficult. For a named chain
X, the code has to search throughout all of CWE for every entry that
has a CanPrecede relationship with a Chain_ID of X, then order them
appropriately. If you want to know what elements are in a
named chain, you HAVE to do this navigation throughout CWE - a
named chain does not explicitly state what its links are. Just
like composites explicitly state which items they require, it might
be reasonable to have named chains explicitly know what their
starting links are.
In addition, named chains can be difficult to classify, especially
under the natural hierarchy. Because named chains are new, we
decided not to create a separate view to handle them. We do have a
view that lists chain elements (CWE-679), but that view is actually
an implicit slice for extracting components of all the CanPrecede
relationships, whether they're related to a Named Chain or not.
The extraction and presentation logic for presenting chains in
general was too complicated for us to handle cleanly by the release
of Draft 9, so they are generated by external programs, instead of
through XSLT.
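
For reference, the named-chain extraction described in this item (find
every CanPrecede relationship with a given Chain_ID, then order the
links) can be sketched in Python as follows. This is not the external
program we use; it assumes a simple linear chain, and the tuple layout
and all IDs are made up.

```python
from typing import List, Optional, Tuple

# (source_id, target_id, chain_id): "source CanPrecede target".
CAN_PRECEDE: List[Tuple[int, int, Optional[int]]] = [
    (9502, 9503, 9500),
    (9501, 9502, 9500),
    (9501, 9504, None),  # CanPrecede that is not part of a named chain
]


def chain_links(chain_id: int, can_precede) -> List[int]:
    """Return the links of a linear named chain in order, start first."""
    nxt = {src: dst for src, dst, cid in can_precede if cid == chain_id}
    if not nxt:
        return []
    targets = set(nxt.values())
    starts = [src for src in nxt if src not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting link")
    ordered = [starts[0]]
    while ordered[-1] in nxt:
        ordered.append(nxt[ordered[-1]])
    return ordered


print(chain_links(9500, CAN_PRECEDE))
```

If a named chain recorded its starting link explicitly, the search for
`starts` would be unnecessary, although the full scan for matching
relationships would remain.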