{#child toc @template.recipe=html}

1. Executive Summary

{{{pdfAddPageItem id="executive"}}}

Research in Heliophysics requires information from multiple sources, including data from and about spacecraft, ground-based observatories, models, simulations and more. The results of research are also invaluable in building up a body of knowledge and need to be available. Each of these sources and types of information is considered a "Resource". Resources exist, are shared, exchanged and used in a framework called the "data environment". The SPASE (Space Physics Archive Search and Extract) group has defined a Data Model, which is a set of terms and values, along with the relationships between them, that allows describing all the resources in a heliophysics data environment. It is the result of many years of effort by an international collaboration of heliophysicists and information scientists to unify and improve on existing Space and Solar Physics data models. The intent of this Data Model is to provide the means to describe resources, most importantly scientifically useful data products, in a uniform way so they may be easily registered, found, accessed, and used.

The Data Model provides enough detail to allow a scientist to understand the content of Data Products (e.g., a set of files for 3 second resolution Geotail magnetic field data for 1992 to 2005), together with essential retrieval and contact information. It also allows for the incremental annotation of resources with expert assessments and the free association of resources to create bundles or networks of resources. Resource descriptions can be stored with the data or at remote locations. Sites can harvest the resource descriptions to enable services like a search engine or portal (Virtual Observatory). A typical use would be to have a collection of descriptions stored in one or more related internet-based registries of products that can be queried with specifically designed search engines and ultimately link users to the data they need. The Data Model also provides constructs for describing the components of such a data delivery system, including repositories, registries and services.

The SPASE group website is located at https://www.spase-group.org/

A PDF version of this document can be downloaded from the SPASE site.

2. Introduction

{{{pdfAddPageItem id="introduction"}}}

The SPASE (Space Physics Archive Search and Extract) Data Model is a set of terms and values along with the relationships between them that allow describing all the resources in a heliophysics data environment. It is the result of many years of effort by an international collaboration (see https://spase-group.org) to unify and improve on existing Space and Solar Physics data models. The intent of this Data Model is to provide the means to describe resources, most importantly scientifically useful data products, in a uniform way so they may be easily registered, found, accessed, and used.

The SPASE data model divides the heliophysics data environment into a limited set of resource types. A key resource type is Numerical Data. This type of resource typically consists of a set of files that contain values of one or more physical variables and that differ from each other only by the time span. Fully describing a Numerical Data resource requires other types of Resources, namely Observatory, Instrument, Person, and Repository, whose names are self-explanatory, and each of which has its own set of attributes. Often, numerical data are presented in prepared images (gif or jpeg), and such presentations are referred to as Display Data resources. The other data-related resource types are Catalog, which lists events; Annotation, which enables expert comments on data products; and Granule, which describes individual files within another resource (i.e., Numerical Data, Display Data or Catalog). Other types of resources include Document, which can contain narratives or supporting information; Service, which provides software to use data resources; Repository, for storage locations; and Registry, for metadata collections. Resource descriptions and the links in them are intended to make the Resource useful to scientific users.

2.1. History of Development

{{{pdfAddPageItem id="intro.development"}}}

The data model presented here has grown from efforts begun in 2002 that became formalized in regular teleconferences of a group of interested data providers, including scientific and technical representatives of some of the largest data holdings in the US, Europe, and Japan. As the effort to provide seamless access to distributed data proceeded, it became clear that the data model efforts were central. The SPASE Data Model was developed with an iterative process in which additions were made when unaddressed needs were discovered. The original impetus occurred at an ISTP meeting in 1998 where a resolution was passed calling for data to be made more accessible. Interoperability test beds were constructed in 2001, and in 2002 a grassroots effort was undertaken to define the needs of the community. In March of 2003 a meeting of many of the people in the Contributors list at the beginning of this document was convened to begin the data model construction in earnest. The initial effort involved collecting terms from CDPP, SWRI, NSSDC, ISTP, and other sets to form a starting point. Two years of teleconferences, e-mailed revisions, and occasional face-to-face efforts, along with the application of the terms to specific cases, led to the release of version 1.0 of the data model in November 2005. Following the release of version 1.0, many existing data products were described, which led to further improvements of the data model. Version 1.1 was released in August 2006. At this time NASA established the Heliophysics VxOs, and after an extended period of use and improvements version 1.2.2 was released in August of 2008. The version of the data model described in this document is an extension of this earlier release.

2.2. Intended Purpose

{{{pdfAddPageItem id="intro.purpose"}}}

The design of the SPASE data model is based on a core set of principles related to the intended purpose of descriptive information (metadata), the data environment, and the operational environment. The overall goal of the Data Model is to be able to describe resources using a taxonomy of terms familiar to the heliophysics domain. This taxonomy should provide sufficient scientific context and data content information for an individual to assess the applicability of the resource (data and metadata) to a research question. A data model is the cornerstone of an information system, and one purpose of the SPASE Data Model is to enable the creation of "Virtual Observatories" that will link the broad range of heliophysics resources which may be available in a loosely coupled distributed environment. Additional goals of the data model are to:

  1. Provide a way of registering products using a standard set of terms that allow the products to be found with simple searches and described so that users can determine their utility for a specific purpose;
  2. Allow searching for products containing particular physical quantities (e.g., magnetic field; spectral irradiance) that are variously represented in a diverse array of data products; and
  3. Facilitate a means of mapping comparable variables from many products onto a common set of terms so that visualization, analysis, and higher-order query tools and services can be used on all of them without regard to the origin of the data.

The content of a resource description based on the data model should enable services (either at the provider or in a VxO) to discover and access individual resources. The service layer can contain services for a variety of purposes. The basic functionality of the service layer is to provide the links necessary to connect user applications and search-and-retrieval front ends to data repositories. Ultimately, the data environment based on the data model will involve a number of software tools and services linked together as an internet-based environment. The data, along with software tools and documentation associated with products, will be directly accessible using standard web protocols (http, ftp). This "system" has the potential to provide capabilities that can aid even expert users of a particular dataset (e.g., on-the-fly coordinate transformations, the ability to merge datasets from different instruments, easy reference to related indices or other data), in addition to providing the broad access needed to investigate emerging questions in heliophysics.

2.3. Design Principles

{{{pdfAddPageItem id="intro.principles"}}}

The design of the SPASE data model begins with a few basic principles. These principles are:

1. Data is self-documented.

Data resources have internal schemas or structures for storing values. The physical structure is determined by the storage format. Each retrievable entity in the format is assigned a key or tag which can be used to retrieve the entity.

The SPASE Data Model does not attempt to describe the physical storage of the parameters, for example, the byte offsets, record format or data encoding in the data resource. Instead, the SPASE Data Model describes the scientific attributes of the parameter and links this to the parameter by a key or tag used by the storage format. Applications can use the SPASE descriptions to locate a parameter and the appropriate format-specific reader to extract parameters.

Not all data in the Heliophysics data environment is stored in self-documented formats; data stored as ASCII tables is one example. The method of assigning a key or tag name to each field in an ASCII table is external to the SPASE data model. This method must be part of a "format" specification, which may be as simple as the first row of the table containing the tag name of each field.

2. Resources are distributed.

There are many providers of resources and these providers can be located anywhere in the world.

Each provider operates independently and activities are not necessarily coordinated. The SPASE data model assumes that providers have local autonomy and may operate under local rules or jurisdictions.

3. Online resources have Uniform Resource Locators (URLs).

If a resource is online, it can be accessed and retrieved using a Uniform Resource Locator (URL).

4. The data environment is continuously evolving.

New resources are actively generated either as part of an on-going experiment or as a result of analysis and assessment.

These new resources may be directly related to other resources. As new resources are generated or new associations are defined, the networks or collections they form will expand over time.
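As an illustration of principle 1, the following is a minimal sketch of reading an ASCII table whose first row carries the tag name of each field. The whitespace-delimited layout, the function name read_tagged_table and the sample column names are assumptions made for this example; the SPASE Data Model itself does not define such a format.

```python
import io

def read_tagged_table(stream):
    """Read a whitespace-delimited ASCII table whose first row holds tag names.

    Returns a dict that maps each tag name to the list of string values in
    that column. The layout is an assumed "format" specification and is not
    defined by the SPASE Data Model.
    """
    lines = [line.split() for line in stream if line.strip()]
    if not lines:
        return dict()
    tags, rows = lines[0], lines[1:]
    columns = dict((tag, []) for tag in tags)
    for row in rows:
        if len(row) != len(tags):
            raise ValueError("row width does not match header")
        for tag, value in zip(tags, row):
            columns[tag].append(value)
    return columns

# Hypothetical sample data; the tag names would match the ParameterKey
# values used in a resource description.
sample = io.StringIO(
    "time bx by bz\n"
    "2003-01-01T00:00 1.2 -0.4 3.1\n"
    "2003-01-01T00:01 1.3 -0.5 3.0\n"
)
table = read_tagged_table(sample)
print(table["bx"])
```

With a reader of this kind, an application can use the key or tag recorded in a description to pull the matching column out of the file.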

2.4. Conceptual System Environment

{{{pdfAddPageItem id="intro.concept"}}}

The data model is intended to enable the sharing of knowledge through structured metadata (SPASE Descriptions) which can be exchanged in queries and responses between systems. The operational environment this occurs in is the current Internet where systems and users are loosely coupled and highly distributed. Special services or portals may harvest (collect) the SPASE descriptions from multiple sources to create an enriched capability for the user. For example, a search engine may provide a comprehensive search for a particular scientific discipline. The web site https://hpde.gsfc.nasa.gov gives a guide to many currently active projects and a great deal of background information. Of particular interest there is the document entitled, "A Framework for Space and Solar Physics Virtual Observatories."

Figure 1 illustrates a conceptual architecture in a distributed environment. In this environment multiple communities have resources to share. The storage location of a resource is called a repository. Some of these repositories (boxes) have local SPASE descriptions which are available through a local registry service (balls). The contents of other repositories are described at external, possibly independent, locations which make the descriptions available through remote registries. Gateways (rings) can harvest and aggregate the resources from multiple registries or perform federated searches which provide a single access point to multiple registries. Applications access the registries to discover resources, determine their location and retrieve them from the repositories.

3. SPASE Data Model

{{{pdfAddPageItem id="model"}}}

3.1. Resource Types

{{{pdfAddPageItem id="model.types"}}}

The top level entity in the SPASE data model is a Resource. There are 12 different types of resources. Each resource type consists of a set of attributes that characterize the resource. The resource types can be divided into three categories: Data Resources, Origination Resources and Infrastructure Resources.

This section provides an overview of the resource types. Complete details for each resource can be found in Section 4.

3.1.1. Data Resources

{{{pdfAddPageItem id="model.types.data"}}}

Data Resources describe one or more data products. A "data product" is a set of data that is uniformly processed and formatted, from one or more instruments, typically spanning the full duration of the observations of the relevant instrument(s). A data product may consist of a collection of granules of successive time spans, but may also be a high-level entity such as an event catalog. Data products can be images (Display Data), sample or observation values (Numerical Data), or event lists (Catalog). Also included in the Data Resource category are the resources used to describe individual files (Granule) that are part of data product sets, assessments of a resource (Annotation), and narratives or supporting information (Document). The complete list of Data Resources is:

Numerical Data,
Display Data,
Catalog,
Annotation,
Document, and
Granule

3.1.2. Origination Resources

{{{pdfAddPageItem id="model.types.origin"}}}

Origination Resources describe the generators or sources of data. Included in a Data Resource description is information about the origination of the data. A Data Resource will refer to one or more Origination Resources. The complete list of Origination Resources is:

Observatory,
Instrument, and
Person

3.1.3. Infrastructure Resources

{{{pdfAddPageItem id="model.types.infrastructure"}}}

Infrastructure Resources describe system components that are part of the exchange and use of data. This includes storage locations for data (Repository), metadata (Registry) and functions (Service). The complete list of Infrastructure Resources is:

Registry,
Repository, and
Service
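The three categories described in Section 3.1 can be captured in a small lookup table. This is only a sketch: the names RESOURCE_CATEGORIES and category_of are hypothetical, and the resource type names are written without spaces on the assumption that they follow the element names used in the XML examples.

```python
# Resource types grouped by category, as listed in Section 3.1.
RESOURCE_CATEGORIES = {
    "Data": [
        "NumericalData", "DisplayData", "Catalog",
        "Annotation", "Document", "Granule",
    ],
    "Origination": ["Observatory", "Instrument", "Person"],
    "Infrastructure": ["Registry", "Repository", "Service"],
}

def category_of(resource_type):
    """Return the category name for a resource type, or raise KeyError."""
    for category, types in RESOURCE_CATEGORIES.items():
        if resource_type in types:
            return category
    raise KeyError(resource_type)

print(category_of("Granule"))
```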

3.1.4. Ontology

{{{pdfAddPageItem id="model.types.ontology"}}}

In the SPASE data model there can be associations between pairs of resources. Some associations are specific and are required in order to fully describe a resource. For example, an Instrument resource is always associated with an Observatory resource. The specific associations form an ontology which is illustrated in Figure 2. The SPASE data model also allows associations of resources which are not explicitly defined in the ontology. These associations are described and assigned a relationship type using generic association attributes.

3.2. Resource Identifiers

{{{pdfAddPageItem id="model.id"}}}

Every resource has a unique identifier so that it can be tracked and referenced within a system. This identifier is defined by the naming authority for the resource. The entity which acts as the naming authority is determined by the agency or group that provides the resource. Each resource identifier is a URI that has the form

scheme://authority/path

where "scheme" is "spase" for those resources administered through the SPASE framework, "authority" is the unique identifier for the naming authority within the data environment and "path" is the unique local identifier of the resource within the context of the "authority". The resource ID must be unique within the data environment.

To illustrate the definition of a resource identifier, suppose there is a registered "authority" called "SMWG" which maintains information for spacecraft (Observatory) resources. One such spacecraft is GOES8. Suppose further that "SMWG" decides that the "path" to the GOES8 resource description should include the Resource Type and that the observatory "name" will be "GOES8". The resource identifier would then be:

spase://SMWG/Observatory/GOES8

The Resource ID is used to formally or informally associate one resource with another. For example, an Instrument resource must be formally associated with an Observatory. A Numerical Data resource may be formally associated with an Instrument resource and informally associated with other Numerical Data resources. The free association of resources allows networks or collections to be formed from distributed resources and allows new associations to be formed as needed without affecting existing associations.
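The scheme://authority/path form can be split with a generic URI parser. The sketch below uses the Python standard library; the function name parse_resource_id is hypothetical, and the checks are illustrative rather than a complete validation of SPASE identifiers.

```python
from urllib.parse import urlsplit

def parse_resource_id(resource_id):
    """Split a resource identifier into (scheme, authority, path).

    Raises ValueError when any of the three parts is missing.
    """
    parts = urlsplit(resource_id)
    path = parts.path.lstrip("/")
    if not parts.scheme or not parts.netloc or not path:
        raise ValueError("expected scheme://authority/path")
    return parts.scheme, parts.netloc, path

scheme, authority, path = parse_resource_id("spase://SMWG/Observatory/GOES8")
print(scheme, authority, path)
```

Here the authority is "SMWG" and the path is "Observatory/GOES8", matching the example identifier in this section.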

3.3. Core Attributes

{{{pdfAddPageItem id="model.core"}}} With the exception of Granule and Person, every resource has a common set of core attributes. The core attributes provide textual descriptions of the resource and the capability to reference external sources of information (Information URL). They also describe the context of the resource in the larger data environment. This context consists of associations with other resources (Association) and with previous versions (Prior ID). These attributes are grouped in a Resource Header, which consists of:

Resource Name
Alternate Name
Release Date
Expiration Date
Description
Acknowledgement
Contact
Information URL
Association
Prior ID

3.4. Text Mark-up

{{{pdfAddPageItem id="model.markup"}}}

While descriptive text may be brief, some formatting of the text may be needed to convey the necessary information, for example, multiple paragraphs or nested lists. To ensure system portability, text values in SPASE are sequences of alphanumeric one-byte UTF-8 (US-ASCII) characters with white space preserved. When text is displayed in some applications (a web browser is the best example), a strict preservation of white space may not result in a desirable presentation. Also, to make the metadata more human readable (for example in XML), additional white space may be introduced in the form of indentation. If strictly preserved, this could result in an undesirable presentation. To allow an author to express a preferred layout for the text, a special set of text "mark-up" rules is defined. The layout can then be determined by normalizing the text and applying a simple set of interpretation rules.

3.4.1. Normalization Rules

{{{pdfAddPageItem id="model.markup.normalization"}}}

To aid in determining the layout or structural intent of the author the following rules are to be applied to text to create a normalized form:

  1. All lines are to end with a newline character.
  2. All text is left justified. No line has leading whitespace.
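The two normalization rules can be applied in a few lines. This is a minimal sketch; the function name normalize_text is hypothetical and not part of the SPASE Data Model.

```python
def normalize_text(text):
    """Apply the two normalization rules to a text value.

    Every line is stripped of leading whitespace (rule 2) and terminated
    with a newline character (rule 1).
    """
    return "".join(line.lstrip() + "\n" for line in text.splitlines())

# Indentation of the kind often added to make XML more readable.
raw = "   First paragraph.\n\n      * an item\n      * another item"
print(normalize_text(raw), end="")
```

Trailing whitespace is left untouched here, since the rules above only mention line endings and leading whitespace.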

3.4.2. Text Interpretation Rules

{{{pdfAddPageItem id="model.markup.interpret"}}}

After normalization of text the following rules can be used to interpret the layout intent of the author.

  1. Blank lines indicate paragraph breaks.
  2. Lists
    1. Must be preceded by a blank line.
    2. Items are indicated by a line beginning with a reserved character followed by a space. Three levels of lists are supported. The reserved characters are:
      * : First level list
      - : Second level list (must appear within a first level context)
      . : Third level list (must appear within a second level context)
    3. End with a blank line.
  3. Tables
    1. Begin and end with a line that starts with "+--".
    2. The first "row" of a table is the field headings.
    3. Fields in a table are separated with a vertical bar ("|").
    4. Visual row separators are lines which begin with "|--".
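The interpretation rules above can be sketched as a single pass over normalized text. The function name interpret and the event names ("text", "break", "item", "row") are hypothetical. The sketch assumes table rows may begin with a vertical bar, and it does not enforce the nesting constraints on second and third level lists or the blank lines required around a list.

```python
LIST_MARKERS = {"*": 1, "-": 2, ".": 3}

def interpret(normalized_text):
    """Classify each line of normalized text into a layout event.

    Returns a list of (kind, payload) tuples, where kind is one of
    "text", "break", "item" or "row".
    """
    events = []
    in_table = False
    for line in normalized_text.splitlines():
        if line.startswith("+--"):
            # A "+--" line opens a table or closes the current one.
            in_table = not in_table
        elif in_table:
            # Skip blank lines and "|--" visual row separators.
            if line and not line.startswith("|--"):
                fields = [f.strip() for f in line.strip("|").split("|")]
                events.append(("row", fields))
        elif line == "":
            events.append(("break", None))
        elif line[0] in LIST_MARKERS and line[1:2] == " ":
            events.append(("item", (LIST_MARKERS[line[0]], line[2:])))
        else:
            events.append(("text", line))
    return events

sample = (
    "Intro paragraph.\n"
    "\n"
    "* first\n"
    "- nested\n"
    "\n"
    "+--\n"
    "|Name|Units\n"
    "|--\n"
    "|Bx|nT\n"
    "+--\n"
)
for event in interpret(sample):
    print(event)
```

The first "row" event after a table opens corresponds to the field headings.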

3.5. Extensions

{{{pdfAddPageItem id="model.extensions"}}}

The SPASE Data Model allows for additional metadata to be embedded within a SPASE description. Every Resource Type has an "Extension" element which can contain metadata compliant with other data models. The "Extension" element has a SPASE data model type of "Text", but is not limited to alphanumeric characters and may contain tagged information.

4. Guidelines for Metadata Descriptions

{{{pdfAddPageItem id="guidelines"}}}

The following sections describe the details of the SPASE Data Model, especially the metadata used to describe data. There is a richness in the available metadata that allows very detailed descriptions of products. Many of the types of metadata may not apply in your case or you may not need much detail to adequately describe your data holdings. But it must be remembered that the better data are described, the easier they will be to use.

To determine what level of detail is needed, we recommend considering not only what the user needs to find the correct data, but also what is necessary to know if the data will be useful for the requestor's purpose. The user might get this information by contacting you, but if the data were moved somewhere else and only the data description were available to determine the utility of the data, consider whether the user would have sufficient information to know if this is the right data set and what problems might be associated with the use of these data. Also consider whether additional documentation is necessary and, if so, create a Document resource and associate it with the data resource. An "Information URL" may also be used to provide links to more detailed information.

In summary, products need not be described in minute detail, but users will need, at minimum, information for assessing what the data products represent and where to find them. Of course it is also useful to include information on how the data can be applied and common pitfalls in their use, but the first need is to make the products usefully visible.

5. Examples

{{{pdfAddPageItem id="examples"}}}

As an example let us describe a person using SPASE metadata. This person is "John Smith" from Smith Foundation. While the SPASE data model is implementation neutral, XML representation is preferred. This example uses the SPASE XML form.

 <?xml version="1.0" encoding="UTF-8" ?> 
 <Spase>
   <Version>2.0.0</Version>
   <Person>
      <ResourceID>spase://person/jsmith@smith.org</ResourceID>
      <PersonName>John Smith</PersonName>
      <OrganizationName>Smith Foundation</OrganizationName>
      <Address>1 Main St., Smithville, MA</Address>
      <Email>jsmith@smith.org</Email>
      <PhoneNumber>1-800-555-1212</PhoneNumber>
   </Person>
 </Spase>
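A description like the one above can be read with any XML parser. The following minimal sketch uses the Python standard library and assumes, as in the example, that the elements carry no XML namespace; descriptions that declare a namespace would need namespace-qualified lookups. The function name person_summary is hypothetical.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<Spase>
  <Version>2.0.0</Version>
  <Person>
    <ResourceID>spase://person/jsmith@smith.org</ResourceID>
    <PersonName>John Smith</PersonName>
    <OrganizationName>Smith Foundation</OrganizationName>
    <Email>jsmith@smith.org</Email>
  </Person>
</Spase>"""

def person_summary(xml_text):
    """Extract a few fields from the first Person resource in a description."""
    root = ET.fromstring(xml_text)
    person = root.find("Person")
    if person is None:
        raise ValueError("no Person resource found")
    return {
        "id": person.findtext("ResourceID"),
        "name": person.findtext("PersonName"),
        "email": person.findtext("Email"),
    }

print(person_summary(PERSON_XML))
```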

For a more extensive example, let us consider a collection of numerical data from the magnetometer on the ACE spacecraft. This data set has been averaged to 1 minute intervals (cadence) and spans the beginning of the mission to the end of 2004 (1997-09-01 through 2004-12-31). The ACE spacecraft orbits the L1 point between the Earth and the Sun. This example also uses the SPASE XML form. The presented URLs are fictitious and will not direct you to the actual data.

 <?xml version="1.0" encoding="UTF-8" ?> 
 <Spase>
   <Version>2.0.0</Version>
   <NumericalData>
      <ResourceID>spase://VMO/NumericalData/ACE/MAG/200301</ResourceID>
      <ResourceHeader>
         <ResourceName>ACEMAG200301</ResourceName>
         <ReleaseDate>2006-07-26T00:00:00.000</ReleaseDate>
         <Acknowledgement>                                             
            User will acknowledge the data producer and instrument P.I. in any     
            publication resulting from the use of these data.
         </Acknowledgement>
       <Description>
          ACE MFI 1-minute averaged magnetic-field data in GSE coordinates
          from Jan 2003. These data have been derived from the 16 second 
          resolution ACE MFI which were linearly interpolated to a 1-minute 
          time grid with time stamps at second zero of each minute.
       </Description>
       
       <Contact>
          <Role>PrincipalInvestigator</Role>
          <PersonID>spase://SMWG/Person/Norman.F.Ness</PersonID>
       </Contact>
     
       <Contact>
          <Role>CoInvestigator</Role>
          <PersonID>spase://SMWG/Person/Charles.Smith</PersonID>
       </Contact>
     
       <Contact>
          <Role>DataProducer</Role>
          <PersonID>spase://SMWG/Person/James.M.Weygand</PersonID>
       </Contact>
    </ResourceHeader>
    
    <AccessInformation>
       <AccessRights>Open</AccessRights>
       <AccessURL>
          <URL>http://www.igpp.ucla.edu/getResource?format=text&amp;id=spase://UCLA/ACEMAG200301</URL>
       </AccessURL>
       <Format>Text</Format>
       <Encoding>GZIP</Encoding>
    </AccessInformation>
  
    <InstrumentID>spase://SMWG/ACE/MAG</InstrumentID>
    <MeasurementType>MagneticField</MeasurementType>
   
    <TemporalDescription>
       <TimeSpan>
         <StartDate>1997-09-01T00:00</StartDate>
         <StopDate>2004-12-31T23:59</StopDate>
       </TimeSpan>
       <Cadence>PT1M</Cadence>
    </TemporalDescription>
  
    <InstrumentRegion>Heliosphere.NearEarth</InstrumentRegion>
    <ObservedRegion>Heliosphere.NearEarth</ObservedRegion>
  
    <Parameter>
       <Name>SAMPLE_TIME_UTC</Name>
       <ParameterKey>time</ParameterKey>
       <Description>
        Sample UTC in the form DD MM YYYY hh mm ss where
          DD   = day of month (01-31)
          MM   = month of year (01-12)
          YYYY = Gregorian Year AD
          hh   = hour of day     (00-23)
          mm   = minute of hour  (00-59)
          ss   = second of minute (00-60).
       </Description>
       <Support>
         <SupportQuantity>Temporal</SupportQuantity>
       </Support>
    </Parameter>
  
    <Parameter>
       <Name>MAGNETIC_FIELD_VECTOR</Name>
       <Units>nT</Units>
       <CoordinateSystem>
          <CoordinateRepresentation>Cartesian</CoordinateRepresentation>
          <CoordinateSystemName>GSE</CoordinateSystemName>
       </CoordinateSystem>
       <Description>
           Magnetic field vector in GSE Coordinates (Bx, By, Bz).
       </Description>
       <Field>
          <Qualifier>Vector</Qualifier>
          <FieldQuantity>Magnetic</FieldQuantity>
      </Field>
    </Parameter>
  
    <Parameter>
       <Name>SPACECRAFT_POSITION_VECTOR</Name>
       <CoordinateSystem>
          <CoordinateRepresentation>Cartesian</CoordinateRepresentation>
          <CoordinateSystemName>GSE</CoordinateSystemName>
       </CoordinateSystem>
       <Units>EARTH RADII</Units>
       <UnitsConversion>6378.16 km</UnitsConversion>
       <Description>
          ACE spacecraft location in GSE coordinates (X,Y,Z).
       </Description>
       <Support>
         <SupportQuantity>Positional</SupportQuantity>
       </Support>
    </Parameter>

   </NumericalData>
 </Spase>
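The Cadence value in the example, PT1M, is written as an ISO 8601 duration. The sketch below converts a restricted subset of such durations (days, hours, minutes and seconds only) into a Python timedelta. The function name parse_cadence is hypothetical, and year, month and week designators are deliberately not handled.

```python
import re
from datetime import timedelta

# Matches durations such as P1D, PT1M, PT16S or P1DT12H (subset only).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)

def parse_cadence(value):
    """Convert a simple ISO 8601 duration string to a timedelta."""
    match = _DURATION.match(value)
    if match is None or value in ("P", "PT"):
        raise ValueError("unsupported duration: " + value)
    parts = match.groupdict(default="0")
    return timedelta(
        days=int(parts["days"]),
        hours=int(parts["hours"]),
        minutes=int(parts["minutes"]),
        seconds=float(parts["seconds"]),
    )

print(parse_cadence("PT1M").total_seconds())
```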

6. Element Data Types

{{{pdfAddPageItem id="datatypes"}}}

Each element in the SPASE Data Model has a data type. One design feature of the SPASE data model is that an element can contain either a value or other elements. Mixed content (elements and values together) is not allowed. This allows the data model to be implemented in a wider range of metadata languages. The following data types are supported:

{{#each data.type}}
{{type}}
{{definition}}
{{/each}}
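The rule that an element holds either a value or other elements, but not both, can be checked mechanically for an XML description. This is a minimal sketch using the Python standard library; the function name has_mixed_content is hypothetical.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """Return True if any element holds both child elements and text.

    Whitespace used only for indentation is ignored.
    """
    for node in element.iter():
        children = list(node)
        if not children:
            continue
        # Text before the first child element.
        if node.text and node.text.strip():
            return True
        # Text that follows any child element.
        if any(child.tail and child.tail.strip() for child in children):
            return True
    return False

ok = ET.fromstring("<Contact><Role>DataProducer</Role></Contact>")
bad = ET.fromstring("<Contact>note<Role>DataProducer</Role></Contact>")
print(has_mixed_content(ok), has_mixed_content(bad))
```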

7. Enumerations

{{{pdfAddPageItem id="enumerations"}}}

Lists are either "open" or "closed". The items in a "closed" list are determined by the SPASE model, and the definition of each item is in the SPASE data dictionary. The items in an "open" list are determined by an external control authority. The URL for the control authority is indicated in the definition of each "open" list.

{{#each data.list}}
{{name}}
{{definition}}
Allowed Values: {{#each (array @root/data/member name) }} {{/each}}
{{/each}}

8. Data Model Tree

{{{pdfAddPageItem id="tree"}}}
The taxonomy tree shows the inter-relationship of elements in the data model. This provides a "big picture" view of the SPASE data model. This taxonomy is implementation neutral. Details for each element are contained in the data dictionary.
Notes: Occurrence specifications are enclosed in parentheses: 0 = optional, 1 = required, * = zero or more, + = 1 or more
{{#each (buildTree data.ontology "Spase" "1" "+") }}
{{prefix}} {{name}} ({{occurrence}})
{{/each}}
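The occurrence codes used in the tree can be read as lower and upper bounds on how many times an element may appear. The sketch below encodes that reading; the names OCCURRENCE_BOUNDS and occurrence_ok are hypothetical, and "0 = optional" is taken to mean zero or one occurrence.

```python
# Each code maps to (minimum, maximum); None means no upper limit.
OCCURRENCE_BOUNDS = {
    "0": (0, 1),     # optional
    "1": (1, 1),     # required
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
}

def occurrence_ok(spec, count):
    """Return True if count satisfies the occurrence specification."""
    low, high = OCCURRENCE_BOUNDS[spec]
    return count >= low and (high is None or count <= high)

print(occurrence_ok("+", 3), occurrence_ok("1", 0))
```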

9. Dictionary

{{{pdfAddPageItem id="dictionary"}}}

How to Read a Definition

Each element has certain attributes and context for use. The details for each element are presented in the following form:

{{#each data.dictionary}}
{{#if allowedValues}} {{/if}} {{#if subElements}} {{/if}}
{{term}} {{type}}
{{definition}}
 Since:{{since}}
  {{#each allowedValues}} {{/each}}
{{#if @first}}Allowed Values{{/if}}{{this}}
  {{#each subElements}} {{/each}}
{{#if @first}}Sub-elements{{/if}}{{this}}
  {{#each usedBy}} {{/each}}
{{#if @first}}Used by:{{/if}}{{this}}
{{/each}}

10. History

{{{pdfAddPageItem id="history"}}} {{#each (group data.history "version")}}
{{title}}
{{#each entry}} {{/each}}
{{updated}}{{description}}
{{/each}}

11. Bibliography

{{{pdfAddPageItem id="biblio"}}}
National Solar Observatory Sacramento Peak
http://www.sunspot.noao.edu/sunspot/pr/glossary.html
Terms and Definitions
http://www.pgd.hawaii.edu/eschool/glossary.htm
International System of Units (SI)
http://www.bipm.fr/en/si
Base units: http://www.bipm.fr/en/si/si_brochure/chapter2/2-1/#symbols
and those for Common derived units: http://www.bipm.fr/en/si/derived_units/2-2-2.html
ISO 8601:2004 - Date Format
http://en.wikipedia.org/wiki/ISO_8601
- or -
http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=40874
- or -
http://www.iso.org/iso/en/prods-services/popstds/datesandtime.html
RFC 3339 - Date and Time on the Internet
A profile of the ISO 8601 standard for use on the Internet. http://www.ietf.org/rfc/rfc3339.txt
RFC 1014 - XDR: External Data Representation standard
http://www.faqs.org/rfcs/rfc1014.html

12. Appendix A - Comparison of Spectrum Domains

{{{pdfAddPageItem id="appendix.a"}}}