2. Introduction

{{{pdfAddPageItem id="introduction"}}}

The SPASE (Space Physics Archive Search and Extract) Data Model is a set of terms and values along with the relationships between them that allow describing all the resources in a heliophysics data environment. It is the result of many years of effort by an international collaboration (see https://spase-group.org) to unify and improve on existing Space and Solar Physics data models. The intent of this Data Model is to provide the means to describe resources, most importantly scientifically useful data products, in a uniform way so they may be easily registered, found, accessed, and used.

The SPASE data model divides the heliophysics data environment into a limited set of resource types. A key resource type is Numerical Data. This type of resource typically consists of a set of files that contain values of one or more physical variables and that differ from each other only in the time span covered. Fully describing a Numerical Data resource requires other types of resources, namely Observatory, Instrument, Person, and Repository, whose names are self-explanatory and each of which has its own set of attributes. Often, numerical data are presented as prepared images (GIF or JPEG), and such presentations are referred to as Display Data resources. The other data-related resource types are Catalog, which lists events; Annotation, which enables expert comments on data products; and Granule, which describes individual files within another resource (i.e., Numerical Data, Display Data, or Catalog). Other types of resources include Document, which can contain narratives or supporting information; Service, which provides software to use data resources; Repository, for storage locations; and Registry, for metadata collections. Resource descriptions and the links in them are intended to make the resource useful to scientific users.

2.1. History of Development

{{{pdfAddPageItem id="intro.development"}}}

The data model presented here has grown from efforts begun in 2002 that became formalized in regular teleconferences of a group of interested data providers, including scientific and technical representatives of some of the largest data holdings in the US, Europe, and Japan. As the effort to provide seamless access to distributed data proceeded, it became clear that the data model efforts were central. The SPASE Data Model was developed through an iterative process in which additions were made when unaddressed needs were discovered. The original impetus occurred at an ISTP meeting in 1998 where a resolution was passed calling for data to be made more accessible. Interoperability test beds were constructed in 2001, and in 2002 a grassroots effort was undertaken to define the needs of the community. In March of 2003 a meeting of many of the people in the Contributors list at the beginning of this document was convened to begin the data model construction in earnest. The initial effort involved collecting terms from CDPP, SWRI, NSSDC, ISTP, and other sets to form a starting point. Two years of teleconferences, e-mailed revisions, and occasional face-to-face efforts, along with the application of the terms to specific cases, led to the release of version 1.0 of the data model in November 2005. Following the release of version 1.0, many existing data products were described, which led to further improvements of the data model. Version 1.1 was released in August 2006. At this time NASA established the Heliophysics VxOs, and after an extended period of use and improvements version 1.2.2 was released in August of 2008. The version of the data model described in this document is an extension of this earlier release.

2.2. Intended Purpose

{{{pdfAddPageItem id="intro.purpose"}}}

The design of the SPASE data model is based on a core set of principles related to the intended purpose of descriptive information (metadata), the data environment, and the operational environment. The overall goal of the Data Model is to be able to describe resources using a taxonomy of terms familiar to the heliophysics domain. This taxonomy should provide sufficient scientific context and data content information for an individual to assess the applicability of the resource (data and metadata) to a research question. A data model is the cornerstone of an information system, and one purpose of the SPASE Data Model is to enable the creation of "Virtual Observatories" that will link the broad range of heliophysics resources which may be available in a loosely coupled distributed environment. Additional goals of the data model are to:

Provide a way of registering products using a standard set of terms that allow the products to be found with simple searches and described so that users can determine their utility for a specific purpose;
Allow searching for products containing particular physical quantities (e.g., magnetic field; spectral irradiance) that are variously represented in a diverse array of data products; and
Facilitate a means of mapping comparable variables from many products onto a common set of terms so that visualization, analysis, and higher-order query tools and services can be used on all of them without regard to the origin of the data.

The content of a resource description based on the data model should enable services (either at the provider or in a VxO) to discover and access individual resources. The service layer can contain services for a variety of purposes. The basic functionality of the service layer is to provide the links necessary to connect user applications and search-and-retrieval front ends to data repositories. Ultimately, the data environment based on the data model will involve a number of software tools and services linked together as an internet-based environment. The data, along with software tools and documentation associated with products, will be directly accessible using standard web protocols (HTTP, FTP). This "system" has the potential to provide capabilities that can aid even expert users of a particular dataset (e.g., on-the-fly coordinate transformations, the ability to merge datasets from different instruments, easy reference to related indices or other data), in addition to providing the broad access needed to investigate emerging questions in heliophysics.

2.3. Design Principles

{{{pdfAddPageItem id="intro.principles"}}}

The design of the SPASE data model begins with a few basic principles. These principles are:

1. Data is self-documented.

Data resources have internal schemas or structures for storing values. The physical structure is determined by the storage format. Each retrievable entity in the format is assigned a key or tag which can be used to retrieve the entity.

The SPASE Data Model does not attempt to describe the physical storage of the parameters, for example, the byte offsets, record format or data encoding in the data resource. Instead, the SPASE Data Model describes the scientific attributes of the parameter and links this to the parameter by a key or tag used by the storage format. Applications can use the SPASE descriptions to locate a parameter and the appropriate format-specific reader to extract parameters.

Not all data in the heliophysics data environment is stored in self-documented formats; data stored as ASCII tables is one example. The method of assigning a key or tag name to each field in the ASCII table is external to the SPASE data model. This method must be part of a "format" specification, which may be as simple as the first row of the table containing the tag name of each field.
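As a hedged illustration of the simplest case described above, the following Python sketch treats the first row of a whitespace-delimited ASCII table as the tag names and uses one tag to extract a field. The table contents, the tag names "time" and "bx", and the helper read_column are hypothetical and are not defined by the SPASE data model.

```python
import io

# Hypothetical ASCII table: the first row carries the tag name of each field.
ASCII_TABLE = """\
time bx by bz
2003-01-01T00:00 1.2 -0.4 3.1
2003-01-01T00:01 1.3 -0.5 3.0
"""


def read_column(stream, key):
    """Return the values (as strings) of the field whose tag name equals `key`."""
    header = stream.readline().split()
    index = header.index(key)  # raises ValueError if the tag is unknown
    return [line.split()[index] for line in stream if line.strip()]


bx_values = read_column(io.StringIO(ASCII_TABLE), "bx")
print(bx_values)
```

An application holding a SPASE description would look up the key recorded for a parameter and pass it to a format-specific reader of this kind.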

2. Resources are distributed.

There are many providers of resources and these providers can be located anywhere in the world.

Each provider operates independently and activities are not necessarily coordinated. The SPASE data model assumes that providers have local autonomy and may operate under local rules or jurisdictions.

3. Online resources have Uniform Resource Locators (URLs).

If a resource is online, it can be accessed and retrieved using its Uniform Resource Locator (URL).

4. The data environment is continuously evolving.

New resources are actively generated either as part of an on-going experiment or as a result of analysis and assessment.

These new resources may be directly related to other resources. As new resources are generated or new associations are defined, the networks or collections formed will expand over time.

2.4. Conceptual System Environment

{{{pdfAddPageItem id="intro.concept"}}}

The data model is intended to enable the sharing of knowledge through structured metadata (SPASE Descriptions) which can be exchanged in queries and responses between systems. The operational environment this occurs in is the current Internet where systems and users are loosely coupled and highly distributed. Special services or portals may harvest (collect) the SPASE descriptions from multiple sources to create an enriched capability for the user. For example, a search engine may provide a comprehensive search for a particular scientific discipline. The web site https://hpde.gsfc.nasa.gov gives a guide to many currently active projects and a great deal of background information. Of particular interest there is the document entitled, "A Framework for Space and Solar Physics Virtual Observatories."

Figure 1 illustrates a conceptual architecture in a distributed environment. In this environment multiple communities have resources to share. The storage location of a resource is called a repository. Some of these repositories (boxes) have local SPASE descriptions which are available through a local registry service (balls). The contents of other repositories are described at external, possibly independent, locations which make the descriptions available through remote registries. Gateways (rings) can harvest and aggregate the resources from multiple registries or perform federated searches which provide a single access point to multiple registries. Applications access the registries to discover resources, determine their location and retrieve them from the repositories.
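The roles described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the Registry and Gateway classes, their method names, and the example identifiers and descriptions are all hypothetical, and real registries would be network services rather than in-memory dictionaries.

```python
class Registry:
    """Hypothetical registry: maps resource identifiers to descriptions."""

    def __init__(self, name, descriptions):
        self.name = name
        self.descriptions = dict(descriptions)

    def search(self, text):
        """Return the IDs whose description mentions `text` (case-insensitive)."""
        return [rid for rid, desc in self.descriptions.items()
                if text.lower() in desc.lower()]


class Gateway:
    """Hypothetical gateway: a single access point to several registries."""

    def __init__(self, registries):
        self.registries = list(registries)

    def harvest(self):
        """Aggregate a copy of every description from all registries."""
        merged = {}
        for registry in self.registries:
            merged.update(registry.descriptions)
        return merged

    def federated_search(self, text):
        """Forward the query to each registry and combine the answers."""
        hits = []
        for registry in self.registries:
            hits.extend(registry.search(text))
        return hits


# Made-up identifiers and descriptions, for illustration only.
local = Registry("local", {
    "spase://A/NumericalData/Example1": "Magnetic field data",
})
remote = Registry("remote", {
    "spase://B/Catalog/Example2": "Event list",
    "spase://B/NumericalData/Example3": "Magnetic field averages",
})
gateway = Gateway([local, remote])
print(len(gateway.harvest()), gateway.federated_search("magnetic"))
```

Harvesting copies the descriptions into one place, while a federated search leaves them where they are and queries each registry in turn; the sketch shows both options side by side.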

3. SPASE Data Model

{{{pdfAddPageItem id="model"}}}

3.1. Resource Types

{{{pdfAddPageItem id="model.types"}}}

The top level entity in the SPASE data model is a Resource. There are 12 different types of resources. Each resource type consists of a set of attributes that characterize the resource. The resource types can be divided into three categories: Data Resources, Origination Resources and Infrastructure Resources.

This section provides an overview of the resource types. Complete details for each resource can be found in Section 4.

3.1.1. Data Resources

{{{pdfAddPageItem id="model.types.data"}}}

Data Resources describe one or more data products. A "data product" is a set of data that is uniformly processed and formatted, from one or more instruments, typically spanning the full duration of the observations of the relevant instrument(s). A data product may consist of a collection of granules covering successive time spans, or may be a higher-level entity such as an event catalog. Data products can be images (Display Data), sample or observation values (Numerical Data), or event lists (Catalog). Also included in the Data Resource category are the resources used to describe individual files (Granule) that are part of data product sets and assessments of a resource (Annotation). The complete list of Data Resources is:

Numerical Data,
Display Data,
Catalog,
Annotation,
Document, and
Granule

3.1.2. Origination Resources

{{{pdfAddPageItem id="model.types.origin"}}}

Origination Resources describe the generators or sources of data. Included in a Data Resource description is information about the origination of the data. A Data Resource will refer to one or more Origination Resources. The complete list of Origination Resources is:

Observatory,
Instrument, and
Person

3.1.3. Infrastructure Resources

{{{pdfAddPageItem id="model.types.infrastructure"}}}

Infrastructure Resources describe system components that are part of the exchange and use of data. This includes storage locations for data (Repository), metadata (Registry) and functions (Service). The complete list of Infrastructure Resources is:

Registry,
Repository, and
Service
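The three categories and their members, as listed in Sections 3.1.1 through 3.1.3, can be captured in a small lookup table. The Python sketch below is one possible representation; the names RESOURCE_CATEGORIES and category_of are illustrative and not part of the data model.

```python
# Resource types grouped by category, as listed in Sections 3.1.1 to 3.1.3.
RESOURCE_CATEGORIES = {
    "Data": ["Numerical Data", "Display Data", "Catalog",
             "Annotation", "Document", "Granule"],
    "Origination": ["Observatory", "Instrument", "Person"],
    "Infrastructure": ["Registry", "Repository", "Service"],
}


def category_of(resource_type):
    """Return the category name for a resource type, or None if unknown."""
    for category, types in RESOURCE_CATEGORIES.items():
        if resource_type in types:
            return category
    return None


total_types = sum(len(types) for types in RESOURCE_CATEGORIES.values())
print(total_types, category_of("Granule"))
```

The total across the three lists matches the 12 resource types stated in Section 3.1.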

3.1.4. Ontology

{{{pdfAddPageItem id="model.types.ontology"}}}

In the SPASE data model there can be associations between pairs of resources. Some associations are specific and are required in order to fully describe a resource. For example, an Instrument resource is always associated with an Observatory resource. The specific associations form an ontology which is illustrated in Figure 2. The SPASE data model also allows associations of resources which are not explicitly defined in the ontology. These associations are described and assigned a relationship type using generic association attributes.

3.2. Resource Identifiers

{{{pdfAddPageItem id="model.id"}}}

Every resource has a unique identifier so that it can be tracked and referenced within a system. This identifier is defined by the naming authority for the resource. The entity which acts as the naming authority is determined by the agency or group who provides the resource. Each resource identifier is a URI that has the form

scheme://authority/path

where "scheme" is "spase" for those resources administered through the SPASE framework, "authority" is the unique identifier for the naming authority within the data environment and "path" is the unique local identifier of the resource within the context of the "authority". The resource ID must be unique within the data environment.

To illustrate the definition of a resource identifier consider that there is a registered "authority" called "SMWG" which maintains information for spacecraft (Observatory) resources. One such spacecraft is GOES8. Now "SMWG" decides that the "path" to the GOES8 resource description should include the Resource Type as part of the path and that the observatory "name" will be "GOES8". So, the resource identifier would be:

spase://SMWG/Observatory/GOES8

The Resource ID is used to formally or informally associate one resource with another. For example an Instrument resource must be formally associated with an Observatory. A Numerical Data resource may be formally associated with an Instrument resource and informally associated with other Numerical Data resources. The free association of resources allows networks or collections to be formed from distributed resources and allows for new associations to be formed as needed without affecting existing associations.
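Because a resource identifier follows the generic scheme://authority/path form, it can be taken apart with a standard URI parser. The following Python sketch uses urllib.parse.urlsplit from the standard library; the helper name split_resource_id is hypothetical, and the sketch checks only the overall form, not whether the authority is actually registered.

```python
from urllib.parse import urlsplit


def split_resource_id(resource_id):
    """Split a resource identifier into (scheme, authority, path)."""
    parts = urlsplit(resource_id)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not of the form scheme://authority/path: {resource_id!r}")
    # Drop the leading "/" so the path is the local identifier only.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


scheme, authority, path = split_resource_id("spase://SMWG/Observatory/GOES8")
print(scheme, authority, path)
```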

3.3. Core Attributes

{{{pdfAddPageItem id="model.core"}}}

With the exception of Granule and Person, every resource has a common set of core attributes. The core attributes provide textual descriptions of the resource and the capability to reference external sources of information (Information URL). They also describe the context of the resource in the larger data environment. This context consists of associations with other resources (Association) and with previous versions (Prior ID). These attributes are grouped in a Resource Header, which consists of:

Resource Name
Alternate Name
Release Date
Expiration Date
Description
Acknowledgement
Contact
Information URL
Association
Prior ID

3.4. Text Mark-up

{{{pdfAddPageItem id="model.markup"}}}

While descriptive text may be brief, some formatting of the text may be needed to convey the intended information, for example, multiple paragraphs or nested lists. To ensure system portability, text values in SPASE are sequences of alphanumeric, one-byte UTF-8 (US-ASCII) characters with white space preserved. When text is displayed in some applications (a web browser is the best example), a strict preservation of white space may not result in a desirable presentation. Also, to make the metadata more human readable (for example in XML), additional white space may be introduced in the form of indentation. If strictly preserved, this could result in an undesirable presentation. To allow an author to express a preferred layout for the text, a special set of text "mark-up" rules is defined. The layout can then be determined by normalizing the text and applying a simple set of interpretation rules.

3.4.1. Normalization Rules

{{{pdfAddPageItem id="model.markup.normalization"}}}

To aid in determining the layout or structural intent of the author the following rules are to be applied to text to create a normalized form:

All lines are to end with a newline character.
All text is left justified. No line has leading whitespace.
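A minimal Python sketch of these two rules follows. The function name normalize is illustrative, and the sketch assumes that "leading whitespace" means any leading spaces or tabs; trailing whitespace is left alone because the rules do not mention it.

```python
def normalize(text):
    """Apply the two normalization rules to a block of text."""
    # splitlines() drops the existing line endings; each line is then
    # left-justified and terminated with exactly one newline character.
    return "".join(line.lstrip() + "\n" for line in text.splitlines())


raw = "   First paragraph,\n      indented in the XML.\n\n   * item"
print(repr(normalize(raw)))
```

Blank lines survive normalization as empty lines, so paragraph and list boundaries remain detectable afterwards.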

Text Interpretation Rules

{{{pdfAddPageItem id="model.markup.interpret"}}}

After normalization of text the following rules can be used to interpret the layout intent of the author.

Blank lines indicate paragraph breaks.
Lists
1. Must be preceded by a blank line.
2. Items are indicated by a line beginning with a reserved character followed by a space. Three levels of lists are supported. The reserved characters are:
  * : First level list
  - : Second level list (must appear within a first level context)
  . : Third level list (must appear within a second level context)
3. End with a blank line.
Tables
1. Begin and end with a line that starts with "+--".
2. The first "row" of a table is the field headings.
3. Fields in a table are separated with a vertical bar ("|").
4. Visual row separators are lines which begin with "|--".
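The interpretation rules can be sketched as a small parser over already-normalized text. The Python below is one possible reading of the rules, not a reference implementation: the function names are hypothetical, blocks are assumed to be separated by blank lines, unmarked lines inside a list are treated as continuations of the previous item, and the nesting constraints on second and third level items are not enforced.

```python
LIST_MARKERS = {"*": 1, "-": 2, ".": 3}  # reserved character -> list level


def parse_list(lines):
    """Return (level, text) items; unmarked lines continue the previous item."""
    items = []
    for line in lines:
        if line[:1] in LIST_MARKERS and line[1:2] == " ":
            items.append((LIST_MARKERS[line[0]], line[2:].strip()))
        elif items:
            level, text = items[-1]
            items[-1] = (level, text + " " + line.strip())
    return items


def parse_table(lines):
    """Return rows of cells; the first row holds the field headings."""
    rows = []
    for line in lines:
        if line.startswith(("+--", "|--")):
            continue  # table boundary or visual row separator
        rows.append([cell.strip() for cell in line.strip().strip("|").split("|")])
    return rows


def interpret(normalized):
    """Group normalized text into (kind, content) blocks."""
    blocks = []
    chunk = []
    # The extra "" at the end flushes the last chunk.
    for line in normalized.splitlines() + [""]:
        if line.strip():
            chunk.append(line)
            continue
        if not chunk:
            continue
        first = chunk[0]
        if first.startswith("+--"):
            blocks.append(("table", parse_table(chunk)))
        elif first[:1] in LIST_MARKERS and first[1:2] == " ":
            blocks.append(("list", parse_list(chunk)))
        else:
            blocks.append(("paragraph", " ".join(chunk)))
        chunk = []
    return blocks


sample = (
    "Intro text\n"
    "spanning two lines.\n"
    "\n"
    "* first\n"
    "- nested\n"
    "* second\n"
    "\n"
    "+--\n"
    "| Name | Units |\n"
    "|--\n"
    "| Bx | nT |\n"
    "+--\n"
)
blocks = interpret(sample)
print(blocks)
```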

3.5. Extensions

{{{pdfAddPageItem id="model.extensions"}}}

The SPASE Data Model allows for additional metadata to be embedded within a SPASE description. Every Resource Type has an "Extension" element which can contain metadata compliant with other data models. The "Extension" element has a SPASE data model type of "Text", but is not limited to alphanumeric characters and may contain tagged information.

5. Examples

{{{pdfAddPageItem id="examples"}}}

As an example let us describe a person using SPASE metadata. This person is "John Smith" from Smith Foundation. While the SPASE data model is implementation neutral, XML representation is preferred. This example uses the SPASE XML form.

 <?xml version="1.0" encoding="UTF-8" ?> 
 <Spase>
   <Version>2.0.0</Version>
   <Person>
      <ResourceID>spase://person/jsmith@smith.org</ResourceID>
      <PersonName>John Smith</PersonName>
      <OrganizationName>Smith Foundation</OrganizationName>
      <Address>1 Main St., Smithville, MA</Address>
      <Email>jsmith@smith.org</Email>
      <PhoneNumber>1-800-555-1212</PhoneNumber>
   </Person>
 </Spase>
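A description in this form can be read with any XML parser. The Python sketch below uses the standard library xml.etree.ElementTree on an abridged copy of the Person example; it assumes, as in the example, that the elements carry no XML namespace, which may not hold for descriptions produced by other tools.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Person description, without the XML declaration.
PERSON_XML = """\
<Spase>
   <Version>2.0.0</Version>
   <Person>
      <ResourceID>spase://person/jsmith@smith.org</ResourceID>
      <PersonName>John Smith</PersonName>
      <OrganizationName>Smith Foundation</OrganizationName>
      <Email>jsmith@smith.org</Email>
   </Person>
</Spase>
"""

root = ET.fromstring(PERSON_XML)
person = root.find("Person")
# Collect each child element of <Person> as a tag -> text mapping.
record = {child.tag: child.text for child in person}
print(root.findtext("Version"), record["PersonName"])
```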

For a more extensive example, let us consider a collection of numerical data from the magnetometer on the ACE spacecraft. This data set has been averaged to 1-minute intervals (cadence) and spans the beginning of the mission to the end of 2004 (1997-09-01 through 2004-12-31). The ACE spacecraft orbits the L1 point between the Earth and the Sun. As in the previous example, the SPASE XML form is used. The presented URLs are fictitious and will not direct you to the actual data.

 <?xml version="1.0" encoding="UTF-8" ?> 
 <Spase>
   <Version>2.0.0</Version>
   <NumericalData>
      <ResourceID>spase://VMO/NumericalData/ACE/MAG/200301</ResourceID>
      <ResourceHeader>
         <ResourceName>ACEMAG200301</ResourceName>
         <ReleaseDate>2006-07-26T00:00:00.000</ReleaseDate>
         <Acknowledgement>                                             
            User will acknowledge the data producer and instrument P.I. in any     
            publication resulting from the use of these data.
         </Acknowledgement>
       <Description>
          ACE MFI 1-minute averaged magnetic-field data in GSE coordinates
          from Jan 2003. These data have been derived from the 16-second
          resolution ACE MFI data, which were linearly interpolated to a 1-minute
          time grid with time stamps at second zero of each minute.
       </Description>
       
       <Contact>
          <Role>PrincipalInvestigator</Role>
          <PersonID>spase://SMWG/Person/Norman.F.Ness</PersonID>
       </Contact>
     
       <Contact>
          <Role>CoInvestigator</Role>
          <PersonID>spase://SMWG/Person/Charles.Smith</PersonID>
       </Contact>
     
       <Contact>
          <Role>DataProducer</Role>
          <PersonID>spase://SMWG/Person/James.M.Weygand</PersonID>
       </Contact>
    </ResourceHeader>
    
    <AccessInformation>
       <AccessRights>Open</AccessRights>
       <AccessURL>
          <URL>http://www.igpp.ucla.edu/getResource?format=text&amp;id=spase://UCLA/ACEMAG200301</URL>
       </AccessURL>
       <Format>Text</Format>
       <Encoding>GZIP</Encoding>
    </AccessInformation>
  
    <InstrumentID>spase://SMWG/ACE/MAG</InstrumentID>
    <MeasurementType>MagneticField</MeasurementType>
   
    <TemporalDescription>
       <TimeSpan>
         <StartDate>1997-09-01T00:00</StartDate>
         <StopDate>2004-12-31T23:59</StopDate>
       </TimeSpan>
       <Cadence>PT1M</Cadence>
    </TemporalDescription>
  
    <InstrumentRegion>Heliosphere.NearEarth</InstrumentRegion>
    <ObservedRegion>Heliosphere.NearEarth</ObservedRegion>
  
    <Parameter>
       <Name>SAMPLE_TIME_UTC</Name>
       <ParameterKey>time</ParameterKey>
       <Description>
        Sample UTC in the form DD MM YYYY hh mm ss where
          DD   = day of month (01-31)
          MM   = month of year (01-12)
          YYYY = Gregorian Year AD
          hh   = hour of day     (00-23)
          mm   = minute of hour  (00-59)
          ss   = second of minute (00-60).
       </Description>
       <Support>
         <SupportQuantity>Temporal</SupportQuantity>
       </Support>
    </Parameter>
  
    <Parameter>
       <Name>MAGNETIC_FIELD_VECTOR</Name>
       <Units>nT</Units>
       <CoordinateSystem>
          <CoordinateRepresentation>Cartesian</CoordinateRepresentation>
          <CoordinateSystemName>GSE</CoordinateSystemName>
       </CoordinateSystem>
       <Description>
           Magnetic field vector in GSE Coordinates (Bx, By, Bz).
       </Description>
       <Field>
          <Qualifier>Vector</Qualifier>
          <FieldQuantity>Magnetic</FieldQuantity>
       </Field>
    </Parameter>
  
    <Parameter>
       <Name>SPACECRAFT_POSITION_VECTOR</Name>
       <CoordinateSystem>
          <CoordinateRepresentation>Cartesian</CoordinateRepresentation>
          <CoordinateSystemName>GSE</CoordinateSystemName>
       </CoordinateSystem>
       <Units>EARTH RADII</Units>
       <UnitsConversion>6378.16 km</UnitsConversion>
       <Description>
          ACE spacecraft location in GSE coordinates (X,Y,Z).
       </Description>
       <Support>
         <SupportQuantity>Positional</SupportQuantity>
       </Support>
    </Parameter>

   </NumericalData>
 </Spase>
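To show how an application might use such a description, the Python sketch below reads an abridged excerpt of the example, converts the cadence to seconds, and collects the units of each parameter. The helper cadence_seconds is hypothetical and handles only time-only ISO 8601 durations such as "PT1M"; the excerpt again assumes elements without an XML namespace.

```python
import re
import xml.etree.ElementTree as ET

# Abridged excerpt of the NumericalData example.
EXCERPT = """\
<Spase>
   <NumericalData>
      <TemporalDescription>
         <Cadence>PT1M</Cadence>
      </TemporalDescription>
      <Parameter>
         <Name>MAGNETIC_FIELD_VECTOR</Name>
         <Units>nT</Units>
      </Parameter>
      <Parameter>
         <Name>SPACECRAFT_POSITION_VECTOR</Name>
         <Units>EARTH RADII</Units>
      </Parameter>
   </NumericalData>
</Spase>
"""


def cadence_seconds(duration):
    """Convert a time-only ISO 8601 duration such as "PT1M" to seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {duration!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


data = ET.fromstring(EXCERPT).find("NumericalData")
cadence = cadence_seconds(data.findtext("TemporalDescription/Cadence"))
# Map each parameter name to its units.
units = {p.findtext("Name"): p.findtext("Units") for p in data.findall("Parameter")}
print(cadence, units)
```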

1. Executive Summary

2. Introduction

2.1. History of Development

2.2. Intended Purpose

2.3. Design Principles

2.4. Conceptual System Environment

3. SPASE Data Model

3.1. Resource Types

3.1.1. Data Resources

3.1.2. Origination Resources

3.1.3. Infrastructure Resources

3.1.4. Ontology

3.2. Resource Identifiers

3.3. Core Attributes

3.4. Text Mark-up

3.4.1. Normalization Rules

Text Interpretation Rules

3.5. Extensions

4. Guidelines for Metadata Descriptions

5. Examples

6. Element Data Types

7. Enumerations

8. Data Model Tree

9. Dictionary

10. History

11. Bibliography

12. Appendix A - Comparison of Spectrum Domains