Research in heliophysics requires information from multiple sources, including data from and about spacecraft, ground-based observatories, models, simulations and more. The results of research are also invaluable in building up a body of knowledge and need to be available. All the different sources and types of information are considered "Resources". Resources exist and are shared, exchanged and used in a framework called the "data environment". The SPASE (Space Physics Archive Search and Extract) group has defined a Data Model, which is a set of terms and values, along with the relationships between them, that allows describing all the resources in a heliophysics data environment. It is the result of many years of effort by an international collaboration of heliophysicists and information scientists to unify and improve on existing Space and Solar Physics data models. The intent of this Data Model is to provide the means to describe resources, most importantly scientifically useful data products, in a uniform way so they may be easily registered, found, accessed, and used.
The Data Model provides enough detail to allow a scientist to understand the content of Data Products (e.g., a set of files for 3-second resolution Geotail magnetic field data for 1992 to 2005), together with essential retrieval and contact information. It also allows for the incremental annotation of resources with expert assessments and the free association of resources to create bundles or networks of resources. Resource descriptions can be stored with the data or at remote locations. Sites can harvest the resource descriptions to enable services like a search engine or portal (Virtual Observatory). A typical use would be to have a collection of descriptions stored in one or more related internet-based registries of products that can be queried with specifically designed search engines and ultimately link users to the data they need. The Data Model also provides constructs for describing components of such a data delivery system, including repositories, registries and services.
The SPASE group website is located at https://www.spase-group.org/
A PDF version of this document can be downloaded from the SPASE site.
The SPASE (Space Physics Archive Search and Extract) Data Model is a set of terms and values along with the relationships between them that allow describing all the resources in a heliophysics data environment. It is the result of many years of effort by an international collaboration (see https://spase-group.org) to unify and improve on existing Space and Solar Physics data models. The intent of this Data Model is to provide the means to describe resources, most importantly scientifically useful data products, in a uniform way so they may be easily registered, found, accessed, and used.
The SPASE data model divides the heliophysics data environment into a limited set of resource types. A key resource type is Numerical Data. This type of resource typically consists of a set of files that contain values of one or more physical variables and that differ from each other only by the time span. Fully describing a Numerical Data resource requires other types of Resources, namely Observatory, Instrument, Person, and Repository, whose names are self-explanatory and each of which has its own set of attributes. Often, numerical data are presented in prepared images (GIF or JPEG), and such presentations are referred to as Display Data resources. The other data-related resource types are Catalog, which lists events; Annotation, which enables expert comments on data products; and Granule, which describes individual files within another resource (i.e., Numerical Data, Display Data or Catalog). Other types of resources include Document, which can contain narratives or supporting information; Service, which provides software to use data resources; Repository, for storage locations; and Registry, for metadata collections. Resource descriptions and the links in them are intended to make the Resource useful to scientific users.
The data model presented here has grown from efforts begun in 2002 that became formalized in regular teleconferences of a group of interested data providers, including scientific and technical representatives of some of the largest data holdings in the US, Europe, and Japan. As the effort to provide seamless access to distributed data proceeded, it became clear that the data model efforts were central. The SPASE Data Model was developed with an iterative process in which additions were made when unaddressed needs were discovered. The original impetus occurred at an ISTP meeting in 1998, where a resolution was passed calling for data to be made more accessible. Interoperability test beds were constructed in 2001, and in 2002 a grassroots effort was undertaken to define the needs of the community. In March of 2003 a meeting of many of the people in the Contributors list at the beginning of this document was convened to begin the data model construction in earnest. The initial effort involved collecting terms from CDPP, SWRI, NSSDC, ISTP, and other sets to form a starting point. Two years of teleconferences, e-mailed revisions, and occasional face-to-face efforts, along with the application of the terms to specific cases, led to the release of version 1.0 of the data model in November 2005. Following the release of version 1.0, many existing data products were described, which led to further improvements of the data model. Version 1.1 was released in August 2006. At this time NASA established the Heliophysics VxOs, and after an extended period of use and improvements version 1.2.2 was released in August of 2008. The version of the data model described in this document is an extension of this earlier release.
The design of the SPASE data model is based on a core set of principles related to the intended purpose of descriptive information (metadata), the data environment, and the operational environment. The overall goal of the Data Model is to be able to describe resources using a taxonomy of terms familiar to the heliophysics domain. This taxonomy should provide sufficient scientific context and data content information for an individual to assess the applicability of the resource (data and metadata) to a research question. A data model is the cornerstone of an information system, and one purpose of the SPASE Data Model is to enable the creation of "Virtual Observatories" that will link the broad range of heliophysics resources which may be available in a loosely coupled distributed environment. Additional goals of the data model are to:
The content of a resource description based on the data model should enable services (either at the provider or in a VxO) to discover and access individual resources. The service layer can contain services for a variety of purposes. The basic functionality of the service layer is to provide the links necessary to connect user applications and search-and-retrieval front ends to data repositories. Ultimately, the data environment based on the data model will involve a number of software tools and services linked together as an internet-based environment. The data along with software tools and documentation associated with products will be directly accessible using standard web protocols (http, ftp). This "system" has the potential to provide capabilities that can aid even expert users of a particular dataset (e.g., on-the-fly coordinate transformations, the ability to merge datasets from different instruments, easy reference to related indices or other data), in addition to providing the broad access needed to investigate emerging questions in heliophysics.
The design of the SPASE data model begins with a few basic principles. These principles are:
Data resources have internal schemas or structures for storing values. The physical structure is determined by the storage format. Each retrievable entity in the format is assigned a key or tag which can be used to retrieve the entity.
The SPASE Data Model does not attempt to describe the physical storage of the parameters, for example, the byte offsets, record format or data encoding in the data resource. Instead, the SPASE Data Model describes the scientific attributes of the parameter and links this to the parameter by a key or tag used by the storage format. Applications can use the SPASE descriptions to locate a parameter and the appropriate format-specific reader to extract parameters.
Not all data in the heliophysics data environment are stored in self-documented formats; data stored as ASCII tables are one example. The method of assigning a key or tag name to each field in an ASCII table is external to the SPASE data model. This method must be part of a "format" specification, which may be as simple as the first row of the table containing the tag name of each field.
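As a minimal sketch of the simplest convention mentioned above, assume a whitespace-delimited ASCII table whose first row holds the tag name of each field. An application could then locate a parameter by the key given in its SPASE description. The function name `read_column`, the sample table, and the tag names `time`, `bx` and `by` are hypothetical and are not defined by the SPASE data model.

```python
import io


def read_column(table, key):
    """Return the values of the column whose header tag equals `key`.

    Assumption (not part of SPASE): the table is whitespace-delimited and
    its first non-empty row lists the tag name of each field.
    """
    lines = [line for line in table.read().splitlines() if line.strip()]
    tags = lines[0].split()
    if key not in tags:
        raise KeyError(f"no field tagged {key!r}; available tags: {tags}")
    index = tags.index(key)
    # Values are returned as strings; type conversion is left to the caller.
    return [row.split()[index] for row in lines[1:]]


# Hypothetical three-field table: the first row carries the tag names.
sample = io.StringIO(
    "time bx by\n"
    "2003-01-01T00:00 1.5 -0.2\n"
    "2003-01-01T00:01 1.7 -0.1\n"
)
bx = read_column(sample, "bx")
```

The key passed to `read_column` plays the role of the key or tag recorded in the resource description, while the parsing itself belongs to the format-specific reader.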
There are many providers of resources and these providers can be located anywhere in the world.
Each provider operates independently and activities are not necessarily coordinated. The SPASE data model assumes that providers have local autonomy and may operate under local rules or jurisdictions.
If a resource is online, it can be accessed and retrieved using a Uniform Resource Locator (URL).
New resources are actively generated either as part of an on-going experiment or as a result of analysis and assessment.
These new resources may be directly related to other resources. As new resources are generated or new associations defined the network or collections formed will expand over time.
The data model is intended to enable the sharing of knowledge through structured metadata (SPASE Descriptions) which can be exchanged in queries and responses between systems. The operational environment this occurs in is the current Internet where systems and users are loosely coupled and highly distributed. Special services or portals may harvest (collect) the SPASE descriptions from multiple sources to create an enriched capability for the user. For example, a search engine may provide a comprehensive search for a particular scientific discipline. The web site https://hpde.gsfc.nasa.gov gives a guide to many currently active projects and a great deal of background information. Of particular interest there is the document entitled, "A Framework for Space and Solar Physics Virtual Observatories."
Figure 1 illustrates a conceptual architecture in a distributed environment. In this environment multiple communities have resources to share. The storage location of a resource is called a repository. Some of these repositories (boxes) have local SPASE descriptions which are available through a local registry service (balls). The contents of other repositories are described at external, possibly independent, locations which make the descriptions available through remote registries. Gateways (rings) can harvest and aggregate the resources from multiple registries or perform federated searches which provide a single access point to multiple registries. Applications access the registries to discover resources, determine their location and retrieve them from the repositories.
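The harvesting role of a gateway can be illustrated with a small in-memory sketch. It assumes, purely for illustration, that each registry is a dictionary mapping resource identifiers to descriptive text; the registry names, the function names `harvest` and `search`, and the description strings are hypothetical. Real registries would be queried over the network.

```python
def harvest(registries):
    """Merge descriptions from several registries into one index.

    `registries` maps a registry name to a dict of
    {resource identifier: description text}.
    """
    index = {}
    for registry_name, descriptions in registries.items():
        for resource_id, description in descriptions.items():
            # Keep the first description seen and record every registry
            # that holds a description of the same resource.
            entry = index.setdefault(
                resource_id, {"description": description, "registries": []}
            )
            entry["registries"].append(registry_name)
    return index


def search(index, text):
    """Return the sorted identifiers whose description contains `text`."""
    needle = text.lower()
    return sorted(
        resource_id
        for resource_id, entry in index.items()
        if needle in entry["description"].lower()
    )


# Two hypothetical registries with one overlapping description.
registries = {
    "LocalRegistry": {
        "spase://VMO/NumericalData/ACE/MAG/200301":
            "ACE MFI 1-minute averaged magnetic-field data",
    },
    "RemoteRegistry": {
        "spase://VMO/NumericalData/ACE/MAG/200301":
            "ACE MFI 1-minute averaged magnetic-field data",
        "spase://SMWG/ACE/MAG": "ACE magnetometer instrument",
    },
}
index = harvest(registries)
hits = search(index, "magnetic-field")
```

Aggregating into one index corresponds to harvesting; a federated search would instead forward the query to each registry and combine the answers.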
The top level entity in the SPASE data model is a Resource. There are 12 different types of resources. Each resource type consists of a set of attributes that characterize the resource. The resource types can be divided into three categories: Data Resources, Origination Resources and Infrastructure Resources.
This section provides an overview of the resource types. Complete details for each resource can be found in Section 4.
Data Resources describe one or more data products. A "data product" is a set of data that is uniformly processed and formatted, from one or more instruments, typically spanning the full duration of the observations of the relevant instrument(s). A data product may consist of a collection of granules of successive time spans, but may also be a high-level entity such as an event catalog. Data products can be images (Display Data), sample or observation values (Numerical Data), or event lists (Catalog). Included in the Data Resource category are the resources used to describe individual files (Granule) which are part of data product sets and assessments of a resource (Annotation). The complete list of Data Resources is:
Origination Resources describe the generators or sources of data. Included in a Data Resource description is information about the origination of the data. A Data Resource will refer to one or more Origination Resources. The complete list of Origination Resources is:
Infrastructure Resources describe system components that are part of the exchange and use of data. This includes storage locations for data (Repository), metadata (Registry) and functions (Service). The complete list of Infrastructure Resources is:
In the SPASE data model there can be associations between pairs of resources. Some associations are specific and are required in order to fully describe a resource. For example, an Instrument resource is always associated with an Observatory resource. The specific associations form an ontology which is illustrated in Figure 2. The SPASE data model also allows associations of resources which are not explicitly defined in the ontology. These associations are described and assigned a relationship type using generic association attributes.
Every resource has a unique identifier so that it can be tracked and referenced within a system. This identifier is defined by the naming authority for the resource. The entity which acts as the naming authority is determined by the agency or group that provides the resource. Each resource identifier is a URI that has the form
scheme://authority/path
where "scheme" is "spase" for those resources administered through the SPASE framework, "authority" is the unique identifier for the naming authority within the data environment and "path" is the unique local identifier of the resource within the context of the "authority". The resource ID must be unique within the data environment.
To illustrate the definition of a resource identifier, consider a registered "authority" called "SMWG" which maintains information for spacecraft (Observatory) resources. One such spacecraft is GOES8. Suppose "SMWG" decides that the "path" to the GOES8 resource description should include the Resource Type as part of the path and that the observatory "name" will be "GOES8". The resource identifier would then be:
spase://SMWG/Observatory/GOES8
The Resource ID is used to formally or informally associate one resource with another. For example an Instrument resource must be formally associated with an Observatory. A Numerical Data resource may be formally associated with an Instrument resource and informally associated with other Numerical Data resources. The free association of resources allows networks or collections to be formed from distributed resources and allows for new associations to be formed as needed without affecting existing associations.
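Because every association is expressed through a resource identifier, software that follows associations usually needs to split an identifier into its scheme, authority and path. The following is a minimal sketch under the three-part form described above; the names `ResourceID` and `parse_resource_id` are hypothetical helpers, and the sample identifier is the one used in the Numerical Data example later in this document.

```python
from typing import NamedTuple


class ResourceID(NamedTuple):
    scheme: str     # "spase" for resources administered through SPASE
    authority: str  # naming authority within the data environment
    path: str       # local identifier within the authority


def parse_resource_id(text):
    """Split 'scheme://authority/path' into its three components."""
    scheme, separator, rest = text.partition("://")
    if not separator or not scheme:
        raise ValueError(f"missing scheme in resource ID: {text!r}")
    authority, _, path = rest.partition("/")
    if not authority or not path:
        raise ValueError(f"missing authority or path in resource ID: {text!r}")
    return ResourceID(scheme, authority, path)


rid = parse_resource_id("spase://VMO/NumericalData/ACE/MAG/200301")
```

The sketch only checks the overall shape of the identifier. Whether an authority is actually registered, and whether the identifier is unique within the data environment, has to be verified against a registry.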
While descriptive text may be brief, some formatting of the text, such as multiple paragraphs or nested lists, may be needed to convey the information. To ensure system portability, text values in SPASE are sequences of alphanumeric one-byte UTF-8 (US-ASCII) characters with white space preserved. When text is displayed in some applications (a web browser is the best example), a strict preservation of white space may not result in a desirable presentation. Also, to make the metadata more human readable (for example in XML), additional white space may be introduced in the form of indentation. If strictly preserved, this could result in an undesirable presentation. To allow an author to express a preferred layout for the text, a special set of text "mark-up" rules is defined. The layout can then be determined by normalizing the text and applying a simple set of interpretation rules.
To aid in determining the layout or structural intent of the author the following rules are to be applied to text to create a normalized form:
After normalization of text the following rules can be used to interpret the layout intent of the author.
The SPASE Data Model allows for additional metadata to be embedded within a SPASE description. Every Resource Type has an "Extension" element which can contain metadata compliant with other data models. The "Extension" element has a SPASE data model type of "Text", but is not limited to alphanumeric characters and may contain tagged information.
The following sections describe the details of the SPASE Data Model, especially the metadata used to describe data. There is a richness in the available metadata that allows very detailed descriptions of products. Many of the types of metadata may not apply in your case or you may not need much detail to adequately describe your data holdings. But it must be remembered that the better data are described, the easier they will be to use.
To determine what level of detail is needed, we recommend considering not only what the user needs to find the correct data, but also what is necessary to know if the data will be useful for the requestor's purpose. The user might get this information by contacting you, but if the data were moved somewhere else and only the data description were available to determine the utility of the data, consider whether the user would have sufficient information to know if this is the right data set and what problems might be associated with the use of these data. Also consider whether additional documentation is necessary and, if so, create a Document resource and associate it with the data resource. An "Information URL" may also be used to provide links to more detailed information.
In summary, products need not be described in minute detail, but users will need, at minimum, information for assessing what the data products represent and where to find them. Of course it is also useful to include information on how the data can be applied and common pitfalls in their use, but the first need is to make the products usefully visible.
As an example let us describe a person using SPASE metadata. This person is "John Smith" from Smith Foundation. While the SPASE data model is implementation neutral, XML representation is preferred. This example uses the SPASE XML form.
<?xml version="1.0" encoding="UTF-8" ?>
<Spase>
<Version>2.0.0</Version>
<Person>
<ResourceID>spase://person/jsmith@smith.org</ResourceID>
<PersonName>John Smith</PersonName>
<OrganizationName>Smith Foundation</OrganizationName>
<Address>1 Main St., Smithville, MA</Address>
<Email>jsmith@smith.org</Email>
<PhoneNumber>1-800-555-1212</PhoneNumber>
</Person>
</Spase>
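A description like the one above can be read with any XML parser. The following sketch uses the Python standard library and assumes, as in the example, that the elements carry no XML namespace; the function name `person_summary` is hypothetical, and the XML declaration is omitted from the embedded string for brevity.

```python
import xml.etree.ElementTree as ET

# The Person description from the example above.
PERSON_XML = """<Spase>
<Version>2.0.0</Version>
<Person>
<ResourceID>spase://person/jsmith@smith.org</ResourceID>
<PersonName>John Smith</PersonName>
<OrganizationName>Smith Foundation</OrganizationName>
<Address>1 Main St., Smithville, MA</Address>
<Email>jsmith@smith.org</Email>
<PhoneNumber>1-800-555-1212</PhoneNumber>
</Person>
</Spase>"""


def person_summary(xml_text):
    """Return the Person attributes as a dict of {element name: text}."""
    root = ET.fromstring(xml_text)
    person = root.find("Person")
    if person is None:
        raise ValueError("description contains no Person resource")
    return {child.tag: (child.text or "").strip() for child in person}


summary = person_summary(PERSON_XML)
```

A dictionary is sufficient here because each element of this Person example appears once; a resource with repeated elements would need a list-valued structure instead.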
For a more extensive example let us consider a collection of numerical data from the magnetometer on the ACE spacecraft. This data set has been averaged to 1 minute intervals (cadence) and spans the beginning of the mission to the end of 2004 (1997-09-01 through 2004-12-31). The ACE spacecraft orbits the L1 point between the Earth and the Sun. While the SPASE data model is implementation neutral, XML representation is preferred. This example uses the SPASE XML form. The presented URLs are fictitious and will not direct you to the actual data.
<?xml version="1.0" encoding="UTF-8" ?>
<Spase>
<Version>2.0.0</Version>
<NumericalData>
<ResourceID>spase://VMO/NumericalData/ACE/MAG/200301</ResourceID>
<ResourceHeader>
<ResourceName>ACEMAG200301</ResourceName>
<ReleaseDate>2006-07-26T00:00:00.000</ReleaseDate>
<Acknowledgement>
User will acknowledge the data producer and instrument P.I. in any
publication resulting from the use of these data.
</Acknowledgement>
<Description>
ACE MFI 1-minute averaged magnetic-field data in GSE coordinates
from Jan 2003. These data have been derived from the 16 second
resolution ACE MFI which were linearly interpolated to a 1-minute
time grid with time stamps at second zero of each minute.
</Description>
<Contact>
<Role>PrincipalInvestigator</Role>
<PersonID>spase://SMWG/Person/Norman.F.Ness</PersonID>
</Contact>
<Contact>
<Role>CoInvestigator</Role>
<PersonID>spase://SMWG/Person/Charles.Smith</PersonID>
</Contact>
<Contact>
<Role>DataProducer</Role>
<PersonID>spase://SMWG/Person/James.M.Weygand</PersonID>
</Contact>
</ResourceHeader>
<AccessInformation>
<AccessRights>Open</AccessRights>
<AccessURL>
<URL>http://www.igpp.ucla.edu/getResource?format=text&amp;id=spase://UCLA/ACEMAG200301</URL>
</AccessURL>
<Format>Text</Format>
<Encoding>GZIP</Encoding>
</AccessInformation>
<InstrumentID>spase://SMWG/ACE/MAG</InstrumentID>
<MeasurementType>MagneticField</MeasurementType>
<TemporalDescription>
<TimeSpan>
<StartDate>1997-09-01T00:00</StartDate>
<StopDate>2004-12-31T23:59</StopDate>
</TimeSpan>
<Cadence>PT1M</Cadence>
</TemporalDescription>
<InstrumentRegion>Heliosphere.NearEarth</InstrumentRegion>
<ObservedRegion>Heliosphere.NearEarth</ObservedRegion>
<Parameter>
<Name>SAMPLE_TIME_UTC</Name>
<ParameterKey>time</ParameterKey>
<Description>
Sample UTC in the form DD MM YYYY hh mm ss where
DD = day of month (01-31)
MM = month of year (01-12)
YYYY = Gregorian Year AD
hh = hour of day (00-23)
mm = minute of hour (00-59)
ss = second of minute (00-60).
</Description>
<Support>
<SupportQuantity>Temporal</SupportQuantity>
</Support>
</Parameter>
<Parameter>
<Name>MAGNETIC_FIELD_VECTOR</Name>
<Units>nT</Units>
<CoordinateSystem>
<CoordinateRepresentation>Cartesian</CoordinateRepresentation>
<CoordinateSystemName>GSE</CoordinateSystemName>
</CoordinateSystem>
<Description>
Magnetic field vector in GSE Coordinates (Bx, By, Bz).
</Description>
<Field>
<Qualifier>Vector</Qualifier>
<FieldQuantity>Magnetic</FieldQuantity>
</Field>
</Parameter>
<Parameter>
<Name>SPACECRAFT_POSITION_VECTOR</Name>
<CoordinateSystem>
<CoordinateRepresentation>Cartesian</CoordinateRepresentation>
<CoordinateSystemName>GSE</CoordinateSystemName>
</CoordinateSystem>
<Units>EARTH RADII</Units>
<UnitsConversion>6378.16 km</UnitsConversion>
<Description>
ACE spacecraft location in GSE coordinates (X,Y,Z).
</Description>
<Support>
<SupportQuantity>Positional</SupportQuantity>
</Support>
</Parameter>
</NumericalData>
</Spase>
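To show how an application might use such a description, the sketch below reads the cadence and the parameter list from an abbreviated copy of the example. It assumes elements without an XML namespace and a cadence expressed as a time-only ISO 8601 duration such as `PT1M`; the names `parse_cadence` and `summarize` are hypothetical, and durations with date components (days, months, years) are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timedelta

# Abbreviated copy of the Numerical Data example above.
DESCRIPTION = """<Spase>
<NumericalData>
<ResourceID>spase://VMO/NumericalData/ACE/MAG/200301</ResourceID>
<TemporalDescription>
<Cadence>PT1M</Cadence>
</TemporalDescription>
<Parameter>
<Name>SAMPLE_TIME_UTC</Name>
<ParameterKey>time</ParameterKey>
</Parameter>
<Parameter>
<Name>MAGNETIC_FIELD_VECTOR</Name>
<Units>nT</Units>
</Parameter>
</NumericalData>
</Spase>"""

# Time-only ISO 8601 durations: optional hours, minutes and seconds.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_cadence(text):
    """Convert a time-only ISO 8601 duration (e.g. 'PT1M') to a timedelta."""
    match = _DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


def summarize(xml_text):
    """Return (cadence, [(parameter name, units or None), ...])."""
    data = ET.fromstring(xml_text).find("NumericalData")
    cadence = parse_cadence(data.findtext("TemporalDescription/Cadence"))
    parameters = [
        (parameter.findtext("Name"), parameter.findtext("Units"))
        for parameter in data.findall("Parameter")
    ]
    return cadence, parameters


cadence, parameters = summarize(DESCRIPTION)
```

Parameters without a Units element, such as the time tag, are reported with `None` so that the caller can distinguish "unitless" from an empty string.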
Each element in the SPASE Data Model has a data type. One design feature of the SPASE data model is that an element can contain either a value or other elements. Mixed content (elements and values together) is not allowed. This allows the data model to be implemented in a wider range of metadata languages. The following data types are supported:
Lists are either "open" or "closed". The items in a "closed" list are determined by the SPASE model, and the definition of each item is in the SPASE data dictionary. The items in an "open" list are determined by an external control authority. The URL for the control authority is indicated in the definition of each "open" list.
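The rule that an element holds either a value or other elements, never both, can be checked mechanically in the XML form. The sketch below is one possible check using the Python standard library; the function name `mixed_content_elements` and the two small test fragments are hypothetical. White space used only for indentation is ignored.

```python
import xml.etree.ElementTree as ET


def mixed_content_elements(root):
    """Return the tags of elements that hold both text and child elements."""
    offenders = []
    for element in root.iter():
        children = list(element)
        if not children:
            continue  # a leaf element may freely hold a value
        # Text before the first child is stored on the parent's `text`;
        # text after each child is stored on that child's `tail`.
        has_leading_text = bool((element.text or "").strip())
        has_trailing_text = any((child.tail or "").strip() for child in children)
        if has_leading_text or has_trailing_text:
            offenders.append(element.tag)
    return offenders


valid = ET.fromstring("<Contact>\n  <Role>DataProducer</Role>\n</Contact>")
mixed = ET.fromstring("<Contact>primary <Role>DataProducer</Role></Contact>")
valid_result = mixed_content_elements(valid)
mixed_result = mixed_content_elements(mixed)
```

Here `valid_result` is empty, while `mixed_result` names the `Contact` element because it carries the text "primary" alongside a child element.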