
 

 

 
© Guillaume Defossé
 
 

Foreword

This Technical Specification has been produced by the DGS Generation Partnership Project (DGS DDS).

The contents of the present document are subject to continuing work within the DGS and may change following formal DGS approval. Should the DGS modify the contents of the present document, it will be re-released by the DGS with an identifying change of release date and an increase in version number as follows:

x     the first digit:

1    presented to DGS for information;

2    presented to DGS for approval;

3    or greater indicates DGS approved document under change control.

y    the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z     the third digit is incremented when editorial only changes have been incorporated in the specification;

The DGS DDS transparent end-to-end packet-switched streaming service (DGS) specification consists of three documents, including the present document. The first contains the service requirements for the DGS, the second provides an overview of the DGS DDS, and the present document gives the details of the protocols and codecs used by the service.

Streaming refers to the ability of an application to play synchronised media streams like audio and video streams in a continuous way while those streams are being transmitted to the client over a data network.

Applications that can be built on top of streaming services can be classified into on-demand and live information delivery applications. Examples of the first category are music and news-on-demand applications. Live delivery of radio and television programs is an example of the second category.

The DGS  DDS  provides a framework for Internet Protocol (IP) based streaming applications in DGS networks.


The present document specifies the protocols and codecs for the DGS within the DGS DDS system. Protocols for control signalling, capability exchange, scene description, media transport and media encapsulations are specified. Codecs for speech, natural and synthetic audio, video, still images, bitmap graphics, vector graphics, timed text and text are specified.

The present document is applicable to IP based packet switched networks.

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

       References are either specific (identified by date of publication, edition number, version number, etc.) or non‑specific.

       For a specific reference, subsequent revisions do not apply.

       For a non-specific reference, the latest version applies.  In the case of a reference to a DGS DDS document (including a

3         Definitions and abbreviations

3.1        Definitions

For the purposes of the present document, the following terms and definitions apply:

continuous media: media with an inherent notion of time. In the present document speech, audio, video and timed text

discrete media: media that itself does not contain an element of time. In the present document all media not defined as continuous media

device capability description: a description of device capabilities and/or user preferences. Contains a number of capability attributes

device capability profile: same as device capability description

presentation description: contains information about one or more media streams within a presentation, such as the set of encodings, network addresses and information about the content

DGS client: client for the DGS packet switched streaming service based on the DGS DDS and/or HTTP standards, with possible additional DGS requirements according to the present document

DGS server: server for the DGS packet switched streaming service based on the DGS DDS and/or HTTP standards, with possible additional DGS requirements according to the present document

scene description: description of the spatial layout and temporal behaviour of a presentation. It can also contain hyperlinks

3.2        Abbreviations

For the purposes of the present document, the abbreviations given in DGS  and the following apply.

AAC                      Advanced Audio Coding

BIFS                       Binary Format for Scenes

DGS/DDS                   Composite Capability / Preference Profiles

DCT                       Discrete Cosine Transform

GIF                         Graphics Interchange Format

HTML                   Hyper Text Markup Language

ITU-T                    International Telecommunications Union – Telecommunications

JFIF                        JPEG File Interchange Format

MIDI                      Musical Instrument Digital Interface

MIME                    Multipurpose Internet Mail Extensions

MMS                     Multimedia Messaging Service

MP4                       MPEG-4 file format

PNG                       Portable Network Graphics

PSS                        Packet-switched Streaming Service

QCIF                      Quarter Common Intermediate Format

RDF                       Resource Description Framework

RTCP                     RTP Control Protocol

RTP                        Real-time Transport Protocol

RTSP                     Real-Time Streaming Protocol

SDP                        Session Description Protocol

SMIL                     Synchronised Multimedia Integration Language

SP-MIDI                Scalable Polyphony MIDI

SVG                        Scalable Vector Graphics

UAProf                  User Agent Profile

UCS-2                    Universal Character Set (the two octet form)

UTF-8                    Unicode Transformation Format (the 8-bit form)

UTF-16                  Unicode Transformation Format (the 16-bit form)

WML                     Wireless Markup Language

XHTML                eXtensible Hyper Text Markup Language

XML                      eXtensible Markup Language

Figure 1 shows the functional components of a DGS client. A further figure gives an overview of the protocol stack used in a DGS client and also shows a more detailed view of the packet based network interface. The functional components can be divided into control, scene description, media codecs and the transport of media and control data.

The control related elements are session establishment, capability exchange and session control (see clause 5).

-     Session establishment refers to methods to invoke a DGS session from a browser or directly by entering a URL in the terminal's user interface.

-     Capability exchange enables choice or adaptation of media streams depending on different terminal capabilities.

-     Session control deals with the set-up of the individual media streams between a DGS client and one or several DGS servers. It also enables control of the individual media streams by the user. It may involve VCR-like presentation control functions like start, pause, fast forward and stop of a media presentation.

The scene description consists of the spatial layout and a description of the temporal relation between the different media that are included in the media presentation. The former gives the layout of the different media components on the screen and the latter controls the synchronisation of the different media (see clause 8).

The PSS includes media codecs for video, still images, vector graphics, bitmap graphics, text, timed text, natural and synthetic audio, and speech (see clause 7).

Transport of media and control data consists of the encapsulation of the coded media and control data in a transport protocol (see clause 6). This is shown in figure 1 as the "packet based network interface" and displayed in more detail in the protocol stack figure.

5.1        Session establishment

Session establishment refers to the method by which a DGS client obtains the initial session description. The initial session description can be, for example, a presentation description, a scene description or just a URL to the content.

A DGS client shall support initial session descriptions specified in one of the following formats: SMIL, SDP, or plain RTSP URL.

In addition to rtsp://, the DGS client shall support URLs [4] to valid initial session descriptions starting with file:// (for locally stored files) and http:// (for presentation descriptions or scene descriptions delivered via HTTP). An example of a valid input to a DGS client is rtsp://mediaportal/morning_news.
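The scheme requirement above can be illustrated with a minimal sketch. The function name `classify_session_url` and the `SUPPORTED_SCHEMES` set are hypothetical names chosen for this example; only the three schemes themselves come from the text.

```python
from urllib.parse import urlsplit

# Schemes named in the text for initial session descriptions.
SUPPORTED_SCHEMES = {"rtsp", "file", "http"}


def classify_session_url(url: str) -> str:
    """Return the lower-cased scheme of `url`.

    Raises ValueError when the scheme is not one of the three
    schemes the text requires a client to support.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return scheme


# The example URL given in the text.
print(classify_session_url("rtsp://mediaportal/morning_news"))
```

A real client would go on to fetch and parse the description (SMIL, SDP, or a plain RTSP URL); that step is left out here.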

URLs can be made available to a DGS client in many different ways. It is out of the scope of this recommendation to mandate any specific mechanism. However, an application using the DGS shall at least support URLs of the above type, specified or selected by the user.

The preferred way would be to embed URLs to initial session descriptions within HTML or WML pages. Browser applications that support the HTTP protocol could then download the initial session description and pass the content to the DGS client for further processing. How exactly this is done is an implementation specific issue and out of the scope of this recommendation.

5.2        Capability exchange

5.2.1       General

Capability exchange is an important functionality in the DGS. It enables DGS servers to provide a wide range of devices with content suitable for the particular device in question. Another very important task is to provide a smooth transition between different releases of DGS. Therefore, DGS clients and servers should support capability exchange.

The specification of capability exchange for DGS is divided into two parts: the normative part contained in clause 5.2 and an informative part in clause A.4 in Annex A of the present document. The normative part gives all the necessary requirements that a client or server shall conform to when implementing capability exchange in the DGS. The informative part provides additional important information for understanding the concept and usage of the functionality. It is recommended to read clause A.4 in Annex A before continuing with clauses 5.2.2-5.2.7.

5.2.2   The device capability profile structure

A device capability profile is an RDF [41] document that follows the structure of the DGS/DDS framework [39] and the DGS/DDS application UAProf [40]. Attributes are used to specify device capabilities and preferences. A set of attribute names, permissible values and semantics constitutes a DGS/DDS vocabulary, which is defined by an RDF schema. For DGS the UAProf vocabulary is reused and an additional DGS specific vocabulary is defined. The details can be found in clause 5.2.3. The vocabulary schema defines the syntax of the attributes and, to some extent, their semantics. A DGS device capability profile is an instance of the schema (UAProf and/or the PSS specific schema) and shall follow the rules governing the formation of a profile given in the DGS DDS specification [39]. The profile schema shall also be governed by the rules defined in UAProf [40] chapter 7, 7.1, 7.3 and 7.4.

5.2.3       Vocabularies for DGS

Clause 5.2.3 specifies the attribute vocabularies to be used by the DGS capability exchange.

DGS servers should understand the attributes in both the streaming component of the DGS base vocabulary and the recommended attributes from the UAProf vocabulary [40]. A server may additionally support other UAProf attributes.

5.2.3.2            DGS base vocabulary

The DGS base vocabulary contains one component called "Streaming". A vocabulary extension to UAProf shall be defined as an RDF schema. This schema can be found in Annex F. The schema, together with the description of the attributes in the present clause, defines the vocabulary. The vocabulary is associated with an XML namespace, which combines a base URI with a local XML element name to yield a URI. Annex F provides the details.

All DGS attributes are put in a DGS specific component called "Streaming". The list of DGS attributes is as follows:

Attribute name:          AudioChannels

Attribute definition:   This attribute describes the stereophonic capability of the natural audio device.

Component:                Streaming

Type:                            Literal

Legal values:               "Mono", "Stereo"

Resolution rule:          Locked

EXAMPLE 1:        <AudioChannels>Mono</AudioChannels>

 

Attribute name:          MaxPolyphony

Attribute definition:   The MaxPolyphony attribute refers to the maximal polyphony that the synthetic audio device supports as defined in [44].

NOTE:       MaxPolyphony attribute can be used to signal the maximum polyphony capabilities supported by the DGS client. This is a complementary mechanism for the delivery of compatible SP-MIDI content and thus the DGS client is required to support Scalable Polyphony MIDI i.e. Channel Masking defined in [44].

Component:                Streaming

Type:                            Number

Legal values:               Integer between 5 and 24

Resolution rule:          Locked

EXAMPLE 2:                     <MaxPolyphony>8</MaxPolyphony>

 

Attribute name:          PssAccept

Attribute definition:   List of content types (MIME types) the DGS application supports. Both CcppAccept (SoftwarePlatform, UAProf) and PssAccept can be used, but if PssAccept is defined it has precedence over CcppAccept.

Component:                Streaming

Type:                            Literal (Bag)

Legal values:               List of MIME types with related parameters.

Resolution rule:          Append

EXAMPLE 3:        <PssAccept>
  <rdf:Bag>
    <rdf:li>audio/AMR-WB; octet-alignment</rdf:li>
    <rdf:li>application/smil</rdf:li>
  </rdf:Bag>
</PssAccept>

 

Attribute name:          PssAccept-Subset

Attribute definition:   List of content types for which the DGS application supports a subset. MIME types can in most cases effectively be used to express variations in support for different media types. Many MIME types, e.g. AMR-NB, have several parameters that can be used for this purpose. There may exist content types for which the DGS application only supports a subset and this subset cannot be expressed with MIME type parameters. In these cases the attribute PssAccept-Subset is used to describe support for a subset of a specific content type. If a subset of a specific content type is declared in PssAccept-Subset, this means that PssAccept-Subset has precedence over both PssAccept and CcppAccept. PssAccept and/or CcppAccept shall always include the corresponding content types for which PssAccept-Subset specifies subsets. This is to ensure compatibility with those content servers that do not understand the PssAccept-Subset attribute but do understand e.g. CcppAccept.

This is illustrated with an example. If PssAccept="audio/AMR", "image/jpeg" and PssAccept-Subset="JPEG-DGS", then "audio/AMR" and JPEG baseline are supported. "image/jpeg" in PssAccept is of no importance since it is related to "JPEG-DGS" in PssAccept-Subset. Subset identifiers and corresponding semantics shall only be defined by the DDS responsible for the present document. The following values are defined:

-     "JPEG-DGS": Only the two JPEG modes described in clause 7.5 of the present document are supported.

-     "SVG-Tiny"

-     "SVG-Basic"

Component:                Streaming

Type:                            Literal (Bag)

Legal values:               "JPEG-DGS", "SVG-Tiny", "SVG-Basic"

Resolution rule:          Append

EXAMPLE 4:        <PssAccept-Subset>
  <rdf:Bag>
    <rdf:li>JPEG-DGS</rdf:li>
  </rdf:Bag>
</PssAccept-Subset>
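The precedence between the three accept attributes can be sketched as follows. This is a minimal illustration, assuming the attribute names PssAccept, CcppAccept and PssAccept-Subset; the function name and the `subset_to_type` mapping are hypothetical, and only the "JPEG-DGS" to "image/jpeg" relation is taken from the example in the text.

```python
def effective_content_support(pss_accept, ccpp_accept, pss_accept_subset,
                              subset_to_type):
    """Return a dict mapping each content type to a subset id or None.

    None means the full content type is supported; a string means only
    the named subset of that type is supported.
    """
    # PssAccept, when defined, has precedence over CcppAccept.
    base = pss_accept if pss_accept else ccpp_accept
    support = {mime: None for mime in base}

    listed = set(pss_accept) | set(ccpp_accept)
    for subset in pss_accept_subset:
        mime = subset_to_type[subset]
        # The text requires the full content type to be listed as well,
        # so that servers unaware of subsets still see it.
        if mime not in listed:
            raise ValueError(f"{mime} must also appear in an accept list")
        # A declared subset has precedence over both accept lists.
        support[mime] = subset
    return support


result = effective_content_support(
    pss_accept=["audio/AMR", "image/jpeg"],
    ccpp_accept=[],
    pss_accept_subset=["JPEG-DGS"],
    subset_to_type={"JPEG-DGS": "image/jpeg"},
)
print(result)
```

MIME parameters are treated as part of the opaque type string here; a fuller implementation would parse them.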

 

Attribute name:          PssVersion

Attribute definition:   DGS version supported by the client.

Component:                Streaming

Type:                            Literal

Legal values:               "DGS DDS-R4", "DGS " and so forth.

Resolution rule:          Locked

EXAMPLE 5:        <PssVersion>DGS DDS-R4</PssVersion>

 

Attribute name:          RenderingScreenSize

Attribute definition:   The rendering size of the device's screen in units of pixels. The horizontal size is given followed by the vertical size.

Component:                Streaming

Type:                            Dimension

Legal values:               Two integer values equal to or greater than zero. A value equal to "0x0" means that there exists no possibility to render visual DGS presentations.

Resolution rule:          Locked

EXAMPLE 6:        <RenderingScreenSize>70x15</RenderingScreenSize>

 

Attribute name:          SmilBaseSet

Attribute definition:   Indicates a base set of SMIL 2.0 modules that the client supports.

Component:                Streaming

Type:                           Literal

Legal values:               Pre-defined identifiers. "SMIL-DGS-R4" indicates all SMIL 2.0 modules required for scene description support according to clause 8 of Release 4 of TS 26.234. "SMIL-DGS" indicates all SMIL 2.0 modules required for scene description support according to clause 8 of the present document.

Resolution rule:          Locked

EXAMPLE 7:        <SmilBaseSet>SMIL-DGS</SmilBaseSet>

 

Attribute name:          SmilModules

Attribute definition:   This attribute defines a list of SMIL 2.0 modules supported by the client. If the SmilBaseSet is used those modules do not need to be explicitly listed here. In that case only additional module support needs to be listed.

Component:                Streaming

Type:                            Literal (Bag)

Legal values:               SMIL 2.0 module names defined in the SMIL 2.0 recommendation [31], section 2.3.3, table 2.

Resolution rule:          Append

EXAMPLE 8:   <SmilModules>
  <rdf:Bag>
    <rdf:li>BasicTransitions</rdf:li>
    <rdf:li>MultiArcTiming</rdf:li>
  </rdf:Bag>
</SmilModules>

 

Attribute name:          VideoDecodingByteRate

Attribute definition:   If Annex G is not supported, the attribute has no meaning. If Annex G is supported, this attribute defines the peak decoding byte rate the DGS client is able to support. In other words, the DGS client fulfils the requirements given in Annex G with the signalled peak decoding byte rate. The values are given in bytes per second and shall be greater than or equal to 8000. According to Annex G, 8000 is the default peak decoding byte rate for the mandatory video codec profile and level (H.263 Profile 0 Level 10).

Component:                Streaming

Type:                            Number

Legal values:               Integer value greater than or equal to 8000.

Resolution rule:          Locked

EXAMPLE 9:        <VideoDecodingByteRate>16000</VideoDecodingByteRate>

 

Attribute name:          VideoInitialPostDecoderBufferingPeriod

Attribute definition:   If Annex G is not supported, the attribute has no meaning. If Annex G is supported, this attribute defines the maximum initial post-decoder buffering period of video. Values are interpreted as clock ticks of a 90-kHz clock. In other words, the value is incremented by one for each 1/90 000 seconds. For example, the value 9000 corresponds to 1/10 of a second of initial post-decoder buffering.

Component:                Streaming

Type:                            Number

Legal values:               Integer value equal to or greater than zero.

Resolution rule:          Locked

EXAMPLE 10:      <VideoInitialPostDecoderBufferingPeriod>9000</VideoInitialPostDecoderBufferingPeriod>
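The 90-kHz tick arithmetic used by this attribute can be checked with a short sketch. The helper names are hypothetical; exact fractions are used so that no rounding error enters the conversion.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # one tick per 1/90 000 of a second


def ticks_to_seconds(ticks: int) -> Fraction:
    """Convert 90-kHz clock ticks to seconds as an exact fraction."""
    if ticks < 0:
        raise ValueError("tick counts are zero or greater")
    return Fraction(ticks, CLOCK_HZ)


def seconds_to_ticks(seconds) -> int:
    """Convert seconds (int, Fraction or decimal string) to whole ticks."""
    return int(Fraction(seconds) * CLOCK_HZ)


print(ticks_to_seconds(9000))    # the example value from the text
print(seconds_to_ticks("0.1"))
```

Passing a decimal string such as "0.1" rather than a float keeps the conversion exact.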

 

Attribute name:          VideoPreDecoderBufferSize

Attribute definition:   This attribute signals if the optional video buffering requirements defined in Annex G are supported. It also defines the size of the hypothetical pre-decoder buffer defined in Annex G. A value equal to zero means that Annex G is not supported. A value equal to one means that Annex G is supported. In this case the size of the buffer is the default size defined in Annex G. A value equal to or greater than the default buffer size defined in Annex G means that Annex G is supported and sets the buffer size to the given number of octets.

Component:                Streaming

Legal values:               Integer value equal to or greater than zero. Values greater than one but less than the default buffer size defined in Annex G are not allowed.

Resolution rule:          Locked

EXAMPLE 11:      <VideoPreDecoderBufferSize>30720</VideoPreDecoderBufferSize>
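The three cases of VideoPreDecoderBufferSize (zero, one, or an explicit size) can be sketched as below. The function name is hypothetical, and the default buffer size is passed in as a parameter because its value is defined in the video buffering annex rather than in this clause; the figure 20480 used in the call is only a placeholder for illustration.

```python
def interpret_predecoder_buffer_size(value: int, default_size: int):
    """Return (supported, buffer_size_in_octets) for an attribute value.

    buffer_size_in_octets is None when video buffering is unsupported.
    """
    if value < 0:
        raise ValueError("value must be zero or greater")
    if value == 0:
        return (False, None)          # buffering annex not supported
    if value == 1:
        return (True, default_size)   # supported, default buffer size
    if value < default_size:
        # Values greater than one but below the default are not allowed.
        raise ValueError("value below the default buffer size")
    return (True, value)              # supported, explicit buffer size


# 20480 is a placeholder default, not a value taken from this clause.
print(interpret_predecoder_buffer_size(30720, default_size=20480))
```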

5.2.3.3            Attributes from UAProf

In the UAProf vocabulary [40] there are several attributes that are of interest for the DGS. The formal definition of these attributes is given in [40]. The following list of attributes is recommended for DGS applications:

 

Attribute name:                 BitsPerPixel

Component:                       HardwarePlatform

Attribute description:      The number of bits of colour or greyscale information per pixel

EXAMPLE 1:        <BitsPerPixel>8</BitsPerPixel>

 

Attribute name:                 ColorCapable

Component:                       HardwarePlatform

Attribute description:      Whether the device display supports colour or not.

EXAMPLE 2:        <ColorCapable>Yes</ColorCapable>

 

Attribute name:                 PixelAspectRatio

Component:                       HardwarePlatform

Attribute description:      Ratio of pixel width to pixel height

EXAMPLE 3:        <PixelAspectRatio>1x2</PixelAspectRatio>

 

Attribute name:                 PointingResolution

Component:                       HardwarePlatform

Attribute description:      Type of resolution of the pointing accessory supported by the device.

EXAMPLE 4:        <PointingResolution>Pixel</PointingResolution>

 

Attribute name:                 Model

Component:                       HardwarePlatform

Attribute description:      Model number assigned to the terminal device by the vendor or manufacturer

 

Attribute name:                 Vendor

Component:                       HardwarePlatform

Attribute description:      Name of the vendor manufacturing the terminal device

 

Attribute name:                 CcppAccept-Charset

Component:                       SoftwarePlatform

Attribute description:      List of character sets the device supports

EXAMPLE 7:        <CcppAccept-Charset>
  <rdf:Bag>
    <rdf:li>UTF-8</rdf:li>
  </rdf:Bag>
</CcppAccept-Charset>

 

Attribute name:                 CcppAccept-Encoding

Component:                       SoftwarePlatform

Attribute description:      List of transfer encodings the device supports

EXAMPLE 8:        <CcppAccept-Encoding>
  <rdf:Bag>
    <rdf:li>base64</rdf:li>
  </rdf:Bag>
</CcppAccept-Encoding>

 

Attribute name:                 CcppAccept-Language

Component:                       SoftwarePlatform

Attribute description:      List of preferred document languages

EXAMPLE 9:        <CcppAccept-Language>
  <rdf:Seq>
    <rdf:li>en</rdf:li>
    <rdf:li>se</rdf:li>
  </rdf:Seq>
</CcppAccept-Language>

5.2.4       Extensions to the DGS schema/vocabulary

The use of RDF enables an extensibility mechanism for DGS/DDS based schemas that addresses the evolution of new types of devices and applications. The DGS profile schema specification is going to provide a base vocabulary but in the future new usage scenarios might have need for expressing new attributes. If the base vocabulary is updated a new unique namespace will be assigned to the updated schema. The base vocabulary shall only be changed by the DGS responsible for the present document. All extensions to the profile schema shall be governed by the rules defined in [40] clause 7.7.

5.2.5   Signalling of profile information between client and server

When a DGS client or server supports capability exchange, it shall support the profile information transport over both HTTP and RTSP between client and server as defined in clause 9.1 (including its subsections) of the WAP 2.0 UAProf specification [40] with the following additions:

-     The "x-wap-profile" and "x-wap-profile-diff" headers may not be present in all HTTP or RTSP requests. That is, the requirement to send these headers in all requests has been relaxed.

-     The defined headers may be applied to both RTSP and HTTP.

-     The "x-wap-profile-diff" header is only valid for the current request. The reason is that DGS does not have the WSP session concept of WAP.

-     Push is not relevant for the DGS.

The following recommendations are made as to how and when profile information should be sent between client and server:

-     DGS content servers supporting capability exchange shall be able to receive profile information in all HTTP and RTSP requests.

-     The terminal should not send the "x-wap-profile-diff" header over the air-interface since there is no compression scheme defined.

-     RTSP: the client should send profile information in the DESCRIBE message. It may send it in any other request.

If the terminal has some prior knowledge about the file type it is about to retrieve, e.g. file extensions, the following apply: 

-     HTTP and SDP: when retrieving an SDP with HTTP the client should include profile information in the GET request. This way the HTTP server can deliver an optimised SDP to the client.

-     HTTP and SMIL: When retrieving a SMIL file with HTTP the client should include profile information in the GET request. This way the HTTP server can deliver an optimised SMIL presentation to the client. A SMIL presentation can include links to static media. The server should optimise the SMIL file so that links to the referenced static media are adapted to the requesting client. When the "x-wap-profile-warning" indicates that content selection has been applied (201-203) the DGS client should assume that no more capability exchange has to be performed for the static media components. In this case it should not send any profile information when retrieving static media to be included in the SMIL presentation. This will minimise the HTTP header overhead.
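The HTTP cases above can be sketched with the standard library. This is a hedged illustration: the function names and the example URLs are hypothetical, and quoting each profile URL in the header value follows a reading of UAProf [40] that should be checked against that specification. The request object is only built, not sent.

```python
from urllib.request import Request


def build_profile_request(url, profile_urls):
    """Build (but do not send) a GET request carrying profile URLs."""
    # Assumption: profile references are sent as quoted URLs,
    # comma-separated, in a single "x-wap-profile" header.
    header_value = ", ".join(f'"{u}"' for u in profile_urls)
    return Request(url, headers={"x-wap-profile": header_value}, method="GET")


def should_send_profile_for_static_media(warning_codes):
    """Return False once content selection (201-203) has been signalled."""
    return not any(201 <= code <= 203 for code in warning_codes)


req = build_profile_request(
    "http://mediaportal.example.invalid/morning_news.smil",
    ["http://profiles.example.invalid/device1"],
)
print(req.get_method(), req.full_url)
print(should_send_profile_for_static_media([201]))
```

Skipping the header for static media once a 201-203 warning has been seen is what keeps the HTTP header overhead down, as the text notes.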

5.2.6       Merging device capability profiles

Profiles need to be merged whenever the DGS server receives multiple device capability profiles. Multiple occurrences of attributes and default values make it necessary to resolve the profiles according to a resolution process.

The resolution process shall be the same as defined in UAProf [40] clause 6.4.1.

-     Resolve all indirect references by retrieving URI references contained within the profile.

-     Resolve each profile and profile-diff document by first applying attribute values contained in the default URI references and by second applying overriding attribute values contained within the category blocks of that profile or profile-diff.

-     Determine the final value of the attributes by applying the resolved attribute values from each profile and profile-diff in order, with the attribute values determined by the resolution rules provided in the schema. Where no resolution rules are provided for a particular attribute in the schema, values provided in profiles or profile-diffs are assumed to override values provided in previous profiles or profile-diffs.

When several URLs are defined in the "x-wap-profile" header and any attribute occurs more than once in these profiles, the attribute value in the second URL overrides, is overridden by, or is appended to the attribute value from the first URL (according to the resolution rule), and so forth. This is what is meant by "Determine the final value of the attributes by applying the resolved attribute values from each profile and profile-diff in order, with…" in the third bullet above. If the profile is completely or partly inaccessible or otherwise corrupted, the server should still provide content to the client. The server is responsible for delivering content optimised for the client based on the received profile in a best effort manner.
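The in-order merge described above can be sketched as follows. This is a simplified illustration, not the UAProf algorithm itself: profiles are modelled as plain dictionaries that have already been fetched and parsed, the rule table is a hypothetical excerpt, "Locked" is taken to mean that the first value seen is kept, and duplicate removal on "Append" is a choice made for this sketch.

```python
# Hypothetical excerpt of resolution rules, keyed by attribute name.
RESOLUTION_RULES = {
    "AudioChannels": "Locked",
    "MaxPolyphony": "Locked",
    "PssAccept": "Append",
    "SmilModules": "Append",
}


def merge_profiles(profiles, rules=RESOLUTION_RULES):
    """Merge parsed profiles in order, applying per-attribute rules.

    Attributes without a rule are overridden by later profiles.
    """
    merged = {}
    for profile in profiles:
        for name, value in profile.items():
            rule = rules.get(name, "Override")
            if name not in merged:
                merged[name] = list(value) if rule == "Append" else value
            elif rule == "Locked":
                continue  # the earlier value is kept
            elif rule == "Append":
                merged[name].extend(v for v in value if v not in merged[name])
            else:
                merged[name] = value  # later value overrides
    return merged


profiles = [
    {"AudioChannels": "Stereo", "PssAccept": ["audio/AMR"]},
    {"AudioChannels": "Mono", "PssAccept": ["application/smil"], "Model": "X1"},
]
print(merge_profiles(profiles))
```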

NOTE:       For the reasons explained in Annex A clause A.4.3 the usage of indirect references in profiles (using the DGS/DDS defaults element) is not recommended.

5.2.7       Profile transfer between the DGS server and the device profile server

The device capability profiles are stored on a device profile server and referenced with URLs. According to the profile resolution process in clause 5.2.6 of the present document, the DGS server ends up with a number of URLs referring to profiles and these shall be retrieved.

-     The device profile server shall support HTTP 1.1 for the transfer of device capability profiles to the DGS server.

-     If the DGS server supports capability exchange it shall support HTTP 1.1 for transfer of device capability profiles from the device profile server. A URL shall be used to identify a device capability profile.

-     Normal content caching provisions as defined by HTTP apply.

5.3.1       General

Continuous media is media that has an intrinsic time line. Discrete media, on the other hand, does not itself contain an element of time. In this specification speech, audio and video belong to the first category and still images and text to the latter.

Streaming of continuous media using RTP/UDP/IP (see clause 6.2) requires a session control protocol to set up and control the individual media streams. For the transport of discrete media (images and text), vector graphics, timed text and synthetic audio this specification adopts the use of HTTP/TCP/IP (see clause 6.3). In this case there is no need for a separate session set-up and control protocol since this is built into HTTP. This clause describes session set-up and control of the continuous media speech, audio and video.

5.3.2       RTSP

RTSP [5] shall be used for session set-up and session control. DGS clients and servers shall follow the rules for minimal on-demand playback RTSP implementations in appendix D of [5]. In addition to this:

-     DGS servers and clients shall implement the DESCRIBE method (see clause 10.2 in [5]);

-     DGS servers and clients shall implement the Range header field (see clause 12.29 in [5]);

-     DGS servers shall include the Range header field in all PLAY responses.
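As an illustration of the two requirements that are easiest to check on the wire, the sketch below builds a DESCRIBE request and tests whether a PLAY response carries a Range header. The function names are hypothetical and the message handling is deliberately minimal; it is not a full RTSP implementation.

```python
def build_describe(url: str, cseq: int) -> str:
    """Return a minimal RTSP DESCRIBE request asking for an SDP body."""
    lines = [
        f"DESCRIBE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        "Accept: application/sdp",
    ]
    # RTSP uses CRLF line endings and an empty line to end the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def play_response_has_range(response: str) -> bool:
    """Check whether a response's header block contains a Range field."""
    head = response.split("\r\n\r\n", 1)[0]
    header_lines = head.split("\r\n")[1:]  # skip the status line
    return any(line.lower().startswith("range:") for line in header_lines)


print(build_describe("rtsp://mediaportal/morning_news", 1))
print(play_response_has_range("RTSP/1.0 200 OK\r\nCSeq: 3\r\nRange: npt=0-\r\n\r\n"))
```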

5.3.3       SDP

RTSP requires a presentation description. SDP shall be used as the format of the presentation description for both DGS clients and servers. DGS servers shall provide and clients interpret the SDP syntax according to the SDP specification [6] and appendix C of [5]. The SDP delivered to the DGS client shall declare the media types to be used in the session using a codec specific MIME media type for each media. MIME media types to be used in the SDP file are described in clause 5.4 of the present document.

The SDP specification [6] requires certain fields to always be included in an SDP file. Apart from this, a DGS server shall always include the following fields in the SDP:

-     "a=control:" according to clauses C.1.1, C.2 and C.3 in [5];

-     "a=range:" according to clause C.1.5 in [5];

-     "a=rtpmap:" according to clause 6 in [6];

-     "a=fmtp:" according to clause 6 in [6].

The bandwidth field in SDP should be used to indicate to the DGS client the amount of bandwidth that is required for the session and the individual media in the presentation. Therefore, a DGS server should include the "b=AS:" field in the SDP (both on the session and media level) and a DGS client shall be able to interpret this field. For RTP based applications, AS gives the RTP "session bandwidth" (including UDP/IP overhead) as defined in section 6.2 of [9].

NOTE: The SDP parsers and/or interpreters shall be able to accept NULL values in the 'c=' field (e.g. 0.0.0.0 in the IPv4 case). This may happen when the media content does not have a fixed destination address. For more details, see Section C.1.7 of [5] and Section 6 of [6].
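A rough check for the fields listed above might look like the following sketch. It is a simplification: it looks for each attribute anywhere in the description rather than per media section, and the sample description, with its payload type, rates and control values, is invented for illustration. The function names are hypothetical.

```python
REQUIRED_ATTRIBUTES = ("control", "range", "rtpmap", "fmtp")

# Invented sample description; all values are placeholders.
SAMPLE_SDP = "\n".join([
    "v=0",
    "o=- 0 0 IN IP4 0.0.0.0",
    "s=Example",
    "c=IN IP4 0.0.0.0",
    "b=AS:77",
    "t=0 0",
    "a=control:*",
    "a=range:npt=0-60",
    "m=audio 0 RTP/AVP 97",
    "b=AS:13",
    "a=rtpmap:97 AMR/8000",
    "a=fmtp:97 octet-align=1",
    "a=control:streamID=0",
])


def missing_sdp_attributes(sdp: str):
    """Return the required attribute names that never appear in `sdp`."""
    present = set()
    for line in sdp.splitlines():
        if line.startswith("a=") and ":" in line:
            present.add(line[2:].split(":", 1)[0])
    return [name for name in REQUIRED_ATTRIBUTES if name not in present]


def bandwidth_as_values(sdp: str):
    """Return every "b=AS:" value in order of appearance."""
    prefix = "b=AS:"
    return [int(line[len(prefix):]) for line in sdp.splitlines()
            if line.startswith(prefix)]


print(missing_sdp_attributes(SAMPLE_SDP))
print(bandwidth_as_values(SAMPLE_SDP))
```

The first "b=AS:" value in the sample sits at session level and the second at media level, matching the recommendation to signal both.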

5.3.3.2            Additional SDP fields

The following media level SDP fields are defined for DGS:

-     "a=X-predecbufsize:<size of the hypothetical pre-decoder buffer>"
This gives the suggested size of the Annex G hypothetical pre-decoder buffer in bytes.

-     "a=X-initpredecbufperiod:<initial pre-decoder buffering period>"
This gives the required initial pre-decoder buffering period specified according to Annex G. Values are interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. For example, the value 180 000 corresponds to a two second initial pre-decoder buffering.

-     "a=X-initpostdecbufperiod:<initial post-decoder buffering period>"
This gives the required initial post-decoder buffering period specified according to Annex G. Values are interpreted as clock ticks of a 90-kHz clock.

-     "a=X-decbyterate:<peak decoding byte rate>"
This gives the peak decoding byte rate that was used to verify the compatibility of the stream with Annex G. Values are given in bytes per second.

If none of the attributes "a=X-predecbufsize:", "a=X-initpredecbufperiod:", "a=X-initpostdecbufperiod:", and "a=X-decbyterate:" is present, clients should not expect a packet stream according to Annex DGS. If at least one of the listed attributes is present, the transmitted video packet stream shall conform to Annex G. If at least one of the listed attributes is present, but some of the listed attributes are missing in a DGS/DDS description, clients should expect a default value for the missing attributes according to Annex DGS.
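The 90-kHz tick arithmetic and the attribute syntax above can be sketched as follows. This is an illustrative reading only: the function names are assumptions, the buffer size in the sample is a made-up value, and no default values are supplied for missing attributes because those come from the annex.

```python
CLOCK_HZ = 90_000  # the buffering periods are counted in ticks of a 90-kHz clock

BUFFER_ATTRIBUTES = (
    "X-predecbufsize",         # bytes
    "X-initpredecbufperiod",   # 90-kHz clock ticks
    "X-initpostdecbufperiod",  # 90-kHz clock ticks
    "X-decbyterate",           # bytes per second
)


def ticks_to_seconds(ticks: int) -> float:
    """Convert 90-kHz clock ticks to seconds."""
    return ticks / CLOCK_HZ


def buffer_attributes(media_lines: list[str]) -> dict:
    """Return the buffer attributes found among the lines of one media section."""
    found = {}
    for raw_line in media_lines:
        line = raw_line.strip()
        if not line.startswith("a=") or ":" not in line:
            continue
        name, value = line[2:].split(":", 1)
        if name in BUFFER_ATTRIBUTES:
            found[name] = int(value)
    return found


# Hypothetical media-level lines; 20480 is an invented buffer size.
lines = [
    "a=rtpmap:98 H263-2000/90000",
    "a=X-predecbufsize:20480",
    "a=X-initpredecbufperiod:180000",
]
attrs = buffer_attributes(lines)
print(attrs, ticks_to_seconds(attrs["X-initpredecbufperiod"]))
```

An empty result corresponds to the case where none of the attributes is present, so the client should not expect a stream that follows the buffering annex.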

5.4        MIME media types

For continuous media (speech, audio and video) the following MIME media types shall be used:

-     AMR narrow-band speech codec (see clause 7.2) MIME media type as defined in [11];

-     AMR wideband speech codec (see clause 7.2) MIME media type as defined in [11];

-     MPEG-4 AAC audio codec (see clause 7.3) MIME media type as defined in RFC 3016 [13]. When used in DGS the attribute "cpresent" SHALL be set to "0" indicating that the configuration information is only carried out of band in the DGS "config" parameter;

-     MPEG-4 video codec (see clause 7.4) MIME media type as defined in RFC 3016 [13]. When used in DGS the configuration information shall be carried out of band in the "config" DGS parameter and inband (as stated in RFC 3016). As described in RFC 3016, the configuration information sent inband and the config information in the DGS shall be the same, except that first_half_vbv_occupancy and latter_half_vbv_occupancy, if they exist, may vary in the configuration information sent inband;

-     H.263 [22] video codec (see clause 7.4) MIME media type as defined in annex C, clause C.1 of the present document.

MIME media types for JPEG, GIF, PNG, SP-MIDI, SVG, timed text and XHTML can be used both in the "Content-type" field in HTTP and in the "type" attribute in SMIL 2.0. The following MIME media types shall be used for these media:

-     JPEG (see clause 7.5) MIME media type as defined in [15];

-     GIF (see clause 7.6) MIME media type as defined in [15];

-     PNG (see subclause 7.6) MIME media type as defined in [38];

-     SP-MIDI (see subclause 7.3A) MIME media type as defined in clause C.2 in Annex C of the present document;

-     SVG (see subclause 7.7) MIME media type as defined in [42];

-     XHTML (see clause 7.8) MIME media type as defined in [16];

-     Timed text (see subclause 7.9) MIME media type as defined in clause D.9 in Annex D of the present document.

MIME media type used for SMIL files shall be according to [31] and for DGS/DDS  files according to [6].

6.1        Packet based network interface

DGS clients and servers shall support an IP-based network interface for the transport of session control and media data. Control and media data are sent using DDS/IP [8] and DGS/IP [7]. An overview of the protocol stack can be found in figure 2 of the present document.

6.2        RTP over UDP/IP

The IETF RTP [9] and [10] provides means for sending real-time or streaming data over UDP (see [7]). The encoded media is encapsulated in the RTP packets with media specific RTP payload formats. RTP payload formats are defined by IETF. RTP also provides a protocol called RTCP (see clause 6 in [9]) for feedback about the transmission quality. For the calculation of the RTCP transmission interval Annex A.7 in [9] shall be used. Clause A.3.2.3 in Annex A of the present document provides more information about the minimum RTCP transmission interval.

RTP/UDP/IP transport of continuous media (speech, audio and video) shall be supported.

For RTP/UDP/IP transport of continuous media the following RTP payload formats shall be used:

-     AMR narrow-band speech codec (see clause 7.2) RTP payload format according to [11]. A DGS client is not required to support multi-channel sessions;

-     AMR wideband speech codec (see clause 7.2) RTP payload format according to [11]. A DGS client is not required to support multi-channel sessions;

-     MPEG-4 AAC audio codec (see clause 7.3) RTP payload format according to RFC 3016 [13];

-     MPEG-4 video codec (see clause 7.4) RTP payload format according to RFC 3016 [13];

-     H.263 video codec (see clause 7.4) RTP payload format according to RFC 2429 [14].

NOTE:       The payload format RFC 3016 for MPEG-4 AAC specifies that the audio streams shall be formatted by the LATM (Low-overhead MPEG-4 Audio Transport Multiplex) tool [21]. It should be noted that the references for the LATM format in RFC 3016 [13] point to an older version of the LATM format than the one included in [21]. In [21] a corrigendum to the LATM tool is included. This corrigendum includes changes to the LATM format that make implementations using the corrigendum incompatible with implementations not using it. To avoid future interoperability problems, implementations of DGS clients and servers supporting AAC shall follow the changes to the LATM format included in [21].

6.3        HTTP over TCP/IP

The IETF TCP provides reliable transport of data over IP networks, but with no delay guarantees. It is the preferred way for sending the scene description, text, bitmap graphics and still images. There is also need for an application protocol to control the transfer. The IETF HTTP [17] provides this functionality.

HTTP/TCP/IP transport shall be supported for:

-     still images (see clause 7.5);

-     bitmap graphics (see clause 7.6);

-     synthetic audio (see clause 7.3A);

-     vector graphics (see clause 7.7);

-     text (see clause 7.8);

-     timed text (see clause 7.9);

-     scene description (see clause 8);

-     presentation description (see clause 5.3.3).

6.4        Transport of RTSP

Transport of RTSP shall be supported according to RFC 2326 [5].

7.1        General

For DGS offering a particular media type, media decoders are specified in the following clauses.

7.2        Speech

The AMR decoder shall be supported for narrow-band speech [18]. The AMR wideband speech decoder [20] shall be supported when wideband speech working at 16 kHz sampling frequency is supported.

7.3        Audio

MPEG-4 AAC Low Complexity (AAC-LC) object type decoder [21] should be supported. The maximum sampling rate to be supported by the decoder is 48 kHz. The channel configurations to be supported are mono (1/0) and stereo (2/0). In addition, the MPEG‑4 AAC Long Term Prediction (AAC-LTP) object type decoder may be supported.

When a server offers an AAC-LC or AAC-LTP stream with the specified restrictions, it shall include the "profile-level-id" and "object" MIME parameters in the DGS "a=fmtp" line.  The following values shall be used:

 

Object Type | profile-video | object
DGS | 25.F | 1
DDS | 25.F | 2

 

7.3A      Synthetic audio

The Scalable Polyphony MIDI (SP-MIDI) content format defined in Scalable Polyphony MIDI Specification [44] and the device requirements defined in Scalable Polyphony MIDI Device 5-to-24 Note Profile for DGS DDS [45] should be supported.

SP-MIDI content is delivered in the structure specified in Standard MIDI Files 1.0 [46], either in format 0 or format 1.

7.4        Video

ITU-T Recommendation H.263 [22] profile 0 level 10 shall be supported. This is the mandatory video decoder for the DGS. In addition, DGS should support:

-     H.263 [23] Profile 3 Level 10 decoder;

-     MPEG-4 Visual Simple Profile Level 0 decoder, [24] and [25].

These two video decoders are optional to implement.

An optional video buffer model is given in Annex DGS of the present document.

NOTE:       ITU-T Recommendation H.263 [22] baseline has been mandated to ensure that video-enabled DGS support a minimum baseline video capability and that interoperability can be guaranteed (an H.263 [22] baseline bitstream can be decoded by both H.263 [22] and MPEG-4 decoders).  It also provides a simple upgrade path for mandating more advanced decoders in the future (from both the ITU-T and ISO MPEG).

7.5        Still images DGS

ISO/IEC JPEG [26] together with JFIF [27] decoders shall be supported. The support for ISO/IEC JPEG applies only to the following two modes:

-     baseline DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF0' in [26];

-     progressive DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF2' [26].

7.6        Bitmap graphics

The following bitmap graphics decoders should be supported:

 

-ALL

-

7.8        Text

The text decoder is intended to enable formatted text in a SMIL presentation. A DGS client shall support

-     text formatted according to XHTML Mobile Profile [47];

-     rendering a SMIL presentation where text is referenced with the SMIL 2.0 "text" element together with the SMIL 2.0 "src" attribute.

The following character coding formats shall be supported:

-     UTF-8, [30];

-     UCS-2, [29].

NOTE:       Since both SMIL and XHTML are XML based languages it would be possible to define a SMIL plus XHTML profile. In contrast to the presently defined DGS 4 SMIL Language Profile, which only contains SMIL modules, such a profile would also contain XHTML modules. No combined SMIL and XHTML profile is specified for DGS. Rendering of such documents is out of the scope of the present document.

7.9        Timed text

DGS clients shall support timed text as defined in Annex D, clause D.8a, of this specification.  There is no support for RTP transport of timed text in this release; DGS DDS (MP4) files containing timed text may only be downloaded.

NOTE:       When a DGS client supports timed text it needs to be able to receive and parse DGS (MP4) files containing the text streams. This does not imply a requirement on DGS clients to be able to render other continuous media types contained in DGS (MP4) files, e.g. AMR and H.263, if such media types are included in a presentation together with timed text. Audio and video are instead streamed to the client using RTSP/RTP (see clause 6.

8.1        General

The DGS DDS uses a subset of SMIL 2.0 [31] as format of the scene description. DGS clients and servers with support for scene descriptions shall support the DGS DDS SMIL Language Profile defined in clause 8.2 (abbreviated DGS DDS SMIL). This profile is a subset of the SMIL 2.0 Language Profile, but a superset of the SMIL 2.0 Basic Language Profile. The present document also includes an informative Annex B that provides guidelines for SMIL content authors.

NOTE:       The interpretation of this is not that all streaming sessions are required to use SMIL. For some types of sessions, e.g. consisting of one single continuous media or two media synchronised by using RTP timestamps, SMIL may not be needed.

8.2.1       Introduction

DGS DDS SMIL is a markup language based on SMIL Basic [31] and SMIL Scalability Framework.

DGS DDS SMIL consists of the modules required by SMIL Basic Profile (and SMIL 2.0 Host Language Conformance) and additional MediaAccessibility, MediaDescription, MediaClipping, MetaInformation, PrefetchControl, EventTiming and BasicTransitions modules.  All of the following modules are included:

-     SMIL 2.0 Content Control Modules -- BasicContentControl, SkipContentControl and PrefetchControl

-     SMIL 2.0 Layout Module -- BasicLayout

-     SMIL 2.0 Linking Module -- BasicLinking

-     SMIL 2.0 Media Object Modules -- BasicMedia, MediaClipping, MediaAccessibility and MediaDescription

-     SMIL 2.0 Metainformation Module -- Metainformation

-     SMIL 2.0 Structure Module -- Structure

-     SMIL 2.0 Timing and Synchronization Modules -- BasicInlineTiming, MinMaxTiming, BasicTimeContainers, RepeatTiming and EventTiming

-     SMIL 2.0 Transition Effects Module -- BasicTransitions

8.2.2 Document Conformance

A conforming DGS DDS SMIL document shall be a conforming SMIL 2.0 document.

All DGS DDS SMIL documents use the SMIL 2.0 namespace.

DGS DDS SMIL documents may declare requirements using the systemRequired attribute:

EXAMPLE 1:        <smil xmlns=
              xmlns:EventTiming=
             systemRequired="EventTiming">

DGS DDS.be /SMIL20/PSS5/ identifies the version of the DGS DDS  SMIL profile described in the present document. Authors may use this URI to indicate requirement for exact DGS DDS  SMIL semantics for a document or a subpart of a document:

Content authors should generally not include the DGS requirement in the document unless the SMIL document relies on DGS specific semantics that are not part of the W3C SMIL. The reason for this is that SMIL players that are not conforming DGS DDS DDS user agents may not recognize the DGS URI and thus refuse to play the document.

8.2.3       User Agent Conformance

A conforming DGS DDS  SMIL user agent shall be a conforming SMIL Basic User Agent.

A conforming user agent shall implement the semantics of DGS DDS SMIL as described in clauses 8.2.4 and 8.2.5 (including subclauses).

A conforming user agent shall recognise

8.2.4.1            Content Control Modules

DGS DDS  SMIL includes the content control functionality of the BasicContentControl, SkipContentControl and PrefetchControl modules of SMIL 2.0. PrefetchControl is not part of SMIL Basic and is an additional module in this profile.

All BasicContentControl attributes listed in the module specification shall be supported.

Note:       The SMIL specification [31] defines that all functionality of the PrefetchControl module is optional. This means that even though PrefetchControl is mandatory, user agents may implement the semantics of the PrefetchControl module only partially or not implement them at all.

PrefetchControl module adds the prefetch element to the content model of SMIL Basic body, switch, par and seq elements. The prefetch element has the attributes defined by the PrefetchControl module (mediaSize, mediaTime and bandwidth), the src attribute, the BasicContentControl attributes and the skip-content attribute.

8.2.4.2            Layout Module

DGS  DDS  SMIL includes the BasicLayout module of SMIL 2.0 for spatial layout.  The module is part of SMIL Basic.

Default values of the width and height attributes for root-layout shall be the dimensions of the device display area.

8.2.4.3            Linking Module

DGS  SMIL includes the SMIL 2.0 BasicLinking module for providing hyperlinks between documents and document fragments. This module is from SMIL Basic.

When linking to destinations outside the current document, implementations may ignore values "play" and "pause" of the 'sourcePlaystate' attribute and values "new" and "pause" of the 'show' attribute, instead using the semantics of values "stop" and "replace" respectively. When the values of 'sourcePlaystate' and 'show' are ignored, the player may also ignore the 'sourceLevel' attribute since it is of no use then.

8.2.4.4            Media Object Modules

DGS  SMIL includes the media elements from the SMIL 2.0 BasicMedia module and attributes from the MediaAccessibility, MediaDescription and MediaClipping modules. MediaAccessibility, MediaDescription and MediaClipping modules are additions in this profile to the SMIL Basic.

See clause 5.4 for the mandatory and optional MIME types that a DGS  SMIL player needs to support.

MediaClipping module adds to the profile the ability to address sub-clips of continuous media. MediaClipping module adds 'clipBegin' and 'clipEnd' (and for compatibility 'clip-begin' and 'clip-end') attributes to all media elements.

MediaAccessibility module provides basic accessibility support for media elements. New attributes 'alt', 'longdesc' and 'readIndex' are added to all media elements by this module. MediaDescription module is included by the MediaAccessibility module and adds 'abstract', 'author' and 'copyright' attributes to media elements.

8.2.4.5            Metainformation Module

The MetaInformation module of SMIL 2.0 is included in the profile. This module is an addition in this profile to SMIL Basic and provides a way to include descriptive information about the document content in the document.

This module adds meta and metadata elements to the content model of SMIL Basic head element.

8.2.4.6            Structure Module

The Structure module defines the top-level structure of the document. It is included by SMIL Basic.

8.2.4.7            Timing and Synchronization modules

The timing modules included in the DGS DDS SMIL are BasicInlineTiming, MinMaxTiming, BasicTimeContainers, RepeatTiming and EventTiming. The EventTiming module is an addition in this profile to the SMIL Basic.

For 'begin' and 'end' attributes either single offset-value or single event-value shall be allowed. Offsets shall not be supported with event-values.

Event timing attributes that reference invalid IDs (for example elements that have been removed by the content control) shall be treated as being indefinite.
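One way to read the restriction on 'begin' and 'end' values is as a small classifier. The sketch below is an assumption-laden simplification, not the SMIL grammar: it accepts only plain timecount offsets such as "5s", recognises event-values by an event name ending in "Event" (as all event names listed in this clause do), and reports everything else as "other".

```python
import re

# Simplified offset-value: optional sign, a number, optional unit.
# Full SMIL clock values (e.g. "00:01:30") are not modelled in this sketch.
OFFSET_RE = re.compile(r"^[+-]?\d+(\.\d+)?(h|min|s|ms)?$")

# Simplified event-value: optional "<id>." prefix and an event name ending in
# "Event", with no trailing offset.
EVENT_RE = re.compile(r"^(?:[A-Za-z_][\w-]*\.)?[A-Za-z]+Event$")


def classify_timing_value(value: str) -> str:
    """Return "offset", "event", or "other" for a begin/end attribute value."""
    text = value.strip()
    if ";" in text:
        return "other"  # a list of values, not a single value
    if OFFSET_RE.match(text):
        return "offset"
    if EVENT_RE.match(text):
        return "event"
    return "other"      # includes event-values that carry an offset


for candidate in ("5s", "button.activateEvent", "button.activateEvent+2s"):
    print(candidate, "->", classify_timing_value(candidate))
```

A user agent following this clause would honour the first two categories and would not be expected to support the third example, which combines an event-value with an offset.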

Supported event names and semantics shall be as defined by the SMIL 2.0 Language Profile.  All user agents shall be able to raise the following event types:

-     activateEvent;

-     beginEvent;

-     endEvent.

The following SMIL 2.0 Language event types should be supported:

-     focusInEvent;

-     focusOutEvent;

-     inBoundsEvent;

-     outBoundsEvent;

-     repeatEvent.

User agents shall ignore unknown event types and not treat them as errors.

Events do not bubble and shall be delivered to the associated media or timed elements only.

8.2.4.8            Transition Effects Module

DGS DDS  SMIL profile includes the SMIL 2.0 BasicTransitions module to provide a framework for describing transitions between media elements.

Note:       The SMIL specification [31] defines that all functionality of the BasicTransitions module is optional: "Transitions are hints to the presentation. Implementations must be able to ignore transitions if they so desire and still play the media of the presentation". This means that even though the BasicTransitions module is mandatory, user agents may implement the semantics of the BasicTransitions module only partially or not implement them at all. Content authors should use transitions in their SMIL presentation where this appears useful. User agents that fully support the semantics of the BasicTransitions module will render the presentation with the specified transitions. All other user agents will leave out the transitions but present the media content correctly.

User agents that implement the semantics of this module should implement at least the following transition effects described in SMIL 2.0 specification [31]:

-     barWipe;

-     irisWipe;

-     clockWipe;

-     snakeWipe;

-     pushWipe;

-     slideWipe;

-     fade.

A user agent should implement the default subtype of these transition effects.

A user agent that implements the semantics of this module shall at least support transition effects for non-animated image media elements. For purposes of the Transition Effects modules, two media elements are considered overlapping when they occupy the same region.

BasicTransitions module adds attributes 'transIn' and 'transOut' to the media elements of the Media Objects modules, and value "transition" to the set of legal values for the 'fill' attribute of the media elements. It also adds transition element to the content model of the head element.

8.2.5       Content Model

This table shows the full content model and attributes of the DGS DDS  SMIL profile. The attribute collections used are defined by SMIL Basic ([31], SMIL Host Language Conformance requirements, chapter 2.4). Changes to SMIL Basic are shown in bold.

Table 1: Content model for the DGS DDS  SMIL profile 

Element | Elements | Attributes
smil | head, body | COMMON-ATTRS, CONTCTRL-ATTRS, xmlns
head | layout, switch, meta, metadata, transition | COMMON-ATTRS
body | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS
layout | root-layout, region | COMMON-ATTRS, CONTCTRL-ATTRS, type
root-layout | EMPTY | COMMON-ATTRS, backgroundColor, height, width, skip-content
region | EMPTY | COMMON-ATTRS, backgroundColor, bottom, fit, height, left, right, showBackground, top, width, z-index, skip-content, regionName
ref, animation, audio, img, video, text, textstream | area | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat, region, MEDIA-ATTRS, clipBegin(clip-begin), clipEnd(clip-end), alt, longDesc, readIndex, abstract, author, copyright, DEFOSSE G  DGS/DDS  SYSTEMS
a | MEDIA-ELMS | COMMON-ATTRS, LINKING-ATTRS
area | EMPTY | COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS, repeat, shape, coords, nohref
par, seq | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat
switch | TIMING-ELMS, MEDIA-ELMS, layout, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS
prefetch | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize, mediaTime, bandwidth, src, skip-content
meta | EMPTY | COMMON-ATTRS, content, name, skip-content
metadata | EMPTY | COMMON-ATTRS, skip-content
transition | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, type, subtype, startProgress, endProgress, direction, fadeColor, skip-content

 

9.1        General DGS MMS

The MPEG-4 file format [34] is mandated in [35] to be used for continuous media along the entire delivery chain envisaged by the MMS, independent of whether the final delivery is done by streaming or download, thus enhancing interoperability.

In particular, the following stages are considered:

-     upload from the originating terminal to the MMS proxy;

-     file exchange between MMS servers;

-     transfer of the media content to the receiving terminal, either by file download or by streaming. In the first case the self-contained file is transferred, whereas in the second case the content is extracted from the file and streamed according to open payload formats. In this case, no trace of the file format remains in the content that goes on the wire/in the air.

Additionally, the MPEG-4 file format should be used for the storage in the servers and the "hint track" mechanism may be used for the preparation for streaming.

Clause 9.2 of the present document gives the requirements to follow for the MPEG-4 file format used in MMS. These requirements guarantee that DGS can interwork with MMS and that the MPEG-4 file format can be used internally within the MMS system. For DGS servers not interworking with MMS there is no requirement to follow these guidelines.

9.2        File format guidelines

NOTE:       The file format used in this specification for timed multimedia (such as video, associated audio and timed text) is structurally based on the MP4 file format as defined in [34].  However, since non-ISO codecs are used here, it is called the DGS file format and has its own file extension and MIME type to distinguish these files from MPEG-4 files.  When this specification refers to the MP4 file format, it is referring to its structure (ISO file format), not to its conformance definition.

9.2.1       Registration of non-ISO codecs

How to include the non-ISO code streams AMR narrow-band speech, AMR wideband speech, H.263 encoded video and timed text in MP4 files is described in annex D of the present document.

9.2.2       Hint tracks

The hint tracks are a mechanism that the server implementation may choose to use in preparation for the streaming of media content contained in MP4 files. However, it should be observed that the usage of the hint tracks is an internal implementation matter for the server, and it falls outside the scope of the present document.

9.2.3       Self-contained MP4 files

All media in the MP4 file shall be self-contained, i.e. there shall not be referencing to external media data from inside the MP4 file.

9.2.4       MPEG-4 systems specific elements

Tracks relative to MPEG-4 system architectural elements (e.g. BIFS scene description tracks or OD Object descriptors) are optional and shall be ignored. The adoption of the MPEG-4 file format does not imply the usage of MPEG-4 systems architecture. The receiving terminal is not required to implement any of the specific MPEG-4 system architectural elements.

9.2.5       Interpretation of MPEG-4 file format

All index numbers used in the MPEG-4 file format start with the value one rather than zero, in particular "first-chunk" in Sample to chunk atom, "sample-number" in Sync sample atom and "shadowed-sample-number", "sync-sample-number" in Shadow sync sample atom.
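Because most programming languages index sequences from zero, a parser has to shift these one-based file format indices before using them. A minimal sketch follows; the helper name and the sample data are hypothetical.

```python
def to_zero_based(one_based_index: int) -> int:
    """Convert a one-based file format index to a zero-based list index."""
    if one_based_index < 1:
        raise ValueError("file format index numbers start at 1")
    return one_based_index - 1


# Hypothetical decoded samples and a "sample-number" read from a sync sample atom.
samples = ["sample A", "sample B", "sample C"]
sync_sample_number = 1

first_sync_sample = samples[to_zero_based(sync_sample_number)]
print(first_sync_sample)
```

Rejecting zero explicitly helps to catch files written by tools that wrongly assume zero-based numbering.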


This clause gives some background information on DDS for DGS clients.

Table A.1 provides an overview of the different DGS fields that can be identified in a DGS file. The order of DGS fields is mandated as specified in DDS 2327 [6].

Table A.1: Overview of fields in DDS for DGS clients

Type | Description | Requirement according to [6] | Requirement according to the present document

Session Description
V | Protocol version | R | R
O | Owner/creator and session identifier | R | R
S | Session Name | R | R
I | Session information | O | O
U | URI of description | O | O
E | Email address | O | O
P | Phone number | O | O
C | Connection Information | R | R
B | Bandwidth information: AS | O | R
One or more Time Descriptions (See below)
Z | Time zone adjustments | O | O
K | Encryption key | O | O
A | Session attributes: control | O | R
A | Session attributes: range | O | R
One or more Media Descriptions (See below)

Time Description
T | Time the session is active | R | R
R | Repeat times | O | O

Media Description
M | Media name and transport address | R | R
I | Media title | O | O
C | Connection information | R | R
B | Bandwidth information: AS | O | R
K | Encryption Key | O | O
A | Attribute Lines: control | O | R
A | Attribute Lines: range | O | R
A | Attribute Lines: fmtp | O | R
A | Attribute Lines: rtpmap | O | R
A | Attribute Lines: X-predecbufsize | ND | O
A | Attribute Lines: X-initpredecbufperiod | ND | O
A | Attribute Lines: X-initpostdecbufperiod | ND | O
A | Attribute Lines: X-decbyterate | ND | O

Note 1: R = Required, O = Optional, ND = Not Defined

Note 2: The "c" type is only required on the session level if not present on the media level.

Note 3: The "c" type is only required on the media level if not present on the session level.

Note 4: According to RFC 2327, either an 'e' or 'p' field must be present in the DGS description. On the other hand, both fields will be made optional in the future release of DGS. So, for the sake of robustness and maximum interoperability, either an 'e' or 'p' field shall be present during the server's DGS file creation, but the client should also be ready to receive DGS content containing neither 'e' nor 'p' fields.

 

The example below shows a DDS file that could be sent to a DGS client to initiate unicast streaming of an H.263 video sequence.

EXAMPLE:           v=0
s=DGS DDS DEFOSSE G Unicast DGS Example
i=Example of Unicast DGS file
c=IN IP4 0.0.0.0
b=AS:128
t=0 0

A.2.1     General

Clause 5.3.2 of the present document defines the required DDS support in DGS clients and servers by making references to Appendix D of [5]. The current clause gives an overview of the methods (see Table A.2) and headers (see Table A.3) that are specified in the referenced Appendix D.  An example of a DDS session is also given.

Table A.2: Overview of the required RTSP method support

Method DGS | Requirement for a minimal on-demand playback client according to [5]. | Requirement for a DGS client according to the present document. | Requirement for a minimal on-demand playback server according to [5]. | Requirement for a DGS server according to the present document.
OPTIONS | O | O | Respond | Respond
REDIRECT | Respond | Respond | O | O
DESCRIBE | O | Generate | O | Respond
SETUP | Generate | Generate | Respond | Respond
PLAY | Generate | Generate | Respond | Respond
PAUSE | Generate | Generate | Respond | Respond
TEARDOWN | Generate | Generate | Respond | Respond

NOTE 1:     O = Support is optional

NOTE 2:     'Generate' means that the client/server is required to be able to generate the request.

NOTE 3:     'Respond' means that the client/server is required to understand and be able to properly respond to the request.

 

Table A.3: Overview of the required DGS/DDS  header support

Header DGS | Requirement for a minimal on-demand playback client according to [5]. | Requirement for a DGS client according to the present document. | Requirement for a minimal on-demand playback server according to [5]. | Requirement for a DGS server according to the present document.
Connection | include/understand | include/understand | include/understand | include/understand
Content-Encoding | understand | understand | include | include
Content-Language | understand | understand | include | include
Content-Length | understand | understand | include | include
Content-Type | understand | understand | include | include
CSeq | include/understand | include/understand | include/understand | include/understand
Location | understand | understand | O | O
Public | O | O | include | include
Range | O | include/understand | understand | include/understand
Require | O | O | understand | understand
DGS-Info | understand | understand | include | include
Session | include | include | understand | understand
Transport | include/understand | include/understand | include/understand | include/understand

NOTE 1:    O = Support is optional

NOTE 2:   'include' means that the client/server is required to be able to include the header in a request or response.

NOTE 3:   'understand' means that the client/server is required to be able to understand the header and respond properly if the header is received in a request or response.

 

 

The example below is intended to give some more understanding of how DDS and DGS are used within the DGS DDS. The example assumes that the streaming client has the DGS/DDS URL to a presentation consisting of an H.263 video sequence and AMR speech.  DGS messages sent from the client to the server are in bold and messages from the server to the client in italic. In the example the server provides aggregate control of the two streams.

EXAMPLE:          
CSeq: 1

RTSP/1.0 200 OK
CSeq: 1
Content-Type: application/sdp
Content-Length: 435

 

c=IN IP4 0.0.0.0

b=AS:77
t=0 0
a=range:npt=0-59.3478
a=control:*

m=audio 0 RTP/AVP 97

b=AS:13
a=rtpmap:97 AMR/8000
a=fmtp:97
a=maxptime:200
a=control:streamID=0
m=video 0 RTP/AVP 98

b=AS:64
a=rtpmap:98 H263-2000/90000
a=fmtp:98 profile=3;level=10
a=control:streamID=1

 


RTSP/1.0 200 OK
CSeq: 2
Transport: RTP/AVP/UDP;unicast;client_port=3456-3457; server_port=5678-5679
Session: dfhyrio90llk

DGS/DDS/1.0 200 OK
CSeq: 3
Transport: RTP/AVP/UDP;unicast;client_port=3458-3459; server_port=5680-5681
Session: dfhyrio90llk

DGS/DDS/1.0 200 OK
CSeq: 4
Session: dfhyrio90llk
Range: npt=0-
RTP-Info: url= rtsp://mediaserver.com/movie.test/streamID=0; seq=9900;rtptime=4470048,
                 url= rtsp://mediaserver.com/movie.test/streamID=1; seq=1004;rtptime=1070549

NOTE:       Headers can be folded onto multiple lines if the continuation line begins with a space or horizontal tab. For more information, see RFC2616 [17].

The user watches the movie for 20 seconds and then decides to fast forward to 10 seconds before the end...
DGS/DDS/1.0 200 OK
CSeq: 5
Session: dfhyrio90llk


DGS/DDS/1.0 200 OK
CSeq: 6
Session: dfhyrio90llk
Range: npt=50-59.3478
RTP-Info: url= rtsp://mediaserver.com/movie.test/streamID=0;
                seq=39900;rtptime=44470648,
                 url= rtsp://mediaserver.com/movie.test/streamID=1;
                seq=31004;rtptime=41090349

 

After the movie is over the client issues a TEARDOWN to end the session...

TEARDOWN rtsp://mediaserver.com/movie.test DGS/DDS/1.0
CSeq: 7
Session: dfhyrio90llk


DGS/DDS/1.0 200 OK
CSeq: 7
Session: dfhyrio90llk
Connection: close

A.2.2     Implementation guidelines

A.2.2.1   Usage of persistent TCP

Considering the potentially long round-trip delays in a packet switched streaming service over UMTS, it is important to keep the number of messages exchanged between a server and a client low. The number of requests and responses exchanged is one of the factors that will determine how long it takes from the time that a user initiates DGS until the streams start playing in a client.

DGS methods are sent over either TCP or UDP for IP. Both client and server shall support DDS over DGS whereas DGS over UDP is optional. For DGS the connection can be persistent or non-persistent. A persistent connection is used for several DGS/DDS request/response pairs whereas one connection is used per DGS request/response pair for the non-persistent connection. In the non-persistent case each connection will start with the three-way handshake (SYN, SYN-ACK, ACK) before the DGS/DDS request can be sent. This will increase the time for the message to be sent by one round trip delay.

For these reasons it is recommended that DGS/DDS clients should use a persistent DGS connection, at least for the initial DGS/DDS methods until media starts streaming.
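The effect of the extra handshake can be estimated with simple arithmetic. The sketch below is a rough model only: it assumes each request/response pair costs one round trip, each new connection costs one additional round trip for the handshake, and server processing and transmission times are ignored. The function name `signalling_delay` is hypothetical.

```python
def signalling_delay(num_requests, rtt, persistent):
    """Estimate the signalling delay in seconds before media can start.

    Assumptions (not from the specification): one round trip per
    request/response pair, one extra round trip per connection handshake,
    no processing or transmission time.
    """
    handshake = rtt
    if persistent:
        # One handshake, then all requests reuse the same connection.
        return handshake + num_requests * rtt
    # A new connection, and therefore a new handshake, for every request.
    return num_requests * (handshake + rtt)


# The example in this annex uses four requests before media starts:
# DESCRIBE, two SETUP requests and PLAY.
persistent_delay = signalling_delay(4, 0.5, persistent=True)
non_persistent_delay = signalling_delay(4, 0.5, persistent=False)
```

With a round-trip delay of 0.5 seconds, this model gives 2.5 seconds for the persistent case and 4.0 seconds for the non-persistent case.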

A.2.2.2   Detecting link aliveness

In the wireless environment, connection may be lost due to fading, shadowing, loss of battery power, or turning off the terminal even though the DGS session is active. In order for the server to be able to detect the client's aliveness, the DGS client should send "wellness" information to the DGS server for a defined interval as described in the RFC2326. There are several ways for detecting link aliveness described in the RFC2326, however, the client should be careful about issuing "PLAY method without Range header field" too close to the end of the streams, because it may conflict with pipelined PLAY requests. Below is the list of recommended "wellness" information for the DGS clients and servers in a prioritised order.

1.   DGS/DDS

2.   OPTIONS method with Session header field

NOTE:       Both servers and clients can initiate this OPTIONS method.
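As an illustration of the second option, the sketch below assembles an OPTIONS request that carries the Session header field. The helper name `build_options_keepalive` and the use of `*` as the request URI are assumptions made for this example; the session identifier is the one used in the example in this annex.

```python
def build_options_keepalive(session_id, cseq, uri="*"):
    """Return an OPTIONS request with a Session header as a string.

    Hypothetical helper: header lines end with CRLF and an empty line
    terminates the request.
    """
    lines = [
        f"OPTIONS {uri} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
        "",
        "",
    ]
    return "\r\n".join(lines)


request = build_options_keepalive("dfhyrio90llk", 8)
```

A client would send such a request periodically while the session is active, using the next unused CSeq value each time.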

 

 

Void.

A.3.2     Implementation guidelines

A.3.2.1   Maximum RTP packet size

The RFC 1889 (DGS) [9] does not impose a maximum size on DGS packets. However, when DGS packets are sent over the radio link of a DGS DDS system there is an advantage in limiting the maximum size of DGS/DDS packets.

Two types of bearers can be envisioned for streaming using either acknowledged mode (AM) or unacknowledged mode (UM) RLC. The AM uses retransmissions over the radio link whereas the UM does not. In UM mode large DGS packets are more susceptible to losses over the radio link compared to small RTP packets since the loss of a segment may result in the loss of the whole packet. On the other hand in AM mode large DGS packets will result in larger delay jitter compared to small packets as there is a larger chance that more segments have to be retransmitted.

For these reasons it is recommended that the maximum size of DGS packets should be limited, taking into account the wireless link. This will decrease the DGS packet loss rate particularly for RLC in UM. For RLC in AM the delay jitter will be reduced, permitting the client to use a smaller receiving buffer. It should also be noted that RTP packets that are too small could result in too much overhead, if IP/UDP/DGS header compression is not applied, or in unnecessary load at the streaming server.
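The trade-off described above can be made concrete with two small calculations. The sketch below is illustrative only: it assumes that radio link segments are lost independently with a fixed probability, that a packet is lost if any of its segments is lost (the UM case), and that uncompressed headers consist of a 20-byte IPv4 header, an 8-byte UDP header and a 12-byte fixed RTP header. The segment size and loss rate used in the usage lines are made-up values.

```python
import math

# IPv4 + UDP + fixed RTP header, without header compression.
HEADER_BYTES = 20 + 8 + 12


def packet_loss_probability(packet_bytes, segment_bytes, segment_loss):
    """Probability that a packet is lost when any one of its segments is lost.

    Assumption: segments are lost independently with probability segment_loss.
    """
    segments = math.ceil(packet_bytes / segment_bytes)
    return 1.0 - (1.0 - segment_loss) ** segments


def header_overhead(payload_bytes):
    """Fraction of the transmitted bytes that is header rather than payload."""
    return HEADER_BYTES / (HEADER_BYTES + payload_bytes)


large = packet_loss_probability(1400, 40, 0.01)  # packet spans 35 segments
small = packet_loss_probability(200, 40, 0.01)   # packet spans 5 segments
tiny_overhead = header_overhead(40)              # half of the bytes are header
```

Under these assumptions the larger packet is noticeably more likely to be lost, while very small payloads spend a large share of the bit rate on headers.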

In the case of transporting video in the payload of DGS packets it may be that a video frame is split into more than one DGS packet in order not to produce too large DGS packets. Then, to be able to decode packets following a lost packet in the same video frame, it is recommended that synchronisation information be inserted at the start of such DGS packets. For H.263 this implies the use of GOBs with non-empty GOB headers and in the case of MPEG-4 video the use of video packets (resynchronisation markers). If the optional Slice Structured mode (Annex K) of H.263 is in use, GOBs are replaced by slices.

A.3.2.2   Sequence number and timestamp in the presence of NPT jump

The description below is intended to give more understanding of how DGS sequence number and timestamp are specified within the DGS in the presence of NPT jumps.  The jump happens when a client sends a PLAY request to skip media.

The RFC 2326 (RTSP) [5] specifies that both DGS sequence numbers and DGS timestamps must be continuous and monotonic across jumps of DGS. Thus when a server receives a request for a skip of the media that causes a jump of DGS, it shall specify DGS sequence numbers and DGS timestamps continuously and monotonically across the skip of the media to conform to the DGS/DDS specification. Also, the server may respond with "seq" in the DGS Info field if this parameter is known at the time of issuing the response.

A.3.2.3   DGS/DDS transmission interval

In DGS [9], Section 6.2, rules for the calculation of the interval between the sending of two consecutive DGS/DDS packets, i.e. the DGS/DDS transmission interval, are defined. These rules consist of two steps:

-     Step 1: an algorithm that calculates a transmission interval from parameters such as the session bit rate and the average DGS/DDS packet size. This algorithm is described in [9], annex A.7.

-     Step 2: Taking the maximum of the transmission interval computed in step 1 and a mandatory fixed minimum DGS/DDS transmission interval of 5 seconds.

Implementations conforming to this DGS shall perform step 1 and may perform step 2. All other algorithms and rules of [9] stay valid and shall be followed.

Following these recommendations results in regular sending of DGS/DDS messages, where the interval between them depends on the session bandwidth and the DGS/DDS packet size.
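The two steps can be sketched as follows. This is a deliberately simplified model and not the algorithm of [9], annex A.7: it omits the separate treatment of senders and receivers and the randomisation of the interval, and simply divides the bytes to be sent per reporting round by the bandwidth available for control traffic. The function and parameter names are hypothetical.

```python
MIN_INTERVAL_S = 5.0  # fixed minimum interval of step 2


def transmission_interval(avg_packet_bytes, members, control_bandwidth_bps,
                          apply_minimum=False):
    """Return a simplified transmission interval in seconds.

    Step 1 (simplified assumption): every member sends one packet of average
    size per interval within the control bandwidth.
    Step 2 (optional under this annex): enforce the 5 second minimum.
    """
    control_bytes_per_s = control_bandwidth_bps / 8.0
    interval = avg_packet_bytes * members / control_bytes_per_s
    if apply_minimum:
        interval = max(interval, MIN_INTERVAL_S)
    return interval


step1_only = transmission_interval(100, 2, 3200)
with_step2 = transmission_interval(100, 2, 3200, apply_minimum=True)
```

For two session members, an average packet size of 100 bytes and 3200 bit/s of control bandwidth, step 1 alone yields 0.5 seconds, whereas applying step 2 raises the interval to 5 seconds.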

Clause A.4 provides detailed information about the structure and exchange of device capability descriptions for the DGS. It complements the normative part contained in clause 5.2 of the present document.

The functionality is sometimes referred to as capability exchange. Capability exchange in DGS uses the DGS/DDS [39] framework and reuses parts of the DGS/DDS application UAProf [40].

To facilitate server-side content negotiation for streaming, the DGS server needs to have access to a description of the specific capabilities of the mobile terminal, i.e. the device capability description. The device capability description contains a number of attributes. During the set-up of a streaming session the DGS DDS server can use the description to provide the mobile terminal with the correct type of multimedia content. Concretely, it is envisaged that servers use information about the capabilities of the mobile terminal to decide which stream(s) to provision to the connecting terminal. For instance, the server could compare the requirements on the mobile terminal for multiple available variants of a stream with the actual capabilities of the connecting terminal to determine the best-suited stream(s) for that particular terminal. A similar mechanism could also be used for other types of content.

A device capability description contains a number of device capability attributes. In the present document they are referred to as just attributes. The current version of DGS does not include a definition of any specific user preference attributes. Therefore we use the term device capability description. However, it should be noted that even though no specific user preference attributes are included, simple tailoring to the preferences of the user could be achieved by temporary overrides of the available attributes. E.g. if the user would like to receive only mono sound for a particular session even though the terminal is capable of stereo, this can be accomplished by providing an override for the "AudioChannels" attribute. It should also be noted that the extension mechanism defined would enable an easy introduction of specific user preference attributes in the device capability description if needed.

The term device capability profile or profile is sometimes used instead of device capability description to describe a description of device capabilities and/or user preferences. The three terms are used interchangeably in the present document.

Figure A.1 illustrates how capability exchange in DGS is performed. In the simplest case the mobile terminal informs the DGS server(s) about its identity so that the latter can retrieve the correct device capability profile(s) from the device profile server(s). For this purpose, the mobile terminal adds one or several URLs to DGS and/or DGS/DDS protocol data units that it sends to the DGS server(s). These URLs point to locations on one or several device profile servers from where the DGS server should retrieve the device capability profiles. This list of URLs is encapsulated in DGS/DDS and HTTP protocol data units using additional header field(s). The list of URLs is denoted URLdesc. The mobile terminal may supplement the URLdesc with extra attributes or overrides for attributes already defined in the profile(s) located at URLdesc. This information is denoted Profdiff. As URLdesc, Profdiff is encapsulated in DGS/DDS and HTTP protocol data units using additional header field(s).

The device profile server in Figure A.1 is the logical entity that stores the device capability profiles. The profile needed for a certain request from a mobile terminal may be stored on one or several such servers. A terminal manufacturer or a software vendor could maintain a device profile server to provide device capability profiles for its products. It would also be possible for an operator to manage a device profile server for its subscribers and then e.g. enable the subscriber to make user specific updates to the profiles. The device profile server provides device capability profiles to the DGS server on request.

 

Figure A.1: Functional components in DGS capability exchange

The DGS server is the logical entity that provides multimedia streams and other, static content (e.g. SMIL documents, images, and graphics) to the mobile terminal (see Figure A.1). A DGS application might involve multiple DGS servers, e.g. separate servers for multimedia streams and for static content. A DGS server handles the matching process.  Matching is a process that takes place in the DGS servers (see Figure A.1). The device capability profile is compared with the content descriptions at the server and the best fit is delivered to the client.

A.4.2     Scope of the specification

The following bullet list describes what is considered to be within the scope of the specification for capability exchange in DGS:

-     Definition of the structure for the device capability profiles, see clause A.4.3.

-     Definition of the DGS DDS vocabularies, see clause A.4.4.

-     Reference to a set of device capability attributes for multimedia content retrieval applications that have already been defined by UAProf [40]. The purpose of this reference is to point out which attributes are useful for the DGS application.

-     Definition of a set of device capability attributes specifically for DGS applications that are missing in UAProf. 

 

-     It is important to define an extension mechanism to easily add attributes since it is not possible to cover all attributes from the beginning. The extension mechanism is described in clause A.4.5.

-     The structure of URLdesc, Profdiff and their interchange is described in clause A.4.6.

-     Protocols for the interchange of device capability profiles between the DGS server and the device profile server are defined in clause 5.2.7.

The specification does not include:

-     rules for the matching process on the DGS server. These mechanisms should be left to the implementations. For interoperability, only the format of the device capability description and its interchange is relevant.

-     definition of specific user preference attributes. It is very difficult to standardise such attributes since they are dependent on the type of personalised services one would like to offer the user. The extensible descriptions format and exchange mechanism proposed in this document provide the means to create and exchange such attributes if needed in the future. However, as explained in clause A.4.1, limited tailoring to the preferences of the user could be achieved by temporarily overriding available attributes in the vocabularies already defined for DGS. The vocabulary also includes some very basic user preference attributes. For example, the profile includes a list of preferred languages. Also the list of MIME types can be interpreted as user preference, e.g. leaving out audio MIME types could mean that the user does not want to receive any audio content. The available attributes are described in clause 5.2.3 of the present document.

-     requirements for caching of device capability profiles on the DGS server. In UAProf, a content server can cache the current device capability profile for a given WSP session. This feature relies on the presence of WSP sessions. Caching significantly increases the complexity of both the implementations of the mobile terminal and the server. However, HTTP is used between the DGS server and the device profile server. For this exchange, normal content caching provisions as defined by HTTP apply and the DGS server may utilise this to speed up the session set-up (see clause 5.2.7).

-     intermediate proxies. This feature is considered not relevant in the context of DGS applications.

A.4.3     The device capability profile structure

A device capability profile is a description of the capabilities of the device and possibly also the preferences of the user of that device. It can be used to guide the adaptation of content presented to the device. A device capability profile for DGS is a DDS document that follows the structure of the DGS/DDS framework [39] and the DGS/DDS application UAProf [40]. The terminology of DGS/DDS is used in this text and is therefore briefly described here.

 Attributes are used for specifying the device capabilities and user preferences. A set of attribute names, permissible values and semantics constitute a DGS/DDS  vocabulary. A RDF schema defines a vocabulary. The syntax of the attributes is defined in the schema but also, to some extent, the semantics. A profile is an instance of a schema and contains one or more attributes from the vocabulary. Attributes in a schema are divided into components distinguished by attribute characteristics. In the DGS/DDS specification it is anticipated that different applications will use different vocabularies. According to the DGS/DDS  framework a hypothetical profile might  A further illustration of how a profile might look like is given in the example in clause A.4.7.

Attributes of a component can be included directly or may be specified by a reference to a DGS/DDS default profile. Resolving a profile that includes a reference to a default profile is time-consuming. When the DGS server receives the profile from a device profile server, the final attribute values cannot be determined until the default profile has been requested and received. Support for defaults is required by the DGS/DDS specification [39]. Due to these problems, there is a recommendation made in clause 5.2.6 to not use the DGS/DDS defaults element in DGS device capability profile documents.

A DGS/DDS  vocabulary shall according to DGS/DDS  and UAProf include:     A description of the semantics/type/resolution

A device capability profile can use an arbitrary number of vocabularies and thus it is possible to reuse attributes from other vocabularies by simply referencing the corresponding namespaces. The focus of the DGS vocabulary is content formatting which overlaps the focus of the UAProf vocabulary. UAProf is specified by WAP Forum and is an architecture and vocabulary/schema for capability exchange in the WAP environment. Since there are attributes in the UAProf vocabulary suitable for streaming applications these are reused and combined with a DGS application specific streaming component. This makes the DGS vocabulary an extension vocabulary to UAProf. The DGS/DDS  specification encourages reuse of attributes from other vocabularies. To avoid confusion, the same attribute name should not be used in different vocabularies.  In clause 5.2.3.3 a number of attributes from UAProf [40] are recommended for DGS. The DGS base vocabulary is defined in clause 5.2.3.2.

A profile is allowed to instantiate a subset of the attributes in the vocabularies and no specific attributes are required, but an insufficient description may lead to content that the client is unable to show.

A.4.5     Principles of extending a schema/vocabulary

The use of DGS/DDS enables an extensibility mechanism for DGS/DDS-based schemas that addresses the evolution of new types of devices and applications. The DGS profile schema specification is going to provide a base vocabulary but in the future new usage scenarios might need to express new attributes. This is the reason why there is a need to specify how extensions of the schema will be handled. If the TSG responsible for the present document updates the base vocabulary schema, a new unique namespace will be assigned to the updated schema. In another scenario the DDS may decide to add a new component containing specific user related attributes. This new component will be assigned a new namespace and it will not influence the base vocabulary in any way. If other organisations or companies make extensions, this can be either as a new component or as attributes added to the existing base vocabulary component where the new attributes use a new namespace. This ensures that third parties can define and maintain their own vocabularies independently from the DGS base vocabulary.

A.4.6     Signalling of profile information between client and server

URLdesc and Profdiff were introduced in clause A.4.1. The URLdesc is a list of URLs that point to locations on device profile servers from where the DGS server retrieves suitable device capability profiles. The Profdiff contains additional capability description information; e.g. overrides for certain attribute values. Both URLdesc and Profdiff are encapsulated in DGS/DDS and HTTP messages using additional header fields. This can be seen in Figure A.1. In clause 9.1 of [40] three new HTTP headers are defined that can be used to implement the desired functionality: "x-wap-profile", "x-wap-profile-diff" and "x-wap-profile-warning". These headers are reused in DGS for both HTTP and RTSP.

-     The "x-wap-profile" is a request header that contains a list of absolute URLs to device capability descriptions and profile diff names. The profile diff names correspond to additional profile information in the "x-wap-profile-diff" header.

-     The "x-wap-profile-diff" is a request header that contains a subset of a device capability profile.

-     The "x-wap-profile-warning" is a response header that contains error codes explaining to what extent the server has been able to match the terminal request.

Clause 5.2.5 of the present document defines this exchange mechanism.

It is left to the mobile terminal to decide when to send x-wap-profile headers. The mobile terminal could send the "x-wap-profile" and "x-wap-profile-diff" headers with each DGS/DDS  DESCRIBE and/or with each DGS/DDS  SETUP request. Sending them in the DGS  DESCRIBE request is useful for the DGS server to be able to make a better decision which presentation description to provision to the client. Sending the "x-wap-profile" and "x-wap-profile-diff" headers with an HTTP request is useful whenever the mobile terminal requests some multimedia content that will be used in the DGS application. For example it can be sent with the request for a SMIL file and the DGS server can see to it that the mobile terminal receives a SMIL file which is optimised for the particular terminal. Clause 5.2.5 of the present document gives recommendations for when profile information should be sent.

It is up to the DGS server to retrieve the device capability profiles using the URLs in the "x-wap-profile" header. The DGS server is also responsible for merging the profiles that it receives. If the "x-wap-profile-diff" header is present, the server must also merge that information with the retrieved profiles. This functionality is defined in clause 5.2.6.
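A minimal sketch of the merge step is given below. It is an illustration under strong assumptions and does not reproduce the resolution rules of clause 5.2.6: profiles are modelled as plain dictionaries mapping a component name to its attributes, profiles are applied in list order, and a later value simply replaces an earlier one. The function name `merge_profiles`, the component names and the attribute values are hypothetical; only the "AudioChannels" attribute is taken from this annex.

```python
import copy


def merge_profiles(profiles, profile_diff=None):
    """Merge retrieved profiles in order, then apply the optional diff last.

    Assumption for illustration: the last value seen for an attribute wins.
    """
    sources = list(profiles)
    if profile_diff:
        sources.append(profile_diff)
    merged = {}
    for profile in sources:
        for component, attributes in profile.items():
            merged.setdefault(component, {}).update(copy.deepcopy(attributes))
    return merged


# Hypothetical profiles retrieved from two URLs in the "x-wap-profile" header.
hardware = {"HardwarePlatform": {"ScreenSize": "176x144"}}
streaming = {"Streaming": {"AudioChannels": "Stereo"}}
# Hypothetical override carried in the "x-wap-profile-diff" header.
diff = {"Streaming": {"AudioChannels": "Mono"}}

merged = merge_profiles([hardware, streaming], diff)
```

In this example the override from the diff replaces the stereo capability with mono for the session, while the attributes of the other component are left unchanged.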

It should be noted that it is up to the implementation of the mobile terminal what URLs to send in the "x-wap-profile" header. For instance, a terminal could just send one URL that points to a complete description of its capabilities. Another terminal might provide one URL that points to a description of the terminal hardware, a second URL that points to a description of a particular software version of the streaming application, and a third URL that points to the description of a hardware or software plug-in that is currently added to the standard configuration of that terminal. From this example it becomes clear that sending URLs from the mobile terminal to the server is good enough not only for static profiles but that it can also handle re-configurations of the mobile terminal such as software version changes, software plug-ins, hardware upgrades, etc.

As described above the list of URLs in the "x-wap-profile" header is a powerful tool to handle dynamic changes of the mobile terminal. The "x-wap-profile-diff" header could also be used to facilitate the same functionality. Using the "x-wap-profile-diff" header to e.g. send a complete profile (no URL present at all in the "x-wap-profile" header) or updates as a result of e.g. a hardware plug-in is not recommended unless some compression scheme is applied over the air-interface. The reason is that the size of a profile may be large.

A.4.7     Example of a DGS device capability description

The following is an example of a device capability profile as it could be available from a device profile server. The DGS/DDS  

Instead of a single XML document the description could also be spread over several files. The DGS server would need to retrieve these profiles separately in this case and would need to merge them. For instance, this would be useful when device capabilities of this phone that are related to streaming would differ among different versions of the phone. In this case the part of the profile for streaming would be separated from the rest into its own profile document. This separation allows describing the difference in streaming capabilities by providing multiple versions of the profile document for the streaming capabilities.

This is an informative annex for SMIL presentation authors. Authors can expect that DGS clients can handle the SMIL module collection defined in clause 8.2, with the restrictions defined in this Annex. When creating SMIL documents the author is recommended to consider that terminals may have small displays and simple input devices. The media types and their encoding included in the presentation should be restricted to what is described in clause 7 of the present document. Considering that many mobile devices may have limited software and hardware capabilities, the number of media to be played simultaneously should be limited. For example, many devices will not be able to handle more than one video sequence at a time.

The Linking Modules define elements and attributes for navigational hyperlinking, either through user interaction or through temporal events. The BasicLinking module defines the "a" and "area" elements for basic linking:

a           Similar to the "a" element in HTML it provides a link from a media object through the href attribute (which contains the URI of the link's destination). The "a" element includes a number of attributes for defining the behaviour of the presentation when the link is followed.

area     Whereas the a element only allows a link to be associated with a complete media object, the area element allows links to be associated with spatial and/or temporal portions of a media object.

The area element may be useful for enabling services that rely on interactivity where the display size is not big enough to allow the display of links alongside a media (e.g. QCIF video) window. Instead, the user could, for example, click on a watermark logo displayed in the video window to visit the company website.

Even if the area element may be useful, some mobile terminals will not be able to handle area elements that include multiple selectable regions within an area element. One reason for this could be that the terminals do not have the appropriate user interface. Such area elements should therefore be avoided. Instead it is recommended that the "a" element be used. If the "area" element is used, the SMIL presentation should also include alternative links to navigate through the presentation; i.e. the author should not create presentations that rely on the player being able to handle "area" elements.

The "fit" attribute defines how different media should be fitted into their respective display regions.

The rendering and layout of some objects on a small display might be difficult and all mobile devices may not support features such as scroll bars; in addition, the root-layout window may represent the full screen of the display. Therefore "fit=scroll" should not be used.

Due to hardware restrictions in mobile devices, operations such as scaling of a video sequence, or even images, may be very difficult to achieve. According to the SMIL 2.0 specification SMIL players may in these situations clip the content instead. To be sure that the presentation is displayed as the author intended, content should be encoded in a size suitable for the targeted terminals and it is recommended to use "fit=hidden".

The two attributes "endEvent" and "repeatEvent" in the EventTiming module may cause problems for a mobile SMIL player. The end of a media element triggers the "endEvent". In the same way the "repeatEvent" occurs when the second and subsequent iterations of a repeated element begin playback. Both these events rely on the SMIL player receiving information that the media element has ended. One example could be when the end of a video sequence initiates the event. If the player has not received explicit information about the duration of the video sequence, e.g. by the "dur" attribute in SMIL or by some external source such as the "a=range" field in DGS, the player will have to rely on the DGS/DDS BYE message to decide when the video sequence ends. If the DGS/DDS BYE message is lost, the player will have problems initiating the event. For these reasons it is recommended that the "endEvent" and "repeatEvent" attributes are used with care, and if used the player should be provided with some additional information about the duration of the media element that triggers the event. This additional information could e.g. be the "dur" attribute in SMIL or the "a=range" field in DGS/DDS.

The "inBoundsEvent" and "outOfBoundsEvent" attributes assume that the terminal has a pointer device for moving the focus to within a window (i.e. clicking within a window). Not all terminals will support this functionality since they do not have the appropriate user interface. Hence care should be taken in using these particular event triggers.

Authors are encouraged to make use of meta data whenever providing such information to the mobile terminal appears to be useful. However, they should keep in mind that some mobile terminals will parse but not process the meta data.

Furthermore, authors should keep in mind that excessive use of meta data will substantially increase the file size of the SMIL presentation that needs to be transferred to the mobile terminal. This may result in longer set-up times.

Entities are a mechanism to insert XML fragments inside an XML document. Entities can be internal, essentially a macro expansion, or external. Use of XML entities in SMIL presentations is not recommended, as many current XML parsers do not fully support them.

B.7      XHTML Mobile Profile

When rendering texts in a SMIL presentation, authors are able to use XHTML Mobile Profile [47], which contains thirteen modules. However, some of the modules include non-text information. When referring to an XHTML Mobile Profile document from a SMIL document, authors should use only the required XHTML Host Language modules: Structure Module, Text Module, Hypertext Module and List Module. The Image Module, in particular, should not be used. Images and other non-text contents should be included in the SMIL document.

NOTE:       An XHTML file including a module which is not part of the XHTML Host Language modules may not be shown as intended.  Also, an XHTML file which uses elements or attributes from the required XHTML Host Language modules and which uses elements or attributes that are not included in XHTML Basic Profile [28], may not render correctly on legacy handsets which implement only XHTML Basic.  These are:

-     The start attribute on the 'ol' element in the List module

-     The value attribute on the 'li' element in the List module

-     The 'b' element in the Presentation module

-     The 'big' element in the Presentation module

-     The 'hr' element in the Presentation module

-     The 'i' element in the Presentation module

-     The 'small' element in the Presentation module


MIME media type name: video DGS
MIME subtype name:
DGS

Required parameters: None

Optional parameters:
profile: H.263 profile number, in the range 0 through 8, specifying the supported H.263 annexes/subparts.
level: Level of bitstream operation, in the range 0 through 99, specifying the level of computational complexity of the decoding process. When no profile and level parameters are specified, Baseline Profile (Profile 0) level 10 are the default values.

The profile and level specifications can be found in [23]. Note that the RTP payload format for H263-2000 is the same as for H263-1998 and is defined in [14], but additional annexes/subparts are specified along with the profiles and levels.
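The optional parameters and their defaults can be illustrated with a small parser for a parameter string such as the "profile=3;level=10" value used in the SDP example of Annex A. The function name `parse_h263_2000_params` is hypothetical, and applying each default independently when only one of the two parameters is present is an assumption of this sketch rather than something stated above.

```python
def parse_h263_2000_params(fmtp_params):
    """Return (profile, level) from a 'name=value;name=value' string.

    Defaults: Baseline Profile (profile 0) and level 10.
    Assumption: a missing parameter falls back to its default individually.
    """
    values = {"profile": 0, "level": 10}
    for item in fmtp_params.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep and name in values:
            values[name] = int(value)
    if not 0 <= values["profile"] <= 8:
        raise ValueError("profile must be in the range 0 through 8")
    if not 0 <= values["level"] <= 99:
        raise ValueError("level must be in the range 0 through 99")
    return values["profile"], values["level"]


explicit = parse_h263_2000_params("profile=3;level=10")
defaults = parse_h263_2000_params("")
```

Unknown parameter names are ignored by this sketch, so a receiver built this way stays tolerant of parameters it does not understand.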


MIME media type name: audio DGS
MIME subtype name: sp-midi
 DGS

Required parameters: none

Optional parameters: none

NOTE:       The above text will be replaced with a reference to the DGS describing the sp-midi MIME media type as soon as this becomes available.


The purpose of this annex is to define the necessary structure for integration of the H.263, AMR and AMR-WB media specific information in an MP4 file. Clauses D.2 to D.4 give some background information about the Sample Description atom, VisualSampleEntry atom and the AudioSampleEntry atom in the MPEG-4 file format. Then, the definitions of the SampleEntry atoms for AMR, AMR-WB and H.263 are given in clauses D.5 to D.8.

AMR and AMR-WB data is stored in the stream according to the AMR and AMR-WB storage format for single channel header of Annex E [11], without the AMR magic numbers.

In an MP4 file, the Sample Description Atom gives detailed information about the coding type used, and any initialisation information needed for that coding. The Sample Description Atom can be found in the MP4 Atom Structure.

 

Figure D.1: MP4 Atom Structure Hierarchy

The Sample Description Atom can have one or more SampleDescriptionEntry fields. Valid Sample Description Entry atoms already defined for MP4 are AudioSampleEntry, VisualSampleEntry, HintSampleEntry and MPEGSampleEntry Atoms. The SampleDescriptionEntry Atoms for AMR and AMR-WB shall be AMRSampleEntry, and for H.263 shall be H263SampleEntry, respectively.

The format of SampleDescriptionEntry and its fields are explained as follows:

SampleDescriptionEntry      ::= VisualSampleEntry |
                                AudioSampleEntry |
                                HintSampleEntry |
                                MpegSampleEntry |
                                H263SampleEntry |
                                AMRSampleEntry

Table D.1: SampleDescriptionEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| VisualSampleEntry | | Entry type for visual samples defined in the MPEG-4 specification. | |
| AudioSampleEntry | | Entry type for audio samples defined in the MPEG-4 specification. | |
| HintSampleEntry | | Entry type for hint track samples defined in the MPEG-4 specification. | |
| MpegSampleEntry | | Entry type for MPEG related stream samples defined in the MPEG-4 specification. | |
| H263SampleEntry | | Entry type for H.263 visual samples defined in clause D.6 of the present document. | |
| AMRSampleEntry | | Entry type for AMR and AMR-WB speech samples defined in clause D.5 of the present document. | |

From the above 6 atoms, only the VisualSampleEntry, AudioSampleEntry, H263SampleEntry and AMRSampleEntry atoms are taken into consideration, since MPEG specific streams and hint tracks are out of the scope of the present document.

 

| Field | Type | Details | Value |
|---|---|---|---|
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'mp4v' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| ESDAtom | | Atom containing an elementary stream descriptor for this stream. | |

This version of the VisualSampleEntry, with explicit width and height, shall be used for MPEG-4 video streams conformant to this specification.

NOTE:       width and height parameters together may be used to allocate the necessary memory in the playback device without need to analyse the video stream.

| Field | Type | Details | Value |
|---|---|---|---|
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'mp4a' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from track | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| ESDAtom | | Atom containing an elementary stream descriptor for this stream. | |

For narrow-band AMR, the atom type of the AMRSampleEntry Atom shall be 'samr'. For AMR wideband (AMR-WB), the atom type of the AMRSampleEntry Atom shall be 'sawb'. Each AMR or AMR-WB track shall be associated with a single AMRSampleEntry.

The AMRSampleEntry Atom is defined as follows:

| Field | Type | Details | Value |
|---|---|---|---|
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'samr' or 'sawb' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from media header atom of this media | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| AMRSpecificAtom | | Information specific to the decoder. | |

Comparing the AudioSampleEntry Atom with the AMRSampleEntry Atom, the main difference is the replacement of the ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for AMR and AMR-WB. The AMRSpecificAtom field structure is described in clause D.7.

Table D.5: H263SampleEntry fields

| Field | Type | Details | Value |
|---|---|---|---|
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 's263' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| H263SpecificAtom | | Information specific to the H.263 decoder. | |

Comparing the VisualSampleEntry Atom with the H263SampleEntry Atom, the main difference is the replacement of the ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for H.263. The H263SpecificAtom field structure for H.263 is described in clause D.8.

The AMRSpecificAtom fields for AMR and AMR-WB shall be as defined in table D.6. The AMRSpecificAtom for the AMRSampleEntry Atom shall always be included if the MP4 file contains AMR or AMR-WB media.

Table D.6: The AMRSpecificAtom fields for AMRSampleEntry

| Field | Type | Details | Value |
|---|---|---|---|
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'damr' |
| DecSpecificInfo | AMRDecSpecStruc | Structure which holds the AMR and AMR-WB specific information | |

AtomHeader Size and Type: indicate the size and type of the AMR decoder-specific atom. The type must be 'damr'.

DecSpecificInfo: the structure where the AMR and AMR-WB stream specific information resides.

The AMRDecSpecStruc is defined as follows:

struct AMRDecSpecStruc{
    Unsigned int (32)    vendor
    Unsigned int (8)     decoder_version
    Unsigned int (16)    mode_set
    Unsigned int (8)     mode_change_period
    Unsigned int (8)     frames_per_sample
}

The definitions of AMRDecSpecStruc members are as follows:

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding end. If a manufacturer already has a four character code, it is recommended that it uses the same code in this field. Otherwise, it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It can be safely ignored.

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way. This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored.

mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to one mode. The bit index of the mode is calculated according to the 4 bit FT field of the AMR or AMR-WB frame structure. The mode_set bit structure is as follows: (B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit) corresponds to Mode 0, and B8 corresponds to Mode 8.

The mapping of existing AMR modes to FT is given in table 1.a in [19].   A value of 0x81FF means all modes and comfort noise frames are possibly present in an AMR stream.

The mapping of existing AMR-WB modes to FT is given in Table 1.a in TS 26.201 [37]. A value of 0x83FF means all modes and comfort noise frames are possibly present in an AMR-WB stream.

As an example, if mode_set = 0000000110010101b, only Modes 0, 2, 4, 7 and 8 are present in the stream.
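The bit-to-mode mapping described for mode_set can be illustrated with a short sketch. The function name `active_modes` is hypothetical and not part of this specification; the sketch only assumes that bit n of the 16-bit value corresponds to frame type (FT) index n.

```python
def active_modes(mode_set: int) -> list[int]:
    """Return the FT indices whose bits are set in a 16-bit mode_set value.

    Bit 0 (least significant bit) corresponds to Mode 0, bit 8 to Mode 8.
    """
    if not 0 <= mode_set <= 0xFFFF:
        raise ValueError("mode_set must fit in 16 bits")
    return [bit for bit in range(16) if mode_set & (1 << bit)]


# The example value from the text above: Modes 0, 2, 4, 7 and 8.
example = active_modes(0b0000000110010101)
```

For the value 0x81FF the same function reports bits 0 to 8 and bit 15 as set.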

mode_change_period: defines a number N, which restricts the mode changes only at a multiple of N frames. If no restriction is applied, this value should be set to 0. If mode_change_period is not 0, the following restrictions apply to it according to the frames_per_sample field:

if (mode_change_period < frames_per_sample)
    frames_per_sample = k x (mode_change_period)
else if (mode_change_period > frames_per_sample)
    mode_change_period = k x (frames_per_sample)

where k is an integer in the range [2, ...]

If mode_change_period is equal to frames_per_sample, then the mode is the same for all frames inside one sample.
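As a rough illustration, the restriction between mode_change_period and frames_per_sample amounts to requiring that the larger of the two values is an integer multiple of the smaller. The checker below is a sketch under that reading; the function name is hypothetical.

```python
def mode_change_period_is_consistent(mode_change_period: int,
                                     frames_per_sample: int) -> bool:
    """Check the mode_change_period rule against frames_per_sample.

    A mode_change_period of 0 means no restriction. Otherwise the larger
    value must be k times the smaller one (k >= 2), or both must be equal.
    """
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample must be in the range 1..15")
    if mode_change_period < 0:
        raise ValueError("mode_change_period is an unsigned value")
    if mode_change_period == 0:
        return True
    small, large = sorted((mode_change_period, frames_per_sample))
    return large % small == 0
```

For example, a mode_change_period of 5 or 20 is consistent with 10 frames per sample, whereas 3 or 15 is not.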

frames_per_sample: defines the number of frames to be considered as 'one sample' inside the MP4 file. This number shall be greater than 0 and less than 16. A value of 1 means each frame is treated as one sample. A value of 10 means that 10 frames (of duration 20 msec each) are put together and treated as one sample. It must be noted that, in this case, one sample duration is 20 (msec/frame) x 10 (frame) = 200 msec. For the last sample of the stream, the number of frames can be smaller than frames_per_sample, if the number of remaining frames is smaller than frames_per_sample.
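The sample duration arithmetic above can be sketched as follows. The helper name `sample_durations_ms` is hypothetical; the sketch assumes the 20 msec frame duration quoted in the text and lets the last sample hold the remaining frames.

```python
FRAME_DURATION_MS = 20  # frame duration quoted in the text above


def sample_durations_ms(total_frames: int, frames_per_sample: int) -> list[int]:
    """Group frames into samples and return each sample's duration in msec."""
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample must be in the range 1..15")
    if total_frames < 0:
        raise ValueError("total_frames must not be negative")
    full_samples, remainder = divmod(total_frames, frames_per_sample)
    durations = [frames_per_sample * FRAME_DURATION_MS] * full_samples
    if remainder:
        # The last sample may contain fewer frames than frames_per_sample.
        durations.append(remainder * FRAME_DURATION_MS)
    return durations
```

With 25 frames and frames_per_sample set to 10, this yields two samples of 200 msec followed by one of 100 msec.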

NOTE:       The "hinter", for the creation of the hint tracks, can use the information given by the AMRDecSpecStruc members.

The H263SpecificAtom fields for H.263 shall be as defined in table D.7. The H263SpecificAtom for the H263SampleEntry Atom shall always be included if the MP4 file contains H.263 media.

The H263SpecificAtom for H.263 is composed of the following fields.

Table D.7: The H263SpecificAtom fields for H263SampleEntry

| Field | Type | Details | Value |
|---|---|---|---|
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'd263' |
| DecSpecificInfo | H263DecSpecStruc | Structure which holds the H.263 specific information | |

AtomHeader Size and Type: indicate the size and type of the H.263 decoder-specific atom. The type must be 'd263'.

DecSpecificInfo: This is the structure where the H263 stream specific information resides.

H263DecSpecStruc is defined as follows:

struct H263DecSpecStruc{

Unsigned int (32)        vendor

Unsigned int (8)          decoder_version

Unsigned int (8)          H263_Level

Unsigned int (8)          H263_Profile

}

The definitions of H263DecSpecStruc members are as follows:

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding end. If a manufacturer already has a four character code, it is recommended that it uses the same code in this field. Otherwise, it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It can be safely ignored.

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way. This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored.

H263_Level  and H263_Profile: These two parameters define which H263 profile and level is used. These parameters are based on the MIME media type video/H263-2000. The profile and level specifications can be found in [23].

EXAMPLE 1:        H.263 Baseline = {H263_Level = 10, H263_Profile = 0}

EXAMPLE 2:        H.263 Profile 3 @ Level 10 = {H263_Level = 10  , H263_Profile = 3}

NOTE:       The "hinter", for the creation of the hint tracks, can use the information given by the H263DecSpecStruc members.
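To make the layout of table D.7 concrete, the following sketch serialises an H263SpecificAtom. It assumes big-endian integers, as used throughout the MP4 file format, and that the members of H263DecSpecStruc are stored back to back without padding; the function name `build_d263_atom` is hypothetical.

```python
import struct


def build_d263_atom(vendor: bytes, decoder_version: int,
                    h263_level: int, h263_profile: int) -> bytes:
    """Serialise an H263SpecificAtom ('d263') as size + type + payload."""
    if len(vendor) != 4:
        raise ValueError("vendor must be a four character code")
    # Assumed order: vendor, decoder_version, H263_Level, H263_Profile.
    payload = struct.pack(">4sBBB", vendor, decoder_version,
                          h263_level, h263_profile)
    size = 8 + len(payload)  # 4 bytes size + 4 bytes type + payload
    return struct.pack(">I4s", size, b"d263") + payload


# EXAMPLE 1 above: H.263 Baseline, level 10, profile 0.
baseline_atom = build_d263_atom(b"VXYZ", 0, 10, 0)
```

Under these assumptions the atom is 15 bytes long: an 8-byte header followed by the 7-byte structure.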

This clause defines the format of timed text in downloaded files.  In this release, timed text is downloaded, not streamed.

Operators may specify additional rules and restrictions when deploying terminals, in addition to this specification, and behavior that is optional here may be mandatory for particular deployments.  In particular, the required character set is almost certainly dependent on the geography of the deployment.

D.8a.1  Unicode Support

Text in this specification uses the Unicode 3.0 [30] standard.  Terminals shall correctly decode both UTF-8 and UTF-16 into the required characters.  If a terminal receives a Unicode code, which it cannot display, it shall display a predictable result.  It shall not treat multi-byte UTF-8 characters as a series of ASCII characters, for example.

Authors should create fully-composed Unicode; terminals are not required to handle decomposed sequences for which there is a fully-composed equivalent.

Terminals shall conform to the conformance statement in Unicode 3.0 section 3.1.

Text strings for display and font names are uniformly coded in UTF-8, or start with a UTF-16 BYTE ORDER MARK (\uFEFF) and by that indicate that the string which starts with the byte order mark is in UTF-16. Terminals shall recognise the byte-order mark in this byte order; they are not required to recognise byte-reversed UTF-16, indicated by a byte-reversed byte-order mark.

D.8a.2  Bytes, Characters, and Glyphs

This clause uses these terms carefully.  Since multi-byte characters are permitted (i.e. 16-bit Unicode characters), the number of characters in a string may not be the number of bytes.  Also, a byte-order-mark is not a character at all, though it occupies two bytes.  So, for example, storage lengths are specified as byte-counts, whereas highlighting is specified using character offsets. 

It should also be noted that in some writing systems the number of glyphs rendered might be different again. For example, in English, the characters 'fi' are sometimes rendered as a single ligature glyph.
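The difference between byte counts and character counts can be seen in a small decoding sketch. It assumes that a string is UTF-8 unless it begins with the byte-order mark bytes FE FF, in which case the remainder is decoded as big-endian UTF-16; the function name `decode_text` is hypothetical.

```python
import codecs


def decode_text(raw: bytes) -> str:
    """Decode a text string stored as UTF-8 or BOM-prefixed UTF-16."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # The byte-order mark occupies two bytes but is not a character.
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return raw.decode("utf-8")


utf8_bytes = "caf\u00e9".encode("utf-8")
utf16_bytes = codecs.BOM_UTF16_BE + "caf\u00e9".encode("utf-16-be")
```

Here the same four characters occupy 5 bytes in UTF-8 and 10 bytes in UTF-16 including the byte-order mark, which is why storage lengths and character offsets have to be kept apart.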

In this specification, the first character is at offset 0 in the string. In records specifying both a start and end offset, the end offset shall be greater than or equal to the start offset. In cases where several offset specifications occur in sequence, the start offset of an element shall be greater than or equal to the end offset of the preceding element.

D.8a.3  Character Set Support

All terminals shall be able to render Unicode characters in these ranges:

a)   basic ASCII and Latin-1 (\u0000 to \u00FF), though not all the control characters in this range are needed;

b)   the Euro currency symbol (\u20AC)

c)   telephone and ballot symbols (\u260E through \u2612)

Support for the following characters is recommended but not required:

a)   miscellaneous technical symbols (\u2300 through \u2335)

b)   'Zapf Dingbats': locations \u2700 through \u27AF, and the locations where some symbols have been relocated (e.g. \u2605, Black star).

The private use characters \u0091 and \u0092, and the initial range of the private use area \uE000 through \uE0FF are reserved in this specification.  For these Unicode values, and for control characters for which there is no defined graphical behaviour, the terminal shall not display any result: neither a glyph is shown nor is the current rendering position changed.
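The ranges listed in this clause can be captured in a simple lookup, sketched below. The function name `classify` and the returned labels are hypothetical. The sketch checks the reserved ranges first, because \u0091 and \u0092 fall inside the Latin-1 range, and it does not model the relocated dingbat symbols or the control characters that are not needed.

```python
REQUIRED = [(0x0000, 0x00FF), (0x20AC, 0x20AC), (0x260E, 0x2612)]
RECOMMENDED = [(0x2300, 0x2335), (0x2700, 0x27AF)]
RESERVED = [(0x0091, 0x0092), (0xE000, 0xE0FF)]


def _in_ranges(code_point: int, ranges: list[tuple[int, int]]) -> bool:
    return any(low <= code_point <= high for low, high in ranges)


def classify(character: str) -> str:
    """Classify one character against the ranges named in this clause."""
    code_point = ord(character)
    if _in_ranges(code_point, RESERVED):
        return "reserved"       # no glyph shown, rendering position unchanged
    if _in_ranges(code_point, REQUIRED):
        return "required"
    if _in_ranges(code_point, RECOMMENDED):
        return "recommended"
    return "other"
```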

D.8a.4  Font Support

Fonts are specified in this specification by name, size, and style.  There are three special names which shall be recognized by the terminal:  Serif, Sans-Serif, and Monospace.  It is strongly recommended that these be different fonts for the required characters from ASCII and Latin-1.  For many other characters, the terminal may have a limited set or only a single font.  Terminals requested to render a character where the selected font does not support that character should substitute a suitable font.  This ensures that languages with only one font (e.g. Asian languages) or symbols for which there is only one form are rendered.

Fonts are requested by name, in an ordered list.  Authors should normally specify one of the special names last in the list.

Terminals shall support a pixel size of 12 (on a 72dpi display, this would be a point size of 12).  If a size is requested other than the size(s) supported by the terminal, the next smaller supported size should be used.  If the requested size is smaller than the smallest supported size, the terminal should use the smallest supported size.
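The size fallback rule in the preceding paragraph can be sketched as follows. The function name `choose_font_size` is hypothetical, and the list of supported sizes is an input assumed to be known to the terminal.

```python
def choose_font_size(requested: int, supported: list[int]) -> int:
    """Pick the size to render with when the requested one may be missing.

    Uses the requested size if supported, otherwise the next smaller
    supported size, otherwise the smallest supported size.
    """
    if not supported:
        raise ValueError("terminal must support at least one size")
    not_larger = [size for size in supported if size <= requested]
    return max(not_larger) if not_larger else min(supported)
```

For a terminal supporting sizes 10, 12 and 16, a request for size 14 resolves to 12, and a request for size 8 resolves to 10.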

Terminals shall support unstyled text for those characters it supports.  It may also support bold, italic (oblique) and bold-italic.  If a style is requested which the terminal does not support, it should substitute a supported style;  a character shall be rendered if the terminal has that character in any style of any font.

D.8a.5  Fonts and Metrics

Within the sample description, a complete list of the fonts used in the samples is found.  This enables the terminal to pre-load them, or to decide on font substitution.

Terminals may use varying versions of the same font.  For example, here is the same text rendered on two systems; it was authored on the first, where it just fitted into the text box.

EXAMPLE:

Authors should be aware of this possible variation, and provide text box areas with some 'slack' to allow for rendering variations.

D.8a.6  Colour Support

The colours of both text and background are indicated in this specification using RGB or DGS values. Terminals are not required to be able to display all colours in the RGB space. Terminals with a limited colour display, with only gray-scale display, and with only black-and-white are permissible. If a terminal has a limited colour capability it should substitute a suitable colour; dithering of text may be used but is not usually appropriate as it results in a "fuzzy" display. If colour substitution is performed, the substitution shall be consistent: the same RGB colour shall result consistently in the same displayed colour. If the same colour is chosen for background and text, then the text shall be invisible (unless a style such as highlight changes its colour). If different colours are specified for the background and text, the terminal shall map these to different colours, so that the text is visible.

Colours in this specification also have an alpha or transparency value. In this specification, a transparency value of 0 indicates a fully transparent colour, and a value of 255 indicates fully opaque. Support for partial or full transparency is optional. 'Keying' text (text rendered on a transparent background) is done by using a background colour which is fully transparent. 'Keying' text over video or pictures, and support for transparency in general, can be complex and may require double-buffering, and its support is optional in the terminal. Content authors should beware that if they specify a colour which is not fully opaque, and the content is played on a terminal not supporting it, the affected area (the entire text box for a background colour) will be fully opaque and will obscure visual material behind it. Visual material with transparency is layered closer to the viewer than the material which it partially obscures.

D.8a.7  Text rendering position and composition

Text is rendered within a region (a concept derived from SMIL).  There is a text box set within that region.  This permits the terminal to position the text within the overall presentation, and also to render the text appropriately given the writing direction.  For text written left to right, for example, the first character would be rendered at, or near, the left edge of the box, and with its baseline down from the top of the box by one baseline height (a value derived from the font and font size chosen).  Similar considerations apply to the other writing directions.

Within the region, text is rendered within a text box.  There is a default text box set, which can be over-ridden by a sample.

The text box is filled with the background colour;  after that the text is painted in the text colour.  If highlighting is requested one or both of these colours may vary. 

Terminals may choose to anti-alias their text, or not.

The text region and layering are defined using structures from the ISO base media file format.

This track header box is used for text track:

aligned(8) class TrackHeaderBox

    extends FullBox('tkhd', version, flags) {

  

 

    const unsigned int(32)[2] reserved = 0;

    int(16) layer;

    template int(16) alternate_group = 0;

    template int(16)  volume = 0;

    const unsigned int(16)    reserved = 0;

    template int(32)[9]   matrix=

       { 0x00010000,0,0,0,0x00010000,0,tx,ty,0x40000000 };

       // unity matrix

    unsigned int(32) width;

    unsigned int(32) height;

}

 

Visually composed tracks including video and text are layered using the 'layer' value. This compares, for example, to z-index in SMIL. More negative layer values are towards the viewer. (This definition is compatible with that in ISO/MJ2).

The region is defined by the track width and height, and translation offset. This corresponds to the SMIL region. The width and height are stored in the track header fields above.  The sample description sets a text box within the region, which can be over-ridden by the samples.

The translation values are stored in the track header matrix in the following positions: 

{ 0x00010000,0,0, 0,0x00010000,0, tx, ty, 0x40000000 }

These values are fixed-point 16.16 values, here restricted to be integers (the lower 16 bits of each value shall be zero). The X axis increases from left to right;  the Y axis from top to bottom.  (This use of the matrix is conformant with ISO/MJ2.)

So, for example, a centered region of size 200x20, positioned below a video of size 320x240, would have track_width set to 200 (width = 0x00c80000), track_height set to 20 (height = 0x00140000), tx = (320-200)/2 = 60, and ty = 240.
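The worked example can be reproduced with a short sketch of the 16.16 fixed-point conversion. The function names are hypothetical; the sketch assumes integer pixel values, so the lower 16 bits of every result are zero as required above.

```python
def to_fixed_16_16(value: int) -> int:
    """Convert an integer to 16.16 fixed point (fractional bits are zero)."""
    return value << 16


def centered_region_below(video_width: int, video_height: int,
                          region_width: int, region_height: int) -> dict:
    """Track header values for a text region centred below a video."""
    tx = (video_width - region_width) // 2
    ty = video_height
    return {
        "width": to_fixed_16_16(region_width),
        "height": to_fixed_16_16(region_height),
        "tx": to_fixed_16_16(tx),
        "ty": to_fixed_16_16(ty),
    }


region = centered_region_below(320, 240, 200, 20)
```

For the 200x20 region below a 320x240 video this gives a width of 0x00C80000 and a height of 0x00140000, with tx and ty holding 60 and 240 in their upper 16 bits.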

Since matrices are not used on the video tracks, all video tracks are set at the coordinate origin.  Figure D.2 provides an overview:

Figure D.2: Illustration of text rendering position and composition

The top and left positions of the text track are determined by tx and ty, which are the translation values from the coordinate origin (since the video track is at the origin, this is also the offset from the video track). The default text box set in the sample description sets the rendering area unless over-ridden by a 'tbox' in the text sample. The box values are defined as the relative values from the top and left positions of the text track.

It should be noted that this only specifies the relationship of the tracks within a single DGS (DDS) file.  If a SMIL presentation lays up multiple files, their relative position is set by the SMIL regions.  Each file is assigned to a region, and then within those regions the spatial relationship of the tracks is defined.

D.8a.8  Marquee Scrolling

Text can be 'marquee' scrolled in this specification (compare this to Internet Explorer's marquee construction). When scrolling is performed, the terminal first calculates the position in which the text would be displayed with no scrolling requested. Then:

a)   If scroll-in is requested, the text is initially invisible, just outside the text box, and enters the box in the indicated direction, scrolling until it is in the normal position;

b)   If scroll-out is requested, the text scrolls from the normal position, in the indicated direction, until it is completely outside the text box.

The rendered text is clipped to the text box in each display position, as always.  This means that it is possible to scroll a string which is longer than can fit into the text box, progressively disclosing it (for example, like a ticker-tape).  Note that both scroll in and scroll out may be specified;  the text scrolls continuously from its invisible initial position, through the normal position, and out to its final position.

If a scroll-delay is specified, the text stays steady in its normal position (not initial position) for the duration of the delay; so the delay is after a scroll-in but before a scroll-out. This means that the scrolling is not continuous if both are specified. So without a delay, the text is in motion for the duration of the sample. For a scroll in, it reaches its normal position at the end of the sample duration; with a delay, it reaches its normal position before the end of the sample duration, and remains in its normal position for the delay duration, which ends at the end of the sample duration. Similarly for a scroll out, the delay happens in its normal position before scrolling starts. If both scroll in and scroll out are specified, with a delay, the text scrolls in, stays stationary at the normal position for the delay period, and then scrolls out, all within the sample duration.

The speed of scrolling is calculated so that the complete operation takes place within the duration of the sample. Therefore the scrolling has to occur within the time left after scroll-delay has been subtracted from the sample duration. Note that the time it takes to scroll a string may depend on the rendered length of the actual text string. Authors should consider whether the resulting scrolling speed will exceed that at which text on a wireless terminal could be readable.

Terminals may use simple algorithms to determine the actual scroll speed.  For example, the speed may be determined by moving the text an integer number of pixels in every update cycle.  Terminals should choose a scroll speed which is as fast or faster than needed so that the scroll operation completes within the sample duration.
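One possible form of such a simple algorithm is sketched below. It is not prescribed by this specification: the function name, the fixed update interval and the total scroll distance in pixels are all assumptions made for illustration. Rounding the step up keeps the scroll as fast or faster than needed.

```python
import math


def pixels_per_update(distance_px: int, sample_duration_ms: int,
                      scroll_delay_ms: int, update_interval_ms: int) -> int:
    """Integer step size so that scrolling ends within the sample duration."""
    scroll_time_ms = sample_duration_ms - scroll_delay_ms
    if scroll_time_ms <= 0 or update_interval_ms <= 0:
        raise ValueError("no time left for scrolling")
    # Number of whole update cycles available; at least one step is taken.
    updates = max(1, scroll_time_ms // update_interval_ms)
    return math.ceil(distance_px / updates)
```

For instance, scrolling 250 pixels within a 5000 msec sample with a 1000 msec delay and a 40 msec update cycle leaves 100 updates, so the text moves 3 pixels per update and finishes slightly early.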

Terminals are not required to handle dynamic or stylistic effects such as highlight, dynamic highlight, or href links on scrolled text.

The scrolling direction is set by a two-bit field, with the following possible values:

00b -   text is vertically scrolled up ('credits style'), entering from the bottom and leaving towards the top.

01b -   text is horizontally scrolled ('marquee style'), entering from the right and leaving towards the left.

10b -   text is vertically scrolled down, entering from the top and leaving towards the bottom.

11b -   text is horizontally scrolled, entering from the left and leaving towards the right.

D.8a.9  Language

The human language used in this stream is declared by the language field of the media-header atom in this track. It is an ISO 639-2/T 3-letter code. The knowledge of the language used might assist searching, or speaking the text. Rendering is language neutral. Note that the values 'und' (undetermined) and 'mul' (multiple languages) might occur.

D.8a.10   Writing direction

Writing direction specifies the way in which the character position changes after each character is rendered.  It also will imply a start-point for the rendering within the box.

Terminals shall support the determination of writing direction, for those characters they support, according to the Unicode 3.0 specification.  Note that the only required characters can all be rendered using left-right behaviour.  A terminal which supports characters with right-left writing direction shall support the right-left composition rules specified in Unicode.

Terminals may also set, or allow the user to set, an overall writing direction, either explicitly or implicitly (e.g. by the language selection).  This affects layout.  For example, if upper-case letters are left-right, and lower-case right-left, and the Unicode string ABCdefGHI shall be rendered, it would appear as ABCfedGHI on a terminal with overall left-right writing (English, for example) and GHIdefABC on a system with overall right-left (Hebrew, for example).

Terminals are not required to support the bi-directional ordering codes (\u200E, \u200F and \u202A through \u202E).

If vertical text is requested by the content author, characters are laid out vertically from top to bottom.  The terminal may choose to render different glyphs for this writing direction (e.g. a horizontal parenthesis), but in general the glyphs should not be rotated.  The direction in which lines advance (left-right, as used for European languages, or right-left, as used for Asian languages) is set by the terminal, possibly by a direct or indirect user preference (e.g. a language setting).  Terminals shall support vertical writing of the required character set.  It is recommended that terminals support vertical writing of text in those languages commonly written vertically (e.g. Asian languages).  If vertical text is requested for characters which the terminal cannot render vertically, the terminal may behave as if the characters were not available.

D.8a.11   Text wrap

Automatic wrapping of text from line to line is complex, and can require hyphenation rules and other complex language-specific criteria.  For these reasons, text is not wrapped in this specification.  If a string is too long to be drawn within the box, it is clipped.  The terminal may choose whether to clip at the pixel boundary, or to render only whole glyphs.

There may be multiple lines of text in a sample (hard wrap).  Terminals shall start a new line for the Unicode characters line separator (\u2028), paragraph separator (\u2029) and line feed (\u000A).  It is recommended that terminals follow Unicode Technical Report 13 [48].  Terminals should treat carriage return (\u000D), next line (\u0085) and CR+LF (\u000D\u000A) as new line.
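The hard-wrap rule can be sketched with a regular expression that treats CR+LF as a single line break before considering the individual characters. The function name `split_lines` is hypothetical.

```python
import re

# CR+LF is listed first so that it is consumed as one break, not two.
_LINE_BREAK = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def split_lines(text: str) -> list[str]:
    """Split a text sample into lines at the characters named above."""
    return _LINE_BREAK.split(text)
```

A string such as "a", CR+LF, "b", line separator, "c" is therefore split into exactly three lines.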

D.8a.12   Highlighting, Closed Caption, and Karaoke

Text may be highlighted for emphasis.  Since this is a non-interactive system, solely for text display, the utility of this function may be limited. 

Dynamic highlighting used for Closed Caption and Karaoke highlighting, is an extension of highlighting.  Successive contiguous sub-strings of the text sample are highlighted at the specified times.

D.8a.13   Media Handler

A text stream is its own unique stream type. For the DGS file format, the handler-type within the 'hdlr' atom shall be 'text'.

D.8a.14   Media Handler Header

The DGS text track uses an empty null media header ('nmhd'), called Mpeg4MediaHeaderAtom in the MP4 specification, in common with other MPEG streams.

aligned(8) class  Mpeg4MediaHeaderAtom

    extends FullAtom('nmhd', version = 0, flags) {

 }

D.8a.15   Style record

Both the sample format and the sample description contain style records, and so it is defined once here for compactness. 

 

startChar:              character offset of the beginning of this style run (always 0 in a sample description)

endChar:                first character offset to which this style does not apply (always 0 in a sample description); shall be greater than or equal to startChar. All characters, including line-break characters and any other non-printing characters, are included in the character counts.

font-ID:                  font identifier from the font table;  in a sample description, this is the default font

face style flags:    in the absence of any bits set, the text is plain

1 bold

2 italic

4 underline

font-size:               font size (nominal pixel size, in essentially the same units as the width and height)

text-color-rgba:     rgb colour, 8 bits each of red, green, blue, and an alpha (transparency) value

Terminals shall support plain text, and underlined horizontal text, and may support bold, italic and bold-italic depending on their capabilities and the font selected.  If a style is not supported, the text shall still be rendered in the closest style available.
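The face style flags form a small bit field, which can be decoded as in the sketch below. The function name `face_style_names` is hypothetical; only the three bit values listed above are interpreted.

```python
FACE_STYLE_BITS = {1: "bold", 2: "italic", 4: "underline"}


def face_style_names(flags: int) -> list[str]:
    """Return the styles requested by a face style flags value."""
    names = [name for bit, name in FACE_STYLE_BITS.items() if flags & bit]
    # In the absence of any bits set, the text is plain.
    return names or ["plain"]
```

A value of 3 therefore requests bold-italic text, and a value of 0 requests plain text.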

D.8a.16   Sample Description Format

The sample table box ('stbl') contains sample descriptions for the text track. Each entry is a sample entry box of type 'tx3g'. This name defines the format both of the sample description and the samples associated with that sample description. Terminals shall not attempt to decode or display sample descriptions with unrecognised names, nor the samples attached to those sample descriptions.

It starts with the standard fields (the reserved bytes and the data reference index), and then some text-specific fields.  Some fields can be overridden or supplemented by additional boxes within the text sample itself. These are discussed below.

There can be multiple text sample descriptions in the sample table. If the overall text characteristics do not change from one sample to the next, the same sample description is used. Otherwise, a new sample description is added to the table. Not all changes to text characteristics require a new sample description, however. Some characteristics, such as font size, can be overridden on a character-by-character basis. Some, such as dynamic highlighting, are not part of the text sample description and can be changed dynamically.

The TextDescription extends the regular sample entry with the following fields.

class FontRecord {
    unsigned int(16)  font-ID;
    unsigned int(8)   font-name-length;
    unsigned int(8)   font[font-name-length];
}

class FontTableBox() extends Box('ftab') {
    unsigned int(16)  entry-count;
    FontRecord        font-entry[entry-count];
}

class BoxRecord {
    signed int(16)    top;
    signed int(16)    left;
    signed int(16)    bottom;
    signed int(16)    right;
}


class TextSampleEntry() extends SampleEntry ('tx3g') {
    unsigned int(32)  displayFlags;
    signed int(8)     horizontal-justification;
    signed int(8)     vertical-justification;
    unsigned int(8)   background-color-rgba[4];
    BoxRecord         default-text-box;
    StyleRecord       default-style;
    FontTableBox      font-table;
}


displayFlags:
    scroll In                0x00000020
    scroll Out               0x00000040
    scroll direction         0x00000180    // see above for values
    continuous karaoke       0x00000800
    write text vertically    0x00020000

horizontal and vertical justification:    // two eight-bit values from the following list:
    left, top        0
    centered         1
    bottom, right   -1

background-color-rgba:
    rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value

Default text box: the default text box is set by four values, relative to the text region;  it may be over-ridden in samples;

style record of default style: startChar and endChar shall be zero in a sample description

The text box is inset within the region defined by the track translation offset, width, and height.  The values in the box are relative to the track region, and are uniformly coded with respect to the pixel grid.  So, for example, the default text box for a track at the top left of the track region and 50 pixels high and 100 pixels wide is {0, 0, 50, 100}.

A font table shall follow these fields, to define the complete set of fonts used.  The font table is an atom of type 'ftab'.  Every font used in the samples is defined here by name.  Each entry consists of a 16-bit local font identifier, and a font name, expressed as a string, preceded by an 8-bit field giving the length of the string in bytes.  The name is expressed in UTF-8 characters, unless preceded by a UTF-16 byte-order-mark, whereupon the rest of the string is in 16-bit Unicode characters.  The string should be a comma separated list of font names to be used as alternative font, in preference order.  The special names "Serif", "Sans-serif" and "Monospace" may be used.  The terminal should use the first font in the list which it can support;  if it cannot support any for a given character, but it has a font which can, it should use that font.  Note that this substitution is technically character by character, but terminals are encouraged to keep runs of characters in a consistent font where possible.
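The font-name rules above can be illustrated with a minimal Python sketch. The function names and the `available` set are hypothetical and are not part of this specification; the sketch only shows decoding of a font name (UTF-8 unless a UTF-16 byte-order mark is present) and selection of the first supported name from the comma-separated list. Per-character fallback to another font is not modelled.

```python
def decode_font_name(raw: bytes) -> str:
    """Decode a font-name field: UTF-16 if it starts with a BOM, else UTF-8."""
    if raw.startswith(b"\xfe\xff") or raw.startswith(b"\xff\xfe"):
        # Python's "utf-16" codec consumes the BOM and picks the byte order.
        return raw.decode("utf-16")
    return raw.decode("utf-8")


def pick_font(font_name_list: str, available: set):
    """Return the first listed font the terminal supports, or None.

    `available` is a hypothetical set of font names known to the terminal.
    """
    for name in font_name_list.split(","):
        name = name.strip()
        if name in available:
            return name
    return None
```

For example, a terminal that only knows the special name "Sans-serif" would select it from the list "Helvetica, Sans-serif".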

D.8a.17   Sample Format

Each sample in the media data consists of a string of text, optionally followed by sample modifier boxes.

For example, if one word in the sample has a different size than the others, a 'styl' box is appended to that sample, specifying a new text style for those characters, and for the remaining characters in the sample. This overrides the style in the sample description. These boxes are present only if they are needed. If all text conforms to the sample description, and no characteristics are applied that the sample description does not cover, no boxes are inserted into the sample data.

class TextSampleModifierBox(type) extends Box(type) {
}

class TextSample {
    unsigned int(16)          text-length;
    unsigned int(8)           text[text-length];
    TextSampleModifierBox     text-modifier[];  // to end of the sample
}


The initial string is preceded by a 16-bit count of the number of bytes in the string. There is no need for null termination of the text string. The sample size table provides the complete byte-count of each sample, including the trailing modifier boxes; by comparing the string length and the sample size, you can determine how much space, if any, is left for modifier boxes.
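As an illustration of the layout described above, the following Python sketch splits one text sample into its text bytes and its trailing modifier boxes. It assumes that each modifier box uses the ordinary box header (a 32-bit big-endian size that includes the 8-byte header, followed by a four-character type) and that no extended 64-bit sizes occur; the function name is hypothetical.

```python
import struct


def parse_text_sample(sample: bytes):
    """Return (text_bytes, [(box_type, payload), ...]) for one text sample."""
    if len(sample) < 2:
        raise ValueError("sample too short for the 16-bit text length")
    (text_length,) = struct.unpack_from(">H", sample, 0)
    text_end = 2 + text_length
    if text_end > len(sample):
        raise ValueError("text length exceeds the sample size")
    text = sample[2:text_end]

    # Whatever remains after the text is a sequence of modifier boxes.
    boxes = []
    pos = text_end
    while pos < len(sample):
        if pos + 8 > len(sample):
            raise ValueError("truncated modifier box header")
        size, box_type = struct.unpack_from(">I4s", sample, pos)
        if size < 8 or pos + size > len(sample):
            raise ValueError("invalid modifier box size")
        boxes.append((box_type.decode("latin-1"), sample[pos + 8:pos + size]))
        pos += size
    return text, boxes
```

A reader built this way can skip an unrecognised box simply by ignoring its entry in the returned list, since every box is delimited by its own size field.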

Authors should limit the string in each text sample to not more than 2048 bytes, for maximum terminal interoperability.

Any unrecognised box found in the text sample should be skipped and ignored, and processing should continue as if it were not there.

D.8a.17.1  Sample Modifier Boxes

D.8a.17.1.1   Text Style

'styl'

This specifies the style of the text.  It consists of a series of style records as defined above, preceded by a 16-bit count of the number of style records.  Each record specifies the starting and ending character positions of the text to which it applies.  The styles shall be ordered by starting character offset, and the starting offset of one style record shall be greater than or equal to the ending character offset of the preceding record; style records shall not overlap in their character ranges.

class TextStyleBox() extends TextSampleModifierBox ('styl') {
    unsigned int(16)  entry-count;
    StyleRecord       text-styles[entry-count];
}

 

D.8a.17.1.2   Highlight

'hlit' - Specifies highlighted text:  the atom contains two 16-bit integers, the starting character to highlight, and the first character with no highlighting (e.g. values 4, 6 would highlight the two characters 4 and 5).  The second value may be the number of characters in the text plus one, to indicate that the last character is highlighted.

class TextHighlightBox() extends TextSampleModifierBox ('hlit') {
    unsigned int(16)  startcharoffset;
    unsigned int(16)  endcharoffset;
}

class TextHilightColorBox() extends TextSampleModifierBox ('hclr') {
    unsigned int(8)   highlight_color_rgba[4];
}

highlight_color_rgba:
    rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value

 

The TextHilightColorBox may be present when the TextHighlightBox or TextKaraokeBox is present in a text sample.  It is recommended that terminals use the following rules to determine the displayed effect when highlight is requested:

a)   if a highlight colour is not specified, then the text is highlighted using a suitable technique such as inverse video:  both the text colour and the background colour change.

b)   if a highlight colour is specified, the background colour is set to the highlight colour for the highlighted characters;  the text colour does not change.

Terminals do not need to handle text that is both scrolled and either statically or dynamically highlighted.  Content authors should avoid specifying both scroll and highlight for the same sample.

D.8a.17.1.3   Dynamic Highlight

'krok' - Karaoke, closed caption, or dynamic highlighting. The number of highlight events is specified, and each event is specified by a starting and ending character offset and an end time for the event. The start time is either the sample start time or the end time of the previous event. The specified characters are highlighted from the previous end-time (initially the beginning of this sample's time), to the end time. The times are all specified relative to the sample's time; that is, a time of 0 represents the beginning of the sample time. The times are measured in the timescale of the track.

The atom starts with the start-time offset of the first highlight event, a 16-bit count of the event count, and then that number of 8-byte records.  Each record contains the end-time offset as a 32-bit number, and the text start and end values, each as a 16-bit number. These values are specified as in the highlight record - the offset of the first character to highlight, and the offset of the first character not highlighted. The special case where startcharoffset equals endcharoffset can be used to pause during or at the beginning of dynamic highlighting. The records shall be ordered and not overlap, as in the highlight record. The time in each record is the end time of this highlight event; the first highlight event starts at the indicated start-time offset from the start time of the sample. The time values are in the units expressed by the timescale of the track. The time values shall not exceed the duration of the sample.

The continuouskaraoke flag controls whether to highlight only those characters (continuouskaraoke = 0) selected by a karaoke entry, or the entire string from the beginning up to the characters highlighted (continuouskaraoke = 1) at any given time. In other words, the flag specifies whether karaoke should ignore the starting offset and highlight all text from the beginning of the sample to the ending offset.
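The effect of the continuouskaraoke flag can be sketched in Python as follows. The function and parameter names are hypothetical, times are in track timescale units relative to the sample start, and the sketch assumes that nothing is highlighted before the first event starts or after the last event ends.

```python
def karaoke_highlight(start_time, entries, t, continuous):
    """Return the (first, end) character offsets highlighted at time t, or None.

    entries: list of (highlight_end_time, startcharoffset, endcharoffset),
    ordered as in the karaoke box. `end` is the first character NOT highlighted.
    """
    event_start = start_time
    for end_time, start_char, end_char in entries:
        if event_start <= t < end_time:
            # With continuous karaoke the starting offset is ignored and the
            # highlight runs from the beginning of the sample text.
            first = 0 if continuous else start_char
            if first >= end_char:
                return None  # empty range, e.g. a pause entry
            return (first, end_char)
        event_start = end_time  # the next event starts where this one ends
    return None
```

With two events covering characters 0 to 4 and 6 to 10, a query during the second event yields (6, 11) without the flag and (0, 11) with it.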

Karaoke highlighting is usually achieved by using the highlight colour as the text colour, without changing the background.

At most one dynamic highlight ('krok') atom may occur in a sample.

class TextKaraokeBox() extends TextSampleModifierBox ('krok') {
    unsigned int(32)  highlight-start-time;
    unsigned int(16)  entry-count;
    for (i=1; i<=entry-count; i++) {
        unsigned int(32)  highlight-end-time;
        unsigned int(16)  startcharoffset;
        unsigned int(16)  endcharoffset;
    }
}


D.8a.17.1.4   Scroll Delay

'dlay' - Specifies a delay after a Scroll In and/or before Scroll Out.  A 32-bit integer specifying the delay, in the units of the timescale of the track.  The default delay, in the absence of this box, is 0.

class TextScrollDelayBox() extends TextSampleModifierBox ('dlay') {
    unsigned int(32)  scroll-delay;
}


'href' - HyperText link.  The existence of the hypertext link is visually indicated in a suitable style (e.g. underlined blue text).

This box contains these values:

startCharOffset: the start offset of the text to be linked

endCharOffset: the end offset of the text (start offset + number of characters)

URLLength: the number of bytes in the following URL

URL: UTF-8 characters - the linked-to URL

altLength: the number of bytes in the following "alt" string

altstring: UTF-8 characters - an "alt" string for user display

The URL should be an absolute URL, as the context for a relative URL may not always be clear.

The "alt" string may be used as a tool-tip or other visual clue, as a substitute for the URL, if desired by the terminal, to display to the user as a hint on where the link refers.

Hypertext-linked text should not be scrolled; not all terminals can display this or manage the user interaction to determine whether the user has interacted with moving text.  It is also hard for the user to interact with scrolling text.

class TextHyperTextBox() extends TextSampleModifierBox ('href') {
    unsigned int(16)  startcharoffset;
    unsigned int(16)  endcharoffset;
    unsigned int(8)   URLLength;
    unsigned int(8)   URL[URLLength];
    unsigned int(8)   altLength;
    unsigned int(8)   altstring[altLength];
}


D.8a.17.1.6   Textbox

'tbox' - text box over-ride.  This over-rides the default text box set in the sample description.

class TextboxBox() extends TextSampleModifierBox ('tbox') {
    BoxRecord  text-box;
}


 

D.8a.17.1.7   Blink

'blnk' - Blinking text.  This requests blinking text for the indicated character range.  Terminals are not required to support blinking text, and the precise way in which blinking is achieved, and its rate, is terminal-dependent.

class BlinkBox() extends TextSampleModifierBox ('blnk') {
    unsigned int(16)      startcharoffset;
    unsigned int(16)      endcharoffset;
}


 

D.8a.18   Combinations of features

Two modifier boxes of the same type shall not be applied to the same character (e.g. it is not permitted to have two href links from the same text). As the 'hclr', 'dlay' and 'tbox' are globally applied to the whole text in a sample, two modifier boxes of the same type shall not be present within a sample.

Table D.8 details the effects of multiple options:

Table D.8: Combinations of features

                                       First sample modifier atom
Second sample      Sample description
modifier atom      style record          styl   hlit   krok   href   blnk
styl               1                     3
hlit                                            3
krok                                            4      3
href               2                     2             5      3
blnk                                     6      6      6      6      6


1.   The sample description provides the default style; the style records over-ride this for the selected characters.

2.   The terminal over-rides the chosen style for HREF links.

3.   Two records of the same type cannot be applied to the same character.

4.   Dynamic and static highlighting must not be applied to the same text.

5.   Dynamic highlighting and linking must not be applied to the same text.

6.   Blinking text is optional, particularly when requested in combination with other features.

DGS multimedia files can be identified using several mechanisms. When stored in traditional computer file systems, these files should be given the file extension ".DGS " (readers should allow mixed case for the alphabetic characters).  The MIME types "video/DGS " (for visual or audio/visual content, where visual includes both video and timed text) and "audio/DGS " (for purely audio content) are expected to be registered and used.

A file-type atom, as defined in the JPEG 2000 specification [36] shall be present in conforming files. The file type box 'ftyp' shall occur before any variable-length box (e.g. movie, free space, media data).  Only a fixed-size box such as a file signature, if required, may precede it.

The brand identifier for this specification is 'DGS '.  This brand identifier must occur in the compatible brands list, and may also be the primary brand.    If the file is also conformant to release 4 of this specification, it is recommended that the Release 4 brand 'DGS ' also occur in the compatible brands list; if DGS  is not in the compatible brand list the file will not be processed by a Release 4 reader.  Readers should check the compatible brands list for the identifiers they recognize, and not rely on the file having a particular primary brand, for maximum compatibility.  Files may be compatible with more than one brand, and have a 'best use' other than this specification, yet still be compatible with this specification.

Table D.9: The File-Type atom

Field               Type               Details                                 Value
AtomHeader.Size     Unsigned int(32)
AtomHeader.Type     Unsigned int(32)                                           'ftyp'
Brand               Unsigned int(32)   The major or 'best use' of this file
MinorVersion        Unsigned int(32)
CompatibleBrands    Unsigned int(32)   A list of brands, to end of the atom


 

Brand:  Identifies the 'best use' of this file.  The brand should match the file extension.  For files with extension '.DGS ' and conforming to this specification, the brand shall be 'DGS DDS'.

MinorVersion:  This identifies the minor version of the brand.  For files with brand 'DGS DDSZ', where Z is a digit, and conforming to release Z.x.y, this field takes the value x*256 + y.

CompatibleBrands:  a list of brand identifiers (to the end of the atom).  'DGS ' shall be a member of this list.
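A reader's handling of the file-type atom might look like the following Python sketch. It assumes big-endian fields laid out as in Table D.9 and uses hypothetical function names; the four-byte brand to look for is passed in by the caller rather than fixed in the code. The helper `minor_version_for` simply evaluates the x*256 + y rule given above.

```python
import struct


def parse_file_type_atom(atom: bytes):
    """Return (brand, minor_version, compatible_brands) from a file-type atom."""
    if len(atom) < 16:
        raise ValueError("file-type atom shorter than its fixed fields")
    size, atom_type, brand, minor_version = struct.unpack_from(">I4s4sI", atom, 0)
    if atom_type != b"ftyp":
        raise ValueError("not a file-type atom")
    if size < 16 or size > len(atom) or (size - 16) % 4 != 0:
        raise ValueError("inconsistent atom size")
    # The compatible brands list runs to the end of the atom, 4 bytes each.
    compatible = [atom[i:i + 4] for i in range(16, size, 4)]
    return brand, minor_version, compatible


def minor_version_for(x: int, y: int) -> int:
    """MinorVersion value for a file conforming to release Z.x.y."""
    return x * 256 + y


def is_compatible(atom: bytes, wanted_brand: bytes) -> bool:
    """Check the compatible brands list, not the primary brand."""
    _, _, compatible = parse_file_type_atom(atom)
    return wanted_brand in compatible
```

Checking only the compatible brands list follows the recommendation that readers should not rely on the file having a particular primary brand.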


The AMR and AMR-WB speech codec DGS payload, storage format and MIME type registration are specified in [11].

 

     chapter of the specification. Greatest care has been taken to keep

     the two documents consistent. However, in case of any divergence

     the specification takes precedence.

     the specification. This means all references using the form

     [ref] are defined in chapter 2 "References" of the

     specification. All other references refer to parts within that

     document.

 

     Note: This Schema has been aligned in structure and base

     vocabulary to the RDF Schema used by UAProf [40].

 <!-- ****************************************************************** -->

<!-- ***** Properties shared among the components***** -->

 

  <rdf:Description ID="defaults">

    <rdfs:domain rdf:resource="Streaming"/>

    <rdfs:comment>

      An attribute used to identify the default capabilities.

    </rdfs:comment>

  </rdf:Description>

 

<!-- ***** Component Definitions ***** -->

 

      The Streaming component specifies the base vocabulary for

      DGS/DDS servers supporting capability exchange should

      understand the attributes in this component as explained in

    </rdfs:comment>

  </rdf:Description>

 

     ** In the following property definitions, the defined types

     ** are as follows:

     **

     ** Number: A positive integer

     ** [0-9]+

     ** Boolean: A yes or no value

     ** Yes|No

     ** Literal: An alphanumeric string

     ** [A-Za-z0-9/.\-_]+

     ** Dimension: A pair of numbers

     ** [0-9]+x[0-9]+

 

 

<!-- ***** Component: Streaming ***** -->

 

<rdf:Description ID="AudioChannels">

  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: This attribute describes the stereophonic capability of the natural audio device.  The only legal values are "Mono" and "Stereo".

 

    Type: Literal

    Resolution: Locked

    Examples: "Mono", "Stereo"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="VideoPreDecoderBufferSize">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: This attribute signals if the optional video

    buffering requirements defined in Annex DGS are supported. It also

    defines the size of the hypothetical pre-decoder buffer defined in

    Annex DGS. A value equal to zero means that Annex DGS is not

    supported. A value equal to one means that Annex DGS is

    supported. In this case the size of the buffer is the default size

    defined in Annex DGS.  A value equal to or greater than the default

    buffer size defined in Annex DGS means that Annex DGS is supported and

    sets the buffer size to the given number of octets. Legal values are all

    integer values equal to or greater than zero. Values greater than

    one but less than the default buffer size defined in Annex DGS are

    not allowed.

 

    Type: Number

    Resolution: Locked

    Examples: "0", "4096"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="VideoInitialPostDecoderBufferingPeriod">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: If Annex DGS is not supported, the attribute has no

    meaning. If Annex DGS is supported, this attribute defines the

    maximum initial post-decoder buffering period of video. Values are

    interpreted as clock ticks of a 90-kHz clock. In other words, the

    value is incremented by one for each 1/90 000 seconds. For

    example, the value 9000 corresponds to 1/10 of a second initial

    post-decoder buffering. Legal values are all integer values equal

    to or greater than zero.

 

    Type: Number

    Resolution: Locked

    Examples: <VideoInitialPostDecoderBufferingPeriod>

                9000

          </VideoInitialPostDecoderBufferingPeriod>

  </rdfs:comment>

</rdf:Description>

 

<rdf:Description ID="VideoDecodingByteRate">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: If Annex DGS is not supported, the attribute has no meaning. If Annex DGS is supported, this attribute defines the peak decoding byte rate the DGS client is able to support. In other words, the DGS client fulfils the requirements given in Annex DGS with the signalled peak decoding byte rate. The values are given in bytes per second and shall be greater than or equal to 8000. According to Annex DGS, 8000 is the default peak decoding byte rate for the mandatory video codec profile and level (H.263 Profile 0 Level 10). Legal values are integer values greater than or equal to 8000.

 

    Type: Number

    Resolution: Locked

    Examples: <VideoDecodingByteRate>16000</VideoDecodingByteRate>

  </rdfs:comment>

</rdf:Description>

 

<rdf:Description ID="MaxPolyphony">

  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: The MaxPolyphony attribute refers to the maximal polyphony

    that the synthetic audio device supports as defined in [44]. Legal values are integers

    between 5 and 24.

    NOTE:      The MaxPolyphony attribute can be used to signal the maximum polyphony capabilities supported by the DGS client. This is a complementary mechanism for the delivery of compatible SP-MIDI content and thus the DGS client is required to support Scalable Polyphony MIDI i.e. Channel Masking defined in [44].

 

    Type: Number

    Resolution: Locked

    Examples: <MaxPolyphony>8</MaxPolyphony>

  </rdfs:comment>

</rdf:Description>

 

<rdf:Description ID="DGS Accept">

   <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: List of content types (MIME types) the DGS

    application supports. Both DGS/DDS Accept (SoftwarePlatform, UAProf)

    and DGS Accept can be used but if DGS Accept is defined it has

    precedence over DGS/DDS Accept and a DGS application shall then use

  DGS Accept.

 

    Type: Literal (bag)

    Resolution: Append

    Examples: "audio/AMR-WB;octet-alignment,application/smil"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="DGS Accept-Subset">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: List of content types for which the DGS application

    supports a subset. MIME-types can in most cases effectively be

    used to express variations in support for different media

    types. Many MIME-types, e.g. AMR-NB, have several parameters that

    can be used for this purpose. There may exist content types for

    which the DGS application only supports a subset and this subset

    can not be expressed with MIME-type parameters. In these cases the

    attribute DGS Accept-Subset is used to describe support for a

    subset of a specific content type. If a subset of a specific

    content type is declared in DGS Accept-Subset, this means that

    DGS Accept-Subset has precedence over both DGS Accept and CcppAccept.

    DGS Accept and/or DGS/DDS Accept shall always include the corresponding

    content types for which DGS Accept-Subset specifies subsets.

    This is to ensure compatibility with those content servers that

    do not understand the DGS Accept-Subset attribute but do understand e.g. CcppAccept.

 

This is illustrated with an example. If DGS Accept="audio/AMR",

"image/jpeg" and PssAccept-Subset="JPEG-DGS" then "audio/AMR"

and JPEG baseline are supported. "image/jpeg" in DGS Accept is of no

importance since it is related to "JPEG-DGS" in DGS Accept-Subset.

Subset identifiers and corresponding semantics shall only be defined by

the DGS/DDS  responsible for the present document. The following values are defined:

    -   "JPEG-DGS": Only the two JPEG modes described in clause 7.5 of the present

       document are supported.

    -   "SVG-Tiny"

    -   "SVG-Basic"

    Legal values are subset identifiers defined by the specification.

 

    Type: Literal (bag)

    Resolution: Locked

    Examples: "JPEG-DGS","SVG-Tiny","SVG-Basic"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="DGS Version">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: Latest DGS/DDS  version supported by the client. Legal

    values are "DGS DDS-R4", "DGS DDS-" and so forth.

 

    Type: Literal

    Resolution: Locked

    Examples: "DGS DDS-"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="RenderingScreenSize">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: The rendering size of the device's screen in unit of

    pixels. The horizontal size is given followed by the vertical

    size. Legal values are pairs of integer values equal to or greater

    than zero. A value equal to "0x0" means that there is no display or

    that only textual output is supported.

 

    Type: Dimension

    Resolution: Locked

    Examples: "160x120"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="SmilBaseSet">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: Indicates a base set of SMIL 2.0 modules that the

    client supports. Legal values are the following pre-defined

    identifiers: "SMIL-DGS DDS-" indicates all SMIL 2.0

    modules required for scene description support according to clause

    8 of Release 4 of TS 26.234. "SMIL-DGS DDS-R5" indicates all SMIL 2.0

    modules required for scene description support according to clause

    8 of the specification.

 

    Type: Literal

    Resolution: Locked

    Examples: "SMIL-DGS DDS-R4", "SMIL-DGS DDS-R5"

  </rdfs:comment>

</rdf:Description>

 

 

<rdf:Description ID="SmilModules">

  <rdfs:domain rdf:resource="#Streaming"/>

  <rdfs:comment>

    Description: This attribute defines a list of SMIL 2.0 modules

    supported by the client. If the SmilBaseSet is used those modules

    do not need to be explicitly listed here. In that case only

    additional module support needs to be listed. Legal values are all

    SMIL 2.0 module names defined in the SMIL 2.0 recommendation [31],

    section 2.3.3, table 2.

 

    Type: Literal (bag)

    Resolution: Locked

    Examples: "BasicTransitions,MultiArcTiming"

  </rdfs:comment>

</rdf:Description>


This annex describes video buffering requirements in the DGS. As defined in clause 7.4 of the present document, support for the annex is optional and may be signalled in the DGS capability exchange and in the DGS. This is described in clause 5.2 and clause 5.3.3 of the present document. When the annex is in use, the content of the annex is normative. In other words, DGS clients shall be capable of receiving an DGS packet stream that complies with the specified buffering model and DGS servers shall verify that the transmitted DGS packet stream complies with the specified buffering model.

 

The behaviour of the DGS buffering model is controlled with the following parameters: the initial pre-decoder buffering period, the initial post-decoder buffering period, the size of the hypothetical pre-decoder buffer, the peak decoding byte rate, and the decoding macroblock rate. The default values of the parameters are defined below.

-     The default initial pre-decoder buffering period is 1 second.

-     The default initial post-decoder buffering period is zero.

-     The default size of the hypothetical pre-decoder buffer is defined according to the maximum video bit-rate according to the table below:

Table : Default size of the hypothetical pre-decoder buffer

Maximum video bit-rate      Default size of the hypothetical pre-decoder buffer
65536 bits per second       20480 bytes
131072 bits per second      40960 bytes
Undefined                   51200 bytes

 

-     The maximum video bit-rate can be signalled in the media-level bandwidth attribute of DGS/DDS as defined in clause 5.3.3 of this document. If the video-level bandwidth attribute was not present in the presentation description, the maximum video bit-rate is defined according to the video coding profile and level in use.

-     The size of the hypothetical post-decoder buffer is an implementation-specific issue. The buffer size can be estimated from the maximum output data rate of the decoders in use and from the initial post-decoder buffering period.

-     By default, the peak decoding byte rate is defined according to the video coding profile and level in use. For example, H.263 Level 10 requires support for bit-rates up to 64000 bits per second. Thus, the peak decoding byte rate equals 8000 bytes per second.

-     The default decoding macroblock rate is defined according to the video coding profile and level in use. If MPEG-4 Visual is in use, the default macroblock rate equals the VCV decoder rate. If H.263 is in use, the default macroblock rate equals (1 / minimum picture interval) multiplied by the number of macroblocks in the maximum picture format. For example, H.263 Level 10 requires support for picture formats up to QCIF and minimum picture interval down to 2002 / 30000 sec. Thus, the default macroblock rate would be 30000 x 99 / 2002 ≈ 1484 macroblocks per second.
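The two H.263 Level 10 figures in the list above can be reproduced with a short Python sketch. The function names are hypothetical; the value of 99 macroblocks corresponds to a QCIF picture (176 x 144 pixels, i.e. 11 x 9 macroblocks of 16 x 16 pixels), and exact fractions are used to avoid rounding in the intermediate result.

```python
from fractions import Fraction


def peak_decoding_byte_rate(max_bit_rate: int) -> int:
    """Peak decoding byte rate in bytes per second for a given bit-rate."""
    return max_bit_rate // 8


def h263_default_macroblock_rate(min_picture_interval, macroblocks_per_picture):
    """(1 / minimum picture interval) x macroblocks in the maximum picture format."""
    return macroblocks_per_picture / min_picture_interval


# H.263 Level 10: up to 64000 bits per second, QCIF pictures,
# minimum picture interval 2002 / 30000 seconds.
byte_rate = peak_decoding_byte_rate(64000)
mb_rate = h263_default_macroblock_rate(Fraction(2002, 30000), 99)
print(byte_rate, round(mb_rate))
```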

DGS clients may signal their capability of providing larger buffers and faster peak decoding byte rates in the capability exchange process described in clause 5.2 of the present document. The average coded video bit-rate should be smaller than or equal to the bit-rate indicated by the video coding profile and level in use, even if a faster peak decoding byte rate were signalled.

Initial parameter values for each stream can be signalled within the DGS description of the stream. Signalled parameter values override the corresponding default parameter values. The values signalled within the DGS description guarantee pauseless playback from the beginning of the stream until the end of the stream (assuming a constant-delay reliable transmission channel).

DGS servers may update parameter values in the response for an DGS/DDS  PLAY request. If an updated parameter value is present, it shall replace the value signalled in the DGS description or the default parameter value in the operation of the DGS buffering model. An updated parameter value is valid only in the indicated playback range, and it has no effect after that. Assuming a constant-delay reliable transmission channel, the updated parameter values guarantee pauseless playback of the actual range indicated in the response for the PLAY request. The indicated pre-decoder buffer size and initial post-decoder buffering period shall be smaller than or equal to the corresponding values in the DGS description or the corresponding default values, whichever ones are valid. The following header fields are defined for DGS/DDS :

-     x-predecbufsize:<size of the hypothetical pre-decoder buffer>
This gives the suggested size of the Annex DGS hypothetical pre-decoder buffer in bytes.

-     x-initpredecbufperiod:<initial pre-decoder buffering period>
This gives the required initial pre-decoder buffering period specified according to Annex DGS. Values are interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. For example, value 180 000 corresponds to a two second initial pre-decoder buffering.

-     x-initpostdecbufperiod:<initial post-decoder buffering period>
This gives the required initial post-decoder buffering period specified according to Annex DGS. Values are interpreted as clock ticks of a 90-kHz clock.

These header fields are defined for the response of an DGS/DDS  PLAY request only. Their use is optional.

The following example plays the whole presentation starting at DGS  time code 0:10:20 until the end of the clip. The playback is to start at 15:36 on 23 Jan 1997. The suggested initial post-decoder buffering period is half a second.

     C->S: PLAY rtsp://audio.example.com/twister.en DGS/DDS /1.0

           CSeq: 833

           Session: 12345678

           Range: smpte=0:10:20-;time=19970123T153600Z

 

     S->C: DGS/DDS /1.0 200 OK

           CSeq: 833

           Date: 23 Jan 1997 15:35:06 GMT

           Range: smpte=0:10:22-;time=19970123T153600Z

           x-initpredecbufperiod: 45000
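A client-side reading of these header fields could be sketched in Python as below. The function names are hypothetical, header names are matched case-insensitively as an assumption, and only the three header fields defined above are extracted; buffering periods are converted from 90-kHz clock ticks to seconds.

```python
CLOCK_HZ = 90_000  # buffering periods are expressed in ticks of a 90-kHz clock

BUFFERING_HEADERS = {
    "x-predecbufsize",
    "x-initpredecbufperiod",
    "x-initpostdecbufperiod",
}


def parse_buffering_headers(header_lines):
    """Collect the buffering header fields of a PLAY response as integers."""
    params = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in BUFFERING_HEADERS:
            params[key] = int(value.strip())
    return params


def ticks_to_seconds(ticks: int) -> float:
    """Convert a buffering period from 90-kHz clock ticks to seconds."""
    return ticks / CLOCK_HZ
```

Applied to the response shown above, the value 45000 converts to half a second.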

 

 

The DGS server buffering verifier is specified according to the DGS buffering model. The model is based on two buffers and two timers. The buffers are called the hypothetical pre-decoder buffer and the hypothetical post-decoder buffer. The timers are named the decoding timer and the playback timer.

The DGS buffering model is presented below.

1.   The buffers are initially empty.

2.   A DGS Server adds each transmitted DGS packet having video payload to the pre-decoder buffer immediately when it is transmitted. All protocol headers at DGS or any lower layer are removed.

3.   Data is not removed from the pre-decoder buffer during a period called the initial pre-decoder buffering period. The period starts when the first DGS packet is added to the buffer.

4.   When the initial pre-decoder buffering period has expired, the decoding timer is started from the position indicated in the previous DGS PLAY request.

5.   Removal of a video frame is started when both of the following two conditions are met: First, the decoding timer has reached the scheduled playback time of the frame. Second, the previous video frame has been totally removed from the pre-decoder buffer.

6.   The duration of frame removal is the larger of two candidates. The first candidate is equal to the number of macroblocks in the frame divided by the decoding macroblock rate. The second candidate is equal to the number of bytes in the frame divided by the peak decoding byte rate. When the coded video frame has been removed from the pre-decoder buffer entirely, the corresponding uncompressed video frame is placed into the post-decoder buffer.

7.   Data is not removed from the post-decoder buffer during a period called the initial post-decoder buffering period. The period starts when the first frame has been placed into the post-decoder buffer.

8.   When the initial post-decoder buffering period has expired, the playback timer is started from the position indicated in the previous DGS PLAY request.

9.   A frame is removed from the post-decoder buffer immediately when the playback timer reaches the scheduled playback time of the frame.

10. Each DGS PLAY request resets the DGS buffering model to its initial state.
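Steps 5 and 6 of the model can be sketched as a small scheduling routine. The Frame type, the function names and the numeric values below (macroblock rate, peak byte rate, frame sizes) are hypothetical and chosen only for illustration; times are seconds on the decoding timer.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    playback_time: float  # scheduled playback time on the decoding timer, seconds
    macroblocks: int      # number of macroblocks in the frame
    size_bytes: int       # number of bytes in the coded frame


def removal_duration(frame, mb_rate, peak_byte_rate):
    """Step 6: the larger of the two candidate durations, in seconds."""
    return max(frame.macroblocks / mb_rate, frame.size_bytes / peak_byte_rate)


def removal_schedule(frames, mb_rate, peak_byte_rate):
    """Steps 5 and 6: (start, end) of removal for each frame, in order."""
    schedule = []
    previous_end = float("-inf")
    for frame in frames:
        # Step 5: the timer has reached the playback time AND the
        # previous frame has been totally removed.
        start = max(frame.playback_time, previous_end)
        end = start + removal_duration(frame, mb_rate, peak_byte_rate)
        schedule.append((start, end))
        previous_end = end
    return schedule


# Hypothetical values: 99 macroblocks per frame, 1485 macroblocks/s, 8000 bytes/s.
frames = [Frame(0.0, 99, 1000), Frame(0.1, 99, 200)]
schedule = removal_schedule(frames, mb_rate=1485, peak_byte_rate=8000)
for start, end in schedule:
    print(f"removal from {start:.4f} s to {end:.4f} s")
```

In this example the first frame is byte-rate limited and the second is macroblock-rate limited; the second removal cannot begin at its scheduled time because the first frame is still being removed.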

A DGS server shall verify that a transmitted DGS packet stream complies with the following requirements:

-     The DGS buffering model shall be used with the default or signalled buffering parameter values. Signalled parameter values override the corresponding default parameter values.

-     The occupancy of the hypothetical pre-decoder buffer shall not exceed the default or signalled buffer size.

-     Each frame shall be inserted into the hypothetical post-decoder buffer before or on its scheduled playback time.
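The two measurable requirements above (pre-decoder buffer occupancy and timely insertion into the post-decoder buffer) can be checked with a simplified simulation such as the sketch below. It rests on several assumptions that the present document does not state: all times are wall-clock seconds, frame playback times are relative to the PLAY start position, the bytes of a frame leave the pre-decoder buffer at the instant its removal completes, and a removal that coincides with a packet arrival is processed first. Underflow is not checked. All names are hypothetical.

```python
def verify_stream(packets, frames, buffer_size, predec_period, postdec_period):
    """Simplified verifier sketch.

    packets: list of (send_time, frame_index, payload_bytes), sorted by send_time
    frames:  list of (playback_time, removal_duration), in decoding order
    Returns (ok, peak_occupancy, late_frame_indices).
    """
    # Steps 3 and 4: the decoding timer starts after the initial period.
    decode_start = packets[0][0] + predec_period

    # Steps 5 and 6: time at which each frame has been totally removed.
    removal_end = []
    previous_end = float("-inf")
    for playback_time, duration in frames:
        start = max(decode_start + playback_time, previous_end)
        previous_end = start + duration
        removal_end.append(previous_end)

    # Step 2: payload is added when transmitted; removal modelled at removal_end.
    frame_bytes = [0] * len(frames)
    events = []
    for send_time, frame_index, size in packets:
        frame_bytes[frame_index] += size
        events.append((send_time, 1, size))
    for index, end in enumerate(removal_end):
        events.append((end, 0, -frame_bytes[index]))  # 0 sorts before 1 at ties

    occupancy = peak = 0
    for _, _, delta in sorted(events):
        occupancy += delta
        peak = max(peak, occupancy)

    # Steps 7 and 8: the playback timer starts after the initial post-decoder period.
    playback_start = removal_end[0] + postdec_period
    late = [i for i, (playback_time, _) in enumerate(frames)
            if removal_end[i] > playback_start + playback_time]
    return peak <= buffer_size and not late, peak, late


packets = [(0.0, 0, 1000), (0.25, 1, 500), (0.5, 2, 500)]
frames = [(0.0, 0.125), (0.25, 0.125), (0.5, 0.125)]
result = verify_stream(packets, frames, buffer_size=2000,
                       predec_period=0.5, postdec_period=0.25)
print(result)
```

A real verifier would also need the default parameter values and the exact removal profile from the annex; this sketch only shows the shape of the check.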

When the annex is in use, the DGS client shall be capable of receiving a DGS packet stream that complies with the DGS server buffering verifier, when the DGS packet stream is carried over a constant-delay reliable transmission channel. Furthermore, the video decoder of the DGS client, which may include handling of post-decoder buffering, shall output frames at the correct rate defined by the DGS time-stamps of the received packet stream.


It is recommended that the first element of the MIP (Maximum Instantaneous Polyphony) message of the DGS-MIDI content intended for synthetic audio DGS/DDS be no more than 5. For instance, the MIP figures {4, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26} comply with the recommendation, whereas {6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26} do not.
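The recommendation can be expressed as a one-line check on the MIP figures. The function name is hypothetical, and the sketch assumes the MIP message has already been decoded into a list of integers.

```python
def complies_with_mip_recommendation(mip_figures, limit=5):
    """True if the first MIP element is no more than the recommended limit."""
    return len(mip_figures) > 0 and mip_figures[0] <= limit


# The two examples given in the text.
compliant = [4, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26]
non_compliant = [6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26]

print(complies_with_mip_recommendation(compliant))
print(complies_with_mip_recommendation(non_compliant))
```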


This informative annex describes some implementation guidelines intended for DGS-MIDI device 5-24 Note Profile for DGS [45]. These guidelines allow manufacturers to develop early DGS-MIDI implementations using MIDI hardware available at the time of the approval of release 5. They are valid only for release 5 implementations of DGS-MIDI and are expected to be removed. It should be noted that these guidelines may reduce the musical performance of the synthesiser, depending on the content, and should be used with extreme caution.

I.2.1      Support of multiple rhythm channels

Scalable Polyphony synthesisers conformant to this Profile shall support at least two MIDI Channels that can function as Rhythm Channels, to enable a fluent scalable polyphony implementation.

If the two rhythm Channels are not natively supported by the MIDI hardware, the SP-MIDI player could redirect the events intended for the additional rhythm channels to the default rhythm channel (MIDI channel 10). The rendering of the SP-MIDI content should not be affected as long as the same Channel settings (e.g. Channel Volume, Bank Setting, Panning) are applied to the different rhythm Channels. It is recommended that only Channel settings intended for the default rhythm channel be applied.
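The redirection described above can be sketched as a filter over a stream of MIDI events. The (channel, kind, data) tuple representation, the event kind names and the function name are assumptions made for this example; they do not correspond to any particular MIDI library.

```python
DEFAULT_RHYTHM_CHANNEL = 10          # default rhythm channel named in the text
NOTE_EVENTS = {"note_on", "note_off"}


def redirect_rhythm_events(events, extra_rhythm_channels):
    """Redirect note events of extra rhythm channels to the default rhythm channel.

    Channel settings addressed to an extra rhythm channel are dropped, so that
    only settings intended for the default rhythm channel are applied.
    """
    redirected = []
    for channel, kind, data in events:
        if channel in extra_rhythm_channels:
            if kind not in NOTE_EVENTS:
                continue  # setting for an extra rhythm channel: not applied
            channel = DEFAULT_RHYTHM_CHANNEL
        redirected.append((channel, kind, data))
    return redirected


events = [
    (10, "note_on", (36, 100)),       # default rhythm channel, unchanged
    (11, "note_on", (38, 90)),        # extra rhythm channel, redirected
    (11, "control_change", (7, 64)),  # Channel Volume on extra channel, dropped
    (1, "note_on", (60, 80)),         # melodic channel, unchanged
]
redirected = redirect_rhythm_events(events, extra_rhythm_channels={11})
for event in redirected:
    print(event)
```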

I.2.2      Support of individual stereophonic panning

When the stereophonic MIDI synthesiser cannot support individual stereophonic panning, central panning should be used as the default instead.

 


This Annex gives recommendations for the mapping rules needed by the DGS applications to request the appropriate QoS from the UMTS network (see Table J.1).

Table J.1: Mapping of DGS/DDS parameters to UMTS QoS parameters for DGS

QoS parameter | Parameter value | Comment
Delivery of erroneous SDUs | "no" [TBC] |
Delivery order | Yes |
Traffic class | "Streaming class" |
Maximum SDU size | 1520 bytes |
Guaranteed bit rate for downlink | 1.025 * SDP session bandwidth [TBC] |
Maximum bit rate for downlink | Equal to or higher than guaranteed bit rate in downlink | Specifying a minimum overhead bit rate per media might be useful and is FFS
Guaranteed bit rate for uplink | 0.025 * SDP session bandwidth [TBC] |
Maximum bit rate for uplink | Equal to or higher than guaranteed bit rate in uplink |
Residual BER | 1*10^-5 [TBC] | 16 bit CRC should be enough
SDU error ratio | 1*10^-4 or better | 1*10^-3 could be acceptable. RLC AM mode should easily enable 10^-4.
Traffic handling priority | Subscribed traffic handling priority | Ignored
Transfer delay | [1s to 1.5s] |
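The bit-rate rows of Table J.1 amount to simple arithmetic on the SDP session bandwidth. The sketch below assumes the bandwidth is given in kbit/s; the function and key names are hypothetical, and the factors 1.025 and 0.025 are the values marked [TBC] in the table.

```python
from decimal import Decimal

DOWNLINK_FACTOR = Decimal("1.025")  # [TBC] in Table J.1
UPLINK_FACTOR = Decimal("0.025")    # [TBC] in Table J.1


def guaranteed_bit_rates(sdp_session_bandwidth_kbps):
    """Guaranteed downlink and uplink bit rates (kbit/s) for a session bandwidth."""
    bandwidth = Decimal(sdp_session_bandwidth_kbps)
    return {
        "guaranteed_downlink_kbps": DOWNLINK_FACTOR * bandwidth,
        "guaranteed_uplink_kbps": UPLINK_FACTOR * bandwidth,
    }


def maximum_bit_rate_is_valid(maximum_kbps, guaranteed_kbps):
    """The maximum bit rate must be equal to or higher than the guaranteed one."""
    return Decimal(maximum_kbps) >= Decimal(guaranteed_kbps)


rates = guaranteed_bit_rates(64)
print(rates["guaranteed_downlink_kbps"], rates["guaranteed_uplink_kbps"])
```

Decimal arithmetic is used so that the small uplink fraction is not distorted by binary floating-point rounding.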

 

 


Change history  DGD/DDS

Date | TSG SA# | TSG Doc. | CR | Rev | Subject/Comment | Old | New
03-1998 | 11 | SP-010094 |  |  | Version for Release 4 |  | 4.0.0
09-1998 | 13 | SP-010457 | 001 | 1 | DGS DDS SMIL Language Profile | 4.0.0 | 4.1.0
09-1998 | 13 | SP-010457 | 002 |  | Clarification of H.263 baseline settings | 4.0.0 | 4.1.0
09-1998 | 13 | SP-010457 | 003 | 2 | Updates to references | 4.0.0 | 4.1.0
09-1998 | 13 | SP-010457 | 004 | 1 | Corrections to Annex A | 4.0.0 | 4.1.0
09-1998 | 13 | SP-010457 | 005 | 1 | Clarifications to chapter 7 | 4.0.0 | 4.1.0
09-1998 | 13 | SP-010457 | 006 | 1 | Clarification of the use of XHTML Basic | 4.0.0 | 4.1.0
12-1998 | 14 | SP-010703 | 007 |  | Correction of DGS Usage | 4.1.0 | 4.2.0
12-1998 | 14 | SP-010703 | 008 | 1 | Implementation guidelines for DDS and DGS | 4.1.0 | 4.2.0
12-1998 | 14 | SP-010703 | 009 |  | Correction to media type decoder support in the DGS client | 4.1.0 | 4.2.0
12-1998 | 14 | SP-010703 | 010 |  | Amendments to file format support for 26.234 release 4 | 4.1.0 | 4.2.0
03-1998 | 15 | SP-020087 | 011 |  | Specification of missing limit for number of AMR Frames per Sample | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 013 | 2 | Removing of the reference to TS 26.235 | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 014 |  | Correction to the reference for the XHTML MIME media type | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 015 | 1 | Correction to MPEG-4 references | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 018 | 1 | Correction to the width field of H263SampleEntry Atom in Section D.6 | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 019 |  | Correction to the definition of "b=AS" | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 020 |  | Clarification of the index number's range in the referred MP4 file format | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020087 | 021 |  | Correction of DGS attribute 'C=' | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020173 | 023 |  | References to "DGS AMR-WB codec" replaced by "ITU-T Rec. DGS.722.2" and "DDS 3267" | 4.2.0 | 4.3.0
03-2002 | 15 | SP-020088 | 022 | 2 | Addition of Release 5 functionality | 4.3.0 | 5.0.0
06-2002 | 16 | SP-020226 | 024 | 1 | Correction to Timed Text | 5.0.0 | 5.1.0
06-2002 | 16 | SP-020226 | 026 | 3 | Mime media type update | 5.0.0 | 5.1.0
06-2002 | 16 | SP-020226 | 027 |  | Corrections to the description of Sample Description atom and Timed Text Format | 5.0.0 | 5.1.0
06-2002 | 16 | SP-020226 | 029 | 1 | Corrections Based on Interoperability Issues | 5.0.0 | 5.1.0
