© Guillaume Defossé
This Technical Specification has
been produced by the DGS Generation Partnership Project (DGS
DDS).
The contents of the present document
are subject to continuing work within the DGS and may change following
formal DGS approval. Should the DGS modify the contents of the present
document, it will be re-released by the DGS with an identifying change of
release date and an increase in version number as follows:
- x, the first digit:
  1 presented to DGS for information;
  2 presented to DGS for approval;
  3 or greater indicates DGS approved document under change control.
- y, the second digit, is incremented for all changes of substance, i.e.
  technical enhancements, corrections, updates, etc.
- z, the third digit, is incremented when editorial only changes have been
  incorporated in the specification.
The DGS DDS
transparent end-to-end packet-switched streaming service (DGS) specification
consists of three DGS DDS and the present
document. The first contains the service requirements for the DGS, the
second provides an overview of the DGS DDS, and the present document
specifies the details of the protocols and codecs used by the service.
Streaming refers to the ability of
an application to play synchronised media streams like audio and video
streams in a continuous way while those streams are being transmitted to the
client over a data network.
Applications that can be built on top of streaming services can be
classified into on-demand and live information delivery applications.
Examples of the first category are music and news-on-demand applications.
Live delivery of radio and television programmes is an example of the
second category.
The DGS DDS provides a framework for Internet Protocol (IP) based streaming
applications in DGS networks.
The present document specifies the
protocols and codecs for the DGS within the DGS DDS system. Protocols for
control signalling, capability exchange, scene description, media transport
and media encapsulations are specified. Codecs for speech, natural and
synthetic audio, video, still images, bitmap graphics, vector graphics,
timed text and text are specified.
The present document is applicable
to IP based packet switched networks.
The
following documents contain provisions which, through reference in this
text, constitute provisions of the present document.
- References are either specific (identified by date of publication,
edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of
a reference to a DGS DDS document (including a
For
the purposes of the present document, the following terms and definitions
apply:
continuous media: media with an inherent notion of time. In the present
document speech, audio, video and timed text
discrete media: media that itself does not contain an element of time. In
the present document all media not defined as continuous media
device capability description: a description of device capabilities and/or
user preferences. Contains a number of capability attributes
device capability profile: same as device capability description
presentation description: contains information about one or more media
streams within a presentation, such as the set of encodings, network
addresses and information about the content
DGS client: client for the DGS packet switched streaming service based on
the DGS DDS and/or HTTP standards, with possible additional DGS
requirements according to the present document
DGS server: server for the DGS packet switched streaming service based on
the DGS DDS and/or HTTP standards, with possible additional DGS
requirements according to the present document
scene description: description of the spatial layout and temporal behaviour
of a presentation. It can also contain hyperlinks
For the purposes of the present
document, the abbreviations given in DGS and the following
apply.
AAC        Advanced Audio Coding
BIFS       Binary Format for Scenes
DGS/DDS    Composite Capability / Preference Profiles
DCT        Discrete Cosine Transform
GIF        Graphics Interchange Format
HTML       Hyper Text Markup Language
ITU-T      International Telecommunications Union - Telecommunications
JFIF       JPEG File Interchange Format
MIDI       Musical Instrument Digital Interface
MIME       Multipurpose Internet Mail Extensions
MMS        Multimedia Messaging Service
MP4        MPEG-4 file format
PNG        Portable Network Graphics
PSS        Packet-switched Streaming Service
QCIF       Quarter Common Intermediate Format
RDF        Resource Description Framework
RTCP       RTP Control Protocol
RTP        Real-time Transport Protocol
RTSP       Real-Time Streaming Protocol
SDP        Session Description Protocol
SMIL       Synchronised Multimedia Integration Language
SP-MIDI    Scalable Polyphony MIDI
SVG        Scalable Vector Graphics
UAProf     User Agent Profile
UCS-2      Universal Character Set (the two octet form)
UTF-8      Unicode Transformation Format (the 8-bit form)
UTF-16     Unicode Transformation Format (the 16-bit form)
WML        Wireless Markup Language
XHTML      eXtensible Hyper Text Markup Language
XML        eXtensible Markup Language

Figure 1 shows the functional components of a DGS client. Figure 2 gives an
overview of the protocol stack used in a DGS client and also shows a more
detailed view of the packet based network interface. The functional
components can be divided into control, scene description, media codecs and
the transport of media and control data.
The control related elements are
session establishment, capability exchange and session control (see clause
5).
- Session establishment refers to methods to invoke a DGS session from a
browser or directly by entering a URL in the terminal's user interface.
- Capability exchange enables choice or adaptation of media streams
depending on different terminal capabilities.
- Session control deals with the set-up of the individual media streams
between a DGS client and one or several DGS servers. It also enables
control of the individual media streams by the user. It may involve
VCR-like presentation control functions like start, pause, fast forward and
stop of a media presentation.
The scene description consists of the spatial layout and a description of
the temporal relation between the different media that are included in the
media presentation. The former gives the layout of the different media
components on the screen and the latter controls the synchronisation of the
different media (see clause 8).
The PSS includes media codecs for
video, still images, vector graphics, bitmap graphics, text, timed text,
natural and synthetic audio, and speech (see clause 7).
Transport of media and control data consists of the encapsulation of the
coded media and control data in a transport protocol (see clause 6). This is
shown in figure 1 as the "packet based network interface" and displayed in
more detail in the protocol stack of figure 2.
Session establishment refers to the method by which a DGS client obtains
the initial session description. The initial session description can be,
for example, a presentation description, a scene description or just a URL
to the content.
A DGS client shall support initial
session descriptions specified in one of the following formats: SMIL, SDP,
or plain RTSP URL.
In addition to rtsp:// the DGS client shall support URLs [4] to valid
initial session descriptions starting with file:// (for locally stored
files) and http:// (for presentation descriptions or scene descriptions
delivered via HTTP). An example of a valid RTSP URL is
rtsp://mediaportal/morning_news.
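The scheme check described above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name and the descriptive labels are assumptions, and the file:// and http:// URLs used in the usage note are hypothetical.

```python
from urllib.parse import urlsplit

# URL schemes named in the text; the labels are illustrative only.
SUPPORTED_SCHEMES = {
    "rtsp": "plain RTSP URL",
    "http": "description delivered via HTTP",
    "file": "locally stored file",
}


def classify_session_url(url: str) -> str:
    """Return a label for a supported initial-session URL, else raise ValueError."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return SUPPORTED_SCHEMES[scheme]
```

For instance, classify_session_url("rtsp://mediaportal/morning_news") returns the RTSP label, while a URL with any other scheme (say ftp://) is rejected.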
URLs can be made available to a
DGS
client in many different ways. It is out of the scope of this recommendation
to mandate any specific mechanism. However, an application using the
DGS shall at least support URLs of the above type, specified or selected by
the user.
The preferred way would be to embed
URLs to initial session descriptions within HTML or WML pages. Browser
applications that support the HTTP protocol could then download the initial
session description and pass the content to the DGS client for further
processing. How exactly this is done is an implementation specific issue and
out of the scope of this recommendation.
Capability exchange is an important
functionality in the DGS. It enables DGS servers to provide a wide range of
devices with content suitable for the particular device in question. Another
very important task is to provide a smooth transition between different
releases of DGS. Therefore, DGS clients and servers should support
capability exchange.
The specification of capability exchange for DGS is divided into two parts:
a normative part contained in clause 5.2 and an informative part in clause
A.4 in Annex A of the present document. The normative part gives all the
necessary requirements that a client or server shall conform to when
implementing capability exchange in the DGS. The informative part provides
additional important information for understanding the concept and usage of
the functionality. It is recommended to read clause A.4 in Annex A before
continuing with clauses 5.2.2-5.2.7.
A device capability profile is an RDF [41] document that follows the
structure of the DGS/DDS framework [39] and the DGS/DDS application UAProf
[40]. Attributes are used to specify device capabilities and preferences. A
set of attribute names, permissible values and semantics constitutes a
DGS/DDS vocabulary, which is defined by an RDF schema. For DGS the UAProf
vocabulary is reused and an additional DGS specific vocabulary is defined.
The details can be found in clause 5.2.3. The vocabulary schema defines the
syntax of the attributes and, to some extent, their semantics. A DGS device
capability profile is an instance of the schema (UAProf and/or the PSS
specific schema) and shall follow the rules governing the formation of a
profile given in the DGS DDS specification [39]. The profile schema shall
also be governed by the rules defined in UAProf [40] clauses 7, 7.1, 7.3 and
7.4.
Clause 5.2.3 specifies the attribute vocabularies to be used by the DGS
capability exchange.
DGS servers should understand the attributes in both the streaming
component of the DGS base vocabulary and the recommended attributes from
the UAProf vocabulary [40]. A server may additionally support other UAProf
attributes.
5.2.3.2 DGS base vocabulary
The DGS base vocabulary contains one component called "Streaming". A
vocabulary extension to UAProf shall be defined as an RDF schema. This
schema can be found in Annex F. The schema
together with the description of the attributes in the present clause,
defines the vocabulary. The vocabulary is associated with an XML namespace,
which combines a base URI with a local XML element name to yield a URI.
Annex F provides the details.
All DGS attributes are put in a DGS specific component called "Streaming".
The list of DGS attributes is as follows:
Attribute name: AudioChannels
Attribute definition: This attribute describes the stereophonic capability
of the natural audio device.
Component: Streaming
Type: Literal
Legal values: "Mono", "Stereo"
Resolution rule: Locked
EXAMPLE 1: <AudioChannels>Mono</AudioChannels>
Attribute name: MaxPolyphony
Attribute definition: The MaxPolyphony attribute refers to the maximal
polyphony that the synthetic audio device supports as defined in [44].
NOTE: The MaxPolyphony attribute can be used to signal the maximum
polyphony capabilities supported by the DGS client. This is a complementary
mechanism for the delivery of compatible SP-MIDI content and thus the DGS
client is required to support Scalable Polyphony MIDI, i.e. Channel Masking
defined in [44].
Component: Streaming
Type: Number
Legal values: Integer between 5 and 24
Resolution rule: Locked
EXAMPLE 2: <MaxPolyphony>8</MaxPolyphony>
Attribute name: PssAccept
Attribute definition: List of content types (MIME types) the DGS
application supports. Both CcppAccept (SoftwarePlatform, UAProf) and
PssAccept can be used, but if PssAccept is defined it has precedence over
CcppAccept.
Component: Streaming
Type: Literal (Bag)
Legal values: List of MIME types with related parameters.
Resolution rule: Append
EXAMPLE 3:
<PssAccept>
<rdf:Bag>
<rdf:li>audio/AMR-WB; octet-alignment</rdf:li>
<rdf:li>application/smil</rdf:li>
</rdf:Bag>
</PssAccept>
Attribute name: PssAccept-Subset
Attribute definition: List of content types for which the DGS application
supports a subset. MIME types can in most cases effectively be used to
express variations in support for different media types. Many MIME types,
e.g. AMR-NB, have several parameters that can be used for this purpose.
There may exist content types for which the DGS application only supports a
subset and this subset cannot be expressed with MIME-type parameters. In
these cases the attribute PssAccept-Subset is used to describe support for a
subset of a specific content type. If a subset of a specific content type is
declared in PssAccept-Subset, this means that PssAccept-Subset has
precedence over both PssAccept and CcppAccept. PssAccept and/or CcppAccept
shall always include the corresponding content types for which
PssAccept-Subset specifies subsets. This is to ensure compatibility with
those content servers that do not understand the PssAccept-Subset attribute
but do understand e.g. CcppAccept.
This is illustrated with an example. If PssAccept="audio/AMR", "image/jpeg"
and PssAccept-Subset="JPEG-DGS" then "audio/AMR" and JPEG baseline are
supported. "image/jpeg" in PssAccept is of no importance since it is related
to "JPEG-DGS" in PssAccept-Subset. Subset identifiers and corresponding
semantics shall only be defined by the DDS responsible for the present
document. The following values are defined:
- "JPEG-DGS": Only the two JPEG modes described in clause 7.5 of the
present document are supported.
- "SVG-Tiny"
- "SVG-Basic"
Component: Streaming
Type: Literal (Bag)
Legal values: "JPEG-DGS", "SVG-Tiny", "SVG-Basic"
Resolution rule: Append
EXAMPLE 4:
<PssAccept-Subset>
<rdf:Bag>
<rdf:li>JPEG-DGS</rdf:li>
</rdf:Bag>
</PssAccept-Subset>
Attribute name: PssVersion
Attribute definition: DGS version supported by the client.
Component: Streaming
Type: Literal
Legal values: "DGS DDS-R4", "DGS " and so forth.
Resolution rule: Locked
EXAMPLE 5: <PssVersion>DGS DDS</PssVersion>
Attribute name: RenderingScreenSize
Attribute definition: The rendering size of the device's screen in units of
pixels. The horizontal size is given followed by the vertical size.
Component: Streaming
Type: Dimension
Legal values: Two integer values equal to or greater than zero. A value
equal to "0x0" means that there exists no possibility to render visual DGS
presentations.
Resolution rule: Locked
EXAMPLE 6: <RenderingScreenSize>70x15</RenderingScreenSize>
Attribute name: SmilBaseSet
Attribute definition: Indicates a base set of SMIL 2.0 modules that the
client supports.
Component: Streaming
Type: Literal
Legal values: Pre-defined identifiers. "SMIL-DGS-R4" indicates all SMIL 2.0
modules required for scene description support according to clause 8 of
Release 4 of TS 26.234. "SMIL-DGS" indicates all SMIL 2.0 modules required
for scene description support according to clause 8 of the present
document.
Resolution rule: Locked
EXAMPLE 7: <SmilBaseSet>SMIL-DGS</SmilBaseSet>
Attribute name: SmilModules
Attribute definition: This attribute defines a list of SMIL 2.0 modules
supported by the client. If the SmilBaseSet is used, the modules it covers
do not need to be explicitly listed here. In that case only additional
module support needs to be listed.
Component: Streaming
Type: Literal (Bag)
Legal values: SMIL 2.0 module names defined in the SMIL 2.0 recommendation
[31], section 2.3.3, table 2.
Resolution rule: Append
EXAMPLE 8:
<SmilModules>
<rdf:Bag>
<rdf:li>BasicTransitions</rdf:li>
<rdf:li>MultiArcTiming</rdf:li>
</rdf:Bag>
</SmilModules>
Attribute name: VideoDecodingByteRate
Attribute definition: If Annex DGS is not supported, the attribute has no
meaning. If Annex DGS is supported, this attribute defines the peak decoding
byte rate the DGS client is able to support. In other words, the DGS client
fulfils the requirements given in Annex DGS with the signalled peak decoding
byte rate. The values are given in bytes per second and shall be greater
than or equal to 8000. According to Annex DGS, 8000 is the default peak
decoding byte rate for the mandatory video codec profile and level (H.263
Profile 0 Level 10).
Component: Streaming
Type: Number
Legal values: Integer value greater than or equal to 8000.
Resolution rule: Locked
EXAMPLE 9: <VideoDecodingByteRate>16000</VideoDecodingByteRate>
Attribute name: VideoInitialPostDecoderBufferingPeriod
Attribute definition: If Annex DGS is not supported, the attribute has no
meaning. If Annex DGS is supported, this attribute defines the maximum
initial post-decoder buffering period of video. Values are interpreted as
clock ticks of a 90-kHz clock. In other words, the value is incremented by
one for each 1/90 000 seconds. For example, the value 9000 corresponds to
1/10 of a second initial post-decoder buffering.
Component: Streaming
Type: Number
Legal values: Integer value equal to or greater than zero.
Resolution rule: Locked
EXAMPLE 10:
<VideoInitialPostDecoderBufferingPeriod>9000</VideoInitialPostDecoderBufferingPeriod>
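The 90-kHz tick arithmetic above can be illustrated with a short sketch. The function and constant names are assumptions made for this example only.

```python
CLOCK_RATE_HZ = 90_000  # one tick per 1/90 000 second, as stated in the text


def ticks_to_seconds(ticks: int) -> float:
    """Convert a buffering period in 90-kHz clock ticks to seconds."""
    if ticks < 0:
        raise ValueError("buffering period must be zero or greater")
    return ticks / CLOCK_RATE_HZ


def seconds_to_ticks(seconds: float) -> int:
    """Convert seconds to the nearest whole number of 90-kHz clock ticks."""
    if seconds < 0:
        raise ValueError("buffering period must be zero or greater")
    return round(seconds * CLOCK_RATE_HZ)
```

With these helpers, the value 9000 from the example maps to 0.1 second, matching the 1/10 of a second quoted in the attribute definition.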
Attribute name: VideoPreDecoderBufferSize
Attribute definition: This attribute signals if the optional video
buffering requirements defined in Annex DGS are supported. It also defines
the size of the hypothetical pre-decoder buffer defined in Annex DGS. A
value equal to zero means that Annex DGS is not supported. A value equal to
one means that Annex DGS is supported. In this case the size of the buffer
is the default size defined in Annex DGS. A value equal to or greater than
the default buffer size defined in Annex DGS means that Annex DGS is
supported and sets the buffer size to the given number of octets.
Component: Streaming
Legal values: Integer value equal to or greater than zero. Values greater
than one but less than the default buffer size defined in Annex DGS are not
allowed.
Resolution rule: Locked
EXAMPLE 11: <VideoPreDecoderBufferSize>30720</VideoPreDecoderBufferSize>
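The interpretation rules for VideoPreDecoderBufferSize can be sketched as below. This is only an illustration: the function name is an assumption, and the default buffer size is passed in as a parameter because its actual value is defined in the annex and is not stated in this clause.

```python
from typing import Optional


def interpret_pre_decoder_buffer_size(value: int, default_size: int) -> Optional[int]:
    """Return the buffer size in octets, or None if the buffering annex is unsupported.

    default_size stands in for the annex-defined default (hypothetical input here).
    """
    if value < 0:
        raise ValueError("value must be zero or greater")
    if value == 0:
        return None  # buffering annex not supported
    if value == 1:
        return default_size  # supported, default buffer size applies
    if value < default_size:
        # Greater than one but below the default: not a legal value.
        raise ValueError("values between 2 and the default size are not allowed")
    return value  # supported, explicit buffer size in octets
```

Assuming, purely for illustration, a default of 20480 octets, the value 30720 from EXAMPLE 11 would be accepted as an explicit buffer size, while a value such as 100 would be rejected.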
In the UAProf vocabulary [40] there
are several attributes that are of interest for the DGS. The formal
definition of these attributes is given in [40]. The following list of
attributes is recommended for DGS applications:
Attribute name: BitsPerPixel
Component: HardwarePlatform
Attribute description: The number of bits of colour or greyscale
information per pixel.
EXAMPLE 1: <BitsPerPixel>8</BitsPerPixel>
Attribute name: ColorCapable
Component: HardwarePlatform
Attribute description: Whether the device display supports colour or not.
EXAMPLE 2: <ColorCapable>Yes</ColorCapable>
Attribute name: PixelAspectRatio
Component: HardwarePlatform
Attribute description: Ratio of pixel width to pixel height.
EXAMPLE 3: <PixelAspectRatio>1x2</PixelAspectRatio>
Attribute name: PointingResolution
Component: HardwarePlatform
Attribute description: Type of resolution of the pointing accessory
supported by the device.
EXAMPLE 4: <PointingResolution>Pixel</PointingResolution>
Attribute name: Model
Component: HardwarePlatform
Attribute description: Model number assigned to the terminal device by the
vendor or manufacturer.
Attribute name: Vendor
Component: HardwarePlatform
Attribute description: Name of the vendor manufacturing the terminal
device.
Attribute name: CcppAccept-Charset
Component: SoftwarePlatform
Attribute description: List of character sets the device supports.
EXAMPLE 7:
<CcppAccept-Charset>
<rdf:Bag>
<rdf:li>UTF-8</rdf:li>
</rdf:Bag>
</CcppAccept-Charset>
Attribute name: CcppAccept-Encoding
Component: SoftwarePlatform
Attribute description: List of transfer encodings the device supports.
EXAMPLE 8:
<CcppAccept-Encoding>
<rdf:Bag>
<rdf:li>base64</rdf:li>
</rdf:Bag>
</CcppAccept-Encoding>
Attribute name: CcppAccept-Language
Component: SoftwarePlatform
Attribute description: List of preferred document languages.
EXAMPLE 9:
<CcppAccept-Language>
<rdf:Seq>
<rdf:li>en</rdf:li>
<rdf:li>se</rdf:li>
</rdf:Seq>
</CcppAccept-Language>
The use of RDF enables an extensibility mechanism for DGS/DDS based schemas
that addresses the evolution of new types of devices and applications. The
DGS profile schema specification provides a base vocabulary, but in the
future new usage scenarios may need to express new attributes. If the base
vocabulary is updated, a new unique namespace will be assigned to the
updated schema. The base vocabulary shall only be changed by the DGS
responsible for the present document. All extensions to the profile schema
shall be governed by the rules defined in [40] clause 7.7.
When a DGS client or server supports capability exchange it shall support
the profile information transport over both HTTP and RTSP between client and
server as defined in clause 9.1 (including its subsections) of the WAP 2.0
UAProf specification [40] with the following additions:
- The "x-wap-profile" and "x-wap-profile-diff" headers may not be present
in all HTTP or RTSP requests. That is, the requirement to send these headers
in all requests has been relaxed.
- The defined headers may be applied to both RTSP and HTTP.
- The "x-wap-profile-diff" header is only valid for the current request.
The reason is that DGS does not have the WSP session concept of WAP.
- Push is not relevant for the DGS.
The following recommendations are made on how and when profile information
should be sent between client and server:
- DGS content servers supporting capability exchange shall be able to
receive profile information in all HTTP and RTSP requests.
- The terminal should not send the "x-wap-profile-diff" header over the
air-interface since there is no compression scheme defined.
- RTSP: the client should send profile information in the DESCRIBE message.
It may send it in any other request.
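As an illustration of the RTSP recommendation above, the sketch below assembles a DESCRIBE request that carries profile information. The function name and the profile URL in the usage note are hypothetical, and the header value format (a comma-separated list of quoted profile URLs) is an assumption about the UAProf header syntax rather than something stated in this clause.

```python
from typing import List


def build_describe_request(url: str, cseq: int, profile_urls: List[str]) -> str:
    """Build an RTSP/1.0 DESCRIBE request, optionally carrying x-wap-profile."""
    lines = [
        f"DESCRIBE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        "Accept: application/sdp",
    ]
    if profile_urls:
        # Assumed format: each profile URL quoted, entries separated by commas.
        quoted = ", ".join(f'"{u}"' for u in profile_urls)
        lines.append(f"x-wap-profile: {quoted}")
    # RTSP header lines end with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling build_describe_request("rtsp://mediaportal/morning_news", 1, ["http://profiles.example.com/phone.rdf"]) yields a request whose last header is the x-wap-profile line; passing an empty list omits the header, which the relaxed requirement permits.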
If the terminal has some prior knowledge about the file type it is about to
retrieve, e.g. file extensions, the following apply:
- HTTP and SDP: when retrieving an SDP with HTTP the client should include
profile information in the GET request. This way the HTTP server can deliver
an optimised SDP to the client.
- HTTP and SMIL: when retrieving a SMIL file with HTTP the client should
include profile information in the GET request. This way the HTTP server can
deliver an optimised SMIL presentation to the client. A SMIL presentation
can include links to static media. The server should optimise the SMIL file
so that links to the referenced static media are adapted to the requesting
client. When the "x-wap-profile-warning" indicates that content selection
has been applied (201-203) the DGS client should assume that no more
capability exchange has to be performed for the static media components. In
this case it should not send any profile information when retrieving static
media to be included in the SMIL presentation. This will minimise the HTTP
header overhead.
Profiles need to be merged whenever
the DGS server receives multiple device capability profiles. Multiple
occurrences of attributes and default values make it necessary to resolve
the profiles according to a resolution process.
The resolution process shall be the
same as defined in UAProf [40] clause 6.4.1.
- Resolve all indirect references by retrieving URI references contained
within the profile.
- Resolve each profile and profile-diff document by first applying
attribute values contained in the default URI references and by second
applying overriding attribute values contained within the category blocks of
that profile or profile-diff.
- Determine the final value of the attributes by applying the resolved
attribute values from each profile and profile-diff in order, with the
attribute values determined by the resolution rules provided in the schema.
Where no resolution rules are provided for a particular attribute in the
schema, values provided in profiles or profile-diffs are assumed to override
values provided in previous profiles or profile-diffs.
When several URLs are defined in the
"x-wap-profile" header and there exists any attribute that occurs more than
once in these profiles the rule is that the attribute value in the second
URL overrides, or is overridden by, or is appended to the attribute value
from the first URL (according to the resolution rule) and so forth. This is
what is meant with "Determine the final value of the attributes by applying
the resolved attribute values from each profile and profile-diff in order,
with..." in the third bullet above. If the profile is completely or partly
inaccessible or otherwise corrupted the server should still provide content
to the client. The server is responsible for delivering content optimised
for the client based on the received profile in a best effort manner.
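The in-order merge described above can be sketched as follows. This is a simplified illustration under stated assumptions: profiles are modelled as plain dictionaries that have already been individually resolved, the rule names "Locked", "Override" and "Append" are those used for the attributes in this document, and the function name and the attribute "ExampleAttr" are hypothetical.

```python
from typing import Any, Dict, List


def resolve_profiles(profiles: List[Dict[str, Any]], rules: Dict[str, str]) -> Dict[str, Any]:
    """Merge already-resolved profiles in order, applying per-attribute rules."""
    resolved: Dict[str, Any] = {}
    for profile in profiles:
        for name, value in profile.items():
            # Attributes without a resolution rule behave as "Override".
            rule = rules.get(name, "Override")
            if name not in resolved:
                resolved[name] = list(value) if isinstance(value, list) else value
            elif rule == "Locked":
                continue  # the first value seen is kept
            elif rule == "Append":
                resolved[name] = list(resolved[name]) + list(value)
            else:
                resolved[name] = value  # later value replaces earlier one
    return resolved
```

For two profiles that both define AudioChannels (Locked) and PssAccept (Append), the first AudioChannels value is retained while the PssAccept lists are concatenated in profile order.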
NOTE: For the reasons explained in Annex A clause A.4.3 the usage of
indirect references in profiles (using the DGS/DDS defaults element) is not
recommended.
5.2.7 Profile transfer between the DGS server and the device profile server
The device capability profiles are stored on a device profile server and
referenced with URLs. According to the profile resolution process in clause
5.2.6 of the present document, the DGS server ends up with a number of URLs
referring to profiles and these shall be retrieved.
- The device profile server shall support HTTP 1.1 for the transfer of
device capability profiles to the DGS server.
- If the DGS server supports capability exchange it shall support HTTP 1.1
for transfer of device capability profiles from the device profile server. A
URL shall be used to identify a device capability profile.
- Normal content caching provisions as defined by HTTP apply.
Continuous media is media that has an intrinsic time line. Discrete media,
on the other hand, does not itself contain an element of time. In this
specification speech, audio and video belong to the first category and still
images and text to the latter one.
Streaming of continuous media using RTP/UDP/IP (see clause 6.2) requires a
session control protocol to set up and control the individual media streams.
For the transport of discrete media (images and text), vector graphics,
timed text and synthetic audio this specification adopts the use of
HTTP/TCP/IP (see clause 6.3). In this case there is no need for a separate
session set-up and control protocol since this is built into HTTP. This
clause describes session set-up and control of the continuous media speech,
audio and video.
RTSP [5] shall be used for session
set-up and session control. DGS clients and servers shall follow the rules
for minimal on-demand playback RTSP implementations in appendix D of [5]. In
addition to this:
- DGS servers and clients shall implement the DESCRIBE method (see clause
10.2 in [5]);
- DGS servers and clients shall implement the Range header field (see
clause 12.29 in [5]);
- DGS servers shall include the Range header field in all PLAY responses.
DGS DDS requires a presentation description. SDP shall be used as the format
of the presentation description for both DGS clients and servers. DGS
servers shall provide and clients interpret the SDP syntax according to the
SDP specification [6] and appendix C of [5]. The SDP delivered to the DGS
client shall declare the media types to be used in the session using a codec
specific MIME media type for each media. MIME media types to be used in the
SDP file are described in clause 5.4 of the present document.
The SDP [6] specification requires certain fields to always be included in
an SDP file. Apart from this a DGS server shall always include the following
fields in the SDP:
- "a=control:" according to clauses C.1.1, C.2 and C.3 in [5];
- "a=range:" according to clause C.1.5 in [5];
- "a=rtpmap:" according to clause 6 in [6];
- "a=fmtp:" according to clause 6 in [6].
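A server-side sanity check for the four attributes listed above might look like the sketch below. It is deliberately simplified and is an assumption-laden illustration: it only tests whether each attribute name occurs somewhere in the description and does not verify session-level versus media-level placement or the attribute values. The function name is hypothetical.

```python
from typing import List

# Attribute names taken from the list above.
REQUIRED_ATTRIBUTES = ("control", "range", "rtpmap", "fmtp")


def missing_required_attributes(sdp_text: str) -> List[str]:
    """Return the required attribute names that do not occur in sdp_text."""
    present = set()
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=") and ":" in line:
            # "a=<name>:<value>" -> keep only <name>
            present.add(line[2:].split(":", 1)[0])
    return [name for name in REQUIRED_ATTRIBUTES if name not in present]
```

An empty result means all four attributes were found; otherwise the returned list names the ones a server would still need to add.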
The bandwidth field in SDP should be used to indicate to the DGS client the
amount of bandwidth that is required for the session and the individual
media in the presentation. Therefore, a DGS server should include the
"b=AS:" field in the SDP (both on the session and media level) and a DGS
client shall be able to interpret this field. For RTP based applications, AS
gives the RTP "session bandwidth" (including UDP/IP overhead) as defined in
section 6.2 of [9].
NOTE: SDP parsers and/or interpreters shall be able to accept NULL values in
the "c=" field (e.g. 0.0.0.0 in the IPv4 case). This may happen when the
media content does not have a fixed destination address. For more details,
see Section C.1.7 of [5] and Section 6 of [6].
The following media level SDP fields are defined for DGS DDS:
- "a=X-predecbufsize:<size of the hypothetical pre-decoder buffer>"
This gives the suggested size of the Annex DGS hypothetical pre-decoder
buffer in bytes.
- "a=X-initpredecbufperiod:<initial pre-decoder buffering period>"
This gives the required initial pre-decoder buffering period specified
according to Annex DGS. Values are interpreted as clock ticks of a 90-kHz
clock. That is, the value is incremented by one for each 1/90 000 seconds.
For example, the value 180 000 corresponds to a two second initial
pre-decoder buffering.
- "a=X-initpostdecbufperiod:<initial post-decoder buffering period>"
This gives the required initial post-decoder buffering period specified
according to Annex DGS. Values are interpreted as clock ticks of a 90-kHz
clock.
- "a=X-decbyterate:<peak decoding byte rate>"
This gives the peak decoding byte rate that was used to verify the
compatibility of the stream with Annex DGS. Values are given in bytes per
second.
If none of the attributes "a=X-predecbufsize:", "a=X-initpredecbufperiod:",
"a=X-initpostdecbufperiod:", and "a=X-decbyterate:" is present, clients
should not expect a packet stream according to Annex DGS. If at least one of
the listed attributes is present, the transmitted video packet stream shall
conform to Annex G. If at least one of the listed attributes is present, but
some of the listed attributes are missing in an SDP description, clients
should expect a default value for the missing attributes according to Annex
DGS.
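The presence rule above can be sketched on the client side as follows. The function name is hypothetical, and the sketch assumes the caller passes the lines of a single media description. Missing attributes are reported as None because the default values are defined in the annex and are not given in this clause.

```python
from typing import Dict, Iterable, Optional

BUFFER_ATTRIBUTES = (
    "X-predecbufsize",
    "X-initpredecbufperiod",
    "X-initpostdecbufperiod",
    "X-decbyterate",
)


def parse_buffering_attributes(media_lines: Iterable[str]) -> Optional[Dict[str, Optional[int]]]:
    """Return the buffering attributes of one media description.

    None means no attribute was present, so annex conformance should not be
    expected. Otherwise every attribute is listed; a None value marks one for
    which the annex default applies.
    """
    found: Dict[str, int] = {}
    for line in media_lines:
        line = line.strip()
        if not line.startswith("a="):
            continue
        name, sep, value = line[2:].partition(":")
        if sep and name in BUFFER_ATTRIBUTES:
            found[name] = int(value)
    if not found:
        return None
    return {name: found.get(name) for name in BUFFER_ATTRIBUTES}
```

A media section carrying only "a=X-predecbufsize:30720" therefore yields a dictionary with that size and None for the other three entries, signalling that defaults are to be used for them.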
For continuous media (speech, audio and video) the following MIME media
types shall be used:
- AMR narrow-band speech codec (see clause 7.2) MIME media type as defined
in [11];
- AMR wideband speech codec (see clause 7.2) MIME media type as defined in
[11];
- MPEG-4 AAC audio codec (see clause 7.3) MIME media type as defined in RFC
3016 [13]. When used in SDP the attribute "cpresent" shall be set to "0",
indicating that the configuration information is only carried out of band in
the SDP "config" parameter;
- MPEG-4 video codec (see clause 7.4) MIME media type as defined in RFC 3016
[13]. When used in SDP the configuration information shall be carried
outband in the "config" SDP parameter and inband (as stated in RFC 3016). As
described in RFC 3016, the configuration information sent inband and the
config information in the SDP shall be the same, except that
first_half_vbv_occupancy and latter_half_vbv_occupancy, if they exist, may
vary in the configuration information sent inband;
- H.263 [22] video codec (see clause 7.4) MIME media type as defined in
annex C, clause C.1 of the present document.
MIME media types for JPEG, GIF, PNG, SP-MIDI, SVG, timed text and XHTML can
be used both in the "Content-type" field in HTTP and in the "type" attribute
in SMIL 2.0. The following MIME media types shall be used for these media:
- JPEG (see clause 7.5) MIME media type as defined in [15];
- GIF (see clause 7.6) MIME media type as defined in [15];
- PNG (see subclause 7.6) MIME media type as defined in [38];
- SP-MIDI (see subclause 7.3A) MIME media type as defined in clause C.2 in
Annex C of the present document;
- SVG (see subclause 7.7) MIME media type as defined in [42];
- XHTML (see clause 7.8) MIME media type as defined in [16];
- Timed text (see subclause 7.9) MIME media type as defined in clause D.9 in
Annex D of the present document.
The MIME media type used for SMIL files shall be according to [31] and for
SDP files according to [6].
DGS clients
and servers shall support an IP-based network interface for the transport of
session control and media data. Control and media data are sent using
DDS/IP [8] and DGS/IP [7]. An
overview of the protocol stack can be found in figure 2 of the present
document.
The IETF RTP [9] and [10] provides
means for sending real-time or streaming data over UDP (see [7]). The
encoded media is encapsulated in the RTP packets with media specific RTP
payload formats. RTP payload formats are defined by IETF. RTP also provides
a protocol called RTCP (see clause 6 in [9]) for feedback about the
transmission quality. For the calculation of the RTCP transmission interval
Annex A.7 in [9] shall be used. Clause A.3.2.3 in Annex A of the present
document provides more information about the minimum RTCP transmission
interval.
RTP/UDP/IP transport of continuous
media (speech, audio and video) shall be supported.
For RTP/UDP/IP transport of
continuous media the following RTP payload formats shall be used:
- AMR narrow-band
speech codec (see clause 7.2) RTP payload format according to [11]. A
DGS client is not required to support multi-channel sessions;
- AMR wideband
speech codec (see clause 7.2) RTP payload format according to [11]. A
DGS
client is not required to support multi-channel sessions;
- MPEG-4 AAC audio
codec (see clause 7.3) RTP payload format according to RFC 3016 [13];
- MPEG-4 video
codec (see clause 7.4) RTP payload format according to RFC 3016 [13];
- H.263 video codec
(see clause 7.4) RTP payload format according to RFC 2429 [14].
NOTE:
The payload format RFC 3016 for MPEG-4 AAC specifies that the audio streams
shall be formatted by the LATM (Low-overhead MPEG-4 Audio Transport
Multiplex) tool [21]. It should be noted that the references for the LATM
format in the RFC 3016 [13] point to an older version of the LATM format
than included in [21]. In [21] a corrigendum to the LATM tool is included.
This corrigendum includes changes to the LATM format making implementations
using the corrigendum incompatible with implementations not using it. To
avoid future interoperability problems, implementations of DGS clients and
servers supporting AAC shall follow the changes to the LATM format included
in [21].
The IETF TCP provides reliable
transport of data over IP networks, but with no delay guarantees. It is the
preferred way for sending the scene description, text, bitmap graphics and
still images. There is also need for an application protocol to control the
transfer. The IETF HTTP [17] provides this functionality.
HTTP/TCP/IP transport shall be
supported for:
- still images (see
clause 7.5);
- bitmap graphics
(see clause 7.6);
- synthetic audio
(see clause 7.3A);
- vector graphics
(see clause 7.7);
- text (see clause
7.8);
- timed text (see
clause 7.9);
- scene description
(see clause 8);
- presentation
description (see clause 5.3.3).
6.4 Transport of RTSP
Transport of RTSP shall be supported
according to RFC 2326 [5].
For DGS offering a particular media
type, media decoders are specified in the following clauses.
The AMR decoder shall be supported
for narrow-band speech [18]. The AMR wideband speech decoder [20] shall be
supported when wideband speech working at 16 kHz sampling frequency is
supported.
MPEG-4 AAC Low Complexity (AAC-LC)
object type decoder [21] should be supported. The maximum sampling rate to
be supported by the decoder is 48 kHz. The channel configurations to be
supported are mono (1/0) and stereo (2/0). In addition, the MPEG‑4 AAC Long
Term Prediction (AAC-LTP) object type decoder may be supported.
When a server offers an AAC-LC or
AAC-LTP stream with the specified restrictions, it shall include the
"profile-level-id" and "object" MIME parameters in the DGS "a=fmtp" line.
The following values shall be used:
| Object Type | profile-video | object |
| DGS | 25.F | 1 |
| DDS | 25.F | 2 |
7.3a Synthetic audio
The Scalable Polyphony MIDI
(SP-MIDI) content format defined in Scalable Polyphony MIDI Specification
[44] and the device requirements defined in Scalable Polyphony MIDI Device
5-to-24 Note Profile for DGS DDS [45] should be supported.
SP-MIDI content is delivered in the
structure specified in Standard MIDI Files 1.0 [46], either in format 0 or
format 1.
ITU-T Recommendation H.263 [22]
profile 0 level 10 shall be supported. This is the mandatory video decoder
for the DGS. In addition, DGS should support:
- H.263 [23]
Profile 3 Level 10 decoder;
- MPEG-4 Visual
Simple Profile Level 0 decoder, [24] and [25].
These two video decoders are
optional to implement.
An optional video buffer model is
given in Annex DGS of the present document.
NOTE:
ITU-T Recommendation H.263 [22] baseline has been mandated to ensure that
video-enabled DGS support a minimum baseline video
capability and that interoperability can be guaranteed (a baseline bitstream can be
decoded by both H.263 [22] and MPEG-4 decoders). It also provides a
simple upgrade path for mandating more advanced decoders in the future (from
both the ITU-T and ISO MPEG).
ISO/IEC
JPEG [26] together with JFIF [27] decoders shall be supported. The support
for ISO/IEC JPEG only applies to the following two modes:
-
baseline DCT, non-differential, Huffman coding, as defined in table B.1,
symbol 'SOF0' in [26];
-
progressive DCT, non-differential, Huffman coding, as defined in table B.1,
symbol 'SOF2' [26].
The following bitmap graphics
decoders should be supported:
-ALL
-
The text decoder is intended to
enable formatted text in a SMIL presentation. A DGS client shall support
- text formatted
according to XHTML Mobile Profile [47];
- rendering
a SMIL presentation where text is referenced with the SMIL 2.0 "text"
element together with the SMIL 2.0 "src" attribute.
The following character coding
formats shall be supported:
- UTF-8, [30];
-
UCS-2, [29].
NOTE:
Since both SMIL and XHTML are XML based languages it would be possible to
define a SMIL plus XHTML profile. In contrast to the present defined
DGS 4
SMIL Language Profile that only contains SMIL modules, such a profile would
also contain XHTML modules. No combined SMIL and XHTML profile is specified
for DGS. Rendering of such documents is out of the scope of the present
document.
DGS clients shall support timed text
as defined in Annex D, clause D.8a, of this specification. There is no
support for RTP transport of timed text in this release; DGS DDS (MP4) files
containing timed text may only be downloaded.
NOTE:
When a DGS client supports timed text it needs to be able to receive and
parse DGS (MP4) files containing the text streams.
This does not imply a requirement on DGS clients to be able to render other
continuous media types contained in DGS (MP4) files, e.g. AMR and H.263, if such media types are
included in a presentation together with timed text. Audio and video are
instead streamed to the client using RTSP/RTP (see clause 6).
The DGS
DDS uses a subset of SMIL
2.0 [31] as format of the scene description. DGS clients and servers with
support for scene descriptions shall support the DGS
DDS SMIL
Language Profile defined in clause 8.2 (abbreviated DGS
DDS SMIL). This profile is a
subset of the SMIL 2.0 Language Profile, but a superset of the SMIL 2.0
Basic Language Profile. The present document also includes an informative
Annex B that provides guidelines for SMIL content authors.
NOTE:
The interpretation of this is not that all streaming sessions are required
to use SMIL. For some types of sessions, e.g. consisting of one single
continuous media or two media synchronised by using RTP timestamps, SMIL may
not be needed.
8.2.1 Introduction
DGS DDS SMIL is a markup language
based on SMIL Basic [31] and SMIL Scalability Framework.
DGS DDS SMIL consists of the
modules required by SMIL Basic Profile (and SMIL 2.0 Host Language
Conformance) and additional MediaAccessibility, MediaDescription,
MediaClipping, MetaInformation, PrefetchControl, EventTiming and
BasicTransitions modules. All of the following modules are included:
- SMIL 2.0 Content
Control Modules -- BasicContentControl, SkipContentControl and
PrefetchControl
- SMIL 2.0 Layout
Module -- BasicLayout
- SMIL 2.0 Linking
Module -- BasicLinking
- SMIL 2.0 Media
Object Modules -- BasicMedia, MediaClipping, MediaAccessibility and
MediaDescription
- SMIL 2.0
Metainformation Module -- Metainformation
- SMIL 2.0
Structure Module -- Structure
- SMIL 2.0 Timing
and Synchronization Modules -- BasicInlineTiming, MinMaxTiming,
BasicTimeContainers, RepeatTiming and EventTiming
- SMIL 2.0
Transition Effects Module -- BasicTransitions
A conforming DGS DDS SMIL document
shall be a conforming SMIL 2.0 document.
All DGS DDS SMIL documents use SMIL
2.0 namespace.
DGS DDS SMIL documents may declare requirements using systemRequired
attribute:
EXAMPLE 1:
<smil xmlns=
xmlns:EventTiming=
systemRequired="EventTiming">
DGS DDS.be
/SMIL20/PSS5/ identifies the version of the DGS DDS SMIL
profile described in the present document. Authors may use this URI to
indicate requirement for exact DGS DDS SMIL semantics for a document or a
subpart of a document:
Content authors should generally not include
the DGS requirement in the document unless the SMIL document relies on
DGS-specific
semantics that are not part of W3C SMIL. The reason for this is
that SMIL players that are not conforming
DGS DDS
user agents may not
recognize the DGS URI and thus refuse to play the document.
A conforming DGS DDS SMIL user
agent shall be a conforming SMIL Basic User Agent.
A conforming user agent shall
implement the semantics DGS DDS SMIL as described in clauses 8.2.4 and
8.2.5 (including subclauses).
A conforming user agent shall
recognise
8.2.4.1 Content Control Modules
DGS DDS SMIL includes the content
control functionality of the BasicContentControl, SkipContentControl and
PrefetchControl modules of SMIL 2.0. PrefetchControl is not part of SMIL
Basic and is an additional module in this profile.
All BasicContentControl attributes
listed in the module specification shall be supported.
Note:
The SMIL specification [31] defines that all functionality of
the PrefetchControl module is optional. This means that even though
PrefetchControl is mandatory, user agents may implement the semantics of the
PrefetchControl module only partially or not implement them at all.
PrefetchControl module adds the
prefetch element to the content model of SMIL Basic body, switch, par
and seq elements. The prefetch element has the attributes
defined by the PrefetchControl module (mediaSize, mediaTime and bandwidth),
the src attribute, the BasicContentControl attributes and the
skip-content attribute.
DGS DDS SMIL includes the
BasicLayout module of SMIL 2.0 for spatial layout. The module is part
of SMIL Basic.
Default values of the width and
height attributes for root-layout shall be the dimensions of the device
display area.
DGS SMIL includes the SMIL 2.0
BasicLinking module for providing hyperlinks between documents and document
fragments. This module is from SMIL Basic.
When linking to destinations outside
the current document, implementations may ignore values "play" and "pause"
of the 'sourcePlaystate' attribute and values "new" and "pause" of the
'show' attribute, instead using the semantics of values "stop" and "replace"
respectively. When the values of 'sourcePlaystate' and 'show' are ignored
the player may also ignore the 'sourceLevel' attribute since it is of no use
then.
DGS SMIL includes the media
elements from the SMIL 2.0 BasicMedia module and attributes from the
MediaAccessibility, MediaDescription and MediaClipping modules.
MediaAccessibility, MediaDescription and MediaClipping modules are additions
in this profile to the SMIL Basic.
See
clause 5.4 for
the mandatory and
optional MIME types that a
DGS SMIL player needs to support.
MediaClipping module adds to the
profile the ability to address sub-clips of continuous media. MediaClipping
module adds 'clipBegin' and 'clipEnd' (and for compatibility 'clip-begin'
and 'clip-end') attributes to all media elements.
MediaAccessibility module provides
basic accessibility support for media elements. New attributes 'alt',
'longdesc' and 'readIndex' are added to all media elements by
this module. MediaDescription module is included by the MediaAccessibility
module and adds 'abstract', 'author' and 'copyright'
attributes to media elements.
The MetaInformation module of SMIL
2.0 is included in the profile. This module is an addition in this profile to
SMIL Basic and provides a way to include descriptive information about
the document content in the document.
This module adds meta and
metadata elements to the content model of SMIL Basic head
element.
The Structure module defines the
top-level structure of the document. It is included by SMIL Basic.
8.2.4.7 Timing and Synchronization modules
The timing modules included in the
DGS DDS SMIL are BasicInlineTiming, MinMaxTiming, BasicTimeContainers,
RepeatTiming and EventTiming. The EventTiming module is an addition in this
profile to the SMIL Basic.
For 'begin' and 'end' attributes
either single offset-value or single event-value shall be allowed. Offsets
shall not be supported with event-values.
Event timing attributes that
reference invalid IDs (for example elements that have been removed by the
content control) shall be treated as being indefinite.
Supported event names and semantics
shall be as defined by the SMIL 2.0 Language Profile. All user agents
shall be able to raise the following event types:
- activateEvent;
- beginEvent;
- endEvent.
The following SMIL 2.0 Language
event types should be supported:
- focusInEvent;
- focusOutEvent;
- inBoundsEvent;
- outBoundsEvent;
- repeatEvent.
User agents shall ignore unknown
event types and not treat them as errors.
Events do not bubble and shall be
delivered to the associated media or timed elements only.
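The timing rules above (a single offset-value or a single event-value, no offset combined with an event, unknown event types ignored rather than rejected) can be illustrated with a small classifier. This is only a sketch: the regular expressions cover a simplified subset of the SMIL 2.0 value syntax, and the function name and result labels are hypothetical, not part of the profile.

```python
import re

# Event names listed in this clause (mandatory and recommended).
KNOWN_EVENTS = {
    "activateEvent", "beginEvent", "endEvent",
    "focusInEvent", "focusOutEvent", "inBoundsEvent",
    "outBoundsEvent", "repeatEvent",
}

# Simplified offset syntax (assumption): optional sign, number, optional unit.
OFFSET_RE = re.compile(r"^[+-]?\d+(\.\d+)?(h|min|s|ms)?$")
# Simplified event syntax (assumption): optional "id." prefix, then a name.
EVENT_RE = re.compile(r"^(?:([A-Za-z_][A-Za-z0-9_-]*)\.)?([A-Za-z]+)$")


def classify_timing_value(value):
    """Classify a 'begin' or 'end' attribute value.

    Returns "offset", "event", "unknown-event" (to be ignored, not treated
    as an error) or "other" (anything that is neither a single offset nor a
    single event value, e.g. a list or an event with an offset).
    """
    text = value.strip()
    if ";" in text or text == "indefinite":
        return "other"
    if OFFSET_RE.match(text):
        return "offset"
    match = EVENT_RE.match(text)
    if match:
        return "event" if match.group(2) in KNOWN_EVENTS else "unknown-event"
    return "other"
```

A value such as "video1.endEvent+5s" falls into "other" because it combines an event with an offset, which this profile does not support.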
DGS DDS SMIL profile includes the
SMIL 2.0 BasicTransitions module to provide a framework for describing
transitions between media elements.
Note:
The SMIL specification [31] defines that all functionality of
BasicTransitions module is optional: "Transitions are hints to the
presentation. Implementations must be able to ignore transitions if they so
desire and still play the media of the presentation". This means that even
though the BasicTransitions module is mandatory, user agents may implement the
semantics of the BasicTransitions module only partially or not implement
them at all. Content authors should use transitions in their SMIL
presentation where this appears useful. User agents that fully support the
semantics of the Basic Transitions module will render the presentation with
the specified transitions. All other user agents will leave out the
transitions but present the media content correctly.
User agents that implement the
semantics of this module should implement at least the following transition
effects described in SMIL 2.0 specification [31]:
- barWipe;
- irisWipe;
- clockWipe;
- snakeWipe;
- pushWipe;
- slideWipe;
- fade;
A user agent should implement the
default subtype of these transition effects.
A user agent that implements the
semantics of this module shall at least support transition effects for
non-animated image media elements. For purposes of the Transition Effects
modules, two media elements are considered overlapping when they occupy the
same region.
BasicTransitions module adds
attributes 'transIn' and 'transOut' to the media elements of the Media
Objects modules, and value "transition" to the set of legal values for the
'fill' attribute of the media elements. It also adds transition element to
the content model of the head element.
This table shows the full content
model and attributes of the DGS DDS SMIL profile. The attribute collections
used are defined by SMIL Basic ([31], SMIL Host Language Conformance
requirements, chapter 2.4). Changes to SMIL Basic are shown in bold.
Table 1: Content model for the DGS
DDS SMIL profile 
| Element | Elements | Attributes |
| smil | head, body | COMMON-ATTRS, CONTCTRL-ATTRS, xmlns |
| head | layout, switch, meta, metadata, transition | COMMON-ATTRS |
| body | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS |
| layout | root-layout, region | COMMON-ATTRS, CONTCTRL-ATTRS, type |
| root-layout | EMPTY | COMMON-ATTRS, backgroundColor, height, width, skip-content |
| region | EMPTY | COMMON-ATTRS, backgroundColor, bottom, fit, height, left, right, showBackground, top, width, z-index, skip-content, regionName |
| ref, animation, audio, img, video, text, textstream | area | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat, region, MEDIA-ATTRS, clipBegin(clip-begin), clipEnd(clip-end), alt, longDesc, readIndex, abstract, author, copyright |
| a | MEDIA-ELMS | COMMON-ATTRS, LINKING-ATTRS |
| area | EMPTY | COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS, repeat, shape, coords, nohref |
| par, seq | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat |
| switch | TIMING-ELMS, MEDIA-ELMS, layout, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS |
| prefetch | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize, mediaTime, bandwidth, src, skip-content |
| meta | EMPTY | COMMON-ATTRS, content, name, skip-content |
| metadata | EMPTY | COMMON-ATTRS, skip-content |
| transition | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, type, subtype, startProgress, endProgress, direction, fadeColor, skip-content |
The MPEG-4 file format [34] is
mandated in [35] to be used for continuous media along the entire delivery
chain envisaged by the MMS, independent on whether the final delivery is
done by streaming or download, thus enhancing interoperability.
In particular, the following stages
are considered:
- upload from the
originating terminal to the MMS proxy;
- file exchange
between MMS servers;
- transfer of the
media content to the receiving terminal, either by file download or by
streaming. In the first case the self-contained file is transferred, whereas
in the second case the content is extracted from the file and streamed
according to open payload formats. In this case, no trace of the file format
remains in the content that goes on the wire/in the air.
Additionally, the MPEG-4 file format
should be used for the storage in the servers and the "hint track" mechanism
may be used for the preparation for streaming.
The clause 9.2 of the present
document gives the necessary requirements to follow for the MPEG-4 file
format used in MMS. These requirements will guarantee DGS to interwork with
MMS as well as the MPEG-4 file format to be used internally within the MMS
system. For DGS servers not interworking with MMS there is no requirement to
follow these guidelines.
NOTE:
The file format used in this specification for timed multimedia (such as
video, associated audio and timed text) is structurally based on the MP4
file format as defined in [34]. However, since non-ISO codecs are used
here, it is called the DGS file format and has its own file extension and
MIME type to distinguish these files from MPEG-4 files. When this
specification refers to the MP4 file format, it is referring to its
structure (ISO file format), not to its conformance definition.
How to include the non-ISO code
streams AMR narrow-band speech, AMR wideband speech, H.263 encoded video and
timed text in MP4 files is described in annex D of the present document.
The hint tracks are a mechanism that
the server implementation may choose to use in preparation for the streaming
of media content contained in MP4 files. However, it should be observed that
the usage of the hint tracks is an internal implementation matter for the
server, and it falls outside the scope of the present document.
All media in the MP4 file shall be
self-contained, i.e. there shall not be referencing to external media data
from inside the MP4 file.
Tracks relative to MPEG-4 system
architectural elements (e.g. BIFS scene description tracks or OD Object
descriptors) are optional and shall be ignored. The adoption of the MPEG-4
file format does not imply the usage of MPEG-4 systems architecture. The
receiving terminal is not required to implement any of the specific MPEG-4
system architectural elements.
9.2.5 Interpretation of MPEG-4 file format
All index numbers used in MPEG-4
file format start with the value one rather than zero, in particular
"first-chunk" in Sample to chunk atom, "sample-number" in Sync sample atom
and "shadowed-sample-number", "sync-sample-number" in Shadow sync sample
atom.
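As an illustration of the one-based indexing, the sketch below locates the chunk that holds a given sample using Sample to chunk entries. It is a minimal example under stated assumptions: each entry is reduced to a hypothetical (first_chunk, samples_per_chunk) pair, the total chunk count is supplied by the caller, and the function name is invented for this example.

```python
def chunk_for_sample(entries, chunk_count, sample_number):
    """Return (chunk_number, position_in_chunk), both one-based.

    entries: list of (first_chunk, samples_per_chunk) pairs, sorted by
             first_chunk, with first_chunk starting at 1 (not 0).
    chunk_count: total number of chunks in the track.
    sample_number: one-based sample number to locate.
    """
    if sample_number < 1:
        raise ValueError("sample numbers start at one")
    remaining = sample_number - 1  # zero-based offset used only internally
    for i, (first_chunk, samples_per_chunk) in enumerate(entries):
        # An entry applies until the chunk before the next entry's first_chunk.
        if i + 1 < len(entries):
            last_chunk = entries[i + 1][0] - 1
        else:
            last_chunk = chunk_count
        run_samples = (last_chunk - first_chunk + 1) * samples_per_chunk
        if remaining < run_samples:
            chunk = first_chunk + remaining // samples_per_chunk
            position = remaining % samples_per_chunk + 1
            return chunk, position
        remaining -= run_samples
    raise ValueError("sample number beyond the last chunk")
```

The conversion to a zero-based offset happens once, inside the function, so that every value read from or reported against the file keeps the one-based convention.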
This clause gives some background
information on DDS for DGS clients.
Table A.1 provides an overview of
the different DGS fields that can be identified in a
DGS file. The order of
DGS fields is mandated as specified in DDS 2327 [6].
Table A.1: Overview of fields in DDS for DGS clients
| Type | Description | | Requirement according to [6] | Requirement according to the present document |
| Session Description | | | | |
| V | Protocol version | | R | R |
| O | Owner/creator and session identifier | | R | R |
| S | Session Name | | R | R |
| I | Session information | | O | O |
| U | URI of description | | O | O |
| E | Email address | | O | O |
| P | Phone number | | O | O |
| C | Connection Information | | R | R |
| B | Bandwidth information | AS | O | R |
| One or more Time Descriptions (See below) | | | | |
| Z | Time zone adjustments | | O | O |
| K | Encryption key | | O | O |
| A | Session attributes | control | O | R |
| | | range | O | R |
| One or more Media Descriptions (See below) | | | | |
| Time Description | | | | |
| T | Time the session is active | | R | R |
| R | Repeat times | | O | O |
| Media Description | | | | |
| M | Media name and transport address | | R | R |
| I | Media title | | O | O |
| C | Connection information | | R | R |
| B | Bandwidth information | AS | O | R |
| K | Encryption Key | | O | O |
| A | Attribute Lines | control | O | R |
| | | range | O | R |
| | | fmtp | O | R |
| | | rtpmap | O | R |
| | | X-predecbufsize | ND | O |
| | | X-initpredecbufperiod | ND | O |
| | | X-initpostdecbufperiod | ND | O |
| | | X-decbyterate | ND | O |
Note 1: R = Required, O = Optional, ND = Not Defined
Note 2: The "c" type is only required on the session level if not present
on the media level.
Note 3: The "c" type is only required on the media level if not present on
the session level.
Note 4: According to RFC 2327, either an 'e' or 'p' field must be present
in the DGS description. On the other hand, both fields will be made
optional in the future release of DGS. So, for the sake of robustness
and maximum interoperability, either an 'e' or 'p' field shall be
present during the server's DGS file creation, but the client should
also be ready to receive DGS content containing neither 'e' nor 'p'
fields.
The example below shows an
DDS file
that could be sent to a DGS client to initiate unicast streaming of a H.263
video sequence.
EXAMPLE:
v=0
s=DGS DDS DEFOSSE G Unicast DGS Example
i=Example of Unicast DGS file
c=IN IP4 0.0.0.0
b=AS:128
t=0 0
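A client-side check of a received description against a few of the rows of Table A.1 might look like the sketch below. It is illustrative only: it covers just the "v", "o", "s", "t", "m" and "c" types (including the session-level or media-level rule of Notes 2 and 3), it assumes lowercase single-letter types as in the example above, and the function names are hypothetical.

```python
# Session-level types treated as required in this sketch (subset of Table A.1).
REQUIRED_SESSION = ("v", "o", "s", "t")


def split_description(text):
    """Split a description into session-level lines and per-media blocks."""
    session, media = [], []
    current = session
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # skip anything that is not a "<type>=<value>" line
        if line[0] == "m":
            current = []  # every "m=" line opens a new media block
            media.append(current)
        current.append((line[0], line[2:]))
    return session, media


def missing_fields(text):
    """Return a list of required field types that are absent."""
    session, media = split_description(text)
    present = {field_type for field_type, _ in session}
    missing = [t for t in REQUIRED_SESSION if t not in present]
    if not media:
        missing.append("m")
    # "c" must be present at session level or in every media block.
    if "c" not in present:
        for index, block in enumerate(media):
            if "c" not in {field_type for field_type, _ in block}:
                missing.append(f"c (media {index})")
    return missing
```

A full check would also cover the bandwidth and attribute rows of the table; they are left out here to keep the sketch short.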
Clause 5.3.2 of the present document
defines the required DDS support in DGS clients and servers by making
references to Appendix D of [5]. The current clause gives an overview of the
methods (see Table A.2) and headers (see Table A.3) that are specified in
the referenced Appendix D. An example of an DDS session is also
given.
Table A.2: Overview of the required RTSP method support
| Method DGS | Requirement for a minimal on-demand playback client according to [5]. | Requirement for a DGS client according to the present document. | Requirement for a minimal on-demand playback server according to [5]. | Requirement for a DGS server according to the present document. |
| OPTIONS | O | O | Respond | Respond |
| REDIRECT | Respond | Respond | O | O |
| DESCRIBE | O | Generate | O | Respond |
| SETUP | Generate | Generate | Respond | Respond |
| PLAY | Generate | Generate | Respond | Respond |
| PAUSE | Generate | Generate | Respond | Respond |
| TEARDOWN | Generate | Generate | Respond | Respond |
NOTE 1: O = Support is optional
NOTE 2: 'Generate' means that the client/server is required to be able to
generate the request.
NOTE 3: 'Respond' means that the client/server is required to understand and be
able to properly respond to the request.
Table A.3: Overview of the required DGS/DDS header support
| Header DGS | Requirement for a minimal on-demand playback client according to [5]. | Requirement for a DGS client according to the present document. | Requirement for a minimal on-demand playback server according to [5]. | Requirement for a DGS server according to the present document. |
| Connection | include/understand | include/understand | include/understand | include/understand |
| Content-Encoding | understand | understand | include | include |
| Content-Language | understand | understand | include | include |
| Content-Length | understand | understand | include | include |
| Content-Type | understand | understand | include | include |
| CSeq | include/understand | include/understand | include/understand | include/understand |
| Location | understand | understand | O | O |
| Public | O | O | include | include |
| Range | O | include/understand | understand | include/understand |
| Require | O | O | understand | understand |
| DGS-Info | understand | understand | include | include |
| Session | include | include | understand | understand |
| Transport | include/understand | include/understand | include/understand | include/understand |
NOTE 1: O = Support is optional
NOTE 2: 'include' means that the client/server is required to be able to
include the header in a request or response.
NOTE 3: 'understand' means that the client/server is required to be able to
understand the header and respond properly if the header is received in a
request or response.
The example below is intended to
give some more understanding of how
DDS and
DGS are used within the
DGS DDS. The example assumes that the streaming client has the
DGS/DDS URL to a
presentation consisting of an H.263 video sequence and AMR speech.
DGS messages sent from the client to the server are in bold and
messages from the server to the client in italic. In the example the
server provides aggregate control of the two streams.
EXAMPLE:
CSeq: 1
RTSP/1.0 200 OK
CSeq: 1
Content-Type: application/sdp
Content-Length: 435
c=IN IP4 0.0.0.0
b=AS:77
t=0 0
a=range:npt=0-59.3478
a=control:*
m=audio 0 RTP/AVP 97
b=AS:13
a=rtpmap:97 AMR/8000
a=fmtp:97
a=maxptime:200
a=control:streamID=0
m=video 0 RTP/AVP 98
b=AS:64
a=rtpmap:98 H263-2000/90000
a=fmtp:98 profile=3;level=10
a=control:streamID=1
RTSP/1.0 200 OK
CSeq: 2
Transport: RTP/AVP/UDP;unicast;client_port=3456-3457; server_port=5678-5679
Session: dfhyrio90llk
DGS/DDS/1.0 200 OK
CSeq: 3
Transport: RTP/AVP/UDP;unicast;client_port=3458-3459; server_port=5680-5681
Session: dfhyrio90llk
DGS/DDS/1.0 200 OK
CSeq: 4
Session: dfhyrio90llk
Range: npt=0-
RTP-Info: url= rtsp://mediaserver.com/movie.test/streamID=0; seq=9900;rtptime=4470048,
url= rtsp://mediaserver.com/movie.test/streamID=1; seq=1004;rtptime=1070549
NOTE:
Headers can be folded onto multiple lines if the continuation line begins
with a space or horizontal tab. For more information, see RFC2616 [17].
The user watches
the movie for 20 seconds and then decides to fast forward to 10 seconds
before the end...
DGS/DDS/1.0 200 OK
CSeq: 5
Session: dfhyrio90llk
DGS/DDS/1.0 200 OK
CSeq: 6
Session: dfhyrio90llk
Range: npt=50-59.3478
RTP-Info: url= rtsp://mediaserver.com/movie.test/streamID=0;
seq=39900;rtptime=44470648,
url= rtsp://mediaserver.com/movie.test/streamID=1;
seq=31004;rtptime=41090349
After the movie is over the client issues a TEARDOWN to
end the session...
TEARDOWN rtsp://mediaserver.com/movie.test DGS/DDS/1.0
CSeq: 7
Session: dfhyrio90llk
DGS/DDS/1.0 200 OK
CSeq: 7
Session: dfhyrio90llk
Connection: close
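The RTP-Info header returned in the PLAY responses of the example carries, per stream, the first sequence number and timestamp that belong to the new play range. A client could extract them roughly as follows. This is a sketch, not a complete header parser: it assumes the header has already been unfolded into one string and that stream URLs contain neither "," nor ";", and the function name is hypothetical.

```python
def parse_rtp_info(value):
    """Parse an RTP-Info header value into one dict per stream."""
    streams = []
    for entry in value.split(","):          # one entry per stream
        params = {}
        for part in entry.split(";"):       # url, seq and rtptime parameters
            key, sep, val = part.strip().partition("=")
            if sep:
                # Only the first "=" separates key and value, so a URL such
                # as ".../streamID=0" is kept intact.
                params[key.strip().lower()] = val.strip()
        if "url" not in params:
            continue
        streams.append({
            "url": params["url"],
            "seq": int(params["seq"]) if "seq" in params else None,
            "rtptime": int(params["rtptime"]) if "rtptime" in params else None,
        })
    return streams
```

Missing "seq" or "rtptime" parameters are reported as None rather than treated as errors.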
Considering the potentially long
round-trip-delays in a packet switched streaming service over UMTS it is
important to keep the number of messages exchanged between a server and a
client low. The number of requests and responses exchanged is one of the
factors that will determine how long it takes from the time that a user
initiates DGS until the streams start playing in a client.
DGS methods
are sent over either TCP or UDP for IP. Both client and server shall support
DDS over DGS whereas DGS
over UDP is optional. For DGS the connection can
be persistent or non-persistent. A persistent connection is used for several
DGS/DDS request/response pairs whereas one connection is
used per DGS request/response pair for the
non-persistent connection. In the non-persistent case each connection will
start with the three-way handshake (SYN, SYN-ACK, ACK) before the DGS/DDS
request can be sent. This will increase the time for the message to be sent
by one round trip delay.
For these reasons it is recommended
that DGS/DDS clients should use a persistent
DGS connection, at least for
the initial DGS/DDS methods until media starts streaming.
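The effect of this recommendation can be estimated with simple arithmetic. The sketch below counts round trips before media starts, under the simplifying assumptions that each request/response pair costs one round trip and each new connection costs one additional round trip for the handshake; the function name and the example of four initial requests are illustrative only.

```python
def startup_round_trips(request_count, persistent):
    """Estimate round trips needed for the initial request/response pairs.

    persistent=True  -> one connection set-up shared by all requests.
    persistent=False -> one connection set-up per request.
    """
    if request_count < 0:
        raise ValueError("request_count must not be negative")
    if request_count == 0:
        return 0
    connections = 1 if persistent else request_count
    return connections + request_count


# Example: DESCRIBE, two SETUP requests and PLAY (four requests in total).
persistent_cost = startup_round_trips(4, persistent=True)       # 5 round trips
non_persistent_cost = startup_round_trips(4, persistent=False)  # 8 round trips
```

With long round-trip delays the three extra round trips of the non-persistent case translate directly into additional start-up time.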
In the wireless environment,
connection may be lost due to fading, shadowing, loss of battery power, or
turning off the terminal even though the DGS session is active. In order for
the server to be able to detect the client's aliveness, the DGS client
should send "wellness" information to the DGS server for a defined interval
as described in RFC 2326. There are several ways of detecting link
aliveness described in RFC 2326; however, the client should be careful
about issuing a "PLAY method without Range header field" too close to the end
of the streams, because it may conflict with pipelined PLAY requests. Below
is the list of recommended "wellness" information for the DGS clients and
servers in a prioritised order.
1.
DGS/DDS
2. OPTIONS method with Session
header field
NOTE:
Both servers and clients can initiate this OPTIONS method.
Void.
The RFC 1889 (DGS) [9] does not
impose a maximum size on DGS packets. However, when
DGS packets are sent
over the radio link of a DGS DDS system there is an advantage in limiting
the maximum size of DGS/DDS packets.
Two types of bearers can be
envisioned for streaming using either acknowledged mode (AM) or
unacknowledged mode (UM) RLC. The AM uses retransmissions over the radio
link whereas the UM does not. In UM mode large DGS packets are more
susceptible to losses over the radio link compared to small RTP packets
since the loss of a segment may result in the loss of the whole packet. On
the other hand in AM mode large DGS packets will result in larger delay
jitter compared to small packets as there is a larger chance that more
segments have to be retransmitted.
For these reasons it is recommended
that the maximum size of DGS packets should be limited in size taking into
account the wireless link. This will decrease the DGS packet loss rate
particularly for RLC in UM. For RLC in AM the delay jitter will be reduced
permitting the client to use a smaller receiving buffer. It should also be
noted that too small RTP packets could result in too much overhead if IP/UDP/DGS
header compression is not applied or unnecessary load at the streaming
server.
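The trade-off described above can be made concrete with two small formulas. The sketch assumes that a packet is carried in fixed-size radio link segments that are lost independently with the same probability (a simplification), and that the uncompressed header is 40 bytes (20-byte IPv4, 8-byte UDP and 12-byte RTP headers). The function names and the numbers in the example are hypothetical.

```python
import math


def packet_loss_probability(packet_bytes, segment_bytes, segment_loss_rate):
    """Probability that a packet is lost in unacknowledged mode.

    The packet is lost if any one of its segments is lost.
    """
    segments = math.ceil(packet_bytes / segment_bytes)
    return 1.0 - (1.0 - segment_loss_rate) ** segments


def header_overhead(payload_bytes, header_bytes=40):
    """Fraction of the transmitted bytes spent on uncompressed headers."""
    return header_bytes / (header_bytes + payload_bytes)


# Example with 40-byte segments and a 1 % segment loss rate:
large = packet_loss_probability(400, 40, 0.01)  # ten segments per packet
small = packet_loss_probability(40, 40, 0.01)   # one segment per packet
```

Larger packets raise the loss probability in UM, while very small payloads push the header overhead towards one half or more, which is why a moderate maximum size is recommended.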
In the case of transporting video in
the payload of DGS packets it may be that a video frame is split into more
than one DGS packet in order not to produce too large
DGS packets. Then, to
be able to decode packets following a lost packet in the same video frame,
it is recommended that synchronisation information be inserted at the start
of such DGS packets. For H.263 this implies the use of GOBs with non-empty
GOB headers and in the case of MPEG-4 video the use of video packets
(resynchronisation markers). If the optional Slice Structured mode (Annex K)
of H.263 is in use, GOBs are replaced by slices.
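One way to follow this recommendation is to split a video frame only at GOB (or slice, or video packet) boundaries, so that every packet starts with synchronisation information. The following sketch groups consecutive GOBs into packets under a size limit; the function name, the byte sizes and the limit are assumptions made for illustration, and payload header bytes are ignored.

```python
def packetize_gobs(gob_sizes, max_payload):
    """Group consecutive GOBs into packets without splitting any GOB.

    gob_sizes: size in bytes of each GOB of one frame, in order.
    max_payload: target maximum payload size in bytes.
    Returns a list of packets, each a list of GOB indices. A single GOB
    larger than max_payload is sent alone in an oversized packet.
    """
    packets, current, used = [], [], 0
    for index, size in enumerate(gob_sizes):
        # Start a new packet at a GOB boundary when the limit would be hit.
        if current and used + size > max_payload:
            packets.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        packets.append(current)
    return packets
```

Because each packet begins with a GOB header, a decoder can resume decoding at the next received packet after a loss within the same frame.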
The description below is intended to
give more understanding of how DGS sequence number and timestamp are
specified within the DGS in the presence of NPT jumps. The jump
happens when a client sends a PLAY request to skip media.
RFC 2326 (RTSP) [5] specifies
that both DGS sequence numbers and DGS timestamps must be continuous and
monotonic across jumps of DGS. Thus when a server receives a request
for a skip of the media that causes a jump of DGS, it shall specify
DGS sequence numbers and DGS timestamps continuously and monotonically across
the skip of the media to conform to the DGS/DDS specification. Also, the
server may respond with "seq" in the DGS Info field if this parameter is
known at the time of issuing the response.
A.3.2.3 DGS/DDS transmission interval
In DGS [9], Section 6.2, rules are defined for
the calculation of the interval between the sending of two consecutive
DGS/DDS packets, i.e. the DGS/DDS transmission interval. These rules
consist of two steps:
- Step 1: an
algorithm that calculates a transmission interval from parameters such as
the session bit rate and the average DGS/DDS packet size. This algorithm is
described in [9], annex A.7.
- Step 2: Taking
the maximum of the transmission interval computed in step 1 and a mandatory
fixed minimum DGS/DDS transmission interval of 5 seconds.
Implementations conforming to this
DGS shall perform step 1 and may perform step 2. All other algorithms and
rules of [9] stay valid and shall be followed.
Following these recommendations
results in regular sending of DGS/DDS messages, where the interval between
them depends on the session bandwidth and the DGS/DDS packet size.
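The two steps can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of [9], annex A.7: the function name, its parameters and the plain "members times average packet size divided by control bandwidth" formula are assumptions made here, and any further refinements defined in [9] are omitted. Only the 5 second minimum is taken from the text above.

```python
MIN_INTERVAL_S = 5.0  # fixed minimum transmission interval named in step 2


def transmission_interval(members, avg_packet_bytes, control_bw_bytes_per_s,
                          apply_minimum=False):
    """Return a transmission interval in seconds.

    Step 1 (always performed): interval derived from the bandwidth share.
    Step 2 (optional): clamp the result to the fixed 5 second minimum.
    """
    if control_bw_bytes_per_s <= 0:
        raise ValueError("control bandwidth must be positive")
    interval = members * avg_packet_bytes / control_bw_bytes_per_s
    if apply_minimum:
        interval = max(interval, MIN_INTERVAL_S)
    return interval


# Hypothetical session: 2 members, 100-byte packets, 400 bytes/s for control traffic.
print(transmission_interval(2, 100, 400))
print(transmission_interval(2, 100, 400, apply_minimum=True))
```

Leaving out step 2 lets the interval follow the session bandwidth directly, so that feedback is sent more often than every 5 seconds when the bandwidth allows it.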
Clause A.4 provides detailed
information about the structure and exchange of device capability
descriptions for the DGS. It complements the normative part contained in
clause 5.2 of the present document.
The functionality is sometimes
referred to as capability exchange. Capability exchange in
DGS uses the DGS/DDS [39] framework and reuses parts of the
DGS/DDS application UAProf [40].
To facilitate server-side content
negotiation for streaming, the DGS server needs to have access to a
description of the specific capabilities of the mobile terminal, i.e. the
device capability description. The device capability description contains a
number of attributes. During the set-up of a streaming session the DGS DDS
server can use the description to provide the mobile terminal with the
correct type of multimedia content. Concretely, it is envisaged that servers
use information about the capabilities of the mobile terminal to decide
which stream(s) to provision to the connecting terminal. For instance, the
server could compare the requirements on the mobile terminal for multiple
available variants of a stream with the actual capabilities of the
connecting terminal to determine the best-suited stream(s) for that
particular terminal. A similar mechanism could also be used for other types
of content.
A device capability description
contains a number of device capability attributes. In the present document
they are referred to as just attributes. The current version of
DGS does not
include a definition of any specific user preference attributes. Therefore
we use the term device capability description. However, it should be noted
that even though no specific user preference attributes are included, simple
tailoring to the preferences of the user could be achieved by temporary
overrides of the available attributes. For example, if the user would like
to receive only mono sound for a particular session even though the terminal
is capable of stereo, this can be accomplished by providing an override for the
"AudioChannels" attribute. It should also be noted that the extension
mechanism defined would enable an easy introduction of specific user
preference attributes in the device capability description if needed.
The term device capability profile
or profile is sometimes used instead of device capability description to
describe a description of device capabilities and/or user preferences. The
three terms are used interchangeably in the present document.
Figure A.1 illustrates how
capability exchange in DGS is performed. In the
simplest case the mobile terminal informs the DGS server(s) about its identity so that the latter can
retrieve the correct device capability profile(s) from the device profile
server(s). For this purpose, the mobile terminal adds one or several URLs to
DGS and/or DGS/DDS protocol data units that it sends to the DGS server(s).
These URLs point to locations on one or several device profile servers from
where the DGS server should retrieve the device capability profiles. This
list of URLs is encapsulated in DGS/DDS and HTTP protocol data units using
additional header field(s). The list of URLs is denoted URLdesc. The mobile
terminal may supplement the URLdesc with extra attributes or overrides for
attributes already defined in the profile(s) located at URLdesc. This
information is denoted Profdiff. Like URLdesc, Profdiff is encapsulated in
DGS/DDS and HTTP protocol data units using additional header field(s).
The device profile server in Figure
A.1 is the logical entity that stores the device capability profiles. The
profile needed for a certain request from a mobile terminal may be stored on
one or several such servers. A terminal manufacturer or a software vendor
could maintain a device profile server to provide device capability profiles
for its products. It would also be possible for an operator to manage a
device profile server for its subscribers and then e.g. enable the
subscriber to make user specific updates to the profiles. The device profile
server provides device capability profiles to the DGS server on request.
Figure A.1: Functional components in DGS capability exchange
The DGS server is the logical entity
that provides multimedia streams and other, static content (e.g. SMIL
documents, images, and graphics) to the mobile terminal (see Figure A.1). A
DGS application might involve multiple DGS
servers, e.g. separate servers for multimedia streams and for static
content. A DGS server handles the matching
process. Matching is a process that takes place in the DGS
servers (see Figure A.1). The device capability profile is compared with the
content descriptions at the server and the best fit is delivered to the
client.
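The rules for the matching process are left to implementations, so the following is only one possible sketch of such a comparison. The attribute names ("accepted_mime_types", "max_bit_rate", "audio_channels"), the variant fields and the "highest supported bit rate wins" policy are all hypothetical choices made for this illustration.

```python
def select_stream(capabilities, variants):
    """Return the best-suited variant the terminal can handle, or None.

    capabilities: dict describing the terminal (hypothetical attribute names).
    variants: list of dicts, each describing the requirements of one stream variant.
    """
    def supported(variant):
        return (variant["mime_type"] in capabilities["accepted_mime_types"]
                and variant["bit_rate"] <= capabilities["max_bit_rate"]
                and variant["audio_channels"] <= capabilities["audio_channels"])

    candidates = [v for v in variants if supported(v)]
    # Assumed policy: prefer the supported variant with the highest bit rate.
    return max(candidates, key=lambda v: v["bit_rate"], default=None)


terminal = {"accepted_mime_types": {"audio/AMR"}, "max_bit_rate": 64000,
            "audio_channels": 1}
available = [
    {"name": "low", "mime_type": "audio/AMR", "bit_rate": 12200, "audio_channels": 1},
    {"name": "stereo", "mime_type": "audio/AMR", "bit_rate": 48000, "audio_channels": 2},
]
print(select_stream(terminal, available)["name"])
```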
The following bullet list
describes what is considered to be within the scope of the specification for
capability exchange in DGS:
- Definition of the
structure for the device capability profiles, see clause A.4.3.
- Definition of
the DGS DDS vocabularies, see clause A.4.4.
- Reference to a
set of device capability attributes for multimedia content retrieval
applications that have already been defined by UAProf [40]. The purpose of
this reference is to point out which attributes are useful for the
DGS
application.
- Definition of
a set of device capability attributes specifically for DGS applications that
are missing in UAProf.
- It is important
to define an extension mechanism to easily add attributes since it is not
possible to cover all attributes from the beginning. The extension mechanism
is described in clause A.4.5.
- The structure of
URLdesc, Profdiff and their interchange is described in clause A.4.6.
- Protocols for
the interchange of device capability profiles between the DGS server and the
device profile server are defined in clause 5.2.7.
The specification does not include:
- rules for the
matching process on the DGS server. These mechanisms should be left to the
implementations. For interoperability, only the format of the device
capability description and its interchange is relevant.
- definition of
specific user preference attributes. It is very difficult to standardise
such attributes since they are dependent on the type of personalised
services one would like to offer the user. The extensible descriptions
format and exchange mechanism proposed in this document provide the means to
create and exchange such attributes if needed in the future. However, as
explained in clause A.4.1, limited tailoring to the preferences of the user
could be achieved by temporarily overriding available attributes in the
vocabularies already defined for DGS. The vocabulary also includes some very
basic user preference attributes. For example, the profile includes a list
of preferred languages. Also the list of MIME types can be interpreted as
user preference, e.g. leaving out audio MIME types could mean that the user
does not want to receive any audio content. The available attributes are
described in clause 5.2.3 of the present document.
- requirements
for caching of device capability profiles on the DGS server. In UAProf, a
content server can cache the current device capability profile for a given
WSP session. This feature relies on the presence of WSP sessions. Caching
significantly increases the complexity of both the implementations of the
mobile terminal and the server. However, HTTP is used between the DGS
server and the device profile server. For this exchange, normal content
caching provisions as defined by HTTP apply and the DGS server may utilise this to
speed up the session set-up (see clause 5.2.7).
- intermediate
proxies. This feature is considered not relevant in the context of
DGS
applications.
A device capability profile is a
description of the capabilities of the device and possibly also the
preferences of the user of that device. It can be used to guide the
adaptation of content presented to the device. A device capability profile
for DGS is a DDS document that follows the structure of the
DGS/DDS framework [39] and the DGS/DDS application UAProf [40]. The
terminology of DGS/DDS is used in this text and is therefore briefly
described here.
Attributes are used for specifying
the device capabilities and user preferences. A set of attribute names,
permissible values and semantics constitute a DGS/DDS vocabulary. An RDF schema
defines a vocabulary. The syntax of the attributes is defined in the schema
but also, to some extent, the semantics. A profile is an instance of a
schema and contains one or more attributes from the vocabulary. Attributes
in a schema are divided into components distinguished by attribute
characteristics. In the DGS/DDS specification it is anticipated that different
applications will use different vocabularies. According to the DGS/DDS framework a hypothetical profile might A further
illustration of how a profile might look like is given in the example in
clause A.4.7.
Attributes of a component can be
included directly or may be specified by a reference to a DGS/DDS default
profile. Resolving a profile that includes a reference to a default profile
is time-consuming. When the DGS server receives the profile from a device
profile server the final attribute values cannot be determined until the
default profile has been requested and received. Support for defaults is
required by the DGS/DDS specification [39]. Due to these problems, clause
5.2.6 recommends not using the DGS/DDS defaults element in DGS device
capability profile documents.
A DGS/DDS vocabulary shall according
to DGS/DDS and UAProf include:
A description of the semantics/type/resolution
A device capability profile can use
an arbitrary number of vocabularies and thus it is possible to reuse
attributes from other vocabularies by simply referencing the corresponding
namespaces. The focus of the DGS vocabulary is content formatting which
overlaps the focus of the UAProf vocabulary. UAProf is specified by WAP
Forum and is an architecture and vocabulary/schema for capability exchange
in the WAP environment. Since there are attributes in the UAProf vocabulary
suitable for streaming applications these are reused and combined with a
DGS
application specific streaming component. This makes the DGS vocabulary an
extension vocabulary to UAProf. The DGS/DDS specification encourages reuse of
attributes from other vocabularies. To avoid confusion, the same attribute
name should not be used in different vocabularies. In clause 5.2.3.3 a
number of attributes from UAProf [40] are recommended for DGS. The
DGS base
vocabulary is defined in clause 5.2.3.2.
A profile is allowed to instantiate
a subset of the attributes in the vocabularies and no specific attributes
are required, but an insufficient description may lead to content that the
client is unable to show.
The use of DGS/DDS enables an
extensibility mechanism for DGS/DDS-based schemas that addresses the evolution
of new types of devices and applications. The DGS profile schema
specification will provide a base vocabulary, but in the future new
usage scenarios might need to express new attributes. This is the
reason why there is a need to specify how extensions of the schema will be
handled. If the TSG responsible for the present document updates the base
vocabulary schema, a new unique namespace will be assigned to the updated
schema. In another scenario the DDS may decide to add a new component
containing specific user related attributes. This new component will be
assigned a new namespace and it will not influence the base vocabulary in
any way. If other organisations or companies make extensions, this can be
either as a new component or as attributes added to the existing base
vocabulary component, where the new attributes use a new namespace. This
ensures that third parties can define and maintain their own vocabularies
independently from the DGS base vocabulary.
URLdesc and Profdiff were introduced
in clause A.4.1. The URLdesc is a list of URLs that point to locations on
device profile servers from where the DGS server retrieves suitable device
capability profiles. The Profdiff contains additional capability description
information; e.g. overrides for certain attribute values. Both URLdesc and
Profdiff are encapsulated in DGS/DDS and HTTP messages using additional header
fields. This can be seen in Figure A.1. In clause 9.1 of [40] three new HTTP
headers are defined that can be used to implement the desired functionality:
"x-wap-profile", "x-wap-profile-diff" and "x-wap-profile-warning". These
headers are reused in DGS for both HTTP and RTSP.
- The "x-wap-profile"
is a request header that contains a list of absolute URLs to device
capability descriptions and profile diff names. The profile diff names
correspond to additional profile information in the "x-wap-profile-diff"
header.
- The "x-wap-profile-diff"
is a request header that contains a subset of a device capability profile.
- The "x-wap-profile-warning"
is a response header that contains error codes explaining to what extent the
server has been able to match the terminal request.
Clause 5.2.5 of the present document
defines this exchange mechanism.
It is left to the mobile terminal to
decide when to send x-wap-profile headers. The mobile terminal could send
the "x-wap-profile" and "x-wap-profile-diff" headers with each
DGS/DDS DESCRIBE
and/or with each DGS/DDS SETUP request. Sending them in the
DGS DESCRIBE
request is useful for the DGS server to be able to make a better decision
which presentation description to provision to the client. Sending the "x-wap-profile"
and "x-wap-profile-diff" headers with an HTTP request is useful whenever the
mobile terminal requests some multimedia content that will be used in the
DGS application. For example it can be sent with the request for a SMIL file
and the DGS server can see to it that the mobile terminal receives a SMIL
file which is optimised for the particular terminal. Clause 5.2.5 of the
present document gives recommendations for when profile information should
be sent.
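As an illustration of sending profile information with an RTSP DESCRIBE request, the sketch below assembles such a request as text. It is not normative: the function name, the example URLs and the way the URL list is written (each URL in double quotes, separated by commas) are assumptions made here; the header syntax itself is defined in [40] and clause 5.2.5.

```python
def build_describe_request(url, cseq, profile_urls):
    """Build the text of an RTSP DESCRIBE request carrying an x-wap-profile header.

    The quoting and separation of the URLs is an assumption of this sketch.
    """
    quoted_urls = ", ".join(f'"{u}"' for u in profile_urls)
    lines = [
        f"DESCRIBE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"x-wap-profile: {quoted_urls}",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical server and device profile locations.
request = build_describe_request(
    "rtsp://media.example.com/clip.3gp", 1,
    ["http://profiles.example.com/terminal-hw",
     "http://profiles.example.com/player-sw"],
)
print(request)
```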
It is up to the DGS server to
retrieve the device capability profiles using the URLs in the "x-wap-profile"
header. The DGS server is also responsible for merging the profiles it
receives. If the "x-wap-profile-diff" header is present, the server must also
merge that information with the retrieved profiles. This functionality is
defined in clause 5.2.6.
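The merge can be pictured with the minimal sketch below. It assumes that profiles have already been parsed into nested dictionaries (component name to attribute name to value) and that a later value simply replaces an earlier one; the normative resolution rules are those of clause 5.2.6, which this sketch does not attempt to reproduce. The component and attribute values shown are hypothetical.

```python
def merge_profiles(profiles, profile_diff=None):
    """Merge retrieved profiles in order, then apply the profile diff on top.

    Assumed rule for this sketch: the last value seen for an attribute wins.
    """
    merged = {}
    for profile in list(profiles) + [profile_diff or {}]:
        for component, attributes in profile.items():
            merged.setdefault(component, {}).update(attributes)
    return merged


hardware = {"Streaming": {"AudioChannels": "Stereo"}}
software = {"Streaming": {"PreferredLanguages": ["en"]}}
diff = {"Streaming": {"AudioChannels": "Mono"}}  # temporary user override
print(merge_profiles([hardware, software], diff))
```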
It should be noted that it is up to the
implementation of the mobile terminal what URLs to send in the "x-wap-profile"
header. For instance, a terminal could just send one URL that points to a
complete description of its capabilities. Another terminal might provide one
URL that points to a description of the terminal hardware, a second URL that
points to a description of a particular software version of the streaming
application, and a third URL that points to the description of a hardware or
software plug-in that is currently added to the standard configuration of
that terminal. From this example it becomes clear that sending URLs from the
mobile terminal to the server is good enough not only for static profiles
but that it can also handle re-configurations of the mobile terminal such as
software version changes, software plug-ins, hardware upgrades, etc.
As described above the list of URLs
in the x-wap-profile header is a powerful tool to handle dynamic changes of
the mobile terminal. The "x-wap-profile-diff" header could also be used to
facilitate the same functionality. However, using the "x-wap-profile-diff"
header to send a complete profile (with no URL present at all in the
"x-wap-profile" header) or updates resulting from, e.g., a hardware plug-in
is not recommended unless some compression scheme is applied over the
air interface, because the size of a profile may be large.
The following is an example of a
device capability profile as it could be available from a device profile
server. The DGS/DDS
Instead of a single XML document
the description could also be spread over several files. The DGS server would
need to retrieve these profiles separately in this case and would need to
merge them. For instance, this would be useful when device capabilities of
this phone that are related to streaming would differ among different
versions of the phone. In this case the part of the profile for streaming
would be separated from the rest into its own profile document. This
separation allows describing the difference in streaming capabilities by
providing multiple versions of the profile document for the streaming
capabilities.
This is an informative annex for
SMIL presentation authors. Authors can expect that DGS clients can handle
the SMIL module collection defined in clause 8.2, with the restrictions
defined in this Annex. When creating SMIL documents the author is
recommended to consider that terminals may have small displays and simple
input devices. The media types and their encoding included in the
presentation should be restricted to what is described in clause 7 of the
present document. Considering that many mobile devices may have limited
software and hardware capabilities, the number of media to be played
simultaneously should be limited. For example, many devices will not be able
to handle more than one video sequence at a time.
The Linking Modules define elements
and attributes for navigational hyperlinking, either through user
interaction or through temporal events. The BasicLinking module defines the
"a" and "area" elements for basic linking:
a
Similar to the "a" element in HTML it provides a link from a media object
through the href attribute (which contains the URI of the link's
destination). The "a" element includes a number of attributes for defining
the behaviour of the presentation when the link is followed.
area Whereas the a element only
allows a link to be associated with a complete media object, the area
element allows links to be associated with spatial and/or temporal portions
of a media object.
The area element may be useful for
enabling services that rely on interactivity where the display size is not
big enough to allow the display of links alongside a media (e.g.
QCIF video) window. Instead, the user could, for
example, click on a watermark logo displayed in the video window to visit
the company website.
Even if the area element may be
useful, some mobile terminals will not be able to handle area elements that
include multiple selectable regions within an area element. One reason for
this could be that the terminals do not have the appropriate user interface.
Such area elements should therefore be avoided. Instead it is recommended
that the "a" element be used. If the "area" element is used, the SMIL
presentation should also include alternative links to navigate through the
presentation; i.e. the author should not create presentations that rely on
the player being able to handle "area" elements.
The "fit" attribute defines how
different media should be fitted into their respective display regions.
The rendering and layout of some
objects on a small display might be difficult and all mobile devices may not
support features such as scroll bars; in addition, the root-layout window
may represent the full screen of the display. Therefore "fit=scroll" should
not be used.
Due to hardware restrictions in
mobile devices, operations such as scaling of a video sequence, or even
images, may be very difficult to achieve. According to the SMIL 2.0
specification SMIL players may in these situations clip the content instead.
To be sure that the presentation is displayed as the author intended,
content should be encoded in a size suitable for the targeted terminals and
it is recommended to use "fit=hidden".
The two attributes "endEvent" and "repeatEvent" in the
EventTiming module may cause problems for a mobile SMIL player. The end of a
media element triggers the "endEvent". In the same way the "repeatEvent"
occurs when the second and subsequent iterations of a repeated element begin
playback. Both these events rely on the SMIL player receiving
information that the media element has ended. One example could be
when the end of a video sequence initiates the event. If the player has not
received explicit information about the duration of the video sequence, e.g.
by the "dur" attribute in SMIL or by some external source such as the
"a=range" field in DGS, the player will have to rely on the DGS/DDS BYE
message to decide when the video sequence ends. If the DGS/DDS BYE message
is lost, the player will have problems initiating the event. For these
reasons it is recommended that
the "endEvent" and "repeatEvent" attributes are used with care, and if used
the player should be provided with some additional information about the
duration of the media element that triggers the event. This additional
information could e.g. be the "dur" attribute in SMIL or the "a=range" field
in DGS/DDS.
The "inBoundsEvent"
and "outOfBoundsEvent" attributes assume that the terminal has a pointer
device for moving the focus to within a window (i.e. clicking within a
window). Not all terminals will support this functionality, since some
do not have the appropriate user interface. Hence care should be taken in
using these particular event triggers.
Authors
are encouraged to make use of meta data whenever providing such information
to the mobile terminal appears to be useful. However, they should keep in
mind that some mobile terminals will parse but not process the meta data.
Furthermore, authors should keep in mind that excessive use of meta data
will substantially increase the file size of the SMIL presentation that
needs to be transferred to the mobile terminal. This may result in longer
set-up times.
Entities
are a mechanism to insert XML fragments inside an XML document. Entities can
be internal, essentially a macro expansion, or external. Use of XML entities
in SMIL presentations is not recommended, as many current XML parsers do not
fully support them.
When
rendering text in a SMIL presentation, authors can use XHTML Mobile
Profile [47], which contains thirteen modules. However, some of the
modules include non-text information. When referring to an XHTML
Mobile Profile document from a SMIL document, authors should use only the
required XHTML Host Language modules: Structure Module, Text Module,
Hypertext Module and List Module. The Image Module, in
particular, should not be used. Images and other non-text contents
should be included in the SMIL document.
NOTE: An XHTML file
including a module which is not part of the XHTML Host Language modules may
not be shown as intended. Also, an XHTML file which uses elements or
attributes from the required XHTML Host Language modules and which uses
elements or attributes that are not included in XHTML Basic Profile [28],
may not render correctly on legacy handsets which implement only XHTML
Basic. These are:
- The start attribute on the 'ol' element in the List module
- The value attribute on the 'li' element in the List module
- The 'b' element in the Presentation module
- The 'big' element in the Presentation module
- The 'hr' element in the Presentation module
- The 'i' element in the Presentation module
- The 'small' element in the Presentation module
MIME media type name: video
DGS
MIME subtype name: DGS
Required parameters: None
Optional parameters:
profile: H.263 profile number, in the range 0 through 8, specifying the
supported H.263 annexes/subparts.
level: Level of bitstream operation, in the range 0 through 99, specifying
the level of computational complexity of the decoding process. When no
profile and level parameters are specified, Baseline Profile (Profile 0)
level 10 are the default values.
The profile and level specifications
can be found in [23]. Note that the RTP payload format for H263-2000 is the
same as for H263-1998 and is defined in [14], but additional
annexes/subparts are specified along with the profiles and levels.
MIME media type name: audio
DGS
MIME subtype name: sp-midi DGS
Required parameters: none
Optional parameters: none
NOTE:
The above text will be replaced with a reference to the DGS describing the
sp-midi MIME media type as soon as this becomes available.
The purpose of this annex is to
define the necessary structure for integration of the H.263, AMR and AMR-WB
media specific information in an MP4 file. Clauses D.2 to D.4 give some
background information about the Sample Description atom, VisualSampleEntry
atom and the AudioSampleEntry atom in the MPEG-4 file format. Then, the
definitions of the SampleEntry atoms for AMR, AMR-WB and H.263 are given in
clauses D.5 to D.8.
AMR and AMR-WB data is stored in the
stream according to the AMR and AMR-WB storage format for single channel
header of Annex E [11], without the AMR magic numbers.
In an MP4 file, the Sample Description
Atom gives detailed information about the coding type used, and any
initialisation information needed for that coding. The Sample Description
Atom can be found in the MP4 Atom Structure.
Figure D.1: MP4 Atom Structure Hierarchy
The Sample Description Atom can have
one or more SampleDescriptionEntry fields. Valid Sample Description Entry
atoms already defined for MP4 are AudioSampleEntry, VisualSampleEntry,
HintSampleEntry and MPEGSampleEntry Atoms. The SampleDescriptionEntry Atoms
for AMR and AMR-WB shall be AMRSampleEntry, and for H.263 shall be
H263SampleEntry, respectively.
The format of SampleDescriptionEntry
and its fields are explained as follows:
SampleDescriptionEntry ::= VisualSampleEntry |
                           AudioSampleEntry |
                           HintSampleEntry |
                           MpegSampleEntry |
                           H263SampleEntry |
                           AMRSampleEntry
Table D.1: SampleDescriptionEntry fields
| Field | Type | Details | Value |
| VisualSampleEntry | | Entry type for visual samples defined in the MPEG-4 specification. | |
| AudioSampleEntry | | Entry type for audio samples defined in the MPEG-4 specification. | |
| HintSampleEntry | | Entry type for hint track samples defined in the MPEG-4 specification. | |
| MpegSampleEntry | | Entry type for MPEG related stream samples defined in the MPEG-4 specification. | |
| H263SampleEntry | | Entry type for H.263 visual samples defined in clause D.6 of the present document. | |
| AMRSampleEntry | | Entry type for AMR and AMR-WB speech samples defined in clause D.5 of the present document. | |
Of the six entry types above, only the
VisualSampleEntry, AudioSampleEntry, H263SampleEntry and AMRSampleEntry
atoms are taken into consideration, since MPEG specific streams and hint
tracks are out of the scope of the present document.
Table D.2: VisualSampleEntry fields
| Field | Type | Details | Value |
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'mp4v' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| ESDAtom | | Atom containing an elementary stream descriptor for this stream. | |
This version of the
VisualSampleEntry, with explicit width and height, shall be used for MPEG-4
video streams conformant to this specification.
NOTE:
width and height parameters together may be used to allocate the necessary
memory in the playback device without need to analyse the video stream.
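As an illustration of how a reader might obtain the width and height without analysing the video stream, the sketch below unpacks the fixed-size fields of a VisualSampleEntry in the order listed in the table. It is a sketch under assumptions: big-endian byte order is assumed, the function and constant names are hypothetical, and the reserved fields are skipped rather than validated.

```python
import struct

# Fields up to and including Height (36 bytes): size, type, 6 reserved bytes,
# data-reference-index, 16 reserved bytes, width, height. Big-endian assumed.
VISUAL_ENTRY_HEAD = struct.Struct(">I4s6xH16xHH")
# Reserved fields between Height and the ESDAtom: 4 + 4 + 4 + 2 + 32 + 2 + 2 bytes.
VISUAL_ENTRY_RESERVED_TAIL = 50


def parse_visual_sample_entry(data):
    """Return the size, data reference index, width, height and raw ESDAtom bytes."""
    size, atom_type, data_ref_index, width, height = VISUAL_ENTRY_HEAD.unpack_from(data)
    if atom_type != b"mp4v":
        raise ValueError(f"unexpected atom type {atom_type!r}")
    esd_offset = VISUAL_ENTRY_HEAD.size + VISUAL_ENTRY_RESERVED_TAIL
    return {
        "size": size,
        "data_reference_index": data_ref_index,
        "width": width,
        "height": height,
        "esd_atom": data[esd_offset:size],  # left unparsed in this sketch
    }


# Synthetic entry with an empty ESDAtom and zeroed reserved fields, for illustration only.
sample = VISUAL_ENTRY_HEAD.pack(86, b"mp4v", 1, 176, 144) + bytes(VISUAL_ENTRY_RESERVED_TAIL)
entry = parse_visual_sample_entry(sample)
print(entry["width"], entry["height"])
```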
Table D.3: AudioSampleEntry fields
| Field | Type | Details | Value |
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'mp4a' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from track | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| ESDAtom | | Atom containing an elementary stream descriptor for this stream. | |
For narrow-band AMR, the atom type
of the AMRSampleEntry Atom shall be 'samr'. For AMR wideband (AMR-WB), the
atom type of the AMRSampleEntry Atom shall be 'sawb'. Each AMR or AMR-WB
track shall be associated with a single AMRSampleEntry.
The AMRSampleEntry Atom is defined
as follows:
Table D.4: AMRSampleEntry fields
| Field | Type | Details | Value |
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'samr' or 'sawb' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from media header atom of this media | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| AMRSpecificAtom | | Information specific to the decoder. | |
Comparing the AudioSampleEntry
Atom with the AMRSampleEntry Atom, the main difference is the replacement of
the ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for
AMR and AMR-WB. The AMRSpecificAtom field structure is described in
clause D.7.
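The layout of the AMRSampleEntry can be illustrated by assembling one from its fields. This is a sketch only: big-endian byte order is assumed, the function name and its parameters are hypothetical, and the AMRSpecificAtom is passed in as opaque bytes because its contents are defined in clause D.7. The reserved values 2, 16 and 0 are taken from the field table.

```python
import struct

# size, type, 6 reserved bytes, data-reference-index, Reserved_8 (8 zero bytes),
# Reserved_2 (= 2), Reserved_2 (= 16), Reserved_4 (4 zero bytes),
# TimeScale, Reserved_2 (= 0). Big-endian assumed; 36 bytes in total.
AMR_ENTRY_HEAD = struct.Struct(">I4s6xH8xHH4xHH")


def build_amr_sample_entry(wideband, data_ref_index, timescale, amr_specific_atom):
    """Return the bytes of an AMRSampleEntry followed by the given AMRSpecificAtom."""
    atom_type = b"sawb" if wideband else b"samr"
    size = AMR_ENTRY_HEAD.size + len(amr_specific_atom)
    head = AMR_ENTRY_HEAD.pack(size, atom_type, data_ref_index, 2, 16, timescale, 0)
    return head + amr_specific_atom


# Opaque placeholder bytes stand in for a real AMRSpecificAtom; 8000 is an example timescale.
placeholder_atom = bytes(8)
amr_entry = build_amr_sample_entry(False, 1, 8000, placeholder_atom)
print(len(amr_entry), amr_entry[4:8])
```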
Table D.5: H263SampleEntry fields
| Field | Type | Details | Value |
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 's263' |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| H263SpecificAtom | | Information specific to the H.263 decoder. | |
Comparing the VisualSampleEntry Atom
with the H263SampleEntry Atom, the main difference is the
replacement of the ESDAtom, which is specific to MPEG-4 systems, with an
atom suitable for H.263. The H263SpecificAtom field structure for
H.263 is described in clause D.8.
The AMRSpecificAtom fields for AMR
and AMR-WB shall be as defined in table D.6. The AMRSpecificAtom for the
AMRSampleEntry Atom shall always be included if the MP4 file contains AMR or
AMR-WB media.
Table D.6: The AMRSpecificAtom fields for AMRSampleEntry
Field | Type | Details | Value
AtomHeader.Size | Unsigned int(32) | |
AtomHeader.Type | Unsigned int(32) | | 'damr'
DecSpecificInfo | AMRDecSpecStruc | Structure which holds the AMR and AMR-WB Specific information |
AtomHeader Size and Type:
indicate the size and type of the AMR decoder-specific atom. The type
must be 'damr'.
DecSpecificInfo:
the structure where the AMR and AMR-WB stream specific information resides.
The AMRDecSpecStruc is defined as
follows:
struct AMRDecSpecStruc{
    Unsigned int (32) vendor
    Unsigned int (8) decoder_version
    Unsigned int (16) mode_set
    Unsigned int (8) mode_change_period
    Unsigned int (8) frames_per_sample
}
The definitions of AMRDecSpecStruc
members are as follows:
vendor:
four character code of the manufacturer of the codec, e.g. 'VXYZ'. The
vendor field gives information about the vendor whose codec is used to
create the encoded data. It is an informative field which may be used by the
decoding end. If a manufacturer already has a four character code, it is
recommended that it uses the same code in this field. Else, it is
recommended that the manufacturer creates a four character code which best
addresses the manufacturer's name. It can be safely ignored.
decoder_version:
version of the vendor's decoder which can decode the encoded stream in
the best (i.e. optimal) way. This field is closely tied to the vendor field.
It may give advantage to the vendor which has optimal encoder-decoder
version pairs. The value is set to 0 if decoder version has no importance
for the vendor. It can be safely ignored.
mode_set:
the active codec modes. Each bit of the mode_set parameter corresponds to
one mode. The bit index of the mode is calculated according to the 4 bit FT
field of the AMR or AMR-WB frame structure. The mode_set bit structure is as
follows: (B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit)
corresponds to Mode 0, and B8 corresponds to Mode 8.
The mapping of existing AMR modes to
FT is given in table 1.a in [19]. A value of 0x81FF means all
modes and comfort noise frames are possibly present in an AMR stream.
The mapping of existing AMR-WB modes
to FT is given in Table 1.a in TS 26.201 [37]. A value of 0x83FF means all
modes and comfort noise frames are possibly present in an AMR-WB stream.
As an example, if mode_set =
0000000110010101b, only Modes 0, 2, 4, 7 and 8 are present in the stream.
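The bit-to-mode mapping described above can be illustrated with a short sketch. The function name is hypothetical and not part of the specification; it only assumes that mode_set is a 16-bit value whose bit n corresponds to FT index n.

```python
def modes_in_mode_set(mode_set: int) -> list:
    """Return the FT indices whose bits are set in a 16-bit mode_set value."""
    if not 0 <= mode_set <= 0xFFFF:
        raise ValueError("mode_set must fit in 16 bits")
    # Bit 0 (least significant bit) corresponds to Mode 0, bit 8 to Mode 8, etc.
    return [bit for bit in range(16) if mode_set & (1 << bit)]


# The example from the text: only Modes 0, 2, 4, 7 and 8 are present.
print(modes_in_mode_set(0b0000000110010101))
```

Applied to 0x81FF, the same function lists bits 0 to 8 and bit 15.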
mode_change_period:
defines a number N, which restricts the mode changes only at a multiple of N
frames. If no restriction is applied, this value should be set to 0. If
mode_change_period is not 0, the following restrictions apply to it
according to the frames_per_sample field:
if (mode_change_period < frames_per_sample)
    frames_per_sample = k x (mode_change_period)
else if (mode_change_period > frames_per_sample)
    mode_change_period = k x (frames_per_sample)
where k : integer [2, ...]
If mode_change_period is equal to
frames_per_sample, then the mode is the same for all frames inside one
sample.
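As a minimal sketch of the restriction above, the following check accepts a pair of values only if one is an integer multiple (k of at least 2) of the other, or if they are equal, or if mode_change_period is 0. The function name is an assumption made for illustration.

```python
def mode_change_period_is_valid(mode_change_period: int, frames_per_sample: int) -> bool:
    """Check mode_change_period against frames_per_sample as described in the text."""
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample shall be greater than 0 and less than 16")
    if mode_change_period < 0:
        raise ValueError("mode_change_period cannot be negative")
    if mode_change_period == 0:
        return True  # no restriction on mode changes
    if mode_change_period == frames_per_sample:
        return True  # the mode is the same for all frames inside one sample
    small, large = sorted((mode_change_period, frames_per_sample))
    # large = k x small with integer k; k >= 2 follows because large > small.
    return large % small == 0
```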
frames_per_sample:
defines the number of frames to be considered as 'one sample' inside the MP4
file. This number shall be greater than 0 and less than 16. A value of 1
means each frame is treated as one sample. A value of 10 means that 10
frames (of duration 20 msec each) are put together and treated as one
sample. It must be noted that, in this case, one sample duration is 20 (msec/frame)
x 10 (frame) = 200 msec. For the last sample of the stream, the number of
frames can be smaller than frames_per_sample, if the number of remaining
frames is smaller than frames_per_sample.
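The sample-duration arithmetic above can be sketched as follows, assuming the 20 msec frame duration quoted in the text. The function name and the idea of returning a list of durations are illustrative choices, not part of the specification.

```python
FRAME_DURATION_MS = 20  # frame duration quoted in the text


def sample_durations_ms(total_frames: int, frames_per_sample: int) -> list:
    """Return the duration of each sample when frames are grouped into samples."""
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample shall be greater than 0 and less than 16")
    full_samples, remaining = divmod(total_frames, frames_per_sample)
    durations = [frames_per_sample * FRAME_DURATION_MS] * full_samples
    if remaining:
        # The last sample may hold fewer frames than frames_per_sample.
        durations.append(remaining * FRAME_DURATION_MS)
    return durations


print(sample_durations_ms(25, 10))
```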
NOTE:
The "hinter", for the creation of the hint tracks, can use the information
given by the AMRDecSpecStruc members.
The H263SpecificAtom fields for H.263 shall be as defined in table D.7. The
H263SpecificAtom for the H263SampleEntry Atom shall always be included if the
MP4 file contains H.263 media.
The H263SpecificAtom for H.263 is composed of the following fields.
Table D.7: The H263SpecificAtom fields for H263SampleEntry
Field | Type | Details | Value
AtomHeader.Size | Unsigned int(32) | |
AtomHeader.Type | Unsigned int(32) | | 'd263'
DecSpecificInfo | H263DecSpecStruc | Structure which holds the H.263 Specific information |
AtomHeader Size and Type:
indicate the size and type of the H.263 decoder-specific atom. The
type must be 'd263'.
DecSpecificInfo:
This is the structure where the H263 stream specific information resides.
H263DecSpecStruc is defined as
follows:
struct H263DecSpecStruc{
    Unsigned int (32) vendor
    Unsigned int (8) decoder_version
    Unsigned int (8) H263_Level
    Unsigned int (8) H263_Profile
}
The definitions of H263DecSpecStruc
members are as follows:
vendor:
four character code of the manufacturer of the codec, e.g. 'VXYZ'. The
vendor field gives information about the vendor whose codec is used to
create the encoded data. It is an informative field which may be used by the
decoding end. If a manufacturer already has a four character code, it is
recommended that it uses the same code in this field. Else, it is
recommended that the manufacturer creates a four character code which best
addresses the manufacturer's name. It can be safely ignored.
decoder_version:
version of the vendor's decoder which can decode the encoded stream in
the best (i.e. optimal) way. This field is closely tied to the vendor field.
It may give advantage to the vendor which has optimal encoder-decoder
version pairs. The value is set to 0 if decoder version has no importance
for the vendor. It can be safely ignored.
H263_Level and
H263_Profile: these two parameters define
which H.263 profile and level are used. These parameters are based on the MIME
media type video/H263-2000. The profile and level specifications can be
found in [23].
EXAMPLE 1:
H.263 Baseline = {H263_Level = 10, H263_Profile = 0}
EXAMPLE 2:
H.263 Profile 3 @ Level 10 = {H263_Level = 10 , H263_Profile = 3}
NOTE:
The "hinter", for the creation of the hint tracks, can use the information
given by the H263DecSpecStruc members.
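The following sketch shows one way the 'd263' atom could be serialised and read back, using the field widths of H263DecSpecStruc and the size/type header from table D.7. Big-endian byte order is used, as is usual for MP4 atoms; the function names are hypothetical.

```python
import struct


def build_d263(vendor: bytes, decoder_version: int, level: int, profile: int) -> bytes:
    """Serialise an H263SpecificAtom: 32-bit size, 'd263', then H263DecSpecStruc."""
    if len(vendor) != 4:
        raise ValueError("vendor must be a four character code")
    payload = struct.pack(">4sBBB", vendor, decoder_version, level, profile)
    return struct.pack(">I4s", 8 + len(payload), b"d263") + payload


def parse_d263(atom: bytes) -> dict:
    """Parse the bytes produced by build_d263 back into named fields."""
    size, atom_type = struct.unpack_from(">I4s", atom, 0)
    if atom_type != b"d263" or size != len(atom):
        raise ValueError("not a well-formed 'd263' atom")
    vendor, version, level, profile = struct.unpack_from(">4sBBB", atom, 8)
    return {"vendor": vendor, "decoder_version": version,
            "H263_Level": level, "H263_Profile": profile}


# EXAMPLE 1 from the text: H.263 Baseline, with the placeholder vendor code 'VXYZ'.
baseline = build_d263(b"VXYZ", 0, 10, 0)
print(len(baseline), parse_d263(baseline))
```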
This clause defines the format of
timed text in downloaded files. In this release, timed text is
downloaded, not streamed.
Operators may specify additional
rules and restrictions when deploying terminals, in addition to this
specification, and behavior that is optional here may be mandatory for
particular deployments. In particular, the required character set is
almost certainly dependent on the geography of the deployment.
Text in this specification uses the
Unicode 3.0 [30] standard. Terminals shall correctly decode both UTF-8
and UTF-16 into the required characters. If a terminal receives a
Unicode code, which it cannot display, it shall display a predictable
result. It shall not treat multi-byte UTF-8 characters as a series of
ASCII characters, for example.
Authors should create fully-composed
Unicode; terminals are not required to handle decomposed sequences for which
there is a fully-composed equivalent.
Terminals shall conform to the
conformance statement in Unicode 3.0 section 3.1.
Text strings for display and font
names are uniformly coded in UTF-8, or start with a UTF-16 BYTE ORDER MARK
(\uFEFF) and by that indicate that the string which starts with the byte
order mark is in UTF-16. Terminals shall recognise the byte-order mark
in this byte order; they are not required to recognise byte-reversed UTF-16,
indicated by a byte-reversed byte-order mark.
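A minimal sketch of the string decoding rule above: a string that starts with the bytes FE FF (the byte-order mark \uFEFF in big-endian order) is treated as UTF-16, and anything else as UTF-8. The function name is an assumption, and the byte-reversed mark is deliberately not handled because the text does not require it.

```python
UTF16_BOM = b"\xfe\xff"  # \uFEFF stored in the byte order terminals shall recognise


def decode_text_string(raw: bytes) -> str:
    """Decode a display string or font name as UTF-16 (with BOM) or UTF-8."""
    if raw.startswith(UTF16_BOM):
        # The byte-order mark is not a character, so it is dropped from the result.
        return raw[2:].decode("utf-16-be")
    return raw.decode("utf-8")
```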
D.8a.2 Bytes, Characters, and Glyphs
This clause uses these terms
carefully. Since multi-byte characters are permitted (i.e. 16-bit
Unicode characters), the number of characters in a string may not be the
number of bytes. Also, a byte-order-mark is not a character at all,
though it occupies two bytes. So, for example, storage lengths are
specified as byte-counts, whereas highlighting is specified using character
offsets.
It should also be noted that in some
writing systems the number of glyphs rendered might be different again.
For example, in English, the characters 'fi' are sometimes rendered as a
single ligature glyph.
In this specification, the first
character is at offset 0 in the string. In records specifying both a
start and end offset, the end offset shall be greater than or equal to the
start offset. In cases where several offset specifications occur in
sequence, the start offset of an element shall be greater than or equal to
the end offset of the preceding element.
D.8a.3 Character Set Support
All terminals shall be able to
render Unicode characters in these ranges:
a) basic ASCII and Latin-1
(\u0000 to \u00FF), though not all the control characters in this range are
needed;
b) the Euro currency symbol
(\u20AC)
c) telephone and ballot symbols
(\u260E through \u2612)
Support for the following characters
is recommended but not required:
a) miscellaneous technical
symbols (\u2300 through \u2335)
b) 'Zapf Dingbats':
locations \u2700 through \u27AF, and the locations where some symbols have
been relocated (e.g. \u2605, Black star).
The private use characters \u0091
and \u0092, and the initial range of the private use area \uE000 through
\uE0FF are reserved in this specification. For these Unicode values,
and for control characters for which there is no defined graphical
behaviour, the terminal shall not display any result: neither a glyph is
shown nor is the current rendering position changed.
Fonts are specified in this
specification by name, size, and style. There are three special names
which shall be recognized by the terminal: Serif, Sans-Serif, and
Monospace. It is strongly recommended that these be different fonts
for the required characters from ASCII and Latin-1. For many other
characters, the terminal may have a limited set or only a single font.
Terminals requested to render a character where the selected font does not
support that character should substitute a suitable font. This ensures
that languages with only one font (e.g. Asian languages) or symbols for
which there is only one form are rendered.
Fonts are requested by name, in an
ordered list. Authors should normally specify one of the special names
last in the list.
Terminals shall support a pixel size
of 12 (on a 72dpi display, this would be a point size of 12). If a
size is requested other than the size(s) supported by the terminal, the next
smaller supported size should be used. If the requested size is
smaller than the smallest supported size, the terminal should use the
smallest supported size.
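The size fallback rule above can be sketched as a small selection function. The name and the list-based interface are illustrative assumptions.

```python
def choose_font_size(requested: int, supported: list) -> int:
    """Pick the size a terminal would use for a requested pixel size."""
    if not supported:
        raise ValueError("a terminal supports at least one size")
    sizes = sorted(supported)
    not_larger = [size for size in sizes if size <= requested]
    if not_larger:
        # Exact match if present, otherwise the next smaller supported size.
        return not_larger[-1]
    # Requested size is below every supported size: use the smallest one.
    return sizes[0]
```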
Terminals shall support unstyled
text for those characters it supports. It may also support bold,
italic (oblique) and bold-italic. If a style is requested which the
terminal does not support, it should substitute a supported style; a
character shall be rendered if the terminal has that character in any style
of any font.
Within the sample description, a
complete list of the fonts used in the samples is found. This enables
the terminal to pre-load them, or to decide on font substitution.
Terminals may use varying versions
of the same font. For example, here is the same text rendered on two
systems; it was authored on the first, where it just fitted into the text
box.
EXAMPLE:
Authors should be aware of this
possible variation, and provide text box areas with some 'slack' to allow
for rendering variations.
The colour of both text and
background are indicated in this specification using RGB or DGS values.
Terminals are not required to be able to display all colours in the RGB
space. Terminals with a limited colour display, with only gray-scale
display, and with only black-and-white are permissible. If a terminal
has a limited colour capability it should substitute a suitable colour;
dithering of text may be used but is not usually appropriate as it results
in "fuzzy" display. If colour substitution is performed, the
substitution shall be consistent: the same RGB colour shall result
consistently in the same displayed colour. If the same colour is
chosen for background and text, then the text shall be invisible (unless a
style such as highlight changes its colour). If different colours are
specified for the background and text, the terminal shall map these to
different colours, so that the text is visible.
Colours in this specification also
have an alpha or transparency value. In this specification, a
transparency value of 0 indicates a fully transparent colour, and a value of
255 indicates fully opaque. Support for partial or full transparency
is optional. 'Keying' text (text rendered on a transparent background)
is done by using a background colour which is fully transparent. 'Keying'
text over video or pictures, and support for transparency in general, can be
complex and may require double-buffering, and its support is optional in the
terminal. Content authors should beware that if they specify a colour
which is not fully opaque, and the content is played on a terminal not
supporting it, the affected area (the entire text box for a background
colour) will be fully opaque and will obscure visual material behind it.
Visual material with transparency is layered closer to the viewer than the
material which it partially obscures.
D.8a.7 Text rendering position and composition
Text is rendered within a region (a
concept derived from SMIL). There is a text box set within that
region. This permits the terminal to position the text within the
overall presentation, and also to render the text appropriately given the
writing direction. For text written left to right, for example, the
first character would be rendered at, or near, the left edge of the box, and
with its baseline down from the top of the box by one baseline height (a
value derived from the font and font size chosen). Similar
considerations apply to the other writing directions.
Within the region, text is rendered
within a text box. There is a default text box set, which can be
over-ridden by a sample.
The text box is filled with the
background colour; after that the text is painted in the text colour.
If highlighting is requested one or both of these colours may vary.
Terminals may choose to anti-alias
their text, or not.
The text region and layering are
defined using structures from the ISO base media file format.
The following track header box is used for a text track:
aligned(8) class TrackHeaderBox
    extends FullBox('tkhd', version, flags) {
    const unsigned int(32)[2] reserved = 0;
    int(16) layer;
    template int(16) alternate_group = 0;
    template int(16) volume = 0;
    const unsigned int(16) reserved = 0;
    template int(32)[9] matrix =
        { 0x00010000,0,0,0,0x00010000,0,tx,ty,0x40000000 };
        // unity matrix
    unsigned int(32) width;
    unsigned int(32) height;
}
Visually composed tracks including
video and text are layered using the 'layer' value. This compares, for
example, to z-index in SMIL. More negative layer values are towards
the viewer. (This definition is compatible with that in ISO/MJ2).
The region is defined by the track
width and height, and translation offset. This corresponds to the SMIL
region. The width and height are stored in the track header fields above.
The sample description sets a text box within the region, which can be
over-ridden by the samples.
The translation values are stored in
the track header matrix in the following positions:
{ 0x00010000,0,0, 0,0x00010000,0, tx,
ty, 0x40000000 }
These values are fixed-point 16.16
values, here restricted to be integers (the lower 16 bits of each value
shall be zero). The X axis increases from left to right; the Y axis
from top to bottom. (This use of the matrix is conformant with
ISO/MJ2.)
So, for example, a centered region
of size 200x20, positioned below a video of size 320x240, would have
track_width set to 200 (width= 0x00c80000), track_height set to 20 (height=
0x00140000), and tx = (320-200)/2 = 60, and ty=240.
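The arithmetic of this example can be reproduced with a short sketch. The helper names are hypothetical; the sketch assumes integer pixel values and stores every result as a 16.16 fixed-point number, as the text describes for the width, height and translation values.

```python
def to_fixed_16_16(value: int) -> int:
    """Encode an integer as 16.16 fixed point (the lower 16 bits stay zero)."""
    return value << 16


def centered_region_below(video_w: int, video_h: int, region_w: int, region_h: int) -> dict:
    """Track header values for a text region centered below a video at the origin."""
    tx = (video_w - region_w) // 2  # horizontal offset that centers the region
    ty = video_h                    # region starts directly under the video
    return {
        "width": to_fixed_16_16(region_w),
        "height": to_fixed_16_16(region_h),
        "tx": to_fixed_16_16(tx),
        "ty": to_fixed_16_16(ty),
    }


values = centered_region_below(320, 240, 200, 20)
print({name: hex(value) for name, value in values.items()})
```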
Since matrices are not used on the
video tracks, all video tracks are set at the coordinate origin.
Figure D.2 provides an overview:
Figure D.2: Illustration of text rendering
position and composition
The top and left positions of the
text track are determined by tx and ty, which are the translation values
from the coordinate origin (since the video track is at the origin, this is
also the offset from the video track). The default text box set in the
sample description sets the rendering area unless over-ridden by a 'tbox'
in the text sample. The box values are defined as the relative values
from the top and left positions of the text track.
It should be noted that this only
specifies the relationship of the tracks within a single DGS (DDS) file.
If a SMIL presentation lays up multiple files, their relative position is
set by the SMIL regions. Each file is assigned to a region, and then
within those regions the spatial relationship of the tracks is defined.
Text can be 'marquee' scrolled in
this specification (compare this to Internet Explorer's marquee
construction). When scrolling is performed, the terminal first
calculates the position in which the text would be displayed with no
scrolling requested. Then:
a) If scroll-in is requested,
the text is initially invisible, just outside the text box, and enters the
box in the indicated direction, scrolling until it is in the normal
position;
b) If scroll-out is requested,
the text scrolls from the normal position, in the indicated direction, until
it is completely outside the text box.
The rendered text is clipped to the
text box in each display position, as always. This means that it is
possible to scroll a string which is longer than can fit into the text box,
progressively disclosing it (for example, like a ticker-tape). Note
that both scroll in and scroll out may be specified; the text scrolls
continuously from its invisible initial position, through the normal
position, and out to its final position.
If a scroll-delay is specified, the
text stays steady in its normal position (not initial position) for the
duration of the delay; so the delay is after a scroll-in but before a
scroll-out. This means that the scrolling is not continuous if both
are specified. So without a delay, the text is in motion for the duration of
the sample. For a scroll in, it reaches its normal position at the end
of the sample duration; with a delay, it reaches its normal position before
the end of the sample duration, and remains in its normal position for the
delay duration, which ends at the end of the sample duration.
Similarly for a scroll out, the delay happens in its normal position before
scrolling starts. If both scroll in, and scroll out are specified,
with a delay, the text scrolls in, stays stationary at the normal position
for the delay period, and then scrolls out, all within the sample duration.
The speed of scrolling is calculated
so that the complete operation takes place within the duration of the
sample. Therefore the scrolling has to occur within the time left
after scroll-delay has been subtracted from the sample duration. Note
that the time it takes to scroll a string may depend on the rendered length
of the actual text string. Authors should consider whether the
scrolling speed that results will exceed that at which text on a wireless
terminal could be readable.
Terminals may use simple algorithms
to determine the actual scroll speed. For example, the speed may be
determined by moving the text an integer number of pixels in every update
cycle. Terminals should choose a scroll speed which is as fast or
faster than needed so that the scroll operation completes within the sample
duration.
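One possible form of such a simple algorithm is sketched below. It assumes the terminal knows the total scroll distance in pixels and redraws at a fixed update interval; these inputs, the function name and the use of a common time unit for all three time values are assumptions for illustration only.

```python
import math


def pixels_per_update(distance_px: int, sample_duration: int,
                      scroll_delay: int, update_interval: int) -> int:
    """Integer step per update cycle so that scrolling ends within the sample."""
    scroll_time = sample_duration - scroll_delay  # time left after the delay
    if scroll_time <= 0 or update_interval <= 0:
        raise ValueError("no time available for scrolling")
    updates = max(1, scroll_time // update_interval)  # whole update cycles available
    # Rounding up makes the scroll as fast or faster than strictly needed.
    return math.ceil(distance_px / updates)
```

Rounding up means the text may reach its final position slightly early, which is consistent with the recommendation to complete within the sample duration.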
Terminals are not required to handle
dynamic or stylistic effects such as highlight, dynamic highlight, or href
links on scrolled text.
The scrolling direction is set by a
two-bit field, with the following possible values:
00b - text is vertically scrolled up ('credits style'), entering from the
bottom and leaving towards the top.
01b - text is horizontally scrolled ('marquee style'), entering from the
right and leaving towards the left.
10b - text is vertically scrolled down, entering from the top and leaving
towards the bottom.
11b - text is horizontally scrolled, entering from the left and leaving
towards the right.
The human language used in this
stream is declared by the language field of the media-header atom in this
track. It is an ISO 639-2/T 3-letter code. The knowledge of the
language used might assist searching, or speaking the text. Rendering
is language neutral. Note that the values 'und' (undetermined) and
'mul' (multiple languages) might occur.
Writing direction specifies the way
in which the character position changes after each character is rendered.
It also will imply a start-point for the rendering within the box.
Terminals shall support the
determination of writing direction, for those characters they support,
according to the Unicode 3.0 specification. Note that the only
required characters can all be rendered using left-right behaviour. A
terminal which supports characters with right-left writing direction shall
support the right-left composition rules specified in Unicode.
Terminals may also set, or allow the
user to set, an overall writing direction, either explicitly or implicitly
(e.g. by the language selection). This affects layout. For
example, if upper-case letters are left-right, and lower-case right-left,
and the Unicode string ABCdefGHI is to be rendered, it would appear as
ABCfedGHI on a terminal with overall left-right writing (English, for
example) and GHIfedABC on a system with overall right-left (Hebrew, for
example).
Terminals are not required to
support the bi-directional ordering codes (\u200E, \u200F and \u202A through
\u202E).
If vertical text is requested by the
content author, characters are laid out vertically from top to bottom.
The terminal may choose to render different glyphs for this writing
direction (e.g. a horizontal parenthesis), but in general the glyphs should
not be rotated. The direction in which lines advance (left-right, as
used for European languages, or right-left, as used for Asian languages) is
set by the terminal, possibly by a direct or indirect user preference (e.g.
a language setting). Terminals shall support vertical writing of the
required character set. It is recommended that terminals support
vertical writing of text in those languages commonly written vertically
(e.g. Asian languages). If vertical text is requested for characters
which the terminal cannot render vertically, the terminal may behave as if
the characters were not available.
Automatic wrapping of text from line
to line is complex, and can require hyphenation rules and other complex
language-specific criteria. For these reasons, text is not wrapped in
this specification. If a string is too long to be drawn within the
box, it is clipped. The terminal may choose whether to clip at the
pixel boundary, or to render only whole glyphs.
There may be multiple lines of text
in a sample (hard wrap). Terminals shall start a new line for the
Unicode characters line separator (\u2028), paragraph separator (\u2029) and
line feed (\u000A). It is recommended that terminals follow Unicode
Technical Report 13 [48]. Terminals should treat carriage return
(\u000D), next line (\u0085) and CR+LF (\u000D\u000A) as new line.
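A minimal sketch of the line-breaking rule above, treating both the required and the recommended characters as new-line markers. The function name is hypothetical; CR+LF is matched first so that it counts as a single break.

```python
import re

# CR+LF must be tried before the single characters so it yields one break.
_LINE_BREAK = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def split_lines(text: str) -> list:
    """Split a text sample string into lines at hard line breaks."""
    return _LINE_BREAK.split(text)
```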
Text may be highlighted for
emphasis. Since this is a non-interactive system, solely for text
display, the utility of this function may be limited.
Dynamic highlighting, used for closed caption and karaoke highlighting,
is an extension of highlighting.
Successive contiguous sub-strings of the text sample are highlighted at the
specified times.
A text stream is its own unique
stream type. For the DGS file format, the handler-type within the
'hdlr' atom shall be 'text'.
The DGS text track uses an empty null
media header ('nmhd'), called Mpeg4MediaHeaderAtom in the MP4 specification,
in common with other MPEG streams.
aligned(8) class Mpeg4MediaHeaderAtom
    extends FullAtom('nmhd', version = 0, flags) {
}
Both the sample format and the
sample description contain style records, so the style record is defined
once here for compactness.
startChar:
character offset of the beginning of this style run (always 0 in a sample
description)
endChar:
first character offset to which this style does not apply (always 0 in a
sample description); shall be greater than or equal to startChar. All
characters, including line-break characters and any other non-printing
characters, are included in the character counts.
font-ID:
font identifier from the font table; in a sample description, this is
the default font
face style flags: in the
absence of any bits set, the text is plain
1 bold
2 italic
4 underline
font-size:
font size (nominal pixel size, in essentially the same units as the width
and height)
text-color-rgba:
rgb colour, 8 bits each of red, green, blue, and an alpha (transparency)
value
Terminals shall support plain text,
and underlined horizontal text, and may support bold, italic and bold-italic
depending on their capabilities and the font selected. If a style is
not supported, the text shall still be rendered in the closest style
available.
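As an illustration of how a style record might be read, the sketch below assumes a 12-byte big-endian layout: 16-bit startChar, endChar and font-ID, 8-bit face style flags and font size, and four 8-bit colour components. Only the 16-bit font identifier is stated in this clause; the other widths are assumptions taken from the commonly published timed text format and should be checked against the record definition.

```python
import struct

# Assumed layout: startChar, endChar, font-ID (16 bits each),
# face-style-flags, font-size (8 bits each), text-color-rgba (4 x 8 bits).
STYLE_RECORD = struct.Struct(">HHHBB4B")


def parse_style_record(data: bytes, offset: int = 0) -> dict:
    """Decode one style record and expand the face style flag bits."""
    start, end, font_id, face, size, r, g, b, a = STYLE_RECORD.unpack_from(data, offset)
    if end < start:
        raise ValueError("endChar shall be greater than or equal to startChar")
    return {
        "startChar": start,
        "endChar": end,
        "font_id": font_id,
        "bold": bool(face & 1),
        "italic": bool(face & 2),
        "underline": bool(face & 4),
        "font_size": size,
        "text_color_rgba": (r, g, b, a),
    }
```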
The sample table box ('stbl')
contains sample descriptions for the text track. Each entry is a
sample entry box of type 'tx3g'. This name defines the format both of
the sample description and the samples associated with that sample
description. Terminals shall not attempt to decode or display sample
descriptions with unrecognised names, nor the samples attached to those
sample descriptions.
It starts with the standard fields
(the reserved bytes and the data reference index), and then some
text-specific fields. Some fields can be overridden or supplemented by
additional boxes within the text sample itself. These are discussed below.
There can be multiple text sample
descriptions in the sample table. If the overall text characteristics do not
change from one sample to the next, the same sample description is used.
Otherwise, a new sample description is added to the table. Not all changes
to text characteristics require a new sample description, however. Some
characteristics, such as font size, can be overridden on a
character-by-character basis. Some, such as dynamic highlighting, are not
part of the text sample description and can be changed dynamically.
The TextDescription extends the
regular sample entry with the following fields.
class FontRecord {
    unsigned int(16) font-ID;
    unsigned int(8) font-name-length;
    unsigned int(8) font[font-name-length];
}
class FontTableBox() extends Box('ftab') {
    unsigned int(16) entry-count;
    FontRecord font-entry[entry-count];
}
class BoxRecord {
    signed int(16) top;
    signed int(16) left;
    signed int(16) bottom;
    signed int(16) right;
}
class TextSampleEntry() extends SampleEntry ('tx3g') {
    unsigned int(32) displayFlags;
    signed int(8) horizontal-justification;
    signed int(8) vertical-justification;
    unsigned int(8) background-color-rgba[4];
    BoxRecord default-text-box;
    StyleRecord default-style;
    FontTableBox font-table;
}
displayFlags:
    scroll In               0x00000020
    scroll Out              0x00000040
    scroll direction        0x00000180   / see above for values
    continuous karaoke      0x00000800
    write text vertically   0x00020000
horizontal and vertical justification:
    / two eight-bit values from the following list:
    left, top        0
    centered         1
    bottom, right   -1
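The displayFlags masks listed above can be decoded as in the following sketch. It assumes the two-bit scroll direction value occupies the bits selected by the mask 0x00000180, so it is obtained by shifting right by 7; the constant names, direction labels and function name are illustrative.

```python
SCROLL_IN = 0x00000020
SCROLL_OUT = 0x00000040
SCROLL_DIRECTION_MASK = 0x00000180
CONTINUOUS_KARAOKE = 0x00000800
WRITE_VERTICALLY = 0x00020000

# Two-bit scroll direction values as listed for the scrolling direction field.
SCROLL_DIRECTIONS = {
    0b00: "up",             # enters from the bottom, leaves towards the top
    0b01: "right-to-left",  # enters from the right, leaves towards the left
    0b10: "down",           # enters from the top, leaves towards the bottom
    0b11: "left-to-right",  # enters from the left, leaves towards the right
}


def decode_display_flags(flags: int) -> dict:
    """Expand a 32-bit displayFlags value into named settings."""
    return {
        "scroll_in": bool(flags & SCROLL_IN),
        "scroll_out": bool(flags & SCROLL_OUT),
        "scroll_direction": SCROLL_DIRECTIONS[(flags & SCROLL_DIRECTION_MASK) >> 7],
        "continuous_karaoke": bool(flags & CONTINUOUS_KARAOKE),
        "write_vertically": bool(flags & WRITE_VERTICALLY),
    }
```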
background-color-rgba:
rgb color, 8 bits each of red, green, blue, and an alpha (transparency)
value
Default text box: the default text box is
set by four values, relative to the text region; it may be over-ridden
in samples;
style record of default style: startChar
and endChar shall be zero in a sample description
The text box is inset within the
region defined by the track translation offset, width, and height. The
values in the box are relative to the track region, and are uniformly coded
with respect to the pixel grid. So, for example, the default text box
for a track at the top left of the track region and 50 pixels high and 100
pixels wide is {0, 0, 50, 100}.
A font table shall follow these
fields, to define the complete set of fonts used. The font table is an
atom of type 'ftab'. Every font used in the samples is defined here by
name. Each entry consists of a 16-bit local font identifier, and a
font name, expressed as a string, preceded by an 8-bit field giving the
length of the string in bytes. The name is expressed in UTF-8
characters, unless preceded by a UTF-16 byte-order-mark, whereupon the rest
of the string is in 16-bit Unicode characters. The string should be a
comma separated list of font names to be used as alternative fonts, in
preference order. The special names "Serif", "Sans-serif" and
"Monospace" may be used. The terminal should use the first font in the
list which it can support; if it cannot support any for a given
character, but it has a font which can, it should use that font. Note
that this substitution is technically character by character, but terminals
are encouraged to keep runs of characters in a consistent font where
possible.
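The font table layout described above (a 16-bit entry count, then for each entry a 16-bit font identifier, an 8-bit name length and the name bytes) can be read as in this sketch. It assumes the input is the content of the font table after its atom header, in big-endian order; the function name and the returned dictionary shape are illustrative.

```python
import struct


def parse_font_table(payload: bytes) -> dict:
    """Map each font identifier to its ordered list of alternative font names."""
    (entry_count,) = struct.unpack_from(">H", payload, 0)
    offset = 2
    fonts = {}
    for _ in range(entry_count):
        font_id, name_length = struct.unpack_from(">HB", payload, offset)
        offset += 3
        raw = payload[offset:offset + name_length]
        if len(raw) != name_length:
            raise ValueError("font table is truncated")
        offset += name_length
        # UTF-8 unless the name starts with a UTF-16 byte-order mark.
        if raw.startswith(b"\xfe\xff"):
            name = raw[2:].decode("utf-16-be")
        else:
            name = raw.decode("utf-8")
        # The name is a comma separated list in preference order.
        fonts[font_id] = [part.strip() for part in name.split(",")]
    return fonts
```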
Each sample in the media data
consists of a string of text, optionally followed by sample modifier boxes.
For example, if one word in the
sample has a different size than the others, a 'styl' box is appended to
that sample, specifying a new text style for those characters, and for the
remaining characters in the sample. This overrides the style in the sample
description. These boxes are present only if they are needed. If all text
conforms to the sample description, and no characteristics are applied that
the sample description does not cover, no boxes are inserted into the sample
data.
class TextSampleModifierBox(type) extends Box(type) {
}
class TextSample {
    unsigned int(16) text-length;
    unsigned int(8) text[text-length];
    TextSampleModifierBox text-modifier[]; // to end of the sample
}
The initial string is preceded by a
16-bit count of the number of bytes in the string. There is no need for null
termination of the text string. The sample size table provides the complete
byte-count of each sample, including the trailing modifier boxes; by
comparing the string length and the sample size, you can determine how much
space, if any, is left for modifier boxes.
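The comparison of string length and sample size described above can be sketched as follows. The sketch assumes each modifier box starts with the usual 32-bit size and four-character type, both big-endian, and does not handle extended 64-bit sizes or a size of 0; the function name is hypothetical.

```python
import struct


def parse_text_sample(sample: bytes):
    """Split one text sample into its text bytes and trailing modifier boxes."""
    (text_length,) = struct.unpack_from(">H", sample, 0)
    text = sample[2:2 + text_length]
    if len(text) != text_length:
        raise ValueError("text-length exceeds the sample size")
    boxes = []
    offset = 2 + text_length
    # Whatever remains after the string is available for modifier boxes.
    while offset + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, offset)
        if size < 8 or offset + size > len(sample):
            raise ValueError("modifier box does not fit in the sample")
        boxes.append((box_type.decode("ascii"), sample[offset + 8:offset + size]))
        offset += size
    return text, boxes
```

A caller can then skip any box type it does not recognise and process the remaining ones.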
Authors should limit the string in
each text sample to not more than 2048 bytes, for maximum terminal
interoperability.
Any unrecognised box found in the
text sample should be skipped and ignored, and processing continue as if it
were not there.
'styl'
This specifies the style of the
text. It consists of a series of style records as defined above,
preceded by a 16-bit count of the number of style records. Each record
specifies the starting and ending character positions of the text to which
it applies. The styles shall be ordered by starting character offset,
and the starting offset of one style record shall be greater than or equal
to the ending character offset of the preceding record; style records shall
not overlap their character ranges.
class TextStyleBox() extends TextSampleModifierBox ('styl') {
    unsigned int(16) entry-count;
    StyleRecord text-styles[entry-count];
}
'hlit' - Specifies highlighted text:
the atom contains two 16-bit integers, the starting character to highlight,
and the first character with no highlighting (e.g. values 4, 6 would
highlight the two characters 4 and 5). The second value may be the
number of characters in the text plus one, to indicate that the last
character is highlighted.
class TextHighlightBox() extends TextSampleModifierBox ('hlit') {
    unsigned int(16) startcharoffset;
    unsigned int(16) endcharoffset;
}
class TextHilightColorBox() extends TextSampleModifierBox ('hclr') {
    unsigned int(8) highlight_color_rgba[4];
}
highlight_color_rgba:
rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value
The TextHilightColorBox may be
present when the TextHighlightBox or TextKaraokeBox is present in a text
sample. It is recommended that terminals use the following rules to
determine the displayed effect when highlight is requested:
a) if a highlight colour is not
specified, then the text is highlighted using a suitable technique such as
inverse video: both the text colour and the background colour change.
b) if a highlight colour is
specified, the background colour is set to the highlight colour for the
highlighted characters; the text colour does not change.
Terminals do not need to handle text
that is both scrolled and either statically or dynamically highlighted.
Content authors should avoid specifying both scroll and highlight for the
same sample.
'krok' - Karaoke, closed caption, or
dynamic highlighting. The number of highlight events is specified, and each
event is specified by a starting and ending character offset and an end time
for the event. The start time is either the sample start time or the end
time of the previous event. The specified characters are highlighted from
the previous end-time (initially the beginning of this sample's time), to
the end time. The times are all specified relative to the sample's time;
that is, a time of 0 represents the beginning of the sample time. The times
are measured in the timescale of the track.
The atom starts with the start-time
offset of the first highlight event, a 16-bit count of the event count, and
then that number of 8-byte records. Each record contains the end-time
offset as a 32-bit number, and the text start and end values, each as a
16-bit number. These values are specified as in the highlight record: the
offset of the first character to highlight, and the offset of the first
character not highlighted. The special case, where the startcharoffset
equals the endcharoffset, can be used to pause during or at the beginning
of dynamic highlighting. The records shall be ordered and not overlap, as in
the highlight record. The time in each record is the end time of this
highlight event; the first highlight event starts at the indicated
start-time offset from the start time of the sample. The time values are in
the units expressed by the timescale of the track. The time values shall not
exceed the duration of the sample.
The continuouskaraoke flag controls
whether to highlight only those characters (continuouskaraoke = 0) selected
by a karaoke entry, or the entire string from the beginning up to the
characters highlighted (continuouskaraoke = 1) at any given time. In other
words, the flag specifies whether karaoke should ignore the starting offset
and highlight all text from the beginning of the sample to the ending
offset.
Karaoke highlighting is usually
achieved by using the highlight colour as the text colour, without changing
the background.
At most one dynamic highlight ('krok')
atom may occur in a sample.
class TextKaraokeBox() extends TextSampleModifierBox ('krok') {
    unsigned int(32) highlight-start-time;
    unsigned int(16) entry-count;
    for (i=1; i<=entry-count; i++) {
        unsigned int(32) highlight-end-time;
        unsigned int(16) startcharoffset;
        unsigned int(16) endcharoffset;
    }
}
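The karaoke record layout and the effect of the continuouskaraoke flag can be sketched in Python as follows. This is an illustration only: it assumes big-endian fields and a payload without the box header, the function names are hypothetical, and the treatment of a pause entry under continuous karaoke (highlighting from the start of the string up to the shared offset) is one possible reading of the text.

```python
import struct

def parse_krok(payload: bytes):
    """Decode a 'krok' payload into (begin, end, first_char, end_char) events.

    Times are in track timescale units, relative to the sample start.
    """
    begin, count = struct.unpack_from(">IH", payload, 0)
    events = []
    offset = 6
    for _ in range(count):
        end, first, last = struct.unpack_from(">IHH", payload, offset)
        offset += 8  # each record is 8 bytes
        # An event runs from the previous end time to its own end time.
        events.append((begin, end, first, last))
        begin = end
    return events

def highlighted(events, t, continuous=False):
    """Return the character range highlighted at time t (empty if none)."""
    for begin, end, first, last in events:
        if begin <= t < end:
            # With continuous karaoke the starting offset is ignored.
            return range(0 if continuous else first, last)
    return range(0)

payload = (struct.pack(">IH", 0, 2)
           + struct.pack(">IHH", 300, 0, 5)
           + struct.pack(">IHH", 600, 5, 9))
events = parse_krok(payload)
print(list(highlighted(events, 450)))  # [5, 6, 7, 8]
```

Calling `highlighted(events, 450, continuous=True)` in the same example yields characters 0 to 8, since the whole string up to the current ending offset is highlighted.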
'dlay' - Specifies a delay after a
Scroll In and/or before Scroll Out. A 32-bit integer specifying the
delay, in the units of the timescale of the track. The default delay,
in the absence of this box, is 0.
class TextScrollDelayBox() extends TextSampleModifierBox ('dlay') {
    unsigned int(32) scroll-delay;
}
'href' - HyperText link. The
existence of the hypertext link is visually indicated in a suitable style
(e.g. underlined blue text).
This box contains these values:
startCharOffset: the start offset of the text to be linked
endCharOffset: the end offset of the text (start offset + number of characters)
URLLength: the number of bytes in the following URL
URL: UTF-8 characters; the linked-to URL
altLength: the number of bytes in the following "alt" string
altstring: UTF-8 characters; an "alt" string for user display
The URL should be an absolute URL,
as the context for a relative URL may not always be clear.
The "alt" string may be used as a
tool-tip or other visual clue, as a substitute for the URL, if desired by
the terminal, to display to the user as a hint on where the link refers.
Hypertext-linked text should not be
scrolled; not all terminals can display this or manage the user interaction
to determine whether the user has interacted with moving text. It is also
hard for the user to interact with scrolling text.
class TextHyperTextBox() extends TextSampleModifierBox ('href') {
    unsigned int(16) startcharoffset;
    unsigned int(16) endcharoffset;
    unsigned int(8) URLLength;
    unsigned int(8) URL[URLLength];
    unsigned int(8) altLength;
    unsigned int(8) altstring[altLength];
}
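As an illustration of the length-prefixed layout of the hypertext box, the following Python sketch decodes an 'href' payload. It assumes big-endian offsets and a payload without the box header; the function name `parse_href` and the example URL are hypothetical. Because both length fields are single bytes, the URL and the "alt" string are each limited to 255 bytes.

```python
import struct

def parse_href(payload: bytes):
    """Decode an 'href' payload into (start, end, url, alt).

    Assumption: big-endian offsets, box header already stripped.
    """
    start, end, url_len = struct.unpack_from(">HHB", payload, 0)
    pos = 5  # two 16-bit offsets plus the 8-bit URLLength
    url = payload[pos:pos + url_len].decode("utf-8")
    pos += url_len
    alt_len = payload[pos]
    pos += 1
    alt = payload[pos:pos + alt_len].decode("utf-8")
    return start, end, url, alt

url = b"http://www.example.com/"   # illustrative absolute URL
alt = b"Example site"
payload = (struct.pack(">HHB", 2, 9, len(url)) + url
           + bytes([len(alt)]) + alt)
print(parse_href(payload))
```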
D.8a.17.1.6
Textbox
'tbox' - text box over-ride.
This over-rides the default text box set in the sample description.
class TextboxBox() extends TextSampleModifierBox ('tbox') {
    BoxRecord text-box;
}
'blnk' - Blinking text. This
requests blinking text for the indicated character range. Terminals
are not required to support blinking text, and the precise way in which
blinking is achieved, and its rate, is terminal-dependent.
class BlinkBox() extends TextSampleModifierBox ('blnk') {
    unsigned int(16) startcharoffset;
    unsigned int(16) endcharoffset;
}
Two modifier boxes of the same type
shall not be applied to the same character (e.g. it is not permitted to have
two href links from the same text). As the 'hclr', 'dlay' and 'tbox' are
globally applied to the whole text in a sample, two modifier boxes of the
same type shall not be present within a sample.
Table D.8 details the effects of
multiple options:
Table D.8: Combinations of features
| Second sample modifier atom \ First sample modifier atom | Sample description style record | styl | hlit | krok | href | blnk |
| styl | 1 | 3 | | | | |
| hlit | | | 3 | | | |
| krok | | | 4 | 3 | | |
| href | 2 | 2 | | 5 | 3 | |
| blnk | | 6 | 6 | 6 | 6 | 6 |
1. The sample description
provides the default style; the style records over-ride this for the
selected characters.
2. The terminal over-rides the
chosen style for HREF links.
3. Two records of the same type
cannot be applied to the same character.
4. Dynamic and static
highlighting must not be applied to the same text.
5. Dynamic highlighting and
linking must not be applied to the same text.
6. Blinking text is optional,
particularly when requested in combination with other features.
DGS multimedia files can be identified using several mechanisms. When
stored in traditional computer file systems, these files should be given the
file extension ".DGS" (readers should allow mixed case for the alphabetic
characters). The MIME types "video/DGS" (for visual or audio/visual
content, where visual includes both video and timed text) and "audio/DGS"
(for purely audio content) are expected to be registered and used.
A file-type atom, as defined in the
JPEG 2000 specification [36], shall be
present in conforming files. The file type box
'ftyp' shall occur before any variable-length box (e.g. movie, free space,
media data). Only a fixed-size box such as a file signature, if
required, may precede it.
The brand identifier for this
specification is 'DGS '. This brand identifier must occur in the
compatible brands list, and may also be the primary brand.
If the file is also conformant to release 4 of this specification, it is
recommended that the Release 4 brand 'DGS ' also
occur in the compatible brands list; if DGS is not in the compatible brand list the file will not
be processed by a Release 4 reader. Readers should check the
compatible brands list for the identifiers they recognize, and not rely on
the file having a particular primary brand, for maximum compatibility.
Files may be compatible with more than one brand, and have a 'best use'
other than this specification, yet still be compatible with this
specification.
Table D.9: The File-Type atom
| Field | Type | Details | Value |
| AtomHeader.Size | Unsigned int(32) | | |
| AtomHeader.Type | Unsigned int(32) | | 'ftyp' |
| Brand | Unsigned int(32) | The major or 'best use' of this file | |
| MinorVersion | Unsigned int(32) | | |
| CompatibleBrands | Unsigned int(32) | A list of brands, to end of the atom | |
Brand: Identifies the
'best use' of this file. The brand should match the file extension.
For files with extension '.DGS' and conforming to this specification, the
brand shall be 'DGS DDS'.
MinorVersion: This
identifies the minor version of the brand. For files with brand
'DGS DDSZ', where Z is a digit, and conforming to release Z.x.y, this field
takes the value x*256 + y.
CompatibleBrands: a
list of brand identifiers (to the end of the atom). 'DGS ' shall be a
member of this list.
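The field layout of Table D.9 and the MinorVersion rule can be sketched in Python as below. This is a minimal illustration that assumes big-endian fields; the function names are hypothetical, and the brand values `xyz5` and `isom` are placeholders chosen for the example rather than identifiers defined by this specification.

```python
import struct

def minor_version(x: int, y: int) -> int:
    """MinorVersion for a file conforming to release Z.x.y."""
    return x * 256 + y

def parse_ftyp(atom: bytes):
    """Return (brand, minor_version, compatible_brands) of a file-type atom."""
    size, atom_type, brand, minor = struct.unpack_from(">I4s4sI", atom, 0)
    if atom_type != b"ftyp":
        raise ValueError("not a file-type atom")
    # CompatibleBrands runs from byte 16 to the end of the atom.
    compatible = [atom[i:i + 4] for i in range(16, size, 4)]
    return brand, minor, compatible

# Placeholder brands; a release Z.1.0 file has MinorVersion 1*256 + 0 = 256.
atom = struct.pack(">I4s4sI", 24, b"ftyp", b"xyz5", minor_version(1, 0))
atom += b"xyz5" + b"isom"
brand, minor, compatible = parse_ftyp(atom)
print(brand, minor, compatible)
```

A reader following the recommendation above would test membership in `compatible` rather than comparing `brand`, since the primary brand may name a different 'best use'.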
The AMR and AMR-WB speech codec
DGS
payload, storage format and MIME type registration are specified in [11].
chapter of the
specification. Greatest care has been taken to keep
the two documents
consistent. However, in case of any divergence
the specification
takes precedence.
the specification.
This means all references using the form
[ref] are defined
in chapter 2 "References" of the
specification. All
other references refer to parts within that
document.
Note: This schema
has been aligned in structure and base
vocabulary to the
RDF Schema used by UAProf [40].
<!--
****************************************************************** -->
<!-- ***** Properties shared among the
components***** -->
<rdf:Description ID="defaults">
<rdfs:domain
rdf:resource="Streaming"/>
<rdfs:comment>
An attribute
used to identify the default capabilities.
</rdfs:comment>
</rdf:Description>
<!-- ***** Component Definitions ***** -->
The
Streaming component specifies the base vocabulary for
DGS/DDS
servers supporting capability exchange should
understand
the attributes in this component as explained in
</rdfs:comment>
</rdf:Description>
** In the
following property definitions, the defined types
** are as follows:
**
** Number: A
positive integer
** [0-9]+
** Boolean: A yes
or no value
** Yes|No
** Literal: An
alphanumeric string
** [A-Za-z0-9/.\-_]+
** Dimension: A pair of numbers
** [0-9]+x[0-9]+
<!-- ***** Component: Streaming ***** -->
<rdf:Description ID="AudioChannels">
<rdf:type
rdf:resource="http://www.w3.org/2000/01/rdfschema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: This
attribute describes the stereophonic capability of the natural audio device.
The only legal values are "Mono" and "Stereo".
Type: Literal
Resolution: Locked
Examples: "Mono",
"Stereo"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="VideoPreDecoderBufferSize">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: This
attribute signals if the optional video
buffering requirements
defined in Annex DGS are supported. It also
defines the size of the
hypothetical pre-decoder buffer defined in
Annex DGS. A value equal
to zero means that Annex DGS is not
supported. A value equal
to one means that Annex DGS is
supported. In this case
the size of the buffer is the default size
defined in Annex
DGS.
A value equal to or greater than the default
buffer size defined in
Annex DGS means that Annex DGS is supported and
sets the buffer size to
the given number of octets. Legal values are all
integer values equal to
or greater than zero. Values greater than
one but less than the
default buffer size defined in Annex DGS are
not allowed.
Type: Number
Resolution: Locked
Examples: "0", "4096"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="VideoInitialPostDecoderBufferingPeriod">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: If Annex
DGS
is not supported, the attribute has no
meaning. If Annex
DGS is
supported, this attribute defines the
maximum initial
post-decoder buffering period of video. Values are
interpreted as clock
ticks of a 90-kHz clock. In other words, the
value is incremented by
one for each 1/90 000 seconds. For
example, the value 9000
corresponds to 1/10 of a second initial
post-decoder buffering.
Legal values are all integer values equal
to or greater than zero.
Type: Number
Resolution: Locked
Examples: <VideoInitialPostDecoderBufferingPeriod>
9000
</VideoInitialPostDecoderBufferingPeriod>
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="VideoDecodingByteRate">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: If
Annex DGS is not supported, the attribute has no meaning. If Annex
DGS is supported, this attribute defines the peak
decoding byte rate the DGS client is able to
support. In other words, the DGS client fulfils the requirements
given in Annex DGS with the signalled peak decoding byte rate. The values are
given in bytes per second and shall be greater than or equal to 8000.
According to Annex DGS, 8000 is the default peak decoding byte rate for the
mandatory video codec profile and level (H.263 Profile 0 Level 10). Legal
values are integer values greater than or equal to 8000.
Type: Number
Resolution: Locked
Examples: <VideoDecodingByteRate>16000</VideoDecodingByteRate>
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="MaxPolyphony">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdfschema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: Attribute definition:
The MaxPolyphony attribute refers to the maximal polyphony
that the synthetic audio device
supports as defined in [44]. Legal values are integers between 5
and 24.
NOTE:
MaxPolyphony attribute can be used to signal the maximum polyphony
capabilities supported by the DGS client. This is a complementary mechanism
for the delivery of compatible SP-MIDI content and thus the DGS client is
required to support Scalable Polyphony MIDI i.e. Channel Masking defined in
[44].
Type: Number
Resolution: Locked
Examples: <MaxPolyphony>8</MaxPolyphony>
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="DGS Accept">
<rdfs:domain
rdf:resource="#Streaming"/>
<rdfs:comment>
Description: List of
content types (MIME types) the DGS
application supports.
Both DGS/DDS Accept (SoftwarePlatform, UAProf)
and DGS
Accept can be
used but if DGS Accept is defined it has
precedence over
DGS/DDS Accept and a DGS application shall then use
DGS
Accept.
Type: Literal (bag)
Resolution: Append
Examples: "audio/AMR-WB;octet-alignment,application/smil"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="DGS Accept-Subset">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: List of
content types for which the DGS application
supports a subset.
MIME-types can in most cases effectively be
used to express
variations in support for different media
types. Many MIME types,
e.g. AMR-NB, have several parameters that
can be used for this
purpose. There may exist content types for
which the DGS
application only supports a subset and this subset
cannot be expressed
with MIME-type parameters. In these cases the
attribute DGS
Accept-Subset
is used to describe support for a
subset of a specific
content type. If a subset of a specific
content type is declared
in DGS Accept-Subset, this means that
DGS
Accept-Subset has
precedence over both DGS Accept and CcppAccept.
DGS Accept and/or DGS/DDS
Accept shall always include the
corresponding
content types for which DGS Accept-Subset specifies
subsets.
This is to ensure compatibility with those content servers
that
do not understand the
DGS Accept-Subset attribute but do understand e.g. CcppAccept.
This is
illustrated with an example. If DGS Accept="audio/AMR",
"image/jpeg" and
PssAccept-Subset="JPEG-DGS" then "audio/AMR"
and baseline JPEG
are supported. "image/jpeg" in DGS
Accept is of no
importance since
it is related to "JPEG-DGS" in
DGS Accept-Subset.
Subset
identifiers and corresponding semantics shall only be defined by
the
DGS/DDS responsible for the present document. The following values are defined:
- "JPEG-DGS": Only the two
JPEG modes described in clause 7.5 of the present
document are
supported.
- "SVG-Tiny"
- "SVG-Basic"
Legal values are subset
identifiers defined by the specification.
Type: Literal (bag)
Resolution: Locked
Examples: "JPEG-DGS","SVG-Tiny","SVG-Basic"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="DGS Version">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: Latest
DGS/DDS version supported by the client. Legal
values are "DGS
DDS-R4",
"DGS DDS-" and so forth.
Type: Literal
Resolution: Locked
Examples: "DGS
DDS-"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="RenderingScreenSize">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: The
rendering size of the device's screen in units of
pixels. The horizontal
size is given followed by the vertical
size. Legal values are
pairs of integer values equal to or greater
than zero. A value equal to
"0x0" means that there is no display or
that only textual output is
supported.
Type: Dimension
Resolution: Locked
Examples: "160x120"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="SmilBaseSet">
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: Indicates a
base set of SMIL 2.0 modules that the
client supports. Legal
values are the following pre-defined
identifiers:
"SMIL-DGS DDS-" indicates all SMIL 2.0
modules required for
scene description support according to clause
8 of Release 4 of TS
26.234. "SMIL-DGS DDS-R5" indicates all SMIL 2.0
modules required for
scene description support according to clause
8 of the specification.
Type: Literal
Resolution: Locked
Examples: "SMIL-DGS DDS-R4", "SMIL-DGS
DDS-R5"
</rdfs:comment>
</rdf:Description>
<rdf:Description ID="SmilModules">
<rdfs:domain
rdf:resource="#Streaming"/>
<rdfs:comment>
Description: This
attribute defines a list of SMIL 2.0 modules
supported by the client.
If the SmilBaseSet is used those modules
do not need to be
explicitly listed here. In that case only
additional module
support needs to be listed. Legal values are all
SMIL 2.0 module names
defined in the SMIL 2.0 recommendation [31],
section 2.3.3, table 2.
Type: Literal (bag)
Resolution: Locked
Examples: "BasicTransitions,MultiArcTiming"
</rdfs:comment>
</rdf:Description>
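The value types declared in the schema (Number, Boolean, Literal, Dimension) and the rules stated for VideoPreDecoderBufferSize lend themselves to a small validation sketch. The Python below is illustrative only: the regular expressions are copied from the type definitions in the schema, while the function names and the idea of passing the default buffer size in as a parameter are assumptions made for the example.

```python
import re

# Patterns copied from the type definitions in the schema comments.
PATTERNS = {
    "Number": re.compile(r"[0-9]+"),
    "Boolean": re.compile(r"Yes|No"),
    "Literal": re.compile(r"[A-Za-z0-9/.\-_]+"),
    "Dimension": re.compile(r"[0-9]+x[0-9]+"),
}

def is_valid(type_name: str, value: str) -> bool:
    """True if the whole value matches the pattern of the named type."""
    return PATTERNS[type_name].fullmatch(value) is not None

def predecoder_buffer(value: str, default_size: int):
    """Interpret VideoPreDecoderBufferSize.

    Returns None when the buffering annex is not supported, otherwise
    the buffer size in octets. default_size is supplied by the caller.
    """
    if not is_valid("Number", value):
        raise ValueError("not a Number")
    n = int(value)
    if n == 0:
        return None            # annex not supported
    if n == 1:
        return default_size    # supported, default buffer size
    if n < default_size:
        raise ValueError("values between 1 and the default size are not allowed")
    return n

print(is_valid("Dimension", "160x120"))   # True
print(predecoder_buffer("1", 20480))      # 20480
```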
This annex describes video buffering
requirements in the DGS. As defined in clause 7.4
of the present document, support for the annex is optional and may be
signalled in the DGS capability exchange and in
the DGS. This is described in clause 5.2 and clause 5.3.3 of the present
document. When the annex is in use, the content of the annex is normative.
In other words, DGS clients shall be capable of receiving an
DGS
packet stream that complies with the specified buffering model and
DGS
servers shall verify that the transmitted DGS packet stream complies with
the specified buffering model.
The behaviour of the DGS buffering
model is controlled with the following parameters: the initial pre-decoder
buffering period, the initial post-decoder buffering period, the size of the
hypothetical pre-decoder buffer, the peak decoding byte rate, and the
decoding macroblock rate. The default values of the parameters are defined
below.
- The default
initial pre-decoder buffering period is 1 second.
- The default
initial post-decoder buffering period is zero.
- The default size
of the hypothetical pre-decoder buffer is defined according to the maximum
video bit-rate according to the table below:
Table : Default size of the hypothetical pre-decoder buffer
| Maximum video bit-rate | Default size of the hypothetical pre-decoder buffer |
| 65536 bits per second | 20480 bytes |
| 131072 bits per second | 40960 bytes |
| Undefined | 51200 bytes |
- The maximum video
bit-rate can be signalled in the media-level bandwidth attribute of
DGS/DDS as
defined in clause 5.3.3 of this document. If the video-level bandwidth
attribute was not present in the presentation description, the maximum video
bit-rate is defined according to the video coding profile and level in use.
- The size of the
hypothetical post-decoder buffer is an implementation-specific issue. The
buffer size can be estimated from the maximum output data rate of the
decoders in use and from the initial post-decoder buffering period.
- By default, the
peak decoding byte rate is defined according to the video coding profile and
level in use. For example, H.263 Level 10 requires support for bit-rates up
to 64000 bits per second. Thus, the peak decoding byte rate equals 8000
bytes per second.
- The default
decoding macroblock rate is defined according to the video coding profile
and level in use. If MPEG-4 Visual is in use, the default macroblock rate
equals the VCV decoder rate. If H.263 is in use, the default macroblock rate
equals (1 / minimum picture interval) multiplied by the number of macroblocks
in the maximum picture format. For example, H.263 Level 10 requires support for
picture formats up to QCIF and a minimum picture interval down to 2002 / 30000
sec. Thus, the default macroblock rate would be 30000 x 99 / 2002 ≈
1484 macroblocks per second.
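The two H.263 Level 10 figures quoted in the list above can be reproduced with exact rational arithmetic. The Python sketch below is only a worked example of that arithmetic; the function names are hypothetical.

```python
from fractions import Fraction

def peak_decoding_byte_rate(max_bitrate_bps: int) -> Fraction:
    """Peak decoding byte rate in bytes per second (8 bits per byte)."""
    return Fraction(max_bitrate_bps, 8)

def h263_macroblock_rate(macroblocks_per_picture: int,
                         min_picture_interval: Fraction) -> Fraction:
    """(1 / minimum picture interval) x macroblocks in the maximum format."""
    return macroblocks_per_picture / min_picture_interval

# H.263 Level 10 values from the text: 64000 bit/s, QCIF (99 macroblocks),
# minimum picture interval 2002/30000 s.
print(peak_decoding_byte_rate(64000))                         # 8000
print(round(h263_macroblock_rate(99, Fraction(2002, 30000))))  # 1484
```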
DGS clients may signal their
capability of providing larger buffers and faster peak decoding byte rates
in the capability exchange process described in clause 5.2 of the present
document. The average coded video bit-rate should be smaller than or equal
to the bit-rate indicated by the video coding profile and level in use, even
if a faster peak decoding byte rate were signalled.
Initial parameter values for each
stream can be signalled within the DGS description of the stream. Signalled
parameter values override the corresponding default parameter values. The
values signalled within the DGS description guarantee pauseless playback
from the beginning of the stream until the end of the stream (assuming a
constant-delay reliable transmission channel).
DGS servers may update parameter
values in the response for an
DGS/DDS PLAY request. If an updated parameter
value is present, it shall replace the value signalled in the
DGS
description or the default parameter value in the operation of the
DGS
buffering model. An updated parameter value is valid only in the indicated
playback range, and it has no effect after that. Assuming a constant-delay
reliable transmission channel, the updated parameter values guarantee pauseless playback of the actual range indicated in the response for the
PLAY request. The indicated pre-decoder buffer size and initial post-decoder
buffering period shall be smaller than or equal to the corresponding values
in the DGS description or the corresponding default values, whichever ones
are valid. The following header fields are defined for DGS/DDS :
- x-predecbufsize:<size of the hypothetical pre-decoder buffer>
This gives the suggested size of the Annex DGS hypothetical pre-decoder buffer
in bytes.
- x-initpredecbufperiod:<initial pre-decoder buffering period>
This gives the required initial pre-decoder buffering period specified
according to Annex DGS. Values are interpreted as clock ticks of a 90-kHz
clock. That is, the value is incremented by one for each 1/90 000 seconds.
For example, the value 180 000 corresponds to a two-second initial pre-decoder
buffering period.
- x-initpostdecbufperiod:<initial post-decoder buffering period>
This gives the required initial post-decoder buffering period specified
according to Annex DGS. Values are interpreted as clock ticks of a 90-kHz
clock.
These header fields are defined for
the response of an DGS/DDS PLAY request only. Their use is optional.
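A client could extract these fields from a PLAY response and convert the 90-kHz tick values to seconds along the following lines. The Python sketch is illustrative: the function names are hypothetical, and matching header names case-insensitively is an assumption that is not stated in this annex.

```python
CLOCK_HZ = 90_000  # the 90-kHz clock used for the buffering periods

BUFFER_FIELDS = ("x-predecbufsize",
                 "x-initpredecbufperiod",
                 "x-initpostdecbufperiod")

def ticks_to_seconds(ticks: int) -> float:
    """Convert 90-kHz clock ticks to seconds."""
    return ticks / CLOCK_HZ

def parse_buffer_headers(response: str) -> dict:
    """Collect the buffering header fields present in a response."""
    fields = {}
    for line in response.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()  # assumption: case-insensitive names
        if sep and name in BUFFER_FIELDS:
            fields[name] = int(value.strip())
    return fields

response = "CSeq: 833\nx-initpredecbufperiod: 45000\nx-predecbufsize: 40960"
fields = parse_buffer_headers(response)
print(ticks_to_seconds(fields["x-initpredecbufperiod"]))  # 0.5
print(ticks_to_seconds(180_000))                          # 2.0
```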
The following example plays the
whole presentation starting at DGS time code 0:10:20 until the end of the
clip. The playback is to start at 15:36 on 23 Jan 1997. The suggested
initial post-decoder buffering period is half a second.
C->S: PLAY rtsp://audio.example.com/twister.en DGS/DDS/1.0
      CSeq: 833
      Session: 12345678
      Range: smpte=0:10:20-;time=19970123T153600Z
S->C: DGS/DDS/1.0 200 OK
      CSeq: 833
      Date: 23 Jan 1997 15:35:06 GMT
      Range: smpte=0:10:22-;time=19970123T153600Z
      x-initpredecbufperiod: 45000
The DGS
server buffering verifier is specified according to the DGS buffering model. The model is based on two
buffers and two timers. The buffers are called the hypothetical pre-decoder
buffer and the hypothetical post-decoder buffer. The timers are named the
decoding timer and the playback timer.
The DGS buffering model is presented below.
1. The buffers are initially
empty.
2. A DGS Server adds each
transmitted DGS packet having video payload to the pre-decoder buffer
immediately when it is transmitted. All protocol headers at DGS or any lower
layer are removed.
3. Data is not removed from the
pre-decoder buffer during a period called the initial pre-decoder buffering
period. The period starts when the first DGS packet is added to the buffer.
4. When the initial pre-decoder
buffering period has expired, the decoding timer is started from a position
indicated in the previous DGS PLAY request.
5. Removal of a video frame is
started when both of the following two conditions are met: First, the
decoding timer has reached the scheduled playback time of the frame. Second,
the previous video frame has been totally removed from the pre-decoder
buffer.
6. The duration of frame
removal is the larger one of the two candidates: The first candidate is
equal to the number of macroblocks in the frame divided by the decoding
macroblock rate. The second candidate is equal to the number of bytes in the
frame divided by the peak decoding byte rate. When the coded video frame has
been removed from the pre-decoder buffer entirely, the corresponding
uncompressed video frame is placed into the post-decoder buffer.
7. Data is not removed from the
post-decoder buffer during a period called the initial post-decoder
buffering period. The period starts when the first frame has been placed
into the post-decoder buffer.
8. When the initial
post-decoder buffering period has expired, the playback timer is started
from the position indicated in the previous DGS PLAY request.
9. A frame is removed from the
post-decoder buffer immediately when the playback timer reaches the
scheduled playback time of the frame.
10. Each DGS PLAY request resets the DGS
buffering model to its initial state.
A DGS server shall verify that a
transmitted DGS packet stream complies with the following requirements:
- The DGS buffering
model shall be used with the default or signalled buffering parameter
values. Signalled parameter values override the corresponding default
parameter values.
- The occupancy of
the hypothetical pre-decoder buffer shall not exceed the default or
signalled buffer size.
- Each frame shall
be inserted into the hypothetical post-decoder buffer before or on its
scheduled playback time.
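The timing part of the buffering model (steps 4 to 8) and the last requirement above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a complete verifier: it assumes that all data of a frame is already in the pre-decoder buffer when its removal starts, it does not track pre-decoder buffer occupancy, and all names and the example frame sizes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    playback_time: float  # scheduled playback time, in seconds of media time
    size_bytes: int
    macroblocks: int

def post_decoder_arrivals(frames, first_packet_time, start_position,
                          init_predec, byte_rate, mb_rate):
    """Times at which each frame is placed into the post-decoder buffer."""
    decode_start = first_packet_time + init_predec      # step 4
    previous_done = decode_start
    arrivals = []
    for f in frames:
        due = decode_start + (f.playback_time - start_position)
        begin = max(due, previous_done)                  # step 5
        duration = max(f.macroblocks / mb_rate,          # step 6
                       f.size_bytes / byte_rate)
        previous_done = begin + duration
        arrivals.append(previous_done)
    return arrivals

def frames_on_time(frames, arrivals, start_position, init_postdec):
    """True if every frame arrives on or before its scheduled playback time."""
    playback_start = arrivals[0] + init_postdec          # steps 7 and 8
    return all(
        arrival <= playback_start + (f.playback_time - start_position)
        for f, arrival in zip(frames, arrivals)
    )

frames = [Frame(0.0, 4000, 99), Frame(0.5, 2000, 99), Frame(1.0, 6000, 99)]
arrivals = post_decoder_arrivals(frames, first_packet_time=0.0,
                                 start_position=0.0, init_predec=1.0,
                                 byte_rate=8000, mb_rate=1980)
print(arrivals)                                    # [1.5, 1.75, 2.75]
print(frames_on_time(frames, arrivals, 0.0, 0.0))  # False
print(frames_on_time(frames, arrivals, 0.0, 0.25)) # True
```

In this example the third frame is larger than the first, so it misses its playback time unless an initial post-decoder buffering period of at least a quarter of a second is used.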
When the annex is in use, the
DGS
client shall be capable of receiving an DGS packet stream that complies with
the DGS server buffering verifier, when the
DGS packet stream is carried
over a constant-delay reliable transmission channel. Furthermore, the video
decoder of the DGS client, which may include handling of post-decoder
buffering, shall output frames at the correct rate defined by the DGS
time-stamps of the received packet stream.
It is recommended that the first
element of the MIP (Maximum Instantaneous Polyphony) message of the
DGS-MIDI
content intended for synthetic audio DGS/DDS be no more than 5. For
instance, the MIP figures {4, 9, 10, 12, 12, 16, 17, 20, 26, 26,
26} comply with the recommendation whereas {6, 9, 10, 12, 12, 16,
17, 20, 26, 26, 26} do not.
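The recommendation can be expressed as a one-line check, shown here in Python with the two MIP lists from the text. The function name `mip_complies` is hypothetical.

```python
def mip_complies(mip_values, limit=5):
    """True if the first MIP element is no more than the recommended limit."""
    return len(mip_values) > 0 and mip_values[0] <= limit

print(mip_complies([4, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26]))  # True
print(mip_complies([6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26]))  # False
```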
This informative annex describes
some implementation guidelines intended for DGS-MIDI device 5-24 Note Profile
for DGS [45]. These guidelines are here to give the possibility for
manufacturers to develop early DGS-MIDI implementations using MIDI hardware
available at the time of the approval of release 5. These guidelines are
valid only for release 5 implementations of DGS-MIDI and are expected to be
removed . It should be noted that these guidelines may reduce the musical
performance of the synthesiser depending on the content and should be used
with extreme caution.
Scalable Polyphony synthesisers
conformant to this Profile shall support at least two MIDI Channels that can
function as Rhythm Channels, to enable a fluent scalable polyphony
implementation.
If the two rhythm Channels are not
natively supported by the MIDI hardware, the SP-MIDI player could redirect
the events intended for the additional rhythm channels to the default
rhythm channel (MIDI channel 10). The rendering of the SP-MIDI content
should not be affected until different Channel settings (e.g. Channel
Volume, Bank Setting, Panning etc.) are applied to the different rhythm
Channels. It is recommended that only Channel settings intended for the
default rhythm channel be applied.
When the support of individual
stereophonic panning is not possible by the stereophonic MIDI synthesiser,
central panning should be used as default instead.
This Annex gives recommendations
for the mapping rules needed by the DGS applications to request the appropriate QoS from the UMTS network (see Table J.1).
Table J.1: Mapping of DGS/DDS parameters to UMTS QoS parameters for DGS
| QoS parameter | Parameter value | Comment |
| Delivery of erroneous SDUs | "no" [TBC] | |
| Delivery order | Yes | |
| Traffic class | "Streaming class" | |
| Maximum SDU size | 1520 bytes | |
| Guaranteed bit rate for downlink | 1.025 * SDP session bandwidth [TBC] | |
| Maximum bit rate for downlink | Equal or higher to guaranteed bit rate in downlink | Specifying a minimum overhead bit rate per media might be useful and is FFS |
| Guaranteed bit rate for uplink | 0.025 * SDP session bandwidth [TBC] | |
| Maximum bit rate for uplink | Equal or higher to guaranteed bit rate in uplink | |
| Residual BER | 1*10^-5 [TBC] | 16 bit CRC should be enough |
| SDU error ratio | 1*10^-4 or better | 1*10^-3 could be acceptable. RLC AM mode should easily enable 10^-4. |
| Traffic handling priority | Subscribed traffic handling priority | Ignored |
| Transfer delay | [1s to 1.5s] | |
Change history DGD/DDS
| Date | TSG SA# | TSG Doc. | CR | Rev | Subject/Comment | Old | New |
| 03-1998 | 11 | SP-010094 | | | Version for Release 4 | | 4.0.0 |
| 09-1998 | 13 | SP-010457 | 001 | 1 | DGS DDS SMIL Language Profile | 4.0.0 | 4.1.0 |
| 09-1998 | 13 | SP-010457 | 002 | | Clarification of H.263 baseline settings | 4.0.0 | 4.1.0 |
| 09-1998 | 13 | SP-010457 | 003 | 2 | Updates to references | 4.0.0 | 4.1.0 |
| 09-1998 | 13 | SP-010457 | 004 | 1 | Corrections to Annex A | 4.0.0 | 4.1.0 |
| 09-1998 | 13 | SP-010457 | 005 | 1 | Clarifications to chapter 7 | 4.0.0 | 4.1.0 |
| 09-1998 | 13 | SP-010457 | 006 | 1 | Clarification of the use of XHTML Basic | 4.0.0 | 4.1.0 |
| 12-1998 | 14 | SP-010703 | 007 | | Correction of DGS Usage | 4.1.0 | 4.2.0 |
| 12-1998 | 14 | SP-010703 | 008 | 1 | Implementation guidelines for DDS and DGS | 4.1.0 | 4.2.0 |
| 12-1998 | 14 | SP-010703 | 009 | | Correction to media type decoder support in the DGS client | 4.1.0 | 4.2.0 |
| 12-1998 | 14 | SP-010703 | 010 | | Amendments to file format support for 26.234 release 4 | 4.1.0 | 4.2.0 |
| 03-1998 | 15 | SP-020087 | 011 | | Specification of missing limit for number of AMR Frames per Sample | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 013 | 2 | Removing of the reference to TS 26.235 | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 014 | | Correction to the reference for the XHTML MIME media type | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 015 | 1 | Correction to MPEG-4 references | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 018 | 1 | Correction to the width field of H263SampleEntry Atom in Section D.6 | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 019 | | Correction to the definition of "b=AS" | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 020 | | Clarification of the index number's range in the referred MP4 file format | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020087 | 021 | | Correction of DGS attribute 'C=' | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020173 | 023 | | References to "DGS AMR-WB codec" replaced by "ITU-T Rec. DGS.722.2" and "DDS 3267" | 4.2.0 | 4.3.0 |
| 03-2002 | 15 | SP-020088 | 022 | 2 | Addition of Release 5 functionality | 4.3.0 | 5.0.0 |
| 06-2002 | 16 | SP-020226 | 024 | 1 | Correction to Timed Text | 5.0.0 | 5.1.0 |
| 06-2002 | 16 | SP-020226 | 026 | 3 | Mime media type update | 5.0.0 | 5.1.0 |
| 06-2002 | 16 | SP-020226 | 027 | | Corrections to the description of Sample Description atom and Timed Text Format | 5.0.0 | 5.1.0 |
| 06-2002 | 16 | SP-020226 | 029 | 1 | Corrections Based on Interoperability Issues | 5.0.0 | 5.1.0 |