Please resolve these comments along with any other Last Call comments
you may receive.
Reviewer: Elwyn Davies
Review Date: 10 May 2012 (part 1)
IETF LC End Date: 10 May 2012
IESG Telechat date: (if known) -
s2 (and more generally): The splitting of the signal in the frequency
domain into signal (components?) below and above 8kHz presumably
requires that the signal is subjected to a Discrete Fourier Transform to
allow the signal to be split. I think sometging is needed in s2 to
explain how this is managed (or if I don't understand, to explain why it
s4.1: Despite having found a copy of the range coding paper and tried to
read it, I found the whole business of range coding opaque. Given that
this is (I think) fairly novel, some additional less opaque description
could be very helpful to people trying to understand what is going on.
In particular knowledge of how entropy coding works is pretty much a
s4.2.5, para 3:
> When switching from 20 ms to 10 ms, the 10 ms
> 20 ms Opus frame, potentially leaving a hole that needs to be
> concealed from even a single packet loss.
s4.2.5, para 4:
> In order to properly produce LBRR frames under all conditions, an
> encoder might need to buffer up to 60 ms of audio and re-encode it
> during these transitions. However, the reference implementation opts
> to disable LBRR frames at the transition point for simplicity.
Should this be phrased in RFC 2119 requiremenmts language? The first
part sounds like a SHOULD with the second part being the get out , but
its not entirely clear what the consequences are other then simplicity.
global: bytes -> octets
global: The form/terminology Q<n> (e.g., Q13, Q15, Q16) ought to be
s1: Expand CELP
s1.1.6: Need to define floor().
s2: ? Expand SILK
s2: Reference for Vorbis.
s2.1.8: Expand AAC (and MP3).
s3: References for Ogg, Matroska, maybe RTP.
s3: Fig 1, Table 2 and intervening text: Presumably SILK only (Table 2)
etc correspond to MODES 1-3 in Figure 1. This needs to be consistent.
s3.2.1: Make it clear which of the octets is len/len. To be
precise it might be better to say len0/len1 are the values of the two
length octets (in whichever order you intend). The form len could be
misinterpreted as a function 'length of 0'.
s3.2.5: better s/figure below/Figure 5/
> In the CBR case, the compressed length of each frame in bytes is
> equal to the number of remaining bytes in the packet after
> subtracting the (optional) padding, (N-2-P), divided by M. This
> number MUST be a non-negative integer multiple of M.
'This number' is not the compressed length of each frame that is the
subject of the first sentence, but the number of remaining octets - this
> The number of header bytes (TOC byte, frame
> count byte, padding length bytes, and frame length bytes), plus the
> length of the first M-1 frames themselves, plus the length of the
> padding MUST be no larger than N, the total size of the packet.
Surely this is a non sequitur? This might be better phrased as 'The
total size of a well formed packet MUST be at least...'
s3.3: The example diagrams ought to have figure numbers.
s3.4: I am not keen on duplicating normative requirements in this way
(double maintenance issue). It would be better to put explicit numbered
requirements in the sections above an reference the resulting numbers
> The decoder initializes rng to 128 and initializes val to
> 127 minus the top 7 bits of the first input octet.
How are the 'top seven bits' to be interpreted here? e.g. as the bottom
seven bits of a 8 bit integer field? an 8 bit integer with the lowest
bit zeroed out?
s4.1.1: This is really a global point. This section refers to
entdec.c. Presumably (since we haven't reached the code yet) and it is
still compressed, there is some file structure. I don't think this has
been said above. It would be good to provide a list of the file
components (i.e., sectional structure of the code) at the start, maybe
even giving line number positions within the decompressed code.
> Then it reads the next octet of the
> payload and combines it with the left-over bit buffered from the
> previous octet to form the 8-bit value sym. It takes the left-over
> bit as the high bit (bit 7) of sym, and the top 7 bits of the octet
> it just read as the other 7 bits of sym.
This is not well phrased. Better
Then it reads the next octet of the payload [packet? payload hasn't
really been used before] and combines the left over bit from the
previous octet (see Section 4.1 for starting this process) as the high
bit (bit 7)| of 'sym' and the top 7 bits of the octet as the other 7
bits of sym, leaving the remaining bit for the next iteration.
Should r_Q15 = rng >> (l-16) be r_Q15 = rng >> (lg-16)? There doesn't
seem to be an 'l' defined.
s4.2.1: Expand LTP earlier. It would also be useful to expand LPC again.
s4.2.2: acronym VAD is not expanded until the beginning of s4.2.3.
s4.2.7: acronym LSF needs to be expanded on first use.
s188.8.131.52: Explain briefly why Table 7 has values for indices 0 to 15
when wi0/1 are in range 0 to 14.
s184.108.40.206, para below Table 12:
> These 6 bits are combined to form a gain index between 0 and 63.
s/gain index/gain_index/ as this variable is used subsequently.
s220.127.116.11: The use of log_gain seems slightly confusing when combined
with gain_index. One at least is presumably log scaled. Maybe a bit
more explanation is needed.