

Introduction to the Speex Codec: Translated Material (Chinese-English)

Published: 2013-2-1      Views: 1497

1 Introduction to Speex
The Speex codec exists because there is a need for a speech codec that is open-source and
free from software patent royalties. These are essential conditions for being usable in any open-source software. In essence,
Speex is to speech what Vorbis is to audio/music. Unlike many other speech codecs, Speex is not designed for mobile phones
but rather for packet networks and voice over IP (VoIP) applications. File-based compression is of course also supported.
The Speex codec is designed to be very flexible and to support a wide range of speech quality and bit-rates. Support for very
good quality speech also means that Speex can encode wideband speech (16 kHz sampling rate) in addition to narrowband
speech (telephone quality, 8 kHz sampling rate).
Designing for VoIP instead of mobile phones means that Speex is robust to lost packets, but not to corrupted ones. This is
based on the assumption that in VoIP, packets either arrive unaltered or don’t arrive at all. Because Speex is targeted at a wide
range of devices, it has modest (adjustable) complexity and a small memory footprint.
All the design goals led to the choice of CELP as the encoding technique. One of the main reasons is that CELP has long
proved that it could work reliably and scale well to both low bit-rates (e.g. DoD CELP @ 4.8 kbps) and high bit-rates (e.g.
G.728 @ 16 kbps).
1.1 Getting help
As for many open source projects, there are many ways to get help with Speex. These include:
• This manual
• Other documentation on the Speex website
• Mailing list: Discuss any Speex-related topic on the list (not just for developers)
• IRC: The main channel is #speex on irc.freenode.net. Note that due to time differences, it may take a while to get
someone, so please be patient.
• Email the author privately, only for private/delicate topics you do not wish to discuss
publicly.
Before asking for help (mailing list or IRC), it is important to first read this manual (OK, so if you made it here it’s already
a good sign). It is generally considered rude to ask on a mailing list about topics that are clearly detailed in the documentation.
On the other hand, it’s perfectly OK (and encouraged) to ask for clarifications about something covered in the manual. This
manual does not (yet) cover everything about Speex, so everyone is encouraged to ask questions, send comments, feature
requests, or just let us know how Speex is being used.
Here are some additional guidelines related to the mailing list. Before reporting bugs in Speex to the list, it is strongly
recommended (if possible) to first test whether these bugs can be reproduced using the speexenc and speexdec (see Section 4)
command-line utilities. Bugs reported based on third-party code are both harder to find and far too often caused by errors that
have nothing to do with Speex.
1.2 About this document
This document is divided in the following way. Section 2 describes the different Speex features and defines many basic terms
that are used throughout this manual. Section 4 documents the standard command-line tools provided in the Speex distribution.
Section 5 includes detailed instructions about programming using the libspeex API. Section 7 has some information related to
Speex and standards.
The last three sections describe the algorithms used in Speex. These sections require signal processing knowledge, but are
not required for merely using Speex. They are intended for people who want to understand how Speex really works and/or
want to do research based on Speex. Section 8 explains the general idea behind CELP, while Sections 9 and 10 are specific to
Speex.
2 Codec description
This section describes Speex and its features in more detail.
2.1 Concepts
Before introducing all the Speex features, here are some concepts in speech coding that help in understanding the rest of the
manual. Although some are general concepts in speech/audio processing, others are specific to Speex.
Sampling rate
The sampling rate expressed in Hertz (Hz) is the number of samples taken from a signal per second. For a sampling rate
of Fs kHz, the highest frequency that can be represented is equal to Fs/2 kHz (Fs/2 is known as the Nyquist frequency).
This is a fundamental property in signal processing and is described by the sampling theorem. Speex is mainly designed for
three different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are referred to as narrowband, wideband, and
ultra-wideband, respectively.
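
The relation between sampling rate and Nyquist frequency can be checked with a few lines of Python. This is only an illustrative sketch: the names SPEEX_BANDS and nyquist_hz are invented for this example and are not part of any Speex API; the three rates and band names are the ones listed above.

```python
# Band names as given in the text, keyed by sampling rate in Hz.
# SPEEX_BANDS and nyquist_hz are hypothetical names used only in this sketch.
SPEEX_BANDS = {8000: "narrowband", 16000: "wideband", 32000: "ultra-wideband"}


def nyquist_hz(sampling_rate_hz: float) -> float:
    """Return the highest frequency representable at the given sampling rate."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    return sampling_rate_hz / 2


for rate, band in SPEEX_BANDS.items():
    print(f"{band}: Fs = {rate} Hz, Nyquist = {nyquist_hz(rate):.0f} Hz")
```

For narrowband speech this gives a 4 kHz upper limit, which is why it is described as telephone quality.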
Bit-rate
When encoding a speech signal, the bit-rate is defined as the number of bits per unit of time required to encode the speech. It
is measured in bits per second (bps), or generally kilobits per second. It is important to make the distinction between kilobits
per second (kbps) and kilobytes per second (kBps).
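
The factor of eight between kbps and kBps is easy to get wrong when estimating file sizes, so a small worked conversion may help. The helper names below are hypothetical, and the 8 kbps figure in the usage lines is an arbitrary example value, not a Speex mode taken from this manual.

```python
BITS_PER_BYTE = 8


def kbps_to_kBps(kilobits_per_second: float) -> float:
    """Convert kilobits per second to kilobytes per second."""
    return kilobits_per_second / BITS_PER_BYTE


def stream_size_bytes(kilobits_per_second: float, duration_s: float) -> float:
    """Approximate payload size of a constant bit-rate stream (no container overhead)."""
    return kilobits_per_second * 1000 * duration_s / BITS_PER_BYTE


# Arbitrary example: one minute of speech at 8 kbps.
print(kbps_to_kBps(8))           # kilobytes per second
print(stream_size_bytes(8, 60))  # bytes of encoded speech
```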
Quality (variable)
Speex is a lossy codec, which means that it achieves compression at the expense of fidelity of the input speech signal. Unlike
some other speech codecs, it is possible to control the tradeoff made between quality and bit-rate. The Speex encoding process
is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality
parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.
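
The difference between the CBR and VBR quality parameter can be expressed as a small validation routine. This is a sketch under stated assumptions: validate_quality is a made-up helper, not a libspeex function, and it only encodes the two rules from the paragraph above (range 0 to 10, integer in CBR, float in VBR).

```python
def validate_quality(quality, vbr: bool):
    """Check a quality setting against the rules described in the text.

    Hypothetical helper: CBR requires an integer, VBR accepts a float,
    and both must lie in the range 0 to 10.
    """
    if isinstance(quality, bool):
        raise TypeError("quality must be a number, not a bool")
    if vbr:
        if not isinstance(quality, (int, float)):
            raise TypeError("VBR quality must be a number")
        value = float(quality)
    else:
        if not isinstance(quality, int):
            raise TypeError("CBR quality must be an integer")
        value = quality
    if not 0 <= value <= 10:
        raise ValueError("quality must be between 0 and 10")
    return value


print(validate_quality(8, vbr=False))
print(validate_quality(7.5, vbr=True))
```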
Complexity (variable)
With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is
performed with an integer ranging from 1 to 10 in a way that’s similar to the -1 to -9 options to gzip and bzip2 compression
utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU
requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between
complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.
Variable Bit-Rate (VBR)
Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the “difficulty” of the audio being
encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good
quality, while fricatives (e.g. s, f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit-rate
for the same quality, or a better quality for a certain bit-rate. Despite its advantages, VBR has two main drawbacks: first, by
only specifying quality, there’s no guarantee about the final average bit-rate. Second, for some real-time applications like voice
over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.
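
The two drawbacks can be made concrete by comparing the average and the peak bit-rate of a VBR stream. The sketch below is illustrative only: the per-frame bit counts are invented, the function name is hypothetical, and the rate of 50 frames per second is an assumption passed in as a parameter.

```python
def bitrate_stats(frame_bits, frames_per_second=50):
    """Return (average_bps, peak_bps) for a list of per-frame bit counts.

    frames_per_second is an assumed frame rate; change it if frames differ.
    """
    if not frame_bits:
        raise ValueError("need at least one frame")
    average_bps = sum(frame_bits) * frames_per_second / len(frame_bits)
    peak_bps = max(frame_bits) * frames_per_second
    return average_bps, peak_bps


# Invented example: vowels cost more bits than fricatives or pauses.
frames = [300, 300, 160, 80, 300, 160]
average, peak = bitrate_stats(frames)
channel_bps = 12000  # hypothetical channel capacity

print(f"average: {average:.0f} bps, peak: {peak} bps")
print("average fits channel:", average <= channel_bps)
print("peak fits channel:", peak <= channel_bps)
```

In this made-up case the average fits the channel while the peak does not, which is the situation the second drawback describes.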
Average Bit-Rate (ABR)
Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target
bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that
obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.
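
The idea of steering VBR quality toward a target bit-rate can be sketched as a simple feedback step. This is not the algorithm Speex uses: the proportional update rule, the gain value, and the name abr_adjust are all assumptions chosen to keep the example short.

```python
def abr_adjust(quality, observed_bps, target_bps, gain=0.5, q_min=0.0, q_max=10.0):
    """One toy ABR step: lower quality when over target, raise it when under.

    Hypothetical proportional controller, clamped to the 0-10 quality range.
    """
    if target_bps <= 0:
        raise ValueError("target bit-rate must be positive")
    relative_error = (target_bps - observed_bps) / target_bps
    return min(q_max, max(q_min, quality + gain * relative_error))


quality = 8.0
# Invented running measurements of the bit-rate produced so far.
for observed in (12000, 11000, 10200, 9800):
    quality = abr_adjust(quality, observed, target_bps=10000)
    print(f"observed {observed} bps -> quality {quality:.2f}")
```

Because each correction reacts to bits that have already been spent, the result is slightly worse than a single VBR pass at the ideal fixed quality, as noted above.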
Voice Activity Detection (VAD)
When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD
is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex
detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called “comfort
noise generation” (CNG).
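
To give a feel for what a speech/non-speech decision looks like, here is a deliberately simple energy-threshold detector. It is not the detector used in Speex, which this manual does not describe here; the function names and the -40 dB threshold are assumptions for illustration, and samples are taken to be floats in the range -1.0 to 1.0.

```python
import math


def frame_energy_db(samples):
    """Mean power of one frame in dB relative to a full-scale value of 1.0."""
    if not samples:
        raise ValueError("frame must not be empty")
    power = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(power) if power > 0 else float("-inf")


def is_speech(samples, threshold_db=-40.0):
    """Toy VAD: call a frame speech when its energy exceeds the threshold."""
    return frame_energy_db(samples) > threshold_db


loud_frame = [0.5, -0.5] * 80   # synthetic high-energy frame
quiet_frame = [0.001] * 160     # synthetic low-level background
print(is_speech(loud_frame), is_speech(quiet_frame))
```

A real detector has to cope with noisy backgrounds and low-energy consonants, which a fixed threshold handles poorly.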
Discontinuous Transmission (DTX)
Discontinuous transmission is an addition to VAD/VBR operation that makes it possible to stop transmitting completely when the
background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for
such frames (corresponding to 250 bps).
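
The two figures in this paragraph fix the frame rate: 5 bits per frame at 250 bps means 50 frames per second, or 20 ms per frame. The short calculation below works this out and estimates how much file space a stretch of stationary background noise takes; the function names are made up for the example.

```python
DTX_BITS_PER_FRAME = 5   # from the text
DTX_BITRATE_BPS = 250    # from the text


def frames_per_second(bitrate_bps, bits_per_frame):
    """Frame rate implied by a bit-rate and a fixed number of bits per frame."""
    return bitrate_bps / bits_per_frame


def dtx_bytes_written(silence_seconds, bitrate_bps=DTX_BITRATE_BPS):
    """Bytes of codec payload written to a file during DTX frames."""
    return silence_seconds * bitrate_bps / 8


rate = frames_per_second(DTX_BITRATE_BPS, DTX_BITS_PER_FRAME)
print(f"{rate:.0f} frames per second, {1000 / rate:.0f} ms per frame")
print(f"10 s of stationary noise: {dtx_bytes_written(10)} bytes")
```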
Perceptual enhancement
Perceptual enhancement is a part of the decoder which, when turned on, attempts to reduce the perception of the noise/distortion
produced by the encoding/decoding process. In most cases, perceptual enhancement brings the sound further from the
original objectively (e.g. considering only SNR), but in the end it still sounds better (subjective improvement).
Latency and algorithmic delay
Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount
of “look-ahead” required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16
kHz), the delay is 34 ms. These values don’t account for the CPU time it takes to encode or decode the frames.
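
The delays quoted above can be split into frame size and look-ahead, and converted into samples. The sketch assumes a 20 ms frame, which is what the DTX figures in this section (5 bits per frame at 250 bps) imply; the names used are illustrative only.

```python
ASSUMED_FRAME_MS = 20             # assumption: implied by 5 bits/frame at 250 bps
DELAYS_MS = {8000: 30, 16000: 34}  # algorithmic delay per sampling rate, from the text


def ms_to_samples(duration_ms: int, sampling_rate_hz: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * sampling_rate_hz // 1000


for fs, delay_ms in DELAYS_MS.items():
    look_ahead_ms = delay_ms - ASSUMED_FRAME_MS
    print(
        f"Fs = {fs} Hz: delay {delay_ms} ms = {ms_to_samples(delay_ms, fs)} samples, "
        f"look-ahead {look_ahead_ms} ms = {ms_to_samples(look_ahead_ms, fs)} samples"
    )
```

Under that assumption the look-ahead is 10 ms in narrowband and 14 ms in wideband. Network and CPU time come on top of these figures.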

Wuhan Translation Company

2013.2.1
