9/21/09

What exactly is ATSC?

(The original TV technology is called analog. It is also called NTSC, after the National Television System Committee, the group that defined it. The NTSC spec was created in 1941, updated for color in 1953, and updated for stereo in 1984. Both of these updates were backward compatible, rendering nobody’s TV set obsolete. But the new digital standard is totally different. The only thing it has in common with NTSC is the 6 megahertz channel width.)

ATSC (Advanced Television Systems Committee) is the name of the technical standard that defines the digital TV (DTV) that the FCC has chosen for terrestrial TV stations. ATSC employs MPEG-2, a data compression standard. MPEG-2 typically achieves a 50-to-1 reduction in data. It achieves this by not retransmitting areas of the screen that have not changed since the previous frame.

Digital cable TV systems and DBS systems like DirecTV have devised their own standards that differ somewhat from ATSC. Their high-def set top boxes (STBs) conform to ATSC at their output connectors. Those systems use MPEG-2 or MPEG-4.

ATSC has 18 different formats. All TVs must be able to receive all of these formats and display them. The broadcaster chooses the format. Most TV sets will display only 1 or 2 of these formats, but will convert the other formats into these. All 18 formats are shown in the following table.

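Spec	Horizontal pixels	Vertical pixels	Aspect ratio	Monitor interface	Format name	Frames per sec	Fields per sec	Transmitted interlaced
ATSC	1920	1080	16:9	1080i	1080 60i	30	60	yes
ATSC	1920	1080	16:9	1080i	1080 30p	30	30	no
ATSC	1920	1080	16:9	1080i	1080 24p	24	24	no
ATSC	1280	720	16:9	720p	720 60p	60	60	no
ATSC	1280	720	16:9	720p	720 30p	30	30	no
ATSC	1280	720	16:9	720p	720 24p	24	24	no
ATSC	704	480	16:9	480p	480 60p	60	60	no
ATSC	704	480	16:9	480i	480 60i	30	60	yes
ATSC	704	480	16:9	480i	480 30p	30	30	no
ATSC	704	480	16:9	480i	480 24p	24	24	no
ATSC	704	480	4:3	480p	480 60p	60	60	no
ATSC	704	480	4:3	480i	480 60i	30	60	yes
ATSC	704	480	4:3	480i	480 30p	30	30	no
ATSC	704	480	4:3	480i	480 24p	24	24	no
ATSC	640	480	4:3	480p	480 60p	60	60	no
ATSC	640	480	4:3	480i	480 60i	30	60	yes
ATSC	640	480	4:3	480i	480 30p	30	30	no
ATSC	640	480	4:3	480i	480 24p	24	24	no
NTSC	≈640	483	4:3	Note 1	NTSC	30	60	yes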

Note 1: Some people refer to NTSC as 480i.

When converting NTSC to digital, about 640 pixels are required to reproduce the image nicely even though the true resolution of NTSC is roughly 400 pixels horizontal.

Interlacing

The term interlacing refers to the practice of drawing all of the odd numbered lines on the CRT, and then drawing all of the even numbered lines, which are drawn interspersed with the odd numbered lines. For 1080i, the 540 odd numbered lines are one field, and the 540 even numbered lines are the other field. When interlacing is employed, there are always two fields per frame. Progressive scan means that interlacing is not employed.
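The split into two fields can be sketched in a few lines of Python. This is only an illustration: a frame is modeled as a plain list of scan lines, and the function names are made up for this example.

```python
def split_into_fields(frame):
    """Split a frame (a list of scan lines) into its two fields."""
    odd_field = frame[0::2]   # lines 1, 3, 5, ... (counting from 1)
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Put the two fields back together, line by line."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame


frame = [f"line {n}" for n in range(1, 1081)]  # a 1080-line frame
odd, even = split_into_fields(frame)
print(len(odd), len(even))  # 540 540
```

Weaving the two fields back together reproduces the original frame only if nothing moved between the two fields, which is where the de-interlacing problems discussed later come from.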

One advantage of interlacing is that, for a given bandwidth, it allows higher resolution (more pixels). Another advantage is that it reduces flicker: A bright white area of the screen will flicker (pulsate rapidly) if that area is drawn only 30 times per second. Drawing 60 fields per second mostly prevents that. Live-action interlaced video is usually captured by a camera that samples the scene 60 times per second, not 30, and the resulting images portray motion much better than one would expect of 30 frames per second. A disadvantage of interlacing is that data compression is not as efficient.

1080i and 480i are interlaced formats, while 720p and 480p are progressive formats.

The Monitor Interface

The receiver reduces the 18 formats to at most four, so the display monitor never has to deal with more than four formats. Most receivers let you select the output format, which you must match to what the monitor can do.

If you look at the second ATSC format in the above table, 1080 30p, you will note that it is transmitted in progressive format, but the receiver will convert it into 1080i, an interlaced format. Why? That is because most CRT TV sets must draw this image interlaced to prevent flicker. (CRT sets that can draw 1080 lines at 60 frames per second are very uncommon.)

Presently there are only four defined interface formats: 480i, 480p, 720p, and 1080i. There could be more, and there can be monitors that can benefit from something else. But presently such a monitor will have to have a built-in receiver. (1080p60 and 1080p24 are becoming more common monitor interface formats, but the wisdom in them can be questioned.)

Bandwidth

(The term “bandwidth” means “minimum required channel size”. Thus if a random binary data stream is fed through a 2 MHz-wide channel, and if that channel could handle twice that much data, then the bandwidth of that data stream is said to be 1 MHz.)

The bandwidth for NTSC is always 6 MHz. Without data compression, the bandwidth for 1080i would be 300 MHz. With MPEG-2 data compression the bandwidth varies according to how fast the image changes. For 480i the bandwidth rarely goes above 1 MHz. For 1080i and 720p the bandwidth rarely goes above 3 MHz.

Thus it is possible to put six 480i programs or two 1080i programs in a 6 MHz channel. The FCC allows this. Thus terrestrial DTV stations have sub-channels. It is up to the station managers how many sub-channels to have and what programming will air on those sub-channels. Note that a sub-channel showing a static image (e.g. a weather map or bulletin board) requires almost no bandwidth despite being at high resolution.

ATSC is an imperfect standard in that occasionally the bandwidth requirement will exceed the channel size. When this happens, the picture can get blurry or jumpy. Jumpiness occurs when frames are deleted. Blurriness is preferred because if momentary it is less noticeable. Transmission encoders have improved gradually and hopefully will continue to do so. In the future perhaps they will fail in a completely unnoticeable manner.

Which is better: 1080i or 720p?

· 1080i and 720p require about the same bandwidth when showing live action: A 1080i image has twice as many pixels, while 720p shows twice as many frames per second.

· While showing films at 24 frames per second, 720p requires about half the bandwidth of 1080i.

· A common opinion is that 720p is better for sporting events, while 1080i looks better for documentaries, dramas, and most things that come at 24 frames per second.
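The bandwidth comparisons above can be checked roughly by counting raw pixels per second. This is only a sketch: raw pixel rate is a crude stand-in for compressed bandwidth, and the variable names are invented for the example.

```python
def pixels_per_second(width, height, frames_per_sec):
    """Raw (uncompressed) pixel rate of a format."""
    return width * height * frames_per_sec


# Live action: 1080i delivers 30 full frames per second, 720p delivers 60.
live_1080i = pixels_per_second(1920, 1080, 30)  # 62,208,000
live_720p = pixels_per_second(1280, 720, 60)    # 55,296,000

# Film: both formats carry 24 frames per second.
film_1080 = pixels_per_second(1920, 1080, 24)   # 49,766,400
film_720 = pixels_per_second(1280, 720, 24)     # 22,118,400

print(live_1080i / live_720p)  # 1.125, about the same
print(film_720 / film_1080)    # about 0.44, roughly half
```

A 1080-line frame actually holds 2.25 times as many pixels as a 720-line frame, so "twice as many" is a round figure.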

Unfortunately the networks are picking one format for all their shows. ABC, ESPN, and FOX have chosen 720p. All other networks are using 1080i. Hopefully some day they will choose the format according to the content.

You can find many websites where it is argued that one format is superior. Those who favor 720p are especially strident. They always overlook the fact that many images are stills or have little motion, and will look better in 1080i. They go to great lengths to explain the problems with interlace and flicker. But few people notice these problems (assuming they are sitting at the correct distance, and assuming that rescaling hasn’t introduced gross errors).

1080i and 720p are called High Definition TV (HDTV). 480p is called Enhanced Definition TV (EDTV). 480i is Standard Definition TV (SDTV).

DBS Quality

Present DBS systems (DirecTV and Dish Network) have a bandwidth problem: too many channels. These companies have resorted to some filtering to reduce the bandwidth per program. This allows them to carry more channels, but it gives the images a slightly blurry look. They call it “noise filtering”, but in effect they have reduced the resolution to below 640x480. Exactly what this resolution is has not been stated (550x400? Nobody knows.) On a 17 inch TV this problem is not very noticeable. But the larger the set is, the more offensive it is. You might find it to be a compelling reason to put an antenna on your roof.

This filtering has been applied only to standard-definition channels. The satellite companies claim that the HDTV channels are uncompromised. (Verifying such claims is close to impossible.)

DVD Quality

DVD images are usually 720x480 pixels, 24 frames per second. DVD quality is a step up from NTSC because:

1. digital technology is noise-free.

2. the horizontal resolution is better. NTSC is equivalent to about 400 pixels.

3. when a progressive scan monitor is used, any remaining flicker is eliminated.

4. the colors are better. NTSC has an “overlapping sidebands” problem.

“Overlapping sidebands” is a compromise in NTSC that works most of the time. It will cause wrong colors to appear when showing diagonal lines or fabrics with tweed patterns. Special comb filters improve the image slightly, but DVDs avoid the problem altogether. (Comb filters are only for NTSC.)

Of course, this improvement is lost if the DVD output is converted to NTSC. Many DVD owners have been buying monitors that have component video inputs, thus avoiding NTSC. DVD quality is essentially 480p (EDTV).

What is 1080p?

The term 1080p causes a lot of confusion because people are using it in conflicting ways:

1. The transmission formats 1080 24p and 1080 30p are sometimes called 1080p. But the term 1080p should never be used to refer to a transmission format.

2. If 1080p is to 1080i what 480p is to 480i then 1080p is a 60 frames per second monitor interface format. 1080 60p is becoming a common interface format. But since there are no 1080 60p sources, there is no need for it. (When a 24 frame source is converted to 1080i, no information is lost. A smart monitor can convert this 1080i into a perfect 1080 60p or 120p.)

3. When the maker of a digital display finds a way to improve upon 1080i he will usually say his display does 1080p. The improvement is often just a way to reduce flicker. The sets internally do 1080p but only accept 1080i at the interface.

If you want to be understood unambiguously, you should refrain from using the term 1080p. Instead use 1080p60, etc.

When a 24 frames/second source is converted to 60p, judder (described below) is introduced. A 120Hz display will have to remove that judder, making conversion to 60p seem counterproductive.

A justification often used for 1080 60p is that if the STB and the monitor can both do the conversion to 60p then the viewer can select the one that does it better.

Live action 720p looks a little better when it is converted directly to 1080p, not going through 1080i.

If Hollywood ever decides to make movies at 60 frames/second then 1080 60p will become an essential monitor interface format. But there is presently no indication that they might do this. In fact they tend to consider the flaws in film to be an artistic enhancement (the “film look”).

Motion Compensated Processing

In this process a computer in the receiver turns a 24 or 30 frames/sec image into a true 60 or 120 frames/sec image. The motion vectors (described below) are used for creating the missing frames. This is probably the best hope for truly smooth motion for 1080i or films. But it requires the networks and studios to make maximum use of motion vectors, which may or may not happen.

The motion vectors are not sent over the HDMI interface. So motion compensated processing must be done by the receiver or the DVD player. The monitor’s internal motion compensated processing works only with its internal receiver.

Motion Adaptive De-interlacing

Many 1080p monitors employ motion adaptive de-interlacing, which does not use the motion vectors. To create the missing frames, the set first divides the image into regions of motion and regions without motion. Areas without motion are de-interlaced by combining with the previous field. For areas with motion the missing scan lines are created by averaging the adjacent lines above and below. This is all pretty new and some sets are noticeably better than others. When done well, motion adaptive de-interlacing produces an image about as good as 1080i but without the flicker.
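A minimal sketch of the idea, assuming each field is a list of rows of brightness values. The motion test used here (comparing the current field with the previous field of the same parity against a fixed threshold) is an assumption made for the illustration; real sets use more elaborate and proprietary detectors.

```python
def deinterlace(cur, prev_opposite, prev_same, threshold=10):
    """Build a full frame from the current field.

    cur           -- current field (the lines that were transmitted)
    prev_opposite -- previous field, holding the lines missing from cur
    prev_same     -- the field before that, same parity as cur,
                     used only to decide whether a pixel is moving
    """
    height, width = len(cur), len(cur[0])
    frame = []
    for y in range(height):
        frame.append(list(cur[y]))       # transmitted line, kept as is
        y_below = min(y + 1, height - 1)
        missing = []
        for x in range(width):
            moving = (abs(cur[y][x] - prev_same[y][x]) > threshold or
                      abs(cur[y_below][x] - prev_same[y_below][x]) > threshold)
            if moving:
                # Motion: average the lines above and below.
                missing.append((cur[y][x] + cur[y_below][x]) // 2)
            else:
                # No motion: reuse the line from the previous field.
                missing.append(prev_opposite[y][x])
        frame.append(missing)
    return frame


cur = [[10, 10], [20, 20]]
older = [[12, 12], [25, 25]]
still = deinterlace(cur, older, prev_same=cur)
moved = deinterlace(cur, older, prev_same=[[100, 100], [100, 100]])
print(still)  # [[10, 10], [12, 12], [20, 20], [25, 25]]
print(moved)  # [[10, 10], [15, 15], [20, 20], [20, 20]]
```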

Does up-conversion improve the image?

Many receivers can up-convert a 480 image to 720. Others convert a 480 or 720 image to 1080. Does up-conversion result in a better image? Maybe. There are edge enhancement methods that reduce blurriness. But if the original image had no blurriness (de-aliasing) and you enhance it and try to sit closer, you will see false features. In any case, enhancing the resolution by more than 50% usually introduces objectionable false features. An enhanced 480 image will never look like a good 1080 image.

So while there are times when up-conversion will help, it does not permit you to sit at the distance proper for the higher resolution.

Some minor improvements sometimes happen with up-conversion. For example people who sit too close will notice that the scanning lines are gone.

But on the minus side, format conversions sometimes make interlace errors worse.

The 3:2 Pull-down Issue

Theater film is usually 24 frames per second while TV monitors usually operate at 30 or 60 frames per second. 3:2 pull-down, also called telecine, is the process of converting a 24 frames per second image into a 60 frames or fields per second image. It normally happens in one of two ways, producing either 60 interlaced fields or 60 progressive frames per second.

Thus the TV frames are 3 copies of a film frame, followed by 2 copies of the next frame, then 3, then 2, 3, 2, 3, 2, etcetera. This works well. (It has a minor flaw: During each second, 12 film frames are stretched slightly in time, while the other 12 are shrunken. The stretched frames dominate slightly, producing a slightly jerky image when motion is portrayed. This is called 12-cycle judder. Judder is seldom obvious. The only reliable way to see it is to watch the credits roll up the screen at the end of a program.)
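The 3, 2, 3, 2 cadence is easy to sketch in Python. The function name is made up for this example, and film frames are represented simply by labels.

```python
def pulldown_32(film_frames):
    """Repeat film frames in a 3, 2, 3, 2, ... pattern."""
    output = []
    for index, frame in enumerate(film_frames):
        copies = 3 if index % 2 == 0 else 2
        output.extend([frame] * copies)
    return output


print("".join(pulldown_32("ABCD")))   # AAABBCCCDD
print(len(pulldown_32(range(24))))    # 60
```

Four film frames become ten TV frames (or fields), so 24 film frames fill exactly one second of 60 Hz output.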

Thus 3:2 pull-down is nearly the same for interlaced and progressive scan. Given the way the brain works, they will look about the same. Does the progressive scan version look better? Many people believe so. This author believes progressive scan is superior by an increment so small that it is not generally noticeable. (Keep in mind, less than 5% of the face of the CRT is lit at any instant. That it is fully lit is an illusion.)

What is not in disagreement are the problems of de-interlacing an interlaced signal. Nearly all DVDs contain data that is already telecined and interlaced. To convert an interlaced image into a 60-frames/sec image you can simply combine successive fields and display them twice. But if you do this with film material, the following happens:

A moving object will turn blurry 6 times per second, which is quite noticeable. To fix this, the de-interlacer must be smart enough to match fields originally from the same film frame. That process is called reverse 3:2 pull-down, and sometimes cadence detection. Manufacturers sometimes call it 3:2 pull-down detection or sometimes just 3:2 pull-down (which is obviously wrong).
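The figure of 6 times per second can be reproduced with a small sketch. Fields are labeled only by the film frame they came from (field parity is ignored), and all function names are invented for the illustration. Real cadence detectors compare pixel data rather than labels.

```python
from itertools import groupby


def pulldown_32(film_frames):
    """Repeat film frames in a 3, 2, 3, 2, ... pattern."""
    output = []
    for index, frame in enumerate(film_frames):
        output.extend([frame] * (3 if index % 2 == 0 else 2))
    return output


def naive_weave(fields):
    """Pair up successive fields without checking where they came from."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def reverse_pulldown(fields):
    """Recover the film frames by merging runs of fields from the same frame."""
    return [frame for frame, _run in groupby(fields)]


fields = pulldown_32(range(24))               # one second: 60 fields
mixed = [a != b for a, b in naive_weave(fields)]
blur_events = sum(1 for is_mixed, _run in groupby(mixed) if is_mixed)

print(sum(mixed), blur_events)                # 12 mixed frames in 6 bursts
print(reverse_pulldown(fields) == list(range(24)))  # True
```

In every group of five woven frames, two adjacent ones mix fields from different film frames. That gives one blurry episode per four film frames, or 6 per second.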

If you want to watch DVDs on a progressive monitor, make sure you have reverse 3:2 pull-down. De-interlacing is also called line doubling or line scaling.

120 Hertz technology

The best monitors show the image 120 times per second. 120 is an exact multiple of 24, 30, and 60. Thus in theory the monitor can show any program without introducing judder. But will the monitor actually do this? Consider the following questions:

1) When using the TV’s internal receiver:

a) Can the monitor detect a 24p program and display it without judder?

b) Can the monitor detect a 24 frames/sec program transmitted in 1080 60i and reorganize it for display without judder?

c) Are motion compensated processing and judder-free display simultaneously possible?

2) When taking input from DVI or HDMI:

a) Can the monitor accept 24p and display it without judder?

b) Can the monitor detect a 24 frames/sec program transmitted in 1080 60i or 60p and reorganize it for display without judder?

c) Are motion compensated processing and judder-free display simultaneously possible?

The answer to question 2c is generally no. Since the motion vectors are not available to the monitor, the DVD player would have to do the motion compensated processing. But monitors that can accept 1080 120p are very rare, and 60p will introduce judder. You will likely have to choose either motion compensated processing or judder-free display, whichever looks better on your system.

120 Hertz technology is often touted as a fix for the slowness of LCDs. This is a dubious claim. The improvement is usually minuscule, often not discernible.

Blocks and macro-blocks

A block is an 8-by-8 array of colorless pixels. Thus a block is 64 8-bit numbers.

A macro-block is a 16-by-16 array of complete pixels. A macro-block is made up of four Y blocks plus one Pr block and one Pb block.

A 720p image is 80 macro-blocks wide and 45 macro-blocks high. Each of these 3600 macro-blocks has an address. With each new frame, only the macro-blocks that change are transmitted.
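The macro-block counts follow directly from the image size. In this sketch the raster-order address (row by row, left to right) is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def macroblock_grid(width, height, size=16):
    """Return (columns, rows, total) of macro-blocks covering an image."""
    cols = math.ceil(width / size)
    rows = math.ceil(height / size)
    return cols, rows, cols * rows


def macroblock_address(col, row, cols):
    """Raster-order address of a macro-block (assumed numbering)."""
    return row * cols + col


print(macroblock_grid(1280, 720))    # (80, 45, 3600)
print(macroblock_grid(1920, 1080))   # (120, 68, 8160)
print(macroblock_address(79, 44, 80))  # 3599, the last one in a 720p image
```

Note that 1080 is not a multiple of 16, so the grid for a 1080-line image rounds up to 68 rows.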

In the transmitted data, a row or partial row of consecutive macro-blocks is called a slice.

Usually each pixel in a block is subtracted from the same pixel from the previous frame. Thus a transmitted block is a block of change values, and gets added to the image in the receiver. But if there is motion, the pixel is subtracted from a nearby pixel in the previous frame, and a motion vector is transmitted with each block. The objective here is to transmit as many zero-valued pixels as possible.
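A small sketch of block subtraction with and without a motion vector. Frames are plain lists of rows, the test image is a simple gradient shifted one pixel to the right, and the function name and the (dy, dx) vector convention are assumptions made for the example.

```python
def residual_block(cur, prev, top, left, motion=(0, 0), size=8):
    """Subtract a block of the previous frame from a block of the current one.

    motion is a (dy, dx) offset telling where in the previous frame
    the matching pixels are found.
    """
    dy, dx = motion
    return [[cur[top + y][left + x] - prev[top + y + dy][left + x + dx]
             for x in range(size)]
            for y in range(size)]


# Previous frame: a gradient. Current frame: the same image moved right 1 pixel.
prev_frame = [[10 * x + y for x in range(16)] for y in range(16)]
cur_frame = [[row[max(x - 1, 0)] for x in range(16)] for row in prev_frame]

plain = residual_block(cur_frame, prev_frame, top=4, left=4)
compensated = residual_block(cur_frame, prev_frame, top=4, left=4, motion=(0, -1))

print(plain[0])        # [-10, -10, -10, -10, -10, -10, -10, -10]
print(compensated[0])  # [0, 0, 0, 0, 0, 0, 0, 0]
```

With the right motion vector the whole block of change values becomes zero, which is exactly what the later compression steps handle best.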

Next each block (64 8-bit numbers) is further compressed by the following three processes:

1. A Discrete Cosine Transform (DCT) is performed on the block. This process shrinks the data a lot if the block data is not truly random. The transmitted data consists of the resulting transform coefficients, not the pixel values.

2. A Quantization process is performed. This process might be described as throwing away low-order bits of the transform coefficients. It is a lossy process, which means it degrades the image somewhat. But the degree of quantization is selectable. Thus more precision can be thrown away when image motion causes the process to fall behind.

3. Variable length encoding. This assigns very short codes to common values, but very long codes to uncommon values. The block subtraction, DCT, and Quantization together result in a large number of the transform coefficients being zero or very simple. Thus variable length coding compresses these values to almost nothing.
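The quantization step can be sketched as a divide-and-round. This is a simplification: it uses one step size for every coefficient, whereas MPEG-2 encoders weight each coefficient differently. The coefficient values below are made up for the example.

```python
def quantize(coefficients, step):
    """Divide each coefficient by the step size and round (lossy)."""
    return [round(c / step) for c in coefficients]


def dequantize(levels, step):
    """Receiver side: scale the levels back up."""
    return [level * step for level in levels]


coefficients = [312.0, -45.5, 12.2, -3.9, 1.4, -0.8, 0.3, 0.1]

fine = quantize(coefficients, 8)
coarse = quantize(coefficients, 32)

print(fine)                 # [39, -6, 2, 0, 0, 0, 0, 0]
print(coarse.count(0))      # 6 zeros with the coarser step
print(dequantize(fine, 8))  # [312, -48, 16, 0, 0, 0, 0, 0]
```

A larger step produces more zeros and so fewer bits after variable length encoding, at the cost of a less accurate picture.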

The process is slightly more complicated when interlaced images are sent.

This description of MPEG-2 has been extremely brief. There is a more detailed description on the BBC website.

The DCT further explained

A discrete cosine transform is a lot like a Fourier transform. A Fourier transform converts a time function into a frequency function. A DCT converts a spatial function into a “spatial frequencies” function. It converts 64 pixel values into 64 DCT coefficients.

In theory, each DCT coefficient is computed by the formula

Thus all 64 pixels make a contribution to each DCT coefficient. These transforms are reversible. The receiver must perform an Inverse DCT on the coefficients to obtain the 64 pixel values.
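For reference, the 8-by-8 DCT and its inverse are conventionally written as below. This is the standard textbook form, not necessarily the exact notation used elsewhere; the symbols f(x,y) for the pixel values, F(u,v) for the coefficients, and C(k) for the scale factor are chosen here for illustration.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7}
         f(x,y)\, \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16}

f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16}

C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```

The double sum runs over all 64 pixels for every (u,v), which is why each pixel contributes to each coefficient.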

A complete understanding of what transforms do and how they do it is a challenging mathematical topic. But consider this simplification. In the following diagram, a line of 8 pixels is shown along with their 1-dimensional DCT coefficients.

Only two coefficients are non-zero, a considerable reduction in data. Coefficient c0 is called the D.C. coefficient because it represents the average height of the 8 pixels. In the receiver the other 7 coefficients would specify the magnitudes of 7 cosine waves. The receiver will just add together those cosines and the D.C., and the result is exactly the original 8 pixels.

If the 8 pixel values were completely random then the coefficients would be too, and there would be no data reduction. But common images are not random, so there are usually more zeroes in the transform than in the pixels.
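A 1-dimensional DCT of 8 pixels can be tried out directly. The sketch below uses the common orthonormal scaling, which is an assumption (other scalings differ only by constant factors), and the pixel values are invented: a constant level plus one smooth cosine-shaped ramp.

```python
import math


def dct_1d(pixels):
    """Forward DCT of a line of pixels (orthonormal scaling)."""
    n = len(pixels)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(p * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                    for i, p in enumerate(pixels))
        coefficients.append(scale * total)
    return coefficients


def idct_1d(coefficients):
    """Inverse DCT: add up the scaled cosine waves and the D.C. term."""
    n = len(coefficients)
    pixels = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coefficients):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        pixels.append(total)
    return pixels


pixels = [100 + 40 * math.cos((2 * i + 1) * math.pi / 16) for i in range(8)]
coefficients = dct_1d(pixels)
nonzero = [k for k, c in enumerate(coefficients) if abs(c) > 1e-9]
print(nonzero)  # [0, 1]: only c0 and c1 are non-zero
```

With this scaling c0 is proportional to the average of the 8 pixels, and running idct_1d on the coefficients returns the original pixel values to within rounding error.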

Here, for your consideration, is an assortment of 8x8 pixel blocks and their DCT coefficients:

8 VSB (8-level vestigial sideband, used in North America)

ATSC and 8-VSB are defined by document A/53 on the ATSC website. The video data is put through the following sequence of processes:

All of these steps are reversible. To recover the original data the receiver must reverse them all in reverse order.

Service multiplexer

This produces a stream of data consisting of video packets, audio packets, and ancillary data packets. This stream is called an MPEG-2 transport stream and is compatible with the streams produced by DVD, satellite, and cable systems.

Ancillary data includes

1. Closed caption data for the hearing impaired

2. PSIP data (explained below)

3. Data-casting (business opportunities not related to TV broadcasting)

Transport

Null packets are added to the data stream as necessary to make the data rate a constant 19.28 megabits per second. The data stream is re-divided into data segments that are all 187 data bytes long. This output is called the payload.

Data randomizer

All of the payload data is randomized by exclusive-ORing it with a pseudo-random data pattern. This is done to keep the spectrum of the transmitted signal flat.
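The principle can be sketched with a small shift-register generator. The 16-bit register, its feedback taps, and the starting value below are illustrative choices only, not the generator defined in the ATSC standard.

```python
def pseudo_random_bytes(count, state=0xACE1):
    """Generate bytes from a 16-bit linear feedback shift register.

    The taps and seed are illustrative, not those of ATSC A/53.
    """
    output = []
    for _ in range(count):
        byte = 0
        for _ in range(8):
            bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (bit << 15)
            byte = (byte << 1) | (state & 1)
        output.append(byte)
    return output


def randomize(data, seed=0xACE1):
    """Exclusive-OR the data with the pseudo-random pattern."""
    pattern = pseudo_random_bytes(len(data), seed)
    return bytes(d ^ p for d, p in zip(data, pattern))


payload = bytes(187)                 # a worst case: 187 zero bytes
scrambled = randomize(payload)
restored = randomize(scrambled)      # the same operation undoes itself
print(restored == payload)           # True
```

Because exclusive-OR is its own inverse, the receiver recovers the payload by running the identical generator from the same starting value.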

Reed-Solomon encoder

Twenty Reed-Solomon bytes are added to the end of each data segment, so the segments now have 207 bytes. These added bytes are for error correction of data corrupted during transmission. This is also called Forward Error Correction (FEC). The receiver can correct up to 10 erroneous bytes per data segment.

Convolutional interleaver

Next the data segments are grouped into groups of 52 segments. The bytes of each segment are moved to different segments, distributed evenly among the group of 52.

Suppose during transmission a long string of consecutive bits is corrupted. When the receiver puts the bytes back in their correct order, this long error burst is broken up into many short errors, which can all be fixed by Reed-Solomon error correction. Thus ATSC is unaffected by most impulse noise, a common type of interference. Noise bursts of up to 193 microseconds will be fully corrected.
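The effect can be demonstrated with a simplified model. The sketch below uses a plain block interleaver (bytes sent column by column across 52 segments) rather than the actual convolutional interleaver, and the burst position is arbitrary; it only shows why spreading the bytes helps.

```python
SEGMENTS = 52
SEGMENT_BYTES = 207


def interleave(segments):
    """Send byte 0 of every segment, then byte 1 of every segment, etc."""
    depth, width = len(segments), len(segments[0])
    return [segments[i][j] for j in range(width) for i in range(depth)]


def deinterleave(stream, depth):
    """Put the bytes back into their original segments."""
    width = len(stream) // depth
    return [[stream[j * depth + i] for j in range(width)] for i in range(depth)]


segments = [[(i, j) for j in range(SEGMENT_BYTES)] for i in range(SEGMENTS)]
stream = interleave(segments)

# Corrupt a burst of 520 consecutive transmitted bytes.
burst_start, burst_length = 1000, 520
for t in range(burst_start, burst_start + burst_length):
    stream[t] = None

received = deinterleave(stream, SEGMENTS)
worst = max(segment.count(None) for segment in received)
print(worst)  # 10 bad bytes in the worst segment

# 520 bytes at 2 data bits per symbol and 10.76 million symbols per second:
burst_microseconds = burst_length * 8 / 2 / 10.76e6 * 1e6
print(round(burst_microseconds))  # 193
```

A 520-byte burst leaves no segment with more than 10 bad bytes, which is the most that the 20 Reed-Solomon bytes can repair. Ignoring sync symbols, that burst lasts about 193 microseconds.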

Trellis encoder

Trellis coding is currently the best method known for sending digital data in a channel containing Gaussian (white) noise. The improvement is equivalent to using four times as much transmitter power (assuming the receiver is well designed). First the data stream is divided into symbols. Initially each symbol is two bits. Then the trellis coder recodes the symbol (folding in some previous data) and adds a third bit. With all the data added to the segments, the data rate is now over 32 megabits per second.

VSB modulator

Next the 3 bits of each symbol are converted into an 8-level signal. A 4-symbol sync is added to each segment.

After every group of 312 data segments a 313th segment is inserted. This 313th segment is a fixed pattern that the receiver will look for (to know where a 52-segment group starts). The symbol rate is 10.76 million symbols per second.
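The numbers quoted in the last few sections can be tied together with some arithmetic. This is a sketch that assumes the 313th (sync) segment is the same length as a data segment; the variable names are invented for the example.

```python
SYMBOL_RATE = 10.76e6      # symbols per second
PAYLOAD_BYTES = 187        # data bytes per segment
PARITY_BYTES = 20          # Reed-Solomon bytes per segment

segment_bytes = PAYLOAD_BYTES + PARITY_BYTES    # 207
data_symbols = segment_bytes * 8 // 2           # 2 data bits per symbol: 828
segment_symbols = data_symbols + 4              # plus 4-symbol sync: 832

segments_per_sec = SYMBOL_RATE / segment_symbols
data_segments_per_sec = segments_per_sec * 312 / 313   # every 313th is sync

payload_rate = data_segments_per_sec * PAYLOAD_BYTES * 8
gross_rate = SYMBOL_RATE * 3                    # 3 bits per trellis-coded symbol
correctable_bytes = PARITY_BYTES // 2           # 10 per segment

print(segment_symbols)                # 832
print(f"{payload_rate / 1e6:.2f}")    # 19.29
print(f"{gross_rate / 1e6:.2f}")      # 32.28
```

The payload works out to about 19.29 megabits per second, matching the 19.28 figure quoted earlier to within rounding, and the gross rate of 32.28 megabits per second is the "over 32" mentioned under the trellis encoder.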

Multiplying this 8-level signal by a high frequency sine wave will result in AM modulation. If the 8 levels are as often negative as positive, the resulting AM will have no carrier. To prevent this, a small DC level is added to the 8-level signal. The resulting small carrier is called a pilot.

Finally, filters are used to remove all but the carrier and the first 6 megahertz of the upper sideband.

The diagrams above show the average signal density for each channel. The mathematics behind amplitude modulation will not be explained here. The diagrams are presented for the benefit of people who already have a working knowledge of AM.

RF up-converter and transmitter

This step is the same for ATSC as for NTSC. The intermediate frequency signal is converted to the final frequency, boosted to a high voltage, and sent to the antenna.

_______________________________________

PSIP Data (Program and System Information Protocol)

PSIP data is ancillary data, either binary data or text (never audio or video data). Some PSIP data is essential, but most is just helpful. PSIP text employs Huffman coding, a variable-length code for characters. (The most common characters get the shortest codes.)
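How a Huffman code hands out short codes to common characters can be seen in a short sketch. It builds a code from the letter counts of one sample string, which is an assumption made for illustration; PSIP uses code tables fixed in the standard rather than building one per message.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a Huffman code (character -> bit string) for the given text.

    Assumes the text contains at least two different characters.
    """
    counts = Counter(text)
    # Each heap entry: (total count, tie-breaker, {character: code so far})
    heap = [(count, index, {char: ""})
            for index, (char, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)   # two rarest groups
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {char: "0" + code for char, code in codes_a.items()}
        merged.update({char: "1" + code for char, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]


title = "the evening news at eleven"
codes = huffman_code(title)
encoded = "".join(codes[char] for char in title)
print(len(encoded), "bits instead of", 8 * len(title))
```

The letter "e", the most frequent character in the sample, ends up with a code no longer than any other character's.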

The data is arranged as tables with optional sub-tables. The four PSIP tables are:

1. Virtual Channel Table – This table lists all the sub-channels and their attributes. This table is transmitted 2.5 times per second. The table includes:

A. The two-part channel number

B. The sub-channel short name (up to 7 characters)

C. The associated NTSC channel number

D. The FCC-issued TSID (a signal ID, not the call sign)

E. The MPEG-2 program number

F. The type of service (TV, audio only, data only)

G. A link to the EITs for this sub-channel

H. All the packet IDs (PIDs)

2. System Time Table – This small table just contains the current time. This table is transmitted once per second.

3. Rating Region Table – This small table names the program rating system in use. In the U.S. this would be the TVPG system. This table is transmitted once per minute.

4. Master Guide Table – This table links to the sub-tables. This table is transmitted 7 times per second. The sub-tables are:

A. Event Information Tables: EIT-0, EIT-1, EIT-2, … EIT-127. Each EIT covers a three-hour period, and describes all the programs (events) for that period. EIT tables 0-3, which are required, will describe 12 hours of events. EIT tables 4-127, which are optional, will extend this description to 16 days. There will be multiple EITs 0-127 if there are multiple sub-channels. The EIT contains the following information for each event:

1. Event start time.

2. Event duration

3. Event title

4. Pointer to an ETT describing the program

5. Program content advisory

6. Caption options

7. Audio options

The required repetition rates are:

· EIT-0 is sent twice per second

· EIT-1 is sent once every 3 seconds

· EIT-2 to EIT-127 are sent once every minute

B. Extended Text Tables – Each ETT describes a program. Out of the EIT and ETT tables the receiver can build a complete Program Guide for the channel. ETTs can also be used to describe the purpose of a sub-channel.
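The coverage figures for the EITs are simple arithmetic, sketched here. The function names are hypothetical, and the index lookup assumes that hours are counted from the start of the period covered by EIT-0.

```python
HOURS_PER_EIT = 3


def hours_covered(eit_count):
    """Total hours described by the first eit_count tables."""
    return eit_count * HOURS_PER_EIT


def eit_index(hours_ahead):
    """Which EIT describes a program starting this many hours ahead."""
    return int(hours_ahead // HOURS_PER_EIT)


print(hours_covered(4))          # 12 hours from the required EIT-0 to EIT-3
print(hours_covered(128) / 24)   # 16.0 days from EIT-0 to EIT-127
print(eit_index(5))              # 1: five hours ahead falls in EIT-1
```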

The PSIP standard is document A/65C on the ATSC website.

If you actually read all of this, consider yourself an Honorary Engineer.

This page is part of “An HDTV Primer”, which starts at www.hdtvprimer.com
