Fixing the wrong duration of ffmpeg-generated MP3 files on iOS

I recently ran into a problem: MP3 files generated by ffmpeg sometimes show the wrong duration on iOS.

The conversion command is:

ffmpeg -i ${INPUT_FILE} -c:a mp3 -ab 32k -ar 44100 -ac 1 ${OUT_FILE}

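The file's actual duration is 00:19:14. Players on Windows and Android both display that length correctly, but Safari on iOS always reports something like 00:10:59. Very strange, and I couldn't figure out why.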

So what is causing this?

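I searched online for how the duration of a CBR MP3 is calculated and learned that it depends on the headers of the MP3 audio frames. So let's look in detail at what is actually in the file, using hexdump to walk through its structure step by step.

First comes the ID3 header, 10 bytes in total. Its fourth byte is 04, which shows that ffmpeg 3.2.4 writes ID3v2.4 by default. The last four bytes of the header are 00 00 00 23. Decoding them with the algorithm from the specification (a "syncsafe" integer, 7 bits per byte) gives a tag body of 35 bytes, not counting the header. The whole tag is therefore 45 bytes.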

$ hexdump -C -n 10 mod3.mp3
00000000  49 44 33 04 00 00 00 00  00 23                    |ID3......#|
0000000a
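
The size calculation above can be sketched in Python. This is a minimal illustration, and the function name `parse_id3v2_header` is hypothetical. ID3v2 stores the tag size as a syncsafe integer: each of the four size bytes contributes only its low 7 bits, so `00 00 00 23` decodes to 0x23 = 35.

```python
def parse_id3v2_header(header: bytes) -> tuple:
    """Return (major_version, body_size, total_size) for a 10-byte ID3v2 header."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 header")
    major = header[3]  # 3 -> ID3v2.3, 4 -> ID3v2.4
    # Bytes 6..9 hold a syncsafe integer: 4 x 7 significant bits.
    body_size = 0
    for b in header[6:10]:
        body_size = (body_size << 7) | (b & 0x7F)
    # The size excludes the 10-byte header itself.
    total_size = 10 + body_size
    # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
    if major == 4 and header[5] & 0x10:
        total_size += 10
    return major, body_size, total_size


print(parse_id3v2_header(bytes.fromhex("49 44 33 04 00 00 00 00 00 23")))  # (4, 35, 45)
```

The total of 45 bytes is the offset at which the first MPEG frame starts.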

Next, look at the header of the first MP3 frame: the 4 bytes starting at offset 45 (0x2d), right after the ID3 tag.

$ hexdump -C -s 45 -n 4 mod3.mp3
0000002d  ff fb 40 c0                                       |..@.|

Running this first frame header through an online MP3 frame-header decoder gives:

Correct MP3 frame header
MPEG Version	MPEG1
MPEG Layer	Layer III
BitRate	56 kb/s
SampleRate	44100 Hz
Channel mode	Single channel (Mono)
Emphasis	none
Options	
FrameSize	182
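
The same decoding can be done locally instead of through a website. The sketch below handles only MPEG-1 Layer III headers, which is all this file contains; the function name is hypothetical. The bitrate and sample-rate tables are the standard ones for that version and layer. The frame size uses 144 = 1152 samples / 8 bits.

```python
# Standard MPEG-1 Layer III tables (index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
CHANNEL_MODES = ["Stereo", "Joint stereo", "Dual channel", "Single channel (Mono)"]


def decode_mp3_header(raw: bytes) -> dict:
    """Decode a 4-byte MPEG-1 Layer III frame header."""
    h = int.from_bytes(raw[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("no frame sync")
    version_bits = (h >> 19) & 0b11  # 0b11 = MPEG-1
    layer_bits = (h >> 17) & 0b11    # 0b01 = Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("only MPEG-1 Layer III is handled in this sketch")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid bitrate/sample rate")
    padding = (h >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "padding": padding,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        "original": (h >> 2) & 1,
        # 1152 samples per frame -> 144 * bitrate / sample_rate bytes, plus padding.
        "frame_size": 144 * bitrate * 1000 // sample_rate + padding,
    }


print(decode_mp3_header(bytes.fromhex("ff fb 40 c0")))
# -> bitrate_kbps 56, sample_rate 44100, Single channel (Mono), frame_size 182
```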

Interesting! That is not the bitrate I set (32k)! Since FrameSize is 182, let's dump the full 182 bytes of this frame:

$ hexdump -C -s 45 -n 182 mod3.mp3
0000002d  ff fb 40 c0 00 00 00 00  00 00 00 00 00 00 00 00  |..@.............|
0000003d  00 00 00 00 00 49 6e 66  6f 00 00 00 0f 00 00 ac  |.....Info.......|
0000004d  ae 00 46 7b fa 00 02 05  08 0a 0d 0f 12 14 17 19  |..F{............|
0000005d  1c 1f 21 24 26 29 2b 2e  30 33 36 38 3b 3d 40 42  |..!$&)+.0368;=@B|
0000006d  45 47 4a 4d 4f 52 54 57  59 5c 5e 61 64 67 69 6b  |EGJMORTWY\^adgik|
0000007d  6e 70 73 75 78 7b 7e 80  82 85 87 8a 8c 8f 92 94  |npsux{~.........|
0000008d  97 9a 9c 9e a1 a3 a6 a9  ab ae b1 b3 b5 b8 ba bd  |................|
0000009d  bf c2 c5 c8 ca cd cf d1  d4 d6 d9 dc df e1 e4 e6  |................|
000000ad  e8 eb ed f0 f3 f6 f8 fb  fd 00 00 00 00 4c 61 76  |.............Lav|
000000bd  63 35 37 2e 36 34 00 00  00 00 00 00 00 00 00 00  |c57.64..........|
000000cd  00 00 24 02 ec 00 00 00  00 00 46 7b fa 80 83 c1  |..$.......F{....|
000000dd  8a 00 00 00 00 00                                 |......|
000000e3
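
In hindsight, this first frame carries no audio at all. The ASCII column shows "Info" at offset 0x42. That is right after the 4-byte header and the 17 bytes of side information that an MPEG-1 mono frame has. By convention, LAME-style encoders write "Info" for CBR files and "Xing" for VBR files.

A minimal sketch of parsing such a block follows. The function name is hypothetical, and only the frame-count and byte-count fields are decoded. The input bytes are copied from the dump.

```python
import struct
from typing import Optional


def parse_xing_info(frame: bytes) -> Optional[dict]:
    """Parse the Xing/Info block of an MPEG-1 Layer III frame, if it has one."""
    mono = ((frame[3] >> 6) & 0b11) == 0b11
    # The block follows the 4-byte header and the side information:
    # 17 bytes for MPEG-1 mono, 32 bytes for MPEG-1 stereo.
    offset = 4 + (17 if mono else 32)
    tag = frame[offset:offset + 4]
    if tag not in (b"Xing", b"Info"):
        return None
    (flags,) = struct.unpack(">I", frame[offset + 4:offset + 8])
    result = {"tag": tag.decode("ascii"), "flags": flags}
    pos = offset + 8
    if flags & 0x1:  # total number of frames is present
        (result["frames"],) = struct.unpack(">I", frame[pos:pos + 4])
        pos += 4
    if flags & 0x2:  # total number of bytes is present
        (result["bytes"],) = struct.unpack(">I", frame[pos:pos + 4])
        pos += 4
    # flags 0x4 (100-byte seek TOC) and 0x8 (quality) are not decoded here.
    return result


# First 37 bytes of the frame at offset 0x2d, copied from the dump above.
first_frame = bytes.fromhex(
    "ff fb 40 c0 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 49 6e 66 6f 00 00 00 0f 00 00 ac "
    "ae 00 46 7b fa"
)
info = parse_xing_info(first_frame)
print(info)  # {'tag': 'Info', 'flags': 15, 'frames': 44206, 'bytes': 4619258}

seconds = info["frames"] * 1152 / 44100  # 1152 samples per MPEG-1 Layer III frame
print(f"{int(seconds) // 60:02d}:{int(seconds) % 60:02d}")  # 19:14
```

The frame count alone already yields the correct 19:14, so the file itself records its true length.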

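At first glance this looks like a jumble of bytes, although the ASCII column does contain the strings "Info" and "Lavc57.64". Let's move on to the next frame, which starts at 45 + 182 = 227: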

$ hexdump -C -s 227 -n 4 mod3.mp3
000000e3  ff fb 10 c4                                       |....|
000000e7

This header differs from the first frame's. Decoding it to see what it says:

Correct MP3 frame header
MPEG Version	MPEG1
MPEG Layer	Layer III
BitRate	32 kb/s
SampleRate	44100 Hz
Channel mode	Single channel (Mono)
Emphasis	none
Options	Original bit (only informative)
FrameSize	104

This time the values are correct. Next, the header of the third frame, at 227 + 104 = 331:

$ hexdump -C -s 331 -n 4 mod3.mp3
0000014b  ff fb 12 c4                                       |....|
0000014f

Decoding it:

Correct MP3 frame header
MPEG Version	MPEG1
MPEG Layer	Layer III
BitRate	32 kb/s
SampleRate	44100 Hz
Channel mode	Single channel (Mono)
Emphasis	none
Options	Padding bit, Original bit (only informative)
FrameSize	105

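It differs from the second frame, but the bitrate and sample rate are the same. Only the padding bit is set, which makes this frame one byte longer. I then checked a few more frames: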
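
The frame sizes reported by the decoder follow the usual MPEG-1 Layer III formula. There are 1152 samples per frame, hence the factor 1152 / 8 = 144. At 32 kb/s and 44.1 kHz a frame works out to about 104.49 bytes. As far as we can tell, the encoder therefore sets the padding bit on roughly every other frame to keep the average bitrate exact.

```latex
\begin{aligned}
\text{FrameSize} &= \left\lfloor \frac{144 \times \text{BitRate}}{\text{SampleRate}} \right\rfloor + \text{Padding} \\
56\ \text{kb/s:}\quad \left\lfloor \frac{144 \times 56000}{44100} \right\rfloor &= \lfloor 182.86 \rfloor = 182 \\
32\ \text{kb/s:}\quad \left\lfloor \frac{144 \times 32000}{44100} \right\rfloor &= \lfloor 104.49 \rfloor = 104
  \;\Rightarrow\; 104\ (\text{Padding}=0),\ 105\ (\text{Padding}=1)
\end{aligned}
```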

$ hexdump -C -s 331 -n 4 mod3.mp3
0000014b  ff fb 12 c4                                       |....|
0000014f
$ hexdump -C -s 436 -n 4 mod3.mp3
000001b4  ff fb 10 c4                                       |....|
000001b8
$ hexdump -C -s 540 -n 4 mod3.mp3
0000021c  ff fb 12 c4                                       |....|
00000220
$ hexdump -C -s 645 -n 4 mod3.mp3
00000285  ff fb 10 c4                                       |....|
00000289

They simply alternate between ff fb 10 c4 and ff fb 12 c4.
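
Rather than computing each offset by hand for hexdump, the frames can be walked programmatically. Each header gives the length of its frame, and that length is the distance to the next header.

This is a minimal sketch. It assumes MPEG-1 Layer III frames throughout, and `walk_frames` and `frame_length` are hypothetical names. The default start offset of 45 is specific to this file's ID3 tag.

```python
import sys

BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]


def frame_length(header: bytes) -> int:
    """Length in bytes of an MPEG-1 Layer III frame, computed from its header."""
    h = int.from_bytes(header[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("lost frame sync")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported header")
    padding = (h >> 9) & 1
    return 144 * bitrate * 1000 // sample_rate + padding


def walk_frames(data: bytes, start: int, count: int):
    """Yield (offset, header_hex, length) for up to `count` consecutive frames."""
    offset = start
    for _ in range(count):
        if offset + 4 > len(data):
            break
        header = data[offset:offset + 4]
        length = frame_length(header)
        yield offset, header.hex(" "), length
        offset += length  # the next frame starts right after this one


if __name__ == "__main__" and len(sys.argv) > 1:
    start = int(sys.argv[2]) if len(sys.argv) > 2 else 45
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for offset, header, length in walk_frames(data, start, 6):
        print(offset, header, length)
```

Run with `mod3.mp3` as the first argument. It should list the offsets used in the hexdump commands above (45, 227, 331, 436, 540, 645). The lengths should alternate between 104 and 105 after the 182-byte first frame.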

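So could the wrong duration actually be caused by the first audio frame? To check, I downloaded a random song from the internet, whose duration iOS reads correctly. Looking at the structure of that normal file, I found that its frames declare the correct bitrate and sample rate right from the very first one. Mystery solved.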
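
The numbers are consistent with this explanation. Suppose a player estimates a CBR file's duration as bytes × 8 ÷ bitrate, and takes the bitrate from the first frame. Then the 56 kb/s Info frame shortens the result by a factor of 32/56.

This is our inference from the values in the dumps, not documented Safari behaviour. The byte count is the one stored in the Info frame. The 45-byte ID3 tag would not change the result at this resolution.

```python
def mmss(seconds: float) -> str:
    """Format seconds as MM:SS, dropping the fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Values taken from the Info frame dump above.
FRAMES = 44206            # 0x0000acae
AUDIO_BYTES = 4619258     # 0x00467bfa
SAMPLE_RATE = 44100
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III

exact = FRAMES * SAMPLES_PER_FRAME / SAMPLE_RATE
# Hypothesis: CBR estimate = bytes * 8 / bitrate, with the bitrate read from frame 1.
estimate_first_frame = AUDIO_BYTES * 8 / 56_000  # Info frame header says 56 kb/s
estimate_real_rate = AUDIO_BYTES * 8 / 32_000    # audio frames say 32 kb/s

print("from frame count:", mmss(exact))                # 19:14
print("bytes / 56 kb/s: ", mmss(estimate_first_frame)) # 10:59
print("bytes / 32 kb/s: ", mmss(estimate_real_rate))   # 19:14
```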

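So how can this be fixed? We need ffmpeg to produce a correct first audio frame. First, I checked whether ffmpeg has a relevant option. In the official ffmpeg documentation, searching the MP3-related parts, I found the following in the mp3 muxer section: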

The MP3 muxer writes a raw MP3 stream with the following optional features:

An ID3v2 metadata header at the beginning (enabled by default). Versions 2.3 and 2.4 are supported, the id3v2_version private option controls which one is used (3 or 4). Setting id3v2_version to 0 disables the ID3v2 header completely. The muxer supports writing attached pictures (APIC frames) to the ID3v2 header. The pictures are supplied to the muxer in form of a video stream with a single packet. There can be any number of those streams, each will correspond to a single APIC frame. The stream metadata tags title and comment map to APIC description and picture type respectively. See http://id3.org/id3v2.4.0-frames for allowed picture types. Note that the APIC frames must be written at the beginning, so the muxer will buffer the audio frames until it gets all the pictures. It is therefore advised to provide the pictures as soon as possible to avoid excessive buffering.

A Xing/LAME frame right after the ID3v2 header (if present). It is enabled by default, but will be written only if the output is seekable. The write_xing private option can be used to disable it. The frame contains various information that may be useful to the decoder, like the audio duration or encoder delay.

A legacy ID3v1 tag at the end of the file (disabled by default). It may be enabled with the write_id3v1 private option, but as its capabilities are very limited, its usage is not recommended.

Examples:

Write an mp3 with an ID3v2.3 header and an ID3v1 footer:

ffmpeg -i INPUT -id3v2_version 3 -write_id3v1 1 out.mp3

To attach a picture to an mp3 file select both the audio and the picture stream with map:

ffmpeg -i input.mp3 -i cover.png -c copy -map 0 -map 1 -metadata:s:v title="Album cover" -metadata:s:v comment="Cover (Front)" out.mp3

Write a “clean” MP3 without any extra features:

ffmpeg -i input.wav -write_xing 0 -id3v2_version 0 out.mp3
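
The Xing/LAME frame described here is the "Info" frame found at offset 45. Its size plausibly explains the 56 kb/s, based on our reading of the dump:
- 4-byte header
- 17 bytes of mono side info
- 120-byte Xing/Info block ("Info", flags, frame count, byte count, 100-byte TOC, quality)
- 36-byte LAME-style block starting with "Lavc57.64"

That is 177 bytes, which does not fit in a 104-byte 32 kb/s frame. The sketch below shows that 56 kb/s is the lowest MPEG-1 Layer III bitrate whose frame can hold it. That the muxer bumps the bitrate this way is an assumption consistent with the dump, not checked against the ffmpeg source. The helper names are hypothetical.

```python
# MPEG-1 Layer III bitrates in kb/s (free format and invalid indices omitted).
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]

# Bytes used in the Info frame, as read from the dump (0x2d .. 0xdd).
PAYLOAD = 4 + 17 + 120 + 36  # = 177


def frame_size(bitrate_kbps: int, sample_rate: int = 44100) -> int:
    """Unpadded MPEG-1 Layer III frame size in bytes."""
    return 144 * bitrate_kbps * 1000 // sample_rate


def smallest_fitting_bitrate(payload: int, sample_rate: int = 44100) -> int:
    """Lowest bitrate whose frame is large enough to hold `payload` bytes."""
    for kbps in BITRATES_KBPS:
        if frame_size(kbps, sample_rate) >= payload:
            return kbps
    raise ValueError("payload does not fit in any frame")


for kbps in (32, 40, 48, 56):
    print(kbps, "kb/s ->", frame_size(kbps), "bytes")  # 104, 130, 156, 182
print("smallest fitting bitrate:", smallest_fitting_bitrate(PAYLOAD), "kb/s")  # 56
```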

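So why not try generating a "clean" MP3? I tried it, and sure enough, the very first audio frame now declares the correct bitrate and sample rate. Since the culprit is only the Xing frame, disabling just that (and keeping the ID3v2 header) is enough. The solution: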

ffmpeg -i ${INPUT_FILE} -c:a mp3 -ab 32k -ar 44100 -ac 1 -write_xing 0 ${OUT_FILE}
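
To confirm the fix on a generated file, a small check can skip the ID3 tag and report the first frame's bitrate. It also reports whether that frame carries a Xing/Info block. This is a sketch assuming MPEG-1 Layer III output; `first_frame_report` is a hypothetical name.

For the original file it should report a 56 kb/s frame with an Info block. After `-write_xing 0`, the first frame should report 32 kb/s and no block. As the quoted docs note, dropping the Xing frame also drops the stored duration and encoder-delay information. That mainly matters for VBR files and gapless playback rather than CBR output like this.

```python
import sys

BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]


def first_frame_report(data: bytes) -> tuple:
    """Return (offset, bitrate_kbps, has_xing_or_info) for the first MPEG-1 Layer III frame."""
    offset = 0
    if data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:  # syncsafe tag size
            size = (size << 7) | (b & 0x7F)
        offset = 10 + size
        if data[3] == 4 and data[5] & 0x10:  # optional ID3v2.4 footer
            offset += 10
    h = int.from_bytes(data[offset:offset + 4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError(f"no frame sync at offset {offset}")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    mono = ((h >> 6) & 0b11) == 0b11
    tag_at = offset + 4 + (17 if mono else 32)  # MPEG-1 side-info length
    has_tag = data[tag_at:tag_at + 4] in (b"Xing", b"Info")
    return offset, bitrate, has_tag


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(first_frame_report(f.read()))
```

Pass the MP3 path as the only argument.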

The lesson: read the official documentation more often...

I also searched the ffmpeg ticket tracker and found that this is a long-standing bug, but it has been marked as wontfix...