I recently ran into a problem: MP3 files generated by ffmpeg report the wrong duration on iOS.
The conversion command is:
```shell
ffmpeg -i "${INPUT_FILE}" -c:a mp3 -ab 32k -ar 44100 -ac 1 "${OUT_FILE}"
```
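The file's actual duration is 00:19:14. Players on Windows and Android both show the correct length, but Safari on iOS always reports something like 00:10:59. Weird, and I couldn't make sense of it.
So what is going on?
I looked up how the duration of a CBR MP3 is calculated, and it turns out to depend on the headers of the MP3 audio frames: for a constant-bitrate file, a player can estimate the duration as audio bytes × 8 / bitrate, with the bitrate read from a frame header. So let's look at what is actually in the file, walking through its structure step by step with hexdump. First comes the ID3 header, 10 bytes in total. Its fourth byte is 04, which shows that ffmpeg 3.2.4 writes ID3v2.4 by default. The header's last four bytes are 00 00 00 23. Decoded as the spec prescribes (a "syncsafe" integer, in which only the low 7 bits of each byte count), they give a tag size of 35 bytes excluding the header, so the whole ID3 tag is 45 bytes.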
```
$ hexdump -C -n 10 mod3.mp3
00000000  49 44 33 04 00 00 00 00  00 23                    |ID3......#|
0000000a
```
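The header arithmetic can be reproduced with a small sketch. It assumes the layout from the ID3v2.4 structure document ("ID3", two version bytes, one flags byte, four syncsafe size bytes); `parse_id3v2_header` is a hypothetical helper name, not part of any library.

```python
from typing import Tuple

def parse_id3v2_header(header: bytes) -> Tuple[int, int]:
    """Return (major_version, total_tag_size) for a 10-byte ID3v2 header."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 header")
    major = header[3]   # 3 -> ID3v2.3, 4 -> ID3v2.4
    flags = header[5]
    # The size is a 28-bit "syncsafe" integer: 4 bytes, low 7 bits of each.
    size = 0
    for b in header[6:10]:
        size = (size << 7) | (b & 0x7F)
    total = 10 + size   # the size field does not count the 10-byte header
    if major == 4 and flags & 0x10:   # ID3v2.4 footer flag: 10 more bytes at the end
        total += 10
    return major, total

print(parse_id3v2_header(bytes.fromhex("49 44 33 04 00 00 00 00 00 23")))  # (4, 45)
```

For 00 00 00 23 only the last byte is non-zero, so the size is simply 0x23 = 35. The 7-bit packing only starts to matter once the tag exceeds 127 bytes (for example, 00 00 02 01 means 2 × 128 + 1 = 257).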
Next, look at the header of the first MP3 frame: the 4 bytes starting at offset 45.
```
$ hexdump -C -s 45 -n 4 mod3.mp3
0000002d  ff fb 40 c0                                       |..@.|
```
Decoding this first frame header with an online MP3 header analyzer gives:
```
Correct MP3 frame header
MPEG Version    MPEG1
MPEG Layer      Layer III
BitRate         56 kb/s
SampleRate      44100 Hz
Channel mode    Single channel (Mono)
Emphasis        none
Options
FrameSize       182
```
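The website's decoding can also be reproduced offline. Below is a minimal sketch that handles only MPEG-1 Layer III, which is what ffmpeg produced here. The tables are the standard MPEG-1 Layer III bitrate and sample-rate tables. The frame size is 144 × bitrate / sample rate, rounded down, plus one byte if the padding bit is set. The function and class names are my own, for illustration only.

```python
from typing import NamedTuple

# MPEG-1 Layer III tables (bitrate index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
CHANNEL_MODES = ["Stereo", "Joint stereo", "Dual channel", "Single channel (Mono)"]

class FrameHeader(NamedTuple):
    bitrate_kbps: int
    sample_rate: int
    padding: bool
    channel_mode: str
    frame_size: int

def decode_mpeg1_layer3_header(h: bytes) -> FrameHeader:
    if len(h) < 4 or h[0] != 0xFF or (h[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync")
    version = (h[1] >> 3) & 0b11   # 0b11 = MPEG-1
    layer = (h[1] >> 1) & 0b11     # 0b01 = Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[h[2] >> 4]
    sample_rate = SAMPLE_RATES[(h[2] >> 2) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid bitrate/sample-rate index")
    padding = bool((h[2] >> 1) & 1)
    mode = CHANNEL_MODES[h[3] >> 6]
    # 1152 samples per frame -> 144 * bitrate / sample_rate bytes, +1 if padded
    size = 144 * bitrate * 1000 // sample_rate + padding
    return FrameHeader(bitrate, sample_rate, padding, mode, size)

for hx in ("fffb40c0", "fffb10c4", "fffb12c4"):
    print(hx, decode_mpeg1_layer3_header(bytes.fromhex(hx)))
```

For ff fb 40 c0 this gives 56 kb/s, 44100 Hz, mono and 144 × 56000 / 44100 = 182.86, rounded down to 182 bytes, the same as the website.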
Interesting: that is not the bitrate I asked for! Let's look at the contents of this frame:
```
$ hexdump -C -s 45 -n 182 mod3.mp3
0000002d  ff fb 40 c0 00 00 00 00  00 00 00 00 00 00 00 00  |..@.............|
0000003d  00 00 00 00 00 49 6e 66  6f 00 00 00 0f 00 00 ac  |.....Info.......|
0000004d  ae 00 46 7b fa 00 02 05  08 0a 0d 0f 12 14 17 19  |..F{............|
0000005d  1c 1f 21 24 26 29 2b 2e  30 33 36 38 3b 3d 40 42  |..!$&)+.0368;=@B|
0000006d  45 47 4a 4d 4f 52 54 57  59 5c 5e 61 64 67 69 6b  |EGJMORTWY\^adgik|
0000007d  6e 70 73 75 78 7b 7e 80  82 85 87 8a 8c 8f 92 94  |npsux{~.........|
0000008d  97 9a 9c 9e a1 a3 a6 a9  ab ae b1 b3 b5 b8 ba bd  |................|
0000009d  bf c2 c5 c8 ca cd cf d1  d4 d6 d9 dc df e1 e4 e6  |................|
000000ad  e8 eb ed f0 f3 f6 f8 fb  fd 00 00 00 00 4c 61 76  |.............Lav|
000000bd  63 35 37 2e 36 34 00 00  00 00 00 00 00 00 00 00  |c57.64..........|
000000cd  00 00 24 02 ec 00 00 00  00 00 46 7b fa 80 83 c1  |..$.......F{....|
000000dd  8a 00 00 00 00 00                                 |......|
000000e3
```
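At first glance this looks like a jumble of bytes, although the ASCII column does show an `Info` marker and the string `Lavc57.64` (the libavcodec version). Let's move on to the next audio frame, which starts at 45 + 182 = 227: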
```
$ hexdump -C -s 227 -n 4 mod3.mp3
000000e3  ff fb 10 c4                                       |....|
000000e7
```
This header differs from the first one. Let's decode it:
```
Correct MP3 frame header
MPEG Version    MPEG1
MPEG Layer      Layer III
BitRate         32 kb/s
SampleRate      44100 Hz
Channel mode    Single channel (Mono)
Emphasis        none
Options         Original bit (only informative)
FrameSize       104
```
This time the bitrate is the one I set. Now the header of the third frame, at 227 + 104 = 331:
```
$ hexdump -C -s 331 -n 4 mod3.mp3
0000014b  ff fb 12 c4                                       |....|
0000014f
```
Decoding it:
```
Correct MP3 frame header
MPEG Version    MPEG1
MPEG Layer      Layer III
BitRate         32 kb/s
SampleRate      44100 Hz
Channel mode    Single channel (Mono)
Emphasis        none
Options         Padding bit
                Original bit (only informative)
FrameSize       105
```
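It differs from the second frame's header only in the padding bit, which makes this frame 105 bytes instead of 104. The bitrate and sample rate are the same. Next I checked a few more frames: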
```
$ hexdump -C -s 331 -n 4 mod3.mp3
0000014b  ff fb 12 c4                                       |....|
0000014f
$ hexdump -C -s 436 -n 4 mod3.mp3
000001b4  ff fb 10 c4                                       |....|
000001b8
$ hexdump -C -s 540 -n 4 mod3.mp3
0000021c  ff fb 12 c4                                       |....|
00000220
$ hexdump -C -s 645 -n 4 mod3.mp3
00000285  ff fb 10 c4                                       |....|
00000289
```
The headers simply alternate between ff fb 10 c4 and ff fb 12 c4.
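Following this chain by hand is tedious, so here is a sketch that hops from header to header. Each frame's size is computed from its header (144 × bitrate / sample rate, plus one byte of padding), and that size gives the offset of the next frame. For illustration it runs on synthetic data built from the header bytes seen in the dumps, with zero-filled payloads, not on the real file. `frame_size` and `walk_frames` are hypothetical helpers. The alternating padding bit is expected: a 32 kb/s frame at 44100 Hz averages 144 × 32000 / 44100 ≈ 104.49 bytes, so the encoder pads roughly every other frame to keep the average bitrate exact.

```python
from typing import List, Tuple

BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]

def frame_size(h: bytes) -> int:
    """Byte length of an MPEG-1 Layer III frame, computed from its 4-byte header."""
    if len(h) < 4 or h[0] != 0xFF or (h[1] & 0xFE) != 0xFA:  # sync, MPEG-1, Layer III
        raise ValueError(f"not an MPEG-1 Layer III header: {h.hex()}")
    bitrate = BITRATES_KBPS[h[2] >> 4]
    sample_rate = SAMPLE_RATES[(h[2] >> 2) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved header values")
    padding = (h[2] >> 1) & 1
    return 144 * bitrate * 1000 // sample_rate + padding

def walk_frames(data: bytes, start: int, limit: int = 10) -> List[Tuple[int, str, int]]:
    """Hop from frame to frame: each header's frame size gives the next offset."""
    frames = []
    pos = start
    while pos + 4 <= len(data) and len(frames) < limit:
        header = data[pos:pos + 4]
        size = frame_size(header)
        frames.append((pos, header.hex(), size))
        pos += size
    return frames

# Synthetic stand-in for mod3.mp3: 45 bytes where the ID3 tag sits, then frames
# carrying the header bytes from the hexdumps and zero-filled payloads.
buf = bytearray(45)
for hx in ("fffb40c0", "fffb10c4", "fffb12c4", "fffb10c4", "fffb12c4"):
    header = bytes.fromhex(hx)
    buf += header + bytes(frame_size(header) - 4)

for offset, header_hex, size in walk_frames(bytes(buf), start=45):
    print(offset, header_hex, size)  # offsets 45, 227, 331, 436, 540
```

The printed offsets are the ones passed to `hexdump -s` above. On the real file, passing its bytes (read in binary mode) to `walk_frames(data, start=45)` should follow the same chain.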
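So could the wrong duration be caused by the first audio frame? To check, I downloaded a random song from the web whose duration iOS reads correctly. In that file, the bitrate and sample rate are correct starting from the very first audio frame. Mystery solved.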
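The numbers support this. In the 182-byte first frame, `Info` is followed by a flags word (`00 00 00 0f`), then the frame count `00 00 ac ae` and the stream byte count `00 46 7b fa`, following the usual Xing tag layout. The sketch below assumes Safari estimates the duration as byte count × 8 / bitrate, taking the bitrate from the first frame header. This is my inference from the numbers, not documented behavior. The helper names are illustrative.

```python
SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer III
SAMPLE_RATE = 44100

# Fields of the "Info" tag in the first frame of mod3.mp3 (from the hexdump):
info_frames = 0x0000ACAE   # 44206 frames
info_bytes = 0x00467BFA    # 4619258 bytes

def hms(seconds: float) -> str:
    s = int(seconds)       # drop the fractional part
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

exact = info_frames * SAMPLES_PER_FRAME / SAMPLE_RATE   # from the frame count
cbr_32k = info_bytes * 8 / 32_000                        # bytes / real bitrate
cbr_56k = info_bytes * 8 / 56_000                        # bytes / first-frame bitrate

print(hms(exact), hms(cbr_32k), hms(cbr_56k))  # 00:19:14 00:19:14 00:10:59
```

Dividing by the real 32 kb/s reproduces the true 00:19:14. Dividing by the 56 kb/s advertised in the first header gives exactly the 00:10:59 that Safari shows.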
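So how do we fix it? ffmpeg has to be made to write a correct first audio frame. First I checked whether ffmpeg has an option for this: I opened the official documentation, searched for the MP3-related parts, and found the following in the section on the MP3 muxer: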
The MP3 muxer writes a raw MP3 stream with the following optional features:

- An ID3v2 metadata header at the beginning (enabled by default). Versions 2.3 and 2.4 are supported, the `id3v2_version` private option controls which one is used (3 or 4). Setting `id3v2_version` to 0 disables the ID3v2 header completely.

  The muxer supports writing attached pictures (APIC frames) to the ID3v2 header. The pictures are supplied to the muxer in form of a video stream with a single packet. There can be any number of those streams, each will correspond to a single APIC frame. The stream metadata tags title and comment map to APIC description and picture type respectively. See http://id3.org/id3v2.4.0-frames for allowed picture types.

  Note that the APIC frames must be written at the beginning, so the muxer will buffer the audio frames until it gets all the pictures. It is therefore advised to provide the pictures as soon as possible to avoid excessive buffering.

- A Xing/LAME frame right after the ID3v2 header (if present). It is enabled by default, but will be written only if the output is seekable. The `write_xing` private option can be used to disable it. The frame contains various information that may be useful to the decoder, like the audio duration or encoder delay.

- A legacy ID3v1 tag at the end of the file (disabled by default). It may be enabled with the `write_id3v1` private option, but as its capabilities are very limited, its usage is not recommended.

Examples:

Write an mp3 with an ID3v2.3 header and an ID3v1 footer:

```shell
ffmpeg -i INPUT -id3v2_version 3 -write_id3v1 1 out.mp3
```

To attach a picture to an mp3 file select both the audio and the picture stream with `map`:

```shell
ffmpeg -i input.mp3 -i cover.png -c copy -map 0 -map 1 \
  -metadata:s:v title="Album cover" -metadata:s:v comment="Cover (Front)" out.mp3
```

Write a “clean” MP3 without any extra features:

```shell
ffmpeg -i input.wav -write_xing 0 -id3v2_version 0 out.mp3
```
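The Xing/LAME frame described here is what showed up as the 182-byte first frame in the hexdump (note the `Info` string in its ASCII column). Below is a minimal sketch of reading its main fields. It assumes an MPEG-1 stream and the commonly documented Xing layout: the tag sits 4 + side-info bytes into the frame (17 bytes of side info for mono, 32 otherwise) and is followed by a big-endian flags word plus the optional frame count, byte count, 100-byte TOC and quality fields. `parse_xing` is a hypothetical helper, not an ffmpeg API.

```python
import struct
from typing import Dict, Optional

def parse_xing(frame: bytes) -> Optional[Dict[str, int]]:
    """Read the main Xing/Info fields from the first frame of an MPEG-1 Layer III stream."""
    mono = (frame[3] >> 6) == 0b11
    pos = 4 + (17 if mono else 32)   # frame header + MPEG-1 side info
    tag = frame[pos:pos + 4]
    if tag not in (b"Xing", b"Info"):
        return None                  # an ordinary audio frame
    flags = struct.unpack_from(">I", frame, pos + 4)[0]
    pos += 8
    fields = {"cbr": int(tag == b"Info"), "flags": flags}
    if flags & 0x1:                  # total number of frames
        fields["frames"] = struct.unpack_from(">I", frame, pos)[0]
        pos += 4
    if flags & 0x2:                  # total number of bytes
        fields["bytes"] = struct.unpack_from(">I", frame, pos)[0]
        pos += 4
    if flags & 0x4:                  # 100-byte seek table (TOC), skipped here
        pos += 100
    if flags & 0x8:                  # quality indicator
        fields["quality"] = struct.unpack_from(">I", frame, pos)[0]
    return fields

# Start of the first frame from the hexdump; TOC and quality are zero-filled here.
first_frame = bytes.fromhex(
    "fffb40c0" + "00" * 17 + "496e666f" + "0000000f" + "0000acae" + "00467bfa"
) + bytes(104)
print(parse_xing(first_frame))
```

A decoder that honors this tag can take the exact frame count (44206) from it instead of guessing from the first header's bitrate. As the Safari numbers suggest, not every player does.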
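So why not try generating a "clean" MP3? I tried it, and sure enough the correct bitrate and sample rate now appear right from the first audio frame. In the end I only disabled the Xing/LAME frame and kept the ID3v2 header. The fix is: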
```shell
ffmpeg -i "${INPUT_FILE}" -c:a mp3 -ab 32k -ar 44100 -ac 1 -write_xing 0 "${OUT_FILE}"
```
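To check an output file without reaching for hexdump, a sketch like the one below skips the ID3v2 tag and reads the bitrate from the first frame header. `first_frame_bitrate` is a hypothetical helper and handles only MPEG-1 Layer III. The demo uses synthetic bytes rather than real encoder output: a 45-byte ID3 tag like the one in mod3.mp3, followed by either the 56 kb/s Xing frame header or a plain 32 kb/s audio frame header.

```python
from typing import Optional

BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]

def first_frame_bitrate(data: bytes) -> Optional[int]:
    """Bitrate (kb/s) advertised by the first frame after an optional ID3v2 tag."""
    pos = 0
    if data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:                  # syncsafe size, 7 bits per byte
            size = (size << 7) | (b & 0x7F)
        pos = 10 + size
        if data[3] == 4 and data[5] & 0x10:   # ID3v2.4 footer
            pos += 10
    h = data[pos:pos + 4]
    if len(h) < 4 or h[0] != 0xFF or (h[1] & 0xFE) != 0xFA:
        raise ValueError("expected an MPEG-1 Layer III frame header after the ID3 tag")
    return BITRATES_KBPS[h[2] >> 4]           # None for a free-format/invalid index

# Synthetic examples: a 45-byte ID3v2.4 tag, then the header of either
# the Xing frame (default output) or a plain 32 kb/s audio frame.
id3 = bytes.fromhex("49 44 33 04 00 00 00 00 00 23") + bytes(35)
before = id3 + bytes.fromhex("fffb40c0")
after = id3 + bytes.fromhex("fffb10c4")
print(first_frame_bitrate(before), first_frame_bitrate(after))  # 56 32
```

On a real file, pass the bytes read in binary mode. For output from the command above it should report 32.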
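Moral of the story: read the official documentation more often...
Digging further through ffmpeg's tickets, I found that this is a long-standing bug, but it has been marked wontfix...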
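Why does the first frame advertise 56 kb/s at all? Presumably because the whole Xing/LAME tag has to fit into a single frame. My understanding of ffmpeg's mp3enc.c, which I have not verified against this exact version, is that the muxer starts from the bitrate closest to the requested one and steps up until the frame is large enough. In the dump, the tag data runs from the frame start at 0x2d through 0xdd, i.e. 177 bytes, followed by 5 zero bytes of filler. The sketch below reproduces that choice under this assumption. The byte breakdown is read off the dump, and the function names are illustrative.

```python
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]

def frame_size(kbps: int, sample_rate: int = 44100) -> int:
    """Unpadded MPEG-1 Layer III frame size in bytes."""
    return 144 * kbps * 1000 // sample_rate

# Bytes the Info frame needs in this file (MPEG-1, mono), per the hexdump:
needed = (4      # frame header
          + 17   # side info (MPEG-1 mono)
          + 4    # "Info"
          + 4    # flags
          + 4    # frame count
          + 4    # byte count
          + 100  # seek table (TOC)
          + 4    # quality
          + 36)  # LAME extension ("Lavc57.64", delays, CRCs, ...)

def pick_xing_bitrate(target_kbps: int, needed_bytes: int) -> int:
    """Start at the requested bitrate and step up until the frame can hold the tag."""
    for kbps in BITRATES_KBPS:
        if kbps >= target_kbps and frame_size(kbps) >= needed_bytes:
            return kbps
    raise ValueError("tag does not fit in any frame")

for kbps in (32, 40, 48, 56):
    print(kbps, "kb/s ->", frame_size(kbps), "bytes")
print("needed:", needed, "-> chosen:", pick_xing_bitrate(32, needed), "kb/s")
```

At 32, 40 and 48 kb/s the frames are 104, 130 and 156 bytes, all too small for 177 bytes. 56 kb/s (182 bytes) is the first that fits, matching the header ff fb 40 c0.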