This means that I needed to be able to read the metadata, grab some key information from each file, and make decisions quickly. For reading the metadata, I chose ffprobe, which is part of the ffmpeg Windows binaries made available by Zeranoe.
Then, I fired up Cygwin and used the following bash script to grab the stuff I cared about:
for file in *; do
  echo "$file" >> list.txt
  ffprobe.exe -v quiet -show_streams -show_data -pretty -of json "$file" | egrep 'codec_name|width|height|bit_rate|channel_layout|sample_rate' >> list.txt
done
The double quotes around "$file" prevent problems when file names contain spaces or other characters that would otherwise need to be escaped. The "OR"ed patterns passed to egrep keep only the lines of ffprobe's output that tell me what I need to know about both the video and audio streams of each file.
Here's an example of the output for two versions of the same source material that I had inadvertently created:
VERSION_A.m4v
"codec_name": "h264",
"width": 704,
"height": 384,
"bit_rate": "863.986000 Kbit/s",
"codec_name": "aac",
"sample_rate": "48000 KHz",
"channel_layout": "stereo",
"bit_rate": "164.469000 Kbit/s",
VERSION_B.m4v
"codec_name": "h264",
"width": 1920,
"height": 800,
"bit_rate": "5.983516 Mbit/s",
"codec_name": "aac",
"sample_rate": "48000 KHz",
"channel_layout": "stereo",
"bit_rate": "198.056000 Kbit/s",
"codec_name": "ac3",
"sample_rate": "48000 KHz",
"channel_layout": "5.1(side)",
"bit_rate": "640000 Kbit/s",
In this case, there's a clear winner. Version A is "DVD quality" (704x384) and only contains a stereo soundtrack. Version B is "full HD" (1920x800) and includes not only a slightly higher bit-rate stereo audio track but also a 5.1 surround audio track. As you can guess, the file sizes are significantly different, so you'd think you could just keep the larger file. In this case that would work, but the distinctions are not always so clear, and having the data makes the job of curating a bit easier. One caveat on the units: with -pretty, some whole-number values appear to get a unit prefix without being scaled, so "48000 KHz" is really 48 kHz and the AC-3 track's "640000 Kbit/s" is really 640 kbit/s.
With a small amount of work, I should be able to get this output file into CSV format and then quickly merge, sort, and filter the whole list in a spreadsheet.
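One way that conversion might look is a small awk program wrapped in a bash function. This is a sketch that assumes list.txt has exactly the layout shown in the sample output: a bare file name line, followed by that file's filtered "key": value, lines, with each stream starting at its codec_name line. The function name list_to_csv and the column order are my own choices for illustration; neither comes from ffprobe. It writes one row per stream, so Version B above would produce three rows.

```shell
#!/usr/bin/env bash
# Sketch: convert list.txt (as written by the loop above) to CSV on stdout.
# Assumes each file name sits on its own line, followed by that file's
# filtered ffprobe lines of the form  "key": value,
# list_to_csv is a hypothetical helper name, not part of ffmpeg.
list_to_csv() {
  awk '
    # Print the stream collected so far (if any), then clear it.
    function flush() {
      if (codec != "")
        printf "\"%s\",%s,%s,%s,%s,%s,%s\n", file, codec, w, h, br, sr, cl
      codec = w = h = br = sr = cl = ""
    }
    BEGIN { print "file,codec,width,height,bit_rate,sample_rate,channel_layout" }
    { sub(/\r$/, "") }          # ffprobe.exe under Cygwin may end lines with CRLF
    /^[ \t]*$/ { next }         # skip blank lines
    !/^[ \t]*"/ {               # not a "key": line, so it is a new file name
      flush()
      file = $0
      gsub(/"/, "\"\"", file)   # double any quotes for CSV
      next
    }
    {
      line = $0
      sub(/^[ \t]*"/, "", line)                  # drop indentation and opening quote
      key = line; sub(/".*/, "", key)            # text up to the closing quote
      val = line; sub(/^[^:]*:[ \t]*/, "", val)  # text after the colon
      sub(/,$/, "", val); gsub(/"/, "", val)     # drop trailing comma and quotes
      if (key == "codec_name") { flush(); codec = val }  # a new stream begins
      else if (key == "width") w = val
      else if (key == "height") h = val
      else if (key == "bit_rate") br = val
      else if (key == "sample_rate") sr = val
      else if (key == "channel_layout") cl = val
    }
    END { flush() }
  ' "$1"
}
```

Running list_to_csv list.txt > list.csv should give a file any spreadsheet can open. Because it compares keys exactly, it also ignores near-misses that the egrep patterns let through, such as coded_width or max_bit_rate if your ffprobe build reports them. If you want to sort on bit rate, dropping -pretty from the ffprobe command is probably simpler than parsing units: without it, ffprobe reports bit_rate in bits per second and sample_rate in hertz as plain numbers.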