""" usage: partition_dataset.py [-h] [-i IMAGEDIR] [-o OUTPUTDIR] [-r RATIO] [-x]

Partition dataset of images into training and testing sets

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGEDIR, --imageDir IMAGEDIR
                        Path to the folder where the image dataset is stored. If not specified, the CWD will be used.
  -o OUTPUTDIR, --outputDir OUTPUTDIR
                        Path to the output folder where the train and test dirs should be created.
                        Defaults to the same directory as IMAGEDIR.
  -r RATIO, --ratio RATIO
                        The ratio of the number of test images over the total number of images. The default is 0.1.
  -x, --xml             Set this flag if you want the xml annotation files to be processed and copied over.
"""
import argparse
import math
import os
import random
import re
from shutil import copyfile


def iterate_dir(source, dest, ratio, copy_xml):
    # Normalise Windows-style separators so the paths join consistently.
    source = source.replace('\\', '/')
    dest = dest.replace('\\', '/')
    train_dir = os.path.join(dest, 'train')
    test_dir = os.path.join(dest, 'test')

    if not os.path.exists(train_dir):
        os.makedirs(train_dir)
    if not os.path.exists(test_dir):
        os.makedirs(test_dir)

    # Keep only image files. The dot before the extension is escaped so that
    # it matches a literal '.' rather than any character.
    images = [f for f in os.listdir(source)
              if re.search(r'([a-zA-Z0-9\s_\\.\-\(\):])+\.(jpg|jpeg|png)$', f)]

    num_images = len(images)
    num_test_images = math.ceil(ratio * num_images)

    # Draw the test images at random, without replacement.
    for i in range(num_test_images):
        idx = random.randint(0, len(images) - 1)
        filename = images[idx]
        copyfile(os.path.join(source, filename),
                 os.path.join(test_dir, filename))
        if copy_xml:
            # Assumes an annotation file with the same base name exists.
            xml_filename = os.path.splitext(filename)[0] + '.xml'
            copyfile(os.path.join(source, xml_filename),
                     os.path.join(test_dir, xml_filename))
        images.remove(images[idx])

    # Everything that was not drawn for testing goes to the training set.
    for filename in images:
        copyfile(os.path.join(source, filename),
                 os.path.join(train_dir, filename))
        if copy_xml:
            xml_filename = os.path.splitext(filename)[0] + '.xml'
            copyfile(os.path.join(source, xml_filename),
                     os.path.join(train_dir, xml_filename))


def main():

    # Initiate argument parser
    parser = argparse.ArgumentParser(description="Partition dataset of images into training and testing sets",
                                     formatter_class=argparse.RawTextHelpFormatter)
    parser.add_argument(
        '-i', '--imageDir',
        help='Path to the folder where the image dataset is stored. If not specified, the CWD will be used.',
        type=str,
        default=os.getcwd()
    )
    parser.add_argument(
        '-o', '--outputDir',
        help='Path to the output folder where the train and test dirs should be created. '
             'Defaults to the same directory as IMAGEDIR.',
        type=str,
        default=None
    )
    parser.add_argument(
        '-r', '--ratio',
        help='The ratio of the number of test images over the total number of images. The default is 0.1.',
        default=0.1,
        type=float)
    parser.add_argument(
        '-x', '--xml',
        help='Set this flag if you want the xml annotation files to be processed and copied over.',
        action='store_true'
    )
    args = parser.parse_args()

    if args.outputDir is None:
        args.outputDir = args.imageDir

    # Now we are ready to start the iteration
    iterate_dir(args.imageDir, args.outputDir, args.ratio, args.xml)


if __name__ == '__main__':
    main()