ERIS-2: MLP Edition: Take 2: Electric Boogaloo
Posted 9 years ago
After the somewhat mediocre results of the first attempt to train ERIS-2 on MLP images, I was itching to try a trick I had thought of: I separated the brightness value of each pixel from its color and fed those values into the network separately. Otherwise, the two programs are identical.
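In case anyone wants to see roughly what that trick looks like in code, here is a minimal sketch (not the actual ERIS-2 preprocessing; the file name and the [-1, 1] scaling are assumptions), splitting an image into a luminance channel and two color channels via YCbCr:

```python
# Hedged sketch of the brightness/color split described above: convert RGB to
# YCbCr so the luminance (Y) channel can be fed to the network separately from
# the chroma (Cb, Cr) channels. File name and [-1, 1] scaling are assumptions.
from PIL import Image
import numpy as np

img = Image.open("example.png").convert("YCbCr")
y, cb, cr = (np.asarray(c, dtype=np.float32) for c in img.split())

luma = y / 127.5 - 1.0                               # brightness, one channel
chroma = np.stack([cb, cr], axis=-1) / 127.5 - 1.0   # color, two channels
```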
As I had hoped, this had some very beneficial effects at first, making it easier for the AI to learn sharper edges and more distinct features (especially eyes!). Bizarrely, though, it also caused it to occasionally produce images that are basically just a single flat color with a little bit of noise, which is very strange. It also seemed to produce less and less interesting images as training went on, sadly. Still, an interesting experiment, and I'll probably be using this trick on all my AIs from now on. That said, I think I have finally reached the limit of what the ERIS architecture is capable of, so I have almost finished the next generation of AI architecture. It's a lot more sophisticated than ERIS, so I'm excited to see what it will do! As always, see this journal for the full technical rundown of ERIS.
Full training timeline: http://imgur.com/a/bq4au
ERIS-2 meets Derpibooru.org
Posted 9 years ago
After the mixed results of training ERIS-2 on e621 images, I wondered what would happen if I gave her a more "consistent" dataset. I downloaded 85k images from derpibooru.org, filtered to make sure there are no (or very few) sketches, monochrome images, and other things that might confuse the AI. I hoped the colorful images of distinct characters would result in better output.
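To give a rough idea of the kind of filtering involved, here is a hypothetical sketch (not my actual download script; the metadata field names and the exclusion list are just for illustration):

```python
# Hypothetical tag-based filter over downloaded image metadata; the field name
# "tags" and the exclusion list are assumptions made for illustration only.
EXCLUDED_TAGS = {"sketch", "monochrome", "grayscale", "lineart"}

def keep(record):
    """Return True if an image record looks usable for training."""
    tags = set(record.get("tags", []))
    return not (tags & EXCLUDED_TAGS)

records = [
    {"id": 1, "tags": ["safe", "pinkie pie"]},
    {"id": 2, "tags": ["monochrome", "sketch"]},
]
usable = [r for r in records if keep(r)]  # keeps only record 1
```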
The results were... interesting. The program seems even more unpredictable on this dataset, but I think it has been improving quite consistently over the generations. One of the most unusual things I have noticed is that it produces slight variations of the same images far more often than previous AIs did. It may be suffering from the slightly smaller dataset, but I'm not quite sure; I have a hunch it would benefit from more training. Instead of continuing, though, I have written a new version with some tweaks to the data. We will see! As always, see this journal for the full technical rundown.
Training Timeline: http://imgur.com/a/JfWmm
ERIS-2
Posted 9 years ago
Announcing the release of ERIS-2, the second generation of the ERIS system! Boasting a slightly more sophisticated architecture and, more importantly, 128x128px output images, ERIS-2 takes what ERIS-1 did and makes it bigger. While some of the results are definitely interesting, sadly it seems to suffer even more from instability than ERIS-1, becoming unable to learn anything more past a certain point. The results are interesting but, let's be honest, mostly curious blobs of color. We must do better! Keep an eye out for my next project!
For those interested in the details of ERIS-2, it is essentially the same as ERIS-1, but with a few additional layers and an input and output size of 128x128 instead of 64x64. Training 50 epochs over the 100k-image dataset took roughly 52 hours on my GTX 970. See this journal for the full technical rundown.
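For a sense of scale, those figures work out to roughly 27 images pushed through the network per second (a back-of-the-envelope number derived only from the totals above, ignoring everything else the training loop does):

```python
# Back-of-the-envelope throughput implied by the figures above.
images, epochs, hours = 100_000, 50, 52
print(images * epochs / (hours * 3600))  # ~26.7 images per second
```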
Here is the training timeline: http://imgur.com/a/DthEP
ERIS-1 Technicals
Posted 9 years ago
For those already familiar with machine learning, here is the TL;DR rundown of my approach. It probably won't make much sense if you haven't worked with machine learning before.
ERIS-1 is a Generative Adversarial Network. Basically, you create two neural networks, a "discriminator" and a "generator". The discriminator is fed both real training images and images output by the generator, and is graded on its ability to tell the generated images apart from the real ones. The generator, on the other hand, is rated higher the better it can "trick" the discriminator. Over time the generator, in theory, produces images that are very hard to tell apart from "real" images.
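To make that "grading" concrete, here is a toy sketch of the two objectives with trivial stand-in networks (purely illustrative, not the ERIS-1 code; the shapes and batch size are arbitrary):

```python
# Toy illustration of GAN grading: the discriminator is rewarded for labelling
# real images 1 and generated images 0; the generator is rewarded for making
# the discriminator output 1 on its images. Networks here are trivial stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_z = 64 * 64 * 3, 100           # flattened 64x64 RGB image, noise size

def discriminator(x, w):                 # stand-in: logistic regression
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def generator(z, v):                     # stand-in: linear map from noise
    return np.tanh(z @ v)

w = rng.normal(scale=0.01, size=(n_pix, 1))
v = rng.normal(scale=0.01, size=(n_z, n_pix))

real = rng.uniform(-1, 1, size=(8, n_pix))       # stand-in training batch
fake = generator(rng.normal(size=(8, n_z)), v)   # generated batch

eps = 1e-8
d_loss = (-np.mean(np.log(discriminator(real, w) + eps))
          - np.mean(np.log(1 - discriminator(fake, w) + eps)))
g_loss = -np.mean(np.log(discriminator(fake, w) + eps))
# Training alternates: lower d_loss w.r.t. w, then lower g_loss w.r.t. v.
```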
The network architecture is four layers of convolutions, with ReLUs and batch normalization in between. If anyone is interested in the exact architecture, I can write it out sometime. ERIS-1 was trained for 50 epochs on the top 100k highest-rated images downloaded from e621.net, resized to 64x64px. This is somewhat on the lower end of training-data size, and I hope to produce better networks using more data in the future. Training ERIS-1 took about 9 hours on a GTX 970, though improvements I made to the training code may allow me to train other networks a bit quicker in the future.
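As an illustration of how a 64x64 output falls out of four layers, here is some size bookkeeping for a DCGAN-style generator where each layer doubles the spatial resolution (the channel counts are placeholders, not the actual ERIS-1 values):

```python
# Illustrative size bookkeeping for a DCGAN-style generator with four
# convolutional layers; channel counts are assumptions, not the ERIS-1 values.
size, channels = 4, [512, 256, 128, 3]
for layer, ch in enumerate(channels, start=1):
    size *= 2                      # each layer doubles the spatial resolution
    print(f"layer {layer}: {ch} channels, {size}x{size}")
# Layer 4 ends at 64x64 with 3 channels, matching the resized training images.
```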
I train my networks on a GTX 970 card. All of my code is written in Python. I use the library Theano as the backbone for my work (with cuDNN), and NumPy/SciPy, Fuel, lxml, BeautifulSoup, and the plain Python standard library to handle my data. If anyone is interested in how I found, acquired, and prepared my data, I can give a more detailed rundown.
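As a taste of what the image preparation involves, here is a minimal sketch (not my actual pipeline; the center crop, resampling filter, and path are assumptions):

```python
# Minimal image-preparation sketch: center-crop to a square, resize to 64x64,
# and scale pixel values to [-1, 1] for a tanh-output generator. The path and
# exact preprocessing choices are assumptions, not the real ERIS pipeline.
from PIL import Image
import numpy as np

def prepare(path, size=64):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    s = min(w, h)
    img = img.crop(((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2))
    img = img.resize((size, size), Image.LANCZOS)
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0

# x = prepare("data/e621/12345.jpg")   # -> (64, 64, 3) array in [-1, 1]
```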
Here is a collection of every epoch of ERIS-1; you can really see the progress being made before it peaks around epoch 40: http://imgur.com/a/P7CUK
The Future
Posted 9 years ago
Here are some unsorted ramblings on future project ideas floating around in my head. I can't guarantee I'll ever actually implement any of this, so don't take it as a "roadmap". A lot of the loftier goals will hit the hard limit of how much money I have to spend on high-end GPUs, which isn't much. I'm not going to give any references for the jargon; this is more for my own note-keeping than anything.
End Game: A system that you can give a description of an image (or animation) you want (e.g. "A female cat anthro with extremely large breasts has vaginal sex with a male fox anthro with a large penis drawn cartoony") and that outputs images to that specification. Maybe one could even incorporate the ability to "encode" new characters and situations, use "heatmaps" to sketch what the final result should look like, etc. A lofty goal, definitely not doable unless I had at least some funding, but hypothetically feasible.
-Scale up ERIS-1 to higher resolutions (specifically 128x128) DONE!
-Download and munge the entire e621 dataset
-Create ERIS variants with different architectures. Specifically, I am interested in the effects of significantly deeper networks.
-Try sparse autoencoders and deep belief networks instead of Generative Adversarial Networks
-Train a VGG-19-like network to tag images
-Use things learned from previous experiments to create a system that takes as input a list of tags and outputs an image
-Train extremely specific networks as feature extractors to use in better generation and classification
-Experiment with attention for image composition
-Create a livestream game where users in chat can feed in tags/suggestions that are used to create images live
-Use similar techniques plus frame motion to generate videos (Most likely not feasible with current technology)
-Incorporate 3D simulation to allow "rotating" of characters, objects and scenes