Random thoughts about "AI art"
Semi-recently I had a chance to experiment with a few of those modern AI image generators, so it's high time I collect some of my thoughts on the matter.
Several conflicting thoughts on the matter, that is. They're also open to discussion and change as the topic, discourse, and overall culture/reputation evolves.
So on one hand, it's actually fascinating to see these generators work, having reached a point where they legitimately "work as advertised". Did you know that these generators can draw protogens? Various Pokemon or Digimon by name? You can literally prompt something like "sergal taur" and it will draw a -taur shape with sergal features, proving that it "knows" what the "-taur" suffix generally implies.
On another, it is (and SHOULD BE) uncanny that these trained AI models can simulate the results of real artists, and do so in only a fraction of the same time. And there is a legitimate, worthwhile argument that because of the (ironically) timeless mindset of "time is money", commercial corporate culture will always gravitate toward the cheaper options, accepting their shortcomings as just a number cited in their budget spreadsheet, for a greater savings on the ultimate non-refundable resource: time itself.
And it must be said: there is a certain, real risk of "addiction" to the sheer convenience of these image generators. Why spend hours-to-weeks composing and producing a single piece when you can submit the same description to an AI and get a resulting image back in seconds to minutes?
...Because when you make art yourself, you're in full control of your artistic process? Because it's not the end result, but the process itself that's worth your time? Because art gives back only what you actually put in?
Yeah, about that.
Despite their capabilities, current image generators still have easy-to-spot limitations, and tend to commit errors that a human artist would get right on the first attempt. People meme about AI generators horribly misunderstanding how hands and fingers work -- and yes, hands and fingers are legitimately complicated parts of the body that even human artists struggle with. But this is actually a subset of a broader weakness in handling fine details generally (including an inability to count, which is part of why the models struggle with fingers), and it especially applies to generating text. An image generator can get individual letters and glyphs right, but stringing them together into coherent words leaves ZERO room for error, and certain models just don't have that capability yet.
Because AIs are trained on tagged images taken from across the Internet, a model is not so much trained to understand what something "is" (on a compositional or structural level) as it is trained on what something "looks like" (as an end result). The AI gravitates towards certain compositions (portrait, 3/4 view, etc.) not unlike a human artist does, and trying to specify an unusual composition in detail increases the likelihood that certain elements are ones the AI just can't effectively model, can't "understand", or seems to ignore entirely (compared to other elements in the same prompt). Every AI model is a combination of several parts, and a limitation in one component (like its text interpreter) will persist across the pipeline as a whole.
And, honestly, trying to battle or work around these limitations is a timesink unto itself -- I could spend (and have spent!) literal hours experimenting with a single prompt, tweaking and iterating and refining it over and over to minimize whatever mistakes the AI makes along the way, evaluating what compositions it picks, and growing quite frustrated with the AI's tendency to get 95% of the way there but never close out that last 5%. In those same hours, I could have sketched, refined, inked, and colored a 5x8" pencil piece -- even scanned, prepped, and uploaded it!
In some ways, AI image generators are not too different conceptually from those old Flash character-creators/editors we had back in the 2000s -- remember them? I tried a few back in the day, and maybe you did too. So if they're not that different fundamentally, then the remaining difference must be the sheer scale/scope of what these image generators are capable of, right? That, and they may be reaching a certain tipping point or "critical mass" -- some trifecta of power, convenience, and accessibility that enables them to pose real competition to real artists.
If somebody says that the AI training process (which involves analyzing images from across the Internet, including both amateur and professional art -- a point of serious contention among said artists) isn't too different conceptually from a human artist studying and learning from reference material, well ... they're not wrong. To this day, science still can't identify exactly how biological neurons physically encode knowledge and techniques in the brain, but we all (quite literally sub-consciously) enjoy the fact that "it just works". Still, an AI is only a simulation with some useful correlations, even if its underlying design is nothing like the real thing (much like the difference between polygonal 3D rendering, which can be calculated and displayed in real time, and 3D ray-tracing, which accurately simulates the underlying physics of light but is far more work-intensive for a similar-looking result).
But again, if the above is not too different from a real human artist, then by definition the remaining difference must be the sheer speed at which an AI models ("learns") its training data. New tools have always made it possible to do the same work faster (if not better) than a person doing it by hand, and "time is money". And for creative endeavors like art, maybe this difference in speed really IS what matters. We value human-made art because of the process and time spent to develop the end result; "AI art" is valued for its end result in spite of whatever process and time was spent generating it.
On which note, remember that AI image generators are also a super expensive piece of kit computationally (comparable to the aforementioned 3D ray-tracing), which is why most places running a generator charge some kind of subscription fee to use it (or impose other limitations, like a maximum number of prompts). In this regard the Stable Diffusion model is actually much more efficient than DALL-E or Midjourney -- efficient enough that, unlike those two, it can run locally on a decent consumer GPU -- but training such a model from scratch remains far beyond consumer-grade hardware.
This is not a conclusion to the topic, but it's all I have on my mind for now. I may return to update this journal with more thoughts later.