Abstract
The recent advances in language-based generative models have paved the way for the orchestration of multiple generators of different artefact types (text, image, audio, etc.) into one system. Presently, many open-source pre-trained models combine text with other modalities, thus enabling shared vector embeddings to be compared across different generators. Within this context we propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution. Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA), which is tailored for multimodal creative tasks and leverages deep learned models that assess coherence across modalities. MEliTA decouples the artefacts’ modalities and promotes cross-pollination between elites. As a test bed for this algorithm, we generate text descriptions and cover images for a hypothetical video game and assign each artefact a unique modality-specific behavioural characteristic. Results indicate that MEliTA can improve text-to-image mappings within the solution space, compared to a baseline MAP-Elites algorithm that strictly treats each image-text pair as one solution. Our approach represents a significant step forward in multimodal bottom-up orchestration and lays the groundwork for more complex systems coordinating multimodal creative agents in the future.
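The cross-modal coherence assessment mentioned above relies on embedding text and images in a shared vector space. As a rough, hypothetical illustration (an assumed setup, not the authors' exact pipeline), the sketch below scores a text-image pair with an off-the-shelf CLIP model via the transformers library; the checkpoint name and the `coherence` helper are assumptions for illustration only.

```python
# Hypothetical sketch: score text-image coherence with a CLIP-style model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"  # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

def coherence(text: str, image: Image.Image) -> float:
    """Cosine similarity between the text and image embeddings in CLIP's shared space."""
    inputs = processor(text=[text], images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    t = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    v = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return float((t * v).sum())
```

A cross-pollination step in the spirit of MEliTA could, for instance, only accept swapping an image onto another elite's text when such a coherence score does not drop, though the exact acceptance rule used in the paper is not reproduced here.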
Notes
- 1.
- 2. We randomly select three spaces or punctuation marks within the text and keep the middle one, which makes it likely that the split falls near the middle of the description (see the sketch after these notes).
- 3. The BC coordinates for these candidate solutions do not need to be recalculated, as they are combinations of text BCs and visual BCs that are already known.
- 4. Unlike [32], we do not normalise the values to the maximum found across runs and across methods. Instead, we present the non-normalised results (e.g. for coverage, the ratio of occupied cells to the maximum size of the feature map); see the sketch after these notes.
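Notes 2 and 4 describe two small computational details. The following minimal Python sketch is one plausible reading of them; the function names, the set of punctuation marks treated as split candidates, and the fallback for very short texts are assumptions, not the authors' implementation.

```python
import random
import re

def split_point(text: str, rng: random.Random) -> int:
    """Note 2: sample three space/punctuation positions and keep the middle one,
    so the split tends to fall near the middle of the description."""
    candidates = [m.start() for m in re.finditer(r"[\s.,;:!?]", text)]
    if len(candidates) < 3:
        return len(text) // 2  # fallback for very short texts (assumption)
    picks = sorted(rng.sample(candidates, 3))
    return picks[1]

def coverage(occupied_cells: int, total_cells: int) -> float:
    """Note 4: non-normalised coverage, i.e. occupied cells over the
    maximum size (total number of cells) of the feature map."""
    return occupied_cells / total_cells
```

For example, `split_point("A dark fantasy dungeon crawler about betrayal.", random.Random(0))` returns the index of one of the sampled separator characters, and `coverage(412, 900)` evaluates to roughly 0.46.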
References
Alfonseca, M., Cebrián, M., De la Puente, A.: A simple genetic algorithm for music generation by means of algorithmic information theory. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 3035–3042 (2007). https://doi.org/10.1109/CEC.2007.4424858
Alvarez, A., Dahlskog, S., Font, J., Togelius, J.: Empowering quality diversity in dungeon design with interactive constrained MAP-Elites. In: Proceedings of the IEEE Conference on Games (2019). https://doi.org/10.1109/CIG.2019.8848022
Alvarez, A., Font, J.: TropeTwist: trope-based narrative structure generation. In: Proceedings of the Foundations of Digital Games Conference (2022). https://doi.org/10.1145/3555858.3563271
Balestriero, R., et al.: A cookbook of self-supervised learning. arXiv preprint arXiv:2304.12210 (2023). https://doi.org/10.48550/arXiv.2304.12210
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
Brown, T., et al.: Language models are few-shot learners. In: Proceedings of the Neural Information Processing Systems Conference (2020)
Coello Coello, C.A.: Constraint-handling techniques used with evolutionary algorithms. In: Proceedings of the Genetic and Evolutionary Computation Conference (2010)
Colton, S.: Evolving neural style transfer blends. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds.) EvoMUSART 2021. LNCS, vol. 12693, pp. 65–81. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72914-1_5
Copet, J., et al.: Simple and controllable music generation. arXiv preprint arXiv:2306.05284 (2023)
Cully, A., Demiris, Y.: Quality and diversity optimization: a unifying modular framework. IEEE Trans. Evol. Comput. 22(2), 245–259 (2017)
Dangeti, P.: Statistics for Machine Learning. Packt Publishing (2017)
Fontaine, M.C., Nikolaidis, S.: Differentiable quality diversity. In: Proceedings of the Neural Information Processing Systems Conference (2021)
Galanter, P.: Artificial intelligence and problems in generative art theory. In: Proceedings of the Conference on Electronic Visualisation & the Arts, pp. 112–118 (2019). https://doi.org/10.14236/ewic/EVA2019.22
Girdhar, R., et al.: ImageBind: one embedding space to bind them all. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Gravina, D., Khalifa, A., Liapis, A., Togelius, J., Yannakakis, G.N.: Procedural content generation through quality-diversity. In: Proceedings of the IEEE Conference on Games (2019)
Gunning, R.: The Technique of Clear Writing, pp. 36–37. McGraw-Hill Book Co. (1973)
Hasler, D., Suesstrunk, S.: Measuring colourfulness in natural images. In: Proceedings of the Conference on Electronic Imaging (2003). https://doi.org/10.1117/12.477378
Hendrycks, D., Mu, N., Cubuk, E.D., Zoph, B., Gilmer, J., Lakshminarayanan, B.: AugMix: a simple data processing method to improve robustness and uncertainty. In: Proceedings of the International Conference on Learning Representations (ICLR) (2020)
Ho, J., Salimans, T.: Classifier-free diffusion guidance. In: Proceedings of the NeurIPS Workshop on Deep Generative Models and Downstream Applications (2021)
Hoover, A.K., Szerlip, P.A., Stanley, K.O.: Interactively evolving harmonies through functional scaffolding. In: Proceedings of the Genetic and Evolutionary Computation Conference (2011)
Johnson, C.G.: Stepwise evolutionary learning using deep learned guidance functions. In: Bramer, M., Petridis, M. (eds.) SGAI 2019. LNCS (LNAI), vol. 11927, pp. 50–62. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34885-4_4
Khalifa, A., Lee, S., Nealen, A., Togelius, J.: Talakat: bullet hell generation through constrained MAP-Elites. In: Proceedings of the Genetic and Evolutionary Computation Conference (2018)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
Lehman, J., Gordon, J., Jain, S., Ndousse, K., Yeh, C., Stanley, K.O.: Evolution through large models. In: Banzhaf, W., Machado, P., Zhang, M. (eds.) Handbook of Evolutionary Machine Learning. Genetic and Evolutionary Computation, pp. 331–366. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-3814-8_11
Lehman, J., Stanley, K.O.: Revising the evolutionary computation abstraction: minimal criteria novelty search. In: Proceedings of the Genetic and Evolutionary Computation Conference (2010)
Lehman, J., Stanley, K.O.: Evolving a diversity of virtual creatures through novelty search and local competition. In: Proceedings of the Genetic and Evolutionary Computation Conference (2011)
Liapis, A., Yannakakis, G.N., Togelius, J.: Adapting models of visual aesthetics for personalized content creation. IEEE Trans. Comput. Intell. AI Games 4(3), 213–228 (2012)
Liapis, A., Yannakakis, G.N., Togelius, J.: Constrained novelty search: a study on game content generation. Evol. Comput. 23(1), 101–129 (2015)
Machado, P., et al.: Computerized measures of visual complexity. Acta Psychol. 160, 43–57 (2015). https://doi.org/10.1016/j.actpsy.2015.06.005
Marcel, S., Rodriguez, Y.: Torchvision the machine-vision package of torch. In: Proceedings of the ACM International Conference on Multimedia (2010). https://doi.org/10.1145/1873951.1874254
Michalewicz, Z.: Do not kill unfeasible individuals. In: Proceedings of the 4th Intelligent Information Systems Workshop (1995)
Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015). https://doi.org/10.48550/arXiv.1504.04909
OpenAI: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023). https://doi.org/10.48550/arXiv.2303.08774
Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI 3, 40 (2016)
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)
Radford, A., et al.: Language models are unsupervised multitask learners (2019). https://openai.com/research/better-language-models. Accessed 11 Jan 2024
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the Empirical Methods in Natural Language Processing Conference (2019)
Ritchie, G.: Some empirical criteria for attributing creativity to a computer program. Mind. Mach. 17, 76–99 (2007)
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Roziere, B., et al.: EvolGAN: evolutionary generative adversarial networks. In: Proceedings of the Asian Conference on Computer Vision (2021)
Secretan, J., Beato, N., D’Ambrosio, D.B., Rodriguez, A., Campbell, A., Stanley, K.O.: Picbreeder: evolving pictures collaboratively online. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2008)
Sfikas, K., Liapis, A., Yannakakis, G.N.: Monte Carlo elites: quality-diversity selection as a multi-armed bandit problem. In: Proceedings of the Genetic and Evolutionary Computation Conference (2021)
Sfikas, K., Liapis, A., Yannakakis, G.N.: A general-purpose expressive algorithm for room-based environments. In: Proceedings of the FDG Workshop on Procedural Content Generation (2022)
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: Proceedings of the 32nd International Conference on Machine Learning (2015)
Takagi, H.: Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation. Proc. Inst. Electr. Electron. Eng. 89(9), 1275–1296 (2001)
Touvron, H., et al.: LLaMA: open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023). https://doi.org/10.48550/arXiv.2302.13971
Touvron, H., et al.: LLaMA 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023). https://doi.org/10.48550/arXiv.2307.09288
Vaswani, A., et al.: Attention is all you need. In: Proceedings of the Neural Information Processing Systems Conference (2017)
Viana, B.M.F., Pereira, L.T., Toledo, C.F.M.: Illuminating the space of enemies through MAP-Elites. In: Proceedings of the IEEE Conference on Games (2022). https://doi.org/10.1109/CoG51982.2022.9893621
West, P., Lu, X., Holtzman, A., Bhagavatula, C., Hwang, J.D., Choi, Y.: Reflective decoding: beyond unidirectional generation with off-the-shelf language models. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (2021). https://doi.org/10.18653/v1/2021.acl-long.114
Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015). https://doi.org/10.1109/ICCV.2015.164
Zammit, M., Liapis, A., Yannakakis, G.N.: Seeding diversity into AI art. In: Proceedings of the International Conference on Computational Creativity (2022)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00068
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 programme under grant agreement No 951911.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zammit, M., Liapis, A., Yannakakis, G.N. (2024). MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains. In: Johnson, C., Rebelo, S.M., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2024. Lecture Notes in Computer Science, vol 14633. Springer, Cham. https://doi.org/10.1007/978-3-031-56992-0_26
DOI: https://doi.org/10.1007/978-3-031-56992-0_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56991-3
Online ISBN: 978-3-031-56992-0