
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2024)

Abstract

The recent advances in language-based generative models have paved the way for the orchestration of multiple generators of different artefact types (text, image, audio, etc.) into one system. Presently, many open-source pre-trained models combine text with other modalities, thus enabling shared vector embeddings to be compared across different generators. Within this context we propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution. Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA), which is tailored for multimodal creative tasks and leverages deep learned models that assess coherence across modalities. MEliTA decouples the artefacts’ modalities and promotes cross-pollination between elites. As a test bed for this algorithm, we generate text descriptions and cover images for a hypothetical video game and assign each artefact a unique modality-specific behavioural characteristic. Results indicate that MEliTA can improve text-to-image mappings within the solution space, compared to a baseline MAP-Elites algorithm that strictly treats each image-text pair as one solution. Our approach represents a significant step forward in multimodal bottom-up orchestration and lays the groundwork for more complex systems coordinating multimodal creative agents in the future.
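
The abstract describes the algorithm only at a high level; the sketch below is a minimal Python reading of it, not the published implementation. It assumes placeholder callables (`mutate_text`, `mutate_image`, `text_bc`, `image_bc`, `coherence`) standing in for the pre-trained text and image generators and for a CLIP-like cross-modal scorer, and it illustrates the two ingredients named above: variation that touches a single modality, and transverse assessment that re-pairs the new artefact with the complementary artefacts of existing elites before the usual MAP-Elites replacement.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Solution:
    text: str
    image: object   # e.g. a PIL image or a latent tensor (assumption)
    fitness: float  # here simply the cross-modal coherence score


def melita_step(
    archive: Dict[Tuple[int, int], Solution],
    mutate_text: Callable[[str], str],
    mutate_image: Callable[[object], object],
    text_bc: Callable[[str], int],
    image_bc: Callable[[object], int],
    coherence: Callable[[str, object], float],
) -> None:
    """One illustrative iteration on a pre-seeded archive keyed by
    (text BC bin, image BC bin). A sketch, not the authors' code."""
    parent = random.choice(list(archive.values()))

    # Decoupled variation: mutate only one modality of the parent.
    if random.random() < 0.5:
        new_text, new_image = mutate_text(parent.text), None
    else:
        new_text, new_image = None, mutate_image(parent.image)

    # Transverse assessment: pair the new artefact with the complementary
    # artefact of each existing elite and keep the most coherent pairing.
    candidates = []
    for elite in archive.values():
        text = new_text if new_text is not None else elite.text
        image = new_image if new_image is not None else elite.image
        candidates.append(Solution(text, image, coherence(text, image)))
    best = max(candidates, key=lambda s: s.fitness)

    # MAP-Elites placement: the cell combines one text BC and one image BC,
    # so BCs of re-used artefacts need no recomputation (cf. note 3 below).
    cell = (text_bc(best.text), image_bc(best.image))
    if cell not in archive or best.fitness > archive[cell].fitness:
        archive[cell] = best
```

Because only one modality changes per step, an elite's text can end up paired with an image that originated elsewhere in the archive, which is the cross-pollination between elites that the abstract refers to.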

Notes

  1. https://store.steampowered.com/.

  2. We randomly select three spaces or punctuation marks within the text and keep the middle one. This makes it likely that the split falls near the middle of the description (illustrated in the sketch after these notes).

  3. The BC coordinates for these candidate solutions do not need to be recalculated, as they are combinations of text BCs and visual BCs that are already known.

  4. Unlike [32], we do not normalise the values to the maximum found across runs and across methods. Instead, we present the non-normalised results (e.g. the ratio of occupied cells to the maximum size of the feature map, for coverage); see the sketch after these notes.
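
Notes 2 and 4 lend themselves to a short sketch as well. The punctuation set, the dictionary-based archive and the function names below are assumptions for illustration, not details taken from the paper.

```python
import math
import random
import re


def pick_split_point(text: str) -> int:
    """Note 2 (sketch): sample three whitespace/punctuation positions at
    random and keep the middle one, biasing the split towards the middle
    of the description. Assumes the text contains at least three such
    positions; the exact punctuation set is an assumption."""
    positions = [m.start() for m in re.finditer(r"[\s.,;:!?]", text)]
    return sorted(random.sample(positions, 3))[1]


def coverage(archive: dict, grid_shape: tuple) -> float:
    """Note 4 (sketch): occupied cells over the total number of cells in
    the feature map, without normalising across runs or methods."""
    return len(archive) / math.prod(grid_shape)
```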

References

  1. Alfonseca, M., Cebrián, M., De la Puente, A.: A simple genetic algorithm for music generation by means of algorithmic information theory. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 3035–3042 (2007). https://doi.org/10.1109/CEC.2007.4424858

  2. Alvarez, A., Dahlskog, S., Font, J., Togelius, J.: Empowering quality diversity in dungeon design with interactive constrained MAP-elites. In: Proceedings of the IEEE Conference on Games (2019). https://doi.org/10.1109/CIG.2019.8848022

  3. Alvarez, A., Font, J.: TropeTwist: trope-based narrative structure generation. In: Proceedings of the Foundations of Digital Games Conference (2022). https://doi.org/10.1145/3555858.3563271

  4. Balestriero, R., et al.: A cookbook of self-supervised learning. arXiv preprint arXiv:2304.12210 (2023). https://doi.org/10.48550/arXiv.2304.12210

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)

  6. Brown, T., et al.: Language models are few-shot learners. In: Proceedings of the Neural Information Processing Systems Conference (2020)

  7. Coello Coello, C.A.: Constraint-handling techniques used with evolutionary algorithms. In: Proceedings of the Genetic and Evolutionary Computation Conference (2010)

  8. Colton, S.: Evolving neural style transfer blends. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds.) EvoMUSART 2021. LNCS, vol. 12693, pp. 65–81. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72914-1_5

  9. Copet, J., et al.: Simple and controllable music generation. arXiv preprint arXiv:2306.05284 (2023)

  10. Cully, A., Demiris, Y.: Quality and diversity optimization: a unifying modular framework. IEEE Trans. Evol. Comput. 22(2), 245–259 (2017)

  11. Dangeti, P.: Statistics for Machine Learning. Packt Publishing (2017)

  12. Fontaine, M.C., Nikolaidis, S.: Differentiable quality diversity. In: Proceedings of the Neural Information Processing Systems Conference (2021)

  13. Galanter, P.: Artificial intelligence and problems in generative art theory. In: Proceedings of the Conference on Electronic Visualisation & the Arts, pp. 112–118 (2019). https://doi.org/10.14236/ewic/EVA2019.22

  14. Girdhar, R., et al.: ImageBind: one embedding space to bind them all. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

  15. Gravina, D., Khalifa, A., Liapis, A., Togelius, J., Yannakakis, G.N.: Procedural content generation through quality-diversity. In: Proceedings of the IEEE Conference on Games (2019)

  16. Gunning, R.: The Technique of Clear Writing, pp. 36–37. McGraw-Hill Book Co. (1973)

  17. Hasler, D., Suesstrunk, S.: Measuring colourfulness in natural images. In: Proceedings of the Conference on Electronic Imaging (2003). https://doi.org/10.1117/12.477378

  18. Hendrycks, D., Mu, N., Cubuk, E.D., Zoph, B., Gilmer, J., Lakshminarayanan, B.: AugMix: a simple data processing method to improve robustness and uncertainty. In: Proceedings of the International Conference on Learning Representations (ICLR) (2020)

  19. Ho, J., Salimans, T.: Classifier-free diffusion guidance. In: Proceedings of the NeurIPS Workshop on Deep Generative Models and Downstream Applications (2021)

  20. Hoover, A.K., Szerlip, P.A., Stanley, K.O.: Interactively evolving harmonies through functional scaffolding. In: Proceedings of the Genetic and Evolutionary Computation Conference (2011)

  21. Johnson, C.G.: Stepwise evolutionary learning using deep learned guidance functions. In: Bramer, M., Petridis, M. (eds.) SGAI 2019. LNCS (LNAI), vol. 11927, pp. 50–62. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34885-4_4

  22. Khalifa, A., Lee, S., Nealen, A., Togelius, J.: Talakat: bullet hell generation through constrained Map-Elites. In: Proceedings of the Genetic and Evolutionary Computation Conference (2018)

  23. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29

  24. Lehman, J., Gordon, J., Jain, S., Ndousse, K., Yeh, C., Stanley, K.O.: Evolution through large models. In: Banzhaf, W., Machado, P., Zhang, M. (eds.) Handbook of Evolutionary Machine Learning. Genetic and Evolutionary Computation, pp. 331–366. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-3814-8_11

  25. Lehman, J., Stanley, K.O.: Revising the evolutionary computation abstraction: minimal criteria novelty search. In: Proceedings of the Genetic and Evolutionary Computation Conference (2010)

  26. Lehman, J., Stanley, K.O.: Evolving a diversity of virtual creatures through novelty search and local competition. In: Proceedings of the Genetic and Evolutionary Computation Conference (2011)

  27. Liapis, A., Yannakakis, G.N., Togelius, J.: Adapting models of visual aesthetics for personalized content creation. IEEE Trans. Comput. Intell. AI Games 4(3), 213–228 (2012)

  28. Liapis, A., Yannakakis, G.N., Togelius, J.: Constrained novelty search: a study on game content generation. Evol. Comput. 23(1), 101–129 (2015)

  29. Machado, P., et al.: Computerized measures of visual complexity. Acta Psychol. 160, 43–57 (2015). https://doi.org/10.1016/j.actpsy.2015.06.005

  30. Marcel, S., Rodriguez, Y.: Torchvision the machine-vision package of torch. In: Proceedings of the ACM International Conference on Multimedia (2010). https://doi.org/10.1145/1873951.1874254

  31. Michalewicz, Z.: Do not kill unfeasible individuals. In: Proceedings of the 4th Intelligent Information Systems Workshop (1995)

  32. Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015). https://doi.org/10.48550/arXiv.1504.04909

  33. OpenAI: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023). https://doi.org/10.48550/arXiv.2303.08774

  34. Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI 3, 40 (2016)

  35. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)

  36. Radford, A., et al.: Language models are unsupervised multitask learners (2019). https://openai.com/research/better-language-models. Accessed 11 Jan 2024

  37. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the Empirical Methods in Natural Language Processing Conference (2019)

  38. Ritchie, G.: Some empirical criteria for attributing creativity to a computer program. Minds Mach. 17, 76–99 (2007)

  39. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  40. Roziere, B., et al.: EvolGAN: evolutionary generative adversarial networks. In: Proceedings of the Asian Conference on Computer Vision (2021)

  41. Secretan, J., Beato, N., D’Ambrosio, D.B., Rodriguez, A., Campbell, A., Stanley, K.O.: Picbreeder: evolving pictures collaboratively online. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2008)

  42. Sfikas, K., Liapis, A., Yannakakis, G.N.: Monte Carlo elites: quality-diversity selection as a multi-armed bandit problem. In: Proceedings of the Genetic and Evolutionary Computation Conference (2021)

  43. Sfikas, K., Liapis, A., Yannakakis, G.N.: A general-purpose expressive algorithm for room-based environments. In: Proceedings of the FDG Workshop on Procedural Content Generation (2022)

  44. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: Proceedings of the 32nd International Conference on Machine Learning (2015)

  45. Takagi, H.: Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation. Proc. Inst. Electr. Electron. Eng. 89(9), 1275–1296 (2001)

  46. Touvron, H., et al.: LLaMA: open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023). https://doi.org/10.48550/arXiv.2302.13971

  47. Touvron, H., et al.: LLaMA 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023). https://doi.org/10.48550/arXiv.2307.09288

  48. Vaswani, A., et al.: Attention is all you need. In: Proceedings of the Neural Information Processing Systems Conference (2017)

  49. Viana, B.M.F., Pereira, L.T., Toledo, C.F.M.: Illuminating the space of enemies through MAP-Elites. In: Proceedings of the IEEE Conference on Games (2022). https://doi.org/10.1109/CoG51982.2022.9893621

  50. West, P., Lu, X., Holtzman, A., Bhagavatula, C., Hwang, J.D., Choi, Y.: Reflective decoding: beyond unidirectional generation with off-the-shelf language models. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (2021). https://doi.org/10.18653/v1/2021.acl-long.114

  51. Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015). https://doi.org/10.1109/ICCV.2015.164

  52. Zammit, M., Liapis, A., Yannakakis, G.N.: Seeding diversity into AI art. In: Proceedings of the International Conference on Computational Creativity (2022)

  53. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00068

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 programme under grant agreement No 951911.

Author information

Correspondence to Marvin Zammit.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zammit, M., Liapis, A., Yannakakis, G.N. (2024). MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains. In: Johnson, C., Rebelo, S.M., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2024. Lecture Notes in Computer Science, vol 14633. Springer, Cham. https://doi.org/10.1007/978-3-031-56992-0_26

  • DOI: https://doi.org/10.1007/978-3-031-56992-0_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56991-3

  • Online ISBN: 978-3-031-56992-0

  • eBook Packages: Computer Science, Computer Science (R0)
