
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2024)

Abstract

The recent advances in language-based generative models have paved the way for the orchestration of multiple generators of different artefact types (text, image, audio, etc.) into one system. Presently, many open-source pre-trained models combine text with other modalities, thus enabling shared vector embeddings to be compared across different generators. Within this context we propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution. Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA), which is tailored for multimodal creative tasks and leverages deep learned models that assess coherence across modalities. MEliTA decouples the artefacts’ modalities and promotes cross-pollination between elites. As a test bed for this algorithm, we generate text descriptions and cover images for a hypothetical video game and assign each artefact a unique modality-specific behavioural characteristic. Results indicate that MEliTA can improve text-to-image mappings within the solution space, compared to a baseline MAP-Elites algorithm that strictly treats each image-text pair as one solution. Our approach represents a significant step forward in multimodal bottom-up orchestration and lays the groundwork for more complex systems coordinating multimodal creative agents in the future.
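
The abstract describes the algorithm only at a high level; the sketch below is a minimal Python reading of it, not the published implementation. It assumes placeholder callables (`mutate_text`, `mutate_image`, `text_bc`, `image_bc`, `coherence`) standing in for the pre-trained text and image generators and for a CLIP-like cross-modal scorer, and it illustrates the two ingredients named above: variation that touches a single modality, and transverse assessment that re-pairs the new artefact with the complementary artefacts of existing elites before the usual MAP-Elites replacement.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Solution:
    text: str
    image: object   # e.g. a PIL image or a latent tensor (assumption)
    fitness: float  # here simply the cross-modal coherence score


def melita_step(
    archive: Dict[Tuple[int, int], Solution],
    mutate_text: Callable[[str], str],
    mutate_image: Callable[[object], object],
    text_bc: Callable[[str], int],
    image_bc: Callable[[object], int],
    coherence: Callable[[str, object], float],
) -> None:
    """One illustrative iteration on a pre-seeded archive keyed by
    (text BC bin, image BC bin). A sketch, not the authors' code."""
    parent = random.choice(list(archive.values()))

    # Decoupled variation: mutate only one modality of the parent.
    if random.random() < 0.5:
        new_text, new_image = mutate_text(parent.text), None
    else:
        new_text, new_image = None, mutate_image(parent.image)

    # Transverse assessment: pair the new artefact with the complementary
    # artefact of each existing elite and keep the most coherent pairing.
    candidates = []
    for elite in archive.values():
        text = new_text if new_text is not None else elite.text
        image = new_image if new_image is not None else elite.image
        candidates.append(Solution(text, image, coherence(text, image)))
    best = max(candidates, key=lambda s: s.fitness)

    # MAP-Elites placement: the cell combines one text BC and one image BC,
    # so BCs of re-used artefacts need no recomputation (cf. note 3 below).
    cell = (text_bc(best.text), image_bc(best.image))
    if cell not in archive or best.fitness > archive[cell].fitness:
        archive[cell] = best
```

Because only one modality changes per step, an elite's text can end up paired with an image that originated elsewhere in the archive, which is the cross-pollination between elites that the abstract refers to.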

Notes

  1. https://store.steampowered.com/.

  2. We randomly select three spaces or punctuation marks within the text and keep the middle one. This makes it likely that the split falls near the middle of the description (illustrated in the sketch after these notes).

  3. The BC coordinates for these candidate solutions do not need to be recalculated, as they are combinations of text BCs and visual BCs that are already known.

  4. Unlike [32], we do not normalise the values to the maximum found across runs and across methods. Instead, we present the non-normalised results (e.g. the ratio of occupied cells to the maximum size of the feature map, for coverage); see the sketch after these notes.
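
Notes 2 and 4 lend themselves to a short sketch as well. The punctuation set, the dictionary-based archive and the function names below are assumptions for illustration, not details taken from the paper.

```python
import math
import random
import re


def pick_split_point(text: str) -> int:
    """Note 2 (sketch): sample three whitespace/punctuation positions at
    random and keep the middle one, biasing the split towards the middle
    of the description. Assumes the text contains at least three such
    positions; the exact punctuation set is an assumption."""
    positions = [m.start() for m in re.finditer(r"[\s.,;:!?]", text)]
    return sorted(random.sample(positions, 3))[1]


def coverage(archive: dict, grid_shape: tuple) -> float:
    """Note 4 (sketch): occupied cells over the total number of cells in
    the feature map, without normalising across runs or methods."""
    return len(archive) / math.prod(grid_shape)
```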

References

  1. Alfonseca, M., Cebrián, M., De la Puente, A.: A simple genetic algorithm for music generation by means of algorithmic information theory. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 3035–3042 (2007). https://doi.org/10.1109/CEC.2007.4424858

  2. Alvarez, A., Dahlskog, S., Font, J., Togelius, J.: Empowering quality diversity in dungeon design with interactive constrained MAP-elites. In: Proceedings of the IEEE Conference on Games (2019). https://doi.org/10.1109/CIG.2019.8848022

  3. Alvarez, A., Font, J.: TropeTwist: trope-based narrative structure generation. In: Proceedings of the Foundations of Digital Games Conference (2022). https://doi.org/10.1145/3555858.3563271

  4. Balestriero, R., et al.: A cookbook of self-supervised learning. arXiv preprint arXiv:2304.12210 (2023). https://doi.org/10.48550/arXiv.2304.12210

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)

  6. Brown, T., et al.: Language models are few-shot learners. In: Proceedings of the Neural Information Processing Systems Conference (2020)

  7. Coello Coello, C.A.: Constraint-handling techniques used with evolutionary algorithms. In: Proceedings of the Genetic and Evolutionary Computation Conference (2010)

  8. Colton, S.: Evolving neural style transfer blends. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds.) EvoMUSART 2021. LNCS, vol. 12693, pp. 65–81. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72914-1_5

  9. Copet, J., et al.: Simple and controllable music generation. arXiv preprint arXiv:2306.05284 (2023)

  10. Cully, A., Demiris, Y.: Quality and diversity optimization: a unifying modular framework. IEEE Trans. Evol. Comput. 22(2), 245–259 (2017)

  11. Dangeti, P.: Statistics for Machine Learning. Packt Publishing (2017)

  12. Fontaine, M.C., Nikolaidis, S.: Differentiable quality diversity. In: Proceedings of the Neural Information Processing Systems Conference (2021)

  13. Galanter, P.: Artificial intelligence and problems in generative art theory. In: Proceedings of the Conference on Electronic Visualisation & the Arts, pp. 112–118 (2019). https://doi.org/10.14236/ewic/EVA2019.22

  14. Girdhar, R., et al.: ImageBind: one embedding space to bind them all. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)

  15. Gravina, D., Khalifa, A., Liapis, A., Togelius, J., Yannakakis, G.N.: Procedural content generation through quality-diversity. In: Proceedings of the IEEE Conference on Games (2019)

  16. Gunning, R.: The Technique of Clear Writing, pp. 36–37. McGraw-Hill Book Co. (1973)

  17. Hasler, D., Suesstrunk, S.: Measuring colourfulness in natural images. In: Proceedings of the Conference on Electronic Imaging (2003). https://doi.org/10.1117/12.477378

  18. Hendrycks, D., Mu, N., Cubuk, E.D., Zoph, B., Gilmer, J., Lakshminarayanan, B.: AugMix: a simple data processing method to improve robustness and uncertainty. In: Proceedings of the International Conference on Learning Representations (ICLR) (2020)

  19. Ho, J., Salimans, T.: Classifier-free diffusion guidance. In: Proceedings of the NeurIPS Workshop on Deep Generative Models and Downstream Applications (2021)

  20. Hoover, A.K., Szerlip, P.A., Stanley, K.O.: Interactively evolving harmonies through functional scaffolding. In: Proceedings of the Genetic and Evolutionary Computation Conference (2011)

  21. Johnson, C.G.: Stepwise evolutionary learning using deep learned guidance functions. In: Bramer, M., Petridis, M. (eds.) SGAI 2019. LNCS (LNAI), vol. 11927, pp. 50–62. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34885-4_4

  22. Khalifa, A., Lee, S., Nealen, A., Togelius, J.: Talakat: bullet hell generation through constrained Map-Elites. In: Proceedings of the Genetic and Evolutionary Computation Conference (2018)

  23. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29

  24. Lehman, J., Gordon, J., Jain, S., Ndousse, K., Yeh, C., Stanley, K.O.: Evolution through large models. In: Banzhaf, W., Machado, P., Zhang, M. (eds.) Handbook of Evolutionary Machine Learning. Genetic and Evolutionary Computation, pp. 331–366. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-3814-8_11

  25. Lehman, J., Stanley, K.O.: Revising the evolutionary computation abstraction: minimal criteria novelty search. In: Proceedings of the Genetic and Evolutionary Computation Conference (2010)

  26. Lehman, J., Stanley, K.O.: Evolving a diversity of virtual creatures through novelty search and local competition. In: Proceedings of the Genetic and Evolutionary Computation Conference (2011)

  27. Liapis, A., Yannakakis, G.N., Togelius, J.: Adapting models of visual aesthetics for personalized content creation. IEEE Trans. Comput. Intell. AI Games 4(3), 213–228 (2012)

  28. Liapis, A., Yannakakis, G.N., Togelius, J.: Constrained novelty search: a study on game content generation. Evol. Comput. 23(1), 101–129 (2015)

  29. Machado, P., et al.: Computerized measures of visual complexity. Acta Psychol. 160, 43–57 (2015). https://doi.org/10.1016/j.actpsy.2015.06.005

  30. Marcel, S., Rodriguez, Y.: Torchvision the machine-vision package of torch. In: Proceedings of the ACM International Conference on Multimedia (2010). https://doi.org/10.1145/1873951.1874254

  31. Michalewicz, Z.: Do not kill unfeasible individuals. In: Proceedings of the 4th Intelligent Information Systems Workshop (1995)

  32. Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909 (2015). https://doi.org/10.48550/arXiv.1504.04909

  33. OpenAI: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023). https://doi.org/10.48550/arXiv.2303.08774

  34. Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI 3, 40 (2016)

  35. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)

  36. Radford, A., et al.: Language models are unsupervised multitask learners (2019). https://openai.com/research/better-language-models. Accessed 11 Jan 2024

  37. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the Empirical Methods in Natural Language Processing Conference (2019)

  38. Ritchie, G.: Some empirical criteria for attributing creativity to a computer program. Minds Mach. 17, 76–99 (2007)

  39. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  40. Roziere, B., et al.: EvolGAN: evolutionary generative adversarial networks. In: Proceedings of the Asian Conference on Computer Vision (2021)

  41. Secretan, J., Beato, N., D’Ambrosio, D.B., Rodriguez, A., Campbell, A., Stanley, K.O.: Picbreeder: evolving pictures collaboratively online. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2008)

  42. Sfikas, K., Liapis, A., Yannakakis, G.N.: Monte Carlo elites: quality-diversity selection as a multi-armed bandit problem. In: Proceedings of the Genetic and Evolutionary Computation Conference (2021)

  43. Sfikas, K., Liapis, A., Yannakakis, G.N.: A general-purpose expressive algorithm for room-based environments. In: Proceedings of the FDG Workshop on Procedural Content Generation (2022)

  44. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: Proceedings of the 32nd International Conference on Machine Learning (2015)

  45. Takagi, H.: Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation. Proc. Inst. Electr. Electron. Eng. 89(9), 1275–1296 (2001)

  46. Touvron, H., et al.: LLaMA: open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023). https://doi.org/10.48550/arXiv.2302.13971

  47. Touvron, H., et al.: LLaMA 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023). https://doi.org/10.48550/arXiv.2307.09288

  48. Vaswani, A., et al.: Attention is all you need. In: Proceedings of the Neural Information Processing Systems Conference (2017)

  49. Viana, B.M.F., Pereira, L.T., Toledo, C.F.M.: Illuminating the space of enemies through MAP-Elites. In: Proceedings of the IEEE Conference on Games (2022). https://doi.org/10.1109/CoG51982.2022.9893621

  50. West, P., Lu, X., Holtzman, A., Bhagavatula, C., Hwang, J.D., Choi, Y.: Reflective decoding: beyond unidirectional generation with off-the-shelf language models. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (2021). https://doi.org/10.18653/v1/2021.acl-long.114

  51. Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015). https://doi.org/10.1109/ICCV.2015.164

  52. Zammit, M., Liapis, A., Yannakakis, G.N.: Seeding diversity into AI art. In: Proceedings of the International Conference on Computational Creativity (2022)

  53. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00068

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 programme under grant agreement No 951911.

Author information

Correspondence to Marvin Zammit.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zammit, M., Liapis, A., Yannakakis, G.N. (2024). MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains. In: Johnson, C., Rebelo, S.M., Santos, I. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2024. Lecture Notes in Computer Science, vol 14633. Springer, Cham. https://doi.org/10.1007/978-3-031-56992-0_26

  • DOI: https://doi.org/10.1007/978-3-031-56992-0_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56991-3

  • Online ISBN: 978-3-031-56992-0

  • eBook Packages: Computer Science, Computer Science (R0)
