AI Design Secrets 🤯: Humans vs. Machines! ✨
Apple researchers have been investigating how generative AI can streamline app development, focusing on user interface creation. A team developed UICoder, an open-source AI model for generating UI code. Recognizing limitations in existing Reinforcement Learning from Human Feedback methods, the team shifted its approach. Twenty-one designers, with varying levels of experience across UI/UX, product, and service design, participated in a study. These designers provided direct feedback – through sketches and edits – on model-generated UIs, creating a dataset of 1,460 annotations. Researchers found that incorporating this designer-native feedback significantly improved the AI’s output, with models trained on sketch improvements outperforming both the base model and those trained with conventional ranking data. Notably, disagreements between researchers and designers regarding UI quality were substantial, highlighting the inherent subjectivity in design evaluation.
UICoder: A New Approach to AI-Generated UI Design
The Apple research team continues to investigate the potential of generative AI to streamline app development. Their latest work, centered around UICoder, demonstrates a novel method for AI to produce functional UI code, prioritizing compilation accuracy and prompt adherence over initial design quality. This project, detailed in a recent publication, represents a significant step towards more reliable and practical AI-assisted UI creation.
Refining UI Generation with Designer Feedback
Existing Reinforcement Learning from Human Feedback (RLHF) techniques proved inadequate for training Large Language Models (LLMs) to consistently generate well-designed UIs. The research team recognized that these methods were not aligned with designer workflows and often overlooked the critical rationale behind UI critiques and improvements. To address this, they pioneered a new approach: directly engaging professional designers to provide feedback on model-generated UIs through comments, sketches, and hands-on revisions. This iterative process was then translated into training data, effectively teaching the UI generator to prioritize layouts and components reflecting real-world design judgment.
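The article does not publish the dataset schema, but the conversion it describes can be sketched as pairing each original generation with its designer-revised counterpart. The field names below are illustrative assumptions, not the team's actual format:

```python
# Hypothetical sketch: turning a designer annotation into a preference pair.
# Field names and structure are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str    # the UI request given to the model
    rejected: str  # original model-generated UI code
    chosen: str    # designer-revised version (from a sketch or direct edit)
    rationale: str # the designer's comment, preserving the "why" behind the fix


def make_pair(prompt: str, generated: str, revised: str, comment: str) -> PreferencePair:
    """Package one annotation as a (rejected, chosen) training example."""
    return PreferencePair(prompt=prompt, rejected=generated,
                          chosen=revised, rationale=comment)
```

Keeping the rationale alongside each pair reflects the point made above: conventional ranking data discards the reasoning behind a critique, whereas designer-native feedback retains it.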
The Design Review Process: A Multi-faceted Approach
Twenty-one designers participated in the study, with professional experience ranging from 2 to over 30 years and spanning diverse design areas including UI/UX, product, and service design. Participants conducted design reviews at frequencies ranging from every few months to multiple times a week. The research team collected 1,460 annotations and converted them into paired UI "preference" examples contrasting the original model-generated interfaces with the designers' enhanced versions. This data was used to train a reward model that assigns higher numerical scores to higher-quality visual designs.
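A common way to train a reward model on such paired examples is a Bradley–Terry-style pairwise objective, which pushes the score of the designer-improved UI above the score of the original generation. The article does not specify the team's exact loss, so this is a generic sketch of the technique:

```python
# Generic Bradley-Terry-style pairwise loss, commonly used for reward models.
# This is an illustrative sketch, not the paper's confirmed objective.
import math


def pairwise_loss(score_improved: float, score_original: float) -> float:
    """-log sigmoid(r_improved - r_original).

    The loss is small when the reward model already scores the
    designer-improved UI above the original, and large otherwise."""
    margin = score_improved - score_original
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Training widens the margin on each annotated pair:
good = pairwise_loss(2.0, 0.5)  # improved UI scored higher -> small loss
bad = pairwise_loss(0.5, 2.0)   # improved UI scored lower  -> large loss
```

Minimizing this loss over all 1,460 pairs calibrates the scorer so that numerical rewards track the designers' judgment rather than generic rankings.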
Automated Rendering and Model Selection
To assign rewards to HTML code, the team utilized an automated rendering pipeline, leveraging browser automation software to transform code into UI screenshots. Apple primarily employed Qwen2.5-Coder as the base model for UI generation, subsequently applying the designer-trained reward model to smaller, newer Qwen variants to assess its generalization capabilities across model sizes and versions. Notably, this framework mirrored a traditional RLHF pipeline, differing only in its learning signal originating from designer-native workflows rather than simple ranking or rating data.
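The article names browser automation but not the specific tool, so the rendering step can be sketched with a common library such as Playwright. The function and file-naming scheme below are assumptions for illustration:

```python
# Hypothetical rendering step using Playwright's sync API.
# The team's actual pipeline and tooling are not specified in the article.
from pathlib import Path


def screenshot_path(sample_id: int, out_dir: str = "renders") -> Path:
    """Deterministic output path for a rendered UI sample (illustrative scheme)."""
    return Path(out_dir) / f"ui_{sample_id:05d}.png"


def render_ui(html: str, sample_id: int) -> Path:
    """Render generated HTML to a PNG screenshot via a headless browser,
    so the reward model can score the visual result rather than raw code."""
    from playwright.sync_api import sync_playwright

    out = screenshot_path(sample_id)
    out.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Phone-sized viewport, since the generated UIs target app screens.
        page = browser.new_page(viewport={"width": 390, "height": 844})
        page.set_content(html)  # load the generated markup directly
        page.screenshot(path=str(out), full_page=True)
        browser.close()
    return out
```

Rendering to screenshots lets a single visual reward model score outputs regardless of which code-generating model produced them, which matters for the cross-model generalization tests described above.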
Evaluating Performance: Sketch Feedback Yields Superior Results
The research demonstrated that models trained with designer-native feedback, particularly incorporating sketches and direct revisions, produced significantly higher-quality UI designs compared to base models and those trained solely on conventional ranking data. The best-performing model – Qwen3-Coder fine-tuned with sketch feedback – outperformed even GPT-5, achieving this result using only 181 sketch annotations. This highlights the efficiency of leveraging small amounts of high-quality expert feedback to enable smaller models to surpass larger, proprietary LLMs in UI generation.
Subjectivity and Design Evaluation Challenges
A key challenge in human-centered design problems, including this research, is managing subjectivity and multiple interpretations of design solutions. This inherent subjectivity can lead to variance in responses, presenting difficulties for widely-used ranking feedback mechanisms. During independent evaluations of UI pairs, researchers agreed with the designers' choices only 49.2% of the time – a near coin flip. However, when designers provided feedback through sketches or direct edits, agreement rates increased substantially to 63.6% and 76.1% respectively, indicating the value of specific, actionable feedback. This underscores the importance of clear communication and detailed input when working with AI-driven design tools.
This article is AI-synthesized from public sources and may not reflect original reporting.