{"id":9649,"date":"2025-01-27T20:29:33","date_gmt":"2025-01-28T01:29:33","guid":{"rendered":"https:\/\/www.inthacity.com\/blog\/uncategorized\/bytedance-ai-dominates-openai-operator-ui-tars-hands-free\/"},"modified":"2025-08-23T19:53:01","modified_gmt":"2025-08-24T00:53:01","slug":"bytedance-ai-dominates-openai-operator-ui-tars-hands-free","status":"publish","type":"post","link":"https:\/\/www.inthacity.com\/blog\/tech\/ai\/bytedance-ai-dominates-openai-operator-ui-tars-hands-free\/","title":{"rendered":"ByteDance\u2019s New AI Dominates OpenAI Operator &#8211; UI-TARS &#8211; Hands-Free AI"},"content":{"rendered":"<p>If you\u2019ve ever dreamed of having a personal assistant that can handle everything from booking flights to editing Photoshop files, your dreams just got a lot closer to reality. Thanks to AI Revolution\u2019s latest video, we\u2019re diving into <strong>UI-TARS<\/strong>, an AI agent that doesn\u2019t just talk: it <em>does<\/em>. This groundbreaking technology is here to revolutionize the way we interact with our devices, and it\u2019s more impressive (and slightly terrifying) than you might imagine.<\/p>\n<p>Developed by <a href=\"https:\/\/www.bytedance.com\/\" title=\"ByteDance\">ByteDance<\/a> in collaboration with <a href=\"https:\/\/www.tsinghua.edu.cn\/en\/\" title=\"Tsinghua University\">Tsinghua University<\/a>, UI-TARS is a native GUI agent that can take control of your computer or phone. Whether you\u2019re on a Mac or PC, this AI can navigate interfaces, perform complex tasks, and even fix its own mistakes. It\u2019s like having a super-smart coworker who never sleeps, never complains, and, most importantly, never messes up your coffee order.<\/p>\n<h2>What Makes UI-TARS So Special?<\/h2>\n<p>UI-TARS is available in two versions: one with 7 billion parameters and another with a whopping 72 billion parameters. 
Trained on a massive dataset of about 50 billion tokens, this AI isn\u2019t just generating text; it\u2019s controlling your graphical user interface (GUI) in real time. Imagine saying, \u201cFind me round-trip flights from Seattle to New York next month,\u201d and watching as UI-TARS opens <a href=\"https:\/\/www.delta.com\/\" title=\"Delta Airlines\">Delta\u2019s website<\/a>, fills out the details, selects the dates, filters by price, and clicks around the site as needed. It\u2019s not just efficient; it\u2019s downright magical.<\/p>\n<p>What sets UI-TARS apart is its \u201cpure vision-based agent\u201d method. Unlike older AI systems that rely on text-based data like HTML or accessibility trees, UI-TARS perceives the screen visually, just like a human. It doesn\u2019t need to peek behind the curtain of code: it sees a screenshot, understands the layout, and interacts with it as a real user would. This makes it incredibly flexible, allowing it to adapt to changes in the interface or platform without missing a beat.<\/p>\n<h3>How Does UI-TARS Outperform the Competition?<\/h3>\n<p>When it comes to benchmarks, UI-TARS is a serious contender. It outperforms giants like <a href=\"https:\/\/openai.com\/\" title=\"OpenAI\">OpenAI\u2019s GPT-4<\/a>, <a href=\"https:\/\/www.anthropic.com\/\" title=\"Anthropic\">Anthropic\u2019s Claude<\/a>, and even <a href=\"https:\/\/deepmind.google\/technologies\/gemini\/\" title=\"Google Gemini\">Google\u2019s Gemini<\/a> on more than 10 different GUI benchmarks. For example, on VisualWebBench, UI-TARS scored 82.8 compared to GPT-4\u2019s 78.5. It also shines on multi-step tasks, like rearranging slides in PowerPoint or customizing mobile app settings: on the OSWorld benchmark it scored 24.6, ahead of Claude\u2019s 22.0.<\/p>\n<p>Part of its success lies in \u201creflection tuning,\u201d a process where the AI corrects its own errors. 
If it tries to install a <a href=\"https:\/\/code.visualstudio.com\/\" title=\"Visual Studio Code\">Visual Studio Code<\/a> extension and something goes wrong, UI Tar notices the glitch, checks if the app is still loading, and adjusts its approach. This iterative feedback process means the AI gets better with every mistake, polishing its performance like a seasoned pro.<\/p>\n\t\t\t<div \n\t\t\tclass=\"yotu-playlist yotuwp yotu-limit-min yotu-limit-max   yotu-thumb-169  yotu-template-grid\" \n\t\t\tdata-page=\"1\"\n\t\t\tid=\"yotuwp-6a14183f8ab4e\"\n\t\t\tdata-yotu=\"6a14183fa5bb6\"\n\t\t\tdata-total=\"1\"\n\t\t\tdata-settings=\"eyJ0eXBlIjoidmlkZW9zIiwiaWQiOiJOZUpsVFd3dEx5ZyIsInBhZ2luYXRpb24iOiJvbiIsInBhZ2l0eXBlIjoicGFnZXIiLCJjb2x1bW4iOiIzIiwicGVyX3BhZ2UiOiIxMiIsInRlbXBsYXRlIjoiZ3JpZCIsInRpdGxlIjoib24iLCJkZXNjcmlwdGlvbiI6Im9uIiwidGh1bWJyYXRpbyI6IjE2OSIsIm1ldGEiOiJvZmYiLCJtZXRhX2RhdGEiOiJvZmYiLCJtZXRhX3Bvc2l0aW9uIjoib2ZmIiwiZGF0ZV9mb3JtYXQiOiJvZmYiLCJtZXRhX2FsaWduIjoib2ZmIiwic3Vic2NyaWJlIjoib2ZmIiwiZHVyYXRpb24iOiJvZmYiLCJtZXRhX2ljb24iOiJvZmYiLCJuZXh0dGV4dCI6IiIsInByZXZ0ZXh0IjoiIiwibG9hZG1vcmV0ZXh0IjoiIiwicGxheWVyIjp7Im1vZGUiOiJsYXJnZSIsIndpZHRoIjoiNjAwIiwic2Nyb2xsaW5nIjoiMTAwIiwiYXV0b3BsYXkiOjAsImNvbnRyb2xzIjoxLCJtb2Rlc3RicmFuZGluZyI6MSwibG9vcCI6MCwiYXV0b25leHQiOjAsInNob3dpbmZvIjoxLCJyZWwiOjEsInBsYXlpbmciOjAsInBsYXlpbmdfZGVzY3JpcHRpb24iOjAsInRodW1ibmFpbHMiOjAsImNjX2xvYWRfcG9saWN5IjoiMSIsImNjX2xhbmdfcHJlZiI6IjEiLCJobCI6IiIsIml2X2xvYWRfcG9saWN5IjoiMSJ9LCJsYXN0X3RhYiI6ImFwaSIsInVzZV9hc19tb2RhbCI6Im9mZiIsIm1vZGFsX2lkIjoib2ZmIiwibGFzdF91cGRhdGUiOiIxNjcyNzU1MzE5Iiwic3R5bGluZyI6eyJwYWdlcl9sYXlvdXQiOiJkZWZhdWx0IiwiYnV0dG9uIjoiMSIsImJ1dHRvbl9jb2xvciI6IiIsImJ1dHRvbl9iZ19jb2xvciI6IiIsImJ1dHRvbl9jb2xvcl9ob3ZlciI6IiIsImJ1dHRvbl9iZ19jb2xvcl9ob3ZlciI6IiIsInZpZGVvX3N0eWxlIjoiIiwicGxheWljb25fY29sb3IiOiIiLCJob3Zlcl9pY29uIjoiIiwiZ2FsbGVyeV9iZyI6IiJ9LCJlZmZlY3RzIjp7InZpZGVvX2JveCI6IiIsImZsaXBfZWZmZWN0IjoiIn0sImdhbGxlcnlfaWQiOiI2YTE0MTgzZjhhYjRlIn0=\"\n\t\t\tdata-player=\"large\"\n\t\t\t
data-showdesc=\"on\" >\n\t\t\t\t<div>\n\t\t\t\t\t\t\t\t\t\t<div class=\"yotu-wrapper-player\" style=\"width:600px\">\n\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"yotu-player\">\n\t\t\t\t\t\t\t<div class=\"yotu-video-placeholder\" id=\"yotu-player-6a14183fa5bb6\"><\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<div class=\"yotu-playing-status\"><\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\n\t\t\t\t\t<div class=\"yotu-pagination yotu-hide yotu-pager_layout-default yotu-pagination-top\">\n<a href=\"#\" class=\"yotu-pagination-prev yotu-button-prs yotu-button-prs-1\" data-page=\"prev\">Prev<\/a>\n<span class=\"yotu-pagination-current\">1<\/span> <span>of<\/span> <span class=\"yotu-pagination-total\">1<\/span>\n<a href=\"#\" class=\"yotu-pagination-next yotu-button-prs yotu-button-prs-1\" data-page=\"next\">Next<\/a>\n<\/div>\n<div class=\"yotu-videos yotu-mode-grid yotu-column-3 yotu-player-mode-large\">\n\t<ul>\n\t\t\t\t\t<li class=\" yotu-first yotu-last\">\n\t\t\t\t\t\t\t\t<a href=\"#NeJlTWwtLyg\" class=\"yotu-video\" data-videoid=\"NeJlTWwtLyg\" data-title=\"This Open Source AI Is Scaring the Biggest Tech Companies\" title=\"This Open Source AI Is Scaring the Biggest Tech Companies\">\n\t\t\t\t\t<div class=\"yotu-video-thumb-wrp\">\n\t\t\t\t\t\t<div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img  title=\"\" decoding=\"async\" class=\"yotu-video-thumb\" src=\"https:\/\/i.ytimg.com\/vi\/NeJlTWwtLyg\/sddefault.jpg\"  alt=\"sddefault ByteDance\u2019s New AI Dominates OpenAI Operator - UI-TARS - Hands-Free AI\" >\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<h3 class=\"yotu-video-title\">This Open Source AI Is Scaring the Biggest Tech Companies<\/h3>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"yotu-video-description\"><\/div>\n\t\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t<\/li>\n\t\t\t\t\n\t\t\t\t<\/ul>\n<\/div><div class=\"yotu-pagination yotu-hide yotu-pager_layout-default yotu-pagination-bottom\">\n<a href=\"#\" class=\"yotu-pagination-prev yotu-button-prs 
yotu-button-prs-1\" data-page=\"prev\">Prev<\/a>\n<span class=\"yotu-pagination-current\">1<\/span> <span>of<\/span> <span class=\"yotu-pagination-total\">1<\/span>\n<a href=\"#\" class=\"yotu-pagination-next yotu-button-prs yotu-button-prs-1\" data-page=\"next\">Next<\/a>\n<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t\n<h2>The Tech Behind UI-TARS<\/h2>\n<p>So, how does ByteDance manage to train such a massive system? With GPU export restrictions in place, ByteDance focused on algorithmic breakthroughs rather than brute force. They harnessed synthetic data replays, user interactions, and crawled tutorials to create a pipeline that collects screenshots from a variety of websites and apps. Each screenshot is annotated with bounding boxes and extracted text for its elements, and the actions are merged into a unified action space. In short, they\u2019ve taught UI-TARS to see, reason, and act like a human.<\/p>\n<p>But it\u2019s not just about perception and action: UI-TARS excels at memory, too. It combines short-term memory for immediate tasks with long-term memory encoded in its parameters. This allows it to reason through complex workflows, like opening a settings panel before proceeding or trying a different approach after a failed attempt. It\u2019s a blend of quick, intuitive thinking (System 1) and methodical, reflective planning (System 2).<\/p>\n<h3>Why This Is a Game-Changer<\/h3>\n<p>UI-TARS isn\u2019t just a tool; it\u2019s a paradigm shift. From OS-level control to chaining tasks across different software, this AI has the potential to transform personal and business workflows. Picture hooking UI-TARS up with a coding AI to handle everything from writing code to deploying it. Or imagine it managing your emails, apps, and even your calendar. The possibilities are endless.<\/p>\n<p>And here\u2019s the kicker: <a href=\"https:\/\/github.com\/t\/u-tars\" title=\"UI Tar GitHub Repository\">UI-TARS is open source<\/a>. 
Developers can tweak it, build new tasks, or even integrate it into their own projects. ByteDance is essentially handing us the keys to the AI kingdom, and it\u2019s up to us to drive.<\/p>\n<h2>The Future of AI Agents<\/h2>\n<p>UI-TARS represents a giant leap toward what researchers call \u201cactive and lifelong agents.\u201d These are AIs that can learn from their environment in real time, set their own tasks, and improve without constant retraining. It\u2019s a future where AI doesn\u2019t just assist; it <em>evolves<\/em>. And while we\u2019re not quite there yet, UI-TARS is a major step in that direction.<\/p>\n<p>So, what does this mean for Apple, Microsoft, and other tech giants? With ByteDance pushing the boundaries, they\u2019ll need to step up their game. As of 2025, there\u2019s still no native AI that can run seamlessly across iOS or macOS the way UI-TARS can. Apple, are you listening?<\/p>\n<h3>Get Your Hands on UI-TARS<\/h3>\n<p>If you\u2019re ready to let an AI take the wheel, you can download UI-TARS from its <a href=\"https:\/\/github.com\/t\/u-tars\" title=\"UI Tar GitHub Repository\">GitHub repository<\/a> or check out the desktop app. It\u2019s like having an invisible, digital human looking over your shoulder, only faster and less prone to random mistakes than your average coworker.<\/p>\n<h2>Final Thoughts<\/h2>\n<p>UI-TARS is more than just an AI; it\u2019s a glimpse into the future of human-computer interaction. It\u2019s fast, efficient, and eerily intelligent. But it also raises questions. What happens when AI agents like UI-TARS become omnipresent? How do we ensure they\u2019re used ethically and responsibly? And are we ready to hand over control of our workflows to machines?<\/p>\n<p>What do you think about UI-TARS? Would you trust it to handle your tasks? Let\u2019s discuss in the comments below. 
And if you\u2019re excited about the future of AI, why not join the <a href=\"https:\/\/www.inthacity.com\/blog\/newsletter\/\" title=\"iNthacity Community\">iNthacity community<\/a>? We\u2019re building a \u201cShining City on the Web\u201d where innovation and conversation thrive. Become a resident, share your thoughts, and let\u2019s shape the future together.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>UI-TARS is an AI agent that can take control of your computer or phone, performing tasks like booking flights or editing Photoshop files. It uses a \u201cpure vision-based agent\u201d method to interact with screens visually.<\/p>\n","protected":false},"author":2,"featured_media":9648,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[348,270],"tags":[350,268,293],"class_list":["post-9649","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-agi","category-ai","tag-agi","tag-ai","tag-technology"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/www.inthacity.com\/blog\/wp-content\/uploads\/2025\/01\/feature_image_1738027769.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/posts\/9649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/comments?post=9649"}],"version-history":[{"count":0,"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/posts\/9649\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/media\/964
8"}],"wp:attachment":[{"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/media?parent=9649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/categories?post=9649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inthacity.com\/blog\/wp-json\/wp\/v2\/tags?post=9649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}