This text introduces a new approach to combining conversational AI and graphical user interface (GUI) interaction in mobile apps. It describes the concept of a Natural Language Bar that allows users to interact with the app using their own language. The article provides examples and implementation details for developers. The Natural Language Bar can be applied to various types of apps and offers a user-friendly and efficient way to navigate and interact with the app’s features.
Use OpenAI GPT function calling to drive your mobile app
Introduction
We have developed a revolutionary approach to enhance the user experience of your mobile app by combining Conversational AI and Graphical User Interface (GUI) interaction. Our solution is a Natural Language Bar that sits at the bottom of every screen, allowing users to interact with the entire app from a single entry point. This eliminates the need for users to search for specific tasks and enables them to express their intentions in their own language. The speed and efficiency of the GUI are preserved, while users have the option to choose between language and direct manipulation. We have optimized this concept and even created a sample app for you to try. The full code is available on GitHub, so you can explore the concept further. This article is intended for product owners, UX designers, and mobile developers.
Background
Natural language interfaces and Graphical User Interfaces (GUIs) are crucial for connecting users with computer systems. Natural language allows humans to communicate about a wide range of topics, while pointing and direct manipulation enable communication about specific items in the world. Our approach leverages the interpreting quality of Natural Language Processing (NLP) and function calling to create complete natural language interfaces that minimize misinterpretations. While the current trend is to focus on chat interfaces, we believe that combining natural language and GUI interaction offers the best user experience. This approach is applicable to various apps, including banking, shopping, and travel apps. It simplifies the user journey, ensures users find what they need, and allows them to express their requests naturally.
The Natural Language Bar
Our solution, the Natural Language Bar (NLB), enables users to type or speak their requests. Along with the request, the definitions of all screens in the app are sent to the Large Language Model (LLM) using function calling. The LLM then navigates the GUI based on the user’s intention. For example, in a banking app, if a user requests information about nearby banking offices, the LLM will identify the appropriate screen and display the relevant information. The NLB also supports shorthand expressions and allows users to correct or refine their previous requests. It offers a seamless and intuitive interaction mode.
How it works
When a user asks a question in the Natural Language Bar, a JSON schema is added to the prompt, defining the structure and purposes of all screens and their input elements. The LLM maps the user’s natural language expression to the appropriate screen definition and returns a JSON object that activates the applicable screen. The implementation of the Natural Language Bar is based on LangChain Dart, which allows for prompt engineering on the client side. The code snippet provided demonstrates how screens are activated and navigated based on user requests.
History Panel
The Natural Language Bar includes a collapsible interaction history panel, allowing users to easily refer to previous statements. This panel preserves the interaction history in a compact and organized form, similar to chat interfaces. Previous user statements are displayed in their original language, and system responses are incorporated as hyperlinks that can reactivate the corresponding screens. This feature provides a convenient way to offer customer support and context-sensitive help.
Future
The history panel of the Natural Language Bar can be further enhanced to offer customer support and context-sensitive help. By combining the history trace of user interaction with Retrieval Augmented Generation (RAG) techniques, chatbots can provide answers based on a large body of text content. This feature can provide users with personalized and relevant assistance. The Natural Language Bar also has the potential to be expanded beyond mobile apps and applied to any application with a GUI. As natural language interaction becomes more prevalent, there will be further advancements in AI models and techniques to enhance the user experience.
Conclusion
Our solution, the Natural Language Bar, seamlessly integrates Conversational AI and GUI interaction to optimize the user experience of your mobile app. Users can interact with the app in their own language, while the system intelligently navigates the GUI based on their requests. The Natural Language Bar opens up the full functionality of the app to users, eliminating the need for them to search for specific tasks or learn complex app jargon. With our sample app and available code, you can quickly implement the Natural Language Bar in your own app. Embrace the power of AI and revolutionize your user experience.