A desert robot shows the enormous possibilities of AI

When Hongzhi Gao was young, he lived with his family in Gansu, a province in north-central China bordering the Tengger Desert. When he thinks back to his childhood, he remembers the constant gusts of dust outside their house, and that in most months of the year it took no more than a minute after stepping outside before sand filled every empty space, creeping into his pockets, his boots, and his mouth. The monotony of the desert stuck in his head for years, and at university he turned that memory into an idea: to build a machine that could bring plant life to the desert landscape.

Efforts to stop desertification, the process by which fertile soil turns into desert, have focused primarily on expensive manual solutions. Hongzhi designed a robot that uses deep learning to automate the process of tree planting, from identifying optimal locations to planting seedlings to watering them. Despite having no experience with artificial intelligence, Hongzhi, then an undergraduate student, used Baidu’s deep learning platform PaddlePaddle to stitch various modules together and build a robot with better object detection capability than similar machines already on the market. It took Hongzhi and his friends less than a year to finish the product and get it running.

Hongzhi’s desert robot serves as a telling example of the increasing availability of artificial intelligence.

Today, more than four million developers use Baidu’s open source AI technology to build solutions that can improve the lives of people in their communities, and many of them have little or no technical expertise in the field. “Within the next decade, artificial intelligence will be the source of change that takes place across all structures in our society and transforms how industries and companies function. Technology will expand the human experience by taking us on a deeper dive into the digital world,” said Baidu CEO Robin Li at Baidu Create 2021, an AI developer conference.

As we enter a new chapter in the development of AI, Haifeng Wang, CTO of Baidu, has identified two key trends supporting the industry’s path forward. AI will continue to mature and grow in technical complexity. At the same time, the cost of implementation and the barrier to entry will fall, benefiting both companies building large-scale AI-powered solutions and software developers exploring the AI world.

Fusion of knowledge and data with deep learning

The integration of knowledge and data with deep learning has significantly improved the efficiency and accuracy of AI models. Since 2011, Baidu’s AI infrastructure has been collecting and integrating new information into a large-scale knowledge graph. Currently, this knowledge graph has more than 550 billion facts covering all aspects of everyday life, as well as industry-specific topics including manufacturing, pharmaceuticals, law, financial services, technology and media and entertainment.
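To make the idea of a knowledge graph concrete, here is a minimal sketch in Python of one common way to represent such "facts": as (subject, relation, object) triples. It is an illustration only and says nothing about how Baidu's system is actually built. The `TripleStore` class, its method names, and the relation names are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy knowledge graph (hypothetical): facts are (subject, relation, object) triples."""

    def __init__(self):
        # Index facts by subject so lookups do not scan every fact.
        self._by_subject = defaultdict(set)

    def add(self, subject, relation, obj):
        # Sets silently ignore duplicates, so repeated facts are stored once.
        self._by_subject[subject].add((relation, obj))

    def query(self, subject, relation):
        # Return all objects linked to `subject` by `relation`, sorted for stable output.
        facts = self._by_subject.get(subject, set())
        return sorted(o for r, o in facts if r == relation)

    def __len__(self):
        # Total number of distinct facts in the graph.
        return sum(len(facts) for facts in self._by_subject.values())


kg = TripleStore()
kg.add("Gansu", "is_a", "province")
kg.add("Gansu", "located_in", "China")
kg.add("Tengger Desert", "is_a", "desert")
kg.add("Gansu", "located_in", "China")  # duplicate fact, stored only once

print(len(kg))                           # number of distinct facts
print(kg.query("Gansu", "located_in"))   # objects related to Gansu by "located_in"
```

A production knowledge graph with hundreds of billions of facts would need distributed storage and much richer indexing. The sketch only shows the shape of the data.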

This knowledge graph, together with massive volumes of data, forms the building blocks of Baidu’s newly released pre-trained language model PCL-BAIDU Wenxin (version ERNIE 3.0 Titan). The model outperforms other language models that lack knowledge graphs on 60 natural language processing (NLP) tasks, including reading comprehension, text classification, and semantic similarity.

Learning across modalities

Cross-modal learning is a new area of AI research that seeks to improve machines’ cognitive understanding and better mimic human adaptive behavior. Examples of research efforts in this area include automatic text-to-image synthesis, where a model is trained to generate images solely from text descriptions, as well as algorithms built to understand visual content and express this understanding in words. The challenge with these tasks is for the machines to build semantic connections across different types of datasets (e.g., images and text) and understand the interdependence between them.

The next step for AI is to merge AI technologies such as computer vision, speech recognition and natural language processing to create a multimodal system.

On this front, Baidu has rolled out a variant of its NLP models that links language and visual semantic understanding. Examples of real-world applications for this type of model include digital avatars that can perceive their surroundings as humans do and handle customer support for companies, and algorithms that can “draw” works of art and compose poems based on their understanding of the generated works of art.

There are even more creative and impactful potential uses for this technology. Because the PaddlePaddle platform can build semantic connections across vision and language, a group of master’s students in China were prompted to create a dictionary that helps preserve endangered languages in regions such as Yunnan and Guangxi by making them easier to translate into Simplified Chinese.

AI integration across software and hardware and in industry-specific use cases

As AI systems are used to solve increasingly complex and industry-specific problems, greater emphasis is placed on optimizing the software (deep learning framework) and the hardware (AI chip) as a whole, instead of optimizing each one individually, taking into account factors such as computing power, power consumption, and latency.

Furthermore, a great deal of innovation is taking place on the platform layer of Baidu’s AI infrastructure, where third-party developers use the deep learning features to build new applications tailored to specific use cases. The PaddlePaddle platform has a number of APIs to support AI applications in emerging fields such as quantum computation, life sciences, computational fluid mechanics, and molecular dynamics.

AI also has practical applications. For example, in Shouguang, a small town in Shandong Province, artificial intelligence is being used to streamline the fruit and vegetable industry. It takes only two people and one app to manage dozens of vegetable sheds.

This is remarkable, says Wang: “Despite the increasing complexity of AI technology, the open source deep learning platform brings together the processor and applications like an operating system, reducing barriers to entry for companies and individuals wishing to incorporate AI into their business.”

Lower barriers to entry for developers and end users

On the technology front, pre-training of large models such as PCL-BAIDU Wenxin (version ERNIE 3.0 Titan) has removed many common bottlenecks that traditional models face. For example, these general models have helped lay the foundation for running different types of downstream NLP tasks, such as text classification and question answering, in one consolidated place, whereas previously each type of task had to be solved by a separate model.
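The pattern of one general model serving several downstream tasks can be sketched in a few lines of Python. The example is deliberately a toy: `encode` is a bag-of-words stand-in for a pre-trained model, not ERNIE or any PaddlePaddle API, and the two "task heads" (`classify` and `similarity`) are hypothetical helpers. What it shows is that both tasks reuse the same shared representation instead of each needing its own model.

```python
import math
from collections import Counter


def encode(text):
    """Stand-in for a shared pre-trained encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, label_examples):
    """Task head 1: text classification by closest label description."""
    vec = encode(text)
    return max(label_examples, key=lambda label: cosine(vec, encode(label_examples[label])))


def similarity(text_a, text_b):
    """Task head 2: semantic similarity, reusing the same encoder."""
    return cosine(encode(text_a), encode(text_b))


labels = {
    "environment": "desert trees soil planting",
    "finance": "bank loan interest money",
}
print(classify("the robot plants trees in the desert", labels))
print(similarity("planting trees", "planting trees in sand"))
```

In a real system the encoder would be a large pre-trained network and each head a small layer fine-tuned on task data. The division of labor is the same.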

PaddlePaddle also has a number of developer-friendly tools, such as model compression technologies, for customizing the general models to suit more specific applications. The platform provides an officially supported library of more than 400 industrial-grade models, ranging from large to small, which are only a fraction of the size of the general models but can achieve comparable performance, reducing model development and implementation costs.

Today, Baidu’s open source deep learning technology supports a community of more than four million AI developers, who together have created 476,000 models, contributing to the AI-driven transformation of 157,000 companies and institutions. The examples listed above are the result of innovations that take place across all layers of the Baidu AI infrastructure, which integrates technologies such as voice recognition, computer vision, AR/VR, knowledge graphs, and large model training, bringing AI one step closer to perceiving the world as humans do.

In its current state, AI has reached a level of maturity that enables it to perform amazing tasks. For example, the recent launch of the metaverse XiRang would not have been possible without PaddlePaddle’s platform for creating digital avatars, which participants around the world can connect to from their devices. Furthermore, future breakthroughs in areas such as quantum computing could significantly improve the performance of the metaverse. This shows how Baidu’s various offerings are intertwined and interdependent.

In a few years, artificial intelligence will be close to the core of our human experience. It will be for our society what steam power, electricity, and the internet were for previous generations. As artificial intelligence becomes more complex, developers like Hongzhi will work more like artists and designers, given the creative freedom to explore use cases that were previously only theoretically possible. There are no boundaries.

This content was produced by Baidu. It was not written by the editorial staff of MIT Technology Review.