When Ben Wu, an engineer in China, wanted to install Facebook’s open-source AI framework PyTorch in 2017, he visited its online community on GitHub and asked for some pointers.
Soumith Chintala, a Facebook AI research engineer based in New York, showed him how he could download it quickly.
PyTorch has become a foundational component of AI technology, thanks in large part to knowledge-sharing exchanges like the one between Wu and Chintala, which happen every day. And although the borderless open-source software movement has become increasingly corporatized, it has risen above geopolitical tensions between China and the U.S., which have centered on concerns over China’s use of AI to carry out repressive surveillance, its plans to transfer civilian tech for military applications, and Chinese government espionage and intellectual property theft.
“I’m definitely surprised at how much [of the] general global considerations you would have from a business angle don’t really come in when you’re talking about open-source collaboration, especially with AI,” Chintala told Protocol in September, when Facebook parent company Meta handed PyTorch over to the Linux Foundation, the nonprofit open-source software consortium.
“Within open-source software, the political doesn’t even start entering into play [until] much later,” Chintala said. “People are mostly trying to learn from each other, build the best thing they can.”
Cold War and space race themes are tossed around freely when people discuss the economic, national security, and human-rights implications of China’s AI advancements. But the contrast between how yesterday’s energy- and space-related technologies were built and how today’s AI technology is produced is striking.
In fact, the way modern AI technologies are developed shows there is no race for one country to win. On the contrary, the AI industry has skyrocketed because a global community has built it together, brick by digital brick.
“I don’t think we’d have the kind of machine learning boom we are having without open source. I just don’t think it would have been possible,” said Kevin Goldsmith, chief technology officer at Anaconda, a company that provides software tools based on the Python programming language — considered the lingua franca of AI — along with other open-source components used to build machine learning and AI-based projects. “If this was all proprietary solutions being sold, it never would have happened,” he said.
Listen: Kate Kaye talks with Kevin Goldsmith about global, open-source AI collaboration.
For decades, AI researchers have disseminated their technical achievements in scientific papers. A March 2022 report from the Stanford Institute for Human-Centered Artificial Intelligence showed that, between 2010 and 2021, the largest number of international AI research and development collaborations were between people in the U.S. and China.
China’s tech advancements pose some legitimate concerns when it comes to human rights and the potential to supercharge the country’s military capabilities. However, as the U.S. government makes drastic moves to stall China’s AI progress, broad restrictions on technical components of AI could have unintended consequences affecting global AI research and business.
“The U.S. policy community doesn’t understand how radically open AI research is today,” said Matt Sheehan, a fellow in the Asia Program at the Carnegie Endowment for International Peace and author of “The Transpacific Experiment,” a book about the connections between Silicon Valley and China.
“Most of the biggest AI advances aren’t closely guarded secrets that the Chinese government could steal — they’re already out there for use by anybody with data, compute, and machine learning skills,” Sheehan said.
Still, even as global open-source AI projects spanning the U.S. and China remain active, efforts by both governments to thwart technical collaboration between the two countries could dampen the vibrant ecosystem.
For instance, NASA plans to use semiconductors incorporating open-source tech built by collaborators in the U.S. and China including Google, Meta, Alibaba, and Huawei for missions to the moon and Mars. But at the same time, the U.S. government is cracking down on sales of AI-enabling semiconductor technology to China in hopes of damaging the growth of its nascent chip-manufacturing industry.
If there is a more deliberate tech disentanglement on the horizon, the global knots of open-source AI will not come apart easily.
“There’s no definition of open source that talks about national boundaries. It’s either open or it’s not open,” Goldsmith said.
The stateless mashups advancing AI
Modern AI technologies are a melting pot of shared foundational elements, including free bundles of code, data sets, data architectures, and prebuilt machine learning models that, when cobbled together and customized, create AI tools and products. This open, shared approach has fostered advances in machine learning, deep learning, computer vision, object and speech recognition, natural-language processing, and semiconductor chip technology.
“If you think of the epicenters around some of this, it’s not all in the U.S. There are strong pockets of really advanced work coming out of Europe, coming out of Canada, [and] coming out of China as well,” Goldsmith said. “And it’s all cross-border collaboration.”
China’s government appears to recognize this. The China Academy of Information and Communications Technology, a think tank under China’s Ministry of Industry and Information Technology, published a paper in February assessing the evolution of AI frameworks including TensorFlow, PyTorch, PaddlePaddle, and Huawei’s MindSpore through what the paper refers to as the “budding stage,” “growth stage,” and “stability stage” of each.
It recognized TensorFlow and PyTorch as a “duopoly” among “foreign” AI frameworks that “provide ecosystem-level output capabilities for Chinese AI applications.” It also recognized the growth of homegrown frameworks such as PaddlePaddle and MindSpore.
“The AI framework is the operating system of the smart economy era,” the paper stated, according to a translation by Jeff Ding, an assistant professor of political science at George Washington University, who publishes a newsletter focused on AI in China.
“The next ten years will be a golden period for the global development of the digital economy and the entry of an intelligent economy and society. Focusing on the development of artificial intelligence infrastructure will provide strong traction for the development of China’s AI industry and the vigorous development of the digital economy,” the paper said.
“The [Chinese Communist Party] wants self-sufficiency. They don’t want to be dependent on Western technology,” said Alex Capri, a researcher and consultant studying U.S.-China trade flows and tech competition who teaches at the National University of Singapore Business School.
But China is not simply building applications on top of open-source AI frameworks created by U.S. technologists. Its AI researchers are combining open-source components from the U.S., China, and around the world to produce new technologies.
For instance, Chinese search giant Baidu contributed a deep learning project to the Linux Foundation in 2018 that blends its open-source AI framework PaddlePaddle with TensorFlow and Kubernetes, both developed and open-sourced by Google.
Meanwhile, Alibaba released open-source code for a recommendation engine based on Google’s TensorFlow and crafted with help from Intel and AI chipmaker Nvidia.
As open-source projects gain adoption, other companies are investing in ways to support them. Take Huawei’s Volcano, an open-source batch scheduling system for Kubernetes designed for machine learning and deep learning workloads. In June, Apache Spark, the popular data-processing engine whose creators went on to found Databricks, added Volcano as a built-in batch scheduler for running Spark on Kubernetes. Contributors from Apple, Cloudera, Databricks, Huawei, and Netflix all worked on the project.
Microsoft, which has operated an important research lab in China for decades, has an AI collaboration agreement with ByteDance, the parent company of China’s massively popular social media export TikTok. In August, software engineers from the two companies discussed an open-source machine learning project incorporating Kubernetes and Ray, an AI platform from San Francisco-based Anyscale.
“We’re not from the same company, but we meet every week. We collaborate every week,” said Microsoft principal software engineer Ali Kanso, about his work with ByteDance software engineer Jiaxin Shan.
Microsoft’s GitHub, arguably the world’s busiest online public square for exchanging AI ideas, is home to many of these collaborations.
While GitHub is a source of knowledge and tech support for China’s AI researchers and industry developers, China’s government has indicated it wants to untether the country’s tech developers from the site. In 2020, China’s Ministry of Industry and Information Technology began backing Gitee, a GitHub alternative for open-source tech sharing. Earlier, in 2013, GitHub was temporarily blocked in China.
“In the past two or three years there [have been] a lot of connection resets and connection issues on GitHub,” an AI researcher based in Beijing, who asked not to be named for fear of government retaliation, told Protocol. However, the researcher said people use VPNs to get around the blocks. “Even if they shut down connections, there is always another way around,” the researcher said.
Already, Gitee users have seen government intervention that some worry amounts to censorship. In May, developers in China were blocked from accessing Gitee, and its operators notified users that code posted to the site would be reviewed from then on before being published.
“It seems that the government there is very supportive of open source, but whether that becomes kind of open source within China for China, versus being more globally present as a contributor, is unclear,” Goldsmith said.
Google schools the U.S. Commerce Department
Increasingly, the U.S. government and advocates of stronger protections for U.S. AI technology want to block the flow of advanced AI-related technologies to China. Among their chief concerns is China’s development of “dual use” AI: technology created for civilian use that could also have military or criminal applications.
“Since about 2019 going forward, AI is very much a technology that is in the midst of this U.S.-China tech competition because of its potential dual uses and its potential misuses,” said Paul Triolo, senior vice president focused on China at global strategic consultancy Albright Stonebridge Group.
In February, the White House published a list of critical and emerging technologies that could be used to inform national security-related activities, such as new export controls or cross-border investment reviews. But when the U.S. Commerce Department previously included AI on a similar list of tech that could be subject to export controls, purveyors of open-source AI technologies balked. Among them was Google, which in addition to TensorFlow and Kubernetes has open-sourced large language models such as BERT.
“There was interestingly a lot of pushback because of all of the things that they mentioned around AI, a lot of it was identifying algorithms that were open source. So the industry [questioned], ‘Why would we want to control these, how could you control open-source algorithms?’” Triolo said.
In a lengthy letter sent to the Commerce Department in 2019, Google mentioned PyTorch, Baidu’s PaddlePaddle, Microsoft’s Cognitive Toolkit, and its own TensorFlow as openly available machine learning libraries.
“These examples point to the fact that the information-sharing ecosystem for AI development is inherently international, with joint development occurring simultaneously across borders, and with a significant open-source culture. U.S. persons in the United States, working for companies with U.S. offices only, do not have a monopoly on such technology,” the company wrote.
No discussion about the ingredients of AI would be complete without mentioning semiconductors, the hardware behind the high-performance computing necessary to train machine and deep learning models. While AI software that’s been assembled from multiple open-source components could prove difficult for governments to monitor or restrict, some semiconductor technologies may be easier for governments to wrangle.
The U.S. government, for example, has severely restricted sales of advanced U.S. semiconductor technology to China in an effort to curtail China’s AI research, development, and business opportunities, as well as its use of AI for surveillance and military applications.
People who need advanced chips to process data and train large machine learning models may be the first to feel the effects of the ban. “That will have a big impact on the researchers and engineers in China,” said Yang You, founder of open-source AI optimizing software company HPC-AI Tech. “Nvidia GPUs in China will be older than the Nvidia GPUs in the U.S. Basically, they are using a worse product,” You said.
However, a borderless collaboration effort to build an open-source chip architecture is gaining some steam. The RISC-V project involves large tech companies in the U.S. and China, including Google, Meta’s Oculus, IBM, Nvidia, Intel, Alibaba, and Huawei.
Chipmaker SiFive said in September it will build RISC-V chips in the U.S. for NASA, and U.S. chip giant Intel has begun building RISC-V chips and supporting the RISC-V architecture. Some inside China see the project as one that could help the country become more self-sufficient in its semiconductor supply; however, at this early stage, RISC-V is not mature enough to replace chips based on foreign hardware technology.
RISC-V International declined to comment for this story or to provide any updates regarding its partnerships. The group makes a point of stating on its website that it “does not take a political position on behalf of any geography.” Formerly the RISC-V Foundation, it is incorporated in Switzerland as RISC-V International Association.
Semiconductor technology can heavily influence how AI is built and which components companies invest in, said Davis Sawyer, co-founder and chief product officer of Canada’s Deeplite, which provides software that compresses AI models so they can run on devices such as phones or vehicles. “If the chip doesn’t support something, you can’t build [with] it,” Sawyer said.
A bad breakup?
If tensions between the U.S. and China continue to escalate, a slow, arduous process of U.S.-China technology and business detachment may be ahead — one that Capri said will be “messy.”
“It’s not a zero-sum outcome,” he said. “After three-plus decades of entanglement and integration, it’s very difficult to disentangle. It’s not like pulling the plug on something overnight and the light goes off.”
And if the U.S. government were to demand broader detachment from China’s technology economy, it might not work, said Ding, who studies AI in China and how it affects China’s power balance with the U.S.
“From a high level, decoupling as a broad brush, overall strategy — the fact that AI development is so globalized — renders that broad brush overall strategy relatively infeasible,” Ding said.
Whether future laws will cover research and development collaborations that take place on GitHub remains to be seen, but restricting them could backfire.
“There are thoughtful people looking into situations where that openness can be harmful or dangerous. But if the U.S. government comes in and tells AI scientists they can’t publish like this anymore, we’re going to see a huge backlash that could do major damage to U.S. competitiveness,” Sheehan said.
Original post: https://www.protocol.com/enterprise/china-us-ai-open-source