DeepSeek: The Chinese AI Model That's a Tech Breakthrough and a Security Risk


DeepSeek: at this stage, the only takeaway is that open-source models surpass proprietary ones. Everything else is problematic and I don't buy the public numbers.

DeepSeek was built on top of open-source Meta technology (PyTorch, Llama), and ClosedAI is now in danger because its valuation is outrageous.

To my knowledge, no public documentation links DeepSeek directly to a specific "Test Time Scaling" technique, but it's highly probable, so allow me to simplify.

Test Time Scaling is used in machine learning to scale the model's performance at test time rather than during training.

That means fewer GPU hours and less powerful chips.

Simply put, lower computational requirements and lower hardware costs.
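To make the idea concrete, here is a minimal sketch of one common form of test-time scaling, self-consistency via majority voting: instead of a bigger model, you spend extra compute at inference by sampling several candidate answers and keeping the most frequent one. The `generate` function is a hypothetical stand-in for any LLM sampling call, not DeepSeek's actual method.

```python
# Minimal sketch of one form of test-time scaling: self-consistency /
# majority voting. Extra compute is spent at inference time by sampling
# several candidate answers and keeping the most frequent one.
# `generate` is a hypothetical placeholder, NOT DeepSeek's actual method.
from collections import Counter
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder for a single sampled model answer (here: noisy arithmetic)."""
    return str(7 * 6 + random.choice([0, 0, 0, 1, -1]))

def answer_with_test_time_scaling(prompt: str, n_samples: int = 16) -> str:
    # More samples -> more inference compute -> usually a better final answer.
    candidates = [generate(prompt) for _ in range(n_samples)]
    most_common, _count = Counter(candidates).most_common(1)[0]
    return most_common

print(answer_with_test_time_scaling("What is 7 * 6?"))  # usually "42"
```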

That's why Nvidia lost nearly $600 billion in market cap, the biggest one-day loss in U.S. history!

Many individuals and institutions who shorted American AI stocks became incredibly rich in a few hours, because investors now project we will need less powerful AI chips ...

Nvidia short-sellers just made a single-day profit of $6.56 billion according to research from S3 Partners. Nothing compared to the market cap; I'm looking at the single-day amount. More than $6 billion in less than 12 hours is a lot in my book. And that's just for Nvidia. Short sellers of chipmaker Broadcom made more than $2 billion in profits in a few hours (the US stock market runs from 9:30 AM to 4:00 PM EST).

The Nvidia Short Interest Over Time data shows we had the second highest level in January 2025 at $39B, but this is dated because the last record date was Jan 15, 2025, so we have to wait for the latest data!

A tweet I saw 13 hours after publishing my post! Perfect summary.

Distilled language models

Small language models are trained at a smaller scale. What makes them different isn't just the capabilities, it is how they have been built. A distilled language model is a smaller, more efficient model created by transferring the knowledge from a bigger, more complex model like the future ChatGPT 5.

Imagine we have a teacher model (GPT5), which is a large language model: a deep neural network trained on a lot of data. It is highly resource-intensive when there's limited computational power or when you need speed.

The knowledge from this teacher model is then "distilled" into a student model. The student model is simpler and has fewer parameters/layers, which makes it lighter: less memory usage and lower computational demands.

During distillation, the student model is trained not only on the raw data but also on the outputs, or "soft targets" (probabilities for each class rather than hard labels), produced by the teacher model.

With distillation, the student model learns from both the original data and the detailed predictions (the "soft targets") made by the teacher model.

In other words, the student model doesn't just learn from the "soft targets" but also from the same training data used for the teacher, with the guidance of the teacher's outputs. That's how knowledge transfer is optimized: dual learning from the data and from the teacher's predictions!

Ultimately, the student mimics the teacher's decision-making process ... all while using much less computational power!
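For readers who want to see what this looks like in practice, here is a minimal PyTorch sketch of standard knowledge distillation as described above: the student trains on the hard labels plus a KL-divergence term toward the teacher's temperature-softened probabilities. The model sizes, temperature, and loss weighting are illustrative assumptions, not DeepSeek's actual recipe.

```python
# Minimal PyTorch sketch of knowledge distillation: the student learns from
# the hard labels AND from the teacher's temperature-softened probabilities
# ("soft targets"). Sizes, temperature and weighting are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label term: standard cross-entropy on the original training data.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-target term: KL divergence to the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft

x = torch.randn(32, 128)              # a batch of training inputs
labels = torch.randint(0, 10, (32,))  # the hard labels
with torch.no_grad():
    teacher_logits = teacher(x)       # the teacher's predictions ("soft targets")
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
```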

But here's the twist as I understand it: DeepSeek didn't just extract content from a single large language model like ChatGPT 4. It relied on many large language models, including open-source ones like Meta's Llama.

So now we are distilling not one LLM but multiple LLMs. That was one of the "genius" ideas: mixing different architectures and datasets to create a seriously versatile and robust small language model!
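One simple way to distill from several teachers at once (my reading of the idea above, not a confirmed recipe) is to average the teachers' softened probability distributions and train the student against that blended target, reusing the same KL term as before:

```python
# Sketch of one way to distill from several teachers at once: average the
# teachers' softened probability distributions, then train the student
# against that blended target. Illustrative only, not DeepSeek's recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

teachers = [nn.Linear(128, 10).eval() for _ in range(3)]  # stand-ins for several LLMs
student = nn.Linear(128, 10)

def multi_teacher_soft_targets(x, T=2.0):
    with torch.no_grad():
        probs = [F.softmax(t(x) / T, dim=-1) for t in teachers]
    return torch.stack(probs).mean(dim=0)   # blended soft targets

x = torch.randn(16, 128)
blended = multi_teacher_soft_targets(x)
loss = F.kl_div(F.log_softmax(student(x) / 2.0, dim=-1), blended, reduction="batchmean")
loss.backward()
```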

DeepSeek: Less supervision

Another key innovation: less human supervision/guidance.

The question is: how far can models go with less human-labeled data?

R1-Zero learned "reasoning" capabilities through trial and error; it evolves; it has distinct "reasoning behaviors" which can lead to noise, endless repetition, and language mixing.

R1-Zero was experimental: there was no initial guidance from labeled data.

DeepSeek-R1 is different: it used a structured training pipeline that includes both supervised fine-tuning and reinforcement learning (RL). It started with initial fine-tuning, followed by RL to refine and enhance its reasoning capabilities.

The end result? Less noise and no language mixing, unlike R1-Zero.

R1 uses human-like reasoning patterns first, and it then advances through RL. The innovation here is less human-labeled data + RL to both guide and refine the model's performance.
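To illustrate the shape of such a pipeline, here is a heavily simplified sketch: a supervised fine-tuning stage on curated reasoning examples, then an RL stage where a rule-based reward scores sampled answers on format and verifiable correctness. The `model.sft_step`, `model.sample_answer`, and `model.rl_step` calls are hypothetical placeholders, not DeepSeek's actual training code or its RL algorithm.

```python
# Heavily simplified sketch of an SFT-then-RL pipeline with a rule-based
# reward (format + verifiable correctness). The model methods below are
# hypothetical placeholders, NOT DeepSeek's actual implementation.
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    # Format reward: the model should wrap its reasoning in <think> tags.
    format_ok = 1.0 if re.search(r"<think>.*</think>", completion, re.S) else 0.0
    # Accuracy reward: the final answer must match the verifiable reference.
    final = completion.split("</think>")[-1].strip()
    correct = 1.0 if final == reference_answer.strip() else 0.0
    return 0.2 * format_ok + 0.8 * correct

def train_r1_like(model, sft_data, rl_prompts):
    # Stage 1: supervised fine-tuning on human-curated reasoning examples.
    for prompt, target in sft_data:
        model.sft_step(prompt, target)
    # Stage 2: reinforcement learning guided by the rule-based reward.
    for prompt, reference in rl_prompts:
        samples = [model.sample_answer(prompt) for _ in range(8)]
        rewards = [rule_based_reward(s, reference) for s in samples]
        model.rl_step(prompt, samples, rewards)

demo = "<think>3 squared is 9, plus 7 is 16.</think> 16"
print(rule_based_reward(demo, "16"))  # 1.0: correct format and answer
```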

My question is: did DeepSeek really solve the problem, knowing they extracted a lot of data from the datasets of LLMs, which all learned from human supervision? In other words, is the traditional dependency really broken when they rely on previously trained models?

Let me show you a live real-world screenshot shared by Alexandre Blanc today. It shows training data extracted from other models (here, ChatGPT) that have learned from human supervision ... I am not yet convinced that the traditional dependency is broken. It is "easy" to not require massive amounts of high-quality reasoning data for training when taking shortcuts ...

To be balanced and to show the research, I have published the DeepSeek R1 Paper (downloadable PDF, 22 pages).

My concerns regarding DeepSeek?

Both the web and mobile apps collect your IP, keystroke patterns, and device details, and everything is stored on servers in China.

Keystroke pattern analysis is a behavioral biometric technique used to identify and authenticate individuals based on their unique typing patterns.
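As a generic illustration of the technique (not DeepSeek's actual telemetry code), keystroke-dynamics systems typically derive features such as dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next):

```python
# Illustrative sketch of typical keystroke-dynamics features: dwell time
# (key held) and flight time (gap between keys). Generic example of the
# technique, not DeepSeek's actual collection code.
def keystroke_features(events):
    """events: list of (key, press_time_ms, release_time_ms), in typing order."""
    dwell = [release - press for _, press, release in events]
    flight = [
        events[i + 1][1] - events[i][2]   # next press minus current release
        for i in range(len(events) - 1)
    ]
    return {"dwell_ms": dwell, "flight_ms": flight}

sample = [("h", 0, 95), ("i", 180, 260), ("!", 400, 470)]
print(keystroke_features(sample))
# {'dwell_ms': [95, 80, 70], 'flight_ms': [85, 140]}
```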

I can hear the "But 0p3n s0urc3 ...!" remarks.

Yes, open source is great, but this reasoning is limited because it does not take human psychology into account.

Regular users will never run models locally.

Most will simply want quick answers.

Technically unsophisticated users will use the web and mobile versions.

Millions have already downloaded the mobile app on their phone.

DeepSeek's models have a genuine edge, and that's why we see ultra-fast user adoption. For now, they are superior to Google's Gemini or OpenAI's ChatGPT in many ways. R1 scores high on objective benchmarks, no doubt about that.

I suggest searching, on the web or mobile app, for anything sensitive that does not align with the Party's propaganda; the output will speak for itself ...

China vs America

Screenshots by T. Cassel. Freedom of speech is beautiful. I could share terrible examples of propaganda and censorship, but I won't. Just do your own research. I'll end with DeepSeek's privacy policy, which you can read on their website. This is a simple screenshot, nothing more.

Rest assured, your code, ideas and conversations will never be archived! As for the real investments behind DeepSeek, we have no idea if they are in the hundreds of millions or in the billions. We only know that the $5.6M figure the media has been pushing left and right is misinformation!