I conducted a static analysis of DeepSeek, a Chinese LLM chatbot, using version 1.8.0 from the Google Play Store. The goal was to identify potential security and privacy issues.
I've blogged about DeepSeek previously here.
Additional security and privacy concerns about DeepSeek have been raised.
See also this analysis by NowSecure of the iPhone version of DeepSeek.
The findings detailed in this report are based purely on static analysis. This means that while the code exists within the app, there is no conclusive evidence that all of it is executed in practice. Nonetheless, the presence of such code warrants scrutiny, especially given the growing concerns around data privacy, surveillance, the potential misuse of AI-driven applications, and cyber-espionage dynamics between global powers.
Key Findings
Suspicious Data Handling & Exfiltration
- Hardcoded URLs direct data to external servers, such as ByteDance "volce.com" endpoints, raising concerns about user activity monitoring. NowSecure identified these in the iPhone app yesterday as well.
- Bespoke encryption and data obfuscation methods are present, with indications that they could be used to exfiltrate user details.
- The app contains hard-coded public keys, rather than relying on the user device's chain of trust.
- UI interaction tracking captures detailed user behavior without clear consent.
- WebView manipulation is present, which could allow the app to access private external browser data when links are opened. More details about WebView manipulation are here.
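Findings such as hardcoded URLs and embedded public keys are the kind of thing a simple triage pass over decompiled sources can surface. The following is a minimal sketch of that idea, not the tooling used for this report: the regular expressions, the function name find_hardcoded_endpoints, and the example.com host in the usage note are all assumptions made for illustration.

```python
import re

# Illustrative patterns only (assumptions, not the tooling behind this report).
# A real triage pass would also handle obfuscated or split string constants.
URL_RE = re.compile(r"https?://[A-Za-z0-9.-]+(?::\d+)?(?:/[^\s\"'<>]*)?")
PEM_RE = re.compile(r"-----BEGIN (?:RSA )?PUBLIC KEY-----")


def find_hardcoded_endpoints(source: str) -> dict:
    """Collect URL hosts and count PEM public-key headers in decompiled text."""
    hosts = set()
    for match in URL_RE.finditer(source):
        # Strip the scheme, then drop any path and port to keep only the host.
        rest = match.group(0).split("://", 1)[1]
        host = rest.split("/", 1)[0].split(":", 1)[0]
        hosts.add(host.lower())
    return {
        "hosts": sorted(hosts),
        "pem_public_keys": len(PEM_RE.findall(source)),
    }
```

Run against a decompiled source file read as text, this returns the distinct hosts referenced and how many public-key headers appear. A hit only shows that the string exists in the code, not that the endpoint is ever contacted.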
Device Fingerprinting & Tracking
A significant portion of the analyzed code appears to focus on gathering device-specific information, which can be used for tracking and fingerprinting.
- The app collects various unique device identifiers, including UDID, Android ID, IMEI, IMSI, and carrier details.
- System properties, installed packages, and root detection mechanisms suggest potential anti-tampering measures. E.g. the app probes for the presence of Magisk, a tool that privacy advocates and security researchers use to root their Android devices.
- Geolocation and network profiling are present, indicating potential tracking capabilities and the enabling or disabling of fingerprinting routines by region.
- Hardcoded device model lists suggest the application may behave differently depending on the detected hardware.
- Multiple vendor-specific services are used to extract additional device details. E.g. if it cannot determine the device through standard Android SIM lookup (because permission was not granted), it attempts manufacturer-specific extensions to access the same information.
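The identifier and root-detection findings above can be approximated by searching decompiled sources for well-known Android API names and root-related strings. The sketch below is an assumption-laden illustration: the marker lists, the category names, and the function classify_indicators are hypothetical and do not reproduce the app's code or the exact method used in this analysis.

```python
# Android API names that expose hard-to-reset identifiers, plus a few strings
# commonly associated with root checks. The grouping and choice of markers
# are illustrative assumptions, not an exhaustive or authoritative list.
INDICATORS = {
    "device_identifiers": [
        "getDeviceId",         # TelephonyManager: IMEI/MEID on older APIs
        "getImei",             # TelephonyManager: IMEI
        "getSubscriberId",     # TelephonyManager: IMSI
        "getSimSerialNumber",  # TelephonyManager: SIM serial
        "android_id",          # Settings.Secure.ANDROID_ID
    ],
    "root_detection": ["magisk", "/system/xbin/su", "/sbin/su", "test-keys"],
}


def classify_indicators(source: str) -> dict:
    """Return the markers found in the source text, grouped by category."""
    lowered = source.lower()
    hits = {}
    for category, needles in INDICATORS.items():
        # Case-insensitive substring match against every marker.
        found = [needle for needle in needles if needle.lower() in lowered]
        if found:
            hits[category] = found
    return hits
```

Substring matching like this is deliberately crude. It will miss reflective calls and obfuscated strings, and it can flag code that is never reached, so hits are leads for manual review rather than findings.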
Potential Malware-Like Behavior
While no definitive conclusions can be drawn without dynamic analysis, several observed behaviors align with known spyware and malware patterns:
- The app uses reflection and UI overlays, which could facilitate unauthorized screen capture or phishing attacks.
- SIM card details, serial numbers, and other device-specific information are aggregated for unknown purposes.
- The app implements country-based access restrictions and "risk-device" detection, suggesting possible surveillance mechanisms.
- The app implements calls to load Dex modules, where additional code is loaded from files with a .so extension at runtime.
- The .so files themselves turn around and make additional calls to dlopen(), which can be used to load additional .so files. This facility is not typically checked by Google Play Protect and other static analysis services.
- The .so files can be implemented in native code, such as C++. The use of native code adds a layer of complexity to the analysis process and obscures the full extent of the app's capabilities. Moreover, native code can be leveraged to more easily escalate privileges, potentially exploiting vulnerabilities within the operating system or device hardware.
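The runtime-loading behavior described in the last three points can be flagged statically by looking for loader references on both the Java and native sides. The following is a minimal sketch under stated assumptions: the loader lists and the helper names java_loader_refs and native_loader_refs are hypothetical, and a plain byte search stands in for proper ELF symbol-table parsing.

```python
ELF_MAGIC = b"\x7fELF"

# Java-side entry points for loading code at runtime (real Android/Java APIs).
JAVA_LOADERS = (
    "DexClassLoader",
    "InMemoryDexClassLoader",
    "System.loadLibrary",
    "System.load(",
)
# Native-side dynamic linking functions, searched for as raw byte strings.
NATIVE_LOADERS = (b"dlopen", b"dlsym")


def java_loader_refs(source: str) -> list:
    """List runtime-loading APIs referenced in decompiled Java/Smali text."""
    return [name for name in JAVA_LOADERS if name in source]


def native_loader_refs(blob: bytes) -> list:
    """List dynamic-linking names present in an ELF shared object's bytes."""
    if not blob.startswith(ELF_MAGIC):
        return []  # Not an ELF file, so skip it.
    return [name.decode() for name in NATIVE_LOADERS if name in blob]
```

In practice the bytes would come from each .so file extracted from the APK. A reference to dlopen is common in legitimate libraries too, so a hit shows only that the facility exists, which is consistent with the limits of static analysis noted throughout this report.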
Remarks
While data collection is common in modern applications for debugging and improving user experience, aggressive fingerprinting raises significant privacy concerns. The DeepSeek app requires users to log in with a valid email, which should already provide sufficient authentication. There is no valid reason for the app to aggressively collect and transmit unique device identifiers, IMEI numbers, SIM card details, and other non-resettable system properties.
The extent of tracking observed here exceeds typical analytics practices, potentially enabling persistent user tracking and re-identification across devices. These behaviors, combined with obfuscation techniques and network communication with third-party tracking services, warrant a higher level of scrutiny from security researchers and users alike.
The use of runtime code loading, together with the bundling of native code, suggests that the app could allow the deployment and execution of unreviewed, remotely delivered code. This is a serious potential attack vector. This report presents no evidence that remotely deployed code is actually being executed, only that the facility for doing so appears to be present.
Additionally, the app's approach to detecting rooted devices appears excessive for an AI chatbot. Root detection is often justified in DRM-protected streaming services, where security and content protection are critical, or in competitive video games to prevent cheating. However, there is no clear rationale for such strict measures in an application of this nature, raising further questions about its intent.
Users and organizations considering installing DeepSeek should be aware of these potential risks. If this application is being used within an enterprise or government environment, additional vetting and security controls should be enforced before allowing its deployment on managed devices.
Disclaimer: The analysis presented in this report is based on static code review and does not imply that all detected functions are actively used. Further investigation is required for definitive conclusions.