Merged
Changes from 11 commits
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Insecure Cache Algorithms

> **Warning: Insecure Cache Key Algorithm (SHA-1)**

LangChain's default cache key encoder uses the SHA-1 hashing algorithm to generate cache keys for prompt/LLM pairs. While this is acceptable for many cache scenarios, **SHA-1 is _not_ collision-resistant**. A motivated attacker could craft two different payloads that produce the same cache key, leading to cache poisoning or unexpected cache hits.

SHA-1 is now deprecated for cache key generation in LangChain. To maintain compatibility with existing deployments, however, the transition away from SHA-1 is opt-in rather than automatic. In a future version, a more secure hashing algorithm will replace SHA-1 as the default.

### Why does this matter?

- **Security Risk:** If your application is exposed to untrusted input, an attacker could intentionally generate two different prompts or LLM keys that hash to the same value, causing one to overwrite the other's cache entry.
- **Data Integrity:** Collisions could result in incorrect generations being returned from the cache, which may be problematic in sensitive or high-integrity environments.

### When should you care?

- If your application is public-facing or handles sensitive data.
- If cache integrity is critical to your workflow.
- If you have compliance or security requirements that prohibit the use of weak hash functions.

### How to mitigate

You can supply a stronger hash function (such as SHA-256 or SHA-3) for cache key encoding by passing it to the `makeDefaultKeyEncoder()` method on your cache instance. For example:

```ts
import { sha256 } from "@langchain/core/utils/hash/sha256";

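// `CacheClient` stands in for whichever cache class you use.
const client = new CacheClient(...);
// Replace the default SHA-1 key encoder with SHA-256.
client.makeDefaultKeyEncoder(sha256);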
```
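
The same pattern works with any function that takes strings and returns a string. The sketch below is a minimal, self-contained illustration and not LangChain's implementation: `nodeSha256Encoder` and `SketchCache` are hypothetical names, and the encoder assumes a Node.js runtime so that it can use the built-in `node:crypto` module.

```ts
import { createHash } from "node:crypto";

// Same shape as the `HashKeyEncoder` type: any number of strings in, one string out.
type HashKeyEncoder = (...strings: string[]) => string;

// Hypothetical encoder built on Node's built-in crypto module (not part of LangChain).
const nodeSha256Encoder: HashKeyEncoder = (...strings) =>
  createHash("sha256").update(strings.join("_")).digest("hex");

// Hypothetical stand-in for a cache with a swappable key encoder.
class SketchCache<T> {
  private store = new Map<string, T>();

  // Placeholder default; a real cache would hash here.
  private keyEncoder: HashKeyEncoder = (...strings) => strings.join("_");

  makeDefaultKeyEncoder(keyEncoderFn: HashKeyEncoder): void {
    this.keyEncoder = keyEncoderFn;
  }

  lookup(prompt: string, llmKey: string): T | null {
    return this.store.get(this.keyEncoder(prompt, llmKey)) ?? null;
  }

  update(prompt: string, llmKey: string, value: T): void {
    this.store.set(this.keyEncoder(prompt, llmKey), value);
  }
}

const cache = new SketchCache<string>();
// Swap the encoder before the first write so every entry uses the same scheme.
cache.makeDefaultKeyEncoder(nodeSha256Encoder);
cache.update("prompt1", "key1", "text1");
```

Changing the encoder changes every cache key, so entries written under the previous encoder will no longer be found. Set the encoder once, right after constructing the cache.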
1 change: 1 addition & 0 deletions langchain-core/.eslintrc.cjs
@@ -16,6 +16,7 @@ module.exports = {
"src/utils/@cfworker",
"src/utils/fast-json-patch",
"src/utils/js-sha1",
"src/utils/js-sha256",
"src/utils/sax-js",
".eslintrc.cjs",
"scripts",
8 changes: 8 additions & 0 deletions langchain-core/.gitignore
@@ -206,6 +206,14 @@ utils/hash.cjs
utils/hash.js
utils/hash.d.ts
utils/hash.d.cts
utils/hash/insecure.cjs
utils/hash/insecure.js
utils/hash/insecure.d.ts
utils/hash/insecure.d.cts
utils/hash/sha256.cjs
utils/hash/sha256.js
utils/hash/sha256.d.ts
utils/hash/sha256.d.cts
utils/json_patch.cjs
utils/json_patch.js
utils/json_patch.d.ts
2 changes: 2 additions & 0 deletions langchain-core/langchain.config.js
@@ -64,6 +64,8 @@ export const config = {
"utils/event_source_parse": "utils/event_source_parse",
"utils/function_calling": "utils/function_calling",
"utils/hash": "utils/hash",
"utils/hash/insecure": "utils/js-sha1/hash",
"utils/hash/sha256": "utils/js-sha256/hash",
"utils/json_patch": "utils/json_patch",
"utils/json_schema": "utils/json_schema",
"utils/math": "utils/math",
26 changes: 26 additions & 0 deletions langchain-core/package.json
@@ -557,6 +557,24 @@
"import": "./utils/hash.js",
"require": "./utils/hash.cjs"
},
"./utils/hash/insecure": {
"types": {
"import": "./utils/hash/insecure.d.ts",
"require": "./utils/hash/insecure.d.cts",
"default": "./utils/hash/insecure.d.ts"
},
"import": "./utils/hash/insecure.js",
"require": "./utils/hash/insecure.cjs"
},
"./utils/hash/sha256": {
"types": {
"import": "./utils/hash/sha256.d.ts",
"require": "./utils/hash/sha256.d.cts",
"default": "./utils/hash/sha256.d.ts"
},
"import": "./utils/hash/sha256.js",
"require": "./utils/hash/sha256.cjs"
},
"./utils/json_patch": {
"types": {
"import": "./utils/json_patch.d.ts",
@@ -841,6 +859,14 @@
"utils/hash.js",
"utils/hash.d.ts",
"utils/hash.d.cts",
"utils/hash/insecure.cjs",
"utils/hash/insecure.js",
"utils/hash/insecure.d.ts",
"utils/hash/insecure.d.cts",
"utils/hash/sha256.cjs",
"utils/hash/sha256.js",
"utils/hash/sha256.d.ts",
"utils/hash/sha256.d.cts",
"utils/json_patch.cjs",
"utils/json_patch.js",
"utils/json_patch.d.ts",
28 changes: 24 additions & 4 deletions langchain-core/src/caches/base.ts
@@ -1,4 +1,4 @@
import { insecureHash } from "../utils/hash.js";
import { insecureHash, type HashKeyEncoder } from "../utils/hash.js";
import type { Generation, ChatGeneration } from "../outputs.js";
import { mapStoredMessageToChatMessage } from "../messages/utils.js";
import { type StoredGeneration } from "../messages/base.js";
@@ -12,8 +12,11 @@ import { type StoredGeneration } from "../messages/base.js";
* separate concerns and scale horizontally.
*
* TODO: Make cache key consistent across versions of LangChain.
*
* @deprecated Use `makeDefaultKeyEncoder()` to create a custom key encoder.
* This function will be removed in a future version.
*/
export const getCacheKey = (...strings: string[]): string =>
export const getCacheKey: HashKeyEncoder = (...strings) =>
insecureHash(strings.join("_"));

export function deserializeStoredGeneration(
@@ -43,6 +46,21 @@ export function serializeGeneration(generation: Generation) {
* Base class for all caches. All caches should extend this class.
*/
export abstract class BaseCache<T = Generation[]> {
// For backwards compatibility, we use a default key encoder
// that uses SHA-1 to hash the prompt and LLM key. This will also print a warning
// about the security implications of using SHA-1 as a cache key.
protected keyEncoder: HashKeyEncoder = getCacheKey;

/**
* Sets a custom key encoder function for the cache.
* This function should take a prompt and an LLM key and return a string
* that will be used as the cache key.
* @param keyEncoderFn The custom key encoder function.
*/
makeDefaultKeyEncoder(keyEncoderFn: HashKeyEncoder): void {
this.keyEncoder = keyEncoderFn;
}

abstract lookup(prompt: string, llmKey: string): Promise<T | null>;

abstract update(prompt: string, llmKey: string, value: T): Promise<void>;
@@ -69,7 +87,9 @@ export class InMemoryCache<T = Generation[]> extends BaseCache<T> {
* @returns The data corresponding to the prompt and LLM key, or null if not found.
*/
lookup(prompt: string, llmKey: string): Promise<T | null> {
return Promise.resolve(this.cache.get(getCacheKey(prompt, llmKey)) ?? null);
return Promise.resolve(
this.cache.get(this.keyEncoder(prompt, llmKey)) ?? null
);
}

/**
@@ -79,7 +99,7 @@ export class InMemoryCache<T = Generation[]> extends BaseCache<T> {
* @param value The data to be stored.
*/
async update(prompt: string, llmKey: string, value: T): Promise<void> {
this.cache.set(getCacheKey(prompt, llmKey), value);
this.cache.set(this.keyEncoder(prompt, llmKey), value);
}

/**
36 changes: 36 additions & 0 deletions langchain-core/src/caches/tests/in_memory_cache.test.ts
@@ -39,3 +39,39 @@ test("InMemoryCache works with complex message types", async () => {
text: "text1",
});
});

test("InMemoryCache handles default key encoder", async () => {
const cache = new InMemoryCache();

await cache.update("prompt1", "key1", [
{
text: "text1",
},
]);

// expect this to call console.warn about SHA-1 usage
const result = await cache.lookup("prompt1", "key1");

expect(result).toBeDefined();
});

test("InMemoryCache handles custom key encoder", async () => {
const cache = new InMemoryCache();

// use fancy hashing algorithm to encode the key :)
cache.makeDefaultKeyEncoder((prompt, key) => `${prompt}###${key}`);

// expect custom key encoder not to call console.warn
await cache.update("prompt1", "key1", [
{
text: "text1",
},
]);

const result1 = await cache.lookup("prompt1", "key1");
expect(result1).toBeDefined();
if (!result1) {
return;
}
expect(result1[0].text).toBe("text1");
});
15 changes: 12 additions & 3 deletions langchain-core/src/indexing/base.ts
@@ -1,7 +1,7 @@
import { v5 as uuidv5 } from "uuid";
import { VectorStore } from "../vectorstores.js";
import { RecordManagerInterface, UUIDV5_NAMESPACE } from "./record_manager.js";
import { insecureHash } from "../utils/hash.js";
import { insecureHash, type HashKeyEncoder } from "../utils/hash.js";
import { DocumentInterface, Document } from "../documents/document.js";
import { BaseDocumentLoader } from "../document_loaders/base.js";

@@ -51,12 +51,21 @@ export class _HashedDocument implements HashedDocumentInterface {

metadata: Metadata;

// For backwards compatibility, we use a default key encoder
// that uses SHA-1 to hash the document content and metadata. This will also print a warning
// about the security implications of using SHA-1 as a key encoder.
private keyEncoder: HashKeyEncoder = insecureHash;

constructor(fields: HashedDocumentArgs) {
this.uid = fields.uid;
this.pageContent = fields.pageContent;
this.metadata = fields.metadata;
}

makeDefaultKeyEncoder(keyEncoderFn: HashKeyEncoder): void {
this.keyEncoder = keyEncoderFn;
}

calculateHashes(): void {
const forbiddenKeys = ["hash_", "content_hash", "metadata_hash"];

@@ -110,13 +119,13 @@ export class _HashedDocument implements HashedDocumentInterface {
}

private _hashStringToUUID(inputString: string): string {
const hash_value = insecureHash(inputString);
const hash_value = this.keyEncoder(inputString);
return uuidv5(hash_value, UUIDV5_NAMESPACE);
}

private _hashNestedDictToUUID(data: Record<string, unknown>): string {
const serialized_data = JSON.stringify(data, Object.keys(data).sort());
const hash_value = insecureHash(serialized_data);
const hash_value = this.keyEncoder(serialized_data);
return uuidv5(hash_value, UUIDV5_NAMESPACE);
}
}
8 changes: 8 additions & 0 deletions langchain-core/src/utils/hash.ts
@@ -1 +1,9 @@
export { insecureHash } from "./js-sha1/hash.js";
export { sha256 } from "./js-sha256/hash.js";

/**
* A function type for encoding hash keys.
* Accepts any number of string arguments (such as prompt and LLM key)
* and returns a single string to be used as the hash key.
*/
export type HashKeyEncoder = (...strings: string[]) => string;
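
Because a `HashKeyEncoder` receives its parts as separate arguments, how it combines them before hashing matters. `getCacheKey` joins its arguments with `"_"`, so two different argument lists can produce the same hash input regardless of the hash function. The sketch below illustrates this and one possible alternative; `joinedEncoder` and `framedEncoder` are hypothetical names, and `node:crypto` is assumed to be available.

```ts
import { createHash } from "node:crypto";

type HashKeyEncoder = (...strings: string[]) => string;

const sha256Hex = (input: string): string =>
  createHash("sha256").update(input).digest("hex");

// Joins with "_" the way `getCacheKey` does: argument boundaries are lost.
const joinedEncoder: HashKeyEncoder = (...strings) =>
  sha256Hex(strings.join("_"));

// Hypothetical alternative: serialize the argument list as a JSON array
// so that each argument's boundaries are preserved in the hash input.
const framedEncoder: HashKeyEncoder = (...strings) =>
  sha256Hex(JSON.stringify(strings));

// Both calls hash the string "a_b_c", so the keys are identical.
const collides = joinedEncoder("a_b", "c") === joinedEncoder("a", "b_c"); // true

// The JSON forms '["a_b","c"]' and '["a","b_c"]' differ, so the keys differ.
const distinct = framedEncoder("a_b", "c") !== framedEncoder("a", "b_c"); // true
```

Whether this matters depends on whether the prompt or LLM key can contain the separator and come from untrusted input.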
20 changes: 20 additions & 0 deletions langchain-core/src/utils/js-sha1/hash.ts
@@ -412,6 +412,26 @@ Sha1.prototype.arrayBuffer = function () {
return buffer;
};

let hasLoggedWarning = false;

/**
* @deprecated Use `makeDefaultKeyEncoder()` to create a custom key encoder.
* This function will be removed in a future version.
*/
export const insecureHash = (message) => {
if (!hasLoggedWarning) {
console.warn(
[
`The default method for hashing keys is insecure and will be replaced in a future version,`,
`but hasn't been replaced yet as to not break existing caches. It's recommended that you use`,
`a more secure hashing algorithm to avoid cache poisoning.`,
``,
`See this page for more information:`,
`|`,
`└> https://js.langchain.com/docs/troubleshooting/warnings/insecure-cache-algorithm`,
].join("\n")
);
hasLoggedWarning = true;
}
return new Sha1(true).update(message)["hex"]();
};
22 changes: 22 additions & 0 deletions langchain-core/src/utils/js-sha256/LICENSE.md
@@ -0,0 +1,22 @@
Copyright (c) 2014-2025 Chen, Yi-Cyuan

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.