Not able to add new github projects with sparse checkout of folder #4851

@jkjha

Description

Update
The issue below seems to occur only for projects that are sparse checkouts of a Git repository. Other projects seem to index fine.

I am running an OpenGrok Docker instance with many projects. After a recent OpenGrok version upgrade and a few unrelated changes (an OS upgrade on the host where the code resides), the indexer command started failing. The only error I see is in the history step, but the indexer creates neither data/xref/ nor the index data.

I see the following error in the log:

Sep 20, 2025 12:50:11 PM org.opengrok.indexer.history.HistoryGuru lambda$createHistoryCacheReal$36
WARNING: failed to create history cache for {dir='/opengrok/src/project_name.github.devops',type=git,history=on,historyCache=on,merge=true,annotationCache=off,tagsEnabled=off}
org.eclipse.jgit.api.errors.JGitInternalException: Error while parsing attributes
	at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:635)
	at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:589)
	at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:169)
	at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:110)
	at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:87)
	at org.eclipse.jgit.diff.DiffFormatter.scan(DiffFormatter.java:533)
	at org.opengrok.indexer.history.GitRepository.getFilesBetweenCommits(GitRepository.java:649)

To Reproduce
Place the folder in /opengrok/src/<project_name> and start the indexer command:

Example command that I run manually (it is normally run automatically by the container):

opengrok-reindex-project -J=-XX:-UseGCOverheadLimit -J=-Xmx36g -J=-server --printoutput --api_timeout 300 --jar /opengrok/lib/opengrok.jar -t /opengrok/etc/logging.properties.template -d /opengrok/log/project_name.github.devops -U http://localhost:8080/ -P project_name.github.devops -- --connectTimeout 300 -r dirbased -G -m 4096 --leadingWildCards on -c /usr/local/bin/ctags -o /opengrok/etc/ctags.config -U http://localhost:8080/ -H project_name.github.devops
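
The reproduction steps do not say how the sparse checkout was created. A sparse checkout on its own keeps every blob in the object store, so the `MissingObjectException` in the log suggests (as an assumption, not something this report confirms) that the repository may also be a partial clone, for example one made with `git clone --filter=blob:none --sparse`. The sketch below is a hypothetical helper, not part of OpenGrok; it summarises the relevant settings from the text printed by `git config --list`. The key names `core.sparseCheckout`, `remote.<name>.promisor` and `remote.<name>.partialclonefilter` are standard Git configuration keys.

```python
def describe_checkout(config_listing):
    """Summarise sparse-checkout and partial-clone settings.

    `config_listing` is the text printed by `git config --list`,
    one `key=value` pair per line.
    """
    sparse = False
    promisor_remotes = []
    filters = {}
    for line in config_listing.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not key=value pairs
        lowered = key.lower()
        if lowered == "core.sparsecheckout":
            sparse = value.lower() == "true"
        elif lowered.startswith("remote.") and lowered.endswith(".promisor"):
            # A promisor remote means objects may be absent locally.
            if value.lower() == "true":
                promisor_remotes.append(key[len("remote."):-len(".promisor")])
        elif lowered.startswith("remote.") and lowered.endswith(".partialclonefilter"):
            # Records which objects were filtered out at clone time.
            filters[key[len("remote."):-len(".partialclonefilter")]] = value
    return {
        "sparse_checkout": sparse,
        "promisor_remotes": promisor_remotes,
        "filters": filters,
    }
```

The listing can be obtained by running `git -C /opengrok/src/<project_name> config --list` on the indexer host. A non-empty `promisor_remotes` list would indicate that some objects are expected to be missing locally, which would be consistent with the missing blob reported in the log.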

Expected behavior
Indexing should work normally and create /opengrok/data/xref/project_name.github.devops, /opengrok/data/index/project_name.github.devops, etc.

Full log

Sep 20, 2025 12:49:40 PM org.opengrok.indexer.configuration.Configuration read
INFO: Reading configuration from '/tmp/tmpb7qpb63v'
Sep 20, 2025 12:49:41 PM org.opengrok.indexer.index.Indexer parseOptions
INFO: Indexer options: [-R, /tmp/tmpb7qpb63v, --connectTimeout, 300, -r, dirbased, -G, -m, 4096, --leadingWildCards, on, -c, /usr/local/bin/ctags, -o, /opengrok/etc/ctags.config, -U, http://localhost:8080/, -H, project_name.github.devops]
INFO: file with extra options for ctags: /opengrok/etc/ctags.config
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done invalidating repositories (1 valid, 1 working) (took 529 ms)
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.index.Indexer runMain
INFO: Indexer version 1.14.2 (4c5dc2465cb0729dfe0fc765d8bd7cf99ee82e91) running on Java version: 21.0.8+9-LTS, name: OpenJDK 64-Bit Server VM, vendor: Eclipse Adoptium, arch: amd64 with properties: ncpu: 14, maxMemory: 36.0 GiB
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.configuration.RuntimeEnvironment validateUniversalCtags
INFO: Using ctags: Universal Ctags 6.2.0(df6a390df), Copyright (C) 2015-2025 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Aug 29 2025, 08:33:35
  URL: https://ctags.io/
  Output version: 1.1
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +yaml, +packcc, +optscript
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Generating history cache for repositories: /project_name.github.devops
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.history.HistoryGuru createHistoryCacheReal
INFO: Creating history cache for 1 repositories
Sep 20, 2025 12:49:42 PM org.opengrok.indexer.history.HistoryGuru createHistoryCache
INFO: Creating history cache for {dir='/opengrok/src/project_name.github.devops',type=git,history=on,historyCache=on,merge=true,annotationCache=off,tagsEnabled=off}
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.history.HistoryGuru lambda$createHistoryCacheReal$36
WARNING: failed to create history cache for {dir='/opengrok/src/project_name.github.devops',type=git,history=on,historyCache=on,merge=true,annotationCache=off,tagsEnabled=off}
org.eclipse.jgit.api.errors.JGitInternalException: Error while parsing attributes
	at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:635)
	at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:589)
	at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:169)
	at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:110)
	at org.eclipse.jgit.diff.DiffEntry.scan(DiffEntry.java:87)
	at org.eclipse.jgit.diff.DiffFormatter.scan(DiffFormatter.java:533)
	at org.opengrok.indexer.history.GitRepository.getFilesBetweenCommits(GitRepository.java:649)
	at org.opengrok.indexer.history.GitRepository.getFilesForCommit(GitRepository.java:615)
	at org.opengrok.indexer.history.GitRepository.traverseHistory(GitRepository.java:544)
	at org.opengrok.indexer.history.RepositoryWithHistoryTraversal.doCreateCache(RepositoryWithHistoryTraversal.java:194)
	at org.opengrok.indexer.history.Repository.createCache(Repository.java:403)
	at org.opengrok.indexer.history.HistoryGuru.createHistoryCache(HistoryGuru.java:1078)
	at org.opengrok.indexer.history.HistoryGuru.lambda$createHistoryCacheReal$36(HistoryGuru.java:1122)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing blob 6bd7389c8ac39c614e8ffdfd075f2ca8bbb83e6d
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:138)
	at org.eclipse.jgit.treewalk.CanonicalTreeParser.loadAttributes(CanonicalTreeParser.java:393)
	at org.eclipse.jgit.treewalk.CanonicalTreeParser.findAttributes(CanonicalTreeParser.java:385)
	at org.eclipse.jgit.treewalk.CanonicalTreeParser.getEntryAttributesNode(CanonicalTreeParser.java:375)
	at org.eclipse.jgit.attributes.AttributesHandler.attributesNode(AttributesHandler.java:402)
	at org.eclipse.jgit.attributes.AttributesHandler.mergePerDirectoryEntryAttributes(AttributesHandler.java:232)
	at org.eclipse.jgit.attributes.AttributesHandler.getAttributes(AttributesHandler.java:144)
	at org.eclipse.jgit.treewalk.TreeWalk.getAttributes(TreeWalk.java:631)
	... 16 more

Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done history cache for all repositories (took 29.252 seconds)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer prepareIndexer
INFO: Done generating history cache
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done invalidating repositories (1 valid, 1 working) (took 79 ms)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer doIndexerExecution
INFO: Starting indexing
Sep 20, 2025 12:50:11 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.IndexDatabase addIndexDatabaseForProject
SEVERE: Failed to create history cache for some repositories of project project_name.github.devops:indexed=false,history=true: {{dir='/opengrok/src/p2v.tera.vcf.main.github-...
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer doIndexerExecution
INFO: Waiting for the executors to finish
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done indexing data of all repositories (took 37 ms)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.util.Statistics logIt
INFO: Indexer finished (took 31.30 seconds)
Sep 20, 2025 12:50:11 PM org.opengrok.indexer.index.Indexer runMain
INFO: Indexer finished with success
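
Note that the log ends with "Indexer finished with success" even though the history cache failed, so the failure is easy to miss. As a minimal sketch for triage, the hypothetical helper below (not part of OpenGrok or JGit) pulls the object type and ID out of any `MissingObjectException` lines in a log like the one above. The regular expression assumes the "Missing <type> <40-hex-id>" message format seen in this log.

```python
import re

# Matches the JGit message as it appears in the log above,
# e.g. "MissingObjectException: Missing blob <40 hex characters>".
MISSING_RE = re.compile(r"MissingObjectException: Missing (\w+) ([0-9a-f]{40})")


def find_missing_objects(log_text):
    """Return unique (object_type, object_id) pairs in order of first appearance."""
    seen = []
    for match in MISSING_RE.finditer(log_text):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen
```

Running it over the full log above would report the single blob named in the "Caused by" line, which can then be looked up in the repository to see which file it belongs to.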

Additional context
OpenGrok image: 1.14.1
Indexer running on CentOS (Rocky 9)
